Szemerédi’s Theorem via Ergodic Theory
Jonathan Hickman
January 25, 2016
Abstract
This essay investigates Furstenberg’s proof of Szemerédi’s Theorem. The necessary concepts and results from ergodic theory are introduced, including the Poincaré and Mean Ergodic Theorems, which are proved in full. The Ergodic Decomposition Theorem is also discussed.
Furstenberg’s Multiple Recurrence Theorem is then stated and shown to imply Szemerédi’s
Theorem. The remainder of the essay concentrates on proving the Multiple Recurrence Theorem, following the method laid out in [10]. As part of this proof the notions of conditional
expectation and measure are introduced and discussed in some detail.
1 Introduction
This essay is concerned with exploring whether certain subsets of the integers contain arithmetic
progressions. Recall that an (integer) arithmetic progression is a set {a + jb}_{j=0}^{k−1} ⊂ Z, where k ∈ N≥1 is the length of the progression, a ∈ Z and b ∈ N≥1. Usually we will not be concerned
with the values of the starting point a and step-size b, we merely want to know if given a set
Λ ⊆ Z and k ∈ N there exists a progression of length k within Λ. In 1927 van der Waerden
proved the following fundamental theorem:
Theorem 1.1 (van der Waerden). Suppose A1 , . . . , An ⊆ Z are disjoint sets which partition
the integers,
Z = A1 ⊔ A2 ⊔ · · · ⊔ An.
Then there exists some l ∈ {1, . . . , n} such that Al contains arithmetic progressions of arbitrary
length.
We will not present a proof of Theorem 1.1 here¹, but rather remark that when studying questions
pertaining to the presence of arithmetic progressions, it transpires that the idea of partitioning
the integers into a finite number of sets (or ‘colouring’ the integers, as it is often termed in such
combinatorial contexts) is something of a red herring. A more general approach is to consider
the ‘density’ of a set of integers.
Definition 1. Let Λ ⊆ Z. We define the upper Banach density of Λ to be

d̄_B(Λ) := lim sup_{N−M→∞} |Λ ∩ [M, N)| / (N − M)
In 1936 Erdős and Turán [6] conjectured the stronger statement that any subset of Z with positive upper Banach density contains arithmetic progressions of arbitrary length. Proving this conjecture was no easy task: indeed, it was not until 1952 that the first non-trivial case, the existence of arithmetic progressions of length k = 3, was shown by Roth [20], who used techniques from harmonic analysis. Szemerédi [22] then proved the case k = 4 in 1969 by applying combinatorial methods, and finally the general case was proven, again by Szemerédi [23], in 1975. The result now bears his name.
¹ A short proof of van der Waerden’s Theorem can be found in [5, 11]. Alternatively, it follows directly from Szemerédi’s Theorem, that is Theorem 1.2 below.
Theorem 1.2 (Szemerédi’s Theorem). Let Λ ⊆ Z be a set of positive upper Banach density.
Then Λ contains arithmetic progressions of arbitrary length.
Szemerédi’s proof of this theorem is long and difficult and we will not concern ourselves with
it here. Instead, we investigate an alternative, more accessible proof given by Furstenberg. Soon
after Szemerédi’s proof was published, Furstenberg noticed that Theorem 1.2 would follow from a
substantial generalisation of the classical Poincaré Recurrence Theorem from ergodic theory.
He subsequently proved this generalisation, now known as Furstenberg’s Multiple Recurrence
Theorem, and provided a scheme, the Correspondence Principle, for relating problems in combinatorial number theory to problems in ergodic theory [7].
The importance of Furstenberg’s work is arguably two-fold: first of all the Multiple Recurrence Theorem is itself a substantial result in ergodic theory. Secondly, the correspondence
principle provides an effective tool for tackling problems in number theory. Some results, such
as the multidimensional analogue of Szemerédi’s Theorem [9], have only been proven through
the use of the ergodic theoretical methods laid out by Furstenberg.
This document presents the proof of Szemerédi’s Theorem as given in Furstenberg, Katznelson and Ornstein’s article [10], assuming the reader has no prior knowledge of ergodic theory (although some familiarity with elementary measure theory and functional analysis is assumed). In order
to do this we will introduce the necessary definitions and results from ergodic theory and get a
flavour of what the discipline entails. We then proceed to state Furstenberg’s Multiple Recurrence Theorem and show how this implies Szemerédi’s Theorem. The bulk of the essay then
attempts to explain the proof of the Multiple Recurrence Theorem.
2 Upper Banach and Asymptotic Density
We begin by briefly discussing notions of density of a subset of the integers. The results proved
here will not be required until Section 4 and they can thus be considered something of a prelude
to the mathematics described later in the essay.
It is easy to deduce from Definition 1 that if a set Λ ⊆ Z has upper Banach density d¯B (Λ)
then there exists a sequence of intervals [an , bn ) with bn − an → ∞ as n → ∞ such that
d̄_B(Λ) = lim_{n→∞} |Λ ∩ [an, bn)| / (bn − an)    (2.1)
If we enforce the stronger condition that (2.1) must hold for [an , bn ) = [1, n + 1) for all n ∈ N
then we arrive at the stronger notion of asymptotic density.
Definition 2. For Λ ⊆ N define the lower and upper asymptotic densities as

d(Λ) = lim inf_{N→∞} |Λ ∩ [1, N]| / N    and    d̄(Λ) = lim sup_{N→∞} |Λ ∩ [1, N]| / N

respectively. If d(Λ) = d̄(Λ) then we define the asymptotic density as

d(Λ) = lim_{N→∞} |Λ ∩ [1, N]| / N

so that d(Λ) = d(Λ) = d̄(Λ).
Obviously any subset Λ ⊂ N of positive upper asymptotic density is automatically of positive
upper Banach density.
Example 1.
1. Trivially, any finite set F ⊂ N has d(F ) = 0.
2. For any a, k ∈ N the arithmetic sequence Λ = a + kN has density d(Λ) = 1/k.
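Although the essay is purely theoretical, the density in Example 1.2 is easy to check numerically. The following Python sketch (with ad hoc names, not part of the original sources) computes the partial densities |Λ ∩ [1, N]| / N for Λ = a + kN and confirms they approach 1/k.

```python
# Numerical illustration of Example 1.2 (ad hoc names): the partial densities
# of the arithmetic sequence Λ = a + kN approach 1/k.

def partial_density(lam, N):
    """Compute |lam ∩ [1, N]| / N for a set lam of positive integers."""
    return sum(1 for n in range(1, N + 1) if n in lam) / N

a, k = 3, 5
N = 100_000
lam = {a + k * j for j in range(N)}  # enough terms to cover [1, N]

d = partial_density(lam, N)
print(abs(d - 1 / k))  # very small: the density is 1/k = 0.2
```

For this choice of N the count is exact (20000 multiples-plus-offset in [1, 100000]), so the partial density already equals 1/k.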
An example which is only slightly less trivial, though important, is that of a syndetic set [5].
Definition 3. We say that a subset K ⊆ N is syndetic if there exists some a ∈ N such that

⋃_{i=0}^{a−1} (K − i) = N    (2.2)

where K − i = {n ∈ N : n + i ∈ K}. This is equivalent to the property that for every k ∈ N we have {k, k + 1, . . . , k + a − 1} ∩ K ≠ ∅, so syndetic sets are precisely those with ‘bounded gaps’.
It is easy to see that a syndetic set K ⊆ N has positive lower density. For suppose a ∈ N satisfies (2.2). Then for each k ∈ N the interval ((k − 1)a, ka] contains at least one element of K. Let N ∈ N with N ≥ a. By writing N = ka + r where k ∈ N and 0 ≤ r < a observe

|K ∩ [1, N]| / N = Σ_{j=1}^{k} |K ∩ ((j − 1)a, ja]| / N + |K ∩ (ka, ka + r]| / N ≥ k / (ka + r) > 1 / (2a) > 0

and so d(K) > 0 as required.
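The lower bound 1/(2a) above can be observed empirically. The following sketch (ad hoc, not from the original essay) builds a random syndetic set by placing one point in each block ((j − 1)a, ja] and checks that the partial density stays above 1/(2a).

```python
# Sketch (ad hoc names): a syndetic set with gaps at most a has partial
# densities |K ∩ [1, N]| / N exceeding 1/(2a), as in the estimate above.
import random

random.seed(0)
a = 4
# Build a syndetic set: one point in each block ((j-1)a, ja].
K = set()
for j in range(1, 10_001):
    K.add((j - 1) * a + random.randint(1, a))

N = a * 10_000
density = len([n for n in K if 1 <= n <= N]) / N
print(density > 1 / (2 * a))  # True
```

Here exactly one point lands in each of the 10000 blocks, so the partial density is exactly 1/a = 0.25, comfortably above 1/(2a) = 0.125.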
A significant result concerning asymptotic density is a theorem of Koopman and von Neumann, adapted here from ([5], Chapter 2; [1]), which relates asymptotic density and Cesàro summation of sequences. As we shall see, Cesàro sums play a dominant rôle in ergodic theory. The following theorem effectively tells us that the behaviour of a sequence on terms belonging to a set of zero density is ‘averaged out’ when taking the Cesàro mean.
Theorem 2.1. Let (an)n∈N ∈ l∞(R≥0) be a bounded sequence of non-negative real numbers. Then the following are equivalent:

1. The Cesàro mean of (an)n∈N is 0:

lim_{N→∞} (1/N) Σ_{n=1}^{N} an = 0

2. There exists some set Z ⊂ N of zero density such that

lim_{n→∞, n∉Z} an = 0
Before presenting the proof of Theorem 2.1 we note the following corollary, which is an immediate consequence. This corollary will be repeatedly exploited in the subsequent discourse.
Corollary 2.2. Let (an)n∈N ∈ l∞(R≥0) be a bounded sequence of non-negative real numbers. Then

lim_{N→∞} (1/N) Σ_{n=1}^{N} an = 0    if and only if    lim_{N→∞} (1/N) Σ_{n=1}^{N} an² = 0
We now turn to the proof of Theorem 2.1.
Proof (of Theorem 2.1). We first assume 1. Define for each k ∈ N≥1 the set

Zk := {j ∈ N : aj ≥ 1/k}

Note that the Zk form an increasing chain Z1 ⊆ Z2 ⊆ . . . . Furthermore, fixing k ∈ N,

|Zk ∩ [1, N]| / N ≤ (k/N) Σ_{n∈Zk∩[1,N]} an ≤ (k/N) Σ_{n=1}^{N} an
Thus, by taking the limit as N → ∞, each Zk is a set of zero density. Moreover, there exists an increasing sequence 1 < N1 < N2 < . . . such that

|Zk ∩ [1, N]| / N < 1/k    for all N ≥ Nk and k ∈ N≥1

Now define

Z = ⋃_{k=1}^{∞} Zk ∩ [Nk−1, Nk)

Note that if N ∈ {Nk0−1, . . . , Nk0 − 1} for some k0 ∈ N then, as the Zk form an increasing chain, Z ∩ [1, N] ⊆ Zk0 ∩ [1, N]. Thus

|Z ∩ [1, N]| / N ≤ |Zk0 ∩ [1, N]| / N < 1/k0

By taking limits we conclude Z is a set of zero density. Furthermore, if n ∈ N \ Z and n ≥ Nk then by the definition of Z we must have n ∉ Zk and so an < 1/k. Hence it is easy to see that

lim_{n→∞, n∉Z} an = 0

as required.
Conversely, suppose 2 holds. Then for any ε > 0 there exists N1 ∈ N such that

an < ε/3    for all n ≥ N1 with n ∉ Z

Let M > 0 be an upper bound on the sequence (an)n∈N. As Z is a set of zero density there exists N2 ∈ N such that

|Z ∩ [1, N]| / N < ε/(3M)    for all N ≥ N2

Finally, let N0 > max{N1, N2} be such that M(N1 − 1)/N0 < ε/3. Then for all N ≥ N0 we have

(1/N) Σ_{n=1}^{N} an = (1/N) [ Σ_{n=1}^{N1−1} an + Σ_{n=N1, n∈Z}^{N} an + Σ_{n=N1, n∉Z}^{N} an ]
≤ M(N1 − 1)/N + M|Z ∩ [1, N]|/N + (1/N) Σ_{n=N1, n∉Z}^{N} an < ε

and we are done.
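Theorem 2.1 can be illustrated numerically. The sketch below (ad hoc, not part of the original essay) takes an = 1 on the perfect squares, which form a zero-density set, and an = 1/n elsewhere; condition 2 of the theorem holds, and the Cesàro means are correspondingly small.

```python
# Illustration of Theorem 2.1 (ad hoc names): a_n = 1 on the zero-density
# set of perfect squares, a_n = 1/n otherwise; the Cesàro mean tends to 0.
from math import isqrt

def a(n):
    return 1.0 if isqrt(n) ** 2 == n else 1.0 / n

N = 200_000
cesaro = sum(a(n) for n in range(1, N + 1)) / N
print(cesaro)  # small, tending to 0 as N grows
```

The value is roughly (√N + log N)/N, so the ‘spikes’ on the squares are indeed averaged out.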
Relating combinatorial concepts (such as asymptotic or upper Banach density) to notions of
averaging (such as Cesàro summation or measure) is a major theme of this study and we will
return to it in Section 4 when we discuss the connections between ergodic theory and Szemerédi’s
theorem. Before then, however, we take some time to introduce the fundamentals of ergodic
theory.
3 Elementary Ergodic Theory
This section acts as a brief introduction to ergodic theory. It is stressed that this survey
is far from comprehensive2 ; indeed it was the author’s intention to minimise the number of
prerequisites needed in proving Szemerédi’s Theorem and for the most part we focus only on
introducing these prerequisites. The reader is encouraged to consult other sources3 for a broader
introduction to this field.
² For instance, one conspicuous omission is Birkhoff’s Pointwise Ergodic Theorem.
³ For example [5, 13, 17, 19, 21].
3.1 Basic Definitions
Ergodic theory belongs to the broader discipline of the qualitative study of dynamical systems.
A dynamical system essentially consists of a ‘state space’ X and a transformation T : X → X
and one wishes to determine to some extent the behaviour of X under iterations of T . In the
case of ergodic theory the state space and transformation are endowed with measure theoretic
structure; this is natural in the context of analysing the long-term average behaviour of systems
[19]. The fundamental objects of study are measure preserving systems.
Definition 4. Let (X, B, µ) be a measure space. Suppose T : X → X is measurable and for
all A ∈ B we have µ(T −1 A) = µ(A). Then we say T is a measure preserving transformation
on (X, B, µ). Furthermore, the quadruple (X, B, µ, T ) consisting of the space and transformation is
called a measure preserving system.
We say (X, B, µ, T ) is invertible if the transformation T is invertible with measurable inverse.
Remark 1. In some instances it is convenient to change the perspective somewhat: given a
measurable space (X, B) and measurable function T : X → X we say a measure µ on (X, B)
is T -invariant if T is a measure preserving transformation on (X, B, µ).
The study of these objects arose from attempts to model certain physical systems in applied mathematics, particularly statistical mechanics (see, for instance [3]). Since then a rich
and varied theory of measure preserving systems has been developed with applications in pure
mathematics, not least combinatorial number theory and Furstenberg’s proof of Szemerédi’s
Theorem.
3.1.1 Examples of Measure Preserving Systems
This discussion will only consider a few simple examples of measure preserving systems. Although practical in the context of this study, our examples do not reflect the true diversity of
potential systems and are not representative of ergodic theory as a whole⁴.
Example 2 (Rotations of the Circle). Consider the circle T := R/Z with the usual Lebesgue measure. Then for any τ ∈ R the rotation map

Rτ : T → T,    Rτ : x ↦ x + τ

is clearly measurable with measurable inverse R−τ. We claim this map is measure preserving. Indeed, by a simple application of Dynkin’s Lemma, as Λ = {A ∈ B : µ(Rτ^{−1}(A)) = µ(A)} is a d-system⁵, we only need check µ(Rτ^{−1}[a, b)) = µ([a, b)) for all intervals [a, b) ⊆ T. But this is obvious.
In studying a measure preserving system (X, B, µ, T ) we wish to determine qualitatively
how the space (X, B, µ) behaves under repeated applications of the transformation T . If we
wish to investigate the behaviour of a single point x ∈ X we define its orbit {T n x : n ∈ N} and
study this set. Even in the simple case of circle rotation we can make non-trivial conclusions
relating to the qualitative properties of the dynamics.
The following result is adapted from ([21], Chapter 3).
Proposition 3.1.
1. If τ is rational then the orbit of any point x ∈ T under Rτ is a finite subset of T.
⁴ There is an abundance of examples of measure preserving systems to be found within the literature. For example, see [19], which gives a much better idea of the scope of ergodic theory.
⁵ Recall, Λ is a d-system on T if:
1. T ∈ Λ
2. For all A, B ∈ Λ such that A ⊆ B we have B \ A ∈ Λ
3. If (An)n∈N ⊆ Λ is an increasing sequence of sets, that is A1 ⊆ A2 ⊆ . . . , then ⋃_{n∈N} An ∈ Λ
All these properties are easily verified in this case.
2. If τ is irrational then the orbit of any point x ∈ T under Rτ is a dense subset of T.
Proof. Let x ∈ T be given and suppose τ = p/q is rational. Then Rτ^q is the identity map on T and it follows immediately that the orbit {Rτ^n x : n ∈ N} is finite.
Now consider the case τ is irrational. First we show that the points Rτ^n x and Rτ^m x are distinct for n ≠ m. For suppose m, n ∈ N are such that Rτ^m x = Rτ^n x. Then Rτ^{n−m} x = x and it follows that x + (n − m)τ = x mod 1, so (n − m)τ = k for some k ∈ Z. Hence, as τ is irrational we must have n = m.
Let d : T × T → [0, ∞) denote the metric

d(x, y) = min{|x − y|, 1 − |x − y|}

It is easy to verify that d is an Rτ-invariant metric and the quotient topology on T induced from R is precisely the metric topology on T given by d. Furthermore, if d(x, y) < 1/2 then d(x, y) = |x − y|.
Let 0 < ε < 1/2 be given. By the Bolzano–Weierstrass property the sequence (Rτ^n x)n∈N must have some convergent subsequence. In particular, there must be some l, k ∈ N such that |Rτ^l x − Rτ^k x| = d(Rτ^{l−k} x, x) < ε. Consider the set

{Rτ^{(l−k)n} x : n ∈ N}

We claim this set is ε-dense in T. Indeed, for consecutive points Rτ^{(l−k)n} x and Rτ^{(l−k)(n+1)} x we have

0 < d(Rτ^{(l−k)n} x, Rτ^{(l−k)(n+1)} x) = d(Rτ^{l−k} x, x) < ε

and the claim follows immediately. As 0 < ε < 1/2 was arbitrary it follows {Rτ^n x : n ∈ N} is a dense subset of T.
Example 3 (Bernoulli Shifts). The reader is assumed to have some familiarity with this archetypal probability space and thus we proceed rapidly. For further details see [5]. Consider the finite probability space on [n] = {1, . . . , n} where the elements are given probabilities p1, . . . , pn respectively, where each pi ≥ 0 with Σ_{i=1}^{n} pi = 1. Explicitly, we define a probability measure µ0 on the set [n] by

µ0({i}) = pi    for all i = 1, . . . , n

Consider the set X = [n]^Z of two-sided sequences in [n],

X := {(ωl)l∈Z : ωl ∈ [n] for all l ∈ Z}

Let B be the σ-algebra generated by the projections πi : ω ↦ ωi for each i ∈ Z and µ the product measure µ = ∏_Z µ0. Then (X, B, µ) is a probability space.
Define a cylinder set to be any subset of X containing precisely the sequences which agree on a finite number of fixed entries. Explicitly, A ∈ B is a cylinder set if there exists a finite co-ordinate set I ⊂ Z, and a : I → [n] such that

A = {(ωl)l∈Z ∈ X : ωi = a(i) for all i ∈ I}

Note that the measure of A is given by

µ(A) = ∏_{i∈I} p_{a(i)}

If we denote by G the collection of all cylinder sets, it is not difficult to deduce that G is an algebra⁶ and the σ-algebra B is generated by G.
⁶ Recall, a set system G ⊆ P(X) is an algebra if:
1. X ∈ G
2. If A ∈ G then Aᶜ ∈ G
3. If A, B ∈ G then A ∪ B ∈ G
There is a simple and natural measure preserving transformation on (X, B, µ). Define the left shift transformation T : X → X by

T : (ωl)l∈Z ↦ (ωl+1)l∈Z

We claim T is measurable and measure preserving. As in the previous example it suffices only to check that T^{−1}(A) ∈ B and µ(T^{−1}A) = µ(A) for all A ∈ G, and both these conditions are obvious.
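That the left shift preserves the measure of cylinder sets can be seen by simulation. The Python sketch below (ad hoc names, not part of the original essay) samples i.i.d. coordinates of a µ-random sequence and estimates the measures of the cylinder set A = {ω : ω_0 = 1} and of its preimage T^{−1}A = {ω : ω_1 = 1}; both estimates approximate p_1.

```python
# Sketch (ad hoc): Monte Carlo check that the left shift preserves the
# Bernoulli measure of a cylinder set. A = {ω : ω_0 = 1}, so
# T^{-1}A = {ω : ω_1 = 1}; both have measure p_1.
import random

random.seed(1)
p = [0.2, 0.5, 0.3]       # probabilities p_1, p_2, p_3 on [3] = {1, 2, 3}
symbols = [1, 2, 3]

trials = 50_000
count_A = 0
count_preA = 0
for _ in range(trials):
    w0, w1 = random.choices(symbols, weights=p, k=2)  # coordinates ω_0, ω_1
    count_A += (w0 == 1)     # ω ∈ A
    count_preA += (w1 == 1)  # ω ∈ T^{-1}A
print(count_A / trials, count_preA / trials)  # both near p_1 = 0.2
```

Since the coordinates are i.i.d. with law µ0, both frequencies converge to p_1, in agreement with µ(T^{−1}A) = µ(A).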
3.1.2 Measure Preserving Transformations and L2
Throughout this section, unless otherwise stated, we assume (X, B, µ, T ) is an arbitrary measure
preserving system. It is often fruitful to study (X, B, µ, T ) by considering the associated Hilbert
Space L2 (X, B, µ), exploiting its additional linear structure. We will briefly discuss some basic
properties of T in relation to L2 (X, B, µ).
Definition 5. The measure preserving transformation T induces a linear map

UT : L2(X, B, µ) → L2(X, B, µ),    UT : f ↦ f ◦ T

called the associated operator [5]. We will adopt the convention T f = UT f and hence, depending on the context, T denotes a map on either X or L2(X, B, µ).
Proposition 3.2. Let f ∈ L1(X, B, µ). Then

∫ T f dµ = ∫ f dµ
Proof. As T is measure preserving the proposition holds for all characteristic functions. Explicitly, if A ∈ B then

∫ T 1_A dµ = ∫ 1_{T^{−1}A} dµ = µ(T^{−1}A) = µ(A) = ∫ 1_A dµ

Thus, by the linearity of the integral the proposition holds for all simple functions φ ∈ L1(X, B, µ). For any non-negative g ∈ L1(X, B, µ) there exists an increasing sequence of simple functions (φn)n∈N tending pointwise to g. Note (T φn)n∈N is an increasing sequence of simple functions tending pointwise to T g. By monotone convergence

∫ T g dµ = lim_{n→∞} ∫ T φn dµ = lim_{n→∞} ∫ φn dµ = ∫ g dµ

The general case f ∈ L1(X, B, µ) is then obtained by writing f = f₊ − f₋ where f₊, f₋ ∈ L1(X, B, µ) are non-negative.
An immediate implication of Proposition 3.2 is that the associated operator T : L2(X, B, µ) → L2(X, B, µ) is an isometry. Furthermore, if (X, B, µ, T ) is an invertible system then T is unitary.
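Proposition 3.2 can be checked numerically for the circle rotation of Example 2. The sketch below (ad hoc, not part of the original essay) approximates ∫ f dµ and ∫ f∘Rτ dµ on T by Riemann sums over a uniform grid; the two values agree up to discretisation error.

```python
# Sketch: for the rotation R_τ of Example 2, ∫ f∘R_τ dµ = ∫ f dµ.
# Both integrals are approximated by midpoint Riemann sums (ad hoc names).
import math

tau = math.sqrt(3) % 1

def f(x):
    # an arbitrary bounded measurable test function on T = [0, 1)
    return math.sin(2 * math.pi * x) ** 2 + x

N = 100_000
grid = [(i + 0.5) / N for i in range(N)]
int_f = sum(f(x) for x in grid) / N
int_fT = sum(f((x + tau) % 1) for x in grid) / N
print(abs(int_f - int_fT))  # ≈ 0
```

The shifted grid {(x + τ) mod 1} is again equispaced with the same spacing, so the two Riemann sums differ only by O(1/N).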
3.2 Poincaré Recurrence
Much of this essay is concerned with notions of ‘recurrence’ in measure preserving systems and
here we introduce the simplest of these: that of Poincaré Recurrence. Recall, given a system
(X, B, µ, T ) and x ∈ X we define the orbit of x ∈ X as the set {T n x : n ∈ N}. Assuming the
reader has some familiarity with the basic premise of the study of dynamical systems, it is
natural (or at least convenient for discursive purposes) to imagine x as the ‘initial state’ of a
system and the T n x as describing the evolution of the system over discrete units of time. An
obvious question is whether an orbit will return to its initial state (or something close to it) at
some point in the future. That is, given some ‘neighbourhood’ A of x does there exist some
n ∈ N such that T n x ∈ A? In the measure-theoretic context it is natural that our neighbourhood
should be a measurable set with µ(A) > 0. The Poincaré Recurrence Theorem states that for
any finite⁷ measure preserving system (X, B, µ, T ), given a measurable set A ∈ B the orbits
⁷ We say the system (X, B, µ, T ) is finite if µ(X) < ∞.
of almost all the points x ∈ A return to A infinitely many times. The statement and proof of
this theorem is adapted from ([5] Chapter 2; [19] Chapter 2).
Theorem 3.3 (Poincaré Recurrence Theorem). Let (X, B, µ, T ) be a finite measure preserving
system. For any A ∈ B, for almost every x ∈ A there exists a strictly increasing sequence
(nk )k∈N ⊆ N such that T nk x ∈ A for all k ∈ N.
Proof. Consider the set E := {x ∈ A : T^n x ∉ A for all n ∈ N≥1}. Alternatively,

E = ⋂_{n=1}^{∞} (A \ T^{−n}A)
Clearly E is measurable. It is easy to see the sets E, T^{−1}E, T^{−2}E, . . . are pairwise disjoint and, as T is measure preserving, they all have measure µ(E). Note that

Σ_{n=1}^{∞} µ(E) = µ(⋃_{n=1}^{∞} T^{−n}E) ≤ µ(X) < ∞
Hence µ(E) = 0 and if we define F1 = A\E then µ(F1 ) = µ(A) and for all x ∈ F1 there exists an
n ∈ N≥1 such that T n x ∈ A. Note that T 2 , T 3 , . . . define measure preserving transformations on
(X, B, µ) and we may repeat the above construction to give F2 , F3 , · · · ⊆ A each with measure
µ(A) and every point in Fk returning to A under the transformation T k . Finally, define
F := ⋂_{k=1}^{∞} Fk
then µ(F ) = µ(A) and every x ∈ F returns to A under T infinitely many times.
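The conclusion of the Poincaré Recurrence Theorem is easy to observe for a concrete system. The sketch below (ad hoc, not part of the original essay) follows one orbit of the measure preserving rotation by √2 on the circle and records its many returns to the set A = [0, 0.1).

```python
# Sketch (ad hoc names): recurrence under the circle rotation by sqrt(2).
# A point of A = [0, 0.1) returns to A again and again along its orbit.
import math

tau = math.sqrt(2) % 1
x = 0.05  # a point of A = [0, 0.1)
returns = [n for n in range(1, 2000) if (x + n * tau) % 1 < 0.1]
print(len(returns) > 10)  # many returns: True
```

By equidistribution the orbit visits A with frequency roughly µ(A) = 0.1, so around 200 of the first 2000 iterates land in A.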
We now consider the orbit of an entire measurable set A with µ(A) > 0 rather than just
individual points.
Corollary 3.4. Let (X, B, µ, T ) be a finite measure preserving system and A ∈ B with µ(A) > 0. Then there exists some n ∈ N such that

µ(T^{−n}A ∩ A) > 0    (3.1)

Proof. For each j ∈ N≥1 consider the set T^{−j}A ∩ A = {x ∈ A : T^j x ∈ A}. Then by the Poincaré Recurrence Theorem

µ(⋃_{j=1}^{∞} T^{−j}A ∩ A) = µ(A) > 0

and so there must exist some n ∈ N such that µ(T^{−n}A ∩ A) > 0 as required.
The main body of this essay is dedicated to proving a remarkable generalisation of the version
of the Poincaré Recurrence Theorem stated in Corollary 3.4. Later we will investigate a much
stronger notion of recurrence than that described in (3.1) known as multiple recurrence. The
analogous result to Corollary 3.4 for multiple recurrence is precisely what is required in order
to prove Szemerédi’s Theorem. This connection will be explained in detail in Section 4.
Remark 2. It is clear from the proof of the Poincaré Recurrence Theorem that the measure
space in question must be finite⁸. As we will be considering a generalisation of this result, we
will henceforth assume (tacitly) all measure preserving systems are finite, and in particular are
probability systems.
⁸ It is not difficult to conceive of counter-examples in an infinite setting; for instance, consider the shift map x ↦ x + 1 on R with Lebesgue measure.
3.2.1 Ergodicity
A key notion in the study of measure preserving systems is that of ergodicity.
Definition 6. Let (X, B, µ, T ) be a measure preserving system. A set A ∈ B is T -invariant if
T −1 A = A. The collection of all T -invariant sets forms a σ-algebra, which we denote ET .
Definition 7. A measure preserving transformation T on (X, B, µ) is ergodic if the σ-algebra
ET contains only null and conull sets.
Remark 3. In some instances it is convenient to change the perspective: if (X, B) is a measurable space and T : X → X a measurable function we say that a T -invariant measure µ is an
ergodic measure for T if T is ergodic as a measure preserving transformation on (X, B, µ).
We characterise ergodicity in two (equivalent) ways:
• Immediately from the definition we deduce that ergodicity is an indecomposability condition. If A ∈ B is a T-invariant set with 0 < µ(A) < 1, then so too is B = Aᶜ. For any x ∈ A (resp. B) the orbit of x never leaves A (resp. B) and hence, for all intents and purposes, A and B may be considered⁹ separate dynamical systems. When T is ergodic every T-invariant set has either measure 0 or measure 1: it is only possible to decompose the system into a portion which is essentially the whole system and a portion which is negligible.
• Ergodicity is precisely the condition that for almost all x ∈ X the orbit {T^n x : n ∈ N} visits almost all of the space. This will be explained in detail below (Proposition 3.6). In
particular, we will show that T is an ergodic transformation if and only if for all A, B ∈ B
with µ(A), µ(B) > 0 there exists some n ∈ N such that
µ(T −n A ∩ B) > 0
Note that this condition is much stronger than the similar-looking statement of Poincaré
Recurrence, given in (3.1).
In order to explain these characterisations fully, first we note that the definition of ergodicity
is equivalent to a similar statement concerning ‘almost invariant’ rather than invariant sets. The
proof of this lemma is adapted from ([5], Chapter 2).
Lemma 3.5. A system (X, B, µ, T ) is ergodic if and only if for all A ∈ B such that µ(A △ T^{−1}A) = 0 either µ(A) = 0 or µ(A) = 1.
Proof. If the condition holds then it is obvious that the system is ergodic. For the converse, suppose (X, B, µ, T ) is ergodic and A ∈ B satisfies µ(A △ T^{−1}A) = 0. It suffices to show that there exists a T-invariant set B such that µ(B) = µ(A). Define B as the measurable set

B = ⋂_{N=0}^{∞} ⋃_{n=N}^{∞} T^{−n}A

First we show that B is T-invariant. Let B_N = ⋃_{n=N}^{∞} T^{−n}A. Then B₀ ⊇ B₁ ⊇ B₂ ⊇ . . . and so ⋂_{N=0}^{∞} B_N = ⋂_{N=k}^{∞} B_N for all k ∈ N. In particular,

T^{−1}B = ⋂_{N=0}^{∞} ⋃_{n=N}^{∞} T^{−(n+1)}A = ⋂_{N=1}^{∞} ⋃_{n=N}^{∞} T^{−n}A = B

Hence B is T-invariant. To see that µ(B) = µ(A) note that for all N ∈ N,

µ(A △ B_N) ≤ µ(⋃_{n=N}^{∞} A △ T^{−n}A) ≤ Σ_{n=N}^{∞} µ(A △ T^{−n}A) = 0

where we have used the fact that µ(A △ T^{−n}A) = 0 for all n ∈ N, which follows immediately from

A △ T^{−n}A ⊆ ⋃_{j=0}^{n−1} T^{−j}A △ T^{−(j+1)}A

As µ(A △ B_N) = 0 for all N ∈ N and B = ⋂_{N=0}^{∞} B_N, it follows µ(A △ B) = 0 and so µ(A) = µ(B) as required.

⁹ By restricting B, µ and T to these sets.
We are now in a position to elucidate precisely the second characterisation of ergodicity.
Proposition 3.6. Let (X, B, µ, T ) be a measure preserving system. The following are equivalent:
1. The system is ergodic.
2. For any A ∈ B with µ(A) > 0 we have

µ(⋃_{n=0}^{∞} T^{−n}A) = 1    (3.2)

3. For any A, B ∈ B with µ(A), µ(B) > 0 there exists some n₀ ∈ N such that

µ(T^{−n₀}A ∩ B) > 0    (3.3)
Remark 4. The behaviour of the orbits of an ergodic system can be understood thus: (3.2)
tells us that for any given set A of positive measure, for almost every x ∈ X the orbit of x will
at some point pass through A. From this it is easy to deduce (3.3): given another set B of
positive measure, at some point the orbits of a significant proportion of the elements of B will
pass through A. It is in this sense that ergodicity is the condition that ‘almost all of the orbits
visit almost all of the space’.
Proof (of Proposition 3.6). (1 ⇒ 2). Let A ∈ B with µ(A) > 0 and consider the set

B = ⋃_{j=0}^{∞} T^{−j}A

Then T^{−1}B ⊆ B and µ(T^{−1}B) = µ(B) so we must have µ(B △ T^{−1}B) = 0. By Lemma 3.5 and noting A ⊆ B it follows µ(B) = 1.

(2 ⇒ 3). Let A, B ∈ B with µ(A), µ(B) > 0 and note

0 < µ(B) = µ(B ∩ ⋃_{j=1}^{∞} T^{−j}A) ≤ Σ_{j=1}^{∞} µ(B ∩ T^{−j}A)

Thus µ(B ∩ T^{−n}A) > 0 for some n ∈ N.

(3 ⇒ 1). Suppose A ∈ B is T-invariant. Then for all j ∈ N we have µ(T^{−j}A ∩ Aᶜ) = 0 and thus either µ(A) = 0 or µ(Aᶜ) = 0 as required.
Finally, we note an easily deduced equivalent definition of ergodicity, described in terms of invariant functions rather than invariant sets.

Definition 8. Let (X, B, µ, T ) be a measure preserving space. We say f ∈ L1(X, B, µ) is T-invariant if f = T f. Explicitly, f is T-invariant if for almost all x ∈ X we have f(x) = f(T x).
Proposition 3.7. Let (X, B, µ, T ) be a measure preserving space. Then T is ergodic if and
only if every T -invariant function is constant.
Proof. Suppose all T -invariant functions are constant. Given an invariant set A ∈ B it is easy
to see that the characteristic function 1A is T -invariant. Hence either 1A (x) = 0 or 1A (x) = 1
almost everywhere and so T is ergodic.
Conversely, suppose T is ergodic and f ∈ L1 (X, B, µ) is T -invariant. Let a ∈ R and define
the set A(a) := {x ∈ X : f (x) < a}. Then A(a) is measurable and
T −1 A(a) = {x ∈ X : T f (x) < a} = {x ∈ X : f (x) < a} = A(a)
For each a ∈ R the set A(a) is T -invariant and so by ergodicity µ(A(a)) = 0 or µ(A(a)) = 1.
Taking
a0 = sup{a ∈ R : µ(A(a)) = 0}
it is easy to see f (x) = a0 for almost all x ∈ X.
3.2.2 Examples of Ergodicity
In order to prove the ergodicity of certain systems we will utilise the following approximation
argument.
Lemma 3.8. Let (X, B, µ) be a probability space where B = σ(G) is generated by an algebra G. Then for any ε > 0 and any A ∈ B there exists an A1 ∈ G such that

µ(A △ A1) < ε    (3.4)

In particular, we may take A1 ∈ G to be such that A ⊆ A1 and A1 = ⋃_{i=1}^{n} Bi for some B1, . . . , Bn ∈ G disjoint.
Proof. The proof relies on the notion of outer measure, as usually defined in the proof of Carathéodory’s Extension Theorem (see [14] Chapter 1; [4] Chapter 4). In particular, for A ∈ B define

U_A := {F ⊆ G : F is countable; A ⊆ ⋃_{F∈F} F}

Then the outer measure µ∗(A) of A is the value

µ∗(A) := inf_{F∈U_A} Σ_{F∈F} µ(F)

It is well known that the outer measure and measure of A ∈ B coincide. Hence given ε > 0 there exists a sequence (Bi)i∈N ⊆ G covering A such that

µ(A) ≥ Σ_{i=1}^{∞} µ(Bi) − ε/2    (3.5)

As the left hand side of this inequality is finite, the sum on the right hand side must converge. Let n ∈ N be such that

Σ_{i=n+1}^{∞} µ(Bi) < ε/2    (3.6)

and define

A1 = ⋃_{i=1}^{n} Bi ∈ G,    A2 = ⋃_{i=n+1}^{∞} Bi ∈ B

Note that without loss of generality we may assume the B1, . . . , Bn are disjoint. To see the choice of A1 satisfies (3.4), note that

A △ A1 ⊆ (A1 \ A) ∪ (A \ (A1 ∪ A2)) ∪ A2 ⊆ ((A1 ∪ A2) △ A) ∪ A2

Hence, using the bounds stated in (3.5) and (3.6) we have

µ(A △ A1) ≤ µ((A1 ∪ A2) △ A) + µ(A2) < ε

as required.
We now apply Lemma 3.8 to our two examples of measure preserving systems to see they
are both ergodic. The proof of the next proposition is adapted from ([21], Chapter 3).
Proposition 3.9. Irrational rotations of the circle (c.f. Example 2) are ergodic.
Proof. We use the characterisation (3) of ergodicity from Proposition 3.6. Let A, B ⊆ T be
measurable subsets with µ(A), µ(B) > 0.
Claim. There exist intervals I = [a, b), J = [c, d) ⊆ T such that

µ(A ∩ I) > (3/4) µ(I),    µ(B ∩ J) > (3/4) µ(J)    (3.7)
The collection of finite unions of (disjoint) intervals of the form [x, y) ⊆ T is an algebra which we denote G. Note that G generates the Borel σ-algebra on T. Thus, by Lemma 3.8 there exists a sequence of disjoint intervals I1, . . . , IN such that, denoting A1 = ⋃_{i=1}^{N} Ii, we have

A ⊆ A1    and    µ(A1 \ A) < (1/7) µ(A)

Now,

µ(A ∩ A1) = µ(A1) − µ(A1 \ A) > (6/7) µ(A)    (3.8)

On the other hand,

µ(A1) = µ(A1 \ A) + µ(A) < (8/7) µ(A)    (3.9)

We claim that for some i0 ∈ {1, . . . , N} we have µ(A ∩ I_{i0}) > (3/4) µ(I_{i0}). Once this is shown, by setting I = I_{i0} and repeating the argument for the set B to obtain J, the claim is proven. Suppose not; then for all i ∈ {1, . . . , N} we have

µ(A ∩ Ii) ≤ (3/4) µ(Ii)

Thus, by applying (3.9) we have µ(A ∩ A1) ≤ (3/4) µ(A1) < (6/7) µ(A). But this contradicts (3.8).
Now we have the claim, the result follows easily. Let I = [a, b) and J = [c, d) satisfy (3.7). Without loss of generality µ(I) ≥ µ(J). Note that we may also assume (1/2) µ(I) ≤ µ(J). Indeed, otherwise we can partition I into two intervals I1 = [a, (a + b)/2) and I2 = [(a + b)/2, b). At least one half Ii must be such that µ(Ii ∩ A) > (3/4) µ(Ii) and so we rename the interval Ii as I and repeat the halving process until (1/2) µ(I) ≤ µ(J). By Proposition 3.1, the orbit of d ∈ T is dense in T and so there exists some n ∈ N such that

(b + a)/2 < T^n d < b

It follows that µ(T^{−n}I ∩ J) ≥ (1/2) µ(I). Finally

µ(T^{−n}A ∩ B) ≥ µ(T^{−n}I ∩ J) − µ(I \ A) − µ(J \ B) > (1/2) µ(I) − (1/4) µ(I) − (1/4) µ(J) > 0

as required.
The proof of the next proposition is adapted from ([5], Chapter 2).
Proposition 3.10. Bernoulli Shifts (c.f. Example 3) are ergodic.
Proof. Let (X, B, µ, T ) denote the Bernoulli system where X = [n]^Z. As in the previous proposition, we show that the characterisation (3) of ergodicity from Proposition 3.6 holds for this system. Moreover, we show the stronger property that for all A, B ∈ B we have

lim_{n→∞} |µ(T^{−n}A ∩ B) − µ(A)µ(B)| = 0    (3.10)

Thus, considering the case µ(A), µ(B) > 0, ergodicity follows directly.

Let A, B ∈ B and ε > 0 be given. By Lemma 3.8 there exist cylinder sets A1, B1 such that A ⊆ A1, B ⊆ B1 and

µ(A1 △ A) < ε/4    and    µ(B1 △ B) < ε/4
Let IA1 , IB1 ⊂ Z be the finite co-ordinate sets for A1 and B1 respectively and a : IA1 → [n] and
b : IB1 → [n] be such that
A1 =
(ωl )l∈Z ∈ X : ωi = a(i) for all i ∈ IA1
B1 =
(ωl )l∈Z ∈ X : ωi = b(j) for all j ∈ IB1
Choose any n > max_{i∈I_{A1}} |i| + max_{j∈I_{B1}} |j| so that the co-ordinate sets I_{T^{-n}A1} = {i − n : i ∈ I_{A1}} and I_{B1} are disjoint. Then

    T^{-n} A1 = {(ω_l)_{l∈Z} ∈ X : ω_i = a(i + n) for all i ∈ I_{T^{-n}A1}}

is a cylinder set and it follows that the sets T^{-n} A1 and B1 are independent. Thus

    µ(T^{-n} A1 ∩ B1) = µ(A1)µ(B1)    (3.11)

Also note that

    µ(T^{-n} A ∩ B) ≤ µ(T^{-n} A1 ∩ B1) ≤ µ(T^{-n} A ∩ B) + µ(A1 \ A) + µ(B1 \ B)    (3.12)
Hence, by applying the triangle inequality and (3.11) and (3.12) we have

    |µ(T^{-n} A ∩ B) − µ(A)µ(B)| ≤ |µ(T^{-n} A1 ∩ B1) − µ(T^{-n} A ∩ B)| + µ(A1)|µ(B1) − µ(B)| + |µ(A1) − µ(A)|µ(B)
                                 ≤ 2µ(A1 \ A) + 2µ(B1 \ B) < ε

as required.
Remark 5. The property of the Bernoulli system described in (3.10) is known as mixing. This
is an important concept and later in this discourse we will spend some time investigating the
notion of mixing in its own right.
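The mixing property (3.10) is easy to probe numerically. The following Python sketch (not part of the original argument; the particular cylinder sets A and B are an arbitrary, illustrative choice) estimates µ(T^{-n}A ∩ B) for the Bernoulli(1/2, 1/2) shift by Monte Carlo sampling. Once n exceeds the diameter of the co-ordinate sets, the shifted cylinders are exactly independent.

```python
import random

random.seed(0)

def estimate(n, trials=100_000):
    # Monte Carlo estimate of mu(T^{-n}A ∩ B) for the Bernoulli(1/2, 1/2) shift,
    # with the illustrative cylinder sets
    #   A = {w : w_0 = 1}  and  B = {w : w_0 = 1, w_1 = 0},
    # so T^{-n}A = {w : w_n = 1} and only co-ordinates 0..n matter.
    hits = 0
    for _ in range(trials):
        w = [random.randint(0, 1) for _ in range(n + 1)]
        if w[n] == 1 and w[0] == 1 and w[1] == 0:
            hits += 1
    return hits / trials

# mu(A)mu(B) = (1/2)(1/4) = 0.125; for n >= 2 the co-ordinate sets are disjoint,
# so the cylinders are exactly independent (n = 1 forces w_1 = 1 and w_1 = 0, giving 0)
for n in [1, 2, 5, 10]:
    print(n, round(estimate(n), 3))
```

The n = 1 case illustrates that mixing is an asymptotic statement: for small n the shifted cylinders may be far from independent.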
3.3 The Mean Ergodic Theorem
For our purposes, the Mean Ergodic Theorem will provide an invaluable tool in the study of
measure preserving systems: in particular it gives a useful equivalent definition of ergodicity
(see Corollaries 3.12 and 3.13) and also serves as a partial motivation for the study of mixing
in Section 6. Having said that, the result is certainly of interest in its own right and it would
be negligent to omit some explanation of the intuition behind it.
Given a (finite) measure preserving system (X, B, µ, T ) and function f ∈ L2 (X, B, µ) there
are two canonical definitions of the average value of f . The first is the ‘space’ average f¯, whereby
we take the average value of f over all possible states:
    f̄ := (1/µ(X)) ∫ f dµ    (3.13)
Of course, if (X, B, µ) is a probability space this is merely the expected value of f . The second
notion of averaging is the ‘time’ average. The time average (provided it exists) is a function
fˆ ∈ L1 (X, B, µ) where fˆ(x) is defined almost everywhere as the average value of f over the
orbit {T^n x : n ∈ N}:

    f̂ := lim_{N→∞} (1/N) Σ_{n=0}^{N-1} T^n f    (3.14)
An important question originally motivated by statistical mechanics [21] which we will now
partially examine asks under what conditions the two averages agree. Perhaps it is reasonable
to hypothesise that if the orbits of the system penetrated almost every facet of the state space, as
in the case of T ergodic, then (3.13) and (3.14) are equal. The classical Ergodic Theorems answer
questions relating to the existence of (3.14) and detail precisely the relationship between the
values (3.13) and (3.14). As the name suggests, the Mean Ergodic Theorem shows convergence of (1/N) Σ_{n=0}^{N-1} T^n f in the L²(X, B, µ) sense. In particular, if the system is ergodic we have

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n f − f̄‖_{L²(µ)} = 0

providing an answer to our question.
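The agreement of the two averages for an ergodic system can be observed numerically. The sketch below (an illustration only; the arc [0, 0.3) and the rotation number √2 − 1 are arbitrary choices) computes the Birkhoff time average for an irrational circle rotation and compares it to the space average.

```python
import math

def birkhoff_average(f, x, tau, N):
    # time average (1/N) * sum_{n=0}^{N-1} f(T^n x) for the rotation T : x -> x + tau (mod 1)
    return sum(f((x + n * tau) % 1.0) for n in range(N)) / N

# f = indicator of the arc [0, 0.3); its space average (3.13) is 0.3
f = lambda t: 1.0 if t < 0.3 else 0.0
tau = math.sqrt(2) - 1   # irrational rotation number, so the system is ergodic

print(birkhoff_average(f, 0.1, tau, 100_000))   # ≈ 0.3, independent of the starting point
```

Changing the starting point x leaves the limit unchanged, as the theorem (in its ergodic form) predicts.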
The statement and proof of von Neumann’s Theorem are adapted from ([17], Chapter 2;
[21], Chapter 5).
Theorem 3.11 (von Neumann's Mean Ergodic Theorem). Suppose (X, B, µ, T) is a measure preserving system on a probability space and let π_I be the orthogonal projection of L²(X, B, µ) onto the closed linear subspace I of T-invariant L²(X, B, µ) functions

    I := {g ∈ L²(X, B, µ) : T g = g}

Then for any fixed f ∈ L²(X, B, µ)

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n f − π_I(f)‖_{L²(µ)} = 0
Proof. Consider the set J := {T g − g : g ∈ L²(X, B, µ)}. We claim that J^⊥ = I. Indeed, given f ∈ J^⊥ we have

    ⟨f, T g⟩ = ⟨f, g⟩  for all g ∈ L²(X, B, µ)

Hence, T*f = f where T* denotes the adjoint of T. Note that

    ‖f − T f‖²_{L²(µ)} = ‖f‖²_{L²(µ)} − ⟨f, T f⟩ − ⟨T f, f⟩ + ‖T f‖²_{L²(µ)} = 2‖f‖²_{L²(µ)} − ⟨T*f, f⟩ − ⟨f, T*f⟩ = 0

and so f = T f as required. Conversely, if f ∈ I then f = T f and thus

    ⟨f, T g − g⟩ = ⟨T f, T g⟩ − ⟨f, g⟩ = 0  for all g ∈ L²(X, B, µ)

Hence I = J^⊥ as claimed. Therefore, by orthogonal decomposition we have L²(µ) = I ⊕ clos(J) and so if we fix f ∈ L²(µ) we may write

    f − π_I(f) = h  where h ∈ clos(J)
To conclude the proof we show that

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n h‖_{L²(µ)} = 0
Indeed, given ε > 0, as h ∈ clos(J) there exists h0 ∈ J such that ‖h − h0‖_{L²(µ)} < ε/2. Furthermore, there exists g0 ∈ L²(X, B, µ) such that h0 = T g0 − g0 and we have

    ‖(1/N) Σ_{n=0}^{N-1} T^n (T g0 − g0)‖_{L²(µ)} = (1/N) ‖T^N g0 − g0‖_{L²(µ)} ≤ 2‖g0‖_{L²(µ)} / N
and thus there exists N0 ∈ N such that

    ‖(1/N) Σ_{n=0}^{N-1} T^n h0‖_{L²(µ)} ≤ ε/2  for all N ≥ N0
Finally, it follows that for any N ≥ N0 we have

    ‖(1/N) Σ_{n=0}^{N-1} T^n h‖_{L²(µ)} ≤ (1/N) Σ_{n=0}^{N-1} ‖T^n (h − h0)‖_{L²(µ)} + ‖(1/N) Σ_{n=0}^{N-1} T^n h0‖_{L²(µ)} < ε
Remark 6. Considering the case (X, B, µ, T) is an ergodic system, for any f ∈ L²(X, B, µ) the function π_I(f) as defined above is T-invariant and thus, by Proposition 3.7, constant. It is then easy to deduce π_I(f) = ∫ f dµ, as we will see in the proof of the following corollary.
A corollary to the Mean Ergodic Theorem is the following equivalent definition of ergodicity.
Corollary 3.12. A measure preserving system (X, B, µ, T) is ergodic if and only if for every pair f, g ∈ L²(X, B, µ) the following holds

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| = 0    (3.15)
Proof. Suppose (X, B, µ, T) is ergodic and fix f ∈ L²(X, B, µ). By the Mean Ergodic Theorem

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n f − π_I(f)‖_{L²(µ)} = 0

where π_I(f) is some T-invariant function. By ergodicity π_I(f) = c1 for some constant c. As norm convergence implies weak convergence, it follows that for all g ∈ L²(X, B, µ)

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − c⟨1, g⟩| = 0

In particular, taking g = 1 and noting ⟨T^n f, 1⟩ = ⟨T^n f, T^n 1⟩ = ⟨f, 1⟩ gives c = ⟨f, 1⟩.
Conversely, suppose (3.15) holds and f ∈ L²(X, B, µ) is T-invariant. Then for all g ∈ L²(X, B, µ) we have

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| = 0

On the other hand, as T^n f = f for all n, for all N ∈ N,

    (1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩ = ⟨f − ⟨f, 1⟩1, g⟩

Thus ⟨f − ⟨f, 1⟩1, g⟩ = 0 for all g ∈ L²(X, B, µ) and so by elementary properties of the inner product we must have f ≡ ⟨f, 1⟩1 is constant.
We can weaken the statement of Corollary 3.12.

Corollary 3.13. A measure preserving system (X, B, µ, T) is ergodic if and only if there exists a dense subset H ⊆ L²(X, B, µ) such that for every pair φ, ψ ∈ H the following holds

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n φ, ψ⟩ − ⟨φ, 1⟩⟨1, ψ⟩| = 0    (3.16)
Proof. By Corollary 3.12 it suffices to show that if (3.16) holds for all φ, ψ ∈ H, then it holds for all f, g ∈ L²(X, B, µ).
Let ε > 0 be given and f, g ∈ L²(X, B, µ) with f, g ≠ 0. Then there exist φ, ψ ∈ H such that

    ‖g − ψ‖₂ < ε / (4‖f‖₂)  and  ‖f − φ‖₂ < ε / (8‖ψ‖₂)
Now, for each N ∈ N

    |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, ψ⟩ − ⟨f, 1⟩⟨1, ψ⟩| ≤ |(1/N) Σ_{n=0}^{N-1} ⟨T^n (f − φ), ψ⟩ − ⟨f − φ, 1⟩⟨1, ψ⟩|
                                                   + |(1/N) Σ_{n=0}^{N-1} ⟨T^n φ, ψ⟩ − ⟨φ, 1⟩⟨1, ψ⟩|    (3.17)
Note that as (X, B, µ) is a probability space, |⟨f − φ, 1⟩| ≤ ‖f − φ‖₂. Hence, by the triangle and Cauchy–Schwarz inequalities,

    |⟨T^n (f − φ), ψ⟩ − ⟨f − φ, 1⟩⟨1, ψ⟩| ≤ 2‖f − φ‖₂‖ψ‖₂ < ε/4    (3.18)
By assumption (3.16) holds for φ, ψ ∈ H and thus by combining (3.17) and (3.18) there exists some N0 ∈ N such that

    |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, ψ⟩ − ⟨f, 1⟩⟨1, ψ⟩| < ε/2  for all N > N0    (3.19)
By what amounts to the same procedure, this time estimating

    |⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| ≤ |⟨T^n f, g − ψ⟩ − ⟨f, 1⟩⟨1, g − ψ⟩| + |⟨T^n f, ψ⟩ − ⟨f, 1⟩⟨1, ψ⟩|

and then applying (3.19), we conclude

    |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| < ε  for all N > N1

for some N1 ∈ N, as required.
3.4 Borel Probability Spaces
We conclude this lengthy introductory section with a brief discussion of Borel probability spaces
and some of their interesting properties.
Definition 9. Suppose (X, B, µ) is a measure space where X is a compact metric space, B is
the Borel σ-algebra generated by the metric topology on X and µ a (Borel) probability measure.
Then we say (X, B, µ) is a Borel probability space.
Example 4. Note that all our previous examples of measure preserving systems are defined on Borel probability spaces. In particular, the state space X = [n]^Z in the Bernoulli system (X, B, µ, T) is a compact space by Tychonoff's Theorem. Define the δ metric on [n] by

    δ(a, b) = 0 if a = b, 1 otherwise,  for all a, b ∈ [n]

Then let d : X × X → [0, 1] be given by

    d(α, β) = (1/3) Σ_{k∈Z} δ(α_k, β_k) / 2^{|k|}  for all α = (α_l)_{l∈Z}, β = (β_l)_{l∈Z} ∈ X

Then d is a metric on X and it is easy to see that the product topology on X corresponds precisely to the metric topology induced by d.
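A truncated version of this metric is easy to implement. The sketch below (illustrative only; the tail of the sum is cut off at |k| ≤ K, and sequences are represented as functions Z → [n]) evaluates d on some simple points of X.

```python
def d(alpha, beta, K=50):
    # truncation of d(α, β) = (1/3) Σ_{k ∈ Z} δ(α_k, β_k) / 2^{|k|} to |k| <= K;
    # alpha, beta are functions Z -> {0, ..., n-1} standing in for points of [n]^Z
    total = sum(1.0 / 2 ** abs(k) for k in range(-K, K + 1) if alpha(k) != beta(k))
    return total / 3

zero = lambda k: 0                      # the constant-0 sequence
flip0 = lambda k: 1 if k == 0 else 0    # differs from it only at co-ordinate k = 0
ones = lambda k: 1                      # differs from it at every co-ordinate

print(d(zero, flip0))   # = 1/3: only the k = 0 term contributes
print(d(zero, ones))    # ≈ (1/3)(1 + 2·1) = 1, the maximum possible distance
```

Two points are close exactly when they agree on a large central block of co-ordinates, which is why this metric topology coincides with the product topology.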
There are numerous advantages in considering measure preserving systems on Borel probability spaces; they essentially constitute the class of 'well behaved' systems. Here we will highlight
some of their favourable properties. We begin with a basic result from topology which will be
used frequently throughout this study.
Proposition 3.14. Let (X, d) be a compact metric space. Then the normed space (C(X), k·k∞ )
of real-valued continuous functions on X with uniform norm is separable.
Proof. The rather elegant proof described here is adapted from ([5] Appendix B) and uses the Stone–Weierstrass Theorem (see [2]). Any compact metric space is separable so there exists
some dense countable set (xn )n∈N ⊂ X. Define the function fn : X → R by fn (x) = d(xn , x)
for each n ∈ N. Each of these functions is continuous and by the density of (xn )n∈N , the fn
separate points. If F 0 denotes the algebra generated by (fn )n∈N then it follows that F 0 is dense
in C(X) by the Stone-Weierstrass Theorem. Let F denote the algebra generated by (fn )n∈N
over Q (that is F is the smallest set containing (fn )n∈N which is closed under both (finite)
Q-linear combinations and multiplication). Then F is countable and dense in F 0 . Thus F is
a countable dense subset of C(X).
Definition 10. Let X be a compact metric space and B the Borel σ-algebra on X. Denote by
M (X) the set of Borel probability measures on (X, B).
Proposition 3.15. If (X, B, µ) is a Borel probability space then there exists a natural choice
of topology on the set M (X) such that M (X) is a compact, metrizable topological space.
Proof. The details of the proof are sketched; for a more thorough discussion of the properties
of M (X) see [18, 5].
By the Riesz Representation Theorem [18], the dual space C(X)∗ is identified with the space
M of finite signed measures on (X, B). Explicitly, for all θ ∈ C(X)∗ there exists a µθ ∈ M such
that
Z
θ(f ) = f dµθ
for all f ∈ C(X)
Each measure can be viewed as a linear functional and thus it makes sense to define the weak*-topology on M. This is the weakest topology such that for every f ∈ C(X), the linear operator λ ↦ ∫ f dλ evaluating each functional at f is continuous.
The set M (X) of Borel probability measures is then given the subspace topology induced
by the weak*-topology on M. We claim that the topological space M (X) is compact and
metrizable.
The proof that the space is metrizable follows by explicitly defining a metric on M(X). By the previous proposition (C(X), ‖ · ‖∞) is separable and so there exists some dense sequence (f_n)_{n∈N} ⊂ C(X). Define a metric d_M on M(X) by

    d_M(λ, κ) = Σ_{n=1}^{∞} (1/2^n) · |∫ f_n dλ − ∫ f_n dκ| / (1 + |∫ f_n dλ − ∫ f_n dκ|)
The topology induced by the metric d_M then corresponds precisely to the weak*-topology. Here we omit the details (for a full proof see [5], Appendix B).
To demonstrate compactness, recall the famous Alaoglu's Theorem [2], which states that the closed unit ball of the dual of a normed space is weak*-compact. Write

    M(X) = {λ ∈ M : ∫ 1_X dλ = 1, ∫ f dλ ≥ 0 for all f ∈ C(X), f ≥ 0}

It is then easy to see M(X) is a subset of the closed unit ball of M ≅ C(X)*. Furthermore, if (λ_n)_{n∈N} ⊆ M(X) is a sequence of measures which converge to λ ∈ M in the weak*-topology then

    ∫ f dλ_n → ∫ f dλ  for all f ∈ C(X)

Considering the case f ≥ 0 and f = 1_X it follows λ ∈ M(X). Therefore M(X) is a closed subspace of a compact space and so is itself compact.
The topological properties of C(X) and M(X) are of great benefit when studying measure preserving systems defined on Borel probability spaces. For this reason we will for the most part concern ourselves only with the theory of this specific class of systems. Exploiting their additional structure is crucial in our proof of Szemerédi's Theorem. As an example of this, the following subsection describes the powerful Ergodic Decomposition Theorem, which is a direct consequence of the geometric and topological structure of M(X). The Ergodic Decomposition Theorem is important to us as it allows us to relate our study of ergodic systems to more general classes of measure preserving systems. Further, in Section 9 we use the properties of C(X) and M(X) to introduce the notions of conditional expectation and measure. These have a very significant rôle in our study.
3.5 Ergodic Decomposition
So far many of our major results are applicable only to ergodic systems. We shall briefly describe
how the study of ergodic systems can be related to the study of systems in more generality.
As explained above, ergodicity is an indecomposability (or irreducibility) condition. An
obvious question is whether it is possible to decompose an arbitrary system into irreducible
components, akin to decomposing a positive integer into prime factors. A remarkable fact, and
one of the major strengths of ergodic theory, is that for the class of systems defined on Borel
probability space this is indeed possible. What we have described here is the content of the
Ergodic Decomposition Theorem, which we will state (but refrain from proving) in this section.
We present a statement of the Ergodic Decomposition Theorem, adapted from [13].
Theorem 3.16 (Ergodic Decomposition Theorem). Suppose (X, B, µ, T) is a measure preserving system on a Borel probability space. Then there exists an almost everywhere defined function x ↦ µ_x from X to the space M_T(X) ⊆ M(X) of T-invariant Borel probability measures on (X, B) such that

1. µ_x is ergodic for almost every x ∈ X.

2. For any f ∈ L¹(X, B, µ) we have f ∈ L¹(X, B, µ_x) for almost every x ∈ X and

    ∫ f dµ = ∫ (∫ f dµ_x) dµ(x)
Proof. The details of the proof are omitted. See ([5] Chapter 4, [13] Chapter 2) for a more precise
picture. The idea is to show that the space MT (X) is a compact, convex space and, moreover,
that the ergodic measures are precisely the extreme points of MT (X). The decomposition
theorem then follows by applying the Krein-Milman Theorem (see [2]) to show that any measure
µ ∈ MT (X) is a convex combination of ergodic measures.
Remark 7. Later in the discourse the idea of conditional measures is introduced. The construction of conditional measures generalises the Ergodic Decomposition Theorem, and the latter result can be obtained from this construction (although it does not follow directly).
Essentially, Ergodic Decomposition tells us that any property which is preserved under integration and holds for ergodic systems, holds for systems in general. As we shall see later in our
study, this is an important observation: it implies ergodicity is a far more reasonable condition
to assume than it may first appear.
4 Furstenberg's Correspondence Principle
In this section we will establish the connection between ergodic theory and Szemerédi’s Theorem.
In particular we will show that a ‘generalised Poincaré Recurrence Theorem’ implies Szemerédi’s
Theorem. In fact, the two results are equivalent but for the sake of brevity we will omit the
proof of the converse (although it is not too difficult to establish).
Theorem 4.1. Let (X, B, µ) be a probability space and T a measure preserving transformation on (X, B, µ). Then for any A ∈ B with µ(A) > 0 and any k ∈ N there exists an n ∈ N such that

    µ(∩_{j=0}^{k-1} T^{-jn} A) > 0    (4.1)
Remark 8. Note that the case k = 2 is given by the Poincaré Recurrence Theorem, specifically
Corollary 3.4.
Proposition 4.2. Theorem 4.1 implies Szemerédi’s Theorem.
Remark 9. Before presenting a proof of Proposition 4.2 it is remarked that the connection
described here is one instance of a more general scheme laid out by Furstenberg [7, 8] relating
problems in combinatorial number theory to recurrence in ergodic theory. The idea, as explained
in [7], is to exploit the similarities between asymptotic density in sets of integers and measure
in a probability space. In particular, the shift map S : Z → Z given by S : x 7→ x + 1 preserves
the upper Banach density of sets and thus is analogous to a measure preserving transformation.
In short, in order to prove Proposition 4.2, following the argument given in [5, 10], we
will take a set Λ ⊂ Z of positive upper Banach density and use the aforementioned analogies
to construct a measure preserving system (X, B, µ, T ), the (hypothesised) dynamics of which
reflect the structure of Λ.
Proof (of Proposition 4.2). Assuming Theorem 4.1, let Λ ⊆ Z be a subset of the integers with
positive upper Banach density and fix some k ∈ N. We will show Λ contains an arithmetic
progression of length k.
First consider the measure space (X0 , B0 ) of Bernoulli trials on two letters. Explicitly, let
X0 := {0, 1}Z be the space of all two-sided sequences in 0s and 1s given the product topology
induced from the discrete topology on {0, 1} and B0 the corresponding Borel σ-algebra. Equivalently, B0 is the σ-algebra generated by the collection of projections πi : ω 7→ ωi for all i ∈ Z.
Let T0 : X0 → X0 be the left shift transform on X0, that is

    T0 (ω_l)_{l∈Z} = (ω_{l+1})_{l∈Z}  for all (ω_l)_{l∈Z} ∈ X0

For each i ∈ Z let A_i denote the cylinder set

    A_i := {ω ∈ X0 : ω_i = 1}
Let ω_Λ ∈ X0 be the characteristic function of Λ. Now for any a ∈ Z and b ∈ N consider the length k arithmetic progression {a + jb}_{j=0}^{k-1} which starts at a and has step size b. Note that

    ∩_{j=0}^{k-1} T0^{-(a+jb)} A0 = ∩_{j=0}^{k-1} A_{a+jb}
and it follows that

    {a + jb}_{j=0}^{k-1} ⊆ Λ  ⟺  ω_Λ ∈ ∩_{j=0}^{k-1} T0^{-(a+jb)} A0
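This equivalence is purely combinatorial and can be checked mechanically. The following Python sketch (the set Λ below is an arbitrary illustration, not taken from the essay) tests membership of ω_Λ in the intersection of shifted cylinders.

```python
def omega(Lam):
    # the characteristic function of Lam ⊆ Z, i.e. the point ω_Λ of {0, 1}^Z
    return lambda i: 1 if i in Lam else 0

def in_intersection(w, a, b, k):
    # membership of w in the intersection of T0^{-(a+jb)} A0 over j = 0..k-1,
    # i.e. w_{a+jb} = 1 for every j
    return all(w(a + j * b) == 1 for j in range(k))

Lam = {1, 4, 7, 10, 12}
w = omega(Lam)

print(in_intersection(w, 1, 3, 4))   # True: the progression {1, 4, 7, 10} lies in Λ
print(in_intersection(w, 4, 4, 3))   # False: 8 ∉ Λ
```

The dynamical content of the correspondence principle lies entirely in producing an invariant measure; the translation between progressions and cylinder sets is, as above, a matter of bookkeeping.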
We are interested only in showing there exists an arithmetic progression in Λ of length k. In particular, the values of the starting point a and step size b are immaterial. Removing these conditions, it suffices to show

    ω_Λ ∈ ∪_{a=-∞}^{∞} ∪_{b=1}^{∞} ∩_{j=0}^{k-1} T0^{-(a+jb)} A0

or, equivalently, that there exists some a ∈ Z such that

    T0^a ω_Λ ∈ ∪_{b=1}^{∞} ∩_{j=0}^{k-1} T0^{-jb} A0    (4.2)
Prompted by (4.2) we restrict our attention to the (two-sided) orbit of ω_Λ under T0. Define X = clos{T0^a ω_Λ : a ∈ Z} with the subspace topology induced from X0 and let T be the restriction of T0 to X. Consider the open set A = X ∩ A0. To see Λ contains an arithmetic progression of length k it suffices to show

    ∪_{b=1}^{∞} ∩_{j=0}^{k-1} T^{-jb} A ≠ ∅

Indeed, if ω ∈ ∩_{j=0}^{k-1} T^{-jb} A for some b ≥ 1 then, as this set is open and {T^a ω_Λ : a ∈ Z} is dense in X, there must be some a ∈ Z for which (4.2) holds and we are done.
We claim, using the assumption Λ has positive upper Banach density, that we can construct a T-invariant measure µ on X with the property µ(A) > 0. Then by Theorem 4.1 there exists some b ∈ N such that

    µ(∩_{j=0}^{k-1} T^{-jb} A) > 0

and we are done.
We begin by constructing a sequence of 'almost T-invariant' measures on X. By assumption

    d̄(Λ) = lim sup_{N-M→∞} |Λ ∩ [M, N)| / (N − M) > 0

Moreover, there exists a sequence of intervals [a_n, b_n) with b_n − a_n → ∞ such that

    lim_{n→∞} |Λ ∩ [a_n, b_n)| / (b_n − a_n) = d̄(Λ) > 0    (4.3)

Define the measure

    µ_n = (1/(b_n − a_n)) Σ_{j=a_n}^{b_n−1} δ_{T^j ω_Λ}
where δ_x denotes the delta measure

    δ_x(ω) = 1 if ω = x, 0 otherwise

Then clearly each µ_n is a probability measure. Moreover, it is easy to see

    µ_n(A) = |Λ ∩ [a_n, b_n)| / (b_n − a_n)    (4.4)
Furthermore, the µ_n are 'almost T-invariant' in the sense that for all B ∈ B

    |µ_n(B) − µ_n(T^{-1} B)| = (1/(b_n − a_n)) |δ_{T^{a_n} ω_Λ}(B) − δ_{T^{b_n} ω_Λ}(B)| ≤ 2/(b_n − a_n)

so that lim_{n→∞} |µ_n(B) − µ_n(T^{-1} B)| = 0.
In Proposition 3.15 we established that M(X) with the weak*-topology is a compact, metrizable space. Hence there must exist a weak*-subsequential limit µ of (µ_n)_{n∈N}. Clearly µ must be a Borel probability measure on X. By comparing (4.3) and (4.4) we conclude µ(A) > 0. Finally, to see that µ is T-invariant, note by the definition of the weak*-limit for every f ∈ C(X) we have

    lim_{l→∞} ∫ (T f − f) dµ_{n_l} = ∫ (T f − f) dµ

for some subsequence (µ_{n_l})_{l∈N}. As the µ_{n_l} are almost T-invariant we conclude

    ∫ T f dµ = ∫ f dµ
Given any U ⊆ X open, there exists an increasing sequence of continuous functions fn → 1U
pointwise. Then (T fn )n∈N is an increasing sequence of continuous functions such that T fn →
1T −1 U pointwise. Hence by the Monotone Convergence Theorem µ(T −1 U ) = µ(U ). But the
open sets form an algebra which generates B, so µ is T -invariant.
5 Examples of Multiple Recurrence
In the previous section we deduced that in order to prove Szemerédi's Theorem it suffices to show Theorem 4.1. Thus the remainder of this essay will concentrate on proving Theorem 4.1.¹⁰
We follow the argument described in Furstenberg, Katznelson and Ornstein’s article [10] and
first prove the result separately for the familiar examples of Bernoulli shifts and circle rotations.
We will see that Theorem 4.1 holds in both cases, but for strikingly different reasons.
5.1 Multiple Recurrence for the Bernoulli and Circle Rotation Systems
Proposition 5.1. Let (X, B, µ, T) be a Bernoulli system and A0, . . . , A_{k−1} ∈ B. Then

    lim_{n→∞} |µ(∩_{j=0}^{k-1} T^{-jn} A_j) − ∏_{j=0}^{k-1} µ(A_j)| = 0    (5.1)

Moreover, it follows that if A ∈ B with µ(A) > 0 then

    lim_{n→∞} µ(∩_{j=0}^{k-1} T^{-jn} A) = µ(A)^k > 0

Hence Theorem 4.1 holds for Bernoulli systems.
Proof. In the proof of Proposition 3.10, where we showed Bernoulli systems are ergodic, we
demonstrated the case k = 2. It is easy to see that the general case can be obtained by a
completely analogous method. First suppose A0 , . . . , Ak−1 ∈ B are all cylinder sets. Take
¹⁰In fact we will prove the stronger Furstenberg Multiple Recurrence Theorem, which is introduced later in this section.
n ∈ N so large that the co-ordinate sets defining the T^{-jn} A_j are pairwise disjoint. Then the T^{-jn} A_j are independent sets and

    µ(∩_{j=0}^{k-1} T^{-jn} A_j) = ∏_{j=0}^{k-1} µ(A_j)
The general case where A0 , . . . , Ak−1 ∈ B are arbitrary measurable sets is now obtained by
using approximating sequences of cylinder sets, as in Proposition 3.10.
The property of Bernoulli systems described in (5.1) is called 'k-order mixing'. Here the asymptotic independence of the sets T^{-jn} A_j can be thought of as indicative of 'random-like' behaviour within the system. We will discuss this phenomenon in far more detail in the forthcoming section, but for now it is good to acknowledge that Theorem 4.1 holds here due to the random-like behaviour of the system.
Proposition 5.2. Let (X, B, µ, T) denote the system corresponding to rotations of the circle by τ. Then given any A ∈ B with µ(A) > 0 and any k ∈ N

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A) > 0
Proof. Fix A ∈ B with µ(A) > 0 and k ∈ N.
If τ is rational then there exists some q ∈ Z such that T^{-q} A = A and the result follows immediately.
For the case τ irrational, by Lemma 3.8 there exists a finite sequence of (non-empty) disjoint intervals I1, . . . , I_N such that, writing H = ∪_{j=1}^{N} I_j, we have A ⊆ H and µ(H \ A) < µ(A)/4k. Now suppose y ∈ T with |y| < δ where

    δ = min({µ(I_j)/(k − 1) : j = 1, . . . , N} ∪ {µ(A)/(4(k − 1))})

Then

    µ(∩_{j=0}^{k-1} (H − jy)) ≥ µ(H ∩ (H − (k − 1)y)) > µ(H) − µ(A)/4
Now for each 0 ≤ j ≤ k − 1 we have A − jy ⊆ H − jy. Note that
µ(A ∩ (A − y)) ≥ µ(H ∩ (H − y)) − µ(H \ A) − µ((H − y) \ (A − y))
By a simple induction

    µ(∩_{j=0}^{k-1} (A − jy)) ≥ µ(∩_{j=0}^{k-1} (H − jy)) − Σ_{j=0}^{k-1} µ((H − jy) \ (A − jy))
Furthermore, µ((H − jy) \ (A − jy)) = µ(H \ A) < µ(A)/4k for each 0 ≤ j ≤ k − 1 and so

    µ(∩_{j=0}^{k-1} (A − jy)) > µ(∩_{j=0}^{k-1} (H − jy)) − µ(A)/4 ≥ µ(A)/2 > 0
To finish we claim that the set Λ = {n ∈ N : nτ ∈ (−δ, δ)} has positive lower asymptotic density, that is

    lim inf_{N→∞} |Λ ∩ [1, N]| / N > 0
It then follows that

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A) ≥ lim inf_{N→∞} (µ(A)/2) · |Λ ∩ [1, N]| / N > 0
To prove the claim we show Λ is syndetic: that there exists some a ∈ N such that {k, k + 1, . . . , k + a − 1} ∩ Λ ≠ ∅ for all k ∈ N. As τ is irrational, by Proposition 3.1 the orbit of 0 is
dense in T. Thus there exists some a ∈ N such that the set {0, τ, . . . , (a − 1)τ } is 2δ-dense in T.
Moreover, for any k ∈ N the translated set
    {kτ, (k + 1)τ, . . . , (k + a − 1)τ}    (5.2)
is 2δ-dense in T. It follows that for every k ∈ N, the set (5.2) of a consecutive points of the orbit
of 0 contains at least one element in (−δ, δ) and Λ is syndetic.
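The syndeticity of the return-time set is easy to observe numerically. The sketch below (the golden-ratio rotation number and δ = 0.05 are arbitrary illustrative choices, not taken from the essay) lists the gap sizes between consecutive elements of Λ and confirms they are bounded.

```python
import math

def return_times(tau, delta, N):
    # Λ = {n : nτ mod 1 ∈ (-δ, δ)}, the interval taken mod 1
    return [n for n in range(1, N + 1)
            if (n * tau) % 1.0 < delta or (n * tau) % 1.0 > 1.0 - delta]

tau = (math.sqrt(5) - 1) / 2          # golden-ratio rotation number, irrational
hits = return_times(tau, 0.05, 2000)
gaps = sorted(set(b - a for a, b in zip(hits, hits[1:])))

print(gaps)   # only a handful of distinct gap sizes occur, all bounded: Λ is syndetic
```

That only a few distinct gap sizes appear is a reflection of the three-distance phenomenon for irrational rotations; for the syndeticity claim above, boundedness of the gaps is all that matters.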
The reason why Theorem 4.1 holds in the case of rotations is transparent: the rotations do not move nearby points apart. This is a markedly different situation from that of the Bernoulli shifts; here we have an example of structured, as opposed to random-like, behaviour.
5.2 Furstenberg's Multiple Recurrence Theorem
In both Propositions 5.1 and 5.2, in order to verify that Theorem 4.1 held in the respective cases, we in fact demonstrated a stronger result. This suggests that it is more natural to attempt to prove a stronger statement.
Theorem 5.3 (Furstenberg's Multiple Recurrence Theorem). Let (X, B, µ, T) be a measure preserving system and A ∈ B with µ(A) > 0. Then for any k ∈ N

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A) > 0
The method by which we prove Theorem 5.3 is rather complicated; before we begin it is prudent to briefly sketch the first few steps of the argument. The following section is occupied with determining the property of the Bernoulli system which accounts for the Multiple Recurrence Theorem holding in this instance. It transpires that Bernoulli systems belong to a larger class of 'weak mixing' systems. In Section 6 we prove multiple recurrence holds for all weak mixing systems owing to their 'random-like' behaviour. On the other hand, the rotation of the circle exemplifies what is known as a 'compact system'. Compact systems, in contrast to weak mixing systems, exhibit 'structured' behaviour. In Section 7 we show multiple recurrence holds for all compact systems owing to their structured behaviour.
The notions of weak mixing and compactness are mutually exclusive and in terms of randomness and structure they represent two extremes in behaviour. To show that the Multiple
Recurrence Theorem holds for these two cases is a significant step toward the full proof: at the
very least, as noted in [5], the fact that the result should hold for these two opposite classes
of systems could be interpreted as an indication that we are dealing with a more general phenomenon.
It is important to bear in mind that there are many examples of systems which are neither
weak mixing nor compact. Importantly, however, we will see that there is a dichotomy between
weak mixing and compact systems. Specifically, if a system is not weak mixing then it must
contain some compact ‘subsystem’. The ‘subsystems’ in question are correctly known as factors.
Thus the combined result of studying weak mixing and compact systems and demonstrating this
dichotomy will be the following:
Given any measure preserving system, the Multiple Recurrence Theorem holds
for some (non-trivial) factor.
This is the first significant step toward the full proof.
5.3 Reducing the Problem
Throughout the remainder of this discourse we will make a number of reasonable assumptions concerning the measure preserving systems we consider. The reader is reminded that all our measure preserving systems are defined on probability spaces: it is necessary that the systems are finite for multiple recurrence to hold.
In addition we will (tacitly) assume that all measure preserving systems we consider are both
invertible and defined on Borel probability space. By careful consideration of the proof of the
correspondence result Proposition 4.2, to deduce Szemerédi’s Theorem it suffices only to show
the Multiple Recurrence Theorem for this class of systems.
Moreover, the general case follows fairly easily once Multiple Recurrence has been established
for such systems, but here we omit the proof of this fact (for the details, see [5, 8]).
We claim that without loss of generality we may only consider ergodic systems. This is an
application of the Ergodic Decomposition Theorem.
Proposition 5.4. Suppose multiple recurrence holds for all invertible, ergodic systems defined
on Borel probability space. Then it holds for any invertible system (X, B, µ, T ) defined on a
Borel probability space.
Proof. Consider (X, B, µ, T ) where (X, B, µ) is a Borel probability space. By Ergodic Decomposition there exists some X 0 ∈ B with µ(X 0 ) = 1 and a collection of probability measures
{µx : x ∈ X 0 } on (X, B) such that each µx is an ergodic measure for T . Moreover, for all
f ∈ L¹(X, B, µ) we have f ∈ L¹(X, B, µ_x) for almost all x and

    ∫ f dµ = ∫ (∫ f dµ_x) dµ(x)
In particular, given A ∈ B with µ(A) > 0 and k ∈ N we have

    ∫ µ_x(A) dµ(x) = µ(A) > 0
and so µ({x ∈ X : µ_x(A) > 0}) > 0. By Fatou's lemma,

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A)
      = lim inf_{N→∞} ∫ (1/N) Σ_{n=1}^{N} µ_x(∩_{j=0}^{k-1} T^{-jn} A) dµ(x)
      ≥ ∫ lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ_x(∩_{j=0}^{k-1} T^{-jn} A) dµ(x) > 0

where we have applied the assumption that the Multiple Recurrence Theorem holds for ergodic systems.
6 Weak Mixing Systems
As discussed earlier, the multiple recurrence result holds for Bernoulli systems owing to the 'random-like' behaviour the system exhibits. In this section we discuss and define precisely what is meant by 'random-like' behaviour, and conclude that the multiple recurrence result holds for the general class of systems with this property.
Definition 11. A measure preserving system (X, B, µ, T) is mixing if for all A, B ∈ B the following holds:

    lim_{n→∞} µ(A ∩ T^{-n} B) = µ(A)µ(B)    (6.1)
It is natural to think of a mixing system as one which exhibits random-like behaviour. To see why, assume µ(B) > 0 and write (6.1) as

    lim_{n→∞} |µ(A ∩ T^{-n} B)/µ(T^{-n} B) − µ(A)| = 0

where we have used the fact T is measure preserving. Then µ(A ∩ T^{-n} B)/µ(T^{-n} B) is the conditional probability that x ∈ A given T^n(x) ∈ B; namely, the probability of the event A happening now, given that B will occur n intervals of time into the future. Hence a mixing system is characterised by the property that for any pair of events A and B, the events along the orbit of B asymptotically bear no relation to A.
Remark 10. Whilst we are primarily concerned with studying mixing with regard to multiple recurrence, it is worthwhile to note that the consideration of mixing systems naturally emanates from the notion of ergodicity and the Mean Ergodic Theorem. As discussed in ([5] Chapter 2; [16]), by the Mean Ergodic Theorem (Corollary 3.12) a measure preserving system (X, B, µ, T) is ergodic if and only if for all f, g ∈ L²(X, B, µ) we have

    lim_{N→∞} |(1/N) Σ_{n=1}^{N} ∫ f · T^n g dµ − ∫ f dµ ∫ g dµ| = 0

By considering characteristic functions this is equivalent to the property that given any A, B ∈ B

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} µ(A ∩ T^{-n} B) − µ(A)µ(B)| = 0

This convergence may take place for a number of reasons, one of which being that the sets A and T^{-n} B are asymptotically independent, motivating the notion of mixing.
It transpires that the mixing condition is rather strong. A weaker and more malleable notion
is that of weak-mixing, and it is with weak-mixing systems we will concern ourselves.
Definition 12. A measure preserving system (X, B, µ, T) is weak mixing if for all A, B ∈ B the following holds:

    lim_{N→∞} (1/N) Σ_{n=1}^{N} |µ(A ∩ T^{-n} B) − µ(A)µ(B)|² = 0    (6.2)
Note that, trivially, weak-mixing systems are ergodic.
Lemma 6.1. A weak mixing system (X, B, µ, T ) is ergodic.
Proof. Suppose B ∈ B is T-invariant. Then by the weak mixing property

    lim_{N→∞} (1/N) Σ_{n=1}^{N} |µ(B ∩ T^{-n} B) − µ(B)²|² = 0

For all n ∈ N we have T^{-n} B = B and so it follows µ(B) = µ(B)². Hence either B or B^c is a null set.
We can define weak mixing in terms of functions.
Lemma 6.2. A system (X, B, µ, T) is weak mixing if and only if for all f, g ∈ L²(X, B, µ)

    lim_{N→∞} (1/N) Σ_{n=1}^{N} |∫ f · T^n g dµ − ∫ f dµ ∫ g dµ|² = 0    (6.3)
Proof. Clearly, taking f = 1_A and g = 1_B in (6.3) gives (6.2). Conversely, by assuming weak mixing we assume (6.3) holds for all characteristic functions. These functions span the subspace of simple functions, dense in L²(X, B, µ). In general if (6.3) holds for all f, g ∈ S where S is a set linearly spanning a dense subspace V, then (6.3) holds for all f, g ∈ L²(X, B, µ). This fact follows from a simple approximation argument, similar to that used in Corollary 3.13.
There are many equivalent formulations of the weak-mixing property,¹¹ which can be seen as an indication that the concept is a natural one. Presently we introduce some of the formulations which are necessary to our study.
Proposition 6.3. A system (X, B, µ, T ) is weak mixing if and only if the product system
(X × X, B ⊗ B, µ ⊗ µ, T ⊗ T ) is ergodic.
Proof. Suppose (X, B, µ, T ) is weak mixing. By Lemma 6.1 it suffices to show the product
system is weak mixing. Furthermore, by the proof of Lemma 6.2, it suffices to show that if
f1 , f2 , g1 , g2 ∈ L2 (X, B, µ) then the weak mixing property (6.3) holds for f1 ⊗ f2 and g1 ⊗ g2 ∈
L2 (X × X) as functions of this type span a dense subspace of L2 (X × X). Note that
\[
\int f_1 \otimes f_2 \cdot (T \times T)^n (g_1 \otimes g_2) \, d(\mu\otimes\mu) = \int f_1\, T^n g_1 \, d\mu \int f_2\, T^n g_2 \, d\mu
\]
\[
\int f_1 \otimes f_2 \, d(\mu\otimes\mu) = \int f_1 \, d\mu \int f_2 \, d\mu, \qquad \int g_1 \otimes g_2 \, d(\mu\otimes\mu) = \int g_1 \, d\mu \int g_2 \, d\mu
\]
Thus the result is easily deduced through an application of the triangle inequality.
Conversely, suppose the product system is ergodic; it is easy to see that (X, B, µ, T) must also be ergodic. For suppose B ∈ B is a T-invariant set. Then B × B ∈ B ⊗ B is T × T-invariant and hence by ergodicity µ(B)² = (µ ⊗ µ)(B × B) = 0 or 1. Thus B is null or has null complement.
Let f, g ∈ L2 (X, B, µ). By the Mean Ergodic Theorem,
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int f\, T^n g \, d\mu - \int f \, d\mu \int g \, d\mu = 0 \tag{6.4}
\]
As the product system is ergodic we can carry out exactly the same procedure on (X × X, B ⊗ B, µ ⊗ µ, T ⊗ T) and the functions g ⊗ g and f ⊗ f to give
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \Big( \int f\, T^n g \, d\mu \Big)^2 - \Big( \int f \, d\mu \Big)^2 \Big( \int g \, d\mu \Big)^2 = 0 \tag{6.5}
\]
We claim (6.4) and (6.5) together imply the weak mixing property, namely
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int f\, T^n g \, d\mu - \int f \, d\mu \int g \, d\mu \right|^2 = 0 \tag{6.6}
\]
Let a_n = ∫ f T^n g dµ and b = ∫ f dµ ∫ g dµ. Then the sequence (a_n)_{n∈N} ⊆ R has the properties
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} a_n = b \qquad\text{and}\qquad \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} a_n^2 = b^2
\]
It follows that
\[
\frac{1}{N}\sum_{n=1}^{N} (a_n - b)^2 = \frac{1}{N}\sum_{n=1}^{N} a_n^2 - \frac{2b}{N}\sum_{n=1}^{N} a_n + b^2
\]
11 For lists and proofs of equivalences see [1, 5, 8].
Taking the limit as N → ∞ we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} (a_n - b)^2 = 0
\]
which is precisely (6.6), as required.
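The passage from the two Cesàro limits for (a_n) and (a_n²) to the conclusion can be checked numerically. The following sketch is our own illustration, not part of the essay: we take a sequence a_n which agrees with b off the zero-density set of perfect squares, so that both hypotheses hold and the combined average of (a_n − b)² tends to zero.

```python
import math

# Sketch (not from the essay): a_n = b off a zero-density set (the squares).
# Then (1/N) sum a_n -> b, (1/N) sum a_n^2 -> b^2, and combining the two as
# in the expansion above gives (1/N) sum (a_n - b)^2 -> 0.
b = 2.0
N = 10 ** 5
squares = {j * j for j in range(1, math.isqrt(N) + 1)}
a = [b + 1.0 if n in squares else b for n in range(1, N + 1)]

avg1 = sum(a) / N                        # close to b
avg2 = sum(x * x for x in a) / N         # close to b^2
avg3 = sum((x - b) ** 2 for x in a) / N  # close to 0
print(avg1, avg2, avg3)
```

The third average equals avg2 − 2b·avg1 + b² exactly, which is the algebraic identity used in the proof.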
Definition 13. A measure preserving system (X, B, µ, T ) is weak mixing of all orders if for
any k ∈ N, given a collection A0 , . . . , Ak ∈ B of measurable sets
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A_j \Big) - \prod_{j=0}^{k} \mu(A_j) \right|^2 = 0 \tag{6.7}
\]
We can instantly deduce multiple recurrence holds for any system which is weak mixing of
all orders: for any A ∈ B of positive measure and any k ∈ N let A0 = A1 = · · · = Ak = A and
apply (6.7) to obtain
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A \Big) = \mu(A)^{k+1} > 0
\]
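This deduction can be illustrated by a Monte Carlo sketch (our own illustration, not part of the essay) for the Bernoulli shift, a standard example of a system that is weak mixing of all orders. On X = {0,1}^N with the (1/2, 1/2) product measure and the shift T, take the cylinder A = {x : x_0 = 0}; independence of coordinates gives µ(A ∩ T^{-n}A ∩ · · · ∩ T^{-kn}A) = (1/2)^{k+1} = µ(A)^{k+1} for every n ≥ 1.

```python
import random

# Monte Carlo sketch (not from the essay): for the Bernoulli shift and the
# cylinder A = {x : x_0 = 0}, a point lies in the k-fold intersection iff its
# coordinates at positions 0, n, 2n, ..., kn are all 0, an event of
# probability (1/2)^(k+1) by independence.
random.seed(0)
k, n, trials = 2, 7, 200_000
hits = 0
for _ in range(trials):
    x = [random.randint(0, 1) for _ in range(k * n + 1)]
    if all(x[j * n] == 0 for j in range(k + 1)):
        hits += 1
est = hits / trials
print(est)  # close to mu(A)^(k+1) = 1/8
```

The estimate concentrates around µ(A)^{k+1} = 1/8, in line with the limit displayed above.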
Presently we demonstrate that the ostensibly stronger condition of weak mixing of all orders
is in fact equivalent to the weak mixing condition. Thus we will have shown that for any weak
mixing system multiple recurrence holds.
Remark 11. Analogous to weak mixing of all orders, there are several possible definitions of mixing of all orders. One such, given in ([10], §3), is that a system (X, B, µ, T) is mixing of all orders if for any collection A0, . . . , Ak ∈ B we have
\[
\lim_{\substack{n_j \to \infty \\ |n_j - n_i| \to \infty}} \mu\Big( \bigcap_{j=0}^{k} T^{-n_j} A_j \Big) = \prod_{j=0}^{k} \mu(A_j)
\]
Ascertaining whether mixing implies mixing of all orders is a major unsolved problem in ergodic theory12. For example it is not even known [10] whether mixing implies
\[
\lim_{n\to\infty} \mu\big( A_0 \cap T^{-n} A_1 \cap T^{-2n} A_2 \big) = \mu(A_0)\,\mu(A_1)\,\mu(A_2)
\]
Theorem 6.4. Suppose (X, B, µ, T ) is a weak mixing system. Then it is weak mixing of all
orders.
Assuming (X, B, µ, T ) is weak mixing, we will in fact show that for any finite collection
f0 , . . . , fk ∈ L∞ (X, B, µ) the following holds:
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{l=0}^{k} T^{ln} f_l \, d\mu - \prod_{l=0}^{k} \int f_l \, d\mu \right|^2 = 0 \tag{6.8}
\]
Theorem 6.4 then follows by applying this result to characteristic functions. To prove (6.8) we
use induction on k ∈ N. As the system is weak mixing, applying Lemma 6.2 gives the case
k = 1. In order to prove the inductive step we will need to introduce the van der Corput trick,
the statement and proof of which are adapted from ([5] Chapter 7; [1]).
Lemma 6.5 (Van der Corput Trick). Let (u_n)_{n∈N} be a sequence in some Hilbert space (H, ⟨·, ·⟩). Define a sequence (s_h)_{h∈N_0} ⊂ R by
\[
s_h = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_n \rangle
\]
12 For a brief discussion of this problem see [12].
If
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=0}^{H-1} s_h = 0 \tag{6.9}
\]
then
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} u_n \right\| = 0
\]
Proof. First note that for fixed H, the sums $\frac{1}{N}\sum_{n=1}^{N} u_n$ and $\frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h}$ are arbitrarily close for large N. Indeed, the sums overlap: they agree except on the first and last H terms:
\[
\frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} = \frac{1}{NH}\left( \sum_{n=1}^{N} u_n + \sum_{n=2}^{N+1} u_n + \cdots + \sum_{n=H}^{N+H-1} u_n \right) = \frac{1}{N}\sum_{n=H}^{N} u_n + \frac{1}{NH}\sum_{n=1}^{H-1} n\,\big(u_n + u_{N+H-n}\big)
\]
As H is fixed and the sequence (u_n)_{n∈N} is bounded, the difference between the two sums is averaged out as N → ∞. Explicitly, if M > 0 is such that ‖u_n‖ ≤ M for all n ∈ N it follows
\[
\left\| \frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} - \frac{1}{N}\sum_{n=1}^{N} u_n \right\| \le \frac{M(2H+1)}{N} = o(1) \tag{6.10}
\]
Now it suffices to consider the asymptotic behaviour of the norm of the double average. This is preferable as one may estimate the limit using the terms of (s_h)_{h∈N_0}. By the triangle and Cauchy–Schwarz inequalities
\[
\limsup_{N\to\infty} \left\| \frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} \right\|^2 \le \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left\| \frac{1}{H}\sum_{h=0}^{H-1} u_{n+h} \right\|^2 = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \frac{1}{H^2}\sum_{h,h'=0}^{H-1} \langle u_{n+h}, u_{n+h'} \rangle \le \frac{1}{H^2}\sum_{h,h'=0}^{H-1} \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_{n+h'} \rangle
\]
Notice that if h' ≥ h then
\[
\limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_{n+h'} \rangle = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1+h}^{N+h} \langle u_n, u_{n+h'-h} \rangle = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_n, u_{n+h'-h} \rangle = s_{h'-h}
\]
By a symmetric argument if h ≥ h' then the left hand side is equal to s_{h-h'}. In short
\[
\limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_{n+h'} \rangle = s_{|h-h'|}
\]
Thus
\[
\limsup_{N\to\infty} \left\| \frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} \right\|^2 \le \frac{1}{H^2}\sum_{h,h'=0}^{H-1} s_{|h-h'|}
\]
Finally we split the sum $\frac{1}{H^2}\sum_{h,h'=0}^{H-1} s_{|h-h'|}$ into three parts and use the hypothesis (6.9) to estimate. Let ε > 0 be given. Choose H_0 ∈ N such that
\[
\frac{1}{H}\sum_{h=0}^{H-1} s_h < \frac{\varepsilon}{4} \quad\text{for all } H \ge H_0 \tag{6.11}
\]
For any H ≥ H_0 we may write
\begin{align*}
\frac{1}{H^2}\sum_{h,h'=0}^{H-1} s_{|h-h'|} &= \frac{1}{H}\sum_{h=0}^{H-H_0} \frac{1}{H}\sum_{h'=h}^{H-1} s_{h'-h} \tag{6.12} \\
&\quad+ \frac{1}{H}\sum_{h'=0}^{H-H_0} \frac{1}{H}\sum_{h=h'+1}^{H-1} s_{h-h'} \tag{6.13} \\
&\quad+ \frac{1}{H^2}\sum_{h,h'=H-H_0+1}^{H-1} s_{|h-h'|} \tag{6.14}
\end{align*}
Applying (6.11) we deduce that for all 0 ≤ h ≤ H − H_0
\[
\frac{1}{H}\sum_{h'=h}^{H-1} s_{h'-h} < \frac{\varepsilon}{4}
\]
so that the sum in (6.12) is bounded above by ε/4. Similarly the sum in (6.13) is also bounded above by ε/4. As the s_h are bounded we may fix H ≥ H_0 large so that the third sum (6.14) is bounded above by ε/4. Now if we choose N_0 ∈ N so large that M(2H+1)/N_0 < ε/4 then by (6.10) we conclude
\[
\left\| \frac{1}{N}\sum_{n=1}^{N} u_n \right\| < \varepsilon \quad\text{for all } N \ge N_0
\]
As the choice of ε > 0 was arbitrary, the result is established.
The Van der Corput trick is the central ingredient in the inductive step in the proof of
Theorem 6.4 which we now turn to. Here we follow the method outlined in ([10], §3).
Proof (of Theorem 6.4). We shall show that for all f0 , . . . , fk ∈ L∞ (X, B, µ) the following two
statements hold:
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{l=0}^{k} T^{ln} f_l \, d\mu - \prod_{l=0}^{k} \int f_l \, d\mu \right|^2 = 0 \tag{6.15}
\]
and
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0 \tag{6.16}
\]
Here we use an inductive argument. Recall that (6.15) case k = 1 holds by the hypothesis that
the system is weak mixing. We show that for k ≥ 2
• Claim (1): (6.15) case k − 1 ⇒ (6.16) case k
• Claim (2): (6.16) case k ⇒ (6.15) case k
and conclude that both results hold for all k ∈ N.
Claim (1). Suppose every weak mixing system (X, B, µ, T ) has the property that for all g0 , . . . , gk−1 ∈
L∞ (X, B, µ) the following holds:
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{j=0}^{k-1} T^{jn} g_j \, d\mu - \prod_{j=0}^{k-1} \int g_j \, d\mu \right|^2 = 0 \tag{6.17}
\]
Then for a fixed weak mixing system (X, B, µ, T ), given f1 , . . . , fk ∈ L∞ (X, B, µ) we have
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0
\]
Proof. First consider the case that for some j_0 ∈ {1, . . . , k} the function f_{j_0} has zero expectation: ∫ f_{j_0} dµ = 0. In this case we are required to prove
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \prod_{l=1}^{k} T^{ln} f_l \right\|_{L^2(\mu)} = 0 \tag{6.18}
\]
To do this we use the van der Corput trick: let (un )n∈N ⊂ L∞ (X, B, µ) be the sequence defined
by
\[
u_n = \prod_{l=1}^{k} T^{ln} f_l
\]
Note that (6.18) can then be written as
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} u_n \right\|_{L^2(\mu)} = 0 \tag{6.19}
\]
and by van der Corput, to prove (6.19) it suffices to show
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=0}^{H-1} s_h = 0 \tag{6.20}
\]
where
\[
s_h = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_n \rangle
\]
Now, using the fact T is measure preserving, we have
\[
\langle u_{n+h}, u_n \rangle = \int \prod_{l=1}^{k} T^{l(n+h)} f_l \cdot \prod_{l=1}^{k} T^{ln} f_l \, d\mu = \int \prod_{l=1}^{k} T^{ln}\big( f_l\, T^{lh} f_l \big) \, d\mu = \int T^n \prod_{l=0}^{k-1} T^{ln}\big( f_{l+1}\, T^{(l+1)h} f_{l+1} \big) \, d\mu = \int \prod_{l=0}^{k-1} T^{ln}\big( f_{l+1}\, T^{(l+1)h} f_{l+1} \big) \, d\mu
\]
For each h ∈ N and each l = 0, . . . , k − 1 define g_{h,l} = f_{l+1}·T^{(l+1)h} f_{l+1} ∈ L∞(X, B, µ). By applying the hypothesis (6.17) of the claim to the functions g_{h,l} for a fixed h,
\[
s_h = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_n \rangle = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k-1} T^{ln} g_{h,l} \, d\mu = \prod_{l=0}^{k-1} \int g_{h,l} \, d\mu
\]
Now we show that
\[
\lim_{\substack{h\to\infty \\ h\notin Z}} s_h = 0 \tag{6.21}
\]
where Z ⊂ N is a set of zero density and hence, recalling Theorem 2.1, establish (6.20). Indeed, by the weak mixing property applied to the function f_{j_0},
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=1}^{H} \left| \int f_{j_0}\, T^{h} f_{j_0} \, d\mu - \Big( \int f_{j_0} \, d\mu \Big)^2 \right|^2 = 0
\]
Recall by hypothesis ∫ f_{j_0} dµ = 0. Summing over a subsequence gives
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=1}^{H} \left| \int \underbrace{f_{j_0}\, T^{j_0 h} f_{j_0}}_{g_{h,j_0-1}} \, d\mu \right|^2 = 0
\]
Once again recalling Theorem 2.1, there exists some set Z ⊂ N of zero density such that
\[
\lim_{\substack{h\to\infty \\ h\notin Z}} \int g_{h,j_0-1} \, d\mu = 0
\]
Noting that ‖g_{h,l}‖_∞ ≤ max_{j=1,...,k} ‖f_j‖²_∞ is bounded uniformly in both h and l,
\[
\lim_{\substack{h\to\infty \\ h\notin Z}} \prod_{l=0}^{k-1} \int g_{h,l} \, d\mu = 0
\]
Thus we have (6.21) and consequently (6.18).
The general case reduces to the above case by noting the following lemma:
Lemma 6.6. Let (a_l)_{l=1}^{k}, (b_l)_{l=1}^{k} ⊂ R be finite sequences of real numbers of length k. Then
\[
\prod_{l=1}^{k} a_l - \prod_{l=1}^{k} b_l = \sum_{j=1}^{k} \Big( \prod_{l=1}^{j-1} a_l \Big)\,(a_j - b_j)\,\Big( \prod_{l=j+1}^{k} b_l \Big)
\]
Proof. We induct on k. The case k = 1 is trivial and for c_1, c_2, d_1, d_2 ∈ R we have
\[
c_1 c_2 - d_1 d_2 = (c_1 - d_1)\,d_2 + c_1\,(c_2 - d_2) \tag{6.22}
\]
Assume the result holds for the case k − 1 and let (a_l)_{l=1}^{k}, (b_l)_{l=1}^{k} ⊂ R be finite sequences of real numbers of length k. Then
\[
\sum_{j=1}^{k} \Big( \prod_{l=1}^{j-1} a_l \Big)(a_j - b_j)\Big( \prod_{l=j+1}^{k} b_l \Big) = \left[ \sum_{j=1}^{k-1} \Big( \prod_{l=1}^{j-1} a_l \Big)(a_j - b_j)\Big( \prod_{l=j+1}^{k-1} b_l \Big) \right] b_k + \Big( \prod_{l=1}^{k-1} a_l \Big)(a_k - b_k)
\]
By applying the induction hypothesis and setting c_1 = Π_{l=1}^{k-1} a_l, c_2 = a_k, d_1 = Π_{l=1}^{k-1} b_l and d_2 = b_k we reduce the problem to the case k = 2 and thus we are done by (6.22).
Given f_1, . . . , f_k ∈ L∞(X, B, µ) with arbitrary expectation, pointwise application of the above lemma gives:
\[
\prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu = \sum_{j=1}^{k} \Big( \prod_{l=1}^{j-1} T^{ln} f_l \Big)\Big( T^{jn} f_j - \int f_j \, d\mu \Big)\Big( \prod_{l=j+1}^{k} \int f_l \, d\mu \Big) \tag{6.23}
\]
Now for each j = 1, . . . , k define
\[
g_{j,l} := \begin{cases} f_l & \text{for } l = 1, \dots, j-1 \\ f_j - \int f_j \, d\mu & \text{for } l = j \\ \int f_l \, d\mu & \text{for } l = j+1, \dots, k \end{cases}
\]
The right hand side of (6.23) then becomes
\[
\sum_{j=1}^{k} \prod_{l=1}^{k} T^{ln} g_{j,l}
\]
Clearly for each j = 1, . . . , k the function g_{j,j} has zero expectation and so we may apply the previous case to each set of functions g_{j,1}, . . . , g_{j,k} ∈ L∞(X, B, µ). Moreover,
\[
\left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = \left\| \frac{1}{N}\sum_{n=1}^{N} \sum_{j=1}^{k} \prod_{l=1}^{k} T^{ln} g_{j,l} \right\|_{L^2(\mu)} \le \sum_{j=1}^{k} \left\| \frac{1}{N}\sum_{n=1}^{N} \prod_{l=1}^{k} T^{ln} g_{j,l} \right\|_{L^2(\mu)}
\]
And so the first claim is established.
The second claim is somewhat easier to deduce.
Claim (2). Suppose that every weak mixing system (X, B, µ, T) has the property that for all f_1, . . . , f_k ∈ L∞(X, B, µ) the following holds:
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0 \tag{6.24}
\]
Then for a fixed weak mixing system (X, B, µ, T), given g_0, . . . , g_k ∈ L∞(X, B, µ) we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{l=0}^{k} T^{ln} g_l \, d\mu - \prod_{l=0}^{k} \int g_l \, d\mu \right|^2 = 0
\]
Indeed, given g_0, . . . , g_k ∈ L∞(X, B, µ), by the hypothesis (6.24) of the claim
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} g_l - \prod_{l=1}^{k} \int g_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0
\]
Now norm convergence implies weak convergence and so by considering the linear functional f ↦ ∫ f g_0 dµ on L²(X, B, µ) we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k} T^{ln} g_l \, d\mu - \prod_{l=0}^{k} \int g_l \, d\mu = 0
\]
In the proof of Proposition 6.3 we demonstrated that the product system (X × X, B ⊗ B, µ ⊗ µ, T ⊗ T) is also weak mixing and hence we may apply the foregoing reasoning to this system and the functions g_l ⊗ g_l for l = 0, . . . , k. The result of this is that
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \Big( \int \prod_{l=0}^{k} T^{ln} g_l \, d\mu \Big)^2 - \prod_{l=0}^{k} \Big( \int g_l \, d\mu \Big)^2 = 0
\]
By precisely the same argument as used at the end of Proposition 6.3, taking this time a_n = ∫ Π_{l=0}^{k} T^{ln} g_l dµ and b = Π_{l=0}^{k} ∫ g_l dµ, we obtain the desired result.
Thus we have obtained our first general result toward the proof of the Multiple Recurrence
Theorem.
Theorem 6.7. The Multiple Recurrence Theorem holds for any weak mixing system. More
precisely, if (X, B, µ, T ) is a weak mixing system, A ∈ B a set of positive measure and k ∈ N
then
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A \Big) = \mu(A)^{k+1} > 0
\]
7 Compact Systems
We now turn our attention to the class of systems which exhibit ordered behaviour, exemplified
by rotations of the circle. As in the previous section, we will prove the Multiple Recurrence
Theorem holds for this particular class of systems. Thus we will have shown the result holds
in the two extreme cases: when the system behaves randomly and when the behaviour of the
system is structured. The work in this section is adapted from ([10], §4).
Definition 14. Let (X, B, µ, T ) be a measure preserving system and f ∈ L2 (X, B, µ). We say
f is almost periodic (AP) if the orbit of f under T is relatively compact. In other words
clos{T n f : n ∈ N}
is a compact set in the norm topology.
A subset of a complete metric space is relatively compact if and only if it is totally bounded. Hence a function f is AP if and only if for every ε > 0 there exists some finite set g_1, . . . , g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m} \| T^n f - g_j \|_{L^2(\mu)} < \varepsilon \quad\text{for all } n \in \mathbb{N}
\]
Definition 15. A system (X, B, µ, T ) is compact if every f ∈ L2 (X, B, µ) is almost periodic.
Proposition 7.1. Let (X, B, µ, T) be a measure preserving system and f ∈ L²(X, B, µ) an AP function. Then for every ε > 0 there exists a syndetic set K ⊆ N such that
\[
\| T^n f - f \|_{L^2(\mu)} < \varepsilon \quad\text{for all } n \in K
\]
Proof. Let ε > 0 be given. Then as clos{T^n f : n ∈ N_0} is compact there exists a (finite) maximum ε-separated set, and by density we may assume this set has the form
\[
\{ T^{t_1} f, \dots, T^{t_r} f \}
\]
for some t_i ∈ N numbered so that t_1 < t_2 < · · · < t_r. For each n ∈ N consider the set
\[
\{ T^{n+t_1} f, \dots, T^{n+t_r} f \} \tag{7.1}
\]
Note that as T is measure preserving, for any pair 1 ≤ i < j ≤ r
\[
\| T^{n+t_i} f - T^{n+t_j} f \|_{L^2(\mu)} = \| T^n (T^{t_i} f - T^{t_j} f) \|_{L^2(\mu)} = \| T^{t_i} f - T^{t_j} f \|_{L^2(\mu)} \ge \varepsilon
\]
Thus, for each n ∈ N the set (7.1) is also a maximum ε-separated set. It follows that for each n ∈ N there exists 1 ≤ i(n) ≤ r such that
\[
\| T^{n+t_{i(n)}} f - f \|_{L^2(\mu)} < \varepsilon
\]
as otherwise one may add f = T^0 f to the set (7.1) to obtain a larger ε-separated set, contradicting maximality of (7.1). Finally take K := {n + t_{i(n)} : n ∈ N}. It is easy to see N = ∪_{a=1}^{1+t_r-t_1} (K − a) and so K is indeed syndetic.
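Proposition 7.1 can be seen at work in the model compact system, the irrational rotation. The following sketch is our own illustration (the choice of α, f and ε are illustrative assumptions): for T(x) = x + α mod 1 and f(x) = e^{2πix} one has ‖T^n f − f‖_{L²} = 2|sin(πnα)|, so the set K consists of the times n at which nα returns close to an integer, and numerically its gaps stay bounded.

```python
import math

# Sketch (not from the essay): near-return times of the rotation by the
# golden ratio conjugate form a syndetic set: consecutive gaps are bounded.
alpha = (math.sqrt(5) - 1) / 2
eps = 0.1
K = [n for n in range(1, 10_001)
     if 2 * abs(math.sin(math.pi * n * alpha)) < eps]
gaps = [m - n for n, m in zip(K, K[1:])]
print(len(K), max(gaps))
```

The elements of K cluster around the Fibonacci denominators of α, and the maximal gap observed is far smaller than the range examined, consistent with syndeticity.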
Theorem 7.2. Let (X, B, µ, T) be a compact measure preserving system and f ∈ L∞(X, B, µ) such that f ≥ 0 and f is not almost everywhere 0. Then for all k ∈ N
\[
\liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k} T^{ln} f \, d\mu > 0 \tag{7.2}
\]
Proof. By scaling we may assume without loss of generality 0 ≤ f ≤ 1. To prove the theorem it suffices to show there exists a set K ⊆ N of positive lower density and a constant a > 0 such that for all n ∈ K
\[
\int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge a \tag{7.3}
\]
For once this is established it then follows that
\[
\frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge \frac{1}{N}\sum_{n\in K\cap[1,N]} \int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge a\,\frac{|K\cap[1,N]|}{N}
\]
and as d(K) > 0, taking the limit infimum gives (7.2).
The proof proceeds in two simple stages:
• We show Proposition 7.1 implies that given ε > 0 there exists a syndetic set K ⊆ N such that for all n ∈ K and for all l = 0, . . . , k one has ‖T^{ln} f − f‖_{L²(µ)} < ε.
• Given n ∈ K we demonstrate that as the T^{ln} f are ε-close in norm to f, the difference between ∫ Π_{l=0}^{k} T^{ln} f dµ and ∫ f^{k+1} dµ is small. As the latter integral is strictly positive, for an appropriate choice of ε so too is the former.
By the proposition, for any ε > 0 there exists a syndetic set K ⊆ N such that for all n ∈ K
\[
\| T^{n} f - f \|_{L^2(\mu)} < \frac{\varepsilon}{k}
\]
Now, for all n ∈ K and all l = 0, . . . , k, by the triangle inequality and the fact that T is measure preserving,
\[
\| T^{ln} f - f \|_{L^2(\mu)} \le \sum_{j=1}^{l} \| T^{(l-j+1)n} f - T^{(l-j)n} f \|_{L^2(\mu)} = \sum_{j=1}^{l} \| T^{n} f - f \|_{L^2(\mu)} < \varepsilon
\]
For the second stage, note that for each n ∈ K, by Lemma 6.6,
\[
\int \prod_{l=0}^{k} T^{ln} f \, d\mu - \int f^{k+1} \, d\mu = \sum_{j=0}^{k} \int \Big( \prod_{l=0}^{j-1} T^{ln} f \Big)\big( T^{jn} f - f \big)\, f^{k-j} \, d\mu
\]
so that
\[
\left| \int \prod_{l=0}^{k} T^{ln} f \, d\mu - \int f^{k+1} \, d\mu \right| \le \sum_{j=0}^{k} \int \Big( \prod_{l=0}^{j-1} T^{ln} f \Big) \big| T^{jn} f - f \big|\, f^{k-j} \, d\mu \le \sum_{j=0}^{k} \int \big| T^{jn} f - f \big| \, d\mu
\]
where we have used our assumption 0 ≤ f ≤ 1. By the Cauchy–Schwarz inequality ∫ |T^{jn} f − f| dµ ≤ ‖T^{jn} f − f‖_{L²(µ)} < ε. Hence
\[
\left| \int \prod_{l=0}^{k} T^{ln} f \, d\mu - \int f^{k+1} \, d\mu \right| < (k+1)\,\varepsilon \tag{7.4}
\]
So far the choice of ε has been arbitrary. Now if we take 0 < ε < \frac{1}{k+1} ∫ f^{k+1} dµ, then by (7.4)
\[
\int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge \int f^{k+1} \, d\mu - (k+1)\,\varepsilon > 0
\]
Thus setting a = ∫ f^{k+1} dµ − (k+1)ε we have (7.3) as required.
Theorem 7.3. Multiple recurrence holds for compact systems. Let (X, B, µ, T ) be a compact
measure preserving system, A ∈ B a set of positive measure and k ∈ N. Then
\[
\liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{l=0}^{k} T^{-ln} A \Big) > 0
\]
Proof. Simply apply Theorem 7.2 to the characteristic function 1A .
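Theorem 7.3 can be illustrated numerically on the model compact system. The sketch below is our own illustration, not part of the essay (the values of α, the set A and the grid size are illustrative assumptions): for the rotation T(x) = x + α mod 1 with α = √2 − 1, A = [0, 0.2) and k = 2, the averages of µ(A ∩ T^{-n}A ∩ T^{-2n}A) stay bounded away from zero.

```python
import math

# Grid-based sketch (not from the essay): approximate the averaged measure
# of the k-fold intersection for the circle rotation; the rotation is compact,
# so the average should be strictly positive.
alpha = math.sqrt(2) - 1
width, k, N, grid = 0.2, 2, 50, 20_000

def inter_measure(n):
    """Approximate mu(A ∩ T^{-n}A ∩ ... ∩ T^{-kn}A) on a uniform grid."""
    shifts = [(l * n * alpha) % 1.0 for l in range(k + 1)]
    hits = 0
    for i in range(grid):
        x = i / grid
        if all(((x + s) % 1.0) < width for s in shifts):
            hits += 1
    return hits / grid

avg = sum(inter_measure(n) for n in range(1, N + 1)) / N
print(avg)  # strictly positive
```

The positive average comes from the syndetic set of times n at which nα is close to an integer, exactly the mechanism exploited in the proof of Theorem 7.2.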
8 The Dichotomy Between Weak Mixing and Compact Systems
The purpose of the previous sections has been to prove the multiple recurrence result for certain
types of measure preserving systems, namely weak mixing and compact systems. Whilst these
classes are large enough to include important and interesting examples, by no means do they
classify all measure preserving systems.
In order to arrive at a comprehensive multiple recurrence result we shall have to delve deeper
and use a more subtle approach. Recalling the intuitive notion of weak mixing systems exhibiting
random and compact systems exhibiting structured behaviour, it is reasonable to suppose if a
measure preserving system fails to behave completely randomly then some significant ‘subsystem’ must behave in a structured manner.
Definition 16. Let (X, B, µ, T ) be a measure preserving system and suppose A ⊆ B is a
sub-σ-algebra which is T -invariant: i.e. T −1 A ⊆ A . Then (X, A , µ, T ) is a measure preserving
system and we say this system is a factor of (X, B, µ, T ). If A only contains null and conull
sets, we say the factor (X, A , µ, T ) is trivial.
Factors of measure preserving systems will henceforth dominate our study. Rather than attempt to demonstrate multiple recurrence for the system as a whole, we shall concentrate on demonstrating it for factors.
Definition 17. Suppose (X, A, µ, T) is a factor of a system (X, B, µ, T) and that for all A ∈ A with µ(A) > 0 and all k ∈ N we have
\[
\liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A \Big) > 0
\]
then we say the action of T on (X, A, µ) is Szemerédi (SZ).
The bulk of this section will be occupied with proving the following Theorem:
Theorem 8.1. A system (X, B, µ, T ) is not weak mixing if and only if there exists some nontrivial compact factor.
An intuitive interpretation of the content of this theorem might be: ‘a system exhibits
completely random behaviour if and only if no non-trivial factor exhibits structure’.
An important consequence of this is the following:
Corollary 8.2. Let (X, B, µ, T ) be a measure preserving system. Then there exists a non-trivial
factor (X, A , µ, T ) for which the action of T on (X, A , µ) is SZ.
Proof. Assuming Theorem 8.1 the result follows directly from the multiple recurrence results for weak mixing and compact systems:
• If the system is weak mixing, then by Theorem 6.7 the action of T on (X, B, µ) is SZ.
• If the system is not weak mixing, then by Theorem 8.1 there exists a non-trivial compact factor (X, A, µ, T), and by Theorem 7.3 the action of T on (X, A, µ) is SZ.
In order to prove Theorem 8.1 we use the following characterisation of compact factors:
Proposition 8.3. Let (X, B, µ, T) be a measure preserving system. Then there exists a non-trivial compact factor if and only if there exists some non-constant AP function.
Proof. Assume that every AP function f ∈ L2 (X, B, µ) is constant and (X, A , µ, T ) is some
compact factor of (X, B, µ, T ). Then every g ∈ L2 (X, A , µ) is an AP function and thus also
AP in the sense of L2 (X, B, µ). Thus the space L2 (X, A , µ) is precisely the set of constant
functions on X and therefore A must only contain sets of measure 1 and their complements.
For the converse, assume there exists some non-constant F ∈ L2 (X, B, µ) which is AP. Let
A denote the σ-algebra generated by {T n F : n ∈ N}. It is easy to check A is a T -invariant
sub-σ-algebra. Indeed, if A ∈ A then A = (T n F )−1 (B) for some Borel set B ⊆ R and
\[
T^{-1} A = T^{-1}\{ x \in X : T^n F(x) \in B \} = \{ x \in X : T^{n+1} F(x) \in B \} = (T^{n+1} F)^{-1}(B) \in \mathcal{A}
\]
So T^{-1}A ⊆ A as required.
It remains to show the factor is compact: that every L2 (X, A , µ) function is AP. To begin
we show the subset V ⊆ L∞(X, A, µ) of bounded AP functions is a closed vector subspace. Suppose f, g ∈ V and a, b ∈ R \ {0} and let ε > 0 be given. Then there exist functions f_1, . . . , f_l and g_1, . . . , g_m ∈ L²(X, A, µ) such that
\[
\min_{1\le i\le l} \| T^n f - f_i \|_{L^2(\mu)} < \frac{\varepsilon}{2|a|} \quad\text{and}\quad \min_{1\le j\le m} \| T^n g - g_j \|_{L^2(\mu)} < \frac{\varepsilon}{2|b|} \quad\text{for all } n \in \mathbb{N}
\]
Applying the triangle inequality gives
\[
\min_{\substack{1\le i\le l \\ 1\le j\le m}} \| T^n(af + bg) - (a f_i + b g_j) \| < \varepsilon \quad\text{for all } n \in \mathbb{N}
\]
Hence the function af + bg ∈ V and V is a vector space. By a method similar to the proof that
V is a vector space, we can show that the limit in L2 (X, A , µ) of any convergent sequence of
AP functions is AP and hence the space V is topologically closed.
To complete the proof we claim that for every A ∈ A the function 1A is AP. It then follows
by linearity that all simple functions are AP and thus, by the density of the space of simple
functions, V = L2 (X, A , µ) as required.
Consider the case A = F^{-1}(a, ∞) for some a ∈ R. It is easy to show that for any pair f, g ∈ V the functions min(f, g), max(f, g) ∈ V. In particular the function G = max(F − a, 0) and the functions H_n = min(1, nG) defined for all n ∈ N are AP. Now
\[
\| 1_A - H_n \|_{L^2(\mu)}^2 = \int_{\{x\in A \,:\, nG(x) < 1\}} |1 - nG|^2 \, d\mu \le \mu(\{ x \in A : nG(x) < 1 \})
\]
and thus
\[
\lim_{n\to\infty} \| 1_A - H_n \|_{L^2(\mu)} = 0
\]
Hence 1_A is the L²(X, A, µ)-limit of AP functions and is therefore AP. As the collection of sets of the form F^{-1}(a, ∞) together with {∅, X} is a generating algebra G for A, the general case A ∈ A follows by approximating A by a sequence of sets (A_n)_{n∈N} ⊂ G.
Applying Proposition 8.3 we restate Theorem 8.1 as the following:
Proposition 8.4. A system (X, B, µ, T ) is not weak mixing if and only if there exists a nonconstant A.P. function f ∈ L2 (X, B, µ).
Proof. Recall by Proposition 3.7 a system (X, B, µ, T) is ergodic if and only if all T-invariant functions are constant. Hence if (X, B, µ, T) is not ergodic the result is trivial: simply take f ∈ L²(X, B, µ) a non-constant T-invariant function. Clearly f is AP as the closure of the orbit clos{T^n f : n ∈ N} is just the (compact) singleton {f}.
Otherwise we consider (X, B, µ, T) an ergodic system which is not weak mixing. In this case, by Proposition 6.3 the product system X × X is not ergodic. It follows that there must exist some non-constant T × T-invariant function H ∈ L²(X × X).
Consider the convolution operator H∗ : L²(X, B, µ) → L²(X, B, µ) given by
\[
(H * \phi)(x) = \int H(x, x')\,\phi(x') \, d\mu(x') \quad\text{for all } \phi \in L^2(X, \mathcal{B}, \mu) \tag{8.1}
\]
We will show that every function in the image of this operator is AP. This will provide a wide selection of AP functions from which it is straightforward to find one which is non-constant.
It is an elementary fact from linear analysis that (8.1) is a (Hilbert-Schmidt and therefore)
compact operator (see [2, 24]). Thus for any R ≥ 0, the set
clos{H ∗ θ : kθkL2 (µ) ≤ R}
is compact in L2 (X, B, µ). We claim the following:
\[
T^n (H * \phi) = H * T^n \phi \quad\text{for all } \phi \in L^2(X, \mathcal{B}, \mu) \text{ and all } n \in \mathbb{N}
\]
This is easily deduced using the fact T is measure preserving and H is T × T-invariant:
\[
T^n(H * \phi)(x) = \int H(T^n x, x')\,\phi(x') \, d\mu(x') = \int H(T^n x, T^n x')\,\phi(T^n x') \, d\mu(x') = \int H(x, x')\,\phi(T^n x') \, d\mu(x') = (H * T^n \phi)(x)
\]
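The intertwining relation just derived can be checked in a finite-dimensional model. The following sketch is our own illustration, not part of the essay: on X = Z/mZ with normalised counting measure and T(x) = x + 1 mod m, a kernel of the form H(x, x′) = ψ(x − x′) is (T × T)-invariant, and T^n(H ∗ φ) = H ∗ (T^n φ) holds exactly.

```python
import numpy as np

# Finite sketch (not from the essay): a shift-invariant kernel on Z/mZ
# commutes with composition by the shift, mirroring the computation above.
rng = np.random.default_rng(0)
m, n = 12, 5
psi = rng.normal(size=m)
# H(x, x') = psi(x - x' mod m), which satisfies H(Tx, Tx') = H(x, x')
H = np.array([[psi[(x - xp) % m] for xp in range(m)] for x in range(m)])
phi = rng.normal(size=m)

def T_pow(f, j):
    # (T^j f)(x) = f(x + j mod m)
    return np.roll(f, -j)

def conv(K, f):
    # (K * f)(x) = (1/m) * sum_{x'} K(x, x') f(x')
    return K @ f / m

lhs = T_pow(conv(H, phi), n)
rhs = conv(H, T_pow(phi, n))
print(np.max(np.abs(lhs - rhs)))
```

Here H is a circulant matrix, the finite analogue of a Hilbert–Schmidt convolution operator; the same change of variables used in the displayed computation proves the identity.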
Now fix φ ∈ L2 (X, B, µ), let R = kφkL2 (µ) and note that kT n φkL2 (µ) = R for all n ∈ N. It
follows that
clos{T n H ∗ φ : n ∈ N} = clos{H ∗ T n φ : n ∈ N} ⊂ clos{H ∗ θ : kθk ≤ R}
Thus clos{T^n H ∗ φ : n ∈ N} is a closed subset of a compact set, and hence compact; that is, for each φ ∈ L²(X, B, µ) the function H ∗ φ ∈ L²(X, B, µ) is AP.
It remains to find a function φ̃ ∈ L²(X, B, µ) such that H ∗ φ̃ is non-constant. Consider the function H_1 ∈ L¹(X, B, µ) given by
\[
H_1(x') = \int H(x, x') \, d\mu(x)
\]
It is easy to see H_1 is T-invariant:
\[
H_1(T x') = \int H(x, T x') \, d\mu(x) = \int H(T x, T x') \, d\mu(x) = \int H(x, x') \, d\mu(x) = H_1(x')
\]
Recalling our assumption (X, B, µ, T ) is ergodic, this function must therefore be constant,
H1 ≡ c, say.
Now H − c is a non-constant T × T -invariant function, so all our previous reasoning holds
if we replace H with H − c. Thus we may assume without loss of generality that H has the
property H1 ≡ 0.
By definition H itself is non-constant and in particular H is not almost everywhere 0. Thus there exists some φ̃ ∈ L²(X, B, µ) such that
\[
(H * \tilde\varphi)(x) = \int H(x, x')\,\tilde\varphi(x') \, d\mu(x') \not\equiv 0
\]
It follows from our assumption H_1 ≡ 0 and Fubini's Theorem that
\[
\int (H * \tilde\varphi)(x) \, d\mu(x) = \iint H(x, x')\,\tilde\varphi(x') \, d\mu(x')\,d\mu(x) = \int \Big( \int H(x, x') \, d\mu(x) \Big)\,\tilde\varphi(x') \, d\mu(x') = 0
\]
As the function H ∗ φ̃ ∈ L²(X, B, µ) is non-zero but has zero expectation, it must be non-constant as required.
For the converse, suppose (X, B, µ, T) is a weak mixing system. We will show that any AP function is constant.
Let ε > 0 be given and suppose f ∈ L²(X, B, µ) is AP. If f ≡ 0 we are done, so we may assume ‖f‖_{L²(µ)} ≠ 0. By Proposition 7.1 there exists a syndetic set G ⊆ N such that
\[
\| T^n f - f \|_{L^2(\mu)} < \frac{\varepsilon}{2\|f\|_{L^2(\mu)}} \quad\text{for all } n \in G \tag{8.2}
\]
On the other hand, if we recall the definition of weak mixing we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int f\, T^n f \, d\mu - \Big( \int f \, d\mu \Big)^2 \right|^2 = 0
\]
Using the alternative characterisation of convergence in the mean, stated in Theorem 2.1, there exists a set Z ⊆ N of zero density such that
\[
\left| \int f\, T^n f \, d\mu - \Big( \int f \, d\mu \Big)^2 \right| < \frac{\varepsilon}{2} \quad\text{for all } n \in \mathbb{N}\setminus Z \tag{8.3}
\]
Obviously G ∩ (N \ Z) ≠ ∅, as otherwise G ⊆ Z and G would be a set of positive lower density contained in a set of zero density, a contradiction. Thus there exists some n_0 ∈ N such that both inequalities (8.2) and (8.3) hold for n = n_0.
Finally, by the triangle and Cauchy–Schwarz inequalities
\[
\left| \int f^2 \, d\mu - \Big( \int f \, d\mu \Big)^2 \right| \le \left| \int f\,(T^{n_0} f - f) \, d\mu \right| + \left| \int f\, T^{n_0} f \, d\mu - \Big( \int f \, d\mu \Big)^2 \right| < \|f\|_{L^2(\mu)} \| T^{n_0} f - f \|_{L^2(\mu)} + \frac{\varepsilon}{2} < \varepsilon
\]
As ε is arbitrary we must have ∫ f² dµ = (∫ f dµ)². The function x ↦ x² on R is strictly convex and we conclude that for this equality to hold f must be constant.
8.1 Proof of the Multiple Recurrence Theorem: The Strategy
We are now in a position to sketch the details of the strategy used to prove the Multiple
Recurrence Theorem. The language used here is somewhat vague and naı̈ve, but it is hoped
that this brief description will give the reader some sense of what we are aiming to achieve.
Essentially the proof is a standard Zorn's Lemma argument. Using our new terminology, we are required to prove that given any system (X, B, µ, T) the action of T on (X, B, µ) is SZ. Let F denote the collection of factors A ⊆ B for which the action of T on (X, A, µ) is SZ, ordered by inclusion. By our previous work F is non-empty and in particular contains some non-trivial factor. In Section 10 we use Zorn's Lemma to show F contains a maximal element. Sections 11 to 13 then concentrate on showing that no proper factor can be maximal and hence this maximal SZ factor must be the whole of B.
In order to prove the latter claim we embark on a programme of ‘relativising’ the notions
introduced in the previous sections. In particular, in Section 11 we concentrate on defining
‘weak mixing relative to (X, A , µ, T )’. The most significant result of this will be:
If the action of T on (X, A, µ) is SZ and (X, B, µ, T) is weak mixing relative to (X, A, µ, T) then the action of T on (X, B, µ) is also SZ.
Similarly, in Section 12 we define what is meant for a system to be 'relatively compact' and derive an entirely complementary result for the case (X, B, µ, T) compact relative to (X, A, µ, T).
If the action of T on (X, A, µ) is SZ and (X, B, µ, T) is compact relative to (X, A, µ, T) then the action of T on (X, B, µ) is also SZ.
The final step in proving the Multiple Recurrence Theorem will be to prove a dichotomy between
relative weak mixing and relative compact systems. This result is very similar in flavour to
Theorem 8.1. In particular we show
For any factor (X, A , µ, T ) of (X, B, µ, T ) either:
• (X, B, µ, T) is weak mixing relative to (X, A, µ, T), or
• There exists some intermediate factor A ⊆ B ∗ ⊆ B such that A is a proper
factor of B ∗ and (X, B ∗ , µ, T ) is compact relative to (X, A , µ, T ).
Combining these three results we conclude that no proper factor can be a maximal SZ factor
and the proof of the Multiple Recurrence Theorem is complete.
Although the terminology used here may be as yet unclear to the reader, it should be obvious
that the remainder of the proof mirrors to some extent the work we have done so far. In short,
we prove a result for two opposite cases and then demonstrate and exploit a relationship between
these extremes.
The definitions of ‘relatively compact’ and ‘relatively weak mixing’ will arise from considering
conditional expectation and measure. In the following section we develop important measure
theoretical techniques, generalising the familiar idea of conditional expectation of a discrete
random variable from elementary probability theory. This will allow us to define a system of
probability measures µy which are ‘conditioned on the factor A ’. The definitions of weak mixing
relative to (X, A , µ, T ) and compact relative to (X, A , µ, T ) are then simply analogies of the
familiar definitions of weak mixing and compact systems which use these conditional measures.
9 Factors of Measure Preserving Systems
In the previous section we made significant progress toward the proof of the Multiple Recurrence
Theorem. We demonstrated that for any measure preserving system (X, B, µ, T ) there exists
some non-trivial factor (X, A , µ, T ) for which the action of T is SZ. Now we develop the theory
necessary in order to make a serious study of factors, introducing certain tools which will allow
us to relate the properties of the factors to the properties of an entire system.
9.1 Conditional Expectation
9.1.1 Motivation
This example serves as a reminder of the simple notion of conditional expectation assumed familiar to the reader13.
Example 5. Suppose (Ω, F, P) is a probability space and f, g are random variables, each with a finite number of outcomes,
\[
f : \Omega \to \{ a_1, \dots, a_n \}, \qquad g : \Omega \to \{ b_1, \dots, b_m \}
\]
Then the conditional probability that f = a_i given that g = b_j is defined as
\[
P(f = a_i \mid g = b_j) = \frac{P(f = a_i,\, g = b_j)}{P(g = b_j)}
\]
This leads to the definition of the conditional expectation of f, given g = b_j:
\[
E(f \mid g = b_j) = \sum_{i=1}^{n} a_i\, P(f = a_i \mid g = b_j)
\]
The conditional expectation of f given g is then defined as a random variable E(f | g):
\[
E(f \mid g)(\omega) = E(f \mid g = b_j) \quad\text{for } \omega \in g^{-1}(b_j)
\]
Implicit in this definition is the fact that the sets D_j = g^{-1}({b_j}) for j = 1, . . . , m are disjoint and partition Ω and that the conditional expectation is constant on each of the D_j. An important idea to grasp is that by defining G := σ({D_j : j = 1, . . . , m}) = σ(g), we may think of E(f | g) as a G-measurable function. Furthermore, if we give the set Ω′ = {b_j}_{j=1}^{m} the push-forward measure P′ = g_*P induced by g, that is
\[
P'(\{ b_j \}) = P(g^{-1}\{ b_j \}) \quad\text{for } j = 1, \dots, m
\]
then
\[
g : (\Omega, \mathcal{F}, P) \to (\Omega', \mathcal{P}(\Omega'), P') \tag{9.1}
\]
is a measure preserving map and we may equally consider E(f | g) a function on Ω′.
Our aim will be to generalise this notion of conditional expectation so that f, g may be taken to be continuous random variables. This is rather a non-trivial task and in some cases no meaningful analogy is available (although in many reasonable cases, and in particular when we are dealing with Borel probability spaces, such an analogy does exist).
13 Although it is stressed in advance that we will not, in fact, be adopting any decidedly probabilistic viewpoint in any part of this study.
Our approach will be to consider a measure preserving map θ : (X, B, µ) → (Y, D, ν) (c.f. g in (9.1)) and define for all f ∈ L1(X, B, µ) the conditional expectation as a function E(f | Y) ∈ L1(Y, D, ν). Intuitively, we think of E(f | Y) as a direct analogue of the simple notion of conditional expectation from Example 5: that is, for every y ∈ Y we have
E(f | Y)(y) = E(f | θ = y)
This, however, is not a precise way of thinking; it is clear that much more subtlety is required14.
The second part of this section will relate the theory of conditional expectation to the study of factors by introducing the notion of an extension of measure preserving systems. The probabilistic context is not vital for this study: we shall introduce conditional expectation as an object of linear algebra and not probability theory15. However, understanding the major result of this section, Theorem 9.4, is greatly aided by some probabilistic intuition.
9.1.2 Definition and Basic Properties
We follow the construction of the conditional expectation as given in ([8] Chapter 5). Let θ : (X, B, µ) → (Y, D, ν) denote a measure preserving map between measure spaces. There is a naturally defined lift
θ̃ : L2(Y, D, ν) → L2(X, B, µ)
given by θ̃(f) = fθ, where fθ(x) = f(θ(x)) for all x ∈ X. Now, it is easy to see L2(Y, D, ν)θ ⊆ L2(X, B, µ) is a closed, linear subspace. We may use orthogonal projection to relate the whole space to this subspace, and then map back to L2(Y, D, ν) in order to obtain a map L2(X, B, µ) → L2(Y, D, ν).
Definition 18. Define the conditional expectation operator16
E(· | Y) : L2(X, B, µ) → L2(Y, D, ν)
by
E(f | Y)θ = P f    for all f ∈ L2(X, B, µ)
where P is the orthogonal projection of L2(X, B, µ) onto the closed subspace L2(Y, D, ν)θ.
It is clear that E(· | Y) is a linear operator and furthermore it is a contraction. Note also
that if f ∈ L2 (Y, D, ν) then E(f θ | Y) = f . Using elementary Hilbert space theory we establish
the following:
Proposition 9.1. Let f ∈ L2(X, B, µ) and h ∈ L2(Y, D, ν). Then
∫ f hθ dµ = ∫ E(f | Y) h dν    (9.2)
In particular, taking h = 1A for any A ∈ D we have
∫_{θ−1(A)} f dµ = ∫_A E(f | Y) dν    for all f ∈ L2(X, B, µ)    (9.3)
Finally, if g ∈ L∞(Y) then
E(gθ f | Y) = g E(f | Y)    (9.4)
14 For instance, are all the singletons {y} measurable; are the functions E(f | Y) necessarily everywhere defined?
15 Of course a probabilistic definition does exist; see for instance ([5] Chapter 5; [25] Chapter 9).
16 Provided there is no risk of ambiguity, throughout the remainder of this discourse X will denote the measure space (X, B, µ) and Y the measure space (Y, D, ν).
Proof. Note that L2(Y, D, ν)θ is a closed subspace of the Hilbert space L2(X, B, µ) and so the projection P satisfies ⟨P f, k⟩ = ⟨f, k⟩ for all k ∈ L2(Y, D, ν)θ. Thus, as θ is measure preserving,
∫ f hθ dµ = ∫ P f hθ dµ = ∫ E(f | Y)θ hθ dµ = ∫ E(f | Y) h dν
giving (9.2) and (9.3). To show (9.4), note that for any g ∈ L∞(Y) we have gh ∈ L2(Y) and f gθ ∈ L2(X). Applying (9.2) to gh gives
∫ f (gh)θ dµ = ∫ E(f | Y) gh dν
On the other hand,
∫ f (gh)θ dµ = ∫ (f gθ) hθ dµ = ∫ E(f gθ | Y) h dν
Thus ∫ E(f | Y) gh dν = ∫ E(f gθ | Y) h dν for all h ∈ L2(Y), so we must have (9.4) by the elementary properties of the inner product.
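On a finite probability space the conditional expectation E(f | Y) is just a fibre-wise average, and the identities (9.2) and (9.4) can be checked exactly. The sketch below is illustrative only: the spaces, the map θ and the test functions are assumptions of the example, not taken from the essay.

```python
# Exact check of (9.2) and (9.4) on finite probability spaces (illustrative).
from fractions import Fraction

X = range(6)
Y = range(3)
mu = {x: Fraction(1, 6) for x in X}
theta = lambda x: x % 3                  # measure preserving: nu = theta_* mu is uniform on Y
nu = {y: Fraction(1, 3) for y in Y}

def E(f):
    """Conditional expectation E(f | Y): the average of f over each fibre theta^{-1}(y)."""
    return {y: sum(f(x) * mu[x] for x in X if theta(x) == y) / nu[y] for y in Y}

f = lambda x: x * x
h = lambda y: y + 1
g = lambda y: 2 * y

# (9.2): the adjoint identity  int f (h o theta) dmu = int E(f|Y) h dnu
lhs = sum(f(x) * h(theta(x)) * mu[x] for x in X)
rhs = sum(E(f)[y] * h(y) * nu[y] for y in Y)
assert lhs == rhs

# (9.4): the pull-out property  E((g o theta) f | Y) = g E(f | Y)
Egf = E(lambda x: g(theta(x)) * f(x))
assert all(Egf[y] == g(y) * E(f)[y] for y in Y)
```

Because g ∘ θ is constant on each fibre, the pull-out property holds exactly, mirroring the Hilbert-space argument above.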
Proposition 9.2. For any f ∈ L2(X, B, µ) with f(x) ≥ 0 for almost every x ∈ X we have E(f | Y)(y) ≥ 0 for almost every y ∈ Y.
Proof. Suppose f ∈ L2(X, B, µ) with f ≥ 0, that is f(x) ≥ 0 for almost every x ∈ X. The subspace L2(Y, D, ν)θ ⊆ L2(X, B, µ) is closed under f1, f2 ↦ max(f1, f2), so that g = max(E(f | Y), 0)θ ∈ L2(Y, D, ν)θ. Now suppose E(f | Y) ≥ 0 fails on a set of positive ν-measure. Then g ≠ E(f | Y)θ and, as f ≥ 0 almost everywhere,
‖f − g‖L2(µ) < ‖f − E(f | Y)θ‖L2(µ)
But this contradicts the closest point property of orthogonal projections onto closed linear subspaces. Hence E(f | Y) ≥ 0 almost everywhere.
Proposition 9.3. The conditional expectation operator extends to an operator E( · | Y) : L1 (X, B, µ) →
L1 (Y, D, ν) satisfying all previously established properties mutatis mutandis.
Proof. The result follows by the density of L2 (X, B, µ) in L1 (X, B, µ) and applying standard
Banach space theory. The details are omitted; see [5] or [8] for the full proof.
9.1.3 Disintegration of Measure
We are now in a position to present the main result of this section which makes explicit the sense
by which the conditional expectation operator generalises the notion of conditional expectation
as detailed in Example 5. The statement and subsequent proof of the following theorem are
adapted from ([5] Chapter 5; [8] Chapter 5).
Theorem 9.4 (Disintegration Theorem). Let θ : (X, B, µ) → (Y, D, ν) be a measure preserving map between probability spaces, where we assume (X, B, µ) is a Borel probability space. Then there exists a D-measurable set Y′ ⊆ Y of measure 1 and a system of probability measures {µy : y ∈ Y′} on (X, B) such that
1. For all f ∈ L1(X, B, µ) and y ∈ Y′, f ∈ L1(X, B, µy) and
∫ f dµy = E(f | Y)(y)    (9.5)
2. For all f ∈ L1(X, B, µ) the ν-almost everywhere defined function
y ↦ ∫ f dµy
is ν-measurable. Furthermore,
∫ (∫ f dµy) dν(y) = ∫ f dµ    (9.6)
Example 6. Inspired by Example 5, let (X, B, µ) be any probability space satisfying the
hypotheses of Theorem 9.4 and θ ∈ L1 (X, B, µ) be any simple function. Then there exists some
(minimal) finite subset {b1 , . . . , bm } ⊂ R such that
θ(x) ∈ {b1 , . . . , bm }
for almost every x ∈ X
Let Bi = θ−1({bi}) for each i ∈ {1, . . . , m}. If y = bi then for any f ∈ L1(X, B, µ) observe
∫ f dµy = (1/µ(Bi)) ∫_{Bi} f dµ
Hence on each of the Bi the value of ∫ f dµy is the average value of f over the region Bi. In particular, if A ∈ B then
µy(A) = µ(Bi ∩ A) / µ(Bi)
Thus the conditional measures and expectations generalise the simple case of Example 5.
As the above example is somewhat degenerate we now turn to a more appropriate example.
Example 7. Denote (X, B, µ) the unit square [0, 1] × [0, 1] with the usual Lebesgue measure
and (Y, D, ν) the unit interval [0, 1] again with Lebesgue measure, so that µ = ν ⊗ ν. If we let
θ denote the projection onto the first co-ordinate,
θ : (X, B, µ) → (Y, D, ν) ; (x1, x2) ↦ x1
then θ is a measure preserving map. For any f ∈ L1(X, B, µ) the conditional expectation is given by
E(f | Y)(x1) = ∫ f(x1, x2) dν(x2)
For any a ∈ Y the set A = {a} × [0, 1] ∈ B is µ-null. However,
µa(A) = ∫ 1A(a, x2) dν(x2) = ∫ dν(x2) = 1
Hence the measure µa is supported on a µ-null set.
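Although the fibre measures in Example 7 live on µ-null sets, the disintegration identity (9.6) is easy to test numerically. The sketch below approximates E(f | Y)(x1) = ∫ f(x1, x2) dν(x2) by a midpoint rule and checks that the iterated integral recovers ∫ f dµ; the grid size and the test function f(x1, x2) = x1 x2 are illustrative choices, not from the essay.

```python
# Numerical sketch of Example 7: disintegrating Lebesgue measure on the unit
# square over the first co-ordinate (midpoint-rule quadrature; f illustrative).
n = 200
pts = [(i + 0.5) / n for i in range(n)]      # midpoint grid on [0, 1]

f = lambda x1, x2: x1 * x2

# E(f | Y)(x1) = integral of f(x1, x2) dnu(x2); for this f it equals x1 / 2
Ef = {x1: sum(f(x1, x2) for x2 in pts) / n for x1 in pts}
assert all(abs(Ef[x1] - x1 / 2) < 1e-9 for x1 in pts)

# (9.6): the iterated integral of the disintegration recovers the full integral
iterated = sum(Ef[x1] for x1 in pts) / n
full = sum(f(x1, x2) for x1 in pts for x2 in pts) / n**2
assert abs(iterated - full) < 1e-9
assert abs(full - 0.25) < 1e-6               # the exact double integral is 1/4
```

The point of the sketch is that the conditional expectation here is computed fibre by fibre, exactly as in the formula of the example.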
In proving Theorem 9.4 the idea is to fix y ∈ Y and use the Riesz Representation Theorem to associate a measure to the (restricted) functional E(· | Y)(y) : C(X) → R. This approach, however, is naive and flawed: each E(f | Y) ∈ L1(Y, D, ν) is only defined up to null-functions, and so for any given f ∈ C(X) the value E(f | Y)(y) is only defined for ν-almost every y. Happily, one can show the set on which the definition can fail is a countable union of null sets. The measures µy can then be defined for ν-almost every y ∈ Y, as we will presently demonstrate.
Proof (of Theorem 9.4). First note that once Part 1 of the theorem holds for some f ∈ L1 (X, B, µ),
Part 2 holds for f by virtue of the properties of the conditional expectation operator. Hence we
are only required to prove Part 1.
Recall that for the compact metric space X the space (C(X), ‖·‖∞) is separable. Therefore there exists a countable, dense set of functions {h0 = 1X, h1, h2, . . . } ⊂ C(X). The set of linear combinations of these functions over Q is a countable, dense vector space lying within C(X). We write this space as
F = {f0 = 1X, f1, f2, . . . } ⊂ C(X)
Recall, E(· | Y) : L1(X, B, µ) → L1(Y, D, ν) is a linear, bounded (with operator norm 1) and positive operator. Thus, fixing n ∈ N:
1. If fn = p fi + q fj for some p, q ∈ Q and i, j ∈ N then
E(fn | Y)(y) = p E(fi | Y)(y) + q E(fj | Y)(y)
ν-almost everywhere.
2. |E(fn | Y)(y)| ≤ ‖fn‖∞ ν-almost everywhere.
3. If fn ≥ 0 then E(fn | Y)(y) ≥ 0 ν-almost everywhere.
Thus, the set An ⊆ Y on which conditions 1, 2 and 3 hold for the function fn is of measure 1. The set Y′ on which the conditions hold for all f ∈ F is also of measure 1: it is given by the countable intersection ∩_{n∈N} An.
For each n ∈ N let gn ∈ L1(Y, D, ν) be an (everywhere defined) function representing the equivalence class E(fn | Y) ∈ L1(Y, D, ν). For each y ∈ Y′ we may then define a functional Ey : F → R by Ey(fn) = gn(y). By the construction of the set Y′ this functional is Q-linear, bounded and positive.
As the set F is dense in C(X), by the elementary theory of linear operators we may extend Ey to a continuous Q-linear functional
Ey : C(X) → R
It then follows by continuity that Ey is R-linear. By the Riesz Representation Theorem, for every y ∈ Y′, as Ey ∈ C(X)∗ it can be represented by a signed measure µy on (X, B) with
Ey(f) = ∫ f dµy    for all f ∈ C(X)
Moreover, Ey is a positive functional and by Proposition 9.1 we have Ey(f0) = E(1X | Y)(y) = 1. It follows that for all y ∈ Y′ the measure µy is a probability measure on (X, B).
We have obtained our system of measures {µy : y ∈ Y 0 }. It remains to show the properties
enumerated in the statement of the theorem.
By definition, for all fn ∈ F we have
∫ fn dµy = E(fn | Y)(y)    for ν-almost every y ∈ Y′    (9.7)
Thus the theorem holds for all fn ∈ F . We will use approximations to extend the result to
all f ∈ L1 (X, B, µ).
By the density of F, given any g ∈ C(X) there exists a sequence (fni)i∈N ⊆ F such that lim_{i→∞} ‖g − fni‖∞ = 0. Therefore, by (9.7) we immediately deduce the theorem holds for all continuous functions.
As the continuous functions are dense in L1(X, B, µ), for any f ∈ L1(X, B, µ) there exists a sequence (gn)n∈N ⊂ C(X) with
Σ_{n=1}^∞ ‖gn‖L1(µ) < ∞    such that    ‖Σ_{n=1}^∞ gn − f‖L1(µ) = 0
Note that by the preceding argument, for almost every y ∈ Y and each n ∈ N we have E(gn | Y)(y) = ∫ gn dµy.
The conditional expectation operator is a contraction. As ‖E(|gn| | Y)‖L1(ν) ≤ ‖gn‖L1(µ) it follows Σ_{n=1}^∞ ‖E(|gn| | Y)‖L1(ν) < ∞. In particular
Σ_{n=1}^∞ E(|gn| | Y)(y) < ∞    for almost every y ∈ Y
At a point y ∈ Y at which the above sum converges we have
Σ_{n=1}^∞ |gn| ∈ L1(X, B, µy)    (9.8)
For each N ∈ N and almost every y ∈ Y the function Σ_{n=1}^N gn ∈ L1(X, B, µy) is dominated by (9.8) and hence by dominated convergence
∫ Σ_{n=1}^∞ gn dµy = Σ_{n=1}^∞ ∫ gn dµy = Σ_{n=1}^∞ E(gn | Y)(y)    for almost every y ∈ Y
Equivalently, thinking of y ↦ ∫ Σ_{n=1}^N gn dµy as an L1(Y, D, ν) function,
lim_{N→∞} ‖Σ_{n=1}^N E(gn | Y) − ∫ Σ_{n=1}^N gn dµ(·)‖L1(ν) = 0    (9.9)
On the other hand,
‖Σ_{n=1}^N |E(gn | Y)|‖L1(ν) ≤ Σ_{n=1}^N ‖E(gn | Y)‖L1(ν) ≤ Σ_{n=1}^N ‖gn‖L1(µ) < ∞
and so ‖Σ_{n=1}^∞ |E(gn | Y)|‖L1(ν) < ∞. Hence, recalling (9.9), we conclude
∫ Σ_{n=1}^∞ gn dµy = E(Σ_{n=1}^∞ gn | Y)(y) = E(f | Y)(y)    for almost every y ∈ Y
It remains to show
∫ f dµy = ∫ Σ_{n=1}^∞ gn dµy    for almost every y ∈ Y    (9.10)
In particular, let A ∈ B be the set on which Σ_{n=1}^∞ gn(x) ≠ f(x). Then µ(A) = 0, so it is clear that (9.10) follows once we have shown A is also µy-null for almost all y ∈ Y.
The open subsets of X generate B and so there exists a sequence (An)n∈N of open sets such that A ⊆ An and µ(An) → 0 as n → ∞. For each n ∈ N and ε > 0 let hεn ∈ C(X) be supported on clos(An) with 0 ≤ hεn ≤ 1An and hεn ↑ 1An pointwise as ε → 0 (such functions exist by Urysohn's Lemma, as An is open). Then
∫ (∫ hεn dµy) dν(y) = ∫ hεn dµ ≤ µ(An)
Letting ε → 0, by monotone convergence we have
∫ µy(An) dν(y) ≤ µ(An)
and so, by letting n → ∞,
∫ µy(A) dν(y) = 0
Thus µy(A) = 0 for almost every y ∈ Y as required.
9.2 Extensions of Measure Preserving Systems
So far we have only studied maps θ between measure spaces, rather than measure preserving
systems. In order to apply the preceding theory to the ergodic theory setting we make the
following definition.
Definition 19. Let α : (X, B, µ, T ) → (Y, D, ν, S) be a measurable map between measure
preserving systems. If α preserves the structure of the systems in the sense α ◦ T = S ◦ α then
α is called an extension of measure preserving systems.
Proposition 9.5. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of invertible measure preserving systems. The conditional expectation operator E(· | Y) : L1(X, B, µ) → L1(Y, D, ν) satisfies
S E(f | Y) = E(T f | Y)    for all f ∈ L1(X, B, µ)
Proof. Let f ∈ L1(X, B, µ). By Proposition 9.1 we have
∫ E(T f | Y) h dν = ∫ (T f) hα dµ    for all h ∈ L∞(Y, D, ν)
As T is measure preserving so is T−1 and hence
∫ (T f) hα dµ = ∫ f (T−1hα) dµ
Now, by the property of extensions, T−1hα = (S−1h)α. Putting everything together,
∫ E(T f | Y) h dν = ∫ f (S−1h)α dµ = ∫ E(f | Y) S−1h dν = ∫ S E(f | Y) h dν
As this holds for all h ∈ L∞(Y, D, ν), we conclude S E(f | Y) = E(T f | Y).
Corollary 9.6. With the same hypotheses as the previous proposition,
µy(T−1A) = µSy(A)    for all A ∈ B
for all y ∈ Y such that the conditional measures µy and µSy are defined.
Proof. Let A ∈ B and y ∈ Y be such that µy and µSy are defined in accordance with Theorem 9.4. Note that T1A = 1T−1A so by Proposition 9.5
S E(1A | Y)(y) = E(1T−1A | Y)(y) = ∫ 1T−1A dµy = µy(T−1A)
On the other hand,
S E(1A | Y)(y) = E(1A | Y)(Sy) = ∫ 1A dµSy = µSy(A)
Thus µy(T−1A) = µSy(A) as required.
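The equivariance µy(T−1A) = µSy(A) can be checked exhaustively on a finite extension. In the sketch below we take the rotation on Z6 factoring onto the rotation on Z3; the whole setup is an illustrative toy example, not drawn from the essay.

```python
# Finite sketch of Corollary 9.6: X = Z_6 with T x = x + 1, factor Y = Z_3 with
# S y = y + 1, and alpha(x) = x mod 3 (so alpha o T = S o alpha). The conditional
# measure mu_y is uniform on the fibre {y, y + 3}. All names are illustrative.
from fractions import Fraction
from itertools import combinations

X = range(6)
T = lambda x: (x + 1) % 6
S = lambda y: (y + 1) % 3
alpha = lambda x: x % 3

def mu_y(y, A):
    """Conditional measure of A given the fibre alpha^{-1}(y)."""
    fibre = [x for x in X if alpha(x) == y]
    return Fraction(sum(1 for x in fibre if x in A), len(fibre))

# Check mu_y(T^{-1} A) = mu_{S y}(A) for every subset A of X and every y
for r in range(7):
    for A in combinations(X, r):
        Tinv_A = {x for x in X if T(x) in A}
        for y in range(3):
            assert mu_y(y, Tinv_A) == mu_y(S(y), set(A))
```

The check works because T maps the fibre over y bijectively onto the fibre over Sy, which is precisely the mechanism behind the corollary.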
9.3 Factors and Extensions of Measure Preserving Systems
In this section we will demonstrate a correspondence between the factors of a system and extensions which will allow us to apply the notion of conditional measures to the study of factors. Our work here is adapted from ([5] Chapter 5). It is easy to see that any extension of measure preserving systems induces a factor.
Lemma 9.7. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of measure preserving systems. Then A = α−1D is a T-invariant sub-σ-algebra of B and hence defines a factor of (X, B, µ, T).
Proof. It is clear that A is a σ-algebra on X and, as α is measurable, A ⊆ B. All that remains is to show A is T-invariant. Let B ∈ A; then B = α−1D for some set D ∈ D. Using the property of an extension,
T−1B = {x ∈ X : α ◦ T(x) ∈ D} = {x ∈ X : S ◦ α(x) ∈ D} = α−1(S−1D)
As S is D-measurable, S−1D ∈ D and so T−1B ∈ A as required.
Our aim will be to prove a converse. Before we do this we note an important fact about Borel probability spaces (X, B, µ): any sub-σ-algebra A ⊆ B is 'almost generated by a countable collection of sets'.
Definition 20. A σ-algebra A on a set X is countably generated if there exists a countable collection of sets {An : n ∈ N} ⊂ P(X) such that A = σ(A1, A2, . . . ).
Definition 21. Let B be a σ-algebra and A, A′ ⊆ B sub-σ-algebras. We write A ⊆µ A′ to denote the relation: for all A ∈ A there exists some A′ ∈ A′ such that µ(A△A′) = 0. Furthermore, we write A =µ A′ whenever A ⊆µ A′ and A′ ⊆µ A.
Proposition 9.8. If (X, B, µ) is a Borel probability space and A ⊆ B is a sub-σ-algebra then there exists a countably generated sub-σ-algebra A′ = σ(A1, A2, . . . ) such that A =µ A′.
Proof. As C(X) is a separable metric space and dense in L1(X, B, µ) it follows that L1(X, B, µ) is separable. In particular, the collection of characteristic functions
{1A : A ∈ A} ⊆ L1(X, A, µ) ⊆ L1(X, B, µ)
is separable. Let A1, A2, . . . be such that {1An : n ∈ N} is a dense subset of {1A : A ∈ A} and denote A′ = σ(A1, A2, . . . ). Then A′ ⊆ A. Moreover, by density, for any A ∈ A there exists a sequence (nk)k∈N of positive integers such that
µ(A△Ank) = ‖1A − 1Ank‖L1(µ) < 2−k
Denote A0 = ∩_{N∈N} ∪_{k≥N} Ank. Then A0 ∈ A′ and, by the Borel–Cantelli Lemma applied to the sets A△Ank, µ(A△A0) = 0 as required.
We now turn to the converse of Lemma 9.7.
Theorem 9.9. Let (X, B, µ, T) be a measure preserving system defined on a Borel probability space and A a T-invariant sub-σ-algebra of B. Then there exists a system (Y, D, ν, S) defined on a Borel probability space and an extension α : (X, B, µ, T) → (Y, D, ν, S) such that A =µ α−1D.
At first glance the theorem may appear somewhat trivial: why not take (Y, D, ν, S) = (X, A, µ, T) and α the identity map? Of course, the key to this result is that (Y, D, ν) is a Borel probability space: we cannot guarantee that this holds in general for (X, A, µ). Our entire theory of conditional measure is reliant on the spaces in question being Borel probability spaces and thus this theorem is necessary when dealing with important situations involving intermediate extensions.
We will take Y in the statement of Theorem 9.9 to be the (compact, metrizable) space M (X)
of Borel probability measures on (X, B). First we show that a canonical Borel probability system
may be constructed on this space; once this is accomplished, Theorem 9.9 follows with ease.
Proof (of Theorem 9.9). As we demonstrated in Section 3.4, the set Y = M(X) with the weak* topology is a compact, metrizable space. Let D be the Borel σ-algebra generated by the weak*-topology. In order to find a natural choice of measure on (Y, D) we use the Disintegration Theorem. Note that the identity on X induces a map
ι : (X, B, µ) → (X, A, µ) ; x ↦ x
from a Borel probability space which is (trivially) measure preserving. By disintegration we have an almost everywhere defined measurable map
α : (X, A, µ) → (Y, D) ; y ↦ µy    (9.11)
taking y ∈ X to the conditional measure µy. We give (Y, D) the pull-back measure ν = α∗µ induced by this map, so that
ν(D) = µ(α−1D)    for all D ∈ D
Finally we wish to define a natural measure preserving transformation on the Borel probability space (Y, D, ν). This is marginally trickier than the previous constructions.
Claim. The pull-back map S = T∗ : (Y, D, ν) → (Y, D, ν) defined by
Sλ(A) = λ(T−1A)    for all λ ∈ M(X) and all A ∈ B
is measurable. Moreover, it preserves the measure ν.
Proof (of Claim). We follow the argument given in ([5] Chapter 5). In order to prove S is measurable we must first make some observations concerning the topology that generates D. In particular, it is easy to see by the definition of the weak*-topology that a set U ⊆ M(X) is open precisely when for every κ ∈ U there exist functions f1, . . . , fr ∈ C(X) and constants ε1, . . . , εr > 0 such that
{λ ∈ M(X) : |∫ fi dλ − ∫ fi dκ| < εi for all i = 1, . . . , r} ⊆ U
Recall C(X) is separable. Let (gn)n∈N ⊂ C(X) be a dense subset of C(X) and ε > 0 be given. Then for any f ∈ C(X) there exists an n0 ∈ N such that ‖f − gn0‖∞ < ε/3. If λ, κ ∈ M(X) are such that
|∫ gn0 dλ − ∫ gn0 dκ| < ε/3
then by the triangle inequality
|∫ f dλ − ∫ f dκ| < ε
It follows that the collection of all finite intersections of sets of the form
{λ ∈ M(X) : |∫ gn dλ − ∫ gn dκ| < ε}
for some κ ∈ M(X), ε > 0 and n ∈ N is a basis for the topology on M(X). Moreover, by the density of the rationals, the collection of finite intersections of sets of the form
U(n,r,q) = {λ ∈ M(X) : |∫ gn dλ − r| < q}
for some n ∈ N and r, q ∈ Q is a countable basis for the topology on M(X). Now,
S−1U(n,r,q) = {λ ∈ M(X) : |∫ T gn dλ − r| < q}
is clearly measurable. Moreover, for any open U ⊆ M(X), S−1U is a countable union of finite intersections of measurable sets and therefore measurable. The open subsets of M(X) form a generating algebra for D and hence S is D-measurable.
It remains to prove S preserves the measure ν. By Corollary 9.6 applied to the trivial extension ι, for all A ∈ B we have µTy(A) = µy(T−1A) = Sµy(A), and so
S ◦ α = α ◦ T    (9.12)
It follows that for any D ∈ D we have
α−1S−1(D) = T−1α−1(D)
Finally, as T preserves µ,
ν(S−1D) = µ(α−1S−1(D)) = µ(T−1α−1(D)) = µ(α−1(D)) = ν(D)
and hence S preserves ν.
Now that the measure preserving system (Y, D, ν, S) has been constructed, most of the work
in proving Theorem 9.9 has been done. Through a slight abuse of notation consider
α : (X, B, µ, T ) → (Y, D, ν, S)
Then, as we have shown, (Y, D, ν, S) is a measure preserving system and by (9.12), α is an
extension of measure preserving systems.
We are required to prove A =µ α−1D. To see this, note that as (X, B, µ) is a Borel probability space, by Proposition 9.8 there exists some countably generated sub-σ-algebra A′ such that A =µ A′. As the =µ relation is transitive, we may therefore assume without loss of generality that A = σ(A1, A2, . . . ) is countably generated.
To see A ⊆µ α−1D it is enough to show that for each n ∈ N there exists some set A′n ∈ α−1D such that µ(An△A′n) = 0.
Consider the D-measurable sets Dn = {λ ∈ Y : λ(An) = 1} and note that
α−1Dn = {x ∈ X : µx(An) = 1}
Recalling that the conditional measures are defined through the trivial extension ι, it is easy to see
µx(An) = 1An(x)    for almost every x ∈ X
Define
X′ = ∩_{n∈N} {x ∈ X : µx(An) = 1An(x)}
Then µ(X′) = 1, and for each n the sets A′n = α−1Dn ∈ α−1D and An agree on X′, so that µ(A′n△An) = 0 for all n ∈ N.
Conversely, to prove α−1D ⊆µ A it suffices to show α−1(U) ∈ A for all the basic open sets U constructed in the proof of the Claim, as these sets generate D. But this is immediate, as for any f ∈ C(X) and any r, ε > 0 we have
α−1({λ ∈ M(X) : |∫ f dλ − r| < ε}) = {x ∈ X : |∫ f dµx − r| < ε}
which is A-measurable by the construction of the conditional measures.
We now have enough measure-theoretic tools to return to the proof of the Multiple Recurrence Theorem.
10 Existence of Maximal SZ Factors
As payoff for the work done in the previous section we turn to proving the following theorem,
an important step towards the proof of the Multiple Recurrence Theorem.
Theorem 10.1. Let (X, B, µ, T ) be a measure preserving system and F the set of factors
A ⊆ B of the system for which the action of T on (X, A , µ, T ) is SZ. Then F contains a
maximal element with respect to inclusion.
Following the argument given in ([10], §7) we use Zorn's Lemma. In particular, we consider a chain, or totally ordered subset, {Ai : i ∈ I} ⊆ F and define
sup_{i∈I} Ai := σ(∪_{i∈I} Ai)
To prove Theorem 10.1 it suffices to show sup_{i∈I} Ai is an upper bound for the chain lying in F. Clearly:
• sup Ai is a σ-algebra
• Ai ⊆ sup Ai for all i ∈ I
• sup Ai is T-invariant
Note that for the final bullet point, the set {A ∈ sup_{i∈I} Ai : T−1A ∈ sup_{i∈I} Ai} is a d-system and thus by Dynkin's Lemma it suffices to show T-invariance only for sets in the generating algebra ∪_{i∈I} Ai. For such sets the result is trivial.
The proof of Theorem 10.1 therefore boils down to showing that the action of T on (X, sup_{i∈I} Ai, µ, T) is SZ. Explicitly, for any A ∈ sup_{i∈I} Ai such that µ(A) > 0 we need to show that given any k ∈ N,
lim inf_{N→∞} (1/N) Σ_{n=1}^N µ(∩_{j=0}^k T^{−jn}A) > 0    (10.1)
Proof (of Theorem 10.1). Let A ∈ sup_{i∈I} Ai be such that µ(A) > 0 and k ∈ N. By the above discussion it suffices to show (10.1). By Lemma 3.8, if ε = (1/4)ηµ(A) where η = (1/2)(k + 1)−1 then there exists a set A1 ∈ Ai0 for some i0 ∈ I such that
A ⊆ A1    and    µ(A△A1) < ε    (10.2)
Note that µ(A1) > 0. In particular,
µ(A1) ≥ µ(A) − µ(A△A1) > µ(A) − (1/4)ηµ(A)    (10.3)
Now we apply the results of the previous section. By Theorem 9.9, there exists a Borel probability system (Y, D, ν, S) and an extension α : (X, B, µ, T) → (Y, D, ν, S) such that Ai0 =µ α−1D. In particular, A1 = α−1D1 for some set D1 ∈ D. Disintegrate the measure µ with respect to (Y, D, ν) to form a collection of probability measures µy defined for almost every y ∈ Y. Note that
µy(A1) = E(1A1 | Y)(y) = 1D1(y)
Now A1 may not give a good approximation to A in terms of the conditional measures µy. To this end we define
D0 := {y ∈ Y : µy(A) ≥ 1 − η}
and let A0 = α−1(D0). We claim µ(A0) > (1/2)µ(A) > 0. Indeed, using the properties of the conditional measure,
µ(A1 ∖ A) = ∫ µy(A1 ∖ A) dν(y) ≥ ∫_{D1∖D0} (µy(A1) − µy(A)) dν(y) = ∫_{D1∖D0} (1 − µy(A)) dν(y) ≥ η ν(D1 ∖ D0) ≥ η(ν(D1) − ν(D0))
Now α is measure preserving and so
µ(A0) ≥ µ(A1) − (1/η)µ(A1 ∖ A) = µ(A1) − (1/η)µ(A1△A)    (10.4)
Finally, by applying (10.2) and (10.3) to (10.4) we get
µ(A0) > ((3 − η)/4)µ(A) > (1/2)µ(A)
By hypothesis the action of T on (X, Ai0, µ, T) is SZ. In particular,
lim inf_{N→∞} (1/N) Σ_{n=1}^N µ(∩_{j=0}^k T^{−jn}A0) > 0    (10.5)
We claim that
µy(∩_{j=0}^k T^{−jn}A) ≥ 1/2    for every y ∈ ∩_{j=0}^k S^{−jn}D0 at which µy is defined
Once the claim is proven it is easy to see that
µ(∩_{j=0}^k T^{−jn}A) ≥ ∫_{∩_{j=0}^k S^{−jn}D0} µy(∩_{j=0}^k T^{−jn}A) dν ≥ (1/2) ν(∩_{j=0}^k S^{−jn}D0) = (1/2) µ(∩_{j=0}^k T^{−jn}A0)    (10.6)
and hence comparing (10.5) and (10.6) concludes the proof.
To prove the claim suppose y ∈ ∩_{j=0}^k S^{−jn}D0. Then for j = 0, . . . , k we have S^{jn}y ∈ D0 and thus, applying Corollary 9.6,
µy(T^{−jn}A) = µ_{S^{jn}y}(A) ≥ 1 − η    for all j ∈ {0, . . . , k}
In particular the complement of each T^{−jn}A has measure µy((T^{−jn}A)c) ≤ η. The measure of the intersection ∩_{j=0}^k T^{−jn}A is smallest when the complements (T^{−jn}A)c are pairwise disjoint. Hence
µy(∩_{j=0}^k T^{−jn}A) ≥ 1 − (k + 1)η = 1/2
and we are done.
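The final step is the elementary union bound: if each of k + 1 sets has measure at least 1 − η with η = (1/2)(k + 1)−1, their intersection has measure at least 1 − (k + 1)η = 1/2. A quick finite sanity check of this arithmetic (the 24-point space and the particular sets are illustrative, not from the essay):

```python
# Sanity check of the union bound used in the claim, on a uniform finite space.
# The space and the sets are illustrative choices.
from fractions import Fraction

n, k = 24, 2
eta = Fraction(1, 2 * (k + 1))                   # eta = (1/2)(k+1)^{-1}
X = set(range(n))
# k + 1 sets, each missing a block of eta * n = 4 points (worst case: disjoint blocks)
sets = [X - set(range(4 * j, 4 * j + 4)) for j in range(k + 1)]

assert all(Fraction(len(A), n) >= 1 - eta for A in sets)
inter = set.intersection(*sets)
# union bound: measure of the intersection >= 1 - (k + 1) * eta = 1/2
assert Fraction(len(inter), n) >= 1 - (k + 1) * eta
```

The disjoint-blocks choice realises the extremal case described in the proof, where the bound 1 − (k + 1)η is attained exactly.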
11 Weak Mixing Extensions
We will briefly recap what we have achieved thus far and remind the reader of the proof strategy laid out earlier in the essay. So far we have shown that any system (X, B, µ, T) contains some maximal SZ factor with respect to inclusion, (X, Â, µ, T) say. The proof of the Multiple Recurrence Theorem is concluded by demonstrating no proper factor can be maximal. The method used to achieve this rather ingeniously mirrors our previous work.
In this section we describe what is meant for a system (X, B, µ, T) to be weak mixing relative to a factor (X, A, µ, T). As factors correspond precisely to extensions, this is equivalent
to defining a particular kind of extension α : (X, B, µ, T ) → (Y, D, ν, S) known as a weak
mixing extension. We will then show that weak mixing extensions preserve the SZ property:
if the action of S on (Y, D, ν, S) is SZ and α : (X, B, µ, T ) → (Y, D, ν, S) is a weak mixing
extension then the action of T on (X, B, µ, T ) is also SZ. Thus we will have shown:
If (X, Â, µ, T) is a proper maximal SZ factor of (X, B, µ, T) then (X, B, µ, T) cannot be weak mixing relative to (X, Â, µ, T).
Similarly, in Section 12 we define what is meant for a system to be compact relative to a factor
via the notion of compact extensions. We will show that compact extensions also preserve the SZ
property. Hence if (X, Â, µ, T) is a proper maximal SZ factor of (X, B, µ, T) then (X, B, µ, T) cannot be compact relative to (X, Â, µ, T). Furthermore, we deduce a more general statement:
Suppose (X, Â, µ, T) is a proper maximal SZ factor of (X, B, µ, T) and B∗ is some intermediate factor, that is (X, B∗, µ, T) is a factor of (X, B, µ, T) and Â ⊆ B∗ ⊆ B. If (X, Â, µ, T) is a proper factor of (X, B∗, µ, T), then (X, B∗, µ, T) cannot be compact relative to (X, Â, µ, T).
In the final section we conclude the proof of the Multiple Recurrence Theorem by demonstrating
a dichotomy between weak mixing and compact extensions. We demonstrate that for any proper
factor (X, A , µ, T ) of (X, B, µ, T ) either:
• (X, B, µ, T ) is weak mixing relative to (X, A , µ, T ) or
• There exists some intermediate factor A ⊆ B ∗ ⊆ B such that A is a proper factor of B ∗
and (X, B ∗ , µ, T ) is compact relative to (X, A , µ, T ).
Bringing all this together we conclude no proper factor can be maximal.
Presently we will use the notions of conditional expectation and measure to define the concept
of a 'relative weak mixing' system. Recall that a system (X, B, µ, T) is weak mixing if and only if the product system X × X is ergodic. This inspires us to consider defining a 'conditional product'.
11.1 The Relative Square
This subsection is based on ([8] Chapter 5).
Definition 22. Suppose (X1, B1, µ1, T1) and (X2, B2, µ2, T2) are measure preserving systems defined on Borel probability spaces which both extend the system (Y, D, ν, S). Explicitly, suppose there exist extensions
α1 : (X1, B1, µ1, T1) → (Y, D, ν, S)
α2 : (X2, B2, µ2, T2) → (Y, D, ν, S)
Thus we may disintegrate the measures µ1 and µ2 with respect to (Y, D, ν) to form two systems of conditional measures µ1,y and µ2,y, defined for almost all y ∈ Y. It is not difficult to see y ↦ µ1,y ⊗ µ2,y is ν-measurable (for instance, see [8]). We define the relative independent joining X1 ×Y X2 of the systems X1 = (X1, B1, µ1, T1) and X2 = (X2, B2, µ2, T2) as the system
(X1 × X2, B1 ⊗ B2, µ1 ×Y µ2, T1 × T2)
where the measure µ1 ×Y µ2 = ∫ µ1,y ⊗ µ2,y dν(y).
Lemma 11.1. Let f1 ∈ L2(X1, B1, µ1) and f2 ∈ L2(X2, B2, µ2). Then f1 ⊗ f2 ∈ L2(µ1 ×Y µ2) and
∫ f1 ⊗ f2 d(µ1 ×Y µ2) = ∫ E(f1 | Y) E(f2 | Y) dν
Proof. It is easy to see
∫ f1 ⊗ f2 d(µ1 ×Y µ2) = ∫ (∫ f1 ⊗ f2 d(µ1,y ⊗ µ2,y)) dν(y) = ∫ (∫ f1 dµ1,y)(∫ f2 dµ2,y) dν(y) = ∫ E(f1 | Y)(y) E(f2 | Y)(y) dν(y)
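On finite spaces the relative independent joining is a finite sum, and the identity of Lemma 11.1 can be verified exactly. The sketch below (the spaces, the factor map and the functions are illustrative assumptions) builds µ ×Y µ from the conditional measures and checks the product formula:

```python
# Finite sketch of Lemma 11.1: build mu x_Y mu from conditional measures and
# check that the integral of f1 (tensor) f2 equals int E(f1|Y)E(f2|Y) dnu.
# All spaces and functions here are illustrative.
from fractions import Fraction

X = range(6)
Y = range(3)
mu = {x: Fraction(1, 6) for x in X}
theta = lambda x: x % 3
nu = {y: Fraction(1, 3) for y in Y}
fibre = {y: [x for x in X if theta(x) == y] for y in Y}
mu_y = {y: {x: mu[x] / nu[y] for x in fibre[y]} for y in Y}   # conditional measures

f1 = lambda x: x + 1
f2 = lambda x: x * x

# mu x_Y mu assigns (x1, x2) the mass  sum_y nu(y) mu_y(x1) mu_y(x2)
lhs = sum(nu[y] * mu_y[y][x1] * mu_y[y][x2] * f1(x1) * f2(x2)
          for y in Y for x1 in fibre[y] for x2 in fibre[y])

E = lambda f: {y: sum(f(x) * mu_y[y][x] for x in fibre[y]) for y in Y}
rhs = sum(E(f1)[y] * E(f2)[y] * nu[y] for y in Y)
assert lhs == rhs
```

The computation factors through each fibre exactly as in the displayed chain of equalities above.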
Definition 23. Suppose α : (X, B, µ, T ) → (Y, D, ν, S) is an extension of measure preserving
systems where (X, B, µ, T ) is defined on a Borel probability space. Then we can form the relative
independent joining X ×Y X. We say X ×Y X is the Relative Square of the system (X, B, µ, T )
with respect to the factor (Y, D, ν, S). Note by Lemma 11.1, for any f ∈ L2(X, B, µ) we have
∫ f ⊗ f d(µ ×Y µ) = ∫ E(f | Y)2 dν
If α : (X, B, µ, T) → (Y, D, ν, S) is an extension and π : X × X → X denotes the projection onto the first co-ordinate then both
π : X ×Y X → X    and    α ◦ π : X ×Y X → Y
are extensions. In particular, we may define the expectation of any function f ∈ L2(µ ×Y µ) conditioned on either X or Y.
Lemma 11.2. For any f ∈ L2(µ ×Y µ) we have
E(f | Y) = E(E(f | X) | Y)
Proof. Considering the orthogonal projections
P1 : L2(X ×Y X) → L2(X, B, µ)π
P2 : L2(X, B, µ) → L2(Y, D, ν)α
Q : L2(X ×Y X) → L2(Y, D, ν)α◦π
it is easy to see Q = P2π ◦ P1. The result then follows immediately from the definition of conditional expectation.
Proposition 11.3. For any f1 , f2 ∈ L2 (X, B, µ) we have
E(f1 ⊗ f2 | Y) = E(f1 | Y)E(f2 | Y)
Proof. First we show that E(f1 ⊗ f2 | X) = f1 E(f2 | Y)α. Indeed, for any h ∈ L2(X, B, µ), by applying Lemma 11.1 and the standard properties of conditional expectation we have
∫ f1 E(f2 | Y)α h dµ = ∫ E(f1 E(f2 | Y)α h | Y) dν = ∫ E(f1 h | Y) E(f2 | Y) dν = ∫ (f1 ⊗ f2) hπ d(µ ×Y µ)    (11.1)
Further, now considering the extension π : X ×Y X → X, observe
∫ (f1 ⊗ f2) hπ d(µ ×Y µ) = ∫ E(f1 ⊗ f2 | X) h dµ    (11.2)
Hence, by comparing the left hand side of (11.1) and the right hand side of (11.2), and noting that the equalities hold for all h ∈ L2(X, B, µ), we conclude E(f1 ⊗ f2 | X) = f1 E(f2 | Y)α by the elementary properties of the inner product. The proposition now follows by applying the preceding lemma. Explicitly,
E(f1 ⊗ f2 | Y) = E(E(f1 ⊗ f2 | X) | Y) = E(f1 E(f2 | Y)α | Y) = E(f1 | Y) E(f2 | Y)
11.2 Weak Mixing Extensions
The following discussion of weak mixing extensions is adapted from ([10], §8).
Definition 24. Let α : (X, B, µ, T ) → (Y, D, ν, S) be an extension of measure preserving
systems. We say the extension is weak mixing if the relative-square system X ×Y X is ergodic.
Alternatively, in this case we may also say that (X, B, µ, T ) is weak mixing relative to the
factor (X, A , µ, T ) where A = α−1 (D) or weak mixing relative to (Y, D, ν, S).
Following the argument given in ([10], §8) we first note the following lemma.
Lemma 11.4. Suppose α : (X, B, µ, T) → (Y, D, ν, S) is a weak mixing extension and f, g ∈ L∞(X, B, µ). Then the following holds:
lim_{N→∞} (1/N) Σ_{n=1}^N ∫ |E(f T^n g | Y) − E(f | Y) S^n E(g | Y)|2 dν = 0
Proof. Let f, g ∈ L∞(X, B, µ) and first consider the case E(f | Y) ≡ 0. By hypothesis the system X ×Y X is ergodic. Apply Lemma 11.1 and the Mean Ergodic Theorem to the functions f ⊗ f, g ⊗ g ∈ L∞(X ×Y X) to obtain
lim_{N→∞} (1/N) Σ_{n=1}^N ∫ |E(f T^n g | Y)|2 dν = lim_{N→∞} (1/N) Σ_{n=1}^N ∫ (f ⊗ f)(T × T)^n(g ⊗ g) d(µ ×Y µ)
= ∫ f ⊗ f d(µ ×Y µ) ∫ g ⊗ g d(µ ×Y µ)
= ∫ E(f | Y)2 dν ∫ E(g | Y)2 dν = 0
The general case where f ∈ L∞(X, B, µ) has arbitrary conditional expectation now follows by considering the function F = f − E(f | Y)α. Clearly E(F | Y) = E(f | Y) − E(E(f | Y)α | Y) = E(f | Y) − E(f | Y) = 0 and by the previous case
lim_{N→∞} (1/N) Σ_{n=1}^N ∫ |E(F T^n g | Y)|2 dν = 0
But E(F T^n g | Y) = E(f T^n g | Y) − E(f | Y) S^n E(g | Y), as required.
Lemma 11.5. Suppose (X, B, µ, T) is weak mixing relative to (Y, D, ν, S) and T is an ergodic transformation. Then the relative square system X ×Y X is also weak mixing relative to (Y, D, ν, S).
Proof. Write X̃ = X ×Y X. In particular, let µ̃ = µ ×Y µ and T̃ = T × T. We are required to prove that the system X̃ ×Y X̃ is ergodic. Now write X̂ = X̃ ×Y X̃ so that µ̂ = µ̃ ×Y µ̃ and T̂ = T̃ × T̃. By the Mean Ergodic Theorem, specifically Corollary 3.13, it suffices to show that there exists a dense subset H ⊆ L2(µ̂) such that for all F, G ∈ H
lim_{N→∞} |(1/N) Σ_{n=0}^{N−1} ∫ F T̂^n G dµ̂ − ∫ F dµ̂ ∫ G dµ̂| = 0    (11.3)
In particular, it is suffices to show (11.3) holds for functions F and G of the form
F (x1 , x2 , x3 , x4 )
=
4
M
fi (x1 , x2 , x3 , x4 ) =
i=1
G(x1 , x2 , x3 , x4 )
=
4
M
4
Y
fi (xi )
i=1
gi (x1 , x2 , x3 , x4 ) =
i=1
4
Y
gi (xi )
i=1
where each f_i, g_i ∈ L∞(X, B, µ). By applying Lemma 11.1 and Proposition 11.3 we obtain
\[
\int F\,\hat T^{n}G\,d\hat\mu
=\int E\big(f_1T^{n}g_1\otimes f_2T^{n}g_2\mid Y\big)\,E\big(f_3T^{n}g_3\otimes f_4T^{n}g_4\mid Y\big)\,d\nu
=\int\prod_{i=1}^{4}E(f_iT^{n}g_i\mid Y)\,d\nu
\]
Similarly,
\[
\int F\,d\hat\mu=\int\prod_{i=1}^{4}E(f_i\mid Y)\,d\nu
\qquad\text{and}\qquad
\int G\,d\hat\mu=\int\prod_{i=1}^{4}E(g_i\mid Y)\,d\nu
\]
By using these equalities to rewrite (11.3) and then applying the triangle inequality, it suffices to show the following two statements:
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int\Big|\prod_{i=1}^{4}E(f_iT^{n}g_i\mid Y)-\prod_{i=1}^{4}E(f_i\mid Y)\,S^{n}E(g_i\mid Y)\Big|\,d\nu=0
\tag{11.4}
\]
and
\[
\lim_{N\to\infty}\Big|\frac{1}{N}\sum_{n=1}^{N}\int\prod_{i=1}^{4}E(f_i\mid Y)\,S^{n}E(g_i\mid Y)\,d\nu-\int\prod_{i=1}^{4}E(f_i\mid Y)\,d\nu\int\prod_{i=1}^{4}E(g_i\mid Y)\,d\nu\Big|=0
\tag{11.5}
\]
Statement (11.4) is easily deduced by rewriting the integrand as an appropriate sum and applying Lemma 11.1. To verify (11.5), first apply the triangle and Cauchy-Schwarz inequalities to see that for each N ∈ N the N-th term of the sequence is bounded above by
\[
\Big\|\prod_{i=1}^{4}E(f_i\mid Y)\Big\|_{L^{2}(\nu)}\,\Big\|\frac{1}{N}\sum_{n=1}^{N}S^{n}\prod_{i=1}^{4}E(g_i\mid Y)-\int\prod_{i=1}^{4}E(g_i\mid Y)\,d\nu\Big\|_{L^{2}(\nu)}
\]
Recall the transformation T is ergodic. This implies S is also ergodic and we are done by the Mean Ergodic Theorem. □
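The Mean Ergodic Theorem invoked at the end of this proof can also be observed numerically. The following sketch (an illustration only, not part of the argument, with the irrational circle rotation chosen as a convenient concrete ergodic system) checks that ergodic averages converge to the space average:

```python
import numpy as np

# Ergodic average (1/N) sum_{n=1}^{N} g(S^n y) for the circle rotation
# S y = y + alpha (mod 1), which is ergodic for irrational alpha.
# The Mean Ergodic Theorem predicts convergence to the space average
# of g; here g(y) = cos(2 pi y), whose integral over [0,1) is 0.
alpha = np.sqrt(2) - 1          # irrational rotation number
y0 = 0.3                        # arbitrary starting point
N = 100_000
orbit = (y0 + alpha * np.arange(1, N + 1)) % 1.0
ergodic_avg = np.cos(2 * np.pi * orbit).mean()
print(abs(ergodic_avg))         # small: the time average tends to the integral
```

For the rotation the average is in fact a geometric sum, so the convergence here is far faster than the theorem guarantees in general.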
Analogous to the result proved during our study of weak mixing systems, we now show
relative weak mixing in some sense implies relative weak mixing of all orders.
Theorem 11.6. Suppose (X, B, µ, T) is weak mixing relative to (Y, D, ν, S). Then for any k ∈ N≥2 and all f_0, …, f_k ∈ L∞(X, B, µ) the following hold:
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int\Big|E\Big(\prod_{l=0}^{k-1}T^{ln}f_l\,\Big|\,Y\Big)-\prod_{l=0}^{k-1}S^{ln}E(f_l\mid Y)\Big|^{2}\,d\nu=0
\tag{11.6}
\]
and
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big\|\prod_{l=1}^{k}T^{ln}f_l-\prod_{l=1}^{k}T^{ln}E(f_l\mid Y)^{\alpha}\Big\|_{L^{2}(\mu)}=0
\tag{11.7}
\]
Here E(f | Y)^α denotes E(f | Y) ∘ α, the lift of E(f | Y) to a function on X.
Proof. The work is for the most part analogous to that of the proof of Theorem 6.4. Once again we use an inductive method; Lemma 11.4 is precisely (11.6) in the case k = 2, whilst (11.7) in the case k = 2 holds for the system X ×_Y X by the Mean Ergodic Theorem. We shall show, for k ≥ 2:
• Claim (1): (11.6) case k ⇒ (11.7) case k
• Claim (2): (11.7) case k for the system X ×_Y X ⇒ (11.6) case k + 1
By Lemma 11.5, if the system (X, B, µ, T) is a relatively weak mixing extension of (Y, D, ν, S) then so is X ×_Y X. Therefore the theorem follows from these two claims by induction.
Claim (1). Suppose that for all g_0, …, g_{k−1} ∈ L∞(X, B, µ) the following holds:
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\int\Big|E\Big(\prod_{l=0}^{k-1}T^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^{k-1}S^{ln}E(g_l\mid Y)\Big|^2\,d\nu=0
\tag{11.8}
\]
Then given f_1, …, f_k ∈ L∞(X, B, µ) we have
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|\prod_{l=1}^kT^{ln}f_l-\prod_{l=1}^kT^{ln}E(f_l\mid Y)^{\alpha}\Big\|_{L^2(\mu)}=0
\]
As in the proof of the analogous claim in Theorem 6.4, this is an application of van der Corput's trick.^17 Given f_1, …, f_k ∈ L∞(X, B, µ), first consider the case where for some j_0 ∈ {1, …, k} the function f_{j_0} has zero conditional expectation: E(f_{j_0} | Y) = 0. Then by taking
\[
u_n=\prod_{l=1}^kT^{ln}f_l
\]
it suffices to show
\[
\lim_{H\to\infty}\frac1H\sum_{h=0}^{H-1}s_h=0
\tag{11.9}
\]
where
\[
s_h=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\langle u_{n+h},u_n\rangle
\]
As in the proof of Theorem 6.4 we have
\[
\langle u_{n+h},u_n\rangle=\int\prod_{l=0}^{k-1}T^{ln}\big(f_{l+1}\,T^{(l+1)h}f_{l+1}\big)\,d\mu
\tag{11.10}
\]
Define g_{h,l} = f_{l+1}·T^{(l+1)h}f_{l+1} ∈ L∞(X, B, µ) for each h ∈ N and each l ∈ {0, …, k − 1}. Then (11.10) becomes
\[
\langle u_{n+h},u_n\rangle=\int\prod_{l=0}^{k-1}T^{ln}g_{h,l}\,d\mu=\int E\Big(\prod_{l=0}^{k-1}T^{ln}g_{h,l}\,\Big|\,Y\Big)\,d\nu
\]
^17 The proof is very similar so we work rapidly.
By the hypothesis of the claim applied to the collection of functions g_{h,0}, …, g_{h,k−1} for each h ∈ N, observe
\[
s_h=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\langle u_{n+h},u_n\rangle
=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\int E\Big(\prod_{l=0}^{k-1}T^{ln}g_{h,l}\,\Big|\,Y\Big)\,d\nu
=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\int\prod_{l=0}^{k-1}S^{ln}E(g_{h,l}\mid Y)\,d\nu
\tag{11.11}
\]
We claim that for some set Z ⊂ N of zero density
\[
\lim_{h\to\infty,\,h\notin Z}s_h=0
\tag{11.12}
\]
and thus, by Theorem 2.1, establish (11.9). Indeed, as (X, B, µ, T) is weak mixing relative to (Y, D, ν, S), we may apply Lemma 11.4 to the function f_{j_0} to obtain
\[
\lim_{H\to\infty}\frac1H\sum_{h=1}^H\int\Big|E\big(f_{j_0}T^{j_0h}f_{j_0}\mid Y\big)-E(f_{j_0}\mid Y)\,S^{j_0h}E(f_{j_0}\mid Y)\Big|^2\,d\nu=0
\]
By hypothesis E(f_{j_0} | Y) = 0 and so
\[
\lim_{h\to\infty,\,h\notin Z}\int\big|E(g_{h,j_0-1}\mid Y)\big|^2\,d\nu=0
\tag{11.13}
\]
for some set Z ⊂ N of zero density. Finally, applying the Cauchy-Schwarz inequality to (11.11), and bounding the remaining factors in L∞, gives
\[
|s_h|\le\big\|E(g_{h,j_0-1}\mid Y)\big\|_{L^2(\nu)}\prod_{\substack{l=0\\l\ne j_0-1}}^{k-1}\big\|E(g_{h,l}\mid Y)\big\|_{\infty}
\le\big\|E(g_{h,j_0-1}\mid Y)\big\|_{L^2(\nu)}\prod_{\substack{l=0\\l\ne j_0-1}}^{k-1}\|f_{l+1}\|_\infty^2
\tag{11.14}
\]
Thus, by comparing (11.13) and (11.14) we obtain (11.12) as desired.
The general case reduces to the above case by writing
\[
\prod_{l=1}^kT^{ln}f_l-\prod_{l=1}^kT^{ln}E(f_l\mid Y)^{\alpha}
=\sum_{j=1}^k\Big(\prod_{l=1}^{j-1}T^{ln}f_l\Big)\,T^{jn}\big(f_j-E(f_j\mid Y)^{\alpha}\big)\Big(\prod_{l=j+1}^kT^{ln}E(f_l\mid Y)^{\alpha}\Big)
\tag{11.15}
\]
and noting E(f_j − E(f_j | Y)^α | Y) = E(f_j | Y) − E(f_j | Y) = 0.
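Van der Corput's trick, used above in its Hilbert-space form, can be illustrated numerically with the classical scalar example u_n = e^{2πi n²α}: the correlations ⟨u_{n+h}, u_n⟩ average to zero for each fixed h ≠ 0, and consequently so do the u_n themselves. This is a side illustration under the assumed choice α = √2; it is not the sequence appearing in the proof.

```python
import numpy as np

alpha = np.sqrt(2)
N = 200_000
n = np.arange(1, N + 1)
u = np.exp(2j * np.pi * alpha * n**2)   # u_n = e^{2 pi i n^2 alpha}

# |(1/N) sum u_n|: van der Corput predicts this is small because the
# shifted correlations below are small for every fixed h != 0.
avg = abs(u.mean())

def corr(h):
    # (1/(N-h)) |sum_n u_{n+h} conj(u_n)| = average of e^{2 pi i (2nh + h^2) alpha},
    # a geometric sum, hence O(1/N) for each fixed h
    return abs(np.mean(u[h:] * np.conj(u[:-h])))

print(avg, corr(1), corr(2))   # all small
```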
For the second claim we turn our attention to the system X ×Y X. We adopt the notation
µ̃ = µ ×Y µ and T̃ = T × T and let α̃ : X ×Y X → Y denote the weak mixing extension given
by composing α with the projection onto the first co-ordinate.
Claim (2). Suppose that for all f_1, …, f_k ∈ L∞(X ×_Y X) the following holds:
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|\prod_{l=1}^k\tilde T^{ln}f_l-\prod_{l=1}^k\tilde T^{ln}E(f_l\mid Y)^{\tilde\alpha}\Big\|_{L^2(\tilde\mu)}=0
\tag{11.16}
\]
Then given g_0, …, g_k ∈ L∞(X, B, µ) we have
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\int\Big|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(g_l\mid Y)\Big|^2\,d\nu=0
\tag{11.17}
\]
Here we use a slightly different approach from that utilised in dealing with the analogous claim in the proof of Theorem 6.4. The following argument is adapted from ([8], Chapter 7). Let g_0, …, g_k ∈ L∞(X, B, µ). First we consider the case where E(g_{j_0} | Y) = 0 for some j_0 ∈ {0, …, k}. By Lemma 11.5 the relative square system X ×_Y X is also weak mixing relative to (Y, D, ν, S). Apply the hypothesis (11.16) of the claim to the functions g_1 ⊗ g_1, …, g_k ⊗ g_k, so that
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|\prod_{l=1}^k\tilde T^{ln}(g_l\otimes g_l)-\prod_{l=1}^k\tilde T^{ln}E(g_l\otimes g_l\mid Y)^{\tilde\alpha}\Big\|_{L^2(\tilde\mu)}=0
\]
Norm convergence implies weak convergence, so that
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big|\int(g_0\otimes g_0)\prod_{l=1}^k\tilde T^{ln}(g_l\otimes g_l)\,d\tilde\mu-\int(g_0\otimes g_0)\prod_{l=1}^k\tilde T^{ln}E(g_l\otimes g_l\mid Y)^{\tilde\alpha}\,d\tilde\mu\Big|=0
\]
We may rewrite the left hand integral as
\[
\int\prod_{l=0}^k\tilde T^{ln}(g_l\otimes g_l)\,d\tilde\mu
=\int\Big(\prod_{l=0}^kT^{ln}g_l\Big)\otimes\Big(\prod_{l=0}^kT^{ln}g_l\Big)\,d\tilde\mu
=\int\Big|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)\Big|^2\,d\nu
\]
Do the same for the right hand integral and apply the assumption E(g_{j_0} | Y) = 0 to give
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\int\Big|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)\Big|^2\,d\nu=0
\]
Thus we have established the result for this case. Note that by Theorem 2.1 we also have
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)\Big\|_{L^2(\nu)}=0
\tag{11.18}
\]
For the general case observe
\[
E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(g_l\mid Y)
=E\Big(\prod_{l=0}^kT^{ln}g_l-\prod_{l=0}^kT^{ln}E(g_l\mid Y)^{\alpha}\,\Big|\,Y\Big)
\]
Using the identity stated in (11.15) (replacing the f_l with g_l), the above expression may be written as
\[
\sum_{j=0}^kE\Big(\prod_{l=0}^kT^{ln}G_{j,l}\,\Big|\,Y\Big)
\tag{11.19}
\]
where
\[
G_{j,l}=\begin{cases}g_l&\text{if }l=0,\dots,j-1\\ g_j-E(g_j\mid Y)^{\alpha}&\text{if }l=j\\ E(g_l\mid Y)^{\alpha}&\text{if }l=j+1,\dots,k\end{cases}
\]
For each j = 0, …, k we have E(G_{j,j} | Y) = 0 and so, by taking the L²-norm of (11.19) and applying the triangle inequality, we can reduce the general case to the previous case. Specifically, by applying (11.18) to each collection of functions G_{j,0}, …, G_{j,k} it follows that
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(g_l\mid Y)\Big\|_{L^2(\nu)}=0
\]
and we obtain (11.17) by once again applying Theorem 2.1. □
We now use the previous theorem to deduce the main result of this section: that the SZ property is preserved under weak mixing extensions.
Theorem 11.7. Suppose α : (X, B, µ, T) → (Y, D, ν, S) is a weak mixing extension and that the action of S on (Y, D, ν) is SZ. Then so too is the action of T on (X, B, µ).
Proof. Let A ∈ B be a set with positive measure µ(A) > 0 and k ∈ N. Define
\[
D_n:=\Big\{y\in Y:\mu_y(A)\ge\frac1n\Big\}
\]
Then each D_n is ν-measurable and
\[
\lim_{n\to\infty}\Big|\int_{D_n}\mu_y(A)\,d\nu(y)-\mu(A)\Big|=0
\]
and hence there must exist n_0 ∈ N such that ν(D_{n_0}) > 0. In other words, there exists a constant a > 0 such that if we define D := {y ∈ Y : µ_y(A) ≥ a} then ν(D) > 0.
Let ε > 0. By Theorem 11.6 there exists an N_0 ∈ N such that
\[
\frac1N\sum_{n=1}^N\Big\|E\Big(\prod_{l=0}^kT^{ln}1_A\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(1_A\mid Y)\Big\|_{L^2(\nu)}<\varepsilon\qquad\text{for all }N>N_0
\]
Thus, fixing N > N_0, by applying the triangle and Cauchy-Schwarz inequalities we have
\[
\Big|\frac1N\sum_{n=1}^N\Big(\int E\Big(\prod_{l=0}^kT^{ln}1_A\,\Big|\,Y\Big)\,d\nu-\int\prod_{l=0}^kS^{ln}E(1_A\mid Y)\,d\nu\Big)\Big|<\varepsilon
\]
Note that
\[
\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)=\int E\Big(\prod_{l=0}^kT^{ln}1_A\,\Big|\,Y\Big)\,d\nu
\]
and, as E(1_A | Y) ≥ 0 everywhere and E(1_A | Y)(y) = µ_y(A) ≥ a for y ∈ D, it follows that E(1_A | Y) ≥ a·1_D. Thus
\[
\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)
>\frac1N\sum_{n=1}^N\int\prod_{l=0}^kS^{ln}E(1_A\mid Y)\,d\nu-\varepsilon
\ge a^{k+1}\,\frac1N\sum_{n=1}^N\int\prod_{l=0}^kS^{ln}1_D\,d\nu-\varepsilon
=a^{k+1}\,\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)-\varepsilon
\]
As ε was arbitrary and the SZ property holds for the system (Y, D, ν, S), it follows that
\[
\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)\ge a^{k+1}\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)>0
\]
as required. □
12 Compact Extensions
We now turn to defining compact extensions and showing that they too preserve the SZ property.
Definition 25. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of measure preserving systems and let f ∈ L²(X, B, µ). We say f is almost periodic relative to Y (AP rel. Y) if for every ε > 0 there exist functions g_1, …, g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m}\|T^nf-g_j\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }n\in\mathbb N\text{ and almost every }y\in Y
\]
Moreover, the extension is compact if the set of AP rel. Y functions is dense in L²(X, B, µ).
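For orientation, the absolute (non-relative) notion of almost periodicity from the earlier study of compact systems can be checked numerically: for the circle rotation Tx = x + α and f(x) = cos(2πx), the orbit {T^n f} is covered by a finite ε-net of translates. The sketch below assumes this concrete system and these parameter choices; it is an illustration only.

```python
import numpy as np

# For the rotation Tx = x + alpha and f(x) = cos(2 pi x), every T^n f lies in
# the compact family {cos(2 pi (x + t)) : t in [0,1)}, so the finite net
# g_j(x) = cos(2 pi (x + j/m)) is epsilon-dense along the whole orbit in L^2.
alpha = np.sqrt(3) - 1
m = 50                                   # size of the finite net
x = (np.arange(2000) + 0.5) / 2000       # quadrature grid on [0, 1)
net = np.cos(2 * np.pi * (x[None, :] + np.arange(m)[:, None] / m))

worst = 0.0
for n in range(1, 500):
    Tnf = np.cos(2 * np.pi * (x + n * alpha))
    # discrete L^2(dx) distance from T^n f to the nearest net element
    d = np.sqrt(((net - Tnf) ** 2).mean(axis=1)).min()
    worst = max(worst, d)
print(worst)   # bounded by sqrt(2) * sin(pi / (2m)) ~ 0.044
```

The exact distance between two translates is √2·|sin(π(s − t))|, which is what makes the a priori bound in the final comment available.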
Complementing the result of the previous section, the purpose of this section will be to prove
the following theorem:
Theorem 12.1. Let α : (X, B, µ, T ) → (Y, D, ν, S) be a compact extension and suppose the
action of S on (Y, D, ν) is SZ. Then so is the action of T on (X, B, µ).
The proof of this result builds upon our work on compact systems in a far from trivial
manner. The proof is divided into a number of results: the first reduces the problem, and the
remainder to some extent provide an analogy of the method used to show the SZ property for
compact systems in Theorem 7.2.
Recall that in order to prove Theorem 12.1 we must show that for all A ∈ B with µ(A) > 0 and for all k ∈ N
\[
\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{j=0}^kT^{-jn}A\Big)>0
\tag{12.1}
\]
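The quantity in (12.1) can be estimated numerically for a concrete system. The sketch below uses an irrational circle rotation with A an interval and k = 2; these choices are assumptions made for illustration, not part of the proof.

```python
import numpy as np

# Estimate (1/N) sum_n mu(A ∩ T^{-n}A ∩ T^{-2n}A) for the rotation
# Tx = x + alpha on [0,1) with A = [0, 0.1), sampling x on a fine grid.
alpha = np.sqrt(2) - 1
x = (np.arange(20_000) + 0.5) / 20_000

def in_A(t):
    return (t % 1.0) < 0.1

N = 2000
total = 0.0
for n in range(1, N + 1):
    hits = in_A(x) & in_A(x + n * alpha) & in_A(x + 2 * n * alpha)
    total += hits.mean()          # ~ mu(A ∩ T^{-n}A ∩ T^{-2n}A)
print(total / N)                  # strictly positive, as multiple recurrence predicts
```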
We begin by demonstrating that without loss of generality we may consider only A ∈ B with certain “nice” properties. The following lemma is adapted from ([5], Chapter 7).
Lemma 12.2. Let α : (X, B, µ, T) → (Y, D, ν, S) be a compact extension and A ∈ B a set of positive measure. Then there exists a measurable subset Ã ⊆ A of positive measure such that:
1. For almost every y ∈ Y either µ_y(Ã) = 0 or µ_y(Ã) > ½µ(Ã).
2. The characteristic function 1_Ã ∈ L²(X, B, µ) is AP rel. Y.
Remark 12. Clearly (12.1) holds for A automatically if it holds for any measurable subset Ã ⊆ A. Thus, in proving Theorem 12.1 we may restrict our attention without loss of generality to showing (12.1) holds for sets A ∈ B with µ(A) > 0 which satisfy properties 1 and 2.
Proof of Lemma 12.2. To begin we simply remove the parts of the set which contradict property 1. To do this, let
\[
B:=\Big\{y\in Y:\mu_y(A)\le\tfrac12\mu(A)\Big\}
\]
Then B is measurable and hence so is A′ = A \ α^{−1}(B). For almost every y ∈ Y \ B the set α^{−1}(B) is µ_y-null and so
\[
\mu_y(A')=\mu_y(A)>\tfrac12\mu(A)\ge\tfrac12\mu(A')
\]
On the other hand, for almost every y ∈ B the support of the measure µ_y is contained in α^{−1}(B) and so µ_y(A′) = 0. Hence for almost every y ∈ Y either µ_y(A′) > ½µ(A′) or µ_y(A′) = 0. It remains to show A′ is a set of positive measure. If y ∈ B then µ_y(A \ A′) = µ_y(A) ≤ ½µ(A), and if y ∈ Y \ B then µ_y(A \ A′) = 0. Integrating gives
\[
\mu(A\setminus A')=\int\mu_y(A\setminus A')\,d\nu(y)\le\tfrac12\mu(A)
\]
and so µ(A′) ≥ ½µ(A) > 0.
We claim there exists a set B′ ∈ D such that if we define Ã = A′ \ α^{−1}(B′) then µ(Ã) > 0 and the function 1_Ã is AP rel. Y. This set then also has property 1: for almost every y ∈ B′ we have µ_y(Ã) = 0, while for almost every y ∉ B′ it follows µ_y(Ã) = µ_y(A′) > ½µ(A′) ≥ ½µ(Ã).
Let (ε_l)_{l∈N} be the sequence ε_l = 2^{−(l+2)}µ(A′), so that
\[
\sum_{l=1}^\infty\varepsilon_l=\tfrac14\mu(A')<\tfrac12\mu(A')
\tag{12.2}
\]
By hypothesis the set of AP rel. Y functions is dense in L²(X, B, µ). For every l ∈ N there exists an f_l ∈ L²(X, B, µ) such that f_l is AP rel. Y and
\[
\|1_{A'}-f_l\|_{L^2(\mu)}<\varepsilon_l
\]
Now consider the sets
\[
B_l=\Big\{y\in Y:\|1_{A'}-f_l\|^2_{L^2(\mu_y)}\ge\varepsilon_l\Big\}
\]
defined for every l ∈ N. Then each B_l is measurable and
\[
\nu(B_l)=\int_{B_l}1\,d\nu(y)\le\frac1{\varepsilon_l}\int_{B_l}\|1_{A'}-f_l\|^2_{L^2(\mu_y)}\,d\nu(y)
\le\frac1{\varepsilon_l}\int\|1_{A'}-f_l\|^2_{L^2(\mu_y)}\,d\nu(y)
=\frac1{\varepsilon_l}\|1_{A'}-f_l\|^2_{L^2(\mu)}<\varepsilon_l
\tag{12.3}
\]
Define Ã = A′ \ α^{−1}(∪_{l∈N}B_l). By (12.2) and (12.3) one may easily deduce µ(Ã) > ½µ(A′) > 0. We claim 1_Ã is AP rel. Y. Indeed, given ε > 0 there exists l_0 ∈ N such that ε_{l_0}^{1/2} < ε/2, and functions g_1, …, g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m}\|T^nf_{l_0}-g_j\|_{L^2(\mu_y)}<\frac\varepsilon2\qquad\text{for all }n\in\mathbb N\text{ and almost every }y\in Y
\]
Note, for almost every y ∈ Y, if S^n y ∉ ∪_{l∈N}B_l then
\[
\|T^n1_{\tilde A}-T^nf_{l_0}\|_{L^2(\mu_y)}=\|1_{\tilde A}-f_{l_0}\|_{L^2(\mu_{S^ny})}=\|1_{A'}-f_{l_0}\|_{L^2(\mu_{S^ny})}<\frac\varepsilon2
\]
On the other hand, if S^n y ∈ ∪_{l∈N}B_l then
\[
\|T^n1_{\tilde A}\|_{L^2(\mu_y)}=\|1_{\tilde A}\|_{L^2(\mu_{S^ny})}=0
\]
If we take g_0 ≡ 0 then by the triangle inequality
\[
\min_{0\le j\le m}\|T^n1_{\tilde A}-g_j\|_{L^2(\mu_y)}\le\min_{0\le j\le m}\Big(\|T^n1_{\tilde A}-T^nf_{l_0}\|_{L^2(\mu_y)}+\|T^nf_{l_0}-g_j\|_{L^2(\mu_y)}\Big)<\varepsilon
\]
as required. □
At the heart of the multiple recurrence result for compact systems was the idea expounded in Proposition 7.1, which we presently recall. If a system (X, B, µ, T) is compact, then for every AP function f ∈ L²(X, B, µ) and every ε > 0 there exists a set K ⊂ N of positive lower density (in particular a syndetic set) such that
\[
\|T^nf-f\|_{L^2(\mu)}<\varepsilon\qquad\text{for all }n\in K
\tag{12.4}
\]
Following the argument given in [5, 10], our recourse in proving Theorem 12.1 will be to formulate and prove a similar statement concerning compact extensions α : (X, B, µ, T) → (Y, D, ν, S) and AP rel. Y functions; in particular we consider f = 1_A.
Proposition 12.3. Let f = 1_A, where we assume A ∈ B is of positive measure and satisfies properties 1 and 2 of Lemma 12.2, and let ε > 0 be given. There exists a set D ∈ D of positive measure and a finite set F ⊂ Z such that given any n ∈ N, for any y ∈ ∩_{l=0}^k S^{−ln}D there exists an m ∈ F such that
\[
\|T^{l(n+m)}f-f\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }l\in\{0,\dots,k\}
\tag{12.5}
\]
As in the proof of Proposition 7.1 we consider finite maximal ε-separated sets. The main problem we encounter here is that we now need to find finite sets
\[
\{T^{t_1}f,\dots,T^{t_r}f\}\subset L^2(X,B,\mu_y)\qquad\text{for all }y\in\bigcap_{l=0}^kS^{-ln}D
\]
which are maximal ε-separated in many different vector spaces. To this end we make the following definition:
Definition 26. For each y ∈ Y let ⊕_{l=0}^k L²(µ_y) denote the vector space of (k + 1)-tuples (f_0, …, f_k), where each f_i ∈ L²(µ_y), with the addition and scalar multiplication defined in the obvious way. Define a norm ‖·‖_⊕ on ⊕_{l=0}^k L²(µ_y) by
\[
\|(f_0,\dots,f_k)\|_\oplus=\max_{0\le j\le k}\|f_j\|_{L^2(\mu_y)}
\]
Proof of Proposition 12.3. We divide the proof into three claims, first focusing our attention on subsets of the following form:
\[
\mathscr L(y):=\Big\{(f,T^mf,\dots,T^{km}f)\in\bigoplus_{l=0}^kL^2(\mu_y):m\in\mathbb Z\Big\}\subseteq\bigoplus_{l=0}^kL^2(\mu_y)
\]
By the proof of Lemma 12.2 there exists a set D′ ∈ D of positive measure such that µ_y(A) > ½µ(A) for almost every y ∈ D′ and µ_y(A) = 0 for almost every y ∈ Y \ D′. Consider the subset L*(y) ⊆ L(y) given by
\[
\mathscr L^*(y):=\Big\{(f,T^mf,\dots,T^{km}f):m\in\mathbb Z,\ y\in\bigcap_{l=0}^kS^{-lm}D'\Big\}
\]
Explicitly, for any m ∈ Z, (f, T^m f, …, T^{km}f) ∈ L*(y) provided y ∈ ∩_{l=0}^k S^{−lm}D′. This is precisely the subset consisting of the tuples (f, T^m f, …, T^{km}f) for which each entry has non-zero L²(µ_y)-norm. Indeed, suppose (f, T^m f, …, T^{km}f) ∈ L*(y); then for any l ∈ {1, …, k} we have S^{lm}y ∈ D′ and so
\[
\|T^{lm}f\|_{L^2(\mu_y)}=\|f\|_{L^2(\mu_{S^{lm}y})}>\sqrt{\tfrac12\mu(A)}>0
\]
Claim (1). Let ε > 0 be given. Then there exists a (non-empty) finite set F_0 ⊂ Z such that
\[
\Big\{(f,T^mf,\dots,T^{km}f)\in\bigoplus_{l=0}^kL^2(\mu_y):m\in F_0\Big\}
\]
is maximal ε-separated in L*(y) for all y in a subset of D′ of positive ν-measure.
Proof. As f is AP rel. Y, the set {T^n f : n ∈ N} is totally bounded in L²(X, B, µ_y) for almost every y. It follows that L(y) and, moreover, L*(y) are totally bounded in ⊕_{l=0}^k L²(µ_y).
For any finite set F ⊂ Z let Sep_ε(F) be the set of all y ∈ D′ such that the set
\[
\Big\{(f,T^mf,\dots,T^{km}f)\in\bigoplus_{l=0}^kL^2(\mu_y):m\in F\Big\}
\tag{12.6}
\]
is maximal ε-separated in L*(y). Alternatively, define the function g_F on Y by
\[
g_F(y)=\begin{cases}\displaystyle\min_{m\ne m'\in F}\,\max_{j=0,\dots,k}\|T^{jm}f-T^{jm'}f\|_{L^2(\mu_y)}&\text{if }y\in\bigcap_{m\in F}\bigcap_{j=0}^kS^{-jm}D'\\[2mm]0&\text{otherwise}\end{cases}
\tag{12.7}
\]
If g_F(y) > 0, then g_F(y) gives the separation of (12.6) in L(y), and it follows that Sep_ε(F) ⊂ Y can be expressed in the following form:
\[
\mathrm{Sep}_\varepsilon(F)=\big\{y\in Y:g_F(y)>\varepsilon\text{ but }g_{\tilde F}(y)\le\varepsilon\text{ whenever }|\tilde F|>|F|\big\}
\]
We claim that there exists some finite set F_0 ⊂ Z such that ν(Sep_ε(F_0)) > 0. To prove the claim it suffices to show the set system {Sep_ε(F) : F ⊂ Z finite} is (almost) a covering of D′. As there are only countably many finite subsets of the integers, the Sep_ε(F) cannot all have zero measure, for otherwise
\[
0<\nu(D')\le\sum_{\substack{F\subset\mathbb Z\\F\text{ finite}}}\nu\big(\mathrm{Sep}_\varepsilon(F)\big)=0
\]
For any y ∈ D′, as (f, …, f) ∈ L*(y), the set L*(y) is non-empty and totally bounded, and thus there exists a maximal ε-separated subset of L*(y). It follows directly that there exists F_y ⊂ Z finite such that y ∈ Sep_ε(F_y), and we are done.
Henceforth we fix F = F_0 ⊂ Z as a set of integers such that ν(Sep_ε(F)) > 0. Note that for each l ∈ N
\[
\mathrm{Sep}^l_\varepsilon(F):=\big\{y\in Y:g_F(y)>\varepsilon+\tfrac1l\text{ but }g_{\tilde F}(y)\le\varepsilon\text{ whenever }|\tilde F|>|F|\big\}
\]
is a measurable subset of Sep_ε(F), and Sep_ε(F) = ∪_{l∈N} Sep^l_ε(F). There must be some l_0 ∈ N such that ν(Sep^{l_0}_ε(F)) > 0. Thus, defining η = 1/l_0, the set
\[
\tilde D:=\big\{y\in Y:g_F(y)>\varepsilon+\eta\text{ but }g_{\tilde F}(y)\le\varepsilon\text{ whenever }|\tilde F|>|F|\big\}
\tag{12.8}
\]
is of positive measure.
Claim (2). There exists a subset D ⊆ D̃ of positive measure such that for each pair m, m′ ∈ F, each l = 0, …, k and each pair y, y′ ∈ D
\[
\Big|\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}-\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{y'})}\Big|<\eta
\]
Proof. For each choice of m, m′ ∈ F and l ∈ {0, …, k} define the function θ(m, m′, l) : D̃ → [0, 1] by
\[
\theta(m,m',l):y\mapsto\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}
\]
Then these functions are measurable. Let I_1, …, I_t be a partition of [0, 1] into disjoint intervals, each of length < η. Note that for each choice of m, m′ and l the sets
\[
\theta(m,m',l)^{-1}(I_1),\dots,\theta(m,m',l)^{-1}(I_t)
\]
partition D̃ into a finite number of measurable sets. Let D̃_1, …, D̃_M ⊆ D̃ be the common refinement of the partitions corresponding to every choice of m, m′ and l. Then each D̃_i is an intersection of measurable sets and ν(D̃) = Σ_{i=1}^M ν(D̃_i). There must be some set D = D̃_{i_0} with positive measure. It is easy to check D satisfies the required property. □
Finally, Proposition 12.3 proceeds directly from the following claim.
Claim (3). For any n ∈ N the set
\[
B_n:=\big\{(f,T^{n+m}f,\dots,T^{k(n+m)}f):m\in F\big\}
\]
is an ε-separated subset of L*(y) for all y ∈ ∩_{j=0}^k S^{−jn}D.
Once we have proven Claim 3, fixing n ∈ N and y ∈ ∩_{j=0}^k S^{−jn}D ⊆ D̃, it follows from the definition of D̃ given in (12.8) that the set B_n is maximal ε-separated in L*(y). In particular, if we define F̃ = (n + F) ∪ {0} then, as |F̃| > |F|, the set {(f, …, f)} ∪ B_n cannot be ε-separated in L*(y). Hence we must have some m ∈ F such that for l = 0, …, k
\[
\|T^{l(m+n)}f-f\|_{L^2(\mu_y)}\le\|(f,T^{m+n}f,\dots,T^{k(m+n)}f)-(f,\dots,f)\|_\oplus<\varepsilon
\]
Proof (of Claim 3). By hypothesis the action of S on (Y, D, ν) is SZ, and as ν(D) > 0 this ensures that the set ∩_{j=0}^k S^{−jn}D is non-empty for infinitely many values of n, so the claim is not vacuous. Explicitly,
\[
\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{j=0}^kS^{-jn}D\Big)>0
\]
and so there exists a set of n of positive lower density such that
\[
\nu\Big(\bigcap_{j=0}^kS^{-jn}D\Big)>0
\tag{12.9}
\]
Fix n ∈ N such that ∩_{j=0}^k S^{−jn}D is non-empty. We will demonstrate that the set B_n has the desired properties.
First we show that B_n ⊆ L*(y). Recalling the definition of L*(y), this is true provided that for each y ∈ ∩_{l=0}^k S^{−ln}D we have y ∈ ∩_{l=0}^k S^{−l(m+n)}D′ for every m ∈ F.
Let y ∈ ∩_{l=0}^k S^{−ln}D; then S^{ln}y ∈ D for every l ∈ {0, …, k}, and so it follows g_F(S^{ln}y) > ε + η > 0. We must have S^{ln}y ∈ ∩_{m∈F}∩_{j=0}^k S^{−jm}D′ by the definition of g_F. In particular S^{ln}y ∈ S^{−lm}D′, and so S^{l(m+n)}y = S^{lm}S^{ln}y ∈ D′, as required.
Now we show the ε-separation. As y ∈ D ⊆ D̃, if m ≠ m′ ∈ F then there exists l ∈ {0, …, k} with
\[
\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}>\varepsilon+\eta
\tag{12.10}
\]
On the other hand, by Claim 2 (applied to the pair y, S^{ln}y ∈ D),
\[
\Big|\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}-\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{S^{ln}y})}\Big|<\eta
\tag{12.11}
\]
and so combining (12.10) and (12.11) gives
\[
\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{S^{ln}y})}>\varepsilon
\]
But
\[
\|T^{l(n+m)}f-T^{l(n+m')}f\|_{L^2(\mu_y)}=\|T^{ln}(T^{lm}f-T^{lm'}f)\|_{L^2(\mu_y)}=\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{S^{ln}y})}>\varepsilon
\]
as required. □
This concludes the proof of Proposition 12.3. □
We now turn to the proof of Theorem 12.1. Recall we are trying to show that if α : (X, B, µ, T) → (Y, D, ν, S) is a compact extension and the action of S on (Y, D, ν) is SZ, then the action of T on (X, B, µ) is SZ.
Proof (of Theorem 12.1). Take ε = µ(A)/4(k + 1) in Proposition 12.3 to obtain a set D ∈ D of positive measure and a finite set F ⊂ Z with the stated properties. Temporarily fixing n ∈ N and y ∈ ∩_{l=0}^k S^{−ln}D, there exists m ∈ F such that
\[
\|T^{l(n+m)}f-f\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }l\in\{0,\dots,k\}
\]
Thus, telescoping and using 0 ≤ f ≤ 1, we have
\[
\Big|\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y-\int f^{k+1}\,d\mu_y\Big|
=\Big|\sum_{j=0}^k\int\Big(\prod_{l=0}^{j-1}T^{l(n+m)}f\Big)\big(T^{j(n+m)}f-f\big)f^{k-j}\,d\mu_y\Big|
\le\sum_{j=0}^k\int\big|T^{j(n+m)}f-f\big|\,d\mu_y
\]
By the Cauchy-Schwarz inequality
\[
\int\big|T^{j(n+m)}f-f\big|\,d\mu_y\le\|T^{j(n+m)}f-f\|_{L^2(\mu_y)}<\varepsilon
\]
Hence
\[
\Big|\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y-\int f^{k+1}\,d\mu_y\Big|<(k+1)\varepsilon
\]
By our choice of ε and property 1 of the set A as given in Lemma 12.2 (note ∫f^{k+1} dµ_y = µ_y(A) > ½µ(A) for y ∈ D) we have
\[
\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y\ge\int f^{k+1}\,d\mu_y-(k+1)\varepsilon=\mu_y(A)-\tfrac14\mu(A)\ge\tfrac14\mu(A)
\tag{12.12}
\]
Now the choice of m depends on y ∈ ∩_{l=0}^k S^{−ln}D, but we may exploit the fact that there are only finitely many possible choices of m. Consider the sum
\[
\sum_{m\in F}\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y
\]
For any y ∈ ∩_{l=0}^k S^{−ln}D, the inequality (12.12) must hold for some term in this summation, and all terms are non-negative. Hence
\[
\sum_{m\in F}\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y\ge\tfrac14\mu(A)
\tag{12.13}
\]
Integrating (12.13) over ∩_{l=0}^k S^{−ln}D therefore gives
\[
\sum_{m\in F}\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu\ge\tfrac14\mu(A)\,\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)
\]
Averaging both sides over 1 ≤ n ≤ N and taking the limit as N → ∞ gives
\[
|F|\,\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)\ge\tfrac14\mu(A)\,\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)>0
\]
□
13 The Dichotomy Between Weak Mixing and Compact Extensions
We are now able to conclude the proof of Furstenberg’s Multiple Recurrence Theorem. The final piece of the argument is the following theorem:
Theorem 13.1. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of measure preserving systems. Then one of the following holds:
• Either α : (X, B, µ, T) → (Y, D, ν, S) is a weak mixing extension,
• Or there exists a non-trivial intermediate compact extension. Explicitly, there exists a system (Z, E, θ, R) defined on a Borel probability space, an extension
β : (X, B, µ, T) → (Z, E, θ, R)
and a non-trivial compact extension
γ : (Z, E, θ, R) → (Y, D, ν, S)
with the property α = γ ∘ β.
Once Theorem 13.1 has been established, the Multiple Recurrence Theorem follows directly, as we shall now illustrate:
Proof (of Furstenberg’s Multiple Recurrence Theorem). Suppose (X, B, µ, T) is an invertible, ergodic measure preserving system defined on a Borel probability space. By Theorem 10.1 there exists some maximal SZ factor, (X, Â, µ, T) say. Suppose this is a proper factor. Recall that there exists some system (Y, D, ν, S) defined on a Borel probability space corresponding to this maximal factor; namely, there is an extension α : (X, B, µ, T) → (Y, D, ν, S) such that Â = α^{−1}(D). Now this extension cannot be weak mixing, for otherwise by Theorem 11.7 the action of T on (X, B, µ) would be SZ, contradicting the maximality of (X, Â, µ, T). Therefore, by Theorem 13.1 there exists some system (Z, E, θ, R) defined on a Borel probability space, an extension β : (X, B, µ, T) → (Z, E, θ, R) and a non-trivial compact extension γ : (Z, E, θ, R) → (Y, D, ν, S). By Theorem 12.1 the action of R on (Z, E, θ) is SZ. But this system corresponds to a factor (X, B*, µ, T) where B* = β^{−1}(E) and Â ⊂ B* is a proper inclusion. As the action of T on (X, B*, µ, T) is SZ, this too contradicts maximality. Hence the factor (X, Â, µ, T) cannot be proper, and thus the action of T on (X, B, µ) is SZ as required. □
The proof of Theorem 13.1 runs similarly to that of Theorem 8.1, albeit with substantial modifications. We will assume that the extension is not weak mixing and show there exists a non-trivial intermediate compact extension. To do this, using the correspondence between extensions and factors, it is enough to demonstrate the existence of a T-invariant σ-algebra B* with the following properties:
• It is an intermediate factor: α^{−1}(D) ⊂ B* ⊆ B
• The AP rel. Y functions in L²(X, B*, µ) are dense in L²(X, B*, µ)
Proof (of Theorem 13.1). The argument here is adapted from ([5], Chapter 7; [8], §10). We divide the proof into three claims. The first of these claims mirrors the approach used in proving Theorem 8.1, where we considered a convolution operator.
Assume that the extension α : (X, B, µ, T) → (Y, D, ν, S) is not weak mixing. Then X ×_Y X is not ergodic and so there exists a bounded, non-constant (T × T)-invariant function H ∈ L∞(X ×_Y X). Define the convolution operator H∗ : L²(X, B, µ) → L²(X, B, µ) by
\[
(H*\phi)(x)=\int H(x,x')\phi(x')\,d\mu_{\alpha(x)}(x')\qquad\text{for all }\phi\in L^2(X,B,\mu)
\tag{13.1}
\]
Claim (1). Let φ ∈ L²(X, B, µ) and ε > 0 be given. Then for almost every y ∈ Y there exist functions g_1^y, …, g_{m(y)}^y ∈ L²(X, B, µ_y) such that
\[
\min_{1\le j\le m(y)}\|T^n(H*\phi)-g_j^y\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }n\in\mathbb N_0
\]
Remark 13. Note that Claim 1 does not show each H∗φ is AP rel. Y. For this we would need uniform boundedness: that given ε > 0 there exists a single set of functions g_1, …, g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m}\|T^n(H*\phi)-g_j\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for almost every }y\in Y\text{ and all }n\in\mathbb N_0
\tag{13.2}
\]
Proof (of Claim 1). For almost every y ∈ Y the operator H∗ : L²(X, B, µ_y) → L²(X, B, µ_y) is compact. Also, we have the identity
\[
T^n(H*\phi)=H*(T^n\phi)\qquad\text{for all }\phi\in L^2(X,B,\mu)\text{ and all }n\in\mathbb N
\tag{13.3}
\]
This is easily deduced using the fact that T is measure preserving and H is (T × T)-invariant:
\[
T^n(H*\phi)(x)=\int H(T^nx,x')\phi(x')\,d\mu_{\alpha(T^nx)}(x')
=\int H(T^nx,x')\phi(x')\,d\mu_{S^n\alpha(x)}(x')
=\int H(T^nx,T^nx')\phi(T^nx')\,d\mu_{\alpha(x)}(x')
=\int H(x,x')\phi(T^nx')\,d\mu_{\alpha(x)}(x')=H*(T^n\phi)(x)
\]
Hence clos{T^n(H∗φ) : n ∈ N} is a compact subset of L²(X, B, µ_y) for almost every y ∈ Y, and Claim 1 follows directly. □
Although we have not demonstrated that the H∗φ are AP rel. Y functions,^18 the next claim shows that each H∗φ is arbitrarily close to an AP rel. Y function in the L²(X, B, µ)-norm.
Claim (2). For any φ ∈ L∞(X, B, µ) and any ε > 0 there exists a function f ∈ L²(X, B, µ) such that f is AP rel. Y and ‖f − H∗φ‖_{L²(µ)} < ε.
Proof. Let φ ∈ L∞(X, B, µ) be some fixed function. The case φ ≡ 0 is trivial, so we assume ‖φ‖_∞ ≠ 0. We will construct a function f equal to H∗φ on all of X except for a set where H∗φ is ‘badly behaved’, in a sense presently described.
Fix some y ∈ Y for which the conditional measure µ_y is defined, and some δ > 0. The family of δ-balls {B_δ(n, y)}_{n∈Z}, where
\[
B_\delta(n,y)=\big\{h\in\mathrm{clos}\{T^m(H*\phi):m\in\mathbb Z\}:\|h-T^n(H*\phi)\|_{L^2(\mu_y)}<\delta\big\}
\]
covers the compact set clos{T^m(H∗φ) : m ∈ Z} ⊆ L²(X, B, µ_y). By taking a finite subcover we see that there exists some M ∈ N such that {T^n(H∗φ) : |n| ≤ M} is δ-dense in {T^m(H∗φ) : m ∈ Z}.
Now let m_δ : Y → N be the almost everywhere defined function which takes y ∈ Y to the least positive integer such that {T^m(H∗φ) : |m| ≤ m_δ(y)} is δ-dense in {T^m(H∗φ) : m ∈ Z} ⊆ L²(µ_y). This is a measurable function. To see this, note that it suffices to show that for each M ∈ N the set
\[
m_\delta^{-1}(\{0,\dots,M\})=\{y\in Y:m_\delta(y)\le M\}
\]
^18 It is possible to show that the H∗φ are indeed AP rel. Y (see [5], Theorem 7.21). However, here we follow a different method and show the functions H∗φ can be slightly modified to produce a function f easily seen to be AP rel. Y.
is measurable. This is precisely the set of y ∈ Y with the property that
\[
\min_{|j|\le M}\|T^m(H*\phi)-T^j(H*\phi)\|_{L^2(\mu_y)}<\delta\qquad\text{for all }m\in\mathbb Z
\tag{13.4}
\]
Recall that for any F ∈ L²(X, B, µ) the function y ↦ ∫F dµ_y is measurable by Theorem 9.4. It follows that for each m and j the function Φ : Y → [0, ∞) given by
\[
\Phi:y\mapsto\|T^m(H*\phi)-T^j(H*\phi)\|_{L^2(\mu_y)}
\]
is measurable. The set defined by the condition in (13.4) is a countable intersection (over m) of finite unions (over j) of sets of the form Φ^{−1}([0, δ)), and so m_δ^{−1}({0, …, M}) is measurable as required.
Moreover, for each k ∈ N the set A_k = m_δ^{−1}([k, ∞)) ⊆ Y is measurable. The sequence of sets (A_k)_{k∈N} is decreasing and
\[
\nu(A_k)\to0\qquad\text{as }k\to\infty
\]
Therefore, given η > 0 there exists M(δ, η) ∈ N such that ν(A_n) < η for all n ≥ M(δ, η). Let E(δ, η) = A_{M(δ,η)}. Then ν(E(δ, η)) < η and for almost every y ∈ Y \ E(δ, η) one has m_δ(y) < M(δ, η).
Let ε > 0 be given. Define
\[
\varepsilon_n=\frac{\varepsilon^2}{2^n\|H\|_\infty^2\|\phi\|_\infty^2}\qquad\text{for all }n\in\mathbb N
\]
so that ε² = ‖H‖²_∞‖φ‖²_∞ Σ_{n=1}^∞ ε_n. Also let δ_n = 1/n for all n ∈ N.
Repeat the above argument for each n ∈ N, taking δ = δ_n and η = ε_n, to obtain a sequence of sets E(n) ⊆ Y of measure ν(E(n)) < ε_n and integers M(n) ∈ N such that for y ∈ Y \ E(n) the set {T^m(H∗φ) : |m| ≤ M(n)} is δ_n-dense in clos{T^m(H∗φ) : m ∈ Z} ⊆ L²(µ_y).
Let
\[
f(x)=\begin{cases}0&\text{if }x\in\alpha^{-1}\big(\bigcup_{n\in\mathbb N}E(n)\big)\\ (H*\phi)(x)&\text{otherwise}\end{cases}
\tag{13.5}
\]
Then ‖f − H∗φ‖²_{L²(µ)} ≤ ‖H‖²_∞‖φ‖²_∞ ν(∪_{n∈N}E(n)) < ‖H‖²_∞‖φ‖²_∞ Σ_{n=1}^∞ ε_n = ε².
It remains to show f is AP rel. Y. But this is clear from the construction: given δ > 0 there exists n ∈ N such that δ_n < δ. Then for almost every y ∈ Y \ E(n), the set {T^m(H∗φ) : |m| ≤ M(n)} is δ-dense in {T^m(H∗φ) : m ∈ Z} in the L²(µ_y)-norm. Thus
\[
\{0\}\cup\{T^m(H*\phi):|m|\le M(n)\}
\]
is δ-dense in {T^m f : m ∈ Z} in the L²(µ_y)-norm for almost every y ∈ Y. □
Claim (3). Let F be the algebra generated by the set
\[
\mathscr H=\{H*\phi:H\in L^\infty(\mu\times_Y\mu),\ H\text{ is }(T\times T)\text{-invariant};\ \phi\in L^\infty(X)\}
\]
Denote B* = σ(F). Then the following hold:
1. B* is T-invariant and hence defines a factor (X, B*, µ, T).
2. B* is an intermediate factor: α^{−1}(D) ⊆ B* ⊆ B.
3. The AP rel. Y functions in L²(X, B*, µ) are dense in L²(X, B*, µ).
Proof of Claim 3.
1. The fact that B* is T-invariant follows immediately from the identity
\[
T(H*\phi)=H*(T\phi)
\]
from (13.3), which holds for every element H∗φ of the generating set.
2. Note that the constant function 1 := 1_{X×X} ∈ L∞(µ ×_Y µ) is (T × T)-invariant. For any D ∈ D the function 1^α_D = 1_{α^{−1}(D)} ∈ L∞(X, B, µ). Thus the convolution of 1 and 1^α_D is a B*-measurable function, and it is given by
\[
(1*1^\alpha_D)(x)=\int1^\alpha_D(x')\,d\mu_{\alpha(x)}(x')=E(1^\alpha_D\mid Y)(x)=1^\alpha_D(x)=1_{\alpha^{-1}(D)}(x)
\]
Therefore for every set D ∈ D the function 1_{α^{−1}(D)} is B*-measurable, and subsequently α^{−1}(D) ⊆ B*.
3. By Claim 2, given any H∗φ in the generating set and ε > 0 there exists an AP rel. Y function f such that ‖f − H∗φ‖_{L²(µ)} < ε. The sets E(δ, η) constructed in the proof of Claim 2 are D-measurable. Considering the form of the function f as given in (13.5), it follows that f ∈ L²(X, B*, µ). Thus
\[
\mathscr H\subseteq\mathrm{clos}\{f\in L^2(X,B^*,\mu):f\text{ is AP rel. }Y\}
\tag{13.6}
\]
The set on the right hand side of (13.6) is an algebra and it follows that
\[
F\subseteq\mathrm{clos}\{f\in L^2(X,B^*,\mu):f\text{ is AP rel. }Y\}
\tag{13.7}
\]
It suffices to show that the set F is dense in L²(X, B*, µ). This follows from the following lemma, the proof of which is adapted from [5]. □
Lemma 13.2. Let (X, B, µ) be a probability space. Suppose F ⊆ L∞ (X, B, µ) is an algebra
and 1X ∈ F . Then F is dense in L2 (X, B ∗ , µ) where B ∗ = σ(F ) is the σ-algebra generated
by F .
Proof. It is enough to prove that for all B ∈ B ∗ and any > 0 there exists some g ∈ F such
that
k1B − gkL2 (µ) < In other words, if we denote
C = B ∈ B ∗ : 1B ∈ clos(F )}
then it suffices to show B ∗ = C . Once this is established, by considering linear combinations
and using the fact F is an algebra containing the constants, the space of all simple functions in
L2 (X, B ∗ , µ) must also lie within the L2 -closure of F . By the density of the simple functions
we conclude F is dense in L2 (X, B ∗ , µ).
Sets of the form f⁻¹([a, b]) for f ∈ F and [a, b] ⊂ R generate B*. Using the Stone-Weierstrass
Theorem (see [2]) we show these generators are contained in C. Indeed, given any
function f ∈ F and any compact interval [a, b] ⊂ R, by the Stone-Weierstrass Theorem the
characteristic function 1_{[a,b]} can be approximated in L² by polynomials on the interval [−‖f‖∞, ‖f‖∞].
In particular, for every ε > 0 there exists some p ∈ R[x] such that

    ‖1_{[a,b]} − p‖L²(f∗µ) < ε        (13.8)

where f∗µ is the push-forward Borel measure on R induced by f, that is,

    f∗µ(B) = µ(f⁻¹(B))    for all Borel sets B ⊆ R

It is clear the expression in (13.8) is equivalent to

    ‖1_{f⁻¹([a,b])} − p(f)‖L²(µ) < ε

As F is an algebra containing the constants, p(f) ∈ F and hence f⁻¹([a, b]) ∈ C. It remains to show C is a σ-algebra.
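The equivalence invoked here is the change-of-variables formula for the push-forward measure, together with the identity 1_{[a,b]} ∘ f = 1_{f⁻¹([a,b])}; spelled out:

```latex
\[
  \int_{\mathbb{R}} \big| 1_{[a,b]} - p \big|^2 \, d(f_*\mu)
    \;=\; \int_X \big| 1_{[a,b]}(f(x)) - p(f(x)) \big|^2 \, d\mu(x)
    \;=\; \big\| 1_{f^{-1}([a,b])} - p(f) \big\|_{L^2(\mu)}^2 .
\]
```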
Clearly X ∈ C. For any f ∈ F we have 1X − f ∈ F and it follows C is closed under
complementation. Suppose A, B ∈ C and let ε > 0 be given. Then there exist f1, f2 ∈ F such
that

    ‖1A − f1‖L²(µ) < ε/2    and    ‖1B − f2‖L²(µ) < ε/(2‖f1‖∞)

Hence, writing 1A1B − f1f2 = (1A − f1)1B + f1(1B − f2),

    ‖1A1B − f1f2‖L²(µ) ≤ ‖1B‖∞‖1A − f1‖L²(µ) + ‖f1‖∞‖1B − f2‖L²(µ) < ε

so A ∩ B ∈ C. Moreover, by considering 1_{A∪B} = 1A + 1B − 1A1B we have A ∪ B ∈ C. Thus,
given any sequence (An)n∈N ⊂ C and ε > 0, it is easy to deduce ⋃_{n=1}^∞ An ∈ C by noting that the
function 1_{⋃_{n=1}^∞ An} can be approximated in L²(X, B, µ) by functions of the form 1_{⋃_{n=1}^N An}.
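The final approximation rests on nothing more than continuity of the measure from below:

```latex
\[
  \Big\| 1_{\bigcup_{n=1}^{\infty} A_n} - 1_{\bigcup_{n=1}^{N} A_n} \Big\|_{L^2(\mu)}^2
    \;=\; \mu\Big( \bigcup_{n=1}^{\infty} A_n \setminus \bigcup_{n=1}^{N} A_n \Big)
    \;\xrightarrow[\;N \to \infty\;]{}\; 0 ,
\]
```

since the partial unions increase to the full union and µ is a (finite) probability measure.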
This concludes the proof of the Multiple Recurrence Theorem and therefore of Szemerédi's Theorem.
References
[1] Ergodic Ramsey Theory: where Combinatorics and Number Theory meet Dynamics;
http://matheuscmss.wordpress.com/2010/02/15/ ; viewed 17-01-2011.
[2] B. Bollobás; Linear Analysis; Second Edition; Cambridge University Press, 1999.
[3] N. Chernov, R. Markarian; Chaotic Billiards (Mathematical Surveys and Monographs);
AMS, USA, 2006.
[4] J. Doob; Measure Theory; GTM 143; Springer-Verlag, New York 1994.
[5] M. Einsiedler, T. Ward; Ergodic Theory, with a view towards Number Theory; GTM 259;
Springer-Verlag, London, 2011.
[6] P. Erdős, P. Turán; ‘On some sequences of integers’; J. London Math. Soc. 11, 1936,
261-264.
[7] H. Furstenberg; Ergodic Behaviour of Diagonal Measures and a Theorem of Szemerédi on
Arithmetic Progressions; J. d’Analyse Math. Vol. 31, 1977.
[8] H. Furstenberg; Recurrence in Ergodic Theory and Combinatorial Number Theory; Princeton University Press, 1992.
[9] H. Furstenberg, Y. Katznelson; ‘An ergodic Szemerédi theorem for commuting transformations’; J. d’Analyse Math. 34, 1978, 275-291.
[10] H. Furstenberg, Y. Katznelson, D. Ornstein; The Ergodic Theoretical Proof of Szemerédi’s
Theorem; Bulletin (New Series) of the American Mathematical Society, Vol. 7, Num. 3,
November 1982.
[11] R. Graham, B. Rothschild; ‘A short proof of van der Waerden’s theorem on arithmetic
progressions’; Proc. Amer. Math. Soc. 42, 1974, 385-386.
[12] B. Hasselblatt, A. Katok; Principal Structures, from Handbook of Dynamical Systems 1A;
Elsevier Science, 2002.
[13] S. Kalikow, R. McCutcheon; An Outline of Ergodic Theory, Cambridge studies in advanced
mathematics 122; Cambridge University Press, Cambridge, 2010.
[14] A. Klenke; Probability Theory, A Comprehensive Course; Springer-Verlag, London, 2008.
[15] B. Koopman, J. von Neumann; ‘Dynamical systems of continuous spectra’; Proc. Nat. Acad.
Sci. USA 18, 1932, 255-263.
[16] S. Luzzatto; Stochastic-Like Behaviour in Nonuniformly Expanding Maps, from Handbook
of Dynamical Systems 1B; Elsevier Science, 2006.
[17] M. Nadkarni; Basic Ergodic Theory, TRIM 6; Hindustan Book Agency, India, 1995.
[18] K. R. Parthasarathy; Probability Measures on Metric Spaces; Academic Press, 1967.
[19] K. Petersen; Ergodic Theory; Cambridge University Press, 1983.
[20] K. Roth; ‘Sur quelques ensembles d’entiers’; C. R. Acad. Sci. Paris, 234, 1952, 388-390.
[21] C. Silva; Invitation to Ergodic Theory, Student Mathematical Library Vol. 42; AMS, USA,
2008.
[22] E. Szemerédi; ‘On sets of integers containing no four elements in arithmetic progression’;
Acta Math. Acad. Sci. Hungar. 20, 1969, 89-104.
[23] E. Szemerédi; ‘On sets of integers containing no k elements in arithmetic progression’; Acta
Math. Acad. Sci. Hungar. 27, 1975, 199-245.
[24] N. Young; An Introduction to Hilbert Space; Cambridge University Press, 1988.
[25] D. Williams; Probability with Martingales; Cambridge University Press, 1991.