Script: Diophantine Approximation
A. Kresch
Spring 2016
1 Elementary theory
Diophantine approximation is the study of approximations of real numbers by rational numbers. Given a real number α, we ask, how well can we approximate α by
a rational number p/q? For a fixed value of q, the real number qα differs from some
integer by an amount less than 1:
$$\forall\, q \in \mathbb{N}_{>0} \ \ \exists\, p \in \mathbb{Z} : \left|\alpha - \frac{p}{q}\right| < \frac{1}{q}.$$
Suppose we no longer fix the value of q, but allow q to vary. Our first task will be
to obtain an improved bound, in which $1/q$ is replaced by $1/q^2$; when α is irrational
the improved bound
$$\exists\, p \in \mathbb{Z} : \left|\alpha - \frac{p}{q}\right| < \frac{1}{q^2}$$
will be satisfied for infinitely many values of q.
When we attempt to replace 2 by a greater exponent on the right-hand side,
the validity of such a bound is connected with the nature of the number α, and
particularly whether α is algebraic or transcendental. We will obtain results that,
when α is algebraic, restrict the exponents for which such a bound may be attained
for more than just finitely many values of q. The first of these, Liouville’s theorem,
is elementary in nature, having been presented in connection with the historically
first transcendental numbers to be exhibited (Liouville 1844).
1.1 Dirichlet’s approximation theorem
The proof of Dirichlet’s theorem on Diophantine approximation, which furnishes
the improved bound mentioned above, is a beautiful application of the pigeonhole
principle, the assertion that, given a positive integer Q, any function mapping a set
of cardinality greater than Q to a set of cardinality Q must assign the same value
to some pair of distinct elements of the domain.
Proposition 1.1 (Dirichlet’s approximation theorem). Let α ∈ R and Q ∈ N>0 .
Then there exist integers p and q with 1 ≤ q ≤ Q, such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{Qq}.$$
Proof. We consider the function from {0, 1, . . . , Q} to {0, 1, . . . , Q − 1}, defined by
$$q \mapsto \lfloor Q(q\alpha - \lfloor q\alpha \rfloor) \rfloor.$$
In other words, the fractional part $q\alpha - \lfloor q\alpha \rfloor$ of qα is rounded down to the nearest
multiple of 1/Q, and the function records the multiple of 1/Q obtained in this
manner. By the pigeonhole principle, there exist integers
$$0 \le q < q' \le Q$$
where the function has the same value; in particular,
$$|q'\alpha - q\alpha - p| < \frac{1}{Q}$$
for some integer p. It follows that
$$\left|\alpha - \frac{p}{q' - q}\right| < \frac{1}{Q(q' - q)}.$$
Since $1 \le q' - q \le Q$, we have the desired conclusion.
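The pigeonhole argument is constructive, and a minimal Python sketch of it may help. The function name `dirichlet` and the choice of test input are assumptions made for illustration; exact rational arithmetic is used so that the floor computations are reliable.

```python
from fractions import Fraction
from math import floor, sqrt


def dirichlet(alpha, Q):
    """Return (p, q) with 1 <= q <= Q and |alpha - p/q| < 1/(Q*q).

    alpha should support exact arithmetic (for example a Fraction).
    """
    first_seen = {}  # box index -> the q that first landed in that box
    for q in range(Q + 1):
        fractional = q * alpha - floor(q * alpha)  # lies in [0, 1)
        box = floor(Q * fractional)                # lies in {0, ..., Q-1}
        if box in first_seen:
            q_small = first_seen[box]
            p = floor(q * alpha) - floor(q_small * alpha)
            return p, q - q_small
        first_seen[box] = q
    raise AssertionError("unreachable: Q+1 values cannot avoid Q boxes")


# Hypothetical test input: the exact rational value of the double nearest sqrt(2).
alpha = Fraction(sqrt(2))
for Q in (1, 10, 100, 1000):
    p, q = dirichlet(alpha, Q)
    assert 1 <= q <= Q
    assert abs(alpha - Fraction(p, q)) < Fraction(1, Q * q)
```

The search stops at the first collision, so it returns some valid pair, not necessarily the one with the smallest denominator.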
Corollary 1.2. For α ∈ R the set of rational numbers p/q satisfying
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^2}, \tag{1}$$
where p and q are relatively prime integers, is:
(i) infinite, if α is irrational;
(ii) finite, if α is rational, in which case there exists real C > 0 such that for any
integers p and q with q > 0 and p/q ≠ α we have
$$\left|\alpha - \frac{p}{q}\right| \ge \frac{C}{q},$$
and more generally for any real number s > 1 the inequality, as in (1) but with
exponent 2 replaced by s, is also satisfied for only finitely many rational numbers
p/q.
Proof. For any Q ∈ N>0 , the p and q that we obtain from Proposition 1.1 may be
taken to be relatively prime (by dividing out their gcd) and yield p/q satisfying (1).
If α ∉ Q then for any finite set S of rational numbers, with sufficiently large Q the
conclusion of Proposition 1.1 does not hold for any element of S. Therefore the set
of rational numbers satisfying (1) is infinite. If α ∈ Q, say, $\alpha = p_0/q_0$ for integers $p_0$
and $q_0$ with $q_0 > 0$, then
$$\alpha - \frac{p}{q} = \frac{p_0 q - p q_0}{q_0 q}.$$
So (ii) holds with $C = 1/q_0$ and implies, for any integers p and q with $q \ge q_0^{1/(s-1)}$
and p/q ≠ α, that $|\alpha - p/q| \ge 1/(q_0 q) \ge 1/q^s$.
Notice that Corollary 1.2(ii) yields an amusingly simple proof of the irrationality
of the number e:
$$0 < e - \sum_{k=0}^{n} \frac{1}{k!} = \sum_{k=n+1}^{\infty} \frac{1}{k!} < \frac{2}{(n+1)!} = \frac{2}{n+1} \cdot \frac{1}{n!}$$
for n ∈ N. (The proof of the transcendence of e, as presented in the Algebra I lecture
and discussed below, is more sophisticated and makes use of a nontrivial auxiliary
function.)
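As a numerical illustration of this estimate, the following Python sketch compares the partial sums of the series for e with the bound 2/(n + 1)!. The helper name `partial_sum` is an assumption, and the gap is only estimated in floating point, so this is a sanity check for small n and not a proof.

```python
from fractions import Fraction
from math import e, factorial


def partial_sum(n):
    """Exact value of sum_{k=0}^{n} 1/k!."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))


for n in range(11):
    q = factorial(n)
    p = partial_sum(n) * q            # an integer, since k! divides n! for k <= n
    assert p.denominator == 1
    gap = e - float(partial_sum(n))   # floating-point estimate of e - p/q
    assert 0 < gap < 2 / factorial(n + 1)
    # Hence q * |e - p/q| < 2/(n+1), which eventually drops below any fixed C > 0.
    assert q * gap < 2 / (n + 1)
```

With q = n! the quantity q·|e − p/q| tends to 0, which is incompatible with the lower bound C/q of Corollary 1.2(ii) for a rational number.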
The nth Farey sequence is the increasing sequence of rational numbers between
0 and 1 (including the endpoints), with denominator less than or equal to n. E.g.,
for n = 5,
$$0, \ \frac{1}{5}, \ \frac{1}{4}, \ \frac{1}{3}, \ \frac{2}{5}, \ \frac{1}{2}, \ \frac{3}{5}, \ \frac{2}{3}, \ \frac{3}{4}, \ \frac{4}{5}, \ 1.$$
Lemma 1.3. If $p/q$ and $p'/q'$ are adjacent terms in the nth Farey sequence for some
n, where $p, q, p', q'$ are integers with $\gcd(p, q) = \gcd(p', q') = 1$, then
$$|p'q - pq'| = 1.$$
Proof. There is no loss of generality in assuming that $p, q, p', q'$ are nonnegative,
and
$$\frac{p}{q} < \frac{p'}{q'}.$$
We first observe that there are uniquely determined integers r and s satisfying
$$qr - ps = 1 \quad\text{and}\quad 1 \le s \le q.$$
Since q − p ≥ 1 = qr − ps, which implies q(s − 1) ≥ p(s − 1) ≥ q(r − 1), we have
$$s \ge r > 0.$$
In particular, r/s occurs in the nth Farey sequence, somewhere to the right of p/q.
We prove the lemma by contradiction. Suppose
$$p'q - pq' \ge 2.$$
Since $p'/q'$ is adjacent to $p/q$ in the nth Farey sequence, we must have
$$\frac{p'}{q'} < \frac{r}{s}.$$
We claim that $p' - (p'q - pq' - 1)r$ and $q' - (p'q - pq' - 1)s$ are positive, relatively prime
integers satisfying
$$\frac{p}{q} < \frac{p' - (p'q - pq' - 1)r}{q' - (p'q - pq' - 1)s} < \frac{p'}{q'}.$$
Indeed,
$$p'q - (p'q - pq' - 1)qr - \bigl(pq' - (p'q - pq' - 1)ps\bigr) = p'q - pq' - (p'q - pq' - 1) = 1,$$
$$p'q' - (p'q - pq' - 1)p's - \bigl(p'q' - (p'q - pq' - 1)q'r\bigr) = (p'q - pq' - 1)(q'r - p's) > 0,$$
$$q' - (p'q - pq' - 1)s = q(q'r - p's) + s > 0.$$
The claim stands in contradiction to the hypothesis that $p'/q'$ is adjacent to $p/q$ in
the nth Farey sequence.
Lemma 1.4. (i) If $p/q$ and $p'/q'$ are adjacent terms in the nth Farey sequence, for
some integers $p, q, p', q'$ with $q, q' > 0$ and $\gcd(p, q) = \gcd(p', q') = 1$, then
$$(p' + p)q - p(q + q') = p'(q + q') - q'(p + p') = 1,$$
and the rational number
$$\frac{p + p'}{q + q'}$$
lies between $p/q$ and $p'/q'$.
(ii) If $p/q$, $p''/q''$, and $p'/q'$ are three adjacent terms in the nth Farey sequence, for
some integers $p, q, p'', q'', p', q'$ with $q, q'', q' > 0$ and $\gcd(p, q) = \gcd(p'', q'') =
\gcd(p', q') = 1$, then
$$\frac{p''}{q''} = \frac{p + p'}{q + q'}.$$
Proof. For (i) we may suppose without loss of generality that $p/q < p'/q'$. So $p'q - pq' = 1$
by Lemma 1.3. This implies $(p' + p)q - p(q + q') = 1$ and $p'(q + q') - q'(p + p') = 1$.
As a consequence,
$$\frac{p}{q} < \frac{p + p'}{q + q'} < \frac{p'}{q'}.$$
For (ii) we may suppose without loss of generality that $p/q < p''/q'' < p'/q'$. By Lemma
1.3,
$$p''q - pq'' = p'q'' - p''q' = 1.$$
So $pq'' + p'q'' - p''q - p''q' = -1 + 1 = 0$.
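Lemmas 1.3 and 1.4(ii) are easy to test by brute force. The following Python sketch builds Farey sequences naively (the helper name `farey` is an assumption, and no attempt is made at efficiency) and checks both statements for small n.

```python
from fractions import Fraction


def farey(n):
    """The nth Farey sequence as a sorted list of reduced fractions in [0, 1]."""
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)})


for n in range(1, 30):
    seq = farey(n)
    # Lemma 1.3: adjacent terms p/q < p'/q' satisfy p'q - pq' = 1.
    for left, right in zip(seq, seq[1:]):
        det = right.numerator * left.denominator - left.numerator * right.denominator
        assert det == 1
    # Lemma 1.4(ii): the middle of three adjacent terms is the mediant of its neighbours.
    for left, mid, right in zip(seq, seq[1:], seq[2:]):
        assert mid == Fraction(left.numerator + right.numerator,
                               left.denominator + right.denominator)
```

The set comprehension removes duplicates such as 1/2 = 2/4, because `Fraction` always stores values in lowest terms.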
Proposition 1.5. If $p/q$ and $p'/q'$ are adjacent terms in the nth Farey sequence,
for some integers $p, q, p', q'$ with $q, q' > 0$ and $\gcd(p, q) = \gcd(p', q') = 1$, then for
every real number α between $p/q$ and $p'/q'$ at least one of the following inequalities
holds:
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{\sqrt{5}\,q^2}, \qquad \left|\alpha - \frac{p + p'}{q + q'}\right| < \frac{1}{\sqrt{5}\,(q + q')^2}, \qquad \left|\alpha - \frac{p'}{q'}\right| < \frac{1}{\sqrt{5}\,q'^2}.$$
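Proof. As before we may suppose $p/q < p'/q'$. Define $p'' := p + p'$ and $q'' := q + q'$.
We argue by contradiction. Suppose, first, we have
$$\frac{p}{q} < \alpha < \frac{p''}{q''}$$
and all three inequalities fail, i.e.,
$$\alpha - \frac{p}{q} \ge \frac{1}{\sqrt{5}\,q^2}, \qquad \frac{p''}{q''} - \alpha \ge \frac{1}{\sqrt{5}\,q''^2}, \qquad \frac{p'}{q'} - \alpha \ge \frac{1}{\sqrt{5}\,q'^2}.$$
Adding pairs of inequalities and applying Lemmas 1.3 and 1.4(i), we obtain
$$\frac{1}{qq''} = \frac{p''}{q''} - \frac{p}{q} \ge \frac{1}{\sqrt{5}}\left(\frac{1}{q^2} + \frac{1}{q''^2}\right)$$
and
$$\frac{1}{qq'} = \frac{p'}{q'} - \frac{p}{q} \ge \frac{1}{\sqrt{5}}\left(\frac{1}{q^2} + \frac{1}{q'^2}\right).$$
So we have
$$\sqrt{5}\,qq'' \ge q^2 + q''^2 \quad\text{and}\quad \sqrt{5}\,qq' \ge q^2 + q'^2.$$
Adding, expanding, and rearranging, we obtain
$$0 \ge 2\left(\frac{\sqrt{5} - 1}{2}\,q - q'\right)^2,$$
which is impossible. The argument for the case $p''/q'' < \alpha < p'/q'$ is similar.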
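Proposition 1.5 can be tested exactly by squaring the inequality, which avoids any floating-point value of √5. In the Python sketch below the helper names and the sample value of α are assumptions chosen for illustration.

```python
from bisect import bisect_left
from fractions import Fraction
from math import pi


def farey(n):
    """The nth Farey sequence as a sorted list of reduced fractions in [0, 1]."""
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)})


def within_hurwitz(alpha, p, q):
    """Exact test of |alpha - p/q| < 1/(sqrt(5) q^2), obtained by squaring both sides."""
    return 5 * q**4 * (alpha - Fraction(p, q)) ** 2 < 1


# Hypothetical sample: the exact rational value of a double in (0, 1).
alpha = Fraction(pi - 3)
for n in range(1, 40):
    seq = farey(n)
    i = bisect_left(seq, alpha)          # seq[i-1] < alpha < seq[i]
    left, right = seq[i - 1], seq[i]
    candidates = [
        (left.numerator, left.denominator),
        (left.numerator + right.numerator, left.denominator + right.denominator),
        (right.numerator, right.denominator),
    ]
    assert any(within_hurwitz(alpha, p, q) for p, q in candidates)
```

The sample α has a power of two with a large exponent as denominator, so it never coincides with a Farey fraction of small order and lies strictly between two adjacent terms.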
Corollary 1.6 (Hurwitz). For every irrational number α ∈ R there exist infinitely
many rational numbers p/q, where p and q are relatively prime integers satisfying
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{\sqrt{5}\,q^2}.$$
Proof. There is no loss of generality in supposing that 0 < α < 1. Then, for every
n ∈ N>0 , Proposition 1.5 supplies a rational number p/q that satisfies the desired
condition. It remains to show that there are infinitely many such rational numbers.
Given any finite set S of rational numbers, we take n to be large enough, so that
|γ − α| > 1/n for every γ ∈ S. The adjacent terms $p/q$ and $p'/q'$ in the nth Farey
sequence, with $p/q < \alpha < p'/q'$, are therefore not in S, and as well $(p + p')/(q + q')$
is not in S by Lemma 1.4(i). So Proposition 1.5 supplies a rational number p/q that
satisfies the desired condition and is not in S.
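The constant in Corollary 1.6 is optimal. Indeed, when
$$\alpha = \frac{\sqrt{5} - 1}{2}$$
we have, for any ε > 0, that there are only finitely many rational numbers p/q, with
p and q relatively prime integers satisfying
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{(\sqrt{5} + \varepsilon)\,q^2}. \tag{2}$$
We see this by writing
$$x^2 + x - 1 = \left(x - \frac{\sqrt{5} - 1}{2}\right)\left(x - \frac{-\sqrt{5} - 1}{2}\right).$$
So,
$$\left|\alpha - \frac{p}{q}\right| = \frac{|p^2 + pq - q^2|}{\left|\frac{-\sqrt{5} - 1}{2} - \frac{p}{q}\right| q^2}. \tag{3}$$
When $|(-\sqrt{5} - 1)/2 - p/q| > \sqrt{5} + \varepsilon$ we have $|\alpha - p/q| > \varepsilon$, and this is compatible
with (2) only for $q^2 < \varepsilon^{-1}(\sqrt{5} + \varepsilon)^{-1}$, i.e., only for finitely many values of q. When
$|(-\sqrt{5} - 1)/2 - p/q| \le \sqrt{5} + \varepsilon$, the right-hand side of (3) is at least $1/\bigl((\sqrt{5} + \varepsilon)q^2\bigr)$.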
We recall, an algebraic number is a root of a nontrivial polynomial with integer
coefficients. An algebraic number α has a minimal polynomial, the unique monic
polynomial in Q[x] that divides every polynomial f ∈ Q[x] satisfying f (α) = 0. The
degree of an algebraic number α is the degree of the minimal polynomial of α.
Proposition 1.7 (Liouville). Let α ∈ R be an algebraic number of degree n ≥ 2.
Then there exists a positive constant C such that every rational number p/q, where
p and q are relatively prime integers with q > 0, satisfies
$$\left|\alpha - \frac{p}{q}\right| \ge \frac{C}{q^n}.$$
The proof follows the same argument as that given to establish the optimality of
the constant in Hurwitz’s theorem, with the minimal polynomial of α factored into
n linear factors in C[x].
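Proof. Let f ∈ Q[x] be the minimal polynomial of α, and let M ∈ N>0 be a common
denominator of the coefficients of f , so that we may write
$$f(x) = \frac{1}{M} \sum_{i=0}^{n} A_i x^i$$
with $A_i \in \mathbb{Z}$ for all i. We factor f (x) over the complex numbers as
$$f(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n),$$
with $\alpha_1, \dots, \alpha_n \in \mathbb{C}$ and $\alpha_1 = \alpha$. We have the following equality, where the
numerator on the right-hand side is a nonzero integer:
$$\left|\alpha - \frac{p}{q}\right| = \frac{|A_n p^n + A_{n-1} p^{n-1} q + \cdots + A_0 q^n|}{M \left|\alpha_2 - \frac{p}{q}\right| \cdots \left|\alpha_n - \frac{p}{q}\right| q^n}.$$
We claim that the result holds with
$$C := \frac{1}{M \prod_{i=2}^{n} (|\alpha_i - \alpha| + 1)}.$$
Indeed, we have
$$\left|\alpha - \frac{p}{q}\right| \ge \frac{1}{M q^n \prod_{i=2}^{n} \left|\alpha_i - \frac{p}{q}\right|} \ge \frac{1}{M q^n \prod_{i=2}^{n} \left(|\alpha_i - \alpha| + \left|\alpha - \frac{p}{q}\right|\right)},$$
and this implies the result: if $|\alpha - p/q| \le 1$ then the right-hand side is at least $C/q^n$,
and otherwise $|\alpha - p/q| > 1 \ge C/q^n$.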
Corollary 1.8. Let α ∈ R be an algebraic number of degree n ≥ 2. Then for every
real number s > n, the inequality
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^s}$$
is satisfied for at most finitely many rational numbers p/q, with p and q relatively
prime integers and q > 0.
Proof. Let C be as in Proposition 1.7, and let s > n. For any positive integer q with
$$q^{s-n} \ge \frac{1}{C},$$
we have, for any integer p,
$$\left|\alpha - \frac{p}{q}\right| \ge \frac{C}{q^n} \ge \frac{1}{q^s}.$$
So for any rational number p/q satisfying the inequality in the statement we must
have $q < C^{-1/(s-n)}$.
Corollary 1.9. Let α ∈ R be an irrational number. Suppose that there exist a
sequence $(p_i/q_i)$ of rational numbers tending to α, with $p_i$ and $q_i$ relatively prime
integers and $q_i > 0$ for every i, and an unbounded increasing sequence of real numbers
$(s_i)$, with
$$\left|\alpha - \frac{p_i}{q_i}\right| < \frac{1}{q_i^{s_i}}$$
for every i. Then α is transcendental.
Proof. By Corollary 1.2(ii), α is irrational. If we suppose that α is algebraic of degree
n ≥ 2, then by taking i such that $s_i > n$ we obtain a contradiction to Corollary 1.8
for $s = s_i$.
Example. For Liouville’s number
$$\sum_{j=1}^{\infty} \frac{1}{10^{j!}},$$
we may take for $p_i/q_i$ the partial sums. We have, then, $q_i = 10^{i!}$. The hypothesis of
Corollary 1.9 is satisfied with $s_i = i$, hence Liouville’s number is transcendental.
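The hypothesis of Corollary 1.9 for Liouville’s number can be checked with exact rational arithmetic for the first few partial sums. The sketch below is illustrative only: the cutoff N and the helper name are assumptions, and the true value of the number is replaced by an explicit upper bound obtained from a geometric-series estimate of the tail.

```python
from fractions import Fraction
from math import factorial


def liouville_partial_sum(i):
    """Exact value of sum_{j=1}^{i} 10^(-j!)."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, i + 1))


N = 5
# The tail beyond the Nth partial sum is below 2 * 10^(-(N+1)!),
# so `upper` is strictly larger than Liouville's number.
upper = liouville_partial_sum(N) + Fraction(2, 10 ** factorial(N + 1))
for i in range(1, N):
    approx = liouville_partial_sum(i)
    q_i = 10 ** factorial(i)
    assert approx.denominator == q_i          # p_i / q_i is already in lowest terms
    assert 0 < upper - approx < Fraction(1, q_i ** i)
```

Because the partial sums increase to the limit, the distance from the limit to the ith partial sum is less than `upper - approx`, which is what the final assertion bounds by $1/q_i^{\,i}$.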
Definition. The irrationality measure of a real number α is the supremum of
the set of real numbers s such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^s}$$
holds for infinitely many rational numbers p/q, with relatively prime integers p and
q and q > 0.
The observations and results stated so far tell us:
• Every real number has irrationality measure at least 1.
• If α ∈ Q then the irrationality measure of α is equal to 1.
• If α ∉ Q then the irrationality measure of α is at least 2.
• If α is algebraic of degree n ≥ 2 then the irrationality measure of α is ≤ n.
• As consequences:
– Quadratic irrational numbers have irrationality measure 2.
– If α has infinite irrationality measure, then α is transcendental.
It is also a nice exercise to show that the real numbers with irrationality measure
greater than 2 form a set of Lebesgue measure zero. A highlight of this lecture will
be Roth’s theorem, which is the statement that irrational algebraic numbers all have
irrationality measure equal to 2.
1.2 Continued fractions
An important source of approximations of an irrational number α ∈ R is the continued fraction expansion, an expression of the form
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \tag{1}$$
with $a_0 \in \mathbb{Z}$ and $a_i \in \mathbb{N}_{>0}$ for i ≥ 1. In the other direction, given the sequence $(a_i)$
we may define the above expression by truncating and passing to the limit.
Definition. Let $a_0, a_1, \dots$ be a sequence of integers with $a_i > 0$ for i ≥ 1. We
define rational numbers $[a_0, \dots, a_n]$ for n ∈ N, called convergents, recursively by
$$[a_0] := a_0, \qquad [a_0, a_1, \dots, a_n] := a_0 + \frac{1}{[a_1, \dots, a_n]}.$$
We will see soon that the sequence of numbers [a0 , . . . , an ] does, in fact, converge.
First we record some basic properties.
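Proposition 1.10. Given a sequence of integers $a_0, a_1, \dots$ with $a_i > 0$ for i ≥ 1,
we define
$$p_{-1} := 1, \quad q_{-1} := 0, \quad p_0 := a_0, \quad q_0 := 1, \quad c_0 := a_0,$$
and recursively, for n > 0,
$$p_n := a_n p_{n-1} + p_{n-2}, \qquad q_n := a_n q_{n-1} + q_{n-2}, \qquad c_n := \frac{p_n}{q_n}.$$
Then $(q_n)_{n \in \mathbb{N}_{>0}}$ is an increasing sequence of positive integers, $p_n$ and $q_n$ are relatively
prime for every n, and the following identities are valid for $n \in \mathbb{N}_{>0}$:
$$\begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix} = \begin{pmatrix} p_{n-1} & p_{n-2} \\ q_{n-1} & q_{n-2} \end{pmatrix} \begin{pmatrix} a_n & 1 \\ 1 & 0 \end{pmatrix} \tag{2}$$
$$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1} \tag{3}$$
$$p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n \tag{4}$$
$$c_n = [a_0, a_1, \dots, a_n]. \tag{5}$$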
Proof. That $(q_n)_{n \in \mathbb{N}_{>0}}$ is an increasing sequence is clear from the definition, as is
equation (2). Evaluating determinants, we obtain (3) by an inductive argument, and
deduce as a consequence that $p_n$ and $q_n$ are relatively prime. A matrix equation,
similar to (2) but with the two subscripts n − 1 changed to n − 2 on the left-hand
side and the 0 and 1 in the rightmost column swapped on the right-hand side, lets
us deduce (by evaluating determinants) equation (4). To obtain (5), we define a
new sequence by $\tilde{a}_n := a_{n+1}$, and with this, sequences $(\tilde{p}_n)$ and $(\tilde{q}_n)$, which by an
inductive argument are seen to satisfy
$$p_n = a_0 \tilde{p}_{n-1} + \tilde{q}_{n-1} \quad\text{and}\quad q_n = \tilde{p}_{n-1}.$$
(This inductive argument takes the previous two cases as induction hypothesis.)
Now (5) follows by straightforward induction on n.
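The recurrence of Proposition 1.10 translates directly into code. The following Python sketch, with assumed helper names `convergents` and `nested` and an arbitrary sample sequence, computes the $p_n$ and $q_n$ and checks identities (3), (4), and (5).

```python
from fractions import Fraction


def convergents(a):
    """Return lists p, q with p[n]/q[n] = [a[0], ..., a[n]], via the recurrence."""
    p_prev, q_prev = 1, 0          # p_{-1}, q_{-1}
    p_cur, q_cur = a[0], 1         # p_0, q_0
    p, q = [p_cur], [q_cur]
    for a_n in a[1:]:
        p_prev, p_cur = p_cur, a_n * p_cur + p_prev
        q_prev, q_cur = q_cur, a_n * q_cur + q_prev
        p.append(p_cur)
        q.append(q_cur)
    return p, q


def nested(a):
    """Evaluate [a[0], ..., a[-1]] directly from the recursive definition."""
    value = Fraction(a[-1])
    for term in reversed(a[:-1]):
        value = term + 1 / value
    return value


a = [2, 1, 2, 1, 1, 4, 1, 1, 6]    # arbitrary sample with a[i] > 0 for i >= 1
p, q = convergents(a)
for n in range(1, len(a)):
    assert p[n] * q[n - 1] - p[n - 1] * q[n] == (-1) ** (n - 1)      # identity (3)
    assert Fraction(p[n], q[n]) == nested(a[: n + 1])                # identity (5)
for n in range(2, len(a)):
    assert p[n] * q[n - 2] - p[n - 2] * q[n] == (-1) ** n * a[n]     # identity (4)
```

Identity (4) is only checked from n = 2 here because the lists start at index 0 and do not store $p_{-1}$ and $q_{-1}$.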
Corollary 1.11. With notation as above, the numbers $c_n = [a_0, a_1, \dots, a_n]$ form a
convergent sequence. If we define $\alpha := \lim_{n\to\infty} c_n$ then the following properties hold.
(i) The subsequence $(c_{2n})$ increases toward α.
(ii) The subsequence $(c_{2n+1})$ decreases toward α.
(iii) We have α ∉ Q.
(iv) For every n ∈ N we have
$$\frac{1}{(a_{n+1} + 2)\,q_n^2} < \left|\alpha - \frac{p_n}{q_n}\right| < \frac{1}{a_{n+1}\,q_n^2}.$$
Proof. We have, by (3) and (4),
$$c_0 < c_2 < c_4 < \cdots < c_5 < c_3 < c_1,$$
and
$$|c_{n+1} - c_n| = \frac{1}{(a_{n+1} q_n + q_{n-1})\,q_n} \le \frac{1}{a_{n+1}\,q_n^2}$$
for n ∈ N. So $\lim_{n\to\infty} c_n$ exists, and the limit α satisfies properties (i), (ii), and the
portion $|\alpha - c_n| < 1/(a_{n+1} q_n^2)$ of (iv), which by Corollary 1.2(ii) implies (iii). As well,
$$|c_{n+2} - c_n| < |\alpha - c_n|,$$
and by the calculation
$$|c_{n+2} - c_n| = \frac{a_{n+2}}{(a_{n+2} a_{n+1} + 1)\,q_n^2 + a_{n+2}\,q_{n-1} q_n} \ge \frac{a_{n+2}}{(a_{n+2} a_{n+1} + a_{n+2} + 1)\,q_n^2} \ge \frac{1}{(a_{n+1} + 2)\,q_n^2}$$
we obtain the remaining part of (iv).
Definition. Given a sequence of integers a0 , a1 , . . . with ai > 0 for i ≥ 1, the
corresponding infinite continued fraction shown in (1) and denoted as well by
[a0 , a1 , . . . ],
is defined to be the limit α appearing in Corollary 1.11.
Corollary 1.12. We have
$$[a_0, a_1, \dots] = a_0 + \frac{1}{[a_1, \dots]}.$$
Proof. The sequence of convergents $[a_0, a_1, \dots, a_n]$ converges to $[a_0, a_1, \dots]$, and the
sequence of quantities $a_0 + 1/[a_1, \dots, a_n]$ converges to $a_0 + 1/[a_1, \dots]$.
Example. When $a_i = 1$ for every i we find
$$[1, 1, \dots] = \frac{\sqrt{5} + 1}{2}.$$
Indeed, α := [1, 1, . . . ] satisfies α = 1 + 1/α, which implies $\alpha = (\pm\sqrt{5} + 1)/2$, and since
1 < α < 2 (Corollary 1.11) the sign must be +. This continued fraction is connected
with the Fibonacci numbers, given by $F_0 := 0$, $F_1 := 1$, and $F_n := F_{n-1} + F_{n-2}$;
specifically, $p_n = F_{n+2}$ and $q_n = F_{n+1}$. As well we have (solution to linear recurrence
relation)
$$F_n = \frac{1}{\sqrt{5}} \left( \left(\frac{1 + \sqrt{5}}{2}\right)^{n} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n} \right).$$
Given a finite sequence of integers a0 , a1 , . . . , an for some n ∈ N, with ai > 0 for
1 ≤ i ≤ n, we have the rational number [a0 , . . . , an ] (defined as above), which may
be described as a finite continued fraction. The next result shows that every
rational number may be expressed as a finite continued fraction, in a manner that
is unique up to the substitution of (an − 1) + 1/1 for an ≥ 2, and every irrational
number is in a unique way an infinite continued fraction.
Proposition 1.13. Let β ∈ R. We define $a_0 := \lfloor\beta\rfloor$, and if β ∈ Z we set n := 0,
otherwise we define $\beta_1 := 1/(\beta - a_0)$. Recursively, given $\beta_k$, we define $a_k := \lfloor\beta_k\rfloor$,
and if $\beta_k \in \mathbb{Z}$ we set n := k, otherwise $\beta_{k+1} := 1/(\beta_k - a_k)$. If β ∈ Q then the
procedure terminates with
$$[a_0, \dots, a_n] = \beta$$
with n = 0 or $a_n \ge 2$, and $[a_0, \dots, a_n]$ and $[a_0, \dots, a_n - 1, 1]$ are the only ways to
express β as a finite continued fraction. If β ∉ Q, then the procedure defines an
infinite sequence $a_0, a_1, \dots$, uniquely characterized by the property
$$[a_0, a_1, \dots] = \beta.$$
Moreover, if β ∈ Q then $[a_i, \dots, a_n] = \beta_i$ for i ≤ n; if β ∉ Q then $[a_i, a_{i+1}, \dots] = \beta_i$
for all i.
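For rational β the procedure of Proposition 1.13 terminates and can be run exactly. The following Python sketch (the function name `continued_fraction` is an assumption) follows the steps literally: take the floor, stop if the current value is an integer, otherwise invert the fractional part.

```python
from fractions import Fraction
from math import floor


def continued_fraction(beta):
    """Partial quotients [a_0, ..., a_n] of a rational number beta."""
    beta = Fraction(beta)
    quotients = []
    while True:
        a_k = floor(beta)
        quotients.append(a_k)
        if beta == a_k:            # beta_k is an integer: stop, n := k
            return quotients
        beta = 1 / (beta - a_k)    # beta_{k+1} := 1/(beta_k - a_k)


assert continued_fraction(Fraction(415, 93)) == [4, 2, 6, 7]
assert continued_fraction(Fraction(11, 100)) == [0, 9, 11]
assert continued_fraction(-Fraction(11, 100)) == [-1, 1, 8, 11]
```

The output always ends with n = 0 or a last term at least 2, in line with the statement; the alternative expansion ending in 1 is never produced by this procedure.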
Proof. For i ≥ 1, we have $\beta_i > 1$, with $\beta_i \in \mathbb{Q}$ if and only if β ∈ Q. Furthermore,
we claim
$$\begin{cases} [a_0, \dots, a_i] \le \beta < [a_0, \dots, a_{i-1}], & \text{if } i \text{ is even}, \\ [a_0, \dots, a_{i-1}] < \beta \le [a_0, \dots, a_i], & \text{if } i \text{ is odd}, \end{cases}$$
with equality in each case if and only if $\beta_i \in \mathbb{Z}$. Indeed, we have
$$\beta - [a_0, \dots, a_i] = (-1)^i \, \frac{\beta_i - a_i}{\beta_1 \cdots \beta_i \, [a_1, \dots, a_i] \cdots [a_{i-1}, a_i] \, a_i}$$
by an inductive argument, where we apply the induction hypothesis in the form
of the expression for $\beta_1 - [a_1, \dots, a_i]$. The procedure either stops with $a_0, \dots, a_n$
with n = 0 or $a_n \ge 2$, and $\beta = [a_0, \dots, a_n] = [a_0, \dots, a_n - 1, 1]$, or yields an infinite
sequence with $\beta = [a_0, a_1, \dots]$.
Finally, if we apply the procedure to $\beta := [a_0, \dots, a_n]$ with n = 0 or $a_n \ge 2$, then
we obtain $\beta_i = [a_i, \dots, a_n]$ and $\lfloor\beta_i\rfloor = a_i$ for i = 1, . . . , n, and with $\beta := [a_0, a_1, \dots]$
we obtain $\beta_i = [a_i, a_{i+1}, \dots]$ and $\lfloor\beta_i\rfloor = a_i$ for all $i \in \mathbb{N}_{>0}$. (The latter assertion
uses Corollary 1.12.) These observations justify the uniqueness assertions.
Lemma 1.14. Given β ∈ R, let sequences $(\beta_k)$, $(p_k)$, and $(q_k)$ be as in Proposition
1.13 and $(c_k)$ the associated sequence (finite or infinite) of convergents. Let k ∈ N be
such that $c_k = p_k/q_k$ and $c_{k+1} = p_{k+1}/q_{k+1}$ are defined (always the case, if β ∉ Q).
(i) We have
$$\beta = \frac{\beta_{k+1} p_k + p_{k-1}}{\beta_{k+1} q_k + q_{k-1}}.$$
(ii) For any integers p and q with $0 < q < q_{k+1}$, we have $|p - q\beta| \ge |p_k - q_k\beta|$.
Proof. We prove (i) by induction on k. The case k = 0 is clear. For the inductive
step, from $\beta_k = a_k + 1/\beta_{k+1}$ and the induction hypothesis we have
$$\beta = \frac{\beta_{k+1}(a_k p_{k-1} + p_{k-2}) + p_{k-1}}{\beta_{k+1}(a_k q_{k-1} + q_{k-2}) + q_{k-1}} = \frac{\beta_{k+1} p_k + p_{k-1}}{\beta_{k+1} q_k + q_{k-1}}.$$
For (ii), we define $r := p_k q - q_k p$ and $s := p_{k+1} q - q_{k+1} p$. By Proposition 1.10,
$$s p_k - r p_{k+1} = (-1)^k p \quad\text{and}\quad s q_k - r q_{k+1} = (-1)^k q.$$
Notice that, because of the hypothesis imposed on q, we must have s ≠ 0. For the same
reason we must have rs ≥ 0. So by Corollary 1.11(i)–(ii) we have
$$|p - q\beta| = |(s p_k - r p_{k+1}) - (s q_k - r q_{k+1})\beta| = |s(p_k - q_k\beta) - r(p_{k+1} - q_{k+1}\beta)| = |s(p_k - q_k\beta)| + |r(p_{k+1} - q_{k+1}\beta)|,$$
and the desired inequality follows.
Proposition 1.15. Given β ∈ R the associated sequence of convergents contains
any rational number p/q with integers p and q satisfying
$$\left|\beta - \frac{p}{q}\right| < \frac{1}{2q^2}.$$
Proof. The result is clear if β = p/q, so we suppose the contrary. There is no loss
of generality in supposing that p and q are relatively prime, with q positive. We let
k ∈ N be such that $q_k \le q < q_{k+1}$. By Lemma 1.14(ii) we have $|p - q\beta| \ge |p_k - q_k\beta|$.
So
$$\left|\frac{p}{q} - \frac{p_k}{q_k}\right| \le \left|\frac{p}{q} - \beta\right| + \left|\frac{p_k}{q_k} - \beta\right| < \frac{1}{2q^2} + \frac{1}{2qq_k} \le \frac{1}{qq_k},$$
and this forces $p/q = p_k/q_k$.
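Proposition 1.15 lends itself to a brute-force check. In the Python sketch below the sample value of β and the search bound are assumptions for illustration, and the starting values for index −2 are a programming convenience (not notation from the text) chosen so that the first step of the recurrence produces $p_0 = a_0$ and $q_0 = 1$.

```python
from fractions import Fraction
from math import floor, sqrt


def convergent_set(beta):
    """All convergents of the rational number beta (Propositions 1.10 and 1.13)."""
    found = set()
    p_prev, q_prev, p_cur, q_cur = 0, 1, 1, 0   # indices -2 and -1
    x = beta
    while True:
        a_k = floor(x)
        p_prev, p_cur = p_cur, a_k * p_cur + p_prev
        q_prev, q_cur = q_cur, a_k * q_cur + q_prev
        found.add(Fraction(p_cur, q_cur))
        if x == a_k:
            return found
        x = 1 / (x - a_k)


# Hypothetical test input: the exact rational value of the double nearest sqrt(2).
beta = Fraction(sqrt(2))
good = convergent_set(beta)
for q in range(1, 2000):
    p = round(beta * q)           # only the nearest numerator can be within 1/(2q^2)
    if abs(beta - Fraction(p, q)) < Fraction(1, 2 * q * q):
        assert Fraction(p, q) in good
```

Only the integer nearest to qβ needs testing for each q, since any other numerator is at distance at least 1/(2q) from β.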
Proposition 1.15 is a powerful result. It tells us that if we want to check some
property about approximations of an irrational number by rational numbers, it
suffices to focus our attention just on the convergents, at least for questions about
approximations by p/q to within $1/(2q^2)$. We illustrate this with the next result.
Let us say that an irrational number α ∈ R is badly approximable if there
exists a positive constant C such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{C}{q^2}$$
holds for only finitely many rational numbers p/q, with p and q relatively prime
integers.
Proposition 1.16. An irrational number $\alpha = [a_0, a_1, \dots]$ is badly approximable if
and only if the sequence $(a_n)$ is bounded.
Proof. Suppose that α is badly approximable, and let C be as in the definition. So in
particular, the convergents $p_n/q_n$ for sufficiently large n satisfy $|\alpha - p_n/q_n| \ge C/q_n^2$.
Then for such n we have $a_{n+1} < 1/C$ by Corollary 1.11(iv), and thus the sequence
$(a_n)$ is bounded.
For the reverse implication, we suppose that $(a_n)$ is bounded and suppose specifically that $n_0, K \in \mathbb{N}$ are such that $a_n \le K$ for all $n > n_0$. We claim that the definition
of badly approximable for α is satisfied with C = 1/(K + 2). Indeed, any rational
number p/q satisfying the inequality is a convergent, by Proposition 1.15. For the
convergents $p_n/q_n$ with $n \ge n_0$ we have $|\alpha - p_n/q_n| > C/q_n^2$ by Corollary 1.11(iv).
A classical fact is that an irrational number α has (eventually) periodic continued
fraction expansion
$$[a_0, \dots, a_{m-1}, \overline{a_m, \dots, a_n}]$$
if and only if α belongs to a quadratic extension of Q. We give a proof of this fact
below. Proposition 1.16 will then tell us that all such numbers are badly approximable, a fact that also follows directly from Liouville’s theorem (Proposition 1.7).
Let d ≥ 2 be a squarefree integer and $\mathbb{Q}(\sqrt{d})$ the corresponding quadratic extension,
with nontrivial automorphism (sending $\sqrt{d}$ to $-\sqrt{d}$) denoted by
$$\beta \mapsto \beta'.$$
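An element $\beta \in \mathbb{Q}(\sqrt{d})$ is said to be reduced if
$$\beta > 1 \quad\text{and}\quad -1 < \beta' < 0.$$
We define
$$\iota(\beta) := -\frac{1}{\beta'}$$
for $\beta \in \mathbb{Q}(\sqrt{d})^{\times}$ and observe that ι(ι(β)) = β, and if β is reduced then so is ι(β).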
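Lemma 1.17. The formula
$$\beta \mapsto \frac{1}{\beta - \lfloor\beta\rfloor}$$
defines a bijective map from the set of reduced elements of $\mathbb{Q}(\sqrt{d})$ to itself, with
inverse given by
$$\beta \mapsto \iota\!\left(\frac{1}{\iota(\beta) - \lfloor\iota(\beta)\rfloor}\right).$$
Proof. If β is reduced, then in particular β is irrational. We have $0 < \beta - \lfloor\beta\rfloor < 1$,
and hence $1/(\beta - \lfloor\beta\rfloor) > 1$. As well, $\lfloor\beta\rfloor$ is positive, so $\beta' - \lfloor\beta\rfloor < -1$, and hence
$1/(\beta' - \lfloor\beta\rfloor)$ lies between −1 and 0.
Applying ι to $1/(\beta - \lfloor\beta\rfloor)$, we obtain $-\beta' + \lfloor\beta\rfloor$. It is then straightforward to
verify the formula for the inverse.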
For an irrational number β in $\mathbb{Q}(\sqrt{d})$, we consider the minimal polynomial of β,
scaled to have integer coefficients with gcd 1,
$$Ax^2 + Bx + C, \tag{6}$$
and define the discriminant of β to be $B^2 - 4AC$.
Lemma 1.18. The map in Lemma 1.17 preserves the discriminant, and for given
$D \in \mathbb{N}_{>0}$ there are only finitely many reduced elements of $\mathbb{Q}(\sqrt{d})$ of discriminant D.
Proof. Subtracting an integer from β does not change the discriminant, as we may
see by an explicit calculation. The discriminant of 1/β is equal to the discriminant of β,
since the minimal polynomial (rescaled, as above) has the same coefficients, in reverse
order. Since the map in Lemma 1.17 is built out of these operations, the discriminant
is preserved. For the second assertion, we consider a (scaled) minimal polynomial (6),
which without loss of generality may be assumed to have positive leading coefficient
A. The roots are $(-B \pm \sqrt{D})/2A$. Only the larger root $(-B + \sqrt{D})/2A$ has a
chance to be reduced, and this requires $\sqrt{D} > 2A + B$ and $B > -\sqrt{D}$, which imply
$|B| < \sqrt{D}$ and $A < \sqrt{D}$. Finally, C is determined by A, B, and D, so there are
only finitely many reduced elements of $\mathbb{Q}(\sqrt{d})$ of discriminant D.
Proposition 1.19. An irrational number β ∈ R has periodic continued fraction
expansion if and only if β belongs to a quadratic extension of Q.
Proof. It is straightforward to see that any periodic continued fraction belongs to
a quadratic extension of Q (by arguing, essentially, as in the example $[1, 1, \dots] =
(\sqrt{5} + 1)/2$). It remains to show that any irrational number in $\mathbb{Q}(\sqrt{d})$ has periodic
continued fraction (where d ≥ 2 is a squarefree integer). The argument proceeds in
two steps: (i) we show that the sequence $(\beta_n)$ of Proposition 1.13 contains a reduced
element; (ii) we establish the periodicity of the subsequence of reduced elements and
hence of the corresponding integers $a_n = \lfloor\beta_n\rfloor$.
For (i) we have, for irrational $\beta \in \mathbb{Q}(\sqrt{d})$, the following consequence of Lemma
1.14(i):
$$-\frac{1}{\beta_{n+1}} = \frac{\beta q_n - p_n}{\beta q_{n-1} - p_{n-1}} = \frac{q_n}{q_{n-1}} + \frac{(-1)^n}{\left(\beta - \frac{p_{n-1}}{q_{n-1}}\right) q_{n-1}^2},$$
where the second equality uses (3) of Proposition 1.10. We apply the nontrivial
automorphism of $\mathbb{Q}(\sqrt{d})$ to the left- and right-hand sides to obtain a new relation
containing the expression $\beta' - p_{n-1}/q_{n-1}$, which converges to $\beta' - \beta$ and hence for
sufficiently large n has constant sign. It follows that $\beta_n$ is reduced, for some n.
For (ii) we observe by Lemma 1.17, if $\beta_n$ is reduced then so is $\beta_{n+1}$. In combination with Lemma 1.18, this shows that the sequence $\beta_n, \beta_{n+1}, \dots$ is periodic.
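The periodicity can be observed concretely for square roots. The sketch below is not the argument of the proof; it uses a standard integer representation, $\beta_k = (m + \sqrt{d})/c$ with integers m and c, which is an assumption introduced here for illustration. The period is detected when a pair (m, c) repeats, mirroring the fact that only finitely many values $\beta_k$ can occur.

```python
from math import isqrt


def sqrt_continued_fraction(d):
    """Return (preperiod, period) of the continued fraction of sqrt(d), d not a square."""
    root = isqrt(d)
    assert root * root != d, "d must not be a perfect square"
    m, c = 0, 1                      # beta_k = (m + sqrt(d)) / c, kept in exact integers
    seen = {}                        # state (m, c) -> index k at which it first occurred
    quotients = []
    while (m, c) not in seen:
        seen[(m, c)] = len(quotients)
        a_k = (m + root) // c        # floor of beta_k, valid because c > 0
        quotients.append(a_k)
        m = a_k * c - m              # beta_{k+1} = 1/(beta_k - a_k), rationalised
        c = (d - m * m) // c         # the division is exact in this recurrence
    start = seen[(m, c)]
    return quotients[:start], quotients[start:]


assert sqrt_continued_fraction(2) == ([1], [2])
assert sqrt_continued_fraction(7) == ([2], [1, 1, 1, 4])
```

For example, the second assertion says that √7 has the expansion 2, followed by the block 1, 1, 1, 4 repeated indefinitely.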
Next, in this brief treatment of continued fractions, we recall the proof of the
transcendence of e by means of auxiliary functions, which as we will see, following
H. Cohn, A short proof of the simple continued fraction expansion of e, American
Mathematical Monthly volume 113 (2006), pp. 57–62, may also be used to deduce
the continued fraction expansion. Let $d \in \mathbb{N}_{>0}$ and $c_0, \dots, c_d \in \mathbb{Z}$ be given, with
$c_0 \ne 0$; we assert that
$$\sum_{j=0}^{d} c_j e^j \ne 0,$$
and since d may be increased arbitrarily (by introducing additional coefficients, equal
to zero), we may suppose that
$$A := \sum_{j=0}^{d} c_j \prod_{\substack{i=0 \\ i \ne j}}^{d} (j - i)$$
is nonzero. We choose an integer n > |A| so that
$$C e^d (d^{d+2})^n < n!$$
where
$$C := \sum_{j=0}^{d} |c_j|.$$
Let us introduce the auxiliary function (a function of j)
$$\frac{1}{n!}\, e^j \int_0^j x^n (x - 1)^n \cdots (x - d)^n e^{-x}\, dx.$$
For j ∈ {0, 1, . . . , d} we have
$$\left| \frac{1}{n!}\, e^j \int_0^j x^n (x - 1)^n \cdots (x - d)^n e^{-x}\, dx \right| \le \frac{1}{n!}\, e^j j\, d^{n(d+1)} \le \frac{1}{n!}\, e^d d^{n(d+2)}.$$
For the linear combination of function values with coefficients $c_j$, then,
$$\left| \sum_{j=0}^{d} \frac{c_j}{n!}\, e^j \int_0^j x^n (x - 1)^n \cdots (x - d)^n e^{-x}\, dx \right| \le \frac{C}{n!}\, e^d (d^{d+2})^n < 1. \tag{7}$$
If g(x) is any polynomial and we write $g + g' + \cdots$ for the sum of derivatives of
all orders (a finite sum since $g^{(k)} = 0$ for sufficiently large k), then
$$\frac{d}{dx}\Bigl[(g + g' + \cdots)\,e^{-x}\Bigr] = -g(x)\,e^{-x}.$$
This observation lets us evaluate the integral in the auxiliary function. Setting
$$g(x) := x^n (x - 1)^n \cdots (x - d)^n,$$
the linear combination of function values in (7) evaluates to
$$\frac{1}{n!} \sum_{j=0}^{d} c_j e^j \int_0^{\infty} g(x)\,e^{-x}\, dx \; - \; \sum_{j=0}^{d} \frac{c_j}{n!} \bigl(g(j) + g'(j) + \cdots\bigr). \tag{8}$$
The second sum in (8) is an integer, since the derivatives to order less than n
contribute nothing and n! divides the coefficients of $g^{(k)}$ for every k ≥ n.
The remainder of the argument is number-theoretic in nature. We may suppose
that n = p, a prime number, and we examine the value of the second sum in (8)
mod p. By expanding $g^{(k)}$ for k ≥ p with the iterated Leibniz rule and discarding
all terms divisible by x − j or by p · p! we are left to consider only the contribution
from $g^{(p)}$, and from the general fact $a^p \equiv a \bmod p$ (Fermat’s little theorem) we find
$$\sum_{j=0}^{d} \frac{c_j}{p!} \bigl(g(j) + g'(j) + \cdots\bigr) \equiv A \mod p.$$
Since p > |A| and A is nonzero, the second sum in (8) is a nonzero integer. Compatibility with (7)
requires the first sum to be nonzero, as desired.
We now focus on the case d = 1 of the above argument, where the value at j = 1
of the auxiliary function is
$$A_n := \frac{e}{n!} \int_0^1 x^n (x - 1)^n e^{-x}\, dx.$$
We introduce some similar expressions:
$$B_n := \frac{e}{n!} \int_0^1 x^n (x - 1)^{n+1} e^{-x}\, dx,$$
$$C_n := \frac{e}{n!} \int_0^1 x^{n+1} (x - 1)^n e^{-x}\, dx.$$
Each of these quantities is a Z-linear combination of 1 and e. For instance with
g(x) as above,
$$A_n = \frac{e}{n!} \bigl(g(0) + g'(0) + \cdots\bigr) - \frac{1}{n!} \bigl(g(1) + g'(1) + \cdots\bigr).$$
There is also the obvious relation
$$C_n = A_n + B_n.$$
We obtain two further relations, for n ≥ 1:
$$A_n = B_{n-1} + C_{n-1}, \qquad B_n = 2nA_n + C_{n-1},$$
as consequences of
$$\frac{d}{dx}\Bigl[x^n (x - 1)^n e^{-x}\Bigr] = n x^{n-1} (x - 1)^n e^{-x} + n x^n (x - 1)^{n-1} e^{-x} - x^n (x - 1)^n e^{-x},$$
$$\frac{d}{dx}\Bigl[x^{n+1} (x - 1)^n e^{-x}\Bigr] = n x^n (x - 1)^n e^{-x} + x^n (x - 1)^n e^{-x} + n x^n (x - 1)^n e^{-x} + n x^n (x - 1)^{n-1} e^{-x} - x^{n+1} (x - 1)^n e^{-x}.$$
Proposition 1.20. We have
$$e = [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \dots].$$
Proof. Let the displayed continued fraction define $p_n$ and $q_n$ for n = −1, 0, 1, . . . .
Now the above relations, plus the starting data
$$B_0 = -1 \quad\text{and}\quad C_0 = -2 + e$$
(which we obtain by direct computation) yield, by an inductive argument,
$$A_n = -p_{3n-2} + q_{3n-2}\,e, \qquad B_n = -p_{3n-1} + q_{3n-1}\,e, \qquad C_n = -p_{3n} + q_{3n}\,e.$$
As a consequence of (7), the sequence $(A_n)$ is bounded (in absolute value). This
implies, since $(q_n)_{n \in \mathbb{N}_{>0}}$ is an increasing sequence, that the convergents $p_{3n-2}/q_{3n-2}$
tend to e. It follows that the continued fraction is equal to e.
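The inductive step can be replayed mechanically. In the following Python sketch, which uses assumed helper names, a pair (u, v) stands for the number u + v·e; the recurrences for $A_n$, $B_n$, $C_n$ are iterated from the starting data and compared with the convergents of the displayed continued fraction.

```python
from math import e


def e_quotients(count):
    """First `count` terms of [2, 1, 2, 1, 1, 4, 1, 1, 6, ...]."""
    a = [2]
    k = 1
    while len(a) < count:
        a.extend([1, 2 * k, 1])
        k += 1
    return a[:count]


def convergents(a):
    """Lists p, q of numerators and denominators from the recurrence of Proposition 1.10."""
    p_prev, q_prev, p_cur, q_cur = 1, 0, a[0], 1
    p, q = [p_cur], [q_cur]
    for a_n in a[1:]:
        p_prev, p_cur = p_cur, a_n * p_cur + p_prev
        q_prev, q_cur = q_cur, a_n * q_cur + q_prev
        p.append(p_cur)
        q.append(q_cur)
    return p, q


N = 10
p, q = convergents(e_quotients(3 * N + 1))

B = (-1, 0)          # B_0 = -1
C = (-2, 1)          # C_0 = -2 + e
assert C == (-p[0], q[0])
for n in range(1, N + 1):
    A = (B[0] + C[0], B[1] + C[1])                    # A_n = B_{n-1} + C_{n-1}
    B = (2 * n * A[0] + C[0], 2 * n * A[1] + C[1])    # B_n = 2n A_n + C_{n-1}
    C = (A[0] + B[0], A[1] + B[1])                    # C_n = A_n + B_n
    assert A == (-p[3 * n - 2], q[3 * n - 2])
    assert B == (-p[3 * n - 1], q[3 * n - 1])
    assert C == (-p[3 * n], q[3 * n])

# Floating-point sanity check that the convergents approach e.
assert abs(p[3 * N] / q[3 * N] - e) < 1e-12
```

The check covers only the integer bookkeeping of the induction; the analytic input, namely the boundedness of $(A_n)$, is not something this code verifies.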
The proof of the following corollary uses the general observation that the sequence
(qn ) of any infinite continued fraction grows at least exponentially. Indeed, in the
example [1, 1, . . . ] this sequence consists of Fibonacci numbers, whose growth is
asymptotically exponential. The Fibonacci sequence (with a shift in index by one)
is thus a lower bound for the sequence in general, since in the general case we have
qn = an qn−1 + qn−2 ≥ qn−1 + qn−2 .
Corollary 1.21. The number e has irrationality measure equal to 2.
Proof. Since e is irrational, the irrationality measure must be at least 2. So it
remains only to show, for ε > 0, that the inequality
$$\left|e - \frac{p}{q}\right| < \frac{1}{q^{2+\varepsilon}}$$
holds for only finitely many rational numbers p/q (where p and q are relatively prime
integers, with q > 0). When $q \ge 2^{1/\varepsilon}$, the inequality implies $|e - p/q| < 1/(2q^2)$, which
by Proposition 1.15 implies that p/q appears in the sequence of convergents to e.
For n ∈ N the terms $a_n$ in the continued fraction expansion given in Proposition
1.20 satisfy
$$a_{n+1} + 2 \le \frac{2n + 10}{3},$$
hence by Corollary 1.11,
$$\left|e - \frac{p_n}{q_n}\right| > \frac{3}{(2n + 10)\,q_n^2}.$$
Since the sequence $(q_n)$ grows at least exponentially, the linear expression (2n + 10)/3
is smaller than $q_n^{\varepsilon}$ for all sufficiently large n, and we are done.
Liouville’s number, and other numbers whose definition follows a similar pattern,
have continued fraction expansion that follows a regular pattern. This was observed
by J. Shallit in a pair of articles published in the Journal of Number Theory (volume
11, pages 209–217 and volume 14, pages 228–231).
Lemma 1.22. For positive integers a1 , a2 , . . . with a1 ≥ 2, −[0, a1 , . . . , an ] =
[−1, 1, a1 −1, a2 , . . . , an ] for all n ∈ N>0 , and −[0, a1 , a2 , . . . ] = [−1, 1, a1 −1, a2 , . . . ].
Proof. This results by comparing, with a shift by one, the respective sequences of
convergents.
Lemma 1.23. Let x := [0, a1 , . . . , an ] be a finite continued fraction for some positive
integer n, with an ≥ 2; we write x = p/q where p and q are positive relatively prime
integers. Then for any integer $m \ge 2$,
\[ x + \frac{(-1)^n}{mq^2} = [0, a_1, \dots, a_n, m-1, 1, a_n - 1, a_{n-1}, \dots, a_1]. \]
Proof. By Lemma 1.22, we have
[m − 1, 1, an − 1, an−1 , . . . , a1 ] = m − [0, an , . . . , a1 ].
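Let us adopt the notation of Proposition 1.10, so e.g., $p = p_n$ and $q = q_n$. From (2)
we have the matrix identity
\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_n & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix}. \]
Transposing and conjugating by the non-identity $2 \times 2$ permutation matrix, we find
\[ [0, a_n, \dots, a_1] = \frac{q_{n-1}}{q_n}. \]
Now by Lemma 1.14(i),
\begin{align*}
[0, a_1, \dots, a_n, m-1, 1, a_n - 1, a_{n-1}, \dots, a_1]
&= \frac{\bigl(m - \frac{q_{n-1}}{q_n}\bigr) p_n + p_{n-1}}{\bigl(m - \frac{q_{n-1}}{q_n}\bigr) q_n + q_{n-1}} \\
&= \frac{m p_n q_n - p_n q_{n-1} + p_{n-1} q_n}{m q_n^2} \\
&= \frac{p_n}{q_n} + \frac{(-1)^n}{m q_n^2},
\end{align*}
where in the last step we have used (3).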
Example. Starting with $1/10 + 1/100 = [0, 9, 11]$ and applying Lemma 1.23 repeatedly we obtain the continued fraction expansion of Liouville’s number
\[ [0, 9, 11, 10^{1\cdot 2!} - 1, 1, 10, 9, 10^{2\cdot 3!} - 1, 1, 8, 10, 1, 10^{1\cdot 2!} - 1, 11, 9, \dots]. \]
Similarly, starting with $1/10 + 1/1000 = [0, 9, 1, 9, 10]$ we obtain, for $\sum_{i=0}^{\infty} 10^{-3^i}$, the
continued fraction expansion
\[ [0, 9, 1, 9, 10, 10^{3^1} - 1, 1, 9, 9, 1, 9, 10^{3^2} - 1, 1, 8, 1, 9, 9, 1, 10^{3^1} - 1, 10, 9, 1, 9, \dots]. \]
It is a nice exercise to use the continued fraction expansion and Proposition 1.15 to
show that the number $\sum_{i=0}^{\infty} 10^{-3^i}$ has irrationality measure 3.
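The folding step of Lemma 1.23 is easy to test with exact rational arithmetic. The sketch below is illustrative only; the helper names `cf_value` and `fold` are not from the script, and the example reproduces the first two steps toward Liouville's number.

```python
from fractions import Fraction

def cf_value(terms):
    """Value of the finite continued fraction [a_0, a_1, ..., a_n] as a Fraction."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

def fold(terms, m):
    """Expansion given by Lemma 1.23 for terms = [0, a_1, ..., a_n] with a_n >= 2, m >= 2:
    [0, a_1, ..., a_n, m - 1, 1, a_n - 1, a_{n-1}, ..., a_1]."""
    a = terms[1:]
    return terms + [m - 1, 1, a[-1] - 1] + a[-2::-1]

# First two folding steps for Liouville's number, starting from 1/10 + 1/100.
x0 = [0, 9, 11]
x1 = fold(x0, 10 ** 2)      # adds 1/(10^2 * 100^2) = 10^-6
x2 = fold(x1, 10 ** 12)     # adds 1/(10^12 * (10^6)^2) = 10^-24
```

Here `cf_value(x1)` equals `Fraction(110001, 10**6)`, and each further call to `fold` appends the next term of the series, since $n$ stays even and the denominator stays a power of 10.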
1.3
Heights and coefficients
The classical auxiliary functions in §1.1 and §1.2, minimal polynomials of algebraic
numbers and (linear combinations of) residuals of Taylor series approximations to
$e^x$, are naturally suggested by the respective irrational numbers under consideration.
The more modern results in Diophantine approximation that will be presented make
use of auxiliary functions which cannot be written down explicitly. Instead, one
keeps careful track of the sizes of coefficients and deduces the existence of suitable
auxiliary functions from clever application of the pigeonhole principle. Here we
introduce some notions and results that will be used in demonstrating the existence
of and working with such auxiliary functions.
Let f ∈ Z[x] be a nonzero polynomial. The most elementary measure of size of
f is the degree, which is applicable as well to an algebraic number α (as the degree
of the minimal polynomial). Now we define the height of $f = a_n x^n + \dots + a_1 x + a_0$
($a_i \in \mathbb{Z}$ for all $i$) to be the maximum of the absolute values of the coefficients:
\[ H(f) := \max_{0 \le i \le n} |a_i|; \]
the height of an integer polynomial in several variables is, similarly, the maximum
of the absolute values of the coefficients of all the monomials. If α is an algebraic
number, then as mentioned before, the minimal polynomial may be scaled to have
integer coefficients with gcd 1, and the maximum of the absolute values of the
coefficients is the height H(α). (The scaled polynomial is unique up to sign, so the
absolute values of coefficients are well-defined.) For example, for relatively prime
integers $p$ and $q$ with $q \ne 0$, the rational number $p/q$ has height $\max(|p|, |q|)$.
The following important property is clear from the fact that there are only finitely
many integer polynomials with degree and absolute value of coefficients bounded by
given quantities.
Fact (Northcott property). Let n be a positive integer and T a positive real number.
Among all algebraic numbers α of degree at most n, those with H(α) ≤ T are finite
in number.
For instance, the set of algebraic numbers of degree at most 2 and height at
most 1 consists of three rational numbers, 0 and $\pm 1$, the real quadratic numbers
$(\sqrt{5} \pm 1)/2$ and $(-\sqrt{5} \pm 1)/2$, and the non-real 4th and 6th roots of unity.
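The finite set in this instance of the Northcott property can be enumerated directly. The following is a rough sketch restricted to degree at most 2 (where irreducibility over $\mathbb{Q}$ reduces to a discriminant test); the function names are illustrative.

```python
from itertools import product
from math import gcd, isqrt

def is_square(n):
    """True if n is a perfect square (negative numbers are not)."""
    return n >= 0 and isqrt(n) ** 2 == n

def minimal_polynomials(max_height):
    """Primitive irreducible integer polynomials c2*x^2 + c1*x + c0 of degree 1 or 2
    and height <= max_height, normalized to have positive leading coefficient."""
    found = []
    values = range(-max_height, max_height + 1)
    for c2, c1, c0 in product(values, repeat=3):
        if c2 < 0 or (c2 == 0 and c1 <= 0):
            continue                      # fix the sign, skip constants
        if gcd(gcd(c2, c1), c0) != 1:
            continue                      # coefficients must have gcd 1
        if c2 > 0 and is_square(c1 * c1 - 4 * c2 * c0):
            continue                      # rational roots, hence reducible
        found.append((c2, c1, c0))
    return found

def count_algebraic_numbers(max_height):
    """Number of algebraic numbers of degree <= 2 and height <= max_height."""
    return sum(2 if c2 else 1 for c2, _, _ in minimal_polynomials(max_height))
```

For `max_height = 1` the eight polynomials found are $x$, $x \pm 1$, $x^2 \pm x - 1$, $x^2 + 1$ and $x^2 \pm x + 1$, accounting for the 13 numbers listed above.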
Let $\alpha$ be an algebraic integer of degree $n$. Then $\mathbb{Z}[\alpha]$ is a free $\mathbb{Z}$-module of rank
$n$; the standard basis is $1, \alpha, \alpha^2, \dots, \alpha^{n-1}$. Now we are interested in the coefficients
of a power of $\alpha$ with respect to this basis. Since the powers up to $n-1$ of $\alpha$ are the
basis elements, we make a statement that applies to powers at least $n$.
Lemma 1.24. Let $\alpha$ be an algebraic integer of degree $n$ and height $H$. Then for
$i \ge n$ the coefficients of $\alpha^i$ with respect to the standard $\mathbb{Z}$-basis $1, \alpha, \dots, \alpha^{n-1}$ of
$\mathbb{Z}[\alpha]$ have absolute value at most $(H+1)^{i-n+1}$.
Proof. We let $x^n + a_{n-1}x^{n-1} + \dots + a_0$ be the minimal polynomial of $\alpha$, with $a_i \in \mathbb{Z}$
for $i = 0, \dots, n-1$, and we prove the statement by induction on $i$. For the base
case $i = n$ we have
\[ \alpha^n = -a_0 - a_1\alpha - \dots - a_{n-1}\alpha^{n-1}, \]
with coefficients at most $H$ in absolute value, by the definition of height. For the
inductive step we suppose $i > n$, with
\[ \alpha^{i-1} = b_0 + b_1\alpha + \dots + b_{n-1}\alpha^{n-1} \]
for some integers $b_0, \dots, b_{n-1}$ with $|b_j| \le (H+1)^{i-n}$ for all $j$. Then
\[ \alpha^i = -a_0 b_{n-1} + (b_0 - a_1 b_{n-1})\alpha + \dots + (b_{n-2} - a_{n-1}b_{n-1})\alpha^{n-1}, \]
with coefficient of 1 of absolute value at most $H(H+1)^{i-n}$ and other coefficients of
absolute value at most $(H+1)^{i-n+1}$.
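The recursion in this proof translates directly into code. The sketch below is illustrative; the function name and the sample polynomial $x^3 - 2x - 5$ (height 5, no rational roots) are choices made here, not taken from the script.

```python
def power_coefficients(a, i):
    """Coefficients (b_0, ..., b_{n-1}) of alpha^i in the basis 1, alpha, ..., alpha^(n-1),
    where alpha is a root of x^n + a[n-1] x^(n-1) + ... + a[1] x + a[0]."""
    n = len(a)
    b = [1] + [0] * (n - 1)               # alpha^0 = 1
    for _ in range(i):
        top = b[-1]                       # coefficient of alpha^(n-1) before multiplying by alpha
        # Multiply by alpha and replace alpha^n by -a[0] - a[1] alpha - ... - a[n-1] alpha^(n-1).
        b = [-a[0] * top] + [b[j - 1] - a[j] * top for j in range(1, n)]
    return b

def bound_holds(a, i):
    """Check the bound (H + 1)^(i - n + 1) of Lemma 1.24 for alpha^i, i >= n."""
    n = len(a)
    height = max(1, max(abs(c) for c in a))   # the leading coefficient 1 counts toward the height
    return max(abs(c) for c in power_coefficients(a, i)) <= (height + 1) ** (i - n + 1)
```

For example, with `a = [-5, -2, 0]` one gets `power_coefficients(a, 3) == [5, 2, 0]`, reflecting $\alpha^3 = 5 + 2\alpha$.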
Lemma 1.25. Let $H$, $m$, and $n$ be positive integers, and let $(a_{ij}) \in \mathrm{Mat}(m \times n, \mathbb{Z})$
with $|a_{ij}| \le H$ for all $i$ and $j$. Assume that $m < n$. Then there exists a nontrivial
solution $(x_1, \dots, x_n) \in \mathbb{Z}^n$ to the system of linear Diophantine equations
\[ \sum_{j=1}^{n} a_{ij} x_j = 0 \qquad (1 \le i \le m) \]
with, for every $j$,
\[ |x_j| \le (nH)^{\frac{m}{n-m}}. \]
Proof. Let $B := \lfloor (nH)^{m/(n-m)} \rfloor$, and let
\[ S := \{(x_1, \dots, x_n) \in \mathbb{Z}^n \mid 0 \le x_j \le B \ \forall j\}, \]
so the cardinality of $S$ is $(B+1)^n$. Let $A := (a_{ij})$, and let $n_i^+$, respectively $n_i^-$,
denote the number of positive, respectively negative entries in the $i$th row of $A$. So,
$n_i^+ + n_i^- \le n$, and for $x = (x_1, \dots, x_n) \in S$ the $i$th entry of $Ax$ lies between $-n_i^- BH$
and $n_i^+ BH$. This means that the linear map corresponding to $A$ maps $S$ to a set of
cardinality at most $(nBH+1)^m$. Now $(B+1)^{n-m} > (nH)^m$, and hence
\[ (B+1)^n > (nBH+1)^m. \]
We may apply the pigeonhole principle to deduce that there exist distinct $x$ and $x'$
in $S$ with $Ax = Ax'$, and $x - x'$ is a solution to the system of linear Diophantine
equations that obeys the stated bound.
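The pigeonhole argument is constructive enough to run on small systems. The following brute-force sketch (exponential in $n$, so only suitable for toy examples) searches the box $S$ for a collision; the function names are illustrative.

```python
from itertools import product

def box_bound(n, m, height):
    """Largest integer B with B^(n-m) <= (n*height)^m, i.e. floor((nH)^(m/(n-m)))."""
    target = (n * height) ** m
    b = 0
    while (b + 1) ** (n - m) <= target:
        b += 1
    return b

def small_solution(matrix):
    """Nontrivial integer solution x of matrix * x = 0, found by a pigeonhole collision.
    Returns (x, B), where every |x_j| <= B. Requires fewer rows than columns."""
    m, n = len(matrix), len(matrix[0])
    height = max(1, max(abs(entry) for row in matrix for entry in row))
    bound = box_bound(n, m, height)
    seen = {}
    for x in product(range(bound + 1), repeat=n):
        image = tuple(sum(a * xj for a, xj in zip(row, x)) for row in matrix)
        if image in seen:
            # Two distinct points of the box with equal image: their difference is a solution.
            return [xj - yj for xj, yj in zip(x, seen[image])], bound
        seen[image] = x
    raise AssertionError("no collision; impossible when m < n by the counting in the proof")
```

For instance, `small_solution([[1, 2, -1, 2], [2, -1, 1, -2]])` searches the box $0 \le x_j \le 8$, since here $(nH)^{m/(n-m)} = 8$.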
Lemma 1.26. (i) Let $f \in \mathbb{Z}[x]$ be a polynomial of degree $d$ and height $H$. For $k \in \mathbb{N}$,
$(1/k!)f^{(k)}$ has integer coefficients and height at most $2^d H$.
(ii) Let $f \in \mathbb{Z}[x_1, \dots, x_n]$ have degree $d_j$ in the variable $x_j$ for all $j$ and height $H$.
For $i_1, \dots, i_n \in \mathbb{N}$,
\[ \frac{1}{i_1! \cdots i_n!} \frac{\partial^{i_1+\dots+i_n} f}{\partial x_1^{i_1} \cdots \partial x_n^{i_n}} \]
has integer coefficients and height at most $2^{d_1+\dots+d_n} H$.
Proof. For (i), if $a_\ell$ denotes the coefficient of $x^\ell$, then for $k \le \ell \le d$ the coefficient
of $x^{\ell-k}$ of $(1/k!)f^{(k)}$ is $\binom{\ell}{k} a_\ell$. Since the binomial coefficient is bounded by $2^\ell$, this
gives (i). The multivariable generalization (ii) is clear by the same reasoning.
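Part (i) can be illustrated with a few lines of code, using the formula $\binom{\ell}{k} a_\ell$ from the proof. This is a small sketch with illustrative names; polynomials are stored as coefficient lists with `coeffs[l]` the coefficient of $x^l$.

```python
from math import comb

def divided_derivative(coeffs, k):
    """Coefficients of (1/k!) f^(k), where coeffs[l] is the coefficient of x^l in f."""
    return [comb(l, k) * coeffs[l] for l in range(k, len(coeffs))]

def height(coeffs):
    """Maximum absolute value of the coefficients (0 for the empty list)."""
    return max((abs(c) for c in coeffs), default=0)

# f = 3x^4 - 7x^3 + 2x - 5, so (1/2!) f'' = 18x^2 - 21x.
f = [-5, 2, 0, -7, 3]
half_second_derivative = divided_derivative(f, 2)
```

The result `[0, -21, 18]` has height 21, well within the bound $2^d H = 2^4 \cdot 7$.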
2
Thue’s theorem
Algebraic numbers of degree 2, we have seen, are badly approximable, which implies
(and is stronger than) irrationality measure 2. For real numbers that are algebraic
of degree n ≥ 3, there is a gap between the lower bound for the irrationality measure
of 2 (by Dirichlet’s approximation theorem) and the upper bound of n (coming from
Liouville’s theorem). The first step toward closing this gap is Thue’s theorem.
Theorem 2.1 (Thue). Let $\alpha \in \mathbb{R}$ be an algebraic number of degree $n \ge 3$. For every
$\varepsilon > 0$ the inequality
\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{n/2+1+\varepsilon}} \]
holds for only finitely many rational numbers $p/q$, where $p$ and $q$ are relatively prime
integers with $q > 0$.
As a first reduction step, we point out that it suffices to prove Thue’s theorem
under the assumption that α is an algebraic integer. Indeed, suppose that we know
the result for algebraic integers. If α is a general algebraic number, then mα is an
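algebraic integer for some $m \in \mathbb{N}_{>0}$, and hence
\[ \left| m\alpha - \frac{p}{q} \right| < \frac{1}{q^{n/2+1+\varepsilon/2}} \]
holds for only finitely many rational numbers $p/q$, where $p$ and $q$ are relatively prime
integers with $q > 0$. Since for $q \ge m^{2/\varepsilon}$ we have
\[ \frac{m}{q^{n/2+1+\varepsilon}} \le \frac{1}{q^{n/2+1+\varepsilon/2}}, \]
we deduce the desired result for $\alpha$.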
The exposition follows K. B. Stolarsky, Algebraic Numbers and Diophantine Approximation.
2.1
A class of auxiliary functions
We suppose that $\alpha$ is an algebraic integer of degree $n \ge 3$, with minimal polynomial
\[ x^n + a_{n-1}x^{n-1} + \dots + a_1 x + a_0 \]
of height $H := \max(|a_0|, \dots, |a_{n-1}|)$. The auxiliary functions used in the proof of
Thue’s theorem will be nonzero polynomials in two variables of the form
\[ P(x) - yQ(x) \in \mathbb{Z}[x, y] \]
such that, when $\alpha$ is substituted for $y$, there is an identity of the form
\[ P(x) - \alpha Q(x) = (x-\alpha)^h \bigl(F_0(x) + \alpha F_1(x) + \dots + \alpha^{n-1}F_{n-1}(x)\bigr) \tag{1} \]
in $(\mathbb{Z}[\alpha])[x]$, for some integer $h \ge 2$. Notice, given (1) and supposing $Q(\alpha) \ne 0$ for
simplicity, that for $\beta \in \mathbb{R}$ with $|\beta - \alpha|$ sufficiently small, $\gamma := P(\beta)/Q(\beta)$ satisfies
$|\gamma - \alpha| < |\beta - \alpha|$; when $\beta$ is a rational number, then so is $\gamma$.
Let us postulate that each $F_i$ has degree at most some given $k \in \mathbb{N}$ (where we
also allow $F_i$ to be zero for some $i$) and write
\[ F_i(x) = \sum_{j=0}^{k} c_{jn+i+1}\, x^j \]
for $i = 0, \dots, n-1$. Let us as well write the powers of $\alpha$ in terms of the standard
$\mathbb{Z}$-basis of $\mathbb{Z}[\alpha]$:
\[ \alpha^m = \sum_{i=0}^{n-1} b_i^{(m)} \alpha^i. \]
Then (1) translates into the system of linear equations in the coefficients $c_i$:
\[ \sum_{i=0}^{n-1} \sum_{\substack{j=0 \\ j \ge m-h}}^{\min(k,m)} (-1)^j \binom{h}{m-j}\, b_\ell^{(h+i+j-m)}\, c_{jn+i+1} = 0 \qquad (2 \le \ell \le n-1,\ 0 \le m \le h+k). \]
In every summand, the binomial coefficient is at most $2^h$, and by Lemma 1.24 the
quantity $|b_\ell^{(h+i+j-m)}|$ is at most $(H+1)^h$.
Proposition 2.2. With the notation as above, suppose that
\[ \Bigl(\frac{n}{2} - 1\Bigr) h < k + 1 < (n-1)h. \]
Then, setting
\[ r := \frac{k+1}{(n/2-1)h}, \]
there exist $\mathbb{Z}$-linearly independent polynomials $P$, $Q \in \mathbb{Z}[x]$ satisfying (1) for some
$F_0, \dots, F_{n-1} \in \mathbb{Z}[x]$, each (zero or) of degree at most $k$ and of height at most
\[ \bigl(n(k+1)(2H+2)^h\bigr)^{\frac{n}{2}\frac{r}{r-1} - 1}. \]
Proof. We view (1), as above, as a system of $(n-2)(h+k+1)$ linear Diophantine
equations in $c_1, \dots, c_{n(k+1)}$ with coefficients of absolute value $\le (2H+2)^h$. Then
by Lemma 1.25, there exists a nontrivial identity (1) where $F_0, \dots, F_{n-1}$ have degree
at most $k$ and height at most $(n(k+1)(2H+2)^h)^{(n-2)(h+k+1)/(2(k+1)-(n-2)h)}$. The
exponent is $(n/2)(r/(r-1)) - 1$.
It remains only to exclude that $P$ and $Q$ in the identity (1) are $\mathbb{Z}$-linearly dependent. A $\mathbb{Z}$-linear dependence would imply that $P$ and $Q$ are divisible by the $h$th
power of the minimal polynomial of $\alpha$. But the degrees of $P$ and $Q$ can be at most
$h + k$, and by the hypothesis, this is less than $hn$.
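For the reader who wants the exponent computation in the first paragraph of this proof written out, here is a short derivation. It only expands the exponent $m/(n-m)$ of Lemma 1.25 with $n(k+1)$ unknowns and $(n-2)(h+k+1)$ equations, and substitutes $k + 1 = r\,(n/2 - 1)\,h$.

```latex
\[
\frac{(n-2)(h+k+1)}{n(k+1)-(n-2)(h+k+1)}
= \frac{(n-2)(h+k+1)}{2(k+1)-(n-2)h}
= \frac{(n-2)h\,\bigl(1+r(\tfrac{n}{2}-1)\bigr)}{(n-2)h\,(r-1)}
= \frac{\tfrac{n}{2}\,r-(r-1)}{r-1}
= \frac{n}{2}\cdot\frac{r}{r-1}-1 .
\]
```

The hypothesis $(n/2 - 1)h < k + 1$ is exactly the condition $r > 1$, which makes the denominator positive.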
For our purposes it will suffice to fix a value of r slightly more than 1. E.g., we
may take r equal to (or close to) 1 + 1/N for a positive integer N .
Corollary 2.3. Suppose that $\alpha$ is an algebraic integer of degree $n \ge 3$ and height
$H$, let $N$ be a positive integer, let $h \ge 4N$, and define
\[ k := \Bigl\lfloor \Bigl(1 + \frac{1}{N}\Bigr)\Bigl(\frac{n}{2} - 1\Bigr) h \Bigr\rfloor - 1. \]
Then there exist $\mathbb{Z}$-linearly independent polynomials $P$, $Q \in \mathbb{Z}[x]$ of height at most
\[ (2nH + 2n)^{(N+\frac{1}{2})nh}, \]
satisfying an identity (1) for some $F_0, \dots, F_{n-1} \in \mathbb{Z}[x]$, each (zero or) of degree at
most $k$ and height at most
\[ (2nH+2n)^{((N+\frac{1}{2})n - 1)h}. \]
Proof. We have
\[ k + 1 = \Bigl(1+\frac{1}{N}\Bigr)\Bigl(\frac{n}{2} - 1\Bigr)h - \frac{i}{2N} \]
for some $0 \le i < 2N$. Define $r := (k+1)/((n/2-1)h)$, as in Proposition 2.2. Then
$r = 1 + 1/N - i/(N(n-2)h)$, hence $1 + 1/(2N) < r \le 1 + 1/N$ and
\[ \frac{n}{2}\,\frac{r}{r-1} - 1 < \Bigl(N + \frac{1}{2}\Bigr) n - 1. \]
As well, $n(k+1) \le n(n-2)h \le n^h$. Now we apply Proposition 2.2, and from the
height bound in that statement we obtain $H(F_i) \le ((2nH+2n)^h)^{(N+1/2)n-1}$ for all $i$.
Furthermore, each coefficient of $P$ or $Q$ is a linear function of the coefficients of the
polynomials $F_i$, with coefficients of absolute value at most $(2H+2)^h$. So the absolute
value of each coefficient is at most $n(k+1)(2H+2)^h((2nH+2n)^{(N+1/2)n-1})^h$, and
this is $\le ((2nH+2n)^{(N+1/2)n})^h$.
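The chain of elementary inequalities in this proof can be sanity-checked with exact rational arithmetic. The sketch below is an illustration only; the function names are hypothetical, and the check covers the inequalities on $k$ and $r$, not the existence of $P$ and $Q$.

```python
from fractions import Fraction
from math import floor

def corollary_2_3_parameters(n, N, h):
    """Return (k, r) for degree n >= 3, positive integer N, and h >= 4N."""
    base = Fraction(n, 2) - 1                        # n/2 - 1
    k = floor((1 + Fraction(1, N)) * base * h) - 1
    r = Fraction(k + 1) / (base * h)
    return k, r

def inequalities_hold(n, N, h):
    """Check the inequalities used in the proof of Corollary 2.3."""
    k, r = corollary_2_3_parameters(n, N, h)
    base = Fraction(n, 2) - 1
    exponent = Fraction(n, 2) * r / (r - 1) - 1
    return (
        base * h < k + 1 < (n - 1) * h               # hypothesis of Proposition 2.2
        and 1 + Fraction(1, 2 * N) < r <= 1 + Fraction(1, N)
        and exponent < (N + Fraction(1, 2)) * n - 1
        and n * (k + 1) <= n * (n - 2) * h <= n ** h
    )
```

For example, $n = 3$, $N = 1$, $h = 4$ gives $k = 3$ and $r = 2$, the extreme case $r = 1 + 1/N$.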
2.2
Excluded approximations from an approximation
Let α be an algebraic integer of degree n ≥ 3 and height H, let N be a positive
integer, and let P , Q ∈ Z[x] be as in Corollary 2.3 for some h ≥ 4N .
Let $\kappa > n/2 + 1$, and let $p$ and $q$ be relatively prime integers with $q > 0$ and
\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{\kappa}}. \tag{1} \]
We will use P and Q to exhibit an interval of integers, depending on q, that other
rational approximations satisfying (1) must avoid.
Lemma 2.4. Let $\alpha$ be an algebraic integer of degree $n \ge 3$ and height $H$, let $N$ be
a positive integer, set
\[ C := (4nH + 4n)^{(2N+1)n}, \]
let polynomials $P$, $Q \in \mathbb{Z}[x]$ be as in Corollary 2.3 for some $h \ge 4N$, let $\kappa > n/2+1$,
and let $p$ and $q$ be relatively prime integers with $q > 0$ satisfying (1). Let $t$ denote
the multiplicity of $p/q$ as a root of $P'(x)Q(x) - P(x)Q'(x)$; in particular, $t$ is 0 if
$p/q$ is not a root of $P'(x)Q(x) - P(x)Q'(x)$. Then
\[ q^t \le C^h. \]
We remark that $P'(x)Q(x) - P(x)Q'(x) = Q(x)^2 (d/dx)(P(x)/Q(x))$ is not the
zero polynomial by the $\mathbb{Z}$-linear independence of $P$ and $Q$. So the multiplicity $t$ in
the statement of Lemma 2.4 is well-defined.
Proof. By Gauss’s lemma, we have
\[ P'(x)Q(x) - P(x)Q'(x) = (qx-p)^t\, G(x) \tag{2} \]
for some $G \in \mathbb{Z}[x]$. It follows that the leading coefficient of $P'(x)Q(x) - P(x)Q'(x)$
is at least $q^t$ in absolute value. But the coefficients of $P'(x)Q(x) - P(x)Q'(x)$ have absolute value at
most $2(h+k)^2(2nH+2n)^{(2N+1)nh}$, and this is less than $(4nH+4n)^{(2N+1)nh}$.
With the notation of Lemma 2.4, we have
\[ \frac{d^t}{dx^t}\bigl(P'(x)Q(x) - P(x)Q'(x)\bigr)\Big|_{x = p/q} \ne 0. \]
By the general Leibniz rule we deduce that there exist natural numbers $i$ and $j$
summing to $t+1$, such that
\[ P^{(i)}\Bigl(\frac{p}{q}\Bigr) Q^{(j)}\Bigl(\frac{p}{q}\Bigr) - P^{(j)}\Bigl(\frac{p}{q}\Bigr) Q^{(i)}\Bigl(\frac{p}{q}\Bigr) \ne 0. \]
Now let $p'$ and $q'$ be relatively prime integers with $q' > q$ and
\[ \left| \alpha - \frac{p'}{q'} \right| < \frac{1}{q'^{\kappa}}. \]
In this situation, we either have $Q^{(i)}(p/q)Q^{(j)}(p/q) = 0$, in which case by swapping $i$ and $j$ if necessary we may suppose $Q^{(i)}(p/q) = 0$ and as a consequence
$P^{(i)}(p/q) \ne 0$, or we have $Q^{(i)}(p/q)Q^{(j)}(p/q) \ne 0$, in which case $P^{(i)}(p/q)/Q^{(i)}(p/q)$
and $P^{(j)}(p/q)/Q^{(j)}(p/q)$ are distinct rational numbers, and by swapping $i$ and $j$ if
necessary we may suppose $Q^{(i)}(p/q) \ne 0$ and $P^{(i)}(p/q)/Q^{(i)}(p/q) \ne p'/q'$. Summarizing, there exists $i \in \mathbb{N}$ satisfying:
\[ i \le \frac{\log C}{\log q}\,h + 1 \quad\text{and}\quad \begin{cases} Q^{(i)}(p/q) = 0,\ P^{(i)}(p/q) \ne 0, & \text{or} \\[4pt] Q^{(i)}(p/q) \ne 0,\ \dfrac{P^{(i)}(p/q)}{Q^{(i)}(p/q)} \ne \dfrac{p'}{q'}. \end{cases} \tag{3} \]
Lemma 2.5. Let $U$, $V$, and $W$ be positive real numbers and $\kappa > 1$. Assume that
\[ 2U^{1/\kappa}V^{1-1/\kappa} < W. \]
Then for $r \in \mathbb{R}_{>0}$,
\[ U r^{-\kappa+1} + V r \ge W \]
implies
\[ r \notin \left[ \Bigl(\frac{2U}{W}\Bigr)^{\frac{1}{\kappa-1}},\ \frac{W}{2V} \right]. \]
Proof. By the assumption, $(U/V)^{1/\kappa}$ lies in the interior of the stated interval. If
$r < (U/V)^{1/\kappa}$ then $Vr < Ur^{-\kappa+1}$, hence the inequality implies $2Ur^{-\kappa+1} > W$. If
$r > (U/V)^{1/\kappa}$ then $Vr > Ur^{-\kappa+1}$, hence the inequality implies $2Vr > W$.
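Lemma 2.5 is a statement about real numbers only, so it can be probed numerically. The sketch below (floating point, illustrative names, sample values chosen here) computes the excluded interval and evaluates $Ur^{-\kappa+1} + Vr$ at points inside it.

```python
def excluded_interval(U, V, W, kappa):
    """Endpoints ((2U/W)^(1/(kappa-1)), W/(2V)) of the interval in Lemma 2.5."""
    if not 2 * U ** (1 / kappa) * V ** (1 - 1 / kappa) < W:
        raise ValueError("hypothesis 2 U^(1/kappa) V^(1-1/kappa) < W fails")
    return (2 * U / W) ** (1 / (kappa - 1)), W / (2 * V)

def left_hand_side(U, V, kappa, r):
    """The quantity U r^(1-kappa) + V r appearing in the lemma."""
    return U * r ** (1 - kappa) + V * r

# Sample values (assumptions for illustration): U = 1, V = 1e-6, W = 1, kappa = 3.
U, V, W, KAPPA = 1.0, 1e-6, 1.0, 3.0
low, high = excluded_interval(U, V, W, KAPPA)
# Geometric grid of points strictly inside the interval.
samples = [low * (high / low) ** (t / 200) for t in range(1, 200)]
```

On every sample point the left-hand side stays below $W$, which is the contrapositive of the lemma: inside the interval the inequality $Ur^{-\kappa+1} + Vr \ge W$ cannot hold.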
Lemma 2.6. Let $\alpha$ be an algebraic integer of degree $n$ and height $H$, let $h \in \mathbb{N}$, and
let $S \in (\mathbb{Z}[\alpha])[x]$ be a polynomial of degree $d$, with
\[ S = S_0 + S_1\alpha + \dots + S_{n-1}\alpha^{n-1}. \]
We define
\[ R(x) := (x-\alpha)^h S(x) \]
and for $i = 0, \dots, h$ define $S^{[i]} \in (\mathbb{Z}[\alpha])[x]$ by
\[ R^{(i)} = (x-\alpha)^{h-i} S^{[i]}(x), \]
written as
\[ S^{[i]}(x) = S_0^{[i]}(x) + \alpha S_1^{[i]}(x) + \dots + \alpha^{n-1}S_{n-1}^{[i]}(x). \]
Then each polynomial $S_j^{[i]}$ has coefficients divisible by $i!$ and height at most
\[ i!\,(H+1)^{i+n-1}\,2^{d+h+i+3} \max(H(S_0), \dots, H(S_{n-1})). \]
Proof. By the general Leibniz rule,
\[ R^{(i)}(x) = (x-\alpha)^{h-i} \sum_{c=0}^{i} i! \binom{h}{i-c} \frac{1}{c!}\, (x-\alpha)^c S^{(c)}(x). \]
We expand using the binomial theorem and apply Lemmas 1.24 and 1.26(i).
Proposition 2.7. Let $\alpha$ be an algebraic integer of degree $n \ge 3$ and height $H$, and
let $N$ be a positive integer. Then there is a positive constant $\delta$ such that for $h \ge 4N$,
polynomials $P$, $Q \in \mathbb{Z}[x]$ as in Corollary 2.3, real number $\kappa > n/2 + 1$, relatively
prime integers $p$ and $q$ satisfying (1) with $q \ge (4nH+4n)^{(2N+1)2n}$, relatively prime
integers $p'$ and $q'$ satisfying $q' > q$ and
\[ \left| \alpha - \frac{p'}{q'} \right| < \frac{1}{q'^{\kappa}}, \]
and natural number $i$ as in (3), we have
\[ q^{\,h+\lfloor (1+\frac{1}{N})(\frac{n}{2}-1)h \rfloor - 1 - i}\, q'^{-\kappa+1} + q^{\,h+\lfloor (1+\frac{1}{N})(\frac{n}{2}-1)h \rfloor - 1 - i - (h-i)\kappa}\, q' \ge \delta^h. \]
Proof. As in Corollary 2.3 we define $k := \lfloor (1+1/N)(n/2-1)h \rfloor - 1$. We define
\[ A := \frac{q^{h+k-i}}{i!}\, P^{(i)}\Bigl(\frac{p}{q}\Bigr), \qquad B := \frac{q^{h+k-i}}{i!}\, Q^{(i)}\Bigl(\frac{p}{q}\Bigr). \]
Since $P^{(i)}/i!$ and $Q^{(i)}/i!$ are integer polynomials of degree at most $h+k-i$, we know
that $A$ and $B$ must be integers, and by (3) they satisfy
\[ Bp' - Aq' \ne 0. \]
By Lemma 1.26(i) and (1) there is a constant $D$ depending only on $\alpha$ and $N$, such
that
\[ |A|, |B| \le q^{h+k-i} D^h. \]
Now by application of Lemma 2.6 there is a constant $E$, also depending only on $\alpha$
and $N$, such that
\begin{align*}
1 &\le |Bp' - Aq'| \\
&\le |B|\,|p' - \alpha q'| + q'\,|A - \alpha B| \\
&\le q^{h+k-i} q'^{-\kappa+1} D^h + q^{h+k-i} q' \Bigl| \alpha - \frac{p}{q} \Bigr|^{h-i} E^h \\
&\le \bigl(q^{h+k-i} q'^{-\kappa+1} + q^{h+k-i-(h-i)\kappa} q'\bigr) \max(D, E)^h.
\end{align*}
Setting $\delta := 1/\max(D, E)$, we obtain the desired inequality.
In order for Proposition 2.7 to be useful, Lemma 2.5 should be applicable. The
following result describes choices that make Lemma 2.5 applicable.
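Proposition 2.8. Let an integer $n \ge 3$ and real numbers $\kappa > n/2 + 1$ and $C > 1$
be given. Then, for any $N \in \mathbb{N}_{>0}$ satisfying
\[ \frac{n}{2} + \frac{1}{N}\Bigl(\frac{n}{2} - 1\Bigr) < \kappa - 1, \tag{4} \]
real number $\beta > 1$ satisfying
\[ \Bigl(\frac{n}{2} + \frac{1}{N}\Bigl(\frac{n}{2} - 1\Bigr)\Bigr)\beta < \kappa - 1, \tag{5} \]
integer $h_0 \ge 4N$ satisfying
\[ 1 - \frac{1}{h_0} > \frac{1}{\beta}, \tag{6} \]
and positive real number $\delta$ there exists an integer $q_0 \ge C^2$ such that for all integers
$h \ge h_0$ and $q \ge q_0$ and all $i \in \mathbb{N}$ with
\[ i \le \frac{\log C}{\log q}\, h + 1 \tag{7} \]
we have
\[ \frac{h}{h-i} < \beta, \tag{8} \]
\[ \frac{h + \lfloor (1+\frac{1}{N})(\frac{n}{2}-1)h \rfloor - 1 - i}{h-i} < \Bigl(\frac{n}{2} + \frac{1}{N}\Bigl(\frac{n}{2} - 1\Bigr)\Bigr)\beta, \tag{9} \]
\[ 2U^{1/\kappa}V^{1-1/\kappa} < W, \tag{10} \]
where
\[ U := q^{\,h+\lfloor (1+\frac{1}{N})(\frac{n}{2}-1)h \rfloor - 1 - i}, \tag{11} \]
\[ V := q^{\,h+\lfloor (1+\frac{1}{N})(\frac{n}{2}-1)h \rfloor - 1 - i - (h-i)\kappa}, \tag{12} \]
\[ W := \delta^h. \tag{13} \]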
Proof. Without loss of generality we have $\delta \le 1$. We may choose $q_0 \ge C^2$ so that
\[ 1 - \frac{1}{h_0} - \frac{\log C}{\log q_0} > \frac{1}{\beta}. \]
Then
\[ \frac{h}{h-i} \le \frac{1}{1 - \frac{\log C}{\log q} - \frac{1}{h}} < \beta. \]
With $k := \lfloor (1+1/N)(n/2-1)h \rfloor - 1$, then,
\[ \frac{h+k-i}{h-i} \le \frac{h+k}{h}\cdot\frac{h}{h-i} < \Bigl(\frac{n}{2} + \frac{1}{N}\Bigl(\frac{n}{2} - 1\Bigr)\Bigr)\beta < \kappa - 1. \]
Now the condition $2U^{1/\kappa}V^{1-1/\kappa} < W$ is equivalent to
\[ \frac{2^{1/(h-i)}}{\delta^{h/(h-i)}}\; q^{\frac{h+k-i}{h-i}} < q^{\kappa-1}, \tag{14} \]
and we see that the first factor on the left-hand side must lie between $\delta^{-1}$ and $2\delta^{-\beta}$. So,
after suitably increasing $q_0$ if needed, we have that (14) holds for all $q \ge q_0$.
Corollary 2.9. Let $\alpha$ be an algebraic integer of degree $n \ge 3$, and let $\kappa > n/2 + 1$.
Then there exist integers $h_0$ and $q_0$ and a positive constant $\gamma < 1$ such that for any
relatively prime integers $p$ and $q$ satisfying $q \ge q_0$ and (1) and any integer $h \ge h_0$,
if $p'$ and $q'$ are relatively prime integers with $q' > q$ and
\[ \left| \alpha - \frac{p'}{q'} \right| < \frac{1}{q'^{\kappa}}, \]
then
\[ q' \notin [q^{\gamma h}, q^h]. \]
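Proof. We make choices of constants as in Proposition 2.8. So, first, $N \in \mathbb{N}_{>0}$ should
be chosen to satisfy (4), and as in Lemma 2.4 we set $C := (4nH+4n)^{(2N+1)n}$, where
$H$ denotes the height of $\alpha$. Now, to get the conclusion stated here, we impose the
requirement that $\beta > 1$ should satisfy
\[ \beta\Bigl(1 + \frac{n}{2} + \frac{1}{N}\Bigl(\frac{n}{2} - 1\Bigr)\Bigr) < \kappa; \tag{15} \]
notice that this implies (5).
We let $h_0$ be as in (6) and $\delta$ as in Proposition 2.7. Now Proposition 2.8 supplies
an integer $q_0 \ge C^2$ such that for all $h \ge h_0$, $q \ge q_0$, and $i \in \mathbb{N}$ satisfying (7),
inequalities (8)–(10) are satisfied; let $U$, $V$, and $W$ be as defined in (11)–(13). We
observe, by (8) and (9),
\begin{align*}
\frac{W}{2V} &= \frac{\delta^h}{2}\, q^{(h-i)\bigl(\kappa - \frac{h+\lfloor (1+1/N)(n/2-1)h \rfloor - 1 - i}{h-i}\bigr)} \\
&> \frac{\delta^h}{2}\, q^{\frac{h}{\beta}\bigl(\kappa - (\frac{n}{2} + \frac{1}{N}(\frac{n}{2}-1))\beta\bigr)} \\
&= \Bigl(\frac{\delta}{2^{1/h}}\, q^{\frac{\kappa}{\beta} - (\frac{n}{2} + \frac{1}{N}(\frac{n}{2}-1))}\Bigr)^h.
\end{align*}
In the last expression, the exponent of $q$ (within the outer brackets) is, by (15),
greater than 1. So, for sufficiently large $q$ we have $W/(2V) > q^h$. Similarly, using (9),
\begin{align*}
\Bigl(\frac{2U}{W}\Bigr)^{\frac{1}{\kappa-1}} &= 2^{\frac{1}{\kappa-1}}\, \delta^{-\frac{h}{\kappa-1}}\, q^{\frac{h+\lfloor (1+1/N)(n/2-1)h \rfloor - 1 - i}{\kappa-1}} \\
&< 2^{\frac{1}{\kappa-1}}\, \delta^{-\frac{h}{\kappa-1}}\, q^{(h-i)\frac{(n/2+(1/N)(n/2-1))\beta}{\kappa-1}} \\
&\le 2^{\frac{1}{\kappa-1}}\, \delta^{-\frac{h}{\kappa-1}}\, q^{h\,\frac{(n/2+(1/N)(n/2-1))\beta}{\kappa-1}}.
\end{align*}
By reasoning as above, if we fix $\gamma$ such that
\[ \frac{\bigl(\frac{n}{2} + \frac{1}{N}(\frac{n}{2} - 1)\bigr)\beta}{\kappa - 1} < \gamma < 1, \]
then for sufficiently large $q$ we have $(2U/W)^{1/(\kappa-1)} < q^{\gamma h}$. The result now follows
from Lemma 2.5 and Propositions 2.7 and 2.8.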
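The constants $N$, $\beta$, $h_0$ and $\gamma$ in this proof are constrained only by the explicit inequalities (4), (15), (6) and the condition on $\gamma$, so admissible values are easy to produce. The sketch below makes one arbitrary choice (the smallest $N$, then midpoints); it does not attempt to compute $\delta$ or $q_0$, and the function name is hypothetical.

```python
from fractions import Fraction
from math import floor

def choose_constants(n, kappa):
    """Pick N, beta, h0, gamma satisfying (4), (15), (6) and X*beta/(kappa-1) < gamma < 1,
    where X = n/2 + (1/N)(n/2 - 1). `kappa` must be a Fraction exceeding n/2 + 1."""
    half = Fraction(n, 2)
    if not kappa > half + 1:
        raise ValueError("need kappa > n/2 + 1")
    N = 1
    while half + Fraction(1, N) * (half - 1) >= kappa - 1:
        N += 1                                      # smallest N satisfying (4)
    X = half + Fraction(1, N) * (half - 1)
    beta = (1 + kappa / (1 + X)) / 2                # midpoint of 1 and kappa/(1+X): gives (15)
    h0 = max(4 * N, floor(beta / (beta - 1)) + 1)   # h0 >= 4N and h0 > beta/(beta-1): gives (6)
    ratio = X * beta / (kappa - 1)
    gamma = (ratio + 1) / 2                         # strictly between ratio and 1
    return N, beta, h0, gamma
```

For $n = 3$ and $\kappa = 3$ this yields $N = 2$, $\beta = 23/22$, $h_0 = 24$ and $\gamma = 337/352$; other admissible choices work equally well.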
2.3
Proof of theorem
Given the last result established in §2.2, the proof of Theorem 2.1 goes quite quickly.
Proof of Theorem 2.1. We have already seen that for the proof it suffices to consider the
case that $\alpha$ is an algebraic integer of degree $n \ge 3$. Given $\varepsilon > 0$ we set $\kappa := n/2+1+\varepsilon$
and let $h_0$, $q_0$, and $\gamma$ be as in Corollary 2.9. Increasing $h_0$ if necessary, we may
suppose that
\[ h_0 \ge \frac{\gamma}{1-\gamma}. \]
We need to show that there are only finitely many rational numbers $p/q$, where $p$ and
$q$ are relatively prime integers with $q > 0$, such that
\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{\kappa}}. \]
We are, of course, done if all such $p/q$ satisfy $q < q_0$. So we suppose that there is
some such rational approximation $p/q$ with $q \ge q_0$.
By applying Corollary 2.9 to $h = h_0, h_0+1, \dots$ and observing from our additional
requirement on $h_0$ that the adjacent pairs of excluded intervals overlap each other
(indeed, $\gamma(h+1) \le h$ for all $h \ge h_0$),
we find that for any relatively prime integers $p'$ and $q'$ with $q' > q$ and
\[ \left| \alpha - \frac{p'}{q'} \right| < \frac{1}{q'^{\kappa}}, \]
we have
\[ q' < q^{\gamma h_0}. \]
This observation completes the proof of Thue’s theorem.
A natural question is, for given α and ε, how in practice to list all the rational
approximations satisfying the bound in Thue’s theorem. One can search for such
approximations, by letting q run from 1 up to any given positive integer and testing
for each q if some relatively prime integer p satisfies the inequality. One can (after
reducing to the case that α is an algebraic integer) also compute constants h0 , q0 , and
γ as in Corollary 2.9: we have made constants explicit in §2.1 and throughout most
of §2.2, and one could continue this effort and produce explicit forms for constants
D and E in Proposition 2.7 (which determine δ) and q0 in Proposition 2.8.
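The search described above is straightforward to carry out with exact integer arithmetic. The sketch below fixes, purely for illustration, $\alpha = \sqrt[3]{2}$ and $\kappa = 3$ (so $n = 3$ and $\varepsilon = 1/2$); the function names are hypothetical, and the test $|\alpha - p/q| < 1/q^3$ is rewritten as a comparison of cubes to avoid floating point.

```python
from math import gcd

def icbrt(n):
    """Integer cube root: the largest c >= 0 with c^3 <= n, for n >= 0."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:                  # invariant: lo^3 <= n < hi^3
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

def good_approximations(q_max):
    """Reduced fractions p/q, 1 <= q <= q_max, with |2^(1/3) - p/q| < 1/q^3."""
    found = []
    for q in range(1, q_max + 1):
        base = icbrt(2 * q ** 3)        # floor(q * 2^(1/3))
        for p in (base, base + 1):      # the only integers within distance 1 of q * 2^(1/3)
            if gcd(p, q) != 1:
                continue
            # |q*alpha - p| < 1/q^2  <=>  (p q^2 - 1)^3 < 2 q^9 < (p q^2 + 1)^3
            if (p * q * q - 1) ** 3 < 2 * q ** 9 < (p * q * q + 1) ** 3:
                found.append((p, q))
    return found
```

Among the results for small $q$ are $1/1$, $2/1$ and $5/4$. As the discussion that follows makes clear, such a search can list approximations up to any bound but cannot by itself certify that the list is complete.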
However, one is left with the logical alternative in the proof of Theorem 2.1, that
either all approximations p/q meeting the bound in Thue’s theorem satisfy q < q0 ,
in which case we just search this far for approximations, or some such p/q exists with
$q \ge q_0$, in which case it suffices to search up to $q^{\gamma h_0}$. But it is not known, in general,
how to distinguish these cases. So we say that Thue’s theorem is an ineffective
result, meaning that it describes a finite set that we have no way of determining.
For a certain class of Diophantine equations, Thue’s theorem implies that the
set of solutions is finite. The Diophantine equation in the next result is the Thue
equation.
Theorem 2.10. Let n ≥ 3, let F (x, y) be an irreducible homogeneous polynomial of
degree n whose coefficients are integers with gcd 1, and let m be a nonzero integer.
Then
F (x, y) = m
has only finitely many solutions in integers x and y.
Proof. The solutions with $y = 0$ are easily determined, so we assume $y \ne 0$ throughout the proof. Let $f(x) := F(x, 1)$ be the non-homogeneous form of the given polynomial, with leading coefficient $a$ and roots $\alpha_1, \dots, \alpha_n \in \mathbb{C}$. The given Diophantine
equation is equivalent to
\[ \frac{a}{m} \prod_{i=1}^{n} \Bigl(\frac{x}{y} - \alpha_i\Bigr) = \frac{1}{y^n}. \tag{1} \]
Let
\[ \delta := \frac{1}{2} \min_{1 \le i < j \le n} |\alpha_i - \alpha_j|. \]
Notice, if $|x/y - \alpha_i| \ge \delta$ for all $i$ then (1) implies $|y| \le |m/a|^{1/n}/\delta$; there are thus
only finitely many integer solutions $(x, y)$ obeying this bound.
It remains to treat the case that for some $i$ we have $|x/y - \alpha_i| < \delta$. The other
$n - 1$ factors in the product in (1) all have absolute value greater than $\delta$. So, we
have
\[ \Bigl| \alpha_i - \frac{x}{y} \Bigr| < \frac{|m/a|\,\delta^{1-n}}{|y|^n}. \tag{2} \]
Let $\kappa$ be any real number with
\[ \frac{n}{2} + 1 < \kappa < n. \]
By Thue’s theorem there are only finitely many pairs of relatively prime integers $p$
and $q$ with $q > 0$, such that
\[ \Bigl| \alpha_i - \frac{p}{q} \Bigr| < \frac{1}{q^{\kappa}}. \]
This implies that there are only finitely many pairs of integers $x$ and $y$ such that
\[ \Bigl| \alpha_i - \frac{x}{y} \Bigr| < \frac{1}{|y|^{\kappa}}, \tag{3} \]
since to each rational approximation $p/q$ obeying the given bound there correspond
just finitely many pairs of integers $x$ and $y$ with $x/y = p/q$ satisfying (3). Since
$\kappa < n$, the finiteness assertion for (3) implies that there are only finitely many pairs
of integers $x$ and $y$ satisfying (2).
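To get a feeling for Theorem 2.10, one can list the solutions of a Thue equation inside a box by brute force. The sketch below is illustrative (hypothetical function name, and the sample equation $x^3 - 2y^3 = 1$ is chosen here); a finite search of course says nothing about solutions outside the box.

```python
def thue_solutions(coeffs, m, bound):
    """Integer solutions (x, y) with |x|, |y| <= bound of F(x, y) = m, where
    F(x, y) = sum of coeffs[i] * x^(n-i) * y^i is a binary form of degree n."""
    n = len(coeffs) - 1
    hits = []
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            value = sum(c * x ** (n - i) * y ** i for i, c in enumerate(coeffs))
            if value == m:
                hits.append((x, y))
    return hits

# x^3 - 2 y^3 = 1, searched in the box |x|, |y| <= 50.
solutions = thue_solutions([1, 0, 0, -2], 1, 50)
```

The solutions $(1, 0)$ and $(-1, -1)$ appear in the output; the theorem guarantees that the full solution set is finite, without indicating how large a box is needed.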
3
Roth’s theorem
Roth’s theorem, proved in 1955 and for which Roth was awarded the Fields medal
in 1958, says that every algebraic number in R has irrationality measure 2. Since
the case of quadratic irrational numbers has already been discussed, the theorem is
stated for algebraic numbers of degree ≥ 3.
Theorem 3.1 (Roth). Let $\alpha \in \mathbb{R}$ be an algebraic number of degree $n \ge 3$. For every
$\varepsilon > 0$ the inequality
\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{2+\varepsilon}} \]
holds for only finitely many rational numbers $p/q$, where $p$ and $q$ are relatively prime
integers with $q > 0$.
Roth’s theorem is also called the Thue-Siegel-Roth theorem, since it stands as the final
improvement to Thue’s theorem (1909), which Siegel (1921) had strengthened, replacing the exponent $n/2+1+\varepsilon$ by $n/k+k-1+\varepsilon$ for $k \in \mathbb{N}$, $(k-1)k < n \le k(k+1)$.
Consequently, the chapter on Thue’s theorem is logically unnecessary, though many of
the same techniques are used here in a more sophisticated setting.
Exactly as for Thue’s theorem, we make the initial reduction step, that it suffices
to prove Roth’s theorem under the assumption that α is an algebraic integer.
For the proof we follow J. W. S. Cassels, An Introduction to Diophantine Approximation.
3.1
Wronskians
The proof of Roth’s theorem uses, as auxiliary functions, polynomials in several
variables. We need a generalization of the fact, used in Lemma 2.4, that for a pair
of polynomials in one variable, neither a constant multiple of the other, the 2 × 2
matrix with the polynomials and their derivatives has nonzero determinant. We
work over Q here, though this could be replaced by any field of characteristic 0.
Let $m \in \mathbb{N}_{>0}$, and let $K := \mathbb{Q}(x_1, \dots, x_m)$. The next result relates two notions,
the $\mathbb{Q}$-linear independence of elements $f_1, \dots, f_\ell \in K$ and the $K$-linear independence
of vectors $(D_1 f_1, \dots, D_1 f_\ell), \dots, (D_\ell f_1, \dots, D_\ell f_\ell) \in K^\ell$ for differential operators $D_1$,
$\dots$, $D_\ell$, each a monomial in $\partial/\partial x_1, \dots, \partial/\partial x_m$. The latter can be detected by $\ell \times \ell$
determinants, which are called generalized Wronskians.
A monomial in differential operators
\[ \frac{\partial}{\partial x_{a_1}} \cdots \frac{\partial}{\partial x_{a_r}} \]
has a well-defined degree $r$. The identity operator is the unique such differential
operator having degree 0.
Proposition 3.2. Let $m \in \mathbb{N}_{>0}$, and let $K := \mathbb{Q}(x_1, \dots, x_m)$. For $\ell \in \mathbb{N}_{>0}$ and
elements $f_1, \dots, f_\ell \in K$ the following are equivalent:
(i) The elements $f_1, \dots, f_\ell$ are $\mathbb{Q}$-linearly independent.
(ii) There exist differential operators $D_1, \dots, D_\ell$, where for each $i$ the operator
$D_i$ is a monomial in $\partial/\partial x_1, \dots, \partial/\partial x_m$ of degree less than $i$, such that the
generalized Wronskian
\[ \det(D_i f_j)_{1 \le i,j \le \ell} \tag{1} \]
is nonzero.
Proof. The implication (ii) $\Rightarrow$ (i) is obvious. We prove (i) $\Rightarrow$ (ii) by induction on $\ell$.
The case $\ell = 1$ is trivial.
Given $\ell > 1$ and assuming the result known for $(\ell-1)$-tuples of elements of $K$,
we suppose that $f_1, \dots, f_\ell$ are such that the generalized Wronskians (1) all vanish.
We may further suppose that $f_1, \dots, f_{\ell-1}$ are $\mathbb{Q}$-linearly independent, since if $f_1$,
$\dots$, $f_{\ell-1}$ are $\mathbb{Q}$-linearly dependent then so are $f_1, \dots, f_\ell$. Applying the induction
hypothesis, then, some generalized Wronskian of size $(\ell-1) \times (\ell-1)$ is nonzero:
\[ \det(D_i f_j)_{1 \le i,j \le \ell-1} \ne 0. \tag{2} \]
We consider the $(\ell-1) \times \ell$ matrix over $K$
\[ \begin{pmatrix} D_1 f_1 & \dots & D_1 f_\ell \\ \vdots & & \vdots \\ D_{\ell-1} f_1 & \dots & D_{\ell-1} f_\ell \end{pmatrix}. \]
By (2), this is a matrix of rank $\ell - 1$, whose kernel is the $K$-span of $(a_1, \dots, a_{\ell-1}, 1)$
for some $a_1, \dots, a_{\ell-1} \in K$. By assumption the generalized Wronskians (1) vanish,
so
\[ a_1 Df_1 + \dots + a_{\ell-1} Df_{\ell-1} + Df_\ell = 0 \tag{3} \]
for every monomial $D$ of degree less than $\ell$ in $\partial/\partial x_1, \dots, \partial/\partial x_m$.
We apply $\partial/\partial x_h$ to (3):
\begin{align*}
&\frac{\partial a_1}{\partial x_h} Df_1 + \dots + \frac{\partial a_{\ell-1}}{\partial x_h} Df_{\ell-1} \\
&\quad + a_1 \frac{\partial}{\partial x_h} Df_1 + \dots + a_{\ell-1} \frac{\partial}{\partial x_h} Df_{\ell-1} + \frac{\partial}{\partial x_h} Df_\ell = 0. \tag{4}
\end{align*}
When $D$ has degree less than $\ell - 1$, the operator $(\partial/\partial x_h)D$ has degree less than
$\ell$, and by (3) the rightmost $\ell$ terms in (4) sum to zero. This is the case when
$D \in \{D_1, \dots, D_{\ell-1}\}$; by these instances of (4), using (2), we have $\partial a_j/\partial x_h = 0$ for
$1 \le j \le \ell - 1$ and $1 \le h \le m$. This implies $a_j \in \mathbb{Q}$ for $j = 1, \dots, \ell-1$, and now
$a_1 f_1 + \dots + a_{\ell-1} f_{\ell-1} + f_\ell = 0$ is a nontrivial $\mathbb{Q}$-linear relation among $f_1, \dots, f_\ell$.
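In the smallest case $\ell = 2$, $m = 2$ the generalized Wronskians of Proposition 3.2 are the two determinants with $D_1$ the identity and $D_2 \in \{\partial/\partial x, \partial/\partial y\}$. The sketch below computes them for polynomials only (not general elements of $K$), stored as dictionaries mapping exponent pairs to integer coefficients; all names are illustrative.

```python
def p_add(f, g, sign=1):
    """f + sign * g for polynomials stored as {(i, j): coefficient of x^i y^j}."""
    out = dict(f)
    for mono, c in g.items():
        out[mono] = out.get(mono, 0) + sign * c
    return {mono: c for mono, c in out.items() if c != 0}

def p_mul(f, g):
    """Product of two polynomials in x and y."""
    out = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {mono: c for mono, c in out.items() if c != 0}

def p_diff(f, var):
    """Partial derivative with respect to x (var = 0) or y (var = 1)."""
    out = {}
    for mono, c in f.items():
        if mono[var] > 0:
            lowered = list(mono)
            lowered[var] -= 1
            out[tuple(lowered)] = c * mono[var]
    return out

def wronskians_2x2(f1, f2):
    """The determinants f1 * D f2 - D f1 * f2 for D = d/dx and D = d/dy."""
    return [
        p_add(p_mul(f1, p_diff(f2, v)), p_mul(p_diff(f1, v), f2), sign=-1)
        for v in (0, 1)
    ]
```

For $f_1 = x$, $f_2 = y$ both determinants are nonzero, while for $f_1 = x + y$, $f_2 = 2x + 2y$ both vanish, in line with the equivalence of (i) and (ii).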
3.2
Multivariable auxiliary functions
Let α be an algebraic integer of degree n ≥ 3. The auxiliary functions used in the
proof of Roth’s theorem will be polynomials f ∈ Q[x1 , . . . , xm ] where m depends on
n and the constant ε in the statement of Roth’s theorem, vanishing to high order
(according to a suitable measure of order of vanishing) at (α, . . . , α).
Definition. Let $F \in \mathbb{Q}[x_1, \dots, x_m]$ and $\alpha_1, \dots, \alpha_m \in \mathbb{C}$. The index of $F$ at
$(\alpha_1, \dots, \alpha_m)$ relative to given positive integers $r_1, \dots, r_m$ is defined to be
\[ \mathrm{ind}(F) = \mathrm{ind}_{(\alpha_1,\dots,\alpha_m)}(F) := \min_{(i_1,\dots,i_m)} \Bigl(\frac{i_1}{r_1} + \dots + \frac{i_m}{r_m}\Bigr), \]
where the minimum is taken over all $(i_1, \dots, i_m)$ such that the coefficient of
\[ (x_1 - \alpha_1)^{i_1} \cdots (x_m - \alpha_m)^{i_m} \]
in the Taylor expansion of $F$ around $(\alpha_1, \dots, \alpha_m)$ is nonzero. (When $F = 0$ we make
the convention that the index is infinite.)
Equivalently, the index is the minimum of the given expression over all $(i_1, \dots, i_m)$
such that
\[ \frac{\partial^{i_1+\dots+i_m} F}{\partial x_1^{i_1} \cdots \partial x_m^{i_m}}(\alpha_1, \dots, \alpha_m) \ne 0. \]
Proposition 3.3. Given $\alpha_1, \dots, \alpha_m \in \mathbb{C}$ and $r_1, \dots, r_m \in \mathbb{N}_{>0}$ the index at
$(\alpha_1, \dots, \alpha_m)$ relative to $r_1, \dots, r_m$ satisfies the following properties:
(i) $\mathrm{ind}(\partial^{i_1+\dots+i_m}F/\partial x_1^{i_1} \cdots \partial x_m^{i_m}) \ge \mathrm{ind}(F) - \sum_\nu i_\nu/r_\nu$;
(ii) $\mathrm{ind}(F + G) \ge \min(\mathrm{ind}(F), \mathrm{ind}(G))$;
(iii) $\mathrm{ind}(FG) = \mathrm{ind}(F) + \mathrm{ind}(G)$;
(iv) $\mathrm{ind}(F)$ is, for $F \in \mathbb{Q}[x_1, \dots, x_k]$ with $k < m$, equal to the index of $F$ at
$(\alpha_1, \dots, \alpha_k)$ relative to $r_1, \dots, r_k$.
Proof. By the definition of index as stated in terms of vanishing of partial derivatives
of $F$ at $(\alpha_1, \dots, \alpha_m)$, we have (i). Assertion (ii) is obvious, and (iii) is immediate
from the observation that $\mathrm{ind}(F)$ is the lowest exponent of $t$ that appears after substituting $x_i := \alpha_i + t^{1/r_i} y_i$.
Under the assumption in (iv) all monomials with nonzero coefficient in the Taylor
expansion of $F$ around $(\alpha_1, \dots, \alpha_m)$ have $i_{k+1} = \dots = i_m = 0$, and from this the
property is immediate.
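The index is simple to compute for polynomials in two variables at a rational point, by expanding around the point and minimizing over the nonzero Taylor coefficients. The sketch below does this and can be used to try out property (iii); the representation (dictionaries of exponent pairs) and all names are illustrative choices.

```python
from fractions import Fraction
from math import comb

def p_add(f, g):
    """Sum of polynomials stored as {(i, j): coefficient of x^i y^j}."""
    out = dict(f)
    for mono, c in g.items():
        out[mono] = out.get(mono, 0) + c
    return {mono: c for mono, c in out.items() if c != 0}

def p_mul(f, g):
    """Product of two polynomials in x and y."""
    out = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {mono: c for mono, c in out.items() if c != 0}

def taylor_shift(f, a, b):
    """Coefficients of f(X + a, Y + b), i.e. the Taylor expansion of f around (a, b)."""
    out = {}
    for (i, j), c in f.items():
        for s in range(i + 1):
            for t in range(j + 1):
                term = c * comb(i, s) * a ** (i - s) * comb(j, t) * b ** (j - t)
                out[(s, t)] = out.get((s, t), 0) + term
    return {mono: c for mono, c in out.items() if c != 0}

def index(f, point, weights):
    """Index of a nonzero polynomial f at `point` relative to weights (r1, r2)."""
    r1, r2 = weights
    return min(Fraction(s, r1) + Fraction(t, r2) for s, t in taylor_shift(f, *point))

# F = (x-1)^2 (y-2) + (y-2)^3 and G = x - 1, at the point (1, 2) with (r1, r2) = (2, 3).
XM1 = {(1, 0): 1, (0, 0): -1}
YM2 = {(0, 1): 1, (0, 0): -2}
F = p_add(p_mul(p_mul(XM1, XM1), YM2), p_mul(p_mul(YM2, YM2), YM2))
G = XM1
```

Here `index(F, (1, 2), (2, 3))` is 1, `index(G, (1, 2), (2, 3))` is 1/2, and the index of the product is their sum 3/2.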
Lemma 3.4. Let $m$ be a positive integer. Given positive integers $r_1, \dots, r_m$ and
real $0 < \delta < 1$, the number of $m$-tuples $(i_1, \dots, i_m)$ of natural numbers satisfying
\[ i_\nu \le r_\nu \ \ \forall\, \nu \qquad\text{and}\qquad \frac{i_1}{r_1} + \dots + \frac{i_m}{r_m} \le \frac{1}{2}\, m(1-\delta) \]
is at most
\[ \frac{(r_1+1) \cdots (r_m+1)}{\delta} \sqrt{\frac{2}{m}}. \]
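Proof. We prove the result by induction on $m$. The result is trivial if $\delta \le \sqrt{2/m}$ and
in particular is trivial for $m = 1$ and $m = 2$. So we suppose $m > 2$ and $\delta > \sqrt{2/m}$.
Fixing $i_m$, by the induction hypothesis the number of $(i_1, \dots, i_{m-1})$ satisfying the
given inequality is at most
\[ \frac{(r_1+1) \cdots (r_{m-1}+1)}{\delta m - 1 + 2\frac{i_m}{r_m}} \sqrt{2m-2}. \]
We have
\begin{align*}
\sum_{i=0}^{r_m} \frac{1}{\delta m - 1 + 2\frac{i}{r_m}}
&= \frac{1}{2} \sum_{i=0}^{r_m} \Bigl( \frac{1}{\delta m - 1 + 2\frac{i}{r_m}} + \frac{1}{\delta m + 1 - 2\frac{i}{r_m}} \Bigr) \\
&= \sum_{i=0}^{r_m} \frac{\delta m}{\delta^2 m^2 - (1 - 2\frac{i}{r_m})^2} \\
&\le (r_m + 1)\, \frac{\delta m}{\delta^2 m^2 - 1} \\
&< (r_m + 1)\, \frac{1}{\delta \sqrt{m(m-1)}},
\end{align*}
where in the last step we have used the assumption $\delta > \sqrt{2/m}$:
\[ \delta^2 m^2 - 1 > \delta^2 m^2 \Bigl(1 - \frac{1}{2m}\Bigr) > \delta^2 m^2 \sqrt{1 - \frac{1}{m}}. \]
This gives the desired bound.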
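For small parameters the count in Lemma 3.4 can be compared with the bound by direct enumeration. The sketch below is a brute-force illustration (exact rational arithmetic for the constraint, floating point for the bound); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod, sqrt

def count_tuples(r, delta):
    """Number of (i_1, ..., i_m) with 0 <= i_v <= r_v and sum i_v / r_v <= m (1 - delta) / 2."""
    m = len(r)
    limit = Fraction(m, 2) * (1 - delta)
    return sum(
        1
        for idx in product(*(range(rv + 1) for rv in r))
        if sum(Fraction(i, rv) for i, rv in zip(idx, r)) <= limit
    )

def lemma_3_4_bound(r, delta):
    """The bound (r_1 + 1) ... (r_m + 1) / delta * sqrt(2 / m)."""
    m = len(r)
    return prod(rv + 1 for rv in r) / float(delta) * sqrt(2 / m)
```

For example, with `r = (4, 5, 6)` and `delta = Fraction(9, 10)` only the zero tuple satisfies the constraint, against a bound of roughly 190; the bound becomes informative only when $m$ is large compared with $1/\delta^2$.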
The following result supplies the auxiliary functions used in the proof of Roth’s
theorem.
Proposition 3.5. Let $\alpha$ be an algebraic integer of degree $n \ge 3$ and height $H$, and
let $\delta > 0$. For any integer $m$ with
\[ m > \frac{8n^2}{\delta^2} \]
and $r_1, \dots, r_m \in \mathbb{N}_{>0}$, there exists nonzero $F \in \mathbb{Z}[x_1, \dots, x_m]$ having degree in $x_j$
at most $r_j$ for all $j$, index at least
\[ \frac{1}{2}\, m(1-\delta) \]
at $(\alpha, \dots, \alpha)$ relative to $r_1, \dots, r_m$, and height at most $(4H+4)^{r_1+\dots+r_m}$.
Proof. We write
\[ F = \sum c_{i_1 \dots i_m}\, x_1^{i_1} \cdots x_m^{i_m} \]
where the multi-index ranges over $0 \le i_1 \le r_1$, $\dots$, $0 \le i_m \le r_m$. The number of
unknown coefficients is
\[ (r_1+1) \cdots (r_m+1). \]
To have index at least $(1/2)m(1-\delta)$ imposes the constraints
\[ \frac{1}{i_1! \cdots i_m!} \frac{\partial^{i_1+\dots+i_m} F}{\partial x_1^{i_1} \cdots \partial x_m^{i_m}}(\alpha, \dots, \alpha) = 0 \]
for all $i_1, \dots, i_m$ with
\[ \frac{i_1}{r_1} + \dots + \frac{i_m}{r_m} < \frac{1}{2}\, m(1-\delta). \]
The number of constraints is the quantity appearing in Lemma 3.4; each constraint
consists of $n$ linear equations (one for every basis element of $\mathbb{Z}[\alpha]$) in the $c_{i_1 \dots i_m}$ with
coefficients of absolute value at most $(2H+2)^{r_1+\dots+r_m}$. Indeed, differentiating and
dividing by the appropriate factorials introduces coefficients bounded by $2^{r_1+\dots+r_m}$
(cf. Lemma 1.26(ii)), and the expression of powers of $\alpha$ in terms of the standard
basis of $\mathbb{Z}[\alpha]$ multiplies these coefficients by at most $(H+1)^{r_1+\dots+r_m}$ (by Lemma
1.24).
We apply Lemma 1.25. By the hypothesis on $m$, the exponent in Lemma 1.25
is less than 1. So a solution exists, corresponding to nonzero $F$ meeting the degree
and index requirements, with height at most $(r_1+1) \cdots (r_m+1)(2H+2)^{r_1+\dots+r_m}$.
Bounding $r_i + 1$ by $2^{r_i}$ leads to the height bound in the statement.
3.3 Index at nearby rational points
Given a collection of good rational approximations to α we will obtain a lower bound
on the index of an auxiliary function as in §3.2. The lower bound will be applicable
when the denominators of the rational approximations are large and close to each
other in a weighted sense depending on the given r1 , . . . , rm and will be paired with
a general upper bound, obtained using the machinery of Wronskians.
Proposition 3.6. Let $\alpha$ be an algebraic integer of degree $n \ge 3$ and height $H$, and let $\delta$ and $\varepsilon$ be positive real numbers with
\[
15\delta < \varepsilon < \frac{1}{12}.
\]
Given positive integers $r_1, \dots, r_m$ and pairs of relatively prime integers $p_i$ and $q_i$ with $q_i > 0$ and
\[
\left| \alpha - \frac{p_i}{q_i} \right| < \frac{1}{q_i^{2+\varepsilon}}
\]
for $i = 1, \dots, m$ such that, for every $i$,
\[
q_i^{\delta} > 64(H + 1) \max(1, |\alpha|),
\]
and
\[
r_1 \log q_1 \le r_i \log q_i \le (1 + \delta) r_1 \log q_1,
\]
we have
\[
\operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}(F) \ge \frac{1}{8} \varepsilon m
\]
for all $F \in \mathbb{Z}[x_1, \dots, x_m]$ with $x_j$-degree $\le r_j$ for all $j$, $\operatorname{ind}_{(\alpha, \dots, \alpha)}(F) \ge (1/2)m(1 - \delta)$, and $H(F) \le (4H + 4)^{r_1 + \cdots + r_m}$.
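Proof. Given $F \in \mathbb{Z}[x_1, \dots, x_m]$ as in the statement and $i_1, \dots, i_m$ such that
\[
\frac{i_1}{r_1} + \cdots + \frac{i_m}{r_m} < \frac{1}{8} \varepsilon m
\]
we need to show that
\[
G := \frac{1}{i_1! \cdots i_m!} \frac{\partial^{i_1 + \cdots + i_m} F}{\partial x_1^{i_1} \cdots \partial x_m^{i_m}}
\]
satisfies
\[
G\Bigl( \frac{p_1}{q_1}, \dots, \frac{p_m}{q_m} \Bigr) = 0.
\]
By Lemma 1.26(ii) we have $H(G) \le (8H + 8)^{r_1 + \cdots + r_m}$. By Proposition 3.3(i),
\[
\operatorname{ind}_{(\alpha, \dots, \alpha)}(G) \ge \frac{1}{2} m (1 - \delta) - \frac{1}{8} \varepsilon m > \frac{1}{2} m \Bigl(1 - \frac{\varepsilon}{3}\Bigr).
\]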
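We express $G(p_1/q_1, \dots, p_m/q_m)$ using the Taylor expansion around $(\alpha, \dots, \alpha)$, which by the index bound reduces to
\[
\sum_{\frac{j_1}{r_1} + \cdots + \frac{j_m}{r_m} \ge \frac{1}{2} m (1 - \frac{\varepsilon}{3})} \frac{1}{j_1! \cdots j_m!} \frac{\partial^{j_1 + \cdots + j_m} G}{\partial x_1^{j_1} \cdots \partial x_m^{j_m}} (\alpha, \dots, \alpha) \Bigl( \frac{p_1}{q_1} - \alpha \Bigr)^{j_1} \cdots \Bigl( \frac{p_m}{q_m} - \alpha \Bigr)^{j_m}.
\]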
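Notice,
\[
\left| \frac{1}{j_1! \cdots j_m!} \frac{\partial^{j_1 + \cdots + j_m} G}{\partial x_1^{j_1} \cdots \partial x_m^{j_m}} (\alpha, \dots, \alpha) \right| \le \bigl( 32(H + 1) \max(1, |\alpha|) \bigr)^{r_1 + \cdots + r_m},
\]
by Lemma 1.26(ii) and the bound $(r_1 + 1) \cdots (r_m + 1) \le 2^{r_1 + \cdots + r_m}$ on the number of terms in $G$. As well,
\[
-\log \left| \Bigl( \frac{p_1}{q_1} - \alpha \Bigr)^{j_1} \cdots \Bigl( \frac{p_m}{q_m} - \alpha \Bigr)^{j_m} \right|
\ge (2 + \varepsilon) \sum_{\nu=1}^{m} j_\nu \log q_\nu
\ge (2 + \varepsilon) \frac{1}{2} m \Bigl(1 - \frac{\varepsilon}{3}\Bigr) r_1 \log q_1
\ge \Bigl(1 + \frac{\varepsilon}{2}\Bigr) \Bigl(1 - \frac{\varepsilon}{3}\Bigr) (1 + \delta)^{-1} \sum_{\nu=1}^{m} r_\nu \log q_\nu.
\]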
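Since $(1 + \varepsilon/2)(1 - \varepsilon/3) = 1 + (1/6)\varepsilon(1 - \varepsilon) > (1 + \delta)^2$, we have
\[
\left| \Bigl( \frac{p_1}{q_1} - \alpha \Bigr)^{j_1} \cdots \Bigl( \frac{p_m}{q_m} - \alpha \Bigr)^{j_m} \right| < |q_1^{r_1} \cdots q_m^{r_m}|^{-1-\delta}.
\]
Combining, we have at most $2^{r_1 + \cdots + r_m}$ terms, each of absolute value at most $\bigl( 32(H + 1) \max(1, |\alpha|) \bigr)^{r_1 + \cdots + r_m} |q_1^{r_1} \cdots q_m^{r_m}|^{-1-\delta}$. It follows that the absolute value of the integer
\[
q_1^{r_1} \cdots q_m^{r_m} \, G(p_1/q_1, \dots, p_m/q_m)
\]
is at most
\[
\bigl( 64(H + 1) \max(1, |\alpha|) \bigr)^{r_1 + \cdots + r_m} |q_1^{r_1} \cdots q_m^{r_m}|^{-\delta},
\]
and by the hypothesis on $q_i^{\delta}$ this is less than 1. Hence this integer is zero, and so $G(p_1/q_1, \dots, p_m/q_m) = 0$.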
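The inequality $(1 + \varepsilon/2)(1 - \varepsilon/3) = 1 + \varepsilon(1 - \varepsilon)/6 > (1 + \delta)^2$ used above depends on the hypothesis $15\delta < \varepsilon < 1/12$. The following Python sketch checks both the identity and the inequality exactly on a grid of rational sample values; the function name and the grid are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def exponent_gap(eps, delta):
    """Return (1 + eps/2)(1 - eps/3) - (1 + delta)^2 as an exact fraction."""
    lhs = (1 + eps / 2) * (1 - eps / 3)
    # The algebraic identity stated in the proof.
    assert lhs == 1 + eps * (1 - eps) / 6
    return lhs - (1 + delta) ** 2

# eps runs over k/1200 < 1/12; delta = eps/15 is the boundary of 15*delta < eps,
# so any admissible (strictly smaller) delta gives an even larger gap.
gaps = [exponent_gap(Fraction(k, 1200), Fraction(k, 1200) / 15) for k in range(1, 100)]
```

Every entry of `gaps` is positive, consistent with the fact that the inequality already holds at the boundary value $\delta = \varepsilon/15$ throughout this range.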
Proposition 3.7. Let $m$ be a positive integer and $\delta \in \mathbb{R}$ with $0 < \delta < 1/12$, and set
\[
\gamma := \frac{24}{2^m} \Bigl( \frac{\delta}{12} \Bigr)^{2^{m-1}}.
\]
Given positive integers $r_1, \dots, r_m$ satisfying $r_{i+1} \le \gamma r_i$ for $i = 1, \dots, m - 1$ and pairs of relatively prime integers $p_i$ and $q_i$ with $q_i > 0$ for $i = 1, \dots, m$ such that, for every $i$,
\[
q_i^{\gamma} \ge 8^m
\]
and
\[
r_1 \log q_1 \le r_i \log q_i,
\]
we have
\[
\operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}(F) \le \delta
\]
for every $0 \ne F \in \mathbb{Z}[x_1, \dots, x_m]$ with $x_j$-degree $\le r_j$ for all $j$ and $H(F) \le q_1^{\gamma r_1}$.
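The shape of $\gamma$ is dictated by the induction in the proof: for $m = 1$ it reduces to $\delta$, and passing from $(m, \delta)$ to $(m - 1, \delta^2/12)$ doubles it. The following Python sketch checks these relations exactly, reading the definition as $\gamma = (24/2^m)(\delta/12)^{2^{m-1}}$; the function name `gamma` and the sample value of $\delta$ are illustrative.

```python
from fractions import Fraction

def gamma(m, delta):
    """gamma = (24 / 2^m) * (delta / 12)^(2^(m-1)), computed exactly."""
    return Fraction(24, 2 ** m) * (delta / 12) ** (2 ** (m - 1))

delta = Fraction(1, 20)  # sample value with 0 < delta < 1/12

# Base case: for m = 1 the height hypothesis reads H(F) <= q^(delta * r).
base_ok = gamma(1, delta) == delta

# Inductive step: gamma' for (m - 1, delta^2 / 12) equals 2 * gamma.
step_ok = all(gamma(m - 1, delta ** 2 / 12) == 2 * gamma(m, delta) for m in range(2, 7))

# For m >= 2, gamma <= delta^2 / 24, which the proof uses to absorb r_m / r_{m-1}.
small_ok = all(gamma(m, delta) <= delta ** 2 / 24 for m in range(2, 7))
```

Since $\gamma$ decays doubly exponentially in $m$, exact fractions are preferable to floating point for such checks.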
Proof. We prove the result by induction on $m$. For the base case $m = 1$ we write
\[
F(x) = (qx - p)^t G(x)
\]
(where we have omitted the subscript 1 from $p$, $q$, and $x$), with $G(p/q) \ne 0$. We have $G \in \mathbb{Z}[x]$ (Gauss's lemma). So $q^t$ divides the leading coefficient of $F$, and thus (writing $r$ for $r_1$) $q^t \le H(F) \le q^{\delta r}$. This gives the result, since $\operatorname{ind}_{p/q}(F) = t/r$.
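The base case can be illustrated on a concrete polynomial. The following Python sketch computes the order of vanishing $t$ of a one-variable integer polynomial at $p/q$ by repeated synthetic division and confirms that $q^t$ divides the leading coefficient. The helper `multiplicity_at` and the example $F = (3x - 2)^2(x + 1)$ are illustrative choices, and the height is taken here to be the maximum absolute value of the coefficients, which is an assumption about the notes' definition.

```python
from fractions import Fraction

def multiplicity_at(coeffs, p, q):
    """Order of vanishing at x = p/q; coeffs lists coefficients, constant term first."""
    x0 = Fraction(p, q)
    coeffs = [Fraction(c) for c in coeffs]
    t = 0
    while any(coeffs):
        # Synthetic division by (x - x0): Horner values, highest degree first.
        horner = []
        acc = Fraction(0)
        for c in reversed(coeffs):
            acc = acc * x0 + c
            horner.append(acc)
        remainder = horner.pop()       # last Horner value is the value at x0
        if remainder != 0:
            break
        coeffs = list(reversed(horner))  # quotient, constant term first
        t += 1
    return t

# F(x) = (3x - 2)^2 (x + 1) = 9x^3 - 3x^2 - 8x + 4, with p/q = 2/3 and r = 3.
F = [4, -8, -3, 9]
p, q, r = 2, 3, 3
t = multiplicity_at(F, p, q)
index = Fraction(t, r)                 # index of F at p/q relative to r
leading = F[-1]
height = max(abs(c) for c in F)
```

Here `t` is 2, the leading coefficient 9 is divisible by $q^t = 9$, and so $q^t \le H(F)$ as in the proof.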
For the inductive step, we suppose $m > 1$ and the result known for smaller values of $m$. We may write
\[
F = \sum_{i=1}^{\ell} g_i(x_1, \dots, x_{m-1}) h_i(x_m)
\]
where $g_i$ and $h_i$ are polynomials with rational coefficients in the indicated variables and $\ell$ is minimal; certainly $\ell \le r_m + 1$. A $\mathbb{Q}$-linear dependence among $g_1, \dots, g_\ell$ or among $h_1, \dots, h_\ell$ would allow us to write $F$ as above with fewer than $\ell$ summands. So the minimality of $\ell$ implies that $g_1, \dots, g_\ell$ are $\mathbb{Q}$-linearly independent and $h_1, \dots, h_\ell$ are $\mathbb{Q}$-linearly independent.
We apply Proposition 3.2. Applied to $g_1, \dots, g_\ell$, it shows that there exist $D_1, \dots, D_\ell$, where $D_i$ is a monomial in $\partial/\partial x_1, \dots, \partial/\partial x_{m-1}$ of degree less than $i$ for every $i$, such that
\[
\det(D_i g_j)_{1 \le i,j \le \ell} \ne 0.
\]
In the one-variable case (usual Wronskians), the $D_i h_j$ are just $h_j$, $h_j'$, \dots, and we obtain
\[
\det\bigl(h_j^{(i-1)}\bigr)_{1 \le i,j \le \ell} \ne 0.
\]
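Writing $D_i$ as $\partial^{a_{i1} + \cdots + a_{i(m-1)}} / \partial x_1^{a_{i1}} \cdots \partial x_{m-1}^{a_{i(m-1)}}$, we introduce
\[
\Delta_i := \frac{1}{a_{i1}! \cdots a_{i(m-1)}!} \frac{\partial^{a_{i1} + \cdots + a_{i(m-1)}}}{\partial x_1^{a_{i1}} \cdots \partial x_{m-1}^{a_{i(m-1)}}}
\]
and
\[
u := \det(\Delta_i g_j)_{1 \le i,j \le \ell}.
\]
Similarly, we introduce
\[
v := \det\Bigl( \frac{1}{(i-1)!} h_j^{(i-1)} \Bigr)_{1 \le i,j \le \ell}.
\]
By the multiplicativity of the determinant, the polynomial with integer coefficients
\[
W := \det\Bigl( \Delta_i \frac{1}{(j-1)!} \frac{\partial^{j-1} F}{\partial x_m^{j-1}} \Bigr)_{1 \le i,j \le \ell}
\]
satisfies
\[
W = uv.
\]
There exists $c \in \mathbb{Q}$, unique up to sign, so that
\[
V := cv
\]
has integer coefficients with gcd 1. Then
\[
U := c^{-1} u
\]
as well has integer coefficients. So we have the factorization of integer polynomials
\[
W(x_1, \dots, x_m) = U(x_1, \dots, x_{m-1}) V(x_m). \tag{1}
\]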
Since $W$ is the determinant of an $\ell \times \ell$ matrix whose entries are polynomials with $x_j$-degree at most $r_j$ for every $j$, the $x_j$-degree of $W$ is at most $\ell r_j$. By Lemma 1.26(ii) and the hypothesis, the entries of the matrix have height at most $2^{r_1 + \cdots + r_m} q_1^{\gamma r_1}$. The determinant is a sum of $\ell! \le \ell^{r_m} \le 2^{\ell r_m}$ terms, each of height at most $(r_1 + 1)^{\ell} \cdots (r_m + 1)^{\ell} (2^{r_1 + \cdots + r_m} q_1^{\gamma r_1})^{\ell}$, which is $\le 4^{(r_1 + \cdots + r_m)\ell} q_1^{\gamma r_1 \ell}$. So
\[
H(W) \le 8^{(r_1 + \cdots + r_m)\ell} q_1^{\gamma r_1 \ell} \le q_1^{2\gamma r_1 \ell}
\]
where for the second inequality we have used $r_1 + \cdots + r_m \le m r_1$ and $8^m \le q_1^{\gamma}$. In the factorization (1) every coefficient of $W$ is a product of corresponding coefficients of $U$ and $V$, so
\[
H(U), H(V) \le q_1^{2\gamma r_1 \ell}.
\]
The hypotheses of the proposition are satisfied for the quantities
\[
m' := m - 1, \qquad \delta' := \frac{\delta^2}{12}, \qquad r_i' := \ell r_i \quad (i = 1, \dots, m - 1),
\]
with corresponding quantity $\gamma'$ related by $\gamma' = 2\gamma$. So by the induction hypothesis and Proposition 3.3(iv),
\[
\operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}(U) \le \ell \frac{\delta^2}{12}.
\]
Similarly, by appealing to the single-variable case of the proposition, we have
\[
\operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}(V) \le \ell \frac{\delta^2}{12}.
\]
Now by Proposition 3.3(ii),
\[
\operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}(W) \le \ell \frac{\delta^2}{6}. \tag{2}
\]
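The proof concludes by relating the index of $F$ to that of $W$. This is possible by expanding the determinant and applying Proposition 3.3(i)–(iii). We let
\[
\theta := \operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}(F).
\]
So by Proposition 3.3(i) and the inequality
\[
\frac{a_{i1}}{r_1} + \cdots + \frac{a_{i(m-1)}}{r_{m-1}} + \frac{j-1}{r_m}
\le \frac{a_{i1} + \cdots + a_{i(m-1)}}{r_{m-1}} + \frac{j-1}{r_m}
\le \frac{\ell - 1}{r_{m-1}} + \frac{j-1}{r_m}
\le \frac{r_m}{r_{m-1}} + \frac{j-1}{r_m}
\le \gamma + \frac{j-1}{r_m}
\]
we have
\[
\operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}\Bigl( \Delta_i \frac{\partial^{j-1} F}{\partial x_m^{j-1}} \Bigr) \ge \max\Bigl( \theta - \frac{\delta^2}{24} - \frac{j-1}{r_m}, 0 \Bigr)
\]
and hence by Proposition 3.3(ii)–(iii) and (2),
\[
\sum_{j=1}^{\ell} \max\Bigl( \theta - \frac{\delta^2}{24} - \frac{j-1}{r_m}, 0 \Bigr) \le \ell \frac{\delta^2}{6}. \tag{3}
\]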
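There are now two cases. If $\theta \ge (\ell - 1)/r_m$ then, directly, we obtain $\theta \le \delta^2/2 < \delta$. If $\theta < (\ell - 1)/r_m$, then the terms with $j > \theta r_m + 1$ contribute trivially, and hence (3) implies
\[
(\lfloor \theta r_m \rfloor + 1)\theta - \ell \frac{\delta^2}{24} - \frac{\lfloor \theta r_m \rfloor (\lfloor \theta r_m \rfloor + 1)}{2 r_m} \le \ell \frac{\delta^2}{6}.
\]
So
\[
r_m \frac{\theta^2}{2} \le \frac{\theta(\lfloor \theta r_m \rfloor + 1)}{2} = (\lfloor \theta r_m \rfloor + 1)\theta - \frac{\theta(\lfloor \theta r_m \rfloor + 1)}{2} \le \ell \frac{\delta^2}{4} \le r_m \frac{\delta^2}{2},
\]
which implies $\theta \le \delta$.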
Proof of Theorem 3.1. We may suppose $\varepsilon < 1/12$. Let $n$ denote the degree and $H$ the height of $\alpha$. We prove the result by contradiction. Suppose that there exist infinitely many rational numbers $p/q$, where $p$ and $q$ are relatively prime integers with $q > 0$, such that
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{2+\varepsilon}}. \tag{4}
\]
Then we let $\delta$ be a positive real number less than $\varepsilon/15$, and we choose an integer $m > 8n^2/\delta^2$. We define $\gamma$ as in Proposition 3.7.
We choose, among the rational approximations satisfying (4), one whose denominator is greater than both $(4H + 4)^{m/\gamma}$ and $(64(H + 1) \max(1, |\alpha|))^{1/\delta}$; this we call $p_1/q_1$. Now we let $p_2/q_2, \dots, p_m/q_m$ be further such approximations, chosen so that
\[
q_{i+1} > q_i^{2/\gamma}
\]
for every $i$. Then we take $r_1$ to be an integer satisfying
\[
r_1 \ge \frac{1}{\delta} \frac{\log q_m}{\log q_1}
\]
and define
\[
r_i := \Bigl\lfloor \frac{r_1 \log q_1}{\log q_i} \Bigr\rfloor + 1
\]
for $i = 2, \dots, m$.
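The choice $r_1 \ge (1/\delta)\log q_m / \log q_1$ and $r_i = \lfloor r_1 \log q_1 / \log q_i \rfloor + 1$ is designed to force $r_1 \log q_1 \le r_i \log q_i \le (1 + \delta) r_1 \log q_1$. The following Python sketch carries out this choice with exact integer arithmetic (comparing powers instead of logarithms) and verifies the two-sided bound. The function names and the toy denominators are illustrative assumptions; in the actual proof the $q_i$ grow far faster, since $q_{i+1} > q_i^{2/\gamma}$.

```python
from fractions import Fraction

def choose_degrees(q, delta):
    """q: increasing integers q_1 < ... < q_m, all > 1; delta: Fraction in (0, 1)."""
    a, b = delta.numerator, delta.denominator
    q1, qm = q[0], q[-1]
    # Smallest r_1 with r_1 >= (1/delta) log q_m / log q_1, i.e. q_1^(a r_1) >= q_m^b.
    r1 = 1
    while q1 ** (a * r1) < qm ** b:
        r1 += 1
    target = q1 ** r1
    r = [r1]
    for qi in q[1:]:
        # floor(r_1 log q_1 / log q_i) is the largest k with q_i^k <= q_1^(r_1).
        k = 0
        while qi ** (k + 1) <= target:
            k += 1
        r.append(k + 1)
    return r

def in_window(q, r, delta):
    """Check q_1^(r_1) <= q_i^(r_i) <= q_1^((1 + delta) r_1) for every i, exactly."""
    a, b = delta.numerator, delta.denominator
    q1, r1 = q[0], r[0]
    return all(
        q1 ** r1 <= qi ** ri and qi ** (ri * b) <= q1 ** (r1 * (a + b))
        for qi, ri in zip(q, r)
    )

q = [7, 50, 2500]          # toy denominators, chosen only for illustration
delta = Fraction(1, 10)
r = choose_degrees(q, delta)
ok = in_window(q, r, delta)
```

The upper bound holds because $r_i \log q_i \le r_1 \log q_1 + \log q_i \le r_1 \log q_1 + \log q_m$ and $\log q_m \le \delta r_1 \log q_1$ by the choice of $r_1$.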
Since $m > 8n^2/\delta^2$, we may apply Proposition 3.5 to obtain $0 \ne F \in \mathbb{Z}[x_1, \dots, x_m]$ with $x_j$-degree $\le r_j$ for all $j$, index at least $(1/2)m(1 - \delta)$ at $(\alpha, \dots, \alpha)$, and height at most $(4H + 4)^{r_1 + \cdots + r_m}$. The hypotheses of Propositions 3.6 and 3.7 are satisfied: we have $q_m > \cdots > q_1 > (64(H + 1) \max(1, |\alpha|))^{1/\delta}$, we easily verify $r_1 \log q_1 \le r_i \log q_i \le (1 + \delta) r_1 \log q_1$ for all $i$, from which follows $\gamma r_i > 2(\log q_i / \log q_{i+1}) r_i \ge (2/(1 + \delta)) r_{i+1} \ge r_{i+1}$; as well, we have $H(F) \le (4H + 4)^{m r_1} \le q_1^{\gamma r_1}$. By Proposition 3.6 the index of $F$ at $(p_1/q_1, \dots, p_m/q_m)$ is at least $\varepsilon m/8$. By Proposition 3.7, $\operatorname{ind}_{(p_1/q_1, \dots, p_m/q_m)}(F) \le \delta < \varepsilon/15$, and we have a contradiction.