Lecture 3
The entropy method and implications of Keevash’s theorems
Zur Luria∗

∗ Institute of Theoretical Studies, ETH, 8092 Zurich, Switzerland. [email protected]. Research supported by Dr. Max Rössler, the Walter Haefner Foundation and the ETH Foundation.

1 Upper bounds: The entropy method
The entropy method is a very powerful method of obtaining bounds, usually upper bounds. First,
let us recall the notion of information entropy and some of its basic properties.
Let X be a random variable with a finite range. We will only be interested in X’s distribution,
and so we can really think of X as a finite collection {p1 , ..., pn } of nonnegative numbers that sum
to 1. The entropy of X is defined to be
H(X) = \sum_{i=1}^{n} p_i \log(1/p_i).
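As a quick aside (my own illustration, not part of the lecture), here is a minimal Python sketch of this definition; the helper name `entropy` is my choice, and the base of the logarithm only changes the unit (natural log gives nats, base 2 gives bits).

```python
import math

def entropy(p):
    """H(X) = sum_i p_i log(1/p_i) for a finite distribution p = [p_1, ..., p_n]."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

# A biased distribution on 4 values versus the uniform one.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # about 1.2130 (in nats)
print(entropy([0.25] * 4))                 # log(4) = 1.3863, the maximum for 4 values
```

The second value also illustrates property 2 below: the uniform distribution attains the maximum log(n).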
1. Intuitively, the entropy of X is the amount of information it encodes, measured in bits (when the logarithm is taken to base 2). There are theorems (Shannon's source coding theorem) that justify this interpretation, saying that the average number of bits needed to encode X's value is essentially H(X).
2. For any random variable X taking on n possible values there holds H(X) ≤ log(n), with
equality iff X is uniform.
3. Let X, Y be discrete random variables. The joint entropy of X and Y is just the entropy of the
pair (X, Y ) considered as a single random variable whose distribution is the joint distribution
of X and Y .
4. Let X, Y be discrete random variables. The conditional entropy of X given Y is

H(X|Y) = \sum_{y ∈ Range(Y)} Pr(Y = y) \sum_{x ∈ Range(X)} Pr(X = x | Y = y) \log(1/Pr(X = x | Y = y))
       = \sum_{y ∈ Range(Y)} Pr(Y = y) H(X | Y = y) = E_Y[H(X | Y = y)].
This quantity may be interpreted as the average amount of information that X gives us if we
already know Y .
5. The chain rule: For a sequence of random variables X_1, ..., X_n, we have

H(X_1, ..., X_n) = H(X_1) + H(X_2 | X_1) + ... + H(X_n | X_1, ..., X_{n−1}).
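To make the conditional entropy and the chain rule concrete, here is a small Python check (again my own illustration, not from the notes); it computes H(X), H(Y|X), and the joint entropy H(X, Y) for an explicit joint distribution and verifies the two-variable chain rule H(X, Y) = H(X) + H(Y|X). The distribution itself is an arbitrary choice.

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy of a distribution given as a dict mapping values to probabilities."""
    return sum(p * math.log(1.0 / p) for p in dist.values() if p > 0)

# An explicit joint distribution of (X, Y) on {0, 1} x {0, 1, 2}.
joint = {(0, 0): 0.2, (0, 1): 0.1, (0, 2): 0.1,
         (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.3}

# Marginal of X, and H(Y|X) = sum_x Pr(X = x) * H(Y | X = x).
marg_x = defaultdict(float)
for (x, _y), p in joint.items():
    marg_x[x] += p

H_Y_given_X = sum(px * H({y: p / px for (x2, y), p in joint.items() if x2 == x})
                  for x, px in marg_x.items())

print(H(joint))                       # H(X, Y)
print(H(dict(marg_x)) + H_Y_given_X)  # H(X) + H(Y|X) -- matches H(X, Y)
```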
For an entropy upper bound on the number of STSs, see [?].
We will illustrate the entropy method by proving an upper bound on the number of Sudoku squares which, as far as I know, is new.
Theorem 1.1. For a square N = n^2, let S_N denote the number of order-N Sudoku squares. Then

S_N ≤ \left((1 + o(1)) \frac{N}{e^3}\right)^{N^2}.
Proof. Fix N = n^2, and let X be an order-N Sudoku square chosen uniformly at random. Then H(X) = log(SN). Let Xi,j denote the entry of X in cell (i, j). Using the chain rule, for any ordering of the Xi,j's we have

H(X) = \sum_{i,j} E[H(X_{i,j} | values for the previous variables)].
Assume now that we choose a random ordering of the variables by choosing xi,j ∼ U[0, 1] independently for each (i, j), and ordering the variables in order of decreasing xi,j. Let x = (xi,j)i,j, and observe that this is, of course, nothing but a uniformly random ordering, but it is convenient to use these xi,j's.
A value s is unavailable for Xi,j given previously observed variables if we already observed a
variable in the same column, row or box whose value was s. Let Ni,j denote the number of values
that are available for Xi,j given the previous variables. Then we have
H(X) ≤ E_x\Big[ \sum_{i,j} E[H(X_{i,j} | values for the previous variables)] \Big]
     ≤ E_X\Big[ \sum_{i,j} E_x[\log(N_{i,j})] \Big]
     ≤ E_X\Big[ \sum_{i,j} E_{x_{i,j}}\big[ \log\big( E_{x_{k,l} : (k,l) ≠ (i,j)}[N_{i,j}] \big) \big] \Big].

Here the second inequality uses property 2 above (an entropy is at most the log of the number of possible values), and the last inequality is Jensen's inequality, using the concavity of the logarithm.
So let us calculate the inner expectation. The situation is that X, the Sudoku square, is fixed, and xi,j, the random number given to (i, j), is also fixed, and we are calculating the expectation of Ni,j over the choices of xk,l for (k, l) ≠ (i, j).
Using linearity of expectation,

E_{x_{k,l} : (k,l) ≠ (i,j)}[N_{i,j}] = \sum_{s=1}^{N} Pr(s is available for X_{i,j} given the previously seen variables | X, x_{i,j}).
There are three cases for s, depending on X.
• If Xi,j = s then s is clearly available for Xi,j no matter which variables were seen before.
• If Xi,j ≠ s and Xk,l = s for some (k, l) that lies in the same box as (i, j) and also in the same row or column as (i, j), then there are exactly two variables that can rule out s, since the occurrence of s in the box coincides with its occurrence in the row (or column) of (i, j). The value s remains available if Xi,j precedes them both, which means that their x-values are smaller than xi,j; this happens with probability x_{i,j}^2. There are always 2n − 2 such values s.
• Otherwise, there are exactly three variables that can rule out s: the s-valued variables in the same row, column and box as (i, j). There are N − 2n + 1 = (n − 1)^2 such values s, and the probability that such an s is available is x_{i,j}^3, similarly to the previous case.
Therefore

E_{x_{k,l} : (k,l) ≠ (i,j)}[N_{i,j}] = 1 + (2n − 2)x_{i,j}^2 + (n − 1)^2 x_{i,j}^3.
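Before plugging this in, here is a quick Monte Carlo sanity check of this formula (my own sketch, not part of the lecture) for n = 2, i.e. a 4 × 4 Sudoku square; the particular square, the cell (0, 0), and the value x_{i,j} = 0.7 are arbitrary choices for illustration.

```python
import random

# A valid 4x4 (n = 2) Sudoku square.
square = [[1, 2, 3, 4],
          [3, 4, 1, 2],
          [2, 1, 4, 3],
          [4, 3, 2, 1]]
n, N = 2, 4
i, j = 0, 0    # the fixed cell (i, j)
x_ij = 0.7     # the fixed value of x_{i,j}

def sample_N_ij():
    """One sample of N_{i,j}: draw x_{k,l} for every other cell and count the values
    not ruled out by cells observed before (i, j), i.e. cells with a larger x-value."""
    seen = set()
    for k in range(N):
        for l in range(N):
            if (k, l) == (i, j):
                continue
            observed_before = random.random() > x_ij
            same_line_or_box = (k == i or l == j or (k // n, l // n) == (i // n, j // n))
            if observed_before and same_line_or_box:
                seen.add(square[k][l])
    return sum(1 for s in range(1, N + 1) if s not in seen)

trials = 200000
empirical = sum(sample_N_ij() for _ in range(trials)) / trials
predicted = 1 + (2 * n - 2) * x_ij ** 2 + (n - 1) ** 2 * x_ij ** 3
print(empirical, predicted)  # both close to 1 + 2(0.7)^2 + (0.7)^3 = 2.323
```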
Plugging this in, we get

H(X) ≤ N^2 · \int_0^1 \log(1 + 2(n − 1)x^2 + (n − 1)^2 x^3)\,dx = N^2 · (\log(N) − 3 + o(1)).

The last equality isn't obvious at all, but Mathematica solves the integral in about ten seconds. Heuristically, for all but very small x and large n the integrand is close to \log((n − 1)^2 x^3) = \log(N) + 3\log(x) + o(1), and \int_0^1 3\log(x)\,dx = −3. Thus,
S_N ≤ \exp(N^2 · (\log(N) − 3 + o(1))) = \left((1 + o(1)) \frac{N}{e^3}\right)^{N^2}.
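Since the asymptotics of this integral are the one non-elementary step, here is a short numerical check (my own, using scipy rather than Mathematica) that \int_0^1 \log(1 + 2(n − 1)x^2 + (n − 1)^2 x^3)\,dx indeed approaches log(N) − 3 as n grows.

```python
import math
from scipy.integrate import quad  # requires scipy

def proof_integral(n):
    """Numerically evaluate the integral appearing in the proof for a given n."""
    f = lambda x: math.log(1 + 2 * (n - 1) * x ** 2 + (n - 1) ** 2 * x ** 3)
    value, _abserr = quad(f, 0, 1)
    return value

for n in (10, 100, 1000, 10000):
    N = n * n
    print(n, round(proof_integral(n), 4), round(math.log(N) - 3, 4))
# The gap between the two columns shrinks as n grows, consistent with log(N) - 3 + o(1).
```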
This proof can be adapted to give upper bounds on many different types of combinatorial objects,
including designs, high dimensional permutations, Latin transversals, and various generalizations of
these objects.
2 Immediate consequences of Keevash’s result
Before we delve into Keevash’s proof, there are some interesting implications that can be inferred, using Theorem ?? and the lower bound on the number of Steiner triple systems as black boxes.
• Typical designs: One natural direction to take in the study of designs is to investigate the
properties of a uniformly random design. For random regular graphs this has been a fertile area,
and what was missing for designs until now was good estimates on their number. Here are a
couple of examples.
Theorem 2.1. There exists ε > 0 such that for every set of triples B ⊆ \binom{[n]}{3} with |B| ≥ n^{3−ε}, with high probability an order-n STS chosen uniformly at random contains an element of B.
Proof. It is straightforward to get an entropy upper bound on the number of order-n STSs that avoid B. Dividing this bound by Keevash’s lower bound gives the result.
Theorem 2.2. With high probability an order-n STS chosen uniformly at random doesn’t
contain an order-u STS for u ≥ n/5.
Proof. First, we obtain an entropy upper bound for the number of STSs containing a smaller
STS on a fixed set U ⊂ [n] of size u for some u ≥ n/5, and then divide by Keevash’s lower
bound. This is an upper bound on the probability that U is a sub-STS of the random STS.
We get the result via a union bound over all such sets U .
• Another application of Keevash’s theorem is the construction of designs (and related objects)
with desirable properties. If the property is monotone (i.e., adding edges can only help), then all we have to do is to construct a partial STS that has the desired property and make sure that the remaining uncovered edges satisfy the requirements of Theorem [?]. In this manner,
we can prove results such as the following.
Theorem 2.3.
– There exists a constant M > 0 such that for every large enough n there is an order-n STS that contains a triple t ∈ \binom{S}{3} for every S ⊆ [n] whose size is at least M(n log n)^{1/2}.
– There exists a constant M′ > 0 such that for every large enough n there is an order-n Latin square that contains an element (a, b, c) ∈ A × B × C for every A, B, C ⊆ [n] such that |A||B||C| ≥ M′n^2.
Proof. We can prove these kinds of statements by analyzing the random greedy process, and
showing that with high probability the triangles chosen by the random greedy process already
have this property. Let us prove the first item.
Fix S ⊆ [n] and let s = |S|. The number of triples in \binom{S}{3} is about s^3/6, while the total number of triples is about n^3/6, and so the probability that the first triangle is in S is about s^3/n^3. Let Gi denote the graph consisting of all of the uncovered edges before the i-th step of the greedy algorithm, and let Ai denote the event that the triangle chosen in the i-th step isn't in S.
Let 0 < α < 1/6 be a constant whose value will be determined later. The probability that no triangle of S is chosen during the first αn^2 steps of the greedy algorithm is

Pr(A_1) · Pr(A_2 | A_1) · ... · Pr(A_{αn^2} | A_1, ..., A_{αn^2 − 1}).
Fix 1 ≤ k ≤ αn^2. Note that Pr(Ak | A1, ..., Ak−1) is one minus the number of triangles of S in Gk (which we shall denote by T(S, Gk)) divided by the total number of triangles in Gk, so

Pr(A_k | A_1, ..., A_{k−1}) ≤ 1 − \frac{T(S, G_k)}{\binom{n}{3}}.
It remains to give a lower bound on T(S, Gk). Since we know that previous steps did not choose a triangle in S, the only way a triangle in S can be eliminated by previous choices is if a previous triangle had an edge in S. Every such triangle can eliminate at most s triangles of S, and so if tk is the number of triangles chosen prior to the k-th step with an edge in S, then T(S, Gk) ≥ \binom{s}{3} − s · tk. The next step is to obtain an upper bound on tk.
The number of triangles in Gj is at least \binom{n}{3} − 3(j − 1)n. Since we are only considering the first αn^2 steps, this is at least \binom{n}{3} − 3αn^3, and so the probability that the j-th triangle has an edge in S is at most

\frac{\binom{s}{2} n}{\binom{n}{3} − 3αn^3} ≈ \frac{3s^2 n}{(1 − 18α)n^3}.
Therefore, we can upper bound ti by the sum of i independent Bernoulli random variables with p = 3s^2 n/((1 − 18α)n^3). Chernoff's bound tells us that

Pr\big(t_{αn^2} > 6αs^2/(1 − 18α)\big) ≤ \exp\left(− \frac{αs^2}{1 − 18α}\right)

(the threshold 6αs^2/(1 − 18α) is twice the mean αn^2 · p = 3αs^2/(1 − 18α) of this sum).
So whp this does not happen, and we have ti ≤ 6αs^2/(1 − 18α) for all 1 ≤ i ≤ αn^2, so

Pr(A_k | A_1, ..., A_{k−1}) ≤ 1 − \frac{\binom{s}{3} − 6αs^3/(1 − 18α)}{\binom{n}{3}} ≈ 1 − \frac{1 − 54α}{1 − 18α} · \frac{s^3}{n^3}.

So for an appropriately small constant α, we have Pr(Ak | A1, ..., Ak−1) ≤ 1 − s^3/(2n^3) (α = 0.01 suffices), and then the probability that no triangle of S is chosen during the first αn^2 steps of the greedy algorithm is at most

\exp\left(− \frac{αs^2}{1 − 18α}\right) + \left(1 − \frac{s^3}{2n^3}\right)^{αn^2} ≤ 2\exp(−αs^3/(2n)).
Using the union bound over all sets S whose size is at least \frac{2}{\sqrt{α}}\sqrt{n \log n} implies that whp there are no such sets that don't contain any chosen triangle.
For the second statement, recall that every STS is equivalent to a Latin square, so proving the
result for Latin squares reduces to proving a similar result for STSs.
There is a nice interpretation of this result. The chromatic index of an STS is the smallest number of colors needed to color the vertices such that there is no monochromatic triple. The previous theorem tells us that in Keevash’s STSs every color class must have size less than M(n log n)^{1/2}, and hence the chromatic index is at least Ω((n/\log n)^{1/2}).
References
[1] P. Keevash, The existence of designs, arXiv preprint arXiv:1401.3665 (2014).
[2] P. Keevash, Counting designs, arXiv preprint arXiv:1504.02909 (2015).