Guaranteed Recovery of Planted Cliques and Dense
Subgraphs by Convex Relaxation: Additional Proofs
Brendan P.W. Ames∗
May 16, 2015
1 Recovery of the Densest k-Subgraph in the Adversarial Case
We consider the densest k-subgraph problem: given a graph G = (V, E), identify a k-node
subgraph of G of maximum density. We are interested in establishing conditions for when
the optimal solution of the combinatorial formulation of this problem
$$\min_{X,Y\in\Sigma_V}\ \left\{\operatorname{rank}(X) + \gamma\|Y\|_0 \;:\; e^TXe = k^2,\ X_{ij}+Y_{ij}=0 \text{ if } ij\in\tilde E,\ X\in\{0,1\}^{V\times V}\right\} \tag{1.1}$$

can be recovered from the optimal solution of the convex relaxation

$$\min_{X,Y}\ \left\{\|X\|_* + \gamma\|Y\|_1 \;:\; e^TXe = k^2,\ X_{ij}+Y_{ij}=0 \text{ if } ij\in\tilde E,\ X\in[0,1]^{V\times V}\right\}. \tag{1.2}$$

Here Σ_V is the cone of semidefinite matrices with entries indexed by V, Ẽ = (V × V) − E − {vv : v ∈ V} is the complement of the edge set of G, e is the all-ones vector in R^V, ‖Y‖₀ and ‖Y‖₁ denote the ℓ₀ and ℓ₁ norms of the vectorization of Y ∈ R^{V×V}, and ‖X‖_* denotes the nuclear norm of X ∈ Σ_V. In both (1.1) and (1.2), γ is a regularization parameter to be chosen by the user and k is the size of the desired subgraph of G.
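The relaxation (1.2) is straightforward to prototype in an off-the-shelf convex modeling language. The following sketch is ours, not the paper's: it assumes CVXPY and NetworkX are available, and the function name solve_dks_relaxation is an illustrative choice.

```python
# A minimal sketch of the convex relaxation (1.2). Ours, assuming CVXPY and
# NetworkX are installed; solve_dks_relaxation is an illustrative name.
import cvxpy as cp
import networkx as nx
import numpy as np

def solve_dks_relaxation(G: nx.Graph, k: int, gamma: float):
    n = G.number_of_nodes()
    A = nx.to_numpy_array(G)
    # Indicator of E-tilde: pairs (i, j), i != j, that are not edges of G.
    non_edges = ((A == 0) & ~np.eye(n, dtype=bool)).astype(float)
    X = cp.Variable((n, n))
    Y = cp.Variable((n, n))
    constraints = [
        cp.sum(X) == k ** 2,                     # e^T X e = k^2
        cp.multiply(non_edges, X + Y) == 0,      # X_ij + Y_ij = 0 on E-tilde
        X >= 0, X <= 1,                          # X in [0, 1]^{V x V}
    ]
    objective = cp.Minimize(cp.normNuc(X) + gamma * cp.sum(cp.abs(Y)))
    cp.Problem(objective, constraints).solve()
    return X.value, Y.value
```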
This section provides a proof of the following theorem, which states that the densest k-subgraph can be recovered using (1.2) in the case that the input graph G consists of a single clique, corrupted by deterministic noise.
∗Department of Mathematics, Box 870350, Tuscaloosa, AL, 35487-0350, Tel.: +1-205-348-5155, [email protected]
Theorem 1.1 Let V* be a k-subset of nodes of the graph G = (V, E) and let v be its characteristic vector. Suppose that G contains at most r edges not in G(V*) and that G(V*) contains at least $\binom{k}{2} - s$ edges, such that each vertex in V* is adjacent to at least (1 − δ1)k nodes in V* and each vertex in V − V* is adjacent to at most δ2k nodes in V* for some δ1, δ2 ∈ (0, 1) satisfying 2δ1 + δ2 < 1. Let (X*, Y*) = (vv^T, −P_Ẽ(vv^T)), where P_Ẽ denotes the projection onto the set of matrices with support contained in Ẽ. Then there exist scalars c1, c2 > 0, depending only on δ1 and δ2, such that if s ≤ c1k² and r ≤ c2k², then G(V*) is the unique maximum density k-subgraph of G and (X*, Y*) is the unique optimal solution of (1.2) for γ = 2((1 − 2δ1 − δ2)k)^{−1}.
The proof of Theorem 1.1 is identical to that of [1, Theorem 1], with minor modifications
made to accommodate the deterministic nature of the noise in Theorem 1.1. Specifically,
we will establish Theorem 1.1 by showing that (X ∗ , Y ∗ ) = (vvT , −PẼ (vvT )) satisfy the
optimality conditions for (1.2) given by the Karush-Kuhn-Tucker theorem (see, for example,
[3, Section 5.5.3]) if the input graph G satisfies the hypothesis of Theorem 1.1.
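To make the objects in Theorem 1.1 concrete, here is a small illustration (ours; the specific corruption pattern and all names are assumptions for the example) that builds a corrupted planted clique and the proposed pair (X*, Y*), then checks feasibility for (1.2).

```python
# Toy construction of the planted solution (X*, Y*) = (v v^T, -P_Etilde(v v^T)).
# Ours, assuming NumPy; the corruption pattern below is an arbitrary example.
import numpy as np

n, k = 60, 20
A = np.zeros((n, n), dtype=int)
A[:k, :k] = 1 - np.eye(k, dtype=int)         # plant a clique on V* = {0, ..., k-1}

A[0, 1] = A[1, 0] = 0                        # adversarial deletion (one of <= s)
A[k, k + 1] = A[k + 1, k] = 1                # adversarial addition (one of <= r)

v = np.zeros(n)
v[:k] = 1                                    # characteristic vector of V*
X_star = np.outer(v, v)
Etilde = (A == 0) & ~np.eye(n, dtype=bool)   # complement edge set, no loops
Y_star = -np.where(Etilde, X_star, 0.0)      # Y* = -P_Etilde(X*)

assert X_star.sum() == k ** 2                          # e^T X e = k^2
assert np.all((X_star + Y_star)[Etilde] == 0)          # constraint on E-tilde
```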
1.1 Optimality Conditions
We begin by stating the following theorem, which provides the required sufficient conditions
for optimality of (X ∗ , Y ∗ ).
Theorem 1.2 ([1, Theorem 5]) Let V̄ be a subset of V of cardinality k of the graph G = (V, E), and let v̄ be the characteristic vector of V̄. Let X̄ = v̄v̄^T and Ȳ = −P_Ẽ(X̄). Suppose that there exist F, W ∈ R^{V×V}, λ ∈ R, and M ∈ R_+^{V×V} such that

$$\frac{\bar X}{k} + W - \lambda ee^T - \gamma(\bar Y + F) + M = 0 \tag{1.3}$$

$$W\bar v = W^T\bar v = 0, \qquad \|W\| \le 1 \tag{1.4}$$

$$P_{\tilde E\cap(\bar V\times\bar V)}(F) = 0, \qquad \|F\|_\infty \le 1 \tag{1.5}$$

$$F_{ij} = 0 \ \text{ for all } (i,j)\in E\cup\{vv : v\in V\} \tag{1.6}$$

$$M_{ij} = 0 \ \text{ for all } (i,j)\in (V\times V) - (\bar V\times\bar V). \tag{1.7}$$
Then (X̄, Ȳ ) is an optimal solution of (1.2) and the subgraph G(V̄ ) induced by V̄ is a
maximum density k-subgraph of G. Moreover, if kW k < 1 and kF k∞ < 1 then (X̄, Ȳ ) is the
unique optimal solution of (1.2) and G(V̄ ) is the unique maximum density k-subgraph of G.
We complete the proof by verifying that multipliers satisfying the hypothesis of Theorem 1.2 do indeed exist for the proposed solution (X ∗ , Y ∗ ). In particular, we consider W
and F chosen as follows:
(ω1 ) If (i, j) ∈ V ∗ × V ∗ such that ij ∈ E or i = j, then we take Wij = λ̃ − Mij , where
λ̃ := λ − 1/k, to ensure that the left-hand side of (1.3) is equal to 0.
(ω2) If ij ∈ Ω := V∗ × V∗ − (E ∪ {ii : i ∈ V∗}), then we choose Wij = λ̃ − γ − Mij.
(ω3 ) If (i, j) ∈ (V × V ) − (V ∗ × V ∗ ) such that ij ∈ E or i = j then the left-hand side of
(1.3) is equal to Wij − λ = 0. In this case, we choose Wij = λ.
(ω4 ) If i, j ∈ V − V ∗ such that ij ∈ Ẽ, we take Wij = 0 and Fij = −λ/γ to ensure that the
left-hand side of (1.3) is equal to 0.
(ω5) If i ∈ V∗ and j ∈ V − V∗ such that (i, j) ∈ Ẽ, we choose

$$W_{ij} = -\lambda\,\frac{n_j}{k-n_j}, \qquad F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k}{k-n_j},$$

where n_j is the number of neighbours of j in V∗.
(ω6 ) If i ∈ V − V ∗ , j ∈ V ∗ such that (i, j) ∈ Ẽ, we take Wij = Wji , Fij = Fji according to
(ω5 ).
Note that this construction is identical to that in [1, Section 4.1], except for Case (ω4 ).
It remains to show that there exist multipliers λ, M ∈ RV+×V and regularization parameter
γ such that W and F , as chosen above, satisfy W v = W T v = 0, kW k < 1, and kF k∞ < 1.
1.2 Choice of the Multipliers λ and M
The multiplier M is chosen so that W v = W T v = 0. By the choice of Wij in (ω3 ) and (ω5 ),
we have
$$[Wv]_i = n_i\lambda - (k-n_i)\,\lambda\,\frac{n_i}{k-n_i} = 0$$
for all i ∈ V − V ∗ . On the other hand, the requirement that
$$0 = \sum_{j\in V^*} W_{ij} = k\tilde\lambda - \gamma d_i - \sum_{j\in V^*} M_{ij}, \qquad 0 = \sum_{i\in V^*} W_{ij} = k\tilde\lambda - \gamma d_j - \sum_{i\in V^*} M_{ij}$$
for all i, j ∈ V∗, defines a system of 2k equations for the k² unknown entries of M, where d_i = k − 1 − n_i for all i ∈ V∗. To obtain a solution of this underdetermined system, we parametrize M as M = ye^T + ey^T, where y is the solution of the linear system

$$(kI + ee^T)\,y = \tilde\lambda ke - \gamma d.$$
Here d ∈ R^{V∗} is the vector with ith component equal to d_i. By the Sherman-Morrison-Woodbury formula [4, Equation (2.1.4)], we have

$$y = \frac{1}{2k}\left(k\tilde\lambda e - \gamma\left(2d - \frac{d^Te}{k}\,e\right)\right) = \frac{1}{2k}\left(k\tilde\lambda e - 2\gamma\left(d - \frac{s}{k}\,e\right)\right). \tag{1.8}$$

Note that

$$\left|d_i - \frac{s}{k}\right| \le \|d\|_\infty\left(1 - \frac{1}{k}\right) \le \delta_1(k-1)$$

for all i ∈ V∗, by the facts that s ≥ ‖d‖∞ and ‖d‖∞ ≤ δ1(k − 1). It follows that each element y_i of y satisfies 2ky_i ≥ kλ̃ − 2γδ1(k − 1). Taking λ = 2γδ1 + 1/k ensures that the entries of y and, hence, M are strictly positive.
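The Sherman-Morrison-Woodbury step above is easy to sanity-check numerically. The sketch below (ours, assuming NumPy; the parameter values are arbitrary) verifies that the closed form (1.8) agrees with a direct solve of (kI + ee^T)y = λ̃ke − γd when s = d^Te/2.

```python
# Sanity check of the Sherman-Morrison-Woodbury closed form (1.8).
# Ours, assuming NumPy; parameter values are arbitrary but respect the theory.
import numpy as np

rng = np.random.default_rng(1)
k, delta1, delta2 = 30, 0.1, 0.2
gamma = 2.0 / ((1 - 2 * delta1 - delta2) * k)
lam = 2 * gamma * delta1 + 1.0 / k
lam_tilde = lam - 1.0 / k

d = rng.integers(0, int(delta1 * (k - 1)) + 1, size=k).astype(float)
s = d.sum() / 2.0                            # so that d^T e = 2s
e = np.ones(k)

b = lam_tilde * k * e - gamma * d
y_direct = np.linalg.solve(k * np.eye(k) + np.outer(e, e), b)
y_smw = (k * lam_tilde * e - 2 * gamma * (d - (s / k) * e)) / (2 * k)
assert np.allclose(y_direct, y_smw)
```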
1.3 A Bound on ‖F‖∞
We next show that our choice of γ and λ ensures that ‖F‖∞ < 1. Note that F_ij = 0 for all (i, j) ∈ V × V except those corresponding to Cases (ω4), (ω5), and (ω6). In Case (ω4), we have

$$F_{ij} = -\frac{\lambda}{\gamma} = -\frac{1}{k\gamma} - 2\delta_1 = \frac{2\delta_1+\delta_2-1}{2} - 2\delta_1 > -1$$

by the choice of γ = 2((1 − 2δ1 − δ2)k)^{−1} and λ = 2γδ1 + 1/k, as well as the assumption that 2δ1 + δ2 < 1. On the other hand, we take

$$F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k}{k-n_j}$$

for all (i, j) corresponding to Case (ω5). In this case, we have |F_ij| < 1 because

$$\frac{\lambda}{\gamma}\cdot\frac{k}{k-n_j} \le \frac{1}{k\gamma} + 2\delta_1 + \frac{n_j}{k} \le \frac{1}{k\gamma} + 2\delta_1 + \delta_2 \le \frac{1+2\delta_1+\delta_2}{2} < 1,$$

by the fact that n_j ≤ δ2k and the assumption that 2δ1 + δ2 < 1. The case corresponding to (ω6) follows similarly by the symmetry of F.
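As a quick numerical spot check (ours, assuming NumPy; the particular values of k, δ1, δ2 are arbitrary), the following snippet confirms that the entries of F constructed in Cases (ω4) and (ω5) lie strictly inside (−1, 1) under the stated choices of γ and λ.

```python
# Spot check that the entries of F in Cases (w4) and (w5) are strictly
# inside (-1, 1). Ours, assuming NumPy; k, delta1, delta2 are arbitrary.
import numpy as np

k, delta1, delta2 = 100, 0.1, 0.2
gamma = 2.0 / ((1 - 2 * delta1 - delta2) * k)
lam = 2 * gamma * delta1 + 1.0 / k

F_w4 = -lam / gamma                          # Case (w4): F_ij = -lambda/gamma
nj = np.arange(0, int(delta2 * k) + 1)       # n_j <= delta2 * k
F_w5 = -(lam / gamma) * k / (k - nj)         # Case (w5)
assert abs(F_w4) < 1 and np.all(np.abs(F_w5) < 1)
```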
1.4 A Bound on ‖W‖
It remains to establish that our choice of W satisfies ‖W‖ < 1. The following lemma provides the necessary bound on ‖W‖.
Lemma 1.1 Suppose that W is constructed according to (ω1)-(ω6). Then

$$\|W\|^2 \le 4\gamma^2(1+2\delta_1)s + 2\lambda^2\left(1 + 2\,\frac{2-\delta_2}{1-\delta_2}\right)r.$$
Proof: We decompose W as W = Q + R, where Q(V∗, V∗) = W(V∗, V∗) and all remaining entries of Q are zero. It follows that ‖W‖² ≤ ‖W‖_F² = ‖Q‖_F² + ‖R‖_F². By an argument identical to that in [2, Section 4.1], we have

$$\|R\|_F^2 \le 2\lambda^2\left(1 + 2\,\frac{2-\delta_2}{1-\delta_2}\right)r.$$

It remains to derive the necessary upper bound on ‖Q‖_F².
Let Q̃ = Q(V∗, V∗). Note that

$$\tilde Q = \gamma\left(-H + \frac{1}{k}\left(de^T + ed^T\right) - \frac{d^Te}{k^2}\,ee^T\right)$$

by our choice of y, where H is the adjacency matrix of the complement of G(V∗). By the fact that both ‖H‖_F² and d^Te are equal to twice the number of pairs of nonadjacent nodes in V∗, we have ‖H‖_F² = d^Te ≤ 2s. On the other hand, we have
$$\left\|\frac{1}{k}\left(de^T + ed^T\right) - \frac{d^Te}{k^2}\,ee^T\right\|_F^2 = \frac{1}{k^2}\left\|de^T + ed^T\right\|_F^2 - \frac{4s}{k^3}\operatorname{Tr}\!\left(\left(de^T + ed^T\right)ee^T\right) + \frac{4s^2}{k^4}\left\|ee^T\right\|_F^2 \le \frac{1}{k^2}\left(8s^2 + 2k\|d\|^2\right) - \frac{4s}{k^3}\cdot 4ks + \frac{4s^2}{k^4}\cdot k^2 = \frac{1}{k^2}\left(2k\|d\|^2 - 4s^2\right).$$
Applying Hölder's inequality [5, Equation (5.4.14)] and the inequalities s ≥ ‖d‖∞ and ‖d‖∞ ≤ δ1(k − 1) implies that

$$\left\|\frac{1}{k}\left(de^T + ed^T\right) - \frac{d^Te}{k^2}\,ee^T\right\|_F^2 \le \frac{2}{k^2}\left(k\|d\|_1 - 2s\right)\|d\|_\infty \le \frac{4s}{k^2}\,\delta_1(k-1)^2 \le 4\delta_1 s.$$
It follows immediately that ‖Q‖_F² = ‖Q̃‖_F² ≤ 2γ²(2s + 4δ1s) = 4γ²(1 + 2δ1)s. This completes the proof.
By our choice of λ and γ, Lemma 1.1 implies that we can have r and s as large as O(k²) and still have ‖W‖ < 1.
2 Recovery of the Densest (k1, k2)-Subgraph

2.1 The Adversarial Case
We next derive the guarantee for recovery of the maximum density (k1 , k2 )-subgraph under
adversarial noise from the optimal solution of the convex problem
$$\min_{X,Y}\ \left\{\|X\|_* + \gamma\|Y\|_1 \;:\; \sum_{i\in U}\sum_{j\in V} X_{ij} \ge k_1k_2,\ X_{ij}+Y_{ij}=0\ \forall\, ij\notin E,\ X\in[0,1]^{U\times V}\right\} \tag{2.1}$$
given by the following theorem.
Theorem 2.1 Let G = (U, V, E) be a bipartite graph and let U ∗ ⊆ U , V ∗ ⊆ V be subsets of
cardinality k1 and k2 respectively. Let u and v denote the characteristic vectors of U ∗ and
V ∗ . Let (X ∗ , Y ∗ ) = (uvT , −PẼ (uvT )). Suppose that G(U ∗ , V ∗ ) contains at least k1 k2 − s
edges and that G contains at most r edges other than those in G(U ∗ , V ∗ ). Suppose that
every node in V ∗ is adjacent to at least (1 − α1 )k1 nodes in U ∗ and every node in U ∗ is
adjacent to at least (1 − α2 )k2 nodes in V ∗ for some scalars α1 , α2 > 0. Further, suppose
that each node in V − V ∗ is adjacent to at most β1 k1 nodes in U ∗ and each node in U − U ∗
is adjacent to at most β2 k2 nodes in V ∗ for some β1 , β2 > 0. Finally suppose that the scalars
α1 , α2 , β1 , β2 satisfy α1 + α2 + max{β1 , β2 } < 1. Then there exist scalars c1 , c2 > 0, depending
only on α1 , α2 , β1 , and β2 , such that if r ≤ c1 k1 k2 and s ≤ c2 k1 k2 then G(U ∗ , V ∗ ) is the
unique maximum density (k1 , k2 )-subgraph of G and (X ∗ , Y ∗ ) is the unique optimal solution
of (2.1) for γ = 2(√(k1k2)(1 − α1 − α2 − max{β1, β2}))^{−1}.
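As with (1.2), the bipartite relaxation (2.1) can be prototyped directly; the sketch below is ours (assuming CVXPY and NumPy; solve_dksub_relaxation is an illustrative name), with γ to be set as in Theorem 2.1.

```python
# A minimal sketch of the bipartite relaxation (2.1). Ours, assuming CVXPY and
# NumPy; solve_dksub_relaxation is an illustrative name, gamma as in Theorem 2.1.
import cvxpy as cp
import numpy as np

def solve_dksub_relaxation(A: np.ndarray, k1: int, k2: int, gamma: float):
    # A is the biadjacency matrix of G = (U, V, E), of shape (|U|, |V|).
    non_edges = (A == 0).astype(float)           # pairs ij not in E
    X = cp.Variable(A.shape)
    Y = cp.Variable(A.shape)
    constraints = [
        cp.sum(X) >= k1 * k2,                    # sum_ij X_ij >= k1*k2
        cp.multiply(non_edges, X + Y) == 0,      # X_ij + Y_ij = 0 for ij not in E
        X >= 0, X <= 1,                          # X in [0, 1]^{U x V}
    ]
    objective = cp.Minimize(cp.normNuc(X) + gamma * cp.sum(cp.abs(Y)))
    cp.Problem(objective, constraints).solve()
    return X.value, Y.value
```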
As before, the proof of Theorem 2.1 relies on establishing that the proposed solution
(X ∗ , Y ∗ ) satisfies the sufficient conditions for optimality given by the Karush-Kuhn-Tucker
theorem. The following theorem provides the necessary specialization of these conditions for
(2.1).
Theorem 2.2 Let G = (U, V, E) be a bipartite graph. Let Ū ⊆ U, V̄ ⊆ V have cardinality k1 and k2, respectively, let ū, v̄ be their respective characteristic vectors, and let X̄ = ūv̄^T and Ȳ = −P_Ẽ(X̄). Suppose that there exist F, W ∈ R^{U×V}, λ ∈ R, and M ∈ R_+^{U×V} such that

$$\frac{\bar X}{\sqrt{k_1k_2}} + W - \lambda ee^T - \gamma(\bar Y + F) + M = 0 \tag{2.2}$$

$$W\bar v = 0, \qquad W^T\bar u = 0, \qquad \|W\| \le 1 \tag{2.3}$$

$$P_{\tilde E\cap(\bar U\times\bar V)}(F) = 0, \qquad \|F\|_\infty \le 1 \tag{2.4}$$

$$F_{ij} = 0 \ \text{ for all } (i,j)\in E \tag{2.5}$$

$$M_{ij} = 0 \ \text{ for all } (i,j)\in (U\times V) - (\bar U\times\bar V). \tag{2.6}$$
Then (X̄, Ȳ) is an optimal solution of (2.1) and the subgraph G(Ū, V̄) induced by (Ū, V̄) is a maximum density (k1, k2)-subgraph of G. Moreover, if ‖W‖ < 1 and ‖F‖∞ < 1, then (X̄, Ȳ) is the unique optimal solution of (2.1) and G(Ū, V̄) is the unique maximum density (k1, k2)-subgraph of G.
The proof of Theorem 2.2 is analogous to that of [1, Theorem 4.1] and is left to the
reader. As in the proof of Theorem 1.1, we establish Theorem 2.1 by showing that there
exist multipliers corresponding to our proposed solution (X ∗ , Y ∗ ) satisfying the hypothesis
of Theorem 2.2. In particular, we choose W and F according to the following cases.
(ψ1) If (i, j) ∈ (U∗ × V∗) ∩ E, then (2.2) implies that W must satisfy Wij = λ̂ − Mij, where λ̂ := λ − 1/√(k1k2).

(ψ2) If (i, j) ∈ U∗ × V∗ such that (i, j) ∉ E, we take Wij = λ̂ − γ − Mij.

(ψ3) If (i, j) ∈ (U × V) − (U∗ × V∗) such that (i, j) ∈ E, we let Wij = λ to ensure that the left-hand side of (2.2) is equal to zero.

(ψ4) If i ∉ U∗, j ∉ V∗, (i, j) ∉ E, we take Wij = 0. In this case, we must have Fij = −λ/γ to ensure that the left-hand side of (2.2) is zero.

(ψ5) Suppose that i ∉ U∗, j ∈ V∗ such that (i, j) ∉ E. We choose

$$W_{ij} = -\lambda\,\frac{n_i}{k_2-n_i}, \qquad F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k_2}{k_2-n_i},$$

in this case, where n_i is the number of neighbours of i in V∗.

(ψ6) Finally, if i ∈ U∗, j ∉ V∗, (i, j) ∉ E, then we take

$$W_{ij} = -\lambda\,\frac{m_j}{k_1-m_j}, \qquad F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k_1}{k_1-m_j},$$

where m_j is the number of neighbours of j in U∗.
We next establish that there exist λ ∈ R and M ∈ R_+^{U×V} such that Wv = 0 and W^Tu = 0. If i ∉ U∗, then

$$[Wv]_i = \sum_{j\in V^*} W_{ij} = n_i\lambda - (k_2-n_i)\,\frac{n_i}{k_2-n_i}\,\lambda = 0.$$

Similarly, if j ∉ V∗, we have

$$[W^Tu]_j = \sum_{i\in U^*} W_{ij} = m_j\lambda - (k_1-m_j)\,\frac{m_j}{k_1-m_j}\,\lambda = 0.$$
The requirement that [Wv]_i = 0 and [W^Tu]_j = 0 for all i ∈ U∗, j ∈ V∗ defines an underdetermined system of k1 + k2 equations for the k1k2 unknown entries of M. To identify a solution of this system, we parametrize M as M = ye^T + ez^T for some y ∈ R^{k1} and z ∈ R^{k2}. The conditions W(U∗, V∗)e = 0 and W(U∗, V∗)^Te = 0 imply that y and z are solutions of the system

$$\begin{pmatrix} k_2 I & ee^T \\ ee^T & k_1 I\end{pmatrix}\begin{pmatrix} y \\ z\end{pmatrix} = \begin{pmatrix} b_1 \\ b_2\end{pmatrix}, \tag{2.7}$$

where b1 := k2λ̂e − (k2e − n)γ, b2 := k1λ̂e − (k1e − m)γ, and n ∈ R^{U∗}, m ∈ R^{V∗} are the vectors whose entries are equal to the degrees in G(U∗, V∗) of each node in U∗ and V∗, respectively. This system is singular with nullspace spanned by the vector (e; −e) = (e_{k1}; −e_{k2}), where e_{k1}, e_{k2} are the all-ones vectors in R^{k1} and R^{k2}, respectively. Since (e; −e) is in the nullspace of the coefficient matrix, the unique solution of the nonsingular system obtained by perturbing (2.7) by (e; −e)(e; −e)^T,

$$\begin{pmatrix} k_2I + ee^T & 0 \\ 0 & k_1I + ee^T\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix} = \begin{pmatrix} b_1\\b_2\end{pmatrix}, \tag{2.8}$$
is also a solution of (2.7). Indeed, taking the inner product of each side of (2.8) with (e; −e) yields (k1 + k2)(e^Ty − e^Tz) = b1^Te − b2^Te. However, b1^Te − b2^Te = γ(n^Te − m^Te) = 0 by the fact that both m^Te and n^Te equal |E(G(U∗, V∗))|. It follows that y^Te − z^Te = 0, and so the solution of (2.8) must satisfy

$$\begin{pmatrix} b_1\\ b_2\end{pmatrix} = \begin{pmatrix} k_2I + ee^T & 0\\ 0 & k_1I + ee^T\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix} = \begin{pmatrix} k_2I & ee^T\\ ee^T & k_1I\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix} + \left(y^Te - z^Te\right)\begin{pmatrix} e\\-e\end{pmatrix} = \begin{pmatrix} k_2I & ee^T\\ ee^T & k_1I\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix}.$$

Thus, the unique solution of (2.8) is a solution of (2.7).
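The argument above can be checked numerically. The following sketch (ours, assuming NumPy; the degree vectors are synthetic and chosen so that n^Te = m^Te) verifies that the solution of the block-diagonal system (2.8) also solves the singular system (2.7).

```python
# Check that the solution of the perturbed block-diagonal system (2.8) also
# solves the singular system (2.7). Ours, assuming NumPy; the degree vectors
# n and m are synthetic, chosen so that n^T e = m^T e.
import numpy as np

rng = np.random.default_rng(2)
k1, k2 = 8, 12
gamma, lam_hat = 0.05, 0.2
n = rng.integers(0, k2 + 1, size=k1).astype(float)
m = (n.sum() / k2) * np.ones(k2)                 # same total degree as n

e1, e2 = np.ones(k1), np.ones(k2)
b1 = k2 * lam_hat * e1 - gamma * (k2 * e1 - n)
b2 = k1 * lam_hat * e2 - gamma * (k1 * e2 - m)
b = np.concatenate([b1, b2])

M27 = np.block([[k2 * np.eye(k1), np.outer(e1, e2)],
                [np.outer(e2, e1), k1 * np.eye(k2)]])          # system (2.7)
M28 = np.block([[k2 * np.eye(k1) + np.outer(e1, e1), np.zeros((k1, k2))],
                [np.zeros((k2, k1)), k1 * np.eye(k2) + np.outer(e2, e2)]])
yz = np.linalg.solve(M28, b)                                   # solve (2.8)
assert np.allclose(M27 @ yz, b)                                # solves (2.7) too
```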
Applying the Sherman-Morrison-Woodbury formula to (2.8) shows that

$$y = \frac{1}{k_2}\left(I - \frac{ee^T}{k_1+k_2}\right)b_1 = \hat\lambda\,\frac{k_2}{k_1+k_2}\,e - \frac{\gamma}{k_2}\left(d_2 - \frac{d_2^Te}{k_1+k_2}\,e\right), \tag{2.9}$$

where d2 = k2e − n. Similarly,

$$z = \hat\lambda\,\frac{k_1}{k_1+k_2}\,e - \frac{\gamma}{k_1}\left(d_1 - \frac{d_1^Te}{k_1+k_2}\,e\right), \tag{2.10}$$

where d1 = k1e − m. The following lemma provides a lower bound on the entries of M, based on lower bounds on the entries of y and z.
Lemma 2.1 For all i ∈ U∗, j ∈ V∗,

$$M_{ij} \ge \hat\lambda - \gamma(\alpha_1+\alpha_2)\left(1 - \frac{1}{k_1+k_2}\right).$$
Proof: Recall that 0 ≤ [d2]_i ≤ α2k2 for all i ∈ U∗. It follows immediately that

$$\left|[d_2]_i - \frac{d_2^Te}{k_1+k_2}\right| \le \|d_2\|_\infty\left(1 - \frac{1}{k_1+k_2}\right) \le \alpha_2k_2\left(1 - \frac{1}{k_1+k_2}\right) \tag{2.11}$$

for all i ∈ U∗. Similarly,

$$\left|[d_1]_j - \frac{d_1^Te}{k_1+k_2}\right| \le \alpha_1k_1\left(1 - \frac{1}{k_1+k_2}\right) \tag{2.12}$$

for all j ∈ V∗. Substituting (2.11) and (2.12) into (2.9) and (2.10) completes the proof.
It follows immediately from Lemma 2.1 that all entries of M are nonnegative if λ = γ(α1 + α2) + 1/√(k1k2). We next establish that ‖F‖∞ < 1 for γ chosen as in Theorem 2.1 and this particular choice of λ and M. Recall that Fij = 0 except for all (i, j) corresponding to Cases (ψ4), (ψ5), and (ψ6). In (ψ4), we take Fij = −λ/γ. In this case,
$$|F_{ij}| = \frac{\lambda}{\gamma} = \frac{1}{\gamma\sqrt{k_1k_2}} + \alpha_1 + \alpha_2 = \frac{1-\alpha_1-\alpha_2-\max\{\beta_1,\beta_2\}}{2} + \alpha_1 + \alpha_2 < 1,$$
by our choice of γ and the assumption that α1 + α2 + max{β1, β2} < 1. On the other hand, if (i, j) is as in (ψ5),

$$|F_{ij}| = \frac{\lambda}{\gamma}\cdot\frac{k_2}{k_2-n_i},$$

which is strictly bounded above by 1 by the fact that n_i ≤ β2k2 and our choice of γ. Similarly, if (i, j) is as in (ψ6), then |Fij| < 1 since m_j ≤ β1k1. Therefore, ‖F‖∞ < 1 for this particular choice of λ, M, and γ.
It remains to show that our choice of W has spectral norm bounded above by 1 when the hypothesis of Theorem 2.1 is satisfied. The following lemma provides the necessary upper bound on ‖W‖.
Lemma 2.2 The matrix W constructed according to (ψ1)-(ψ6) satisfies

$$\|W\|^2 \le 2\gamma^2(\alpha_1+\alpha_2+1)s + \lambda^2\left(1 + 2\left(\frac{1}{1-\beta_1} + \frac{1}{1-\beta_2}\right)\right)r.$$
As an immediate consequence of Lemma 2.2, note that ‖W‖ < 1 for r and s as large as O(k1k2). That is, we have exact recovery of the hidden biclique after up to O(k1k2) edge additions or deletions.
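Putting the pieces together, here is a toy end-to-end run (ours; it reuses the solve_dksub_relaxation sketch given after Theorem 2.1, and the thresholds and γ are illustrative) that plants a biclique, applies a single adversarial deletion and addition, and reads off the recovered supports.

```python
# Toy end-to-end run reusing the solve_dksub_relaxation sketch above. Ours;
# gamma and the 0.5-fraction thresholds are illustrative choices.
import numpy as np

N1, N2, k1, k2 = 30, 40, 10, 12
A = np.zeros((N1, N2), dtype=int)
A[:k1, :k2] = 1                      # planted biclique on U* x V*
A[0, 0] = 0                          # one adversarial deletion (s = 1)
A[k1 + 1, k2 + 2] = 1                # one adversarial addition (r = 1)

gamma = 2.0 / np.sqrt(k1 * k2)       # Theorem 2.1 with small alpha_i, beta_i
X, _ = solve_dksub_relaxation(A, k1, k2, gamma)
u_hat = np.flatnonzero(X.sum(axis=1) > k2 / 2)
v_hat = np.flatnonzero(X.sum(axis=0) > k1 / 2)
print(u_hat, v_hat)                  # ideally 0..k1-1 and 0..k2-1
```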
The remainder of this section comprises a proof of Lemma 2.2. We decompose W as W = Q + R, where Q is such that Q(U∗, V∗) = W(U∗, V∗) and all remaining entries of Q are zero. We will obtain bounds on ‖Q‖_F and ‖R‖_F individually and bound ‖W‖ by

$$\|W\|^2 \le \|W\|_F^2 = \|Q\|_F^2 + \|R\|_F^2.$$
The fact that

$$\|R\|_F^2 \le \lambda^2\left(1 + 2\left(\frac{1}{1-\beta_1} + \frac{1}{1-\beta_2}\right)\right)r$$

follows from an argument identical to that in [2, Section 5.1]. It remains to derive the necessary upper bound on ‖Q‖_F.
Let Q̃ = Q(U∗, V∗). By our choice of M, we have

$$\tilde Q = \gamma\left(-H + \frac{d_2e^T}{k_2} + \frac{ed_1^T}{k_1} - \left(\frac{d_1^Te}{k_1} + \frac{d_2^Te}{k_2}\right)\frac{ee^T}{k_1+k_2}\right),$$

where H is the adjacency matrix of the complement of G(U∗, V∗). By our definition of H, d1, and d2, we have

$$\|H\|_F^2 = d_1^Te = d_2^Te \le s.$$
We assume that s = d1^Te = d2^Te for simplicity in the following calculations. Then Hölder's inequality and the assumption that ‖d1‖₁ = ‖d2‖₁ = s imply that

$$\left\|\frac{d_2e^T}{k_2} + \frac{ed_1^T}{k_1} - \frac{s}{k_1k_2}\,ee^T\right\|_F^2 = \frac{\|d_1\|^2}{k_1} + \frac{\|d_2\|^2}{k_2} - \frac{s^2}{k_1k_2} \le \frac{s\|d_1\|_\infty}{k_1}\left(1-\frac{1}{k_2}\right) + \frac{s\|d_2\|_\infty}{k_2}\left(1-\frac{1}{k_1}\right) \le s\alpha_1\left(1-\frac{1}{k_2}\right) + s\alpha_2\left(1-\frac{1}{k_1}\right) \le s(\alpha_1+\alpha_2),$$

where the second to last inequality follows from the fact that ‖d1‖∞ ≤ α1k1 and ‖d2‖∞ ≤ α2k2.
Putting everything together yields

$$\|Q\|^2 = \|\tilde Q\|^2 \le \|\tilde Q\|_F^2 \le 2\gamma^2\left(\|H\|_F^2 + \left\|\tilde Q/\gamma + H\right\|_F^2\right) \le 2\gamma^2(1+\alpha_1+\alpha_2)s,$$

as required.
2.2 The Random Case
We conclude by proving an analogous result to Theorem 2.1 for random graphs constructed
as follows:
(Ψ1 ) For some k1 -subset U ∗ ⊆ U and k2 -subset V ∗ ⊆ V , we add each potential edge from
U ∗ to V ∗ independently with probability 1 − q.
(Ψ2 ) Then each remaining possible edge is added independently to E with probability p.
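A generator for this random model is immediate; the following sketch is ours (assuming NumPy), and overwriting the planted block after drawing the background edges is equivalent to the two-step construction (Ψ1)-(Ψ2).

```python
# Generator for the random model (Psi1)-(Psi2). Ours, assuming NumPy;
# overwriting the planted block is equivalent to the two-step construction.
import numpy as np

def planted_bipartite(N1, N2, k1, k2, p, q, seed=0):
    rng = np.random.default_rng(seed)
    A = (rng.random((N1, N2)) < p).astype(int)      # background edges, prob. p
    A[:k1, :k2] = rng.random((k1, k2)) < 1 - q      # planted block, prob. 1 - q
    return A

A = planted_bipartite(N1=300, N2=400, k1=40, k2=50, p=0.2, q=0.1)
print(A[:40, :50].mean(), A[40:, 50:].mean())       # roughly 0.9 vs 0.2
```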
By construction, G(U ∗ , V ∗ ) is significantly more dense than all other (k1 , k2 )-subgraphs in
expectation in the case that p + q < 1. So if k1 and k2 are sufficiently large, we should expect
to recover G(U ∗ , V ∗ ) from the optimal solution of (2.1). Indeed, we have the following
theorem.
Theorem 2.3 Suppose that the (N1, N2)-node bipartite graph G = (U, V, E) is constructed according to (Ψ1) and (Ψ2) such that p + q < 1 and

$$(1-p)k_i \ge \max\{8, 64p\}\log k_i \tag{2.13}$$

$$pN_i \ge (1-p)^2\log^2 N_i \tag{2.14}$$

$$(1-p-q)k_i \ge 72\max\left\{\log k_i,\ \left(q(1-q)k_i\log k_i\right)^{1/2}\right\} \tag{2.15}$$

for i = 1, 2. Then there exist absolute constants c1, c2, c3 > 0 such that if

$$c_1(1-p-q)\sqrt{k_1k_2} \ge N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)\min\{k_1,k_2\}\right)^{-1/2}\right\}, \tag{2.16}$$

where N_max = max{N1, N2}, then G(U∗, V∗) is the densest (k1, k2)-subgraph of G and (X∗, Y∗) is the unique optimal solution of (2.1) with high probability for

$$\gamma\in\left[\frac{c_2}{(1-p-q)\sqrt{k_1k_2}},\ \frac{c_3}{(1-p-q)\sqrt{k_1k_2}}\right].$$
As before, we establish optimality of (X ∗ , Y ∗ ) for (2.1) by proving that a particular
proposed choice of multipliers satisfies the conditions for uniqueness and optimality given by
Theorem 2.2. We choose W and F as in (ψ1 )-(ψ6 ) with the following exception:
(ψ4′) If i ∉ U∗, j ∉ V∗ such that (i, j) ∉ E, we choose Wij = −λp/(1 − p) and Fij = −λ/(γ(1 − p)).
We next establish that there exists a choice of λ, M, and γ corresponding to this choice of W and F satisfying the hypothesis of Theorem 2.2.
We begin by showing that there exist λ ∈ R and M ∈ R_+^{U×V} such that Wv = 0 and W^Tu = 0. As before, [Wv]_i = 0 and [W^Tu]_j = 0 for all i ∉ U∗, j ∉ V∗ by our choice of Wij in (ψ3), (ψ5), and (ψ6), and we parametrize M as M = ye^T + ez^T, where y, z are given by (2.9) and (2.10). By this choice of y, z, we have [Wv]_i = 0 and [W^Tu]_j = 0 for all i ∈ U∗ and j ∈ V∗.
We next show that the entries of y and z are nonnegative with high probability. Note that

$$\mathbb{E}[y] = \left(\hat\lambda - q\gamma\right)\frac{k_2}{k_1+k_2}\,e \quad\text{and}\quad \mathbb{E}[z] = \left(\hat\lambda - q\gamma\right)\frac{k_1}{k_1+k_2}\,e$$

since E[d1] = qk1e and E[d2] = qk2e. It follows that E[Mij] = λ̂ − γq > 0 under the assumption that λ̂ > γq. Thus, M is nonnegative in expectation. The following lemma shows that the entries of y and z are concentrated near their expected values and, consequently, M is nonnegative with high probability. Here, and in the rest of this note, an event occurs with high probability (w.h.p.) if the event occurs with probability tending polynomially to 1 as min{k1, k2} tends to infinity.
Lemma 2.3 Suppose that q(1 − q)k1 ≥ log k1, q(1 − q)k2 ≥ log k2, and q(1 − q)k1k2 ≥ log(k1 + k2). Then

$$\|y - \mathbb{E}[y]\|_\infty \le \frac{12\gamma}{k_2}\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\}, \quad\text{and}$$

$$\|z - \mathbb{E}[z]\|_\infty \le \frac{12\gamma}{k_1}\max\left\{\log k_1,\ \left(q(1-q)k_1\log k_1\right)^{1/2}\right\}$$

with high probability.
Proof: We prove the bound on ‖y − E[y]‖∞; an identical argument yields the bound on ‖z − E[z]‖∞. Recall that

$$y - \mathbb{E}[y] = \frac{\gamma}{k_2}\left(\frac{1}{k_1+k_2}\left(d_2^Te - qk_1k_2\right)e - \left(d_2 - qk_2e\right)\right). \tag{2.17}$$

The entries of d2 are binomially distributed random variables, each corresponding to k2 independent Bernoulli trials with probability of success q. By the standard Bernstein inequality (see [1, Lemma 1]), we have

$$\left|[d_2]_i - qk_2\right| \le 6\max\left\{\left(q(1-q)k_2\log k_2\right)^{1/2},\ \log k_2\right\} \tag{2.18}$$

for all i ∈ U∗ with high probability. Similarly, d2^Te is a binomially distributed random variable corresponding to k1k2 independent Bernoulli trials. Applying the Bernstein inequality again establishes that

$$\left|d_2^Te - qk_1k_2\right| \le 6\max\left\{\left(q(1-q)k_1k_2\log(k_1k_2)\right)^{1/2},\ \log(k_1k_2)\right\} \tag{2.19}$$

with high probability. Substituting (2.18) and (2.19) into (2.17) yields the desired bound on ‖y − E[y]‖∞.
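The Bernstein-type bound (2.18) is easy to observe empirically. The following Monte Carlo snippet (ours, assuming NumPy; the values of k1, k2, q are arbitrary) draws the binomial degrees [d2]_i and compares their worst deviation to the stated threshold.

```python
# Monte Carlo look at the Bernstein-type bound (2.18). Ours, assuming NumPy;
# k1, k2, q are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(3)
k1, k2, q = 200, 500, 0.3
d2 = rng.binomial(k2, q, size=k1)                   # entries of d_2
dev = 6 * max(np.sqrt(q * (1 - q) * k2 * np.log(k2)), np.log(k2))
print(np.abs(d2 - q * k2).max(), "<=", dev)         # holds w.h.p.
```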
Lemma 2.3 implies that the entries of M are nonnegative with high probability for sufficiently large values of k1, k2. Indeed, taking λ = γ((1 − p − q)/3 + q) + 1/√(k1k2) yields

$$M_{ij} \ge \frac{\gamma}{3}\left(1-p-q - \frac{72\max\left\{\log k_{\min},\ \left(q(1-q)k_{\min}\log k_{\min}\right)^{1/2}\right\}}{k_{\min}}\right)$$

for all i ∈ U∗, j ∈ V∗ with high probability by Lemma 2.3, where k_min = min{k1, k2}. Therefore, all entries of M are nonnegative provided p, q, k1, and k2 satisfy (2.15), by the fact that log x/x is a decreasing function on the interval x ∈ [3, ∞).
We next verify that our choices of λ, M, and γ yield F such that ‖F‖∞ < 1. As before, Fij = 0 for all i, j except those corresponding to Cases (ψ4′), (ψ5), and (ψ6). In (ψ4′), we have

$$|F_{ij}|(1-p) = \frac{\lambda}{\gamma} \le \frac{2(1-p-q)}{3} + q < 1-p$$

and, consequently, |Fij| < 1 for γ ≥ 3((1 − p − q)√(k1k2))^{−1} and λ = γ((1 − p − q)/3 + q) + 1/√(k1k2). On the other hand, when (i, j) corresponds to (ψ5) we have

$$|F_{ij}| = \frac{\lambda}{\gamma}\cdot\frac{k_2}{k_2-n_i}.$$

Therefore, |Fij| < 1 in this case if and only if

$$\frac{\lambda}{\gamma} + \frac{n_i}{k_2} < 1.$$

By the Bernstein inequality, n_i ≤ pk2 + 6max{(p(1 − p)k2 log k2)^{1/2}, log k2} with high probability, which ensures that |Fij| < 1 with high probability. By a similar argument, we have |Fij| < 1 in Case (ψ6) with high probability for sufficiently large k1.
It remains to establish that ‖W‖ < 1 with high probability for this choice of multipliers. The following lemma provides the necessary bound on ‖W‖.
Lemma 2.4 Suppose that p, q, k1, k2, N1, and N2 satisfy (2.13) and (2.14). Let k_max = max{k1, k2}, k_min = min{k1, k2}, and N_max = max{N1, N2}. Then

$$\|W\| \le 24\gamma\max\left\{\left(q(1-q)k_{\max}\log k_{\max}\right)^{1/2},\ \left(\frac{k_{\max}}{k_{\min}}\right)^{1/2}\log k_{\max}\right\} + \lambda\left(\frac{36}{1-p}\right)^{1/2}N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)k_{\min}\right)^{-1/2}\right\}$$

with high probability.
As an immediate consequence of Lemma 2.4, note that there exist absolute constants c1, c3 such that ‖W‖ < 1 if

$$c_1(1-p-q)\sqrt{k_1k_2} \ge N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)\min\{k_1,k_2\}\right)^{-1/2}\right\}$$

and we take 3((1 − p − q)√(k1k2))^{−1} ≤ γ ≤ c3((1 − p − q)√(k1k2))^{−1}.
The remainder of this section consists of a proof of Lemma 2.4. We decompose W as W = Q + R as before and bound ‖Q‖ and ‖R‖ separately. To do so, we will repeatedly apply the following bound on the largest singular value of a random matrix.
Lemma 2.5 Let A ∈ R^{m×n} be a random matrix with i.i.d. entries having mean zero, variance σ², and magnitude bounded above by B. Let n̄ = max{m, n}. Then

$$\|A\| \le 6\max\left\{\sigma\left(\bar n\log\bar n\right)^{1/2},\ B\log\bar n\right\}$$

with probability at least 1 − 2n̄^{−8}.
Lemma 2.5 follows immediately from applying the Noncommutative Bernstein Inequality [6, Theorem 1.6] to the sequence of matrices {Z_ij : i = 1, ..., m, j = 1, ..., n} defined by Z_ij = A_ij e_i e_j^T, where e_i, e_j are the ith and jth standard basis vectors in R^m and R^n, respectively; the details of the proof are left to the reader.
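Lemma 2.5 can likewise be checked empirically; the sketch below (ours, assuming NumPy) draws a centered Bernoulli matrix with σ² = p(1 − p) and B = 1 and compares its spectral norm to the stated bound.

```python
# Empirical check of Lemma 2.5 for a centered Bernoulli matrix with
# sigma^2 = p(1 - p) and B = 1. Ours, assuming NumPy.
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 300, 500, 0.2
A = (rng.random((m, n)) < p) - p                    # i.i.d., mean 0, |A_ij| <= 1
sigma, B, nbar = np.sqrt(p * (1 - p)), 1.0, max(m, n)
bound = 6 * max(sigma * np.sqrt(nbar * np.log(nbar)), B * np.log(nbar))
print(np.linalg.norm(A, 2), "<=", bound)            # holds w.h.p.
```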
We are now ready to bound ‖Q‖ and ‖R‖. We begin with ‖Q‖. Without loss of generality, we assume that k1 ≤ k2; under this assumption, k_min = k1 and k_max = k2. Let Q̃ := Q(U∗, V∗). We decompose Q̃ as Q̃ = γ(Q1 + Q2 + Q3 + Q4), where

$$Q_1 = qee^T - H,\qquad Q_2 = \frac{d_2e^T}{k_2} - qee^T,\qquad Q_3 = \frac{ed_1^T}{k_1} - qee^T,$$

$$Q_4 = -\left(\left(\frac{d_1^Te}{k_1} - qk_1\right) + \left(\frac{d_2^Te}{k_2} - qk_2\right)\right)\frac{ee^T}{k_1+k_2}.$$
The entries of Q1 are i.i.d. mean zero random variables with variance σ² = q(1 − q) and magnitude bounded above by 1. Applying Lemma 2.5 shows that

$$\|Q_1\| = \|H - qee^T\| \le 6\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\} \tag{2.20}$$

with high probability. On the other hand, applying (2.19) and the triangle inequality shows that
$$\|Q_4\| \le \left(\frac{|d_1^Te - qk_1k_2|}{k_1} + \frac{|d_2^Te - qk_1k_2|}{k_2}\right)\frac{\sqrt{k_1k_2}}{k_1+k_2} \le 6\max\left\{\left(q(1-q)\log(k_1k_2)\right)^{1/2},\ \left(\frac{\log^2(k_1k_2)}{k_1k_2}\right)^{1/2}\right\} \tag{2.21}$$
with high probability. Similarly,

$$\|Q_2\| = \frac{1}{\sqrt{k_2}}\left\|d_2 - qk_2e\right\| = \frac{1}{\sqrt{k_2}}\left(\sum_{i\in U^*}\left([d_2]_i - qk_2\right)^2\right)^{1/2} \le 6\left(\frac{k_1}{k_2}\right)^{1/2}\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\} \tag{2.22}$$
with high probability by (2.18). By symmetry, we have

$$\|Q_3\| \le 6\left(\frac{k_2}{k_1}\right)^{1/2}\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\} \tag{2.23}$$

with high probability. Combining (2.20), (2.21), (2.22), and (2.23) shows that

$$\|Q\| \le 24\gamma\max\left\{\left(q(1-q)k_2\log k_2\right)^{1/2},\ \left(\frac{k_2}{k_1}\right)^{1/2}\log k_2\right\}$$

with high probability.
We complete the proof by deriving the required upper bound on ‖R‖. To do so, we decompose R as R = λ(R1 + R2 + R3 + R4) as in [1, Section 3.4]. We first approximate R by λR1, where R1 ∈ R^{N1×N2} is a random matrix with i.i.d. entries having mean 0, variance σ² = p/(1 − p), and magnitude bounded by B := max{1, p/(1 − p)}. Specifically, for all (i, j) ∈ (U × V) − (U∗ × V∗), we let [R1]_ij = 1 if ij ∈ E and [R1]_ij = −p/(1 − p) otherwise. For all i ∈ U∗, j ∈ V∗, [R1]_ij is a random variable independently sampled from the distribution

$$x = \begin{cases} 1, & \text{with probability } p,\\ -p/(1-p), & \text{with probability } 1-p.\end{cases}$$

Applying Lemma 2.5 to R1 shows that

$$\|R_1\| \le 6\max\left\{B\log^2(N_{\max}),\ \left(\frac{p}{1-p}\right)^{1/2}\left(N_{\max}\log N_{\max}\right)^{1/2}\right\} \tag{2.24}$$
with high probability.
The remaining terms in the decomposition R = λ(R1 + R2 + R3 + R4) are corrections for each of the following cases. First, R2 is the correction matrix for the (U∗, V∗) block of R. Recall that R(U∗, V∗) = 0 by construction. We choose R2(U∗, V∗) = −R1(U∗, V∗) and set all remaining entries of R2 equal to 0. Then, again by Lemma 2.5, we have

$$\|R_2\| \le 6\max\left\{B\log^2 k_2,\ \left(\frac{p}{1-p}\right)^{1/2}\left(k_2\log k_2\right)^{1/2}\right\} \tag{2.25}$$

with high probability.
To complete the proof, we take R3 and R4 to be the correction matrices corresponding to Cases (ψ5) and (ψ6). We take

$$[R_3]_{ij} = \begin{cases} \dfrac{p}{1-p} - \dfrac{n_i}{k_2-n_i}, & \text{if } i\in U-U^*,\ j\in V^*,\ ij\notin E,\\[4pt] 0, & \text{otherwise},\end{cases}$$
and similarly define R4 as the correction for the (U ∗ , V − V ∗ ) block of R. Note that
$$\|R_3\|^2 \le \|R_3\|_F^2 = \sum_{i\in U-U^*}\left(\frac{p}{1-p} - \frac{n_i}{k_2-n_i}\right)^2(k_2-n_i) = \sum_{i\in U-U^*}\frac{(pk_2-n_i)^2}{(1-p)^2(k_2-n_i)}.$$
Recall that the Bernstein inequality implies that |n_i − pk2| ≤ 6max{(p(1 − p)k2 log k2)^{1/2}, log k2} with high probability if p(1 − p)k2 ≥ log k2. In this case,

$$\|R_3\|^2 \le \frac{36N_1\max\{p(1-p)k_2\log k_2,\ \log^2 k_2\}}{(1-p)^2\left((1-p)k_2 - 6\max\{(p(1-p)k_2\log k_2)^{1/2},\ \log k_2\}\right)} \le \frac{144N_1\max\{p(1-p)k_2\log k_2,\ \log^2 k_2\}}{(1-p)^3k_2} \tag{2.26}$$

with high probability, where the last inequality holds under the assumption that (1 − p)k2 ≥ max{8, 64p} log k2. Similarly,

$$\|R_4\|^2 \le \frac{144N_2\max\{p(1-p)k_1\log k_1,\ \log^2 k_1\}}{(1-p)^3k_1} \tag{2.27}$$
with high probability if (1 − p)k1 ≥ max{8, 64p} log k1. Combining (2.24), (2.25), (2.26), and (2.27) shows that

$$\|R\| \le \lambda\left(\frac{36}{1-p}\right)^{1/2}N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)k_1\right)^{-1/2}\right\}$$

with high probability under the assumptions of Theorem 2.3; in particular, the assumption (pN_max)^{1/2} ≥ (1 − p) log(N_max) is used to bound the B log²(N_max) term in (2.24) in order to simplify the expression for the bound on ‖R‖. This completes the proof.
References

[1] B. Ames. Guaranteed recovery of planted cliques and dense subgraphs by convex relaxation. arXiv preprint arXiv:1305.4891, 2013.

[2] B. Ames and S. Vavasis. Nuclear norm minimization for the planted clique and biclique problems. Mathematical Programming, 129(1):1–21, 2011.

[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

[4] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.

[5] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 2005.

[6] J.A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, pages 1–46, 2011.