Guaranteed Recovery of Planted Cliques and Dense
Subgraphs by Convex Relaxation: Additional Proofs
Brendan P.W. Ames∗
May 16, 2015
1 Recovery of the Densest k-Subgraph in the Adversarial Case
We consider the densest k-subgraph problem: given a graph G = (V, E), identify a k-node
subgraph of G of maximum density. We are interested in establishing conditions for when
the optimal solution of the combinatorial formulation of this problem
$$\min_{X,Y\in\Sigma_V}\ \left\{\operatorname{rank}(X) + \gamma\|Y\|_0 \;:\; e^TXe = k^2,\ X_{ij}+Y_{ij}=0 \text{ if } ij\in\tilde E,\ X\in\{0,1\}^{V\times V}\right\} \tag{1.1}$$

can be recovered from the optimal solution of the convex relaxation

$$\min_{X,Y}\ \left\{\|X\|_* + \gamma\|Y\|_1 \;:\; e^TXe = k^2,\ X_{ij}+Y_{ij}=0 \text{ if } ij\in\tilde E,\ X\in[0,1]^{V\times V}\right\}. \tag{1.2}$$

Here Σ_V is the cone of semidefinite matrices with entries indexed by V, Ẽ = (V × V) − E − {vv : v ∈ V} is the complement of the edge set of G, e is the all-ones vector in R^V, ‖Y‖₀ and ‖Y‖₁ denote the ℓ₀ and ℓ₁ norms of the vectorization of Y ∈ R^{V×V}, and ‖X‖_* denotes the nuclear norm of X ∈ Σ_V. In both (1.1) and (1.2), γ is a regularization parameter to be chosen by the user and k is the size of the desired subgraph of G.
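The relaxation (1.2) is straightforward to prototype in an off-the-shelf convex modeling language. The following sketch is ours, not the paper's: it assumes CVXPY and NetworkX are available, and the function name solve_dks_relaxation is an illustrative choice.

```python
# A minimal sketch of the convex relaxation (1.2). Ours, assuming CVXPY and
# NetworkX are installed; solve_dks_relaxation is an illustrative name.
import cvxpy as cp
import networkx as nx
import numpy as np

def solve_dks_relaxation(G: nx.Graph, k: int, gamma: float):
    n = G.number_of_nodes()
    A = nx.to_numpy_array(G)
    # Indicator of E-tilde: pairs (i, j), i != j, that are not edges of G.
    non_edges = ((A == 0) & ~np.eye(n, dtype=bool)).astype(float)
    X = cp.Variable((n, n))
    Y = cp.Variable((n, n))
    constraints = [
        cp.sum(X) == k ** 2,                     # e^T X e = k^2
        cp.multiply(non_edges, X + Y) == 0,      # X_ij + Y_ij = 0 on E-tilde
        X >= 0, X <= 1,                          # X in [0, 1]^{V x V}
    ]
    objective = cp.Minimize(cp.normNuc(X) + gamma * cp.sum(cp.abs(Y)))
    cp.Problem(objective, constraints).solve()
    return X.value, Y.value
```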
This section provides a proof of the following theorem, which states that the densest k-subgraph can be recovered using (1.2) in the case that the input graph G consists of a single clique, corrupted by deterministic noise.
∗Department of Mathematics, Box 870350, Tuscaloosa, AL, 35487-0350, Tel.: +1-205-348-5155, [email protected]
Theorem 1.1 Let V* be a k-subset of nodes of the graph G = (V, E) and let v be its characteristic vector. Suppose that G contains at most r edges not in G(V*) and that G(V*) contains at least $\binom{k}{2} - s$ edges, such that each vertex in V* is adjacent to at least (1 − δ1)k nodes in V* and each vertex in V − V* is adjacent to at most δ2k nodes in V* for some δ1, δ2 ∈ (0, 1) satisfying 2δ1 + δ2 < 1. Let (X*, Y*) = (vv^T, −P_Ẽ(vv^T)), where P_Ẽ denotes the projection onto the set of matrices with support contained in Ẽ. Then there exist scalars c1, c2 > 0, depending only on δ1 and δ2, such that if s ≤ c1k² and r ≤ c2k², then G(V*) is the unique maximum density k-subgraph of G and (X*, Y*) is the unique optimal solution of (1.2) for γ = 2((1 − 2δ1 − δ2)k)^{−1}.
The proof of Theorem 1.1 is identical to that of [1, Theorem 1], with minor modifications
made to accommodate the deterministic nature of the noise in Theorem 1.1. Specifically,
we will establish Theorem 1.1 by showing that (X ∗ , Y ∗ ) = (vvT , −PẼ (vvT )) satisfy the
optimality conditions for (1.2) given by the Karush-Kuhn-Tucker theorem (see, for example,
[3, Section 5.5.3]) if the input graph G satisfies the hypothesis of Theorem 1.1.
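To make the objects in Theorem 1.1 concrete, here is a small illustration (ours; the specific corruption pattern and all names are assumptions for the example) that builds a corrupted planted clique and the proposed pair (X*, Y*), then checks feasibility for (1.2).

```python
# Toy construction of the planted solution (X*, Y*) = (v v^T, -P_Etilde(v v^T)).
# Ours, assuming NumPy; the corruption pattern below is an arbitrary example.
import numpy as np

n, k = 60, 20
A = np.zeros((n, n), dtype=int)
A[:k, :k] = 1 - np.eye(k, dtype=int)         # plant a clique on V* = {0, ..., k-1}

A[0, 1] = A[1, 0] = 0                        # adversarial deletion (one of <= s)
A[k, k + 1] = A[k + 1, k] = 1                # adversarial addition (one of <= r)

v = np.zeros(n)
v[:k] = 1                                    # characteristic vector of V*
X_star = np.outer(v, v)
Etilde = (A == 0) & ~np.eye(n, dtype=bool)   # complement edge set, no loops
Y_star = -np.where(Etilde, X_star, 0.0)      # Y* = -P_Etilde(X*)

assert X_star.sum() == k ** 2                          # e^T X e = k^2
assert np.all((X_star + Y_star)[Etilde] == 0)          # constraint on E-tilde
```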
1.1 Optimality Conditions
We begin by stating the following theorem, which provides the required sufficient conditions
for optimality of (X ∗ , Y ∗ ).
Theorem 1.2 ([1, Theorem 5]) Let V̄ be a subset of V of cardinality k of the graph G = (V, E), and let v̄ be the characteristic vector of V̄. Let X̄ = v̄v̄^T and Ȳ = −P_Ẽ(X̄). Suppose that there exist F, W ∈ R^{V×V}, λ ∈ R, and M ∈ R_+^{V×V} such that

$$\frac{\bar X}{k} + W - \lambda ee^T - \gamma(\bar Y + F) + M = 0 \tag{1.3}$$

$$W\bar v = W^T\bar v = 0, \qquad \|W\| \le 1 \tag{1.4}$$

$$P_{\tilde E\cap(\bar V\times\bar V)}(F) = 0, \qquad \|F\|_\infty \le 1 \tag{1.5}$$

$$F_{ij} = 0 \ \text{ for all } (i,j)\in E\cup\{vv : v\in V\} \tag{1.6}$$

$$M_{ij} = 0 \ \text{ for all } (i,j)\in (V\times V) - (\bar V\times\bar V). \tag{1.7}$$
Then (X̄, Ȳ ) is an optimal solution of (1.2) and the subgraph G(V̄ ) induced by V̄ is a
maximum density k-subgraph of G. Moreover, if kW k < 1 and kF k∞ < 1 then (X̄, Ȳ ) is the
unique optimal solution of (1.2) and G(V̄ ) is the unique maximum density k-subgraph of G.
We complete the proof by verifying that multipliers satisfying the hypothesis of Theorem 1.2 do indeed exist for the proposed solution (X ∗ , Y ∗ ). In particular, we consider W
and F chosen as follows:
(ω1 ) If (i, j) ∈ V ∗ × V ∗ such that ij ∈ E or i = j, then we take Wij = λ̃ − Mij , where
λ̃ := λ − 1/k, to ensure that the left-hand side of (1.3) is equal to 0.
(ω2) If ij ∈ Ω := V∗ × V∗ − (E ∪ {ii : i ∈ V∗}), then we choose Wij = λ̃ − γ − Mij.
(ω3 ) If (i, j) ∈ (V × V ) − (V ∗ × V ∗ ) such that ij ∈ E or i = j then the left-hand side of
(1.3) is equal to Wij − λ = 0. In this case, we choose Wij = λ.
(ω4 ) If i, j ∈ V − V ∗ such that ij ∈ Ẽ, we take Wij = 0 and Fij = −λ/γ to ensure that the
left-hand side of (1.3) is equal to 0.
(ω5) If i ∈ V∗ and j ∈ V − V∗ such that (i, j) ∈ Ẽ, we choose

$$W_{ij} = -\lambda\,\frac{n_j}{k-n_j}, \qquad F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k}{k-n_j},$$

where n_j is the number of neighbours of j in V∗.
(ω6 ) If i ∈ V − V ∗ , j ∈ V ∗ such that (i, j) ∈ Ẽ, we take Wij = Wji , Fij = Fji according to
(ω5 ).
Note that this construction is identical to that in [1, Section 4.1], except for Case (ω4 ).
It remains to show that there exist multipliers λ, M ∈ RV+×V and regularization parameter
γ such that W and F , as chosen above, satisfy W v = W T v = 0, kW k < 1, and kF k∞ < 1.
1.2 Choice of the Multipliers λ and M
The multiplier M is chosen so that W v = W T v = 0. By the choice of Wij in (ω3 ) and (ω5 ),
we have
$$[Wv]_i = n_i\lambda - (k-n_i)\,\lambda\,\frac{n_i}{k-n_i} = 0$$
for all i ∈ V − V ∗ . On the other hand, the requirement that
$$0 = \sum_{j\in V^*} W_{ij} = k\tilde\lambda - \gamma d_i - \sum_{j\in V^*} M_{ij}, \qquad 0 = \sum_{i\in V^*} W_{ij} = k\tilde\lambda - \gamma d_j - \sum_{i\in V^*} M_{ij}$$
for all i, j ∈ V∗, defines a system of 2k equations for the k² unknown entries of M, where d_i = k − 1 − n_i for all i ∈ V∗. To obtain a solution of this underdetermined system, we parametrize M as M = ye^T + ey^T, where y is the solution of the linear system

$$(kI + ee^T)\,y = \tilde\lambda ke - \gamma d.$$
Here d ∈ R^{V∗} is the vector with ith component equal to d_i. By the Sherman-Morrison-Woodbury formula [4, Equation (2.1.4)], we have

$$y = \frac{1}{2k}\left(k\tilde\lambda e - \gamma\left(2d - \frac{d^Te}{k}\,e\right)\right) = \frac{1}{2k}\left(k\tilde\lambda e - 2\gamma\left(d - \frac{s}{k}\,e\right)\right). \tag{1.8}$$

Note that

$$\left|d_i - \frac{s}{k}\right| \le \|d\|_\infty\left(1 - \frac{1}{k}\right) \le \delta_1(k-1)$$

for all i ∈ V∗, by the facts that s ≥ ‖d‖∞ and ‖d‖∞ ≤ δ1(k − 1). It follows that each element y_i of y satisfies 2ky_i ≥ kλ̃ − 2γδ1(k − 1). Taking λ = 2γδ1 + 1/k ensures that the entries of y and, hence, M are strictly positive.
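The Sherman-Morrison-Woodbury step above is easy to sanity-check numerically. The sketch below (ours, assuming NumPy; the parameter values are arbitrary) verifies that the closed form (1.8) agrees with a direct solve of (kI + ee^T)y = λ̃ke − γd when s = d^Te/2.

```python
# Sanity check of the Sherman-Morrison-Woodbury closed form (1.8).
# Ours, assuming NumPy; parameter values are arbitrary but respect the theory.
import numpy as np

rng = np.random.default_rng(1)
k, delta1, delta2 = 30, 0.1, 0.2
gamma = 2.0 / ((1 - 2 * delta1 - delta2) * k)
lam = 2 * gamma * delta1 + 1.0 / k
lam_tilde = lam - 1.0 / k

d = rng.integers(0, int(delta1 * (k - 1)) + 1, size=k).astype(float)
s = d.sum() / 2.0                            # so that d^T e = 2s
e = np.ones(k)

b = lam_tilde * k * e - gamma * d
y_direct = np.linalg.solve(k * np.eye(k) + np.outer(e, e), b)
y_smw = (k * lam_tilde * e - 2 * gamma * (d - (s / k) * e)) / (2 * k)
assert np.allclose(y_direct, y_smw)
```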
1.3 A Bound on ‖F‖∞
We next show that our choice of γ and λ ensures that ‖F‖∞ < 1. Note that F_ij = 0 for all (i, j) ∈ V × V except those corresponding to Cases (ω4), (ω5), and (ω6). In Case (ω4), we have

$$F_{ij} = -\frac{\lambda}{\gamma} = -\frac{1}{k\gamma} - 2\delta_1 = \frac{2\delta_1+\delta_2-1}{2} - 2\delta_1 > -1$$

by the choice of γ = 2((1 − 2δ1 − δ2)k)^{−1} and λ = 2γδ1 + 1/k, as well as the assumption that 2δ1 + δ2 < 1. On the other hand, we take

$$F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k}{k-n_j}$$

for all (i, j) corresponding to Case (ω5). In this case, we have |F_ij| < 1 because

$$\frac{\lambda}{\gamma}\cdot\frac{k}{k-n_j} \le \frac{1}{k\gamma} + 2\delta_1 + \frac{n_j}{k} \le \frac{1}{k\gamma} + 2\delta_1 + \delta_2 \le \frac{1+2\delta_1+\delta_2}{2} < 1,$$

by the fact that n_j ≤ δ2k and the assumption that 2δ1 + δ2 < 1. The case corresponding to (ω6) follows similarly by the symmetry of F.
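As a quick numerical spot check (ours, assuming NumPy; the particular values of k, δ1, δ2 are arbitrary), the following snippet confirms that the entries of F constructed in Cases (ω4) and (ω5) lie strictly inside (−1, 1) under the stated choices of γ and λ.

```python
# Spot check that the entries of F in Cases (w4) and (w5) are strictly
# inside (-1, 1). Ours, assuming NumPy; k, delta1, delta2 are arbitrary.
import numpy as np

k, delta1, delta2 = 100, 0.1, 0.2
gamma = 2.0 / ((1 - 2 * delta1 - delta2) * k)
lam = 2 * gamma * delta1 + 1.0 / k

F_w4 = -lam / gamma                          # Case (w4): F_ij = -lambda/gamma
nj = np.arange(0, int(delta2 * k) + 1)       # n_j <= delta2 * k
F_w5 = -(lam / gamma) * k / (k - nj)         # Case (w5)
assert abs(F_w4) < 1 and np.all(np.abs(F_w5) < 1)
```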
1.4 A Bound on ‖W‖
It remains to establish that our choice of W satisfies ‖W‖ < 1. The following lemma provides the necessary bound on ‖W‖.
Lemma 1.1 Suppose that W is constructed according to (ω1)-(ω6). Then

$$\|W\|^2 \le 4\gamma^2(1+2\delta_1)s + 2\lambda^2\left(1 + 2\,\frac{2-\delta_2}{1-\delta_2}\right)r.$$
Proof: We decompose W as W = Q + R, where Q(V∗, V∗) = W(V∗, V∗) and all remaining entries of Q are zero. It follows that ‖W‖² ≤ ‖W‖_F² = ‖Q‖_F² + ‖R‖_F². By an argument identical to that in [2, Section 4.1], we have

$$\|R\|_F^2 \le 2\lambda^2\left(1 + 2\,\frac{2-\delta_2}{1-\delta_2}\right)r.$$

It remains to derive the necessary upper bound on ‖Q‖_F².
Let Q̃ = Q(V∗, V∗). Note that

$$\tilde Q = \gamma\left(-H + \frac{1}{k}\left(de^T + ed^T\right) - \frac{d^Te}{k^2}\,ee^T\right)$$

by our choice of y, where H is the adjacency matrix of the complement of G(V∗). By the fact that both ‖H‖_F² and d^Te are equal to twice the number of pairs of nonadjacent nodes in V∗, we have ‖H‖_F² = d^Te ≤ 2s. On the other hand, we have
$$\left\|\frac{1}{k}\left(de^T + ed^T\right) - \frac{d^Te}{k^2}\,ee^T\right\|_F^2 = \frac{1}{k^2}\left\|de^T + ed^T\right\|_F^2 - \frac{4s}{k^3}\operatorname{Tr}\!\left(\left(de^T + ed^T\right)ee^T\right) + \frac{4s^2}{k^4}\left\|ee^T\right\|_F^2 \le \frac{1}{k^2}\left(8s^2 + 2k\|d\|^2\right) - \frac{4s}{k^3}\cdot 4ks + \frac{4s^2}{k^4}\cdot k^2 = \frac{1}{k^2}\left(2k\|d\|^2 - 4s^2\right).$$
Applying Hölder's inequality [5, Equation (5.4.14)] and the inequalities s ≥ ‖d‖∞ and ‖d‖∞ ≤ δ1(k − 1) implies that

$$\left\|\frac{1}{k}\left(de^T + ed^T\right) - \frac{d^Te}{k^2}\,ee^T\right\|_F^2 \le \frac{2}{k^2}\left(k\|d\|_1 - 2s\right)\|d\|_\infty \le \frac{4s}{k^2}\,\delta_1(k-1)^2 \le 4\delta_1 s.$$
It follows immediately that ‖Q‖_F² = ‖Q̃‖_F² ≤ 2γ²(2s + 4δ1s) = 4γ²(1 + 2δ1)s. This completes the proof.
By our choice of λ and γ, Lemma 1.1 implies that we can have r and s as large as O(k²) and still have ‖W‖ < 1.
2 Recovery of the Densest (k1, k2)-Subgraph

2.1 The Adversarial Case
We next derive the guarantee for recovery of the maximum density (k1 , k2 )-subgraph under
adversarial noise from the optimal solution of the convex problem
$$\min_{X,Y}\ \left\{\|X\|_* + \gamma\|Y\|_1 \;:\; \sum_{i\in U}\sum_{j\in V} X_{ij} \ge k_1k_2,\ X_{ij}+Y_{ij}=0\ \forall\, ij\notin E,\ X\in[0,1]^{U\times V}\right\} \tag{2.1}$$
given by the following theorem.
Theorem 2.1 Let G = (U, V, E) be a bipartite graph and let U ∗ ⊆ U , V ∗ ⊆ V be subsets of
cardinality k1 and k2 respectively. Let u and v denote the characteristic vectors of U ∗ and
V ∗ . Let (X ∗ , Y ∗ ) = (uvT , −PẼ (uvT )). Suppose that G(U ∗ , V ∗ ) contains at least k1 k2 − s
edges and that G contains at most r edges other than those in G(U ∗ , V ∗ ). Suppose that
every node in V ∗ is adjacent to at least (1 − α1 )k1 nodes in U ∗ and every node in U ∗ is
adjacent to at least (1 − α2 )k2 nodes in V ∗ for some scalars α1 , α2 > 0. Further, suppose
that each node in V − V ∗ is adjacent to at most β1 k1 nodes in U ∗ and each node in U − U ∗
is adjacent to at most β2 k2 nodes in V ∗ for some β1 , β2 > 0. Finally suppose that the scalars
α1 , α2 , β1 , β2 satisfy α1 + α2 + max{β1 , β2 } < 1. Then there exist scalars c1 , c2 > 0, depending
only on α1 , α2 , β1 , and β2 , such that if r ≤ c1 k1 k2 and s ≤ c2 k1 k2 then G(U ∗ , V ∗ ) is the
unique maximum density (k1 , k2 )-subgraph of G and (X ∗ , Y ∗ ) is the unique optimal solution
of (2.1) for γ = 2(√(k1k2)(1 − α1 − α2 − max{β1, β2}))^{−1}.
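As with (1.2), the bipartite relaxation (2.1) can be prototyped directly; the sketch below is ours (assuming CVXPY and NumPy; solve_dksub_relaxation is an illustrative name), with γ to be set as in Theorem 2.1.

```python
# A minimal sketch of the bipartite relaxation (2.1). Ours, assuming CVXPY and
# NumPy; solve_dksub_relaxation is an illustrative name, gamma as in Theorem 2.1.
import cvxpy as cp
import numpy as np

def solve_dksub_relaxation(A: np.ndarray, k1: int, k2: int, gamma: float):
    # A is the biadjacency matrix of G = (U, V, E), of shape (|U|, |V|).
    non_edges = (A == 0).astype(float)           # pairs ij not in E
    X = cp.Variable(A.shape)
    Y = cp.Variable(A.shape)
    constraints = [
        cp.sum(X) >= k1 * k2,                    # sum_ij X_ij >= k1*k2
        cp.multiply(non_edges, X + Y) == 0,      # X_ij + Y_ij = 0 for ij not in E
        X >= 0, X <= 1,                          # X in [0, 1]^{U x V}
    ]
    objective = cp.Minimize(cp.normNuc(X) + gamma * cp.sum(cp.abs(Y)))
    cp.Problem(objective, constraints).solve()
    return X.value, Y.value
```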
As before, the proof of Theorem 2.1 relies on establishing that the proposed solution
(X ∗ , Y ∗ ) satisfies the sufficient conditions for optimality given by the Karush-Kuhn-Tucker
theorem. The following theorem provides the necessary specialization of these conditions for
(2.1).
Theorem 2.2 Let G = (U, V, E) be a bipartite graph. Let Ū ⊆ U, V̄ ⊆ V have cardinality k1 and k2, respectively, let ū, v̄ be their respective characteristic vectors, and let X̄ = ūv̄^T and Ȳ = −P_Ẽ(X̄). Suppose that there exist F, W ∈ R^{U×V}, λ ∈ R, and M ∈ R_+^{U×V} such that

$$\frac{\bar X}{\sqrt{k_1k_2}} + W - \lambda ee^T - \gamma(\bar Y + F) + M = 0 \tag{2.2}$$

$$W\bar v = 0, \qquad W^T\bar u = 0, \qquad \|W\| \le 1 \tag{2.3}$$

$$P_{\tilde E\cap(\bar U\times\bar V)}(F) = 0, \qquad \|F\|_\infty \le 1 \tag{2.4}$$

$$F_{ij} = 0 \ \text{ for all } (i,j)\in E \tag{2.5}$$

$$M_{ij} = 0 \ \text{ for all } (i,j)\in (U\times V) - (\bar U\times\bar V). \tag{2.6}$$
Then (X̄, Ȳ) is an optimal solution of (2.1) and the subgraph G(Ū, V̄) induced by (Ū, V̄) is a maximum density (k1, k2)-subgraph of G. Moreover, if ‖W‖ < 1 and ‖F‖∞ < 1, then (X̄, Ȳ) is the unique optimal solution of (2.1) and G(Ū, V̄) is the unique maximum density (k1, k2)-subgraph of G.
The proof of Theorem 2.2 is analogous to that of [1, Theorem 4.1] and is left to the
reader. As in the proof of Theorem 1.1, we establish Theorem 2.1 by showing that there
exist multipliers corresponding to our proposed solution (X ∗ , Y ∗ ) satisfying the hypothesis
of Theorem 2.2. In particular, we choose W and F according to the following cases.
(ψ1) If (i, j) ∈ (U∗ × V∗) ∩ E, then (2.2) implies that W must satisfy Wij = λ̂ − Mij, where λ̂ := λ − 1/√(k1k2).

(ψ2) If (i, j) ∈ U∗ × V∗ such that (i, j) ∉ E, we take Wij = λ̂ − γ − Mij.

(ψ3) If (i, j) ∈ (U × V) − (U∗ × V∗) such that (i, j) ∈ E, we let Wij = λ to ensure that the left-hand side of (2.2) is equal to zero.

(ψ4) If i ∉ U∗, j ∉ V∗, (i, j) ∉ E, we take Wij = 0. In this case, we must have Fij = −λ/γ to ensure that the left-hand side of (2.2) is zero.

(ψ5) Suppose that i ∉ U∗, j ∈ V∗ such that (i, j) ∉ E. We choose

$$W_{ij} = -\lambda\,\frac{n_i}{k_2-n_i}, \qquad F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k_2}{k_2-n_i},$$

in this case, where n_i is the number of neighbours of i in V∗.

(ψ6) Finally, if i ∈ U∗, j ∉ V∗, (i, j) ∉ E, then we take

$$W_{ij} = -\lambda\,\frac{m_j}{k_1-m_j}, \qquad F_{ij} = -\frac{\lambda}{\gamma}\cdot\frac{k_1}{k_1-m_j},$$

where m_j is the number of neighbours of j in U∗.
We next establish that there exist λ ∈ R and M ∈ R_+^{U×V} such that Wv = 0 and W^Tu = 0. If i ∉ U∗, then

$$[Wv]_i = \sum_{j\in V^*} W_{ij} = n_i\lambda - (k_2-n_i)\,\frac{n_i}{k_2-n_i}\,\lambda = 0.$$

Similarly, if j ∉ V∗, we have

$$[W^Tu]_j = \sum_{i\in U^*} W_{ij} = m_j\lambda - (k_1-m_j)\,\frac{m_j}{k_1-m_j}\,\lambda = 0.$$
The requirement that [Wv]_i = 0 and [W^Tu]_j = 0 for all i ∈ U∗, j ∈ V∗ defines an underdetermined system of k1 + k2 equations for the k1k2 unknown entries of M. To identify a solution of this system, we parametrize M as M = ye^T + ez^T for some y ∈ R^{k1} and z ∈ R^{k2}. The conditions W(U∗, V∗)e = 0 and W(U∗, V∗)^Te = 0 imply that y and z are solutions of the system

$$\begin{pmatrix} k_2 I & ee^T \\ ee^T & k_1 I\end{pmatrix}\begin{pmatrix} y \\ z\end{pmatrix} = \begin{pmatrix} b_1 \\ b_2\end{pmatrix}, \tag{2.7}$$

where b1 := k2λ̂e − (k2e − n)γ, b2 := k1λ̂e − (k1e − m)γ, and n ∈ R^{U∗}, m ∈ R^{V∗} are the vectors whose entries are equal to the degrees in G(U∗, V∗) of each node in U∗ and V∗, respectively. This system is singular with nullspace spanned by the vector (e; −e) = (e_{k1}; −e_{k2}), where e_{k1}, e_{k2} are the all-ones vectors in R^{k1} and R^{k2}, respectively. Since (e; −e) is in the nullspace of the coefficient matrix, the unique solution of the nonsingular system obtained by perturbing (2.7) by (e; −e)(e; −e)^T,

$$\begin{pmatrix} k_2I + ee^T & 0 \\ 0 & k_1I + ee^T\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix} = \begin{pmatrix} b_1\\b_2\end{pmatrix}, \tag{2.8}$$
is also a solution of (2.7). Indeed, taking the inner product of each side of (2.8) with (e; −e) yields (k1 + k2)(e^Ty − e^Tz) = b1^Te − b2^Te. However, b1^Te − b2^Te = γ(n^Te − m^Te) = 0 by the fact that both m^Te and n^Te equal |E(G(U∗, V∗))|. It follows that y^Te − z^Te = 0, and so the solution of (2.8) must satisfy

$$\begin{pmatrix} b_1\\ b_2\end{pmatrix} = \begin{pmatrix} k_2I + ee^T & 0\\ 0 & k_1I + ee^T\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix} = \begin{pmatrix} k_2I & ee^T\\ ee^T & k_1I\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix} + \left(y^Te - z^Te\right)\begin{pmatrix} e\\-e\end{pmatrix} = \begin{pmatrix} k_2I & ee^T\\ ee^T & k_1I\end{pmatrix}\begin{pmatrix} y\\z\end{pmatrix}.$$

Thus, the unique solution of (2.8) is a solution of (2.7).
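The argument above can be checked numerically. The following sketch (ours, assuming NumPy; the degree vectors are synthetic and chosen so that n^Te = m^Te) verifies that the solution of the block-diagonal system (2.8) also solves the singular system (2.7).

```python
# Check that the solution of the perturbed block-diagonal system (2.8) also
# solves the singular system (2.7). Ours, assuming NumPy; the degree vectors
# n and m are synthetic, chosen so that n^T e = m^T e.
import numpy as np

rng = np.random.default_rng(2)
k1, k2 = 8, 12
gamma, lam_hat = 0.05, 0.2
n = rng.integers(0, k2 + 1, size=k1).astype(float)
m = (n.sum() / k2) * np.ones(k2)                 # same total degree as n

e1, e2 = np.ones(k1), np.ones(k2)
b1 = k2 * lam_hat * e1 - gamma * (k2 * e1 - n)
b2 = k1 * lam_hat * e2 - gamma * (k1 * e2 - m)
b = np.concatenate([b1, b2])

M27 = np.block([[k2 * np.eye(k1), np.outer(e1, e2)],
                [np.outer(e2, e1), k1 * np.eye(k2)]])          # system (2.7)
M28 = np.block([[k2 * np.eye(k1) + np.outer(e1, e1), np.zeros((k1, k2))],
                [np.zeros((k2, k1)), k1 * np.eye(k2) + np.outer(e2, e2)]])
yz = np.linalg.solve(M28, b)                                   # solve (2.8)
assert np.allclose(M27 @ yz, b)                                # solves (2.7) too
```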
Applying the Sherman-Morrison-Woodbury formula to (2.8) shows that

$$y = \frac{1}{k_2}\left(I - \frac{ee^T}{k_1+k_2}\right)b_1 = \hat\lambda\,\frac{k_2}{k_1+k_2}\,e - \frac{\gamma}{k_2}\left(d_2 - \frac{d_2^Te}{k_1+k_2}\,e\right), \tag{2.9}$$

where d2 = k2e − n. Similarly,

$$z = \hat\lambda\,\frac{k_1}{k_1+k_2}\,e - \frac{\gamma}{k_1}\left(d_1 - \frac{d_1^Te}{k_1+k_2}\,e\right), \tag{2.10}$$

where d1 = k1e − m. The following lemma provides a lower bound on the entries of M, based on lower bounds on the entries of y and z.
Lemma 2.1 For all i ∈ U∗, j ∈ V∗,

$$M_{ij} \ge \hat\lambda - \gamma(\alpha_1+\alpha_2)\left(1 - \frac{1}{k_1+k_2}\right).$$
Proof: Recall that 0 ≤ [d2]_i ≤ α2k2 for all i ∈ U∗. It follows immediately that

$$\left|[d_2]_i - \frac{d_2^Te}{k_1+k_2}\right| \le \|d_2\|_\infty\left(1 - \frac{1}{k_1+k_2}\right) \le \alpha_2k_2\left(1 - \frac{1}{k_1+k_2}\right) \tag{2.11}$$

for all i ∈ U∗. Similarly,

$$\left|[d_1]_j - \frac{d_1^Te}{k_1+k_2}\right| \le \alpha_1k_1\left(1 - \frac{1}{k_1+k_2}\right) \tag{2.12}$$

for all j ∈ V∗. Substituting (2.11) and (2.12) into (2.9) and (2.10) completes the proof.
It follows immediately from Lemma 2.1 that all entries of M are nonnegative if λ = γ(α1 + α2) + 1/√(k1k2). We next establish that ‖F‖∞ < 1 for γ chosen as in Theorem 2.1 and this particular choice of λ and M. Recall that Fij = 0 except for all (i, j) corresponding to Cases (ψ4), (ψ5), and (ψ6). In (ψ4), we take Fij = −λ/γ. In this case,
$$|F_{ij}| = \frac{\lambda}{\gamma} = \frac{1}{\gamma\sqrt{k_1k_2}} + \alpha_1 + \alpha_2 = \frac{1-\alpha_1-\alpha_2-\max\{\beta_1,\beta_2\}}{2} + \alpha_1 + \alpha_2 < 1,$$
by our choice of γ and the assumption that α1 + α2 + max{β1, β2} < 1. On the other hand, if (i, j) is as in (ψ5),

$$|F_{ij}| = \frac{\lambda}{\gamma}\cdot\frac{k_2}{k_2-n_i},$$

which is strictly bounded above by 1 by the fact that n_i ≤ β2k2 and our choice of γ. Similarly, if (i, j) is as in (ψ6), then |Fij| < 1 since m_j ≤ β1k1. Therefore, ‖F‖∞ < 1 for this particular choice of λ, M, and γ.
It remains to show that our choice of W has spectral norm bounded above by 1 when the hypothesis of Theorem 2.1 is satisfied. The following lemma provides the necessary upper bound on ‖W‖.
Lemma 2.2 The matrix W constructed according to (ψ1)-(ψ6) satisfies

$$\|W\|^2 \le 2\gamma^2(\alpha_1+\alpha_2+1)s + \lambda^2\left(1 + 2\left(\frac{1}{1-\beta_1} + \frac{1}{1-\beta_2}\right)\right)r.$$
As an immediate consequence of Lemma 2.2, note that ‖W‖ < 1 for r and s as large as O(k1k2). That is, we have exact recovery of the hidden biclique after up to O(k1k2) edge additions or deletions.
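Putting the pieces together, here is a toy end-to-end run (ours; it reuses the solve_dksub_relaxation sketch given after Theorem 2.1, and the thresholds and γ are illustrative) that plants a biclique, applies a single adversarial deletion and addition, and reads off the recovered supports.

```python
# Toy end-to-end run reusing the solve_dksub_relaxation sketch above. Ours;
# gamma and the 0.5-fraction thresholds are illustrative choices.
import numpy as np

N1, N2, k1, k2 = 30, 40, 10, 12
A = np.zeros((N1, N2), dtype=int)
A[:k1, :k2] = 1                      # planted biclique on U* x V*
A[0, 0] = 0                          # one adversarial deletion (s = 1)
A[k1 + 1, k2 + 2] = 1                # one adversarial addition (r = 1)

gamma = 2.0 / np.sqrt(k1 * k2)       # Theorem 2.1 with small alpha_i, beta_i
X, _ = solve_dksub_relaxation(A, k1, k2, gamma)
u_hat = np.flatnonzero(X.sum(axis=1) > k2 / 2)
v_hat = np.flatnonzero(X.sum(axis=0) > k1 / 2)
print(u_hat, v_hat)                  # ideally 0..k1-1 and 0..k2-1
```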
The remainder of this section comprises a proof of Lemma 2.2. We decompose W as W = Q + R, where Q is such that Q(U∗, V∗) = W(U∗, V∗) and all remaining entries of Q are zero. We will obtain bounds on ‖Q‖_F and ‖R‖_F individually and bound ‖W‖ by

$$\|W\|^2 \le \|W\|_F^2 = \|Q\|_F^2 + \|R\|_F^2.$$
The fact that

$$\|R\|_F^2 \le \lambda^2\left(1 + 2\left(\frac{1}{1-\beta_1} + \frac{1}{1-\beta_2}\right)\right)r$$

follows from an argument identical to that in [2, Section 5.1]. It remains to derive the necessary upper bound on ‖Q‖_F.
Let Q̃ = Q(U∗, V∗). By our choice of M, we have

$$\tilde Q = \gamma\left(-H + \frac{d_2e^T}{k_2} + \frac{ed_1^T}{k_1} - \left(\frac{d_1^Te}{k_1} + \frac{d_2^Te}{k_2}\right)\frac{ee^T}{k_1+k_2}\right),$$

where H is the adjacency matrix of the complement of G(U∗, V∗). By our definition of H, d1, and d2, we have

$$\|H\|_F^2 = d_1^Te = d_2^Te \le s.$$
We assume that s = d1^Te = d2^Te for simplicity in the following calculations. Then Hölder's inequality and the assumption that ‖d1‖₁ = ‖d2‖₁ = s imply that

$$\left\|\frac{d_2e^T}{k_2} + \frac{ed_1^T}{k_1} - \frac{s}{k_1k_2}\,ee^T\right\|_F^2 = \frac{\|d_1\|^2}{k_1} + \frac{\|d_2\|^2}{k_2} - \frac{s^2}{k_1k_2} \le \frac{s\|d_1\|_\infty}{k_1}\left(1-\frac{1}{k_2}\right) + \frac{s\|d_2\|_\infty}{k_2}\left(1-\frac{1}{k_1}\right) \le s\alpha_1\left(1-\frac{1}{k_2}\right) + s\alpha_2\left(1-\frac{1}{k_1}\right) \le s(\alpha_1+\alpha_2),$$

where the second to last inequality follows from the fact that ‖d1‖∞ ≤ α1k1 and ‖d2‖∞ ≤ α2k2.
Putting everything together yields

$$\|Q\|^2 = \|\tilde Q\|^2 \le \|\tilde Q\|_F^2 \le 2\gamma^2\left(\|H\|_F^2 + \left\|\tilde Q/\gamma + H\right\|_F^2\right) \le 2\gamma^2(1+\alpha_1+\alpha_2)s,$$

as required.
2.2 The Random Case
We conclude by proving an analogous result to Theorem 2.1 for random graphs constructed
as follows:
(Ψ1 ) For some k1 -subset U ∗ ⊆ U and k2 -subset V ∗ ⊆ V , we add each potential edge from
U ∗ to V ∗ independently with probability 1 − q.
(Ψ2 ) Then each remaining possible edge is added independently to E with probability p.
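A generator for this random model is immediate; the following sketch is ours (assuming NumPy), and overwriting the planted block after drawing the background edges is equivalent to the two-step construction (Ψ1)-(Ψ2).

```python
# Generator for the random model (Psi1)-(Psi2). Ours, assuming NumPy;
# overwriting the planted block is equivalent to the two-step construction.
import numpy as np

def planted_bipartite(N1, N2, k1, k2, p, q, seed=0):
    rng = np.random.default_rng(seed)
    A = (rng.random((N1, N2)) < p).astype(int)      # background edges, prob. p
    A[:k1, :k2] = rng.random((k1, k2)) < 1 - q      # planted block, prob. 1 - q
    return A

A = planted_bipartite(N1=300, N2=400, k1=40, k2=50, p=0.2, q=0.1)
print(A[:40, :50].mean(), A[40:, 50:].mean())       # roughly 0.9 vs 0.2
```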
By construction, G(U ∗ , V ∗ ) is significantly more dense than all other (k1 , k2 )-subgraphs in
expectation in the case that p + q < 1. So if k1 and k2 are sufficiently large, we should expect
to recover G(U ∗ , V ∗ ) from the optimal solution of (2.1). Indeed, we have the following
theorem.
Theorem 2.3 Suppose that the (N1, N2)-node bipartite graph G = (U, V, E) is constructed according to (Ψ1) and (Ψ2) such that p + q < 1 and

$$(1-p)k_i \ge \max\{8, 64p\}\log k_i \tag{2.13}$$

$$pN_i \ge (1-p)^2\log^2 N_i \tag{2.14}$$

$$(1-p-q)k_i \ge 72\max\left\{\log k_i,\ \left(q(1-q)k_i\log k_i\right)^{1/2}\right\} \tag{2.15}$$

for i = 1, 2. Then there exist absolute constants c1, c2, c3 > 0 such that if

$$c_1(1-p-q)\sqrt{k_1k_2} \ge N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)\min\{k_1,k_2\}\right)^{-1/2}\right\}, \tag{2.16}$$

where N_max = max{N1, N2}, then G(U∗, V∗) is the densest (k1, k2)-subgraph of G and (X∗, Y∗) is the unique optimal solution of (2.1) with high probability for

$$\gamma\in\left[\frac{c_2}{(1-p-q)\sqrt{k_1k_2}},\ \frac{c_3}{(1-p-q)\sqrt{k_1k_2}}\right].$$
As before, we establish optimality of (X ∗ , Y ∗ ) for (2.1) by proving that a particular
proposed choice of multipliers satisfies the conditions for uniqueness and optimality given by
Theorem 2.2. We choose W and F as in (ψ1 )-(ψ6 ) with the following exception:
(ψ4′) If i ∉ U∗, j ∉ V∗ such that (i, j) ∉ E, we choose Wij = −λp/(1 − p) and Fij = −λ/(γ(1 − p)).
We next establish that there exists a choice of λ, M, and γ corresponding to this choice of W and F satisfying the hypothesis of Theorem 2.2.
We begin by showing that there exist λ ∈ R and M ∈ R_+^{U×V} such that Wv = 0 and W^Tu = 0. As before, [Wv]_i = 0 and [W^Tu]_j = 0 for all i ∉ U∗, j ∉ V∗ by our choice of Wij in (ψ3), (ψ5), and (ψ6), and we parametrize M as M = ye^T + ez^T, where y, z are given by (2.9) and (2.10). By this choice of y, z, we have [Wv]_i = 0 and [W^Tu]_j = 0 for all i ∈ U∗ and j ∈ V∗.
We next show that the entries of y and z are nonnegative with high probability. Note that

$$\mathbb{E}[y] = \left(\hat\lambda - q\gamma\right)\frac{k_2}{k_1+k_2}\,e \quad\text{and}\quad \mathbb{E}[z] = \left(\hat\lambda - q\gamma\right)\frac{k_1}{k_1+k_2}\,e$$

since E[d1] = qk1e and E[d2] = qk2e. It follows that E[Mij] = λ̂ − γq > 0 under the assumption that λ̂ > γq. Thus, M is nonnegative in expectation. The following lemma shows that the entries of y and z are concentrated near their expected values and, consequently, M is nonnegative with high probability. Here, and in the rest of this note, an event occurs with high probability (w.h.p.) if the event occurs with probability tending polynomially to 1 as min{k1, k2} tends to infinity.
Lemma 2.3 Suppose that q(1 − q)k1 ≥ log k1, q(1 − q)k2 ≥ log k2, and q(1 − q)k1k2 ≥ log(k1 + k2). Then

$$\|y - \mathbb{E}[y]\|_\infty \le \frac{12\gamma}{k_2}\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\}, \quad\text{and}$$

$$\|z - \mathbb{E}[z]\|_\infty \le \frac{12\gamma}{k_1}\max\left\{\log k_1,\ \left(q(1-q)k_1\log k_1\right)^{1/2}\right\}$$

with high probability.
Proof: We prove the bound on ‖y − E[y]‖∞; an identical argument yields the bound on ‖z − E[z]‖∞. Recall that

$$y - \mathbb{E}[y] = \frac{\gamma}{k_2}\left(\frac{1}{k_1+k_2}\left(d_2^Te - qk_1k_2\right)e - \left(d_2 - qk_2e\right)\right). \tag{2.17}$$

The entries of d2 are binomially distributed random variables, each corresponding to k2 independent Bernoulli trials with probability of success q. By the standard Bernstein inequality (see [1, Lemma 1]), we have

$$\left|[d_2]_i - qk_2\right| \le 6\max\left\{\left(q(1-q)k_2\log k_2\right)^{1/2},\ \log k_2\right\} \tag{2.18}$$

for all i ∈ U∗ with high probability. Similarly, d2^Te is a binomially distributed random variable corresponding to k1k2 independent Bernoulli trials. Applying the Bernstein inequality again establishes that

$$\left|d_2^Te - qk_1k_2\right| \le 6\max\left\{\left(q(1-q)k_1k_2\log(k_1k_2)\right)^{1/2},\ \log(k_1k_2)\right\} \tag{2.19}$$

with high probability. Substituting (2.18) and (2.19) into (2.17) yields the desired bound on ‖y − E[y]‖∞.
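The Bernstein-type bound (2.18) is easy to observe empirically. The following Monte Carlo snippet (ours, assuming NumPy; the values of k1, k2, q are arbitrary) draws the binomial degrees [d2]_i and compares their worst deviation to the stated threshold.

```python
# Monte Carlo look at the Bernstein-type bound (2.18). Ours, assuming NumPy;
# k1, k2, q are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(3)
k1, k2, q = 200, 500, 0.3
d2 = rng.binomial(k2, q, size=k1)                   # entries of d_2
dev = 6 * max(np.sqrt(q * (1 - q) * k2 * np.log(k2)), np.log(k2))
print(np.abs(d2 - q * k2).max(), "<=", dev)         # holds w.h.p.
```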
Lemma 2.3 implies that the entries of M are nonnegative with high probability for sufficiently large values of k1, k2. Indeed, taking λ = γ((1 − p − q)/3 + q) + 1/√(k1k2) yields

$$M_{ij} \ge \frac{\gamma}{3}\left(1-p-q - \frac{72\max\left\{\log k_{\min},\ \left(q(1-q)k_{\min}\log k_{\min}\right)^{1/2}\right\}}{k_{\min}}\right)$$

for all i ∈ U∗, j ∈ V∗ with high probability by Lemma 2.3, where k_min = min{k1, k2}. Therefore, all entries of M are nonnegative provided p, q, k1, and k2 satisfy (2.15), by the fact that log x/x is a decreasing function on the interval x ∈ [3, ∞).
We next verify that our choices of λ, M, and γ yield F such that ‖F‖∞ < 1. As before, Fij = 0 for all i, j except those corresponding to Cases (ψ4′), (ψ5), and (ψ6). In (ψ4′), we have

$$|F_{ij}|(1-p) = \frac{\lambda}{\gamma} \le \frac{2(1-p-q)}{3} + q < 1-p$$

and, consequently, |Fij| < 1 for γ ≥ 3((1 − p − q)√(k1k2))^{−1} and λ = γ((1 − p − q)/3 + q) + 1/√(k1k2). On the other hand, when (i, j) corresponds to (ψ5) we have

$$|F_{ij}| = \frac{\lambda}{\gamma}\cdot\frac{k_2}{k_2-n_i}.$$

Therefore, |Fij| < 1 in this case if and only if

$$\frac{\lambda}{\gamma} + \frac{n_i}{k_2} < 1.$$

By the Bernstein inequality, n_i ≤ pk2 + 6max{(p(1 − p)k2 log k2)^{1/2}, log k2} with high probability, which ensures that |Fij| < 1 with high probability. By a similar argument, we have |Fij| < 1 in Case (ψ6) with high probability for sufficiently large k1.
It remains to establish that ‖W‖ < 1 with high probability for this choice of multipliers. The following lemma provides the necessary bound on ‖W‖.
Lemma 2.4 Suppose that p, q, k1, k2, N1, and N2 satisfy (2.13) and (2.14). Let k_max = max{k1, k2}, k_min = min{k1, k2}, and N_max = max{N1, N2}. Then

$$\|W\| \le 24\gamma\max\left\{\left(q(1-q)k_{\max}\log k_{\max}\right)^{1/2},\ \left(\frac{k_{\max}}{k_{\min}}\right)^{1/2}\log k_{\max}\right\} + \lambda\left(\frac{36}{1-p}\right)^{1/2}N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)k_{\min}\right)^{-1/2}\right\}$$

with high probability.
As an immediate consequence of Lemma 2.4, note that there exist absolute constants c1, c3 such that ‖W‖ < 1 if

$$c_1(1-p-q)\sqrt{k_1k_2} \ge N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)\min\{k_1,k_2\}\right)^{-1/2}\right\}$$

and we take 3((1 − p − q)√(k1k2))^{−1} ≤ γ ≤ c3((1 − p − q)√(k1k2))^{−1}.
The remainder of this section consists of a proof of Lemma 2.4. We decompose W as W = Q + R as before and bound ‖Q‖ and ‖R‖ separately. To do so, we will repeatedly apply the following bound on the largest singular value of a random matrix.
Lemma 2.5 Let A ∈ R^{m×n} be a random matrix with i.i.d. entries having mean zero, variance σ², and magnitude bounded above by B. Let n̄ = max{m, n}. Then

$$\|A\| \le 6\max\left\{\sigma\left(\bar n\log\bar n\right)^{1/2},\ B\log\bar n\right\}$$

with probability at least 1 − 2n̄^{−8}.
Lemma 2.5 follows immediately from applying the Noncommutative Bernstein Inequality [6, Theorem 1.6] to the sequence of matrices {Z_ij : i = 1, ..., m, j = 1, ..., n} defined by Z_ij = A_ij e_i e_j^T, where e_i, e_j are the ith and jth standard basis vectors in R^m and R^n, respectively; the details of the proof are left to the reader.
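Lemma 2.5 can likewise be checked empirically; the sketch below (ours, assuming NumPy) draws a centered Bernoulli matrix with σ² = p(1 − p) and B = 1 and compares its spectral norm to the stated bound.

```python
# Empirical check of Lemma 2.5 for a centered Bernoulli matrix with
# sigma^2 = p(1 - p) and B = 1. Ours, assuming NumPy.
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 300, 500, 0.2
A = (rng.random((m, n)) < p) - p                    # i.i.d., mean 0, |A_ij| <= 1
sigma, B, nbar = np.sqrt(p * (1 - p)), 1.0, max(m, n)
bound = 6 * max(sigma * np.sqrt(nbar * np.log(nbar)), B * np.log(nbar))
print(np.linalg.norm(A, 2), "<=", bound)            # holds w.h.p.
```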
We are now ready to bound ‖Q‖ and ‖R‖. We begin with ‖Q‖. Without loss of generality, we assume that k1 ≤ k2; under this assumption, k_min = k1 and k_max = k2. Let Q̃ := Q(U∗, V∗). We decompose Q̃ as Q̃ = γ(Q1 + Q2 + Q3 + Q4), where

$$Q_1 = qee^T - H,\qquad Q_2 = \frac{d_2e^T}{k_2} - qee^T,\qquad Q_3 = \frac{ed_1^T}{k_1} - qee^T,$$

$$Q_4 = -\left(\left(\frac{d_1^Te}{k_1} - qk_1\right) + \left(\frac{d_2^Te}{k_2} - qk_2\right)\right)\frac{ee^T}{k_1+k_2}.$$
The entries of Q1 are i.i.d. mean zero random variables with variance σ² = q(1 − q) and magnitude bounded above by 1. Applying Lemma 2.5 shows that

$$\|Q_1\| = \|H - qee^T\| \le 6\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\} \tag{2.20}$$

with high probability. On the other hand, applying (2.19) and the triangle inequality shows that
$$\|Q_4\| \le \left(\frac{|d_1^Te - qk_1k_2|}{k_1} + \frac{|d_2^Te - qk_1k_2|}{k_2}\right)\frac{\sqrt{k_1k_2}}{k_1+k_2} \le 6\max\left\{\left(q(1-q)\log(k_1k_2)\right)^{1/2},\ \left(\frac{\log^2(k_1k_2)}{k_1k_2}\right)^{1/2}\right\} \tag{2.21}$$
with high probability. Similarly,

$$\|Q_2\| = \frac{1}{\sqrt{k_2}}\left\|d_2 - qk_2e\right\| = \frac{1}{\sqrt{k_2}}\left(\sum_{i\in U^*}\left([d_2]_i - qk_2\right)^2\right)^{1/2} \le 6\left(\frac{k_1}{k_2}\right)^{1/2}\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\} \tag{2.22}$$
with high probability by (2.18). By symmetry, we have

$$\|Q_3\| \le 6\left(\frac{k_2}{k_1}\right)^{1/2}\max\left\{\log k_2,\ \left(q(1-q)k_2\log k_2\right)^{1/2}\right\} \tag{2.23}$$

with high probability. Combining (2.20), (2.21), (2.22), and (2.23) shows that

$$\|Q\| \le 24\gamma\max\left\{\left(q(1-q)k_2\log k_2\right)^{1/2},\ \left(\frac{k_2}{k_1}\right)^{1/2}\log k_2\right\}$$

with high probability.
We complete the proof by deriving the required upper bound on ‖R‖. To do so, we decompose R as R = λ(R1 + R2 + R3 + R4) as in [1, Section 3.4]. We first approximate R by λR1, where R1 ∈ R^{N1×N2} is a random matrix with i.i.d. entries having mean 0, variance σ² = p/(1 − p), and magnitude bounded by B := max{1, p/(1 − p)}. Specifically, for all (i, j) ∈ (U × V) − (U∗ × V∗), we let [R1]_ij = 1 if ij ∈ E and [R1]_ij = −p/(1 − p) otherwise. For all i ∈ U∗, j ∈ V∗, [R1]_ij is a random variable independently sampled from the distribution

$$x = \begin{cases} 1, & \text{with probability } p,\\ -p/(1-p), & \text{with probability } 1-p.\end{cases}$$

Applying Lemma 2.5 to R1 shows that

$$\|R_1\| \le 6\max\left\{B\log^2(N_{\max}),\ \left(\frac{p}{1-p}\right)^{1/2}\left(N_{\max}\log N_{\max}\right)^{1/2}\right\} \tag{2.24}$$
with high probability.
The remaining terms in the decomposition R = λ(R1 + R2 + R3 + R4) are corrections for each of the following cases. First, R2 is the correction matrix for the (U∗, V∗) block of R. Recall that R(U∗, V∗) = 0 by construction. We choose R2(U∗, V∗) = −R1(U∗, V∗) and set all remaining entries of R2 equal to 0. Then, again by Lemma 2.5, we have

$$\|R_2\| \le 6\max\left\{B\log^2 k_2,\ \left(\frac{p}{1-p}\right)^{1/2}\left(k_2\log k_2\right)^{1/2}\right\} \tag{2.25}$$

with high probability.
To complete the proof, we take R3 and R4 to be the correction matrices corresponding to Cases (ψ5) and (ψ6). We take

$$[R_3]_{ij} = \begin{cases} \dfrac{p}{1-p} - \dfrac{n_i}{k_2-n_i}, & \text{if } i\in U-U^*,\ j\in V^*,\ ij\notin E,\\[4pt] 0, & \text{otherwise},\end{cases}$$
and similarly define R4 as the correction for the (U ∗ , V − V ∗ ) block of R. Note that
$$\|R_3\|^2 \le \|R_3\|_F^2 = \sum_{i\in U-U^*}\left(\frac{p}{1-p} - \frac{n_i}{k_2-n_i}\right)^2(k_2-n_i) = \sum_{i\in U-U^*}\frac{(pk_2-n_i)^2}{(1-p)^2(k_2-n_i)}.$$
Recall that the Bernstein inequality implies that |n_i − pk2| ≤ 6max{(p(1 − p)k2 log k2)^{1/2}, log k2} with high probability if p(1 − p)k2 ≥ log k2. In this case,

$$\|R_3\|^2 \le \frac{36N_1\max\{p(1-p)k_2\log k_2,\ \log^2 k_2\}}{(1-p)^2\left((1-p)k_2 - 6\max\{(p(1-p)k_2\log k_2)^{1/2},\ \log k_2\}\right)} \le \frac{144N_1\max\{p(1-p)k_2\log k_2,\ \log^2 k_2\}}{(1-p)^3k_2} \tag{2.26}$$

with high probability, where the last inequality holds under the assumption that (1 − p)k2 ≥ max{8, 64p} log k2. Similarly,

$$\|R_4\|^2 \le \frac{144N_2\max\{p(1-p)k_1\log k_1,\ \log^2 k_1\}}{(1-p)^3k_1} \tag{2.27}$$
with high probability if (1 − p)k1 ≥ max{8, 64p} log k1. Combining (2.24), (2.25), (2.26), and (2.27) shows that

$$\|R\| \le \lambda\left(\frac{36}{1-p}\right)^{1/2}N_{\max}^{1/2}\log(N_{\max})\cdot\max\left\{p^{1/2},\ \left((1-p)k_1\right)^{-1/2}\right\}$$

with high probability under the assumptions of Theorem 2.3; in particular, the assumption (pN_max)^{1/2} ≥ (1 − p) log(N_max) is used to bound the B log²(N_max) term in (2.24) in order to simplify the expression for the bound on ‖R‖. This completes the proof.
References

[1] B. Ames. Guaranteed recovery of planted cliques and dense subgraphs by convex relaxation. arXiv preprint arXiv:1305.4891, 2013.

[2] B. Ames and S. Vavasis. Nuclear norm minimization for the planted clique and biclique problems. Mathematical Programming, 129(1):1–21, 2011.

[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

[4] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.

[5] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 2005.

[6] J.A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, pages 1–46, 2011.