ALTERNATING PROJECTIONS ON NON-TANGENTIAL MANIFOLDS

FREDRIK ANDERSSON AND MARCUS CARLSSON

Abstract. We consider sequences (B_k)_{k=0}^∞ of points obtained by projecting a given point B = B_0 back and forth between two manifolds M1 and M2, and give conditions guaranteeing that the sequence converges to a limit B_∞ ∈ M1 ∩ M2. Our motivation is the study of algorithms based on finding the limit of such sequences, which have proven useful in a number of areas. The intersection is typically a set with desirable properties, but for which there is no efficient method for finding the closest point B_opt in M1 ∩ M2. Under appropriate conditions, we prove not only that the sequence of alternating projections converges, but that the limit point is fairly close to B_opt, in a manner relative to the distance ‖B_0 − B_opt‖, thereby significantly improving earlier results in the field.

1. Introduction

Let K be a finite dimensional Hilbert space over R and let M be a manifold.¹ Given two manifolds M1, M2 ⊂ K satisfying M = M1 ∩ M2, denote the corresponding projection operators by π, π_1 and π_2. Given a point B ∈ K, suppose that π(B) is sought for and that π_1 and π_2 can be efficiently computed, whereas π cannot. A classical result by von Neumann [52] says that if M1 and M2 are affine linear manifolds, then the sequence of alternating projections

(1.1) B_0 = B, B_{k+1} = π_1(B_k) if k is even, B_{k+1} = π_2(B_k) if k is odd,

converges to π(B). Moreover, the convergence rate is linear, and determined by the angle between M1 and M2. This paper is concerned with extensions of this result to non-linear manifolds. Suppose for the moment that the limit point of a particular sequence (B_k)_{k=0}^∞ exists and denote it by B_∞. In contrast to the case where each Mj (j = 1, 2) is an affine linear manifold, B_∞ is usually different from π(B). However, given that the Mj behave nicely, we may expect B_∞ ≈ π(B), which is one of the central results of the paper. We begin with a brief review of related works.
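In code, the scheme (1.1) is only a few lines. The sketch below is our own illustration, not part of the paper; `proj1` and `proj2` stand for user-supplied closest-point maps π_1 and π_2, and the example instantiates them for von Neumann's linear setting:

```python
def alternating_projections(B, proj1, proj2, n_iter=100):
    """Iterate (1.1): project B back and forth between two sets."""
    for k in range(n_iter):
        B = proj1(B) if k % 2 == 0 else proj2(B)
    return B

# von Neumann's linear setting: two lines through the origin in R^2,
# so M1 ∩ M2 = {0} and the limit should be pi(B) = (0, 0).
def proj_line(dx, dy):
    """Orthogonal projector onto span{(dx, dy)}."""
    n2 = dx * dx + dy * dy
    return lambda p: ((p[0] * dx + p[1] * dy) * dx / n2,
                      (p[0] * dx + p[1] * dy) * dy / n2)

B_inf = alternating_projections((3.0, 4.0), proj_line(1.0, 0.0),
                                proj_line(1.0, 1.0), n_iter=200)
```

For affine linear manifolds von Neumann's theorem guarantees that the limit is π(B); the non-linear case is precisely what the rest of the paper addresses.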
For a brief history of the early developments of von Neumann's algorithm we refer to [29]. Alternating projection schemes for non-linear subsets have been used in a number of applications; cf. [13, 24, 33, 36, 37, 38, 39, 43]. For instance, K can be the set M_{m,n} of m × n-matrices and the sets Mj be subsets with a certain structure, e.g. matrices with a certain rank, self-adjoint matrices, Hankel or Toeplitz matrices etc. For a detailed overview of optimization methods on matrix manifolds, cf. [1]. Alternating projection schemes between several linear subsets were investigated in [3, 29]. Other applications concern e.g. the EM algorithm, see [8]. Much emphasis has been put towards the use of alternating projections for the case of convex sets M1 and M2, see for instance [5, 7, 12, 26]. It is worth pointing out that except for linear subspaces, there is no overlap between this theory and the one considered here. Dykstra's alternating projection algorithm [6, 19] utilizes an additional dual variable and a slightly different projection scheme that is more efficient than the classical von Neumann method in the convex case. For connections between Dykstra's alternating projection method and the alternating direction method of multipliers (ADMM), see [10].

2010 Mathematics Subject Classification. 41A65, 49Q99, 53B25.
Key words and phrases. Alternating projections, convergence, non-convexity, low-rank approximation, manifolds.

¹ If the manifold M is not convex, then there exist points with multiple closest points on the manifold. To define the projection onto M, one thus needs to involve point-to-set maps. We will not use this formalism, but rather write π(B) to denote an arbitrarily chosen closest point to B on M. This is done to simplify the presentation and because our results are stated in a local environment where the π's are shown to be well defined functions. See Proposition 2.3.
However, for non-convex sets the field remained rather undeveloped until the 90's. For example, in [13], Zangwill's Global Convergence Theorem [55] is used to motivate the convergence of an alternating projection scheme. Zangwill's theorem implies that if the sequence (B_k)_{k=1}^∞ is bounded and the distance to M1 ∩ M2 is strictly decreasing, then (B_k)_{k=1}^∞ has a subsequence converging to a point B_∞ ∈ M1 ∩ M2. This result is an easy consequence of the fact that any sequence in a compact set has a convergent subsequence. Thus, the use of Zangwill's theorem in this context does not provide any information about whether the limit point of the entire sequence exists, or if so, whether it is close to the optimal point π(B). To our knowledge, the first rigorous attempt at dealing with the method of alternating projections for non-convex sets was made in [14]. However, the setting is rather general and the results are mainly concerned with convergence of the sequence. In particular, when convergence does happen, the results in [14] do not reveal anything about the size of the error ‖π(B) − B_∞‖. Recently, A. Lewis and J. Malick presented stronger results [35], although valid under somewhat more restrictive conditions on M1 and M2. Before discussing their results in more detail, we give two simple examples that illustrate some of the difficulties that may arise when using alternating projections on non-linear manifolds.

Figure 1. An example demonstrating that the algorithm can get stuck in loops where not even a subsequence converges to a point in the intersection (the black dots). However, starting closer to the intersection point, we do get convergence (the white dots).

Example 1.1. Let K = R², M1 = {(t, (t + 1)(3 − t)/4) : t ∈ R} and M2 = R × {0}. It is easily seen that π_1((1, 0)) = (1, 1) and π_2((1, 1)) = (1, 0), and hence the sequence of alternating projections does not converge, cf. Figure 1.
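The two-point loop in Example 1.1 can be reproduced numerically. The grid search below is our own crude stand-in for the exact closest-point map onto the parabola, used for illustration only:

```python
def proj_M2(p):
    """Exact projection onto M2 = R x {0}."""
    return (p[0], 0.0)

def proj_M1(p, lo=-5.0, hi=7.0, n=120_001):
    """Approximate projection onto M1 = {(t, (t+1)(3-t)/4)} by grid search."""
    best, best_d2 = None, float("inf")
    for i in range(n):
        t = lo + (hi - lo) * i / (n - 1)
        q = (t, (t + 1.0) * (3.0 - t) / 4.0)
        d2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = q, d2
    return best

P = proj_M1((1.0, 0.0))  # approximately (1, 1), the top of the parabola
Q = proj_M2(P)           # back to (1, 0): the iteration is stuck in a 2-cycle
```

Starting instead from a point slightly to the right of (1, 0) breaks the symmetry and the iterates drift towards the intersection point (3, 0), in line with the discussion below.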
On the other hand, if (1 + ϵ, 0) ∈ M2, ϵ > 0, is used as a starting point, the sequence of alternating projections will converge to (3, 0) ∈ M1 ∩ M2. In general, it seems reasonable to assume that if the starting point is sufficiently close to an intersection point, then the sequence does converge to a point in the intersection. The next example shows that this is not always the case.

Figure 2. Alternating projections stuck in a loop.

Example 1.2. Without going into the details of the construction, we note that one can construct a C^∞-function f with f(0) = 0 and horizontal segments arbitrarily near 0. Figure 2 explains the idea. With M1 = R × {0} and M2 = {(t, f(t)) : t ∈ R}, the sequence of alternating projections can then get stuck projecting back and forth between the same two points, even if the starting point is arbitrarily close to the intersection point (0, 0). However, if f′(0) ≠ 0 it is hard to imagine how to make a similar construction work. This is indeed impossible, which follows both from the present paper and from [35]. In the terminology of the latter, the condition f′(0) ≠ 0 implies that M1 and M2 are transversal at (0, 0). In general, given C¹-manifolds M1, M2 and a point A ∈ M1 ∩ M2, we say that A is transversal if

(1.2) T_{M1}(A) + T_{M2}(A) = K,

where T_{Mj}(A) denotes the tangent space of Mj at A, j = 1, 2. The main result in [35] is roughly the following:

Theorem 1.3. Let M1 and M2 be C³-manifolds and let A ∈ M1 ∩ M2 be transversal. If B is close enough to A, then the sequence of alternating projections (B_k)_{k=0}^∞ given by (1.1) converges at a linear rate to a point B_∞ in M1 ∩ M2.

Figure 3. Lack of transversality due to tangential curves.

The error ‖B_∞ − π(B)‖ is not discussed in [35], but inspection of the proofs yields ‖B_∞ − π(B)‖ ≤ 2‖A − B‖.
The improvement over the previously mentioned results is thus that the entire sequence converges and that the limit B_∞ is not too far away from π(B) in relative terms, i.e. comparing with dist(B, M1 ∩ M2) = ‖B − π(B)‖, although it need not be particularly close either. Moreover, the assumption of transversality is rather restrictive. To demonstrate the essence of the transversality assumption we now present some examples.

Example 1.4. The manifolds in Example 1.2 are not transversal at A = (0, 0), since T_{M1}(A) = T_{M2}(A) = M1. With M1 = R × {0} and M2 = {(t, t²) : t ∈ R}, we obtain another example of manifolds that are not transversal at (0, 0), see Figure 3. In this case, the sequence of alternating projections does converge to (0, 0), independent of the starting point, albeit extremely slowly.

Example 1.5. With K = R³, M1 = R × {0}² and M2 = {(t, t, t²) : t ∈ R}, transversality is not satisfied at the origin A, but again it seems plausible that the sequence of alternating projections converges to A. See Figure 4.

Figure 4. Lack of transversality due to low dimensions.

In Example 1.5, the two manifolds clearly sit at a positive angle, but this situation is not covered by Theorem 1.3, since the manifolds are of too low dimension to satisfy the transversality assumption. In fact, if K has dimension n and Mj has dimension mj, j = 1, 2, the transversality condition (1.2) can never be satisfied if m1 + m2 < n, i.e. when the sum of the dimensions of the manifolds is less than that of K. This is, however, the case in many applications of practical interest. The papers [34] and [2] both consider alternating projections in a more general setting, but when dealing with manifolds, the assumptions also imply m1 + m2 ≥ n (see Theorem 5.16 and condition (5.17) in [34] and Section 4.5 in [2]). We now present the main results of the present paper.
We introduce the concept of a non-tangential intersection point, which imposes no restriction on the dimensions of M1 and M2. Loosely speaking, a point A in M1 ∩ M2 is non-tangential if the two manifolds have a positive angle in directions perpendicular to M1 ∩ M2. We will show that this happens if and only if

T_{M1}(A) ∩ T_{M2}(A) = T_{M1∩M2}(A),

which we can use as a definition for now. In microlocal analysis this is called clean intersection [28]. We remark that transversal points are always non-tangential, since these do satisfy the above identity by Definition 3 in [35]. The point (3, 0) in Example 1.1 and the origin in Example 1.5 are both non-tangential, whereas the origins in Examples 1.2 and 1.4 are not. We now present a simplified version of the main result (Theorem 5.1). Figure 5 illustrates the idea.

Theorem 1.6. Let M1, M2 and M1 ∩ M2 be C²-manifolds. Given a non-tangential point A ∈ M1 ∩ M2 there exists an s > 0 such that the sequence of alternating projections (1.1) converges to a point B_∞ ∈ M1 ∩ M2, given that ‖B − A‖ < s. Moreover, given any ϵ > 0 one can take s such that ‖B_∞ − π(B)‖ < ϵ‖B − π(B)‖.

Figure 5. Picture illustrating Theorem 1.6. B_∞ may differ from π(B), but the error is small compared to the distance d between B and the manifold M1 ∩ M2, given that B is in the ball B_K(A, s).

The improvement over Theorem 1.3 mainly consists of two items. Primarily, the assumption that the surfaces are non-tangential is not at all restrictive, and in particular there is no implication about the dimensions of M1 and M2. For the applications we are aware of, the set of tangential points is very small, if it exists at all. Secondly, as has been highlighted before, we are usually interested not just in any point of M1 ∩ M2, but in the closest point π(B). Here the theorem says that the error is small in relative terms, i.e. after dividing with the distance to M1 ∩ M2, and moreover improves if the distance from B to M1 ∩ M2 decreases.

We now discuss the rate of convergence. When M1 and M2 are linear, we have

(1.3) ‖B_k − B_∞‖ ≤ const · c^k,

where c = cos α and α is the angle between the subspaces M1 and M2, which was shown in [26]. This is usually referred to as linear convergence. In the present setting, we show that (1.3) holds if B is sufficiently near a point whose angle α satisfies c > cos(α). However, the definition of the angle between two manifolds needs some preparation, and is discussed in Section 3. For the precise results see Definition 3.1 and Theorem 5.1. The corresponding statement for transversal manifolds has been proven in [35], and similar results can also be found in [5, 16, 18, 22, 29, 54].

In many applications, one has real algebraic varieties as opposed to manifolds (of which the usual complex algebraic varieties are special cases). However, algebraic varieties are manifolds except at the singular set (singular locus). Moreover, the singular set is a variety of smaller dimension, and hence makes up a very small part of the original variety. Since Theorem 1.6 is a local statement, it is plausible that it applies at non-singular intersection points of two given varieties. In Section 6, we prove that this is indeed the case under mild assumptions. We also provide a number of concrete results for verifying these assumptions. Finally, the paper ends with a simple application where the alternating projection method is used for the problem of finding correlation matrices with prescribed rank from measurements with missing data.

The paper is organized as follows. Section 2 presents rudimentary results concerning manifolds, and Section 3 gives definitions and basic theory for determining angles between them. The technical part of the paper begins in Section 4, where the various projection operators are studied. With this at hand, the proof of the main theorem (given in Section 5) is fairly straightforward.
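The linear rate (1.3) is easy to observe numerically in the linear case. The sketch below is our own, with an arbitrarily chosen angle π/6; it checks that each projection contracts the distance to M1 ∩ M2 = {0} by the factor c = cos α:

```python
import math

def proj_span(ux, uy):
    """Orthogonal projector onto the line spanned by the unit vector (ux, uy)."""
    return lambda p: ((p[0] * ux + p[1] * uy) * ux,
                      (p[0] * ux + p[1] * uy) * uy)

alpha = math.pi / 6                        # angle between the two lines
p1 = proj_span(1.0, 0.0)                   # M1 = x-axis
p2 = proj_span(math.cos(alpha), math.sin(alpha))

B, dists = (1.0, 0.7), []
for k in range(12):
    B = p1(B) if k % 2 == 0 else p2(B)
    dists.append(math.hypot(*B))           # distance to the intersection {0}

# After the first step every projection shrinks the distance by cos(alpha),
# i.e. linear convergence with c = cos(alpha) as in (1.3).
ratios = [dists[k + 1] / dists[k] for k in range(6, 11)]
```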
Finally, Section 6 deals with real algebraic varieties and Section 7 presents the example. There are three appendices with proofs of the results in Sections 2, 6 and 7 respectively.

2. Manifolds

We provide a review of necessary concepts from differential geometry. Let K be a Euclidean space, i.e. a Hilbert space of finite dimension n ∈ N. Given A ∈ K and r > 0 we write B(A, r) or B_K(A, r) for the open ball centered at A with radius r. Since K is finite-dimensional it has a unique Euclidean topology. Any subset M of K will be given the induced topology from K. Let p ≥ 1 and let M ⊂ K be an m-dimensional C^p-manifold. We recall that around each A ∈ M there exists an injective C^p-immersion² φ on an open set U in R^m such that

(2.1) M ∩ B_K(A, s) = Im φ ∩ B_K(A, s)

for some s > 0, where Im φ denotes the image of φ. See e.g. Theorem 2.1.2 in [9]. If A = φ(x_A), we define the tangent space T_M(A) by T_M(A) = Ran dφ(x_A), where dφ denotes the Jacobian matrix. It is a standard fact from differential geometry that this definition is independent of φ (see e.g. Section 2.5 of [9]). Moreover, we set T̃_M(A) = A + T_M(A), i.e., T̃_M(A) is the affine linear manifold which is tangent to M at A. We now list some necessary properties of manifolds. Some statements are well-known, but for the convenience of the reader, we outline all proofs in Appendix A.

Proposition 2.1. Let α : K → M be any C^p-map, where M is a C^p-manifold and p ≥ 1. With φ as above, the map φ^{-1} ∘ α is also C^p (on its natural domain of definition).

² i.e. a p times continuously differentiable function such that dφ(x) is injective for all x ∈ B_{R^m}(0, r).

We write A ⊥ B to denote that A, B ∈ K are orthogonal. If X, Y ⊂ K are subsets such that ⟨A, B⟩ = 0 for all A ∈ X and B ∈ Y, we also write X ⊥ Y. If Y is the orthogonal complement of X, i.e. the maximal subspace with the above property, we shall denote it by Y = K ⊖ X. When X is a subspace, we use P_X : K → K for the orthogonal projection onto X, i.e. if A ∈ X and B ∈ K ⊖ X we set P_X(A + B) = A.

Proposition 2.2. Let M be a C¹-manifold. Then P_{T_M(A)} is a continuous function of A.

The next proposition shows that projection onto M locally is a well defined operation. Thus, as long as we start the alternating projections method sufficiently near the manifolds in question, there is no need to deal with point-to-set maps.

Proposition 2.3. Let M be a C²-manifold and let A ∈ M be given. Then there exists an s > 0 such that for all B ∈ B_K(A, s) there exists a unique closest point in M. Denoting this point by π(B), the map π : B_K(A, s) → M is C². Moreover, C ∈ M ∩ B_K(A, s) equals π(B) if and only if B − C ⊥ T_M(C).

When there is risk of confusion we will write π_M instead of π. We remark that it is easy to show that π is C¹; the fact that it actually is C² is noted in [23] and further developed in [31]. It is however not true in general that π is C¹ if M is a C¹-manifold; in fact, it does not even have to be single valued. The last line in the above proposition is an extension of Fermat's principle; for similar statements in a more general environment we refer to [44]. We end with a proposition that basically says that the affine tangent spaces are close to M locally.

Proposition 2.4. Let M be a C²-manifold and A ∈ M be given. For each ϵ > 0 there exists s > 0 such that for all C ∈ B_K(A, s) ∩ M we have
(i) dist(D, T̃_M(C)) < ϵ‖D − C‖ for all D ∈ B(A, s) ∩ M,
(ii) dist(D, M) < ϵ‖D − C‖ for all D ∈ B(A, s) ∩ T̃_M(C).

3. Non-tangentiality

Suppose now that we are given C¹-manifolds M1 and M2 such that their intersection M1 ∩ M2 is itself a C¹-manifold. We will denote M1 ∩ M2 by M, and the objects from Section 2 associated to M1, M2 and M, e.g. the dimensions, by m1, m2 and m respectively. We thus omit the subindex when dealing with M1 ∩ M2. We now introduce the angle between M1 and M2 at an intersection point A ∈ M.
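Before turning to angles, we note that the orthogonality test in Proposition 2.3 (C = π(B) if and only if B − C ⊥ T_M(C)) is easy to verify on a concrete manifold. The unit circle below is a toy example of ours, not from the paper:

```python
import math

def proj_circle(p):
    """Closest point on the unit circle M = {x in R^2 : |x| = 1}; p != 0 assumed."""
    n = math.hypot(p[0], p[1])
    return (p[0] / n, p[1] / n)

B = (0.6, 1.2)
C = proj_circle(B)
# T_M(C) is spanned by the tangent vector t = (-C_y, C_x); Proposition 2.3
# predicts <B - C, t> = 0.
t = (-C[1], C[0])
residual = (B[0] - C[0]) * t[0] + (B[1] - C[1]) * t[1]
```

Here B − C is a multiple of the radius vector through C, so the inner product with the tangent direction vanishes up to rounding.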
This issue is a bit delicate. We first discuss the case when M1 and M2 are linear subspaces. A good reference for various definitions of angles in this setting is [17]. We will here use the so-called Friedrichs angle, which is common in functional analysis [3, 42] and also the one appearing in (1.3). The Friedrichs angle α between two subspaces is given by

(3.1) α = cos^{-1}(‖P_{M1} P_{M2} − P_{M1∩M2}‖).

To better understand this definition, set F_j = (M_j ⊖ (M1 ∩ M2)) \ {0} and note that

(3.2) α = cos^{-1}( sup_{B_j ∈ F_j} ⟨B_1, B_2⟩ / (‖B_1‖ ‖B_2‖) )

(assuming F_1 ≠ ∅ ≠ F_2, since otherwise it becomes pointless to talk of angles). Hence, α is the minimal angle in directions perpendicular to the common intersection M1 ∩ M2. To introduce angles between manifolds one obviously needs to let the angle be dependent on the point of intersection A ∈ M1 ∩ M2. Based on (3.1), one idea is to let the angle at A be the angle of T_{M1}(A) and T_{M2}(A), i.e.

(3.3) α(A) = cos^{-1}(‖P_{T_{M1}(A)} P_{T_{M2}(A)} − P_{T_{M1}(A)∩T_{M2}(A)}‖).

This is indeed the definition adopted in [35], see Section 3.2. However, consider again the two manifolds in Example 1.4 and Figure 3. The expression (3.3) then leads to the counterintuitive conclusion that α((0, 0)) = cos^{-1}(0) = π/2. In fact, it is easy to see from (3.2) that the angle between two subspaces can never be 0, whereas we argue that the angle should be 0 for the manifolds in Figure 3. For this reason, we adopt the following definition, generalizing (3.2).

Definition 3.1. Given A ∈ M1 ∩ M2, set

F_j^r = {B_j ∈ M_j \ {A} : ‖B_j − A‖ < r and B_j − A ⊥ T_{M1∩M2}(A)}.

If F_j^r ≠ ∅ for all r > 0 and j = 1, 2, we define the angle α(A) of M1 and M2 at A as α(A) = cos^{-1}(σ(A)), where

σ(A) = lim_{r→0} sup_{B_j ∈ F_j^r} ⟨B_1 − A, B_2 − A⟩ / (‖B_1 − A‖ ‖B_2 − A‖).

Otherwise, we let both σ(A) and α(A) be undefined.
This means that the angle, if it is defined, is minimized locally in directions perpendicular to T_{M1∩M2}(A) (as opposed to T_{M1}(A) ∩ T_{M2}(A)). With Definition 3.1, angles can be 0, and it is readily verified that the angle is zero in Example 1.4. The next proposition shows that the angle is undefined only at points where it makes no sense to talk of angles.

Proposition 3.2. The angle of M1 and M2 at A is undefined if and only if one of the manifolds is a subset of the other in a neighborhood of A, that is, if there exists an s > 0 such that M1 ∩ B_K(A, s) ⊂ M2 ∩ B_K(A, s) or M2 ∩ B_K(A, s) ⊂ M1 ∩ B_K(A, s).

Proof. There is clearly no restriction to assume that A = 0. Recall that a C¹-manifold M containing 0 locally can be written as the graph of a C¹-function defined in a neighborhood of 0 in T_M(0) with values in (T_M(0))^⊥ (this is a simple improvement of Theorem 2.1.2 (iv) in [9]). Now, either T_{M1∩M2}(A) = T_{M1}(A) or not. The fact just mentioned implies that in the first case M1 ∩ M2 and M1 coincide near 0, and in the second case F_1^r ≠ ∅ for all r > 0. The same dichotomy obviously holds with the roles of 1 and 2 switched, and the proposition follows.

Definition 3.3. Points A ∈ M1 ∩ M2 where the angle is defined will be called non-trivial intersection points. For such points, we say that A is tangential if α(A) = 0 and non-tangential if α(A) > 0.

Proposition 3.4. σ is continuous on supp σ and σ(A) = ‖P_{T_{M1}(A)} P_{T_{M2}(A)} − P_{T_{M1∩M2}(A)}‖.

Proof. Let A ∈ M1 ∩ M2 be non-trivial. It is easy to see that

(3.4) σ(A) = sup{ ⟨B_1 − A, B_2 − A⟩ / (‖B_1 − A‖ ‖B_2 − A‖) : B_j ∈ T_{Mj}(A) ⊖ T_{M1∩M2}(A) }
     = ‖P_{T_{M1}(A) ⊖ T_{M1∩M2}(A)} P_{T_{M2}(A) ⊖ T_{M1∩M2}(A)}‖
     = ‖P_{T_{M1}(A)} P_{T_{M2}(A)} − P_{T_{M1∩M2}(A)}‖.

The continuity of σ now follows by Proposition 2.2.

In particular, we get

(3.5) α(A) = cos^{-1}(‖P_{T_{M1}(A)} P_{T_{M2}(A)} − P_{T_{M1∩M2}(A)}‖),

which should be compared with (3.3). Moreover, we infer that non-tangentiality is a stable property, i.e. if A is non-tangential, then the same holds for all B ∈ M1 ∩ M2 in a neighborhood of A.
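When both tangent spaces are one-dimensional and T_{M1∩M2}(A) = {0}, the operator norm in Proposition 3.4 reduces to |⟨u_1, u_2⟩| for unit vectors u_j spanning T_{Mj}(A), since ‖P_{u_1}P_{u_2}‖ = |⟨u_1, u_2⟩|. The sketch below (our own) applies this to Example 1.5:

```python
import math

def sigma_1d(u1, u2):
    """sigma(A) = |<u1, u2>| / (|u1| |u2|) for line tangent spaces when
    T_{M1 ∩ M2}(A) = {0}."""
    n1 = math.sqrt(sum(x * x for x in u1))
    n2 = math.sqrt(sum(x * x for x in u2))
    return abs(sum(a * b for a, b in zip(u1, u2))) / (n1 * n2)

# Example 1.5: M1 = R x {0}^2 and M2 = {(t, t, t^2)}, tangents at the origin.
u1 = (1.0, 0.0, 0.0)                # spans T_{M1}(0)
u2 = (1.0, 1.0, 0.0)                # spans T_{M2}(0)
s = sigma_1d(u1, u2)                # 1/sqrt(2) < 1, so the origin is
angle = math.degrees(math.acos(s))  # non-tangential, with a 45 degree angle
```

Since σ < 1 the angle α(A) is positive, confirming the claim above that the origin in Example 1.5 is non-tangential even though it is not transversal.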
Our experience is that in practice, tangential and trivial intersection points are exceptional. For example, we will show in Section 6 that if M1 and M2 are also algebraic varieties, then non-tangentiality at one point in M1 ∩ M2 implies non-tangentiality of the vast majority of points. The next proposition gives a useful criterion for checking non-tangentiality.

Proposition 3.5. Let A ∈ M1 ∩ M2 be a non-trivial intersection point. Then A is non-tangential if and only if

(3.6) T_{M1}(A) ∩ T_{M2}(A) = T_{M1∩M2}(A).

Proof. By (3.4) it is easy to see that A is non-tangential if and only if

(T_{M1}(A) ⊖ T_{M1∩M2}(A)) ∩ (T_{M2}(A) ⊖ T_{M1∩M2}(A)) = {0},

which in turn happens if and only if (3.6) holds.

In particular, for non-trivial intersection points a simple criterion which implies non-tangentiality is dim(T_{M1}(A) ∩ T_{M2}(A)) ≤ m (recall that m = dim(M1 ∩ M2)), which is immediate by (3.6) and the inclusion T_{M1}(A) ∩ T_{M2}(A) ⊃ T_{M1∩M2}(A). Condition (3.6) is usually referred to as clean intersection in microlocal analysis (see Appendix C.3 in [28]). We have chosen the terminology non-tangential since it is more intuitive. Note that by (3.5) and (3.6), the angle as defined in Definition 3.1 and that of (3.3) are equivalent whenever A is non-tangential. Finally, Proposition 3.5 shows that non-tangentiality is a weaker concept than transversality, defined in (1.2). For if two manifolds are transversal at A, then (3.6) holds by Definition 3 in [35].

4. Properties of the projection operators

Let M1, M2 and M = M1 ∩ M2 be C²-manifolds and A ∈ M1 ∩ M2 a non-tangential intersection point. This will be fixed throughout the section. We continue to use the convention of denoting objects related to M1 ∩ M2 without subindex. As in the introduction we shall abbreviate π_{M1}, π_{M2} and π_M by π_1, π_2 and π. The following result will be the main tool for proving convergence of the alternating projections.
Theorem 4.1. For each ε > 0 there exists an s > 0 such that for all B ∈ B(A, s) we have

‖π(π_j(B)) − π(B)‖ < ε‖B − π(B)‖, j = 1, 2.

The proof is rather technical and depends on a long series of lemmas. The number s and the map φ appearing in (2.1) clearly differ between M1, M2 and M. We denote the respective maps by φ_1, φ_2 and φ, and let s_{-1} be a number such that (2.1) holds in all three cases with s = s_{-1}. We will make repeated use of Propositions 2.3 and 2.4, applied to each of the manifolds M1, M2 and M1 ∩ M2. The numbers s appearing in these also depend on which manifold is considered, and also on the auxiliary constant ϵ. We will choose a fixed value for ϵ in the proof of Theorem 4.1; in the meantime, it will be treated as a fixed number. We let s_0 be a constant small enough that Proposition 2.3 applies to each of the manifolds M1, M2 and M1 ∩ M2 in B(A, s_0). Since π_1, π_2 and π are continuous in B(A, s_0) and have A as a fixed point, we may also assume that s_0 is small enough that their respective images are contained in B(A, s_{-1}). Finally, we let s′_0 ≤ s_0 be such that also Proposition 2.4 applies for each of the 3 manifolds in B(A, s′_0).

Figure 6. Illustration of the difference between π_1 and ρ_1. D, E and F appear in the proof of Proposition 4.3.

We introduce maps ρ_j : B(A, s′_0) → K (j = 1 or j = 2) via

ρ_j(B) = π_{T̃_{Mj}(π(B))}(B).

Thus, ρ_j resembles π_j but is slightly different: π_j projects onto M_j, whereas ρ_j projects onto the tangent plane of M_j taken at the closest point to B in M1 ∩ M2, i.e. at π(B) (see Fig. 6). An estimate of the difference between ρ_j and π_j is given in Proposition 4.3.

Lemma 4.2. The functions ρ_1 and ρ_2 are C¹-maps in B(A, s′_0). Moreover, we can select a number s_1 < s′_0 such that the image of B(A, s_1) under ρ_1, ρ_2, π, π_1, π_2, as well as any composition of two of those maps, is contained in B(A, s′_0).
Proof. The second part is an immediate consequence of the continuity of the maps, and π, π_1 and π_2 are continuous by Proposition 2.3. Let j denote 1 or 2, and set M_j(B) = dφ_j(φ_j^{-1}(π(B))), which is well defined in B(A, s_0) by the choice of s_{-1} and s_0. By Propositions 2.1 and 2.3, we have that φ_j^{-1}(π(B)) is a C²-map in B(A, s_0), and hence M_j is C¹ as φ_j is C². Given any point B_0 ∈ M_j it is easy to see that

π_{T̃_{Mj}(B_0)}(B) = B_0 + P_{T_{Mj}(B_0)}(B − B_0),

and hence, by the formula already used in the proof of Proposition 2.2, we have

ρ_j(B) = π_{T̃_{Mj}(π(B))}(B) = π(B) + P_{T_{Mj}(π(B))}(B − π(B))
       = π(B) + M_j(B) (M_j^*(B) M_j(B))^{-1} M_j^*(B) (B − π(B)),

from which the result follows.

Figure 7. Proof illustration. r = ‖E_B‖ + ϵ‖D_B‖, h = 2ϵ(‖E_B‖ + ‖D_B‖) and r_2 = ‖E_B‖ − h.

Proposition 4.3. Suppose that (1 + ϵ)/√(1 − ϵ²) < 2. Given any B ∈ B(A, s_1) and j = 1 or j = 2, we have

‖π_j(B) − ρ_j(B)‖ < 4√ϵ ‖B − π(B)‖.

Proof. Lemma 4.2 implies that Proposition 2.4 applies to the point C = π(B). By a translation we can assume that π(B) = 0, which we do from now on. Denote D = T_{Mj}(0), E = Span{B − ρ_j(B)} and F = K ⊖ (D + E) (see Fig. 6). Let D_B and E_B be the elements of D and E such that B = D_B + E_B, and note that ρ_j(B) = D_B. We thus have to show that

(4.1) ‖π_j(B) − D_B‖ < 4√ϵ ‖B‖.

First note that by Proposition 2.4 (ii) (applied with C = π(B) = 0 and D = D_B) the set M_j ∩ B(D_B, ϵ‖D_B‖) is not void. This set is included in the ball with center B and radius ‖B − D_B‖ + ϵ‖D_B‖ = ‖E_B‖ + ϵ‖D_B‖. Since π_j(B) is the closest point to B on M_j, we conclude that ‖B − π_j(B)‖ ≤ ‖E_B‖ + ϵ‖D_B‖. If we write π_j(B) = D + E + F (with D, E, F in the respective subspaces D, E, F) this becomes

(4.2) ‖D − D_B‖² + ‖E − E_B‖² + ‖F‖² < (‖E_B‖ + ϵ‖D_B‖)².

However, by Proposition 2.4 (i) (applied with C = π(B) = 0 and D = π_j(B)) we also get

(4.3) ‖E‖² + ‖F‖² < ϵ²(‖D‖² + ‖E‖² + ‖F‖²).
The left hand side of (4.1) is thus dominated by the supremum of the function

δ(D, E, F) = ‖D + E + F − D_B‖ = √(‖D − D_B‖² + ‖E‖² + ‖F‖²),

subject to the conditions (4.2) and (4.3). Either by geometrical considerations or by the method of Lagrange multipliers, it is not hard to deduce that this supremum is attained for F = 0 and D, E of the form D = dD_B and E = eE_B, where d, e ∈ R. We now have a two-dimensional problem of circles and cones, and in the remainder of the proof we treat D, D_B, E and E_B as elements of R². Summing up, we want to maximize δ(D, E) = √(‖D − D_B‖² + ‖E‖²) for D, E ∈ R² satisfying

(4.4) ‖D − D_B‖² + ‖E − E_B‖² < (‖E_B‖ + ϵ‖D_B‖)²

and

(4.5) ‖E‖ < (ϵ/√(1 − ϵ²)) ‖D‖.

The first inequality gives ‖D‖ ≤ ‖D − D_B‖ + ‖D_B‖ < ‖E_B‖ + ϵ‖D_B‖ + ‖D_B‖, which inserted in the second yields

‖E‖ ≤ (ϵ/√(1 − ϵ²)) ((1 + ϵ)‖D_B‖ + ‖E_B‖).

As (1 + ϵ)/√(1 − ϵ²) < 2, the following condition

(4.6) ‖E‖ < 2ϵ(‖D_B‖ + ‖E_B‖)

is less restrictive than (4.5) (when combined with (4.4), see Fig. 7). The function δ to be maximized is just the distance from D + E to D_B, and clearly the maximal distance d (with constraints (4.4) and (4.6)) is obtained by the point x in the picture. Using the notation from the picture, we have

d = ‖D_B − x‖ = √(r_1² + h²) = √(r² − r_2² + h²)
  = √((‖E_B‖ + ϵ‖D_B‖)² − (‖E_B‖ − 2ϵ(‖E_B‖ + ‖D_B‖))² + (2ϵ(‖E_B‖ + ‖D_B‖))²)
  = √(2ϵ‖E_B‖‖D_B‖ + ϵ²‖D_B‖² + 4ϵ(‖E_B‖² + ‖E_B‖‖D_B‖)).

The assumption on ϵ clearly implies ϵ < 1, so ϵ² < ϵ. Moreover, ‖E_B‖, ‖D_B‖ ≤ ‖B‖, and hence the value obtained above is less than √(11ϵ) ‖B‖ ≤ 4√ϵ ‖B‖. Since the value in question is also larger than ‖π_j(B) − ρ_j(B)‖, the proposition follows.

Lemma 4.4. Given B ∈ B(A, s_1) we have π(B) = π(ρ_1(B)) = π(ρ_2(ρ_1(B))).

Proof. We begin with the first equality. By Lemma 4.2 we have π(B), ρ_1(B) ∈ B(A, s_0), since s′_0 ≤ s_0, so Proposition 2.3 applies to give B − π(B) ⊥ T_{M1∩M2}(π(B)).
We obviously also have B − ρ_1(B) ⊥ T_{M1}(π(B)), so

ρ_1(B) − π(B) = (ρ_1(B) − B) + (B − π(B)) ⊥ T_{M1∩M2}(π(B)).

By Proposition 2.3 this implies that π(B) = π(ρ_1(B)), as desired. By Lemma 4.2, the above argument also applies to the point ρ_2(B) in place of B, which yields π(ρ_2(B)) = π(ρ_1(ρ_2(B))). The desired second identity follows by reversing the roles of 1 and 2.

We finally have all the necessary ingredients.

Proof of Theorem 4.1. By the choice of s′_0 and Proposition 2.3, π is C² on B(A, s′_0), and hence we can pick C > 0 such that

(4.7) ‖π(B) − π(B′)‖ ≤ C‖B − B′‖

for all B, B′ ∈ B(A, s′_0). Now fix ϵ such that (1 + ϵ)/√(1 − ϵ²) < 2 and 4C√ϵ < ε, and set s = s_1. By Lemma 4.4 we have π(B) = π(ρ_j(B)), and Lemma 4.2 guarantees that π_j(B), ρ_j(B) ∈ B(A, s′_0). By (4.7) and Proposition 4.3 we get

‖π(π_j(B)) − π(B)‖ = ‖π(π_j(B)) − π(ρ_j(B))‖ ≤ C‖π_j(B) − ρ_j(B)‖ ≤ 4C√ϵ ‖B − π(B)‖ < ε‖B − π(B)‖,

as desired.

Next we show that the distance to M1 ∩ M2 is reduced in proportion to the angle, each time we project from M1 onto M2 or vice versa. Recall the function σ(A) from Definition 3.1.

Theorem 4.5. For each c > σ(A) there exists an s > 0 such that for all B ∈ M2 ∩ B(A, s) we have

‖π_1(B) − π(B)‖ < c‖B − π(B)‖.

Moreover the same holds true with the roles of M1 and M2 reversed.

Lemma 4.6. Let c̃ > 1 and ϵ̃ > 0 be given. If E, F ∈ K satisfy ‖E‖ > c̃‖F‖ and ‖E − F‖ < ϵ̃, then ‖E‖ < ϵ̃ c̃/(c̃ − 1).

Proof. If ‖E‖ < ϵ̃ we are done; otherwise c̃ < ‖E‖/‖F‖ < ‖E‖/(‖E‖ − ϵ̃), which easily gives the desired estimate.

Proof of Theorem 4.5. Fix c_1 such that σ(A) < c_1 < c and pick an s_1 < s_0 such that

sup{σ(C) : C ∈ M1 ∩ M2 ∩ B(A, s_1)} < c_1,

which we can do since σ is continuous by Proposition 3.4. Let c_2 > 1 be such that c_2 c_1 < c. By the choice of s_0 and Lemma 4.2, ρ_1 and π are C¹-functions on B(A, s′_0), and hence we can pick C > 0 such that

(4.8) ‖ρ_j(B) − ρ_j(B′)‖ ≤ C‖B − B′‖ and ‖π(B) − π(B′)‖ ≤ C‖B − B′‖

for all B, B′ ∈ B(A, s′_0).
We now fix ϵ such that (1 + ϵ)/√(1 − ϵ²) < 2 and

(4.9) 4√ϵ (1 + C) c_2/(c_2 − 1) < c and (1 + 4√ϵ) c_2 c_1 < c,

and fix s < s_1 such that π(B(A, s)) ⊂ B(A, s_1). Let B ∈ M2 ∩ B(A, s). There is no restriction to assume that π(B) = 0, which we do from now on. Note that π(B) ∈ M1 ∩ M2 ∩ B(A, s_1), so

(4.10) σ(0) = σ(π(B)) < c_1.

Setting D = π_1(B), the desired inequality takes the form

(4.11) ‖D‖/‖B‖ < c.

Figure 8. Proof illustration.

Put B′ = ρ_2(B) and D′ = ρ_1(B′) (see Figure 8). Note that B ∈ B(A, s_1), whereas D, B′, D′ ∈ B(A, s′_0) by Lemma 4.2. By Lemma 4.4 it then follows that 0 = π(ρ_2(B)) = π(B′) and 0 = π(ρ_1(ρ_2(B))) = π(D′), so

D′ = P_{T_{M1}(0)}(B′) = P_{T_{M1}(0)} P_{T_{M2}(0)}(B′),

since B′ ∈ T_{M2}(0). By Proposition 2.3 we also have B′ ⊥ T_{M1∩M2}(0), and hence

D′ = (P_{T_{M1}(0)} P_{T_{M2}(0)} − P_{T_{M1∩M2}(0)}) B′.

By Proposition 3.4 and (4.10) we thus have

(4.12) ‖D′‖/‖B′‖ ≤ ‖P_{T_{M1}(0)} P_{T_{M2}(0)} − P_{T_{M1∩M2}(0)}‖ = σ(0) < c_1,

which should be compared with (4.11). We now show that B′ ≈ B and D′ ≈ D. First, note that by Proposition 4.3

(4.13) ‖B − B′‖ = ‖π_2(B) − ρ_2(B)‖ < 4√ϵ ‖B‖.

Moreover, by (4.8), (4.13) and Proposition 4.3 we have that

‖D′ − D‖ = ‖ρ_1(B′) − π_1(B)‖ ≤ ‖ρ_1(B′) − ρ_1(B)‖ + ‖ρ_1(B) − π_1(B)‖ ≤ C‖B′ − B‖ + 4√ϵ ‖B‖ < 4√ϵ (1 + C) ‖B‖.

Apply Lemma 4.6 with E = D, F = D′, ϵ̃ = 4√ϵ (1 + C)‖B‖ and c̃ = c_2. We see that either

(4.14) ‖D‖ ≤ c_2 ‖D′‖

or ‖D‖ < 4√ϵ (1 + C) (c_2/(c_2 − 1)) ‖B‖, in which case we are done since the constant is less than c by (4.9). We thus assume that (4.14) holds. Note that

(4.15) ‖B′‖/‖B‖ ≤ 1 + 4√ϵ

by (4.13). Combining (4.12), (4.14) and (4.15) we get

‖D‖/‖B‖ = (‖D‖/‖D′‖) (‖D′‖/‖B′‖) (‖B′‖/‖B‖) < c_2 c_1 (1 + 4√ϵ) < c,

where the last inequality follows by (4.9).

5. Alternating projections

We are finally ready for the main theorem.

Theorem 5.1. Let M1, M2 and M1 ∩ M2 be C²-manifolds and let A ∈ M1 ∩ M2 be a non-tangential intersection point of M1 and M2.
Given ε > 0 and 1 > c > σ(A), there exists an r > 0 such that for any B ∈ B(A, r) the sequence of alternating projections

B0 = π1(B), B1 = π2(B0), B2 = π1(B1), B3 = π2(B2), . . .

(i) converges to a point B∞ ∈ M1 ∩ M2,
(ii) satisfies ‖B∞ − π(B)‖ ≤ ε‖B − π(B)‖,
(iii) satisfies ‖B∞ − Bk‖ ≤ const · c^k ‖B − π(B)‖.

Proof. We may clearly assume that ε < 1. Invoking Theorem 4.5 with c as above and Theorem 4.1 with

(5.1) ε′ = ε(1 − c)/(3 − c)

gives us two possibly distinct radii. We let s denote the minimum of the two and pick

(5.2) r < s(1 − ε)/(4(2 + ε))

such that

(5.3) π(B(A, r)) ⊂ B(A, s/4).

The latter condition ensures that ‖π(B) − A‖ < s/4. Let l = ‖B − π(B)‖ and observe that

(5.4) l ≤ ‖B − A‖ + ‖A − π(B)‖ ≤ r + s/4.

As π(B) ∈ M1 ∩ M2 we have ‖B0 − B‖ = ‖π1(B) − B‖ ≤ ‖π(B) − B‖ = l and

(5.5) ‖B0 − π(B0)‖ ≤ ‖B0 − π(B)‖ ≤ ‖B0 − B‖ + ‖B − π(B)‖ ≤ 2l.

Applying Theorem 4.5 we get

‖Bk − π(Bk)‖ ≤ ‖Bk − π(Bk−1)‖ ≤ c‖Bk−1 − π(Bk−1)‖

as long as

(5.6) {Bj}_{j=0}^{k−1} ⊂ B(A, s),

which will be established by induction. First note that

‖B0 − A‖ ≤ ‖B0 − B‖ + ‖B − A‖ ≤ l + r ≤ 2r + s/4 ≤ s(1 − ε)/(2(2 + ε)) + s/4 < s

by (5.2), (5.4) and (5.5), and hence (5.6) holds for k = 1. Now assume that it holds for a fixed k ≥ 1. We then get

(5.7) ‖Bk − π(Bk)‖ ≤ c^k ‖B0 − π(B0)‖ ≤ 2lc^k

and by Theorem 4.1 we also have

(5.8) ‖π(Bk) − π(Bk−1)‖ ≤ ε′‖Bk−1 − π(Bk−1)‖ ≤ ε′(2lc^{k−1}).

By the triangle inequality and Theorem 4.1 we get

(5.9) ‖π(Bk) − π(B)‖ ≤ ‖π(B) − π(B0)‖ + Σ_{j=1}^{k} ‖π(Bj) − π(Bj−1)‖ ≤ ε′l + Σ_{j=1}^{k} ε′(2lc^{j−1}) < ε′l + 2ε′l/(1 − c) = ((3 − c)/(1 − c)) ε′l = εl,

where the last equality follows by (5.1). Combining this with (5.3), (5.4) and (5.7) we also have

‖A − Bk‖ ≤ ‖A − π(B)‖ + ‖π(B) − π(Bk)‖ + ‖π(Bk) − Bk‖ < s/4 + εl + 2l ≤ s/4 + (2 + ε)(r + s/4) = (3 + ε)s/4 + (2 + ε)r < (3 + ε)s/4 + (1 − ε)s/4 = s,

where the last inequality follows by (5.2). This shows that (5.6) holds for k := k + 1, and so (5.6) is true for all k ∈ N by induction. In particular all the above estimates hold true for any k ∈ N.
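The sequence of alternating projections above is trivial to implement once the individual projections are known. As a self-contained numerical illustration (ours, not from the paper), take for M1 the unit circle and for M2 a horizontal line in R^2, both with closed-form projections; the line height 0.5 and the starting point are made up:

```python
import numpy as np

def proj_circle(p):
    # Projection onto M1, the unit circle (well defined away from the origin).
    return p / np.linalg.norm(p)

def proj_line(p, h=0.5):
    # Projection onto M2, the line {y = h}.
    return np.array([p[0], h])

def alternating_projections(B, steps=100):
    # B0 = pi_1(B), B1 = pi_2(B0), B2 = pi_1(B1), ... as in Theorem 5.1.
    p = np.asarray(B, dtype=float)
    for j in range(steps):
        p = proj_circle(p) if j % 2 == 0 else proj_line(p)
    return p

B_inf = alternating_projections([0.9, 0.6])
print(B_inf)  # close to the intersection point (sqrt(3)/2, 1/2)
```

The two intersection points (±√3/2, 1/2) are non-tangential (circle and line meet at a positive angle), and starting nearby the iterates converge linearly to the closer one, in line with parts (i) and (iii).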
By (5.8) we see that the sequence (π(Bk))_{k=1}^∞ is a Cauchy sequence, and hence it converges to some point B∞. By (5.7) the sequence (Bk)_{k=1}^∞ must also converge, and the limit point is again B∞, which thus satisfies B∞ = π(B∞) since π is continuous. Hence (i) is established, and (ii) follows by taking the limit in (5.9). For (iii), note that by a calculation similar to (5.9) we have

‖π(Bk) − B∞‖ ≤ Σ_{j=k+1}^∞ ‖π(Bj) − π(Bj−1)‖ ≤ ε′ 2lc^k/(1 − c),

which combined with (5.7) gives

‖Bk − B∞‖ ≤ ‖Bk − π(Bk)‖ + ‖π(Bk) − B∞‖ ≤ 2lc^k + ε′ 2lc^k/(1 − c) = ((2 − 2c + 2ε′)/(1 − c)) c^k l,

as desired. □

6. Non-tangentiality on real algebraic varieties

In this section we suppose that K has been given a basis and thus identify it with R^n. In many applications the sets M1, M2 and M are actually real algebraic varieties, that is, they can be defined by the vanishing of a set of polynomials on R^n. To make the distinction explicit we denote them V1, V2 and V, respectively. This poses a few additional problems, but we shall see at the end of this section that these can be overcome and, moreover, that non-tangentiality at one single intersection point implies non-tangentiality at the vast majority of points in V, in a sense to be specified below. This result is interesting for applications since it says that the (a priori not known) closest point(s) to a given B ∈ R^n is likely to be non-tangential, so that Theorem 5.1 applies, given that we are close enough.

The first obstacle is that algebraic varieties have singular points, and are therefore not manifolds. This is true even for complex algebraic varieties, which of course can be regarded as real by separating the real and imaginary parts. Fortunately, we have the following theorem by H. Whitney [53].

Theorem 6.1. Given a real algebraic variety V ⊂ R^n, we can write V = ∪_{j=0}^m M_j, where each M_j is either void or a C^∞-manifold of dimension j. Moreover, each M_j, j = 0, . . .
, m, contains at most a finite number of connected components.

For example, consider the Cartan umbrella V given by z(x^2 + y^2) − x^3 = 0 in R^3. The main part of the variety is a manifold of dimension 2, but it also contains the z-axis as a submanifold of dimension 1. Thus, in terms of Theorem 6.1, we have

(6.1) V = M1 ∪ M2, where M2 = {(x, y, z) : (x, y) ≠ 0 and z = x^3/(x^2 + y^2)} and M1 = {0} × {0} × R.

For more interesting examples in this direction we refer to [4], Chapter 3. Theorem 6.1 shows that the main part of V is a manifold, and we can apply the theory from the previous sections since the statements are local. We note that ∪_{j=0}^{m−1} M_j makes up a very small part of V. We will not try to quantify this statement, but note that in the case m = 1, ∪_{j=0}^{m−1} M_j would correspond to a finite collection of points, to be compared with the curve M1. Another example is obtained by taking V = R^n. The information we have about ∪_{j=0}^{m−1} M_j is then much stronger than, e.g., saying that ∪_{j=0}^{m−1} M_j has Lebesgue measure zero, which is a common criterion for negligible sets; see for example Sard's theorem [45].

Our next aim is to find conditions on two real algebraic varieties V1 and V2 which guarantee that Theorem 5.1 can be applied to the majority of points in V = V1 ∩ V2, in the sense that the exceptional set is a union of manifolds of lower dimension with finitely many connected components. To do this, we need to connect Theorem 6.1 with classical algebraic geometry in C^n. This is also done in the seminal paper [53], which has given rise to a rich theory on decomposition of various sets into manifolds, now known as (Whitney) stratification theory. For an overview and historical account we refer to [50]. However, although Propositions 6.2-6.4 below are implicitly shown in Sections 8-11 of [53], we have not been able to find a good reference for all the results we need here. Hence, for completeness, we provide proofs in Appendix B.
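The decomposition (6.1) of the Cartan umbrella is easy to sanity-check symbolically. The snippet below (our illustration; the paper contains no code) verifies with sympy that both pieces of (6.1) satisfy the defining polynomial, and that the gradient of that single polynomial vanishes along the z-axis, consistent with the stick of the umbrella being a lower-dimensional stratum (the gradient of one defining polynomial only generates part of the normal directions in general, so this is evidence rather than a proof of singularity):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
p = z*(x**2 + y**2) - x**3  # defining polynomial of the Cartan umbrella

# Points of M2 satisfy z = x^3/(x^2 + y^2); substituting gives p = 0.
on_sheet = sp.cancel(p.subs(z, x**3/(x**2 + y**2)))

# The z-axis M1 = {x = y = 0} also lies in V.
on_stick = p.subs({x: 0, y: 0})

# The gradient of p vanishes identically along the z-axis.
grad = [sp.diff(p, v) for v in (x, y, z)]      # (2xz - 3x^2, 2yz, x^2 + y^2)
grad_on_axis = [g.subs({x: 0, y: 0}) for g in grad]

print(on_sheet, on_stick, grad_on_axis)  # 0 0 [0, 0, 0]
```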
The material in this section is self-contained and it is not necessary to know algebraic geometry in order to apply the results below. Usually, by an algebraic variety one refers to a subset of C^n defined as the common zeroes of a set of (analytic) polynomials (with complex coefficients). To distinguish between these varieties and real algebraic varieties, we will call them complex algebraic varieties or simply complex varieties. Note that all complex varieties are real varieties, but not conversely. However, if we identify R^n with a subset of C^n in the usual way, then each real variety V has a related complex variety given by its Zariski closure V_Zar, defined as the subset of C^n of common zeroes of all polynomials that vanish on V. Letting I_R(V) denote the set of real polynomials that vanish on V, it is easy to see that

(6.2) V_Zar = {z ∈ C^n : p(z) = 0 for all p ∈ I_R(V)}.

There are several equivalent ways of defining the dimension of a complex algebraic variety; see e.g. [15, 47]. The proposition below basically states that these coincide with the maximal non-trivial dimension in any Whitney decomposition.

Proposition 6.2. Let V be a real algebraic variety and let V = ∪_{j=0}^m M_j be a decomposition as in Theorem 6.1, with M_m ≠ ∅. Then m equals the algebraic dimension of V_Zar.

Hence, although Whitney decompositions are not unique (consider e.g. R = (R \ {0}) ∪ {0}), the maximal dimension of their components is. The number m will henceforth be called the dimension of V. For example, it can be verified that the Zariski closure of the Cartan umbrella equals the variety in C^3 defined by the same equation, and that this variety has algebraic dimension 2, as predicted by the above proposition and (6.1).

We say that V is irreducible if there does not exist any non-trivial decomposition of the form V = V1 ∪ V2, where V1 and V2 are algebraic varieties. A simple condition which guarantees this (and involves no algebraic geometry) is given in Proposition 6.8.
We say that a point A ∈ V is non-singular if it is non-singular in the sense of algebraic geometry as an element of V_Zar. The theory becomes much simpler if we restrict attention to irreducible varieties, which we do from now on. Let ∇ denote the gradient operator and set N_V(A) = {∇p(A) : p ∈ I_R(V)}. Note that this is a linear subspace of R^n.

Proposition 6.3. Let V ⊂ R^n be an irreducible real algebraic variety of dimension m. Then dim N_V(A) ≤ n − m for all A ∈ V, and A is non-singular if and only if dim N_V(A) = n − m.

The subset of V of non-singular points will be denoted V^ns.

Proposition 6.4. Let V be an irreducible real algebraic variety of dimension m. Then the decomposition V = ∪_{j=0}^m M_j in Theorem 6.1 can be chosen such that V^ns = M_m. Moreover, given A ∈ V^ns we have

(6.3) T_{V^ns}(A) = (N_V(A))^⊥.

Note that the counterpart of (6.3) in the complex setting also holds (this will follow in Appendix B). We remark that in this case the right hand side is the definition of the tangent space of V in algebraic geometry, so (6.3) implies that the algebraic tangent space coincides with the differential geometry tangent space at non-singular points.

We now return to the issue of alternating projections. If we have irreducible varieties V1, V2, V of dimensions m1, m2, m and restrict attention to V1^ns, V2^ns, V^ns, it makes sense to talk about non-tangential intersection points and Theorem 5.1 can be applied. We suppose that one of the varieties is not a subset of the other. Then all intersection points are non-trivial, i.e. the angle between V1^ns and V2^ns exists.

Proposition 6.5. Suppose that V1 and V2 are irreducible real algebraic varieties and that V = V1 ∩ V2 is irreducible and strictly smaller than both V1 and V2. Then each point in V1^ns ∩ V2^ns ∩ V^ns is a non-trivial intersection point.

The next theorem shows that, in this setting, the majority of points are non-tangential if one is.
It also gives a simple criterion under which this is the case. Denote by V^{ns,nt} ⊂ V the set of all points in V^ns ∩ V1^ns ∩ V2^ns that are non-tangential with respect to the manifolds V1^ns and V2^ns.

Theorem 6.6. Suppose that V1 and V2 are irreducible real algebraic varieties and that V = V1 ∩ V2 is irreducible and strictly smaller than both V1 and V2. Let the dimension of V be m. If V^{ns,nt} ≠ ∅, then V \ V^{ns,nt} is a real algebraic variety of dimension strictly less than m. A sufficient condition for this to happen is that there exists a point A ∈ V1^ns ∩ V2^ns such that

(6.4) dim(T_{V1^ns}(A) ∩ T_{V2^ns}(A)) ≤ m.

Combined with Theorem 6.1 and Proposition 6.2, the theorem implies that V \ V^{ns,nt} is a union of finitely many disjoint connected C^∞-manifolds of dimension strictly less than m. In other words, the good points make up the vast majority of points in V. Note that (6.4) is equivalent to

dim(T_{V1^ns}(A) + T_{V2^ns}(A)) ≥ m1 + m2 − m

by basic linear algebra. Theorem 6.6 thus provides a concrete condition under which Theorem 5.1 applies to a majority of points in V.

We now provide concrete means to verify the preconditions of Theorem 6.6.

Definition 6.7. Suppose we are given a j ∈ N and an index set I such that for each i ∈ I there exists an open connected Ω_i ⊂ R^j and a real analytic map φ_i : Ω_i → V. We will say that V can be covered with analytic patches if, for each A ∈ V, there exist an i ∈ I and a radius r_A such that V ∩ B_{R^n}(A, r_A) = Im φ_i ∩ B_{R^n}(A, r_A).

Note that we put no restriction on the rank of dφ_i at any point, so R^j may contain redundant variables and degenerate points.

Proposition 6.8. Let V be a real algebraic variety. If V is connected and can be covered with analytic patches, then V is irreducible. The same conclusion holds if a dense subset of V can be given as the image of one real analytic function φ.

To apply Theorem 6.6, one needs to know the various dimensions involved. This can sometimes be tricky.
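The equivalence just stated makes (6.4) checkable by plain rank computations once spanning vectors for the tangent spaces are available. A minimal numerical sketch (ours; the subspaces of R^4 are made up and merely play the roles of T_{V1^ns}(A) and T_{V2^ns}(A)):

```python
import numpy as np

def subspace_dims(A1, A2):
    # Bases for T1 and T2 as columns of A1, A2; returns
    # (dim T1, dim T2, dim(T1 + T2), dim(T1 ∩ T2)).
    m1 = np.linalg.matrix_rank(A1)
    m2 = np.linalg.matrix_rank(A2)
    dim_sum = np.linalg.matrix_rank(np.hstack([A1, A2]))
    # dim(T1 ∩ T2) = dim T1 + dim T2 - dim(T1 + T2)
    return m1, m2, dim_sum, m1 + m2 - dim_sum

e = np.eye(4)
T1 = e[:, :3]                                       # span{e1, e2, e3}, dimension 3
T2 = np.column_stack([e[:, 0], e[:, 1] + e[:, 3]])  # span{e1, e2 + e4}, dimension 2
print(subspace_dims(T1, T2))  # dims 3, 2, sum 4, intersection 1
```

Here the intersection dimension 1 = 3 + 2 − 4, so (6.4) holds whenever the intersection variety has dimension m ≥ 1 at the corresponding point.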
For example, the curve y^2 = x^2(x − 1) in R^2 has dimension 1 but an isolated point at the origin. Thus a local analysis of the surface may lead to false conclusions concerning the dimension. A more complex example is provided by the Cartan umbrella discussed earlier. We end this section with a simple method of determining the dimension. An alternative is to work with quotient manifolds, as explained in [1], but we will not pursue this here.

Proposition 6.9. Under either of the assumptions of Proposition 6.8, suppose in addition that an open subset of V is the image of a bijective real analytic map defined on an open subset of R^d. Then V has dimension d.

A final remark: suppose that K to begin with is a vector space over C. In this case one may of course identify C with R^2 and apply the results of this section. However, all subspaces considered above then become closed over C and all results have a natural complex counterpart where, e.g., dimension refers to the dimension over C. We leave these reformulations as well as their proofs to the reader.

7. An example

We provide a simple example in which Theorem 5.1 applies. Given an n × n symmetric matrix B and an integer k, we seek the rank-k correlation matrix that lies closest to B with respect to the Hilbert-Schmidt norm. For this problem, we choose K to be the set of real symmetric n × n matrices, V1 the affine subspace of matrices with 1's on the diagonal, and V2 the subset of matrices with rank less than or equal to k, where k < n. This example is also considered in [11] as well as in [2], where the reformulation suggested in [25] is used so that it fits the framework of [2]. For the alternating projections method to be useful, we need fast implementations of the projections π1 and π2. Since V1 is an affine subspace, π1 is trivial to compute and well-defined everywhere. For π2, a theorem essentially due to E.
Schmidt [46], sometimes attributed to Eckart-Young [20], states that the closest rank-k (or less) approximation to a given matrix B is obtained by computing the singular value decomposition and then replacing the n − k smallest singular values by 0. This is well known and explained, e.g., in [35], Example 5. Note that when the k-th singular value has multiplicity higher than 1, the statement still applies but the best approximation is no longer unique. This is the reason why π2 needs to be a point-to-set map when defined globally. However, recall Propositions 2.3 and 6.4, which imply that π2 is a well-defined function near non-singular points of V2. We show that Theorem 5.1 applies to the majority of points in V = V1 ∩ V2. First we need some basic results concerning the structure of these sets.

Proposition 7.1. K has dimension (n^2 + n)/2. V1 is an affine subspace of dimension (n^2 − n)/2. V2 is an irreducible real algebraic variety of dimension (2nk − k^2 + k)/2. V is an irreducible real algebraic variety of dimension (2nk − k^2 + k)/2 − n.

Proposition 7.2. V2^ns equals the set of all matrices with rank equal to k, and V^{ns,nt} ≠ ∅.

Combining the above results with Proposition 6.5 and Theorem 6.6 we immediately obtain:

Corollary 7.3. The set V^{ns,nt} of points of V where Theorem 5.1 applies is a ((2nk − k^2 + k)/2 − n)-dimensional manifold. Its complement V \ V^{ns,nt} is a finite union of connected manifolds of lower dimension.

Figure 9. Simulations using alternating projections for finding the closest correlation matrix given a data set with missing data elements. The relation between the initial distance to the closest correlation matrix and the distance between the result obtained by alternating projections and the closest correlation matrix is depicted; the plot shows log10(‖B∞ − π(B)‖/‖B − π(B)‖) against log10(‖B − π(B)‖).
Each dot shows the result for a simulation with a certain number of missing data points.

The proofs are given in Appendix C. We now present a concrete application. Consider random variables X1, . . . , Xn that arise as

(7.1) X_l = Σ_{j=1}^k c_{l,j} Y_j,

where {Y_j}_{j=1}^k are independent random variables with a unit normal distribution, and where Σ_{j=1}^k c_{l,j}^2 = 1. Then the correlation matrix C is given by

C(l, l′) = cov(X_l, X_{l′}) = Σ_{j=1}^k c_{l,j} c_{l′,j},

which clearly is an n × n matrix in K with rank k and ones on the diagonal, i.e. C ∈ V1 ∩ V2. Let N ∈ N be given and let X ∈ M_{N,n} be a matrix where each row is a realization of (X_l)_{l=1}^n. If N is large we should have X^T X/N ≈ C. However, in real situations, for instance in financial applications [27], some of the values X(j, l) are not available. For the computation of the correlation matrix corresponding to (7.1), it is then common to replace the missing elements by zeros. Given an index set I that indicates the missing data points, let X̃_I denote the matrix corresponding to X where the missing elements have been replaced by zeros, and let |I| denote the number of missing elements. Generically, the matrix X̃_I^T X̃_I /N will no longer be in V1 ∩ V2, and the distance between X̃_I^T X̃_I /N and V1 ∩ V2 will tend to increase as the number of missing data points increases. π(X̃_I^T X̃_I /N) is known to be computable by using semidefinite programming [49, Section 4.4]. We have used the SDPT3 implementation [48, 51].

Figure 9 shows results for 1000 simulations with a varying number of missing elements, used to generate initial matrices B = X̃_I^T X̃_I /N. The plot displays logarithmic values of the relative error ‖B∞ − π(B)‖/‖B − π(B)‖ as a function of logarithmic values of the distance ‖B − π(B)‖ to V1 ∩ V2. The matrices X are of size 10^5 × 50 and have rank k = 10. The number of missing data points varied linearly from 10^3 to 10^6. An increasing linear trend between the errors can clearly be seen.
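The two projections in this example are cheap to implement: π1 simply resets the diagonal, and π2 truncates an eigendecomposition (the symmetric form of the Schmidt/Eckart-Young theorem above). The following sketch is ours: it replaces the semidefinite-programming reference solution by the alternating iteration itself, uses made-up helper names, and builds a small synthetic instance of (7.1) instead of real data; Theorem 5.1 justifies it only locally, for B close enough to a non-tangential point.

```python
import numpy as np

def proj_unit_diag(B):
    # pi_1: closest point in the affine subspace V1 (symmetric, ones on the diagonal).
    B = (B + B.T) / 2
    np.fill_diagonal(B, 1.0)
    return B

def proj_rank_k(B, k):
    # pi_2: closest symmetric matrix of rank <= k in the Hilbert-Schmidt norm,
    # obtained by keeping the k eigenvalues of largest magnitude.
    w, U = np.linalg.eigh(B)
    idx = np.argsort(np.abs(w))[::-1][:k]
    return (U[:, idx] * w[idx]) @ U[:, idx].T

rng = np.random.default_rng(0)
n, k = 8, 2
F = rng.standard_normal((n, k))
F /= np.linalg.norm(F, axis=1, keepdims=True)   # rows c_l with sum_j c_{l,j}^2 = 1
C = F @ F.T                                     # rank-k correlation matrix, C in V1 ∩ V2
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
B = C + 1e-3 * E                                # perturbed starting point

Bk = B
for _ in range(2000):                           # alternate pi_2 and pi_1, cf. (1.1)
    Bk = proj_unit_diag(proj_rank_k(Bk, k))

print(np.diag(Bk))                              # exactly ones (pi_1 applied last)
print(np.linalg.svd(Bk, compute_uv=False)[k])   # tiny if the iteration has converged
```

Since the final step applied is π1, the diagonal constraint holds exactly, while the rank constraint is satisfied up to the (geometrically shrinking) size of the last correction.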
This should be compared with Theorem 5.1, which predicts that we should see an increasing function.

8. Appendix A

Recall the map φ introduced in Section 2 and let A be a point in K. We will throughout, without restriction, assume that φ(0) = A and that the domain of definition of φ is B_{R^m}(0, r) for some r > 0. We also let s be such that (2.1) holds.

Proposition 2.1. Let α : K → M be any C^p-map, where M is a C^p-manifold and p ≥ 1. Then the map φ^{−1} ∘ α is also C^p (on its natural domain of definition).

Proof. Given x0 ∈ B_{R^m}(0, r), pick f1, . . . , f_{n−m} ∈ K with the property that

T_M(φ(x0))^⊥ = Span{f1, . . . , f_{n−m}}

and define ω_{x0} : B_{R^m}(0, r) × R^{n−m} → K via

ω_{x0}(x, y) = φ(x) + Σ_{i=1}^{n−m} y_i f_i.

By the inverse function theorem ([9] 0.2.22), ω_{x0} has a C^p-inverse in a neighborhood of (x0, 0). The proposition now follows by noting that for values of α near φ(x0) we have φ^{−1} ∘ α = ω_{x0}^{−1} ∘ α. □

Proposition 2.2. Let M be a C^1-manifold. Then P_{T_M(A)} is a continuous function of A.

Proof. Given A ∈ Im φ, set M = dφ(φ^{−1}(A)). It is easy to see that P_{T_M(A)} = M(M*M)^{−1}M*. The conclusion now follows as dφ and φ^{−1} are continuous. □

Propositions 2.3 and 2.4 are a bit harder. We begin with a lemma.

Lemma 8.1. If B ∈ K is given and A is the closest point in M, then B − A ⊥ T_M(A). Moreover, ‖φ(x) − φ(y)‖/‖x − y‖ is uniformly bounded above and below for x, y in any B(0, r′), r′ < r.

Proof. Note that φ(x) = A + dφ(0)x + o(x), where o stands for a function with the property that o(x)/‖x‖ extends by continuity to 0 and takes the value 0 there. Thus we have

‖φ(x) − B‖^2 = ‖A + dφ(0)x + o(x) − B‖^2 = ‖A − B‖^2 + 2⟨A − B, dφ(0)x⟩ + o(‖x‖),
and hence the scalar product needs to be zero for all x. For the second claim, set w = (φ(y) − φ(x))/‖φ(y) − φ(x)‖ and apply the mean value theorem to γ(t) = ⟨φ(x + (y − x)t) − φ(x), w⟩ to conclude that

‖φ(y) − φ(x)‖ = γ(1) − γ(0) = ⟨dφ(z)(y − x), w⟩

for some z on the line between x and y. Letting σ1(dφ(z)), . . . , σm(dφ(z)) denote the singular values of dφ(z), we thus have

inf_{z ∈ B_{R^m}(0, r′)} {σm(dφ(z))} ‖y − x‖ ≤ ‖φ(y) − φ(x)‖ ≤ sup_{z ∈ B_{R^m}(0, r′)} {σ1(dφ(z))} ‖y − x‖.

Now, dφ(z) depends continuously on z, and its singular values depend continuously on the matrix entries [21, p. 191]. Since φ is an immersion we have σm(dφ(x)) ≠ 0 for all x ∈ B(0, r), so by compactness of cl(B(0, r′)) we get that both the inf and the sup amount to finite positive numbers, as desired. □

Since s has been specified above, we work with s0 in the formulation below.

Proposition 2.3. Let M be a C^2-manifold. Given any fixed A ∈ M, there exist s0 > 0 and a C^2-map π : B_K(A, s0) → M such that for all B ∈ B_K(A, s0) there exists a unique closest point in M, which is given by π(B). Moreover, C ∈ M ∩ B_K(A, s0) equals π(B) if and only if B − C ⊥ T_M(C).

Proof. We repeat the standard construction of a tubular neighborhood of M (see e.g. [9] or [40]). By standard differential geometry there exist r′ < r and C^1-functions f1, . . . , f_{n−m} : B_{R^m}(0, r′) → K with the property that

T_M(φ(x))^⊥ = Span{f1(x), . . . , f_{n−m}(x)}

for all x ∈ B_{R^m}(0, r′). Moreover, applying the Gram-Schmidt process we may assume that {f1(x), . . . , f_{n−m}(x)} is an orthonormal set for all x ∈ B_{R^m}(0, r′). Define τ : B_{R^m}(0, r′) × R^{n−m} → K via

τ(x, y) = φ(x) + Σ_{i=1}^{n−m} y_i f_i(x).

τ is C^1 by construction, so the inverse function theorem implies that there exists an r1 such that τ is a diffeomorphism from B_{R^n}(0, r1) onto a neighborhood of A. Choose s0 such that s0 < r1/4, 2s0 < s and

(8.1) B_K(A, 2s0) ⊂ τ(B(0, r1/2)).
Given any B ∈ B_K(A, s0) there thus exists a unique (x_B, y_B) such that B = τ((x_B, y_B)) and ‖(x_B, y_B)‖ ≤ r1/2. We define π(B) = φ(x_B). To see that π is a C^1-map, let θ : R^m × R^{n−m} → R^m be given by θ((x, y)) = x and note that on B(A, s0) we have

π = φ ∘ θ ∘ (τ|_{B_{R^n}(0, r1)})^{−1}.

For the fact that π is actually C^2 (which is not needed in this paper), we refer to [31] or Sec. 14.6 in [23]. We now show that π(B) has the desired properties. Suppose C ∈ M is a closest point to B. Since ‖A − B‖ < s0 we clearly must have ‖C − A‖ < 2s0, so by (2.1) and (8.1) there exists an x_C ∈ B_{R^m}(0, r1/2) with φ(x_C) = C. Moreover, by Lemma 8.1, B − C ⊥ T_M(C), so (by the orthonormality of the f's at x_C) there exists a y ∈ R^{n−m} with τ(x_C, y) = B and ‖y‖ = ‖B − C‖ < s0 < r1/4. Since ‖(x_C, y)‖ ≤ r1/2 + r1/4 and τ is a bijection on B(0, r1), we conclude that (x_C, y) = (x_B, y_B). Hence π(B) = φ(x_B) = φ(x_C) = C. This establishes the first part of the proposition.

By the construction, π(B) − B ∈ Span{f1(x_B), . . . , f_{n−m}(x_B)}, which is orthogonal to T_M(φ(x_B)) = T_M(π(B)). Conversely, let C be as in the second part of the proposition. As above we have C = φ(x_C) with ‖x_C‖ < r1/2, and there exists a y with B = τ((x_C, y)) and ‖y‖ = ‖B − C‖ < 2s0 < r1/2. As earlier, this implies C = π(B), as desired. □

Proposition 2.4. Let M be a locally C^2-manifold at A. For each ε > 0 there exists s_ε > 0 such that for all C ∈ B_K(A, s_ε) ∩ M we have
(i) dist(D, T̃_M(C)) < ε‖D − C‖ for all D ∈ B(A, s_ε) ∩ M,
(ii) dist(D, M) < ε‖D − C‖ for all D ∈ B(A, s_ε) ∩ T̃_M(C).

Proof. We first prove (i). We may clearly assume that s_ε < s. Set w = D − (C + dφ(x_C)(x_D − x_C)) and note that dist(D, T̃_M(C)) ≤ ‖w‖. Apply the mean value theorem to the function

γ(t) = ⟨φ((x_D − x_C)t + x_C) − φ(x_C) − dφ(x_C)(x_D − x_C)t, w/‖w‖⟩

to conclude that there exists a y with x_C + y on the line [x_C, x_D] between x_C and x_D such that

‖w‖ = γ(1) − γ(0) = ⟨(dφ(y + x_C) − dφ(x_C))(x_D − x_C), w/‖w‖⟩.
By the Cauchy-Schwarz inequality we get

dist(D, T̃_M(C)) ≤ ‖dφ(y + x_C) − dφ(x_C)‖ ‖x_D − x_C‖

for some y as above. By Lemma 8.1 and the equicontinuity of continuous functions on compact sets, we can choose an r_ε < r such that dist(D, T̃_M(C)) < ε‖D − C‖ for all x_D, x_C ∈ B(0, r_ε). As in the proof of Proposition 2.3 we can now define τ and use it to pick an s_ε such that x_C, x_D ∈ B(0, r_ε) whenever C, D ∈ B(A, s_ε). Hence (i) holds.

The proof of (ii) is similar. We assume without restriction that C ∈ Im φ and let y be such that D = C + dφ(x_C)y. We may choose s_ε small enough that ‖x_C + y‖ < r, and then clearly dist(D, M) ≤ ‖φ(x_C + y) − D‖. Setting w = φ(x_C + y) − D and applying the mean value theorem to γ(t) = ⟨φ(x_C + yt) − C − dφ(x_C)yt, w/‖w‖⟩, we easily obtain

dist(D, M) ≤ ‖w‖ ≤ ‖dφ(x_C + t0 y) − dφ(x_C)‖ ‖y‖

for some t0 ∈ [0, 1]. We omit the remaining details. □

9. Appendix B

When dealing with concepts such as dimension or the ideal generated by a variety, we will use R and C as subscripts when there can be confusion about which field is involved. To begin, a short argument shows that

(9.1) I_C(V_Zar) = I_R(V) + iI_R(V),

from which (6.2) easily follows. In case we work over C, we define ∇ as ∇ = (∂_{z1}, . . . , ∂_{zn}), where ∂_{zj} refers to the formal partial derivatives, or, which is the same, the standard derivatives for analytic functions (see e.g. Definition 3.3, Ch. II of [30]). We need the following classical result from algebraic geometry; see e.g. Theorem 3 and the comments following it, Ch. II, Sec. 1 of [47].

Theorem 9.1. Let V be an irreducible complex algebraic variety with algebraic dimension d. Then

dim_C Span_C{∇p(A) : p ∈ I_C(V)} < n − d

for all singular points A ∈ V and

dim_C Span_C{∇p(A) : p ∈ I_C(V)} = n − d

for all non-singular points A ∈ V. Moreover, the set of singular points forms a complex variety of lower dimension.
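For a hypersurface, the rank criterion in Theorem 9.1 can be applied directly. A small illustration of ours: the cuspidal cubic y^2 = x^3, an irreducible curve with d = 1 in n = 2 variables, whose ideal is generated by the single polynomial p below, so non-singular points are exactly those where the gradient has rank n − d = 1:

```python
import sympy as sp

x, y = sp.symbols('x y')
p = y**2 - x**3                        # defining polynomial of the cusp
grad = [sp.diff(p, x), sp.diff(p, y)]  # (-3x^2, 2y)

# Singular points: p and its gradient vanish simultaneously.
sing = sp.solve([p] + grad, [x, y], dict=True)
print(sing)  # only the origin is singular
```

At every other point of the curve the gradient spans the one-dimensional normal space, matching dim N_V(A) = n − d.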
If P is a set of polynomials, we will write V(P) for the variety of common zeroes (in R^n or C^n depending on the context). Given two sets S1 and S2 and a common point A, we will write S1 =_loc S2 near A if there exists an open set U containing A such that S1 ∩ U = S2 ∩ U. Theorem 9.1 shows that, for an irreducible variety, the non-singular points coincide with the points of maximal rank in the work of H. Whitney [53]. This observation combined with Sections 6 and 7 of that paper yields the following key result.

Theorem 9.2. Let V be an irreducible complex algebraic variety and A a non-singular point. Given any p1, . . . , p_{n−d} ∈ I(V) such that ∇p1(A), . . . , ∇p_{n−d}(A) are linearly independent, we have V =_loc V({p1, . . . , p_{n−d}}) near A.

The remark after Proposition 6.4 now easily follows from the above results and the complex version of the implicit function theorem (see e.g. Theorem 3.5, Ch. II, [30]). In order to transfer these results to the real setting, we first need a lemma.

Lemma 9.3. A real algebraic variety V is irreducible in R^n if and only if V_Zar is irreducible in C^n.

Proof. If V has a nontrivial decomposition V = U1 ∪ U2 into smaller varieties in R^n, then clearly (U1)_Zar ∪ (U2)_Zar is a nontrivial decomposition of V_Zar. Conversely, let V_Zar = U1 ∪ U2 be a nontrivial decomposition of V_Zar in C^n. Let I_C(Ui) denote the respective complex ideals, i = 1, 2, and set Ii = (Re I_C(Ui)) ∪ (Im I_C(Ui)). The real variety corresponding to each Ii clearly coincides with Ui ∩ R^n, which thus is a variety in R^n. Since V_Zar is the smallest complex variety including V, V is not a subset of either U1 or U2, and hence V = (U1 ∩ R^n) ∪ (U2 ∩ R^n) is a nontrivial real decomposition of V. □

Lemma 9.4. Let V be an irreducible real algebraic variety such that V_Zar has algebraic dimension d. Then dim_R N_V(A) < n − d for all singular points A ∈ V and dim_R N_V(A) = n − d for all non-singular points A ∈ V.

Proof.
First note that V_Zar is irreducible by Lemma 9.3, and hence Theorem 9.1 applies to V_Zar. Given any real vector space V, it follows by linear algebra that dim_R V = dim_C(V + iV). By (9.1) we thus get

dim_R Span_R{∇p(A) : p ∈ I_R(V)} = dim_C Span_C{∇p(A) : p ∈ I_C(V_Zar)}

for all A ∈ R^n, and hence the lemma follows by Theorem 9.1. □

Lemma 9.5. Let V be an irreducible real algebraic variety such that V_Zar has algebraic dimension d. Then V^ns is a C^∞-manifold of dimension d and V \ V^ns is a real algebraic variety such that (V \ V^ns)_Zar has algebraic dimension strictly less than d.

Proof. By Lemma 9.3, V_Zar is irreducible. By Theorem 9.1, the singular points of V_Zar form a proper subvariety of dimension strictly less than d. Hence V \ V^ns is included in a complex subvariety of lower dimension than d. Since the algebraic dimension decreases when taking subvarieties, we conclude that dim(V \ V^ns)_Zar < d. If V^ns were empty, (V \ V^ns)_Zar would include all of V, contradicting the definition of V_Zar as the smallest complex variety including V. Finally, given A ∈ V^ns, Theorem 9.2 and Lemma 9.4 imply that we can find p1, . . . , p_{n−d} ∈ I_R(V) with linearly independent gradients at A such that

(9.2) V^ns =_loc V({p1, . . . , p_{n−d}}) near A.

The fact that V^ns is a d-dimensional C^∞-manifold now follows directly from Theorem 2.1.2 (ii) in [9]. □

We are now ready to prove the results in Section 6.

Proof of Propositions 6.2 and 6.4. We begin with Proposition 6.2. First assume that V is irreducible and let V_Zar have algebraic dimension d. By Lemma 9.5, V^ns is a non-empty manifold of dimension d, and (V \ V^ns)_Zar has algebraic dimension strictly less than d. (V \ V^ns)_Zar can be decomposed into finitely many irreducible components of dimension strictly less than d (see Sections 4.6 and 9.4 in [15]). Lemma 9.5 can then be applied to the real part of each such component.
Continuing like this, the dimension drops at each step and hence the process must terminate. This process gives us a decomposition V = ∪_{j=0}^d M_j where each M_j has dimension j. Now, such decompositions are not unique, but basic differential geometry implies that the number d is an invariant of V. Indeed, let V = ∪_{j=0}^m M̃_j be another such decomposition with M̃_m ≠ ∅, and suppose that m > d. Let φ be a chart covering a patch of M̃_m, as in (2.1). φ is then defined on an open subset U of R^m, and the subsets φ^{−1}(M_j) are manifolds of dimension strictly less than m. Hence each one has Lebesgue measure zero, which is not compatible with their union being equal to U. Reversing the roles of m and d, Proposition 6.2 follows in the irreducible case. Incidentally, we have also shown the first part of Proposition 6.4. If V is not irreducible, we can apply the above argument to each of its irreducible components. Since the dimension of V_Zar is the maximum of the dimensions of its components (Proposition 8, Sec. 9.6, [15]), Proposition 6.2 follows as above. Finally, with A ∈ V^ns, the identity (6.3) follows by (9.2) and the implicit function theorem. This establishes the second part of Proposition 6.4 and we are done. □

Proof of Proposition 6.3. This is now immediate by Lemma 9.4. □

Proof of Proposition 6.5. V_Zar is a strict subvariety of both (V1)_Zar and (V2)_Zar, and all three are irreducible by assumption and Lemma 9.3. Hence V_Zar has strictly lower dimension than the other two by basic algebraic geometry; see e.g. Theorem 1, Ch. I, Sec. 6, in [47]. By Lemma 9.5 the manifold V^ns then has strictly lower dimension than both V1^ns and V2^ns. This means that these have a proper intersection at any intersection point A ∈ V^ns ∩ V1^ns ∩ V2^ns, which by Proposition 3.2 means that the angle at A is defined, as desired. □

Proof of Theorem 6.6. By Proposition 6.5, all points A ∈ V^ns ∩ V1^ns ∩ V2^ns are non-trivial, i.e.
the angle between V1^ns and V2^ns exists at such points. We first prove the latter statement. By Proposition 6.4, (6.4) implies that

(9.3) dim(N_{V1}(A) + N_{V2}(A)) ≥ n − m.

Since obviously

(9.4) N_V(A) ⊃ N_{V1}(A) + N_{V2}(A),

we have dim(N_V(A)) ≥ n − m, which combined with Lemma 9.4 and Proposition 6.2 shows that in fact we must have dim(N_V(A)) = n − m. Moreover, A ∈ V^ns by Lemma 9.4, and combining (9.3) and (9.4) we get N_V(A) = N_{V1}(A) + N_{V2}(A), which upon taking orthogonal complements and recalling (6.3) yields

T_{V1^ns}(A) ∩ T_{V2^ns}(A) = T_{V^ns}(A).

By Proposition 3.5 we conclude that A is non-tangential, so A ∈ V^{ns,nt}. For the first statement, we note that V \ (V1^ns ∩ V2^ns ∩ V^ns) = (V \ V1^ns) ∪ (V \ V2^ns) ∪ (V \ V^ns) and V \ Vi^ns = V ∩ (Vi \ Vi^ns) for i = 1, 2. Hence V \ (V1^ns ∩ V2^ns ∩ V^ns) is a real algebraic variety by Lemma 9.5 and the trivial fact that unions and intersections of varieties yield new varieties. Now, suppose A ∈ V1^ns ∩ V2^ns ∩ V^ns is not in V^{ns,nt}. Then it is tangential, which by the earlier arguments happens if and only if

(9.5) dim(N_{V1}(A) + N_{V2}(A)) < n − m.

By Hilbert's basis theorem we can pick finite sets {p_{j,1}} and {p_{j,2}} such that I_R(Vl) is generated by these sets for l = 1, 2. Let M be the matrix with the gradients of each of the p_{j,l} as columns. The condition (9.5) can then be reformulated as the vanishing of the determinants of all (n − m) × (n − m) submatrices of M. Since each such determinant is a polynomial, we conclude that V \ V^{ns,nt} is defined by the vanishing of a finite number of polynomials, so it is a real algebraic variety. Finally, if V^{ns,nt} is not void, then (V \ V^{ns,nt})_Zar is a proper subvariety of V_Zar. The latter is irreducible by Lemma 9.3, and hence (V \ V^{ns,nt})_Zar has lower dimension than V_Zar by standard algebraic geometry (see e.g. Theorem 1, Ch. I, Sec. 6, [47]). □

Proof of Proposition 6.8. It is well known that V is irreducible if and only if I_R(V) is prime, i.e.
Proof of Proposition 6.8. It is well-known that V is irreducible if and only if I_R(V) is prime, i.e. if and only if f g ∈ I_R(V) implies that either f ∈ I_R(V) or g ∈ I_R(V) (see e.g. Proposition 3, Chap. 4, Sec. 5 of [15]). Suppose now that we have such a product f g and that V is covered by analytic patches. Given any fixed A ∈ V, let i_0 ∈ I be as in Definition 6.7. Then (f ∘ φ_{i_0})(g ∘ φ_{i_0}) ≡ 0, which by analyticity implies that one of the functions, say f ∘ φ_{i_0}, vanishes identically. Note that V is path connected. To see this, use the analytic patches to show that any path-connected component is both open and closed, which gives the desired conclusion since V is connected (see e.g. Sec. 23-25 in [41]). Now, let B ∈ V be any other point and let γ be a continuous path connecting A with B. The image Im γ is compact and {B_{R^n}(C, r_C)}_{C ∈ Im γ} is an open covering, where r_C is as in Definition 6.7. We pick a finite subcovering at the points {C_l}_{l=0}^{L} with C_0 = A and C_L = B. Clearly these can be ordered such that V ∩ B(C_l, r_{C_l}) ∩ B(C_{l+1}, r_{C_{l+1}}) ≠ ∅. Also let i_l be the index of the covering of B(C_l, r_{C_l}) in accordance with Definition 6.7, where i_0 already has been chosen. Let D ∈ V ∩ B(C_0, r_{C_0}) ∩ B(C_1, r_{C_1}) be given and let R be a radius such that B(D, R) ⊂ B(C_0, r_{C_0}) ∩ B(C_1, r_{C_1}). By assumption, f vanishes on all points of V ∩ B(C_0, r_{C_0}), and hence f ∘ φ_{i_1} vanishes on the open set φ_{i_1}^{-1}(B(D, R)), which by analyticity means that it vanishes identically on Ω_{i_1} (since it is assumed to be connected). By induction it follows that f(B) = 0, and the first part is proved. The second is a simple consequence of continuity. We omit the details.

Proof of Proposition 6.9. Consider the case when V is covered by analytic patches (the proof in the second case is easier and will be omitted). By Proposition 6.8, V is irreducible. We first show that V^{ns} is dense in V. Since V \ V^{ns} is a nontrivial subvariety by Lemma 9.5, there exists a polynomial f which vanishes on V \ V^{ns} but not on V. However, if V \ V^{ns} contains an open set, then the argument in Proposition 6.8 shows that f ≡ 0 on V, a contradiction. Now, let θ : U → V be the bijection in question, where U ⊂ R^d and V ⊂ V are open. Let m be the dimension of V and pick any A ∈ V ∩ V^{ns}. By Proposition 6.4 and (2.1), there exist open sets Ũ ⊂ R^m and Ṽ ⊂ V containing A and a C^∞-bijection φ : Ũ → Ṽ. Moreover, by Proposition 2.1, φ^{-1} ∘ θ is bijective and differentiable between the open sets θ^{-1}(Ṽ) ⊂ R^d and Ũ ⊂ R^m. That m = d is now a well-known consequence of the implicit function theorem.

10. Appendix C

If we were to write out all the details, this section would get rather long. Considering that it is just an illustration, we will be a bit brief.

Proof of Proposition 7.1. Given a matrix A ∈ K, we can consider all elements on and above the diagonal as variables, and the remaining ones to be determined by A^T = A. It follows that K is a linear space of dimension n(n+1)/2. The statements concerning V_1 are now immediate. To see that V_2 is a real algebraic variety, note that B has rank greater than k if and only if one can find an invertible (k+1) × (k+1) minor (that is, a matrix obtained by deleting n − (k+1) rows and columns). The determinant of each such minor is a polynomial (more precisely, the determinant composed with the map that identifies K with R^{(n²+n)/2}), and V_2 is clearly the variety obtained from the collection of such polynomials. Thus V_2 is a real algebraic variety. The same is true for V = V_1 ∩ V_2, since it is obtained by adding the algebraic equations {B_{j,j} = 1}_{j=1}^{n} to those defining V_2. We now study V_2. We denote by M_{i,j} the set of i × j matrices with real entries.
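The description of V_2 by vanishing minors can be illustrated numerically (a small aside of ours in pure Python, not from the paper): for k = 1, a rank-one symmetric matrix satisfies all the defining polynomials, i.e. every 2 × 2 minor vanishes, whereas a rank-two matrix violates at least one of them.

```python
from itertools import combinations

def minor2(B, rows, cols):
    """Determinant of the 2 x 2 submatrix of B picked out by rows/cols."""
    (r1, r2), (c1, c2) = rows, cols
    return B[r1][c1] * B[r2][c2] - B[r1][c2] * B[r2][c1]

def all_2x2_minors(B):
    """All 2 x 2 minors of the square matrix B."""
    idx = list(combinations(range(len(B)), 2))
    return [minor2(B, r, c) for r in idx for c in idx]

u = [1, 2, 3]
rank_one = [[a * b for b in u] for a in u]    # u u^T, a rank-one symmetric matrix
rank_two = [[2, 1, 0], [1, 2, 0], [0, 0, 0]]  # symmetric, rank two

rank_one_minors = all_2x2_minors(rank_one)    # all vanish
rank_two_minors = all_2x2_minors(rank_two)    # at least one is non-zero
```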
By the spectral theorem, each B ∈ K with Rank B ≤ k can be written as

B = U Σ U^T (10.1)

where U ∈ M_{n,k} and Σ ∈ M_{k,k} is a diagonal matrix, and conversely any matrix given by such a product has rank less than or equal to k. We see that V_2 can be covered with one real polynomial patch, which by Proposition 6.8 shows that V_2 is irreducible and that Proposition 6.9 applies. In order to determine the dimension, consider the open subset of V_2 of matrices with positive eigenvalues. This has the easier parametrization U U^T, where again U ∈ M_{n,k} is arbitrary. However, neither is this parametrization bijective. In fact, if B = U U^T, then any other such parametrization of B is given by B = (U W)(U W)^T where W ∈ M_{k,k} is unitary. Pick any B_0 = U_0 U_0^T such that the upper k × k submatrix of U_0 is invertible, and denote this submatrix by V_0. By the theory of QR-factorizations, there is a unique unitary matrix W_0 for which V_0 W_0 is lower triangular and has positive values on the diagonal [32, Theorem 1, p. 262]. The set of n × k matrices that are lower triangular contains nk − (k−1)k/2 independent variables, so we can identify the set of such matrices with R^{nk − (k−1)k/2}. Denote the inverse of this identification by ι : R^{nk − (k−1)k/2} → M_{n,k}, and let Ω ⊂ R^{nk − (k−1)k/2} be the open set corresponding to the matrices with strictly positive diagonal elements. Define φ : Ω → V_2 by

φ(y) = ι(y)(ι(y))^T. (10.2)

It is easy to see that φ is in bijective correspondence with an open set including B_0, and moreover φ is a polynomial. Thus Proposition 6.9 implies that V_2 has dimension (2nk − k² + k)/2, as desired.

We turn our attention to V and first prove that it can be covered with analytic patches. Let σ : {1, ..., n} → {1, ..., k} and τ : {1, ..., n} → {−1, 1} be given and consider all U ∈ M_{n,k} where U_{j,σ(j)} = x_j is an undetermined variable whereas all other values are fixed. Denote the j-th row of U by U_j, and let Σ ∈ M_{k,k} be a fixed diagonal matrix. Then U_j Σ U_j^T = 1 is a quadratic equation with x_j as the unknown.
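Since Σ is diagonal, the equation U_j Σ U_j^T = 1 reads Σ_{σ(j),σ(j)} x_j² + Σ_{c ≠ σ(j)} Σ_{c,c} U_{j,c}² = 1, so when two real solutions exist they differ only in sign and τ(j) singles out one of them. A minimal sketch of this root selection (our own illustration; the function name and the numerical values are arbitrary):

```python
import math

def solve_xj(sigma_diag, row, var_col, tau_j):
    """Solve U_j Sigma U_j^T = 1 for the entry x_j = row[var_col],
    with Sigma diagonal; pick the root whose sign matches tau_j.
    Returns None when there are not two real solutions."""
    rest = sum(s * v * v
               for c, (s, v) in enumerate(zip(sigma_diag, row))
               if c != var_col)
    t = (1.0 - rest) / sigma_diag[var_col]
    if t <= 0:
        return None
    return tau_j * math.sqrt(t)

sigma = [2.0, 1.0]   # diagonal of Sigma, fixed
row = [0.0, 0.5]     # row U_j; entry 0 is the unknown x_j
x_j = solve_xj(sigma, row, var_col=0, tau_j=-1)  # negative root chosen

row[0] = x_j
check = sum(s * v * v for s, v in zip(sigma, row))  # = U_j Sigma U_j^T
```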
It may have 0, 1, 2 or infinitely many real solutions. Suppose the remaining values of U are such that it has two solutions for all 1 ≤ j ≤ n, and fix x_j to be the solution whose sign coincides with τ(j). Denote the corresponding matrix by Ũ and note that it is a real analytic function if we now consider the remaining values of Ũ as variables. These variables and the values on the diagonal of Σ are n(k − 1) + k in number, and so can be identified with points y in an open subset of R^{nk − n + k}. Let Ω be a particular connected component of this open set. Consider Ũ and Σ as functions of y on Ω, in the obvious way, and set

ψ_{σ,τ,Ω}(y) = Ũ(y) Σ(y) (Ũ(y))^T, y ∈ Ω. (10.3)

Let I be the set of all possible triples σ, τ, Ω. By the spectral theorem one easily sees that each B ∈ V is in the image of at least one of these maps ψ_i, i ∈ I. It is now not hard to see that {ψ_i}_{i∈I} is a covering with analytic patches of V. We wish to use Proposition 6.8 to conclude that V is irreducible, but first we need to show that V is connected. The proof gets very lengthy and technical, so we will only outline the details. The idea is that any matrix B ∈ V is path connected with the matrix 1 with all elements equal to 1. To see this, first note that the subset of V that can be represented as in (10.1) with all elements of U non-zero is dense in V. Hence it suffices to find a path from such an element to 1. Let B be fixed. Now, not all values in Σ can be negative, for then the diagonal values of B would be negative as well. We can assume that the diagonal elements of Σ are ordered decreasingly and that Σ_{1,1} = 1. Pick σ such that σ(j) = k for all j, and choose τ and Ω such that the representation (10.1) can be written in the form (10.3). Now, if the second diagonal value in Σ is negative, we may continuously change it until it is not, without leaving Ω. Then the values of y corresponding to the first and second column of Ũ can be continuously moved until all elements of the first column are positive. At this point, we can reduce all values of Ũ except the first column to zero, increasing the first value of each row whenever necessary to stay in Ω. Then we can move y so that the values in the first column become the same. Finally, we can let these values increase simultaneously until they reach 1. We have now obtained the matrix 1, and conclude that V is connected and hence also irreducible, as desired.

Finally, we shall determine the dimension of V. Consider again the map ι introduced earlier, with the difference that this time the last lower diagonal element in each row is not a variable, but instead determined by the other variables in that row and the constraints that it be strictly positive and that the norm of the row be 1. The number of free variables is thus (2nk − k² + k)/2 − n, and the above construction thus naturally defines a real analytic map on an open subset Ξ of R^{(2nk − k² + k)/2 − n}. Denote this map by θ and define ψ : Ξ → V by ψ(y) = θ(y)(θ(y))^T. It is not hard to see that ψ is a bijection with an open subset of V, so by Proposition 6.9 the dimension of V is (2nk − k² + k)/2 − n, as desired.

Proof of Proposition 7.2. By Propositions 6.3 and 7.1 we need to show that

dim N_{V_2}(A) = (n² + n)/2 − (2nk − k² + k)/2 = (n − k + 1)(n − k)/2

if and only if Rank(A) = k. By the same propositions we already have that dim N_{V_2}(A) ≤ (n − k + 1)(n − k)/2, so it suffices to show that this inequality is strict when Rank(A) < k and that the reverse inequality holds when Rank(A) = k. Based on the representation (10.1), it is easily seen that each A ∈ V_2 can be written

A = U Σ U^T (10.4)

where now both Σ and U lie in M_{n,n}, U is unitary and Σ is diagonal with only 0's after the k-th element. In this proof the particular identification of K with R^{(n²+n)/2} is important.
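As a quick sanity check on the codimension arithmetic above (our own aside, not part of the proof), the identity (n² + n)/2 − (2nk − k² + k)/2 = (n − k + 1)(n − k)/2 can be verified exactly over a range of parameters:

```python
# Verify (n^2 + n)/2 - (2nk - k^2 + k)/2 == (n - k + 1)(n - k)/2
# for all 1 <= k <= n <= 30. Both sides are half-integers, so we
# compare twice each side to stay in exact integer arithmetic.
mismatches = [
    (n, k)
    for n in range(1, 31)
    for k in range(1, n + 1)
    if (n * n + n) - (2 * n * k - k * k + k) != (n - k + 1) * (n - k)
]
```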
We define ω : R^{(n²+n)/2} → K by letting the first n entries be placed on the diagonal and the remaining ones be distributed over the upper triangular part, but multiplied by the factor 1/√2. Finally, the lower triangular part is defined by A^T = A. This identification will be implicit. For example, if p is a polynomial on K then we will write ∇p instead of the correct ω(∇(p ∘ ω)). Note however that ∇p depends on ω. Now, given a polynomial p ∈ I(V_2) and C ∈ M_{n,n}, q_C(·) = p(C^T · C) is clearly also in I(V_2). Due to the particular choice of ω, we have that ∇q_C(B) = C^T ∇p(C^T B C) C, as can be verified by direct computation. Letting A be a fixed matrix of rank j ≤ k, it is easy to use (10.4) to produce an invertible matrix C such that C^T A C = I_j, where I_j ∈ K is the diagonal matrix whose first j diagonal values are 1 and which is 0 elsewhere. In particular

∇q_C(A) = C^T ∇p(I_j) C,

which implies that dim N_{V_2}(A) = dim N_{V_2}(I_j). Now, all (k+1) × (k+1) subdeterminants form polynomials in I(V_2), and their derivatives at I_k are easily computed by hand. In this way one easily gets dim N_{V_2}(I_k) ≥ (n − k + 1)(n − k)/2, which proves that any rank k element of V_2 is non-singular. Conversely, if j < k, consider a fixed u ∈ R^n as a row-vector and define the map θ_u : R → V_2 via θ_u(x) = I_j + x u^T u. Letting {e_l}_{l=1}^{n} be the standard basis of R^n and considering θ_{e_l} as well as all differences θ_{e_l + e_{l'}} − θ_{e_l} − θ_{e_{l'}}, one easily sees that

Span { (d/dx) θ_u(0) : u ∈ R^n } = K,

and hence dim N_{V_2}(I_j) = 0. This shows that I_j is singular, by the remarks in the beginning of the proof.

Finally, we shall establish that V^{ns,nt} is not void. By Proposition 6.5, Theorem 6.6, Proposition 7.1 and the first part of this proposition, it suffices to show that

dim(T_{V_1}(A) ∩ T_{V_2}(A)) ≤ (2nk − k² + k)/2 − n (10.5)

for some point A which has rank k. We choose the point A = U U^T where U ∈ M_{n,k} has a 1 on the last lower diagonal element of each row (i.e. with index (j, j) for j ≤ k and (j, k) for j > k), and zeroes elsewhere. Recall the map φ in (10.2) and let y ∈ R^{(2nk − k² + k)/2} be such that A = φ(y). Given (i, j), let E_{(i,j)} be the matrix with a 1 on positions (i, j) and (j, i) and zeroes elsewhere. The partial derivatives of φ at y contain multiples of all E_{(i,j)} with i ≤ j < k, as well as n − k + 1 derivatives related to the last row of U, which we denote by F_1, ..., F_{n−k+1}. These are not hard to compute, but we only need the fact that each F_l has precisely one non-zero diagonal value on one of the n − k + 1 last elements on the diagonal (with distinct positions for distinct l's). Since T_{V_1}(A) = Span {E_{i,j} : i ≠ j}, it is easily seen that

T_{V_1}(A) ∩ T_{V_2}(A) = Span {E_{i,j} : i < j, i < k}.

These are n(n−1)/2 − (n − k + 1)(n − k)/2 in number, and (10.5) follows.

11. Acknowledgements

This work was supported by the Swedish Research Council and the Swedish Foundation for International Cooperation in Research and Higher Education, as well as by the Dirección de Investigación Científica y Tecnológica de la Universidad de Santiago de Chile. Part of the work was conducted while Marcus Carlsson was employed at Universidad de Santiago de Chile. We thank Arne Meurman for fruitful discussions. We would also like to thank one of the reviewers for the constructive criticism and useful suggestions that have helped to improve the quality of the paper.

References

[1] P. A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds, Princeton University Press, 2008.
[2] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, 35 (2010), pp. 438-457.
[3] C. Badea, S. Grivaux, and V. Müller, A generalization of the Friedrichs angle and the method of alternating projections, Comptes Rendus Mathematique, 348 (2010), pp. 53-56.
[4] S. Basu, R. Pollack, and M.-F. Roy, Algorithms in real algebraic geometry, vol. 10 of Algorithms and Computation in Mathematics, Springer-Verlag, Berlin, 2006.
[5] H. H. Bauschke and J. M. Borwein, On the convergence of von Neumann's alternating projection algorithm for two sets, Set-Valued Analysis, 1 (1993), pp. 185-212.
[6] H. H. Bauschke and J. M. Borwein, Dykstra's alternating projection algorithm for two sets, Journal of Approximation Theory, 79 (1994), pp. 418-443.
[7] H. H. Bauschke and J. M. Borwein, On projection algorithms for solving convex feasibility problems, SIAM Rev., 38 (1996), pp. 367-426.
[8] H. H. Bauschke, D. Noll, A. Celler, and J. M. Borwein, An EM algorithm for dynamic SPECT, IEEE Trans. Med. Imag., 18 (1999).
[9] M. Berger and B. Gostiaux, Differential Geometry: Manifolds, Curves and Surfaces, Springer Verlag, 1988.
[10] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Now Publishers, 2011.
[11] S. Boyd and L. Xiao, Least squares covariance matrix adjustment, SIAM J. Matrix Anal. Appl., 27 (2005), pp. 532-546.
[12] L. M. Brègman, Finding the common point of convex sets by the method of successive projection, Dokl. Akad. Nauk SSSR, 162 (1965), pp. 487-490.
[13] J. A. Cadzow, Signal enhancement: a composite property mapping algorithm, IEEE Transactions on Acoustics, Speech and Signal Processing, 36 (1988), pp. 49-62.
[14] P. L. Combettes and H. J. Trussell, Method of successive projections for finding a common point of sets in metric spaces, Journal of Optimization Theory and Applications, 67 (1990), pp. 487-507.
[15] D. A. Cox, J. B. Little, and D. O'Shea, Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra, vol. 10, Springer Verlag, 2007.
[16] F. Deutsch, Rate of convergence of the method of alternating projections, Parametric Optimization and Approximation, (1985), pp. 96-107.
[17] F. Deutsch, Best approximation in inner product spaces, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, 7, Springer-Verlag, New York, 2001.
[18] F. Deutsch and H. Hundal, The rate of convergence for the method of alternating projections, II, Journal of Mathematical Analysis and Applications, 205 (1997), pp. 381-405.
[19] R. L. Dykstra, An algorithm for restricted least squares regression, Journal of the American Statistical Association, (1983), pp. 837-842.
[20] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218.
[21] J. N. Franklin, Matrix theory, Dover Publications, Mineola, N.Y., 2000.
[22] K. Friedrichs, On certain inequalities and characteristic value problems for analytic functions and for functions of two variables, Trans. Amer. Math. Soc., 41 (1937), pp. 321-364.
[23] D. Gilbarg and N. Trudinger, Elliptic partial differential equations of second order, vol. 224, Springer Verlag, 2001.
[24] K. M. Grigoriadis, A. E. Frazho, and R. E. Skelton, Application of alternating convex projection methods for computation of positive Toeplitz matrices, IEEE Transactions on Signal Processing, 42 (1994), pp. 1873-1875.
[25] I. Grubišić and R. Pietersz, Efficient rank reduction of correlation matrices, Linear Algebra and its Applications, 422 (2007), pp. 629-653.
[26] L. G. Gubin, B. T. Polyak, and E. V. Raik, The method of projections for finding the common point of convex sets, USSR Computational Mathematics and Mathematical Physics, 7 (1967), pp. 1-24.
[27] N. J. Higham, Computing the nearest correlation matrix: a problem from finance, IMA Journal of Numerical Analysis, 22 (2002), pp. 329-343.
[28] L. Hörmander, The analysis of linear partial differential operators. III, Springer, Berlin, 2007.
[29] S. Kayalar and H. L. Weinert, Error bounds for the method of alternating projections, Mathematics of Control, Signals, and Systems (MCSS), 1 (1988), pp. 43-59.
[30] K. Kendig, Elementary algebraic geometry, Springer-Verlag, 1977.
[31] S. G. Krantz and H. R. Parks, Distance to C^k hypersurfaces, Journal of Differential Equations, 40 (1981), pp. 116-120.
[32] P. D. Lax, Linear algebra and its applications, Pure and Applied Mathematics (Hoboken), Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second ed., 2007.
[33] A. Levi and H. Stark, Signal restoration from phase by projections onto convex sets, J. Opt. Soc. Am., 73 (1983), pp. 810-822.
[34] A. S. Lewis, D. Luke, and J. Malick, Local linear convergence for alternating and averaged nonconvex projections, Foundations of Computational Mathematics, 9 (2009), pp. 485-513.
[35] A. S. Lewis and J. Malick, Alternating projections on manifolds, Math. Oper. Res., 33 (2008), pp. 216-234.
[36] Y. Li, K. Liu, and J. Razavilar, A parameter estimation scheme for damped sinusoidal signals based on low-rank Hankel approximation, IEEE Transactions on Signal Processing, 45 (1997), pp. 481-486.
[37] B. Lu, D. Wei, B. L. Evans, and A. C. Bovik, Improved matrix pencil methods, in Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov. 1998, pp. 1433-1437.
[38] I. Markovsky, Structured low-rank approximation and its applications, Automatica, 44 (2008), pp. 891-909.
[39] R. J. Marks II, Alternating projections onto convex sets, in Deconvolution of Images and Spectra (2nd ed.), P. A. Jansson, ed., Academic Press, Inc., 1997, pp. 476-501.
[40] S. Montiel, A. Ros, and D. G. Babbitt, Curves and surfaces, vol. 69, American Mathematical Society, 2009.
[41] J. R. Munkres, Topology, Prentice Hall, Upper Saddle River, 2nd ed., 2000.
[42] N. K. Nikolski, Operators, functions, and systems: an easy reading. Vol. 1: Hardy, Hankel, and Toeplitz, vol. 92 of Mathematical Surveys and Monographs, American Mathematical Society, Providence, RI, 2002. Translated from the French by Andreas Hartmann.
[43] V. U. Prabhu and D. Jalihal, An improved ESPRIT based time-of-arrival estimation algorithm for vehicular OFDM systems, in Vehicular Technology Conference, VTC Spring 2009, IEEE 69th, April 2009, pp. 1-4.
[44] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, vol. 317 of Grundlehren der Mathematischen Wissenschaften, Springer-Verlag, Berlin, 1998.
[45] A. Sard, The measure of the critical values of differentiable maps, Bull. Amer. Math. Soc., 48 (1942), pp. 883-890.
[46] E. Schmidt, Zur Theorie der linearen und nichtlinearen Integralgleichungen. III. Teil, Math. Ann., 65 (1908), pp. 370-399.
[47] I. R. Shafarevich, Basic algebraic geometry, Springer-Verlag, New York, 1974. Translated from the Russian by K. A. Hirsch, Die Grundlehren der mathematischen Wissenschaften, Band 213.
[48] K. C. Toh, M. J. Todd, and R. H. Tütüncü, SDPT3, a MATLAB software package for semidefinite programming, version 1.3, Optimization Methods and Software, 11 (1999), pp. 545-581.
[49] K. C. Toh, M. J. Todd, and R. H. Tütüncü, On the implementation and usage of SDPT3, a MATLAB software package for semidefinite-quadratic-linear programming, version 4.0, Handbook on Semidefinite, Conic and Polynomial Optimization, (2012), pp. 715-754.
[50] D. Trotman, Singularity Theory, Scientific Publishing Co. Pte. Ltd., Oxford, 2007, ch. Lectures on Real Stratification Theory.
[51] R. H. Tütüncü, K. C. Toh, and M. J. Todd, Solving semidefinite-quadratic-linear programs using SDPT3, Mathematical Programming, 95 (2003), pp. 189-217.
[52] J. von Neumann, Functional Operators, Volume II: The Geometry of Orthogonal Spaces, Princeton University Press, 1950.
[53] H. Whitney, Elementary structure of real algebraic varieties, The Annals of Mathematics, 66 (1957), pp. 545-556.
[54] J. Xu and L. Zikatanov, The method of alternating projections and the method of subspace corrections in Hilbert space, Journal of the American Mathematical Society, 15 (2002), pp. 573-598.
[55] W. I. Zangwill, Nonlinear Programming, Prentice Hall, Englewood Cliffs, N.J., 1969.

Centre for Mathematical Sciences, Lund University, Sweden
E-mail address: [email protected]

Centre for Mathematical Sciences, Lund University, Sweden
E-mail address: [email protected]