
Journal of Network and Systems Management, Vol. 5, No. 2, 1997

Optimal Spare Capacity Preconfiguration for Faster Restoration of Mesh Networks

M. H. MacGregor,1 W. D. Grover,1,2 and K. Ryhorchuk1

Several distributed real-time methods have been proposed for restoration from single span failures in digital transport networks. These methods have the potential to avoid user service outages due to such failures, if they operate quickly enough. For example, switched 64 kbps connections will not be disconnected if the network can be restored before the time at which calls in progress are dropped, typically 1-2 seconds after a failure. However, it will be difficult to achieve the goal of sub-second restoration if cross-connects cannot operate crosspoints quickly enough, either due to large workloads during a restoration response, or because of implementation choices such as testing each cross-connection while in the midst of a serious outage. The results in this paper demonstrate that it can be useful to pre-operate selected crosspoints between the spare links of a mesh-restorable network before any failure has occurred, putting the network into a statistically optimal state of readiness. When a failure occurs, some of the preconfigured restoration path bundles can be used immediately. If more restoration paths are needed, they can be obtained by a real-time restoration process. The first advantage of preconfiguration is that the number of cross-connection operations may be greatly reduced or eliminated for a portion of the affected traffic. This will reduce restoration time significantly. Secondly, after utilizing preconfigured restoration paths, the workload of a real-time restoration process will be lower because it will be searching for fewer paths. This paper demonstrates that preconfiguration can supply a significant proportion of the replacement capacity required after a span failure. The results are obtained through integer programming.
KEY WORDS: Telecommunications; restoration; cross-connect.

1 TRLabs and University of Alberta, Department of Electrical and Computer Engineering, Edmonton, T6G 2E1, Canada.
2 To whom correspondence should be addressed at 800 Park Plaza, 10611-98 Ave., Edmonton, Alberta, Canada T5K 2P7. (E-mail: grover@edm.trlabs.ca)

1064-7570/97/0600-0159$12.50/0 © 1997 Plenum Publishing Corporation

1. INTRODUCTION

Protocols for real-time restoration of span failures in digital transport networks have been studied for several years [1-4]. Investigators generally agree that it is computationally feasible for distributed protocols to find a set of replacement paths in under 2 seconds, whereas centralized methods [5] require times on the order of minutes. The following method applies equally well to networks served by distributed or centralized restoration; benefits will accrue whenever the time required to implement crosspoint closures is a significant contributor to overall restoration times. For example, even if rapid distributed techniques are used to find the restoration pathset, existing digital cross-connect systems (DCS) may then require up to 100 msec, largely for continuity testing, to close each crosspoint [6, 7]. If this is true, the network-wide process which computes the reaction will be suitably rapid, but the DCS could be too slow in implementing the restoration pathset to meet real-time objectives on large span cuts.

One approach is to design DCS control complexes based on parallel-processing architectures and switch matrices in which many crosspoints can be operated in parallel [7]. The alternative considered here is to study the restoration pathset requirements of all possible span failures and pre-operate crosspoints between spare links in such a manner as to best match those requirements, in a statistical sense.
It is important to note that optimizing the preconfiguration of spare links is not the same as pre-planning the restoration pathsets for individual failures. In preconfiguration, the restoration process is still dynamic and based on the actual state of the network at the time of failure. The benefit is that some of the cross-connections which will be needed have already been set up, as opposed to having to implement all of the cross-connections specified by a previously computed pre-plan.

The objective of preconfiguration is to put the network into a state of best preparedness, in a maximum likelihood sense, for any single span failure. Obviously we cannot know which failure will occur, but we can attempt to be in a best state of preparedness either by assuming that the failure of any single span is equally likely, or by deriving a plan from known failure rate data. In this sense, a self-healing ring is a fully preconfigured degree 2 subnetwork. The equivalent arrangement for a generalized mesh network is presented here.

The basic idea of mesh network spare capacity preconfiguration has been presented previously [8] along with an initial assessment of its potential. The prior work showed that an average of 79% of the span-to-span flow required to restore any single span failure in 10 test networks could have been preconfigured in advance of the failure. Thus, we could reasonably expect preconfiguration to supply a useful fraction of the detailed pathset required for restoring any span, without any cross-connect deployment delays. The prior work was, however, only an assessment of the upper bound achievable by preconfiguration, as the calculation was based on the number and pattern of cross-connections required at each node in isolation. The means of determining exactly which link in the next span should be connected to a given link in the current span was left open.
This "coherence" problem for the individual paths is solved here by using an Integer Programming (IP) tableau to determine the detailed link-to-link coordination between spans at each node. This allows us to maximize the preconfigured readiness of a network given its working and spare capacities and a specified set of potential failures. Network readiness is a specific term which we use to quantify the effectiveness of a spare capacity preconfiguration plan. It is defined as the percentage of working capacity which could be restored immediately over preconfigured path bundles.

1.1. Preconfigured Path Bundles

A preconfigured bundle of paths, or just "bundle", is described by a route and the set of links used in each span on the route. The route of the bundle is a specified sequence of $k$ spans from a starting node $n_0$ to a destination node $n_k$. The cross-section of the bundle is the number of links used on each span, and is the same on each span in the route. A link is, for example, a single DS-3 or STS-3 on the span between neighboring DCS nodes. The links used by the bundle in one span must be cross-connected to the links used by the bundle on an adjacent span so that the bundle can actually carry as much flow as its cross-sectional capacity indicates.

For example, consider the five hop route (9, 7, 4, 3, 6, 8) in Fig. 1. A bundle following this route could be configured to yield a preconfigured flow of four units between nodes 8 and 9 if four links on each span were cross-connected to four links on the following span so as to create four continuous paths from 8 to 9. At node 6, we could cross-connect links 4-7 on span (6, 8) to links 1-4 on span (3, 6). Then at node 3, we would have to cross-connect links 1-4 on span (3, 6) to four links on span (3, 4). This is the meaning of "coherent" cross-connection: it is not sufficient just to say that four links on span (3, 6) are cross-connected to four links on span (3, 4).
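The link-level bookkeeping behind coherence can be illustrated with a small sketch. The Python below is illustrative only and is not the authors' implementation: the table layout and the names `span` and `trace_link` are assumptions, and the link numbers on span (3, 4) are hypothetical because the text does not specify them. It follows one link hop by hop through per-node cross-connect tables and reports whether an unbroken path exists.

```python
def span(a, b):
    """Name a span by its end nodes in sorted order, e.g. span(6, 3) -> (3, 6)."""
    return tuple(sorted((a, b)))


def trace_link(route, start_link, xconnects):
    """Follow one link along `route` through per-node cross-connect tables.

    xconnects[node] maps (incoming span, link) -> (outgoing span, link).
    Returns the link number used on each span, or None if the path is broken.
    """
    link = start_link
    links = [link]
    for prev, node, nxt in zip(route, route[1:], route[2:]):
        entry = xconnects.get(node, {}).get((span(prev, node), link))
        if entry is None or entry[0] != span(node, nxt):
            return None  # no cross-connection, or one that leaves the route
        link = entry[1]
        links.append(link)
    return links


# Node 6: links 4-7 on span (6, 8) go to links 1-4 on span (3, 6), as in the text.
# Node 3: links 1-4 on span (3, 6) go to links 2-5 on span (3, 4) (hypothetical).
xconnects = {
    6: {(span(6, 8), k): (span(3, 6), k - 3) for k in range(4, 8)},
    3: {(span(3, 6), k): (span(3, 4), k + 1) for k in range(1, 5)},
}

route = (8, 6, 3, 4)
paths = [trace_link(route, k, xconnects) for k in range(1, 8)]
coherent = [p for p in paths if p is not None]
print(coherent)       # link sequences that form unbroken paths
print(len(coherent))  # usable cross-section on this part of the route
```

Counting four links per span would not be enough here: only the starting links whose trace succeeds contribute to the usable cross-section.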
We must maintain the end-to-end coherence of the bundle all along route (9, 7, 4, 3, 6, 8) for the bundle to be useful in restoration. This means specifying exactly which individual links in each span are connected to specific links in other spans. There is considerably more information in such a solution than simply the flow assignments between spans at a node.

Note that this single preconfigured bundle can be used in the restoration of more than one independent failure. In case of a failure of span (8, 9) the bundle can be used along its entire route. In case of failure of span (7, 8) the sub-bundle (7, 4, 3, 6, 8) can be used. And in case of a (6, 7) failure, the bundle can be used along the partial route (7, 4, 3, 6). If bundle (9, 7, 4, 3, 6, 8) has a capacity of four units, then up to four failed working links can be restored on any of spans (8, 9), (7, 8), or (6, 7) by using part or all of this bundle, along all or part of its route.

Fig. 1. Example of a preconfigured restoration bundle.

For an (8, 9) failure, restoration traffic would be substituted onto the bundle by connecting links in the surviving spans on each side of the failure to the four links in the restoration bundle. Failures of (7, 8) or (6, 7) would require signals to be substituted into the bundle at intermediate nodes. For a (6, 7) span failure, the crosspoints on the preconfigured bundle connecting span (3, 6) to (6, 8) would first be opened, as would those connecting (4, 7) to (7, 9). Then, assuming the source traffic is coming from span (2, 6), we close crosspoints between the affected links on (2, 6) and the restoration links on span (3, 6). Substituting traffic into the restoration bundle at a midpoint enables a single preconfigured bundle to be useful in multiple, separately occurring failures. This is especially important when attempting to anticipate a large number of equally likely independent failures.
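The re-use of one bundle for several failures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `usable_segment` is hypothetical, and it assumes at most one span between any pair of nodes and a simple route (no repeated nodes). A bundle is taken to help a failed span when both end nodes of the span lie on the bundle's route and the span itself is not part of that route.

```python
def usable_segment(route, failed_span):
    """Return the part of `route` that can bridge `failed_span`, or None.

    `route` is a tuple of node numbers; `failed_span` is a pair of end nodes.
    """
    a, b = failed_span
    if a not in route or b not in route:
        return None  # the bundle does not touch both ends of the failure
    i, j = sorted((route.index(a), route.index(b)))
    if j - i == 1:
        return None  # the failed span is itself on the bundle's route
    return route[i:j + 1]


bundle = (9, 7, 4, 3, 6, 8)
for failure in [(8, 9), (7, 8), (6, 7), (3, 6), (2, 6)]:
    print(failure, usable_segment(bundle, failure))
```

For the three failures discussed above, the returned segments are the whole route, (7, 4, 3, 6, 8), and (7, 4, 3, 6) respectively; spans (3, 6) and (2, 6) get no help from this bundle.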
Multiple re-use of portions of preconfigured bundles wherever possible is an essential part of the economy of this technique. The alternative of dedicating bundles to single failures, although possible, would be extremely costly in terms of the capacity and crosspoints required relative to the optimized plans which we develop. Such an approach would be warranted only if it were impossible to break into preconfigured paths at intermediate nodes, such as 6, 3, 4, or 7 in (9, 7, 4, 3, 6, 8). This does imply that a fast break-then-make operation is required to gain the most real-time benefit from breaking into bundles as opposed to using them only at their endpoints.

Of course, the number of alternative bundle definitions is very large because there may be a number of whole or partial bundles that could be preconfigured for each case. Given the number of possible configurations of the network's spare capacity, and the number of ways those configurations could be used, an optimization method is required to make the best set of bundle selections. The contribution of this work has been to develop the Integer Programming tableau disclosed in Section 2, which defines the bundles to use and gives an unambiguous assignment of the amount of traffic restored over each full or partial bundle, for each failure considered.

1.2. Combining Preconfiguration and Real-time Pathfinding

Preconfiguration of network spares can be compatible with and complementary to real-time distributed restoration mechanisms. The relationship between the two is just that the preconfigured component provides a "head start" for the real-time mechanism. In practice, we find that preconfiguration may supply a surprisingly large fraction of the whole restoration requirement.

An example of preconfiguration interworking with real-time restoration is presented in Figs. 2-5. Figure 2 shows the network topology of Fig.
1 dimensioned with the minimum spare capacity for 100% real-time span restoration [1] of any single span failure. The ordered pairs give the number of working and spare links on each span, respectively. Figure 3 shows how the failure of span (7, 9) could be restored. Eight working links are lost in this failure.

Fig. 2. Example network dimensioned for 100% span restorability.

Fig. 3. Real-time reaction to the failure of span (7, 9).

Now let us assume that the basic spare capacity put in place for real-time span restoration can also be preconfigured as shown in Fig. 4. Each preconfigured bundle has unit capacity. The tables at each node show how many links are cross-connected between span pairs at the node. For example, at node 9 there is one link on span (5, 9) cross-connected to one link on span (9, 7). The node cross-connection tables give a very low-level view of the preconfiguration state of the network. If we step back, there are five preconfigured bundles in the network. Four of them are Hamiltonian circuits, touching all nodes in the network once.

Fig. 4. A preconfiguration plan for the example network.

The reaction of the network to the failure of span (7, 9) can now take place in two stages as shown in Fig. 5. First, of the eight failed working links, four can be restored using portions of the preconfigured bundles. Segments of bundles 2 and 4 are used along (7, 5, 9) to restore two failed links. Segments of bundles 1 and 5 are used along (7, 8, 9) to restore another two links. Subsequently, real-time restoration can be used to restore the remaining four working links. The real-time protocol has access to all spares left unused after the preconfigured bundles have been exploited.

Fig. 5. Preconfigured response with real-time restoration of remainder of span (7, 9) failure.
In the example, real-time restoration restores four more links on paths (7, 4, 5, 9), (7, 3, 1, 5, 9), (7, 6, 8, 9), and (7, 6, 2, 8, 9). The routes synthesized in real time are a subset of those used when real-time restoration is employed alone. In general, however, preconfigured bundles will have differing capacities, and the routes found by real-time restoration reacting in combination with preconfiguration could be different than when real-time restoration is used on its own.

An important feature of this example is the significant extent to which preconfiguration is able to restore the failure. One half of the working links lost were recovered just by substituting traffic onto already existing preconfigured paths. And this is in a network previously provisioned with only enough spares to support dynamic real-time restoration. In general, the challenge is to find the best preconfiguration plan for a given network and set of anticipated failures. An optimal method is presented in the next section.

2. OPTIMAL PRECONFIGURATION OF A GIVEN POOL OF SPARES

2.1. Formulation

If we are given an existing span-restorable network to protect via preconfiguration, we must work within the constraints of the existing spare placement. In this case, our goal is to minimize the number of working links which are not restorable using preconfigured bundles:

\[ \min \sum_{j} u_j \tag{2.1} \]

where the index $j$ runs from 1 up to the number of spans, $S$, in the network, and $u_j$ is the number of working links on span $j$ left unrestored by the preconfigured component of spares. We assume that working links which are not restorable immediately over preconfigured bundles will be restored dynamically by a subsequent real-time restoration process. This objective function could be extended to account for differing priorities amongst spans, or variations in failure probabilities, by weighting $u_j$ with a multiplicative constant $a_j$.

The objective function, Eq.
(2.1), is subject to the following constraints. First, the amount of restoration flow crossing any span is bounded by the spare capacity on that span:

\[ \sum_{p} \delta_j^p f^p \le s_j \tag{2.2} \]

where $p$ is the index number of a preconfigured bundle, $f^p$ is the flow which bundle $p$ can carry, $\delta_j^p$ is 1 if $f^p$ crosses span $j$ or 0 otherwise, and $s_j$ is the spare capacity of span $j$. The index $p$ ranges from 1 up to $P$, the total number of distinct eligible restoration routes provided for use in the optimization tableau.

The second constraint is that the amount of flow restored plus the amount of flow left unrestored after the failure of a particular span equals the working capacity of that span:

\[ \sum_{p} \phi_j^p f^p + u_j = w_j \tag{2.3} \]

where $j$ is the index of the failed span, $w_j$ is the working capacity of span $j$, and $\phi_j^p$ is 1 if $f^p$ can contribute to the restoration of span $j$ or 0 otherwise. A bundle can contribute to the restoration of a span if both end nodes of the span appear in the route of the bundle, and the failed span is not on the route of the bundle.

The constraint of Eq. (2.3) actually needs a further, less obvious, refinement to allow for the case that in an optimum design it may be possible to restore more flow for some span $j$ than is lost by the failure of span $j$. This could happen if, for example, another larger span restores all of its flow over one bundle which can also contribute to the restoration of span $j$. The amount of flow that is actually restorable beyond that which is needed is referred to as the super-restorability, $r_j$, of a span. In such a case the working capacity lost plus the super-restorability will equal the flow which can be restored, and the unrestorable flow will be zero:

\[ \sum_{p} \phi_j^p f^p + u_j - r_j = w_j \tag{2.4} \]

Super-restorability has been added to the tableau, rather than just carrying an inequality constraint, as this is useful information for a network designer.
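To make the roles of Eqs. (2.1), (2.2), and (2.4) concrete, the following Python sketch solves a tiny instance by exhaustive search rather than with an IP engine. It is an illustration under stated assumptions, not the authors' tableau: the four-node network, its capacities, the three candidate routes, and the function names are all hypothetical, all variables are taken to be nonnegative integers, and brute force is only practical for toy cases.

```python
from itertools import product


def spans_on(route):
    """Spans traversed by a route, each as a frozenset of its two end nodes."""
    return {frozenset(pair) for pair in zip(route, route[1:])}


def best_preconfiguration(working, spare, routes):
    """Exhaustively minimise sum_j u_j subject to Eqs. (2.2) and (2.4)."""
    spans = list(working)
    crossed = [spans_on(r) for r in routes]                     # delta_j^p = 1
    helps = [{j for j in spans if j <= set(r) and j not in c}   # phi_j^p = 1
             for r, c in zip(routes, crossed)]
    # A bundle's flow can never exceed the smallest spare count on its route.
    limits = [min(spare[j] for j in c) for c in crossed]

    best = None
    for flows in product(*(range(m + 1) for m in limits)):
        load = {j: sum(f for f, c in zip(flows, crossed) if j in c) for j in spans}
        if any(load[j] > spare[j] for j in spans):              # violates Eq. (2.2)
            continue
        restored = {j: sum(f for f, h in zip(flows, helps) if j in h) for j in spans}
        u = {j: max(0, working[j] - restored[j]) for j in spans}  # unrestored links
        r = {j: max(0, restored[j] - working[j]) for j in spans}  # super-restorability
        total_u = sum(u.values())                               # objective, Eq. (2.1)
        if best is None or total_u < best["unrestored"]:
            best = {"flows": flows, "unrestored": total_u, "u": u, "r": r}
    total_w = sum(working.values())
    best["readiness"] = (total_w - best["unrestored"]) / total_w
    return best


def S(a, b):
    return frozenset((a, b))


# Hypothetical network: a four-node ring 1-2-3-4 with a chord between 1 and 3.
working = {S(1, 2): 1, S(2, 3): 1, S(3, 4): 1, S(4, 1): 1, S(1, 3): 2}
spare = {S(1, 2): 1, S(2, 3): 1, S(3, 4): 1, S(4, 1): 1, S(1, 3): 2}
routes = [(1, 2, 3, 4), (1, 3, 2), (3, 1, 4)]

plan = best_preconfiguration(working, spare, routes)
print(plan["flows"], plan["unrestored"], plan["readiness"])
```

In this toy instance the first route helps two different spans, (4, 1) and (1, 3), which is the kind of multiple re-use the objective rewards. The readiness value is computed the same way the paper defines network readiness: working links restorable over bundles divided by total working links.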
A span with super-restorable demand can be assigned additional working capacity without having to provide any additional spare capacity on other spans to ensure restorability.

Lastly, we have nonnegativity constraints on the amount of unrestored flow, super-restorability and the restoration flows:

\[ 0 \le u_j \le w_j \tag{2.5} \]

\[ 0 \le r_j \tag{2.6} \]

\[ 0 \le f^p \tag{2.7} \]

All of the variables in the tableau are constrained to be nonnegative integers.

The approach of using 0-1 variables like $\delta_j^p$ and $\phi_j^p$ to represent paths is based on previous work in optimal network design [9]; however, these variables are not explicitly present in the file presented to the IP engine. They are incorporated via a pre-processing step when writing out the sets of constraints represented by Eqs. (2.2) and (2.4). The number of distinct routes $P$ and the actual routes considered are determined in pre-processing based on decisions about the eligible set of restoration routes.

The set of eligible restoration routes is a subset of all possible distinct simple routes in the network. In small networks it is feasible to use all such routes. However, this is not always feasible for large networks due to space and time constraints. In such a case, the pre-processing module generates all simple routes up to a given maximum length. The maximum length is determined relative to the length of the shortest path for each demand pair. Length may be determined on the basis of number of hops using an excess hop factor, or on the basis of geographical length using an excess distance factor. Excess distance is the amount by which any preconfigured path is allowed to exceed the shortest path between a pair of nodes in generating the eligible route set for the IP. Excess hops are similar, and denote the number of hops by which a preconfigured path is allowed to exceed the length of the shortest path.
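The hop-limited part of this pre-processing can be sketched as follows. The Python below is one plausible reading of the description, not the authors' pre-processing module: the function names, the adjacency-list representation, and the small example graph are assumptions, and only the excess hop factor is modelled (an excess distance factor would need span lengths).

```python
from collections import deque


def hop_distance(adj, src, dst):
    """Breadth-first search for the minimum hop count from src to dst."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nbr in adj[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return None  # dst is unreachable


def eligible_routes(adj, src, dst, excess_hops):
    """All simple routes from src to dst of at most shortest + excess_hops hops."""
    shortest = hop_distance(adj, src, dst)
    if shortest is None:
        return []
    limit = shortest + excess_hops
    routes = []

    def extend(path):
        if path[-1] == dst:
            routes.append(tuple(path))
            return
        if len(path) - 1 == limit:
            return  # hop budget used up without reaching dst
        for nbr in adj[path[-1]]:
            if nbr not in path:  # keep the route simple
                extend(path + [nbr])

    extend([src])
    return routes


# Hypothetical four-node ring 1-2-3-4 with a chord between nodes 1 and 3.
adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}
print(eligible_routes(adj, 2, 4, excess_hops=0))
print(eligible_routes(adj, 2, 4, excess_hops=1))
```

Raising the excess hop factor from 0 to 1 doubles the route set for this node pair, which hints at why the complete set of simple routes becomes impractical for large networks.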
Once all paths shorter than these limits have been found, the set of potential restoration routes is enriched by adding the k successively longer span-disjoint routes found by running Dijkstra's shortest path algorithm iteratively: finding a route, removing its spans from the network, and searching the reduced network for the next route from source to target, which can be no shorter than the last. No length limit is applied to this process. The iteration of Dijkstra's algorithm terminates when no new routes can be added to the route set. This step ensures that node pairs which are particularly far apart in comparison to the length limit in the first instance are not starved for restoration alternatives.

2.2. Results

Five test networks were provisioned for 100% span restorability using a method reported previously [9] which finds the minimum spare capacity placement for span-restorable mesh networks. Table I gives some data on the test networks. Herzberg's method [9] was used to ensure that tightly spared networks were used as references, as otherwise preconfiguration could be made to look arbitrarily good, given heavily spared test cases. Preconfiguration would readily make use of any excess sparing to its own advantage, so for the most exacting evaluation of preconfiguration, we use the most parsimonious technique for network dimensioning.

Table I. Properties of the Five Test Networks

Case   No. of nodes   Average degree   Working links   Spare links   Redundancy
1      10             4.40             142             44            0.31
2      15             3.73             1404            868           0.62
3      20             3.10             4369            3112          0.71
4      53             2.98             2191            2066          0.94
5      30             3.93             27522           23809         0.87

Table II presents the results of maximizing the amount of demand restorable by preconfiguration in these five test networks, using only those spares of the existing spare placement for basic span restoration. The fourth column indicates whether the complete set of all distinct routes was obtainable for the optimization tableau. If the complete set of all distinct routes was not feasible, then the feasible set of routes found with the given excess hop and distance factors was enriched as outlined earlier. This would be a consideration in any practical use of this formulation for large networks. An alternative approach based on a technique called "column generation" could be used to add routes to the tableau as needed, but this has not been explored in the work documented here.

Column five gives the "network readiness", that is, the percentage of working capacity which could be restored immediately over preconfigured bundles. "Network readiness" is calculated by examining the preconfigured coverage for all single span failures, totaling the number of working links restored this way and dividing by the total working links in the network. Column six shows the incremental percentage of working capacity which could be restored using a real-time mechanism to follow up after exploiting the preconfigured bundles. The last column shows total restorability achieved by combining preconfiguration and real-time restoration in the minimally spared test networks. Running times for these tableaus vary from a few seconds to a few minutes, depending on the size of the network and demand set.

Table II. Restorability by Preconfiguration + Real-Time Restoration

Case   Excess distance (miles)   Excess hops   Uses all distinct routes?   Network readiness (%)   % Real-time restoration   % Total restoration
1      ∞                         ∞             yes                         43.0                    55.6                      98.6
2      ∞                         ∞             yes                         42.5                    57.2                      99.7
3      ∞                         ∞             yes                         41.8                    58.2                      100.0
4      2872                      14            no                          30.4                    69.6                      100.0
5      417                       6             no                          35.5                    64.4                      99.9

A significant result is that the restoration of approximately 30% to 40% of the working capacity of the network (30.4% to 43.0% in Table II) can be based on preconfiguration. On average, therefore, one expects that most priority or special services traffic could enjoy the fastest restoration possible; only the time for traffic substitution at end-nodes is needed. That is, any available preconfigured paths could be used to restore the highest priority traffic on a span first, before being offered to lower priority traffic.

The theoretical price for this speed is that it will sometimes be necessary to add some spares to the network in excess of those required for real-time restoration alone. This is indicated by the results in the final column of Table II, where cases 1, 2, and 5 show less than 100% restorability. Full restorability after the fast preconfigured response can be achieved by designing in the small amount of additional sparing required. The results suggest that only a small increment is required, which in practice may often be present anyway due to provisioning modularity effects.

For cases 1-3 it was feasible to represent all distinct simple routes in the IP tableau, but not in cases 4 and 5 (see the earlier discussion regarding pre-processing to generate the route set for details). We think this shows up in the results, as the network readiness is over 40% for cases 1-3, and only 30.4% and 35.5% for cases 4 and 5.

3. CONCLUDING DISCUSSION

Preconfiguration is an idea that can be used in existing span-restorable designs to supply an especially rapid form of survivability for a significant proportion of the network's traffic. Network readiness may be high enough in some cases to give line-switched ring-like restoration speeds to all priority or special services traffic, while still achieving the relatively high capacity efficiency of a mesh-restorable network. This could help overcome some of the problems in implementation of real-time mesh restoration arising from the slow crosspoint-closure rates of some cross-connect systems. Patents on this method have been filed in Canada and the U.S. in 1995 [10].

REFERENCES

1. W. D. Grover, B. D. Venables, M. H. MacGregor, and J.
H. Sandham, Development and performance verification of a distributed asynchronous protocol for real-time network restoration, IEEE J. on Selected Areas in Communications, Vol. 9, No. 1, pp. 112-125, 1991.
2. H. Sakauchi, Y. Nishimura, and S. Hasegawa, A self-healing network with economical spare-channel assignment, Proc. IEEE Globecom '91, pp. 438-443, 1991.
3. C. H. Yang and S. Hasegawa, FITNESS: A failure immunization technology for network service survivability, Proc. IEEE Globecom '88, pp. 1549-1554, 1988.
4. W. D. Grover, Distributed restoration of the transport network, Telecommunications Network Management: Into the 21st Century, IEEE Press, New York, pp. 337-417, 1994.
5. C. W. Chao, P. M. Dollard, J. E. Weythman, L. T. Nguyen, and H. Eslambolchi, FASTAR - A robust system for fast DS3 restoration, Proc. IEEE Globecom '91, pp. 1396-1400, 1991.
6. SR-NWT-002514, Digital Cross-Connect Systems in Transport Network Survivability, Issue 1, Bellcore, January, 1993.
7. T.-H. Wu, H. Kobrinski, D. Ghosal, and T. V. Lakshman, A service restoration time study for distributed control SONET digital cross-connect system self-healing networks, Proc. ICC '93, pp. 893-899, 1993.
8. W. D. Grover and M. H. MacGregor, On the potential for spare capacity preconnection to reduce cross-connection workloads in mesh-restorable networks, Electronics Letters, Vol. 30, No. 3, pp. 194-195, Feb. 3, 1994.
9. M. Herzberg and S. Bye, An optimal spare-capacity assignment model for survivable networks with hop limits, Proc. IEEE Globecom '94, pp. 1601-1607, 1994.
10. W. D. Grover and M. H. MacGregor, Method for Preconfiguring a Network to Withstand Anticipated Failures, U.S. Patent Application 08/551,709, November 1, 1995.

M. MacGregor was with TRLabs from 1990 to 1997. He is currently with Telus Advanced Communications as a Senior Engineer.
He holds a Ph.D. in Computing Science from the University of Alberta (1991). In 1989 and 1990, Mike was a member of the team at TRLabs which researched the applications of self-healing to the Telecom Canada network. His contributions were in algorithm development, and in reducing simulation run times. Subsequent to his thesis on self traffic-engineering networks, he has worked on spare capacity placement in mesh-restorable networks, methods for managing ATM virtual paths, spare capacity preconfiguration, optimal recovery from node failure and network design using multiple self-healing rings.

W. D. Grover joined TRLabs (then ATRC) in 1986 as the founding Vice President-Technical responsible for development of the research program after 10 years with BNR (now Nortel Technology). From 1987 through 1990 he served as an Adjunct Professor at the University of Alberta. He now functions as Director, Networks and Systems group at TRLabs and in 1992 was appointed Associate Professor (now Professor, as of 1995) at the University of Alberta, Department of Electrical and Computer Engineering. He presently has 17 issued patents, three patents pending and a number of significant publications that have introduced new methods now in use by industry. Dr. Grover is a registered Professional Engineer in the province of Alberta, and a Senior Member of the IEEE.

Kent Ryhorchuk is an undergraduate student in the Department of Computing Science at the University of Alberta. He began working at TRLabs in the Networks and Systems group as a summer student in May 1995 and continued working part-time through his studies until May 1996. Kent is currently completing an internship at Nortel Technology in the OC48 software architecture group and plans to return to the University of Alberta in the fall of 1997.
