IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 60, NO. 3, MARCH 2012 451

Longitudinal-Partitioning-Based Waveform Relaxation Algorithm for Efficient Analysis of Distributed Transmission-Line Networks

Sourajeet Roy, Student Member, IEEE, Anestis Dounavis, Member, IEEE, and Amir Beygi, Student Member, IEEE

Abstract: In this paper, a waveform relaxation algorithm is presented for efficient transient analysis of large transmission-line networks. The proposed methodology represents lossy transmission lines as a cascade of lumped circuit elements alternating with lossless line segments, where the lossless line segments are modeled using the method of characteristics. Partitioning the transmission lines at the natural interfaces provided by the method of characteristics allows the resulting subcircuits to be weakly coupled by construction. The subcircuits are solved independently using a proposed hybrid iterative technique that combines the advantages of both traditional Gauss–Seidel and Gauss–Jacobi algorithms. The overall algorithm is highly parallelizable and exhibits good scaling with both the size of the network involved and the number of CPUs available. Numerical examples are presented to illustrate the validity and efficiency of the proposed work.

Index Terms: Convergence analysis, delay, longitudinal partitioning, transient simulation, signal integrity, transmission line, waveform relaxation.

I. INTRODUCTION

WITH the constant increase in operating frequencies, interconnects need to be modeled as distributed transmission lines for accurate signal integrity analysis of modern integrated circuits (ICs) [1]. Accurate modeling of large distributed networks using commercial circuit solvers such as the Simulation Program with Integrated Circuit Emphasis (SPICE) requires significant central processing unit (CPU) time and memory, which makes such solvers computationally prohibitive for fast transient simulation.
Manuscript received September 26, 2011; accepted November 21, 2011. Date of publication January 18, 2012; date of current version March 02, 2012. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada, Canada Foundation for Innovation, Canadian Microelectronics Corporation and Ministry of Research and Innovation (Early Research Award). The authors are with the Department of Electrical and Computer Engineering, University of Western Ontario, London, ON, Canada N6A 5B9 (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMTT.2011.2178261

The waveform relaxation (WR) algorithm has emerged as an attractive technique to reduce the simulation costs of such large networks [2]–[23]. Typically, waveform relaxation breaks a large circuit into smaller subcircuits that can be solved iteratively, either in sequence or in parallel. Each iteration involves an exchange of voltage/current waveforms between the subcircuits so that the response converges to the actual solution. Presently, two approaches exist for applying waveform relaxation to transmission-line networks. One approach is the transverse partitioning scheme [11]–[14], where multiconductor transmission lines (MTLs) are partitioned into single lines by assuming weak capacitive and inductive coupling between the lines. The coupling between the lines is represented as time-domain relaxation sources introduced into the circuit model of each line. An alternative waveform relaxation algorithm is based on longitudinal partitioning of the network into repeated subcircuits [4]–[8], [10], [16].
While longitudinal partitioning schemes based on the generalized method of characteristics (MoC) have been reported in [4]–[8], more recent work [16] has focused on partitioning the line based on segmentation models such as the conventional resistive-inductive-conductive-capacitive (RLGC) lumped model [24]. Partitioning techniques based on segmentation models share a common limitation: since each segment feeds directly into the next, adjacent segments are strongly coupled in physical space. As a consequence, blindly partitioning the conductor between segments requires resolving the stringent Dirichlet transmission condition across the partition and exhibits poor convergence [16]. The work of [16] accelerated the convergence of the WR algorithm by artificially exchanging additional voltage/current waveforms (i.e., increasing the overlap between subcircuits), followed by optimization routines. More recently, in [25], a WR algorithm based on the delay extraction-based passive compact transmission-line (DEPACT) segmentation model [26], [27] was presented for two-conductor transmission-line networks. The DEPACT model represents lossy transmission lines as a cascade of lumped circuit elements alternating with lossless line segments, where the lossless line segments are realized in the time domain using the MoC [24], [28]. The work of [25] exploited the inherent weak coupling across the natural interfaces provided by the MoC [4]–[8] to longitudinally partition the transmission line at these interfaces into smaller, disjoint subcircuits. The iterative solution of the subcircuits was performed using the sequential Gauss–Seidel (GS) technique and was shown to achieve fast convergence naturally, without the artificial exchange of waveforms or the optimization techniques proposed in [16]. This work extends the concepts of [25] to multiconductor transmission-line systems.
Furthermore, the efficiency of the proposed algorithm for any general transmission-line network (two-conductor or multiconductor) has been investigated on parallel processing platforms. To this end, two highly parallelizable iterative techniques have been implemented: the traditional Gauss–Jacobi (GJ) technique and a novel hybrid technique that combines the complementary features of GS and GJ. This hybrid technique exhibits superior convergence properties when compared to the traditional GJ algorithm while maintaining its high parallelizability with respect to the number of CPUs available. In addition, a mathematical framework is provided to demonstrate the scalability of the algorithm with respect to both the size of the network involved and the number of CPUs available for parallel processing. Numerical examples are provided to illustrate the validity and efficiency of the proposed WR algorithm over full SPICE simulations.

The paper is organized as follows. Section II deals with the background of waveform relaxation algorithms and concludes with a review of the DEPACT model [26], [27]. Section III presents the details of the proposed algorithm, and Section IV describes the mathematical framework for analyzing the computational cost of the proposed work. The numerical examples and conclusions are presented in Sections V and VI, respectively.

0018-9480/$31.00 © 2012 IEEE

II. BACKGROUND AND DEPACT MODEL

In order to explain the contributions of the proposed work, we briefly discuss the background of general waveform relaxation algorithms, followed by a review of the DEPACT model.

A.
Background of Waveform Relaxation Algorithms

Waveform relaxation, since its introduction in [2], has proven to be an attractive algorithm for addressing the exorbitant computational cost of solving large networks using traditional circuit solvers like SPICE. The algorithm is based on partitioning large networks into smaller subcircuits, where the coupling between the subcircuits is represented using time-domain relaxation sources introduced into each subcircuit. Assuming an initial guess for the waveforms of the relaxation sources, the subcircuits are solved independently. The present solution of the subcircuits is then used to update the relaxation sources for the next iteration. This process is repeated until the error between two successive iterations falls within a prescribed error tolerance. Solving the individual subcircuits on modern parallel processing resources has allowed the utilization of multiprocessor hardware and provided significant savings in memory and CPU time compared with traditional full circuit simulation [14]. It is noted that the main limitation of relaxation algorithms is the speed of convergence of the iterations. Several methods have been reported to speed up convergence, such as time windowing [3], overlapping subdomains [22], [23], and optimization [16], [22].

B. Review of DEPACT Model

A general coupled MTL system for the quasi-transverse electromagnetic (quasi-TEM) mode of propagation is described by the Telegrapher's partial differential equations [24]

$$\frac{\partial}{\partial z}\mathbf{V}(z,s) = -\left[\mathbf{R}(s) + s\mathbf{L}(s)\right]\mathbf{I}(z,s), \qquad \frac{\partial}{\partial z}\mathbf{I}(z,s) = -\left[\mathbf{G}(s) + s\mathbf{C}(s)\right]\mathbf{V}(z,s) \tag{1}$$

where $\mathbf{V}(z,s)$ and $\mathbf{I}(z,s)$ represent the spatial distribution of the voltage and current along the longitudinal direction $z$, and $\mathbf{R}(s)$, $\mathbf{L}(s)$, $\mathbf{G}(s)$, and $\mathbf{C}(s)$ are the frequency-dependent resistive, inductive, conductive, and capacitive per-unit-length (p.u.l.) parameters of the line, respectively. For a line of length $l$, the solution of the above equations can be written as an exponential matrix function [29], [30] as

$$\begin{bmatrix}\mathbf{V}(l,s)\\ \mathbf{I}(l,s)\end{bmatrix} = e^{(\mathbf{A}+s\mathbf{B})l}\begin{bmatrix}\mathbf{V}(0,s)\\ \mathbf{I}(0,s)\end{bmatrix} \tag{2}$$

where

$$\mathbf{A} = \begin{bmatrix}\mathbf{0} & -\mathbf{R}(s) - s\left(\mathbf{L}(s)-\mathbf{L}_\infty\right)\\ -\mathbf{G}(s) - s\left(\mathbf{C}(s)-\mathbf{C}_\infty\right) & \mathbf{0}\end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix}\mathbf{0} & -\mathbf{L}_\infty\\ -\mathbf{C}_\infty & \mathbf{0}\end{bmatrix} \tag{3}$$

and $\mathbf{L}_\infty$ and $\mathbf{C}_\infty$ are the p.u.l. inductive and capacitive parameters at the maximum frequency of interest $f_{\max}$.
Typically, the solution of (2) does not have an exact time-domain counterpart; hence, segmentation-based modeling techniques [26], [27], [29]–[34] are generally used to derive an equivalent time-domain expression of (2). Of these segmentation algorithms, DEPACT is suitable for electrically long transmission lines because it explicitly extracts the delay of the network, which leads to a smaller number of lumped segments. However, extracting the delay terms from $e^{(\mathbf{A}+s\mathbf{B})l}$ is not a trivial task since the matrices $\mathbf{A}$ and $\mathbf{B}$ do not commute (i.e., $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$). To approximate $e^{(\mathbf{A}+s\mathbf{B})l}$ in terms of a product of exponentials, a modified Lie product [35] is used as

$$e^{(\mathbf{A}+s\mathbf{B})l} \approx \prod_{i=1}^{m} e^{\mathbf{A}\frac{l}{2m}}\, e^{s\mathbf{B}\frac{l}{m}}\, e^{\mathbf{A}\frac{l}{2m}} \tag{4}$$

where $m$ is the number of sections. The associated error of the approximation scales as $O(1/m^2)$ [34] (i.e., (4) quickly converges to the exponential matrix of (2) as the number of sections $m$ increases). Equation (4) provides a methodology for discretizing the transmission line into a cascade of alternating subsections with the individual stamps of $e^{\mathbf{A}l/(2m)}$ and $e^{s\mathbf{B}l/m}$, as illustrated in Fig. 1 (for single lines) and Fig. 2 (for MTLs). The exponential matrix $e^{\mathbf{A}l/(2m)}$ represents the attenuation losses of the transmission line. Since the delay terms $s\mathbf{L}_\infty$ and $s\mathbf{C}_\infty$ have been extracted from $\mathbf{A}$, $e^{\mathbf{A}l/(2m)}$ can be approximated by a low-order rational function, which in turn can be realized in SPICE using either lumped RLC elements or lumped dependent sources [26], [27]. As a result, the subsections with stamps of $e^{\mathbf{A}l/(2m)}$ are replaced by a macromodel referred to as "lumped circuit elements" in Figs. 1 and 2. On the other hand, the matrix $e^{s\mathbf{B}l/m}$ contains only $\mathbf{L}_\infty$ and $\mathbf{C}_\infty$ and can be modeled as a lossless line using the MoC [24], [28]. As a result, the subsections with stamps of $e^{s\mathbf{B}l/m}$ are replaced by the equivalent MoC circuit [24], [28] in Figs. 1 and 2. More detailed derivations of a SPICE realization of the DEPACT model of (4) are provided in [26] and [27].
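As a rough numerical illustration of the convergence behavior discussed around (4), the sketch below compares a symmetric product of exponentials with the exact matrix exponential for a single lossy line at one complex frequency. The symmetric form of the product, the normalized p.u.l. values, the chosen frequency, and the helper names (`matmul`, `expm`, `lie_product_error`) are all assumptions made for this sketch; none of them are taken from the paper.

```python
# Illustrative check that a symmetric product of exponentials approaches
# exp((A + sB) l) roughly as 1/m^2. All values are normalized assumptions.

def matmul(x, y):
    """Multiply two 2x2 (possibly complex) matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def expm(x, squarings=8, terms=18):
    """Matrix exponential by scaling and squaring with a Taylor series."""
    scaled = [[v / 2 ** squarings for v in row] for row in x]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms + 1):
        term = [[v / n for v in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def lie_product_error(m, r=0.4, g=0.1, l_inf=1.0, c_inf=1.0, s=2.0j, length=1.0):
    """Largest entry-wise error of the m-section product for a single line."""
    a = [[0.0, -r * length], [-g * length, 0.0]]             # lossy part, A*l
    sb = [[0.0, -s * l_inf * length], [-s * c_inf * length, 0.0]]  # delay part
    exact = expm([[a[i][j] + sb[i][j] for j in range(2)] for i in range(2)])
    half_loss = expm([[v / (2 * m) for v in row] for row in a])
    delay = expm([[v / m for v in row] for row in sb])
    section = matmul(matmul(half_loss, delay), half_loss)
    product = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(m):
        product = matmul(product, section)
    return max(abs(product[i][j] - exact[i][j])
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    for sections in (8, 16, 32, 64):
        print(sections, lie_product_error(sections))
```

With these assumed values the loss and delay matrices do not commute (the line is not distortionless), so the error is nonzero and should shrink by roughly a factor of four each time the number of sections doubles.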
The rational macromodel describing the lossy sections and the MoC equations describing the lossless sections both have exact representations in the time domain. Together, they approximate the frequency-domain solution of (2) as a set of delayed ordinary differential equations in the time domain, which can be solved by SPICE. Section III discusses the development of the proposed WR algorithm based on the DEPACT model of (4).

Fig. 1. SPICE equivalent circuit of a two-conductor transmission line using DEPACT.

III. DEVELOPMENT OF PROPOSED ALGORITHM

Here, we begin by describing the proposed longitudinal partitioning scheme for single lines and the methodology to iteratively solve the subcircuits. From this discussion, the algorithm is extended to MTLs.

A. Proposed Partitioning Scheme for Single Lines

The DEPACT model of (4) provides a methodology to discretize two-conductor transmission lines into an alternating cascade of lossy and lossless line segments (Fig. 1). To better explain the proposed partitioning methodology, consider the equations for the $i$th lossless line segment in Fig. 1, given as follows:

$$\begin{bmatrix} V_{i,2}(s)\\ -I_{i,2}(s)\end{bmatrix} = e^{sBl/m}\begin{bmatrix}V_{i,1}(s)\\ I_{i,1}(s)\end{bmatrix} \tag{5}$$

where $V_{i,1}$ and $V_{i,2}$ are the near- and far-end voltages, respectively, and $I_{i,1}$ and $I_{i,2}$ are the near- and far-end currents, respectively, of the $i$th lossless line segment, with both currents taken here as flowing into the segment. Using simple algebraic manipulations on (5), followed by converting the resultant equations into the time domain, provides the following MoC relation [24], [26], [27]:

$$\begin{aligned} v_{i,1}(t) - Z_0\, i_{i,1}(t) &= w_{i,1}(t), & w_{i,1}(t) &= 2v_{i,2}(t-\tau) - w_{i,2}(t-\tau)\\ v_{i,2}(t) - Z_0\, i_{i,2}(t) &= w_{i,2}(t), & w_{i,2}(t) &= 2v_{i,1}(t-\tau) - w_{i,1}(t-\tau) \end{aligned} \tag{6}$$

where $Z_0 = \sqrt{L_\infty/C_\infty}$ and $\tau = (l/m)\sqrt{L_\infty C_\infty}$ are the characteristic impedance and the delay of each lossless section, respectively, and $w_{i,1}(t)$ and $w_{i,2}(t)$ are the delayed sources at the near and far ends. The MoC equations of (6) can be realized by the simple circuit equivalent of Fig. 1. From Fig. 1, it is observed that the MoC provides natural interfaces across which information is exchanged using the time-delayed equations of (6) rather than the more stringent Dirichlet transmission conditions.
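To make the delayed-source form of the MoC concrete, the following minimal sketch time-steps a single lossless section driven by a step source. It assumes the common Thevenin form of the MoC (each port seen as a delayed source behind the characteristic impedance), a delay that is a whole number of time steps, and purely resistive terminations; the function name `simulate_moc_line` and all component values are hypothetical.

```python
# Step response of one lossless section using delayed MoC sources.
# Port model assumed here: v - z0 * i = w, with w built from the other end.

def simulate_moc_line(z0, rs, rl, delay_steps, n_steps, vs=1.0):
    """Return near-end and far-end voltage samples of a lossless line."""
    v1 = [0.0] * n_steps  # near-end voltage
    v2 = [0.0] * n_steps  # far-end voltage
    w1 = [0.0] * n_steps  # delayed source seen at the near end
    w2 = [0.0] * n_steps  # delayed source seen at the far end
    for n in range(n_steps):
        if n >= delay_steps:
            d = n - delay_steps
            # Each source depends only on the opposite end, one delay earlier.
            w1[n] = 2.0 * v2[d] - w2[d]
            w2[n] = 2.0 * v1[d] - w1[d]
        # Near end: step source vs behind rs, line port is (w1, z0).
        v1[n] = (vs * z0 + w1[n] * rs) / (rs + z0)
        # Far end: line port (w2, z0) loaded by the resistor rl.
        v2[n] = w2[n] * rl / (rl + z0)
    return v1, v2


if __name__ == "__main__":
    near, far = simulate_moc_line(z0=50.0, rs=50.0, rl=100.0,
                                  delay_steps=10, n_steps=40)
    print(near[0], far[10], near[20])
```

Because the two ends interact only through values that are one delay old, each end can be evaluated from already known data; this is the property that the partitioning at MoC interfaces relies on.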
Partitioning the transmission lines at the MoC interfaces, as shown in Fig. 3, was therefore found to yield efficient convergence without the need for the artificial overlap of subcircuits and the optimization used in [16]. From (6), it can be further concluded that the delayed sources serve as the relaxation sources responsible for the coupling between the subcircuits in the proposed WR algorithm. The next section describes the methodology to iteratively solve the subcircuits and update the relaxation sources.

B. Iterative Solution of Subcircuits for Single Lines

Typically, two techniques exist for the iterative solution of the subcircuits: the Gauss–Seidel (GS) and the Gauss–Jacobi (GJ) techniques. According to the GS technique, the $(k+1)$th iterative solution of any $i$th subcircuit requires the present ($(k+1)$th) solution of all of the preceding $(i-1)$ subcircuits as well. This translates to a sequential solution of the subcircuits, where the relaxation sources are updated after the solution of each individual subcircuit [3]. On the other hand, according to the GJ iterative technique, the $(k+1)$th iterative solution of any $i$th subcircuit requires only the previous ($k$th) solution of all subcircuits. This corresponds to a possible parallel solution of the subcircuits, where the relaxation sources are updated only when the solution of all subcircuits is complete [3]. The above discussion shows that the GS technique involves $m$ updates or exchanges of information per iteration, where $m$ is the number of subcircuits, compared with GJ, which involves only one exchange of information. Thus, GS exhibits better convergence than GJ [15]. However, a potential drawback of GS is that it does not naturally lend itself to parallel processing like the GJ technique, since the present solution of any $i$th subcircuit depends on the present solution of all previous subcircuits. In [25], a sequential GS iterative technique to solve the subcircuits was implemented.
In this work, with the focus being on highly parallelizable iterative techniques, two schemes are proposed: first, the traditional GJ technique, followed by a hybrid technique that combines the complementary features of GS and GJ.

Fig. 2. SPICE equivalent circuit of an MTL using DEPACT.

Fig. 3. Partitioning of single line into subcircuits for waveform relaxation.

1) Gauss–Jacobi (GJ): This discussion begins by considering a general two-conductor transmission line discretized into $m$ subcircuits, as illustrated in Fig. 3. Prior to beginning the $(k+1)$th iteration, it is assumed that the $k$th iteration has been completed for all subcircuits and the waveforms of all of the relaxation sources have been updated to $w^{(k)}_{i,1}(t)$ and $w^{(k)}_{i,2}(t)$. For $k = 0$, the waveforms of the relaxation sources, $w^{(0)}_{i,1}(t)$ and $w^{(0)}_{i,2}(t)$, are simply the initial guess. For the $(k+1)$th iteration, considering the $i$th subcircuit of Fig. 3, the corresponding relaxation sources with known waveforms serve as the input excitation. Taking the $i$th subcircuit to be bounded by the far end of the $(i-1)$th lossless segment and the near end of the $i$th lossless segment, this translates to the following terminal conditions for the $i$th subcircuit:

$$\begin{aligned} v^{(k+1)}_{i-1,2}(t) - Z_0\, i^{(k+1)}_{i-1,2}(t) &= w^{(k)}_{i-1,2}(t)\\ v^{(k+1)}_{i,1}(t) - Z_0\, i^{(k+1)}_{i,1}(t) &= w^{(k)}_{i,1}(t) \end{aligned} \tag{7}$$

The terminal conditions of (7), along with the equations of the corresponding lumped circuit elements, together form the set of ordinary differential equations describing the $i$th subcircuit, which can be solved for a self-consistent solution of the waveforms $v^{(k+1)}_{i-1,2}(t)$ and $v^{(k+1)}_{i,1}(t)$. It is noted that the relaxation sources of (7) (i.e., $w^{(k)}_{i-1,2}(t)$ and $w^{(k)}_{i,1}(t)$) of each $i$th subcircuit are assumed to be known beforehand and, hence, are independent of the present ($(k+1)$th) solution of the remaining subcircuits. This particular aspect allows the subcircuits to be solved in parallel on a multiprocessor machine.
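The GJ cycle of this subsection (solve every subcircuit from the previous source waveforms, then refresh every source through the delayed MoC relation) can be sketched on a deliberately simplified network. The sketch assumes a chain of identical lossless sections joined directly, so the lumped lossy blocks are omitted, a matched step source, a resistive load, and a delay equal to a whole number of time steps; all function names and values are illustrative and not from the paper.

```python
# Gauss-Jacobi waveform relaxation on a chain of lossless MoC sections.
# Circuit values and function names are illustrative assumptions.

def solve_subcircuits(w_near, w_far, vs, z0, rs, rl):
    """Solve all subcircuits over the whole time window for fixed sources.

    w_near[i] and w_far[i] are the source waveforms at the near and far end
    of section i; every port is modeled as a source behind z0.
    """
    m, n_steps = len(w_near), len(w_near[0])
    v_near = [[0.0] * n_steps for _ in range(m)]
    v_far = [[0.0] * n_steps for _ in range(m)]
    for n in range(n_steps):
        # Source-side subcircuit: step source vs behind rs.
        v_near[0][n] = (vs * z0 + w_near[0][n] * rs) / (rs + z0)
        # Interior subcircuits: far end of section i-1 tied to near end of i.
        for i in range(1, m):
            current = (w_far[i - 1][n] - w_near[i][n]) / (2.0 * z0)
            v_far[i - 1][n] = w_far[i - 1][n] - z0 * current
            v_near[i][n] = w_near[i][n] + z0 * current
        # Load-side subcircuit: resistor rl.
        v_far[m - 1][n] = w_far[m - 1][n] * rl / (rl + z0)
    return v_near, v_far


def update_sources(v_near, v_far, w_near, w_far, delay):
    """Delayed MoC update of every source; all entries are independent."""
    m, n_steps = len(w_near), len(w_near[0])
    new_near = [[0.0] * n_steps for _ in range(m)]
    new_far = [[0.0] * n_steps for _ in range(m)]
    for i in range(m):
        for n in range(delay, n_steps):
            new_near[i][n] = 2.0 * v_far[i][n - delay] - w_far[i][n - delay]
            new_far[i][n] = 2.0 * v_near[i][n - delay] - w_near[i][n - delay]
    return new_near, new_far


def gauss_jacobi_wr(m=4, n_steps=60, delay=5, vs=1.0, z0=50.0, rs=50.0,
                    rl=100.0, tol=1e-12, max_iter=200):
    """Iterate until the sources stop changing; return load voltage, count."""
    w_near = [[0.0] * n_steps for _ in range(m)]  # initial guess: all zero
    w_far = [[0.0] * n_steps for _ in range(m)]
    for iteration in range(1, max_iter + 1):
        v_near, v_far = solve_subcircuits(w_near, w_far, vs, z0, rs, rl)
        new_near, new_far = update_sources(v_near, v_far, w_near, w_far, delay)
        error = max(abs(a - b)
                    for old, new in ((w_near, new_near), (w_far, new_far))
                    for row_old, row_new in zip(old, new)
                    for a, b in zip(row_old, row_new))
        w_near, w_far = new_near, new_far
        if error < tol:
            break
    _, v_far = solve_subcircuits(w_near, w_far, vs, z0, rs, rl)
    return v_far[m - 1], iteration


if __name__ == "__main__":
    load_voltage, iterations = gauss_jacobi_wr()
    print(iterations, load_voltage[19], load_voltage[20])
```

In this toy setting each iteration extends the time interval over which the waveforms are correct by one section delay, which is why the iteration count grows with the ratio of the simulation window to the delay. The per-subcircuit work inside `solve_subcircuits` is written as a loop here, but each subcircuit uses only previously known sources and could be dispatched to a separate worker.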
Once all of the subcircuits are solved, the voltage waveforms $v^{(k+1)}_{i,1}(t)$ and $v^{(k+1)}_{i,2}(t)$, determined from the present ($(k+1)$th) iteration, are used to update the relaxation sources for the future $(k+2)$th iteration using (6) as follows:

$$\begin{aligned} w^{(k+1)}_{i,1}(t) &= 2v^{(k+1)}_{i,2}(t-\tau) - w^{(k)}_{i,2}(t-\tau)\\ w^{(k+1)}_{i,2}(t) &= 2v^{(k+1)}_{i,1}(t-\tau) - w^{(k)}_{i,1}(t-\tau) \end{aligned} \tag{8}$$

The equations of (8) required to update all of the relaxation sources, being decoupled, can be solved in parallel as well. Using the updated values of (8) as the new source waveforms for the next $(k+2)$th iteration, the subcircuits are solved again. This iterative cycle continues until the absolute error satisfies a predefined tolerance expressed as

$$e^{(k+1)} = \max_{i}\, \left\lVert \mathbf{v}^{(k+1)}_{i}(t) - \mathbf{v}^{(k)}_{i}(t) \right\rVert \le \varepsilon \tag{9}$$

where $\mathbf{v}_i(t)$ collects the port voltage waveforms of the $i$th subcircuit and $\varepsilon$ is the predefined error tolerance.

2) Hybrid GS–GJ: To explain this contribution, the subcircuits of Fig. 3 are considered to be divided into two groups: group A, containing the odd-numbered subcircuits, and group B, containing the even-numbered subcircuits, where the total number of subcircuits within each group is defined as

$$m_A = \frac{m + \operatorname{mod}(m,2)}{2}\ \ \text{(group A)}, \qquad m_B = \frac{m - \operatorname{mod}(m,2)}{2}\ \ \text{(group B)} \tag{10}$$

and $\operatorname{mod}$ represents the modulus function. For the specific case of longitudinal partitioning, coupling exists only between an odd-numbered and an even-numbered subcircuit (and not between two odd-numbered or two even-numbered subcircuits). The $(k+1)$th iterative solution of any subcircuit in either group is therefore independent of the present ($(k+1)$th) solution of any other subcircuit within the same group and depends instead on the solution of particular subcircuits within the opposite group. This coupling is addressed using a nested iterative technique. The outer iteration solves groups A and B in sequence (using GS), updating the relaxation sources after every group solution. The inner iteration solves the subcircuits within each group in parallel (using GJ). This forms the basis of the proposed hybrid iterative technique and is illustrated in Fig. 4.

Fig. 4. Hybrid GS–GJ iterative technique.

In each iteration, the GS sequence begins with group A before proceeding to group B. Hence, prior to beginning the $(k+1)$th iteration, it is assumed that the $k$th iteration has been completed for all subcircuits and those relaxation sources responsible for exciting only the odd-numbered subcircuits (group A) in Fig. 3 have been updated to $w^{(k)}_{i-1,2}(t)$ and $w^{(k)}_{i,1}(t)$ for odd $i$. If $k = 0$, the waveforms of the above relaxation sources are simply the initial guess. For the $(k+1)$th iteration, using the above relaxation sources with known waveforms as the input excitation to the corresponding subcircuits of group A, the $m_A$ subcircuits can be solved in parallel via the GJ technique explained in the previous section. Once the GJ solve is concluded, the voltage waveforms determined from the present ($(k+1)$th) iteration of group A are used to update the relaxation sources responsible for exciting only the even-numbered subcircuits (group B) of Fig. 3 as

$$\begin{aligned} w^{(k+1)}_{i-1,2}(t) &= 2v^{(k+1)}_{i-1,1}(t-\tau) - w^{(k)}_{i-1,1}(t-\tau)\\ w^{(k+1)}_{i,1}(t) &= 2v^{(k+1)}_{i,2}(t-\tau) - w^{(k)}_{i,2}(t-\tau) \end{aligned} \qquad i \text{ even} \tag{11}$$

The equations of (11) can be solved in parallel, similar to (8). The relaxation sources of (11) serve as the input for the corresponding subcircuits of group B, and the $m_B$ subcircuits can also be solved in parallel using the GJ technique. The voltage waveforms determined from the present ($(k+1)$th) iteration of group B are used to update the relaxation sources responsible for exciting only the subcircuits of group A for the future $(k+2)$th iteration as

$$\begin{aligned} w^{(k+1)}_{i-1,2}(t) &= 2v^{(k+1)}_{i-1,1}(t-\tau) - w^{(k+1)}_{i-1,1}(t-\tau)\\ w^{(k+1)}_{i,1}(t) &= 2v^{(k+1)}_{i,2}(t-\tau) - w^{(k+1)}_{i,2}(t-\tau) \end{aligned} \qquad i \text{ odd} \tag{12}$$

The equations of (12) can be solved in parallel as well. The above iterative cycle continues until the absolute error of the iterations satisfies the error tolerance as in (9). It is noted that the hybrid technique provides a more frequent exchange of waveforms using (11)–(12) compared with traditional GJ, which allows only the single exchange of (8). As a result, the hybrid technique exhibits better convergence than GJ. In Section III-C, the proposed algorithm is extended to MTLs.

C.
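Extension for Multiconductor Transmission Lines

To better explain the partitioning methodology for MTLs, the equations for the $i$th lossless line segment in Fig. 2 are provided as

$$\begin{bmatrix}\mathbf{V}_{i,2}(s)\\ -\mathbf{I}_{i,2}(s)\end{bmatrix} = e^{s\mathbf{B}l/m}\begin{bmatrix}\mathbf{V}_{i,1}(s)\\ \mathbf{I}_{i,1}(s)\end{bmatrix} \tag{13}$$

It is observed that (13) leads to $2N$ coupled equations, where $N$ is the number of coupled lines. However, the coupled lossless sections can be decoupled into single lossless lines using a linear transformation to modal voltages/currents as

$$\mathbf{V}_{i,p}(s) = \mathbf{T}_V \hat{\mathbf{V}}_{i,p}(s), \qquad \mathbf{I}_{i,p}(s) = \mathbf{T}_I \hat{\mathbf{I}}_{i,p}(s), \qquad p = 1, 2 \tag{14}$$

where $\hat{\mathbf{V}}_{i,p}$ and $\hat{\mathbf{I}}_{i,p}$ are the modal voltage and current vectors, and $\mathbf{T}_V$ and $\mathbf{T}_I$ are constant matrices chosen to diagonalize $\mathbf{L}_\infty$ and $\mathbf{C}_\infty$ and have the following properties [24]:

$$\mathbf{T}_V^{-1}\mathbf{L}_\infty\mathbf{T}_I = \mathbf{L}_m, \qquad \mathbf{T}_I^{-1}\mathbf{C}_\infty\mathbf{T}_V = \mathbf{C}_m, \qquad \mathbf{T}_I^{T} = \mathbf{T}_V^{-1} \tag{15}$$

Here, $\mathbf{L}_m$ and $\mathbf{C}_m$ are diagonal matrices, and the superscript $T$ denotes the transpose of the matrix. Replacing (14) and (15) in (13) and performing the same algebraic manipulations as in Section III-A, followed by converting the resultant equations into the time domain, the decoupled lossless sections can be represented using MoC equations similar to (6) as

$$\begin{aligned} \hat{v}_{i,1,q}(t) - Z_q\, \hat{i}_{i,1,q}(t) &= \hat{w}_{i,1,q}(t), & \hat{w}_{i,1,q}(t) &= 2\hat{v}_{i,2,q}(t-\tau_q) - \hat{w}_{i,2,q}(t-\tau_q)\\ \hat{v}_{i,2,q}(t) - Z_q\, \hat{i}_{i,2,q}(t) &= \hat{w}_{i,2,q}(t), & \hat{w}_{i,2,q}(t) &= 2\hat{v}_{i,1,q}(t-\tau_q) - \hat{w}_{i,1,q}(t-\tau_q) \end{aligned} \tag{16}$$

where $q = 1, \ldots, N$ represents the line number, $Z_q$ and $\tau_q$ represent the characteristic impedance and delay of each lossless section, respectively, of the $q$th line, and

$$\hat{\mathbf{v}}_{i,p}(t) = \left[\hat{v}_{i,p,1}(t), \ldots, \hat{v}_{i,p,N}(t)\right]^{T}, \qquad \hat{\mathbf{i}}_{i,p}(t) = \left[\hat{i}_{i,p,1}(t), \ldots, \hat{i}_{i,p,N}(t)\right]^{T} \tag{17}$$

are the time-domain counterparts of the vectors $\hat{\mathbf{V}}_{i,p}(s)$ and $\hat{\mathbf{I}}_{i,p}(s)$, respectively, defined in (14). The MoC equations (16) for MTLs can be realized using the equivalent circuit of Fig. 2, where the matrices $\mathbf{T}_V$ and $\mathbf{T}_I$ arising from the similarity transformation of (14) are grouped with the lumped representation of the lossy section. It is observed that, similar to the single-line case of Fig. 1, the MoC provides natural interfaces for MTLs across which information is exchanged using the time-delayed equations of (16). Hence, longitudinally partitioning transmission lines at these interfaces, as shown in Fig. 5, is expected to yield efficient convergence of the proposed WR algorithm. The following section describes the iterative solution of the subcircuits of Fig. 5.

Fig. 5. Partitioning of MTLs into subcircuits for waveform relaxation.

D. Iterative Solution of Subcircuits for MTLs

Once the MTL network is partitioned using the above methodology, both the GJ and the hybrid GS–GJ iterative techniques can be used to solve the subcircuits, as explained below. The iterative procedures (GJ and hybrid GS–GJ) for MTLs are similar to those of the two-conductor line, with the main difference being that the MoC equations of (6) now have to be extended to the decoupled equations of (16).

1) GJ for MTLs: This discussion begins by considering a general MTL discretized into $m$ subcircuits, as illustrated in Fig. 5. Assuming that the waveforms of all of the relaxation sources are known from the previous ($k$th) iteration and are used as input excitations for the subcircuits of Fig. 5, the terminal conditions required for the $(k+1)$th iterative solution of the $i$th subcircuit are changed from (7) to include the effect of MTLs described by (16) as

$$\begin{aligned} \hat{v}^{(k+1)}_{i-1,2,q}(t) - Z_q\, \hat{i}^{(k+1)}_{i-1,2,q}(t) &= \hat{w}^{(k)}_{i-1,2,q}(t)\\ \hat{v}^{(k+1)}_{i,1,q}(t) - Z_q\, \hat{i}^{(k+1)}_{i,1,q}(t) &= \hat{w}^{(k)}_{i,1,q}(t) \end{aligned} \qquad q = 1, \ldots, N \tag{18}$$

Since the relaxation sources of (18) (i.e., $\hat{w}^{(k)}_{i-1,2,q}(t)$ and $\hat{w}^{(k)}_{i,1,q}(t)$) of each $i$th subcircuit are assumed known beforehand and independent of the present ($(k+1)$th) solution of the remaining subcircuits, the subcircuits can be solved in parallel, similar to two-conductor lines. The $(k+1)$th iterative solution of all of the subcircuits provides the self-consistent solution of the waveforms $\hat{v}^{(k+1)}_{i,1,q}(t)$ and $\hat{v}^{(k+1)}_{i,2,q}(t)$, which are thereafter used to update the relaxation sources for the future $(k+2)$th iteration using (16) as

$$\begin{aligned} \hat{w}^{(k+1)}_{i,1,q}(t) &= 2\hat{v}^{(k+1)}_{i,2,q}(t-\tau_q) - \hat{w}^{(k)}_{i,2,q}(t-\tau_q)\\ \hat{w}^{(k+1)}_{i,2,q}(t) &= 2\hat{v}^{(k+1)}_{i,1,q}(t-\tau_q) - \hat{w}^{(k)}_{i,1,q}(t-\tau_q) \end{aligned} \tag{19}$$

This iterative cycle continues until the absolute error satisfies a predefined tolerance as

$$e^{(k+1)} = \max_{i}\, \left\lVert \hat{\mathbf{v}}^{(k+1)}_{i}(t) - \hat{\mathbf{v}}^{(k)}_{i}(t) \right\rVert \le \varepsilon \tag{20}$$

2) Hybrid GS–GJ for MTLs: The characteristic of longitudinal partitioning whereby coupling exists only between an odd-numbered and an even-numbered subcircuit (and not between two odd-numbered or two even-numbered subcircuits) is applicable to MTLs as well. Hence, the hybrid iterative technique of Fig. 4 can be easily extended to MTLs.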
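The effect of the odd/even grouping can be illustrated with a small algebraic analogue rather than a full waveform simulation. In the sketch below, each scalar unknown stands in for one subcircuit and is coupled only to its two neighbors, mirroring the odd/even coupling structure described above; the coupling strength `c`, the right-hand side, and the function names are assumptions chosen for illustration, so the iteration counts are not representative of the paper's examples.

```python
# Algebraic analogue of GJ versus the hybrid (odd/even) GS-GJ ordering.
# Unknown x[i] plays the role of subcircuit i+1; only neighbors are coupled.

def jacobi_sweep(x, b, c):
    """GJ: every unknown is updated from the previous iterate only."""
    n = len(x)
    return [b[i] + c * ((x[i - 1] if i > 0 else 0.0)
                        + (x[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]


def hybrid_sweep(x, b, c):
    """Hybrid: group A (indices 0, 2, ...) first, then group B with A's new values."""
    n = len(x)
    y = list(x)
    for start in (0, 1):
        # Members of one group never depend on each other, so this inner
        # loop could run in parallel.
        for i in range(start, n, 2):
            left = y[i - 1] if i > 0 else 0.0
            right = y[i + 1] if i < n - 1 else 0.0
            y[i] = b[i] + c * (left + right)
    return y


def iterate(sweep, b, c, tol=1e-10, max_iter=10000):
    """Repeat sweeps until successive iterates differ by less than tol."""
    x = [0.0] * len(b)
    for k in range(1, max_iter + 1):
        x_new = sweep(x, b, c)
        error = max(abs(p - q) for p, q in zip(x_new, x))
        x = x_new
        if error < tol:
            return x, k
    raise RuntimeError("no convergence within max_iter sweeps")


if __name__ == "__main__":
    rhs = [1.0] + [0.0] * 19
    _, gj_iterations = iterate(jacobi_sweep, rhs, 0.45)
    _, hybrid_iterations = iterate(hybrid_sweep, rhs, 0.45)
    print(gj_iterations, hybrid_iterations)
```

Both orderings converge to the same solution; the hybrid ordering needs fewer sweeps because group B already sees the values that group A produced within the same sweep, while each half-sweep remains fully parallel.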
Assuming that the waveforms of all of the relaxation sources responsible for exciting the subcircuits of group A are known from the previous ($k$th) iteration, the $m_A$ subcircuits of group A can be solved in parallel via the GJ technique explained in the previous section. Once the GJ solve is concluded, the voltage waveforms determined from the present ($(k+1)$th) iteration of group A are used to update the relaxation sources responsible for exciting only the subcircuits of group B as

$$\begin{aligned} \hat{w}^{(k+1)}_{i-1,2,q}(t) &= 2\hat{v}^{(k+1)}_{i-1,1,q}(t-\tau_q) - \hat{w}^{(k)}_{i-1,1,q}(t-\tau_q)\\ \hat{w}^{(k+1)}_{i,1,q}(t) &= 2\hat{v}^{(k+1)}_{i,2,q}(t-\tau_q) - \hat{w}^{(k)}_{i,2,q}(t-\tau_q) \end{aligned} \qquad i \text{ even} \tag{21}$$

The relaxation sources of (21) now serve as the input for the corresponding subcircuits of group B, and the $m_B$ subcircuits can also be solved in parallel using the GJ technique. The voltage waveforms determined from the present ($(k+1)$th) iteration of group B are used to update the relaxation sources responsible for exciting only the subcircuits of group A for the future $(k+2)$th iteration as

$$\begin{aligned} \hat{w}^{(k+1)}_{i-1,2,q}(t) &= 2\hat{v}^{(k+1)}_{i-1,1,q}(t-\tau_q) - \hat{w}^{(k+1)}_{i-1,1,q}(t-\tau_q)\\ \hat{w}^{(k+1)}_{i,1,q}(t) &= 2\hat{v}^{(k+1)}_{i,2,q}(t-\tau_q) - \hat{w}^{(k+1)}_{i,2,q}(t-\tau_q) \end{aligned} \qquad i \text{ odd} \tag{22}$$

The above iterative cycle continues until the absolute error of the iterations satisfies the error tolerance as in (20). Equations (21)–(22) provide twice the amount of waveform exchange compared to the single waveform exchange of (19); hence, the hybrid technique exhibits improved convergence compared with the GJ technique.

IV. COMPUTATIONAL COMPLEXITY OF THE PROPOSED ALGORITHM

The analysis begins by considering a general MTL network of Fig. 2 discretized into $m$ DEPACT sections. Assuming each DEPACT section to be described using $\rho$ delayed ordinary differential equations, the size of the overall circuit matrix describing the original network is $m\rho \times m\rho$. The computational complexity of directly inverting the above matrix to perform time-domain analysis is $O(m^3\rho^3)$ [36], [37]. However, the matrices obtained by traditional circuit simulators are sparse by nature and can be solved more efficiently using sparse matrix routines at a cost of $O\left((m\rho)^{\alpha}\right)$, where the exponent $\alpha > 1$ depends on the sparsity of the matrix [11]. For large distributed networks, the interconnects have to be discretized into many segments to accurately capture the response at the output ports. For such cases, the superlinear scaling of the computational cost of traditional circuit simulators is a major factor limiting their applicability. To address this issue in the proposed WR algorithm, the $m$ DEPACT sections are separated into $m$ subcircuits, each described using $\rho$ delayed differential equations, which can now be solved independently. The total computational cost of the proposed WR algorithm is mathematically quantified using the following lemmas.

Lemma 1: For $m$ subcircuits, the computational cost of the proposed WR algorithm using traditional GJ iterations is $O(N_{GJ}\, m / P)$, where $N_{GJ}$ is the number of iterations and $P$ is the number of CPUs available for parallel processing.

Proof: For typical WR algorithms, the total computational cost can be divided into two parts: the first is to solve the subcircuits independently, and the second is to update the relaxation sources. It is assumed that the cost of solving one subcircuit scales as $k_1\rho^{\alpha}$, where $k_1$ is the scaling coefficient. Using a GJ iterative technique, where the task of independently solving $m$ subcircuits can be distributed over $P$ CPUs, the total cost of solving the subcircuits per iteration is given by $k_1\rho^{\alpha} m / P$. The second stage of the algorithm involves updating the relaxation sources using (8) and (19). This translates to the solution, per iteration, of a number of linear algebraic equations in the time domain that is proportional to $mN$, where $N$ is the number of coupled lines. Since the equations are all decoupled, they can be solved independently in parallel using $P$ CPUs for a cost of $k_2 mN / P$, where $k_2$ is the scaling coefficient for the second part of the proposed WR algorithm. Since, within the context of this analysis, $N$ is a constant, it can be absorbed into $k_2$ and the above cost can be rewritten as $k_2 m / P$. The total cost of each iteration is the sum of the above costs, given as

$$C_{it} = k_1\rho^{\alpha}\frac{m}{P} + k_2\frac{m}{P} \tag{23}$$

where $C_{it}$ is the cost of each GJ iteration. Since the above process needs to be repeated for $N_{GJ}$ iterations, the total cost of the proposed algorithm using traditional GJ is

$$C_{GJ} = N_{GJ}\left(k_1\rho^{\alpha} + k_2\right)\frac{m}{P} \tag{24}$$

where $C_{GJ}$ is the total cost of the proposed algorithm using GJ. It is observed that the solution of the linear algebraic equations to update the relaxation sources of (8) and (19) does not involve any matrix inversion. On the other hand, the solution of each subcircuit involves the inversion of a matrix of size $\rho \times \rho$. As a result, the cost of solving the subcircuits (first part) is found to dominate the cost of updating the relaxation sources (second part) [13] (i.e., $k_1\rho^{\alpha} \gg k_2$). Hence, the result of (24) can be simplified to

$$C_{GJ} \approx N_{GJ}\, k_1\rho^{\alpha}\frac{m}{P} \tag{25}$$

where, within the context of this work, $\rho$ is a function of the number of MTLs and is treated as a constant. Equation (25) demonstrates that the proposed WR algorithm scales as $O(N_{GJ}\, m / P)$ when using the traditional GJ. The following lemma extends the above analysis to the hybrid iterative technique.

Lemma 2: For $m$ subcircuits, the computational cost of the proposed WR algorithm using the hybrid GS–GJ iterations is $O(N_{H}\, m / P)$, where $N_H$ is the number of iterations.

Fig. 6. Circuit of Example 1.

Proof: The cost of the proposed WR algorithm using the hybrid iterative technique can be divided into two parts: the first is to solve the $m_A$ subcircuits and update the relaxation sources using (21), and the second is to solve the $m_B$ subcircuits and update the relaxation sources using (22). Since updating the relaxation sources using (21) and (22) does not require any matrix inversion, the contribution of solving (21) and (22) is minimal compared with the cost of the solution of each subcircuit. As a result, the total cost of the hybrid iterative technique can be approximated as simply the cost of the independent solution of the $m_A$ and $m_B$ subcircuits. The computational cost of solving the $m_A$ subcircuits per iteration using the GJ technique with $P$ parallel CPUs is given by $k_1\rho^{\alpha} m_A / P$ (from Lemma 1). Similarly, the cost of the $m_B$ subcircuits per iteration is approximated as $k_1\rho^{\alpha} m_B / P$.
Since the solution of the $m_A$ and $m_B$ subcircuits proceeds in sequence, the total cost of the hybrid technique per iteration is the sum of the above two costs, given here as

$$C^{H}_{it} = k_1\rho^{\alpha}\frac{m_A}{P} + k_1\rho^{\alpha}\frac{m_B}{P} \tag{26}$$

where $k_1\rho^{\alpha}$ is the cost of solving one subcircuit and $P$ is the number of CPUs, as in Lemma 1. Multiplying the above cost by the number of iterations (in this case, $N_H$) provides an estimate of the full computational cost of the proposed WR algorithm using the proposed GS–GJ hybrid iterative technique as follows:

$$C_{H} \approx N_{H}\, k_1\rho^{\alpha}\,\frac{m_A + m_B}{P} \tag{27}$$

From the definition of $m_A$ and $m_B$ in (10), (27) can be approximated as

$$C_{H} \approx N_{H}\, k_1\rho^{\alpha}\,\frac{m}{P} \tag{28}$$

Equation (28) demonstrates that the proposed WR algorithm scales as $O(N_H\, m / P)$ when using the hybrid iterative technique. Comparing the scaling of (25) and (28) with the number of available CPUs $P$, it is seen that the hybrid iterative technique retains the same high degree of parallelizability as the GJ technique. However, the hybrid technique has the added advantage of faster convergence over its GJ counterpart due to the greater exchange of waveforms using (11) and (12) and (21) and (22), compared with the single exchange of (8) and (19). It is observed that the main reason behind the attractiveness of the proposed algorithm [whether using GJ as in (25) or the hybrid technique as in (28)] is the ability to solve the subcircuits independently. This translates to an almost linear scaling of the computational cost of the proposed algorithm with the number of DEPACT sections $m$, unlike SPICE, which suffers from superlinear scaling. In addition, using GJ or the hybrid technique provides a further advantage over SPICE (and over GS-based WR algorithms like [25]): the computational cost of the proposed algorithm can be divided over multiple CPUs $P$. These results will be validated using the numerical examples in Section V.

V. NUMERICAL EXAMPLES

Three examples are presented here to demonstrate the validity and efficiency of the proposed algorithm. For a fair comparison of the proposed work with full SPICE simulations, all of the subcircuits of the WR iterations are also solved using SPICE.
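The scaling argument of Section IV, summarized by (25) and (28), can be turned into a small back-of-the-envelope calculator. The sketch below assumes the cost expressions in the form "iterations × per-subcircuit cost × subcircuits / CPUs" for the WR variants and a sparse-solver cost of the form "(circuit size) to the power alpha" for the unpartitioned circuit; the function names, the default coefficient `k1 = 1`, and the sample numbers are hypothetical and only meant to show the trend.

```python
# Simple cost models for the scaling discussion; all names are illustrative.

def cost_full_simulation(m, rho, alpha):
    """Sparse-solver cost model for the unpartitioned circuit: (m*rho)**alpha."""
    return (m * rho) ** alpha


def cost_wr_gj(m, rho, alpha, n_iter, n_cpu, k1=1.0):
    """GJ cost model: n_iter sweeps over m subcircuits shared by n_cpu CPUs."""
    return n_iter * k1 * rho ** alpha * m / n_cpu


def cost_wr_hybrid(m, rho, alpha, n_iter, n_cpu, k1=1.0):
    """Hybrid cost model: groups A and B are solved one after the other."""
    m_a = (m + m % 2) // 2  # odd-numbered subcircuits
    m_b = (m - m % 2) // 2  # even-numbered subcircuits
    return n_iter * k1 * rho ** alpha * (m_a + m_b) / n_cpu


if __name__ == "__main__":
    for sections in (100, 200, 400):
        print(sections,
              cost_full_simulation(sections, rho=10, alpha=1.5),
              cost_wr_gj(sections, rho=10, alpha=1.5, n_iter=8, n_cpu=4),
              cost_wr_hybrid(sections, rho=10, alpha=1.5, n_iter=5, n_cpu=4))
```

Under these assumptions, doubling the number of sections doubles both WR costs but multiplies the full-simulation cost by two to the power alpha, and for an equal iteration count the hybrid and GJ models coincide, so any advantage of the hybrid scheme comes from needing fewer iterations.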
A customized C++ code is used to extract the waveforms of the th subcircuit and update the relaxation sources without any external communication between the user and the SPICE engine. The scheduling of each subcircuit solve (whether using the GJ or GS–GJ technique) is automated using MATLAB 2010b. Within the context of this work, full SPICE simulations refer to the DEPACT algorithm of [26] and [27].

Example 1: The objective of this example is to demonstrate the accuracy of the proposed WR algorithm and the superior convergence of the hybrid iterative technique over the traditional GJ technique. For this example, a transmission line network consisting of seven transmission line segments as shown in Fig. 6 is considered. The p.u.l. parameters of the network are 0.25 Ω/cm, 4 nH/cm, pF/cm, mmho/cm, and 5 /cm, where represents the skin effect losses as a function of frequency [38], [39]. The network is excited by a trapezoidal voltage source of rise time 0.1 ns, pulsewidth 5 ns, and amplitude of 2 V, and is loaded with two SPICE level 49 CMOS inverters using 180-nm technology.

Fig. 7. Transient response for Example 1 using the proposed algorithm and full SPICE simulation. All line lengths are 30 cm. (a), (b) Transient responses at the output ports.

Fig. 8. Convergence properties of the proposed hybrid iterative technique compared to GJ.

To illustrate the accuracy of the proposed algorithm, the line length of each segment is set to 30 cm. In this case, the number of subcircuits required is 420. The network is then solved using both proposed work and the full SPICE simulation. The proposed work uses the hybrid iterative technique to solve the subcircuits on a sequential platform with the predefined error tolerance set to and an initial guess of the relaxation sources set to the dc solution of zero.
The transient responses at the far end of the network using the proposed WR algorithm and full SPICE simulations are shown in Fig. 7. Next, the convergence properties of the proposed hybrid technique are compared with the traditional GJ technique. For each algorithm, the number of iterations is varied from 1 to 10 and the scaling of the associated error [ of (9)] is displayed in Fig. 8. It is observed that the proposed hybrid technique shows significantly faster convergence than the traditional GJ algorithm. This is due to the fact that the proposed hybrid technique involves twice the amount of information exchange as the GJ technique for the same number of iterations (see Sections III-B and III-D).

Example 2: The objective of this example is to illustrate the computational efficiency of the proposed work over full SPICE simulations for MTL structures. For this example, a seven-coupled line network with the physical dimensions as shown in Fig. 9(a) is considered. The p.u.l. parameters for this example are extracted from the HSPICE field solver [38] and include frequency-dependent parameters. For the following analyses, the MTL network topology is shown in Fig. 9(b), where lines 1, 3, 5, and 7 are excited with trapezoidal voltage sources of rise time 0.1 ns, pulsewidth 5 ns, and amplitude of 2 V.

This example begins with a demonstration of the performance of the proposed work compared with full SPICE simulations as the size of the network increases. The line length of the network in Fig. 9(b) is increased from 0 to 200 cm in steps of 10 cm. To accurately model the network, the number of subcircuits is increased in steps of 16 for each 10-cm step and ranges from 0 to 320. For each case, the network is solved using both proposed work and the full SPICE simulation.
The proposed work uses both the hybrid technique and the traditional GJ technique on a sequential platform with the predefined error tolerance set to and an initial guess of the relaxation sources set to the dc solution of zero. For this particular error tolerance, the number of iterations required for convergence is found to be consistently between 5 and 6. The accuracy of the proposed work (with the hybrid technique) compared to full SPICE simulation is illustrated in Fig. 10 for a line length of 50 cm (i.e., for 80 subcircuits). The scaling of the computational cost of both proposed work and full SPICE simulation with the line length is shown in Fig. 11(a). It is observed from Fig. 11(a) that the proposed work scales almost linearly for both GJ and the hybrid algorithm, as predicted in (25) and (28), respectively, while the full SPICE solution of the original network scales superlinearly as where for this example. In addition, the hybrid iterative technique converges twice as fast as the traditional GJ technique.

Fig. 9. Transmission line structure of Example 2.

Fig. 10. Transient response for Example 2 using the proposed WR algorithm and the full SPICE simulation. (a), (b) Transient responses at the output ports.

Fig. 11. Scaling of computational cost for Example 2. (a) Scaling of computational cost with line length. (b) Scaling of CPU speed up with number of CPUs.

Next, the performance of the proposed work is demonstrated on a parallel platform. The length of the network is fixed at the corner of our design space where cm and the network is solved using both proposed work and full SPICE simulation. The proposed WR iterations are performed using both the hybrid technique and the traditional GJ technique where the number of processors is varied from to for the same error tolerance as before.
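The roughly twofold convergence advantage of the hybrid technique over GJ noted in this example can be illustrated on a toy problem. The sketch below is only an analogy, not the WR algorithm of this paper: it replaces waveforms with scalar unknowns coupled in a chain (a tridiagonal linear system), and the coefficients `diag` and `off`, the chain length, the even-then-odd ordering, and all function names are assumptions chosen for illustration.

```python
def jacobi_sweep(x, b, diag, off):
    """One GJ-style sweep: every unknown is updated from the previous iterate."""
    n = len(x)
    new = list(x)
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        new[i] = (b[i] - off * (left + right)) / diag
    return new


def hybrid_sweep(x, b, diag, off):
    """One two-phase sweep: even-indexed unknowns first, then odd-indexed."""
    n = len(x)
    new = list(x)
    for parity in (0, 1):
        # Unknowns of one parity depend only on the other parity, so each
        # phase could run in parallel; phase 2 sees the phase-1 results.
        for i in range(parity, n, 2):
            left = new[i - 1] if i > 0 else 0.0
            right = new[i + 1] if i < n - 1 else 0.0
            new[i] = (b[i] - off * (left + right)) / diag
    return new


def max_error(sweep, n=20, n_iter=10, diag=2.5, off=-1.0):
    """Max-norm error after n_iter sweeps; the exact solution is all ones."""
    # Right-hand side chosen so that x = [1, 1, ..., 1] solves the system.
    b = [diag + off * ((i > 0) + (i < n - 1)) for i in range(n)]
    x = [0.0] * n  # zero initial guess
    for _ in range(n_iter):
        x = sweep(x, b, diag, off)
    return max(abs(v - 1.0) for v in x)


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, max_error(jacobi_sweep, n_iter=k), max_error(hybrid_sweep, n_iter=k))
```

On this toy chain, one two-phase sweep passes information between neighbors twice, so it reduces the error about as much as two Jacobi sweeps while each phase remains parallelizable. The correspondence with the waveform exchange of the hybrid technique is qualitative only.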
The CPU speed up offered by both iterative techniques over full SPICE simulations is shown in Fig. 11(b) and summarized in Table I. The speed up for either iterative technique scales almost linearly with the number of processors, thereby demonstrating the high parallelizability of both, as theoretically expected from (25) and (28). The minor deviation of Fig. 11(b) from the exactly linear scaling of (25) and (28) with respect to the number of CPUs is due to the communication overheads incurred between processors.

TABLE I CPU TIME COMPARISON FOR EXAMPLE 2

Example 3: For this example, a network consisting of a cascade of subnetworks as shown in Fig. 12 is considered. Each subnetwork consists of the three coupled MTL structure of [40] with line length cm. For the following analysis, lines one and three of the network are excited with a trapezoidal voltage source of rise time ns, pulsewidth ns, and amplitude of 5 V. Each subnetwork is modeled using eight subcircuits.

Fig. 12. Circuit of Example 3.

Fig. 13. Scaling of computational cost for Example 3. (a) Scaling of computational cost with number of subnetworks. (b) Scaling of computational cost with number of CPUs.

In this analysis, the number of subnetworks ( of Fig. 12) is increased from 0 to 50 in steps of 5 (i.e., the number of subcircuits is increased from 0 to 400 in steps of 40). For each case, the network is solved using both proposed work and the full SPICE simulation. The WR iterations for the proposed work are performed using the hybrid technique on a sequential machine with the predefined error tolerance set to and an initial guess of the relaxation sources set to the dc solution of zero. For this particular error tolerance, the number of iterations required for convergence was found to be consistently between 6 and 7.
The scaling of the computational cost of both proposed work and full SPICE simulation with the number of subnetworks is demonstrated in Fig. 13(a). Similar to the previous example, the proposed WR algorithm shows linear scaling with the size of the network compared to the superlinear scaling of full SPICE ( where for this example).

Next, the performance of the proposed work is demonstrated on a parallel platform. The number of subnetworks is fixed at the corner of our design space where and the network is solved using both proposed work and full SPICE simulation. The proposed WR iterations are performed on a parallel platform where the number of processors is varied from to and the same error tolerance of is used with an initial guess of the relaxation sources set to the dc solution of zero. The scaling of the CPU speed up offered by the proposed algorithm over full SPICE simulations as a function of the number of processors is shown in Fig. 13(b) and summarized in Table II. As expected, the speed up for the proposed WR algorithm scales almost linearly with the number of processors, similar to Example 2.

TABLE II CPU TIME COMPARISON FOR EXAMPLE 3

VI. CONCLUSION

In this paper, a longitudinal-partitioning-based waveform relaxation algorithm for efficient transient analysis of distributed transmission-line networks is presented. The proposed methodology represents lossy transmission lines as a cascade of lumped circuit elements alternating with lossless line segments, where the lossless line segments are modeled using the method of characteristics. Partitioning the transmission lines at the natural interfaces provided by the method of characteristics allows the resulting subcircuits to be weakly coupled by construction. The subcircuits are solved independently using a hybrid iterative technique that combines the fast convergence of the proposed GS technique with the parallelizability of the GJ technique.
Numerical examples illustrate that the proposed algorithm exhibits good scaling with both the size of the network and the number of CPUs available for parallel processing, thereby providing significant savings in run time costs compared with full SPICE simulations.

REFERENCES

[1] R. Achar and M. Nakhla, “Simulation of high-speed interconnects,” Proc. IEEE, vol. 89, no. 5, pp. 693–728, May 2001.
[2] E. Lelarasmee, A. E. Ruehli, and A. L. Sangiovanni-Vincentelli, “The waveform relaxation method for time-domain analysis of large-scale integrated circuits,” IEEE Trans. Comput.-Aided Des. (CAD) Integr. Circuits Syst., vol. CAD-1, no. 3, pp. 131–145, Jul. 1982.
[3] J. White and A. L. Sangiovanni-Vincentelli, Relaxation Techniques for the Simulation of VLSI Circuits. Norwell, MA: Kluwer, 1987.
[4] F. Y. Chang, “The generalized method of characteristics for waveform relaxation analysis of lossy coupled transmission lines,” IEEE Trans. Microw. Theory Tech., vol. 37, no. 12, pp. 2028–2038, Dec. 1989.
[5] F. Y. Chang, “Waveform relaxation analysis of RLCG transmission lines,” IEEE Trans. Circuits Syst., vol. 37, no. 11, pp. 1394–1415, Nov. 1990.
[6] F. Y. Chang, “Relaxation simulation of transverse electromagnetic wave propagation in coupled transmission lines,” IEEE Trans. Circuits Syst., vol. 38, no. 8, pp. 916–936, Aug. 1991.
[7] F. Y. Chang, “Waveform relaxation analysis of nonuniform lossy transmission lines characterized with frequency dependent parameters,” IEEE Trans. Circuits Syst., vol. 38, no. 12, pp. 1484–1500, Dec. 1991.
[8] F. Y. Chang, “Transient simulation of nonuniform coupled lossy transmission lines characterized with frequency-dependent parameters—Part I: Waveform relaxation analysis,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 39, no. 8, pp. 585–603, Aug. 1992.
[9] J. Mao and Z.
Li, “Waveform relaxation solution of ABCD matrices of nonuniform transmission lines for transient analysis,” IEEE Trans. Comput.-Aided Des. (CAD) Integr. Circuits Syst., vol. 13, no. 11, pp. 1409–1412, Nov. 1994.
[10] F. C. M. Lau and E. M. Deeley, “Transient analysis of lossy coupled transmission lines in a lossy medium using the waveform relaxation method,” IEEE Trans. Microw. Theory Tech., vol. 43, no. 3, pp. 692–697, Mar. 1995.
[11] N. M. Nakhla, A. E. Ruehli, R. Achar, and M. S. Nakhla, “Simulation of coupled interconnects using waveform relaxation and transverse partitioning,” IEEE Trans. Adv. Packag., vol. 29, no. 1, pp. 78–87, Feb. 2006.
[12] N. Nakhla, A. E. Ruehli, M. S. Nakhla, R. Achar, and C. Chen, “Waveform relaxation techniques for simulation of coupled interconnects with frequency-dependent parameters,” IEEE Trans. Adv. Packag., vol. 30, no. 2, pp. 257–269, May 2007.
[13] D. Paul, N. M. Nakhla, R. Achar, and M. S. Nakhla, “Parallel simulation of massively coupled interconnect networks,” IEEE Trans. Adv. Packag., vol. 33, no. 1, pp. 115–127, Feb. 2010.
[14] Y.-Z. Xie, F. G. Canavero, T. Maestri, and Z.-J. Wang, “Crosstalk analysis of multiconductor transmission lines based on distributed analytical representation and iterative technique,” IEEE Trans. Electromagn. Compatibil., vol. 52, no. 3, pp. 712–727, Aug. 2010.
[15] R. Achar, M. S. Nakhla, H. S. Dhindsa, A. R. Sridhar, D. Paul, and N. M. Nakhla, “Parallel and scalable transient simulator for power grids via waveform relaxation (PTS-PWR),” IEEE Trans. Very Large-Scale Integr. (VLSI) Syst., vol. 19, no. 2, pp. 319–332, Feb. 2011.
[16] M. Al-Khaleel, A. E. Ruehli, and M. J. Gander, “Optimized waveform relaxation methods for longitudinal partitioning of transmission lines,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 9, pp. 1732–1743, Aug. 2009.
[17] M. J. Gander and A. Stuart, “Space-time continuous analysis of waveform relaxation for the heat equation,” SIAM J. Sci. Comput., vol.
19, no. 6, pp. 2014–2031, Nov. 1998.
[18] E. Giladi and H. B. Keller, “Space time domain decomposition for parabolic problems,” Numer. Math., vol. 93, no. 2, pp. 279–313, 2002.
[19] W. T. Beyene, “Application of multilinear and waveform relaxation methods for efficient simulation of interconnect-dominated nonlinear networks,” IEEE Trans. Adv. Packag., vol. 31, no. 3, pp. 637–648, Aug. 2008.
[20] V. B. Dmitriev-Zdorov and B. Klaassen, “An improved relaxation approach for mixed system analysis with several simulation tools,” in Proc. EURO-DAC, 1995, pp. 274–279.
[21] V. B. Dmitriev-Zdorov, “Generalized coupling as a way to improve the convergence in relaxation-based solvers,” in Proc. EURO-DAC/EURO-VHDL Exhib., Geneva, Switzerland, Sep. 1996.
[22] M. J. Gander and L. Halpern, “Optimized Schwarz waveform relaxation methods for advection reaction diffusion problems,” SIAM J. Numer. Anal., vol. 45, no. 2, pp. 666–697, Apr. 2007.
[23] M. J. Gander, “Overlapping Schwarz waveform relaxation methods for parabolic problems,” in Proc. Algoritmy, 1997, pp. 425–431.
[24] C. R. Paul, Analysis of Multiconductor Transmission Lines. New York: Wiley-Interscience, 2008.
[25] S. Roy and A. Dounavis, “Longitudinal partitioning based waveform relaxation algorithm for transient analysis of long delay transmission lines,” in IEEE MTT-S Int. Microw. Symp. Dig., Baltimore, Jun. 2011, pp. 1–4.
[26] N. Nakhla, A. Dounavis, R. Achar, and M. S. Nakhla, “DEPACT: Delay extraction-based passive compact transmission-line macromodeling algorithm,” IEEE Trans. Adv. Packag., vol. 28, no. 1, pp. 13–23, Feb. 2005.
[27] N. Nakhla, M. S. Nakhla, and R. Achar, “Simplified delay extraction-based passive transmission line macromodeling algorithm,” IEEE Trans. Adv. Packag., vol. 33, no. 2, pp. 498–509, May 2010.
[28] F. H. Branin, Jr., “Transient analysis of lossless transmission lines,” Proc. IEEE, vol. 55, no. 11, pp. 2012–2013, Nov. 1967.
[29] A. Odabasioglu, M. Celik, and L. T.
Pileggi, “PRIMA: Passive reduced-order interconnect macromodeling algorithm,” IEEE Trans. Comput.-Aided Des. (CAD) Integr. Circuits Syst., vol. 17, no. 8, pp. 645–653, Aug. 1998.
[30] A. Dounavis, R. Achar, and M. Nakhla, “Efficient passive circuit models for distributed networks with frequency-dependent parameters,” IEEE Trans. Adv. Packag., vol. 23, no. 8, pp. 382–392, Aug. 2000.
[31] A. Dounavis, R. Achar, and M. Nakhla, “A general class of passive macromodels for lossy multiconductor transmission lines,” IEEE Trans. Microw. Theory Tech., vol. 49, no. 10, pp. 1686–1696, Oct. 2001.
[32] A. Cangellaris, S. Pasha, J. Prince, and M. Celik, “A new discrete transmission line model for passive model order reduction and macromodeling of high-speed interconnections,” IEEE Trans. Adv. Packag., vol. 22, no. 3, pp. 356–364, Aug. 1999.
[33] Q. Yu, J. M. L. Wang, and E. S. Kuh, “Passive multipoint moment matching model order reduction algorithm on multiport distributed interconnect networks,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 46, no. 1, pp. 140–160, Jan. 1999.
[34] E. Gad and M. Nakhla, “Efficient simulation of nonuniform transmission lines using integrated congruence transform,” IEEE Trans. Very Large-Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 1307–1320, May 2004.
[35] F. Fer, “Resolution de l’equation matricielle par produit infini d’exponentielles matricielles,” Acad. Roy. Belg. Cl. Sci., vol. 44, no. 5, pp. 818–829, 1958.
[36] J. D. Dixon, “Exact solution of linear equations using p-adic expansions,” Numerische Mathematik, vol. 40, no. 1, pp. 137–141, 1982.
[37] W. Eberly, M. Giesbrecht, P. Giorgi, A. Storjohann, and G. Villard, “Solving sparse integer linear systems,” in Proc. ISSAC’06, Genova, Italy, Jul. 2006, pp. 63–70.
[38] “HSPICE U-2008.09-RA,” Synopsys Inc.
[39] “HSPICE Signal Integrity User Guide,” Synopsys Inc., Sep. 2005.
[40] M. Celik and A. C.
Cangellaris, “Efficient transient simulation of lossy packaging interconnects using moment-matching techniques,” IEEE Trans. Compon., Packag., Manuf. Technol. B, vol. 19, no. 1, pp. 64–73, Feb. 1996.

Sourajeet Roy (S’11) received the B.Tech. degree in electrical engineering from Sikkim Manipal University, India, in 2006, and the M.E.Sc. degree from University of Western Ontario, London, ON, Canada, in 2009, where he is currently working toward the Ph.D. degree. His research interests include modeling and simulation of high speed interconnects, signal and power integrity analysis of electronic packages and design and implementation of parallel algorithms. Mr. Roy was the recipient of the Vice-Chancellors Gold Medal for academic excellence at the undergraduate level.

Anestis Dounavis (S’00–M’03) received the B.Eng. degree from McGill University, Montreal, QC, Canada, in 1995, and the M.Sc. and Ph.D. degrees from Carleton University, Ottawa, ON, Canada, in 2000 and 2004, respectively, all in electrical engineering. He currently serves as an Associate Professor with the Department of Computer and Electrical Engineering, University of Western Ontario, London, ON, Canada. His research interests are in electronic design automation, simulation of high-speed and microwave networks, signal integrity and numerical algorithms. Dr. Dounavis was the recipient of the Ottawa Centre for Research and Innovation (OCRI) futures award—student researcher of the year in 2004 and the INTEL Best Student Paper Award at the Electrical Performance of Electronic Packaging Conference in 2003. He also received the Carleton University Medal for outstanding graduate work at the M.Sc. and Ph.D. levels in 2000 and 2004, respectively. He was the recipient of the University Student Council Teaching Honour Roll Award at the University of Western Ontario in 2009 to 2010.

Amir Beygi (S’08) received the B.S.
degree in electrical engineering from K.N. Toosi University of Technology, Tehran, Iran, in 2004, the M.S. degree in electrical engineering from Iran University of Science and Technology, Tehran, Iran, in 2007, and the Ph.D. in electrical and computer engineering from The University of Western Ontario, London, ON, Canada, in 2011. His research interests include simulation and modeling algorithms for electromagnetic compatibility and signal integrity of high-speed interconnects.