Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Electrical engineering wikipedia , lookup
Mains electricity wikipedia , lookup
Resistive opto-isolator wikipedia , lookup
Alternating current wikipedia , lookup
Rotary encoder wikipedia , lookup
Analog-to-digital converter wikipedia , lookup
Integrated circuit wikipedia , lookup
A High-Speed Analog Turbo Decoder 77 A High-Speed Analog Turbo Decoder Fotios Gioulekas , Michael Birbas , Alex Birbas, and George Bilionis, Non-members ABSTRACT A new type of iterative decoders based on analog computing networks, which are used to decode powerful error-correcting schemes, such as Turbo and Low-density parity-check (LDPC) codes, outperform their digital counterparts in terms of power consumption and speed. Only few analog Turbo decoders, all of them based on CMOS subthreshold technology have been implemented till now. This paper aims to present the design and enhanced functionality of an all-analog Turbo decoder taking advantage of high-speed features of SiGe HBTs achieving throughput up to Gbits/s. Simulation results based on AMS 0.35µm SiGe BiCMOS technology demonstrate promising performance compared to existing designs. Keywords: Analog Turbo Decoder, High speed, SiGe benefits. 1. INTRODUCTION The superior performance of Turbo [1] and LDPC [2] codes, which almost “touch” the theoretical Shannon capacity limit offer great channel coding gains in telecommunication systems providing low bit and frame error rates at significant low signal to noise ratios. Researchers [3, 4, 5] have observed that this important class of algorithms can be efficiently represented using graphical models (factor graphs). These error-correcting schemes are based on iteratively passing messages (soft information), which correspond to probability mass functions, and their execution can be interpreted as an instance of a general “sum-product algorithm (SPA) [5]”. It has been shown [3] that this algorithm is identical to the a posteriori probability (APP)-BCJR algorithm [6] used in Turbo decoding. The main advantages of this graphical representation are the easy visualization of the algorithm and the simplification of the corresponding equations leading to low-power execution graphs. Digital implementations of Turbo codes and their representative SPA algorithm require complex floating- point computations, large look up tables and memory storage while the iterative nature of the decoding process causes long latencies, thus limiting 04PSI04: Manuscript received on December 20, 2004 ; revised on August 8, 2005. The authors are with the Department of Electrical and Computer Engineering University of Patras Campus of Rion, 26500, Greece. E-mail:{fgiou, mbirbas, birbas, bilionis}@ee.upatras.gr significantly the decoding speed and increasing the power consumption. However, the direct mapping of the SPA algorithm into analog translinear circuits that has been proposed lately outperforms comparable digital decoders improving the ratio of speed to power consumption by two orders of magnitude [7]. Furthermore, a digital error-correcting decoder has to estimate the original message by processing the received analog signal, a task that requires A/D converters of significant resolution. This denotes that the availability of an analog decoder would give the ability to directly process the incoming signal in its natural form avoiding the A/D converter overhead. The aim of this work is to utilize directly the analog received waveform through the design of a 16bit analog Turbo decoder in 0.35µm SiGe BiCMOS technology and to investigate/compare its capabilities against other existing approaches (both digital and analog). This is one of the first implementations in literature since almost all other known ones, which are few anyway, are in CMOS. The rest of the paper is organized as follows: Section 2 reports the related and previous work in the field of analog decoding. In Section 3 an overview of the Turbo code scheme is given and the employed design technique is described. The decoder architecture and its performance based on simulation results are given in section 4. Finally, section 5 compares our decoder to other analog and digital implementation. 2. RELATED WORK AND MOTIVATION The concept of an all-analog decoding scheme was introduced in 1998 [8] while prior work in analog implementations of Viterbi decoders [9, 10] demonstrated better performance in power consumption and speed than their corresponding digital realizations. Implementations of small analog decoders in CMOS and conventional BiCMOS technologies of Hamming [11], tailbiting convolutional [12, 13], Turbo [14, 15, 16] and LDPC [17, 18] codes have been reported as a proof-of-concept. Those analog decoders are constructed by interconnecting highly parallel translinear [7] circuits. These translinear circuits work with soft (continuous) values in continuous-time and are based on the Gillbert multiplier cell [19]. Some research groups [7] follow the current mode approach, where the soft-inputs and soft-outputs (SISO) [20] of the APP algorithm have the form of currents corresponding to discrete probability distributions while other research group [8, 12] follow the voltage mode approach, where log-likelihood ratios 78 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.3, NO.2 AUGUST 2005 (LLRs) are represented as differential voltages. Both approaches exploit the exponential voltage-current characteristic of bipolar transistors as well as of CMOS ones when these operate in the subthreshold region. Thus, the non-linearity of the transistor is exploited rather than fought. Our work adopts the current mode approach based on the fact that the voltage mode approach is more sensitive in temperature [21]. So far, really few analog implementations of Turbo decoders can be found in the literature [14, 15, 16] and almost all of them in CMOS technology. This is mainly due to the fact that CMOS transistors consume less power, but on the other hand are slower and sensitive to noise and process variations when working in the subthreshold region [22], an operation mode that is anyway difficult to maintain/control but it is necessary to keep so as to avoid losing accuracy. These are some of the reasons that led us to choose SiGe BiCMOS technology and in the same time we were able to directly benefit from the superior speed of SiGe heterojunction transistors. Regarding existing implementation of Turbo decoders in SiGe, to our knowledge only one has been reported [23] indicating speed gains but they followed a different architectural approach from our case. Specifically, they used as many component APP decoders as the estimated number of the iterations to build the analog decoder (no sample-and-hold-circuits where used to feed the decoder with the voltage waveform). However, their design did not work properly and so it was not possible to compare our design to theirs. In this work the objective is to prove the concept of realizing efficient analog implementations of Turbo decoders in SiGe technology, and to investigate/evaluate the pros and cons of such realizations while keeping the design procedure and the Turbo decoder in quite realistic standards. accepts a bit information sequence u and provides a set of outputs c = [u, p, q], where p and q are the parity bits produced by RSC1 and RSC2, respectively. After the serialization of the encoder outputs, the modulator forms and transmits them through the channel. In our case the BPSK modulation scheme and the AWGN memoryless channel model have been employed. Both trellises are considered truncated at the end. The Turbo decoder incorporates a parallel-to-serial (P2S) interface, which accepts the corrupted (by noise) signals and feeds them to the two APP decoders. The two decoders exchange iteratively extrinsic information. The extrinsic information, which is a reliability measure of each component decoder’s estimate concerning the transmitted information symbols, is based on the corresponding parity sequence only. This negative feedback scheme is responsible for the Turbo decoder’s stability and superior performance. After a specified number of iterations the decoder makes final hard-decisions about the transmitted information sequence. Figure 1 illustrates the Turbo encoder with 1/3 rate, the trellis diagram of the constituent codes and the Turbo decoder architecture. 3. DESIGN METHODOLOGY 3. 2 Factor Graph Representation 3. 1 Turbo Code Structure The Turbo encoding and decoding procedures can be efficiently represented through a bipartite graphical model, the factor graph, which expresses how a global function of several variables is factored into a product of local functions [3]. A factor graph has a variable node x for each variable, a factor node f for each local function and an edge connecting a variable node x to a factor node f if and only if x is an argument of f. The final cost function-inference (a-posteriori probabilities) is computed by performing the sum-product algorithm and the node-update rules [5]. Figure 2 delineates the factor graph for our Turbo coding scheme. The variable nodes Yu , Yp , Yq are represented by circles and describe the channel observations provided e.g. by the matched filter of a typical BPSK demodulator. The variable nodes U, Q, P represent the transmitted information (u) and parity bits (p and q), respectively. The auxiliary nodes S are used to define the Trellis states. The function The Turbo encoder is constructed by the parallel concatenation of two or more recursive systematic convolutional codes (RSC), termed constituent codes. The input of the first encoder is not permuted. The input to the other encoders is permuted so that the other constituent encoders are fed with the interleaved and uncorrelated form of the information sequence. Our design refers to the turbo encoder that comprises two identical RSC codes and one cyclicshift interleaver [24]. The RSC codes used are described by the following generator polynomial: g (D) 1 + D2 1 G(D) = 1, = 1, g0 (D) 1 + D + D2 (1) The encoding process of an RSC code can be represented by a trellis diagram, which is defined by the relevant generator polynomial. The Turbo encoder Fig.1: (I) Turbo encoder structure, (II) One trellis section, (III) the turbo decoder scheme. A High-Speed Analog Turbo Decoder 79 Fig.3: MATLAB simulation results of BER and FER vs. SNR. Fig.2: The factor graph representation of the 16-bit Turbo code. nodes T, Ch are represented by squares and correspond to the trellis sections (this is a binary function, which its value is true when a codeword belongs to the trellis) and to the transition probabilities of the Gaussian channel, respectively. The methodological steps used for this implementation are described as follows. Firstly, the factor graph model is employed as the input specification. A high-level language like MATLAB simulates the functionality of the Turbo code. The compound encoders are chosen appropriately in order to construct a Turbo code of large “minimum Hamming distance” avoiding “error-floor” problems, if it is possible. The aforementioned encoders (eq.1) of constraint length equal to three have the properties to constitute a Turbo code with low complexity that fulfills the above criteria. In Figure 3 the BER and FER curves are depicted and show the error correcting capability of the employed decoder scheme. 3. 3 The Architecture of the Basic Analog Circuit The next step during the design of the analog Turbo decoder includes the mapping of the Trellis function nodes into the transistor-level analog circuits. The Trellis function node (sum product module) is described and evaluated according to the following equation: pX (x)pY (y)f (x, y, z) (2) pZ (z) = S x∈X y∈Y pX , pY and p − Z are probability distributions (real-valued non-negative functions) defined on some set X, Y, and Z of discrete random variables x, y and z respectively. f is a binary function ranged in the set {0, 1}, which denotes that the two nodes x, y are connected by a trellis branch and S is a normalization factor that does not depend on z. Eq.2 is the elementary computation used in the APP algorithm. A current vector Iv with non-negative components represents the relative probability distributions ([Iv,1 , . . . , Iv,n ] = [pv (v1 ), . . . , pv (vn )]). Figure 4 shows the circuit implementation of the equation (2). The circuit consists of three cores. Two cores are responsible for the interconnection to other computation modules and are based on current mirrors [6, 14]. The scaling up of the currents can be employed either in the input or the output interfacing core. For BiCMOS implementations the scaling up at the input core provides more speed to the circuit [14]. The central core that performs the appropriate computation as defined by (eq.2) is based on the Gilbert multiplier cell. In our case the computation core and the input core employ HBTs transistors while the output core uses PMOS ones. The next steps of the design methodology involve schematic design and Spice-level circuit simulations. Transistor low-level effects are taken into account in order to determine their impact to the error-correcting performance of the decoder. The design is verified comparing the circuit simulations with the factor-graph specification (MATLAB simulation results of the factor graph model). Finally, the layout and the chip is constructed and it is ready for fabrication. 4. THE ANALOG TURBO DECODER ARCHITECTURE 4. 1 Analog Components of the Turbo Decoder The analog Turbo decoder architecture includes a serial-to-parallel (S2P) interface, two APP modules, an interleaver and the output stages, which in- 80 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.3, NO.2 AUGUST 2005 Fig.4: Generic circuit implementation of the equation (2). Fig.5: The 16-bit analog Turbo decoder architecture. volve current comparators and shift registers in order to serialize the output (parallel to serial interfaceP2S). Due to the fact that this analog network works continuously in time, the inputs should be fed simultaneously. The demodulator provides soft-values (LLRs), which are an estimation of the corresponding codeword. This output has the form of a voltage waveform representing the LLR values (LLR= ln[P(y/u=1)/P(y/u=0)]). In order to avoid the use of a large number of the decoder pin-outs, a S2P interface has been employed [25], which comprises two pipeline stages embedding two sample-and-hold circuits. This allows the decoder to process the analog signal frame-by-frame. The role of the two stage buffer is firstly, to keep track of the corresponding soft-value and secondly, to feed in parallel the decoder inputs through a voltage to current converter (V2I). Figure 5 depicts the analog Turbo decoder’s block diagram architecture. The probability propagation (SPA) is applied at the factor graph’s level. Figure 6 presents the basic components of a single APP decoder used in the analog Turbo decoder. Each circuit line represents Fig.6: The block diagram of the APP decoder with 16-bit length. a probability distribution function and consequently the relevant current vector. The boxes represent the computations based on the elementary module given in eq.2. Specifically, the A-type modules perform the branch metric computation γ. (It associates the channel transition probabilities of the information and parity bits with the relevant edge of the trellis.) The B-type modules multiply the branch metric by the extrinsic information δ(u) provided by the other APP decoder. The C-type and D-type modules are responsible for the computation of the forward α and backward metrics b, respectively. The E-type modules multiply the forward metric by the backward one and the F-type modules compute the extrinsic information E. Finally, the G-type modules provide the a-posteriori probabilities U of the information bits. The above computations are based on the SumProduct algorithm and are given analytically in Table 1. S is a scaling-up coefficient (it differs from module to module), P (y/u) and P (y1 /p) are the channel transition probabilities for the information and the parity bits, respectively. The execution time t corresponds to the relevant trellis section. 4. 2 Transistor Level Design The analog decoder has been designed using a 0.35 µm SiGe BiCMOS process from AMS. The analog circuit operates with a 5V supply. The S2P interface feeds the voltage-to-current converters, which are actually differential amplifiers. To efficiently map the LLR values to voltages, we need to evaluate the dynamic range of the transistors. Simulations/optimizations have shown that for our design the common mode voltage equals to VCM =2.2 V. The A High-Speed Analog Turbo Decoder 81 Table 1: APP Computations. Fig.7: V2I component. range of the single-ended voltage Vinput is [2.0, 2.4] volts. Therefore, the LLR values are converted to differential voltages (∆Vdif f erential = 0.2Vp−p ) by scaling and shifting them with constant factors. In our case, the following equation holds. Vinput = A ∗ Vth ∗ LLR + VCM = 0.0282 ∗ LLR + 2.2 (3) The differential currents (I0 and I1 ) are provided to the analog decoder APPs through the V2Is components (log-likelihood ratios to probabilities conversion). The differential currents represent channel transition probabilities (see Figure 7). Thus the exponential characteristic of the HBTs regarding the base-emitter voltage versus the collector current is expressed as follows: I0 I = I 1 ∆V −V th 1+e = I0 + I1 , I1 = I ∆V −V e th ∆V −V 1+e th This circuit is a translinear circuit based on a Gilbert cell that calculates the branch metric and it is depicted in Figure 8 (I). The relevant circuit of the module C that performs the forward recursion of the APP algorithm is also shown in Figure 8 (II). The bottom HBT transistor (in both Figures 8 (I) and 8 (II)) is responsible for the bias current and the scaling-up of the input currents. This scaling-up in the input speeds-up the decoder. If chosen properly this bias current corresponds to the probability of one. The 4 diode connected HBTs transistors (Figure 8 (I)) are used to mirror the input currents into the basic multiplication circuitry. The rest of the HBTs transistors perform the assigned calculation. The pMOS transistors mirror the output current and allow the direct cascading of the modules avoiding extra interface circuitry. It is obvious that by increasing the bias current and the emitter length of the HBTs, the power consumption is increased too. In order to keep the power consumption and the die area to an acceptable magnitude, the bias current has been chosen to be equal to Ibias = 688µA and the emitter length of the HBTs equals to Lemitter = 8×0.35µm. The channel width of each PMOS transistor is W = 10µm. The values of the reference voltages are VRef a = 2.3 Volts, VRef b = 1.3 Volts and Vb = 1.2 Volts. The value of the resistor equals to R = 505.65Ω. Same comments hold for the similar circuits of the C, D, E, F and G modules. (4) 4. 3 Simulation Results As mentioned in section 4.1 the module A calculates the branch metric of the relevant trellis section. SPICE simulation of the analog Turbo decoder is a very time-consuming procedure. In order to simulate 82 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.3, NO.2 AUGUST 2005 settled within 8ns while existing BiCMOS and CMOS designs [14] and [10] require 50ns and 1µs, respectively. Similar behavior has also been reported [17] but only for a single APP decoder. (Similar graph with Figure 9 have been obtained for the 16-bit case but only the 10-bit case is shown here for reasons of clarity.) The following figure shows this subset of the output differential currents, which correspond to the aposteriori probabilities, provided the transmitted bit is zero P(ui = 0/yi ). The error-received bits 3 and 4 were finally corrected at the end of 8ns. Fig.9: Analog Turbo decoder’s error correction capability vs. settling time for 30 bits codeword (10 information bits) at SNR=0.8dB. Fig.8: (I) Branch metric calculation, (II) schematic of the forward recursion. its performance, we have chosen to simulate a Turbo decoder of 10 and 16 information bits. The analog Turbo decoder is a continuous-time asynchronous network including a continuous-time feedback loop. After feeding it with the voltage waveform, which corresponds to a received frame, the decoder settles to a steady stage and its output provides an estimation of the transmitted information bits. Hard decisions are made on soft outputs using a bank of current comparators. Therefore, it avoids the discrete iterations that its digital counterpart performs before making the hard decision. Figure 9 illustrates the transient response of the proposed analog decoder and characterizes its performance. The use of SiGe HBTs has as a result a reduction to the decoder’s settling time. The used HBTs have a transient frequency of approximately fT =60 GHz. This gives a significant advantage comparing to the CMOS and conventional BiCMOS processes used in other analog decoding designs. The simulated transient response shows that the current values are The decoding process of the analog network is divided into two phases. The first phase is related to the decoder reset by feeding it with an analog signal that produces equal currents. The time delay for this procedure has been estimated by simulations to be equal to Treset = 5ns. We reset the decoder for every received frame. The other phase involves the supply of the received frame to the decoder and its decoding, and it is characterized by the settling time Tsettling . Thus the total decoding delay equals approximately to Td = Treset + Tsettling = 13ns. This gives a total throughput of 16×75M b/s = 1.23Gb/s approximately for a 16-bit decoder. Figure 10 shows an example of the decoding progress of the 16-bit analog decoder. A decoder’s performance and its capability are at a great extent also determined and evaluated by its bit error (BER) rate at various SNR values. The performance of the specific 16-bit analog decoder is given in Figure 11, where the simulated BER is drawn as a function of the time Tsettling for SNR=0.4 and 0.8 dB. From the graph, we observed that as the decoder size increases the BER values are decreased, and its errorcorrecting capability improves as the settling time increases, thus exhibiting the desired performance. A High-Speed Analog Turbo Decoder 83 Table 2: coders. Fig.10: Transient response of the decoding process for 5 consecutive received frames (SNR=0.8dB). Throughput of Digital and Analog De- Decoder Information bits Throughput Analog SiGe Turbo decoder (This work) 16 1.2Gb/s Analog CMOS Turbo decoders [8, 10] 16-40 2-13Mb/s FPGA based Turbo decoder [20, 21 ,22] 4k-64k (6.1-6.5) Kb/s- Digital ASIC Turbo decoder [23] 16k 356Kb/s 311Mb/s reliable metric for comparing various decoders implemented in different technologies (in general BiCMOS designs are faster but consume more power than the relevant CMOS ones). Table 3 depicts the performance of analog and digital decoders based on this metric. Table 3: Energy consumption per decoded bit for digital and analog Turbo Decoders. Energy consumption per decoded bit (nJ/decoded bit) Fig.11: BER performance of the analog Turbo decoder as a function of the settling time. 5. COMPARISON OF THE PROPOSED DECODER WITH EXISTING ANALOG AND DIGITAL IMPLEMENTATIONS The simulation results show that our 16-bit analog decoder under implementation, which uses SiGe BiCMOS technology, can achieve a high-throughput of 1.2Gb/s approximately. On the other hand, the throughput of existing analog CMOS implementations [9, 10] varies from 2-13 Mb/s. This indicates the applicability of this analog decoder to high-performance systems, like storage applications, VDSL systems as well as optical systems. The throughput of digital implementations based on Field Programmable Gate Arrays (FPGAs) ranges from 356Kbit/s (64k info bits) [20] to 6.1-6.5Mb/s (4k-info bits) [21, 22] while the fastest ASIC implementation [23] of Turbo codes achieves throughput of 311Mbit/s (16k-info bits) at the expense of high power consumption and silicon area. Table 2 shows the throughput of Digital and Analog Turbo Decoders. The energy consumption per decoded bit at a specific operating supply voltage can be considered as a Decoders 7.088@5V (Estimated) Analog Turbo decoder in 0.35µm SiGe BiCMOS (This work) [email protected] (Measured) 0.35µmCMOS Analog Turbo decoder [8] [email protected] (Measured) 0.35µm CMOS Analog Turbo decoder [10] 40@5V (Measured) 0.8µm CMOS Digital Turbo decoder [24] 960@5V (Estimated) TMS320C55XTM DSP [25] Table 3 shows that the analog decoder presented here outperform digital implementations. Indeed, digital decoders demand ADC converters of high resolution to increase the BER performance. Additionally, digital implementations are dominated by excessive memory requirements, which are due to the excessive block length, the need for floating-point storage at the interleavers and for exponential computations. Thus, they require much greater chip area than an analog decoder. For instance, the type-D module used for the backward recursion requires 8 multiplications and 4 additions for the digital case, while the analog implementation requires no more than 49 transistors (including the current mirrors and bias) for the same task. Regarding our decoder based on SiGe HBTs, it greatly outperforms existing analog CMOS implementations in terms of speed but as expected consumes more power. This is due to the inherent nature of CMOS transistors, at the expense however of the great difficulty to have them operating at the subthreshold region. The estimated power consump- 84 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.3, NO.2 AUGUST 2005 tion of our 16-bit analog decoder is approximately P=886mW at 5V supply. Furthermore, it should be noted that, as shown in Table 3, the energy consumption per decoded bit of our decoder is at the same order of magnitude with the corresponding CMOS ones, which demonstrates that the capability of high throughput is not translated to excessive power consumption. 6. CONCLUSION In this paper the design of a high-speed analog Turbo Decoder with low energy consumption per decoded bit was presented. Its implementation is based on the 0.35 µm SiGe BiCMOS AMS process and, to our knowledge, it is one of the first analog Turbo Decoder’s realizations in SiGe. The derived results so far demonstrate the efficiency and the applicability of this design for high throughput systems, being able to reach speeds of the order of Gb/s. This exhibits superior performance than the few existing analog CMOS ones and outperforms similar digital designs in terms of speed and power consumption, while also its size is certainly smaller. Currently, a 16-bit version of this decoder is being implemented with the layout of the design and the prototype board being under construction and about to be sent for fabrication. ACKNOWLEDGEMENT This work has been partially supported by the “K. Karatheodoris” basic research initiative of the University of Patras. References [1] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes (1),” IEEE International Conference on Communications (ICC93), pp. 1064-1071, vol. 2/3, May 1993. [2] R. Gallager, “Low density parity check codes,” IRE Transactions on Information Theory, vol. IT8, pp. 21-28, Jan. 1962. [3] N. Wiberg, Codes and decoding on general graphs, Ph.D. dissertation 440, Linkoping Studies in Science and Technology, University of Linkoping, Linkoping, Sweden, 1996. [4] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Transactions on Information Theory, vol. 47, pp. 498-519, Feb. 2001. [5] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. IT-20, pp. 284-287, Mar. 1974. [6] H. A. Loeliger, F. Lustenberger, M. Helfenstein, and F.Tarkoy, “Probability propagation and decoding in analog VLSI,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 837-843, Feb. 2001. [7] J. Hagenauer and M. Winklhofer, “The analog decoder,” Proceedings of IEEE International Symposium on Information Theory (ISIT98), 16-21 Aug.1998, Boston, USA, p.145. [8] V. C. Gaudet, and P. G. Gulak, “A 13.3Mb/s 0.35 µm CMOS Analog Turbo Decoder IC with a Configurable Interleaver,” IEEE Journal of SolidState Circuits, vol. 38, no. 11, pp.2010-2015, Nov. 2003. [9] A. Xotta, D. Vogrig, A. Gerosa, A. Neviani, A. Graell i Amat, G. Montorsi, M. Bruccoleri, and G. Betti “An All-Analog CMOS Implementation of a Turbo Decoder for Hard-Disk Drive Read Channels”, IEEE International Symposium on Circuits and Systems (ISCAS02), 26-29 May 2002, Scottsdale, USA, pp.69-72,. [10] D.Vogrig, A. Gerosa, A. Neviani, A. Graell i Amat, G. Montorsi, and S. Benedetto, “A full CMOS Analog Turbo Decoder for UMTS Coding Schemes,” 2nd Analog Decoding Workshop (ADW03), Sep. 2003, Zürich, Switzerland. [11] C. Winstead, J. Dai, S. Yu, C. Myers, R.R. Harrison, and C. Schlegel, “CMOS Analog MAP Decoder for (8, 4) Hamming Code,” IEEE Journal of Solid State Circuits, vol. 39, no. 1, pp. 122-131, Jan. 2004. [12] M. Moerz, T. Gabara, R. Yan, and J. Hagenauer, “An analog 0.25 µm BiCMOS tailbiting MAP decoder,” IEEE International Solid-State Circuits Conference Digest Technical Papers (ISSCC00), Feb. 2000, San Francisco, USA, pp. 356-357,. [13] F. Lustenberger, M. Helfenstein, H. A. Loeliger, F. Tarkoy, and G. S. Moschytz, “All-analog decoder for a binary (18, 9, 5) tail-biting trellis code,” Proceedings of 25th European Solid-State Circuits Conference (ESSCIRC99), Sept. 1999, Duisburg, Germany, pp. 362-365. [14] F. Lustenberger, Ph.D. Dissertation, On the Design of Analog Iterative VLSI Decoders, ETH Zürich, No 13879, Hartung-Gorre, Konstanz, Series in Signal and Information Processing, Vol. 2, Nov. 2000. [15] B. Gilbert, “A precise four-quadrant multiplier with subnanosecond response,” IEEE Journal of Solid-State Circuits, vol. 3, pp. 365-373, 1968. [16] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd edition, McGraw-Hill, Apr. 1999. [17] W.Huang, V. Igure, G. Rose, Y. Zhang, and M. Stan, “Analog Turbo Decoder Implemented in SiGe BiCMOS Technology,” 40th Design Automation Conference Student Design Contest (DAC03), Jun. 2003, Anaheim, California,. [18] B. Vucetic, and J. Yuan, Turbo Codes: Principles and Applications, Kluwer Academics Publishers, 2000. [19] S. Yu, C. Winstead, C. Myers, R. Harrison, and A High-Speed Analog Turbo Decoder C. Schlegel, “An analog decoder for (8, 4) Hamming code with Serial Input Interface,” Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS02), May 2002, Arizona, USA. [20] S. S. Pietrobon, “Implementation and performance of a turbo/MAP decoder,” International Journal of Satellite Communications, vol. 16, pp. 23-46, Jan.-Feb. 1998. [21] Small World Communications, http://www.sworld.com.au. [22] J. Steensma, C. Dick, “FGPA Implementations of a 3GPP Turbo codec,” Proceedings of 35th IEEE Asilomar conference on Signal, Systems and Computers (Asilomar01), Nov. 2001, Pacific Grove, CA, USA, pp. 61-65. [23] Comtech AHA Corporation, http://www.aha.com. [24] C. Berrou, P. Combelles, P. Pénard, and B. Talibart, “An IC for turbocodes encoding and decoding,” IEEE International Solid-State Circuits Conference Digest Technical Papers (ISSCC95), Feb. 1995, San Francisco, USA, pp. 90-91. [25] Texas Instruments, http://www.ti.com. Fotios Gioulekas received his Diploma in Electrical and Computer Engineering from the University of Patras, Greece, in 2000. He is currently working towards the Ph.D. degree in Electrical and Computer Engineering at the University of Patras. His research interests include analog VLSI implementation of iterative error-decoding alogrithms, and information theory. Michael Birbas received the Diploma and Ph.D. degrees in Electrical Engineering from the University of Patras, Greece in 1985 and 1991. From 19861991 and 1992-1995 he worked with the teams of the VLSI Design Lab. and the Applied Electronics Lab. of the University of Patras respectively, in VLSI implementation of DSP algorithms and in embedded systems design of telecommunication applications. From 1993-1999 he worked at Synergy Systems S.A., a high-tech company in the area of microelectronics, as R&D projects manager. From 1999 -2002 he was with GIGA Hellas S.A-an Intel company, (part of the Optical Products Group of Intel), working as R&D manager in the development of 10-40 Gbit/s system solution demonstrators for SDH/SONET and OTN applications and of 40 Gbits/sec Ser/Des components. Since 2002 he is with the team of Applied Electronics Lab. of the University of Patras working in the fields of embedded systems design and efficient analog/digital VLSI implementations of soft iterative decoding algorithms. 85 Alexios Birbas is an Associate Professor at the Dept. of Electrical and Computer Engineering of the University of Patras, Greece. He holds a Ph.D. degree from the University of Minnesota, Minneapolis USA. His research interests focus in device electronics, modeling of electron devices, analog circuit design, soc design and co-design. He has more than 50 publications in related matters and has taken various industry consultant positions. George Bilionis received his Diploma in Electrical and Computer Engineering from the University of Patras, Greece, in 2000. He is currently working towards the Ph.D. degree in Electrical and Computer Engineering at the University of Patras. In 2001, he was with Giga Aps (an Intel company) where he was involved with the design and development of circuits for 40 Gb/s optical communications. His research work is in RF analog and mixed-signal circuits for wireless and optical communications.