Building Modern Integrated Systems: A Cross-cut Approach (The Electrical, The Optical and The Mechanical)
Vladimir Stojanović
Integrated Systems Group, Massachusetts Institute of Technology

Chip design is going through a change
- Already have more devices than can be used at once
- Limited by power density and bandwidth
- Intel 4004 (1971): 4-bit processor, 2312 transistors, ~100 KIPS, 10 micron PMOS, 11 mm2 chip
- Intel Network Processor (IXP2800): 1 GPP core, 16 ASPs (128 threads)
- Sun Niagara: 8 GPP cores (32 threads)
- IBM Cell: 1 GPP (2 threads), 8 ASPs
- Picochip DSP: 1 GPP core, 248 ASPs
- Cisco CSR-1: 188 Tensilica GPPs
- 1000s of processor cores and accelerators per die
- "The Processor is the new Transistor" [Rowen]
[Figure: IXP2800 block diagram with Intel XScale core, MEv2 microengines 1-16, RDRAM and QDR SRAM interfaces, PCI. Credit: Asanovic]

Subthreshold leakage: Game over for CMOS
- CMOS circuits have a well-defined minimum energy
- Caused by leakage and finite sub-threshold swing
- Need to balance leakage and active energy
- Limits energy efficiency, regardless of how slowly the circuit runs
[Figure: normalized energy/cycle vs. Vdd (0.1-0.5 V), showing Etotal, Edynamic and Eleak]
[Figure: normalized energy/op vs. 1/throughput (ps/op), scaling Vdd and VT]

Wire and I/O scaling
- On-chip wires: increased wire (copper) resistivity makes wire caps scale very slowly
- I/O: can't get both energy efficiency and high data rate
[Figure: energy cost (pJ/b) vs. data rate (Gb/s) for the best electrical links, annotated with chip-to-chip and backplane links and channel losses of ~10 dB and ~20-25 dB]

Opportunity for integrated system design
- Energy-efficient computation and communication
- CMOS: need cross-cut approach to keep scaling performance
- Layers: Network & µArchitecture; Design Optimization; Communications (Eq., Mod, Coding); Circuit modeling, Characterization; Circuits & Logic (Tx, Rx, Ctrl, Meas); Interconnect and switch technology (Cu, MOSFET)
[Figure: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for repeated and equalized (30 mV, 50 mV, 90 mV eye) links]

Manycore SOC roadmap fuels bandwidth demand
- 64-tile system (64-256 cores)
- 4-way SIMD FMACs @ 2.5-5 GHz
- 5-10 TFlops on one chip
- Need 5-10 TB/s of off-chip I/O
- Even higher on-chip bandwidth
[Figure: 2 cm x 2 cm die; Intel 48 core Xeon]

Cross-layer design approach
- Build modeling tools for design-space exploration and vertical integration
[Figure: modeling stack from Apps, OS and ISA through manycore hardware, NoC topologies and routers down to channel technologies; NoC metrics (power, offered BW), link design parameters and link metrics (energy/bit vs. throughput density for equalized and repeated links across wire width and space)]
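The minimum-energy claim on the subthreshold leakage slide can be illustrated with a small numerical sketch. The model below is a common textbook simplification, not the model behind the slide's plots: it assumes a fixed threshold voltage (the slide also scales VT), a dynamic term proportional to Vdd^2, and a leakage term that grows exponentially as Vdd drops because the cycle time stretches. The names `activity`, `leak_weight` and `n_vt` and all of their values are hypothetical.

```python
import math


def energy_per_op(vdd, activity=0.1, leak_weight=100.0, n_vt=0.039):
    """Return (total, dynamic, leakage) normalized energy per operation.

    Dynamic term: activity * vdd**2.
    Leakage term: leak_weight * vdd**2 * exp(-vdd / n_vt), i.e. leakage
    power integrated over a cycle time that grows exponentially as vdd
    is lowered. All parameter values are illustrative assumptions.
    """
    e_dynamic = activity * vdd ** 2
    e_leak = leak_weight * vdd ** 2 * math.exp(-vdd / n_vt)
    return e_dynamic + e_leak, e_dynamic, e_leak


def minimum_energy_point(v_lo=0.10, v_hi=0.60, step=0.001):
    """Grid-search the supply voltage that minimizes total energy per op."""
    n_steps = int(round((v_hi - v_lo) / step))
    best_v, best_e = None, float("inf")
    for i in range(n_steps + 1):
        v = v_lo + i * step
        e_total, _, _ = energy_per_op(v)
        if e_total < best_e:
            best_v, best_e = v, e_total
    return best_v, best_e


if __name__ == "__main__":
    v_min, e_min = minimum_energy_point()
    print(f"minimum-energy Vdd ~ {v_min:.3f} V, normalized energy {e_min:.4f}")
```

Below the minimum, the leakage term dominates and slowing the circuit further costs energy instead of saving it, which is the sense in which the limit holds "regardless of how slowly the circuit runs".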
Physical modeling: Equalized interconnects
- Capturing the wire + circuit interactions
- Huge design space
- Link performance model: data rate density, latency, eye opening, sampling delay (Td)
- Link power model: energy-per-bit (Eb)
- Link architecture: FFE, DFE tap numbers; equalization coefficients w, y1
- Channel model: transfer functions T(f), Tc(f); RLGC parameters; wire length l; target data rate density
- Wire model: 2D field solver and 2D RLGC matrices database (Wth, Sp)
- Circuit model: linearized RC switch extraction into a switch model database of normalized R (Ohm-um) and C (fF/um), for LCM and inverter
- Circuit type: LCM | Inverter; WLCM, Vs, Vp
- Technology information: transistor SPICE model; wire metal conductance, dielectric constant, etc.
- Kim and Stojanovic, ICCAD07, D&T 2008
- Sredojevic and Stojanovic, ICCAD08

Optimized on-chip links
- Energy-efficient digital pre-emphasis
- Nonlinear predistortion, mismatch robustness
- 90nm CMOS, 10mm wire
- Kim and Stojanovic, ISSCC09, JSSC June 2010
[Figure: transmitter schematic with weak driver, strong driver, decoding block, amplitude control and transition signals P1_P, P2_P, N1_P, N2_P; voltage swing and channel attenuation vs. distance]
[Figure: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for repeated and equalized (30 mV, 50 mV, 90 mV eye) links]

Optimized off-chip links
- Oversampled discrete-time Rx equalizer: no need for CDR, adaptive Eq only
- Feedforward equalizer (FSE), feedback equalizer (DFE), sensors for adaptation, 7-bit DACs, scan chain
- Digital Tx equalizer: energy-efficient dynamic impedance modulation
- Driver linearization, sequence coding (LUT 16 x 6b + 1b sign, 4-bit pattern-dependent code, thermometer code)
- < 1 pJ/b @ 4Gb/s, 90nm CMOS
- 5b linear, 3pJ/b @ 6Gb/s, 90nm CMOS
- Sredojevic and Stojanovic, CICC10, JSSC Aug 2011
- Song and Stojanovic, VLSI09, JSSC May 2011
[Figure: FSE output eye opening @ 4Gbps (mV) vs. delay between data and CLK (UI), compared with single-tap output eye opening; equalized and unequalized eyes]
[Figure: static transfer curve of the transmitter, output voltage (Vdd) vs. code]

Bandwidth, pin count and power scaling
- Need 16k pins in 2017 for HPC
- Annotations: 1 Byte/Flop; 2 TFlop/s; signal pins @ 20 Gb/s/link; > half pins for power supply
[Figure: package pin count trend from 2-4 cores to 256 cores]

Emerging devices can help
- Energy-efficient computation and communication
- CMOS: need cross-cut approach to keep scaling performance
- Post-CMOS: need cross-cut approach to guide new devices/systems
- Layers: Network & µArchitecture; Design Optimization; Communications (Eq., Mod, Coding); Circuit modeling, Characterization; Circuits & Logic (Tx, Rx, Ctrl, Meas); Interconnect and switch technology (Cu, MOSFET, Si-Photonics)
[Figure: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for repeated and equalized (30 mV, 50 mV, 90 mV eye) links]

CMOS photonics density and energy advantage

Metric                                              Energy (pJ/b)   Bandwidth density (Gb/s/μ)
Global on-chip photonic link                        0.1-0.25        160-320
Global on-chip optimally repeated electrical link   1               5
Off-chip photonic link (100 μ coupler pitch)        0.1-0.25        6-12
Off-chip electrical SERDES (100 μ pitch)            5               0.1

Assuming 128 x 10Gb/s wavelengths on each waveguide, and 20Gb/s electrical I/O

Monolithic Si-Photonics for core-to-core and core-to-DRAM networks
- Si-photonics in advanced CMOS and DRAM process
- NO costly process changes
- Bandwidth density: need dense WDM
- Energy efficiency: need monolithic integration
- Targets: supercomputers, embedded apps

Many architectural studies show promise
- [Shacham'07] [Petracca'08] [Koka'08-10] [Joshi'09] [Pan'09] [Batten'08] [Vantrease'08] [Psota'07] [Kirman'06] [Beamer'10]

Dense WDM
- 128 wavelengths/waveguide, >1Tb/s per waveguide
- Need 1000's of transceivers on die with < 100fJ/bit cost at > 10Gb/s!
- Optimized modulator circuits/devices
- Optimized receiver circuits/photo-detector
- Optimized thermal tuning
[Figure: WDM link block diagram with registers, mux, pre-driver, mod-driver, PLL or optical clock, phase adjust, receiver front-end, samplers and monitoring, demux]
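To make the energy figures in the comparison table concrete, the sketch below converts energy per bit into I/O power at the 10 TB/s upper end of the off-chip bandwidth quoted on the manycore roadmap slide, and checks the "> 1Tb/s per waveguide" figure from the stated 128 x 10Gb/s assumption. The function names are hypothetical, and pairing the table values with that bandwidth target is an illustration added here, not a calculation taken from the slides.

```python
def io_power_watts(bandwidth_bytes_per_s, energy_pj_per_bit):
    """I/O power needed to move the given bandwidth at a given energy per bit."""
    bits_per_s = bandwidth_bytes_per_s * 8
    return bits_per_s * energy_pj_per_bit * 1e-12


def wdm_waveguide_gbps(wavelengths=128, gbps_per_wavelength=10):
    """Aggregate data rate of one waveguide under dense WDM."""
    return wavelengths * gbps_per_wavelength


if __name__ == "__main__":
    target_bw = 10e12  # 10 TB/s, upper end of the roadmap slide's range

    # Energy-per-bit values taken from the comparison table.
    print("electrical SERDES @ 5 pJ/b:   ", io_power_watts(target_bw, 5.0), "W")
    print("photonic link @ 0.25 pJ/b:    ", io_power_watts(target_bw, 0.25), "W")
    print("photonic link @ 0.1 pJ/b:     ", io_power_watts(target_bw, 0.1), "W")
    print("per-waveguide rate:           ", wdm_waveguide_gbps(), "Gb/s")
```

At 5 pJ/b the I/O alone would draw hundreds of watts, while the photonic range stays in the 8-20 W band, which is the gap the table summarizes.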
Big Challenge: Efficient integration with circuits in advanced CMOS process

Need to optimize carefully
- Laser energy increases with data rate
  - Limited Rx sensitivity
  - Modulation more expensive -> extinction ratio / insertion loss trade-off
- Tuning costs decrease with data rate
- Moderate data rates are most energy-efficient
- Assuming 32nm CMOS
- Georgas CICC 2011

DWDM link efficiency optimization
- 512 Gb/s aggregate throughput
- Optimize for min energy cost
- Bandwidth density dominated by circuit and photonics area (not coupler pitch)
- Electrical bump pitch limited to <1Tb/s/mm2; package pin limit 0.05 Tb/s/mm2
- 10x better than electrical bump limited
- 200x better than electrical package pin limit

Photonic DRAM Network Organization
Important concepts:
- Power/message switching (only to active DRAM chip in DRAM cube/super DIMM)
- Vertical die-to-die coupling (minimizes cabling; 8 dies per DRAM cube)
- Command distributed electrically (broadcast)
- Data photonic (single writer, multiple readers)
- Beamer ISCA 2010
[Figure: laser in; processor die with memory controllers MC 1 to MC K (MC 16), memory scheduler, modulator bank, receiver/PD bank and tunable filterbank; super DIMMs with DRAM cubes 1-4, die-die switches, through-silicon vias and via holes; cmd, Dwr and Drd channels]

Optimizing DRAM with photonics
- Beamer ISCA 2010
[Figure: floorplans P1 and P4]

Cross-layer modeling identifies key device requirements
- Feedback to device designers
- Waveguide loss and through loss limits for 2 W optical laser power
[Figure: optical laser power and die area overhead as functions of waveguide loss (dB/cm) and through loss (dB/ring)]

Significant integration activity, but hybrid and older processes ...
- Process labels: 130nm thick BOX SOI; bulk CMOS; backend monolithic
- Groups: [IBM] [Luxtera/Oracle/Kotura] [Many schools] [Intel] [HP] [Watts/Sandia/MIT] [Lipson/Cornell] [Kimerling/MIT]

Monolithic CMOS photonic integration
- Polysilicon: transistor gates, local interconnect and resistors
- Use it for photonic components instead of, or together with, the silicon body in SOI
- Sub-100nm lithography has 1-5 nm design grid
- Enables the edge-roughness control necessary for photonic devices
[Figure: optical mode. Photo credit: Intel]

EOS Platform for Monolithic CMOS photonic integration
- Joint work with Ram and Popovic
- Processes (2007-2011): 45 nm SOI CMOS (IBM 12SOIs0); 32 nm bulk CMOS (Texas Instruments); 90 nm bulk CMOS (IBM cmos9sf); 65 nm bulk CMOS (Texas Instruments)
- Create integration platform to accelerate technology development and adoption
[Figure: transmission (dB) vs. frequency (GHz)]

A 32nm bulk CMOS photonic platform
- Monolithic CMOS photonic platform integrated with CMOS circuits
- 32nm process: fabrication support from Texas Instruments
- Robust post-processing steps at MIT
- Second-order resonator filterbank shows process precision
- Great on-die matching (rings track within 40GHz)
- Record thermal heating efficiency 25uW/K
- Orcutt et al., CLEO 2008, Optics Express 2011
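The slide on key device requirements bounds waveguide loss and ring through loss for a given optical laser power. A minimal link-budget sketch of that kind of calculation follows. It is a generic optical power budget rather than the model used in the cited work, and every number in the example (receiver sensitivity, path length, ring count, wavelength count) is a hypothetical placeholder.

```python
def path_loss_db(waveguide_loss_db_per_cm, length_cm,
                 through_loss_db_per_ring, rings_passed,
                 fixed_loss_db=0.0):
    """Total optical loss along one path, in dB.

    fixed_loss_db lumps together couplers, modulator insertion loss and
    drop loss; it defaults to zero to keep the example simple.
    """
    return (waveguide_loss_db_per_cm * length_cm
            + through_loss_db_per_ring * rings_passed
            + fixed_loss_db)


def laser_power_watts(rx_sensitivity_w, loss_db, n_wavelengths):
    """Optical laser power needed so every wavelength still meets the
    receiver sensitivity after the path loss (optical power only, not
    wall-plug power)."""
    return n_wavelengths * rx_sensitivity_w * 10 ** (loss_db / 10.0)


if __name__ == "__main__":
    # Hypothetical operating point: 4 cm of waveguide at 1 dB/cm and
    # 600 off-resonance rings at 0.01 dB each.
    loss = path_loss_db(1.0, 4.0, 0.01, 600)
    power = laser_power_watts(rx_sensitivity_w=10e-6, loss_db=loss,
                              n_wavelengths=1000)
    print(f"path loss {loss:.1f} dB -> optical laser power {power:.3f} W")
```

Because the laser power grows exponentially with the loss in dB, small changes in per-ring through loss matter a great deal once a wavelength passes hundreds of rings, which is why a system-level budget translates into tight device targets.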
EOS: A 45nm SOI Monolithic Photonic Platform
- Polysilicon and silicon photonics on thin BOX IBM SOI
- 6 rows of electronic-photonic WDM links with body and polysilicon photonic devices
- 54 transmit-receive test sites, ~3M transistors and hundreds of photonic devices
- Electrical and photonic integration: test row
- Body and polysilicon photonic devices: filterbanks, waveguide paperclips, rings, standalone modulators and photodetectors

Integration of photonics into VLSI tools
- Photonic device p-cell; custom photonics-friendly auto-fill
- Layout of photonics -> abstract -> LEF
- Layout of circuit blocks -> abstract -> LEF
- LEF of standard cells, I/O pads (provided by ARM)
- Chip-level Verilog (instantiation of .LEF macros and connectivity)
- Floorplan (macro placement, power grid, routing constraints)
- Technology files
- SOC Encounter place and route -> placed and routed layout

modulator.LEF:

    VERSION 5.6 ;
    BUSBITCHARS "[]" ;
    DIVIDERCHAR "/" ;
    MACRO block_electronic_etch_row_1
      CLASS BLOCK ;
      ORIGIN -208 -1794 ;
      FOREIGN block_electronic_etch_row_1 208 1794 ;
      SIZE 2488 BY 165 ;
      SYMMETRY X Y R90 ;
      PIN heater_a_1
        DIRECTION INOUT ;
        USE SIGNAL ;
        PORT
          LAYER ua ;
          RECT 431 1870.5 436.5 1882 ;
        END
      END heater_a_1
      ...
      OBS
        LAYER m1 ;
        RECT 208 1794 2696 1959 ;
        ...
      END
    END block_electronic_etch_row_1
    END LIBRARY

Platform Organization

A full electro-optical test setup
[Figure labels: fiber positioners, microscope, DUT, chip board, HS clocks, control board, FPGA, USB to laptop]

Extremely good dimensional tolerances in 45nm SOI
- Good body waveguide loss: 3.7dB/cm at ~1220nm

Integrated Delta-Sigma Heat Control
- ~10mW required to retune all 8 rings
- Thermal tuning BW lower than 500kHz
- Tuning control overhead negligible
- Tuning efficiency 2.6mW/nm (32.4mW/2π) on fully substrate-removed die

Current-sensing optical data receiver
- Receiver detects photo current
- 50fJ/b, uA sensitivities, 3-5Gb/s
- Georgas ESSCIRC 2011

Receiver sensitivity
- Exponential dependence on wire capacitance
- Linear dependence on photo-detector capacitance

Modulator test site
- Silicon carrier injection modulator monolithically integrated with transistors
- Extinction ratio 19dB
- 45GHz 3dB bandwidth
- Carrier lifetime ~2-3ns
- Requires flexible drive circuits
  - Sub-bit pre-emphasis
  - Split supplies

First ever dynamic electro-optic test in 45nm SOI
- Silicon carrier injection modulator monolithically integrated with transistors
- Modulation data rate up to 1Gb/s
- 5-10 Gb/s achievable with device and biasing optimization
- Lots of room to improve circuit/device designs
- Transistors and photonics can be built together in advanced CMOS!
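The integrated delta-sigma heat control mentioned above can be pictured with a first-order delta-sigma modulator that turns a digital heater setting into a 1-bit on/off stream. The slides do not describe the modulator order or the word width, so the code below is a generic sketch. Only the 2.6 mW/nm tuning efficiency comes from the slide; `p_on_mw`, the 3-bit code and the stream length are hypothetical.

```python
def delta_sigma_stream(code, full_scale, n_cycles):
    """First-order delta-sigma modulator using integer arithmetic.

    Emits a list of 0/1 values whose long-run average is code / full_scale.
    """
    if not 0 <= code <= full_scale:
        raise ValueError("code must lie in [0, full_scale]")
    acc = 0
    bits = []
    for _ in range(n_cycles):
        acc += code                 # integrate the input
        if acc >= full_scale:       # 1-bit quantizer
            bits.append(1)
            acc -= full_scale       # feed the output back
        else:
            bits.append(0)
    return bits


def average_heater_power_mw(bits, p_on_mw):
    """Mean heater power when the heater dissipates p_on_mw while the bit is 1."""
    return p_on_mw * sum(bits) / len(bits)


def wavelength_shift_nm(power_mw, mw_per_nm=2.6):
    """Resonance shift for a given heater power, using the slide's 2.6 mW/nm."""
    return power_mw / mw_per_nm


if __name__ == "__main__":
    bits = delta_sigma_stream(code=3, full_scale=8, n_cycles=800)
    p_avg = average_heater_power_mw(bits, p_on_mw=5.2)  # hypothetical on-power
    print(f"duty {sum(bits) / len(bits):.3f}, average power {p_avg:.2f} mW, "
          f"shift {wavelength_shift_nm(p_avg):.2f} nm")
```

Since the thermal tuning bandwidth is below 500kHz, the ring itself acts as the low-pass filter for such a bit stream, which is consistent with the slide's remark that the tuning control overhead is negligible.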
Improving computation efficiency
- Energy-efficient computation and communication
- CMOS: need cross-cut approach to keep scaling performance
- Post-CMOS: need cross-cut approach to guide new devices/systems
- Layers: Network & µArchitecture; Design Optimization; Communications (Eq., Mod, Coding); Circuit modeling, Characterization; Circuits & Logic (Tx, Rx, Ctrl, Meas); Interconnect and switch technology (Cu, MOSFET, Si-Photonics, NEMS relay)
[Figure: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for repeated and equalized (30 mV, 50 mV, 90 mV eye) links]

Nano-electro-mechanical (NEM) relays
- Joint work with T-J. King Liu, E. Alon and D. Markovic (UCB, UCLA)
- Nearly ideal switching characteristics:
  - Low on-state resistance (Ron < 1kΩ)
  - Infinite off-state resistance
  - Zero off-state leakage
[Figure: relay schematic and cross-section A-A' with gate, gate oxide, body, source, drain and channel]

Why not use relays to compute?
- Need to compare at block level
- Delay comparison vs. CMOS
  - Single mechanical delay vs. several electrical gate delays
  - For reasonable load, NEMS delay unaffected by fan-out/fan-in
- Area comparison vs. CMOS
  - Larger individual devices
  - But often need fewer devices to implement the same function
- F. Chen et al., "Integrated Circuit Design with NEM Relays," ICCAD 2008
[Figure annotations: NEMS: 12 relays; 4 gate delays; 1 mechanical delay]

Scaled NEMS vs. CMOS adders
- Compare vs. Sklansky CMOS adder*, 90nm technology
- 30x less capacitance
  - Lower device Cg, Cd
  - Fewer devices
- 2.4x lower Vdd
- No leakage energy
- For similar area: >9x lower E/op, >10x greater delay
- Scaled relays limited by contact surface energy: 2aJ for 90nm litho, 50x better than 90nm CMOS
- *D. Patil et al., "Robust Energy-Efficient Adder Topologies," in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07)
[Figure: energy/op vs. delay/op across Vdd, annotated 9x and 10x]

Contact resistance - Feedback from system level
- Low contact R not critical
- Good news for reliability...
- Can build test platforms that work
[Figure: energy/op vs. delay/op across Vdd & CL]

CLICKR technology development platform: NEM relay-based circuits
- ISSCC 2010 TD Award
- F. Chen et al., ISSCC 2010
- M. Spencer et al., JSSC Jan'11

Towards more complex designs
- Multiplier building block: 7:3 compressor
- 98 relays: largest working relay circuit to date
- Energy benefit preserved even in more complex functions
- Fariborzi ASSCC 2011
[Figure: 7:3 compressor relay schematics with inputs A0-A6 and outputs Y0-Y2; die photo with 8mm and 700μm scale marks]
[Figure: energy/op (fJ) vs. delay (ns) for 16-bit multipliers: OTCT (90nm), Dadda/HC (45nm), scaled MEM relay, 16X parallel]

NEM Relay VLSI design infrastructure
- Verilog-A model and logic synthesis created for NEMS technology
- The flow supports multiple device designs and foundries
[Figure: design flow from device to Verilog-A model (Spectre), schematic and p-cell layout; Verilog, logic synthesis, place & route, LVS and DRC]

Toward full systems - NEM Relay scaling
- 1um litho: relay size 120um x 150um
- 0.25um litho: scaled relay size 20um x 20um
- Sematech

Microcontroller Test-Chip
- 64x8b scratchpad
- 64x18b program memory
- 32x10b program stack
- Register file + ALU, instruction decode, control logic
- 2 x 72 I/O pads
- 12k relays, 9mm x 6mm (using 85um x 53um devices)

Summary
- Cross-layer modeling and design key to continued system performance scaling
- Building early technology development platforms
- Feedback to device and circuit designers
- Accelerated adoption
- EOS Platform designed for multi-project wafer runs
- Fast design-space exploration
- Feedback to all layers of design hierarchy
- 50 fJ/b receivers with uA sensitivities
- Record-high tuning efficiency with undercut ~ 25uW/K
- First modulation demonstrated in 45nm process
- CLICKR Platform designed for multiple foundries and devices
- Energy gains preserved for larger blocks
- Designs moving toward scaled devices and full VLSI systems

Acknowledgments
- Devices: Tsu-Jae King Liu, Rajeev Ram, Miloš Popović, Henry Smith
- Architecture: Krste Asanović, Christopher Batten, Ajay Joshi
- Circuits: Elad Alon, Dejan Marković
- Students:
  - Devices: Jason Orcutt, Anatoly Khilo, Jie Sun, Cheryl Sorace, Eugen Zgraggen, Jaeseok Jeon, Rhesa Nathanael, Hei Kam
  - Circuits: Michael Georgas, Jonathan Leu, Ben Moss, Chen Sun, Fred Chen, Byungsub Kim, Hossein Fariborzi, Matthew Spencer, Chengcheng Wang, Kevin Dwan
  - Architecture: Yong-Jin Kwon, Scott Beamer, Chen Sun, Imran Shamim
- DARPA MTO
- Texas Instruments: Dennis Buss and Tom Bonifield
- IBM and Trusted Foundry
- Intel Corporation: Ian Young and Alex Kern
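As a supplementary note on the relay delay comparison (single mechanical delay vs. several electrical gate delays), the sketch below uses a crude lumped estimate: all relays in a block actuate at the same time, and the output then charges through the series on-resistances. Only the Ron bound of 1kΩ comes from the slides. The mechanical delay, load capacitance and CMOS gate delay are hypothetical values, and the series-RC term is a rough approximation, not the model used in the cited papers.

```python
def cmos_path_delay_s(n_gate_stages, gate_delay_s):
    """Delay of a CMOS path: one gate delay per logic stage."""
    return n_gate_stages * gate_delay_s


def relay_block_delay_s(n_series_relays, mech_delay_s, r_on_ohm, c_load_f):
    """Delay of a relay block arranged so that all relays switch at once.

    One mechanical actuation delay, followed by a lumped RC charge time
    through the series on-resistances.
    """
    electrical_s = n_series_relays * r_on_ohm * c_load_f
    return mech_delay_s + electrical_s


if __name__ == "__main__":
    t_mech = 10e-9   # hypothetical mechanical delay
    r_on = 1e3       # upper bound on Ron quoted on the relay slide
    c_load = 10e-15  # hypothetical load capacitance

    one = relay_block_delay_s(1, t_mech, r_on, c_load)
    twelve = relay_block_delay_s(12, t_mech, r_on, c_load)
    print(f"1 relay in series:   {one * 1e9:.2f} ns")
    print(f"12 relays in series: {twelve * 1e9:.2f} ns")
    print(f"4 CMOS gate stages:  {cmos_path_delay_s(4, 50e-12) * 1e9:.2f} ns")
```

Under these assumptions the electrical term is about 1% of the mechanical delay even with 12 relays in series, which is why relay circuits favor a few large complex gates and why their delay is largely insensitive to fan-in and fan-out at reasonable loads. The same numbers also show the CMOS path remaining much faster in absolute terms, consistent with the ">10x greater delay" noted for the relay adders.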