THE RAW MICROPROCESSOR: A COMPUTATIONAL FABRIC FOR SOFTWARE CIRCUITS AND GENERAL-PURPOSE PROGRAMS
Taylor, M.B.; Kim, J.; Miller, J.; Wentzlaff, D.; Ghodrat, F.; Greenwald, B.; Hoffman, H.; Johnson, P.; Lee, J.-W.; Lee, W.; Ma, A.; Saraf, A.; Seneski, M.; Shnidman, N.; Strumpen, V.; Frank, M.; Amarasinghe, S.; Agarwal, A.
IEEE Micro, Volume 22, Issue 2, March-April 2002, pp. 25-35

Wire delay is emerging as the natural limiter to microprocessor scalability. A new architectural approach could solve this problem, as well as deliver unprecedented performance, energy efficiency, and cost effectiveness.

The Raw Microprocessor

Problem: How can designers leverage growing quantities of chip resources even as wire delays become substantial?
• Scalable ISA
• Provide a parallel software interface to the gate, wire, and pin resources of the chip
• Give programmers more control of physical resources to achieve maximum performance and energy efficiency

Technology Trends
• Until recently, the abstraction of a wire as an instantaneous connection between transistors has shaped assumptions and architectural designs
• Today, however, it takes on the order of two clock cycles for a signal to travel from edge to edge of a 2-GHz processor die
• Processor manufacturers have strived to maintain high clock rates in spite of the increased impact of wire delay, but materials and process changes have not been sufficient to solve the problem

The Response of Existing Architectures
[Figure slide: The Response of Existing Architectures]

• Raw attempts to minimize the ISA gap by exposing underlying physical resources as architectural entities
• Raw uses an array of identical, programmable tiles

Each tile contains:
• One static communication router
• Two dynamic communication routers
• An eight-stage, in-order, single-issue, MIPS-style processor
• A four-stage pipelined floating-point unit
• A 32-Kbyte data cache
• 96 Kbytes of software-managed instruction cache

• The tiles interconnect using four 32-bit full-duplex on-chip networks, consisting of over 12,500 wires
• Each tile connects only to its four neighbors
• The length of the longest wire in the system is no greater than the length or width of a tile. This property ensures high clock speeds and the continued scalability of the architecture

Pin Multiplexing
• At the edges of the network, the network buses are multiplexed onto pins
• The prototype uses 1,657 pins and provides 14 full-duplex, 32-bit, 7.5-Gbps I/O ports at 225 MHz

Architectural Entities
Raw processors will have:
• More functional units, as well as more flexible and efficient pin utilization
• A higher pin count, due to this efficiency
• More predictability and higher clock frequencies, due to explicit exposure of wire delay

Application Mapping
• Applications can leverage the Raw static network's ASIC-like place-and-route facility; applications that do so are called software circuits
• The Raw operating system allows both space and time multiplexing of processes: it allocates a rectangular region of tiles to each process

Design Decisions
Compute Processor:
• Focus: tight integration of coupled network interfaces and the processor pipeline
• The networks are register-mapped and integrated directly into the bypass paths of the pipeline
• Intertile networking extends the bypass concept into 2-D

Static Router:
• Routing instructions determine the routing path
• The static routers collectively reconfigure the entire communication pattern of the network on a cycle-by-cycle basis
• One-cycle-per-hop latency between tiles
• A 5-stage pipeline that exploits parallelism in routing

Dynamic Networks:
• Support dynamic events and message passing
• Better suited to long data streams, because the per-message overhead is large

Implementation
• IBM's SA-27E 0.15-micron, six-level copper ASIC process
• 25 W power consumption
• Wire delay within tiles was large enough that placement could not be ignored
• Applications with very little ILP generally do not benefit from running on Raw
• For applications with moderate to significant ILP, performance increases are observed
• The authors attain speedups of 6x to 11x over a single tile on SPECfp applications for a 16-tile Raw processor, and 9x to 19x for 32 tiles

Conclusion
• The replicated tile design saved time in design, RTL Verilog coding, resynthesis, verification, placement, and back-end flow
• Virtual Raw systems can be created from the glueless connection of up to 64 chips
• The authors believe that reaching the point at which a Raw tile is a relatively small portion of total computation could change the way we compute

Discussion Questions
• Does this paper discuss enough real program and benchmark results?
• Is 25 W power consumption "energy efficient" for the performance they have indicated?
• Are there negative consequences of exposing so much complexity to the software/programmer?
• How can the functionality of this processor be likened to a 2-D pipeline?
• Does cost need to be addressed?
• How advantageous is the design-time reduction achieved through tile replication?
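Two of the interconnect points in the slides (each tile connects only to its four neighbors, and the static network has one-cycle-per-hop latency) can be illustrated with a small Python sketch. It assumes tiles are addressed by (x, y) coordinates on a rectangular grid and that a word moves along X first and then along Y; the coordinate scheme, the X-then-Y route, and the helper names `neighbors`, `xy_route`, and `hop_latency` are hypothetical, and processor injection/ejection overhead is ignored. Because every link in this model joins adjacent tiles, no wire spans more than one tile, which is the property the slides credit for high clock speeds.

```python
"""Sketch: a nearest-neighbor tile mesh with one-cycle-per-hop latency.

Assumptions (illustrative only): tiles are addressed by (x, y) on a
width x height grid, routes move along X first and then along Y, and
processor injection/ejection overhead is ignored.
"""
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a tile


def neighbors(tile: Coord, width: int, height: int) -> List[Coord]:
    """Return the tiles directly wired to `tile` (at most four)."""
    x, y = tile
    candidates = [(x, y - 1), (x + 1, y), (x, y + 1), (x - 1, y)]  # N, E, S, W
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Tiles visited when moving along X first, then along Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_latency(src: Coord, dst: Coord) -> int:
    """Cycles needed at one cycle per hop (equals the Manhattan distance)."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    width = height = 4  # a 16-tile array
    print(neighbors((0, 0), width, height))  # corner tile: [(1, 0), (0, 1)]
    print(neighbors((1, 1), width, height))  # interior tile: four neighbors
    print(hop_latency((0, 0), (3, 3)))       # corner to corner: 6
```

Every step of a route lands on a neighbor of the previous tile, so the corner-to-corner latency of a 4x4 array in this model is simply 3 + 3 = 6 hops.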
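The static-router behavior in the slides (routing instructions determine the path, and the routers reconfigure the communication pattern cycle by cycle) can be sketched as a fixed schedule of crossbar settings. This is a hypothetical model, not Raw's actual static-router instruction format: each router executes one routing entry per cycle, an entry maps output ports to input ports, and a word leaving one router's east port reaches the next router's west port on the following cycle. The port names and the `StaticRouter` and `run_row` names are assumptions made for illustration.

```python
"""Hypothetical model of Raw-style static routing along one row of tiles.

Assumptions (illustrative only): each router executes one routing entry
per cycle, an entry maps output ports to input ports, and a word leaving
one router's east port reaches the next router's west port one cycle later.
"""
from typing import Dict, List, Tuple

Port = str                # "N", "E", "S", "W", or "P" (the tile's processor)
Route = Dict[Port, Port]  # output port -> input port, for one cycle


class StaticRouter:
    """Executes a fixed, compiler-generated schedule of crossbar settings."""

    def __init__(self, program: List[Route]):
        self.program = program
        self.pc = 0

    def step(self, inputs: Dict[Port, int]) -> Dict[Port, int]:
        route = self.program[self.pc % len(self.program)]  # schedule repeats
        self.pc += 1
        # Each output copies the word waiting on its selected input, if any.
        return {out: inputs[src] for out, src in route.items() if src in inputs}


def run_row(programs: List[List[Route]], word: int,
            cycles: int) -> List[Tuple[int, int, int]]:
    """Inject `word` at tile 0's processor port; return (cycle, tile, word) deliveries."""
    routers = [StaticRouter(p) for p in programs]
    latched: List[Dict[Port, int]] = [{} for _ in routers]
    latched[0]["P"] = word
    delivered = []
    for cycle in range(cycles):
        outputs = [r.step(inp) for r, inp in zip(routers, latched)]
        latched = [{} for _ in routers]
        for i, out in enumerate(outputs):
            if "E" in out and i + 1 < len(routers):
                latched[i + 1]["W"] = out["E"]  # one cycle per hop
            if "P" in out:
                delivered.append((cycle, i, out["P"]))
    return delivered


if __name__ == "__main__":
    programs = [
        [{"E": "P"}, {}, {}],  # tile 0: cycle 0, processor -> east
        [{}, {"E": "W"}, {}],  # tile 1: cycle 1, west -> east
        [{}, {}, {"P": "W"}],  # tile 2: cycle 2, west -> processor
    ]
    print(run_row(programs, 42, 3))  # [(2, 2, 42)]
```

With these three schedules, a word injected by tile 0's processor at cycle 0 reaches tile 2's processor at cycle 2, after two hops. Because every schedule is fixed in advance, the model needs no arbitration logic, which is roughly why a compiler can treat the static network like an ASIC's place-and-route fabric.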
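The Application Mapping slide says the Raw operating system allocates a rectangular region of tiles to each process. One way to picture this is a first-fit search over the tile grid. The first-fit strategy, the `TileAllocator` class, and its methods are hypothetical illustrations, not the Raw operating system's actual algorithm. Processes that receive disjoint rectangles share the chip in space; a request that does not fit has to wait for a release, which is where time multiplexing would come in.

```python
"""Hypothetical first-fit allocator of rectangular tile regions.

Illustrative only: the real Raw OS policy is not described in the slides.
"""
from typing import List, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height)


class TileAllocator:
    def __init__(self, width: int, height: int):
        self.width, self.height = width, height
        # owner[y][x] is the process id holding tile (x, y), or None if free.
        self.owner: List[List[Optional[str]]] = [[None] * width for _ in range(height)]

    def _is_free(self, x: int, y: int, w: int, h: int) -> bool:
        return all(self.owner[r][c] is None
                   for r in range(y, y + h) for c in range(x, x + w))

    def allocate(self, pid: str, w: int, h: int) -> Optional[Region]:
        """Claim the first free w x h rectangle, scanning rows then columns."""
        for y in range(self.height - h + 1):
            for x in range(self.width - w + 1):
                if self._is_free(x, y, w, h):
                    for r in range(y, y + h):
                        for c in range(x, x + w):
                            self.owner[r][c] = pid
                    return (x, y, w, h)
        return None  # no room: the process must wait (time multiplexing)

    def release(self, pid: str) -> None:
        """Free every tile owned by `pid`."""
        for row in self.owner:
            for c, who in enumerate(row):
                if who == pid:
                    row[c] = None


if __name__ == "__main__":
    alloc = TileAllocator(4, 4)          # a 16-tile chip
    print(alloc.allocate("A", 2, 4))     # (0, 0, 2, 4)
    print(alloc.allocate("B", 2, 2))     # (2, 0, 2, 2)
    print(alloc.allocate("C", 3, 1))     # None: no 3x1 strip is free
    alloc.release("A")
    print(alloc.allocate("C", 3, 1))     # (0, 2, 3, 1)
```

First fit is chosen here only because it is short; a real allocator would also have to weigh fragmentation and the distance from each region to the I/O ports it uses.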
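The Dynamic Networks slide says these networks are better suited to long data streams because of their large overhead. That follows from amortizing a fixed per-message cost. As a simple model (an assumption, not a figure from the paper), suppose each message pays $o$ cycles of overhead and then delivers $n$ payload words at one word per cycle. The fraction of cycles spent moving payload is then:

```latex
\eta(n) = \frac{n}{n + o}, \qquad \lim_{n \to \infty} \eta(n) = 1
```

With a hypothetical overhead of $o = 4$ cycles, a one-word message gives $\eta = 1/5 = 20\%$, while a 60-word stream gives $\eta = 60/64 \approx 94\%$.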