Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 1 Introduction to the Systems Approach Computer System Design System-on-Chip by M. Flynn & W. Luk Pub. Wiley 2011 (copyright 2011) soc 1.1 SOC architecture and design • system-on-chip (SOC) – processors: become components in a system • SOC covers many topics – processors, cache, memory, interconnect, design tools • need to know – – – – – – – – user view: variety of processors basic information: technology and tools processor internals: effect on performance storage: cache, embedded and external memory interconnect: buses, network-on-chip evaluation: processor, cache, memory, interconnect advanced: specialized processors, reconfiguration design productivity: system modelling, design exploration soc 1.2 System on a Chip: driven by semiconductor advances soc 1.3 Basic system-on-chip model soc 1.4 SOC vs processors on chip • with lots of transistors, designs move in 2 ways: – complete system on a chip – multi-core processors with lots of cache System on chip Processors on chip processor multiple, simple, heterogeneous few, complex, homogeneous cache one level, small 2-3 levels, extensive memory embedded, on chip very large, off chip functionality special purpose general purpose interconnect wide, high bandwidth often through cache power, cost both low both high operation largely stand-alone need other chips soc 1.5 iPhone: has System-on-Chip Source: UC Berkeley soc 1.6 iPhone SOC I/O Processor 1 GHz ARM Cortex A8 I/O Memory Source: UC Berkeley I/O soc 1.7 AMD’s Barcelona Multicore 4 out-of-order cores Processor 512KB L2 512KB L2 Core 1 Core 2 1.9 GHz clock rate 65nm technology 3 levels of caches integrated Northbridge Core 3 512KB L2 Northbridge 512KB L2 2MB shared L3 Cache Core 4 http://www.techwarelabs.com/reviews/processors/barcelona/ soc 1.8 SOC design: key ideas • to design and evaluate an SOC, designers need to understand: – its components: processors, memory, interconnect – applications that it targets • SOC economics heavily dependent on: – costs: initial design, marginal production – volume: applicability, lifetime • reducing design complexity – Intellectual Property (IP) – reconfigurable technology soc 1.9 SOC processors • usually a mix of special and general purpose (GP) – can be proprietary design or purchased IP • commonly GP processor is purchased IP – includes OS and compiler support • GP processor optimized for an application – additional instructions – vector units soc 1.10 soc 1.11 Some processors for SOCs SOC Basic ISA Processor description Freescale c600: signal processing PowerPC Superscalar with vector extension ClearSpeed CSX600: general Proprietary Array processor with 96 processing elements PlayStation 2: gaming MIPS Pipelined with 2 vector coprocessors ARM VFP11: general ARM Configurable vector coprocessor soc 1.12 Processor types: overview Processor type Architecture / Implementation approach SIMD Single instruction applied to multiple functional units Vector Single instruction applied to multiple pipelined registers VLIW Multiple instructions issued each cycle under compiler control Superscalar Multiple instructions issued each cycle under hardware control soc 1.13 Adding instructions • additional instructions to support specialized resources – exception: superscalar, with hardware control • instructions can be added to base processor for coprocessor control – VLIW: Very Large Instruction Word – Array – Vector soc 1.14 Sequential and parallel machines • basic single stream processors – pipelined: basic sequential – superscalar: transparently concurrent – VLIW: compiler generated concurrency • multiple stream – array processors – vector processors • multiprocessors soc 1.15 Sequential processors • operation – generally transparent to sequential programmer – appear as in order instruction execution • pipeline processor – execution in order – limited to one instruction execution / cycle • superscalar processor – multi instructions / cycle, managed by hardware • VLIW – multi op execution / cycle, managed by compiler soc 1.16 Pipelined processor Instruction #1 IF ID AG DF EX WB ID AG DF EX WB ID AG DF EX WB ID AG DF EX Instruction #2 IF Instruction #3 IF Instruction #4 IF WB Time • IF: Instruction Fetch • ID: Instruction Decode • AG: Address Generation • DF: Data Fetch • EX: Execution • WB: Write Back soc 1.17 Superscalar and VLIW processors Instruction #1 IF ID AG DF EX WB ID AG DF EX WB ID AG DF EX WB ID AG DF EX WB ID AG DF EX WB ID AG DF EX WB Instruction #2 IF Instruction #3 IF Instruction #4 IF Instruction #5 IF Instruction #6 IF Time soc 1.18 Superscalar VLIW soc 1.19 Superscalar VLIW soc 1.20 Parallel processors • execution managed by programmer • array processors – single instruction stream, multiple data streams: SIMD • vector processors – SIMD • multiprocessors – multiple instruction streams, multiple data streams: MIMD soc 1.21 Array processors • perform op if condition = mask • operand can come from neighbour mask op dest sr1 sr2 n PEs, each with memory; neighbour communications one instruction issued to all PEs soc 1.22 Vector processors • vector registers, eg 8 regs x 64 words x 64 bits • vector instructions: VR3 <- VR2 VOP VR1 soc 1.23 Array Processors Vector Processors soc 1.24 SOC multiprocessors soc 1.25 Memory and addressing • many SOC memory designs use simple embedded memory – a single level cache – real (rather than virtual) addressing • as SOC become more complex – their designs are expected to use more complex memory and addressing configurations soc 1.26 Three levels of addressing soc 1.27 User view of memory: addressing • a program: process address (offset + base + index) – virtual address: process address + process id • a process: assigned a segment base and bound – system address: segment base + process address • pages: active localities in main/real memory – virtual address: translated by table lookup to real address – page miss: virtual pages not in page table • TLB (translation look-aside buffer): recent translations – TLB entry: corresponding real and (virtual, id) address • a few hashed virtual address bits address TLB entries – if virtual, id = TLB (virtual, id) then use translation soc 1.28 The TLB and The MMU soc 1.29 SOC interconnect • interconnecting multiple active agents requires – bandwidth: capacity to transmit information (bps) – protocol: logic for non-interfering message transmission • bus – AMBA (adv. Microcontroller bus architecture) from ARM, widely used for SOC – bus performance: can determine system performance • network on chip – array of switches – statically switched: eg mesh – dynamically switched: eg crossbar soc 1.30 Bus based SOC soc 1.31 Network on a Chip soc 1.32 SOC design approach • understand application (compiler, OS, memory and real time constrains) • select initial die area, power, performance targets; select initial processors, memory, interconnect • assume target processor and interconnect performance, design and evaluate memory • evaluate and redesign processors with memory • design interconnect to support processors and memory • repeat and iterate to optimize soc 1.33 SOC design approach soc 1.34 Processor optimization example • given embedded ARM processor – in an SOC chip • 1 IALU vs 2 IALU vs 3 IALU vs 4 IALU – instructions per cycle? • 16k L1 instruction cache vs 32k L1 i-cache – how much improvement? less power? • branch predictor: taken vs not-taken – misprediction rate? • aim: explore this large design space soc 1.35 Design cost: product economics • increasingly product cost determined by – design costs, including verification – not marginal cost to produce • manage complexity in die technology by – engineering effort – engineering cleverness soc 1.36 Design complexity soc 1.37 Cost: product program vs engineering Chip design Fixed costs Variable costs Verify & test Labor costs Software Marketing, sales, administration Manufacturing costs CAD support Engineering costs Engineering Mask costs CAD programs Fixed project costs Product cost Capital equipment soc 1.38 Two scenarios • fixed costs Kf, support costs 0.1 x fct(n), and variable costs Kv*n, so Costs Kf (0.1* Kf ) * 3 n Kv * n • design get more complex while production costs decrease – K1 increases while K2 decreases – implicitly requires higher volumes to break even • when compared with 1995, in 2005 – K1 increased by 10 times – K2 decreased by the same amount soc 1.39 Two scenarios soc 1.40 Product volume dictates design effort Basic physical tradeoffs Design time and effort Balance point depends on n, number of units soc 1.41 Reduce complexity: use IP • IP: Intellectual Property soc 1.42 Reduce complexity: reconfig. tech. • reconfigurable technology: no fabrication costs – lower non-recurring engineering (NRE) costs • reconfigurable design: faster and cheaper – improve time-to-market • reconfigurability – in-system upgrade: improve time-in-market – run-time adaptation: respond to run-time conditions – compile-time reconfiguration: retarget accelerator • overhead: performance, area, energy efficiency – less effective than ASIC (application-specific IC) soc 1.43 Summary • to design and evaluate an SOC, designers need to understand: – its components: processors, memory, interconnect – applications that it targets • SOC economics heavily dependent on: – costs: initial design, marginal production – volume: applicability, lifetime • reducing design complexity – Intellectual Property (IP) – reconfigurable technology soc 1.44