SoC Architecture Course
Oct 2008 – Jan 2009, KTH
Zhonghai Lu / Axel Jantsch
[email protected]

Course Information
• Course responsible: Dr. Zhonghai Lu
• Course examiner: Prof. Axel Jantsch
• 12 Lectures, 4 Tutorials, 3 Labs
• Home page: www.ict.kth.se/courses/IL2207
• Course Material
  - Dally, Towles: Principles and Practices of Interconnection Networks
  - Distributed Material
  - Slides
• Advanced-level course, more demanding

May 23, 2017 SoC Architecture 2

Lecture Overview
• L1: Introduction
• L2: Buses and Arbitration (Dally: 22, 18)
• L3: Shared Memory Multiprocessors
• L4: Cache Coherency Protocols
• L5: Memory Consistency
• L6: Introduction to Network-on-Chip, Topologies (Dally: 1, 2, 3, 4, 5)
• L7: Routing Algorithms and Mechanics (Dally: 8, 9, 10, 11)
• L8: Flow Control (Dally: 12, 13)
• L9: Deadlock and Livelock (Dally: 12, 13, 14)
• L10: Router Architecture and Network Interface (Dally: 16, 17, 20)
• L11: Quality of Service and Performance Analysis (Dally: 15)
• L12: Course Summary and Trends

Tutorial Overview
• T1: Bus, arbitration and cache coherency
  - After Lecture 5, on Nov. 5, by Prof. Jantsch
• T2: Memory consistency and network topology
  - After Lecture 7, on Nov. 12, by Dr. Lu
• T3: Interconnection networks (routing, flow control, deadlock etc.)
  - After Lecture 8, on Nov. 19, by Dr. Lu
• T4: Router architecture, QoS and performance analysis
  - After Lecture 12, on Dec. 3, by Prof. Jantsch

Lab Overview
• Laboratory 1: Uniprocessor SoC Design with Altera
• Laboratory 2: Multiprocessor SoC Design with Altera
• Laboratory 3: Wormhole Networks
• Students work in groups of max. 2
• Good preparation is required.

Course Requirements
To pass the course the student has to fulfill the following requirements:
• Pass the final exam. The grade for the exam will be the grade of the course.
  - Final exam: Dec. 16, 2008, 9:00-13:00, Rooms 432, 438, 439, 530
  - Register for the exam in Daisy 2 weeks before the exam date in order to guarantee a seat!
• Attend tutorials
• Complete all labs

Observations in System Design

Advances in Integration
• Intel 4004 (1971): 108 kHz, 2,300 transistors
• Intel Pentium 4 (2000): 1.5 GHz, 42 million transistors
• If automobile speed had increased similarly over the same period, we could now drive from Stockholm to Shanghai in about 23 seconds.

Advances in Integration - 2007
• Intel Teraflop Chip 2007
• http://techresearch.intel.com/articles/TeraScale/1449.htm

Growing Design-Productivity Gap
[Figure: Moore’s Law: Design Productivity Crisis. Potential design complexity (logic transistors per chip, M) and designer productivity (K transistors per staff-month) over time, together with standard cell density (Kgates/mm2) and ASIC clock (MHz). Source: SRC 1997]
• Designs do not only get more complex, but also much more expensive!

The Role of the Market!
Source: Smith 1997

Moore’s Law drives the development of System-in-Chip Architectures
• The growing number of transistors on an SOC drives the trend towards more RTL blocks on the chip
[Figure: Yesterday’s SOC (Processor, Memory, I/O Ctl, RTL functions 1 to 3) compared with Today’s SOC (Proc, DSP, Mem, I/O and many RTL blocks). Source: Leibson (DAC2004)]

Verification Costs
• The share of verification costs in the total design costs is continuously increasing (at present 50-70% for large designs)

Platforms reduce Costs
• SOC Flexibility = Per-Unit Cost Reduction
[Figure: total per-unit cost versus system designs per chip design (1 = one chip, up to 7 = many system designs) for system volumes of 100,000 and 1,000,000; example products: low-end still camera, high-end still camera, video camcorder. Model: $10M design cost, $15 manufacturing cost, 5% premium for programmability. Source: Leibson 2004]

Platform Example: Nexperia

Nexperia Instance: Viper

Arm based MPSoC Platform

Texas Instruments OMAP: A SOC Platform
Based on Peter Cumming: "The TI OMAP Platform Approach to SOC"

The OMAP platform
• OMAP products are combinations of hardware and software that allow multimedia capabilities to be included in 2.5G and 3G wireless handsets and PDAs
• Critical design parameters are: Performance, Power, Cost and Time-to-Market
• First Approach: "Opportunistic Reuse"
  - No planned reuse, but try to reuse whenever possible
• Second Approach: "Structured Approach"
  - Systematic Reuse, SoC Platform

What is a platform?
• OMAP defines a platform as "a packaged capability used in subsequent stages of the development to reduce development costs"
• Platforms have the following characteristics:
  - Between silicon and systems, many platforms may be developed and used in subsequent stages of the development
  - Platforms are valuable due to the notion of reuse (good for economy)
  - They include hardware, software, assemblies and tools!

Examples for platforms
• Transistor and ASIC libraries are the lowest hardware platforms
• Instruction Set Architecture and associated Assembly Language Tools are the lowest levels in Software
• These well-understood levels are used by other OMAP platforms

OMAP: Hierarchy of Platforms
[Figure: platform levels from top to bottom, with reuse across them: Application Specific Ref Design, Appl. Platform, SoC Platform, OMAP Products, OMAP Infrastructure, ASIC Library & Tools, Silicon Technology]
• OMAP uses platforms on different levels
• This is a precondition for reuse

SoC Platform
The SoC platform consists of
• The Application Platform (the OMAP product)
• A library of hardware components
• An architecture for their interconnection
• Processor and Peripherals
• Low-Level Software (Drivers)
• Development Environment
• The System Platform: the platform includes the code that controls all aspects of the system from device driver to system interface
TI has a reference design group in order to understand the new demands for OMAP

OMAP Products
The OMAP product range consists of several families of devices for different markets, e.g.
• Application processors for 3G: OMAP 1510 and 1610
• Application processors for 2.5G: OMAP 710 and 730

OMAP 1510
OMAP 1510 is based on
• Enhanced ARM 925 core (RISC processor)
• TI C55x core
• DMA, SRAM, Buses, Peripherals

Current OMAP platform for Wireless Handset & PDA
• OMAP™ 3 architecture combines mobile entertainment with high performance productivity applications (Source: Texas Instruments)

Strength of the OMAP concept
The main strength of the OMAP concept is that several actors can make extensive reuse of development efforts at several levels of the design process
• Actors:
  - Mobile Device Manufacturers
  - Software Developers
  - TI’s internal Development Teams
• Levels:
  - Common Hardware and Software Interfaces
  - Common Development Environment
  - Single Low-Level Software Framework (code can be used for several products)
  - Single SoC Platform
• OMAPI is an interface standard for OMAP founded by TI and ST

OMAP Architecture
The OMAP architecture, consisting of a general purpose processor and a DSP, has been chosen because of the application area
• Need for Performance
• Energy and Area Constraints
• Two Main Tasks: User Interface and Signal Processing
• Flexibility and Reuse

Requirements on Software Platform
Hardware architecture requires a matching software approach
• Well-defined Set of Application Programming Interfaces in the high-level OS running on the general purpose processor
• System Software that links General Purpose Applications to DSP components
• Well-defined Standard for DSP Components (TMS320 Algorithm Standard or eXpressDSP)

Summary
The OMAP platform
• Covers a wide range of products, allowing reuse of Hardware and Software
• Hardware Architecture adapted to the Application Area
• Software Architecture using features of the Hardware Architecture
• Efficient SOC Platform with Definitions for Hardware and Software
  Reuse

Emerging Architectures

System-on-Chip Architectures
• A system-on-chip architecture integrates several heterogeneous components on a single chip
• A key challenge is to design the communication between the different entities of a SoC in order to minimize the communication overhead
[Figure: Microcontroller, DSP, Memory, FPGA, Custom Hardware, Analog-Digital and Digital-Analog blocks connected by a Communication Structure]

System-on-Chip Architecture: A bus-based SoC
[Figure: system on a chip with Memory, Microprocessor, Custom Logic, DSP and I/O]

System-on-Chip Architecture: Network-on-Chip
[Figure: resources PE1, PE2, PE3 and MEM, each attached through a network interface (NI) to a switch; switches are connected by channels]
• The resources are connected to the network via network interfaces
• The topology of the network and the capability of the switches and communication channels determine the capacity of the network

ASIC Technologies

What is an ASIC?
• ASIC = Application Specific Integrated Circuit
• An ASIC is an integrated circuit for a specific application and (generally) produced in relatively small volumes.
• An ASIC technology helps to shorten the design time by providing a semi-fabricated integrated circuit

ASIC families
• The term ASIC is often reserved for circuits that are fabricated in a silicon foundry, while circuits that can be programmed at the customer’s site are called Programmable Logic.
• Programmable Logic
  - Programmable Logic Device (PLD)
  - Field Programmable Gate Array
• ASIC
  - Standard Cell
  - Gate Array
• The term full custom is reserved for circuits where all silicon layers can be optimized. This implies a long design process and thus full custom is mainly used for high-volume high-end circuits.
Standard Cell
• Standard cells are often referred to as Cell-Based Integrated Circuits (CBIC)
• All mask layers are customized
• The standard cell library defines logic elements of varying complexity: SSI, MSI logic, data path blocks, memories and system-level blocks.

Standard Cells
• Cells are configured in rows and have constant height and variable width
• Each cell is optimized for an efficient implementation

Gate Array
• A gate array chip contains prefabricated adjacent rows of PMOS and NMOS transistors
• The gate array is configured by the interconnect structure

Channeled Gate Array
• Only the interconnect is customized
• The interconnect uses spaces between rows of base cells

Channelless Gate Array (Sea of Gates)
• Only the interconnect is customized
• Cells are connected by routing over unused transistors

Field Programmable Gate Arrays
• None of the mask layers are customized
• Basic logic cells and interconnect can be programmed
• Basic cells can be SRAM based, Flash Memory based or fuse-based (one time programmable)

Programmable Logic Device
• No customized mask layers or logic cells
• A single large block of interconnects
• Macrocells consist of programmable array logic followed by a flipflop or latch

Comparison FPGA, Gate Array, Standard Cell

                 Initial Cost   Cost per part   Performance   Fabrication Time
FPGA             Low            High            Low           Short
Gate Array
Standard Cell    High           Low             High          Long

Design Trade-Offs
[Figure: design time versus performance for Full Custom, Standard Cell, Gate Array, Programmable Logic and Microprocessor]

Challenges for System Design

How to design a system-on-chip?
• Implementation: efficient implementations must exploit the low-level features of the target architecture
• Challenge for System Design!
• Design Specification: design productivity increases with the level of abstraction, but functional verification is very difficult at low abstraction levels
[Figure: Abstraction Gap between the Idea (Specification, abstract) and the Product (Implementation, detailed)]

SoC Design
• The continuous progress in silicon process technology allows more and more functionality to be integrated on a single chip => Systems on a chip become reality
• Market-driven forces:
  - Shorter product design schedules and life spans
  - Products have to conform to standards
  - The design has to be right from the start. An implementation error means heavy loss of money or product death
  - Large designs are integrated into a single chip
• The SoC design process must address these driving forces

The Design Process
[Figure: Design Steps and Intermediate Models bridge the Abstraction Gap between Design Specification and Implementation; axes: Abstraction Level and Design Space]

Requirements on Design Flow
• Design Entry
  - Well-defined abstract specification model
  - Efficient verification methodology
• Design Refinement
  - Well-defined models at all abstraction levels
  - Well-defined refinement steps
  - Verification at all levels

Requirements on Design Flow
• Implementation Mapping
  - Efficient platform architecture with well-defined API
  - Mapping detailed implementation model to API services
• Tool Support
  - Verification
  - Design Refinement
  - Implementation Mapping
  - Estimation of Properties

Design Process
• A design specification has to be mapped on an architecture
[Figure: Design Specification and Architecture Specification enter the Design Process, which produces the Design Implementation]

Design Process (Uniprocessor)
• A program is compiled to assembler code for a chosen uniprocessor and operating system
[Figure inputs: Program (Parallel Tasks), Uniprocessor + Operating Syst.]
[Figure flow: Compilation, then Executable Code]

Design Process
• The design process for SoC applications is a very complex task
  - Many components work in parallel and communicate with each other
  - A task can be mapped on different components
  - The overhead for communication depends on how tasks are located
• The designer has to choose an appropriate SoC architecture, since different architectures have different strengths and weaknesses

Design Process (System-On-Chip)
• A specification shall be mapped onto a SoC architecture with several heterogeneous components
[Figure: Specification (Parallel Tasks) and SoC Arch. with several components -> Partitioning, Mapping, Compilation -> HW Descr. Comp. A, HW Descr. Comp. B, Code Processor X, Code Processor Y]

Platform-Based Design
• The idea of a platform is to simplify the design process
[Figure: Programmers Model, Hardware Abstraction and Hardware Platform; the hardware platform contains Microcontroller, FPGA, DSP, Memory, Custom Hardware, Analog-Digital and Digital-Analog blocks and a Communication Structure]

System-on-Chip Platform
The layered concept makes it possible to
• Change the physical architecture of the SoC without affecting the application
• Add new services on top of the existing architecture
Changes in one layer affect only the layer itself and its interfaces
Layers:
• API: Services with Guarantees
• Transaction: Messages, Load/Store
• Transport: Packets
• Physical: Wires, Clocks

Concurrency

Embedded Systems have to cope with Parallelism
[Figure: Embedded System with processes A, B, C and D between a Source and a Sink in a Reactive Environment]
Parallelism
• Provides an alternative to a faster clock for performance
• Applies at all levels of system design
• Is essential within embedded system design, where the system has to react to several inputs from the environment

System-on-Chip: A Parallel Architecture
A parallel computer is a collection of processing elements that cooperate to solve large problems fast
• Resources
  - Processing capacity of the components
  - Distributed and/or global memory
• Data access, Communication and Synchronization
  - Communication protocol
  - Communication capacity
  - Communication abstraction and primitives
• Objectives
  - Performance and Scalability

Components in a Parallel SoC
• Microprocessor cores or DSPs are cheap and optimized for their application area
• Customizable hardware can be used to guarantee high performance for a special task
• Often each parallel task does not need tremendous processing power
• It is important how the parallel tasks are mapped onto the SoC, so that the parallel nature of the system can be fully exploited

Communication Primitives: System on Chip
There are two main paradigms
• Shared Memory
• Message Passing

Communication Primitives: System on Chip
• Shared memory is typical for bus systems, since a memory that all processing entities can access is naturally connected to the bus
[Figure: system on a chip with Memory, Microprocessor, Custom Logic (ASIC), DSP and I/O]

Communication Primitives: Network on Chip
[Figure: PE1, PE2, PE3 and MEM attached through network interfaces (NI) to switches connected by channels]
• Message passing looks very natural for networks-on-chip, since a shared memory is usually not available
• However, locality is important, since otherwise huge amounts of data have to be sent over a network

Message Passing
[Figure: process P1 sends a message to process P2]
• Processes send messages to each other
• A message has a sender and receiver(s)
• Primitives are Send and Receive
• The programming model does not include a shared memory

Programming Model for Message Passing
[Figure: processes A, B, C and D; a process receives a message (waits for message) and sends a message]
• Natural Model for NoCs: Communicating Finite State Machines
• Communication is done by message passing (languages like SDL are suitable)

Implementation of a Message Passing Programming Model
• A programming model based on
  message passing can still be implemented by a shared memory architecture
• Each layer has to use the primitives that are provided by its lower layer neighbour
[Figure: Source Code uses High-Level Comm. Primitives; Compiled Program; Operating System uses Low-Level Comm. Primitives and uses Hardware Drivers; Hardware: P1, P2 and Mem, here Shared Memory Comm. (can also be NoC)]

Summary
• System-on-Chips are heterogeneous and parallel
• Good communication is the key to an efficient parallel architecture
• In the course we will mainly focus on communication architectures
  - Buses
  - Network-on-chip
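
Sketch: message passing on top of shared memory
The slide "Implementation of a Message Passing Programming Model" states that Send and Receive can be built on a shared memory architecture. The following Python sketch illustrates that idea under simplifying assumptions: threads stand in for the processors P1 and P2, an in-memory queue stands in for the shared memory, and the names Mailbox, send, receive and run_demo are hypothetical and do not come from the course material.

```python
import threading
from collections import deque


class Mailbox:
    """Message queue kept in memory that all threads share (hypothetical helper)."""

    def __init__(self):
        self._messages = deque()            # stands in for the shared memory region
        self._cond = threading.Condition()  # guards access and signals message arrival

    def send(self, message):
        """Send primitive: write into shared memory, then wake one waiting receiver."""
        with self._cond:
            self._messages.append(message)
            self._cond.notify()

    def receive(self):
        """Receive primitive: block until a message is available, then remove it."""
        with self._cond:
            while not self._messages:
                self._cond.wait()
            return self._messages.popleft()


def run_demo():
    """P1 sends a number to P2; P2 replies with its square."""
    to_p2, to_p1 = Mailbox(), Mailbox()

    def p2():
        value = to_p2.receive()    # P2 waits for a message
        to_p1.send(value * value)  # P2 answers with a message of its own

    worker = threading.Thread(target=p2)
    worker.start()
    to_p2.send(7)                  # P1 sends
    reply = to_p1.receive()        # P1 waits for the reply
    worker.join()
    return reply


if __name__ == "__main__":
    print(run_demo())  # prints 49
```

The application code only calls send and receive and never touches the queue directly, so the same program could in principle run unchanged if the lower layer were replaced by a network-on-chip.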