Types of Processors - Chapter 2

Different classes of processors
To achieve efficient designs, several classes of processors exist:
• Microcontrollers
• RISC processors
• Digital Signal Processors (DSP)
• Multimedia processors
• Application Specific Instruction Set Processors (ASIP)
• Other classes

Microcontrollers - classes of embedded processors
• relatively slow
• very area efficient
• intended for control-intensive applications
• microprogrammed CISC architecture
• the number of clock cycles differs from instruction to instruction
• limited computational and storage resources
• relatively narrow data path (8 or 16 bits)

Microcontrollers - classes of embedded processors - cont.
• complex instruction set: provides a convenient programming interface, i.e. dense code
• control-oriented application domain
• rich set of instructions for bit-level data manipulation and for peripheral components such as timers or serial I/O ports
• simple processors, such as the 8051 and 6502, are nowadays reused in customized form as microcontrollers for embedded systems (ESs)

Microcontroller - block diagram

Microcontroller - detailed block diagram

Timer as a constituent of a microcontroller

RISC - classes of embedded processors
• evolved from CISC architectures
• Harvard architecture: separate data and instruction memories
• pipelined instruction execution
• offer only a very basic set of instructions
• instructions are executed at very high speed
• all instructions have the same size and require the same number of clock cycles to execute

RISC - classes of embedded processors - cont.
• Load/Store architecture
• large number of general-purpose registers, which reduces the number of memory accesses in a machine program
• for a fixed application, the code size for a RISC exceeds the code size for a CISC
• popular members of the RISC family for ESs are the ARM RISC core, the MIPS RISC core, and TRICO
• low power consumption (100 mW), suitable for portable, battery-powered systems

Clock Frequency Versus Year for Various Representative Machines

Fundamental attributes
The key metrics for characterizing a microprocessor include:
• performance
• power consumption
• cost (chip area)
• high availability (fault tolerance)

Instruction Level Parallelism - Definition
• The next step in performance enhancement beyond pipelining calls for executing several instructions in parallel.
• Instruction-Level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations, such as memory loads and stores, integer additions, and floating-point multiplications, to execute in parallel.

Parallel processor systems
Parallel processor systems tend to take one of two forms:
• multiprocessors: relatively large tasks, such as procedures or loop iterations, are executed in parallel
• instruction-level parallel (ILP) processors: individual instructions are executed in parallel

ILP processors
Processors that exploit ILP have been much more successful than multiprocessors in the general-purpose workstation/PC market because they can provide performance improvements on conventional programs, while this has not been possible on multiprocessors.
The two most common architectures for ILP are:
• superscalar processors
• Very Long Instruction Word (VLIW) processors

The structure of ILP processors
In an ILP processor, some of the execution units execute integer operations while the others execute floating-point operations.

What is ILP?
ILP processors exploit the fact that many of the instructions in a sequential program do not depend on the instructions that immediately precede them in the program.
Let us consider the following sequence:

What is ILP? - continued
The dependencies require that instructions 1, 3, and 5 be executed in order to generate the correct result, but instructions 2 and 4 can be executed before, after, or in parallel with any of the other instructions without changing the result of the program fragment.

Division of responsibilities between the compiler and the hardware
If ILP is to be achieved, the compiler and the runtime hardware must between them perform the following functions:
• the dependencies between operations must be determined
• the operations that are independent of any operation that has not yet completed must be determined
• these independent operations must be scheduled to execute at some particular time, on some specific functional unit, and must be assigned a register into which the result may be deposited

Breakdown of tasks between compiler and runtime hardware

Superscalar processors - basic principle
• Superscalar processors contain hardware that examines a sequential program to locate instructions that can be executed in parallel.
• This allows them to maintain compatibility between generations and to achieve speedups on programs that were compiled for sequential processors, but the hardware examines only a limited window of instructions when selecting those that can be executed in parallel, which can reduce performance.
• Superscalar processors can achieve speedups when running programs that were compiled for sequential (non-ILP) processors, without requiring recompilation.

Superscalar execution
• Instead of 'scalar' execution, where in each cycle only one instruction can be resident in each pipeline stage, 'superscalar' execution is used, where two or more instructions can be at the same pipeline stage in the same cycle.
• Superscalar execution allows multiple instructions that are adjacent in program order to be in the same stage of processing simultaneously.
• Superscalar design requires significant replication of resources in order to support fetching, decoding, executing, and writing back multiple instructions in every cycle.

General superscalar organization

Superpipelining - an alternative approach
• An alternative approach to achieving greater performance is referred to as 'superpipelining'.
• Superpipelining exploits the fact that many pipeline stages perform tasks that require less than half a clock cycle.

Superscalar vs Superpipeline

Limitations
• The superscalar approach depends on the ability to execute multiple instructions in parallel.
• ILP refers to the degree to which, on average, the instructions of a program can be executed in parallel.
• A combination of compiler-based optimization and hardware techniques can be used to maximize ILP.

Fundamental limitations
Fundamental limitations to parallelism with which the system must cope are:
• data dependencies:
  - true data dependencies
  - output dependencies
  - antidependencies
• procedural dependencies (control dependencies)
• resource conflicts (structural dependencies)

Effect of dependencies

Data dependencies

Design issues: ILP versus Machine Parallelism
ILP and Machine Parallelism (MP) are two related concepts in processor design, so it is important to distinguish clearly between them:
• ILP exists when instructions in a sequence are independent and thus can be executed in parallel by overlapping.
• ILP is a measure of how many instructions can be executed together on an infinitely wide superscalar machine.

ILP vs Machine Parallelism
• MP is a measure of the ability of the processor to take advantage of ILP.
• MP is determined by the number of instructions that can be fetched and executed at the same time (the number of parallel pipelines) and by the speed and sophistication of the mechanisms that the processor uses to find independent instructions.
• Both ILP and MP are important factors in enhancing performance.

Example for ILP and MP
The code
for (i = 0; i < 100; i++) a[i] = a[i] + 1;
has a considerable amount of parallelism. A machine built with 100 functional units and memory ports would give a 100x speedup on it.

Example for ILP and MP - continued
• In many cases the amount of ILP is determined simply by the ratio of dependencies (data, structural, and control) to other types of instructions.
• Fewer branches and true data dependencies will increase ILP.
• More functional units will increase MP.

Instruction issue and instruction issue policy
• Machine parallelism is not simply a matter of having multiple instances of each pipeline stage.
• The processor must also be able to identify ILP and to orchestrate the fetching, decoding, and execution of instructions in parallel.
• The term instruction issue refers to the process of initiating instruction execution in the processor's functional units.
• The term instruction issue policy refers to the protocol used to issue instructions.

Instruction issue policies
Superscalar instruction issue policies can be grouped into the following three categories:
• In-order issue with in-order completion
• In-order issue with out-of-order completion
• Out-of-order issue with out-of-order completion

Instruction issue policy - examples
We assume a superscalar pipeline capable of fetching and decoding two instructions at a time, having three separate functional units, and having two instances of the write-back pipeline stage.
The examples assume the following constraints on a six-instruction code fragment:
- I1 requires two cycles to execute
- I3 and I4 conflict for the same functional unit
- I5 depends on the value produced by I4
- I5 and I6 conflict for a functional unit

In Order Issue and In Order Completion

In Order Issue and Out of Order Completion

Out of Order Issue and Out of Order Completion

Another example of out-of-order execution
Cycle | Scalar / In order | Super-Scalar / In order          | Super-Scalar / Out-of-Order
1     | Load eax, mem1    | Load eax, mem1                   | Load eax1, mem1 / Load eax2, mem3
2     | Store mem2, eax   | Store mem2, eax / Load eax, mem3 | Store mem2, eax1 / Store mem4, eax2
3     | Load eax, mem3    | Store mem4, eax                  |
4     | Store mem4, eax   |                                  |

Conceptual Description of Superscalar Processing

Superscalar processor - How execution progresses

Superscalar Internal Structure

Another Superscalar Internal Structure

Instruction Flow, Register and Memory Dataflow

VLIW processors - basic principles
• The VLIW architecture requires that programs be recompiled for the new architecture, but it achieves very good performance on programs written in sequential languages such as C or Fortran once these programs are recompiled for a VLIW processor.
• VLIW is one particular style of processor design that tries to achieve high levels of ILP by executing long instruction words composed of multiple operations.
• VLIW processors, in contrast to superscalar processors, rely on the compiler to determine which instructions may be executed in parallel and to provide that information to the hardware.

VLIW instruction & VLIW processor
In VLIW processors, each instruction specifies several independent operations that are executed in parallel by the hardware.

Scheduling a sequence of operations for execution on a VLIW processor with 3 execution units - Example
Let us consider the following sequence:
VLIW scheduling will be:

VLIW - different flavours of parallelism
• The number of operations in a VLIW instruction is equal to the number of execution units in the processor.
• Each operation specifies the instruction that will be executed in the cycle in which the VLIW instruction is issued.
• There is no need for the hardware to examine the instruction stream to determine which instructions may be executed in parallel.
• The compiler is responsible for ensuring that all of the operations in an instruction can be executed simultaneously.
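
The compiler's job of packing independent operations into long instruction words can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular VLIW compiler: every operation is assumed to take one cycle, any execution unit is assumed able to run any operation, and output dependencies and antidependencies are handled conservatively by delaying the later operation. The `Op` type, the register names, and the five-operation `SEQUENCE` are hypothetical and are not the sequence shown on the original slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str     # label used in the printed schedule
    dest: str     # register written by the operation
    srcs: tuple   # registers read by the operation


def depends_on(later, earlier):
    """Return True if `later` must be issued after `earlier`."""
    true_dep = earlier.dest in later.srcs      # read after write
    output_dep = earlier.dest == later.dest    # write after write
    anti_dep = later.dest in earlier.srcs      # write after read (conservative)
    return true_dep or output_dep or anti_dep


def schedule_vliw(ops, units=3):
    """Greedily pack ops, in program order, into instruction words of `units` slots."""
    words = []    # each word is a list of Op objects issued in the same cycle
    placed = {}   # index of op in `ops` -> index of the word that holds it
    for i, op in enumerate(ops):
        # The op may not issue before the word after its latest predecessor.
        earliest = 0
        for j in range(i):
            if depends_on(op, ops[j]):
                earliest = max(earliest, placed[j] + 1)
        # Take the first word at or after `earliest` that still has a free slot.
        w = earliest
        while w < len(words) and len(words[w]) >= units:
            w += 1
        while len(words) <= w:
            words.append([])
        words[w].append(op)
        placed[i] = w
    # Unused slots are filled with explicit NOPs.
    return [[o.name for o in word] + ["NOP"] * (units - len(word)) for word in words]


# Hypothetical sequence: I1 -> I3 -> I5 form a dependency chain, I2 and I4 are free.
SEQUENCE = [
    Op("I1", "r1", ("r2", "r3")),
    Op("I2", "r4", ("r5", "r6")),
    Op("I3", "r7", ("r1", "r8")),
    Op("I4", "r9", ("r10",)),
    Op("I5", "r11", ("r7", "r12")),
]

if __name__ == "__main__":
    for cycle, word in enumerate(schedule_vliw(SEQUENCE, units=3), start=1):
        print(cycle, word)
```

With three execution units the five operations fit into three instruction words; with a single unit the same function degenerates to one operation per word.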
Pros and cons of VLIW - advantages
The main advantages of VLIW architectures are:
• simpler instruction issue logic, which often allows VLIW processors to fit more execution units onto a given amount of chip area than superscalar processors
• the compiler generally has a larger-scale view of the program than the instruction issue logic in a superscalar processor, and is therefore generally better than the issue logic at finding instructions to execute in parallel

Pros and cons of VLIW - disadvantages
The most significant disadvantages of VLIW processors are:
• VLIW programs only work correctly when executed on a processor with the same number of execution units and the same instruction latencies as the processor they were compiled for.
• Code written for a machine with 4 concurrent integer units could not exploit additional execution units in a later model.
• Likewise, code optimized for a newer VLIW with 8 concurrent integer units would not function correctly on an older machine with fewer units.

Pros and cons of VLIW - disadvantages - continued
• In addition, if the compiler cannot find enough parallel operations to fill all of the slots in an instruction, it must place explicit NOP operations into the corresponding operation slots.
• This causes VLIW programs to take more memory than equivalent programs for superscalar processors.
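
The memory cost of NOP padding can be estimated with a few lines of Python. This is only a rough sketch: the function name `vliw_code_size`, the fixed 4 bytes per operation slot, and the example schedule are assumptions made for illustration, and real VLIW encodings (including compressed ones) differ.

```python
def vliw_code_size(schedule, bytes_per_op=4):
    """Summarize how much of a VLIW program is occupied by NOP slots.

    `schedule` is a list of instruction words, each a list of operation
    names in which the string "NOP" marks an empty slot.
    `bytes_per_op` is a hypothetical fixed encoding size per slot.
    """
    slots = sum(len(word) for word in schedule)
    nops = sum(1 for word in schedule for op in word if op == "NOP")
    useful = slots - nops
    return {
        "vliw_bytes": slots * bytes_per_op,     # size including NOP slots
        "useful_bytes": useful * bytes_per_op,  # size of the real operations only
        "nop_fraction": nops / slots if slots else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical 3-wide schedule: 5 useful operations spread over 3 words.
    example = [
        ["I1", "I2", "I4"],
        ["I3", "NOP", "NOP"],
        ["I5", "NOP", "NOP"],
    ]
    print(vliw_code_size(example))
```

In this example four of the nine slots hold NOPs, so the VLIW encoding takes 36 bytes for 20 bytes of useful operations.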
Defoe Processor - VLIW Representative

Itanium Bundle

Itanium Register Set

Parallelism of Instruction Execution and Instruction Issue

The ways to exploit instruction parallelism: Scalar & Superscalar

The ways to exploit instruction parallelism: Super-pipeline & VLIW

Typical application of VLIW and superscalar processors
• VLIW processors are often used in digital signal-processing (DSP) applications, where high performance and low cost are critical.
• Superscalar processors are mainly used in general-purpose computers such as workstations and PCs, because customers demand software compatibility between generations of a processor.

Improving performance
• In general, performance can be improved by increasing IPC and/or clock frequency, or by decreasing the instruction count.
• RISC architecture seeks to increase both frequency and IPC via pipelining and the use of cache memories, at the expense of an increased instruction count.
• CISC microprocessors employ a RISC-like internal representation to achieve higher frequency while maintaining a lower instruction count.
• The VLIW concept, revived with EPIC (Explicitly Parallel Instruction Computing), uses the compiler to schedule instructions statically.
• Exploiting parallelism statically can enable simpler control logic and help EPIC to achieve higher IPC and higher frequency.

DSP - classes of embedded processors
• designed for arithmetic-intensive signal processing applications
• instruction set tuned for fast execution of algorithms like digital filtering and FFT
• special hardware components: hardware multipliers and dedicated address generation units
• instructions can be executed in parallel - VLIW architecture

DSP - classes of embedded processors - cont.
• unlike RISCs, DSPs use special-purpose registers (a dedicated accumulator register)
• operate in a special arithmetic mode - saturation mode
• due to irregularities in the processor architecture compared to other processor classes, compiler construction is difficult
• the market leader in DSPs is Texas Instruments

Signal-processing architectures
The following signal-processing architectures can be identified on the market today:
- ASIC - Application Specific Integrated Circuit
- ASSP - Application Specific Standard Product
- configurable processors - Configurable Processor
- DSP - Digital Signal Processor
- FPGA - Field Programmable Gate Array
- MCU - Microcontroller
- RISC/GPP - Reduced Instruction Set Computer / General Purpose Processor

Criteria used to assess the capabilities of processing elements
• Time from when a product is conceived until it is produced (Time to market) - very important
• Performance - very important
• Price - very important
• Design tools that are available (Development Ease) - very important
• Power consumption (Power) - moderately important
• Feature flexibility - not of great importance

Criteria for assessing the suitability of a given architecture for real-time signal processing

Types of programmable VLSI circuits

Types of programmable VLSI circuits - cont.
ASIC - specifically designed circuits that perform a single task. It is very difficult to make changes to these circuits late in the design process. The control unit is usually hard-wired.
ASPP - a programmable architecture (with respect to the data path) that is able to perform a number of different tasks (activities). The control unit is microprogram-based. Several programs are stored in the microprogram memory, each corresponding to one task.
ASIP - also has a programmable data path which, in terms of flexibility, lies somewhere between the ASPP and the DSP.
In this case the data path has a somewhat more general structure because, as in standard processors, it contains a register file (RF) and an ALU. It differs from a DSP in that its instruction set is quite limited (restrictive) and the number of internal buses is not as large. The use of ASIPs is limited to specific applications that can be executed quickly.

Types of programmable VLSI circuits - cont.
DSP - digital signal processors are, like microcontroller units (such as the popular Intel 80C51 or Motorola MC 68HC11), self-contained computing machines with built-in I/O channels and memory, but with considerably superior capabilities for mathematical manipulation and with an architecture better suited to processing the data types (above all arrays) typical of digital signal processing. Today DSPs have become key VLSI components built into communication, medical, military, industrial, and various other consumer products. Researchers and designers often justifiably regard them as a class of microprocessors optimized for digital signal processing.
MPU - general-purpose processor units which, at the cost of reduced execution speed, are able to perform tasks of any type without restriction.

Multiply-accumulate - a DSP-specific feature

Multiple execution units - a DSP-specific feature

Difference in memory architectures between standard MPUs and DSP processors

Address generator and memory access in DSP processors

Typical I/O organization in DSP processors

Typical application of a DSP processor - TMS 320C240

Conventional versus enhanced DSP

Organization of execution units and memory (program & data) in the TMS 320C62xx

SIMD DSP processors
Operating principle of a 64-bit adder divided into four 16-bit sub-words.
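
The idea of a 64-bit adder split into four 16-bit sub-words can be imitated in software. The Python sketch below is an illustration of the principle only, not a description of any specific DSP's hardware: it assumes wrap-around (non-saturating) arithmetic in each lane, and the helper names `pack`, `unpack`, and `simd_add16x4` are hypothetical. The carry out of each 16-bit lane is suppressed so that it cannot spill into the neighbouring lane.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes):
    """Pack four 16-bit values into one 64-bit word (lane 0 in the low bits)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


HIGH_BITS = pack([0x8000] * LANES)  # top bit of every lane
LOW_BITS = pack([0x7FFF] * LANES)   # lower 15 bits of every lane


def simd_add16x4(a, b):
    """Add two 64-bit words as four independent 16-bit lanes (wrap-around)."""
    # Adding only the lower 15 bits of each lane cannot carry out of the lane.
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    # The top bit of each lane is a XOR b XOR carry-in; the carry-in is
    # already sitting in the top bit of low_sum.
    return low_sum ^ ((a ^ b) & HIGH_BITS)


if __name__ == "__main__":
    a = pack([0xFFFF, 1, 0x7FFF, 0x1234])
    b = pack([1, 2, 1, 0x1111])
    print([hex(x) for x in unpack(simd_add16x4(a, b))])
```

Lane 0 overflows and wraps to zero without disturbing lane 1, whereas an ordinary 64-bit addition of the same two words would propagate that carry into lane 1.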
Performance characteristics of some DSP processors
BDTI is a performance measure.

Parallel processing of independent instructions

TMS320C10

TMS320C206

TMS320VC33

Multimedia processors - classes of embedded processors
• relatively new on the market
• architecturally related to RISCs and DSPs
• intended for multimedia applications: audio, image, or video signal processing
• the architecture follows the VLIW paradigm
• different functional units can operate in parallel
• use general-purpose registers, like RISCs

Multimedia processors - classes of embedded processors - cont.
• the architecture is more regular than in DSPs
• the compiler is responsible for exploiting ILP in a program
• examples of multimedia processors are the C6201 (up to 8 parallel instructions per cycle) and the Trimedia TM1000 (up to 5 parallel instructions per cycle)

Multimedia processor - TM1000

Multimedia processor - STn8810

ASIPs - classes of embedded processors
• Microcontrollers, RISCs, DSPs and multimedia processors are domain-specific: they are tuned for a certain application domain, but not for the given application itself.
• ASIPs are a compromise between domain-specific processors and non-programmable ASICs.
• ASIPs are programmable, but they serve only a very narrow range of applications.

ASIPs - classes of embedded processors - cont1
• ASIPs can be parameterized.
• The basic architecture of an ASIP is fixed, but it can be customized for a given application by setting a number of different parameters.
• Word lengths may be adjusted to the required precision, register files may be sized, and available special hardware components tuned.
• Since these parameters are mostly orthogonal to each other, a large number of different configurations of a single ASIP may be available.

ASIPs - classes of embedded processors - cont2
• ASIPs are very efficient, but a large number of different compilers are normally required.
• Retargetable compilers are capable of generating code for any particular ASIP configuration.

Regular versus retargetable compilation
• Regular: source program P -> compiler for processor Q -> machine code for executing P on Q
• Retargetable: source program P + model of processor Q -> retargetable compiler -> machine code for executing P on Q

ASIP in the context of processor HW implementation
class                     | ASIC | ASIP | DSP | GPP
flexibility               | low  ------------>  high
computational performance | high ------------>  low
energy efficiency         | high ------------>  low

The energy-flexibility gap
[Figure: energy efficiency (MOPS/mW or MIPS/mW, logarithmic axis from 0.1 to 1000) versus flexibility. Regions and data points, from most efficient to most flexible: dedicated (ASIC) hardware; ASIPs, FPGAs, reconfigurable logic (ICORE ASIP: 35 MOPS/mW); programmable DSPs (TMS 320C54: 3 MIPS/mW); GPPs and microcontrollers (SA110: 0.4 MIPS/mW).]

Definitions of ASIP-related terms
From the application point of view, the technical literature uses the acronym ASIP to describe two different kinds of digital ICs:
• Application-Specific Integrated Processor: any kind of digital IC used for data processing; the term does not imply any kind of instruction-set-oriented or programmable data processing
• Application-Specific Instruction Set Processor (Application-Specific Instruction Processor): a programmable application-specific processor using the concept of an instruction set architecture for data processing

Evolution of design criteria in CMOS integrated circuits

Power dissipation in time
“CMOS circuits dissipate little power by nature.
So believed circuit designers” (Kuroda-Sakurai, 95)
[Figure: power (W, logarithmic axis from 0.01 to 100) versus year (1980 to 1995); trend of x4 every 3 years]
“By the year 2000 power dissipation of high-end ICs will exceed the practical limits of ceramic packages, even if the supply voltage can be feasibly reduced.”

Gloom and Doom predictions
Power density will increase

VDD, Power and Current Trend
[Figure: supply voltage (V), power per chip (W), and VDD current (A) versus year, 1998 to 2014]
Source: International Technology Roadmap for Semiconductors, 1999 update, sponsored by the Semiconductor Industry Association in cooperation with the European Electronic Component Association (EECA), the Electronic Industries Association of Japan (EIAJ), the Korea Semiconductor Industry Association (KSIA), and the Taiwan Semiconductor Industry Association (TSIA). (Taken from Sakurai's ISSCC 2001 presentation.)

Power Delivery Problem (not just California)
Your car starter!
Source: Shekhar Borkar, Intel

Power Consumption - New Dimension in Design

Sources of Power Consumption
• The major sources of power consumption in digital CMOS circuits are:
$$P_{avg} = p_t \, C_L \, V_{dd}^2 \, f_{clk} + I_{sc} \, V_{dd} + I_{leakage} \, V_{dd} + P_4 = P_1 + P_2 + P_3 + P_4$$
where:
P1 - capacitive switching power (dynamic, dominant)
P2 - short-circuit power (dynamic)
P3 - leakage current power (static)
P4 - static power dissipation (minor)

Research Efforts in Low-Power Design

Reducing the Power Dissipation
• The power dissipation can be minimized by reducing:
  - supply voltage
  - load capacitance
  - switching activity
• Reducing the supply voltage brings a quadratic improvement.
• Reducing the load capacitance contributes to the improvement of both power dissipation and circuit speed.
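
The quadratic effect of the supply voltage follows directly from the switching term of the power equation above, and a small Python sketch makes it concrete. The function names and all numeric values in the example (activity factor, load capacitance, clock frequency) are hypothetical and chosen only for illustration; `p_t` is taken here to be the switching activity factor.

```python
def switching_power(p_t, c_load, v_dd, f_clk):
    """Capacitive switching power P1 = p_t * C_L * Vdd^2 * f_clk, in watts."""
    return p_t * c_load * v_dd ** 2 * f_clk


def average_power(p_t, c_load, v_dd, f_clk, i_sc=0.0, i_leakage=0.0, p_static=0.0):
    """Sum of the four terms P1 + P2 + P3 + P4, in watts."""
    p1 = switching_power(p_t, c_load, v_dd, f_clk)  # capacitive switching
    p2 = i_sc * v_dd                                # short-circuit
    p3 = i_leakage * v_dd                           # leakage
    return p1 + p2 + p3 + p_static                  # p_static is P4


if __name__ == "__main__":
    # Hypothetical operating point: 1 nF switched capacitance, 100 MHz clock.
    for v_dd in (5.0, 2.5, 1.0):
        watts = switching_power(p_t=0.5, c_load=1e-9, v_dd=v_dd, f_clk=100e6)
        print(f"Vdd = {v_dd} V -> P1 = {watts:.3f} W")
```

Halving the supply voltage cuts the switching power to a quarter, while halving the load capacitance or the switching activity only halves it.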
Amount of Reducing the Power Dissipation

Gate Delay and Power Dissipation in Terms of Supply Voltage
[Figure: normalized power dissipation (W) and normalized gate delay (ns) versus supply voltage (V), over the range 0.6 V to 5.0 V]

Needs for Low-Power
• Efficient methodologies and technologies for the design of high-throughput, low-power digital systems are needed.
• The main interest of many researchers is now oriented towards lowering the energy dissipation of these systems while still maintaining high throughput in real-time processing.

Batteries - classification
Depending on how they are used, batteries are divided into:
• primary - intended to be charged only once; they are used until discharged and then discarded
• secondary - can be recharged and discharged many times

Properties
1. Energy density - a measure of how much energy a battery can store in a given volume or mass. It can be expressed in the following two ways:
   - volumetric energy density, usually measured in watt-hours per liter (Wh/L)
   - gravimetric energy density, measured in watt-hours per kilogram (Wh/kg)
2. Memory effect - some secondary batteries exhibit a property known as the memory effect. If these batteries are used until they are fully discharged, they can be recharged to their originally rated capacity. If they are only partially discharged before recharging, however, their energy capacity is reduced. After a large number of charge and discharge cycles these batteries become completely useless.

Properties - cont.
3. Cycle life - the number of charge and discharge cycles a battery can withstand before it becomes unusable.
4. Working voltage - the voltage available from a single cell, determined by the chemical composition of the battery.
5. Self discharge - the rate at which a battery discharges by itself when it is not in use.

Battery technology - types
Ni-Cd - the most commonly used form.
These batteries are characterized by a high-energy current and are used in devices that can drive small motors. The memory effect, high self-discharge rate, and low energy density are the drawbacks of these batteries, which make them unsuitable for cellular phones and notebook computers.
Alkaline - have a somewhat better energy density than Ni-Cd and are mainly used as single-use batteries. A rechargeable type also exists, but its energy density falls quickly with repeated charging.
Ni-MH - Nickel Metal Hydride batteries are commonly used in cellular phones and notebook computers because their price is acceptable and their energy density is relatively high. Unfortunately, their self-discharge rate is high, which makes them unsuitable for certain applications. This type of battery was for a long time a bridge between Ni-Cd and lithium-ion batteries, but it lost its leading position as the price of lithium batteries fell.

Battery technology - types (cont.)
Lithium-ion - characterized by high energy density. They are used as standard in cellular phones and notebook computers. They are very thin (down to 0.5 mm). In recent years their price has dropped drastically.
Lithium polymer - characterized by high energy density; they can be formed into different shapes, so they fit the form of the product very well.
Photovoltaic cells - convert ambient light into electrical energy and can be used for low-power devices such as calculators.
Fuel cells - convert hydrocarbons into electrical energy and have a very high energy density. Recharging these cells is similar to refilling a lighter. Their energy density is 3 to 5 times better than that of lithium-ion batteries, but they are impractical for applications involving portable electronic devices.

Critical metrics for battery technology

Product implementation
The best battery technology for a portable electronic device is determined during the product analysis phase.
The designer must strike the right balance between high energy capacity, small battery dimensions (small form factor), and price in order to create a successful product concept. To make the solution feasible, the manufacturer must consider the form (shape) of the battery, the requirements for recharging or replacement, mechanical mounting, connectors, and the power management electronics.
Batteries come in many forms. Standard forms are AA, AAA, C and D cells and lithium button cell batteries, which can be bought in practically any shop. These battery types are preferable if we want them to be easily replaceable by a wide range of users. On the other hand, lithium-ion and Ni-MH batteries are available in various forms (rectangular, non-cylindrical, and others), as well as in custom-made forms.