Lecture 2: Fundamentals of Computer Design
Kai Bu
http://list.zju.edu.cn/kaibu/comparch

Chapter 1
• Transition from single processors to multiple processors
• Quantitative approach: empirical observations (of programs, experiments, simulations) as its tools

Outline
• Classes of computers
• Parallelism
• Instruction Set Architecture
• Trends
• Dependability
• Performance Measurement

Five Classes of Computers

PMD: Personal Mobile Device
• Wireless devices with multimedia user interfaces
• Cell phones, tablet computers, etc.
• A few hundred dollars

PMD Characteristics
• Cost effectiveness: less expensive packaging; no fan for cooling
• Responsiveness and predictability: real-time performance sets a maximum execution time for each application segment; soft real-time sets an average time constraint and tolerates an occasionally missed time constraint on an event
• Memory efficiency: optimize code size
• Energy efficiency: battery power, heat dissipation

Desktop Computing
• Largest market share
• From low-end netbooks ($x00) to high-end workstations ($x000)

Desktop Characteristics
• Price-performance: the combination of performance (compute performance, graphics performance) and price
• The most important characteristic to customers, and hence to computer designers

Servers
• Provide large-scale and reliable file and computing services (to desktops)
• Constitute the backbone of large-scale enterprise computing

Server Characteristics
• Availability: protection against server failure
• Scalability: respond to increasing demand by scaling up computing capacity, memory, storage, and I/O bandwidth
• Efficient throughput: handle more requests per unit time

Why Server Availability

Clusters/WSCs
• Warehouse-Scale Computers: collections of desktop computers or servers connected by local area networks to act as a single larger computer
• Characteristics: price-performance, power, availability

Embedded Computers
• Hidden everywhere

Embedded vs. Non-embedded
• Dividing line: the ability to run third-party software
• Embedded computers' primary goal: meet the performance need at a minimum price, rather than achieve higher performance at a higher price

Application Parallelism
• DLP (Data-Level Parallelism): many data items are operated on at the same time
• TLP (Task-Level Parallelism): tasks of work are created that can operate independently and largely in parallel

Hardware Parallelism
• Computer hardware exploits these two kinds of application parallelism in four major ways: instruction-level parallelism, vector architectures and GPUs, thread-level parallelism, and request-level parallelism
• Instruction-Level Parallelism exploits data-level parallelism at modest levels with pipelining and at medium levels with speculative execution
• Vector Architectures and GPUs (Graphics Processing Units) exploit data-level parallelism by applying a single instruction to a collection of data in parallel
• Thread-Level Parallelism exploits either DLP or TLP in a tightly coupled hardware model that allows for interaction among parallel threads
• Request-Level Parallelism exploits parallelism among largely decoupled tasks specified by the programmer or the OS

Classes of Parallel Architectures
• Michael Flynn classified architectures according to the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor: SISD, SIMD, MISD, MIMD

SISD
• Single instruction stream, single data stream: the uniprocessor
• Can exploit instruction-level parallelism

SIMD
• Single instruction stream, multiple data streams
• The same instruction is executed by multiple processors using different data streams.
• Exploits data-level parallelism
• Each processor has its own data memory, but there is a single instruction memory and control processor.

MISD
• Multiple instruction streams, single data stream
• No commercial multiprocessor of this type has been built to date

MIMD
• Multiple instruction streams, multiple data streams
• Each processor fetches its own instructions and operates on its own data.
• Exploits task-level parallelism

Instruction Set Architecture
• ISA: the actual programmer-visible instruction set
• The boundary between software and hardware
• Seven major dimensions

ISA: Class
• Most ISAs are general-purpose register architectures, whose operands are either registers or memory locations
• Two popular versions: register-memory ISAs (e.g., 80x86), where many instructions can access memory; load-store ISAs (e.g., ARM, MIPS), where only load and store instructions can access memory

ISA: Memory Addressing
• Byte addressing
• Aligned address: an object of width s bytes at address A is aligned if A mod s = 0
• A misaligned object may require two memory accesses

ISA: Addressing Modes
• Specify the address of a memory object
• Register, immediate, displacement

ISA: Types and Sizes of Operands
Type: size in bits
• ASCII character: 8
• Unicode character, half word: 16
• Integer, word: 32
• Double word, long integer: 64
• IEEE 754 floating point, single precision: 32
• IEEE 754 floating point, double precision: 64
• Floating point, extended double precision: 80

MIPS64 Operations
• Data transfer
• Arithmetic/logical
• Control
• Floating point

ISA: Control Flow Instructions
• Types: conditional branches, unconditional jumps, procedure calls, returns
• Branch address: PC-relative, formed by adding an address field to the PC (program counter)

ISA: Encoding an ISA
• Fixed length: 32 bits (ARM, MIPS)
• Variable length: 1 to 18 bytes (80x86)

MIPS encoding (http://en.wikipedia.org/wiki/MIPS_architecture): every instruction starts with a 6-bit opcode.
• R-type: three registers, a shift amount field, and a function field
• I-type: two registers and a 16-bit immediate value
• J-type: a 26-bit jump target

Computer Architecture
• ISA: the actual programmer-visible instruction set; the boundary between software and hardware
• Organization: the high-level aspects of computer design: memory system, memory interconnect, design of the internal processor or CPU
• Hardware: the computer specifics: logic design, packaging technology

Five Critical Implementation Technologies
• Integrated circuit logic technology
• Semiconductor DRAM
• Semiconductor flash
• Magnetic disk technology
• Network technology

Integrated Circuit Logic Technology
• Moore's Law: transistor count on a chip grows about 40% to 55% per year, i.e., doubles every 18 to 24 months

Semiconductor DRAM
• Capacity per DRAM chip doubles roughly every 2 or 3 years

Semiconductor Flash
• Electrically erasable programmable read-only memory
• Capacity per flash chip doubles roughly every two years
• In 2011, 15 to 20 times cheaper per bit than DRAM

Magnetic Disk Technology
• Since 2004, density doubles roughly every three years
• 15 to 20 times cheaper per bit than flash
• 300 to 500 times cheaper per bit than DRAM
• Used for server and warehouse-scale storage

Network Technology
• Switches
• Transmission systems

Performance Trends
• Bandwidth/throughput: the total amount of work done in a given time
• Latency/response time: the time between the start and the completion of an event

Bandwidth over Latency
• Bandwidth has improved much more rapidly than latency

Trends in Power and Energy
• Power = energy per unit time; 1 watt = 1 joule per second
• Energy to execute a workload = average power × execution time
• Three primary concerns: the maximum power for a processor; sustained power consumption; energy and energy efficiency

Sustained Power Consumption
• Metric: TDP (Thermal Design Power), which determines the cooling requirement
• Heat management: 1. reduce the clock rate, and hence power, as the temperature approaches the junction temperature limit; 2. if step 1 is not enough, power down the chip.

Energy and Energy Efficiency
• Energy to execute a workload = average power × execution time
• Example: processor A has 20% higher average power consumption than processor B, but A executes the task in 70% of B's time. Which is more energy-efficient?
• Answer: EnergyConsumption_A = 1.2 × 0.7 × EnergyConsumption_B = 0.84 × EnergyConsumption_B, so A uses 16% less energy and is more energy-efficient.

Dynamic Energy and Power
• The primary energy consumption within a microprocessor is for switching transistors (dynamic energy); a logic transition pulse is 0->1->0 or 1->0->1
• The energy of a single transition: Energy_dynamic ∝ 1/2 × Capacitive load × Voltage^2
• The power required per transistor: Power_dynamic ∝ 1/2 × Capacitive load × Voltage^2 × Frequency switched
• For a fixed task, slowing the clock rate (frequency) reduces power, but not energy.
• Example: some microprocessors have adjustable voltage, and a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic energy and dynamic power?
• Answer: the capacitive load is unchanged, so Energy_new / Energy_old = (Voltage × 0.85)^2 / Voltage^2 = 0.85^2 ≈ 0.72, and Power_new / Power_old = 0.72 × (Frequency switched × 0.85) / Frequency switched ≈ 0.61
• Challenges (potential research topics): distributing the power, removing the heat, preventing hot spots

Energy-Efficiency Improvement Techniques
1. Do nothing well: turn off the clock of inactive modules
2. DVFS (dynamic voltage-frequency scaling): scale down clock frequency and voltage during periods of low activity
3. Design for the typical case: PMDs and laptops are often idle, so memory and storage offer low-power modes to save energy
4. Overclocking: the chip runs at a higher clock rate for a short time until the temperature starts to rise

Trends in Cost
• Cost of an integrated circuit: wafers are tested and chopped into dies, which are then packaged
• Yield: the percentage of manufactured devices that survives the testing procedure
• Intel Core i7 Die
• N: process-complexity factor, a measure of manufacturing difficulty

Dependability
• SLA: service level agreements
• System states: up or down
• Service states: service accomplishment and service interruption; a failure moves the service from accomplishment to interruption, and a restoration moves it back
• Two measures of dependability: module reliability and module availability
• Module reliability: a measure of continuous service accomplishment from a reference initial instant
  MTTF: mean time to failure
  MTTR: mean time to repair
  MTBF: mean time between failures; MTBF = MTTF + MTTR
  FIT: failures in time, i.e., failures per billion hours; an MTTF of 1,000,000 hours = 10^9 / 10^6 = 1000 FIT
• Module availability = MTTF / (MTTF + MTTR)

Measuring Performance
• Execution time: the time between the start and the completion of an event
• Throughput: the total amount of work done in a given time
• Computer X is n times faster than computer Y: n = Execution time_Y / Execution time_X = Performance_X / Performance_Y

Quantitative Principles
• Parallelism
• Locality: temporal locality means recently accessed items are likely to be accessed in the near future; spatial locality means items whose addresses are near one another tend to be referenced close together in time
• Amdahl's Law: Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced), with two factors:
  1. Fraction_enhanced: e.g., 20/60 if 20 seconds of a 60-second program can be enhanced
  2. Speedup_enhanced: e.g., 5/2 if a part that originally took 5 seconds takes 2 seconds after enhancement
• The Processor Performance Equation: CPU time = Instruction count × Cycles per instruction (CPI) × Clock cycle time = Instruction count × CPI / Clock rate

Reading
• Chapter 1.8, 1.10 – 1.13
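The MIPS instruction formats and the aligned-address rule from the ISA slides can be made concrete in a few lines. This is a minimal sketch, assuming the classic 32-bit MIPS field layout (R-type: opcode 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits; I-type: opcode, rs, rt, and a 16-bit immediate; J-type: opcode and a 26-bit target). The helper names (`pack_fields`, `encode_r_type`, `encode_i_type`, `encode_j_type`, `is_aligned`) are hypothetical, and the register numbers in the usage lines follow the usual MIPS convention ($t0 = 8, $t1 = 9, $t2 = 10).

```python
def pack_fields(fields):
    """Concatenate (value, width) pairs into one word, most significant field first."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    # R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
    return pack_fields([(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)])


def encode_i_type(opcode, rs, rt, imm):
    # I-type: opcode(6) | rs(5) | rt(5) | immediate(16)
    # A negative immediate is stored as 16-bit two's complement.
    if not -(1 << 15) <= imm < (1 << 16):
        raise ValueError(f"{imm} does not fit in a 16-bit immediate")
    return pack_fields([(opcode, 6), (rs, 5), (rt, 5), (imm & 0xFFFF, 16)])


def encode_j_type(opcode, target):
    # J-type: opcode(6) | target(26)
    return pack_fields([(opcode, 6), (target, 26)])


def is_aligned(address, size):
    """An object of `size` bytes at byte address `address` is aligned if address mod size == 0."""
    return address % size == 0


# add $t0, $t1, $t2  (opcode 0, funct 0x20)
print(hex(encode_r_type(0, 9, 10, 8, 0, 0x20)))       # 0x12a4020
# addi $t0, $t1, -1  (opcode 8)
print(hex(encode_i_type(8, 9, 8, -1)))                # 0x2128ffff
print(is_aligned(0x1004, 4), is_aligned(0x1006, 4))   # True False
```

Because every field is range-checked, an out-of-range register number raises an error instead of silently spilling into the neighbouring field.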
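The two power and energy examples reduce to a handful of ratios. A minimal sketch, assuming the capacitive load is unchanged and that voltage and frequency are scaled by the given factors; `relative_energy` and `dvfs_ratios` are hypothetical names.

```python
def relative_energy(power_ratio, time_ratio):
    """Energy = average power x execution time, so energy ratios multiply."""
    return power_ratio * time_ratio


def dvfs_ratios(voltage_scale, frequency_scale):
    """Return (dynamic energy ratio, dynamic power ratio) after scaling.

    Energy_dynamic ∝ 1/2 x C x V^2      -> scales with V^2
    Power_dynamic  ∝ 1/2 x C x V^2 x f  -> scales with V^2 x f
    The capacitive load C is assumed unchanged, so it cancels out.
    """
    energy_ratio = voltage_scale ** 2
    power_ratio = energy_ratio * frequency_scale
    return energy_ratio, power_ratio


# Processor A: 1.2x the average power of B, 0.7x the execution time.
print(round(relative_energy(1.2, 0.7), 2))   # 0.84, so A uses less energy
# 15% lower voltage and 15% lower frequency.
energy, power = dvfs_ratios(0.85, 0.85)
print(round(energy, 4), round(power, 4))     # 0.7225 0.6141
```

The results match the slide answers of roughly 72% of the original dynamic energy and 61% of the original dynamic power.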
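The integrated-circuit cost slides refer to the textbook's cost model. A minimal sketch, assuming that model takes its usual Hennessy and Patterson form: cost of IC = (cost of die + cost of testing die + cost of packaging and final test) / final test yield; cost of die = cost of wafer / (dies per wafer × die yield); dies per wafer = π × (wafer diameter / 2)^2 / die area - π × wafer diameter / sqrt(2 × die area); die yield = wafer yield × 1 / (1 + defects per unit area × die area)^N, with N the process-complexity factor. The function names and all sample numbers are illustrative assumptions, not data from the lecture.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Whole dies per wafer: wafer area / die area, minus dies lost along the round edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return math.floor(wafer_area / die_area_cm2 - edge_loss)


def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Fraction of good dies; n is the process-complexity factor N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_of_die(wafer_cost, dies, yield_fraction):
    """Wafer cost spread over the good dies only."""
    return wafer_cost / (dies * yield_fraction)


def cost_of_ic(die_cost, die_test_cost, packaging_and_final_test_cost, final_test_yield):
    """Total cost per working packaged part."""
    return (die_cost + die_test_cost + packaging_and_final_test_cost) / final_test_yield


# Illustrative inputs only: a 30 cm wafer, a 2.25 cm^2 die,
# 0.031 defects per cm^2, N = 13.5, and a $5,000 wafer.
dies = dies_per_wafer(30, 2.25)
good_fraction = die_yield(0.031, 2.25, 13.5)
print(dies, round(good_fraction, 2))   # 269 0.4
print(round(cost_of_die(5000, dies, good_fraction), 2))
```

The sketch rounds dies per wafer down, since a partial die cannot be sold; some worked examples round to the nearest integer instead.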
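The reliability measures can be combined for a system built from several modules. A minimal sketch, assuming (as the textbook does for such estimates) that module lifetimes are exponentially distributed and independent, and that the system fails as soon as any one module fails, so failure rates simply add. The function names and the component MTTFs in the usage lines are hypothetical.

```python
def fit(mttf_hours):
    """Failures in time: expected failures per 10^9 hours."""
    return 1e9 / mttf_hours


def system_mttf(module_mttfs_hours):
    """MTTF of a system that fails when any module fails: failure rates add."""
    total_failure_rate = sum(1.0 / mttf for mttf in module_mttfs_hours)
    return 1.0 / total_failure_rate


def availability(mttf_hours, mttr_hours):
    """Module availability = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf_hours / (mttf_hours + mttr_hours)


print(fit(1_000_000))                      # 1000.0, matching the FIT slide
# Hypothetical system: 4 modules rated at 1,000,000 hours each.
mttf = system_mttf([1_000_000] * 4)
print(round(mttf))                         # 250000
print(round(availability(mttf, 24), 6))    # availability with a 24-hour repair time
```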
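The "n times faster" definition and the processor performance equation translate directly into code. A minimal sketch with hypothetical function names; `weighted_cpi` assumes CPI is the average of per-class CPIs weighted by each instruction class's share of the instruction count, and the numbers in the usage lines are illustrative.

```python
def times_faster(exec_time_x, exec_time_y):
    """n such that X is n times faster than Y: n = time_Y / time_X."""
    return exec_time_y / exec_time_x


def weighted_cpi(mix):
    """Average CPI from (fraction of instructions, CPI of that class) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


print(times_faster(10.0, 15.0))                           # 1.5: X is 1.5 times faster
cpi = weighted_cpi([(0.5, 1.0), (0.3, 2.0), (0.2, 3.0)])  # about 1.7
print(round(cpu_time(2e9, cpi, 3.4e9), 3))                # 1.0 (seconds)
```

Keeping the three factors separate makes it easy to see which one a design change affects: the compiler mostly changes instruction count, the organization changes CPI, and the technology changes clock rate.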
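Amdahl's Law can be checked numerically with the slide's own two factors. With Fraction_enhanced = 20/60 and Speedup_enhanced = 5/2, the 20 enhanced seconds shrink to 8 seconds, the program takes 48 seconds instead of 60, and the overall speedup is 60/48 = 1.25. A minimal sketch; `amdahl_speedup` is a hypothetical name.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of the execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


print(round(amdahl_speedup(20 / 60, 5 / 2), 6))   # 1.25
# Even an extremely fast enhancement cannot beat 1 / (1 - fraction_enhanced).
print(round(amdahl_speedup(20 / 60, 1e12), 6))    # 1.5
```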