Advanced Computer Architecture
• Fundamentals of Computer Design
• Instruction Set Principles and Examples
• Pipelining: Basic and Intermediate Concepts
• Memory Hierarchy Design
• Storage System
• Instruction-Level Parallelism: Concepts and Challenges
• Exploiting Instruction-Level Parallelism with Software Approaches
• Multiprocessors and Thread-Level Parallelism

Forces on Computer Architecture
[Diagram: Technology, Programming Languages, Applications, Operating Systems, and History acting on Computer Architecture (A = F / M)]

Fundamentals of Computer Design
• Introduction
• The Task of the Computer Designer
• Technology Trends
• Cost, Price, and Their Trends
• Performance
• Quantitative Principles of Computer Design
• Putting It All Together: Performance and Price-Performance
• Power Consumption and Efficiency
• Fallacies and Pitfalls

[Figure: Microprocessor performance]
[Figure: Cost of downtime]
[Figure: System characteristics of the three computing classes]

Technology Trends
• Clock Rate: ~30% per year
• Transistor Density: ~35%
• Chip Area: ~15%
• Transistors per chip: ~55%
• Total Performance Capability: ~100%
• by the time you graduate...
  – 3x clock rate (3-4 GHz)
  – 10x transistor count (1 billion transistors)
  – 30x raw capability
• plus 16x DRAM density, 32x disk density

[Figure: The most important functional requirements an architect faces]

1.4 Cost, Price, and Their Trends
[Figure: Prices of six generations of DRAMs]
[Figure: The price of an Intel Pentium III over time]

What is “Computer Architecture”?
Application
Operating System
Compiler
Firmware
Instr. Set Proc.
I/O system
Instruction Set Architecture
Datapath & Control
Digital Design
Circuit Design
Layout
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation

Computer Architecture Topics
• Networks
[Diagram: processors (P) and memories (M) joined by a switch (S) and an Interconnection Network; labeled Processor-Memory-Switch]
Multiprocessors, Networks and Interconnections
• Shared Memory, Message Passing, Data Parallelism
• Network Interfaces
• Topologies, Routing, Bandwidth, Latency, Reliability

[Figure: Photograph of an Intel Pentium 4]
[Figure: This 8-inch wafer contains 564 MIPS64 R20K processors]

\[ \text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}} \]

\[ \text{Die yield} = \text{Wafer yield} \times \left( 1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha} \right)^{-\alpha} \]

[Figure: Estimated distribution of PC costs]
[Figure: RAM cost drop]
[Figure: The components of price for a $1000 PC]

1.5 Measuring and Reporting Performance: Execution Time
X is n times faster than Y when

\[ n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} \]

[Table: The programs in the SPEC CPU 2000 benchmark suites]
The Embedded Benchmark
EEMBC: The EDN Embedded Microprocessor Benchmarks Consortium
[Table: The machine, software, and baseline tuning parameters for the CINT2000]
Comparing and Summarizing Performance
[Table: Weighted arithmetic mean execution for three machines]
[Table: Execution times from Figure 1.15 normalized to each machine]

1.6 Quantitative Principles of Computer Design
• Amdahl’s Law

\[ \text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}} \]

\[ \text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}} \]

Amdahl’s Law
• The more of the task the enhancement applies to, the greater the overall improvement

\[ \text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right) \]

\[ \text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} \]

Amdahl’s Law (page 41)
Performance Comparison: Speedup
Amdahl’s Law

The CPU Performance
Equation (page 42)

\[ \text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} \]

\[ \text{CPU time} = \text{Instruction count} \times \text{Cycles per instruction} \times \text{Clock cycle time} \]

\[ \text{CPU time} = IC \times CPI \times \text{Clock cycle time} \]

\[ \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time} \]

• Clock cycle time: hardware technology and organization
• CPI: organization and instruction set architecture
• Instruction count: instruction set architecture and compiler technology

Overall CPI

\[ \text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time} \]

\[ CPI_{overall} = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i \]

Overall CPI Comparison (page 44)
CPI Com.

Speedup
• Pipeline (operation manual, regular design, …)
• Principle of locality: temporal and spatial
• Parallelism: multiple units, processors and cluster servers, distributed computing, …
• Clock rate (circuits, devices, …)
• Optics, …

1.7 Performance and Price-Performance
[Table: Seven different desktop systems]
[Figure: Performance and price-performance]
Cluster Systems
[Figure: The performance and the price-performance of cluster systems]
[Figure: Price-performance of cluster systems]
[Table: Five different embedded processors]
[Figure: Relative performance of five different embedded processors for three of the five EEMBC benchmark suites]
EEMBC: The EDN Embedded Microprocessor Benchmarks Consortium
[Figure: Relative price-performance of five different embedded processors for three of the five EEMBC benchmark suites]

1.8 Power Consumption and Efficiency as the Metric

1.9 Fallacies and Pitfalls
• Fallacies: mistaken beliefs (F)
• Pitfalls: easily made mistakes (P)
– The relative performance of two processors with the same instruction set architecture (ISA) can be judged by clock rate or by the performance of a single benchmark suite. (F) (Fig. 1.28)
– Benchmarks remain valid indefinitely. (F) (Fig. 1.29)
– Comparing hand-coded assembly and compiler-generated high-level language performance. (P)
– Peak performance tracks observed performance.
(F)
• The best design for a computer is the one that optimizes the primary objective without considering implementation. (F)
• Neglecting the cost of software in either evaluating a system or examining cost-performance. (P)
• Falling prey to Amdahl’s Law. (P)
• Synthetic benchmarks predict performance for real programs. (F)
• MIPS is an accurate measure for comparing performance among computers. (F)

\[ \text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{CPI \times 10^6} \]

\[ \text{Execution time} = \frac{\text{Instruction count}}{\text{MIPS} \times 10^6} \]

• The problem with using MIPS as a measure for comparison
  – MIPS depends on the instruction set, making it difficult to compare the MIPS of computers with different instruction sets.
  – MIPS varies between programs on the same computer.
  – Most importantly, MIPS can vary inversely to performance.

[Figure: P4 and P3 performance comparison: relative performance]
[Table: The tuning parameters for the SPEC CFP2000 report]
[Figure: The evolution of the SPEC benchmarks over time]
[Figure: The performance of three embedded processors]
[Figure: Measurements of peak performance and actual performance]

1.10 Concluding Remarks
• Make the common case fast
• Chap. 2: The interaction between compiler and instruction set design
• Part 3: Pipeline (Appendix A)
• Part 4: Memory Design (Chap. 5)
• Part 5: Storage System (Chap. 7)
• (pages 1-86), (pages 87-168), (pages A-1 to A-87), …

1.11 Historical Perspective and References
• The First General-Purpose Electronic Computers
• Important special-purpose machines
• Commercial Developments
• Development of Quantitative Performance Measures: Successes and Failures

\[ \text{MIPS}_M = \frac{\text{Performance}_M}{\text{Performance}_{reference}} \times \text{MIPS}_{reference} \]
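
Amdahl's Law and the CPU performance equation from Section 1.6, together with the MIPS definition from Section 1.9, can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names and the sample instruction counts, CPI values, and clock rate are illustrative assumptions chosen only to show how the formulas behave.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup: 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: IC x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS rating: instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


if __name__ == "__main__":
    # An enhancement that is 10x faster but usable only 40% of the time.
    print(f"Overall speedup: {amdahl_speedup(0.4, 10.0):.4f}")

    # Two hypothetical compilations of one program on the same 1 GHz machine.
    clock_rate = 1e9
    # Version A: many simple instructions (assumed IC = 2e9, CPI = 1.0).
    time_a = cpu_time(2e9, 1.0, clock_rate)
    # Version B: fewer, more complex instructions (assumed IC = 1e9, CPI = 1.5).
    time_b = cpu_time(1e9, 1.5, clock_rate)

    print(f"Version A: {time_a:.2f} s, {mips(2e9, time_a):.1f} MIPS")
    print(f"Version B: {time_b:.2f} s, {mips(1e9, time_b):.1f} MIPS")
```

With these assumed numbers, version B finishes sooner yet reports the lower MIPS rating, which illustrates the point that MIPS can vary inversely to performance.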