Advanced Computer Architecture
• Fundamentals of Computer Design
• Instruction Set Principles and Examples
• Pipelining: Basic and Intermediate Concepts
• Memory Hierarchy Design
• Storage System
• Instruction-Level Parallelism: Concepts and Challenges
• Exploiting Instruction-Level Parallelism with Software Approaches
• Multiprocessors and Thread-Level Parallelism
Forces on Computer Architecture
[Figure: Computer Architecture at the center, surrounded by the forces acting on it: Technology, Programming Languages, Applications, Operating Systems, History]
(A = F / M)
Fundamentals of Computer Design
• Introduction
• The Task of the Computer Designer
• Technology Trends
• Cost, Price, and Their Trends
• Performance
• Quantitative Principles of Computer Design
• Putting It All Together: Performance and Price-Performance
• Power Consumption and Efficiency
• Fallacies and Pitfalls
Microprocessor Performance
Cost of Downtime
System Characteristics of the Three Computing Classes
Technology Trends
• Clock Rate: ~30% per year
• Transistor Density: ~35%
• Chip Area: ~15%
• Transistors per chip: ~55%
• Total Performance Capability: ~100%
• by the time you graduate...
– 3x clock rate (3-4 GHz)
– 10x transistor count (1 Billion transistors)
– 30x raw capability
• plus 16x DRAM density, 32x disk density
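The multipliers above follow from compounding the annual rates. A minimal sketch, assuming a five-year horizon (the slide does not state the number of years, so `YEARS` is an assumption, as is the helper name `compound_growth`); the slide's 3x, 10x, and 30x figures are rounded, so the output will only roughly match them:

```python
def compound_growth(annual_rate: float, years: float) -> float:
    """Return the growth multiplier after `years` of growth at `annual_rate` per year."""
    return (1.0 + annual_rate) ** years


# Annual growth rates as listed on the slide.
RATES = {
    "clock rate": 0.30,
    "transistors per chip": 0.55,
    "total performance capability": 1.00,
}

YEARS = 5  # assumption: the slide does not say how long "until you graduate" is

for name, rate in RATES.items():
    print(f"{name}: about {compound_growth(rate, YEARS):.1f}x after {YEARS} years")
```

Changing `YEARS` shows how sensitive the projections are to the horizon, which is why the slide figures should be read as order-of-magnitude estimates.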
The Most Important Functional Requirements an Architect Faces
1.4 Cost, Price, and Their Trends
Prices of Six Generations of DRAMs
The Price of an Intel Pentium III over Time
What is “Computer Architecture”?
[Figure: levels of abstraction, from top to bottom: Application; Operating System; Compiler and Firmware; Instruction Set Architecture (Instr. Set Proc., I/O system); Datapath & Control; Digital Design; Circuit Design; Layout]
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
Computer Architecture Topics
[Figure: processor (P) and memory (M) nodes attached through a switch (S) to an Interconnection Network; Processor-Memory-Switch]
Multiprocessors
Networks and Interconnections
• Shared Memory, Message Passing, Data Parallelism
• Network Interfaces
• Topologies, Routing, Bandwidth, Latency, Reliability
Photograph of an Intel Pentium 4
This 8-inch Wafer Contains 564 MIPS64 R20K Processors
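Dies per wafer
$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$
Die yield
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$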
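A short numerical sketch of the two formulas may help. Dies per wafer is the wafer area divided by the die area, minus a correction for partial dies along the edge; die yield scales the wafer yield by a term that depends on defect density, die area, and a process-complexity parameter α. All input values below (30 cm wafer, 0.49 cm² die, 0.6 defects per cm², α = 4, wafer yield of 100%) are illustrative assumptions, not figures from these slides.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area over die area, minus the dies lost along the circular edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, alpha: float) -> float:
    """Fraction of good dies; alpha models the complexity of the process."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# Illustrative (assumed) inputs.
n_dies = dies_per_wafer(wafer_diameter_cm=30.0, die_area_cm2=0.49)
good_fraction = die_yield(wafer_yield=1.0, defects_per_cm2=0.6,
                          die_area_cm2=0.49, alpha=4.0)

print(f"dies per wafer: {int(n_dies)}")
print(f"die yield:      {good_fraction:.2f}")
print(f"good dies:      {int(n_dies * good_fraction)}")
```

Because die area appears in both formulas, a larger die is penalized twice: fewer dies fit on the wafer, and a smaller fraction of them work.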





Estimated distribution of PC Costs
RAM Cost Drop
The components of price for a $1000 PC
1.5 Measuring and Reporting Performance:
Execution Time
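"X is n times faster than Y" means:
$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$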
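Since performance is the reciprocal of execution time, the ratio n can be computed from either quantity. A minimal sketch with hypothetical execution times (10 s on X, 15 s on Y); the function names are illustrative only:

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(exec_time_x_s: float, exec_time_y_s: float) -> float:
    """Return n such that machine X is n times faster than machine Y."""
    return exec_time_y_s / exec_time_x_s


# Hypothetical measurements for the same program on two machines.
time_x, time_y = 10.0, 15.0

n_from_times = times_faster(time_x, time_y)
n_from_performance = performance(time_x) / performance(time_y)

print(f"n from execution times: {n_from_times}")
print(f"n from performance:     {n_from_performance:.3f}")
```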
The programs in the SPEC CPU 2000 benchmark suites
The Embedded Benchmark
EEMBC: The EDN Embedded Microprocessor Benchmarks Consortium
The machine, software, and baseline
tuning parameters for the CINT2000
Comparing and Summarizing Performance
Weighted arithmetic mean execution for
three machines
Execution times from Figure 1.15
normalized to each machine
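The two summaries named in these captions can be sketched as follows. The execution times and equal weights below are hypothetical stand-ins, not the contents of Figure 1.15, and the helper names are assumptions for illustration.

```python
def weighted_arithmetic_mean(times, weights):
    """Sum of weight_i * time_i; the weights are expected to sum to 1."""
    return sum(w * t for w, t in zip(weights, times))


def normalized(times, reference_times):
    """Execution times divided program-by-program by a reference machine's times."""
    return [t / r for t, r in zip(times, reference_times)]


# Hypothetical execution times in seconds for two programs on three machines.
times = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
    "C": [20.0, 20.0],
}
weights = [0.5, 0.5]  # assumption: both programs are equally important

for machine, t in times.items():
    mean = weighted_arithmetic_mean(t, weights)
    print(f"{machine}: weighted mean = {mean:7.1f} s, "
          f"normalized to A = {normalized(t, times['A'])}")
```

Changing the weights or the reference machine changes which machine looks best, which is the reason the choice of summary has to be reported along with the numbers.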
1.6 Quantitative Principles of Computer
Design
• Amdahl’s Law
$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$$
$$\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$
Amdahl’s Law
• The larger the enhanced fraction of the task, the larger the overall improvement
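$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left((1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right)$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$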
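Amdahl's Law can be checked numerically: the overall speedup is 1 divided by the sum of the unenhanced fraction and the enhanced fraction scaled down by its own speedup. The sketch below uses hypothetical inputs (40% of the task enhanced, sped up 10 times); the function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the execution time benefits from a 10x enhancement.
print(f"overall speedup: {amdahl_speedup(0.4, 10.0):.4f}")

# Even an enormous enhancement is capped by the unenhanced 60%.
print(f"upper bound:     {1.0 / (1.0 - 0.4):.4f}")
```

The second line of output shows the limit: with 60% of the task untouched, no enhancement can push the overall speedup past 1/0.6.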
Amdahl’s Law (Page 41)
Performance Comparison-Speedup
Amdahl’s Law
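The CPU Performance Equation (Page 42)
$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
$$\text{CPU time} = \text{Instruction count} \times \text{Cycles per instruction} \times \text{Clock cycle time}$$
$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}}$$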
CPU time
• Clock cycle time---Hardware technology
and organization
• CPI---Organization and instruction set
architecture
• Instruction count---Instruction set
architecture and compiler technology
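The three factors can be varied independently in a small sketch of CPU time = IC × CPI × clock cycle time. The baseline numbers (2 billion instructions, CPI of 1.5, 1 GHz clock) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical baseline: 2e9 instructions, CPI 1.5, 1 GHz clock (1 ns cycle).
baseline = cpu_time_s(2e9, 1.5, 1e-9)

# Each factor is tied to a different design level, as listed above.
faster_clock = cpu_time_s(2e9, 1.5, 0.5e-9)   # hardware technology and organization
lower_cpi = cpu_time_s(2e9, 1.0, 1e-9)        # organization and instruction set
fewer_instrs = cpu_time_s(1.5e9, 1.5, 1e-9)   # instruction set and compiler

print(f"baseline:           {baseline:.2f} s")
print(f"faster clock:       {faster_clock:.2f} s")
print(f"lower CPI:          {lower_cpi:.2f} s")
print(f"fewer instructions: {fewer_instrs:.2f} s")
```

In practice the factors interact (a change that lowers instruction count may raise CPI), so the sketch treats them as independent only for illustration.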
Overall CPI
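$$\text{CPU time} = \left(\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}$$
$$\text{CPI}_{\text{overall}} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$$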
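Overall CPI is the instruction-count-weighted average of the per-class CPIs: total clock cycles divided by total instructions. A minimal sketch with a hypothetical instruction mix (the classes, counts, and CPIs are assumptions for illustration):

```python
def overall_cpi(mix):
    """`mix` is a list of (instruction_count_i, cpi_i) pairs, one per instruction class.
    Returns total clock cycles divided by total instruction count."""
    total_cycles = sum(ic * cpi for ic, cpi in mix)
    total_instructions = sum(ic for ic, _ in mix)
    return total_cycles / total_instructions


def cpu_time_s(mix, clock_cycle_time_s):
    """CPU time = (sum of IC_i x CPI_i) x clock cycle time."""
    return sum(ic * cpi for ic, cpi in mix) * clock_cycle_time_s


# Hypothetical mix (counts in millions of instructions): ALU, load/store, branch.
mix = [(50e6, 1.0), (30e6, 2.0), (20e6, 3.0)]

print(f"overall CPI: {overall_cpi(mix):.2f}")
print(f"CPU time at a 1 ns cycle: {cpu_time_s(mix, 1e-9):.3f} s")
```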
Overall CPI Comparison (Page 44)
CPI Com.
Speedup
• Pipeline (Operation manual, Regular design, …)
• Principle of locality: Temporal and Spatial
• Parallelism: Multiple Units, processors and Cluster Servers, Distributed Computing, …
• Clock Rate (Circuits, Devices, …)
• Optics, …
1.7 Performance and Price-performance
Seven different desktop systems
Performance and price-performance
Cluster Systems
The performance and the price-performance of
cluster systems
Price-performance of cluster systems
Five different embedded processors
Relative performance of five different embedded
processors for three of the five EEMBC
benchmark suites
Relative price-performance of five different
embedded processors for three of the five
EEMBC benchmark suites
1.8 Power Consumption and Efficiency as the
metric
1.9 Fallacies and Pitfalls
• Fallacies: misbeliefs (F)
• Pitfalls: easily made mistakes (P)
– The relative performance of two processors with the same instruction set architecture (ISA) can be judged by clock rate or by the performance of a single benchmark suite. (F) (Fig. 1.28)
– Benchmarks remain valid indefinitely. (F) (Fig. 1.29)
– Comparing hand-coded assembly and compiler-generated high-level language performance. (P)
– Peak performance tracks observed performance. (F)
• The best design for a computer is the one that optimizes the primary objective without considering implementation. (F)
• Neglecting the cost of software in either evaluating a system or examining cost-performance. (P)
• Falling prey to Amdahl’s Law. (P)
• Synthetic benchmarks predict performance for real programs. (F)
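• MIPS is an accurate measure for comparing performance among computers. (F)
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
$$\text{Execution time} = \frac{\text{Instruction count}}{\text{MIPS} \times 10^6}$$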
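The two forms of the MIPS definition agree because execution time equals instruction count × CPI / clock rate. A small sketch confirming this with hypothetical values (100 million instructions, CPI of 2, 500 MHz clock); the function names are illustrative:

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def execution_time_s(instruction_count: float, mips: float) -> float:
    """Execution time = instruction count / (MIPS x 10^6)."""
    return instruction_count / (mips * 1e6)


# Hypothetical program and machine.
ic, cpi, clock_rate_hz = 100e6, 2.0, 500e6
measured_time_s = ic * cpi / clock_rate_hz  # follows from the CPU performance equation

print(f"MIPS from execution time: {mips_from_time(ic, measured_time_s):.1f}")
print(f"MIPS from clock rate:     {mips_from_clock(clock_rate_hz, cpi):.1f}")
print(f"time recovered from MIPS: {execution_time_s(ic, mips_from_clock(clock_rate_hz, cpi)):.3f} s")
```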
• The problem with using MIPS as a measure for comparison
– MIPS is dependent on the instruction set, making it difficult to compare the MIPS of computers with different instruction sets.
– MIPS varies between programs on the same computer.
– Most importantly, MIPS can vary inversely to performance.
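The last point is easy to demonstrate. In the hypothetical scenario below, a compiler optimization removes many cheap instructions, so the remaining instructions have a higher average CPI. Execution time drops, yet the MIPS rating also drops. The instruction counts, CPIs, and clock rate are invented for illustration.

```python
def execution_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CLOCK_RATE_HZ = 500e6  # hypothetical machine

# Hypothetical program before and after optimization.
versions = {
    "unoptimized": {"instruction_count": 100e6, "cpi": 1.0},
    "optimized": {"instruction_count": 60e6, "cpi": 1.5},
}

for label, v in versions.items():
    t = execution_time_s(v["instruction_count"], v["cpi"], CLOCK_RATE_HZ)
    rating = mips(CLOCK_RATE_HZ, v["cpi"])
    print(f"{label:12s} time = {t:.2f} s, MIPS = {rating:.1f}")
```

The optimized version finishes sooner but reports a lower MIPS rating, so ranking by MIPS would pick the slower program.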
P4 and P3 performance comparison-Relative
performance
The tuning parameters for the SPEC CFP2000
report
The evolution of the
SPEC benchmarks
over time
The performance of three embedded
processors
Measurements of peak performance and
actual performance
1.10 Concluding Remarks
• Make the common case fast
• Chap. 2: The interaction between compiler and instruction set design.
• Part 3: Pipeline (Appendix A)
• Part 4: Memory Design (Chap. 5)
• Part 5: Storage System (Chap. 7)
• (Pages 1-86), (pages 87-168), (pages A-1~A87), …
1.11 Historical Perspective and
References
• The First General-purpose Electronic
Computers
• Important special-purpose machines
• Commercial Developments
• Development of Quantitative Performance Measures: Successes and Failures
$$\text{MIPS}_M = \frac{\text{Performance}_M}{\text{Performance}_{\text{reference}}} \times \text{MIPS}_{\text{reference}}$$