Lecture 2: Fundamentals
of Computer Design
Kai Bu
http://list.zju.edu.cn/kaibu/comparch
Chapter 1
• Transition from single processor to multiple processors;
• Quantitative approach: empirical observations (of programs, experiments, simulations) as its tools;
Outline
• Classes of computers
• Parallelism
• Instruction Set Architecture
• Trends
• Dependability
• Performance Measurement
5 Classes of Computers
PMD: Personal Mobile Device
• Wireless devices with multimedia user interfaces
• cell phones, tablet computers, etc.
• cost: a few hundred dollars
PMD Characteristics
• Cost effectiveness
less expensive packaging;
no fan for cooling
• Responsiveness & Predictability
real-time performance: a maximum execution time for each app segment;
soft real-time: an average time constraint; occasionally missing the time constraint on an event is tolerated.
• Memory efficiency
optimize code size
• Energy efficiency
battery power, heat dissipation
Desktop Computing
• Largest market share
• low-end netbooks: $x00
•…
• high-end workstations: $x000
Desktop Characteristics
• Price-Performance
combination of performance and price;
compute performance
graphics performance
• Price-performance matters most to customers, and hence to computer designers
Servers
• Provide large-scale and reliable file and
computing services (to desktops)
• Constitute the backbone of large-scale
enterprise computing
Servers Characteristics
• Availability
protection against server failure
• Scalability
scaling up computing capacity, memory, storage, and I/O bandwidth in response to increasing demand
• Efficient throughput
more requests handled per unit time
Why Server Availability
Clusters/WSCs
Warehouse-Scale Computers
collections of desktop computers or servers
connected by local area networks
to act as a single larger computer
Characteristics
price-performance, power, availability
Embedded Computers
hidden everywhere
Embedded vs Non-embedded
• Dividing line
the ability to run third-party software
• Embedded computers’ primary goal
meet the performance need at a
minimum price;
rather than achieve higher
performance at a higher price
Application Parallelism
• DLP: Data-Level Parallelism
many data items being operated on at
the same time
• TLP: Task-Level Parallelism
tasks of work created that can operate independently and largely in parallel
Hardware Parallelism
• Computer hardware exploits two kinds
of application parallelism in four major
ways:
Instruction-Level Parallelism
Vector Architectures and GPUs
Thread-Level Parallelism
Request-Level Parallelism
Hardware Parallelism
• Instruction-Level Parallelism
exploits data-level parallelism
at modest levels – pipelining;
at medium levels – speculative execution;
Hardware Parallelism
• Vector Architectures & GPUs (Graphics Processing Units)
exploit data-level parallelism
apply a single instruction to a collection
of data in parallel
Hardware Parallelism
• Thread-Level Parallelism
exploits either DLP or TLP
in a tightly coupled hardware model
that allows for interaction among
parallel threads
Hardware Parallelism
• Request-Level Parallelism
exploits parallelism among largely
decoupled tasks specified by the
programmer or the OS
Classes of Parallel Architectures
by Michael Flynn
according to the parallelism
in the instruction and data
streams called for by the
instructions at the most
constrained component of
the multiprocessor:
SISD, SIMD, MISD, MIMD
SISD
• Single instruction stream, single data
stream – uniprocessor
• Can exploit instruction-level parallelism
SIMD
• Single instruction stream, multiple data
stream
• The same instruction is executed by
multiple processors using different
data streams.
• Exploits data-level parallelism
• Each processor has its own data memory, but there is a single instruction memory and control processor.
MISD
• Multiple instruction streams, single
data stream
• No commercial multiprocessor of this
type yet
MIMD
• Multiple instruction streams, multiple
data streams
• Each processor fetches its own
instructions and operates on its own
data.
• Exploits task-level parallelism
Instruction Set Architecture
ISA
• actual programmer-visible instruction
set
• the boundary between software and
hardware
• 7 major dimensions
ISA: Class
• Most are general-purpose register
architectures with operands of either
registers or memory locations
• Two popular versions
register-memory ISA: e.g., 80x86
many instructions can access
memory
load-store ISA: e.g., ARM, MIPS
only load or store instructions can
access memory
ISA: Memory Addressing
• Byte addressing
• Aligned address
object width: s bytes
address: A
aligned if A mod s = 0
A misaligned object can straddle a memory-word boundary and then requires two memory accesses
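The alignment rule (aligned if A mod s = 0) can be checked directly. This is a minimal sketch; the helper `words_touched` and its assumed 8-byte memory word are illustrative, not from the lecture, and show why an object that crosses a word boundary costs two accesses.

```python
def is_aligned(address: int, size: int) -> bool:
    """An s-byte object at byte address A is aligned if A mod s == 0."""
    return address % size == 0

def words_touched(address: int, size: int, word_bytes: int = 8) -> int:
    """Number of aligned memory words the object spans.
    word_bytes=8 (a 64-bit memory word) is an illustrative assumption."""
    first_word = address // word_bytes
    last_word = (address + size - 1) // word_bytes
    return last_word - first_word + 1

print(is_aligned(0x1000, 8))     # True
print(is_aligned(0x1003, 4))     # False
print(words_touched(0x1006, 4))  # 2: bytes 0x1006..0x1009 cross a word boundary
```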
ISA: Addressing Modes
• Specify the address of a memory
object
• Register, Immediate, Displacement
ISA: Types and Sizes of Operands
Type                                          Size in bits
ASCII character                               8
Unicode character; half word                  16
Integer; word                                 32
Double word; long integer                     64
IEEE 754 floating point, single precision     32
IEEE 754 floating point, double precision     64
Floating point, extended double precision     80
MIPS64 Operations
• Data transfer
• Arithmetic/logical
• Control
• Floating point
ISA: Control Flow Instructions
• Types:
conditional branches
unconditional jumps
procedure calls
returns
• Branch address: specified by adding an address field (offset) to the PC (program counter)
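As a concrete case of PC-relative branching, here is a sketch of how a classic MIPS conditional branch forms its target: the 16-bit field is a signed instruction (4-byte word) offset added to the address of the next instruction, PC + 4. The function names and the 32-bit address mask are assumptions made for illustration.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def mips_branch_target(pc: int, offset_field: int) -> int:
    """MIPS-style PC-relative target: PC + 4 + (sign-extended offset * 4).
    Assumes a 32-bit address space for simplicity."""
    return (pc + 4 + (sign_extend16(offset_field) << 2)) & 0xFFFFFFFF

print(hex(mips_branch_target(0x00400000, 3)))       # 0x400010 (3 instructions past PC + 4)
print(hex(mips_branch_target(0x00400000, 0xFFFF)))  # 0x400000 (offset -1 branches to itself)
```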
ISA: Encoding an ISA
• Fixed length: ARM, MIPS – 32 bits
• Variable length: 80x86 – 1 to 18 bytes
http://en.wikipedia.org/wiki/MIPS_architecture
MIPS instruction formats all start with a 6-bit opcode:
• R-type: three registers, a shift amount field, and a function field;
• I-type: two registers and a 16-bit immediate value;
• J-type: a 26-bit jump target.
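The three formats can be sketched as bit-field packing into one 32-bit word. The field widths follow the list above; the specific opcode/funct values used in the examples (`add` has opcode 0 and funct 0x20, `addi` has opcode 0x08) are classic 32-bit MIPS encodings, and the helper names are hypothetical.

```python
def pack(fields: list[tuple[int, int]]) -> int:
    """Concatenate (value, width) bit fields, most significant field first."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def r_type(opcode: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    # 6 + 5 + 5 + 5 + 5 + 6 = 32 bits
    return pack([(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)])

def i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    # 6 + 5 + 5 + 16 = 32 bits; the immediate is stored in two's complement
    return pack([(opcode, 6), (rs, 5), (rt, 5), (imm & 0xFFFF, 16)])

def j_type(opcode: int, target: int) -> int:
    # 6 + 26 = 32 bits
    return pack([(opcode, 6), (target, 26)])

# add: rd=8, rs=9, rt=10
print(hex(r_type(0, 9, 10, 8, 0, 0x20)))  # 0x12a4020
# addi: rt=8, rs=9, imm=-1
print(hex(i_type(0x08, 9, 8, -1)))         # 0x2128ffff
```

Packing through one checked helper keeps each format to a single line of field widths, so a field that is too wide raises an error instead of silently corrupting neighboring fields.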
Computer Architecture
• ISA: actual programmer-visible instruction set; boundary between sw and hw;
• Organization: high-level aspects of computer design: memory system, memory interconnect, design of internal processor or CPU;
• Hardware: computer specifics: logic design, packaging tech;
Five Critical
Implementation Technologies
• Integrated circuit logic technology
• Semiconductor DRAM
• Semiconductor flash
• Magnetic disk technology
• Network technology
Integrated circuit logic
technology
• Moore’s Law: transistor count on a chip grows about 40% to 55% per year, i.e., doubles every 18 to 24 months
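The annual growth rate and the doubling time are two views of the same exponential. A quick sketch (the function name is illustrative) converts one into the other with doubling time = ln 2 / ln(1 + rate):

```python
import math

def doubling_time_months(annual_growth: float) -> float:
    """Months for a quantity growing by annual_growth per year (0.40 = 40%) to double."""
    return 12 * math.log(2) / math.log(1 + annual_growth)

for rate in (0.40, 0.55):
    print(f"{rate:.0%}/year -> doubles every {doubling_time_months(rate):.1f} months")
# 40%/year -> doubles every 24.7 months
# 55%/year -> doubles every 19.0 months
```

The results land roughly in the quoted 18 to 24 month range.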
Semiconductor DRAM
• Capacity per DRAM chip doubles
roughly every 2 or 3 years
Semiconductor Flash
• Electronically erasable programmable
read-only memory
• Capacity per Flash chip doubles roughly
every two years
• In 2011, 15 to 20 times cheaper per bit
than DRAM
Magnetic Disk Technology
• Since 2004, density doubles every
three years
• 15 to 20 times cheaper per bit than
Flash
• 300 to 500 times cheaper per bit than
DRAM
• Used for server and warehouse-scale storage
Network Technology
• Switches
• Transmission systems
Performance Trends
• Bandwidth/Throughput
the total amount of work done in a
given time;
• Latency/Response Time
the time between the start and the
completion of an event;
Bandwidth over Latency
Trends in Power and Energy
• Power = Energy per unit time
1 watt = 1 joule per second
energy to execute a workload =
avg power x execution time
• Three primary concerns
the max power for a processor
sustained power consumption
energy and energy efficiency
Trends in Power and Energy
• Sustained power consumption
• Metric: TDP
Thermal Design Power
determines cooling requirement
• Heat management
1. reduce the clock rate, and hence power, as the chip temperature approaches the junction temperature limit;
2. if that is not enough, power down the chip.
Trends in Power and Energy
• Energy and Energy Efficiency
• energy to execute a workload =
avg power x execution time
• Example
processor A has 20% higher average power consumption than processor B,
but A executes the task in 70% of the time B needs;
which is more energy efficient, A or B?
• Energy_A = 1.2 × 0.7 × Energy_B
= 0.84 × Energy_B
so A is more energy efficient despite its higher average power.
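The same comparison as a one-line calculation, using energy = average power × execution time (the function name is illustrative):

```python
def relative_energy(power_ratio: float, time_ratio: float) -> float:
    """Energy_A / Energy_B, given Power_A / Power_B and Time_A / Time_B."""
    return power_ratio * time_ratio

ratio = relative_energy(power_ratio=1.20, time_ratio=0.70)
print(f"Energy_A = {ratio:.2f} x Energy_B")  # Energy_A = 0.84 x Energy_B
```

Any ratio below 1 means A uses less energy for the task, even though it draws more power while running.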
Trends in Power and Energy
• Most of the energy consumed within a microprocessor goes to switching transistors: dynamic energy
logic transition: 0->1->0 or 1->0->1
• The energy of a single transition
Trends in Power and Energy
• The power required per transistor
• For a fixed task, slowing clock rate
(frequency) reduces power, but not
energy.
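The standard CMOS model for these quantities (assumed here) expresses dynamic energy and power in terms of capacitive load C, supply voltage V, and switching frequency f:

```latex
\begin{aligned}
E_{\text{pulse}} &\propto C_{\text{load}} \, V^2 && (0 \to 1 \to 0 \text{ or } 1 \to 0 \to 1) \\
E_{\text{transition}} &\propto \tfrac{1}{2} \, C_{\text{load}} \, V^2 && (0 \to 1 \text{ or } 1 \to 0) \\
P_{\text{dynamic}} &\propto \tfrac{1}{2} \, C_{\text{load}} \, V^2 \, f_{\text{switched}}
\end{aligned}
```

Frequency appears only in the power term, which is why slowing the clock for a fixed task lowers power but not energy.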
Trends in Power and Energy
• Example
some microprocessors have adjustable voltage;
a 15% reduction in voltage may result in a 15% reduction in frequency;
what is the impact on dynamic energy and dynamic power?
Trends in Power and Energy
• Answer
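A sketch of the calculation under the standard CMOS model (assumed): dynamic energy scales with V², dynamic power with V² × f, and the capacitive load is unchanged. The function name is illustrative.

```python
def scale_dynamic(voltage_ratio: float, frequency_ratio: float) -> tuple[float, float]:
    """Return (new/old dynamic energy, new/old dynamic power), assuming
    energy ∝ C·V² and power ∝ C·V²·f with the same capacitive load C."""
    energy_ratio = voltage_ratio ** 2
    power_ratio = energy_ratio * frequency_ratio
    return energy_ratio, power_ratio

energy, power = scale_dynamic(0.85, 0.85)
print(f"dynamic energy: {energy:.4f} of original")  # dynamic energy: 0.7225 of original
print(f"dynamic power:  {power:.4f} of original")   # dynamic power:  0.6141 of original
```

So energy drops to about 72% and power to about 61% of the original.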
Trends in Power and Energy
• Challenges
distributing the power
removing the heat
preventing hot spots
potential research topics
Trends in Power and Energy
• Energy-efficiency improvement
techniques
1. do nothing well
turn off the clock of inactive modules
2. DVFS: dynamic voltage-frequency
scaling
scale down clock frequency and voltage
during periods of low activity
DVFS
Trends in Power and Energy
• Energy-efficiency improvement
techniques
3. design for typical case
PMDs, laptops – often idle
memory and storage with low power
modes to save energy
4. overclocking
the chip runs at a higher clock rate for
a short time until temperature rises
Trends in Cost
• Cost of an Integrated Circuit
wafers are tested, then chopped into dies for packaging
Trends in Cost
• Cost of an Integrated Circuit
yield: the percentage of manufactured devices that survives the testing procedure
Trends in Cost
• Cost of an Integrated Circuit
Trends in Cost
• Cost of an Integrated Circuit
Intel Core i7 Die
Trends in Cost
• Example
Trends in Cost
• Example
Trends in Cost
• Cost of an Integrated Circuit
• N: process-complexity factor for
measuring manufacturing difficulty
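A sketch assuming the standard textbook IC cost model:
Cost of IC = (cost of die + cost of testing die + cost of packaging and final test) / final test yield;
cost of die = cost of wafer / (dies per wafer × die yield);
dies per wafer = π × (wafer diameter / 2)² / die area − π × wafer diameter / √(2 × die area);
die yield = wafer yield × 1 / (1 + defects per unit area × die area)^N.
All input numbers below are hypothetical, chosen only to exercise the functions.

```python
import math

def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area / die area, minus an estimate of partial dies lost at the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss

def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, n: float) -> float:
    """Fraction of good dies; n is the process-complexity factor N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

def die_cost(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
             wafer_yield: float, defects_per_cm2: float, n: float) -> float:
    """Wafer cost spread over the good dies on that wafer."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(wafer_yield, defects_per_cm2, die_area_cm2, n))
    return wafer_cost / good_dies

def ic_cost(die: float, testing: float, packaging_and_final_test: float,
            final_test_yield: float) -> float:
    """Total IC cost, divided by the fraction that passes final test."""
    return (die + testing + packaging_and_final_test) / final_test_yield

# Hypothetical inputs: 30 cm wafer, 1.5 cm x 1.5 cm die.
print(round(dies_per_wafer(30, 2.25)))  # 270
d = die_cost(5000, 30, 2.25, 1.0, 0.03, 13.5)
print(f"die cost ~ ${d:.2f}")
print(f"IC cost  ~ ${ic_cost(d, 5.0, 10.0, 0.95):.2f}")
```

Because die yield falls off steeply with die area (raised to the power N), die cost grows much faster than linearly with die size.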
Dependability
• SLA: service level agreements
• System states: up or down
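• Service states (with respect to the SLA)
service accomplishment: service is delivered as specified
service interruption: delivered service differs from the SLA
• Transitions between them
failure: from accomplishment to interruption
restoration: from interruption back to accomplishment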
Dependability
• Two measures of dependability
Module reliability
Module availability
Dependability
• Two measures of dependability
Module reliability
continuous service accomplishment
from a reference initial instant
MTTF: mean time to failure
MTTR: mean time to repair
MTBF: mean time between failures
MTBF = MTTF + MTTR
Dependability
• Two measures of dependability
Module reliability
FIT: failures in time
failures per billion hours
MTTF of 1,000,000 hours
= 10^9 / 10^6
= 1000 FIT
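A small sketch of these reliability conversions. `system_mttf` assumes independent modules with constant (exponential) failure rates, where any module failure fails the system, so module failure rates simply add; the function names are illustrative.

```python
def mttf_to_fit(mttf_hours: float) -> float:
    """FIT = failures per billion (10**9) hours = 10**9 / MTTF."""
    return 1e9 / mttf_hours

def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours

def system_mttf(module_mttfs: list[float]) -> float:
    """Assumes independent modules with constant failure rates; the
    system failure rate (in FIT) is the sum of the module rates."""
    total_fit = sum(mttf_to_fit(m) for m in module_mttfs)
    return 1e9 / total_fit

print(mttf_to_fit(1_000_000))         # 1000.0
print(system_mttf([1_000_000] * 10))  # 100000.0
```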
Dependability
• Two measures of dependability
Module availability
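Module availability is usually expressed as the fraction of time in service accomplishment: MTTF / (MTTF + MTTR) = MTTF / MTBF. A minimal sketch; the 24-hour repair time is a hypothetical value.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Module availability = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf_hours / (mttf_hours + mttr_hours)

a = availability(1_000_000, 24)  # hypothetical 24-hour MTTR
downtime_hours_per_year = (1 - a) * 365 * 24
print(f"availability = {a:.6f}, downtime ~ {downtime_hours_per_year:.2f} h/year")
```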
Dependability
• Example
Dependability
• Answer
Measuring Performance
• Execution time
the time between the start and the
completion of an event
• Throughput
the total amount of work done in a
given time
Measuring Performance
• Computer X and Computer Y
• X is n times faster than Y
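In the standard definition, "X is n times faster than Y" means n = Execution time_Y / Execution time_X = Performance_X / Performance_Y, where performance is the reciprocal of execution time. A one-function sketch:

```python
def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """n in 'X is n times faster than Y': ExecTime_Y / ExecTime_X."""
    return exec_time_y / exec_time_x

print(times_faster(10.0, 15.0))  # 1.5: X takes 10 s, Y takes 15 s
```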
Quantitative Principles
• Parallelism
• Locality
temporal locality: recently accessed
items are likely to be accessed in the
near future;
spatial locality: items whose
addresses are near one another tend to
be referenced close together in time
Quantitative Principles
• Amdahl’s Law
Quantitative Principles
• Amdahl’s Law: two factors
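1. Fraction_enhanced: the fraction of the original execution time that can be enhanced;
e.g., 20/60 if 20 seconds of a 60-second program can be enhanced
2. Speedup_enhanced: how much faster the enhanced portion runs;
e.g., 5/2 if a portion that originally took 5 seconds takes 2 seconds after enhancement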
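Amdahl's Law combines the two factors as Speedup_overall = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). A sketch using the example values above:

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only fraction_enhanced of the time is sped up."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

print(round(amdahl_speedup(20 / 60, 5 / 2), 4))  # 1.25
# Upper bound as the enhanced part becomes infinitely fast: 1 / (1 - 1/3)
print(f"{amdahl_speedup(20 / 60, float('inf')):.2f}")  # 1.50
```

The unenhanced two-thirds of the program caps the overall speedup at 1.5×, no matter how large Speedup_enhanced becomes.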
Quantitative Principles
• Example
Quantitative Principles
• The Processor Performance Equation
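The processor performance equation in its standard form is CPU time = Instruction count × CPI × Clock cycle time = Instruction count × CPI / Clock rate, with CPI computable from an instruction mix as Σ(IC_i × CPI_i) / Instruction count. The mix and machine parameters below are hypothetical.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC x CPI x clock cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz

def average_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """CPI = sum(IC_i x CPI_i) / total IC, given {class: (IC_i, CPI_i)}."""
    total_ic = sum(ic for ic, _ in mix.values())
    return sum(ic * cpi for ic, cpi in mix.values()) / total_ic

# Hypothetical mix: (instruction count, cycles per instruction) per class
mix = {"alu": (50, 1), "load": (20, 2), "store": (10, 2), "branch": (20, 2)}
cpi = average_cpi(mix)
print(cpi)                      # 1.5
print(cpu_time(2e9, cpi, 3e9))  # 1.0 (seconds, for 2e9 instructions at 3 GHz)
```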
Quantitative Principles
• Example
Quantitative Principles
• Example
?
Reading
• Chapter 1.8, 1.10 – 1.13