ECE 232
Hardware Organization and Design
Lecture 4
Performance, Design with VHDL
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
ECE 232 L4 perform.1
Adapted from Patterson 97 ©UCB
Copyright 1998 Morgan Kaufmann Publishers
Outline
° Performance, evaluation
• Metrics: MIPS, CPI, execution time
• Amdahl’s law
° VHDL basics
• Combinational logic
• Examples
° Instruction formats, cont’d
• Addressing classes, modes
• Examples
• MIPS assembly
Two notions of “performance”
Plane        NY to Paris   Speed      Passengers   Throughput (pmph = passengers x mph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200
Which has higher performance?
° Time to do the task (Execution Time)
– execution time, response time, latency
° Tasks per day, hour, week, sec, ns, ... (Performance)
– throughput, bandwidth
Response time and throughput often are in opposition
Performance - Example
Performance: in units of things/time_unit
- bigger is better
• Time of Concorde vs. Boeing 747?
• Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)
• Throughput of Concorde vs. Boeing 747?
• Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
• Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time
We will focus primarily on execution time for a single job
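The arithmetic on this slide can be reproduced in a few lines. The following is a minimal sketch using the figures from the table; the dictionary layout and the names `planes` and `throughput` are illustrative choices, not part of the lecture.

```python
# Figures copied from the slide's table; names are illustrative only.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput(plane):
    """Passengers times speed, the slide's 'pmph' figure."""
    return plane["passengers"] * plane["mph"]


boeing = planes["Boeing 747"]
concorde = planes["Concorde"]

# Latency view: how much sooner does one passenger arrive?
speed_ratio = concorde["mph"] / boeing["mph"]
time_ratio = boeing["hours"] / concorde["hours"]

# Throughput view: how much more passenger traffic is moved per hour?
throughput_ratio = throughput(boeing) / throughput(concorde)

print(f"Concorde vs. 747, speed:      {speed_ratio:.2f}x")
print(f"Concorde vs. 747, trip time:  {time_ratio:.2f}x")
print(f"747 vs. Concorde, throughput: {throughput_ratio:.2f}x")
```

The two ratios rank the planes in opposite order, which is the slide's point: the answer to "which is faster" depends on whether latency or throughput is being measured.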
Metrics of performance
Level of abstraction        Typical metric
Application                 Answers per month; Useful Operations per second
Programming Language
Compiler
ISA                         (millions) of Instructions per second – MIPS
                            (millions) of (F.P.) operations per second – MFLOP/s
Datapath, Control           Megabytes per second
Function Units
Transistors, Wires, Pins    Cycles per second (clock rate)
Each metric has a place and a purpose, and each can be misused
Review: Aspects of CPU Performance
CPU time = Seconds / Program
         = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

What influences each factor:

                Instr count   CPI   Clock rate
Program              X
Compiler             X         X
Instr. Set           X         X
Organization                   X        X
Technology                              X
MIPS, CPI
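• MIPS = millions of instructions per second
MIPS = Instruction Count / (Execution Time x 10^6)
• CPI = average # cycles per instruction
CPI = Clock Cycles / Instruction Count
= (CPU Time x Clock Rate) / Instruction Count

$$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \mathit{CPI}_i \times \mathit{Instr}_i$$

where CPI_i is the cycles per instruction for instruction class i and Instr_i is the number of class i instructions executed.

$$\mathit{CPI} = \sum_{i=1}^{n} \mathit{CPI}_i \times F_i, \qquad F_i = \frac{\mathit{Instr}_i}{\text{Instruction Count}}$$

where F_i is the "instruction frequency" of class i.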
° Invest Resources where time is Spent!
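As a rough illustration of how these definitions fit together, the sketch below computes average CPI, CPU time, and MIPS for a made-up workload. The two-class instruction mix, the instruction count, and the 100 MHz clock are hypothetical values chosen only to keep the arithmetic easy to follow.

```python
def average_cpi(mix):
    """mix: list of (F_i, CPI_i) pairs, where the F_i sum to 1."""
    return sum(freq * cycles for freq, cycles in mix)


def cpu_time(instr_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    return instr_count * cpi / clock_rate_hz


def mips(instr_count, exec_time_s):
    """Millions of instructions per second."""
    return instr_count / (exec_time_s * 1e6)


# Hypothetical workload: 75% one-cycle and 25% three-cycle instructions.
mix = [(0.75, 1), (0.25, 3)]
instr_count = 10_000_000
clock_rate_hz = 100e6

cpi = average_cpi(mix)
seconds = cpu_time(instr_count, cpi, clock_rate_hz)
rate = mips(instr_count, seconds)

print(f"CPI = {cpi}, CPU time = {seconds} s, MIPS = {rate:.1f}")
```

Note that MIPS reduces to clock rate / (CPI x 10^6), so it says nothing about how much work each instruction does; that is one way the metric can be misused.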
Evaluating Instruction Sets
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static Metrics:
° How many bytes does the program occupy in memory?
Dynamic Metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best Metric: Time to execute the program!
NOTE: this depends on the instruction set, processor organization, and
compilation techniques.
Example (RISC processor)
Base Machine (Reg / Reg), Typical Mix

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2
• How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
• How does this compare with using branch prediction to shave a
cycle off the branch time?
• What if two ALU instructions could be executed at once?
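The three questions can be worked through numerically. This is a sketch under stated assumptions: instruction count and clock rate stay fixed, so speedup is simply the ratio of CPIs, and "two ALU instructions at once" is modeled optimistically as every ALU instruction costing half a cycle. The helper names are illustrative.

```python
# (frequency, cycles) per instruction class, from the slide's table.
BASE = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def cpi(mix):
    """Average cycles per instruction for a frequency-weighted mix."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Copy of mix with the cycle count of one instruction class changed."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


base_cpi = cpi(BASE)

# Speedup = old CPI / new CPI when instruction count and clock are unchanged.
speedup_cache = base_cpi / cpi(with_cycles(BASE, "Load", 2))
speedup_branch = base_cpi / cpi(with_cycles(BASE, "Branch", 1))
# Assumption: perfect pairing, so each ALU instruction costs 0.5 cycle.
speedup_dual_alu = base_cpi / cpi(with_cycles(BASE, "ALU", 0.5))

print(f"base CPI:             {base_cpi:.2f}")
print(f"better data cache:    {speedup_cache:.3f}x")
print(f"branch prediction:    {speedup_branch:.3f}x")
print(f"dual ALU issue:       {speedup_dual_alu:.3f}x")
```

Under these assumptions the cache improvement helps most, because loads account for the largest share of execution time (45%), while ALU operations are half of all instructions but only 23% of the time.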
Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{Exec\_time w/o } E}{\text{Exec\_time with } E} = \frac{\text{Performance with } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task
by a factor S, and the remainder of the task is unaffected. Then:

$$\text{Exec\_time(with } E) = \left( (1-F) + \frac{F}{S} \right) \times \text{Exec\_time(w/o } E)$$

$$\text{Speedup(with } E) = \frac{1}{(1-F) + F/S}$$
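A small sketch of the speedup formula follows. The function name and argument checks are illustrative; the sample numbers reuse the load figures from the RISC example slide (loads take 1.0 of the 2.2 average cycles per instruction, and going from 5 cycles to 2 is a factor of 2.5).

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of execution time is sped up by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Loads: F = 1.0 / 2.2 of execution time, S = 5 / 2.
load_fraction = 1.0 / 2.2
load_speedup = amdahl_speedup(load_fraction, 5 / 2)

# Upper bound: even an infinitely fast load only removes the load share.
load_limit = amdahl_speedup(load_fraction, float("inf"))

print(f"loads 2.5x faster:        {load_speedup:.3f}x overall")
print(f"loads infinitely faster:  {load_limit:.3f}x overall")
```

The second figure shows the law's main consequence: the unaffected fraction (1-F) caps the achievable speedup at 1/(1-F), no matter how large S becomes.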
Review: Summary of the Design Process
• Hierarchical Design to manage complexity
• Top Down vs. Bottom Up vs. Successive Refinement
• Importance of Design Representations:
    • Block Diagrams
    • Decomposition into Bit Slices
    • Truth Tables, K-Maps
    • Circuit Diagrams
    • Other Descriptions:
      - state diagrams
      - timing diagrams
      - register transfer, . . .
• Optimization Criteria:
    Area, Gate Count, Logic Levels, Delay, Power, Fan-in/Fan-out,
    [Package Count], Cost, Design time, Pin Out
Hardware Representation Languages
Block Diagrams: FUs, Registers, & Dataflows
Register Transfer Diagrams: Choice of busses to connect FUs, Regs
Flowcharts and State Diagrams: Two different ways to describe
sequencing & microoperations
Hardware Description Languages (Verilog HDL, VHDL): HW modules described
like programs, with i/o ports, internal state, & parallel execution of
assignment statements
Descriptions in these languages can be used as input to
    • simulation systems: "software breadboard"
    • synthesis systems: generate hw from high level description
"To Design is to Represent"
VHDL (VHSIC Hardware Description Language)
° Goals:
• Support design, documentation, and simulation of hardware
• Digital system level to gate level
• “Technology Insertion”
° Concepts:
• Design entity
• Time-based execution model.
Design Entity = Hardware Component
    Interface = External Characteristics
    Architecture (Body) = Internal Behavior or Structure
VHDL Example: nand Gate
    -- "nand" is a reserved word in VHDL, so the entity is named nand2 here
    ENTITY nand2 IS
      PORT (a, b: IN BIT;
            y: OUT BIT);
    END nand2;

    ARCHITECTURE behavioral OF nand2 IS
    BEGIN
      y <= a NAND b;
    END behavioral;

(Figure: gate block with inputs a, b and output y)
° Entity describes interface
° Architecture gives behavior (function)
° y is a signal, not a variable
• it changes whenever the inputs change
• NAND process is in an infinite loop
° BIT is 0, 1. Can also use STD_LOGIC (0, 1, Z, X, among other values)
Modeling Delays
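    -- "nand" is a reserved word in VHDL, so the entity is named nand2 here
    ENTITY nand2 IS
      PORT (a, b: IN BIT; y: OUT BIT);
    END nand2;

    ARCHITECTURE behavioral OF nand2 IS
    BEGIN
      y <= a NAND b AFTER 1 ns;  -- output follows the inputs with a 1 ns delay
    END behavioral;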
° Model temporal, as well as functional behavior, with delays in signal
statements. Time is one difference from programming languages
° Output y changes 1 ns after a or b changes
° Delay statements not supported by synthesis tools
(non-synthesizable)
Bit-vector Operators
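    LIBRARY ieee;
    USE ieee.std_logic_1164.all;  -- provides STD_LOGIC_VECTOR

    ENTITY nand32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            y: OUT STD_LOGIC_VECTOR (31 DOWNTO 0));
    END nand32;

    ARCHITECTURE behavioral OF nand32 IS
    BEGIN
      y <= a NAND b;  -- bitwise NAND across all 32 bits
    END behavioral;

° A STD_LOGIC_VECTOR can be converted to a 32-bit integer
(Figure: block "nand32" with inputs a[31:0], b[31:0] and output y[31:0])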
Simple Operators
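    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux2to1 IS
      PORT (a, b, sel: IN STD_LOGIC;
            y: OUT STD_LOGIC);
    END mux2to1;

    ARCHITECTURE logic OF mux2to1 IS
    BEGIN
      -- selected signal assignment: choices are separated by commas
      WITH sel SELECT
        y <= a WHEN '0',
             b WHEN OTHERS;
    END logic;

(Figure: block "mux2to1" with data inputs a (selected by 0) and b (selected by 1), select input sel, and output y)
You can also use other constructs: IF ... THEN, WHEN, etc.
° Must use “others”, since sel can take any of {0, 1, Z, X} (std_logic)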
Arithmetic Operations
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY add32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            y: OUT STD_LOGIC_VECTOR (31 DOWNTO 0));
    END add32;

    ARCHITECTURE behavioral OF add32 IS
    BEGIN
      y <= addum(a, b);  -- addum as named on the slide; its package is not shown
    END behavioral;

° “addum” adds two n-bit vectors to produce an n+1 bit vector
° Alternatively, you can declare a, b, y as INTEGER and use y <= a + b.
Control Constructs
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux32 IS
      PORT (A, B: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            DOUT: OUT STD_LOGIC_VECTOR (31 DOWNTO 0);  -- mode OUT so it can be assigned
            SEL: IN BIT);
    END mux32;

    ARCHITECTURE behavior OF mux32 IS
    BEGIN
      mux32_process: PROCESS (A, B, SEL)
      BEGIN
        IF (SEL = '0') THEN  -- BIT values are character literals
          DOUT <= A;
        ELSE
          DOUT <= B;
        END IF;
      END PROCESS;
    END behavior;

° Process fires whenever a signal in its “sensitivity list” changes
° Evaluates the body sequentially
° VHDL provides case statements as well