COMP541
Datapaths I
Montek Singh
Mar 28, 2012
Topics
• Over the next 2 classes: datapaths
  • How ALUs are designed
  • How data is stored in a register file
• Lab 9: Start building a datapath!
What is computer architecture?
Architecture (ISA)
• Jumping up a few levels of abstraction.
• Architecture: the programmer’s view of the computer
  • Defined by instructions (operations) and operand locations
• Microarchitecture: how to implement an architecture in hardware

Levels of abstraction (top to bottom), with typical building blocks:
Application Software    programs
Operating Systems       device drivers
Architecture            instructions, registers
Microarchitecture       datapaths, controllers
Logic                   adders, memories
Digital Circuits        AND gates, NOT gates
Analog Circuits         amplifiers, filters
Devices                 transistors, diodes
Physics                 electrons
MIPS Machine Language
• Three instruction formats:
  • R-Type: register operands
  • I-Type: immediate operand
  • J-Type: for jumps
R-Type instructions
• Register-type
• 3 register operands:
  • rs, rt: source registers
  • rd: destination register
• Other fields:
  • op: the operation code or opcode (0 for R-type instructions)
  • funct: the function
    – together, op and funct tell the computer which operation to perform
  • shamt: the shift amount for shift instructions, otherwise it is 0

R-Type format (most significant field first):
op       rs       rt       rd       shamt    funct
6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
R-Type Examples
Field values:
Assembly code        op       rs       rt       rd       shamt    funct
add $s0, $s1, $s2    0        17       18       16       0        32
sub $t0, $t3, $t5    0        11       13       8        0        34
(field width)        6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

Machine code:
Assembly code        op       rs       rt       rd       shamt    funct    hex
add $s0, $s1, $s2    000000   10001    10010    10000    00000    100000   0x02328020
sub $t0, $t3, $t5    000000   01011    01101    01000    00000    100010   0x016D4022
(field width)        6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

Note the order of registers in the assembly code: add rd, rs, rt
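To make the field packing concrete, here is a minimal Python sketch that assembles the two R-type examples above from their field values. The function name `encode_r_type` is made up for illustration; only the field widths and values come from the slide.

```python
def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack R-type fields into a 32-bit word: op | rs | rt | rd | shamt | funct."""
    fields = (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $s0, $s1, $s2: rd = 16, rs = 17, rt = 18, funct = 32
add_word = encode_r_type(rs=17, rt=18, rd=16, shamt=0, funct=32)
# sub $t0, $t3, $t5: rd = 8, rs = 11, rt = 13, funct = 34
sub_word = encode_r_type(rs=11, rt=13, rd=8, shamt=0, funct=34)

print(f"0x{add_word:08X}")  # 0x02328020
print(f"0x{sub_word:08X}")  # 0x016D4022
```

Note that the assembly order (rd first) differs from the field order in the word (rs, rt, rd), which is why the sketch uses keyword arguments.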
I-Type instructions
• Immediate-type
• 3 operands:
  • rs, rt: register operands
  • imm: 16-bit two’s complement immediate
• Other fields:
  • op: the opcode

I-Type format (most significant field first):
op       rs       rt       imm
6 bits   5 bits   5 bits   16 bits
I-Type Examples
Field values:
Assembly code         op       rs       rt       imm
addi $s0, $s1, 5      8        17       16       5
addi $t0, $s3, -12    8        19       8        -12
lw   $t2, 32($0)      35       0        10       32
sw   $s1, 4($t1)      43       9        17       4
(field width)         6 bits   5 bits   5 bits   16 bits

Machine code:
Assembly code         op       rs       rt       imm                   hex
addi $s0, $s1, 5      001000   10001    10000    0000 0000 0000 0101   0x22300005
addi $t0, $s3, -12    001000   10011    01000    1111 1111 1111 0100   0x2268FFF4
lw   $t2, 32($0)      100011   00000    01010    0000 0000 0010 0000   0x8C0A0020
sw   $s1, 4($t1)      101011   01001    10001    0000 0000 0000 0100   0xAD310004
(field width)         6 bits   5 bits   5 bits   16 bits

Note the differing order of registers in the assembly and machine codes:
addi rt, rs, imm
lw   rt, imm(rs)
sw   rt, imm(rs)
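The I-type examples can be reproduced with a short Python sketch. The name `encode_i_type` is hypothetical; the point it illustrates is that a negative immediate such as -12 is stored as its 16-bit two's complement pattern (0xFFF4).

```python
def encode_i_type(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack I-type fields into a 32-bit word: op | rs | rt | imm (16-bit two's complement)."""
    if not 0 <= op < 64:
        raise ValueError("op must fit in 6 bits")
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("rs and rt must fit in 5 bits")
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("imm must fit in 16 bits as a signed value")
    imm16 = imm & 0xFFFF  # two's complement encoding of the immediate
    return (op << 26) | (rs << 21) | (rt << 16) | imm16


# Field values taken from the table above.
examples = {
    "addi $s0, $s1, 5":   encode_i_type(op=8,  rs=17, rt=16, imm=5),
    "addi $t0, $s3, -12": encode_i_type(op=8,  rs=19, rt=8,  imm=-12),
    "lw $t2, 32($0)":     encode_i_type(op=35, rs=0,  rt=10, imm=32),
    "sw $s1, 4($t1)":     encode_i_type(op=43, rs=9,  rt=17, imm=4),
}

for asm, word in examples.items():
    print(f"{asm:20s} 0x{word:08X}")
```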
J-Type instructions
• Jump-type
• 26-bit address operand (addr)
• Used for jump instructions (j)

J-Type format (most significant field first):
op       addr
6 bits   26 bits
Review: Instruction Formats
R-Type:
op       rs       rt       rd       shamt    funct
6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

I-Type:
op       rs       rt       imm
6 bits   5 bits   5 bits   16 bits

J-Type:
op       addr
6 bits   26 bits
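Going the other way, a 32-bit word can be split back into fields once its format is known. The sketch below is a simplification, not a full MIPS decoder: it assumes op = 0 means R-type, op = 2 or 3 (the MIPS `j` and `jal` opcodes) means J-type, and everything else is I-type. The function name `decode` and the dictionary layout are illustrative choices.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS word into fields (simplified format selection)."""
    op = (word >> 26) & 0x3F
    if op == 0:  # R-type: op | rs | rt | rd | shamt | funct
        return {
            "format": "R", "op": op,
            "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F, "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if op in (2, 3):  # J-type: op | addr
        return {"format": "J", "op": op, "addr": word & 0x03FFFFFF}
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "format": "I", "op": op,
        "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F, "imm": imm,
    }


print(decode(0x02328020))  # add $s0, $s1, $s2
print(decode(0x2268FFF4))  # addi $t0, $s3, -12
```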
Microarchitecture
• Microarchitecture: how to implement an architecture in hardware
  • This is sometimes just called implementation
• Processor:
  • Datapath: functional blocks
  • Control: control signals
Parts of CPUs
• Datapath
  • The registers and logic to perform operations on them
• Control unit
  • Generates signals to control datapath
Memory and I/O
• Memories are connected to the data/control in and out lines
  • Example: register to memory ops
• Will discuss I/O arrangements later
Basic Datapath
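• Basic components of the CPU datapath
  • PC, Instruction Memory, Register File, ALU, Data Memory

[Figure: basic datapath state elements with CLK inputs. PC register (input PC', output PC, 32 bits); Instruction Memory (address A, read data RD, 32 bits); Register File (5-bit addresses A1, A2, A3; 32-bit read data RD1, RD2; 32-bit write data WD3; write enable WE3); Data Memory (address A, read data RD, write data WD, write enable WE, 32 bits).]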
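As a rough behavioral sketch of the register file ports named in the figure (A1, A2, A3, RD1, RD2, WD3, WE3), the Python class below models two combinational read ports and one clocked write port. The class and method names are made up for illustration, and keeping register 0 at zero follows the MIPS convention for `$0` rather than anything stated on the slide.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, a1: int, a2: int) -> tuple:
        """Combinational read: returns (RD1, RD2) for addresses A1 and A2."""
        return self.regs[a1], self.regs[a2]

    def clock(self, we3: bool, a3: int, wd3: int) -> None:
        """Rising clock edge: write WD3 to register A3 when WE3 is asserted."""
        if we3 and a3 != 0:  # assumption: register 0 always reads as zero
            self.regs[a3] = wd3 & 0xFFFFFFFF


rf = RegisterFile()
rf.clock(we3=True, a3=16, wd3=0x12345678)   # write $s0
rf.clock(we3=False, a3=17, wd3=0xFFFFFFFF)  # WE3 low: no write happens
print([hex(v) for v in rf.read(16, 17)])
```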
First: A “lightweight” ALU
Arithmetic Logic Unit = ALU
Lightweight ALU
• A lightweight ALU from textbook:
  • 3-bit function select (7 functions)

[Figure: ALU symbol with N-bit inputs A and B, 3-bit function select F, and N-bit output Y.]

F2:0    Function
000     A & B
001     A | B
010     A + B
011     not used
100     A & ~B
101     A | ~B
110     A - B
111     SLT
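The function table can be read as a behavioral specification. The following Python sketch models it directly; the function name `lightweight_alu` and the default width are illustrative. SLT is modeled the way this ALU computes it, as the sign bit of A - B, so it ignores signed overflow.

```python
def lightweight_alu(a: int, b: int, f: int, n: int = 32) -> int:
    """Behavioral model of the lightweight ALU function table (N-bit result)."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if f == 0b000:
        y = a & b
    elif f == 0b001:
        y = a | b
    elif f == 0b010:
        y = a + b
    elif f == 0b100:
        y = a & ~b
    elif f == 0b101:
        y = a | ~b
    elif f == 0b110:
        y = a - b
    elif f == 0b111:
        y = ((a - b) & mask) >> (n - 1)  # SLT: sign bit of the difference
    else:
        raise ValueError("F = 011 is not used")
    return y & mask  # keep only the low N bits


print(lightweight_alu(25, 32, 0b111))       # 1, since 25 < 32
print(hex(lightweight_alu(3, 5, 0b110)))    # 0xfffffffe, i.e. -2 in two's complement
```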
Lightweight ALU: Internals
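• (light-weight version)

[Figure: lightweight ALU internals. A 2:1 multiplexer controlled by F2 selects the second operand; an N-bit adder produces the sum S and carry-out Cout; bit [N-1] of S is zero-extended to N bits; a 4:1 multiplexer controlled by F1:0 selects the N-bit output Y. The function table is the same as for the ALU symbol (F2:0 = 000 A & B, 001 A | B, 010 A + B, 011 not used, 100 A & ~B, 101 A | ~B, 110 A - B, 111 SLT).]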
Set Less Than (SLT) Example
• Configure a 32-bit ALU for the set if less than (SLT) operation.
• Suppose A = 25 and B = 32.
• A is less than B, so we expect Y to be the 32-bit representation of 1 (0x00000001).
• For SLT, F2:0 = 111.
• F2 = 1 configures the adder unit as a subtracter. So 25 - 32 = -7.
• The two’s complement representation of -7 has a 1 in the most significant bit, so S31 = 1.
• With F1:0 = 11, the final multiplexer selects Y = S31 (zero extended) = 0x00000001.

[Figure: the lightweight ALU internals, with the 1-bit path from the MSB of S ([N-1]) into the zero-extend block highlighted.]
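The SLT walk-through can be traced step by step with a structural sketch in Python. This is an approximation under stated assumptions: F2 is taken to both invert B and act as the adder's carry-in (the usual way to build a subtracter from an adder), and the 4:1 multiplexer inputs are ordered AND, OR, sum, zero-extended MSB to match the function table. The function name is hypothetical.

```python
N = 32
MASK = (1 << N) - 1


def lightweight_alu_structural(a: int, b: int, f: int):
    """Return (Y, S, Cout) by following the mux/adder structure."""
    f2 = (f >> 2) & 1
    f10 = f & 0b11
    a &= MASK
    bb = (~b & MASK) if f2 else (b & MASK)  # 2:1 mux: B or ~B
    total = a + bb + f2                     # adder, with F2 assumed as carry-in
    s = total & MASK
    cout = total >> N
    msb_zero_extended = s >> (N - 1)        # bit [N-1] of S, zero-extended
    mux_inputs = [a & bb, a | bb, s, msb_zero_extended]
    return mux_inputs[f10], s, cout


y, s, cout = lightweight_alu_structural(25, 32, 0b111)
print(f"S = 0x{s:08X}")  # 0xFFFFFFF9, the two's complement form of -7
print(f"Y = 0x{y:08X}")  # 0x00000001
```

Because the comparison relies only on the sign bit of A - B, this structure gives the wrong answer when the subtraction overflows; the example values here are far from that case.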
Next: A “full-feature” ALU
Arithmetic Logic Unit (ALU)
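• Full-feature ALU from COMP411:

[Figure: full-feature ALU block diagram. Inputs A and B feed a bidirectional barrel shifter, an add/sub unit, and a Boolean unit; the 5-bit ALUFN (fields Sub, Bool, Shft, Math) selects the result R; flag outputs N, V, C, and Z.]

Sub   Bool   Shft   Math   OP
0     XX     0      1      A+B
1     XX     0      1      A-B
X     X0     1      1      0
X     X1     1      1      1
X     00     1      0      B<<A
X     10     1      0      B>>A
X     11     1      0      B>>>A
X     00     0      0      A & B
X     01     0      0      A | B
X     10     0      0      A ^ B
X     11     0      0      ~(A | B)  (NOR; the overbar appears to have been lost in extraction)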
Shifting Logic
• Shifting is a common operation
  • applied to groups of bits
  • used for alignment
  • used for “short cut” arithmetic operations
    – X << 1 is often the same as 2*X
    – X >> 1 can be the same as X/2
• For example:
  • X = 20₁₀ = 00010100₂
  • Left Shift: (X << 1) = 00101000₂ = 40₁₀
  • Right Shift: (X >> 1) = 00001010₂ = 10₁₀
  • Signed or “Arithmetic” Right Shift: (-X >>> 1) = (11101100₂ >>> 1) = 11110110₂ = -10₁₀

[Figure: left-shift-by-1 circuit. Inputs X7..X0 and a constant “0” feed eight 2:1 multiplexers, controlled by SHL1, that produce outputs R7..R0.]
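The three shift examples can be checked with a few lines of Python on 8-bit values. Python integers are unbounded, so the sketch masks to 8 bits explicitly; the helper names are made up for illustration.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def shl(x: int, n: int) -> int:
    """Left shift, dropping bits that fall off the top."""
    return (x << n) & MASK


def shr_logical(x: int, n: int) -> int:
    """Logical right shift: zeros enter from the left."""
    return (x & MASK) >> n


def shr_arith(x: int, n: int) -> int:
    """Arithmetic right shift: copies of the sign bit enter from the left."""
    x &= MASK
    shifted = x >> n
    if x >> (WIDTH - 1):  # sign bit set
        shifted |= (MASK << (WIDTH - n)) & MASK
    return shifted


def to_signed(x: int) -> int:
    """Interpret an 8-bit pattern as a two's complement number."""
    return x - (1 << WIDTH) if x & (1 << (WIDTH - 1)) else x


x = 20
print(f"{shl(x, 1):08b}", shl(x, 1))                  # 00101000 40
print(f"{shr_logical(x, 1):08b}", shr_logical(x, 1))  # 00001010 10
neg = -x & MASK                                       # 11101100
print(f"{shr_arith(neg, 1):08b}", to_signed(shr_arith(neg, 1)))  # 11110110 -10
```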
Shifting Logic
• How do you shift by more than 1 position?
  • feed other bits into the multiplexer
  • e.g., left-shift-by-2
    – multiplexer for R[k] receives input from X[k-2]
• How do you allow the shift amount to be specified dynamically?
  • need a bigger multiplexer
  • shift amount is applied as the select input
  • will design in class and lab
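One way to picture the "bigger multiplexer" idea is the Python sketch below: every output bit R[k] gets its own multiplexer, input number s of that multiplexer is wired to X[k-s] (or to 0 when k-s is negative), and the shift amount is the select. This is only an illustration of the wiring, not the design that will be built in class or lab; all names are hypothetical.

```python
def left_shift_mux(x_bits: list, shamt: int) -> list:
    """Variable left shift built from one wide mux per output bit.

    x_bits[k] is bit k of the input (index 0 is the LSB).
    """
    n = len(x_bits)
    r_bits = []
    for k in range(n):
        # Mux input s is X[k - s]; positions below bit 0 are tied to 0.
        mux_inputs = [x_bits[k - s] if k - s >= 0 else 0 for s in range(n)]
        r_bits.append(mux_inputs[shamt])  # shift amount drives the select
    return r_bits


def to_bits(value: int, n: int = 8) -> list:
    return [(value >> k) & 1 for k in range(n)]


def from_bits(bits: list) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


print(from_bits(left_shift_mux(to_bits(20), 1)))  # 40
print(from_bits(left_shift_mux(to_bits(20), 2)))  # 80
```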
Boolean Operations
• It will also be useful to perform logical operations on groups of bits. Which ones?
• ANDing is useful for “masking” off groups of bits.
  • ex. 10101110 & 00001111 = 00001110 (mask selects last 4 bits)
• ANDing is also useful for “clearing” groups of bits.
  • ex. 10101110 & 00001111 = 00001110 (0’s clear first 4 bits)
• ORing is useful for “setting” groups of bits.
  • ex. 10101110 | 00001111 = 10101111 (1’s set last 4 bits)
• XORing is useful for “complementing” groups of bits.
  • ex. 10101110 ^ 00001111 = 10100001 (1’s invert last 4 bits)
• NORing is useful for.. uhm…
  • ex. 10101110 # 00001111 = 01010000 (0’s invert, 1’s clear)
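The four examples above can be verified in Python. Python has no NOR operator (the slide writes it as `#`), so the sketch builds it as an OR followed by an inversion, masked to 8 bits.

```python
MASK8 = 0xFF
a = 0b10101110
b = 0b00001111

and_result = a & b            # masking / clearing
or_result = a | b             # setting
xor_result = a ^ b            # complementing
nor_result = ~(a | b) & MASK8 # NOR, kept to 8 bits

for name, value in [("AND", and_result), ("OR", or_result),
                    ("XOR", xor_result), ("NOR", nor_result)]:
    print(f"{name:3s} {value:08b}")
```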
Boolean Unit
• It is simple to build up a Boolean unit using primitive gates and a mux to select the function.
• Since there is no interconnection between bits, this unit can be simply replicated at each position.
• The cost is about 7 gates per bit. One for each primitive function, and approx 3 for the 4-input mux.

[Figure: one bit slice of the Boolean unit. Inputs Ai and Bi drive the primitive gates; a 4-input mux with inputs 00, 01, 10, 11, selected by Bool, produces Qi. This logic block is repeated for each bit (i.e. 32 times).]
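The bit-slice idea can be sketched in Python as one small function replicated across all bit positions. The select encoding used here (00 = AND, 01 = OR, 10 = XOR, 11 = NOR) is an assumption made for illustration, as are the function names.

```python
def bool_slice(a_i: int, b_i: int, sel: int) -> int:
    """One bit of the Boolean unit: four primitive gates feeding a 4-input mux."""
    gate_outputs = [
        a_i & b_i,        # 00: AND (assumed encoding)
        a_i | b_i,        # 01: OR
        a_i ^ b_i,        # 10: XOR
        1 - (a_i | b_i),  # 11: NOR
    ]
    return gate_outputs[sel]


def boolean_unit(a: int, b: int, sel: int, width: int = 32) -> int:
    """Replicate the slice at every bit position; no signals cross between bits."""
    q = 0
    for i in range(width):
        q |= bool_slice((a >> i) & 1, (b >> i) & 1, sel) << i
    return q


print(f"{boolean_unit(0b10101110, 0b00001111, 0b10, width=8):08b}")  # 10100001
```

Because each slice depends only on its own Ai and Bi, the loop iterations are independent, which is what lets the hardware compute all 32 bits in parallel.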
An ALU at last!
• Full-feature ALU from COMP411 (same block diagram and ALUFN table as on the “Arithmetic Logic Unit (ALU)” slide)
Which one do we implement?
• We will use the full-feature one!
  • slightly more challenging …
    – I will help you!
  • … but a lot more fun to use
    – supports a much more useful set of instructions for your final programming project
Processor Architecture
Rather, “microarchitecture” or implementation
Microarchitectures
• Multiple implementations for a single architecture:
  • Single-cycle
    – Each instruction executes in a single cycle
  • Multicycle
    – Each instruction is broken up into a series of shorter steps
  • Pipelined
    – Each instruction is broken up into a series of steps
    – Multiple instructions execute at once.
• Directly impacts performance obtained
Processor Performance
• Program execution time
  • Execution Time = (# instructions) × (cycles/instruction) × (seconds/cycle)
• Definitions:
  • Cycles/instruction = CPI
  • Seconds/cycle = clock period
  • 1/CPI = Instructions/cycle = IPC
• Challenge is to satisfy constraints of:
  • Cost
  • Power
  • Performance
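As a quick worked example of the execution-time formula, the Python sketch below plugs in made-up numbers (the instruction count, CPI, and clock period are hypothetical and do not come from the lecture).

```python
def execution_time(instruction_count: float, cpi: float, clock_period_s: float) -> float:
    """Execution time = (# instructions) x (cycles/instruction) x (seconds/cycle)."""
    return instruction_count * cpi * clock_period_s


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of CPI."""
    return 1.0 / cpi


# Hypothetical workload: 100 billion instructions on a 1 GHz clock (1 ns period).
single_cycle = execution_time(100e9, cpi=1.0, clock_period_s=1e-9)
# Same program and clock with a hypothetical CPI of 4.
higher_cpi = execution_time(100e9, cpi=4.0, clock_period_s=1e-9)

print(f"CPI 1: about {single_cycle:.1f} s, IPC = {ipc(1.0):.2f}")
print(f"CPI 4: about {higher_cpi:.1f} s, IPC = {ipc(4.0):.2f}")
```

The comparison holds the clock period fixed only to isolate the CPI term; in practice different microarchitectures change both CPI and clock period, so the product is what matters.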
MIPS Processor
• We will consider a subset of MIPS instructions (in book & lab):
  • R-type instructions: and, or, add, sub, slt, …
  • Memory instructions: lw, sw, …
  • Branch instructions: beq, …
  • Some immediate instructions too: addi, …
  • Jumps as well: j, …
Next
• Next class:
  • We’ll look at single cycle MIPS
  • Then the more complex versions
• Lab Friday (March 30)
  • Demo your graphics displays (Lab 8)
  • Start on Lab 9 (will post on website by Fri)
  • start building the datapath!
    – ALU
    – Registers