Download class5-pipeline-b.pdf

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
CS:APP Chapter 4
Computer Architecture
Pipelined
Implementation
Part II
Randal E. Bryant
Carnegie Mellon University
http://csapp.cs.cmu.edu
CS:APP
Overview
Make the pipelined processor work!
Data Hazards

Instruction having register R as source follows shortly after
instruction having register R as destination

Common condition, don’t want to slow down pipeline
Control Hazards

Mispredict conditional branch
 Our design predicts all branches as being taken
 Naïve pipeline executes two extra instructions

Getting return address for ret instruction
 Naïve pipeline executes three extra instructions
Making Sure It Really Works

–2–
What if multiple special cases happen simultaneously?
CS:APP
Pipeline Stages
W_icode, W_valM
W_valE, W_valM, W_dstE, W_dstM
W
valM
Fetch

Memory
Addr, Data
Select current PC

Read instruction

Compute incremented PC
Data
Data
memory
memory
M_icode,
M_Bch,
M_valA
M
Bch
valE
CC
CC
Execute
ALU
ALU
aluA, aluB
Decode

E
Read program registers
valA, valB
Execute


Decode
D
Instruction
Instruction
memory
memory
Fetch
PC

–3–
B
valP
icode, ifun,
rA, rB, valC
Read or write data memory
Write Back
A
Register
Register M
file
file E
Write back
Operate ALU
Memory
d_srcA,
d_srcB
valP
PC
PC
increment
increment
predPC
f_PC
F
Update register file
CS:APP
Write back
PIPE- Hardware
W
icode
valE
Pipeline registers hold
intermediate values
from instruction
execution
Forward (Upward) Paths


Values passed from one
stage to next
Cannot jump past
stages
write
Memory
Data
Data
memory
memory
data in
Addr
M_valA
M_Bch
M
icode
Bch
valE
valA
dstE dstM
e_Bch
Execute
E
icode
ALU
fun.
ALU
ALU
CC
CC
ifun
ALU
A
ALU
B
valC
valA
valB
dstE dstM srcA srcB
d_srcA d_srcB
Select
A
Decode
D
Fetch
icode
d_rvalA
A
dstE dstM srcA srcB
W_valM
B
Register
Register M
file
file E
 e.g., valC passes
through decode
dstE dstM
data out
read
Mem.
control

valM
ifun
rA
rB
Instruction
Instruction
memory
memory
valC
W_valE
valP
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
Select
PC
F
–4–
W_valM
predPC
CS:APP
Data Dependencies: 2 Nop’s
# demo-h2.ys
1
2
3
4
5
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
F
0x006: irmovl
$3,%eax
0x00c: nop
0x00d: nop
0x00e: addl %edx,%eax
0x010: halt
6
7
8
9
10
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Cycle 6
W
R[ %eax] R[
3
] 3
%eax
•
•
•
D
valA  R[ %edx] = 10
valB  R[ %eax] = 0
–5–
Error
CS:APP
Data Dependencies: No Nop
# demo-h0.ys
1
2
3
4
5
6
7
8
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
W
M
E
W
M
W
0x006: irmovl
$3,%eax
0x00c: addl %edx,%eax
0x00e: halt
Cycle 4
M
M_valE = 10
M_dstE = %edx
E
e_valE  0 + 3 = 3
E_dstE = %eax
D
valA  R[ %edx] = 0
valB  R[ %eax] = 0
–6–
Error
CS:APP
Stalling for Data Dependencies
# demo-h2.ys
1
2
3
4
5
6
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
W
M
E
0x006: irmovl
$3,%eax
0x00c: nop
0x00d: nop
bubble
0x00e: addl %edx,%eax
0x010: halt
–7–
F
D
F
7
8
9
10
11
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W

If instruction follows too closely after one that writes
register, slow it down

Hold instruction in decode

Dynamically inject nop into execute stage
CS:APP
Write back
Stall Condition
W
icode
valE
dstE
dstM
data out
read
Mem.
control
Source Registers
valM
write
Memory
Data
Data
memory
memory
data in
Addr
M_valA
M_Bch

srcA and srcB of current
instruction in decode
stage
Destination Registers

dstE and dstM fields

Instructions in execute,
memory, and write-back
stages
Special Case

Don’t stall for register ID
8
M
–8–
Bch
valE
valA
dstE
dstM
dstE
dstM srcA
dstE
dstM srcA
e_Bch
Execute
E
icode
ALU
fun.
ALU
ALU
CC
CC
ifun
ALU
A
ALU
B
valC
valA
valB
srcB
d_srcA d_srcB
Select
A
Decode
d_rvalA
A
srcB
W_valM
B
Register
RegisterM
file
file
W_valE
E
D
Fetch
icode
ifun
rA
rB
Instruction
Instruction
memory
memory
valC
valP
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
Select
PC
 Indicates absence of
register operand
icode
F
W_valM
predPC
CS:APP
Detecting Stall Condition
# demo-h2.ys
1
2
3
4
5
6
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
W
M
E
0x006: irmovl
$3,%eax
0x00c: nop
0x00d: nop
bubble
0x00e: addl %edx,%eax
0x010: halt
F
D
F
7
8
9
10
11
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Cycle 6
W
W_dstE = %eax
W_valE = 3
•
•
•
D
–9–
srcA = %edx
srcB = %eax
CS:APP
Stalling X3
# demo-h0.ys
1
2
3
4
5
0x000: irmovl $10,%edx
F
D
F
E
D
M
E
W
M
E
0x006: irmovl
$3,%eax
bubble
bubble
6
W
M
E
bubble
0x00c: addl %edx,%eax
F
0x00e: halt
D
F
D
F
D
F
7
8
9
10
11
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Cycle 6
W
Cycle 5
W_dstE = %eax
M
Cycle 4
M_dstE = %eax
E
•
•
•
D
E_dstE = %eax
D
– 10 –
srcA = %edx
srcB = %eax
srcA = %edx
srcB = %eax
•
•
•
D
srcA = %edx
srcB = %eax
CS:APP
What Happens When Stalling?
# demo-h0.ys
Cycle 8
4
5
6
7
0x000: irmovl $10,%edx
0x006: irmovl
$3,%eax
0x00c: addl %edx,%eax
0x00e: halt
Write Back
Memory
Execute
Decode
Fetch
0x000: irmovl
0x006:
bubble $10,%edx
$3,%eax
0x000: irmovl
0x006:
bubble $10,%edx
$3,%eax
0x006: addl
0x00c:
irmovl
bubble
%edx,%eax
$3,%eax
0x00c: halt
0x00e:
addl %edx,%eax
0x00e: halt

Stalling instruction held back in decode stage

Following instruction stays in fetch stage

Bubbles injected into execute stage
 Like dynamically generated nop’s
 Move through later stages
– 11 –
CS:APP
Implementing Stalling
W_dstM
W_dstE
W
icode
valE
valM
dstE dstM
valE
valA
dstE dstM
M_dstM
M_dstE
M
icode
Bch
E_dstM
Pipe
control
logic
E_dstE
E_bubble
E
icode
ifun
valC
valA
valB
dstE dstM srcA
srcB
d_srcB
d_srcA
srcB
D_icode
D_stall
D
F_stall
F
srcA
icode
ifun
rA
rB
valC
valP
predPC
Pipeline Control
– 12 –

Combinational logic detects stall condition

Sets mode signals for how pipeline registers should update
CS:APP
Pipeline Register Modes
Input = y
Normal
Output = x
x
stall
=0
Output = x
x
stall
=1
– 13 –
Output = x
x
stall
=0
y

Rising
clock

Output = x
x
bubble
=0
Input = y
Bubble

Output = y
bubble
=0
Input = y
Stall

Rising
clock

Rising
clock

n
o
p
Output = nop
bubble
=1
CS:APP
Data Forwarding
Naïve Pipeline

Register isn’t written until completion of write-back stage

Source operands read from register file in decode stage
 Needs to be in register file at start of stage
Observation

Value generated in execute or memory stage
Trick
– 14 –

Pass value directly from generating instruction to decode
stage

Needs to be available at end of decode stage
CS:APP
Data Forwarding Example
# demo-h2.ys
1
2
3
4
5
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
F
0x006: irmovl
$3,%eax
0x00c: nop
0x00d: nop
0x00e: addl %edx,%eax
0x010: halt



– 15 –
irmovl in writeback stage
Destination value in
W pipeline register
Forward as valB for
decode stage
6
7
8
9
10
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Cycle 6
W
R[ %eax] 3
W_dstE = %eax
W_valE = 3
•
•
•
D
srcA = %edx
srcB = %eax
valA  R[ %edx] = 10
valB  W_valE = 3
CS:APP
Bypass Paths
W_icode, W_valM
W_valE, W_valM, W_dstE, W_dstM
W_valE
W_valM
W
Decode Stage
m_valM

Forwarding logic selects
Memory
valA and valB

Normally from register
file

Addr, Data
M_valE
M
Forwarding: get valA or
valB from later pipeline
Execute
stage
Execute: valE

Memory: valE, valM

Write back: valE, valM
e_valE
Bch
CC
CC
ALU
ALU
E_valA, E_valB,
E_srcA, E_srcB
Forwarding Sources

Data
Data
memory
memory
M_icode,
M_Bch,
M_valA
E
valA, valB
Forward
d_srcA,
d_srcB
Decode
A
B
Register
Register M
file
file
E
Write back
– 16 –
D
icode, ifun,
rA, rB, valC
valP
CS:APP
valP
Data Forwarding Example #2
# demo-h0.ys
1
2
3
4
5
6
7
8
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
W
M
E
W
M
W
0x006: irmovl
$3,%eax
0x00c: addl %edx,%eax
0x00e: halt
Register %edx

Generated by ALU
during previous cycle

Forward from memory
as valA
Register %eax
– 17 –

Value just generated
by ALU

Forward from execute
as valB
Cycle 4
M
M_dstE = %edx
M_valE = 10
E
E_dstE = %eax
e_valE  0 + 3 = 3
D
srcA = %edx
srcB = %eax
valA  M_valE = 10
valB  e_valE = 3
CS:APP
Implementing
Forwarding
W_valE
Write back
W_valM
W
icode
valE
valM
dstE dstM
data out
read
Mem.
control
Data
Data
memory
memory
write
Memory
m_valM

Add additional feedback
paths from E, M, and W
pipeline registers into
decode stage

Create logic blocks to
select from multiple
sources for valA and valB
in decode stage
data in
Addr
M_Bch
M
icode
M_valA
M_valE
Bch
valE
valA
dstE dstM
e_Bch
e_valE
ALU
ALU
CC
CC
Execute
E
icode
ifun
ALU
fun.
ALU
A
ALU
B
valC
valA
valB
dstE dstM srcA srcB
d_srcA d_srcB
dstE dstM srcA srcB
Sel+Fwd
A
Decode
Fwd
B
A
W_valM
B
Register
Register M
file
file
W_valE
E
D
icode
– 18 –
Fetch
ifun
rA
rB
Instruction
Instruction
memory
memory
valC
valP
PC
PC
increment
increment
Predict
PC
CS:APP
f_PC
M_valA
Select
Implementing Forwarding
W_valE
W_valM
valE
valM
dstE dstM
data out
read
Mem.
control
m_valM
Data
Data
memory
memory
write
data in
Addr
M_valA
M_valE
Bch
valE
valA
dstE dstM
e_Bch
e_valE
ALU
ALU
CC
CC
ALU
fun.
ALU
A
ALU
B
valC
valA
valB
dstE dstM srcA srcB
d_srcA d_srcB
dstE dstM srcA srcB
Sel+Fwd
A
Fwd
B
A
W_valM
B
Register
Register M
file
file
W_valE
E
rA
rB
Instruction
Instruction
memory
–valC
19 –
CS:APP
valP
PC
PC
increment
## What should be the A value?
int new_E_valA = [
# Use incremented PC
D_icode in { ICALL, IJXX } : D_valP;
# Forward valE from execute
d_srcA == E_dstE : e_valE;
# Forward valM from memory
d_srcA == M_dstM : m_valM;
# Forward valE from memory
d_srcA == M_dstE : M_valE;
# Forward valM from write back
d_srcA == W_dstM : W_valM;
# Forward valE from write back
d_srcA == W_dstE : W_valE;
# Use value read from register file
1 : d_rvalA;
];
Predict
PC
Limitation of Forwarding
# demo-luh.ys
0x000:
0x006:
0x00c:
0x012:
0x018:
0x01e:
0x020:
1
2

4
irmovl $128,%edx
F
D
E M
irmovl $3,%ecx
F
D
E
rmmovl %ecx, 0(%edx)
F
D
irmovl $10,%ebx
F
mrmovl 0(%edx),%eax # Load %eax
addl %ebx,%eax # Use %eax
halt
Load-use dependency

3
Value needed by end of
decode stage in cycle 7
Value read from memory in
memory stage of cycle 8
5
6
W
M
E
D
F
W
M
E
D
F
7
8
9
10
11
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Cycle 7
Cycle 8
M
M
M_dstE = %ebx
M_valE = 10
M_dstM = %eax
m_valM  M[128] = 3
•
•
•
D
valA  M_valE = 10
valB  R[%eax] = 0
– 20 –
Error
CS:APP
Avoiding Load/Use Hazard
# demo-luh.ys
1
2
3
4
5
6
0x000: irmovl $128,%edx
0x006: irmovl $3,%ecx
F
D
F
E
D
M
E
W
M
W
0x00c: rmmovl %ecx, 0(%edx)
F
D
0x012: irmovl $10,%ebx
F
0x018: mrmovl 0(%edx),%eax # Load %eax
bubble
E
D
F
0x01e: addl %ebx,%eax # Use %eax
0x020: halt


Stall using instruction for
one cycle
Can then pick up loaded
value by forwarding from
memory stage
7
8
M
E
W
M
W
D
E
F
D
F
9
10
M
E
W
M
W
D
F
E
D
M
E
11
W
M
12
W
Cycle 8
W
W_dstE = %ebx
W_valE = 10
M
M_dstM = %eax
m_valM  M[128] = 3
•
•
•
D
– 21 –
valA  W_valE = 10
valB  m_valM = 3
CS:APP
data out
read
Mem.
control
Data
Data
memory
memory
write
Memory
m_valM
Detecting Load/Use Hazard
data in
Addr
M_Bch
M
icode
M_valA
M_valE
Bch
valE
valA
dstE
dstM
e_Bch
e_valE
ALU
ALU
CC
CC
Execute
E
icode
ifun
ALU
fun.
ALU
A
ALU
B
valC
valA
valB
dstE
dstM
srcA
srcB
d_srcA d_srcB
dstE
Sel +Fwd
A
Decode
D
icode
Condition
Fetch
Load/Use Hazard
F
– 22 –
rA
rB
Instruction
Instruction
memory
memory
valC
Trigger
srcA
srcB
Fwd
B
A
ifun
dstM
W_valM
B
Register
RegisterM
file
file E
W_valE
valP
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
E_icode in { IMRMOVL, IPOPL } &&
E_dstM in { d_srcA, d_srcB }
Select
PC
W_valM
predPC
CS:APP
Control for Load/Use Hazard
# demo-luh.ys
1
2
3
0x000:
0x006:
0x00c:
0x012:
0x018:
4
irmovl $128,%edx
F
D
E M
irmovl $3,%ecx
F
D
E
rmmovl %ecx, 0(%edx)
F
D
irmovl $10,%ebx
F
mrmovl 0(%edx),%eax # Load %eax
bubble
0x01e: addl %ebx,%eax # Use %eax
0x020: halt

Stall instructions in fetch
and decode stages

Inject bubble into execute
stage
Condition
Load/Use Hazard
– 23 –
5
6
7
W
M
E
D
F
W
M
E
D
W
M
E
F
D
F
8
9
10
11
12
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
F
D
E
M
W
stall
stall
bubble
normal
normal
CS:APP
Branch Misprediction Example
demo-j.ys
0x000:
xorl %eax,%eax
0x002:
jne t
0x007:
irmovl $1, %eax
0x00d:
nop
0x00e:
nop
0x00f:
nop
0x010:
halt
0x011: t: irmovl $3, %edx
0x017:
irmovl $4, %ecx
0x01d:
irmovl $5, %edx

– 24 –
# Not taken
# Fall through
# Target (Should not execute)
# Should not execute
# Should not execute
Should only execute first 8 instructions
CS:APP
Handling Misprediction
# demo-j.ys
1
2
3
4
5
6
0x000:
xorl %eax,%eax
F
0x002:
jne target # Not taken
D
F
E
D
F
M
E
D
W
M
W
E
M
W
D
F
E
D
F
M
E
D
0x011: t: irmovl $2,%edx # Target
bubble
0x017:
irmovl $3,%ebx # Target+1
irmovl $1,%eax # Fall through
0x00d:
nop
8
9
10
W
M
E
W
M
W
F
bubble
0x007:
7
Predict branch as taken

Fetch 2 instructions at target
Cancel when mispredicted
– 25 –

Detect branch not-taken in execute stage

On following cycle, replace instructions in execute and
decode by bubbles

No side effects have occurred yet
CS:APP
W_valE
Write back
W_valM
W
icode
valE
valM
dstE
dstM
Detecting Mispredicted Branch
data out
read
Mem.
control
Data
Data
memory
memory
write
Memory
m_valM
data in
Addr
M_Bch
M
icode
M_valA
M_valE
Bch
valE
valA
dstE
dstM
e_Bch
e_valE
ALU
ALU
CC
CC
Execute
E
icode
ifun
ALU
fun.
ALU
A
ALU
B
valC
valA
valB
dstE
dstM
dstE
dstM
srcA
srcB
d_srcA d_srcB
Sel +Fwd
A
Condition
srcB
Fwd
B
Trigger
Decode
srcA
A
W_valM
B
Register
RegisterM
file
file E
Mispredicted Branch E_icode = IJXX & !e_Bch
D
Fetch
icode
ifun
rA
rB
Instruction
Instruction
memory
memory
valC
W_valE
valP
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
– 26 –
Select
PC
F
predPC
W_valM
CS:APP
Control for Misprediction
# demo-j.ys
1
2
3
4
5
6
0x000:
xorl %eax,%eax
F
0x002:
jne target # Not taken
D
F
E
D
F
M
E
D
W
M
W
E
M
W
D
F
E
D
F
M
E
D
0x011: t: irmovl $2,%edx # Target
bubble
0x017:
irmovl $3,%ebx # Target+1
0x007:
irmovl $1,%eax # Fall through
0x00d:
nop
F
Mispredicted Branch normal
– 27 –
8
9
10
W
M
E
W
M
W
F
bubble
Condition
7
D
E
M
W
bubble
bubble
normal
normal
CS:APP
demo-retb.ys
Return Example
0x000:
0x006:
0x00b:
0x011:
0x020:
0x020:
0x026:
0x027:
0x02d:
0x033:
0x039:
0x100:
0x100:

– 28 –
irmovl Stack,%esp
call p
irmovl $5,%esi
halt
.pos 0x20
p: irmovl $-1,%edi
ret
irmovl $1,%eax
irmovl $2,%ecx
irmovl $3,%edx
irmovl $4,%ebx
.pos 0x100
Stack:
# Initialize stack pointer
# Procedure call
# Return point
# procedure
#
#
#
#
Should
Should
Should
Should
not
not
not
not
be
be
be
be
executed
executed
executed
executed
# Stack: Stack pointer
Previously executed three additional instructions
CS:APP
Correct Return Example
# demo-retb
0x026:
ret
F
bubble
D
F
bubble
bubble
0x00b:

irmovl $5,%esi # Return
As ret passes through
pipeline, stall at fetch stage
E
D
F
M
E
D
F
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
W
valM = 0x0b
 While in decode, execute, and
memory stage


– 29 –
Inject bubble into decode
stage
Release stall when reach
write-back stage
•
•
•
F
valC  5
rB  %esi
CS:APP
Detecting Return
M_Bch
M
icode
M_valE
Bch
valE
valA
dstE dstM
e_Bch
e_valE
ALU
ALU
CC
CC
Execute
E
icode
ifun
ALU
fun.
ALU
A
ALU
B
valC
valA
valB
dstE dstM srcA
srcB
d_srcA d_srcB
dstE dstM srcA
Sel+Fwd
A
Decode
D
– 30 –
icode
Fwd
B
A
B
Register
Register M
file
file E
ifun
rA
rB
valC
srcB
W_valM
W_valE
valP
Condition
Trigger
Processing ret
IRET in { D_icode, E_icode, M_icode }
CS:APP
Control for Return
# demo-retb
0x026:
ret
F
D
F
bubble
bubble
bubble
0x00b:
M
E
D
F
irmovl $5,%esi # Return
Condition
Processing ret
– 31 –
E
D
F
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
F
D
E
M
W
stall
bubble
normal
normal
normal
CS:APP
Special Control Cases
Detection
Condition
Trigger
Processing ret
IRET in { D_icode, E_icode, M_icode }
Load/Use Hazard
E_icode in { IMRMOVL, IPOPL } &&
E_dstM in { d_srcA, d_srcB }
Mispredicted Branch E_icode = IJXX & !e_Bch
Action (on next cycle)
Condition
F
D
E
M
W
Processing ret
stall
bubble
normal
normal
normal
Load/Use Hazard
stall
stall
bubble
normal
normal
bubble
bubble
normal
normal
Mispredicted Branch normal
– 32 –
CS:APP
Implementing Pipeline Control
W
icode
valE
valM
dstE dstM
valE
valA
dstE dstM
M_icode
M
icode
Bch
e_Bch
CC
CC
E_dstM
E_icode
Pipe
control
logic
E_bubble
E
icode
ifun
valC
valA
valB
dstE dstM srcA
srcB
d_srcB
d_srcA
srcB
D_icode
srcA
D_bubble
– 33 –
D_stall
D
F_stall
F
icode
ifun
rA
rB
valC
valP
predPC

Combinational logic generates pipeline control signals

Action occurs at start of following cycle
CS:APP
Initial Version of Pipeline Control
bool F_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
# Stalling at fetch while ret passes through pipeline
IRET in { D_icode, E_icode, M_icode };
bool D_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
bool D_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
# Stalling at fetch while ret passes through pipeline
IRET in { D_icode, E_icode, M_icode };
bool E_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
# Load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};
– 34 –
CS:APP
Control Combinations
Load/use
M
E
D
Load
Use
Mispredict
M
E
D
ret 1
M
JXX
E
ret
D
Combination A
ret 2
M
E
D
ret
bubble
ret 3
M
E
D
ret
bubble
bubble
Combination B

Special cases that can arise on same clock cycle
Combination A


Not-taken branch
ret instruction at branch target
Combination B
– 35 –

Instruction that reads from memory to %esp

Followed by ret instruction
CS:APP
E
icode
ifun
valC
valA
valB
dstE
dstM srcA
dstE
dstM srcA
srcB
d_srcA d_srcB
Select
A
Decode
A
Control Combination A
D
Mispredict
M
E
D
ifun
rA
rB
Fetch
srcB
W_valM
B
Register
RegisterM
file
file
W_valE
E
valC
valP
ret 1
M
JXX
E
ret
D
Combination A
Condition
icode
d_rvalA
Instruction
Instruction
memory
memory
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
Select
PC
F
W_valM
predPC
F
D
E
M
W
stall
bubble
normal
normal
normal
Mispredicted Branch normal
bubble
bubble
normal
normal
Combination
bubble
bubble
normal
normal
Processing ret
– 36 –
stall

Should handle as mispredicted branch

Stalls F pipeline register

But PC selection logic will be using M_valM anyhow
CS:APP
Control Combination B
ret 1
Load/use
M
E
D
M
E
D
Load
Use
ret
Combination B
Condition
F
D
E
M
W
Processing ret
stall
bubble
normal
normal
normal
Load/Use Hazard
stall
stall
bubble
normal
normal
Combination
stall
bubble + bubble
stall
normal
normal
– 37 –

Would attempt to bubble and stall pipeline register D

Signaled by processor as pipeline error
CS:APP
Handling Control Combination B
ret 1
Load/use
M
E
D
M
E
D
Load
Use
ret
Combination B
Condition
F
D
E
M
W
Processing ret
stall
bubble
normal
normal
normal
Load/Use Hazard
stall
stall
bubble
normal
normal
Combination
stall
stall
bubble
normal
normal


– 38 –
Load/use hazard should get priority
ret instruction should be held in decode stage for additional
cycle
CS:APP
Corrected Pipeline Control Logic
bool D_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
# Stalling at fetch while ret passes
IRET in { D_icode, E_icode, M_icode
# but not condition for a load/use
&& !(E_icode in { IMRMOVL, IPOPL }
&& E_dstM in { d_srcA, d_srcB
Condition
through pipeline
}
hazard
});
F
D
E
M
W
Processing ret
stall
bubble
normal
normal
normal
Load/Use Hazard
stall
stall
bubble
normal
normal
Combination
stall
stall
bubble
normal
normal


– 39 –
Load/use hazard should get priority
ret instruction should be held in decode stage for additional
cycle
CS:APP
Pipeline Summary
Data Hazards

Most handled by forwarding
 No performance penalty

Load/use hazard requires one cycle stall
Control Hazards

Cancel instructions when detect mispredicted branch
 Two clock cycles wasted

Stall fetch stage while ret passes through pipeline
 Three clock cycles wasted
Control Combinations

Must analyze carefully

First version had subtle bug
 Only arises with unusual instruction combination
– 40 –
CS:APP
Related documents