Download 1 COMP381 by M. Hamdi

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
(Recap)
Control Hazards
COMP381 by M. Hamdi
1
Control (Branch) Hazard
A: beqz r2, label
B:
--------label:
P:
Problem:
The outcome of the branch (taken/not taken) is determined deep in the pipeline
Do not know whether to execute B or execute P?
Solutions:
• Stall pipeline for required number of cycles
• Methods to reduce the branch penalty:
1. Speculatively predict branch outcome and recover if mispredicted
2. Delayed Branch: Always execute instruction(s) following the branch
COMP381 by M. Hamdi
2
Branch Hazards: Stall pipeline
A: beqz r2, label
stall
stall
stall
B
----label: P
A
B
P
1
IF
2
ID
Execution sequence:
A, NOP, NOP, NOP, B …..
or A, NOP, NOP, NOP, P
3
EX
4
MEM
5
WB
6
7
8
9
IF
ID
EX
MEM
WB
COMP381 by M. Hamdi
3
Reducing Branch Penalty
1. Stall scheme had a branch penalty of 3 cycles.
CPI assuming branch frequency of 30% is 1.9 (with no data stalls)
2. Predicting branch not taken:
• Speculatively fetch and execute in-line instructions following the branch
• If prediction incorrect flush pipeline of speculated instructions
• Convert these instructions to NOPs by clearing pipeline registers.
• Fetch instructions from the branch target
COMP381 by M. Hamdi
4
Predict branch not taken
B
IF
A
ID
T=2
C
IF
B
ID
T=3
D
IF
C
IF
D
T=1
EX
MEM
WB
A
EX
MEM
WB
ID
B
EX
A
MEM
WB
ID
C
EX
B
MEM
Branch actually not
taken: No stalls
T=4
E
COMP381 by M. Hamdi
A
WB
5
Predict branch not taken
B
IF
A
ID
T=2
C
IF
B
ID
T=3
D
IF
C
ID
T=1
EX
MEM
WB
A
EX
MEM
WB
B
EX
MEM
WB
A
Branch actually taken:
FLUSH pipeline
Make B,C,D NOPS
T=4
P
IF
ID
EX
COMP381 by M. Hamdi
MEM
A
WB
6
Reducing Branch Stall Cycles
1- Find out whether a branch is taken earlier in the pipeline.
2- Compute the taken PC earlier in the pipeline.
In MIPS:
– In MIPS branch instructions BEQZ, BNE, test a register for
equality to zero.
– This can be completed in the ID cycle by moving the zero test
into that cycle.
– Both PCs (taken and not taken) must be computed early.
– Requires an additional adder because the current ALU is not
useable until EX cycle.
– This results in just a single cycle stall on branches.
COMP381 by M. Hamdi
7
Reducing Branch Penalty (contd … )
3. Predicting branch taken:
• Speculatively fetch and execute instructions at the branch target
– But haven’t calculated branch target address in MIPS
• MIPS still incurs 1 cycle branch penalty
• Other machines: branch target known before outcome
COMP381 by M. Hamdi
8
Reducing Branch Penalty (contd … )
4. Delayed Branch:
Branch delay slots: In-line instructions following a branch instruction
conditional branch instruction
sequential successor1
sequential successor2
……..
sequential successorn
branch target if taken
Always executed: Usually 1 delay slot (short pipelines)
• Compiler tries to put useful instructions in delay slot to mask delay
COMP381 by M. Hamdi
9
Delayed Branch
• Instruction in branch delay slot is always executed
• Compiler (tries to) move a useful instruction into delay slot.
(a) From before the Branch: Always helpful when possible
ADD R1, R2, R3
BEQZ R2, L1
DELAY SLOT
L1:
BEQZ R2, L1
ADD R1, R2, R3
L1:
• If the ADD instruction were: ADD R2, R1, R3 the move
would not be possible
COMP381 by M. Hamdi
10
Delayed Branch
(b) From the Target: Helps when branch is taken. May duplicate
instructions
ADD R2, R1, R3
ADD
R2, R1, R3
BEQZ R2, L1
DELAY SLOT
L1: SUB R4, R5, R6
L2:
BEQZ R2, L2
SUB R4, R5, R6
L1:
SUB R4, R5, R6
L2:
Instructions between BEQ and SUB (in fall through) must not use
R4.
COMP381 by M. Hamdi
11
Delayed Branch
( c ) From Fall Through: Helps when branch is not taken.
ADD R2, R1, R3
BEQZ R2, L1
DELAY SLOT
SUB R4, R5, R6
-
ADD R2, R1, R3
BEQZ R2, L1
SUB R4, R5, R6
-
L1:
L1:
Instructions at target (L1 and after) must not use R4 till set again.
• Cancelling (Nullifying) Branch:
Branch instruction indicates direction of prediction.
If mispredicted the instruction in the delay slot is cancelled.
Greater flexibility for compiler to schedule instructions.
COMP381 by M. Hamdi
12
Canceling branch
• Cancelling (Nullifying) Branch:
– Branch instruction indicates direction of prediction.
– Mispredicted branch “annuls” the instruction in delay slot.
Instruction behaves as a no-op.
– Gives more leverage to compiler to select instructions to fill delay
slots.
COMP381 by M. Hamdi
13
Delayed Branch
• Limitations of delayed branch
– Compiler may not find appropriate instructions to
fill delay slots. Then it fills delay slots with noops.
– Visible architectural feature – likely to change
with new implementations
• Pipeline structure is exposed to compiler. Need
to know how many delay slots.
COMP381 by M. Hamdi
14
Delayed Branch
• Compiler effectiveness for single branch delay slot:
– Fills about 60% of branch delay slots
– About 80% of instructions executed in branch delay slots
useful in computation
– About 50% (60% x 80%) of slots usefully filled
• Delayed Branch downside: As processor go to deeper
pipelines and multiple issue, the branch delay grows
and need more than one delay slot
– Delayed branching has lost popularity compared to more
expensive but more flexible dynamic approaches
– Growth in available transistors has made dynamic
approaches relatively cheaper
COMP381 by M. Hamdi
15
Dynamic Branch Prediction
• Builds on the premise that history matters
– Observe the behavior of branches in previous instances
and try to predict future branch behavior
– Try to predict the outcome of a branch early on in order
to avoid stalls
– Branch prediction is critical for multiple issue
processors
• In an n-issue processor, branches will come n times faster than
a single issue processor
COMP381 by M. Hamdi
16
Basic Branch Predictor
• Use a 1-bit branch predictor buffer or branch
history table
• 1 bit of memory stating whether the branch was
recently taken or not
• Bit entry updated each time the branch instruction
is executed
T
Prediction
Taken
Not Taken
State
1
0
Branch outcome
Taken Not Taken
1
0
1
0
NT
NT
State 1
State 0
Predict
Taken
Predict
Not Taken
T
COMP381 by M. Hamdi
17
1-bit Branch Prediction Buffer

Problem – even simplest branches are mispredicted
twice
LD R1, #5
Loop:
LD R2, 0(R5)
ADD R2, R2, R4
STORE R2, 0(R5)
ADD R5, R5, #4
SUB R1, R1, #1
BNEZ R1, Loop
First time: prediction = 0
but the branch is taken  change
prediction to 1 miss
Time 2, 3, 4: prediction = 1
and the branch is taken
Time 5: prediction = 1
but the branch is not taken  change
prediction to 0 miss
COMP381 by M. Hamdi
18
Dynamic Branch Prediction Accuracy
COMP381 by M. Hamdi
19