Download 2.2 - KTH

Traditional SOC Design Flow
• Key Problem: The timing assumptions made during pre-layout synthesis differ widely from the post-layout reality.
• As a result, achieving timing closure becomes a challenge.
• This happens because the interconnect delay dominates the overall propagation delay in DSM (Deep Sub-Micron) technologies.
Flow:
1. Concept + Market Research
2. Architectural Specs & RTL coding
3. RTL Simulation
4. Logic Synthesis, Optimization & Scan Insertion
5. Formal Verification (RTL Vs Gates)
6. Pre-layout STA. Timing OK? No: iterate. Yes: continue.
7. Floorplanning, Placement, CT Insertion & Global Routing
8. Transfer Clock Tree to DC
9. Formal Verification (Scan Inserted Netlist Vs CT Inserted Netlist)
10. Post Global Route STA. Timing OK? No: iterate. Yes: continue.
11. Detailed Routing
12. Post-layout STA. Timing OK? No: iterate. Yes: Tape out.
Source: Advanced ASIC Chip Synthesis. 2nd Ed. Himanshu Bhatnagar. Kluwer Academic Publishers
Develop HDL files
Specify Libraries
Library Objects
link_library
target_library
symbol_library
synthetic_library
Read Design
analyze
elaborate
read_file
Define Design Environment
set_operating_conditions
set_wire_load_model
set_drive
set_driving_cell
set_load
set_fanout_load
set_min_library
Set Design Constraints
Design Rule Constraints
set_max_transition
set_max_fanout
set_max_capacitance
Design Optimisation Constraints
create_clock
set_clock_latency
set_propagated_clock
set_clock_uncertainty
set_clock_transition
set_input_delay
set_output_delay
set_max_area
Select Compile Strategy
Top Down
Bottom Up
Optimize the Design
compile
Analyze and Resolve Design Problems
check_design
report_area
report_constraint
report_timing
Save the Design Database
write
Design Compiler Setup Files
• .synopsys_dc.setup
– Library paths
– Company wide, project wide design environment related variables and commands
– UNIX variables
• Three files at three locations. All three are read, in the following order:
– Synopsys root - $SYNOPSYS/admin/setup
• Affects all users. Only the system administrator can modify this. In small startups with only a single ASIC project, this serves as the place to enforce project-wide discipline.
– Home Directory
• Content affects all DC activities. Project-wide enforcement could happen at this level if the designer is involved in a single project (less likely).
– Working Directory
• Affects the current invocation of DC. If a person is working on more than one Synopsys project (more likely), then the project-wide enforcement should happen at this level. One working directory for each project.
• Repeated commands are overridden: a setting in a file read later replaces the same setting from a file read earlier.
Libraries & Search Path
• Technology Library
Created by the ASIC vendor in Synopsys format, which is now an open standard. Cells are defined by their names, function, timing, net delay, parasitic information, and units for time, resistance, capacitance etc.
• Target Library
A technology library that Design Compiler maps to during optimization.
• Link Library
The technology library that contains the definition of the cells used in the mapped design. In principle it should be the same as target_library unless a technology translation is being performed.
[Figure: example library cell EON1 with inputs a, b, c, d and output z]
• Symbol Library
Definition of graphics symbols. Cells in Symbol Library must match
• DesignWare Library
A DesignWare component library is a collection of reusable circuit-design building blocks that are tightly integrated into the Synopsys synthesis environment.
• GTECH Library
The GTECH library is the Synopsys generic technology library. It is technology-independent and included with Design Compiler software.
GTECH parts are Synopsys unmapped representations of Boolean functions (library cell placeholders). GTECH instantiation allows for a technology-independent HDL description and the accuracy of instantiation.
• search_path
If the library variables only specify file names, search_path is used to locate the libraries. By default it points to the current working directory and $SYNOPSYS/libraries/syn
Synopsys Design Objects
• Design
A circuit that performs one or more logical functions
• Cell
An instance of a design or library primitive within a design
• Reference
The name of the original design that a cell instance points to
• Port
The input or output of a design
• Pin
The input or output of a cell
• Net
A wire that connects ports to pins or pins to pins
• Clock
A timing reference object to describe a waveform for timing analysis
Synopsys Design Objects - Schematic
[Figure: schematic of design Top containing cells U1 (Parity), U2 (Regfile), U3 and U4 (INV), connected by nets bus0, bus1, inv0 and inv1; design Parity contains cells U5 and U6. Callouts label a Design, Cell, Pin, Port, Net and Clock.]
Designs: {"Top", "Parity", "Regfile"}
Design   Cells                       References
Top      {"U1", "U2", "U3", "U4"}    {"Parity", "Regfile", "INVX1"}
Parity   {"U5", "U6"}                {"EXNOR3", "INVX1"}
Reference and Design: Parity and Regfile are references inside Top and are also designs themselves.
Synopsys Design Objects - VHDL
ENTITY Top IS                                      -- Design
  PORT(
    A, B, C, Clk : IN STD_LOGIC;                   -- Port
    Z : OUT STD_LOGIC_VECTOR(1 DOWNTO 0));
END Top;

ARCHITECTURE structural OF Top IS
  ...
  SIGNAL bus0, bus1, inv0, inv1: STD_LOGIC;        -- Net
BEGIN
  U1 : Parity                                      -- Cell : Reference
    PORT MAP( Ain => A,                            -- Pin => Net
              Bin => B,
              Cin => C,
              Q0  => bus0,
              Q1  => bus1);
  U2 : Regfile
    PORT MAP(...
END structural;

Design: Name of Entity, function or procedure
Cell: Instantiated component or subroutine
Reference: Name of used component or subroutine
Port: Input/Output port
Pin: Port inside the reference
Net: Local signals or variables
Clock: No interpretation
Reading Assignment
Read about these commands from Synopsys Documentation
Find and Filter
Read / Analyze / Elaborate
Compile
report_timing
Also read about what attributes and variables are
Outline of this course module
• Synopsys Design Environment Essentials
• CMOS essentials for logic synthesis
• Constraint Classification
• Load and Drive Constraints
• Clocking constraints
• Operating Conditions Constraints
• Static Timing Analysis
• Chip Level Timing and Multiple Clock Domains
MOSFET Transistor
Source: MIT. Course 6.375. Lecture L06. 2006
Key qualitative Characteristics of MOSFET
transistors
Source: MIT. Course 6.375. Lecture L06. 2006
RC Model of an inverter
Source: MIT. Course 6.375. Lecture L06. 2006
Wires
Source: MIT. Course 6.375. Lecture L06. 2006
Distributed RC wire model
This is also known as
Elmore Delay model
Source: MIT. Course 6.375. Lecture L06. 2006
Manual insertion of Repeaters
Source: MIT. Course 6.375. Lecture L06. 2006
Lumped RC wire model
Source: MIT. Course 6.375. Lecture L06. 2006
Estimate the rise time
Source: MIT. Course 6.375. Lecture L06. 2006
Transistor capacitances:
1. The width of each transistor is found by multiplying its scaling factor (16/8/2/1) by the minimum transistor width, which is 0.5 µm.
2. Multiply Cg,N/Cg,P/Cd,N/Cd,P by the width of the transistor to get the gate/drain capacitances for the N and P transistors.
3. Wider transistor → more capacitance
Transistor resistances:
1. Divide Reff,N/Reff,P by the width of the transistor to get the resistance of the N and P transistors.
2. Wider transistor → less resistance
The factor 2.2 comes from the 10% to 90% Vdd swing: ln(0.9 Vdd / 0.1 Vdd) ≈ 2.2
The sheet resistance (0.07) is for a unit square. Since the wire width is 0.25 µm, the resistance of a 1 µm × 0.25 µm wire is 0.07/0.25. This factor is multiplied by the length, 250 µm.
The wire capacitance is made up of two parts. The bottom (area) capacitance is found as 250 × 0.25 (area) × CA,M2. The side capacitance is found by multiplying the length 250 × CL,M32.
Source: MIT. Course 6.375. Lecture L06. 2006
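The wire-resistance step and the 2.2 factor can be checked numerically. This is a minimal sketch: the sheet resistance, wire width and wire length are the values quoted on the slide (lengths read as micrometres), while the driver resistance and lumped capacitance in the last line are hypothetical placeholders, not values from the lecture.

```python
import math

def wire_resistance(sheet_res_ohm_per_sq, width_um, length_um):
    """Wire resistance = sheet resistance * number of squares (length / width)."""
    return sheet_res_ohm_per_sq * length_um / width_um

def rise_time_10_90(r_ohm, c_farad):
    """10% to 90% rise time of a single-pole RC stage: ln(0.9 / 0.1) * R * C."""
    return math.log(0.9 / 0.1) * r_ohm * c_farad

r_wire = wire_resistance(0.07, 0.25, 250.0)  # values quoted in the text
factor = math.log(0.9 / 0.1)                 # the "2.2" factor

# Hypothetical driver resistance (1 kOhm) and lumped load (100 fF), illustration only.
t_rise = rise_time_10_90(r_ohm=1000.0 + r_wire, c_farad=100e-15)
```

With the quoted numbers the wire contributes about 70 Ω, which is small next to a typical driver resistance, so in this lumped estimate the wire matters mainly through its capacitance.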
Constraints
• Technology, Operating
and Manufacturing
Constraints
– Max rise time, max
capacitance
– Operating Conditions –
• Vdd, Temperature
• Drive current, Load
– Process Variations
• Fast corner, Slow corner
– Physical Design
• Antenna rules
• Optimisation Constraints
– Performance – clock
– Area
– Power
Generic Synthesis Flow
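[Figure: iterative loop. The Design, the Technology, Operating & Manufacturing Constraints and the Optimisation Constraints feed "Create a solution"; "Evaluate the solution" (Analysis) then checks "Constraints Met" and the loop repeats until they are.]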
Static Timing Analysis (STA)
• Exhaustively verifies that
– the timing constraints (clock) are met for a design
– for given technology (Standard Cell Library) and
– a set of specified operating conditions
• Limitations of the alternative – Simulation
– Not Exhaustive
– Accuracy
• RTL
PROCESS (clk)
BEGIN
  IF rising_edge (clk) THEN
    s <= a * b;
  END IF;
END PROCESS;
• Gate Level
– SDF back annotation
– Dependent on STA
• Circuit Level SPICE simulations are impractical
– Time (STA also takes time, but is bounded)
Timing Models - Accuracy
• Untimed
• Transaction Level - SystemC
– Multiple Cycles
– Bus Transactions, Transmit/Receive, Encode/Decode
• Cycle Accurate – RTL
– What happens in each clock cycle is accurately known
• Gate Level – Event Driven
– Physical details of computation, storage and interconnect operations
known
– Delay in wire is not known
– Clock is ideal
• Layout Level
– Delay in wire known
– Clock is real
– Relative position of standard cell is known
Delay Parameters – Intrinsic Delay & Slew
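[Figure: gate with inputs A = 1 and B and output Z, with its transistor-level view (nodes P, Q, R, x, y, z). Waveforms of B and Z are annotated with the 0.5 Vdd level, the 0.3 Vdd and 0.7 Vdd levels, and the times t1 and t2.]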
Path Delay Calculation
• The intrinsic delays and the slews are characterised using SPICE simulation by sweeping the many parameters that affect the intrinsic delay and slew
• All the paths are exhaustively covered
[Figure: path delay calculation flow. Inputs: Library and Design; Environment Conditions for Analysis. Steps: A. Delay Computation Through Gate; B. Delay and Slew At Gate Output; C. Delay Computation Through Wire; D. Delay and Slew At Next Gate Input.]
Paths & Path Groups
[Figure: example circuit with two D flip-flops clocked by clk and combinational logic on nets a to t, used to illustrate paths and path groups]
• Paths
Start point: input ports or clock pins of sequential devices.
End point: output ports or data input pins of sequential devices.
• Path groups
Paths are organised in groups identified by the clocks controlling their endpoints.
Timing Arcs
• Positive unate timing arc:
Combines rise delays with rise delays, and fall delays with fall delays. An example is an AND gate cell delay or an interconnect (net) delay.
• Negative unate timing arc:
Combines incoming rise delays with local fall delays, and incoming fall delays with local rise delays. An example is a NAND gate.
• Nonunate timing arc:
Combines local delay with the worst-case incoming delay value. Nonunate timing arcs are present in logic functions whose output value change cannot be predicted by the direction of the change on the input value. An example is an XOR gate.
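The three arc types differ only in which incoming arrival time is paired with which local delay. The following is a minimal sketch for late (maximum delay) analysis; the function name, argument names and the sample numbers are hypothetical, chosen only to illustrate the pairing rules stated above.

```python
def propagate(arc_type, in_rise, in_fall, cell_rise, cell_fall):
    """Return (out_rise, out_fall) arrival times through one timing arc.

    in_rise / in_fall: arrival times of rising / falling transitions at the input.
    cell_rise / cell_fall: local delay for a rising / falling output transition.
    """
    if arc_type == "positive_unate":
        # A rising input produces a rising output, falling produces falling.
        return in_rise + cell_rise, in_fall + cell_fall
    if arc_type == "negative_unate":
        # A falling input produces a rising output, and vice versa.
        return in_fall + cell_rise, in_rise + cell_fall
    if arc_type == "non_unate":
        # Output direction is unknown: use the worst-case incoming arrival.
        worst = max(in_rise, in_fall)
        return worst + cell_rise, worst + cell_fall
    raise ValueError(f"unknown arc type: {arc_type}")

and_like = propagate("positive_unate", 1.0, 2.0, 0.5, 0.25)
nand_like = propagate("negative_unate", 1.0, 2.0, 0.5, 0.25)
xor_like = propagate("non_unate", 1.0, 2.0, 0.5, 0.25)
```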
• Accuracy of estimates is critical
• Intrinsic Delays are accurate after logic synthesis
• Slew and Net Delays are estimated and known accurately only after
physical synthesis
Factors Affecting Delay and Slew
Discrete Factors:
1. Geometry & Dimension
2. Specific Path
3. Transition Direction
4. Related Pin
[Figure: "4 Input NAND gate" transistor schematic with labels A, B, Z, P1, P2, N1 and N2]
Factors Affecting Delay and Slew
Load on the Gate
• Load of all the inputs that this output has to drive
• Load of the interconnect wires
• Tri-stated wires
Input Slew
• Transition time at the previous gate
• The interconnect
• Primary input – drive strength, driver cell
Constraints
Technology Constraints
• Max Transition
• Max Fanout
• Max Capacitance
• Min Capacitance
Design Constraints
• Set Load
• Set Drive (inverse of resistance)
• Technology Constraint: cannot be relaxed
• Design Constraint
[Figure: a design with input A and outputs Z1, Z2 and Z3; the input is annotated with set_driving_cell or set_drive, the outputs with set_load]
• If drive or driving cell is not specified, the synthesis tool
assumes infinite drive strength
• If load is not specified, the synthesis tool assumes zero load
Interpolation and Extrapolation
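Piece Wise Linear Model
[Figure: delay table with axes Load (L1, L, L2) and Slew (S1, S, S2). The corner delays D11, D12, D21 and D22 give the intermediate values D1 and D2, from which the delay D at the point (S, L) is obtained.]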
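A piecewise-linear delay table can be evaluated by bilinear interpolation between the four surrounding table entries. The sketch below assumes that D11, D12, D21 and D22 are the delays at (S1, L1), (S1, L2), (S2, L1) and (S2, L2) respectively; that index convention and the sample numbers are assumptions, not taken from a real library. Points outside the table range are extrapolated with the same formula.

```python
def interpolate_delay(s, l, s1, s2, l1, l2, d11, d12, d21, d22):
    """Bilinear interpolation of delay at slew s and load l.

    Assumed convention: dij is the table delay at slew Si and load Lj.
    """
    wl = (l - l1) / (l2 - l1)        # relative position along the load axis
    ws = (s - s1) / (s2 - s1)        # relative position along the slew axis
    d1 = d11 + wl * (d12 - d11)      # delay at slew S1, load l
    d2 = d21 + wl * (d22 - d21)      # delay at slew S2, load l
    return d1 + ws * (d2 - d1)       # delay at slew s, load l

# Hypothetical 2 x 2 table corner values.
mid = interpolate_delay(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
corner = interpolate_delay(0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
beyond = interpolate_delay(2.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
```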
Process, Voltage, Temperature (PVT) Variation & Operating Conditions
[Figure: three plots of Delay against Process, Temperature and Voltage, each marked with best, nominal and worst points]
Operating Conditions
Name    Library  Process  Temp  Volt  Interconnect Model
WCCOM   my_lib   1.50     70    1.1   worst_case_tree
WCIND   my_lib   1.50     80    1.1   worst_case_tree
WCMIL   my_lib   1.50     125   1.0   worst_case_tree
BCCOM   my_lib   1.50     0     1.2   best_case_tree
BCIND   my_lib   1.50     -40   1.2   best_case_tree
BCMIL   my_lib   1.50     -55   1.3   best_case_tree
PVT Variation: An Example
Consider a minimum size NMOS device in a 1.2 µm CMOS process with VGS = VDS = 5 V.
The nominal saturation current for the device size W = 1.8 µm, Leff = 0.9 µm is
$$I_d = \frac{1}{2} k'_n \frac{W}{L} (V_{gs} - V_t)^2 = \frac{1}{2} (19.6 \times 10^{-6}) (2) (5 - 0.75)^2 = 354\ \mu\text{A}$$
Now consider the variation in the following parameters:
• 25 % variation in threshold voltage Vt
• 10 % variation in transconductance k'n, mainly due to variation in oxide thickness
• ±0.15 µm (about 10 %) variation in W and L. Variations in W and L are uncorrelated.
• ±0.5 V (10 %) variation in power supply voltage
With k'n expressed in µA/V²:
$$I_{D,MAX} = \frac{19.6 + 1.96}{2} \cdot \frac{1.8 + 0.15}{0.9 - 0.15} \cdot \big((5 + 0.5) - (0.75 - 0.1875)\big)^2 = 683\ \mu\text{A}$$
$$I_{D,MIN} = \frac{19.6 - 1.96}{2} \cdot \frac{1.8 - 0.15}{0.9 + 0.15} \cdot \big((5 - 0.5) - (0.75 + 0.1875)\big)^2 = 176\ \mu\text{A}$$
Speed of device is proportional to the drain current and can thus result in
variation of the speed of the circuit.
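The three currents in this example can be reproduced with a few lines of arithmetic. This is a sketch using only the numbers quoted above; the function and variable names are illustrative.

```python
def saturation_current(k, w, l, vgs, vt):
    """Square-law saturation current: 0.5 * k' * (W / L) * (Vgs - Vt)^2."""
    return 0.5 * k * (w / l) * (vgs - vt) ** 2

K, W, L, VDD, VT = 19.6e-6, 1.8, 0.9, 5.0, 0.75   # A/V^2, um, um, V, V

i_nom = saturation_current(K, W, L, VDD, VT)
# Fast corner: +10% k', wider and shorter device, +0.5 V supply, -25% Vt.
i_max = saturation_current(K * 1.10, W + 0.15, L - 0.15, VDD + 0.5, VT * 0.75)
# Slow corner: -10% k', narrower and longer device, -0.5 V supply, +25% Vt.
i_min = saturation_current(K * 0.90, W - 0.15, L + 0.15, VDD - 0.5, VT * 1.25)

spread = i_max / i_min   # ratio between fastest and slowest device
```

The fast-corner current is almost four times the slow-corner current, which is why libraries are characterised at several operating conditions.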
Derating
Libraries are characterized for various operating conditions
Further characterisation is done to see how the delay model
responds to change in process, voltage and temperature. This is
done by holding two parameters constant and sweeping the third.
This yields derating factors for Process, Voltage and Temperature
Sequential Arcs
Timing relationship between
1. two input pins
2. two consecutive events on the same input pin
1. Pulse Width
2. Setup
3. Hold
4. Recovery
5. Removal
Pulse Width
1. Width of High and low phases of clocks
2. Width of Active level of asynchronous inputs like reset
[Figure: rst_n waveform with the pulse width requirement marked. If it is not met, the reset may have no effect.]
Setup
Data should be stable for the setup time before the arrival of the clock edge.
What happens if the setup time is violated?
[Figure: clk and data waveforms with the setup requirement marked. If it is not met, new data may not get latched.]
Hold
Data should be stable for the hold time after the arrival of the clock edge.
What happens if the hold time is violated?
[Figure: clk and data waveforms with the hold requirement marked. If it is not met, old data may not get latched.]
Recovery and Removal
Recovery: minimum time between de-assertion of an asynchronous control signal and the next active clock edge. Can be formulated as a setup check.
[Figure: rst_n and clk waveforms with the recovery requirement marked. If it is not met, clk may not have effect.]
Removal: minimum time after an active clock edge for which an asynchronous control signal should remain asserted. Can be formulated as a hold check.
[Figure: clk and rst_n waveforms with the removal requirement marked. If it is not met, clk may override rst_n.]
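What is the reason for setup and hold
[Figure: two cross-coupled inverters (Vin2 = Vout1, Vin1 = Vout2) and their voltage transfer curves plotted on the axes Vin1, Vout2 and Vin2, Vout1, with the operating points a, b and c]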
Transistor Level Schematic of a D-Flop
http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Working of the D-Flop at Transistor Level
http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Setup and Hold Time at Circuit Level
The time it takes data D to reach node Z is called the setup time.
The time it takes data D to reach node W is called the hold time.
http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Negative Hold Time
http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Generalizing Setup & Hold Constraints
[Figure: boundary of the flop. data reaches the latching element F1 through Delay D1; clk reaches F1 through Delay C1.]
Setup Constraint
1. Assume C1 is zero.
2. clk reaches F1 before data has arrived at F1 and registers wrong data.
3. To avoid this, data should stabilize D1 time before the arrival of clk.
4. In reality, C1 is never zero, so data should stabilize D1 – C1 time before the arrival of clk.
5. As there are multiple D1 paths and multiple C1 paths, the complete and safe setup constraint is max (data path delays) – min (clock path delays).
Hold Constraint
1. Assume D1 is zero.
2. Data reaches F1 before clk has arrived at F1. When the clk arrives, new data has overwritten the previous data.
3. To avoid this, data should remain stable C1 time after the arrival of clk.
4. In reality, D1 is never zero, so data should remain stable C1 – D1 time after the arrival of clk.
5. The complete and safe hold constraint is max (clock path delays) – min (data path delays).
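The two constraints can be evaluated directly from the internal path delays of a flop. The sketch below uses hypothetical delay values (in ns); it also shows that the hold constraint seen at the device interface comes out negative when the slowest clock path is faster than the fastest data path.

```python
def interface_constraints(data_path_delays, clock_path_delays):
    """Setup and hold constraints seen at the boundary of the flop.

    setup = max(data path delays) - min(clock path delays)
    hold  = max(clock path delays) - min(data path delays)
    """
    setup = max(data_path_delays) - min(clock_path_delays)
    hold = max(clock_path_delays) - min(data_path_delays)
    return setup, hold

# Hypothetical internal delays: slow data paths, well-buffered clock paths.
setup, hold = interface_constraints([0.75, 1.0], [0.25, 0.5])

# The width of the window in which data must be stable cannot be negative.
window = setup + hold
```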
Negative Hold
[Figure: boundary of the flop with Delay D1 on the data path and Delay C1 on the clock path to the latching element F1. clk and data waveforms (Stable, New) are shown at the device interface and at the latching element. Caption: Negative Hold – Seen At Device Interface]
Setup + Hold (cannot be negative) = Max(clock path) + Max(data path) – Min(clock path) – Min(data path)
1. Typically clock paths are well buffered and faster.
2. There can be substantial data path delay, especially in scan flops.
3. max (data path delays) – min (clock path delays) is always positive. This implies that the setup constraint is never negative.
4. max (clock path delays) – min (data path delays) can be negative. This implies that the hold constraint can be negative.
Specifying Input Delay
[Figure: FF 1 in inBlock drives logic "m", which feeds logic "n" and FF 2 inside myDesign. The timing diagram shows clk, clk-to-Q, m, n and t_setup, with the input delay (inpdelay) marked.]
• Good design practice mandates that inBlock does not have combinatorial logic ("m") driving its output.
• These days "m" is more likely to be the result of global interconnect delay.
• Early floorplanning is a good way to estimate the delay due to "m".
• If floorplanning is not done, a good bet is 50-60% of the clock cycle.
• The characterize command automatically calculates the input delay from the parent design.
set_input_delay -clock Clock 8 "data_in_2"
Specifying Output Delay
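[Figure: FF 1 in myDesign drives logic "s", which feeds logic "t" and FF 2 inside outBlock. The timing diagram shows clk, clk-to-Q, s, t and t_setup, with the output delay (outpdelay) marked.]
set_output_delay -clock Clk -max -fall 10 {"Z<0>" "Z<1>"}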
General Timing Constraints
[Figure: I1 → C0 → F1 → C1 → F2 → C2 → F3 → C3 → O1, with F1, F2 and F3 clocked by clk; I2 → C4 → O2]
Four kinds of path groups exist:
1. Input to Output, e.g., I2 to O2
2. Input to Register, e.g., I1 to F1
3. Register to Register, e.g., F1 to F2
4. Register to Output, e.g., F3 to O1
TI1, TI2 are input delays
DQ1, DQ2 and DQ3 are clk-to-Q delays
S1, S2 and S3 are setup constraints
H1, H2 and H3 are hold constraints
C0-C4 are combinatorial delays
P is the clock period
Input to Output (I2 to O2):
O2 = TI2 + C4
Input to Register (I1 to F1):
TI1 + C0 ≤ P – S1
TI1 + C0 ≥ H1
Setup Slack: P – S1 – TI1 – C0
Hold Slack: TI1 + C0 – H1
Register to Register (F1 to F2):
DQ1 + C1 ≤ P – S2
DQ1 + C1 ≥ H2
Setup Slack: P – S2 – DQ1 – C1
Hold Slack: DQ1 + C1 – H2
Setup and Hold Slacks should be positive
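The slack expressions have the same shape for input-to-register and register-to-register paths: only the launch term changes (input delay TI or clk-to-Q delay DQ). A minimal sketch with hypothetical numbers (in ns) follows; the function name is illustrative.

```python
def path_slacks(period, launch_delay, comb_delay, setup, hold):
    """Return (setup_slack, hold_slack) for one path.

    launch_delay is the input delay (TI) for an input-to-register path
    or the clk-to-Q delay (DQ) for a register-to-register path.
    """
    setup_slack = period - setup - launch_delay - comb_delay
    hold_slack = launch_delay + comb_delay - hold
    return setup_slack, hold_slack

P = 10.0  # hypothetical clock period

# Input to register (I1 to F1): TI1 = 6.0, C0 = 2.5, S1 = 0.5, H1 = 0.25
in_to_reg = path_slacks(P, launch_delay=6.0, comb_delay=2.5, setup=0.5, hold=0.25)

# Register to register (F1 to F2): DQ1 = 0.5, C1 = 9.25, S2 = 0.5, H2 = 0.25
reg_to_reg = path_slacks(P, launch_delay=0.5, comb_delay=9.25, setup=0.5, hold=0.25)

# A negative slack marks a violation.
violations = [name for name, (s, h) in
              {"in_to_reg": in_to_reg, "reg_to_reg": reg_to_reg}.items()
              if s < 0 or h < 0]
```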
Gate Level Simulation
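[Figure: the Gate Level Design and the Timing Library feed the Timing Analysis Tool, which produces an SDF File; the Simulator takes the Gate Level Design, the Simulation Library and the SDF File]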
Clock Distribution
Source: MIT. Course 6.375. Lecture L06. 2006
Clock Skew
Clock Drivers
The basic assumption in a synchronous system is that all the sequential elements in the design sample their inputs at the same time, marked by a clock signal. In reality, the clock signal does not arrive at the sequential elements at the same time. The difference in time between the reference clock signal and the local clock signal at a sequential element is called the clock skew.
In fact, clock skew would not be a problem if the clock signal were uniformly delayed at all the sequential elements. It is the non-uniform delay of the clock signal that creates the problem. The delay depends on the distance of the sequential element from the clock source and on the local load.
The primary reason for the delay is the large load seen by the clock signal. The load consists of all the sequential elements in the design and the clock net itself, which behaves as a distributed RC line (or a higher-order model) and can be several centimetres long in a large chip.
The total capacitance of a single clock line easily measures hundreds of pF and can reach into the nF range. The total clock capacitance of the Alpha processor equals 3.25 nF, which is 40% of the total switching capacitance of the entire chip.
Clock Skew in Alpha Processor
Clock Skew
Source: MIT. Course 6.375. Lecture L06. 2006
Clock Jitter
Source: MIT. Course 6.375. Lecture L06. 2006
Clock Skew and Sequential Circuit Performance
Each synchronous module is composed of combinational logic CL and a flop, and is characterised by six timing parameters: the min. and max. propagation (pg) delays of the register, $t_{r,min}$ and $t_{r,max}$, and of the combinational logic, $t_{l,min}$ and $t_{l,max}$; the propagation delay of the interconnect, $t_i$; and the local clock skew, $t_\phi$.
The max pg. delay corresponds to the time taken by the slowest output to respond to any transition at the input. This delay constrains the max. allowable clock speed.
The min pg. delay corresponds to the time taken by at least one output to start responding to a transition at the input. This delay is typically much smaller than the max delay and determines the amount of skew a circuit can tolerate before a race condition occurs. If $\delta$ is greater than $t_{r,min} + t_i + t_{l,min}$, then the inputs at R2 can change before the previous inputs are latched.
With $t_{\phi''} = t_{\phi'} + \delta$:
$$t_{\phi''} \le t_{\phi'} + t_{r,min} + t_i + t_{l,min} \quad \text{or} \quad \delta \le t_{r,min} + t_i + t_{l,min}$$
$$t_{\phi''} + T \ge t_{\phi'} + t_{r,max} + t_i + t_{l,max} \quad \text{or} \quad T \ge t_{r,max} + t_i + t_{l,max} - \delta$$
[Figure: pipeline In → CL1 → R1 → CL2 → R2 → CL3 → R3 → Out with local clocks $\phi'$, $\phi''$ and $\phi'''$. (a) Race between clock and data. (b) Data should be stable before clock pulse is applied.]
Positive and Negative Clock Skew
• Positive Skew: $\delta > 0$
In this case the clock is routed in the same direction as the data and the first equation needs to be satisfied. Violating it will result in malfunctioning of the circuit. Observe that slowing down the clock does not help. The positive skew actually helps improve the clock speed, as it is a negative term in the constraint on the clock period T.
• Negative Skew: $\delta < 0$
The negative skew occurs when the data is routed in the direction opposite to the clock signal. The first equation is unconditionally satisfied and the circuit works correctly independent of the skew. Unfortunately, negative skew limits the clock speed and thus lowers the performance, as predicted by the second equation: the skew reduces the time available for computation by $|\delta|$.
$$\delta \le t_{r,min} + t_i + t_{l,min}$$
$$T \ge t_{r,max} + t_i + t_{l,max} - \delta$$
[Figure: (a) Positive Skew: Data flows through a chain of CL and R stages with the clock routed in the same direction. (b) Negative Skew: the same chain with the clock routed in the opposite direction.]
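The effect of the sign of the skew on the race condition and on the minimum clock period can be tried out numerically. The sketch below implements the two skew constraints; the stage delays and periods are hypothetical values in ns, and the function names are illustrative.

```python
def max_safe_skew(tr_min, ti, tl_min):
    """Largest skew that avoids a race: delta <= tr_min + ti + tl_min."""
    return tr_min + ti + tl_min

def min_clock_period(tr_max, ti, tl_max, delta):
    """Smallest usable period: T >= tr_max + ti + tl_max - delta."""
    return tr_max + ti + tl_max - delta

def check_stage(delta, period, tr_min, tr_max, tl_min, tl_max, ti):
    """Return (race_free, period_ok) for one register-to-register stage."""
    race_free = delta <= max_safe_skew(tr_min, ti, tl_min)
    period_ok = period >= min_clock_period(tr_max, ti, tl_max, delta)
    return race_free, period_ok

# Hypothetical stage: register 0.25-0.5, logic 0.5-4.0, interconnect 0.25.
STAGE = dict(tr_min=0.25, tr_max=0.5, tl_min=0.5, tl_max=4.0, ti=0.25)

positive = check_stage(delta=0.5, period=4.5, **STAGE)    # skew helps the period
negative = check_stage(delta=-0.5, period=4.5, **STAGE)   # skew costs period
excessive = check_stage(delta=1.5, period=10.0, **STAGE)  # race, any period
```

The last case illustrates the remark that slowing down the clock does not help once the positive skew exceeds the minimum path delay.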
[Figure: launch clock and capture clock waveforms for three cases: setup time met and hold time met; setup time violated and hold time violated; setup time violated and hold time met]
[Figure: startpoint FF 1, logic, endpoint FF 2, with the setup relationship and the hold relationship marked between the launch and capture edges]
Setup Violations result from worst case timing
Hold Violations result from best case timing
Chip Level Timing Issues
Blocks 4 & 8 communicate and need their clocks to be skew aligned.
The data signals between Blocks 4 & 8 could take more than one clock cycle and can get routed through blocks 5 and 6.
[Figure: chip floorplan with a CGU and blocks 1 to 8]
This makes chip level timing closure difficult and sensitive to geometry.
A hierarchical design style is preferable, where each chiplet is timing closed independently and the chip is composed from such chiplets. Solution: latency insensitive design.
Categories of Synchronization
[Figure: synchronization schemes classified as Clock Based or Data Based: GS, GALS, GRLS (KTH Technology), Double Latch, Handshake: 2 Phase, 4 Phase, Asynchronous – 2 Clock FIFO. Further labels: Latency ambiguity, Constraints, Complexity.]
Send and Forget – Double Latching
ACL: Asynchronous Communication Link
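[Figure: Source S with payload PS, clocked by CLKS, connected through the ACL to Destination D with payload PD; the destination samples the incoming signal with two cascaded D flip-flops clocked by CLKD. A plot of Vout against Vin marks the levels VIH, VMS and VIL and the time constant τ.]
$$v(t) = V_{MS} + (v(0) - V_{MS})\, e^{t/\tau}$$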
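The exponential above can be inverted to estimate how long a node that starts close to the metastable voltage needs before it reaches a valid logic level. This is a sketch under assumed numbers: the voltages and the time constant are hypothetical, and the function names are illustrative.

```python
import math

def v_at(t, v0, v_ms, tau):
    """Node voltage at time t: V_MS + (v(0) - V_MS) * exp(t / tau)."""
    return v_ms + (v0 - v_ms) * math.exp(t / tau)

def resolve_time(v0, v_ms, v_target, tau):
    """Time for the node to move from v0 to v_target (both on the same side of V_MS)."""
    ratio = (v_target - v_ms) / (v0 - v_ms)
    if ratio <= 0:
        raise ValueError("v0 and v_target must lie on the same side of V_MS")
    return tau * math.log(ratio)

# Hypothetical values: V_MS = 1.25 V, V_IH = 2.25 V, tau = 0.1 ns.
t_10mv = resolve_time(v0=1.26, v_ms=1.25, v_target=2.25, tau=0.1)   # starts 10 mV off
t_1mv = resolve_time(v0=1.251, v_ms=1.25, v_target=2.25, tau=0.1)   # starts 1 mV off
```

Each tenfold reduction of the initial offset adds a fixed tau * ln(10) to the resolution time, which is why the second flip-flop of a double latch is given a full clock period for the first one to settle.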
Send and Forget – Double Latching
Advantages
• Good choice for single-bit control data
• Gray-coded multi-bit data payloads are also a target
Disadvantages
• No flow control → send and forget
• A metastable signal sent to multiple targets could resolve to different values
Handshake ACL
Asynchronous Communication Link
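[Figure: source S and destination D connected through the ACL. Ps: Source Payload; Pd: Destination Payload. An FSM on each side exchanges the signals RS, AS, RD and AD, each of which crosses into the other clock domain (CLKS or CLKD) through cascaded D flip-flops.]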
The data payload rate must not exceed the reciprocal of the worst-case round trip delay of the flow control.
2-phase: 3Ts + 3Td ≥ TPs
4-phase: 6Ts + 6Td ≥ TPs
Example:
Source: 27 MHz (Ts ≈ 37 ns), Destination: 200 MHz (Td = 5 ns)
Maximum isochronous data rate using the 2-phase protocol:
3 × (37 ns) + 3 × (5 ns) = 126 ns, i.e. about 7.9 MHz
[Figure: 2-phase and 4-phase handshake timing diagrams showing the round trip delays 3Ts + 3Td and 6Ts + 6Td, and TPs, the period for which data remains valid/asserted]
2-phase: 3Ts + 3Td ≥ TPs
4-phase: 6Ts + 6Td ≥ TPs
1. Note that TPs does not decide the data payload frequency. TPs is less than the round trip delay to enable the next payload to be transferred immediately after the round trip delay is over.
2. The period (TPL) corresponding to the data payload frequency has to be at least the worst case round trip delay, i.e. 3Ts + 3Td ≤ TPL and 6Ts + 6Td ≤ TPL for the 2 and 4 phase protocols respectively. This is illustrated in the example below.
The data payload rate must not exceed the reciprocal of the worst-case round trip delay of the flow control.
2-phase: 3Ts + 3Td
4-phase: 6Ts + 6Td
Example:
Source: 27 MHz, Destination: 200 MHz
Maximum isochronous data rate using the 2-phase protocol:
3 × (37 ns) + 3 × (5 ns) = 126 ns, i.e. about 7.9 MHz
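The same calculation can be written as a small function covering both protocols. This sketch uses the round trip costs of 3 and 6 clock periods per side stated above; the function name is illustrative. It uses the exact 27 MHz period (about 37.04 ns) where the slide rounds to 37 ns, so the result differs slightly in the second decimal.

```python
# Round trip cost in clock periods on each side, per handshake protocol.
CYCLES_PER_SIDE = {2: 3, 4: 6}

def max_payload_rate_hz(f_source_hz, f_dest_hz, phases):
    """Maximum payload rate = 1 / (n*Ts + n*Td), with n = 3 (2-phase) or 6 (4-phase)."""
    n = CYCLES_PER_SIDE[phases]
    round_trip_s = n / f_source_hz + n / f_dest_hz
    return 1.0 / round_trip_s

rate_2phase = max_payload_rate_hz(27e6, 200e6, phases=2)   # about 7.9 MHz
rate_4phase = max_payload_rate_hz(27e6, 200e6, phases=4)   # half the 2-phase rate
```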
2 Clock Asynchronous FIFO
• Fail Safe, Self Correcting:
• Write logic could think the FIFO
is full when it is not
• Read logic could think that the
FIFO is empty when it is not
• Not suitable for Island hopping:
• Storage in Write Island is a
problem
• Typically the read side needs to
be read every cycle
GALS
Globally Asynchronous Locally Synchronous
Source: ETH, Zurich
GALS
Clocking and Communication Schemes
• Synchronous Design – phase and skew aligned
• Mesochronous Design – same clk freq and phase aligned
• Ratiochronous Design – different clock freqs that have a rational relationship, phase aligned (KTH research)
• Plesiochronous – no rational clock relationship, phase relationship drifts
• Asynchronous
Ideal vs Real Clock
• During the initial phase of synthesis the clock is ideal
• The set_auto_disable_drc_nets command should be used to prevent DC from wasting time on fixing DRC violations on high fanout nets like resets and clocks
• Model skew and jitter effects using the set_clock_uncertainty command
• Model clock network latency using the set_clock_latency command
• Once the clock tree has been inserted, use the set_propagated_clock command to use the actual clock. Back annotation using the read_sdf command is required
Modelling Clock Skew