Download High-Level Synthesis

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Politecnico di Milano
HIGH LEVEL SYNTHESIS WITH AREA
CONSTRAINTS FOR FPGA DESIGNS:
AN EVOLUTIONARY APPROACH
Tesi di Laurea di:
Christian Pilato
Matr.n. 674373
Relatore: Prof. Fabrizio FERRANDI
Correlatore: Ing. Antonino TUMEO
Outlines
 High-Level Synthesis
 Proposed methodology
 Experimental results
 Some further extensions…
 Conclusion and future works
Summary
2
High-Level Synthesis
Three main sub-tasks:
Inputs:
“High-Level
Synthesislevel
means
going
from
an algorithmic
level
Output:
register-transfer
(RTL)
design
in
a
hardware
description

behavioral
description
(in
C
language)
1. operation scheduling: when operations start their execution
specification
of
a behaviour
of resources
aVerilog)
digital system to a register level
language
(e.g. of
SystemC,
VHDL
and
 library
different
types of
2. structure
resourcethat
allocation
and binding:
where operations are executed,
implements
that behavior”.
 set of constraints
where
values objectives
are stored (area,
and how
elements
Goal:
minimize
latency,
etc.)are interconnected.
McFarland, et al., Proc. IEEE, February 1990.
3. controller synthesis: which operations are issued
Resource
Library
Behavioral
specification
Design
constraints
High-Level
Synthesis tool
Objectives
Scheduling
Allocation
Binding
Datapath
& Controller
Controller
Synthesis
High-Level Synthesis – Problem description
3
What are the problems?
 All the sub-tasks are NP-complete: no efficient algorithms
 Interconnections have to be considered: up to 80% of final area
 All the tasks are closely interdependent
 Most of information are available only at the end of the synthesis
Try non-deterministic approaches with feedback information
Genetic algorithms
Multi-objective optimization: reducing to single-objective
(weighted average) is not efficient
Non-dominated Sorting Genetic Algorithm (NSGA-II)
K. Deb, S. Agrawal, A. Pratab, and T. Meyarivan, “A Fast and Elitist Multi-Objective Genetic Algorithm: NSGAII,” Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849–858, 2000.
High-Level Synthesis – Problem description
4
The proposed methodology
Fitness Evaluation
Chromosome
Partial binding
scheduling
Finite State
Machine creation
Initial
population
Rank sorting
(NSGA-II)
Register
allocation
Interconnection
optimization
Selection
Result
estimations
Mutation
Add offsprings to
parent population
NO
Are any
stopping criteria
met?
End
YES
Crossover
High-Level Synthesis and Design Space Exploration
5
Experimental results
Development framework
 Integrated in the PandA framework
 an open-source C++ framework covering different aspects of
the hardware-software design of embedded systems
 Evolutionary computation with Open BEAGLE framework
Functional validation
 Comparison between Verilog and C simulations
Estimation model validation
 Comparison between estimations and logic synthesis values
average error equal 4.02 %
standard deviance equal 2.82 %
maximum error less than 10 %
These values can be effectively used as fitness values
Experimental results
6
Experimental results
Design Space Exploration validation
 Population size of 1.000 individuals, evolving up to a maximum of
200 generations
 the best trade-off between overall execution time and solution
quality.
Considerations:
 It takes into account all elements in the design solution
 It can cover a good number of trade-offs between the fastest solution
and the minimal area solution
 Better approach than existing tools to deal with area constraints
Paper accepted for publication at International Symposium on Systems, Architectures, MOdeling and
Simulation (SAMOS), Samos, Greece, July 2007
Title: “An Evolutionary Approach to Area-Time Optimization of FPGA designs”
Experimental results
7
Some features just provided…
Weighted clique covering: in register allocation to reduce interconnections
 An higher weight is assigned to compatibility edge when the two
values involve the same functional units
 Clique covering on a weighted graphs; results show a further
reduction of overall area up to 10%.
Fitness inheritance: to reduce overall execution time
 A fraction of expensive real evaluations is substituted with an
estimation based on similar evaluated individuals
 It is able to reduce overall execution time over by 25%
 No substantial difference in the final Pareto-optimal solution
Paper submitted to IEEE Congress of Evolutionary Computation (CEC) 2007, Singapore, September
2007.
Title: “Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis”
Some extensions…
8
Conclusion and future works
The main contributions from this thesis are:
 An high-level synthesis flow from C specifications to HDL
descriptions
 Integration of a model for fast estimation of synthesis results
 Design space exploration with a genetic algorithm:
 It takes into account all elements composing the design solution
 High fitting with real values
 Multi-objective concurrent optimization
Future works:
 Optimize the results coming from the synthesis flow
 Further reduce the overall execution time of the proposed methodology
 Refine the estimation model and specialize it for different targets
Conclusion and future works
9
Thank you!
Christian PILATO
Matr. n. 674373
High-Level Synthesis Flow
The proposed flow is organized as follows:
 From C to intermediate representation
 from GIMPLE to produce graph representation
 High-Level Synthesis Flow
1. Partial binding and Scheduling
2. Finite State Machine creation
3. Register allocation
4. Interconnection allocation
5. Performance and area estimations
 From data structures to intermediate representation in form of graph
 From intermediate representation to Hardware Description Language (e.g.
Verilog) ready for logic synthesis
The proposed High-Level Synthesis flow
5
Partial Binding and Scheduling
Partial binding: force an operation to be executed on a selected
functional unit instance
β (+1) = < plus; 0 >
 A technique introduced to partially control the final area occupation
 It can affect scheduling, register allocation and interconnection
allocation
Scheduling: assign a starting control step to each operation to be
executed
 Many scheduling algorithms are able to support partial binding
(Integer Linear Programming formulation, list based algorithm, etc.)
 Different solutions based on the selected algorithm
1. Partial binding and Scheduling
6
FSM and Register allocation
Scheduling gives information about concurrent operations.
This information is useful for controller synthesis and register allocation
State Transition Graph (STG), based on Moore-FSM model, is created
on scheduled specification
 It represents control flow and concorrent operations
 Conditional operations create bifurcation based on evaluated
conditional values
Register allocation: allocate elements to store values across cycle step
boundaries. A compiler approach has been implemented on STG:
 Liveness analysis based on dataflow equations
 Interference graph based on liveness information
 Different heuristics to minimize number of registers
2-3. Finite State Machine creation and Register allocation
7
The final steps…
Interconnection allocation: allocate elements to interconnect the
hardware components
 Mux-based architecture: port swapping for commutative operations
Glue logic: represent logic netlist to decode commands and select inputs
 Truth tables based on signals from controller
The RTL structural description is now available and it considers all
elements. Objective values could be retrieved from logic synthesis
too slow!
Estimation model: perform fast estimations of objective values.
 Area is difficult to be estimated
 Updated and used an existing area model*
*: C. Brandolese, W. Fornaciari, and F. Salice. “An Area Estimation Methodology for FPGA Based Designs at
SystemC-Level ”, DAC '04: Proceedings of the 41st annual conference on Design automation, pp. 129– 132,
2004.
4-5. Interconnection allocation and result estimations
8
Problem dependent elements
Chromosome encoding
 Each operation in the specification has a gene to represent a feasible
partial binding
 Genes are added to represent algorithms used to perform high-level
synthesis steps: scheduling, register allocation and interconnection
optimization
Fitness Evaluation
 Information from chromosome about partial binding and algorithms
are used to perform a synthesis flow.
 Objective values are estimated using the proposed model
Design Space Exploration by Genetic Algorithm
10
Problem independent elements
Generic operators
 common operators (crossover and mutation) used without
modifications: no unfeasible chromosomes can be created.
 If the gene changed by operators is related to:
 operation: a new binding constraint for that operation.
 algorithm: a different algorithm to solve the related synthesis step
Initial population
 created by random or starting from some interesting points to explore
around them.
Solution ranking
 ranking into different levels according to their fitness values.
 accelerated using the fast-non-dominated-sort algorithm available in
the NSGA-II
Design Space Exploration by Genetic Algorithm
11