Evolution-based Standard Cell Placement (ESP)
ECE 556 Final Project
12/15/2004
Jason Woods

I. Introduction

I.1 Motivation and overview

VLSI CAD tool development is an interesting area of study that poses many challenges. This stems from the fact that most problems that need to be addressed in VLSI CAD tool development cannot feasibly be solved exactly in a practical amount of time, due to the immense size of the solution space. These types of problems are classified as NP-hard and NP-complete, and they require some form of heuristic in order to generate an approximate solution.

Standard cell placement is one such problem in VLSI CAD tool design that requires heuristics to solve. Many such heuristics have been introduced to generate approximate placement solutions. The most notable are Simulated Annealing, which was successfully implemented in the TimberWolf package, followed by Min-Cut placement and genetic algorithms.

Evolution-based Standard Cell Placement (ESP) is offered as an alternative heuristic that promises better performance than its predecessors. ESP is described as a cross between a genetic algorithm and Simulated Annealing. ESP uses a natural selection process similar to the one found in genetic algorithms to eliminate ill-suited placements. It also uses iterative improvements similar to Simulated Annealing in order to converge on a near-optimal solution.

I.2 Comparison to Simulated Annealing and Genetic Algorithms

Although ESP is an amalgam of Simulated Annealing and genetic placement, there are several major differences between the ESP heuristic and the two previously mentioned heuristics. Simulated Annealing (SA) has proven to be a versatile algorithm and, given arbitrary CPU time, will generate near-optimal solutions. However, SA depends on parameter optimization. There are no clearly defined patterns for parameter optimization, which means the parameters have to be tailored to each problem through trial and error.
ESP requires no parameter optimization to generate a near-optimal solution. Another difference between ESP and SA is that SA uses a pairwise exchange of cells to alter the layout, whereas ESP permits the simultaneous removal and redistribution of a large number of cells from their current positions.

Genetic Placement (GP) also produces near-optimal solutions by evolving solutions from one generation to the next. GP maintains a set of solutions, called parents, for each generation. New solutions are generated by merging the placements of two parents into a child. This is where ESP differs from GP: ESP generates one child from one parent during each generation. ESP also differs from GP in the method of selecting cells to be passed on to the next generation. GP uses a random process to select a set of cells from the parent, while ESP determines the “goodness” of each cell at its current position to decide its survival between generations. This allows ESP to converge much faster than GP and therefore perform far fewer iterations.

These advantages over SA and GP, taken together, give ESP its strength. Its main advantages are that it is uncomplicated and easy to adapt, and that it achieves near-optimal solutions with a comparatively fast execution time.

II. The Algorithm and Implementation Details

The algorithm consists of two main phases. The first phase is the setup phase, in which the parameters are set, the input file is read and the initial setup is performed. The second phase is the main iterative process, in which a near-optimal solution is generated. The algorithm flow is depicted below.

Figure 1: Block Diagram of Algorithm

II.1 Initial Setup

The first two steps (shown above) are the initial I/O. The initial I/O consists of the interactive parameter input, where the user can specify optimization parameters, and the reading of the input file containing the circuit description.
The input file format is as follows:

    [parameters]
    aspectration=<width>:<height>
    rowlength=<length>
    cellheight=<height>
    [nets]
    N0 <weight>
    N1 <weight>
    .

Figure 2: Format of Input File

Worth noting is the format of the cell descriptions. Each cell is represented by C and a cell index (e.g. C0, C1, etc.), followed by a width and lastly a connection list. The connection list contains a list of nets (N0, N1, …, Nw), each with a port assignment ([<x>,<y>]). The coordinates defined in the port assignment are relative to the position of the cell itself. After the input parameters and circuit description are read in, all internal data structures are initialized.

The next step in the initial setup is chip floorplanning. During this stage the layout height and width are determined in accordance with the specified aspect ratio. Two important values to consider are the average row length and the maximum row length. The average row length is computed directly from the aspect ratio and serves as a guideline for cell placement: in order to maintain the aspect ratio, rows should not exceed the average row length. The maximum row length is set directly as a parameter and acts as a cap: if a row must exceed the average row length, it still cannot exceed the maximum row length.

Once floorplanning is completed, an initial placement is generated by placing each cell sequentially in each row of the placement grid, making sure not to exceed the average row length when possible.

After the initial placement is generated, precomputation is done. Precomputation calculates each net's optimal wire length. To compute the optimal wire length for a net, all the cells the net connects are assumed to be placed right next to each other in a quasi-square configuration. The optimal wire length is then the semi-perimeter of the bounding rectangle of that cell configuration.
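
The quasi-square estimate described above can be sketched roughly as follows. This is only an illustration, not the report's actual routine: the function name `optimal_wire_length`, the widest-first greedy row packing, and the choice of row count are all assumptions made for the example.

```python
import math


def optimal_wire_length(cell_widths, cell_height):
    """Estimate a net's optimal wire length (hypothetical helper).

    Assumes the net's cells are packed side by side into a roughly
    square block and returns the semi-perimeter of its bounding box.
    """
    total_width = sum(cell_widths)
    area = total_width * cell_height
    # Number of rows that brings the block closest to a square (assumption).
    rows = max(1, round(math.sqrt(area) / cell_height))
    target_row_width = total_width / rows

    # Greedy packing (assumption): widest cells first, opening a new row
    # once the current row has reached the target width.
    row_widths = [0]
    for width in sorted(cell_widths, reverse=True):
        if row_widths[-1] >= target_row_width and len(row_widths) < rows:
            row_widths.append(0)
        row_widths[-1] += width

    box_width = max(row_widths)
    box_height = len(row_widths) * cell_height
    return box_width + box_height  # semi-perimeter of the bounding box


# Four cells of width 4 and height 4 pack into an 8 x 8 block.
print(optimal_wire_length([4, 4, 4, 4], 4))  # prints 16
```

Because this value is only an estimate, the running algorithm can still replace it whenever a shorter wire length is actually observed for the net.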
The optimal wire length calculation is depicted in the figure below. To further increase the accuracy of this approximation, the optimal wire length is allowed to adjust during the main iterative process if a new optimal wire length is found.

Figure 3: Example of a net's optimal wire length calculation

II.2 Main Iterative Process

The main iterative process consists of the routines that form the evolutionary algorithm. The iterative process is executed until no further improvement can be achieved.

The first step in the main iterative process is the mutation procedure. Mutation is performed by simply selecting two cells at random and swapping them. The mutation procedure is followed by a collision-resolving routine that cleans up any overlaps or spaces resulting from the mutation.

The next step is the evaluation of the current layout. Evaluation works by computing a placement value for each cell. The placement value is determined by averaging the “goodness” of each net connected to the cell and normalizing the value from 0 to 100%. A net's “goodness” is computed as the ratio of the current wire length of the net to the net's optimal wire length. The current wire length of the net is determined by calculating the semi-perimeter of the bounding rectangle of all the cell ports the net connects.

Figure 4: Example of a net's current wire length calculation

After the evaluation process is complete, each cell is judged to decide whether it will retain its current position in the next generation or be removed and scheduled for allocation. Each cell's chance of survival is equal to its placement value. A random number between 0 and 100% is generated for each cell. If the cell's placement value is greater than the random number, the cell survives. If the cell does not survive, it is removed from the layout and placed in an allocation queue for redistribution. Two user-defined parameters are available to tailor the judgment routine.
First, the user can put an upper limit on the number of cells that can be scheduled for allocation in each iteration. Second, the user can globally increase or decrease the survival chances of the cells.

The allocation queue is a priority queue based on the number of connections a cell has, so the cell with the most connections is placed first. To eliminate the need to sort the queue, a modified binary search algorithm determines the index at which to insert a cell into the queue. Because no separate sorting step is needed, the allocation phase follows directly.

During the allocation phase each cell in the allocation queue is redistributed onto the layout. The redistribution process is fairly complex. A cell is removed from the queue and tentatively placed at every empty location in the layout. For each such location, a modified evaluation routine is run to determine the placement value of the cell at that location. The modified evaluation routine simply determines the wire length of the nets connecting to the cell. Only nets that connect to cells already placed on the grid are considered in the wire length calculation. After the cell has been tried in each empty location, the location with the smallest wire length is chosen and the cell is placed there. Cells are only added to rows if the total row length does not exceed the average row length. If cells remain after this process, they are added to rows in which there is enough room and the row does not exceed the maximum row length.

After all the cells are placed, a row realignment routine is run to fix any cell overlaps and empty spaces remaining in each row. This is necessary because the redistribution routine can place cells anywhere an empty slot exists.

Figure 5: Examples of row realignment

The I/O portion of the main iterative process consists of printing the current process status to the standard output. Similarly, postprocessing consists of printing the final layout to the standard output.

III. Program Structure

III.1 Data Structures

The program consists of six main data structures: Net, Cell, Port, Row, Slot, and Layout.

- A Net represents a net connecting ports on cells. Nets have weights and a list of connections.
- A Cell represents a standard cell. Cells consist of a width and a list of ports.
- A Port represents a physical location on a cell that a net can connect to. Ports consist of an x and a y coordinate relative to the cell's position.
- A Row represents a row in the layout. Rows consist of a current length and a list of slots.
- A Slot represents a placement in a row. Slots consist of an absolute x and y coordinate within the placement grid and a cell. Slots can either be empty or occupied.
- A Layout represents the placement grid as well as general layout parameters. Layouts consist of an aspect ratio, average and maximum row lengths, cell height, and a list of rows.

III.2 Program Flow

The program flow follows the algorithm block diagram of Figure 1. First the optimization parameters are processed, and then an initial solution is generated from a specified input file. After the initial solution is generated, precomputation is done. Next, the iterative process begins and the solution is mutated, evaluated, judged, allocated and realigned. Upon completion the final solution is printed. The program flow is depicted below.

    main() {
        - Process Input Parameters
        - Construct Initial Solution
        - Perform Precomputations
        - Perform Iterative ESP Process
            - Mutate
            - Resolve Collisions
            - Evaluate
            - Judge
            - Allocate
            - Realign Rows
        - Print Final Solution
    }

Figure 6: General Program Flow

IV. Results

The performance results of the ESP algorithm were compared to those of the previously implemented TimberWolf algorithm. General results showed that for small placement problems (<= 10 cells) the two algorithms performed roughly the same. As the number of cells increases, ESP begins to outperform TimberWolf.
For large placement problems (>= 100 cells) ESP vastly outperforms TimberWolf. However, it is important to note that there are discrepancies in directly comparing the implementation of the ESP algorithm with the implementation of the TimberWolf algorithm. Namely, the TimberWolf implementation only deals with fixed cell widths and only allows net connections on the perimeter of the cell, while ESP allows variable-width cells and net connections anywhere on the cell. The test parameters common to the two algorithms are the cell height, which is set at four; the number of nets that can be connected to a cell, which is restricted to five; and the number of cells a single net can connect, which is restricted to three.

The results are broken down into two tables. The first compares wire lengths. The second compares execution times.

                  10-cell circuit   50-cell circuit   100-cell circuit
    ESP           400               4716              16701
    TimberWolf    385               5549              24310

Table 1: Wire lengths of resulting placements

                  10-cell circuit   50-cell circuit   100-cell circuit
    ESP           0.045             0.107             0.208
    TimberWolf    0.04              0.147             0.311

Table 2: Execution time of resulting placements in seconds

V. Discussion

V.1 General Comments

Due to the high exchange rate of the initial solution at the beginning of the algorithm, the algorithm's performance is independent of the initial solution. However, if the initial placement is already a good placement, then the overall computation time is reduced. ESP might therefore be a good follow-up to a previously generated placement.

V.2 Features

- This implementation of ESP has the ability to enforce an aspect ratio, allowing for optimization of area.
- User-defined parameters control the allocation process.
- User-defined parameters penalize aspect ratio violations.
- Cells are broken down into grids, allowing net connections anywhere on the cell grid. This is an enhancement over the previous implementation of TimberWolf, in which nets could only be placed on the cell perimeter.
- The precomputation of the optimal wire length is well tuned to generate the closest approximation of an optimal placement of each cell into a square bounding box for the semi-perimeter wire length calculation.
- Because a binary search finds the insert point of a cell in the allocation queue, there is no need to sort the queue.

V.3 Challenges

- How exactly to accomplish the precomputation of the optimal wire length, and specifically how to approximate the initial square bounding box.
- How to efficiently redistribute the cells from the allocation queue, including how to keep track of empty slots and how to optimize the evaluation routine to efficiently determine the placement value for each cell in the allocation queue.
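
The binary-search insertion into the allocation queue mentioned under V.2 can be sketched as below. This is a minimal illustration under stated assumptions, not the report's own "modified binary search": it substitutes Python's standard `bisect.insort`, and the class name `AllocationQueue`, its methods, and the first-come tie-breaking rule are hypothetical.

```python
import bisect


class AllocationQueue:
    """Priority queue ordered by connection count (hypothetical sketch)."""

    def __init__(self):
        # Kept sorted at all times as (-connections, arrival, cell_id), so
        # the most-connected cell is always at the front of the list.
        self._entries = []
        self._arrivals = 0

    def push(self, cell_id, connections):
        # The arrival counter breaks ties in first-come order (assumption).
        entry = (-connections, self._arrivals, cell_id)
        self._arrivals += 1
        # insort locates the insert index by binary search, so the list
        # never needs a separate sorting pass.
        bisect.insort(self._entries, entry)

    def pop(self):
        # Remove and return the cell with the most connections.
        return self._entries.pop(0)[2]

    def __len__(self):
        return len(self._entries)


queue = AllocationQueue()
queue.push("C0", 2)
queue.push("C1", 5)
queue.push("C2", 3)
queue.push("C3", 5)
order = [queue.pop() for _ in range(len(queue))]
print(order)  # prints ['C1', 'C3', 'C2', 'C0']
```

Keeping the list ordered at insertion time means the allocation phase can simply take cells from the front, which matches the report's observation that no sorting routine is required.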