Build a GCC Cross Compiler for a Specific CPU
Chia-Tsun Wu
D92943007
[email protected]

Outline
- Introduction to SoC
- Motivation and project goal
- Design a CPU
  - Tools used to design the CPU hardware
  - CPU specification
  - CPU design flow
  - Simulation and results

Outline
- Build a GCC Cross Compiler
  - GCC structure
  - Knowledge needed to port GCC
  - Build flow
  - Build a GCC cross assembler and cross linker
  - Build a GCC cross compiler
  - A simple test program
- Summary

Introduction to SoC
SoC: System on a Chip.
Highly integrated; includes:
- CPU
- System bus
- Peripherals
- Co-processor
- ...
Low cost, low area, high performance.

What is SOC?
- Portable / reusable IP
- Embedded CPU
- Embedded memory
- Real-world interfaces (USB, PCI, Ethernet)
- Software (both on-chip and off)
- Mixed-signal blocks
- Programmable HW (FPGAs)
- > 500K gates

SOC Design Flow
(Flow diagram; box labels in reading order)
- System specs
- HW/SW partitioning
- Hardware description; HW synthesis and configuration; configuration modules
- Software description; software generation and parameterization
- Interface synthesis
- Hardware components; HW/SW interfaces; software modules
- HW/SW integration and cosimulation
- Integrated system
- System evaluation; design coverification
- System validation

Motivation and project goal
Motivation:
- SoC has been the major trend in recent years.
- The CPU is one of the key components of an SoC design.
- The development environment is the most important factor for a CPU.
Goal:
- Design a simple 32-bit RISC CPU.
- Build a cross assembler and cross linker for a specific CPU.
- Build a cross compiler for a specific CPU.

Design a CPU

Specification
- 32-bit RISC based CPU
- General-purpose register architecture
- 32-bit (4 Gbyte) addressing
- 32-bit fixed instruction length (excluding immediate data)
- MSB first
- Reset address 0x000ffffc
- No pipeline; one instruction cycle takes four clock cycles:
  1. Instruction fetch
  2. Instruction decode and data fetch
  3. Execution
  4. Write back
- No interrupt
- No timer

Registers
- General-purpose registers R0~R15
  - R13: accumulator
  - R14: memory data pointer
  - R15: stack pointer
- Program counter (PC) (0x000ffffc after reset)
- Program status (PS) (Sign flag, Zero flag, oVerflow flag, Carry flag)

Instruction formats
- General: OP Rn1, Rn2
  - OP: 8 bits
  - n: register number (0000: R0, 1111: R15)
- Immediate: OP #data, Rn2
  - OP: 8 bits
  - n: register number (0000: R0, 1111: R15)
  - #data: 32-bit data
- Branch: OP Addr
  - OP: 16 bits (low byte = 0x00)
  - Addr: 32-bit branch address

Instruction sets
    Instruction     Operation     Flags   Machine code
    ADD  Rn1,Rn2    Rn2=Rn1+Rn2   SZVC    00000000 Rn1 Rn2
    ADDC Rn1,Rn2    Rn2=Rn1+Rn2   SZVC    00000001 Rn1 Rn2
    SUB  Rn1,Rn2    Rn2=Rn2-Rn1   SZVC    00000010 Rn1 Rn2
    SUBC Rn1,Rn2    Rn2=Rn2-Rn1   SZVC    00000011 Rn1 Rn2

Instruction sets
    Instruction     Operation     Flags   Machine code
    LDI #data,Rn2   Rn2=data      -       00000100 0000 Rn2 #Data
    MOV Rn1,Rn2     Rn2=Rn1       -       00000101 Rn1 Rn2
    RET             PC=[SP--]     -       00000110 00000000
    JMP #Addr       PC=[Addr]     -       00000111 00000000 #Addr

Tools used
- Synopsys Design Compiler
- Mentor Graphics ModelSim
- Synopsys Apollo
- TSMC 0.25um standard cell libraries

Design Flow
1. CPU specifications
2. RTL coding
3. Functional simulation (test bench)
4. Design Compiler (constraints)
5. Gate-level simulation (test bench)
6. Apollo (constraints)
7. Post-layout simulation (test bench)
8. Tape out

Test vectors
    LDI #0x0,R0    00000000000000000000010000000000
                   00000000000000000000000000000000
    LDI #0x1,R1    00000000000000000000010000000001
                   00000000000000000000000000000001
    LDI #0x2,R2    00000000000000000000010000000010
                   00000000000000000000000000000010
    LDI #0x3,R3    00000000000000000000010000000011
                   00000000000000000000000000000011
    LDI #0x4,R4    00000000000000000000010000000100
                   00000000000000000000000000000100
    LDI #0x5,R5    00000000000000000000010000000101
                   00000000000000000000000000000101
    LDI #0x6,R6    00000000000000000000010000000110
                   00000000000000000000000000000110
    LDI #0x7,R7    00000000000000000000010000000111
                   00000000000000000000000000000111
    LDI #0x8,R8    00000000000000000000010000001000
                   00000000000000000000000000001000
    LDI #0x9,R9    00000000000000000000010000001001
                   00000000000000000000000000001001
    LDI #0xa,R10   00000000000000000000010000001010
                   00000000000000000000000000001010
    LDI #0xb,R11   00000000000000000000010000001011
                   00000000000000000000000000001011
    LDI #0xc,R12   00000000000000000000010000001100
                   00000000000000000000000000001100
    LDI #0xd,R13   00000000000000000000010000001101
                   00000000000000000000000000001101
    LDI #0xe,R14   00000000000000000000010000001110
                   00000000000000000000000000001110
    LDI #0xf,R15   00000000000000000000010000001111
                   00000000000000000000000000001111
    ADD R0,R1      00000000000000000000000000000001
    ADDC R2,R3     00000000000000000000000100100011
    SUB R4,R5      00000000000000000000001001000101
    SUBC R6,R7     00000000000000000000001101100111
    MOV R8,R9      00000000000000000000010110001001
    JMP 0x000000   00000000000000000000011100000000
                   00000000000000000000000000000000

Simulation result

Synthesis results
    Process        Area        Clock     Power
    TSMC 0.25um    0.35 mm^2   400 MHz   1.73 mW
    UMC 0.18um     0.19 mm^2   600 MHz   1 mW

Build a GCC Cross Compiler
- GCC structure
- Knowledge needed to port GCC
- Build flow
- Build a GCC cross assembler and cross linker
- Build a GCC cross compiler
- A simple test program
- Summary

GCC Execution
- Components: gcc, cpp, cc1, g++, gas (assembler), ld (linker)
- Input file -> output file

The Structure of Compiler
source program
-> lexical analyzer
-> syntax analyzer
-> semantic analyzer
-> intermediate code generator
-> code optimizer
-> code generator
-> target program
- Front-end: the analysis phases up to intermediate code generation
- Back-end: code optimizer and code generator
- The symbol-table manager and the error handler are shared by all phases.

The Structure of GCC
- Front ends: C, C++, ObjC, Fortran
- Parsing -> TREE -> RTL
- Inputs to the RTL stages: Machine Description, Macro Definition
- Global Optimizations
  - Jump Optimization
  - Common Subexpr. Elimination
  - Loop Optimization
  - Data Flow Analysis
- Instruction Combining
- Instruction Scheduling
- Register Class Preferencing
- Register Allocation
- Peephole Optimizations
- Assembly

GCC Code Generation
- The backend matches machine description patterns against the intermediate format (RTL).
- The machine description is like a template.
- The machine description includes type bit widths, memory alignment, instruction patterns, register classes, and peephole optimization rules.

GCC Code Generation (cont'd)
Intermediate format (RTL):
    (set (reg:SF 12)
         (minus:SF (reg:SF 13) (reg:SF 14)))
Machine description:
    (define_insn "subsf3"
      [(set (match_operand:SF 0 "register_operand" "=f")
            (minus:SF (match_operand:SF 1 "register_operand" "f")
                      (match_operand:SF 2 "register_operand" "f")))]
      ""
      "subf\\t%0,%1,%2")
Output assembly:
    subf r1, r2, r3

Example of RTL
    (plus:SI (reg:SI 8) (const_int 123))
- Adds two 4-byte integer (SImode) operands.
- The first operand is a register.
  - The register is also a 4-byte integer.
  - The register number is 8.
- The second operand is a constant integer.
  - Its value is 123.
  - Its mode is VOIDmode (not given).

Templates
Used for three purposes:
- Generating RTL from the parse tree.
- Generating machine insns from RTL.
- Specifying parameters about instructions.
Sample template for a RISC machine:
    (define_insn "addsi3"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (plus:SI (match_operand:SI 1 "register_operand" "%r")
                     (match_operand:SI 2 "register_operand" "r")))]
      ""
      "add %0,%1,%2"
      [(set_attr "type" "arith")])

GCC Porting and Retargeting
- Porting to new machines/processors
  - Described in the "Using and Porting the GCC" book, which is self-contained.
  - Done by describing the machine, not how to compile for the machine.
- Using GCC as a backend for other languages
  - Not well documented.
  - Few examples.
  - See the GNAT, GNU Cobol, and Fortran ports.
- In both cases, copy from similar ports.

How to port GCC
In directory gccxxx/gcc/config/machine/:
- machine.h: contains C macros that define general attributes of the machine.
- machine.md: contains RTL expressions that define the instruction set. It is input to programs that produce .h and .c files.
- machine.c: machine-dependent functions; normally things too large to put cleanly into the above two files.
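To make the role of machine.h more concrete, the fragment below fills in a handful of standard GCC target macros with values taken from the CPU specification earlier in the slides (16 general registers, R15 as stack pointer, 32-bit words, MSB first). It is only a sketch for that hypothetical port: the values are assumptions read off the specification, not the contents of the fr30 header used later in the build, and is_hard_register() is an invented helper that exists only so the fragment can be checked on its own.

```c
#include <assert.h>

/* Hypothetical machine.h excerpt for the slides' 32-bit RISC CPU.
   The macro names are standard GCC target macros; the values are
   assumptions derived from the CPU specification, not a real port. */
#define BITS_PER_UNIT          8   /* bits in one addressable byte */
#define UNITS_PER_WORD         4   /* 32-bit general registers */
#define BITS_PER_WORD          (BITS_PER_UNIT * UNITS_PER_WORD)
#define BYTES_BIG_ENDIAN       1   /* "MSB first" */
#define WORDS_BIG_ENDIAN       1
#define FIRST_PSEUDO_REGISTER  16  /* hard registers are R0..R15 */
#define STACK_POINTER_REGNUM   15  /* R15: stack pointer */

/* Invented helper (not part of GCC): true if regno names one of
   the hard registers declared above. */
int is_hard_register(int regno)
{
    return regno >= 0 && regno < FIRST_PSEUDO_REGISTER;
}
```

A real port defines many more macros (register classes, calling convention, addressing modes); this excerpt only shows how specification facts turn into macro values.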
How to port GCC (cont'd)
1. Study the book "Using and Porting GCC".
2. Study the target-machine specification.
3. Look for an approximate machine description.
   - Not found: create target.h, target.c, target.md.
   - Found: modify target.h, target.c, target.md.
4. Test.

gcc/config -- Architecture characteristic key
    H  A hardware implementation does not exist.
    M  A hardware implementation is not currently being manufactured.
    S  A Free simulator does not exist.
    L  Integer registers are narrower than 32 bits.
    Q  Integer registers are at least 64 bits wide.
    N  Memory is not byte addressable, and/or bytes are not eight bits.
    F  Floating point arithmetic is not included in the instruction set.
    I  Architecture does not use IEEE format floating point numbers.
    C  Architecture does not have a single condition code register.
    B  Architecture has delay slots.
    D  Architecture has a stack that grows upward.
    l  Port cannot use ILP32 mode integer arithmetic.

gcc/config -- Architecture characteristic key
    q  Port can use LP64 mode integer arithmetic.
    r  Port can switch between ILP32 and LP64 at runtime. (Not necessarily supported by all subtargets.)
    c  Port uses cc0.
    p  Port does not use define_peephole.
    f  Port does not define prologue and/or epilogue RTL expanders.
    g  Port does not define TARGET_ASM_FUNCTION_(PRO|EPI)LOGUE.
    m  Port does not use define_constants.
    b  Port does not use '"* ..."' notation for output template code.
    d  Port uses DFA scheduler descriptions.
    h  Port contains old scheduler descriptions.
    a  Port generates multiple inheritance thunks using TARGET_ASM_OUTPUT_MI(_VCALL)_THUNK.
    t  All insns either produce exactly one assembly instruction, or trigger a define_split.
    e  <arch>-elf is not a supported target.
    s  <arch>-elf is the correct target to use with the simulator in /cvs/src.

gcc/config -- Architecture characteristic key
Gcc-config.txt

define_peephole
In addition to instruction patterns, the `md' file may contain definitions of machine-specific peephole optimizations.
The combiner does not notice certain peephole optimizations when the data flow in the program does not suggest that it should try them. For example, sometimes two consecutive insns related in purpose can be combined even though the second one does not appear to use a register computed in the first one. A machine-specific peephole optimizer can detect such opportunities.

define_split
Often you can rewrite a single insn as a list of individual insns, each corresponding to one machine instruction. The compiler splits the insn if there is reason to believe that it might improve instruction or delay slot scheduling.
- Splits are evaluated after the combiner pass and before the scheduling passes.
- Splits optimize for speed and instruction length, so they are the perfect place to put this intelligence.
- Example: when loading a small negative constant, we can save space and time by loading the positive value and then sign-extending it.

define_expand
On some target machines, some standard pattern names for RTL generation cannot be handled with a single insn, but a sequence of RTL insns can represent them. For these target machines, you can write a `define_expand' to specify how to generate the sequence of RTL. A `define_expand' is an RTL expression that looks almost like a `define_insn'; but, unlike the latter, a `define_expand' is used only for RTL generation and it can produce more than one RTL insn.
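The define_split example about small negative constants can be sketched in C. The functions below mimic the two steps such a split would produce: a short load that only carries the low 8 bits of the constant, followed by a sign extension to 32 bits. The 8-bit immediate width and all function names are assumptions made for illustration; the slides do not say which immediate widths the target supports.

```c
#include <stdint.h>

/* Step 1 (hypothetical short load): only the low 8 bits of the
   constant fit in the instruction, so the register holds 0..255. */
uint32_t load_low_byte(int32_t constant)
{
    return (uint32_t)constant & 0xFFu;
}

/* Step 2: sign-extend bit 7 into bits 8..31. */
int32_t sign_extend_byte(uint32_t low)
{
    int32_t v = (int32_t)(low & 0xFFu);
    return (v ^ 0x80) - 0x80;
}

/* The two-step sequence reproduces the constant only when it
   fits in a signed byte, which is what a split condition would test. */
int fits_signed_byte(int32_t constant)
{
    return constant >= -128 && constant <= 127;
}
```

For a constant such as -5, the short load leaves 251 in the register and the sign extension restores -5, replacing a full 32-bit immediate load with two short instructions.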
The combiner pass only cares about reducing the number of instructions; it does not care about instruction lengths or speeds.

define_insn
- Push and pop: movsi_push, movsi_pop, move
- Move: movqi_unsigned_register_load, movqi_signed_register_load, *movqi_internal, movhi, movhi_unsigned_register_load, movhi_signed_register_load, *movhi_internal, movsi, movsi_internal, movdi, *movdi_insn, movsf, *movsf_internal, *movsf_constant_store
- Signed conversions from a smaller integer to a larger integer: extendqisi2, extendhisi2
- Zero extensions: zero_extendqisi2, zero_extendhisi2
- Addition: add_to_stack, addsi3, addsi_regs, addsi_small_int, addsi_big_int, *addsi_for_reload
- Subtraction: subsi3
- Multiplication: mulsidi3, umulsidi3, mulhisi3, umulhisi3, mulsi3
- Negation: negsi2
- Shifts: ashlsi3, ashrsi3, lshrsi3

define_insn
- Logical operations: andsi3, iorsi3, xorsi3, one_cmplsi2
- Comparisons: cmpsi, *cmpsi_internal
- Branches: beq, bne, blt, ble, bgt, bge, bltu, bleu, bgtu, bgeu, *branch_true, *branch_false
- Calls & jumps: call, call_value, jump, indirect_jump, tablejump
- Function prologues and epilogues: prologue, epilogue, return_from_func, leave_func, enter_func
- Miscellaneous: nop, blockage

define_insn "addsi_regs"
    (define_insn "addsi_regs"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (plus:SI (match_operand:SI 1 "register_operand" "%0")
                     (match_operand:SI 2 "register_operand" "r")))]
      ""
      "add %2, %0")
    ; (set value x)                      chapter 9.15, p. 110
    ;   value = x
    ; (plus:m x y)
    ;   x + y, carried out in mode m

define_insn "addsi_regs" (cont'd)
    ; (match_operand:m n predicate constraint)      chapter 10.4, p. 131
    ;   matches operand number n if the predicate holds for it
    ;   n counts from 0
    ;   for each operand number n, there is only one match_operand expression
    ;   predicate is the name of a C function; it returns 0 on failure
    ;     general_operand: checks that the operand is a constant, a register, or a memory reference
    ;     register_operand: checks that the operand is a register
    ;     immediate_operand: checks that the operand is immediate data
    ;   constraint: describes one kind of operand that is permitted
    ;     r: register
    ;     m: any kind of memory operand
    ;     o: only an offsettable memory operand
    ;     V: only a non-offsettable memory operand
    ;     <: memory operand with autodecrement addressing
    ;     >: memory operand with autoincrement addressing
    ;     i: immediate integer operand
    ;     0~9: an operand that matches the specified operand number is allowed

Build a GCC Cross Compiler
- Binutils: configure -> make -> make install
- GCC: machine description -> configure -> make -> make install -> GCC compiler

Build a GCC Cross Assembler and Cross Linker
Binutils: ver 2.14
    configure --target=fr30-elf --prefix=dir
    make
    make install

Build a GCC Cross Compiler
GCC: ver 3.3.1
    ../configure --target=fr30-elf --prefix=dir --enable-languages=c
    make
    make install

A simple C program to test the cross compiler
    int test(int i, int j, int k)
    {
        int a;
        int b;
        a = 49999999;
        b = 39999999;
        a += k;
        b += j;
        a++;
        b--;
        i += a + b;
        return i;
    }

    fr30-elf-gcc -S -O2 t.c

A simple C program to test the cross compiler (cont'd)
        .file   "t.c"
        .text
        .p2align 2
        .globl  test
        .type   test, @function
    test:
        mov     r4, r2           ;00000000000000000000010101000010
        ldi:32  #50000000, r4    ;00000000000000000000010000000100
                                 ;10111110101111000010000000
        ldi:32  #39999998, r1    ;00000000000000000000010000000001
                                 ;10011000100101100111111110
        add     r6, r4           ;00000000000000000000000001100100
        add     r5, r1           ;00000000000000000000000001010001
        add     r1, r4           ;00000000000000000000000000010100
        add     r2, r4           ;00000000000000000000000000100100
        ret
        .size   test, .-test
        .ident  "GCC: (GNU) 3.3.1 (cygming special)"

Summary
- Studying RTL is more important than studying the MD.
- Build the cross assembler and cross linker before building the cross compiler.
- There is little documentation on porting GCC as a cross compiler.
- Modifying an existing MD is easier than creating a new one.
- "The main goal of GCC was to make a good, fast compiler for machines in the class that the GNU system aims to run on: 32-bit machines that address 8-bit bytes and have several general registers." -- Richard Stallman
- It seems that designing a new CPU is easier than building a cross compiler for a GIEE student.

http://gcc.gnu.org
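As a cross-check on the machine-code comments in the test program's listing, the sketch below encodes register-to-register instructions using the General format from the CPU specification: an 8-bit opcode, a 4-bit Rn1 field, and a 4-bit Rn2 field in the low 16 bits of a 32-bit word. The opcode values for ADD, ADDC, SUB, SUBC, and MOV come from the instruction-set slides, and 0x04 for LDI is read off the test vectors. The enum and function names are invented for illustration, and the 32-bit data word that follows an LDI is not modeled.

```c
#include <stdint.h>

/* Opcode byte of each instruction, as listed in the slides. */
enum opcode {
    OP_ADD  = 0x00,
    OP_ADDC = 0x01,
    OP_SUB  = 0x02,
    OP_SUBC = 0x03,
    OP_LDI  = 0x04,  /* taken from the test vectors */
    OP_MOV  = 0x05
};

/* General format: opcode in bits 15..8, Rn1 in bits 7..4,
   Rn2 in bits 3..0; the upper 16 bits of the word are zero. */
uint32_t encode_rr(enum opcode op, unsigned rn1, unsigned rn2)
{
    return ((uint32_t)op << 8) | ((rn1 & 0xFu) << 4) | (rn2 & 0xFu);
}

/* First word of LDI #data,Rn2; the Rn1 field is zero. */
uint32_t encode_ldi(unsigned rn2)
{
    return encode_rr(OP_LDI, 0, rn2);
}
```

With these definitions, `encode_rr(OP_MOV, 4, 2)` gives 0x00000542, which matches the comment next to `mov r4, r2` in the listing, and `encode_rr(OP_ADD, 6, 4)` gives 0x00000064 for `add r6, r4`.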