The Assembly Process

An assembler has to perform two major tasks:
• translate assembly language code into machine code
• assign addresses for all symbolic labels

This is generally done on a line-by-line basis, and two or more passes are often required.

The programmer must determine what the initial state of memory should be to execute the program.
• This includes both the data and text areas.
• Although the program itself may initialise data areas, such initialisation should occur at the beginning of the program.

Machine Code Generation

Assembling a program entails translating the assembly language into binary machine code. This requires more than simply mapping assembly instructions to machine instructions:
• Each instruction is bound to an address.
• Labels are bound to addresses.
• Assembly instructions which refer to labels generate machine instructions which contain the label's address.
• Pseudo-instructions are translated into one or more machine instructions.

The Symbol Table

• The assembler scans the source code and generates the appropriate bit string for each line encountered.
• The assembler must remember which memory locations have been allocated and to which address each label is bound.
• A symbol table is a list of (label, address) pairs.
• When the data and text segments have been generated, they are stored as an executable file.
• The file is used by a program called the loader to initialize memory to the appropriate state before execution.

Instructions

The .text directive tells the assembler that the lines which follow are instructions.
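The symbol-table bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not how wasm is implemented: the function name `build_symbol_table`, the `#` comment character, the one-word-per-line sizing, and the default base addresses are all assumptions. The sample program and the data base of 0x00007 are taken from the worked example that appears later in these slides ("An Example of Assembling Code").

```python
# Illustrative first pass only; the function name and defaults are assumptions.
def build_symbol_table(lines, text_base=0x00000, data_base=0x00007):
    """Return {label: address}, assuming each instruction or .word takes one word."""
    next_free = {".text": text_base, ".data": data_base}  # next free address per segment
    segment = ".text"
    symbols = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # assumes '#' starts a comment
        # A label is bound to the next free address in the current segment.
        while ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = next_free[segment]
            line = line.strip()
        if not line or line.startswith(".global"):
            continue  # nothing on this line occupies memory
        if line in next_free:
            segment = line  # .text / .data switch the active segment
        else:
            next_free[segment] += 1  # one word per instruction or .word
    return symbols


source = """\
.data
a1: .word 3
a2: .word 16
a3: .word 5
.text
.global main
main: la $6, a2
loop: lw $7, 1($6)
lw $10, 0($6)
mult $9, $10, $7
beqz $9, loop
j loop
syscall
""".splitlines()

table = build_symbol_table(source)
for name, address in table.items():
    print(f"{name:5s} {address:05x}")
```

Under these assumptions the labels a1, a2, a3 land at 00007 to 00009, main at 00000 and loop at 00001, which agrees with the symbol table shown in the worked example once loop has been resolved.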
• By default, the text segment starts at 0x00000; the default differs between processors.
• In some cases, a symbol may not yet have an assigned address when the assembler scans a line in which it is used.
• A second pass through the code can update instructions containing unresolved labels.
• Maintain a list of the addresses at which each unresolved label appears.
• When the label is added to the symbol table, all locations in the corresponding list are updated to hold the address associated with the label.

Pseudo-instructions

• Some assembly languages include pseudo-instructions (WRAMP does not).
• Pseudo-instructions are not directly implemented in the processor.
• A pseudo-instruction generally maps to one or two actual processor instructions; the assembler makes the substitution.
• E.g. a load-high instruction could be implemented as a load followed by a shift.

Jump Target Calculation

The jump instruction has two forms:
• Direct, for j and jal
• Register direct, for jr and jalr

jr and jalr specify a register containing the address to be loaded into the PC. j and jal specify most of the address of the target within the instruction; however, their range is limited by the space allocated in the instruction format.

[Figure: instruction format diagram, bit positions labelled f e d c b a 9 8 7 6 5 4 3 2 1 0]

Jump Target Calculation

• The jump instruction has 20 bits allocated for the address; the other 12 bits are opcode and registers.
• This limits a jump to a 1M address range.
• WRAMP solution: the address space is only 20 bits = 1M words = 4 MB, so every address fits in the field.
• Alternative solution (MIPS: 32-bit, byte addressable):
  - The bottom 2 bits are always zero, because instructions sit on word boundaries.
  - The highest-order bits of the target are taken from the address currently stored in the program counter.

[Figure: jump target assembled from the high-order PC bits, the jump target bits that follow the opcode in the instruction, and two low-order 0 bits]

Branch Offset

In machine code, the target address in a branch must be specified as an offset from the address of the branch.
• During execution, this offset is simply added to the program counter to fetch the next instruction.
• At that point the PC contains the address of the next instruction.
• The offset is measured in words, not bytes, even on byte-addressable architectures.

    PC_NEW = offset*1 + PC_OLD

To calculate the offset, the assembler uses the formula:

    offset = target instruction address - (branch instruction address + 1)

Branch Offset Calculation

• The offset is stored in the instruction as a word offset rather than a byte offset.
  - Instructions are only stored at word boundaries.
  - WRAMP uses word addresses, but on byte-addressable processors the addresses of both the target and the branch instruction have at least their two low-order bits as zero.
• An offset may be negative, if the target instruction precedes the branch instruction.
• The offset is stored in the 20-bit offset field.
  - This means the branch can jump 2^19 instructions before or after the current address.
  - On other architectures this may be a more serious limitation.

Branch Offset Calculation

An entry in the WRAMP instruction list:

    [0x0001A] 0xb02fffe5 bne $2, -27    26: bnez $2, main

Reading left to right: [0x0001A] is the instruction address, 0xb02fffe5 bne $2, -27 is the machine code, 26 is the line number in the source file, and bnez $2, main is the original assembly code.

Offset in words (main = 0x00000):

    0x00000 - (0x0001A + 1) = -27

Stored offset: 0xfffe5 = -27 as a 20-bit two's-complement value.

Program Relocation

It is possible that program modules are developed separately by individual programmers. When these programs are loaded into memory, they must not be assigned overlapping memory space.
To handle this problem, the modules have to be relocated:
• Relative addresses are relocatable.
• Any absolute references must be "fixed" by the loader.
  - Use a logical base address known at load time.
  - Absolute addresses are stored as offsets from this to-be-determined (TBD) base.

From Source to Executable

[Figure: toolchain diagram. High-level source code passes through the compiler to assembly (asm); the assembler turns each asm file into an object file (obj); the linker combines obj files and libraries (lib) into an executable (exe); the loader places it in memory.]

An Example of Assembling Code

            .data
    a1:     .word 3
    a2:     .word 16
    a3:     .word 5
            .text
            .global main
    main:   la   $6, a2
    loop:   lw   $7, 1($6)
            lw   $10, 0($6)
            mult $9, $10, $7
            beqz $9, loop
            j    loop
            syscall

Some Examples of Assembling Code

For the same program, after the first scan:

Symbol table

    symbol   address
    a1       00007
    a2       00008
    a3       00009
    main     00000
    loop     xxxxx

Memory map of data section

    address  contents
    00007    0000 0003
    00008    0000 0010
    00009    0000 0005

Translate to Machine Code

    address   contents   instruction
    0x00000   c6000008   la   $6, a2
    0x00001   87600001   lw   $7, 1($6)
    0x00002   8a600000   lw   $10, 0($6)
    0x00003   09a40007   mult $9, $10, $7
    0x00004   a09xxxxx   beqz $9, loop
    0x00005   400xxxxx   j    loop
    0x00006   200d0000   syscall

Resolve Symbols

    address   contents   instruction
    0x00000   c6000008   la   $6, a2
    0x00001   87600001   lw   $7, 1($6)
    0x00002   8a600000   lw   $10, 0($6)
    0x00003   09a40007   mult $9, $10, $7
    0x00004   a09xxxxx   beqz $9, loop
    0x00005   40000001   j    loop
    0x00006   200d0000   syscall

Resolve Relative References

    address   contents   instruction
    0x00000   c6000008   la   $6, a2
    0x00001   87600001   lw   $7, 1($6)
    0x00002   8a600000   lw   $10, 0($6)
    0x00003   09a40007   mult $9, $10, $7
    0x00004   a09ffffc   beqz $9, loop    (offset -4)
    0x00005   40000001   j    loop
    0x00006   200d0000   syscall

    0x00001 - (0x00004 + 1) = -4 = 0xffffc (20-bit two's complement)

Summary

We have been looking at the relationship between high-level languages and machine code.

[Figure: "C" → wcc → WRAMP assembler → wasm → wlink → WRAMP machine code]

• WRAMP architecture used as the example
  - instructions based on a 3-operand format
  - example of a register-register architecture

Summary - WRAMP processor

• The register file provides small, fast memory.
• Instructions for:
  - Arithmetic and logic instructions
  - Test instructions
  - Flow of control instructions
  - Memory instructions

Summary - WRAMP processor

The WRAMP instruction set has four addressing modes:
• Register direct (e.g. add $3, $4, $5)
• Immediate (e.g. addi $3, $4, 0x12)
• Base displacement (e.g. lw $3, 8($5))
• PC relative (e.g. beqz $3, 0x123)

Summary - Compilation/Assembly

• wcc cross compiler used as the example; see exercise 2.
• Compiles with WRAMP conventions:
  - parameter passing
  - register save conventions
• Uses a stack frame for each procedure, which contains space to:
  - save parameters
  - save local variables
  - use as a register save area (e.g. $ra)
• Machine code is the output of the compilation process.
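Returning to the worked examples in "Branch Offset Calculation" and "Resolve Relative References", the label-dependent words can be recomputed with a short Python sketch. The field layout used here (a 4-bit opcode in the top bits, a source register in bits 23 to 20, and a 20-bit address or offset field at the bottom) is inferred from the hex words shown in the slides rather than taken from a WRAMP specification, and the function names are hypothetical.

```python
MASK20 = 0xFFFFF  # 20-bit address/offset field


def branch_offset(target, branch_addr):
    """offset = target instruction address - (branch instruction address + 1), in words."""
    return target - (branch_addr + 1)


def to_field(value):
    """Encode a signed value as 20-bit two's complement, rejecting out-of-range offsets."""
    if not -(1 << 19) <= value < (1 << 19):
        raise ValueError(f"offset {value} does not fit in 20 bits")
    return value & MASK20


def encode_branch(opcode, rs, target, branch_addr):
    """Assumed layout: opcode[31:28], rs[23:20], offset[19:0]."""
    return (opcode << 28) | (rs << 20) | to_field(branch_offset(target, branch_addr))


def encode_jump(target, opcode=0x4):
    """Direct jump: the 20-bit field holds the absolute word address."""
    if not 0 <= target <= MASK20:
        raise ValueError(f"target {target:#x} is outside the 20-bit address space")
    return (opcode << 28) | target


symbols = {"main": 0x00000, "loop": 0x00001}

# beqz $9, loop at address 0x00004 (opcode 0xa as in the slides)
print(f"{encode_branch(0xA, 9, symbols['loop'], 0x00004):08x}")  # a09ffffc
# j loop at address 0x00005
print(f"{encode_jump(symbols['loop']):08x}")                     # 40000001
# bnez $2, main at address 0x0001A (opcode 0xb as in the slides)
print(f"{encode_branch(0xB, 2, symbols['main'], 0x0001A):08x}")  # b02fffe5
```

The range check in `to_field` mirrors the 2^19 limit on branch distance; an assembler that backpatches unresolved labels would call functions like these once the label's address is known.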