MISC Assembly Language

This document contains information about MISC and its assembly and machine languages. The first part covers the assembly language mnemonics and the corresponding machine language opcodes. The second part gives an example illustrating the assembly language and its translation into a machine language program. The third part discusses specific features of the assembly language as they relate to the task of writing an assembler for it.

1. Assembly language mnemonics and machine language instructions

For the operations of moving, adding, and subtracting, there is just one mnemonic each in the assembly language. Each of these mnemonics corresponds to five different machine instructions. The machine language instruction that goes with a mnemonic can only be determined from the kinds of operands the mnemonic has. Part of the logic of the assembler is devoted to this task so that a correct machine language translation can be constructed.

For these mnemonics, the possible operands are registers, memory variables, or numeric constants. In the assembly language a general purpose register is identified by one of the letters A, B, C, or D. A memory variable is identified by a name consisting only of letters of the alphabet. The name cannot be just one of the letters A, B, C, or D, and it cannot start with an X. A numeric constant starts with the letter X and is followed by a value given as two hexadecimal digits.
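The operand rules above can be sketched as a small classifier. This is only an illustration: the function name `classify_operand` is hypothetical, and the sketch assumes that hexadecimal digits are written in upper case, as they are in the example program.

```python
import re

# Register names given in the text above.
REGISTERS = {"A", "B", "C", "D"}


def classify_operand(token: str) -> str:
    """Return 'register', 'constant', or 'memory' for one operand token."""
    if token in REGISTERS:
        return "register"
    if token.startswith("X"):
        # Anything starting with X must be a constant: X plus two hex digits.
        # Assumption: hex digits are upper case, as in the example program.
        if re.fullmatch(r"X[0-9A-F]{2}", token):
            return "constant"
        raise ValueError(f"malformed constant: {token!r}")
    if re.fullmatch(r"[A-Za-z]+", token):
        # Letters only, not a register name, and not starting with X.
        return "memory"
    raise ValueError(f"not a valid operand: {token!r}")


for tok in ("D", "X0B", "ACCUM"):
    print(tok, classify_operand(tok))
```

Checking for registers first and for a leading X second means the memory variable case only needs to confirm that the name is made of letters.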
Assembly language mnemonic     Machine language opcode

MOVE register, register        10000001
MOVE memory, register          10000010
MOVE register, memory          10000011
MOVE memory, constant          10000100
MOVE register, constant        10000101

ADD register, register         10000110
ADD memory, register           10000111
ADD register, memory           10001000
ADD memory, constant           10001001
ADD register, constant         10001010

SUB register, register         10001011
SUB memory, register           10001100
SUB register, memory           10001101
SUB memory, constant           10001110
SUB register, constant         10001111

The single parameter of a jump instruction is the address to jump to. In assembly language it is the name of a label, given as an identifier consisting of letters of the alphabet.

JMP unsigned integer           unconditional    10010000
JPOS unsigned integer          on positive      10010001
JNEG unsigned integer          on negative      10010010
JZERO unsigned integer         on zero          10010011
JOVER unsigned integer         on overflow      10010100

2. Example program

Here is the example program. It is given first in machine language, artificially formatted line by line. This is followed by an assembly language translation. In the following section the assumptions, conventions, and rules for the machine language are given, followed by the same for assembly language.

data segment
00001011000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
code segment
10000101000001000000000100000000
10000111000000010000010000000000
10001010000001000000000100000000
10000011000000110000000000000000
10001011000000110000010000000000
10010001000010010000000000000000
********************************

.DATA///
LOOPLIM/X0B//
ACCUM/X00//
.CODE///
MOVE/D/X01/
.LABEL/LOOPTOP//
ADD/ACCUM/D/
ADD/D/X01/
MOVE/C/LOOPLIM/
SUB/C/D/
JPOS/LOOPTOP//
.END///

3. Language features

The features of the MISC machine language can be summarized as follows:

1. A program consists of at most 32 lines total.

2. The first 8 lines are set aside for memory variables. A program can have a maximum of 8 memory variables, and if it has fewer than that, the extra lines are filled with 0’s and wasted. Whether or not one of these lines represents a memory variable depends in the first instance on whether an initial value other than 0 is stored in that line, and in the second instance on whether that location is referred to by an operand of an instruction in the code segment.

3. The MISC program loading logic expects the last line of a program to consist of *’s, so there are effectively only 23 possible lines of code in a program.

4. Every line of source code, including the data segment, consists of 4 bytes. In the data segment only 1 byte is used. The other 3 are wasted and filled with 0’s. In the code segment up to 3 are used. In the current design the last is always wasted. Every line in the code segment begins with an instruction opcode followed by either 1 or 2 operands. If there is only 1, then the last 2 bytes are filled with 0’s. If there are 2, then the last byte is filled with 0’s.

5. With the exception of the last line of *’s, a machine language source file is an unbroken sequence of binary digits. In the MISC simulation, these binary digits are given as characters, not literal binaries. What an operand represents, whether register, memory variable, or constant, can only be determined by whether it is the first or second operand and by which instruction opcode it is an operand of.

Just as the machine language is based on simplifying assumptions, the assembly language sacrifices efficiency for simplicity. It is clear that some of the assembly language simplifications stem directly from simplifications in the machine language.
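The machine language layout rules listed above can be made concrete with a short sketch. The helper names (`byte`, `data_line`, `code_line`, `build_program`) are hypothetical and are not part of MISC. The sketch also assumes that the terminating line is exactly 32 asterisks, matching the width of the other lines in the example.

```python
LINE_BITS = 32    # 4 bytes per line, written as '0'/'1' characters
DATA_LINES = 8    # lines reserved for memory variables
MAX_LINES = 32    # data lines + code lines + the terminating line of *'s


def byte(value: int) -> str:
    """Format one byte as 8 binary digit characters."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"value does not fit in one byte: {value}")
    return format(value, "08b")


def data_line(initial: int) -> str:
    """Data line: 1 byte holding the initial value, 3 bytes of 0's."""
    return byte(initial).ljust(LINE_BITS, "0")


def code_line(opcode: str, *operands: int) -> str:
    """Code line: opcode, then 1 or 2 operand bytes, padded with 0's."""
    if len(opcode) != 8 or set(opcode) - {"0", "1"}:
        raise ValueError(f"opcode must be 8 binary digits: {opcode!r}")
    if not 1 <= len(operands) <= 2:
        raise ValueError("an instruction takes 1 or 2 operands")
    used = opcode + "".join(byte(v) for v in operands)
    return used.ljust(LINE_BITS, "0")


def build_program(initial_values, code_lines) -> list:
    """Pad the data segment to 8 lines and append the line of *'s."""
    if len(initial_values) > DATA_LINES:
        raise ValueError("at most 8 memory variables are allowed")
    if len(code_lines) > MAX_LINES - DATA_LINES - 1:
        raise ValueError("at most 23 lines of code are allowed")
    data = [data_line(v) for v in initial_values]
    data += [data_line(0)] * (DATA_LINES - len(data))
    # Assumption: the terminator is 32 asterisks, as wide as other lines.
    return data + list(code_lines) + ["*" * LINE_BITS]


# MOVE register, constant with operands D (4) and X01, from the example.
print(code_line("10000101", 4, 1))
# JPOS with the address 9 as its single operand.
print(code_line("10010001", 9))
```

The two printed lines correspond to the first and last instructions of the example code segment.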
The features of the MISC assembly language are summarized below with an emphasis on how it can be translated into machine language.

1. An assembly language program contains directives, which are not literally source code, in the sense that they aren’t translated into instructions. They are shown beginning with a “.” and will be discussed individually below.

2. There is an upper limit of 8 on the number of memory variables in a program. Only variables that exist are declared. There are no blank lines holding places for potential memory variables that don’t exist.

3. Because of the directives and the treatment of variables the number of lines in an assembly language program doesn’t necessarily agree with the number of lines in the corresponding machine language program.

4. In order to make scanning straightforward, each line of assembly code is treated as consisting of 3 units, or strings, separated by forward slashes with no spaces. Even if a line has fewer than 3 units in it, it will contain a full set of slashes. Each unit in a line is separated from the following unit by a slash and the line ends with a slash. Every line contains exactly 3 slashes. Lines are separated from each other by newline characters.

5. The data segment of the program comes first and is identified by the “.DATA” directive. For the memory variable declarations the initial value has to be loaded into the corresponding line of the target machine language program, and the name of the variable and its relative word address (the machine language program line it was loaded into) have to be stored in the symbol table. At any point in the future when translating, if this variable is referred to, its address should be retrieved from the symbol table and be used in the machine code.

6. The code segment is set off by the “.CODE” directive.
Translation of the move, add, and subtract instructions is based on this logic: Determine the types of the operands based on their form in assembly language, and pick the one out of the five opcodes for the instruction mnemonic that is the right match. If an operand is one of the registers, A through D, it is represented by the corresponding value, 1 through 4, respectively, in binary. If an operand is a constant, its two hex digits have to be translated into binary. If an operand is a memory variable, its address has to be looked up in the symbol table.

7. There is a third directive in the language, “.LABEL”. This is followed by an identifier. When scanning the code this identifier has to be entered in the symbol table along with its relative word address in the code. The directive does not generate a line of machine language code. In effect, what is recorded is the address of the real line of code which follows it.

8. Translating jump instructions differs from translating move, add, and subtract instructions. There are 5 different mnemonics for the 5 different kinds of jumps, and they each have one operand of the same kind, a label. The mnemonic can be translated directly into the corresponding opcode and the label operand has to be looked up in the symbol table so that its address can be placed in the machine language code.

9. The last line of code in an assembly language program is the “.END” directive.
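The translation rules above can be pulled together in a minimal assembler sketch. It is an illustration under stated assumptions, not the assembler the assignment calls for: all function and variable names are hypothetical, input is assumed to be well formed, and two passes are used so that a jump to a label defined later in the code would still resolve. It also relies on the five opcodes for each of MOVE, ADD, and SUB being consecutive in the opcode table, so an opcode is computed as a base value plus the position of the operand pair.

```python
REGISTERS = {"A": 1, "B": 2, "C": 3, "D": 4}
# Operand-kind pairs in the order they appear in the opcode table.
KINDS = [("register", "register"), ("memory", "register"),
         ("register", "memory"), ("memory", "constant"),
         ("register", "constant")]
# First opcode of each group of five consecutive opcodes.
BASE = {"MOVE": 0b10000001, "ADD": 0b10000110, "SUB": 0b10001011}
JUMPS = {"JMP": 0b10010000, "JPOS": 0b10010001, "JNEG": 0b10010010,
         "JZERO": 0b10010011, "JOVER": 0b10010100}
DATA_LINES = 8


def word(*values):
    """Join bytes as binary digit characters and pad to 32 with 0's."""
    return "".join(format(v, "08b") for v in values).ljust(32, "0")


def kind(tok):
    if tok in REGISTERS:
        return "register"
    # Variable names cannot start with X, so X marks a constant.
    return "constant" if tok.startswith("X") else "memory"


def value(tok, symbols):
    k = kind(tok)
    if k == "register":
        return REGISTERS[tok]
    return int(tok[1:], 16) if k == "constant" else symbols[tok]


def assemble(source):
    # Each line has exactly 3 slashes, so keep the 3 units before them.
    rows = [line.split("/")[:3] for line in source.split()]
    symbols, data, code = {}, [], []
    # Pass 1: fill the symbol table and load initial variable values.
    section, address = None, DATA_LINES
    for op, a, b in rows:
        if op in (".DATA", ".CODE"):
            section = op
        elif op == ".END":
            break
        elif section == ".DATA":
            symbols[op] = len(data)          # variable name -> data line
            data.append(word(int(a[1:], 16)))
        elif op == ".LABEL":
            symbols[a] = address             # address of the next real line
        else:
            address += 1
    data += [word(0)] * (DATA_LINES - len(data))
    # Pass 2: translate the instructions in the code segment.
    section = None
    for op, a, b in rows:
        if op in (".DATA", ".CODE"):
            section = op
        elif op == ".END":
            break
        elif section != ".CODE" or op == ".LABEL":
            continue
        elif op in JUMPS:
            code.append(word(JUMPS[op], symbols[a]))
        else:
            opcode = BASE[op] + KINDS.index((kind(a), kind(b)))
            code.append(word(opcode, value(a, symbols), value(b, symbols)))
    # Assumption: the terminating line is 32 asterisks.
    return data + code + ["*" * 32]


SOURCE = """.DATA///
LOOPLIM/X0B//
ACCUM/X00//
.CODE///
MOVE/D/X01/
.LABEL/LOOPTOP//
ADD/ACCUM/D/
ADD/D/X01/
MOVE/C/LOOPLIM/
SUB/C/D/
JPOS/LOOPTOP//
.END///
"""

for line in assemble(SOURCE):
    print(line)
```

Run on the example program from section 2, the sketch should reproduce the machine language listing given there, without the "data segment" and "code segment" annotations. An operand pair that is not in the opcode table, such as two memory variables, makes `KINDS.index` raise a `ValueError`, which a fuller assembler would turn into a proper error message.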