MISC Assembly Language
This document contains information about MISC and its assembly and machine
languages. The first part of the document covers the assembly language
mnemonics and the corresponding machine language opcodes. The second part gives an
example illustrating the assembly language and its translation into a machine language
program. The third part discusses specific features of the machine and assembly
languages as they relate to the task of writing an assembler.
1. Assembly language mnemonics and machine language instructions
For the operations of moving, adding, and subtracting, there is just one mnemonic
each in the assembly language. Each of these mnemonics corresponds to five different
machine instructions. The machine language instruction that goes with a given use of a
mnemonic can be determined only from the kinds of operands it has. Part
of the logic of the assembler will be devoted to this task so that a correct machine
language translation can be constructed.
For these mnemonics, the possible operands are registers, memory variables, or
numeric constants. In the assembly language a general purpose register is identified by
one of the letters A, B, C, or D. A memory variable is identified by a name which is
limited to the letters of the alphabet. The name cannot consist solely of one of
the letters A, B, C, or D, and it cannot start with an X. A numeric constant starts with
the letter X and is followed by a value given as two hexadecimal digits.
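The operand rules above can be sketched as a small classifier. This is a minimal illustration in Python, not part of MISC itself. The function name `classify_operand` is hypothetical, and the restriction to upper-case letters is an assumption based on the example program.

```python
import re

REGISTERS = {"A", "B", "C", "D"}


def classify_operand(token: str) -> str:
    """Return 'register', 'constant', or 'memory' for one operand token."""
    if token in REGISTERS:
        return "register"
    # A constant is the letter X followed by exactly two hexadecimal digits.
    if re.fullmatch(r"X[0-9A-F]{2}", token):
        return "constant"
    # A variable name is letters only and must not start with X.
    # Assumption: upper-case letters only, as in the example program.
    if re.fullmatch(r"[A-Z]+", token) and not token.startswith("X"):
        return "memory"
    raise ValueError(f"not a valid operand: {token!r}")
```

Checking for a register first matters, because a single letter such as D would otherwise also pass the test for a variable name.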
Assembly language mnemonic     Machine language opcode

MOVE register, register        10000001
MOVE memory, register          10000010
MOVE register, memory          10000011
MOVE memory, constant          10000100
MOVE register, constant        10000101

ADD register, register         10000110
ADD memory, register           10000111
ADD register, memory           10001000
ADD memory, constant           10001001
ADD register, constant         10001010

SUB register, register         10001011
SUB memory, register           10001100
SUB register, memory           10001101
SUB memory, constant           10001110
SUB register, constant         10001111
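One way an assembler might use the opcode table is as a lookup keyed on the mnemonic and the kinds of its two operands. The sketch below is in Python. The names `OPCODES` and `select_opcode` are hypothetical, and the kind strings are an assumed convention.

```python
# Opcode table from this section, keyed by (mnemonic, first kind, second kind).
OPCODES = {
    ("MOVE", "register", "register"): "10000001",
    ("MOVE", "memory", "register"): "10000010",
    ("MOVE", "register", "memory"): "10000011",
    ("MOVE", "memory", "constant"): "10000100",
    ("MOVE", "register", "constant"): "10000101",
    ("ADD", "register", "register"): "10000110",
    ("ADD", "memory", "register"): "10000111",
    ("ADD", "register", "memory"): "10001000",
    ("ADD", "memory", "constant"): "10001001",
    ("ADD", "register", "constant"): "10001010",
    ("SUB", "register", "register"): "10001011",
    ("SUB", "memory", "register"): "10001100",
    ("SUB", "register", "memory"): "10001101",
    ("SUB", "memory", "constant"): "10001110",
    ("SUB", "register", "constant"): "10001111",
}


def select_opcode(mnemonic: str, first_kind: str, second_kind: str) -> str:
    """Pick the one opcode out of five that matches the operand kinds."""
    try:
        return OPCODES[(mnemonic, first_kind, second_kind)]
    except KeyError:
        # Combinations missing from the table, such as (memory, memory)
        # or a constant as first operand, have no machine instruction.
        raise ValueError(
            f"no {mnemonic} instruction takes ({first_kind}, {second_kind})"
        ) from None
```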
The single parameter of a jump instruction is the address to jump to. In assembly
language it is the name of a label, given as an identifier consisting of letters of the
alphabet. In machine language it is an unsigned integer, the address that the label
stands for.
Assembly language mnemonic     Jump condition     Machine language opcode

JMP unsigned integer           unconditional      10010000
JPOS unsigned integer          on positive        10010001
JNEG unsigned integer          on negative        10010010
JZERO unsigned integer         on zero            10010011
JOVER unsigned integer         on overflow        10010100
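As an illustration of how a jump might be encoded, the Python sketch below builds one machine language line from a jump mnemonic and an address that has already been resolved. It relies on the line layout described in section 3: an opcode byte, one operand byte, and two bytes of 0's. The names `JUMP_OPCODES` and `encode_jump` are hypothetical.

```python
JUMP_OPCODES = {
    "JMP": "10010000",    # unconditional
    "JPOS": "10010001",   # on positive
    "JNEG": "10010010",   # on negative
    "JZERO": "10010011",  # on zero
    "JOVER": "10010100",  # on overflow
}


def encode_jump(mnemonic: str, address: int) -> str:
    """Return the 32-character machine language line for one jump."""
    if not 0 <= address <= 255:
        # The address has to fit in the single operand byte.
        raise ValueError(f"address out of range: {address}")
    # Opcode byte, address byte, then two unused bytes of 0's.
    return JUMP_OPCODES[mnemonic] + format(address, "08b") + "0" * 16
```

For instance, `encode_jump("JPOS", 9)` gives the last code line of the example program in section 2.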
2. Example program
Here is the example program. It is given first in machine language, artificially
formatted line by line. This is followed by an assembly language translation. In the
following section the assumptions, conventions, and rules for the machine language are
given, followed by the same for assembly language.
data segment
00001011000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
code segment
10000101000001000000000100000000
10000111000000010000010000000000
10001010000001000000000100000000
10000011000000110000000000000000
10001011000000110000010000000000
10010001000010010000000000000000
********************************
.DATA///
LOOPLIM/X0B//
ACCUM/X00//
.CODE///
MOVE/D/X01/
.LABEL/LOOPTOP//
ADD/ACCUM/D/
ADD/D/X01/
MOVE/C/LOOPLIM/
SUB/C/D/
JPOS/LOOPTOP//
.END///
3. Language features
The features of the MISC machine language can be summarized as follows:
1. A program consists of at most 32 lines total.
2. The first 8 lines are set aside for memory variables. A program can have a maximum of
8 memory variables, and if it has fewer than that, then the extra lines are filled with 0’s
and wasted. Whether or not one of these lines represents a memory variable depends in
the first instance on whether an initial value other than 0 is stored in it, and
in the second instance on whether that location is referred to by an operand of an
instruction in the code segment.
3. The MISC program loading logic expects the last line of a program to consist of *’s,
so with 8 lines set aside for data there are effectively only 23 possible lines of code in
a program.
4. Every line of source code, including the data segment, consists of 4 bytes. In the data
segment only 1 byte is used. The other 3 are wasted and filled with 0’s. In the code
segment up to 3 are used. In the current design the last is always wasted. Every line in
the code segment begins with an instruction opcode followed by either 1 or 2 operands.
If there is only 1, then the last 2 bytes are filled with 0’s. If there are 2, then the last byte
is filled with 0’s.
5. With the exception of the last line of *’s, a machine language source file is an
unbroken sequence of binary digits. In the MISC simulation, these binary digits are
given as characters, not literal binaries. What an operand represents, whether register,
memory variable, or constant, can only be determined by whether it is the first or second
operand and by which instruction opcode it is an operand of.
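The machine language rules above can be sketched as a small reader that splits a source file into data values and code fields. This is a minimal Python illustration rather than the actual MISC loading logic. The function name `split_program` is hypothetical, and it assumes that any line breaks in the file are for display only and can be ignored.

```python
def split_program(text: str):
    """Return (data, code) for a machine language source file.

    data is a list of 8 initial values; code is a list of
    (opcode, operand 1, operand 2) tuples, one per code line.
    """
    chars = "".join(text.split())       # drop line breaks used for display
    bits = chars.rstrip("*")            # the final line of *'s ends the file
    if len(bits) == len(chars):
        raise ValueError("missing final line of *'s")
    if set(bits) - {"0", "1"} or len(bits) % 32 != 0:
        raise ValueError("body must be whole 32-digit lines of 0's and 1's")
    lines = [bits[i:i + 32] for i in range(0, len(bits), 32)]
    if not 8 <= len(lines) <= 31:
        raise ValueError("expected 8 data lines and at most 23 code lines")
    # Each line is 4 bytes, given as characters rather than literal binary.
    fields = [[int(line[j:j + 8], 2) for j in range(0, 32, 8)] for line in lines]
    data = [f[0] for f in fields[:8]]          # only the first byte is used
    code = [tuple(f[:3]) for f in fields[8:]]  # the fourth byte is always wasted
    return data, code
```

The reader deliberately stops at byte fields. Whether an operand byte is a register number, an address, or a constant still has to be decided from the opcode, as item 5 explains.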
Just as the machine language is based on simplifying assumptions, the assembly
language sacrifices efficiency for simplicity. It is clear that some of the assembly
language simplifications stem directly from simplifications in the machine language. The
features of the MISC assembly language are summarized below with an emphasis on how
it can be translated into machine language.
1. An assembly language program contains directives, which are not literally source
code, in the sense that they aren’t translated into instructions. They are shown beginning
with a “.” and will be discussed individually below.
2. There is an upper limit of 8 on the number of memory variables in a program. Only
variables that exist are declared. There are no blank lines holding places for potential
memory variables that don’t exist.
3. Because of the directives and the treatment of variables, the number of lines in an
assembly language program doesn’t necessarily agree with the number of lines in the
corresponding machine language program. In the example above, 12 lines of assembly
language produce 15 lines of machine language.
4. In order to make scanning straightforward, each line of assembly code is treated as
consisting of 3 units, or strings, separated by forward slashes with no spaces. Even if a
line has fewer than 3 units in it, it will contain a full set of slashes. Each unit in a line is
separated from the following unit by a slash and the line ends with a slash. Every line
contains exactly 3 slashes. Lines are separated from each other by newline characters.
5. The data segment of the program comes first and is identified by the “.DATA”
directive. For each memory variable declaration, the initial value has to be loaded into the
corresponding line of the target machine language program, and the name of the variable
and its relative word address (the machine language program line it was loaded into) have
to be stored in the symbol table. Whenever the variable is referred to later in
translation, its address should be retrieved from the symbol table and used
in the machine code.
6. The code segment is set off by the “.CODE” directive. Translation of the move, add,
and subtract instructions is based on this logic: Determine the types of the operands
based on their form in assembly language, and pick the one out of the five opcodes for
the instruction mnemonic that is the right match. If an operand is one of the registers, A
through D, it is represented by the corresponding value, 1 through 4, respectively, in
binary. If an operand is a constant, its two hex digits have to be translated into binary. If
an operand is a memory variable, its address has to be looked up in the symbol table.
7. There is a third directive in the language, “.LABEL”. This is followed by an
identifier. When scanning the code this identifier has to be entered in the symbol table
along with its relative word address in the code. The directive does not generate a line of
machine language code. In effect, what is recorded is the address of the real line of code
which follows it.
8. Translating jump instructions differs from translating move, add, and subtract
instructions. There are 5 different mnemonics for the 5 different kinds of jumps, and they
each have one operand of the same kind, a label. The mnemonic can be translated
directly into the corresponding opcode and the label operand has to be looked up in the
symbol table so that its address can be placed in the machine language code.
9. The last line of code in an assembly language program is the “.END” directive.
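Taken together, the rules above suggest an assembler along the following lines. This is a minimal Python sketch under stated assumptions, not a reference implementation. All names (`assemble`, `word`, `units`, `kind`) are hypothetical, addresses are counted from 0 as in the example program, and two passes are used so that a jump could refer to a label defined later in the code. The original does not say whether such forward references are allowed.

```python
import re

REGISTERS = {"A": 1, "B": 2, "C": 3, "D": 4}
FORMS = [("register", "register"), ("memory", "register"), ("register", "memory"),
         ("memory", "constant"), ("register", "constant")]
# The 15 opcodes in the table run consecutively from 10000001, five per mnemonic.
OPCODES = {(m, *form): 0b10000001 + 5 * i + j
           for i, m in enumerate(("MOVE", "ADD", "SUB"))
           for j, form in enumerate(FORMS)}
JUMPS = {"JMP": 0b10010000, "JPOS": 0b10010001, "JNEG": 0b10010010,
         "JZERO": 0b10010011, "JOVER": 0b10010100}
DATA_LINES = 8  # the first 8 machine language lines are set aside for variables


def word(*used_bytes):
    """Build one 32-digit line: the given bytes, then bytes of 0's as padding."""
    padded = list(used_bytes) + [0] * (4 - len(used_bytes))
    return "".join(format(b, "08b") for b in padded)


def units(line):
    """Split 'U1/U2/U3/' into its 3 units, insisting on exactly 3 slashes."""
    parts = line.split("/")
    if len(parts) != 4 or parts[3]:
        raise ValueError(f"malformed line: {line!r}")
    return parts[:3]


def kind(token):
    if token in REGISTERS:
        return "register"
    if re.fullmatch(r"X[0-9A-F]{2}", token):
        return "constant"
    return "memory"


def assemble(source):
    rows = [units(line) for line in source.split()]
    # Pass 1: fill the symbol table with variable and label addresses.
    symbols, data, section, address = {}, [], None, DATA_LINES
    for first, second, _ in rows:
        if first in (".DATA", ".CODE", ".END"):
            section = first
        elif section == ".DATA":
            symbols[first] = len(data)        # relative word address of the variable
            data.append(int(second[1:], 16))  # initial value, for example X0B -> 11
        elif first == ".LABEL":
            symbols[second] = address         # address of the next real line of code
        else:
            address += 1
    if len(data) > DATA_LINES or address > 31:
        raise ValueError("too many variables or lines of code")

    def value(token):
        if token in REGISTERS:
            return REGISTERS[token]
        if kind(token) == "constant":
            return int(token[1:], 16)
        return symbols[token]                 # KeyError for an undeclared name

    # Pass 2: emit the data lines, the code lines, and the final line of *'s.
    out = [word(v) for v in data] + [word()] * (DATA_LINES - len(data))
    section = None
    for first, second, third in rows:
        if first in (".DATA", ".CODE", ".END"):
            section = first
        elif section == ".CODE" and first in JUMPS:
            out.append(word(JUMPS[first], symbols[second]))
        elif section == ".CODE" and first != ".LABEL":
            opcode = OPCODES[(first, kind(second), kind(third))]
            out.append(word(opcode, value(second), value(third)))
    out.append("*" * 32)
    return "\n".join(out)
```

Given the assembly language listing from section 2 as a single string with one line of source per line, `assemble` should return the 15 machine language lines shown there, including the unused data lines of 0's and the final line of *'s.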