CS 431, Assignment 3, Assembler
This document gives information on these 2 topics:
1. The specific assignment requirements for the MISC assembler.
2. An outline of how the MISC assembler might be implemented.
1. Assignment Requirements
Your task is to write an assembler for MISC. This assembler will not be
complete. It will have to be functional from beginning to end, but it only needs to
include the logic to handle the specific instructions that appear in the example test
program. It would be long and repetitive to include all of the possible combinations of
instructions and operands.
In other words, you need to write a program that will read an assembly language
source file and produce a machine language file that will run on the MISC simulation.
Background information on MISC is posted on the Web page. Information specifically
on the MISC assembly language is given in the document miscinfoassembly.doc. That
document should provide the background necessary in order to understand the
functionality of the assembler.
You may hand in your solution by email or on a diskette or CD. You may break
your source code into as many classes as necessary. When handing in your assignment,
provide a brief readme file in case the names of the files do not make it obvious how to
compile and run your code. Your program class or classes will be black box tested on the
sumtenV1asm.txt file, which is posted on the Web page.
These are the possible outcomes of the assignment:
1. Your code does not compile correctly: ¼ credit.
2. Your code compiles, but it has run time errors or some other run time problem that
either results in no output or output that is obviously incorrect: ½ credit.
3. Your code produces what superficially appears to be a correctly formatted machine
language output file, but when this file is run on the MISC simulation, it does not give the
right result: ¾ credit.
4. Your code produces an output file which gives the correct result when run on MISC:
full credit.
2. Assignment Outline
An assembler is not a compiler. However, some of the basic ideas involved in an
assembler are related to the logic of a compiler. The basic theoretical difference between
an assembler and a compiler can be explained as follows: Assembly language code does
not have a hierarchical structure. That means that the assembler does not have to do
syntax analysis, or parsing. Its functionality is linear. It does lexical analysis, or
scanning, and machine language code generation. In doing this it has to deal with the
concept of identifiers and management of identifiers using a symbol table. A follow-up
assignment to this one would be to write a compiler for a simple high level language that
generated MISC assembly language as output. In order to produce machine language
code as a final result, an assembler such as this one would be needed.
Some ideas about implementation follow. You may or may not find these ideas
helpful and you’re not obligated to make use of them. I wrote my code in Java and you
may also be making use of an object-oriented language. However, the logic of my
assembler was not object-oriented. In principle the assembler could be written as one
long, complicated main() method based on loops and ifs. It is not immediately clear how
to decompose the problem into classes, but from the structured programming point of
view it is clear how to decompose it into a set of modules. In other words, the overall
problem can be broken into manageable pieces by implementing functionality in
modules. Modules can be implemented as static methods in the program class along with
the main() method.
Because of the simplified nature of MISC, there are some shortcuts that can be
taken in implementing the assembler. What follows is a list of points describing the
overall structure of my solution. This is followed by a list of the individual modules and
a summary of what they do.
1. The main program is structured as a loop that reads the source file, picking up lines of
code and keeping a count of the lines of code. The line count is used for various things,
and in order to save some parameter passing, it’s declared as a static variable of the
program class rather than as a local variable of main().
2. In addition to opening the source assembly language file, main() opens a target
machine language file. Every time through the loop, if a call to a module returns a String,
this is appended to the target file as machine language output from the assembler.
3. The MISC assembly language requires that all variables be declared up front in the
data segment. The example assembly language program does not contain any ifs or other
statements that would cause a forward reference. These two facts together mean that it is
possible to write a one pass assembler.
4. The MISC assembly language is based on 3 units separated by slashes on each line.
The assembler can be written to always pick up 3 units at a time. Depending on what
kind of line of code it is, some of the units may be blank.
5. Inside the loop in main() is a large if statement which checks the first unit in a line of
code to see whether it’s a directive, an instruction, or a variable. A module may be
called, based on what the unit is, which further determines how to handle it, and which
may return a String as output.
6. The symbol table is implemented as an instance of the Java class Hashtable. This
makes it possible to enter a key and a value and retrieve the value later based on the key,
using the methods put() and get(). In order to save parameter passing, the symbol table is
declared as a static variable of the program class.
7. A Hashtable key has to be a reference to an object of a class which implements both
the hashCode() and the equals() methods. The String class satisfies this requirement, so
keys consisting of Strings can contain the identifiers for variables and labels.
8. The values in a Hashtable also have to be references. Labels have numeric addresses
and variables have numeric values. Instead of writing a new class to hold symbol table
values, it is possible to make use of the MachineByteV1 class. This class has the
methods copyIntToByte() and getStringFromByte(). They can be used for converting
between an integer linecount in the assembler, for example, and the String representing
that linecount in binary which might be needed to represent a label address in the
generated code.
9. Because all of the possible units that may legally be at the beginning of a line of code
are explicitly present in the if statements, it is not necessary to pre-load the symbol table
with directives or instructions.
10. The .DATA and .CODE directives have minimal meaning. They are a convenience
for the assembly language code writer so that it’s not necessary to include blank lines up
to a total of 8 lines for data declarations. In order to handle these directives correctly, it’s
simply necessary to make sure the line count is correct and that the directive lines are not
included in the count.
11. The .LABEL directive has more meaning. It is followed by an identifier. This has to
be entered into the symbol table. The value that goes with it is its address, namely its line
number in the source code.
12. The default case in the assembler if statement is for variables. If the first unit in a
line doesn’t match any directive or instruction, it must be a user declared variable name.
This is entered into the symbol table as a key, with the current line count (the variable’s
address) as the value, and the declared initial value is written to the output.
13. In MISC, the general category of an instruction is identified by its name. For
example, if the first thing in a line of code is ADD, this is an ADD instruction. However,
there are different machine language ADD instructions depending on the operands.
Which of the instructions it is can only be determined by seeing what the operands are.
The module for handling ADD receives these operands as parameters, determines what
they are, and acts accordingly.
14. It is also helpful to write another static method that can convert a String containing
hexadecimal digits to its integer value. This is needed in order to deal with numeric
constants in the source code. The MachineByteV1 methods then make it possible to
convert from integer to a String containing binary digits.
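The three-unit line format described in point 4 can be sketched as a small helper that always hands back exactly three units. This is only an illustration: the class name UnitSplitter, the method name splitUnits(), and the sample lines are hypothetical, and the exact spacing rules of the MISC source format are assumed rather than taken from miscinfoassembly.doc.

```java
import java.util.Arrays;

class UnitSplitter {
    // Split one source line into exactly three units; missing units become "".
    static String[] splitUnits(String line) {
        String[] parts = line.split("/", -1); // -1 keeps trailing empty units
        String[] units = {"", "", ""};
        for (int i = 0; i < units.length && i < parts.length; i++) {
            units[i] = parts[i].trim();
        }
        return units;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from sumtenV1asm.txt.
        System.out.println(Arrays.toString(splitUnits("ADD/A/B"))); // prints [ADD, A, B]
        System.out.println(Arrays.toString(splitUnits(".LABEL/LOOP")));
    }
}
```

Returning a fixed-size array means the large if statement in main() never has to check how many units a line actually contained.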
Here are the signature lines for the methods in the implementation, preceded by
the comments that exist in my code and followed by a summary statement of what
happens in them:
public static void main(String[] args)
Declare the linecount and the symbol table as static variables.
Loop, reading characters and forming the units of the lines of assembly code.
Call the translation modules in an if statement based on the contents of the first
unit.
Write the translation out based on the module return values.
Increment the line count.
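The shape of that loop might look roughly like the sketch below. It is not the author's code: the class name MainLoopSketch is hypothetical, the input is an in-memory list instead of a file, only a few branches are shown (.CODE, ADD, SUB, JPOS and .END are omitted), Integer stands in for MachineByteV1 as the symbol table value, and the returned strings are placeholders rather than real MISC machine code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

class MainLoopSketch {
    // Shared state kept in static fields to avoid parameter passing.
    static int linecount = 0;
    static Hashtable<String, Integer> symbols = new Hashtable<>();

    // Translate source lines; returns the lines that would be written to the target file.
    static List<String> assemble(List<String> source) {
        List<String> target = new ArrayList<>();
        for (String line : source) {
            String[] parts = line.split("/", -1);
            String first = parts[0].trim();
            String second = parts.length > 1 ? parts[1].trim() : "";
            String third = parts.length > 2 ? parts[2].trim() : "";
            String result = null; // stays null when a module produces no output

            if (first.equals(".DATA")) {
                linecount--;                      // directive lines are not counted
            } else if (first.equals(".LABEL")) {
                symbols.put(second, linecount);   // label address = current line count
                linecount--;
            } else if (first.equals("MOVE")) {
                result = "<MOVE " + second + " " + third + ">"; // placeholder output
            } else {
                symbols.put(first, linecount);    // default case: variable declaration
                result = "<DATA " + second + ">"; // placeholder output
            }

            if (result != null) {
                target.add(result);
            }
            linecount++; // undoes the decrement made for directive lines
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> out = assemble(Arrays.asList(".DATA//", "X/05/", ".LABEL/TOP/", "MOVE/A/X"));
        System.out.println(out + " " + symbols);
    }
}
```

The decrement inside a directive branch followed by the unconditional increment at the bottom is what keeps directive lines out of the count.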
/* Decrement linecount. */
public static void handleData()
Keep the line count correct.
/* Put together blank lines for unused variables if necessary and make sure the linecount
is set to 8. */
public static String handleCode()
Fill blank lines if there were < 8 variables.
Keep the line count correct.
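The padding step could be sketched as follows. The class name CodePadding and the method padDataSegment() are hypothetical, and the text of a blank machine-language line is passed in by the caller because its real format is not given here. How the line count is then adjusted so that the first instruction lands at address 8 depends on how main() increments it, so that part is left out.

```java
class CodePadding {
    static final int DATA_LINES = 8; // size of the data area stated in the text

    // Build filler for the unused data lines. blankLine is whatever an empty
    // machine-language line looks like (an assumption supplied by the caller).
    static String padDataSegment(int variablesDeclared, String blankLine) {
        StringBuilder filler = new StringBuilder();
        for (int i = variablesDeclared; i < DATA_LINES; i++) {
            filler.append(blankLine).append("\n");
        }
        return filler.toString();
    }
}
```

With 5 declared variables this yields 3 filler lines; with 8 it yields an empty string.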
/* Enter the String as a key in the symbol table with the current linecount as the value.
Decrement linecount. This method relies on the MachineByteV1 class. */
public static void handleLabel(String labelname)
Make an entry in the symbol table.
Keep the line count correct.
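A minimal sketch of this method, under stated assumptions: the class name LabelSketch is hypothetical, and the symbol table stores a plain Integer where the original stores a MachineByteV1 object.

```java
import java.util.Hashtable;

class LabelSketch {
    static int linecount = 0;
    static Hashtable<String, Integer> symbolTable = new Hashtable<>();

    // Record the label at the current line count, then decrement so that the
    // increment at the bottom of the main loop leaves the count unchanged.
    static void handleLabel(String labelname) {
        symbolTable.put(labelname, linecount);
        linecount--;
    }
}
```

If the line count is 9 when a label LOOP is seen, LOOP maps to 9, and after the main loop's increment the count is 9 again, so the next instruction is assembled at the label's address.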
/* Put together the *’s for the end of the machine code. */
public static String handleEnd()
Supply the string of *’s at the end.
Return line of code.
/* Put together the instruction String based on the parameters. In the example there are
two cases: MOVE, REG, CONST; and MOVE, REG, MEM. The code only has to
handle these cases. In order to deal with the constant operand the method
hexStringToInt() is used. In order to handle conversions to bytes and symbol table
entries, methods of the class MachineByteV1 are used. */
public static String handleMove(String destination, String source)
Deal with two cases:
MOVE REG CONST
(CONST → hex conversion)
MOVE REG MEM
(MEM → symbol table access)
Return line of code.
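One way to tell the two MOVE cases apart is sketched below. It rests on assumptions that are not in the original: a source operand found in the symbol table is treated as MEM, anything else is treated as a hexadecimal CONST, and operands are rendered as 8 binary digits. The names MoveOperandSketch, resolveSource() and toBinary8() are hypothetical, and toBinary8() stands in for the MachineByteV1 conversion methods. The opcode bits are left out because they depend on the MISC instruction set.

```java
import java.util.Hashtable;

class MoveOperandSketch {
    static Hashtable<String, Integer> symbolTable = new Hashtable<>();

    // Render the low 8 bits of a value as a string of 0's and 1's.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    // Resolve the source operand of MOVE to its 8-bit binary form.
    // Assumption: a known symbol is MEM; anything else is a hex CONST.
    static String resolveSource(String source) {
        Integer address = symbolTable.get(source);
        if (address != null) {
            return toBinary8(address);                  // MEM: symbol table access
        }
        return toBinary8(Integer.parseInt(source, 16)); // CONST: hex conversion
    }
}
```

Under this rule a variable whose name also happens to be valid hexadecimal is treated as MEM, since the symbol table is checked first.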
/* Put together the instruction string based on the parameters. There are two cases:
ADD, REG, CONST; and ADD, MEM, REG. */
public static String handleAdd(String destination, String source)
Deal with two cases:
ADD REG CONST
(CONST → hex conversion)
ADD MEM REG
(MEM → symbol table access)
Return line of code.
/* Put together the instruction string based on the parameters. There is only one case in
the example: SUB, REG, REG. */
public static String handleSub(String destination, String source)
Deal with:
SUB REG REG
Return line of code.
/* Look up the value in the symbol table and put together the instruction string. */
public static String handleJpos(String jumplabelname)
Deal with the operand.
(→ symbol table access)
Return line of code.
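The operand lookup for a jump might be sketched like this. The class name JposSketch and the method labelAddressBits() are hypothetical, Integer again stands in for MachineByteV1, and the 8-digit width is an assumption. Only the address part of the instruction is produced, since the opcode bits for JPOS are not given here.

```java
import java.util.Hashtable;

class JposSketch {
    static Hashtable<String, Integer> symbolTable = new Hashtable<>();

    // Return the label's address as 8 binary digits, or fail if the label is unknown.
    static String labelAddressBits(String jumplabelname) {
        Integer address = symbolTable.get(jumplabelname);
        if (address == null) {
            // A one-pass assembler can only resolve labels it has already seen.
            throw new IllegalStateException("Undefined label: " + jumplabelname);
        }
        String bits = Integer.toBinaryString(address & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }
}
```

The explicit check makes a forward reference fail loudly instead of surfacing later as a NullPointerException.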
/* Put the variable name as a key in the symbol table with the current linecount as the
value. Return a string with the value converted from hex to binary. */
public static String handleVariable(String variablename, String hexvalue)
Update symbol table.
(→ hex conversion)
Return line of code.
/* Convert a String of hexadecimal digits to its integer value. */
public static int hexStringToInt(String hexstring)
Return converted value.
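A possible body for this method is sketched below, written digit by digit to show the arithmetic. The enclosing class name HexSketch is hypothetical, and accepting lowercase digits and rejecting invalid characters with an exception are choices made for the sketch, not requirements from the assignment. The standard library call Integer.parseInt(hexstring, 16) does the same job in one line.

```java
class HexSketch {
    // Convert a String of hexadecimal digits to its integer value.
    static int hexStringToInt(String hexstring) {
        int value = 0;
        for (int i = 0; i < hexstring.length(); i++) {
            int digit = Character.digit(hexstring.charAt(i), 16); // -1 if not a hex digit
            if (digit < 0) {
                throw new NumberFormatException("Not a hex digit: " + hexstring.charAt(i));
            }
            value = value * 16 + digit; // shift previous digits left by one hex place
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(hexStringToInt("0A")); // prints 10
    }
}
```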