CS 431, Assignment 3, Assembler

This document gives information on these 2 topics:

1. The specific assignment requirements for the MISC assembler.
2. An outline of how the MISC assembler might be implemented.

1. Assignment Requirements

Your task is to write an assembler for MISC. This assembler will not be complete. It has to be functional from beginning to end, but it only needs to include the logic to handle the specific instructions that appear in the example test program. Including every possible combination of instructions and operands would be long and repetitive. In other words, you need to write a program that reads an assembly language source file and produces a machine language file that will run on the MISC simulation.

Background information on MISC is posted on the Web page. Information specifically on the MISC assembly language is given in the document miscinfoassembly.doc. That document should provide the background you need to understand the functionality of the assembler.

You may hand in your solution by email or on a diskette or CD. You may break your source code into as many classes as necessary. When handing in your assignment, provide a brief readme file in case the names of the files do not make it obvious how to compile and run your code.

Your program class or classes will be black box tested on the sumtenV1asm.txt file, which is posted on the Web page. These are the possible outcomes of the assignment:

1. Your code does not compile correctly: ¼ credit.
2. Your code compiles, but it has run time errors or some other run time problem that results either in no output or in output that is obviously incorrect: ½ credit.
3. Your code produces what superficially appears to be a correctly formatted machine language output file, but when this file is run on the MISC simulation, it does not give the right result: ¾ credit.
4. Your code produces an output file which gives the correct result when run on MISC: full credit.

2. Assignment Outline

An assembler is not a compiler. However, some of the basic ideas involved in an assembler are related to the logic of a compiler. The basic theoretical difference between the two is this: assembly language code does not have a hierarchical structure, so the assembler does not have to do syntax analysis, or parsing. Its functionality is linear. It does lexical analysis, or scanning, and machine language code generation. To do this it has to deal with identifiers and manage them with a symbol table.

A follow-up assignment to this one would be to write a compiler for a simple high level language that generates MISC assembly language as output. An assembler such as this one would then be needed to produce machine language code as the final result.

Some ideas about implementation follow. You may or may not find these ideas helpful, and you're not obligated to use them. I wrote my code in Java, and you may also be using an object-oriented language. However, the logic of my assembler was not object-oriented. In principle the assembler could be written as one long, complicated main() method based on loops and ifs. It is not immediately clear how to decompose the problem into classes. From the structured programming point of view, however, it is clear how to decompose it into a set of modules. The overall problem can be broken into manageable pieces by implementing the functionality in modules. Modules can be implemented as static methods in the program class alongside the main() method. Because of the simplified nature of MISC, some shortcuts can be taken in implementing the assembler.

What follows is a list of points describing the overall structure of my solution. After that comes a list of the individual modules, with a summary of what each one does.

1. The main program is structured as a loop that reads the source file, picks up lines of code, and keeps a count of them. The line count is used for various things. To save some parameter passing, it is declared as a static field of the program class (alongside main()) rather than as a local variable.
2. In addition to opening the source assembly language file, main() opens a target machine language file. Each time through the loop, if a call to a module returns a String, that String is appended to the target file as machine language output from the assembler.
3. The MISC assembly language requires that all variables be declared up front in the data segment. The example assembly language program does not contain any ifs or other statements that would cause a forward reference. Together, these two facts mean that it is possible to write a one-pass assembler.
4. Each line of MISC assembly language consists of 3 units separated by slashes. The assembler can be written to always pick up 3 units at a time. Depending on what kind of line of code it is, some of the units may be blank.
5. Inside the loop in main() is a large if statement. It checks the first unit in a line of code to see whether that unit is a directive, an instruction, or a variable. Depending on what the unit is, a module may be called. That module determines how to handle the line further and may return a String as output.
6. The symbol table is implemented as an instance of the Java class Hashtable (java.util.Hashtable). With put() you can enter a key and a value, and with get() you can later retrieve the value by its key. To save parameter passing, the symbol table is also declared as a static field of the program class.
7. A Hashtable key has to be a reference to an object of a class that overrides both the hashCode() and equals() methods consistently. The String class satisfies this requirement, so String keys can hold the identifiers for variables and labels.
8. The values in a Hashtable also have to be references (objects, not primitive ints). The symbol table values for labels and variables are numeric. Instead of writing a new class to hold symbol table values, you can use the MachineByteV1 class. This class has the methods copyIntToByte() and getStringFromByte(). They can convert between an integer, such as the line count in the assembler, and the String of binary digits that represents it. That String might be needed, for example, to represent a label address in the generated code.
9. Every unit that may legally appear at the beginning of a line of code is tested for explicitly in the if statement. For that reason it is not necessary to pre-load the symbol table with directives or instructions.
10. The .DATA and .CODE directives have minimal meaning. They are a convenience for the assembly language programmer, who then does not have to write blank lines to fill out the 8 lines reserved for data declarations. To handle these directives correctly, make sure the line count is correct and that the directive lines themselves are not included in the count.
11. The .LABEL directive has more meaning. It is followed by an identifier, which has to be entered into the symbol table. The value that goes with the identifier is its address, namely the current line count. Because directive lines are not counted, this is its position in the machine code, not its raw line number in the source file.
12. The default case in the assembler's if statement is for variables. If the first unit in a line doesn't match any directive or instruction, it must be a user-declared variable name. The name is entered into the symbol table with its address (the current line count), and its initial value is converted for the machine language output.
13. In MISC, the general category of an instruction is identified by its name. For example, if the first unit in a line of code is ADD, this is an ADD instruction. However, there are different machine language ADD instructions depending on the operands, and you can only tell which one applies by examining the operands. The module for handling ADD receives these operands as parameters, determines what kind they are, and acts accordingly.
14. It is also helpful to write another static method that converts a String of hexadecimal digits to its integer value. This is needed to deal with numeric constants in the source code. The MachineByteV1 methods then make it possible to convert from an integer to a String of binary digits.

Here are the signature lines for the methods in the implementation. Each is preceded by the comment from my code and followed by a summary of what happens in it:

public static void main(String[] args)
  Declare the line count and the symbol table as static fields of the program class. Loop, reading characters and forming the units of the lines of assembly code. Call the translation modules in an if statement based on the contents of the first unit. Write the translation out based on the module return values. Increment the line count.

/* Decrement linecount. */
public static void handleData()
  Keep the line count correct.

/* Put together blank lines for unused variables if necessary and make sure the linecount is set to 8. */
public static String handleCode()
  Fill in blank lines if there were fewer than 8 variables. Keep the line count correct.

/* Enter the String as a key in the symbol table with the current linecount as the value. Decrement linecount. This method relies on the MachineByteV1 class. */
public static void handleLabel(String labelname)
  Make an entry in the symbol table. Keep the line count correct.

/* Put together the *'s for the end of the machine code. */
public static String handleEnd()
  Supply the string of *'s at the end. Return the line of code.

/* Put together the instruction String based on the parameters. In the example there are two cases: MOVE, REG, CONST; and MOVE, REG, MEM. The code only has to handle these cases. The method hexStringToInt() is used to deal with the constant operand. Methods of the class MachineByteV1 are used to handle conversions to bytes and symbol table entries. */
public static String handleMove(String destination, String source)
  Deal with two cases: MOVE REG CONST (CONST hex conversion) and MOVE REG MEM (MEM symbol table access). Return the line of code.

/* Put together the instruction string based on the parameters. There are two cases: ADD, REG, CONST; and ADD, MEM, REG. */
public static String handleAdd(String destination, String source)
  Deal with two cases: ADD REG CONST (CONST hex conversion) and ADD MEM REG (MEM symbol table access). Return the line of code.

/* Put together the instruction string based on the parameters. There is only one case in the example: SUB, REG, REG. */
public static String handleSub(String destination, String source)
  Deal with SUB REG REG. Return the line of code.

/* Look up the value in the symbol table and put together the instruction string. */
public static String handleJpos(String jumplabelname)
  Deal with the operand (symbol table access). Return the line of code.

/* Put the variable name as a key in the symbol table with the current linecount as the value. Return a string with the value converted from hex to binary. */
public static String handleVariable(String variablename, String hexvalue)
  Update the symbol table. Convert the value (hex conversion). Return the line of code.

/* Convert a String of hexadecimal digits to its integer value. */
public static int hexStringToInt(String hexstring)
  Return the converted value.
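Point 4 above says every line can be handled as three slash-separated units, some of which may be blank. The outline's main() forms the units while reading characters. The sketch below swaps in a simpler approach that calls String.split() on a line that has already been read. It is only an illustration. The method name splitUnits() and the sample operands (R1, R2) are hypothetical, and the exact MISC line syntax should be checked against miscinfoassembly.doc.

```java
import java.util.Arrays;

// Hypothetical helper: split one MISC source line into exactly three units.
// Assumes units are separated by '/'; missing units become "" and any
// units beyond the third are ignored.
public class UnitSplitter {

    public static String[] splitUnits(String line) {
        // A limit of -1 keeps trailing empty units, e.g. "X/0A/" -> ["X", "0A", ""].
        String[] parts = line.split("/", -1);
        String[] units = {"", "", ""};
        for (int i = 0; i < units.length && i < parts.length; i++) {
            units[i] = parts[i].trim();
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitUnits("SUB/R1/R2"))); // [SUB, R1, R2]
        System.out.println(Arrays.toString(splitUnits(".DATA")));     // [.DATA, , ]
    }
}
```

Because the result always has three entries, the if statement in main() can test units[0] and pass units[1] and units[2] to a module without checking the array length first.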
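The one-pass line-count bookkeeping in points 1, 10, 11 and 12 can be sketched as a loop over a java.util.Hashtable. The sketch follows the handleData(), handleCode(), handleLabel() and handleVariable() summaries above. It rests on several stated assumptions:

- The line count starts at 0.
- The loop increments it after every line, as main() does.
- Integer values stand in for the MachineByteV1 objects.
- The instruction handlers and handleEnd() are left out.
- handleCode() returns a count of blank lines rather than the padding String itself.
- The sample program and its names (X, Y, LOOP, R1, R2) are hypothetical.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

// Sketch of one-pass symbol table and line-count bookkeeping.
// Assumptions: the line count starts at 0, the loop increments it after
// every line, and Integer values stand in for MachineByteV1 objects.
public class SymbolTableSketch {

    // Static fields shared by the handler methods, as in the outline.
    static Hashtable<String, Integer> symbolTable = new Hashtable<>();
    static int linecount = 0;

    static final int DATA_LINES = 8;  // the data segment always fills 8 lines
    static final List<String> INSTRUCTIONS = Arrays.asList("MOVE", "ADD", "SUB", "JPOS");

    // .DATA produces no machine code, so cancel the loop's increment.
    static void handleData() {
        linecount--;
    }

    // Returns how many blank data lines must be emitted before the code.
    static int handleCode() {
        int blanks = DATA_LINES - linecount;
        linecount = DATA_LINES - 1;  // the loop's increment brings it to 8
        return blanks;
    }

    // A label's value is the address of the next instruction.
    static void handleLabel(String labelname) {
        symbolTable.put(labelname, linecount);
        linecount--;  // the .LABEL line itself produces no machine code
    }

    // A variable's symbol table value is its address (the current line count).
    static void handleVariable(String variablename) {
        symbolTable.put(variablename, linecount);
    }

    static void run(String[][] program) {
        symbolTable.clear();
        linecount = 0;
        for (String[] units : program) {
            String first = units[0];
            if (first.equals(".DATA")) {
                handleData();
            } else if (first.equals(".CODE")) {
                System.out.println("blank data lines: " + handleCode());
            } else if (first.equals(".LABEL")) {
                handleLabel(units[1]);
            } else if (INSTRUCTIONS.contains(first)) {
                // An instruction handler would build a machine code line here.
            } else {
                handleVariable(first);  // default case: a variable declaration
            }
            linecount++;  // every line is counted; directives compensate above
        }
    }

    // Hypothetical program, already split into units.
    static final String[][] SAMPLE = {
        {".DATA", "", ""}, {"X", "0A", ""}, {"Y", "01", ""}, {".CODE", "", ""},
        {"MOVE", "R1", "X"}, {".LABEL", "LOOP", ""}, {"SUB", "R1", "R2"}, {"JPOS", "LOOP", ""}
    };

    public static void main(String[] args) {
        run(SAMPLE);
        System.out.println(symbolTable.get("X") + " " + symbolTable.get("Y") + " " + symbolTable.get("LOOP"));
    }
}
```

With the sample program, the two variables land at addresses 0 and 1, and six blank data lines are needed. The first instruction (MOVE) sits at address 8. LOOP maps to 9, the address of the SUB that follows it, because handleLabel() cancels the increment for its own line.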
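Point 13 says an instruction module must inspect its operands to decide which machine instruction to emit. The sketch below shows that decision for the two ADD cases listed above (ADD REG CONST and ADD MEM REG). The register names R1 to R4 are a hypothetical assumption, not taken from the MISC documentation. An operand found in the symbol table is treated as a memory variable, and anything else as a hex constant. The sketch returns a description of the case rather than the actual encoding, which depends on the MISC instruction formats.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// Sketch of how handleAdd() might distinguish its operand cases.
// Hypothetical assumptions: registers are named R1..R4, an operand found in
// the symbol table is a memory variable, and anything else is a hex constant.
public class OperandSketch {

    enum Kind { REG, MEM, CONST }

    static final Set<String> REGISTERS = new HashSet<>(Arrays.asList("R1", "R2", "R3", "R4"));
    static Hashtable<String, Integer> symbolTable = new Hashtable<>();

    static Kind classify(String operand) {
        if (REGISTERS.contains(operand)) {
            return Kind.REG;
        }
        if (symbolTable.containsKey(operand)) {
            return Kind.MEM;
        }
        return Kind.CONST;
    }

    // Returns which ADD case applies; a real handler would return the encoded line.
    public static String handleAdd(String destination, String source) {
        Kind d = classify(destination);
        Kind s = classify(source);
        if (d == Kind.REG && s == Kind.CONST) {
            return "ADD REG CONST";  // source would go through hexStringToInt()
        }
        if (d == Kind.MEM && s == Kind.REG) {
            return "ADD MEM REG";    // destination's address comes from the symbol table
        }
        throw new IllegalArgumentException("Unsupported ADD form: " + destination + ", " + source);
    }

    public static void main(String[] args) {
        symbolTable.put("SUM", 0);                   // hypothetical variable at address 0
        System.out.println(handleAdd("R1", "01"));   // ADD REG CONST
        System.out.println(handleAdd("SUM", "R1"));  // ADD MEM REG
    }
}
```

Rejecting every other combination with an exception matches the assignment's scope. The assembler only has to handle the forms that appear in the example program, and an unexpected form should fail loudly rather than produce wrong machine code.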
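Point 14 and the hexStringToInt() signature above call for a hex-to-integer conversion. A minimal sketch follows, together with a hypothetical toBinaryByte() helper that formats a value as 8 binary digits. toBinaryByte() only stands in for what the outline does with MachineByteV1's copyIntToByte() and getStringFromByte(), whose exact signatures are not given here. It assumes an 8-bit value and keeps only the low 8 bits.

```java
// Sketch of the numeric conversions described in point 14.
// toBinaryByte() is a hypothetical stand-in for the MachineByteV1 methods.
public class ConversionSketch {

    // Convert a String of hexadecimal digits (e.g. "1A") to its integer value.
    public static int hexStringToInt(String hexstring) {
        int value = 0;
        for (char c : hexstring.trim().toCharArray()) {
            int digit = Character.digit(c, 16);  // -1 if c is not a hex digit
            if (digit < 0) {
                throw new IllegalArgumentException("Not a hex digit: " + c);
            }
            value = value * 16 + digit;  // shift one hex place and add the digit
        }
        return value;
    }

    // Format the low 8 bits of value as exactly 8 binary digits.
    public static String toBinaryByte(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');  // left-pad with zeros
    }

    public static void main(String[] args) {
        System.out.println(hexStringToInt("1A"));               // 26
        System.out.println(toBinaryByte(hexStringToInt("0A")));  // 00001010
    }
}
```

Java's built-in Integer.parseInt(hexstring, 16) performs the same conversion. The hand-written loop just makes the digit-by-digit logic explicit.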