Unit-2 Fundamentals of Assembly Language and Assemblers

Elements of Assembly Language Programming
• An Assembly Language is a machine-dependent, low-level programming language which is specific to a computer system (or a family of computer systems).
• Machine dependent means that different machines can have different assembly instructions for performing the same tasks. A C program written for one machine (architecture) will generally work on another machine after recompilation, with few or no changes to the source, because the C compiler for each machine takes care of the architectural differences. Hence we say that C is a portable language. An assembly program written for one machine (architecture), by contrast, will not work on a different architecture.
• A low-level programming language is one that provides little or no abstraction from a computer's instruction set architecture. Generally this refers to either machine code or assembly language. The word "low" refers to the small or nonexistent amount of abstraction between the language and machine language; because of this, low-level languages are sometimes described as being "close to the hardware".

Three Basic Features of Assembly Language Programming
Compared to the machine language of a computer system, assembly language offers three features:
1. Mnemonic Operation Codes
2. Symbolic Operands
3. Data Declarations

1. Mnemonic Operation Codes
• Eliminate the need to memorize numeric operation codes.
• Enable the assembler to provide helpful diagnostics, for example, indication of misspelt operation codes.
• E.g., MOVEM, MULT, ADD etc.

2. Symbolic Operands
• Can be associated with instructions (e.g., a label used as a jump target) or with data.
• Can be used as operands in statements.
• The assembler performs memory bindings for these names; the programmer need not know any details of the memory bindings. This makes program modification easier.
• Memory binding is the activity of associating a memory address with a symbolic name.
• E.g., AREG, BREG, RESULT etc.

3. Data Declarations
• Data can be declared using various notations. For example, strings and numbers (integers, floats) can be supplied in notations such as decimal, hexadecimal, octal, etc.
• This avoids manual conversion of constants into their machine representation (i.e., binary).
• E.g., 123, 12.5, "HELLO", 'A' etc.

Description of Simple Assembly Language
Format of an assembly language statement:
[LABEL] <OPCODE> <OPERAND1> [, <OPERAND2> …]
• Parts specified within [ ] are optional.
• Most instructions need at least one operand and may take more than one. Some instructions, e.g. STOP, need no operand at all.
• The first operand is normally a register, which can be any one of AREG, BREG, CREG and DREG.
• The second operand refers to a memory word using a symbolic name (mostly a variable name; it can also be a register). Optionally, it can include a displacement and/or an index register.

The operand has the following syntax:
• <Symbol Name> [+Displacement] [(Index_Register_Number)]
• If used, the displacement is added to the memory address specified by Symbol_Name. It may be positive or negative. For example, if A is associated with memory address 100, then A+5 represents effective memory address 105, A-3 represents effective memory address 97, and so on.
• If an index register is used, its content is added to the memory address represented by Symbol_Name to get the effective memory address. For example, if index register 3 contains the value 10 and symbol A represents memory address 200, then A(3) represents effective memory address 210.
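
The effective-address rule above can be checked with a small Python sketch. The function name effective_address and the dictionaries used for the symbol table and the index registers are hypothetical, introduced only for illustration; the numbers are the ones used in the examples above.

```python
def effective_address(symbols, index_regs, name, displacement=0, index_reg=None):
    """Symbol address + displacement + contents of the index register, if one is given."""
    address = symbols[name] + displacement
    if index_reg is not None:
        address += index_regs[index_reg]
    return address


# Assumed register contents: index register 3 holds the value 10.
index_regs = {3: 10}

a_plus_5 = effective_address({"A": 100}, index_regs, "A", displacement=5)
a_minus_3 = effective_address({"A": 100}, index_regs, "A", displacement=-3)
a_indexed = effective_address({"A": 200}, index_regs, "A", index_reg=3)

print(a_plus_5, a_minus_3, a_indexed)  # 105 97 210
```

Displacement and index register are independent, so an operand that uses both simply adds both terms to the symbol's address.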

Opcode  Mnemonic  Remarks                                           Example
00      STOP      Stop execution                                    STOP
01      ADD       First operand is modified, condition code is set  ADD AREG, A
02      SUB       First operand is modified, condition code is set  SUB AREG, A
03      MULT      First operand is modified, condition code is set  MULT AREG, A
04      MOVER     Move to register from memory                      MOVER AREG, A
05      MOVEM     Move to memory from register                      MOVEM AREG, A
06      COMP      Sets condition code                               COMP AREG, BREG
07      BC        Branch on condition                               BC ANY, LOOP
08      DIV       First operand is modified, condition code is set  DIV AREG, A
09      READ      First operand is not used
10      PRINT     First operand is not used

"First operand is modified" means that the result of the operation is stored in the location specified by the first operand.

Thus, using the above statements, "A = B + C * D" can be carried out as below.
MOVER AREG, C
MULT AREG, D
ADD AREG, B
MOVEM AREG, A

Assembly Language Statements
• An assembly program contains three types of statements:
1. Imperative statements
2. Declaration statements
3. Assembler directives

1. Imperative Statements
• An imperative statement indicates an action to be performed during the execution of the assembled program.
• Each imperative statement typically translates into one machine instruction.

2. Declaration Statements
The syntax of declaration statements is as follows:
Label DS <Constant>
Label DC '<Value>'
• The DS (short for Declare Storage) statement reserves areas of memory and assigns names to them. Consider the following DS statements:
A DS 1
M DS 10
• The first statement reserves 1 word in memory and assigns the name A to it.
• The second statement reserves 10 words in memory and assigns the name M to the block. The name M is associated with the first word of the block. Other words in the block can be accessed through offsets from M, e.g., M+5 is the sixth word of the memory block reserved for M.
• The DC (short for Declare Constant) statement constructs memory words containing constants.
Consider the following DC statements:
A DC '5'
M DC '85'
M+1 DC '75'
M+2 DC '80'
• Contrary to the name "declare constant", the DC statement does not really implement constants; it merely initializes memory words to the given values.
• These values are not protected by the assembler; they may be changed by moving a new value into the memory word.
• For example, the value of A can be changed by executing the instruction:
MOVEM BREG, A
• A literal is an operand with the syntax ='<value>'. For example:
ADD AREG, 5 (the simple assembly language does not support this form)
ADD AREG, ='5'
• It differs from a constant because its location cannot be specified in the assembly program.
• This helps to ensure that its value is not changed during execution of a program.
• The value of the literal is protected because the name and address of this word are not known to the assembly language programmer.

3. Assembler Directives
• Assembler directives instruct the assembler to perform certain actions during the assembly of a program.
START <constant>
• This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word with address <constant>.
END [<operand spec>]
• This directive indicates the end of the source program. The optional <operand spec> indicates the address of the instruction where the execution of the program should begin.
• By default, execution begins with the first instruction of the assembled program.

Advantages of Assembly Language
• It is easier to change or correct an assembly language program than a machine language program, because the programmer does not have to change memory addresses; the assembler redoes the memory bindings.
• It is easier to understand and write than machine language. Hence, it saves the programmer time and effort.
• It has diagnostic capability for error detection (e.g., use of a misspelled mnemonic name).
• In situations where special hardware features of a computer are to be used, assembly language is the better choice.
• It can be used to write efficient and controlled code.

Design Specification of an Assembler
• To develop the design specification of an assembler, the following four steps are followed:
1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain and maintain the information.
4. Determine the processing necessary to perform the task.

Synthesis Phase of an Assembler
• Consider the assembly statement
MOVER BREG, ONE
• We must have the following information to synthesize the machine instruction corresponding to this statement:
1. Address of the memory word with which the name ONE is associated.
2. Machine operation code corresponding to the mnemonic MOVER.
• The first item of information depends on the source program. Hence it must be made available by the analysis phase.
• The second item of information does not depend on the source program. It depends only on the assembly language. Hence the synthesis phase can determine this information for itself.
• Based on the above discussion, we consider the use of two data structures during the synthesis phase: (1) the symbol table and (2) the mnemonics table.
• Each entry of the symbol table has two primary fields: name and address. The symbol table is built by the analysis phase.
• An entry in the mnemonics table has two primary fields: mnemonic and opcode.
• The synthesis phase uses these tables to obtain the machine address with which a name is associated and the machine opcode corresponding to a mnemonic, respectively.

Symbol table
Name  Address
ONE   201
N     303
…     …

Mnemonics table
Mnemonic  Opcode
ADD       01
SUB       02
…         …

Analysis Phase of an Assembler
• The primary function performed by the analysis phase is the building of the symbol table. For this purpose it must determine the addresses of the symbolic names used in a program.
• It is possible to determine some addresses directly, e.g.
the address of the first instruction in the program. Others must be inferred or calculated (e.g., the address of the current instruction is the address of the previous instruction plus the length of the previous instruction).
• Determining the addresses of all program elements is known as memory allocation. Memory allocation must therefore be completed in order to build the symbol table.
• To implement memory allocation, a data structure called the location counter (LC) is introduced.
• The location counter is always made to contain the address of the next memory word in the target program. It is initialized to the constant specified in the START statement.
• Whenever the analysis phase sees a label in an assembly statement, it enters the label and the contents of LC in a new entry of the symbol table. For example, the labels A and M in the following statements are entered this way:
A DC '5'
M DC '85'
• It then finds the number of memory words required by the assembly statement and updates the LC contents (hence the word "counter" in "location counter").
• To update the contents of LC, the analysis phase needs to know the lengths of different instructions. Hence, the mnemonics table can be extended to have one more field: length.
• Thus, the mnemonics table is accessed by both the analysis and the synthesis phase, whereas the symbol table is constructed during the analysis phase and used during the synthesis phase.
• The tasks performed by the analysis and synthesis phases are as follows:

Analysis phase
1. Isolate the label, mnemonic opcode and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC contents>) in a new entry of the symbol table.
3. Check the validity of the mnemonic opcode through a look-up in the mnemonics table.
4. Perform LC processing. That is, update the value contained in LC by considering the opcode and operands of the statement.

Synthesis phase
1. Obtain the machine opcode corresponding to the mnemonic from the mnemonics table.
2. Obtain the address of a memory operand from the symbol table.
3.
Synthesize a machine instruction or the machine form of a constant, as the case may be.

Pass Structure of Assemblers
• A pass is a complete scan of the source program.
• An assembler can be a single-pass or a two-pass assembler.
• A two-pass assembler handles forward references easily, whereas a single-pass assembler has to use back-patching to handle them.
• A forward reference is the use of a symbol before it has been defined.

Two-pass translation
• In the first pass, LC processing is performed and the symbols defined in the program are entered into the symbol table.
• The second pass performs synthesis of the target program. To do so, it uses the address information available in the symbol table.
• The first pass constructs an intermediate representation (IR) of the source program.
• The IR consists of the symbol table and the intermediate code (IC).

Single-pass translation
• Consider the following instruction, in which the variable SUM has not yet been declared:
MOVER BREG, SUM
• This statement can only be synthesized partially: the instruction opcode and the register operand BREG are assembled to reside in memory, while the operand address field for SUM stays incomplete. Assume that the instruction is stored at memory location 101.
• A Table of Incomplete Instructions (TII), i.e. of partially synthesized instructions, is maintained. An entry for the instruction discussed above is made in this table.
• Each entry has two fields: the instruction address and the name of the unresolved symbol, e.g. <101, SUM>.
• By the time the END statement is processed, the symbol table contains the addresses of all the symbols defined in the program and the TII contains information about all forward references.
• The assembler can now process each entry in the TII to complete the instruction concerned. For example, the entry <101, SUM> is processed by obtaining the address of SUM from the symbol table and inserting it in the operand address field of the instruction at location 101.

Design of a Two-pass Assembler
• Tasks performed by a two-pass assembler are as follows:
Pass-1
1.
Separate the symbol (label), mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform the LC processing.
4. Construct the IR.
Pass-2
1. Synthesize the target program.

Questions
1. Explain in brief the three features of an Assembly Language. What are the advantages of Assembly Language?
2. Explain in brief the simple Assembly Language.
3. Explain in detail the different types of statements in an Assembly program.
4. Explain in brief the Analysis and Synthesis phases (Design Specification) of an assembler.
5. Explain the pass structure of assemblers.
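
As a worked illustration of the two-pass design described in these notes, the following Python sketch runs Pass-1 (field separation, symbol table, LC processing, intermediate code) and Pass-2 (synthesis) on a tiny program. It is a minimal sketch under stated assumptions, not a definitive implementation: the opcodes come from the table in these notes, but the register codes, the one-word length of every instruction and DC, the (opcode, register, address) layout of a target word, and the function names are all hypothetical. Literals, BC condition names and register second operands are not handled.

```python
# Opcodes from the notes' table; register codes are assumed for this sketch.
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4,
           "MOVEM": 5, "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
KEYWORDS = set(OPCODES) | {"START", "END", "DS", "DC"}


def split_fields(line):
    """Isolate the label, mnemonic and operand fields of one statement."""
    tokens = line.replace(",", " ").split()
    label = None if tokens[0] in KEYWORDS else tokens.pop(0)
    return label, tokens[0], tokens[1:]


def pass_one(source):
    """Build the symbol table and the intermediate code using the LC."""
    symbols, intermediate, lc = {}, [], 0
    for line in source:
        label, mnemonic, operands = split_fields(line)
        if mnemonic == "START":
            lc = int(operands[0])        # LC starts at the START constant
            continue
        if mnemonic == "END":
            break
        if label is not None:
            symbols[label] = lc          # enter (symbol, LC contents)
        intermediate.append((lc, mnemonic, operands))
        # LC processing: DS reserves n words, everything else one word (assumed).
        lc += int(operands[0]) if mnemonic == "DS" else 1
    return symbols, intermediate


def pass_two(symbols, intermediate):
    """Synthesize the target program as a map from address to word."""
    target = {}
    for lc, mnemonic, operands in intermediate:
        if mnemonic == "DS":
            continue                     # storage is reserved, nothing is emitted
        if mnemonic == "DC":
            target[lc] = int(operands[0].strip("'"))
            continue
        register = REGISTERS.get(operands[0], 0) if operands else 0
        address = symbols.get(operands[-1], 0) if operands else 0
        target[lc] = (OPCODES[mnemonic], register, address)
    return target


program = [
    "START 100",
    "MOVER BREG, ONE",      # forward reference to ONE
    "ADD BREG, N",          # forward reference to N
    "MOVEM BREG, RESULT",
    "STOP",
    "ONE DC '1'",
    "N DC '5'",
    "RESULT DS 1",
    "END",
]

symbols, intermediate = pass_one(program)
target = pass_two(symbols, intermediate)
print(symbols)
print(target)
```

Because Pass-1 has finished the symbol table before Pass-2 starts, the forward references to ONE, N and RESULT need no back-patching; a single-pass design would instead record them in a TII and fill in the addresses at END.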