Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CPE 471 Assemblers 1 Assembly Instructions Four parts to an instruction: We are using a flexible format Label(s) Operation Operands Comments Instruction is still on one line (except label) Spaces and tabs can appear between “tokens” Alternative: fixed format Label in column 1; Op in column 16; Operands in col 24, etc. CPE 471 2 Assembly Instructions Label definition: Symbolic name for instructions address Clarifies branching to an address and data's address Often severely restricted in length Starts anywhere before operation Contains letters (a-zA-Z), digits, ‘$’, ‘_’, ‘.’ CPE 471 3 Assembly Instructions Operation field Operand field Mnemonic for an instruction (ADD, BRA) or Mnemonic for a pseudo-instruction (DC, EQU) Addresses and registers used by instruction In other words, what to add, where to branch Comment field: no affect on assembler Used for documentation. CPE 471 4 What's an Assembler Translation: Source program translated into target Source and target define levels Source is not directly executed (what is that?) Target or object file is executed or translated later Assembly language means the source is a symbolic representation of machine language When the source is higher level the translator is usually called a compiler CPE 471 5 Advantages of Assembly Compared to machine code Easier to remember (HLT vs 26577) (symbolic) Similarly for addresses in program (symbolic) BGT loop1 vs BGT 6210 Over high-level language Access to full capabilities of the machine testing overflow flag, test & set (atomic), registers Speed CPE 471 6 Advantages of Assembly Often a holdover from when machines were expensive and people where cheap Systems programming often done in a language like C Old myth: "if a program will be used a lot, it should (for efficiency) be written in assembly" Hard to write (10 lines/day independent of language) Hard to read: high maintenance, high turnover Reality: good compilers, fast machines CPE 471 7 Modern Approach Write in high level language Analyze to find where time spent Invariably, it's a tiny part of the code Improve that tiny part (perhaps with assembly) Problem oriented language allows high level insights Algorithmic insights save tremendously Assembly programmer immersed in bit-twiddling (penny wise, pound foolish) CPE 471 8 Types of Assemblers Assemblers can be classified according to the number of passes (scanning the source file) to: One-pass assembler: some restrictions are imposed on the forward referencing such as eliminating forward references to data. Two-pass assembler: the only restriction on forward referencing is in the symbol definition, i.e., all assembler directives (e.g. EQU and ORG) that define symbols can only use symbols that are previously defined. Multi-pass assembler, restrictions are made on the level of nesting in forward referencing. CPE 471 9 Assembler Tasks Parse assembly instructions Assigns addresses to instructions Check for syntax Tokenize string Maintains location counter LC = eventual location in memory of this instruction Generate machine code Evaluation mnemonic instructions (replace w/opcode) Evaluate operand sub-field (replace symbols with value) CPE 471 10 Assembler Tasks Process pseudo-ops Concatenate to form instructions generate header record evaluate DC, DS, … etc Write output object file Nothing here seems all that hard! CPE 471 11 Example: First Attempt Read each input line and generate machine code Associate symbol with location counter Lookup mnemonic and get opcode Generate instruction Example: CPE 471 12 Example Test A Begin N ANS ORIG EQU LD LD ST HLT DC DS END $0100 16 R0,N R1, #A R0, ANS 13 1 Begin CPE 471 13 Data Structures Location counter (LocCounter) Search a table (OpTable) for mnemonic Get opcode Prepare to handle arguments (group like instructions together?) Translate arguments Lookup/add symbol names (SymTable) and replace with location Watch out for relative offsets CPE 471 14 Location Counter Eventual address of instruction Initialized with 0 Increment with each instruction (see OpTable). Always two for our machine CPE 471 15 2-pass Assembler Solution: 2-pass assembler Pass #1: identify and define all labels As location of each label is determined, save in SymTable Requires some knowledge of instructions Pass #2: generate object code Whenever a label is used, replace with value saved in table CPE 471 16 Pass I: Determine length of machine instructions. Keep track of Location Counter (LC). Generate a table of symbols for pass 2. Process some pseudo operations. Remember literals. CPE 471 17 Pass II Look up value of symbols. Generate machine code for instructions. Generate data for DS, DC, and literals. Process pseudo operations. CPE 471 18 Pass 1 & 2 Communication Scan text file twice Save symbol locations first pass, then plug in 2nd Simpler Disadvantage: slow 6,800 instructions medium size file 52,000 instructions large file CPE 471 19 Big Picture: Labs 1 & 2 Assembly File Assembler Object File Linker Lab 2 Disassembler Lab 1 Lab 3 Executable File CPE 471 20 Table Driven Software Many times software is very repetitive Many times the information processes is repetitive Use functions! Use loops Many times the information to write the code is repetitive but static: Use tables CPE 471 21 Table Driven Software Easier to modify: add new entries, change existing ones, well centralized Code is easier, eventually, to understand Works if there are not many exceptions CPE 471 22 Machine OP Table Mnemonic Name Opcode Instruction Size Instruction Format This table is static (unchanging) For our machine: All opcodes are 6 bits Instructions size is one or two words. Formats do differ CPE 471 23 Machine Op Table Variable length instructions: "Branch relative": PC <- PC + operand Near: operand is 9 bits, far operand is 16 bits. Varying formats Fixed format makes parsing simple CPE 471 24 Machine Op Table Fields might include: Name: Type: opcode: Size: “add” ALU 000000 2 or 4 One entry for each instruction CPE 471 25 Symbol Table Pass 1: Each symbol is defined. Every time a new symbol seen (i.E., In a label) put it in the symbol table If already there, error CPE 471 26 Symbol Table (Pass #1) Each symbol gives a value Explicit: A EQU 16 Easy: just put operand in table Implicit: N DC 20 Must know address of instruction Therefore, keep track of addresses as program is scanned Use location counter (LC) CPE 471 27 Symbol Table (Pass #2) Symbols in operand replaced with value Look up symbol in symbol table If there, replace with value Else, ERROR CPE 471 28 Literals Implicit allocation & initialization of memory Allows us to put the value itself right in the instruction Prefice with “=“ Example: LD D3,=#16 This means: allocate storage at end of program initialize this storage with the value #16 use this address in the instruction CPE 471 29 Literal Table Pass #1 To complete pass #1: literals are identified and placed in the table “name”, “value”, and “size” fields updated duplicates can be eliminated literals are added to the end of the program “address” field can now be calculated Pass #2: literals in instructions are replaced with the __________ field from the Literal Table what if the literal is not in the table? CPE 471 30 Pseudo Operations Unlike operations, typically do not have machine instruction (opcode) equivalents Give information to the assembler itself. Another term: assembler directive Not intended to implement higher level control structures (if-then, loops) Uses: segment definition, symbol definition, memory initialization, storage allocation Low level bookeeping easily done by machine CPE 471 31 Psuedo-op Table Also a static table Some lengths are 0, some 1, others? CPE 471 32 Object File Header record: contains information on program length, symbols that other modules may be looking for, and the name of this module. Format: 1 H for header 2-5 Program length in HEX + a space 6-80 List of entry names and locations Entry name (followed by a space). Entry location in HEX 4 columns + a space The first entry should be the first ORG. CPE 471 33 Object File II Text (type T): contains the hex equivalent of the machine instruction. Format: 1 T for text 2-5 Address in HEX where this text should be placed in memory. 6-80 Instructions and/or data in HEX. Each word (byte) should be followed by an allocation character to determine the type: S (Absolute – Does not need modification), R (Relative: needs relocation), and X (external). CPE 471 34 Object File III End (type E): indicates the end of the module and where the execution would begin. Format: Column 1 2-5 E for END Execution start address in HEX CPE 471 35 Information Flow Pass 1/2 CPE 471 36 Two Pass Assembler: Limitations Q: Does our 2-pass approach solve all forwardreference problems? A: no! Something is still broken… CPE 471 37 Forward Reference Restriction Consider: Y EQU 0 X EQU Y To avoid this trouble, impose a restriction: X EQU Y Y EQU 0 ------------- What about DS? Is a forward symbol allowed as the operand? CPE 471 38 Containers What does this mean We can't just generate a machine instruction from each assembly instruction -- must save info We need to start using some data structures! Table means any data structure or container CPE 471 39 Absolute Programs Programmer decides a priori where program will reside e.g., Prog ORIG $3176 But memory is shared concurrent users, concurrent jobs/processes several programs loaded simultaneously CPE 471 40 Absolute Programs: Limitation CPE 471 41 Absolute Programs: Limitation II Would like the loading to be flexible decide at load time where it goes! (not at ____________ time) this decision is made by __________________ What the programmer wants: “find a free slot in memory that is big enough to fit this program” CPE 471 42 Motivating Relocation -Example Prog X Start Y ORIG 0 DC.W Y LD.L D1,X ST.L D1,Y HLT DS.W 1 END Start •In Memory, this program appears as: 0000 0002 0004 0006 0008 000A 000C CPE 471 43 Motivating Relocation -Example * One slight change Prog ORIG $0100 X DC.W Y Start LD.L R1,X ST.L R1,Y HLT Y DS.L 1 END Start •In Memory, this program appears as: 0100 0102 0104 0106 0108 010A 010C CPE 471 44 Relocation The loader must update some (parts of) text records, but not all after load address has been determined The assembler does 2 things: assemble with a load address tell the loader which parts will need to be updated CPE 471 45 Modification Records One approach: define a new record type that identifies the address to be modified We could add the following record to the object file: M address e.g. M 0004 Also need to indicate size of quantity being relocated: Two sizes: 9 bit pgoffset and 16 bit full word M0000_16 M0001_9 M0002_9 One disadvantage of this approach: - CPE 471 46 Alternative: Bit Masks Use 1 bit / memory cell (or byte) Size of relocation data independent of number of records needing modification bit value is 0 means no relocation necessary bit value is 1 means relocation necessary but more densely packed (1 bit / text record) Hard to read (debug, grade,…) CPE 471 47 Kinds of Data Our machine has two flavors of data: relative (to the load address) absolute • The first must be modified, the second not • Let’s look at how these kinds arise… CPE 471 48 Symbols Some are relative: Some are absolute: e.g., e.g., Symbol Table CPE 471 49 Searching We must associate a name with a value. Example: A symbol table is a collection of <key, value> pairs: Search is given a key, return the corresponding value. Very important for assemblers. Every line has an instruction (look up in MOT or POT). Lots of symbols (or literals) used. 50% of time searching tables. CPE 471 50 Searching Cover several methods: Linear Binary Hash (best for assembler) For each method, lots of analysis (i.E. Theory and mathematics) about how long they take and why CPE 471 51 Linear Search: Performance For a table of length N, how long to insert? Constant time: O(1) (eg. fast) Why? Insert at beginning or end How long to find? Best case: 1 (i.e. At first location) Worst case: O(N) (i.e. At last location) average case: (N+1)/2 = O(N). How? CPE 471 52 Linear Search: Performance One way to lower average case is to change the probability: put most likely matches at beginning All this analysis assumes name is in the table If not present, time is N. Average case should consider this too What does this mean? Time to find one item is linear (double size, twice as long). N CPE 471 53 Binary Search This algorithm requires: Sorted table There is a "<" operator Constant time access to arbitrary keys Note: each iteration divides the problem in half How long? Best case: 1. Worst case lg N CPE 471 54 Compromise In the assembler, we do all insertions then the bulk of the searching So insert as a linear table: O(1) Then sort: O(N lg N) Then binary search the sorted table When do we detect duplicates Insertion, or After sort CPE 471 55 Hash Tables Combines advantage of binary search (fast retrieval) and advantage of linear search (fast insertion) Basic idea: we want to use the key as an index in an array so it only takes one step to find something: table["key"] Problem: strings (or any other data type) are not integers. If we gave strings an ordinal value, way too big 128^8 =72 quadrillion CPE 471 56 Hash Tables Solution: Use a function that transforms our data into an integer. This is called the hash function, h: h: K --> Z(m) where Z(n) = {1,2,... m}; m = table size; K = set of possible keys An ideal function generates a unique integer for each key (problem: 72Quadrillion to 4billion integers) CPE 471 57 Hash Table To insert <key, value>, we want to do: To find key: table[h (key)] = value return table[h (key)]; Since |K| >> m what we look for is a hash function h that returns a uniform distribution CPE 471 58 Hash Functions We want to design h so that small differences in key makes a big difference in h(key). H is like a deterministic random number generator. Why? Digital signatures (md5, pgp) sometimes called hash function Need to find the key after insertion CPE 471 59 Hash Function Example h(key) = (sum of letter values) mod 50 + 1 h("magenta") = (13+1+7+5+14+20+1) mod 50 + 1 = (61 mod 50) + 1 = 12 But two different keys can get mapped to the same value: h("rub") = 18+21+2 = 41; h("dream") = 4+18+5+1+13 = 41 This is called a collision. CPE 471 60 Dealing With Collisions Go to next open cell. (X2,X3,X4 hash to same value) X1 X2X3X4 Problem: leads to clustering: Rehash with another, different function: Cost: computation, some clustering CPE 471 61 Dealing With Collisions Keep a linked list X1 X2 X3 X4 CPE 471 62 Hash Tables: Complexity How long does insertion take (w/ chaining) Number of attempts depends on how full the table is - 1st entry never collides Assume there are n elements in table (size is m) Average time for an unsuccessful search: Average time to search to end of a list Average list length: r = n/m Including hash and indexing: O(1 + r) CPE 471 63 Expressions Most assemblers permit use of expressions Used as instruction operands in machine op’s and pseudo op’s Typically simple mathematical forms only Two parts: operators (+, -, *, /) individual terms CPE 471 64 • Individual terms may be: constants (e.g., #4, ‘A’, $3F) user-defined symbols (e.g., X, Buff) special terms (e.g., * for LC) Examples Buff DS 4 Buff DS 4 LD D2, Buff Bend EQU * LD D2, Buff+1 Len CPE 471 EQU Bend - Buff 65 Relocation Expressions are evaluated at ____________ Expressions can be relative or absolute Intuition: (not entirely true… as we’ll see later) the value of an absolute expression _________ __________ with program relocation What are the “rules” for well-formed expressions? CPE 471 66 Absolute Expressions An expression is absolute iff: 1. contains only absolute terms, OR 2. contains relative terms provided: i) they occur in pairs, AND ii) terms in each pair have opposite sign, AND iii) relative terms do not enter in * or / Examples of 1: 3+4+x2 2*X where: Examples of 2: Buff2-Buf CPE 471 67 Relative Expressions An expression is relative iff: 1. all relative terms can be paired as above, except one, AND 2. that remaining unpaired term is positive, AND 3. relative terms do not enter into * or / Examples: Buff+6 CPE 471 68