Assembly Language - Kirkwood Community College

Assembly Language, part 1

Some terminology
• Assembly language: a low-level language that is a little more human-friendly than machine language
• Assembler: a program that translates assembly language source code into executable form
• Object code: machine language program (assembler output)

Types of assemblers
• Resident assembler: an assembler written for its own platform, using the native instruction set
• Cross assembler: an assembler run on one platform to produce object code for another
• Disassembler: a program that attempts to recover source code from object code (not 100% successful)

Assembly language instructions
• Mnemonics: abbreviated words used instead of machine language hex code
  – have a one-to-one correspondence with the underlying instruction
  – always possible to determine the underlying machine language statement from the assembly language mnemonic, but not vice versa
• Pseudo-ops: assembly language statements used mostly for data declaration; they do not correspond to specific machine language instructions

Pep/8 assembly language
• General syntax notes:
  – one instruction per line of code
  – comments start with a semicolon and continue until end of line
  – mnemonics and pseudo-ops are not case-sensitive (symbols are; see Symbols)
  – spacing:
    • at least one space required after each instruction (mnemonic or pseudo-op)
    • otherwise doesn't matter
  – last line of program must be the .END pseudo-op

Pep/8 Assembly Language
• Mnemonic instruction format:
  – 2-7 letter instruction specifier (most are 3-4 letters; LDBYTEA is one of the longest)
  – operand specifier, usually followed by a comma and
  – 1-3 letter address mode specifier (most are 1)
• Examples:
        LDA 0x0014,i    ; load hex value 14 into A
        LDX 0x1110,d    ; load data at address 1110 into X
• The entire Pep/8 assembly language instruction set is printed on the inside front cover (and on page 191) of your textbook

Pep/8 Assembly Language
• Addressing mode specifiers:
  – i: immediate
  – d: direct
  – n: indirect
  – s: stack-relative
  – sf: stack-relative deferred
  – x: indexed
  – sx: stack-indexed
  – sxf: stack-indexed deferred

Pep/8 Assembly Language
• Unimplemented opcodes
  – instructions available at the assembly language level, even though they are not (directly) available at the machine language level
  – represent operations handled by the operating system
• They include:
  – NOPn: unary no-operation trap
  – NOP: non-unary no-operation trap
  – DECI: decimal input trap
  – DECO: decimal output trap
  – STRO: string output trap

Pseudo-ops
• .ADDRSS: used to create labeled jump destinations
• .ASCII: specifies a character string
• .BLOCK: allocates the specified number of bytes and initializes the whole set to zero
• .BURN: used for OS configuration
• .BYTE: allocates one byte; can specify hex or decimal content
• .END: stop code
• .EQUATE: equates a symbol with a literal value; like #define in C/C++
• .WORD: allocates one word (two bytes) of memory

Example program 1
        ; Program example 1
        CHARO 0x0010,d
        CHARO 0x0011,d
        CHARO 0x0012,d
        CHARO 0x0013,d
        CHARO 0x0014,d
        STOP
        .ASCII "Arrr!"
        .END
• Comment: the first line
• Instructions: each CHARO outputs one character; the starting address of the program is 0000, each instruction (except STOP) is 3 bytes long, and STOP is one byte
• Data: the .ASCII string, which therefore starts at address 0010

Object code & assembler output from program example 1 (figure not reproduced)

Program example 2
        BR 0x0008       ; go past data; minimizes offset calculations
                        ; (contrast with data after code)
        .ASCII "#?"     ; "constant" declarations
        .ASCII "\n"
        .BLOCK 2        ; "variable" declaration
        CHARO 0x0003,d  ; output prompt
        CHARO 0x0004,d
        DECI 0x0006,d   ; read number
        CHARO 0x0005,d  ; output newline character
        DECO 0x0006,d   ; output number
        STOP
        .END

Data declaration & storage
• The previous example included two different types of data declaration instructions:
  – the .ASCII pseudo-op allocates a contiguous set of bytes large enough to hold the specified data; as with Java & C++, the backslash character (\) is the escape character for control codes, like newline
  – the .BLOCK pseudo-op allocates the specified number of bytes and initializes their values to 0

Data declaration & storage continued
• Two other pseudo-ops provide data storage:
  – the .WORD instruction allocates two bytes, suitable for storage of integers
  – the .BYTE instruction allocates one byte, suitable for storage of characters
  – like .ASCII (and unlike .BLOCK), both of these instructions allow the programmer to specify initial values

Initialization examples
        .WORD 7         ; allocates 2 bytes, with decimal value 7
        .BYTE 0x2B      ; allocates 1 byte, with hex value 2B ('+')

I/O instructions
• The DECI and DECO instructions considerably ease the process of reading and writing numbers
• Each one deals with word-size data and represents an instruction not available in the underlying machine language; thus they are part of the set of unimplemented opcodes
• The actual I/O is performed by the operating system; the instructions generate program interrupts that allow the OS to temporarily take over to provide a service to the program

I/O instructions continued
• The CHARI and CHARO instructions are simply assembly language versions of the machine language input and output instructions:
  – read or write a byte of data
  – data source (for output) and destination (for input) are memory (not registers)

I/O instructions continued
• STRO is yet another example of an unimplemented opcode
• Outputs a string of data
  – string can be predefined with the .ASCII pseudo-op
  – predefined string must be terminated with a null character: "\x00"

Arranging instructions and data
• In the first program example (see Monday's notes), as with all of the machine language examples, instructions were placed first, ended with a STOP code, and data followed
• Problems with this approach:
  – requires address calculations based on the number of instructions (which may not be known as you're writing a particular instruction)
  – addresses may have to be adjusted if even minor changes are made to the program

Putting the data first
• An easy solution to the problems described on the previous slide was illustrated by program example 2; the solution is twofold:
  – declare the data first
  – place an unconditional branch instruction at the beginning of the program, pointing to the first instruction after the data
• The following example provides another illustration

Program example 3
        br 0x0020       ; bypass data
        .block 4        ; space for 2 ints (addresses 0003 and 0005)
        .ascii "Enter a number: \x00"   ; starts at 0007
        .ascii " + \x00"                ; starts at 0018
        .ascii " = \x00"                ; starts at 001c
        stro 0x0007,d   ; prompt
        deci 0x0003,d   ; get 1st number
        stro 0x0007,d   ; prompt
        deci 0x0005,d   ; get 2nd number
        deco 0x0003,d   ; output 1st number
        stro 0x0018,d   ; output string " + "
        deco 0x0005,d   ; output 2nd number
        stro 0x001c,d   ; output string " = "
        lda 0x0003,d    ; put the first # in A
        adda 0x0005,d   ; add 2nd # to first
        sta 0x0003,d    ; store sum
        deco 0x0003,d   ; output sum
        stop
        .end

Program example 4: using labels
        br code
pirate: .ASCII "Arrr!\x00"
code:   stro pirate,d
        STOP
        .END

Symbols
• Symbols are assembler names for memory addresses
• Can be used to label data or instructions
• Syntax rules:
  – start with a letter
  – contain letters & digits
  – 8 characters max
  – case sensitive
• Define by placing the symbol label at the start of a line, followed by a colon

Symbol Table
• Assembler stores labels & corresponding addresses in a lookup table called the symbol table
• Value of a symbol corresponds to the address of the 1st byte (of the data or instruction)
• Symbol table only stores label & address, not the nature of what is stored
• An instruction can still be interpreted as data, & vice versa

Example
Example program:
this:   deco this,d
        stop
        .end
Output: 14592
What happened? The symbol this has the value 0000, so DECO outputs the word stored at address 0000. That word is the first two bytes of the DECO instruction itself: 39 00 in hex, which is 14592 in decimal.

High Level Languages & Compilers
• Compilers translate high level language code into low level language; may be:
  – machine language
  – assembly language
  – for the latter, an additional translation step is required to make the program executable

C++/Java example
        // C++ code:
        #include <iostream>
        #include <string>
        using namespace std;

        string greeting = "Hello world";

        int main () {
            cout << greeting << endl;
            return 0;
        }

        // Java code:
        public class Hello {
            static String greeting = "Hello world";
            public static void main (String [] args) {
                System.out.print (greeting);
                System.out.print ('\n');
            }
        }

Assembly language (approximate) equivalent
          br main
greeting: .ASCII "Hello world \x00"
main:     stro greeting,d
          charo '\n',i
          stop
          .end

Data types
• In a high level language, such as Java or C++, variables have the following characteristics:
  – name
  – value
  – data type
• At a lower level (assembly or machine language), a variable is just a memory location
• The compiler generates a symbol table to keep track of high level language variables

Symbol table entries
The illustration (not reproduced here) shows a snippet of output from the Pep/8 assembler.
Each symbol table entry includes:
• the symbol
• the value (of the symbol's start address)
• the type (.ASCII in this case)

Pep/8 Branching instructions
• We have already seen the use of BR, the unconditional branch instruction
• Pep/8 also includes 8 conditional branch instructions; these are used to create assembly language control structures
• These instructions are described on the next couple of slides

Conditional branching instructions
• BRLE: branch on less than or equal
  – how it works: if N or Z is 1, PC = operand
• BRLT: branch on less than
  – how it works: if N is 1, PC = operand
• BREQ: branch on equal
  – how it works: if Z is 1, PC = operand
• BRNE: branch on not equal
  – how it works: if Z is 0, PC = operand

Conditional branching instructions continued
• BRGE: branch on greater than or equal
  – if N is 0, PC = operand
• BRGT: branch on greater than
  – if N and Z are both 0, PC = operand
• BRV: branch if overflow
  – if V is 1, PC = operand
• BRC: branch if carry
  – if C is 1, PC = operand

Example
HLL code:
        int num;
        Scanner kb = new Scanner(System.in);
        System.out.print("Enter a number: ");
        num = kb.nextInt();
        if (num < 0)
            num = -num;
        System.out.print(num);
Pep/8 code:
        br main
num:    .block 2
prompt: .ascii "Enter a number: \x00"
main:   stro prompt,d
        deci num,d
        lda num,d
        brge endif
        lda num,d
        nega            ; negate value in A
        sta num,d
endif:  deco num,d
        stop
        .end

Analysis of example
• brge endif: the if statement, if translated back to Java, would now be more like:
        if (num >= 0); else num = -num;
• The second lda num,d requires a little more explanation; see next slide

Analysis continued
• A compiler must be programmed to translate assignment statements; a reasonable translation of x = 3 might be:
  – load a value into the accumulator
  – evaluate the expression
  – store result to variable
• In the case above (and in the assembly language code on the previous page), evaluation of the expression isn't necessary, since the initial value loaded into A is the only value involved (the second load is really the evaluation of the expression)

Compiler types and efficiency
• An optimizing compiler would perform the necessary source code analysis to recognize that the second load is extraneous
  – advantage: end product (executable code) is shorter & faster
  – disadvantage: takes longer to compile
• So, an optimizing compiler is good for producing the end product, or a product that will be executed many times (for testing); a nonoptimizing compiler, because it does the translation quickly, is better for mid-development

Another example
HLL code:
        final int limit = 100;
        int num;
        System.out.print("Enter a #: ");
        if (num >= limit)
            System.out.print("high");
        else
            System.out.print("low");
Pep/8 code:
        br main
limit:  .equate 100
num:    .block 2
high:   .ascii "high\x00"
low:    .ascii "low\x00"
prompt: .ascii "Enter a #: \x00"
main:   stro prompt,d
        deci num,d
if:     lda num,d
        cpa limit,i
        brlt else
        stro high,d
        br endif
else:   stro low,d
endif:  stop
        .end
Compare instruction: cpr, where r is a register (a or x)
  – action is the same as subr, except the difference (result) isn't stored in the register; it just sets the status bits
  – if N or Z is 1, <= is true

Writing loops in assembly language
• As we have seen, an if or if/else structure in assembly language involves a comparison and then a (possible) branch forward to another section of code
• A loop structure is actually more like its high level language equivalent; for a while loop, the algorithm is:
  – perform comparison; branch forward if condition isn't met (loop ends)
  – otherwise, perform statements in loop body
  – perform unconditional branch back to comparison

Example
• The following example shows a C++ program (because this is easier to demonstrate in C++ than in Java) that performs the following algorithm:
  – prompt for input (of a string)
  – read one character
  – while (character != end character ('*'))
    • write out character
    • read next character

C++ code
        #include <iostream>
        using namespace std;

        int main () {
            char ch;
            cout << "Enter a line of text, ending with *" << endl;
            cin.get (ch);
            while (ch != '*') {
                cout << ch;
                cin.get(ch);
            }
            return 0;
        }

Pep/8 code
        ;am3ex5
        br main
ch:     .block 1
prompt: .ascii "Enter a line of text ending with *\n\x00"
main:   stro prompt,d
        chari ch,d      ; initial read
        lda 0x0000,i    ; clear accumulator
while:  ldbytea ch,d    ; load ch into A
        cpa '*',i
        breq endW
        charo ch,d
        chari ch,d      ; read next letter
        br while
endW:   stop
        .end

Do/while loop
• Post-test loop: condition test occurs after iteration
• Premise of sample program:
  – cop is sitting at a speed trap
  – speeder drives by
  – within 2 seconds, cop starts following, going 5 meters/second faster
  – how far does the cop travel before catching up with the speeder?

C++ version of speedtrap
        #include <iostream>
        #include <cstdlib>
        using namespace std;

        int main() {
            int copDistance = 0;    // cop is sitting still
            int speeder;            // speeder's speed: entered by user
            int speederDistance;    // distance speeder travels from cop's position
            cout << "How fast is the driver going? (Enter whole #): ";
            cin >> speeder;
            speederDistance = speeder;
            do {
                copDistance += speeder + 5;
                speederDistance += speeder;
            } while (copDistance < speederDistance);
            cout << "Cop catches up to speeder in " << copDistance << " meters." << endl;
            return 0;
        }

Pep/8 version
        ;speedTrap
        br main
cop:    .block 2
drvspd: .block 2
drvpos: .block 2
prompt: .ascii "How fast is the driver going? (Enter whole #): \x00"
outpt1: .ascii "Cop catches up to speeder in \x00"
outpt2: .ascii " meters\n\x00"
main:   lda 0,i
        sta cop,d
        stro prompt,d
        deci drvspd,d   ; cin >> speeder;
        ldx drvspd,d    ; speederDistance = speeder;
        stx drvpos,d

Pep/8 version continued
do:     lda 5,i         ; copDistance += speeder + 5;
        adda drvspd,d
        adda cop,d
        sta cop,d
        addx drvspd,d   ; speederDistance += speeder;
        stx drvpos,d
while:  lda cop,d       ; while (copDistance <
        cpa drvpos,d    ;        speederDistance);
        brlt do
        stro outpt1,d
        deco cop,d
        stro outpt2,d
        stop
        .end

For loops
• A for loop is just a count-controlled while loop
• Next example illustrates nested for loops

C++ version
        #include <iostream>
        #include <cstdlib>
        using namespace std;

        int main() {
            int x, y;
            for (x = 0; x < 4; x++) {
                for (y = x; y > 0; y--)
                    cout << "* ";
                cout << endl;
            }
            return 0;
        }

Pep/8 version
        ;nestfor
        br main
x:      .word 0x0000
y:      .word 0x0000
main:   sta x,d         ; relies on A and X holding 0 at program start
        stx y,d
outer:  adda 1,i
        cpa 5,i
        breq endo
        sta x,d
        ldx x,d
inner:  charo '*',i
        charo ' ',i
        subx 1,i
        cpx 0,i
        brne inner
        charo '\n',i
        br outer
endo:   stop
        .end
Note: as written, this version prints rows of 1 to 4 stars, whereas the C++ loop prints rows of 0 to 3 stars.

Notes on control structures
• It's possible to create "control structures" in assembly language that don't exist at a higher level
• Your text describes such a structure, illustrated and explained on the next slide

A control structure not found in nature
• Condition C1 is tested; if true, branch to middle of loop (S3)
• After S3 (however you happen to get there: via branch from C1 or sequentially, from S2) test C2
• If C2 is true, branch to top of loop
• No way to do this in C++ or Java (at least, not without the dreaded goto statement)

High level language programs vs. assembly language programs
• If you're talking about pure speed, a program in assembly language will almost always beat one that originated in a high level language
• Assembly and machine language programs produced by a compiler are almost always longer and slower
• So why use high level languages (besides the fact that assembly language is a pain in the patoot)?

Why high level languages?
• Type checking:
  – data types sort of exist at low level, but the assembler doesn't check your syntax to ensure you're using them correctly
  – can attempt to DECO a string, for example
• Encourages structured programming

Structured programming
• Flow of control in program is limited to nestings of if/else, switch, while, do/while and for statements
• Overuse of branching instructions leads to spaghetti code

Unstructured branching
• Advantage: can lead to faster, smaller programs
• Disadvantage: difficult to understand, debug, maintain, and modify
• Structured flow of control is a newer idea than branching; a form of branching with gotos by another name lives on in the Java/C++ switch/case structure

Evolution of structured programming
• First widespread high level language was FORTRAN; it introduced a new conditional branch statement:
        if (expression) GOTO new location
• Considered an improvement over assembly language, since it combined the CPr and BR statements
• Still used opposite logic:
        if (expression not true) branch else
        // if-related statements here
        branch past else
        else:
        // else-related statements here
        // destination for the if part's branch past else

Block-structured languages
• ALGOL-60 (introduced in 1960 – hey, me too) featured the first use of program blocks for selection/iteration structures
• Descendants of ALGOL include C, C++, and Java

Structured Programming Theorem
• Any algorithm containing GOTOs can be written using only nested ifs and while loops (proven back in 1966)
• In 1968, Edsger Dijkstra wrote a famous letter to the editor of Communications of the ACM entitled "Go To Statement Considered Harmful", considered the structured programming manifesto
• It turns out that structured code is less expensive to develop, debug and maintain than unstructured code, even factoring in the cost of additional memory requirements and execution time
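
The structured programming theorem can be tried out on the "control structure not found in nature" described earlier. The sketch below is only an illustration under stated assumptions: the function names with_goto and structured, the trace strings standing in for statements S1, S2 and S3, and the countdown standing in for condition C2 are all hypothetical, not from the textbook. The first function enters the loop in the middle using goto; the second gets the same behavior from one if and one while, at the cost of repeating S3.

```cpp
#include <string>

// Hypothetical stand-ins: each "statement" appends its name to a trace.
// c1 plays the role of condition C1; C2 is "repeats is still positive".

// Unstructured version: branch into the middle of the loop with goto.
std::string with_goto(bool c1, int repeats) {
    std::string trace = "S1 ";
    if (c1) goto middle;             // C1 true: skip S2 the first time
top:
    trace += "S2 ";
middle:
    trace += "S3 ";
    if (repeats-- > 0) goto top;     // C2 true: back to top of loop
    return trace;
}

// Structured version: only if and while, with S3 duplicated.
std::string structured(bool c1, int repeats) {
    std::string trace = "S1 ";
    if (!c1) {
        trace += "S2 ";
    }
    trace += "S3 ";
    while (repeats-- > 0) {
        trace += "S2 ";
        trace += "S3 ";
    }
    return trace;
}
```

Calling both functions with the same arguments, for example with_goto(true, 1) and structured(true, 1), yields the same trace, which is the sense in which the goto can be eliminated. The duplicated S3 is the kind of extra memory cost the last bullet above refers to.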
