Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Assembly Language part 1 Some terminology • Assembly language: a low-level language that is a little more human-friendly than machine language • assembler: a program that translates assembly language source code into executable form • object code: machine language program (assembler output) Types of assemblers • Resident assembler: an assembler written for its own platform, using the native instruction set • Cross assembler: assemble run on one platform to produce object code for another • Disassembler: program that attempts to recover source code from object code (not 100% successful) Assembly language instructions • Mnemonics: abbreviated words used instead of machine language hex code; – have one-to-one correspondence with underlying instruction – always possible to determine underlying machine language statement from assembly language mnemonic, but not vice-versa • Pseudo-ops: assembly language statements used mostly for data declaration; do not correspond to specific machine language instructions Pep/8 assembly language • General syntax notes: – one instruction per line of code – comments start with semicolon, continue until end of line – not case-sensitive – Spacing: • at least one space required after each instruction (mnemonic or pseudo-op) • otherwise doesn’t matter – last line of program must be .END pseudo-op Pep/8 Assembly Language • Mnemonic instruction format: – 2-6 letter instruction specifier (most or 3-4 letters) – operand specifier, usually followed by a comma and – 1-3 letter address mode specifier (most are 1) • Examples: LDA 0x0014,i ; load hex value 14 to A LDX 0x1110,d ; load data at address 1110 into x • Entire Pep/8 assembly language instruction set is printed on the inside front cover (and on page 191) of your textbook Pep/8 Assembly Language • Addressing mode specifiers: – i: immediate – d: direct – n: indirect – s: stack-relative – sf: stack-relative deferred – x: indexed – sx: stack-indexed – sxf: stack-indexed deferred Pep/8 Assembly Language • Unimplemented opcodes – instructions available at assembly language level, even though they are not (directly) available at the machine language level – represent operations handled by the operating system • They include: – – – – – NOPn: unary no operation trap NOP: non-unary NOP DECI: decimal input trap DECO: decimal output trap STRO: string output trap Pseudo-ops • .ADDRSS: used to crate labeled jump destinations • .ASCII: specifies char string • .BLOCK: allocates specified # of bytes, initializes whole set to zero • .BURN: used for OS configuration • .BYTE: allocates one byte; can specify hex or decimal content • .END: stop code • .EQUATE: equate symbol with literal value; like # define in C/C++ • .WORD: allocates one word of memory Example program 1 ; Program example 1 CHARO 0x0010 ,d CHARO 0x0011 ,d CHARO 0x0012 ,d CHARO 0x0013 ,d CHARO 0X0014 ,d STOP .ASCII "Arrr!" .END Comment Instructions: each outputs one character; starting address of program is 0000, and each instruction (except STOP) is 3 bytes long; STOP is one byte Data Object code & assembler output from program example 1 Program example 2 go past data – minimizes offset BR 0x0008 calculations (contrast with data after code) .ASCII "#?" “constant” declarations .ASCII "\n" “variable” declaration .BLOCK 2 CHARO 0x0003, d output prompt CHARO 0x0004, d read number DECI 0x0006, d CHARO 0x0005, d output newline character DECO 0x0006, d output number STOP .END Data declaration & storage • The previous example included two different types of data declaration instructions: – the .ASCII pseudo-op is used to allocate a contiguous set of bytes large enough to hold the specified data; as with Java & C++, the backslash character (\) is used as the escape character for control codes, like newline – the .BLOCK pseudo-op allocates the specified number of bytes and initializes their values to 0 Data declaration & storage • Two other pseudo-ops provide data storage: – the .WORD instruction allocates two bytes, suitable for storage of integers – the .BYTE instruction allocates one byte, suitable for storage of characters – like .ASCII (and unlike .BLOCK), both of these instructions allow the programmer to specify initial values, as shown on next slide Initialization examples • .WORD 7 ; allocates 2 bytes, with ; decimal value 7 • .BYTE 0x2B ; allocate 1 byte, with hex ; value 2B (‘+’) I/O instructions • The DECI and DECO instructions considerably ease the process of reading and writing numbers • Each one deals with word-size data, and represent instructions not available in the underlying machine language – thus they are part of the set of unimplemented op codes • The actual I/O is performed by the operating system; the instructions generate program interrupts that allow the OS to temporarily take over to provide a service to the program I/O instructions • The CHARI and CHARO instructions are simply assembly language versions of the machine language input and output instructions: – read or write a byte of data – data source (for output) and destination (for input) are memory (not registers) I/O instructions • STRO is yet another example of an unimplemented op code • Outputs a string of data – String can be predefined with the .ASCII pseudo-op – Predefined string must be terminated with a null character: “\x00” Arranging instructions and data • In the first program example (see Monday’s notes), as with all of the machine language examples, instructions were placed first, ended with a STOP code, and data followed • Problems with this approach: – requires address calculations based on the number of instructions (which may not be known as you’re writing a particular instruction) – addresses may have to be adjusted if even minor changes are made to the program Putting the data first • An easy solution to the problems described on the previous slide was illustrated by the program example; the solution is twofold: – declare the data first – place an unconditional branch instruction at the beginning of the program, pointing to the first instruction after the data – the following example provides another illustration Program example 3 br 0x0020 ; bypass data .block 4 ; space for 2 ints .ascii "Enter a number: \x00" .ascii " + \x00" .ascii " = \x00" stro 0x0007,d ; prompt deci 0x0003,d ; get 1st number stro 0x0007,d ; prompt deci 0x0005,d ; get 2nd number deco 0x0003,d ; output 1st number stro 0x0018,d ; output ascii string " + " deco 0x0005,d ; output 2nd number stro 0x001c,d ; output string " = " lda 0x0003,d ; put the first # in A adda 0x0005,d ; add 2nd # to first sta 0x0003,d ; store sum deco 0x0003,d ; output sum stop .end Program example 4: using labels br code pirate: .ASCII "Arrr!\x00" code: stro pirate ,d STOP .END Symbols • Symbols are assembler names for memory addresses • Can be used to label data or instructions • Syntax rules: – – – – start with letter contain letter & digits 8 characters max CASE sensitive • Define by placing symbol label at start of line, followed by colon Symbol Table • Assembler stores labels & corresponding addresses in lookup table called symbol table • Value of symbol corresponds to 1st byte of memory address (of data or instruction) • Symbol table only stores label & address, not nature of what is stored • Instruction can still be interpreted as data, & vice versa Example Example program: this: deco this, d stop .end Output: 14592 What happened? High Level Languages & Compilers • Compilers translate high level language code into low level language; may be: – machine language – assembly language – for the latter an additional translation step is required to make the program executable C++/Java example // C++ code: #include <iostream.h> #include <string> string greeting = “Hello world”; int main () { cout << greeting << endl; return 0; } // Java code: public class Hello { static String greeting = “Hello world”; public static void main (String [] args) { System.out.print (greeting); System.out.print(‘\n’); } } Assembly language (approximate) equivalent br main greeting: .ASCII "Hello world \x00" main: stro greeting, d charo '\n', i stop .end Data types • In a high level language, such as Java or C++, variables have the following characteristics: – Name – Value – Data Type • At a lower level (assembly or machine language), a variable is just a memory location • The compiler generates a symbol table to keep track of high level language variables Symbol table entries The illustration drove shows a snippet of output from the Pep/8 assembler. Each symbol table entry includes: • the symbol • the value (of the symbol’s start address) • the type (.ASCII in this case) Pep/8 Branching instructions • We have already seen the use of BR, the unconditional branch instruction • Pep/8 also includes 8 conditional branch instructions; these are used to create assembly language control structures • These instructions are described on the next couple of slides Conditional branching instructions • BRLE: – branch on less than or equal – how it works: if N or Z is 1, PC = operand • BRLT: – branch on less than – how it works: if N is 1, PC = operand • BREQ: – branch on equal – how it works: if Z is 1, PC = operand • BRNE: – branch on not equal – how it works: if Z is 0, PC = operand Conditional branching instructions • BRGE: – branch on greater than or equal – if N is 0, PC = operand • BRGT: – branch on greater than – if N and Z are 0, PC = operand • BRV: – branch if overflow – if V is 1, PC = operand • BRC: – branch if carry – if C is 1, PC = operand Example HLL code: int num; Scanner kb = new Scanner(); System.out.print (“Enter a number: ”); num = kb.nextInt(); if (num < 0) num = -num; System.out.print(num); Pep/8 code: br main num: .block 2 prompt: .ascii "Enter a number: \x00" main: stro prompt, d deci num, d lda num, d brge endif lda num, d nega ; negate value in a sta num, d endif: deco num, d stop .end Analysis of example Pep/8 code: br main num: .block 2 prompt: .ascii "Enter a number: \x00" main: stro prompt, d deci num, d lda num, d brge endif lda num, d nega ; negate value in a sta num, d endif: deco num, d stop .end The if statement, if translated back to Java, would now be more like: if (num >= 0); else num = -num; This part requires a little more explanation; see next slide Analysis continued • A compiler must be programmed to translate assignment statements; a reasonable translation of x = 3 might be: – load a value into the accumulator – evaluate the expression – store result to variable • In the case above (and in the assembly language code on the previous page), evaluation of the expression isn’t necessary, since the initial value loaded into A is the only value involved (the second load is really the evaluation of the expression) Compiler types and efficiency • An optimizing compiler would perform the necessary source code analysis to recognize that the second load is extraneous – advantage: end product (executable code) is shorter & faster – disadvantage: takes longer to compile • So, an optimizing compiler is good for producing the end product, or a product that will be executed many times (for testing); a nonoptimizing compiler, because it does the translation quickly, is better for mid-development Another example HLL code: Pep/8 code: final int limit = 100; int num; br main limit: .equate 100 num: .block 2 high: .ascii "high\x00" low: .ascii "low\x00" prompt: .ascii "Enter a #: \x00" main: stro prompt, d deci num, d if: lda num, d cpa limit, i brlt else stro high, d br endif else: stro low, d endif: stop .end System.out.print(“Enter a #: ”); if (num >= limit) System.out.print(“high”); else System.out.print(“low”); Compare instruction: cpr where r is a register (a or x): action same as subr except difference (result) isn’t stored in the register – just sets status bits – if N or Z is 0, <= is true Writing loops in assembly language • As we have seen, an if or if/else structure in assembly language involves a comparison and then a (possible) branch forward to another section of code • A loop structure is actually more like its high level language equivalent; for a while loop, the algorithm is: – perform comparison; branch forward if condition isn’t met (loop ends) – otherwise, perform statements in loop body – perform unconditional branch back to comparison Example • The following example shows a C++ program (because this is easier to demonstrate in C++ than in Java) that performs the following algorithm: – prompt for input (of a string) – read one character – while (character != end character (‘*’)) • write out character • read next character C++ code #include <iostream.h> int main () { char ch; cout << “Enter a line of text, ending with *” << endl; cin.get (ch); while (ch != ‘*’) { cout << ch; cin.get(ch); } return 0; } ;am3ex5 br main ch: prompt: main: while: endW: .end Pep/8 code .block 1 .ascii "Enter a line of text ending with *\n\x00" stro prompt, d chari ch, d ; initial read lda 0x0000, i ; clear accumulator ldbytea ch, d ; load ch into A cpa '*', i breq endW charo ch, d chari ch, d ; read next letter br while stop Do/while loop • Post-test loop: condition test occurs after iteration • Premise of sample program: – cop is sitting at a speed trap – speeder drives by – within 2 seconds, cop starts following, going 5 meters/second faster – how far does the cop travel before catching up with the speeder? C++ version of speedtrap #include <iostream.h> #include <stdlib.h> int main() { int copDistance = 0; // cop is sitting still int speeder; // speeder's speed: entered by user int speederDistance; // distance speeder travels from cop’s position cout << "How fast is the driver going? (Enter whole #): "; cin >> speeder; speederDistance = speeder; do { copDistance += speeder + 5; speederDistance += speeder; } while (copDistance < speederDistance); cout << "Cop catches up to speeder in " << copDistance << " meters." << endl; return 0; } Pep/8 version ;speedTrap br main cop: .block 2 drvspd: .block 2 drvpos: .block 2 prompt: .ascii "How fast is the driver going? (Enter whole #): \x00" outpt1: .ascii "Cop catches up to speeder in \x00" outpt2: .ascii " meters\n\x00" main: lda 0, i sta cop, d stro prompt, d deci drvspd, d ; cin >> speeder; ldx drvspd, d ; speederDistance = speeder; stx drvpos, d Pep/8 version continued do: lda 5, i adda drvspd, d adda cop, d sta cop, d addx drvspd, d stx drvpos, d while: lda cop, d cpa drvpos, d brlt do stro outpt1, d deco cop, d stro outpt2, d stop .end ; copDistance += speeder + 5; ; speederDistance += speeder; ; while (copDistance < ; speederDistance); For loops • For loop is just a count-controlled while loop • Next example illustrates nested for loops C++ version #include <iostream.h> #include <stdlib.h> int main() { int x,y; for (x=0; x < 4; x++) { for (y = x; y > 0; y--) cout << "* "; cout << endl; } return 0; } Pep/8 version ;nestfor br main x: .word 0x0000 y: .word 0x0000 main: sta x, d stx y, d outer: adda 1, i cpa 5, i breq endo sta x, d ldx x, d inner: charo '*', i charo ' ', i subx 1, i cpx 0, i brne inner charo '\n', i br outer endo: stop .end Notes on control structures • It’s possible to create “control structures” in assembly language that don’t exist at a higher level • Your text describes such a structure, illustrated and explained on the next slide A control structure not found in nature • Condition C1 is tested; if true, branch to middle of loop (S3) • After S3 (however you happen to get there – via branch from C1 or sequentially, from S2) test C2 • If C2 is true, branch to top of loop • No way to do this in C++ or Java (at least, not without the dreaded goto statement) High level language programs vs. assembly language programs • If you’re talking about pure speed, a program in assembly language will almost always beat one that originated in a high level language • Assembly and machine language programs produced by a compiler are almost always longer and slower • So why use high level languages (besides the fact that assembly language is a pain in the patoot) Why high level languages? • Type checking: – data types sort of exist at low level, but the assembler doesn’t check your syntax to ensure you’re using them correctly – can attempt to DECO a string, for example • Encourages structured programming Structured programming • Flow of control in program is limited to nestings of if/else, switch, while, do/while and for statements • Overuse of branching instructions leads to spaghetti code Unstructured branching • Advantage: can lead to faster, smaller programs • Disadvantage: Difficult to understand – and debug – and maintain – and modify • Structured flow of control is newer idea than branching; a form of branching with gotos by another name lives on in the Java/C++ switch/case structure Evolution of structured programming • First widespread high level language was FORTRAN; it introduced a new conditional branch statement: if (expression) GOTO new location • Considered improvement over assembly language – combined CPr and BR statements • Still used opposite logic: if (expression not true) branch else // if-related statements here branch past else else: // else-related statements here destination for if branch Block-structured languages • ALGOL-60 (introduced in 1960 – hey, me too) featured first use of program blocks for selection/iteration structures • Descendants of ALGOL include C, C++, and Java Structured Programming Theorem • Any algorithm containing GOTOs can be written using only nested ifs and while loops (proven back in 1966) • In 1968, Edsgar Dijkstra wrote a famous letter to the editor of Communications of the ACM entitled “gotos considered harmful” – considered the structured programming manifesto • It turns out that structured code is less expensive to develop, debug and maintain than unstructured code – even factoring in the cost of additional memory requirements and execution time