CSE 360: Introduction to Computer Systems
Course Notes

Rick Parent, http://www.cse.ohio-state.edu/~parent
Wayne Heym, http://www.cse.ohio-state.edu/~heym

Copyright © 1998-2005 by Rick Parent, Todd Whittaker, Bettina Bair, Pete Ware, Wayne Heym

Information Representation

Positional Number Systems: position of character in string indicates a power of the base (radix). Common bases: 2, 8, 10, 16. (What base are we using to express the names of these bases?)
– Base ten (decimal): digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 form the alphabet of the decimal system.
  E.g., 316₁₀ =
– Base eight (octal): digits 0, 1, 2, 3, 4, 5, 6, 7 form the alphabet.
  E.g., 474₈ =
– Base 16 (hexadecimal): digits 0-9 and A-F.
  E.g., 13C₁₆ =
– Base 2 (binary): digits (called “bits”) 0, 1 form the alphabet.
  E.g., 100110 =
– In general, radix r representations use the first r chars in {0…9, A…Z} and have the form dₙ₋₁dₙ₋₂…d₁d₀. Summing dₙ₋₁rⁿ⁻¹ + dₙ₋₂rⁿ⁻² + … + d₀r⁰ will convert to base 10. Why to base 10?

Base Conversions
– Convert to base 10 by multiplication of powers
  E.g., 10012₅ = ( )₁₀
– Convert from base 10 by repeated division
  E.g., 632₁₀ = ( )₈
– Converting base x to base y: convert base x to base 10, then convert base 10 to base y
– Special case: converting among binary, octal, and hexadecimal is easier
  Go through the binary representation, grouping in sets of 3 or 4.
  E.g., 11011001₂ = 11 011 001 = 331₈
        11011001₂ = 1101 1001 = D9₁₆
  E.g., C3B₁₆ = ( )₈

What is special about binary?
– The basic component of a computer system is a transistor (transfer resistor): a two state device which switches between logical “1” and “0” (actually represented as voltages on the range 5V to 0V).
– Octal and hexadecimal are bases in powers of 2, and are used as a shorthand way of writing binary.
– A hexadecimal digit represents 4 bits, half of a byte. 1 byte = 8 bits. A bit is a binary digit.
– Get comfortable converting among decimal, binary, octal, hexadecimal. Converting from decimal to hexadecimal (or binary) is easier going through octal.

Binary  Hex  Decimal     Binary  Hex  Decimal
0000    0    0           1000    8    8
0001    1    1           1001    9    9
0010    2    2           1010    A    10
0011    3    3           1011    B    11
0100    4    4           1100    C    12
0101    5    5           1101    D    13
0110    6    6           1110    E    14
0111    7    7           1111    F    15

Ranges of values
– Q: Given k positions in base n, how many values can you represent?
– A: nᵏ values over the range (0…nᵏ-1)₁₀
  n=10, k=3: 10³ = 1000, range is (0…999)₁₀
  n=2, k=8: 2⁸ = 256, range is (0…255)₁₀
  n=16, k=4: 16⁴ = 65536, range is (0…65535)₁₀
– Q: How are negative numbers represented?

Integer representation:
– Value and representation are distinct. E.g., 12 may be represented as XII, C₁₆, 12₁₀, and 1100₂. Note: -12 may be represented as -C₁₆, -12₁₀, and -1100₂.
– Simple and efficient use of hardware implies using a specific number of bits, e.g., a 32-bit string, in a binary encoding. Such an encoding is “fixed width.”
– Four methods: (fixed-width) simple binary, signed magnitude, binary coded decimal, and 2’s complement.
– Simple binary: as seen before, all numbers are assumed to be positive, e.g., 8-bit representation of 66₁₀ = 0100 0010₂ and 194₁₀ = 1100 0010₂
– Signed magnitude: simple binary with leading sign bit. 0 = positive, 1 = negative.
  E.g., 8-bit signed mag.: 66₁₀ = 0100 0010₂
                          -66₁₀ = 1100 0010₂
  What ranges of numbers may be expressed in 8 bits? Largest:     Smallest:
  Extend 1100 0010 to 12 bits:
  Problems: (1) Compare the signed magnitude numbers 1000 0000 and 0000 0000. (2) Must have “subtraction” hardware in addition to “addition” hardware.
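The signed magnitude encoding and its first problem can be tried out in a few lines. This is only a sketch: the function names `to_signed_magnitude` and `from_signed_magnitude` are made up for illustration and bit strings are used instead of real hardware registers.

```python
def to_signed_magnitude(i: int, n: int = 8) -> str:
    """Encode i as an n-bit signed magnitude bit string (sign bit first)."""
    limit = (1 << (n - 1)) - 1  # largest magnitude that fits in n-1 bits
    if abs(i) > limit:
        raise OverflowError(f"{i} does not fit in {n}-bit signed magnitude")
    sign = "1" if i < 0 else "0"
    return sign + format(abs(i), f"0{n - 1}b")


def from_signed_magnitude(bits: str) -> int:
    """Decode a signed magnitude bit string back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(to_signed_magnitude(66))    # sign bit 0, then the magnitude
print(to_signed_magnitude(-66))   # same magnitude, sign bit 1
# Two different bit patterns decode to the same value, zero:
print(from_signed_magnitude("00000000") == from_signed_magnitude("10000000"))
```

Because two patterns stand for zero, comparing values is not the same as comparing bit patterns, which is the point of problem (1).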
– Binary Coded Decimal (BCD): use a 4-bit pattern to express each digit of a base 10 number
  0000 = 0    0100 = 4    1000 = 8
  0001 = 1    0101 = 5    1001 = 9
  0010 = 2    0110 = 6    1010 = +
  0011 = 3    0111 = 7    1011 = -
  E.g.,  123 : 0000 0001 0010 0011
        +123 : 1010 0001 0010 0011
        -123 : 1011 0001 0010 0011

BCD Disadvantages:
– Takes more memory. 32-bit simple binary can represent more than 4 billion discrete values. 32-bit BCD can hold a sign and 7 digits (or 8 digits for unsigned values) for a maximum of 110 million values, a 97% reduction.
– More difficult to do arithmetic. Essentially, we must force the Base 2 computer to do Base 10 arithmetic.
BCD Advantages:
– Used in business machines and languages, e.g., in COBOL for precise decimal math.
– Can have arrays of BCD numbers for essentially arbitrary precision arithmetic.

– Two’s Complement
  Used by most machines and languages to represent integers. Fixes the -0 in the signed magnitude, and simplifies machine hardware arithmetic. Divides bit patterns into a positive half and a negative half (with zero considered positive); n bits creates a range of [-2ⁿ⁻¹ … 2ⁿ⁻¹-1].

  CODE   Simple   Signed   2’s comp
  0000    0       +0        0
  0001    1        1        1
  0010    2        2        2
  0011    3        3        3
  0100    4        4        4
  0101    5        5        5
  0110    6        6        6
  0111    7        7        7
  1000    8       -0       -8
  1001    9       -1       -7
  1010   10       -2       -6
  1011   11       -3       -5
  1100   12       -4       -4
  1101   13       -5       -3
  1110   14       -6       -2
  1111   15       -7       -1

– Representation in 2’s complement; i.e., represent i in n-bit 2’s complement, where -2ⁿ⁻¹ ≤ i ≤ +2ⁿ⁻¹-1
  Nonnegative numbers: same as simple binary
  Negative numbers:
  – Obtain the n-bit simple binary equivalent of | i |
  – Obtain its negation as follows:
    • Invert the bits of that representation
    • Add 1 to the result
  Ex.: convert -320₁₀ to 16-bit 2’s complement
  Ex.: extend the 12-bit 2’s complement number 1101 0111 1000 to 16 bits.
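The invert-and-add-1 procedure above can be sketched as follows. The helper names (`to_twos_complement`, `from_twos_complement`, `sign_extend`) are illustrative assumptions, and the sample values are chosen so that the two exercises are left open. Extension to a wider format is shown as copying the sign bit to the left, which is the usual rule for 2’s complement.

```python
def to_twos_complement(i: int, n: int) -> str:
    """Return the n-bit 2's complement bit string for i."""
    if not -(1 << (n - 1)) <= i <= (1 << (n - 1)) - 1:
        raise OverflowError(f"{i} does not fit in {n}-bit 2's complement")
    if i >= 0:
        return format(i, f"0{n}b")          # same as simple binary
    magnitude = format(-i, f"0{n}b")        # n-bit simple binary of |i|
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)
    return format(int(inverted, 2) + 1, f"0{n}b")   # add 1 to the result


def from_twos_complement(bits: str) -> int:
    """Decode a 2's complement bit string of any width."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def sign_extend(bits: str, width: int) -> str:
    """Widen a 2's complement number by repeating its sign bit."""
    return bits[0] * (width - len(bits)) + bits


print(to_twos_complement(-66, 8))
print(sign_extend("1011", 8), from_twos_complement(sign_extend("1011", 8)))
```

Extending `1011` (which is -5 in the 4-bit table above) to 8 bits keeps its value, because the added bits are copies of the sign bit.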
Binary Arithmetic
– Addition and subtraction only for now
– Rules: similar to standard addition and subtraction, but only working with 0 and 1.
  0 + 0 = 0    1 + 0 = 1    0 + 1 = 1    1 + 1 = 10
  0 - 0 = 0    1 - 0 = 1    1 - 1 = 0    10 - 1 = 1
– Must be aware of possible overflow.
  Ex.: 8-bit signed magnitude 0101 0110 + 0110 0011 =
  Ex.: 8-bit signed magnitude 0101 0110 - 0110 0011 =

2’s Complement binary arithmetic
– Addition and subtraction are the same operation
– Still must be aware of overflow.
  Ex.: 8-bit 2’s complement: 23₁₀ + 45₁₀ =
  Ex.: 8-bit 2’s complement: 100₁₀ + 45₁₀ =
  Ex.: 8-bit 2’s complement: 23₁₀ - 45₁₀ =
– 2’s Complement overflow
  Opposite signs on operands can’t overflow
  If operand signs are same, but result’s sign is different, must have overflow
  Can two positives sum to positive and still have overflow? Can two negatives?

Characters and Strings
– EBCDIC, Extended Binary Coded Decimal Interchange Code
  Used by IBM in mainframes (360 architecture and descendants). Earliest system
– ASCII, American Standard Code for Information Interchange. Most common system
– Unicode, http://www.unicode.org
  New international standard
  Variable length encoding scheme with either 8- or 16-bit minimum
  “a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”

ASCII
– see table 1.7 on pg. 18. In Unix, run “man ascii”.
– 7-bit code
  Printable characters for human interactions
  Control characters for non-human communication (computer-computer, computer-peripheral, etc.)
– 8-bit code: most significant bit may be set
  Extended ASCII (IBM), includes graphical symbols and lines
  ISO 8859, several international standards
  Unicode’s UTF-8, variable length code with 8-bit minimum

ASCII
– Easy to decode
  – But takes up a predictable amount of space
– Upper and lower case characters are 0x20 (32₁₀) apart
– ASCII representation of ‘3’ is not the same as the binary representation of 3.
  – To convert ASCII to binary (an integer), ‘3’-‘0’ = 3
– Line feed (LF) character
  – 000 1010₂ = 0x0a = 10₁₀
  – ‘\n’ = 0xa

Character   ASCII Binary   ASCII Hex
‘ ’         010 0000       0x20
‘A’         100 0001       0x41
‘a’         110 0001       0x61
‘R’         101 0010       0x52
‘r’         111 0010       0x72
‘0’         011 0000       0x30
‘3’         011 0011       0x33

String: definition is programming language dependent.
– C, C++: strings are arrays of characters terminated by a null byte.
  Decode: 1000001, 1010011, 1000011, 1001001, 1001001, 0100000, 1101001, 1110011, 0100000, 1100101, 1100001, 1110011, 1111001, 0000000
– Or (in hex): 41 53 43 49 49 20 69 73 20 65 61 73 79 00
  How many bytes is this? What’s the use of the ‘00’ byte at the end?

Simple data compression
– ASCII codes are fixed length.
– Huffman codes are variable length and based on statistics of the data to be transmitted. Assign the shortest encoding to the most common character.
  – In English, the letter ‘e’ is the most common.
– Either establish a Huffman code for an entire class of messages,
– Or create a new Huffman code for each message, sending/storing both the coding scheme and the message.
“a widely used and very effective technique for compressing data; savings of 20% to 90% are typical, depending on the characteristics of the file being compressed.” (Cormen, p. 337)

ECL - Expected Code Length

Char   Fixed len encoding   Freq   Var len encoding   # bits   Expected # bits
       00                   .5     1                  1        .5
       01                   .25    01                 2        .5
       10                   .15    001                3        .45
       11                   .10    000                3        .3
Avg len 2                                                      1.75

Huffman Tree for “a man a plan a canal panama”
– Examine data set and determine frequencies of letters (example ignores spaces, normally significant)

       Count   Frequency
  ‘a’  10      0.476190
  ‘c’  1       0.047619
  ‘l’  2       0.095238
  ‘m’  2       0.095238
  ‘n’  4       0.190476
  ‘p’  2       0.095238

– Create a forest of single node trees. Choose the two trees having the smallest total frequencies (the two “smallest” trees), and merge them together (lesser frequency as the left subtree, for definiteness, to make grading easier). Continue merging until only one tree remains.

Reading a ‘1’ calls for following the left branch. Reading a ‘0’ calls for following the right branch.
Decoding using the tree: To decode ‘0001’, start at root and follow r_child, r_child, r_child, l_child, revealing encoded ‘m’.

Huffman Tree for "a man a plan a canal panama" (left branch = 1, right branch = 0)
1.0
  1: ‘a’ .4762
  0: .5238
     1: ‘n’ .1905
     0: .3333
        1: .1428
           1: ‘c’ .0476
           0: ‘l’ .0952
        0: .1905
           1: ‘m’ .0952
           0: ‘p’ .0952

Comparison of Huffman and 3-bit code example
– 3-bit: 000 011000100 000 101010000100 000 001000100000010 101000100000011000 = 63 bits
– Huffman: 1 0001101 1 00000010101 1 001110110010 0000101100011 = 46 bits
– Savings of 17 bits, or 27% of original message

        3-bit code   Huffman Code   Count   H length   3 length
‘a’     000          1              10      10         30
‘c’     001          0011           1       4          3
‘l’     010          0010           2       8          6
‘m’     011          0001           2       8          6
‘n’     100          01             4       8          12
‘p’     101          0000           2       8          6
Totals                                      46         63

Parity: Simple error detection
Data transmission, aging media, static interference, dust on media, etc. demand the ability to detect errors.
Single bit errors detected by using parity checking.
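A parity check on one character can be sketched as below. The sketch assumes the parity bit is placed in front of the 7-bit code as the most significant bit of the byte; the function names `add_parity` and `check_parity` are made up for illustration. The sample code 1010011 is the 7-bit ASCII code for ‘S’.

```python
def add_parity(code7: str, even: bool = True) -> str:
    """Prefix a 7-bit code with a parity bit (even parity by default)."""
    ones = code7.count("1")
    # Even parity: total number of 1 bits must be even; odd parity: odd.
    parity_bit = ones % 2 if even else (ones + 1) % 2
    return str(parity_bit) + code7


def check_parity(byte: str, even: bool = True) -> bool:
    """Return True if the byte has the agreed parity."""
    return byte.count("1") % 2 == (0 if even else 1)


sent = add_parity("1010011")              # ASCII 'S' with even parity
print(sent, check_parity(sent))
print(check_parity("01010010"))           # one bit flipped: detected
print(check_parity("01010000"))           # two bits flipped: not detected
```

The last line shows the limit of the scheme: an even number of flipped bits leaves the parity unchanged, so the error passes the check.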
Parity, here, is “the state of being odd or even.”
– How to detect a 1-bit error: Ex.: send ASCII ‘S’: send 1010011, but receive 1010010?
  Add a 1-bit parity to make an odd or even number of 1 bits per byte.

         ASCII      Even parity   Odd parity
  ‘S’    101 0011   0101 0011     1101 0011
  ‘E’    100 0101   1100 0101     0100 0101

  Parity bit is stripped by hardware after checking.
  Sender/receiver both agree to odd or even parity.
  2 flipped bits in the same encoding are not detected.

Two meanings for Hamming distance. 2nd is generalization of 1st. 1st is: distance between two encodings of the same length.
1. A count of the number of bits different in encoding 1 vs. encoding 2.
   E.g., dist(1100, 1001) =        dist(0101, 1101) =
2. Generalize to an entire code by taking the minimum over all distinct pairs (2nd meaning).
– The ASCII encoding scheme has a Hamming distance of 1.
– A simple parity encoding scheme has a Hamming distance of 2.
Hamming distance serves as a measure of the robustness of error checking (as a measure of the redundancy of the encoding).

ISEM FAQ

Editing, Assembling, Linking, and Loading
– There are three components to the Instructional SPARC Emulator (ISEM) package that we use for this class: the assembler, the linker, and the emulator/debugger.

Editing
– There are a number of programs that you can use to create your source files. Emacs is probably the most popular; vi is also available, but its command syntax is difficult to learn and use; if you use the pine program, you can use the pico editor, which combines many features of Emacs into a simple menu-driven facility.
– Start Emacs by “xemacs sourcefile.s &”, which creates the file called sourcefile.s.
– Use the tutorial, accessed by typing "Ctrl-H Ctrl-H t".
– For other editors, you are on your own.

Example Sparc Assembly Language Instructions

% type xmp0.s
        .data                   ! Assembler directive: data starts here.
A_m:    .word   '?'             ! A_m, B_m, and C_m are symbolic constants.
B_m:    .word   0x30            ! Furthermore, each is an address of a certain-sized chunk of memory.
C_m:    .word   0               ! Here, each chunk is four bytes (one word) long.
                                ! When the program gets loaded, each of these chunks stores a number
                                ! in 2's complement encoding, as follows:
                                ! At address C_m, zero; at B_m, 48; at A_m, 0x3F = 077 = 63.
        .text                   ! Assembler directive, instructions start here
start:                          ! Label (symbolic constant) for this address
        set     A_m, %r2        ! Put address A_m into register 2
        ld      [%r2], %r2      ! Use r2 as an indirect address for a load (read)
        set     B_m, %r3        ! Put address B_m into register 3
        ld      [%r3], %r3      ! Read from B_m and replace r3 w/ value at addr B_m
        sub     %r2, %r3, %r2   ! Subtract r3 from r2, save in r2
        set     C_m, %r4        ! Put address C_m into register 4
        st      %r2, [%r4]      ! Store (write) r2 to memory at address C_m
terminate:                      ! Label for address where 'ta 0' instruction stored
        ta      0               ! Stop the program
beyond_end:                     ! Label for address beyond the end of this program

Assembling
– The assembler is called "isem-as", and is the GNU Assembler (GAS), configured to cross-assemble to a SPARC object format.
– It is used to take your source code, and produce object code that may be linked and run on the ISEM emulator.
– The syntax for invoking the assembler is:
    isem-as [-a[ls]] sourcefile.s -o objectfile.o
– The input is read from sourcefile.s, and the output is written to objectfile.o.
– The option "-a" tells the assembler to produce a listing file. The sub-options "l" and "s" tell the assembler to include the assembly source in the listing file and produce a symbol table, respectively.

The listing file
– Will identify all the syntactic errors in your program, and it will warn you if it identifies "suspicious" behavior in your source file.
– Column 1 identifies a line number in your source file.
– Column 2 is an offset for where this instruction or data resides in memory.
– Column 3 is the image of what is put in memory, either the machine instructions or the representation of the data.
– The final column is the source code that produced the line.
– At the bottom of the file you will find the symbol table.
– Again, the symbols are represented as offsets that are relocated when the program is loaded into memory.

isem-as -als labn.s -o labn.o >! labn.lst

Columns: line in source file (.s), offset to address in memory, contents at address in memory, source.

   1                       .data
   2 0000 0000003F  A_m:   .word '?'
   3 0004 00000030  B_m:   .word 0x30
   4 0008 00000000  C_m:   .word 0
   5 000c 00000000         .text
   6                start:
   7 0000 05000000         set   A_m, %r2
   7      8410A000
   8 0008 C4008000         ld    [%r2], %r2
   9 000c 07000000         set   B_m, %r3
   9      8610E000
  10 0014 C600C000         ld    [%r3], %r3
  11 0018 84208003         sub   %r2, %r3, %r2
  12 001c 09000000         set   C_m, %r4
  12      88112000
  13 0024 C4210000         st    %r2, [%r4]
  14                terminate:
  15 0028 91D02000         ta    0
  16 002c 01000000  beyond_end:

DEFINED SYMBOLS (labels are symbolic offsets)
    xmp0.s:2    .data:00000000 A_m
    xmp0.s:3    .data:00000004 B_m
    xmp0.s:4    .data:00000008 C_m
    xmp0.s:6    .text:00000000 start
    xmp0.s:14   .text:00000028 terminate
    xmp0.s:16   .text:0000002c beyond_end
NO UNDEFINED SYMBOLS

Linking
– Linking turns a set of raw object file(s) into an executable program.
– From the manual page, "ld combines a number of object and archive files, relocates their data and ties up symbol references. Often the last step in building a new compiled program to run is a call to ld."
– Several object files are combined into one executable using ld; the separate files could reference symbols from one another.
– The output of the linker is an executable program.
– The syntax for the linker is as follows:
    isem-ld objectfile.o [-o execfile]
  Examples
    % isem-ld foo.o -o foo      Links foo.o into the executable foo.
    % isem-ld foo.o             Links foo.o into the executable a.out.

Loading/Running
– Execute the program and test it in the emulation environment.
– The program "isem" is used to do this, and the majority of its features are covered in your lab manual.
– Invoke isem as follows:
    isem [execfile]
  Examples
    % isem foo      Invokes the emulator, loads the program foo
    % isem          Invokes the emulator, no program is loaded
– Once you are in the emulator, you can run your program by typing "run" at the prompt.

ISEM Debugging Tools

% isem xmp0
Instructional SPARC Emulator
Copyright 1993 - Computer Science Department
University of New Mexico
ISEM comes with ABSOLUTELY NO WARRANTY
ISEM Ver 1.00d : Mon Jul 27 16:29:45 EDT 1998
Loading File: xmp0
2000 bytes loaded into Text region at address 8:2000
2000 bytes loaded into Data region at address a:4000

PC: 08:00002020   nPC: 00002024   PSR: 0000003e N:0 Z:0 V:0 C:0
start : sethi 0x10, %g2
ISEM> run
Program exited normally.

Assembly language programs are not notoriously chatty.

reg
– Gives values of all 32 general registers
– Also PC
symb
– Shows the resolved values of all symbolic constants
dump [addr]
– Either symbol or hex address
– Gives the values stored in memory

ISEM> reg
   ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G  00000000 00000000 0000000f 00000030 00004008 00000000 00000000 00000000
O  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:0000204c   nPC: 00002050   PSR: 0000003e N:0 Z:0 V:0 C:0
beyond_end : sethi 0x0, %g0
ISEM> symb
Symbol List
A_m : 00004000
B_m : 00004004
.
.
.
terminate : 00004028
ISEM> dump A_m
0a:00004000 00 00 00 3f 00 00 00 30 00 00 00 0f 00 00 00 00 ...?...0........
0a:00004010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0a:00004020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
break [addr]
– Set breakpoints in execution
– Once execution is stopped, you can look at the contents of registers and memory.
trace
– Causes one (or more) instruction(s) to be executed
– Registers are displayed
– Handy for sneaking up on an error when you’re not sure where it is.
For the all-time “most wanted” list of errors (and their fixes)
– http://www.cse.ohio-state.edu/~heym/360/common/faq.html

Basic Components

Terminology from Ch. 2:
– Flip flop: basic storage device that holds 1 bit
– D flip flop: special flip flop that outputs the last value that was input to it (a data signal).
– Clock: two different meanings: (1) a control signal that oscillates (low to high voltage) every x nanoseconds; (2) the “write select” line for a flip flop.
[Figure: D flip flop with Data In, Clock, and Data Out; clock waveform with one cycle marked]
– Register: collection of flip flops with parallel load. Clock (or “write select”) signal controlled. Stores instructions, addresses, operands, etc.
– Bus: Collection of related data lines (wires).
[Figure: 8-bit register with input bus lines d7…d0, Clock, and 8-bit output bus]
– Combinational circuits: implement Boolean functions. No feedback in the circuit, output is strictly a function of input.
  Gates: and, or, not, xor
  E.g., xy + z
[Figure: gate symbols for AND, OR, NOT, XOR; circuit with inputs x, y, z and output f]
– Gates can be used in combination to implement a simple (half) adder. Addition creates a value, plus a carry-out.
  Z = X ⊕ Y
  CO = X · Y

  X  Y  Z  CO
  0  0  0  0
  0  1  1  0
  1  0  1  0
  1  1  0  1
[Figure: half adder circuit with inputs X, Y and outputs Z, CO]
– Sequential Circuits: introduce feedback into the circuit. Outputs are functions of input and current state.
[Figure: sequential element with inputs D and C and output Q]
– Multiplexers: combinational circuits that use n bits to select an output from 2ⁿ input lines.
[Figure: 4 to 1 MUX with inputs i0, i1, i2, i3, select lines s0, s1, and output f]

Von Neumann Architecture
– Can access either instructions or data from memory in each cycle.
– One path to memory (von Neumann bottleneck)
– Stored program system. No distinction between programs and data
[Figure: von Neumann architecture: Main Memory System, Address Pathway, Data and Instruction Pathway, Operational Registers, Arithmetic and Logic Unit, Program Counter, Control Unit, Input/Output System]

Examples of Von Neumann architecture to be explored in this course:
– SAM: tiny, good for learning architecture
– MIPS: text’s example assembly language
– SPARC: labs
– M68HC11: used in ECE 567 (taken by CSE majors)
Roughly, the order of presentation in this course is as follows:
– A couple of days on the Main Memory System
– Weeks on the Central Processing Unit (CPU)
– Finish the course with the I/O System

Memory: Can be viewed as an array of storage elements.
– The index of each element is called the address.
– Each element holds the same number of bits. How many bits per element? 8, 16, 32, 64?
[Figure: four memory arrays with element widths of 8 bits = 1 byte, 16 bits, 32 bits, and 64 bits, each indexed 0, 1, 2, ..., n-1]

Memory Element & Address Sizes
• If a machine’s memory is 5-bit addressable, then, at each distinct address, 5 bits are stored. The contents at each address are represented by 5 bits.
• If 3 bits are used to represent memory addresses, then the memory can have at most 2³ = 8 distinct addresses.
• Such a memory can store at most 8 × 5 = 40 bits of data.
• If the data bus is 10 bits wide, then up to 10 bits at a time can be transferred between memory and processor; this is a 10-bit word.

  Address            Contents
  Decimal   Binary
  0         000      00011
  1         001      01111
  2         010      01110
  3         011      10100
  4         100      00101
  5         101      01110
  6         110      10100
  7         111      10011

Let’s look deeper.
– Suppose each memory element is stored in a bank and given a relative address.
– You could have several such banks in your memory.
– The GLOBAL address of each element would be: [relative address] & [bank address].
– To get two elements at a time, start reading from bank 0 (don’t start from bank 1; this would be a “memory address not aligned” error).

  Bank 0                    Bank 1
  Relative   Global         Relative   Global
  000        000 0          000        000 1
  001        001 0          001        001 1
  010        010 0          010        010 1
  011        011 0          011        011 1
  100        100 0          100        100 1
  101        101 0          101        101 1

  Global addresses, not contents. Think of the contents as being underneath the global addresses.

– Memory alignment: Assume a byte addressable machine with 4-byte words. Where are operands of various sizes positioned?
  bytes: on a byte boundary (any address)
  half words: on half word boundary (even addresses)
  words: on word boundary (addresses divisible by 4)
  double words: on double word boundary (addresses divisible by 8)

Byte ordering: how numeric data is stored in memory
– Ex.: 247896511₁₀ = 0EC699BF₁₆
– Stored at address 0

  Big Endian: high order (big end) is at byte 0
    Address   0    1    2    3
    Byte      0E   C6   99   BF
  Little Endian: low order (little end) is at byte 0
    Address   0    1    2    3
    Byte      BF   99   C6   0E

  Contrast with bit ordering:
    Bit       7  6  5  4  3  2  1  0
    Value     1  0  1  1  1  1  1  1

Read/Write operations: must know the address to read or write. (read = fetch = load, write = store)
– CPU puts address on address bus
– CPU sends read signal
  – (R/W=1, CS=1)
  – (Read/don’t Write, Chip Select)
– Wait
– Memory puts data on data bus
  – reset (CS=0)
[Figure: memory chip with address lines A0, A1, …, A(m-1), control lines CS and R/W, and data lines D0, D1, …, D(n-1)]

– Types of memory:
  ROM: Read Only Memory: non-volatile (doesn’t get erased when powered down; it’s a combinational circuit!)
  PROM: Programmable ROM: use a ROM burner to write data to it initially. Can’t be re-written.
  EPROM: Erasable PROM. Uses UV light to erase.
  EEPROM: Electrically Erasable PROM.
  RAM: Random access memory. Can efficiently read/write any location (unlike sequential access memory). Used for main memory.
  – Many variations (types) of RAM, all volatile
    • SDRAM, DDR SDRAM
    • RDRAM
    • www.tomshardware.com

CPU: executes instructions, the primitive operations that the computer can perform.
– E.g.,
  arithmetic      A+B
  data movement   A := B
  control         if expr goto label
  logical         AND, OR, XOR…
  Instructions specify both the operation and the operands. An encoded operand is often a location in memory where the value of interest may be found (address of value of interest).
– Instruction set: all instructions for a machine. Instruction format specifies number and type of operands.
  Ex.: Could have an instruction like
    ADD A, B, R
  where A, B, and R are the addresses of operands in memory. The result is R := A+B.
[Figure: memory table with columns Addr, Memory, Label; addresses 0, 4, 8, C; A holds 8, B holds 9, R holds 17]
– Actually, the “instruction” might be represented in a source file as: 0x41444420412C20422C20520A, the ASCII codes for the characters A D D   A ,   B ,   R and a line feed. As such, it is an assembly language instruction.
– An assembler might translate it to, say, 0x504C, the machine’s representation of the instruction. As such, it is a machine language instruction.

A Simple Instruction Set

Simple instruction set: the Accumulator machine.
– Simplify instruction set by only allowing one operand. Accumulator implied to be the second operand.
– Accumulator is a special register. Similar to a simple calculator.

  ADD   addr    ACC ← ACC + M[addr]
  SUB   addr    ACC ← ACC - M[addr]
  MPY   addr    ACC ← ACC * M[addr]
  DIV   addr    ACC ← ACC / M[addr]
  LOAD  addr    ACC ← M[addr]
  STORE addr    M[addr] ← ACC

Ex.: C = AB + CD

  LOAD  20   ! 1) Acc<-M[20]
  MPY   21   ! 2) Acc<-Acc*M[21]
  STORE 30   !    M[30]<-Acc
  LOAD  22   ! 3) Acc<-M[22]
  MPY   23   ! 4) Acc<-Acc*M[23]
  ADD   30   ! 5) Acc<-Acc+M[30]
  STORE 22   !    M[22]<-Acc

  Memory: 20 A, 21 B, 22 C, 23 D, ..., 30 temp
  Accumulator after steps 1) 2) 3) 4) 5):

– Machine language: Converting from assembly language to machine language is called assembling.

An Instruction (Encoding) Format
Assume 8-bit architecture.
Each instruction may be 8 bits. 3 bits hold the op-code and 5 bits hold the operand.

  bits 7-5: op-code    bits 4-0: operand

How much memory can we address? How many op-codes can we have?
Convert the mnemonic op-codes into binary codes.

  Operation   Code
  ADD         000
  SUB         001
  MPY         010
  DIV         011
  LOAD        100
  STORE       101

Hand assemble our program:

  LOAD  20    100 10100
  MPY   21    010 10101
  STORE 30    101 11110
  ...         ...

Instructions are stored in consecutive memory:

  Addr   Memory      Mnemonic
  0      100 10100   LOAD A
  1      010 10101   MPY B
  2      101 11110   STORE temp
  3      100 10110   LOAD C
  4      010 10111   MPY D
  5      000 11110   ADD temp
  6      101 10110   STORE C
  …      …
  20     4           A
  21     5           B
  22     6           C
  23     7           D
  …      …
  30     20          temp

[Figure: accumulator-machine datapath: PC with INC and 2 to 1 MUX, IR (Op, Addr) with Decode and Timing and Control, MAR, MDR with 2 to 1 MUX, ACC, ALU, Memory, and Bus, labeled with control signals 0-14]

– Control signals: control functional units to determine order of operations, access to bus, loading of registers, etc.

  Number   Operation        Number   Operation
  0        ACC → bus        8        ALU → ACC
  1        load ACC         9        INC → PC
  2        PC → bus         10       ALU operation
  3        load PC          11       ALU operation
  4        load IR          12       Addr → bus
  5        load MAR         13       CS
  6        MDR → bus        14       R/W
  7        load MDR

Fetch/execute states:

  Fetch
    State 0: PC to bus, load MAR, INC to PC, load PC
    State 1: CS, R/W
    State 2: MDR to bus, load IR
    State 3: Addr to bus, load MAR
  Execute
    OP=store?
      Y: State 4: ACC to bus, load MDR
         State 5: CS
      N: State 6: CS, R/W
         OP=load?
           Y: State 7: MDR to bus, load ACC
           N: State 8: MDR to bus, ALU to ACC, ALU op, load ACC

State 0: Control Signals 2, 5, 9, 3
Put the address of the next instruction in the Addr Register and Inc. PC.
State 1: Control Signals 13, 14
Fetch the word of memory at Address, and load into Data Register.

State 2: Control Signals 6, 4
Send the word from the Data Register to the Instruction Register.

State 3: Control Signals 12, 5
Put the address from the instruction in the Address Register.

After State 3, what values are now stored in each register?
  PC    MAR    MDR    IR    ACC

State 4: Control Signals 0, 7
Take the value from the ACCumulator and store it in the Data Register.
State 5: Control Signal 13
Write the data from the Data Register to the address stored in the MAR.

State 6: Control Signals 13, 14
Load the word at the Address from the Addr Reg into the Data Register.

After State 6, what values are now stored in each register?
  PC    MAR    MDR    IR    ACC

State 7: Control Signals 6, 1
Load the word from Data Register into the ACCumulator.

State 8: Control Signals 6, 8, 10/11, 1
Use word from the Data Register for Arith Op and put result in ACC.
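States 0 through 8 can be traced with a small simulation of the accumulator machine. This is a sketch under several assumptions that the notes do not state: memory is modeled as 32 eight-bit words, DIV is integer division, results wrap to 8 bits, and the program simply runs a fixed number of instructions because the instruction set has no halt. The names `assemble` and `step` are illustrative only. The program and data values are the ones hand assembled earlier (A=4, B=5, C=6, D=7).

```python
# Op-codes from the 3-bit encoding table.
OPCODES = {"ADD": 0b000, "SUB": 0b001, "MPY": 0b010,
           "DIV": 0b011, "LOAD": 0b100, "STORE": 0b101}

# ALU operations used in state 8 (integer division is an assumption).
ALU = {
    OPCODES["ADD"]: lambda acc, m: acc + m,
    OPCODES["SUB"]: lambda acc, m: acc - m,
    OPCODES["MPY"]: lambda acc, m: acc * m,
    OPCODES["DIV"]: lambda acc, m: acc // m,
}


def assemble(op: str, addr: int) -> int:
    """Pack a 3-bit op-code and a 5-bit address into one 8-bit word."""
    return (OPCODES[op] << 5) | (addr & 0b11111)


def step(memory: list, regs: dict) -> None:
    """Run one instruction through the fetch and execute states."""
    # State 0: PC to bus, load MAR; INC to PC, load PC
    regs["MAR"] = regs["PC"]
    regs["PC"] = (regs["PC"] + 1) % 32
    # State 1: CS, R/W (read): memory word at MAR goes to MDR
    regs["MDR"] = memory[regs["MAR"]]
    # State 2: MDR to bus, load IR
    regs["IR"] = regs["MDR"]
    # State 3: Addr to bus, load MAR
    op = regs["IR"] >> 5
    regs["MAR"] = regs["IR"] & 0b11111
    if op == OPCODES["STORE"]:
        # State 4: ACC to bus, load MDR
        regs["MDR"] = regs["ACC"]
        # State 5: CS (write): MDR goes to memory at MAR
        memory[regs["MAR"]] = regs["MDR"]
    else:
        # State 6: CS, R/W (read): operand goes to MDR
        regs["MDR"] = memory[regs["MAR"]]
        if op == OPCODES["LOAD"]:
            # State 7: MDR to bus, load ACC
            regs["ACC"] = regs["MDR"]
        else:
            # State 8: MDR to bus, ALU op, ALU to ACC, load ACC
            regs["ACC"] = ALU[op](regs["ACC"], regs["MDR"]) & 0xFF


memory = [0] * 32
program = [("LOAD", 20), ("MPY", 21), ("STORE", 30), ("LOAD", 22),
           ("MPY", 23), ("ADD", 30), ("STORE", 22)]
for address, (op, addr) in enumerate(program):
    memory[address] = assemble(op, addr)
memory[20:24] = [4, 5, 6, 7]          # A, B, C, D

regs = {"PC": 0, "IR": 0, "MAR": 0, "MDR": 0, "ACC": 0}
for _ in program:                     # no halt instruction, so run a fixed count
    step(memory, regs)

print(memory[30], memory[22])         # temp = A*B, C = A*B + C*D
```

Printing `regs` after each call to `step` is one way to check answers to the "what values are now stored in each register?" questions.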

New Instruction
• What is necessary to implement a new instruction?
• New states?
• New control signals?
• New fetch/execute cycle?
• An example: SWAP exchanges the value in the Accumulator with the value at Address.
    SWAP addr    ! Acc <- #M[addr], M[addr] <- #Acc

New Instruction
What changes to the fetch/execute cycle?
– The fetch part of the cycle usually remains the same.
– Recall the values stored in registers after each state. E.g., after State 6, what values are in each register (PC, MAR, MDR, IR, ACC)?
– Handy to have #M[addr] in MDR.
– Start after State 6, then…

New State 9: Control Signals 6, 5 (MDR -> bus, load MAR)
Save the data value from the MDR in the Address Register.

New State 10: Control Signals 0, 7 (ACC -> bus, load MDR)
Send the ACCumulator value to the Data Register.

New State 11: Control Signals ?, 1
Put the saved value from the MAR into the ACCumulator.
(MAR -> bus, load ACC)
Note: there is no control signal in the current architecture that is the opposite of 5 (load MAR), so we would have to create a new control signal (MAR to bus) in addition to creating these new states.

New State 12 (Old 3): Control Signals 12, 5 (Addr -> bus, load MAR)
Put (reload) the address from the instruction in the Address Register.

New State 13 (Old 5): Control Signal 13 (CS)
Write the data from the Data Register to the address stored in the MAR.

New Instruction Example Summary
Changes to states: added 9 through 13.
Changes to signals: added 15, MAR -> bus.
Changes to fetch/execute: new register transfer language (RTL):
    PC -> bus, load MAR, INC -> PC, load PC
    CS, R/W
    MDR -> bus, load IR
    Addr -> bus, load MAR
    CS, R/W
    MDR -> bus, load MAR
    ACC -> bus, load MDR
    MAR -> bus, load ACC
    Addr -> bus, load MAR
    CS

Instruction Set Architectures 1
RISC vs. CISC
– Complex Instruction Set Computer (CISC): many, powerful instructions. Grew out of the need for high code density. Instructions have varying lengths, number of operands, formats, and clock cycles in execution.
– Reduced Instruction Set Computer (RISC): fewer, less powerful, optimized instructions. Grew out of opportunity for simpler, faster hardware. Instructions have fixed length, number of operands, formats, and similar number of clock cycles in execution.

Instruction Set Architectures 2
Motivation: memory is comparatively slow.
– 10x to 20x slower than processor.
– Need to minimize number of trips to memory.
Provide faster storage in the processor -- registers.
Registers (16, 32, 64 bits wide) are used for intermediate storage for calculations, or repeated operands.
Accumulator machine
– One data register -- ACC.
– 2 memory accesses per instruction -- one for the instruction and one for the operand.
Add more registers (R0, R1, R2, …, Rn).

Instruction Set Architectures 3
How many addresses to specify?
– With binary operations, need to know two source operands, a destination, and the operation.
  E.g., op (dest_operand) (src_op1) (src_op2)
– Based on number of operands, could have:
    3 addr. machine: both sources and dest are named.
    2 addr. machine: both sources named, dest is a source.
    1 addr. machine: one source named, other source and dest. is the accumulator.
    0 addr. machine: all operands implicit and available on the stack.

Instruction Set Architectures 4
1-address architecture: a := ab + cde
Memory layout used in these examples: 100 (a), 104 (b), 108 (c), 112 (d), 116 (e), ..., 200 (t)
– Memory only:
    Code            # mem refs
    LOAD  100       2
    MPY   104       2
    STORE 100       2
    LOAD  108       2
    MPY   112       2
    MPY   116       2
    ADD   100       2
    STORE 100       2
– Using registers:
    Code            # mem refs
    LOAD  100       2
    MPY   104       2
    STORE R2        1
    LOAD  108       2
    MPY   112       2
    MPY   116       2
    ADD   R2        1
    STORE 100       2
1½-address architecture: at least one operand must always be a register. (½ address is the register, 1 address is the memory operand: LOAD 100, R1.)
– Like an accumulator machine, but with many accumulators.

Instruction Set Architectures 5
3-address architecture: a := ab + cde
– Using memory only (the # mem refs column is left blank on the slide):
    MPY 100, 100, 104    ;a:=ab
    MPY 200, 108, 112    ;t:=cd
    MPY 200, 116, 200    ;t:=et
    ADD 100, 200, 100    ;a:=t+a
– Using registers (the # mem refs column is left blank on the slide):
    MPY R2, 100, 104     ;t1:=ab
    MPY R3, 108, 112     ;t2:=cd
    MPY R3, 116, R3      ;t2:=e*t2
    ADD 100, R3, R2      ;a:=t1+t2
– What about instruction size?

Instruction Set Architectures 6
2-address architecture: a := ab + cde
Memory layout: 100 (a), 104 (b), 108 (c), 112 (d), 116 (e), ..., 200 (t)
– Using memory only:
    Code                             # mem refs
    MPY  100, 104     ;a:=ab         4
    MOVE 200, 108     ;t:=c          3
    MPY  200, 112     ;t:=td         4
    MPY  200, 116     ;t:=te         4
    ADD  100, 200     ;a:=t+a        4
– Using registers:
    Code                             # mem refs
    MPY  100, 104     ;a:=ab         4
    MOVE R2, 108      ;R2:=c         2
    MPY  R2, 112      ;R2:=R2*d      2
    MPY  R2, 116      ;R2:=R2*e      2
    ADD  100, R2      ;a:=R2+a       3
– Most CISC architectures work this way, making 1 operand implicit.

Instruction Set Architectures 7
0-address architecture: a := ab + cde
– Stack machine: all operands are implicit. Only push and pop touch memory. All other operands are pulled from the top of the stack, and the result is pushed on top. E.g., HP calculators.
    Code        # mem refs
    PUSH A      2
    PUSH B      2
    MPY         1
    PUSH C      2
    PUSH D      2
    PUSH E      2
    MPY         1
    MPY         1
    ADD         1
    POP A       2
[Figure: stack with slots numbered 0 through 4.]

Instruction Set Architectures 8
Load/Store Architectures -- RISC
– Use of registers is simple and efficient. Therefore, the only instructions that can access memory are load and store. All others reference registers.
    Code                                      # mem refs
    LOAD  R2, 100        ;R2 <- a             2
    LOAD  R3, 104        ;R3 <- b             2
    LOAD  R4, 108        ;R4 <- c             2
    LOAD  R5, 112        ;R5 <- d             2
    LOAD  R6, 116        ;R6 <- e             2
    MPY   R2, R2, R3     ;R2 <- ab            1
    MPY   R3, R4, R5     ;R3 <- cd            1
    MPY   R3, R3, R6     ;R3 <- (cd)e         1
    ADD   R2, R2, R3     ;R2 <- ab+(cd)e      1
    STORE 100, R2        ;a <- ab+(cd)e       2

Instruction Set Architectures 9
Why load/store architectures?
– Number of instructions (hence, memory references to fetch them) is high, but can work without waiting on memory.
– Claim: overall execution time is lower. Why?
    Clock cycle time is lower (no microcode interpretation).
    More room in CPU for registers and memory cache.
    Easier to overlap instruction execution through pipelining.
– Side effects:
    Register interlock: delaying execution until memory read completes.
    Instruction scheduling: rearranging instructions to prevent register interlock (loads on SPARC) and to avoid wasting the results of pipelined execution (branches on SPARC).
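
The memory-reference counts in these tables can be checked mechanically. The following is a minimal Python sketch of the 0-address stack machine program shown earlier, assuming the same costs as the table (2 references for PUSH and POP, 1 for MPY and ADD). The function name `run_stack_program` and the sample values for a through e are hypothetical.

```python
def run_stack_program(program, memory):
    """Execute a 0-address program and return the number of memory references."""
    stack = []
    mem_refs = 0
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(memory[instr[1]])
            mem_refs += 2  # instruction fetch + operand read
        elif op == "POP":
            memory[instr[1]] = stack.pop()
            mem_refs += 2  # instruction fetch + operand write
        elif op in ("MPY", "ADD"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right if op == "MPY" else left + right)
            mem_refs += 1  # instruction fetch only; operands are on the stack
        else:
            raise ValueError(f"unknown instruction: {op}")
    return mem_refs


# a := ab + cde with illustrative values
memory = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6}
program = [
    ("PUSH", "A"), ("PUSH", "B"), ("MPY",),
    ("PUSH", "C"), ("PUSH", "D"), ("PUSH", "E"),
    ("MPY",), ("MPY",), ("ADD",), ("POP", "A"),
]
refs = run_stack_program(program, memory)
```

With these values the program leaves 2*3 + 4*5*6 = 126 in A and makes 16 memory references, the sum of the column in the stack machine table.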

SPARC Assembly Language 1
SPARC (Scalable Processor ARChitecture)
– Used in Sun workstations, descended from RISC-II developed at UC Berkeley.
– General characteristics:
    32-bit word size (integer, address, register size, etc.)
    Byte-addressable memory
    RISC load/store architecture, 32-bit instruction, few addressing modes
    Many registers (32 general purpose, 32 floating point, various special purpose registers)
– ISEM: Instructional SPARC Emulator - nicer than a real machine for learning to write assembly language programs.

SPARC Assembly Language 2
Structure
– Line oriented: 4 types of lines
    Blank - Ignored
    Labeled - Any line may be labeled. Creates a symbol in listing. Labels must begin with a letter (other than 'L'), then any alphanumeric characters. Label must end with a colon ":". Label just assigns a name to an address.
    Assembler Directives - E.g., .data, .word, .text, etc.
    Instructions
– Comments start after the "!" character and go to the end of the line.
            .data
    x_m:    .word 0x42
    y_m:    .word 0x20
    z_m:    .word 0
            .text
    start:  set x_m, %r2
            ld [%r2], %r2       ! Load x into reg 2
            set y_m, %r3
            ld [%r3], %r3       ! Load y into reg 3

SPARC Assembly Language 3
Directives: instructions to the assembler
– Not executed by the machine.
.data -- following section contains declarations
– Each declaration reserves and initializes a certain number of bits of storage for each of zero or more operands in the declaration.
    • .word -- 32 bits
    • .half -- 16 bits
    • .byte -- 8 bits
  E.g.,
            .data
    w:      .half 27000
    x:      .byte 8
    y:      .byte 'm', 0x6e, 0x0, 0, 0
    z:      .word 0x3C5F
.text -- following section contains executable instructions

SPARC Assembly Language 4
Registers -- 32 bits wide
– 32 general purpose integer registers, known by several names to the assembler:
    %r0-%r7 also known as %g0-%g7 global registers -- Note, %r0 always contains value 0.
    %r8-%r15 also known as %o0-%o7 output registers
    %r16-%r23 also known as %l0-%l7 local registers
    %r24-%r31 also known as %i0-%i7 input registers
  Use the %r0-%r31 names for now. Other names are used in procedure calls.
– 32 floating point registers %f0-%f31. Each register is single precision. Double precision uses register pairs.

SPARC Assembly Language 5
Assembly language
– 3-address operations - format different from book
    op src1, src2, dest           ! opposite of text
  E.g.,
    add %r1, %r2, %r3             ! %r3 <- %r1 + %r2
    or  %r2, 0x0004, %r2          ! %r2 <- %r2 bitwise-or 0x0004
– Contrast SPARC with MIPS (used in the book):
    indirect address notation: @addr vs. [addr]
    operand order, especially the destination register
    register notation: R2 vs. %r2
    branches

SPARC Assembly Language 6
– 2-address operations: load and store
    ld [addr], %r2                ! %r2 <- M[addr]
    st %r2, [addr]                ! M[addr] <- %r2
  Often use set to put an address (a label, a symbolic constant) into a register, followed by ld to load the data itself.
    set x_m, %r1                  ! put addr x_m into %r1
    ld [%r1], %r2                 ! use addr in %r1 to load %r2
– Immediate values: instruction itself contains some data to be used in execution.

SPARC Assembly Language 7
– Immediate values (continued)
  E.g., add %rs, siconst13, %rd   ! %rd <- %rs + const
    Constant is coded into the instruction itself, therefore available after fetching the instruction (no extra trip to memory for an operand).
    On SPARC, no special notation for differentiating constants from addresses because there is no ambiguity in a load/store architecture.
    Immediate value is coded as a 13-bit sign-extended value. Range is, then, -2^12 … 2^12 - 1, or -4096 to 4095.
    Immediate values can be specified in decimal, hexadecimal, octal, or binary.
  E.g., add %r2, 0x1A, %r2        ! %r2 <- %r2 + 26

SPARC Assembly Language 8
– Synthetic instructions: assembler translates one "instruction" into one or more machine instructions.
    set: used to load a 32-bit signed integer constant into a register. Has 2 operands - 32 bit value and register number.
    How does that fit into a 32 bit instruction?
    E.g., set iconst32, %rd
          set -10, %r3
          set x_m, %r4
          set '=', %r8
    clr %rd : used to set all bits in a register to 0. How?
    mov %rs, %rd : copies a register.
    neg %rs, %rd : copies the negation of a register.

SPARC Assembly Language 9
– Operand sizes
    double word = 8 bytes, word = 4 bytes, half word = 2 bytes, byte = 8 bits.
    Recall memory alignment issues.
    set  x_m, %r2          ! Put addr x_m in %r2
    ld   [%r2], %r1        ! load word
    ldsb [%r2], %r1        ! load byte, sign extended
    ldub [%r2], %r1        ! load byte, extend with 0's
    st   %r1, [%r2]        ! store word, addr is mult of 4
    stb  %r1, [%r2]        ! store byte, any address
    sth  %r1, [%r2]        ! store half word, address is even
– Characters use 8 bits:
    ldub to load a character
    stb to store a character

SPARC Assembly Language 10
– Traps: provide initial help with I/O, also used in operating systems programming.
    ta 0 : terminate program
    ta 1 : output ASCII character from %r8
    ta 2 : input ASCII character into %r8
    ta 4 : output integer from %r8 in unsigned hexadecimal
    ta 5 : input integer into %r8, can be decimal, octal, or hex
  E.g.,
    set '=', %r8           ! put '=' in %r8
    ta 1                   ! output the '='
    ta 5                   ! read in value into %r8
    mov %r8, %r1           ! copy %r8 into %r1
    set 0x0a, %r8          ! load a newline into %r8
    ta 1                   ! output the newline

SPARC Assembly Language 11
– More assembler directives (.asciz and .ascii):
  The following two declarations are equivalent:
    msg01: .asciz "a phrase"
    msg01: .byte 'a', ' ', 'p', 'h', 'r'
           .byte 'a', 's', 'e', 0
  Note that .asciz generates one byte for each character between the quote (") marks in the operand, plus a null byte at the end. The .ascii directive does not generate that extra byte.
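
To make the difference between the two directives concrete, here is a small Python sketch of the bytes each one would emit. The helper names `ascii_bytes` and `asciz_bytes` are hypothetical; they only mimic the behavior described above and are not part of any assembler.

```python
def ascii_bytes(text):
    """Bytes that .ascii would emit: one byte per character, no terminator."""
    return text.encode("ascii")


def asciz_bytes(text):
    """Bytes that .asciz would emit: the characters followed by a null byte."""
    return text.encode("ascii") + b"\x00"


msg01 = asciz_bytes("a phrase")
# Same data written out byte by byte, as in the .byte version of msg01
msg01_as_bytes = bytes([ord(c) for c in "a phrase"] + [0])
```

Here `msg01` is 9 bytes long (8 characters plus the null), while `ascii_bytes("a phrase")` is 8 bytes long.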
  The following three declarations are equivalent:
    digits: .ascii "0123456789"
    digits: .byte '0', '1', '2', '3', '4', '5'
            .byte '6', '7', '8', '9'
    digits: .byte 0x30, 0x31, 0x32, 0x33, 0x34
            .byte 0x35, 0x36, 0x37, 0x38, 0x39

SPARC Assembly Language 12
– Quick review of instructions so far:
    ld [addr], %rd             ! %rd <- M[addr]
    st %rd, [addr]             ! M[addr] <- %rd
    op %rs1, %rs2, %rd         ! op is ALU op
    op %rs, siconst13, %rd     ! %rd <- %rs op const
    set siconst32, %rd         ! %rd <- const
    ta #                       ! trap signal
– Have actually seen many more variants, e.g., ldub, ldsb, sth, clr, mov, neg, add, sub, smul, sdiv, umul, udiv, etc. Can evaluate just about any simple arithmetic expression.

Review: SPARC Loads, Stores
            .data
    x_m:    .word 0xa1b2c3d4
            .skip 12
            .text
            set  x_m, %r2
            ld   [%r2], %r3
            ldsb [%r2], %r4
            ldub [%r2], %r5
            st   %r3, [%r2+4]
            sth  %r3, [%r2+8]
            stb  %r3, [%r2+12]
            ta   0
After this runs, what values are in %r2-%r5, and in the memory locations starting at byte address x_m?

Flow of Control 1
In addition to sequential execution, need the ability to repeatedly and conditionally execute program fragments.
– High level language has: while, for, do, repeat, case, if-then-else, etc.
– Assembler has if, goto.
– Compare: high level vs. pseudo-assembler implementation of f = n!
  High level:
    f = 1;
    i = 2;
    while (i <= n) {
        f = f * i;
        i = i + 1;
    }
  Pseudo-assembler:
          f = 1
          i = 2
    loop: if (i > n) goto done
          f = f * i
          i = i + 1
          goto loop
    done: ...

Flow of Control 2
– Branch -- put a new address in the program counter. Next instruction comes from the new address; effectively, a "goto".
– Unconditional branch
    (book)  BRANCH addr        ! PC <- addr
    (SPARC) ba addr            ! PC <- addr
– Conditional branch
    (book)  BRcc R1, R2, target
    "if R1 cc R2 then PC <- target", where cc is a comparison operation (e.g., LT is <, GE is >=, etc.)

Flow of Control 3
– Evaluating conditional branches
    Evaluate condition.
    If condition is true, then PC <- target, else PC <- PC+1.
– Consider changes to the fetch-execute cycle given earlier for the accumulator machine.
  What needs to change?
[Figure: fetch/execute flowchart extended with branch tests. After Fetch ("PC to bus, etc."), if OP=BRANCH, or if OP=BRcc and Cond=T, the Execute step is "Addr to bus, load PC"; otherwise execution continues with the other states.]

Flow of Control 4
Other conditions (from text, very similar to MIPS):
    BRLT Rn, Rm, target    ; if Rn <  Rm then PC <- target
    BRLE Rn, Rm, target    ; if Rn <= Rm then PC <- target
    BREQ Rn, Rm, target    ; if Rn =  Rm then PC <- target
    BRNE Rn, Rm, target    ; if Rn != Rm then PC <- target
    BRGE Rn, Rm, target    ; if Rn >= Rm then PC <- target
    BRGT Rn, Rm, target    ; if Rn >  Rm then PC <- target
Can implement high level control structures now. Back to the factorial example using the book's assembly language:
            LOAD   R1, #1          ; R1 = f = 1
            LOAD   R2, #2          ; R2 = i = 2
            LOAD   R3, n           ; R3 = n
    loop:   BRGT   R2, R3, done    ; branch if i > n
            MPY    R1, R1, R2      ; f = f * i
            ADD    R2, R2, #1      ; i = i + 1
            BRANCH loop            ; goto loop
    done:   STORE  f, R1           ; f = n!

Flow of Control 5
– Condition Codes
    Book's assembly language has 3-address branches.
    SPARC uses 1-address branches. Must use condition codes.
    Non-MIPS machines use condition codes to evaluate branches. Condition Code Register (CCR) holds these bits. SPARC has a 4-bit CCR: N Z V C
    N: Negative, Z: Zero, V: Overflow, C: Carry. All are shown in a trace, or in the reg command under ISEM.
    Condition codes are not changed by normal ALU instructions. Must use special instructions ending with cc, e.g., addcc.

Flow of Control 6
            .text
    start:  set 1, %r2
            set 0xFFFFFFFE, %r1       ! -2 in 32-bit 2's comp
    cc_set: subcc %r1, %r2, %r3       ! r3 <= -2-1
    end:    ta 0

    ISEM> reg
          ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
    G     00000000 fffffffe 00000001 00000000 00000000 00000000 00000000 00000000
    O, L, I rows: all 00000000
    PC: 08:00002028   nPC: 0000202c   PSR: 0000003e   N:0 Z:0 V:0 C:0
    cc_set : subcc %g1, %g2, %g3

    ISEM> trace
          ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
    G     00000000 fffffffe 00000001 fffffffd 00000000 00000000 00000000 00000000
    O, L, I rows: all 00000000
    PC: 08:0000202c   nPC: 00002030   PSR: 00b0003e   N:1 Z:0 V:0 C:0

Flow of Control 7
– Setting the condition codes
    Regular ALU operations don't set condition codes.
    Use addcc, subcc, smulcc, sdivcc, etc., to set condition codes.
  E.g., suppose %r1 contains -4 and %r2 contains 5. Fill in the flags:
                                N  Z  V  C
    addcc %r1, %r2, %r3
    subcc %r1, %r2, %r3
    subcc %r2, %r1, %r3
    subcc %r1, %r1, %r3

ALU Hardware 1
How does a computer add?
– Design a circuit that adds three single-digit binary numbers. Results in a sum, and a carry out.
    Cin  X  Y  |  Sum  Cout
     0   0  0  |   0    0
     1   0  0  |   1    0
     0   0  1  |   1    0
     1   0  1  |   0    1
     0   1  0  |   1    0
     1   1  0  |   0    1
     0   1  1  |   0    1
     1   1  1  |   1    1
[Figure: gate-level circuits for Sum and Cout from x, y, cin, and the full adder (FA) block symbol.]

ALU Hardware 2
Now cascade the full adder hardware.
[Figure: ripple-carry adder. Registers x and y feed a chain of FA blocks whose sums go to register z; the initial carry in is 0 and the last stage produces cout.]
How are CCR bits set? (Above is a ripple-carry adder.)
– C-bit = C_out
– V-bit = C_out XOR C_(n-1)
– Z-bit = NOT (rz_(n-1) OR rz_(n-2) OR rz_(n-3) OR ... OR rz_0)
– N-bit = rz_(n-1)

Flow of Control 8
– Branches use logic to evaluate the CCR (SPARC)
    Operation                          Assembler Syntax   Branch Condition
    Branch always                      ba target          1 (always)
    Branch never                       bn target          0 (never)
    Branch not equal                   bne target         NOT Z
    Branch equal                       be target          Z
    Branch greater                     bg target          NOT (Z OR (N XOR V))
    Branch less or equal               ble target         Z OR (N XOR V)
    Branch greater or equal            bge target         NOT (N XOR V)
    Branch less                        bl target          N XOR V
    Branch greater, unsigned           bgu target         NOT (C OR Z)
    Branch less or equal, unsigned     bleu target        C OR Z
    Branch carry clear                 bcc target         NOT C
    Branch carry set                   bcs target         C
    Branch positive                    bpos target        NOT N
    Branch negative                    bneg target        N
    Branch overflow clear              bvc target         NOT V
    Branch overflow set                bvs target         V

Flow of Control 9
– Setting Condition Codes (continued)
    Synthetic instruction cmp %rs1, %rs2
    – Sets CCR, but doesn't modify any registers.
    – Implemented as subcc %rs1, %rs2, %g0
  Back to the factorial example (SPARC):
            set 1, %r1             ! %r1 = f = 1
            set 2, %r2             ! %r2 = i = 2
            set n, %r3             ! Get loc of n
            ld [%r3], %r3          ! Put n in %r3
    loop:   cmp %r2, %r3           ! Set CCR (i?n)
            bg done                ! i > n done
            nop                    ! Branch delay
            umul %r1, %r2, %r1     ! f = f * i
            add %r2, 1, %r2        ! i = i + 1
            ba loop                ! Goto loop
            nop                    ! Branch delay
    done:   set f, %r3             ! Get loc of f
            st %r1, [%r3]          ! f = n!

Flow of Control 10
– Branch delay slots: found in RISC architectures such as SPARC.
    Non-technical explanation: processor is running so fast, it can't make a quick turn.
    – Instruction following branch is always executed.
    Technical explanation: the efficiency advantage of pipelining is greater if the following instruction, which has almost completed execution, is allowed to complete.
    Compilers take advantage of branch delay slots by putting a useful instruction there if possible.
    For our purposes, use the nop (no operation) instruction to fill branch delay slots.
    Beware! Forgetting the nop will be a large source of errors in your programs!

High Level Control Structures 1
Converting high level control structures
– You get to be the "compiler".
  Some compilers convert the source language (C, Pascal, Modula 2, etc.) into assembly language and then assemble the result to an object file. GNU C, C++ do this to GAS (GNU Assembler).
– if-then-else, while-do, repeat-until are all possible to create in a structured way in assembly language.

High Level Control Structures 2
General guidelines
– Break down into independent (or nested) logical units.
– Convert to if/goto pseudo-code.
  High level:
    f = 1;
    for (i=2; i<=n; i++)
        f = f * i;
  if/goto:
          f = 1
          i = 2
    loop: if (i>n) goto done
          f = f*i
          i = i+1
          goto loop
    done: ...
– Mechanical, step-by-step, non-creative process.

High Level Control Structures 3
if-then-else
  High level:
    if (a<b)
        c = d + 1;
    else
        c = 7;
  if/goto:
          if (a >= b) goto else
          c = d + 1
          goto end
    else: c = 7
    end:
  SPARC:
    init:   set a, %r2             ! get &a into r2
            ld [%r2], %r2          ! get a into r2
            set b, %r3             ! get &b into r3
            ld [%r3], %r3          ! get b into r3
    if:     cmp %r2, %r3           ! a ?? b (want >=)
            bge else               ! a >= b, do else
            nop
            set d, %r5             ! get &d into r5
            ld [%r5], %r5          ! get d into r5
            add %r5, 1, %r4        ! r4 <- d+1
            ba end
            nop
    else:   set 7, %r4             ! get 7 into r4
    end:    set c, %r5             ! get &c into r5
            st %r4, [%r5]          ! c <- r4

High Level Control Structures 4
while loops
  High level:
    while (a<b)
        a = a+1;
    c = d;
  if/goto:
    whle: if (a>=b) goto done
    body: a = a+1
          goto whle
    done: c = d
  SPARC:
    init:   set a, %r4             ! get &a into r4
            ld [%r4], %r2          ! get a into r2
            set b, %r3             ! get &b into r3
            ld [%r3], %r3          ! get b into r3
    whle:   cmp %r2, %r3           ! a ?? b (want >=)
            bge done               ! a >= b skip body
            nop
    body:   add %r2, 1, %r2        ! r2 = a + 1
            st %r2, [%r4]          ! a = a + 1
            ba whle                ! repeat loop body
            nop
    done:   set c, %r5             ! get &c into r5
            ...

High Level Control Structures 5
repeat-until loops
  High level:
    repeat … until (a>b)
  if/goto:
    repeat: …
            if (a<=b) goto repeat
  SPARC:
    rpt:    ...
            ...
            set a, %r2             ! get &a into r2
            ld [%r2], %r2          ! get a into r2
            set b, %r3             ! get &b into r3
            ld [%r3], %r3          ! get b into r3
            cmp %r2, %r3           ! a <= b?
            ble rpt                ! do body again
            nop

High Level Control Structures 6
Complex conditions
  if ((a<b) and (b>=c)) …
  Primitive language:
          if (a>=b) then goto skip
          if (b<c) then goto skip
    body: ...
          ...
    skip: ...
  if ((a<b) or (b>=c)) …
  Primitive language:
          if (a<b) then goto body
          if (b<c) then goto skip
    body: ...
          ...
    skip: ...
These can be combined and used in if/else or while loops.

Flow of Control 11
– Optimizing code: change order of instructions, combine instructions, take advantage of branch delay slots.
  Factorial example again. (for i:=n downto 1 do…)
            set 1, %r1             ! %r1=f=1
            set n, %r2             ! Get loc of n
            ld [%r2], %r2          ! Put n in %r2
    loop:   umul %r1, %r2, %r1     ! f=f*n
            subcc %r2, 1, %r2      ! Decrement n
            bg loop                ! Repeat
            nop                    ! Branch delay
            set f, %r3             ! Get loc of f
            st %r1, [%r3]          ! f=n!
  Reduced 7 instructions in loop to just 4. (You gain no advantage if you optimize code in your labs.)

Synthetic Instructions
Remember lab0?
            .data
    x_m:    .word 0x42
    y_m:    .word 0x20
    z_m:    .word 0
            .text
    start:  set x_m, %r2
            ld [%r2], %r2
            set y_m, %r3
            ld [%r3], %r3
            and so on…
Suppose you gave this command to ISEM (after loading):
    ISEM> dump start
    start   05 00 00 10 84 10 a0 00 c4 00 80 00 07 00 00 10
Could you find the set instruction?

Instruction Encodings 1
First, instruction encoding is how instructions are assembled.
– All instructions must fit into 32 bits.
  Field layouts (bit positions in brackets):
    Register-register (op=10, i=0):    op[31:30] rd[29:25] op3[24:19] rs1[18:14] i[13] asi[12:5] rs2[4:0]
    Register-immediate (op=10, i=1):   op[31:30] rd[29:25] op3[24:19] rs1[18:14] i[13] simm13[12:0]
    Floating point (op=10, i=0):       op[31:30] rd[29:25] op3[24:19] rs1[18:14] i[13] opf[12:5] rs2[4:0]

Instruction Encodings 2
    Call instructions (op=01):               op[31:30] disp30[29:0]
    Branch instructions (op=00, op2=010):    op[31:30] a[29] cond[28:25] op2[24:22] disp22[21:0]
    SETHI instructions (op=00, op2=100):     op[31:30] rd[29:25] op2[24:22] imm22[21:0]
  Ex.: add %r2, %r3, %r4
    op[31:30]  rd[29:25]  op3[24:19]  rs1[18:14]  i[13]  asi[12:5]  rs2[4:0]
    10         00100      000000      00010       0      00000000   00011
  in hexadecimal: 88008003

Understanding SET Synthetic
Usually used to put the value of an address in memory into a register. For example, set 0x4004, %r3
Can do neither 'add %r0, 0x4004, %r3' nor 'or %r0, 0x4004, %r3'. Why not?
SET is a synthetic instruction which may be implemented in two steps.
  #1  sethi 0x10, %r3          ! Puts 0x10 in the most significant 22 bits
        %r3 before (example value):       0x12481248
        22-bit constant:                  0x10
        %r3 after sethi:                  0x00004000  (low 10 bits cleared)
  #2  or %r3, 0x0004, %r3      ! Puts 0x0004 in the least significant bits
        %r3:                              0x00004000
        constant:                         0x00000004
        %r3 after OR:                     0x00004004
  Machine language encoding for 'set 0x4004, %r3':
    sethi 0x10, %r3        00 00011 100 0000000000000000010000       0x 07 00 00 10
    or %r3, 4, %r3         10 00011 000010 00011 1 0000000000100     0x 86 10 E0 04

Decoding an Instruction
    05 00 00 10 (hex) = 0000 0101 0000 0000 0000 0000 0001 0000 (binary)
    Instruction Group (bits 31:30) = 00
    Destination Register (bits 29:25) = 00010
    Op Code (bits 24:22) = 100
    Constant (bits 21:0) = 0000000000000000010000
    Meaning: sethi 0x10, %r2
    %r2 <-- 00000000000000000100000000000000 (0x4000)

More Decoding
Fill in the fields (Group, OP, Rd, Rs1, Rs2, SICONST) for each word:
    Hex            Binary
    84 10 A0 00    1000 0100 0001 0000 1010 0000 0000 0000
    C4 00 80 00
    07 00 00 10
    86 10 E0 04

SET Synthetic Instruction
    set iconst, rd
  is assembled as one of:
    sethi %hi(iconst), rd
    or    rd, %lo(iconst), rd
  --or--
    sethi %hi(iconst), rd
  --or--
    or    %g0, iconst, rd

Bitwise Operations 1
Bit Manipulation Instructions
– Bitwise logical operations
    and %rs1, %rs2, %rd        e.g., operands 10010011… and 01111001… (32 bits)
        x  y | x AND y
        0  0 |   0
        0  1 |   0
        1  0 |   0
        1  1 |   1
    or %rs1, %rs2, %rd         e.g., operands 10010011… and 01111001… (32 bits)
        x  y | x OR y
        0  0 |   0
        0  1 |   1
        1  0 |   1
        1  1 |   1
    xor %rs1, %rs2, %rd        e.g., operands 10010011… and 01111001… (32 bits)
        x  y | x XOR y
        0  0 |   0
        0  1 |   1
        1  0 |   1
        1  1 |   0

Bitwise Operations 2
    andn %rs1, %rs2, %rd       e.g., operands 10010011… and 01111001… (32 bits)
        x  y | x AND (NOT y)
        0  0 |   0
        0  1 |   0
        1  0 |   1
        1  1 |   0
    orn %rs1, %rs2, %rd        e.g., operands 10010011… and 01111001… (32 bits)
        x  y | x OR (NOT y)
        0  0 |   1
        0  1 |   0
        1  0 |   1
        1  1 |   1
    not %rs, %rd               e.g., operand 10010011… (32 bits)
        x | NOT x
        0 |   1
        1 |   0
  Recall the cc operations, so andcc, orcc, etc. are available. (However, there is no notcc; use xnorcc.)

Bitwise Operations 3
For what kinds of things are these bit level operations used?
  Recall the synthetic operations clr and mov:
    clr %r2          is    or %r0, %r0, %r2
    mov %r2, %r3     is    or %r0, %r2, %r3
  Masking operations: want to select a bit or group of bits from a set of 32.
  E.g., convert lower (or upper) to upper case:
    'a' in binary is 01100001
    'A' in binary is 01000001
  All we need to do is "turn off" the bit in position 5.
    and %r1, 0b11011111, %r1   will turn off that bit!
  What if we subtract 32 (0b100000) from %r1? What about converting upper to lower case?

Bitwise Operations 4
– Bitwise shifting operations
  Shift logical left: sll %rs1, %rs2, %rd
    %rs1: data to be shifted
    %rs2: shift count
    %rd: destination register
  E.g.,
    set 0xABCD1234, %r2
    sll %r2, 3, %r3
    %r2: 1010 1011 1100 1101 0001 0010 0011 0100
    %r3: 0101 1110 0110 1000 1001 0001 1010 0000
  sll is equivalent to multiplying by a power of 2 (barring overflow). (In the decimal system, what's a shortcut for multiplying by a power of ten?)

Bitwise Operations 5
  Shift Logical Right: srl %rs1, %rs2, %rd
  – Shifts right instead of left, inserting zeros.
  Arithmetic shifts: propagate the sign bit when shifting right, e.g., sra. (Left shift doesn't change.)
  – Almost equivalent to dividing by a power of 2.
  Rotating shifts: bits that would have gone into the bit bucket are shifted in instead. (E.g., rr, rl)
  [Figure: Rotate Right and Rotate Left diagrams.]
  – Rotate not implemented in SPARC.

More SPARC Assembly Language
Assembler directives
  Are not encoded as machine instructions.
  Memory alignment: .align 4
  – Used when mixing allocations of bytes, words, halfwords, etc. and need word boundary alignment.
  Reserve bytes of space: .skip 20
  – Useful for allocating large amounts of space (e.g., arrays).
  Create a symbolic constant: .set mask, 0x0f
  – Can now use the word "mask" anywhere we could use the constant 0x0f previously.
All this is leading to additional addressing modes, which help us work with pointers, arrays, and records in assembly language.

Addressing Modes 1
Addressing Modes
– How do we specify operand values?
    In a register, location is encoded in the instruction.
    As a constant, immediate value is in the instruction.
    In memory, operand is somewhere in memory, location may only be known at runtime.
– Memory operands:
    Effective address: actual location of operand in memory.
    This may be calculated implicitly (e.g., by a displacement in the instruction) or may be calculated by the programmer in code.

Addressing Modes 2
– Summary of addressing modes:
    Mode                Example                    Loc. of Operand             Suitable for              SPARC?
    Immediate           add %r1, 100, %r1          instruction                 Constants                 Yes
    Register Direct     add %r1, %r2, %r1          %r2                         Integers, constants       Yes
    Memory Direct       add %r1, [2000], %r2       mem[2000]                   Integers, constants       No
    Memory Indirect     add %r1, [[2000]], %r2     mem[mem[2000]]              Pointers                  No
    Register Indirect   ld [%r1], %r2              mem[%r1]                    Pointers                  Yes
    Register Indexed    st %r1, [%r2+%r3]          mem[%r2+%r3]                Arrays                    Yes
    Register Displaced  st %r1, [%r2+x]            mem[%r2+x]                  Records                   Yes
    Post Increment      ld [%r1]+, %r2             mem[%r1], increment %r1     Arrays, strings, stacks   No
    Pre Decrement       ld -[%r1], %r2             decrement %r1, mem[%r1]     Arrays, strings, stacks   No

Addressing Modes 3
– Memory Direct addressing
    Entire address is in the instruction (not in SPARC).
    E.g., accumulator machine: each instruction had an opcode and a hard address in memory.
    – Can't be done on SPARC because an address is 32 bits, which is the length of an instruction. No room for opcodes, etc.
    Can be done in CISC because multi-word instructions are permitted.
– Memory Indirect addressing
    Pointer to operand is in memory. Instruction specifies location of pointer.
    Requires three memory fetches (one each for instruction, pointer, and data).
    Not in RISC machines because the instruction is too slow; such an instruction would cause its own register interlock!

Addressing Modes 4
– Register Indirect addressing
    Register has address of operand (a pointer). Instruction specifies register number; effective address is contents of register.
  Ex.:
            .data
    n_m:    .word 5                ! initialize n to 5
            .text
            set n_m, %r1           ! %r1 has n_m, pointer to n
            ld [%r1], %r3          ! fetch n into %r3

Addressing Modes 5
  Ex.: sum up array of integers:
            .data
    n_m:    .word 5                ! Size of array
    a_m:    .word 4,2,5,8,3        ! 5 word array
    sum_m:  .word 0                ! Sum of elements
    b_m:    .skip 5*4              ! another 5 word array
            .text
            clr %r2                ! r2 will hold sum
            set n_m, %r3           ! r3 points to n
            ld [%r3], %r3          ! r3 gets array size
            set a_m, %r4           ! r4 points to array a
    loop:   ld [%r4], %r5          ! Load element of a into r5
            add %r5, %r2, %r2      ! sum = sum + element
            add %r4, 4, %r4        ! Incr ptr by word size
            subcc %r3, 1, %r3      ! Decrement counter
            bg loop                ! Loop until count = 0
            nop                    ! Branch delay slot
            set sum_m, %r1         ! r1 points to sum
            st %r2, [%r1]          ! Store sum
            ta 0                   ! done
[Figure: memory at n_m (5), a_m through a_m+16 (4, 2, 5, 8, 3), and sum_m, with a table for tracing %r2, %r3 (5, 4, 3, 2, 1), %r4 (a_m, a_m+4, a_m+8, a_m+12, a_m+16), and %r5 on each pass through the loop.]

Addressing Modes 6
C-style example of pointer data type:
    char x;          // object of type character
    char * ptr;      // pointer to character type
    ptr = &x;        // ptr has address of x (points to x)
    *ptr = 'a';      // store 'a' at address in ptr
Assembly language equivalent:
            .data
    x_m:    .byte 0                ! reserve character space; x_m = &x; [x_m] = x
            .align 4               ! align to word boundary
    ptr_m:  .word 0                ! pointer variable; [ptr_m] = ptr
            .text
            set x_m, %r1           ! get address x_m into %r1
            set ptr_m, %r2         ! get address ptr_m into %r2
            st %r1, [%r2]          ! make [ptr_m] point to [x_m]
            set 'a', %r3           ! put character 'a' into r3
            set ptr_m, %r2         ! get address ptr_m into %r2
            ld [%r2], %r1          ! get address [ptr_m], i.e. x_m, into %r1
            stb %r3, [%r1]         ! store 'a' at address [ptr_m], i.e., x_m, i.e., addr of x
[Figure: r1 holds x_m, r2 holds ptr_m, r3 holds 'a'; memory at x_m holds 'a' and memory at ptr_m holds x_m.]

Addressing Modes 7
– Register Indexed addressing
    Suitable for accessing successive elements of the same type in a data structure.
  Ex.: Swap elements A[i] and A[k] in array
            .data
    A:      .skip 24*4             ! reserve array[0..23] of int
            ! assume i is in %r2 and k is in %r3
            .text
            set A, %r4             ! beginning of array ptr.
            sll %r2, 2, %r2        ! "multiply" i by 4
            sll %r3, 2, %r3        ! "multiply" k by 4
            ld [%r2+%r4], %r7      ! r7 <- a[i]
            ld [%r3+%r4], %r8      ! r8 <- a[k]
            st %r8, [%r2+%r4]      ! a[i] <- r8
            st %r7, [%r3+%r4]      ! a[k] <- r7
[Figure: array at A, A+4, A+8, A+12; r4 holds A; r2 and r3 shown before and after sll (001 -> 100, 0010 -> 1000); r7 and r8 hold the two elements.]
  Effective address calculations!
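
The effective-address arithmetic in the swap example can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: memory is modeled as a `bytearray`, words are 4 bytes and big-endian as on SPARC, and the helpers `load_word`, `store_word`, and `swap_elements` are hypothetical names, not ISEM functions.

```python
import struct


def load_word(mem, addr):
    """Model of ld [addr], %rd for a signed 32-bit big-endian word."""
    return struct.unpack_from(">i", mem, addr)[0]


def store_word(mem, addr, value):
    """Model of st %rs, [addr]."""
    struct.pack_into(">i", mem, addr, value)


def swap_elements(mem, base, i, k):
    """Swap A[i] and A[k], computing effective addresses as base + (index << 2)."""
    off_i = i << 2                      # sll %r2, 2, %r2
    off_k = k << 2                      # sll %r3, 2, %r3
    r7 = load_word(mem, base + off_i)   # ld [%r2+%r4], %r7
    r8 = load_word(mem, base + off_k)   # ld [%r3+%r4], %r8
    store_word(mem, base + off_i, r8)   # st %r8, [%r2+%r4]
    store_word(mem, base + off_k, r7)   # st %r7, [%r3+%r4]


# 24-element array of ints starting at byte address 0, with A[n] = 10*n
mem = bytearray(24 * 4)
for index in range(24):
    store_word(mem, index * 4, index * 10)
swap_elements(mem, 0, 1, 2)
```

After the call, the word at byte offset 4 holds the old A[2] and the word at byte offset 8 holds the old A[1].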

Addressing Modes 8

Simulating Register Indirect addressing on SPARC
– SPARC doesn't truly have register indirect addressing. We can write
      st %r2, [%r1]
  but the assembler converts this automatically into
      st %r2, [%r1+%r0]

Array mapping functions: used by compilers to determine addresses of array elements. Must know upper bound, lower bound, and size of elements of array.
– Total storage = (upper - lower + 1)*element_size
– Address offset for element at index k = (k - lower)*element_size
– Address (byte) offset for A[3] = (3-0)*4 = 12
– This is for 1 dimensional arrays only!

Addressing Modes 9

1D array mapping functions: Want an array of n elements, each element is 4 bytes in size, array starts at address arr.
– Total storage is 4n bytes
– First element is at arr+0
– Last element is at arr+4(n-1)
– kth (k can range from 0…n-1) element is at arr+4k. Array uses zero-based indexing.

  Address   Index
  arr+0     k=0
  arr+4     k=1
  arr+8     k=2
  arr+12    k=3
  arr+16    k=4
  arr+20    k=5
  (array of 6 elements, 4 bytes each)

Addressing Modes 10

2D array mapping functions: must linearize the 2D concept; e.g., map the 2D structure into 1D memory.

             5 Columns (0...4)
              0     1     2     3     4
  3 Rows  0  0,0   0,1   0,2   0,3   0,4
  (0...2) 1  1,0   1,1   1,2   1,3   1,4
          2  2,0   2,1   2,2   2,3   2,4

– Convert into 1D array in memory
  0,0  0,1  0,2  0,3  0,4  1,0  1,1  .....  2,3  2,4

Addressing Modes 11

2 ways to convert to 1D
– Row major order (Pascal, C, Modula-2) stores first by rows, then by columns. E.g.,
  0,0  0,1  0,2  0,3  0,4  1,0  1,1  .....  2,3  2,4
– Column major order (FORTRAN) stores first by columns, then by rows. E.g.,
  0,0  1,0  2,0  0,1  1,1  2,1  0,2  .....  1,4  2,4
– Row major 2D array mapping function: Given an array starting at address arr that is x rows by y columns, each element is m bytes in size, and indices start at zero, then element (i, j) may be found at location:
  arr + (y*i + j)*m

Addressing Modes 12

3D array mapping function: natural extension of 2D function. Store by row, then column, then depth.
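
The array mapping functions in these slides translate directly into small C functions. The sketch below is illustrative only: the function names and the assert-based bounds checks are assumptions added here. The column-major variant is not stated in the notes; it is inferred from the column-major listing (0,0 1,0 2,0 0,1 ...). The 3D function uses the formula arr + (z(yi + j) + k)m given for the 3D case in these notes. Every function returns a byte offset to add to the starting address arr.

```c
#include <assert.h>
#include <stddef.h>

/* 1D: total storage = (upper - lower + 1) * element_size. */
size_t storage_1d(long lower, long upper, size_t elem_size) {
    assert(upper >= lower);
    return (size_t)(upper - lower + 1) * elem_size;
}

/* 1D: byte offset of element k = (k - lower) * element_size. */
size_t offset_1d(long k, long lower, size_t elem_size) {
    assert(k >= lower);
    return (size_t)(k - lower) * elem_size;
}

/* 2D row major: x rows by y columns, zero-based, m bytes per element. */
size_t offset_2d_row_major(size_t i, size_t j, size_t x, size_t y, size_t m) {
    assert(i < x && j < y);
    return (y * i + j) * m;
}

/* 2D column major (inferred, not stated in the notes): whole columns first. */
size_t offset_2d_col_major(size_t i, size_t j, size_t x, size_t y, size_t m) {
    assert(i < x && j < y);
    return (x * j + i) * m;
}

/* 3D: x rows, y columns, depth z; depth index k varies fastest. */
size_t offset_3d(size_t i, size_t j, size_t k,
                 size_t x, size_t y, size_t z, size_t m) {
    assert(i < x && j < y && k < z);
    return (z * (y * i + j) + k) * m;
}
```

For example, offset_1d(3, 0, 4) reproduces the A[3] offset of 12 bytes worked out above.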
[Figure: a 3 Rows, 5 Columns, 2 Depth array drawn as two stacked planes, each cell labeled with its element offset. Depth varies fastest: row 0 holds (0,0,0) at +0, (0,0,1) at +1, (0,1,0) at +2, ..., (0,4,1) at +9; row 1 starts with (1,0,0) at +10.]

– Array starting at arr with x rows, y columns, depth z, m element size. Element (i, j, k) is found at location:
  arr + (z*(y*i + j) + k)*m

Addressing Modes 13

CALCULATE: total storage, offset for A(i,j,k), and address for A(i,j,k)

                          1D    2D    3D
  element size (#bytes)    4     2     1
  # rows (x)               7     3     3
  # cols (y)               1     5     5
  # depth (z)              1     1     2
  starting addr (0)        4     8    12
  i =                      1     1     0
  j =                      0     1     1
  k =                      0     0     1

Addressing Modes 14

  ! Example that adds 1 to every element of columns 1 and 2, not 0, of a 5 by 3 array
          .data
          .set    rows, 5             ! define symbolic constants
          .set    cols, 3
  arr_m:  .skip   rows * cols * 4     ! allocate space (.skip 60 same)
          .text
          ...
          set     arr_m, %r3          ! get address of array
          clr     %r1                 ! %r1 is i (row)
  loop1:  cmp     %r1, rows           ! done if i >= rows
          bge     done
          nop
          set     1, %r2              ! %r2 is j (col); start at one (skip col zero)
  loop2:  cmp     %r2, cols           ! if at last column, done with row
          bge     inc1
          nop
          umul    %r1, cols, %r4      ! # elements to skip for current row
          add     %r4, %r2, %r4       ! then which column being accessed
          umul    %r4, 4, %r4         ! change from element to byte offset
          ld      [%r3+%r4], %r5      ! get arr[i][j]
          add     %r5, 1, %r5         ! add 1 to the element value
          st      %r5, [%r3+%r4]      ! store it back to arr[i][j]
  inc2:   add     %r2, 1, %r2         ! next column
          ba      loop2               ! continue inner loop over columns
          nop
  inc1:   inc     %r1                 ! next row
          ba      loop1               ! continue outer loop over rows
          nop
  done:   ...

Addressing Modes 15

– Displacement Addressing
  Suitable for accessing the individual fields of record data structures. Each field can be of a different type.

  Logical view of a record:
    Name   20 Characters
    Age    Integer
    DOB    Integer

  Use .set directive to establish offsets to fields within records. Then use displacement addressing to access those fields.
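
Displacement addressing is what a C compiler emits for struct field access. The sketch below is a hypothetical illustration, not course code: struct person mirrors the logical record above (a 20-character name followed by two 4-byte integers), and the helpers load_field, store_field, and add_one_to_age are invented names. The code assumes a typical ABI in which int32_t needs at most 4-byte alignment, so the fields land at offsets 0, 20, and 24 with no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct person {
    char    name[20];   /* 20 characters */
    int32_t age;        /* integer */
    int32_t dob;        /* integer */
};

/* Field offsets, playing the role of the .set constants. */
enum {
    NAME_OFFSET = offsetof(struct person, name),
    AGE_OFFSET  = offsetof(struct person, age),
    DOB_OFFSET  = offsetof(struct person, dob)
};

/* Model of ld [%r1+offset], %r2: read a 4-byte field at base + offset. */
int32_t load_field(const void *record, size_t offset) {
    int32_t value;
    assert(record != NULL);
    memcpy(&value, (const uint8_t *)record + offset, sizeof value);
    return value;
}

/* Model of st %r2, [%r1+offset]: write a 4-byte field at base + offset. */
void store_field(void *record, size_t offset, int32_t value) {
    assert(record != NULL);
    memcpy((uint8_t *)record + offset, &value, sizeof value);
}

/* Load the age, add one, and store it back through base + displacement. */
void add_one_to_age(void *record) {
    store_field(record, AGE_OFFSET, load_field(record, AGE_OFFSET) + 1);
}
```

Writing p->age += 1 in ordinary C has the same effect; the explicit base-plus-offset form is spelled out only to show where the displacement comes from.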
  Actual layout of record in memory:
    person+0     name   20 bytes
    person+20    age    4 bytes
    person+24    dob    4 bytes

Addressing Modes 16

Ex.: Add 1 to the age field in a person record

          .data
          .set    name, 0             ! offset to name field
          .set    age, 20             ! offset to age field
          .set    dob, 24             ! offset to date of birth
  person: .skip   28                  ! size of a person record
          .text
          ....
          set     person, %r1         ! get addr of person record
          ld      [%r1+age], %r2      ! get the age of the person
          add     %r2, 1, %r2         ! increment age by 1
          st      %r2, [%r1+age]      ! store back to record

Problem: alignment in memory. May have to waste some space in the person record in order to have the integer fields align on a word boundary.

Addressing Modes 17

– Auto-increment and Auto-decrement addressing
  SPARC does not support these modes. They may be simulated using register indirect addressing followed by an add or subtract of the size of the element on that register.
  Useful for traversing arrays forward (auto-increment) and backward (auto-decrement). Also useful for stacks and queues of data elements.

Subroutines 1

– Subroutines and subroutine linkage
  Subroutines: programming mechanism to facilitate repeated computations and modularization.
– Use of subroutines
  Basis for structured and disciplined programming
  Compact code (no need to write monolithic loops)
  Relatively easy to debug (no cut-and-paste errors)
  Requires little hardware support, mostly protocols and conventions to handle parameters.

Subroutines 2

– Terminology
  Caller: the code (which could be a subroutine itself) which invokes the subroutine of interest
  Callee: the subroutine being invoked by the caller
  Function: subroutine that returns one or more values back to the caller and exactly one of these values is distinguished as the return value
  Return value: the distinguished value returned by a function

Subroutines 3

– Terminology (continued)
  Procedure: a subroutine that may return values to the caller (through the subroutine’s parameter(s)), but none of these values is distinguished as the return value
  Return address: address of the subroutine call instruction
  Parameters: information passed to/from a subroutine (a.k.a. arguments)
  Subroutine linkage: a protocol for passing parameters between the caller and the callee

Subroutines 4

– Subroutine linkage
  Calling a subroutine
  – Assembly language syntax for calling a subroutine
        call label
        nop
  – Must change the program counter (as in a branch instruction); however, we must also keep track of where to resume execution after the subroutine finishes. Call instruction handles this atomically (i.e., without interruption) by:
        %r15 <- PC    (PC <- nPC)    nPC <- label
  Returning from a subroutine
  – Assembly language syntax for returning from a subroutine
        retl
        nop

Subroutines 5

  Returning from a subroutine (continued)
  – Again, must change the program counter to return to an instruction after the one that called the subroutine. The address of the instruction that called it was saved in %r15, and we must skip over the branch delay slot as well. So, this is accomplished by:
        nPC <- %r15+8
  Parameter passing: 2 approaches
  – Register based linkage: pass parameters solely through registers. Has the advantage of speed, but can only pass a few parameters, and it won’t support nested subroutine calls. Such a subroutine is called a leaf subroutine.
  – Stack based linkage: pass parameters through the run-time stack.
    Not as fast, but can pass more parameters and have nested subroutine calls (including recursion).

Register-based Linkage 1

– Subroutine linkage:
  Startup Sequence: load parameters and return address into registers, branch to subroutine.
  Prologue: if non-leaf procedure then save return address to memory, save registers used by callee.
  Epilogue: place return parameters into registers, restore registers saved in prologue, restore saved return address, return.
  Cleanup Sequence: work with returned values

[Figure: the Caller runs its Startup Sequence and issues call; the Callee runs Prologue, Body, and Epilogue and issues retl; the Caller then runs its Cleanup Sequence.]

Register-based Linkage 2

– Example: Print subroutine.

          .text
  main:   set     1, %r1              ! Initialize r1 and r2
          set     3, %r2
          mov     %r1, %r8            ! Print %r1
          call    print
          nop
          mov     %r2, %r8            ! Print %r2
          call    print
          nop
          add     %r1, %r2, %r8       ! Do our calculation
          call    print               ! Print the result (expect '4')
          nop
          ta      0
  print:  set     '0', %r1            ! Ascii value of zero
          or      %r8, %r1, %r2       ! Treat r8 as parameter
          mov     %r2, %r8            ! Move into output register
          ta      1                   ! Output character
          mov     '\n', %r8           ! Output end of line (newline)
          ta      1
          retl                        ! Return
          nop

What’s wrong with the above code?

Register-based Linkage 3

– Which registers can leaf subroutines change? Convention for optimized leaf procedures:

  Register(s)         Use                   Mentionable?
  %r0                 Zero                  Yes
  %r1                 Temporary             Yes
  %r2-%r7             Caller’s variables    No
  %r8                 Return value          Yes
  %r8-%r13            Parameters            Yes
  %r14                Stack pointer         No
  %r15                Return address        Yes, but preserve
  %r30                Frame pointer         No
  %r16-%r29, %r31     Caller’s variables    No

  The subroutine must not use the value in any other register except to save it to memory somewhere and restore it before returning to the caller.
  Problem: how can a subroutine call another subroutine? How can a subroutine call itself?

Register-based Linkage 4

– Example: procedure to print linked list of ints.

[Figure: head points to a linked list of nodes holding 5, 7, 4, 1; the last node's pointer is nil.]

          .data
          .set    dta, 0              ! offset in record to data
          .set    ptr, 4              ! offset in record to next pointer
  head:   .word   0
          .text
  main:   . . . .                     ! does all init and allocation of list
          set     head, %r8           ! prepare parameter to traverse proc
          ld      [%r8], %r8          ! follow head pointer to first node
          call    trav                ! call subroutine
          nop                         ! branch delay
          . . . .

  trav:   mov     %r8, %r1            ! copy pointer to %r1
  loop:   cmp     %r1, 0              ! check for null pointer
          be      done                ! null pointer means we are done
          nop                         ! branch delay
          ld      [%r1+dta], %r8      ! follow pointer and get data field
          ta      4                   ! print data field
          ld      [%r1+ptr], %r1      ! get pointer to next record
          ba      loop
          nop                         ! branch delay
  done:   retl
          nop

Parameter Passing 1

– Review of parameter passing mechanisms:
  Pass by value copy: parameters to subroutine are copies upon which the subroutine acts.
  Pass by result copy: parameters are copies of results produced by the subroutine.
  Pass by reference copy: parameters to subroutine are (copies of) addresses of values upon which the subroutine acts. Callee is responsible for saving each result to memory at the location referred to by the appropriate parameter.
  Hybrid: some parameters passed by value copy, some by result copy, and/or some by reference copy. Callee is responsible for saving results for reference parameters.

Parameter Passing 2

– Parameter passing notes:
  Array or record parameters typically are passed by reference copy (efficiency reasons).
  Primitive data types may be passed either way.
  Conventions among languages allow any language to call functions in any other language:
  – Pascal: VAR parameters are passed by reference copy; all others are passed by value copy.
  – C: all parameters are passed by value copy. Must explicitly pass a pointer if you want a reference parameter.
  – C++: like Pascal, can pass by value or reference copy.
  – FORTRAN: all things passed by reference copy (even constants).
  – ADA: pass by value/result copy.

Parameter Passing 3

          .text
  ! Example 10.1 of Lab Manual
  ! pr_str – print a null terminated string
  ! Parameters:  %r8 – pointer to string (initially)
  !
  ! Temporaries: %r8 – the character to be printed
  !              %r9 – pointer to string
  !
  pr_str: mov     %r8, %r9            ! we need %r8 for the “ta 1” below
  pr_lp:  ldub    [%r9], %r8          ! load character
          cmp     %r8, 0              ! check for null
          be      pr_dn
          nop
          ta      1                   ! print character
          ba      pr_lp
          inc     %r9                 ! increment the pointer (in branch delay slot)
  pr_dn:  retl
          nop

Parameter Passing 4

Summary from text (p. 220)
– Pass by value copy: For small “in” parameters. Subroutines cannot alter the originals whose copies are passed as parameters.
– Pass by value/result copy: For small “in/out” parameters. Caller’s cleanup sequence stores values of any “in/out” parameters.
– Pass by reference copy: for “in/out” parameters of all sizes, and large “in” parameters. “Out” values are provided by changing memory at those addresses. (Note: pass by reference copy is passing an address by value copy.)

Parameter Passing 5

– Write Sparc code for the caller and callee for the following subroutine using register based parameter passing

  ! global_function Integer subchr (A, B, C)
  ! Substitutes character C for each B in string [A],
  ! and returns count of changes.
  !                                  // In comments, "[A+index]" is denoted by "ch".
  !       index = 0
  !       count = 0
  ! LOOP: if [A+index]=0 go to END   // while (ch != 0) {
  !       if [A+index]≠B go to INC   //   if (ch == B) {
  !       [A+index]=C                //     ch = C;
  !       count=count+1              //     count++; }
  ! INC:  index=index+1              //   index++;
  !       go to LOOP                 // }
  ! END:

  Assume
          .data                       ! data section
  C_m:    .byte   'I'                 ! parameter C
  B_m:    .byte   'i'                 ! parameter B
  A_m:    .asciz  "i will tip"        ! parameter A
          .align  4
  R_m:    .word   0                   ! for storing result count

Stack-based Linkage 1

Stack based linkage
– Advantages
  Permits subroutines to call others.
  Allows a larger number of parameters to be passed.
  Permits records and arrays to be passed by value copy.
  Saving of registers by callee is “built-in”.
  A way for callee to reserve memory for other uses is “built-in”, too.
– Disadvantages
  Slower than register based
  More complex protocol
– Why a stack?
  Subroutine calls and returns happen in a last-in first-out order (LIFO).
  Also known as a runtime stack, parameter stack, or subroutine stack.

Stack-based Linkage 2

Items “saved” on the stack in one activation record
– Parameters to the subroutine
– Old values of registers used in the subroutine
– Local memory variables used in subroutine
– Return value and return address

Say A() calls B(), B() calls C(), and C() calls A()

  Runtime Stack (top first)        Expanded View (one frame)
  2nd stack frame for A            Local variables
  1st stack frame for C            Saved general purpose registers
  1st stack frame for B            Return addresses
  1st stack frame for A            Return values
                                   Parameters

Stack-based Linkage 3

– Stack based linkage parameter passing convention
  Startup sequence (Caller):
  – Push parameters
  – Push space for return value
  Prologue (Callee):
  – Push registers that are changed (including return address)
  – Allocate space for local variables
  Epilogue (Callee):
  – Restore general purpose registers
  – Free local variable space
  – Use return address to return
  Cleanup sequence (Caller):
  – Pop and save returned values
  – Pop parameters

[Figure: the Caller runs its Startup Sequence and issues call; the Callee runs Prologue, Body, and Epilogue and issues retl; the Caller then runs its Cleanup Sequence.]

Stack-based Linkage 4

– Stack based parameter passing example:
  Register %r14 (%sp): stack pointer
  – Invariant: Always indicates the top of the stack (it has the address in memory of the last item on stack, usually a word).
  – Moved when items are “pushed” onto the stack.
  – Due to interruptions (system interrupts (I/O) and exceptions), values stored above %sp (at addresses less than %sp) can change at any time! Hence, any access above %sp is unsafe!
  Register %r30 (%fp): frame pointer
  – Indicates the previous stack pointer. Activation record is from (some subroutine-specific number of words before) the %fp to the %sp.
  – Invariant: %fp is constant within a subroutine (after prologue).

Stack-based Linkage 5

– Stack based parameter passing example:
  ! Want to implement the following subroutine (also a caller):
  ! global_function Integer subchr (A, B, C)
  ! Substitutes character C for all B in string A,
  ! and returns count of changes.
  !                                  // In comments, "*(A+index)" is denoted by "ch".
  !       index = 0
  !       count = 0
  ! LOOP: if *(A+index)=0 go to END  // while (ch != 0) {
  !       if *(A+index)≠B go to INC  //   if (ch == B) {
  !       *(A+index)=C               //     ch = C;
  !       count=count+1              //     count++; }
  ! INC:  index=index+1              //   index++;
  !       go to LOOP                 // }
  ! END:

          .data                       ! data section
  C_m:    .byte   'I'                 ! parameter C
  B_m:    .byte   'i'                 ! parameter B
  A_m:    .asciz  "i will tip"        ! parameter A
          .align  4
  R_m:    .word   0                   ! for storing result count

Stack-based Linkage 6

          .data                       ! data section
  C_m:    .word   'I'                 ! parameter C
  B_m:    .word   'i'                 ! parameter B
  A_m:    .asciz  "i will tip"        ! parameter A
          .align  4                   ! align to word address
  stack:  .skip   250*4               ! allocate 250 word stack
  bstak:                              ! point to bottom of stack
  R_m:    .word   0                   ! reserve for count
          .text
  ! Program’s one-time initialization
  start:  set     bstak, %sp          ! set initial stack ptr
          mov     %sp, %fp            ! set initial frame ptr
  ! STARTUP SEQUENCE to call subchr()
          sub     %sp, 16, %sp        ! move stack ptr
          set     A_m, %r1            ! A is passed by reference
          st      %r1, [%sp+4]        ! push address on stack
          set     B_m, %r1            ! B is passed by value
          ld      [%r1], %r1          ! get value of B
          st      %r1, [%sp+8]        ! push parameter B on stack
          set     C_m, %r1            ! C is passed by value
          ld      [%r1], %r1          ! get value of C
          st      %r1, [%sp+12]       ! push parameter C on stack
  ! SUBROUTINE CALL
          call    subchr              ! make subroutine call
          nop                         ! branch delay slot
  ! CLEANUP SEQUENCE
          ld      [%sp], %r1          ! pop return value off stack
          add     %sp, 16, %sp        ! pop stack
          set     R_m, %r2            ! get address of R
          st      %r1, [%r2]          ! store R
          . . .                       ! the rest of the program

[Figure: the stack after the startup sequence. From %sp toward higher addresses: Return value, addr (a), b, c; %fp marks the word after c.]

Stack-based Linkage 7

  ! SUBROUTINE PROLOGUE
  subchr: sub     %sp, 32, %sp        ! open 8 words on stack
          st      %fp, [%sp+28]       ! Save old frame pointer
          add     %sp, 32, %fp        ! old sp is new fp
          st      %r15, [%fp-8]       ! save return address
          st      %r8, [%fp-12]       ! Save gen. Register
          …                           ! Save r9-r13, omitted
  ! SUBROUTINE BODY
  ld_reg: ld      [%fp+4], %r8        ! “pop” (load) addr of A
          ld      [%fp+8], %r9        ! “pop” (load) value of B
          ld      [%fp+12], %r10      ! “pop” (load) value of C
          clr     %r12                ! count
          clr     %r13                ! index
  loop:   ldub    [%r8+%r13], %r11    ! load a string chr
          cmp     %r11, 0x0           ! is chr=null?
          be      done                ! then go to done
          cmp     %r11, %r9           ! is chr<>B? (branch delay)
          bne     inc                 ! then go to inc
          nop                         ! branch delay slot
          stb     %r10, [%r8+%r13]    ! change chr to C
          add     %r12, 1, %r12       ! increment count
  inc:    add     %r13, 1, %r13       ! increment index
          ba      loop                ! do next chr
          nop                         ! branch delay slot
  done:   st      %r12, [%fp+0]       ! “push” (store) count on stack

  ! EPILOGUE
          …                           ! Restore r9-r13, omitted
          ld      [%fp-12], %r8       ! Restore r8
          ld      [%fp-8], %r15       ! get saved return address
          ld      [%fp-4], %fp        ! Get old value of frame ptr
          add     %sp, 32, %sp        ! Restore stack pointer
          retl                        ! return to caller
          nop                         ! branch delay slot

[Figure: the callee's stack frame. From %sp toward higher addresses: ..., %r9, %r8, return addr, old frame ptr; then at %fp: Return value, addr (a), b, c.]

Stack-based Linkage 8

General Guidelines
– Keep Startups, Cleanups, Prologues, and Epilogues standard (but not necessarily identical); easy to cut, paste, and modify.
– Caller: leave space for return value on the TOP of the stack.
– Callee: always save and restore locally used registers.
– Pass data structures and arrays by reference, all others by value (efficiency).

Our Fourth Example Architecture

Motorola M68HC11
Called “HC11” for short
Used in ECE 567, a course required of CSE majors
References:
– Data Acquisition and Process Control with the M68HC11 Microcontroller, 2nd Ed., by F. F. Driscoll, R. F. Coughlin, and R. S. Villanucci, Prentice-Hall, 2000.
– http://www.cse.ohio-state.edu/~heym/360/common/e_series.pdf

Another Reference

Late in an academic term (such as now), you can hope to access on-line lecture notes from the Electrical and Computer Engineering course, ECE 265.
Visit http://www.ece.osu.edu
Under “Academic Program”, click on the link “ECE Course Listings”.
Find 265 and click on the link “Syllabus of this quarter”.

HC11 compared with Sparc (1)

  HC11                                              Sparc
  CISC                                              RISC, Load/Store
  Instruction encoding lengths vary (8 to 32 bits)  Instruction encoding lengths constant (32 bits)
  About 316 instructions                            About 175 instructions
  4 16-bit user registers, one of which is          32 32-bit user integer registers
    divided into two 8-bit registers

HC11 compared with Sparc (2)

  HC11                                              Sparc
  8-bit data bus                                    32-bit data bus
  16-bit address bus                                32-bit address bus
  8-bit addressable                                 8-bit addressable
  Instruction execution not overlapped              Instruction execution overlapped in a pipeline

HC11 compared with Sparc (3)

A Strange Fact: The HC11 architecture “allows accessing an operand from an external memory location with no execution-time penalty.” [p. 27, M68HC11 Processor Manual, http://www.cse.ohio-state.edu/~heym/360/common/e_series.pdf]
Reason: The HC11 requirements state that the CPU cycle must be kept long enough to accommodate a memory access within one cycle.
This seeming miracle is accomplished by keeping processor speed slow enough.

HC11 Programmer’s Model (1)

  Accumulator A (bits 7-0) and Accumulator B (bits 7-0), together forming Accumulator D (bits 15-0)
  X Index Register
  Y Index Register
  Stack Pointer (SP)
  Program Counter (PC)

HC11 Programmer’s Model (2)

Condition Code Register (CCR)

  Bit   Flag   Meaning
  7     S      Stop
  6     X      X Interrupt Mask
  5     H      Half-Carry
  4     I      I Interrupt Mask
  3     N      Negative
  2     Z      Zero
  1     V      Overflow
  0     C      Carry/Borrow

HC11 Assembly Language Format (1)

Like Sparc, it is line-oriented.
A line may:
– Be blank (containing no printable characters),
– Be a comment line, the first printable character being either a semicolon (‘;’) or an asterisk (‘*’), or
– Have the following format (“[] means an optional field”):
  [Label] Operation [Operand field] [Comment field]

HC11 Assembly Language Format (2)

Label:
– Begins in column 1, ending either with a space or a colon (‘:’)
– Contains 1 to 15 characters
– Case sensitive
– The first character may not be a decimal digit (0-9)
– Characters may be upper- or lowercase letters, digits 0-9, period (‘.’), dollar sign (‘$’), or underscore (‘_’)

HC11 Assembly Language Format (3)

Operation:
– Cannot begin in column 1
– Contains: Instruction mnemonic, Assembler directive, or Macro call (we haven’t studied macro expansion in this course)
Operand field:
– Terminated by a space or tab character,
– So multiple operands are separated by commas (‘,’) without using any spaces or tabs

HC11 Assembly Language Format (4)

Comment field:
– Begins with the first space character following the operand field (or following the operation, if there is no operand field)
– So no special printable character is required to begin a comment field
– But it appears to be conventional to begin a comment field with a semicolon (‘;’)

Prefixes for Numeric Constants

  Encoding      HC11        Sparc
  Decimal       No symbol   No symbol
  Hexadecimal   $           0x
  Octal         @           0
  Binary        %           0b

Assembler Directives (1)

  Meaning                            HC11   Sparc
  Set location counter (origin)      ORG    .data or .text
  End of source                      END    Doesn’t have
  Equate symbol to a value           EQU    .set
  Form constant byte                 FCB    .byte

Assembler Directives (2)

  Meaning                            HC11   Sparc
  Form double byte                   FDB    .half
  Form character string constant     FCC    .ascii
  Reserve memory byte or bytes       RMB    .skip

HC11 Addressing Modes

  Immediate (IMM)
  Extended (EXT)
  Direct (DIR)
  Inherent (INH)
  Relative (REL)
  Indexed (INDX, INDY)

Immediate (IMM)

Assembler interprets the # symbol to mean the
immediate addressing mode
Examples
– LDAA #10
– LDAA #$1C
– LDAA #@17
– LDAA #%11100
– LDAA #’C’
– LDAA #LABEL

Extended (EXT)

Lack of # symbol indicates extended or direct addressing mode. These are forms of memory direct addressing, like SAM.
“Extended” means full 16-bit address, whereas “Direct” means directly to a low address, specified using only the least significant 8 bits of the address.
Examples
– LDAA $2025
– LDAA LABEL

Direct (DIR)

Examples
– LDAA $C2
– LDAA LABEL

Inherent (INH)

All operands are implicit (i.e., inherent in the instruction)
Examples: ABA, SBA, DAA
ABA means add the contents of register B to the contents of A, placing the sum in A (A + B -> A)
SBA means A – B -> A
DAA means to adjust the sum that got placed in A by the previous instruction to the correct BCD result; e.g., $09 + $26 yields $2F in A, then DAA changes this to $35.

Relative (REL)

Used only for branch instructions
Relative to the address of the following instruction (the new value of the PC)
Signed offset from -128 to +127 bytes
Examples
– BGE -18
– BHS 27
– BGT LABEL

Indexed (INDX, INDY)

Uses the contents of either the X or Y register and adds it to a (positive, unsigned) offset contained in the instruction to calculate the effective address
Example
– LDAA 4,X

Interrupts

When an interrupt is acknowledged, the CPU’s hardware saves the registers’ contents on the stack.
An interrupt service routine ends with a(n) RTI instruction. This instruction automatically restores the CPU register values from the copies on the stack.

Condition Code Register (CCR)

It’s reasonably safe to say that every instruction that changes a register (A, B, D, X, Y, SP) affects the CCR appropriately.
Unlike Sparc, there are no arithmetic instructions that do not set condition codes.
There do exist instructions that compare a register to a memory location by subtracting the memory contents from the register and throwing the result away, but setting the CCR (CMPA, CMPB, CPD, CPX, CPY).

Example HC11 Program

Problem: Produce the following waveforms on the three least significant bits (LSBs) of parallel 8-bit output Port B (mapped to $1004), where we name the bits X, Y, and Z in increasing order of significance (X is bit 0; Y is bit 1; Z is bit 2).

[Figure: three waveforms, labeled X (10 ms), Y (20 ms), and Z (15 ms).]

Example Source File, p. 1

  STACK:  EQU     $00FF       ; set stack pointer
  PORTB:  EQU     $1004       ; set address of Port B

          ORG     0           ; set the waveform times
  DELAY1: FCB     10          ; for X, Y, and Z
  DELAY2: FCB     20
  DELAY3: FCB     15

Example Source File, p. 2

          ORG     $E000       ; program starts at $E000
  MAIN:   LDS     #STACK      ; initialize stack pointer
  L0:     LDAA    #1          ; set X on Port B to 1
          STAA    PORTB
          LDAB    DELAY1      ; delay for 10 ms
  L1:     JSR     DELAY_1MS
          DECB
          BNE     L1

Example Source File, p. 3

          LDAA    #%00000010  ; set Y on Port B to 1
          STAA    PORTB
          LDAB    DELAY2      ; delay for 20 ms
  L2:     JSR     DELAY_1MS
          DECB
          BNE     L2
          LDAA    #%00000100  ; set Z on Port B to 1
          STAA    PORTB
          LDAB    DELAY3      ; delay for 15 ms
  L3:     JSR     DELAY_1MS
          DECB
          BNE     L3
          BRA     L0          ; continue to cycle

Example Source File, p. 4

  DELAY_1MS:
          PSHB                ; subr. to delay for 1 ms
          LDAB    #198
  DELAY:  DECB
          BRN     DELAY
          NOP
          BNE     DELAY
          PULB
  RETURN: RTS

          ORG     $FFFE       ; initialize reset vector
  RESET:  FDB     MAIN
          END

Traps and Exceptions 1

Traps, Exceptions, and Extended Operations
– Other side of low level programming -- the interface between applications and peripherals
– OS provides access and protocols

Traps and Exceptions 2

– BIOS: Basic Input/Output System
  Subroutines that control I/O
  No need for you to write them as application programmer
  OS interfaces application with BIOS through traps (extended operations (XOPs))

[Figure: layers, top to bottom: Applications software; BIOS; devices (Keyboard, Screen, Mouse, Disk).]

Traps and Exceptions 3

– Where are OS traps kept?
  Two approaches:
  Transient monitor: traps kept in a library that is copied into the application at link-time
  [Figure: Appl 1, Appl 2, Appl 3, and Appl 4, each carrying its own copy of the OS rtns.]
  Resident monitor: always keep OS in main memory; applications share the trap routines.
  [Figure: Appl 1 through Appl 6 sharing a single copy of the OS rtns.]
  OS routines monitor devices. Frequently used routines kept resident; others loaded as needed.

Traps and Exceptions 4

– (Assuming a res. monitor) How to find I/O routines?
  Store routines in memory, and make a call to a hard address. E.g., call 256
  – When new OS is released, need to recompile all application programs to use different addresses.
  Use a dispatcher
  – Dispatcher is a subroutine that takes a parameter (the trap number). Dispatcher knows where all routines actually are in memory, and makes the branch for you. Dispatcher subroutine must always exist in the same location.

[Figure: the Application calls the Dispatcher, which branches to one of BIOS 1, ..., BIOS 12, ..., BIOS n.]

Traps and Exceptions 5

  Use vectored linking
  – Branch table exists at a well known location. The address of each trap subroutine is stored in the table, indexed by the trap number.
  – On RISC, usually about 4 words reserved in the table. If the trap routine is larger than 4 words, can call the actual routine.

  Table entry        Location (1-word address entries)   Location (4-word entries)
  Addr of trap 0     100                                 100
  Addr of trap 1     104                                 116
  Addr of trap 2     108                                 132
  Addr of trap n     100+4n                              100+16n

Traps and Exceptions 6

– Levels of privilege
  Supervisor mode - can access every resource
  User mode - limited access to resources
  OS routines operate in supervisor mode, access is determined by bit in PSW (processor status word).
  XOP (book’s notation) can always be executed, sets privilege to supervisor mode (ta)
  RTX (book’s notation) can only be executed by the OS, and returns privilege to user mode (rett)
– Exceptions
  Caused by invalid use of resource.
  E.g., divide by zero, invalid address, illegal operation, protection violation, etc.
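
The dispatcher and vectored linking schemes described above amount to a table lookup indexed by trap number. The C sketch below is a hypothetical illustration: the handler names, the two-entry table, and the -1 result for an unknown trap number are all assumptions made for the example. The two address helpers reproduce the table locations listed above (100+4n for one-word address entries, 100+16n for four-word entries).

```c
#include <assert.h>
#include <stddef.h>

typedef int (*trap_handler)(int arg);

/* Hypothetical trap routines standing in for BIOS 1 ... BIOS n. */
static int trap_exit(int arg)       { (void)arg; return 0; }
static int trap_write_char(int arg) { return arg; }

/* Branch table at a "well known location": indexed by trap number. */
static const trap_handler trap_table[] = { trap_exit, trap_write_char };

/* Dispatcher: look up the routine for trap_number and branch to it. */
int dispatch(size_t trap_number, int arg) {
    size_t count = sizeof trap_table / sizeof trap_table[0];
    if (trap_number >= count) {
        return -1;                  /* assumed behavior for an unknown trap */
    }
    trap_handler handler = trap_table[trap_number];
    assert(handler != NULL);
    return handler(arg);
}

/* Location of entry n when each entry is a one-word (4-byte) address. */
unsigned long address_entry(unsigned long table_base, unsigned long n) {
    return table_base + 4 * n;
}

/* Location of entry n when each entry reserves 4 words (16 bytes) of code. */
unsigned long code_entry(unsigned long table_base, unsigned long n) {
    return table_base + 16 * n;
}
```

Because applications only ever name a trap number, the routines behind the table can move between OS releases without recompiling the applications, which is the point of both schemes.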

Traps and Exceptions 7

  Control transferred automatically to exception handler routine. Similar to trap or XOP transfer.
  Exceptions vs. XOPs
  – XOPs explicit in code, exceptions are implicit
  – XOPs service request and return to application; exceptions print message and abort (unless masked).
– Trap example: non-blocking read
      ta 3
  If there is nothing in the keyboard buffer, return with a message that nothing is there. Otherwise, put the character into register 8.

Traps and Exceptions 8

  Status of the keyboard is kept in a memory location, as is the (one-character) keyboard buffer. Memory mapped devices.

  ! ta 3 returns character if one is there, otherwise
  ! it returns 0x8000000 into %r8
          set     0x8000000, %r8      ! set default return val
          set     KbdStatus, %r1      ! KbdStatus is memory loc
          ld      [%r1], %r1          ! read status (1 is ready)
          andcc   %r1, 1, %r1         ! check status
          be      rtn                 ! can’t read anything
          set     KbdBuff, %r1        ! KbdBuff is memory loc
          ld      [%r1], %r8          ! get character
  rtn:    rett                        ! return to caller

  On SPARC, trap table has 256 entries. 0-127 are reserved for exceptions and external interrupts. 128-255 are used for XOPs.
  Trap table begins at address 0x0000. Each entry is 4 instructions (16 bytes) long.

Traps and Exceptions 9

  Trap execution: ta 3
  – Calculate trap address: 3 * 16 + 0x0800 = 16 * (3 + 0x080)
  – Save nPC and PSW to memory
    • SPARC uses register windows
    • Assumes local registers are available
  – Set privilege level to supervisor mode
  – Update PC with trap address (and make nPC = PC + 4) (jumps to trap table)
  – Trap table has instruction ba ta3_handler
  – rett
    • Restores PC (from saved nPC value) and PSW (resets to user mode)
    • Returns to application program

Programmed I/O 1

Programmed I/O
– Early approach: Isolated I/O
  Special instructions to do input and output, using two operands: a register and an I/O address.
  CPU puts device address on address bus, and issues an I/O instruction to load from or store to the device.

Programmed I/O 2

Isolated I/O

[Figure: the CPU connects to Memory through an address bus, a data bus, and a read/write line, and to I/O through a separate address bus, data bus, and read/write line.]

Memory Mapped I/O

No special I/O instructions. Treat the I/O device like a memory address. Hardware checks to see if the memory address is in the I/O device range, and makes the adjustment.
Use high addresses (not “real” memory) for I/O memory maps. E.g., 0xFFFF0000 through 0xFFFFFFFF.

[Figure: the CPU shares one address bus, data bus, and read/write line with both Memory and I/O; the address map shows memory, unused regions, and I/O regions.]

Programmed I/O 3

– Advantages of each
  Memory mapped: reduced instruction set, reduced redundancy in hardware.
  Isolated: don’t have to give up memory address space on machines with little memory

Programmed I/O - UARTs

UARTs
– Universal Asynchronous Receiver Transmitter

[Figure: the Keyboard sends the bits 01101010 serially to the UART, which presents them in parallel to the CPU.]

– Asynchronous = not on the same clock.
– Handshake coordinates communication between two devices.
– A kind of programmed I/O.

UARTs 1

UART registers
– Control: set up at init, speed, parity, etc.
– Status: transmit empty, receive ready, etc.
– Transmit: output data
– Receive: input data
– All four needed for bidirectional communications,
– Status/control, transmit / receive often combined. Why?

[Figure: a UART block containing Control Reg, Status Reg, Transmit Reg, Receive Reg, Transmit Logic, and Receive Logic, attached to the Control bus, Address bus, and Data bus.]

UARTs 2

Memory mapped UARTs
– Both memory and I/O “listen” to the address bus. The appropriate device will act based on the addresses.
– Keyboards and Printers require three addresses (when addresses are not combined).
– Modems require four.
– (why?)

  Address      Register
  FFFF 0000    UART 1 data
  FFFF 0004    UART 1 status
  FFFF 0008    UART 1 control
  FFFF 000C    UART 2 xmit
  FFFF 0010    UART 2 recv
  FFFF 0014    UART 2 status
  FFFF 0018    UART 2 control
  FFFF 001C    UART 3 xmit
  and so on

[Figure: CPU, Memory, UART1, and UART2 all attached to the Address bus, Control bus, and Data bus.]

Programmed I/O 4

Programmed I/O Characteristics:
– Used to determine if device is ready (can it be read or written).
– Each device has a status register in addition to the data register.
– Like previous trap example, must check status before getting data.
– Involves polling loops.

Programmed I/O – Polling

Ex.: ta 2 handler (blocking keyboard input)

    ta_2_handler:
          set    KbdBuff, %r1      ! get addr of kbd buffer
          set    KbdStatus, %r9    ! get addr of kbd status
    wait: ld     [%r9], %r10       ! get status
          andcc  %r10, 1, %r10     ! check if ready
          be     wait              ! loop until ready
          nop                      ! branch delay
          ld     [%r1], %r8        ! get data
          rett                     ! return from trap

[Figure: the CPU keeps asking “Are you ready?... Are you ready now?... How about NOW?...” while the device answers “Nope.. Not yet.. Hang on..”]

Can’t afford to wait like this. The computer is millions of times faster than a typist. Also, multi-tasking operating systems can’t wait. Special purpose computers can wait. E.g., microwave oven controllers.

Must have a better way! Interrupts are the answer!

Interrupts and DMA transfers 1

Programmed (polled) I/O uses busy waiting.
– Advantages: simpler hardware
– Disadvantages: wastes time

Interrupts (IRQs on PCs)
– I/O device “requests” service from CPU.
– CPU can execute program code until interrupted. Solves busy waiting problems.
– Interrupt handlers are run (like traps) whenever an interrupt occurs. Current application program is suspended.

Interrupts and DMA transfers 2

Servicing an interrupt
– I/O controller generates interrupt, sets request line “high”.
– CPU detects interrupt at beginning of fetch/execute cycle (for interrupts “between” instructions).
– CPU saves state of running program, invokes interrupt handler.
– Handler services request; sets the request line “low”.
– Control is returned to the application program.

[Figure: the application program runs until *Interrupt Detected*; control passes to the interrupt handler, which services the request and clears the interrupt; the application program then resumes.]

Interrupts and DMA transfers 3

Changes to fetch/execute cycle

Problems
– Requires additional hardware in Timing & Control.
– Queuing of interrupts
– Interrupting an interrupt handler (solution: priorities and maskable interrupts)
– Interrupts that must be serviced within an instruction
– How to find address of interrupt handler

[Figure: flowchart of the modified fetch/execute cycle. Interrupt pending? If yes: save PC, save PSW, PSW = new PSW, PC = handler_addr. If no: PC -> bus, load MAR, INC to PC, load PC.]

Interrupts and DMA transfers 4

Example: interrupt driven string output
– Want to print a string without busy waiting.
– Want to return to the application as fast as possible.

Trap handler implementation

Install trap handler into trap table
– Buffer is like a circular queue
– Only outputs, at most, one character

    disp_buf:  .skip 256    ! buffers string to print
    disp_frnt: .byte 0      ! offset to front of queue
    disp_bck:  .byte 0      ! offset to back of queue

    ta_6_handler:
          ! Copy str from mem[%r8] to mem[disp_buf+disp_bck]
          ! disp_bck = (disp_bck+len(str)) mod 256
          ! If display is ready
          !    If first char is not null, then output it
          !    disp_frnt = (disp_frnt+1) mod 256
          rett              ! Return from trap

[Figure: disp_buf as a circular queue. disp_frnt marks the oldest undisplayed byte; disp_bck marks the newest byte.]

Interrupt handler implementation

This too outputs only one character at most, but when the display becomes ready again, it generates another interrupt which invokes this routine!

    display_IRQ_handler:
          ! Save any registers used
          ! If disp_frnt != disp_bck (queue is not empty)
          !    Get char at mem[disp_buf+disp_frnt]
          !    If char is not null, then output it
          !    disp_frnt = (disp_frnt+1) mod 256
          ! Restore registers and set the request line “low”
          rett              ! Return from trap

Uses the UART for transmission.

[Figure: the display announces “I’m ready!” to the CPU and Memory.]
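The trap handler and interrupt handler outlined above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the course's implementation: `Display`, `DisplayDriver`, and `run_demo` are hypothetical names, the display is modelled as a device that accepts one character and stays busy until `finish_char()` is called (standing in for the hardware raising its IRQ), and queue overflow and null-terminator handling are left out.

```python
BUF_SIZE = 256  # matches the 256-byte disp_buf


class Display:
    """Hypothetical one-character display; busy until finish_char() is called."""

    def __init__(self):
        self.ready = True
        self.shown = []

    def write(self, ch):
        assert self.ready, "display written while busy"
        self.shown.append(ch)
        self.ready = False

    def finish_char(self):
        # The device finishes the character; in hardware this raises the IRQ.
        self.ready = True


class DisplayDriver:
    def __init__(self, display):
        self.display = display
        self.buf = bytearray(BUF_SIZE)
        self.front = 0  # plays the role of disp_frnt
        self.back = 0   # plays the role of disp_bck

    def _output_one(self):
        # Output at most one character, and only if the display is ready.
        if self.display.ready and self.front != self.back:
            self.display.write(chr(self.buf[self.front]))
            self.front = (self.front + 1) % BUF_SIZE

    def trap_handler(self, text):
        # Like ta_6_handler: queue the string, start the first character.
        for byte in text.encode("ascii"):
            self.buf[self.back] = byte
            self.back = (self.back + 1) % BUF_SIZE
        self._output_one()

    def irq_handler(self):
        # Like display_IRQ_handler: send the next queued character, if any.
        self._output_one()


def run_demo(text):
    display = Display()
    driver = DisplayDriver(display)
    driver.trap_handler(text)            # returns to the application at once
    after_trap = "".join(display.shown)
    while not display.ready:             # each pass models one interrupt
        display.finish_char()
        driver.irq_handler()
    return after_trap, "".join(display.shown)
```

In `run_demo`, the trap returns after emitting a single character; the rest of the string drains one character per simulated interrupt, so the application never busy waits on the display.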
Interrupts and DMA transfers 5

Problems with interrupt driven I/O
– CPU is involved with each interrupt
– Each interrupt corresponds to transfer of a single byte
– Lots of overhead for large amounts of data (blocks of 512 bytes)
– Execute 10s or 100s of instructions per byte

[Figure: the Device Controller interrupts the CPU and transfers one byte of data; the CPU transfers one word of data to Memory.]

Interrupts and DMA transfers 6

DMA (Direct Memory Access)
– Want I/O without CPU intervention
– Want larger than one byte data transfers
– Solution: add a new device that can talk to both I/O devices and memory without the CPU; a “specialized” CPU strictly for data transfers.

[Figure: the CPU, Device Controller, Memory, and DMA Controller are connected together.]

Interrupts and DMA transfers 7

Steps to a DMA transfer
– CPU specifies a memory address, the operation (read/write), byte count, and disk block location to the DMA controller (or specifies another I/O device).
– DMA controller initiates the I/O, and transfers the data to/from memory directly.
– DMA controller interrupts the CPU when the entire block transfer is completed.

Problem
– Conflicts accessing memory. Can either arbitrate access or get a more expensive dual-ported memory system.
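The three DMA steps above can be sketched as a small simulation. This is an illustration only: `DMAController`, `start`, and `demo_dma` are hypothetical names, the disk is modelled as a dictionary of 512-byte blocks, and the transfer runs synchronously, whereas a real DMA controller works in parallel with the CPU and contends with it for memory.

```python
BLOCK_SIZE = 512  # block size used on the slides


class DMAController:
    """Toy DMA controller that moves a block without per-byte CPU work."""

    def __init__(self, memory, disk, on_interrupt):
        self.memory = memory              # bytearray standing in for RAM
        self.disk = disk                  # dict: block number -> bytes
        self.on_interrupt = on_interrupt  # called once per completed transfer

    def start(self, mem_addr, operation, byte_count, block_no):
        # Step 1: the CPU supplies address, operation, count, and block.
        if byte_count > BLOCK_SIZE or mem_addr + byte_count > len(self.memory):
            raise ValueError("transfer does not fit")
        # Step 2: the controller moves the data to/from memory directly.
        if operation == "read":           # disk -> memory
            data = self.disk[block_no][:byte_count]
            self.memory[mem_addr:mem_addr + len(data)] = data
        elif operation == "write":        # memory -> disk
            block = bytearray(self.disk.get(block_no, bytes(BLOCK_SIZE)))
            block[:byte_count] = self.memory[mem_addr:mem_addr + byte_count]
            self.disk[block_no] = bytes(block)
        else:
            raise ValueError("operation must be 'read' or 'write'")
        # Step 3: a single interrupt for the whole block.
        self.on_interrupt()


def demo_dma():
    memory = bytearray(2048)
    disk = {7: bytes(i % 256 for i in range(BLOCK_SIZE))}
    interrupts = []
    dma = DMAController(memory, disk, lambda: interrupts.append("done"))
    dma.start(mem_addr=1024, operation="read",
              byte_count=BLOCK_SIZE, block_no=7)
    copied = memory[1024:1024 + BLOCK_SIZE] == disk[7]
    return copied, len(interrupts)
```

`demo_dma()` reads one 512-byte block and records a single interrupt, in contrast to the one-interrupt-per-byte cost of the interrupt driven scheme.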