Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Computer Systems & Networks I – Computer Systems Roger Webb (13DJ01) [email protected] Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings Introduction History of Computing: Computer:- a person who performs computations Early Computing Devices: (8000BC – 1600) 8000 BC Tally Bones/Sticks 5000 BC – Sumeria – Abaq (writing in dust) 3000 BC – China - the Abacus 800 BC – Abax arrives in Europe 1594 – John Napier natural logs Introduction History of Computing: Mechanical Computers: (1600-1800) 1615 – Slide Rule – William Oughtred 1617 – Napier’s Bones 1642 – Pascaline - Blaise Pascal 1804 – Punch Card Loom – Jacquard Introduction History of Computing: Electro-Mechanical Computers: (1800-1890) 1833 – Difference Engine – Babbage - to make accurate trig and log tables - never actually finished – “of no use to science” 1842 – Analytic Engine (programmable version) (1991 Science Museum spent £400,000 to build working model of simplified Difference Engine – 3tons, 6’ high and 10’ long) 1840 – Lady Ada Lovelace – 1st programmer - suggested that Babbage’s engine could be programmed using punch cards Introduction History of Computing: Mechanical Computers: (1890-1920) 1890 – Holerith’s Tabulating Machine - punch card data for sorting census data 1906 – Lee De Forest invents vacuum tube (triode) 1911 – Holerith founds Tabulating Machines Company 1924 – Renamed International Business Machines Corporation (IBM) by Thomas J. Watson Introduction History of Computing: Electro-Mechanical Computers: (1920-1945) 1920 – Punch Card Technology 1944 – Mark I (Harvard) 1943 – Colossus (Bletchley Park) similar to Babbage engine. IBM: “electro-mechanical computers will never replace punched card-equipment” Introduction History of Computing: Electronic Computers: (1940-1950) 1942- First Electronic Computer - ABC uses base-2 number system, memory and logic circuits built from vacuum tubes IBM: “we will never be interested in an electronic computing machine” 1945: “I think there is a world market for maybe 5 computers” Chairman IBM 1946 – ENIAC (Electronic Numerical Integrator & Computer) to compute trajectory tables for the army (only 20% of bombs got within 1000ft) ENIAC’s 18,000 vacuum tubes dimmed the lights of Philadelphia when switched on Used in building first atomic bomb tests at Los Alamos Calculated pi to 2000 places in seventy hours Operated in decimal ENIAC - details • • • • • • • • Decimal (not binary) 20 accumulators of 10 digits Programmed manually by switches 18,000 vacuum tubes 30 tons 15,000 square feet 140 kW power consumption 5,000 additions per second von Neumann/Turing • • • • Stored Program concept Main memory storing programs and data ALU operating on binary data Control unit interpreting instructions from memory and executing • Input and output equipment operated by control unit • Princeton Institute for Advanced Studies - IAS • Completed 1952 Arithmetic and Logic Unit Input Output Equipment Main Memory Program Control Unit Introduction History of Computing: Electronic Computers: (1940-1950) 1947- First Computer “Bug” moth found in Mark II relay causing malfunction – hence “debugging” 1947 – Transistor Schockley, Bardeen & Brattain paves the way for smaller, low power and more stable systems to be built using same designs but replacing valve technology with transistor technology IAS Computer IAS – Institute of Advanced Study (Princeton) 1952 Basic Von Neuman Architecture: • 1000 x 40 bit words – Binary number – 2 x 20 bit instructions • Set of registers (storage in CPU) – – – – – – – Memory Buffer Register Memory Address Register Instruction Register Instruction Buffer Register Program Counter Accumulator Multiplier Quotient Structure of IAS - detail Central Processing Unit Arithmetic and Logic Unit Accumulator MQ Arithmetic & Logic Circuits MBR Input Output Equipment Instructions Main & Data Memory PC IBR MAR IR Control Circuits Program Control Unit Address Introduction History of Computing: Electronic Computers: (1950-70) 1954 - First commercial silicon transistor Texas Instruments 1958- First Integrated Circuit Jack Kilby from Texas Instruments made simple oscillator 1971 – 1st Microprocessor -Intel Intel (1968) make 4004 processor to do job of 12 chips 1972 – Pioneer 10 uses 4004 chips intel 4004 is first uprocessor to asteroid belt... Introduction History of Computing: 1st Generation Computers: (1950-60) Using Valve Technology 1951 - First Commercial Computer UNIVAC–1 correctly forecast US election in 1951 after only 5% of vote was in 1953- IBM 701 1st attempt from IBM 1954 – IBM 650 2nd attempt made as upgrade of punch card machines estimated total market as 50 – sold 1000... Introduction History of Computing: 2nd Generation Computers: (1960-65) Using transistor technology 1958 – Univac – NTDS 32kwords memory, 10,702 transistors, $500,000, 25kw 1960 – PDP 1 4kwords memory – CRT display first two player computer game – spacewar (1962) 1963 – PDP 8 $18,000 by 1971 25 companies making mini-comp Introduction History of Computing: 3rd Generation Computers: (1965-70) Using Integrated Circuits 1964 – IBM System 360 Upward compatibility - no need to redevelop when upgrading 1966 – Funding for ARPA net The beginnings of the internet 1967 – First Handheld Electronic Calculator Introduction History of Computing: 4th Generation Computers: (1971- ) Using Large Scale Integrated Circuits 1971 – LSI ICs several thousand transistors on a chip 1971 – Intel invent Microprocessor 1980’s– VLSI ICs several tens of thousands of transistors 80486 Pentium 4 Introduction • Multi Processor Systems – the way ahead • Fastest machine on the planet – as of November 2002 – the Earth Simulator in Japan Generations of Computer • Vacuum tube - 1946-1957 • Transistor - 1958-1964 • Small scale integration - 1965 on – Up to 100 devices on a chip • Medium scale integration - to 1971 – 100-3,000 devices on a chip • Large scale integration - 1971-1977 – 3,000 - 100,000 devices on a chip • Very large scale integration - 1978 to date – 100,000 - 100,000,000 devices on a chip • Ultra large scale integration – Over 100,000,000 devices on a chip Moore’s Law • • • • • • • • • Increased density of components on chip Gordon Moore – co-founder of Intel Number of transistors on a chip will double every year Since 1970’s development has slowed a little – Number of transistors doubles every 18 months Cost of a chip has remained almost unchanged Higher packing density means shorter electrical paths, giving higher performance Smaller size gives increased flexibility Reduced power and cooling requirements Fewer interconnections increases reliability Growth in CPU Transistor Count Itanium 2 Pentium 4 Semiconductor Memory • 1970 • Fairchild produce SRAM – Size of a single core • i.e. 1 bit of magnetic core storage – Holds 256 bits – Non-destructive read – 6 transistors – Much faster than core • Intel produce DRAM – Cheaper but slower – 1 transistor • Capacity approximately doubles each year Core Memory DRAM Speeding it up • • • • • • Pipelining On board cache On board L1 & L2 cache Branch prediction Data flow analysis Speculative execution Performance Mismatch • Processor speed increased • Memory capacity increased • Memory speed lags behind processor speed Trends in DRAM use Solutions • Increase number of bits retrieved at one time – Make DRAM “wider” rather than “deeper” • Change DRAM interface – Cache • Reduce frequency of memory access – More complex cache and cache on chip • Increase interconnection bandwidth – High speed buses – Hierarchy of buses Internet Resources • http://www.intel.com/ – Search for the Intel Museum • • • • • • http://www.ibm.com http://www.dec.com http://www.digidome.nl http://ed-thelen.org/comp-hist/ http://www.computerhistory.org/ http://www.cs.man.ac.uk/CCS Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings von Neumann/Turing • • • • Stored Program concept Main memory storing programs and data ALU operating on binary data Control unit interpreting instructions from memory and executing • Input and output equipment operated by control unit • Princeton Institute for Advanced Studies - IAS • Completed 1952 Arithmetic and Logic Unit Input Output Equipment Main Memory Program Control Unit Program Concept • Hardwired systems are inflexible • General purpose hardware can do different tasks, given correct control signals • Instead of re-wiring, supply a new set of control signals What is a program? • A sequence of steps • For each step, an arithmetic or logical operation is done • For each operation, a different set of control signals is needed Function of Control Unit • For each operation a unique code is provided – e.g. ADD, MOVE • A hardware segment accepts the code and issues the control signals • We have a computer! Components • The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit • Data and instructions need to get into the system and results out – Input/output • Temporary storage of code and results is needed – Main memory Top Level View Instruction Cycle • Two steps: – Fetch – Execute Fetch Cycle • Program Counter (PC) holds address of next instruction to fetch • Processor fetches instruction from memory location pointed to by PC • Increment PC Central Processing Unit – Unless told otherwise • Instruction loaded into Instruction Register (IR) • Processor interprets instruction and performs required actions Arithmetic and Logic Unit MQ Accumulator Arithmetic & Logic Circuits MBR Input Output Instructions Main & Data Memory IBR PC MAR IR Control Circuits Program Control Unit Address Execute Cycle • Processor-memory – data transfer between CPU and main memory • Processor I/O – Data transfer between CPU and I/O module • Data processing Central Processing Unit Arithmetic and Logic Unit MQ Accumulator – Some arithmetic or logical operation on data • Control – Alteration of sequence of operations – e.g. jump • Combination of above Arithmetic & Logic Circuits MBR Input Output Instructions Main & Data Memory IBR PC MAR IR Control Circuits Program Control Unit Address Example of Program Execution Connecting • All the units must be connected • Different type of connection for different type of unit – Memory – Input/Output – CPU Memory Connection • Receives and sends data • Receives addresses (of locations) • Receives control signals – Read – Write – Timing Input/Output Connection • Similar to memory from computer’s viewpoint • Output – Receive data from computer – Send data to peripheral • Input – Receive data from peripheral – Send data to computer • Receive control signals from computer • Send control signals to peripherals – e.g. spin disk • Receive addresses from computer – e.g. port number to identify peripheral • Send interrupt signals (control) CPU Connection • • • • Reads instruction and data Writes out data (after processing) Sends control signals to other units Receives (& acts on) interrupts Buses • There are a number of possible interconnection systems • Single and multiple BUS structures are most common • e.g. Control/Address/Data bus (PC) • e.g. Unibus (DEC-PDP) Data Bus • Carries data – Remember that there is no difference between “data” and “instruction” at this level • “Width” is a key determinant of performance – 8, 16, 32, 64 bit – number of parallel lines Address bus • Identify the source or destination of data • e.g. CPU needs to read an instruction (data) from a given location in memory • Bus width determines maximum memory capacity of system – e.g. 8080 has 16 bit address bus giving 64k address space Control Bus • Control and timing information – Memory read/write signal – Interrupt request – Clock signals Bus Interconnection Scheme Big, Red & Double Decker? • What do buses look like? – Parallel lines on circuit boards – Ribbon cables – Strip connectors on mother boards • e.g. PCI – Sets of wires Single Bus Problems • Lots of devices on one bus leads to: – Propagation delays • Long data paths mean that co-ordination of bus use can adversely affect performance • If aggregate data transfer approaches bus capacity • Most systems use multiple buses to overcome these problems Traditional Bus Strtucture High Performance Bus Bus Types • Dedicated – Separate data & address lines • Multiplexed – – – – Shared lines Address valid or data valid control line Advantage - fewer lines Disadvantages • More complex control • Ultimate performance Bus Arbitration • • • • More than one module controlling the bus e.g. CPU and DMA controller Only one module may control bus at one time Arbitration may be centralised or distributed Centralised Arbitration • Single hardware device controlling bus access – Bus Controller or Arbiter • May be part of CPU or separate Distributed Arbitration • Each module may claim the bus • Control logic on all modules Timing • Co-ordination of events on bus • Synchronous – – – – – – Events determined by clock signals Control Bus includes clock line A single 1-0 is a bus cycle All devices can read clock line Usually sync on leading edge Usually a single cycle for an event Synchronous Timing Diagram Asynchronous Timing Diagram Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings Memory Characteristics • • • • • • • • Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organisation Location • CPU – Internal - External • Word size Capacity – The natural unit of organisation • Number of words – or Bytes • Internal Unit of Transfer – Usually governed by data bus width • External – Usually a block which is much larger than a word • Addressable unit – Smallest location which can be uniquely addressed – Word internally Access Methods • Sequential – Start at the beginning and read through in order – Access time depends on location of data and previous location – e.g. tape • Direct – – – – Individual blocks have unique address Access is by jumping to vicinity plus sequential search Access time depends on location and previous location e.g. disk • Random – Individual addresses identify locations exactly – Access time is independent of location or previous access – e.g. RAM • Associative – Data is located by a comparison with contents of a portion of the store – Access time is independent of location or previous access – e.g. cache Memory Hierarchy • Registers – In CPU • Internal or Main memory – May include one or more levels of cache – “RAM” • External memory – Backing store Performance • Access time – Time between presenting the address and getting the valid data • Memory Cycle time – Time may be required for the memory to “recover” before next access – Cycle time is access + recovery • Transfer Rate – Rate at which data can be moved Physical Types • Semiconductor – RAM • Magnetic – Disk & Tape • Optical – CD & DVD • Others – Bubble – Hologram Physical Characteristics • • • • Decay Volatility Erasable Power consumption Organisation • Physical arrangement of bits into words • Not always obvious • e.g. interleaved The Bottom Line • How much? – Capacity • How fast? – Time is money • How expensive? Hierarchy List • • • • • • • • Registers L1 Cache L2 Cache Main memory Disk cache Disk Optical Tape CPU Semiconductor Memory • RAM – Misnamed as all semiconductor memory is random access – Read/Write – Volatile – Temporary storage – Static or dynamic Dynamic RAM • • • • • • • • • Bits stored as charge in capacitors Charges leak Need refreshing even when powered Simpler construction Smaller per bit Less expensive Need refresh circuits Slower Main memory Static RAM • • • • • • • • • Bits stored as on/off switches No charges to leak No refreshing needed when powered More complex construction Larger per bit More expensive Does not need refresh circuits Faster Cache Read Only Memory (ROM) • • • • • Permanent storage Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables Types of ROM • Written during manufacture – Very expensive for small runs • Programmable (once) – PROM – Needs special equipment to program • Read “mostly” – Erasable Programmable (EPROM) • Erased by UV – Electrically Erasable (EEPROM) • Takes much longer to write than read – Flash memory • Erase whole memory electrically Organisation in detail • A 16Mbit chip can be organised as 1M of 16 bit words • A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on • A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array – Reduces number of address pins • Multiplex row address and column address • 11 pins to address (211=2048) • Adding one more pin doubles range of values so x4 capacity Refreshing • • • • • • Refresh circuit included on chip Disable chip Count through rows Read & Write back Takes time Slows down apparent performance Typical 16 Mb DRAM (4M x 4) Module Organisation 1 bit per chip per word Module Organisation (2) Larger Memory Size Cache • Small amount of fast memory • Sits between normal main memory and CPU • May be located on CPU chip or module So you want fast? • It is possible to build a computer which uses only static RAM • This would be very fast • This would need no cache – How can you cache cache? • This would cost a very large amount Cache operation - overview • • • • CPU requests contents of memory location Check cache for this data If present, get from cache (fast) If not present, read required block from main memory to cache • Then deliver from cache to CPU • Cache includes tags to identify which block of main memory is in each cache slot Cache Design • • • • • • Size Mapping Function Replacement Algorithm Write Policy Block Size Number of Caches Size does matter • Cost – More cache is expensive • Speed – More cache is faster (up to a point) – Checking cache for data takes time Locality of Reference • During the course of the execution of a program, memory references tend to cluster • e.g. loops Newer RAM Technology (1) • Basic DRAM same since first RAM chips • Enhanced DRAM – Contains small SRAM as well – SRAM holds last line read (c.f. Cache!) • Cache DRAM – Larger SRAM component – Use as cache or serial buffer Newer RAM Technology (2) • Synchronous DRAM (SDRAM) – – – – – currently on DIMMs Access is synchronized with an external clock Address is presented to RAM RAM finds data (CPU waits in conventional DRAM) Since SDRAM moves data in time with system clock, CPU knows when data will be ready – CPU does not have to wait, it can do something else – Burst mode allows SDRAM to set up stream of data and fire it out in block SDRAM Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings Input/Output Problems • Wide variety of peripherals – Delivering different amounts of data – At different speeds – In different formats • All slower than CPU and RAM • Need I/O modules – Interface to CPU and Memory – Interface to one or more peripherals I/O Module Function • • • • • Control & Timing CPU Communication Device Communication Data Buffering Error Detection Address bus Data bus Control bus I/O Steps • • • • • • CPU checks I/O module device status I/O module returns status If ready, CPU requests data transfer I/O module gets data from device I/O module transfers data to CPU Variations for output, DMA, etc. I/O Module Links to Peripheral Devices I/O Module Decisions • • • • Hide or reveal device properties to CPU Support multiple or single device Control device functions or leave for CPU Also O/S decisions – e.g. Unix treats everything it can as a file Input Output Techniques • Programmed • Interrupt driven • Direct Memory Access (DMA) Programmed I/O • CPU has direct control over I/O – Sensing status – Read/write commands – Transferring data • CPU waits for I/O module to complete operation – Wastes CPU time Programmed I/O - detail • • • • • • • CPU requests I/O operation I/O module performs operation I/O module sets status bits CPU checks status bits periodically I/O module does not inform CPU directly I/O module does not interrupt CPU CPU may wait or come back later I/O Commands • CPU issues address – Identifies module (& device if >1 per module) • CPU issues command – Control - telling module what to do • e.g. spin up disk – Test - check status • e.g. power? Error? – Read/Write • Module transfers data via buffer from/to device Addressing I/O Devices • Under programmed I/O data transfer is very like memory access (CPU viewpoint) • Each device given unique identifier • CPU commands contain identifier (address) I/O Mapping • Memory mapped I/O – Devices and memory share an address space – I/O looks just like memory read/write – No special commands for I/O • Large selection of memory access commands available • Isolated I/O – Separate address spaces – Need I/O or memory select lines – Special commands for I/O • Limited set Interrupt Driven I/O • Overcomes CPU waiting • No repeated CPU checking of device • I/O module interrupts when ready Interrupt Driven I/O - Basic Operation • CPU issues read command • I/O module gets data from peripheral whilst CPU does other work • I/O module interrupts CPU • CPU requests data • I/O module transfers data CPU Viewpoint • • • • Issue read command Do other work Check for interrupt at end of each instruction cycle If interrupted:– Save context (registers) – Process interrupt • Fetch data & store • See Operating Systems notes Design Issues • How do you identify the module issuing the interrupt? • How do you deal with multiple interrupts? – i.e. an interrupt handler being interrupted Identifying Interrupting Module • Different line for each module – Limits number of devices • Software poll – CPU asks each module in turn - Slow • Daisy Chain or Hardware poll – Interrupt Acknowledge sent down a chain – Module responsible places vector on bus – CPU uses vector to identify handler routine • Bus Master – Module must claim the bus before it can raise interrupt – e.g. PCI & SCSI Multiple Interrupts • Each interrupt line has a priority • Higher priority lines can interrupt lower priority lines • If bus mastering only current master can interrupt Direct Memory Access • Interrupt driven and programmed I/O require active CPU intervention – Transfer rate is limited – CPU is tied up • DMA is the answer DMA Function • Additional Module (hardware) on bus • DMA controller takes over from CPU for I/O DMA Operation • CPU tells DMA controller:– – – – Read/Write Device address Starting address of memory block for data Amount of data to be transferred • CPU carries on with other work • DMA controller deals with transfer • DMA controller sends interrupt when finished DMA Transfer - Cycle Stealing • DMA controller takes over bus for a cycle • Transfer of one word of data • Not an interrupt – CPU does not switch context • CPU suspended just before it accesses bus – i.e. before an operand or data fetch or a data write • Slows down CPU but not as much as CPU doing transfer DMA Configurations (1) CPU DMA Controller I/O Device I/O Device • Single Bus, Detached DMA controller • Each transfer uses bus twice – I/O to DMA then DMA to memory • CPU is suspended twice Main Memory DMA Configurations (2) CPU DMA Controller I/O Device I/O Device DMA Controller I/O Device • Single Bus, Integrated DMA controller • Controller may support >1 device • Each transfer uses bus once – DMA to memory • CPU is suspended once Main Memory Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings O/S Objectives and Functions • Convenience – Making the computer easier to use • Efficiency – Allowing better use of computer resources Layers Operating System Services • • • • • • • Program creation Program execution Access to I/O devices Controlled access to files System access Error detection and response Accounting Types of Operating System • • • • Interactive Batch Single program (Uni-programming) Multi-programming (Multi-tasking) Early Systems • Late 1940s to mid 1950s – No Operating System – Programs interact directly with hardware – Two main problems: – Scheduling – Setup time Simple Batch Systems • • • • • Resident Monitor program Users submit jobs to operator Operator batches jobs Monitor controls sequence of events to process batch When one job is finished, control returns to Monitor which reads next job • Monitor handles scheduling Desirable Hardware Features • Memory protection – To protect the Monitor (not screen saver!) • Timer – To prevent a job monopolizing the system • Privileged instructions – Only executed by Monitor – e.g. I/O • Interrupts – Allows for relinquishing and regaining control Multi-programmed Batch Systems • I/O devices very slow • When one program is waiting for I/O, another can use the CPU Single Program Multi-Programming with 2 Programs Multi-Programming with 3 Programs Time Sharing Systems • Allow users to interact directly with the computer – i.e. Interactive • Multi-programming allows a number of users to interact with the computer Process States Process Control Block • • • • • • • • Identifier PID State Running, Ready, Blocked, New, Exit Priority High, Medium, Low, etc…. Program Counter Where were we? Memory pointers Preserve our calculations Context data What was in the registers I/O status Reading, Writing, Null Accounting information How Much Used Already Memory Management • Uni-program – Memory split into two – One for Operating System (monitor) – One for currently executing program • Multi-program – “User” part is sub-divided and shared among active processes Swapping • Problem: I/O is so slow compared with CPU that even in multi-programming system, CPU can be idle most of the time • Solutions: – Increase main memory • Expensive • Leads to larger programs – Swapping What is Swapping? • Long term queue of processes stored on disk • Processes “swapped” in as space becomes available • As a process completes it is moved out of main memory • If none of the processes in memory are ready (i.e. all I/O blocked) – Swap out a blocked process to intermediate queue – Swap in a ready process or a new process – But swapping is an I/O process... Partitioning • Splitting memory into sections to allocate to processes (including Operating System) • Fixed-sized partitions – May not be equal size – Process is fitted into smallest hole that will take it (best fit) – Some wasted memory – Leads to variable sized partitions Variable Sized Partitions (1) • Allocate exactly the required memory to a process • This leads to a hole at the end of memory, too small to use – Only one small hole - less waste • When all processes are blocked, swap out a process and bring in another • New process may be smaller than swapped out process • Another hole Variable Sized Partitions (2) • Eventually have lots of holes (fragmentation) • Solutions: – Coalesce - Join adjacent holes into one large hole – Compaction - From time to time go through memory and move all hole into one free block (c.f. disk defragmentation) Effect of Dynamic Partitioning Relocation • No guarantee that process will load into the same place in memory • Instructions contain addresses – Locations of data – Addresses for instructions (branching) • Logical address - relative to beginning of program • Physical address - actual location in memory (this time) • Automatic conversion using base address Paging • Split memory into equal sized, small chunks -page frames • Split programs (processes) into equal sized small chunks - pages • Allocate the required number page frames to a process • Operating System maintains list of free frames • A process does not require contiguous page frames • Use page table to keep track Logical and Physical Addresses - Paging Virtual Memory • Demand paging – Do not require all pages of a process in memory – Bring in pages as required • Page fault – – – – Required page is not in memory Operating System must swap in required page May need to swap out a page to make space Select page to throw out based on recent history Thrashing • • • • Too many processes in too little memory Operating System spends all its time swapping Little or no real work is done Disk light is on all the time • Solutions – Good page replacement algorithms – Reduce number of processes running – Fit more memory Bonus • We do not need all of a process in memory for it to run • We can swap in pages as required • So - we can now run processes that are bigger than total memory available! • Main memory is called real memory • User/programmer sees much bigger memory - virtual memory Page Table Structure Segmentation • Paging is not (usually) visible to the programmer • Segmentation is visible to the programmer • Usually different segments allocated to program and data • May be a number of program and data segments Advantages of Segmentation • Simplifies handling of growing data structures • Allows programs to be altered and recompiled independently, without re-linking and re-loading • Lends itself to sharing among processes • Lends itself to protection • Some systems combine segmentation with paging Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings Arithmetic & Logic Unit • Does the calculations • Everything else in the computer is there to service this unit • Handles integers • May handle floating point (real) numbers • May be separate FPU (maths co-processor) • May be on chip separate FPU (486DX +) Integer Representation • Only have 0 & 1 to represent everything • Positive numbers stored in binary – e.g. 41=00101001 • No minus sign • No decimal point • Instead we can use either: – Sign-Magnitude – Two’s compliment Sign-Magnitude • • • • • • Left most bit is sign bit 0 means positive 1 means negative +18 = 00010010 -18 = 10010010 Problems – Need to consider both sign and magnitude in arithmetic – Two representations of zero (+0 and -0) Two’s Compliment • • • • • • • +3 = 00000011 +2 = 00000010 +1 = 00000001 +0 = 00000000 -1 = 11111111 -2 = 11111110 -3 = 11111101 Benefits • One representation of zero • Arithmetic works easily (see later) • Negating is fairly easy – 3 = 00000011 – Boolean complement gives – Add 1 to LSB 11111100 11111101 Geometric Depiction of Twos Complement Integers Negation Special Case 1 • • • • • • 0= 00000000 Bitwise not 11111111 Add 1 to LSB +1 Result 1 00000000 Overflow is ignored, so: -0=0 Negation Special Case 2 • • • • • • • • • -128 = 10000000 bitwise not 01111111 Add 1 to LSB +1 Result 10000000 So: -(-128) = -128 X Monitor MSB (sign bit) It should change during negation In this case 128 is too big to be represented Range of Numbers • 8 bit 2s compliment – +127 = 01111111 = 27 -1 – -128 = 10000000 = -27 • 16 bit 2s compliment – +32767 = 011111111 11111111 = 215 - 1 – -32768 = 100000000 00000000 = -215 Conversion Between Lengths • • • • • • • Positive number pack with leading zeros +18 = 00010010 +18 = 00000000 00010010 Negative numbers pack with leading ones -18 = 10010010 -18 = 11111111 10010010 i.e. pack with MSB (sign bit) Addition and Subtraction • Normal binary addition • Monitor sign bit for overflow • Take twos compliment of subtrahend and add to minuend – i.e. a - b = a + (-b) • So we only need addition and complement circuits Hardware for Addition and Subtraction Multiplication • • • • Complex Work out partial product for each digit Take care with place value (column) Add partial products Multiplication Example 1011 Multiplicand (11 dec) x 1101 Multiplier (13 dec) 1011 Partial products 0000 Note: if multiplier bit is 1 copy 1011 multiplicand (place value) 1011 otherwise zero 10001111 Product (143 dec) Note: need double length result Unsigned Binary Multiplication 1011 0000 Initial Values 1101 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 First Cycle 1011 0000 1101 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 First Cycle 1011 1011 1101 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 First Cycle 1011 0101 1110 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Second Cycle 0000 0101 1110 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Second Cycle 0000 0101 1110 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Second Cycle 0000 0010 1111 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Third Cycle 1011 0010 1111 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Third Cycle 1011 1101 1111 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Third Cycle 1011 0110 1111 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Fourth Cycle 1011 0110 1111 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Fourth Cycle 1011 1 0001 1111 If this digit is a one then add multiplicand Shift A and Q right Unsigned Binary Multiplication 1011 Fourth Cycle 1011 1000 1111 If this digit is a one then add multiplicand Shift A and Q right Flowchart for Unsigned Binary Multiplication Multiplying Negative Numbers • This does not work! • Solution 1 – Convert to positive if required – Multiply as above – If signs were different, negate answer • Solution 2 – Booth’s algorithm Example of Booth’s Algorithm •Multiplier and multiplicand placed in Q and M •Additional 1 bit register Q-1 •Results appear in A and Q •A and Q initialised to 0 •Q0 and Q-1 examined •If both same (0-0 or 1-1) Shift A and Q right If different If 0-1 then Add M If 1-0 then Subtract M Shift A and Q right •Shift preserves the sign An-1 An-2 and remains in An-1 Division • More complex than multiplication • Negative numbers are really bad! • Based on long division 00001101 Quotient 1011 10010011 1011 001110 Partial 1011 Remainders 001111 1011 100 Dividend Divisor Remainder Real Numbers • Numbers with fractions • Could be done in pure binary – 1001.1010 = 24 + 20 +2-1 + 2-3 =9.625 • Where is the binary point? • Fixed? – Very limited • Moving? – How do you show where it is? Sign bit Floating Point Biased Exponent Significand or Mantissa • +/- .significand x 2exponent • Misnomer – not really a floating point • Point is actually fixed between sign bit and body of mantissa • Exponent indicates place value (point position) Floating Point Examples • Mantissa is stored in 2s compliment • Exponent is in excess or biased notation – e.g. Excess (bias) 128 means – 8 bit exponent field – Pure value range 0-255 – Subtract 128 to get correct value – Range -128 to +127 FP Ranges • For a 32 bit number – 8 bit exponent – +/- 2256 1.5 x 1077 • Accuracy – The effect of changing lsb of mantissa – 23 bit mantissa 2-23 1.2 x 10-7 – About 6 decimal places Expressible Numbers IEEE 754 • • • • Standard for floating point storage 32 and 64 bit standards 8 and 11 bit exponent respectively Extended formats (both mantissa and exponent) for intermediate results FP Arithmetic +/• • • • Check for zeros Align significands (adjusting exponents) Add or subtract significands Normalize FP Arithmetic x/ • • • • • • Check for zero Add/subtract exponents Multiply/divide significands (watch sign) Normalize Round All intermediate results should be in double length storage Floating Point Multiplication Floating Point Division Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings What is an instruction set? • The complete collection of instructions that are understood by a CPU • Machine Code – set of binary digits • Usually represented by assembly codes Elements of an Instruction • Operation code (Op code) – Do this • Source Operand reference – To this • Result Operand reference – Put the answer here • Next Instruction Reference – When you have done that, do this... Operands – Where From? • Main memory (or virtual memory or cache) • CPU register • I/O device Instruction Representation • In machine code each instruction has a unique bit pattern • For human consumption (well, programmers anyway) a symbolic representation is used – e.g. ADD, SUB, LOAD • Operands can also be represented in this way – ADD A,B Instruction Types • • • • Data processing Data storage (main memory) Data movement (I/O) Program flow control Number of Addresses (a) • 3 addresses – – – – – Operand 1, Operand 2, Result a = b + c; May be a fourth - next instruction (usually implicit) Not common Needs very long “words” to hold everything Number of Addresses (b) • 2 addresses – – – – One address doubles as operand and result a=a+b Reduces length of instruction Requires some extra work • Temporary storage to hold some results Number of Addresses (c) • 1 address – Implicit second address – Usually a register (accumulator) – Common on early machines Number of Addresses (d) • 0 (zero) addresses – All addresses implicit – Uses a stack – e.g. push a – push b – add – pop c – c=a+b How Many Addresses is good? • More addresses – More complex (powerful?) instructions – More registers • Inter-register operations are quicker – Fewer instructions per program • Fewer addresses – Less complex (powerful?) instructions – More instructions per program – Faster fetch/execution of instructions Design Decisions (1) • Operation repertoire – How many operations? – What can they do? – How complex are they? • Data types • Instruction formats – Length of op code field – Number of addresses/instruction • Registers – Number of CPU registers available – Which operations can be performed on which registers? • Addressing modes (later…) • RISC v CISC - whole lecture on its own Types of Operand • Addresses • Numbers – Integer/floating point • Characters – ASCII etc. • Logical Data – Bits or flags • (Aside: Is there any difference between numbers and characters? ) Specific Data Types • • • • • • • • • • General - arbitrary binary contents Integer - single binary value Ordinal - unsigned integer Unpacked BCD - One digit per byte Packed BCD - 2 BCD digits per byte Near Pointer - 32 bit offset within segment Bit field Byte String Floating Point ………… Types of Operation • • • • • • • Data Transfer Arithmetic Logical Conversion I/O System Control Transfer of Control Data Transfer • Specify – Source – Destination – Amount of data • May be different instructions for different movements – e.g. IBM 370 • • • • L LH LE Etc… ; 32 bits from memory to register ; 16 bits from memory to register ; 32 bits from memory to fp register • Or one instruction and different addresses – e.g. VAX • LD ;16 bits from anywhere to anywhere Arithmetic • • • • Add, Subtract, Multiply, Divide Signed Integer Floating point ? May include – Increment (a++) – Decrement (a--) – Negate (-a) Logical • Bitwise operations • AND, OR, NOT Conversion • E.g. Binary to Decimal Input/Output • May be specific instructions • May be done using data movement instructions (memory mapped) • May be done by a separate controller (DMA) Systems Control • Privileged instructions • CPU needs to be in specific state • For operating systems or compiler use Transfer of Control • Branch – e.g. branch to x if result is zero • Skip – – – – e.g. increment and skip if zero ISZ Register1 Branch xxxx ADD A • Subroutine call – c.f. interrupt call Byte Order • What order do we read numbers that occupy more than one byte? • e.g. (numbers in hex to make it easy to read) • 12345678 can be stored in 4x8bit locations as follows • • • • • Address 184 185 186 187 Value (1) 12 34 56 78 Value(2) 78 56 34 12 • i.e. read top down or bottom up? Byte Order Names • The problem is called Endian • The system on the left has the least significant byte in the lowest address 12 34 56 78 184 185 186 187 • This is called big-endian • The system on the right has the least significant byte in the highest address 12 34 56 78 187 186 185 184 • This is called little-endian Standard…What Standard? • Pentium (80x86), VAX are little-endian • IBM 370, Moterola 680x0 (Mac), and most RISC are big-endian • Internet is big-endian – Makes writing Internet programs on PC more awkward! – WinSock provides htoi and itoh (Host to Internet & Internet to Host) functions to convert Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings Addressing Modes • • • • • • • Immediate Direct Indirect Register Register Indirect Displacement (Indexed) Stack Immediate Addressing • Operand is part of instruction • Operand = address field • e.g. ADD 5 – Add 5 to contents of accumulator – 5 is operand • No memory reference to fetch data • Fast • Limited range Instruction Opcode Operand Direct Addressing • Address field contains address of operand • Effective address (EA) = address field (A) • e.g. ADD A – Add contents of cell A to accumulator – Look in memory at address A for operand • Single memory reference to access data • No additional calculations to work out effective address Instruction • Limited address space Opcode Address A Memory Operand Indirect Addressing (1) • Memory cell pointed to by address field contains the address of (pointer to) the operand • EA = (A) – Look in A, find address (A) and look there for operand • e.g. ADD (A) – Add contents of cell pointed to by contents of A to accumulator Opcode Instruction Address A Memory Pointer to operand Operand Indirect Addressing (2) • Large address space • 2n where n = word length • May be nested, multilevel, cascaded – e.g. EA = (((A))) • Draw the diagram yourself • Multiple memory accesses to find operand • Hence slower Register Addressing (1) • • • • Operand is held in register named in address filed EA = R Limited number of registers Very small address field needed – Shorter instructions – Faster instruction fetch Opcode Instruction Register Address R Registers Operand Register Addressing (2) • • • • No memory access Very fast execution Very limited address space Multiple registers helps performance – Requires good assembly programming or compiler writing • c.f. Direct addressing Register Indirect Addressing • C.f. indirect addressing • EA = (R) • Operand is in memory cell pointed to by contents of register R • Large address space (2n) • One fewer memory access than indirect addressing Opcode Instruction Register Address R Memory Registers Pointer to Operand Operand Displacement Addressing • EA = A + (R) • Address field hold two values – A = base value – R = register that holds displacement – or vice versa Instruction OpcodeRegister R Address A Memory Registers Pointer to Operand + Operand Relative Addressing • • • • A version of displacement addressing R = Program counter, PC EA = A + (PC) i.e. get operand from A cells from current location pointed to by PC • c.f locality of reference & cache usage Base-Register Addressing • • • • A holds displacement R holds pointer to base address R may be explicit or implicit e.g. segment registers in 80x86 Indexed Addressing • • • • A = base R = displacement EA = A + R Good for accessing arrays – EA = A + R – R++ Combinations • Postindex • EA = (A) + (R) • Preindex • EA = (A+(R)) • (Draw the diagrams) Stack Addressing • Operand is (implicitly) on top of stack • e.g. – ADD Pop top two items from stack and add Instruction Formats • • • • Layout of bits in an instruction Includes opcode Includes (implicit or explicit) operand(s) Usually more than one instruction format in an instruction set Instruction Length • Affected by and affects: – – – – – Memory size Memory organization Bus structure CPU complexity CPU speed • Trade off between powerful instruction repertoire and saving space Allocation of Bits • • • • • • Number of addressing modes Number of operands Register versus memory Number of register sets Address range Address granularity Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings CPU Structure • CPU must: – – – – – Fetch instructions Interpret instructions Fetch data Process data Write data Registers • • • • • CPU must have some working space (temporary storage) Called registers Number and function vary between processor designs One of the major design decisions Top level of memory hierarchy User Visible Registers • • • • General Purpose Data Address Condition Codes General Purpose Registers • • • • May be true general purpose May be restricted May be used for data or addressing Data – Accumulator • Addressing – Segment • Make them general purpose – Increase flexibility and programmer options – Increase instruction size & complexity • Make them specialized – Smaller (faster) instructions – Less flexibility How Many GP Registers? • Between 8 - 32 • Fewer = more memory references • More does not reduce memory references and takes up processor real estate How big? • Large enough to hold full address • Large enough to hold full word • Often possible to combine two data registers – C programming – long int a; Condition Code Registers • Sets of individual bits – e.g. result of last operation was zero • Can be read (implicitly) by programs – e.g. Jump if zero • Can not (usually) be set by programs Control & Status Registers • • • • Program Counter Instruction Decoding Register Memory Address Register Memory Buffer Register • Revision: what do these all do? Program Status Word • A set of bits - Includes Condition Codes – – – – – – Sign - of last result Zero – set when result of last op was zero Carry – set if last op resulted in a carry Equal – set if a logical compare was true Overflow – set to indicate arithmetic overflow Interrupt enable/disable • Supervisor Mode • • • • • Intel ring zero Kernel mode Allows privileged instructions to execute Used by operating system Not available to user programs Other Registers • May have registers pointing to: – Process control blocks (see O/S) – Interrupt Vectors (see O/S) • N.B. CPU design and operating system design are closely linked Example Register Organizations Instruction Cycle Indirect Cycle • May require memory access to fetch operands • Indirect addressing requires more memory accesses • Can be thought of as additional instruction subcycle Instruction Cycle with Indirect Instruction Cycle State Diagram Data Flow (Instruction Fetch) • Depends on CPU design – in general: – Fetch • • • • • • PC contains address of next instruction Address moved to MAR Address placed on address bus Control unit requests memory read Result placed on data bus, copied to MBR, then to IR Meanwhile PC incremented by 1 Data Flow (Data Fetch) • IR is examined • If indirect addressing, indirect cycle is performed – Right most N bits of MBR transferred to MAR – Control unit requests memory read – Result (address of operand) moved to MBR Data Flow (Execute) • May take many forms • Depends on instruction being executed • May include – – – – Memory read/write Input/Output Register transfers ALU operations Data Flow (Interrupt) – Current PC saved to allow resumption after interrupt – Contents of PC copied to MBR – Special memory location (e.g. stack pointer) loaded to MAR – MBR written to memory – PC loaded with address of interrupt handling routine – Next instruction (first of interrupt handler) can be fetched Prefetch • Fetch accessing main memory • Execution usually does not access main memory • Can fetch next instruction during execution of current instruction • Called instruction prefetch Improved Performance • But not doubled: – Fetch usually shorter than execution • Prefetch more than one instruction? – Any jump or branch means that prefetched instructions are not the required instructions • Add more stages to improve performance Pipelining • • • • • • Fetch instruction Decode instruction Calculate operands (i.e. EAs) Fetch operands Execute instructions Write result • Overlap these operations Timing of Pipeline Branch in a Pipeline Dealing with Branches • • • • • Multiple Streams Prefetch Branch Target Loop buffer Branch prediction Delayed branching Multiple Streams • Have two pipelines • Prefetch each branch into a separate pipeline • Use appropriate pipeline • Leads to bus & register contention • Multiple branches lead to further pipelines being needed Prefetch Branch Target • Target of branch is prefetched in addition to instructions following branch • Keep target until branch is executed • Used by IBM 360/91 Loop Buffer • • • • • • Very fast memory Maintained by fetch stage of pipeline Check buffer before fetching from memory Very good for small loops or jumps c.f. cache Used by CRAY-1 Branch Prediction • Predict never taken – Assume that jump will not happen – Always fetch next instruction • Predict always taken – Assume that jump will happen – Always fetch target instruction • Predict by Opcode – Some instructions more likely to result in a jump – Can get up to 75% success • Taken/Not taken switch – Based on previous history – Good for loops • Delayed Branch – Do not take jump until you have to – Rearrange instructions Branch Prediction State Diagram Syllabus 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. Introduction .....................................................................chapter 2 Computer Interconnection Structures .....................chapter 3 Internal & External Memory ........................................chapter 4 Input & Output ................................................................chapter 6 Operating Systems – part 1 ........................................chapter 7 Operating Systems – part 2 Computer Arithmetic – part 1 .....................................chapter 8 Computer Arithmetic – part 2 Instruction Sets: Characteristics & Function ........chapter 9 Instruction Sets: Addressing Modes & Formats .chapter 10 CPU Structure & Function – part 1 .........................chapter 11 CPU Structure & Function – part 2 Control Unit Operation ..............................................chapter 14 Computer Organisation & Architecture – William Stallings Micro-Operations • A computer executes a program • Fetch/execute cycle • Each cycle has a number of steps – see pipelining • Called micro-operations • Each step does very little • Atomic operation of CPU Constituent Elements of Program Execution Fetch - 4 Registers • Memory Address Register (MAR) – Connected to address bus – Specifies address for read or write op • Memory Buffer Register (MBR) – Connected to data bus – Holds data to write or last data read • Program Counter (PC) – Holds address of next instruction to be fetched • Instruction Register (IR) – Holds last instruction fetched Fetch Sequence • • • • • • • • Address of next instruction is in PC Address (MAR) is placed on address bus Control unit issues READ command Result (data from memory) appears on data bus Data from data bus copied into MBR PC incremented by 1 (in parallel with data fetch from memory) Data (instruction) moved from MBR to IR MBR is now free for further data fetches Fetch Sequence (symbolic) t1: t2: t3: MAR <- (PC) MBR <- (memory) PC <- (PC) +1 IR <- (MBR) (tx = time unit/clock cycle) • Or (conceptually identical) t1: t2: t3: MAR <- (PC) MBR <- (memory) PC <- (PC) +1 IR <- (MBR) Rules for Clock Cycle Grouping • Proper sequence must be followed – MAR <- (PC) must precede MBR <- (memory) • Conflicts must be avoided – Must not read & write same register at same time – MBR <- (memory) & IR <- (MBR) must not be in same cycle • Also: PC <- (PC) +1 involves addition – Use ALU – May need additional micro-operations Indirect Address Cycle MAR <- (IRaddress) - address field of IR MBR <- (memory) IRaddress <- (MBRaddress) • MBR contains an address • IR is now in same state as if direct addressing had been used Interrupt Cycle t1: t2: t3: MBR <-(PC) MAR <- save-address PC <- routine-address memory <- (MBR) • This is a minimum – May be additional micro-ops to get addresses – N.B. saving context is done by interrupt handler routine, not micro-ops Execute Cycle (ADD) • Different for each instruction • e.g. ADD R1,X - add the contents of location X to Register 1 , result in R1 t1: t2: t3: MAR <- (IRaddress) MBR <- (memory) R1 <- R1 + (MBR) • Note no overlap of micro-operations Execute Cycle (ISZ) • ISZ X - increment and skip if zero t1: t2: t3: t4: MAR <- (IRaddress) MBR <- (memory) MBR <- (MBR) + 1 memory <- (MBR) if (MBR) == 0 then PC <- (PC) + 1 • Notes: – if is a single micro-operation – Micro-operations done during t4 Execute Cycle (BSA) • BSA X - Branch and save address – Address of instruction following BSA is saved in X – Execution continues from X+1 t1: t2: t3: MAR <- (IRaddress) MBR <- (PC) PC <- (IRaddress) memory <- (MBR) PC <- (PC) + 1 Functional Requirements • Define basic elements of processor • Describe micro-operations processor performs • Determine functions control unit must perform • • • • • Basic Elements of Processor ALU Registers Internal data paths External data paths Control Unit Types of Micro-operation • • • • Transfer data between registers Transfer data from register to external Transfer data from external to register Perform arithmetic or logical ops Functions of Control Unit • Sequencing – Causing the CPU to step through a series of microoperations • Execution – Causing the performance of each micro-op • This is done using Control Signals Control Signals • Clock – One micro-instruction (or set of parallel micro-instructions) per clock cycle • Instruction register – Op-code for current instruction – Determines which micro-instructions are performed • Flags – State of CPU – Results of previous operations • From control bus – Interrupts – Acknowledgements Control Signals - output • Within CPU – Cause data movement – Activate specific functions • Via control bus – To memory – To I/O modules Example Control Signal Sequence - Fetch • MAR <- (PC) – Control unit activates signal to open gates between PC and MAR • MBR <- (memory) – Open gates between MAR and address bus – Memory read control signal – Open gates between data bus and MBR Internal Organization • Usually a single internal bus • Gates control movement of data onto and off the bus • Control signals control data transfer to and from external systems bus • Temporary registers needed for proper operation of ALU Hardwired Implementation (1) • Control unit inputs • Flags and control bus – Each bit means something • Instruction register – Op-code causes different control signals for each different instruction – Unique logic for each op-code – Decoder takes encoded input and produces single output – n binary inputs and 2n outputs Hardwired Implementation (2) • Clock – – – – Repetitive sequence of pulses Useful for measuring duration of micro-ops Must be long enough to allow signal propagation Different control signals at different times within instruction cycle – Need a counter with different control signals for t1, t2 etc. Problems With Hard Wired Designs • • • • Complex sequencing & micro-operation logic Difficult to design and test Inflexible design Difficult to add new instructions • Better to use a microprogrammed controller……