Download Computer Systems & Networks

Document related concepts
no text concepts found
Transcript
Computer Systems &
Networks
I – Computer Systems
Roger Webb (13DJ01)
[email protected]
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
Introduction
History of Computing:
Computer:- a person who performs computations
Early Computing Devices: (8000BC – 1600)
8000 BC Tally Bones/Sticks
5000 BC – Sumeria – Abaq (writing in dust)
3000 BC – China - the Abacus
800 BC – Abax arrives in Europe
1594 – John Napier natural logs
Introduction
History of Computing:
Mechanical Computers: (1600-1800)
1615 – Slide Rule – William Oughtred
1617 – Napier’s Bones
1642 – Pascaline - Blaise Pascal
1804 – Punch Card Loom
– Jacquard
Introduction
History of Computing:
Electro-Mechanical Computers:
(1800-1890)
1833 – Difference Engine – Babbage
- to make accurate trig and log tables
- never actually finished – “of no use to
science”
1842 – Analytic Engine (programmable
version)
(1991 Science Museum spent £400,000 to build
working model of simplified Difference Engine
– 3tons, 6’ high and 10’ long)
1840 – Lady Ada Lovelace
– 1st programmer
- suggested that Babbage’s engine could
be programmed using punch cards
Introduction
History of Computing:
Mechanical Computers: (1890-1920)
1890 – Holerith’s Tabulating Machine
- punch card data for sorting census data
1906 – Lee De Forest invents vacuum tube (triode)
1911 – Holerith founds Tabulating Machines Company
1924 – Renamed International Business Machines
Corporation (IBM) by Thomas J. Watson
Introduction
History of Computing:
Electro-Mechanical Computers:
(1920-1945)
1920 – Punch Card Technology
1944 – Mark I (Harvard)
1943 – Colossus (Bletchley Park)
similar to Babbage engine.
IBM: “electro-mechanical computers will never
replace punched card-equipment”
Introduction
History of Computing:
Electronic Computers: (1940-1950)
1942- First Electronic Computer - ABC
uses base-2 number system, memory and logic
circuits built from vacuum tubes
IBM: “we will never be interested in an electronic
computing machine”
1945: “I think there is a world market for maybe 5
computers” Chairman IBM
1946 – ENIAC (Electronic Numerical Integrator &
Computer)
to compute trajectory tables for the army (only 20%
of bombs got within 1000ft)
ENIAC’s 18,000 vacuum tubes dimmed the lights
of Philadelphia when switched on
Used in building first atomic bomb tests at Los
Alamos
Calculated pi to 2000 places in seventy hours
Operated in decimal
ENIAC - details
•
•
•
•
•
•
•
•
Decimal (not binary)
20 accumulators of 10 digits
Programmed manually by switches
18,000 vacuum tubes
30 tons
15,000 square feet
140 kW power consumption
5,000 additions per second
von Neumann/Turing
•
•
•
•
Stored Program concept
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory
and executing
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Studies - IAS
• Completed 1952
Arithmetic and Logic Unit
Input
Output
Equipment
Main
Memory
Program Control Unit
Introduction
History of Computing:
Electronic Computers: (1940-1950)
1947- First Computer “Bug”
moth found in Mark II relay
causing malfunction – hence “debugging”
1947 – Transistor
Schockley, Bardeen & Brattain
paves the way for smaller, low power
and more stable systems to be built using same
designs but replacing valve technology with
transistor technology
IAS Computer
IAS – Institute of Advanced Study (Princeton) 1952
Basic Von Neuman Architecture:
• 1000 x 40 bit words
– Binary number
– 2 x 20 bit instructions
• Set of registers (storage in CPU)
–
–
–
–
–
–
–
Memory Buffer Register
Memory Address Register
Instruction Register
Instruction Buffer Register
Program Counter
Accumulator
Multiplier Quotient
Structure of IAS - detail
Central Processing Unit
Arithmetic and Logic Unit
Accumulator
MQ
Arithmetic & Logic Circuits
MBR
Input
Output
Equipment
Instructions
Main
& Data
Memory
PC
IBR
MAR
IR
Control
Circuits
Program Control Unit
Address
Introduction
History of Computing:
Electronic Computers: (1950-70)
1954 - First commercial silicon transistor
Texas Instruments
1958- First Integrated Circuit
Jack Kilby from Texas Instruments
made simple oscillator
1971 – 1st Microprocessor -Intel
Intel (1968) make 4004 processor to do job of 12
chips
1972 – Pioneer 10 uses 4004 chips
intel 4004 is first uprocessor to asteroid belt...
Introduction
History of Computing:
1st Generation Computers: (1950-60)
Using Valve Technology
1951 - First Commercial Computer
UNIVAC–1
correctly forecast US election in 1951 after
only 5% of vote was in
1953- IBM 701
1st attempt from IBM
1954 – IBM 650
2nd attempt made as upgrade of punch card
machines estimated total market as 50 – sold
1000...
Introduction
History of Computing:
2nd Generation Computers: (1960-65)
Using transistor technology
1958 – Univac – NTDS
32kwords memory, 10,702 transistors, $500,000, 25kw
1960 – PDP 1
4kwords memory – CRT display
first two player computer game – spacewar (1962)
1963 – PDP 8
$18,000 by 1971 25 companies making mini-comp
Introduction
History of Computing:
3rd Generation Computers: (1965-70)
Using Integrated Circuits
1964 – IBM System 360
Upward compatibility - no need to redevelop
when upgrading
1966 – Funding for ARPA net
The beginnings of the internet
1967 – First Handheld Electronic Calculator
Introduction
History of Computing:
4th Generation Computers: (1971- )
Using Large Scale Integrated Circuits
1971 – LSI ICs
several thousand transistors on a chip
1971 – Intel invent Microprocessor
1980’s– VLSI ICs
several tens of thousands of transistors
80486
Pentium 4
Introduction
• Multi Processor Systems – the way ahead
• Fastest machine on the planet – as of November
2002 – the Earth Simulator in Japan
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
– Up to 100 devices on a chip
• Medium scale integration - to 1971
– 100-3,000 devices on a chip
• Large scale integration - 1971-1977
– 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 to date
– 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration
– Over 100,000,000 devices on a chip
Moore’s Law
•
•
•
•
•
•
•
•
•
Increased density of components on chip
Gordon Moore – co-founder of Intel
Number of transistors on a chip will double every year
Since 1970’s development has slowed a little
– Number of transistors doubles every 18 months
Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths,
giving higher performance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increases reliability
Growth in CPU Transistor
Count
Itanium 2
Pentium 4
Semiconductor Memory
• 1970
• Fairchild produce SRAM
– Size of a single core
• i.e. 1 bit of magnetic core storage
– Holds 256 bits
– Non-destructive read – 6 transistors
– Much faster than core
• Intel produce DRAM
– Cheaper but slower – 1 transistor
• Capacity approximately doubles each year
Core
Memory
DRAM
Speeding it up
•
•
•
•
•
•
Pipelining
On board cache
On board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution
Performance Mismatch
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed
Trends in DRAM use
Solutions
• Increase number of bits retrieved at one time
– Make DRAM “wider” rather than “deeper”
• Change DRAM interface
– Cache
• Reduce frequency of memory access
– More complex cache and cache on chip
• Increase interconnection bandwidth
– High speed buses
– Hierarchy of buses
Internet Resources
• http://www.intel.com/
– Search for the Intel Museum
•
•
•
•
•
•
http://www.ibm.com
http://www.dec.com
http://www.digidome.nl
http://ed-thelen.org/comp-hist/
http://www.computerhistory.org/
http://www.cs.man.ac.uk/CCS
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
von Neumann/Turing
•
•
•
•
Stored Program concept
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory
and executing
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Studies - IAS
• Completed 1952
Arithmetic and Logic Unit
Input
Output
Equipment
Main
Memory
Program Control Unit
Program Concept
• Hardwired systems are inflexible
• General purpose hardware can do different tasks, given correct
control signals
• Instead of re-wiring, supply a new set of control signals
What is a program?
• A sequence of steps
• For each step, an arithmetic or logical operation is done
• For each operation, a different set of control signals is needed
Function of Control Unit
• For each operation a unique code is provided
– e.g. ADD, MOVE
• A hardware segment accepts the code and issues the control
signals
• We have a computer!
Components
• The Control Unit and the
Arithmetic and Logic Unit
constitute the Central
Processing Unit
• Data and instructions need
to get into the system and
results out
– Input/output
• Temporary storage of code
and results is needed
– Main memory
Top Level View
Instruction Cycle
• Two steps:
– Fetch
– Execute
Fetch Cycle
• Program Counter (PC) holds address of next
instruction to fetch
• Processor fetches instruction from memory location
pointed to by PC
• Increment PC
Central Processing Unit
– Unless told otherwise
• Instruction loaded into
Instruction Register (IR)
• Processor interprets
instruction and performs
required actions
Arithmetic and Logic Unit
MQ
Accumulator
Arithmetic & Logic Circuits
MBR
Input
Output
Instructions
Main
& Data
Memory
IBR
PC
MAR
IR
Control
Circuits
Program Control Unit
Address
Execute Cycle
• Processor-memory
– data transfer between CPU and main memory
• Processor I/O
– Data transfer between CPU and I/O module
• Data processing
Central Processing Unit
Arithmetic and Logic Unit
MQ
Accumulator
– Some arithmetic or
logical operation on
data
• Control
– Alteration of
sequence of
operations
– e.g. jump
• Combination of above
Arithmetic & Logic Circuits
MBR
Input
Output
Instructions
Main
& Data
Memory
IBR
PC
MAR
IR
Control
Circuits
Program Control Unit
Address
Example of Program
Execution
Connecting
• All the units must be connected
• Different type of connection for different type of unit
– Memory
– Input/Output
– CPU
Memory Connection
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
– Read
– Write
– Timing
Input/Output Connection
• Similar to memory from computer’s viewpoint
• Output
– Receive data from computer
– Send data to peripheral
• Input
– Receive data from peripheral
– Send data to computer
• Receive control signals from computer
• Send control signals to peripherals
– e.g. spin disk
• Receive addresses from computer
– e.g. port number to identify peripheral
• Send interrupt signals (control)
CPU Connection
•
•
•
•
Reads instruction and data
Writes out data (after processing)
Sends control signals to other units
Receives (& acts on) interrupts
Buses
• There are a number of possible interconnection systems
• Single and multiple BUS structures are most common
• e.g. Control/Address/Data bus (PC)
• e.g. Unibus (DEC-PDP)
Data Bus
• Carries data
– Remember that there is no difference between “data” and “instruction”
at this level
• “Width” is a key determinant of performance
– 8, 16, 32, 64 bit – number of parallel lines
Address bus
• Identify the source or destination of data
• e.g. CPU needs to read an instruction (data) from a given location in
memory
• Bus width determines maximum memory capacity of system
– e.g. 8080 has 16 bit address bus giving 64k address space
Control Bus
• Control and timing information
– Memory read/write signal
– Interrupt request
– Clock signals
Bus Interconnection
Scheme
Big, Red & Double Decker?
• What do buses look like?
– Parallel lines on circuit boards
– Ribbon cables
– Strip connectors on mother boards
• e.g. PCI
– Sets of wires
Single Bus Problems
• Lots of devices on one bus leads to:
– Propagation delays
• Long data paths mean that co-ordination of bus use can
adversely affect performance
• If aggregate data transfer approaches bus capacity
• Most systems use multiple buses to overcome these
problems
Traditional Bus Strtucture
High Performance Bus
Bus Types
• Dedicated
– Separate data & address lines
• Multiplexed
–
–
–
–
Shared lines
Address valid or data valid control line
Advantage - fewer lines
Disadvantages
• More complex control
• Ultimate performance
Bus Arbitration
•
•
•
•
More than one module controlling the bus
e.g. CPU and DMA controller
Only one module may control bus at one time
Arbitration may be centralised or distributed
Centralised Arbitration
• Single hardware device controlling bus access
– Bus Controller or Arbiter
• May be part of CPU or separate
Distributed Arbitration
• Each module may claim the bus
• Control logic on all modules
Timing
• Co-ordination of events on bus
• Synchronous
–
–
–
–
–
–
Events determined by clock signals
Control Bus includes clock line
A single 1-0 is a bus cycle
All devices can read clock line
Usually sync on leading edge
Usually a single cycle for an event
Synchronous Timing
Diagram
Asynchronous Timing
Diagram
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
Memory Characteristics
•
•
•
•
•
•
•
•
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
Location
• CPU – Internal - External
• Word size
Capacity
– The natural unit of organisation
• Number of words
– or Bytes
• Internal
Unit of Transfer
– Usually governed by data bus width
• External
– Usually a block which is much larger than a word
• Addressable unit
– Smallest location which can be uniquely addressed
– Word internally
Access Methods
• Sequential
– Start at the beginning and read through in order
– Access time depends on location of data and previous location
– e.g. tape
• Direct
–
–
–
–
Individual blocks have unique address
Access is by jumping to vicinity plus sequential search
Access time depends on location and previous location
e.g. disk
• Random
– Individual addresses identify locations exactly
– Access time is independent of location or previous access
– e.g. RAM
• Associative
– Data is located by a comparison with contents of a portion of the store
– Access time is independent of location or previous access
– e.g. cache
Memory Hierarchy
• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store
Performance
• Access time
– Time between presenting the address and getting the
valid data
• Memory Cycle time
– Time may be required for the memory to “recover” before
next access
– Cycle time is access + recovery
• Transfer Rate
– Rate at which data can be moved
Physical Types
• Semiconductor
– RAM
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others
– Bubble
– Hologram
Physical Characteristics
•
•
•
•
Decay
Volatility
Erasable
Power consumption
Organisation
• Physical arrangement of bits into words
• Not always obvious
• e.g. interleaved
The Bottom Line
• How much?
– Capacity
• How fast?
– Time is money
• How expensive?
Hierarchy List
•
•
•
•
•
•
•
•
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
CPU
Semiconductor Memory
• RAM
– Misnamed as all semiconductor memory is random
access
– Read/Write
– Volatile
– Temporary storage
– Static or dynamic
Dynamic RAM
•
•
•
•
•
•
•
•
•
Bits stored as charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Main memory
Static RAM
•
•
•
•
•
•
•
•
•
Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Cache
Read Only Memory (ROM)
•
•
•
•
•
Permanent storage
Microprogramming (see later)
Library subroutines
Systems programs (BIOS)
Function tables
Types of ROM
• Written during manufacture
– Very expensive for small runs
• Programmable (once)
– PROM
– Needs special equipment to program
• Read “mostly”
– Erasable Programmable (EPROM)
• Erased by UV
– Electrically Erasable (EEPROM)
• Takes much longer to write than read
– Flash memory
• Erase whole memory electrically
Organisation in detail
• A 16Mbit chip can be organised as 1M of 16 bit words
• A bit per chip system has 16 lots of 1Mbit chip with bit
1 of each word in chip 1 and so on
• A 16Mbit chip can be organised as a 2048 x 2048 x
4bit array
– Reduces number of address pins
• Multiplex row address and column address
• 11 pins to address (211=2048)
• Adding one more pin doubles range of values so x4
capacity
Refreshing
•
•
•
•
•
•
Refresh circuit included on chip
Disable chip
Count through rows
Read & Write back
Takes time
Slows down apparent performance
Typical 16 Mb DRAM (4M x 4)
Module
Organisation
1 bit per chip per word
Module Organisation (2)
Larger Memory Size
Cache
• Small amount of fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module
So you want fast?
• It is possible to build a computer which uses only
static RAM
• This would be very fast
• This would need no cache
– How can you cache cache?
• This would cost a very large amount
Cache operation - overview
•
•
•
•
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory
to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main
memory is in each cache slot
Cache Design
•
•
•
•
•
•
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Size does matter
• Cost
– More cache is expensive
• Speed
– More cache is faster (up to a point)
– Checking cache for data takes time
Locality of Reference
• During the course of the execution of a program,
memory references tend to cluster
• e.g. loops
Newer RAM Technology (1)
• Basic DRAM same since first RAM chips
• Enhanced DRAM
– Contains small SRAM as well
– SRAM holds last line read (c.f. Cache!)
• Cache DRAM
– Larger SRAM component
– Use as cache or serial buffer
Newer RAM Technology (2)
• Synchronous DRAM (SDRAM)
–
–
–
–
–
currently on DIMMs
Access is synchronized with an external clock
Address is presented to RAM
RAM finds data (CPU waits in conventional DRAM)
Since SDRAM moves data in time with system clock,
CPU knows when data will be ready
– CPU does not have to wait, it can do something else
– Burst mode allows SDRAM to set up stream of data
and fire it out in block
SDRAM
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
Input/Output Problems
• Wide variety of peripherals
– Delivering different amounts of data
– At different speeds
– In different formats
• All slower than CPU and RAM
• Need I/O modules
– Interface to CPU and Memory
– Interface to one or more peripherals
I/O Module Function
•
•
•
•
•
Control & Timing
CPU Communication
Device Communication
Data Buffering
Error Detection
Address bus
Data bus
Control bus
I/O Steps
•
•
•
•
•
•
CPU checks I/O module device status
I/O module returns status
If ready, CPU requests data transfer
I/O module gets data from device
I/O module transfers data to CPU
Variations for output, DMA, etc.
I/O Module
Links to Peripheral Devices
I/O Module Decisions
•
•
•
•
Hide or reveal device properties to CPU
Support multiple or single device
Control device functions or leave for CPU
Also O/S decisions
– e.g. Unix treats everything it can as a file
Input Output Techniques
• Programmed
• Interrupt driven
• Direct Memory Access (DMA)
Programmed I/O
• CPU has direct control over I/O
– Sensing status
– Read/write commands
– Transferring data
• CPU waits for I/O module to complete operation
– Wastes CPU time
Programmed I/O - detail
•
•
•
•
•
•
•
CPU requests I/O operation
I/O module performs operation
I/O module sets status bits
CPU checks status bits periodically
I/O module does not inform CPU directly
I/O module does not interrupt CPU
CPU may wait or come back later
I/O Commands
• CPU issues address
– Identifies module (& device if >1 per module)
• CPU issues command
– Control - telling module what to do
• e.g. spin up disk
– Test - check status
• e.g. power? Error?
– Read/Write
• Module transfers data via buffer from/to device
Addressing I/O Devices
• Under programmed I/O data transfer is very like memory
access (CPU viewpoint)
• Each device given unique identifier
• CPU commands contain identifier (address)
I/O Mapping
• Memory mapped I/O
– Devices and memory share an address space
– I/O looks just like memory read/write
– No special commands for I/O
• Large selection of memory access commands available
• Isolated I/O
– Separate address spaces
– Need I/O or memory select lines
– Special commands for I/O
• Limited set
Interrupt Driven I/O
• Overcomes CPU waiting
• No repeated CPU checking of device
• I/O module interrupts when ready
Interrupt Driven I/O - Basic Operation
• CPU issues read command
• I/O module gets data from peripheral whilst CPU
does other work
• I/O module interrupts CPU
• CPU requests data
• I/O module transfers data
CPU Viewpoint
•
•
•
•
Issue read command
Do other work
Check for interrupt at end of each instruction cycle
If interrupted:– Save context (registers)
– Process interrupt
• Fetch data & store
• See Operating Systems notes
Design Issues
• How do you identify the module issuing the interrupt?
• How do you deal with multiple interrupts?
– i.e. an interrupt handler being interrupted
Identifying Interrupting Module
• Different line for each module
– Limits number of devices
• Software poll
– CPU asks each module in turn - Slow
• Daisy Chain or Hardware poll
– Interrupt Acknowledge sent down a chain
– Module responsible places vector on bus
– CPU uses vector to identify handler routine
• Bus Master
– Module must claim the bus before it can raise interrupt
– e.g. PCI & SCSI
Multiple Interrupts
• Each interrupt line has a priority
• Higher priority lines can interrupt lower priority lines
• If bus mastering only current master can interrupt
Direct Memory Access
• Interrupt driven and programmed I/O require active
CPU intervention
– Transfer rate is limited
– CPU is tied up
• DMA is the answer
DMA Function
• Additional Module (hardware) on bus
• DMA controller takes over from CPU for I/O
DMA Operation
• CPU tells DMA controller:–
–
–
–
Read/Write
Device address
Starting address of memory block for data
Amount of data to be transferred
• CPU carries on with other work
• DMA controller deals with transfer
• DMA controller sends interrupt when finished
DMA Transfer - Cycle Stealing
• DMA controller takes over bus for a cycle
• Transfer of one word of data
• Not an interrupt
– CPU does not switch context
• CPU suspended just before it accesses bus
– i.e. before an operand or data fetch or a data write
• Slows down CPU but not as much as CPU doing
transfer
DMA Configurations (1)
CPU
DMA
Controller
I/O
Device
I/O
Device
• Single Bus, Detached DMA controller
• Each transfer uses bus twice
– I/O to DMA then DMA to memory
• CPU is suspended twice
Main
Memory
DMA Configurations (2)
CPU
DMA
Controller
I/O
Device
I/O
Device
DMA
Controller
I/O
Device
• Single Bus, Integrated DMA controller
• Controller may support >1 device
• Each transfer uses bus once
– DMA to memory
• CPU is suspended once
Main
Memory
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
O/S Objectives and Functions
• Convenience
– Making the computer easier to use
• Efficiency
– Allowing better use of computer resources
Layers
Operating System Services
•
•
•
•
•
•
•
Program creation
Program execution
Access to I/O devices
Controlled access to files
System access
Error detection and response
Accounting
Types of Operating System
•
•
•
•
Interactive
Batch
Single program (Uni-programming)
Multi-programming (Multi-tasking)
Early Systems
• Late 1940s to mid 1950s
– No Operating System
– Programs interact directly with hardware
– Two main problems:
– Scheduling
– Setup time
Simple Batch Systems
•
•
•
•
•
Resident Monitor program
Users submit jobs to operator
Operator batches jobs
Monitor controls sequence of events to process batch
When one job is finished, control returns to Monitor
which reads next job
• Monitor handles scheduling
Desirable Hardware Features
• Memory protection
– To protect the Monitor (not screen saver!)
• Timer
– To prevent a job monopolizing the system
• Privileged instructions
– Only executed by Monitor
– e.g. I/O
• Interrupts
– Allows for relinquishing and regaining control
Multi-programmed Batch Systems
• I/O devices very slow
• When one program is waiting for I/O, another can use
the CPU
Single Program
Multi-Programming with 2 Programs
Multi-Programming with 3 Programs
Time Sharing Systems
• Allow users to interact directly with the computer
– i.e. Interactive
• Multi-programming allows a number of users to
interact with the computer
Process States
Process Control Block
•
•
•
•
•
•
•
•
Identifier PID
State
Running, Ready, Blocked, New, Exit
Priority
High, Medium, Low, etc….
Program Counter Where were we?
Memory pointers Preserve our calculations
Context data
What was in the registers
I/O status
Reading, Writing, Null
Accounting information How Much Used Already
Memory Management
• Uni-program
– Memory split into two
– One for Operating System (monitor)
– One for currently executing program
• Multi-program
– “User” part is sub-divided and shared among active
processes
Swapping
• Problem: I/O is so slow compared with CPU that
even in multi-programming system, CPU can be idle
most of the time
• Solutions:
– Increase main memory
• Expensive
• Leads to larger programs
– Swapping
What is Swapping?
• Long term queue of processes stored on disk
• Processes “swapped” in as space becomes available
• As a process completes it is moved out of main
memory
• If none of the processes in memory are ready (i.e. all
I/O blocked)
– Swap out a blocked process to intermediate queue
– Swap in a ready process or a new process
– But swapping is an I/O process...
Partitioning
• Splitting memory into sections
to allocate to processes
(including Operating System)
• Fixed-sized partitions
– May not be equal size
– Process is fitted into smallest
hole that will take it (best fit)
– Some wasted memory
– Leads to variable sized
partitions
Variable Sized Partitions (1)
• Allocate exactly the required memory to a process
• This leads to a hole at the end of memory, too small
to use
– Only one small hole - less waste
• When all processes are blocked, swap out a process
and bring in another
• New process may be smaller than swapped out
process
• Another hole
Variable Sized Partitions (2)
• Eventually have lots of holes (fragmentation)
• Solutions:
– Coalesce - Join adjacent holes into one large hole
– Compaction - From time to time go through memory
and move all hole into one free block (c.f. disk defragmentation)
Effect of Dynamic Partitioning
Relocation
• No guarantee that process will load into the same
place in memory
• Instructions contain addresses
– Locations of data
– Addresses for instructions (branching)
• Logical address - relative to beginning of program
• Physical address - actual location in memory (this
time)
• Automatic conversion using base address
Paging
• Split memory into equal sized, small chunks -page
frames
• Split programs (processes) into equal sized small
chunks - pages
• Allocate the required number page frames to a
process
• Operating System maintains list of free frames
• A process does not require contiguous page frames
• Use page table to keep track
Logical and Physical
Addresses - Paging
Virtual Memory
• Demand paging
– Do not require all pages of a process in memory
– Bring in pages as required
• Page fault
–
–
–
–
Required page is not in memory
Operating System must swap in required page
May need to swap out a page to make space
Select page to throw out based on recent history
Thrashing
•
•
•
•
Too many processes in too little memory
Operating System spends all its time swapping
Little or no real work is done
Disk light is on all the time
• Solutions
– Good page replacement algorithms
– Reduce number of processes running
– Fit more memory
Bonus
• We do not need all of a process in memory for it to run
• We can swap in pages as required
• So - we can now run processes that are bigger than total
memory available!
• Main memory is called real memory
• User/programmer sees much bigger memory - virtual
memory
Page Table Structure
Segmentation
• Paging is not (usually) visible to the programmer
• Segmentation is visible to the programmer
• Usually different segments allocated to program and
data
• May be a number of program and data segments
Advantages of Segmentation
• Simplifies handling of growing data structures
• Allows programs to be altered and recompiled
independently, without re-linking and re-loading
• Lends itself to sharing among processes
• Lends itself to protection
• Some systems combine segmentation with paging
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
Arithmetic & Logic Unit
• Does the calculations
• Everything else in the computer is there to service
this unit
• Handles integers
• May handle floating point (real) numbers
• May be separate FPU (maths co-processor)
• May be on chip separate FPU (486DX +)
Integer Representation
• Only have 0 & 1 to represent everything
• Positive numbers stored in binary
– e.g. 41=00101001
• No minus sign
• No decimal point
• Instead we can use either:
– Sign-Magnitude
– Two’s compliment
Sign-Magnitude
•
•
•
•
•
•
Left most bit is sign bit
0 means positive
1 means negative
+18 = 00010010
-18 = 10010010
Problems
– Need to consider both sign and magnitude in arithmetic
– Two representations of zero (+0 and -0)
Two’s Compliment
•
•
•
•
•
•
•
+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101
Benefits
• One representation of zero
• Arithmetic works easily (see later)
• Negating is fairly easy
– 3 = 00000011
– Boolean complement gives
– Add 1 to LSB
11111100
11111101
Geometric Depiction of Twos
Complement Integers
Negation Special Case 1
•
•
•
•
•
•
0=
00000000
Bitwise not
11111111
Add 1 to LSB
+1
Result
1 00000000
Overflow is ignored, so:
-0=0
Negation Special Case 2
•
•
•
•
•
•
•
•
•
-128 =
10000000
bitwise not 01111111
Add 1 to LSB
+1
Result
10000000
So:
-(-128) = -128 X
Monitor MSB (sign bit)
It should change during negation
In this case 128 is too big to be represented
Range of Numbers
• 8 bit 2s compliment
– +127 = 01111111 = 27 -1
– -128 = 10000000 = -27
• 16 bit 2s compliment
– +32767 = 011111111 11111111 = 215 - 1
– -32768 = 100000000 00000000 = -215
Conversion Between
Lengths
•
•
•
•
•
•
•
Positive number pack with leading zeros
+18 =
00010010
+18 = 00000000 00010010
Negative numbers pack with leading ones
-18 =
10010010
-18 = 11111111 10010010
i.e. pack with MSB (sign bit)
Addition and Subtraction
• Normal binary addition
• Monitor sign bit for overflow
• Take twos compliment of subtrahend and add to
minuend
– i.e. a - b = a + (-b)
• So we only need addition and complement circuits
Hardware for Addition
and Subtraction
Multiplication
•
•
•
•
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Multiplication Example
1011 Multiplicand (11 dec)
x 1101 Multiplier (13 dec)
1011 Partial products
0000 Note: if multiplier bit is 1 copy
1011
multiplicand (place value)
1011
otherwise zero
10001111 Product (143 dec)
Note: need double length result
Unsigned Binary Multiplication
1011
0000
Initial Values
1101
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
First Cycle
1011
0000
1101
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
First Cycle
1011
1011
1101
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
First Cycle
1011
0101
1110
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Second Cycle
0000
0101
1110
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Second Cycle
0000
0101
1110
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Second Cycle
0000
0010
1111
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Third Cycle
1011
0010
1111
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Third Cycle
1011
1101
1111
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Third Cycle
1011
0110
1111
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Fourth Cycle
1011
0110
1111
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Fourth Cycle
1011
1
0001
1111
If this digit is a one
then
add
multiplicand
Shift A and Q right
Unsigned Binary Multiplication
1011
Fourth Cycle
1011
1000
1111
If this digit is a one
then
add
multiplicand
Shift A and Q right
Flowchart for Unsigned
Binary Multiplication
Multiplying Negative
Numbers
• This does not work!
• Solution 1
– Convert to positive if required
– Multiply as above
– If signs were different, negate answer
• Solution 2
– Booth’s algorithm
Example of Booth’s Algorithm
•Multiplier and multiplicand placed in Q and M
•Additional 1 bit register Q-1
•Results appear in A and Q
•A and Q initialised to 0
•Q0 and Q-1 examined
•If both same (0-0 or 1-1)
Shift A and Q right
If different
If 0-1 then Add M
If 1-0 then Subtract M
Shift A and Q right
•Shift preserves the sign
An-1  An-2 and remains in An-1
Division
• More complex than multiplication
• Negative numbers are really bad!
• Based on long division
00001101
Quotient
1011 10010011
1011
001110
Partial
1011
Remainders
001111
1011
100
Dividend
Divisor
Remainder
Real Numbers
• Numbers with fractions
• Could be done in pure binary
– 1001.1010 = 24 + 20 +2-1 + 2-3 =9.625
• Where is the binary point?
• Fixed?
– Very limited
• Moving?
– How do you show where it is?
Sign bit
Floating Point
Biased
Exponent
Significand or Mantissa
• +/- .significand x 2exponent
• Misnomer – not really a floating point
• Point is actually fixed between sign bit and body of
mantissa
• Exponent indicates place value (point position)
Floating Point Examples
• Mantissa is stored in 2s compliment
• Exponent is in excess or biased notation
– e.g. Excess (bias) 128 means
– 8 bit exponent field
– Pure value range 0-255
– Subtract 128 to get correct value
– Range -128 to +127
FP Ranges
• For a 32 bit number
– 8 bit exponent
– +/- 2256  1.5 x 1077
• Accuracy
– The effect of changing lsb of mantissa
– 23 bit mantissa 2-23  1.2 x 10-7
– About 6 decimal places
Expressible Numbers
IEEE 754
•
•
•
•
Standard for floating point storage
32 and 64 bit standards
8 and 11 bit exponent respectively
Extended formats (both mantissa and exponent) for
intermediate results
FP Arithmetic +/•
•
•
•
Check for zeros
Align significands (adjusting exponents)
Add or subtract significands
Normalize
FP Arithmetic x/
•
•
•
•
•
•
Check for zero
Add/subtract exponents
Multiply/divide significands (watch sign)
Normalize
Round
All intermediate results should be in double length
storage
Floating
Point
Multiplication
Floating
Point
Division
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
What is an instruction set?
• The complete collection of instructions that are
understood by a CPU
• Machine Code – set of binary digits
• Usually represented by assembly codes
Elements of an Instruction
• Operation code (Op code)
– Do this
• Source Operand reference
– To this
• Result Operand reference
– Put the answer here
• Next Instruction Reference
– When you have done that, do this...
Operands – Where From?
• Main memory (or virtual memory or cache)
• CPU register
• I/O device
Instruction Representation
• In machine code each instruction has a unique bit
pattern
• For human consumption (well, programmers anyway)
a symbolic representation is used
– e.g. ADD, SUB, LOAD
• Operands can also be represented in this way
– ADD A,B
Instruction Types
•
•
•
•
Data processing
Data storage (main memory)
Data movement (I/O)
Program flow control
Number of Addresses (a)
• 3 addresses
–
–
–
–
–
Operand 1, Operand 2, Result
a = b + c;
May be a fourth - next instruction (usually implicit)
Not common
Needs very long “words” to hold everything
Number of Addresses (b)
• 2 addresses
–
–
–
–
One address doubles as operand and result
a=a+b
Reduces length of instruction
Requires some extra work
• Temporary storage to hold some results
Number of Addresses (c)
• 1 address
– Implicit second address
– Usually a register (accumulator)
– Common on early machines
Number of Addresses (d)
• 0 (zero) addresses
– All addresses implicit
– Uses a stack
– e.g. push a
–
push b
–
add
–
pop c
– c=a+b
How Many Addresses is good?
• More addresses
– More complex (powerful?) instructions
– More registers
• Inter-register operations are quicker
– Fewer instructions per program
• Fewer addresses
– Less complex (powerful?) instructions
– More instructions per program
– Faster fetch/execution of instructions
Design Decisions (1)
• Operation repertoire
– How many operations?
– What can they do?
– How complex are they?
• Data types
• Instruction formats
– Length of op code field
– Number of addresses/instruction
• Registers
– Number of CPU registers available
– Which operations can be performed on which
registers?
• Addressing modes (later…)
• RISC v CISC - whole lecture on its own
Types of Operand
• Addresses
• Numbers
– Integer/floating point
• Characters
– ASCII etc.
• Logical Data
– Bits or flags
•
(Aside: Is there any difference between numbers and characters? )
Specific Data Types
•
•
•
•
•
•
•
•
•
•
General - arbitrary binary contents
Integer - single binary value
Ordinal - unsigned integer
Unpacked BCD - One digit per byte
Packed BCD - 2 BCD digits per byte
Near Pointer - 32 bit offset within segment
Bit field
Byte String
Floating Point
…………
Types of Operation
•
•
•
•
•
•
•
Data Transfer
Arithmetic
Logical
Conversion
I/O
System Control
Transfer of Control
Data Transfer
• Specify
– Source
– Destination
– Amount of data
• May be different instructions for different movements
– e.g. IBM 370
•
•
•
•
L
LH
LE
Etc…
; 32 bits from memory to register
; 16 bits from memory to register
; 32 bits from memory to fp register
• Or one instruction and different addresses
– e.g. VAX
• LD
;16 bits from anywhere to anywhere
Arithmetic
•
•
•
•
Add, Subtract, Multiply, Divide
Signed Integer
Floating point ?
May include
– Increment (a++)
– Decrement (a--)
– Negate (-a)
Logical
• Bitwise operations
• AND, OR, NOT
Conversion
• E.g. Binary to Decimal
Input/Output
• May be specific instructions
• May be done using data movement instructions
(memory mapped)
• May be done by a separate controller (DMA)
Systems Control
• Privileged instructions
• CPU needs to be in specific state
• For operating systems or compiler use
Transfer of Control
• Branch
– e.g. branch to x if result is zero
• Skip
–
–
–
–
e.g. increment and skip if zero
ISZ Register1
Branch xxxx
ADD A
• Subroutine call
– c.f. interrupt call
Byte Order
• What order do we read numbers that occupy more
than one byte?
• e.g. (numbers in hex to make it easy to read)
• 12345678 can be stored in 4x8bit locations as follows
•
•
•
•
•
Address
184
185
186
187
Value (1)
12
34
56
78
Value(2)
78
56
34
12
• i.e. read top down or bottom up?
Byte Order Names
• The problem is called Endian
• The system on the left has the least significant byte in
the lowest address
12 34 56 78
184
185
186
187
• This is called big-endian
• The system on the right has the least significant byte
in the highest address
12 34 56 78
187
186
185
184
• This is called little-endian
Standard…What Standard?
• Pentium (80x86), VAX are little-endian
• IBM 370, Moterola 680x0 (Mac), and most RISC are
big-endian
• Internet is big-endian
– Makes writing Internet programs on PC more awkward!
– WinSock provides htoi and itoh (Host to Internet &
Internet to Host) functions to convert
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
Addressing Modes
•
•
•
•
•
•
•
Immediate
Direct
Indirect
Register
Register Indirect
Displacement (Indexed)
Stack
Immediate Addressing
• Operand is part of instruction
• Operand = address field
• e.g. ADD 5
– Add 5 to contents of accumulator
– 5 is operand
• No memory reference to fetch data
• Fast
• Limited range
Instruction
Opcode
Operand
Direct Addressing
• Address field contains address of operand
• Effective address (EA) = address field (A)
• e.g. ADD A
– Add contents of cell A to accumulator
– Look in memory at address A for operand
• Single memory reference to access data
• No additional calculations to work out effective address
Instruction
• Limited address space
Opcode
Address A
Memory
Operand
Indirect Addressing (1)
• Memory cell pointed to by address field contains the
address of (pointer to) the operand
• EA = (A)
– Look in A, find address (A) and look there for operand
• e.g. ADD (A)
– Add contents of cell pointed to by contents of A to
accumulator
Opcode
Instruction
Address A
Memory
Pointer to operand
Operand
Indirect Addressing (2)
• Large address space
• 2n where n = word length
• May be nested, multilevel, cascaded
– e.g. EA = (((A)))
• Draw the diagram yourself
• Multiple memory accesses to find operand
• Hence slower
Register Addressing (1)
•
•
•
•
Operand is held in register named in address filed
EA = R
Limited number of registers
Very small address field needed
– Shorter instructions
– Faster instruction fetch
Opcode
Instruction
Register Address R
Registers
Operand
Register Addressing (2)
•
•
•
•
No memory access
Very fast execution
Very limited address space
Multiple registers helps performance
– Requires good assembly programming or compiler writing
• c.f. Direct addressing
Register Indirect Addressing
• C.f. indirect addressing
• EA = (R)
• Operand is in memory cell pointed to by contents of
register R
• Large address space (2n)
• One fewer memory access than indirect addressing
Opcode
Instruction
Register Address R
Memory
Registers
Pointer to Operand
Operand
Displacement Addressing
• EA = A + (R)
• Address field hold two values
– A = base value
– R = register that holds displacement
– or vice versa
Instruction
OpcodeRegister R Address A
Memory
Registers
Pointer to Operand
+
Operand
Relative Addressing
•
•
•
•
A version of displacement addressing
R = Program counter, PC
EA = A + (PC)
i.e. get operand from A cells from current location
pointed to by PC
• c.f locality of reference & cache usage
Base-Register Addressing
•
•
•
•
A holds displacement
R holds pointer to base address
R may be explicit or implicit
e.g. segment registers in 80x86
Indexed Addressing
•
•
•
•
A = base
R = displacement
EA = A + R
Good for accessing arrays
– EA = A + R
– R++
Combinations
• Postindex
• EA = (A) + (R)
• Preindex
• EA = (A+(R))
• (Draw the diagrams)
Stack Addressing
• Operand is (implicitly) on top of stack
• e.g.
– ADD Pop top two items from stack
and add
Instruction Formats
•
•
•
•
Layout of bits in an instruction
Includes opcode
Includes (implicit or explicit) operand(s)
Usually more than one instruction format in an
instruction set
Instruction Length
• Affected by and affects:
–
–
–
–
–
Memory size
Memory organization
Bus structure
CPU complexity
CPU speed
• Trade off between powerful instruction repertoire and
saving space
Allocation of Bits
•
•
•
•
•
•
Number of addressing modes
Number of operands
Register versus memory
Number of register sets
Address range
Address granularity
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
CPU Structure
• CPU must:
–
–
–
–
–
Fetch instructions
Interpret instructions
Fetch data
Process data
Write data
Registers
•
•
•
•
•
CPU must have some working space (temporary storage)
Called registers
Number and function vary between processor designs
One of the major design decisions
Top level of memory hierarchy
User Visible Registers
•
•
•
•
General Purpose
Data
Address
Condition Codes
General Purpose Registers
•
•
•
•
May be true general purpose
May be restricted
May be used for data or addressing
Data
– Accumulator
• Addressing
– Segment
• Make them general purpose
– Increase flexibility and programmer options
– Increase instruction size & complexity
• Make them specialized
– Smaller (faster) instructions
– Less flexibility
How Many GP Registers?
• Between 8 - 32
• Fewer = more memory references
• More does not reduce memory references and takes
up processor real estate
How big?
• Large enough to hold full address
• Large enough to hold full word
• Often possible to combine two data registers
– C programming
– long int a;
Condition Code Registers
• Sets of individual bits
– e.g. result of last operation was zero
• Can be read (implicitly) by programs
– e.g. Jump if zero
• Can not (usually) be set by programs
Control & Status Registers
•
•
•
•
Program Counter
Instruction Decoding Register
Memory Address Register
Memory Buffer Register
• Revision: what do these all do?
Program Status Word
• A set of bits - Includes Condition Codes
–
–
–
–
–
–
Sign - of last result
Zero – set when result of last op was zero
Carry – set if last op resulted in a carry
Equal – set if a logical compare was true
Overflow – set to indicate arithmetic overflow
Interrupt enable/disable
• Supervisor Mode
•
•
•
•
•
Intel ring zero
Kernel mode
Allows privileged instructions to execute
Used by operating system
Not available to user programs
Other Registers
• May have registers pointing to:
– Process control blocks (see O/S)
– Interrupt Vectors (see O/S)
• N.B. CPU design and operating system design are
closely linked
Example Register Organizations
Instruction Cycle
Indirect Cycle
• May require memory access to fetch operands
• Indirect addressing requires more memory accesses
• Can be thought of as additional instruction subcycle
Instruction Cycle with Indirect
Instruction Cycle State Diagram
Data Flow (Instruction Fetch)
• Depends on CPU design – in general:
– Fetch
•
•
•
•
•
•
PC contains address of next instruction
Address moved to MAR
Address placed on address bus
Control unit requests memory read
Result placed on data bus, copied to MBR, then to IR
Meanwhile PC incremented by 1
Data Flow (Data Fetch)
• IR is examined
• If indirect addressing, indirect cycle is performed
– Right most N bits of MBR transferred to MAR
– Control unit requests memory read
– Result (address of operand) moved to MBR
Data Flow (Execute)
• May take many forms
• Depends on instruction being executed
• May include
–
–
–
–
Memory read/write
Input/Output
Register transfers
ALU operations
Data Flow (Interrupt)
– Current PC saved to allow resumption after interrupt
– Contents of PC copied to MBR
– Special memory location (e.g. stack pointer) loaded to
MAR
– MBR written to memory
– PC loaded with address of interrupt handling routine
– Next instruction (first of interrupt handler) can be
fetched
Prefetch
• Fetch accessing main memory
• Execution usually does not access main memory
• Can fetch next instruction during execution of current
instruction
• Called instruction prefetch
Improved Performance
• But not doubled:
– Fetch usually shorter than execution
• Prefetch more than one instruction?
– Any jump or branch means that prefetched instructions
are not the required instructions
• Add more stages to improve performance
Pipelining
•
•
•
•
•
•
Fetch instruction
Decode instruction
Calculate operands (i.e. EAs)
Fetch operands
Execute instructions
Write result
• Overlap these operations
Timing of Pipeline
Branch in a Pipeline
Dealing with Branches
•
•
•
•
•
Multiple Streams
Prefetch Branch Target
Loop buffer
Branch prediction
Delayed branching
Multiple Streams
• Have two pipelines
• Prefetch each branch into a separate pipeline
• Use appropriate pipeline
• Leads to bus & register contention
• Multiple branches lead to further pipelines being
needed
Prefetch Branch Target
• Target of branch is prefetched in addition to
instructions following branch
• Keep target until branch is executed
• Used by IBM 360/91
Loop Buffer
•
•
•
•
•
•
Very fast memory
Maintained by fetch stage of pipeline
Check buffer before fetching from memory
Very good for small loops or jumps
c.f. cache
Used by CRAY-1
Branch Prediction
• Predict never taken
– Assume that jump will not happen
– Always fetch next instruction
• Predict always taken
– Assume that jump will happen
– Always fetch target instruction
• Predict by Opcode
– Some instructions more likely to result in a jump
– Can get up to 75% success
• Taken/Not taken switch
– Based on previous history
– Good for loops
• Delayed Branch
– Do not take jump until you have to
– Rearrange instructions
Branch Prediction State Diagram
Syllabus
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Introduction .....................................................................chapter 2
Computer Interconnection Structures .....................chapter 3
Internal & External Memory ........................................chapter 4
Input & Output ................................................................chapter 6
Operating Systems – part 1 ........................................chapter 7
Operating Systems – part 2
Computer Arithmetic – part 1 .....................................chapter 8
Computer Arithmetic – part 2
Instruction Sets: Characteristics & Function ........chapter 9
Instruction Sets: Addressing Modes & Formats .chapter 10
CPU Structure & Function – part 1 .........................chapter 11
CPU Structure & Function – part 2
Control Unit Operation ..............................................chapter 14
Computer Organisation & Architecture – William Stallings
Micro-Operations
• A computer executes a program
• Fetch/execute cycle
• Each cycle has a number of steps
– see pipelining
• Called micro-operations
• Each step does very little
• Atomic operation of CPU
Constituent Elements of
Program Execution
Fetch - 4 Registers
• Memory Address Register (MAR)
– Connected to address bus
– Specifies address for read or write op
• Memory Buffer Register (MBR)
– Connected to data bus
– Holds data to write or last data read
• Program Counter (PC)
– Holds address of next instruction to be fetched
• Instruction Register (IR)
– Holds last instruction fetched
Fetch Sequence
•
•
•
•
•
•
•
•
Address of next instruction is in PC
Address (MAR) is placed on address bus
Control unit issues READ command
Result (data from memory) appears on data bus
Data from data bus copied into MBR
PC incremented by 1 (in parallel with data fetch from memory)
Data (instruction) moved from MBR to IR
MBR is now free for further data fetches
Fetch Sequence (symbolic)
t1:
t2:
t3:
MAR <- (PC)
MBR <- (memory)
PC <- (PC) +1
IR <- (MBR)
(tx = time unit/clock cycle)
• Or (conceptually identical)
t1:
t2:
t3:
MAR <- (PC)
MBR <- (memory)
PC <- (PC) +1
IR <- (MBR)
Rules for Clock Cycle Grouping
• Proper sequence must be followed
– MAR <- (PC) must precede MBR <- (memory)
• Conflicts must be avoided
– Must not read & write same register at same time
– MBR <- (memory) & IR <- (MBR) must not be in same
cycle
• Also: PC <- (PC) +1 involves addition
– Use ALU
– May need additional micro-operations
Indirect Address Cycle
MAR <- (IRaddress) - address field of IR
MBR <- (memory)
IRaddress <- (MBRaddress)
• MBR contains an address
• IR is now in same state as if direct addressing had
been used
Interrupt Cycle
t1:
t2:
t3:
MBR <-(PC)
MAR <- save-address
PC <- routine-address
memory <- (MBR)
• This is a minimum
– May be additional micro-ops to get addresses
– N.B. saving context is done by interrupt handler
routine, not micro-ops
Execute Cycle (ADD)
• Different for each instruction
• e.g. ADD R1,X - add the contents of location X to
Register 1 , result in R1
t1:
t2:
t3:
MAR <- (IRaddress)
MBR <- (memory)
R1 <- R1 + (MBR)
• Note no overlap of micro-operations
Execute Cycle (ISZ)
• ISZ X - increment and skip if zero
t1:
t2:
t3:
t4:
MAR <- (IRaddress)
MBR <- (memory)
MBR <- (MBR) + 1
memory <- (MBR)
if (MBR) == 0 then PC <- (PC) + 1
• Notes:
– if is a single micro-operation
– Micro-operations done during t4
Execute Cycle (BSA)
• BSA X - Branch and save address
– Address of instruction following BSA is saved in X
– Execution continues from X+1
t1:
t2:
t3:
MAR <- (IRaddress)
MBR <- (PC)
PC <- (IRaddress)
memory <- (MBR)
PC <- (PC) + 1
Functional Requirements
• Define basic elements of processor
• Describe micro-operations processor performs
• Determine functions control unit must perform
•
•
•
•
•
Basic Elements of Processor
ALU
Registers
Internal data paths
External data paths
Control Unit
Types of Micro-operation
•
•
•
•
Transfer data between registers
Transfer data from register to external
Transfer data from external to register
Perform arithmetic or logical ops
Functions of Control Unit
• Sequencing
– Causing the CPU to step through a series of microoperations
• Execution
– Causing the performance of each micro-op
• This is done using Control Signals
Control Signals
• Clock
– One micro-instruction (or set of parallel micro-instructions)
per clock cycle
• Instruction register
– Op-code for current instruction
– Determines which micro-instructions are performed
• Flags
– State of CPU
– Results of previous operations
• From control bus
– Interrupts
– Acknowledgements
Control Signals - output
• Within CPU
– Cause data movement
– Activate specific functions
• Via control bus
– To memory
– To I/O modules
Example Control Signal
Sequence - Fetch
• MAR <- (PC)
– Control unit activates signal to open gates between PC
and MAR
• MBR <- (memory)
– Open gates between MAR and address bus
– Memory read control signal
– Open gates between data bus and MBR
Internal Organization
• Usually a single internal bus
• Gates control movement of data onto and off the bus
• Control signals control data transfer to and from
external systems bus
• Temporary registers needed for proper operation of
ALU
Hardwired
Implementation (1)
• Control unit inputs
• Flags and control bus
– Each bit means something
• Instruction register
– Op-code causes different control signals for each
different instruction
– Unique logic for each op-code
– Decoder takes encoded input and produces single
output
– n binary inputs and 2n outputs
Hardwired
Implementation (2)
• Clock
–
–
–
–
Repetitive sequence of pulses
Useful for measuring duration of micro-ops
Must be long enough to allow signal propagation
Different control signals at different times within
instruction cycle
– Need a counter with different control signals for t1, t2
etc.
Problems With Hard
Wired Designs
•
•
•
•
Complex sequencing & micro-operation logic
Difficult to design and test
Inflexible design
Difficult to add new instructions
• Better to use a microprogrammed controller……