Chapter 3. Processor Design
CPU function: to execute instructions stored in memory.
– Instruction cycle:
fetch cycle: fetch an instruction from main memory.
execute cycle: decode the instruction, fetch any required operands, and
perform the operation.
The behavior of the CPU: a sequence of register-transfer operations.
CPU cycle time (tCPU): the time required for the shortest CPU micro-operation.
Interrupt: a request by an I/O device for service from the CPU.
CPU Design issues
1. The CPU should be as fast as the available technology permits, while the
number of components in the CPU is kept small.
2. Because of its size, main memory must be constructed using less expensive,
and therefore slower, technology than that of the CPU (roughly a 1 to 10 speed ratio).
Von Neumann CPU design: the basis of almost all CPU designs.
CPU operation form:
X1 := fi(X1, X2)
where X1 and X2 denote CPU registers (AC, DR, or PC) or an external memory
location M(adr), and fi is a fixed-point addition/subtraction, shift, or logical operation.
Instruction format: I = op.adr, where op is the opcode and adr is a memory address.
Fetch operation: IR.AR := M(PC)
Two essential memory-addressing instructions:
load: AC := M(adr)
store: M(adr) := AC
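The fetch and execute cycles of this single-accumulator machine can be sketched as a small interpreter. This is a minimal sketch under stated assumptions: memory is a Python list whose words hold either data (an int) or an instruction tuple (op, adr), and the opcode names LOAD, STORE, ADD, SUB, and HALT are hypothetical, chosen only for illustration.

```python
def run(memory, pc=0, max_steps=1000):
    """Run a hypothetical one-accumulator machine until HALT; return AC."""
    ac = 0
    for _ in range(max_steps):
        op, adr = memory[pc]     # fetch: IR.AR := M(PC), opcode to IR, address to AR
        pc += 1                  # point PC at the next instruction
        if op == "LOAD":         # load:  AC := M(adr)
            ac = memory[adr]
        elif op == "STORE":      # store: M(adr) := AC
            memory[adr] = ac
        elif op == "ADD":        # AC := AC + M(adr)
            ac = ac + memory[adr]
        elif op == "SUB":        # AC := AC - M(adr)
            ac = ac - memory[adr]
        elif op == "HALT":
            return ac
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached")
```

For example, with memory `[("LOAD", 5), ("ADD", 6), ("STORE", 7), ("HALT", 0), None, 20, 22, 0]`, the program adds M(5) and M(6) and stores the sum in M(7). Instructions and data share one memory, which is the von Neumann property the text describes.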
Architecture Extensions
1. Additional addressable registers (e.g., index registers or base registers) can be
provided for storing operands and addresses; this amounts to replacing the single
accumulator by a set of registers.
2. The capabilities of the ALU can be extended from fixed-point addition/
subtraction to fixed-point multiplication.
3. Special registers (such as a flag register) can be included to facilitate the
transfer of control between instructions within a program.
4. The transfer of control between different subroutines due to interrupts
or subroutine calls and returns is facilitated by special registers (such
as the PSW, or program status word, of the IBM 360). Control is transferred by
saving the current PSW in main memory and loading a new PSW into the
CPU. Control is returned to the first program by retrieving the
previously saved PSW from memory and restoring it to the CPU.
Most computers now use a LIFO (last-in, first-out) stack to save and restore this information.
5. Simultaneous processing of two or more distinct instructions can be facilitated
by extending the memory-addressing circuits and adding sufficient buffer
storage to the CPU.
If the ALU is divided into K parts (stages), up to K instructions can be in
execution at once: pipelining.
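The benefit of dividing the execution hardware into K stages can be estimated with simple cycle counts. This sketch assumes an ideal pipeline: every stage takes one clock cycle and no stalls or hazards occur. The function names are hypothetical.

```python
def pipeline_cycles(n_instructions: int, k_stages: int) -> int:
    """Ideal K-stage pipeline: the first instruction needs K cycles to pass
    through every stage, and each later one completes one cycle after it."""
    if n_instructions == 0:
        return 0
    return k_stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, k_stages: int) -> int:
    """Without overlap, each instruction occupies all K stages by itself."""
    return n_instructions * k_stages
```

With 100 instructions and K = 4, the ideal pipeline needs 103 cycles instead of 400, so the speedup approaches K as the instruction stream grows. Real pipelines fall short of this because of dependencies and branches.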
A coprocessor is a specialized instruction-execution unit that can be coupled to a
microprocessor so that instructions to be executed by the coprocessor can be included in
programs fetched by the microprocessor.
For example, the floating-point instructions of the Motorola 68020 can be executed by
means of an auxiliary 68881 floating-point coprocessor.
A set of coprocessor instructions is defined for the 68020. When the 68020 fetches
and decodes such an instruction, it transfers the command portion to the coprocessor,
which then executes it.
3.2 Information representation
A word: an information unit of fixed length.
ASCII code (American Standard Code for Information Interchange): 7 bits, usually stored in an 8-bit byte.
Classification of information:
information
  instructions
  data
    numerical data
      fixed-point numbers
      floating-point numbers
    non-numerical data
o fixed-point numbers: $b_0 b_1 \cdots b_{n-1}$, where $b_i \in \{0, 1\}$
o floating-point numbers: $M \times 2^{E}$, where M is the mantissa and E is the exponent
A tag field can be attached to each word to identify which of the major information types it holds.
Word format of the Burroughs B6500/7500:
bits 0-47: information | bits 48-50: tag | bit 51: parity bit
Advantage: instruction sets can be simplified, and
software errors can be detected.
Disadvantage: memory is wasted on the tag bits.
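A tagged word can be modelled by packing the three fields into one integer. This is a sketch, not the Burroughs encoding: it assumes the layout information (48 bits), tag (3 bits), parity (1 bit) with bit 0 leftmost, assumes even parity over the other 51 bits, and the tag codes `TAG_DATA` and `TAG_INSTRUCTION` are hypothetical values chosen for illustration.

```python
INFO_BITS, TAG_BITS = 48, 3
TAG_DATA, TAG_INSTRUCTION = 0b000, 0b011   # hypothetical tag codes


def even_parity(x: int) -> int:
    """Bit that makes the total number of 1s (including itself) even."""
    return bin(x).count("1") & 1


def pack(info: int, tag: int) -> int:
    """Build a 52-bit word: info (leftmost 48 bits), tag (3 bits), parity (last bit)."""
    if not (0 <= info < 1 << INFO_BITS and 0 <= tag < 1 << TAG_BITS):
        raise ValueError("field out of range")
    body = (info << TAG_BITS) | tag
    return (body << 1) | even_parity(body)


def unpack(word: int):
    """Check parity, then return (info, tag)."""
    body, p = word >> 1, word & 1
    if even_parity(body) != p:
        raise ValueError("parity error")
    return body >> TAG_BITS, body & ((1 << TAG_BITS) - 1)
```

Hardware could inspect the tag before using a word, e.g. refusing to execute a word tagged as data; that check is what lets some software errors be caught, at the cost of the extra tag bits in every word.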
Error detection and correction
Parity bit: a single check bit.
– The parity bit c0 is appended to an n-bit word X = (x0, x1, ···, xn-1) to form
the (n+1)-bit word X* = (x0, x1, ···, xn-1, c0).
even parity: $c_0 = x_0 \oplus x_1 \oplus \cdots \oplus x_{n-1}$
odd parity: $c_0 = 1 \oplus x_0 \oplus x_1 \oplus \cdots \oplus x_{n-1}$
If c0 = c0* (the parity bit recomputed from the received word), then no single-bit
error has occurred, although an even number of bit errors may have gone undetected.
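The even- and odd-parity rules above translate directly into XOR reductions. This minimal sketch represents a word as a list of 0/1 integers; the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_bit(bits, odd=False):
    """c0 = x0 ^ x1 ^ ... ^ x(n-1) for even parity; complemented for odd parity."""
    c0 = reduce(xor, bits, 0)
    return c0 ^ 1 if odd else c0


def encode(bits, odd=False):
    """Append c0 to form the (n+1)-bit word X*."""
    return list(bits) + [parity_bit(bits, odd)]


def check(word, odd=False):
    """Recompute c0* from the received data bits and compare it with the received c0."""
    *data, c0 = word
    return parity_bit(data, odd) == c0
```

Flipping any one bit of `encode([1, 0, 1, 1])` makes `check` fail, but flipping two bits makes it pass again, which is exactly the undetected even-error case noted above.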
Single-bit error correction for an n-bit word
c: the number of check bits required for single-error correction.
The c-bit error vector must distinguish all n + c possible single-error locations
plus the error-free case, so
$2^{c} \ge n + c + 1$
for n = 4: c ≥ 3
for n = 8: c ≥ 4
for n = 16: c ≥ 5
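The bound $2^{c} \ge n + c + 1$ can be solved for the smallest c by simple search. This sketch (the function name is hypothetical) reproduces the values listed above.

```python
def min_check_bits(n: int) -> int:
    """Smallest c with 2**c >= n + c + 1, the single-error-correction bound."""
    c = 0
    while 2 ** c < n + c + 1:
        c += 1
    return c
```

For n = 4, 8, and 16 it returns 3, 4, and 5; for n = 32 and 64 it gives 6 and 7, so the relative overhead of check bits shrinks as words get longer.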
With one additional check bit, such codes can also detect double errors:
SECDED (Single Error Correction / Double Error Detection).
Example: a 16-bit word X = (x0, x1, ···, x15) protected by 5 check bits (c0, c1, c2, c3, c4).
Each check bit ci is the even parity (XOR) of a distinct subset of the data bits, chosen so
that every data bit is covered by a unique combination of at least two check bits; for
example, x0 is covered only by c3 and c4.
Stored word: (x0, x1, ···, x15, c0, c1, c2, c3, c4)
Word read back: (x0r, x1r, ···, x15r, c0r, c1r, c2r, c3r, c4r)
Calculate a new set of check bits (c0*, c1*, c2*, c3*, c4*) from (x0r, x1r, ···, x15r).
The error vector is $E = (c_0^r \oplus c_0^*, c_1^r \oplus c_1^*, c_2^r \oplus c_2^*, c_3^r \oplus c_3^*, c_4^r \oplus c_4^*)$.
If E = (0, 0, 0, 0, 0), then no detectable error has occurred.
If E = (0, 0, 0, 1, 1), then a single fault in the bit common only to c3 and c4, namely x0, is detected.
The error caused x0 to become its complement $\bar{x}_0$; it is corrected by complementing x0r.
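The encode, recompute, and compare procedure can be sketched in a few lines. The text fixes only that x0 is covered by c3 and c4, so this sketch uses a hypothetical coverage table `COVER`: x0 gets {c3, c4} and the other data bits get distinct 2- and 3-element subsets of the check bits, chosen here purely for illustration. With only 5 check bits it corrects single errors but is not guaranteed to detect every double error.

```python
from itertools import combinations

N_DATA, N_CHECK = 16, 5

# Hypothetical coverage table: x0 -> {c3, c4} as in the text; the rest are illustrative.
_pool = [frozenset(s) for k in (2, 3) for s in combinations(range(N_CHECK), k)]
_pool.remove(frozenset({3, 4}))
COVER = [frozenset({3, 4})] + _pool[: N_DATA - 1]


def check_bits(x):
    """Even-parity check bits: c_i is the XOR of every data bit x_j it covers."""
    c = [0] * N_CHECK
    for j, bit in enumerate(x):
        for i in COVER[j]:
            c[i] ^= bit
    return c


def correct(xr, cr):
    """Recompute check bits from the received data, form E, and fix a single error."""
    cstar = check_bits(xr)
    e = tuple(a ^ b for a, b in zip(cr, cstar))       # error vector E
    ones = frozenset(i for i, b in enumerate(e) if b)
    fixed = list(xr)
    if not ones:
        return fixed, e, "no detectable error"
    if len(ones) == 1:                                # only a check bit is wrong
        return fixed, e, f"check bit c{min(ones)} in error"
    if ones in COVER:
        j = COVER.index(ones)
        fixed[j] ^= 1                                 # complement the faulty data bit
        return fixed, e, f"data bit x{j} corrected"
    return fixed, e, "uncorrectable error"
```

Flipping x0 in the received word yields E = (0, 0, 0, 1, 1), and `correct` complements x0r back, matching the example above.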
Number Format
Factors in choosing a number format:
1. The types of numbers to be represented: integers, real numbers.
2. The range of values.
3. The precision of values.
4. Hardware complexity.
Binary numbers
- sign-magnitude: x0 x1 ··· xn-1, where x0 is the sign and x1 ··· xn-1 is the magnitude
+5: 0101
–5: 1101
One's complement representation
- positive number: same as sign-magnitude
- negative number: bitwise logical complement of the positive number
+5: 0101
–5: 1010
- two representations of 0: +0 = 0···0, –0 = 1···1
Two's complement representation
- positive number: same as sign-magnitude
- negative number: take the bitwise complement, then
add 1 to the least significant bit, ignoring any
carry generated from the most significant bit
–5: 0101 → 1010 → 1011
- unique representation of 0
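The three representations can be generated as bit strings to check the ±5 examples. This is a minimal sketch with hypothetical function names; n is the word length including the sign bit. For two's complement it uses v mod 2^n, which gives the same pattern as complementing, adding 1, and discarding the carry out of the most significant bit.

```python
def sign_magnitude(v: int, n: int) -> str:
    """Sign bit x0, then the magnitude in x1 .. x(n-1)."""
    if abs(v) >= 2 ** (n - 1):
        raise ValueError("out of range")
    return ("1" if v < 0 else "0") + format(abs(v), f"0{n - 1}b")


def ones_complement(v: int, n: int) -> str:
    """Negative numbers are the bitwise complement of the positive pattern."""
    if abs(v) >= 2 ** (n - 1):
        raise ValueError("out of range")
    pattern = format(abs(v), f"0{n}b")
    if v >= 0:
        return pattern
    return "".join("1" if b == "0" else "0" for b in pattern)


def twos_complement(v: int, n: int) -> str:
    """Equivalent to complement-then-add-1 with the final carry ignored."""
    if not -(2 ** (n - 1)) <= v < 2 ** (n - 1):
        raise ValueError("out of range")
    return format(v % (2 ** n), f"0{n}b")
```

With n = 4, -5 becomes "1101", "1010", and "1011" respectively, matching the examples above.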
IEEE 754 standard 32-bit floating-point number format
bit 0: sign bit S | bits 1-8: exponent E (excess-127 binary integer) | bits 9-31: mantissa M (23 bits)
The number is stored in sign-magnitude form: S gives the sign and the mantissa gives the magnitude.
The magnitude part of a normalized sign-magnitude number has 1 as its most
significant digit, so there is no need to store this 1.
The complete mantissa, called the significand, is therefore 1.M,
and the precision is effectively increased by 1 bit.
The actual exponent value is computed as E – 127.
Shifting the significand 1 bit right (left) while incrementing (decrementing) E by 1 leaves the value unchanged.
$N = (-1)^{S} \, 2^{E-127} (1.M)$ for $0 < E < 255$
Examples (S | E | M):
1 | 01111111 | 100···0: $N = (-1)^{1} \, 2^{127-127} (1.5) = -1.5$
0 | 01111111 | 000···0: $N = (-1)^{0} \, 2^{127-127} (1.0) = 1$
0 | 01111111 | 110···0: $N = (-1)^{0} \, 2^{127-127} (1.75) = 1.75$
Magnitude range: $1 \times 2^{-126} \sim (2 - 2^{-23}) \times 2^{127}$
32-bit fixed-point number range: $2^{-32} \sim 2^{31} - 1$
If the result of a floating-point operation is not a valid floating-point number,
then a special code called not-a-number (NaN) is used.
If E = 255 and M ≠ 0, then N = NaN
If E = 255 and M = 0, then $N = (-1)^{S} \infty$
If 0 < E < 255, then $N = (-1)^{S} \, 2^{E-127} (1.M)$
If E = 0 and M ≠ 0, then N is a denormalized number, $(-1)^{S} \, 2^{-126} (0.M)$, representing underflow
If E = 0 and M = 0, then N = 0
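The case rules above can be turned into a decoder for a 32-bit pattern and cross-checked against Python's own single-precision conversion. In this sketch the function names are hypothetical; note that the text numbers the sign as bit 0 (leftmost), which is bit 31 of a Python integer.

```python
import math
import struct


def decode_ieee754(word: int) -> float:
    """Decode a 32-bit single-precision pattern using the E/M case rules."""
    s = (word >> 31) & 0x1            # sign bit S (bit 0 in the text's numbering)
    e = (word >> 23) & 0xFF           # 8-bit excess-127 exponent E
    m = word & 0x7FFFFF               # 23-bit mantissa field M
    sign = -1.0 if s else 1.0
    if e == 255:
        return math.nan if m else sign * math.inf
    if e == 0:
        # M = 0: signed zero; M != 0: denormalized number (0.M) x 2^-126
        return sign * (m / 2**23) * 2.0**-126
    return sign * (1 + m / 2**23) * 2.0 ** (e - 127)   # (1.M) x 2^(E-127)


def reference(word: int) -> float:
    """Cross-check: let the struct module interpret the same four bytes."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]
```

The three worked examples correspond to the patterns 0xBFC00000 (-1.5), 0x3F800000 (1), and 0x3FE00000 (1.75).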
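Floating-point round-off error
: caused by the fact that every number must be represented by a limited number
of bits.
$N_1 + N_2 = M_1 2^{e_1} + M_2 2^{e_2}$
For example (in binary),
$1.1 \times 2^{3} + 1.01 \times 2^{2} = 2^{2} (11 + 1.01) = 2^{2} \times 100.01 = 1.0001 \times 2^{4}$
With 23-bit mantissas, however, aligning the exponents can push bits out of the field:
$1.10\cdots01 \times 2^{3} + 1.010\cdots01 \times 2^{2} = 2^{3} (1.10\cdots01 + 0.1010\cdots01)$
Here M1 = 10···01 and M2 = 01···01. Aligning N2 to exponent 3 shifts its significand
one place right, $1.M_2 \times 2^{-1} = 0.1010\cdots01$, and the lowest-order 1 is
shifted out of the 23-bit field and lost.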
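The alignment step can be reproduced with integer significands and compared against exact rational arithmetic. This is a sketch under stated assumptions: a 23-bit mantissa field, e1 ≥ e2, bits shifted out during alignment are simply truncated, and the sum is not renormalized, so only the alignment loss is shown. The function names are hypothetical.

```python
from fractions import Fraction

FRAC = 23  # mantissa field width, as in IEEE 754 single precision


def value(sig: int, exp: int) -> Fraction:
    """Exact value of a significand with FRAC fraction bits, times 2**exp."""
    return Fraction(sig, 1 << FRAC) * Fraction(2) ** exp


def aligned_sum(m1: int, e1: int, m2: int, e2: int):
    """Add (1.M1) * 2**e1 and (1.M2) * 2**e2 with e1 >= e2 in a FRAC-bit datapath.
    Returns (exact_sum, computed_sum, shifted_out_bits)."""
    s1 = (1 << FRAC) | m1                 # significand 1.M1 as an integer
    s2 = (1 << FRAC) | m2                 # significand 1.M2 as an integer
    shift = e1 - e2
    lost = s2 & ((1 << shift) - 1)        # low-order bits pushed out by alignment
    computed = value(s1 + (s2 >> shift), e1)
    exact = value(s1, e1) + value(s2, e2)
    return exact, computed, lost
```

For the first example, `aligned_sum(1 << 22, 3, 1 << 21, 2)` loses nothing and both sums equal 17 (binary 1.0001 × 2^4). For M1 = 10···01 and M2 = 01···01 one bit is shifted out, and the computed sum is smaller than the exact one by 2^-21.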
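Example of matrix multiplication: accumulation of round-off errors
A × B = C, where

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & \ddots & \vdots \\
a_{n1} & \cdots & & a_{nn}
\end{pmatrix}
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & & \ddots & \vdots \\
b_{n1} & \cdots & & b_{nn}
\end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & & \ddots & \vdots \\
c_{n1} & \cdots & & c_{nn}
\end{pmatrix},
\qquad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
$$

Each c_ij is a sum of n rounded products, so the individual round-off errors accumulate.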
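The accumulation effect can be observed by computing a product twice: once rounding every product and partial sum to single precision, and once in exact rational arithmetic. This sketch emulates single precision with the standard `struct` module (a float32 operation computed in double and then rounded once gives the correctly rounded single-precision result); the function names are hypothetical.

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul_f32(A, B):
    """C = A x B with every product and every partial sum rounded to single precision."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc = f32(acc + f32(A[i][k] * B[k][j]))   # c_ij += a_ik * b_kj
            C[i][j] = acc
    return C


def matmul_exact(A, B):
    """Reference product in exact rational arithmetic (no round-off at all)."""
    m = len(B)
    return [[sum((Fraction(A[i][k]) * Fraction(B[k][j]) for k in range(m)), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]
```

Multiplying a 1 × 1000 row of `f32(0.1)` by a 1000 × 1 column of ones makes c_11 a sum of 1000 rounded terms, and the single-precision result differs from the exact value because a rounding error is committed at every addition.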
3.3 Instruction sets
An instruction specifies an operation to be carried out and the set of operands or data
to be used, for example
X1 := f(X1, X2, ···, Xn)
Basic Instruction Format (bits 0 to N–1): opcode | operands
Addressing modes : How to specify the current value of data X
- immediate addressing
: when data X is constant, its value can be placed in the operand field
- direct addressing
: the corresponding operand field contains the address X of the storage
location containing the required value
- indirect addressing
: the instruction contains the address W of a storage location which in turn
contains the address X of the desired operand
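Intel 8085 examples:
MVI A, 99 : immediate addressing (the constant 99 is loaded into register A)
MOV A, B : direct addressing (register B, named directly in the instruction, is copied into A)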
absolute addressing: requires the complete operand address to appear in
the instruction operand field
relative addressing: the operand field contains a relative address A, and the
effective address of the operand is some function of A and the contents of a register R, typically R + A
The reasons for relative addressing
① Since not all the address information needs to be included in the instruction,
instruction length is reduced.
② By changing the contents of R, the processor can change the absolute
addresses referred to by a block of instructions; R used this way is a base register.
③ R can be used for storing indices to facilitate the processing of indexed data;
R used this way is an index register.
Disadvantage of relative addressing
: extra hardware and extra processing time are needed to calculate the effective address.
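The addressing modes described above differ only in how many memory references are needed to reach the operand. This sketch assumes a flat memory (a Python list) and a register file (a dict); the function and register names are hypothetical.

```python
def operand(mode, field, memory, regs, r=None):
    """Return the operand selected by the instruction's operand field."""
    if mode == "immediate":      # the field is the data value itself
        return field
    if mode == "direct":         # the field is the address X of the data
        return memory[field]
    if mode == "indirect":       # the field is an address W that holds address X
        return memory[memory[field]]
    if mode == "relative":       # effective address = contents of register R + field
        return memory[regs[r] + field]
    raise ValueError(f"unknown mode {mode!r}")
```

With `memory[3] = 7`, `memory[7] = 42`, and `regs["R"] = 10`, the field 3 yields 3 (immediate), 7 (direct), and 42 (indirect), while the field 2 in relative mode reads M(12). Changing R relocates every relative reference at once, which is the base-register use listed above, and the extra addition is the effective-address cost noted as the disadvantage.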
Number of addresses
: The fewer the addresses, the shorter the instruction.
However, fewer addresses mean more primitive instructions and therefore longer programs.
A 3-address machine
ADD Z, X, Y : add the contents of memory locations X and Y and place the
result in Z (Z := X + Y)
A 2-address machine
ADD X, Y : X := X + Y (or AC := X + Y)
A 1-address machine
ADD X : AC := AC + X
A 0-address machine
ADD : all operands must be in the top positions of the stack
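A 0-address machine can be sketched as a stack interpreter in which only PUSH and POP carry an address. The opcode names and the dict-based memory are hypothetical, used only for illustration.

```python
def run_stack(program, memory):
    """Execute (op, *args) tuples on a hypothetical zero-address stack machine."""
    stack = []
    for op, *args in program:
        if op == "PUSH":                  # push M(adr) onto the stack
            stack.append(memory[args[0]])
        elif op == "POP":                 # M(adr) := top of stack
            memory[args[0]] = stack.pop()
        elif op == "ADD":                 # no address field: use the top two entries
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory
```

Computing Z := X + Y takes four instructions here (PUSH X, PUSH Y, ADD, POP Z) versus the single 3-address instruction ADD Z, X, Y, which illustrates the trade-off between short instructions and longer programs.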
Instruction types: what types of instructions should be included in a general-purpose
processor?
Requirements of an instruction set
① It should be complete: able to evaluate any computable function.
② It should be efficient: frequently required functions can be performed
rapidly using relatively few instructions.
③ It should be regular.
④ It should be compatible (e.g., with earlier machines) to reduce hardware and software design costs.
There is no standard machine.
Five main types of instructions
① Data-transfer instructions, which copy information from one location to
another either in the processor’s internal register set or in the external main
memory.
② Arithmetic instructions, which perform operations on numerical data.
③ Logical instructions, which include Boolean and other non-numerical
operations.
④ Program-control instructions, such as branch instructions, which change
the sequence in which programs are executed.
⑤ Input-output( IO ) instructions, which cause information to be transferred
between the processor or its main memory and external IO devices.
RISC versus CISC
As hardware becomes cheaper, instructions tend to increase in both number and
complexity.
Suppose that a particular complex operation F can be implemented either by
a single complex instruction IF or by a multi-instruction routine PF composed of
simple instructions.
Execution of PF will be slower than that of IF because of the extra instruction-fetch time.
PF occupies more memory space than IF.
IF adds to the complexity of the control unit, thus increasing the size of the
processor and its design time.
Assembly language: programming is simpler when IF is available.
High-level language
: The improvement in execution speed offered by IF may not be fully realizable.
A compiler will translate F into the corresponding instruction IF, which uses
fixed CPU registers and has a fixed execution time. If IF is not available, an
efficient “optimizing” compiler may be able to generate object code OF
corresponding to PF that exploits information known at compilation time to
reduce the execution time for F.
The speed gap between IF and PF can be narrowed by designing the small
instruction set required for PF so as to reduce the instruction fetch and execution
cycle times as far as possible. Another speed advantage of PF over IF is that
PF can be interrupted in mid-operation, whereas IF must proceed to termination
before the CPU can respond to an interrupt.
The main features of RISC (Reduced Instruction Set Computer)
1. Relatively few instruction types and addressing modes.
2. Fixed and easily decoded instruction formats.
3. Fast single-cycle instruction execution (the main point).
4. Hardwired rather than microprogrammed control.
5. Memory access is limited mainly to load and store instructions.
A large number of registers in the CPU: most RISC instructions involve only
register-to-register operations internal to the CPU.
6. Use of compilers to optimize object-code performance.
Key point: efficient compilation, through cooperation between the architects
of the machine and the compiler writers.
In scientific computing applications with lots of floating-point arithmetic, CISC
is better.
RISC I microprocessor (by Patterson)
A single-chip 32-bit CPU with 138 32-bit general-purpose registers, designed
- to achieve single-cycle execution with instructions of fixed size (all
instructions are 32 bits long),
- to access main memory with load and store instructions only,
- to provide some support for high-level languages.
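Instruction formats of RISC I (bit 0 = leftmost)
Short-immediate format: bits 0-6 opcode | bit 7 set condition code (SCC) | bits 8-12 destination RD | bits 13-17 source 1 RS | bit 18 immediate flag | bits 19-31 source 2 S2
Long-immediate format: bits 0-6 opcode | bit 7 SCC | bits 8-12 destination RD | bits 13-31 relative address Y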
Most instructions are register-to-register type:
RD := f(RS, S2), where the rightmost 5 bits of S2 define a second source register.
If bit 18 is set to 1, then S2 is interpreted as a 13-bit constant or immediate
address. In this case, S2 is automatically expanded to 32 bits by sign
extension.
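Field extraction and sign extension for the short-immediate format can be sketched with shifts and masks. This assumes the field order opcode, SCC, RD, RS, immediate flag, S2 with bit 0 leftmost, so figure bit b is integer bit 31 - b; the function name and the dict keys are hypothetical.

```python
def decode_risc1(word: int) -> dict:
    """Split a 32-bit word into the assumed RISC I short-immediate fields."""
    fields = {
        "opcode": (word >> 25) & 0x7F,   # bits 0-6
        "scc": (word >> 24) & 0x1,       # bit 7: set condition code
        "rd": (word >> 19) & 0x1F,       # bits 8-12: destination register
        "rs": (word >> 14) & 0x1F,       # bits 13-17: source 1 register
        "imm": (word >> 13) & 0x1,       # bit 18: S2 holds a constant
    }
    s2 = word & 0x1FFF                   # bits 19-31: 13-bit S2 field
    if fields["imm"]:
        # sign extension: bit 12 of S2 is the sign of the 13-bit constant
        fields["s2"] = s2 - (1 << 13) if s2 & 0x1000 else s2
    else:
        fields["rs2"] = s2 & 0x1F        # rightmost 5 bits name a second source register
    return fields
```

A 13-bit field of all ones except the last bit (0x1FFE) decodes to -2, so short negative offsets and constants fit directly in the instruction.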
Memory is addressed by using RS as an index register and S2 as
a 13-bit offset (effective address RS + S2, operand M(RS + S2)).
Setting RS = R0 (which always contains 0) → direct addressing
Setting S2 = 0 → register-indirect addressing
No explicit I/O instructions: memory-mapped I/O is used.
Multiple-word operations
: supported through add/subtract-with-carry instructions. For example, the 64-bit addition
C := A + B is done as follows:
① Apply the ADD instruction to the right halves of A and B.
② Apply the ADDC instruction to the left halves of A and B.
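The two-step 64-bit addition can be checked by modelling a 32-bit adder that produces a carry-out and accepts a carry-in. This sketch assumes unsigned operands below 2^64 that wrap modulo 2^64; `add32` stands in for ADD (carry-in 0) and ADDC (carry-in from the previous step), and both names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int, carry_in: int = 0):
    """32-bit add returning (sum, carry_out), like ADD (carry_in=0) or ADDC."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def add64(a: int, b: int) -> int:
    """C := A + B on a 32-bit datapath: ADD on right halves, then ADDC on left halves."""
    lo, carry = add32(a & MASK32, b & MASK32)      # step 1: ADD, sets the carry
    hi, _ = add32(a >> 32, b >> 32, carry)          # step 2: ADDC uses that carry
    return (hi << 32) | lo
```

Adding 1 to 0xFFFFFFFF shows the carry moving from the right half into the left half: the result is 0x100000000.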
Logical instructions
: AND, OR, XOR, SLL (shift left logical), SRL (shift right logical), SRA (shift right arithmetic)
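The difference between the logical and arithmetic right shifts lies only in what fills the vacated high-order bits. This sketch assumes 32-bit registers holding unsigned bit patterns; the function names mirror the mnemonics but are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sll(x: int, n: int) -> int:
    """Shift left logical: zeros enter from the right, bits fall off the left."""
    return (x << n) & MASK32


def srl(x: int, n: int) -> int:
    """Shift right logical: zeros enter from the left."""
    return (x & MASK32) >> n


def sra(x: int, n: int) -> int:
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret as a negative two's complement value
    return (x >> n) & MASK32      # Python's >> on negative ints is arithmetic
```

Shifting 0x80000000 right by 4 gives 0x08000000 with SRL but 0xF8000000 with SRA, so SRA divides a negative two's complement number by 2^n (rounding toward minus infinity) while SRL does not.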
Program control
: includes hardware support for parameter passing in high-level languages.
RISC I allows the passing of parameters during subroutine calls and returns to
be done rapidly using its CPU registers (cf. most computers employ a memory
stack, which is slow).
Each subroutine is assigned, from the 138 CPU registers, a virtual set of 32 registers
(R:0 ~ R:31), called a register window, including its input and output
parameters. When subroutine A calls subroutine B, the
register window assigned to B overlaps that of A: the output
parameter part of A's window and the input parameter part of B's window are
assigned to the same physical registers, so parameters are accessed immediately.
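The overlap between windows can be expressed as a mapping from (window number, virtual register) to a physical register. This sketch assumes the RISC I split of 10 global registers (R:0-R:9), 6 low registers for outgoing parameters (R:10-R:15), 10 locals (R:16-R:25), and 6 high registers for incoming parameters (R:26-R:31), with 8 windows used circularly; window-overflow handling is omitted, and the function name is hypothetical.

```python
N_GLOBAL, N_WINDOWS, WINDOW_STRIDE = 10, 8, 16   # assumed RISC I organization
TOTAL = N_GLOBAL + N_WINDOWS * WINDOW_STRIDE     # 10 + 8 * 16 = 138 registers


def physical(w: int, r: int) -> int:
    """Map virtual register R:r of window w to a physical register number."""
    if not 0 <= r <= 31:
        raise ValueError(f"bad register R:{r}")
    if r < 10:                          # globals: shared by every window
        return r
    if r >= 26:                         # high: this window's incoming parameters
        off = WINDOW_STRIDE * w + (r - 26)
    elif r >= 16:                       # locals: private to this window
        off = WINDOW_STRIDE * w + 6 + (r - 16)
    else:                               # low: same slots as the next window's high
        off = WINDOW_STRIDE * (w + 1) + (r - 10)
    return N_GLOBAL + off % (N_WINDOWS * WINDOW_STRIDE)
```

Under these assumptions, caller window 0 writes an argument to R:10 and callee window 1 reads it as R:26 from the same physical register, so no copying or memory stack traffic is needed.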
3.3.3 Programming Considerations
Assembly language programming
Assembler: translates assembly-language instructions into the equivalent
machine instructions.
Pseudo-instruction: a directive to the assembler; not part of the object code.
Macro instruction (macro): its expansion is part of the object code.
Subroutine: part of the object code.
Macros and subroutines are tools to simplify the program.