Download Slides 4 - USC Upstate: Faculty

SCSC 311 Information Systems:
hardware and software
Objectives
CPU execution cycle
CPU instructions
Instruction format
CPU design: CISC vs. RISC
CPU registers
Enhancing CPU performance
The limitations of semiconductor-based microprocessors
Review: CPU Components

Control unit
  Moves data and instructions between main memory and registers
Arithmetic logic unit (ALU)
  Performs computation and comparison operations
Set of registers
  Storage locations that hold inputs and outputs for the ALU
A complex chain of events occurs when the CPU executes a program.
Actions Performed by CPU
Fetch cycle (instruction cycle), Control Unit:
1. Fetches an instruction from primary storage
2. Increments instruction pointer to location of next instruction
3. Decode: Separates instruction into components (instruction code and data inputs)
4. Stores each component in a separate register
Execution cycle, ALU:
1. Retrieves instruction code from a register
2. Retrieves data inputs from registers
3. Passes data inputs through internal circuits to perform data transformation
4. Stores results in a register
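The fetch and execution cycles listed above can be sketched as a small loop. This is only an illustration under stated assumptions: the op codes (HALT, LOAD, ADD, STORE), the single accumulator register, and the (op_code, operand) instruction layout are hypothetical and do not describe any real CPU.

```python
# Hypothetical op codes for a toy machine; they do not belong to any real CPU.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3

def run(program, data):
    """Execute `program`, a list of (op_code, operand) pairs, against `data`."""
    instruction_pointer = 0
    accumulator = 0  # a single general-purpose register (assumption)
    while True:
        # Fetch cycle (control unit)
        instruction = program[instruction_pointer]  # 1. fetch the instruction
        instruction_pointer += 1                     # 2. point at the next one
        op_code, operand = instruction               # 3-4. decode into components
        # Execution cycle
        if op_code == HALT:
            return accumulator, data
        elif op_code == LOAD:
            accumulator = data[operand]
        elif op_code == ADD:
            accumulator = accumulator + data[operand]
        elif op_code == STORE:
            data[operand] = accumulator
        else:
            raise ValueError(f"unknown op code {op_code}")

# Add the values at data addresses 0 and 1, store the sum at address 2.
program = [(LOAD, 0), (ADD, 1), (STORE, 2), (HALT, 0)]
result, memory = run(program, [2, 3, 0])
```

Here the program and its data are kept in separate lists only to keep the sketch short; in a real machine both live in primary storage.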
Instructions and Instruction Sets

Instruction
  The lowest-level command to the CPU
  A bit string, logically divided into components (op code and operands)
    Op code: the unique binary number that represents an instruction
    Operands: the input values for the instruction (data or addresses)
Instruction set: the collection of instructions that a CPU can process
  Varies among different CPUs
Three types of instructions
  data movement
  data transformation
  sequence control
Three types of instructions
1. Data Movement Instructions
  Copy data among registers, primary storage, secondary storage, and I/O devices
    Load: data transfer from RAM to a register
    Store: data transfer from a register to RAM
2. Data Transformation Instructions (details later)
  Boolean operations (NOT, AND, OR, XOR)
  Addition (ADD)
  Bit manipulation (SHIFT)
    Logical shift
    Arithmetic shift
Three types of instructions
3. Sequence control instructions alter the flow of instruction execution
  Branch instruction
    Normally the control unit fetches the next sequential instruction from RAM at the end of each execution cycle; a branch instruction departs from that sequence.
    Q: How would the control unit accomplish this?
    Ans: In a branch instruction, one operand contains the RAM address of the next instruction, and that address is loaded into the PC (program counter) register.
    Unconditional branch: always departs from the normal sequential execution sequence
    Conditional branch: departs from the normal sequential execution sequence only if a specific condition is met
      The control unit checks a register that contains the result of a Boolean operation
  Halt instruction: suspends the flow of instruction execution in the current program.
    Q: What happens when an executing program halts?
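A minimal sketch of how branching changes the program counter, assuming a toy machine with hypothetical op codes (DEC, BRANCH, BRANCH_IF_NONZERO, HALT) and one accumulator register. The only point is that a branch overwrites the PC with its operand instead of letting it advance sequentially.

```python
# Hypothetical op codes for illustration only.
HALT, DEC, BRANCH, BRANCH_IF_NONZERO = 0, 1, 2, 3

def count_fetches(program, start_value):
    """Run `program` and return how many instructions were fetched."""
    pc = 0                # program counter
    acc = start_value     # accumulator tested by the conditional branch
    fetched = 0
    while True:
        op_code, operand = program[pc]
        pc += 1           # default: next sequential instruction
        fetched += 1
        if op_code == HALT:
            return fetched
        elif op_code == DEC:
            acc -= 1
        elif op_code == BRANCH:
            pc = operand              # unconditional: always load the PC
        elif op_code == BRANCH_IF_NONZERO:
            if acc != 0:              # conditional: load the PC only if the test holds
                pc = operand
        else:
            raise ValueError(f"unknown op code {op_code}")

# A countdown loop: decrement, jump back to address 0 while nonzero, then halt.
countdown = [(DEC, None), (BRANCH_IF_NONZERO, 0), (HALT, None)]
```

With a start value of 3 the loop body is fetched three times before the halt instruction is reached.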
Six Data Transformation Instructions
• The rules of NOT, AND, OR, XOR, ADD
• ADD
• SHIFT (next slide)
(self-study)
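For self-study, the Boolean rules and ADD can be tried on 8-bit values with the short sketch below. The 8-bit width and the function names are assumptions made for illustration; Python integers are unbounded, so results are masked to 8 bits to mimic a fixed-width register.

```python
MASK = 0xFF  # assume 8-bit registers

def bit_not(a):
    return ~a & MASK          # flip every bit

def bit_and(a, b):
    return a & b              # 1 only where both inputs are 1

def bit_or(a, b):
    return a | b              # 1 where either input is 1

def bit_xor(a, b):
    return a ^ b              # 1 where the inputs differ

def add(a, b):
    return (a + b) & MASK     # the carry out of the top bit is discarded

a, b = 0b1100, 0b1010
table = {
    "NOT a": format(bit_not(a), "08b"),
    "a AND b": format(bit_and(a, b), "08b"),
    "a OR b": format(bit_or(a, b), "08b"),
    "a XOR b": format(bit_xor(a, b), "08b"),
}
```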
SHIFT Instruction

Two types of SHIFT instruction
  Logical SHIFT and arithmetic SHIFT
Logical SHIFT
Q: What can a logical SHIFT instruction do?
Ans: A logical SHIFT instruction can extract a single bit from a bit string.
  E.g. 1, extract the fourth bit and move it to the rightmost position
  E.g. 2, extract and check the sign bit of a 2’s complement number
Arithmetic SHIFT
Arithmetic SHIFT instructions perform multiplication or division.
• For unsigned binary numbers: e.g. 1, multiply by 2; e.g. 2, divide by four
• For 2’s complement numbers: the sign bit must be preserved first
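The shift examples above can be sketched as follows. The sketch assumes 8-bit values and counts bit positions from the right starting at 1; both choices, and all function names, are illustrative assumptions rather than anything stated on the slides.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def logical_shift_left(value, n):
    return (value << n) & MASK      # bits shifted past the left edge are lost

def logical_shift_right(value, n):
    return (value & MASK) >> n      # zeros enter from the left

def extract_bit(value, position):
    """Isolate one bit using shifts only (position 1 = rightmost bit)."""
    pushed_left = logical_shift_left(value, WIDTH - position)  # drop higher bits
    return logical_shift_right(pushed_left, WIDTH - 1)         # drop lower bits

def sign_bit(value):
    return logical_shift_right(value, WIDTH - 1)

def arithmetic_shift_right(value, n):
    """Divide a 2's complement value by 2**n, keeping the sign bit in place."""
    sign = value & (1 << (WIDTH - 1))
    result = value & MASK
    for _ in range(n):
        result = (result >> 1) | sign
    return result
```

For unsigned values, `logical_shift_left(5, 1)` multiplies by 2 and `logical_shift_right(20, 2)` divides by four. For the 2's complement value 0xF8 (that is, -8), `arithmetic_shift_right(0xF8, 2)` keeps the sign bit set while dividing by four.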
Complex Processing Operations

Complex processing operations can be
implemented by combining the primitive
instructions


Examples …
Most CPUs provide a much larger instruction set

Directly support complex instructions, such as
multiplication, division, …
Q: Why include these complex instructions in CPU
instruction set?
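As one possible illustration of building a complex operation from primitive instructions, the sketch below multiplies two non-negative integers using only AND, ADD, and SHIFT. It is a hedged example of the general idea (the classic shift-and-add method), not a description of how any particular CPU implements multiplication.

```python
def multiply(a, b):
    """Multiply two non-negative integers with AND, ADD, and SHIFT only."""
    product = 0
    while b != 0:
        if b & 1:                  # AND: is the lowest bit of b set?
            product = product + a  # ADD: accumulate the shifted multiplicand
        a = a << 1                 # SHIFT left: a doubles each round
        b = b >> 1                 # SHIFT right: move to the next bit of b
    return product
```

The loop runs once per bit of `b`, which hints at why a single hardware multiply instruction can be faster than the equivalent sequence of primitive instructions.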
Complex Processing Operations

Ans: Tradeoffs of CPU design
  Tradeoff between CPU complexity and programming simplicity
    Directly supporting complex instructions complicates circuitry but reduces programming complexity
  Tradeoff between CPU complexity and program execution speed
    Multi-step instruction sequences execute faster if they are executed within hardware as a single instruction, avoiding overhead
  Note: additional instructions are required when new data types are added
    E.g., if the double-precision floating-point data type is supported by the CPU, a set of instructions is required for this data type
Instruction Format

Instruction format is a template that
  specifies the number of operands
  specifies the position and length of the op code and operand(s)
Since instructions vary in the number and type of operands, CPUs support multiple instruction formats (in the next slide)
Instruction formats vary among CPUs:
  op code size
  meaning of specific op codes
  length and coding format of operands
  Etc. ...
(1) A 20-bit instruction that uses register inputs and output: 8-bit op code, three 4-bit operands that store register numbers
(2) A 32-bit load and store instruction: 8-bit op code, two 4-bit operands, one 16-bit operand
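The two formats above can be packed and unpacked with shifts and masks, as in the sketch below. The field widths come from the slide; the field order (op code in the highest bits, operands following from left to right) and the function names are assumptions made for illustration.

```python
def encode_20bit(op_code, reg_a, reg_b, reg_c):
    """8-bit op code followed by three 4-bit register numbers."""
    if not 0 <= op_code < 256 or not all(0 <= r < 16 for r in (reg_a, reg_b, reg_c)):
        raise ValueError("field out of range")
    return (op_code << 12) | (reg_a << 8) | (reg_b << 4) | reg_c

def decode_20bit(word):
    return ((word >> 12) & 0xFF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)

def encode_32bit(op_code, reg_a, reg_b, address):
    """8-bit op code, two 4-bit operands, one 16-bit operand."""
    if not 0 <= op_code < 256 or not all(0 <= r < 16 for r in (reg_a, reg_b)):
        raise ValueError("field out of range")
    if not 0 <= address < 65536:
        raise ValueError("address out of range")
    return (op_code << 24) | (reg_a << 20) | (reg_b << 16) | address

def decode_32bit(word):
    return ((word >> 24) & 0xFF, (word >> 20) & 0xF, (word >> 16) & 0xF, word & 0xFFFF)
```

Decoding reverses encoding, so `decode_20bit(encode_20bit(0x2A, 1, 2, 3))` gives back the four original fields.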
Fixed Length Instruction vs. Variable Length Instruction
Fixed-length instructions
  Pad shorter instructions with trailing zeros
  Simplify instruction fetching (why?)
  Inefficient memory use
Variable-length instructions
  Complicate instruction fetching (why?)
  Efficient memory use
Complex Instruction Set Computing
(CISC)

CISC
  In the early days, memory was expensive and slow
  CPU designers provided complex instructions that do more work per instruction, so each complex instruction requires less memory and execution time than the equivalent sequence of simple instructions
Features of CISC
  Needs less memory for program storage and execution
  Complex instructions usually have variable instruction length
  A large instruction set complicates CPU design, so CISC CPUs are complicated and hard to manufacture
Reduced Instruction Set Computing
(RISC)

RISC is a relatively new philosophy of CPU design (1980s - )
  Absence of some complex instructions from the instruction set
    RISC CPUs do not combine data transformation and data movement in one instruction
  RISC uses fixed-length instructions, short instruction length, and a large number of general-purpose registers
Features of RISC
  Needs more memory for program storage and execution
  Inefficient at executing complex operations, which take several simple instructions
  RISC CPUs are simple and easy to manufacture
RISC vs. CISC

OSs are implemented based on CPU design
  RISC chip: e.g. Hewlett-Packard's PA-RISC processor
    Apple's Power Macintosh and some versions of Linux are implemented on RISC processors
  CISC chip: e.g. Intel Pentium, Xeon, Itanium
    Windows OSs are implemented on CISC processors
Pros and Cons of CISC and RISC
e.g. c = a * b
Q 1: Which CPU design is better ?
Q 2: Why does Intel use CISC design?
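The c = a * b example can be sketched on a toy machine to contrast the two styles. Everything here is hypothetical: the mnemonics (MULMEM, LOAD, MUL, STORE), the register names, and the dictionary used as memory are invented for illustration and do not correspond to any real instruction set.

```python
def run(program, memory):
    """Execute a list of instruction tuples against `memory` (a dict)."""
    registers = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD register, address
            registers[args[0]] = memory[args[1]]
        elif op == "STORE":     # STORE register, address
            memory[args[1]] = registers[args[0]]
        elif op == "MUL":       # MUL dest, src1, src2 (registers only)
            registers[args[0]] = registers[args[1]] * registers[args[2]]
        elif op == "MULMEM":    # MULMEM dest, src1, src2 (memory addresses)
            memory[args[0]] = memory[args[1]] * memory[args[2]]
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory

# CISC style: one complex instruction combines data movement and transformation.
cisc_program = [("MULMEM", "c", "a", "b")]

# RISC style: data movement and data transformation are separate instructions.
risc_program = [
    ("LOAD", "r1", "a"),
    ("LOAD", "r2", "b"),
    ("MUL", "r3", "r1", "r2"),
    ("STORE", "r3", "c"),
]
```

Both programs leave the same value in `c`; the CISC version is one instruction long while the RISC version needs four simpler ones, which is the memory versus simplicity tradeoff the slides describe.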
CPU Registers

Two primary roles of registers


general-purpose registers: hold data for currently executing
program that is needed quickly or frequently
special-purpose registers: store information about currently
executing program and status of CPU
General-Purpose Registers

General-purpose registers hold intermediate
results and frequently needed data items




like a scratch-pad for CPU
Used only by currently executing program
Implemented within the CPU, so that contents can
be read or written quickly
Increasing general-purpose registers usually
decreases program execution time, to a point
Q: Why is that?
Special-Purpose Registers

CPU uses special-purpose registers to track processor
and program status

Some special purpose registers



Instruction register
Instruction pointer (a.k.a. program counter)
Program status word (PSW)



each bit in PSW is called a flag
At the end of each execution cycle, control unit tests PSW
flags to determine whether an error has occurred.
Examples of PSW bits …
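One way to picture PSW flags is as individual bits set after an ALU operation. The sketch below assumes an 8-bit ADD and four commonly used flags (zero, negative, carry, overflow); the flag names and their bit positions are hypothetical and are not taken from the slides.

```python
# Hypothetical bit positions within the program status word.
ZERO, NEGATIVE, CARRY, OVERFLOW = 0b0001, 0b0010, 0b0100, 0b1000

def add_with_flags(a, b):
    """Add two 8-bit values and return (result, psw)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    psw = 0
    if result == 0:
        psw |= ZERO
    if result & 0x80:
        psw |= NEGATIVE          # top bit set: negative in 2's complement
    if total > 0xFF:
        psw |= CARRY             # unsigned result did not fit in 8 bits
    # Signed overflow: inputs share a sign that differs from the result's sign.
    if ~(a ^ b) & (a ^ result) & 0x80:
        psw |= OVERFLOW
    return result, psw

def has_error(psw):
    """What a control unit might test at the end of the execution cycle."""
    return bool(psw & OVERFLOW)
```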
Word Size

A word is a unit of data that contains a fixed number of bits.
 the amount of data that a CPU processes at a time
 32-bit CPU, 64-bit CPU

Word size
 matches the size of general purpose registers
 is a fundamental CPU design decision

Implications for system bus design and implementation of RAM



the bus width should be at least as large as word size
RAM should be able to read / write a word at a time
Increasing word size usually increases CPU efficiency, up to a point
Q: Why is that?
E.g., doubling word size generally increases the number of CPU components by 2.5 to 3 times
Clock Rate

System clock




  A digital circuit that generates timing pulses (ticks) and transmits the pulses to other components
  All devices coordinate their activities with the system clock.
  Clock rate: the frequency at which the system clock generates timing pulses, measured in MHz or GHz
  CPU cycle time: the inverse of the clock rate, measured in nanoseconds
Q: A computer has a clock rate of 5 GHz. What is the CPU cycle time?
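Since cycle time is the inverse of clock rate, the question reduces to one division. A minimal sketch (the function name is an assumption):

```python
def cycle_time_ns(clock_rate_hz):
    """Return the CPU cycle time in nanoseconds for a clock rate in hertz."""
    seconds_per_cycle = 1 / clock_rate_hz
    return seconds_per_cycle * 1e9   # 1 second = 1e9 nanoseconds

five_ghz = cycle_time_ns(5e9)        # 5 GHz = 5e9 cycles per second
```

For a 5 GHz clock this works out to 0.2 nanoseconds per cycle.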
Clock Rate

The clock rate / cycle time is only one part of CPU performance measurement
The rate of actual / average instruction execution is measured in MIPS or MFLOPS
  Simple instructions need one clock cycle
  Complex instructions usually need multiple clock cycles
The CPU relies on slower devices (RAM or HD) for its supply of data
  The performance of a computer system is decided not only by the CPU, but also by other devices (RAM, system bus … )
  Example …
  Wait state: each clock cycle that the CPU spends waiting for a slower device
Q: How to enhance the performance of a computer system?
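A rough sketch of why clock rate alone does not determine instruction throughput. The model below is a simplifying assumption: every instruction costs a fixed number of cycles plus an average number of wait states, and MIPS is the clock rate divided by that total. The numbers used are invented for illustration.

```python
def effective_mips(clock_rate_hz, cycles_per_instruction, wait_states=0.0):
    """Millions of instructions per second under a simple averaged model."""
    total_cycles = cycles_per_instruction + wait_states
    return clock_rate_hz / total_cycles / 1e6

no_waiting = effective_mips(2e9, 1)       # 2 GHz, one cycle per instruction
with_waiting = effective_mips(2e9, 1, 3)  # same CPU, 3 wait states per instruction
```

In this model, three wait states per instruction cut throughput to a quarter even though the clock rate is unchanged.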
Enhancing Processor Performance
Memory caching: (in Ch 5)
Pipelining: method of organizing CPU circuitry to enable multiple instructions to execute simultaneously in different stages
Branch prediction and speculative execution: ensure the pipeline is kept full while executing conditional branch instructions
Multiprocessing: duplicate CPUs or processor stages execute in parallel
Pipelining


Basic observation: each step in the fetch and execution cycle is performed by a separate portion or stage of the CPU circuitry.
1. Fetch
2. Increment IP
3. Decode
4. Access ALU inputs
5. Execute arithmetic / comparison
6. Store ALU output
Pipelining


Organizing CPU circuitry to enable multiple instructions to be in different
stages of execution at the same time.
Similar to an assembly line
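The assembly-line idea can be quantified with a small sketch. It assumes an idealized six-stage pipeline in which every stage takes exactly one clock cycle and nothing ever stalls; real pipelines fall short of this, and the function names are illustrative.

```python
STAGES = [
    "Fetch",
    "Increment IP",
    "Decode",
    "Access ALU inputs",
    "Execute",
    "Store ALU output",
]

def cycles_without_pipeline(instruction_count, stage_count=len(STAGES)):
    """Each instruction passes through every stage before the next one starts."""
    return instruction_count * stage_count

def cycles_with_pipeline(instruction_count, stage_count=len(STAGES)):
    """The first instruction fills the pipeline; each later one finishes a cycle apart."""
    if instruction_count == 0:
        return 0
    return stage_count + instruction_count - 1

def stage_of(instruction_index, cycle):
    """Stage occupied by an instruction (0-based) during a cycle (0-based), or None."""
    position = cycle - instruction_index
    return STAGES[position] if 0 <= position < len(STAGES) else None
```

For 10 instructions the idealized pipeline needs 15 cycles instead of 60, and in cycle 2 instruction 0 is being decoded while instruction 2 is being fetched.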
Challenges in Pipelining
It is difficult to fully realize the theoretical improvement of pipelining
  It is hard to design a CPU in which every stage finishes in one clock cycle.
  Some instructions in a program are not executed sequentially.
    Conditional branch: the CPU does not know which branch to take until it evaluates a condition, so it has to break the pipeline
Branch prediction and speculative execution

Some solutions to the problem of conditional branches in pipelining
  Early evaluation
    The CPU looks several instructions ahead of the current one and examines whether there is a conditional branch instruction; if so, it tries to evaluate the conditional branch instruction earlier.
    But not always possible
  Branch prediction
    The CPU guesses which branch to take based on past experience (maybe the CPU executed this portion of code before)
    Cannot guarantee taking the correct path
  Simultaneous execution
    The CPU executes both paths until the conditional branch is evaluated, then aborts the incorrect path
    Requires redundant CPU stages and registers
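Guessing "based on past experience" can be illustrated with the simplest possible scheme: predict that a branch will do whatever it did last time. This last-outcome predictor is a swapped-in teaching example, not the method of any specific CPU, and the initial guess of "taken" is an arbitrary assumption.

```python
def prediction_accuracy(outcomes, initial_guess=True):
    """Fraction of correct guesses when predicting each outcome from the previous one.

    `outcomes` is a non-empty list of booleans, True meaning the branch was taken.
    """
    last_outcome = initial_guess
    correct = 0
    for taken in outcomes:
        if last_outcome == taken:   # the prediction is simply the last outcome
            correct += 1
        last_outcome = taken        # remember what actually happened
    return correct / len(outcomes)

loop_branch = [True] * 9 + [False]   # a loop branch: taken 9 times, then falls through
alternating = [True, False] * 5      # a branch that flips every time
```

The loop branch is predicted well because its history repeats, while the alternating branch defeats this predictor almost completely, which shows why a prediction cannot guarantee the correct path.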
Multiprocessing

Multiprocessing: a CPU architecture in which duplicate CPUs or processor stages execute in parallel.
Some approaches to multiprocessing:
  Duplicate circuitry for some or all processing stages within a single CPU (80’s)
    e.g. the Sun UltraSparc CPU duplicates ALUs and registers
  Embed multiple CPUs in a computer system, sharing memory (90’s)
  Place multiple CPUs on the same chip, sharing memory (2000 -)
    Enables multiple CPUs to communicate at higher speed
The Physical CPU

CPU is a complex system



Contains millions of switches, which perform basic
processing functions
From early CPUs with hundreds of switches to modern
CPUs with millions of switches
The physical implementation of CPUs


Switches and Gates
Are basic building blocks of computer processing
circuits
Switches and Gates

Electronic switches
  Control electrical current flow in a circuit
  Implemented as transistors
    a solid-state semiconductor device
    controls the flow of electric current
Gates
  An interconnection of switches
  A circuit that can perform a processing function on an individual binary electrical signal, or bit
The construction of switches and the properties of electricity determine the CPU’s speed and reliability.
(The symbols of gates are not required.)
Addition circuit: combines a half-adder and an array of full adders
(The detailed layout is not required.)
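The addition circuit described above (a half-adder for the lowest bit followed by a chain of full adders) can be modeled with Boolean operators standing in for gates. This is a software sketch of the logic only; the least-significant-bit-first list layout and the function names are assumptions.

```python
def half_adder(a, b):
    """Add two bits: XOR gives the sum bit, AND gives the carry bit."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Add two bits plus an incoming carry, built from two half-adders and an OR."""
    partial_sum, carry_1 = half_adder(a, b)
    total, carry_2 = half_adder(partial_sum, carry_in)
    return total, carry_1 | carry_2

def add_bits(x_bits, y_bits):
    """Add two equal-length bit lists, least significant bit first.

    Returns (sum_bits, carry_out).
    """
    total, carry = half_adder(x_bits[0], y_bits[0])   # lowest bit: no carry in
    result = [total]
    for a, b in zip(x_bits[1:], y_bits[1:]):          # remaining bits: full adders
        total, carry = full_adder(a, b, carry)
        result.append(total)
    return result, carry
```

For example, 5 + 3 with 4-bit inputs is `add_bits([1, 0, 1, 0], [1, 1, 0, 0])`, which yields the bits of 8 with no carry out.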
Electrical Properties of switches and Gates
Conductivity: ability of an element to enable electron flow (conductor)
Resistance: loss of electrical power that occurs within a conductor (electrical energy is converted to heat and/or light)
Heat: two negative effects of heat:
  Physical damage to conductor
  Changes to inherent resistance of conductor
  Dissipate heat with a heat sink or fan
Speed and circuit length: the time required to perform a processing operation is a function of the length of the circuit and the speed of light
  Miniaturization: reduce circuit length for faster processing
Processor Fabrication

Performance and reliability of processors have increased with improvements in materials and fabrication techniques
  Transistors and integrated circuits (ICs)
  Microchips and microprocessors
    First microprocessor (1971): 2,300 transistors
    Current memory chip: 300 million transistors
Small circuit size, low-resistance materials, and heat dissipation ensure fast and reliable operation
Fabricated using expensive etching processes (details are not required)
Current Technology Capabilities and
Limitations
Moore’s Law: transistor density on microchips doubles every 18-24 months with no increase in unit cost
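A doubling rule compounds quickly, which a short calculation makes concrete. The sketch below treats transistor count per chip as a stand-in for density (an assumption) and starts from the 2,300-transistor figure quoted for the first microprocessor; the projected values are what the rule implies, not measured data.

```python
def projected_transistors(start_count, years, doubling_months=24):
    """Project a count forward assuming one doubling every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return start_count * 2 ** doublings

slow = projected_transistors(2300, 30, doubling_months=24)  # 15 doublings
fast = projected_transistors(2300, 30, doubling_months=18)  # 20 doublings
```

Over 30 years the 24-month rule gives 15 doublings and the 18-month rule gives 20, so the two ends of the 18-24 month range differ by a factor of 32.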
Current Technology Capabilities and
Limitations

Rock’s Law


Arthur Rock made a short addendum to Moore’s
Law
Cost of fabrication facilities for the latest chip
generation doubles every four years

E.g., A fabrication facility using latest production
processes costs at least $10 B.
Q1: Does Rock’s Law mean CPUs are becoming more expensive?
Q2: Would Moore’s Law always be true?
Future Trends

Semiconductors are approaching fundamental physical
size limits


Further miniaturization will be more difficult to achieve
 The nature of etching process
 The limits of semiconducting materials
Some technologies may improve performance beyond
semiconductor limitations (details are not required)



Optical processing
Hybrid optical-electrical processing
Quantum processing