1. Explain Memory Hierarchy
INTRODUCTION
Computer pioneers correctly predicted that programmers would want unlimited amounts of fast
memory. An economical solution to that desire is a memory hierarchy, which takes advantage of
locality and cost-performance of memory technologies.
The principle of locality says that most programs do not access all code or data uniformly.
Locality occurs in time (temporal locality) and in space (spatial locality). This principle, plus the
guideline that smaller hardware can be made faster, led to hierarchies based on memories of
different speeds and sizes. The goal is to provide a memory system with cost per byte almost as
low as the cheapest level of memory and speed almost as fast as the fastest level.
Note that each level maps addresses from a slower, larger memory to a smaller but faster
memory higher in the hierarchy. As part of address mapping, the memory hierarchy is given the
responsibility of address checking; hence, protection schemes for scrutinizing addresses are also
part of the memory hierarchy. When a word is not found in the cache, the word must be fetched
from the memory and placed in the cache before continuing. Multiple words, called a block (or
line ), are moved for efficiency reasons. Each cache block includes a tag to see which memory
address it corresponds to.
A key design decision is where blocks (or lines) can be placed in a cache. The most popular
scheme is set associative, where a set is a group of blocks in the cache. A block is first mapped
onto a set, and then the block can be placed anywhere within that set. Finding a block consists of
first mapping the block address to the set, and then searching the set—usually in parallel—to
find the block. The set is chosen by the address of the data:
(Block address) MOD (Number of sets in cache)
If there are n blocks in a set, the cache placement is called n-way set associative. The end points
of set associativity have their own names. A direct-mapped cache has just one block per set (so a
block is always placed in the same location), and a fully associative cache has just one set (so a
block can be placed anywhere).
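The placement rule above can be made concrete with a small calculation. The following C sketch (the cache geometry and block address are illustrative assumptions, not values from the text) computes where a given block may be placed under set-associative, direct-mapped, and fully associative placement.

```c
/* Minimal sketch (assumed cache geometry): where may a block be placed? */
#include <stdio.h>

int main(void) {
    unsigned num_blocks = 128;      /* assumed total cache blocks */
    unsigned assoc      = 4;        /* n-way set associativity    */
    unsigned num_sets   = num_blocks / assoc;

    unsigned block_addr = 3000;     /* example block address */

    /* (Block address) MOD (Number of sets in cache) */
    unsigned set = block_addr % num_sets;
    printf("4-way set associative: set %u, any of %u ways\n", set, assoc);

    /* Direct mapped: one block per set, so the set IS the block position. */
    printf("direct mapped: block %u\n", block_addr % num_blocks);

    /* Fully associative: one set containing every block. */
    printf("fully associative: any of the %u blocks\n", num_blocks);
    return 0;
}
```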
Caching data that is only read is easy, since the copy in the cache and memory will be identical.
Caching writes is more difficult: how can the copy in the cache and memory be kept consistent?
There are two main strategies. A write-through cache updates the item in the cache and writes
through to update main memory. A write-back cache only updates the copy in the cache.
Miss rate is simply the fraction of cache accesses that result in a miss—that is, the number of
accesses that miss divided by the number of accesses. To gain insights into the causes of high
miss rates, which can inspire better cache designs, the three Cs model sorts all misses into three
simple categories:
Compulsory:
The very first access to a block cannot be in the cache, so the block must be brought into the
cache. Compulsory misses are those that occur even if you had an infinite cache.
Capacity:
If the cache cannot contain all the blocks needed during execution of a program, capacity misses
(in addition to compulsory misses) will occur because of blocks being discarded and later
retrieved.
Conflict:
If the block placement strategy is not fully associative, conflict misses (in addition to compulsory
and capacity misses) will occur because a block may be discarded and later retrieved if
conflicting blocks map to its set.
Some designers prefer measuring misses per instruction rather than misses per memory reference
(miss rate). These two are related:
Misses per instruction = Miss rate × Memory accesses per instruction
For speculative processors, we only count instructions that commit.
The problem with both measures is that they don't factor in the cost of a miss. Average memory
access time does:
Average memory access time = Hit time + Miss rate × Miss penalty
where Hit time is the time to hit in the cache and Miss penalty is the time to replace the block
from memory (that is, the cost of a miss). Average memory access time is still an indirect
measure of performance; although it is a better measure than miss rate, it is not a substitute for
execution time.
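To make the two measures concrete, here is a small C sketch that evaluates both formulas; the miss rate, access count, and timings are assumed example values, not figures from the text.

```c
/* Minimal sketch of the two formulas above (all numbers are assumed examples). */
#include <stdio.h>

int main(void) {
    double miss_rate          = 0.05;  /* misses per memory access (assumed)       */
    double accesses_per_instr = 1.5;   /* memory accesses per instruction (assumed)*/
    double hit_time           = 1.0;   /* clock cycles                             */
    double miss_penalty       = 20.0;  /* clock cycles                             */

    double misses_per_instr = miss_rate * accesses_per_instr;
    double amat             = hit_time + miss_rate * miss_penalty;

    printf("Misses per instruction     = %.3f\n", misses_per_instr);
    printf("Average memory access time = %.2f cycles\n", amat);
    return 0;
}
```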
LARGER BLOCK SIZE TO REDUCE MISS RATE
The simplest way to reduce the miss rate is to take advantage of spatial locality and increase the
block size. Note that larger blocks also reduce compulsory misses, but they also increase the
miss penalty.
BIGGER CACHES TO REDUCE MISS RATE
The obvious way to reduce capacity misses is to increase cache capacity. Drawbacks include
potentially longer hit time of the larger cache memory and higher cost and power.
HIGHER ASSOCIATIVITY TO REDUCE MISS RATE
Obviously, increasing associativity reduces conflict misses. Greater associativity can come at the
cost of increased hit time.
MULTILEVEL CACHES TO REDUCE MISS PENALTY
A difficult decision is whether to make the cache hit time fast, to keep pace with the increasing
clock rate of processors, or to make the cache large, to overcome the widening gap between the
processor and main memory.
Adding another level of cache between the original cache and memory simplifies the decision
(see Below Figure).
The first-level cache can be small enough to match a fast clock cycle time, yet the second-level
cache can be large enough to capture many accesses that would go to main memory.
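One common way to express the benefit is to let the L1 miss penalty become the L2 access time plus whatever fraction of L2 misses goes on to memory. The C sketch below evaluates that combined formula; all timings and miss rates are assumed example values.

```c
/* Minimal sketch (assumed example values):
 *   AMAT = HitTimeL1 + MissRateL1 * (HitTimeL2 + MissRateL2 * MissPenaltyL2) */
#include <stdio.h>

int main(void) {
    double hit_l1 = 1.0,  miss_rate_l1 = 0.05;   /* assumed L1 parameters          */
    double hit_l2 = 10.0, miss_rate_l2 = 0.20;   /* assumed local L2 miss rate     */
    double mem_penalty = 100.0;                  /* assumed main-memory penalty    */

    double amat = hit_l1 + miss_rate_l1 * (hit_l2 + miss_rate_l2 * mem_penalty);
    printf("AMAT with two cache levels = %.2f cycles\n", amat);
    return 0;
}
```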
GIVING PRIORITY TO READ MISSES OVER WRITES TO REDUCE MISS
PENALTY
A write buffer is a good place to implement this optimization. Write buffers create hazards
because they hold the updated value of a location needed on a read miss—that is, a read-after-write hazard through memory.
One solution is to check the contents of the write buffer on a read miss. If there are no conflicts,
and if the memory system is available, sending the read before the writes reduces the miss
penalty. Most processors give reads priority over writes.
AVOIDING ADDRESS TRANSLATION DURING INDEXING OF THE CACHE TO
REDUCE HIT TIME
Caches must cope with the translation of a virtual address from the processor to a physical
address to access memory. Below Figure shows a typical relationship between caches, translation
lookaside buffers (TLBs), and virtual memory.
A common optimization is to use the page offset—the part that is identical in both virtual and
physical addresses—to index the cache. The virtual part of the address is translated while the
cache is read using that index, so the tag match can use physical addresses.
This scheme allows the cache read to begin immediately, and yet the tag comparison still uses
physical addresses. The drawback of this virtually indexed, physically tagged optimization is that
the size of the page limits the size of the cache.
For example, a direct-mapped cache can be no bigger than the page size. Higher associativity can
keep the cache index in the physical part of the address and yet still support a cache larger than a
page.
For example, doubling associativity while doubling the cache size maintains the size of the
index, since it is controlled by this formula:
2^Index = Cache size / (Block size × Set associativity)
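A short C sketch (the cache sizes below are assumed examples) evaluates the formula and confirms that doubling both cache size and associativity leaves the index width, and hence the physically addressed portion of the index, unchanged.

```c
/* Minimal sketch of the index formula above; sizes are assumed examples. */
#include <stdio.h>

static unsigned index_bits(unsigned cache_bytes, unsigned block_bytes, unsigned assoc) {
    unsigned sets = cache_bytes / (block_bytes * assoc); /* 2^Index = Cache/(Block*Assoc) */
    unsigned bits = 0;
    while ((1u << bits) < sets) bits++;
    return bits;
}

int main(void) {
    printf("8 KB, 2-way : %u index bits\n", index_bits(8 * 1024, 64, 2));
    printf("16 KB, 4-way: %u index bits\n", index_bits(16 * 1024, 64, 4)); /* same width */
    return 0;
}
```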
A seemingly obvious alternative is to just use virtual addresses to access the cache, but this can
cause extra overhead in the operating system. Note that each of these six optimizations above has
a potential disadvantage that can lead to increased, rather than decreased, average memory
access time.
SPEED, SIZE AND COST
An ideal memory would be fast, large and inexpensive. A very fast memory can be implemented using SRAM chips, but these chips are expensive because each basic cell has six transistors and a large number of cells is required to build a chip of useful capacity. So it is impractical to build a large memory using SRAM chips. The alternative is to use DRAM chips, which have simpler basic cells and are therefore less expensive, but such memories are significantly slower.
• Although dynamic memory units can be implemented at a reasonable cost with capacities of hundreds of megabytes, the affordable size is still small compared to the demands of large, voluminous data. This leads to the use of secondary storage, mainly magnetic disks, to implement large memory spaces. Very large disks are available at a reasonable price, and they are used extensively in computer systems, but they are much slower than semiconductor memory units. So we conclude the following:
• A huge amount of cost-effective storage can be provided by magnetic disks.
• A large, yet affordable, main memory can be built with dynamic RAM technology.
• SRAM is used for the smaller units where speed matters most, such as cache memories.
• All of these different types of memory units are employed effectively in a computer. The entire computer memory can be viewed as a hierarchy, shown in the figure. Processor registers are the fastest, and they are at the top of the memory hierarchy.
• The next level of the hierarchy is a relatively small amount of memory that can be implemented directly on the processor chip. This memory, called a processor cache, holds copies of instructions and data from the larger memory. The cache is organized in two levels. A primary cache is located on the processor chip; it is small because it competes for space on the chip, which must implement many other functions.
Fig: Memory hierarchy (processor registers, L1 cache, L2 cache, main memory, magnetic disk; speed and cost per bit increase toward the processor)
The primary cache is referred to as the level 1 (L1) cache. A larger, secondary cache is placed between the primary cache and the rest of the memory. It is referred to as the level 2 (L2) cache. It is usually implemented using SRAM chips.
• The most common way of designing computers is to include a primary cache on the processor chip and a larger secondary cache.
• The next level in the hierarchy is called the main memory. This rather large memory is implemented using dynamic memory components, typically in the form of SIMMs, DIMMs or RIMMs. The main memory is much larger but significantly slower than the cache memory. Disk devices provide a huge amount of inexpensive storage, but they are very slow compared to semiconductor devices.
2. Write short notes on Memory Technology
• Performance metrics
– Latency: two measures
• Access time: the time between when a read is requested and when the desired word arrives.
• Cycle time: the minimum time between requests to memory.
• Usually cycle time > access time.
• DRAM
– Refresh takes less than 5% of the time; speed increases only slowly.
• SRAM, ROM and Flash Technology
– SRAM
• No refresh needed.
• 8 to 16 times faster than DRAM.
• 8 to 16 times more expensive than DRAM.
• Suitable for embedded applications.
– ROM and flash
• Non-volatile.
• Best suited to embedded processors.
• Improving Memory Performance in a Standard DRAM Chip
– A multi-bank organization provides larger bandwidth.
– Three other methods to increase bandwidth:
• Fast page mode: repeated accesses to a row without paying another row access time.
• Synchronous DRAM (SDRAM): has a programmable register to hold the number of bytes requested, and hence can send many bytes over several cycles per request, with only the overhead of synchronizing with the controller.
• Double Data Rate (DDR) DRAM: uses both the falling and rising edges of the clock for transferring data.
• RAMBUS DRAM (RDRAM)
– Each chip has interleaved memory and a high-speed interface, and acts more like a memory system.
– RDRAM: first-generation RAMBUS DRAM
• Drops RAS/CAS, replacing it with a bus that allows other accesses over the bus between the sending of the address and the return of the data (called a packet-switched bus or split-transaction bus).
• Uses both edges of the clock.
• Runs at 300 MHz.
– Direct RDRAM (DRDRAM): second generation
• Separate data, row and column buses, so that three transactions on these buses can be performed simultaneously.
• Runs at 400 MHz.
• Comparing RAMBUS and DDR SDRAM
– Both increase memory bandwidth.
– Neither helps in reducing latency.
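To illustrate the bandwidth point, here is a small C sketch comparing peak bandwidth with and without double-data-rate transfers; the clock rate and bus width are assumed example values, not figures from the text.

```c
/* Minimal sketch (assumed bus parameters): DDR transfers on both clock edges,
 * so its peak bandwidth is twice that of single data rate at the same clock. */
#include <stdio.h>

int main(void) {
    double clock_mhz = 133.0;  /* assumed memory bus clock        */
    double bus_bytes = 8.0;    /* assumed 64-bit data bus         */

    double sdr_mb_s = clock_mhz * bus_bytes;        /* one transfer per cycle */
    double ddr_mb_s = clock_mhz * 2.0 * bus_bytes;  /* rising + falling edges */

    printf("SDR peak bandwidth: %.0f MB/s\n", sdr_mb_s);
    printf("DDR peak bandwidth: %.0f MB/s\n", ddr_mb_s);
    return 0;
}
```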
3. Explain Cache Memories.
The effectiveness of the cache mechanism is based on the property of ‘locality of reference’.
Locality of Reference:
The active portions of a program are executed repeatedly during a given time period, and the remainder of the program is accessed relatively infrequently. Locality takes two forms:
Temporal (the recently executed instructions are likely to be executed again very soon).
Spatial (the instructions in close proximity to a recently executed instruction are also likely to be executed soon).
If the active segment of the program is placed in cache memory, then the total execution time
can be reduced significantly.
1. The term Block refers to the set of contiguous address locations of some size.
2. The cache line is used to refer to the cache block.
The Cache memory stores a reasonable number of blocks at a given time but this number is small
compared to the total number of blocks available in Main Memory.
1. The correspondence between main memory block and the block in cache memory is specified
by a mapping function.
2. The cache control hardware decides which block should be removed to create space for the new block that contains the referenced word.
3. The collection of rules for making this decision is called the replacement algorithm.
4. The cache control circuit determines whether the requested word currently exists in the cache.
5. If it exists, the Read/Write operation takes place on the appropriate cache location. In this case a Read/Write hit is said to occur.
6. In a Read operation, the main memory is not involved.
7. The write operation can proceed in 2 ways. They are,
a) Write-through protocol
b) Write-back protocol
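The two write protocols can be sketched in C. The cache-line structure and helper functions below are hypothetical illustrations, not an implementation from the text: write-through updates main memory on every write hit, while write-back only marks the line dirty and writes memory when the line is evicted.

```c
/* Minimal sketch (hypothetical cache-line layout) of the two write protocols. */
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64

struct cache_line {
    uint32_t tag;
    int      valid;
    int      dirty;                 /* used only by the write-back protocol */
    uint8_t  data[LINE_BYTES];
};

/* Write hit, write-through: update the cache AND main memory. */
void write_through_hit(struct cache_line *line, unsigned offset,
                       uint8_t value, uint8_t *main_memory, uint32_t addr) {
    line->data[offset] = value;
    main_memory[addr]  = value;     /* memory is always kept up to date */
}

/* Write hit, write-back: update the cache only and mark it dirty. */
void write_back_hit(struct cache_line *line, unsigned offset, uint8_t value) {
    line->data[offset] = value;
    line->dirty = 1;                /* memory is updated later, on eviction */
}

/* Eviction under write-back: flush the line if it was modified. */
void evict(struct cache_line *line, uint8_t *main_memory, uint32_t block_addr) {
    if (line->valid && line->dirty)
        memcpy(&main_memory[block_addr], line->data, LINE_BYTES);
    line->valid = line->dirty = 0;
}

int main(void) {
    static uint8_t memory[1 << 16];
    struct cache_line line = { .tag = 0, .valid = 1, .dirty = 0 };
    write_through_hit(&line, 0, 0xAB, memory, 0x100);
    write_back_hit(&line, 1, 0xCD);
    evict(&line, memory, 0x100);    /* flushes the dirty line back to memory */
    return 0;
}
```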
4. Short notes on Mapping Functions:
1. Direct Mapping:
It is the simplest technique, in which block j of the main memory maps onto block (j modulo 128) of the cache. Thus, main memory blocks 0, 128, 256, ... map onto cache block 0, blocks 1, 129, 257, ... map onto cache block 1, and so on. Since more than one memory block maps onto the same cache position, contention may arise for that block.
Merit:
It is easy to implement.
Demerit:
It is not very flexible.
2.Associative Mapping:
1. 12 tag bits identify a memory block when it is resident in the cache.
2. The tag bits of an address received from the processor are compared to the tag bits of each block of the cache to see if the desired block is present. This is called associative mapping.
3. It gives complete freedom in choosing the cache location.
4. A new block that has to be brought into the cache has to replace (eject) an existing block if the cache is full.
5. In this method, the cache must examine the tags of all blocks to determine whether a given block is present.
6. A search of this kind is called an associative search.
Merit:
It is more flexible than direct mapping technique.
Demerit:
Its cost is high.
3.Set-Associative Mapping:
1.It is the combination of direct and associative mapping.
2. The blocks of the cache are grouped into sets and the mapping allows a block of the main
memory to reside in any block of the specified set.
3. In this case, the cache has two blocks per set, so memory blocks 0, 64, 128, ..., 4032 map into cache set 0, and they can occupy either of the two block positions within that set.
The tag bits of the address are compared with the tags of the two blocks of the set to check if the desired block is present.
7. A cache that contains 1 block per set is direct mapped.
8. A cache that has ‘k’ blocks per set is called a ‘k-way set-associative cache’.
9. Each block contains a control bit called a valid bit.
10. The valid bit indicates whether the block contains valid data.
11. The dirty bit indicates whether the block has been modified during its cache residency.
The valid bits are all set to 0 when power is initially applied to the system.
a) If a main memory block is updated by a source (such as a DMA transfer) and that block also exists in the cache, then the valid bit of the cache block is cleared to 0.
b) If the processor and DMA use copies of the same data, the need to keep those copies consistent is called the cache-coherence problem.
Merit:
1. The contention problem of direct mapping is eased by having a few choices for block placement.
2. The hardware cost is reduced by decreasing the size of the associative search.
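The three mapping functions can be compared with a short C sketch. The geometry below (128 cache blocks, 2 blocks per set, hence 64 sets) follows the example used in this section; the code simply reports where a given main-memory block may reside under each scheme.

```c
/* Minimal sketch using the geometry implied above (128 cache blocks,
 * 2 blocks per set): where may main-memory block j reside? */
#include <stdio.h>

int main(void) {
    unsigned cache_blocks = 128, blocks_per_set = 2;
    unsigned sets = cache_blocks / blocks_per_set;   /* 64 sets */
    unsigned j = 64;                                 /* example memory block */

    printf("direct mapped   : cache block %u only\n", j % cache_blocks);
    printf("associative     : any of the %u cache blocks\n", cache_blocks);
    printf("set associative : set %u (either of its %u blocks)\n",
           j % sets, blocks_per_set);
    return 0;
}
```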
5. Explain Cache memory with its performance
CACHE PERFORMANCE
The average memory access time formula above gives us three metrics for cache optimizations: hit time, miss rate, and miss penalty. The optimizations fall into the following categories:
• Reducing the hit time: small and simple caches, way prediction, and trace caches
• Increasing cache bandwidth: pipelined caches, multibanked caches, and nonblocking caches
• Reducing the miss penalty: critical word first and merging write buffers
• Reducing the miss rate: compiler optimizations
• Reducing the miss penalty or miss rate via parallelism: hardware prefetching and compiler prefetching
“Measuring and improving cache performance”
SMALL AND SIMPLE CACHES TO REDUCE HIT TIME
• A time-consuming portion of a cache hit is using the index portion of the address to read the tag memory and then compare it to the address. Smaller hardware can be faster, so a small cache can help the hit time.
• It is also critical to keep an L2 cache small enough to fit on the same chip as the processor to avoid the time penalty of going off chip.
• The second suggestion is to keep the cache simple, such as using direct mapping. One benefit of direct-mapped caches is that the designer can overlap the tag check with the transmission of the data. This effectively reduces hit time.
• Hence, the pressure of a fast clock cycle encourages small and simple designs for first-level caches. For lower-level caches, some designs strike a compromise by keeping the tags on chip and the data off chip, promising a fast tag check yet providing the greater capacity of separate memory chips.
• Although the amount of on-chip cache has increased with new generations of microprocessors, the size of the L1 caches has recently not increased between generations.
• One approach to determining the impact on hit time in advance of building a chip is to use CAD tools. CACTI is a program that estimates the access time of alternative cache structures on CMOS microprocessors to within 10% of more detailed CAD tools. For a given minimum feature size, it estimates the hit time of caches as you vary cache size, associativity, and number of read/write ports.
Example
• Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?
Answer
• For the two-way cache:
Average memory access time = 1 + 0.049 × 10 = 1.49 clock cycles
• For the four-way cache, the clock time is 1.1 times longer. The elapsed time of the miss penalty should be the same since it is not affected by the processor clock rate, so assume it takes 9 of the longer clock cycles:
Average memory access time = 1.1 + 0.044 × 9 ≈ 1.50 clock cycles
so the two-way set-associative cache is slightly faster.
• If it really stretched the clock cycle time by a factor of 1.1, the performance impact would be even worse than indicated by the average memory access time, as the clock would be slower even when the processor is not accessing the cache.
• Another approach reduces conflict misses and yet maintains the hit speed of a direct-mapped cache. In way prediction, extra bits are kept in the cache to predict the way, or block within the set, of the next cache access.
• This prediction means the multiplexor is set early to select the desired block, and only a single tag comparison is performed that clock cycle, in parallel with reading the cache data.
• A miss results in checking the other blocks for matches in the next clock cycle. Added to each block of a cache are block predictor bits. The bits select which of the blocks to try on the next cache access.
• If the predictor is correct, the cache access latency is the fast hit time. If not, it tries the other block, changes the way predictor, and has a latency of one extra clock cycle.
• Simulations suggested set prediction accuracy is in excess of 85% for a two-way set, so way prediction saves pipeline stages more than 85% of the time. Way prediction is a good match to speculative processors, since they must already undo actions when speculation is unsuccessful. The Pentium 4 uses way prediction.
6. Explain Virtual Memory
VM divides physical memory into blocks and allocates them to different processes, each of which has its own address space.
• A protection scheme is needed that restricts a process to only the blocks belonging to that process.
• With VM, not all code and data need to be in physical memory before a program can begin.
• VM provides process (program) relocation.
• Virtual address
– Given by the CPU.
• Physical address
– Used to access main memory.
• Address translation
– Converts a virtual address to a physical address.
– Can easily form the critical path that limits the clock cycle time.
Types of virtual memory:
• Paged
• Segmented
• Paged segments
Protection and Examples of VM
• Process
– A running program plus any state needed to continue running it.
• Process (context) switch
– One process stops executing and another process is brought into execution.
• Requirements for context switches
– Being able to save CPU state so that execution can continue later
• A computer designer's responsibility.
– Protecting a process from being interfered with by another process
• The OS's responsibility.
• Computer designers can make protection easy for the OS to implement via the VM design.
7. Explain about TLB
Translation-Lookaside Buffer (TLB): a cache that keeps track of recently used address mappings to avoid an access to the page table.
Page table resides in memory
– Each translation requires accessing memory
– Might be required for each load/store!
TLB
– Caches recently used PTEs
– Speeds up translation
– Typically 128 to 256 entries
– Usually 4- to 8-way associative
– TLB access time is comparable to the L1 cache access time
Address translation is typically needed more than once per instruction (instruction fetch as well as data access).
Typical TLB parameters:
■ TLB size: 16–512 entries
■ Block size: 1–2 page table entries (typically 4–8 bytes each)
■ Hit time: 0.5–1 clock cycle
■ Miss penalty: 10–100 clock cycles
■ Miss rate: 0.01%–1%
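A minimal C sketch of the idea follows; the page size, TLB size, direct-mapped lookup, and the page-table-walk stub are simplifying assumptions for illustration, not part of the text. It shows how a TLB hit avoids the memory access needed for translation.

```c
/* Minimal sketch (hypothetical sizes) of TLB-assisted address translation. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_BITS   12          /* assumed 4 KB pages */
#define TLB_ENTRIES 64          /* assumed TLB size   */

struct tlb_entry { bool valid; uint64_t vpn; uint64_t pfn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Stub standing in for the (slow) page-table walk in main memory. */
static uint64_t page_table_walk(uint64_t vpn) { return vpn + 0x1000; /* hypothetical mapping */ }

uint64_t translate(uint64_t vaddr) {
    uint64_t vpn    = vaddr >> PAGE_BITS;
    uint64_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    unsigned idx    = (unsigned)(vpn % TLB_ENTRIES);  /* direct-mapped TLB for simplicity */

    if (!tlb[idx].valid || tlb[idx].vpn != vpn) {     /* TLB miss: consult the page table */
        tlb[idx].vpn   = vpn;
        tlb[idx].pfn   = page_table_walk(vpn);
        tlb[idx].valid = true;
    }
    return (tlb[idx].pfn << PAGE_BITS) | offset;      /* TLB hit path: no memory access   */
}

int main(void) {
    printf("0x%llx -> 0x%llx\n", 0x12345678ULL,
           (unsigned long long)translate(0x12345678ULL));
    return 0;
}
```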
7. Discuss about Accessing I/O Devices
An I/O module has two interfaces:
1. Interface to the CPU and memory
2. Interface to one or more peripherals
• I/O systems focus on dependability and cost.
• Processors and memory focus on performance and cost.
I/O systems must also plan for expandability and for diversity of devices, which is not a concern for processors.
Characteristics:
• Behaviour: input (read once), output (write only, cannot be read), or storage (can be reread and usually rewritten).
• Partner: either a human or a machine is at the other end of the I/O device, either feeding data on input or reading data on output.
• Data rate: the peak rate at which data can be transferred between the I/O device and the main memory or processor.
Interface for an I/O Device:
• The CPU checks the I/O module's device status.
• The I/O module returns the status.
• If ready, the CPU requests a data transfer.
• The I/O module gets data from the device.
• The I/O module transfers the data to the CPU.
8. Explain in detail about Direct Memory Access
• A special control unit may be provided to allow the transfer of a large block of data at high speed directly between an external device and main memory, without continuous intervention by the processor. This approach is called DMA.
• DMA transfers are performed by a control circuit called the DMA controller.
• To initiate the transfer of a block of words, the processor sends:
Ø Starting address
Ø Number of words in the block
Ø Direction of transfer
• While a block of data is transferred, the DMA controller increments the memory address for successive words and keeps track of the number of words; it informs the processor by raising an interrupt signal when the transfer is complete.
• While the DMA transfer is taking place, the program that requested the transfer cannot continue, but the processor can be used to execute another program.
• After the DMA transfer is completed, the processor returns to the program that requested the transfer.
The status and control register of the DMA controller contains the following flags:
R/W determines the direction of transfer.
When R/W = 1, the DMA controller reads data from memory to the I/O device.
When R/W = 0, the DMA controller performs a write operation.
Done flag = 1: the controller has completed transferring a block of data and is ready to receive another command.
IE = 1: causes the controller to raise an interrupt (Interrupt Enable) after it has completed transferring the block of data.
IRQ = 1: indicates that the controller has requested an interrupt.
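The register usage described above can be sketched as a hypothetical driver fragment in C; the register layout and bit positions are assumptions for illustration, not the layout of any particular controller.

```c
/* Minimal sketch (hypothetical register layout) of programming a DMA channel. */
#include <stdint.h>

struct dma_channel {
    volatile uint32_t start_addr;   /* starting memory address                   */
    volatile uint32_t word_count;   /* number of words in the block              */
    volatile uint32_t status_ctrl;  /* R/W, Done, IE, IRQ flag bits              */
};

#define DMA_RW   (1u << 0)   /* 1 = read memory to I/O device, 0 = write memory */
#define DMA_DONE (1u << 1)   /* block transfer completed                        */
#define DMA_IE   (1u << 2)   /* raise an interrupt when the block is done       */
#define DMA_IRQ  (1u << 3)   /* controller has requested an interrupt           */

/* The processor supplies address, count and direction; the controller then
 * transfers the block without further processor intervention. */
void dma_start(struct dma_channel *ch, uint32_t addr, uint32_t words, int read_from_memory) {
    ch->start_addr  = addr;
    ch->word_count  = words;
    ch->status_ctrl = DMA_IE | (read_from_memory ? DMA_RW : 0);
}

int dma_done(const struct dma_channel *ch) {
    return (ch->status_ctrl & DMA_DONE) != 0;   /* Done = IRQ = IE = 1 at completion */
}
```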
Fig: Use of DMA controllers in a computer system
• A DMA controller connects a high-speed network to the computer bus. The disk controller, which controls two disks, also has DMA capability and provides two DMA channels.
• To start a DMA transfer of a block of data from main memory to one of the disks, the program writes the address and word-count information into the registers of the corresponding channel of the disk controller.
• When the DMA transfer is completed, it is recorded in the status and control registers of the DMA channel, i.e., Done bit = IRQ = IE = 1.
Cycle Stealing:
• Requests by DMA devices for using the bus have higher priority than processor requests.
• Top priority is given to high-speed peripherals such as:
Ø Disks
Ø High-speed network interfaces and graphics display devices.
• Since the processor originates most memory access cycles, the DMA controller can be said to steal memory cycles from the processor.
• This interleaving technique is called cycle stealing.
Burst Mode:
The DMA controller may be given exclusive access to the main memory to transfer a block of data without interruption. This is known as burst (or block) mode.
Bus Master:
The device that is allowed to initiate data transfers on the bus at any given time is
called the bus master.
Bus Arbitration:
It is the process by which the next device to become the bus master is selected and
the bus mastership is transferred to it.
Types:
There are 2 approaches to bus arbitration. They are,
Ø Centralized arbitration ( A single bus arbiter performs arbitration)
Ø Distributed arbitration (all devices participate in the selection of next bus
master).
Centralized Arbitration:
• Here the processor is the bus master, and it may grant bus mastership to one of its DMA controllers.
• A DMA controller indicates that it needs to become the bus master by activating the Bus Request line (BR), which is an open-drain line.
• The signal on BR is the logical OR of the bus requests from all the devices connected to it.
• When BR is activated, the processor activates the Bus Grant signal (BG1), indicating to the DMA controllers that they may use the bus when it becomes free.
• This signal is connected to all devices using a daisy-chain arrangement.
• If a DMA controller requests the bus, it blocks the propagation of the grant signal to the other devices, and it indicates to all devices that it is using the bus by activating the open-collector line Bus Busy (BBSY).
Fig:A simple arrangement for bus arbitration using a daisy chain
Fig: Sequence of signals during transfer of bus mastership for the devices
• The timing diagram shows the sequence of events for the devices connected to the processor.
• DMA controller 2 requests and acquires bus mastership and later releases the bus.
• During its tenure as bus master, it may perform one or more data transfers.
• After it releases the bus, the processor resumes bus mastership.
Distributed Arbitration:
It means that all devices waiting to use the bus have equal responsibility in carrying out
the arbitration process.
Fig:A distributed arbitration scheme
• Each device on the bus is assigned a 4-bit identification number (ID).
• When one or more devices request the bus, they assert the Start-Arbitration signal and place their 4-bit ID numbers on four open-collector lines, ARB0 through ARB3.
• A winner is selected as a result of the interaction among the signals transmitted over these lines.
• The net outcome is that the code on the four lines represents the request that has the highest ID number.
• The drivers are of the open-collector type. Hence, if the input to one driver is equal to 1 and the input to another driver connected to the same bus line is equal to 0, the bus line is in the low-voltage state.
Eg:
• Assume two devices, A and B, have ID numbers 5 (0101) and 6 (0110), respectively, and both request the bus; the code on the arbitration lines is the OR of the two patterns, 0111.
• Each device compares the pattern on the arbitration lines to its own ID, starting from the most significant bit (MSB).
• If a device detects a difference at any bit position, it disables its drivers at that bit position and all lower-order positions. It does this by placing 0 at the inputs of these drivers.
• In our example, A detects a difference on line ARB1; hence, it disables its drivers on lines ARB1 and ARB0.
• This causes the pattern on the arbitration lines to change to 0110, which means that B has won the contention.
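The bit-by-bit competition can be modelled in a short C sketch; the wired-OR of the arbitration lines is simulated with bitwise OR, and the example reproduces the outcome above (B, with ID 0110, wins).

```c
/* Minimal sketch of the distributed-arbitration rule described above: scanning
 * from the MSB, a device keeps driving a line only while it has matched every
 * higher-order bit of the wired-OR pattern; the highest requesting ID survives. */
#include <stdio.h>

#define ID_BITS 4

static unsigned arbitrate(const unsigned *ids, int n) {
    unsigned lines = 0;
    for (int i = 0; i < n; i++) lines |= ids[i];           /* open-collector OR */

    for (int bit = ID_BITS - 1; bit >= 0; bit--) {
        unsigned or_at_bit = 0;
        unsigned mask_hi = ~((1u << (bit + 1)) - 1) & ((1u << ID_BITS) - 1);
        for (int i = 0; i < n; i++) {
            /* A device drives this line only if it matched all higher bits. */
            if ((ids[i] & mask_hi) == (lines & mask_hi))
                or_at_bit |= ids[i] & (1u << bit);
        }
        lines = (lines & ~(1u << bit)) | or_at_bit;        /* settle this line */
    }
    return lines;                                          /* winning ID */
}

int main(void) {
    unsigned ids[] = { 0x5 /* A = 0101 */, 0x6 /* B = 0110 */ };
    printf("winner: 0x%X\n", arbitrate(ids, 2));           /* prints 0x6 (B) */
    return 0;
}
```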
9. Short Notes on Buses
1. A bus protocol is the set of rules that governs the behaviour of the various devices connected to the bus, i.e., when to place information on the bus, when to assert control signals, and so on.
2. The bus lines used for transferring information are grouped into three types: address, data, and control lines.
The control lines also carry timing information, i.e., they specify the times at which the processor and I/O devices may place data on the bus or receive data from the bus.
1. During a data transfer operation, one device plays the role of a ‘master’.
2. The master device initiates the data transfer by issuing a read or write command on the bus. Hence it is also called the ‘initiator’.
3. The device addressed by the master is called the slave (or target).
Types of Buses:
There are 2 types of buses. They are,
Synchronous Bus:
1. In a synchronous bus, all devices derive timing information from a common clock line.
2. Equally spaced pulses on this line define equal time intervals; each interval constitutes a ‘bus cycle’, during which one data transfer can take place.
3. In the timing diagram, the ‘crossing points’ indicate the times at which the signal patterns change.
4. A signal line in an indeterminate (high-impedance) state is represented by an intermediate level halfway between the low and high signal levels.
Asynchronous Bus:
An alternative scheme for controlling data transfers on the bus is based on the use of a ‘handshake’ between the master and the slave. The common clock is replaced by two timing control lines.
10. Explain briefly about Interrupts
1). When a program enters a wait loop, it repeatedly checks the device status; during this period, the processor does not perform any useful work.
2). Instead, the device can use the interrupt-request line to send a hardware signal, called the interrupt signal, to the processor.
3). On receiving this signal, the processor can perform useful work during what would otherwise be the waiting period and attend to the device only when requested.
4). The routine executed in response to an interrupt request is called the Interrupt Service Routine (ISR).
5). An interrupt resembles a subroutine call.
6. The processor first completes the execution of instruction i. Then it loads the PC (Program Counter) with the address of the first instruction of the ISR.
7. After the execution of ISR, the processor has to come back to instruction i + 1.
8. Therefore, when an interrupt occurs, the current contents of PC which point to i +1 is put in
temporary storage in a known location.
9. A return from interrupt instruction at the end of ISR reloads the PC from that temporary
storage location, causing the execution to resume at instruction i+1.
10. When the processor is handling an interrupt, it must inform the device that its request has been recognized so that the device removes its interrupt-request signal.
11. This may be accomplished by a special control signal called the interrupt acknowledge
signal.
12. The task of saving and restoring the information can be done automatically by the processor.
13. The processor saves only the contents of program counter & status register (ie) it saves only
the minimal amount of information to maintain the integrity of the program execution.
14.Saving registers also increases the delay between the time an interrupt request is received and
the start of the execution of the ISR. This delay is called the Interrupt Latency.
In some applications, a long interrupt latency is unacceptable because the processing of certain routines must be accurately timed relative to external events. Such applications are called real-time processing.
Interrupt Hardware:
1.A single interrupt request line may be used to serve ‘n’ devices. All devices are connected to
the line via switches to ground.
2. To request an interrupt, a device closes its associated switch, the voltage on INTR line drops
to 0(zero).
3. If all the interrupt request signals (INTR1 to INTRn) are inactive, all switches are open and
the voltage on INTR line is equal to Vdd.
4. When a device requests an interrupt, the value of INTR is the logical OR of the requests from the individual devices.
Enabling and Disabling Interrupts:
1. The arrival of an interrupt request from an external device causes the processor to suspend the execution of one program and start the execution of another, because the interrupt may alter the sequence of events to be executed.
2. A problem arises because the INTR signal remains active during the execution of the Interrupt Service Routine.
3. There are 3 mechanisms to solve the problem of the infinite loop that could occur due to successive responses to an active INTR signal.
The following is the typical scenario:
The device raises an interrupt request.
The processor interrupts the program currently being executed.
Interrupts are disabled, and the device is informed that its request has been recognized; in response, it deactivates the INTR signal.
The requested action is performed, interrupts are enabled, and execution of the interrupted program is resumed.
Edge-triggered:
The processor has a special interrupt request line for which the interrupt handling circuit
responds only to the leading edge of the signal. Such a line is said to be edge-triggered.
Handling Multiple Devices:
When several devices request interrupts at the same time, several questions arise. They are:
How can the processor recognize the device requesting an interrupt?
Given that the different devices are likely to require different ISR, how can the processor obtain
the starting address of the appropriate routines in each case?
Should a device be allowed to interrupt the processor while another interrupt is being serviced?
How should two or more simultaneous interrupt requests be handled?
Polling Scheme:
If two devices have activated the interrupt-request line, the ISR for the selected device (the first device found) is completed first, and then the second request is serviced.
The simplest way to identify the interrupting device is to have the ISR poll all the I/O devices connected to the bus: when a device raises an interrupt request, it sets the IRQ bit in its status register to 1, and the first device encountered with IRQ = 1 is the one to be serviced.
Merit:
It is easy to implement.
Demerit:
The time spent for interrogating the IRQ bits of all the devices that may not be requesting any
service.
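Despite that drawback, the scheme is simple; a minimal C sketch (hypothetical device table and status-bit layout, not from the text) shows an ISR that polls the devices and services the first one whose IRQ bit is set.

```c
/* Minimal sketch (hypothetical device table) of the polling scheme above. */
#include <stdio.h>

#define NDEV    4
#define IRQ_BIT 0x1

struct device {
    volatile unsigned status;            /* bit 0 used here as the IRQ bit   */
    void (*service)(int id);             /* device-specific service routine  */
};

static void generic_service(int id) { printf("servicing device %d\n", id); }

static struct device devices[NDEV] = {
    {0, generic_service}, {IRQ_BIT, generic_service},
    {0, generic_service}, {IRQ_BIT, generic_service},
};

/* Interrupt service routine: poll every device's status register. */
void isr(void) {
    for (int i = 0; i < NDEV; i++) {
        if (devices[i].status & IRQ_BIT) {
            devices[i].service(i);       /* first requesting device is served */
            devices[i].status &= ~IRQ_BIT;
            return;                      /* later requests wait for the next interrupt */
        }
    }
}

int main(void) { isr(); isr(); return 0; }
```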
Vectored Interrupt:
Here the device requesting an interrupt identifies itself to the processor by sending a special code over the bus. This code usually represents the starting address of the ISR for that device. The processor reads this address, called the interrupt vector, and loads it into the PC.
The interrupt vector may also include a new value for the Processor Status Register.
When the processor is ready to receive the interrupt-vector code, it activates the interrupt acknowledge (INTA) line.
Interrupt Nesting:
Multiple Priority Scheme:
In multiple level priority scheme, we assign a priority level to the processor that can be changed
under program control.
The priority level of the processor is the priority of the program that is currently being executed.
The processor accepts interrupts only from devices that have priorities higher than its own.
At the time the execution of an ISR for some device is started, the priority of the processor is
raised to that of the device.
The action disables interrupts from devices at the same level of priority or lower.
Privileged Instruction:
The processor priority is usually encoded in a few bits of the Processor Status word. It can be changed by program instructions that write into the PS. These instructions are called privileged instructions.
They can be executed only when the processor is in supervisor mode. The processor is in supervisor mode only when executing operating system routines; it switches to user mode before beginning to execute application programs.
Initiating the Interrupt Process:
Load the starting address of ISR in location INTVEC (vectored interrupt).
Load the address LINE in a memory location PNTR. The ISR will use this location as a pointer
to store the i/p characters in the memory.
Enable the keyboard interrupts by setting bit 2 in register CONTROL to 1.
Enable interrupts in the processor by setting the IE bit in the processor status register PS to 1.
Execution of the ISR:
Read the input character from the keyboard data register; this causes the interface circuit to remove its interrupt request.
Store the character in the memory location pointed to by PNTR and increment PNTR.
When an end-of-line character is received, disable keyboard interrupts and return from the interrupt.
Exceptions:
1. An interrupt is an event that causes the execution of one program to be suspended and the
execution of another program to begin.
2. The term exception is used to refer to any event that causes an interruption.
Kinds of exception:
Recovery from errors
Debugging
Privileged Exception
Recovery From Errors:
1. Computers have error-checking codes in main memory, which allow detection of errors in the stored data.
2. If an error occurs, the control hardware detects it and informs the processor by raising an interrupt.
3. The processor also interrupts a program if it detects an error or an unusual condition while executing an instruction, i.e., it suspends the program being executed and starts an exception service routine.
4. This routine takes appropriate action to recover from the error.
Debugging:
1. System software includes a program called a debugger, which helps the user find errors in a program.
2. The debugger uses exceptions to provide two important facilities.
3. They are:
Trace
Breakpoint
Trace Mode:
1. When the processor is in trace mode, an exception occurs after the execution of every instruction, using the debugging program as the exception service routine.
2. The debugging program enables the user to examine the contents of registers, memory locations, and so on.
3. On return from the debugging program, the next instruction in the program being debugged is executed.
4. The trace exception is disabled during the execution of the debugging program.
Breakpoint:
1. Here the program being debugged is interrupted only at specific points selected by the user. An instruction called Trap (or software interrupt) is usually provided for this purpose.
2. While debugging, the user may wish to interrupt the program execution after instruction i.
3. When the program execution reaches that point, the user can examine the memory and register contents.
Privileged Exception:
1. To protect the OS of a computer from being corrupted by user programs, certain instructions can be executed only when the processor is in supervisor mode. These are called privileged instructions; an attempt to execute one in user mode raises a privilege exception.
2. When the processor is in user mode, it will not execute such instructions; when the processor is in supervisor mode, it will execute them.
11. Explain I/O Processor
An I/O processor (IOP) is a specialized processor that not only loads and stores into memory but can also execute instructions drawn from a set of I/O instructions.
• The IOP interfaces to the system and to the devices.
• The IOP carries out the sequence of events involved in an I/O transfer and moves the results of an I/O operation into main memory, using a program for the IOP that is itself held in main memory (Schaum's Outline of Theory and Problems of Computer Architecture).
I/O processor:
• Used to address the problem of direct transfer after executing the necessary format conversion or other instructions.
• In an IOP-based system, I/O devices can directly access the memory without intervention by the processor.
IOP instructions:
• Instructions help in format conversions, for example sending bytes from memory as packed decimals to a line-printer output device.
• I/O device data in a different format can be transferred to main memory using an IOP.
12. Explain I/O interfacing techniques
a. Memory-mapped I/O
b. I/O-mapped I/O
Memory-mapped I/O
The total memory address space is partitioned, and part of this space is devoted to I/O addressing.
When this technique is used, a memory-reference instruction that causes data to be fetched from or stored at the specified address automatically becomes an I/O instruction if that address is made the address of an I/O port.
Fig: Memory-mapped I/O (the I/O address space is a part of the total memory address space)
Advantages
The usual memory-related instructions are used for I/O operations; no special I/O instructions are required.
Disadvantages
The available memory address space is reduced.
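A short C sketch illustrates the idea: with memory-mapped I/O, ordinary loads and stores through volatile pointers act as I/O operations. The device, its base address, and its register layout below are hypothetical examples, not taken from the text.

```c
/* Minimal sketch (hypothetical device and addresses) of memory-mapped I/O:
 * device registers sit at ordinary memory addresses, so plain load/store
 * instructions (here, volatile pointer accesses) perform the I/O. */
#include <stdint.h>

#define UART_BASE   0x10000000u                          /* assumed device base address */
#define UART_DATA   (*(volatile uint8_t *)(UART_BASE + 0x0))
#define UART_STATUS (*(volatile uint8_t *)(UART_BASE + 0x4))
#define TX_READY    0x1

/* Writing a byte to the device uses the same store instruction as a memory write. */
void uart_putc(char c) {
    while ((UART_STATUS & TX_READY) == 0)
        ;                                /* spin until the device is ready */
    UART_DATA = (uint8_t)c;              /* ordinary store acts as an I/O write */
}
```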
I/O-mapped I/O
If we do not want to reduce the memory address space, we allot a separate I/O address space, apart from the total memory space. This is called the I/O-mapped I/O technique.
Fig: I/O-mapped I/O (a separate I/O address space alongside the full memory address space)
Advantage
The full memory address space is available.
Disadvantage
Memory-related instructions do not work for I/O. Therefore, the processor can use this mode only if it has special instructions for I/O operations, such as I/O read and I/O write.