1. Explain Memory Hierarchy

INTRODUCTION

Computer pioneers correctly predicted that programmers would want unlimited amounts of fast memory. An economical solution to that desire is a memory hierarchy, which takes advantage of locality and of the cost-performance of memory technologies. The principle of locality says that most programs do not access all code or data uniformly. Locality occurs in time (temporal locality) and in space (spatial locality). This principle, plus the guideline that smaller hardware can be made faster, led to hierarchies based on memories of different speeds and sizes. The goal is to provide a memory system with cost per byte almost as low as the cheapest level of memory and speed almost as fast as the fastest level.

Note that each level maps addresses from a slower, larger memory to a smaller but faster memory higher in the hierarchy. As part of address mapping, the memory hierarchy is given the responsibility of address checking; hence, protection schemes for scrutinizing addresses are also part of the memory hierarchy.

When a word is not found in the cache, the word must be fetched from memory and placed in the cache before continuing. Multiple words, called a block (or line), are moved for efficiency reasons. Each cache block includes a tag to indicate which memory address it corresponds to.

A key design decision is where blocks (or lines) can be placed in a cache. The most popular scheme is set associative, where a set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. Finding a block consists of first mapping the block address to the set, and then searching the set, usually in parallel, to find the block. The set is chosen by the address of the data:

(Block address) MOD (Number of sets in cache)

If there are n blocks in a set, the cache placement is called n-way set associative. The end points of set associativity have their own names.
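The set-selection rule above can be checked with a short sketch; the cache organizations used below are hypothetical examples, not values from the text.

```python
def cache_set_index(block_address: int, num_sets: int) -> int:
    """Set chosen by (Block address) MOD (Number of sets in cache)."""
    return block_address % num_sets

# Hypothetical 64-block cache, 2-way set associative: 64 / 2 = 32 sets.
assert cache_set_index(100, 32) == 4
# Direct mapped: one block per set, so there are as many sets as blocks.
assert cache_set_index(100, 64) == 36
# Fully associative: a single set, so every block maps to set 0.
assert cache_set_index(100, 1) == 0
```

The same MOD rule covers all three organizations; only the number of sets changes.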
A direct-mapped cache has just one block per set (so a block is always placed in the same location), and a fully associative cache has just one set (so a block can be placed anywhere).

Caching data that is only read is easy, since the copies in the cache and in memory will be identical. Caching writes is more difficult: how can the copy in the cache and the copy in memory be kept consistent? There are two main strategies. A write-through cache updates the item in the cache and writes through to update main memory. A write-back cache only updates the copy in the cache; main memory is updated when the block is replaced.

Miss rate is simply the fraction of cache accesses that result in a miss, that is, the number of accesses that miss divided by the number of accesses. To gain insight into the causes of high miss rates, which can inspire better cache designs, the three Cs model sorts all misses into three simple categories:

Compulsory: The very first access to a block cannot be in the cache, so the block must be brought into the cache. Compulsory misses are those that would occur even with an infinite cache.

Capacity: If the cache cannot contain all the blocks needed during execution of a program, capacity misses (in addition to compulsory misses) will occur because of blocks being discarded and later retrieved.

Conflict: If the block placement strategy is not fully associative, conflict misses (in addition to compulsory and capacity misses) will occur because a block may be discarded and later retrieved if conflicting blocks map to its set.

Some designers prefer measuring misses per instruction rather than misses per memory reference (miss rate). The two are related:

Misses / Instruction = Miss rate × (Memory accesses / Instruction)

For speculative processors, we only count instructions that commit. The problem with both measures is that they do not factor in the cost of a miss, which average memory access time does:

Average memory access time = Hit time + Miss rate × Miss penalty

where Hit time is the time to hit in the cache and Miss penalty is the time to replace the block from memory (that is, the cost of a miss).
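The two formulas just given can be exercised with a small sketch; the miss rates and cycle counts below are made up for illustration.

```python
def misses_per_instruction(miss_rate, mem_accesses_per_instruction):
    # Misses/Instruction = Miss rate x (Memory accesses / Instruction)
    return miss_rate * mem_accesses_per_instruction

def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = Hit time + Miss rate x Miss penalty
    return hit_time + miss_rate * miss_penalty

# Hypothetical cache: 2% miss rate, 1.5 memory accesses per instruction.
assert abs(misses_per_instruction(0.02, 1.5) - 0.03) < 1e-12
# Hit in 1 cycle, 5% miss rate, 20-cycle miss penalty -> 2.0 cycles.
assert abs(amat(1, 0.05, 20) - 2.0) < 1e-12
```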
Average memory access time is still an indirect measure of performance; although it is a better measure than miss rate, it is not a substitute for execution time.

LARGER BLOCK SIZE TO REDUCE MISS RATE
The simplest way to reduce the miss rate is to take advantage of spatial locality and increase the block size. Larger blocks reduce compulsory misses, but they also increase the miss penalty.

BIGGER CACHES TO REDUCE MISS RATE
The obvious way to reduce capacity misses is to increase cache capacity. Drawbacks include the potentially longer hit time of the larger cache memory and higher cost and power.

HIGHER ASSOCIATIVITY TO REDUCE MISS RATE
Obviously, increasing associativity reduces conflict misses. Greater associativity can come at the cost of increased hit time.

MULTILEVEL CACHES TO REDUCE MISS PENALTY
A difficult decision is whether to make the cache hit time fast, to keep pace with the increasing clock rate of processors, or to make the cache large, to overcome the widening gap between the processor and main memory. Adding another level of cache between the original cache and memory simplifies the decision (see the figure below). The first-level cache can be small enough to match a fast clock cycle time, yet the second-level cache can be large enough to capture many accesses that would otherwise go to main memory.

GIVING PRIORITY TO READ MISSES OVER WRITES TO REDUCE MISS PENALTY
A write buffer is a good place to implement this optimization. Write buffers create hazards because they may hold the updated value of a location needed on a read miss, that is, a read-after-write hazard through memory. One solution is to check the contents of the write buffer on a read miss. If there are no conflicts, and if the memory system is available, sending the read before the writes reduces the miss penalty. Most processors give reads priority over writes.
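The benefit of the multilevel cache described above can be estimated by expanding the L1 miss penalty into the average access time of the L2 cache; the numbers in this sketch are hypothetical.

```python
def two_level_amat(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, penalty_l2):
    # The L1 miss penalty is itself the average access time of the L2:
    # AMAT = HitL1 + MissRateL1 x (HitL2 + MissRateL2 x MissPenaltyL2)
    return hit_l1 + miss_rate_l1 * (hit_l2 + miss_rate_l2 * penalty_l2)

# Hypothetical: 1-cycle L1 hit, 5% L1 misses, 10-cycle L2 hit,
# 20% local L2 misses, 100 cycles to main memory.
assert abs(two_level_amat(1, 0.05, 10, 0.20, 100) - 2.5) < 1e-9
```

Without the L2, the same L1 would pay the full 100-cycle penalty on every miss (1 + 0.05 × 100 = 6 cycles), which is why the second level pays off.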
AVOIDING ADDRESS TRANSLATION DURING INDEXING OF THE CACHE TO REDUCE HIT TIME
Caches must cope with the translation of a virtual address from the processor to a physical address to access memory. The figure below shows a typical relationship between caches, translation lookaside buffers (TLBs), and virtual memory. A common optimization is to use the page offset, the part that is identical in both virtual and physical addresses, to index the cache. The virtual part of the address is translated while the cache is read using that index, so the tag match can use physical addresses. This scheme allows the cache read to begin immediately, and yet the tag comparison still uses physical addresses.

The drawback of this virtually indexed, physically tagged optimization is that the size of the page limits the size of the cache. For example, a direct-mapped cache can be no bigger than the page size. Higher associativity can keep the cache index in the physical part of the address and yet still support a cache larger than a page. For example, doubling associativity while doubling the cache size maintains the size of the index, since it is controlled by this formula:

2^Index = Cache size / (Block size × Set associativity)

A seemingly obvious alternative is to just use virtual addresses to access the cache, but this can cause extra overhead in the operating system. Note that each of the six optimizations above has a potential disadvantage that can lead to increased, rather than decreased, average memory access time.

SPEED, SIZE AND COST
An ideal memory would be fast, large and inexpensive. A very fast memory can be implemented using SRAM chips, but these chips are expensive because their basic cells have six transistors and a large number of basic cells are required to build a single chip. So it is impractical to build a large memory using SRAM chips. The alternative is to use DRAM chips, which have simpler basic cells and are therefore less expensive, but such memories are significantly slower.
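Returning to the index formula in the virtually indexed, physically tagged discussion above: it can be checked numerically that doubling associativity while doubling the cache size leaves the index unchanged. The cache parameters below are hypothetical.

```python
import math

def index_bits(cache_size, block_size, associativity):
    # 2^Index = Cache size / (Block size x Set associativity)
    num_sets = cache_size // (block_size * associativity)
    return int(math.log2(num_sets))

# 8 KB 2-way and 16 KB 4-way caches with 64-byte blocks share one index:
assert index_bits(8 * 1024, 64, 2) == 6
assert index_bits(16 * 1024, 64, 4) == 6
```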
Although dynamic memory units can be implemented at a reasonable cost with capacities of hundreds of megabytes, the affordable size is still small compared to the demands of large, voluminous data. This leads to the use of secondary storage, mainly magnetic disks, to implement large memory spaces. Very large disks are available at a reasonable price, and they are used extensively in computer systems, but they are much slower than semiconductor memory units. So we conclude the following:

A huge amount of cost-effective storage can be provided by magnetic disks.
A large, yet affordable, main memory can be built with dynamic RAM technology.
A smaller, faster unit is the SRAM, used where speed matters most, such as in cache memories.

All of these different types of memory units are employed effectively in a computer. The entire computer memory can be viewed as a hierarchy, shown in the figure. Processor registers offer the fastest access and sit at the top of the memory hierarchy. The next level of the hierarchy is a relatively small amount of memory that can be implemented directly on the processor chip. This memory, called a processor cache, holds copies of instructions and data from the larger memory. Cache is classified into two levels. A primary cache is located on the processor chip and is small because it competes for space on the chip, which must implement many other functions.

Fig: Memory hierarchy (from top to bottom: processor registers, L1 cache, L2 cache, main memory, magnetic disk; speed and cost per bit increase toward the top)

The primary cache is referred to as level 1 (L1) cache. A larger, secondary cache is placed between the primary cache and the rest of the memory. It is referred to as level 2 (L2) cache and is usually implemented using SRAM chips. The most common way of designing computers is to include a primary cache on the processor chip and a larger secondary cache. The next level in the hierarchy is called the main memory.
This rather large memory is implemented using dynamic memory components, typically in the form of SIMMs, DIMMs or RIMMs. The main memory is much larger but significantly slower than the cache memory. Disk devices provide a huge amount of inexpensive storage, but they are very slow compared to semiconductor devices.

2. Write a Short notes on Memory Technology

Performance metrics
• Latency: two measures.
  – Access time: the time between when a read is requested and when the desired word arrives.
  – Cycle time: the minimum time between requests to memory. Usually cycle time > access time.
• DRAM: refresh takes < 5% of the time; speed increases slowly.

SRAM, ROM and Flash Technology
• SRAM
  – No refresh.
  – 8 to 16 times faster than DRAM.
  – 8 to 16 times more expensive than DRAM.
  – Suitable for embedded applications.
• ROM and flash
  – Non-volatile.
  – Best suited to embedded processors.

Improving Memory Performance in a Standard DRAM Chip
• Use of a multi-bank organization provides larger bandwidth.
• Three other methods to increase bandwidth:
  – Fast page mode: repeated accesses to a row without another row access time.
  – Synchronous DRAM (SDRAM): has a programmable register to hold the number of bytes requested, and hence can send many bytes over several cycles per request, avoiding the overhead of repeatedly synchronizing with the controller.
  – Double Data Rate (DDR) DRAM: uses both the falling and rising edges of the clock for transferring data.

RAMBUS DRAM (RDRAM)
• Each chip has interleaved memory and a high-speed interface, and acts more like a memory system.
• RDRAM: first-generation RAMBUS DRAM.
  – Drops RAS/CAS, replacing it with a bus that allows other accesses between the sending of the address and the return of the data (called a packet-switched bus or split-transaction bus).
  – Uses both edges of the clock.
  – Runs at 300 MHz.
• Direct RDRAM (DRDRAM): second generation.
  – Separate data, row, and column buses, so that three transactions on these buses can be performed simultaneously.
  – Runs at 400 MHz.
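The effect of transferring on both clock edges can be illustrated with a small peak-bandwidth calculation; the clock rate and bus width below are assumed values, not figures from the text.

```python
def peak_bandwidth_mb_per_s(clock_mhz, bus_width_bytes, transfers_per_cycle):
    # Peak bandwidth = clock rate x transfers per cycle x bus width.
    return clock_mhz * transfers_per_cycle * bus_width_bytes

# Hypothetical 133 MHz, 8-byte bus: DDR doubles the single-edge rate.
assert peak_bandwidth_mb_per_s(133, 8, 1) == 1064   # single data rate
assert peak_bandwidth_mb_per_s(133, 8, 2) == 2128   # double data rate
```

This is why DDR raises bandwidth but, as the comparison below notes, does nothing for the latency of an individual access.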
Comparing RAMBUS and DDR SDRAM
– Both increase memory bandwidth.
– Neither helps in reducing latency.

3. Explain Cache Memories.

The effectiveness of the cache mechanism is based on the property of locality of reference. Locality of reference: many instructions in localized areas of the program are executed repeatedly during some time period, while the remainder of the program is accessed relatively infrequently. It has two forms:

Temporal: a recently executed instruction is likely to be executed again very soon.
Spatial: instructions in close proximity to a recently executed instruction are also likely to be executed soon.

If the active segments of the program are placed in cache memory, the total execution time can be reduced significantly.

1. The term block refers to a set of contiguous address locations of some size.
2. The term cache line is also used to refer to a cache block.

The cache memory stores a reasonable number of blocks at a given time, but this number is small compared to the total number of blocks available in main memory.

1. The correspondence between main memory blocks and blocks in cache memory is specified by a mapping function.
2. The cache control hardware decides which block should be removed to create space for a new block that contains the referenced word.
3. The collection of rules for making this decision is called the replacement algorithm.
4. The cache control circuit determines whether the requested word currently exists in the cache.
5. If it exists, the Read/Write operation takes place on the appropriate cache location; in this case a Read/Write hit is said to occur.
6. In a Read operation, main memory is not involved.
7. A Write operation can proceed in two ways:
a) Write-through protocol
b) Write-back protocol

4. Short notes for Mapping Function:

1. Direct Mapping: the simplest technique, in which block j of main memory maps onto block j modulo 128 of the cache (for a 128-block cache). Thus main memory blocks 0, 128, 256, … map onto cache block 0, blocks 1, 129, 257, … map onto cache block 1, and so on.
Merit: it is easy to implement.
Demerit: it is not very flexible.

2. Associative Mapping:
1.
In this method, 12 tag bits are required to identify a memory block when it is resident in the cache.
2. The tag bits of an address received from the processor are compared to the tag bits of each block of the cache to see if the desired block is present. This is called associative mapping.
3. It gives complete freedom in choosing the cache location.
4. A new block that has to be brought into the cache has to replace (eject) an existing block if the cache is full.
5. In this method, the memory has to determine whether a given block is in the cache.
6. A search of this kind is called an associative search.
Merit: it is more flexible than the direct mapping technique.
Demerit: its cost is high.

3. Set-Associative Mapping:
1. It is a combination of direct and associative mapping.
2. The blocks of the cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of the specified set.
3. In this case the cache has two blocks per set, so memory blocks 0, 64, 128, …, 4032 map into cache set 0, and each can occupy either of the two block positions within the set.
4. The tag bits of the address are compared with the tags of the two blocks of the set to check if the desired block is present.
5. A cache that contains one block per set is direct mapped.
6. A cache that has k blocks per set is called a k-way set-associative cache.
7. Each block contains a control bit called a valid bit.
8. The valid bit indicates whether the block contains valid data.
9. The dirty bit indicates whether the block has been modified during its cache residency.
10. The valid bits are all set to 0 when power is initially applied to the system.
a) If a main memory block is updated by a source and a copy of that block already exists in the cache, the valid bit of the cached copy is cleared to 0.
b) Keeping the copies of data used by the processor (in the cache) and by DMA transfers (in main memory) consistent is called the cache coherence problem.
Merit:
1. The contention problem of direct mapping is solved by having a few choices for block placement.
2. The hardware cost is decreased by reducing the size of the associative search.
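The three mapping functions above can be compared with a short sketch of the 128-block cache used in the example (2 blocks per set gives 64 sets):

```python
CACHE_BLOCKS = 128   # cache size used in the example above

def direct_mapped_block(j):
    # Block j of main memory maps onto block j modulo 128 of the cache.
    return j % CACHE_BLOCKS

def set_assoc_set(j, num_sets=64):
    # 2 blocks per set -> 128 / 2 = 64 sets.
    return j % num_sets

# Memory blocks 0, 64, 128, ..., 4032 all map into set 0:
assert all(set_assoc_set(j) == 0 for j in range(0, 4033, 64))
# Direct mapping sends blocks 0, 128, 256, ... to cache block 0:
assert direct_mapped_block(256) == 0
# Fully associative mapping is the one-set special case:
assert set_assoc_set(4032, num_sets=1) == 0
```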
5. Explain Cache memory with its performance

CACHE PERFORMANCE
• The average memory access time formula above gives us three metrics for cache optimizations: hit time, miss rate, and miss penalty. The optimizations fall into the following categories:
• Reducing the hit time: small and simple caches, way prediction, and trace caches.
• Increasing cache bandwidth: pipelined caches, multibanked caches, and nonblocking caches.
• Reducing the miss penalty: critical word first and merging write buffers.
• Reducing the miss rate: compiler optimizations.
• Reducing the miss penalty or miss rate via parallelism: hardware prefetching and compiler prefetching.

SMALL AND SIMPLE CACHES TO REDUCE HIT TIME
• A time-consuming portion of a cache hit is using the index portion of the address to read the tag memory and then compare it to the address. Smaller hardware can be faster, so a small cache can help the hit time.
• It is also critical to keep an L2 cache small enough to fit on the same chip as the processor, to avoid the time penalty of going off chip.
• The second suggestion is to keep the cache simple, such as using direct mapping. One benefit of direct-mapped caches is that the designer can overlap the tag check with the transmission of the data, which effectively reduces hit time.
• Hence, the pressure of a fast clock cycle encourages small and simple cache designs for first-level caches. For lower-level caches, some designs strike a compromise by keeping the tags on chip and the data off chip, promising a fast tag check yet providing the greater capacity of separate memory chips.
• Although the amount of on-chip cache has increased with new generations of microprocessors, the size of the L1 caches has recently not increased between generations.
• One approach to determining the impact on hit time in advance of building a chip is to use CAD tools. CACTI is a program that estimates the access time of alternative cache structures on CMOS microprocessors within 10% of more detailed CAD tools.
For a given minimum feature size, it estimates the hit time of caches as you vary cache size, associativity, and number of read/write ports.

Example
• Assume that the hit time of a two-way set-associative first-level data cache is 1.1 times faster than that of a four-way set-associative cache of the same size. The miss rate falls from 0.049 to 0.044 for an 8 KB data cache. Assume a hit is 1 clock cycle and that the cache is the critical path for the clock. Assume the miss penalty is 10 clock cycles to the L2 cache for the two-way set-associative cache, and that the L2 cache does not miss. Which has the faster average memory access time?

Answer
• For the two-way cache:
Average memory access time = 1 + 0.049 × 10 = 1.49 clock cycles
• For the four-way cache, the clock time is 1.1 times longer. The elapsed time of the miss penalty should be the same, since it is not affected by the processor clock rate, so assume it takes 9 of the longer clock cycles:
Average memory access time = 1 × 1.1 + 0.044 × (9 × 1.1) ≈ 1.54, measured in two-way clock cycles.
The two-way cache therefore has the faster average memory access time.
• If the four-way cache really stretched the clock cycle time by a factor of 1.1, the performance impact would be even worse than indicated by the average memory access time, as the clock would be slower even when the processor is not accessing the cache.

WAY PREDICTION TO REDUCE HIT TIME
• Another approach reduces conflict misses and yet maintains the hit speed of a direct-mapped cache. In way prediction, extra bits are kept in the cache to predict the way, or block within the set, of the next cache access.
• This prediction means the multiplexor is set early to select the desired block, and only a single tag comparison is performed that clock cycle, in parallel with reading the cache data.
• A miss results in checking the other blocks for matches in the next clock cycle. Added to each block of a cache are block predictor bits. The bits select which of the blocks to try on the next cache access.
• If the predictor is correct, the cache access latency is the fast hit time. If not, it tries the other block, changes the way predictor, and has a latency of one extra clock cycle.
• Simulations suggested that set prediction accuracy is in excess of 85% for a two-way set, so way prediction saves pipeline stages more than 85% of the time. Way prediction is a good match to speculative processors, since they must already undo actions when speculation is unsuccessful. The Pentium 4 uses way prediction.

6. Explain Virtual Memory

VM divides physical memory into blocks and allocates them to different processes, each of which has its own address space.
• A protection scheme is needed that restricts a process to only the blocks belonging to that process.
• With VM, not all code and data need to be in physical memory before a program can begin.
• VM provides process (program) relocation.
• Virtual address: the address given by the CPU.
• Physical address: the address used to access main memory.
• Address translation:
  – Converts a virtual address to a physical address.
  – Can easily form the critical path that limits the clock cycle time.

Types of virtual memory: paged, segmented, and paged segment.

Protection and Examples of VM
• Process: a running program plus any state needed to continue running it.
• Process (context) switch: one process stops execution and another process is brought into execution.
• Requirements for context switches:
  – Being able to save the CPU state so that execution can be continued later (the OS's responsibility).
  – Protecting a process from interference by another process (the computer designer's responsibility).
• Computer designers can make protection easy for the OS to implement via the VM design.

7. Explain about TLB

Translation-Lookaside Buffer (TLB): a cache that keeps track of recently used address mappings to avoid an access to the page table. The page table resides in memory:
– Each translation requires accessing memory.
– This might be required for each load/store!
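A TLB avoids most of those page-table accesses by caching recent translations. A minimal sketch follows; the page size, the dict-based tables, and the miss handling are illustrative assumptions, not a real MMU design.

```python
PAGE_SIZE = 4096                      # assumed 4 KB pages

page_table = {0: 7, 1: 3, 2: 9}       # virtual page number -> frame number
tlb = {}                              # recently used translations

def translate(virtual_address):
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn in tlb:                    # TLB hit: no page-table access
        frame = tlb[vpn]
    else:                             # TLB miss: walk the in-memory table
        frame = page_table[vpn]
        tlb[vpn] = frame              # cache the mapping for next time
    return frame * PAGE_SIZE + offset

assert translate(1 * PAGE_SIZE + 4) == 3 * PAGE_SIZE + 4
assert 1 in tlb                       # the mapping is now cached
```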
TLB:
– Caches recently used PTEs to speed up translation.
– Typically 128 to 256 entries.
– Usually 4- to 8-way associative.
– TLB access time is comparable to that of the L1 cache.

Typical TLB parameters:
■ TLB size: 16–512 entries
■ Block size: 1–2 page table entries (typically 4–8 bytes each)
■ Hit time: 0.5–1 clock cycle
■ Miss penalty: 10–100 clock cycles
■ Miss rate: 0.01%–1%

7. Discuss about Accessing I/O Devices

An I/O module has two interfaces:
1. An interface to the CPU and memory.
2. An interface to one or more peripherals.

I/O systems focus on dependability and cost; processors and memory focus on performance and cost. I/O systems must also plan for expandability and for diversity of devices, which is not a concern for processors.

Characteristics:
Behaviour: input (read once), output (write only, cannot be read), or storage (can be reread and usually rewritten).
Partner: either a human or a machine is at the other end of the I/O device, either feeding data on input or reading data on output.
Data rate: the peak rate at which data can be transferred between the I/O device and the main memory or processor.

Interface for an I/O device:
• The CPU checks the I/O module's device status.
• The I/O module returns the status.
• If ready, the CPU requests a data transfer.
• The I/O module gets data from the device.
• The I/O module transfers the data to the CPU.

8. Explain in detail about Direct Memory Access

• A special control unit may be provided to allow the transfer of a large block of data at high speed directly between an external device and main memory, without continuous intervention by the processor. This approach is called DMA.
• DMA transfers are performed by a control circuit called the DMA controller.
• To initiate the transfer of a block of words, the processor sends:
  – the starting address,
  – the number of words in the block,
  – the direction of transfer.
• As the block of data is transferred, the DMA controller increments the memory address for successive words and keeps track of the number of words; when the transfer completes, it informs the processor by raising an interrupt signal.
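The DMA sequence just described (the processor programs a starting address, word count, and direction; the controller increments the address, counts words, and raises an interrupt when done) can be sketched as follows. The class structure and names are illustrative, not a real controller's register interface.

```python
class DMAController:
    def start(self, start_address, word_count):
        # Programmed by the processor before the transfer begins.
        self.address = start_address
        self.remaining = word_count
        self.irq = False

    def transfer_word(self, memory, word):
        memory[self.address] = word   # move one word
        self.address += 1             # increment address for the next word
        self.remaining -= 1           # keep track of the word count
        if self.remaining == 0:
            self.irq = True           # inform the processor via interrupt

memory = {}
dma = DMAController()
dma.start(0x100, 3)
for word in (10, 20, 30):
    dma.transfer_word(memory, word)
assert memory == {0x100: 10, 0x101: 20, 0x102: 30}
assert dma.irq                        # block done, interrupt raised
```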
• While the DMA transfer is taking place, the program that requested the transfer cannot continue, but the processor can be used to execute another program.
• After the DMA transfer is completed, the processor returns to the program that requested the transfer.

Register bits of the DMA controller:
• R/W determines the direction of transfer. When R/W = 1, the DMA controller reads data from memory to the I/O device; when R/W = 0, it performs a write operation.
• Done flag = 1: the controller has completed transferring a block of data and is ready to receive another command.
• IE = 1 (Interrupt Enable): the controller raises an interrupt after it has completed transferring the block of data.
• IRQ = 1: the controller has requested an interrupt.

Fig: Use of DMA controllers in a computer system

• A DMA controller connects a high-speed network to the computer bus. The disk controller, which controls two disks, also has DMA capability and provides two DMA channels.
• To start a DMA transfer of a block of data from main memory to one of the disks, the program writes the address and word count information into the registers of the corresponding channel of the disk controller.
• When the DMA transfer is completed, this is recorded in the status and control registers of the DMA channel, i.e. Done = IRQ = IE = 1.

Cycle Stealing:
• Requests by DMA devices for use of the bus have higher priority than processor requests.
• Top priority is given to high-speed peripherals such as:
  – disks,
  – high-speed network interfaces and graphics display devices.
• Since the processor originates most memory access cycles, the DMA controller can be said to steal memory cycles from the processor. This interleaving technique is called cycle stealing.

Burst Mode: The DMA controller may be given exclusive access to the main memory to transfer a block of data without interruption. This is known as burst (or block) mode.

Bus Master: The device that is allowed to initiate data transfers on the bus at any given time is called the bus master.
Bus Arbitration: the process by which the next device to become the bus master is selected and bus mastership is transferred to it.

Types: there are two approaches to bus arbitration:
– Centralized arbitration (a single bus arbiter performs the arbitration).
– Distributed arbitration (all devices participate in the selection of the next bus master).

Centralized Arbitration:
• Here the processor is normally the bus master, and it may grant bus mastership to one of the DMA controllers.
• A DMA controller indicates that it needs to become the bus master by activating the Bus Request line (BR), which is an open-drain line.
• The signal on BR is the logical OR of the bus requests from all devices connected to it.
• When BR is activated, the processor activates the Bus Grant signal (BG1), indicating to the DMA controllers that they may use the bus when it becomes free.
• This signal is connected to all devices using a daisy-chain arrangement.
• If a DMA controller is requesting the bus, it blocks the propagation of the grant signal to the other devices, and it indicates to all devices that it is using the bus by activating the open-collector line Bus Busy (BBSY).

Fig: A simple arrangement for bus arbitration using a daisy chain
Fig: Sequence of signals during transfer of bus mastership for the devices

• The timing diagram shows the sequence of events for the devices connected to the processor.
• DMA controller 2 requests and acquires bus mastership and later releases the bus.
• During its tenure as bus master, it may perform one or more data transfers.
• After it releases the bus, the processor resumes bus mastership.

Distributed Arbitration: all devices waiting to use the bus have equal responsibility in carrying out the arbitration process.

Fig: A distributed arbitration scheme

• Each device on the bus is assigned a 4-bit ID.
• When one or more devices request the bus, they assert the Start-Arbitration signal and place their 4-bit ID numbers on the four open-collector lines, ARB0 to ARB3.
• A winner is selected as a result of the interaction among the signals transmitted over these lines.
• The net outcome is that the code on the four lines represents the request that has the highest ID number.
• The drivers are of the open-collector type. Hence, if the input to one driver is equal to 1, that bus line is in the low-voltage state, regardless of the input to any other driver connected to the same line.

Example:
• Assume two devices A and B have the IDs 5 (0101) and 6 (0110); the code initially seen on the lines is the OR of the two patterns, 0111.
• Each device compares the pattern on the arbitration lines to its own ID, starting from the MSB.
• If it detects a difference at any bit position, it disables its drivers at that bit position and all lower-order positions, by placing 0 at the inputs of these drivers.
• In our example, A detects a difference on line ARB1, so it disables its drivers on lines ARB1 and ARB0.
• This causes the pattern on the arbitration lines to change to 0110, which means that B has won the contention.

9. Short Notes on Buses

1. A bus protocol is the set of rules that govern the behavior of the various devices connected to the bus, i.e. when to place information on the bus, when to assert control signals, and so on.
2. The bus lines used for transferring data are grouped into three types: address, data, and control lines. The control signals also carry timing information, i.e. they specify the times at which the processor and I/O devices may place data on the bus or receive data from it.
3. During a data transfer operation, one device plays the role of the master. The master initiates the data transfer by issuing a read or write command on the bus, and hence is also called the initiator.
4. The device addressed by the master is called the slave (or target).

Types of Buses: there are two types of buses, synchronous and asynchronous.

Synchronous Bus:
1. In a synchronous bus, all devices derive timing information from a common clock line.
2. Equally spaced pulses on this line define equal time intervals; each interval constitutes a bus cycle, during which one data transfer can take place.
In the timing diagram:
1.
The crossing points indicate the times at which the signal patterns change.
2. A signal line in an indeterminate or high-impedance state is represented by an intermediate level, halfway between the low and high signal levels.

Asynchronous Bus:
An alternative scheme for controlling data transfers on the bus is based on the use of a handshake between the master and the slave. The common clock is replaced by two timing control lines.

10. Explain briefly about Interrupts

1. When a program enters a wait loop, it repeatedly checks the device status; during this period the processor performs no useful function.
2. Instead, the interrupt request line can send a hardware signal, called the interrupt signal, to the processor.
3. On receiving this signal, the processor can perform useful work during what would otherwise be the waiting period, responding to the device only when it requests service.
4. The routine executed in response to an interrupt request is called the interrupt service routine (ISR).
5. The interrupt mechanism resembles a subroutine call.
6. The processor first completes the execution of the current instruction i; then it loads the PC (program counter) with the address of the first instruction of the ISR.
7. After the execution of the ISR, the processor has to come back to instruction i + 1.
8. Therefore, when an interrupt occurs, the current contents of the PC, which point to i + 1, are put in temporary storage in a known location.
9. A return-from-interrupt instruction at the end of the ISR reloads the PC from that temporary storage location, causing execution to resume at instruction i + 1.
10. When the processor is handling an interrupt, it must inform the device that its request has been recognized, so that the device removes its interrupt request signal.
11. This may be accomplished by a special control signal called the interrupt acknowledge signal.
12. The task of saving and restoring information can be done automatically by the processor.
13.
The processor saves only the contents of the program counter and status register, i.e. it saves only the minimal amount of information needed to maintain the integrity of the program execution.
14. Saving additional registers increases the delay between the time an interrupt request is received and the start of execution of the ISR. This delay is called interrupt latency. In some applications a long interrupt latency is unacceptable, because the processing of certain routines must be accurately timed relative to external events. Such applications are called real-time processing.

Interrupt Hardware:
1. A single interrupt request line may be used to serve n devices. All devices are connected to the line via switches to ground.
2. To request an interrupt, a device closes its associated switch, and the voltage on the INTR line drops to 0 (zero).
3. If all the interrupt request signals (INTR1 to INTRn) are inactive, all switches are open and the voltage on the INTR line is equal to Vdd.
4. Thus, the value of INTR is the logical OR of the requests from the individual devices.

Enabling and Disabling Interrupts:
1. The arrival of an interrupt request from an external device causes the processor to suspend the execution of one program and start the execution of another, because the interrupt may alter the sequence of events to be executed.
2. INTR remains active during the execution of the interrupt service routine, so there are three mechanisms to solve the problem of repeated interruption by a still-active INTR signal.

The following is the typical scenario:
• The device raises an interrupt request.
• The processor interrupts the program currently being executed.
• The device is informed that its request has been recognized, and in response it deactivates the INTR signal.
• Interrupts are enabled, and execution of the interrupted program is resumed.
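The save-PC / run-ISR / return-from-interrupt sequence described above can be sketched in a few lines; the function names are illustrative, not a real processor's interface.

```python
def handle_interrupt(pc_of_current_instruction, isr):
    # The contents of the PC, pointing to instruction i+1, are saved first.
    saved_pc = pc_of_current_instruction + 1
    isr()                  # execute the interrupt service routine
    return saved_pc        # return-from-interrupt reloads the saved PC

log = []
resume_at = handle_interrupt(41, lambda: log.append("acknowledge device"))
assert resume_at == 42                      # execution resumes at i + 1
assert log == ["acknowledge device"]        # the ISR ran exactly once
```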
Edge-triggered: The processor has a special interrupt-request line for which the interrupt-handling circuit responds only to the leading edge of the signal. Such a line is said to be edge-triggered.
Handling Multiple Devices: When several devices request interrupts at the same time, several questions arise:
How can the processor recognize the device requesting an interrupt?
Given that different devices are likely to require different ISRs, how can the processor obtain the starting address of the appropriate routine in each case?
Should a device be allowed to interrupt the processor while another interrupt is being serviced?
How should two or more simultaneous interrupt requests be handled?
Polling Scheme: If two devices have activated the interrupt-request line, the ISR for the selected (first) device is completed, and then the second request can be serviced. The simplest way to identify the interrupting device is to have the ISR poll all the devices connected to the line: when a device raises an interrupt request, it sets the IRQ bit in its status register to 1, and the first device encountered with its IRQ bit set is the one to be serviced.
Merit: It is easy to implement.
Demerit: Time is spent interrogating the IRQ bits of devices that may not be requesting any service.
Vectored Interrupt: Here the device requesting an interrupt identifies itself to the processor by sending a special code over the bus that points to the starting address of its ISR. The processor reads this address, called the interrupt vector, and loads it into the PC. The interrupt vector may also include a new value for the Processor Status register. When the processor is ready to receive the interrupt-vector code, it activates the interrupt-acknowledge (INTA) line.
Interrupt Nesting:
Multiple-Priority Scheme: In a multiple-level priority scheme, we assign a priority level to the processor that can be changed under program control. The priority level of the processor is the priority of the program that is currently being executed.
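The device-identification schemes (polling, vectored) and the priority rule of such a scheme can be sketched together in C. Everything here is an assumption for illustration: the device count, the IRQ-bit encoding, the vector-table addresses, and the convention that a larger number means higher priority.

```c
#define NDEV 4

/* Polling: interrogate each device's IRQ status bit in turn; return
   the first device found requesting service, or -1 if none. */
int poll_irq(unsigned irq_bits)
{
    for (int dev = 0; dev < NDEV; dev++)
        if (irq_bits & (1u << dev))
            return dev;
    return -1;
}

/* Vectored interrupt: the requesting device identifies itself, and its
   interrupt vector (the ISR starting address) is loaded into the PC
   directly -- no search needed.  Addresses are invented. */
static const unsigned vector_table[NDEV] = { 0x100, 0x120, 0x140, 0x160 };

unsigned interrupt_vector(int dev)
{
    return vector_table[dev];
}

/* Multiple-priority scheme: accept an interrupt only from a device
   whose priority is strictly higher than the processor's current
   priority; same- or lower-priority requests stay pending. */
int accept_interrupt(int device_priority, int processor_priority)
{
    return device_priority > processor_priority;
}
```

Note how polling costs a search over all devices while the vectored scheme is a single table lookup — the trade-off stated in the Merit/Demerit notes above.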
The processor accepts interrupts only from devices that have priorities higher than its own. At the time the execution of an ISR for some device is started, the priority of the processor is raised to that of the device. This action disables interrupts from devices at the same or a lower priority level.
Privileged Instructions: The processor priority is usually encoded in a few bits of the Processor Status (PS) word. It can be changed by program instructions that write into the PS. Such instructions are called privileged instructions, and they can be executed only while the processor is in supervisor mode. The processor is in supervisor mode only when executing OS routines; it switches to user mode before beginning to execute application programs.
Initiating the Interrupt Process:
1. Load the starting address of the ISR into location INTVEC (vectored interrupt).
2. Load the address LINE into a memory location PNTR. The ISR will use this location as a pointer to store the input characters in memory.
3. Enable keyboard interrupts by setting bit 2 in register CONTROL to 1.
4. Enable interrupts in the processor by setting the IE bit in the processor status register PS to 1.
Execution of the ISR: The ISR causes the interface circuit to remove its interrupt request; when the end-of-line character is read, the interrupt-driven transfer is complete.
Exceptions:
1. An interrupt is an event that causes the execution of one program to be suspended and the execution of another program to begin.
2. The term exception refers to any event that causes such an interruption.
Kinds of exceptions:
Recovery from errors
Debugging
Privileged exceptions
Recovery from Errors:
1. Computers use error-checking codes in main memory, which allow detection of errors in the stored data.
2. If an error occurs, the control hardware detects it and informs the processor by raising an interrupt.
3. The processor also interrupts the program if it detects an error or an unusual condition while executing an instruction; i.e., it suspends the program being executed and starts an exception service routine.
4.
This routine takes appropriate action to recover from the error.
Debugging:
1. System software includes a program called a debugger, which helps to find errors in a program.
2. The debugger uses exceptions to provide two important facilities. They are:
Trace
Breakpoint
Trace Mode:
1. When the processor is in trace mode, an exception occurs after the execution of every instruction, using the debugging program as the exception service routine.
2. The debugging program can examine the contents of registers, memory locations, etc.
3. On return from the debugging program, the next instruction in the program being debugged is executed.
4. The trace exception is disabled during the execution of the debugging program.
Breakpoint:
1. Here the program being debugged is interrupted only at specific points selected by the user. An instruction called Trap (or software interrupt) is usually provided for this purpose.
2. While debugging, the user may interrupt the program execution after instruction i.
3. When the program is executed and reaches that point, the user can examine the memory and register contents.
Privileged Exceptions:
1. To protect the OS of a computer from being corrupted by user programs, certain instructions can be executed only when the processor is in supervisor mode. Attempts to execute them otherwise cause privileged exceptions.
2. When the processor is in user mode, it will not execute such an instruction; only in supervisor mode will it execute it.
11.
Explain I/O Processor
An I/O processor (IOP) is a specialized processor that not only loads and stores data into memory but can also execute instructions drawn from a set of I/O instructions.
• The IOP interfaces to the system and the devices.
• The IOP handles the sequence of events involved in I/O transfers and moves the results of an I/O operation into main memory, using a program for the IOP which is also held in main memory. (See Schaum’s Outline of Theory and Problems of Computer Architecture.)
• The IOP is used to address the problem of direct transfer after executing the necessary format conversion or other instructions.
• In an IOP-based system, I/O devices can directly access the memory without intervention by the processor.
IOP instructions:
• Instructions help in format conversions, e.g., sending bytes from memory as packed decimals to an output device such as a line printer.
• I/O device data in a different format can be transferred to main memory using an IOP.
12. Explain I/O interfacing techniques
a. Memory-mapped I/O
b. I/O-mapped I/O
Memory-mapped I/O:
The total memory address space is partitioned, and part of this space is devoted to I/O addressing. When this technique is used, a memory-reference instruction that causes data to be fetched from or stored at the address specified automatically becomes an I/O instruction if that address is made the address of an I/O port. (The total address space is thus divided into a memory address space and an I/O address space.)
Advantages: The usual memory-reference instructions are used for I/O operations; special I/O instructions are not required.
Disadvantages: The memory address space is reduced.
I/O-mapped I/O:
If we do not want to reduce the memory address space, we allot a separate I/O address space, apart from the total memory space; this is called the I/O-mapped I/O technique.
Advantage: The full memory address space is available.
Disadvantage: The memory-reference instructions do not work for I/O.
Therefore, the processor can use this mode only if it has special instructions for I/O operations, such as I/O read and I/O write.
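The contrast between the two techniques can be sketched with a tiny simulated machine in C. The region boundary (0xF0), the sizes of the address spaces, and the function names are all invented for illustration; io_write/io_read merely stand in for special I/O instructions such as the x86 OUT and IN.

```c
#include <stdint.h>

/* --- Memory-mapped I/O: one address space, partitioned. ---
   Addresses 0xF0 and above are taken to be the I/O region. */
#define IO_REGION_START 0xF0u
static uint8_t unified_space[0x100];

/* An ordinary store: it automatically becomes an I/O write whenever
   the address falls in the I/O region -- no special instruction. */
void store_byte(uint8_t addr, uint8_t value)
{
    unified_space[addr] = value;
}

uint8_t load_byte(uint8_t addr)
{
    return unified_space[addr];
}

/* --- I/O-mapped I/O: separate address spaces. ---
   The full memory space stays available, but device registers need
   their own operations (stand-ins for I/O read / I/O write). */
static uint8_t memory_space[0x100];
static uint8_t io_space[0x100];

void    io_write(uint8_t port, uint8_t value) { io_space[port] = value; }
uint8_t io_read(uint8_t port)                 { return io_space[port]; }
```

In the memory-mapped case, store_byte(0xF2, c) is simultaneously a memory store and an I/O write; in the I/O-mapped case, memory_space and io_space never overlap, which is exactly why the ordinary memory-reference instructions cannot reach the devices.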