Chapter 11: System Performance Enhancement

Basic Operation of a Computer
- A program is loaded into memory.
- An instruction is fetched from memory.
- Operands are decoded and the required data is fetched from the specified location (using the addressing mode built into the instruction).
- The operation corresponding to the instruction is executed.
- An additional operand determines the return location for the result of the operation.

Performance
- The CPU performs program instructions via a sequence of fetch-execute cycles. (Note: the fetch-execute cycle consists of many phases.)
- Performance is degraded by delays in memory accesses.

Performance Enhancement: RISC Architecture
- RISC: Reduced Instruction Set Computing.
- Simple instructions: easier to decode and to run in parallel.
- Limited memory access: only load and store instructions access memory.
- Many registers, and compilers that optimize their use.

Performance Enhancement: Pipelining
- Overlap the processing of instructions so that more than one instruction is being worked on at a given time.
- While one instruction is being fetched, another may be executing; pipelining performs the fetch and execute phases in parallel.
- Note: only one instruction at a time is actually being executed to completion.
- Objective: start and finish one instruction per clock cycle (CPI = 1).
- Pipelining is illustrated in Fig.
10.23.

Performance Enhancement: Superscalar Design
- Start and finish more than one instruction per clock cycle (CPI < 1).
- Executes several operations at once.
- Hardware is duplicated to support parallelism: the CPU may have an instruction fetch unit and several execution units operating in parallel.
- Hardware schedules instructions to exploit the parallelism.

Other Means of Improving Performance
- Multiprocessing
- Faster clock speed
- Wider instructions and data paths
- Longer registers
- Faster disk access
- Memory enhancements

Multiprocessing
- Increase the number of processors.
- Multiprocessors: computers that have multiple CPUs within a single system, sharing memory and I/O devices.
- Typically 2-4 processors in a tightly coupled system.

Symmetrical Multiprocessing (SMP) Systems
- Each CPU operates independently.
- Each CPU has access to all the system resources (memory and I/O).
- Any CPU can respond to an interrupt.
- A program in memory can be executed by any CPU.
- Each CPU has identical access to the OS, and each performs its own dispatch scheduling, i.e., determines which program will execute.
- A very controlled environment: the CPUs, memory, I/O devices, and OS are designed to operate together, and communication is built into the system.

Increase Clock Speed
- A faster clock speeds up the whole system, since instruction cycle time is inversely proportional to clock speed.
- Limitation: the ability of the CPU, buses, and other components to keep up.

Wider Instruction and Data Paths
- Processing more bits at a time improves performance: the CPU can fetch or store more data in a single operation, and can fetch more instructions at a time.
- Because memory accesses are slow compared to CPU operations, this improves performance.

Longer Registers
- Longer registers (more bits) within the CPU reduce the number of program steps needed to complete a calculation.
- Example: using 16-bit registers for a 64-bit addition requires 4 additions, plus steps to handle the carries between registers, and 4 moves to transfer the result to memory. With 64-bit registers,
only a single addition and a single move to memory (over a wider internal bus) are needed.

Faster Disk Access
- Small improvements in disk access can yield significant improvements in system performance.
- Approach: distribute data among multiple devices so that it can be accessed simultaneously from different devices.
- Manufacturers continue to produce disk drives that are smaller and more densely packed.

Larger/Faster Memory
- More memory provides larger buffers for data and programs transferred from I/O devices, reducing the number of disk accesses.
- Faster memory reduces the number of wait states that must be inserted into the instruction cycle when memory is accessed.
- Memory access time can also be reduced via RISC architecture (more registers) and by providing wider memory data paths (e.g., 8 bytes).

Memory: DRAM vs. SRAM
- DRAM (dynamic RAM): inexpensive, requires less electrical power, and is more compact (more bits of memory in a single integrated circuit), but requires periodic refreshing.
- SRAM (static RAM): 2-3 times faster, but more expensive and requires more chips.
- Building all of main memory from SRAM is impractical; the solution is cache memory.

Cache Memory
- Cache memory is organized into blocks of 8-16 bytes each.
- Each block holds an exact copy of data stored in main memory.
- Each block has a tag that identifies the main-memory location of the data contained in the block.
- 64 KB of cache => 8,192 blocks of data (at 8 bytes per block).
- A CPU request for memory is handled by the cache controller, which checks the tags for the desired location.
- Hit => the data is in the cache; Miss => it is not present.
- On a read hit, data is transferred from the cache to the CPU; on a write, data is stored (with its tag) in cache memory.
- On a miss, the data is copied from main memory into the cache.

Cache Illustration (figure slides)

Cache Situations: Full Cache and Memory Writes
- LRU (Least Recently Used) algorithm: replace the block that has not been accessed for the longest time.
- If the block to be replaced has been altered, first write the block back to memory before replacing it.
- The cache controller manages the entire cache operation.
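The tag lookup, hit/miss handling, LRU replacement, and write-back of altered blocks described above can be sketched as a toy simulation. This is a minimal illustration, not a hardware model: the class name, the fully associative organization, the 4-block/8-byte sizes, and the dict-based "main memory" are all assumptions chosen for brevity.

```python
from collections import OrderedDict

class ToyCache:
    """Toy fully associative write-back cache with LRU replacement."""
    def __init__(self, memory, num_blocks=4, block_size=8):
        self.memory = memory          # backing store: {block_tag: bytearray}
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = OrderedDict()   # tag -> [data, dirty]; order tracks LRU
        self.hits = self.misses = 0

    def _lookup(self, address):
        tag = address // self.block_size        # tag identifies the memory block
        if tag in self.blocks:                  # hit: data already in the cache
            self.hits += 1
            self.blocks.move_to_end(tag)        # mark as most recently used
        else:                                   # miss: copy block from memory
            self.misses += 1
            if len(self.blocks) >= self.num_blocks:   # cache full: evict LRU block
                old_tag, (old_data, dirty) = self.blocks.popitem(last=False)
                if dirty:                       # altered block: write back first
                    self.memory[old_tag] = old_data
            data = self.memory.get(tag, bytearray(self.block_size))
            self.blocks[tag] = [data, False]
        return tag

    def read(self, address):
        tag = self._lookup(address)
        return self.blocks[tag][0][address % self.block_size]

    def write(self, address, value):
        tag = self._lookup(address)
        self.blocks[tag][0][address % self.block_size] = value
        self.blocks[tag][1] = True              # mark block dirty (altered)
```

For example, after writing to address 0 and then touching four other blocks, the 4-block cache evicts block 0 and, because it is dirty, writes it back to memory first, exactly the sequence described for a full cache.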
- The CPU is unaware of the cache's presence.

Why Does Cache Work?
- Locality of reference: empirical studies show that most well-written programs confine their memory references to a few small regions of memory, e.g., sequential instructions, loops, small procedures, or array data.
- Hit ratios of 90% are typical.

Two-Level Cache System
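The hit ratio produced by locality of reference is what determines the average (effective) memory access time, including in a two-level cache system. A minimal sketch of the standard weighted-average calculation follows; the latencies (2 ns cache, 10 ns second-level cache, 50 ns main memory) are illustrative assumptions, not figures from the text.

```python
def effective_access_time(hit_ratio, t_cache, t_main):
    """Average access time for a single-level cache:
    hits cost t_cache; misses cost the main-memory time t_main."""
    return hit_ratio * t_cache + (1 - hit_ratio) * t_main

def two_level_access_time(h1, t1, h2, t2, t_main):
    """Two-level version: misses in the first cache fall through to the
    second; misses there fall through to main memory."""
    return h1 * t1 + (1 - h1) * (h2 * t2 + (1 - h2) * t_main)

# With assumed latencies of 2 ns (cache) and 50 ns (main memory),
# a 90% hit ratio keeps the average close to the cache speed:
t_single = effective_access_time(0.90, 2, 50)       # 0.9*2 + 0.1*50 = 6.8 ns
t_double = two_level_access_time(0.90, 2, 0.90, 10, 50)  # 3.2 ns
```

This makes the locality argument concrete: even though main memory is 25 times slower than the cache in this sketch, a 90% hit ratio yields an average access time only about 3.4 times the cache latency, and a second cache level reduces it further.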