Lecture 3: Memory Hardware Architecture. Example: MSP430

Types of memory
• ROM (read-only memory, non-volatile):
– Mask-programmable: programmed by the manufacturer, not by the user; cannot be modified.
– Flash-programmable: can be modified; the whole memory is erased ("flashed") and programmed again.
• RAM (random-access memory, supports writes):
– DRAM (dynamic): each bit of data requires a separate capacitor and transistor on the integrated circuit; "dynamic" because it must be refreshed periodically, otherwise the information fades.
– SRAM (static): does not need refreshing; each bit requires six transistors.

MSP430 Memory Map
[Figure: bits, bytes and words in memory; Flash/ROM and RAM regions.]
See the file "lnk430F2274.xcl" for more details about the memory map.

Memories
The MSP430F2274 has the following memory sizes:
• Flash: 32 KB
• RAM: 1 KB

Peripherals
Peripherals are connected to the CPU through the data, control and address buses and are accessed with the instruction set. They include:
1. Clock system: used by the CPU and peripherals.
2. Brownout: provides an internal reset signal during power off/on.
3. Digital I/O: four 8-bit I/O ports.
• Any combination of input, output and interrupt condition is possible.
• Reads and writes to the port control registers are supported by all instructions.
4. Watchdog timer:
• The watchdog timer is periodically reset by the system software; if it is not reset, it generates an interrupt to reset the host.
[Figure: watchdog timer driving an interrupt/reset to the host CPU.]
• Its primary function is to perform a system reset after a software problem occurs: if the selected time interval expires, a system reset is generated.
• If the watchdog function is not needed in the application, it can serve a secondary function: it can be configured as an interval timer and generate interrupts after a certain time interval.
5. Timer_A3, Timer_B3: 16-bit timer/counters with three capture/compare registers. Interrupts may be generated from the counter overflow condition and from each of the capture/compare registers.
6.–8. Other peripherals: see the msp430x22x4.h file for the register definitions.

CPU: RISC vs. CISC
• Complex instruction set computer (CISC):
– many addressing modes;
– many operations.
• Reduced instruction set computer (RISC):
– load/store architecture;
– pipelinable instructions.
• For code-size efficiency, CISC is better: it is designed for compact code, since the CPU connects to slow memory devices and a compressed instruction set is stored.
• A RISC processor is designed for speed, not for code-size efficiency.

CPU block diagram
[Figure: CPU block diagram.]

Buses
• The system interconnect uses the Memory Address Bus (MAB) and the Memory Data Bus (MDB).

Generic bus structure
• Address: m bits.
• Data: n bits.
• Control: c bits.

Fixed-delay memory access
[Figure: CPU–memory bus with R/W, adrs and data signals. The CPU drives read = 1 and adrs = A; after a fixed delay the memory returns data = mem[adrs] and the CPU latches reg = data (read), or the memory latches mem[adrs] = data (write).]

Variable-delay memory access
[Figure: as above, but the memory also drives a done signal: the CPU waits until done = 1 before latching reg = data (read) or completing the write mem[adrs] = data.]

Overheads for Computers as Components © 2000 Morgan Kaufman

Memory Management Unit (MMU)
The MMU is a computer hardware component responsible for handling accesses to memory requested by the CPU. Its functions are:
• Translation of virtual addresses to physical addresses
• Memory protection
• Cache control
• Bus arbitration
• Bank switching

Memory management units
• The MMU translates addresses:
[Figure: CPU issues a logical address to the MMU; the MMU issues the physical address to main memory.]

Memory management tasks
• Allows programs to move in physical memory during execution. In the past this was used to compensate for a limited address space; today memory is cheaper, and physical memory can be used without logical memory.
• Allows virtual memory:
– memory images are kept in secondary storage;
– images are returned to main memory on demand during execution.

Address translation
• Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses.
• Two basic schemes:
– segmented: a segment is a large, arbitrarily sized section of memory;
– paged: a page is a small, fixed-size section of memory.
• Segmentation and paging can be combined.

Segments and pages
[Figure: memory divided into segments 1 and 2; pages 1 and 2 inside a segment.]

Segment address translation
[Figure: physical address = segment base address + logical address; the logical address is range-checked against the segment's lower and upper bounds, and a range error is raised if the check fails.]

Page address translation
[Figure: the page number indexes the page table to obtain the base address of page i, which is concatenated with the page offset.]

Page table organizations
[Figure: page descriptors organized as a flat table or as a tree.]

MMU address translation
• The MMU divides the virtual address space (the range of addresses used by the processor) into pages, each having a size which is a power of 2, usually a few kilobytes, though they may be much larger.
• The bottom n bits of the address (the offset within a page) are left unchanged.
• The upper address bits are the (virtual) page number.
• The MMU normally translates virtual page numbers to physical page numbers via an associative cache called a TLB (translation lookaside buffer).
• When the TLB lacks a translation, a slower mechanism involving hardware-specific data structures or software assistance is used.
• The data found in such data structures are typically called page table entries (PTEs), and the data structure itself is typically called a page table.
• The physical page number is combined with the page offset to give the complete physical address.

MMU cache
A TLB entry may also include information about whether the page has been written to (the dirty bit), when it was last used (the accessed bit, for a least-recently-used page replacement algorithm), what kind of processes (user mode, supervisor mode) may read and write it, and whether it should be cached.

Page fault
• The MMU keeps track of which logical addresses actually reside in main memory and which are kept in secondary storage.
• When the CPU requests an address not in main memory, the MMU generates a page fault exception.
• The exception handler reads the location from secondary storage into main memory.
• To make room, some other location (usually the least recently used one) is moved from main memory to secondary storage.

MMU
• Sometimes a TLB entry or PTE prohibits access to a virtual page, perhaps because no physical RAM has been allocated to that virtual page.
• In this case the MMU signals a page fault to the CPU.
• The OS then tries to find a spare frame of RAM and set up a new PTE to map it to the requested virtual address. If no RAM is free, it may be necessary to choose an existing page, using some replacement algorithm, and save it to disk (this is called "paging"). With some MMUs there can also be a shortage of PTEs or TLB entries, in which case the OS will have to free one for the new mapping.
• In some cases a "page fault" may indicate a software bug.
• A key benefit of an MMU is memory protection: an OS can use it to protect against errant programs by disallowing access to memory that a particular program should not have access to. Typically, an OS assigns each program its own virtual address space.
• An MMU also reduces the problem of memory fragmentation. After blocks of memory have been allocated and freed, the free memory may become fragmented (discontiguous), so that the largest contiguous block of free memory may be much smaller than the total amount. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory.

Caching address translations
• Large translation tables require main-memory accesses.
• TLB (translation lookaside buffer): a cache for address translations.
– Typically small.

Example of memory management
• Memory region types:
– section: 1 MB block;
– large page: 64 KB;
– small page: 4 KB.
• An address is marked as section-mapped or page-mapped.
• Two-level translation scheme.
Example of address translation
[Figure: two-level translation. The translation table base register points to the first-level table; the first index selects a descriptor there, which is concatenated with the second index to select a descriptor in the second-level table; that descriptor, concatenated with the offset, gives the physical address.]

Zero-copy
• Describes computer operations in which the CPU does not perform the task of copying data from one memory area to another.
• Zero-copy versions of operating-system elements such as device drivers, file systems, and network protocol stacks greatly increase the performance of certain application programs and use system resources more efficiently.
• Performance is enhanced by allowing the CPU to move on to other tasks while data copies proceed in parallel in another part of the machine.
• Zero-copy operations also reduce the number of time-consuming mode switches between user space and kernel space.
• System resources are used more efficiently: using a sophisticated CPU to perform extensive copy operations, which is a relatively simple task, is wasteful if simpler system components can do the copying.
• Techniques for creating zero-copy software include DMA-based copying and memory mapping through an MMU. These features require specific hardware support and usually involve particular memory-alignment requirements.
• Zero-copy protocols have some initial overhead, so avoiding programmed I/O (PIO) makes sense only for large messages.
• Zero-copy protocols are especially important for high-speed networks in which the capacity of a network link approaches or exceeds the CPU's processing capacity. In such a case the CPU spends nearly all of its time copying transferred data and thus becomes a bottleneck that limits the communication rate to below the link's capacity.

Direct memory access (DMA)
• DMA provides parallelism on the bus by controlling transfers without the CPU.
[Figure: CPU, DMA controller, I/O device and memory sharing the bus.]
• A peripheral device controls the CPU's memory bus directly.
• DMA permits a peripheral (e.g. a UART) to transfer data to/from memory without having each byte handled by the CPU.
• DMA advantages:
– enables more efficient use of interrupts;
– increases data throughput;
– reduces hardware costs by eliminating the need for peripheral-specific FIFO buffers.

DMA operation
• On some event (such as an incoming data-available signal from a UART), the CPU notifies a separate device called the DMA controller.
• The DMA controller asserts a DMA request signal to the CPU, asking its permission to use the bus.
• The CPU completes its current bus activity and returns a DMA acknowledge signal to the DMA controller.
• The DMA controller reads/writes one or more memory bytes, driving the address, data and control signals as if it were the CPU itself.
• When complete, the DMA controller stops driving the bus and deasserts the DMA request signal.
• The CPU removes the DMA acknowledge signal and resumes control of the bus.
• The CPU sets up a DMA transfer by programming:
– the start address;
– the length;
– the transfer block length;
– the style of transfer.
• The DMA controller performs the transfer and signals when done.
• DMA is essential for providing a zero-copy implementation.

Remote DMA (RDMA)
• RDMA is direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.
• RDMA supports zero-copy networking by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system.
• Such transfers require no work from CPUs, caches, or context switches, and they continue in parallel with other system operations.
• When an application performs an RDMA read or write request, the application data is delivered directly to the network, reducing latency and enabling fast message transfer.