Chapter 18: Multicore Computers
Vy Luong

Multicore Computers
* Chip multiprocessor: combines two or more processors (cores) on a single die.
* Each core consists of registers, ALU, pipeline hardware, control unit, and L1 cache.

Hardware Performance Issues
* Goal: increase instruction-level parallelism.
* Superscalar: replicate execution resources, enabling parallel execution of instructions in parallel pipelines.
* Simultaneous multithreading (SMT): duplicate register banks so that multiple threads can share the use of pipeline resources.
* Problem: the complexity of managing multiple threads, and power consumption.

Why Multicore?
* Control power density by using more of the chip area for cache memory (instead of logic transistors).
* Near-linear performance improvement as cores are added.

Applications That Benefit From Multicore Systems
* Servers
* Multithreaded native applications
* Multiprocess applications
* Java applications
* Multi-instance applications

Valve Game Software
* Valve reprogrammed its Source engine software to use multithreading to exploit the power of multicore processor chips from Intel and AMD.
* Up to twice the performance with coarse threading.
* Hybrid threading approach: combines coarse-grained with fine-grained threading.
* Constructs scene-rendering lists for multiple scenes in parallel (along with other graphics-related simulation).

Multicore Organization
Variables in a multicore organization:
* Number of core processors on the chip
* Number of levels of cache memory
* Amount of cache memory that is shared

Superscalar or SMT?
* Intel Core Duo: individual cores are superscalar.
* Intel Core i7: implements SMT cores.
* Advantage of SMT: scales up the number of hardware threads that the system supports.
* A multicore system with four cores, each supporting four simultaneous threads through SMT, appears at the application level the same as a 16-core system.
* SMT appears to be more attractive than superscalar.

Intel Core Duo
* Introduced in 2006.
* Two x86 superscalar processors.
* Separate thermal control units.
* Advanced Programmable Interrupt Controller (APIC).

Intel Core i7
* Introduced in November 2008.
* Four x86 SMT processors.
* DDR3 memory controller.
* QuickPath Interconnect.

ARM11 MPCore
* Can be configured with up to four processors.
* Distributed interrupt controller (DIC).
* Timer.
* Watchdog.

Interrupt Handling
* The DIC supports between 0 and 255 hardware interrupt inputs.
* The DIC maintains a list of interrupts, showing their priority and status.
The DIC satisfies two requirements:
* Routing an interrupt request to a single CPU or to multiple CPUs, as required.
* Providing interprocessor communication, so that a thread on one CPU can cause activity by a thread on another CPU.
Interrupt states:
* Inactive: not asserted, or already fully processed by this CPU, though it may still be pending or active in other CPUs to which it is targeted.
* Pending: asserted, but processing has not started.
* Active: processing has started but is not completed.

Cache Coherency
* Snoop control unit (SCU): resolves bottlenecks related to access to shared data.
The SCU introduces three types of optimization:
* Direct data intervention: enables copying clean data from one CPU's L1 data cache to another CPU's L1 data cache without accessing external memory.
* Duplicated tag RAMs: duplicated versions of the L1 tag RAMs, used by the SCU to check for data availability before sending coherency commands to the relevant CPUs.
* Migratory lines: enables moving dirty data from one CPU to another without writing to L2 and reading the data back in from external memory.
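
The three interrupt states listed under Interrupt Handling (inactive, pending, active) can be read as a small per-CPU state machine. The sketch below is only an illustration of that idea in Python. The class `ToyInterruptController` and its methods `assert_irq`, `acknowledge`, and `complete` are hypothetical names chosen for this example and do not correspond to the ARM11 MPCore register interface; priorities are left out.

```python
from enum import Enum


class IrqState(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    ACTIVE = "active"


class ToyInterruptController:
    """Hypothetical model of per-CPU interrupt status (not the real DIC interface)."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        # Maps (irq_id, cpu) to a state; missing entries count as INACTIVE.
        self._state = {}

    def state(self, irq_id, cpu):
        return self._state.get((irq_id, cpu), IrqState.INACTIVE)

    def assert_irq(self, irq_id, target_cpus):
        """Route an interrupt to one or more CPUs; it becomes PENDING on each."""
        for cpu in target_cpus:
            if not 0 <= cpu < self.num_cpus:
                raise ValueError(f"no such CPU: {cpu}")
            if self.state(irq_id, cpu) is IrqState.INACTIVE:
                self._state[(irq_id, cpu)] = IrqState.PENDING

    def acknowledge(self, irq_id, cpu):
        """The CPU starts processing: PENDING -> ACTIVE."""
        if self.state(irq_id, cpu) is not IrqState.PENDING:
            raise ValueError("interrupt is not pending on this CPU")
        self._state[(irq_id, cpu)] = IrqState.ACTIVE

    def complete(self, irq_id, cpu):
        """The CPU finishes processing: ACTIVE -> INACTIVE, for this CPU only."""
        if self.state(irq_id, cpu) is not IrqState.ACTIVE:
            raise ValueError("interrupt is not active on this CPU")
        self._state[(irq_id, cpu)] = IrqState.INACTIVE


# Usage: one interrupt routed to two CPUs; only CPU 0 handles it so far.
dic = ToyInterruptController(num_cpus=4)
dic.assert_irq(irq_id=42, target_cpus=[0, 1])
dic.acknowledge(42, cpu=0)
dic.complete(42, cpu=0)
print(dic.state(42, 0), dic.state(42, 1))
```

After these calls the interrupt is inactive on CPU 0 but still pending on CPU 1, which is the situation the "Inactive" definition describes: fully processed by one CPU while still outstanding on another CPU to which it was targeted.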