Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Multi-core Programming Evolution Based on slides from Intel Software College and Multi-Core Programming – increasing performance through software multi-threading by Shameem Akhter and Jason Roberts, Evolution of Multi-Core Technology Single-threading: Only one task processes at one time. Multitasking and Multithreading: Two or more tasks execute at one time by using content switching (Functionality)). HT Technology: Two single threads execute simultaneously on the same processor core. Multi-core Technology: Computational work of an application is divided and spread over multiple execution cores (Performance). Singlethreading Multithreading HT Technology Multi-core Technology Evolution of Multi-Core Technology 2009/9/3 2 1 Formal Definition of Threads, Processors, and Hyper-Threading in Terms of Resources • Thread – Basic unit of CPU utilization – Program counter, CPU state information, stack • Processor – Architecture state: general purpose CPU registers, interrupt controller registers, caches, buses, execution units, branch prediction logic • Logical Processor – Duplicates the architecture space of processor – Execution can be shared among different processors • HyperThreading – Intel version of Simultaneous Multi-Threading (SMT) – Makes single physical processor appear as multiple logical processors – OS can schedule to logical processors 2009/9/3 3 HyperThreading and Multicore • HyperThreading – Instructions from logical processors are persistent and execute on shared execution resources – Up to microarchitecture how and when to interleave the instructions – When one thread stalls, another can proceed • This includes cache misses, and branch mispredictions • Multicore – Chip multiprocessing • 2 or more execution cores in a single processor • 2 or more processors on a single die • Can share an on-chip cache • Can be combined with SMT 2009/9/3 4 2 • Single Processor • Multiprocessor 2009/9/3 • Hyper-Threading 5 • Multi-core Processor 2009/9/3 6 3 • Multicore Shared Cache • Multicore with HypeThreading Technology 2009/9/3 7 Defining Multi-Core Technology With Multi-core technology: • Two or more complete computational engines are placed in a single processor. CPU State Execution Units • Computers split the computational work of a threaded application and spread it over multiple execution cores. • More tasks get completed in less time increasing the performance and responsiveness of the system. CPU State Interrupt Logic Cache Interrupt Logic Execution Units Cache Processor with Multi-Core Technology Computational Work Core 1 Core 2 Core 3 Computational Work Distributed Between Multiple Cores 2009/9/3 8 4 Evolution to Dual Core Registers * + Registers * + Basic Processor Pipelined Execution 1 Instr. / Cycle 2009/9/3 9 Evolution to Dual Core Registers * + Registers * + + Registers * + + Superscalar Parallel Functional Units 2009/9/3 10 5 Evolution to Dual Core Registers * + Registers Registers * + + * Registers * + + Registers Registers * + * + Dual Core Double the transistors 2009/9/3 11 Scope Registers * Instruction Level – encoding/pipelined execution – increase throughput + Registers * + + Instruction Level – scheduling, superscalar Registers * + Outermost Loops, whole program – data/functional Registers * + 2009/9/3 12 6 HT Technology and Dual-Core Dual Processors Processor w/ HT Log. Log. Proc. Proc. Log. Log. Proc. Proc. Log. Log. Proc. Proc. Log. Log. Proc. Proc. Log. Log. Proc. Proc. Log. Log. Proc. Proc. APIC APIC APIC APIC APIC APIC APIC APIC APIC APIC APIC APIC Execution Execution Core Core Execution Execution Core Core Execution Execution Core Core System Bus APIC: Advanced Programmable Interrupt Controllers: Solve interrupt routing efficiency issues in DC-Processor DC and HT multiprocessor computer systems. Logical Logical Processor Processor Logical Logical Processor Processor Log. Log. Proc. Proc. Log. Log. Log. Log. Proc. Proc. Proc. Proc. Log. Log. Proc. Proc. APIC APIC APIC APIC APIC APIC APIC APIC APIC APIC Execution Execution Core Core Execution Execution Core Core Execution Execution Core Core APIC APIC Execution Execution Core Core System Bus 2009/9/3 13 Hyper-Threading vs. Dual Core Both threads without Multitasking or Hyper-Threading Technology: Both threads with MultiTasking Technology: Both threads with Hyperthreading: Time saved Time saved Both threads with Dual Core processors: Time 2009/9/3 14 7 Driving Greater Parallelism Normalized Performance vs. initial Intel® Pentium® 4 Processor Performance DUAL/MULTIDUAL/MULTI-CORE PERFORMANCE 10X SINGLESINGLE-CORE PERFORMANCE 3X 2008+ 2004 2000 FORECAST 2009/9/3 15 Power Vs. Frequency P ~ 4f2 = 4P0 2f 4x Power Increase f f P = 4P0 f P ~ 2f2 = 2P0 P0 ~ f2 Core f ‘only’ 2x Die/Socket 2009/9/3 16 8 How HT Technology and Dual-Core Work Physical processor cores Logical processors visible to OS Physical processor resource allocation Time Throughput Resource 1 Thread 2 Thread 1 Resource 2 Resource 3 Resource 1 d1 Threa Thre ad 2 + Resource 2 Resource 3 Resource 1 Thread 1 Resource 2 + Resource 3 Resource 1 Thread 2 Resource 2 Resource 3 2009/9/3 17 Today? … Multi-Core Power Cache 4 Large Core C1 3 2 2 1 1 4 4 3 3 2 2 1 1 C2 Cache C3 Power = 1/4 Performance C4 Performance = 1/2 Small Core 1 1 Multi-Core: Power efficient Better power and thermal management 2009/9/3 18 9 The Future?* … Many-core C GP C GP GPC GPC C GP SPC GPC SPC C SP C GP C GP SPC C GP C GP C GP C GP General Purpose Cores Special Purpose HW Interconnect fabric Heterogeneous Multi-Core Platform 2009/9/3 19 Taking full advantage of Multi-Core requires multi-threaded software •Increased responsiveness and worker productivity – Increased application responsiveness when different tasks run in parallel •Improved performance in parallel environments – When running computations on multiple processors •More computation per cubic foot of data center – Web-based apps are often multi-threaded by nature 2009/9/3 20 10 Threading - the most common parallel technique Process •OS creates process for each program loaded •Additional threads can be created within the process – – – – Data Each thread has its own Stack and Instruction Pointer All threads share code and data Single shared address space Thread local storage if required Code thread1() Stack IP Contrast with Distributed address parallelism: Message Passing Interface (MPI) thread2() threadN() Stack IP Stack IP … 2009/9/3 Independent Programs Application Application Application 21 Parallel Programs Application Application Application Threads Process Switch Thread Switch Threads allows one application to utilize the power of multiple processors 2009/9/3 22 11 Multi-Threading on Single-Core v Multi-Core • Optimal application performance on multi-core architectures will be achieved by effectively using threads to partition software workloads – On single core, multi-threading can only be used to hide latency • Ex: rather than block UI when printing spawn a thread, free UI thread to respond to user • Multi-threaded applications running on multi-core platforms have different design considerations than do multi-threaded applications running on single-core platforms – On single core can make assumptions to simplify writing and debugging – These may not be valid on multi-core platforms e.g areas like memory caching and thread priority 2009/9/3 23 Memory Caching Example • Each core may have own cache – May be out of sync – Ex: 2 thread on dual-core reading and writing neighboring memory locations • Data values may be in same cache line because of data locality • Memory system may mark the line as invalid due to write • Result is cache miss even though the data in cache being read is valid – Not an issue on single-core since only one cache 2009/9/3 24 12 Thread Priorities Example • Applications with two threads of different priorities – To improve performance developer assumes that higher thread will run without interference from lower priority thread – On single core valid since scheduler will not yield CPU to lower priority thread – On multi-core may schedule 2 threads on different cores, and they may run simultaneously making code unstable 2009/9/3 25 13