Multi-core Programming
Evolution
Based on slides from Intel Software College
and
Multi-Core Programming: Increasing Performance through Software Multi-threading
by Shameem Akhter and Jason Roberts
Evolution of Multi-Core Technology
Single-threading:
Only one task executes at a time.
Multitasking and Multithreading:
Two or more tasks execute at the same time by using context switching (Functionality).
HT Technology:
Two threads execute simultaneously on the same processor core.
Multi-core Technology:
Computational work of an application is divided and spread over multiple execution cores (Performance).
[Figure: progression from single-threading to multithreading to HT Technology to multi-core technology]
Formal Definition of Threads, Processors,
and Hyper-Threading in Terms of Resources
• Thread
– Basic unit of CPU utilization
– Program counter, CPU state information, stack
• Processor
– Architecture state (general-purpose CPU registers, interrupt controller registers) plus physical resources (caches, buses, execution units, branch prediction logic)
• Logical Processor
– Duplicates the architecture state of the processor
– Execution resources are shared among the logical processors
• Hyper-Threading
– Intel's version of Simultaneous Multi-Threading (SMT)
– Makes a single physical processor appear as multiple logical processors
– OS can schedule threads onto the logical processors (see the sketch below)
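
For concreteness, here is a minimal sketch (not from the slides; it assumes Linux with glibc and GNU extensions) of how a program can ask the OS how many logical processors it exposes and pin the calling thread to one of them:

#define _GNU_SOURCE            /* needed for cpu_set_t macros and pthread_setaffinity_np */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Number of logical processors the OS can schedule onto;
       with Hyper-Threading enabled, each hardware thread counts as one. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    printf("OS sees %ld logical processors\n", n);

    /* Optionally pin the calling thread to logical processor 0. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        fprintf(stderr, "could not set thread affinity\n");

    return 0;
}

Compile with -pthread; on other operating systems the equivalent calls differ.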
HyperThreading and Multicore
• HyperThreading
– Instructions from logical processors are persistent and execute on
shared execution resources
– It is up to the microarchitecture how and when to interleave the instructions
– When one thread stalls, another can proceed
• This includes cache misses and branch mispredictions
• Multicore
– Chip multiprocessing
• 2 or more execution cores in a single processor
• 2 or more processors on a single die
• Can share an on-chip cache
• Can be combined with SMT
[Figures: block diagrams of a single processor, a multiprocessor, Hyper-Threading, a multi-core processor, a multi-core processor with shared cache, and multi-core with Hyper-Threading Technology]
Defining Multi-Core Technology
With Multi-core technology:
• Two or more complete computational engines are placed in a single processor.
• Computers split the computational work of a threaded application and spread it over multiple execution cores (see the sketch below).
• More tasks get completed in less time, increasing the performance and responsiveness of the system.
[Figure: a processor with multi-core technology, each engine with its own CPU state, interrupt logic, execution units, and cache; computational work is distributed between multiple cores (Core 1, Core 2, Core 3)]
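
As a rough illustration of splitting computational work over multiple cores (a minimal sketch, not from the slides; the array, the slice struct, and the thread count are invented for illustration and assume POSIX threads), each thread sums half of an array and the partial results are combined; on a dual-core machine the OS can run the two threads on different cores at the same time:

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

struct slice { int begin, end; double sum; };

/* Each thread sums its own slice of the array. */
static void *partial_sum(void *arg)
{
    struct slice *s = arg;
    s->sum = 0.0;
    for (int i = s->begin; i < s->end; i++)
        s->sum += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = 1.0;

    /* Split the work in two and hand each half to a thread. */
    struct slice s[2] = { { 0, N / 2, 0.0 }, { N / 2, N, 0.0 } };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, partial_sum, &s[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("total = %f\n", s[0].sum + s[1].sum);
    return 0;
}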
Evolution to Dual Core
[Figure: a basic processor – one set of registers feeding a multiplier and an adder; pipelined execution, 1 instruction per cycle]
Evolution to Dual Core
[Figure: a superscalar processor – parallel functional units fed from the same registers]
Evolution to Dual Core
[Figure: a dual-core processor – the entire core is duplicated, doubling the transistors]
Scope
• Instruction level – encoding/pipelined execution – increase throughput
• Instruction level – scheduling, superscalar
• Outermost loops, whole program – data/functional
[Figure: the same processor diagrams, annotated with the scope of parallelism exploited at each stage]
HT Technology and Dual-Core
[Figure: four configurations – dual processors, a processor with HT, a dual-core (DC) processor, and a dual-core processor with HT – each shown as logical processors, APICs, and execution cores attached to the system bus]
APIC (Advanced Programmable Interrupt Controller): solves interrupt-routing efficiency issues in multiprocessor computer systems.
Hyper-Threading vs. Dual Core
[Figure: execution timelines for two threads – without multitasking or Hyper-Threading Technology, with multitasking, with Hyper-Threading (time saved), and with dual-core processors (time saved)]
Driving Greater Parallelism
[Chart: performance normalized to the initial Intel® Pentium® 4 processor, 2000 to 2008+ (forecast) – single-core performance reaches roughly 3X, while dual/multi-core performance reaches roughly 10X]
Power vs. Frequency
For a single core, power scales roughly as the square of frequency: P0 ∝ f².
[Figure: one core run at 2f compared with two cores run at f on the same die/socket]
• One core at frequency 2f: P ∝ (2f)² = 4f², a 4x power increase (P = 4·P0).
• Two cores at frequency f: P ∝ 2·f², 'only' a 2x power increase (P = 2·P0).
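To make the trade-off concrete with made-up numbers: if one core at 2 GHz dissipates P0 = 25 W, the quadratic model above predicts roughly 4 × 25 W = 100 W for a single core pushed to 4 GHz, but only about 2 × 25 W = 50 W for two 2 GHz cores on the same die, which is why adding cores is the more power-efficient way to add throughput.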
How HT Technology and Dual-Core Work
[Figure: the physical processor cores, the logical processors visible to the OS, and the physical processor resource allocation over time; with HT Technology, Thread 1 and Thread 2 are interleaved on one core's shared resources (Resource 1-3), while with dual-core each thread gets its own core's resources, increasing throughput]
Today? … Multi-Core
[Figure: one large core compared with four small cores (C1-C4) sharing a cache; relative to the large core, a small core has Power = 1/4 and Performance = 1/2]
Multi-Core:
• Power efficient
• Better power and thermal management
The Future?* … Many-core
[Figure: a heterogeneous multi-core platform – many general purpose cores (GPC) and special purpose hardware cores (SPC) connected by an interconnect fabric]
Taking full advantage of Multi-Core requires
multi-threaded software
• Increased responsiveness and worker productivity
– Increased application responsiveness when different tasks run in parallel
• Improved performance in parallel environments
– When running computations on multiple processors
• More computation per cubic foot of data center
– Web-based apps are often multi-threaded by nature
Threading - the most common parallel technique
• OS creates a process for each program loaded
• Additional threads can be created within the process (see the sketch below):
– Each thread has its own stack and instruction pointer (IP)
– All threads share code and data
– Single shared address space
– Thread-local storage if required
Contrast with distributed-address-space parallelism: Message Passing Interface (MPI)
[Figure: a process containing Data, Code, and threads thread1() … threadN(), each with its own Stack and IP]
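
A minimal sketch of these ideas with POSIX threads (not from the slides; the names are invented, and the __thread keyword assumes GCC/Clang): the global counter lives in the single shared address space and is visible to every thread, the local variable lives on each thread's own stack, and the __thread variable gives each thread its own private copy (thread-local storage).

#include <pthread.h>
#include <stdio.h>

int shared_total = 0;                  /* shared data: one copy for the whole process */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static __thread int tls_calls = 0;     /* thread-local storage: one copy per thread */

static void *worker(void *arg)
{
    int id = *(int *)arg;              /* 'id' lives on this thread's own stack */
    tls_calls++;                       /* touches only this thread's copy */

    pthread_mutex_lock(&lock);         /* shared data must be protected */
    shared_total += id;
    pthread_mutex_unlock(&lock);

    printf("thread %d: tls_calls = %d\n", id, tls_calls);
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int ids[2] = { 1, 2 };
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    printf("shared_total = %d\n", shared_total);   /* both threads updated the same variable */
    return 0;
}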
[Figure: independent programs run as separate applications and require a process switch between them, while a parallel program runs multiple threads within a single application and needs only a thread switch]
Threads allow one application to utilize the power of multiple processors.
Multi-Threading on Single-Core vs. Multi-Core
• Optimal application performance on multi-core architectures will be achieved by effectively using threads to partition software workloads
– On a single core, multi-threading can only be used to hide latency
• Ex: rather than blocking the UI while printing, spawn a print thread so the UI thread stays free to respond to the user (see the sketch below)
• Multi-threaded applications running on multi-core platforms have different design considerations than multi-threaded applications running on single-core platforms
– On a single core, developers can make assumptions that simplify writing and debugging
– These assumptions may not be valid on multi-core platforms, e.g. in areas like memory caching and thread priority
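
A minimal sketch of that latency-hiding pattern with POSIX threads (print_job and the polling loop are invented for illustration): the slow, blocking work is handed to a background thread so the "UI" thread keeps servicing the user.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for a slow, blocking operation such as printing. */
static void *print_job(void *arg)
{
    (void)arg;
    sleep(3);                          /* pretend to spool a large document */
    printf("print job finished\n");
    return NULL;
}

int main(void)
{
    pthread_t printer;
    pthread_create(&printer, NULL, print_job, NULL);   /* do not block the UI */

    /* The "UI" thread stays responsive while the job runs in the background. */
    for (int i = 0; i < 3; i++) {
        printf("UI still responding to the user...\n");
        sleep(1);
    }

    pthread_join(printer, NULL);       /* wait for the job before exiting */
    return 0;
}

Even on a single core this improves responsiveness; the two threads simply take turns instead of truly running at once.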
Memory Caching Example
• Each core may have its own cache
– The caches may be out of sync
– Ex: two threads on a dual-core processor reading and writing neighboring memory locations (see the sketch below)
• The data values may sit in the same cache line because of data locality
• A write by one core causes the memory system to mark the whole line invalid
• The result is a cache miss even though the data in the cache being read is still valid
– Not an issue on a single core, since there is only one cache
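
A minimal sketch of that scenario, often called false sharing (a hypothetical example, not from the slides; it assumes POSIX threads and 64-byte cache lines): two threads increment adjacent counters that normally land in the same cache line, so each write invalidates the line in the other core's cache even though neither thread ever reads stale data.

#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000

/* 'a' and 'b' are neighbors, so they usually share a 64-byte cache line:
   a write by one core marks the whole line invalid in the other core's cache. */
struct { long a; long b; } counters;

/* Padding would push 'b' onto a different cache line (assuming the struct
   starts at a line boundary) and remove the ping-ponging. */
struct { long a; char pad[64 - sizeof(long)]; long b; } padded_counters;

static void *bump_a(void *arg) { (void)arg; for (long i = 0; i < ITERS; i++) counters.a++; return NULL; }
static void *bump_b(void *arg) { (void)arg; for (long i = 0; i < ITERS; i++) counters.b++; return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* The results are correct; only performance suffers.  On a dual-core
       machine this typically runs much slower than the same loops applied
       to padded_counters, and the slowdown does not exist on a single core. */
    printf("a = %ld, b = %ld\n", counters.a, counters.b);
    return 0;
}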
Thread Priorities Example
• Consider an application with two threads of different priorities
– To improve performance, the developer assumes the higher-priority thread will run without interference from the lower-priority thread
– On a single core this is valid, since the scheduler will not yield the CPU to the lower-priority thread
– On multi-core, the OS may schedule the two threads on different cores, and they may run simultaneously, making the code unstable (see the sketch below)
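
A minimal sketch of the pitfall with POSIX threads (the thread names and the x == y invariant are invented, and the actual priority-setting calls are omitted for brevity): the "high-priority" writer updates a pair of variables without a lock, relying on never being preempted by the "low-priority" reader.

#include <pthread.h>
#include <stdio.h>

static int x = 0, y = 0;               /* the code assumes x == y always holds */

static void *high_priority_writer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= 1000000; i++) {
        /* Assumption: the low-priority reader never runs while this two-step
           update is in progress.  Plausible on a single core with strict
           priorities; false on multi-core, where both threads can execute
           simultaneously on different cores. */
        x = i;
        y = i;
    }
    return NULL;
}

static void *low_priority_reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        if (x != y)                    /* can fire on multi-core: pair seen mid-update */
            printf("inconsistent state: x=%d y=%d\n", x, y);
    }
    return NULL;
}

int main(void)
{
    pthread_t hi, lo;
    pthread_create(&hi, NULL, high_priority_writer, NULL);
    pthread_create(&lo, NULL, low_priority_reader, NULL);
    pthread_join(hi, NULL);
    pthread_join(lo, NULL);
    /* The portable fix is explicit synchronization (a mutex or atomics) around
       both the update and the read; thread priorities alone no longer
       guarantee mutual exclusion on multi-core hardware. */
    return 0;
}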