Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software Transactional Memory
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott Mahlke
Advanced Computer Architecture Lab.
University of Michigan
University of Michigan
Electrical Engineering and Computer Science
Multicore Architectures
• Industry-wide move to multicore
– Higher throughput
– More power efficient
• Great for parallel programs
• Sequential programs see little benefit
Examples: Intel 4-core Nehalem, AMD 4-core Shanghai, Sun Niagara 2, IBM Cell
Loop Parallelization
• Parallelizable loop: no cross-iteration register or memory dependences
– The iteration space i = 0-39 is split into two chunks, i = 0-19 and i = 20-39, which run on Core 0 and Core 1
• Bad news: there is only a limited number of parallel loops in general-purpose applications [Zhong ‘08]
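A minimal Python sketch of the split, assuming a made-up loop body whose iteration i touches only a[i] and b[i], and contiguous chunks of near-equal size per core. `chunk_ranges` and `run_parallel` are hypothetical helpers, not part of the framework from [Zhong ‘08]; Python threads are used only to show the partitioning and do not give real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n, parts):
    """Split iterations 0..n-1 into `parts` contiguous ranges of near-equal size."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def run_parallel(n, cores):
    """Run a made-up dependence-free loop a[i] = 2 * b[i], one chunk per worker."""
    a, b = [0] * n, list(range(n))

    def run_chunk(r):
        for i in r:
            a[i] = 2 * b[i]  # iteration i reads b[i] and writes a[i] only

    with ThreadPoolExecutor(max_workers=cores) as pool:
        # list() drains the results so any exception raised in a worker surfaces here
        list(pool.map(run_chunk, chunk_ranges(n, cores)))
    return a
```

With 40 iterations and 2 cores, `chunk_ranges(40, 2)` yields `range(0, 20)` and `range(20, 40)`. Because no iteration reads what another writes, the chunks can run in any order without changing the result.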
Loop Parallelization
(Figure: SPECfp benchmarks, data from [Zhong ‘08])
Speculative Loop Parallelization
• Speculatively parallelizable loop: the loop body accesses memory through a pointer whose address is unresolvable statically
– The iteration space i = 0-39 is divided into loop chunks (i = 0-9, 10-19, 20-29, 30-39)
– The chunks execute speculatively on Core 0 and Core 1
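To illustrate why such a loop cannot be proven safe at compile time, consider a hypothetical body `a[idx[i]] += 1`, where `idx` is only known at run time. The sketch below (function names are illustrative) computes each chunk's read and write sets and reports whether an earlier chunk writes an address that a later chunk touches. The same loop code is conflict-free for one `idx` and conflicting for another, which is exactly what speculation has to handle.

```python
def chunk_access_sets(idx, r):
    """Addresses read and written by the hypothetical body a[idx[i]] += 1 over chunk r."""
    reads, writes = set(), set()
    for i in r:
        reads.add(idx[i])   # load a[idx[i]]
        writes.add(idx[i])  # store a[idx[i]]
    return reads, writes


def has_cross_chunk_dependence(idx, chunks):
    """True if some chunk writes an address that a later chunk reads or writes."""
    sets = [chunk_access_sets(idx, r) for r in chunks]
    for k, (reads_k, writes_k) in enumerate(sets):
        for _, writes_j in sets[:k]:  # only earlier chunks matter for in-order semantics
            if writes_j & (reads_k | writes_k):
                return True
    return False


CHUNKS = [range(0, 10), range(10, 20), range(20, 30), range(30, 40)]
```

With `idx = list(range(40))` every chunk touches its own addresses and there is no dependence; with `idx[i] = i % 10` every chunk touches addresses 0 to 9 and the chunks conflict.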
Speculative Loop Parallelization
Supporting Thread-Level Speculation
• Execution of speculative loops requires
– Conflict detection
– A rollback mechanism
• Speculation can be supported by transactional memory
– Software TM is slow
– Hardware TM needs complex structures
• Previous TLS work requires hardware support
– Hydra [Hammond ‘98], Stampede [Steffan ‘98], POSH [Liu ‘06]
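A sequential simulation of the two requirements, conflict detection and rollback, under simplifying assumptions: all chunks first run speculatively against the same memory state (as if they ran concurrently), buffering their writes privately, and then commit in loop order. A chunk whose read set overlaps the writes of already-committed chunks discards its buffer and re-executes. `SpecChunk` and `run_speculative` are hypothetical names; real TLS systems track this state in hardware or in an STM runtime.

```python
class SpecChunk:
    """One loop chunk executing speculatively with a private write buffer."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()
        self.write_buf = {}

    def load(self, addr):
        if addr in self.write_buf:       # read our own speculative write
            return self.write_buf[addr]
        self.read_set.add(addr)          # remember for conflict detection
        return self.memory[addr]

    def store(self, addr, value):
        self.write_buf[addr] = value     # buffered: shared memory untouched until commit

    def conflicts_with(self, committed_writes):
        return bool(self.read_set & committed_writes)

    def commit(self):
        self.memory.update(self.write_buf)
        return set(self.write_buf)       # addresses this chunk wrote

    def rollback(self):
        self.read_set.clear()            # discard all speculative state
        self.write_buf.clear()


def run_speculative(memory, chunks, body):
    """Execute all chunks speculatively, then commit in loop order. Returns the abort count."""
    specs = []
    for r in chunks:                     # phase 1: every chunk sees the same initial memory
        s = SpecChunk(memory)
        for i in r:
            body(s, i)
        specs.append(s)

    committed_writes, aborts = set(), 0
    for s, r in zip(specs, chunks):      # phase 2: in-order conflict check and commit
        if s.conflicts_with(committed_writes):
            aborts += 1
            s.rollback()
            for i in r:                  # all earlier chunks have committed, so re-execution is safe
                body(s, i)
        committed_writes |= s.commit()
    return aborts
```

For the body `a[idx[i]] += 1` with `idx[i] = i % 10` and four chunks of ten iterations, every chunk after the first aborts once and re-executes, and each element ends at 4, matching sequential execution.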
Objectives
• Challenge
– Can we obtain speedup from speculative loop parallelization without additional hardware?
• Build a specialized software system
– Provide the functionality needed for speculation with software transactional memory
– Leverage the existing loop parallelization framework from [Zhong ‘08]
– Tightly couple the STM with the compiler to keep overhead low
Traditional STM Execution Flow
• Execution: Start TX → Execute TX (recording RdSet and WrSet) → End TX
• TX Commit: Consistency Check of the RdSet
– Fails: Abort
– Passes: Writeback WrSet to Memory → Commit
• High overhead: validating the RdSet
• High overhead: global locking
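A minimal sketch of this traditional flow, assuming per-address version numbers and a single global commit lock. It is a simplified word-based STM written for illustration, not TL2 (which uses versioned write locks and a global version clock). Both overhead points above appear in `commit`: the global lock and the read-set validation loop. Class and function names are hypothetical.

```python
import threading


class Memory:
    """Shared memory in which every address carries a version number."""

    def __init__(self, size):
        self.values = [0] * size
        self.versions = [0] * size
        self.commit_lock = threading.Lock()  # one global lock serializes all commits


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.rd_set = {}  # addr -> version seen at first read
        self.wr_set = {}  # addr -> buffered new value

    def read(self, addr):
        if addr in self.wr_set:
            return self.wr_set[addr]
        if addr not in self.rd_set:
            # Record the version before reading the value; commit() writes the value
            # before bumping the version, so a racing update is caught at validation.
            self.rd_set[addr] = self.mem.versions[addr]
        return self.mem.values[addr]

    def write(self, addr, value):
        self.wr_set[addr] = value

    def commit(self):
        with self.mem.commit_lock:                    # high overhead: global locking
            for addr, ver in self.rd_set.items():     # high overhead: validating RdSet
                if self.mem.versions[addr] != ver:
                    return False                      # abort
            for addr, value in self.wr_set.items():   # writeback WrSet to memory
                self.mem.values[addr] = value
                self.mem.versions[addr] += 1
            return True


def atomic(mem, fn):
    """Run fn(tx) as a transaction, re-executing until it commits."""
    while True:
        tx = Transaction(mem)
        fn(tx)
        if tx.commit():
            return
```

Four threads each running 100 transactions of `tx.write(0, tx.read(0) + 1)` leave address 0 at 400: a transaction whose read became stale fails validation and simply runs again.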
Ordering Transaction Commit
• TMs typically have no way of controlling commit order
• Loop iterations must commit in their original order
– Ensures proper rollback
• Requires centralized control to enforce ordering
Figure: TX 1 (i = 0-9), TX 2 (i = 10-19), TX 3 (i = 20-29), and TX 4 (i = 30-39) run on Core 0 and Core 1.
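One way to get in-order commit is a shared counter naming the next transaction allowed to commit, guarded by a condition variable. The sketch assumes chunks are dealt round-robin to cores, numbers transactions from 0, and uses a stand-in `sum` as each chunk's work; `CommitOrder` and `run_chunks` are hypothetical names.

```python
import threading


class CommitOrder:
    """Lets transaction k commit only after transactions 0..k-1 have committed."""

    def __init__(self):
        self.next_id = 0
        self.cond = threading.Condition()

    def commit_in_order(self, tx_id, commit_fn):
        with self.cond:
            self.cond.wait_for(lambda: self.next_id == tx_id)  # block until it is our turn
            commit_fn()
            self.next_id += 1
            self.cond.notify_all()                             # wake the next in line


def run_chunks(chunks, cores):
    """Execute chunks on `cores` threads (round-robin) and commit them in loop order."""
    order = CommitOrder()
    log = []  # (tx_id, result) in the order commits actually happen

    def worker(core):
        for tx_id in range(core, len(chunks), cores):
            result = sum(chunks[tx_id])  # stand-in for the chunk's speculative work
            order.commit_in_order(tx_id, lambda: log.append((tx_id, result)))

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Chunks may finish their work in any order, but the log always records commits as 0, 1, 2, 3. A single counter like this is also the kind of centralized control point the slide says ordering requires.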
STMlite
• Dedicated thread to control commits
– Called the Transaction Commit Manager (TCM)
– Performs consistency checks for all transactions
– Provides a single point at which in-order commit is easily enforced
• Bloom-filter-based signatures
– Hash read and write sets
– A similar technique is used by HTMs such as Bulk [Ceze ‘06]
– Low-cost consistency checks during commit
Bloom-Filter Based Signatures
Figure: an address (shown as the bit fields 101, 100, 010) is decoded into bits that are set in the signature, a bit array.
• Constant-time insertion and lookup
• Linear-time intersection (bitwise AND)
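A sketch of a signature built the way the figure suggests: the address is split into bit fields, and each field is decoded into a one-hot bit inside its own segment of the bit array. The field sizes (three 3-bit fields, giving a 24-bit signature) are assumptions for illustration; STMlite's actual signature size and hash functions are not given here. Because every address sets exactly one bit per segment, two signatures can share an address only if their bitwise AND is nonzero in every segment.

```python
FIELD_BITS = 3               # assumption: address split into 3-bit fields, as drawn
NUM_FIELDS = 3
SEG = 1 << FIELD_BITS        # each field decodes into one of 8 bits in its own segment
SEG_MASK = (1 << SEG) - 1


def decode(addr):
    """Decode each address field into a one-hot bit inside its own signature segment."""
    bits = 0
    for f in range(NUM_FIELDS):
        field = (addr >> (f * FIELD_BITS)) & (SEG - 1)
        bits |= 1 << (f * SEG + field)
    return bits


class Signature:
    def __init__(self):
        self.bits = 0  # the bit array, held in a Python int

    def insert(self, addr):  # constant time
        self.bits |= decode(addr)

    def may_contain(self, addr):  # constant time; false positives possible
        d = decode(addr)
        return (self.bits & d) == d

    def intersects(self, other):
        """Bitwise AND of the two arrays; an empty segment proves no common address."""
        common = self.bits & other.bits
        return all((common >> (f * SEG)) & SEG_MASK for f in range(NUM_FIELDS))
```

Inserting addresses 0 and 511 sets bits that also cover address 7 (0b000_000_111), so `may_contain(7)` returns True although 7 was never inserted. In a TM such a false positive only causes an unnecessary abort; a real conflict is never missed.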
STMlite Execution Flow
• Execution: Start TX → Execute TX (recording RdSig, WrSig, RdSet, and WrSet) → End TX
• TX Commit: Wait for Ready
– Ready: Writeback WrSet to Memory → Commit
– Otherwise: Abort
• Transaction Commit Manager (TCM): Consistency Check on the signatures → Flag Ready (or Abort)
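The sketch below models this division of labor with Python threads under stated assumptions: signatures are tiny one-hash Bloom filters over 64 bits, a transaction's start time is the commit-log length when it begins, and the TCM waits for each writeback to finish before checking the next transaction (stricter than necessary, but simple). `TCM`, `run_tx`, and the request fields are hypothetical names, not STMlite's interface.

```python
import queue
import threading

NBITS = 64  # assumption: tiny one-hash signatures, for illustration only


def signature(addrs):
    bits = 0
    for a in addrs:
        bits |= 1 << (hash(a) % NBITS)
    return bits


class TCM(threading.Thread):
    """Dedicated commit-manager thread: checks signatures and flags TXs in order."""

    def __init__(self, n_tx):
        super().__init__(daemon=True)
        self.n_tx = n_tx
        self.requests = queue.Queue()
        self.commit_log = []  # WrSig of each committed transaction, oldest first

    def run(self):
        pending, next_id = {}, 0
        while next_id < self.n_tx:
            req = self.requests.get()
            pending[req["id"]] = req
            while next_id in pending:
                req = pending.pop(next_id)
                # Consistency check: RdSig against WrSigs committed since the TX started.
                ok = not any(w & req["rd_sig"] for w in self.commit_log[req["start"]:])
                req["reply"].put("ready" if ok else "abort")
                if not ok:
                    break                 # the TX re-executes and resubmits the same id
                req["written"].wait()     # TX writes back its WrSet, then signals
                self.commit_log.append(req["wr_sig"])
                next_id += 1


def run_tx(tcm, memory, tx_id, body):
    """Execute a TX speculatively, hand its signatures to the TCM, wait for Ready."""
    while True:
        start = len(tcm.commit_log)
        rd_set, wr_set = set(), {}
        body(memory, rd_set, wr_set)      # body records reads and buffers writes
        req = {"id": tx_id, "start": start,
               "rd_sig": signature(rd_set), "wr_sig": signature(wr_set),
               "reply": queue.Queue(), "written": threading.Event()}
        tcm.requests.put(req)
        if req["reply"].get() == "ready":
            memory.update(wr_set)         # writeback WrSet to memory
            req["written"].set()
            return
        # Abort: the buffered wr_set is dropped and the TX runs again.
```

Eight transactions that each increment address 0 all conflict with one another, so several abort and re-execute, but they commit in order and the counter ends at 8. Only the TCM thread validates, and it compares short signatures rather than walking full read sets.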
Experimental Setup
• Implemented the framework in the LLVM compiler
• Benchmarks
– Stanford STAMP transactional benchmarks
– SPECfp benchmarks
• Run on a Sun Fire T2000
– 8-core UltraSPARC T1 processor
• Baseline STM is Sun’s TL2 [Dice ‘06]
STAMP Benchmarks
SPECfp Benchmarks
Conclusion
• STMlite
– Customized for speculative loop parallelization
– Transaction commit ordering
– Centralized consistency checks
– Hashing read/write sets with signatures
• Parallelization of sequential applications is feasible on commodity hardware
– Removes much of the slowdown traditionally associated with STM
Thank You!
Questions?
Transaction Execution and Commit
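Figure: timeline of a transaction (Start, Executing, End, Waiting, Writeback) alongside the Transaction Commit Manager (TCM) (Waiting, Checking). At End, the transaction passes its RdSig and WrSig to the TCM; the TCM signals Ready, the transaction writes back and commits, and the Commit Log keeps the WrSig of each committed transaction.
• Stale entries are periodically removed from the commit log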
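The slide does not say when an entry becomes stale. One plausible rule, assumed here: a transaction only checks WrSigs committed after it started, so entries older than the earliest start time among still-active transactions can be dropped. The sketch keeps a base offset so that recorded start times stay valid after pruning; `CommitLog` and its methods are hypothetical.

```python
class CommitLog:
    """Commit log of WrSigs, indexed by a global commit number that survives pruning."""

    def __init__(self):
        self.base = 0      # global commit number of entries[0]
        self.entries = []  # WrSig of each committed transaction, oldest first

    def append(self, wr_sig):
        self.entries.append(wr_sig)

    def now(self):
        """Commit number a transaction records as its start time."""
        return self.base + len(self.entries)

    def since(self, start):
        """WrSigs of transactions that committed after a TX with this start time began."""
        if start < self.base:
            raise ValueError("entries needed by this transaction were pruned")
        return self.entries[start - self.base:]

    def prune(self, active_starts):
        """Drop entries older than the earliest start time of any active transaction."""
        oldest = min(active_starts, default=self.now())
        drop = oldest - self.base
        if drop > 0:
            del self.entries[:drop]
            self.base = oldest
```

After three commits, pruning with one active transaction that started at commit number 2 keeps only the last WrSig; with no active transactions the whole log can go, while `now()` keeps counting from where it was.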