Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software Transactional Memory
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott Mahlke
Advanced Computer Architecture Lab.
University of Michigan
University of Michigan Electrical Engineering and Computer Science

Multicore Architectures
• Industry-wide move to multicore
  – Higher throughput
  – More power efficient
• Great for parallel programs
• Sequential programs see little benefit
[Figure: Intel 4 Core Nehalem, AMD 4 Core Shanghai, Sun Niagara 2, IBM Cell]

Loop Parallelization
[Figure: parallelizable loop, i = 0-39, split into i = 0-19 on Core 0 and i = 20-39 on Core 1]
• No cross-iteration register or memory dependences
• Bad news: limited number of parallel loops in general purpose applications [Zhong ’08]

Loop Parallelization
[Chart: SPECfp [Zhong ’08]]

Speculative Loop Parallelization
[Figure: speculatively parallelizable loop, i = 0-39, split into loop chunks i = 0-9, i = 10-19, i = 20-29, i = 30-39 across Core 0 and Core 1; each chunk is marked "Pointer?"]
• Memory address is unresolvable statically

Speculative Loop Parallelization
[Figure]

Supporting Thread Level Speculation
• Execution of speculative loops requires
  – Conflict detection
  – Rollback mechanism
• Speculation can be supported by transactional memory
  – Software is slow
  – Hardware needs complex structures
• Previous TLS work requires hardware
  – Hydra [Hammond ’98], Stampede [Steffan ’98], POSH [Liu ’06]

Objectives
• Challenge
  – Can we get speedup supporting speculative loop parallelization without additional hardware?
• Build a specialized software system
  – Provide functionality needed for speculation with software transactional memory
  – Leverage existing loop parallelization framework from [Zhong ’08]
  – Tightly couple STM with compiler to ensure low overhead

Traditional STM Execution Flow
[Diagram: Start TX -> Execute TX (builds RdSet and WrSet) -> TX Commit: Consistency Check -> Abort, or Commit and Writeback WrSet to Memory -> End TX]
• High Overhead: Validating RdSet
• High Overhead: Global Locking

Ordering Transaction Commit
• TMs typically have no way of controlling commit order
• Loop iterations must commit in original order
  – Ensures proper rollback
• Requires centralized control to enforce ordering
[Figure: TX 1 (i = 0-9), TX 2 (i = 10-19), TX 3 (i = 20-29), TX 4 (i = 30-39) distributed across Core 0 and Core 1]

STMlite
• Dedicated thread to control commits
  – Called the Transaction Commit Manager (TCM)
  – Performs consistency checks for all transactions
  – Provides a single point at which to enforce in-order commit
• Bloom-filter based signatures
  – Hash read and write sets
  – Similar technique used by HTMs like Bulk [Ceze ’06]
  – Low-cost consistency checks during commit

Bloom-Filter Based Signatures
[Figure: Address 101 100 010 -> Decode -> Signature (bit array) 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0]
• Constant time insertion and find
• Linear time intersection (bitwise AND)

STMlite Execution Flow
[Diagram, transaction side: Start TX -> Execute TX (builds RdSig, WrSig, WrSet) -> Flag Ready -> Wait for Ready -> Abort, or Commit and Writeback WrSet to Memory -> End TX]
[Diagram, Transaction Commit Manager (TCM) side: Consistency Check -> Abort or Commit]

Experimental Setup
• Implemented framework in LLVM Compiler
• Benchmarks
  – Stanford STAMP transactional benchmarks
  – SPECfp benchmarks
• Run on Sunfire T2000
  – 8-core UltraSPARC T1 processor
• Baseline STM is Sun’s TL2 [Dice ’06]

STAMP Benchmarks
[Chart]

SPECfp Benchmarks
[Chart]

Conclusion
• STMlite
  – Customized for speculative loop parallelization
  – Transaction commit ordering
  – Centralized consistency checks
  – Hashing read/write sets with signatures
• Parallelization of sequential applications is feasible on commodity hardware
  – Removes much of the slowdown traditionally associated with STM

Thank You! Questions?

Transaction Execution and Commit
[Diagram, transaction states: Start -> Executing (builds RdSig, WrSig) -> Ready -> Waiting -> Commit -> Writeback -> End]
[Diagram, Transaction Commit Manager (TCM) states: Waiting -> Checking -> Commit; the Commit Log holds an (End, WrSig) entry per committed transaction]
• Stale entries periodically removed from commit log
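
The signature operations and the TCM's in-order consistency check from the slides can be illustrated with a small sketch. This is not the STMlite implementation: the 64-bit signature width, the two `blake2b`-derived hash positions, and the names `Signature`, `CommitManager`, `try_commit`, and `start_index` are all assumptions made for illustration. The sketch shows constant-time insertion, intersection by bitwise AND, and a commit manager that only lets the next transaction in loop order commit, aborting it if its read signature overlaps the write signature of any transaction that committed while it was running.

```python
import hashlib

SIG_BITS = 64  # assumed signature width; the slides do not give one


def _bit_positions(addr, k=2):
    """Map an address to k bit positions (hypothetical hashing scheme)."""
    digest = hashlib.blake2b(addr.to_bytes(8, "little"), digest_size=8).digest()
    h = int.from_bytes(digest, "little")
    return [(h >> (16 * i)) % SIG_BITS for i in range(k)]


class Signature:
    """Bloom-filter style signature stored as an integer bit array."""

    def __init__(self):
        self.bits = 0

    def insert(self, addr):
        # Constant time: set k bits.
        for pos in _bit_positions(addr):
            self.bits |= 1 << pos

    def may_contain(self, addr):
        # Constant time: may report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in _bit_positions(addr))

    def intersects(self, other):
        # Bitwise AND; a non-zero result is a conservative conflict signal.
        return (self.bits & other.bits) != 0


class CommitManager:
    """Hypothetical TCM that validates transactions strictly in loop order."""

    def __init__(self):
        self.commit_log = []      # (commit index, write signature) pairs
        self.next_to_commit = 0   # index of the next chunk allowed to commit

    def try_commit(self, tx_index, start_index, rd_sig, wr_sig):
        """Return "wait", "abort", or "commit".

        start_index is the value of next_to_commit that the transaction
        observed when it started executing (an assumed bookkeeping scheme).
        """
        if tx_index != self.next_to_commit:
            return "wait"  # an earlier chunk has not committed yet
        for idx, logged_wr in self.commit_log:
            # Only transactions that committed after we started can conflict.
            if idx >= start_index and rd_sig.intersects(logged_wr):
                return "abort"
        self.commit_log.append((tx_index, wr_sig))
        self.next_to_commit += 1
        return "commit"

    def prune(self, oldest_active_start):
        # Drop stale entries that no running transaction can conflict with.
        self.commit_log = [e for e in self.commit_log if e[0] >= oldest_active_start]
```

A possible usage, with two chunks that start together and where chunk 1 reads an address chunk 0 writes:

```python
tcm = CommitManager()
rd0, wr0, rd1, wr1 = Signature(), Signature(), Signature(), Signature()
wr0.insert(0x1000)
rd1.insert(0x1000)
print(tcm.try_commit(1, 0, rd1, wr1))  # chunk 1 must wait for chunk 0
print(tcm.try_commit(0, 0, rd0, wr0))  # chunk 0 commits
print(tcm.try_commit(1, 0, rd1, wr1))  # chunk 1 read stale data and aborts
print(tcm.try_commit(1, 1, rd1, wr1))  # after re-execution, chunk 1 commits
```

Because the check compares signatures rather than full read sets, it may abort a transaction that did not truly conflict, but it cannot miss a real conflict.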