Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Transactional Memory An Overview of Hardware Alternatives David A. Wood University of Wisconsin Transactional Memory Workshop April 8th, 2005 What’s database got to do with it? Atomicity Consistency Correct at begin and end All (or some) memory ops, not just database objects Isolation All updates, or none Partial work not visible Inputs stay stable Durability Survive “system” failures October 21, 2004 Despite increasing awareness of failures Thread-Level Transactional Memory 2 801 Database Storage Lock bits on virtual memory 128 byte granularity Added to pagetable and TLB Caches user’s lock state Trap on lock conflict No h/w for logging, abort, etc. Only uniprocessors Memory PPN TLB 801 and RS/6000 Was this transactional memory? October 21, 2004 Tid, Lock bits Tid Thread-Level Transactional Memory CPU 3 SQL/801 “The development of SQL/801 was greatly simplified because, with minor exceptions, it considers only a single user. It achieves multiuser concurrency [on a uniprocessor] by running in multiple processes using the shared database storage….” Chang and Mergen, ’88 Largest transactional memory application Only real hardware transactional memory implementation No one seems to be looking at what they learned October 21, 2004 Thread-Level Transactional Memory 4 Basic Transactional Mechanisms Isolation Version management Detect when transactions conflict Track read and write sets Record new and old values Atomicity Commit new values Abort back to old values October 21, 2004 Thread-Level Transactional Memory 5 H/W Transactional Memory Systems Knight’s Lisp Work Transactional Memory Oklahoma Update SLE/TLR Transactional Coherence and Consistency Unbounded TM Virtual TM Thread-level TM October 21, 2004 Thread-Level Transactional Memory 6 Knight’s Lisp Work [’86] Parallel execution of sequential code Break program into “transaction blocks” Execute transactions in parallel Multiple loads in a transaction Exactly one store ends the transaction No register state passed between transactions Track dependences (i.e., read set) Abort and restart on conflicting write Transactions commit in sequential order October 21, 2004 Broadcast writes on commit Thread-Level Transactional Memory 7 Knight’s Hardware Two caches Dependency cache Confirm cache Holds write set Supports multiple writes Commits Check dep. cache Broadcast writes Fast aborts Memory Tracks read set Bus monitor detects conflicts Valid Old Value Depends Old Value A New Value Invalid Invalidate Confirm cache Use old values in Dep. Cache Immediately restart execution Dependency Cache Confirm Cache CPU Spawned two threads: TLS & TM October 21, 2004 Thread-Level Transactional Memory 8 H&M’s Transactional Memory [’93] Targets explicitly parallel (non-functional) codes Motivated by lock-free data structures Transactions: Read and write multiple locations Commit in arbitrary order Implicit begin, explicit commit operations Abort affects memory, not registers Software manages restarting execution Validate instruction detects pending abort Implementation extends cache coherence Read/Write locks correspond to MOESI states Add orthogonal transaction states October 21, 2004 Thread-Level Transactional Memory 9 H&M’s Transactional Memory Adds Transaction Cache Before and after image Even for read-only data Small, fully associative Abort on all conflicts NACK conflicting requests Abort NACKed transaction Fast commit and abort Memory Stores all data accessed by transactions 2 copies of each line Change trans. cache state October 21, 2004 M M XCommit New Value S M XAbort Old Value S S XCommit Old Value S XAbort Old Value Cache Transaction Cache CPU Thread-Level Transactional Memory 10 SLE/TLR Hardware exploits speculative processors Read sets tracked by coherence protocol Write set maintained in store queue Abort restarts execution, including register state Speculative lock elision (SLE) Elide locks from the dynamic execution stream Convert critical sections to optimistic transactions Concurrently execute non-conflicting transactions Fall back on explicit locks if conflicts Transactional Lock Removal (TLR) Resolve conflicts using priority ordering (timestamps) Delay lower priority transactions Deadlock and starvation free October 21, 2004 Thread-Level Transactional Memory 11 Transactional Coherence and Consistency [’04] TCC unifies coherence, memory consistency, and transaction support Transaction ordering All transactions, all the time Ordered, Unordered, Partially Ordered Supports thread-level speculation Optimistic concurrency model Unordered transactions serialize at commit Conflicts detected at commit October 21, 2004 Thread-Level Transactional Memory 12 TCC Write buffer ~4 kB, holds new values until commit Shadow register file checkpoints architectural registers On-Chip Interconnect Broadcast updates at commit L2 Cache Logically Shared CPU SRF October 21, 2004 L1 D L1 cache tracks read set, bit per line Thread-Level Transactional Memory 13 TCC Commits are sequential Broadcasts addresses of all updates Supports large transactions Serialize all other transactions Cannot abort large transactions Grabs and holds the commit bus Updates affect L2/Mem; no undo Extensions forthcoming talk to Kunle and Christos October 21, 2004 Thread-Level Transactional Memory 14 Unbounded Transactional Memory (UTM) Unbounded transactions Arbitrary size Arbitrary duration Not limited by interrupts, context switch, etc. Complex implementation Not limited by write buffer, cache, or memory Not justified by performance Settle for “nearly” unbounded transactions Much simpler hardware October 21, 2004 Thread-Level Transactional Memory 15 Transactional Linux 9.355x10^6 make dbench 10^6 10^4 10^2 Log-log scale 1 1 10 100 1000 Fully associative cache size (64 byte lines) 8144 Almost all of the transactions require < 100 cache lines 99.9% need fewer than 54 cache lines There are, however, some very large transactions! >500k-byte fully-associative cache required October 21, 2004 Thread-Level Transactional Memory 16 Large Transaction Memory (LTM) Register checkpoints Cache tracks read and write sets Snapshot of rename maps T-bits mark transactional blocks Cache holds new data values “in place” O-bit indicates overflow to in-memory hashtable Memory holds committed state Abort invalidates all modified blocks Miss on re-execution Transactional writes force memory updates October 21, 2004 Repeated writes (e.g., to local data) are written through Thread-Level Transactional Memory 17 Virtual Transactional Memory (VTM) Only an overflow mechanism No overhead on common in-cache case Low overhead when no conflict Shared Bloom Filter rules out conflicts Filter resides in virtual memory Higher overhead on possible conflict Check shared overflow counter on cache miss Hardware table walk to detect actual conflict Table resides in virtual memory Only incurred by large transactions with likely conflict Supports context switches and paging October 21, 2004 Thread-Level Transactional Memory 18 801 revisited Why didn’t 801 database storage succeed? Answer #1: Lock bits helped performance and simplified software Changing lock bits requires TLB shootdown Too complicated for the benefits? Not a current problem: transaction h/w is easy Answer #2: Not universally available DB2 was (is) multiplatform Can’t rely on feature only available in one architecture Still a relevant concern October 21, 2004 Thread-Level Transactional Memory 19 Need Standard Transaction Interface Abstract away resource requirements Virtualize transactional memory Support large, long transactions Transaction semantics between threads NOT a hardware property Permit range of implementations Hardware, software, and combinations October 21, 2004 Thread-Level Transactional Memory 20 Thread-level Transactional Memory Abstract mechanisms Version management Isolation Update memory “in place” Log “before images” to thread level VM Logically extend memory words with read and write bits Implementations can be conservative (e.g., blocks) Atomicity Commits easy due to in place updates Aborts trap to user-level software October 21, 2004 Hardware can accelerate common case Thread-Level Transactional Memory 21 Conclusions Make the common case fast Handle the uncommon case 99+% of transactions fit in hardware Lots of alternatives Make both commits and aborts fast Large transactions will occur, deal with ‘em Shouldn’t be limited by hardware Agree on a common abstraction Success requires multi-platform support Let vendors compete on price-performance October 21, 2004 Thread-Level Transactional Memory 22