Download David Wood

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Serializability wikipedia , lookup

Concurrency control wikipedia , lookup

Transcript
Transactional Memory
An Overview of Hardware Alternatives
David A. Wood
University of Wisconsin
Transactional Memory Workshop
April 8th, 2005
What’s database got to do with it?

Atomicity


Consistency


Correct at begin and end
All (or some) memory ops,
not just database objects
Isolation



All updates, or none
Partial work not visible
Inputs stay stable
Durability

Survive “system” failures
October 21, 2004
Despite increasing
awareness of failures
Thread-Level Transactional Memory
2
801 Database Storage

Lock bits on virtual memory






128 byte granularity
Added to pagetable and TLB
Caches user’s lock state
Trap on lock conflict
No h/w for logging, abort,
etc.
Only uniprocessors

Memory
PPN
TLB
801 and RS/6000
Was this transactional memory?
October 21, 2004
Tid, Lock bits
Tid
Thread-Level Transactional Memory
CPU
3
SQL/801

“The development of SQL/801 was greatly simplified
because, with minor exceptions, it considers only a
single user. It achieves multiuser concurrency [on a
uniprocessor] by running in multiple processes using
the shared database storage….”
Chang and Mergen, ’88



Largest transactional memory application
Only real hardware transactional memory implementation
No one seems to be looking at what they learned
October 21, 2004
Thread-Level Transactional Memory
4
Basic Transactional Mechanisms

Isolation



Version management


Detect when transactions conflict
Track read and write sets
Record new and old values
Atomicity


Commit new values
Abort back to old values
October 21, 2004
Thread-Level Transactional Memory
5
H/W Transactional Memory Systems








Knight’s Lisp Work
Transactional Memory
Oklahoma Update
SLE/TLR
Transactional Coherence and Consistency
Unbounded TM
Virtual TM
Thread-level TM
October 21, 2004
Thread-Level Transactional Memory
6
Knight’s Lisp Work [’86]

Parallel execution of sequential code

Break program into “transaction blocks”




Execute transactions in parallel



Multiple loads in a transaction
Exactly one store ends the transaction
No register state passed between transactions
Track dependences (i.e., read set)
Abort and restart on conflicting write
Transactions commit in sequential order

October 21, 2004
Broadcast writes on commit
Thread-Level Transactional Memory
7
Knight’s Hardware

Two caches

Dependency cache



Confirm cache





Holds write set
Supports multiple writes
Commits

Check dep. cache
Broadcast writes
Fast aborts



Memory
Tracks read set
Bus monitor detects conflicts
Valid
Old Value
Depends
Old Value
A
New
Value
Invalid
Invalidate Confirm cache
Use old values in Dep. Cache
Immediately restart execution
Dependency
Cache
Confirm
Cache
CPU
Spawned two threads: TLS & TM
October 21, 2004
Thread-Level Transactional Memory
8
H&M’s Transactional Memory [’93]

Targets explicitly parallel (non-functional) codes


Motivated by lock-free data structures
Transactions:




Read and write multiple locations
Commit in arbitrary order
Implicit begin, explicit commit operations
Abort affects memory, not registers



Software manages restarting execution
Validate instruction detects pending abort
Implementation extends cache coherence


Read/Write locks correspond to MOESI states
Add orthogonal transaction states
October 21, 2004
Thread-Level Transactional Memory
9
H&M’s Transactional Memory

Adds Transaction Cache








Before and after image
Even for read-only data
Small, fully associative
Abort on all conflicts

NACK conflicting requests
Abort NACKed transaction
Fast commit and abort

Memory
Stores all data accessed by
transactions
2 copies of each line
Change trans. cache state
October 21, 2004
M
M
XCommit
New Value
S
M
XAbort
Old Value
S
S
XCommit
Old Value
S
XAbort
Old Value
Cache
Transaction
Cache
CPU
Thread-Level Transactional Memory
10
SLE/TLR

Hardware exploits speculative processors




Read sets tracked by coherence protocol
Write set maintained in store queue
Abort restarts execution, including register state
Speculative lock elision (SLE)

Elide locks from the dynamic execution stream




Convert critical sections to optimistic transactions
Concurrently execute non-conflicting transactions
Fall back on explicit locks if conflicts
Transactional Lock Removal (TLR)



Resolve conflicts using priority ordering (timestamps)
Delay lower priority transactions
Deadlock and starvation free
October 21, 2004
Thread-Level Transactional Memory
11
Transactional Coherence and Consistency [’04]

TCC unifies coherence, memory consistency,
and transaction support


Transaction ordering



All transactions, all the time
Ordered, Unordered, Partially Ordered
Supports thread-level speculation
Optimistic concurrency model


Unordered transactions serialize at commit
Conflicts detected at commit
October 21, 2004
Thread-Level Transactional Memory
12
TCC
Write buffer ~4 kB,
holds new values
until commit
Shadow register
file checkpoints
architectural
registers
On-Chip
Interconnect
Broadcast updates
at commit
L2 Cache
Logically Shared
CPU
SRF
October 21, 2004
L1 D
L1 cache tracks read
set, bit per line
Thread-Level Transactional Memory
13
TCC

Commits are sequential


Broadcasts addresses of all updates
Supports large transactions

Serialize all other transactions


Cannot abort large transactions


Grabs and holds the commit bus
Updates affect L2/Mem; no undo
Extensions forthcoming

talk to Kunle and Christos
October 21, 2004
Thread-Level Transactional Memory
14
Unbounded Transactional Memory (UTM)

Unbounded transactions

Arbitrary size


Arbitrary duration


Not limited by interrupts, context switch, etc.
Complex implementation


Not limited by write buffer, cache, or memory
Not justified by performance
Settle for “nearly” unbounded transactions

Much simpler hardware
October 21, 2004
Thread-Level Transactional Memory
15
Transactional Linux
9.355x10^6
make
dbench
10^6
10^4
10^2
Log-log scale
1
1


10
100
1000
Fully associative cache size (64 byte lines)
8144
Almost all of the transactions require < 100 cache lines
 99.9% need fewer than 54 cache lines
There are, however, some very large transactions!
 >500k-byte fully-associative cache required
October 21, 2004
Thread-Level Transactional Memory
16
Large Transaction Memory (LTM)

Register checkpoints


Cache tracks read and write sets




Snapshot of rename maps
T-bits mark transactional blocks
Cache holds new data values “in place”
O-bit indicates overflow to in-memory hashtable
Memory holds committed state

Abort invalidates all modified blocks


Miss on re-execution
Transactional writes force memory updates

October 21, 2004
Repeated writes (e.g., to local data) are written through
Thread-Level Transactional Memory
17
Virtual Transactional Memory (VTM)

Only an overflow mechanism

No overhead on common in-cache case


Low overhead when no conflict



Shared Bloom Filter rules out conflicts
Filter resides in virtual memory
Higher overhead on possible conflict




Check shared overflow counter on cache miss
Hardware table walk to detect actual conflict
Table resides in virtual memory
Only incurred by large transactions with likely conflict
Supports context switches and paging
October 21, 2004
Thread-Level Transactional Memory
18
801 revisited

Why didn’t 801 database storage succeed?


Answer #1:




Lock bits helped performance and simplified software
Changing lock bits requires TLB shootdown
Too complicated for the benefits?
 Not a current problem: transaction h/w is easy
Answer #2:


Not universally available
DB2 was (is) multiplatform


Can’t rely on feature only available in one architecture
Still a relevant concern
October 21, 2004
Thread-Level Transactional Memory
19
Need Standard Transaction Interface

Abstract away resource requirements


Virtualize transactional memory



Support large, long transactions
Transaction semantics between threads
NOT a hardware property
Permit range of implementations

Hardware, software, and combinations
October 21, 2004
Thread-Level Transactional Memory
20
Thread-level Transactional Memory

Abstract mechanisms

Version management



Isolation



Update memory “in place”
Log “before images” to thread level VM
Logically extend memory words with read and write bits
Implementations can be conservative (e.g., blocks)
Atomicity


Commits easy due to in place updates
Aborts trap to user-level software

October 21, 2004
Hardware can accelerate common case
Thread-Level Transactional Memory
21
Conclusions

Make the common case fast




Handle the uncommon case



99+% of transactions fit in hardware
Lots of alternatives
Make both commits and aborts fast
Large transactions will occur, deal with ‘em
Shouldn’t be limited by hardware
Agree on a common abstraction


Success requires multi-platform support
Let vendors compete on price-performance
October 21, 2004
Thread-Level Transactional Memory
22