Software Transactional Memory and the Rotate Free Tree
Alexander Hogue
The University of Sydney
Australia
[email protected]
Abstract
We introduce a new binary search tree implementation in Java, using the concept of Software
Transactional Memory. Rather than relying on the traditional pessimistic approach of lock-based synchronisation, we show how optimistic synchronisation presents unique opportunities to
directly observe contention, and how this information affects the way we program.
We provide an experimental evaluation of our implementation of such a data structure, and perform
a simple modification to an existing transactional data structure, the Speculation Friendly Binary
Search Tree [1], in order to create a tree optimised for certain data sets, the Rotate Free Tree. On
such data sets, we observe that the Rotate Free Tree outperforms a transactional AVL tree by up to
2.6x. From this, we conclude that for transactional balanced trees, rotation is a key performance
bottleneck.
1. Introduction
The multicore era is rapidly providing new opportunities to exploit parallelism in our technology.
From massively multicore operations in supercomputers, to graphics in mobile phones, the ability to
program multicore technology is readily available today. New programming constructs such as
transactions enable a programmer to easily exploit these multicore architectures, without need for
classical synchronisation methods. Software Transactional Memory allows for a programmer to
simply delimit sections of code, marking them as transactions. These sections will then be executed
atomically when multiple threads are involved.
Inevitably, concurrent data structures present themselves as means to achieve better performance in
software. However, the restrictions associated with efficient data structures (such as the many
invariants in red-black trees) often limit concurrency potential, especially when synchronising using
locks. This is true both regarding the ease of implementing such data structures, and their
performance. Previously, Gramoli et al. [1] provided a C implementation of a Speculation Friendly
Tree, a transactional data structure that seeks to avoid these restrictions. We offer a Java
implementation, and an extension into the Rotate Free Tree; a simple data structure tuned for
particular data sets.
Section 2 provides an overview of Software Transactional Memory, and illustrates how to apply it in
Java using DeuceSTM. Section 3 explores the problems with concurrency, Software Transactional
Memory, and data structures, as well as solutions to these problems. Section 4 shows our
experimental results on the Rotate Free Tree and a Speculation Friendly Tree modified to use a copy
thread. Section 5 concludes the paper.
2. Software Transactional Memory
In this section, we focus on the features, advantages, and disadvantages of Software Transactional
Memory.
Software Transactional Memory [4], a software implementation of Hardware Transactional Memory
[2], provides a way for programmers to use transactions in their code. A transaction is a piece of
code that executes a series of read/write operations to memory atomically. From the perspective of
other transactions (threads), these reads and writes occur at a single instant in time. That is, a transaction
performs some read/write operations (on data shared with other threads) without any regard to what
other threads may be doing, recording the reads and writes it performs in a log. After completing the entire
transaction, a thread verifies that other threads have not concurrently modified any data that it
accessed. If another thread has done so, we say the transaction aborts, and the results of its
read/write operations are undone. If there has been no such conflict, the
transaction has been successful, and we say it commits. When a transaction aborts, it is typically
restarted until it can commit.
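The read, validate, then commit-or-retry cycle described above can be sketched in plain Java. This is a deliberately simplified illustration and not how DeuceSTM works internally: it assumes a single shared integer guarded by a version counter, and the names VersionedCell, atomically and OptimisticCounterDemo are hypothetical.

```java
import java.util.function.IntUnaryOperator;

/** One shared integer guarded by a version number (hypothetical, for illustration only). */
class VersionedCell {
    private int value;
    private long version;   // incremented on every successful commit
    private long aborts;    // number of failed validations, kept for observation

    /** Runs update optimistically: snapshot, compute, validate, then commit or retry. */
    int atomically(IntUnaryOperator update) {
        while (true) {
            long seenVersion;
            int seenValue;
            synchronized (this) {               // consistent snapshot (stands in for the read log)
                seenVersion = version;
                seenValue = value;
            }
            int newValue = update.applyAsInt(seenValue);   // computed without holding any lock
            synchronized (this) {               // validation and commit happen as one step
                if (version == seenVersion) {
                    value = newValue;
                    version++;
                    return newValue;            // commit
                }
                aborts++;                       // another thread committed first: abort and retry
            }
        }
    }

    synchronized int get() { return value; }
    synchronized long abortCount() { return aborts; }
}

class OptimisticCounterDemo {
    /** Increments one cell from several threads and returns the final value. */
    static int run(int threads, int incrementsPerThread) {
        VersionedCell cell = new VersionedCell();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    cell.atomically(v -> v + 1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cell.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

Every increment is applied exactly once, however many times its attempt is retried. The abort counter in this sketch is one simple way in which an optimistic scheme exposes contention to the programmer.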
Since threads need not wait on each other to modify shared memory, this optimistic approach
benefits from increased concurrency. In the pessimistic approach, lock-based programming, threads
typically have to wait to modify parts of a data structure that are protected by a lock.
In addition to performance benefits, STM also simplifies the task of the programmer. Programmers
need not concern themselves with the complexity of deciding how to best allocate locks, deadlocks,
livelocks, priority inversion, locking granularity, and other typical lock-based programming issues.
To illustrate the use of STM, we provide an example in Java, using DeuceSTM [3].
First, a programmer simply annotates methods with @Atomic:
import org.deuce.Atomic;
public class Foo {
    …
    @Atomic
    public void bar() {
        …
    }
}
The Java Runtime is instrumented for transactions using DeuceSTM (here, $JAVA_RT_PATH refers to
the location of rt.jar on the machine running DeuceSTM):
#java -cp ~/deuceAgent.jar -Dorg.deuce.exclude=java.lang.Enum,sun.* org.deuce.transform.asm.Agent $JAVA_RT_PATH/rt.jar $JAVA_RT_PATH/rt_instrumented.jar
Note that we exclude sun.* from being instrumented by DeuceSTM, as it contains objects used
internally by Deuce in order to instrument.
Then, code is compiled into a .jar file:
#javac *.java
#jar -cf code.jar *.class
Which is then also instrumented:
#java -cp ~/deuceAgent.jar -Dorg.deuce.exclude=java.lang.Enum,sun.* org.deuce.transform.asm.Agent code.jar code_instrumented.jar
Finally, the jar is executed:
#java -Dorg.deuce.exclude=java.lang.Enum,sun.* -Dorg.deuce.transaction.contextClass=org.deuce.transaction.tl2.Context -Xbootclasspath/p:$JAVA_RT_PATH/rt_instrumented.jar:~/deuceAgent.jar:code_instrumented.jar MainClass
Since DeuceSTM implements transactions by modifying Java byte-code, the resulting transactional
program is reusable. That is, a programmer composing multiple transactional methods, possibly
from multiple libraries, is guaranteed to obtain another transactional program, free of any
synchronisation issues. Such re-usability is typically not present in complex lock-based programs.
However, there is some overhead associated with using transactions. This overhead is primarily due
to the extra computations performed maintaining the log and checking whether other threads have
interrupted any transaction.
3. The problem with Self-Balancing Binary Search Trees and Software Transactional Memory
In this section, we investigate performance issues regarding self-balancing binary search trees and
Software Transactional Memory.
Binary Search Tree property: For every node n in a Binary Search Tree, n's left subtree contains
only elements smaller than n, or is empty, and n's right subtree contains only elements larger than n,
or is empty. Moreover, each of these subtrees is also a Binary Search Tree.
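As a small illustration of the property, the following Java sketch implements an unbalanced, non-transactional binary search tree over doubles together with a check of the property. The class name SimpleBst and its methods are hypothetical and are not taken from the implementation evaluated in this paper.

```java
/** Minimal unbalanced binary search tree over doubles (illustrative; names are hypothetical). */
class SimpleBst {
    static final class Node {
        final double key;
        Node left, right;
        Node(double key) { this.key = key; }
    }

    Node root;

    /** Inserts key by following the BST property from the root; returns false if already present. */
    boolean insert(double key) {
        if (root == null) {
            root = new Node(key);
            return true;
        }
        Node cur = root;
        while (true) {
            if (key == cur.key) return false;
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return true; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(key); return true; }
                cur = cur.right;
            }
        }
    }

    /** Searches for key along a single root-to-leaf path. */
    boolean contains(double key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = key < cur.key ? cur.left : cur.right;
        }
        return false;
    }

    /** Checks that every key under n lies strictly between low and high. */
    static boolean isBst(Node n, double low, double high) {
        if (n == null) return true;                       // an empty subtree is a valid BST
        if (n.key <= low || n.key >= high) return false;
        return isBst(n.left, low, n.key) && isBst(n.right, n.key, high);
    }

    boolean isBst() {
        return isBst(root, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
    }
}
```

Both insert and contains touch only the nodes on one path from the root, which is why the height of the tree governs their cost.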
Self-Balancing Binary Search Trees are a popular choice of data structure for performance due to
their logarithmic access, insert, and remove times. However, such data structures impose an
invariant. An invariant is a condition about the state of a data structure that must always be true. For
example, a self-balancing binary search tree has the invariant that its height must always be O(log n),
where n is the number of nodes in the tree. That is, the distance from the root node to any leaf is
at most proportional to log(n). If a tree has this property, we say it is balanced. Considering that an arbitrary
element can be added to or removed from the tree at any time, rebalancing must occur to maintain the
invariant.
AVL Tree invariant: An AVL tree is a Binary Search Tree with the property that, for every node, the
node's left subtree and right subtree differ in height by at most 1.
In an AVL tree, this rebalancing is achieved through tree rotations. That is, nodes are rearranged
within the tree without violating the binary search tree property.
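A rotation can be sketched in a few lines of Java. The sketch below is illustrative only: it operates on plain, non-transactional nodes, and the names RotationSketch, rotateRight and rotateLeft are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Tree rotations on plain binary search tree nodes (illustrative names, not the paper's code). */
class RotationSketch {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** Right rotation around y: y's left child x becomes the subtree root. Assumes y.left != null. */
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;   // keys between x and y move under y
        x.right = y;
        return x;           // the caller re-links x where y used to hang
    }

    /** Left rotation around x: the mirror image of rotateRight. Assumes x.right != null. */
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y;
    }

    /** Height counted in nodes on the longest root-to-leaf path. */
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    /** Collects keys in sorted (in-order) sequence; a rotation leaves this sequence unchanged. */
    static List<Integer> inOrder(Node n, List<Integer> out) {
        if (n != null) {
            inOrder(n.left, out);
            out.add(n.key);
            inOrder(n.right, out);
        }
        return out;
    }

    public static void main(String[] args) {
        Node chain = new Node(3, new Node(2, new Node(1, null, null), null), null);
        Node rotated = rotateRight(chain);
        System.out.println(inOrder(rotated, new ArrayList<>()) + " height " + height(rotated));
    }
}
```

Each rotation writes to two nodes plus the parent link that the caller updates. A transaction that performs one therefore conflicts with any concurrent transaction reading those nodes.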
To illustrate some of the shortcomings in using transactions in AVL trees, we describe the insertion
operation from a transactional point of view. The case for removal is analogous. Typically, an insert
operation might access nodes as follows. Starting from the root, a path to an appropriate leaf is
followed via the binary search tree property. Insertion is then performed at this leaf, and tree
rotation may occur. Potentially, rotation can affect nodes from the leaves up to the root. From a
transactional point of view, this is a problem. Such a long transaction is likely to be interrupted by
other threads, and consequently abort. The long series of reads and writes is likely to access many
nodes in the tree, which in turn are accessed by other transactions. The key factors making the
transactions restart-prone are the number of memory addresses accessed and the length of
time the transactions run for. Increasing either increases the likelihood of a transaction aborting. The
coupled nature of the insert/delete operations and the rotation operations leads to unnecessarily long
and restart-prone transactions, which in turn lead to decreased performance. Matters are only made
worse as concurrency increases, as highly contended data causes more aborts.
3.1 The Speculation Friendly Binary Search Tree
In this section, we explain how the Speculation Friendly Binary Search Tree avoids many of the
pitfalls mentioned in Section 3.
Previously, Gramoli et al. introduced a C implementation of a Speculation Friendly Binary Search
Tree [1]. The main idea of such a tree is to decouple the insert/delete and rotate operations. We saw
in Section 3 that the coupling of these operations leads to long and abort-prone transactions. The
Speculation Friendly Tree achieves this decoupling by introducing a dedicated rotator
thread. This thread continually scans the tree for imbalances, and performs the rebalance operation
when necessary. Logically, the insert/delete operations and the rotate operation need not be
performed at the same instant in time, so the Speculation Friendly Tree relaxes the balance
invariant of the AVL tree during insertion/removal.
3.2 The Rotate Free Tree
In this section we explain the Rotate Free Tree by comparing it to the AVL tree from which it stems.
The Speculation Friendly Tree provides performance increases for all data sets [1]. We introduce a
simple modification that improves performance compared to a standard AVL Tree for data sets
consisting of numbers chosen uniformly at random.
Lemma: Suppose a Binary Search Tree is constructed from a sequence of insertions of elements
chosen uniformly at random. Then the expected tree is balanced, in the sense that its expected height
is O(log n).
Consider the construction of a tree from a sequence of insert operations, with each inserted
element chosen uniformly at random. Suppose the first element inserted is x. The second
element, y, has a chance of approximately 50% of being larger than x, and so being inserted on the
right, and an approximately 50% chance of being smaller, and so being inserted on the left. The third
element likewise has a 50% chance of entering the right subtree of x. Suppose y was inserted into
the right subtree. Then this third element has, in expectation, a further 50% chance of becoming a left child of y, and
a 50% chance of becoming a right child. If insertions are continued in this fashion, the expected tree
will be balanced. This can be deduced from the observation that for a new node
chosen uniformly at random, there is an equal chance of the insertion location being in the left and
right subtrees of every node on the path to the final insertion location. Note that the chance of an
element already in the tree being inserted again tends to zero as the range from which elements are
chosen at random increases in size.
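The lemma can be checked empirically with a short Java sketch. It is an illustration under stated assumptions and not the code used in our experiments: keys are drawn from [0, 2^20) with java.util.Random, the tree is a plain unbalanced BST, and the names RandomBstHeight, heightAfterInserting, uniformKeys and sortedKeys are hypothetical. Height is counted as the number of nodes on the longest root-to-leaf path.

```java
import java.util.Random;

/** Measures the height of an unbalanced BST for different insertion orders (illustrative only). */
class RandomBstHeight {
    static final class Node {
        final double key;
        Node left, right;
        Node(double key) { this.key = key; }
    }

    /** Inserts the keys in the given order, without rotations, and returns the resulting height. */
    static int heightAfterInserting(double[] keys) {
        Node root = null;
        int height = 0;
        for (double key : keys) {
            int depth = 1;                       // depth the new node would have at the root
            if (root == null) {
                root = new Node(key);
            } else {
                Node cur = root;
                while (true) {
                    depth++;                     // the new node sits one level below cur
                    if (key < cur.key) {
                        if (cur.left == null) { cur.left = new Node(key); break; }
                        cur = cur.left;
                    } else if (key > cur.key) {
                        if (cur.right == null) { cur.right = new Node(key); break; }
                        cur = cur.right;
                    } else {
                        depth = 0;               // duplicate key: nothing is inserted
                        break;
                    }
                }
            }
            height = Math.max(height, depth);
        }
        return height;
    }

    /** n keys chosen uniformly at random from [0, 2^20). */
    static double[] uniformKeys(int n, long seed) {
        Random rng = new Random(seed);
        double[] keys = new double[n];
        for (int i = 0; i < n; i++) keys[i] = rng.nextDouble() * (1 << 20);
        return keys;
    }

    /** n keys in increasing order, the worst case for an unbalanced tree. */
    static double[] sortedKeys(int n) {
        double[] keys = new double[n];
        for (int i = 0; i < n; i++) keys[i] = i;
        return keys;
    }

    public static void main(String[] args) {
        int n = 1 << 12;
        System.out.println("uniform: " + heightAfterInserting(uniformKeys(n, 42L)));
        System.out.println("sorted:  " + heightAfterInserting(sortedKeys(n)));
    }
}
```

For uniformly random keys the measured height stays within a small constant factor of log2(n), whereas sorted input produces a chain of height n.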
An AVL tree using the same data would spend time rotating, and this need not be done in the
expected case. It follows that for this variety of data set, performance can be increased by simply
not rotating the tree.
We also investigated applying a new version of the rotator thread, the copy thread, to the
Speculation Friendly Tree, with the aim of achieving increased performance compared to the AVL
tree for all data sets. The copy thread's role is, rather than rotating, to make a balanced copy of the
current version of the tree. It does this by first performing a recursive inorder traversal of the tree,
gaining a sorted list of the data in the tree in O(n) time, where n is the number of nodes in the tree.
Then, by inserting the data according to Algorithm 1, the resulting tree is guaranteed to be balanced.
The root node of the original tree is then set to be the root node of this new tree, thus “copying” the
tree.
ALGORITHM CreateBalancedTree
INPUT: Sorted list of tree data l (indexed from 0)
OUTPUT: Balanced tree containing input data.
sft = new Speculation Friendly Tree
if LENGTH(l) = 0 then return sft
middle = LENGTH(l)/2    (integer division)
Insert l[middle] into sft
for i in {1, 2, 3, ..., middle}
    Insert l[middle - i] into sft
    if middle + i < LENGTH(l) then Insert l[middle + i] into sft
return sft
Algorithm 1: The construction of a new, balanced tree from a sorted list of elements.
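To make the effect of insertion order concrete, the following Java sketch compares two ways of building a tree from a sorted array. It is an illustration under stated assumptions rather than the implementation evaluated here: plain, non-transactional BST nodes stand in for the Speculation Friendly Tree, no rebalancing runs during construction, and the names BalancedBuildSketch, buildOutward and buildMedianFirst are hypothetical. Under those assumptions, inserting outward from the middle element, as in Algorithm 1, attaches each new smaller element as the left child of the previous one (and symmetrically on the right), so the two subtrees of the root have equal height but are themselves chains. Making the median the root of every subtree recursively is one standard way to obtain a tree of height about log2(n).

```java
/** Two ways of building a BST from sorted data (illustrative; not the evaluated implementation). */
class BalancedBuildSketch {
    static final class Node {
        final double key;
        Node left, right;
        Node(double key) { this.key = key; }
    }

    /** Plain BST insertion with no rebalancing; returns the (possibly new) root. */
    static Node insert(Node root, double key) {
        if (root == null) return new Node(key);
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return root; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = new Node(key); return root; }
                cur = cur.right;
            } else {
                return root;                     // duplicate key: nothing to do
            }
        }
    }

    /** Insertion order of Algorithm 1: the middle element, then one step outward on each side. */
    static Node buildOutward(double[] sorted) {
        if (sorted.length == 0) return null;
        int middle = sorted.length / 2;
        Node root = insert(null, sorted[middle]);
        for (int i = 1; i <= middle; i++) {
            root = insert(root, sorted[middle - i]);
            if (middle + i < sorted.length) root = insert(root, sorted[middle + i]);
        }
        return root;
    }

    /** Builds a tree over sorted[lo, hi) by making the median the root of every subtree. */
    static Node buildMedianFirst(double[] sorted, int lo, int hi) {
        if (lo >= hi) return null;
        int mid = lo + (hi - lo) / 2;
        Node n = new Node(sorted[mid]);
        n.left = buildMedianFirst(sorted, lo, mid);
        n.right = buildMedianFirst(sorted, mid + 1, hi);
        return n;
    }

    /** Height counted in nodes on the longest root-to-leaf path. */
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        double[] sorted = new double[1023];
        for (int i = 0; i < sorted.length; i++) sorted[i] = i;
        System.out.println("outward:      " + height(buildOutward(sorted)));
        System.out.println("median-first: " + height(buildMedianFirst(sorted, 0, sorted.length)));
    }
}
```

For 1023 sorted keys, the outward order yields a tree of height 512 and the median-first construction a tree of height 10. Whether the median-first variant is cheaper inside a transaction is a separate question that this sketch does not address.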
4. Experimental Evaluation
We experimented with the modified Speculation Friendly Tree by comparing it to a transactional
AVL tree during testing. Each thread was set to run a loop for a set duration, each iteration of which
was probabilistically either an insert or a contains operation. By testing in this way, we can easily
measure the throughput (operations/second) of each tree. Table 1 contains the specifications
regarding how testing was performed.
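The shape of such a measurement loop can be sketched as follows. This is a hedged illustration and not our test harness: java.util.concurrent.ConcurrentSkipListSet stands in for the trees under test, keys are drawn from [0, 2^20), and the names ThroughputSketch and measure, along with their parameters, are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

/** Timed insert/contains workload over a concurrent set (illustrative stand-in for the trees). */
class ThroughputSketch {
    /** Runs the operation mix on set for durationMillis and returns operations per second. */
    static double measure(Set<Double> set, int threads, long durationMillis, double insertProbability) {
        LongAdder totalOps = new LongAdder();
        long durationNanos = durationMillis * 1_000_000L;
        long start = System.nanoTime();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ThreadLocalRandom rng = ThreadLocalRandom.current();
                long localOps = 0;
                do {
                    double key = rng.nextDouble() * (1 << 20);
                    if (rng.nextDouble() < insertProbability) {
                        set.add(key);            // insert operation
                    } else {
                        set.contains(key);       // contains operation
                    }
                    localOps++;
                } while (System.nanoTime() - start < durationNanos);
                totalOps.add(localOps);          // publish once to avoid contending on the counter
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return totalOps.sum() / (durationMillis / 1000.0);
    }

    public static void main(String[] args) {
        // 10% inserts and 90% contains, as in Table 1; the 1-second duration is only for the demo.
        double opsPerSecond = measure(new ConcurrentSkipListSet<>(), 4, 1000, 0.10);
        System.out.println(opsPerSecond + " operations/second");
    }
}
```

Counting operations per thread and summing at the end keeps the measurement itself from becoming a point of contention.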
CPU                                 Intel Core i7-2630QM @ 2.00 GHz – 4 physical cores with Hyper-Threading
RAM                                 8 GB
OS                                  Ubuntu 12.04 LTS (64 bit)
Java version                        1.7.0.0_07 – OpenJDK AMD64
Operation probabilities             10% insert operations, 90% contains operations
Time per experiment                 20 seconds
DeuceSTM version                    1.3.0
Data – Rotate Free Tree Comparison  Chosen uniformly at random from Doubles in the range [0, 2^20].
Data – Copy Thread Experiment       Chosen uniformly at random from Doubles in the range [0, 2^20] with
                                    some probability (1%) of inserting a Double smaller than any element
                                    already present in the tree.
Table 1: Equipment and settings used to run experiments on the Speculation Friendly Tree
The data presented below has been averaged over multiple experiments in order to minimise the effect of the
non-determinism associated with choosing random data elements. Note that for both Illustration 1
and Illustration 2, the throughput at 16 threads is significantly lower. As Table 1 states, we
tested on a CPU with 4 physical cores and Hyper-Threading (a feature in some Intel processors that
provides two 'virtual' cores for each physical core), so we expect scalability up to at most 8 threads.
Illustration 1: Performance comparison between a transactional AVL tree and the Rotate Free Tree
The first test we present compares the Rotate Free Tree with a standard AVL tree, both utilising
transactions. We see that for a data set chosen uniformly at random, the Rotate Free Tree has a
higher throughput than that of the AVL tree. This is expected, as an AVL tree will perform some
unnecessary rotations that the Rotate Free Tree will not. These rotations, while still balancing the
tree, will have little effect overall, since the data being inserted has been chosen uniformly at
random.
In our second experiment, we compared the throughput of the AVL Tree to a Rotate Free Tree with
a copy thread, as described in Section 3.2. The copy thread was set to run with a 5000 millisecond
delay between copy operations. Moreover, the data inserted into the tree was chosen as described in
Table 1, as an attempt to exploit the infrequent balancing performed by the copy thread. We find
that while both trees scale with the number of physical cores, the performance of the Rotate Free
Tree with a copy thread suffers. We suggest this is due to the copy operation's long length as a
transaction. That is, the inorder traversal used to collect the tree data in sorted order is a transaction
which affects every node in the tree, so either any interrupting threads must abort, or the copy
operation must abort, both of which are costly in terms of time. Furthermore, the tree cannot be
modified while the copy is being made, or else the copy would be invalid, so the transaction must
extend to the duration of the construction of the new tree, even though the original tree is not
accessed in this time. This situation is similar to a coarse-grained lock; that is, the entire data
structure is locked for the duration of the copy operation.
Illustration 2: Performance comparison between a transactional AVL tree and the Speculation
Friendly tree with a copy thread.
Source code
Jar files containing the source code to the Speculation Friendly Binary Search Tree and the AVL tree
are available at www.ug.it.usyd.edu.au/~ahog5691/sft.jar and
www.ug.it.usyd.edu.au/~ahog5691/AVLTree.jar
5. Conclusion
As the current technological era focuses on multicore architectures, new opportunities to exploit
concurrency present themselves. Using one such new technology, Software Transactional Memory,
we provide a Java implementation of a transactional data structure, the Speculation Friendly Binary
Search Tree. Our Java implementation retains the increased performance compared to a
transactional AVL tree, and we also investigated the possibility of a copy thread. A natural
extension to our work is to develop a more efficient, transaction-friendly task for the copy thread.
We note finally that optimistic synchronisation provides a means for a programmer to directly
observe the effects of thread contention. This information provides potential for even more efficient
transactional data structures in the future.
References
[1] V. Gramoli, T. Crain, M. Raynal. A Speculation Friendly Binary Search Tree. In Proceedings of
the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012.
[2] T. Knight. An architecture for mostly functional languages. In Proceedings of the 1986 ACM
Conference on LISP and Functional Programming, 1986.
[3] G. Korland, N. Shavit, P. Felber. Deuce: Noninvasive Software Transactional Memory in Java.
In Transactions on HiPEAC 5(2), 2010.
[4] N. Shavit, D. Touitou. Software Transactional Memory. In Proceedings of the Fourteenth Annual
ACM Symposium on Principles of Distributed Computing, 1995.