Software Transactional Memory and the Rotate-Free Tree

Alexander Hogue
The University of Sydney
Australia
[email protected]

Abstract

We introduce a new binary search tree implementation in Java, using the concept of Software Transactional Memory. In contrast to the traditional pessimistic approach of lock-based synchronisation, we show how optimistic synchronisation presents unique opportunities to directly observe contention, and how this information affects the way we program. We provide an experimental evaluation of our implementation of such a data structure, and perform a simple modification to an existing transactional data structure, the Speculation Friendly Binary Search Tree [1], in order to create a tree optimised for certain data sets, the Rotate Free Tree. On such data sets, we observe that the Rotate Free Tree outperforms a transactional AVL tree by up to 2.6X. From this, we conclude that for transactional balanced trees, rotation is a key performance bottleneck.

1. Introduction

The multicore era is rapidly providing new opportunities to exploit parallelism in our technology. From massively multicore operations in supercomputers, to graphics in mobile phones, the ability to program multicore technology is readily available today. New programming constructs such as transactions enable a programmer to easily exploit these multicore architectures, without need for classical synchronisation methods. Software Transactional Memory allows a programmer to simply delimit sections of code, marking them as transactions. These sections are then executed atomically when multiple threads are involved.

Concurrent data structures are a natural means of achieving better performance in software. However, the restrictions associated with efficient data structures (such as the many invariants in red-black trees) often limit concurrency potential, especially when synchronising using locks.
This is true both regarding the ease of implementing such data structures, and their performance.

Previously, Gramoli et al. [1] provided a C implementation of a Speculation Friendly Tree, a transactional data structure that seeks to avoid these restrictions. We offer a Java implementation, and an extension into the Rotate Free Tree, a simple data structure tuned for particular data sets.

Section 2 provides an overview of Software Transactional Memory, and illustrates how to apply it in Java using DeuceSTM. Section 3 explores the problems with concurrency, Software Transactional Memory, and data structures, as well as solutions to these problems. Section 4 shows our experimental results on the Rotate Free Tree and a Speculation Friendly Tree modified to use a copy thread. Section 5 concludes the paper.

2. Software Transactional Memory

In this section, we focus on the features, advantages, and disadvantages of Software Transactional Memory.

Software Transactional Memory [4], a software implementation of Hardware Transactional Memory [2], provides a way for programmers to use transactions in their code. A transaction is a piece of code that executes a series of read/write operations to memory atomically. From the perspective of other transactions (threads), these reads and writes occur at a single instant in time. That is, a transaction performs some read/write operations (on data shared with other threads) without any regard to what other threads may be doing, recording the reads and writes it performs in a log. After completing the entire transaction, a thread verifies that other threads have not concurrently modified any data that it accessed. If another thread has done so, we say the transaction aborts, and the results of its read/write operations are undone. If there has been no data race, the transaction has been successful, and we say it commits. When a transaction aborts, it is typically restarted until it can commit.
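
The commit/abort cycle described above can be sketched in plain Java. This is only an illustration under simplifying assumptions, not a description of how DeuceSTM works internally: a single shared integer stands in for the read/write log, compareAndSet stands in for commit-time validation, and the class OptimisticCounter and its methods are hypothetical names introduced here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of optimistic synchronisation: read, compute
// privately, validate, then commit or abort and retry.
final class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger(0);
    private final AtomicLong aborts = new AtomicLong(0);

    /** Adds delta to the shared value, retrying until the update commits. */
    int addOptimistically(int delta) {
        while (true) {
            int snapshot = value.get();      // read phase: no waiting on other threads
            int updated = snapshot + delta;  // private computation
            // Validation and commit in one step: succeeds only if no other
            // thread changed the value since the snapshot was taken.
            if (value.compareAndSet(snapshot, updated)) {
                return updated;
            }
            aborts.incrementAndGet();        // abort: discard the result and restart
        }
    }

    int get() {
        return value.get();
    }

    /** Number of aborted attempts so far, a direct measure of contention. */
    long abortCount() {
        return aborts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticCounter counter = new OptimisticCounter();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    counter.addOptimistically(1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("value=" + counter.get() + " aborts=" + counter.abortCount());
    }
}
```

Counting aborts in this way hints at how optimistic synchronisation exposes contention: the more often threads collide on the same data, the higher the abort count.
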
Since threads need not wait on each other to modify shared memory, this optimistic approach benefits from increased concurrency. In the pessimistic approach, lock-based programming, threads would typically have to wait to modify parts of a data structure that are protected by a lock. In addition to performance benefits, STM also simplifies the task of the programmer. Programmers need not concern themselves with deciding how best to allocate locks, nor with deadlocks, livelocks, priority inversion, locking granularity, and other typical lock-based programming issues.

To illustrate the use of STM, we provide an example in Java, using DeuceSTM [3]. First, a programmer simply annotates methods with @Atomic:

    import org.deuce.Atomic;

    public class Foo {
        …
        @Atomic
        public void bar() {
            …
        }
    }

The Java Runtime is instrumented for transactions using DeuceSTM. (Here, $JAVA_RT_PATH refers to the location of rt.jar on the machine running DeuceSTM.)

    # java -cp ~/deuceAgent.jar -Dorg.deuce.exclude=java.lang.Enum,sun.* org.deuce.transform.asm.Agent $JAVA_RT_PATH/rt.jar $JAVA_RT_PATH/rt_instrumented.jar

Note that we exclude sun.* from being instrumented by DeuceSTM, as it contains objects used internally by Deuce in order to instrument.

Then, code is compiled into a .jar file:

    # javac *.java
    # jar cf code.jar *.class

Which is then also instrumented:

    # java -cp ~/deuceAgent.jar -Dorg.deuce.exclude=java.lang.Enum,sun.* org.deuce.transform.asm.Agent code.jar code_instrumented.jar

Finally, the jar is executed:

    # java -Dorg.deuce.exclude=java.lang.Enum,sun.* -Dorg.deuce.transaction.contextClass=org.deuce.transaction.tl2.Context -Xbootclasspath/p:$JAVA_RT_PATH/rt_instrumented.jar:~/deuceAgent.jar:code_instrumented.jar MainClass

Since DeuceSTM implements transactions by modifying Java byte-code, the resulting transactional program is reusable.
That is, a programmer composing multiple transactional methods, possibly from multiple libraries, is guaranteed to obtain another transactional program, free of any synchronisation issues. Such reusability is typically not present in complex lock-based programs.

However, there is some overhead associated with using transactions. This overhead is primarily due to the extra computation performed in maintaining the log and checking whether other threads have interfered with any transaction.

3. The problem with Self-Balancing Binary Search Trees and Software Transactional Memory

In this section, we investigate performance issues regarding self-balancing binary search trees and Software Transactional Memory.

Binary Search Tree property: For every node n in a Binary Search Tree, n's left subtree contains only elements smaller than n, or is empty, and n's right subtree contains only elements larger than n, or is empty. Moreover, each of these subtrees is also a Binary Search Tree.

Self-balancing binary search trees are a popular choice of data structure for performance due to their logarithmic access, insert, and remove times. However, such data structures impose an invariant. An invariant is a condition about the state of a data structure that must always be true. For example, a self-balancing binary search tree has the invariant that its height must always be O(log(n)), where n is the number of nodes in the tree. That is, the distance from the root node to any leaf is at most proportional to log(n). If a tree has this property, we say it is balanced. Considering that an arbitrary element can be added to or removed from the tree at any time, rebalancing must occur to maintain the invariant.

AVL Tree invariant: An AVL tree is a Binary Search Tree with the property that, for every node, the left subtree and right subtree differ in height by at most 1.

In an AVL tree, this rebalancing is achieved through tree rotations.
That is, nodes are rearranged within the tree without violating the binary search tree property.

To illustrate some of the shortcomings in using transactions in AVL trees, we describe the insertion operation from a transactional point of view. The case for removal is analogous. Typically, an insert operation accesses nodes as follows. Starting from the root, a path to an appropriate leaf is followed via the binary search tree property. Insertion is then performed at this leaf, and tree rotation may occur. Potentially, rotation can affect nodes from the leaves up to the root.

From a transactional point of view, this is a problem. Such a long transaction is likely to be interrupted by other threads, and consequently abort. The long series of reads and writes is likely to access many nodes in the tree, which in turn are accessed by other transactions. The key issues causing the transactions to be restart-prone are the number of memory addresses accessed, and the length of time the transactions run for. Increasing either increases the likelihood of a transaction aborting. The coupled nature of the insert/delete operations and the rotation operations leads to unnecessarily long and restart-prone transactions, which in turn lead to decreased performance. Matters are only made worse as concurrency increases, as highly contended data causes more aborts.

3.1 The Speculation Friendly Binary Search Tree

In this section, we explain how the Speculation Friendly Binary Search Tree avoids many of the pitfalls mentioned in Section 3. Previously, Gramoli et al. introduced a C implementation of a Speculation Friendly Binary Search Tree [1]. The main idea of such a tree is to decouple the insert/delete and rotate operations. We saw in Section 3 that the coupling of these operations leads to long and abort-prone transactions. The Speculation Friendly Tree achieves this decoupling by introducing a dedicated rotator thread.
This thread continually scans the tree for imbalances, and performs the rebalance operation when necessary. Logically, the insert/delete operations and the rotate operation need not be performed at the same instant in time, so the Speculation Friendly Tree relaxes the balanced invariant of the AVL tree during insertion/removal.

3.2 The Rotate Free Tree

In this section we explain the Rotate Free Tree by comparing it to the AVL tree from which it stems. The Speculation Friendly Tree provides performance increases for all data sets [1]. We introduce a simple modification that improves performance compared to a standard AVL Tree for data sets consisting of numbers chosen uniformly at random.

Lemma: Suppose a Binary Search Tree is constructed from a sequence of insertions of elements chosen uniformly at random. The expected tree is balanced (that is, its expected height is O(log(n))).

Consider the construction of a tree from a sequence of insert operations, with each inserted element being chosen uniformly at random. Suppose the first element inserted is x. The next element to be inserted has a chance of approximately 50% of being larger than x, and so inserted on the right, and an approximately 50% chance of being smaller, and so inserted on the left. The third element to be inserted likewise has an approximately 50% chance of falling into the right subtree of x. Suppose the second element, y, was inserted into the right subtree. A third element that also falls into the right subtree then has a further 50% chance of becoming a left child of y, and a 50% chance of becoming a right child. If insertions are continued in this fashion, the expected tree will be balanced. This can be deduced from the observation that for a new node chosen uniformly at random, there is an equal chance of the insertion location being in the left and right subtrees of every node on the path to the final insertion location. Note the chance of x being inserted again tends to zero as the range from which elements are chosen at random increases in size.
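
The lemma can be explored empirically. The following sketch, which is not taken from the paper's implementation, inserts keys drawn uniformly at random from [0, 2^20] into a plain binary search tree that never rotates, and compares the resulting height with that of the same tree fed keys in sorted order. The class name RandomBstHeight, the seed, and the sizes are hypothetical choices made for illustration.

```java
import java.util.Random;

// Hypothetical experiment: height of a rotation-free BST under random
// insertions versus sorted insertions.
final class RandomBstHeight {
    private static final class Node {
        final double key;
        Node left;
        Node right;

        Node(double key) {
            this.key = key;
        }
    }

    private Node root;
    private int height; // number of nodes on the longest root-to-leaf path

    /** Plain BST insert with no rotation; duplicate keys are ignored. */
    void insert(double key) {
        if (root == null) {
            root = new Node(key);
            height = 1;
            return;
        }
        Node cur = root;
        int depth = 1; // depth of cur, counting the root as 1
        while (true) {
            if (key == cur.key) {
                return;
            }
            depth++; // the new node, or the next node visited, is one level down
            if (key < cur.key) {
                if (cur.left == null) {
                    cur.left = new Node(key);
                    break;
                }
                cur = cur.left;
            } else {
                if (cur.right == null) {
                    cur.right = new Node(key);
                    break;
                }
                cur = cur.right;
            }
        }
        height = Math.max(height, depth);
    }

    int height() {
        return height;
    }

    /** Height after inserting n keys drawn uniformly from [0, 2^20). */
    static int heightAfterRandomInserts(int n, long seed) {
        Random rng = new Random(seed);
        RandomBstHeight tree = new RandomBstHeight();
        for (int i = 0; i < n; i++) {
            tree.insert(rng.nextDouble() * (1 << 20));
        }
        return tree.height();
    }

    /** Height after inserting 0, 1, ..., n-1 in ascending order (worst case). */
    static int heightAfterSortedInserts(int n) {
        RandomBstHeight tree = new RandomBstHeight();
        for (int i = 0; i < n; i++) {
            tree.insert(i);
        }
        return tree.height();
    }

    public static void main(String[] args) {
        int n = 1 << 12;
        System.out.println("random inserts, height: " + heightAfterRandomInserts(n, 42L));
        System.out.println("sorted inserts, height: " + heightAfterSortedInserts(n));
        System.out.println("perfectly balanced height: " + (32 - Integer.numberOfLeadingZeros(n)));
    }
}
```

With random keys the height should stay within a small constant factor of the perfectly balanced height, whereas sorted keys degrade the tree into a list of height n.
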
An AVL tree using the same data would spend time rotating, and this need not be done in the expected case. It follows that for this variety of data set, performance can be increased by simply not rotating the tree.

We also investigated applying a new version of the rotator thread, the copy thread, to the Speculation Friendly Tree, with the aim of achieving increased performance compared to the AVL tree for all data sets. The copy thread's role is, rather than rotating, to make a balanced copy of the current version of the tree. It does this by first performing a recursive inorder traversal of the tree, obtaining a sorted list of the data in the tree in O(n) time, where n is the number of nodes in the tree. Then, by inserting the data according to Algorithm 1, the resulting tree is guaranteed to be balanced. The root node of the original tree is then set to be the root node of this new tree, thus "copying" the tree.

    ALGORITHM CreateBalancedTree
    INPUT: Sorted list of tree data l
    OUTPUT: Balanced tree containing input data.

    sft = new Speculation Friendly Tree
    middle = LENGTH(l)/2
    Insert l[middle] into sft
    for i in {1, 2, 3, ..., middle}
        Insert l[middle - i] into sft
        if middle + i < LENGTH(l)
            Insert l[middle + i] into sft
    return sft

Algorithm 1: The construction of a new, balanced tree from a sorted list of elements.

4. Experimental Evaluation

We experimented with the modified Speculation Friendly Tree by comparing it to a transactional AVL tree. Each thread was set to run a loop for a set duration, the body of which was probabilistically either an insert or a contains operation. By testing in this way, we can easily measure the throughput (operations/second) of each tree. Table 1 contains the specifications regarding how testing was performed.
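
The measurement loop described above can be sketched as follows. This is a hedged illustration rather than the harness used for the experiments: java.util.concurrent.ConcurrentSkipListSet stands in for the transactional trees, and the class ThroughputHarness, its run method, and the short durations in main are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical throughput harness: each thread loops for a fixed duration,
// choosing insert or contains at random, and counts completed operations.
final class ThroughputHarness {

    /** Runs the mixed workload and returns the total number of completed operations. */
    static long run(Set<Double> set, int threads, long durationMillis, double insertProbability) {
        AtomicLong totalOps = new AtomicLong();
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ThreadLocalRandom rng = ThreadLocalRandom.current();
                long ops = 0;
                // Subtraction keeps the comparison correct if nanoTime wraps around.
                while (System.nanoTime() - deadline < 0) {
                    double key = rng.nextDouble() * (1 << 20); // keys in [0, 2^20)
                    if (rng.nextDouble() < insertProbability) {
                        set.add(key);
                    } else {
                        set.contains(key);
                    }
                    ops++;
                }
                totalOps.addAndGet(ops);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return totalOps.get();
    }

    public static void main(String[] args) {
        long millis = 200; // far shorter than the 20 seconds used in the paper
        for (int threads : new int[] {1, 2, 4, 8}) {
            long ops = run(new ConcurrentSkipListSet<>(), threads, millis, 0.10);
            System.out.printf("%d threads: %.0f ops/s%n", threads, ops * 1000.0 / millis);
        }
    }
}
```

Passing the set as a parameter means the same loop can drive any implementation of java.util.Set, so two data structures can be compared under an identical workload.
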
    CPU                                  Intel Core i7-2630QM @ 2.00 GHz – 4 physical cores with Hyper Threading
    RAM                                  8GB
    OS                                   Ubuntu 12.04 LTS (64 bit)
    Java version                         1.7.0.0_07 – OpenJDK AMD64
    Operation probabilities              10% insert operations, 90% contains operations
    Time per experiment                  20 seconds
    DeuceSTM version                     1.3.0
    Data – Rotate Free Tree Comparison   Chosen uniformly at random from Doubles in the range [0, 2^20].
    Data – Copy Thread Experiment        Chosen uniformly at random from Doubles in the range [0, 2^20] with some probability (1%) of inserting a Double smaller than any element already present in the tree.

Table 1: Equipment and settings used to run experiments on the Speculation Friendly Tree

The data presented below has been averaged over multiple experiments in order to minimise the non-determinism associated with choosing random data elements. Note that for both Illustration 1 and Illustration 2, the throughput at 16 threads is significantly less. In Table 1, we note that we tested on a CPU with 4 physical cores and Hyper Threading (a feature in some Intel Processors that allows two 'virtual' cores for each physical core), so we expect scalability with at most 8 threads.

Illustration 1: Performance comparison between a transactional AVL tree and the Rotate Free Tree

The first test we present compares the Rotate Free Tree with a standard AVL tree, both utilising transactions. We see that for a data set chosen uniformly at random, the Rotate Free Tree has a higher throughput than that of the AVL tree. This is expected, as an AVL tree will perform some unnecessary rotations that the Rotate Free Tree will not. These rotations, while still balancing the tree, will have little effect overall, since the data being inserted has been chosen uniformly at random.

In our second experiment, we compared the throughput of the AVL Tree to a Rotate Free Tree with a copy thread, as described in Section 3.2. The copy thread was set to run with a 5000 millisecond delay between copy operations.
Moreover, the data inserted into the tree was chosen as described in Table 1, as an attempt to exploit the infrequent balancing performed by the copy thread. We find that while both trees scale with the number of physical cores, the performance of the Rotate Free Tree with a copy thread suffers. We suggest this is due to the length of the copy operation as a transaction. That is, the inorder traversal used to collect the tree data in sorted order is a transaction which affects every node in the tree, so either any interrupting threads must abort, or the copy operation must abort, both of which are costly in terms of time. Furthermore, the tree cannot be modified while the copy is being made, or else the copy would be invalid, so the transaction must extend to the duration of the construction of the new tree, even though the original tree is not accessed in this time. This situation is similar to a coarse-grained lock; that is, the entire data structure is locked for the duration of the copy operation.

Illustration 2: Performance comparison between a transactional AVL tree and the Speculation Friendly tree with a copy thread.

Source code

Jar files containing the source code to the Speculation Friendly Binary Search Tree and the AVL tree are available at www.ug.it.usyd.edu.au/~ahog5691/sft.jar and www.ug.it.usyd.edu.au/~ahog5691/AVLTree.jar

5. Conclusion

In the current technological era's focus on multicore architectures, new opportunities to exploit concurrency present themselves. Using one such new technology, Software Transactional Memory, we provide a Java implementation of a transactional data structure, the Speculation Friendly Binary Search Tree. Our Java implementation retains the increased performance compared to a transactional AVL tree, and we also investigated the possibility of a copy thread. A natural extension to our work is to develop a more efficient, transaction-friendly task for the copy thread.
We note finally that optimistic synchronisation provides a means for a programmer to directly observe the effects of thread contention. This information provides potential for even more efficient transactional data structures in the future.

References

[1] V. Gramoli, T. Crain, M. Raynal. A Speculation Friendly Binary Search Tree. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012.
[2] T. Knight. An architecture for mostly functional languages. In Proceedings of the 1986 ACM Conference on LISP and Functional Programming, 1986.
[3] G. Korland, N. Shavit, P. Felber. Deuce: Noninvasive Software Transactional Memory in Java. In Transactions on HiPEAC 5(2), 2010.
[4] N. Shavit, D. Touitou. Software Transactional Memory. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing, 1995.
