Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Scalable lock-free Stack Algorithm Wael Yehia York University February 8, 2010 A scalable lock-free stack algorithm • To reduce contention on the stack (i.e. on the top pointer) we used: – Elimination as a Backoff mechanism • • • • Example: t1: push(3), t2: pop(), t3: push(1), t4: push(2) All but t4 fail to modify the top pointer Stack So they try to collide: top top 2 5 t1 t2 t3 5 1 Collision Array 1 Implementation • The algorithm is non-blocking so no locks • 4 classes created stack<T> StackThread AtomicReference<cell<T>> top T pop(threadInfo<T> p) void push(T value, threadInfo<T> p) threadInfo<T> info threadInfo<T> cell<T> cell<T> next T value final int id final int spin OP op cell<T> cell Key variables • We have three shared variables: – two arrays used during collision: • void * location[] - for data exchange and synch java.util.concurrent.atomic.AtomicReferenceArray • int collision[] - for finding a collision partner java.util.concurrent.atomic.AtomicIntegerArray – one top pointer: • Cell * top - points to the top of the stack java.util.concurrent.atomic.AtomicReference Correctness Tests • Tested for correctness by: – Counting the number of pushes, pops and failed pops – Compare the expected stack size to the actual size • So far all expected == actual Timing tests • Ran the stack under different number of threads and total operations • Compared to Treiber’s stack (the one discussed in class) • Ours ran 2-3 times faster. • The rate increased as # of threads increased Our stack vs Treiber’s 2 threads 8 threads 4500 25000 4000 20000 3500 2500 generic 2000 simple time (ms) time (ms) 3000 15000 generic simple 10000 1500 1000 5000 500 0 0 100,000 1,000,000 2,000,000 5,000,000 100,000 # of Ops 1,000,000 2,000,000 5,000,000 # of Ops 256 threads 64 threads 200000 800000 180000 700000 160000 600000 120000 generic 100000 simple 80000 60000 time (ms) time (ms) 140000 500000 generic 400000 simple 300000 200000 40000 100000 20000 0 0 100,000 1,000,000 2,000,000 # of Ops 5,000,000 100,000 1,000,000 2,000,000 # of Ops 5,000,000 Possible improvement and the next step • Will test against: – improved implementations of our stack algorithm – synchronized stack • Try to prove some properties of our algorithm