Download slides - University of California San Diego

Succinct Priority Indexing Structures for the Management of Large Priority Queues
Hao Wang and Bill Lin
University of California, San Diego
IEEE IWQoS 2009, Charleston, South Carolina, July 13-15, 2009

Introduction
• Priority queues are used in many network applications
  – Per-flow advanced QoS scheduling
  – Management of per-flow DRAM packet buffers
  – Maintenance of per-flow statistics counters for real-time network measurements
• Items in a priority queue are kept sorted at all times (e.g. smallest key first)
• Common operations: INSERT, FINDMIN, DELETE
• Challenge: need to operate at high speeds (e.g. 40+ Gb/s)

Introduction
• The binary heap is a common structure for priority queues
  – Has $\Theta(\log_2 n)$ time complexity, where $n$ is the number of items
  – e.g. in fine-grained per-flow scheduling, $n$ can be very large (e.g. 1 million)
  – But $\Theta(\log_2 n)$ may be too slow for high line rates
• Pipelined heaps [Bhagwan, Lin 2000][Ioannou 2001][Wang, Lin 2006]
  – Reduce amortized time complexity to constant time
  – At the expense of $\Theta(\log_2 n)$ pipeline stages

Introduction
• van Emde Boas (vEB) trees
  – Instead of maintaining a priority queue of sorted items, maintain a sorted dictionary of keys
  – In many applications, keys are represented by a $k$-bit integer, so possible keys can only come from a fixed universe of $U = 2^k$ values
  – Only $\Theta(\log_2 \log_2 U)$ complexity vs. $\Theta(\log_2 n)$ for heaps
• Pipelined vEB trees [Wang, Lin 2007]
  – Reduce amortized time complexity to constant time
  – At the expense of $\Theta(\log_2 \log_2 U)$ pipeline stages

This Talk
• Propose 3 related Priority Indexing (PI) structures that leverage built-in, hardware-optimized instructions in modern 64-bit x86 processors (both Intel and AMD)
• Specifically, given a $W = 64$ bit word, the instructions BSR (bit-scan-reverse) and BSF (bit-scan-forward) return the positions of the most-significant and least-significant set bits, respectively
[Figure: BSR and BSF applied to an example bit pattern]

This Talk
• Most-significant (least-significant) bit positions can also be easily computed using efficient priority encoder designs in custom hardware

Basic Priority Indexing Structure
• Essentially a $W$-way tree. Maintains a sorted subset $S$ of $N$ elements from a fixed universe of size $U = W^h$, where $N \le U$.
• Each element $i$ of the universe is associated with a bit $b_i$
• A leaf node contains $W$ of the bits $b_i$: $b_i = 1$ if element $i$ is in the set
• A non-leaf node serves as a summary of its child nodes: a bit in a non-leaf node is set to 1 if the corresponding child node has at least one non-zero bit
[Figure: Data Structure of PI with h = 3]

Example Operations
• TEST(i): Just check $b_i$.
• INSERT(i): Start at the leaf, set $b_i$. Set the corresponding bit in the parent. Repeat until the root.
• FINDMIN(): Start at the root. Find the MSB (most-significant bit) and traverse the corresponding sub-tree. Repeat until a leaf.
• DELETE(i): Start at the leaf, clear $b_i$. If the word = 0 (no more bits set), clear the corresponding bit in the parent. Repeat until the root.
[Figure: Data Structure of PI with h = 3]

Time/Memory Complexity of PI
• TEST(i) takes constant time. All other operations take $\Theta(\log_W U)$ time, which is asymptotically not as “good” as the $\Theta(\log_2 \log_2 U)$ time complexity of a van Emde Boas tree
• However, for $W = 64$, PI requires the same number of operations or fewer for $U \le 64$ billion ($h \le 6$), and is much simpler
• For a PI of size $U$, memory size is only $1.016U$ bits, i.e. $\Theta(U)$ space (the leaf bits plus the summary levels sum to $U(1 + 1/64 + 1/64^2 + \dots) < \tfrac{64}{63}U \approx 1.016U$)

Motivation for Modified Structures
• PI is fast, but $\Theta(\log_W U)$ time may still not be fast enough for high-performance applications
  – Want constant-time operations (issue a new operation every cycle)
• But PI cannot be readily pipelined
  – Some PI operations are top-down (e.g. FINDMIN), but others are bottom-up (e.g. DELETE)
• Propose 2 modified structures
  – Counting Priority Index (CPI)
  – Pipelined Counting Priority Index (Pipelined CPI)

Counting-Priority-Index
• In addition to having a bit set to indicate that a child node has at least one bit set, add a counter to keep track of “how many” bits in a child node are set. This enables all operations to proceed top-down.
[Figure: Data Structure of CPI with h = 3]

Example CPI Operations
• FINDMIN(): Start at the root. Find the MSB (most-significant bit) and traverse the corresponding sub-tree. Repeat until a leaf. Same as before.
• DELETE(i): Start at the root. Decrement the counter. If count = 0, clear the bit. Go down the corresponding sub-tree. Repeat until a leaf.
[Figure: Data Structure of CPI with h = 3]

Time/Memory Complexity of CPI
• TEST(i) takes constant time. All other operations take $\Theta(\log_W U)$ time, same as the basic PI structure.
• But all operations are supported in top-down fashion
• For a CPI of size $U$, memory size is only $1.11U$ bits, still $\Theta(U)$ space

Pipelined Counting-Priority-Index
• Reduces amortized time complexity to constant time
• At the expense of $\Theta(\log_W U)$ pipeline stages
• Memory size is also only $1.11U$ bits, $\Theta(U)$ space
[Figure: Data Structure of Pipelined CPI with h = 3]

Operations Supported
• Operations supported by all 3 priority indexing structures

  Operation        Description
  TEST(i)          Test if index i is in set S
  INSERT(i)        Insert a new index i into set S
  DELETE(i)        Delete index i from set S
  FINDMIN          Find the smallest index in set S
  FINDMAX          Find the largest index in set S
  EXTRACTMIN       Delete the smallest index in set S
  EXTRACTMAX       Delete the largest index in set S
  SUCCESSOR(i)     Find the successor of index i in set S
  PREDECESSOR(i)   Find the predecessor of index i in set S
  EXTRACTSUCC(i)   Delete the successor of index i in set S
  EXTRACTPRED(i)   Delete the predecessor of index i in set S

Comparison

  Structure        Time                   Hardware               Memory (bits)
  PI               $\Theta(\log_W U)$     constant               $1.016U$
  CPI              $\Theta(\log_W U)$     constant               $1.11U$
  Pipelined CPI    constant               $\Theta(\log_W U)$     $1.11U$

Hardware Complexity of Pipelined CPI
[Figure: Number of Pipeline Stages in the Data Structures]

Summary
• Fast sorting data structures
  – Fast and scalable succinct data structures for the implementation of priority queues
  – The Pipelined CPI supports constant-time priority management operations
  – The hardware complexity is only $\Theta(\log_W U)$, with $\Theta(U)$ memory space

Thank You
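The basic PI operations described in the slides (TEST, INSERT, FINDMIN, DELETE) can be sketched in software roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `PriorityIndex` and its method names are hypothetical, Python's `int.bit_length()` stands in for the BSR instruction, and the sketch assumes that slot 0 of each word maps to the most-significant bit, so that an MSB scan yields the smallest element as FINDMIN on the slides does.

```python
W = 64  # word width, as on the slides


class PriorityIndex:
    """Sketch of the basic PI: a W-way tree of bitmap words over W**h elements."""

    def __init__(self, h):
        self.h = h
        # levels[0] is the root (one word); levels[h - 1] holds the leaf words.
        self.levels = [[0] * (W ** d) for d in range(h)]

    @staticmethod
    def _mask(slot):
        # Assumption: slot 0 is the most-significant bit of the word.
        return 1 << (W - 1 - slot)

    def test(self, i):
        # TEST(i): just check b_i in the leaf word.
        return bool(self.levels[-1][i // W] & self._mask(i % W))

    def insert(self, i):
        # INSERT(i): bottom-up, set b_i, then the summary bit in each ancestor.
        for level in reversed(self.levels):
            i, slot = divmod(i, W)
            level[i] |= self._mask(slot)

    def delete(self, i):
        # DELETE(i): bottom-up, clear b_i; continue upward only while the
        # word just modified has become 0.
        for level in reversed(self.levels):
            i, slot = divmod(i, W)
            level[i] &= ~self._mask(slot)
            if level[i] != 0:
                break

    def findmin(self):
        # FINDMIN(): top-down, follow the most-significant set bit per level.
        if self.levels[0][0] == 0:
            return None  # empty set
        node = 0
        for level in self.levels:
            msb = level[node].bit_length() - 1  # software stand-in for BSR
            node = node * W + (W - 1 - msb)
        return node
```

With `h = 3` the universe has 64^3 = 262144 elements, and each of INSERT, DELETE, and FINDMIN touches at most three words. The bottom-up loops in `insert` and `delete` versus the top-down loop in `findmin` show the mixed traversal directions that the slides cite as the obstacle to pipelining the basic PI.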
