Succinct Priority Indexing Structures for the Management of Large Priority Queues
Hao Wang and Bill Lin
University of California, San Diego
IEEE IWQoS 2009
Charleston, South Carolina
July 13-15, 2009
Introduction
• Priority queues are used in many network applications
– Per-flow advanced QoS scheduling
– Management of per-flow DRAM packet buffers
– Maintenance of per-flow statistics counters for real-time network measurements
• Items in the priority queue are kept sorted at all times (e.g. smallest key first)
• Common operations: INSERT, FINDMIN, DELETE
• Challenge: must operate at high line rates (e.g. 40+ Gb/s)
Introduction
• The binary heap is a common structure for priority queues
– Θ(log_2 n) time per operation, where n is the number of items
– In fine-grained per-flow scheduling, n can be very large (e.g. 1 million)
– Θ(log_2 n) may be too slow for high line rates
• Pipelined heaps [Bhagwan, Lin 2000][Ioannou 2001][Wang, Lin 2006]
– Reduce amortized time complexity to constant time
– At the expense of Θ(log_2 n) pipeline stages
Introduction
• van Emde Boas (vEB) trees
– Instead of maintaining a priority queue of sorted items, maintain a sorted dictionary of keys
– In many applications keys are k-bit integers, so the possible keys come from a fixed universe of U = 2^k values
– Only Θ(log_2 log_2 U) complexity vs. Θ(log_2 n) for heaps
• Pipelined vEB trees [Wang, Lin 2007]
– Reduce amortized time complexity to constant time
– At the expense of Θ(log_2 log_2 U) pipeline stages
This Talk
• Propose 3 related Priority Indexing (PI) structures that leverage built-in, hardware-optimized instructions in modern 64-bit x86 processors (both Intel and AMD)
• Specifically, given a W = 64 bit word, the instructions BSR (bit-scan-reverse) and BSF (bit-scan-forward) return the positions of the most-significant and least-significant set bits, respectively
Example word 0 0 1 0 1 1 … 0 0 1 0 0 0: BSR locates the most-significant set bit, BSF the least-significant set bit
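The behaviour of the two instructions can be emulated in a few lines of Python, which may help when reading the later slides. This is only a sketch: the function names `bsr` and `bsf` are chosen here to mirror the instruction names, bit positions are counted from 0 at the least-significant end as on x86, and raising an error for a zero word is a choice made for this sketch (the hardware leaves the result undefined in that case).

```python
W = 64  # word width assumed on the slide


def bsr(word: int) -> int:
    """Position (0 = least-significant) of the highest set bit, like x86 BSR."""
    if word == 0:
        raise ValueError("BSR is undefined for a zero word")
    return word.bit_length() - 1


def bsf(word: int) -> int:
    """Position (0 = least-significant) of the lowest set bit, like x86 BSF."""
    if word == 0:
        raise ValueError("BSF is undefined for a zero word")
    # word & -word isolates the lowest set bit.
    return (word & -word).bit_length() - 1


word = 0b001011000          # bits 6, 4 and 3 are set
print(bsr(word), bsf(word))  # prints "6 3"
```

In a real implementation each call would be a single machine instruction (or a compiler intrinsic) operating on a 64-bit register.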
This Talk
• Finding the most-significant (least-significant) set bit can also be implemented easily in custom hardware using efficient priority encoder designs
Basic Priority Indexing Structure
• Essentially a W-way tree. Maintains a sorted subset S of N elements from a fixed universe of size U = W^h, where N ≤ U.
• Each element i of the universe is associated with a bit b_i
• A leaf node contains W of the bits b_i: b_i = 1 if element i is in the set
• A non-leaf node serves as a summary of its child nodes: a bit in a non-leaf node is set to 1 if the corresponding child node has at least one non-zero bit
Data Structure of PI with h = 3
Example Operations
• TEST(i): Just check b_i.
• INSERT(i): Start at the leaf and set b_i. Set the corresponding bit in the parent. Repeat until the root.
• FINDMIN(): Start at the root. Find the MSB (most-significant set bit) and descend into the corresponding sub-tree. Repeat until a leaf.
• DELETE(i): Start at the leaf and clear b_i. If the word is now 0 (no more bits set), clear the corresponding bit in the parent. Repeat until the root.
Data Structure of PI with h = 3
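The four operations above can be sketched in Python as follows. This is an illustrative software model, not the authors' implementation: the class name `PriorityIndex`, the list-of-levels layout, and the convention that position 0 of a word is its most-significant bit (so that the MSB corresponds to the smallest index, as FINDMIN requires) are all assumptions of this sketch.

```python
class PriorityIndex:
    """W-way bitmap tree over the universe {0, ..., W**h - 1} (sketch)."""

    def __init__(self, W=64, h=3):
        self.W, self.h = W, h
        # levels[0] is the root (one word); levels[h-1] holds the leaf words.
        self.levels = [[0] * (W ** d) for d in range(h)]

    def _mask(self, pos):
        # Assumption: position 0 is the most-significant bit of a word.
        return 1 << (self.W - 1 - pos)

    def test(self, i):
        node, pos = divmod(i, self.W)
        return bool(self.levels[-1][node] & self._mask(pos))

    def insert(self, i):
        # Bottom-up: set the leaf bit, then the summary bit in every ancestor.
        for d in range(self.h - 1, -1, -1):
            i, pos = divmod(i, self.W)   # i becomes the node index at level d
            self.levels[d][i] |= self._mask(pos)

    def find_min(self):
        # Top-down: follow the most-significant set bit at every level.
        node = 0
        for d in range(self.h):
            word = self.levels[d][node]
            if word == 0:
                return None              # root word is 0: the set is empty
            pos = self.W - word.bit_length()   # MSB position, 0 = leftmost
            node = node * self.W + pos
        return node

    def delete(self, i):
        # Bottom-up: clear the bit; keep climbing only while words become 0.
        for d in range(self.h - 1, -1, -1):
            i, pos = divmod(i, self.W)
            self.levels[d][i] &= ~self._mask(pos)
            if self.levels[d][i] != 0:
                break


pi = PriorityIndex(W=64, h=3)
for key in (70000, 42, 4095):
    pi.insert(key)
print(pi.find_min())   # 42
pi.delete(42)
print(pi.find_min())   # 4095
```

Note that `insert` and `delete` walk from the leaf to the root while `find_min` walks from the root to a leaf, which is the mixed direction the later slides identify as an obstacle to pipelining.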
Time/Memory Complexity of PI
• TEST(i) takes constant time. All other operations take Θ(log_W U) time, which is asymptotically not as “good” as the Θ(log_2 log_2 U) time complexity of a van Emde Boas tree
• However, for W = 64, PI requires the same number of operations or fewer for U ≤ 64 billion (h ≤ 6), and is much simpler
• For a PI of size U, the memory size is only 1.016U bits, i.e. Θ(U) space
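The two figures on this slide (h ≤ 6 and 1.016U bits) can be checked with a short calculation. The sketch below assumes one step per tree level for PI and ⌈log_2 log_2 U⌉ steps for a van Emde Boas tree, ignoring constant factors; the helper names are hypothetical.

```python
import math


def pi_levels(U, W=64):
    """Smallest h with W**h >= U: words visited by one PI traversal."""
    h = 1
    while W ** h < U:
        h += 1
    return h


def veb_levels(U):
    """ceil(log2(log2(U))): vEB recursion depth, constants ignored."""
    return math.ceil(math.log2(math.log2(U)))


def pi_memory_bits(h, W=64):
    """W**h leaf bits plus W**(h-1), ..., W summary bits."""
    return sum(W ** d for d in range(1, h + 1))


for h in range(1, 8):
    U = 64 ** h
    print(h, pi_levels(U), veb_levels(U), round(pi_memory_bits(h) / U, 4))
```

Under these assumptions the PI level count stays at or below the vEB depth for h = 1 to 6 and exceeds it at h = 7, and the memory ratio approaches 64/63 ≈ 1.016, since each summary level is 1/64 the size of the level below it.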
Motivation for Modified Structures
• PI is fast, but Θ(log_W U) time may still not be fast enough for high-performance applications
– Want constant-time operations (issue a new operation every cycle)
• But PI cannot be readily pipelined
– Some PI operations are top-down (e.g. FINDMIN),
but others are bottom-up (e.g. DELETE)
• Propose 2 modified structures
– Counting Priority Index (CPI)
– Pipelined Counting Priority Index (Pipelined CPI)
Counting-Priority-Index
• In addition to a bit that indicates a child node has at least one bit set, add a counter that keeps track of “how many” bits in the child node are set. This enables all operations to run top-down.
Data Structure of CPI with h = 3
Example CPI Operations
• FINDMIN(): Start at the root. Find the MSB (most-significant set bit) and descend into the corresponding sub-tree. Repeat until a leaf. Same as before.
• DELETE(i): Start at the root. Decrement the counter. If the count is 0, clear the bit. Go down the corresponding sub-tree. Repeat until the leaf.
Data Structure of CPI with h = 3
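A software sketch of top-down INSERT and DELETE with counters might look like the following. It is not the authors' design: the class name and layout are hypothetical, position 0 of a word is assumed to be its most-significant bit, and each counter here records the number of stored elements in the whole sub-tree below its summary bit, an assumption made so that a single root-to-leaf pass is sufficient (the slide describes the counter as tracking set bits in the child node). The sketch also calls TEST first so that counters are never changed for a duplicate insert or a missing key.

```python
class CountingPriorityIndex:
    """W-way bitmap tree with a counter per summary bit (sketch)."""

    def __init__(self, W=64, h=3):
        self.W, self.h = W, h
        # bits[d] holds the W**d words of level d; bits[h-1] are the leaves.
        self.bits = [[0] * (W ** d) for d in range(h)]
        # counts[d][node][pos]: stored elements below summary bit `pos`.
        # Assumption: sub-tree counts, kept for non-leaf levels only.
        self.counts = [[[0] * W for _ in range(W ** d)] for d in range(h - 1)]

    def _mask(self, pos):
        return 1 << (self.W - 1 - pos)   # position 0 = most-significant bit

    def _path(self, i):
        # Yield (level, node, position) from the root down to the leaf.
        for d in range(self.h):
            span = self.W ** (self.h - 1 - d)   # elements under one bit
            yield d, i // (span * self.W), (i // span) % self.W

    def test(self, i):
        node, pos = divmod(i, self.W)
        return bool(self.bits[-1][node] & self._mask(pos))

    def insert(self, i):
        if self.test(i):
            return                        # already present: leave counters alone
        for d, node, pos in self._path(i):
            self.bits[d][node] |= self._mask(pos)
            if d < self.h - 1:
                self.counts[d][node][pos] += 1

    def delete(self, i):
        if not self.test(i):
            return                        # absent: nothing to decrement
        for d, node, pos in self._path(i):
            if d < self.h - 1:
                self.counts[d][node][pos] -= 1
                if self.counts[d][node][pos] == 0:
                    self.bits[d][node] &= ~self._mask(pos)
            else:
                self.bits[d][node] &= ~self._mask(pos)

    def find_min(self):
        node = 0
        for d in range(self.h):
            word = self.bits[d][node]
            if word == 0:
                return None               # empty set
            node = node * self.W + (self.W - word.bit_length())
        return node


cpi = CountingPriorityIndex(W=64, h=3)
for key in (42, 43, 70000):
    cpi.insert(key)
cpi.delete(42)
print(cpi.find_min())   # 43
cpi.delete(43)
print(cpi.find_min())   # 70000
```

Because every operation now visits levels in the same root-to-leaf order, each level could in principle be handled by its own pipeline stage.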
Time/Memory Complexity of CPI
• TEST(i) takes constant time. All other operations take Θ(log_W U) time, same as the basic PI structure.
• But all operations are supported in top-down fashion
• For a CPI of size U, the memory size is only 1.11U bits, still Θ(U) space
Pipelined Counting-Priority-Index
• Reduces amortized time complexity to constant time
• At the expense of Θ(log_W U) pipeline stages
• Memory size is also only 1.11U bits, Θ(U) space
Data Structure of Pipelined CPI with h = 3
Operations Supported
• Operations supported by all 3 priority indexing structures
TEST(i)
INSERT(i)
DELETE(i)
FINDMIN
FINDMAX
EXTRACTMIN
EXTRACTMAX
SUCCESSOR(i)
PREDECESSOR(i)
EXTRACTSUCC(i)
EXTRACTPRED(i)
Test if index i is in set S
Insert a new index i to set S
Delete index i from set S
Find the smallest index in set S
Find the largest index in set S
Delete the smallest index in set S
Delete the largest index in set S
Find the successor of index i in set S
Find the predecessor of index i in set S
Delete the successor of index i in set S
Delete the predecessor of index i in set S
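The slides do not spell out how SUCCESSOR(i) is computed, so the following is one possible sketch on the basic bitmap tree rather than the authors' method. It climbs from the leaf until some word has a set bit to the right of the current position, then descends along most-significant set bits. The function names, the list-of-levels layout, and the convention that position 0 is the most-significant bit are assumptions of this sketch.

```python
def make_index(W, h):
    """levels[0] is the root word; levels[h-1] holds the leaf words."""
    return [[0] * (W ** d) for d in range(h)]


def insert(levels, W, i):
    for d in range(len(levels) - 1, -1, -1):
        i, pos = divmod(i, W)
        levels[d][i] |= 1 << (W - 1 - pos)   # position 0 = most-significant


def successor(levels, W, i):
    """Smallest stored index strictly greater than i, or None."""
    h = len(levels)
    node = i
    for d in range(h - 1, -1, -1):           # climb from the leaf
        node, pos = divmod(node, W)
        # Keep only the bits at positions greater than pos in this word.
        later = levels[d][node] & ((1 << (W - 1 - pos)) - 1)
        if later:
            node = node * W + (W - later.bit_length())
            for e in range(d + 1, h):        # descend along the MSBs
                node = node * W + (W - levels[e][node].bit_length())
            return node
    return None


levels = make_index(W=4, h=3)                # universe {0, ..., 63}
for key in (3, 17, 18, 60):
    insert(levels, 4, key)
print(successor(levels, 4, 3))    # 17
print(successor(levels, 4, 18))   # 60
print(successor(levels, 4, 60))   # None
```

PREDECESSOR(i) would be symmetric, keeping the bits at positions smaller than `pos` and following least-significant set bits on the way down.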
Comparison
Structure       Time          Hardware      Memory
PI              Θ(log_W U)    constant      1.016U bits
CPI             Θ(log_W U)    constant      1.11U bits
Pipelined CPI   constant      Θ(log_W U)    1.11U bits
Hardware Complexity of Pipelined CPI
Number of Pipeline Stages in the Data Structures
Summary
• Fast sorting data structures
– Fast and scalable succinct data structures for the implementation of priority queues
– The Pipelined CPI supports constant-time priority management operations
– The hardware complexity is only Θ(log_W U), with Θ(U) memory space
Thank You