Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Computer network wikipedia , lookup
Network tap wikipedia , lookup
IEEE 802.1aq wikipedia , lookup
Airborne Networking wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Wake-on-LAN wikipedia , lookup
List of wireless community networks by region wikipedia , lookup
Advanced topics in Computer Networks Lecture 9: Tree-based lookup University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani Univ. of Tehran Adv. topics in Computer Network 1 Outline Issues Multiway and Multicolumn search DMP-Tree Some implementation issues Univ. of Tehran Adv. topics in Computer Network 2 Issues How to sort prefixes Prefixes as ranges Comparing prefixes Based on length Add extra bits at the end. New definition (DMP-tree) How to apply tree structures like binary tree or m_way tree to prefixes Univ. of Tehran Adv. topics in Computer Network 3 Multiway tree lookup. Proposed by G. Varghese and his students. Consider prefixes as range. First try: Pad 0’s to prefixes in order to apply binary search tree. consider {1*, 101* and 10101* }prefixes Should match here 100000 101000 101010 Binary search fail for 101011 Binary search ends here. Univ. of Tehran 101110 111110 all of them!. Adv. topics in Computer Network 4 Multiway tree lookup(cont) Two problem in the previous example Being Far away from matching prefix Multiple addresses matching different prefixes end up in the same region. Solution: Prefixes as ranges, Put the end of range in the table. L 100000 L 101000 We have the explicit ranges. L 101010 101011 Search maps to one range H 101011 only. H 101110 101111 111110 H 111111 Univ. of Tehran Adv. topics in Computer Network 5 Multiway tree lookup(cont) 100000 101000 101010 101011 101111 111111 L L L H 101011 H 101110 111110 H For 101011, we try to find first L which is not followed by H. For the rest, we can have a stack operation to find the first L. Problem: Linear search to find L Univ. of Tehran Adv. topics in Computer Network 6 Multiway tree lookup(cont) Solution: Precompute prefixes corresponding to ranges. > = 100000 101000 101010 101011 101111 111111 Univ. of Tehran L L P1)100000 P2)101000 P3)101010 101011 101111 111111 P1 P2 P3 P2 P1 - P1 P2 P3 P3 P2 P1 L H 101011 H 101110 111110 H 1* matching prefix. Adv. topics in Computer Network 7 DMP-Tree Comparing prefixes. Sorting prefixes Binary prefix Tree. M_way prefix tree. Univ. of Tehran Adv. topics in Computer Network 8 Trie structure Trie or radix tree Univ. of Tehran Adv. topics in Computer Network 9 Sorting prefixes Question? Why well-known tree structures cannot be applied to the longest prefix matching problem? Answer- No a well-known method for sorting. Definition: Assume Aa1a2…an and B=b1b2…bm to be prefixes of and there a character 1. If n=m, the numerical values of A and B are compared. 2. If n m (assume n<m), the two substrings a1a2…an and b1b2…bn are compared. If a1a2…an and b1b2…bn are equal, then, the (n+1)th character of string B is checked. It is considered B>A if bn+1 is before and B A otherwise. Univ. of Tehran Adv. topics in Computer Network 10 Sorting prefixes (cont) Example- Assume M is Then, BOAT is smaller than GOAT and SAD is bigger than BALLOON. CAT is considered bigger than CATEGORY since the fourth character in CATEGORY, E, is smaller than M. Sorting is a function to determine the position of each prefix. Prefixes of table is sorted as: 00010*,0001*,001100*,01001100*,0100110*,01011,00 1*,01011*,01*,10*,10110001*,1011001*,10110011*, 1011010*,1011*,110* Univ. of Tehran Adv. topics in Computer Network 11 Binary prefix tree •Unfortunately, it fails for 101100001000 Why? •Prefixes are ranges and not just a data point in the search space. Univ. of Tehran Adv. topics in Computer Network 12 Binary prefix tree (cont) Definition: prefixes A and B are disjoint if none of them is a prefix of other. Definition : prefix A is called enclosure if there exists at least one element set such that A is a prefix of that element. We modify the sort structure; 1. 2. 3. 4. Each enclosure has a bag to put its data element on it. Sort remaining elements. Distribute the bag elements to the right and left according the sort definition. Apply algorithm recursively. Univ. of Tehran Adv. topics in Computer Network 13 Binary prefix tree (cont) Example- Prefixes in table 1. First step. The second step, Note- enclosures are in the higher level than the contained elements. (important!) Univ. of Tehran Adv. topics in Computer Network 14 Binary prefix tree (cont) The final tree structure Univ. of Tehran Adv. topics in Computer Network 15 Sorting prefixes (cont) Sorting algorithms Based on bubble sort Based on Radix sort. Tmp= MinLength(list) for all i in list except tmp do compare i with tmp if i matches tmp then; put i in tmp’s bag if i<tmp then put i in leftList: if i>tmp then put i in rightist: endfor list = Sort(leftList) Sort(rightList) Univ. of Tehran Adv. topics in Computer Network 16 M_way prefix tree Problems with the binary prefix tree. Two way branching. The structure is not dynamic and insertion may cause problems!. 1. Divide by m after sorting the strings 2. Static m_way tree. Build a dynamic data structure like B-tree. How to guarantee enclosure to be in the higher level than its contained elements. Define node splitting and insertion. Univ. of Tehran Adv. topics in Computer Network 17 M_way prefix tree (Cont) Node splitting: Finding the split point. 1. 2. 3. Take the median if the data elements are disjoint. If there is an enclosure containing other elements, take it as split point. Otherwise, take an element which gives the best splitting result. Note, this does not guarantee the final tree will be balanced. Univ. of Tehran Adv. topics in Computer Network 18 M_way prefix tree (Cont) Insertion: 1. 2. 3. If the new element is not an enclosure of others, find its place and insert in the corresponding leaf, like B-tree. Otherwise, replace the closet element with element and reinsert the replace elements. Resort the resulted subtree, (space division) if necessary. Building tree is similar to building B-tree. Univ. of Tehran Adv. topics in Computer Network 19 M-way prefix tree (cont) Example Univ. of Tehran Prefix Abbrv. 10 01 110 1011 0001 01011 00010 001100 1011001 1011010 0100110 01001100 10110011 10110001 01011001 001011 00111010 A B C D E F G H I J Prefix 1101110010 10001101 11101101 01010110 00100101 100110100 101011011 11101110 10110111 011010 011011 011101 0110010 101101000 101101110 00011101 011110110 Adv. topics in Computer Network Abbrv. K L M N O P Q R S T U V W X Y Z II 20 M-way prefix tree (cont) We insert prefixes randomly. The tree uses 5 branching factor (at most 4 prefixes in each node) Insert 01011, 1011010, 10110001 and 0100110. Then, adding 110 cause overflow. Split node 10110001 (0100110,01011) (1011010, 110) (all element are disjoint) Univ. of Tehran Adv. topics in Computer Network 21 M-way prefix tree (cont) Insert 10110011, 1101110010, Adding 1011001 causes overflow. 00010. 10110001 1011010 (00010,0100110,01011) (1011001,10110011) (110,1101110010) (case 3 of splitting) Latter adding 1011 cause problem. It is the case of adding an enclosure. We will have space division. Univ. of Tehran Adv. topics in Computer Network 22 M-way prefix tree (cont) The final tree • The tree supersede B-tree or B-tree is a special case of this tree. Then, when data element are relatively disjoint, the height of tree is logMN. Univ. of Tehran Adv. topics in Computer Network 23 DMP-Tree No. of Data Max. height 50 60 70 80 90 BF =4 BF =5 BF =6 BF =7 BF =8 BF =9 BF =1 0 BF =3 25 20 15 10 5 0 100 • BF is Branching factor in the internal nodes. • No. of Data is in1000s. Univ. of Tehran Adv. topics in Computer Network 24 DMP-Tree Min. memory utilization No. of Data 50K 0.68 0.66 0.64 0.62 60K 70K 80K 3 4 5 6 7 8 9 branching 10 90K 100K • Number of prefixes in the right. Univ. of Tehran Adv. topics in Computer Network 25 DMP-Tree • Height of tree for 100K data prefixes. Height 25 20 Min 15 Ave. 10 Max 5 0 3 Univ. of Tehran 4 5 6 7 8 9 10 Branching Adv. topics in Computer Network 26 DMP-Tree Analyzing of results. With increasing BF, Branching Factor, the height decreases. The result are for the worst case, Max height, and the ave. case is much less. After BF=9, increasing Branching Factor does not decrease the max. height. The results are for the set of prefixes of 50,000-100,000 with lengths from 8 t0 31. The size of actual prefixes in use is around 50,000 and the length is 8-31. Univ. of Tehran Adv. topics in Computer Network 27 DMP-Tree Memory utilization:, Mem. Utilization is 0.64%-0.67% without considering the tree branching overhead. Mem. Utilization is 0.53%-0.62% with tree branching overhead (pointers). Without considering branching pointers, the mem. Utilization decreases with increasing the branching factor. Total mem. Utilization increases with increasing the branching factor. Univ. of Tehran Adv. topics in Computer Network 28 DMP-Tree Therefore, The longest matching prefix of a network can be determined in 5 steps with 9 or more branching factor. In the worst case, we need at most 2 times of total prefix data size of memory to implement the scheme. For instance, for 50,000 prefixes of 32bit, we need at most 3.2 Mbit of memory. Univ. of Tehran Adv. topics in Computer Network 29 Overall Design All operations need search first in the Tree structure. Two search procedures, one for the longest matching prefix and another for update. The prefix tree data structure is on the chip. The Policy table is on the off chip memory. There is a port to data link layer mapping module. Univ. of Tehran Adv. topics in Computer Network 30 Tree Nodes Internal nodes Internal nodes. Addr 1 [?] Prefix 1 [33] Port [?] Branching factor Leaf nodes Addr 2 [?] Prefix 2 [33] port [?] … Addr [?] Each prefix has a left and right pointer which are pointing to left and right subtrees respectively. We can have N prefixes in each internal node. Then, N+1 is the branching factor. The bigger N, the faster search time, but the more logic is needed. Univ. Tehran is the address Adv. topics in Computer of Port of theNetwork port in the switch31to Tree Nodes Leaf nodes. Prefix 1 [33] port [?] Prefix 2 [33] port [?] … There is no left and right subtree pointers. The number of prefixes in the leaf node is M. The leaf nodes are stored in a off chip memory to make the scheme scalable to the large number of prefixes. Univ. of Tehran Adv. topics in Computer Network 32 Branching Factor What is the best number for N? (Branching factor) The bigger N, the faster search process. (Fact 1) The bigger N, the more memory pins are and usually the more mem. Bandwidth is needed (Fact 2). The bigger N, the more logic we need to process the node (Fact 3). Simulation result shows The bigger N, the better memory utilization in the memory. For N 8, theAdv. max. of the tree does not Univ. of Tehran topics inheight Computer Network 33 Simulation result Total memory: assuming one memory block and OC-192. # of Prefixes required Mem. Branching Factor Mem. Pins Mem BW (G/s) max Max mem Access Mem. Size (on chip)mm2 Max heights 64K 5.4 Mbits 15 897 89.7 4 81 5 64K 5.5 Mbits 11 655 65.5 4 82 5 64K 5.4Mbit 9 527 52.7 4 81 5 64K 6 Mbit 6 335 46.9 6 96 7 64K 6.6 Mbit 5 275 44 7 110 8 64K 6.5 Mbit 4 207 62.1 14 112 15 100K 8.3 Mbit 15 897 89.7 4 122 5 100K 8.5 Mbit 11 655 65.5 4 125 5 100K 8.3 9 527 63.24 5 122 6 100K 9 Mbit 6 335 53.6 7 135 8 100K 9.1Mbit 5 275 49.5 8 140 9 100K 9.5 4 207 62.1 14 150 15 40 5 30KUniv. of Tehran 2.6 9 527topics in Computer 52.7 4 expected Adv. Network 34 Branching Factor It seems any number between 8-16 is reasonable. But, N=9 gives a better search time, memory size. Assuming 9 branching factors in the internal node, %50 node utilization and 128K prefixes, we need max. 128K/4.5= 28.5K address. Then, 15 bit address for left and right pointers are more than enough. But, we need more for off chip addressing The number of switch port are usually limited, around 64, We can assume 256, Univ. of Tehran Adv. topics in Computer Network 35 then 8 bit is enough to address them. Branching Factor In order to make the internal node branching and leaf node branching even, M=10. If we want to read a node at once, we will need 41x10=410 pins which is difficult to support in one chip. We can divide a node in two and read/write in two clock cycles. This reduce the memory pins to 205 which is affordable. Univ. of Tehran Adv. topics in Computer Network 36 Memory requirement Prefix tree: Assuming 128K prefixes. N = 9 (BF) and M=10 (BF in leaves), the majority of prefixes, %80 will be in leaves, assume %65 node utilization, # of ave prefixes in a leaf node node = 10*0.6 5= 6.5 # of leaf nodes 128Kx%80x2/6.5 = 31.5K and %10 overhead 35 K Total off chip memory = 35K x 205(Mem BW) = 7.2 Mbits Then, we need 16 bits for addressing. 1 bit for internal/external. # of internal nodes= 128Kx%20/5.8=4.41K and %10 Link Addr overhead 4.9 K Total on chip memory=4.9Kx529K 2.6Mbits 48 bit addr. Port to link address mapping table. Univ. of Tehran Adv. topics in Computer Network 37 Memory requirement In summary: # of on chip Branchin Prefixes memor g Factor y Mbits On chip Mem. Pins Off chip mem Mem. Size memor Access( (on chip) 2 y search) mm Mbits Off 128K 529 7.2 250 Note: 2.6 10 5 40 chip mem pins Branching factor is the # of branching in internal nodes. The size of the memory scales with the size of data or # of prefixes. Power dissip. depends on the r/w freq, current & core voltage Considering Faraday Mem. Modules A 10Kx32 bits single port mem size is 36x1.45 mm2. Univ. of Tehran Adv. topics in Computer Network 38 Overall Design Mem. Ctrl Memory root addr To/From NP Search root content Update Search Insertion update delete CPU Inter face Port2Addr Ctrl Univ. of Tehran MEM To/From CPU Output mem Ctrl To/From out Mem Adv. topics in Computer Network 39 Search Path Addr[32] Piping Len[Nx6] Compare First[1] Found[1] PackAdd[29] SOA[1] Prty[1] GetLen CResult[1] Addr[32] MemAddr[14] Match[1] Next Addr[32] OutMemAddr Input Node Node Addr[32] SOA[1] InClk[1] Node Data[32] RdAdd[19] To/From Off mem Root Mem Ctrl IpAddr[32] LinkAddr[48] Cashing Dispatch Addr[32] DataOut[32] Addr[32] Port[8] To Scheduler There are data assertion signals between blocks which has not been shown every where because of space limitation. Univ. of Tehran Adv. topics in Computer Network 40 Input Module: Get the packet destination addresses from the parser. Do parity checking. It has the following input signals Search Path Input data which 32 bits. Start of Address, 1 bit, (SOA) Parity, 1 bit, (prty) Input clock, (InClk) It gets Data in two clock cycles, first the IP address and then, the packet address in the memory or packet id (cid) Univ. of Tehran Adv. topics in Computer Network 41 Search Path Input Module: 29 bits is used for the packet address and the last 3 bit for the policy, Then, 512 Mbytes can be supported to store the packet before sending them out. 31 2 The 2nd clock cycle data format: 0 Packet address InClk Policy The timing SOA Data Univ. of Tehran IpAddr PackAddr Or cid Adv. topics in Computer Network 42 Search Path Piping Module: Pipelines the search process. For new elements from input block does. For each new IP address do If found in the hash table send the packet memory address to dispatcher Else Enter IP address and the policy into the pipe FIFO End do, Univ. of Tehran Adv. topics in Computer Network 43 Search Path Piping Module: For elements in the FIFO For the first IP address in FIFO do If IP address is new then, assert first signal and send IP address and policy out. Else if next addr is on chip send the next node address to Mem. Ctrl. Else send the next node address to OffMemCtrl. send to the pipe the IP address and policy. For the recirculated address If the node was leaf then, Send the longest matching address to OutMemCtrl. Send Policy to Extract port and the packet address to dispatch. Else Put the IP address into the FIFO Replace the longest matching prefix address if a new one found. Univ. of Tehran Adv. topics in Computer Network 44 Search Path Piping Module: FIFO . Keep the current information of IPs. IP Addr [32] Port [8] Next Node [19] LMPA[] New [1] LMPA = Longest Matching Prefix Address New = 1 new , 0 old If the packet is new the next address will be zero and we can read root cash content instead of reading from memory. The address is off chip if the first, most significant bit is 1, otherwise it on chip. Univ. of Tehran Adv. topics in Computer Network 45 Search Path GetLen Module: This module get the length of prefixes. We add 1 to the end of a prefix and then padded with ‘0’s to make it 33 bits. Ex. 11011010 110110101000…0 (33 bits). Then, we should start from right and the first ‘1’ we meet, the rest is the prefix length. GetLen can be implemented as a multiplexer with case statement (32 case statement) and it can be done in one clock cycle. Univ. of Tehran Adv. topics in Computer Network 46 Search Path Compare Module: compare two prefix A and B with lengths L1 and L2. Assume L1>L2 and A[1:L2] is the first L2 bit from A, Then, 1. 2. If A[1:L2] = B A and B match. If A[L2+1] = 0 A B. Otherwise, A> B. If A[1:L2] > B A >B, otherwise A<B. One of the prefixes here is IP address with length 32. We assume there are no two identical elements in the tree. Univ. of Tehran Adv. topics in Computer Network 47 Search Path Next Module: Get the next node address to read and also the matching prefix and its corresponding port number. It gets two signals for each prefix, Match and ComResult (compare), Match =1 the prefix match, ComResult = 1 Prefix is bigger. It gets the left address of the first prefix, from the left, such that its ComResult signal is 1. It compares the matching prefix lengths and the get the one with the largest length. Univ. of Tehran Adv. topics in Computer Network 48 Search Path Dispatch Module: forms the Routing Group Address, RGA, from the port number and send it with packet stored memory address (PSMA) or CID. RGA is a 64 bit size bit map. The bit correspond to port number is set to 1. PSMA is dispatched first and Port and DLL address follows. Cashing Module: keep a cash of IP address and corresponding port. IP address Port[8] [32] Univ. of Tehran Adv. topics in Computer Network 49 Search Path Cashing Module: The cash is kept as a FIFO and its depth depends on the technology. Check IP address in FIFO. If the address found, then, assert found signal. write IP address on top of FIFO if it is not there already. Else write IP address on top of FIFO Cashing system always removes the last reference IP address from the cash. Univ. of Tehran Adv. topics in Computer Network 50 Search Process operations: This is for large prefixes (50K &up) cashing Piping Piping Piping Piping Piping Dispatch Off Mem root 0 2 On chip memory nodes 5 8 11 Leaf node 14 16 19 Time • This operation is for an IP address lookup • Piping is the bottleneck in the system and in ave. take 5 cycles. • Assuming 100 MHZ operation # of packets = 109/50 = 20 Million Line speed = 512x20 = 10.24 G for 64 byte packets. = 256x8x20= 41 G for 256 byte packets. • It is possible to support higher speeds with duplicating pipe. Univ. of Tehran Adv. topics in Computer Network 51 Output pins Pin Name Type Number DataIn (IP Addr) Input DataOut(Port&cid) Output 32 Port and cid, to schedular DataBus(cpu) In/out 32 CPU data bus CtrlBus(cpu) In/out 12 CPU control bus MemData2(Tree) In/out 205 If off chip is used MemAddr2(Tree) In/out 18 If off chip is used MemCtrl1(Tree) In/out 8? If off chip is used 340 This value can change around 10 percent Total Univ. of Tehran 32 Comment IP address, from parser Adv. topics in Computer Network 52