Discussion Notes for Project 3

1. Goal

In Project 3 we will build a peer-to-peer file-sharing system with a distributed index table --- a prototype of eDonkey. You will write approximately 2000 lines of code, and you may have one partner to work with.

2. The intuition and design choices behind the P2P network

2.1 Project 1: using a centralized index table

1. A central server maintains the index table. This is not scalable: the central server is the bottleneck and a single point of failure.
2. TCP is used for index table maintenance, resulting in low efficiency.
3. Hard state is used: nodes are assumed to always be present, and state is never refreshed.

2.2 Project 3: using a decentralized index table

1. The index table is distributed across nodes. All nodes cooperate with each other, building and maintaining the index service together. A node acts as both a client and an index server, so a node program consists of two parts: a client program and a server program. Big questions: How do we build the distributed index table? How do we search on the distributed index table?
2. UDP is used for index table maintenance.
3. How do we detect failure? Use soft state: assume failure is part of normal operation. Each client periodically updates its information (index, state) on the server.
--- The server puts a timer on each piece of data. When a timer expires because the client did not update its information in time, the server assumes the client has died and removes all of that client's information.
--- Upon receiving data from a client: if the server already has the data, this is an information update from the client; the server updates the corresponding entry and resets the timer. If it does not have the data, it creates a new entry and sets the timer.
4. What other issues should you consider?
--- Load balance: assume no node is special and all are the same. Indices should be evenly distributed over all nodes, so that each node stores the same amount of indices (from others) and answers the same amount of queries (from others).
--- Dynamic management of indices: nodes join and leave frequently. What's the problem?
Each node stores indices for the network; if a node leaves, we have to redistribute its indices.

3. Solution: a simple variation of the Chord protocol

3.1 The setup

1. A one-dimensional name space. It can be a space of 6 bits, 32 bits, or 128 bits; names are generated in the range (0 ... 2^m - 1). Wrap the name space around and you get a simple RING topology.
2. Each machine and each file has a unique ID (name) in the space.
   ID of machine: hash(port + IP address)
   ID of file: hash(filename)
--- Machines and files are mapped to IDs; IDs of files and IDs of machines live in the same space.
--- If the name space is large, few machines share an ID and few files share an ID.
--- We get an order on machines and files: the order of their ID values.
--- Load balance is achieved. Because of hashing, both machines and file indices are evenly (uniformly) spread out over the name space, so if each machine stores the set of file indices close to itself, you get load balance. In both the Chord protocol and our protocol, a node stores the set of indices between its previous node and itself. We call this successor mode.
--- SuccessorNode(x), where x is the ID of a machine or a file: the node with NodeID = x if such a node exists; otherwise, the node whose NodeID immediately follows x in the name space. If x is the ID of a file, SuccessorNode(x) is the root for x, on which the indices of x are stored. Symmetrically, the node with a given NodeID is the PredecessorNode of SuccessorNode(NodeID) on the ring.

3.2 Routing

1. To form the ring network, each node keeps track of its SuccessorNode and PredecessorNode. The data structure is as follows:

   struct Peer {
       U32 NodeId;
       U32 address;
       U16 port;
   };
   Peer predecessorNode;
   Peer successorNodes[k+1];

   At checkpoint 1, you only need to form a static ring, in which you manually assign each node its node ID, successorNode, and predecessorNode.
2. Forwarding algorithm: send(x), which routes to ID x. x is the ID of a node or a file.
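The ring state and the successor rule above can be sketched in C. This is only a sketch under the notes' definitions: the helper names `in_range` and `i_am_successor_of` are illustrative, not part of the required protocol.

```c
#include <stdint.h>

typedef uint32_t U32;
typedef uint16_t U16;

/* Ring neighbor bookkeeping, matching the struct in the notes. */
struct Peer {
    U32 NodeId;
    U32 address;   /* IPv4 address of the peer */
    U16 port;
};

/* Does ID x lie in the half-open ring interval (a, b]?  Handles the
 * wraparound of the circular name space. */
static int in_range(U32 a, U32 b, U32 x) {
    if (a < b) return x > a && x <= b;
    if (a > b) return x > a || x <= b;  /* interval wraps past 2^32 - 1 */
    return 1;                           /* a == b: one-node ring owns all IDs */
}

/* A node is SuccessorNode(x) iff x falls in (predecessorId, myId]. */
static int i_am_successor_of(U32 predecessorId, U32 myId, U32 x) {
    return in_range(predecessorId, myId, x);
}
```

The same interval test drives forwarding: if x falls in (myId, successorId], the successor is SuccessorNode(x) and the final destination; otherwise the packet is simply forwarded to the successor and the check repeats one hop later.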
The node with ID x might not even exist in the network, so the semantics of send(x) is to route to SuccessorNode(x). You don't know whether node x exists, and you don't know its IP address. The only things you know are your own successor, and that if you go along the ring you will definitely find SuccessorNode(x). So you route hop by hop via successors until you hit SuccessorNode(x). Once you reach the destination, the results can be sent back directly over IP, because you know the IP addresses of both ends.

3.3 The distributed index service

1. Each node maintains, in memory, the indices for the ID range it is responsible for. The node uses the index to add newly published file metadata and to answer search requests by looking up the index.
2. Publish. When a node wants to publish a local file, it hashes the file name to generate a FileId, constructs a DP_PUBLISH packet, and sends it through the overlay using send(FileId). Note that the DP_PUBLISH packet should be re-published every PUBLISH_PERIOD to refresh the published state.
3. Search. When a node wants to search for a file, it first hashes the file name it is looking for to get a FileId. The node constructs a DP_SEARCH packet with the given file name and sends it through the overlay using send(FileId). The packet reaches SuccessorNode(FileId), which looks up the file in its local index table and returns the search results (DP_SEARCHRESULT) to the querying node.

4. What to do in checkpoint 1

1. Form a static ring.
2. Forwarding algorithm: send(x), which routes to ID x and finds SuccessorNode(x).
3. Publishing and searching with soft state.
4. Downloading, reusing code from Project 1.
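The soft-state index maintenance described in 2.2 and 3.3 (and needed for checkpoint step 3) can be sketched as follows. This is an illustrative sketch only: the fixed-size table, the ENTRY_TTL value, and the function names are assumptions, not part of the project spec.

```c
#include <stdint.h>
#include <time.h>

#define MAX_ENTRIES 64
#define ENTRY_TTL   30   /* seconds; in practice derived from PUBLISH_PERIOD */

/* One soft-state index entry: which peer has the file, and when the
 * entry expires if no refresh arrives. */
typedef struct IndexEntry {
    uint32_t fileId;
    uint32_t ownerAddress;
    uint16_t ownerPort;
    time_t   expiresAt;
    int      inUse;
} IndexEntry;

typedef struct IndexTable {
    IndexEntry entries[MAX_ENTRIES];
} IndexTable;

/* Handle a DP_PUBLISH: if the entry exists, this is a refresh, so reset
 * its timer; otherwise create a new entry and set the timer. */
static void index_publish(IndexTable *t, uint32_t fileId, uint32_t addr,
                          uint16_t port, time_t now) {
    int freeSlot = -1;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        IndexEntry *e = &t->entries[i];
        if (e->inUse && e->fileId == fileId &&
            e->ownerAddress == addr && e->ownerPort == port) {
            e->expiresAt = now + ENTRY_TTL;   /* refresh: reset the timer */
            return;
        }
        if (!e->inUse && freeSlot < 0) freeSlot = i;
    }
    if (freeSlot >= 0) {                      /* new entry: set the timer */
        IndexEntry *e = &t->entries[freeSlot];
        e->fileId = fileId; e->ownerAddress = addr; e->ownerPort = port;
        e->expiresAt = now + ENTRY_TTL; e->inUse = 1;
    }
}

/* Purge entries whose timer has expired: the publisher is assumed dead. */
static void index_expire(IndexTable *t, time_t now) {
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (t->entries[i].inUse && t->entries[i].expiresAt <= now)
            t->entries[i].inUse = 0;
}

/* Answer a DP_SEARCH: count entries matching fileId. */
static int index_search(const IndexTable *t, uint32_t fileId) {
    int n = 0;
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (t->entries[i].inUse && t->entries[i].fileId == fileId) n++;
    return n;
}
```

Note the design choice: the index holder never probes publishers for liveness; it simply lets entries lapse unless DP_PUBLISH refreshes arrive every PUBLISH_PERIOD, which is exactly the soft-state behavior contrasted with Project 1's hard state.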