* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Project In Computer Science Computer Networks
Survey
Document related concepts
Transcript
CS-279-I: Design Project In Computer Science Computer Networks Michalis Faloutsos Surge 333 [email protected] www.cs.ucr.edu/~michalis 1 Riverside Copyright: UC This is THE Class! The ultimate challenge Developing a complete system Real engineering Dress rehearsal for when you go out there 2 What is different here Open-ended definition of the problem Freedom to design Freedom to define the scope Fruitful interaction with colleagues 3 This Translates to: Problem-solving engineering skills Designing complete systems • Design, Implement, Test Exercise self motivation and independent thinking Cultivate teamwork skills 4 Freedom Comes At a Price Responsible behavior Proactive approach Need for communication • Towards me and among yourselves 5 How Real Engineering Differs from School Assignments Development is a process, not an all-nighter The user defines features not guidelines The design is half of the solution Most design decisions attempt to strike the balance in a trade-off You need to justify your approach Teamwork is critical for success 6 The Project Develop a distributed file sharing system Requirements scalable to large number of users Other requirements • Exchange arbitrary type files • Search for the files of interest (id, keywords, description) 7 Illustrating the Project Application level connections New “users” can join Users can leave Users can search for information Users acquire information 8 The Process and the Phases Understanding the problem and previous work Design of the system Implementation Testing Evaluation of the process 9 Some Tips Start early: it is a lot of work! Find ways to distribute the work equally • Maximize parallelism • Modularity (contain errors, facilitate testing) Thinking ahead and organizing is critical • Desing and Tool selection, mode of operation 10 Layout of work End of 2nd week: project proposal ver 1 End of 3-4th week: project proposal ver 2 End of 5-6th week: midpoint presentation • Design of system End of 10th week: Deliver project, presentation, demonstration 11 Peer to Peer File Sharing: A Survey Based on a presentation by Ismail Guvenc and Juan Jose Urdaneta Copyright:12 UC Riverside Outline Peer-to-Peer Concept Overview of P2P systems: • Napster, Gnutella, Freenet, Freehaven, Oceanstore, PAST, Farsite, Publius, CFS, Tapestry, Pastry, Chord, CAN and others Comparison of the systems Conclusion 13 What is Peer-to-Peer? Every node is designed to(but may not by user choice) provide some service that helps other nodes in the network to get service Each node potentially has the same responsibility Sharing can be in different ways: • CPU cycles: SETI@Home • Storage space: Napster, Gnutella, Freenet… 14 P2P: Why so attractive? Peer-to-peer applications fostered explosive growth in recent years. • Low cost and high availability of large numbers of computing and storage resources, • Increased network connectivity As long as these issues keep their importance, peer-to-peer applications will continue to gain importance 15 Main Design Goals of P2P systems Ability to operate in a dynamic environment Performance and scalability Reliability Anonymity: Freenet, Freehaven, Publius Accountability: Freehaven, Farsite 16 First generation P2P routing and location schemes Napster, Gnutella, Freenet… Intended for large scale sharing of data files Reliable content location was not guaranteed Self-organization and scalability: two issues to be addressed 17 Second generation P2P systems Pastry, Tapestry, Chord, CAN… They guarantee a definite answer to a query in a bounded number of network hops. They form a self-organizing overlay network. They provide a load balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops. 18 The Main Functions of Sharing Publish a file: make available Search for a file • Search for the ID of the file (Springsteen, born in) • Forward request (flood or with clue/structure) Retrieve the file Maintain the connectivity of the network • Who connects to whom - optimize Maintain copies of files in the network • File replication - optimize availability of files 19 Napster Application-level, client-server protocol over point to-point TCP, centralized system Retrieval: four steps Connect to Napster server Upload your list of files (push) to server. Give server keywords to search the full list with. Select “best” of correct answers. (pings) centralized server: single logical point of failure, can load balance among servers using DNS rotation, potential for congestion, Napster “in control” (freedom is an illusion) no security: passwords in plain text, no authentication, no anonymity 20 Napster: How it works?(1) 1. File list is uploaded napster.com users 21 Napster: How it works?(2) 2. User requests search at server. napster.com Request and results user 22 Napster: How it works?(3) 3. napster.com User pings hosts that apparently have data. Looks for best transfer rate. pings pings user 23 Napster: How it works?(4) 4. User retrieves file napster.com Retrieves file user 24 Gnutella peer-to-peer networking: applications connect to peer applications focus: decentralized method of searching for files each application instance serves to: • store selected files • route queries (file searches) from and to its neighboring peers • respond to queries (serve file) if file stored locally How it works: Searching by flooding: • If you don’t have the file you want, query 7 of your partners. • If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10. • Requests are flooded, but there is no tree structure. • No looping but packets may be received twice. Note: Play gnutella animation at: http://www.limewire.com/index.jsp/p2p 25 Freenet Completely anonymous, for producers or consumers of information Resistance to attempts by third parties to deny access to information Users share disk-space • “Freenet” can write files in your space -> not resposnsible Goals: • • • • Anonymity for producers and consumers Deniability for information storers Resistance to denial attacks Efficient storing and routing 26 27 Free haven Anonymous Resists powerful adversaries to find/destroy data Goals • • • • Anonymity: publishers, readers, servers Persistence: lifetime determined by publisher Flexibility: add/remove nodes Accountability: reputation A server gives up space => gets space on other servers 28 Freehaven: Publication Split doc into n shares, k of which can rebuild the file(k<n) • Large k => brittle file • Small k =>larger share, more duplication Generate (SKdoc, PKdoc) and encrypt each share with SKdoc Store on a server: • Encrypted share, timestamp, expiration date, hash(PKdoc) 29 Freehaven: Retrieval Documents are indexed by hash(PKdoc) Reader generates (PKclient ,SKclient) and a one-time remailler reply block Reader broadcasts hash(PKdoc), PKclient ,and the remailler block • To all servers it knows about • Broadcasts may be queued and bulk sent Servers holding shares with hash(PKdoc) • Encode the share with PKclient • Send it using the remailler block 30 Freehaven: Share expiration Absolute Date “Price” of a file: size x lifetime Freenet and Mojo Notion favor Popular documents Unsolved problems: • Large corrupt servers, list of “discouraged” documents, DoS Not ready for wide deployment: • Inefficient communication=>few users=>weak anonymity 31 PAST & Pastry Pastry: • Completely decentralized, scalable, and selforganizing; it automatically adapts to the arrival, departure and failure of nodes. • Seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. • In a Pastry network, Each node has a unique id, nodeId. Presented with a message & a key, Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key. 32 Pastry: NodeId Leaf set: stores numerically closest nodeIds. Routing table: Common prefix with 10233102next digit-rest of NodeId Neighborhood set: Stores closest nodes according to proximity metric 33 Pastry: Routing Given a message, Check: If it falls within the range of nodeId’s covered in the leaf set, then forward directly to it. If not, using the Routing table, the message is forwarded to a node that shares a most common prefix with the key. If routing table is empty or the node cannot be reached, then forward to a node that is numerically closer to the key and also shares a prefix with the key. Performance: • If key within the leaf set = O ( 1 ) • If key goes to the routing table=O(LogN) • Worst case = O (N) ( under failures) 34 PAST PAST: an archival, cooperative file storage and distribution facility. uses Pastry as its routing scheme Offers persistent storage services for replicated read-only files Owners can insert or reclaim files, but clients can just look up Collection of PAST nodes form an overlay network. A PAST node is at least an access point, but it can also contribute to storage and participate in the routing optionally as well. Security: Each node and each user in the system holds a smartcard with which there is a private/public key pair associated. Three operations: insert, lookup and reclaim 35 Farsite Farsite is a symbiotic, serverless, distributed file system. Symbiotic: It works among cooperating but not completely trusting the clients. Main design goals: • To provide high availability and reliability for file storage. • To provide security and resistance to Byzantine threats. • To have the system automatically configure and tune itself adaptively. Farsite first encrypts the contents of the files. This prevents an unauthorized user to read the file. Such a user can not read a file even if it is in his own desktop computer, because of encryption. Digital signatures are used to prevent an unauthorized user to write a file. After encryption, multiple replicas of the file are made and they are distributed to several other client machines. 36 Publius Publius system mainly focuses on availability and anonymity. It maintains availability by distributing files as shares over n web servers. J of these shares are enough to reconstruct a file. For publishing the file, we first encrypt the document with key K. Then K is split in n shares, any j of which can rebuild K. K(doc) and a share are sent to n servers. “Name” of the document is the address of n servers. Query operation is basically running a local web proxy, contacting j servers and rebuild K. While the identity of the servers are not anonymized, an attacker can remove information by forcing the closure of n-k+1 servers. Publius lacks accountability(DoS with garbage) and smooth join/leave for servers. 37 Chord Chord: Provides peer-to-peer hash lookup service: Lookup(key) IP address Chord does not store the data Efficient: O(Log N) messages per lookup N is the total number of servers Scalable: O(Log N) state per node Robust: survives massive changes in membership 38 Chord: Lookup Mechanism Lookups take O(Log N) hops N5 N10 N110 N20 K19 N99 N32 Lookup(K19) N80 N60 39 Tapestry Self-administered, self-organized, location independent, scalable, fault-tolerant Each node has a neighbor map table with neighbor information. 40 Tapestry Cont. The system is able to adapt to network changes because it algorithms are dynamic. This also provides for Fault-handling 41 CAN The network is created in a tree-like form. Each node is associated to one in the upper level an to a group in the lower level. A query travels from the uppermost level down through the network until a match is found or until it reaches the lowermost level. For its query model, scalability is an issue. 42 CAN Cont. The tree like network: 43 Oceanstore De-centralized but monitored system. Build thinking of untrusted peers (for data storage) and nomadic data. Monitoring allows pro-active movement of data. Uses replication and caching. Two lookup methods used: Fast probabilistic and slow deterministic. 44 MojoNation Centralized: Central Service Broker + many peers. When a file is inserted it’s hashed. This is the files Unique Identifier. Uses fragmentation and replication (50%) Load balancing is an issue since it is marketbased. 45 Panther Based on Chord lookup algorithms Chord capabilities are used for load balancing and replication Files and file chunks are identified by keys generated through Chords hash system. Replication, fragmentation and caching are used. Authentication is provided through the use of public and private keys. 46 Eternity Provide “eternal” storage capabilities. Protect data even from its publisher and systems administrators Fragmentation and replication are proposed Anonymity is used to protect data 47 Others Others: • • • • • • Ohaha system uses consistent hashing-like algorithm to map documents to nodes. Query routing is like the one in freenet, which brings some of the weaknesses of freenet together. The Rewebber maintains a measure of anonymity for producers of web information by means of an encrypted URL service. TAZ extends Rewebber by using chains of nested encrypted URL’s that successively point to different rewebber services to be contacted. Intermemory and INDIA are two cooperative systems where files are divided into redundant shares and distributed among many servers. They are intended for long term archival storage along the lines of Eternity. The xFS file system focuses on providing support to distributed applications on workstations interconnected by a very high-performance network, providing high availability and reliability. Frangipani is a file system built on the Petal distributed virtual disk, providing high availability and reliability like xFS through distributed RAID semantics. Unlike xFS, Petal provides support for transparently adding, deleting, or reconfiguring servers. GNUnet is free software, available to the general public under the GNU Public License (GPL). As opposed to Napster and Gnutella, GNUnet was designed with security in mind as the highest priority. 48 Comparison Things to note Routing and File Retrieval Napster Centralized server(single point of failure), not scalable Centralized lookup, keyword is used for look-up Gnutella Inefficient routing(flooding), not scalable Flooded routing(multicast), filename is used for look-up Freenet Scales well, intelligent routing, anonymity maintained No structured lookup algorithm, degrades efficiency Free Haven Anonymous, accountability maintained File is split into n shares, k of which can rebuild the file Shared-Private keys used for encryption Farsite Scales well, accountability is maintained, feasible Files stored in encrypted format Publius Anonymity is maintained File is split into n shares, k of which can rebuild the file Pastry Leafset, Routing Table, Neighborhood set O(logN) without neighbor failure, O(1) if node is in leaf set Used by Squirrel, PAST, Scribe Routing based on address prefixes lookup(key)=> IP address, used by CFS, Panther O(logN) with neighbor failure, using replicas on successors Routing based on finger table(O(LogN)) Routing based on numerical difference with destination node Tapestry Neighbor Map Table on Nodes Routing based on address prefixes CAN Tree-like network topology Routing table doesn't grow with network size Scalability is an issue Messages routed in d-dim. Space, each node has routing Chord table of O(d) entries, any node reached in (dN1/d) routing hops 49 File Storage Systems Comparison Caching Fragmentation Replication De-centralized Panther OceanStore Mojonation CFS 50 Conclusion Issues that need to be addressed: Caching, versioning, fragmentation, replication. Copyright laws! The technology is very promising. It will probably a common thing in the near future. 51