P2P Search
COP6731 Advanced Database Systems

P2P Computing
- Powerful personal computers
- Share computing resources

P2P Computing Advantages
- Shared infrastructure costs
- Highly scalable
- No single point of failure (SPOF)
- Censorship resistance

P2P Search Techniques
- Centralized P2P systems, e.g. Napster, SETI@home
- Decentralized & unstructured P2P systems, e.g. Gnutella
- Hybrid (partially decentralized), e.g. Freenet
- Structured P2P systems
  - DHT systems (CAN/Chord/Pastry/Tapestry)
  - Skip-list based systems

Napster
- MP3 file sharing with a centralized catalog
- Peers hold files
- Napster Inc’s servers hold catalog
- File transfer is P2P, using a proprietary protocol

Napster: Publish a File
- Users upload their IP address and music titles they wish to share
[Figure: peer 192.1.2.3 and the central Napster server]

Napster: Query for a File
- Users search for peers to download desired files
[Figure: peer 192.1.2.3 and the central Napster server]

Napster: Transfer Requested File
- File transfer is P2P, using a proprietary protocol
[Figure: peer 192.1.2.3 and the central Napster server]

Disadvantage of Centralized Directory
- Performance bottleneck
- Single point of failure
- Can we do it without a directory?

Gnutella
- No catalog
- Pings network to locate Gnutella peers
- File requests are broadcast to peers
  - Flooding or breadth-first search
- When provider is located, the file is transferred via HTTP

Gnutella: Issue a Request
[Figure: a peer asks "xyz.mp3 ?"]

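
The broadcast search described on the Gnutella slide can be sketched as a breadth-first flood over an overlay graph. This is a minimal illustration, not the Gnutella wire protocol: the function name `flood_search`, the five-peer topology, and the hop limit `ttl` are assumptions made for the example.

```python
from collections import deque


def flood_search(neighbors, start, wanted, files, ttl):
    """Breadth-first flood of a query from `start`.

    neighbors: dict mapping each peer to a list of its overlay neighbors
    files:     dict mapping each peer to the set of file names it holds
    ttl:       hypothetical hop limit; a peer reached with 0 hops left
               answers the query but does not forward it
    Returns (peers that hold `wanted`, number of query messages sent).
    """
    visited = {start}
    frontier = deque([(start, ttl)])
    hits = []
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if peer != start and wanted in files.get(peer, set()):
            hits.append(peer)
        if hops_left == 0:
            continue
        for nxt in neighbors.get(peer, []):
            messages += 1  # every forwarded copy costs one message
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, hops_left - 1))
    return hits, messages


# Hypothetical overlay: A-B, A-C, B-D, C-D, D-E; only E holds the file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
shared = {"E": {"xyz.mp3"}}

print(flood_search(overlay, "A", "xyz.mp3", shared, ttl=2))
print(flood_search(overlay, "A", "xyz.mp3", shared, ttl=3))
```

In this toy overlay the file sits three hops from the requester, so a hop limit of 2 misses it while a limit of 3 finds it at the cost of more messages.
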
Gnutella: Flood the Request
[Figure: the request is flooded through the peers]

Gnutella: Reply with the File
[Figure: the provider replies with the file]

Gnutella - Disadvantages
- Network flooding - unnecessary network traffic
- Using TTL - some files might not be found
- Alternatively,
  - using ultranodes (or supernodes)
  - using depth-first search, i.e., Freenet

Morpheus, Kazaa
[Figure: a supernode layer above three clusters of peers (A to I); each supernode is the center index for its cluster. Query: "Who has file X"; Reply: "Peer H has file X"; download file X from Peer H.]

Using Ultranodes
- Queries flood only the network of ultranodes
- Other peer nodes shielded from query traffic
- Combine the benefits of centralized and decentralized search
- Take advantage of the heterogeneity in peer capabilities

Freenet - Depth-First Search
[Figure: peers A to E. Query: "Who has file X"; routing hints "Peer C might have file X", "Peer D might have file X", "Peer E has file X"; Reply: "Peer E has file X"; download file X from Peer E.]

Freenet - File not Found
[Figure: routing hints such as "Peer C might have file X"; one peer reports "NOT FOUND!" while another reports "I HAVE FILE X!"; download file X from Peer E.]
- The requested file is not found due to a poor routing decision made at peer D
- In this case, the query backs out of the dead end and tries another peer in depth-first manner

Structured P2P Systems
- DHT-based
  - Chord / Pastry / Tapestry: hash-based into single-dimensional space
  - CAN: hash-based into multidimensional space
  - P-grid: hash-based into virtual binary search tree
- Skip-list based
  - Skipgraph / SkipNet
- Index tree-based
  - BATON

DHT Design Goals
- An "overlay" network with:
  - Flexible mapping of keys to physical nodes
  - Data independence
  - Small network diameter
  - Small degree (fan-out)
  - Local routing decisions
  - Robustness to churn
  - Routing flexibility
  - Proximity
- A "storage" or "memory" mechanism with:
  - No guarantees on persistence
  - Maintenance via soft state

Metrics
- Searching/Lookup
  - Number of hops in searching
  - Number of messages
- Database related metrics:
  - Total disk I/O
  - Response time
  - Accuracy
- Maintenance
  - Number of hops
  - Number of messages

How to Bound Search Space?
- Work on placement!

Basic Idea - Hashing
- Objects have hash keys
- Peer nodes also have hash keys in the same hash space
- Place object to the peer with closest hash keys
[Figure: object "y" is published into the P2P network with hash key H(y); peer "x" joins with hash key H(x).]

Viewed as a Distributed Hash Table
- Each peer node is responsible for a range of the hash table, according to the peer hash key
- Objects are placed in the peer with the closest key
- Note that peers are Internet edges
[Figure: hash table spanning 0 to 2^128-1, with peer nodes at the edges of the Internet]

How to Find an Object?
- Simplest idea: Everyone knows everyone else! One hop to find the object
- Want to keep only a few entries!
[Figure: hash table spanning 0 to 2^128-1; peer node]

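
The placement rule on the hashing slides (objects and peers share one hash space, and an object goes to the peer with the closest key) can be sketched in a few lines. This is a minimal sketch under stated assumptions: MD5 is used only because it yields 128-bit keys matching the 0 to 2^128-1 range, "closest" is taken as plain numerical distance, and the names `h`, `responsible_peer`, and `ToyDHT` are hypothetical. It also assumes every caller knows the full peer list, which is the one-hop "everyone knows everyone else" case.

```python
import hashlib


def h(name: str) -> int:
    """Map a peer name or object name to a 128-bit hash key (assumption: MD5)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


def responsible_peer(object_name, peers):
    """Return the peer whose hash key is numerically closest to the object's key."""
    key = h(object_name)
    return min(peers, key=lambda peer: abs(h(peer) - key))


class ToyDHT:
    """One-hop toy table: every caller is assumed to know all peers."""

    def __init__(self, peers):
        self.store = {peer: {} for peer in peers}

    def put(self, key, value):
        self.store[responsible_peer(key, self.store)][key] = value

    def get(self, key):
        return self.store[responsible_peer(key, self.store)].get(key)


# Hypothetical peer names; the IP address reuses the one shown on the Napster slides.
dht = ToyDHT(["peer-1", "peer-2", "peer-3", "peer-4"])
dht.put("xyz.mp3", "192.1.2.3")
print(responsible_peer("xyz.mp3", dht.store), dht.get("xyz.mp3"))
```

Because placement depends only on the hash keys, any peer can compute where an object lives without consulting a central catalog.
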
Using Distributed Hash Table (DHT)
- A peer only needs to know its logical neighbors
- Search based on multihop routing
[Figure: hash table spanning 0 to 2^128-1; peer node]

DHT in action
- Operation: take key as input; route messages to node holding key
[Figure: a network of nodes, each storing (K, V) pairs]

DHT in action: put()
- insert(K1,V1)
- Operation: take key as input; route messages to node holding key
[Figure: insert(K1,V1) is routed through the nodes until (K1,V1) is stored]

DHT in action: get()
- retrieve (K1)
- Operation: take key as input; route messages to node holding key
[Figure: retrieve (K1) is routed through the nodes]

CAN - Content Addressable Network
- Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone
- Each peer knows the neighbors of its zone
- Random assignment of peers to zones at startup
- Dimensional-ordered multihop routing

CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
    b = hy(K)
(2) route (K,V) -> J
(3) J stores (K,V)
[Figure: the point (x = a, y = b) falls in the zone of node J; (K,V) travels from node I to node J]

CAN: Object Retrieval
node I::retrieve(K)
(1) a = hx(K)
    b = hy(K)
(2) route "retrieve(K)" to J that is in charge of (a,b)

Some Research Topics
- Content-based Image Retrieval in P2P
- Location Management in P2P
- Security Considerations for DHT
- P2P Backup
- Wireless P2P
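
The CAN publish and retrieve steps listed earlier (hash the key to a point (a, b), route to the node J in charge of that point, store or look up there) can be sketched as follows. This is a toy model under loud assumptions, not the CAN protocol itself: the space is a fixed 4 x 4 grid of equal zones with one peer per zone (real CAN assigns and splits zones as peers join), routing is dimension-ordered with no wraparound, and the names `GRID`, `_unit_hash`, `zone_of`, `route`, and `ToyCAN` are hypothetical. Only `hx` and `hy` mirror the names on the slides, and their SHA-256 construction is an assumption.

```python
import hashlib

GRID = 4  # hypothetical: 4 x 4 equal zones, one peer per zone


def _unit_hash(key: str, salt: str) -> float:
    """Hash `key` to a coordinate in [0, 1); the salt separates hx from hy."""
    digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def hx(key: str) -> float:
    return _unit_hash(key, "x:")


def hy(key: str) -> float:
    return _unit_hash(key, "y:")


def zone_of(a: float, b: float):
    """Grid cell (zone) in charge of the point (a, b)."""
    return (int(a * GRID), int(b * GRID))


def route(src, dst):
    """Dimension-ordered multihop path: fix x first, then y, one neighbor per hop."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


class ToyCAN:
    def __init__(self):
        self.zones = {(i, j): {} for i in range(GRID) for j in range(GRID)}

    def publish(self, node_i, key, value):
        node_j = zone_of(hx(key), hy(key))   # (1) a = hx(K), b = hy(K)
        path = route(node_i, node_j)         # (2) route (K,V) -> J
        self.zones[node_j][key] = value      # (3) J stores (K,V)
        return path

    def retrieve(self, node_i, key):
        node_j = zone_of(hx(key), hy(key))
        return self.zones[node_j].get(key), route(node_i, node_j)


can = ToyCAN()
print(can.publish((0, 0), "xyz.mp3", "192.1.2.3"))
print(can.retrieve((3, 3), "xyz.mp3"))
```

Each hop in the returned path moves to an adjacent zone, which reflects the idea that a peer only needs to know the neighbors of its own zone.
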