P2P Storage
1.0, 25 November 2003
Klaus Marius Hansen, University of Aarhus

Overview
o Storage (and retrieval) was an original motivation for P2P systems (viz., file sharing)
o Load balancing (sharing resources)
o Fault tolerance
o Resource utilization
o How can P2P techniques be used to provide decentralized, self-organizing, scalable storage?
  o Two proposals, built on top of Pastry and Chord

Material
o Rowstron, A. & Druschel, P. (2001), "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility", in Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), pp. 188-201.
  o PAST: P2P global, archival storage based on Pastry
o Dabek, F., Kaashoek, M. F., Karger, D., Morris, R. & Stoica, I. (2001), "Wide-area cooperative storage with CFS", in Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), pp. 202-215.
  o CFS: P2P block file storage based on Chord

P2P Storage (outline)
o Looking Back...
o PAST
o The Cooperative File System (CFS)
o Summary

Looking Back...
o Napster and Gnutella use the local file systems of nodes
o Napster
  o Indexes files centrally
  o Allows nodes to search for files through the index server
  o Downloads happen in a peer-to-peer fashion
  o Single point of failure...
o Gnutella
  o No central index - only local knowledge of storage
  o Search through flooding/walking
  o Downloads happen in a peer-to-peer fashion
  o Bad scalability (yes, this can be repaired in a number of ways :-))
o Basically, only transient file sharing is supported
o Freenet
  o Completely decentralized
  o Anonymous storage, clients, and publishers
  o Replication through routing
  o Probabilistic storage

PAST Wants to...
o Exploit the multitude and diversity of Internet nodes to achieve strong persistence and high availability
o Create a global storage utility for backup, mirroring, ...
o Share the storage and bandwidth of a group of nodes - larger than the capacity of any individual node

The PAST System
o Large-scale P2P persistent storage utility
  o Strong persistence (resilience to failure), high availability, scalability, security
o Self-organizing, Internet-based structured overlay of cooperating nodes
  o Route file queries
  o Store replicas of files
  o Cache popular files
o Based on Pastry

Pastry Review
o Effective, distributed object location and routing substrate for P2P overlay networks
o Each node has a unique identifier (nodeId)
o Given a key and a message, Pastry routes the message to the node with the nodeId numerically closest to the key ID (a routing sketch follows below)
o Takes network locality into account, based on an application-defined scalar proximity metric

Pastry Routing Table (figure)

Pastry Routing Example (figure)
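To make the prefix-routing idea concrete, here is a minimal Python sketch of a Pastry-style next-hop decision. It is an illustration under simplifying assumptions, not the Pastry implementation: IDs are 32 hex digits (128 bits, b = 4), the routing table is a plain dict from (row, next digit) to a node ID, and real Pastry's leaf set and proximity-based choice among candidates are omitted. All names are ours.

```python
# Pastry-style prefix routing, simplified: no leaf set, no proximity metric.
# IDs are 32 hex digits (128 bits, i.e., b = 4 bits per digit).

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(local_id: str, key: str, table: dict) -> str | None:
    """Choose the next node for a message addressed to `key`.

    Returns None when this node is, as far as the table knows, the
    numerically closest node, i.e., the message is delivered locally.
    """
    p = shared_prefix_len(local_id, key)
    if p == len(key):                    # we hold the key's ID exactly
        return None
    # Preferred case: a table entry that extends the shared prefix by one digit.
    entry = table.get((p, key[p]))
    if entry is not None:
        return entry
    # Fallback (the "rare case"): any known node matching at least as long a
    # prefix that is numerically closer to the key than the local node.
    dist = lambda nid: abs(int(nid, 16) - int(key, 16))
    closer = [nid for nid in table.values()
              if shared_prefix_len(nid, key) >= p and dist(nid) < dist(local_id)]
    return min(closer, key=dist) if closer else None

# Example: with one row-0 entry for digit 'd', a node at 65a1... forwards
# a message addressed to key d467... to node d13d... (one more matching digit).
table = {(0, "d"): "d13d" + "0" * 28}
print(next_hop("65a1" + "0" * 28, "d467" + "0" * 28, table))
```

Each hop resolves at least one more digit of the key, which is where Pastry's logarithmic hop count comes from.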
PAST Design
o Any node running the PAST system may participate in the PAST network
o Nodes minimally act as access points for users, but may also contribute storage and routing capacity to the network
o Nodes have 128-bit quasi-random IDs (e.g., the lower 128 bits of a SHA-1 hash of the node's IP address), so nodes with adjacent IDs are likely to be diverse
o File publishers have public/private cryptographic key pairs

Operations
o fileId = Insert(name, owner-credentials, k, file)
  o Inserts replicas of the file on the k nodes whose IDs are numerically closest to fileId (k <= |L|, the leaf-set size)
o file = Lookup(fileId)
  o Retrieves the file designated by fileId if it exists and at least one of the k replica hosts is reachable
  o The file is usually retrieved from the "closest" (in terms of the proximity metric) of the k nodes
o Reclaim(fileId, owner-credentials)
  o Weak delete: a lookup of fileId is no longer guaranteed to return a result

PAST: Insert Implementation
o fileId = Insert(name, owner-credentials, k, file)
o fileId is calculated as the SHA-1 hash of the file name, the owner's public key, and a random number (the "salt") - see the sketch after these slides
o The required storage is debited against the client's storage quota
o A file certificate is created and signed with the owner's private key
  o Contains the fileId, a SHA-1 hash of the file content, the replication factor k, the random salt, and various metadata
o The file certificate and the file are then routed toward the fileId destination
o The destination node verifies the certificate and forwards the file to the k - 1 next-closest nodes
o The destination returns a store receipt if all k nodes accept

PAST: Lookup Implementation
o file = Lookup(fileId)
o Given a requested fileId, a lookup request is routed toward the node with the ID closest to fileId
o Any node storing a replica may respond with the file and its file certificate
o Since the k numerically adjacent nodes store replicas and Pastry routes toward nearby nodes, a node close in the proximity metric is likely to reply

PAST: Reclaim Implementation
o Reclaim(fileId, owner-credentials)
o Analogous to insert, but with a "reclaim certificate" verifying that the original publisher reclaims the file
o A reclaim receipt is returned and used to restore the owner's storage quota
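The fileId computation above is simple to state in code. The following is a sketch based only on the slide's description (SHA-1 over file name, owner public key, and salt); the function and variable names are illustrative, not taken from the PAST implementation.

```python
import hashlib
import os

def make_file_id(name: bytes, owner_pubkey: bytes, salt: bytes) -> bytes:
    """160-bit PAST fileId: SHA-1 over file name, owner's public key, salt.

    The salt is what lets the same (name, owner) pair map to a fresh fileId
    on a retry - the basis of file diversion, described later.
    """
    return hashlib.sha1(name + owner_pubkey + salt).digest()

salt = os.urandom(20)                                   # fresh random salt
file_id = make_file_id(b"backup.tar", b"<owner public key bytes>", salt)
print(file_id.hex())    # the insert request is routed toward this ID
```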
Storage Management
o We want the aggregate size of stored files to approach the aggregate storage capacity of the PAST network before insert requests are rejected
o Should be done in a decentralized way...
o Two ways of ensuring this:
  o Replica diversion
  o File diversion

Replica Diversion
o Balances free storage space among the nodes in a leaf set
o If a node cannot store a replica locally, it asks a node in its leaf set to store it
  o The protocol must then handle failure of the diverting leaf nodes
o Acceptance of a replica at a node is subject to policies:
  o File size divided by available space must be below a certain threshold (leaving room for small files)
  o The threshold is lower for nodes storing diverted replicas (leaving most space for primary replicas)

File Diversion
o If one of the k nodes with nodeIds closest to fileId declines to store a replica (as primary or diverted), the whole file must be diverted
o The inserting node generates a new fileId (using a new random salt) and tries again... (see the sketch below)
o The client tries three times; if all attempts fail, the insertion is rejected
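A compact sketch of the two policies just described. Everything here is an assumption-laden illustration: `try_insert` stands in for the full insert protocol (routing, certificates, store receipts), and the threshold values in the usage lines are placeholders, not PAST's actual parameters.

```python
import hashlib
import os

def accepts(file_size: int, free_space: int, threshold: float) -> bool:
    """A node's admission policy: decline files that are large relative to
    the node's remaining space. PAST uses a lower threshold for diverted
    replicas than for primary ones (the values below are placeholders)."""
    return free_space > 0 and file_size / free_space < threshold

def insert_with_file_diversion(name: bytes, pubkey: bytes, try_insert) -> bytes:
    """Retry an insert under fresh fileIds (new salts), up to three times.

    try_insert(file_id) is a stand-in for the real protocol: it returns
    True iff all k chosen nodes accepted and a store receipt came back.
    """
    for _ in range(3):
        salt = os.urandom(20)
        file_id = hashlib.sha1(name + pubkey + salt).digest()
        if try_insert(file_id):
            return file_id
    raise RuntimeError("insert rejected after three file diversions")

# Toy usage with placeholder thresholds and an always-accepting "network".
print(accepts(file_size=4_000, free_space=100_000, threshold=0.1))   # primary: True
print(accepts(file_size=4_000, free_space=100_000, threshold=0.02))  # diverted: False
fid = insert_with_file_diversion(b"notes.txt", b"<pubkey>", lambda fid: True)
print(fid.hex())
```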
Caching
o Goals of cache management:
  o Minimize access latency (here, routing distance)
  o Maximize throughput
  o Balance the query load in the system
o The k replicas ensure availability, but also give some load balancing and latency reduction because of Pastry's locality properties
o A file is cached in PAST at a node traversed during lookup or insert operations if the file's size is less than a fraction of the node's remaining cache size
o Cached files are evicted as needed - expiry is not needed, since files are immutable

PAST Evaluation - Experimental Setup
o Prototype implementation of PAST in Java
o Network emulation environment: all nodes run in the same Java VM
o Workload data from traces of file usage
  o Eight Web proxy logs (1,863,055 entries, 18.7 GBytes)
  o A workstation file system (2,027,908 files, 166.6 GBytes)
  o "Problematic to get data of real P2P usage"
o 2250 PAST nodes, k = 5, b = 4
o Different normal distributions of node storage capacity were used

PAST: Storage Management is Needed
o Experiment without replica diversion and file diversion (using the d1 capacity distribution and the Web trace)
  o Primary replica threshold = 1, diverted replica threshold = 0
  o An insertion is rejected upon the first failed attempt
o 51.1% of insertions rejected...
o 60.8% ultimate storage utilization...

PAST: Storage Management is Effective (figure)

PAST: Caching is Good
o File request characteristics based on the Web logs
o 775 unique clients mapped to PAST nodes

PAST: Summary
o Based on Pastry P2P routing and location
o Insertion and replication of files
o High storage utilization and load balancing through storage management and caching

The Cooperative File System (CFS)

CFS Goals
o Distributed, cooperative, read-only storage and serving of files, based on blocks
o Fault tolerance
o Load balancing
o Tapping into unused resources
o Challenges for a P2P architecture for this:
  o Decentralization, unmanaged participants, frequent joins and leaves
  o Support multiple file systems in a single CFS system with millions of servers...

CFS Design (1)
o Two types of CFS nodes and three layers
o CFS client
  o FS: uses the DHash layer to retrieve blocks; interprets blocks as files
  o DHash: uses the Chord layer to locate the CFS server holding the desired blocks
  o Has public/private keys for signing published file systems
o CFS server
  o DHash: stores, replicates, and caches blocks
  o Chord: looks up blocks, checks for cached copies

CFS Design (2)
o Blocks (of size ~10 KB) are interpreted much like blocks in a Unix file system (with block IDs instead of disk addresses)
o Publishers insert file systems
  o Each block is inserted into CFS using a hash of the block's contents as its ID
  o The root block is signed with the publisher's private key and uses the public key as its ID
o File systems may be updated by the publisher (by signing a new root block with the same private key)
o Data is stored for a finite period of time
  o Extensions may be requested
  o No explicit delete operation - assumes plentiful storage

Chord Review
o Nodes have m-bit IDs on an "identifier ring"
o One operation: IP address = lookup(key)
  o Given a key ID, find the successor node, i.e., the node whose ID most closely follows the key ID on the identifier ring

Simple Key Location
o Simple key location (following successor pointers around the ring) needs only O(1) routing state per node, but lookups take O(N) hops
o Example: node 8 performs a lookup for key 54
o Each node maintains r successors in a successor list for fault-tolerance purposes

Scalable Key Location (1)
o Uses finger tables: n.finger[i] = successor(n + 2^(i-1)), 1 <= i <= m
o This brings lookups down to O(log N) hops, at the cost of O(m) routing state per node

Scalable Key Location (2)
o If the successor is not immediately known, search the finger table for the node n' whose ID most immediately precedes the key id, and ask it
  o Rationale: of all the nodes in the finger table, n' knows the most about the region of the ring just around id
o (A lookup sketch follows below)
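To tie the two key-location slides together, here is a runnable Python sketch of Chord lookup with finger tables on a tiny in-process ring. The ring construction and all names are ours; only the find_successor / closest-preceding-node structure follows the Chord paper. The sketch uses 0-indexed fingers, finger[j] = successor(n + 2^j), equivalent to the slide's 1-indexed definition, and the demo reproduces the slide's example of node 8 looking up key 54.

```python
M = 8                       # bits per ID; the demo ring has 2**M = 256 positions

def between(x: int, a: int, b: int) -> bool:
    """True if x lies on the open ring interval (a, b), modulo 2**M."""
    x, a, b = x % 2**M, a % 2**M, b % 2**M
    return a < x < b if a < b else (x > a or x < b)

class Node:
    def __init__(self, nid: int):
        self.id = nid
        self.successor = self
        self.finger = []    # finger[j] = successor(id + 2**j), 0 <= j < M

    def find_successor(self, key: int) -> "Node":
        # Key falls between us and our successor: the successor owns it.
        if between(key, self.id, self.successor.id) or key % 2**M == self.successor.id:
            return self.successor
        # Otherwise hand the lookup to the closest preceding node we know of.
        return self.closest_preceding(key).find_successor(key)

    def closest_preceding(self, key: int) -> "Node":
        for f in reversed(self.finger):          # try the largest jumps first
            if between(f.id, self.id, key):
                return f
        return self.successor

def build_ring(ids):
    """Build a consistent ring: correct successors and finger tables."""
    nodes = [Node(i) for i in sorted(ids)]
    def successor_of(k: int) -> Node:
        k %= 2**M
        return next((n for n in nodes if n.id >= k), nodes[0])
    for i, n in enumerate(nodes):
        n.successor = nodes[(i + 1) % len(nodes)]
        n.finger = [successor_of(n.id + 2**j) for j in range(M)]
    return nodes

# The slide's example: node 8 looks up key 54 and arrives at node 56,
# halving (roughly) the remaining ring distance on each hop: 8 -> 42 -> 51.
ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
node8 = ring[1]
print(node8.find_successor(54).id)   # -> 56
```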
Chord Extensions
o Locality awareness in lookup
  o Reduce lookup latency by preferring to contact nodes that are close by in the underlying network
  o Chooses the preceding node in the algorithm based on measured average RPC latency and an estimate of the remaining number of routing hops
o Node ID authentication
  o What if an attacker claims a node ID just after an inserted block's ID?
  o A node ID is the SHA-1 hash of the node's IP address concatenated with a virtual-node index, so it is hard to choose one's own ID
  o Joining nodes are checked: when a new node is added to a finger table, its ID is checked against the claimed IP address (by sending a random "nonce" that must be included in the reply from that IP)

DHash
o Stores and retrieves uniquely identified blocks
o Handles distribution, replication, and caching of blocks
o Uses Chord to locate blocks

Replication in DHash
o Blocks are replicated on the k CFS servers immediately following the block's successor on the ID ring (k <= r)
o These k CFS servers are likely to be diverse in location
o The DHash "get" operation uses the replicas to choose the server with the lowest reported latency to download from

Caching in DHash
o Each node's DHash layer sets storage aside for caching blocks, to help avoid overloading CFS servers that hold popular blocks
o Blocks are cached along the lookup route
  o Lookups take shorter and shorter hops as they approach the target, so lookups from different clients are likely to visit the same nodes late in the lookup
o Least-Recently-Used (LRU) replacement of cached blocks
o Cached root blocks may become inconsistent, since their IDs are based on a public key rather than on a content hash

Load Balancing in DHash
o Blocks are spread evenly in the ID space
o Multiple virtual servers may be created on one physical node
o Virtual servers on the same physical node may consult each other's location information

DHash Quotas
o How to prevent denial-of-service attacks that inject large amounts of data?
o Reliable identification of publishers would require a certificate authority (as with smartcards in PAST)
o CFS instead assigns a fixed storage quota per IP address
  o IP addresses are checked upon insert using nonces

CFS Evaluation
o Experiments on CFS performance on real hosts on the Internet
  o Real-world, client-perceived performance
  o 12 machines scattered over the Internet
o Simulated servers on a single machine
  o Robustness, scalability

Real-Life: CFS vs TCP (figure)
o Download speeds are comparable, with less variation in CFS download speed

Simulation: Load Balancing (figure)
o Per-server storage fraction is close to 0.016 (i.e., roughly 1/64) for 64 servers (each running 6 virtual servers) and 10,000 blocks

Simulation: Caching (figure)

CFS: Summary
o Based on Chord location in P2P overlay networks
o Implements a decentralized, distributed, read-only file system
o Scalable and fault-tolerant

Summary
o File storage (or file sharing) was one of the original drivers for P2P systems
o P2P routing and location substrates may be used to implement (file) storage
o Additional requirements arise regarding, e.g., load balancing and caching
o PAST: archival storage based on Pastry
o CFS: read-only file system based on Chord