P2P Storage
1.0
25/November/2003
Klaus Marius Hansen
University of Aarhus
Overview
Storage (and retrieval) was an original motivation for P2P systems (viz., file sharing)
  o Load balancing (share resources)
  o Fault tolerance
  o Resource utilization
How can P2P techniques be used to provide decentralized, self-organizing, scalable storage?
  o Two proposals built on top of Pastry and Chord
Material
(Rowstron & Druschel 2001)
  o Rowstron, A. & Druschel, P. (2001), Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, in Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), pp. 188-201.
  o PAST: P2P global, archival storage based on Pastry
(Dabek, Kaashoek, Karger, Morris & Stoica 2001)
  o Dabek, F., Kaashoek, M.F., Karger, D., Morris, R. & Stoica, I. (2001), Wide-area cooperative storage with CFS, in Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), pp. 202-215.
  o CFS: P2P block file storage based on Chord
P2P Storage
Looking Back...
PAST
The Cooperative File System (CFS)
Summary
Looking Back...
Napster and Gnutella
Use local file system of nodes
Napster
  o Index files centrally
  o Allow nodes to search for files through index server
  o Download in a peer-to-peer fashion
  o Single point of failure...
Gnutella
  o No central index - only local knowledge of storage
  o Search through flooding/walking
  o Download in a peer-to-peer fashion
  o Bad scalability
    (Yes, this can be repaired in a number of ways :-)
Basically just transient file sharing supported
Freenet
Completely decentralized
Anonymous storage, clients, and publishers
Replication through routing
Probabilistic storage
PAST
Wants to
  o Exploit multitude and diversity of Internet nodes to achieve strong persistence and high availability
  o Create global storage utility for backup, mirroring, ...
  o Share storage and bandwidth of a group of nodes - larger than capacity of any individual node
The PAST System
Large-scale P2P persistent storage utility
  o Strong persistence (resilient to failure)
  o High availability
  o Scalability
  o Security
Self-organizing, Internet-based structured overlay of nodes that cooperate to
  o Route file queries
  o Store replicas of files
  o Cache popular files
Based on Pastry
Pastry Review
Effective, distributed object location and routing substrate for P2P overlay networks
Each node has a unique identifier (nodeId)
Given a key and a message, Pastry routes the message to the node with the nodeId numerically closest to the key ID
Takes network locality into account, based on an application-defined scalar proximity metric
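
To make the routing rule above concrete, here is a minimal Java sketch (illustrative only, not the Pastry implementation or its API) of picking the nodeId numerically closest to a key; wrap-around of the circular ID space is ignored for brevity.

    import java.math.BigInteger;
    import java.util.List;

    // Minimal sketch: picks, from a set of known nodes, the nodeId that is
    // numerically closest to a key - the node to which Pastry ultimately
    // delivers a message. Wrap-around of the ID ring is ignored here.
    public class ClosestNodeExample {

        // Returns the nodeId from 'knownNodes' numerically closest to 'key'.
        static BigInteger numericallyClosest(BigInteger key, List<BigInteger> knownNodes) {
            BigInteger best = null;
            BigInteger bestDistance = null;
            for (BigInteger nodeId : knownNodes) {
                BigInteger distance = nodeId.subtract(key).abs();
                if (best == null || distance.compareTo(bestDistance) < 0) {
                    best = nodeId;
                    bestDistance = distance;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            List<BigInteger> nodes = List.of(
                    new BigInteger("10"), new BigInteger("42"), new BigInteger("97"));
            BigInteger key = new BigInteger("40");
            System.out.println("Deliver to node " + numericallyClosest(key, nodes)); // 42
        }
    }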
Pastry Routing Table
Pastry Routing Example
PAST Design
Any node running the PAST system may participate in the PAST network
  o Nodes minimally act as access points for users, but may also contribute storage and routing capabilities to the network
  o Nodes have 128-bit quasi-random IDs (e.g., the lower 128 bits of SHA-1 on the node's IP address) -> nodes with adjacent IDs are diverse
  o File publishers have public/private cryptographic keys
Operations
  o fileId = Insert(name, owner-credentials, k, file)
    Inserts replicas of the file on the k nodes whose IDs are numerically closest to fileId (k <= |L|)
  o file = Lookup(fileId)
    Retrieves the file designated by fileId if it exists and one of the k replica hosts is reachable
    The file is usually retrieved from the "closest" (in terms of the proximity metric) of the k nodes
  o Reclaim(fileId, owner-credentials)
    Weak delete: a lookup of fileId is no longer guaranteed to return a result
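
As a sketch of how the three operations above might look as an API, the following hypothetical Java interface mirrors their signatures; the names and types (PastStore, OwnerCredentials) are illustrative, not taken from the PAST code.

    // Hypothetical interface mirroring the three PAST operations above;
    // names and types are illustrative, not the actual PAST code.
    public interface PastStore {

        // Stores k replicas of 'file' on the k nodes whose IDs are numerically
        // closest to the returned fileId (k must not exceed the leaf-set size |L|).
        byte[] insert(String name, OwnerCredentials credentials, int k, byte[] file);

        // Retrieves the file if it exists and at least one of the k replica
        // hosts is reachable; usually served by the node closest in the
        // proximity metric.
        byte[] lookup(byte[] fileId);

        // Weak delete: after this call a lookup of fileId is no longer
        // guaranteed to return a result; storage may be reclaimed lazily.
        void reclaim(byte[] fileId, OwnerCredentials credentials);

        // Placeholder for the publisher's public/private key material.
        interface OwnerCredentials {}
    }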
PAST: Insert Implementation
fileId = Insert(name, owner-credentials, k, file)
fileId is calculated (SHA-1 of the file name + public key + a random number ("salt"))
Required storage is deducted against a client quota
File certificate created and signed with the private key
  o Contains fileId, SHA-1 of the file content, replication factor k, the random salt, various metadata
File certificate + file is then routed to the fileId destination
  o Destination verifies the certificate, forwards to the k-1 closest nodes
  o Destination returns a store receipt if all accept
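
A minimal Java sketch of the fileId derivation described above (SHA-1 over file name, owner public key, and random salt); the exact byte layout and salt length are assumptions.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.security.SecureRandom;

    // Sketch of the fileId derivation: SHA-1 over the file name, the owner's
    // public key, and a random salt. The byte layout is an assumption; the
    // slide only names the three inputs.
    public class FileIdExample {

        static byte[] fileId(String name, byte[] ownerPublicKey, byte[] salt)
                throws NoSuchAlgorithmException {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(name.getBytes(java.nio.charset.StandardCharsets.UTF_8));
            sha1.update(ownerPublicKey);
            sha1.update(salt);
            return sha1.digest(); // 160-bit fileId
        }

        public static void main(String[] args) throws NoSuchAlgorithmException {
            byte[] salt = new byte[8];
            new SecureRandom().nextBytes(salt);
            byte[] id = fileId("backup.tar", new byte[]{1, 2, 3}, salt);
            System.out.println("fileId has " + id.length * 8 + " bits");
        }
    }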
PAST: Lookup Implementation
file = Lookup(fileId)
Given a requested fileId, a lookup request is routed towards the node with the ID closest to fileId
Any node storing a replica may respond with the file and file certificate
Since k numerically adjacent nodes store replicas and Pastry routes towards nearby nodes, a node close in the proximity metric is likely to reply
PAST: Reclaim Implementation
Reclaim(fileId, owner-credentials)
Analogous to insert, but with a "reclaim certificate" verifying that the original publisher reclaims the file
A reclaim receipt is received and used to reclaim storage quota
Storage Management
We want the aggregate size of stored files to be close to the aggregate capacity of a PAST network before insert requests are rejected
  o Should be done in a decentralized way...
Two ways of ensuring this
  o Replica diversion
  o File diversion
Replica Diversion
Balances free storage space among nodes in a leaf set
  o If a node cannot store a replica locally, it asks a node in its leaf set if it can
    (The protocol must then handle failure of leaf nodes)
Acceptance of a replica at a node for storage is subject to policies
  o File size divided by available size should be lower than a certain threshold (leave room for small files)
  o The threshold is lower for nodes containing diverted replicas (leave most space for primary replicas)
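
An illustrative version of the acceptance policy above, assuming concrete threshold values that are not taken from the paper:

    // Illustrative acceptance policy: a node accepts a replica only if
    // fileSize / freeSpace stays below a threshold, and the threshold is
    // lower for diverted replicas. The threshold values are assumptions.
    public class AcceptancePolicyExample {

        static final double PRIMARY_THRESHOLD = 0.1;   // assumed value
        static final double DIVERTED_THRESHOLD = 0.05; // assumed, lower than primary

        static boolean accept(long fileSize, long freeSpace, boolean diverted) {
            if (freeSpace <= 0) {
                return false;
            }
            double ratio = (double) fileSize / (double) freeSpace;
            double threshold = diverted ? DIVERTED_THRESHOLD : PRIMARY_THRESHOLD;
            return ratio < threshold;
        }

        public static void main(String[] args) {
            System.out.println(accept(4_000_000, 50_000_000, false)); // true  (8% of free space)
            System.out.println(accept(4_000_000, 50_000_000, true));  // false (over diverted threshold)
        }
    }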
File Diversion
If one of the k nodes with nodeIds closest to fileId cannot store a replica either locally (primary) or diverted to its leaf set, the file needs to be diverted
The inserting node generates a new fileId (using a new random salt) and tries again...
Try three times; if all fail, reject the insertion
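
A small sketch of the retry behaviour described above, with tryInsert standing in for the real insert path:

    import java.security.SecureRandom;

    // Sketch of the file-diversion retry loop: if an insert is declined,
    // pick a fresh salt (giving a new fileId) and retry, giving up after
    // three attempts. 'tryInsert' stands in for the real insert path.
    public class FileDiversionExample {

        interface InsertAttempt {
            boolean tryInsert(byte[] salt); // true if all k replicas were stored
        }

        static boolean insertWithDiversion(InsertAttempt attempt) {
            SecureRandom random = new SecureRandom();
            for (int i = 0; i < 3; i++) {          // at most three fileIds are tried
                byte[] salt = new byte[8];
                random.nextBytes(salt);            // new salt -> new fileId
                if (attempt.tryInsert(salt)) {
                    return true;
                }
            }
            return false;                          // insertion rejected
        }

        public static void main(String[] args) {
            // Toy attempt that always declines, so insertion is rejected.
            System.out.println(insertWithDiversion(salt -> false)); // false
        }
    }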
Caching
Goals of cache management
  o Minimize access latency (here, routing distance)
  o Maximize throughput
  o Balance query load in the system
The k replicas ensure availability, but also give some load balancing and latency reduction because of the locality properties of Pastry
A file is cached in PAST at a node traversed during lookup or insert operations if the file size is less than a fraction of the node's remaining cache size
Cached files are evicted as needed - expiry is not needed since files are immutable
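
A minimal sketch of the cache-admission rule above; the fraction is an assumed value:

    // Cache-admission sketch: a node caches a file seen during lookup/insert
    // only if the file is smaller than some fraction of its remaining cache
    // capacity. The fraction value is an assumption.
    public class CacheAdmissionExample {

        static final double FRACTION = 0.125; // assumed fraction of remaining cache

        static boolean shouldCache(long fileSize, long remainingCacheBytes) {
            return fileSize < FRACTION * remainingCacheBytes;
        }

        public static void main(String[] args) {
            System.out.println(shouldCache(1L << 20, 100L << 20));  // true: 1 MB vs 100 MB free
            System.out.println(shouldCache(50L << 20, 100L << 20)); // false: too large to cache
        }
    }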
PAST Evaluation - Experimental Setup
Prototype implementation of PAST in Java
  o Network emulation environment
  o All nodes run in the same Java VM
Workload data from traces of file usage
  o Eight Web proxy logs (1,863,055 entries, 18.7 GBytes)
  o Workstation file system (2,027,908 files, 166.6 GBytes)
  o "Problematic to get data of real P2P usage"
2250 PAST nodes, k = 5, b = 4
Different normal distributions of node storage capacities used
PAST: Storage Management is Needed
Experiment without replica diversion and file diversion (using d1 and the Web trace)
  o Primary replica threshold = 1, diverted-replica threshold = 0
  o Insertion rejection on first file insertion failure
51.1% insertion rejection...
60.8% ultimate storage utilization...
PAST: Storage Management is Effective
PAST: Caching is Good
File request characteristics based on web logs
  o 775 unique clients mapped to PAST nodes
PAST: Summary
Based on Pastry P2P routing and location
Insertion and replication of files
High storage utilization and load balancing through storage management and caching
The Cooperative File System (CFS)
CFS Goals
Distributed, cooperative, read-only storage and serving of files based on blocks
  o Fault tolerance
  o Load balance
  o Tap into unused resources
Challenges for a P2P architecture for this
  o Decentralization
  o Unmanaged participants
  o Frequent joins and leaves
Support multiple file systems in a single CFS system with millions of servers...
CFS Design (1)
Two types of CFS nodes and three layers
  o CFS Client
    FS: Uses DHash layer to retrieve blocks, interprets blocks as files
    DHash: Uses Chord layer to locate CFS servers holding desired blocks
    Has public/private keys for signing published files
  o CFS Server
    DHash: Storing blocks, replicating blocks, caching
    Chord: Looking up blocks, checking for cached copies
CFS Design (2)
Blocks (of size ~10 KB) are interpreted similarly to blocks in a Unix file system (with block IDs instead of disk addresses)
Publishers insert file systems
  o Each block is inserted into CFS using a hash of the block's contents as its ID
  o The root block is signed using the private key, with the public key as its ID
File systems may be updated by the publisher (by signing a new root block with the same private key)
Data is stored for a finite period of time
  o Extensions may be requested
  o No explicit delete operation - assumes plenty of storage
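
A sketch of the two block-ID rules above; hashing the public key for the root block (rather than using the raw key bytes) is an assumption made here to keep all IDs 160 bits wide:

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Sketch of the block-ID scheme: data blocks are keyed by a hash of their
    // content, while the root block is keyed by (a hash of) the publisher's
    // public key so it can be re-signed and updated in place. SHA-1 matches
    // Chord's 160-bit ID space.
    public class BlockIdExample {

        static byte[] sha1(byte[] data) throws NoSuchAlgorithmException {
            return MessageDigest.getInstance("SHA-1").digest(data);
        }

        // ID of an ordinary data block: hash of its contents.
        static byte[] dataBlockId(byte[] blockContents) throws NoSuchAlgorithmException {
            return sha1(blockContents);
        }

        // ID of the root block: derived from the publisher's public key, so a
        // new root block signed with the same private key keeps the same ID.
        static byte[] rootBlockId(byte[] publisherPublicKey) throws NoSuchAlgorithmException {
            return sha1(publisherPublicKey);
        }

        public static void main(String[] args) throws NoSuchAlgorithmException {
            byte[] block = "some 10 KB of file data...".getBytes();
            System.out.println("data block ID bits: " + dataBlockId(block).length * 8);
            System.out.println("root block ID bits: " + rootBlockId(new byte[]{42}).length * 8);
        }
    }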
Chord Review
Nodes have m-bit IDs on an "identifier ring"
One operation
  o IP address = lookup(key)
  o Given a key ID, find the successor node, i.e., the node whose ID most closely follows the key ID on the identifier ring
Simple Key Location
Simple key location (following successor pointers) can be implemented in O(N) lookup time and O(1) routing state per node
Example: Node 8 performs a lookup for key 54
Each node maintains r successors in a successor list for fault-tolerance purposes
Scalable Key Location (1)
Uses finger tables
  o n.finger[i] = successor((n + 2^(i-1)) mod 2^m), 1 <= i <= m
Scalable Key Location (2)
If the successor is not immediately known, search the finger table for the node n' whose ID most immediately precedes id
  o Rationale: of all nodes in the finger table, n' will know the most about the part of the ring near id
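
A toy Java sketch of the two rules on these slides (the finger definition and the closest-preceding-finger search), using a small m = 6 bit ring with the node set from the Chord paper's example:

    import java.util.List;

    // Toy m = 6 bit Chord ring. finger[i] points at
    // successor((n + 2^(i-1)) mod 2^m); during lookup a node scans its
    // fingers for the entry that most immediately precedes the target id.
    // Not the full Chord protocol, just the two rules on these slides.
    public class FingerTableExample {

        static final int M = 6;                 // bits in an identifier
        static final int RING = 1 << M;         // size of the identifier ring

        // successor on a toy ring given a sorted (ascending) list of node IDs
        static int successor(int id, List<Integer> nodes) {
            for (int n : nodes) {
                if (n >= id) return n;
            }
            return nodes.get(0);                // wrap around the ring
        }

        // finger[i] for node n, 1 <= i <= M
        static int finger(int n, int i, List<Integer> nodes) {
            return successor((n + (1 << (i - 1))) % RING, nodes);
        }

        // true if x lies on the ring strictly between a and b (exclusive)
        static boolean between(int x, int a, int b) {
            return a < b ? (x > a && x < b) : (x > a || x < b);
        }

        // finger of n that most immediately precedes id
        static int closestPrecedingFinger(int n, int id, List<Integer> nodes) {
            for (int i = M; i >= 1; i--) {
                int f = finger(n, i, nodes);
                if (between(f, n, id)) return f;
            }
            return n;
        }

        public static void main(String[] args) {
            List<Integer> nodes = List.of(1, 8, 14, 21, 32, 38, 42, 48, 51, 56);
            // Node 8 looking up key 54 first forwards to its finger 42.
            System.out.println(closestPrecedingFinger(8, 54, nodes));
        }
    }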
Chord Extensions
Locality awareness in lookup
  o Reduce lookup latency by preferably contacting nodes that are close by in the underlying network
  o Chooses the preceding node in the algorithm based on measured average RPC latency and a guess at the remaining number of routing hops
Node ID authentication
  o What if an attacker claims a node ID just after an inserted block ID?
  o A node ID is a SHA-1 hash of the node's IP address concatenated with a virtual node index
    Hard to choose your own ID
  o Joining nodes are checked
    When a new node is added to a finger table, its ID is checked against the claimed IP address (by sending a random "nonce" that should be included in the reply from that IP)
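
A sketch of the node-ID derivation above; the exact encoding of the virtual-node index is an assumption:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Node-ID sketch: a CFS node's Chord ID is SHA-1 over its IP address
    // concatenated with a virtual-node index, which makes it hard for a node
    // to choose an ID that sits just after a victim block's ID.
    public class NodeIdExample {

        static byte[] nodeId(String ipAddress, int virtualIndex) throws NoSuchAlgorithmException {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(ipAddress.getBytes(StandardCharsets.UTF_8));
            sha1.update((byte) virtualIndex); // exact encoding of the index is an assumption
            return sha1.digest();             // 160-bit Chord identifier
        }

        public static void main(String[] args) throws NoSuchAlgorithmException {
            // Each virtual server on the same physical host gets its own ID.
            for (int v = 0; v < 3; v++) {
                System.out.println("virtual server " + v + " -> "
                        + new java.math.BigInteger(1, nodeId("192.0.2.7", v)).toString(16));
            }
        }
    }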
DHash
Stores and retrieves uniquely identified blocks
Handles distribution, replication, and caching of blocks
Uses Chord to locate blocks
Replication in DHash
Blocks are replicated to the k CFS servers immediately following the block's successor on the ID ring (k <= r)
The k CFS servers are likely to be diverse in location
The DHash "get" operation uses the replicas to choose the server with the lowest reported latency to download from
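
A minimal sketch of the replica selection rule above, with illustrative latency measurements:

    import java.util.Map;

    // Replica-selection sketch: among the servers holding a replica of a
    // block, fetch from the one with the lowest reported latency. The
    // latencies here are illustrative measurements in milliseconds.
    public class ReplicaChoiceExample {

        static String lowestLatencyServer(Map<String, Double> latencyMsByServer) {
            String best = null;
            double bestLatency = Double.MAX_VALUE;
            for (Map.Entry<String, Double> e : latencyMsByServer.entrySet()) {
                if (e.getValue() < bestLatency) {
                    bestLatency = e.getValue();
                    best = e.getKey();
                }
            }
            return best;
        }

        public static void main(String[] args) {
            Map<String, Double> latencies = Map.of(
                    "replica-a", 85.0, "replica-b", 12.5, "replica-c", 40.0);
            System.out.println("fetch block from " + lowestLatencyServer(latencies)); // replica-b
        }
    }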
Caching in DHash
Each node's DHash layer sets storage aside for caching blocks, to help avoid overloading CFS servers holding popular blocks
Blocks are cached along the lookup route
  o Lookups take shorter and shorter hops as they approach the target -> lookups from different clients are likely to visit the same nodes late in the lookup
  o Least-Recently-Used (LRU) replacement of cached blocks is used
Cached root blocks may become inconsistent, since their ID is based on a public key rather than on a content hash
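
A small sketch of an LRU block cache as described above, built on Java's LinkedHashMap in access-order mode; the capacity is an illustrative choice:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // LRU block cache sketch: a fixed number of cached blocks with
    // least-recently-used replacement, using LinkedHashMap's access-order
    // mode. The capacity (in blocks) is an illustrative choice.
    public class LruBlockCache extends LinkedHashMap<String, byte[]> {

        private final int maxBlocks;

        LruBlockCache(int maxBlocks) {
            super(16, 0.75f, true);      // access-order so gets refresh recency
            this.maxBlocks = maxBlocks;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            return size() > maxBlocks;   // evict the least recently used block
        }

        public static void main(String[] args) {
            LruBlockCache cache = new LruBlockCache(2);
            cache.put("block-1", new byte[]{1});
            cache.put("block-2", new byte[]{2});
            cache.get("block-1");                // touch block-1 so it stays
            cache.put("block-3", new byte[]{3}); // evicts block-2
            System.out.println(cache.keySet()); // [block-1, block-3]
        }
    }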
Load Balancing in DHash
Blocks are spread evenly in the ID space
Multiple virtual servers may be created on one physical node
  o Virtual servers on the same physical node may look at each other's location information
DHash Quotas
How to prevent a denial-of-service attack that injects large amounts of data?
Reliable identification of publishers would require a certificate authority (as with smartcards in PAST)
CFS assigns a fixed storage quota per IP address
  o IP addresses are checked upon insert using nonces
CFS Evaluation
Experiment on CFS performance on real hosts on the Internet
  o Real-world, client-perceived performance
  o 12 machines scattered over the Internet
Simulated servers on a single machine
  o Robustness, scalability
Real-Life: CFS vs TCP
Download speeds comparable to TCP, with less variation in download speed
Simulation: Load Balancing
Per-server share of storage close to the ideal 1/64 ≈ 0.016 for 64 servers (x 6 virtual servers each) and 10,000 blocks
Simulation: Caching
CFS: Summary
Based on Chord location in P2P overlay networks
Implements a decentralized, distributed, read-only file system
Scalable and fault-tolerant
Summary
Summary
File storage (or file sharing) was one of the original drivers for P2P systems
P2P routing and location substrates may be used to implement (file) storage
  o Additional requirements arise around, e.g., load balancing and caching
PAST: Archival storage based on Pastry
CFS: Read-only file system based on Chord