Peer-to-Peer File Sharing
Huseyin Ozgur TAN

What is Peer-to-Peer?
- Every node is designed to provide (though it may not, by user choice) some service that helps other nodes in the network obtain service
- Each node potentially has the same responsibility
- Sharing can take different forms:
  - CPU cycles: SETI@Home
  - Storage space: Napster, Gnutella, Freenet…
P2P: Why so attractive?
- Peer-to-peer applications have seen explosive growth in recent years, driven by:
  - Low cost and high availability of large numbers of computing and storage resources
  - Increased network connectivity
- As long as these factors keep their importance, peer-to-peer applications will continue to gain importance
Main Design Goals of P2P
- Ability to operate in a dynamic environment
- Performance and scalability
- Reliability
- Anonymity: Freenet, Freehaven, Publius
- Accountability: Freehaven, Farsite

First generation P2P routing and location schemes
- Napster, Gnutella, Freenet…
- Intended for large-scale sharing of data files
- Reliable content location was not guaranteed
- Self-organization and scalability: two issues to be addressed
Second generation P2P systems
- Pastry, Tapestry, Chord, CAN…
- They guarantee a definite answer to a query in a bounded number of network hops.
- They form a self-organizing overlay network.
- They provide a load-balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops.

CAN
- Content Addressable Network
  - A distributed infrastructure based on hash tables
  - Serves Internet-scale networks
- Advantages
  - Scalable
  - Fault-tolerant
  - Self-organizing
CAN
- The main P2P systems at that time
  - Napster
  - Gnutella
  - Both not scalable
- Napster
  - Central server
  - Expensive (for the server)
  - Vulnerable (single point of failure)
CAN
- Gnutella
  - De-centralized file location as well
  - File location is based on flooding
  - Not scalable
- Aim
  - Building a scalable distributed infrastructure for p2p networks
CAN
- Basic operations
  - Insertion
  - Lookup
  - Deletion
- Each node
  - stores a zone of the entire hash table
  - stores information about its neighbors
CAN - Design
- Virtual d-dimensional Cartesian coordinate space on a d-torus
- The entire space is partitioned among all nodes
- The space is used to store (key, value) pairs
  - A key K is hashed to a point P, and the pair is stored by the node whose zone contains P
- Each node holds information about each of its neighbors for efficient routing
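The K -> P mapping can be sketched in Python. The specific hash function and the per-dimension derivation are assumptions for illustration; CAN only requires a uniform, deterministic hash:

```python
import hashlib

def key_to_point(key: str, d: int = 2, side: float = 1.0) -> tuple:
    """Hash a key to a point in the d-dimensional torus [0, side)^d.

    Illustrative sketch: SHA-1 with a per-dimension suffix is an
    assumption, not something the CAN design mandates.
    """
    coords = []
    for i in range(d):
        # Derive one independent coordinate per dimension.
        digest = hashlib.sha1(f"{key}:{i}".encode()).digest()
        # Interpret the first 8 bytes as an integer, scale into [0, side).
        n = int.from_bytes(digest[:8], "big")
        coords.append(side * n / 2**64)
    return tuple(coords)
```

Whichever node's zone contains the resulting point stores the (key, value) pair.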
CAN - Routing
- Works by following the straight-line path between source and destination
- Local neighbor state is sufficient for routing
  - A CAN message includes the destination coordinates
  - A node routes it to the neighbor closest to the destination coordinates
- Complexity
  - (d/4)(n^(1/d)) average path length
  - 2d neighbors per node
  - Good for scalability
- Many paths exist between source and destination
  - In case of crashes, a node routes the message along the next best available path
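The greedy forwarding rule can be sketched as below, assuming each node knows a representative point for each neighbor's zone (a simplification; real CAN compares against zone boundaries):

```python
def torus_distance(a, b, side=1.0):
    """Euclidean distance on a d-torus (wrap-around in each dimension)."""
    total = 0.0
    for x, y in zip(a, b):
        diff = abs(x - y)
        total += min(diff, side - diff) ** 2
    return total ** 0.5

def next_hop(my_point, neighbors, dest):
    """Greedy CAN forwarding: pick the neighbor closest to the destination.

    `neighbors` maps a neighbor id to a representative point of its zone.
    Returns None when no neighbor is closer than we are, i.e. the
    destination falls in our own zone.
    """
    best_id, best_dist = None, torus_distance(my_point, dest)
    for nid, point in neighbors.items():
        d = torus_distance(point, dest)
        if d < best_dist:
            best_id, best_dist = nid, d
    return best_id
```

Fault tolerance falls out naturally: if the best neighbor is unreachable, the same loop can simply skip it and return the next best candidate.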
CAN - Construction
- A new node that joins the system must be allocated its own zone
  - Achieved by an existing node splitting its zone in half, retaining one half and handing the other half to the new node
- 3 steps
  - The new node must find a node already in CAN
  - It must find a node whose zone will be split
  - The neighbors of the split zone must be notified
CAN - Construction
- Bootstrap
  - The new node must discover a node already in the system
  - A CAN DNS name resolves to the IP addresses of one or more nodes believed to be in the system
  - By querying the DNS, the new node obtains a bootstrap node, and from this node the IP addresses of other nodes
- Finding a zone
  - The new node chooses a random point P in the space and sends a JOIN request destined for P
  - The destination node splits its zone, and the (key, value) pairs of the handed-over half are transferred
- Joining the routing
  - The new node learns the IP addresses of its neighbors
  - Both the new and the existing node update their neighbor sets
  - To inform the others, the two nodes send an immediate update, followed by periodic refreshes
- Complexity: O(d)
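The zone split performed during a join can be sketched as follows. Splitting along the longest dimension is a simplification used here for illustration (the CAN design cycles through dimensions in a fixed order):

```python
class Zone:
    """A rectangular CAN zone: lo/hi corner per dimension (illustrative sketch)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = list(lo), list(hi)

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def split(self):
        """Halve this zone and return the new half for the joining node."""
        # Pick the longest dimension (simplification, see lead-in).
        dim = max(range(len(self.lo)), key=lambda i: self.hi[i] - self.lo[i])
        mid = (self.lo[dim] + self.hi[dim]) / 2
        # Upper half becomes the new node's zone; we keep the lower half.
        new_lo = list(self.lo)
        new_lo[dim] = mid
        other = Zone(new_lo, self.hi)
        self.hi = list(self.hi)
        self.hi[dim] = mid
        return other
```

After the split, the (key, value) pairs whose points fall in the new half would be handed over along with it.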
CAN – Maintenance
- Node departure
  - A departing node explicitly hands over its zone to one of its neighbors
  - If the two zones can be merged into a valid zone, this is done
  - If not, the responsibility is handed to the neighbor whose current zone is smallest
- Takeover (a node becomes unreachable)
  - One of its neighbors takes over the zone
  - But the (key, value) pairs stored there are lost
  - Nodes send periodic advertisements; the absence of such advertisements signals failure
  - Each neighbor starts a timer; when the timer expires, that node sends a TAKEOVER message to all neighbors of the failed node
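The takeover race can be sketched as below. The detail that each neighbor's timer is initialized in proportion to its own zone volume, so that the smallest zone fires first, comes from the CAN paper rather than from the slides:

```python
def takeover_winner(zone_volumes):
    """Pick the neighbor whose takeover timer fires first.

    Each neighbor of the failed node starts a timer proportional to its
    own zone volume; the smallest zone expires first and sends TAKEOVER,
    and the others cancel on receiving it. The message exchange itself
    is omitted, so this reduces to a min over volumes.
    """
    return min(zone_volumes, key=zone_volumes.get)
```

This biases takeovers toward small zones, which keeps the partitioning roughly balanced over time.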
Chord
- Distributed lookup protocol
  - One operation: lookup(key) -> node
- Advantages
  - Load balance
  - Decentralization
  - Scalability
  - Availability
  - Flexible naming
- An infrastructure for p2p systems
Chord - Overview
- Specifies
  - How to find the locations of keys
  - How new nodes join the system
  - How to recover from failures of existing nodes
- It uses consistent hashing
  - Improves its scalability by avoiding the requirement that every node know about every other node
- In an N-node network
  - each node maintains information about only O(log N) other nodes
  - a lookup requires O(log N) messages
  - updating routing information on a join or leave requires O(log² N) messages
Chord – Consistent Hashing
- The consistent hash function assigns each node and each key an m-bit identifier, using a base hash function
  - A node's identifier = hash of its IP address
  - A key's identifier = hash of the key
  - m must be large enough to make the probability of collisions negligible
- Assignment of keys to nodes
  - Identifiers are ordered on an identifier circle
  - Key k is assigned to the first node whose identifier is equal to or follows k in the identifier space
  - This node is called successor(k)
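A toy sketch of successor(k), assuming a global view of the ring purely for illustration (a real Chord node resolves this by routing, not from a sorted membership list):

```python
import hashlib

M = 16  # identifier bits; kept small for illustration (real deployments use e.g. 160)

def chord_id(name: str) -> int:
    """m-bit identifier from a base hash (SHA-1 here is an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % 2**M

def successor(key_id: int, node_ids: list) -> int:
    """First node id equal to or following key_id on the identifier circle."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= key_id:
            return nid
    return ring[0]  # wrap around the circle
```

For example, on a ring of nodes {100, 2000, 40000}, key 150 is stored at node 2000, while key 50000 wraps around to node 100.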
Chord – Consistent Hashing
- Scalable key location
  - A very small amount of routing information suffices
    - Each node need only be aware of its successor
    - Routing can be done by following successors until the responsible node is found
    - Not scalable: O(N)-time lookups
  - To accelerate lookups, Chord maintains additional routing information
    - Each node n maintains a routing table with at most m entries (the finger table)
    - The ith entry contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle
    - i.e. s = successor(n + 2^(i-1))
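The finger-table definition s = successor(n + 2^(i-1)) can be sketched as below, again assuming a global membership list purely for illustration (a real node fills its fingers via lookups):

```python
def finger_table(n: int, node_ids: list, m: int = 16) -> list:
    """Build node n's finger table: entry i is successor(n + 2^(i-1)).

    Illustrative sketch with m-bit identifiers; arithmetic is modulo 2^m.
    """
    ring = sorted(node_ids)

    def succ(k):
        k %= 2**m
        for nid in ring:
            if nid >= k:
                return nid
        return ring[0]  # wrap around the circle

    return [succ(n + 2**(i - 1)) for i in range(1, m + 1)]
```

Because successive fingers double the distance covered, following the best finger at each step halves the remaining identifier distance, which is where the O(log N) lookup bound comes from.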
Chord – Node Joins
- The main challenge is preserving the ability to locate every key in the network
- 2 invariants
  - Each node's successor is correctly maintained
  - For every key k, node successor(k) is responsible for k
- Also, for lookups to be fast, the finger tables must be correct
Chord – Node Joins
- To simplify the join and leave mechanisms, each node in Chord maintains a predecessor pointer
- 3 tasks when node n joins
  - Initialize the predecessor and fingers of node n
  - Update the fingers and predecessors of existing nodes
  - Notify the higher layer so that it can transfer the state associated with the keys that node n is now responsible for
Chord – Node Joins
- Initializing fingers and predecessor
  - Node n learns its predecessor and fingers by asking a known node n' to look them up
- Updating fingers of existing nodes
  - Node n will become the ith finger of node p iff
    - p precedes n by at least 2^(i-1)
    - the ith finger of p succeeds n
  - The first node p that can meet these two conditions is the immediate predecessor of n - 2^(i-1)
  - The algorithm starts with the ith finger of node n and then walks counter-clockwise on the identifier circle until it encounters a node whose ith finger precedes n
- Transferring keys
  - The new node must contact its successor and take over responsibility for the appropriate keys
Chord – Concurrent Operations
- Stabilization
  - The invariants are difficult to maintain in the face of concurrent joins in large networks
  - Chord therefore separates correctness goals from performance goals
- Stabilization protocol
  - Keeps nodes' successor pointers up to date, which is sufficient for correctness of lookups
  - Those successor pointers are then used to verify and correct finger table entries
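A minimal, single-process sketch of the stabilize/notify round described above. The pointer logic follows the protocol as described; everything else (plain object references instead of RPCs, no failure handling, no finger repair) is a simplification:

```python
class Node:
    """Minimal Chord-style node for illustrating stabilization."""
    def __init__(self, nid):
        self.id = nid
        self.successor = self
        self.predecessor = None

    def between(self, x, a, b):
        """True if x lies on the identifier circle strictly between a and b."""
        if a < b:
            return a < x < b
        return x > a or x < b  # interval wraps around 0

    def stabilize(self):
        # Ask our successor for its predecessor; if a node has slipped
        # in between, adopt it as our new successor.
        x = self.successor.predecessor
        if x is not None and self.between(x.id, self.id, self.successor.id):
            self.successor = x
        # Tell our (possibly new) successor about us.
        self.successor.notify(self)

    def notify(self, n):
        # Accept n as predecessor if we have none, or n is closer.
        if self.predecessor is None or self.between(
                n.id, self.predecessor.id, self.id):
            self.predecessor = n
```

With nodes 10 and 90 in a ring and node 50 joining with its successor set to 90, one stabilize round at 50 and one at 10 repairs the ring to 10 -> 50 -> 90 -> 10, without any global coordination.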