* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download L27 - Cornell University
Survey
Document related concepts
Transcript
Peer to Peer Systems and File Sharing Carl Lagoze – CS431 – Cornell University May 3, 2004 Portions borrowed from sources listed on next slide Sources of this lecture • J. Berkes, Decentralized Peer-to-Peer Network Architecture: Gnutella and Freenet • R. Morris, Chord+DHash+Ivy: Building Principled Peer-to-Peer Systems • S. Kamvar, M. Schlosser, H. Garcia Molina, EigenRep: Reputation Management in P2P Networks • J. Golbeck, B. Paris, J. Hendler, Trust Networks on the Semantic Web Characteristics of P2P Network • Sharing of computing resources by direct exchange • Blur between clients, servers, routers • Nodes are autonomous P2P Advantages • • • • • Efficient use of resources Scalability Reliability Administrative simplicity Democracy P2P’s Political History • Major basis in music sharing context • Overshadows numerous applications • Recent research is investigating generic applicability – DHTs – Reputation and Trust Small-World Phenomenon • Milgram’s “six degrees of separation (1967) • Forwarding of letters from Nebraska to Boston, MA • Average chain 6 of six hops Power Laws and Small Worlds • Out-degree distribution is: – 1/Kα where α > 0 • Characteristics of a variety of phenomenon – – – – – Web Graph IMDb connection (acted in same movie) Social interactions P2P networks (Gnutella) Epidemiology Strength of Weak Ties • Extension of power-law phenomenon • Short-cuts (between cliques) critical to small world phenomenon Napster and P2P • Not really P2P • Central search index • Direct interaction for access (p2p) • Central index was key to litigation Gnutella • Fully P2P • Flooded query – Scalability problems – TTL controls broadcast – “Query Memory” controls circularity • Reliability problems • But “whom to sue?” Kazaa • Hybrid of Napster model and Gnutella model • Notion of a super peer – Like a regionalized napster server – Dynamically chosen by characteristics • P2P relationship among super-peers • Queries directed towards super-peers “Free-riding” • Definition: downloading but not sharing any data • On gnutella networks 15% of users contribute 94% of content Freenet • Goal: create an uncensorable and secure global information store – Anonymity and fault tolerance • http://freenet.sourceforge.net/ • Three types of network messages: – Advertise storage space to store unknown data – Insert a file to the network – Request a file (with a key) from the network • Use of one-way secure hashes to identify files and encryption to store files – Node does not know what it is storing • Non-traceability of messages – A node can not determine where its message is stored Freenet Request/Response Sequence Distributed Hash Tables (DHT) • • • • Overcoming the flooded search problem Operationally like standard hash tables Data is distributed around the network Features – Efficient: • O(log N) messages per lookup • Even distribution of keys among nodes – Adaptable • Network reconfiguration does not cascade to all nodes – Robust: replication of tables provides survival to node failures Chord • One implementation of DHT within a larger P2P project • http://www.pdos.lcs.mit.edu/chord/ • Algorithm properties – Common hash function distributes node ID (IP) and document ID uniformly – Maps a content key to its node successor Chord Key Mapping Key ID Node ID K100 N100 N10 K5, K10 Circular ID Space N32 K11, K30 K65, K70 N80 N60 K33, K40, K52 Robustness via each node remembering N successors and replicating table at successors Use of finger table to avoid linear lookups ith finger table position points to first node that succeeds n by at least 2i+1 Key location with finger table • Use finger table to find furthest node that precedes key • O(logN) hops leads to target From DHTs to P-trees • DHTs only support equality queries – Return the value of resources with ID1 • Need to support range queries – OAI type query, find all nodes resources that were changed between D1 and D2 • P-tree reuses aspects of fault-tolerant ring of Chord with logarithmic search properties for equality and range queries. Pastry Project • Factors in network locality as part of DHT algorithm • http://research.microsoft.com/~antr/Pastry/ Identity, Trust, Reputation • Identity – Who is making a statement – Certificates, PKI • Trust – Can I believe the person who is making the statement – PGP Web of Trust • Reputation – What is the history of trust in the person making the statement – Reputation management Reputation Issues • Small world phenomenon makes web of trust feasible • Reputation is context specific – I can be trusted with questions about OAI-PMH – Can you trust me belaying for you? Simple reputation network A C B •A knows and trusts B •B knows and trusts C •A can infer trust for C Reputation Inference Algorithm • Begin at source (node seeking a reputation) • Poll each of neighbors whose reputation it trusts – Ignore neighbors with bad reputation • Have each neighbor recursively find reputation of sink (node for which reputation is sought) Accuracy of inferences • Incorrect bad rating by a node has minimal effect – Will be dropped from path in reputation seeking – Will be overcome by “correct” good rating by another node. • Incorrect good rating by a node can have cascading effect – Can cause ratings of good nodes to be ignored through lies – Serious threat to network • Good trust algorithm minimizes effect of “bad nodes” From Golbeck and Hendler Trellis • http://trellis.semanticweb.org • Semantic web based system for decision making assessing reliability of information and sources • Decision maker can construct compound statements justifying decision and providing basis for others’ decisions Trellis (cont.) • Components – Statements (Carl Lagoze is a bad teacher) – Basis of statement • http://cornellbigred.collegesports.com/sports/mcrew/mtt/kruse_william00.html – Principal source of basis/statement • William Kruse – Qualifications to state certainty of component Trellis compound statement From Gil and Ratnaker Advogato • Trust metrics for open source software developers • http://www.advogato.org/ • Three levels of trust/certification – Master – Journeyer – Apprentice Advogato (cont) • Graph structure of trust – Domain of master is only master – Domain of journeyer is master and journeyer – Domain of apprentice is all • Computation of trust is via network flow (well known problem with efficient solutions) – Hard-wired set of users from which all trust flows (gods of the system) – people reached by the flow are those accepted by the trust metric – With the three levels, the maxflow is computed three times: • Robust (resistant to attack) and efficient Eigentrust • Algorithm for Reputation Management in P2P Networks • Kamvar, Schlosser, Garcia-Molia (Stanford) • http://www.stanford.edu/~sdkamvar/research.html Eigentrust Approach • Goal: Identify sources of inauthentic files and bias peers agains downloading from them • Method: Give each peer a trust value based on its previous behavior • Trust values – Local: open a peer has on another based on past experience – Global: trust that entire system places in a peer • Want latter computed from aggregate of former • Dual goals – Know all peers – Perform minimal computation and store minimal data Past History Approach • Each peer biases its choice of downloads using its own opinion vector • Problems: – Each peer has limited past experience – Inertia – if a peer has good past experience with another, it will be biased towards relying on it Friends of friends approach • Ask for opinions of the people who you trust • Weigh their opinions by your trust in them • Problems – You have a lot of friends: too much to compute and store – Few friends: won’t have enough data Eigentrust Approach • Whole networks cooperates to store and compute trust vector • Each peer holds its own opinions • Each peer holds its own global reputation • Iterative algorithm that converges to compute global trust ratings (in the nature of PageRank) More Eigentrust Issues • Secure Score Management – Voting among multiple score managers – Peer score held by another peer • Threat scenarios – Malicious individuals (always bad) – Malicious collectives (always bad, think highly of each other) – Camouflaged collectives (sometimes good to trick people) – Malicious spies (good all the time but friends with bad folks)