Download slides - Disco Lab

FreeNet: A Distributed Anonymous Information Storage and Retrieval System Ian Clark, Oskar Sandberg, Brandon Wiley and Theodore Hong FreeNet • P2P network for anonymous publishing and retrieval of data – – – – – Decentralized Nodes collaborate in storage and routing Data centric routing Adapts to demands Addresses privacy & availability concerns Motivation • Problem - Querying the network – Source - Requestor – Destination – Provider • It’s a distributed search problem – Approximating global knowledge with local knowledge – Other systems – Chord, Tapestry, Pastry • Privacy and availability – Protect authorship, prevent denial attacks Goals of Freenet • • • • • Anonymity for producers and consumers Deniability for information storers Resistance to denial attacks Efficient storing and routing Does NOT provide – Permanent file storage – Load balancing – Anonymity for general n/w usage Architecture Request: 1. key 2. Hops to live 3. ID 4. Depth • • • • Each node – local data store + routing table Request file through location independent keys Routing - chain of proxy requests - decision is local Graph structure actively evolves over time Key Based Searching •Keyword signed key(KSK) •Easy for retrieval – only need ‘D’ •Minimal protection against tampering D ‘D’– key generation Pb + Pr ; SHA(Pb) FILE + Pr E(FILE, D) Encrypted FILE Signature KSK Keys and Searching….. • Problems with KSK – flat namespace (collisions), key squatting, dictionary attacks • Signed Subspace Key (SSK) Randomly generated key pair  namespace ID SSK = SHA(‘D’) ^ SHA(Pb) (-)Advertisement – subspace Pb + ‘D’ (+)Owner can construct hierarchical space of arbitrary depth - using indirect files – (+)Reduces collision greatly – – – – Keys and Searching… • Problems with SSK - updating, versioning • Content Hash Keys (CHK) – Encrypted by a random encryption key – Publish CHK + decryption key – CHK + SSK  easily updateable files • 2 step process – publish file, publish pointer • Results in pointers to newer version • Older versions accessed thru CHK – Can be used for splitting files Retrieving Files • How do u locate the keys? – Hypertext spider – Indirect files – published with KSK of search words – Publish bookmarks • File retrieval – Request forwarded to node in RT with closest lexicographic match for the binary key – Request routing follows steepest-ascent hill climbing: first choice  failure  backtrack  second choice Still Retrieving…. c a b f e d • Timers, hops - curtail request threads • Files cached all along the retrieval path • Self-reinforcing cycle – results in key expertise Ring Topology •1000 nodes in ring topology •Datastore = 50 items •RT = 250 items •Keys associated with links are hash of destn IPs Self Reinforced Routing • Snapshots using 300 requests with hops = 500 • As network converges it drops to 6 - “six degrees of separation” Retrieval Discussion • No controlled replication  no persistence • No correlation between keys and content – (+) Documents related to a subject are scattered • Geographical fault resilience – (-) No spatial locality – search latencies can suffer • Building indexes by other means Publishing • Similar to retrieval but, 2 step process – Detect collisions – ‘all clear’ if no collision – Publish to node in RT with closest key match • Are CD and publish paths same? – Can result in collision during publish step • Inserts allow new nodes to advertise themselves • (+) Key-squatting is not effective Data Management • Finite data stores - nodes resort to LRU • Routing table entries linger after data eviction • Outdated (or unpopular) docs disappear automatically • Bipartite eviction – short term policy – New files replace most recent files – Prevents established files being evicted by attacks Network Growth • New nodes have to know one or more guys • Problem: How to consistently decide on what key the new node specializes in? – Needs to be consensus decision – else denial attacks • Advertisement  IP + H(random seed s0) – Commitment - H(H(H(s0) ^ H(s1)) ^ H(s2))……. – Key for new node = XOR of all seeds • Each node adds a RT entry for the new node Network Growth • Key assigned to new nodes = H(IP) • Scales as log(n) until n ~ 40000 • At 40000, RTs are full Protocol • Nodes with frequently changing IPs use ARKs • Return address specified in requests – threat? • Messages do not always terminate when hopsto-live reaches 1 • Depth is initialized by original requestor to arbitrarily small value • Request state maintained at each node – timers - LRU Fault Resilience • Median path length < 20 at 30% node failures? • N/w becomes ineffective at 40% failures ??? Small World • Most nodes form local clusters • Few high link connecting nodes • Power law distribution provides high degree of fault tolerance Security Concerns • Pre- routing – mesg. encrypted by public keys which determine path of pre-routing • Protecting data source – using random and probabilistic methods Security • File integrity - KSK vulnerable to dictionary attacks • DOS attacks – Hash Cash to slow down • Attempts to displace valid files are constrained by the insert procedure Conclusion • Provides a n/w to anonymously store and request files • Adaptive routing who’s efficiency increases with experience • Deals with privacy and data integrity in various scenarios • Applications? – Freedom of speech – Unaccountable, decentralized Napster

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download slides - Disco Lab