Crawling Gnutella Network
By: Samer Al-Kiswany

Roadmap
• Introduction
• Gnutella network structure
• Gnutella protocol overview
• Gnutella crawling protocol
• Crawling topology information
• Crawling node content

EECE 411

Introduction
• The Gnutella network is a decentralized peer-to-peer system for file sharing.
• Originally created by Justin Frankel of Nullsoft.
• Large scale: up to 4M nodes, 1000 TB of data, and 100M files today.
• Fast growth in its early stages: more than 50-fold during the first half of 2001 (and 50-fold again from 2001 to 2006).
• Self-organizing network.
• Open, simple, and flexible protocol.

Gnutella Network Structure
• Gnutella protocol 0.6
• Two-tier architecture of ultrapeers and leaves
[Figure: ultrapeers and leaves]

Basic Primitives for File Sharing
• Join: How do I begin participating?
• Publish: How do I advertise my file(s)?
• Search: How do I find a file?
• Fetch: How do I retrieve a file?

Gnutella Protocol Overview
• Join: on startup, the client contacts one or more ultrapeer nodes.
• Publish: not needed.
• Search: ask the ultrapeer node. The ultrapeer propagates the query to other ultrapeers and returns the answers.
• Fetch: get the file directly from the peer (HTTP).

Crawling a Gnutella Node
By crawling we are interested in two main pieces of information:
• With whom is the node connected? (topology information)
  Gnutella protocol term: “Crawling/Communicating Network Topology Information”
• What files is the node sharing with others?
  Gnutella protocol term: “Browsing Host”

Crawling Topology Information
• Gnutella protocol 0.6 supports crawling of network topology information.
• The crawler sends a topology crawl request to a node in the Gnutella network and receives topology information: the node's ultrapeers and leaves.

Topo crawl (request sent by the crawler):
    GNUTELLA CONNECT/0.6
    User-Agent: LimeWire (crawl)
    X-Ultrapeer: False
    Query-Routing: 0.1
    Crawler: 0.1

Topo information (response from the node):
    GNUTELLA/0.6 200 OK
    User-Agent: BearShare
    Leaves: 127.0.0.1:6346,127.0.0.2:6346
    Peers: 127.0.0.4:6346,127.0.0.5:6346

Final message of the handshake:
    GNUTELLA/0.6 200 OK

Browsing Node Content
• The crawler sends a Browse Host request to a node in the Gnutella network and receives the list of files that node shares.

Browse Host (request sent by the crawler):
    GET / HTTP/1.1
    Host: Crawler_IP:PORT
    User-Agent: UBCECE
    Accept: application/x-gnutellapackets
    Connection: close

List of files (response from the node):
    HTTP/1.1 200 OK
    Server: LimeWire/x.y
    Content-Type: application/x-gnutellapackets
    Connection: close

    <List of files>

The list of files is returned as Query Hit messages.

Query Hit Parsing
[Figure: layout of a Query Hit message: 1 | 2 | A B C D E F | 3]
• 1: Gnutella message header. Important field: message length.
• 2: Query Hit header. Important field: number of files.
• A-F: list of shared files; each entry includes the file name and size.
• 3: other Gnutella protocol fields.
• The HTTP response may contain more than one Query Hit message, one after another.

Limitations: Does This Always Work?
Topology crawling:
• Topology information crawling is not supported by some Gnutella protocol v0.4 implementations.
Host browsing:
• Some Gnutella node implementations return the list of files as HTML (BearShare, for instance) and do not respond with a Query Hit message.

Single Gnutella-Node Crawler
A proof-of-concept implementation of a single Gnutella-node crawler.
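
To make the topology crawl response shown earlier more concrete, here is a minimal Java sketch of reading the status line and splitting the Leaves and Peers headers into "ip:port" strings. The class name HandshakeResponse and its methods are hypothetical helpers written for illustration; they are not the course's Crawler class, and the sketch assumes the whole response has already been read from the socket into a String.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the course code.
class HandshakeResponse {
    final int statusCode;
    final Map<String, String> headers; // header names stored in lower case

    HandshakeResponse(int statusCode, Map<String, String> headers) {
        this.statusCode = statusCode;
        this.headers = headers;
    }

    /** Parses a response such as "GNUTELLA/0.6 200 OK" followed by header lines. */
    static HandshakeResponse parse(String raw) {
        String[] lines = raw.split("\r?\n");
        String[] status = lines[0].trim().split("\\s+");
        if (status.length < 2 || !status[0].startsWith("GNUTELLA/")) {
            throw new IllegalArgumentException("Not a Gnutella status line: " + lines[0]);
        }
        int code = Integer.parseInt(status[1]);

        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.trim().isEmpty()) {
                break; // a blank line ends the header block
            }
            int colon = line.indexOf(':'); // first colon separates name from value
            if (colon <= 0) {
                continue; // skip malformed lines instead of failing the crawl
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, line.substring(colon + 1).trim());
        }
        return new HandshakeResponse(code, headers);
    }

    /** Splits a comma-separated header (e.g. Leaves or Peers) into "ip:port" entries. */
    List<String> addresses(String headerName) {
        List<String> result = new ArrayList<>();
        String value = headers.get(headerName.toLowerCase(Locale.ROOT));
        if (value == null) {
            return result; // header absent, e.g. a node with no leaves
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample response copied from the slide above.
        String raw = "GNUTELLA/0.6 200 OK\r\n"
                + "User-Agent: BearShare\r\n"
                + "Leaves: 127.0.0.1:6346,127.0.0.2:6346\r\n"
                + "Peers: 127.0.0.4:6346,127.0.0.5:6346\r\n"
                + "\r\n";
        HandshakeResponse response = parse(raw);
        System.out.println("status: " + response.statusCode);
        System.out.println("leaves: " + response.addresses("Leaves"));
        System.out.println("peers:  " + response.addresses("Peers"));
    }
}
```

Header names are compared case-insensitively here because different Gnutella agents may not capitalize them the same way; that is a defensive assumption rather than something the slides state.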
The main class that implements the crawling protocol is the Crawler class:
• crawlpeers(ip_address, port)
• parsePeers(byte[])
• listFiles(ip_address, port)
• processQueryHit(byte[])
Available through the following link: http://www.ece.ubc.ca/~samera/TA/project/sgnc.html

Project Phase II
• Implement a single-node Gnutella network crawler.
• Report:
  - The active leaf nodes
  - Information regarding the “agent” (i.e., the implementation: LimeWire, BearShare, etc.)
  - The domain name corresponding to the node IP address
• Avoid cycles!

Project Phase III
• Implement a master/worker crawler with Java NIO sockets.
• The primary master keeps a "Crawled" list and a "To be crawled" list. It hands out work ("Crawl the following list: …") and collects results (peer IPs, statistics).
• Problems? (Hint: failures)
• Adopt primary/backup replication for the manager.
[Figure: primary master (marked X) and backup master, each with "Crawled" and "To be crawled" lists, in front of the Gnutella network]

Previous Years' Ideas: Part I
Programming languages / frameworks / protocols
• Java (the vast majority)
• Scala
• Apache MINA framework
• Java RMI
• Jython
• XML-RPC
• SQL
• Python/Perl/Shell/cron jobs
Architecture
• Master/worker (the majority)
• Hierarchical

Previous Years' Ideas: Part II
Design choices
• NIO at both master and workers
• Careful load balancing
• Keep the workers always busy
• Bootstrapping new workers if old workers fail
Additional bells and whistles
• GUI manager
• Statistics in real time through GUI and web page
• Graphviz

References
• Single Gnutella-Node Crawler: http://www.ece.ubc.ca/~samera/TA/project/sgnc.html
• Gnutella crawling protocol: http://www.ece.ubc.ca/~samera/TA/project/Gnuttela-Protocol.html
Other references:
• http://gnutella-specs.rakjar.de/index.php/Main_Page
• www.limewire.com

Thank you
www.ece.ubc.ca/~samera
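
As an appendix-style illustration of the "Avoid cycles" requirement in Phase II and the "Crawled" / "To be crawled" lists in Phase III, the following Java sketch shows one possible way to track which nodes have already been seen. The class CrawlFrontier, its methods, and the in-memory topology map are all hypothetical; a real crawler would obtain neighbours from the Leaves and Peers headers instead of a map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of the course code.
class CrawlFrontier {
    private final Queue<String> toBeCrawled = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>(); // queued or already crawled

    /** Queues an "ip:port" address unless it has been seen before. */
    boolean offer(String address) {
        String key = address.trim();
        if (key.isEmpty() || !seen.add(key)) {
            return false; // duplicate or blank: skipping it is what prevents cycles
        }
        toBeCrawled.add(key);
        return true;
    }

    /** Returns the next address to crawl, or null when the frontier is empty. */
    String next() {
        return toBeCrawled.poll();
    }

    /** Breadth-first crawl from a seed; neighboursOf stands in for a network crawl. */
    static List<String> crawl(String seed, Function<String, List<String>> neighboursOf) {
        CrawlFrontier frontier = new CrawlFrontier();
        List<String> visited = new ArrayList<>();
        frontier.offer(seed);
        String current;
        while ((current = frontier.next()) != null) {
            visited.add(current);
            for (String neighbour : neighboursOf.apply(current)) {
                frontier.offer(neighbour);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up topology containing cycles (A <-> B, A <-> C).
        Map<String, List<String>> topology = Map.of(
                "10.0.0.1:6346", List.of("10.0.0.2:6346", "10.0.0.3:6346"),
                "10.0.0.2:6346", List.of("10.0.0.1:6346", "10.0.0.3:6346"),
                "10.0.0.3:6346", List.of("10.0.0.1:6346"));
        List<String> order = crawl("10.0.0.1:6346",
                address -> topology.getOrDefault(address, List.of()));
        System.out.println("crawled each node once: " + order);
    }
}
```

In a master/worker design, the master would presumably own a structure like this and hand out addresses from it, but this sketch is single-threaded and does not address worker or master failures.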