BitTorrent

Introduction to BitTorrent
• BT: BitTorrent
• BT is not itself a network; it allows small Internet networks to be created to share files
  – It does not perform all the functions of a typical P2P system, such as searching
  – Its virtual network is called a data-oriented overlay
• Written by Bram Cohen in 2001
• Written in Python; it uses GTK for its GUI
• It is an efficient content distribution system using file swarming
  – Each file is split into smaller pieces
    • equal-sized blocks (typically 32-256 KB)
  – Nodes request desired pieces from neighbors
• Encourages contribution by all nodes
  – Throughput increases with the number of downloaders, through efficient use of network bandwidth

About GTK
• GTK is a cross-platform toolkit for creating graphical user interfaces
• GTK stands for GIMP ToolKit
  – GIMP stands for GNU Image Manipulation Program
    • It is an image retouching and editing tool
• GTK is open-source software
• It is developed by a self-organized group of volunteers under the banner of the GNOME Project
• It is licensed under the terms of the GNU LGPL
  – GNU Lesser General Public License
    • It allows developers and companies to use and integrate LGPL software into their own (even proprietary) software without being required to release the source code of their own software parts
• GNU is a recursive acronym meaning "GNU's Not Unix"

Terminology
• Seeder = a peer that provides the complete file
• Initial seeder = the peer that first provides the file being torrented
• Leecher = one who is downloading
[Figure: initial seeder, seeders, and leechers]

Terminology
• Peer: a client in the network dedicated to a torrent
• Seeding: serving a file for download
• Leeching: downloading without serving a complete file for download
• Leech: a peer that is downloading the file
  – A fairer term might have been "downloader"
• Subpiece: a further subdivision of a piece
  – The "unit for requests" is a subpiece
  – But a peer uploads only after assembling a complete piece

Leecher can become seeder
• As a leecher downloads pieces of the file, replicas of the pieces are created
  – More downloads mean more replicas available
• As soon as a leecher has a complete piece, it can potentially share it with other downloaders
  – Eventually each leecher becomes a seeder by
    • obtaining all the pieces,
    • assembling the file, and
    • verifying the checksum

Swarm
• Swarm
  – Set of peers all downloading the same file
  – Organized as a random mesh
• Swarming is what differentiates BT from traditional file sharing
• The file is divided into many small pieces for distribution
  – Each node knows the list of pieces downloaded by its neighbors
  – A node requests pieces it does not own from its neighbors
  – Clients request different pieces from the seeder or from other clients
  – Clients become seeders for the pieces they have downloaded
• When all pieces are downloaded, clients can reconstruct the whole file
• There is no single BT network, but thousands of temporary networks consisting of clients downloading the same file

Tracker
• The tracker is a central server that keeps a list of all peers participating in the swarm
  – A peer joins a swarm by asking the tracker for a peer list and connecting to those peers
  – The tracker gives the requesting peer a random selection of peers
• A GET request consists of:
  – File ID
  – Peer ID
  – Peer IP
  – Peer port
• The tracker responds with:
  – A list of peers, containing the ID, IP, and port of each peer
• Peers may re-request at unscheduled times if they need more peers

How a node enters a swarm for file "popeye.mp4"
• The file distributor publishes the .torrent file on a (well-known) web server
  – File popeye.mp4.torrent is hosted at the web server
• The .torrent has the address of the tracker for the file
• The tracker, which runs on a web server as well, keeps track of all peers downloading the file
• The tracker supplies peers with the addresses of other peers that share the wanted file

How a node enters a swarm for file "popeye.mp4": step 1, peer to web server (www.bittorrent.com)
• The .torrent file contains the URL of the tracker
  – The tracker steers the download process
  – This method makes it very clear who is responsible for the legitimacy of the content: the operator of the tracker server

How a node enters a swarm for file "popeye.mp4": step 2, peer to tracker
• To download the file, peers access the tracker and join the torrent
  – A torrent is a group of peers connected to the same tracker
• The .torrent file is downloaded and the peer registers with the tracker, which provides a list of available peers and seeds

How a node enters a swarm for file "popeye.mp4": step 3, peer to swarm
• The .torrent file also contains:
  – Piece length: usually 256 KB
  – SHA-1 hashes of each piece in the file, for reliability

BT Client Software
• The BT client enables a host of features, including multiple parallel downloads
• The BT client also mediates peering between itself, the trackers, and other clients
  – Thereby yielding great distribution efficiencies
• The BT client also enables users to create and share torrent files
• When a peer has finished downloading a file, it may become a seed by staying online for a while and sharing the file for free
  – i.e., without bartering

Requirements from the Web Server and the Tracker
• The requirements on the Web hosting end are modest
• To distribute a torrent you only need a standard HTTP Web server and a free program called a tracker
• The Web server should be configured to use the MIME type application/x-bittorrent for any file with the ".torrent" extension
• The tracker's job is:
  – to keep track of which clients can serve which files to other clients
• The tracker
  – can be installed
    • either on individual Web servers
    • or operated centrally by the Web host
  – Its traffic load is relatively light, and
  – offering a tracker to your hosting customers can make distributing content with BT much simpler for them

Overall Architecture
[Figure: Overall Architecture, with the Tracker, the Web Server, the downloader "US", and peers A, B, and C labeled as leeches or seeds]

Simple example
[Figure: seeder A holds pieces {1,2,3,4,5,6,7,8,9,10}; downloaders B and C start with {} and gradually collect pieces, e.g. {1,2,3}, {1,2,3,4}, {1,2,3,5}, {1,2,3,4,5}]

Pipelining
• When transferring data over TCP, always keep several requests pending at once (typically 5), to avoid a delay between pieces being sent
• Every time a piece or sub-piece arrives, a new request is sent out

Piece Selection
• The order in which pieces are selected by different peers is critical for good performance
• If an inefficient policy is used, peers may end up in a situation where each has the same set of easily available pieces and none of the missing ones
• If the original seed is taken down prematurely, the file cannot be completely downloaded

Piece Selection
[Figure: small overlap between peers' piece sets is good; large overlap is bad, as it wastes bandwidth]

Piece selection
• Strict Priority
• Rarest First
  – General rule
• Random First Piece
  – Special case, at the beginning
• Endgame Mode
  – Special case

Random First Piece
• Initially, a peer has nothing to trade
• It is important to get a complete piece ASAP
  – so as to assemble the first complete piece quickly
  – and then participate in uploads
• Select a random piece of the file and download it
• When the first complete piece is assembled, switch to rarest-first

Rarest Piece First
• Determine the pieces that are rarest among your peers, and download those first
  – This ensures that the most commonly available pieces are left until the end
  – Increases diversity in the pieces downloaded
  – Avoids the case where a node and each of its peers have exactly the same pieces; increases throughput
  – Increases the likelihood that all pieces are still available even if the original seed leaves before any one node has downloaded the entire file

Endgame Mode
• Near the end, missing pieces are requested from every peer that has them
• This ensures that completion is not held up by a single peer with a slow transfer rate
  – Such a peer could otherwise delay the finish of the download
• When all the sub-pieces a peer lacks have been requested
  – the request is flooded to all peers
  – This helps to get the last chunk of the file as quickly as possible
  – To speed up completion of the download
• Once a sub-piece arrives, the peer sends cancel messages indicating that it has obtained it
• Some bandwidth is of course wasted by this flooding
  – but not much, because the endgame mode lasts only a short time

Difficulties with BitTorrent
• Works best with files that are widely copied on the network
• In practice, files appear and disappear
  – There is no permanent archive, nor any incentive for users to keep old files
• Since blocks are not downloaded sequentially, a partial file is not useful

Reasons for the success
• The success of BitTorrent is unlikely to be due purely to its tit-for-tat inspired protocol
• The real driving force behind the high cooperation might instead be a by-product of the lack of metadata search within BitTorrent
  – This results in a number of disconnected "tribes" at both the swarm and the tracker level
• Users take part in the tribal dynamics by selecting the tribes that best satisfy their needs
  – hence tribes filled with free-riders will tend to stop operating

An example
• The BitTorrent client BitTornado displays information about the peers and seeders engaged in sharing and distributing a torrent

Additional features
• BitTorrent uses 6881 as the default port
  – If that port is unavailable, BitTorrent tries to listen on successive ports up to 6889
  – If no port up to 6889 can be used, it gives up
• BitTorrent supports resuming: it resumes where it left off after checking the partial download
• How does the user know the download is not corrupted?
  – BitTorrent does cryptographic hashing (SHA-1) of all data
  – When seeing "Download succeeded!", the user can be sure that BitTorrent has already verified the integrity of the data
  – The integrity and authenticity of a BitTorrent download are as good as the original request to the tracker
  – Checking the MD5/CRC32/other hash of a file downloaded via BitTorrent is redundant

An application of the BitTorrent Technology
• Cranberry Publishing uses BitTorrent as one of the means of distribution for its free Home Computer Magazine
  – As the magazine is free to its readers, it is available simultaneously as a free download from the magazine website and as a torrent file for BitTorrent users
• However, the potential load on their server is enormous, so they wanted a way to ensure that the magazine could still be delivered
• BitTorrent offered a means to distribute the files users want faster
• The best thing about it is that the more people download it, the faster it gets for everyone, not slower
• Torrent users can just grab the torrent and download

Why is (studying) BitTorrent important? (From CacheLogic, 2004)

Legal Issues
• You should know about some inherent dangers of using BitTorrent to download movies and TV shows
• In the USA, organizations like the Recording Industry Association of America (RIAA) and the Motion Picture Association of America (MPAA) actively pursue legal action against people and companies that make copyrighted content available to others illegally
• Internationally, the laws are even more complex
• You are not anonymous when you use BitTorrent, because the process itself involves sharing identifying information about your computer
• This lack of anonymity puts you at risk if you use BitTorrent or other file-sharing technologies to download or share music, movies, TV shows, and other content
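The slides describe three related mechanics. A file is split into equal-sized pieces (usually 256 KB), the .torrent carries a SHA-1 hash of every piece, and on resume the client re-checks what is already on disk. A minimal Python sketch of those steps follows. It assumes the whole file fits in memory. `split_into_pieces`, `piece_hashes`, `verify_piece`, and `pieces_to_fetch` are hypothetical helper names, not the API of any real client:

```python
import hashlib

# "Usually 256 KB" per the slides; an assumption here, not fixed by the protocol.
PIECE_LENGTH = 256 * 1024


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list:
    """Cut the file into equal-sized pieces; only the last one may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces: list) -> list:
    """SHA-1 digest (20 bytes) of every piece, as published in the .torrent file."""
    return [hashlib.sha1(p).digest() for p in pieces]


def verify_piece(index: int, piece: bytes, expected: list) -> bool:
    """Accept a downloaded piece only if its SHA-1 matches the published hash."""
    return hashlib.sha1(piece).digest() == expected[index]


def pieces_to_fetch(partial: bytes, expected: list, piece_length: int = PIECE_LENGTH) -> list:
    """Resume check (hypothetical helper): re-hash what is on disk, list pieces still needed."""
    have = split_into_pieces(partial, piece_length)
    return [i for i in range(len(expected))
            if i >= len(have) or not verify_piece(i, have[i], expected)]


if __name__ == "__main__":
    data = bytes(range(256)) * 3000                 # 768,000 bytes of sample content
    pieces = split_into_pieces(data)
    hashes = piece_hashes(pieces)
    print([len(p) for p in pieces])                 # [262144, 262144, 243712]
    print(verify_piece(0, b"\xff" + pieces[0][1:], hashes))  # False: first byte corrupted
    print(pieces_to_fetch(data[:300_000], hashes))  # [1, 2]
```

Because a piece is either fully verified or re-fetched, the resume check needs only the published hashes. No separate MD5/CRC32 check is required.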
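The "How a node enters a swarm" slides say the .torrent file holds the tracker address, the piece length, and the SHA-1 hash of each piece. In the BitTorrent protocol specification (BEP 3) these are stored as a bencoded dictionary with an `announce` key and an `info` dictionary. The SHA-1 of the bencoded `info` dictionary (the info hash) is what identifies the file to the tracker. A sketch for the single-file case follows. The tracker URL is a placeholder and optional keys are omitted:

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts using bencoding (BEP 3)."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoding")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name: str, data: bytes, announce: str, piece_length: int = 256 * 1024) -> dict:
    """Single-file metainfo: tracker URL plus an 'info' dict with the piece hashes."""
    pieces = [data[i:i + piece_length] for i in range(0, len(data), piece_length)]
    return {
        "announce": announce,
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            # Concatenated 20-byte SHA-1 digests, one per piece.
            "pieces": b"".join(hashlib.sha1(p).digest() for p in pieces),
        },
    }


def info_hash(metainfo: dict) -> bytes:
    """SHA-1 of the bencoded 'info' dict: the file ID sent to the tracker."""
    return hashlib.sha1(bencode(metainfo["info"])).digest()


if __name__ == "__main__":
    # Placeholder tracker URL; a real torrent names its own tracker.
    meta = make_metainfo("popeye.mp4", b"x" * 600_000, "http://tracker.example.com/announce")
    print(len(meta["info"]["pieces"]))  # 60: three pieces x 20-byte SHA-1
    print(info_hash(meta).hex())
```

Writing `bencode(meta)` to a file named popeye.mp4.torrent would produce the file that the web server hands out with the application/x-bittorrent MIME type.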
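The Tracker slide lists what a GET request carries (file ID, peer ID, peer IP, peer port). It also says the tracker answers with a random selection of peers. The sketch below covers both sides, assuming BEP 3's parameter names: `info_hash` for the file ID, plus `peer_id`, `ip`, and `port`. It omits the upload/download statistics a real announce also sends. `ToyTracker` is a hypothetical in-memory stand-in with no HTTP layer and no bencoded response, and its `numwant` default of 50 is an assumption:

```python
import random
from typing import Optional
from urllib.parse import urlencode


def announce_url(tracker: str, info_hash: bytes, peer_id: bytes, port: int,
                 ip: Optional[str] = None) -> str:
    """Build the GET URL a peer sends to join a swarm (file ID, peer ID, IP, port).

    Simplified: BEP 3 also expects uploaded/downloaded/left counters, omitted here.
    """
    params = {"info_hash": info_hash, "peer_id": peer_id, "port": port}
    if ip is not None:
        params["ip"] = ip  # optional; otherwise the tracker can use the request's source address
    return tracker + "?" + urlencode(params)


class ToyTracker:
    """Hypothetical in-memory tracker: one peer table per torrent."""

    def __init__(self, rng: Optional[random.Random] = None):
        self.swarms = {}  # info_hash -> {peer_id: (ip, port)}
        self.rng = rng or random.Random()

    def announce(self, info_hash: bytes, peer_id: bytes, ip: str, port: int,
                 numwant: int = 50) -> list:
        """Register (or refresh) the requester, then return a random selection of other peers."""
        swarm = self.swarms.setdefault(info_hash, {})
        swarm[peer_id] = (ip, port)
        others = [(pid, addr[0], addr[1]) for pid, addr in swarm.items() if pid != peer_id]
        return self.rng.sample(others, min(numwant, len(others)))
```

A peer that needs more neighbours can simply call `announce` again, which mirrors the slide's note that peers may re-request at unscheduled times.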
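The Pipelining slide keeps about five requests outstanding and sends a new one on every arrival. The simulation below models a single connection and shows the resulting request/arrival order. It assumes sub-pieces come back in the order requested and does no real networking. `pipelined_schedule` is a hypothetical name:

```python
from collections import deque

PIPELINE_DEPTH = 5  # "typically 5" outstanding requests, per the Pipelining slide


def pipelined_schedule(subpieces, depth: int = PIPELINE_DEPTH) -> list:
    """Simulate one connection: keep `depth` requests pending, refill on each arrival.

    Returns a log of ("request", sp) / ("arrive", sp) events. Arrivals are assumed
    to come back in request order (an assumption of this sketch).
    """
    todo = deque(subpieces)
    pending = deque()
    log = []

    def send_next():
        sp = todo.popleft()
        pending.append(sp)
        log.append(("request", sp))

    # Fill the pipeline up front so the sender never idles waiting for us.
    while todo and len(pending) < depth:
        send_next()
    while pending:
        log.append(("arrive", pending.popleft()))
        if todo:  # every arrival immediately triggers one new request
            send_next()
    return log
```

With eight sub-pieces, the first five requests go out together. After that, each arrival is followed by exactly one new request until the queue drains, so at most five are ever in flight.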
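The piece selection slides combine two policies: a random first piece while the peer has nothing to trade, then rarest first. The sketch below implements that choice. It assumes each neighbour's holdings are known as a set of piece indices (real clients learn these from messages the peers send). Strict priority and endgame mode are left out, and `choose_piece` is a hypothetical name:

```python
import random
from collections import Counter


def choose_piece(have: set, neighbour_pieces: dict, rng=random):
    """Pick the next piece to request, or None if no neighbour has anything we need.

    have: indices of complete pieces we already hold.
    neighbour_pieces: peer -> set of piece indices that peer holds (assumed known).
    """
    availability = Counter()
    for pieces in neighbour_pieces.values():
        availability.update(pieces)  # count how many neighbours hold each piece
    candidates = sorted(p for p in availability if p not in have)
    if not candidates:
        return None
    if not have:
        # Random first piece: nothing to trade yet, so grab any piece quickly.
        return rng.choice(candidates)
    # Rarest first: fewest copies among neighbours; ties broken at random.
    fewest = min(availability[p] for p in candidates)
    return rng.choice([p for p in candidates if availability[p] == fewest])
```

For example, with neighbours holding {0,1,2,3}, {0,1,2}, and {0,1}, a peer that already has piece 0 asks for piece 3 next, since only one neighbour has it. The random tie-break is what keeps different peers from converging on identical piece sets.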
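Endgame mode, as described above, requests every remaining sub-piece from all peers that have it, then cancels the duplicates once a copy arrives. The sketch below covers that bookkeeping. It assumes sub-pieces are identified by hypothetical (piece index, byte offset) pairs of 16 KB each. Sending the actual request and cancel messages is left out:

```python
def in_endgame(missing: set, requested: set) -> bool:
    """Endgame starts once every sub-piece we still lack has been requested."""
    return bool(missing) and missing <= requested


def endgame_requests(missing: set, peer_has: dict) -> dict:
    """Flood: map each missing sub-piece to every peer that has it."""
    return {sp: sorted(p for p, held in peer_has.items() if sp in held)
            for sp in missing}


def on_arrival(sp, sender, outstanding: dict) -> list:
    """Drop the sub-piece from the outstanding table; return (peer, sub-piece) cancels."""
    asked = outstanding.pop(sp, [])
    return [(peer, sp) for peer in asked if peer != sender]
```

The wasted bandwidth is bounded by the duplicate requests still in flight when the cancels go out. Because the endgame only covers the last few sub-pieces, that overhead stays small.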
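The Additional features slide says the client starts at port 6881 and walks up to 6889 before giving up. The sketch below implements that fallback with Python's standard `socket` module. It treats an "unavailable" port as one that cannot be bound for listening (an interpretation, not taken from the original client). `bind_listen_port` is a hypothetical name:

```python
import socket


def bind_listen_port(first: int = 6881, last: int = 6889, host: str = ""):
    """Listen on the first free port in first..last; give up if none can be bound."""
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:  # port already taken (or not permitted): try the next one
            sock.close()
            continue
        sock.listen()
        return sock, port
    raise OSError(f"no free port in range {first}-{last}")
```

Usage: `sock, port = bind_listen_port()` returns the listening socket and the port it got. The client would then report that `port` to the tracker in its announce.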