BitTorrent History
• The BitTorrent peer-to-peer file transfer protocol was created and introduced in 2001 by BitTorrent Inc. co-founder Bram Cohen
• Bram set out to solve a problem the online community had experienced since the birth of the Internet
• While it wasn't clear it could be done, Bram wanted to enable effective swarming distribution: transferring massive files from server to client with the efficiency of peer-to-peer, reliably and quickly
• Bram Cohen's only income was voluntary donations from satisfied and kind BitTorrent users
• By 2003, BitTorrent had sparked a global revolution in file distribution on the web
• 2004:
  – HTTP covered only about 10 percent
  – P2P accounted for about 50 to 80 percent of all Internet traffic
• Among the different peer-to-peer protocols, BitTorrent became the most prominently used
• Today, it provides millions of users worldwide with a valuable platform to publish, search and download popular digital content
• An estimated 40 million will have downloaded BitTorrent by 2006

Features
• BitTorrent is a P2P network specifically designed for sharing large files
• Key ideas:
  – The file is broken into lots of small pieces that can be downloaded independently
  – Users can upload pieces to other users while downloading
  – "Free riders" are punished, sharers are rewarded

Original goals
• It was designed to distribute large files over the Internet
  – e.g. distributions of the Linux operating system
• Such a distribution fills several CDs
  – i.e. several GB of data
• If thousands of users downloaded this amount of data from the same server, the traffic would exceed any acceptable limits
  – especially if they did so at the same time

BitTorrent
• BitTorrent is a free P2P software application
• It is very popular for larger files
  – E.g.
    CD-size
• In contrast to other P2P networks, it has no built-in search algorithm
• To download a file, the corresponding .torrent file has to be downloaded from a webpage
• Torrent files contain the hash values of a file and the address of a tracker
• The tracker supplies peers with the addresses of other peers that share the wanted files
• There is no single BitTorrent network, but thousands of temporary networks consisting of clients downloading the same file

Implementation of the BitTorrent
• Bram Cohen's code for BitTorrent is well written
• It lacks comments
  – each file has about 2 lines of comments at the top
• BitTorrent was written in Python
  – A clean-cut scripting language
    • which has some nice array manipulation operators
  – Python is virtually equivalent to Perl

Terminology
• Seeding
  – serving a file for download
• Leech
  – A client who is downloading from the seeders
• Leeching
  – to download without contributing
• Chunk
  – a piece of a file, typically 64 KB to 256 KB in size
• .torrent
  – A file which provides a URL to the tracker and also contains a list of SHA1 hashes for the data being transferred
    • The hashes in the .torrent are used to verify whether the blocks received are valid
• Tracker
  – A middleman who informs the peers of all the other peers in the network
• Peer
  – A client to the network dedicated to a torrent
• Seeder
  – A peer who has all the blocks in a torrent
• Choked
  – A connection is choked if no file data is passed through it
  – Control data may flow, but the transmission of actual blocks will not
• Interest
  – indicates whether a peer has blocks which other peers want
• Snubbed
  – A peer acting poorly (not uploading, or sending bad control messages)
    • usually disconnected or ignored
• Reseeding
  – Reintroducing a file to the network
  – If no one is currently sharing (or seeding) a file, it must be reintroduced to the network by someone who has a complete copy of the file

Operation Summary
• The original file distributor
  – publishes details of the file on a web server, and
  – creates a tracker that allows peers interested in the file to find each other
• To download the file, peers access the tracker and join the torrent
  – in BitTorrent lingo, a torrent is a group of peers connected to the same tracker
• The basic idea is to divide the file into equal-sized blocks (typically 32 to 256 KB) and have nodes download the blocks from multiple peers concurrently
• The blocks are further subdivided into sub-blocks to enable pipelining of requests so as to mask the request-response latency
• As a peer downloads blocks of the file, it also uploads to other peers in the torrent the blocks that it has previously downloaded
• Thus, the burden of bandwidth consumption is moved from the original content distributor to all peers in the torrent

Sub-blocks
• BitTorrent uses TCP, so it is crucial to keep data flowing at all times; otherwise the transfer rate drops because of TCP's slow start mechanism
• The pieces are further broken into sub-pieces, often about 16 KB in size
• The protocol makes sure to always have some number of sub-piece requests (typically five) pipelined at any time
• When a new sub-piece is downloaded, a new request is sent
• Sub-pieces can be downloaded from different peers

Detailed Operation I
• The person who wants to distribute the file does not put the large file itself on the web server, but only meta information:
  – the so-called .torrent file
• This file directs users to the server
  – which steers the download process
    • this server is the so-called "tracker"
  – and it contains information by which users can verify whether they have retrieved the file
    • hash values of the file's pieces
• The users just need to install the BitTorrent software
  – which will handle the information in the .torrent files
• The software will connect to the "tracker"
  – which will give it a list of peers who are currently downloading the file
• The clients (peers) do not download the information from the central server
  – they just get the information from each other
• If somebody wants to download pieces of the file
  – he must also upload pieces he already has to his peers
• The core of BitTorrent is:
  – a clever set of economic incentives not only to download the file but
  – also to upload it
    • and therefore contribute your share of upload bandwidth
    • to help other users downloading

Detailed Operation II
• Nodes in the system are either
  – seeds, i.e., nodes that have a complete copy of the file and are willing to serve it to others,
  – or leechers, i.e., nodes that are still downloading the file but are willing to serve the blocks that they already have to others
• When a new node joins a torrent, it contacts the tracker to obtain a list containing a random subset of the nodes currently in the system
  – both seeds and leechers
• The new node then attempts to establish connections to about 40 existing nodes
  – which then become its neighbors
• If the number of neighbors of a node ever dips below 20, say due to the departure of peers, the node contacts the tracker again to obtain a list of additional peers it could connect to

BitTorrent (Cont.)
• During the download process, the tracker periodically sends updated information about new download locations
• Communication between two peers is done with TCP
• The official BitTorrent P2P client generally does not support bandwidth throttling
  – meaning that it will tend to monopolize a network connection and not allow surfing the Internet or otherwise using the network while files are being downloaded or uploaded
• A freely available alternative BitTorrent client overcomes this limitation
• There exist many different BitTorrent clients
  – which are in general freely available for everyone
• Very popular clients:
  – The original BitTorrent client written by Bram Cohen himself
  – The Java-based client Azureus

Properties
• Someone who wants to share a file creates a .torrent file
  – File name, size, hash of each block
  – Address of a tracker
• The torrent is downloaded and the peer registers with the tracker, which provides a list of available peers and seeds
• The peer begins requesting blocks, starting with the rarest available block
• As it finishes receiving a block, it begins uploading that block to other peers
• BitTorrent uses a fairly simple tit-for-tat approach to sharing
  – The four peers who have shared the most data in the last 10 seconds are unchoked

BitTorrent Scenario

Endgame Mode
• Sometimes a piece might be downloaded from a peer with a slow transfer rate
• This can potentially delay the finishing of a download
• To prevent this we have the "endgame mode"
• Remember the pipelining principle, which ensures that we always have a number of requests (for sub-pieces) pending
  – the number often being set to five
• When all the sub-pieces a peer lacks have been requested
  – this request is broadcast to all peers
  – This helps us to get the last chunk of the file as quickly as possible
• Once a sub-piece arrives, we send a cancel message indicating that we have obtained it
  – and the peers can disregard the request
• Some bandwidth is of course wasted by this
  broadcasting
  – but in practice this is not very much, because the endgame mode lasts only a short time

BitTorrent is an open network
• Its virtual network is called a data-oriented overlay
• BitTorrent is the technology behind much of the current peer-to-peer sharing on the Internet
• BitTorrent is not itself a network
  – it allows small Internet networks to be created to share files
• BitTorrent is a prime example of an open network
  – anything added to the network can be copied and added to another network

P2P Swarming
• The problem that BitTorrent solves is sometimes distinguished from traditional file sharing and called swarming
  – The file is divided into many small pieces for distribution
  – Clients request different pieces from the server or from other clients
  – Clients become servers for the pieces they have downloaded
  – When all pieces are downloaded, clients can reconstruct the whole file

Protocol
• The open source BitTorrent protocol provides a means for content creators and publishers to manage the distribution of data from a single source to many destinations (peers) in parallel, making maximum use of the upstream and downstream bandwidth available within a given peering group
• A file being distributed as a torrent is broken into pieces
  – These are subsequently shared between the peers (instead of being sent only from the source to each peer), yielding a potentially massive increase in distribution efficiency for the publisher and generally a better overall experience for any given peer
• With BitTorrent, a single upload from the publisher's source server can result in a total distribution of hundreds if not thousands of file copies by peers requesting the file

The Torrent File
• The torrent file has all the information a peer needs to download a file
  – URL of the tracker
  – Fileinfo (considering only one file)
    • Name of the file
    • Piece length/size
    • File size
    • SHA1 hashes of each piece
  – The file ID is generated as the SHA1 hash of the fileinfo

Tracker
• The tracker
  receives information from all peers and gives them random lists of peers
• It is a single point of failure
  – New versions of BitTorrent can use a DHT for receiving other peers
    • (trackerless)
• A GET request consists of:
  – File ID
  – Peer ID
  – Peer IP
  – Peer port
• The tracker responds with:
  – Interval, the number of seconds between normal requests
  – List of peers, containing the ID, IP and port of each peer
• Peers may re-request at unscheduled times if they need more peers

Peer Protocol
• Peer connections are symmetrical
• A peer first tries to make a handshake with a new peer
  – Checks for the expected file ID and peer ID
• Each downloader reports to all of its peers which pieces it has
• Peers download pieces from all peers they can
• Peers upload to other peers according to the Choking Algorithm
• Piece selection
  – rarest first
  – a peer first downloads the piece that the fewest of its peers have
  – this piece has the best chance of being requested by other peers
• To avoid delays between pieces, which lower transfer rates, a peer
  – splits pieces into sub-pieces
  – always keeps some number of sub-piece requests pipelined
  – completes a piece before requesting sub-pieces from other pieces

Requirements from the Web Server and the Tracker
• The requirements on the Web hosting side are modest
• To transmit a torrent you only need a standard HTTP Web server and a free program called a "tracker"
• The Web server should be configured to use MIME type application/x-bittorrent for any file with the ".torrent" extension
• The tracker's job is:
  – to keep track of which clients can serve which files to other clients
• The tracker
  – can be installed
    • either on individual Web servers
    • or operated centrally by the Web host
  – Its traffic load is relatively light, and
  – offering a tracker to your hosting customers can make using BitTorrent to distribute content a much simpler process for them

Choking Algorithm
• The mechanism used to limit the number of concurrent uploads is called choking
  – which is
    the temporary refusal of a node to upload to a neighbor
• A peer always unchokes a fixed number of peers (currently 4)
• Which peers to unchoke is based strictly on the current download rate from each peer
• Peers recalculate which peers to choke or unchoke every 10 seconds
  – enough time for TCP to achieve full transfer capacity
  – avoids fibrillation (no rapid alternation of choke and unchoke)
• Optimistic unchoke
  – unchokes a peer regardless of its current download rate
  – which peer to optimistically unchoke is rotated every third rechoke
    • enough time for the upload to achieve full transfer capacity
    • enough time for the unchoked peer to reciprocate
    • enough time for the download to achieve full transfer capacity

Choking Algorithm (Cont.)
• The piece exchange strategy between peers is based on a trading model:
  – peers prefer to send data to peers who reciprocate
• In particular, preference is given to those peers that are uploading data at the highest rate
• Once in a choking period, typically every ten seconds, each peer recalculates the receiving data rate from all the peers in its list and selects the fastest ones
  – typically three
• It then uploads only to those peers for the duration of the period
  – A peer unchokes the fastest uploaders and chokes all the rest
• Whenever a peer successfully downloads a new piece
  – it sends out an advertisement to all others in its list
• Furthermore, everyone constantly keeps looking for better connections by randomly unchoking an additional peer once every third choking period
  – by means of an optimistic unchoke
• Seeds, who do not need to download any pieces, choose to unchoke the fastest downloaders
• This algorithm is the main driving factor behind BitTorrent's fairness model:
  – a free-rider will eventually get low download rates,
    • since its lack of cooperation will result in being choked by most other peers

Client
• The BitTorrent client enables a user to search for files in the .torrent format and download them
• The current client
  enables a host of features, including multiple parallel downloads
• The client also intermediates peering between itself, source file servers ("trackers") and other clients
  – Thereby yielding great distribution efficiencies
• The client also enables users to create and share torrent files
• When a peer has finished downloading a file, it may become a seed by staying online for a while and sharing the file for free
  – i.e., without bartering

Seed Client
• One required component is a seed client
  – that can service the first request for a torrent
    • when it arrives
• The seed will actually deliver files, so it doesn't make sense for a Web host to offer this as a centralized service
  – but it would be helpful to install a seed client as an optional module on standard server configurations
• Normally, there must be a single seed client for each file being served
  – but the open-source Java client Azureus is capable of acting as a seed for multiple source files simultaneously
• A future design challenge for P2P file sharing is creating incentives to seed
  – E.g.
    peers that seed files should be given preference to barter for other files

A Usual BitTorrent Configuration
• Tracker
  – administers the torrent, tracks historical data
• .torrent
  – the file which contains a link to the tracker and hashes of each chunk of the distribution
• Seeder
  – The initial client who provides the file that is torrented
• Leech
  – Finds the torrent; the torrent points to a tracker
  – Then receives a list of peers from the tracker

Search
• BitTorrent.com's search engine frequently crawls the entire known torrent file space, retrieving and building an index of all public torrent files
• The search interface on BitTorrent.com's home page (www.bittorrent.com) enables users to search this index with keywords to discover interesting files being shared by BitTorrent users around the globe
• Users can quickly sort and filter results by various useful parameters and download the files with a single click

Overall Architecture
[Figure: a Tracker, a Web Server, and peers A, B and C; the downloader ("US") and the other peers are labeled Leech or Seed]

The Elements of the Architecture
• Seeders
  – Seeders release the rarest blocks first, in random order.
  – Random order maintains a uniform distribution of blocks among peers.
  – A client that is good at distributing blocks will get more attention from the seeder
  – The seeder offloads work to the peers
• Leeches and Peers
  – Chunks of files are traded.
  – Chunks are verified by hashes in the torrent.
  – Tit For Tat Algorithm
    • Those clients who upload the fastest to other clients get faster download rates from those clients
    • Part of the determination depends on how much the client shares
  – BitTorrent is for file distribution, not for ensuring 1 to 1 upload to download ratios
  – The assumption is about single clients, not about groups of clients

Bartering
• If a peer receives several download requests at the same time
  – it needs a way of deciding which request to handle first
• The peer will always select the candidate that offers the most packages in exchange
• Thus, the first package for a new peer might take a while to load
  – but the download rate will increase with every additional package it has on offer itself
• This technique prevents clients from downloading without offering anything in exchange
  – which is a good thing, as such downloading would effectively prevent load balancing

Pareto Efficiency
• The main goal of BitTorrent concerning efficiency is to be Pareto efficient
• A game is Pareto efficient if there is no way to make someone better off
  – without making someone worse off
  – That is, a Pareto optimal outcome cannot be improved upon without hurting at least one player
  – The term is named after Vilfredo Pareto, an Italian economist
• In BitTorrent this is used to spur peers to look for better peers
  – or at least be fair and communicate with many peers
• Seeking Pareto efficiency is a local optimization algorithm
  – in which pairs of counterparties see
    • if they can improve their lot together
  – and such algorithms tend to lead to global optima
• Specifically, if two peers are both getting poor reciprocation for some of the upload they are providing
  – they can often start uploading to each other instead
    • and both get a better download rate than they had before
• BitTorrent is designed to promote the sharing of bandwidth
  – in order to improve transfer rates between peers

The Byzantine General's Problem
• It was introduced by Lamport,
  Shostak and Pease to computer science
• The problem is about how a group of traitors can
  – confuse messages and cause miscommunication
  – cause disagreement or agreement based on their bias
• These traitors could be working together to subvert the integrity of the final decision
• The Byzantine General's Problem relates to BitTorrent mainly as a warning against sabotage on the BitTorrent network
  – Sabotage could come from anyone, from copyright holders to Internet vigilantes to hackers
• How does BitTorrent defend against colluding peers that seek to subvert the network?
  – If a peer or a group of peers is lying about their upload/download statistics to the tracker
    • and if everyone voted and agreed on what one client uploaded
    • that might work out quite well
  – If a peer detects invalid data from another peer
    • such as damaged data structures or improper field lengths
    • it automatically disconnects that peer
  – If a peer sends invalid data to another peer
    • this will be noticed, as the SHA1 hash of that chunk will not match

BitTorrent Community
• To find a file in BitTorrent, users access web sites which act as global directories of available files
• The table shows:
  – the most popular of these web sites (in 2004)
  – the number of different files and
  – the number of active file transfers at a certain time

Free Riding
• A "smart" peer might still be able to download at a high speed without contributing much
  – by either circumventing the incentive mechanism
    • e.g. by repeatedly asking the tracker for new peers that can be exploited for a while
  – or by taking advantage of weaknesses of the incentive mechanism itself
    • e.g. by abusing the optimistic unchoking mechanism

Difficulties with BitTorrent
• Works best with files that are widely copied on the network
• In practice, files appear and disappear
  – There is no permanent archive, or incentive for users to keep old files
• How to find torrents?
  – No central torrent repository
  – No metadata standard for searching and indexing torrents
• Since blocks are not downloaded sequentially, a partial file is not useful

One Reason for the Success
• The success of BitTorrent is unlikely to be due purely to the use of the tit-for-tat inspired protocol, as is often claimed
• The real driving force behind the high cooperation might be a by-product of the lack of meta-data search within BitTorrent
  – This results in the creation of a number of disconnected "tribes" at both the swarm and the tracker level
• The users take part in the tribal dynamics by selecting those tribes that best satisfy their needs
  – hence tribes filled with free-riders will tend not to operate

An example
• The BitTorrent client BitTornado displays information about the peers and seeders engaged in sharing and distributing a torrent

Ports used by BitTorrent
• BitTorrent uses 6881 as the default port
  – if that port is unreachable, BitTorrent tries a number of successive ports up to 6889
• If the client cannot connect to port 6889, it gives up
• Thus, currently BitTorrent does not transfer files on arbitrary ports

Opera integrates BitTorrent in upcoming browser
• Opera Software today announced that it has teamed with BitTorrent Inc. to include the BitTorrent™ protocol in the upcoming version of the Opera Web browser
• Integrating this popular technology in the Opera browser means faster and more efficient downloads of large files
• BitTorrent's technology will be made available to users of the Opera browser in two ways:
  1. users can search for torrent files in the Opera browser's integrated search field
  2. when a file has been selected, Opera's Transfer Manager feature will handle the download
• As a result of integrating BitTorrent into the Opera browser, users no longer need separate software for searching and downloading torrent content

More Features
• Does BitTorrent support resuming?
  – Yes, just save your download to the same location as the existing partial download
  – BitTorrent will resume where it left off after checking the partial download
• How do I know the download isn't corrupted?
  – BitTorrent does cryptographic hashing (SHA1) of all data
  – When you see "Download succeeded!" you can be sure that BitTorrent has already verified the integrity of the data
  – The integrity and authenticity of a BitTorrent download is as good as the original request to the tracker
  – Checking the MD5/CRC32/other hash of a file downloaded via BitTorrent is redundant
• What language is BitTorrent written in?
  – Python
  – And it uses GTK for its GUI

System Requirements
• Windows:
  – BitTorrent 4.2 has been tested on Windows XP, 2000, ME, and 98. There are currently CPU usage problems in ME and 98. BitTorrent 4.2 does not work on Windows 95.
• Mac OS X:
  – Mac OS X 10.3 and newer
• Linux/BSD/*nix:
  – It requires Python 2.2 or greater
  – The GUI also requires Python 2.3, GTK 2.2 or greater, and pygtk 2.4 or greater

Tracker Error
• An error message about the "tracker", like "problem connecting to tracker" or "rejected by tracker":
  – Tracker errors always indicate a problem with the file you are trying to download and never a problem with the BitTorrent program
  – The "tracker" is a computer on the Internet that helps you download a particular torrent
  – This error message indicates a problem with the tracker
  – It is possible that the tracker for the particular file you are trying to download is down, or that the tracker is no longer handling that file
  – Try to download again later, or contact the people distributing the file
  – Alternatively, this error message may indicate that your Internet connection is down or doesn't allow arbitrary outgoing connections
    • which are necessary for BitTorrent to work

Legal Issues
• You should know about some inherent dangers of using BitTorrent to download movies and TV shows
• Organizations like the
  Recording Industry Association of America (RIAA) and the Motion Picture Association of America (MPAA) actively prosecute people and companies that are engaged in making copyrighted content available to others illegally
• Internationally, the laws are even more complex
• You are not anonymous when you use BitTorrent, because the process itself involves sharing identifying information about your computer
• This lack of anonymity puts you at risk if you use BitTorrent or other file-sharing technologies to download or share music, movies, TV shows, and other content

Application of the BitTorrent Technology
• BitTorrent used to distribute new computer magazine
• Cranberry Publishing has decided to use BitTorrent as one of the means of distribution for its new free computer magazine, Home Computer Magazine
• As the magazine is free to its readers, it will be made available simultaneously as a free download from the magazine website and as a torrent file for BitTorrent users
• "When we were looking at creating this magazine, we realised that the potential load on our server was enormous, so we wanted a way to ensure that it could be delivered," says David Taylor
• The answer was clearly BitTorrent, which offers companies the means to make sure their users can get the files they want distributed faster
• The best thing about it is that the more people download it, the faster it gets for everyone, not slower
• Torrent users can just grab the torrent and download
  – while new users can download the file straight from our website as usual, and then find out more about BitTorrent in the tutorial in issue one

Responsibilities
• BitTorrent itself does not support the search process
  – Users must use common methods like search engines and web sites for searching until they find a .torrent file for the resource they want to download
• The .torrent file refers to the tracker
  – which steers the download process
• This method makes it very clear who is responsible for the
  legitimacy of the content:
  – the operator of the tracker server

Using BitTorrent for streaming media files
• The primary weakness that makes BitTorrent unsuitable for video publishing is that when you publish a file, you must remain online at all times
• BitTorrent uses the concept of a seed
  – This is the original place where the content was published, to start file sharing
• It is common to find useless BitTorrent seeds
  – because the original server is no longer online, has changed IP addresses, or has simply revoked the seed
• BitTorrent forces the publisher to act as a server
  – if only to provide seeding
• Again, your computer becomes a single point of failure
  – and your costs and levels of effort increase
• BitTorrent's other issue is that it is not designed to support more than a few thousand users on a single network
• This prevents its use in a global video hosting network

New Way for Distributing Contents
• Currently the content industry tries to motivate users to download content from its stores
  – From the content industry's point of view, an ideal technology would allow the store's web site to detect
    • whether the user has a "reliable" environment that prevents him from using or copying the content in violation of the digital rights he bought
  – The industry currently favors this distribution model
• It works for small content snippets like ring tones
  – The music industry could make some progress
    • It also sells relatively small files
• But it is not realistic that millions of movies will be downloaded from giant movie store websites
  – The movie industry would not like to pay large traffic fees for the upstream from its servers
  – and the lines to the servers would collapse every time a new movie is released
• Technologies like BitTorrent help in distributing and selling the content:
  – Such technologies make it possible to distribute large files like movies right to the customer's home
    • without paying anything for traffic bandwidth or for
      manufacturing DVDs:
  – they can shift all the traffic costs to the users

Advantages for the Media Distributors
• BitTorrent shows that large amounts of data can be efficiently distributed over the network
  – without unacceptable costs for the distributor
• By using the BitTorrent protocol, the distributor can motivate the users to pay for almost all of the distribution costs:
  – they not only pay for the downstream,
  – they also provide the upstream, and
  – the distributor only has to pay for the little overhead traffic produced by his tracker

Streaming application of the BitTorrent and similar methods
• Minute swarming:
  – Here a live stream is broken up into minute-length files that are swarmed via P2P software such as
    • BitTorrent
    • Coral
    • Dijjer

Live Radio or TV Casting
• In its current stage, BitTorrent is designed for the distribution of large static files
  – not for data streams
• The file is cut into pieces
  – which are distributed in random order
    • not sequentially
• There is also work being done to develop peer-to-peer radios
  – but this seems not to be as developed or as broadly used as BitTorrent
• For the really high bandwidths that would be needed for full-featured TV broadcasts, only academic research seems to exist so far
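
The point that pieces arrive in random order rather than sequentially can be illustrated with a small sketch. It is not taken from any BitTorrent client; the helper names `playable_prefix` and `prefix_after` and the 100-piece file are assumptions made for illustration only. The sketch counts how many pieces are available contiguously from the start of the file, which is roughly what a media player would need in order to begin playback.

```python
import random
from typing import Iterable, List


def playable_prefix(have: Iterable[int]) -> int:
    """Count pieces available contiguously from piece 0 (hypothetical helper)."""
    have_set = set(have)
    count = 0
    while count in have_set:
        count += 1
    return count


def prefix_after(arrival_order: List[int], received: int) -> int:
    """Playable prefix once the first `received` pieces of `arrival_order` have arrived."""
    return playable_prefix(arrival_order[:received])


if __name__ == "__main__":
    num_pieces = 100  # assumed file size in pieces, for illustration only
    sequential = list(range(num_pieces))
    shuffled = sequential[:]
    random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

    # Columns: pieces received, playable prefix (sequential), playable prefix (random order)
    for received in (25, 50, 75, 100):
        print(received, prefix_after(sequential, received), prefix_after(shuffled, received))
```

With sequential arrival the playable prefix always equals the number of pieces received, while with a shuffled order it usually stays small until almost the whole file has arrived. Minute swarming sidesteps this by making each minute of the stream its own small file.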