BitTorrent Protocol -- BTP/1.0 AbstractThis document describes the BitTorrent Protocol version 1.0 referred to as "BTP/1.0". The BitTorrent Protocol (BTP) is a protocol for collaborative file distribution across the Internet and has been in place on the Internet since 2002. It is best classified as a peer-to-peer (P2P) protocol, although it also contains highly centralized elements. BTP has already been implemented many times for different platforms, and could well be said to be a mature protocol, although a formal, detailed and complete description has so far been lacking. BTP was devised and implemented by Bram Cohen as a P2P replacement to standard File Transfer Protocol (FTP) to be used in places where the usage of an FTP implementation poses too much strain on the server in terms of request-processing and sheer bandwidth. Normally a client does not use her upload capacity while downloading a file. The BTP approach capitalizes on this fact by having clients upload bits of the data to each other. In comparison to FTP this adds huge scalability and cost-management advantages. Table of Contents1.
Introduction
1. IntroductionThe key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 (Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.)[3]. The File Transfer Protocol (FTP) (Postel, J. and J. Reynolds, “File Transfer Protocol,” October 1985.)[1] with the recent additions of security extensions, still remains the standard for secure and reliable transmission of large files over the Internet. However, its highly centralized client-server approach also means, it is inadequate for mass publication of files, where a single point may expect to be requested by a critically large number of clients simultaneously. To remedy this situation many organizations either implement a cap on the number of simultaneous requests, or spread the load on multiple mirror servers. Needless to say both approaches have their drawbacks, and a solution that addresses these problems is highly needed. The approach in BitTorrent Protocol (BTP) is to spread the load not on mirror servers, but to the clients themselves by having them upload bits of the file to each other while downloading it. Since the clients usually do not utilize their upload capacity while fetching a file, this approach does not put the clients in any disadvantage. This has the added advantage that even small organizations with limited resources can publish large files on the Internet without having to invest in costly infrastructure. 1.1 ExtensionsSince the introduction of BTP many modifications and extensions have been proposed by individuals and community forums. To the extent that these extensions have become part of what the BitTorrent community considers best practice they have been included in this document. However, many extensions have been omitted either because they have been deemed to lack interoperability with existing implementations, or because they are not regarded as being sufficiently mature. 1.2 AudienceThis document is aimed at developers who wish to implement BTP for a particular platform. Also, system administrators and architects may use this document to fully understand the implications of installing an implementation of BTP. In particular, it is advised to study the security implications in more detail, before installing an implementation on a machine that also contains sensitive data. Security implications are discussed in Section 7 (Security Consideration). 1.3 Terminology
1.4 Overall OperationBTP consists of two logically distinct protocols, namely the Tracker HTTP Protocol (THP), and the Peer Wire Protocol (PWP). THP defines a method for contacting a tracker for the purposes of joining a swarm, reporting progress etc. PWP defines a mechanism for communication between peers, and is thus responsible for carrying out the actual download and upload of the torrent. In order for a client to download a torrent the following steps must be carried through:
To publish a torrent the following steps must be taken:
2. BencodingBencoding encodes data in a platform independent way. In BTP/1.0 the metainfo file and all responses from the tracker are encoded in the bencoding format. The format specifies two scalar types (integers and strings) and two compound types (lists and dictionaries). The Augmented BNF syntax (Crocker, D., Ed. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” November 1997.)[5] for bencoding format is: dictionary = "d" 1*(string anytype) "e" ; non-empty dictionary list = "l" 1*anytype "e" ; non-empty list integer = "i" signumber "e" string = number ":" <number long sequence of any CHAR> anytype = dictionary / list / integer / string signumber = "-" number / number number = 1*DIGIT CHAR = %x00-FF ; any 8-bit character DIGIT = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" 2.1 Scalar TypesIntegers are encoded by prefixing a string containing the base ten representation of the integer with the letter "i" and postfixing it with the letter "e". E.g. the integer 123 is encoded as i123e. Strings are encoded by prefixing the string content with the length of the string followed by a colon. E.g. the string "announce" is encoded as "8:announce". 2.2 Compound TypesThe compound types provides a mean to structure elements of any bencoding type. Lists are an arbitrary number of bencoded elements prefixed with the letter "l" and postfixed with the letter "e". It follows that lists can contain nested lists and dictionaries. For instance "li2e3:fooe" defines a list containing the integer "2" and the string "foo". Dictionaries are an arbitrary number of key/value pairs delimited by the letter "d" at the beginning and the letter "e" at the end. All keys are bencoded strings while the associated value can be any bencoded element. E.g. "d5:monthi4e4:name5:aprile" defines a dictionary holding the associations: "month" => "4" and "name" => "april". All dictionary keys MUST be sorted. 2.3 Bencode Examples
3. Pieces and BlocksThis section describes how a torrent is organized in pieces and blocks. The torrent is divided into one or more pieces. Each piece represents a range of data which it is possible to verify using a piece SHA1 hash. When distributing data over PWP pieces are divided into one or more blocks, as shown in the following diagram: --------------------------------------- | Piece #0 | Piece #1 | .. | Piece #N | --------------------------------------- _-´ `-_ _-´ `-_ ---------------------------- | Block #0 | .. | Block #M | ---------------------------- 3.1 PiecesThe number of pieces in the torrent is indicated in the metainfo file. The size of each piece in the torrent remains fixed and can be calculated using the following formula: fixed_piece_size = size_of_torrent / number_of_pieces where "/" is the integer division operator. Only the last piece of the torrent is allowed to have fewer bytes than the fixed piece size. The size of a piece is determined by the publisher of the torrent. A good recommendation is to use a piece size so that the metainfo file does not exceed 70 kilobytes. For the sake of calculating the correct position of a piece within a file, or files, the torrent is regarded as a single continuous byte stream. In case the torrent consists of multiple files, it is to be viewed as the concatenation of these files in the order of their appearance in the metainfo file. Conceptually, the torrent is only translated into files when all its pieces have been downloaded and verified using their respective SHA1 values; although in practice an implementation may choose a better approach in accordance with local operating system and file system specific demands. 3.2 BlocksThe size of a block is an implementation defined value that is not dependant on the fixed piece size. Once a fixed size is defined, the number of blocks per piece can be calculated using the formula: number_of_blocks = (fixed_piece_size / fixed_block_size) + !!(fixed_piece_size % fixed_block_size) where "%" denotes the modulus operator, and "!" the negation operator. The negation operator is used to ensure that the last factor only adds a value of 0 or 1 to the sum. Given the start offset of the block its index within a piece can be calculated using the formula: block_index = block_offset % fixed_block_size
4. The Metainfo FileThe metainfo file provides the client with information on the tracker location as well as the torrent to be downloaded. Besides listing which files will result from downloading the torrent, it also lists how the client should split up and verify individual pieces making up the complete torrent. In order for a client to recognize the metainfo file it SHOULD have the extension .torrent and the associated the media type "application/x-bittorrent". How the client retrieves the metainfo file is beyond the scope of this document, however, the most user-friendly approach is for a client to find the file on a web page, click on it, and start the download immediately. This way, the apparent complexity of BTP as opposed to FTP or HTTP transfer is transparent to the user. 4.1 The Structure of the Metainfo FileThe metainfo file contains a bencoded dictionary with the following structure. A key below is REQUIRED unless otherwise noted.
4.1.1 Single File TorrentsIf the torrent only specifies one file, the info dictionary must have the following keys:
The complete file is derived by combining all the pieces into one string of bytes. 4.1.2 Multi File TorrentsIf the torrent specifies multiple files, the info dictionary must have the following structure:
5. The Tracker HTTP ProtocolThe Tracker HTTP Protocol (THP) is a simple mechanism for introducing peers to each other. A tracker is a HTTP service that must be contacted by a peer in order to join a swarm. As such the tracker constitutes the only centralized element in BTP/1.0. A tracker does not by itself provide access to any downloadable data. A tracker relies on peers sending regular requests. It may assume that a peer is dead if it misses a request. 5.1 RequestTo contact the tracker a peer MUST send a standard HTTP GET request using the URL in the "announce" entry of the metainfo file. The GET request must be parameterized as specified in the HTTP protocol. The following parameters must be present in the request:
5.2 ResponseUpon receiving the HTTP GET request, the tracker MUST respond with a document having the "text/plain" MIME type. This document MUST contain a bencoded dictionary with the following keys:
6. The Peer Wire ProtocolThe aim of the PWP, is to facilitate communication between neighboring peers for the purpose of sharing file content. PWP describes the steps taken by a peer after it has read in a metainfo file and contacted a tracker to gather information about other peers it may communicate with. PWP is layered on top of TCP and handles all its communication using asynchronous messages. 6.1 Peer Wire GuidelinesPWP does not specify a standard algorithm for selecting elements from a clients neighboring peers with whom to share pieces, although the following guidelines are expected to be observed by any such algorithm:
6.2 HandshakingA client is a peer executing the peer wire protocol in communicating with remote peers. The client's handshaking message may indicate either to share a piece (uploading the piece) or to receive a piece (downloading the piece). It's the client's responsibility to open a port on which to listen for incoming connections from remote peers. This port is then reported to the tracker. As BTP/1.0 does not specify any standard port for listening it is the sole responsibility of the implementation to select a port. Any remote peer wishing to communicate with the client must open a TCP connection to this port and perform a handshake operation. The handshake operation MUST be carried out before any other data is sent from the remote peer. The client MUST NOT send any data back to the remote peer before a well constructed handshake has been recognized according to the rules below. If the handshake in any way violates these rules the client MUST close the connection with the remote peer.
A handshake is a string of bytes with the following structure: ---------------------------------------------------------------- | Name Length | Protocol Name | Reserved | Info Hash | Peer ID | ----------------------------------------------------------------
In BTP/1.0 the handshake has a total of 68 bytes. 6.3 Two Categories of MessagesFollowing the PWP handshake both ends of the TCP channel may send messages to each other in a completely asynchronous fashion. PWP messages have the dual purpose of updating the state of neighboring peers with regard to changes in the local peer, as well as transferring data blocks between neighboring peers. PWP Messages fall into two different categories:
6.3.1 Peer StatesA peer must maintain the two state flags for each end of the a connection:
By definition, peer A is interested in peer B when peer B has pieces that peer A does not have. Conversely, peer A is not interested in peer B when peer A has the same or more pieces than peer B has. Therefore, each peer maintains 4 flag in total:
Initially, all connections start out in a state of choked and un-interested.
A choked peer MUST not send any data-oriented messages, but is allowed to send state-oriented message to the other peer that has choked it. Actually it is important to keep the remote peer informed of the interested status so that the remote peer can know if the client will start downloading when the client is unchoked. If a peer chokes a remote peer, it MUST also discard any unanswered requests for blocks previously received from the remote peer. An unchoked peer is allowed to send data-oriented messages to the remote peer. It is left to the implementation how many peers any given peer may choose to choke or unchoke, and in what fashion. This is done deliberately to allow parties to use different heuristics for peer selection. An interested peer indicates to the remote peer that it must expect to receive data-oriented messages as soon as it unchokes the interested peer. It must be noted, that a peer must not assume a remote peer is interested solely because it has pieces that the remote peer is lacking. There may be valid reasons why a peer is not interested in another peer other than data-based ones. 6.3.1.1 State TransitionsFour states can be formed by two status flags.
A base line transition of connection status can be described in a model as below, where the client is downloading from a remote peer (uploader). The transition is described with respect to the client (downloader).
Similar, you may derived the state transition table with respect to the uploader. 6.3.2 Peer Wire MessagesAll integer members in PWP messages are encoded as a 4-byte big-endian number. Furthermore, all index and offset members in PWP messages are zero-based. A PWP message has the following structure: ----------------------------------------- | Message Length | Message ID | Payload | -----------------------------------------
If an incoming message in any way violates this structure then the connection SHOULD be dropped. In particular the receiver SHOULD make sure the message ID constitutes a valid message, and the payload matches the expected payload, as given below. For the purpose of compatibility with future protocol extensions the client SHOULD ignore unknown messages. There may arise situations in which a client may choose to drop a connection after receiving an unknown message, either for security reasons, or because discarding large unknown messages may be viewed as excessive waste. BTP/1.0 specifies the following messages: 6.3.3 Choke[L= 0x0001] [ID = 0] This message has ID 0 and no payload. A peer sends this message to a remote peer to inform the remote peer that it is being choked. 6.3.4 Unchoke[L= 0x0001] [ID = 1] This message has ID 1 and no payload. A peer sends this message to a remote peer to inform the remote peer that it is no longer being choked. 6.3.5 Interested[L= 0x0001] [ID = 2] This message has ID 2 and no payload. A peer sends this message to a remote peer to inform the remote peer of its desire to request data. 6.3.6 Uninterested[L= 0x0001] [ID = 3] This message has ID 3 and no payload. A peer sends this message to a remote peer to inform it that it is not interested in any pieces from the remote peer. 6.3.7 Have[L= 0x0005] [ID = 4] [Piece Index, zero-based, 4-byte long] This message has ID 4 and a payload of length 4. The payload is a number denoting the index of a piece that the peer has successfully downloaded and validated. A peer receiving this message must validate the index and drop the connection if this index is not within the expected bounds. Two scenarios in using the message. When a peer (downloader) receives this message, it MUST send an interested message to the sender (uploader) if indeed it lacks the piece announced. It MAY also send a request for that piece. On the other hand, the downloader may also use the message as an acknowledgement after a piece has been received and checked. 6.3.8 Bitfield[L= 0x0001 + K] [ID = 5] [one-hot coded, K-byte long] This message has ID 5 and a variable payload length. The payload is a bitfield representing the pieces that the sender has successfully downloaded, with the high bit in the first byte corresponding to piece index 0. If a bit is cleared it is to be interpreted as a missing piece. A peer can ONLY send this message immediately after the handshake operation, and MAY choose not to send it if it has no pieces at all. This message MUST not be sent at any other time during the communication. 6.3.9 Request[L= 0x0001 + 0x000C] [ID = 6] [Piece Index] [Blk Offset] [Size] This message has ID 6 and a payload of length 12. The payload is 3 integer values (4-byte each) indicating a block within a piece that the sender is interested in downloading from the recipient. The recipient MUST only send piece messages to a sender that has already requested it, and only in accordance to the rules given above about the choke and interested states. The payload has the following structure: --------------------------------------------- | Piece Index | Block Offset | Block Length | --------------------------------------------- 6.3.10 Piece[L= 0x0009 + K] [ID = 7] [Piece Index] [Blk Offset] [Data, K bytes] This message has ID 7 and a variable length payload. The payload includes Piece Index and zero-based block offset within the piece, and the block data in the piece. The size of data can be derived by subtracting 9 from the message length field. The payload has the following structure: ------------------------------------------- | Piece Index | Block Offset | Block Data | ------------------------------------------- 6.3.11 Cancel[L= 0x0001 + 0x000C] [ID = 8] [Piece Index] [Blk Offset] [Size] This message has ID 8 and a payload of length 12. The payload is 3 integer values indicating a block within a piece that the sender has requested for, but is no longer interested in. The recipient MUST erase the request information upon receiving this messages. It is typically used during "End Game" when a peer (downloader) sends requests for all of its missing blocks to all of its peers. To keep this from becoming horribly inefficient, the peer also sends a cancel to everyone else every time a block arrives. The payload has the following structure: --------------------------------------------- | Piece Index | Block Offset | Block Length | --------------------------------------------- 6.4 The End GameTowards the end of a download session, it may speed up the download to send request messages for the remaining blocks to all the neighboring peers. A client must issue cancel messages to all pending requests sent to neighboring peers as soon as a piece is downloaded successfully. This is referred to as the end game. A client usually sends requests for blocks in stages; sending requests for newer blocks as replies for earlier requests are received. The client enters the end game, when all remaining blocks have been requested. 6.5 Piece Selection StrategyBTP/1.0 does not force a particular order for selecting which pieces to download. However, experience shows that downloading in rarest-first order seems to lessen the wait time for pieces. To find the rarest piece a client must calculate for each piece index the number of times this index is true in the bitfield vectors of all the neighboring peers. The piece with the lowest sum is then selected for requesting. 6.6 Peer Selection Strategy (choking algorithm)Peers are selected by invoking a choking strategy that can maximize the chance of finding the best suitable peers with whom to exchange pieces. Implementations are free to implement any strategy as long as the guidelines in Section 6.1 are observed. Initially, both ends of a connection set the Choked Flag to true and the Interested Flag to false, after the initial handshaking. All uploading peers are periodically rated in terms of their ability to provide the client (downloader) with a better download rate. The rating may take into account factors such as the uploading peers willingness to maintain an unchoked connection with the client over a certain period of time, the uploading peers' data rate to the client, and other implementation defined criteria. The rating with regard to the above scheme is then used as the base for the client to determine which peer to serve first when the client serves other peers (as a uploader to share the available pieces that have been downloaded). Assume only 5 peers are allowed to download at the same time. The choking algorithm will now unchoke 5 of the best rated peers that are interested, and also unchoke as many of the best rated but uninterested peers as necessary. When one of the top rated peers at a later stage becomes interested, then the algorithm will choke the worst unchoked peer. Notice that the worst unchoked peer is always interested. A lacking element from the above algorithm is the capability to ensure that new peers can have a fair chance of downloading a piece, even though they would evaluate poorly in the above schema. A simple method is to make sure that a random peer is selected periodically regardless of how it evaluates. Since this process is repeated in a round robin manner, it ensures that ultimately even new peers will have a chance of being unchoked. Another pitfall it should avoid is to choke and unchoke quickly, known as 'fibrillation'. A client should reciprocate to peers who let it download. A typical choking algorithm avoids fibrillation by only changing choked peers once every ten (10) seconds - it's the interval between each call of the algorithm. But the algorithm may also be invoked when a peer leaves (certain message exchange failed with the peer) or when an unchoked peer becomes interested or not interested. Finally, it should try out unused connections once in a while to find out if they might be better than the currently used ones, known as optimistic unchoking. Here is an outline of the choking algorithm for seeding a piece by a leeching client (at least one or more pieces are available to upload, but also leeching other pieces):
The choking algorithm will be slightly different for a seeding client (who has the full content). Instead the remote peer's uploading rate, their downloading rate is used to sort the peer list. 6.7 Optimistic UnchokingFor optimistic unchoking, at any one time there is one peer being unchoked regardless of it's upload rate (if interested, it counts as one of the four allowed downloaders). The optimistically unchoked peer selection rotates every 30 seconds. The implementation should favor the newly connected peers so that they are more likely to start in the current optimistic unchoke than anywhere else in the rotation. That'd increase the chance for new peers to receive a complete piece and start to upload. 6.8 Anti-snubbingSnubbing is a problem we may run into. It happens when a client (downloader) was choked by all peers which uploading the contents to the client. In such cases the client will usually continue to get poor download rates until the optimistic unchoke finds better peers. To to reduce the severity of impacts of this problem, when over a period of time (60 seconds, e.g.) goes by without getting a single piece from a particular peer, the protocol assumes the client is "snubbed" by that peer. As a result, the client doesn't upload to the peer except as an optimistic unchoke. This frequently results in more than one concurrent optimistic unchoke, (an exception to the exactly one optimistic unchoke rule mentioned above), which causes download rates to recover much more quickly when they falter. But anti-snubbing may result in situations where a peer does not contribute its upload bandwidth, so as to cause inefficiency in the network.
7. Security ConsiderationThis section examines security considerations for BTP/1.0.The discussion does not include definitive solutions to the problems revealed, though it does make some suggestions for reducing security risks. 7.1 Tracker HTTP Protocol IssuesThe use of the HTTP protocol for communication between the tracker and the client makes BTP/1.0 vulnerable to the attacks mentioned in the security consideration section of RFC 2616 (Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.)[6]. 7.2 Denial of Service Attacks on TrackersThe nature of the tracker is to serve many clients. By mounting a denial of service attack against the tracker the swarm attached to the tracker can be starved. This type of attack is hard to defend against, however, the metainfo file allows for multiple trackers to be specified, making it possible to spread the load on a number of trackers, and thus containing such an attack. 7.3 Peer Identity IssuesThere is no strong authentication of clients when they contact the tracker. The main option for trackers is to check peer ID and the IP address of the client. The lack of authentication can be used to mount an attack where a client can shut down another client if the two clients are running on the same host and thus are sharing the same IP address. In addition, a rogue peer may masquerade its identity by using multiple peer IDs. Clients should there refrain from taking the peer ID at face value. 7.4 DNS SpoofingClients using BTP/1.0 rely heavily on the Domain Name Service, which can be used for both specifying the URI of the tracker and how to contact a peer. Clients are thus generally prone to security attacks based on the deliberate mis-association of IP addresses and DNS names. Clients need to be cautious in assuming the continuing validity of an IP address/DNS name association. In particular, BTP/1.0 clients SHOULD rely on their name resolver for confirmation of an IP number/DNS name association, rather than caching the result of previous host name lookups. If clients cache the results of host name lookups in order to achieve a performance improvement, they MUST observe the TTL information reported by DNS. If clients do not observe this rule, they could be spoofed when a previously-accessed peers or trackers IP address changes. As network renumbering is expected to become increasingly common according to RFC 1900 (Carpenter, B. and Y. Rekhter, “Renumbering Needs Work,” February 1996.)[2], the possibility of this form of attack will grow. Observing this requirement reduces this potential security vulnerability. 7.5 Issues with File and Directory NamesThe metainfo file provides a way to suggest a name of the downloaded file for single-file torrents and the top-most directory for multi-file torrents. This functionality is very like the Content-Disposition header field documented in RFC 2183 (Troost, R., Dorner, S., and K. Moore, “Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field,” August 1997.)[4] and the security considerations mentioned in this RFC also apply to BitTorrent clients. In short, BTP clients SHOULD verify that the suggested file names in the metainfo file do not compromise services on the local system. Furthermore, care must be taken for multi-file torrents to validate that individual files are relative to the top-most directory and that the paths do not contain path elements to the parent (that is directory elevators such as ``..''), which can be used to place files outside the top-most directory. Using UNIX as an example, some hazards would be:
It is very important to note that this is not an exhaustive list; it is intended as a small set of examples only. Implementers must be alert to the potential hazards on their target systems. In general, the BTP client SHOULD NOT name or place files such that they will get interpreted or executed without the user explicitly initiating the action. 7.6 Validating the Integrity of Data Exchanged Between PeersBy default, all content served to the client from other peers should be considered tainted and the client SHOULD validate the integrity of the data before accepting it. The metainfo file contains information for checking both individual pieces using SHA1, and optionally individual files using MD5. SHA1, being the strongest of the two, is preferred. Furthermore, sole reliance on whole-file checking can potentially render otherwise valid pieces invalid, and should only be considered for small files, to limit the amount of data being discarded. Trusting the validity of the resulting file or files ends up being a matter of trusting the content of the metainfo file. Ensuring the validity of the metainfo file is beyond the scope of this document. 7.7 Transfer of Sensitive InformationSome clients include information about themselves when generating the peer ID string. Clients should be aware that this information can potentially be used to determine whether a specific client has a exploitable security hole.
8. IANA ConsiderationsThis document makes no request of IANA.
9. References
Original Authors
11 IndexAnti-snubbingBencoding Bitfield Blocks Cancel Choke Compound Types Denial of Service Attacks (tracker) DNS Spoofing End Game Extensions Handshaking Have IANA Considerations Integrity of Data Exchanged Between Peers Interested Two Categories of Messages Metainfo File Multi File Torrents Optimistic Unchoking Peer Identity Issues Peer States Peer Wire Messages Peer Wire Protocol Pieces Piece Piece Selection Strategy Peer Selection Strategy (choking algorithm) Peer Wire Guidelines Request (trucker) Request (peer) Response Scalar Types Security Consideration Tracker HTTP Protocol Unchoke Uninterested Tracker HTTP Protocol Issues Transfer of Sensitive Information
|