TRANSPORT LAYER
Dr. Nawaporn Wisitpongphan
Credit: Prof. Nick McKeown http://www.stanford.edu/~nickm

OUTLINE
The Transport Layer
The UDP Protocol
The TCP Protocol
TCP Characteristics
TCP Connection setup
TCP Segments
TCP Sequence Numbers
TCP Sliding Window
Timeouts and Retransmission
Congestion Control and Avoidance

REVIEW OF THE TRANSPORT LAYER
[Figure: two hosts, Athena.MIT.edu and Leland.Stanford.edu, with users Nick and Dave at the application layer. Data (D) passes down through the transport layer (in the O.S.), the network layer, and the link layer, and each layer adds its own header (H).]

LAYERING: THE OSI MODEL
[Figure: the seven OSI layers (7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Link, 1 Physical) on two end hosts connected through two routers. The routers implement only the Network, Link, and Physical layers. Arrows mark layer-to-layer communication and peer-layer communication.]

USER DATAGRAM PROTOCOL (UDP) CHARACTERISTICS
UDP is a connectionless datagram service.
  There is no connection establishment: packets may show up at any time.
UDP is unreliable:
  No acknowledgements to indicate delivery of data.
  The checksum is optional in IPv4; when it is used, it covers the UDP header and the data (plus a pseudo-header of IP addresses).
  Contains no mechanism to detect missing or mis-sequenced packets.
  No mechanism for automatic retransmission.
  No mechanism for flow control, and so can over-run the receiver.

USER-DATAGRAM PROTOCOL (UDP)
[Figure: applications A1 and A2 on one host and B1 and B2 on the other, each sitting on top of UDP and IP in the OS.]
UDP uses port number to demultiplex packets.

Port      Description
123       Network Time Protocol (NTP)
67, 68    Dynamic Host Configuration Protocol (DHCP)
500       Internet Security Association and Key Management Protocol (ISAKMP)
520       Routing Information Protocol (RIP)

USER-DATAGRAM PROTOCOL (UDP) PACKET FORMAT
Header fields: SRC port, DST port, length, checksum, followed by DATA.
The checksum may be left unused in IPv4.

Why do we have UDP?
It is used by applications that don’t need reliable delivery, or by applications that have their own special needs, such as streaming of real-time audio/video.

TCP CHARACTERISTICS
TCP is connection-oriented.
  3-way handshake used for connection setup.
TCP provides a stream-of-bytes service.
TCP is reliable:
  Acknowledgements indicate delivery of data.
  Checksums are used to detect corrupted data.
  Sequence numbers detect missing or mis-sequenced data.
  Corrupted data is retransmitted after a timeout.
  Mis-sequenced data is re-sequenced.
  (Window-based) Flow control prevents over-run of receiver.
TCP uses congestion control to share network capacity among users.

HTTP AND TCP
Port      Description
80        HTTP
23        Telnet
20/21     FTP (data/control)
25        Simple Mail Transfer Protocol (SMTP)

TCP IS CONNECTION-ORIENTED
[Figure: message exchange between the (Active) Client and the (Passive) Server.]
Connection Setup, 3-way handshake: Syn, Syn + Ack, Ack.
Connection Close/Teardown, 2 x 2-way handshake: Fin, (Data +) Ack, Fin, Ack.

THE TCP DIAGRAM
Which path does the Active Client or Passive Server follow?
[Figure: TCP diagram with the (Active) Client and (Passive) Server paths; panels TCP CLIENT and TCP SERVER.]

TCP SUPPORTS A “STREAM OF BYTES” SERVICE
TCP accepts data as a constant stream from the applications.
There are no record markers automatically inserted by TCP.
Example: If the application on one end writes 10 bytes, followed by a write of 20 bytes, followed by a write of 50 bytes, the application at the other end of the connection cannot tell what size the individual writes were. The other end may read the 80 bytes in four reads of 20 bytes at a time.
One end puts a stream of bytes into TCP and the same, identical stream of bytes appears at the other end.
[Figure: byte stream flowing from Host A to Host B.]

…WHICH IS EMULATED USING TCP “SEGMENTS”
[Figure: Host A sends TCP data in segments to Host B.]
Segment sent when:
1. Segment full (MSS bytes),
2. Not full, but times out, or
3. “Pushed” by application.
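The stream-of-bytes example above can be sketched with a small in-memory model. This is not real TCP: `ByteStream` and `split_into_segments` are hypothetical names introduced for illustration, and the MSS of 32 bytes is an arbitrary assumption chosen so the numbers stay small.

```python
class ByteStream:
    """In-memory stand-in for one direction of a TCP connection."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        # Writes are appended to the stream; their boundaries are not recorded.
        self._buffer.extend(data)

    def read(self, max_bytes):
        # Return up to max_bytes from the front of the stream.
        chunk = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return chunk


def split_into_segments(data, mss):
    """Cut a byte string into segments of at most mss bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


stream = ByteStream()
for payload in (b"a" * 10, b"b" * 20, b"c" * 50):  # writes of 10, 20, 50 bytes
    stream.write(payload)

reads = [stream.read(20) for _ in range(4)]        # four reads of 20 bytes
print([len(chunk) for chunk in reads])             # [20, 20, 20, 20]
print(reads[0])                                    # b'aaaaaaaaaabbbbbbbbbb'

segments = split_into_segments(b"a" * 10 + b"b" * 20 + b"c" * 50, mss=32)
print([len(segment) for segment in segments])      # [32, 32, 16]
```

The first read already mixes bytes from two different writes, which is why the reader cannot recover the write sizes. The final 16-byte segment is the "not full" case: it would go out on a timeout or a push rather than because it reached MSS bytes.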
THE TCP SEGMENT FORMAT
[Figure: an IP datagram (IP Hdr + IP Data) carries a TCP segment (TCP Hdr + TCP Data). TCP header layout, bits 0 to 31:]
  Src port (bits 0 to 15) | Dst port (bits 16 to 31)
  Sequence #
  Ack Sequence #
  HLEN (4 bits) | RSVD (6 bits) | Flags: URG ACK PSH RST SYN FIN | Window Size
  Checksum | Urgent Pointer
  (TCP Options)
  TCP Data
The checksum covers the TCP header and data, plus the IP addresses.
Src/dst port numbers and IP addresses uniquely identify a socket.

SEQUENCE NUMBERS
[Figure: Host A sends segments (TCP HDR + TCP Data) to Host B.]
Sequence number = number of the 1st byte of data in the segment, counted from the ISN (initial sequence number).
Ack sequence number = next expected byte.
How does ISN get chosen?

INITIAL SEQUENCE NUMBERS
Connection Setup, 3-way handshake between the (Active) Client and the (Passive) Server: Syn + ISN_A, Syn + Ack + ISN_B, Ack.
Sequence number = 32 bits.
What if a message has more than $2^{32}$ bytes? Sequence number wrap-around.
Solution: Timestamp option.
  Sender places timestamp in every segment.
  Receiver copies timestamp in the ACK it sends for a segment.

TCP SLIDING WINDOW
How much data can a TCP sender have outstanding in the network?
How much data should TCP retransmit when an error occurs? Just selectively repeat the missing data?
How does the TCP sender avoid over-running the receiver’s buffers?

TCP SLIDING WINDOW
[Figure: the byte stream divided into data ACK’d, outstanding un-ack’d data, data OK to send, and data not OK to send yet. The window size spans the outstanding un-ack’d data and the data OK to send.]
Window is meaningful to the sender.
Current window size is “advertised” by receiver (usually 4k to 8k Bytes when connection set-up).

TCP SLIDING WINDOW
[Figure: Host A sends a window of data to Host B and waits for ACKs, for two cases: (1) RTT > Window size, (2) RTT = Window size.]

TCP: RETRANSMISSION AND TIMEOUTS
[Figure: Host A sends Data1 and Data2 to Host B and receives ACKs. The Retransmission TimeOut (RTO) is the estimated round-trip time (RTT) plus a guard band.]
TCP uses an adaptive retransmission timeout value, because RTT changes frequently:
  Congestion
  Changes in routing

TCP: RETRANSMISSION AND TIMEOUTS
Picking the RTO is important:
  Pick a value that’s too big and it will wait too long to retransmit a packet.
  Pick a value that’s too small and it will unnecessarily retransmit packets.
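One way to make this trade-off concrete is a small adaptive estimator that keeps a running mean of the RTT samples and a running mean of their deviation, and sets the RTO to the mean plus a multiple of the deviation. The guard band then widens only when the RTT becomes more variable. The sketch below is illustrative, not a TCP implementation: the class name `RtoEstimator`, the gain of 1/8, the initial deviation of 0, and the use of milliseconds are all assumptions.

```python
class RtoEstimator:
    """Adaptive retransmission timeout from RTT samples (times in ms)."""

    def __init__(self, first_sample, gain=0.125, deviation_multiplier=4.0):
        self.estimated_rtt = float(first_sample)
        self.deviation = 0.0          # assumption: no deviation recorded yet
        self.gain = gain              # assumption: weight given to each new sample
        self.deviation_multiplier = deviation_multiplier

    def update(self, sample_rtt):
        # Move the mean and the mean deviation a fraction of the way
        # towards what the new sample suggests.
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.gain * difference
        self.deviation += self.gain * (abs(difference) - self.deviation)
        return self.rto()

    def rto(self):
        # Estimated RTT plus a guard band proportional to the deviation.
        return self.estimated_rtt + self.deviation_multiplier * self.deviation


steady = RtoEstimator(first_sample=100.0)
for sample in (100.0, 100.0, 100.0):
    steady.update(sample)
print(steady.rto())  # 100.0

jittery = RtoEstimator(first_sample=100.0)
jittery.update(200.0)
print(jittery.estimated_rtt, jittery.deviation, jittery.rto())  # 112.5 12.5 162.5
```

With perfectly steady samples the guard band shrinks to nothing, which is too aggressive for a real sender; practical implementations also enforce a minimum RTO, which this sketch leaves out.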
The original algorithm for picking RTO:
1. $\mathrm{EstimatedRTT}_k = \alpha \cdot \mathrm{EstimatedRTT}_{k-1} + (1 - \alpha) \cdot \mathrm{SampleRTT}$
2. $\mathrm{RTO} = 2 \cdot \mathrm{EstimatedRTT}$
Determined empirically.
Characteristics of the original algorithm:
  Variance is assumed to be fixed.
  But in practice, variance increases as congestion increases.

TCP: RETRANSMISSION AND TIMEOUTS
Router queues grow when there is more traffic, until they become unstable.
As load grows, variance of delay grows rapidly.
[Figure: average queueing delay versus load (amount of traffic arriving to router); variance grows rapidly with load.]
There will be some (unknown) distribution of RTTs.
We are trying to estimate an RTO to minimize the probability of a false timeout.
[Figure: probability versus RTT, with the mean and the variance of the distribution marked.]

TCP: RETRANSMISSION AND TIMEOUTS
Newer algorithm includes estimate of variance in RTT:
$\mathrm{Difference} = \mathrm{SampleRTT} - \mathrm{EstimatedRTT}_{k-1}$
$\mathrm{EstimatedRTT}_k = \mathrm{EstimatedRTT}_{k-1} + \delta \cdot \mathrm{Difference}$ (same as before, with $\delta = 1 - \alpha$)
$\mathrm{Deviation} = \mathrm{Deviation} + \delta \cdot (|\mathrm{Difference}| - \mathrm{Deviation})$
$\mathrm{RTO} = \mu \cdot \mathrm{EstimatedRTT} + \phi \cdot \mathrm{Deviation}$, with $\mu \approx 1$ and $\phi \approx 4$

TCP: RETRANSMISSION AND TIMEOUTS
KARN’S ALGORITHM
[Figure: two exchanges between Host A and Host B in which a packet is retransmitted; in each, matching the ACK to the wrong transmission gives a wrong RTT sample.]
Problem: How can we estimate RTT when packets are retransmitted?
Solution: On retransmission, don’t update estimated RTT (and double RTO).

CONGESTION CONTROL: MAIN POINTS
Congestion is inevitable.
Congestion happens at different scales, from two individual packets colliding to too many users.
TCP senders can detect congestion and reduce their sending rate by reducing the window size.
TCP modifies the rate according to “Additive Increase, Multiplicative Decrease (AIMD)”.
To probe and find the initial rate, TCP uses a restart mechanism called “slow start”.
Routers slow down TCP senders by buffering packets and thus increasing delay.

CONGESTION
[Figure: hosts H1, H2, and H3 connected through router R1, with arrival processes A1(t) (10Mb/s link) and A2(t) (100Mb/s link) and departure process D(t) (1.5Mb/s link). A plot of cumulative bytes against time t shows A1(t), A2(t), D(t), and X(t).]

TIME SCALES OF CONGESTION
Too many users using a link during a peak hour (time axis: 7:00, 8:00, 9:00).
TCP flows filling up all available bandwidth (time axis: 1s, 2s, 3s).
Two packets colliding at a router (time axis: 100µs, 200µs, 300µs).

DEALING WITH CONGESTION
EXAMPLE: TWO FLOWS ARRIVING AT A ROUTER
[Figure: flows A1(t) and A2(t) arriving at router R1.]
Strategy:
  Drop one of the flows.
  Buffer one flow until the other has departed, then send it.
  Re-schedule one of the two flows for a later time.
  Ask both flows to reduce their rates.

CONGESTION IS UNAVOIDABLE
ARGUABLY IT’S GOOD!
We use packet switching because it makes efficient use of the links. Therefore, buffers in the routers are frequently occupied.
If buffers are always empty, delay is low, but our usage of the network is low.
If buffers are always occupied, delay is high, but we are using the network more efficiently.
So how much congestion is too much?

LOAD, DELAY AND POWER
Typical behavior of queueing systems with random arrivals:
[Figure: average packet delay versus load. Burstiness tends to move the asymptote to the left.]
A simple metric of how well the network is performing:
$\mathrm{Power} = \mathrm{Load} / \mathrm{Delay}$
[Figure: power versus load, peaking at the “optimal load”.]

OPTIONS FOR CONGESTION CONTROL
1. Implemented by host versus network.
2. Reservation-based versus feedback-based.
3. Window-based versus rate-based.

TCP CONGESTION CONTROL
TCP implements host-based, feedback-based, window-based congestion control.
TCP sources attempt to determine how much capacity is available.
TCP sends packets, then reacts to observable events (loss).

TCP CONGESTION CONTROL
TCP sources change the sending rate by modifying the window size:
Window = min{Advertised window, Congestion Window}
  Advertised window: set by the receiver.
  Congestion window (“cwnd”): set by the transmitter.
In other words, send at the rate of the slowest component: network or receiver.
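A minimal sketch of the window rule above: the sender takes the smaller of the two limits and subtracts whatever is already outstanding. The function name `usable_window` and the byte counts are assumptions for illustration only.

```python
def usable_window(advertised_window, congestion_window, bytes_in_flight):
    """Bytes the sender may still put on the wire right now."""
    # The effective window is set by the slower of receiver and network.
    window = min(advertised_window, congestion_window)
    # Data already sent but not yet acknowledged uses up part of the window.
    return max(window - bytes_in_flight, 0)


# Network is the bottleneck: cwnd (4096) is below the advertised window (8192).
print(usable_window(8192, 4096, 1000))  # 3096

# Receiver is the bottleneck and the window is already full.
print(usable_window(2048, 4096, 3000))  # 0
```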
“cwnd” follows additive increase/multiplicative decrease:
  On receipt of Acks for a full window (one RTT): cwnd += 1
  On packet loss (timeout): cwnd *= 0.5

ADDITIVE INCREASE/MULTIPLICATIVE DECREASE
[Figure: source (Src) and destination (Dest) exchanging data (D) and acks (A); one, then two, then three packets are sent in successive round trips.]
Additive Increase: Every time the source successfully sends a cwnd’s worth of packets (each pkt sent out during the last RTT has been ACKed), add the equivalent of 1 pkt to the cwnd. For each ACK:
  Increment = MSS × (MSS/CWND), with CWND ≥ MSS
  CWND += Increment

LEADS TO THE TCP “SAWTOOTH”
[Figure: window versus t. The window is halved at each timeout, giving a sawtooth. Could take a long time to get started!]
Multiplicative Decrease: For each timeout, the source sets CWND to half of its previous value.
If CWND is large, all the dropped packets will be retransmitted and congestion gets worse, so the source needs to get out of this state quickly.

“SLOW START”
Designed to find the fair-share rate quickly at startup.
How does it work?
1. Increase cwnd exponentially (by one packet for each ACK received, so it doubles every RTT) until it reaches SSthreshold.
2. If cwnd < SSthreshold {Do Slow Start}, else {Do Congestion Avoidance}.
3. Initial SSThreshold = large value.
4. After a packet is lost, SSThreshold = cwnd/2.
Congestion Avoidance: increase cwnd linearly.
[Figure: Src and Dest exchanging data (D) and acks (A); 1, 2, 4, then 8 packets are sent in successive round trips.]

SLOW START
Why is it called slow-start?
Because TCP originally had no congestion control mechanism. The source would just start by sending a whole advertised window’s worth of data.

FAST RETRANSMIT AND FAST RECOVERY?
Homework!!

TCP SENDING RATE
What is the sending rate of TCP?
Acknowledgement for a sent packet is received after one RTT.
Amount of data sent until the ACK is received is the current window size W.
Therefore the sending rate is R = W/RTT.
Is the TCP sending rate sawtooth-shaped as well?

TCP AND BUFFERS
For TCP with a single flow over a network link with enough buffers, RTT and W are proportional to each other.
Therefore the sending rate R = W/RTT is constant (and not a sawtooth).
But experiments and theory suggest that with many flows:
$R \propto \dfrac{1}{RTT \sqrt{p}}$
where p is the drop probability.
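To see how the window rules above (slow start, then additive increase with multiplicative decrease) produce the sawtooth, and what R = W/RTT gives for a fixed RTT, here is a round-by-round sketch. It is a simplification under stated assumptions: cwnd is counted in packets, one step is one RTT, a loss is modelled as happening in a chosen round, and on loss the window is simply halved as in the slides. The names `simulate_cwnd` and `sending_rate` are hypothetical.

```python
def simulate_cwnd(rounds, loss_rounds, initial_ssthresh):
    """Return cwnd (in packets) at the start of each RTT round."""
    cwnd = 1.0
    ssthresh = float(initial_ssthresh)
    trace = []
    for round_number in range(rounds):
        trace.append(cwnd)
        if round_number in loss_rounds:
            # Multiplicative decrease: remember half the window and drop to it.
            ssthresh = max(cwnd / 2, 1.0)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: one extra packet per ACK doubles cwnd every RTT.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one extra packet per RTT.
            cwnd += 1.0
    return trace


def sending_rate(window_packets, rtt_seconds):
    """R = W / RTT, in packets per second."""
    return window_packets / rtt_seconds


trace = simulate_cwnd(rounds=10, loss_rounds={6}, initial_ssthresh=8)
print(trace)  # [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 11.0, 5.5, 6.5, 7.5]
print(sending_rate(trace[6], rtt_seconds=0.5))  # 22.0
```

The trace shows the three phases: doubling up to the threshold, linear growth, and a halving at the loss in round 6. With a constant RTT the rate follows the same sawtooth as the window; the single-flow argument above relies on the RTT growing together with W instead.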
TCP rate can be controlled in two ways:
1. Buffering packets and increasing the RTT.
2. Dropping packets to decrease TCP’s window size.

CONGESTION CONTROL IN THE INTERNET
Maximum window sizes of most TCP implementations by default are very small:
  Windows XP: 12 packets
  Linux/Mac: 40 packets
Often the buffer of a link is larger than the maximum window size of TCP:
  A typical DSL line has 200 packets worth of buffer.
  For a TCP session, the maximum number of packets outstanding is 40.
  The buffer can never fill up.
  The router will never drop a packet.

CONGESTION AVOIDANCE
TCP reacts to congestion after it takes place. The data rate changes rapidly and the system is barely stable (or is even unstable).
Can we predict when congestion is about to happen and avoid it? E.g. by detecting the knee of the curve.
[Figure: average packet delay versus load.]

CONGESTION AVOIDANCE SCHEMES
Router-based Congestion Avoidance:
  DECbit: Routers explicitly notify sources about congestion.
  Random Early Detection (RED): Routers implicitly notify sources by dropping packets. RED drops packets at random, and as a function of the level of congestion.
Host-based Congestion Avoidance:
  Source monitors changes in RTT to detect onset of congestion.

DECBIT
Each packet has a “Congestion Notification” bit called the DECbit in its header.
If any router on the path is congested, it sets the DECbit.
  Set if average queue length >= 1 packet, averaged since the start of the previous busy cycle.
To notify the source, the destination copies DECbit into ACK packets.
Source adjusts rate to avoid congestion.
  Counts fraction of DECbits set in each window.
  If <50% set, increase rate additively.
  If >=50% set, decrease rate multiplicatively.
[Figure: queue length at router versus time, with the averaging period marked.]

RANDOM EARLY DETECTION (RED)
RED is based on DECbit, and was designed to work well with TCP.
RED implicitly notifies sender by dropping packets.
Drop probability is increased as the average queue length increases.
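(Geometric) moving average of the queue length is used so as to detect long term congestion, yet allow short term bursts to arrive.
$\mathrm{AvgLen}_{n+1} = (1 - \alpha) \cdot \mathrm{AvgLen}_n + \alpha \cdot \mathrm{Length}_n$
i.e.
$\mathrm{AvgLen}_{n+1} = \sum_{i=1}^{n} \mathrm{Length}_i \cdot \alpha (1 - \alpha)^{n-i}$

RED DROP PROBABILITIES
[Figure: a queue with arrivals A(t) and departures D(t), and a plot of the drop probability against AvgLen, with minTh and maxTh marked on the AvgLen axis and maxP and 1 on the probability axis.]
If $\mathrm{minTh} < \mathrm{AvgLen} < \mathrm{maxTh}$:
$\hat{p}_{\mathrm{AvgLen}} = \mathrm{maxP} \cdot \dfrac{\mathrm{AvgLen} - \mathrm{minTh}}{\mathrm{maxTh} - \mathrm{minTh}}$
$\Pr(\text{Drop Packet}) = \dfrac{\hat{p}_{\mathrm{AvgLen}}}{1 - \mathrm{count} \cdot \hat{p}_{\mathrm{AvgLen}}}$
count counts how long we've been in $\mathrm{minTh} < \mathrm{AvgLen} < \mathrm{maxTh}$ since we last dropped a packet, i.e. drops are spaced out in time, reducing likelihood of re-entering slow-start.

PROPERTIES OF RED
Drops packets before queue is full, in the hope of reducing the rates of some flows.
Drops packets for each flow roughly in proportion to its rate.
Drops are spaced out in time.
Because it uses average queue length, RED is tolerant of bursts.
Random drops hopefully desynchronize TCP sources.

SYNCHRONIZATION OF SOURCES
[Figure: sources A, B, C, D sharing a path with round-trip time RTT. Panels: Source A (time scale marked N RTT); Aggregate Flow, labelled f(RTT), with its average (Avg).]

DESYNCHRONIZED SOURCES
[Figure: sources A, B, C, D sharing a path with round-trip time RTT. Panels: Source A (time scale marked N RTT); Aggregate Flow (time scale marked N RTT) with its average (Avg).]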
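The RED averaging and drop-probability rules above can be sketched as two small functions. This is an illustrative reading of the slides, not a router implementation: the function names, the parameter values, and the choice to drop with probability 1 once AvgLen reaches maxTh (and never below minTh) are assumptions.

```python
def update_average(avg_len, queue_len, weight):
    """Geometric moving average: (1 - weight) * old average + weight * sample."""
    return (1 - weight) * avg_len + weight * queue_len


def drop_probability(avg_len, count, min_th, max_th, max_p):
    """Probability of dropping the arriving packet.

    count is the number of packets accepted since the last drop while
    avg_len has been between min_th and max_th.
    """
    if avg_len < min_th:
        return 0.0                      # assumption: never drop below minTh
    if avg_len >= max_th:
        return 1.0                      # assumption: always drop at or above maxTh
    p_hat = max_p * (avg_len - min_th) / (max_th - min_th)
    denominator = 1.0 - count * p_hat
    if denominator <= 0:
        return 1.0                      # a drop is overdue
    return min(1.0, p_hat / denominator)


avg = 0.0
for queue_len in (16, 16):
    avg = update_average(avg, queue_len, weight=0.25)
print(avg)  # 7.0

print(drop_probability(16.0, count=0, min_th=8.0, max_th=24.0, max_p=0.125))  # 0.0625
print(drop_probability(16.0, count=8, min_th=8.0, max_th=24.0, max_p=0.125))  # 0.125
```

The second call shows the effect of count: the longer the queue has gone without a drop, the more likely the next packet is to be dropped, which is what spaces the drops out in time.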