TRANSPORT LAYER
Dr. Nawaporn Wisitpongphan
Credit: Prof. Nick McKeown
http://www.stanford.edu/~nickm
OUTLINE
- The Transport Layer
- The UDP Protocol
- The TCP Protocol

TCP Characteristics
- TCP Connection setup
- TCP Segments
- TCP Sequence Numbers
- TCP Sliding Window
- Timeouts and Retransmission
- Congestion Control and Avoidance

REVIEW OF THE TRANSPORT LAYER
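[Figure: Nick's application on Athena.MIT.edu talks to Dave's application on Leland.Stanford.edu. On each host, data passes down from the application layer through the transport layer (in the O.S.), the network layer and the link layer; each layer adds its own header (H) in front of the data (D) it receives from the layer above.]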
LAYERING: THE OSI MODEL
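[Figure: two end hosts, each running the full seven-layer stack (7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Link, 1 Physical), connected through two routers that implement only the Network, Link and Physical layers. Within a host, each layer communicates with the layers directly above and below it (layer-to-layer communication); each layer also communicates logically with the same layer at the other end (peer-layer communication).]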
USER DATAGRAM PROTOCOL (UDP) CHARACTERISTICS
- UDP is a connectionless datagram service.
  - There is no connection establishment: packets may show up at any time.
- UDP is unreliable:
  - No acknowledgements to indicate delivery of data.
  - The checksum is optional in IPv4; when it is used, it covers the UDP header, the data, and a pseudo-header containing the IP addresses.
  - Contains no mechanism to detect missing or mis-sequenced packets.
  - No mechanism for automatic retransmission.
  - No mechanism for flow control, and so can over-run the receiver.
USER DATAGRAM PROTOCOL (UDP)
[Figure: applications A1 and A2 on one host and B1 and B2 on another run above UDP and IP in the OS. UDP uses the port number to demultiplex packets to the right application.]

Port     Description
123      Network Time Protocol (NTP)
67, 68   Dynamic Host Configuration Protocol (DHCP)
500      Internet Security Association and Key Management Protocol (ISAKMP)
520      Routing Information Protocol (RIP)
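Demultiplexing by port can be sketched as a lookup from destination port to a per-application receive queue. This is a toy in-memory model, not a socket API: the class and method names (`UdpDemux`, `bind`, `deliver`) are hypothetical, and a datagram is assumed to be a `(src_port, dst_port, payload)` tuple.

```python
from collections import deque


class UdpDemux:
    """Toy model of UDP demultiplexing: deliver each datagram by destination port only."""

    def __init__(self) -> None:
        self.queues: dict[int, deque] = {}

    def bind(self, port: int) -> None:
        # Like binding a socket: create a receive queue for this port.
        self.queues[port] = deque()

    def deliver(self, dgram: tuple) -> bool:
        # dgram is assumed to be (src_port, dst_port, payload)
        _, dst_port, payload = dgram
        queue = self.queues.get(dst_port)
        if queue is None:
            return False  # no listener; a real stack typically answers with ICMP port unreachable
        queue.append(payload)
        return True
```

Note that the source port plays no part in choosing the queue; an application that binds port 123 receives NTP datagrams from any sender.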
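USER DATAGRAM PROTOCOL (UDP) PACKET FORMAT
SRC port (16 bits) | DST port (16 bits)
length (16 bits)   | checksum (16 bits)
DATA
The checksum is optional in IPv4 (a value of 0 means "not computed"); when it is present, it covers the header and the data, not just the header.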
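The checksum rule can be made concrete with a short sketch that builds an IPv4 UDP datagram and computes the standard 16-bit one's-complement checksum over a pseudo-header, the header and the data. The helper names (`ones_complement_sum16`, `pseudo_header`, `build_udp`) are hypothetical; the pseudo-header layout (source address, destination address, zero byte, protocol 17, UDP length) is the IPv4 one.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum (data padded with a zero byte if its length is odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    # IPv4 pseudo-header: src addr, dst addr, zero byte, protocol 17 (UDP), UDP length
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def build_udp(src_ip: str, dst_ip: str, src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)  # 8-byte header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field zeroed
    csum = ~ones_complement_sum16(pseudo_header(src_ip, dst_ip, length) + header + payload) & 0xFFFF
    if csum == 0:
        csum = 0xFFFF  # 0 is reserved to mean "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload
```

A receiver can verify a datagram by summing the pseudo-header and the whole datagram (checksum included); an undamaged datagram sums to 0xFFFF.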

Why do we have UDP?
- It is used by applications that don't need reliable delivery, or
- by applications that have their own special needs, such as streaming of real-time audio/video.
TCP CHARACTERISTICS
- TCP is connection-oriented.
  - A 3-way handshake is used for connection setup.
- TCP provides a stream-of-bytes service.
- TCP is reliable:
  - Acknowledgements indicate delivery of data.
  - Checksums are used to detect corrupted data.
  - Sequence numbers detect missing or mis-sequenced data.
  - Corrupted or lost data is retransmitted after a timeout.
  - Mis-sequenced data is re-sequenced.
  - (Window-based) flow control prevents over-run of the receiver.
- TCP uses congestion control to share network capacity among users.
HTTP AND TCP
Port     Description
80       HTTP
23       Telnet
20/21    FTP (data/control)
25       Simple Mail Transfer Protocol (SMTP)
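TCP IS CONNECTION-ORIENTED
Connection setup (3-way handshake) between the (active) client and the (passive) server:
1. Client → Server: SYN
2. Server → Client: SYN + ACK
3. Client → Server: (Data +) ACK

Connection close/teardown (2 x 2-way handshake):
1. Client → Server: FIN
2. Server → Client: ACK
3. Server → Client: FIN
4. Client → Server: ACK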
THE TCP STATE DIAGRAM
[Figure: the TCP state diagram, with the paths taken by the TCP client and the TCP server highlighted.]
Which path does the active client or the passive server follow?
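One way to see the two paths is as a transition table. This is a minimal sketch: the state names follow the standard TCP state diagram, but the event labels are hypothetical, and it covers only the common client and server paths (no simultaneous open or close, no RST, no CLOSING state).

```python
# (state, event) -> next state; event labels are hypothetical "trigger/action" strings.
TRANSITIONS = {
    ("CLOSED", "active_open/send_SYN"): "SYN_SENT",
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_SYN/send_SYN+ACK"): "SYN_RCVD",
    ("SYN_SENT", "recv_SYN+ACK/send_ACK"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close/send_FIN"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv_FIN/send_ACK"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_FIN/send_ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timeout_2MSL"): "CLOSED",
    ("CLOSE_WAIT", "close/send_FIN"): "LAST_ACK",
    ("LAST_ACK", "recv_ACK"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Return the sequence of states visited; a KeyError means the event is not allowed."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path
```

Feeding in the active client's events yields CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED; the passive server goes through LISTEN, SYN_RCVD, ESTABLISHED, CLOSE_WAIT and LAST_ACK instead.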
TCP SUPPORTS A “STREAM OF BYTES” SERVICE
[Figure: a stream of bytes flowing from Host A to Host B.]
- TCP accepts data as a constant stream from the applications. There are no record markers automatically inserted by TCP.
  - Example: if the application on one end writes 10 bytes, followed by a write of 20 bytes, followed by a write of 50 bytes, the application at the other end of the connection cannot tell what size the individual writes were. The other end may read the 80 bytes in four reads of 20 bytes at a time.
- One end puts a stream of bytes into TCP, and the same, identical stream of bytes appears at the other end.
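The 10 + 20 + 50 byte example can be reproduced with a connected pair of stream sockets. This sketch assumes `socket.socketpair()`, which on Unix returns Unix-domain stream sockets rather than TCP, but with the same byte-stream semantics; the helper name `stream_demo` is hypothetical.

```python
import socket


def stream_demo(writes, read_size=20):
    """Write several chunks on one end of a stream; read fixed-size chunks on the other."""
    a, b = socket.socketpair()  # connected pair of stream sockets
    try:
        for chunk in writes:
            a.sendall(chunk)
        a.shutdown(socket.SHUT_WR)  # tell the reader the stream has ended
        reads = []
        while True:
            data = b.recv(read_size)
            if not data:  # b"" means the writer closed its side
                break
            reads.append(data)
        return reads
    finally:
        a.close()
        b.close()


reads = stream_demo([b"A" * 10, b"B" * 20, b"C" * 50])
```

On a typical system the reader sees four 20-byte reads that cut across the original write boundaries; the only guarantee is that the concatenated bytes equal what was written.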
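…WHICH IS EMULATED USING TCP “SEGMENTS”
[Figure: the byte stream from Host A is carried to Host B as a sequence of segments, each holding a chunk of TCP data.]
A segment is sent when:
1. The segment is full (MSS bytes),
2. It is not full, but a timer times out, or
3. It is “pushed” by the application.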
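The three sending rules can be sketched as a function that cuts buffered stream bytes into MSS-sized segments and sends a trailing partial segment only on a flush. The name `segment` and the single `flush` flag (standing in for both a timeout and an application push) are assumptions for illustration.

```python
def segment(stream: bytes, mss: int, flush: bool):
    """Split buffered stream bytes into segments of at most `mss` bytes.

    Full segments are always sent (rule 1). A trailing partial segment is sent
    only when `flush` is True, standing in for a timeout (rule 2) or a push (rule 3).
    Returns (segments_to_send, bytes_still_buffered).
    """
    full_len = len(stream) - len(stream) % mss
    segments = [stream[i:i + mss] for i in range(0, full_len, mss)]
    rest = stream[full_len:]
    if flush and rest:
        segments.append(rest)
        rest = b""
    return segments, rest
```

For example, 2,500 buffered bytes with an MSS of 1,000 give two full segments and leave 500 bytes waiting, unless a timeout or push flushes them as a third, smaller segment.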
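THE TCP SEGMENT FORMAT
Encapsulation: IP Hdr | IP Data, where the IP Data is TCP Hdr | TCP Data.
TCP header, one 32-bit row per line (bit 0 on the left, bit 31 on the right):
  Src port (bits 0-15) | Dst port (bits 16-31)
  Sequence #
  Ack Sequence #
  HLEN (4 bits) | RSVD (6 bits) | Flags: URG, ACK, PSH, RST, SYN, FIN (6 bits) | Window Size (16 bits)
  Checksum (16 bits) | Urgent Pointer (16 bits)
  (TCP Options)
TCP Data
- The checksum covers the TCP header and data plus the IP addresses.
- The src/dst port numbers and IP addresses together uniquely identify the connection (a pair of sockets).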
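The layout above maps directly onto a `struct` format string. This is a minimal decoding sketch with a hypothetical `parse_tcp_header` helper; it follows the slide's layout (6 reserved bits), whereas later RFCs assign some of those reserved bits (for example to ECN), which this sketch ignores.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # from bit 0 (LSB) upward


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header; option bytes are returned raw."""
    src, dst, seq, ack, off_flags, window, checksum, urg = struct.unpack("!HHIIHHHH", raw[:20])
    hlen_words = off_flags >> 12   # HLEN: top 4 bits, counted in 32-bit words
    flags = off_flags & 0x3F       # low 6 bits: URG ACK PSH RST SYN FIN
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": hlen_words * 4,
        "flags": {name for i, name in enumerate(FLAG_NAMES) if flags & (1 << i)},
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urg,
        "options": raw[20:hlen_words * 4],
    }
```

A SYN + ACK from port 80 with no options, for instance, has HLEN = 5 (20 bytes) and flag bits 0x12.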
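SEQUENCE NUMBERS
[Figure: Host A sends TCP segments (TCP HDR + TCP Data) to Host B, numbering bytes from its ISN (initial sequence number); Host B replies with acknowledgements.]
- Sequence number = number of the 1st byte of data in the segment.
- Ack sequence number = number of the next expected byte.
How does the ISN get chosen?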
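As a small worked example of the two numbering rules, the sketch below computes the sequence number of each data segment and the cumulative ACK the receiver returns. It assumes in-order delivery and that the SYN consumed one sequence number, so the first data byte is ISN + 1; the function name is hypothetical.

```python
MOD = 2 ** 32  # sequence numbers are 32 bits and wrap around


def seq_and_ack(isn: int, segment_lengths):
    """(sequence number, ACK number) for each in-order data segment.

    Assumes the SYN consumed one sequence number, so data starts at isn + 1.
    """
    seq = (isn + 1) % MOD
    out = []
    for length in segment_lengths:
        ack = (seq + length) % MOD  # next byte the receiver expects
        out.append((seq, ack))
        seq = ack
    return out
```

With ISN = 1000 and segments of 100 and 200 bytes, the segments carry sequence numbers 1001 and 1101, and the ACKs are 1101 and 1301.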
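INITIAL SEQUENCE NUMBERS
Connection setup (3-way handshake):
1. (Active) Client → (Passive) Server: SYN + ISN_A
2. Server → Client: SYN + ACK + ISN_B
3. Client → Server: ACK

The sequence number is 32 bits. What if a message has more than $2^{32}$ bytes? The sequence number wraps around, so an old segment could carry the same sequence number as a new one.
Solution: the Timestamp option.
- The sender places a timestamp in every segment.
- The receiver copies the timestamp into the ACK it sends for that segment.
An old segment with a wrapped-around sequence number can then be recognized by its older timestamp.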
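Both sequence numbers and timestamps are compared modulo $2^{32}$. The sketch below shows that comparison and a simplified check in the spirit of PAWS (Protection Against Wrapped Sequences), which uses timestamps to reject old duplicates. The function names are hypothetical, and real PAWS has extra rules (for example for long-idle connections) that are omitted here.

```python
MASK = 0xFFFFFFFF


def seq_before(a: int, b: int) -> bool:
    """True if 32-bit value a comes before b, allowing for wrap-around.

    Treats (a - b) mod 2**32 as a signed 32-bit number: negative means "before".
    """
    return ((a - b) & MASK) >= 0x80000000


def paws_accept(seg_tsval: int, ts_recent: int) -> bool:
    """Simplified PAWS-style test: reject a segment whose timestamp is older
    than the most recent timestamp already seen on this connection."""
    return not seq_before(seg_tsval, ts_recent)
```

With this comparison, 0xFFFFFFF0 counts as "before" 0x10, because the counter has wrapped in between.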
TCP SLIDING WINDOW
- How much data can a TCP sender have outstanding in the network?
- How much data should TCP retransmit when an error occurs? Just selectively repeat the missing data?
- How does the TCP sender avoid over-running the receiver's buffers?

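TCP SLIDING WINDOW
[Figure: the sender's byte stream divided, from left to right, into data ACK'd, outstanding un-ACK'd data, data OK to send, and data not OK to send yet; the window size spans the outstanding data and the data OK to send.]
- The window is meaningful to the sender.
- The current window size is “advertised” by the receiver (usually 4 kB to 8 kB when the connection is set up).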
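The four regions in the figure can be tracked with two pointers and the advertised window. This is a minimal sender-side sketch: the pointer names `snd_una` (oldest un-ACK'd byte) and `snd_nxt` (next byte to send) are borrowed from RFC 793's SND.UNA and SND.NXT, the class itself is hypothetical, and sequence-number wrap-around is ignored.

```python
class SlidingWindowSender:
    """Sender's view of the window, in bytes (no wrap-around handling)."""

    def __init__(self, window: int) -> None:
        self.snd_una = 0        # left edge: everything before this has been ACK'd
        self.snd_nxt = 0        # next byte to send
        self.window = window    # size advertised by the receiver

    def usable(self) -> int:
        # "Data OK to send" = window minus outstanding (un-ACK'd) data
        return self.window - (self.snd_nxt - self.snd_una)

    def send(self, nbytes: int) -> int:
        n = max(0, min(nbytes, self.usable()))
        self.snd_nxt += n
        return n

    def on_ack(self, ack: int, advertised_window: int) -> None:
        if self.snd_una < ack <= self.snd_nxt:  # cumulative ACK slides the left edge
            self.snd_una = ack
        self.window = advertised_window
```

With a 4,096-byte window, the sender can put 4,096 bytes in flight and must then wait; an ACK for the first 1,024 bytes slides the window and frees 1,024 bytes of "OK to send" space.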

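TCP SLIDING WINDOW
[Figure: Host A sends a window's worth of data to Host B and waits for ACKs, in two cases.]
(1) RTT > window size (the time to transmit one window): the sender sends its whole window before the first ACK returns, then sits idle for the rest of the round-trip time.
(2) RTT = window size: the first ACK arrives just as the window is used up, so the sender can transmit continuously.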
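Case (2) says the window needed to keep a link busy is the link rate times the RTT (the bandwidth-delay product), and otherwise throughput is capped at W/RTT. A short sketch with hypothetical function names:

```python
def window_needed(link_bps: float, rtt_s: float) -> float:
    """Bytes that must be outstanding to keep the link busy for a whole RTT."""
    return link_bps * rtt_s / 8


def throughput(window_bytes: float, rtt_s: float, link_bps: float) -> float:
    """Bits/s: limited by the window (W/RTT) or by the link, whichever is smaller."""
    return min(window_bytes * 8 / rtt_s, link_bps)
```

For a 10 Mb/s link with a 100 ms RTT, the window must be 125,000 bytes; an 8 kB window on the same path gets only about 655 kb/s, which is case (1).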
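TCP: RETRANSMISSION AND TIMEOUTS
[Figure: Host A sends Data1 and Data2 to Host B and receives an ACK for each after one round-trip time (RTT). The retransmission timeout (RTO) is set to the estimated RTT plus a guard band.]
TCP uses an adaptive retransmission timeout value, because the RTT changes:
- with congestion, and
- with changes in routing, which happen frequently.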
TCP: RETRANSMISSION AND TIMEOUTS
Picking the RTO is important:
- Pick a value that's too big, and TCP will wait too long to retransmit a lost packet.
- Pick a value that's too small, and it will unnecessarily retransmit packets.
The original algorithm for picking the RTO:
1. $\text{EstimatedRTT}_k = \alpha \cdot \text{EstimatedRTT}_{k-1} + (1 - \alpha) \cdot \text{SampleRTT}$
2. $\text{RTO} = 2 \cdot \text{EstimatedRTT}$, where the constants were determined empirically.
Characteristics of the original algorithm:
- Variance is assumed to be fixed.
- But in practice, variance increases as congestion increases.
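The original estimator is an exponentially weighted moving average followed by a fixed multiplier. In this sketch the value $\alpha = 0.875$ and seeding the estimate with the first sample are assumptions (the slide gives neither), and the function name is hypothetical.

```python
def original_rto(samples, alpha=0.875):
    """Original estimator: EWMA of RTT samples, then RTO = 2 * EstimatedRTT.

    alpha = 0.875 and seeding with the first sample are assumptions.
    Returns (EstimatedRTT, RTO).
    """
    est = samples[0]
    for sample in samples[1:]:
        est = alpha * est + (1 - alpha) * sample
    return est, 2 * est
```

A single 180 ms sample after a steady 100 ms estimate moves the estimate only to 110 ms; because the RTO is always exactly twice the estimate, it cannot widen its margin when the RTT starts to fluctuate.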
TCP: RETRANSMISSION AND TIMEOUTS
- Router queues grow when there is more traffic, until they become unstable.
- As load grows, the variance of delay grows rapidly.
[Figure: average queueing delay vs. load (amount of traffic arriving at the router); the variance grows rapidly with load.]
- There will be some (unknown) distribution of RTTs.
- We are trying to estimate an RTO that minimizes the probability of a false timeout.
[Figure: probability distribution of the RTT, marking its mean and variance.]
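TCP: RETRANSMISSION AND TIMEOUTS
The newer algorithm includes an estimate of the variance in RTT:
$\text{Difference} = \text{SampleRTT} - \text{EstimatedRTT}_{k-1}$
$\text{EstimatedRTT}_k = \text{EstimatedRTT}_{k-1} + \delta \cdot \text{Difference}$ (same as before, with $\delta = 1 - \alpha$)
$\text{Deviation} = \text{Deviation} + \delta \cdot (|\text{Difference}| - \text{Deviation})$
$\text{RTO} = \mu \cdot \text{EstimatedRTT} + \phi \cdot \text{Deviation}$, with $\mu = 1$ and $\phi = 4$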
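The four equations translate line by line into an estimator object. In this sketch, $\delta = 0.125$ and the initial Deviation of half the first RTT (as RFC 6298 seeds its variance term) are assumptions, and the class name is hypothetical.

```python
class JacobsonKarels:
    """RTT estimator with a deviation term, following the equations above."""

    def __init__(self, initial_rtt: float, delta: float = 0.125,
                 mu: float = 1.0, phi: float = 4.0) -> None:
        self.estimated_rtt = initial_rtt
        self.deviation = initial_rtt / 2   # assumption: seeded as in RFC 6298
        self.delta, self.mu, self.phi = delta, mu, phi

    def rto(self) -> float:
        return self.mu * self.estimated_rtt + self.phi * self.deviation

    def update(self, sample_rtt: float) -> float:
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * difference
        self.deviation += self.delta * (abs(difference) - self.deviation)
        return self.rto()
```

Starting from a 100 ms estimate, a steady 100 ms sample shrinks the RTO from 300 ms to 275 ms, while a 300 ms spike raises it to 400 ms: the timeout now tracks variability as well as the mean.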
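TCP: RETRANSMISSION AND TIMEOUTS
KARN'S ALGORITHM
[Figure: two Host A/Host B timelines with a retransmission. In one, the ACK is matched to the retransmission when it actually acknowledges the original transmission; in the other, it is matched to the original when it actually acknowledges the retransmission. Either way, the RTT sample is wrong.]
Problem: how can we estimate the RTT when packets are retransmitted?
Solution: on retransmission, don't update the estimated RTT (and double the RTO).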
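Karn's rule needs only two hooks: ignore RTT samples from retransmitted segments, and back off the RTO on each timeout. A minimal sketch; the class name and the 60-second cap on the RTO are hypothetical choices.

```python
class KarnTimer:
    """Karn's rule: no RTT samples from retransmitted segments; double the RTO on timeout."""

    def __init__(self, rto: float, max_rto: float = 60.0) -> None:
        self.rto = rto
        self.max_rto = max_rto             # hypothetical upper bound on the backoff
        self.samples: list[float] = []

    def on_timeout(self) -> None:
        # Exponential backoff: double the RTO, up to the cap
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return  # ambiguous: the ACK may belong to the original or the retransmission
        self.samples.append(rtt)  # unambiguous sample; would feed the RTT estimator
```

After two timeouts a 1 s RTO has grown to 4 s, and the ACK for the retransmitted segment contributes no RTT sample.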
CONGESTION CONTROL: MAIN POINTS
- Congestion is inevitable.
- Congestion happens at different scales, from two individual packets colliding to too many users.
- TCP senders can detect congestion and reduce their sending rate by reducing the window size.
- TCP modifies the rate according to “Additive Increase, Multiplicative Decrease (AIMD)”.
- To probe and find the initial rate, TCP uses a restart mechanism called “slow start”.
- Routers slow down TCP senders by buffering packets and thus increasing delay.
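CONGESTION
[Figure: hosts H1 (10 Mb/s link, arrivals A1(t)) and H3 (100 Mb/s link, arrivals A2(t)) send through router R1 to H2 over a 1.5 Mb/s link (departures D(t)). A plot of cumulative bytes vs. time t shows A1(t), A2(t) and D(t); the backlog X(t) at the router is the gap between cumulative arrivals and D(t).]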
TIME SCALES OF CONGESTION
[Figure: congestion at three time scales: too many users using a link during a peak hour (hours: 7:00, 8:00, 9:00); TCP flows filling up all available bandwidth (seconds: 1 s, 2 s, 3 s); two packets colliding at a router (microseconds: 100 µs, 200 µs, 300 µs).]
DEALING WITH CONGESTION
EXAMPLE: TWO FLOWS ARRIVING AT A ROUTER
[Figure: flows A1(t) and A2(t) arrive at router R1 at the same time.]
Strategies:
- Drop one of the flows.
- Buffer one flow until the other has departed, then send it.
- Re-schedule one of the two flows for a later time.
- Ask both flows to reduce their rates.
CONGESTION IS UNAVOIDABLE
ARGUABLY IT'S GOOD!
- We use packet switching because it makes efficient use of the links. Therefore, buffers in the routers are frequently occupied.
- If buffers are always empty, delay is low, but our usage of the network is low.
- If buffers are always occupied, delay is high, but we are using the network more efficiently.
- So how much congestion is too much?

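LOAD, DELAY AND POWER
Typical behavior of queueing systems with random arrivals:
[Figure: average packet delay vs. load; delay grows without bound as the load approaches an asymptote, and burstiness tends to move the asymptote to the left.]
A simple metric of how well the network is performing:
$\text{Power} = \frac{\text{Load}}{\text{Delay}}$
[Figure: power vs. load, peaking at the “optimal load”.]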
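To see why power peaks at an intermediate load, one can plug in a concrete queueing model. This sketch swaps in an M/M/1 queue with service rate 1 (an assumption, not the slide's model), where the mean delay is $1/(1 - \text{load})$ and so power is $\text{load}(1 - \text{load})$.

```python
def mm1_delay(load: float) -> float:
    """Mean time in system for an M/M/1 queue with service rate 1 (load = utilization < 1)."""
    return 1.0 / (1.0 - load)


def power(load: float) -> float:
    return load / mm1_delay(load)  # equals load * (1 - load) for this model


loads = [i / 100 for i in range(1, 100)]
best = max(loads, key=power)
```

Under this model the optimal load is 0.5: beyond it, each extra unit of load costs more in delay than it gains in throughput.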
OPTIONS FOR CONGESTION CONTROL
1. Implemented by host versus network
2. Reservation-based versus feedback-based
3. Window-based versus rate-based
TCP CONGESTION CONTROL
- TCP implements host-based, feedback-based, window-based congestion control.
- TCP sources attempt to determine how much capacity is available.
- TCP sends packets, then reacts to observable events (loss).

TCP CONGESTION CONTROL
- TCP sources change the sending rate by modifying the window size:
  $\text{Window} = \min\{\text{Advertised window}, \text{Congestion window}\}$
  where the advertised window is set by the receiver and the congestion window (“cwnd”) by the transmitter.
  In other words, send at the rate of the slowest component: network or receiver.
- “cwnd” follows additive increase/multiplicative decrease:
  - On receipt of an ACK: cwnd += 1/cwnd (about 1 packet per RTT)
  - On packet loss (timeout): cwnd *= 0.5

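ADDITIVE INCREASE/MULTIPLICATIVE DECREASE
[Figure: source/destination timeline; in each successive round trip the source sends one more data packet (D) than before, and each is acknowledged (A).]
- Additive Increase: every time the source successfully sends a cwnd's worth of packets (each packet sent out during the last RTT has been ACKed), add the equivalent of 1 packet to cwnd.
  - In bytes, on each ACK: $\text{Increment} = \text{MSS} \times \frac{\text{MSS}}{\text{CWND}}$ (with $\text{CWND} \geq \text{MSS}$), then $\text{CWND} \mathrel{+}= \text{Increment}$.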
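The per-ACK byte formula can be checked by applying it for one RTT's worth of ACKs. This sketch assumes an MSS of 1,460 bytes and a window of 10 segments; the function name is hypothetical.

```python
MSS = 1460  # bytes; an assumed typical Ethernet-sized MSS


def on_ack_congestion_avoidance(cwnd: float, mss: int = MSS) -> float:
    """Additive increase in bytes: cwnd grows by mss * (mss / cwnd) on each ACK."""
    if cwnd < mss:
        raise ValueError("formula assumes cwnd >= MSS")
    return cwnd + mss * (mss / cwnd)


cwnd = 10 * MSS
for _ in range(10):  # one RTT's worth of ACKs for a 10-segment window
    cwnd = on_ack_congestion_avoidance(cwnd)
```

After ten ACKs the window has grown by a little under one MSS (to about 10.96 MSS), since cwnd itself grows during the round and each later increment is slightly smaller.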
LEADS TO THE TCP “SAWTOOTH”
[Figure: window size vs. time t; the window grows linearly and is halved at each timeout, giving a sawtooth. Starting from a small window, it could take a long time to get started!]
- Multiplicative Decrease: for each timeout, the source sets CWND to half of its previous value.
- If CWND is large, all the packets dropped will be retransmitted, and congestion gets worse. TCP needs to get out of this state quickly.
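A per-RTT trace reproduces the sawtooth: add one packet each round, halve on a timeout, and send with the effective window min(advertised, cwnd). This is an idealized sketch in packets; the function name, the floor of 1 packet, and treating each loss round as a single timeout are assumptions.

```python
def aimd_trace(rounds: int, loss_rounds: set, advertised: int, cwnd: float = 1.0):
    """Effective window min(advertised, cwnd) at the start of each RTT, in packets."""
    trace = []
    for r in range(rounds):
        trace.append(min(advertised, cwnd))
        if r in loss_rounds:
            cwnd = max(cwnd / 2, 1.0)  # multiplicative decrease on a timeout
        else:
            cwnd += 1                  # additive increase: one packet per RTT
    return trace
```

Starting at 8 packets with a timeout in round 3 gives 8, 9, 10, 11, 5.5, 6.5, 7.5, 8.5: a linear climb, a halving, then another climb. If the advertised window is smaller, the trace flattens at the receiver's limit instead.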
“SLOW START”
- Designed to find the fair-share rate quickly at startup.
- How does it work?
  1. Increase cwnd by 1 packet for each ACK received (so cwnd doubles every RTT), until it reaches SSthreshold.
  2. If cwnd < SSthreshold {do Slow Start}, else {do Congestion Avoidance}.
  3. Initial SSthreshold = a large value. After a packet is lost, SSthreshold = cwnd/2.
  4. Congestion Avoidance: increase cwnd linearly.
[Figure: source/destination timeline in which the source sends 1, 2, 4 and then 8 packets in successive round trips.]
SLOW START
Why is it called slow start? Because TCP originally had no congestion control mechanism: the source would just start by sending a whole advertised window's worth of data. Starting with one packet and doubling is slow by comparison.
FAST RETRANSMIT AND FAST RECOVERY?
Homework!!
TCP SENDING RATE
- What is the sending rate of TCP?
  - The acknowledgement for a sent packet is received after one RTT.
  - The amount of data sent until the ACK is received is the current window size W.
  - Therefore the sending rate is $R = W/\text{RTT}$.
- Is the TCP sending rate sawtooth-shaped as well?
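A worked number for $R = W/\text{RTT}$, assuming W = 65,535 bytes (the largest window the 16-bit Window Size field can advertise without the window scale option) and RTT = 100 ms:

```latex
R = \frac{W}{\text{RTT}}
  = \frac{65{,}535\ \text{bytes} \times 8\ \text{bits/byte}}{0.1\ \text{s}}
  = 5{,}242{,}800\ \text{b/s} \approx 5.24\ \text{Mb/s}
```

Doubling the RTT with the same window halves R, which is why buffering (longer RTT) slows a TCP sender down.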
TCP AND BUFFERS
- For TCP with a single flow over a network link with enough buffers, RTT and W are proportional to each other.
- Therefore the sending rate $R = W/\text{RTT}$ is constant (and not a sawtooth).
- But experiments and theory suggest that with many flows:
  $R \propto \frac{1}{\text{RTT}\sqrt{p}}$
  where p is the drop probability.
- TCP rate can be controlled in two ways:
  1. Buffering packets and increasing the RTT
  2. Dropping packets to decrease TCP's window size
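The proportionality can be made concrete with the widely cited approximation $R \approx \frac{\text{MSS}}{\text{RTT}} \cdot \frac{C}{\sqrt{p}}$ with $C = \sqrt{3/2}$, from Mathis et al.'s simple periodic-loss model. The constant and the MSS factor are swapped in here (the slide gives only the proportionality), and the function name is hypothetical.

```python
import math


def mathis_rate(mss_bytes: float, rtt_s: float, p: float, c: float = math.sqrt(1.5)) -> float:
    """Approximate steady-state TCP throughput in bytes/s: (MSS / RTT) * C / sqrt(p)."""
    return (mss_bytes / rtt_s) * c / math.sqrt(p)
```

Both control knobs from the slide show up directly: quadrupling the drop probability halves the rate, and so does doubling the RTT.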
CONGESTION CONTROL IN THE INTERNET
- Maximum window sizes of most TCP implementations are very small by default:
  - Windows XP: 12 packets
  - Linux/Mac: 40 packets
- Often the buffer of a link is larger than the maximum window size of TCP:
  - A typical DSL line has 200 packets' worth of buffer.
  - For a TCP session, the maximum number of packets outstanding is 40.
  - So a single TCP session can never fill up the buffer, and the router will never drop a packet.
CONGESTION AVOIDANCE
- TCP reacts to congestion after it takes place. The data rate changes rapidly and the system is barely stable (or is even unstable).
- Can we predict when congestion is about to happen and avoid it? E.g., by detecting the knee of the curve.
[Figure: average packet delay vs. load, with a “knee” where delay starts to rise steeply.]
CONGESTION AVOIDANCE SCHEMES
- Router-based congestion avoidance:
  - DECbit: routers explicitly notify sources about congestion.
  - Random Early Detection (RED): routers implicitly notify sources by dropping packets. RED drops packets at random, and as a function of the level of congestion.
- Host-based congestion avoidance:
  - The source monitors changes in RTT to detect the onset of congestion.
DECBIT
- Each packet has a “Congestion Notification” bit, called the DECbit, in its header.
- If any router on the path is congested, it sets the DECbit.
  - The bit is set if the average queue length is at least 1 packet, averaged since the start of the previous busy cycle.
- To notify the source, the destination copies the DECbit into ACK packets.
- The source adjusts its rate to avoid congestion:
  - It counts the fraction of DECbits set in each window.
  - If < 50% are set, it increases the rate additively.
  - If >= 50% are set, it decreases the rate multiplicatively.
[Figure: queue length at the router vs. time, showing the averaging period.]
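The source's policy is a once-per-window decision on the fraction of set bits. The slide does not give the decrease factor; this sketch assumes 0.875, the factor usually quoted for DECbit, and an increase of one packet. The function name is hypothetical.

```python
def decbit_update(window: float, bits: list, decrease: float = 0.875) -> float:
    """Adjust the window once per window's worth of ACKs, from the copied DECbits (0 or 1)."""
    if not bits:
        return window
    fraction_set = sum(bits) / len(bits)
    if fraction_set < 0.5:
        return window + 1           # additive increase (one packet)
    return window * decrease        # multiplicative decrease (assumed factor)
```

With a 10-packet window, one marked ACK in four leads to 11 packets, while two in four (50%) cuts the window to 8.75.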
RANDOM EARLY DETECTION (RED)
- RED is based on DECbit, and was designed to work well with TCP.
- RED implicitly notifies the sender by dropping packets.
- The drop probability is increased as the average queue length increases.
- A (geometric) moving average of the queue length is used so as to detect long-term congestion, yet allow short-term bursts to arrive:
  $\text{AvgLen}_{n+1} = (1 - w)\,\text{AvgLen}_n + w \cdot \text{Length}_n$
  i.e. (taking $\text{AvgLen}_1 = 0$), $\text{AvgLen}_{n+1} = \sum_{i=1}^{n} \text{Length}_i \cdot w\,(1 - w)^{n-i}$
  where $w$ is the averaging weight, $0 < w < 1$.
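RED DROP PROBABILITIES
[Figure: a router queue with arrivals A(t) and departures D(t), and a plot of drop probability vs. AvgLen: 0 below minTh, rising linearly to maxP at maxTh, and 1 above maxTh.]
If $\text{minTh} \leq \text{AvgLen} \leq \text{maxTh}$:
$\hat{p}(\text{AvgLen}) = \text{maxP} \cdot \frac{\text{AvgLen} - \text{minTh}}{\text{maxTh} - \text{minTh}}$
$\Pr(\text{Drop Packet}) = \frac{\hat{p}(\text{AvgLen})}{1 - \text{count} \cdot \hat{p}(\text{AvgLen})}$
count counts how long we've been in $\text{minTh} \leq \text{AvgLen} \leq \text{maxTh}$ since we last dropped a packet; i.e., drops are spaced out in time, reducing the likelihood of re-entering slow start.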
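The drop rule can be written as one function of AvgLen and count, following the slide's formulas. The function name is hypothetical; the caller is assumed to increment count for each arriving packet while AvgLen is in range and reset it to 0 after a drop, as in the original RED design.

```python
def red_drop_probability(avg_len: float, count: int, min_th: float,
                         max_th: float, max_p: float) -> float:
    """Probability of dropping an arriving packet under RED."""
    if avg_len < min_th:
        return 0.0
    if avg_len > max_th:
        return 1.0
    p_hat = max_p * (avg_len - min_th) / (max_th - min_th)
    denom = 1 - count * p_hat
    if denom <= 0:
        return 1.0  # count has grown large enough that a drop is forced
    return p_hat / denom
```

With minTh = 5, maxTh = 15 and maxP = 0.1, an average queue of 10 gives $\hat{p} = 0.05$; after 10 packets without a drop, the probability has risen to 0.1, which is what spaces the drops out.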
PROPERTIES OF RED
- Drops packets before the queue is full, in the hope of reducing the rates of some flows.
- Drops packets for each flow roughly in proportion to its rate.
- Drops are spaced out in time.
- Because it uses the average queue length, RED is tolerant of bursts.
- Random drops hopefully desynchronize TCP sources.

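SYNCHRONIZATION OF SOURCES
[Figure: the windows of four sources A, B, C and D rise and fall in phase; source A's sawtooth repeats every N × RTT.]
[Figure: the aggregate flow of the synchronized sources oscillates widely around its average, with a period that is a function of the RTT, f(RTT).]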
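DESYNCHRONIZED SOURCES
[Figure: the windows of four sources A, B, C and D rise and fall out of phase; source A's sawtooth still repeats every N × RTT.]
[Figure: the aggregate flow of the desynchronized sources stays close to its average.]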
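The difference between the two pairs of figures can be reproduced by summing idealized sawtooth windows with and without phase offsets. The sawtooth shape (ramping from half the maximum window up toward the maximum, then halving), the period of 8 steps standing in for N × RTT, and the phase choices are hypothetical parameters.

```python
def sawtooth(t: int, period: int, w_max: float, phase: int) -> float:
    """Idealized AIMD window: ramps from w_max/2 toward w_max, then halves."""
    x = ((t + phase) % period) / period
    return w_max / 2 + (w_max / 2) * x


def aggregate(phases, period=8, w_max=16.0, steps=32):
    """Sum of the sources' windows at each time step."""
    return [sum(sawtooth(t, period, w_max, p) for p in phases) for t in range(steps)]


def spread(series):
    return max(series) - min(series)


sync = aggregate([0, 0, 0, 0])    # sources A-D halve at the same moment
desync = aggregate([0, 2, 4, 6])  # sources A-D halve at staggered times
```

Both aggregates have the same average (46 packets here), but the synchronized one swings by 28 packets while the desynchronized one swings by only 4: the same offered load with far less variation for the router's buffer to absorb.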