Transport Layer protocols (TLP)

Transport Layer - Overview
Understanding:
• Transport layer services
• Multiplexing/demultiplexing
• Connectionless transport: UDP
• Principles of reliable data transfer
• Connection-oriented transport: TCP
  - reliable transfer
  - flow control
  - connection management
• TCP congestion control
Goals:
• Understand principles behind Transport layer services
• Instantiation and implementation in the Internet

Transport Services and Protocols
• Provide logical communication between app processes running on different hosts
• Transport protocols run in end systems (only)
• Transport vs network layer services:
  - network layer: data transfer between nodes/end systems
  - transport layer: data transfer between processes at end systems
  - relies on, but enhances, network layer service capability
• Internet transport services:
  - Reliable, in-order unicast delivery: TCP
    • congestion control
    • flow control
    • connection setup
  - Unreliable (best-effort), unordered unicast or multicast delivery: UDP
• Services not provided by TCP:
  • real-time (need RTP, RTCP)
  • bandwidth guarantees
  • reliable multicast
[Figure: logical end-to-end transport between two end systems (application, transport, network, data link, physical); intermediate nodes implement only network, data link, physical]
Transmission Control Layer
• Two transport protocols in the Internet architecture:
  - UDP (User Datagram Protocol) (RFC 768)
    ~ provides connectionless, unreliable service without flow control
  - TCP (Transmission Control Protocol) (RFC 793, 1122, 1323, 2018, 2581)
    ~ provides connection-oriented, reliable, byte-stream oriented service with flow control
UDP
• Delivery of packets without guarantee (of arrival or of order)
• No handshaking and ACKnowledgement -> fast response
• Reliability of link is application's responsibility
TCP
• Needs connection setup before transmission
• Guarantees packet delivery (no duplication) and in-order reception, byte-stream oriented
• Reliability of link is TCP's responsibility
• Encapsulation of TPL's PDU in IP packet
  IP header (20B) | TCP header (20B) / UDP header (8B) | UDP datagram / TCP segment
  UDP datagram or TCP segment -> IP packet; ACK packet (TCP only)
Multiplexing and Demultiplexing
• Segment
  ~ unit of data exchanged between transport layer entities
  ~ aka TPDU: Transport Protocol Data Unit
• Multiplexing:
  ~ gathering data from multiple application processes (sockets), enveloping data with header (later used for demultiplexing)
• Demultiplexing:
  ~ delivering received segments to correct app layer processes via socket
• Multiplexing/Demultiplexing:
  - based on IP addresses, sender's and receiver's port numbers
[Figure: processes P1 to P4 (process ~ socket) on three hosts, each with application, transport, network layers; application-layer data M gets segment header Ht, then network header Hn at the receiver side]

Multiplexing/Demultiplexing (cont'd)
[Figure: frame layout: DA SA TF | IP header | TCP/UDP header | Data | CRC. IP packet (65535B max) carries TCP/UDP data. B: bytes]
TCP header fields: Source port, Destination port, Sequence number, Acknowledgment number, Data offset, Reserved, flags (URG, ACK, PSH, RST, SYN, FIN), Window, Checksum, Urgent pointer, Option, Padding, TCP data
UDP header fields: Source port, Destination port, Message length, Checksum, Data
TCP Well-known Port Numbers
• Source port numbers
  ~ randomly assigned by the sending host (1024 <= # <= 65535)
• Destination port numbers
  ~ the well-known one (# < 1024) or the incoming source port #

Port Number  Description
0            Reserved
1            TCP Multiplexer
5            Remote Job Entry
7            Echo
9            Discard
11           Active Users
13           Daytime
15           Network status program
17           Quote of the day
19           Character generator
20           FTP (data)
21           FTP (command)
23           Terminal Connection
25           SMTP
37           Time
42           Host Name Server
43           Who is
53           Domain Name Server
77           Private RJE service
79           Finger
80           HTTP protocol
93           Device Control Protocol
95           SUPDUP Protocol
101          NIC host name server
102          ISO-TSAP
103          X.400 mail service
104          X.400 mail sending
111          SUN RPC
113          Authentication Service
117          UUCP-path service
119          USENET news Transfer Protocol
129          Password Generator Protocol
139          NETBIOS Session Service

UDP Well-known Port Numbers

Port Number  Description
0            Reserved
7            Echo
9            Discard
11           Active Users
13           Daytime
15           Network status program
17           Quote of the day
19           Character Generator
37           Time
42           Host Name Server
43           Who is
53           Domain Name Server
67           Bootstrap Protocol Server
68           Bootstrap Protocol Client
69           Trivial File Transfer (TFTP)
111          Sun Microsystems RPC
123          Network Time Protocol (NTP)
161          SNMP net monitor
162          SNMP traps
512          UNIX comsat
513          UNIX rwho daemon
514          System log
525          Time daemon

Assigned, Registered and Dynamic Port Numbers
• Assigned port numbers range from 0 - 1023.
  - Assigned port numbers are reserved by IANA and cannot be used for other purposes
  - Used for TCP, IP, UDP and various applications such as TELNET
  - Up-to-date assignments of numbers: RFC 1700. FTP site: ftp://isi.edu./in-notes/iana/assignments
• Registered port numbers range from 1024 - 65535; these belong to companies that have registered their application.
• Dynamic port numbers are also in the range of 1024 - 65535.
[ check with Unix/Linux files: /etc/services ]

How Demultiplexing works ?
• When host receives IP datagrams . . .
  - each datagram has source IP address, destination IP address
  - each datagram carries 1 transport-layer segment
  - each segment has source, destination port number (recall: well-known port numbers for specific applications)
• Host uses IP address & port number to direct segment to the appropriate socket (w.r.t. a process)
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (message)]
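The demultiplexing step above can be sketched as a table lookup. This is only an illustration under stated assumptions: the table `tcp_sockets`, the function `demux_tcp`, the process labels, and the addresses are hypothetical, and a real host keeps this state inside the operating system.

```python
from typing import Dict, Optional, Tuple

# Hypothetical socket table: one entry per TCP connection, keyed by
# (source IP, source port, destination IP, destination port).
ConnKey = Tuple[str, int, str, int]
tcp_sockets: Dict[ConnKey, str] = {}


def demux_tcp(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the process attached to the matching socket, or None if no socket matches."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


# Three clients talk to the same well-known port 80 on one server.
# The addresses and process names are illustrative only.
tcp_sockets[("140.124.70.13", 8879, "140.124.13.3", 80)] = "P1"
tcp_sockets[("140.112.234.2", 8879, "140.124.13.3", 80)] = "P2"
tcp_sockets[("140.112.234.2", 7976, "140.124.13.3", 80)] = "P3"
```

Two segments with the same destination port still reach different processes because the source address and source port are part of the lookup key.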
Mux/DeMux (TCP): Example I
• One process to one connection
  host A (Telnet client, P) -> server B (Telnet server, P'): src port: 5678, dest. port: 23
  server B -> host A: source port: 23, dest. port: 5678

Mux/DeMux (TCP): Example II
• Multiple connections to multiple processes
  Web server, host B: B's IP = 140.124.13.3, Well-known Port = 80 (processes P1, P2, P3)
  Web client, host A: Src IP: 140.124.70.13, Dest IP: 140.124.13.3, source port: 8879, dest. port: 80
  Web clients, host C (C's IP: 140.112.234.2):
    - Dest IP: B, src port: 7976, dest. port: 80
    - Dest IP: B, src port: 8879, dest. port: 80

MUX/DeMUX Happened Everywhere
[Figure: protocol stack; multiplexing on the way down, demultiplexing (DeMUX) on the way up]
  AP Layer: PING, TELNET, SMTP, FTP, TRACE ROUTE, SNMP, BOOTP, NTP, DNS
    (demultiplexing based on port #)
  TP Layer: TCP, UDP (Segment or Datagram)
    (demultiplexing based on protocol type)
  Internet Layer (S/W modules): ICMP, IGMP, IP, ARP, RARP (Packet (Datagram))
    (demultiplexing based on frame's L/T)
  Ntwk Access Layer: DATA LINK (e.g., Ethernet), (Interface-SAP) (Frame)
  Medium: Bits (Frames from physical link)
UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
  - lost
  - delivered out of order to applications
• connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
• Why is there a UDP?
  - no connection setup/establishing time (which can add delay)
  - simple: no connection state at sender's and receiver's app
  - small segment header
    • Low overhead
  - no congestion control: UDP can blast away as fast as desired (unregulated sending rate)
(pp.200-201)

UDP Header and Segment Format
• Often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• Other UDP uses:
  - DNS
  - SNMP
• Reliable transfer over UDP: add reliability at application layer
  - application-specific error recovery!
[Figure: UDP segment format, 32 bits wide: source port #, dest port #, length, checksum, application data (message). Length is in bytes of UDP segment, including header]

UDP Checksum
• Goal: detect "errors" (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO - error detected
  - YES - no error detected.
• But maybe errors nonetheless?
See next slide for implementation details

Checksum in the UDP Header
• ChecKSum (CKS) (like IP's CKS, with the following differences)
  1. Allowing odd # of data bytes (by padding one byte of "0" but don't transmit it)
  2. Including pseudo-header from IP header (12 bytes counted in total):
     SRC IP (4B), DEST IP (4B), 00 + Protocol (2B), UDP length (2B)
     • Goal: to verify that the UDP DG has reached its correct destination
• No CKS used if CKS = all 0's being transmitted.
• Transmit 65535 if computed CKS = all 0's (one's complement)
• CKS -> adds pseudo hdr and UDP data (plus 8-bit 0's if necessary)
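The checksum rules above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names (`ones_complement_sum`, `pseudo_header`, `udp_checksum`, `build_segment`, `verify`) are hypothetical, IPv4 is assumed, and protocol number 17 is the IP protocol value for UDP.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad one zero byte for the computation only; it is not transmitted
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """12 bytes: SRC IP (4B), DEST IP (4B), 00 + Protocol (2B), UDP length (2B)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo-header plus segment (checksum field must be 0 in `segment`)."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    cks = ~total & 0xFFFF
    return cks or 0xFFFF  # a computed 0 is transmitted as 65535; 0 means "no checksum"


def build_segment(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  payload: bytes) -> bytes:
    length = 8 + len(payload)  # 8-byte UDP header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    cks = udp_checksum(src_ip, dst_ip, header + payload)
    return struct.pack("!HHHH", src_port, dst_port, length, cks) + payload


def verify(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver check: summing everything, checksum included, must give all 1's."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF
```

Because the pseudo-header carries both IP addresses, a segment delivered to the wrong destination address fails `verify` even when its bytes arrived intact.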
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
[Figure: application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer; application reads data through the socket door]

Principles of Reliable data transfer
• Important issue in application, transport, and link layers
• Among the top important networking topics!
• characteristics of an unreliable channel will determine the complexity of a reliable data transfer (rdt) protocol.
• udt ~ unreliable data transfer protocol (IP, here)

Reliable data transfer: getting started
(details coming next)
send side:
• rdt_send(): called from above (e.g., by app.) when data arrives. Passes data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel (network layer) to receiver
receive side:
• rdt_rcv(): called when packet arrives on rcv-side of channel
• deliver_data(): called by rdt to deliver data to upper layer

IP contradicts TCP ?
• Recall:
  • TCP (C.O.) provides completely reliable transfer
  • (But) IP (C.L.) offers best-effort (unreliable) delivery
• TCP uses IP ? (YES) How is it done ?
-> Reliable Data Transmission relies on . . .
  - Positive acknowledgment
    ~ Receiver returns a short message (called ACK, acknowledgement) to the sender when data arrives
  - Retransmission (upon timeout)
    ~ Sender starts timer whenever a segment is transmitted
    ~ If timer expires before acknowledgment arrives, sender retransmits THE message
TCP Header - I
[Figure: TCP header layout; annotations: head length, (flag), conn. management, receiver window size]
• TCP packs data in "segments" but counts/tracks by bytes.
• Seq# and Ack#: counting by bytes of data (not segments)!

TCP Header - II
• Sequence number (SEQ #):
  - identifies each byte in the stream of data from the sending TCP to the receiving TCP (byte streams)
  - numbering ranges from 0 to 2^32 - 1 and wraps back around to 0
  - SEQ # = (so-called) initial SEQ # (ISN) when SYN = 1 (the first (data) segment = ISN + 1)
  - SQN is bound to octets rather than to entire segments.
• Acknowledgment number:
  - the next sequence number that the receiver expects to receive (i.e., the piggybacked ACK)
    = the SEQ # of the last successfully received data byte + 1
  (ACK flag = 1 once the connection is first established)

TCP Header - III
• Data Offset = header length (HL) in 32-bit words (60 bytes max)
• Code bits:
  - URG -> "urgent pointer" field is valid (when it is set to 1)
  - ACK -> makes ACK number valid (when it is set to 1)
  - PSH -> sender should send out all data in the sending buffer
        -> receiver should pass this data to an application ASAP
  - RST -> reset the connection (port unreachable)
  - SYN -> synchronize sequence numbers to initiate a connection
  - FIN -> sender is finished sending data (ask to close connection)
• Window (for credit allocation flow control):
  -> indicating the number of bytes the sender of this segment is willing to accept

TCP Connection Establishment
• Establishing a connection between two ends before exchanging data
• Connection establishing protocol ~ a three-way handshake
  client: active open (open a conn. || open a socket); server: Listen (passive open)
  1. client -> server: SYN = j = ISN (SYN = 1, Seq# = j); client enters SYN_SENT
  2. server -> client: SYN = k, ACK = j+1 (k ~ Rxer's seq #, its ISN); server enters SYN_RCVD
  3. client -> server: ACK = k+1; Connection Established at both ends
  - initialize TCP variables: seq. #, buffers, flow control info (e.g. RcvWindow)
  - SYN consumes one sequence number
  - ISN should change over time (differs from connection to connection)

Decompose PDUs in a TCP/IP Scenario
Windows> telnet 140.124.70.26
(showing the first two packets sent by the client)
Port #: Transport--Application layer
  Src port # (randomly generated by the src PC, 1059 here)
  Dest port # (a well-known one for a well-known application)
Protocol #: Network--Transport layer
(PDU cont'd)
[Figure: packet capture; annotations: (for reliable, in-order reception), (Selective ACKnowledgment) - see next pages]
Stop-and-Wait Protocol
(rdt3.0, Alternating-bit protocol, textbook)
• Sends one segment and waits for Ack returning before continuing sending the next segment
[Figure: sender/receiver timeline (assuming no error): first packet bit transmitted, t = 0; last packet bit transmitted, t = L / R; first packet bit arrives; last packet bit arrives, send ACK; ACK arrives, send next packet, t = RTT + L / R]

Performance of Stop-and-Wait Protocol
• rdt3.0 works, but performance stinks
• Performance issue:
  Example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:
    Ttransmit = L (packet size) / R (channel capacity) = 8 kb/pkt / 10**9 b/sec = 8 microsec
• Sender/channel Utilization (fraction of time sender busy sending bits into the channel):
    U = (L / R) / (RTT + L / R) = 8 usec / 30.008 ms = 0.00027 (or 0.027%)
    (15.008 x 2, if ACK ignored)
• Send 1KB pkt every 30.008 msec
  -> effective throughput only 267 kbps over 1 Gbps link
  - network protocol limits use of physical resources a lot!
(ref. P.214)
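The utilization arithmetic above can be checked with a short sketch. The function name `utilization` and its `window` parameter are assumptions added for illustration; `window` models a sender that puts that many packets on the link back to back before waiting, with 1 meaning stop-and-wait.

```python
def utilization(packet_bits: float, rate_bps: float, rtt_s: float, window: int = 1) -> float:
    """Fraction of time the sender is busy putting bits into the channel."""
    t_transmit = packet_bits / rate_bps            # L / R
    return min(1.0, window * t_transmit / (rtt_s + t_transmit))


L_BITS = 8_000        # 1 KB packet, counted as 8 kb as in the example
R_BPS = 1e9           # 1 Gbps link
RTT_S = 0.030         # 2 x 15 ms propagation delay

u_stop_and_wait = utilization(L_BITS, R_BPS, RTT_S)
throughput_bps = L_BITS / (RTT_S + L_BITS / R_BPS)   # one packet per (RTT + L/R)

print(f"U = {u_stop_and_wait:.5f}, throughput = {throughput_bps / 1e3:.0f} kbps")
```

Calling `utilization(L_BITS, R_BPS, RTT_S, window=3)` shows what happens when three packets are in flight per round trip instead of one.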
Pipelined protocols
(Why need ?)
• Pipelining: allowing sender to send multiple, "in-flight", yet-to-be-acknowledged pkts w/o waiting for ACKs
• For reliable data transfer:
  - the range of sequence numbers must be increased (not retx.)
  - need to buffer more than one packet at sender and/or receiver
• Seq.# range and buffering depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets.
• Two generic forms of pipelined protocols (pipelined with error recovery):
  Go-Back-N and Selective Repeat

Pipelining: increasing utilization
[Figure: sender/receiver timeline (assuming no error), filling a pipeline: first packet bit transmitted, t = 0; last bit transmitted, t = L / R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet, t = RTT + L / R (next cycle begins)]
    U sender = (3*L/R) / (RTT + L / R) = .024 / 30.008 = 0.0008 (0.08%), compared with 0.00027
  Increase utilization by a factor of 3!

Go-Back-N
(Preview: sliding window)
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unACKed pkts allowed (the window size)
• ACK(n): ACKs all pkts up to, including seq # n ~ "cumulative ACK" (Advantage: see Fig. 3.34)
  - may receive duplicate ACKs (see receiver) ?? You find it out.
• Set timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

GBN (Cont'd)
Receiver:
• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expected seqnum
• out-of-order packet:
  - discard (don't buffer) -> no receiver buffering
  - ACK pkt with highest in-order seq #
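The Go-Back-N sender and receiver rules above can be illustrated with a small simulation. It is a deliberately simplified sketch: the function `go_back_n` and its parameters are hypothetical, the sender works in rounds (send the whole window, then process the cumulative ACK), and a packet listed in `lose_first_tx` is lost only on its first transmission.

```python
from typing import List, Set


def go_back_n(num_pkts: int, window: int, lose_first_tx: Set[int]) -> List[int]:
    """Return the sequence numbers in the order they are put on the wire."""
    base = 0                      # oldest unACKed packet
    sent_log: List[int] = []
    already_lost: Set[int] = set()
    while base < num_pkts:
        # Sender: transmit every packet the window allows, starting at base.
        in_flight = list(range(base, min(base + window, num_pkts)))
        sent_log.extend(in_flight)

        # Receiver: accept only the expected in-order packet, discard the rest.
        expected = base
        for seq in in_flight:
            if seq in lose_first_tx and seq not in already_lost:
                already_lost.add(seq)      # dropped by the channel
                continue
            if seq == expected:
                expected += 1              # delivered; cumulative ACK advances
            # otherwise: out of order, discarded, duplicate ACK re-sent

        # Sender: the cumulative ACK slides the window. If some packet was lost,
        # the timeout makes the next round resend it and all higher packets.
        base = expected
    return sent_log


print(go_back_n(num_pkts=6, window=4, lose_first_tx={2}))
```

With packet 2 lost once, packet 3 is transmitted twice even though it arrived intact the first time, which is the cost of having no receiver buffering.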
GBN in action
[Figure: GBN in action; after a loss, the receiver discards the out-of-order pkts and the sender reTx after timeout]

Selective Repeat/Reject
• Receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• Sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• Sender window
  - N consecutive seq #'s
  - again limits seq #s of sent and unACKed pkts

Selective repeat: sender, receiver windows
(Read: Fig. 3.23-25 for sender's and receiver's events and actions)

Selective Repeat in action
[Figure: Selective Repeat in action with one loss; Window size = 4]
Selective Repeat: a dilemma
• Example:
  - seq #'s: 0, 1, 2, 3 (size = 4)
  - window size = 3 < Max seq #
  - Receiver sees no difference in both scenarios (a) and (b) (it cannot see through the "invisible curtain").
  - Incorrectly passes duplicate data as new in case (a)
  [Figure: (a) sender ReTx the 1st pkt (duplicate pkt 0); (b) sender Tx the 5th pkt (new pkt 0)]
• Q: To prevent this ambiguity, what should be the relationship between seq # size and window size? (Problem 3.18)
  A: sequence # space >= 2*window (Sec. 3.4.4)
• Q: How does the receiver handle out-of-order segments ?
  A: TCP spec doesn't say, ~ up to implementor (go-back-N or Selective Repeat)

Connection Maintenance (Ex: Telnet Scenario)
• an interactive application
• "echo back"
  -> ensure that characters seen by Telnet user have already been received and processed at remote site.
• Each character traverses the network twice
Segment exchange (Host A = client, Host B = server, across the Internet):
  1. User types 'C'. Host A -> Host B: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C', echoes back 'C'. Host B -> Host A: Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C'. Host A -> Host B: Seq=43, ACK=80, ...

TCP Connection Management
TCP Connection Termination
  client: active close (close a conn. || close a socket); server: passive close
  1. client -> server: FIN = M (FIN = 1, Seq# = M); client enters FIN_WAIT_1
  2. server -> client: ACK = M+1 (ACK = 1, Ack# = M+1); server enters CLOSE_WAIT, client enters FIN_WAIT_2
  3. server -> client: FIN = N; server enters LAST_ACK (closing)
  4. client -> server: ACK = N+1; client enters TIME_WAIT (2 MSL wait state; resend ACK in case it is lost)
  5. server enters CLOSED (if ACK rxed); after the timed wait, client enters CLOSED
  Resources at both C and S are deallocated.
  [Figure: TCP client lifecycle and TCP server lifecycle state diagrams]
• MSL = Max Segment Lifetime; MSL in RFC 793 = 2 min, max.
• Connection termination protocol -> 3-way but taking four segments
TCP: retransmission scenarios
[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); Seq=92 timeout; Host A resends Seq=92, 8 bytes data; Host B replies ACK=100. Duplicated. Host B's action?]
[Figure: premature timeout, cumulative ACKs. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; Seq=92 timeout expires early, so Host A resends Seq=92, 8 bytes data (new timeout for seq.=92); Host B replies ACK=120]

TCP Flow Control
flow control: sender won't overrun receiver's buffers by transmitting too much, too fast
• RcvBuffer = size of TCP Receive Buffer
• RcvWindow = amount of spare room in Buffer
• receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  - RcvWindow field in TCP segment
• sender: limits the amount of transmitted, unACKed data to no more than most recently received RcvWindow
  - guarantees receive buffer doesn't overflow
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
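The RcvWindow bookkeeping above can be written out directly. This is a sketch with hypothetical function names (`rcv_window`, `sender_may_send`); the variable names follow the slide, and the byte counts in the example are made up.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    advertised_window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= advertised_window


# Illustrative numbers: a 4096-byte buffer holding 2000 bytes the app has not read yet.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
```

The receiver recomputes the value every time the application reads data, which is why the advertised window changes dynamically.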
Flow Control - Sliding Window
• To improve the utilization of the channel in the cases of Tprop > Tframe by allowing multiple frames to be transmitted before receiving ACK(s)
  (to improve the performance of the stop-and-wait mechanism)
• To keep track of which frames have been ACKed while sending without waiting, each frame is labeled with a sequence number.
• Rule of sliding window:
  - Txer maintains a list of SEQ numbers that it is allowed to send
  - Rxer maintains a list of SEQ numbers that it is prepared to receive

Sliding window flow control (cont'd)
  - Frames are numbered (0 ~ 2^k - 1) modulo 2^k, k = # of bits in SEQ #
  - The window size <= 2^k (?), and the SEQ # has a bounded size since it occupies a field in the frame
  - Sender must buffer these frames in case they need to be retransmitted
• Applied to Go-back-N and Selective-reject ARQ, and LLC, HDLC, and X.25
[Figure: Example: window of frames with ACKs; RR ~ Receiver Ready (in HDLC)]
(Back to GBN)
TCP Flow Control - Credit Allocation
• Operation:
  - Sending TCP includes the SEQ # of the first byte in the segment field
  - Receiving TCP ACKs an incoming segment with (A=i, W=j), where
    A=i -> expecting SEQ = i and all SEQ prior to i are ACKed
    W=j -> granting of permission to send additional j (window) bytes, i.e., corresponding to SEQ # in i ~ (i+j-1)
• Some examples of granting credit [assuming Rxer just issued (A=i, W=j)]:
  - Rxer issues (A=i, W=k) to increase credit to k (k > j) when no additional data have arrived
  - Rxer issues (A=i+m, W=j-m) without granting additional credit to ACK an incoming segment containing m bytes (m < j)

Example
  - sending 200 bytes/segment; sending and receiving SEQ #s are synchronized through connection establishment
  - initial credit = 1400 bytes, and SEQ # = 1001
[Figure: credit allocation example starting from (A=1001, W=1400); labels: granted permission, remaining credits, + 600]

TCP Round Trip Time and Timeout
Q: How to set TCP timeout value?
• longer than RTT
  - note: RTT will vary
• too short: premature timeout,
  - unnecessary retransmissions
• too long: slow reaction to segment loss (which is unnecessary)
Q: How to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt, ignore retransmissions and cumulatively ACKed segments
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

Estimation of RTT
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
- Exponential weighted moving average (why?)
- influence of given sample decreases exponentially fast
- typical value of α = 0.125 (RFC 2988)
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: Sample RTT, Estimated RTT]
RTO (Retransmission Time Out)
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• First estimate of how much SampleRTT deviates from EstimatedRTT:
    DevRTT = (1-β)*DevRTT + β*|SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
• Then set timeout interval:
    TimeoutInterval (RTO) = EstimatedRTT + 4*DevRTT

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control (w.r.t. receiver)
• Manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
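Returning to the EstimatedRTT, DevRTT and TimeoutInterval (RTO) formulas, a minimal sketch of the bookkeeping follows. The class name `RttEstimator` is hypothetical, and two details are assumptions the slides do not state: the first sample initializes EstimatedRTT with DevRTT starting at 0, and DevRTT is updated using the previous EstimatedRTT before EstimatedRTT itself is updated.

```python
class RttEstimator:
    """Exponentially weighted moving averages for RTT and its deviation."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with the first sample
        self.dev_rtt = 0.0                  # assumption: no deviation known yet

    def update(self, sample_rtt: float) -> None:
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self) -> float:
        # TimeoutInterval (RTO) = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds, illustrative
est.update(200.0)                        # one unusually slow sample
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```

A single slow sample moves EstimatedRTT only slightly but widens the safety margin noticeably, which is the intended behavior when RTT varies a lot.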
Approaches towards congestion control
• Two broad approaches towards congestion control:
1. End-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP
2. Network-assisted congestion control:
  • routers provide feedback to end systems
    - single bit indicating congestion (SNA, DECbit, TCP/IP ECN (RFC 2481), ATM)
    - explicit rate sender should send at (Ex: choke packet in PSN)
Ref: Sec 3.6.2~3.6.3

Case study: ATM ABR congestion control
• ABR: available bit rate
  • "elastic service"
  • if sender's path "underloaded":
    - sender should use available bandwidth
  • if sender's path congested:
    - sender throttled to minimum guaranteed rate
• RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • Two bits in RM cell set by switches (i.e., "network-assisted")
    - NI bit: No Increase in rate (mild congestion)
    - CI bit: Congestion Indication
  • RM cells returned to sender by receiver, with bits intact
TCP Congestion Control : Overview
• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked <= CongWin
• Roughly,
    rate = CongWin / RTT  Bytes/sec
• CongWin is dynamic, function of perceived network congestion
• How does sender perceive congestion?
  • loss event = timeout or 3 duplicate acks
  • TCP sender reduces rate (CongWin) after loss event
• Three mechanisms:
  - AIMD
  - Slow start
  - Congestion Avoidance

TCP Congestion Control (cont'd)
• "probing" for usable bandwidth:
  - ideally: transmit as fast as possible (Congwin as large as possible) without loss
  - increase Congwin until loss (congestion)
  - loss happened: decrease Congwin, then begin probing (increasing) again
• Important variables:
  - Congwin:
    ~ congestion window size
  - threshold:
    ~ defines threshold between the slow start phase and the congestion control phase
Abbreviations:
• Congwin = cwnd
• threshold = ssthresh

I. TCP AIMD Congestion Control
• Additive Increase:
  ~ increase CongWin by 1 MSS every RTT in the absence of loss events: probing
• Multiplicative Decrease:
  ~ cut CongWin in half after loss event
[Figure: AIMD Operation: congestion window (8, 16, 24 Kbytes) vs. time]

II. Slow Start
• When connection begins, increase rate exponentially fast until first loss event.
• Operation:
  - Initializing cwnd = 1 (1 MSS) whenever opening a new connection
  - Increasing cwnd by 1 (up to a Max) every time an ACK is received
  - At any time, TCP measures the congestion window in segments and restrains the transmission by
      awnd = Min { credit, cwnd }
      awnd = allowed window (currently allowed to send w/o receiving ACKs)
      cwnd = congestion window (used at startup and reduced during congestion)
      credit = receiver advertised window (used to calculate window/segment size)
• Slow start probes the internet to make sure not to send too many segments into an already congested network
• Connection's data flow is controlled by the incoming ACK (not cwnd)
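The slow start operation above (cwnd starts at 1, grows by 1 per ACK, and transmission is limited by awnd = Min { credit, cwnd }) can be traced with a small sketch. The function `slow_start_rounds` is hypothetical, and it assumes no loss, a fixed receiver credit, and one ACK per segment sent in each round trip.

```python
from typing import List


def slow_start_rounds(credit: int, rounds: int) -> List[int]:
    """Allowed window (in segments) for each round trip during loss-free slow start."""
    cwnd = 1                        # initialized to 1 MSS on a new connection
    allowed: List[int] = []
    for _ in range(rounds):
        awnd = min(credit, cwnd)    # awnd = Min { credit, cwnd }
        allowed.append(awnd)
        cwnd += awnd                # each of the awnd ACKs increases cwnd by 1
    return allowed


print(slow_start_rounds(credit=16, rounds=6))
```

Adding 1 per ACK doubles cwnd every round trip, so the growth is exponential until the receiver's credit becomes the limit.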
Slow Start Operation
Initialization -> a new connection
- A is sending 100-byte segments
[Figure: segment exchange over the 1st, 2nd, 3rd and 4th RTT; labels: SN = 1, ACK = 101, SN = 101, SN = 201, ACK = 201, SN = 701, ACK = 801, SN = 1401, ACK = 1501]
- A can fill the pipe with a continuous flow of segments after approximately FOUR RTTs
Really slow ?
• Slow start may be a misnomer since cwnd grows exponentially (pretty much close to)

III. Congestion Avoidance
• Also, Dynamic Window sizing on Congestion (Jacobson [88/95])
  ~ modified the growth of cwnd from exponential to linear
  ~ a way to deal with the segment loss:
    a timeout occurring and receipt of duplicate ACKs
• Operation:
  - Begin with slow start algorithm until a congestion occurs:
  - Set ssthresh (a slow start threshold) = cwnd/2
  - Set cwnd = 1 and perform slow start process (i.e., increase cwnd by 1 for every ACK received) until cwnd = ssthresh
  - For cwnd >= ssthresh, increase cwnd by one for each round-trip time (RTT)

Congestion avoidance (cont'd)
• Example
  - check how long it would take to recover the cwnd level before congestion ?
[Figure: Comparison of Slow Start and Congestion Avoidance: cwnd vs. (RTT), 1st RTT to 6th RTT; slow start (exponential growth of cwnd) ending with a timeout (counted as ONE more RTT); ssthresh = 8; after the timeout, exponential growth of cwnd up to ssthresh, then linear growth of cwnd; cwnd = 9]
(See what the textbook says.)
TCP Slow Start Algorithm
Slowstart algorithm
  initialize: Congwin = 1
  for (each segment ACKed)
      Congwin++
  until (loss event OR CongWin > threshold)
• exponential increase (per RTT) in window size (not so slow!)
• loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one burst per RTT, over time]

TCP Congestion Avoidance : Tahoe
TCP Tahoe Congestion avoidance
  /* slowstart is over */
  /* Congwin > threshold */
  Until (loss event) {
      every w segments ACKed:
          Congwin++
  }
  threshold = Congwin/2
  Congwin = 1
  perform slowstart
TCP Congestion Avoidance : Reno
• Three duplicate ACKs (Reno TCP):
  • Some segments are getting through correctly!
  • Don't "overreact" by decreasing window to 1 as in Tahoe
    - decrease window size by half
• 3 dup ACKs indicates network capable of delivering some segments

TCP Reno Congestion avoidance
  /* slowstart is over */
  /* Congwin > threshold */
  Until (loss event) {
      every w segments ACKed:
          Congwin++
  }
  threshold = Congwin/2      /* computed from Congwin at the time of loss */
  If (loss detected by timeout) {
      Congwin = 1
      perform slowstart }
  If (loss detected by triple duplicate ACK) {
      Congwin = Congwin/2,
      Congwin increases linearly }

TCP Reno versus TCP Tahoe:
[Figure: Fig. 3-51 Evolution of TCP Congestion window (Tahoe and Reno); x-axis: Transmission round (1 to 15); y-axis: congestion window size (segments), 0 to 14; (variable) threshold marked; series: TCP Tahoe, TCP Reno]
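The difference between Tahoe and Reno can be traced round by round with a short simulation. This is a sketch under simplifying assumptions that are not in the slides: the function `simulate` is hypothetical, the window is counted in whole segments and updated once per transmission round, slow start is capped at the threshold, and the initial threshold of 8 segments and the loss in round 8 are illustrative values.

```python
from typing import Dict, List


def simulate(variant: str, rounds: int, loss_rounds: Dict[int, str],
             threshold: int = 8) -> List[int]:
    """Congestion window (segments) at the start of each transmission round.

    loss_rounds maps a round number to "timeout" or "dupack" (triple duplicate ACK).
    """
    cwnd = 1
    trace: List[int] = []
    for rnd in range(1, rounds + 1):
        trace.append(cwnd)
        event = loss_rounds.get(rnd)
        if event is not None:
            threshold = max(cwnd // 2, 1)        # threshold = Congwin/2
            if variant == "reno" and event == "dupack":
                cwnd = threshold                 # halve, then increase linearly
            else:
                cwnd = 1                         # Tahoe, or any timeout: slow start
        elif cwnd < threshold:
            cwnd = min(cwnd * 2, threshold)      # slow start: doubles per round
        else:
            cwnd += 1                            # congestion avoidance: +1 per round
    return trace


tahoe = simulate("tahoe", 15, {8: "dupack"})
reno = simulate("reno", 15, {8: "dupack"})
print("Tahoe:", tahoe)
print("Reno: ", reno)
```

After the same loss event, Reno resumes from half the window while Tahoe restarts from 1 and needs several rounds of slow start to get back to the new threshold.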
TCP Fairness
• TCP Fairness goal:
  ~ if K TCP connections pass through a router (share same bottleneck link), each TCP should get R/K of link capacity
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
• Example: link of rate R supporting 9 connections;
  - What if new app asks for 1 TCP, gets rate R/10
  - What if new app asks for 11 TCPs, gets what ? (A: R/2)

How TCP approaches fairness ?
(Joined) Throughput Realized by Two TCPs
• Two competing sessions:
  - Additive increase gives slope of 1, as throughput increases
  - multiplicative decrease decreases throughput proportionally
• Goal: having achieved throughput fall somewhere around intersection
[Figure: Connection 2 throughput vs. Connection 1 throughput, both from 0 to R, with the "equal bandwidth share" line; from an assumed starting point the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
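The convergence argument for two competing sessions can be checked numerically. This is a sketch under strong assumptions that the slides only imply: the function `aimd_two_flows` is hypothetical, both connections have the same RTT, each adds one unit of throughput per round, and both see a loss in the same round whenever their combined throughput exceeds the capacity R.

```python
from typing import Tuple


def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> Tuple[float, float]:
    """Throughputs of two AIMD connections sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: decrease window by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: additive increase
    return x1, x2


# Start far from the fair share: connection 2 holds most of a link with R = 100.
a, b = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=500)
print(a, b)
```

Additive increase keeps the gap between the two throughputs unchanged, while every multiplicative decrease halves it, so the operating point drifts toward the equal bandwidth share line.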
The End
Understanding the Computer