Transport Layer
Chapter 6
• Transport Service
• Elements of Transport Protocols
• Congestion Control
• Internet Protocols – UDP
• Internet Protocols – TCP
• Performance Issues
• Delay-Tolerant Networking
Revised: August 2011
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
The Transport Layer
Responsible for delivering data
across networks with the desired
reliability or quality
Application
Transport
Network
Link
Physical
Transport Service
Data transport from a process on the source machine to a process on
the destination machine with a desired level of reliability that is
independent of the physical networks actually being used.
− Transport entity process can be physically located in the operating system
kernel, on a NIC, in a library package bound into the network applications, or
else in a separate user process.
− Transport code runs entirely on end users’ computers, in direct contrast to the
network layer, which runs mostly on routers.
• Services Provided to the Upper Layer »
• Transport Service Primitives »
• Berkeley Sockets »
• Socket Example: Internet File Server »
Services Provided to the Upper Layers (1)
Transport layer adds reliability to the network layer
• Offers connectionless (e.g., UDP) and connection-oriented (e.g., TCP) service to applications
Services Provided to the Upper Layers (2)
New term: “segments” – transport layer “packets”
Transport layer sends segments in packets (in frames)
Transport Layer Protocols in the Internet Protocol Suite:
1. UDP – connectionless
2. TCP – connection-oriented reliable service
3. SCTP (Stream Control Transmission Protocol) – emerging; tailored for multimedia
and wireless environments
4. SST (Structured Stream Transport) – experimental; finer grained TCP-like streams
5. DCCP (Datagram Congestion Control Protocol) – UDP with congestion control
Transport Service Primitives (1)
Service Primitives for TCP.
• Transport layer makes it possible for the (TCP) transport service
to be more reliable than the underlying network.
− TCP hides the imperfections of the network layer
− Transport Layer provides services to the application layer
• Greater reliability is accomplished entirely without any
dependence (or reliance) upon underlying network primitives.
TCP’s primitives that applications might call to transport
data for a simple connection-oriented service:
• Client calls CONNECT, SEND, RECEIVE, DISCONNECT
• Server calls LISTEN, RECEIVE, SEND, DISCONNECT
Transport Service Primitives (2)
State diagram for a simple connection-oriented service
Solid lines (right) show
client state sequence
Dashed lines (left) show
server state sequence
Transitions in italics are
due to segment arrivals.
Berkeley Sockets
Very widely used primitives that started with TCP on UNIX
• Notion of “sockets” as transport endpoints
• Like the simple primitive set above, plus SOCKET, BIND, and ACCEPT
Socket Example – Internet File Server (1)
Client code
...
See code on page 504
Get server’s IP
address
Make a socket
Try to connect
...
Socket Example – Internet File Server (2)
Client code (cont.)
...
Write data (equivalent to
send)
Loop reading (equivalent to
receive) until no more data;
exit implicitly calls close
Socket Example – Internet File Server (3)
Server code
...
See code on page 505
Make a socket
Assign address
Prepare for
incoming
connections
...
Socket Example – Internet File Server (4)
Server code (continued)
...
Block waiting for the
next connection
Read (receive) request
and treat as file name
Write (send) all file data
Done, so close this connection
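The four slides above annotate the textbook's client and server code, which is not reproduced in this transcript. As a rough stand-in, here is a minimal Python sketch of the same sequence of socket calls. The function names (make_listener, serve_one, fetch), the 4096-byte buffer size, and the "unknown file gives an empty reply" policy are illustrative assumptions, not the textbook's code.

```python
import socket
import threading  # only needed to run server and client in one process

def make_listener(host="127.0.0.1", port=0):
    """SOCKET + BIND + LISTEN: make a passive endpoint (port 0 lets the OS pick)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # make a socket
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))                                      # assign address
    s.listen(5)                                               # prepare for incoming connections
    return s

def serve_one(listener):
    """ACCEPT one connection, treat the request as a file name, send the file."""
    conn, _addr = listener.accept()                           # block waiting for a connection
    with conn:                                                # close when done
        # Simplification: assumes the whole name arrives in one recv() call.
        name = conn.recv(4096).decode().rstrip("\0")          # read (receive) request
        try:
            with open(name, "rb") as f:
                while chunk := f.read(4096):
                    conn.sendall(chunk)                       # write (send) file data
        except OSError:
            pass  # assumed policy: unreadable file -> empty reply

def fetch(host, port, name):
    """Client side: connect, send the file name, read until end of stream."""
    with socket.create_connection((host, port)) as s:         # socket + connect
        s.sendall(name.encode() + b"\0")                      # write (send)
        chunks = []
        while data := s.recv(4096):                           # loop reading until no more data
            chunks.append(data)
        return b"".join(chunks)
```

To try it locally, start serve_one(listener) in a threading.Thread, read the chosen port from listener.getsockname()[1], and call fetch("127.0.0.1", port, path).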
Elements of Transport Protocols
• Addressing »
• Connection establishment »
• Connection release »
• Error control and flow control »
• Multiplexing »
• Crash recovery »
Addressing
• Transport layer adds TSAPs
• Multiple clients and servers can run on a host with a single network (IP) address
• TSAPs are ports for TCP/UDP
This slide explains how the OSI protocols “address” services on adjacent layers
of the OSI reference model.
Instructor’s Slide Addition: Addressing
• Textbook describes Internet Port Addresses in terms of OSI’s “TSAP” concept
  • Your Instructor believes that this is confusing for those not familiar with OSI, even
    though these OSI concepts are helpful for a formal orientation.
  • This instructor-inserted slide seeks to summarize pgs 509-512 and 554 of the
    textbook in purely Internet terms, which he believes is much simpler.
• OSI’s TSAPs and Internet’s Ports both provide addresses for application-layer
  applications so that the transport layer can forward packets to the appropriate
  application layer processing entity (e.g., a process server local within that device
  such as Unix’s inetd).
• https://www.ietf.org/assignments/service-names-port-numbers/service-names-port-numbers.xml
• Security firewalls also extensively use port references and definitions
  to identify specific applications.
• Three types of Ports
  1. Well-known Ports: ports 0 – 1023
  2. Registered Ports: ports 1024 – 49151
     » Ports 0 through 49151 are formally registered worldwide through IANA (see above URL
       for current registration results)
  3. Dynamic Ports: ports 49152 – 65535
     » These ports are not registered through IANA but rather use a locally scoped port ID system
       such as portmapper (see textbook page 510)
     » Example is Sun Microsystems’ ONC RPC protocol system, which is a popular approach for remote
       procedure calls (see pages 543-546)
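The three port ranges listed on this slide can be captured in a few lines of Python. This is only an illustrative sketch; the function name port_class is hypothetical and the boundaries are exactly the ones stated above.

```python
def port_class(port: int) -> str:
    """Classify a TCP/UDP port number into the three ranges from the slide."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16 bits: 0..65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"
```

For example, port_class(80) falls in the well-known range, while port_class(49152) is the first dynamic port.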
Another Instructor Slide Insertion
• Tanenbaum is a European and therefore apparently thinks that connection-oriented
  systems are “normal” and connectionless systems “abnormal” or
  “unusual” or “bad” – they didn’t exist within OSI. Regardless, he introduces and
  describes Transport layer concepts solely in terms of the connection-oriented
  variant.
• The next 11 slides are true for the Internet’s TCP.
• They are NOT true for the Internet’s UDP, which provides a connectionless
  service to the application layer.
• Textbook doesn’t really explain this distinction until UDP and TCP are
  subsequently introduced late in the chapter, circa page 541
• This reflects the textbook’s origins: earlier editions were written in a time of
  protocol diversity and discussed data communications in terms of the OSI
  Reference Model and OSI Protocols. These foundational concepts still remain
  important to understand for protocol development and research.
Connection Establishment (1)
TCP is a connection-oriented protocol. It therefore provides services
common to other connection-oriented approaches (e.g., circuit switching)
Key problem is to ensure reliability even though packets
may be lost, corrupted, delayed, and duplicated
• Don’t treat an old or duplicate packet as new
• (Use ARQ and checksums for loss/corruption)
Approach:
• Don’t reuse sequence numbers within twice the MSL
(Maximum Segment Lifetime) of 2T=240 secs
• Three-way handshake for establishing connection
Connection Establishment (2)
Requirement: need a practical and foolproof way to reject delayed
duplicate segments to ensure that no session confusion can happen.
Use a sequence number space large enough that it will
not wrap, even when sending at full rate
• Clock (high bits) advances & keeps state over crash
Need seq. number not to
wrap within T seconds
Need seq. number not to
climb too slowly for too long
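To get a feel for "large enough that it will not wrap", the following sketch computes how long a sequence number space lasts at a given sending rate. It assumes, as in TCP, a 32-bit sequence number that counts bytes; the function name wrap_seconds and the example rates are illustrative assumptions.

```python
def wrap_seconds(rate_bps: float, seq_bits: int = 32) -> float:
    """Seconds until a byte-counting sequence number space wraps at rate_bps."""
    bytes_in_space = 2 ** seq_bits
    return bytes_in_space * 8 / rate_bps

T = 120  # seconds, the packet lifetime implied by "2T = 240 secs"
for rate in (56e3, 10e6, 1e9):
    t = wrap_seconds(rate)
    print(f"{rate:>12.0f} bps: wraps after {t:12.1f} s, safe within T: {t > T}")
```

Under these assumptions a 1 Gbps sender wraps the space in roughly 34 seconds, which is shorter than T, so at high rates the sequence number alone is not enough to rule out old duplicates.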
Connection Establishment (3)
Three-way handshake provides a foolproof way to ensure that an initial
CONNECTION_REQUEST sequence number is not a duplicate of a recent
connection (since seq numbers are not remembered across different connections
at a destination).
Three-way handshake used
for initial packet
• Since no state from
previous connection
• Both hosts contribute fresh
seq. numbers
• CR = Connection Request
Connection Establishment (4)
Three-way handshake protects against odd cases:
a) Duplicate CR. Spurious ACK does not connect
b) Duplicate CR and DATA. Same plus DATA will be rejected (wrong ACK).
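The two odd cases can be walked through with a toy model of both hosts. This is a simplified sketch rather than real TCP: the class names, the message tuples, and the REJECT message are assumptions chosen to mirror the CR / ACK / DATA exchange on these slides.

```python
from dataclasses import dataclass, field

@dataclass
class Host1:
    """Initiator: remembers only the CRs it really sent."""
    pending: set = field(default_factory=set)

    def connect(self, x):
        self.pending.add(x)
        return ("CR", x)

    def on_ack(self, y, ack):
        if ack in self.pending:            # ACK answers a CR we sent
            self.pending.discard(ack)
            return ("DATA", ack, y)        # DATA(seq=x, ack=y) completes the handshake
        return ("REJECT", y)               # spurious ACK: caused by an old duplicate CR

@dataclass
class Host2:
    """Responder: picks a fresh sequence number y for every CR."""
    next_seq: int = 100
    half_open: dict = field(default_factory=dict)   # y -> x
    established: set = field(default_factory=set)

    def on_cr(self, x):
        y = self.next_seq
        self.next_seq += 1
        self.half_open[y] = x
        return ("ACK", y, x)

    def on_data(self, x, ack):
        if self.half_open.get(ack) == x:   # DATA must acknowledge our fresh y
            del self.half_open[ack]
            self.established.add((x, ack))
            return True
        return False                       # wrong ACK: delayed duplicate DATA

    def on_reject(self, y):
        self.half_open.pop(y, None)        # drop the half-open connection
```

In case (a), Host 2 answers an old CR, Host 1 finds no matching pending request and replies REJECT, so no connection is made. In case (b), a delayed DATA segment carries an acknowledgement number that does not match Host 2's fresh y, so on_data returns False.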
Connection Release (1)
Key problem is to ensure
reliability while releasing
Asymmetric release (when
one side breaks connection)
is abrupt and may lose data
Connection Release (2)
Symmetric release (both sides agree to release) can’t
be handled solely by the transport layer
• Two-army problem shows pitfall of agreement
− Blue is more powerful than white, but can win only if its two parts
synchronize their attack. Their only communication path runs through the
white army, where a messenger may be captured; if the message never
arrives, the part that attacks alone is vulnerable.
Connection Release (3)
Normal release sequence, initiated by transport user on Host 1
• “Normal” in this context means “without lost packets”
• DR = Disconnect Request
• Both DRs are ACKed by the other side
Connection Release (4)
Error cases are handled with timer and retransmission
Final ACK lost,
Host 2 times out
Lost DR causes
retransmissions
Extreme: Many lost
DRs cause both
hosts to timeout
Error Control and Flow Control (1)
Error control is ensuring that the data is delivered with the desired level of
reliability, usually that all of the data is delivered without any errors.
Flow control is keeping a fast transmitter from overrunning a slow receiver.
Foundation for TCP error control is a sliding window (from
Link layer) with checksums and retransmissions
Flow control manages buffering at sender/receiver
• Issue is that data goes to/from the network and applications at
different times
• Window tells sender available buffering at receiver
− Buffers are needed at both the sender and the receiver
− A host may have many connections, each of which is treated
separately, so it may need a substantial amount of buffering for the
sliding windows.
• Makes a variable-size sliding window
− Need to dynamically adjust buffer allocations as connections open and close and
traffic patterns change.
Error Control and Flow Control (2)
Different buffer strategies trade efficiency / complexity
a) Chained fixed-size buffers
b) Chained variable-size buffers
c) One large circular buffer
Error Control and Flow Control (3)
Example of how dynamic window management might work in a datagram
network with 4-bit sequence numbers. A and B are communicating hosts.
Flow control example: A’s data is limited by B’s buffer
[Figure: segments exchanged between A and B, with B’s buffer contents at each step; “…” means lost packet]
In line 16 above, B allocated more buffers but the allocation segment (TPDU) was lost. To prevent deadlocks like
this, each host should periodically send control segments giving the ack and buffer status on each connection.
Multiplexing
Multiplexing = sharing several conversations over the same connection
Kinds of transport / network sharing that can occur:
• Multiplexing: connections share a network address
• Inverse multiplexing: addresses share a connection
Multiplexing
Inverse Multiplexing
Question: If this example is for an IPv4 network, how many network interface
cards (NICs) are on that computer? How about an IPv6 network?
Crash Recovery
Network and Transport layers automatically handle network and router crashes.
Issue here is how the transport layer recovers from host (end system) crashes.
Want clients to be able to continue working if the server quickly reboots.
Rule of Thumb: Recovery from a layer N crash can only be done by layer N + 1
because only the higher level retains enough status info to reconstruct where it
was before the problem occurred.
Application needs to help recover from a crash
• Transport can fail since A(ck) / W(rite) not atomic.
Note: C = Crash
All 8 possible combos of client and server strategies are shown here.
For each strategy there is some sequence of events causing the protocol to fail.
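The claim that every combination has a failing event sequence can be checked by brute force. The sketch below is a simplified model under stated assumptions: the server either acknowledges then writes or writes then acknowledges, it may crash before, between, or after those two events, and the client learns of the crash knowing only whether it saw the ACK (S0 = no segment outstanding, S1 = one segment outstanding). All names are illustrative.

```python
from itertools import product

SERVER_ORDERS = {"ack-then-write": "AW", "write-then-ack": "WA"}

# Each client strategy maps "did the client see the ACK?" to "retransmit?".
CLIENT_STRATEGIES = {
    "always retransmit": lambda acked: True,
    "never retransmit": lambda acked: False,
    "retransmit in S0": lambda acked: acked,       # S0: ACK seen, nothing outstanding
    "retransmit in S1": lambda acked: not acked,   # S1: one segment still outstanding
}

def outcome(order, crash_after, retransmit):
    """Result when the server crashes after `crash_after` of its two events."""
    done = order[:crash_after]                 # events completed before the crash
    acked = "A" in done
    writes = ("W" in done) + retransmit(acked)  # 0, 1 or 2 copies written
    return ("LOST", "OK", "DUP")[writes]

def strategy_table():
    table = {}
    for (sname, order), (cname, policy) in product(
            SERVER_ORDERS.items(), CLIENT_STRATEGIES.items()):
        table[(sname, cname)] = [outcome(order, k, policy) for k in range(3)]
    return table

for combo, results in strategy_table().items():
    print(combo, results)
```

Each of the 8 rows contains at least one LOST or DUP, which is the point of the slide: without help from the application, no pairing of strategies is safe for every crash point.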
Congestion Control
Two layers are responsible for congestion control:
− Transport layer, controls the offered load [here]
» Internet relies heavily on the transport layer for congestion
control with specific algorithms built into TCP and other
protocols.
» Question: Is UDP among those “and other protocols”?
− Network layer, experiences congestion and signals it
using ECN [previous chapter]
• Desirable bandwidth allocation »
• Regulating the sending rate »
• Wireless issues »
Desirable Bandwidth Allocation (1)
Efficient use of bandwidth gives high goodput, low delay
• The metric here is load/delay – it will rise with load until delay starts
to climb rapidly
Goodput rises more slowly than
load when congestion sets in
Delay begins to rise sharply
when congestion sets in
Question: What packet behavior naturally happens at the Network Layer when
congestion appears that the “desired response” above is trying to correct?
Desirable Bandwidth Allocation (2)
This slide and the next one address the issue of how to divide bandwidth
between two different transport senders in order to resolve congestion
Definition: An allocation is max-min fair if the bandwidth given to one flow
cannot be increased without decreasing the bandwidth given to another
flow with an allocation that is no larger.
Fair use gives bandwidth to all flows (no starvation)
• Max-min fairness gives equal shares of bottleneck
Question: Does one always want
max-min fair use for all traffic?
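One common way to compute a max-min fair allocation is "progressive filling": raise all flow rates together and freeze the flows on any link that fills up. The sketch below illustrates that idea under assumptions; the function name, the two-link topology, and the unit capacities are hypothetical and are not taken from the slide's figure. Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def max_min_fair(capacity, paths):
    """capacity: {link: capacity}; paths: {flow: [links it crosses]} -> {flow: rate}.

    Assumes every flow crosses at least one link.
    """
    remaining = {link: Fraction(cap) for link, cap in capacity.items()}
    rate = {flow: Fraction(0) for flow in paths}
    active = set(paths)
    while active:
        # Equal share of the leftover capacity on each link still carrying active flows.
        share = {}
        for link, left in remaining.items():
            users = sum(1 for flow in active if link in paths[flow])
            if users:
                share[link] = left / users
        step = min(share.values())          # the tightest link sets the increment
        for flow in active:
            rate[flow] += step
            for link in paths[flow]:
                remaining[link] -= step
        # Flows crossing a saturated link are frozen at their current rate.
        active = {flow for flow in active
                  if all(remaining[link] > 0 for link in paths[flow])}
    return rate

rates = max_min_fair(
    {"L1": 1, "L2": 1},
    {"A": ["L1"], "B": ["L1", "L2"], "C": ["L2"], "D": ["L2"]},
)
print(rates)
```

In this made-up topology link L2 is the bottleneck for B, C and D, so each gets 1/3; A then takes the 2/3 left over on L1. No flow can be given more without taking bandwidth from a flow that has no more than it does.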
Desirable Bandwidth Allocation (3)
Max-Min Fair Example
We want bandwidth levels to converge quickly when
traffic patterns change
[Figure labels: Flow 1 slows quickly (rate decreases when Flow 2 starts); Flow 1 speeds up quickly when Flow 2 stops]
Example where bandwidth allocation changes over time and converges quickly
Regulating the Sending Rate (1)
Sender may need to slow down for different reasons:
1. Flow control, when the receiver is not fast enough [right]
− Previously discussed flow control with variable sized window to do this.
2. Congestion control, when the network is not fast enough [next slide]
A fast network feeding a low-capacity receiver
→ flow control is needed
Regulating the Sending Rate (2)
Our focus is dealing with
this problem – congestion
• The way that a transport
protocol should regulate the
sending rate depends on the
form of the feedback returned
by the network (see next slide).
• Feedback can be explicit
or implicit
• Feedback can be precise
or imprecise
A slow network feeding a high-capacity receiver
→ congestion control is needed
Regulating the Sending Rate (3)
Different congestion signals the network may use to tell
the transport endpoint to slow down (or speed up)
XCP – eXplicit Control Protocol
ECN – Explicit Congestion Notification
Feedback: packet loss at the IP layer → retransmitted packets → transport layer
slows down the sending rate for that flow
Regulating the Sending Rate (4)
This slide and the next report on the results of Chiu and Jain’s 1989 study
that contrasts responding to congestion feedback by additive responses (+1 Mbps) versus multiplicative responses (± 10%)
If two flows competing for the bandwidth of a single link increase /
decrease their bandwidth in the same way when the
network signals free/busy they will not converge to a
fair allocation
+ /– constant
+/– percentage
Congestion signal sent to both whenever the sum of their
usage crosses the efficiency line
Regulating the Sending Rate (5)
Pure additive or multiplicative responses do not converge to fairness.
The AIMD (Additive Increase Multiplicative Decrease)
control law does converge to a fair and efficient point!
• TCP uses AIMD for this reason (see slides 60 - 66)
• AIMD – additively increase bandwidth allocations but then
multiplicatively decrease them when congestion is signaled.
[Figure axes: User 1’s bandwidth, User 2’s bandwidth]
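A small simulation makes the convergence argument concrete. It is only a sketch under assumptions: two flows share one link, both see the congestion signal whenever their combined rate exceeds capacity, and the starting rates, capacity, and step sizes are made up for illustration.

```python
def simulate(x1, x2, capacity, steps, decrease):
    """Both flows add 1 when the link is free and apply `decrease` when it is busy."""
    for _ in range(steps):
        if x1 + x2 > capacity:               # congestion signal reaches both flows
            x1, x2 = decrease(x1), decrease(x2)
        else:
            x1, x2 = x1 + 1.0, x2 + 1.0      # additive increase
    return x1, x2

# AIMD: multiplicative decrease halves both rates, which also halves their difference.
aimd = simulate(1.0, 80.0, 100.0, 500, lambda x: x / 2)
# AIAD: additive decrease leaves the difference between the flows untouched.
aiad = simulate(1.0, 80.0, 100.0, 500, lambda x: x - 1.0)

print("AIMD gap:", abs(aimd[0] - aimd[1]))
print("AIAD gap:", abs(aiad[0] - aiad[1]))
```

Additive steps move both flows in parallel and so never change the gap between them; only the multiplicative decrease shrinks it, which is why the AIMD run ends with nearly equal rates while the purely additive run keeps its initial unfairness.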
Wireless Issues
Wireless links lose packets due to transmission errors
• Do not want to confuse this loss with congestion
• Or connection will run slowly over wireless links!
Strategy:
• Wireless links use ARQ, which masks errors
− Address causes for packet drop at data link layer to reduce false
congestion signals occurring to TCP from IP layer (the IP layer does not see
most drops)
Internet Protocols – UDP
• Introduction to UDP »
• Remote Procedure Call »
• Real-Time Transport »
Introduction to UDP (1)
UDP is a connectionless protocol for the Transport Layer
UDP (User Datagram Protocol) is a shim over IP
• Header has ports (TSAPs), length and checksum.
UDP has an optional checksum for extra reliability.
UDP does NOT do flow control, congestion control, or retransmission based
upon receipt of a bad segment. If any of these services are needed, then the
application will need to do them for itself.
Introduction to UDP (2)
Checksum covers UDP segment and IP pseudo-header
• Fields that change in the network are zeroed out
• Provides an end-to-end reliability delivery check to
help detect mis-delivered packets.
This IP pseudo-header is added to the UDP checksum calculation.
TCP uses this same pseudo-header for its checksum calculation.
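As an illustration of how the pseudo-header enters the calculation, here is a sketch of the IPv4 UDP checksum using the standard 16-bit one's-complement Internet checksum. The helper names and the example addresses and ports are assumptions made for this sketch.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, padding odd-length input with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)                # UDP header is 8 bytes
    # IPv4 pseudo-header: source, destination, zero byte, protocol 17 (UDP), UDP length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field zeroed
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF                # 0 is reserved for "no checksum" in IPv4

print(hex(udp_checksum("192.0.2.1", "192.0.2.2", 5000, 53, b"hello")))
```

A receiver repeats the sum over the pseudo-header and the received segment, checksum included; a result of 0xFFFF means no error was detected. Because the IP addresses are part of the sum, a segment delivered to the wrong host fails the check.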
RPC (Remote Procedure Call)
RPC connects applications over the network with the familiar
abstraction of procedure calls
• Stubs package parameters/results into a message
• UDP with retransmissions is a low-latency transport
• Sun’s ONC (Open Network Computing) is a very popular RPC-based product suite (e.g., NFS (network file system), port mapper, …)
Instructor Slide Addition: UDP RPC Example
• The Network File System (NFS) is one of the Sun ONC RPC (Open Network Computing
Remote Procedure Calls) series of protocols. Like other ONC RPC protocols, it was designed
to be locally used over LANs and operate with a minimum of latency, in this case to
transparently enable distributed file operations as if they were a local-computer only file
system. NFS therefore natively uses UDP as its transport layer protocol.
• In the early 1990s NFS also became widely deployed within Boeing’s factories. This proved to
be problematic for Boeing because its factories were such an unusually high EMI
(electromagnetic interference) environment that its WIRED Ethernet V2 protocols experienced
error rates high enough that NFS transmissions frequently timed out (i.e., failed).
• Boeing management tasked the instructor to represent Boeing to Sun Microsystems to
resolve this issue. The instructor proposed to Sun that NFS be optionally configurable to also
run over TCP protocols. Thus, most deployments would continue to be run over UDP but
Boeing’s factories could be optionally configured to use TCP instead.
• Sun was willing to implement this proposed change into their products, which is why NFS
today can also be optionally configured to use TCP transports. UDP remains the default.
• As you should expect by this point of the course, NFS deployments using TCP cannot directly
interoperate with NFS deployments using UDP. Therefore, servers used in both environments
need to be configured to support both transport variants of NFS (dual stacks).
Real-Time Protocol (1)
RTP (Real-time Transport Protocol) provides support for
sending real-time multimedia over UDP (e.g., voice, video, …)
• Often implemented as part of the application
Real-Time Protocol (2)
RTP header contains fields to describe the type of
media and synchronize it across multiple streams
• RTCP sister protocol helps with management tasks
Real-Time Protocol (3)
Buffer at receiver is used to delay packets and absorb
jitter so that streaming media is played out smoothly
Packet 8’s network delay is
too large for buffer to help
Constant rate
Variable rate
Constant rate
Real-Time Protocol (4)
High jitter, or more variation in delay, requires a larger
play-out buffer to avoid play-out misses
• Propagation delay does not affect buffer size
(i.e., the issue affecting human perceptions of streaming media is jitter,
not latency)
Buffer
Misses
Question: If propagation delay does not affect buffer size, what does?
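The play-out buffer behaviour on these slides can be modelled in a few lines. This is a simplified sketch under assumptions: packets are sent at a constant rate, the receiver starts playback a fixed buffer_delay after the first packet arrives, and a packet is a "miss" if its network delay exceeds that deadline. The delay values (in ms) are invented for illustration.

```python
def playout_misses(delays, buffer_delay):
    """Return the indexes of packets that arrive after their play-out time.

    delays[i] is packet i's one-way network delay. Packet i is due for play-out
    delays[0] + buffer_delay after it was sent, so it misses if it took longer.
    """
    deadline = delays[0] + buffer_delay
    return [i for i, d in enumerate(delays) if d > deadline]

delays = [50, 60, 55, 90, 52]                 # ms; packet 3 is badly delayed
print(playout_misses(delays, 20))             # small buffer
print(playout_misses(delays, 40))             # larger buffer

# Adding the same propagation delay to every packet changes nothing.
shifted = [d + 100 for d in delays]
print(playout_misses(shifted, 20))
```

In this model only the spread of the delays decides which buffer setting avoids misses; a constant offset added to every packet cancels out of the comparison.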
Instructor Slide Addition: RTCP
• The Real Time Control Protocol (RTCP) is a companion
protocol to RTP that is also defined in RFC 3550.
• RTCP handles feedback, synchronization, and user
interfaces to real time operations.
• Instructor was tasked by Microsoft to work within the IETF
to develop RTCP to flexibly enable a standards-based
user control for the Microsoft Netshow product’s
distributed multimedia Internet streaming operations:
streaming fast forward, pause, stop, rewind, skip to a
predetermined point, etc. It can change resolutions and do
a wide-variety of control tasks to the streamed multimedia.
Internet Protocols – TCP
The Transmission Control Protocol (TCP) is the Internet’s most
popular transport layer connection-oriented protocol. TCP only
accepts user data streams from local application processes. All TCP
connections are full duplex and Point-to-Point.
• The TCP service model »
• The TCP segment header »
• TCP connection establishment »
• TCP connection state modeling »
• TCP sliding window »
• TCP timer management »
• TCP congestion control »
Question: Can TCP be used to convey multicast traffic?
The TCP Service Model (1)
TCP provides applications with a reliable byte stream
between processes; it is the workhorse of the Internet
• Port Registry: https://www.ietf.org/assignments/service-names-port-numbers/service-names-port-numbers.xml
• Popular servers run on well-known ports
Ports are used by all Transport protocols as well as security Firewalls.
The TCP Service Model (2)
Applications using TCP see only the byte stream [right]
and not the segments [left], which are sent as separate
IP packets
Four segments, each with 512 bytes
of data and carried in an IP packet
2048 bytes of data
delivered to application
in a single READ call
A TCP connection is a byte stream, not a message stream. TCP breaks
the application’s data stream into pieces not exceeding 64 KB – in practice
often 1460 data bytes or less in order to fit into a single Ethernet frame.
Further fragmentation may be done at the IP Layer.
The TCP Segment Header
TCP header includes addressing (ports), sliding window
(seq. / ack. number), flow control (window), error control
(checksum) and more. See textbook pages 557 – 560.
TCP Connection Establishment
TCP is a connection-oriented protocol. This means that it has an explicit
connection setup, which is implemented as a 3-way handshake.
TCP sets up connections with the three-way handshake
• Release is symmetric, also as described before
• Note the SYN Flood attack (DoS) vulnerability described on page 561 of the
Textbook (cause: the listening process must remember its seq # for each
half-open connection; one defense: SYN cookies)
Normal case
Simultaneous connect
TCP Connection State Modeling (1)
The TCP connection finite state machine has more
states than our simple example from earlier.
TCP Connection State Modeling (2)
Solid line is the normal
path for a client.
Dashed line is the normal
path for a server.
Light lines are unusual
events.
Transitions are labeled
by the cause and action,
separated by a slash.
TCP Sliding Window (1)
TCP adds flow control to the sliding window as explained
previously (page 522+)
• i.e., TCP window mgmt decouples (by sending both
ACK and WIN) the issues of error control and flow control.
• ACK + WIN is the sender’s transmission limit
− WIN indicates receiver’s current buffer availability
Example of a TCP Window Mgmt where a receiver has a 4096-byte buffer
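The rule "ACK + WIN is the sender's transmission limit" reduces to one line of arithmetic. The sketch below uses the 4096-byte receiver buffer mentioned in the caption; the function name and the particular ACK/WIN values in the walk-through are illustrative assumptions.

```python
def usable_window(ack: int, win: int, next_seq: int) -> int:
    """Bytes the sender may still transmit: up to ack + win, minus what is already sent."""
    return max(0, ack + win - next_seq)

# Receiver has a 4096-byte buffer; the sender has already sent bytes 0..2047.
print(usable_window(ack=2048, win=2048, next_seq=2048))   # half the buffer is still free
# Sender sends 2048 more bytes; the application has not read anything yet.
print(usable_window(ack=4096, win=0, next_seq=4096))      # sender is blocked
# Application reads 2048 bytes and the receiver advertises the freed space.
print(usable_window(ack=4096, win=2048, next_seq=4096))   # sender may resume
```

Note how an ACK with WIN = 0 confirms delivery (error control) while still stopping the sender (flow control), which is the decoupling described above.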
TCP Sliding Window (2) – a Detail
Need to add special cases to avoid unwanted behavior
• E.g., silly window syndrome [e.g., one byte of data shown below]
• Solution: Receiver should not send window update until it can handle
the max advertised segment size
Silly window syndrome: Receiver application reads single
bytes, so sender always sends one byte segments
Question: Are there popular applications that usually send only one byte of data?
TCP Timer Management – a detail
TCP estimates retransmit timer values from segment RTTs
• Timeout value sensitive to variance in RTT as well as the smoothed RTT
− Tracks both average and variance (for Internet case)
• Timeout is set to average plus 4 x variance
LAN case – small,
regular RTT
Internet case –
large, varied RTT
Detail: there are actually 4 TCP timers – see pages 570-571
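A sketch of the estimator behind "average plus 4 x variance" follows. It assumes the usual smoothing weights (7/8 for the smoothed RTT, 3/4 for the variation term) and the RFC 6298 convention of initialising the variation to half the first sample; the class name and the sample values are illustrative. The "variance" tracked here is really a smoothed mean deviation, which is cheaper to compute than a true variance.

```python
class RtoEstimator:
    """Smoothed RTT and RTT variation, giving RTO = SRTT + 4 * RTTVAR."""

    def __init__(self, alpha=7 / 8, beta=3 / 4):
        self.alpha = alpha      # weight kept by the old SRTT
        self.beta = beta        # weight kept by the old RTTVAR
        self.srtt = None
        self.rttvar = None

    def update(self, sample: float) -> float:
        if self.srtt is None:                        # first measurement
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Update the variation first, using the SRTT from before this sample.
            self.rttvar = (self.beta * self.rttvar
                           + (1 - self.beta) * abs(self.srtt - sample))
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * sample
        return self.srtt + 4 * self.rttvar

lan, wan = RtoEstimator(), RtoEstimator()
for r in [10, 10, 11, 10, 10, 11]:
    lan_rto = lan.update(r)
for r in [100, 250, 120, 400, 90, 300]:
    wan_rto = wan.update(r)
print(round(lan_rto, 1), round(wan_rto, 1))
```

With regular LAN-like samples the timeout settles close to the RTT itself, while the widely varying Internet-like samples keep the timeout well above the average to avoid spurious retransmissions.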
TCP Congestion Control (1)
TCP uses AIMD (Additive Increase Multiplicative Decrease) with
loss signal to control congestion
• AIMD was previously covered on page 537 of textbook; slide 38
• Review: AIMD – additively increase bandwidth allocations but then
multiplicatively decrease them when congestion is signaled
− Implemented as a congestion window (cwnd) for the number of segments
that may be in the network
• However, TCP actually uses several mechanisms
that work together for congestion control:
Name | Mechanism | Purpose
ACK clock | Congestion window (cwnd) | Smooth out packet bursts
Slow-start | Double cwnd each RTT | Rapidly increase send rate to reach roughly the right level
Additive Increase | Increase cwnd by 1 packet each RTT | Slowly increase send rate to probe at about the right level
Fast retransmit / recovery | Resend lost packet after 3 duplicate ACKs; send new packet for each new ACK | Recover from a lost packet without stopping ACK clock
TCP Congestion Control (2)
Congestion window (cwnd) controls the sending rate
− cwnd = number of bytes the sender may have in the network at
any given time
» Question: how can the sender know its cwnd?
− Segment Sending Rate = cwnd / RTT
− Window can stop sender quickly either due to congestion or flow control
» Effective window is the smaller of what the sender and the receiver each think is the situation
TCP congestion control leverages the 4 techniques listed on prev slide:
1. ACK clock (regular receipt of ACKs) paces traffic and smooths out
sender bursts
− ACKs pace new segments into the network and smooth bursts
TCP Congestion Control (3)
2. Slow start grows congestion window exponentially
• Doubles every RTT while keeping ACK clock going
• Mechanism to rapidly increase the send rate to quickly find the
approximate maximum level
Increment cwnd for each new ACK
CWND = congestion window
TCP Congestion Control (4)
To keep slow start under control,
the sender keeps a threshold
for the connection called the slow start
threshold. Once threshold crossed, TCP
switches from slow start to additive
increase.
3. Additive increase grows
cwnd slowly (e.g., TCP
Tahoe)
• Adds 1 every RTT
• Conforms to the ACK
clock rate
TCP Congestion Control (5)
Slow start followed by additive increase (TCP Tahoe) at the slow start
threshold
− The threshold is half of the cwnd (congestion window) at the previous data loss
− The connection spends much of its time with a reasonable cwnd when slow-start and
additive increase are combined.
• Issue: we have to start over after packet loss because by the time the loss is
detected (e.g., 3 duplicate ACK # in ACKs received) the ACK clock has stopped.
Loss causes timeout;
ACK clock has stopped
so slow-start again
TCP Congestion Control (6)
However, with fast recovery, we get the classic sawtooth
(TCP Reno)
− Retransmit lost packet after 3 duplicate ACKs
− New packet for each dup. ACK until loss is repaired
The ACK clock doesn’t stop,
so no need to slow-start
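The difference between the Tahoe and Reno behaviour described on the last few slides shows up clearly in a coarse trace of cwnd per round trip. This is a simplified sketch, not a TCP implementation: cwnd is counted in segments, updated once per RTT, losses are injected at chosen RTTs, and the initial threshold of 32 segments and the function name are assumptions.

```python
def cwnd_trace(rtts, loss_at, fast_recovery, ssthresh=32):
    """Return cwnd (in segments) at the start of each RTT."""
    cwnd, trace = 1, []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            ssthresh = max(cwnd // 2, 2)              # threshold = half the window at loss
            cwnd = ssthresh if fast_recovery else 1   # Reno resumes there; Tahoe slow-starts
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)            # slow start: double each RTT
        else:
            cwnd += 1                                 # additive increase: +1 each RTT
    return trace

print("Tahoe:", cwnd_trace(10, {7}, fast_recovery=False))
print("Reno: ", cwnd_trace(10, {7}, fast_recovery=True))
```

Both traces climb identically until the loss at RTT 7. Tahoe then falls back to one segment and slow-starts again, while the Reno-style trace drops only to half the window and continues with additive increase, producing the sawtooth.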
TCP Congestion Control (7)
Much of TCP’s complexity comes from inferring from a stream of duplicate
ACKs which packets have arrived and which have been lost. SACK fixes
this.
SACK (Selective ACKs) extends ACKs with a vector to
describe received segments and hence losses
• Selectively ACKs up to three ranges of bytes for arrived segments
• Allows for more accurate retransmissions / recovery
No way for us to efficiently know that 2 and 5 were lost with only ACKs
End
Chapter 6