Outline
The Transport Layer
• The TCP Protocol (RFC 793, 1122, 1323,...)
– TCP Characteristics
– TCP Connection setup
– TCP Segments
– TCP Sequence Numbers
– TCP Sliding Window
– Timeouts and Retransmission
– (Congestion Control and Avoidance)
• The UDP Protocol (RFC 768)
1
Well known port numbers
• 0-1023 is managed by IANA, e.g.:
2
Review of the transport layer
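[Figure: Nick at Leland.Stanford.edu and Dave at Athena.MIT.edu communicate at the Application Layer. At the Transport Layer the O.S. on each host adds a Header (H) to the application Data (D). The Network Layer and Link Layer carry the resulting H + D units between the two hosts.]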
3
Layering: The OSI Model
[Figure: two end hosts, each with the layers Application (7), Presentation (6), Session (5), Transport (4), Network (3), Link (2) and Physical (1), connected through two Routers that implement only Network, Link and Physical. Peer-layer communication runs between equal layers on different machines; layer-to-layer communication runs between adjacent layers on the same machine.]
4
Layering: Our FTP Example
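[Figure: the FTP example mapped onto The 7-layer OSI Model and The 4-layer Internet model.]
Application, Presentation, Session (OSI): FTP, ASCII/Binary. Internet model: Application
Transport (OSI): TCP. Internet model: Transport
Network (OSI): IP. Internet model: Network
Link, Physical (OSI): Ethernet, or HDLC + V.35. Internet model: Link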
5
UDP, TCP, SCTP
6
TCP Characteristics
• TCP is connection-oriented.
– 3-way handshake used for connection setup; teardown uses 2 x 2-way handshake.
• TCP provides a stream-of-bytes service.
• TCP is reliable:
– Acknowledgements indicate delivery of data.
– Checksums are used to detect corrupted data.
– Sequence numbers detect missing, or mis-sequenced data.
– Corrupted data is retransmitted after a timeout.
– Mis-sequenced data is re-sequenced.
– (Window-based) Flow control prevents over-run of receiver.
• TCP uses congestion control to share network capacity among users.
7
TCP is connection-oriented
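Connection Setup: 3-way handshake
(Active) Client -> (Passive) Server: Syn
(Passive) Server -> (Active) Client: Syn + Ack
(Active) Client -> (Passive) Server: Ack
Connection Close/Teardown: 2 x 2-way handshake
(Active) Client -> (Passive) Server: Fin
(Passive) Server -> (Active) Client: (Data +) Ack
(Passive) Server -> (Active) Client: (Data)
(Passive) Server -> (Active) Client: Fin
(Active) Client -> (Passive) Server: Ack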
8
TCP supports a “stream of bytes” service
[Figure: a stream of bytes flowing between Host A and Host B.]
9
…which is emulated using TCP “segments”
[Figure: Host A sends TCP Data segments to Host B.]
Segment sent when:
1. Segment full (MSS bytes),
2. Not full, but times out, or
3. “Pushed” by application.
10
TCP segment format
11
Pseudo header used in checksum
• IP header
12
The TCP Segment Format
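[Figure: an IP datagram (IP Hdr + IP Data). The IP Data carries the TCP segment (TCP Hdr + TCP Data). The TCP header is drawn 32 bits wide, bits 0 to 31.]
Src port (bits 0-15) | Dst port (bits 16-31)
Sequence #
Ack Sequence #
HLEN (4 bits) | RSVD (6 bits) | Flags: URG, ACK, PSH, RST, SYN, FIN | Window Size
Checksum | Urg Pointer
(TCP Options)
TCP Data
• Checksum covers the TCP Header and Data + IP Addresses.
• Src/dst port numbers and IP addresses uniquely identify socket.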
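The fixed 20-byte part of the header above can be unpacked as a minimal Python sketch. The names `TcpHeader`, `parse_tcp_header` and `FLAG_BITS` are illustrative, not from the slides; options are ignored and the checksum is not verified.

```python
import struct
from typing import FrozenSet, NamedTuple

class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int          # in bytes (HLEN counts 32-bit words)
    flags: FrozenSet[str]
    window: int
    checksum: int
    urg_ptr: int

# Bit positions of the six flags in the low bits of the 16-bit HLEN/RSVD/Flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Unpack the fixed 20-byte TCP header (network byte order)."""
    src, dst, seq, ack, hlen_flags, window, checksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20])
    hlen_words = hlen_flags >> 12            # top 4 bits: header length in words
    flags = frozenset(n for n, bit in FLAG_BITS.items() if hlen_flags & bit)
    return TcpHeader(src, dst, seq, ack, hlen_words * 4, flags,
                     window, checksum, urg)

# Hand-built SYN segment: ports 49152 -> 80, ISN 1000, HLEN = 5 words, window 4096.
syn = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, (5 << 12) | 0x02, 4096, 0, 0)
hdr = parse_tcp_header(syn)
print(hdr)
```
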
13
TCP segment structure
[Figure: TCP header drawn 32 bits wide, with annotations.]
source port # | dest. port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | ptr urgent data
Options (variable length)
application data (variable length)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, tear down commands)
• checksum: Internet checksum (as in UDP)
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• rcvr window size: # bytes rcvr willing to accept
• Options, typically: maximum TCP payload (default is 536 bytes); window scale, selective repeat
14
Sequence Numbers
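[Figure: Host A and Host B exchange segments, each made of a TCP HDR and TCP Data. Numbering starts from the ISN (initial sequence number).]
• Sequence number = number of the 1st byte of data in the segment
• Ack sequence number = next expected byte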
15
Initial Sequence Numbers
Connection Setup: 3-way handshake
(Active) Client -> (Passive) Server: Syn + ISNA
(Passive) Server -> (Active) Client: Syn + Ack + ISNB
(Active) Client -> (Passive) Server: Ack
16
3-way Handshake for connection establishment
Host A
Host B
17
TCP application example
Host A (Client)            Host B (Server)
                           socket
                           bind
                           listen
                           accept (blocks)
socket
connect (blocks)
connect returns            accept returns
write                      read (blocks)
read (blocks)              read returns
                           write
read returns               read (blocks)
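The call sequence above can be tried on a single machine with Python's `socket` module. This is a minimal sketch: the loopback address, the OS-chosen port and the upper-casing "service" are assumptions made only so the example is self-contained.

```python
import socket
import threading

def server(listener: socket.socket) -> None:
    conn, _addr = listener.accept()       # accept (blocks) until the client connects
    with conn:
        request = conn.recv(1024)         # read (blocks) until data arrives
        conn.sendall(request.upper())     # write

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket
listener.bind(("127.0.0.1", 0))                                # bind (port 0: OS picks one)
listener.listen(1)                                             # listen
port = listener.getsockname()[1]
worker = threading.Thread(target=server, args=(listener,))
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)     # socket
client.connect(("127.0.0.1", port))       # connect (blocks) during the 3-way handshake
client.sendall(b"hello")                  # write
reply = client.recv(1024)                 # read (blocks) until the server answers
client.close()
worker.join()
listener.close()
print(reply)
```
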
18
TCP Window control
[Figure: timeline of segments and window updates exchanged between Host A and Host B at times t0 to t4.]
19
Connection Termination
[Figure: connection termination between Host A and Host B, including the step “Deliver 150 bytes”.]
20
TCP states
21
TCP finite state machine
22
Flow control problems
23
TCP window management
24
TCP flow control
• Window based
• Sender cannot send more data than a
window without acknowledgements.
• Window is a minimum of receiver’s buffer
and ‘congestion window’.
• After a window of data is transmitted, in
steady state, acks control sending rate.
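The window rule above can be written as a one-line calculation. A minimal sketch, assuming byte units; the function name `usable_window` and its parameters are illustrative.

```python
def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit without a new acknowledgement.

    rwnd: window advertised by the receiver (its free buffer space)
    cwnd: congestion window kept by the sender
    bytes_in_flight: data sent but not yet acknowledged
    """
    window = min(rwnd, cwnd)                  # the smaller of the two limits applies
    return max(0, window - bytes_in_flight)   # never negative

print(usable_window(rwnd=8192, cwnd=4096, bytes_in_flight=1000))
```
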
25
TCP Flow control
• Congestion window is increased gradually
• At the beginning, set cwnd = 1 (TCP segment)
• At the beginning, set threshold = 64K
• For each ack, increase cwnd by 1 segment, which doubles cwnd every round-trip time, until the threshold is reached (slow start)
• Increase by 1 for a window of acks after that (additive increase)
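A per-ACK sketch of this growth rule, under simplifying assumptions: cwnd and threshold are counted in segments rather than bytes, every segment is acknowledged individually, and nothing is lost. The names `on_ack` and `simulate` are illustrative.

```python
from typing import List

def on_ack(cwnd: float, threshold: float) -> float:
    """Return the new congestion window after one ACK arrives."""
    if cwnd < threshold:
        return cwnd + 1.0          # slow start: +1 segment per ACK
    return cwnd + 1.0 / cwnd       # additive increase: about +1 per window of ACKs

def simulate(threshold: float, rtts: int) -> List[float]:
    """Record cwnd at the start of each round-trip time."""
    cwnd = 1.0
    trace = [cwnd]
    for _ in range(rtts):
        for _ in range(int(cwnd)):            # one ACK per segment sent this round
            cwnd = on_ack(cwnd, threshold)
        trace.append(cwnd)
    return trace

trace = simulate(threshold=8, rtts=4)
print(trace)   # doubles each RTT up to the threshold, then grows by about 1
```
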
26
Slow Start
27
Additive Increase
28
Basic Control Model
• Reduce speed when congestion is perceived
– How is congestion signaled?
• Either mark or drop packets
– How much to reduce?
• Increase speed otherwise
– Probe for available bandwidth – how?
29
Phase Plots
• Simple way to visualize behavior of competing connections over time
[Figure: phase plot with User 1’s Allocation x1 on the horizontal axis and User 2’s Allocation x2 on the vertical axis.]
30
Phase Plots
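• What are desirable properties?
• What if flows are not equal?
[Figure: phase plot of User 2’s Allocation x2 against User 1’s Allocation x1. The Fairness Line and the Efficiency Line cross at the Optimal point. Beyond the Efficiency Line is Overload; inside it is Underutilization.]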
31
Additive Increase/Decrease
• Both X1 and X2 increase/decrease by the same amount over time
– Additive increase improves fairness and additive decrease reduces fairness
[Figure: phase plot of User 2’s Allocation x2 against User 1’s Allocation x1 with the Fairness Line and the Efficiency Line; the allocation moves between points T0 and T1.]
32
Multiplicative Increase/Decrease
• Both X1 and X2 increase by the same factor over time
– Extension from origin: constant fairness
[Figure: phase plot of User 2’s Allocation x2 against User 1’s Allocation x1 with the Fairness Line and the Efficiency Line; points T0 and T1 lie on a line through the origin.]
33
What is the Right Choice?
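• Constraints limit us to AIMD
– Can have multiplicative term in increase
– AIMD moves towards optimal point
[Figure: phase plot of User 2’s Allocation x2 against User 1’s Allocation x1 with the Fairness Line and the Efficiency Line; successive allocations x0, x1, x2 move toward the optimal point.]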
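The movement toward the optimal point can be checked numerically. This is a minimal sketch, assuming two users who both see overload at the same instant; the capacity, step size and starting allocations are made-up values, and `aimd` is an illustrative name.

```python
from typing import Tuple

def aimd(x1: float, x2: float, capacity: float, steps: int,
         increase: float = 1.0, decrease: float = 0.5) -> Tuple[float, float]:
    """Run additive-increase / multiplicative-decrease for two users."""
    for _ in range(steps):
        if x1 + x2 > capacity:     # overload: both cut by the same factor
            x1 *= decrease
            x2 *= decrease
        else:                      # spare capacity: both add the same amount
            x1 += increase
            x2 += increase
    return x1, x2

a, b = aimd(x1=80.0, x2=10.0, capacity=100.0, steps=500)
print(a, b, abs(a - b))
```

Additive increase leaves the gap between the two allocations unchanged, while every multiplicative decrease halves it, so the allocations drift toward the fairness line while oscillating around the efficiency line.
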
34
TCP Congestion Avoidance
Congestion avoidance
/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
  every w segments ACKed:
    Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart
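A runnable rendition of the pseudocode above, stepping once per round-trip time. It is a sketch under assumptions: windows are counted in segments, slow start is capped at the threshold, and losses are injected at chosen rounds through the hypothetical `loss_rounds` parameter.

```python
from typing import List, Set

def congwin_trace(threshold: int, loss_rounds: Set[int], rounds: int) -> List[int]:
    """Return Congwin at the start of each round-trip time."""
    congwin = 1
    trace = []
    for rtt in range(rounds):
        trace.append(congwin)
        if rtt in loss_rounds:                 # loss event
            threshold = max(congwin // 2, 1)   # threshold = Congwin/2
            congwin = 1                        # Congwin = 1, perform slowstart
        elif congwin < threshold:              # slow start: double per RTT
            congwin = min(2 * congwin, threshold)
        else:                                  # congestion avoidance
            congwin += 1                       # every w segments ACKed: Congwin++
    return trace

trace = congwin_trace(threshold=8, loss_rounds={6}, rounds=12)
print(trace)
```
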
35
TCP Congestion Control
• When TCP sender sees loss in the network,
TCP window is reduced (sending rate
slowed)
• In fact, TCP cuts the window size in half
whenever a loss occurs and then slowly
builds it back up
36
TCP Window Dynamics
37
TCP Sliding Window
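[Figure: the byte stream divided, left to right, into Data ACK’d | Outstanding Un-ack’d data | Data OK to send | Data not OK to send yet. The Window Size covers the outstanding un-ack’d data and the data OK to send.]
• Retransmission policy is “Go Back N”.
• Current window size is “advertised” by receiver (usually 4k – 8k Bytes when connection set-up).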
38
TCP Sliding Window
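[Figure: two timelines between Host A and Host B, each marking the Round-trip time and the Window Size.]
(1) RTT > Window size: Host A sends a full window and then sits idle (“???”) until the first ACK returns.
(2) RTT = Window size: ACKs return just as the window is used up, so Host A can keep sending.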
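The two cases can be put in numbers: a sender limited to one window per round trip cannot exceed window / RTT. A minimal sketch; the function names and the example link speed are assumptions.

```python
def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s

def link_utilization(window_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Fraction of the link the sender can keep busy (1.0 means never idle)."""
    return min(1.0, max_throughput_bps(window_bytes, rtt_s) / link_bps)

# 8 KB window, 100 ms round trip, 10 Mb/s link: the sender is idle most of the time.
print(max_throughput_bps(8192, 0.1))
print(link_utilization(8192, 0.1, 10_000_000))
```
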
39
TCP: Retransmission and
Timeouts
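[Figure: Host A sends Data1 and Data2 to Host B and receives an ACK for each. The Retransmission TimeOut (RTO) is the Estimated RTT, derived from the measured Round-trip time (RTT), plus a Guard Band.]
TCP uses an adaptive retransmission timeout value, because congestion and changes in routing make the RTT change frequently.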
40
RTT probability density
small network
large network
41
TCP Timeout
Q: how to set TCP timeout value?
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
• even worse: RTT fluctuates
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions, cumulatively ACKed segments
• SampleRTT will vary, want a “smoother” estimated RTT
– use several recent measurements, not just current SampleRTT
• Using the average of SampleRTT will generate many timeouts due to network variations
– consider variance as well
[Figure: frequency (freq.) plotted against RTT.]
42
TCP: Retransmission and Timeouts
Picking the RTO is important:
• Pick a value that’s too big and it will wait too long to retransmit a packet,
• Pick a value too small, and it will unnecessarily retransmit packets.
The original algorithm for picking RTO:
1. EstimatedRTT = α * EstimatedRTT + (1 - α) * SampleRTT
2. RTO = 2 * EstimatedRTT
Characteristics of the original algorithm:
• Variance is assumed to be fixed.
• But in practice, variance increases as congestion increases.
43
TCP: Retransmission and Timeouts
Newer Algorithm includes estimate of variance in RTT:
• Difference = SampleRTT - EstimatedRTT
• EstimatedRTT = EstimatedRTT + (δ * Difference)
• Deviation = Deviation + δ * ( |Difference| - Deviation )
• RTO = μ * EstimatedRTT + φ * Deviation
μ = 1, φ = 4
44
TCP Timeout: Initial Timeout
• Estimate the average of RTT
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
• exponential weighted moving average
• influence of given sample decreases exponentially fast
• typical value of x: 0.125
• Estimate the variance of RTT
Deviation = (1-x)*Deviation +
x*|SampleRTT-EstimatedRTT|
• Set initial timeout value
Timeout = EstimatedRTT + 4*Deviation
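The three update rules above fit in a few lines of Python. A minimal sketch, assuming the same x is used for both averages and that Deviation is computed from the already-updated EstimatedRTT, which is how the formulas read in order; `update_rtt` is an illustrative name.

```python
from typing import Tuple

def update_rtt(estimated_rtt: float, deviation: float, sample_rtt: float,
               x: float = 0.125) -> Tuple[float, float, float]:
    """Apply one SampleRTT and return (EstimatedRTT, Deviation, Timeout)."""
    # Exponential weighted moving average of the RTT.
    estimated_rtt = (1 - x) * estimated_rtt + x * sample_rtt
    # Moving average of how far samples stray from the estimate.
    deviation = (1 - x) * deviation + x * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * deviation
    return estimated_rtt, deviation, timeout

# Estimate of 100 ms with no deviation so far, then one slow 180 ms sample.
est, dev, timeout = update_rtt(100.0, 0.0, 180.0)
print(est, dev, timeout)
```

A single slow sample moves the estimate only slightly, but the deviation term widens the timeout at once.
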
45
An Example of Initial Timeout
[Figure: timeout value plotted together with per packet round-trip time.]
46
TCP: Retransmission and Timeouts
Karn’s Algorithm
[Figure: two timelines between Host A and Host B, each containing a Retransmission. Whether the ACK is matched to the original transmission or to the retransmission, the result can be a Wrong RTT Sample.]
Problem:
How can we estimate RTT when packets are retransmitted?
Solution:
On retransmission, don’t update estimated RTT (and double RTO).
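Karn's rule can be added to an RTT estimator as a guard on the update. A minimal sketch that reuses the original RTO = 2 * EstimatedRTT rule for brevity; the class `RetransmitTimer`, its method names and the default alpha are assumptions.

```python
class RetransmitTimer:
    """RTT estimator that ignores ambiguous samples (Karn's algorithm)."""

    def __init__(self, estimated_rtt: float, rto: float) -> None:
        self.estimated_rtt = estimated_rtt
        self.rto = rto

    def on_timeout(self) -> None:
        # The segment is retransmitted; back off by doubling the RTO.
        self.rto *= 2

    def on_ack(self, sample_rtt: float, was_retransmitted: bool,
               alpha: float = 0.875) -> None:
        if was_retransmitted:
            # The ACK could belong to either transmission: do not update.
            return
        self.estimated_rtt = (alpha * self.estimated_rtt
                              + (1 - alpha) * sample_rtt)
        self.rto = 2 * self.estimated_rtt

timer = RetransmitTimer(estimated_rtt=100.0, rto=200.0)
timer.on_timeout()                                      # RTO doubles to 400
timer.on_ack(sample_rtt=900.0, was_retransmitted=True)  # ignored
rto_after_retransmit = timer.rto
timer.on_ack(sample_rtt=120.0, was_retransmitted=False) # clean sample
print(rto_after_retransmit, timer.estimated_rtt, timer.rto)
```
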
47
TL: TCP flow control
enhancements
• Solutions to silly window syndrome
– Problem: sender sends in large blocks, but receiving application reads data 1 byte at a time
• Clark (1982)
– receiver avoidance
– prevent receiver from advertising small windows
– increase the advertised receiver window only when it can grow by at least min(MSS, RecvBuffer/2)
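Clark's receiver-side rule can be sketched as a check made before each window update. The function and parameter names are hypothetical, and the model is simplified: `current_offer` stands for the window the sender currently believes it has.

```python
def advertised_window(free_space: int, current_offer: int,
                      mss: int, recv_buffer: int) -> int:
    """Return the window to advertise, suppressing tiny increases."""
    step = min(mss, recv_buffer // 2)
    if free_space - current_offer >= step:
        return free_space          # worthwhile increase: open the window
    return current_offer           # too small: keep the old offer

# The application has read a single byte: the window stays closed.
print(advertised_window(free_space=1, current_offer=0, mss=1460, recv_buffer=8192))
# A full MSS of buffer space is free: the window opens.
print(advertised_window(free_space=1460, current_offer=0, mss=1460, recv_buffer=8192))
```
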
48
TL: TCP flow control
enhancements
• Nagle’s algorithm (1984)
– sender avoidance
– prevent sender from unnecessarily sending small packets
– http://www.rfc-editor.org/rfc/rfc896.txt
• “Inhibit the sending of new TCP segments when new outgoing
data arrives from the user if any previously transmitted data on
the connection remains unacknowledged”
• Allow only one outstanding small (not full sized) segment
that has not yet been acknowledged
• Works for idle connections (no deadlock)
• Works for telnet (send one-byte packets immediately)
• Works for bulk data transfer (delay sending)
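The sender-side decision can be sketched as a single predicate. This is a simplified reading of the rule quoted above: it ignores the receiver's window and any timers, and `nagle_can_send` is an illustrative name.

```python
def nagle_can_send(buffered_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether the sender may transmit a segment now."""
    if buffered_bytes >= mss:
        return True                      # a full-sized segment may always go
    # A small segment goes only if nothing sent earlier is still unacknowledged.
    return buffered_bytes > 0 and unacked_bytes == 0

print(nagle_can_send(1, 1460, 0))        # idle connection: first keystroke goes out
print(nagle_can_send(1, 1460, 1))        # earlier byte unacknowledged: wait
print(nagle_can_send(1460, 1460, 5000))  # bulk transfer: full segment goes out
```
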
49
TCP MSS
• Earlier
– 576 bytes for non-local destinations (other network)
– 1460 bytes for local destinations (same network)
• Now
– 1460 bytes and DF bit in IP header set
– ICMP message “fragmentation required, but not permitted” triggers reduction of MSS
• Workaround now
– Reset DF bit to “0”
50
User Datagram Protocol (UDP) Characteristics
• UDP is a connectionless datagram service.
– There is no connection establishment: packets may show up at any time.
• UDP packets are self-contained.
• UDP is unreliable:
– No acknowledgements to indicate delivery of data.
– The checksum is optional; when used, it covers the header and the data.
– Contains no mechanism to detect missing or mis-sequenced packets.
– No mechanism for automatic retransmission.
– No mechanism for flow control, and so can over-run the receiver.
51
User-Datagram Protocol (UDP)
[Figure: applications A1, A2, B1 and B2 (App) sit above UDP and IP inside the OS.]
Like TCP, UDP uses port number to demultiplex packets
52
UDP header
• UDP Checksum is optional (all-0 permitted)
53
User-Datagram Protocol (UDP)
• Why do we have UDP?
– It is used by applications that don’t need reliable delivery, or
– Applications that have their own special needs, such as streaming of real-time audio/video
• Connection-less: no time needed to set up connection, each packet (datagram) is independent
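The lack of connection setup shows up directly in the socket calls: the sender needs no connect, listen or accept. A minimal loopback sketch; the payloads and the 2-second timeout are made-up values, and on a real network either datagram could be lost or reordered.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # port 0: let the OS choose a free port
receiver.settimeout(2.0)                 # do not block forever if a datagram is lost
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No handshake: each datagram carries its own destination address and port.
sender.sendto(b"frame-1", ("127.0.0.1", port))
sender.sendto(b"frame-2", ("127.0.0.1", port))

first, _addr = receiver.recvfrom(2048)   # one recvfrom returns one whole datagram
second, _addr = receiver.recvfrom(2048)
sender.close()
receiver.close()
print(first, second)
```
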
54
Stream Control
Transmission Protocol
SCTP
55
SCTP open and close
56
Multiple interfaces
57
Stream vs. Message based
58