Download chapter5_1

Document related concepts

CAN bus wikipedia , lookup

Parallel port wikipedia , lookup

SIP extensions for the IP Multimedia Subsystem wikipedia , lookup

Zero-configuration networking wikipedia , lookup

Deep packet inspection wikipedia , lookup

I²C wikipedia , lookup

IEEE 1355 wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

Real-Time Messaging Protocol wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

UniPro protocol stack wikipedia , lookup

Internet protocol suite wikipedia , lookup

TCP congestion control wikipedia , lookup

Transcript
End-to-End Protocols
User Datagram Protocol (UDP)
Transmission Control Protocol(TCP)
1
Underlying Best-Effort Network
•
•
•
•
•
Drops messages
Re-orders messages
Delivers duplicate copies of a given message
Limits messages to some finite size
Delivers messages after an arbitrarily long delay
2
Common End-to-End Services
•
•
•
•
•
•
•
Guarantee message delivery
Deliver messages in the same order they are sent
Deliver at most one copy of each message
Support arbitrarily large messages
Support synchronization
Allow the receiver to flow control the sender
Support multiple application processes on each host
3
Transport Protocol
• All end-to-end services provided by transport
protocol
 Separate layer of protocol stack
 Conceptually between
- Applications layer
- IP layer
4
Terminology
• IP
- Provides computer-to-computer (or node to node)
communication in two different networks
- Source and destination addresses are computers
- Called machine-to-machine
 Transport protocols
- Provide application-to-application communication
- Need extended addressing mechanism to identify
applications (port numbers)
- Called end-to-end
5
Transport Protocol Functionality
• Identify sending and receiving applications (by
using port numbers)
 Optionally provide
- Reliability
- Flow control
- Congestion control
 Note: not all transport protocols provide above
facilities
6
Two Transport Protocols Available
• Transmission Control Protocol (TCP)
 User Datagram Protocol (UDP)
 Major differences
- Interface to applications
- Functionality
7
User Datagram Protocol (UDP)
• Provides unreliable transfer
 Requires minimal
- Overhead
- Computation
- Communication
 Best for LAN applications (because LAN is a very
reliable environment)
 Used to support audio/video streaming
applications or similar applications that does not
require absolute reliability
8
UDP Details
• Connectionless service paradigm
- Message-oriented interface
 Each message encapsulated in IP datagram
 UDP header identifies
- Sending application
- Receiving application
9
UDP Detail (cont’d)
•






Included in TCP/IP suite
Supports many protocol ports
Transports a message from one machine to another
Does not use acknowledgements
Does not order incoming messages
Does not do flow control
Application program is responsible to handle all problems
such as loss, duplication, delay, out of order delivery, etc.
10
Simple Demultiplexor (UDP)
• Adds multiplexing
• No flow control
• Endpoints identified by ports
0
16
SrcPort
31
DstPort
Length
Checksum
Data
– servers have well-known ports
– see /etc/services on Unix
• Header format
• Optional checksum
– pseudo header + UDP header + data
 Source port is optional (zero, if not used)
 Length includes all fields (Minimum value is 8)
11
UDP Checksum
• Optional: checksum can be used to verify delivery of
datagram to a correction destination
• Pseudoheader
– Consists of three fields from IP header: protocol number, source IP
address, and destination IP address plus the UDP length field
• UDP uses the same checksum algorithm as IP
• Note: UDP length is included twice in the calculation
12
UDP Pseudo-Header
• Used to verify that UDP has reached its
destination
• Used during UDP checksum computation
Zero
Source IP Address
Destination IP Address
Proto
UDP Length
13
UDP Protocol Layering
• Operates in the same layer as TCP
Application
Transport
Internet
Network Interface
Physical
14
UDP Encapsulation
 Entire UDP message is encapsulated in an IP datagram
UDP-Hdr Data Area
IP-Hdr
Frame-Hdr
IP Data Area
Frame Data Area
15
Note: The IP layer is responsible only for
transferring data between a pair of hosts on an
internet, while UDP layer is responsible only for
differentiating among multiple sources or
destinations within one host.
16
UDP Message Queue
Application
process
Application
process
Application
process
Ports
Queues
Packets
demultiplexed
UDP
Packets arrive
17
Identifying An Application
• Cannot extend IP address: No unused bits
 Cannot use OS-dependent quantity: Process ID, task
number or Job name
 Must work on all computer systems
 Invent new abstraction
- Used only with TCP/IP
- Identifies sender and receiver unambiguously
 Technique
- Each application assigned unique integer
- Called protocol port number
18
Protocol Ports
 Server
- Follows standard
- Always uses same port number
- Uses lower port numbers
 Client
- Obtains unused port from protocol software
- Uses higher port numbers
19
Protocol Port Example
• Domain name server application is assigned port 53
 Application using DNS obtains port 28900
 UDP datagram sent from application to DNS server has
- Source port number 28900
- Destination port number 53
 When DNS server replies, UDP datagram has
- Source port number 53
- Destination port number 28900
20
Common Port Assignments
Protocol
Echo
Port
7
Encoding
TCP/UDP
Daytime
FTP-Data
FTP
TELNET
SMTP
Whois
13
20
21
23
25
43
TCP/UDP
TCP
TCP
TCP
TCP
TCP
Finger
HTTP
POP3
79
80
110
TCP
TCP
TCP
NNTP
119
TCP
Purpose
Used to verify whether two machines can
communicate
Provides the current time on the server
Used to transfer files
To send ftp commands such as "put" and "get
For interactive, remote command-line sessions
To send email between machines
Directory service for Internet network
administrator
Gets information about a user or users
Protocol of the World Wide Web
Post Office Protocol Version 3 for transfer of
email from host to clients
Network News Transfer Protocol
21
Transmission Control Protocol
(TCP)
 Major transport protocol used in Internet
 Heavily used
 Completely reliable transfer
22
TCP Features
•






Connection-oriented service
Point-to-point
Full-duplex communication
Stream interface
Stream divided into segments for transmission
Each segment encapsulated in IP datagram
Uses protocol ports to identify applications
23
TCP Overview
• Connection-oriented
• Byte-stream
• Full duplex
• Flow control: keep sender
from overrunning receiver
• Congestion control: keep
sender from overrunning
network
– app writes bytes
– TCP sends segments
– app reads bytes
Application process
Application process
…
…
Write
bytes
TCP
Send buffer
Segment
Read
bytes
TCP
Receive buffer
Segment
…
Segment
Transmit segments
24
TCP Feature Summary
TCP provides a completely reliable (no data
duplication or loss), connection-oriented, fullduplex stream transport service that allows two
application programs to form a connection, send
data in either direction, and then terminate the
connection.
25
Data Link Versus Transport
• Potentially connects many different hosts
– need explicit connection establishment and termination
• Potentially different RTT
– need adaptive timeout mechanism
• Potentially long delay in network
– need to be prepared for arrival of very old packets
• Potentially different capacity at destination
– need to accommodate different node capacity
• Potentially different network capacity
– need to be prepared for network congestion
26
Apparent Contradiction




IP offers best-effort (unreliable) delivery
TCP uses IP
TCP provides completely reliable transfer
How is this possible?
27
Achieving Reliability
 Reliable connection startup
 Reliable data transmission
 Graceful connection shutdown
28
Reliable Data Transmission
• Positive acknowledgement
- Receiver returns short message when data arrives
- Called acknowledgement
 Retransmission
- Sender starts timer whenever message is transmitted
- If timer expires before acknowledgement arrives,
sender retransmits message
29
How Long Should TCP Wait Before
Retransmitting?
 Time for acknowledgement to arrive depends on
- Distance to destination
- Current traffic conditions
 Multiple connections can be open simultaneously
 Traffic conditions change rapidly
30
Important Point
The delay is required for data to reach a
destination and an acknowledgement to return
depends on traffic in the internet as well as the
distance to the destination. Because it allows
multiple application programs to communicate
with multiple destinations concurrently, TCP must
handle a variety of delays that can change rapidly.
31
Solving Retransmission Problem
 Keep estimate of roundtrip time on each
connection
 Use current estimate to set retransmission timer
 Known as adaptive retransmission
 Key to TCP's success
32
TCP Flow Control
 Receiver
- Advertises available buffer space
- Called window
 Sender
- Can send up to entire window before ack arrives
33
Window Advertisement
• Each acknowledgement carries new window
information
- Called window advertisement
- Can be zero (called closed window)
 Interpretation: I have received up through X, and
can take Y more octets
34
Startup And Shutdown
 Connection startup
- Must be reliable
 Connection shutdown
- Must be graceful
 Difficult
35
Why Startup/Shutdown Difficult
 Segments can be
-
Lost
Duplicated
Delayed
Delivered out of order
Either side can crash
Either side can reboot
 Need to avoid duplicate "shutdown" message from
affecting later connection
36
TCP's Startup/Shutdown Solution
 Uses three-message exchange
 Known as 3-way handshake
 Necessary and sufficient for
- Unambiguous, reliable startup
- Unambiguous, graceful shutdown
 SYN used for startup
 FIN used for shutdown
37
Connection Establishment and
Termination
Active participant
(client)
Passive participant
(server)
38
Illustration: Closing a connection
Host 1
Send FIN + ACK
.
.
Host 2
Receive FIN+ACK
Send FIN +ACK
Receive FIN+ACK
Send ACK
Receive ACK
39
TCP Segment Format
• All TCP segments have
same format
-
Data
Acknowledgement
SYN (startup)
FIN (shutdown)
 Segment divided into two
parts
- Header
- Payload area (zero or more
bytes of data)
 Header contains
 protocol port numbers to
identify sending and
receiving applications
 Bits to specify items such as
 SYN
 FIN
 ACK
 Fields for window
advertisement,
acknowledgement, etc.
40
TCP Segment Format
0
10
4
16
31
SrcPort
DstPort
SequenceNum
Acknow ledgment
HdrLen
0
Flags
AdvertisedWindow
Checksum
UrgPtr
Options (variable)
Data
• Sequence number specifies where in stream data belongs
• Acknowledgement field specifies the sequence number of first byte in stream
data to be received next (implying all previous bytes received correctly)
• Few segments contain options
41
Segment Format (cont)
• Each connection identified with 4-tuple:
– (SrcPort, SrcIPAddr, DsrPort, DstIPAddr)
• Sliding window + flow control
– acknowledgment, SequenceNum, AdvertisedWinow
Data(SequenceNum)
Sender
Receiver
Acknowledgment +
AdvertisedWindow
• Flags (6 bits)
– SYN, FIN, RESET, PUSH, URG, ACK
• Checksum
– pseudo header + TCP header + data
• HdrLen - Header length in 32-bit words
42
Flag/Code Bits in TCP Segment
• 6-bit field
• Tells how to interpret other fields in the header
• RST bit set to abort the connection during unexpected
situation (ex. Arrival of an unexpected segment )
Bit (left to right)
URG
ACK
PSH
RST
SYN
FIN
Meaning if bit set to 1
Urgent pointer field is valid
Acknowledgement field is valid
This segment requests a push
Reset the connection
Synchronize sequence numbers
Sender has reached end of its byte stream
43
Out of Band Data
• Data to be sent immediately
 Example: interrupting or aborting the program in a remote login
session.
 TCP option: specify data as urgent
 Set URG code bit
 Set the urgent pointer (UrgPtr) specifying where urgent data ends in
the segment (Urgent data is contained at the front the segment body)
 TCP tells the application to return to normal mode after consumption
of urgent data
• PUSH flag to tell receiving TCP to notify the receiving process about
the PUSH operation
44
TCP Checksum Computation
•





TCP prepends a pseudo header to the segment
Appends enough zero bits to make the segment a multiple of 16 bits
Computes 16-bit checksum over the entire result
Uses 16-bit arithmetic
Takes one's complement of the one's complement sum
Notes:
- TCP does not count the pseudo header or padding in the
segment length
- TCP does not transmit the pseudo header or padding
- TCP assumes the checksum field itself is zero before
computation
45
Format of Pseudoheader used in TCP
computation
0
8
ZERO
16
31
SOURCE IP ADDRESS
DESTINATION IP ADDRESS
PROTOCOL
TCP LENGTH
• PROTOCOL field - for IP datagrams, it is 6
• TCP LENGTH field - total length of TCP segment including
TCP header
46
Purpose of Pseudo Header
•
To verify the segment has reached its correct destination
(Port number & destination IP address)
 IP passes the source and destination IP addresses to the
receiving TCP from the datagram
 Receiving TCP performs the same computation
47
Timeout and Retransmission
• Note: TCP software must accommodate both the
vast differences in time required to reach various
destinations and the changes in time required to
reach a given destination as traffic load varies.
• Uses an adaptive retransmission algorithm
 Records the time at which each segment is sent
 Records the time at which an acknowledgement arrives for
the segment
 From the two times, it computes a sample round trip time
(RTT).
48
Adaptive Formulas
 From the sample RTT and old RTT, it estimates the round-trip time for
next segment
- Example:
RTT = (a * Old_RTT) + ((1 - a)* New_Round_Trip_Sample
where, 0 < a < 1
 Calculates a timeout value as a function of the current round trip
estimate
- Example
Timeout = b * RTT
where b > 1 (Recommended value for b is 2)
49
Accurate Measurement Of Round Trip
Samples
• TCP uses a cumulative acknowledgement
 ACK refers to data received but not to the instance of a
specific segment
 In case of retransmission, both the original and the
retransmitted segments carry exactly same data
 TCP cannot relate an ACK with the original or the
retransmitted segment
 The problem is called acknowledgement ambiguity
50
Questions
 How is it going to compute accurate sample RTT?
 What can happen, if it computes RTT from the time of
original transmission? (larger and larger timeout, becomes
unstable, why?)
 What can happen, if it computes RTT from the time of the
most recent transmission? (smaller and smaller but
becomes stable at 1/2 of the correct value, thus it sends
each segment twice even though no loss occurs)
51
Karn's Algorithm And Timer Backoff
• If the original transmission and the most recent transmission
both fail to provide accurate round trip times, what should
TCP do?
 Simple solution: Avoid ambiguous ACKs altogether; only
adjust the estimated round trip for unambiguous ACKs.
Known as Karn's algorithm
 It can lead to failure as well. When? How?
- When TCP sends a segment after a sharp increase in delay
and keeps computing a timeout using existing roundtrip
estimate. Timeout will be too small and will force
retransmission. Will never update the estimate and cycle will
continue.
52
Remedy
• Sender combines retransmission timeouts with a timer
backoff strategy
 Each time it must retransmit a segment, TCP increases the
timeout upto an upperbound (larger than the delay along
any path in the internet)
 Example implementation:
new_timeout = g * timeout
typically, g = 2.
53
Karn/Partridge Algorithm
Receiver
Sender
Receiver
SampleR TT
SampleR TT
Sender
• Do not sample RTT when retransmitting
• Double timeout after each retransmission
54
Final Form of Karn’s Algorithm
• When computing the round trip estimate, ignore samples
that correspond to retransmitted segments, but use a
backoff strategy, and retain the timeout value from a
retransmitted packet for subsequent packets until a valid
sample is observed.
 Also known as Karn/Partridge algorithm
 Experience shows that Karn's algorithm works well even in
networks with high packet loss
55
Responding To High Variance In Delay
• Research shows:
- Computations described above do not adapt to a wide
range of variations in delay.
 1989 specification for TCP
- Estimate both the average time and the variance
- Use the estimated variance in place of constant b
 Can adapt to a wider range of variation in delay
and yield substantially higher throughput.
56
New Set of Equations
•
•
•
•
DIFF = SAMPLE - Old_RTT
Smoothed_RTT = Old_RTT + d * DIFF
DEV = Old_DEV + r (|DIFF| - Old_Dev)
Timeout = Smoothed_RTT + h * DEV
where
DIFF is the estimated mean deviation
d is a fraction 0 to 1 (inverse power of 2)
r is fraction 0 to 1
h is a controlling factor ( h = 3, for 4.3BSD it was 2)
57
Example
• Given
– Old_RTT = 40 ms
– Sample_RTT = 25 ms
– Old_Dev = 10 ms
– and d = 1/4, r = 0.4, h = 3
• Calculate Timeout
• Solution
–
–
–
–
Diff = 25 - 40 = -15 ms
Smoothed_RTT = 40 + (1/4) * -15 = 36.25 ms
DEV = 10 + 0.4 * (15 - 10) = 12 ms
Timeout = 36.25 + 3 * 12 = 72.25 ms
58
Congestion
 Condition of severe delay
 Caused by an overload of datagrams at one or more
switching points (routers)
• Retransmissions aggravate congestion instead of
alleviating it. Why?
59
Congestion Collapse
• A condition at which the entire network becomes
useless due to congestion (a stalemate situation).
60
How to Avoid Congestion Collapse
• TCP must reduce transmission rates when congestion
occurs
• Routers watch queue lengths and use techniques like ICMP
source quench to inform hosts that congestion has occurred
61
TCP Standard Techniques To Avoid
Congestion
• Remember the size of the receiver's window
• Maintain a second limit, called the congestion window
limit as:
Allowed_window = min(receiver_advertisement,
congestion_window)
• Use the slow-start algorithm during recovery stage and the
multiplicative decrease algorithm for avoidance of
congestion
62
Multiplicative Decrease Congestion
Avoidance Algorithm
Upon loss of a segment:
– Reduce the congestion window by half (down to a
minimum of at least one segment)
– For those segments that remain in the allowed window,
backoff the retransmission timer exponentially.
63
Slow-Start (Additive) Recovery
Whenever starting traffic on a new connection or
increasing traffic after a period of congestion,
start the congestion window at the size of a single
segment and increase the congestion window by
one segment each time an acknowledgement
arrives (one more segment for each acknowledged
one).
Notice carefully, during favorable situation (no
congestion and no loss), the window will grow as
1, 2, 4, 8, 16, . . ., 2n segments
64
Congestion Avoidance Phase
Once the congestion window reaches one half of
its original size before congestion, TCP slows
down the rate of increment. It increases the
congestion window by 1 only if all segments in the
window have been acknowledged.
65
Sliding Window Revisited
Sending application
Receiving application
TCP
LastByteWritten
LastByteAcked
LastByteSent
• Sending side
– LastByteAcked < =
LastByteSent
– LastByteSent < =
LastByteWritten
– buffer bytes between
LastByteAcked and
LastByteWritten
TCP
LastByteRead
NextByteExpected
LastByteRcvd
• Receiving side
– LastByteRead <
NextByteExpected
– NextByteExpected < =
LastByteRcvd +1
– buffer bytes between
LastByteRead and
LastByteRcvd
66
Flow Control
• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
– LastByteRcvd - LastByteRead < = MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - (NextByteExpected LastByteRead)
• Sending side
– LastByteSent - LastByteAcked < = AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent LastByteAcked)
– LastByteWritten - LastByteAcked < = MaxSendBuffer
– block sender if (LastByteWritten - LastByteAcked) + y >
MaxSenderBuffer where y is the number of bytes sender is trying to
write
• Always send ACK in response to arriving data segment
• Persist when AdvertisedWindow = 0
67
Smart Sender/Dumb Receiver Rule
• Problem
Suppose
– Receiver notifies AdvertisedWindow = 0 in an ACK
segment
– Sender is not permitted to send any more segment.
Receiver does not spontaneously sends any non-data
segment
– How sender can discover that the advertised window is
no longer 0 in future
• Solution
– Sender side persists in sending a segment with 1 byte of
data.
68
Protection Against Wrap Around
• 32-bit SequenceNum
Bandwidth
T1 (1.5 Mbps)
Ethernet (10 Mbps)
T3 (45 Mbps)
FDDI (100 Mbps)
STS-3 (155 Mbps)
STS-12 (622 Mbps)
STS-24 (1.2 Gbps)
Time Until Wrap Around
6.4 hours
57 minutes
13 minutes
6 minutes
4 minutes
55 seconds
28 seconds
• Not enough for STS-12 and STS-24 as recommendation says that
seqnum should not wrap around within a 120-second period of time
69
Keeping the Pipe Full
• 16-bit AdvertisedWindow
• Assume, RTT = 100 ms
Bandwidth
T1 (1.5 Mbps)
Ethernet (10 Mbps)
T3 (45 Mbps)
FDDI (100 Mbps)
STS-3 (155 Mbps)
STS-12 (622 Mbps)
STS-24 (1.2 Gbps)
Delay x Bandwidth Product
18KB
122KB
549KB
1.2MB
1.8MB
7.4MB
14.8MB
70
TCP Extensions
•
•
•
•
Implemented as header options
Store timestamp in outgoing segments
Extend sequence space with 32-bit timestamp
Shift (scale) advertised window
71
Initial Sequence Numbers
• TCP software of both sides agree on initial sequence
numbers
 TCP software in each machine chooses an initial sequence
number at random
 Sequence numbers are sent and acknowledged during
handshake
72
TCP State Machine
• TCP operation can be modeled as a finite state
machine
 Terminology
- Passive open command
 To wait for a connection from another machine
- Active open command
 To initiate a connection
- Maximum segment lifetime
 Maximum time an old segment can remain alive in an internet
(120 seconds by current standard)
73
State Transition Diagram
CLOSED
Active open/SYN
Passive open
Close
Close
LISTEN
SYN_RCVD
SYN/SYN + ACK
Send/SYN
SYN/SYN + ACK
ACK
Close/FIN
SYN_SENT
SYN + ACK/ACK
ESTABLISHED
Close/FIN
FIN/ACK
FIN_WAIT_1
CLOSE_WAIT
FIN/ACK
ACK
Close/FIN
FIN_WAIT_2
CLOSING
FIN/ACK
ACK Timeout after two
segment lifetimes
TIME_WAIT
LAST_ACK
ACK
CLOSED
74
State Diagram Details
• All states above ESTABLISHED are involved in
opening a connection
• All states below ESTABLISHED are involved in
closing a connection
• Actual data transfer occurs in ESTABLISHED
state
• A state transition occurs when
– a segment arrives from the peer or
– the local application process invokes an operation on
TCP
• Each arc is labeled in the form of event/action
75
State Diagram Details (Contd)
• Establishing a connection (3-way handshake)
– Client state transitions:
• CLOSED -> SYN_SENT -> ESTABLISHED
– Server state transitions:
• LISTEN -> SYN_RECD->ESTABLISHED
• Three common ways to get a connection from
ESTABLISHED state to CLOSED state
– This sides closes first
• ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT
-> CLOSED
– The other side closes first
• ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
– Both sides close at the same time
• ESTABLISHED-> FIN_WAIT_1 -> CLOSING -> TIME_WAIT ->
CLOSED
76
TCP Performance
• TCP is a complex protocol
 Common misconception
- Since it is complex, the code must be cumbersome and
inefficient
 But experiments show:
 The same TCP that operates over the global Internet can
deliver 8 Mbps of sustained throughput of user data
between two workstations on a 10Mbps Ethernet.
77