Download ppt

Document related concepts

Piggybacking (Internet access) wikipedia , lookup

IEEE 802.1aq wikipedia , lookup

Multiprotocol Label Switching wikipedia , lookup

Airborne Networking wikipedia , lookup

Computer network wikipedia , lookup

Net bias wikipedia , lookup

Deep packet inspection wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

IEEE 1355 wikipedia , lookup

UniPro protocol stack wikipedia , lookup

Routing wikipedia , lookup

Routing in delay-tolerant networking wikipedia , lookup

Internet protocol suite wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

TCP congestion control wikipedia , lookup

Transcript
CS 54001-1: Large-Scale
Networked Systems
Professor: Ian Foster
TAs: Xuehai Zhang, Yong Zhao
Winter Quarter
www.classes.cs.uchicago.edu/classes/archive/2003/winter/54001-1
CS 54001-1 Course Goals

Yes
– Gain understanding of fundamental issues
that effect design, construction, and
operation of large-scale networked systems
– Gain understanding of some significant
future trends in network design and use

No
– Learn how to write network applications
CS 54001-1
Winter Quarter
2
Remember

I ask you to:
– Read Peterson and Davies Ch 1 and 2
– Read “End to End Arguments in System
Design”
– Use traceroute to determine paths to
following locations & build map of network
> ANL, IIT, NWU, UIC, Loyola, UIUC, Purdue, Indiana
CS 54001-1
Winter Quarter
3
Last Week:
Internet Design Principles & Protocols

An introduction to the mail system

An introduction to the Internet

Internet design principles and layering

Brief history of the Internet

Packet switching and circuit switching

Protocols

Addressing and routing

Performance metrics

A detailed FTP example
CS 54001-1
Winter Quarter
4
This Week: Routing and Transport

Routing techniques
– Flooding
– Distributed Bellman Ford Algorithm
– Dijkstra’s Shortest Path First Algorithm

Routing in the Internet
– Hierarchy and Autonomous Systems
– Interior Routing Protocols: RIP, OSPF
– Exterior Routing Protocol: BGP

Transport: achieving reliability

Transport: achieving fair sharing of links
CS 54001-1
Winter Quarter
5
Recap:
An Introduction to the Internet
gargoyle.cs.uchicago.edu
Athena.MIT.edu
Application Layer
Ian
Dave
Transport Layer
O.S.
D
Data
Header
Data
Network Layer
H
D
H
D
D
CS 54001-1
O.S.
Header
Winter Quarter
H
H
D
D
H
H
Link Layer
6
Characteristics of the Internet

Each packet is individually routed

No time guarantee for delivery

No guarantee of delivery in sequence

No guarantee of delivery at all!
– Things get lost
– Acknowledgements
– Retransmission
> How to determine when to retransmit? Timeout?
> Need local copies of contents of each packet.
> How long to keep each copy?
> What if an acknowledgement is lost?
CS 54001-1
Winter Quarter
7
Characteristics of the Internet (2)

No guarantee of integrity of data.

Packets can be fragmented.

Packets may be duplicated.
CS 54001-1
Winter Quarter
8
Size of the Routing Table at the
core of the Internet
Source: http://www.telstra.net/ops/bgptable.html
CS 54001-1
Winter Quarter
9
This Week: Routing and Transport

Routing techniques
– Flooding
– Distributed Bellman Ford Algorithm
– Dijkstra’s Shortest Path First Algorithm

Routing in the Internet
– Hierarchy and Autonomous Systems
– Interior Routing Protocols: RIP, OSPF
– Exterior Routing Protocol: BGP

Transport: achieving reliability

Transport: achieving fair sharing of links
CS 54001-1
Winter Quarter
10
The Problem
“A”
“B”
R2
R1
How does R1
choose a route
to host B?
CS 54001-1
Winter Quarter
R4
R3
11
Technique 1: Flooding
Routers forward packets to all ports
except the ingress port.
R1
Advantages:
 Every destination in the network is reachable.
 Useful when network topology is unknown.
Disadvantages:
 Some routers receive packet multiple times.
 Packets can go round in loops forever.
CS 54001-1
Winter Quarter
12
Technique 2: Bellman-Ford Algorithm
Objective: Determine the route from (R1, …, R7) to R8
that minimizes the cost. Examples of link cost:
Distance, data rate, price,
congestion/delay, …
1
R1
R2
R3
4
Winter Quarter
4
R4
2
2
CS 54001-1
1
R6
3
R5
2
R7
3
2
R8
13
Solution is simple by inspection...
(in this case)
1
R1
1
R2
R4
2
2
R3
4
R5
2
4
R6
3
R7
2
3
R8
The solution is a spanning tree with R8 as the root of the tree.
 The Bellman-Ford Algorithm finds the spanning tree automatically.

CS 54001-1
Winter Quarter
14
The Distributed Bellman-Ford Algorithm
1. Let X n  (C1 , C2 , ..., C7 ) where: Ci  cost from Ri to R8 .
2. Set X 0  (, , ,..., ).
3. Every T seconds, router i sends Ci to
its neighbors.
4. If router i is told of a lower cost path
to R8 , it updates Ci . Hence, X n 1  f ( X n )
where f (.) determines the next step improvement.
5. If X n 1  X n then goto step (3).
6. Stop.
CS 54001-1
Winter Quarter
15
Bellman-Ford Algorithm: Example




1
R1
R2
2
2
R3
R1
Inf
R2
Inf
R3
4, R8
R4
Inf
R5
2, R8
R6
2, R8
R7
3, R8
CS 54001-1
R1
1



1

2
Winter Quarter
4

R7
2
R4
R5
R6
2
3
R8

2
R3
3
1
R2
4
R4
R5
4
4
2
2
4
3
R7
3
R6
2
2
3
R8
16
R1
6, R3
R2
4, R5
R3
4, R8
R4
6, R7
R5
2, R8
R6
2, R8
R7
3, R8
R1
5, R2
R2
4, R5
R3
4, R8
R4
5, R2
R5
2, R8
R6
2, R8
R7
3, R8
CS 54001-1
Bellman-Ford Algorithm
6
4
1
R1
1
R2
2
2
4
2
3
2
4
R6
3
R7
2
R5
4
R3
R4
6
2
3
R8
Solution
5
R1
4
1
1
R2
2
4
R3
Winter Quarter
4
5
2
R4
3
2
R5
R6
2
R7 3
2
R8
17
Bellman-Ford Algorithm
Questions:
1.
2.
3.
How long can the algorithm take to run?
How do we know that the algorithm
always converges?
What happens when link costs change, or
when routers/links fail?
CS 54001-1
Winter Quarter
18
A Problem with Bellman-Ford
“Bad news travels slowly”
R1
1
R2
1
1
R3
R4
Consider the calculation of distances to R4:
Time
0
1
2
3
…
CS 54001-1
R1
R2
R3
3,R2
2,R3
1, R4
3,R2
2,R3
3,R2
3,R2
4,R3
3,R2
5,R2
4,R3
5,R2
“Counting
to
…
…infinity” …
Winter Quarter
R3
R4 fails
19
Counting to Infinity Problem
Solutions
1.
2.
3.
Set infinity = “some small integer” (e.g.,
16) Stop when count = 16
Split Horizon: Because R2 received lowest
cost path from R3, it does not advertise
cost to R3
Split-horizon with poison reverse: R2
advertises infinity to R3
CS 54001-1
Winter Quarter
20
Technique 3:
Dijkstra’s Shortest Path First Algorithm




Routers send out update messages
whenever the state of a link changes.
Hence the name: “Link State” algorithm
Each router calculates lowest cost path to
all others, starting from itself
At each step of the algorithm, router adds
the next shortest (i.e., lowest-cost) path to
the tree
Finds spanning tree routed on source
router
CS 54001-1
Winter Quarter
21
Dijkstra’s Shortest Path First
Algorithm: Example
Step 1:
Shortest path set, S  {R8 }. Candidate set, C  {R3 , R5 , R7 , R6 }
Step 2:
S  {R8 , R5 },
C  {R3 , R7 , R6 , R2 }.
Step 3:
S  {R8 , R5 , R6 },
C  {R3 , R7 , R2 , R4 }.
Step 4:
R8
R6
R5
R8
S  {R8 , R5 , R6 , R7 },
C  {R3 , R2 , R4 }.
CS 54001-1
R5
Winter Quarter
R6
R5
R7
R8
22
Dijkstra’s SPF Algorithm
Step 8 : S  {R8 , R5 , R6 , R7 , R2 , R1 , R4 },
C  {}.
R1
1
R2
1
R4
R6
2
R3
CS 54001-1
Winter Quarter
4
R5
2
R7
3
R8
23
This Week: Routing and Transport

Routing techniques
– Flooding
– Distributed Bellman Ford Algorithm
– Dijkstra’s Shortest Path First Algorithm

Routing in the Internet
– Hierarchy and Autonomous Systems
– Interior Routing Protocols: RIP, OSPF
– Exterior Routing Protocol: BGP

Transport: achieving reliability

Transport: achieving fair sharing of links
CS 54001-1
Winter Quarter
24
Routing in the Internet

The Internet uses hierarchical routing

Internet is split into Autonomous Systems (ASs)
– Examples of ASs: Stanford (32), HP (71), MCI
Worldcom (17373)
– Try: whois –h whois.arin.net ASN “MCI Worldcom”

Within an AS, the administrator chooses an
Interior Gateway Protocol (IGP)
– Examples of IGPs: RIP (rfc 1058), OSPF (rfc 1247).

Between ASs, the Internet uses an Exterior
Gateway Protocol
– ASs today use the Border Gateway Protocol, BGP-4
(rfc 1771)
CS 54001-1
Winter Quarter
25
AS ‘A’
Routing in the Internet
AS ‘B’
BGP
BGP
Interior Gateway
Protocol
Stub AS
CS 54001-1
AS ‘C’
Interior Gateway
Protocol
Interior Gateway
Protocol
Transit AS
Stub AS
e.g. backbone service provider
Winter Quarter
26
Routing within a Stub AS

There is only one exit point, so routers
within the AS can use default routing
– Each router knows all Network IDs within
AS
– Packets destined to another AS are sent to
the default router
– Default router is the border gateway to the
next AS

Routing tables in Stub ASs tend to be small
CS 54001-1
Winter Quarter
27
Interior Routing Protocols

RIP (Routing Information Protocol)
– Uses distributed Bellman-Ford algorithm
– Updates sent every 30 seconds
– No authentication
– Originally in BSD UNIX

OSPF (Open Shortest Path First)
– Link-state updates sent (using flooding) as
and when required
– Every router runs Dijkstra’s algorithm
– Authenticated updates
– Autonomous system may be partitioned into
“areas”
CS 54001-1
Winter Quarter
28
Exterior Routing Protocols
Problems:
– Topology: The Internet is a complex mesh of
different ASs with very little structure
– Autonomy of ASs: Each AS defines link costs in
different ways, so not possible to find lowest cost
paths
– Trust: Some ASs can’t trust others to advertise
good routes (e.g., two competing backbone
providers), or to protect the privacy of their traffic
(e.g., two warring nations)
– Policies: Different ASs have different objectives
(e.g., route over fewest hops; use one provider
rather than another)
CS 54001-1
Winter Quarter
29
Border Gateway Protocol (BGP-4)

BGP is not a link-state or distance-vector
routing protocol

BGP advertises complete paths (a list of ASs)

Example of path advertisement:
– “The network 171.64/16 can be reached via the
path {AS1, AS5, AS13}”.



Paths with loops are detected locally and
ignored
Local policies pick the preferred path among
options
When link/router fails, the path is “withdrawn”
CS 54001-1
Winter Quarter
30
This Week: Routing and Transport

Routing techniques
– Flooding
– Distributed Bellman Ford Algorithm
– Dijkstra’s Shortest Path First Algorithm

Routing in the Internet
– Hierarchy and Autonomous Systems
– Interior Routing Protocols: RIP, OSPF
– Exterior Routing Protocol: BGP

Transport: achieving reliability

Transport: achieving fair sharing of links
CS 54001-1
Winter Quarter
31
Outline

The Transport Layer

The TCP Protocol
– TCP Characteristics
– TCP Connection setup
– TCP Segments
– TCP Sequence Numbers
– TCP Sliding Window
– Timeouts and Retransmission
– (Congestion Control and Avoidance)

The UDP Protocol
CS 54001-1
Winter Quarter
32
The Transport Layer

What is the transport layer for?

What characteristics might it have?
– Reliable delivery
– Flow control
–…
CS 54001-1
Winter Quarter
33
Review of the Transport Layer
Athena.MIT.edu
Gargoyle.cs.uchicago.edu
Application Layer
Ian
Dave
Transport Layer
O.S.
D
Data
Header
Data
Network Layer
H
D
H
D
D
CS 54001-1
O.S.
Header
Winter Quarter
H
H
D
D
H
H
Link Layer
34
Layering: FTP Example
Application
Presentation
FTP
ASCII/Binary
Session
TCP
Transport
IP
Network
Ethernet
Link
Transport
Network
Link
Physical
The 7-layer OSI Model
CS 54001-1
Application
Winter Quarter
The 4-layer Internet model
35
TCP Characteristics

TCP is connection-oriented
– 3-way handshake used for connection setup

TCP provides a stream-of-bytes service

TCP is reliable:
– Acknowledgements indicate delivery of data
– Checksums are used to detect corrupted data
– Sequence numbers detect missing, or mis-sequenced
data
– Corrupted data is retransmitted after a timeout
– Mis-sequenced data is re-sequenced
– (Window-based) Flow control prevents over-run of
receiver

TCP uses congestion control to share network capacity
among users
CS 54001-1
Winter Quarter
36
TCP is connection-oriented
(Active)
Client
Syn
(Passive)
Server
Syn + Ack
Ack
(Active)
Client
Fin
(Passive)
Server
(Data +) Ack
Fin
Ack
Connection Setup
3-way handshake
CS 54001-1
Winter Quarter
Connection Close/Teardown
2 x 2-way handshake
37
TCP supports a “stream of bytes”
service
Host A
Host B
CS 54001-1
Winter Quarter
38
…which is emulated using TCP
“segments”
Host A
Segment sent when:
TCP Data
Host B
CS 54001-1
1. Segment full (MSS bytes),
2. Not full, but times out, or
3. “Pushed” by application.
TCP Data
Winter Quarter
39
The TCP Segment Format
IP Data
TCP Data
0
TCP Hdr
15
Src port
31
Dst port
Sequence #
Ack Sequence #
HLEN
4
RSVD
6
Flags
URG
ACK
PSH
RST
SYN
FIN
TCP Header
and Data + IP
Addresses
Checksum
IP Hdr
Window Size
Src/dst port numbers
and IP addresses
uniquely identify socket
Urg Pointer
(TCP Options)
TCP Data
CS 54001-1
Winter Quarter
40
Host A
Sequence Numbers
ISN (initial sequence number)
Sequence
number = 1st
byte
TCP Data
TCP Data
Host B
CS 54001-1
TCP
HDR
Winter Quarter
Ack sequence
number = next
expected byte
TCP
HDR
41
Initial Sequence Numbers
(Active)
Client
(Passive)
Server
Syn +ISNA
Syn + Ack +ISNB
Ack
Connection Setup
3-way handshake
CS 54001-1
Winter Quarter
42
TCP Sliding Window



How much data can a TCP sender have
outstanding in the network?
How much data should TCP retransmit
when an error occurs? Just selectively
repeat the missing data?
How does the TCP sender avoid overrunning the receiver’s buffers?
CS 54001-1
Winter Quarter
43
TCP Sliding Window
Window Size
Data ACK’d
Outstanding
Un-ack’d data
Data OK
to send
Data not OK
to send yet
Retransmission policy is “Go Back N”
 Current window size is “advertised” by receiver
(usually 4k – 8k Bytes when connection set-up)

CS 54001-1
Winter Quarter
44
TCP Sliding Window
Round-trip time
Round-trip time
Window Size
Host A
Host B
???
Window Size
ACK
(1) RTT > Window size
CS 54001-1
Winter Quarter
Window Size
ACK
ACK
(2) RTT = Window size
45
TCP: Retransmission and Timeouts
Round-trip time (RTT)
Retransmission TimeOut (RTO)
Guard
Band
Host A
Estimated RTT
Data1
Data2
ACK
ACK
Host B
TCP uses an adaptive retransmission timeout value:
Congestion RTT changes
Changes in Routing frequently
CS 54001-1
Winter Quarter
46
TCP: Retransmission and Timeouts
Picking the RTO is important:


Pick a values that’s too big and it will wait too long to
retransmit a packet,
Pick a value too small, and it will unnecessarily retransmit
packets.
The original algorithm for picking RTO:
1. EstimatedRTT =  EstimatedRTT + (1 - ) SampleRTT
2. RTO = 2 * EstimatedRTT
Characteristics of the original algorithm:


CS 54001-1
Variance is assumed to be fixed.
But in practice, variance increases as congestion increases.
Winter Quarter
47
TCP: Retransmission and Timeouts
Newer Algorithm includes estimate of variance in RTT:
Difference = SampleRTT - EstimatedRTT
 EstimatedRTT = EstimatedRTT + (*Difference)
 Deviation = Deviation + *( |Difference| - Deviation )


RTO =  * EstimatedRTT +  * Deviation
1
4
CS 54001-1
Winter Quarter
48
TCP: Retransmission and Timeouts
Karn’s Algorithm
Host A
Host B
Host A
Retransmission
Wrong RTT
Sample
Host B
Retransmission
Wrong RTT
Sample
Problem:
How can we estimate RTT when packets are retransmitted?
Solution:
On retransmission, don’t update estimated RTT (and double RTO)
CS 54001-1
Winter Quarter
49
User Datagram Protocol (UDP)
Characteristics

UDP is a connectionless datagram service
– There is no connection establishment: packets may
show up at any time

UDP packets are self-contained

UDP is unreliable:
– No acknowledgements to indicate delivery of data
– Checksums cover the header, and only optionally cover
the data
– Contains no mechanism to detect missing or missequenced packets
– No mechanism for automatic retransmission
– No mechanism for flow control, and so can over-run
the receiver
CS 54001-1
Winter Quarter
50
User-Datagram Protocol (UDP)
A1
A2
B1
B2
App
App
App
App
OS
UDP
Like TCP, UDP uses
port number to
demultiplex packets
IP
CS 54001-1
Winter Quarter
51
User-Datagram Protocol (UDP)
Packet format
By default, only
covers the header.
SRC port
DST port
checksum
length
DATA

Why do we have UDP?
 It is used by applications that don’t need reliable delivery, or
 Applications that have their own special needs, such as
streaming of real-time audio/video.
CS 54001-1
Winter Quarter
52
This Week: Routing and Transport

Routing techniques
– Flooding
– Distributed Bellman Ford Algorithm
– Dijkstra’s Shortest Path First Algorithm

Routing in the Internet
– Hierarchy and Autonomous Systems
– Interior Routing Protocols: RIP, OSPF
– Exterior Routing Protocol: BGP


Transport: achieving reliability
Transport: achieving fair sharing of
links
CS 54001-1
Winter Quarter
53
Main points






Congestion is inevitable
TCP sources detect congestion and, cooperatively, reduce the rate at which they
transmit
The rate is controlled using the TCP window size
TCP modifies the rate according to “Additive
Increase, Multiplicative Decrease (AIMD)”
To jump start flows, TCP uses a fast restart
mechanism (called “slow start”!)
TCP achieves high throughput by encouraging
high delay
CS 54001-1
Winter Quarter
54
Congestion
H1
A1(t)
10Mb/s
R1
H2
D(t)
1.5Mb/s
H3
A2(t)
100Mb/s
A1(t)
A2(t)
Cumulative
bytes
A2(t)
X(t)
D(t)
A1(t)
X(t)
D(t)
t
CS 54001-1
Winter Quarter
55
Congestion is unavoidable
Arguably it’s good!




We use packet switching because it makes
efficient use of the links. Therefore, buffers
in the routers are frequently occupied
If buffers are always empty, delay is low,
but our usage of the network is low
If buffers are always occupied, delay is
high, but we are using the network more
efficiently
So how much congestion is too much?
CS 54001-1
Winter Quarter
56
Load, Delay and Power
Typical behavior of queueing
systems with random arrivals:
A simple metric of how well the
network is performing:
Load
Power 
Delay
Burstiness tends to move
asymptote to the left
Average
Packet delay
Power
Load
CS 54001-1
Winter Quarter
“optimal
load”
Load
57
Options for Congestion Control
1.
2.
3.
Implemented by host versus network
Reservation-based, versus feedbackbased
Window-based versus rate-based
CS 54001-1
Winter Quarter
58
TCP Congestion Control



TCP implements host-based, feedbackbased, window-based congestion control.
TCP sources attempts to determine how
much capacity is available
TCP sends packets, then reacts to
observable events (loss)
CS 54001-1
Winter Quarter
59
TCP Congestion Control

TCP sources change the sending rate by
modifying the window size:
Window = min{Advertized window, Congestion Window}
Receiver


Transmitter (“cwnd”)
In other words, send at the rate of the
slowest component: network or receiver
“cwnd” follows additive
increase/multiplicative decrease
– On receipt of Ack: cwnd += 1
– On packet loss (timeout): cwnd *= 0.5
CS 54001-1
Winter Quarter
60
Additive Increase
Src
D
A
D D
A A
D D
D A A
A
Dest
Actually, TCP uses bytes, not segments to count:
When ACK is received:
 MSS 
cwnd   MSS 

cwnd


CS 54001-1
Winter Quarter
61
Leads to the TCP “sawtooth”
Rate
Timeouts
halved
Could take a long
time to get started!
CS 54001-1
Winter Quarter
t
62
“Slow Start”
Designed to cold-start connection quickly at startup or if
a connection has been halted (e.g. window dropped to zero,
or window full, but ACK is lost).
How it works: increase cwnd by 1 for each ACK received.
Src
1
D
2
A
D D
4
A A
D D
8
D
A
D
A
A
A
Dest
CS 54001-1
Winter Quarter
63
Slow Start
Timeouts
Rate
halved
Exponential “slow
start”
Slow start in operation
until it reaches half of
t cwnd.
previous
Why is it called slow-start? Because TCP originally had
no congestion control mechanism. The source would just
start by sending a whole window’s worth of data.
CS 54001-1
Winter Quarter
64
Fast Retransmit & Fast Recovery



TCP source can take advantage of an
additional hint: if a duplicate ACK arrives
out of sequence, there was probably some
data lost, even if it hasn’t yet timed out.
Upon 3 duplicate ACKs, TCP retransmits.
Does not enter slow-start: there are
probably ACKs in the pipe that will
continue correct AIMD operation.
CS 54001-1
Winter Quarter
65
Course Outline (Subject to Change)
1.
(January 9th) Internet design principles and protocols
2.
(January 16th) Internetworking, transport, routing
3.
(January 23rd) Mapping the Internet and other networks
4.
(January 30th) Security (with guest lecturer: Gene Spafford)
5.
(February 6th) P2P technologies & applications (Matei Ripeanu)
(plus midterm)
6.
(February 13th) Optical networks (Charlie Catlett)
7.
*(February 20th) Web and Grid Services (Steve Tuecke)
8.
(February 27th) Network operations (Greg Jackson)
9.
10.
*(March 6th) Advanced applications (with guest lecturers:
Terry Disz, Mike Wilde)
(March 13th) Final exam
* Ian Foster is out of town.
CS 54001-1
Winter Quarter
66