* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download L09_TCP_IP - Interactive Computing Lab
Distributed firewall wikipedia , lookup
Multiprotocol Label Switching wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
Asynchronous Transfer Mode wikipedia , lookup
Wake-on-LAN wikipedia , lookup
List of wireless community networks by region wikipedia , lookup
Computer network wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Network tap wikipedia , lookup
Airborne Networking wikipedia , lookup
Deep packet inspection wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
TCP congestion control wikipedia , lookup
Internet protocol suite wikipedia , lookup
UniPro protocol stack wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Chapter 3
Transport Layer
A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you can add, modify, and delete slides
(including this one) and slide content to suit your needs. They obviously
represent a lot of work on our part. In return for use, we only ask the
following:
 If you use these slides (e.g., in a class) in substantially unaltered form,
that you mention their source (after all, we’d like people to use our book!)
 If you post any slides in substantially unaltered form on a www site, that
you note that they are adapted from (or perhaps identical to) our slides, and
note our copyright of this material.
Computer Networking:
A Top Down Approach
5th edition.
Jim Kurose, Keith Ross
Addison-Wesley, April
2009.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2009
J.F Kurose and K.W. Ross, All Rights Reserved
Transport Layer
3-1
Chapter 3: Transport Layer
Our goals:
 understand principles
behind transport layer
services:
 learn about transport
multiplexing/demultiplexing
reliable data transfer
flow control
congestion control
layer protocols in the
Internet:
UDP: connectionless
transport
TCP: connection-oriented
transport
• TCP congestion control
Transport Layer
3-2
Transport services and protocols
 provide logical communication
between app processes
running on different hosts
 transport protocols run in
end systems
 send side: breaks app
messages into segments,
passes to network layer
 rcv side: reassembles
segments into messages,
passes to app layer
 more than one transport
protocol available to apps
 Internet: TCP and UDP
application
transport
network
data link
physical
application
transport
network
data link
physical
Transport Layer
3-3
Transport vs. network layer
 network layer: logical
communication
between hosts
 transport layer: logical
communication
between processes
relies on, enhances,
network layer services
Household analogy:
12 kids sending letters to
12 kids
 processes = kids
 app messages = letters
in envelopes
 hosts = houses
 transport protocol =
Ann and Bill
 network-layer protocol
= postal service
Transport Layer
3-4
Internet transport-layer protocols
 reliable, in-order
delivery (TCP)
congestion control
flow control
connection setup
 unreliable, unordered
delivery: UDP
no-frills extension of
“best-effort” IP
 services not available:
 delay guarantees
 bandwidth guarantees
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physicalnetwork
network
data link
physical
data link
physical
network
data link
physical
application
transport
network
data link
physical
Transport Layer
3-5
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP
segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer
3-6
Multiplexing/demultiplexing
Multiplexing at send host:
gathering data from multiple
sockets, enveloping data
with header (later used for
demultiplexing)
Demultiplexing at rcv host:
delivering received segments
to correct socket
= socket
application
transport
network
link
= process
P3
P1
P1
application
transport
network
P2
P4
application
transport
network
link
link
physical
host 1
physical
host 2
physical
host 3
Transport Layer
3-7
How demultiplexing works
 host receives IP datagrams
each datagram has source
IP address, destination IP
address
 each datagram carries
1 transport-layer segment
 each segment has source,
destination port number
 host uses IP addresses &
port numbers to direct
segment to appropriate
socket
32 bits
source port #
dest port #
other header fields
application
data
(message)
TCP/UDP segment format
Transport Layer
3-8
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP
segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer
3-9
UDP: User Datagram Protocol [RFC 768]
 “no frills,” “bare bones”
Internet transport
protocol
 “best effort” service, UDP
segments may be:
 lost
 delivered out of order
to app
 connectionless:
 no handshaking between
UDP sender, receiver
 each UDP segment
handled independently
of others
Why is there a UDP?
 no connection
establishment (which can
add delay)
 simple: no connection state
at sender, receiver
 small segment header
 no congestion control: UDP
can blast away as fast as
desired
Transport Layer 3-10
UDP: more
 often used for streaming
multimedia apps
 loss tolerant
 rate sensitive
Length, in
bytes of UDP
segment,
including
header
 other UDP uses
 DNS
 SNMP
 reliable transfer over UDP:
add reliability at
application layer
 application-specific
error recovery!
32 bits
source port #
dest port #
length
checksum
Application
data
(message)
UDP segment format
Transport Layer
3-11
TCP: Overview
 point-to-point:
 one sender, one receiver
 reliable, in-order byte
steam:
no “message boundaries”
 pipelined:
 TCP congestion and flow
control set window size
 send & receive buffers
socket
door
application
writes data
application
reads data
TCP
send buffer
TCP
receive buffer
RFCs: 793, 1122, 1323, 2018, 2581
 full duplex data:
 bi-directional data flow
in same connection
 MSS: maximum segment
size
 connection-oriented:
 handshaking (exchange
of control msgs) init’s
sender, receiver state
before data exchange
 flow controlled:
 sender will not
socket
door
overwhelm receiver
segment
Transport Layer 3-12
TCP segment structure
32 bits
URG: urgent data
(generally not used)
ACK: ACK #
valid
PSH: push data now
(generally not used)
RST, SYN, FIN:
connection estab
(setup, teardown
commands)
Internet
checksum
(as in UDP)
source port #
dest port #
sequence number
acknowledgement number
head not
UA P R S F
len used
checksum
Receive window
Urg data pnter
Options (variable length)
counting
by bytes
of data
(not segments!)
# bytes
rcvr willing
to accept
application
data
(variable length)
Transport Layer 3-13
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before
exchanging data segments
 initialize TCP variables (discussed later):
 seq. #s
 buffers, flow control info (e.g. RcvWindow)
 client: connection initiator
Socket clientSocket = new Socket("hostname","port number");
 server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-14
TCP Connection Management (cont.)
Open a connection:
client open socket:
Socket cs = new
Socket(“hostname”, 80)
Step 1: client host sends TCP SYN
segment to server
 specifies initial seq #
 no data
client
server
open
data
Step 2: server host receives SYN,
replies with SYN-ACK segment
server allocates buffers
 specifies server initial seq. #
Step 3: client receives SYN-ACK,
replies with ACK segment, which
may contain data
Transport Layer 3-15
TCP Connection Management (cont.)
Closing a connection:
client closes socket:
clientSocket.close();
client
server
close
Step 1: client end system
sends FIN control segment
to server
Step 2: server receives
FIN, replies with ACK.
Closes connection, sends
FIN.
Transport Layer 3-16
TCP Connection Management (cont.)
Step 3: client receives FIN,
replies with ACK.
client
server
closing
Enters “timed wait” will respond with ACK
to received FINs
closing
Step 4: server, receives
Note: why do we wait?
timed wait
ACK. Connection closed.
closed
closed
Transport Layer 3-17
TCP seq. #’s and ACKs
Host A
Seq. #’s:
 byte stream
“number” of first
byte in segment’s
data
ACKs:
 seq # of next byte
expected from
other side
 cumulative ACK
User
types
‘C’
Host B
host ACKs
receipt of
‘C’, echoes
back ‘C’
host ACKs
receipt
of echoed
‘C’
simple telnet scenario
time
Transport Layer 3-18
Stop-and-wait (for reliability)
sender
receiver
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
RTT
ACK arrives, send next
packet, t = RTT + L / R
U
sender
=
L/R
RTT + L / R
=
.008
30.008
= 0.00027
microsec
onds
Transport Layer 3-19
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”,
yet-to-be-acknowledged pkts
range of sequence numbers must be increased
buffering at sender and/or receiver
 Two generic forms of pipelined protocols: go-Back-N,
selective repeat
Transport Layer 3-20
Pipelining: increased utilization
sender
receiver
first packet bit transmitted, t = 0
last bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
RTT
ACK arrives, send next
packet, t = RTT + L / R
Increase utilization
by a factor of 3!
U
sender
=
3*L/R
RTT + L / R
=
.024
30.008
= 0.0008
microsecon
ds
Transport Layer 3-21
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP
segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-22
TCP reliable data transfer
 TCP creates rdt
service on top of IP’s
unreliable service
 Pipelined segments
 Cumulative acks
 TCP uses single
retransmission timer
 Retransmissions are
triggered by:
timeout events
duplicate acks
 Initially consider
simplified TCP sender:
ignore duplicate acks
ignore flow control,
congestion control
Transport Layer 3-23
TCP sender events:
data rcvd from app:
 Create segment with
seq #
seq # is byte-stream
number of first data
byte in segment
 start timer if not
already running (think
of timer as for oldest
unacked segment)
expiration interval:
TimeOutInterval
timeout:
 retransmit segment
that caused timeout
 restart timer
Ack rcvd:
 If acknowledges
previously unacked
segments
update what is known to
be acked
start timer if there are
outstanding segments
Transport Layer 3-24
TCP: retransmission scenarios
Host A
X
loss
Sendbase
= 100
SendBase
= 120
SendBase
= 100
time
SendBase
= 120
lost ACK scenario
Host B
Seq=92 timeout
Host B
Seq=92 timeout
timeout
Host A
time
premature timeout
Transport Layer 3-25
TCP retransmission scenarios (more)
timeout
Host A
Host B
X
loss
SendBase
= 120
time
Cumulative ACK scenario
Transport Layer 3-26
TCP ACK generation recommendation
[RFC 1122, RFC 2581]
Event at Receiver
TCP Receiver action
Arrival of in-order segment with
expected seq #. All data up to
expected seq # already ACKed
Delayed ACK. Wait up to 500ms
for next segment. If no next segment,
send ACK
Arrival of in-order segment with
expected seq #. One other
segment has ACK pending
Immediately send single cumulative
ACK, ACKing both in-order segments
Arrival of out-of-order segment
higher-than-expect seq. # .
Gap detected
Immediately send duplicate ACK,
indicating seq. # of next expected byte
Arrival of segment that
partially or completely fills gap
Immediate send ACK, provided that
segment starts at lower end of gap
Transport Layer 3-27
Fast Retransmit
 Time-out period often
relatively long:
long delay before
resending lost packet
 Detect lost segments
via duplicate ACKs.
Sender often sends
many segments back-toback
If segment is lost,
there will likely be many
duplicate ACKs.
 If sender receives 3
ACKs for the same
data, it supposes that
segment after ACKed
data was lost:
fast retransmit: resend
segment before timer
expires
Transport Layer 3-28
Host A
Host B
timeout
X
time
Figure 3.37 Resending a segment after triple duplicate ACK
Transport Layer 3-29
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP
segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-30
TCP Flow Control
 receive side of TCP
connection has a
receive buffer:
flow control
sender won’t overflow
receiver’s buffer by
transmitting too much,
too fast
 speed-matching
 app process may be
service: matching the
send rate to the
receiving app’s drain
rate
slow at reading from
buffer
Transport Layer 3-31
TCP Flow control: how it works
 Rcvr advertises spare
(Suppose TCP receiver
discards out-of-order
segments)
 spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd LastByteRead]
room by including value
of RcvWindow in
segments
 Sender limits unACKed
data to RcvWindow
guarantees receive
buffer doesn’t overflow
LastByteSentLastByteAcked <=
RcvWindow
Transport Layer 3-32
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP
segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-33
TCP congestion control:
additive increase, multiplicative decrease (AIMD)
 Approach: increase transmission rate (window size),
Saw tooth
behavior: probing
for bandwidth
congestion window size
probing for usable bandwidth, until loss occurs
 additive increase: increase CongWin by 1 MSS
every RTT until loss detected
 multiplicative decrease: cut CongWin in half after
loss
congestion
window
24 Kbytes
16 Kbytes
8 Kbytes
time
time
Transport Layer 3-34
TCP Congestion Control: details
 sender limits transmission:
LastByteSent-LastByteAcked
 min{CongWin, RecvWindow}
 Roughly,
rate =
CongWin
Bytes/sec
RTT
 CongWin is dynamic, function
of perceived network
congestion
How does sender
perceive congestion?
 loss event = timeout or
3 duplicate acks
 TCP sender reduces
rate (CongWin) after
loss event
three mechanisms:
AIMD
slow start
conservative after
timeout events
Transport Layer 3-35
TCP Slow Start
 When connection begins,
CongWin = 1 MSS
Example: MSS = 500
bytes & RTT = 200 msec
initial rate = 20 kbps
 When connection begins,
increase rate
exponentially fast until
first loss event
 available bandwidth may
be >> MSS/RTT
desirable to quickly ramp
up to respectable rate
Transport Layer 3-36
TCP Slow Start (more)
 When connection
Host B
RTT
begins, increase rate
exponentially until
first loss event:
Host A
double CongWin every
RTT
done by incrementing
CongWin for every ACK
received
 Summary: initial rate
is slow but ramps up
exponentially fast
time
Transport Layer 3-37
Refinement: inferring loss
 After 3 dup ACKs:
is cut in half
 window then grows
linearly
 But after timeout event:
 CongWin instead set to
1 MSS;
 window then grows
exponentially
 to a threshold, then
grows linearly
 CongWin
Philosophy:
 3 dup ACKs indicates
network capable of
delivering some segments
 timeout indicates a
“more alarming”
congestion scenario
Transport Layer 3-38
Refinement
Q: When should the
exponential
increase switch to
linear?
A: When CongWin
gets to 1/2 of its
value before
timeout.
Implementation:
 Variable Threshold
 At loss event, Threshold is
set to 1/2 of CongWin just
before loss event
Transport Layer 3-39
Summary: TCP Congestion Control
 When CongWin is below Threshold, sender in
slow-start phase, window grows exponentially.
 When CongWin is above Threshold, sender is in
congestion-avoidance phase, window grows linearly.
 When a triple duplicate ACK occurs, Threshold
set to CongWin/2 and CongWin set to
Threshold.
 When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS.
Transport Layer 3-40
TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K
TCP connection 1
TCP
connection 2
bottleneck
router
capacity R
Transport Layer 3-41
Why is TCP fair?
Two competing sessions:
 Additive increase gives slope of 1, as throughout increases
 multiplicative decrease decreases throughput proportionally
R
equal bandwidth share
loss: decrease window by factor of 2
congestion avoidance: additive increase
loss: decrease window by factor of 2
congestion avoidance: additive increase
Connection 1 throughput R
Transport Layer 3-42
Fairness (more)
Fairness and UDP
 Multimedia apps often
do not use TCP
do not want rate
throttled by congestion
control
 Instead use UDP:
 pump audio/video at
constant rate, tolerate
packet loss
 Research area: TCP
friendly
Fairness and parallel TCP
connections
 nothing prevents app from
opening parallel
connections between 2
hosts.
 Web browsers do this
 Example: link of rate R
supporting 9 connections;
new app asks for 1 TCP, gets
rate R/10
new app asks for 11 TCPs,
gets R/2 !
Transport Layer 3-43
Chapter 3: Summary
 principles behind transport
layer services:
 multiplexing,
demultiplexing
 reliable data transfer
 flow control
 congestion control
 instantiation and
implementation in the
Internet
 UDP
 TCP
Next:
 leaving the network
“edge” (application,
transport layers)
 into the network
“core”
Transport Layer 3-44
Chapter 4
Network Layer
A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you can add, modify, and delete slides
(including this one) and slide content to suit your needs. They obviously
represent a lot of work on our part. In return for use, we only ask the
following:
 If you use these slides (e.g., in a class) in substantially unaltered form,
that you mention their source (after all, we’d like people to use our book!)
 If you post any slides in substantially unaltered form on a www site, that
you note that they are adapted from (or perhaps identical to) our slides, and
note our copyright of this material.
Computer Networking: A
Top Down Approach
5th edition.
Jim Kurose, Keith Ross
Addison-Wesley, April
2009.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2009
J.F Kurose and K.W. Ross, All Rights Reserved
Network Layer 4-45
Chapter 4: Network Layer
Chapter goals:
 understand principles behind network layer
services:
IP addresses (+ getting an IP address via DHCP)
 Routing algorithms
 Network of networks (BGP, dealing with scales)
 ICMP
 NAT (network address translation)
Network Layer 4-46
Datagram networks
 no call setup at network layer
 routers: no state about end-to-end connections
 no network-level concept of “connection”
 packets forwarded using destination host address
 packets between same source-dest pair may take
different paths
application
transport
network
data link
physical
1. Send data
2. Receive data
application
transport
network
data link
physical
Network Layer 4-47
The Internet Network layer
Host, router network layer functions:
Transport layer: TCP, UDP
Network
layer
IP protocol
•addressing conventions
•datagram format
•packet handling conventions
Routing protocols
•path selection
•RIP, OSPF, BGP
forwarding
table
ICMP protocol
•error reporting
•router “signaling”
Link layer
physical layer
Network Layer 4-48
IP datagram format
IP protocol version
number
header length
(bytes)
“type” of data
max number
remaining hops
(decremented at
each router)
upper layer protocol
to deliver payload to
how much overhead
with TCP?
 20 bytes of TCP
 20 bytes of IP
 = 40 bytes + app
layer overhead
32 bits
ver
head. type of
len service
16-bit identifier
time to
live
upper
layer
total datagram
length (bytes)
length
fragment
flgs
offset
header
checksum
for
fragmentation/
reassembly
32 bit source IP address
32 bit destination IP address
E.g. timestamp,
record route
taken, specify
list of routers
to visit.
Options (if any)
data
(variable length,
typically a TCP
or UDP segment)
Application
TCP/UDP
IP
Ethernet
Subnets
 IP address:
 subnet part (high
order bits)
 host part (low order
bits)
 What’s a subnet ?
 device interfaces with
same subnet part of IP
address
 can physically reach
each other without
intervening router
223.1.1.1
223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3
223.1.2.9
223.1.3.27
223.1.2.2
subnet
223.1.3.1
223.1.3.2
network consisting of 3 subnets
Network Layer 4-50
IP addressing: CIDR
CIDR: Classless InterDomain Routing
subnet portion of address of arbitrary length
 address format: a.b.c.d/x, where x is # bits in
subnet portion of address
subnet
part
host
part
11001000 00010111 00010000 00000000
200.23.16.0/23
Network Layer 4-51
IP addresses: how to get one?
Q: How does a host get IP address?
 hard-coded by system admin in a file
Windows: control-panel->network->configuration>tcp/ip->properties
 Linux (ubuntu): /etc/network/interface
 DHCP: Dynamic Host Configuration Protocol:
dynamically get address from as server
 “plug-and-play”
Network Layer 4-52
DHCP: Dynamic Host Configuration Protocol
Goal: allow host to dynamically obtain its IP address
from network server when it joins network
Can renew its lease on address in use
Allows reuse of addresses (only hold address while connected
an “on”)
Support for mobile users who want to join network (more
shortly)
DHCP overview:
1. host broadcasts “DHCP discover” msg
2. DHCP server responds with “DHCP offer” msg
3. host requests IP address: “DHCP request” msg
4. DHCP server sends address: “DHCP ack” msg
Network Layer 4-53
Graph abstraction of a network
5
2
u
v
2
1
x
3
w
3
1
• c(x,x’) = cost of link (x,x’)
5
z
1
y
2
Graph: G = (N,E)
N = set of routers = { u, v, w, x, y, z }
E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w),
(x,y), (w,y), (w,z), (y,z) }
- e.g., c(w,z) = 5
• cost could always be 1, or
inversely related to bandwidth,
or inversely related to
congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
Network Layer 4-54
Routing algorithms
Global or decentralized
information?
Global:
 all routers have complete topology,
link cost info
 “link state” algorithms (OSPF)
Decentralized:
 router knows physically-connected
neighbors, link costs to neighbors
 iterative process of computation,
exchange of info with neighbors
 “distance vector” algorithms (RIP)
Static or dynamic?
Static:
 routes change slowly over
time
Dynamic:
 routes change more quickly
periodic update
in response to link cost
changes
Network Layer 4-55
Interplay between routing, forwarding
routing algorithm
local forwarding table
header value output link
0100
0101
0111
1001
3
2
2
1
value in arriving
packet’s header
0111
1
3 2
Network Layer 4-56
Hierarchical routing for scalability
Our routing study thus far - idealization
 all routers identical
 network “flat”
… not true in practice
scale: with 200 million
destinations:
 can’t store all dest’s in
routing tables!
 routing table exchange
would swamp links!
administrative autonomy
 internet = network of
networks
 each network admin may
want to control routing in its
own network
Network Layer 4-57
Hierarchical addressing: route aggregation
Hierarchical addressing allows efficient advertisement of routing
information:
Organization 0
200.23.16.0/23
Organization 1
200.23.18.0/23
Organization 2
200.23.20.0/23
Organization 7
.
.
.
.
.
.
Fly-By-Night-ISP
“Send me anything
with addresses
beginning
200.23.16.0/20”
Internet
200.23.30.0/23
ISPs-R-Us
“Send me anything
with addresses
beginning
199.31.0.0/16”
Network Layer 4-58
Hierarchical addressing: more specific
routes
ISPs-R-Us has a more specific route to Organization 1
Organization 0
200.23.16.0/23
Organization 2
200.23.20.0/23
Organization 7
.
.
.
.
.
.
Fly-By-Night-ISP
“Send me anything
with addresses
beginning
200.23.16.0/20”
Internet
200.23.30.0/23
ISPs-R-Us
Organization 1
200.23.18.0/23
“Send me anything
with addresses
beginning 199.31.0.0/16
or 200.23.18.0/23”
Network Layer 4-59
Hierarchical routing for scalability
 aggregate routers into
regions, “autonomous
systems” (AS)
 routers in same AS run
same routing protocol
Gateway router
 Direct link to router in
another AS
“intra-AS” routing
protocol
routers in different AS
can run different intraAS routing protocol
Network Layer 4-60
Interconnected ASs
3c
3a
3b
AS3
1a
2a
1c
1d
1b
Intra-AS
Routing
algorithm
2c
AS2
AS1
Inter-AS
Routing
algorithm
Forwarding
table
2b
 forwarding table
configured by both
intra- and inter-AS
routing algorithm
intra-AS sets entries
for internal dests
inter-AS & intra-AS
sets entries for
external dests
Network Layer 4-61
Internet inter-AS routing: BGP
 BGP (Border Gateway Protocol):
the de facto standard
 BGP provides each AS a means to:
1.
2.
3.
Obtain subnet reachability information from
neighboring ASs.
Propagate reachability information to all ASinternal routers.
Determine “good” routes to subnets based on
reachability information and policy.
 allows subnet to advertise its existence to
rest of Internet: “I am here”
Network Layer 4-62
BGP basics
 pairs of routers (BGP peers) exchange routing info
over TCP connections (called BGP sessions)
 BGP sessions need not correspond to physical
links.
 when AS2 advertises prefix “200.23.16.0/23” to AS1:
 AS2 promises it will forward datagrams towards
that prefix.
 AS2 can aggregate prefixes in its advertisement
eBGP session
3c
3a
3b
AS3
1a
AS1
iBGP session
2a
1c
1d
2c
AS2
2b
1b
Network Layer 4-63
Distributing reachability info
 using eBGP session between 3a and 1c, AS3 sends
prefix reachability info to AS1.
 1c can then use iBGP do distribute new prefix
info to all routers in AS1
 1b can then re-advertise new reachability info
to AS2 over 1b-to-2a eBGP session
 when router learns of new prefix, it creates entry
for prefix in its forwarding table.
3c
3a
3b
AS3
Any dest w/
IP addr AS1
should be
routed to 1c
Any dest w/
IP addr AS1
should be
routed to 2a
eBGP session
1a
AS1
iBGP session
2a
1c
1d
2c
AS2
2b
1b
Network Layer 4-64
Path attributes & BGP routes
 advertised prefix includes BGP attributes.
 prefix + attributes = “route”
 two important attributes:
 AS-PATH: contains ASs through which prefix
advertisement has passed: e.g, AS 67, AS 17
 NEXT-HOP: indicates specific internal-AS router
to next-hop AS. (may be multiple links from
current AS to next-hop-AS)
 when gateway router receives route
advertisement, uses local import policy to
accept/decline.
Network Layer 4-65
BGP route selection
 router may learn about more than 1 route
to some prefix. Router must select route.
 elimination rules:
1.
2.
3.
4.
local preference value attribute: policy
decision
shortest AS-PATH
closest NEXT-HOP router: hot potato routing
additional criteria
Network Layer 4-66
					 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            