Download L09_TCP_IP - Interactive Computing Lab

Document related concepts

Zigbee wikipedia , lookup

Net bias wikipedia , lookup

Distributed firewall wikipedia , lookup

Multiprotocol Label Switching wikipedia , lookup

Piggybacking (Internet access) wikipedia , lookup

Asynchronous Transfer Mode wikipedia , lookup

Wake-on-LAN wikipedia , lookup

CAN bus wikipedia , lookup

List of wireless community networks by region wikipedia , lookup

Computer network wikipedia , lookup

Zero-configuration networking wikipedia , lookup

Network tap wikipedia , lookup

Airborne Networking wikipedia , lookup

Deep packet inspection wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

IEEE 1355 wikipedia , lookup

TCP congestion control wikipedia , lookup

Internet protocol suite wikipedia , lookup

UniPro protocol stack wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Transcript
Chapter 3
Transport Layer
A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you can add, modify, and delete slides
(including this one) and slide content to suit your needs. They obviously
represent a lot of work on our part. In return for use, we only ask the
following:
 If you use these slides (e.g., in a class) in substantially unaltered form,
that you mention their source (after all, we’d like people to use our book!)
 If you post any slides in substantially unaltered form on a www site, that
you note that they are adapted from (or perhaps identical to) our slides, and
note our copyright of this material.
Computer Networking:
A Top Down Approach
5th edition.
Jim Kurose, Keith Ross
Addison-Wesley, April
2009.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2009
J.F Kurose and K.W. Ross, All Rights Reserved
Transport Layer
3-1
Chapter 3: Transport Layer
Our goals:
 understand principles
behind transport layer
services:




 learn about transport
multiplexing/demultiplexing
reliable data transfer
flow control
congestion control
layer protocols in the
Internet:


UDP: connectionless
transport
TCP: connection-oriented
transport
• TCP congestion control
Transport Layer
3-2
Transport services and protocols
 provide logical communication
between app processes
running on different hosts
 transport protocols run in
end systems
 send side: breaks app
messages into segments,
passes to network layer
 rcv side: reassembles
segments into messages,
passes to app layer
 more than one transport
protocol available to apps
 Internet: TCP and UDP
application
transport
network
data link
physical
application
transport
network
data link
physical
Transport Layer
3-3
Transport vs. network layer
 network layer: logical
communication
between hosts
 transport layer: logical
communication
between processes

relies on, enhances,
network layer services
Household analogy:
12 kids sending letters to
12 kids
 processes = kids
 app messages = letters
in envelopes
 hosts = houses
 transport protocol =
Ann and Bill
 network-layer protocol
= postal service
Transport Layer
3-4
Internet transport-layer protocols
 reliable, in-order
delivery (TCP)



congestion control
flow control
connection setup
 unreliable, unordered
delivery: UDP

no-frills extension of
“best-effort” IP
 services not available:
 delay guarantees
 bandwidth guarantees
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physicalnetwork
network
data link
physical
data link
physical
network
data link
physical
application
transport
network
data link
physical
Transport Layer
3-5
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer
3-6
Multiplexing/demultiplexing
Multiplexing at send host:
gathering data from multiple
sockets, enveloping data
with header (later used for
demultiplexing)
Demultiplexing at rcv host:
delivering received segments
to correct socket
= socket
application
transport
network
link
= process
P3
P1
P1
application
transport
network
P2
P4
application
transport
network
link
link
physical
host 1
physical
host 2
physical
host 3
Transport Layer
3-7
How demultiplexing works
 host receives IP datagrams
each datagram has source
IP address, destination IP
address
 each datagram carries
1 transport-layer segment
 each segment has source,
destination port number
 host uses IP addresses &
port numbers to direct
segment to appropriate
socket

32 bits
source port #
dest port #
other header fields
application
data
(message)
TCP/UDP segment format
Transport Layer
3-8
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer
3-9
UDP: User Datagram Protocol [RFC 768]
 “no frills,” “bare bones”
Internet transport
protocol
 “best effort” service, UDP
segments may be:
 lost
 delivered out of order
to app
 connectionless:
 no handshaking between
UDP sender, receiver
 each UDP segment
handled independently
of others
Why is there a UDP?
 no connection
establishment (which can
add delay)
 simple: no connection state
at sender, receiver
 small segment header
 no congestion control: UDP
can blast away as fast as
desired
Transport Layer 3-10
UDP: more
 often used for streaming
multimedia apps
 loss tolerant
 rate sensitive
Length, in
bytes of UDP
segment,
including
header
 other UDP uses
 DNS
 SNMP
 reliable transfer over UDP:
add reliability at
application layer
 application-specific
error recovery!
32 bits
source port #
dest port #
length
checksum
Application
data
(message)
UDP segment format
Transport Layer
3-11
TCP: Overview
 point-to-point:
 one sender, one receiver
 reliable, in-order byte
steam:

no “message boundaries”
 pipelined:
 TCP congestion and flow
control set window size
 send & receive buffers
socket
door
application
writes data
application
reads data
TCP
send buffer
TCP
receive buffer
RFCs: 793, 1122, 1323, 2018, 2581
 full duplex data:
 bi-directional data flow
in same connection
 MSS: maximum segment
size
 connection-oriented:
 handshaking (exchange
of control msgs) init’s
sender, receiver state
before data exchange
 flow controlled:
 sender will not
socket
door
overwhelm receiver
segment
Transport Layer 3-12
TCP segment structure
32 bits
URG: urgent data
(generally not used)
ACK: ACK #
valid
PSH: push data now
(generally not used)
RST, SYN, FIN:
connection estab
(setup, teardown
commands)
Internet
checksum
(as in UDP)
source port #
dest port #
sequence number
acknowledgement number
head not
UA P R S F
len used
checksum
Receive window
Urg data pnter
Options (variable length)
counting
by bytes
of data
(not segments!)
# bytes
rcvr willing
to accept
application
data
(variable length)
Transport Layer 3-13
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before
exchanging data segments
 initialize TCP variables (discussed later):
 seq. #s
 buffers, flow control info (e.g. RcvWindow)
 client: connection initiator
Socket clientSocket = new Socket("hostname","port number");
 server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-14
TCP Connection Management (cont.)
Open a connection:
client open socket:
Socket cs = new
Socket(“hostname”, 80)
Step 1: client host sends TCP SYN
segment to server
 specifies initial seq #
 no data
client
server
open
data
Step 2: server host receives SYN,
replies with SYN-ACK segment
server allocates buffers
 specifies server initial seq. #
Step 3: client receives SYN-ACK,
replies with ACK segment, which
may contain data

Transport Layer 3-15
TCP Connection Management (cont.)
Closing a connection:
client closes socket:
clientSocket.close();
client
server
close
Step 1: client end system
sends FIN control segment
to server
Step 2: server receives
FIN, replies with ACK.
Closes connection, sends
FIN.
Transport Layer 3-16
TCP Connection Management (cont.)
Step 3: client receives FIN,
replies with ACK.

client
server
closing
Enters “timed wait” will respond with ACK
to received FINs
closing
Step 4: server, receives
Note: why do we wait?
timed wait
ACK. Connection closed.
closed
closed
Transport Layer 3-17
TCP seq. #’s and ACKs
Host A
Seq. #’s:
 byte stream
“number” of first
byte in segment’s
data
ACKs:
 seq # of next byte
expected from
other side
 cumulative ACK
User
types
‘C’
Host B
host ACKs
receipt of
‘C’, echoes
back ‘C’
host ACKs
receipt
of echoed
‘C’
simple telnet scenario
time
Transport Layer 3-18
Stop-and-wait (for reliability)
sender
receiver
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
RTT
ACK arrives, send next
packet, t = RTT + L / R
U
sender
=
L/R
RTT + L / R
=
.008
30.008
= 0.00027
microsec
onds
Transport Layer 3-19
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”,
yet-to-be-acknowledged pkts


range of sequence numbers must be increased
buffering at sender and/or receiver
 Two generic forms of pipelined protocols: go-Back-N,
selective repeat
Transport Layer 3-20
Pipelining: increased utilization
sender
receiver
first packet bit transmitted, t = 0
last bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
RTT
ACK arrives, send next
packet, t = RTT + L / R
Increase utilization
by a factor of 3!
U
sender
=
3*L/R
RTT + L / R
=
.024
30.008
= 0.0008
microsecon
ds
Transport Layer 3-21
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-22
TCP reliable data transfer
 TCP creates rdt
service on top of IP’s
unreliable service
 Pipelined segments
 Cumulative acks
 TCP uses single
retransmission timer
 Retransmissions are
triggered by:


timeout events
duplicate acks
 Initially consider
simplified TCP sender:


ignore duplicate acks
ignore flow control,
congestion control
Transport Layer 3-23
TCP sender events:
data rcvd from app:
 Create segment with
seq #

seq # is byte-stream
number of first data
byte in segment
 start timer if not
already running (think
of timer as for oldest
unacked segment)

expiration interval:
TimeOutInterval
timeout:
 retransmit segment
that caused timeout
 restart timer
Ack rcvd:
 If acknowledges
previously unacked
segments


update what is known to
be acked
start timer if there are
outstanding segments
Transport Layer 3-24
TCP: retransmission scenarios
Host A
X
loss
Sendbase
= 100
SendBase
= 120
SendBase
= 100
time
SendBase
= 120
lost ACK scenario
Host B
Seq=92 timeout
Host B
Seq=92 timeout
timeout
Host A
time
premature timeout
Transport Layer 3-25
TCP retransmission scenarios (more)
timeout
Host A
Host B
X
loss
SendBase
= 120
time
Cumulative ACK scenario
Transport Layer 3-26
TCP ACK generation recommendation
[RFC 1122, RFC 2581]
Event at Receiver
TCP Receiver action
Arrival of in-order segment with
expected seq #. All data up to
expected seq # already ACKed
Delayed ACK. Wait up to 500ms
for next segment. If no next segment,
send ACK
Arrival of in-order segment with
expected seq #. One other
segment has ACK pending
Immediately send single cumulative
ACK, ACKing both in-order segments
Arrival of out-of-order segment
higher-than-expect seq. # .
Gap detected
Immediately send duplicate ACK,
indicating seq. # of next expected byte
Arrival of segment that
partially or completely fills gap
Immediate send ACK, provided that
segment starts at lower end of gap
Transport Layer 3-27
Fast Retransmit
 Time-out period often
relatively long:

long delay before
resending lost packet
 Detect lost segments
via duplicate ACKs.


Sender often sends
many segments back-toback
If segment is lost,
there will likely be many
duplicate ACKs.
 If sender receives 3
ACKs for the same
data, it supposes that
segment after ACKed
data was lost:

fast retransmit: resend
segment before timer
expires
Transport Layer 3-28
Host A
Host B
timeout
X
time
Figure 3.37 Resending a segment after triple duplicate ACK
Transport Layer 3-29
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-30
TCP Flow Control
 receive side of TCP
connection has a
receive buffer:
flow control
sender won’t overflow
receiver’s buffer by
transmitting too much,
too fast
 speed-matching
 app process may be
service: matching the
send rate to the
receiving app’s drain
rate
slow at reading from
buffer
Transport Layer 3-31
TCP Flow control: how it works
 Rcvr advertises spare
(Suppose TCP receiver
discards out-of-order
segments)
 spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd LastByteRead]
room by including value
of RcvWindow in
segments
 Sender limits unACKed
data to RcvWindow


guarantees receive
buffer doesn’t overflow
LastByteSentLastByteAcked <=
RcvWindow
Transport Layer 3-32
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-33
TCP congestion control:
additive increase, multiplicative decrease (AIMD)
 Approach: increase transmission rate (window size),
Saw tooth
behavior: probing
for bandwidth
congestion window size
probing for usable bandwidth, until loss occurs
 additive increase: increase CongWin by 1 MSS
every RTT until loss detected
 multiplicative decrease: cut CongWin in half after
loss
congestion
window
24 Kbytes
16 Kbytes
8 Kbytes
time
time
Transport Layer 3-34
TCP Congestion Control: details
 sender limits transmission:
LastByteSent-LastByteAcked
 min{CongWin, RecvWindow}
 Roughly,
rate =
CongWin
Bytes/sec
RTT
 CongWin is dynamic, function
of perceived network
congestion
How does sender
perceive congestion?
 loss event = timeout or
3 duplicate acks
 TCP sender reduces
rate (CongWin) after
loss event
three mechanisms:



AIMD
slow start
conservative after
timeout events
Transport Layer 3-35
TCP Slow Start
 When connection begins,
CongWin = 1 MSS


Example: MSS = 500
bytes & RTT = 200 msec
initial rate = 20 kbps
 When connection begins,
increase rate
exponentially fast until
first loss event
 available bandwidth may
be >> MSS/RTT

desirable to quickly ramp
up to respectable rate
Transport Layer 3-36
TCP Slow Start (more)
 When connection


Host B
RTT
begins, increase rate
exponentially until
first loss event:
Host A
double CongWin every
RTT
done by incrementing
CongWin for every ACK
received
 Summary: initial rate
is slow but ramps up
exponentially fast
time
Transport Layer 3-37
Refinement: inferring loss
 After 3 dup ACKs:
is cut in half
 window then grows
linearly
 But after timeout event:
 CongWin instead set to
1 MSS;
 window then grows
exponentially
 to a threshold, then
grows linearly
 CongWin
Philosophy:
 3 dup ACKs indicates
network capable of
delivering some segments
 timeout indicates a
“more alarming”
congestion scenario
Transport Layer 3-38
Refinement
Q: When should the
exponential
increase switch to
linear?
A: When CongWin
gets to 1/2 of its
value before
timeout.
Implementation:
 Variable Threshold
 At loss event, Threshold is
set to 1/2 of CongWin just
before loss event
Transport Layer 3-39
Summary: TCP Congestion Control
 When CongWin is below Threshold, sender in
slow-start phase, window grows exponentially.
 When CongWin is above Threshold, sender is in
congestion-avoidance phase, window grows linearly.
 When a triple duplicate ACK occurs, Threshold
set to CongWin/2 and CongWin set to
Threshold.
 When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS.
Transport Layer 3-40
TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K
TCP connection 1
TCP
connection 2
bottleneck
router
capacity R
Transport Layer 3-41
Why is TCP fair?
Two competing sessions:
 Additive increase gives slope of 1, as throughout increases
 multiplicative decrease decreases throughput proportionally
R
equal bandwidth share
loss: decrease window by factor of 2
congestion avoidance: additive increase
loss: decrease window by factor of 2
congestion avoidance: additive increase
Connection 1 throughput R
Transport Layer 3-42
Fairness (more)
Fairness and UDP
 Multimedia apps often
do not use TCP

do not want rate
throttled by congestion
control
 Instead use UDP:
 pump audio/video at
constant rate, tolerate
packet loss
 Research area: TCP
friendly
Fairness and parallel TCP
connections
 nothing prevents app from
opening parallel
connections between 2
hosts.
 Web browsers do this
 Example: link of rate R
supporting 9 connections;


new app asks for 1 TCP, gets
rate R/10
new app asks for 11 TCPs,
gets R/2 !
Transport Layer 3-43
Chapter 3: Summary
 principles behind transport
layer services:
 multiplexing,
demultiplexing
 reliable data transfer
 flow control
 congestion control
 instantiation and
implementation in the
Internet
 UDP
 TCP
Next:
 leaving the network
“edge” (application,
transport layers)
 into the network
“core”
Transport Layer 3-44
Chapter 4
Network Layer
A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you can add, modify, and delete slides
(including this one) and slide content to suit your needs. They obviously
represent a lot of work on our part. In return for use, we only ask the
following:
 If you use these slides (e.g., in a class) in substantially unaltered form,
that you mention their source (after all, we’d like people to use our book!)
 If you post any slides in substantially unaltered form on a www site, that
you note that they are adapted from (or perhaps identical to) our slides, and
note our copyright of this material.
Computer Networking: A
Top Down Approach
5th edition.
Jim Kurose, Keith Ross
Addison-Wesley, April
2009.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2009
J.F Kurose and K.W. Ross, All Rights Reserved
Network Layer 4-45
Chapter 4: Network Layer
Chapter goals:
 understand principles behind network layer
services:
IP addresses (+ getting an IP address via DHCP)
 Routing algorithms
 Network of networks (BGP, dealing with scales)
 ICMP
 NAT (network address translation)

Network Layer 4-46
Datagram networks
 no call setup at network layer
 routers: no state about end-to-end connections
 no network-level concept of “connection”
 packets forwarded using destination host address
 packets between same source-dest pair may take
different paths
application
transport
network
data link
physical
1. Send data
2. Receive data
application
transport
network
data link
physical
Network Layer 4-47
The Internet Network layer
Host, router network layer functions:
Transport layer: TCP, UDP
Network
layer
IP protocol
•addressing conventions
•datagram format
•packet handling conventions
Routing protocols
•path selection
•RIP, OSPF, BGP
forwarding
table
ICMP protocol
•error reporting
•router “signaling”
Link layer
physical layer
Network Layer 4-48
IP datagram format
IP protocol version
number
header length
(bytes)
“type” of data
max number
remaining hops
(decremented at
each router)
upper layer protocol
to deliver payload to
how much overhead
with TCP?
 20 bytes of TCP
 20 bytes of IP
 = 40 bytes + app
layer overhead
32 bits
ver
head. type of
len service
16-bit identifier
time to
live
upper
layer
total datagram
length (bytes)
length
fragment
flgs
offset
header
checksum
for
fragmentation/
reassembly
32 bit source IP address
32 bit destination IP address
E.g. timestamp,
record route
taken, specify
list of routers
to visit.
Options (if any)
data
(variable length,
typically a TCP
or UDP segment)
Application
TCP/UDP
IP
Ethernet
Subnets
 IP address:
 subnet part (high
order bits)
 host part (low order
bits)
 What’s a subnet ?
 device interfaces with
same subnet part of IP
address
 can physically reach
each other without
intervening router
223.1.1.1
223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3
223.1.2.9
223.1.3.27
223.1.2.2
subnet
223.1.3.1
223.1.3.2
network consisting of 3 subnets
Network Layer 4-50
IP addressing: CIDR
CIDR: Classless InterDomain Routing
subnet portion of address of arbitrary length
 address format: a.b.c.d/x, where x is # bits in
subnet portion of address

subnet
part
host
part
11001000 00010111 00010000 00000000
200.23.16.0/23
Network Layer 4-51
IP addresses: how to get one?
Q: How does a host get IP address?
 hard-coded by system admin in a file
Windows: control-panel->network->configuration>tcp/ip->properties
 Linux (ubuntu): /etc/network/interface
 DHCP: Dynamic Host Configuration Protocol:
dynamically get address from as server
 “plug-and-play”

Network Layer 4-52
DHCP: Dynamic Host Configuration Protocol
Goal: allow host to dynamically obtain its IP address
from network server when it joins network
Can renew its lease on address in use
Allows reuse of addresses (only hold address while connected
an “on”)
Support for mobile users who want to join network (more
shortly)
DHCP overview:
1. host broadcasts “DHCP discover” msg
2. DHCP server responds with “DHCP offer” msg
3. host requests IP address: “DHCP request” msg
4. DHCP server sends address: “DHCP ack” msg
Network Layer 4-53
Graph abstraction of a network
5
2
u
v
2
1
x
3
w
3
1
• c(x,x’) = cost of link (x,x’)
5
z
1
y
2
Graph: G = (N,E)
N = set of routers = { u, v, w, x, y, z }
E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w),
(x,y), (w,y), (w,z), (y,z) }
- e.g., c(w,z) = 5
• cost could always be 1, or
inversely related to bandwidth,
or inversely related to
congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
Network Layer 4-54
Routing algorithms
Global or decentralized
information?
Global:
 all routers have complete topology,
link cost info
 “link state” algorithms (OSPF)
Decentralized:
 router knows physically-connected
neighbors, link costs to neighbors
 iterative process of computation,
exchange of info with neighbors
 “distance vector” algorithms (RIP)
Static or dynamic?
Static:
 routes change slowly over
time
Dynamic:
 routes change more quickly


periodic update
in response to link cost
changes
Network Layer 4-55
Interplay between routing, forwarding
routing algorithm
local forwarding table
header value output link
0100
0101
0111
1001
3
2
2
1
value in arriving
packet’s header
0111
1
3 2
Network Layer 4-56
Hierarchical routing for scalability
Our routing study thus far - idealization
 all routers identical
 network “flat”
… not true in practice
scale: with 200 million
destinations:
 can’t store all dest’s in
routing tables!
 routing table exchange
would swamp links!
administrative autonomy
 internet = network of
networks
 each network admin may
want to control routing in its
own network
Network Layer 4-57
Hierarchical addressing: route aggregation
Hierarchical addressing allows efficient advertisement of routing
information:
Organization 0
200.23.16.0/23
Organization 1
200.23.18.0/23
Organization 2
200.23.20.0/23
Organization 7
.
.
.
.
.
.
Fly-By-Night-ISP
“Send me anything
with addresses
beginning
200.23.16.0/20”
Internet
200.23.30.0/23
ISPs-R-Us
“Send me anything
with addresses
beginning
199.31.0.0/16”
Network Layer 4-58
Hierarchical addressing: more specific
routes
ISPs-R-Us has a more specific route to Organization 1
Organization 0
200.23.16.0/23
Organization 2
200.23.20.0/23
Organization 7
.
.
.
.
.
.
Fly-By-Night-ISP
“Send me anything
with addresses
beginning
200.23.16.0/20”
Internet
200.23.30.0/23
ISPs-R-Us
Organization 1
200.23.18.0/23
“Send me anything
with addresses
beginning 199.31.0.0/16
or 200.23.18.0/23”
Network Layer 4-59
Hierarchical routing for scalability
 aggregate routers into
regions, “autonomous
systems” (AS)
 routers in same AS run
same routing protocol


Gateway router
 Direct link to router in
another AS
“intra-AS” routing
protocol
routers in different AS
can run different intraAS routing protocol
Network Layer 4-60
Interconnected ASs
3c
3a
3b
AS3
1a
2a
1c
1d
1b
Intra-AS
Routing
algorithm
2c
AS2
AS1
Inter-AS
Routing
algorithm
Forwarding
table
2b
 forwarding table
configured by both
intra- and inter-AS
routing algorithm


intra-AS sets entries
for internal dests
inter-AS & intra-AS
sets entries for
external dests
Network Layer 4-61
Internet inter-AS routing: BGP
 BGP (Border Gateway Protocol):
the de facto standard
 BGP provides each AS a means to:
1.
2.
3.
Obtain subnet reachability information from
neighboring ASs.
Propagate reachability information to all ASinternal routers.
Determine “good” routes to subnets based on
reachability information and policy.
 allows subnet to advertise its existence to
rest of Internet: “I am here”
Network Layer 4-62
BGP basics
 pairs of routers (BGP peers) exchange routing info
over TCP connections (called BGP sessions)
 BGP sessions need not correspond to physical
links.
 when AS2 advertises prefix “200.23.16.0/23” to AS1:
 AS2 promises it will forward datagrams towards
that prefix.
 AS2 can aggregate prefixes in its advertisement
eBGP session
3c
3a
3b
AS3
1a
AS1
iBGP session
2a
1c
1d
2c
AS2
2b
1b
Network Layer 4-63
Distributing reachability info
 using eBGP session between 3a and 1c, AS3 sends
prefix reachability info to AS1.
 1c can then use iBGP do distribute new prefix
info to all routers in AS1
 1b can then re-advertise new reachability info
to AS2 over 1b-to-2a eBGP session
 when router learns of new prefix, it creates entry
for prefix in its forwarding table.
3c
3a
3b
AS3
Any dest w/
IP addr AS1
should be
routed to 1c
Any dest w/
IP addr AS1
should be
routed to 2a
eBGP session
1a
AS1
iBGP session
2a
1c
1d
2c
AS2
2b
1b
Network Layer 4-64
Path attributes & BGP routes
 advertised prefix includes BGP attributes.
 prefix + attributes = “route”
 two important attributes:
 AS-PATH: contains ASs through which prefix
advertisement has passed: e.g, AS 67, AS 17
 NEXT-HOP: indicates specific internal-AS router
to next-hop AS. (may be multiple links from
current AS to next-hop-AS)
 when gateway router receives route
advertisement, uses local import policy to
accept/decline.
Network Layer 4-65
BGP route selection
 router may learn about more than 1 route
to some prefix. Router must select route.
 elimination rules:
1.
2.
3.
4.
local preference value attribute: policy
decision
shortest AS-PATH
closest NEXT-HOP router: hot potato routing
additional criteria
Network Layer 4-66