Download Part I: Introduction

Document related concepts

Peering wikipedia , lookup

Zigbee wikipedia , lookup

CAN bus wikipedia , lookup

Net bias wikipedia , lookup

Distributed firewall wikipedia , lookup

Asynchronous Transfer Mode wikipedia , lookup

Deep packet inspection wikipedia , lookup

Multiprotocol Label Switching wikipedia , lookup

IEEE 1355 wikipedia , lookup

IEEE 802.1aq wikipedia , lookup

Zero-configuration networking wikipedia , lookup

Piggybacking (Internet access) wikipedia , lookup

Wake-on-LAN wikipedia , lookup

Computer network wikipedia , lookup

Network tap wikipedia , lookup

Internet protocol suite wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

Airborne Networking wikipedia , lookup

UniPro protocol stack wikipedia , lookup

Routing wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Routing in delay-tolerant networking wikipedia , lookup

Transcript
Chapter 4: Network Layer
Chapter goals:
Chapter Overview:
 understand principles
 network layer services
behind network layer
services:




routing (path selection)
dealing with scale
how a router works
advanced topics: IPv6,
multicast
 instantiation and
implementation in the
Internet
 routing principle: path
selection
 hierarchical routing
 IP
 Internet routing protocols
reliable transfer


intra-domain
inter-domain
 what’s inside a router?
 IPv6
 multicast routing
4: Network Layer
4a-1
Network layer functions
 transport packet from
sending to receiving hosts
 network layer protocols in
every host, router
three important functions:
 path determination: route
taken by packets from source
to dest. Routing algorithms
 switching: move packets from
router’s input to appropriate
router output
 call setup: some network
architectures require router
call setup along path before
data flows
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
4: Network Layer
4a-2
Network service model
Q: What service model
for “channel”
transporting packets
from sender to
receiver?
 guaranteed bandwidth?
 preservation of inter-packet
timing (no jitter)?
 loss-free delivery?
 in-order delivery?
 congestion feedback to
sender?
The most important
abstraction provided
by network layer:
? ?
?
virtual circuit
or
datagram?
4: Network Layer
4a-3
Virtual circuits
“source-to-dest path behaves much like telephone
circuit”


performance-wise
network actions along source-to-dest path
 call setup, teardown for each call before data can flow
 each packet carries VC identifier (not destination host OD)
 every router on source-dest path s maintain “state” for
each passing connection

transport-layer connection only involved two end systems
 link, router resources (bandwidth, buffers) may be
allocated to VC

to get circuit-like perf.
4: Network Layer
4a-4
Virtual circuits: signaling protocols
 used to setup, maintain teardown VC
 used in ATM, frame-relay, X.25
 not used in today’s Internet
application
transport 5. Data flow begins
network 4. Call connected
data link 1. Initiate call
physical
6. Receive data application
3. Accept call
2. incoming call
transport
network
data link
physical
4: Network Layer
4a-5
Datagram networks:
the Internet model
 no call setup at network layer
 routers: no state about end-to-end connections
 no network-level concept of “connection”
 packets typically routed using destination host ID
 packets between same source-dest pair may take
different paths
application
transport
network
data link 1. Send data
physical
application
transport
network
2. Receive data
data link
physical
4: Network Layer
4a-6
Network layer service models:
Network
Architecture
Internet
Service
Model
Guarantees ?
Congestion
Bandwidth Loss Order Timing feedback
best effort none
ATM
CBR
ATM
VBR
ATM
ABR
ATM
UBR
constant
rate
guaranteed
rate
guaranteed
minimum
none
no
no
no
yes
yes
yes
yes
yes
yes
no
yes
no
no (inferred
via loss)
no
congestion
no
congestion
yes
no
yes
no
no
 Internet model being extented: Intserv, Diffserv

Chapter 6
4: Network Layer
4a-7
Datagram or VC network: why?
Internet
 data exchange among
ATM
 evolved from telephony
computers
 human conversation:
 “elastic” service, no strict
 strict timing, reliability
timing req.
requirements
 “smart” end systems
 need for guaranteed
(computers)
service
 can adapt, perform
 “dumb” end systems
control, error recovery
 telephones
 simple inside network,
 complexity inside
complexity at “edge”
network
 many link types
 different characteristics
 uniform service difficult
4: Network Layer
4a-8
Routing
Routing protocol
Goal: determine “good” path
(sequence of routers) thru
network from source to dest.
Graph abstraction for
routing algorithms:
 graph nodes are
routers
 graph edges are
physical links

link cost: delay, $ cost,
or congestion level
5
2
A
B
2
1
D
3
C
3
1
5
F
1
E
2
 “good” path:
 typically means minimum
cost path
 other def’s possible
4: Network Layer
4a-9
Routing Algorithm classification
Global or decentralized
information?
Global:
 all routers have complete
topology, link cost info
 “link state” algorithms
Decentralized:
 router knows physicallyconnected neighbors, link
costs to neighbors
 iterative process of
computation, exchange of
info with neighbors
 “distance vector” algorithms
Static or dynamic?
Static:
 routes change slowly over
time
Dynamic:
 routes change more quickly
 periodic update
 in response to link cost
changes
4: Network Layer 4a-10
A Link-State Routing Algorithm
Dijkstra’s algorithm
 net topology, link costs
known to all nodes
 accomplished via “link
state broadcast”
 all nodes have same info
 computes least cost paths
from one node (‘source”) to
all other nodes
 gives routing table for
that node
 iterative: after k
iterations, know least cost
path to k dest.’s
Notation:
 c(i,j): link cost from node i
to j. cost infinite if not
direct neighbors
 D(v): current value of cost
of path from source to
dest. V
 p(v): predecessor node
along path from source to
v, that is next v
 N: set of nodes whose
least cost path definitively
known
4: Network Layer 4a-11
Dijsktra’s Algorithm
1 Initialization:
2 N = {A}
3 for all nodes v
4
if v adjacent to A
5
then D(v) = c(A,v)
6
else D(v) = infty
7
8 Loop
9 find w not in N such that D(w) is a minimum
10 add w to N
11 update D(v) for all v adjacent to w and not in N:
12
D(v) = min( D(v), D(w) + c(w,v) )
13 /* new cost to v is either old cost to v or known
14 shortest path cost to w plus cost from w to v */
15 until all nodes in N
4: Network Layer 4a-12
Dijkstra’s algorithm: example
Step
0
1
2
3
4
5
start N
A
AD
ADE
ADEB
ADEBC
ADEBCF
D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F)
2,A
1,A
5,A
infinity
infinity
2,A
4,D
2,D
infinity
2,A
3,E
4,E
3,E
4,E
4,E
5
2
A
B
2
1
D
3
C
3
1
5
F
1
E
2
4: Network Layer 4a-13
Dijkstra’s algorithm, discussion
Algorithm complexity: n nodes
 each iteration: need to check all nodes, w, not in N
 n*(n+1)/2 comparisons: O(n**2)
 more efficient implementations possible: O(nlogn)
Oscillations possible:
 e.g., link cost = amount of carried traffic
D
1
1
0
A
0 0
C
e
1+e
B
e
initially
2+e
D
0
1
A
1+e 1
C
0
B
0
… recompute
routing
0
D
1
A
0 0
2+e
B
C 1+e
… recompute
2+e
D
0
A
1+e 1
C
0
B
e
… recompute
4: Network Layer 4a-14
Distance Vector Routing Algorithm
iterative:
 continues until no
nodes exchange info.
 self-terminating: no
“signal” to stop
asynchronous:
 nodes need not
exchange info/iterate
in lock step!
distributed:
 each node
communicates only with
directly-attached
neighbors
Distance Table data structure
 each node has its own
 row for each possible destination
 column for each directly-
attached neighbor to node
 example: in node X, for dest. Y
via neighbor Z:
X
D (Y,Z)
distance from X to
= Y, via Z as next hop
= c(X,Z) + min {DZ(Y,w)}
w
4: Network Layer 4a-15
Distance Table: example
7
A
B
1
C
E
cost to destination via
D ()
A
B
D
A
1
14
5
B
7
8
5
C
6
9
4
D
4
11
2
2
8
1
E
2
D
E
D (C,D) = c(E,D) + min {DD(C,w)}
= 2+2 = 4
w
E
D (A,D) = c(E,D) + min {DD(A,w)}
E
w
= 2+3 = 5
loop!
D (A,B) = c(E,B) + min {D B(A,w)}
= 8+6 = 14
w
loop!
4: Network Layer 4a-16
Distance table gives routing table
E
cost to destination via
Outgoing link
D ()
A
B
D
A
1
14
5
A
A,1
B
7
8
5
B
D,5
C
6
9
4
C
D,4
D
4
11
2
D
D,4
Distance table
to use, cost
Routing table
4: Network Layer 4a-17
Distance Vector Routing: overview
Iterative, asynchronous:
each local iteration caused
by:
 local link cost change
 message from neighbor: its
least cost path change
from neighbor
Distributed:
 each node notifies
neighbors only when its
least cost path to any
destination changes

neighbors then notify
their neighbors if
necessary
Each node:
wait for (change in local link
cost of msg from neighbor)
recompute distance table
if least cost path to any dest
has changed, notify
neighbors
4: Network Layer 4a-18
Distance Vector Algorithm:
At all nodes, X:
1 Initialization:
2 for all adjacent nodes v:
3
D X(*,v) = infty
/* the * operator means "for all rows" */
4
D X(v,v) = c(X,v)
5 for all destinations, y
6
send min D X(y,w) to each neighbor /* w over all X's neighbors */
w
4: Network Layer 4a-19
Distance Vector Algorithm (cont.):
8 loop
9 wait (until I see a link cost change to neighbor V
10
or until I receive update from neighbor V)
11
12 if (c(X,V) changes by d)
13 /* change cost to all dest's via neighbor v by d */
14 /* note: d could be positive or negative */
15 for all destinations y: D X(y,V) = D X(y,V) + d
16
17 else if (update received from V wrt destination Y)
18 /* shortest path from V to some Y has changed */
19 /* V has sent a new value for its min DV(Y,w) */
w
20 /* call this received new value is "newval" */
21 for the single destination y: D X(Y,V) = c(X,V) + newval
22
23 if we have a new min DX(Y,w)for any destination Y
w
24
send new value of min D X(Y,w) to all neighbors
w
25
4: Network Layer
26 forever
4a-20
Distance Vector Algorithm: example
X
2
Y
7
1
Z
4: Network Layer 4a-21
Distance Vector Algorithm: example
X
2
Y
1
7
Z
X
Z
X
Y
D (Y,Z) = c(X,Z) + minw{D (Y,w)}
= 7+1 = 8
D (Z,Y) = c(X,Y) + minw {D (Z,w)}
= 2+1 = 3
4: Network Layer 4a-22
Distance Vector: link cost changes
Link cost changes:
 node detects local link cost change
 updates distance table (line 15)
 if cost change in least cost path,
notify neighbors (lines 23,24)
“good
news
travels
fast”
1
X
4
Y
50
1
Z
algorithm
terminates
4: Network Layer 4a-23
Distance Vector: link cost changes
Link cost changes:
 good news travels fast
 bad news travels slow -
“count to infinity” problem!
60
X
4
Y
50
1
Z
algorithm
continues
on!
4: Network Layer 4a-24
Distance Vector: poisoned reverse
If Z routes through Y to get to X :
 Z tells Y its (Z’s) distance to X is
infinite (so Y won’t route to X via Z)
 will this completely solve count to
infinity problem?
60
X
4
Y
50
1
Z
algorithm
terminates
4: Network Layer 4a-25
Comparison of LS and DV algorithms
Message complexity
 LS: with n nodes, E links,
O(nE) msgs sent each
 DV: exchange between
neighbors only
 convergence time varies
Speed of Convergence
 LS: O(n**2) algorithm
requires O(nE) msgs
 may have oscillations
 DV: convergence time varies
 may be routing loops
 count-to-infinity problem
Robustness: what happens
if router malfunctions?
LS:


DV:


node can advertise
incorrect link cost
each node computes only
its own table
DV node can advertise
incorrect path cost
each node’s table used by
others
• error propagate thru
network
4: Network Layer 4a-26
Hierarchical Routing
Our routing study thus far - idealization
 all routers identical
 network “flat”
… not true in practice
scale: with 50 million
destinations:
 can’t store all dest’s in
routing tables!
 routing table exchange
would swamp links!
administrative autonomy
 internet = network of
networks
 each network admin may
want to control routing in its
own network
4: Network Layer 4a-27
Hierarchical Routing
 aggregate routers into
regions, “autonomous
systems” (AS)
 routers in same AS run
same routing protocol


“inter-AS” routing
protocol
routers in different AS
can run different interAS routing protocol
gateway routers
 special routers in AS
 run inter-AS routing
protocol with all other
routers in AS
 also responsible for
routing to destinations
outside AS
 run intra-AS routing
protocol with other
gateway routers
4: Network Layer 4a-28
Intra-AS and Inter-AS routing
C.b
a
C
Gateways:
B.a
A.a
b
A.c
d
A
a
b
c
a
c
B
b
•perform inter-AS
routing amongst
themselves
•perform intra-AS
routers with other
routers in their
AS
network layer
inter-AS, intra-AS
routing in
gateway A.c
link layer
physical layer
4: Network Layer 4a-29
Intra-AS and Inter-AS routing
C.b
a
Host
h1
C
b
A.a
Inter-AS
routing
between
A and B
A.c
a
d
c
b
A
Intra-AS routing
within AS A
B.a
a
c
B
Host
h2
b
Intra-AS routing
within AS B
 We’ll examine specific inter-AS and intra-AS
Internet routing protocols shortly
4: Network Layer 4a-30
The Internet Network layer
Host, router network layer functions:
Transport layer: TCP, UDP
Network
layer
IP protocol
•addressing conventions
•datagram format
•packet handling conventions
Routing protocols
•path selection
•RIP, OSPF, BGP
routing
table
ICMP protocol
•error reporting
•router “signaling”
Link layer
physical layer
4: Network Layer 4a-31
IP Addressing
 IP address: 32-bit
identifier for host,
router interface
 interface: connection
between host, router
and physical link



router’s typically have
multiple interfaces
host may have multiple
interfaces
IP addresses
associated with
interface, not host,
router
223.1.1.1
223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3
223.1.2.9
223.1.3.27
223.1.2.2
223.1.3.2
223.1.3.1
223.1.1.1 = 11011111 00000001 00000001 00000001
223
1
1
1
4: Network Layer 4a-32
IP Addressing
 IP address:
 network part (high
order bits)
 host part (low order
bits)
 What’s a network ?
(from IP address
perspective)
 device interfaces with
same network part of
IP address
 can physically reach
each other without
intervening router
223.1.1.1
223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3
223.1.2.9
223.1.3.27
223.1.2.2
LAN
223.1.3.1
223.1.3.2
network consisting of 3 IP networks
(for IP addresses starting with 223,
first 24 bits are network address)
4: Network Layer 4a-33
IP Addressing
How to find the
networks?
 Detach each
interface from
router, host
 create “islands of
isolated networks
223.1.1.2
223.1.1.1
223.1.1.4
223.1.1.3
223.1.9.2
223.1.7.0
223.1.9.1
223.1.7.1
223.1.8.1
223.1.8.0
223.1.2.6
Interconnected
system consisting
of six networks
223.1.2.1
223.1.3.27
223.1.2.2
223.1.3.1
223.1.3.2
4: Network Layer 4a-34
IP Addresses
class
A
0 network
B
10
C
110
D
1110
1.0.0.0 to
127.255.255.255
host
network
128.0.0.0 to
191.255.255.255
host
network
multicast address
host
192.0.0.0 to
239.255.255.255
240.0.0.0 to
247.255.255.255
32 bits
4: Network Layer 4a-35
Getting a datagram from source to dest.
routing table in A
Dest. Net. next router Nhops
223.1.1
223.1.2
223.1.3
IP datagram:
misc source dest
fields IP addr IP addr
data
A
 datagram remains
unchanged, as it travels
source to destination
 addr fields of interest
here
223.1.1.4
223.1.1.4
1
2
2
223.1.1.1
223.1.2.1
B
223.1.1.2
223.1.1.4
223.1.1.3
223.1.3.1
223.1.2.9
223.1.3.27
223.1.2.2
E
223.1.3.2
4: Network Layer 4a-36
Getting a datagram from source to dest.
misc
data
fields 223.1.1.1 223.1.1.3
Dest. Net. next router Nhops
223.1.1
223.1.2
223.1.3
Starting at A, given IP
datagram addressed to B:
 look up net. address of B
 find B is on same net. as A
A
223.1.1.1
223.1.2.1
 link layer will send datagram
directly to B inside link-layer
frame
 B and A are directly
connected
223.1.1.4
223.1.1.4
1
2
2
B
223.1.1.2
223.1.1.4
223.1.1.3
223.1.3.1
223.1.2.9
223.1.3.27
223.1.2.2
E
223.1.3.2
4: Network Layer 4a-37
Getting a datagram from source to dest.
misc
data
fields 223.1.1.1 223.1.2.3
Dest. Net. next router Nhops
223.1.1
223.1.2
223.1.3
Starting at A, dest. E:
 look up network address of E
 E on different network
A, E not directly attached
routing table: next hop
router to E is 223.1.1.4
link layer sends datagram to
router 223.1.1.4 inside linklayer frame
datagram arrives at 223.1.1.4
continued…..
A
223.1.1.4
223.1.1.4
1
2
2
223.1.1.1





223.1.2.1
B
223.1.1.2
223.1.1.4
223.1.1.3
223.1.3.1
223.1.2.9
223.1.3.27
223.1.2.2
E
223.1.3.2
4: Network Layer 4a-38
Getting a datagram from source to dest.
misc
data
fields 223.1.1.1 223.1.2.3
Arriving at 223.1.4,
destined for 223.1.2.2
 look up network address of E
 E on same network as router’s
interface 223.1.2.9
 router, E directly attached
 link layer sends datagram to
223.1.2.2 inside link-layer
frame via interface 223.1.2.9
 datagram arrives at
223.1.2.2!!! (hooray!)
Dest.
next
network router Nhops interface
223.1.1
223.1.2
223.1.3
A
-
1
1
1
223.1.1.4
223.1.2.9
223.1.3.27
223.1.1.1
223.1.2.1
B
223.1.1.2
223.1.1.4
223.1.1.3
223.1.3.1
223.1.2.9
223.1.3.27
223.1.2.2
E
223.1.3.2
4: Network Layer 4a-39
IP datagram format
IP protocol version
number
header length
(bytes)
“type” of data
max number
remaining hops
(decremented at
each router)
upper layer protocol
to deliver payload to
32 bits
head. type of
length
len service
fragment
16-bit identifier flgs
offset
time to upper
Internet
layer
live
checksum
ver
total datagram
length (bytes)
for
fragmentation/
reassembly
32 bit source IP address
32 bit destination IP address
Options (if any)
data
(variable length,
typically a TCP
or UDP segment)
E.g. timestamp,
record route
taken, pecify
list of routers
to visit.
4: Network Layer 4a-40
IP Fragmentation and Reassembly
 network links have MTU
(max.transfer size) largest possible link-level
frame.
 different link types,
different MTUs
 large IP datagram divided
(“fragmented”) within net
 one datagram becomes
several datagrams
 “reassembled” only at
final destination
 IP header bits used to
identify, order related
fragments
fragmentation:
in: one large datagram
out: 3 smaller datagrams
reassembly
4: Network Layer 4a-41
IP Fragmentation and Reassembly
length ID fragflag offset
=4000 =x
=0
=0
One large datagram becomes
several smaller datagrams
length ID fragflag offset
=1500 =x
=1
=0
length ID fragflag offset
=1500 =x
=1
=1480
length ID fragflag offset
=1040 =x
=0
=2960
4: Network Layer 4a-42
ICMP: Internet Control Message Protocol
 used by hosts, routers,
gateways to communication
network-level information
 error reporting:
unreachable host,
network, port, protocol
 echo request/reply
(used by ping)
 network-layer “above” IP:
 ICMP msgs carried in IP
datagrams
 ICMP message: type, code
plus first 8 bytes of IP
datagram causing error
Type
0
3
3
3
3
3
3
4
Code
0
0
1
2
3
6
7
0
8
9
10
11
12
0
0
0
0
0
description
echo reply (ping)
dest. network unreachable
dest host unreachable
dest protocol unreachable
dest port unreachable
dest network unknown
dest host unknown
source quench (congestion
control - not used)
echo request (ping)
route advertisement
router discovery
TTL expired
bad IP header
4: Network Layer 4a-43
Routing in the Internet
 The Global Internet consists of Autonomous
Systems (AS) interconnected with eachother:
Stub AS: small corporation
Multihomed AS: large corporation (no
transit)
Transit AS: provider
 Two level routing:
choice
Intra-AS: administrator is responsible for
Inter-AS: unique standard
4: Network Layer 4a-44
Internet AS Hierarchy
4: Network Layer 4a-45
Intra-AS Routing
 Also known as Interior Gateway Protocol (IGP)
 Most common IGPs:
RIP: Routing Information Protocol
OSPF: Open Shortest Path First
IGRP: Interior Gateway Routing Protocol (Cisco propr.)
4: Network Layer 4a-46
RIP ( Routing Info Protocol)
 Distance vector type scheme
 Included in BSD-UNIX Distribution in 1982
 Distance metric: # of hops (max = 15 hops)
 Distance vector: exchanged every 30 sec via a Response
Message (also called Advertisement)
 Each Advertisement contains up to 25 destination nets
4: Network Layer 4a-47
RIP
4: Network Layer 4a-48
RIP
destination network next router
1
A
20
B
30
B
10
-….
….
number of hops to destination
2
2
7
1
....
4: Network Layer 4a-49
RIP: Link Failure and Recovery
 If no advertisement heard after 180 sec, neighbor/link dead
 Routes via the neighbor are invalidated; new advertisements
sent to neighbors
 Neighbors in turn send out new advertisements if their
tables changed
 Link failure info quickly propagates to entire net
 Poison reverse used to prevent ping-pong loops (infinite
distance = 16 hops)
4: Network Layer 4a-50
RIP Table processing
 RIP routing tables managed by an application process called
route-d (demon)
 advertisements encapsulated in UDP packets (no reliable
delivery required; advertisements are periodically repeated)
4: Network Layer 4a-51
RIP Table processing
4: Network Layer 4a-52
RIP Table example
Destination
-------------------127.0.0.1
192.168.2.
193.55.114.
192.168.3.
224.0.0.0
default
Gateway
Flags Ref
Use
Interface
-------------------- ----- ----- ------ --------127.0.0.1
UH
0 26492 lo0
192.168.2.5
U
2
13 fa0
193.55.114.6
U
3 58503 le0
192.168.3.5
U
2
25 qaa0
193.55.114.6
U
3
0 le0
193.55.114.129
UG
0 143454
4: Network Layer 4a-53
RIP Table example (cont)
RIP Table example (at router giroflee):
Three attached class C networks (LANs)
Router only knows routes to attached LANs
Default router used to “go up”
Route multicast address: 224.0.0.0
Loopback interface (for debugging)
4: Network Layer 4a-54
OSPF (Open Shortest Path First)
 “open”: publicly available
 uses the Link State algorithm (ie, LS packet dissemination;
topology map at each node; route computation using
Dijkstra’s alg)
 OSPF advertisement carries one entry per neighbor router
 advertisements disseminated to ENTIRE Autonomous
System (via flooding)
4: Network Layer 4a-55
OSPF “advanced” features (not in RIP)
 Security: all OSPF messages are authenticated (to




prevent malicious intrusion); TCP connections used
Multiple same-cost paths allowed (only one path in
RIP)
For each link, multiple cost metrics for different
TOS (eg, satellite link cost set “low” for best
effort; high for real time)
Integrated uni- and multicast support: Multicast
OSPF (MOSPF) uses same topology data base as
OSPF
Hierarchical OSPF in large domains
4: Network Layer 4a-56
Hierarchical OSPF
4: Network Layer 4a-57
Hierarchical OSPF
 Two level hierarchy: local area and backbone
 Link state advertisements do not leave respective




areas
Nodes in each area have detailed area topology;
they only know direction (shortest path) to
networks in other areas
Area Border routers “summarize” distances to
networks in the area and advertise them to other
Area Border routers
Backbone routers run an OSPF routing alg limited
to the backbone
Boundary routers connect to other ASs
4: Network Layer 4a-58
IGRP (Interior Gateway Routing Protocol)
 CISCO proprietary; successor of RIP (mid 80’s)
 Distance Vector, like RIP
 several cost metrics (delay, bandwidth, reliability,




load etc)
uses TCP to exchange routing updates
routing tables exchanged only when costs change
Loop free routing achieved by using a Distributed
Updating Alg. (DUAL) based on diffused
computation
In DUAL, after a distance increase, the routing
table is frozen until all affected nodes have
learned of the change
4: Network Layer 4a-59
Inter-AS routing
4: Network Layer 4a-60
Inter-AS routing (cont)
 BGP (Border Gateway Protocol): the de facto
standard
 Path Vector protocol: and extension of Distance
Vector
 Each Border Gateway broadcast to neighbors
(peers) the entire path (ie, sequence of AS’s) to
destination
 For example, Gwy X may store the following path
to destination Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z
4: Network Layer 4a-61
Inter-AS routing (cont)
 Now, suppose Gwy X send its path to peer Gwy W
 Gwy W may or may not select the path offered by Gwy X,
because of cost, policy or loop prevention reasons
 If Gwy W selects the path advertised by Gwy X, then:
Path (W,Z) = w, Path (X,Z)
Note: path selection based not so much on cost (eg,# of
AS hops), but mostly on administrative and policy issues
(eg, do not route packets through competitor’s AS)
4: Network Layer 4a-62
Inter-AS routing (cont)
 Peers exchange BGP messages using TCP
 OPEN msg opens TCP connection to peer and
authenticates sender
 UPDATE msg advertises new path (or withdraws
old)
 KEEPALIVE msg keeps connection alive in absence
of UPDATES; it also serves as ACK to an OPEN
request
 NOTIFICATION msg reports errors in previous
msg; also used to close a connection
4: Network Layer 4a-63
Address Management
 As Internet grows, we run out of addresses
 Solution (a): subnetting. Eg, Class B Host field
(16bits) is subdivided into <subnet;host> fields
 Solution (b): CIDR (Classless Inter Domain
Routing): assign block of contiguous Class C
addresses to the same organization; these
addresses all share a common prefix
 repeated “aggregation” within same provider leads
to shorter and shorter prefixes
 CIDR helps also routing table size and processing:
Border Gwys keep only prefixes and find “longest
prefix” match
4: Network Layer 4a-64
Why different Intra- and Inter-AS routing ?
 Policy: Inter is concerned with policies (which
provider we must select/avoid, etc). Intra is
contained in a single organization, so, no policy
decisions necessary
 Scale: Inter provides an extra level of routing
table size and routing update traffic reduction
above the Intra layer
 Performance: Intra is focused on performance
metrics; needs to keep costs low. In Inter it is
difficult to propagate performance metrics
efficiently (latency, privacy etc). Besides, policy
related information is more meaningful.
We need BOTH!
4: Network Layer 4a-65
Router Architecture Overview
 Router main functions: routing algorithms and protocols processing,
switching datagrams from an incoming link to an outgoing link
Router Components
4: Network Layer 4a-66
Input Ports
 Decentralized switching: perform routing table lookup using
a copy of the node routing table stored in the port memory
 Goal is to complete input port processing at ‘line speed’, ie
processing time =< frame reception time (eg, with 2.5 Gbps
line, 256 bytes long frame, router must perform about 1
million routing table lookups in a second)
 Queuing occurs if datagrams arrive at rate higher than can
be forwarded on switching fabric
4: Network Layer 4a-67
Speeding Up Routing Table Lookup
 Table is stored in a tree structure to facilitate binary
search
 Content Addressable Memory (associative memory), eg Cisco
8500 series routers
 Caching of recently looked-up addresses
 Compression of routing tables
4: Network Layer 4a-68
Switching Fabric
4: Network Layer 4a-69
Switching Via Memory
First generation routers: packet is copied under system’s
(single) CPU control; speed limited by Memory bandwidth. For
Memory speed of B packet/sec or pps, throughput is B/2 pps
Input
Port
Memory
Output
Port
System Bus
• Modern routers: input ports with CPUs that implement output port lookup,
and store packets in appropriate locations (= switch) in a shared Memory;
eg Cisco Catalyst 8500 switches
4: Network Layer 4a-70
Switching Via Bus
 Input port processors transfer a datagram from input port memory
to output port memory via a shared bus
 Main resource contention is over the bus; switching is limited by
bus speed
 Sufficient speed for access and enterprise routers (not regional or
backbone routers) is provided by a Gbps bus; eg Cisco 1900 which
has a 1 Gbps bus
4: Network Layer 4a-71
Switching Via An Interconnection Network
 Used to overcome bus bandwidth limitations
 Banyan networks and other interconnection networks were initially
developed to connect processors in a multiprocessor computer
system; used in Cisco 12000 switches provide up to 60 Gbps
through the interconnection network
 Advanced design incorporates fragmenting a datagram into fixed
length cells and switch the cells through the fabric; + better
sharing of the switching fabric resulting in higher switching speed
4: Network Layer 4a-72
Output Ports
Buffering is required to hold datagrams whenever they arrive from
the switching fabric at a rate faster than the transmission rate
4: Network Layer 4a-73
Queuing At Input and Output Ports

Queues build up whenever there is a rate mismatch or blocking.
Consider the following scenarios:
 Fabric speed is faster than all input ports combined; more
datagrams are destined to an output port than other output ports;
queuing occurs at output port
 Fabric bandwidth is not as fast as all input ports combined; queuing
may occur at input queues;
 HOL blocking: fabric can deliver datagrams from input ports in
parallel, except if datagrams are destined to same output port; in
this case datagrams are queued at input queues; there may be
queued datagrams that are held behind HOL conflict, even when
their output port is available
4: Network Layer 4a-74
IPv6
 Initial motivation is 32 bit address space is




estimated to get used up either by 2008 or 2018;
opportunity for changes to achieve faster
processing and provision of differentiated
services
Packet Format: fixed header of 40 bytes + option;
Fixed header fields:
Version: indicates IPv6
Priority: 4 bits, to give priority to certain packets
within a flow; values 0 to 7 for congestioncontrolled traffic, while values 8 to 15 is for other
traffic (eg constant bit rate)
Flow Label: intended to help with differentiating
services based on flows, a flow is not strictly
defined in IPv6 proposal, it can be traffic from a
user who paid more, traffic that is real-time,
etc.
4: Network Layer 4a-75
IPv6 Header (Cont)
 Hop Limit: same as TTL, still one byte!
 Source and Destination Addresses: 128
bits, with a new hierarchical structure
(address can imply geographical location,
not in IPv4); includes new type of address:
anycast, delivery is to one of a number of
destinations
4: Network Layer 4a-76
Other Changes from IPv4
 Fragmentation: none provided, router which has a
packet longer than the maximum allowed on a the
next hop drops the packet, and sends an ICMP
message “Packet Too Big” to the packet source;
reduces processing time of packets
 Checksum: removed entirely to reduce processing
time at each hop
 Options: Options are allowed and indicated by the
header field “Next Header”, the content of this
field indicates the higher level protocol or the
existence of an option after the 40 bytes IPv6
header
 ICMPv6: new version of ICMP, with additional
message types, eg “Packet Too Big”; and group
4: Network Layer
management function for multicast groups
(Under
4a-77
Transition From IPv4 To IPv6
 During the transition, not all routers will be
upgraded to IPv6; How will the network operate?
 Two proposed approaches: Dual Stack and
Tunneling
 Dual Stack:




Some routers with dual stack (v6, v4); others are only v4
routers
Dual stack routers translate the packet to v4 packet if
the next router is v4 only
DNS can be used to determine whether a router is dual
stack or not
Some info and v6 features will be lost if a packet has to
go through any v4 only router; eg Flow Identification
4: Network Layer 4a-78
Dual Stack Approach
4: Network Layer 4a-79
Tunneling
 Routers are as before v4/v6 or v4 only
 A v4/v6 router “encapsulates” the IPv6 packet
inside an IPv4 envelop before communication to a
v4 only router
 A v4/v6 router receiving an encapsulated packet
from a “tunnel”, remove the envelop and forwards
the IPv6 to next router if the next router is
v4/v6 capable
4: Network Layer 4a-80
Multicast Routing
 Multicast: delivery of same packet to a
group of receivers
 Multicasting is becoming increasingly
popular in the Internet (video on demand;
whiteboard; interactive games)
 Multiple unicast vs multicast
4: Network Layer 4a-81
Multicast Group Address
 M-cast group address “delivered” to all
receivers in the group
 Internet uses Class D for m-cast
 M-cast address distribution etc. managed
by IGMP Protocol
4: Network Layer 4a-82
IGMP Protocol
 IGMP (Internet Group Management Protocol)




operates between Router and local Hosts, typically
attached via a LAN (e.g., Ethernet)
Router queries the local Hosts for m-cast group
membership info
Router “connects” active Hosts to m-cast tree via
m-cast protocol
Hosts respond with membership reports: actually,
the first Host which responds (at random) speaks
for all
Host issues “leave-group” mssg to leave; this is
optional since router periodically polls anyway
(soft state concept)
4: Network Layer 4a-83
IGMP message types
GMP Message type
Purpose
Sent by
membership query: general
current active
multicast groups
membership query: specific
specific m-cast group
router
query for
router
query for
membership report
wants to join goup
host
leave group
the group
host
host
host leaves
4: Network Layer 4a-84
The Multicast Tree problem
 Problem: find the best (e.g., min cost) tree
which interconnects all the members
4: Network Layer 4a-85
Multicast Tree options
 GROUP SHARED TREE: single tree; the
root is the “CORE” or the “Rendez Vous”
point; all messages go through the CORE
 SOURCE BASED TREE: each source is the
root of its own tree connecting to all the
members; thus N separate trees
4: Network Layer 4a-86
Group Shared Tree
 Predefined CORE for given m-cast group (eg,




posted on web page)
New members “join” and “leave” the tree with
explicit join and leave control messages
Tree grows as new branches are “grafted” onto
the tree
CBT (Core Based Tree) and PIM Sparse-Mode are
Internet m-cast protocols based on GSTree
All packets go through the CORE
4: Network Layer 4a-87
Source Based Tree
 Each source is the root of its own tree: the
tree of shortest paths
 Packets delivered on the tree using
“reverse path forwarding” (RPF); i.e., a
router accepts a packet originated by
source S only if such packet is forwarded
by the neighbor on the shortest path to S
 In other words, m-cast packets are
“forwarded” on paths which are the
“reverse” of “shortest paths” to S
4: Network Layer 4a-88
Source-Based tree: DVMRP
 DVMRP was the first m-cast protocol deployed on




the Internet; used in Mbone (Multicast Backbone)
Initially, the source broadcasts the packet to ALL
routers (using RPF)
Routers with no active Hosts (in this m-cast group)
“prune” the tree; i.e., they disconnect themselves
from the tree
Recursively, interior routers with no active
descendents self-prune After timeout (2 hours in
Internet) pruned branches “grow back”
Problems: only few routers are mcast-able;
solution: tunnels
4: Network Layer 4a-89
PIM (Protocol Independent
Multicast)
 PIM (Protocol Independent Multicast) is becoming




the de facto intra AS m-cast protocol standard
“Protocol Independent” because it can operate on
different routing infrastructures (as a difference
of DVMRP)
PIM can operate in two modes: PIM Sparse and
PIM dense Mode.
Initially, members join the “Shared Tree”
centered around a Randez Vous Point
Later, once the “connection” to the shared treee
has been established, opportunities to connet
DIRECTLY to the source are explored (thus
establishing a partial Source Based tree
4: Network Layer 4a-90