Download Network Layer - Spartans Fall-14

Document related concepts

Multiprotocol Label Switching wikipedia , lookup

TCP congestion control wikipedia , lookup

CAN bus wikipedia , lookup

Net bias wikipedia , lookup

Asynchronous Transfer Mode wikipedia , lookup

IEEE 802.1aq wikipedia , lookup

Distributed firewall wikipedia , lookup

Deep packet inspection wikipedia , lookup

IEEE 1355 wikipedia , lookup

Piggybacking (Internet access) wikipedia , lookup

Computer network wikipedia , lookup

Internet protocol suite wikipedia , lookup

Wake-on-LAN wikipedia , lookup

Network tap wikipedia , lookup

Zero-configuration networking wikipedia , lookup

Airborne Networking wikipedia , lookup

UniPro protocol stack wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Transcript
Computer Networks Research Center (CNRC)
Computer Science Department, CIIT, Lahore.
http://research.ciitlahore.edu.pk/Groups/CNRC/
Network Layer
Agenda
•
•
•
•
•
Network Layer
Internet Protocol
IP Datagram
NAT
Subneting
Network Layer
2
The Internet Network Layer
Host, router network layer functions:
Transport layer: TCP, UDP
Network
layer
IP protocol
•addressing conventions
•datagram format
•packet handling conventions
Routing protocols
•path selection
•RIP, OSPF, BGP
forwarding
table
ICMP protocol
•error reporting
•router “signaling”
Link layer
physical layer
Network Layer
3
IP Datagram Format
IP protocol version
number
header length
(bytes)
“type” of data
max number
remaining hops
(decremented at
each router)
upper layer protocol
to deliver payload to
32 bits
type of
ver head.
len service
length
fragment
16-bit identifier flgs
offset
upper
time to
Internet
layer
live
checksum
total datagram
length (bytes)
for
fragmentation/
reassembly
32 bit source IP address
32 bit destination IP address
Options (if any)
data
(variable length,
typically a TCP
or UDP segment)
Network Layer
E.g. timestamp,
record route
taken, specify
list of routers
to visit.
4
IP Datagram Format
 Note that an IP datagram has a total of 20 bytes of header (assuming no
options). If the datagram carries a TCP segment, then each (nonfragmented)
datagram carries a total of 40 bytes of header (20 bytes of IP header plus 20
bytes of TCP header) along with the application-layer message.
how much overhead with TCP?
 20 bytes of TCP
 20 bytes of IP
 = 40 bytes + app layer overhead
Network Layer
5
IP Fragmentation & Reassembly
•
•
•
The maximum amount of data that a
link-layer frame can carry is called the
maximum transmission unit (MTU).
Network links have MTU (max.transfer
size) - largest possible link-level frame.
– different link types, different
MTUs
large IP datagram divided
(“fragmented”) within net
– one datagram becomes several
datagrams
– “reassembled” only at final
destination
– IP header bits used to identify,
order related fragments
fragmentation:
in: one large datagram
out: 3 smaller datagrams
reassembly
Some protocols can carry big datagrams, whereas other protocols can carry only little packets. For
example, Ethernet frames can carry up to 1,500 bytes of data, whereas frames for some wide-area
links can carry no more than 576 bytes.
Network Layer
6
IP Fragmentation and Reassembly
A datagram of 4,000 bytes (20 bytes of IP header plus 3,980 bytes of IP payload) arrives at a router and
must be forwarded to a link with an MTU of 1,500 bytes. This implies that the 3,980 data bytes in the original
datagram must be allocated to three separate fragments (each of which is also an IP datagram).
Suppose that the original datagram is stamped with an identification number of 777.
Example
 4000 byte datagram
 MTU = 1500 bytes
1480 bytes in
data field
offset = 1480/8 = 185
length
=4000
ID fragflag
=777
=0
offset
=0
One large datagram becomes
several smaller datagrams
length
=1500
ID fragflag
=777
=1
offset
=0
length
=1500
ID
fragflag
offset
=185
length
=1040
ID fragflag
=777
=0
offset
=370
Network Layer
=777
=1
7
IP Fragmentation and Reassembly
Network Layer
8
IP Addressing: Introduction
• IP address: 32-bit
identifier for host, router
interface
• interface: connection
between host/router and
physical link
– router’s typically have
multiple interfaces
– host may have multiple
interfaces
– IP addresses associated
with each interface
223.1.1.1
223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3
223.1.2.9
223.1.3.27
223.1.2.2
223.1.3.2
223.1.3.1
223.1.1.1 = 11011111 00000001 00000001 00000001
223
Network Layer
1
1
1
9
Subnets
• IP address:
223.1.1.1
– subnet part (high order
bits)
– host part (low order bits)
• What’s a subnet ?
– device interfaces with
same subnet part of IP
address
– can physically reach each
other without intervening
router
223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3
223.1.2.9
223.1.3.27
223.1.2.2
LAN
223.1.3.1
223.1.3.2
network consisting of 3 subnets
Network Layer
10
Subnets
223.1.1.0/24
223.1.2.0/24
Recipe
• To determine the subnets,
detach each interface
from its host or router,
creating islands of isolated
networks. Each isolated
network is called a
subnet.
223.1.3.0/24
Subnet mask: /24
Network Layer
11
Subnets
223.1.1.2
How many?
223.1.1.1
223.1.1.4
223.1.1.3
223.1.9.2
223.1.7.0
223.1.9.1
223.1.7.1
223.1.8.1
223.1.8.0
223.1.2.6
223.1.2.1
223.1.3.27
223.1.2.2
Network Layer
223.1.3.1
223.1.3.2
12
IP Addresses: How to get one?
Q: How does host get IP address?
•
hard-coded by system admin in a file
– Wintel: control-panel->network->configuration->tcp/ip>properties
– UNIX: /etc/rc.config
•
DHCP: Dynamic Host Configuration Protocol: dynamically get address from
as server
– “plug-and-play”
(more in next chapter)
Network Layer
13
IP Addressing
Q: How does an ISP get block of addresses?
A: ICANN: Internet Corporation for Assigned
Names and Numbers
– allocates addresses
– manages DNS
– assigns domain names, resolves disputes
Network Layer
14
NAT: Network Address Translation
rest of
Internet
local network
(e.g., home network)
10.0.0/24
10.0.0.4
10.0.0.1
10.0.0.2
138.76.29.7
10.0.0.3
All datagrams leaving local
network have same single source
NAT IP address: 138.76.29.7,
different source port numbers
Datagrams with source or
destination in this network
have 10.0.0/24 address for
source, destination (as usual)
Network Layer
15
NAT: Network Address Translation
• Motivation: local network uses just one IP address as far as
outside word is concerned:
– no need to be allocated range of addresses from ISP: - just
one IP address is used for all devices
– can change addresses of devices in local network without
notifying outside world
– can change ISP without changing addresses of devices in
local network
– devices inside local net not explicitly addressable, visible by
outside world (a security plus).
Network Layer
16
NAT: Network Address Translation
Implementation: NAT router must:
– outgoing datagrams: replace (source IP address, port #) of
every outgoing datagram to (NAT IP address, new port #)
. . . remote clients/servers will respond using (NAT IP address, new
port #) as destination address.
– remember (in NAT translation table) every (source IP address,
port #) to (NAT IP address, new port #) translation pair
– incoming datagrams: replace (NAT IP address, new port #) in
destination fields of every incoming datagram with
corresponding (source IP address, port #) stored in NAT table
Network Layer
17
NAT: Network Address Translation
2: NAT router
changes datagram
source addr from
10.0.0.1, 3345 to
138.76.29.7, 5001,
updates table
2
NAT translation table
WAN side addr
LAN side addr
1: host 10.0.0.1
sends datagram to
128.119.40, 80
138.76.29.7, 5001 10.0.0.1, 3345
……
……
S: 10.0.0.1, 3345
D: 128.119.40.186, 80
S: 138.76.29.7, 5001
D: 128.119.40.186, 80
138.76.29.7
S: 128.119.40.186, 80
D: 138.76.29.7, 5001
3: Reply arrives
dest. address:
138.76.29.7, 5001
3
1
10.0.0.4
S: 128.119.40.186, 80
D: 10.0.0.1, 3345
10.0.0.1
10.0.0.2
4
10.0.0.3
4: NAT router
changes datagram
dest addr from
138.76.29.7, 5001 to 10.0.0.1, 3345
Network Layer
18
NAT: Network Address Translation
• 16-bit port-number field:
– 60,000 simultaneous connections with a single
LAN-side address!
• NAT is controversial:
– routers should only process up to layer 3
– violates end-to-end argument
• NAT possibility must be taken into account by app
designers, eg, P2P applications
– address shortage should instead be solved by IPv6
Network Layer
19
Binary and Decimal Conversions
Network Layer
20
The 32 bit Binary IP Address
•
•
•
•
•
•
•
•
•
An IP address is represented by a 32 bit binary number
The value of the right-most bit (also called the least significant bit) is either 0 or 1
The corresponding decimal value of each bit doubles as you move left in the
binary number
So the decimal value of the 2nd bit from the right is either 0 or 2. The third bit is
either 0 or 4, the fourth bit 0 or 8, etc ...
IP addresses are expressed as dotted-decimal numbers
we break up the 32 bits of the address into four octets (an octet is a group of 8 bits).
The maximum decimal value of each octet is 255
The largest 8 bit binary number is 11111111
Those bits, from left to right, have decimal values of 128, 64, 32, 16, 8, 4, 2, and
1. Added together, they total 255
Network Layer
21
IP Address Component Fields
 The network number of an IP address identifies the network
to which a device is attached
 The host portion of an IP address identifies the specific
device on that network
Network Layer
22
IP Address Classes
The
American
Registry for Internet
Numbers (ARIN) is
the Regional Internet
Registry
(RIR)
for
Canada,
the
United States, and
many Caribbean and
North Atlantic islands.







When written in a binary format, the first (leftmost) bit of a Class A address is always 0.
Example of a Class A IP address is 124.95.44.15
The first octet, 124, identifies the network number assigned by ARIN
which will range from 0-126. (127 does start with a 0 bit, but has been reserved for special
purposes.)
The internal administrators of the network assign the remaining 24 bits
All Class A IP addresses use only the first 8 bits to identify the network part of the address
Every network that uses a Class A IP address can have assigned up to 2 to-the-power of 24
(224) (minus 2), or 16,777,214, possible IP addresses to devices that are attached to its
network
Network Layer
23

The first 2 bits of a Class B address are always 10 (one and zero).

example of a Class B IP address is 151.10.13.28

The first two octets identify the network number assigned by ARIN

Class B IP addresses always have values ranging from 128 to 191 in their first octet

Every network that uses a Class B IP address can have assigned up to 2 to-the-power of 16 (216)
(minus 2 again!), or 65,534, possible IP addresses to devices that are attached to its network

The first 3 bits of a Class C address are always 110 (one, one and zero).

An example of a Class C IP address is 201.110.213.28

The first three octets identify the network number

Class C IP addresses always have values ranging from 192 to 223 in their first octet.

All Class C IP addresses use the first 24 bits to identify the network part of the address

Every network that uses a Class C IP address can have assigned up to 28 (minus 2), or 254,
possible IP addresses to devices that areNetwork
attached
Layerto its network
24
IP Reserved Address









If your computer wanted to communicate with all of the devices on a network, it would be
quite unmanageable to write out the IP address for each device
An IP address that ends with binary 0s in all host bits is reserved for the network address
(sometimes called the wire address)
Therefore, as a Class A network example, 113.0.0.0 is the IP address of the network
containing the host 113.1.2.3
A router uses a network's IP address when it forwards data on the Internet. As a Class B
network example, the IP address 176.10.0.0 is a network address.
If you wanted to send data to all of the devices on a network, you would need to use a
broadcast address
Broadcast IP addresses end with binary 1s in the entire host part of the address (the host
field)
For the network in the example (176.10.0.0)
where the last 16 bits make up the host field (or host part of the address), the broadcast
that would be sent out to all devices on that network would include a destination address
of 176.10.255.255 (since 255 is the decimal value of an octet containing 11111111).
When you send a broadcast packet on a network, all devices on the network notice it
Network Layer
25
Subnets
• Network administrators sometimes need to divide networks,
especially large ones, into smaller networks
• These smaller divisions are called subnetworks and provide
addressing flexibility. Most of the time subnetworks are
simply referred to as subnets
• each subnet address is unique.
• Subnet addresses include the Class A, Class B, or Class C
network portion, plus a subnet field and a host field
Network Layer
26
The 32 bit Binary IP Address

The subnet field and the host field are created from the original host
portion for the entire network

To create a subnet address, a network administrator borrows bits from the
original host portion and designates them as the subnet field.

The minimum number of bits that can be borrowed is 2
Network Layer
27
Why Subnets!

A primary reason for using subnets is to reduce the size of a broadcast domain

When broadcast traffic begins to consume too much of the available bandwidth,
network administrators may choose to reduce the size of the broadcast domain.
Network Layer
28
Subnet Mask
•
The subnet mask (formal term: extended network prefix), is not an address,
but determines which part of an IP address is the network field and which
part is the host field
•
A subnet mask is 32 bits long and has 4 octets, just like an IP address.
•
To determine the subnet mask for a particular subnetwork IP address follow
these steps
– Express the subnetwork IP address in binary form
– Replace the network and subnet portion of the address with all 1s
– Replace the host portion of the address with all 0s
– As the last step convert the binary expression back to dotted-decimal notation.
 The term "operations" in mathematics refers to rules that define how one
number combines with other numbers
 The basic Boolean operations are AND, OR, and NOT.
Network Layer
29
The AND Function



In order to route a data packet, the router must first determine the destination
network/subnet address by performing a logical AND using the destination host's
IP address and the subnet mask
The result will be the network/subnet address
The router has received a packet for host 131.108.2.2 - it uses the AND
operation to learn that this packet should be routed to subnet 131.108.2.0
Network Layer
30
Subnet Mask (Cont.)

The process used to apply the subnet mask involves Boolean Algebra to filter out
non-matching bits to identify the network address.
Boolean Algebra

Boolean Algebra is a process that applies binary logic to yield binary results.

Working with subnet masks, you need only 4 basic principles of Boolean Algebra:
 1 and 1 = 1
 1 and 0 = 0
 0 and 1 = 0
 0 and 0 = 0


In another words, the only way you can get a result of a 1 is to combine 1 & 1. Everything
else will end up as a 0.
The process of combining binary values with Boolean Algebra is called Anding.
Network Layer
31
A Trial Separation




Subnet masks apply only to Class A, B or C IP addresses.
The subnet mask is like a filter that is applied to a message’s destination IP
address.
Its objective is to determine if the local network is the destination network.
The subnet mask goes like this:
1. If a destination IP address is 206.175.162.21, we know that it is a Class C address &
that its binary equivalent is: 11001110.10101111.10100010.00010101
2. We also know that the default standard Class C subnet mask is: 255.255.255.0 and
that its binary equivalent is: 11111111.11111111.11111111.00000000
3. When these two binary numbers (the IP address & the subnet mask) are combined
using Boolean Algebra, the Network ID of the destination network is the result:
4. The result is the IP address of the network which in this case is the same as the
local network & means that the message
is for a node on the local network.
Network Layer
32
Default Standard Subnet Masks

There are default standard subnet masks for Class A, B and C addresses:
Subnetting Networks ID

A 3-step example of how the default Class A subnet mask is applied to a Class A address:

The default Class A subnet mask (255.0.0.0) is AND’d with the Class A address
(123.123.123.001) using Boolean Algebra, which results in the Network ID (123.0.0.0)
being revealed.
The default Class B subnet mask (255.255.0.0) strips out the 16-bit network ID & the
default Class C subnet mask (255.255.255.0) strips out the 24-bit network ID.

Network Layer
33
Subnetting, Subnet & Subnet Mask




Subnetting, a subnet & a subnet mask are all different.
In fact, the 1st creates the 2nd & is identified by the 3rd.
Subnetting is the process of dividing a network & its IP addresses into segments, each of
which is called a subnetwork or subnet.
The subnet mask is the 32-bit number that the router uses to cover up the network
address to show which bits are being used to identify the subnet.
Subnetting

A network has its own unique address, such as a Class B network with the address 172.20.0.0
which has all zeroes in the host portion of the address.

From the basic definitions of a Class B network & the default Class B subnet mask, this network
can be created as a single network that contains more than 65,000 individual hosts.

Through the use of subnetting, the network from the previous slide can be logically divided into
subnets with fewer hosts on each subnetwork.

It does not improve the available shared bandwidth only, but it cuts down on the amount of
broadcast traffic generated over the entire network as well.

The 2 primary benefits of subnetting are:
1.
Fewer IP addresses, often as few as one, are needed to provide addressing to a network
& subnetting.
2.
Subnetting usually results in smaller routing tables in routers beyond the local
internetwork.
Network Layer
34
Subnetting

Example of subnetting: when the network administrator divides the 172.20.0.0 network
into 5 smaller networks – 172.20.1.0, 172.20.2.0, 172.20.3.0, 172.20.4.0 & 172.20.5.0 –
the outside world stills sees the network as 172.20.0.0, but the internal routers now break
the network addressing into the 5 smaller subnetworks.

In the above example, only a single IP address is used to reference the network &
instead of 5 network addresses, only one network reference is included in the routing
tables of routers on other networks.
Borrowing Bits to Grow a Subnet




The key concept in subnetting is borrowing bits from the host portion of the network to
create a subnetwork.
Rules govern this borrowing, ensuring that some bits are left for a Host ID.
The rules require that two bits remain available to use for the Host ID & that all of the
subnet bits cannot be all 1s or 0s at the same time.
For each IP address class, only a certain number of bits can be borrowed from the host
portion for use in the subnet mask.
Bits Available for Creating Subnets
Address Class
Host Bits
Bits Available for Subnet
A
24
22
B
16
14
8
6
C
Network Layer
35
Subnetting a Class A Network


The default subnet mask for a class A network is 255.0.0.0 which allows for more than
16,000,000 hosts on a single network.
The default subnet mask uses only 8 bits to identify network, leaving 24 bits for host addressing .

To subnet a Class A network, you need to borrow a sufficient number of bits from the 24-bit host
portion of the mask to allow for the number of subnets you plan to create, now & in the future.

Example: To create 2 subnets with more than 4 millions hosts per subnet, you must borrow 2 bits
from the 2nd octet & use 10 masked (value equals one) bits for the subnet mask
(11111111.11000000) or 255.192 in decimal.


Keep in mind that each of the 8-bit octets has binary place values.
When you borrow bits from the Host ID portion of the standard mask, you don’t change the value
of the bits, only how they are grouped & used.

A sample of subnet mask options available for Class A addresses.
Network Layer
36
Class A Subnet Masks


All subnet masks contain 32 bits; no more, no less.
However a subnet mask cannot filter more than 30 bits. This means 2 things:
 One, that there cannot be more than 30 ones bits in the subnet mask.
 Two, that there must always be at least 2 bits available for the Host ID.

The subnet mask with the highest value (255.255.255.252) has a binary representation of:
11111111.11111111.11111111.11111100
The 2 zeroes in this subnet mask represent the 2 positions set aside for the Host address
portion of the address.


Remember that the addresses with all ones (broadcast address) & all zeroes (local
network) cannot be used as they have special meanings.
Network Layer
37
Subnetting Class B & Class C
•
The table on previous slide “Class A Subnet Masks” is similar to the tables used for Class
B & Class C IP addresses & subnet masks.
•
The only differences are that you have fewer options (due to a fewer number of bits
available) & that you’re much more likely to work with Class B & Class C networks in real
life.
A sample of the subnet masks available for Class B networks.
A list of the subnet masks available for Class C networks.
Network Layer
38
Class C Subnets

Knowing the relationships in this table will significantly reduce the time you spend
calculating subnetting problems.
Class B Subnets

To calculate the number of subnets & hosts available from a Class B subnet mask, you
use the same host & subnet formulas described for calculating Class C values.

Using these formulas I have constructed a table that contains the Class B subnet & host
values.
Network Layer
39
Class B Subnets
Network Layer
40
So How Does This Work?

We ask our ISP for a Class C license.

They give us the Class C bank of 206.15.143.0

This gives us 1 Network (206.15.143.0) with the potential for 254 node addresses
(206.15.143.1 to 206.15.143.254).

But we have a LAN made up of 5 Networks with the largest one serving 25 nodes.

So we need to Subnet our 1 IP address...

To calculate the number of subnets (networks) and/or nodes, we need to do some math:

Use the formula 2n-2 where the n can represent either how many subnets (networks)
needed OR how many nodes per subnet needed.

We know we need at least 5 subnets. So 23-2 will give us 6 subnet addresses (Network
Addresses).
We know we need at least 25 nodes per network. 25-2 will give us 30 nodes per subnet
(network).
This will work, because we can steal the first 3 bits from the node’s portion of the address
to give to the network portion and still have 5 (8-3) left for the node portion:


Network Layer
41
Break it down:
•
Let’s go back to what portion is what: We have a Class C address:
NNNNNNNN.NNNNNNNN.NNNNNNNN.nnnnnnnn
With a Subnet mask of:
11111111.11111111.11111111.00000000
We need to steal 3 bits from the node portion to give it to the Network portion:
NNNNNNNN.NNNNNNNN.NNNNNNNN.NNNnnnnn

NNNNNNNN.NNNNNNNN.NNNNNNNN.NNNnnnnn

This will change our subnet mask to the following:

11111111.11111111.11111111.11100000

Above is how the computer will see our new subnet mask, but we need to express it in
decimal form as well:

255.255.255.224
128+64+32=224
Network Layer
42
What address is what?




Which of our 254 addresses will be a Subnet (or Network) address and which
will be our node addresses?
Because we are using the first 3 bits for our subnet mask, we can configure
them into eight different ways (binary form):
000
001
010
011
100
101
110
111
We cannot use all “0”s or all “1”s
000
001
010
011
100
101
110
111
We are left with 6 useable network numbers.
Network Layer
43
Network (Subnet) Addresses
• Remember our values:
128 64
32
16
8
Now our 3 bit configurations:
0
0
1
n
n
0
1
0
n
n
0
1
1
n
n
1
0
0
n
n
1
0
1
n
n
1
1
0
n
n
Network Layer
4
2
1
Equals
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
32
64
96
128
160
192
44
Network (Subnet) Addresses
0
0
0
1
1
1
0
1
1
0
0
1
1
0
1
0
1
0
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
n
32
64
96
128
160
192
Each of these numbers becomes the Network Address of their subnet...
206.15.143.32
206.15.143.64
206.15.143.96
206.15.143.128
206.15.143.160
206.15.143.192
Network Layer
45
Node Addresses

The device assigned the first address will receive the first number AFTER the network
address shown before.
206.15.143.33 or 32+1
0
0
1
0
0
0
0
1
And the last address in the Network will look like this: 206.15.143.62
0
0
1
1
1
1
1
0
*Remember, we cannot use all “1”s, that is the broadcast address (206.15.143.63)

The next network will start at 206.15.143.64

The first IP address on this subnet network will receive:
206.15.143.65
0
1
0
0
0
0
0
1
And the last address in the Network will receive: 206.15.143.94
0
1
0
1
1
1
1
0
*Remember, the broadcast address (206.15.143.95)
Network Layer
46
Can you figure out the rest?
•
•
•
•
•
•
•
Network:
206.15.143.32
206.15.143.64
206.15.143.96
206.15.143.128
206.15.143.160
206.15.143.192
Host Range
206.15.143.33 to 206.15.143.62
206.15.143.65 to 206.15.143.94
206.15.143.97 to 206.15.143.126
206.15.143.129 to 206.15.143.158
206.15.143.161 to 206.15.143.190
206.15.143.193 to 206.15.143.222
Network Layer
47
How the computer finds the Network Address
200.15.143.89
An address on the subnet
225.225.225.224 The new subnet mask
 When the computer does the Logical Bitwise AND Operation it
will come up with the following Network Address (or Subnet
Address):
11001000.00001111.10001111.01011001= 200.15.143.89
11111111.11111111.11111111.11100000 = 255.255.255.224
11001000.00001111.10001111.01000000 = 200.15.143.64 (Network)
This address falls on our 2nd Subnet (Network)
Network Layer
48
Review




We have one class C license (206.15.143.0).
We need to subnet that into 12 possible networks.
Each network needs a maximum of 10 nodes.
How many bits do we need to take?
24-2=14
4 bits need to be taken from the node portion and given to the network portion.

Will that leave enough bits for the node portion? We need a maximum of 10 on each network…
24-2=14
If we take 4 away, that leaves us with 4. That is enough for our individual networks of 10 nodes
each.

 Our new subnet mask will look like this:
11111111.11111111.11111111.11110000
255.255.255.240
128+64+32+16= 240
 Our subnet, or network addresses will be:
206.15.143.16
206.15.143.64
206.15.143.112
206.15.143.160
206.15.143.208
206.15.143.32
206.15.143.80
206.15.143.128
206.15.143.176
206.15.143.224
Network Layer
206.15.143.48
206.15.143.96
206.15.143.144
206.15.143.192
49
Review
Class B Subnet Chart
Class C Subnet Chart
#bits Subn
et M
as
k
CI
DR #Subn
ets #Ho
sts
#b
its Subne
t M
ask
2
255.255.192.0
/18
2
16,382
2
255.255.255.192
/26
2
62
3
255.255.224.0
/19
6
8,190
3
255.255.255.224
/27
6
30
4
255.255.240.0
/20
14
4094
5
6
7
8
9
10
11
12
13
14
255.255.248.0
255.255.252.0
255.255.254.0
255.255.255.0
255.255.255.128
255.255.255.192
255.255.255.224
255.255.255.240
255.255.255.248
255.255.255.252
/21
/22
/23
/24
/25
/26
/27
/28
/29
/30
30
62
126
254
510
1,022
2,046
4,094
8,190
16,382
2,046
1,022
510
254
126
62
30
14
6
2
4
255.255.255.240
/28
14
14
5
255.255.255.248
/29
30
6
6
255.255.255.252
/30
62
2
Network Layer
C
ID
R #subne
ts #ho
sts
50
ICMP: Internet Control Message Protocol
•
•
•
Used by hosts & routers to
communicate network level
information
– error reporting: unreachable host,
network, port, protocol
– echo request/reply (used by ping)
Network-layer “above” IP:
– ICMP msgs carried in IP datagrams
ICMP message: type, code plus
first 8 bytes of IP datagram causing
error
Type
0
3
3
3
3
3
3
4
Code
0
0
1
2
3
6
7
0
8
9
10
11
12 0
0
0
0
0
Network Layer
description
echo reply (ping)
dest. network unreachable
dest host unreachable
dest protocol unreachable
dest port unreachable
dest network unknown
dest host unknown
source quench (congestion
control - not used)
echo request (ping)
route advertisement
router discovery
TTL expired
bad IP header
51
Trace route and ICMP
• Source sends series of UDP
segments to dest
– First has TTL =1
– Second has TTL=2, etc.
– Unlikely port number
• When nth datagram arrives to nth
router:
– Router discards datagram
– And sends to source an ICMP
message (type 11, code 0)
– Message includes name of
router& IP address
• When ICMP message arrives,
source calculates RTT
• Traceroute does this 3 times
Stopping criterion
• UDP segment eventually arrives
at destination host
• Destination returns ICMP “host
unreachable” packet (type 3,
code 3)
• When source gets this ICMP,
stops.
Network Layer
52
IPv6


Initial motivation: 32-bit address space soon to be completely allocated.
Additional motivation:
– header format helps speed processing/forwarding
– header changes to facilitate QoS
IPv6 datagram format:
– fixed-length 40 byte header
– no fragmentation allowed
IPv6 Header
Priority:
identify priority among datagrams in flow
Flow Label: identify datagrams in same “flow.” (concept of “flow” not well defined).
Next header: identify upper layer protocol for data
Network Layer
53
Other Changes from IPv4
•
•
•
Checksum: removed entirely to reduce processing time at each hop
Options: allowed, but outside of header, indicated by “Next Header” field
ICMPv6: new version of ICMP
– additional message types, e.g. “Packet Too Big”
– multicast group management functions
Transition From IPv4 To IPv6
•
•
Not all routers can be upgraded simultaneous
– no “flag days”
– How will the network operate with mixed IPv4 and IPv6 routers?
Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers
Network Layer
54
Tunneling
Logical view:
Physical view:
A
B
IPv6
IPv6
A
B
C
IPv6
IPv6
IPv4
Flow: X
Src: A
Dest: F
data
A-to-B:
IPv6
E
F
IPv6
IPv6
D
E
F
IPv4
IPv6
IPv6
tunnel
Src:B
Dest: E
Src:B
Dest: E
Flow: X
Src: A
Dest: F
Flow: X
Src: A
Dest: F
data
data
B-to-C:
IPv6 inside
IPv4
Network Layer
B-to-C:
IPv6 inside
IPv4
Flow: X
Src: A
Dest: F
data
E-to-F:
IPv6
55
Graph Abstraction
5
2
u
2
1
Graph: G = (N,E)
v
x
3
w
3
1
5
z
1
y
2
N = set of routers = { u, v, w, x, y, z }
E = set of links ={ (u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }
Remark: Graph abstraction is useful in other network contexts
Example: P2P, where N is set of peers and E is set of TCP connections
Network Layer
56
Graph abstraction: costs
5
2
u
v
2
1
x
• c(x,x’) = cost of link (x,x’)
3
w
3
1
5
z
1
y
- e.g., c(w,z) = 5
2
• cost could always be 1, or
inversely related to bandwidth,
or inversely related to
congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
Network Layer
57
Routing Algorithm Classification
Global or decentralized
information?
Global:
• all routers have complete
topology, link cost info
• “link state” algorithms
Decentralized:
• router knows physicallyconnected neighbors, link costs
to neighbors
• iterative process of
computation, exchange of info
with neighbors
• “distance vector” algorithms
Static or dynamic?
Static:
• routes change slowly over
time
Dynamic:
• routes change more
quickly
– periodic update
– in response to link cost
changes
Network Layer
58
A Link-State Routing Algorithm
Dijkstra’s algorithm
• net topology, link costs known to
all nodes
– accomplished via “link state
broadcast”
– all nodes have same info
• computes least cost paths from
one node (‘source”) to all other
nodes
– gives forwarding table for
that node
• iterative: after k iterations, know
least cost path to k dest.’s
Notation:
• c(x,y): link cost from node x to
y; = ∞ if not direct neighbors
• D(v): current value of cost of
path from source to dest. v
• p(v): predecessor node along
path from source to v
• N': set of nodes whose least cost
Network Layer
path definitively known
59
Dijsktra’s Algorithm: Example
D(v): current value of cost of
path from source to dest. v
p(v): predecessor node along
path from source to v
N': set of nodes whose least
cost path definitively known
Step
0
1
2
3
4
5
c(x,y): link cost from node x to y;
= ∞ if not direct neighbors
N'
u
ux
uxy
uxyv
uxyvw
uxyvwz
D(v),p(v) D(w),p(w)
2,u
5,u
2,u
4,x
2,u
3,y
3,y
2
u
2
1
x
3
w
3
1
5
z
1
y
D(y),p(y)
∞
2,x
D(z),p(z)
∞
∞
4,y
4,y
4,y
1 Initialization:
2 N' = {u}
3 for all nodes v
4
if v adjacent to u
5
then D(v) = c(u,v)
6
else D(v) = ∞
7
8 Loop
9 find w not in N' such that D(w) is a minimum
10 add w to N'
11 update D(v) for all v adjacent to w and not in N' :
12
D(v) = min( D(v), D(w) + c(w,v) )
13 /* new cost to v is either old cost to v or known
14 shortest path cost to w plus cost from w to v */
15 until all nodes in N'
5
v
D(x),p(x)
1,u
2
Network Layer
60
Dijsktra’s Algorithm, Discussion
Algorithm complexity: n nodes
• each iteration: need to check all nodes, w, not in N'
• n(n+1)/2 comparisons: O(n2)
• more efficient implementations possible: O(nlogn)
Oscillations possible:
• e.g., link cost = amount of carried traffic
D
1
1
0
A
0 0
C
e
1+e
e
initially
B
1
2+e
A
0
0
D 1+e 1 B
0
0
C
… recompute
routing
D
1
A
0 0
C
2+e
B
1+e
… recompute
Network Layer
2+e
A
0
D 1+e 1 B
e
0
C
… recompute
61
Distance Vector Algorithm (1)
Bellman-Ford (B-F) Equation (dynamic programming)
Define dx(y) := cost of least-cost path from x to y
Then
dx(y) = min {c(x,v) + dv(y) }
where min is taken over all neighbors of x
Bellman-Ford example (2)
Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
B-F equation says:
5
2
u
v
2
1
x
3
w
3
1
5
z
1
y
2
du(z) = min { c(u,v) + dv(z),
c(u,x) + dx(z),
c(u,w) + dw(z) }
= min {2 + 5,
1 + 3,
5 + 3} = 4
Node that achieves minimum is next
hopLayer
in shortest path ➜ forwarding table
Network
62
lnternet Routing
• Can use any of the standard routing algorithms:
– link-state
• OSPF (Open Shortest Path First)
– distance vector
• RIP (Routing Information Protocol) [RFC 1058] [RFC 1723]
• BGP (Border Gateway Protocol) (inter-AS routing)
Intra-AS Routing
• Also known as Interior Gateway Protocols (IGP)
• Most common Intra-AS routing protocols:
– RIP: Routing Information Protocol
– OSPF: Open Shortest Path First
– IGRP: Interior Gateway Routing Protocol (Cisco proprietary)
Network Layer
63
RIP (Routing Information Protocol)
•
•
•
•
•
Distance vector algorithm
Included in BSD-UNIX Distribution in 1982
Distance metric: # of hops (max = 15 hops)
Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)
Each advertisement: list of up to 25 destination nets within AS
RIP: Example
Dest
w
x
z
….
w
Next hops
C
4
…
...
A
Destination Network
w
y
z
x
….
Advertisement
from A to D
x
D
C
Next Router
A
B
B A
-….
B
y
z
Num. of hops to dest.
2
2
7 5
1
....
Network
Layer
Routing
table
in D
64
RIP: Link Failure and Recovery
• If no advertisement heard after 180 sec --> neighbor/link declared dead
–
–
–
–
–
routes via neighbor invalidated
new advertisements sent to neighbors
neighbors in turn send out new advertisements (if tables changed)
link failure info quickly propagates to entire net
poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)
RIP: Table Processing
• RIP routing tables managed by application-level process called route-d (daemon)
• advertisements sent in UDP packets, periodically repeated
routed
Transprt
(UDP)
network
(IP)
link
physical
routed
forwarding
table
forwarding
table
Network Layer
Transprt
(UDP)
network
(IP)
link
physical
65
RIP: Table Example (Continued..)
Router: giroflee.eurocom.fr
Destination
-------------------127.0.0.1
192.168.2.
193.55.114.
192.168.3.
224.0.0.0
default





Gateway
Flags Ref
Use
Interface
-------------------- ----- ----- ------ --------127.0.0.1
UH
0 26492 lo0
192.168.2.5
U
2
13 fa0
193.55.114.6
U
3 58503 le0
192.168.3.5
U
2
25 qaa0
193.55.114.6
U
3
0 le0
193.55.114.129
UG
0 143454
Three attached networks (LANs)
Router only knows routes to attached LANs
Default router used to “go up”
Route multicast address: 224.0.0.0
Loopback interface (for debugging)
Network Layer
66
OSPF (Open Shortest Path First)
• “open”: publicly available
• Uses Link State algorithm
– LS packet dissemination
– Topology map at each node
– Route computation using Dijkstra’s algorithm
• OSPF advertisement carries one entry per neighbor router
• Advertisements disseminated to entire AS (via flooding)
– Carried in OSPF messages directly over IP (rather than TCP or UDP
Inter-AS routing in the Internet: BGP
R4
R5
R3
AS1
BGP
AS2
(RIP intra-AS
routing)
BGP
R1
R2
(OSPF
intra-AS
routing)
AS3
(OSPF intra-AS
routing)
Figure 4.5.2-new2: BGP use for inter-domain routing
Network Layer
67
Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• Path Vector protocol:
–
–
–
–
similar to Distance Vector protocol
each Border Gateway broadcast to neighbors (peers) entire path (i.e., sequence of AS’s) to destination
BGP routes to networks (ASs), not individual hosts
E.g., Gateway X may send its path to dest. Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z
• Suppose: gateway X send its path to peer gateway W
• W may or may not select path offered by X
– cost, policy (don’t route via competitors AS), loop prevention reasons.
• If W selects path advertised by X, then:
• Path (W,Z) = w, Path (X,Z)
• Note: X can control incoming traffic by controlling it route advertisements
to peers:
– e.g., don’t want to route traffic to Z -> don’t advertise any routes to Z
Network Layer
68
BGP: controlling who routes to you
legend:
B
W
provider
network
X
A
customer
network:
C
Y
Figure 4.5-BGPnew: a simple BGP scenario



A,B,C are provider networks
X,W,Y are customer (of provider networks)
X is dual-homed: attached to two networks





X does not want to route from B via X to C
.. so X will not advertise to B a route to C
A advertises to B the path AW
B advertises to X the path BAW
Should B advertise to C the path BAW?



No way! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers
B wants to force C to route to w via A
B wants to route only to/from its customers!
Network Layer
69
BGP Operation
Q: What does a BGP router do?
• Receiving and filtering route advertisements from
directly attached neighbor(s).
• Route selection.
– To route to destination X, which path (of several
advertised) will be taken?
• Sending route advertisements to neighbors.
Network Layer
70
BGP Messages
• BGP messages exchanged using TCP.
• BGP messages:
– OPEN: opens TCP connection to peer and authenticates
sender
– UPDATE: advertises new path (or withdraws old)
– KEEPALIVE keeps connection alive in absence of UPDATES;
also ACKs OPEN request
– NOTIFICATION: reports errors in previous msg; also used to
close connection
Network Layer
71
Why different Intra- and Inter-AS routing ?
• Policy:
– Inter-AS: admin wants control over how its traffic routed, who routes
through its net.
– Intra-AS: single admin, so no policy decisions needed
• Scale:
– hierarchical routing saves table size, reduced update traffic
• Performance:
– Intra-AS: can focus on performance
– Inter-AS: policy may dominate over performance
Network Layer
72
Path vector messages
Network
Next Router
N01
R01
AS14, AS23, AS67
N02
R05
AS22, AS67, AS05, AS89
N03
R06
AS67, AS89, AS09, AS34
N04
R12
AS62, AS02, AS09
Network Layer
Path
73
Broadcast and Multicast Routing Algorithms
In broadcast routing, the network layer provides a service of delivering a packet sent from a source
node to all other nodes in the network;
multicast routing enables a single source node to send a copy of a packet to a subset of the other
network nodes.
Broadcast Routing Algorithms




The most straightforward way to accomplish broadcast communication is for the sending node to send a separate copy
of the packet to each destination, as shown in Figure 4.43(a). Given N destination nodes, the source node simply makes
N copies of the packet, addresses each copy to a different destination, and then transmits the N copies to the N
destinations using unicast routing. This N-wayunicast approach to broadcasting is simple—no new network-layer
routing protocol, packet-duplication, or forwarding functionality is needed.
Among the several drawbacks to this approach. The first drawback is its inefficiency. If the source node is connected to the
rest of the network via a single link, then N separate copies of the (same) packet will traverse this single link. It would clearly
be more efficient to send only a single copy of a packet over this first hop and then have the node at the other end of the
first hop make and forward any additional needed copies. That is, it would be more efficient for the network nodes
themselves (rather than just the source node) to create duplicate copies of a packet. For example, in Figure 4.43(b), only a
single copy of a packet traverses the R1-R2 link. That packet is then duplicated at R2, with a single copy being sent over
links R2-R3 and R2-R4.
The additional drawbacks of N-way-unicast is that broadcast recipients, and their addresses, are known to the sender.
A final drawback of N-way-unicast relates to the purposes for which broadcast is to be used.
Duplicate creation/transmission
duplicate
R1
R1
R2
R2
R3
R4
(a)
duplicate
R3
R4
(b)
Figure 4.43 Source-duplication versus in-network duplication. (a) source duplication, (b) in-network duplication
Network Layer
75
Uncontrolled & Controlled Flooding
 Uncontrolled Flooding: The most obvious technique for achieving broadcast is a flooding
approach in which the source node sends a copy of the packet to all of its neighbors. When a
node receives a broadcast packet, it duplicates the packet and forwards it to all of its neighbors.
For example, in Figure 4.43, R2 will flood to R3, R3 will flood to R4, R4 will flood to R2, and R2
will flood (again!) to R3, and so on. This results in the endless cycling of two broadcast packets,
one clockwise, and one counterclockwise. But there can be an even more calamitous fatal flaw:
When a node is connected to more than two other nodes, it will create and forward multiple
copies of the broadcast packet, each of which will create multiple copies of itself (at other nodes
with more than two neighbors), and so on. This broadcast storm, resulting from the endless
multiplication of broadcast packets.
 Controlled Flooding: The key to avoiding a broadcast storm is for a node to judiciously choose
when to flood a packet and when not to flood a packet. In practice, this can be done in one of
several ways.


In sequence-number-controlled flooding, a source node puts its address (or other
unique identifier) as well as a broadcast sequence number into a broadcast packet, then
sends the packet to all of its neighbors. Each node maintains a list of the source address
and sequence number of each broadcast packet it has already received, duplicated, and
forwarded. When a node receives a broadcast packet, it first checks whether the packet is
in this list. If so, the packet is dropped; if not, the packet is duplicated and forwarded to all
the node’s neighbors (except the node from which the packet has just been received).
A second approach to controlled flooding is known as reverse path forwarding (RPF)
[Dalal 1978], also sometimes referred to as reverse path broadcast (RPB). The idea behind
RPF is simple, yet elegant.
Network Layer
76
Controlled Flooding
 In RPF when a router receives a broadcast packet with a given source address, it transmits
the packet on all of its outgoing links (except the one on which it was received) only if the
packet arrived on the link that is on its own shortest unicast path back to the source.
Otherwise, the router simply discards the incoming packet without forwarding it on any of
its outgoing links.
 Note that RPF does not use unicast routing to actually deliver a packet to a destination, nor does it
require that a router know the complete shortest path from itself to the source. RPF need only
know the next neighbor on its unicast shortest path to the sender; it uses this neighbor’s identity
only to determine whether or not to flood a received broadcast packet.
Figure 4.44: Reverse path forwarding (RPF)
Network Layer
77
Transport Layer
Transport Services and Protocols
• provide logical communication
between app processes running
on different hosts
• transport protocols run in end
systems
– send side: breaks app
messages into segments,
passes to network layer
– rcv side: reassembles
segments into messages,
passes to app layer
• more than one transport protocol
available to apps
– Internet: TCP and UDP
application
transport
network
data link
physical
Network Layer
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
79
Transport vs. Network Layer
• Network Layer: logical communication between hosts
– PDU: Datagram
– Datagram’s may be lost, duplicated, reordered in the Internet – “best effort”
service
• Transport Layer: logical communication between processes
– relies on, enhances, network layer services
– PDU: Segment
– extends “host-to-host” communication to “process-to-process”
communication
Network Layer
80
TCP/IP Transport Layer Protocols
• reliable, in-order delivery (TCP)
– congestion control
– flow control
– connection setup
• unreliable, unordered delivery: UDP
– no-frills extension of “best-effort” IP
– What does UDP provide in addition to IP?
• services not provided by IP (network layer):
– delay guarantees
– bandwidth guarantees
Network Layer
81
Multiplexing/Demultiplexing
HTTP
Transport
Layer
FTP
Telnet
Network
Layer
Transport
Layer
Network
Layer
• Use same communication channel between hosts for several
logical communication processes
• How does Mux/DeMux work?
– Sockets:
doors between process & host
– UDP socket: (dest. IP, dest. Port)
– TCP socket: (src. IP, src. port, dest. IP, dest. Port)
Network Layer
82
Connectionless demux
• UDP socket identified by two-tuple:
– (dest IP address, dest port number)
• When host receives UDP segment:
– checks destination port number in segment
– directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port
numbers directed to same socket
Connection-oriented demux
• TCP socket identified by 4-tuple:
–
–
–
–
source IP address
source port number
dest IP address
dest port number
• Server host may support many
simultaneous TCP sockets:
– each socket identified by its own
4-tuple
• recv host uses all four values to
direct segment to appropriate
socket
• Web servers have different
sockets for each connecting
client
Network Layer
– non-persistent HTTP will have
different socket for each request
83
User Datagram Protocol (UDP) [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
– lost
– delivered out of order to app
• Why use UDP?
– No connection establishment cost (critical for some applications,
e.g., DNS)
– No connection state
– Small segment headers (only 8 bytes)
– Finer application control over data transmission
Network Layer
84
UDP Segment Structure
• often used for streaming
multimedia apps
– loss tolerant
Length, in
bytes of UDP
– rate sensitive
• other UDP uses
– DNS
– SNMP
• reliable transfer over UDP: add
reliability at application layer
– application-specific error
recovery!
segment,
including
header
32 bits
source port #
dest port #
length
checksum
Application
data
(message)
UDP segment format
Network Layer
85
TCP Segment Structure
32 bits
URG: urgent data
(generally not used)
ACK: ACK #
valid
PSH: push data now
(generally not used)
RST, SYN, FIN:
connection estab
(setup, teardown
commands)
Internet
checksum
(as in UDP)
source port #
dest port #
sequence number
acknowledgement number
head not
len used
U A P R S F Receive window
checksum
Urg data pnter
Options (variable length)
counting
by bytes
of data
(not segments!)
# bytes
rcvr willing
to accept
application
data
(variable length)
Network Layer
86
TCP Sequence Number and Acknowledgement
•
•
•
•
TCP views data as unstructured, but ordered stream of bytes.
Sequence numbers are over bytes, not segments
Initial sequence number is chosen randomly
TCP is full duplex – numbering of data is independent in each
direction
• Acknowledgement number – sequence number of the next byte
expected from the sender
• ACKs are cumulative
Network Layer
87
TCP seq. #’s and ACKs
Seq. #’s:
– byte stream
“number” of first byte
in segment’s data
ACKs:
– seq # of next byte
expected from other
side
– cumulative ACK
Q: how receiver handles outof-order segments
– A: TCP spec doesn’t
say, - up to
implementor
Host A
Host B
1000 byte
data
host ACKs
receipt of
data
Host sends
another
500 bytes
time
Network Layer
88
TCP Reliable Data Transfer
• TCP creates rdt service on top of
IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission
timer
• Retransmissions are triggered by:
– timeout events
– duplicate acks
• Initially consider simplified TCP
sender:
Network Layer
– ignore duplicate acks
– ignore flow control, congestion
control
89
TCP Sender Events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number
of first data byte in segment
• start timer if not already
running (think of timer as for
oldest unacked segment)
• expiration interval:
TimeOutInterval
timeout:
• retransmit segment that
caused timeout
• restart timer
Ack rcvd:
• If acknowledges previously
unacked segments
Network Layer
– update what is known to be acked
– start timer if there are
outstanding segments
90
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
switch(event)
TCP
sender
event: data received from application above
(simplified)
create TCP segment with sequence number NextSeqNum
if (timer currently not running)
start timer
Comment:
pass segment to IP
• SendBase-1: last
NextSeqNum = NextSeqNum + length(data)
cumulatively
ack’ed byte
event: timer timeout
Example:
retransmit not-yet-acknowledged segment with
• SendBase-1 = 71;
smallest sequence number
y= 73, so the rcvr
start timer
wants 73+ ;
y > SendBase, so
event: ACK received, with ACK field value of y
that new data is
if (y > SendBase) {
acked
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
Network Layer
91
}
TCP Flow Control

receive side of TCP connection has
a receive buffer:
TCP Flow Control: How it works
(Suppose TCP receiver discards out-oforder segments)
 spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd LastByteRead]

app process may be slow at reading
from buffer
flow control

sender won’t overflow receiver’s
buffer by transmitting too much,
too fast


Rcvr advertises spare room by
including value of RcvWindow in
segments
Sender limits unACKed data to
RcvWindow
– guarantees receive buffer doesn’t
overflow
speed-matching service: matching
the send rate to the receiving app’s
drain rate
Network Layer
92
Silly Window Syndrome
 Recall: TCP uses sliding window
 “Silly Window” occurs when small-sized segments are
transmitted, resulting in inefficient use of the network pipe
 For e.g., suppose that TCP sender generates data slowly, 1-byte
at a time
 Solution: wait until sender has enough data to transmit –
“Nagle’s Algorithm”
Nagle’s Algorithm
1. TCP sender sends the first piece of data obtained from the application (even if
data is only a few bytes).
2. Wait until enough bytes have accumulated in the TCP send buffer or until an ACK
is received.
3. Repeat step 2 for the remainder of the transmission.
Network Layer
93
Silly Window Syndrome
• Suppose that the receiver consumes data slowly
– Receive Window opens slowly, and thus sender is forced
to send small-sized segments
• Solutions
– Delayed ACK
– Advertise Receive Window = 0, until reasonable amount
of space available in receiver’s buffer
Network Layer
94
TCP Connection Management
Recall: TCP sender, receiver
establish “connection” before
exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info
(e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new
Socket("hostname","port
number");
• server: contacted by client
Socket connectionSocket =
welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN
segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN,
replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies
with ACK segment, which may
contain data
Network Layer
95
TCP Connection Establishment
client
CLOSED
Active open;
SYN
Passive open
SYN/SYN+ACK
server
LISTEN
SYN_SENT
SYN_RCVD
SYN+ACK/ACK
ACK
Established
Solid line for client
Dashed line for server
Network Layer
96
Principles of Congestion Control
• Congestion: informally: “too many sources sending too much
data too fast for network to handle”
• Different from flow control!
• Manifestations:
– Packet loss (buffer overflow at routers)
– Increased end-to-end delays (queuing in router buffers)
• Results in unfairness and poor utilization of network resources
– Resources used by dropped packets (before they were lost)
– Retransmissions
– Poor resource allocation at high load
Network Layer
97
Historical Perspective
• October 1986, Internet had its first congestion
collapse
• Link LBL to UC Berkeley
– 400 yards, 3 hops, 32 Kbps
– throughput dropped to 40 bps
– factor of ~1000 drop!
• Van Jacobson proposes TCP Congestion Control:
– Achieve high utilization
– Avoid congestion
– Share bandwidth
Network Layer
98
Congestion Control: Approaches
• Goal: Throttle senders as needed to ensure load
on the network is “reasonable”
• End-end congestion control:
– no explicit feedback from network
– congestion inferred from end-system observed loss, delay
– approach taken by TCP
• Network-assisted congestion control:
– routers provide feedback to end systems
– single bit indicating congestion (e.g., ECN)
– explicit rate sender should send at
Network Layer
99
TCP Congestion Control: Overview
• end-end control (no network assistance)
• Limit the number of packets in the network to
window W
• Roughly,
rate =
W
RTT
Bytes/sec
• W is dynamic, function of perceived network
congestion
Network Layer
100
What is the maximum value of W ?
•
•
•
•
•
When a connection is set up, the congestion window, a value maintained independently at
each host, is set to a small multiple of the maximum segment size (MSS) allowed on that
connection. Further variance in the congestion window is dictated by an Additive
Increase/Multiplicative Decrease approach.
The maximum segment size (MSS) is a parameter of the TCP protocol that specifies the
largest amount of data, specified in octets (8 bits = 1 byte), that a computer or
communications device can receive in a single TCP segment. It does not count the TCP
header or the IP header. The IP datagram containing a TCP segment may be self-contained
within a single packet, or it may be reconstructed from several fragmented pieces; either
way, the MSS limit applies to the total amount of data contained in the final, reconstructed
TCP segment.
The default TCP Maximum Segment Size is 536 bytes. To avoid fragmentation in the IP layer, a
host must specify the maximum segment size as equal to the largest IP datagram that the
host can handle minus the IP header size and TCP header sizes. Therefore IPv4 hosts are
required to be able to handle an MSS of 536 octets (= 576 - 20 - 20) and IPv6 hosts are
required to be able to handle an MSS of 1220 octets (= 1280 - 40 - 20).
Low MSS values will reduce or eliminate IP fragmentation, but will result in higher overhead.
For most computer users, the MSS option is established by the operating system.
Network Layer
101
Slow Start
• “Slow Start” is used to reach
the equilibrium state
• Initially: W = 1 (slow start)
• On each successful ACK:
WW+1
• Exponential growth of W
each RTT: W  2 x W
• Enter Congestion Avoidance
(CA) when
W >= ssthresh
• ssthresh: window size after
which TCP carefully analyses
for bandwidth
Network Layer
receiver
sender
cwnd
1
2
data
segment
ACK
3
4
5
6
7
8
102
Congestion Avoidance (CA)
sender
• Starts when
W  ssthresh
• On each successful ACK:
W  W+ 1/W
• Linear growth of W each RTT:
WW+1
1
receiver
data segment
ACK
2
3
4
CA: Additive Increase, Multiplicative Decrease
• We have “additive increase”
in the absence of loss events
• After loss event, decrease
congestion window by half –
“multiplicative decrease”
– ssthresh = W/2
– Enter Slow Start
Multiplicative decrease
Network Layer
103
Detecting Packet Loss
• Assumption: loss
indicates congestion
• Option 1: time-out
10
11
12
X
13
– Waiting for a time-out can
be long!
14
15
16
17
10
11
11
• Option 2: duplicate ACKs
– How many? At least 3.
11
11
Sender
Network Layer
Receiver
104
Fast Retransmit
• Wait for a timeout is quite long
• Immediately retransmits after 3 duplicate ACKs
without waiting for timeout
• Adjusts ssthresh
ssthresh  W/2
• Enter Slow Start
W=1
Network Layer
105
Examples
Example: Suppose that a TCP congestion window is set to 128 KB
and a timeout occurs. How big will the window be if next four
transmission bursts are all successful? Assume the maximum
segment size is 1 KB.
Solution: The next transmission will be 1 maximum segment
size. Then 2, 4, & 8. So after four successes, it will be 8 KB.
Network Layer
106
How to Set TCP Timeout Value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
How to Estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
– average several recent measurements, not just current SampleRTT
Network Layer
107
Peer-to-peer (P2P) Networks
•
P2P Networks: Internet users that are ready to share their resources become peers and
form a network. When a peer in the network has a file (for example, an audio or video file)
to share, it makes it available to the rest of the peers. The P2P networks is divided into
two categories: centralized and decentralized.
– Centralized Networks: The directory system ⎯ listing of the peers and what they offer
⎯ uses the client-server paradigm, but the storing and downloading of the files are
done using the peer-to-peer paradigm. For this reason, a centralized P2P network is
sometimes referred to as a hybrid P2P network.
– Decentralized Networks: A decentralized P2P network does not depend on a
centralized directory system. In this model, peers arrange themselves into an overlay
network, which is a logical network made on top of the physical network.
Depending on how the nodes in the overlay network are linked, a decentralized P2P
network is classified as either unstructured or structured.
 Unstructured Networks: In an unstructured P2P network, the nodes are linked randomly. A search in
an unstructured P2P is not very efficient because a query to find a file must be flooded through the
network, which produces significant traffic and still the query may not be resolved. Two examples of
this type of network are Gnutella and Freenet.
 Structured Networks: A structured network uses a predefined set of rules to link nodes so that a
query can be effectively and efficiently resolved. The most common technique used for this purpose
is the Distributed Hash Table (DHT). DHT is used in many applications including Distributed Data
Structure (DDS), Content Distributed Systems (CDS), Domain Name System (DNS), and P2P file
sharing. One popular P2P file sharing protocol that uses the DHT is BitTorrent. We discuss DHT
independently in the next section as a technique that can be used both in structured P2P networks
and in other systems.
Network Layer
108
Note: For the details please have a look on chapter # 30 of “Data Communications
and Networking” (5E) by Forouzan
Quality of Service (QOS)
Flow Characteristics
&
Flow Classes
Flow characteristics
A characteristic that a flow needs
in order to deliver the packets
safe and sound to the
destination. Lack of reliability
means losing a packet or
acknowledgment, which entails
retransmission.
Source-to-destination delay is another
flow characteristic. Applications can
tolerate delay in different degrees.
The variation in delay for packets
belonging to the same flow. For example, if
four packets depart at times 0, 1, 2, 3 and
arrive at 20, 21, 22, 23, all have the same
delay, 20 units of time. On the other hand,
if the above four packets arrive at 21, 23,
24, and 28, they will have different delays.
Different applications need different bandwidths. In
video conferencing we need to send millions of bits per
second to refresh a color screen while the total number
of bits in an e-mail may not reach even a million..
Techniques to Improve QoS
 Scheduling Techniques
 Traffic Shaping Techniques
 Resource Reservation
 Admission Control
110
FIFO queue (Scheduling Techniques)
Priority queuing (Scheduling Techniques)
Weighted fair queuing (Scheduling Techniques)
Network Layer
111
Leaky bucket (Traffic Shaping Techniques)
12 Mbps for 2 seconds, for a total of
24 Mb of data. The host is silent for 5
seconds and then sends data at a rate
of 2 Mbps for 3 seconds, for a total of
6 Mb of data. In all, the host has sent
30 Mb of data in 10 seconds.
Host has sent 30 Mb of data in 10
seconds.
Leaky bucket implementation (Traffic Shaping Techniques)
The algorithm for variable-length packets:
1. Initialize a counter to n at the tick of the clock.
2. If n is greater than the size of the packet, send
the packet and decrement the counter by the
packet size. Repeat this step until the counter
value is smaller than the packet size.
3. Reset the counter to n and go to step 1.
A leaky bucket algorithm shapes bursty traffic into
fixed-rate traffic by averaging the data rate. It may
drop the packets
if the bucket is full.
Network Layer
112
Token bucket (Traffic Shaping Techniques)
 The leaky bucket is very restrictive. It does not credit an idle host. For example, if a host is
not sending for a while, its bucket becomes empty. Now if the host has bursty data, the leaky
bucket allows only an average rate. The time when the host was idle is not taken into
account.
 On the other hand, the token bucket algorithm allows idle hosts to accumulate credit for the
future in the form of tokens.
The token bucket allows bursty traffic at a regulated maximum rate.
Combining Token Bucket and Leaky Bucket: The two techniques can be combined to credit an idle
host and at the same time regulate the traffic. The leaky bucket is applied after the token bucket;
the rate of the leaky bucket needs to be higher than the rate of tokens dropped in the bucket.
113
Resource Reservation
 A flow of data needs resources such as a buffer, bandwidth, CPU time, and so on. The QoS is
improved if these resources are reserved beforehand. Below, we discuss a QoS model called
Integrated Services (INTSERV), which depends heavily on resource reservation to improve the
quality of service.
Admission Control
 Admission control refers to the mechanism used by a router or a switch to accept or reject a
flow based on predefined parameters called flow specifications. Before a router accepts a flow
for processing, it checks the flow specifications to see if its capacity can handle the new flow. It
takes into account bandwidth, buffer size, CPU speed, etc., as well as its previous commitments
to other flows. Admission control in ATM networks is known as Connection Admission Control
(CAC), which is a major part of the strategy for controlling congestion.
114
Integrated Services (INTSERV)
Integrated Services is a flow-based QoS model designed for IP.
In this model packets are marked by routers according to flow characteristics.
 What is important are the resources the application needs, not what the application is doing.
The model is based on three schemes:
1. The packets are first classified according to the service they require.
2. The model uses scheduling to forward the packets according to their flow characteristics.
3. Devices like routers use admission control to determine if the device has the capability
(available resources to handle the flow) before making a commitment. For example, if an
application requires a very high data rate, but a router in the path cannot provide such a data
rate, it denies the admission.
 Flow Specification: is made of two parts:
1. Rspec (resource specification). Defines resource that flow needs to reserve (buffer, bandwidth, etc.)
2. Tspec (traffic specification). Defines the traffic characterization of the flow.
 Admission: After a router receives the flow specification from an application, it decides to admit or
deny the service. The decision is based on the previous commitments of the router and the current
availability of the resource.
 Service Classes: Two classes of services have been defined for Integrated Services:
1.
2.
Guaranteed service: Designed for real-time traffic that needs a guaranteed minimum end-to-end delay.
Controlled-load service: Designed for applications that can accept some delays but are sensitive to an
overloaded network and to the danger of losing packets. Good examples of these types of applications are
file transfer, e-mail, and Internet access.
115
 Resource Reservation Protocol (RSVP): The Integrated Services model needs a
connection-oriented network layer. Since IP is a connectionless protocol, a new protocol
is designed to run on top of IP to make it connection-oriented.
 RSVP Messages: RSVP has several types of messages. However, we discuss only
two of them: Path and Resv.
Path messages
Path message travels from the sender and reaches all receivers in the multicast path. On the way, a
Path message stores the necessary information for the receivers. A Path message is sent in a
multicast environment; a new message is created when the path diverges.
Resv messages
After a receiver has received a Path message, it sends a Resv message. The Resv message travels
toward the sender (upstream) and makes a resource reservation on the routers that support RSVP.
116
Reservation merging
Reservation styles
Network Layer
117
Differentiated Services
An Alternative to Integrated Services
Differentiated Services is a class-based QoS model designed for IP.
DS field
Traffic conditioner
118