Network Layer - Spartans Fall-14
Computer Networks Research Center (CNRC), Computer Science Department, CIIT, Lahore. http://research.ciitlahore.edu.pk/Groups/CNRC/

Network Layer

Agenda
• Network Layer
• Internet Protocol
• IP Datagram
• NAT
• Subnetting

The Internet Network Layer
Host and router network-layer functions:
• Transport layer: TCP, UDP
• Network layer:
– IP protocol: addressing conventions, datagram format, packet-handling conventions
– Routing protocols: path selection (RIP, OSPF, BGP), which fill the forwarding table
– ICMP protocol: error reporting, router "signaling"
• Link layer, physical layer

IP Datagram Format
The header is laid out in 32-bit words and carries: the IP protocol version number; the header length (bytes); the "type" of service; the total datagram length (bytes); a 16-bit identifier, flags and fragment offset for fragmentation/reassembly; the time to live (max number of remaining hops, decremented at each router); the upper-layer protocol to deliver the payload to; the Internet checksum; the 32-bit source and destination IP addresses; options, if any (e.g. timestamp, record route taken, specify a list of routers to visit); and the data itself (variable length, typically a TCP or UDP segment).

Note that an IP datagram has a total of 20 bytes of header (assuming no options). If the datagram carries a TCP segment, then each (non-fragmented) datagram carries a total of 40 bytes of header (20 bytes of IP header plus 20 bytes of TCP header) along with the application-layer message. How much overhead with TCP? 20 bytes of TCP + 20 bytes of IP = 40 bytes, plus application-layer overhead.

IP Fragmentation & Reassembly
• The maximum amount of data that a link-layer frame can carry is called the maximum transmission unit (MTU): the largest possible link-level frame.
– different link types, different MTUs
• large IP datagram divided ("fragmented") within the net
– one datagram becomes several datagrams
– "reassembled" only at the final destination
– IP header bits are used to identify and order related fragments
(Figure: fragmentation takes in one large datagram and puts out 3 smaller datagrams; reassembly happens at the destination.)
Some protocols can carry big datagrams, whereas other protocols can carry only little packets. For example, Ethernet frames can carry up to 1,500 bytes of data, whereas frames for some wide-area links can carry no more than 576 bytes.

IP Fragmentation and Reassembly
A datagram of 4,000 bytes (20 bytes of IP header plus 3,980 bytes of IP payload) arrives at a router and must be forwarded to a link with an MTU of 1,500 bytes. This implies that the 3,980 data bytes in the original datagram must be allocated to three separate fragments (each of which is also an IP datagram). Suppose that the original datagram is stamped with an identification number of 777.
Example: 4,000-byte datagram, MTU = 1,500 bytes, so 1,480 bytes fit in each data field and the second fragment's offset = 1480/8 = 185. One large datagram (length = 4000, ID = 777, fragflag = 0, offset = 0) becomes several smaller datagrams:
length = 1500, ID = 777, fragflag = 1, offset = 0
length = 1500, ID = 777, fragflag = 1, offset = 185
length = 1040, ID = 777, fragflag = 0, offset = 370

IP Addressing: Introduction
• IP address: 32-bit identifier for each host and router interface
• interface: connection between host/router and physical link
– routers typically have multiple interfaces
– a host may have multiple interfaces
– an IP address is associated with each interface
223.1.1.1 = 11011111 00000001 00000001 00000001

Subnets
• IP address: 223.1.1.1
– subnet part (high-order bits)
– host part (low-order bits)
• What's a subnet?
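The fragmentation arithmetic in the example above can be reproduced with a minimal Python sketch (the `fragment` helper and its dictionary field names are illustrative, not a real IP stack):

```python
def fragment(total_len, mtu, ident, header_len=20):
    """Split an IP datagram into fragments for a link with the given MTU.
    Fragment data sizes must be multiples of 8 bytes (except the last one),
    because the offset field counts 8-byte units."""
    payload = total_len - header_len
    max_data = (mtu - header_len) // 8 * 8    # 1480 for MTU 1500
    frags, offset = [], 0
    while payload > 0:
        data = min(max_data, payload)
        payload -= data
        more = 1 if payload > 0 else 0        # "more fragments" flag
        frags.append({"length": data + header_len, "id": ident,
                      "flag": more, "offset": offset // 8})
        offset += data
    return frags

for f in fragment(4000, 1500, 777):
    print(f)
```

Running it on the 4,000-byte / MTU-1500 example reproduces the three fragments above: lengths 1500, 1500, 1040 with offsets 0, 185, 370.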
– device interfaces with the same subnet part of their IP address
– can physically reach each other without an intervening router
(Figure: a network consisting of 3 subnets, with interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1 to 223.1.2.9 and 223.1.3.1 to 223.1.3.27 grouped by LAN.)

Recipe
To determine the subnets, detach each interface from its host or router, creating islands of isolated networks. Each isolated network is called a subnet. In the figure these are 223.1.1.0/24, 223.1.2.0/24 and 223.1.3.0/24; the subnet mask is /24.

Subnets: How many?
(Figure: a larger topology with addresses in 223.1.1.x, 223.1.2.x, 223.1.3.x, 223.1.7.x, 223.1.8.x and 223.1.9.x.)

IP Addresses: How to get one?
Q: How does a host get an IP address?
• hard-coded by the system administrator in a file
– Wintel: Control Panel -> Network -> Configuration -> TCP/IP -> Properties
– UNIX: /etc/rc.config
• DHCP: Dynamic Host Configuration Protocol: dynamically get an address from a server
– "plug-and-play" (more in the next chapter)

IP Addressing
Q: How does an ISP get a block of addresses?
A: ICANN: Internet Corporation for Assigned Names and Numbers
– allocates addresses
– manages DNS
– assigns domain names, resolves disputes

NAT: Network Address Translation
A local network (e.g., a home network) 10.0.0/24 with hosts 10.0.0.1 to 10.0.0.4 sits behind a NAT router whose address facing the rest of the Internet is 138.76.29.7. All datagrams leaving the local network have the same single source NAT IP address, 138.76.29.7, with different source port numbers. Datagrams with source or destination in this network have a 10.0.0/24 address for source and destination (as usual).

Motivation: the local network uses just one IP address as far as the outside world is concerned:
– no need to be allocated a range of addresses from the ISP: just one IP address is used for all devices
– can change addresses of devices in the local network without notifying the outside world
– can change ISP without changing the addresses of devices in the local network
– devices inside the local net are not explicitly addressable or visible from the outside world (a security plus)

Implementation: the NAT router must:
– for outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #) . . . remote clients/servers will respond using (NAT IP address, new port #) as the destination address.
– remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair – incoming datagrams: replace (NAT IP address, new port #) in destination fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table Network Layer 17 NAT: Network Address Translation 2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table 2 NAT translation table WAN side addr LAN side addr 1: host 10.0.0.1 sends datagram to 128.119.40, 80 138.76.29.7, 5001 10.0.0.1, 3345 …… …… S: 10.0.0.1, 3345 D: 128.119.40.186, 80 S: 138.76.29.7, 5001 D: 128.119.40.186, 80 138.76.29.7 S: 128.119.40.186, 80 D: 138.76.29.7, 5001 3: Reply arrives dest. address: 138.76.29.7, 5001 3 1 10.0.0.4 S: 128.119.40.186, 80 D: 10.0.0.1, 3345 10.0.0.1 10.0.0.2 4 10.0.0.3 4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345 Network Layer 18 NAT: Network Address Translation • 16-bit port-number field: – 60,000 simultaneous connections with a single LAN-side address! • NAT is controversial: – routers should only process up to layer 3 – violates end-to-end argument • NAT possibility must be taken into account by app designers, eg, P2P applications – address shortage should instead be solved by IPv6 Network Layer 19 Binary and Decimal Conversions Network Layer 20 The 32 bit Binary IP Address • • • • • • • • • An IP address is represented by a 32 bit binary number The value of the right-most bit (also called the least significant bit) is either 0 or 1 The corresponding decimal value of each bit doubles as you move left in the binary number So the decimal value of the 2nd bit from the right is either 0 or 2. The third bit is either 0 or 4, the fourth bit 0 or 8, etc ... IP addresses are expressed as dotted-decimal numbers we break up the 32 bits of the address into four octets (an octet is a group of 8 bits). 
The maximum decimal value of each octet is 255. The largest 8-bit binary number is 11111111. Those bits, from left to right, have decimal values of 128, 64, 32, 16, 8, 4, 2, and 1. Added together, they total 255.

IP Address Component Fields
The network number of an IP address identifies the network to which a device is attached. The host portion of an IP address identifies the specific device on that network.

IP Address Classes
The American Registry for Internet Numbers (ARIN) is the Regional Internet Registry (RIR) for Canada, the United States, and many Caribbean and North Atlantic islands.
When written in binary format, the first (leftmost) bit of a Class A address is always 0. An example of a Class A IP address is 124.95.44.15. The first octet, 124, identifies the network number assigned by ARIN, which will range from 1 to 126. (127 does start with a 0 bit, but has been reserved for special purposes.) The internal administrators of the network assign the remaining 24 bits. All Class A IP addresses use only the first 8 bits to identify the network part of the address. Every network that uses a Class A IP address can assign up to 2^24 minus 2, or 16,777,214, possible IP addresses to devices attached to it.
The first 2 bits of a Class B address are always 10 (one and zero). An example of a Class B IP address is 151.10.13.28. The first two octets identify the network number assigned by ARIN. Class B IP addresses always have values ranging from 128 to 191 in their first octet. Every network that uses a Class B IP address can assign up to 2^16 minus 2 (again!), or 65,534, possible IP addresses to devices attached to it.
The first 3 bits of a Class C address are always 110 (one, one and zero).
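The octet arithmetic and the classful first-octet ranges above can be checked with a short sketch (the helper names `to_binary` and `address_class` are illustrative):

```python
def to_binary(addr):
    """Render a dotted-decimal IPv4 address as four 8-bit octets."""
    return ".".join(f"{int(o):08b}" for o in addr.split("."))

def address_class(addr):
    """Classify an address by its first octet, per the classful ranges above."""
    first = int(addr.split(".")[0])
    if 1 <= first <= 126:
        return "A"          # leading bit 0 (127 is reserved for loopback)
    if first == 127:
        return "reserved"
    if 128 <= first <= 191:
        return "B"          # leading bits 10
    if 192 <= first <= 223:
        return "C"          # leading bits 110
    return "other"

print(to_binary("124.95.44.15"), address_class("124.95.44.15"))
```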
An example of a Class C IP address is 201.110.213.28. The first three octets identify the network number. Class C IP addresses always have values ranging from 192 to 223 in their first octet. All Class C IP addresses use the first 24 bits to identify the network part of the address. Every network that uses a Class C IP address can assign up to 2^8 minus 2, or 254, possible IP addresses to devices attached to it.

IP Reserved Addresses
If your computer wanted to communicate with all of the devices on a network, it would be quite unmanageable to write out the IP address for each device. An IP address that ends with binary 0s in all host bits is reserved for the network address (sometimes called the wire address). Therefore, as a Class A network example, 113.0.0.0 is the IP address of the network containing the host 113.1.2.3. A router uses a network's IP address when it forwards data on the Internet. As a Class B network example, the IP address 176.10.0.0 is a network address.
If you wanted to send data to all of the devices on a network, you would need to use a broadcast address. Broadcast IP addresses end with binary 1s in the entire host part of the address (the host field). For the network in the example (176.10.0.0), where the last 16 bits make up the host field, the broadcast sent out to all devices on that network would use a destination address of 176.10.255.255 (since 255 is the decimal value of an octet containing 11111111). When you send a broadcast packet on a network, all devices on the network notice it.

Subnets
• Network administrators sometimes need to divide networks, especially large ones, into smaller networks
• These smaller divisions are called subnetworks and provide addressing flexibility. Most of the time subnetworks are simply referred to as subnets
• each subnet address is unique.
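The two reserved-address rules above (host bits all 0s for the network address, all 1s for the broadcast address) can be sketched as follows; expressing the classful examples as /8 and /16 prefix lengths is an assumption of the sketch:

```python
def network_and_broadcast(addr, prefix_len):
    """Network address: host bits all 0s; broadcast address: host bits all 1s."""
    octets = [int(o) for o in addr.split(".")]
    ip = int.from_bytes(bytes(octets), "big")      # address as a 32-bit integer
    host_bits = 32 - prefix_len
    net = ip >> host_bits << host_bits             # clear the host bits
    bcast = net | ((1 << host_bits) - 1)           # set the host bits
    def fmt(n):
        return ".".join(str(b) for b in n.to_bytes(4, "big"))
    return fmt(net), fmt(bcast)

# Class A example from the text: host 113.1.2.3 lives on network 113.0.0.0
print(network_and_broadcast("113.1.2.3", 8))
# Class B example: network 176.10.0.0 broadcasts to 176.10.255.255
print(network_and_broadcast("176.10.1.2", 16))
```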
• Subnet addresses include the Class A, Class B, or Class C network portion, plus a subnet field and a host field Network Layer 26 The 32 bit Binary IP Address The subnet field and the host field are created from the original host portion for the entire network To create a subnet address, a network administrator borrows bits from the original host portion and designates them as the subnet field. The minimum number of bits that can be borrowed is 2 Network Layer 27 Why Subnets! A primary reason for using subnets is to reduce the size of a broadcast domain When broadcast traffic begins to consume too much of the available bandwidth, network administrators may choose to reduce the size of the broadcast domain. Network Layer 28 Subnet Mask • The subnet mask (formal term: extended network prefix), is not an address, but determines which part of an IP address is the network field and which part is the host field • A subnet mask is 32 bits long and has 4 octets, just like an IP address. • To determine the subnet mask for a particular subnetwork IP address follow these steps – Express the subnetwork IP address in binary form – Replace the network and subnet portion of the address with all 1s – Replace the host portion of the address with all 0s – As the last step convert the binary expression back to dotted-decimal notation. The term "operations" in mathematics refers to rules that define how one number combines with other numbers The basic Boolean operations are AND, OR, and NOT. Network Layer 29 The AND Function In order to route a data packet, the router must first determine the destination network/subnet address by performing a logical AND using the destination host's IP address and the subnet mask The result will be the network/subnet address The router has received a packet for host 131.108.2.2 - it uses the AND operation to learn that this packet should be routed to subnet 131.108.2.0 Network Layer 30 Subnet Mask (Cont.) 
The process used to apply the subnet mask involves Boolean Algebra to filter out non-matching bits and identify the network address.

Boolean Algebra
Boolean Algebra is a process that applies binary logic to yield binary results. Working with subnet masks, you need only 4 basic principles of Boolean Algebra:
1 AND 1 = 1
1 AND 0 = 0
0 AND 1 = 0
0 AND 0 = 0
In other words, the only way you can get a result of 1 is to combine 1 and 1. Everything else ends up as 0. The process of combining binary values with Boolean Algebra is called ANDing.

A Trial Separation
Subnet masks apply only to Class A, B or C IP addresses. The subnet mask is like a filter that is applied to a message's destination IP address. Its objective is to determine if the local network is the destination network. Applying the subnet mask goes like this:
1. If a destination IP address is 206.175.162.21, we know that it is a Class C address and that its binary equivalent is: 11001110.10101111.10100010.00010101
2. We also know that the default standard Class C subnet mask is 255.255.255.0 and that its binary equivalent is: 11111111.11111111.11111111.00000000
3. When these two binary numbers (the IP address and the subnet mask) are ANDed, the Network ID of the destination network is the result: 11001110.10101111.10100010.00000000, i.e. 206.175.162.0.
4. The result is the IP address of the network, which in this case is the same as the local network's, and means that the message is for a node on the local network.

Default Standard Subnet Masks
There are default standard subnet masks for Class A, B and C addresses: 255.0.0.0, 255.255.0.0 and 255.255.255.0 respectively.

Subnetting Network IDs
A 3-step example of how the default Class A subnet mask is applied to a Class A address: the default Class A subnet mask (255.0.0.0) is ANDed with the Class A address (123.123.123.1) using Boolean Algebra, which results in the Network ID (123.0.0.0) being revealed.
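The ANDing just described is easy to reproduce octet by octet (the `network_id` helper name is illustrative):

```python
def network_id(addr, mask):
    """AND each address octet with the corresponding subnet-mask octet."""
    a = [int(o) for o in addr.split(".")]
    m = [int(o) for o in mask.split(".")]
    return ".".join(str(x & y) for x, y in zip(a, m))

# The Class C example above: the result equals the local network's address.
print(network_id("206.175.162.21", "255.255.255.0"))  # 206.175.162.0
# The Class A example: the default mask reveals the Network ID.
print(network_id("123.123.123.1", "255.0.0.0"))       # 123.0.0.0
```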
The default Class B subnet mask (255.255.0.0) strips out the 16-bit network ID, and the default Class C subnet mask (255.255.255.0) strips out the 24-bit network ID.

Subnetting, Subnet & Subnet Mask
Subnetting, a subnet and a subnet mask are all different. In fact, the 1st creates the 2nd and is identified by the 3rd. Subnetting is the process of dividing a network and its IP addresses into segments, each of which is called a subnetwork or subnet. The subnet mask is the 32-bit number that the router uses to cover up the network address and show which bits are being used to identify the subnet.

Subnetting
A network has its own unique address, such as a Class B network with the address 172.20.0.0, which has all zeroes in the host portion of the address. From the basic definitions of a Class B network and the default Class B subnet mask, this network can be created as a single network that contains more than 65,000 individual hosts. Through the use of subnetting, this network can be logically divided into subnets with fewer hosts on each subnetwork. This not only improves the available shared bandwidth, it also cuts down on the amount of broadcast traffic generated over the entire network. The 2 primary benefits of subnetting are:
1. Fewer IP addresses, often as few as one, are needed to provide addressing to a network and its subnets.
2. Subnetting usually results in smaller routing tables in routers beyond the local internetwork.

Subnetting
Example of subnetting: the network administrator divides the 172.20.0.0 network into 5 smaller networks – 172.20.1.0, 172.20.2.0, 172.20.3.0, 172.20.4.0 and 172.20.5.0 – the outside world still sees the network as 172.20.0.0, but the internal routers now break the network addressing into the 5 smaller subnetworks.
In the above example, only a single IP address is used to reference the network & instead of 5 network addresses, only one network reference is included in the routing tables of routers on other networks. Borrowing Bits to Grow a Subnet The key concept in subnetting is borrowing bits from the host portion of the network to create a subnetwork. Rules govern this borrowing, ensuring that some bits are left for a Host ID. The rules require that two bits remain available to use for the Host ID & that all of the subnet bits cannot be all 1s or 0s at the same time. For each IP address class, only a certain number of bits can be borrowed from the host portion for use in the subnet mask. Bits Available for Creating Subnets Address Class Host Bits Bits Available for Subnet A 24 22 B 16 14 8 6 C Network Layer 35 Subnetting a Class A Network The default subnet mask for a class A network is 255.0.0.0 which allows for more than 16,000,000 hosts on a single network. The default subnet mask uses only 8 bits to identify network, leaving 24 bits for host addressing . To subnet a Class A network, you need to borrow a sufficient number of bits from the 24-bit host portion of the mask to allow for the number of subnets you plan to create, now & in the future. Example: To create 2 subnets with more than 4 millions hosts per subnet, you must borrow 2 bits from the 2nd octet & use 10 masked (value equals one) bits for the subnet mask (11111111.11000000) or 255.192 in decimal. Keep in mind that each of the 8-bit octets has binary place values. When you borrow bits from the Host ID portion of the standard mask, you don’t change the value of the bits, only how they are grouped & used. A sample of subnet mask options available for Class A addresses. Network Layer 36 Class A Subnet Masks All subnet masks contain 32 bits; no more, no less. However a subnet mask cannot filter more than 30 bits. This means 2 things: One, that there cannot be more than 30 ones bits in the subnet mask. 
Two, that there must always be at least 2 bits available for the Host ID. The subnet mask with the highest value (255.255.255.252) has the binary representation 11111111.11111111.11111111.11111100. The 2 zeroes in this subnet mask represent the 2 positions set aside for the host portion of the address. Remember that the addresses with all ones (the broadcast address) and all zeroes (the local network) cannot be used, as they have special meanings.

Subnetting Class B & Class C
• The table on the previous slide, "Class A Subnet Masks", is similar to the tables used for Class B and Class C IP addresses and subnet masks.
• The only differences are that you have fewer options (due to the smaller number of bits available) and that you're much more likely to work with Class B and Class C networks in real life.

Class C Subnets
Knowing the relationships in this table will significantly reduce the time you spend calculating subnetting problems.

Class B Subnets
To calculate the number of subnets and hosts available from a Class B subnet mask, you use the same host and subnet formulas described for calculating Class C values. Using these formulas, a table of Class B subnet and host values can be constructed.

So How Does This Work?
We ask our ISP for a Class C license. They give us the Class C bank of 206.15.143.0. This gives us 1 network (206.15.143.0) with the potential for 254 node addresses (206.15.143.1 to 206.15.143.254). But we have a LAN made up of 5 networks, with the largest one serving 25 nodes. So we need to subnet our 1 IP address.
To calculate the number of subnets (networks) and/or nodes, we need to do some math: use the formula 2^n - 2, where n is the number of bits used for either the subnets (networks) or the nodes per subnet. We know we need at least 5 subnets.
So 2^3 - 2 gives us 6 subnet addresses (network addresses). We know we need at least 25 nodes per network: 2^5 - 2 gives us 30 nodes per subnet (network). This will work, because we can steal the first 3 bits from the node portion of the address to give to the network portion and still have 5 (8 - 3) left for the node portion.

Break it down:
Let's go back to what portion is what. We have a Class C address: NNNNNNNN.NNNNNNNN.NNNNNNNN.nnnnnnnn, with a subnet mask of 11111111.11111111.11111111.00000000. We need to steal 3 bits from the node portion and give them to the network portion: NNNNNNNN.NNNNNNNN.NNNNNNNN.NNNnnnnn. This changes our subnet mask to 11111111.11111111.11111111.11100000. That is how the computer sees our new subnet mask, but we need to express it in decimal form as well: 128 + 64 + 32 = 224, so the mask is 255.255.255.224.

What address is what?
Which of our 254 addresses will be a subnet (network) address and which will be node addresses? Because we are using the first 3 bits for our subnet field, they can be configured in eight different ways (binary form): 000, 001, 010, 011, 100, 101, 110, 111. We cannot use all 0s or all 1s, so we are left with 6 usable network numbers.

Network (Subnet) Addresses
Remember the bit values: 128, 64, 32, 16, 8, 4, 2, 1. Our six 3-bit configurations give:
001nnnnn = 32
010nnnnn = 64
011nnnnn = 96
100nnnnn = 128
101nnnnn = 160
110nnnnn = 192
Each of these numbers becomes the network address of its subnet:
206.15.143.32, 206.15.143.64, 206.15.143.96, 206.15.143.128, 206.15.143.160, 206.15.143.192

Node Addresses
The device assigned the first address receives the first number AFTER the network address shown above: 206.15.143.33, or 32 + 1 (00100001). The last usable address in that network is 206.15.143.62 (00111110). Remember, we cannot use all 1s; that is the broadcast address (206.15.143.63). The next network starts at 206.15.143.64: the first IP address on that subnet is 206.15.143.65 (01000001) and the last is 206.15.143.94 (01011110); the broadcast address is 206.15.143.95.

Can you figure out the rest?
Network: Host Range
206.15.143.32: 206.15.143.33 to 206.15.143.62
206.15.143.64: 206.15.143.65 to 206.15.143.94
206.15.143.96: 206.15.143.97 to 206.15.143.126
206.15.143.128: 206.15.143.129 to 206.15.143.158
206.15.143.160: 206.15.143.161 to 206.15.143.190
206.15.143.192: 206.15.143.193 to 206.15.143.222

How the computer finds the Network Address
Take 206.15.143.89, an address on the subnet, and the new subnet mask, 255.255.255.224. When the computer performs the logical bitwise AND operation, it comes up with the following network address (or subnet address):
11001110.00001111.10001111.01011001 = 206.15.143.89
11111111.11111111.11111111.11100000 = 255.255.255.224
11001110.00001111.10001111.01000000 = 206.15.143.64 (Network)
This address falls on our 2nd subnet (network).

Review
We have one Class C license (206.15.143.0). We need to subnet it into 12 possible networks. Each network needs a maximum of 10 nodes. How many bits do we need to take? 2^4 - 2 = 14, so 4 bits need to be taken from the node portion and given to the network portion. Will that leave enough bits for the node portion? We need a maximum of 10 nodes on each network, and 2^4 - 2 = 14. If we take 4 bits away, that leaves us with 4.
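The subnet and host-range arithmetic above can be regenerated with a short sketch. The `subnets` helper is illustrative and follows the classful convention used in these slides (the all-0s and all-1s subnets are excluded):

```python
def subnets(network, borrowed_bits):
    """Enumerate the usable subnets of a Class C network when borrowed_bits
    are taken from the 8-bit node portion, with each subnet's usable host range."""
    step = 2 ** (8 - borrowed_bits)            # 32 addresses per subnet for 3 bits
    base = network.rsplit(".", 1)[0]           # e.g. "206.15.143"
    out = []
    for s in range(1, 2 ** borrowed_bits - 1): # skip the all-0s and all-1s subnets
        net = s * step
        first, last = net + 1, net + step - 2  # exclude network and broadcast
        out.append((f"{base}.{net}", f"{base}.{first}", f"{base}.{last}"))
    return out

for net, first, last in subnets("206.15.143.0", 3):
    print(f"{net}: {first} to {last}")
```

Running it with 3 borrowed bits reproduces the six networks and host ranges in the table above.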
That is enough for our individual networks of 10 nodes each. Our new subnet mask will look like this: 11111111.11111111.11111111.11110000 255.255.255.240 128+64+32+16= 240 Our subnet, or network addresses will be: 206.15.143.16 206.15.143.64 206.15.143.112 206.15.143.160 206.15.143.208 206.15.143.32 206.15.143.80 206.15.143.128 206.15.143.176 206.15.143.224 Network Layer 206.15.143.48 206.15.143.96 206.15.143.144 206.15.143.192 49 Review Class B Subnet Chart Class C Subnet Chart #bits Subn et M as k CI DR #Subn ets #Ho sts #b its Subne t M ask 2 255.255.192.0 /18 2 16,382 2 255.255.255.192 /26 2 62 3 255.255.224.0 /19 6 8,190 3 255.255.255.224 /27 6 30 4 255.255.240.0 /20 14 4094 5 6 7 8 9 10 11 12 13 14 255.255.248.0 255.255.252.0 255.255.254.0 255.255.255.0 255.255.255.128 255.255.255.192 255.255.255.224 255.255.255.240 255.255.255.248 255.255.255.252 /21 /22 /23 /24 /25 /26 /27 /28 /29 /30 30 62 126 254 510 1,022 2,046 4,094 8,190 16,382 2,046 1,022 510 254 126 62 30 14 6 2 4 255.255.255.240 /28 14 14 5 255.255.255.248 /29 30 6 6 255.255.255.252 /30 62 2 Network Layer C ID R #subne ts #ho sts 50 ICMP: Internet Control Message Protocol • • • Used by hosts & routers to communicate network level information – error reporting: unreachable host, network, port, protocol – echo request/reply (used by ping) Network-layer “above” IP: – ICMP msgs carried in IP datagrams ICMP message: type, code plus first 8 bytes of IP datagram causing error Type 0 3 3 3 3 3 3 4 Code 0 0 1 2 3 6 7 0 8 9 10 11 12 0 0 0 0 0 Network Layer description echo reply (ping) dest. network unreachable dest host unreachable dest protocol unreachable dest port unreachable dest network unknown dest host unknown source quench (congestion control - not used) echo request (ping) route advertisement router discovery TTL expired bad IP header 51 Trace route and ICMP • Source sends series of UDP segments to dest – First has TTL =1 – Second has TTL=2, etc. 
– Unlikely port number • When nth datagram arrives to nth router: – Router discards datagram – And sends to source an ICMP message (type 11, code 0) – Message includes name of router& IP address • When ICMP message arrives, source calculates RTT • Traceroute does this 3 times Stopping criterion • UDP segment eventually arrives at destination host • Destination returns ICMP “host unreachable” packet (type 3, code 3) • When source gets this ICMP, stops. Network Layer 52 IPv6 Initial motivation: 32-bit address space soon to be completely allocated. Additional motivation: – header format helps speed processing/forwarding – header changes to facilitate QoS IPv6 datagram format: – fixed-length 40 byte header – no fragmentation allowed IPv6 Header Priority: identify priority among datagrams in flow Flow Label: identify datagrams in same “flow.” (concept of “flow” not well defined). Next header: identify upper layer protocol for data Network Layer 53 Other Changes from IPv4 • • • Checksum: removed entirely to reduce processing time at each hop Options: allowed, but outside of header, indicated by “Next Header” field ICMPv6: new version of ICMP – additional message types, e.g. “Packet Too Big” – multicast group management functions Transition From IPv4 To IPv6 • • Not all routers can be upgraded simultaneous – no “flag days” – How will the network operate with mixed IPv4 and IPv6 routers? 
Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers Network Layer 54 Tunneling Logical view: Physical view: A B IPv6 IPv6 A B C IPv6 IPv6 IPv4 Flow: X Src: A Dest: F data A-to-B: IPv6 E F IPv6 IPv6 D E F IPv4 IPv6 IPv6 tunnel Src:B Dest: E Src:B Dest: E Flow: X Src: A Dest: F Flow: X Src: A Dest: F data data B-to-C: IPv6 inside IPv4 Network Layer B-to-C: IPv6 inside IPv4 Flow: X Src: A Dest: F data E-to-F: IPv6 55 Graph Abstraction 5 2 u 2 1 Graph: G = (N,E) v x 3 w 3 1 5 z 1 y 2 N = set of routers = { u, v, w, x, y, z } E = set of links ={ (u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) } Remark: Graph abstraction is useful in other network contexts Example: P2P, where N is set of peers and E is set of TCP connections Network Layer 56 Graph abstraction: costs 5 2 u v 2 1 x • c(x,x’) = cost of link (x,x’) 3 w 3 1 5 z 1 y - e.g., c(w,z) = 5 2 • cost could always be 1, or inversely related to bandwidth, or inversely related to congestion Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp) Question: What’s the least-cost path between u and z ? Routing algorithm: algorithm that finds least-cost path Network Layer 57 Routing Algorithm Classification Global or decentralized information? Global: • all routers have complete topology, link cost info • “link state” algorithms Decentralized: • router knows physicallyconnected neighbors, link costs to neighbors • iterative process of computation, exchange of info with neighbors • “distance vector” algorithms Static or dynamic? 
Static: • routes change slowly over time Dynamic: • routes change more quickly – periodic update – in response to link cost changes Network Layer 58 A Link-State Routing Algorithm Dijkstra’s algorithm • net topology, link costs known to all nodes – accomplished via “link state broadcast” – all nodes have same info • computes least cost paths from one node (‘source”) to all other nodes – gives forwarding table for that node • iterative: after k iterations, know least cost path to k dest.’s Notation: • c(x,y): link cost from node x to y; = ∞ if not direct neighbors • D(v): current value of cost of path from source to dest. v • p(v): predecessor node along path from source to v • N': set of nodes whose least cost Network Layer path definitively known 59 Dijsktra’s Algorithm: Example D(v): current value of cost of path from source to dest. v p(v): predecessor node along path from source to v N': set of nodes whose least cost path definitively known Step 0 1 2 3 4 5 c(x,y): link cost from node x to y; = ∞ if not direct neighbors N' u ux uxy uxyv uxyvw uxyvwz D(v),p(v) D(w),p(w) 2,u 5,u 2,u 4,x 2,u 3,y 3,y 2 u 2 1 x 3 w 3 1 5 z 1 y D(y),p(y) ∞ 2,x D(z),p(z) ∞ ∞ 4,y 4,y 4,y 1 Initialization: 2 N' = {u} 3 for all nodes v 4 if v adjacent to u 5 then D(v) = c(u,v) 6 else D(v) = ∞ 7 8 Loop 9 find w not in N' such that D(w) is a minimum 10 add w to N' 11 update D(v) for all v adjacent to w and not in N' : 12 D(v) = min( D(v), D(w) + c(w,v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N' 5 v D(x),p(x) 1,u 2 Network Layer 60 Dijsktra’s Algorithm, Discussion Algorithm complexity: n nodes • each iteration: need to check all nodes, w, not in N' • n(n+1)/2 comparisons: O(n2) • more efficient implementations possible: O(nlogn) Oscillations possible: • e.g., link cost = amount of carried traffic D 1 1 0 A 0 0 C e 1+e e initially B 1 2+e A 0 0 D 1+e 1 B 0 0 C … recompute routing D 1 A 0 0 C 2+e B 1+e … 
Network Layer 61

Distance Vector Algorithm
Bellman-Ford (B-F) Equation (dynamic programming)
Define dx(y) := cost of least-cost path from x to y
Then dx(y) = min { c(x,v) + dv(y) }, where the min is taken over all neighbors v of x

Bellman-Ford example
(Same network as the Dijkstra example.) Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3.
B-F equation says:
du(z) = min { c(u,v) + dv(z), c(u,x) + dx(z), c(u,w) + dw(z) }
      = min { 2 + 5, 1 + 3, 5 + 3 } = 4
The node that achieves the minimum is the next hop on the shortest path ➜ forwarding table

Network Layer 62

Internet Routing
• Can use any of the standard routing algorithms:
– link-state
• OSPF (Open Shortest Path First)
– distance vector
• RIP (Routing Information Protocol) [RFC 1058] [RFC 1723]
• BGP (Border Gateway Protocol) (inter-AS routing)

Intra-AS Routing
• Also known as Interior Gateway Protocols (IGP)
• Most common intra-AS routing protocols:
– RIP: Routing Information Protocol
– OSPF: Open Shortest Path First
– IGRP: Interior Gateway Routing Protocol (Cisco proprietary)

Network Layer 63

RIP (Routing Information Protocol)
• Distance vector algorithm
• Included in BSD-UNIX distribution in 1982
• Distance metric: # of hops (max = 15 hops)
• Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)
• Each advertisement: list of up to 25 destination nets within AS

RIP: Example
Routing table in D:
Destination Network  Next Router  Num. of hops to dest.
w                    A            2
y                    B            2
z                    B            7
x                    --           1

Advertisement from A to D: dest. z, next router C, 4 hops; D then updates its route to z to go via A in 5 hops.
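The B-F equation above can be checked numerically; this small sketch uses the neighbor costs from the example (the dictionary layout is an assumption):

```python
# u's neighbors v, each with (c(u,v), dv(z)) taken from the example
neighbors = {"v": (2, 5), "x": (1, 3), "w": (5, 3)}

# du(z) = min over neighbors v of c(u,v) + dv(z)
du_z, next_hop = min(
    (c_uv + dv_z, v) for v, (c_uv, dv_z) in neighbors.items()
)
# du_z == 4 via next_hop == "x": x achieves the minimum,
# so x becomes u's next hop to z in the forwarding table
```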
Network Layer 64

RIP: Link Failure and Recovery
• If no advertisement heard after 180 sec --> neighbor/link declared dead
– routes via neighbor invalidated
– new advertisements sent to neighbors
– neighbors in turn send out new advertisements (if tables changed)
– link failure info quickly propagates to entire net
– poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)

RIP: Table Processing
• RIP routing tables managed by application-level process called routed (daemon)
• advertisements sent in UDP packets, periodically repeated
(Figure: two hosts, each running routed over Transport (UDP), network (IP), link, and physical layers, exchanging forwarding tables.)

Network Layer 65

RIP: Table Example (Continued..)
Router: giroflee.eurocom.fr

Destination   Gateway         Flags  Ref  Use     Interface
127.0.0.1     127.0.0.1       UH     0    26492   lo0
192.168.2.    192.168.2.5     U      2    13      fa0
193.55.114.   193.55.114.6    U      3    58503   le0
192.168.3.    192.168.3.5     U      2    25      qaa0
224.0.0.0     193.55.114.6    U      3    0       le0
default       193.55.114.129  UG     0    143454

• Three attached networks (LANs)
• Router only knows routes to attached LANs
• Default router used to "go up"
• Route multicast address: 224.0.0.0
• Loopback interface (for debugging)

Network Layer 66

OSPF (Open Shortest Path First)
• "open": publicly available
• Uses Link State algorithm
– LS packet dissemination
– Topology map at each node
– Route computation using Dijkstra's algorithm
• OSPF advertisement carries one entry per neighbor router
• Advertisements disseminated to entire AS (via flooding)
– Carried in OSPF messages directly over IP (rather than TCP or UDP)

Inter-AS routing in the Internet: BGP
(Figure 4.5.2: BGP used for inter-domain routing between AS1 (RIP intra-AS routing), AS2 (OSPF intra-AS routing), and AS3 (OSPF intra-AS routing), with border routers R1-R5.)

Network Layer 67

Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• Path Vector protocol:
– similar to Distance Vector protocol
– each Border Gateway broadcasts to neighbors (peers) the entire path (i.e., sequence of ASs) to a destination
– BGP routes to networks (ASs), not individual hosts
E.g., Gateway X may send its path to dest. Z: Path (X,Z) = X,Y1,Y2,Y3,…,Z
• Suppose: gateway X sends its path to peer gateway W
• W may or may not select the path offered by X
– cost, policy (don't route via a competitor's AS), loop prevention reasons
• If W selects the path advertised by X, then:
Path (W,Z) = W, Path (X,Z)
• Note: X can control incoming traffic by controlling its route advertisements to peers:
– e.g., if X doesn't want to route traffic to Z -> X doesn't advertise any routes to Z

Network Layer 68

BGP: controlling who routes to you
(Figure 4.5-BGPnew: a simple BGP scenario. Legend: A, B, C are provider networks; W, X, Y are customer networks.)
• A, B, C are provider networks
• X, W, Y are customers (of provider networks)
• X is dual-homed: attached to two networks
• X does not want to route from B via X to C
– .. so X will not advertise to B a route to C
• A advertises to B the path AW
• B advertises to X the path BAW
• Should B advertise to C the path BAW?
– No way! B gets no "revenue" for routing CBAW since neither W nor C are B's customers
– B wants to force C to route to W via A
– B wants to route only to/from its customers!

Network Layer 69

BGP Operation
Q: What does a BGP router do?
• Receiving and filtering route advertisements from directly attached neighbor(s).
• Route selection.
– To route to destination X, which path (of several advertised) will be taken?
• Sending route advertisements to neighbors.

Network Layer 70

BGP Messages
• BGP messages exchanged using TCP.
• BGP messages:
– OPEN: opens TCP connection to peer and authenticates sender
– UPDATE: advertises new path (or withdraws old)
– KEEPALIVE: keeps connection alive in absence of UPDATEs; also ACKs OPEN request
– NOTIFICATION: reports errors in previous msg; also used to close connection

Network Layer 71

Why different Intra- and Inter-AS routing?
• Policy:
– Inter-AS: admin wants control over how its traffic is routed, and who routes through its net.
– Intra-AS: single admin, so no policy decisions needed
• Scale:
– hierarchical routing saves table size, reduces update traffic
• Performance:
– Intra-AS: can focus on performance
– Inter-AS: policy may dominate over performance

Network Layer 72

Path vector messages
Network  Next Router  Path
N01      R01          AS14, AS23, AS67
N02      R05          AS22, AS67, AS05, AS89
N03      R06          AS67, AS89, AS09, AS34
N04      R12          AS62, AS02, AS09

Network Layer 73

Broadcast and Multicast Routing Algorithms
In broadcast routing, the network layer provides a service of delivering a packet sent from a source node to all other nodes in the network; multicast routing enables a single source node to send a copy of a packet to a subset of the other network nodes.

Broadcast Routing Algorithms
The most straightforward way to accomplish broadcast communication is for the sending node to send a separate copy of the packet to each destination, as shown in Figure 4.43(a). Given N destination nodes, the source node simply makes N copies of the packet, addresses each copy to a different destination, and then transmits the N copies to the N destinations using unicast routing. This N-way-unicast approach to broadcasting is simple: no new network-layer routing protocol, packet-duplication, or forwarding functionality is needed. There are, however, several drawbacks to this approach. The first drawback is its inefficiency. If the source node is connected to the rest of the network via a single link, then N separate copies of the (same) packet will traverse this single link. It would clearly be more efficient to send only a single copy of a packet over this first hop and then have the node at the other end of the first hop make and forward any additional needed copies. That is, it would be more efficient for the network nodes themselves (rather than just the source node) to create duplicate copies of a packet.
For example, in Figure 4.43(b), only a single copy of a packet traverses the R1-R2 link. That packet is then duplicated at R2, with a single copy being sent over links R2-R3 and R2-R4. An additional drawback of N-way-unicast is that it assumes the broadcast recipients, and their addresses, are known to the sender. A final drawback of N-way-unicast relates to the purposes for which broadcast is to be used.

Figure 4.43 Source-duplication versus in-network duplication. (a) source duplication, (b) in-network duplication

Network Layer 75

Uncontrolled & Controlled Flooding
Uncontrolled Flooding: The most obvious technique for achieving broadcast is a flooding approach in which the source node sends a copy of the packet to all of its neighbors. When a node receives a broadcast packet, it duplicates the packet and forwards it to all of its neighbors. For example, in Figure 4.43, R2 will flood to R3, R3 will flood to R4, R4 will flood to R2, and R2 will flood (again!) to R3, and so on. This results in the endless cycling of two broadcast packets, one clockwise, and one counterclockwise. There can be an even more calamitous fatal flaw: when a node is connected to more than two other nodes, it will create and forward multiple copies of the broadcast packet, each of which will create multiple copies of itself (at other nodes with more than two neighbors), and so on. The result is a broadcast storm, the endless multiplication of broadcast packets.

Controlled Flooding: The key to avoiding a broadcast storm is for a node to judiciously choose when to flood a packet and when not to flood a packet. In practice, this can be done in one of several ways. In sequence-number-controlled flooding, a source node puts its address (or other unique identifier) as well as a broadcast sequence number into a broadcast packet, then sends the packet to all of its neighbors.
Each node maintains a list of the source address and sequence number of each broadcast packet it has already received, duplicated, and forwarded. When a node receives a broadcast packet, it first checks whether the packet is in this list. If so, the packet is dropped; if not, the packet is duplicated and forwarded to all the node's neighbors (except the node from which the packet has just been received). A second approach to controlled flooding is known as reverse path forwarding (RPF) [Dalal 1978], also sometimes referred to as reverse path broadcast (RPB). The idea behind RPF is simple, yet elegant.

Network Layer 76

Controlled Flooding
In RPF, when a router receives a broadcast packet with a given source address, it transmits the packet on all of its outgoing links (except the one on which it was received) only if the packet arrived on the link that is on its own shortest unicast path back to the source. Otherwise, the router simply discards the incoming packet without forwarding it on any of its outgoing links. Note that RPF does not use unicast routing to actually deliver a packet to a destination, nor does it require that a router know the complete shortest path from itself to the source. With RPF, a router need only know the next neighbor on its unicast shortest path to the sender; it uses this neighbor's identity only to determine whether or not to flood a received broadcast packet.
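The sequence-number-controlled flooding described above can be sketched as follows. The topology and packet layout are hypothetical (loosely based on Figure 4.43(b)), not part of any real protocol:

```python
def make_flooder(neighbors):
    """Sequence-number-controlled flooding, per the description above:
    each node remembers the (source, seq#) pairs it has already received,
    duplicated, and forwarded; repeats are dropped, new packets are
    forwarded to every neighbor except the one they arrived from."""
    seen = set()

    def receive(node, packet, arrived_from):
        key = (node, packet["src"], packet["seq"])
        if key in seen:
            return []                      # duplicate: drop
        seen.add(key)
        return [n for n in neighbors[node] if n != arrived_from]

    return receive

# Hypothetical topology based on Figure 4.43(b)
neighbors = {"R2": ["R1", "R3", "R4"], "R3": ["R2", "R4"], "R4": ["R2", "R3"]}
receive = make_flooder(neighbors)
pkt = {"src": "R1", "seq": 7}
print(receive("R2", pkt, "R1"))   # ['R3', 'R4']
print(receive("R2", pkt, "R4"))   # []  (R2 already saw (R1, 7): dropped)
```

The (source, seq#) list is exactly what breaks the endless cycling in the uncontrolled-flooding example above.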
Figure 4.44: Reverse path forwarding (RPF)

Network Layer 77

Transport Layer
Transport Services and Protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
– send side: breaks app messages into segments, passes to network layer
– rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
– Internet: TCP and UDP
(Figure: two end systems running the full stack (application, transport, network, data link, physical) communicating across routers that implement only the lower layers.)

Network Layer 79

Transport vs. Network Layer
• Network Layer: logical communication between hosts
– PDU: Datagram
– Datagrams may be lost, duplicated, reordered in the Internet
– "best effort" service
• Transport Layer: logical communication between processes
– relies on, enhances, network layer services
– PDU: Segment
– extends "host-to-host" communication to "process-to-process" communication

Network Layer 80

TCP/IP Transport Layer Protocols
• reliable, in-order delivery (TCP)
– congestion control
– flow control
– connection setup
• unreliable, unordered delivery: UDP
– no-frills extension of "best-effort" IP
– What does UDP provide in addition to IP?
• services not provided by IP (network layer):
– delay guarantees
– bandwidth guarantees

Network Layer 81

Multiplexing/Demultiplexing
(Figure: HTTP, FTP, and Telnet processes on one host multiplexed over a single transport/network layer, demultiplexed again on the peer host.)
• Use same communication channel between hosts for several logical communication processes
• How does Mux/DeMux work?
– Sockets: doors between process & host
– UDP socket: (dest. IP, dest. Port)
– TCP socket: (src. IP, src. port, dest. IP, dest.
Port) Network Layer 82 Connectionless demux • UDP socket identified by two-tuple: – (dest IP address, dest port number) • When host receives UDP segment: – checks destination port number in segment – directs UDP segment to socket with that port number • IP datagrams with different source IP addresses and/or source port numbers directed to same socket Connection-oriented demux • TCP socket identified by 4-tuple: – – – – source IP address source port number dest IP address dest port number • Server host may support many simultaneous TCP sockets: – each socket identified by its own 4-tuple • recv host uses all four values to direct segment to appropriate socket • Web servers have different sockets for each connecting client Network Layer – non-persistent HTTP will have different socket for each request 83 User Datagram Protocol (UDP) [RFC 768] • “no frills,” “bare bones” Internet transport protocol • “best effort” service, UDP segments may be: – lost – delivered out of order to app • Why use UDP? – No connection establishment cost (critical for some applications, e.g., DNS) – No connection state – Small segment headers (only 8 bytes) – Finer application control over data transmission Network Layer 84 UDP Segment Structure • often used for streaming multimedia apps – loss tolerant Length, in bytes of UDP – rate sensitive • other UDP uses – DNS – SNMP • reliable transfer over UDP: add reliability at application layer – application-specific error recovery! 
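The connectionless demultiplexing described above can be seen directly with Python's socket API. This is an illustrative sketch; the loopback address and port number are arbitrary examples:

```python
import socket

# Receiver: a UDP socket is identified by (dest IP, dest port) only.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 9999))   # example port
recv_sock.settimeout(5)

# Two separate senders (distinct source ports) reach the SAME socket:
for _ in range(2):
    send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_sock.sendto(b"hello", ("127.0.0.1", 9999))
    send_sock.close()

for _ in range(2):
    data, (src_ip, src_port) = recv_sock.recvfrom(2048)
    # both datagrams are demultiplexed to recv_sock despite different
    # source ports, exactly as the two-tuple rule above says
recv_sock.close()
```

A TCP socket, by contrast, would be selected using the full four-tuple, so each connecting client lands on its own socket.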
segment, including header

UDP segment format (32 bits wide):
source port #  |  dest port #
length         |  checksum
Application data (message)

Network Layer 85

TCP Segment Structure (32 bits wide):
source port #  |  dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pnter
Options (variable length)
application data (variable length)

Field notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• checksum: Internet checksum (as in UDP)
• sequence/acknowledgement numbers: counting by bytes of data (not segments!)
• Receive window: # bytes rcvr willing to accept

Network Layer 86

TCP Sequence Number and Acknowledgement
• TCP views data as an unstructured, but ordered, stream of bytes.
• Sequence numbers are over bytes, not segments
• Initial sequence number is chosen randomly
• TCP is full duplex
– numbering of data is independent in each direction
• Acknowledgement number
– sequence number of the next byte expected from the sender
• ACKs are cumulative

Network Layer 87

TCP seq. #'s and ACKs
Seq.
#'s:
– byte stream "number" of first byte in segment's data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how does the receiver handle out-of-order segments?
– A: TCP spec doesn't say; up to implementor
(Figure: Host A sends 1000 bytes of data; Host B ACKs receipt of the data; Host A sends another 500 bytes.)

Network Layer 88

TCP Reliable Data Transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses a single retransmission timer
• Retransmissions are triggered by:
– timeout events
– duplicate acks
• Initially consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control

Network Layer 89

TCP Sender Events
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If it acknowledges previously unacked segments
– update what is known to be acked
– start timer if there are outstanding segments

Network Layer 90

Simplified TCP sender:

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
      retransmit not-yet-acknowledged segment with
        smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
}

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked

Network Layer 91

TCP Flow Control
receive side of TCP connection has a receive buffer:
TCP Flow Control: How it works (Suppose TCP
receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• Rcvr advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
• speed-matching service: matching the send rate to the receiving app's drain rate

Network Layer 92

Silly Window Syndrome
Recall: TCP uses sliding window
"Silly Window" occurs when small-sized segments are transmitted, resulting in inefficient use of the network pipe
For example, suppose that the TCP sender generates data slowly, 1 byte at a time
Solution: wait until the sender has enough data to transmit – "Nagle's Algorithm"
Nagle's Algorithm
1. TCP sender sends the first piece of data obtained from the application (even if data is only a few bytes).
2. Wait until enough bytes have accumulated in the TCP send buffer or until an ACK is received.
3. Repeat step 2 for the remainder of the transmission.

Network Layer 93

Silly Window Syndrome
• Suppose that the receiver consumes data slowly
– Receive Window opens slowly, and thus the sender is forced to send small-sized segments
• Solutions
– Delayed ACK
– Advertise Receive Window = 0, until reasonable amount of space available in receiver's buffer

Network Layer 94

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g.
RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket("hostname","port number");
• server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

Network Layer 95

TCP Connection Establishment
(State diagram: the client goes CLOSED -> SYN_SENT (active open, sends SYN) -> ESTABLISHED after receiving SYN+ACK and sending ACK; the server goes CLOSED -> LISTEN (passive open) -> SYN_RCVD after receiving SYN and sending SYN+ACK -> ESTABLISHED after receiving ACK. Solid lines for client, dashed lines for server.)

Network Layer 96

Principles of Congestion Control
• Congestion: informally, "too many sources sending too much data too fast for network to handle"
• Different from flow control!
• Manifestations:
– Packet loss (buffer overflow at routers)
– Increased end-to-end delays (queuing in router buffers)
• Results in unfairness and poor utilization of network resources
– Resources used by dropped packets (before they were lost)
– Retransmissions
– Poor resource allocation at high load

Network Layer 97

Historical Perspective
• October 1986, the Internet had its first congestion collapse
• Link from LBL to UC Berkeley
– 400 yards, 3 hops, 32 Kbps
– throughput dropped to 40 bps
– a factor of ~1000 drop!
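The client/server roles shown in Java on the connection-management slide look like this in Python; the loopback address is an arbitrary example, and the kernel performs the SYN / SYNACK / ACK exchange itself:

```python
import socket
import threading

# server: passive open; accept() completes once the handshake finishes
welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcome.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
welcome.listen(1)
port = welcome.getsockname()[1]

def serve():
    conn, addr = welcome.accept()   # contacted by client
    conn.sendall(b"hi")
    conn.close()

t = threading.Thread(target=serve)
t.start()

# client: active open; connect() triggers the three-way handshake
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
data = client.recv(16)
client.close()
t.join()
welcome.close()
```

Note that the application never sees the SYN segments: connect() returns only after step 3 of the handshake, and accept() hands the server a fully established connection socket.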
• Van Jacobson proposes TCP Congestion Control:
– Achieve high utilization
– Avoid congestion
– Share bandwidth

Network Layer 98

Congestion Control: Approaches
• Goal: Throttle senders as needed to ensure load on the network is "reasonable"
• End-end congestion control:
– no explicit feedback from network
– congestion inferred from end-system observed loss, delay
– approach taken by TCP
• Network-assisted congestion control:
– routers provide feedback to end systems
– single bit indicating congestion (e.g., ECN)
– explicit rate sender should send at

Network Layer 99

TCP Congestion Control: Overview
• end-end control (no network assistance)
• Limit the number of packets in the network to window W
• Roughly, rate = W/RTT bytes/sec
• W is dynamic, a function of perceived network congestion

Network Layer 100

What is the maximum value of W?
• When a connection is set up, the congestion window, a value maintained independently at each host, is set to a small multiple of the maximum segment size (MSS) allowed on that connection. Further variation in the congestion window is dictated by an Additive Increase/Multiplicative Decrease approach.
• The maximum segment size (MSS) is a parameter of the TCP protocol that specifies the largest amount of data, specified in octets (1 octet = 8 bits = 1 byte), that a computer or communications device can receive in a single TCP segment. It does not count the TCP header or the IP header.
• The IP datagram containing a TCP segment may be self-contained within a single packet, or it may be reconstructed from several fragmented pieces; either way, the MSS limit applies to the total amount of data contained in the final, reconstructed TCP segment.
• The default TCP Maximum Segment Size is 536 bytes. To avoid fragmentation at the IP layer, a host must specify the maximum segment size as equal to the largest IP datagram that the host can handle minus the IP header size and TCP header sizes.
Therefore IPv4 hosts are required to be able to handle an MSS of 536 octets (= 576 - 20 - 20) and IPv6 hosts are required to be able to handle an MSS of 1220 octets (= 1280 - 40 - 20). Low MSS values will reduce or eliminate IP fragmentation, but will result in higher overhead. For most computer users, the MSS option is established by the operating system.

Network Layer 101

Slow Start
• "Slow Start" is used to reach the equilibrium state
• Initially: W = 1 (slow start)
• On each successful ACK: W <- W + 1
• Exponential growth of W each RTT: W <- 2 x W
• Enter Congestion Avoidance (CA) when W >= ssthresh
• ssthresh: window size after which TCP carefully probes for bandwidth
(Figure: sender and receiver exchanging data segments and ACKs; cwnd grows 1, 2, 4, 8 per RTT.)

Network Layer 102

Congestion Avoidance (CA)
• Starts when W >= ssthresh
• On each successful ACK: W <- W + 1/W
• Linear growth of W each RTT: W <- W + 1

CA: Additive Increase, Multiplicative Decrease
• We have "additive increase" in the absence of loss events
• After a loss event, decrease congestion window by half
– "multiplicative decrease"
– ssthresh = W/2
– Enter Slow Start

Network Layer 103

Detecting Packet Loss
• Assumption: loss indicates congestion
• Option 1: time-out
– Waiting for a time-out can be long!
• Option 2: duplicate ACKs
– How many? At least 3.
(Figure: sender transmits segments 10-17; segment 12 is lost, so the receiver keeps ACKing 11, producing duplicate ACKs.)

Network Layer 104

Fast Retransmit
• Waiting for a timeout is quite long
• Immediately retransmit after 3 duplicate ACKs, without waiting for timeout
• Adjust ssthresh: ssthresh <- W/2
• Enter Slow Start: W = 1

Network Layer 105

Examples
Example: Suppose that a TCP congestion window is set to 128 KB and a timeout occurs. How big will the window be if the next four transmission bursts are all successful? Assume the maximum segment size is 1 KB.
Solution: The next transmission will be 1 maximum segment size. Then 2, 4, & 8. So after four successes, it will be 8 KB.
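The slow-start and congestion-avoidance evolution above can be sketched per RTT, with W in units of MSS. This is a simplified model, not real TCP: the ssthresh value is an example, and capping slow start exactly at ssthresh is an assumed simplification:

```python
def next_window(W, ssthresh, loss):
    """One RTT of the window evolution described above (W in MSS units):
    loss -> ssthresh = W/2 and restart slow start at W = 1;
    W < ssthresh -> slow start doubles W (capped at ssthresh here);
    otherwise congestion avoidance adds 1 per RTT."""
    if loss:
        return 1, max(W // 2, 1)
    if W < ssthresh:
        return min(2 * W, ssthresh), ssthresh   # slow start
    return W + 1, ssthresh                      # congestion avoidance

W, ssthresh = 1, 8
trace = []
for rtt in range(6):
    trace.append(W)
    W, ssthresh = next_window(W, ssthresh, loss=False)
# trace == [1, 2, 4, 8, 9, 10]: exponential growth up to ssthresh,
# then linear additive increase
```

The worked example on the slide is the loss branch followed by four loss-free RTTs: after the timeout W restarts at 1 MSS and the successive bursts are 1, 2, 4, and 8 KB.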
Network Layer 106

How to Set the TCP Timeout Value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
How to Estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary; we want the estimated RTT to be "smoother"
– average several recent measurements, not just the current SampleRTT

Network Layer 107

Peer-to-peer (P2P) Networks
• P2P Networks: Internet users that are ready to share their resources become peers and form a network. When a peer in the network has a file (for example, an audio or video file) to share, it makes it available to the rest of the peers. P2P networks are divided into two categories: centralized and decentralized.
– Centralized Networks: The directory system (a listing of the peers and what they offer) uses the client-server paradigm, but the storing and downloading of the files are done using the peer-to-peer paradigm. For this reason, a centralized P2P network is sometimes referred to as a hybrid P2P network.
– Decentralized Networks: A decentralized P2P network does not depend on a centralized directory system. In this model, peers arrange themselves into an overlay network, which is a logical network made on top of the physical network. Depending on how the nodes in the overlay network are linked, a decentralized P2P network is classified as either unstructured or structured.
• Unstructured Networks: In an unstructured P2P network, the nodes are linked randomly. A search in an unstructured P2P network is not very efficient because a query to find a file must be flooded through the network, which produces significant traffic, and still the query may not be resolved. Two examples of this type of network are Gnutella and Freenet.
• Structured Networks: A structured network uses a predefined set of rules to link nodes so that a query can be effectively and efficiently resolved.
The most common technique used for this purpose is the Distributed Hash Table (DHT). DHT is used in many applications including Distributed Data Structure (DDS), Content Distribution Systems (CDS), Domain Name System (DNS), and P2P file sharing. One popular P2P file sharing protocol that uses the DHT is BitTorrent. We discuss DHT independently in the next section as a technique that can be used both in structured P2P networks and in other systems.

Network Layer 108

Note: For the details please have a look at chapter 30 of "Data Communications and Networking" (5E) by Forouzan

Quality of Service (QoS)
Flow Characteristics & Flow Classes
Flow characteristics: the characteristics that a flow needs in order to deliver its packets safe and sound to the destination.
• Reliability: lack of reliability means losing a packet or acknowledgment, which entails retransmission.
• Delay: source-to-destination delay is another flow characteristic. Applications can tolerate delay in different degrees.
• Jitter: the variation in delay for packets belonging to the same flow. For example, if four packets depart at times 0, 1, 2, 3 and arrive at 20, 21, 22, 23, all have the same delay, 20 units of time. On the other hand, if the above four packets arrive at 21, 23, 24, and 28, they will have different delays (21, 22, 22, and 25 units, respectively).
• Bandwidth: different applications need different bandwidths. In video conferencing we need to send millions of bits per second to refresh a color screen, while the total number of bits in an e-mail may not reach even a million.

Techniques to Improve QoS
• Scheduling Techniques
• Traffic Shaping Techniques
• Resource Reservation
• Admission Control

110

• FIFO queue (Scheduling Techniques)
• Priority queuing (Scheduling Techniques)
• Weighted fair queuing (Scheduling Techniques)

Network Layer 111

Leaky bucket (Traffic Shaping Techniques)
A host sends a burst of data at a rate of 12 Mbps for 2 seconds, for a total of 24 Mb of data. The host is silent for 5 seconds and then sends data at a rate of 2 Mbps for 3 seconds, for a total of 6 Mb of data.
In all, the host has sent 30 Mb of data in 10 seconds. The leaky bucket smooths this bursty input into a steady 3 Mbps for the whole 10 seconds.

Leaky bucket implementation (Traffic Shaping Techniques)
The algorithm for variable-length packets:
1. Initialize a counter to n at the tick of the clock.
2. If n is greater than the size of the packet, send the packet and decrement the counter by the packet size. Repeat this step until the counter value is smaller than the packet size.
3. Reset the counter to n and go to step 1.
A leaky bucket algorithm shapes bursty traffic into fixed-rate traffic by averaging the data rate. It may drop packets if the bucket is full.

Network Layer 112

Token bucket (Traffic Shaping Techniques)
The leaky bucket is very restrictive. It does not credit an idle host. For example, if a host is not sending for a while, its bucket becomes empty. Now if the host has bursty data, the leaky bucket allows only an average rate; the time when the host was idle is not taken into account. On the other hand, the token bucket algorithm allows idle hosts to accumulate credit for the future in the form of tokens. The token bucket allows bursty traffic at a regulated maximum rate.

Combining Token Bucket and Leaky Bucket: The two techniques can be combined to credit an idle host and at the same time regulate the traffic. The leaky bucket is applied after the token bucket; the rate of the leaky bucket needs to be higher than the rate of tokens dropped in the bucket.

113

Resource Reservation
A flow of data needs resources such as a buffer, bandwidth, CPU time, and so on. The QoS is improved if these resources are reserved beforehand. Below, we discuss a QoS model called Integrated Services (INTSERV), which depends heavily on resource reservation to improve the quality of service.

Admission Control
Admission control refers to the mechanism used by a router or a switch to accept or reject a flow based on predefined parameters called flow specifications.
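The variable-length-packet leaky bucket algorithm above can be sketched per clock tick. The per-tick budget n and the packet sizes are arbitrary example values:

```python
from collections import deque

def leaky_bucket_tick(queue, n):
    """One clock tick of the algorithm above: reset the counter to n (step 1),
    then send queued packets while the counter exceeds the next packet's
    size, decrementing the counter by each packet sent (step 2)."""
    counter, sent = n, []
    while queue and counter > queue[0]:   # step 2: counter > packet size
        size = queue.popleft()
        sent.append(size)
        counter -= size
    return sent                            # step 3: counter resets next tick

queue = deque([200, 500, 450, 400])       # packet sizes in bytes
print(leaky_bucket_tick(queue, 1000))     # [200, 500] (450 > remaining 300)
print(leaky_bucket_tick(queue, 1000))     # [450, 400]
```

However bursty the arrivals, at most n bytes leave per tick, which is exactly the rate-averaging behavior described above.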
Before a router accepts a flow for processing, it checks the flow specifications to see if its capacity can handle the new flow. It takes into account bandwidth, buffer size, CPU speed, etc., as well as its previous commitments to other flows. Admission control in ATM networks is known as Connection Admission Control (CAC), which is a major part of the strategy for controlling congestion.

114

Integrated Services (INTSERV)
Integrated Services is a flow-based QoS model designed for IP. In this model packets are marked by routers according to flow characteristics. What is important are the resources the application needs, not what the application is doing. The model is based on three schemes:
1. The packets are first classified according to the service they require.
2. The model uses scheduling to forward the packets according to their flow characteristics.
3. Devices like routers use admission control to determine if the device has the capability (available resources to handle the flow) before making a commitment. For example, if an application requires a very high data rate, but a router in the path cannot provide such a data rate, it denies the admission.

Flow Specification: is made of two parts:
1. Rspec (resource specification). Defines the resources that the flow needs to reserve (buffer, bandwidth, etc.)
2. Tspec (traffic specification). Defines the traffic characterization of the flow.
Admission: After a router receives the flow specification from an application, it decides to admit or deny the service. The decision is based on the previous commitments of the router and the current availability of the resource.
Service Classes: Two classes of services have been defined for Integrated Services:
1. Guaranteed service: Designed for real-time traffic that needs a guaranteed minimum end-to-end delay.
2. Controlled-load service: Designed for applications that can accept some delays but are sensitive to an overloaded network and to the danger of losing packets.
Good examples of these types of applications are file transfer, e-mail, and Internet access.

115

Resource Reservation Protocol (RSVP): The Integrated Services model needs a connection-oriented network layer. Since IP is a connectionless protocol, a new protocol, RSVP, is designed to run on top of IP to make it connection-oriented.
RSVP Messages: RSVP has several types of messages. However, we discuss only two of them: Path and Resv.
Path messages: A Path message travels from the sender and reaches all receivers in the multicast path. On the way, a Path message stores the necessary information for the receivers. A Path message is sent in a multicast environment; a new message is created when the path diverges.
Resv messages: After a receiver has received a Path message, it sends a Resv message. The Resv message travels toward the sender (upstream) and makes a resource reservation on the routers that support RSVP.

116

Reservation merging
Reservation styles

Network Layer 117

Differentiated Services
An Alternative to Integrated Services
Differentiated Services is a class-based QoS model designed for IP.
DS field
Traffic conditioner

118
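As a closing sketch, the token bucket shaper from the traffic-shaping section earlier can be modeled in a few lines. The token rate, bucket capacity, and burst sizes are arbitrary example values, and this toy model ignores the leaky bucket that would normally follow it:

```python
def token_bucket_send(tokens, capacity, rate, idle_ticks, burst):
    """Token bucket, per the earlier description: an idle host accumulates
    credit (tokens, capped at the bucket capacity), so a later burst can be
    sent up to the accumulated credit; the rest must wait for more tokens."""
    tokens = min(capacity, tokens + rate * idle_ticks)  # accumulate while idle
    sent = min(burst, tokens)                           # spend tokens on the burst
    return sent, tokens - sent

# Idle for 5 ticks at 2 tokens/tick with capacity 8, then a 10-unit burst:
sent, left = token_bucket_send(0, 8, 2, 5, 10)
# sent == 8: the idle credit allows a burst, but only up to the bucket
# capacity; the remaining 2 units wait, which is how the token bucket
# permits bursts yet still regulates the maximum
```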