Computer Networks: An Open Source Approach
Chapter 4: Internet Protocol Layer
Ying-Dar Lin, Ren-Hung Hwang, Fred Baker

Content
- 4.1 General Issues
- 4.2 Data-Plane Protocols: IPv4
- 4.3 Data-Plane Protocols: IPv6
- 4.4 Control-Plane Protocols: Address Management
- 4.5 Control-Plane Protocols: Error Reporting
- 4.6 Control-Plane Protocols: Routing
- 4.7 Control-Plane Protocols: Multicast Routing
- 4.8 Summary

Protocols Discussed in this Chapter
[Figure: a host, a router, a DHCP server, and a NAT server. The host stack shows TCP/UDP, ICMP, IP, ARP, and the data link, configured with an IP address, subnet, and default router; the router stack shows routing protocols, the routing table, IP, NAT, and the data link.]

Open Source Implementation 4.1: IP-Layer Packet Flows in Call Graphs
[Figure: call graph across the transport layer, the IP layer, and the data link layer (Medium Access Control). Functions shown:]
- Transport layer, sending (Raw IP, UDP, TCP): ip_append_data, ip_append_page, ip_push_pending_frames, ip_queue_xmit
- Transport layer, receiving (Raw IP, UDP, TCP): raw_v4_input, udp_v4_rcv, tcp_v4_rcv
- IP layer, sending: ip_route_output_flow, __ip_route_output_key, ip_route_output_slow, dst_output (skb->dst->output), ip_output, ip_finish_output, ip_finish_output2
- IP layer, receiving: ip_rcv, ip_rcv_finish, ip_route_input, dst_input (skb->dst->input), ip_local_deliver, ip_local_deliver_finish
- Data link layer: netif_receive_skb, net_rx_action, net_tx_action, dev_queue_xmit

4.1 General Issues
- Service
- Addressing
- Forwarding
- Routing
- Security

Service
- Provides a host-to-host transmission service
- Connects several LANs into an internetwork (a network of networks)
- "Internet": the global internetwork to which most networks are connected

Internetwork
[Figure: an example internetwork in which hosts H1, H2, H3 and routers R1, R2, R3 are connected by Ethernet, Fast Ethernet, Gigabit Ethernet, and wireless LAN links.]

Internet Service Model
- Connectionless
- Best-effort delivery
  - packets may be lost
  - packets may be delivered out of order
  - duplicate copies of a packet may be delivered
  - packets can be delayed for a long time
- Next-hop forwarding based on destination address

Address
- A globally unique address for host identification
- Data link layer: a flat address
- Network layer: a hierarchical address

Deliver a Packet
- How to deliver a packet?
- Routing
  - Find a path from source to destination
  - Done by routing protocols
- Forwarding
  - Forward packets at a router
  - Look up the next hop in the routing table and then forward

Forwarding at Data Plane
- Steps
  - Extract the destination address
  - Look up the destination address in the routing table
  - Obtain the output interface from the routing table
  - Forward the packet

Look Up the Routing Table
- Issues
  - Speed and memory requirement
  - A good data structure gives fast lookup and table update with a low memory requirement
- Classical approaches
  - Trie
  - Hash
  - Fast lookup table
  - Hardware implementation
- Example: a trie with prefixes {00*, 010*, 11*, 0001*, 001*, 10100*, 111*} [figure not reproduced]

Routing at Control Plane
- Task of routing: select a path from the source to the destination
- Goals of routing
  - Efficient (low delay, high throughput, ...)
  - Scalable
  - Stable
  - Robust
  - Fair

IP Routing
- Hop-by-hop routing (option: source routing)
- Shortest path routing
- Available information: global information vs. local information (e.g., OSPF vs. RIP)
- Information exchange: flooding (broadcast) vs. neighbors only (e.g., OSPF vs. RIP)

Principle in Action: Bridging vs. Routing
- Both can be used for connecting two or more LANs
- Both look up a table for forwarding packets
- Layering
  - A bridge forwards a frame based on the link-layer header, e.g., the destination MAC address
  - A router forwards a packet based on the network-layer header, e.g., the destination IP address
- Table
  - A bridge usually builds a forwarding table through transparent self-learning
  - A router builds a routing table by running a routing protocol explicitly
- Collision domain vs. broadcast domain
  - A bridge is used to separate a collision domain; a router is used to separate a broadcast domain
  - A collision domain refers to a network segment
  - An n-port bridge could separate one collision domain into n collision domains
  - All these collision domains are still under the same broadcast domain unless VLANs are created
  - All nodes in a broadcast domain can communicate with each other by broadcast at the link layer
  - An n-port router could separate one broadcast domain into n broadcast domains
- Scalability
  - Bridging is less scalable than routing because of the broadcast requirement
  - If millions of hosts are bridged together, it will be very difficult, if not impossible, to deliver a broadcast message to all hosts
  - When a MAC address has not been learned into the forwarding table, flooding is used to forward the frame

Multicast
- Definition of a multicast
  - Communication between a group of hosts
  - Packets are sent to all group members
- Issues
  - Group membership: the receivers of a multicast session
  - Multicast tree construction: multiple point-to-point connections or a multicast tree
  - A multicast tree connects the source node to all destination nodes

Security of IP
- Aspects of network security
  - Access control: control who has the right to access
  - Data security: encrypt the messages transmitted
  - Intrusion detection: detect illegal break-ins

Data-Plane Protocols and Mechanisms
- 4.2 Internet Protocol
- 4.3 Internet Protocol Version 6

4.2 Internet Protocol
- Addressing
- Subnetting
- Forwarding
- Packet format
- Fragmentation and re-assembly

IP Address
- A globally unique 32-bit address that identifies a network interface
- A hierarchical address consisting of a network id and a host id
- A router usually has more than one interface and therefore more than one address
- A host may have more than one address

IP Address Notation
- 140.123.1.1 = 10001100 01111011 00000001 00000001 (140, 123, 1, 1)

Transmission Order
[Figure: byte and bit order for the address 140.123.1.1.]
- Byte order stored in memory at addresses A to A+3
  - Big endian: 10001100 01111011 00000001 00000001
  - Little endian: 00000001 00000001 01111011 10001100
- Byte order transmitted from the network layer to the data link layer: shown for both big-endian and little-endian hosts
- Bit order transmitted from Ethernet to the physical layer: shown for the byte 10001100

Class-ful IP Address
Class  Leading bits  Fields             Address range
A      0             Network, Host      0.0.0.0 to 127.255.255.255
B      10            Network, Host      128.0.0.0 to 191.255.255.255
C      110           Network, Host      192.0.0.0 to 223.255.255.255
D      1110          Multicast address  224.0.0.0 to 239.255.255.255
E      1111          Reserved           240.0.0.0 to 255.255.255.255
- IANA IPv4 Address Space Registry: http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xhtml

Reserved IP Addresses
- Host id = 0: denotes the network itself
- Host id = all ones (F...F): broadcast address of the network

IP Subnet
- A network address uniquely identifies a physical network
- A physical network consists of several LANs
- A subnet mask is used to identify a subnet
- Hosts in the same IP subnet talk directly without an intervening router
- For example, cs.ccu.edu.tw: 140.123.101.0 with subnet mask 255.255.255.0, or 140.123.101.0/24
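To make the role of the subnet mask concrete, the following is a minimal Python sketch (not taken from the slides) that derives the network address from an address and mask and checks whether two hosts share a subnet. The helper names `network_address` and `same_subnet` and the host addresses are hypothetical; only the 140.123.101.0/24 subnet comes from the example above.

```python
import ipaddress


def network_address(ip: str, mask: str) -> str:
    """Return the network address obtained by ANDing an address with its mask."""
    value = int(ipaddress.IPv4Address(ip)) & int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(value))


def same_subnet(host_ip: str, dest_ip: str, mask: str) -> bool:
    """Return True when both addresses have the same network prefix under mask."""
    return network_address(host_ip, mask) == network_address(dest_ip, mask)


# Hypothetical hosts inside and outside the cs.ccu.edu.tw subnet.
print(network_address("140.123.101.7", "255.255.255.0"))
print(same_subnet("140.123.101.7", "140.123.101.200", "255.255.255.0"))
print(same_subnet("140.123.101.7", "140.123.102.1", "255.255.255.0"))
```

Hosts for which `same_subnet` is true can exchange packets directly; any other destination has to go through a router.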

IP Subnet Addressing
[Figure: in subnet addressing, the host field of a class A, B, or C address is divided into a subnet field and a host field; the network field is unchanged.]
Copyright reserved 2001 (Lin & Hwang)

IP Subnet
[Figure: four subnets (140.123.1.0, 140.123.250.0, 140.123.2.0, 140.123.3.0) joined by routers R1 (140.123.1.250, 140.123.250.1), R2 (140.123.250.2, 140.123.2.250), and R3 (140.123.250.3, 140.123.3.250). Hosts H1 to H5 use the addresses 140.123.1.1, 140.123.1.2, 140.123.2.1, 140.123.2.2, and 140.123.3.1.]

Classless IP Address
- Classful addressing
  - Inefficient use of address space: a class B address is too large, a class C address is too small
  - Scalability: too many class C routing entries
- CIDR: Classless Inter-Domain Routing
  - The network portion of the address has arbitrary length
  - Address format: a.b.c.d/x, where x is the number of bits in the network portion

Authority
- ICANN: Internet Corporation for Assigned Names and Numbers
  - allocates addresses
  - manages DNS
  - assigns domain names, resolves disputes

IP Forwarding
- Aspects of forwarding
  - Packets from upper layer protocols
  - Packets from a network interface
- Routing table
  - Forwarding is based on the routing table
  - Routing entry: (Destination/SubnetMask, NextHop)
  - Default router: (0.0.0.0/0, default router)

Packet Forwarding (at Host)
    If (network address of the destination == my subnet address) then
        Transmit the packet directly to the destination
    Else
        Look up the routing table
        Deliver the packet to the default router
    End if
- Check whether the destination is in my subnet: ((HostIP ^ DestinationIP) & SubnetMask) == 0

Packet Forwarding (at Router)
    Look up the routing table
    If the packet is to be delivered to the upper layer
        Deliver the packet to an upper layer protocol
    Else if the packet is to be delivered to a directly connected subnet
        Deliver the packet directly to the destination
    Else
        Deliver the packet to a next hop router
    End if

Table Look Up
- Longest prefix match
  - Organization A: 194.24.0.0/21
  - Organization B: 194.24.7.0/24
  - 194.24.7.10 matches 194.24.0.0/21 (21 bits) as well as 194.24.7.0/24 (24 bits)
  - The longest prefix wins: 194.24.7.0/24 is the right routing entry

Open Source Implementation 4.2: IPv4 Packet Forwarding
- Search the routing cache first; if the route is not found there, search the routing table (FIB)
- Flow: ip_route_output() calls ip_route_output_key(); if the entry is found in the cache, return; otherwise call ip_route_output_slow()
- Routing cache [figure]: rt_hash_table is an array of chains; each chain links rtable entries through u.rt_next
- Routing table (FIB) [figure]: fib_table (tb_data) points to fn_hash, which holds fn_zones[0] to fn_zones[32] and fn_zone_list; each fn_zone has fz_next and fz_hash[..]; hash buckets chain fib_node entries (fn_next, fn_info); fn_info points to fib_info, whose fib_nh holds nh_dev and nh_gw

IP Packet Format (1/5)
Bits 0-31 of each 32-bit word:
- Word 1: Version (4 bits), Header Length (4 bits), Type of Service (8 bits), Packet Length in bytes (16 bits)
- Word 2: Identifier (16 bits), Flags (3 bits), Fragmentation Offset (13 bits)
- Word 3: Time-to-Live (8 bits), Upper Layer Protocol (8 bits), Header Checksum (16 bits)
- Word 4: Source IP Address
- Word 5: Destination IP Address
- Options (variable), followed by Data

IP Packet Format (2/5)
- Version Number
  - Current version is 4
  - Version for next-generation IP is 6
- Header Length
  - In units of 4-byte words
- Type of Service (TOS)
  - Desired service of the packet

IP TOS
- The Type of Service byte consists of a 3-bit Precedence field, a 4-bit TOS field, and a reserved bit (R)
- New: used as the DS codepoint
- Precedence, defined in RFC 791 (partially implemented)
  - 111: Network control
  - 110: Internetwork control
  - 101: CRITIC/ECP
  - 100: Flash override
  - 011: Flash
  - 010: Immediate
  - 001: Priority
  - 000: Routine
- TOS, defined in RFC 1349 (not implemented)
  - 1000: minimize delay
  - 0100: maximize throughput
  - 0010: maximize reliability
  - 0001: minimize cost
  - 0000: normal service
  - 1111: maximize security
- R: reserved

IP Packet Format (3/5)
- Packet Length
  - Total number of bytes (header + data)
  - Maximum is 65,535 bytes
- Identifier
  - Uniquely identifies an IP packet
- Flags
  - Low-order two bits are used for fragmentation control
  - First of the two bits: do not fragment
  - Last bit: more fragments

IP Packet Format (4/5)
- Fragmentation Offset
  - Position of the fragment, measured in units of 8 bytes
- Time-to-Live (TTL)
  - Used as a hop limit
  - Each router decreases TTL by one
  - If TTL reaches zero, the router discards the packet and sends an ICMP message
- Upper Layer Protocol
  - IP: 0, ICMP: 1, TCP: 6, UDP: 17

IP Packet Format (5/5)
- Header Checksum
  - 16-bit 1's complement checksum of the IP header and IP options
- Source Address (32 bits)
- Destination Address (32 bits)
- Options
  - loose source routing, strict source routing, record route, record timestamp
- Data
  - Payload from upper layers

Open Source Implementation 4.3: IPv4 Checksum in Assembly
- ip_fast_csum() function (src/include/asm_i386/checksum.h)
- Optimized by writing the function in assembly language
- For 80x86 machines
  - Do the summation in 32-bit words first
  - Copy the result to another register
  - Shift the registers so that each holds 16 bits in its low-order bits
  - Add up the registers
  - Take the complement of the result to obtain the checksum

IP Fragmentation & Reassembly
- Limitation from data link layers: the MTU (different link layers have different MTUs)
- An IP packet larger than the MTU of its data link layer needs to be "fragmented": one packet becomes several smaller packets
- Fragments are re-assembled only at the destination
[Figure: an IP packet that cannot get through a link layer is split into IP fragments that can.]
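The arithmetic behind fragmentation can be sketched in a few lines of Python. This is an illustration only, not the kernel's `ip_fragment()`: the function name `fragment`, the 20-byte header assumption (no IP options), and the sample sizes are all assumptions made for the example. It relies on two facts stated above: the offset is counted in units of 8 bytes, and the "more" flag is set on every fragment except the last.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Split payload_len bytes of IP data into fragments that fit within mtu.

    Returns a list of (offset_in_8_byte_units, more_flag, data_len) tuples.
    header_len defaults to 20 bytes, i.e. an IPv4 header without options.
    """
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units.
    max_data = ((mtu - header_len) // 8) * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")

    fragments = []
    sent = 0
    while sent < payload_len:
        size = min(max_data, payload_len - sent)
        more = 1 if sent + size < payload_len else 0
        fragments.append((sent // 8, more, size))
        sent += size
    return fragments


# Hypothetical case: 3200 bytes of data crossing a link with a 1500-byte MTU.
for offset, more, size in fragment(3200, 1500):
    print(f"offset={offset} more={more} data={size} bytes")
```

With these inputs each full fragment carries 1480 bytes (1500 minus the 20-byte header, already a multiple of 8), so the offsets advance in steps of 185.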

Fragment Control
- Identify the fragments of a packet
  - All fragments have the same identifier
- Know the position of a fragment
  - Recorded in the fragmentation offset (13 bits)
- Know the end of a packet
  - The more-fragments bit of the last fragment is 0

IP Fragmentation Example
- (a) Original packet
  - Header id=x, more=0, offset=0; 3200 bytes of data
- (b) Fragments
  - Header id=x, more=1, offset=0; 1480 bytes of data
  - Header id=x, more=1, offset=185; 1480 bytes of data
  - Header id=x, more=0, offset=370; 240 bytes of data

Open Source Implementation 4.4: IPv4 Fragmentation
- The upper layer protocol calls ip_queue_xmit()
- After routing is determined, ip_queue_xmit2() is called
- ip_queue_xmit2() calls ip_fragment() if the packet length is larger than the MTU of the device
- ip_fragment()
  - A while loop splits the original packet into fragments
  - The size (in bytes) of each fragment, except the last one, is set to the largest multiple of 8 that fits within the MTU

Open Source Implementation 4.4 (cont): Re-Assembly
- Receive path: net_bh(), ip_rcv(), ip_route_input(), ip_local_deliver()
- In ip_local_deliver(): if the more bit or the offset is set, call ip_defrag(); otherwise call ip_local_deliver_finish()
- In ip_defrag(): call ip_find() and ip_frag_queue(); when all fragments are in, call ip_frag_reasm()
- In ip_find(): compute ipqhashfn(); if the queue is found in the hash table, return it; otherwise call ip_frag_create()

Network Address Translation
Why NAT?
- Solution to IP address depletion
- Private IP addresses (RFC 1597, later replaced by RFC 1918)
  - 10.0.0.0-10.255.255.255
  - 172.16.0.0-172.31.255.255
  - 192.168.0.0-192.168.255.255
- Network address translation (RFC 3022)
  - Allows hosts with private IP addresses to have Internet access
  - Short-term solution for IP address depletion
  - Also provides security for Intranet services

NAT Example
- NAT table
  - 10.2.2.2 ==> 140.123.101.30
  - 10.2.2.3:1175 ==> 140.123.101.30:6175
- Packets through the router with NAT
  - Src 10.2.2.2:1064, Dst 140.113.250.5:80 becomes Src 140.123.101.30:1064, Dst 140.113.250.5:80
  - Src 10.2.2.3:1175, Dst 140.113.54.100:21 becomes Src 140.123.101.30:6175, Dst 140.113.54.100:21

Types of NAT (1/2)
- NAT with a pool of global IP addresses
  - 10.2.2.2 ==> 140.123.101.30
  - 10.2.2.3 ==> 140.123.101.31
  - dynamic: translate IP addresses on demand
  - static: translate IP addresses with pre-configuration
- NAT with Port Address Translation (NAPT) using one global IP address
  - 10.2.2.2:1064 ==> 140.123.101.30:5064
  - 10.2.2.3:1175 ==> 140.123.101.30:6175

Types of NAT (2/2)
- Port redirection
  - Redirect all WWW service to a specific private IP address and port number
  - DNS: www.cs.ccu.edu.tw ==> 140.123.101.38
  - NAT: 140.123.101.38:80 ==> 10.2.2.2:8080
- Transparent proxy
  - Force all WWW traffic through a proxy with a cache
  - 140.123.101.38:80 ==> internal WWW proxy (10.1.1.1)
  - All HTTP requests go to the internal proxy

Problems with NAT (1/2)
- Modify the source IP address and/or port number
  - Modify the IP header checksum
  - Modify the TCP checksum
- Application-dependent modification
  - ICMP
    - Basic NAT: ICMP checksum, query id (echo)
    - NAPT: ICMP packets that may contain an IP address: destination unreachable (3), source quench (4), redirect (5), time exceeded (11), IP header error (12)

Problems with NAT (2/2)
- Application Level Gateways (ALGs)
  - FTP: the PORT/PASV command carries IP address:port in ASCII
  - Translating the IP address may change the packet size
    - If the new size is shorter, pad with zeroes
    - If the new size is longer, the TCP sequence number must change
    - This affects acknowledgment, congestion control, ...
    - A special table is used to correct the TCP sequence and acknowledgment numbers
  - Others: SMTP, SNMP, ...

Open Source Implementation 4.5: NAT
- Source and destination NAT implementation in Linux iptables
[Figure: packet path through the hooks. From interface -> PRE_ROUTING (destination NAT) -> routing decision -> POST_ROUTING (source NAT) -> to interface. Locally generated packets from the upper layer (TCP/UDP) enter at LOCAL_OUT (destination NAT).]
- Data structure
  - Hash table: ip_conntrack_hash[]
  - Hash function: hash_conntrack()
  - Linear search within a hashed list
- Connection tracking functions: do_masquerade(), ip_conntrack_in(), resolve_normal_ct(), ip_conntrack_find_get()
- NAT function flows: ip_nat_out(), ip_nat_localout(), do_bindings(), manip_pkt(), upper_layer_protocol->manip_pkt()
- FTP ALG function flows: do_bindings(), helper->help(), ip_nat_seq_adjust(), ip_nat_resize_packet(), ftp_data_fixup(), mangle_rfc959_packet(), ip_nat_mangle_tcp_packet()

4.3 Internet Protocol Version 6
- Changes from IPv4
- IPv6 Header
- IPv6 Extension Header
- IPv6 Fragmentation and Reassembly
- IPv6 Address Space

IPv6
- Problems with IPv4
  - Shortage of address space
  - Lack of Quality of Service guarantees
- New features of IPv6
  - Enlarged address space
  - Fixed header format helps speed up processing and forwarding
  - Better support for Quality of Service
  - Auto-configuration
  - New "anycast" address: route to the "best" of several replicated servers

IPv6 Header (1/2)
Bits 0-31 of each 32-bit word:
- Word 1: Version (4 bits), Traffic Class (8 bits), Flow Label (20 bits)
- Word 2: Payload Length (16 bits), Next Header (8 bits), Hop Limit (8 bits)
- Source Address (16 octets)
- Destination Address (16 octets)
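The fixed 40-byte layout above is easy to decode with plain bit operations. The following is a minimal Python sketch under stated assumptions: the function name `parse_ipv6_header`, the dictionary keys, and the sample packet values are all hypothetical and chosen only for illustration.

```python
import ipaddress
import struct


def parse_ipv6_header(data: bytes) -> dict:
    """Decode the fixed 40-byte IPv6 header at the start of data."""
    if len(data) < 40:
        raise ValueError("an IPv6 header needs at least 40 bytes")
    # First 32-bit word: version (4 bits), traffic class (8), flow label (20).
    first_word, payload_len, next_header, hop_limit = struct.unpack(
        "!IHBB", data[:8]
    )
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": str(ipaddress.IPv6Address(data[8:24])),
        "destination": str(ipaddress.IPv6Address(data[24:40])),
    }


# Hypothetical header: version 6, flow label 0x12345, TCP (6) as next header.
sample = (
    struct.pack("!IHBB", (6 << 28) | 0x12345, 20, 6, 64)
    + ipaddress.IPv6Address("::1").packed
    + ipaddress.IPv6Address("::2").packed
)
print(parse_ipv6_header(sample))
```

Because every field sits at a fixed position, the parser needs no length field or option loop, which is the processing advantage the fixed header format is meant to provide.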

IPv6 Header (2/2)
- Version: 6
- Traffic Class: identifies the class of service, e.g., DiffServ (DS codepoint)
- Flow Label: identifies datagrams in the same "flow"
- Next Header: identifies the upper layer protocol for the data

IPv4 and IPv6 Header Comparison
[Figure: the IPv4 and IPv6 headers side by side, with a legend marking fields whose names are kept from IPv4 to IPv6, fields not kept in IPv6, fields whose name and position changed in IPv6, and fields new in IPv6.]
- IPv4 header fields: Version, IHL, Type of Service, Total Length, Identification, Flags, Fragment Offset, Time to Live, Protocol, Header Checksum, Source Address, Destination Address, Options, Padding
- IPv6 header fields: Version, Traffic Class, Flow Label, Payload Length, Next Header, Hop Limit, Source Address, Destination Address

Changes from IPv4 (1/3)
[Figure: the IPv4 packet format and the IPv6 header format shown together for comparison.]

Changes from IPv4 (2/3)
- Expanded addressing capabilities
  - From 32 bits to 128 bits (more levels and nodes)
  - Improved multicast routing ("scope" field)
  - "Anycast address": send a packet to any one of a group of nodes
- Header format simplification
  - Reduces bandwidth cost
- Extensions
  - More flexibility

Changes from IPv4 (3/3)
- Options
  - Allowed, but outside of the header, indicated by the "Next Header" field
- Checksum
  - Removed to reduce processing at routers
- Fragmentation
  - Not allowed at intermediate routers

IPv6 Extension Header Examples
- (a) No extension header: IPv6 header (Next Header = TCP), TCP header, data
- (b) IPv6 header followed by a routing header: IPv6 header (Next Header = Routing), routing header (Next Header = TCP), TCP header, data
- (c) IPv6 header followed by a routing header and a fragment header: IPv6 header (Next Header = Routing), routing header (Next Header = Fragment), fragment header (Next Header = TCP), TCP header, data

IPv6 Extension Header (1/2)
- Order of extension headers (Next Header value in parentheses)
  - IPv6 (41)
  - Hop-by-Hop Options header (0)
  - Destination Options header (60)
  - Routing header (43)
  - Fragment header (44)
  - Authentication header (51)
  - Encapsulating Security Payload header (50)
  - Destination Options header (60)
  - Upper-layer header: ICMPv6 (58), TCP (6), UDP (17), RSVP (46), SCTP (132)

IPv6 Extension Header (2/2)
- Not processed by intermediate routers, except for the Hop-by-Hop Options header
- Processed strictly in order
- Each extension header occurs at most once, except the Destination Options header, which occurs at most twice

Fragment Header
- Fragmentation is performed only by the source
- Fragment header format (bits 0-31)
  - Word 1: Next Header (8 bits), Reserved (8 bits), Fragment Offset (13 bits), R (2 reserved bits), M (1 bit, more fragments)
  - Word 2: Identifier

Fragmentation Example
- (a) Original packet: IPv6 header, fragment 1 data, fragment 2 data, fragment 3 data
- (b) Fragments
  - IPv6 header, fragment header, fragment 1 data
  - IPv6 header, fragment header, fragment 2 data
  - IPv6 header, fragment header, fragment 3 data

Packet Size Issue
- The MTU of every link must be at least 1280 bytes
- Use Path MTU Discovery to discover an MTU greater than 1280 bytes
- A node needs to accept a fragmented packet that is as large as 1500 octets

IPv6 Addressing
- Three categories
  - Unicast
  - Multicast
  - Anycast
- Notation
  - 16-bit hexadecimal numbers separated by colons: 3FFD:3600:0000:0000:0302:B3FF:FE3C:C0DB
  - Consecutive null 16-bit numbers are replaced by "::": 3FFD:3600:0:0:0:0:1:A => 3FFD:3600::1:A

IPv6 Address Assignment
Prefix         Address type                          Portion
0000 0000      Reserved (IPv4 compatibility)         1/256
0000 0001      Unassigned                            1/256
0000 001       Reserved for NSAP                     1/128
0000 010       Reserved for IPX                      1/128
0000 011       Unassigned                            1/128
0000 1         Unassigned                            1/32
0001           Unassigned                            1/16
001            Aggregatable Global Unicast Address   1/8
010            Unassigned                            1/8
011            Unassigned                            1/8
100            Unassigned                            1/8
101            Unassigned                            1/8
110            Unassigned                            1/8
1110           Unassigned                            1/16
1111 0         Unassigned                            1/32
1111 10        Unassigned                            1/64
1111 110       Unassigned                            1/128
1111 1110 0    Unassigned                            1/512
1111 1110 10   Link Local Unicast Address            1/1024
1111 1110 11   Site Local Unicast Address            1/1024
1111 1111      Multicast Address                     1/256

IPv6 Unicast Address (1/2)
- Unicast address without internal structure: node address
- Unicast address with subnet: subnet prefix followed by interface ID
- Unicast unspecified address: all zeros (0:0:0:0:0:0:0:0)
- Unicast loopback address: 0:0:0:0:0:0:0:1

IPv6 Unicast Address (2/2)
- IPv4-compatible IPv6 address: 96 zero bits followed by the 32-bit IPv4 address, e.g., ::8C7B:65A0
- IPv4-mapped IPv6 address: 80 zero bits, 16 bits of FFFF, then the 32-bit IPv4 address, e.g., ::FFFF:8C7B:65A0
- NSAP addresses: prefix 0000 001, remainder defined according to usage requirements
- IPX addresses: prefix 0000 010, remainder to be defined

Aggregatable Global Unicast Address (RFC 2374)
Field          Bits   Meaning
FP             3      Format Prefix (001)
TLA ID         13     Top-Level Aggregation Identifier (8192 values)
RES            8      Reserved
NLA ID         24     Next-Level Aggregation Identifier
SLA ID         16     Site-Level Aggregation Identifier
Interface ID   64     Interface Identifier
- Current policy: Registry /23, ISP /35, Site /48

Interface ID: EUI-64 (RFC 2464)
- Addresses with prefixes in the range 001 to 111 should use the EUI-64 format for the interface ID
- For a 48-bit MAC address
  - 0xff-fe is inserted between the 3rd and 4th bytes
  - The universal/local bit (the second low-order bit of the first byte) is complemented
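The two EUI-64 rules translate directly into byte manipulation. Below is a small Python sketch of that conversion; the function names `mac_to_eui64` and `link_local_from_mac` and the MAC address 00-11-22-33-44-55 are hypothetical, chosen only to illustrate the rules, and the link-local form assumes the FE80::/64 prefix.

```python
import ipaddress


def mac_to_eui64(mac: str) -> bytes:
    """Convert a 48-bit MAC address (hyphen or colon separated) to an EUI-64 ID."""
    octets = bytes(int(part, 16) for part in mac.replace(":", "-").split("-"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    # Complement the universal/local bit: second low-order bit of the first byte.
    first = octets[0] ^ 0x02
    # Insert 0xff-fe between the 3rd and 4th bytes.
    return bytes([first]) + octets[1:3] + b"\xff\xfe" + octets[3:]


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build a link-local address: FE80::/64 prefix plus the EUI-64 interface ID."""
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + mac_to_eui64(mac))


# Hypothetical MAC address used only for illustration.
print(mac_to_eui64("00-11-22-33-44-55").hex())
print(link_local_from_mac("00-11-22-33-44-55"))
```

Since the interface ID is a deterministic function of the hardware address, the same 64 bits appear in every address the interface forms, whatever the prefix.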
- Example
  - MAC: 00-02-b3-1e-83-29
  - EUI-64 ID: 02-02-b3-ff-fe-1e-83-29
  - Link local: FE80::202:b3ff:fe1e:8329
- Privacy concern: a host can be traced from its IPv6 address

Current Address Allocations
- APNIC: 2001:0200::/23, 2001:0C00::/23, ..., 2400:0000::/12 (http://www.apnic.net/faq/IPv6-FAQ.html)
- ARIN: 2001:0400::/23, ..., 2600:0000::/12 (http://www.arin.net/library/guidelines/ipv6_initial.html)
- RIPE NCC: 2001:0600::/23, 2001:0800::/23, 2A00:0000::/12 (http://www.ripe.net/ripencc/mem-services/registeration/ipv6.html)
- LACNIC: 2001:1200::/23, ..., 2800:0000::/12
- 6to4 tunnels: 2002::/16
- http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xml

IPv6 Multicast Address (1/2)
- Format [figure not reproduced]
- flag: 00PT
  - T = 0: well-known multicast address
  - T = 1: transient multicast address
  - P = 0: address not assigned based on a prefix
  - P = 1: address assigned based on a prefix
- Plen: length of the network prefix
- Prefix: up to 64 bits
- scope: scope of the multicast group
  - 0000: reserved
  - 0001: node-local scope
  - 0010: link-local scope
  - 0101: site-local scope
  - 1000: organization-local scope
  - 1110: global scope

IPv6 Multicast Address (2/2)
- Node-local scope
  - FF01:0:0:0:0:0:0:1: All Nodes Address
  - FF01:0:0:0:0:0:0:2: All Routers Address
- Link-local scope
  - FF02:0:0:0:0:0:0:1: All Nodes Address
  - FF02:0:0:0:0:0:0:2: All Routers Address
  - FF02:0:0:0:0:1:FFxx:xxxx: Solicited Node Address (for the unicast address 4037::01:800:200E:8C6C it is FF02::1:FF0E:8C6C)
- Site-local scope
  - FF05:0:0:0:0:0:0:2: All Routers Address
  - FF05:0:0:0:0:0:1:3: All DHCP Servers

Transition From IPv4 To IPv6
- Not all routers can be upgraded simultaneously
- How will the network operate with mixed IPv4 and IPv6 routers?
- Transition assumptions
  - No "Flag Day"
  - Transition will be incremental, possibly over several years
  - Transparent to end users
  - The last Internet transition was in 1983 (NCP to TCP)
- Seamless transition from IPv4 to IPv6
  - IPv6 is designed with transition in mind
  - Assumption of IPv4/IPv6 coexistence

Transition Approaches
- Dual stacks
  - Allow IPv4 and IPv6 to co-exist on one device
- Tunnels
  - For tunneling IPv6 across IPv4 clouds
  - Manually configured tunnel
  - Automatic tunnel
  - Encapsulate IPv6 packets in IPv4 packets (protocol ID = 41)
  - Relies on some special IPv6 addresses
- Translators
  - An IPv6-only device communicates with an IPv4-only device

Conceptual View of IPv6 Routing Table
[Figure: fib6_table holds tb6_root, the root of a tree of fib6_node entries linked by *parent, *left, and *right; nodes point to rt6_info entries, which point to a neighbor entry.]

FIB6 Data Structure
[Figure: detailed view of the same structures.]
- fib6_table: hlist_node tb6_hlist, fib6_node tb6_root
- fib6_node: fib6_node *parent, fib6_node *left, fib6_node *right, rt6_info *leaf, rt6_info *rr_ptr
- rt6_info: inet6_dev *rt6i_idev, fib6_node *rt6i_node, in6_addr rt6i_gateway, fib6_table *rt6i_table, rt6key dst, rt6key src

Chapter 4: Internet Protocol Layer, Part II

Control Plane Mechanisms
- Address management
  - Address resolution
  - Address configuration
- Error reporting
  - Internet Control Message Protocol
- Routing
  - Intra-domain routing
  - Inter-domain routing
  - Multicast

4.4 Address Management
- Address resolution: Address Resolution Protocol (ARP)
- Address configuration

Address Resolution
- What is address resolution
  - Translates an address between different layers
  - For example, host name to IP address, or IP address to Ethernet address
- Why address resolution
  - MAC address vs. IP address

Address Resolution Protocol
- Protocol operation
  - The source node broadcasts an ARP request packet on the IP subnet
  - All nodes on the subnet receive the ARP request, but only the target node (or some designated server) answers, sending an ARP reply packet via unicast
  - The source node receives the reply and obtains the MAC address of the target node
  - A cache (with a timer) is used to speed up resolution

ARP Packet Format (1/3)
[Figure: ARP packet layout, not reproduced.]

ARP Packet Format (2/3)
- HARDWARE ADDRESS TYPE: link type, Ethernet = 0x0001
- PROTOCOL ADDRESS TYPE: upper layer protocol identifier, IP = 0x0800
- HADDR LEN: length of the link layer address, Ethernet = 6
- PADDR LEN: length of the network layer address, IP = 4

ARP Packet Format (3/3)
- OPERATION: operation code; ARP request = 1, ARP reply = 2, RARP request = 3, RARP reply = 4
- SENDER HADDR: sender link layer address
- SENDER PADDR: sender network layer address
- TARGET HADDR: target link layer address, filled with zeros if unknown
- TARGET PADDR: target network layer address

Encapsulate ARP Packet into MAC Frame
- Protocol id: 0x0806
- Destination address of an ARP request packet: 0xFFFFFFFFFFFF

Reverse ARP (RARP)
- Allows a diskless workstation to discover its IP address
- Needs a RARP server on each network
- BOOTP: uses UDP messages, which are forwarded over routers to find the file server that holds the mapping

Open Source Implementation 4.6: ARP
- Data structure
  - Hash table: arp_table
  - Hash parameters: a primary key and the device interface index
- Functions
  - arp_send(): sets up the ARP header and then transmits
  - arp_rcv(): deals only with request and reply operations
    Request: calls ip_route_input(); if the packet routes to the local host, calls arp_send() to send out an ARP reply. Otherwise, if the host is an ARP proxy, it also sends an ARP reply.
    Reply: updates the ARP table.
  __neigh_lookup(): calls neigh_lookup() to search the ARP hash table; if no entry is found, creates one
  eth_rebuild_header() (old) or arp_solicit(): calls arp_send()

Address Configuration
  Dynamic Host Configuration Protocol (DHCP)

Address Configuration
What is address configuration
  Automatically and dynamically assign an IP address to a host
Why address configuration
  Setting IP addresses by hand is error prone
  Insufficient IP addresses: share IP addresses among hosts
  Better network management

DHCP Protocol
  Dynamic Host Configuration Protocol
  DHCP is derived from BOOTP
    Some fields are not for host configuration
  Operations
    A host broadcasts a DHCPDISCOVER message
    A DHCP server receives it and replies to it
    Or a DHCP relay server receives it, forwards it to the DHCP server, gets the configuration, and relays it to the host
  DHCP messages are sent over UDP (server port 67, client port 68)

State Diagram for DHCP Client
[Figure: DHCP client state diagram. States: Initial, Offer, Request, Bind, Renew, Rebind. Transition labels: /DHCPDISCOVER, DHCPOFFER, /DHCPREQUEST, DHCPACK, DHCPNAK, lease expires, renewal expires, rebinding expires.]

DHCP Packet Format
  Bit offsets: 0, 8, 16, 24, 31
  Operation | Hard. Type | Hardware Len | Hops
  Transaction ID
  Seconds | B | Flags
  Client IP Address
  Your IP Address
  Server IP Address
  Router IP Address
  Client Hardware Address (16 octets)
  Server Host Name (64 octets)
  Boot File Name (128 octets)
  Options (variable)

DHCP Packet Format
  More information for host configuration, such as the default router and subnet mask, is encoded in the option field (code=55, length, parameter)
  ID   Request Parameter
  1    Subnet mask
  3    Default gateway
  6    DNS server
  12   Host name
  15   Domain name
  17   Root path
  40   NIS domain name

DHCP Packet Format
Options
  The option field starts with three fields: code (53), length (1), type (1-7)
  Type   DHCP Message
  1      DHCPDISCOVER
  2      DHCPOFFER
  3      DHCPREQUEST
  4      DHCPDECLINE
  5      DHCPACK
  6      DHCPNAK
  7      DHCPRELEASE

Open Source Implementation 4.7: DHCP
Functions shown: ip_auto_config(), ic_dynamic(), ic_bootp_send_if(), ic_dhcp_init_options()

    struct bootp_pkt {              /* BOOTP packet format */
        struct iphdr iph;           /* IP header */
        struct udphdr udph;         /* UDP header */
        u8 op;                      /* 1=request, 2=reply */
        u8 htype;                   /* HW address type */
        u8 hlen;                    /* HW address length */
        u8 hops;                    /* Used only by gateways */
        u32 xid;                    /* Transaction ID */
        u16 secs;                   /* Seconds since we started */
        u16 flags;                  /* Just what it says */
        u32 client_ip;              /* Client's IP address if known */
        u32 your_ip;                /* Assigned IP address */
        u32 server_ip;              /* (Next, e.g. NFS) Server's IP address */
        u32 relay_ip;               /* IP address of BOOTP relay */
        u8 hw_addr[16];             /* Client's HW address */
        u8 serv_name[64];           /* Server host name */
        u8 boot_file[128];          /* Name of boot file */
        u8 exten[312];              /* DHCP options / BOOTP vendor extensions */
    };

4.5 Error Reporting
  Internet Control Message Protocol (ICMP)

Error Control Protocol
What is error control protocol
  A protocol for reporting errors or the status of TCP/IP at a remote site (router or host)
Why error control protocol
  For monitoring the status of TCP/IP at each host/router
  For reporting errors between hosts or routers

Internet Control Message Protocol (ICMP)
  ICMP runs over IP
  [Figure: the ICMP header and ICMP data are carried as the IP data, after the IP header.]

ICMPv4 Packet Format
  Type and Code are used to identify an error event
  The data field contains:
    STD 5, RFC 792, "Internet Control Message Protocol": IP header plus the first 64 bits of the packet that elicited the ICMP message
    STD 3, RFC 1122, "Requirements for Internet Hosts - Communication Layers": IP header and at least the first 8 data octets of the datagram that triggered the error (more than 8 octets MAY be sent)
    RFC 1812, "Requirements for IP Version 4 Routers": SHOULD contain as much of the original datagram as possible without the length of the ICMP datagram exceeding 576 bytes
  Bit offsets: 0, 8, 16, 24, 31
  Type | Code | Checksum
  Data

Type and Code
  Type  Code  Description
  0     0     Echo reply (ping)
  3     0     Destination network unreachable
  3     1     Destination host unreachable
  3     2     Destination protocol unreachable
  3     3     Destination port unreachable
  3     4     Fragmentation needed and DF set
  3     5     Source route failed
  3     6     Destination network unknown
  3     7     Destination host unknown
  4     0     Source quench (congestion control)
  5     0     Redirect (destination network)
  5     1     Redirect (host)
  8     0     Echo request (ping)
  9     0     Router advertisement
  10    0     Router solicitation (router discovery)
  11    0     TTL expired
  12    0     Bad IP header

ICMPv4 Examples (1/6)
Echo Request/Reply
  The source sends an echo request (type=8) to a destination; the destination responds with an echo reply (type=0)
  The data received in the Echo Request must be entirely included in the Echo Reply.
  The Identifier and Sequence Number are used by the client to match each reply with the request that caused it.
  ping uses echo request and reply
  Bit offsets: 0, 8, 16, 24, 31
  Type | Code | Checksum
  Identifier | Sequence Number
  Data

ICMPv4 Examples (2/6)
Destination Unreachable (type=3)
  Destination unreachable is used to report various unreachable reasons, such as network, host, or port unreachable.
  However, code 4 of the type 3 message reports that fragmentation is needed at an intermediate router (due to the MTU) but the "do not fragment" bit in the IP header is set.
  Bit offsets: 0, 8, 16, 24, 31
  Type=3 | Code | Checksum
  Empty | Next-hop MTU
  IP header + first 8 bytes of original packet's data

ICMPv4 Examples (3/6)
Source Quench
  When its buffer overflows, a router sends a source quench (type=4) to the source
  Bit offsets: 0, 8, 16, 24, 31
  Type | Code | Checksum
  Unused
  Data
Routing redirect
  If a host forwards a packet to the wrong router, the router sends a redirect ICMP message (type=5, code=0 or 1 for network or host) to the source
  Bit offsets: 0, 8, 16, 24, 31
  Type | Code | Checksum
  Gateway (router) IP address
  Data

ICMPv4 Examples (4/6)
Time Exceeded
  If the TTL is less than or equal to zero after being decremented, the router sends a Time Exceeded (type=11) ICMP message to the source
traceroute implementation
  traceroute sends an ICMP echo request with TTL=1 to the target machine
  When the first router receives the message, it responds with a time exceeded message
  traceroute then sends another echo request with TTL=2
  The message passes the first router but is discarded by the second router, which returns a time exceeded message
  traceroute repeats sending echo requests, raising the TTL by one each time, until it receives an echo reply from the target
machine

ICMPv4 Examples (5/6)
IP header error (type=12)
  Wrong IP header, such as a wrong option field
  Code=0: IP header is invalid
  Code=1: a required option is missing

ICMPv4 Examples (6/6)
  Time Stamp Request/Reply: Type=13/14, code=0
  Information Request/Reply: Type=15/16, code=0
  Address Mask Request/Reply: Type=17/18, code=0

ICMPv6
New type and code
  Type 0..127: error report
    1: Destination unreachable
    2: Packet too big
    3: Time Exceeded
    4: Parameter problem
  Type 128..255: informational
    128, 129: Echo request and reply (RFC 2463)
    130, 131, 132: Multicast group membership management (RFC 2710)
    133, 134: Router solicitation and advertisement (RFC 2461)
    135, 136: Neighbor solicitation and advertisement (RFC 2461)
    137: Redirect (RFC 2461)
    138: Router renumbering (RFC 2894)
    139, 140: Node information query/response (draft, name-lookups)
    141, 142: Inverse ND solicitation/advertisement message (RFC 3122)
    150, 151: Home agent address discovery request/reply (draft)
    152, 153: Mobile prefix solicitation/advertisement

ICMPv6
  Type  Code  Description
  1     0     No route to destination
  1     1     Communication with destination administratively prohibited
  1     3     Address unreachable
  1     4     Port unreachable
  2     0     Packet too big
  3     0     Hop limit exceeded in transit
  3     1     Fragment reassembly time exceeded
  4     0     Erroneous header field encountered
  4     1     Unrecognized Next Header type
  4     2     Unrecognized IPv6 option encountered
  128   0     Echo request
  129   0     Echo reply
  130   0     Multicast Listener Query
  131   0     Multicast Listener Report
  132   0     Multicast Listener Done
  133   0     Router Solicitation
  134   0     Router Advertisement
  135   0     Neighbor Solicitation
  136   0     Neighbor Advertisement
  137   0     Redirect

Open Source Implementation 4.8: ICMP
Data structure
  ICMP header: struct icmphdr
Error when forwarding IP packets
  ip_forward() calls icmp_send() on:
    TTL<=1
    Strict source routing failure
    Route redirect
Error when receiving IP packets
  ip_route_input_slow(), ip_error(), icmp_send(): destination unreachable

Open Source Implementation 4.8 (cont)
Receiving ICMP packets
  icmp_rcv() dispatches through icmp_pointers[]
  Control handlers: icmp_pointers[]
    icmp_unreach() for types 3, 4, 11, and 12
    icmp_redirect() for type 5
    icmp_echo() for type 8
    icmp_timestamp() for type 13
    icmp_address() for type 17
    icmp_address_reply() for type 18
    icmp_discard() for other types
ICMPv6
  icmpv6_send()
  icmpv6_rcv()
  icmpv6_echo_reply(), icmpv6_notify()

4.6 Routing
  Principle
  Intra-domain routing
  Inter-domain routing

Routing Principle
  Link State Routing
  Distance Vector Routing

Routing
Task of routing
  Select a path from the source to the destination
Goal of routing
  Efficient (low delay, high throughput, …)
  Scalable
  Stable
  Robust
  Fair

Optimality of IP Routing
  IP uses hop-by-hop routing (forwarding)
    Each router determines its own routing table
  Why will packets be delivered to their destinations along the optimal path?
    If k is an intermediate node on the optimal path from source node s to destination d, then the part of that path from s to k is also the optimal path from s to k
    A shortest path tree can therefore be constructed from a source to the rest of the graph.

Routing Algorithm Classification
Global or decentralized information?
  Link State routing: uses the Dijkstra algorithm
  Distance Vector routing: uses the distributed Bellman-Ford algorithm
Static
  Fixed routing table, set up manually
Dynamic (adaptive)
  Routing table adapts to network status

The Shortest Path Algorithm
  View a network as a graph
    Nodes are routers
    Edges are physical links
      Each link has an associated cost: delay, congestion level, …
  Find the least-cost path
    Depends on the information available

Link-State Routing
Routing information
  Global information is available by reliable broadcasting
  Dynamic: information is exchanged when the topology changes or periodically
Path calculation
  Dijkstra algorithm

Dijkstra Algorithm
    For each v in V-{s} {
        If v is adjacent to s { C(v) = lc(s,v); P(v) = s }
        else C(v) = ∞
    }
    T = {s}
    While (T ≠ V) {
        find w not in T s.t. C(w) is the minimum over all nodes in (V-T)
        T = T ∪ {w}
        For each v in V-T
            If (C(w) + lc(w,v) < C(v)) { C(v) = C(w) + lc(w,v); P(v) = w }
    }

Dijkstra Algorithm Example
[Figure: five-node network A, B, C, D, E with link costs 4, 2, 1, 3, 1, 1, 1; source node A.]
  Iteration  T      C(B),p(B)  C(C),p(C)  C(D),p(D)  C(E),p(E)
  0          A      4,A        1,A        ∞          ∞
  1          AC     3,C                   4,C        2,C
  2          ACE    3,C                   3,E
  3          ACEB                         3,E
  4          ACEBD

Routing Table at Node A
  Destination  Cost  NextHop
  B            3     C
  C            1     C
  D            3     C
  E            2     C

Distance Vector Algorithm
Routing information
  Only local information is known
    Each node knows the status of adjacent links and the routing information of adjacent nodes
  Dynamic: information is exchanged when a link cost or shortest path changes
Path calculation
  Bellman-Ford

Bellman-Ford Algorithm
    While (1) {
        If x received route update message from y {
            For each (Dest, Distance) pair in y's report {
                If (Dest is new) { /* Dest not in routing table */
                    Add a new entry for destination Dest
                    rt(Dest).distance = Distance+lc(x,y)
                    rt(Dest).NextHop = y
                } else if ((Distance+lc(x,y)) < rt(Dest).distance) { /* y reports a shorter distance to Dest */
                    rt(Dest).distance = Distance+lc(x,y)
                    rt(Dest).NextHop = y
                } else if (rt(Dest).NextHop == y) { /* the current next hop reports a changed distance */
                    rt(Dest).distance = Distance+lc(x,y)
                }
            }
            Send update messages to all neighbors if route changes
        }
        Also send update messages to all neighbors periodically
    }

Bellman-Ford Algorithm Example: Step 1
Routing tables, one entry per (destination, cost, next hop):
  A: (B, 4, B) (C, 1, C)
  B: (A, 4, A) (C, 2, C) (D, 1, D)
  C: (A, 1, A) (B, 2, B) (D, 3, D) (E, 1, E)
  D: (B, 1, B) (C, 3, C) (E, 1, E)
  E: (C, 1, C) (D, 1, D)

Bellman-Ford Algorithm Example: Step 2
  A: (B, 3, C) (C, 1, C) (D, 4, C) (E, 2, C)
  B: (A, 3, C) (C, 2, C) (D, 1, D) (E, 2, D)
  C: (A, 1, A) (B, 2, B) (D, 2, E) (E, 1, E)
  D: (A, 4, C) (B, 1, B) (C, 2, E) (E, 1, E)
  E: (A, 2, C) (B, 2, D) (C, 1, C) (D, 1, D)

Bellman-Ford Algorithm Example: Step 3
  A: (B, 3, C) (C, 1, C) (D, 3, C) (E, 2, C)
  B: (A, 3, C) (C, 2, C) (D, 1, D) (E, 2, D)
  C: (A, 1, A) (B, 2, B) (D, 2, E) (E, 1, E)
  D: (A, 3, E) (B, 1, B) (C, 2, E) (E, 1, E)
  E: (A, 2, C) (B, 2, D) (C, 1, C) (D, 1, D)

Bellman-Ford Algorithm Example
Routing table of node A after convergence
  Destination  Cost  NextHop
  B            3     C
  C            1     C
  D            3     C
  E            2     C

Problem with DV Routing (1/2)
Phenomenon
  good news travels fast
  bad news travels slowly
[Figure: two copies of the example network, each with one changed link cost. Captions: "Route updated in two iterations." and "Route updated in more than 25 iterations."]

Problem with DV Routing (2/2)
Routing loop
  Due to the above phenomenon
  Loop formed before routing converged
Partial solutions
  Split horizon: routing updates sent to a neighbor should not contain routes learned from that neighbor.
  Poisoned reverse: if A learns a route to D from B, then A tells B that it cannot reach D, so as to poison the route.
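The distance vector exchange and the poisoned reverse rule above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: the link costs are an assumption inferred from the Step 1 tables of the Bellman-Ford example, the function names are hypothetical, and all nodes are assumed to exchange vectors in synchronous rounds.

```python
INF = float("inf")

# Assumed link costs, inferred from the Step 1 tables of the example network.
LINKS = {("A", "B"): 4, ("A", "C"): 1, ("B", "C"): 2, ("B", "D"): 1,
         ("C", "D"): 3, ("C", "E"): 1, ("D", "E"): 1}


def neighbors(links):
    """Build {node: {neighbor: link_cost}} from an undirected link list."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


def advertise(table, to_neighbor, poisoned_reverse):
    """Distance vector {dest: distance} that a node sends to one neighbor."""
    vector = {}
    for dest, (dist, next_hop) in table.items():
        if poisoned_reverse and next_hop == to_neighbor:
            dist = INF  # poison routes that were learned from this neighbor
        vector[dest] = dist
    return vector


def run_distance_vector(links, poisoned_reverse=True, max_rounds=50):
    """Run synchronous rounds until no table changes; return (tables, rounds)."""
    adj = neighbors(links)
    # Each table maps dest -> (distance, next_hop); start with direct links only.
    tables = {x: {y: (c, y) for y, c in nbrs.items()} for x, nbrs in adj.items()}
    for rounds in range(1, max_rounds + 1):
        # Everyone advertises the table it held at the start of the round.
        sent = {(y, x): advertise(tables[y], x, poisoned_reverse)
                for x in adj for y in adj[x]}
        changed = False
        for x in adj:
            for y, cost in adj[x].items():
                for dest, dist in sent[(y, x)].items():
                    if dest == x:
                        continue
                    new = dist + cost
                    old_dist, old_hop = tables[x].get(dest, (INF, None))
                    # Accept a shorter route, or a changed report from the
                    # neighbor that is already the next hop.
                    if new < old_dist or (old_hop == y and new != old_dist):
                        tables[x][dest] = (new, y)
                        changed = True
        if not changed:
            return tables, rounds
    return tables, max_rounds


if __name__ == "__main__":
    tables, rounds = run_distance_vector(LINKS)
    for dest, (cost, hop) in sorted(tables["A"].items()):
        print(dest, cost, hop)
```

Under these assumptions node A ends up with the same entries as the converged routing table shown in the example (B, D, and E all reached through C). Setting `poisoned_reverse=False` sends plain vectors instead, which is the variant that can count to infinity after a link failure.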
  Hold-down timer
    When a router receives an update from a neighbor indicating that a network is inaccessible, the router marks the route as inaccessible and starts a hold-down timer
    Hold-down timers help prevent counting to infinity but also increase convergence time

Hierarchical Routing
  Not a flat network: too many routing entries
  Define an AS
    Routers within an AS are under the same administrative control
  Routing within an AS and between ASs
    Intra-domain routing
    Inter-domain routing
  See http://bgp.potaroo.net for the current BGP table size

AS
  The Internet consists of Autonomous Systems (AS) interconnected with each other:
    Stub AS: small corporation
    Multihomed AS: large corporation (no transit)
    Transit AS: provider
  Two-level routing:
    Intra-AS: routing within an AS
    Inter-AS: routing between ASs

An example of Hierarchical Routing
[Figure: three domains A, B, and C with routers A.1-A.3, B.1-B.4, and C.1-C.3. Labels: inter-domain routers (exterior gateway), intra-domain routers (interior gateway).]

Example of Internet Routing Protocols
Intradomain routing
  RIP
  OSPF
Interdomain routing
  BGP-4

Intra-domain Routing
  Routing Information Protocol (RIP)
  Open Shortest Path First (OSPF)

Intra-domain Routing
What is intra-domain routing
  Routing within a domain (AS)
  The administrator decides the routing protocol
  The administrator has total control over all routers
Why intra-domain routing
  Maintain connectivity within a domain

Intra-domain Routing
  Runs Interior Gateway Protocols (IGP)
  Most common IGPs
    RIP: Routing Information Protocol
    OSPF: Open Shortest Path First

RIP
  Originally designed for Xerox PARC Universal Protocol (used in XNS)
  Adopted by UNIX and TCP/IP in 1982
    routed of BSD
  RIP: RFC 1058 [1988]
  RIPv2: RFC 1388 [1993]
  RIPng: RFC 2080 [1997]

RIP
  Distance Vector routing
  Uses hop count as the cost metric (up to 15)
    Restricts the size of the network to 15 hops
  Exchanges routing messages (advertisements) every 30 seconds
  Each advertisement consists of up to 25 routes (destination nets)

RIPv2 Packet Format
  Bit offsets: 0, 8, 16, 24, 31
  Command | Version | Must be zero
  Family of net 1 | Route Tag for net 1
  Address of net 1
  Subnet Mask for net 1
  Next Hop for net 1
  Distance to net 1
  Family of net 2 | Route Tag for net 2
  Address of net 2
  Subnet Mask for net 2
  Next Hop for net 2
  Distance to net 2

RIP Packet Format and Stability
RIP packet format
  Commands: request or reply
  Version number
  Up to 25 destination addresses
Stability
  Hop count limit: 15 is the largest valid metric; 16 means infinity
  Stabilization timer: allows RIP to learn all routes from its neighbors before sending full updates
  Split horizon: no update on the backward route (omits routes learned from that neighbor)
  Poison reverse update: updates sent to a neighbor include the routes learned from that neighbor but set their metric to infinity

Routing Table of RIP
Taken from a Cisco router at cs.ccu.edu.tw
  Destination        Gateway             Distance/Hop  Update timer  Flag  Interface
  35.0.0.0/8         140.123.1.250       120/1         00:00:28      R     Vlan1
  127.0.0.0/8        directly connected                              C     Vlan0
  136.142.0.0/16     140.123.1.250       120/1         00:00:17      R     Vlan1
  150.144.0.0/16     140.123.1.250       120/1         00:00:08      R     Vlan1
  140.123.230.0/24   directly connected                              C     Vlan230
  140.123.240.0/24   140.123.1.250       120/4         00:00:22      R     Vlan1
  140.123.241.0/24   140.123.1.250       120/3         00:00:22      R     Vlan1
  140.123.242.0/24   140.123.1.250       120/1         00:00:22      R     Vlan1
  192.152.102.0/24   140.123.1.250       120/1         00:01:04      R     Vlan1
  0.0.0.0/0          140.123.1.250       120/3         00:00:08      R     Vlan1

Open Source Implementation 4.9: RIP
GNU Zebra Project
  Supports many routing protocols
    RIP, OSPF, BGP
  Runs the routing daemon as a user process
  Communicates with the kernel via netlink

Open Source Implementation 4.9 (cont)
Routing Daemon and Kernel
[Figure: in user space, a routing manager (Zebra, routed, gated, …) handles protocol-specific packets; in kernel space, the kernel holds the routing table. Labels: control packets, data packets, packets from NICs.]

Open Source Implementation 4.9 (cont)
Overview of Zebra
[Figure: the routing protocol daemons RIPd, OSPFd, BGPd, and RIPngd exchange routing information with the Zebra daemon (via socket interface); the Zebra daemon reaches the kernel routing table through ioctl, sysctl, netlink, proc fs, and rtnetlink.]

Open Source Implementation 4.9 (cont)
RIP Daemon (ripd)
[Figure: components of ripd.]
  Initialization
  Scheduling
  RIP core: rip_version, rip_default_metric, rip_timers, rip_route, rip_distance
  routemap
  Interface: rip_network, rip_neighbor, rip_passive_interface, ip_rip_version, ip_rip_authentication, rip_split_horizon
  Zebra client
  RIP Peer: rip_peer_timeout, rip_peer_update, rip_peer_display
  offset
  Zebra Daemon

OSPF Features (1/3)
  OSPF v2: RFC 2328 [1998]
  OSPF v3: RFC 2740 [1999]
  Runs internal to a single Autonomous System
  Link-state routing protocol
    A shortest-path tree is constructed to build the routing table
    Dijkstra algorithm
  Support for equal-cost multipath routing
  Support for TOS-based routing
  Support for variable-length subnet masks
    Each route distributed has a destination and mask

OSPF Features (2/3)
  Integrated unicast and multicast support:
    Multicast OSPF (MOSPF) uses the same topology database as OSPF
  Two levels of hierarchy: areas within an AS
    Area: a group of contiguous networks and hosts
    The topology of an area is invisible from outside
    Routing in the AS takes place on two levels: intra-area routing and inter-area routing

OSPF: Two Levels of Hierarchy
[Figure: an AS with Area A, Area B, Area C, and the backbone. Router roles shown: AS boundary router, area border router, backbone router, internal router.]

OSPF Hierarchy
  Area internal routers
    Only participate in intra-area routing
    Receive external routes broadcast by the area border router
  Area border routers
    "Summarize" distances to the networks of their area
    Advertise to other area border routers
  Backbone routers
    Run OSPF routing limited to the backbone
  AS boundary routers
    Connect to other ASs

OSPF Features (3/3)
  External routing data is advertised throughout the AS
    Flooded without modification
  Two types of cost
    Type 1: compatible with costs within the area; the cost to an external network is the sum of the internal cost and the external cost
    Type 2: an order of magnitude larger; the cost to an external network is determined solely by the external cost

OSPF Features
  Supports stub areas to reduce broadcasting
    An area can be configured as a stub when there is a single exit point from the area.
    AS boundary routers cannot be placed internal to stub areas.
    No AS external advertisements are flooded into or through stub areas.

[Figure: example OSPF autonomous system with routers RT1-RT12, networks N1-N15, host H1, and Areas 1-3. Legend: internal router, area border router, AS boundary router, stub.]

OSPF Example: Intra-area
Summarized area information advertised by RT3 and RT4 to the backbone.
  Network  Cost advertised by RT3  Cost advertised by RT4
  N1       4                       4
  N2       4                       4
  N3       1                       1
  N4       2                       3
[Figure: Area 1 with routers RT1-RT4 and networks N1-N4.]

OSPF Example: Inter-area
Backbone information advertised into area 1 by RT3 and RT4.
  Destination  Cost advertised by RT3  Cost advertised by RT4
  Ia, Ib       20                      27
  N6           16                      15
  N7           20                      19
  N8           18                      18
  N9-N11       29                      36
  RT5          14                      8
  RT7          20                      14
[Figure: backbone portion of the example network with routers RT3-RT7 and RT10, networks N6 and N12-N15, and interfaces Ia and Ib.]

OSPF Example: Final Routing Table
RT4's routing table
  Destination  Path Type        Cost  Next Hop
  N1           intra-area       4     RT1
  N2           intra-area       4     RT2
  N3           intra-area       1     direct
  N4           intra-area       3     RT3
  N6           inter-area       15    RT5
  N7           inter-area       19    RT5
  N8           inter-area       18    RT5
  N9-N11       inter-area       36    RT5
  N12          Type 1 external  16    RT5
  N13          Type 1 external  16    RT5
  N14          Type 1 external  16    RT5
  N15          Type 1 external  23    RT5

Open Source Implementation 4.10: OSPF
[Figure: components of the OSPF daemon.]
  Initialization
  Scheduling
  OSPF core: ip_ospf_interface, ip_ospf_neighbor, ospf_router_id, network_area, show_ip_ospf_cmd
  Route Map: route_map_update, route_map_event
  Interface
  LSA: Link State Advertisement
  Route
  Network
  OSPF Flooding
  OSPF SPF calculation
  ASE: AS external route calculation
  LSDB
  zclient
  Zebra daemon

Inter-domain Routing
  Border Gateway Protocol (BGP)

Inter-domain Routing
  Called Exterior Gateway Protocols (EGP)
  Most common EGP
    BGP: Border Gateway Protocol

BGP Features (1/3)
  RFC 1771 (BGP-4)
  "Path vector" routing
    Loop-free inter-domain routing between ASs
  Runs over TCP with port 179
  The routing table keeps all feasible paths
    Only the optimal path is advertised to neighbors

BGP Features (2/3)
  Can be used within and between ASs
    Multiple border routers (BGP speakers) within an AS
    IBGP: Interior BGP runs between routers in the same AS
      All BGP speakers within the AS must be fully meshed (through an IGP protocol)
    EBGP: Exterior BGP runs between routers belonging to two different ASs

BGP Features (3/3)
  Supports information aggregation
    CIDR
  Confederation
    Could also be used to allow multiple ASs within an AS
  Policy routing at AS
    access-list permit or deny (route or path filtering)
  Link cost metric
    Combination of different metrics with the degree of preference (weight, loc pref, med, …)

BGP Messages
  Open
    First message sent after connection
  Keepalive
    Sent often enough to keep the timer from expiring
  Update
    No periodic refresh of the entire table
    Advertises a single feasible route to a peer
    Withdraws multiple routes previously advertised
    The message contains path attributes and Network Layer Reachability Information (NLRI)
  Notification
    Sent when an error is detected

BGP Routing Algorithm
  Path vector routing
    Different ASs may have different link cost metrics
    Being loop free is very important
    Policy routing is preferred (different priorities, prohibit lists, …)
  AS_PATH of the path attribute
    A list of ASs to the destination
    A loop is found if the current AS is already in the AS_PATH
  Next_Hop of the path attribute
    Indicates the next router to the destination
  NLRI
    A list of subnets that can be reached by the AS_PATH

BGP Path Selection
Path selection
  (1) If Next_Hop is inaccessible, drop the update
  (2) Prefer largest LOCAL_PREF
  (3) Prefer shorter AS_PATH
  (4) Prefer lower origin code (igp<egp<incomplete)
  (5) Prefer lower MED (MULTI_EXIT_DISC)
  (6) Prefer external path over internal path
  (7) Prefer closer IGP neighbor
  (8) Prefer BGP router with lower IP address
Advertise the path with the highest degree of preference for each destination to neighbor BGP speakers

BGP PATH Attributes (1/2)
  Origin
    Defines the origin of the path information
    IGP, EGP, Incomplete (unknown, e.g., static route)
  AS_PATH
    Ordered list or a set
  Next_Hop
    IP of the next hop to the destination
    For a multiaccess network, the next hop could be a router other than the BGP speaker

BGP PATH Attributes (2/2)
  LOCAL_PREF
    Indicates the preferred exit router within an AS
  Multi_Exit_Disc (MED)
    When a router has multiple external links to the same AS, the link to the router with the lower MED is preferred.

BGP Example
  Network           Next Hop         Best?  PATH                  Origin
  61.13.0.0/16      139.175.56.165   N      4780,9739             IGP
                    140.123.231.103  N      9918,4780,9739        IGP
                    140.123.231.100  Y      9739                  IGP
  61.251.128.0/20   139.175.56.165   Y      4780,9277,17577       IGP
                    140.123.231.103  N      9918,4780,9277,17577  IGP
  211.73.128.0/19   210.241.222.62   Y      9674                  IGP
  218.32.0.0/17     139.175.56.165   N      4780,9919             IGP
                    140.123.231.103  N      9918,4780,9919        IGP
                    140.123.231.106  Y      9919                  IGP
  218.32.128.0/17   139.175.56.165   N      4780,9919             IGP
                    140.123.231.103  N      9918,4780,9919        IGP
                    140.123.231.106  Y      9919                  IGP
  LOCAL_PREF and Weight: every value shown is 0.

4.7 Multicast
  Internet Group Management Protocol (IGMP)
  Distance Vector Multicast Routing Protocol (DVMRP)
  Protocol-Independent Multicast (PIM)
  New Developments: SSM, MSDP, Anycast RP
  Multicast Backbone (MBONE)

Multicast
  Communication among more than two parties
    Multi-party video conferencing
    Distance learning
  Issues
    Maintain group member information
    Construct a multicast tree for packet transmission
    Many-to-many communication

Membership Management
  IGMP

Internet Group Management Protocol (IGMPv2)
  RFC 2236
  Used by IP hosts to report multicast group memberships to routers
  Enhances IGMPv1
    Querier election mechanism
    Leave Group message
    Group-Specific Query message

Protocol Overview (1/4)
  A multicast router plays one of two roles: Querier or Non-Querier
    The Querier is responsible for maintaining membership information
    The router with the smallest IP address becomes the Querier
      Routers hear each other's Query messages and decide from them
  The Querier periodically sends a General Query to solicit membership information
    A General Query is sent to 224.0.0.1 (ALL-SYSTEMS multicast group)
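The querier election rule above (the router with the smallest IP address wins) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the class and function names are hypothetical, every router is assumed to hear one General Query from every other router on the subnet, and the timers that IGMPv2 uses to detect a vanished Querier are left out. Only 140.123.1.250 comes from this chapter; the other addresses are made up for the example.

```python
import ipaddress


class MulticastRouter:
    """One multicast router on a shared subnet (hypothetical helper class)."""

    def __init__(self, address):
        self.address = ipaddress.IPv4Address(address)
        self.is_querier = True  # a router starts out assuming it is the Querier

    def hear_query(self, source):
        """Process a General Query sent by another router on the subnet."""
        # Addresses are compared numerically, not as strings.
        if source < self.address:
            self.is_querier = False  # a router with a smaller address wins


def run_election(addresses):
    """Let every router hear every other router's query; return the winners."""
    routers = [MulticastRouter(a) for a in addresses]
    for sender in routers:
        for listener in routers:
            if listener is not sender:
                listener.hear_query(sender.address)
    return [str(r.address) for r in routers if r.is_querier]


if __name__ == "__main__":
    print(run_election(["140.123.1.250", "140.123.1.9", "140.123.1.100"]))
```

Comparing `IPv4Address` objects rather than strings matters here: as text, "140.123.1.100" sorts before "140.123.1.9", but numerically 140.123.1.9 is the smaller address and therefore the Querier.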

Protocol Overview (2/4)
  When a host receives a General Query
    It delays for a random time in the range [0..Max Response Time] (starts a timer)
      Max Resp. Time is given in the Query message
    It sends a report with TTL=1 when the timer expires
    Report suppression
      If another host's report is received first, it stops the timer and does not send the report
  Similar for a host that receives a Group-Specific Query

Protocol Overview (3/4)
  When a router receives a report
    It adds the group being reported to the list of multicast groups
    It sets the timer for the membership to [Group Membership Interval]
    It deletes the membership if no reports are received before the timer expires
      A Query is sent periodically
  When a host joins a multicast group
    It sends an unsolicited report immediately

Protocol Overview (4/4)
  When a host leaves a multicast group
    If it was the last host to reply to a Query, it should send a Leave Group message to the all-routers multicast address (224.0.0.2)
  When a router receives a Leave Group message
    It sends Group-Specific Queries every [Last Member Query Interval] to the group being left, [Last Member Query Count] times.
    If no reports are received before [Last Member Query Interval], it assumes there are no local members.

IGMPv2 Message Format (1/2)
  Message format
    Bit offsets: 0, 8, 16, 24, 31
    Type | Max. Resp. Time | Checksum
    Multicast group address
  Type
    0x11 = Membership Query
      General Query
      Group-Specific Query
    0x16 = Version 2 Membership Report
    0x17 = Leave Group

IGMPv2 Message Format (2/2)
  Max Response Time
    Only in membership query messages
    Set to zero in other messages
  Checksum
    16-bit one's complement
  Group address
    Zero when sending a General Query
    Group address when sending a Group-Specific Query

IGMPv3
  IETF RFC 3376
  Adds support for "source filtering"
    A receiver may request to receive packets only from specific source addresses
    Source addresses are selected by INCLUDE or EXCLUDE
  IPMulticastListen(socket, interface, multicast-address, filter-mode, source-list)
    filter-mode: INCLUDE or EXCLUDE

Multicast Routing Protocols
  DVMRP
  PIM-SM
  SSM
  MSDP
  Anycast RP

Multicast Routing Protocols
  Two types of multicast tree
    Source-based tree
    Core-based tree (shared tree)
    What's the difference: per (S,G) tree or per (*,G) tree
  Multicast protocols
    DVMRP
    PIM
      Sparse mode
      Dense mode
    SSM
    MSDP
    Anycast RP
    MBGP

Example where Steiner tree is different from least-cost-path tree
[Figure: four-node graph A, B, C, D with link costs 3, 1, 3, 4, 1, 3.]
Copyright reserved 2001 (Lin & Hwang)

Distance Vector Multicast Routing Protocol (DVMRP)
  RFC 1075
  Derived from RIP
  Widely used on the MBone
  Relies on RIP for unicast routing
  Enables incremental deployment of IP multicast since it supports tunnels
  Constructs a source-based tree per source
  Provides a shortest path between the source and receivers using the Reverse Path Forwarding (RPF) algorithm

RPF Algorithm
  Three steps
    Reverse Path Broadcast (RPB)
    Prune to a Reverse Path Multicast (RPM) tree
    Forwarding data uni-directionally

Reverse Path Broadcast (RPB)
  Broadcast on the reverse path
  When a multicast
packet is received
    Forward the packet on all of its outgoing links only if
      the packet arrives on the interface that is also the interface of the shortest path back to the sender
      the packet is not a duplicate
    Otherwise, discard the packet

RPB Example
[Figure: network with a source and routers RA-RG. Legend: member, mrouter, router w/o member, Forward, Discard.]

Prune RPB Tree
  Prune to an RPM tree
    Routers that do not lead to any members send prune messages to upstream routers
    Routers know membership information via IGMP

Prune RPB Tree Example
[Figure: the same network with a source and routers RA-RG. Legend: member, mrouter, router w/o member, Forward, Prune.]

Example of a RPM tree
[Figure: resulting tree from the source. Legend: member, router w/ member, router w/o member, Forward.]

DVMRP Drawbacks and Benefits
  Drawbacks
    The first packet has to be flooded
    Periodic prune state refresh
    Routing state per (source, group) pair
  Benefits
    Guarantees efficient delivery
    Easy to implement

Problems of DVMRP
  Works well only for densely represented groups
    Periodic broadcast will cause performance problems
  Large amount of state information stored
    Information for forwarding
    Prune-state information
  Not scalable

PIM-SM
  Protocol Overview
  Special Features
  Packet Formats

Protocol Overview
  Documents
    RFC 2362, RFC 4601 (August 2006)
  Terminologies
    DR: Designated Router
    RP: Rendezvous Point
    RPT: RP-based Tree
  PIM-SM routes packets in three phases
    Phase one: RP tree
    Phase two: Register Stop
    Phase three: Shortest-Path Tree (Optional)

Phase One: RP Tree
  Receiver
    Sends a join message to the DR using IGMP
    The DR sends a (*,G) PIM Join message to the RP
      It reaches the RP or converges on a router on the RPT
      The Join message is sent periodically (otherwise, it will time out)
  Sender
    Sender sends a packet with multicast address as its
destination to DR DR unicasts encapsulated packet to RP PIM Register packets RP decapsulates it and forwards it onto RPT Chapter 4: Internet Protocol Layer 205 Phase One: RP Tree (Fig) Join Encapsulated Multicast Send member RP DR RP (*,G) (*,G) RTA source A Chapter 4: Internet Protocol Layer B 206 Phase Two: Register Stop Motivation Encapsulation and decapsulation are too expensive Steps RP initiates an (S,G) source-specific Join to S All the routers on the path records the (S,G) multicast state Packets start to flow following the (S,G) tree to RP RP may now receive duplicate packets: native and encapsulated. RP discards the encapsulated packet RP sends a Register-Stop message to DR of Source RP forwards native packets to the RPT If the packet reaches a router with (*,G), do a short-cut to receivers. Chapter 4: Internet Protocol Layer 207 Phase Two: Register Stop (Fig) member Source specific join RP DR RP (S,G) source Chapter 4: Internet Protocol Layer 208 Phase Three: Shortest-Path Tree Motivation From source to RP, then to receivers is too long. Steps A receiver’s DR may optionally initiate to transfer from the RPT to a source-specific tree (SPT) It issues an (S,G) join to S. The join message may reach the source or converged at some router. It starts to receive two copies of packets. Drop the one from RPT. It then sends an (S,G) prune message to RP (S, G, rpt) prune Prune message reaches RP or converged at some router on RPT Chapter 4: Internet Protocol Layer 209 Phase Three: Shortest-Path Tree (Fig) member Source specific join (IGMPv3) Source specific prune RP DR RP (S,G) (S,G,rpt) source Chapter 4: Internet Protocol Layer 210 Source-specific Joins and Prunes If a receiver sends a source-specific join using IGMPv3 Multicast addresses for source-specific multicast If no other receiver on that group, DR may omit performing a (*,G) join. Instead, DR issues a source-specific (S,G) join. 
    232.0.0.0 to 232.255.255.255
    Only source-specific joins are accepted for groups in this range.
  A receiver may also send a source-specific join with an exclusive source list
    The DR performs a (*,G) join as normal, but may combine it with an (S,G,rpt) prune for each source in the list.

Inter-domain Multicast: MSDP
  The RP in each domain establishes an MSDP peering relation with RPs in other domains
  When an RP learns of a new multicast source within its own domain, it informs its MSDP peers
    The RP encapsulates the first data packet in a Source Active (SA) message and sends the SA to all peers.
    Each receiving peer forwards the SA using a modified RPF check
    If the receiving MSDP peer is an RP and has a (*,G) entry for the group in the SA, it sends an (S,G) join.
    That RP also decapsulates the data and forwards it down its shared tree
    A receiver interested in this (S,G) can send an (S,G) join to obtain the shortest path to the source
  Each RP periodically sends SAs, which include all sources within its domain.

Inter-domain Multicast: Multi-Protocol BGP (MBGP)
  Defined in RFC 2283 (extensions to BGP)
  BGP is extended to carry different routing information to support
    IPv4 unicast
    IPv6 unicast
    IPv4 multicast
    IPv6 multicast
    ….
  Routing information for these may be carried in the same BGP session

Open Source Implementation 4.12: Mrouted
  Data structures of Mrouted
    routing_table
    Groups originating from the same source.
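The figure for this slide labels two record types, rtentry and gtable, and the fields rt_next, rt_groups, gt_next, gt_prev, gt_gnext, and gt_gprev. The following is a minimal Python model of that shape, not Mrouted's C code. The class names, the `origin` and `group` fields, and the helpers `add_group` and `groups_of` are hypothetical, and the sketch assumes that the gt_next/gt_prev pair chains the groups of one source; the slides do not say which pair plays which role.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)
class GTable:
    """Hypothetical stand-in for one gtable record (one multicast group)."""
    group: str
    gt_next: Optional["GTable"] = None   # assumed: next group of the same source
    gt_prev: Optional["GTable"] = None   # assumed: previous group of the same source
    gt_gnext: Optional["GTable"] = None  # second chain from the figure, unused here
    gt_gprev: Optional["GTable"] = None  # second chain from the figure, unused here


@dataclass(eq=False)
class RtEntry:
    """Hypothetical stand-in for one rtentry record (one source network)."""
    origin: str
    rt_next: Optional["RtEntry"] = None  # next entry in the routing table
    rt_groups: Optional[GTable] = None   # head of this source's group list


def add_group(rt: RtEntry, group: str) -> GTable:
    """Insert a new group record at the head of rt's doubly linked group list."""
    gt = GTable(group=group, gt_next=rt.rt_groups)
    if rt.rt_groups is not None:
        rt.rt_groups.gt_prev = gt
    rt.rt_groups = gt
    return gt


def groups_of(rt: RtEntry) -> Iterator[str]:
    """Walk the group list of one source from head to tail."""
    gt = rt.rt_groups
    while gt is not None:
        yield gt.group
        gt = gt.gt_next
```

Keeping the groups of one source on a list hanging off its route entry means that a change to the route can reach every affected (source, group) state by following rt_groups, without scanning unrelated groups.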
[Figure: Mrouted data structures. rtentry records are chained by rt_next; each rt_groups field points to a list of gtable records linked by gt_next/gt_prev and gt_gnext/gt_gprev]
Copyright reserved 2001 (Lin & Hwang)

Summary on Multicast
  Source-based tree
    Advantage: optimal path between sources and receivers
    Disadvantage: routing information for each (S,G) pair
  Shared tree
    Advantage: less state in each router
    Disadvantage: non-optimal path between sources and receivers

4.8 Summary
  Forwarding: longest prefix matching
  Routing: two-level, intra-domain and inter-domain
  Distance vector routing vs. link state routing: distributed vs. centralized
  Other mechanisms: IPv6, NAT, ARP, DHCP, ICMP
  Broadcast in subnet: used by ARP and DHCP
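
The first summary point, forwarding by longest prefix matching, can be illustrated with a minimal sketch. It uses a linear scan over a small table rather than the trie or hash structures a real router would use. The function name `longest_prefix_match`, the table layout, and the next-hop labels are assumptions made for this example.

```python
import ipaddress
from typing import Dict, Optional


def longest_prefix_match(table: Dict[str, str], dst: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing dst.

    table maps CIDR prefixes (e.g. "10.1.0.0/16") to next-hop labels.
    Returns None when no prefix matches.
    """
    addr = ipaddress.ip_address(dst)
    best_len = -1
    best_hop = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix length seen so far.
        if addr in net and net.prefixlen > best_len:
            best_len = net.prefixlen
            best_hop = next_hop
    return best_hop
```

With a table holding 0.0.0.0/0, 10.0.0.0/8, and 10.1.0.0/16, a packet for 10.1.2.3 matches all three prefixes and is forwarded according to the /16 entry, because it is the most specific.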