* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Network Routing - Yale University
Point-to-Point Protocol over Ethernet wikipedia , lookup
Asynchronous Transfer Mode wikipedia , lookup
Distributed firewall wikipedia , lookup
Dynamic Host Configuration Protocol wikipedia , lookup
Deep packet inspection wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
List of wireless community networks by region wikipedia , lookup
Network tap wikipedia , lookup
Computer network wikipedia , lookup
Spanning Tree Protocol wikipedia , lookup
Multiprotocol Label Switching wikipedia , lookup
Airborne Networking wikipedia , lookup
Internet protocol suite wikipedia , lookup
IEEE 802.1aq wikipedia , lookup
Wake-on-LAN wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Routing in delay-tolerant networking wikipedia , lookup
Routing and Forwarding 4/17/2012 1 Recap Internet routing is divided into intra-AS (intradomain) and inter-AS (interdomain) routing Technical reasons Non-technical reasons BGP (Border Gateway Protocol), a path- vector protocol, is the de facto standard interdomain protocol AS D Routing: Example E d1 d d2 AS A (OSPF) a2 F i AS C a1 How to specify? AS B (OSPF intra routing) b AS I 3 IP Addressing Scheme: Requirements We need an address to uniquely identify each destination Routing scalability needs flexibility in aggregation of destination addresses we should be able to aggregate a set of destinations as a single routing unit Preview: the unit of routing in the Internet is a network---the destinations in the routing protocols are networks 4 IP Address: An IP Address Identifies an Interface IPv4 address: 32-bit identifier for an interface interface: routers typically have multiple interfaces host may have multiple interfaces 223.1.1.1 223.1.2.1 223.1.1.2 223.1.1.4 223.1.1.3 223.1.2.9 223.1.3.27 223.1.2.2 223.1.3.2 223.1.3.1 %/sbin/ifconfig -a 223.1.3.2 = 11011111 00000001 00000011 00000010 223 1 3 2 5 IP Addressing IP address: network part host part What’s a network ? (from IP address perspective) is a unit of routing: can be routed together (depend on the routing protocol) 223.1.1.2 223.1.1.1 223.1.1.4 223.1.1.3 223.1.9.2 223.1.7.0 223.1.9.1 223.1.7.1 223.1.8.1 223.1.2.6 223.1.2.1 223.1.2.2 223.1.8.0 223.1.3.27 223.1.3.1 223.1.3.2 6 IP Addressing: Class-ful Addressing given notion of “network”, let’s re-examine IP addresses: “class-ful” addressing in the original IP design: 1.0.0.0 to 127.255.255.255 128.0.0.0 to 191.255.255.255 192.0.0.0 to 223.255.255.255 224.0.0.0 to 239.255.255.255 240.0.0.0 to 255.255.255.255 Problem of class-ful addressing? 7 IP Addressing: CIDR (Static)classful addressing: inefficient use of address space, address space exhaustion not flexible for aggregation • e.g., a class A net allocated enough addresses for 16 million hosts; a class B address may also be too big CIDR: Classless InterDomain Routing network portion of address of arbitrary length address format: a.b.c.d/x, where x is # bits in network portion of address network part host part 11001000 00010111 00010000 00000000 200.23.16.0/23 Some systems use mask (1’s to indicate network bits), instead of the /x format 8 CIDR Address Aggregation AS D d d1 AS A (OSPF) 130.132.1/24 i i->a1: I can reach 130.132/22; my path: I a2 a1 130.132.2/24 AS I intradomain routing uses /24 130.132.3/24 9 CIDR Address Aggregation B x00/24: B A x01/24: C C x10/24: E G E x11/24: F F F 10 Routing Table Size of BGP (number of globally advertised, aggregated networks) Active BGP Entries (http://bgp.potaroo.net/as1221/bgp-active.html) Internet Growth (http://www.caida.org/research/topology/as_core_network/historical.xml) 11 Routing Table Prefix Length Distr. 180000 160000 140000 120000 100000 80000 60000 40000 20000 0 8 13 18 23 28 12 IP Addressing: How to Get One? Q: How does an ISP get its block of addresses? A: ICANN: Internet Corporation for Assigned Names and Numbers allocates addresses manages DNS assigns domain names, resolves disputes Use %whois –h whois.arin.net “n <org>” to check addresses allocated to <org> 13 IP addresses: How to Get One? Q: How does a host get an IP address? Static configured wintel: control-panel->network->configuration>tcp/ip->properties unix: %/sbin/ifconfig eth0 inet 192.168.0.10 netmask 255.255.255.0 DHCP: Dynamic Host Configuration Protocol: dynamically get address from as server “plug-and-play” 14 DHCP: Dynamic Host Configuration Protocol Goal: allow host to dynamically obtain its IP address from network server when it joins network can renew its lease on address in use allows reuse of addresses (only hold address while connected) support for mobile users who want to join network DHCP msgs: host broadcasts “DHCP discover” msg DHCP server responds with “DHCP offer” msg host requests IP address: “DHCP request” msg DHCP server sends address: “DHCP ack” msg 15 Network Address Translation: Motivation A local network uses just one public IP address as far as outside world is concerned Each device on the local network is assigned a private IP address rest of Internet local network (e.g., home network) 192.168.1.0/24 192.168.1.1 192.168.1.2 192.168.1.3 138.76.29.7 192.168.1.4 All datagrams leaving local network have same single source NAT IP address: 138.76.29.7, different source port numbers Datagrams with source or destination in this network have 192.168.1/24 address for source, destination (as usual) 16 Private IP 17 NAT: Network Address Translation Implementation: NAT router must: outgoing datagrams: replace (source IP address, port #) of every outgoing datagram to (NAT IP address, new port #) . . . remote clients/servers will respond using (NAT IP address, new port #) as destination addr. remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table 18 NAT: Network Address Translation NAT translation table WAN side addr LAN side addr 1: host 192.168.1.2 2: NAT router sends datagram to changes datagram 138.76.29.7, 5001 192.168.1.2, 3345 128.119.40.186, 80 source addr from …… …… 192.168.1.2, 3345 to 138.76.29.7, 5001, S: 192.168.1.2, 3345 updates table D: 128.119.40.186, 80 2 S: 138.76.29.7, 5001 D: 128.119.40.186, 80 138.76.29.7 S: 128.119.40.186, 80 D: 138.76.29.7, 5001 3: Reply arrives dest. address: 138.76.29.7, 5001 3 192.168.1.2 1 192.168.1.1 192.168.1.3 S: 128.119.40.186, 80 D: 192.168.1.2, 3345 4 192.168.1.4 4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 192.168.1.2, 3345 19 Network Address Translation: Advantages No need to be allocated range of addresses from ISP: - just one public IP address is used for all devices 16-bit port-number field allows 60,000 simultaneous connections with a single LAN-side address ! can change ISP without changing addresses of devices in local network can change addresses of devices in local network without notifying outside world Devices inside local net not explicitly addressable, visible by outside world (a security plus) 20 Network Address Translation: Problems If both hosts are behind NAT, they will have difficulty establishing connection NAT is controversial: routers should process up to only layer 3 violates end-to-end argument • NAT possibility must be taken into account by app designers, e.g., P2P applications address shortage should instead be solved by having more addresses --- IPv6 ! 21 Outline Admin. and recap BGP IP addressing IP forwarding 22 IP Datagram Format IP protocol version number header length (bytes) “type” of data max number remaining hops (decremented at each router) upper layer protocol to deliver payload to how much overhead with TCP? 20 bytes of TCP 20 bytes of IP = 40 bytes + app layer overhead 32 bits type of ver head. len service length fragment 16-bit identifier flgs offset time to upper Internet layer live checksum total datagram length (bytes) for fragmentation/ reassembly 32 bit source IP address 32 bit destination IP address Options (if any) data (variable length, typically a TCP or UDP segment) E.g. timestamp, record route taken, specify list of routers to visit. 23 Data Forwarding: Steps Error checking, e.g., check header checksum; if error, set up error flag Decrement TTL; if TTL == 0, set error flag If error, drop the packet, and generate ICMP report 24 The Network Layer Host, router network layer functions: Transport layer: TCP, UDP Network layer The IP protocol •addressing •datagram format Routing protocols •path selection •RIP, OSPF, BGP forwarding ICMP protocol •error reporting •router “signaling” Link layer Physical layer 25 ICMP: Internet Control Message Protocol communicate network-level information error reporting: unreachable host, network, port, protocol echo request/reply (used by ping) network-layer “above” IP: ICMP msgs carried in IP datagrams ICMP message: type, code plus first 8 bytes of IP datagram causing error type code checksum ICMP message body Type 0 3 3 3 3 3 3 4 Code 0 0 1 2 3 6 7 0 8 9 10 11 12 0 0 0 0 0 description echo reply (ping) dest network unreachable dest host unreachable dest protocol unreachable dest port unreachable dest network unknown dest host unknown source quench (congestion control - not used) echo request (ping) route advertisement router discovery TTL expired bad IP header traceroute is developed by a clever use of ICMP 26 Data Forwarding: Steps If no error, look up packet destination address in forwarding table: if datagram for a host on directly attached network, it is the job of the link layer now otherwise, • lookup: find next-hop router, and its outgoing interface • if needed, do fragmentation • forward packet to outgoing interface (to the next hop neighbor) try %netstat –rn to see the forwarding table 27 Forwarding Look up default: - # prefix a) b) c) d) e) f) g) h) i) j) k) 00001 00010 00011 001 0101 011 10 100 1010 1011 1100 interface 0 1 a g f d e abc h i jk The networks are represented by a decision tree, e.g., a Patricia Trie to look for the longest match of the destination address 28 Content Addressable Memory (CAM) Standard computer memory (Random Access Memory or RAM) Word Read(addr) Write(addr, Word) CAM Word Read(addr) Write(addr, Word) AddrList Search(Word) 29 Ternary CAM Binary CAM It searches words consisting entirely of 1s and 0s. TCAM It allows a third matching state of “X” or “Don’t Care” for one or more bits in the stored data word. “10XX0” will match “10000”, “10010”, “10110”, “10100”. 30 Prototype of TCAM Based IP Table Lookup Physical Address IP Prefix (4 Bytes) Next Hop (4 Bytes) 0x00000000 0x010203XX (1.2.3.0/24) 1 0x00000008 0x050607XX(5.6.7.0/24) 2 0x00000010 0x0102XXXX(1.2.0.0/16) 0 Keep an eye on hardware revolutions Solid State Drive (SDD) Optical Switching or Routing? 31 Example 1 (same network): A->B src forwarding table in A dst misc data fields 223.1.1.1 223.1.1.3 Look up dest address find dest is on same net link layer will send the datagram directly inside a link-layer frame Dest. Net. next router Nhops 223.1.1/24 1 223.1.2/24 223.1.1.4 2 223.1.3/24 223.1.1.4 2 0.0.0.0/0 223.1.1.4 To Internet A 223.1.1.1 223.1.4.1 223.1.2.1 B 223.1.1.2 223.1.1.4 223.1.2.9 223.1.2.2 223.1.1.3 223.1.3.1 223.1.3.27 223.1.3.2 32 Example 2 (Different Networks): A-> E forwarding table in A misc data fields 223.1.1.1 223.1.2.3 look up dest address in forwarding table routing table: next hop router to dest is 223.1.1.4 link layer sends datagram to router 223.1.1.4 inside a linklayer frame the dest. of the link layer frame is 223.1.1.4 Dest. Net. next router Nhops 223.1.1/24 1 223.1.2/24 223.1.1.4 2 223.1.3/24 223.1.1.4 2 0.0.0.0/0 223.1.1.4 To Internet A 223.1.1.1 223.1.4.1 223.1.2.1 B 223.1.1.2 223.1.1.4 223.1.2.9 223.1.2.3 223.1.1.3 223.1.3.1 223.1.3.27 E 223.1.3.2 33 Example 2 (Different Networks): A-> E misc data fields 223.1.1.1 223.1.2.3 Arriving at 223.1.1.4, destined for 223.1.2.2 look up dest address in router’s forwarding table E on same network as router’s interface 223.1.2.9 router, E directly attached link layer sends datagram to 223.1.2.2 inside link-layer frame via interface 223.1.2.9 datagram arrives at 223.1.2.2!! (hooray!) forwarding table in router Dest. Net router Nhops interface 223.1.1/24 223.1.2/24 223.1.3/24 0.0.0.0/0 A B - 223.1.1.1 1 223.1.1.4 1 223.1.2.9 1 223.1.3.27 223.1.4.1 To Internet 223.1.4.1 223.1.2.1 223.1.1.2 223.1.1.4 223.1.2.9 223.1.2.3 223.1.1.3 223.1.3.1 223.1.3.27 E 223.1.3.2 34 What A Router Looks Like: Outside 35 Look Inside a Router Two key router functions: run routing algorithms/protocol (RIP, OSPF, BGP) switching datagrams from incoming to outgoing ports 36 Input Port Functions physical layer: bit-level reception data link layer: e.g., Ethernet network layer: lookup output port using forwarding table 37 Switching: Low End 38 Switching Via An Interconnection Network Overcome bus bandwidth limitations fragmenting datagram into fixed length cells, switch cells through the fabric. Crossbar, Banyan networks, and others Cisco 12416: switches 320 Gbps (upgradeable to 1.28 Tbps) with 16 slots (each 10G full-duplex) through the crossbar interconnection network 39 New Potential Bottleneck: Output Ports Due to output port contention and head-of-the-Line (HOL) blocking (i.e., queued datagram at front of queue prevents others in queue from moving forward) 40 Head-of-Line Blocking Limits Thrput Due to output-port contention and HOL blocking, the stable throughput is only around 2 - sqrt(2) = 0.586 of line speed ! 41 Avoiding Port Contention and HOB Virtual output queueing Input/output ports matching algorithm Switch fabric speedup, e.g., two cells to one output port For more details: http://www.cisco.com/warp/public/63/arch12000-swfabric.html 42 Output Ports Buffering required when datagrams arrive from fabric faster than the transmission rate Queueing (delay) and loss due to output port buffer overflow ! Scheduling and queue/buffer management choose among queued datagrams for transmission 43 Backup Slides 44 Another Method for Checking Convergence: Dispute Wheels A dispute wheel prefers R1Q2 over Q1 u2 prefers R2 Q3 over Q2 etc u1 2 210 20 R1 Q2 Q1 u1 130 10 1 3 Q3 d u3 Qn Rn un 0 R2 u2 R3 320 30 45 Patterns of Valid Routes Consider how a path is extended according to the export policies case 1 (format: Destination, link type, …) • Dest, …CP Dest, … CP CP • Dest, …CP Dest, … CP PC • Dest, …CP Dest, … CP PP case 2 Valid consecutive link types (starting from source to destination, i.e., reserve) PC PC CP PC PP PC CP CP CP PP • Dest, …PC Dest, … PC PC case 3 • Dest, …PP Dest, … PP PC 46 Case 1: The first link of Q1 is a PC link Then the first link of R1Q2 must be a PC link because u1 chooses R1Q2 and C > E/P Thus all remaining links along R1Q2 are PC links because only PC follows PC u1 Thus the first link of Q2 is a PC link …… All links along R1, …, Rn are PC links The network has a PC loop. contradiction ! PC Rn R1 PC Q1 u2 Qn un d Rn-1 Q2 R2 47 Case 2: The first link of Q1 is a CP/PP link All links along Rn are CP links because all links before CP/PP are CP The first link of Qn must be a CP/PP link because C > E/P …… CP Rn u1 CP/PP Q1 All links along R1, …, Rn are CP links The network has a PC loop. contradiction ! u2 Qn un CP/PP Rn-1 d Q2 R2 48 Backup: IP Multicast 49 IP Fragmentation & Reassembly Network links have MTU (max.transfer size) largest possible link-level frame. different link types, different MTUs, e.g. Ethernet MTU is 1500 bytes Large IP datagram divided (“fragmented”) one datagram becomes several datagrams “reassembled” only at final destination IP header bits used to identify, order related fragments fragmentation: in: one large datagram out: 3 smaller datagrams reassembly 50 IP Fragmentation and Reassembly Example 4000 byte datagram MTU = 1500 bytes length ID fragflag offset =4000 =x =0 =0 One large datagram becomes several smaller datagrams length ID fragflag offset =1500 =x =1 =0 length ID fragflag offset =1500 =x =1 =1480 length ID fragflag offset =1040 =x =0 =2960 51 IP Multicast: Service Model 128.59.16.12 128.119.40.186 multicast group 226.17.30.197 128.34.108.63 128.34.108.60 Multicast group concept: use of indirection Open group model A group is identified by a location-independent logical address (class D IP address: prefix 1110) Anyone can send packets to the “logical” group address Anyone can join a group and receive packets Normal, best-effort delivery semantics of IP Needed: infrastructure to deliver mcast-addressed datagrams to all hosts that have joined that multicast group 52 Multicast Across LANs Goal: find a tree (or trees) connecting routers having local mcast group members source-based: different tree from sender to each receiver – Distance-vector multicast routing protocol (DVMRP) – Protocol-independent multicast-dense mode (PIM-DM) shared-tree: same tree used by all group members – Core-Based Tree (CBT) – Protocol-independent multicast-sparse mode (PIM-SM) shared tree source-based trees 53 Source Tree: Reverse Path Flooding (RPF) A router x forwards a packet from source (S) iff it arrives via neighbor y, and y is on the shortest path from x back to S A packet is replicated to all but the incoming interface 1 S 1 y x 1 1 z t 1 a 54 Reverse Path Forwarding: Improvement Basic idea: forward a packet from S only on child links for S A child link of router x for source S a link that has x as parent on the shortest path from the link to S a child x notifies its parent y (through the routing protocol) that it has selected y as its parent S y x z t a 55 Reverse Path Forwarding: Pruning No need to forward datagrams down subtree with no mcast group members “prune” msgs sent upstream by router with no downstream group members LEGEND S: source R1 router with attached group member R4 R2 P R5 R3 R6 P R7 P router with no attached group member prune message links with multicast forwarding 56 Pruning Prune (Source, Group) at a leaf router if no members send No-Membership Report (NMR) up tree If all children of router R prune (S,G) propagate prune for (S,G) to its parent What do you do when a member of a group (re)joins? send a Graft message to upstream parent How to deal with failures? prune dropped flow is reinstated down stream routers re-prune Note: again a soft-state approach 57 Implementation of Source Trees in the Internet Multicast OSFP (MOSFP) Membership is part of the link state distribution; calculate source specific, pre-pruned trees Reverse Path Forwarding Distance Vector Multicast Routing Protocol (DVMRP) Protocol Independent Multicast – Dense Mode (PIM-DM) • very similar to DVMRP Difference: PIM uses any unicast routing algorithm to determine the path from a router to the source; DVMRP uses distance vector Question: the state requirement of Reverse Path Forwarding 58 Building a Shared Tree Steiner Tree: minimum cost tree connecting all routers with attached group members A Steiner tree is not a spanning tree because you do not need to connect all nodes in the network Problem is NP-hard Excellent heuristics exists Not used in practice: computational complexity information about entire network needed monolithic: rerun whenever a router needs to join/leave 59 Center (Core) based Shared Tree Single delivery tree shared by all One router identified as “center” of tree Tree construction is receiver-based edge router sends unicast join-msg addressed to center router join-msg “processed” by intermediate routers and forwarded towards center join-msg either hits existing tree branch for this center, or arrives at center path taken by join-msg becomes new branch of tree for this router A sender unicasts a packet to center The packet is distributed on the tree when it hits the tree 60 Example: M3 Joins Group members: M1, M2 core M1 M2 M3 shared tree join message S1 Discussion: what is property of the constructed tree? 61 Example: M1 Sends Data Group members: M1, M2, M3 M1 sends data core M1 M2 control (join) messages data M3 S1 62 Shared Tree Protocols in the Internet Core Based Tree Protocol Independent Multicast (PIM) Sparse mode The catch: how do you know the center? session announcement 63 Mbone: Tunneling Q: How to connect “islands” of multicast routers in a “sea” of unicast routers? physical topology logical topology mcast datagram encapsulated inside “normal” (non-multicast- addressed) datagram normal IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router receiving mcast router unencapsulates to get mcast datagram 64