Download Introduction to Distributed Systems and Networking

Introduction to Networking Internet: Example • Click -> get page • Specifies - protocol (http) - location (www.cnn.com) Internet: Locating Resource • www.cnn.com – name of a computer – Implicitly also a file • Map name to IP address – DNS cnn.com? cnn.com? host com local a.b.c.d a.b.c.d Internet: Connection • Http sets up a connection (tcp) – between the host and cnn.com to transfer the page • The connection transfers page as a byte stream – without errors: flow control + error control Host www.cnn.com Page; close Internet: End-to-end • Byte stream flows end to end across many links/switches: – routing (+ addressing) • That stream is regulated and controlled by both ends: – retransmission of erroneous or missing bytes; flow control end-to-end pacing and error control routing HOST CNN.COM Internet: Packets • The network transports bytes grouped into packets • Packets are “self-contained”; routers handle them 1 by 1 • The end hosts worry about errors and pacing – Destination sends ACKs; Source checks losses A | B | # , CRC | bytes CNN.COM: A HOST: B C B: to C Internet: Bits • Equipment in each node sends packets as string of bits • That equipment is not aware of the meaning of the bits • Frames (packetizing) vs. streams 01011...011...110 01011...011...110 Transmitter Physical Medium Optical Copper Wireless Receiver Internet: Points to remember • Separation of tasks – send bits on a link: transmitter/receiver [clock, modulation,…] – send packet on each hop [framing, error detection,…] – send packet end to end [addressing, routing] – pace transmissions [detect congestion] – retransmit erroneous or missing packets [acks, timeout] – find destination address from name [DNS] • Scalability – routers don’t know full path – names and addresses are hierarchical Internet : Challenges • • • • • • Addressing ? Routing ? Reliable transmission ? Interoperability ? Resource management ? Quality of service ? Concepts at heart of the Internet • • • • • Protocol Layered Architecture Packet Switching Distributed Control Open System Protocol • Two communicating entities must agree on: – Expected order and meaning of messages they exchange – The action to perform on sending/receiving a message • Asking the time Layered Architectures • Human beings can handle lots of complexity in their protocol processing. – Ambiguously defined protocols – Many protocols all at once • How computers manage complex protocol processing? – Specify well defined protocols to enact. – Decompose complicated jobs into layers; • each has a well defined task Layered Architectures • Break-up design problem into smaller problems – More manageable • Modular design: easy to extend/modify. • Difficult to implement – careful with interaction of layers for efficiency Layered Architecture network users Applications Web, e-mail, file transfer, ... Middleware Reliable/ordered transmission, QOS, security, compression, ... Routing Physical Links End-to-end transmission, resource allocation, routing, ... Point-to-point links, LANs, radios, ... The OSI Model • Open Systems Interconnect model is a standard way of understanding conceptual layers of network comm. • This is a model, nobody builds systems like this. • Each level provides certain functions and guarantees, and communicates with the same level on remote notes. • A message is generated at the highest level, and is passed down the levels, encapsulated by lower levels, until it is sent over the wire. • On the destination, it makes its way up the layers,until the highlevel msg reaches its high-level destination. OSI Levels Node A Application Application Presentation Presentation Transport Transport Network Network Data Link Data Link Physical Physical Network Node B OSI Levels • • • • Physical Layer: electrical details of bits on the wire Data Link: sending “frames” of bits and error detection Network Layer:” routing packets to the destination Transport Layer: reliable transmission of messages, disassembly/assembly, ordering, retransmission of lost packets • Session Layer; really part of transport, typ. Not impl. • Presentation Layer: data representation in the message • Application: high-level protocols (mail, ftp, etc.) Internet protocol stack network users Application HTTP, SMTP, FTP, TELNET, DNS, … Transport TCP, UDP. Network IP Physical Point-to-point links, LANs, radios, ... Air travel Passenger Origin Passenger Destination Ticket (purchase) Ticket (complain) Baggage (check) Baggage (claim) Gates (load) Gates (unload) Runway (take off) Runway (landing) Airplane routing Protocol stack user X English user Y e-mail client SMTP e-mail server TCP server TCP TCP server IP server ethernet driver/card IP IEEE 802.3 standard electric signals IP server ethernet driver/card Protocol interfaces user X user Y e-mail client TCP server e-mail server s = open_socket(); socket_write(s, buffer); … TCP server IP server IP server ethernet driver/card ethernet driver/card Addressing • Each network interface has a hardware address – Multiple interfaces  multiple addresses • Each application communicates via a port – Port is a logical connection endpoint – Allows multiple local applications to use network resources – Up to 65535 • < 1024 : used by privileged applications • 1024 ≤ available for use ≤ 49151 • 49152 ≤ Dynamic ports/private ports ≤ 65535 – http ports 80 and 8080 – telnet 23, ftp 21, etc • Think of a telephone network … Addressing and Packet Format • The ``Data'' segment contains higher level protocol information. Start (7 bytes) – Which protocol is this packet destined for? – Which process is the packet destined for? – Which packet is this in a sequence of packets? Destination (6) Source (6) Length (2) – What kind of packet is this? • This is the stuff of the OSI reference model. Msg Data (1500) Checksum (4) Ethernet packet dispatching • • • • • • • • • An incoming packet comes into the Ethernet controller. The Ethernet controller reads it off the network into a buffer. It interrupts the CPU. A network interrupt handler reads the packet out of the controller into memory. A dispatch routine looks at the Data part and hands it to a higher level protocol The higher level protocol copies it out into user space. A program manipulates the data. The output path is similar. Consider what happens when you send mail. Hi Dad. Example: Mail To: Dad Hi Dad. SrcAddr: 128.95.1.2 DestAddr: 128.95.1.3 SrcPort: 110, DestPort: 110Bytes: 1-20 Hi Dad. To: Dad Mail Composition And Display User Mail Transport Layer Kernel To: Dad Hi Dad. SrcAddr: 128.95.1.2 DestAddr: 128.95.1.3 SrcPort: 110, DestPort: 110Bytes: 1-20 To: Dad Network Transport Layer Hi Dad. SrcEther: 0xdeadbeef DestEther: 0xfeedface SrcAddr: 128.95.1.2 DestAddr: 128.95.1.3 SrcPort: 100 DestPort: 200Bytes: 1-20 Hi Dad. Link Layer To: Dad Hi Dad. SrcEther: 0xdeadbeef DestEther: 0xfeedface SrcAddr: 128.95.1.2 DestAddr: 128.95.1.3 SrcPort: 100 DestPort: 200Bytes: 1-20 To: Dad Network Hi Dad. Protocol encapsulation user X “Hello” user Y e-mail client “Hello” e-mail server TCP server “Hello” TCP server IP server “Hello” IP server “Hello” ethernet driver/card ethernet driver/card End-to-End Argument • What function to implement in each layer? • Saltzer, Reed, Clarke 1984 – A function can be correctly and completely implemented only with the knowledge and help of applications standing at the communication endpoints – Argues for moving function upward in a layered architecture • Should the network guarantee packet delivery ? – Think about a file transfer program – Read file from disk, send it, the receiver reads packets and writes them to the disk End-to-End Argument • If the network guaranteed packet delivery – one might think that the applications would be simpler • No need to worry about retransmits – But need to check that file was written to the remote disk intact • A check is necessary if nodes can fail – Consequently, applications need to perform their retransmits • No need to burden the internals of the network with properties that can, and must, be implemented at the periphery End-to-End Argument • An Occam’s razor for Internet design – If there is a problem, the simplest explanation is probably the correct one • Application-specific properties are best provided by the applications, not the network – Guaranteed, or ordered, packet delivery, duplicate suppression, security, etc. • The internet performs the simplest packet routing and delivery service it can – Packets are sent on a best-effort basis – Higher-level applications do the rest Two ways to handle networking • Circuit Switching – What you get when you make a phone call – Dedicated circuit per call • Packet Switching – What you get when you send a bunch of letters – Network bandwidth consumed only when sending – Packets are routed independently Circuit Switching • End-to-end resources reserved for “call” – – – – Link bandwidth, switch capacity Dedicated resources: no sharing Circuit-like (guaranteed) performance Call setup required Packet Switching • Each end-to-end data stream divided into packets – User’s packets share network resources • Compared to dedicated allocation – Each packet uses full link bandwidth • Compared to dividing bandwidth into pieces – Resources are used as needed • Compared to resource reservation • Resource contention: – Aggregate demand can exceed amount available – Congestion: packets queue, wait for link use – Store and forward: packets move one hop at a time • Transmit over link • Wait turn at next link Routing • Goal: move data among routers from source to dest. • Datagram packet network: – – – – Destination address determines next hop Routes may change during session Analogy: driving, asking directions No notion of call state • Circuit-switched network: – Call allocated time slots of bandwidth at each link – Fixed path (for call) determined at call setup – Switches maintain lots of per call state: resource allocation Packet vs. Circuit Switching • Reliability: no congestion, in-order data in circuit-switch • Packet switching: better bandwidth use • State, resources: packet switching has less state – Good: less control plane processing resources along the way – More data plane (address lookup) processing • Failure modes (routers/links down) – Packet switch reconfigures sub-second timescale – Circuit switching: more complicated • Involves all switches in the path A small Internet W w,e5 V Scenario: A wants to send data to B. R r3 r1,e1 a,e3 A r2,e2 b,e4 B Packet forwarding Host A Host B Router R Router W HTTP HTTP TCP TCP IP IP ethernet eth IP link link IP eth ethernet The Link Layer What is purpose of this layer? • Physically encode bits on the wire • Link = pipe to send information – E.g. point to point or broadcast • Can be built out of: – Twisted pair, coaxial cable, optical fiber, radio waves, etc • Links should only be able to send data – Could corrupt, lose, reorder, duplicate, (fail in other ways) How to connect routers/machines? • WAN/Router Connections – Commercial: • • • • • T1 (1.5 Mbps), T3 (44 Mbps) OC1 (51 Mbps), OC3 (155 Mbps) ISDN (64 Kbps) Frame Relay (1-100 Mbps, usually 1.5 Mbps) ATM (some Gbps) – To your home: • DSL • Cable • Local Area: – Ethernet: IEEE 802.3 (10 Mbps, 100 Mbps, 1 Gbps, 10Gbps) – Wireless: IEEE 802.11 b/g/a (11 Mbps, 22 Mbps, 54 Mbps) Link level Issues • • • • Encoding: map bits to analog signals Framing: Group bits into frames (packets) Arbitration: multiple senders, one resource Addressing: multiple receivers, one wire Encoding • Map 1s and 0s to electric signals • Simple scheme: Non-Return to Zero (NRZ) – 0 = low voltage, 1 = high voltage 1 0 1 1 0 • Problems: – How to tell an error? When jammed? When is bus idle? – When to sample? Clock recovery is difficult. • Idea: Recover clock using encoding transitions Manchester Encoding • Used by Ethernet • Idea: Map 0 to low-to-high transition, 1 to high-to-low 0 1 1 0 • Plusses: can detect dead-link, can recover clock • Bad: reduce bandwidth, i.e. bit rate = ½ baud rate – If wire can do X transition per second? Framing • Why send packets? – Error control • How do you know when to stop reading? – Sentinel approach: send start and end sequence – For example, if sentinel is 11111 – 11111 00101001111100 11111 10101001 11111 010011 11111 – What if sentinel appears in the data? • map sentinel to something else, receiver maps it back – Bit stuffing Example: HDLC • Same sentinel for begin and end: 0111 1110 • packet format: 0111 1110 header data CRC • Bit stuffing – Sender: If 5 1s then insert a 0 0111 1110 0111 1101 0 – Receiver: if 5 1s followed by a 0, remove 0 0111 1101 0 • Else read next bit 0111 1110 • Packet size now depends on the contents 0111 1110 Arbitration • One medium, multiple senders – What did we do for CPU, memory, readers/writers? – New Problem: No centralized control • Approaches – TDMA: Time Division Multiple Access • Divide time into slots, round robin among senders • If you exceed the capacity  do not admit more (busy signal) – FDMA: Frequency Division Multiple Access (AMPS) • Divide spectrum into channels, give each sender a channel • If no more channels available, give a busy signal – Good for continuous streams: fixed delay, constant data rate – Bad for bursty Internet traffic: idle slots Ethernet • Developed in 1976, Metcalfe and Boggs at Xerox • Uses CSMA/CD: – Carrier Sense Multiple Access with Collision Detection • Easy way to connect LANs Metcalfe’s Ethernet sketch CSMA/CD • Carrier Sense: – Listen before you speak • Multiple Access: – Multiple hosts can access the network • Collision Detection: – Can make out if someone else started speaking Older Ethernet Frame CSMA Wait until carrier free CSMA/CA Garbled signals If the sender detects a collision, it will stop and then retry! What is the problem? CSMA/CD Packet? No Sense Carrier Send Detect Collision Yes Discard Packet attempts < 16 attempts == 16 Jam channel b=CalcBackoff(); wait(b); attempts++; Ethernet’s CSMA/CD (more) Jam Signal: make sure all other transmitters are aware of collision; 48 bits; Exponential Backoff: • Goal: adapt retransmission attempts to estimated current load – heavy load: random wait will be longer • first collision: choose K from {0,1}; delay is K x 512 bit transmission times • after second collision: choose K from {0,1,2,3}… • after ten or more collisions, choose K from {0,1,2,3,4,…,1023} Packet Size If packets are too small, the collision goes unnoticed  Limit packet size  Limit network diameter  Use CRC to check frame integrity  truncated packets are filtered out Ethernet Problems • What if there is a malicious user? – Might not use exponential backoff – Might listen promiscuously to packets • Integrating Fast and Gigabit Ethernet Addressing & ARP 128.84.96.89 128.84.96.90 128.84.96.91 “What is the physical address of the host named 128.84.96.89” “I’m at 1a:34:2c:9a:de:cc” • ARP is used to discover physical addresses • ARP = Address Resolution Protocol Addressing & RARP ??? 128.84.96.90 RARP Server 128.84.96.91 “I just got here. My physical address is 1a:34:2c:9a:de:cc. What’s my name ?” “Your name is 128.84.96.89” • RARP is used to discover virtual addresses • RARP = Reverse Address Resolution Protocol Repeaters and Bridges • Both connect LAN segments • Usually do not originate data • Repeaters (Hubs): physical layer devices – forward packets on all LAN segments – Useful for increasing range – Increases contention • Bridges: link layer devices – Forward packets only if meant on that segment – Isolates congestion – More expensive Backbone Bridge The Network Layer Purpose of Network layer • Given a packet, send it across the network to destination • 2 key issues: – Portability: • connect different technologies – Scalability • To the Internet scale application transport network data link physical network data link physical network data link physical network data link physical network data link physical network data link physical network data link physical network data link physical network data link physical application transport network data link physical What does it involve? Two important functions: • routing: determine path from source to dest. • forwarding: move packets from router’s input to output T1 T3 Sts-1 T3 T1 Network service model Q: What service model for “channel” transporting packets from sender to receiver? • • • • • guaranteed bandwidth? preservation of inter-packet timing (no jitter)? loss-free delivery? in-order delivery? congestion feedback to sender? The most important abstraction provided by network layer: Which things can be “faked” at the transport layer? ? ? ? virtual circuit or datagram? Two connection models • Connectionless (or “datagram”): – each packet contains enough information that routers can decide how to get it to its final destination b A b • Connection-oriented (or “virtual circuit”) B C – first set up a connection between two nodes – label it (called a virtual circuit identifier (VCI)) – all packets carry label A 1 1 1 B C Virtual circuits: signaling protocols • • • • used to setup, maintain teardown VC setup gives opportunity to reserve resources used in ATM, frame-relay, X.25 not used in today’s Internet application transport 5. Data flow begins network 4. Call connected data link 1. Initiate call physical 6. Receive data application 3. Accept call transport 2. incoming call network data link physical Virtual circuit switching • Forming a circuit: – send a connection request from A to B. Contains VCI + address of B – rule: VCI must be unique on the link its used on – switch creates an entry mapping input messages with VCI to output port – switch picks a new VCI unique between it and next switch 2 1 2 a c 1 5 2 1 b Virtual circuit forwarding • For each VCI switch has a table which maps input link to output link and gives the new VCI to use – if a’s messages come into switch 1 on link 2 and go out on link 3 then the table will be: (Input link,VCI) (output link, new VCI) (1, 2) (?, ?) (1, 5) (?, ?) Switch 1 2 1 2 a Switch 2 5 2 c 1 Switch 3 1 b Virtual Circuits: Discussion • Plusses: easy to associate resources with VC – Easy to provide QoS guarantees (bandwidth, delay) – Very little state in packet • Minuses: – Not good in case of crashes • Requires explicit connect and teardown phases – What if teardown does not get to all routers? – What if one switch crashes? • Will have to teardown and rebuild route Datagram networks no call setup at network layer • • routers: no state about end-to-end connections – no network-level concept of “connection” • packets typically routed using destination host ID – packets between same source-dest pair may take different paths • Best effort: data corruption, packet drops, route loops application transport network data link 1. Send data physical application transport 2. Receive data network data link physical Datagrams: Forwarding How does packet get to the destination? • switch creates a “forwarding table”, mapping destinations to output port (ignores input ports) • when a packet with a destination address in the table arrives, it pushes it out on the appropriate output port • when a packet with a destination address not in the table arrives, it must find out more routing information (next problem) d 0 S1 2 1 0 S2 c 1 2 a 0 S3 1 b Datagrams • Plusses: – – – – No round trip connection setup time No explicit route teardown No resource reservation  each flow could get max bandwidth Easily handles switch failures; routes around it • Minuses – Difficult to provide resource guarantees – Higher per packet overhead • Internet uses datagrams: IP (Internet Protocol) Datagrams Forwarding • How to build forwarding tables? – Manually enter it • What if nodes crashed • What about scale? • The graph-theoretic routing problem – Given a graph, with vertices (switches), edges (links), and edge costs (cost of sending on that link) – Find the least cost path between any two nodes • Path cost =  (cost of edges in path) Simple Routing Algorithm • Choose a central node – All nodes send their (nbr, cost) information to this node – Central node uses info to learn entire topology of the network – It then computes shortest paths between all pairs of nodes • Using All Pair Shortest Path Algorithm – Sends the new matrix to every node • Nice, simple, elegant! • What is the problem? – Scalability: centralization hurts scalability – Central node is “crushed” with traffic Link State Routing • Basic idea: – Every node propagates its (nbr, cost) information – This information at all nodes is enough to construct topology – Can use a graph algorithm to find the shortest routes • Mechanisms required: – Reliable flooding of link information – Method to calculate shortest route (Dijkstra’s algorithm) • Example link state update packet: – [node id, (nbr, cost) list, seq. no., ttl] • Seq. no. to identify latest updates, ttl specifies when to stop msg. Reliable flooding receive(pkt) If already have a copy of LSP from pkt.ID if pkt’s sequence number <= copy’s discard pkt else decrement pkt.TTL replace copy with pkt forward pkt to all links besides the one that we received it on # done every 10 minutes or so gen_LSP() increment node’s sequence # by one recompute cost vector send created LSP to all neighbors Discussion: Link-State Routing Plusses: • – Simple, determines the optimal route most of the time – Used by OSPF • Minuses: D 1 – Might have oscillations A 1 A 1+e 0 0 0 C e B 1 2+e 0 D 1+e 1 B 0 0 C 0 D 1 A 0 0 C 2+e B 1+e e Initially start with … everyone goes with … recompute Least loaded => almost equal routes least loaded 2+e A 0 D 1+e 1 B e 0 C … recompute Most loaded – Avoid using load as cost metric, reduce herding effect Is our routing algorithm scalable? • Route table size grows with size of network – Because our address structure is flat! • Solution: have a hierarchical structure – Used by OSPF – Divide the network into areas, each area has unique number • Nodes carry their area number in the address 1.A, 2.B, etc. – Nodes know complete topology in their area – Area border routers (ABR) know how to get to any other area Hierarchical Addressing Zone 2 2.a 1.b 0 S1 2 1 0 1.a Forwarding table for switch 1 Destination switch port 2. ? 3. ? 1.b ? 1.a ? 1 2 S2 2.b 3 0 S3 1 3.b 2 Zone 3 3.a IP has 2-layer addressing • Each IP address is 32 bits – Network part: which network the host is on? – Host part: identifies the host. • All hosts on same network have the same network part 18.26.0.1 network 32-bits host • 3 classes of addresses: A, B and C 0 net 1 7 host 24 bits 1 0 net host 110 net host 2 16 bits 3 21 8 bits 14 IP addressing • The different classes: class Unicast A 0 network B 10 C 110 Multicast D 1110 Reserved E 1111 1.0.0.0 to 127.255.255.255 host network 128.0.0.0 to 191.255.255.255 host network host multicast address reserved 32 bits • Problems: inefficient, address space exhaustion 192.0.0.0 to 223.255.255.255 224.0.0.0 to 239.255.255.255 240.0.0.0 to 255.255.255.255 IP addressing: CIDR • Classless InterDomain Routing – network portion of address of arbitrary length – address format: a.b.c.d/x, where x is # bits in network portion network part host part 11001000 00010111 00010000 00000000 – Examples: • Class A: /8 • Class B: /16 • Class C: /24 200.23.16.0/23 Internet Protocol Datagram IP protocol version Number header length “type” of data max number remaining hops (decremented at each router) upper layer protocol to deliver payload to 32 bits type of ver head. len service length fragment 16-bit identifier flgs offset time to upper Internet layer live checksum total datagram length (bytes) for fragmentation/ reassembly 32 bit source IP address 32 bit destination IP address Options (if any) data (variable length, typically a TCP or UDP segment) E.g. timestamp, record route taken, pecify list of routers to visit. Datagram Portability • IP Goal: To create one logical network from multiple physical networks – All intermediate routers should understand IP – IP header information sufficient to carry the packet to destination – Goal: Run over anything! • Problem: – Physical networks have different MTUs – “max. transmission unit”: 1500 for Ethernet, 48 for ATM • Solution 1: – Fit everything in the MTU (!) IP Fragmentation & Reassembly • Solution 2: (the one used) – If packet size > MTU of network, then fragment into pieces • Each fragment is less than MTU size • Each has IP headers + frag bit set + frag id + offset – Packets may get refragmented on the way to destination – Reassembly only done at the destination reassembly – What is a good initial packet size? fragmentation: in: one large datagram out: 3 smaller datagrams Internet: Names and Addresses Naming in the Internet • What are named? All Internet Resources. – Objects: www.cs.cornell.edu/~einar – Services: weather.yahoo.com/forecast – Hosts: planetlab1.cs.cornell.edu • Characteristics of Internet Names – human recognizable – unique – Persistent? • Universal Resource Names (URNs) Locating the resources • Internet services and resources are provided by end-hosts – ex. www1.cs.cornell.edu and www2.cs.cornell.edu host Einar’s home page. • Names are mapped to Locations – Universal Resource Locators (URL) – Embedded in the name itself: ex. weather.yahoo.com/forecast • Semantics of Internet naming  human recognizable  uniqueness x persistent Locating the Hosts? • Internet Protocol Addresses (IP Addresses) – ex. planetlab1.cs.cornell.edu  128.84.154.49 • Characteristics of IP Addresses – 32 bit fixed-length – enables network routers to efficiently handle packets in the Internet • Locating services on hosts – port numbers (16 bit unsigned integer) 65536 ports – standard ports: HTTP 80, FTP 20, SSH 22, Telnet 20 Mapping Not 1 to 1 • One host may map to more than one name – One server machine may be the web server (www.foo.com), mail server (mail.foo.com)etc. • One host may have more than one IP address – IP addresses are per network interface • But IP addresses are generally unique! – two globally visible machines should not have the same IP address – Anycast is an Exception: • routers send packets dynamically to the closest host matching an anycast address How to get a name? • Naming in Internet is Hierarchical – decreases centralization – improves name space management • First, get a domain name then you are free to assign sub names in that domain – How to get a domain name coming up • Example: weather.yahoo.com belongs to yahoo.com which belongs to .com – regulated by global non-profit bodies Domain name structure root (unnamed) com edu gov mil net org gTLDs lucent cornell ustreas ... fr gr us uk ... ccTLDs second level (sub-)domains gTLDs= Generic Top Level Domains ccTLDs = Country Code Top Level Domains Top-level Domains (TLDs) • Generic Top Level Domains (gTLDs) – – – – – – – .com - commercial organizations .org - not-for-profit organizations .edu - educational organizations .mil - military organizations .gov - governmental organizations .net - network service providers New: .biz, .info, .name, .xxx (nearly..) • Country code Top Level Domains (ccTLDs) – One for each country How to get a domain name? • In 1998, non-profit corporation, Internet Corporation for Assigned Names and Numbers (ICANN), was formed to assume responsibility from the US Government • ICANN authorizes other companies to register domains in com, org and net and new gTLDs – Network Solutions is largest and in transitional period between US Govt and ICANN had sole authority to register domains in com, org and net ICANN and politics.. • Why should a US company control Internet naming? • Should companies (from whatever country) be able to profit from internet names? – 28th Aug 2006: “ICANN to allow domain registries to charge ‘what the market will bear’ for domain names & renewals” How to get an IP Address? • Answer 1: Normally, answer is get an IP address from your upstream provider – This is essential to maintain efficient routing! • Answer 2: If you need lots of IP addresses then you can acquire your own block of them. – IP address space is a scarce resource - must prove you have fully utilized a small block before can ask for a larger one and pay $$ (Jan 2002 - $2250/year for /20 and $18000/year for a /14) How to get lots of IP Addresses? Internet Registries RIPE NCC (Riseaux IP Europiens Network Coordination Centre) for Europe, Middle-East, Africa APNIC (Asia Pacific Network Information Centre )for Asia and Pacific ARIN (American Registry for Internet Numbers) for the Americas, the Caribbean, sub-saharan Africa Note: Once again regional distribution is important for efficient routing! Can also get Autonomous System Numbers (ASNs from these registries Are there enough addresses? • Unfortunately No! – 32 bits  4 billion unique addresses – but addresses are assigned in chunks – ex. cornell has four chunks of /16 addressed • ex. 128.84.0.0 to 128.84.255.255 • 128.253.0.0, 128.84.0.0, 132.236.0.0, and 140.251.0.0 • Expanding the address space! – IPv6 128 bit addresses – difficult to deploy (requires cooperation and changes to the core of the Internet) DHCP and NATs • Dynamic Host Control Protocol – lease IP addresses for short time intervals – hosts may refresh addresses periodically  only live hosts need valid IP addresses • Network Address Translators – Hide local IP addresses from rest of the world – only a small number of IP addresses are visible outside  solves address shortage for all practical purposes  access is highly restricted • ex. peer-to-peer communication is difficult NATs in operation • Translate addresses when packets traverse through NATs • Use port numbers to increase number of supportable flows DNS: Domain Name System Domain Name System: • distributed database implemented in hierarchy of many name servers • application-layer protocol host, routers, name servers to communicate to resolve names (address/name translation) – Note how a core Internet function is implemented as application-layer protocol – complexity at network’s “edge” DNS name servers How could we provide this service? Why not centralize DNS? • • • • single point of failure traffic volume distant centralized database maintenance doesn’t scale! • no server has all name-to-IP address mappings Name server: process running on a host that processes DNS requests local name servers: – each ISP, company has local (default) name server – host DNS query first goes to local name server authoritative name server: – can perform name/address translation for a specific domain or zone Name Server Zone Structure root com gov edu lucent mil net org fr gr us uk Structure based on administrative issues. ustreas irs www Zone: subtree with common administration authority. Name Servers (NS) root com gov edu lucent cornell ustreas customs ... Root NS Lucent NS Ustreas NS irs IRS NS www Name Servers (NS) • NSs are duplicated for reliability. • Each domain must have a primary and secondary. • Each host knows the IP address of the local NS. • Each NS knows the IP addresses of all root NSs. DNS: Root name servers • contacted by local name server that can not resolve name • root name server: – Knows the authoritative name server for main domain • ~ 60 root name servers worldwide – real-world application of anycast Simple DNS example root name server host surf.eurecom.fr wants IP address of 2 www.cs.cornell.edu 5 1. Contacts its local DNS server, dns.eurecom.fr 2. dns.eurecom.fr contacts root name server, if necessary local name server dns.eurecom.fr 3. root name server contacts authoritative name server, 1 6 dns.cornell.edu, if necessary (what might be wrong with this?) requesting host surf.eurecom.fr 4 3 authorititive name server dns.cornell.edu www.cs.cornell.edu root name server .edu name server DNS example Root name server: 2 • may not know authoritative name server local name server dns.eurecom.fr • may know intermediate name server: who to contact to find authoritative name 1 server 4 3 5 6 7 8 9 intermediate name server dns.cornell.edu 10 requesting host authoritative name server dns.cs.cornell.edu surf.eurecom.fr www.cs.cornell.edu DNS Architecture • Hierarchical Namespace Management – domains and sub-domains – distributed and localized authority • Authoritative Nameservers – server mappings for specific sub-domains – more than one (at least two for failure resilience) • Caching to mitigate load on root servers – time-to-live (ttl) used to delete expired cached mappings DNS: query resolution root name server .edu name server iterated query: • contacted server replies with name of iterated query 2 4 server to contact 3 recursive • “I don’t know this 5 query name, but ask this 6 server” 9 • Takes burden off root local name server intermediate name server servers recursive query: • puts burden of name resolution on contacted name server • reduces latency dns.eurecom.fr 1 10 requesting host dns.cornell.edu 8 7 authoritative name server dns.cs.cornell.edu surf.eurecom.fr www.cs.cornell.edu DNS records: More than Name to IP Address DNS: distributed db storing resource records (RR) RR format: (name, • Type=A – name is hostname – value is IP address – One we’ve been discussing; most common • Type=NS – name is domain (e.g. foo.com) – value is IP address of authoritative name server for this domain value, type,ttl) • Type=CNAME – name is an alias name for some “cannonical” (the real) name – value is cannonical name • Type=MX – value is hostname of mailserver associated with name nslookup • Use to query DNS servers (not telnet like with http – why?) • Examples: – nslookup www.yahoo.com – nslookup www.yahoo.com dns.cs.cornell.edu • specify which local nameserver to use – nslookup –type=mx cs.cornell.edu • specify record type PTR Records • Do reverse mapping from IP address to name • Why is that hard? Which name server is responsible for that mapping? How do you find them? • Answer: special root domain, arpa, for reverse lookups Arpa top level domain Want to know machine name for 128.30.33.1? Issue a PTR request for 1.33.30.128.in-addr.arpa root arpa com gov edu mil net org In-addr ietf gr 33 1 us uk www.ietf.org. www 128 30 fr 1.33.30.128.in-addr.arpa. Why is it backwards? • Notice that 1.30.33.128.in-addr.arpa is written in order of increasing scope of authority just like www.cs.foo.edu • Edu largest scope of authority; foo.edu less, down to single machine www.cs.foo.edu • Arpa largest scope of authority; in-addr.arpa less, down to single machine 1.30.33.128.inaddr.arpa (or 128.33.30.1) In-addr.arpa domain • When an organization acquires a domain name, they receive authority over the corresponding part of the domain name space. • When an organization acquires a block of IP address space, they receive authority over the corresponding part of the in-addr.arpa space. • Example: Acquire domain berkeley.edu and acquire a class B IP Network ID 128.143 DNS protocol, messages DNS protocol : query and reply messages, both with same message format msg header • identification: 16 bit # for query, reply to query uses same # • flags: – query or reply – recursion desired – recursion available – reply is authoritative – reply was truncated DNS protocol, messages Name, type fields for a query RRs in reponse to query records for authoritative servers additional “helpful” info that may be used The Transport Layer Purpose of this layer • Interface end-to-end applications and protocols – Turn best-effort IP into a usable interface • Data transfer b/w processes: – Compared to end-to-end IP • We will look at 2: – TCP – UDP application transport network data link physical network data link physical network data link physical network data link physical network data link physical network data link physical application transport network data link physical UDP • Unreliable Datagram Protocol • Best effort data delivery between processes – No frills, bare bones transport protocol – Packet may be lost, out of order • Connectionless protocol: – No handshaking between sender and receiver – Each UDP datagram handled independently UDP Functionality • Multiplexing/Demultiplexing – Using ports • Checksums (optional) – Check for corruption application-layer data segment header segment Ht M Hn segment P1 M application transport network P3 M M P4 application transport network receiver M P2 application transport network Multiplexing/Demultiplexing • Multiplexing: – Gather data from multiple processes, envelope data with header – Header has src port, dest port for multiplexing • Why not process id? • Demultiplexing: – Separate incoming data in machine to different applications – Demux based on sender addr, src and dest port 32 bits Length, in bytes of UDP segment, including header source port # dest port # length checksum Application data (message) UDP segment format Implementing Ports • As a message queue – Append incoming message to the end – Much like a mailbox file • If queue full, message can be discarded • When application reads from socket – OS removes some bytes from the head of the queue • If queue empty, application blocks waiting UDP Checksum • Over the headers and data – Ensures integrity end-to-end – 1’s complement sum of segment contents • Is optional in UDP • If checksum is non-zero, and receiver computes another value: – Silently drop the packet, no error message detected UDP Discussion • Why UDP? – No delay in connection establishment – Simple: no connection state – Small header size – No congestion control: can blast packets • Uses: – Streaming media, DNS, SNMP – Could add application specific error recovery TCP • Transmission Control Protocol – Reliable, in-order, process-to-process, two-way byte stream • Different from UDP – Connection-oriented – Error recovery: Packet loss, duplication, corruption, reordering • A number of applications require this guarantee – Web browsers use TCP Handling Packet Loss sender message receiver time There are a number of reasons why the packet may get lost: - router congestion, lossy medium, etc. How does sender know of a successful packet send? Lost Acks sender message timeout time ack What if packet/ack is lost? receiver Delayed ACKs sender message timeout time receiver ack message What will happen here? Due to congestion, small timeout, … Delayed ACKs  duplicate packets Delayed ACKs sender m1 receiver timeout ack time m1 m2 timeout ack How to solve this scenario? Insertion of Packets sender m1 receiver ack1 time m2 m2’ ack2 m2’ could be from an old expired session! Message Identifiers • Each message has <message id, session id> – Message id: uniquely identifies message in sender – Session id: unique across sessions • Message ids detect duplication, reordering • Session ids detect packet from old sessions • TCP’s sequence number has similar functionality: – Initial number chosen randomly – Unique across packets – Incremented by length of data bytes TCP Packets 32 bits URG: urgent data (generally not used) ACK: ACK # valid PSH: push data now (generally not used) RST, SYN, FIN: connection estab (setup, teardown commands) Internet checksum (as in UDP) source port # dest port # sequence number acknowledgement number head not UA P R S F len used checksum rcvr window size ptr urgent data Options (variable length) application data (variable length) counting by bytes of data (not segments!) # bytes rcvr willing to accept sender TCP Connection Establishment receiver TCP is connection-oriented. Starts with a 3-way handshake. Protects against duplicate SYN packets. TCP Usage sender receiver TCP timeouts • What is a good timeout period ? – Want to improve throughput without unnecessary transmissions NewAverageRTT = (1 - ) OldAverageRTT +  LatestRTT NewAverageDev = (1 - ) OldAverageDev +  LatestDev where LatestRTT = (ack_receive_time – send_time), LatestDev = |LatestRTT – AverageRTT|,  = 1/8, typically. Timeout = AverageRTT + 4*AverageDev • Timeout is thus a function of RTT and deviation TCP Windows • Multiple outstanding packets can increase throughput TCP Windows DATA, id=17 DATA, id=18 DATA, id=19 DATA, id=20 ACK 17 ACK 18 ACK 19 ACK 20 • Can have more than one packet in transit • Especially over fat pipes, e.g. satellite connection • Need to keep track of all packets within the window • Need to adjust window size TCP Congestion Control • TCP increases its window size when no packets dropped • It halves the window size when a packet drop occurs – A packet drop is evident from the acknowledgements • Therefore, it slowly builds to the max bandwidth, and hover around the max – It doesn’t achieve the max possible though – Instead, it shares the bandwidth well with other TCP connections • This linear-increase, exponential backoff in the face of congestion is termed TCP-friendliness TCP Window Size Max Bandwidth • Linear increase • Exponential backoff Bandwidth • Assuming no other losses in the network except those due to bandwidth Time TCP Fairness A D Bottleneck Link Bandwidth for Host A B Bandwidth for Host B • Want to share the bottleneck link fairly between two flows TCP Slow Start • Linear increase takes a long time to build up a window size that matches the link bandwidth*delay • Most file transactions are not long enough • Consequently, TCP can spend a lot of time with small windows, never getting the chance to reach a sufficiently large window size • Fix: Allow TCP to build up to a large window size initially by doubling the window size until first loss TCP Slow Start Max Bandwidth • Initial phase of exponential increase Bandwidth • Assuming no other losses in the network except those due to bandwidth Time TCP Summary • Reliable ordered message delivery • Connection oriented, 3-way handshake • Transmission window for better throughput • Timeouts based on link parameters • Congestion control • Linear increase, exponential backoff • Fast adaptation • Exponential increase in the initial phase

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Introduction to Distributed Systems and Networking