Computer Networks (CS 778) Chapter 3: Packet Switching

Packet switching moves data over long distances (not just 1 link).
Two packet-switching approaches: connection-oriented and connectionless.
Forwarding or Switching: routing packets from an input to the right output.
The key problem a packet switch must deal with is the finite bandwidth of its outputs.
Contention: packets arrive for an output faster than its capacity (buffered).
Congestion: the switch runs out of buffer space.
This chapter deals with forwarding and contention in packet switches.
LAN switching & ATM switching: the 2 main network packet-switching technologies.
Switching/Forwarding (which layer? OSI: network; Internet: IP; ATM).

Switch: a multi-input, multi-output device which transfers packets from an input to one output (called switching or forwarding).
Assume bidirectional links (in a wired world, a link is an inPort-outPort pair).
Switches add the star topology to the topologies we have seen so far (pt-pt, bus, ring).
A star allows a hierarchy and virtually unlimited size.
Stars are scalable (a host can be added to a switch without decreasing performance for others, assuming the switch "backplane" bandwidth is the sum of the link bandwidths).

How does a switch decide which output to place a packet on? Several approaches:
  Datagrams (connectionless approach)
  Virtual Circuits (connection-oriented approach)
  Source Routing (simple approach - less common than the other two)
(The switch in the figure has two T3 links and one STS-1 SONET link.)

Event Timing (figure)

Datagrams (and datagram networks)
No setup phase (connectionless model).
Each packet contains the full destination address.
Hosts never know if the network can deliver or even if the destination can receive.
Forwarding tables, e.g., at SW2: A,C,D -> 3; B,G,H -> 0; E -> 2.
Table creation (Chapter 4) is hard: the topology may change or there may be multiple paths.
E.g., successive packets from A to B can follow different paths.
A switch or link failure may not preclude communication: tables may be updated to route around the failure.
This capability goes back to the ARPANET (forerunner of the Internet); since it was a military network, this capability was essential.

Virtual Circuit (VC) Switching (connection-oriented)
Setup phase: establishes connection state in each switch along the connection
  - by the System Admin, for a long-lived permanent virtual circuit (PVC), or
  - by the host, which sends a setup request into the net for a switched virtual circuit (SVC) (signaling).
Transfer phase. Teardown phase.
All packets follow the same circuit (analogous to phone calls).
Each switch keeps a VC table with VC state entries: | inPort | inVCI | outPort | outVCI |   (VCI = VC identifier)
The combination (inPort, inVCI) uniquely identifies a VC through a particular link.
VCIs are not globally unique (in fact, inVCI & outVCI usually differ); VCIs have link-local scope. WHY??

Virtual Circuit (Continued)
For a PVC, the Network Administrator picks an unused VCI for each link (e.g., 5, 11, 7, 4).
VC-table entries at each switch are:
        inPort  inVCI  outPort  outVCI
  SW1:    2       5       1       11
  SW2:    3      11       0        7
  SW3:    0       7       3        4

Virtual Circuit (Continued)
How is signaling done for SVCs? (setup communication)
hostA sends a SetupMessage (SM) to SW1 (with at least the hostA and hostB addresses).
The SM flows on SW2 -> SW3 -> hostB. (How? Routing details later.)
Each SW sets a table entry (inPort, inVCI, outPort, __), choosing an unused inVCI.
Note that the switch (or the host, for the last link) chooses the inVCI for the link coming into it.
VC-table entries would be:
        inPort  inVCI  outPort  outVCI
  SW1:    2       5       1
  SW2:    3      11       0
  SW3:    0       7       3

Virtual Circuit (Continued)
hostB gets the SetupMessage (SM). If willing to accept the connection, it attaches outVCI=4 to the ack
and sends the ack downstream: hostB -> SW3 -> SW2 -> SW1 -> hostA.
Each SW completes its VC-table entry and sends the ack on with the appropriate link VCI.
VC-table entries would be:
        inPort  inVCI  outPort  outVCI
  SW1:    2       5       1       11
  SW2:    3      11       0        7
  SW3:    0       7       3        4
SW1 sends the ack to hostA specifying VCI=5. The setup phase is complete.
The second stage is data transfer.
The third stage is connection teardown (when done sending):
hostA sends a teardown message (TD) to SW1 (SW1 removes its table entry);
the TD is sent SW1 -> SW2 -> SW3 -> hostB, and each SW does similarly.
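The per-switch VC-table handling above can be sketched as a lookup keyed by (inPort, inVCI). A minimal sketch in Python, reusing the SW1/SW2/SW3 entries from the example; the class and method names are illustrative, not from the notes.

# Minimal sketch of VC forwarding, assuming the example tables above.
# Names (VCSwitch, forward) are illustrative, not from the lecture notes.

class VCSwitch:
    def __init__(self, name):
        self.name = name
        self.table = {}  # (inPort, inVCI) -> (outPort, outVCI)

    def add_entry(self, in_port, in_vci, out_port, out_vci):
        self.table[(in_port, in_vci)] = (out_port, out_vci)

    def forward(self, in_port, in_vci):
        """Translate the incoming (port, VCI) into the outgoing (port, VCI)."""
        if (in_port, in_vci) not in self.table:
            raise KeyError(f"{self.name}: no VC for port {in_port}, VCI {in_vci}")
        return self.table[(in_port, in_vci)]

# PVC entries from the example (VCIs 5, 11, 7, 4 on successive links).
sw1, sw2, sw3 = VCSwitch("SW1"), VCSwitch("SW2"), VCSwitch("SW3")
sw1.add_entry(2, 5, 1, 11)
sw2.add_entry(3, 11, 0, 7)
sw3.add_entry(0, 7, 3, 4)

# A cell from hostA enters SW1 on port 2 with VCI 5 and is relabeled hop by hop,
# illustrating that VCIs have link-local scope.
port, vci = sw1.forward(2, 5)     # -> (1, 11)
port, vci = sw2.forward(3, vci)   # -> (0, 7)
port, vci = sw3.forward(0, vci)   # -> (3, 4), delivered to hostB
print(port, vci)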
Virtual Circuit (versus Datagram)
Virtual Circuit model:
Typically 1 RTT (setup) before the 1st data packet is sent.
Data packets carry only a small identifier (the setup message carries the full destination address), so per-packet header overhead is small.
If a switch/link fails, the connection is broken and a new one needs to be set up.
The host reserves resources at setup and learns a lot (the net is able to transmit, the destination is able to receive).
VCI service is local (no global server involving constant communication overhead).
The most popular VC technologies are:
OSI X.25, which uses the VC model in a 3-part strategy:
  Buffers are allocated along the VC when the circuit is initialized.
  A sliding window is run between pairs of VC nodes for error correction (and flow control).
  Circuit setup is rejected by any node with insufficient buffer availability.
This is called hop-by-hop flow control. Thus, there is contention, but never congestion.
Frame Relay, a straightforward implementation of VC technology.
  Extremely popular due to its simplicity (Frame Relay PVCs provide an almost leased-line-like service).
  Some basic QoS and congestion avoidance is provided, but it is minimal.
ATM (coming up in 3.3)

Datagram (versus Virtual Circuit)
Datagram model:
There is no round-trip-time delay waiting for setup (the host can send data when ready).
The source does not know if the network can deliver a packet or even if the intended destination is up and accepting packets.
Since packets are treated independently, it is possible to route around link/node failures.
Since every packet must carry the full destination address, per-packet overhead is higher than for the connection-oriented model.

Source Routing
Uses neither virtual circuits nor conventional datagrams.
The address contains the entire sequence of outPorts on the source-to-destination path.
The list is rotated so the next outPort is always in front.
Problems? It may be difficult for the source to know the route. The header must be variable size.
Alternatives to rotating outPort addresses:
  Stripping: each SW strips off its outPort (e.g., 3,0,1 to 3,0 at SW1).
  An outPort pointer in a fixed position in the header, advanced at each SW (e.g., |ptr| 3 | 0 | 1 | --> |ptr| 3 | 0 | 1 | at SW1, with the pointer moved to the next outPort).

Source Routing (continued)
Source routing can be used in both datagram networks and VC networks.
The Internet Protocol includes a source routing option; selected packets can be source routed, but the majority are switched datagrams.
Some VC nets use source routing to get the VC setup request along the path.
Source routing suffers from poor scalability (it is hard for a host to know the complete route in a large net).
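A minimal sketch of the rotating-header form of source routing above; the header representation and the function name are illustrative assumptions, not from the notes.

# Sketch of source routing with a rotated outPort list, assuming each switch
# consumes the front entry and rotates it to the back (names are illustrative).

def source_route_hop(header):
    """Return the outPort this switch should use and the header to forward."""
    out_port = header[0]                 # the next hop's outPort is always in front
    rotated = header[1:] + [out_port]    # rotate so the next switch sees its port first
    return out_port, rotated

header = [3, 0, 1]          # outPorts chosen by the source for SW1, SW2, SW3
for sw in ("SW1", "SW2", "SW3"):
    out_port, header = source_route_hop(header)
    print(f"{sw} forwards on outPort {out_port}, header becomes {header}")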
Performance
A switch can be built from a general-purpose workstation; in fact, Unix provides this capability in the kernel.
(We will consider special-purpose switch hardware later.)
Install multiple NICs (Network Interface Cards).
Use DMA for transferring packets between main memory and the NICs.
Build and manage your own buffers.
The CPU needs to inspect only the header information to determine the outPort.
Usually the bottleneck is I/O bus bandwidth (all packets must go through the I/O bus),
so such a switch has a severe limitation on aggregate backplane bandwidth.
(Figure: workstation used as a switch - CPU, main memory, and interfaces 1-3 attached to a shared I/O bus.)

Forwarding vs Routing
Forwarding: select the outPort based on the destination address and the forwarding table.
Routing: the process by which the forwarding table is built.

Bridge: a forwarding switch (between LANs, e.g., Ethernets), AKA LAN switch or LAN bridge.
For Ethernets, one could use a repeater (to forward the signal), but repeaters impose size limitations.
A bridge could be implemented using a node in promiscuous mode between 2 Ethernets (forwarding all packets).
An intelligent bridge (learning bridge) does not forward all packets; it uses a forwarding table: Host -> Port.
The table starts empty; for each packet received, record the sender's port.
If a host is not in the table, forward to all ports (the table is just a filter).
All entries time out after a fixed time, which protects against inaccuracies due to host removal.
(A small sketch of such a table appears after this block, following the ATM overview.)
Loops can form (causing frames to loop forever); thus, bridges run a distributed spanning tree algorithm.
Think of the bridge-extended LAN as a graph (vertexes = bridges, edges = connections).
A spanning tree is an acyclic sub-graph which covers (spans) all vertexes.
A Network as a Graph: (figure: weighted graph over vertices A-F with edge costs.)

Asynchronous Transfer Mode (ATM)
Connection-oriented, packet-switched network - virtual circuit.
Used for both WANs & LANs (but predominantly in long-haul WANs today).
Specified by the ATM Forum (www.atmforum.org).
Commonly transmits over SONET at the physical level (but that is not a requirement).
QoS capabilities are one of the strong selling points.
Fixed-length packets = 53-byte cells: 5-byte header + 48-byte payload.
When any VC is set up, the destination address must appear in the signaling message.
ATM uses 1 of several destination address formats (different from MAC addresses in LANs). Two examples (detail later):
  NSAP (Network Service Access Point)
  E.164
The 48-byte payload was a compromise (the US bid for 64 B and Europe bid for 32 B).

A little history on ATM
Part of the B-ISDN standard of the ITU in 1984.
B-ISDN was motivated by PCs demanding higher bandwidths and lower error rates. It
- was to replace the separate telephone network infrastructure & data networks
- was to allow integration on one digital network fabric
- was to scale to gigabit speeds
- was to provide a flexible way to divide bandwidth into chunks for different traffic
1988: the ITU chose ATM as the underlying switching/multiplexing technology for B-ISDN.
1991: the ATM Forum was founded to replace the ITU as the standards body for ATM.
Planned benefits of ATM:
- Efficient use of network bandwidth (bandwidth on demand)
- Scalability (LAN-WAN, # of users, speed)
- Low latency and low latency variation (virtual circuit and pre-negotiated QoS)
- Transparency to existing applications
- Integrated service
- Internetwork-able with existing WANs
- Support for both constant and variable bit rates
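Returning to the learning bridge above: its Host -> Port table (learn the sender's port, flood unknown destinations, age out stale entries) can be sketched roughly as follows; the class name, the ageing time, and the timestamp handling are illustrative assumptions, not from the notes.

import time

# Rough sketch of a learning bridge's forwarding table, assuming a simple dict
# keyed by source address; the 15-second ageing time is an arbitrary choice.
AGEING_TIME = 15.0

class LearningBridge:
    def __init__(self, ports):
        self.ports = ports
        self.table = {}  # host address -> (port, last_seen timestamp)

    def receive(self, src, dst, in_port):
        now = time.time()
        self.table[src] = (in_port, now)            # learn/refresh the sender's port
        # Drop entries that have not been refreshed recently (host may have moved).
        self.table = {h: (p, t) for h, (p, t) in self.table.items()
                      if now - t < AGEING_TIME}
        entry = self.table.get(dst)
        if entry is None:
            # Unknown destination: flood to every port except the one it came in on.
            return [p for p in self.ports if p != in_port]
        out_port, _ = entry
        # The table is just a filter: never forward back onto the arrival port.
        return [] if out_port == in_port else [out_port]

bridge = LearningBridge(ports=[0, 1, 2, 3])
print(bridge.receive(src="A", dst="B", in_port=2))  # B unknown -> flood [0, 1, 3]
print(bridge.receive(src="B", dst="A", in_port=0))  # A was learned on port 2 -> [2]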
Cells (Variable versus Fixed-Length? Size?)
Fixed-length cells are easier to switch in hardware and simpler, but there is no optimal length:
  if small: header-to-data overhead is high;
  if large: low utilization for small messages.
A small size provides a finer-grained pre-emption point for scheduling a link. E.g., with
  maximum packet = 4 KB = 4096 bytes and link speed = 100 Mbps,
  transmission time = 4096 x 8 bits/packet / 100 Mbps = 327.68 μs/packet.
Thus, a high-priority packet may sit in the queue for 327.68 μs;
in contrast, 53 x 8 / 100 = 4.24 μs/packet for ATM.
(This arithmetic is reproduced in the short sketch at the end of this block.)
Near cut-through behavior, e.g., when two 4 KB packets arrive at the same time:
  the link sits idle for 327.68 μs while both arrive, and at the end of 327.68 μs there are still 8 KB to transmit;
  in contrast, with 53-byte cells the host can transmit the first cell after 4.24 μs,
  and at the end of 327.68 μs there would be just over 4 KB left in the queue.

Cell Format
User-Network Interface (UNI) (host-to-switch format; cell format shown in the figure):
  GFC: Generic Flow Control (intended for traffic control across the user-network interface; not used).
  VPI: Virtual Path Identifier (grows to 12 bits for NNIs, when the GFC goes away).
  VCI: Virtual Circuit Identifier.
  Type:
    1st bit: specifies management versus data cells;
    2nd bit (for data cells): EFCI (Explicit Forward Congestion Indication), set by switches about to become congested;
    3rd bit: user signalling (used in conjunction with AAL-5 to delineate frames).
  CLP: Cell Loss Priority - set by the source host if the cell can be dropped without serious damage to the message.
  HEC: Header Error Check (CRC-8).
Network-Network Interface (NNI) (switch-to-switch format): the GFC becomes part of a larger VPI field.

ATM Model
  | Voice | Video | Data |
  | ATM Adaptation Layer (AAL) |
  | ATM Layer |
  | Physical Layer |

Physical Layer
Physical interfaces and framing protocols.
Several ATM Forum specs exist for physical connectivity between devices:
  DS-1 or T1 at 1.54 Mbps
  DS-3 or T3 at 45 Mbps
  100 Mbps access using the FDDI standard
  155 Mbps access using the Fibre Channel standard on multimode fiber
  SONET (outside the US: SDH, Synchronous Digital Hierarchy) - single/multimode fiber at N x 51.84 Mbps
SONET is the predominant physical layer.
  LEVEL   LINE RATE
  OC-1      51.84 Mbps
  OC-3     155.52 Mbps
  OC-12    622.08 Mbps
  OC-48   2488.32 Mbps

ATM Adaptation Layer (AAL)
The AAL is the interface between user applications and the ATM layer.
It performs SAR: segmentation of packets into ATM cells and reassembly of ATM cells into packets.
It also detects and handles out-of-order or lost cells.
It supports the ATM application-level service classes:
  CBR (Constant Bit Rate): reserves a set bandwidth end-to-end.
  VBR (Variable Bit Rate): bursty traffic (realtime and non-realtime; reserves a variable amount of bandwidth).
  ABR (Available Bit Rate): a minimum bandwidth, with bursting above it without cell loss.
  UBR (Unspecified Bit Rate): best-effort service, similar to the Internet.

  Serv Class  Traffic descriptors (at call setup)   QoS parameters                       Intended uses
  CBR         PCR (Peak Cell Rate)                  CTD, CDV, CLR                        realtime video, voice
  rt-VBR      PCR, SCR (Sustained Cell Rate),       maximum CTD, peak-to-peak CDV, CLR   compressed voice, compressed video, rt-OLTP
              MBS (Max Burst Size)
  ABR         PCR, MCR (Min Cell Rate)              CLR                                  RPC, NFS/DDBMS
  UBR         PCR                                   -                                    FTP (file transfer)
  (CTD = Cell Transfer Delay, CDV = Cell Delay Variation, CLR = Cell Loss Ratio)

Four AAL protocols were originally defined (AAL-1, AAL-2, AAL-3, AAL-4); then AAL-3 and AAL-4 were merged into AAL-3/4, and later AAL-5 was added.
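The cell-size arithmetic flagged above (327.68 μs for a 4 KB packet versus 4.24 μs for a 53-byte cell on a 100 Mbps link) can be checked in a few lines; the function name is an illustrative choice.

# Reproduces the cell-size arithmetic from the notes: serialization delay of a
# 4 KB packet versus a 53-byte ATM cell on a 100 Mbps link.

def transmission_time_us(size_bytes, link_mbps):
    """Time to clock one packet/cell onto the link, in microseconds."""
    return size_bytes * 8 / link_mbps  # bits / (bits per microsecond at N Mbps)

print(transmission_time_us(4096, 100))  # 327.68 us for a 4 KB packet
print(transmission_time_us(53, 100))    # 4.24 us for a 53-byte ATM cell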
Segmentation and Reassembly
(Figure: user packets segmented into ATM cells and reassembled.)
(A generic sketch of this step appears at the end of this block.)

ATM Adaptation Layer (AAL)
AAL 1, 2: designed for applications needing a guaranteed rate (voice, video; CBR, rt-VBR).
AAL 3/4: designed for packet data (nrt-VBR).
AAL 5: an alternative standard for packet data (LAN traffic; connection/connectionless VBR).

Segmentation and Reassembly (details)
(The Convergence Sublayer of the AAL provides an interface to the application; the SAR sublayer converts messages to cells.)

AAL-1
AAL-1 is the protocol used for real-time, constant-bit-rate, connection-oriented traffic, e.g., uncompressed audio and video.
Bits are fed in by the application at a constant rate and must be delivered at the same rate with minimum delay, jitter (variation in rate), and overhead.
One byte (or two) of the ATM payload is used for control information.
P-cells are used when message boundaries must be preserved (the pointer gives the offset to the start of the next message, in bytes).
SN is the cell sequence number.
SNP is the cell-sequence-number checksum (CRC-3); an even parity bit further reduces the likelihood of a bad SN.

AAL-2
AAL-2 is the protocol used for compressed, variable-bit-rate, connection-oriented traffic, e.g., compressed audio and video.
The bit rate can vary strongly over time.
One byte (or two) of the ATM payload is used for control information.
SN is the cell sequence number.
IT stands for Information Type and is used to indicate whether the cell is the start/middle/end of a message.
LI is the length indicator (tells how big the payload is in bytes; it can be less than 45).
CRC is a checksum for the entire cell.
AAL-2 Cell Format (figure)

AAL 3/4
Convergence Sublayer Protocol Data Unit (CS-PDU = AAL3/4 packet), field widths in bits:
  | CPI (8) | Btag (8) | BASize (16) | USER DATA (< 64 Kbytes) | Pad (0-24) | 0 (8) | Etag (8) | Length (16) |
CPI: common part indicator (CS-PDU version); Btag/Etag: begin/end tag;
BASize: buffer size hint; USER DATA: the AAL variable-length payload; Length: PDU size.
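As a generic sketch of the SAR step referenced above: chop a user packet into fixed 48-byte ATM payloads and reassemble it, assuming zero padding of the final cell and that the original length travels separately (as the CS-PDU Length fields below do). Names are illustrative.

# Minimal sketch of the SAR idea: segment a user packet into 48-byte ATM cell
# payloads and reassemble it; the real AAL formats add headers/trailers.
CELL_PAYLOAD = 48

def segment(packet: bytes):
    cells = []
    for i in range(0, len(packet), CELL_PAYLOAD):
        chunk = packet[i:i + CELL_PAYLOAD]
        cells.append(chunk.ljust(CELL_PAYLOAD, b"\x00"))  # pad the final cell
    return cells

def reassemble(cells, original_length: int) -> bytes:
    return b"".join(cells)[:original_length]  # strip the padding again

msg = b"x" * 100                      # a 100-byte user packet
cells = segment(msg)                  # -> 3 cells of 48 bytes each
assert reassemble(cells, len(msg)) == msg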
Originally the ITU had different protocols for connection-oriented and connectionless data transport (i.e., traffic sensitive to loss and errors but not time dependent).
Then they discovered there was no need for 2 protocols, so they were combined into AAL-3/4,
which can operate in stream mode (no message boundaries maintained) or message mode,
provide both reliable and unreliable transport,
and provide multiplexing (not available in any of the others), which gives a host the option of multiplexing multiple sessions onto one VC (saving money, since charging is done per VC).

ATM cell, AAL3/4 format (field widths in bits):
  | ATM header (40) | Type (2) | SEQ (4) | MID (10) | Cell Payload (352 = 44 bytes) | Length (6) | CRC-10 (10) |
Type: BOM/EOM = begin/end of message, COM = continuation of message.
SEQ: sequence number; MID: message id.
The AAL3/4 payload is 44 B (4 B of the standard ATM payload go to the 6 special AAL3/4 fields: Type, SEQ, MID, Length, CRC-10).
Length: # of PDU bytes in the cell.

Segmentation:
  | CS-PDU-header | U S E R  D A T A | CS-PDU-trailer |
  is split into cells
  |ATM-header|AAL-header| Cell-Payload |AAL-trailer| ... |ATM-header|AAL-header| Cell-Payload + padding |AAL-trailer|

AAL5
Convergence Sublayer Protocol Data Unit (CS-PDU) format (AAL5 packet format), trailer field widths in bits:
  | USER DATA (< 64 KB) | pad (0-47 B) | Reserved (16) | Length (16) | CRC-32 (32) |
Pad so that the trailer falls at the end of an ATM cell.
Reserved: for higher-layer sequencing / multiplexing.
Length: size of the PDU (data only - padded to be a multiple of 48 bytes).
CRC-32 (detects missing or misordered cells).
Cell format: the same as AAL3/4 except for an end-of-PDU bit in the Type field of the ATM header.

Segmentation:
  | U S E R  D A T A |pad| CS-PDU-trailer |
  is split into cells
  |ATM-header| Cell-Payload | ... |ATM-header| Cell-Payload |

AAL-1 through AAL-3/4 were designed by the telecom industry without much input from the computer industry.
When the computer industry woke up and realized the implications of the complexity and inefficiency of two headers (2 layers) and the short checksum (10 bits), it invented its own AAL protocol, AAL-5.
It was originally called SEAL, for Simple Efficient Adaptation Layer.
It offers several service options:
  1. Reliable service (guaranteed delivery and flow control)
  2. Unreliable service (no guaranteed delivery - best effort)
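A minimal sketch of the AAL5 padding rule above (pad so the 8-byte trailer lands exactly at the end of a 48-byte cell payload); zlib.crc32 is used only as a stand-in for the AAL5 CRC-32 convention, and the function name is illustrative.

import struct
import zlib

# Sketch of building an AAL5 CS-PDU: user data, padding, then an 8-byte trailer
# (2 reserved bytes, 2-byte Length, 4-byte CRC-32) so the total is a multiple of
# 48 bytes and the trailer ends exactly at a cell boundary.
CELL_PAYLOAD = 48
TRAILER_LEN = 8

def build_aal5_cspdu(user_data: bytes) -> bytes:
    pad_len = (-(len(user_data) + TRAILER_LEN)) % CELL_PAYLOAD
    body = user_data + b"\x00" * pad_len
    trailer_wo_crc = body + b"\x00\x00" + struct.pack(">H", len(user_data))
    crc = zlib.crc32(trailer_wo_crc) & 0xFFFFFFFF   # stand-in for the AAL5 CRC-32
    pdu = trailer_wo_crc + struct.pack(">I", crc)
    assert len(pdu) % CELL_PAYLOAD == 0             # segments cleanly into cells
    return pdu

pdu = build_aal5_cspdu(b"A" * 100)
print(len(pdu), len(pdu) // CELL_PAYLOAD)  # 144 bytes -> 3 cells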
  BUS: Broadcast & Unknown Server (maintains a point-to-multipoint VC to all clients for broadcasting; the LES gives a new LEC the ATM address of the BUS).

VPI/VCI
Host: treat the VPI/VCI together as a 24-bit circuit identifier.
A switch that routes many VCs between company sites can use one VPI instead of many VCIs;
this makes the virtual circuit tables smaller and makes addressing faster.
Network: a VP aggregates multiple circuits into 1 path.

ATM in the LAN
Problem: in common shared-media LANs (e.g., Ethernet, Token Ring), multicast/broadcast is easy since every node is connected to the same link.
Protocols were built to take advantage of easy broadcast (e.g., the Address Resolution Protocol = ARP).
Two solutions:
  Redesign the protocols that make LAN assumptions which are not true of ATM (e.g., ATMARP does not depend on broadcast).
  Make ATM behave more like a shared-media LAN, i.e., support broadcast/multicast without losing the performance advantages of a switched network.
    I.e., add functionality to ATM LANs so that anything that runs over a shared-media LAN runs on an ATM LAN. This is called LAN Emulation, or LANE.
LANE terms & addresses are confusing (host/bridge/router = LAN Emulation Client = LEC).
LANE must provide, e.g., 48-bit MAC addresses to emulate Ethernet.
A VCI is very different from an address (an address is needed for setup; then the VCI is used for transit).
For LANE, the ATM switches do not change; LANE adds servers (at hosts?):
  LECS: LAN Emulation Configuration Server (a new LEC finds the LECS and gets LANE info, the frame size, and the LES address).
  LES: LAN Emulation Server (a new LEC sends its MAC & ATM addresses to the LES; the LES gives back the ATM address of the BUS).
  BUS: Broadcast & Unknown Server (maintains a point-to-multipoint VC to all clients for broadcasting).

Switching Hardware Overview
Terminology: an n x m switch has n inputs and m outputs (usually n = m, but not always).
Design goals: high throughput; scalability (with respect to n).

Ports and Fabrics
(Figure: input ports and output ports connected through a fabric.)
Port: contains electric or optic receivers and transmitters; provides buffers for packets (cells) waiting to be switched or transmitted; contains circuitry.
The inPort determines and attaches the outPort # (in the predominant case of a self-routing fabric).
The inPort is the first place to look for performance bottlenecks.
The inPort deals with the complexities of the outside world so the fabric has a simple job.
Fabric: deliver the presented packet to the right output (as simply as possible); may do buffering also (an internally buffered fabric).

Buffering (and Head-of-line blocking)
Head-of-line blocking: e.g., when inPort FIFO queues have head-of-line cells destined for the same outPort, the cells behind them wait unnecessarily (even though they are destined for other outPorts).
This can reduce throughput to 59% (assuming uniformly distributed arrivals).
The majority of switches use pure outPort or mixed internal/outPort buffering.
Buffering is also important w.r.t. QoS (a simple FIFO cannot always be used; Chapter 6).
Buffering is needed wherever contention is possible:
  input ports (contending for the fabric)
  internal fabric buffers (contending for an output port)
  output ports (contending for links)
(Figure: 2x2 switch with input FIFOs illustrating head-of-line blocking.)

Crossbar Switch
4x4 crossbar: conceptually simple (every input connected to every output).
The only possible contention problem is outPort contention.
The complexity of an outPort grows faster than the number of inPorts; the complexity of the switch is n^2.
Designing a switch with low outPort complexity is difficult. The Knockout Switch is one such design (next slide).

Knockout Switch (not-quite-perfect crossbar)
A perfect crossbar can route packets from all n inPorts to 1 outPort concurrently.
n-by-l knockout concentrator: the outPort can accept l packets.
  Pick l small enough to keep costs low.
  Pick l large enough for hotspots where arrivals concentrate, e.g., a popular website.
Each outPort has 3 parts:
  Filters (recognize packets for this port).
  Concentrator (picks l packets, discards the rest) - a hard job, since it needs to be fair: a winner beats all others in a section, and losers go on to the next section.
  A queue of length l for accepted packets that are as yet untransmitted.
(Figure: 8-to-4 knockout concentrator built from 2x2 elements arranged in sections 1-4, with 4 outputs.)

Knockout Switch Output Port Buffer
Each outPort has l separate buffers.
Buffers are filled round-robin (by a shifter); occupancy levels are always within 1 of each other.
Buffers are emptied in round-robin fashion, preserving arrival order.
(Figure: shifter feeding the buffers - (a) 3 packets arrive; (b) 3 packets arrive, 1 leaves; (c) 1 packet arrives, 1 leaves.)
Knockout Switch (all components) (figure)

Shared Media Switches
Examples include switches built from PCs (sharing the PC bus and memory).
They tend to scale poorly (shared resources get overloaded as the switching task grows).
A nice aspect is the large shared buffer space: built using COTS parts, better utilization is possible.
Only 1 packet is written to memory at a time, so the mux-to-memory bus must be n times faster than the link speed.
Arriving packet: the header is stripped and goes to the write-control logic, which gets a memory address from a freelist, writes the packet to that address, and adds the address to the appropriate outPort list.
The read-control logic takes packets from the outPort lists, sends them to the outPort through a demux, and returns the memory address to the freelist.
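The write-control / read-control flow just described can be sketched roughly as below, with a Python list standing in for the packet memory and deques for the freelist and per-outPort address lists; the class and method names are illustrative, not from the notes.

from collections import deque

# Rough sketch of a shared-memory switch datapath (names are illustrative).

class SharedMemorySwitch:
    def __init__(self, n_ports, n_buffers):
        self.memory = [None] * n_buffers                     # shared packet buffer pool
        self.freelist = deque(range(n_buffers))              # free buffer addresses
        self.out_queues = [deque() for _ in range(n_ports)]  # per-outPort address lists

    def write_ctrl(self, packet, out_port):
        """Store one arriving packet and queue its address on the outPort list."""
        addr = self.freelist.popleft()        # get a memory address from the freelist
        self.memory[addr] = packet            # write the packet to that address
        self.out_queues[out_port].append(addr)

    def read_ctrl(self, out_port):
        """Take the next packet for an outPort and return its buffer to the freelist."""
        if not self.out_queues[out_port]:
            return None
        addr = self.out_queues[out_port].popleft()
        packet = self.memory[addr]
        self.memory[addr] = None
        self.freelist.append(addr)            # the address goes back to the freelist
        return packet

sw = SharedMemorySwitch(n_ports=4, n_buffers=8)
sw.write_ctrl(b"hello", out_port=2)
print(sw.read_ctrl(2))  # b'hello'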
Self-Routing Fabrics: BANYAN
Route 0 = up, 1 = down, on the left bit, then the middle bit, then the right bit.
Banyan network: constructed from simple 2x2 switching elements as above.
The inPort attaches a self-routing header = the binary outPort #; the outPort removes it.
Only one path exists from a given input to a given output.
There are no collisions if the inputs are pre-sorted into ascending order.
Complexity: n log2 n (n/2 switching elements per stage and log2 n stages).
(A short routing sketch appears at the end of this section.)

Banyan Switch example
(Figure: the route two cells take through the switch.)
  6 = 110 (down, down, up)
  1 = 001 (up, up, down)

Banyan Switch examples
Cell collisions on the left, e.g., 5&7, 0&3, 6&4, 2&1, and 2 in the middle, because the inputs are not ordered (assume the lesser is taken).
Collision-free routing on the right (inputs are ordered).
If the cells are sorted by destination and presented on the input lines as 0,2,4,6,1,3,5,7, then there will be no collisions.

Batcher Network
A switching element that sorts its inputs (1 path from each input to each output):
  some elements sort into ascending order and some into descending order (shown by arrows in the figure);
  if only 1 cell arrives, it goes opposite the arrow.
The elements are arranged to implement merge sort. Complexity: n log2 n.
Common design: the Batcher-Banyan switching fabric.
Batcher-Banyan Switch (example with 4 cells) (figure)

Batcher-Banyan Switches
A Batcher-Banyan would have to drop packets whenever 2 are headed for the same outPort.
There are switches that deal with this problem: first came Starlite in 1984, then the Moonshine switch in 1987 and the Sunshine switch in 1991.
They differ only in the way their trap component works.
The l banyans allow accepting up to l packets destined for any one port at a time (the selector makes sure they each go to a different banyan and sends any extras to the Delay for recycling).
The Trap identifies the extras for the Selector to recycle.

High-Speed IP Routers
(Figure: line cards (forwarding, buffering) attached to a switch (possibly ATM); a network processor runs the routing protocol(s), handles exceptional cases, and hosts the routing software with the router OS on the routing CPU; each line card contains the link interface, router lookup (input), common IP path (input), packet queue (output), and buffer memory.)

Alternative Design
(Figure: a crossbar switch interconnecting PCs (CPU + MEM), each attached to multiple network interfaces (NI) with an on-board microprocessor (uP).)
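Going back to the Banyan self-routing rule above (one destination-address bit per stage, 0 = up, 1 = down), here is a small sketch for an 8x8, 3-stage fabric; the function name and the list-of-strings path representation are illustrative.

# Sketch of Banyan self-routing for an 8x8 (3-stage) fabric: at each stage the
# next destination bit picks the upper (0) or lower (1) output of a 2x2 element.

def banyan_path(dest, stages=3):
    """Return the up/down decisions a cell for output `dest` takes, MSB first."""
    bits = [(dest >> (stages - 1 - i)) & 1 for i in range(stages)]
    return ["down" if b else "up" for b in bits]

print(6, format(6, "03b"), banyan_path(6))  # 110 -> ['down', 'down', 'up']
print(1, format(1, "03b"), banyan_path(1))  # 001 -> ['up', 'up', 'down']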