CMPT 880: Internet Architectures and Protocols
School of Computing Science, Simon Fraser University
CMPT 771/471: Internet Architecture and Protocols
Transport Layer
Instructor: Dr. Mohamed Hefeeda

Review of Basic Networking Concepts
• Internet structure
• Protocol layering and encapsulation
• Internet services and socket programming
• Network layer
  • Network types: circuit switching, packet switching
  • Addressing, forwarding, routing
• Transport layer
  • Reliability, congestion and flow control
  • TCP, UDP
• Link layer
  • Multiple access protocols
  • Ethernet

Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
  • send side: breaks app messages into segments, passes to network layer
  • rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
  • Internet: TCP and UDP
[Figure: full protocol stacks (application, transport, network, data link, physical) at the two end systems; the nodes in between implement only network, data link, and physical.]

Transport vs. network layer
• network layer: logical communication between hosts
• transport layer: logical communication between processes
  • relies on, enhances, network layer services
Household analogy: 12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service

Multiplexing/demultiplexing
• Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
• Demultiplexing at rcv host: delivering received segments to correct socket
[Figure: hosts 1, 2, and 3, each with an application/transport/network/link/physical stack; processes P1 to P4 attach to the transport layer through sockets.]

Connectionless demux
• UDP socket identified by 2-tuple: (dst IP, dst port)
• datagrams with different src IPs and/or src ports are directed to the same socket
[Figure: clients with IP A and IP B send datagrams from source ports 9157 and 5775 to destination port 6428 on the server with IP C; the server's replies carry SP 6428.]

Connection-oriented demux (cont)
• TCP socket identified by 4-tuple: (src IP, src port, dst IP, dst port)
[Figure: server with IP C listening on port 80; segments arrive with (SP 9157, S-IP A), (SP 5775, S-IP B), and (SP 9157, S-IP B), all with DP 80 and D-IP C, and each is delivered to a different socket.]

UDP: User Datagram Protocol [RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service; UDP segments may be:
  • lost
  • delivered out of order to app
• connectionless:
  • no handshaking between UDP sender, receiver
  • each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired

UDP
• often used for streaming multimedia apps
  • loss tolerant
  • rate sensitive
• other UDP uses
  • DNS
  • SNMP
• reliable transfer over UDP: add reliability at application layer
  • application-specific error recovery!
UDP segment format (32 bits wide):
• source port #, dest port #
• length, checksum
  • length: in bytes, of UDP segment, including header
• application data (message)

Reliable data transfer
• important in application, transport, and link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Pipelined (Sliding Window) Protocols
Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts
• range of sequence numbers must be increased
• buffering at sender and/or receiver
Two generic forms of pipelined protocols: Go-Back-N, selective repeat

Go-Back-N
Sender:
• k-bit seq # in pkt header
• "window" of up to N consecutive unACKed pkts allowed
• ACK(n): ACKs all pkts up to and including seq # n (cumulative ACK)
  • may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window, i.e., go back to n

GBN in action
[Figure: GBN timeline with window size N = 4; the sender goes back to pkt 2.]

Go-Back-N
Do you see potential problems with GBN?
• Consider high-speed links with long delays (called large bandwidth-delay product pipes)
• GBN can fill that pipe by having large N: many unACKed pkts could be in the pipe
• A single lost pkt could cause a retransmission of a huge number (up to N) of pkts: waste of bandwidth
• Solutions??
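
The Go-Back-N sender rules above (window of up to N unACKed pkts, cumulative ACK, go back on timeout) can be sketched as a small state machine. This is a minimal illustration, not a reference implementation: the class name `GoBackNSender` and its methods are hypothetical, there is no real network or clock, and sequence numbers are plain unbounded integers rather than the k-bit wrapping numbers from the slides.

```python
class GoBackNSender:
    """Toy Go-Back-N sender state (hypothetical names, no real network)."""

    def __init__(self, window_size):
        self.N = window_size      # max number of unACKed pkts in flight
        self.base = 0             # oldest unACKed seq #
        self.next_seq = 0         # next unused seq #
        self.sent_log = []        # every transmission, in order

    def send(self):
        """Send one new pkt if the window allows it; return True if sent."""
        if self.next_seq >= self.base + self.N:
            return False          # window full: caller must wait for an ACK
        self.sent_log.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n):
        """Cumulative ACK: all pkts up to and including n are acknowledged."""
        if n >= self.base:        # duplicate or old ACKs leave base unchanged
            self.base = n + 1

    def on_timeout(self):
        """Go back to base: retransmit every unACKed pkt in the window."""
        resent = list(range(self.base, self.next_seq))
        self.sent_log.extend(resent)
        return resent


sender = GoBackNSender(window_size=4)
while sender.send():              # fills the window with pkts 0..3
    pass
sender.on_ack(1)                  # pkts 0 and 1 acknowledged
print(sender.on_timeout())        # pkts 2 and 3 are retransmitted
```

With a large N, `on_timeout` resends up to N pkts for a single loss, which is the bandwidth waste the last slide asks about.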
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  • buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  • sender timer for each unACKed pkt
• sender window
  • N consecutive seq #s
  • again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[Figure: sender and receiver windows.]

TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point: one sender, one receiver
• reliable, in-order byte stream: no "message boundaries"
• pipelined: TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  • bi-directional data flow in same connection
  • MSS: maximum segment size
• connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled: sender will not overwhelm receiver
[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data.]

TCP segment structure
Fields, in order (each row 32 bits wide):
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flag bits U A P R S F, receive window
• checksum, urg data pointer
• options (variable length)
• application data (variable length)
Notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• checksum: Internet checksum (as in UDP)
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• receive window: # bytes rcvr willing to accept

TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  • timeout events
  • duplicate acks
• Initially consider simplified TCP sender:
  • ignore duplicate acks
  • ignore flow control, congestion control

TCP sender events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeoutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ACK rcvd:
• if it acknowledges previously unacked segments:
  • update what is known to be acked
  • start timer if there are outstanding segments

TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
          start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

      event: ACK received, with ACK field value of y
        if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
            start timer
        }
    } /* end of loop forever */

TCP: retransmission scenarios
[Figure: Host A / Host B timelines for the lost ACK scenario and the premature timeout scenario (segment with Seq=92; SendBase = 100 and SendBase = 120).]

TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline for the cumulative ACK scenario (SendBase = 120).]

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• If TCP timeout is
  • too short: premature timeout, unnecessary retransmissions
  • too long: slow reaction to segment loss
• Based on Round Trip Time (RTT), but RTT itself varies with time!
• We need to estimate current RTT
RTT estimation
• SampleRTT: measured time from segment transmission until ACK receipt
  • ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  • average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), y-axis: RTT (milliseconds); curves: SampleRTT and EstimatedRTT.]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  • large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
• Then set timeout interval:
  TimeoutInterval = EstimatedRTT + 4*DevRTT

Fast Retransmit
• Time-out period often relatively long: long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  • Sender often sends many segments back-to-back
  • If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:
  • fast retransmit: resend segment before timer expires

TCP Connection Management: opening
TCP: 3-way handshake
• Step 1: client host sends TCP SYN segment to server
  • specifies initial seq #
  • no data
• Step 2: server host receives SYN, replies with SYNACK segment
  • server allocates buffers
  • specifies server initial seq. #
• Step 3: client receives SYNACK, replies with ACK segment, which may contain data
[Figure: client/server timeline showing connection request and connection granted.]
Q. How would a hacker exploit TCP 3-way handshake to bring a server down?
A. SYN flood DoS attack
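
The EstimatedRTT, DevRTT, and TimeoutInterval formulas from the Round Trip Time and Timeout slides can be sketched in a few lines. The class name `RttEstimator` is hypothetical, and two choices are assumptions the slides do not state: DevRTT starts at 0, and DevRTT is updated with the previous EstimatedRTT before EstimatedRTT itself is updated.

```python
class RttEstimator:
    """EWMA-based RTT and timeout estimate (hypothetical helper)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                 # weight of the newest SampleRTT
        self.beta = beta                   # weight of the newest deviation
        self.estimated_rtt = first_sample  # seed with the first measurement
        self.dev_rtt = 0.0                 # assumption: no deviation seen yet

    def update(self, sample_rtt):
        """Fold in one SampleRTT (caller skips retransmitted segments)."""
        # Assumption: deviation is measured against the previous estimate.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        """TimeoutInterval = EstimatedRTT + 4*DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)     # milliseconds
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```

Starting from 100 ms, one sample of 120 ms moves EstimatedRTT only to 102.5 ms, while the 20 ms deviation adds 4 * 5 = 20 ms of safety margin to the timeout.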
TCP Connection Management: closing
• Step 1: client end system sends TCP FIN segment to server
• Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN
• Step 3: client receives FIN, replies with ACK
  • Enters "timed wait": may need to re-send ACK in response to received FINs
• Step 4: server receives ACK. Connection closed
[Figure: client/server timeline showing closing, timed wait, and closed.]

TCP Connection Management
[Figure: TCP server lifecycle and TCP client lifecycle.]

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  • guarantees receive buffer doesn't overflow

Congestion Control
• Congestion: sources send too much data for network to handle
  • different from flow control, which is e2e
• Congestion results in …
  • lost packets (buffer overflow at routers)
    • more work (retransmissions) for given "goodput"
  • long delays (queueing in router buffers)
    • premature (unneeded) retransmissions
  • waste of upstream links' capacity
    • pkt traversed several links, then dropped at congested router

Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  • explicit rate sender should send at

TCP congestion control: Approach
• Approach: probe for usable bandwidth in network
  • increase transmission rate until loss occurs, then decrease
  • Additive increase, multiplicative decrease (AIMD)
• Saw tooth behavior: probing for bandwidth
[Figure: rate (CongWin) vs. time; y-axis ticks at 8, 16, and 24 Kbytes.]

TCP Congestion Control
• Sender keeps a new variable, Congestion Window (CongWin), and limits unacked bytes to:
  LastByteSent - LastByteAcked ≤ min {CongWin, RcvWin}
  • For our discussion: assume RcvWin is large enough
• Roughly, what is the sending rate as a function of CongWin?
  • Ignore loss and transmission delay
  • Rate = CongWin/RTT (bytes/sec)
• So, rate and CongWin are somewhat synonymous

TCP Congestion Control
• Congestion occurs at routers (inside the network)
  • Routers do not provide any feedback to TCP
• How can TCP infer congestion?
  • From its symptoms: timeout or duplicate acks
  • Define loss event ≡ timeout or 3 duplicate acks
  • TCP decreases its CongWin (rate) after a loss event
• TCP Congestion Control Algorithm: three components
  • AIMD: additive increase, multiplicative decrease
  • slow start
  • Reaction to timeout events

AIMD
• additive increase (congestion avoidance phase): increase CongWin by 1 MSS every RTT until loss detected
  • TCP increases CongWin by MSS x (MSS/CongWin) for every ACK received
  • Ex. MSS = 1,460 bytes and CongWin = 14,600 bytes: with every ACK, CongWin is increased by 146 bytes
• multiplicative decrease: cut CongWin in half after loss
[Figure: congestion window (CongWin) vs. time; y-axis ticks at 8, 16, and 24 Kbytes.]

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  • Example: MSS = 500 bytes & RTT = 200 msec
  • initial rate = CongWin/RTT = 20 kbps
• available bandwidth may be >> MSS/RTT
  • desirable to quickly ramp up to respectable rate
• Slow start: when connection begins, increase rate exponentially fast until first loss event
  • How can we do that? Double CongWin every RTT
  • How? Increment CongWin by 1 MSS for every ACK received

TCP Slow Start (cont'd)
• Increment CongWin by 1 MSS for every ACK
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with one RTT marked.]

Reaction to a Loss event
TCP Tahoe (old)
• Threshold = CongWin / 2
• Set CongWin = 1 MSS
• Slow start till threshold
• Then additive increase // congestion avoidance
TCP Reno (most current TCP implementations)
• If 3 dup acks // fast retransmit
  • Threshold = CongWin / 2
  • Set CongWin = Threshold // fast recovery
  • Additive increase
• Else // timeout
  • Same as TCP Tahoe

Reaction to a Loss event (cont'd)
Why differentiate between 3 dup acks and timeout?
• 3 dup ACKs indicate network capable of delivering some segments
• timeout indicates a "more alarming" congestion scenario

TCP Congestion Control: Summary
• Initially
  • Threshold is set to large value (65 Kbytes), has no effect
  • CongWin = 1 MSS
• Slow Start (SS): CongWin grows exponentially
  • till a loss event occurs (timeout or 3 dup ack) or reaches Threshold
• Congestion Avoidance (CA): CongWin grows linearly
• 3 duplicate ACK occurs: Threshold = CongWin/2; CongWin = Threshold; CA
• Timeout occurs: Threshold = CongWin/2; CongWin = 1 MSS; SS till Threshold

TCP Throughput Analysis
• Understand the fundamental relationship between packet loss probability, RTT, and TCP performance (throughput)
• We present simple model, with several assumptions
  • Yet it still provides useful insights
• See Ch 5 of [HJ04] for a summary of more detailed models with references to the original papers

TCP Throughput Analysis
Any TCP model must capture
• Window dynamics (internal and deterministic)
  • Controlled internally by the TCP algorithms
  • Depends on the particular flavor of TCP
  • We assume TCP Reno (the most common)
• Packet loss process (external and uncertain)
  • Models the aggregate of network conditions at all nodes in the TCP connection path
  • Typically modeled as a stochastic process with probability p that a packet loss occurs
  • TCP responds by reducing the window size
We usually analyze the steady state
• Ignore the slow start phase (transient)
• Although many connections finish within slow start, because they send only a few kilobytes

Notations
• X(t): throughput at time t (transmission rate)
• W(t): window size at time t
• RTT: round trip time
• X(t) = W(t)/RTT
What does the above equation implicitly assume?
• Increasing X(t) has negligible effects on the queuing delay in the network
• RTT remains constant

Simple (Periodic) Model
• Packet losses occur with constant probability p
• TCP window starts at W/2, grows to W, then halves; repeat forever
• W(t) packets transmitted each RTT
• W(t+1) = W(t) + 1 each round until a loss occurs
[Figure: sawtooth of the window between W/2 and W vs. time (RTT); a loss occurs at each peak, and one period spans one tooth.]

Simple (Periodic) Model
Compute the steady state throughput as a function of average loss probability p:
X(p) = (average # of packets sent during a period) / (period length) = (1/p) / T

Simple (Periodic) Model
• T: period between detecting packet losses
  • T = RTT * W/2
• Now, we find W as a function of p. How?
  • Compute the number of packets sent during a period (size of the green area in the figure) and equate it to 1/p:
    W/2 * (W/2 + W) / 2 = 1/p
    W = sqrt(8/(3p))

Simple (Periodic) Model: Inverse Square-Root-p Law
X(p) = (1/RTT) * sqrt(3/(2p))
• TCP throughput is inversely proportional to RTT and to the square root of packet loss probability p

In More Realistic Models …
• Packet loss probability is not constant and is bursty
• Consider effect of duplicate ACKs and timeouts
• Consider receiver window limit