CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Ben Atkin: TA
Lecture 20, Nov. 2

Scalable Reliable Multicast
• Became a hot topic with the introduction of Berkeley's SRM
• Idea is to provide a primitive that
  – Has best-effort reliability (tries to repair problems... doesn't promise)
  – Is receiver driven
    • Sender doesn't know who is in the group
    • Receiver must solicit repairs
  – Has localized costs

SRM goal compared to virtual synchrony
• Virtual synchrony provides a strong guarantee
  – Either all processes that stay operational receive the message
  – Or none does
• But this seems to limit scalability
• Virtual synchrony can be fast but apparently is limited to groups of about 50 processes or fewer

Stock Exchange Problem: Vsync. multicast is too "fragile"
• Most members are healthy...
• ... but one is slow
[Figure 1: Multicast throughput in an 8-member group perturbed by transient failures. Throughput (msgs/sec) vs. amount perturbed (0.1–0.9), comparing the actual virtual synchrony protocol against an ideal pbcast protocol.]

The problem gets worse as the system scales up
[Figure: Virtually synchronous Ensemble multicast protocols. Average throughput on nonperturbed members vs. perturb rate (0–0.9), for group sizes 32, 64, and 96.]

SRM
• Unlike virtual synchrony, the membership of the group will be loosely defined
  – Done in the style of IP multicast
  – Receivers join at will
• Receiver has a way to request missing data

Application Level Framing
• Really a separate idea
• But was bundled with SRM:
  – Application is assumed to be doing video multicast streaming
  – Perhaps it has its own way to recover lost data
• So instead of retransmitting precisely the lost data...
• ... we solicit retransmission but let the application decide what to send in response

Comparison
• Virtual synchrony:
  – Strong membership
  – Each multicast has its own identity
  – Strong guarantees of ordering,
    synchronization, atomicity
• SRM:
  – Weak membership
  – Multicast has a sequence number, but retransmission is determined separately
  – Weak guarantees

Flow control
• Virtual synchrony does flow control
  – Sender has some amount of space
  – If it runs low, new multicasts are stopped
• SRM lacks any form of flow control
  – Sender just runs at whatever rate it may select, oblivious to the network
  – In fact, SRM doesn't even use acknowledgements from receiver to sender!

How SRM works
• Assumes that the network has an IP multicast routing tree within it
• So IP multicast is reasonably cheap
• SRM is built using broadcasts within this tree
  – One exception: so-called "scalable session messages"
  – We'll return to this later

Basic algorithm
• Application generates and numbers a packet
• Sends it using IP multicast
• Receiver receives it, puts it in order, delivers it
• Cost is one packet per link in the IP multicast tree, one send and one receive per process
[Diagram: "Basic algorithm" from the application and network perspectives. S: Sender, R: Receiver, unlabeled nodes: network routers; S multicasts down the tree to the receivers.]

Things to notice
• IP multicast lives "in the network"
• The SRM protocol lives in the application
• There may be many more network nodes involved than application nodes
• A similar observation applies to any protocol unless it is implemented by the router itself

SRM error recovery
• Suppose a packet is lost... this will happen either
  – In the O/S near the receiver
    • Affects just one process
  – Or in the network or the O/S of the sender
    • Affects a subtree
• In SRM, a subtree might be very large

Receiver notices
• Either it receives the next packet in the stream
  – In this case it notices a gap
• Or it receives a "session message" (sent periodically)
  – Again, this causes it to see a gap

A wave of discovery
• Receivers have some distance from the sender
• Can measure it in
    milliseconds
• Close receivers notice the problem first
[Diagram sequence: network perspective of the basic algorithm. A receiver detects the loss and a NAK propagates back up through the routers toward the sender. S: Sender, R: Receiver, unlabeled nodes: network routers.]

NAK spreads like a wave
• Other receivers see the NAK
  – It inhibits them from sending their own retransmission request
• Some upstream sender sees it too
  – It generates a repair

NAK or Repair Flood
• But why won't all receivers at a given distance NAK simultaneously?
• Or all senders retransmit?
  – In fact this would be a problem as the protocol was described

SRM borrows an idea from the Ethernet
• Introduce a few milliseconds of randomness
• Instead of sending the NAK immediately after the timeout... pick a random delay from a small time interval and send only when that delay expires
• With any luck at all, the first to NAK inhibits the others
• Similarly for retransmissions

Why will this work?
• Ideally, packet loss is low
  – Hence zero or one losses in the entire tree
  – But many impacted receivers
• With this scheme
  – Ideally, one NAK inhibits the others
  – One retransmission inhibits the others

Scalable Session Messages
• In a large tree we may see losses in different subtrees
• Want to limit the cost of
  – NAKs
  – Repairs
• Do this using the regional TTL setting for IP multicast
• Limits the propagation of the various messages, except for the initial multicast of the message itself

Experience?
• SRM works well in networks with low loss rates
• But if packet loss becomes higher
  – SRM overheads begin to soar
  – May see many solicitations and repairs on each network link
  – Eventually, this overloads the network
• Example: with 100 receivers in a large network we easily provoked scenarios where each packet triggers 5 or 10 NAKs and a similar number of repairs

Why is this bad?
• All processes see these NAK and repair messages
• Hence costs are global: a quadratic cost!
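The receiver-side behavior described above (detect a gap in the sequence numbers, then wait a random, distance-scaled delay before sending a NAK so the first NAK can suppress the rest) can be sketched as follows. This is a toy model, not the SRM implementation: the class and method names are illustrative, and real SRM derives the delay interval from its measured distance to the sender.

```python
import random

class SrmReceiver:
    """Toy sketch of SRM receiver logic: gap detection plus
    Ethernet-style randomized NAK suppression."""

    def __init__(self, distance_ms: float):
        self.next_expected = 0          # next sequence number we expect
        self.distance_ms = distance_ms  # estimated distance to the sender
        self.pending_naks = {}          # seq -> scheduled NAK send time (ms)

    def on_packet(self, seq: int, now_ms: float) -> None:
        # A jump past next_expected reveals a gap: schedule a NAK
        # for each missing sequence number.
        for missing in range(self.next_expected, seq):
            self._schedule_nak(missing, now_ms)
        self.next_expected = max(self.next_expected, seq + 1)
        # A retransmission (repair) fills the gap, cancelling our NAK.
        self.pending_naks.pop(seq, None)

    def _schedule_nak(self, seq: int, now_ms: float) -> None:
        # Randomize the delay in an interval scaled by our distance,
        # so closer receivers tend to NAK first (interval is illustrative).
        delay = random.uniform(self.distance_ms, 2 * self.distance_ms)
        self.pending_naks[seq] = now_ms + delay

    def on_nak_heard(self, seq: int) -> None:
        # Someone else NAKed first: suppress our own request.
        self.pending_naks.pop(seq, None)
```

A receiver that hears another receiver's NAK for the same packet simply drops its own scheduled NAK; that is the whole suppression mechanism.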
• Risk of a "multi-loss" rises with network size, roughly linearly
• And the cost of a NAK or repair is also linear
• Could argue that SRM has an O(n²) mechanism buried within
• In practice, dies around 50 processes... similar to vsync!

Status?
• In the Internet today, IP multicast is mostly disabled
• If used at all, limited to individual companies
  – But even then it is usually disabled
• The Internet MBone is set aside for multicast
  – But it has a rigid structure, used for a single Internet "video feed"

Alternatives to SRM
• RMTP was developed at Lucent
  – Reliable Multicast Transport Protocol
• Supports 1-to-many file transfer
• Uses a notion of regional repair servers
  – Each region has a repair server that handles processes in its vicinity
  – Hierarchical repair architecture
  – Also has flow control built in

RMTP status
• Experience suggests it works well
  – But placement of repair servers is very important
  – Also, not very helpful for arbitrary uses; most experience is with file transfer applications
• An IETF standard as of 1999

PGM
• A Cisco proposal
• Idea is to put a form of reliable multicast right into the routers
  – Routers talk to each other
  – Recovery of lost packets occurs locally
  – But the mechanism is very simple and offers no guarantees

PGM
• Cisco is promoting it as a standard
• But it seems not to be taking off
• To be useful, it needs to work on all routers in a network
  – Cisco has a tunneling scheme to leapfrog small numbers of noncompliant routers

LGM
• Lightweight group multicast: another network-level proposal
• Getting a lot of attention now
• But no implementations in the field
• And there are three or four other proposals

Forward Error Correction
• Can be used with any protocol
• Idea is:
  – Instead of just sending data:
    • m0... m1... m2...
  – Send an error correction code from time to time:
    • m0... m1... m2... ECC012... m3...
• Receiver can use the code to correct for packet loss

FEC continued
• Example?
  – Suppose that the ECC is simply the XOR of the prior k packets
  – Then with any k-1 of them plus the ECC we can reconstruct the missing packet
  – Hence we would need to lose 2 packets to have an unrecoverable gap
• Can use more powerful codes, but they tend to be slower to compute

FEC continued
• With genuinely random packet loss, FEC works really well
• No need to send a NAK or a repair
• Good match to Internet router behavior
  – RED drops 'em, FEC fixes 'em!
• One idea: FEC plus congestion control equals SRM!

Issues?
• Dependency on a single route or single IP multicast forwarding tree
  – Routers rarely fail...
  – But if a router becomes heavily loaded, it may drop many packets
  – Similarly if routes change
  – c.f. Jahanian study

Wishful thinking?
• Recall our idea of a network routing scheme that would offer failure-independent redundant routes
  – (A,B) and (A',B') "independent"
• If we had such a scheme for IP multicast...
• ... we could retransmit over it and would have a good chance of recovery even if the network becomes very overloaded!

IBM Gryphon Routers
• Gryphon: high-speed publish-subscribe project at IBM
  – Idea was to build hardware
  – Started with software on UNIX boxes... highly optimized
• Special-purpose "network" just for publish-subscribe applications
  – Targets stock markets, financial systems
  – Likely to have a substantial market
  – But expensive: dedicated infrastructure

What will the future bring?
• Topic of much debate
• Two broad schools of thought
  – Mine (pessimistic)
  – Theirs (optimistic)

Mine?
• Perhaps I'm too negative
• But I suspect:
  – The Internet is a mess
  – QoS is not coming any time soon
  – Multicast isn't about to start working
  – The situation is broken and we need a whole new "web" design
• But perhaps we can reuse many pieces of the existing architecture and technology
• Much early evidence suggests this can work

Theirs
• The prevailing academic research perspective
• They see
  – Diffserv, multilevel queuing...
  – Various multicast mechanisms
• And they expect
  – Internet telephony
  – Publish-subscribe hardware
  – Streaming media over IP
• Vast riches for all!
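Returning to the XOR example from the FEC slides, the reconstruction step can be made concrete. Below is a toy sketch (the function name is my own, and real FEC codes are more elaborate): the parity packet is the byte-wise XOR of k data packets, and XOR-ing the k-1 surviving packets with the parity regenerates the one that was lost.

```python
def xor_packets(*pkts: bytes) -> bytes:
    """Byte-wise XOR of equal-length packets."""
    out = bytearray(len(pkts[0]))
    for p in pkts:
        for i, b in enumerate(p):
            out[i] ^= b
    return bytes(out)

# Sender: after k = 3 data packets, emit one parity (ECC) packet.
m0, m1, m2 = b"pkt0", b"pkt1", b"pkt2"
parity = xor_packets(m0, m1, m2)

# Receiver: m1 was dropped. XOR of the k-1 survivors plus the
# parity reconstructs it; losing two of the k packets would be
# unrecoverable with this simple code.
recovered = xor_packets(m0, m2, parity)
assert recovered == m1
```

This works because each byte of the parity is m0 ^ m1 ^ m2, so XOR-ing in m0 and m2 again cancels them and leaves m1.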