Improving Internet Availability with Path Splicing
Nick Feamster, Georgia Tech

Can the Internet be "Always On"?

"It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be:
1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today.
2. …"

• Various studies (Paxson, etc.) show the Internet is at about 2.5 "nines"
• More "critical" (or at least availability-centric) applications run on the Internet
• At the same time, the Internet is getting more difficult to debug
  – Scale, complexity, disconnection, etc.

Availability of Other Services

• Carrier airlines (2002 FAA Fact Book)
  – 41 accidents, 6.7M departures
  – 99.9993% availability
• 911 phone service (1993 NRIC report +)
  – 29 minutes per year per line
  – 99.994% availability
• Standard phone service (various sources)
  – 53+ minutes per line per year
  – 99.99+% availability

Threats to Availability

• Natural disasters
• Physical device failures (node, link)
  – Drunk network administrators
  – Cisco bugs
• Misconfiguration
• Mis-coordination
• Denial-of-service (DoS) attacks
• Changes in traffic patterns (e.g., flash crowds)
• …

Availability: Two Aspects

• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – If two nodes s and t remain connected in the underlying graph, some sequence of hops in the routing tables should still carry traffic between them
• Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path

Where Today's Protocols Stand

• Reliability
  – Approach: Compute backup paths
  – Challenge: Many possible failure scenarios!
• Recovery
  – Approach: Switch to a backup when a failure occurs
  – Challenge: Must quickly discover a new working path
  – Meanwhile, packets are dropped, reordered, etc.

Multipath: Promise and Problems

• Bad: If a failed link lies on both paths, s is disconnected from t
• Want: End systems remain connected unless the underlying graph has a cut

Path Splicing: Main Idea

Compute multiple forwarding trees per destination. Allow packets to switch slices midstream.

• Step 1 (Perturbations): Run multiple instances of the routing protocol, each with a slightly perturbed version of the configuration
• Step 2 (Slicing): Allow traffic to switch between instances at any node

Outline

• Path Splicing
  – Achieving reliable connectivity
    • Generating slices
    • Constructing paths
  – Forwarding
  – Recovery
• Properties
  – High reliability
  – Bounded stretch
  – Fast recovery
• Ongoing work

Generating Slices

• Goal: Each instance provides different paths
• Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight
  – Two schemes: uniform and degree-based

[Figure: a "base" graph and a perturbed graph of the same topology, with slightly adjusted link weights]

How to Perturb the Link Weights?
• Uniform: Perturbation is a function of the initial weight of the link
• Degree-based: Perturbation is a linear function of the degrees of the incident nodes
  – Intuition: Deflect traffic away from nodes that traffic tends to pass through by default

Constructing Paths

• Goal: Allow multiple instances to co-exist
• Mechanism: Virtual forwarding tables

[Figure: per-slice virtual forwarding tables for destination t; e.g., slice 1 forwards to next hop a, slice 2 forwards to next hop c]

Forwarding Traffic

• The packet carries a shim header with forwarding bits
• Routers use lg(k) bits to index the k forwarding tables
  – Shift the bits after inspection
• To access different (or multiple) paths, end systems simply change the forwarding bits
  – Incremental deployment is trivial
  – Persistent loops cannot occur
• Various optimizations are possible

Putting It Together

• The end system sets the forwarding bits in the packet header
• The forwarding bits specify the slice to be used at each hop
• Each router examines and shifts the forwarding bits, then forwards

Evaluation

• Defining reliability
• Does path splicing improve reliability?
  – How close can splicing get to the best possible reliability (i.e., that of the underlying graph)?
• Can path splicing enable fast recovery?
  – Can end systems (or intermediate nodes) find alternate paths fast enough?

Defining Reliability

• Reliability: the probability that, upon failing each edge with probability p, the graph remains connected
• Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p
• The underlying graph has its own reliability (and reliability curve)
  – Goal: The reliability of the routing system should approach that of the underlying graph
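The reliability metric just defined lends itself to a direct Monte Carlo estimate. The following is a minimal sketch, not code from the talk: fail each edge independently with probability p and count how often a source-destination pair stays connected. The graph, node names, and trial count are illustrative assumptions.

```python
import random
from collections import defaultdict, deque

def connected(edges, s, t):
    """BFS over the surviving edges: is t reachable from s?"""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def reliability(edges, s, t, p, trials=1000):
    """Estimate the probability that s and t remain connected
    after failing each edge independently with probability p."""
    ok = sum(
        connected([e for e in edges if random.random() >= p], s, t)
        for _ in range(trials)
    )
    return ok / trials

# Hypothetical 4-node graph with two disjoint s-t paths.
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
print(reliability(edges, "s", "t", p=0.0))  # no failures: always connected
print(reliability(edges, "s", "t", p=0.1))
```

Sweeping p over a range of values and plotting the result yields the reliability curve described above; running it once on the base shortest-path tree and once on the union of spliced slices shows how much closer splicing gets to the underlying graph's curve.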
Reliability Curve: Illustration

[Figure: fraction of source-destination pairs disconnected vs. probability of link failure p; a lower curve means better reliability. More edges available to end systems -> better reliability]

Reliability Approaches Optimal

• Sprint (Rocketfuel) topology, degree-based perturbations
• 1,000 trials
• p indicates the probability that an edge was removed from the base graph
• Result: Reliability approaches optimal, and average stretch is only 1.3

Recovery: Two Mechanisms

• End-system recovery
  – Switch slices at every hop with probability 0.5
• Network-based recovery
  – A router switches to a random slice if the next hop is unreachable
  – Continue for a fixed number of hops until the destination is reached

Simple Recovery Strategies Work Well

• Which paths can be recovered within 5 trials?
  – Sequential trials: 5 round-trip times
  – …but trials could also be made in parallel
• Result: Recovery approaches the maximum possible; adding a few more slices improves recovery beyond the best possible reliability with fewer slices

What About Loops?

• Persistent loops are avoidable
  – In the simple scheme, path bits are exhausted from the header
  – A packet never switches back to the same slice
• Transient loops can still be a problem because they increase end-to-end delay ("stretch")
  – Longer end-to-end paths
  – Wasted capacity

Significant Novelty for Modest Stretch

• Novelty: how much a perturbed shortest path differs from the original shortest path, i.e., one minus the fraction of edges the two paths share
• Example: if a spliced path shares 1/3 of its edges with the original shortest path, novelty = 1 – (1/3) = 2/3

Splicing Improves Availability

• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – Approach: Overlay trees generated using random link-weight perturbations; allow traffic to switch between them
  – Result: Splicing ~10 trees achieves near-optimal reliability
• Recovery: In case of failure, nodes should quickly be able to discover a new path
  – Approach: End nodes randomly select new bits.
  – Result: Recovery within 5 trials approaches the best possible.

Extension: Interdomain Paths

• Observation: Many routers already learn multiple alternate routes to each destination
• Idea: Use the forwarding bits to index into these alternate routes at an AS's ingress and egress routers
  – Splice paths at the ingress and egress routers (e.g., between a default and an alternate route toward destination d)
• Required new functionality
  – Storing multiple entries per prefix
  – Indexing into them based on packet headers
  – Selecting the "best" k routes for each destination

Open Questions and Ongoing Work

• How does splicing interact with traffic engineering? With sources controlling traffic?
• What are the best mechanisms for generating slices and recovering paths?
• Can splicing eliminate dynamic routing?

Conclusion

• Simple: Forwarding bits provide access to different paths through the network
• Scalable: Exponential increase in available paths, linear increase in state
• Stable: Fast recovery does not require fast routing protocols

http://www.cc.gatech.edu/~feamster/papers/splicing-hotnets.pdf

Network-Level Spam Filtering

Spam: More Than Just a Nuisance

• 75-90% of all email traffic
  – PDF spam: ~11% and growing
  – Content filters cannot catch it!
• As of August 2007, one in every 87 emails constituted a phishing attack
• Targeted attacks on the rise
  – 20k-30k unique phishing attacks per month
  – Spam targeted at CEOs and social networks on the rise

Problem #1: Content-Based Filtering is Malleable

• Low cost to evasion: Spammers can easily adjust and change the features of an email's content
• Customized emails are easy to generate: Content-based filters need fuzzy hashes over content, etc.
• High cost to filter maintainers: Filters must be continually updated as content-changing techniques become more sophisticated

Problem #2: IPs are Ephemeral

• Spammers use various techniques to change the IP addresses from which they send traffic
• Humans must notice the changing behavior
• Existing blacklists cannot stay up to date
• One technique: BGP "spectrum agility" (~10 minutes)
  – Hijack IP address space
  – Send spam
  – Withdraw the route

Our Approach: Network-Based Filtering

• Filter email based on how it is sent, in addition to simply what is sent
• Network-level properties are more fixed
  – Hosting or upstream ISP (AS number)
  – Botnet membership
  – Location in the network
  – IP address block
• Challenge: Which properties are most useful for distinguishing spam traffic from legitimate email?

SpamTracker: Main Idea and Intuition

• Idea: Blacklist sending behavior ("behavioral blacklisting")
  – Identify sending patterns that are commonly used by spammers
• Intuition: It is much more difficult for a spammer to change the technique by which mail is sent than to change the content

SpamTracker Design Approach

• For each sender, construct a behavioral fingerprint
• Cluster senders with similar fingerprints
• Filter new senders that map to existing clusters

[Figure: pipeline from email to an IP x domain x time tensor, which is collapsed and clustered; new senders are looked up, scored, and classified]

Building the Classifier: Clustering

• Feature: Distribution of email sending volumes across recipient domains
• Clustering approach
  – Build an initial seed list of bad IP addresses
  – For each IP address, compute a feature vector: volume per domain per time interval
  – Collapse into a single IP x domain matrix
  – Compute clusters

Clustering: Output and Fingerprint

• For each cluster, compute a fingerprint vector
• New IPs will be compared to this "fingerprint"

[Figure: IP x IP matrix; intensity indicates pairwise similarity]

Classifying IP Addresses

• Given a "new" IP address, build a feature vector based on its sending pattern across domains
• Compute the
similarity of this sending pattern to that of each known spam cluster
  – Normalized dot product of the two feature vectors
  – The spam score is the maximum similarity to any cluster

Summary

• Spam is on the rise and becoming more clever
  – 12% of spam is now PDF spam; content filters are falling behind
  – Spam is also becoming more targeted
• IP-based blacklists are evadable
  – Up to 30% of spam is not listed in common blacklists at receipt; ~20% remains unlisted after a month
  – Spammers commonly steal IP addresses
• New approach: behavioral blacklisting
  – Blacklist how the mail was sent, not what was sent
  – SpamTracker is being deployed and evaluated by a large spam filtering company
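The classification step described above reduces to a cosine similarity against each cluster's fingerprint. The sketch below illustrates only that scoring step under assumed inputs; the vectors, domains, and thresholds are hypothetical, and SpamTracker's actual pipeline also includes the clustering stage that produces the fingerprints.

```python
import math

def similarity(v, w):
    """Normalized dot product (cosine similarity) of two
    per-domain sending-volume vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0

def spam_score(pattern, fingerprints):
    """Spam score of a new sender: the maximum similarity of its
    sending pattern to any known spam-cluster fingerprint."""
    return max(similarity(pattern, f) for f in fingerprints)

# Hypothetical sending volumes across three recipient domains.
fingerprints = [
    [100.0, 0.0, 0.0],   # cluster that blasts a single domain
    [30.0, 30.0, 30.0],  # cluster that spreads evenly
]
new_sender = [90.0, 5.0, 5.0]
print(spam_score(new_sender, fingerprints))
```

Because the score is normalized, it compares the shape of the sending pattern rather than raw volume, which is what makes the fingerprint hard to evade by simply sending less mail from each IP.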