(Phi)
Timothy Roscoe, Joseph M. Hellerstein, Brent Chun, Nina Taft, Petros Maniatis, Ryan Huebsch, Tyson Condie, Boon Thau Loo
Intel Research at Berkeley and U.C. Berkeley
(much input from Tom Anderson, Vern Paxson, Larry Peterson, Scott Shenker, Ion Stoica, and David Wetherall)
Intel Research

Lessons from PlanetLab…
• What have we learned from PlanetLab?
  – We understand how to build robust large-scale systems
  – E.g. structured overlay networks that scale to the planet
• What have we learned about the state of the Internet?
  – It's brittle.
  – It's unpredictable.
  – It doesn't know what's happening.
  – It's afraid.

The brittleness of the Internet
• Security systems are afraid of the unknown.
  – Everything new is unknown.
  – Everything new is a threat.
  – Better to shut it down now.
• Users really don't comprehend the problems (and why should they?)
  – Not exactly easy to understand
  – Very little information available

The brittleness of the Internet
• Performance is unpredictable
  – Failures, bottlenecks, congestion, misconfigurations, etc.
• It appears overlays can do better
  – Provided they can measure the network.
• This shouldn't work, but it does.
  – IP protocols assume no information is available about current network state.

Research trends and directions
• Extensive network measurement / modelling
• Distributed security solutions
• Distributed performance diagnosis
• Machine learning across networks
• Measurement-based overlays
• Better Internet protocols
• Network visualizations

What's missing?
• Applications and services
• User awareness
• ?
• Measurement, monitoring, logging, etc. of the real Internet

An "Information Plane" for the Internet
• Continuous queries over distributed network state, available to all end systems
• Integrate data from:
  – Backbone monitoring
  – Router configuration
  – Network state databases (e.g. RouteViews)
  – Security systems (firewalls, DShield, Autograph, etc.)
  – End-system monitoring

The big picture
[Figure: clients (end users, end applications, overlays) and sensors across the Internet (end-systems, backbone monitors, routers, network databases, firewall logs)]

The big picture: query plan
[Figure: a client's query plan is disseminated to the sensors]

The big picture: query execution
[Figure: sensors execute the query; query results flow back to the client as the answer]

Implications of success
• Short-term: enable & connect measurement & security researchers
  – E.g. "Live DShield", top-10 IP address result from Barford et al.
  – Promote user awareness through downloadable tools
• Medium-term: provide global network knowledge for planetary-scale applications & overlays
  – E.g. resource discovery on PlanetLab, OpenDHT could exploit network link information
• Long-term: kick off a new generation of network-aware Internet protocols
  – E.g. host-based source routing solutions

Phi goals
• Create the missing piece of the information plane by building a scalable, distributed dataflow engine for processing continuous queries in-network
• Data tuples are
  – Routed between nodes along a dataflow graph
  – Processed at nodes (filtering, aggregation, data reduction, correlation, result dissemination)
[Figure: layers from Declarative Queries, to Abstract Dataflow (Query Plan), to Physical Dataflow in Overlay Network, to Physical Network]

The hard problems
• Scale: millions of sources, sinks, queries
  – Linear scaling on an n³ problem: need to factor out n² redundant communication & computation
• Fidelity & security
  – Bad inputs: data poisoning, perturbed computations
  – Bad outputs: launchpads, vulnerability detection
• Efficiently embedding analysis algorithms in network topologies
  – Data must be combined (hence moved around the network) according to the distributed analysis algorithm

The rest of the talk
• Where we are today
  – PIER: distributed relational query processor
  – Single query, many sources, many sinks
  – Deployed on PlanetLab for the last 12 months
• Where we intend to go
  – P2: full dataflow engine with multiquery scaling
  – Topological fault tolerance
  – Develop embeddings of distributed analysis algorithms

Key technology: Structured overlay networks (DHTs)
• E.g. Chord, Pastry, Tapestry, CAN, Kademlia...
• Flat, sparse ID space (e.g. 160-bit identifiers)
• Routing to the owner of any key in O(log n) hops
• Based on "interesting" routing graphs

What can DHTs do?
• Content-based routing
  – i.e. send a message to a key
  – Equivalent to hashing a key to a node
• Storage
  – Storing values in the network under a key
• Tree construction
  – Formed by routing to a key from all nodes
[Figure: ring of 16 node IDs, 0 to 15]

Using DHTs in Phi
• Query dissemination (trees)
• Hierarchical aggregation (trees and storage)
• Indexing (routing and storage)
• Range indexing substrate (routing and storage)
• Hash-partitioned parallelism (routing)
• Hash tables for group-by, join (storage)

Bamboo: our DHT (Sean Rhea)
• Pastry-style routing
• Epidemic propagation of leaf sets, routing tables
• Recursive routing
• Adaptive timeouts based on continuous measurement
• Highly robust under churn
• Tested to ~1000 nodes (PlanetLab, ModelNet)

PIER: a relational query engine
• Data is tuples in named tables
• Tables exist on nodes
• Relational operators:
  – Selection
  – Projection
  – Join (correlate, intersect, match)
  – Aggregation (summarize, compress, group by)
• Also has recursive queries
  – Can query topological structures

PIER architecture
[Figure: applications (network monitoring, other user apps) issue declarative queries; PIER's query optimizer, catalog manager and core relational execution engine produce the query plan; a DHT wrapper, storage manager and overlay routing form the DHT overlay network on top of the IP (physical) network]

Experience so far
• PIER has run on PlanetLab for about a year
• Querying PlanetLab sensors, in particular Snort events

Experience so far
• Use of DHT for query processing by-and-large works
  – Need a story for NATs, non-transitive connectivity
  – Node heterogeneity
• Multiresolution emulation is essential
  – Simulation, emulation (ModelNet), deployment (PlanetLab)
• Simple results are quite compelling
  – E.g. top-10 attackers demo for IDF

P2
• Build on PIER techniques
  – Reimplementation in C++
• Extend beyond relational operators
  – Synopses/sketches, junction trees, Bayes nets, PCA, ...
• Address multiquery optimization (2 factors of n)
• Investigate using the overlay for data fidelity
• Codify communication and computation for a variety of algorithms

Multiquery optimization
• Granular lineage for data inputs and intermediate data products
  – Telegraph: tuple lineage bitmaps (operators & queries)
  – Scaling via cluster analysis: bits name sets of queries/operators
• Embedded in the network
  – A multi-operator query mesh of multiple trees is formed
  – Optimizations in routing & replicating intermediate results
• Scaling result dissemination
  – Multicast from within the MQO mesh
  – A many-source/many-sink multicast problem
  – Tie-ins with MQO: can choose the multicast source points as part of query optimization
[Figure: ring of 16 node IDs, 0 to 15]

Ephemeral overlay topologies
• DHTs emulate interconnection networks
  – These have deep algebraic structure
  – Based on group-theoretic graph constructs
  – Rich families of such graphs with different properties
• We can exploit the structure (i.e. constraints) of the overlay
  – To embed complex computations with efficient communication
  – To reason about the "influence" of malicious nodes in the network
• We could choose ephemeral topologies to suit specific analysis algorithms

Topological Fault Tolerance
• Diversifying influence
  – Redundant computation (à la process pairs) applied in an adversarial environment
  – Structured overlay topologies admit analysis of spheres of influence
• Two dimensions to diversity
  – In space: multiple trees, different roots
  – In time: reassign node IDs to change spheres of influence
[Figure: fidelity and security; example spheres of influence of 8 nodes and of 1 node]

Design Patterns for Network-Embedded Data Analysis
• Taxonomize and abstract communication patterns for in-network analyses
• We already understand how some of these map to DHTs
  – Up-tree aggregation (AVG, SUM, etc.)
  – Up-a-special-tree aggregation (Haar wavelets)
  – Arbitrary dissemination (e.g. MIN, MAX, Gibbons-style duplicate-insensitive sketching)
• Lessons from the sensornet arena (topology-oblivious)
  – First-cut taxonomy of aggregates (TAG)
• Junction trees for distributed inference: up-then-down a special tree
[Figure: binomial tree over a ring of 16 node IDs, 0 to 15]
• What all this means
  – Algebraic properties of communication patterns dictate an extensibility API
  – Expose only enough of the algorithm logic to achieve optimization, code reuse, resource control
  – Efficient communication patterns can drive research into new analysis techniques

Collaborations
• Network measurement community
  – End-host packet traces (KAIST VMS, NETI@home, DIMES)
  – Firewall log repositories (Snort, Bro, DShield, Domino)
  – Backbone monitors and repositories (CAIDA, CoMo)
  – Network tomography (RouteViews, NetTelescope)
• Distributed algorithms community
  – Summarization / data reduction (IRP, Bell Labs)
  – Inference / anomaly detection (CMU, UCB, IR)
  – Signature detection (IRP, EarlyBird)
  – Joins / correlations (UCB, ICSI)
• Value proposition: a reusable backplane for
  – Real-time data summarization & transport
  – Data validation (against other sources)
  – Correlation with other data
  – Algorithm design and deployment

The Plan (1)
• Build a distributed peer-to-peer dataflow engine
  – Define protocols: tuple transfer protocol, dataflow signalling protocol
  – Instantiate the "right" overlay
  – Address multiquery optimization
  – Rich aggregations/summarization and joins/correlations
• Explore topological diversity in time and space
  – Identify efficient, realizable families
  – Establish feasible timescales for topology construction
  – Apply to both topological FT and network-embedded computations

The Plan (2)
• Deploy an initial information plane, starting on PlanetLab and building out
• Multiple classes of data sources:
  – End-system monitoring (e.g. NETI@home)
  – Link monitors (e.g. CoMo)
  – Network telescopes (dark address space)
  – Databases / archives (e.g. RouteViews)
• Build example applications ourselves
  – Implement example analysis operators: wavelet, PCA, etc.
• Enable others to more easily build applications
  – Client libraries
  – Query handholding

Many thanks!
[email protected]
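The "Key technology" and "What can DHTs do?" slides describe content-based routing as hashing a key to a node, reachable in a logarithmic number of hops. The following is a minimal Chord-style sketch of that idea, assuming SHA-1 identifiers truncated to a hypothetical 16-bit ring (the slides mention 160-bit IDs), successor ownership, and finger tables at power-of-two offsets. It is not Bamboo's Pastry-style prefix routing, and the names `key_id`, `Ring`, `owner`, `fingers` and `route` are illustrative only. For brevity the sketch computes the owner centrally and then counts the greedy finger hops.

```python
import hashlib
from bisect import bisect_left

M = 16          # hypothetical ID width in bits (the slides mention 160-bit IDs)
RING = 2 ** M


def key_id(name: str) -> int:
    """Hash a key or node name onto the identifier ring (SHA-1, truncated)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % RING


class Ring:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))

    def owner(self, k: int) -> int:
        """Successor ownership: the first node at or clockwise after ID k."""
        i = bisect_left(self.nodes, k % RING)
        return self.nodes[i % len(self.nodes)]

    def fingers(self, n: int):
        """Chord-style finger table of node n: owner(n + 2^i) for i = 0..M-1."""
        return [self.owner((n + 2 ** i) % RING) for i in range(M)]

    def route(self, start: int, k: int):
        """Greedy finger routing from node `start` to the owner of key k.

        Each hop takes the finger covering the most clockwise distance without
        overshooting the owner, so the remaining distance more than halves.
        """
        dest = self.owner(k)
        path, cur = [start], start
        while cur != dest:
            here = cur
            remaining = (dest - here) % RING
            candidates = [f for f in self.fingers(here)
                          if 0 < (f - here) % RING <= remaining]
            cur = max(candidates, key=lambda f: (f - here) % RING)
            path.append(cur)
        return path


ring = Ring(key_id(f"node-{i}") for i in range(64))
print(ring.route(ring.nodes[0], key_id("some-key")))
```

Because each hop at least halves the remaining clockwise distance, a lookup here never needs more than M hops; Chord's own analysis bounds the hop count by O(log n) in the number of nodes, with high probability.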
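The "Tree construction" bullet (a tree formed by routing to a key from all nodes) and the "Up-tree aggregation (AVG, SUM, etc.)" design pattern can be illustrated together. This sketch assumes a hypothetical, fully populated 16-node ring (IDs 0 to 15, as in the slide figures). Each node's parent is its next greedy power-of-two hop toward a root ID, so the union of routing paths is a binomial tree. AVG is then computed by merging partial (sum, count) states from children to parents. The functions `parent`, `depth` and `up_tree_avg` are illustrative names, not part of PIER or P2.

```python
N_BITS = 4
N = 2 ** N_BITS  # hypothetical fully populated ring: node IDs 0..15


def parent(node: int, root: int):
    """Next hop from node toward root, or None at the root.

    Jump by the largest power of two not exceeding the clockwise distance,
    i.e. strip the highest set bit of the distance. The parent links form a
    binomial tree rooted at `root`.
    """
    dist = (root - node) % N
    if dist == 0:
        return None
    return (node + (1 << (dist.bit_length() - 1))) % N


def depth(node: int, root: int) -> int:
    """Number of hops from node up to root."""
    hops = 0
    while (p := parent(node, root)) is not None:
        node, hops = p, hops + 1
    return hops


def up_tree_avg(readings: dict, root: int) -> float:
    """AVG via partial (sum, count) states merged child-to-parent up the tree."""
    partial = {n: (readings[n], 1) for n in range(N)}
    # Deepest nodes first, so every child has reported before its parent forwards.
    for n in sorted(range(N), key=lambda n: depth(n, root), reverse=True):
        p = parent(n, root)
        if p is not None:
            s, c = partial[n]
            ps, pc = partial[p]
            partial[p] = (ps + s, pc + c)
    total, count = partial[root]
    return total / count


readings = {n: float(n) for n in range(N)}
print(up_tree_avg(readings, root=0))
```

Carrying (sum, count) rather than a running average is what makes AVG decomposable: partial states from different subtrees can be merged in any order, and the division happens only once at the root.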
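The "Using DHTs in Phi" slide lists hash-partitioned parallelism (routing) and hash tables for joins (storage). This single-process sketch shows the idea as a generic parallel hash join: each tuple is "routed" to the node that owns a hash of its join key, and each node then joins only its local partition. The node count, the `partitioned_join` and `node_for` helpers, and the example `attacks`/`hosts` tuples are hypothetical. This is not PIER's code.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 8  # hypothetical number of DHT nodes


def node_for(key) -> int:
    """Map a join key to a node by hashing it (stand-in for routing to a DHT key)."""
    digest = hashlib.sha1(repr(key).encode()).digest()
    return int.from_bytes(digest, "big") % NUM_NODES


def partitioned_join(left, right, lkey, rkey):
    """Equi-join left.lkey = right.rkey via hash partitioning across nodes."""
    # 1. Rehash both inputs on the join attribute: partitions[node] = (L, R).
    partitions = defaultdict(lambda: ([], []))
    for t in left:
        partitions[node_for(t[lkey])][0].append(t)
    for t in right:
        partitions[node_for(t[rkey])][1].append(t)
    # 2. Each node builds a local hash table on its left tuples and probes it.
    out = []
    for ltuples, rtuples in partitions.values():
        table = defaultdict(list)
        for t in ltuples:
            table[t[lkey]].append(t)
        for r in rtuples:
            for l in table.get(r[rkey], []):
                out.append({**l, **r})
    return out


attacks = [{"src": "10.0.0.1", "port": 22}, {"src": "10.0.0.2", "port": 80},
           {"src": "10.0.0.1", "port": 443}]
hosts = [{"ip": "10.0.0.1", "asn": 7018}, {"ip": "10.0.0.3", "asn": 3356}]
print(partitioned_join(attacks, hosts, "src", "ip"))
```

Matching tuples always hash to the same node, so no node needs to see the whole of either input. That property is what lets the join scale with the number of nodes.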
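The "Multiquery optimization" slide cites Telegraph-style tuple lineage bitmaps. In this scheme, each tuple carries a bitmap recording which queries it satisfies, so a single shared pass over the data can serve many standing queries. The minimal sketch below shows only that shared-selection idea. The predicates in `QUERIES`, the tuple fields and the helper names are hypothetical, and the code is not Telegraph's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical standing queries, each a selection predicate over a flow tuple.
QUERIES: List[Callable[[Dict], bool]] = [
    lambda t: t["port"] == 22,              # query 0: SSH traffic
    lambda t: t["bytes"] > 1000,            # query 1: large flows
    lambda t: t["src"].startswith("10."),   # query 2: internal sources
]


def lineage(t: Dict) -> int:
    """Evaluate each predicate once; bit q is set if the tuple satisfies query q."""
    bits = 0
    for q, pred in enumerate(QUERIES):
        if pred(t):
            bits |= 1 << q
    return bits


def route_results(tuples: List[Dict]) -> Dict[int, List[Dict]]:
    """Tag tuples with lineage bitmaps, drop unwanted ones, fan out the rest."""
    out: Dict[int, List[Dict]] = {q: [] for q in range(len(QUERIES))}
    for t in tuples:
        bits = lineage(t)
        if bits == 0:
            continue  # no query needs this tuple, so stop forwarding it
        for q in range(len(QUERIES)):
            if bits >> q & 1:
                out[q].append(t)
    return out


flows = [{"src": "10.0.0.1", "port": 22, "bytes": 500},
         {"src": "192.0.2.7", "port": 80, "bytes": 5000},
         {"src": "192.0.2.8", "port": 443, "bytes": 10}]
print(route_results(flows))
```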
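The "Design Patterns" slide lists Haar wavelets as up-a-special-tree aggregation. Each level of the transform combines sibling pairs into an average, which is passed up, and a detail coefficient, which is kept. The result is a binary aggregation tree. A minimal sketch of the unnormalized (averaging) Haar transform follows. It assumes the special tree pairs adjacent readings and that the number of readings is a power of two. The function names are illustrative.

```python
from typing import List, Tuple


def haar(values: List[float]) -> Tuple[float, List[List[float]]]:
    """Unnormalized Haar transform: overall average plus per-level details."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details: List[List[float]] = []
    level = list(values)
    while len(level) > 1:
        # Each sibling pair (a parent's two children) yields an average and a detail.
        avgs = [(level[i] + level[i + 1]) / 2 for i in range(0, len(level), 2)]
        diffs = [(level[i] - level[i + 1]) / 2 for i in range(0, len(level), 2)]
        details.append(diffs)
        level = avgs
    return level[0], details


def inverse_haar(avg: float, details: List[List[float]]) -> List[float]:
    """Rebuild the readings top-down from the average and detail coefficients."""
    level = [avg]
    for diffs in reversed(details):
        nxt: List[float] = []
        for a, d in zip(level, diffs):
            nxt.extend([a + d, a - d])
        level = nxt
    return level


print(haar([9.0, 7.0, 3.0, 5.0]))
```

Dropping the smallest detail coefficients before sending them up the tree is the usual way such a transform serves as a data-reduction operator.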
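The "Design Patterns" slide also mentions Gibbons-style duplicate-insensitive sketching for arbitrary dissemination. The key property is that merging is idempotent, so a reading that reaches the root along several overlay paths is not double-counted. The minimal sketch below illustrates this with a Flajolet-Martin distinct-count bitmap. The single-bitmap estimator and the 32-bit width are simplifying assumptions (practical sketches combine many bitmaps). The constant 0.77351 is the standard Flajolet-Martin correction factor, and the helper names are illustrative.

```python
import hashlib

WIDTH = 32      # bitmap width in bits (assumption)
PHI = 0.77351   # Flajolet-Martin correction constant


def rho(x: int) -> int:
    """Index of the least-significant 1 bit of x (WIDTH - 1 if x == 0)."""
    if x == 0:
        return WIDTH - 1
    return (x & -x).bit_length() - 1


def fm_insert(bitmap: int, item: str) -> int:
    """Set the bit chosen by the item's hash; re-inserting an item changes nothing."""
    h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:4], "big")
    return bitmap | (1 << rho(h))


def fm_merge(a: int, b: int) -> int:
    """Bitwise OR: commutative, associative and idempotent, so duplicates are harmless."""
    return a | b


def fm_estimate(bitmap: int) -> float:
    """Estimate distinct items as 2^R / PHI, R = index of the lowest unset bit."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / PHI


a = b = 0
for i in range(100):
    a = fm_insert(a, f"ip{i}")        # node A saw ip0..ip99
for i in range(50, 150):
    b = fm_insert(b, f"ip{i}")        # node B saw ip50..ip149 (overlapping)
print(fm_estimate(fm_merge(a, b)))
```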