Computer networks
Malathi Veeraraghavan
Univ. of Virginia
[email protected]
Fall 2013 (updated Jan. 2014)
• Funded projects (GRA openings)
  • NSF SDCI: 2 years left
  • DOE HNTES: 4 years left (new grant awarded)
  • NSF CC-NIE (new): 3 years
  • NSF SCRP: 2 years left
  • NSF JUNO: 3 years (just starting)
• Applied orientation

Outline
• Big picture
• Four projects
  – What is the problem?
  – Why solve it? (Motivation)
• Methods used
  – As a GRA, what would I do?
• Processes & style

Big picture
• Networks to support scientific research community
  – High-speed
  – Low-latency
• Who is in the science community?
  – DOE Office of Science
    • Basic energy sciences, high-energy physics, fusion energy sciences, bio & environ. research
  – NSF Office of Cyber Infrastructure (OCI)

Both agencies (NSF OCI and DOE) support
• Supercomputing centers
  – nersc.gov
  – olcf.gov
  – alcf.gov
  – XSEDE (NSF OCI)
• High-speed networks
  – Backbone: ESnet, Internet2
  – Campus and regional nets: DYNES

NSF Software Dev. for Cyber Infrastructure (SDCI)
• Problem & motivation (what & why):
  1. Climate scientists run simulations that require > 5000 cores
    • Intra-datacenter network identified as bottleneck (InfiniBand cluster: 72K cores)
    • MPI communications: need to reduce latency and variance in latency
  2. Scientists move tera-to-peta byte sized files: move these fast
    • 100 Gbps: current state of the art in link speed but not throughput (software!)

DOE Hybrid Network Traffic Engineering System (HNTES)
• Problem & motivation:
  – Find high-rate, large-sized (alpha) flows within a network and isolate
  – Why?
    • As link rates increase, spread between fastest flow and slowest flow increases
    • Fast flows can delay slow flows (user sees poor quality for real-time flows)
    • On links to providers: Service Level Agreements (SLAs) can be violated when fast flows appear

NSF Campus Cyberinfrastructure – Network Infrastructure & Engineering (CC-NIE)
• Problem & motivation
  – Design protocols/apps to multicast data reliably to hundreds of receivers
  – Save network & computing resources when compared to unicast delivery from one sender to hundreds of receivers
• Application: Weather data distribution
  – UCAR sends real-time weather data almost continuously to 170 institutions

NSF Scheduled Circuit Routing Protocol (SCRP)
• Problem & motivation
  – Scientific networking community has been building out a new type of internetwork with circuits and virtual circuits (airlines)
    • why: service guarantees (think fedex)
  – Contrast with Internet (roadways)
  – Routing problem: what should one organization’s network tell another to enable path computation for circuits?

NeTS: JUNO: Collaborative Research: ACTION: Applications Coordinating with Transport, IP, and Optical Networks
• This project is a joint collaboration with U. Texas at Dallas, and two universities in Japan
• The UVA portion of the project will develop application and transport protocols for optical networks
• Starting Feb. 1, 2014

Outline
• Big picture
• Four projects
  – What is the problem?
  – Why solve it? (Motivation)
• Methods used
  – As a GRA, what would I do?
• Processes & style

Methods used: Stats
• Science before engineering:
  – Theodore von Karman:
    • “Scientists study the world as it is; engineers create the world that never has been”
  – Data collection & statistics
    • Rely on contacts at DOE labs, universities, network operators for operational data
    • Write R programs to analyze procured data
    • Use fir research cluster for parallel computing
• Skills needed: stats/R language/parallel prog.
Methods used: run experiments
• Run existing software used by scientists to obtain measurements
• Use national supercomputers and network testbeds
  – NCAR Wyoming SC: MPI programs (climate)
  – U. Utah Emulab
  – ESnet 100G network testbed
  – U. New Mexico: PROBE
  – ExoGENI racks: OpenFlow switches
  – DYNES: 10 high-performance hosts/switches across US
• Skills needed: learn/run new software programs; write shell scripts; cron jobs; use rigorous scientific methods in executing expts.

Methods used: simulations
• For NSF SCRP project
  – Problem requires large-scale thinking
  – Cannot implement
  – Cannot collect data as system does not yet exist
  – Then simulate
• Skills needed: C++ programming, parallel programming, prob & stats, rigorous scientific methods

Methods used: engineering
• Come up with engineering solutions for problems identified from scientific discovery through analysis of operational data and experimentally collected data
• Implement software
• Evaluate solutions on testbeds
• Two key points
  – Exploratory not confirmatory (watch out for bias)
  – Always quantify the negative!

Methods: Write papers
• Conference first, then journal
• Collab Web site for grad students
  – how to organize a paper
  – hierarchical
  – think of reviewers
  – know your community’s work
  – literature search (when?)

Outline
• Big picture
• Four projects
  – What is the problem?
  – Why solve it? (Motivation)
• Methods used
  – As a GRA, what would I do?
• Processes & style

Processes
• Goals as a graduate student
  – Focus on next step
    • quals
    • proposal defense
    • dissertation
  – Want Masters en route: MCS or MS
  – Career goal: academics or industry
  – Community, community, community
  – Ask the process question for each step

Advising style
• Close collaboration with GRA
  – Research grants have milestones/deliverables
  – Generate ideas/papers/software that others use – who is the customer? what is the product?
• New ideas from GRA
  – Develop proposals: Security for DHS; Vehicular
• Communicate – be open
• Full-time access (no substitute for hard work) – two-way commitment

Summary
• High-speed, low-latency networking for
  – Scientific applications: scientists
  – Network utilization: providers, campus, datacenter
  – Bottom-up: new optical comm. technologies
• Techniques used
  – Obtain operational data/experimental measurements and analyze statistics – find the real problem
  – Develop engineering solution
  – Evaluate through experiments or simulations