Bridging the Gap
Jerry Sobieski, Director, Research Initiatives
Mid-Atlantic Crossroads (MAX)
Presented to NORDUnet 2006, Gothenburg, Sweden, September 26, 2006

Optical Networking Research: Decline or Resurgence?
Jerry Sobieski, Director, Research Initiatives
Mid-Atlantic Crossroads (MAX)
Presented to BroadNets 2006, San Jose, CA, USA, October 3, 2006

Scoping the Problem
• "Grid applications will incorporate in excess of 100,000 processors within 5 years." – Dr. Larry Smarr, "On Vector" Workshop, UCSD, Feb 2006
• "The Global Information Grid will need to store and access exabytes of data on a real-time basis by 2010." – Dr. Henry Dardy, Optical Fiber Conference, Los Angeles, CA, USA, Mar 2006
• "Each LHC experiment foresees a recorded raw data rate of 1 to several petabytes/year." – Dr. Harvey Newman (Caltech)
• "US Bancorp backs up 100 TB of financial data every night – now." – David Grabski (VP Information Tech., US Bancorp), Qwest High Performance Networking Summit, Denver, CO, USA, June 2006
• "The VLA facility is now able to generate 700 Gbps of astronomical data and will reach 3.2 terabits per second by 2009." – Dr. Steven Durand, National Radio Astronomy Observatory, E-VLBI Workshop, MIT Haystack Observatory, Sep 2006

Large Scale [Distributed] Cluster Computing
• The Smarr example: 100,000 processors…
  – Using today's dual-core technology = 50K nodes
  – Simple 1:1 bisection = 25,000 point-to-point links
  – A single Gigabit Ethernet interface per node…
  – 25 terabits/second total bisection bandwidth – unidirectional
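The Smarr arithmetic above can be checked with a short back-of-envelope script. The processor count, cores per node, and link rate are taken from the slide; the variable names are illustrative:

```python
# Back-of-envelope check of the Smarr example (values from the slide).
PROCESSORS = 100_000
CORES_PER_NODE = 2      # today's dual-core technology
LINK_GBPS = 1           # one Gigabit Ethernet interface per node

nodes = PROCESSORS // CORES_PER_NODE            # 50,000 nodes
bisection_links = nodes // 2                    # simple 1:1 node pairing
bisection_tbps = bisection_links * LINK_GBPS / 1000  # unidirectional

print(nodes, bisection_links, bisection_tbps)   # 50000 25000 25.0
```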
• 1 gigabit/node burst × 25K nodes = 25 Tbit aggregate => 2,500 seconds (~42 minutes) funneled through a single 10 GE link
• 1 MB/node × 25K nodes = 25 GB burst = 200 Gbits = 20 seconds over 10 GE
• A single 9000-byte packet per node = 225 MB = 1.8 Gbits = 180 ms on a 10 GE … 10× slower than disk…
• Real applications employ local storage, but they typically have much more complex inter-processor communication patterns and requirements…

Large Scale [Distributed] Cluster Computing
• Another example: a 10K-processor compute cluster
  – 4 GB of memory per processor
  – 1 GE network interface per processor
  – Checkpoint/mirror this data to a remote storage facility 100 km away
• Burst capability:
  – 10K processors × 1 Gbps = 10 Tbps (!)
• Data transfer time:
  – Per processor: 4 GB @ 1 Gbps = 32 seconds
  – Aggregate: 4 GB × 10K = 40 terabytes of data to be moved
    • 40 TB @ 40 Gbps ≈ 2.2 hours
    • 40 TB @ 100 Gbps ≈ 53 minutes

Clearly, these issues will be challenging…
• Parallel and distributed clusters are incorporating more nodes faster than Moore's Law is reducing their size.
  – How do you power 100K processors? At 200 W/node: 50K nodes × 200 W = 10 megawatts! (~2,000 homes…)
  – How big a room would you need? At 160 CPUs/rack: 100K/160 ≈ 625 racks! (just for the CPU blades… what about disks, comms, etc.?)
  – How do you protect the investment? $1,000/node × 50K nodes = $50,000,000 USD
• Centralized clusters of this magnitude will be rare. Large super-clusters will be constructed from confederations of "lesser" clusters, distributed over a large geographic space and across many organizations. How will they communicate?
• This is the network challenge facing the computational research community… and a noble quest for the [optical] networking community.
• What if in 10 years we are working with much larger collaborating virtual organizations?
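The checkpoint example's transfer times can be reproduced with a small sketch. The 10K processors and 4 GB/processor come from the slide; the helper name is mine:

```python
# Checkpoint example from the slide: 10K processors, 4 GB each,
# mirrored to a remote storage facility over a single link.
PROCS = 10_000
GB_PER_PROC = 4
TOTAL_BITS = PROCS * GB_PER_PROC * 8e9   # 40 TB -> 3.2e14 bits

def transfer_seconds(link_gbps: float) -> float:
    """Time to move the full 40 TB checkpoint at the given link rate."""
    return TOTAL_BITS / (link_gbps * 1e9)

print(transfer_seconds(40) / 3600)   # ~2.2 hours
print(transfer_seconds(100) / 60)    # ~53 minutes
```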
To study the application space…
• Take an application example and try to apply optical network technologies to it…
  – This should help identify where existing technology or near-term experimental models fall short…
  – And where advanced optical and photonic technologies could [maybe] bring something to the table…
• Electronic Very Long Baseline Interferometry…
  – "E-VLBI" is a radio astronomy technique for studying both the cosmos *and* the earth.

Application Specific Collaboratories
• The E-VLBI poster child example:
  [Diagram: VLSR antenna sites with Mark 5 recorders, linked over the global R&E hybrid infrastructure to a correlator/compute cluster and a visualization station – the iGrid 2005 E-VLBI application-specific network.]
• The Very Large Array (VLA): 27 antennae × 120 Gbps each = 3.2 terabits/sec

The Technology Bridge
Using network transport and switching technologies as an example…
• Clearly (at least IMHO) 40 Gbps or even 100 Gbps transmission technologies will not be adequate to meet the [e-science] needs 4 or 5 years from now.
• We need to think… bigger.

Can our networks go faster?
• What if we could move data at terabits per second rather than gigabits per second?
• The Smarr example reduces to: 25K nodes bursting 1 Gbit each = 25 Tbit burst
  – 25 seconds @ 1 Tbps
  – 6 seconds @ 4 Tbps
  – 1 second @ 25 Tbps
• Now we're talkin'!

Are Multi-Terabit Link Speeds Possible?
• Dr. Toshio Morioka (NICT) has generated 1,000 waves in the C band on 6 GHz spacing, each wave modulated at 2.5 Gbps -> 2.5 Tbps aggregate.
• Dr. Keren Bergman (Columbia University) has demonstrated a photonic switch that can forward packets at 160 Gbps – per port.
  – The architecture can [theoretically] scale to hundreds of Tbps per port.
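The "Smarr example reduces to" figures above follow directly from dividing the burst size by the link rate (a sketch; the slide rounds 25/4 = 6.25 s down to 6 s):

```python
# Time to drain a 25 Tbit aggregate burst (25K nodes x 1 Gbit each)
# at candidate multi-terabit link rates from the slide.
BURST_TBITS = 25

def drain_seconds(rate_tbps: float) -> float:
    """Seconds to move the whole burst at the given aggregate rate."""
    return BURST_TBITS / rate_tbps

for rate in (1, 4, 25):
    print(f"{rate:>2} Tbps -> {drain_seconds(rate):.2f} s")
```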
• Many other groups are working on similar OPS (optical packet switching) technologies…
  – 320 Gbps per wave is possible
  – Other approaches to photonic data switching, and to all-photonic wavelength switching and translation, are being explored
• There remains much to be done to mature these technologies… but these examples show that they are indeed possible, and are potential bridge material.

Research Topics
• Near term:
  – Integration of "networking" with "optics"
    • Too often the physical-layer and optical-device research community does not understand the higher-layer networking requirements, so low-layer optical integration with upper-layer network functionality is often missed
  – Hybrid optical/photonic networking
    • Dynamic wavelength generation
    • Photonic packet switching (!)
      – Viable ideas, demonstrable basic technologies
      – Needs: refining transmission and switching technologies for wide-band UDWDM…
      – High-speed EO/OE interfaces
      – Needs: integration and architectural definition…
  – Free-space optics
    • Rapid/remote deployment systems
    • Satellite communications, space-based applications (e.g. E-VLBI)
  – Optical/photonic device integration
    • UDWDM integration for transmission devices
    • High-speed optical/photonic memories and buffering
    • Photonic regeneration and wavelength translation
    • Photonic logic and ultra-high-speed switching devices
    • Ultra-high-speed burst-mode receivers and amplifiers
  – Optical fiber research
    • Improved dispersion properties across a wider band
    • Reduced (or enhanced) non-linearity
    • Micro-structured fiber…

Research Agenda: Old news…
(discussion to be convened in the bar after the talk…)
• QoS (there's no free lunch!)
  – We seem to be continually trying to squeeze the last ounce of performance from shared best-effort (BE) networks…
  – Assertion: this is not a cost-effective approach
    » We need to build networks that do not promise magic
• For users or affinity groups that need deterministic performance, pre-allocate and dedicate the required network resources (a-priori lightpath provisioning)
  – There are good lightweight reservation protocols…
  – Routing and PCE (today) are focused on path selection – not path optimization…
• For users who do not have known traffic characteristics, use buffered packet switching and upper-layer protocols for congestion management and guaranteed delivery
  – Work is still needed on KSP (k-shortest-path) forwarding…
• ULH framing-agnostic photonic networks…
  – Not needed in the land-based regimen…
  – Applicable to undersea cable systems, and possibly to space-based applications.
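The "path selection vs. path optimization" distinction can be illustrated with a toy sketch. Everything here is hypothetical (the topology, the wavelength counts, the function names – this is not any real PCE implementation): plain shortest-path routing minimizes hops, while a lightpath-aware chooser might prefer a longer path whose links have more free wavelengths.

```python
import heapq

# Hypothetical topology: neighbor -> (hop cost, free wavelengths on link).
GRAPH = {
    "A": {"B": (1, 2), "C": (1, 30)},
    "B": {"A": (1, 2), "D": (1, 2)},
    "C": {"A": (1, 30), "E": (1, 30)},
    "D": {"B": (1, 2), "E": (1, 30)},
    "E": {"C": (1, 30), "D": (1, 30)},
}

def shortest_path(src, dst):
    """Plain Dijkstra on hop count: path *selection* only."""
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, (hop, _waves) in GRAPH[node].items():
            if nxt not in seen:
                heapq.heappush(pq, (cost + hop, nxt, path + [nxt]))
    return None

def widest_path(src, dst):
    """Maximize the bottleneck free-wavelength count along the path:
    one simple *optimization* criterion a PCE could apply instead."""
    pq = [(-float("inf"), src, [src])]
    best = {}
    while pq:
        neg_width, node, path = heapq.heappop(pq)
        if node == dst:
            return path
        for nxt, (_hop, waves) in GRAPH[node].items():
            width = min(-neg_width, waves)
            if width > best.get(nxt, 0):
                best[nxt] = width
                heapq.heappush(pq, (-width, nxt, path + [nxt]))
    return None

print(shortest_path("A", "D"))  # fewest hops: ['A', 'B', 'D']
print(widest_path("A", "D"))    # most free wavelengths: ['A', 'C', 'E', 'D']
```

Here the hop-count route crosses links with only 2 free wavelengths, while the "optimized" route takes an extra hop to stay on links with 30 – the kind of trade-off the slide argues today's routing and PCE machinery does not make.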