Massive Data Transfers
George McLaughlin, Mark Prior
AARNet
Copyright AARNet 2005

“Big Science” projects driving networks
• Billion dollar globally funded projects
• Massive data transfer needs

Large Hadron Collider
– Coming on-stream in 2007
– Particle collisions generating terabytes/second of “raw” data at a single, central, well-connected site
– Need to transfer data to global “tier 1” sites. A tier 1 site must have a 10Gbps path to CERN
– Tier 1 sites need to ensure gigabit capacity to the Tier2 sites they serve

Square Kilometre Array
– Coming on-stream in 2010?
– Greater data generator than LHC
– Up to 125 sites at remote locations, data need to be brought together for correlation
– Can’t determine “noise” prior to correlation
– Many logistic issues to be addressed

From very small to very big

Scientists and Network Engineers coming together
• HEP community and R&E network community have figured out mechanisms for interaction – probably because HEP is pushing network boundaries
• e.g. the ICFA workshops on HEP, Grid and the Global Digital Divide bring together scientists, network engineers and decision makers – and achieve results
• http://agenda.cern.ch/List.php

What’s been achieved so far
• A new generation of real-time Grid systems is emerging - support worldwide data analysis by the physics community
• Leading role of HEP in developing new systems and paradigms for data intensive science
• Transformed view and theoretical understanding of TCP as an efficient, scalable protocol with a wide field of use
• Efficient standalone and shared use of 10 Gbps paths of virtually unlimited length; progress towards 100 Gbps networking
• Emergence of a new generation of “hybrid” packet- and circuit-switched networks

LHC data (simplified)
Per experiment (CMS, LHCb, ATLAS, ALICE):
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A Megabyte of digitised information for each collision = recording rate of 100 Megabytes/sec
• 1 billion collisions recorded = 1 Petabyte/year

1 Megabyte (1MB)             A digital photo
1 Gigabyte (1GB) = 1000MB    A DVD movie
1 Terabyte (1TB) = 1000GB    World annual book production
1 Petabyte (1PB) = 1000TB    10% of the annual production by LHC experiments
1 Exabyte (1EB) = 1000 PB    World annual information production

LHC Computing Hierarchy
[Diagram: Experiment and Online System (~PByte/sec, ~100-1500 MBytes/sec); CERN Center (Tier 0 +1; PBs of Disk; Tape Robot); Tier 1 centers (IN2P3 Center, INFN Center, RAL Center, FNAL Center); Tier 2 centers; Tier 3 institutes with physics data cache; Tier 4 workstations. Links between tiers are labelled ~2.5-10 Gbps, and 0.1 to 10 Gbps at the lowest tiers.]
• CERN/Outside Resource Ratio ~1:2
• Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
• Tens of Petabytes by 2007-8. An Exabyte ~5-7 years later.

Lightpaths for Massive data transfers
• From CANARIE
• A small number of users with large data transfer needs can use more bandwidth than all other users
[Chart: Lightpaths, IP Peak and IP Average, Jun 98 to Jun 04]

Why?
Cees de Laat classifies network users into 3 broad groups.
1. Lightweight users, browsing, mailing, home use. Who need full Internet routing, one to many;
2. Business applications, multicast, streaming, VPN’s, mostly LAN. Who need VPN services and full Internet routing, several to several + uplink; and
3. Scientific applications, distributed data processing, all sorts of grids. Need for very fat pipes, limited multiple Virtual Organizations, few to few, peer to peer.
• Type 3 users: High Energy Physics, Astronomers, eVLBI, High Definition multimedia over IP
• Massive data transfers from experiments running 24x7

What is the GLIF?
• Global Lambda Integrated Facility - www.glif.is
• International virtual organization that supports persistent data-intensive scientific research and middleware development
• Provides ability to create dedicated international point to point Gigabit Ethernet circuits for “fixed term” experiments

Huygens Space Probe – a practical example
Very Long Baseline Interferometry (VLBI) is a technique where widely separated radio-telescopes observe the same region of the sky simultaneously to generate images of cosmic radio sources
• Cassini spacecraft left Earth in October 1997 to travel to Saturn
• On Christmas Day 2004, the Huygens probe separated from Cassini
• Started its descent through the dense atmosphere of Titan on 14 Jan 2005
• Using this technique 17 telescopes in Australia, China, Japan and the US were able to accurately position the probe to within a kilometre (Titan is ~1.5 billion kilometres from Earth)
• Need to transfer Terabytes of data between Australia and the Netherlands

AARNet - CSIRO ATNF contribution
• Created “dedicated” circuit
• The data from two of the Australian telescopes (Parkes [The Dish] & Mopra) was transferred via light plane to CSIRO Marsfield (Sydney)
• CeNTIE based fibre from CSIRO Marsfield to AARNet3 GigaPOP
• SXTransPORT 10G to Seattle
• “Lightpath” to Joint Institute for VLBI in Europe (JIVE) across CA*net4 and SURFnet optical infrastructure

But…
Although time from concept to undertaking the scientific experiment was only 3 weeks…
• 9 organisations in 4 countries involved in “making it happen”
• Required extensive human-human interaction (mainly emails… lots of them)
• Although a 1Gbps path was available, maximum throughput was around 400Mbps
• Issues with protocols, stack tuning, disk-to-disk transfer, firewalls, different formats, etc
• Currently scientists and engineers need to test thoroughly before important experiments, not yet “turn up and use”
• Ultimate goal is for the control plane issues to be transparent to the end-user who simply presses the “make it happen” icon

International path for Huygens transfer

EXPReS and Square Kilometre Array
• Australia is one of the countries bidding for SKA – significant infrastructure challenges
• Also, European Commission funded EXPReS project to link 16 radio telescopes around the world at gigabit speeds
• SKA bigger data generator than LHC
• But in a remote location

In Conclusion
• Scientists and network engineers working together can exploit the new opportunities that high capacity networking opens up for “big science”
• Need to solve issues associated with scalability, control plane, ease of use
• QUESTIONS?
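
Back-of-envelope check of the figures quoted on the LHC data and Huygens slides. This is a minimal sketch and not part of the original talk. It assumes decimal units as on the slides (1 GB = 1000 MB), reads the achieved Huygens throughput as about 400 megabits per second (the available path was 1Gbps), and uses a hypothetical 1 TB data set, since the slides only say "Terabytes". All function and variable names are illustrative.

```python
# Back-of-envelope arithmetic for the slide figures (illustrative sketch only).
# Decimal units are assumed, matching the slides (1 GB = 1000 MB).

MB = 10**6    # bytes in a megabyte
TB = 10**12   # bytes in a terabyte
PB = 10**15   # bytes in a petabyte


def recording_rate(collisions_per_s: float, bytes_per_collision: float) -> float:
    """Recording rate in bytes per second."""
    return collisions_per_s * bytes_per_collision


def transfer_time_s(num_bytes: float, throughput_bps: float) -> float:
    """Seconds needed to move num_bytes at throughput_bps (bits per second)."""
    return num_bytes * 8 / throughput_bps


# LHC data (simplified): 100 collisions of interest per second, 1 MB each.
lhc_rate = recording_rate(100, 1 * MB)        # bytes per second
lhc_recorded = 1_000_000_000 * MB             # 1 billion collisions, in bytes
lhc_recording_seconds = 1_000_000_000 / 100   # time to record them at 100 per second

# Huygens transfer: ASSUMED 1 TB example data set (the slides say "Terabytes").
example_bytes = 1 * TB
time_at_achieved = transfer_time_s(example_bytes, 400e6)  # ~400 Mbps achieved
time_at_nominal = transfer_time_s(example_bytes, 1e9)     # 1 Gbps path available

if __name__ == "__main__":
    print(f"LHC recording rate: {lhc_rate / MB:.0f} MB/s")
    print(f"1 billion collisions: {lhc_recorded / PB:.0f} PB")
    print(f"Recording time at 100/s: {lhc_recording_seconds / 86400:.0f} days")
    print(f"1 TB at 400 Mbps: {time_at_achieved / 3600:.1f} hours")
    print(f"1 TB at 1 Gbps: {time_at_nominal / 3600:.1f} hours")
```

Under these assumptions, a billion recorded collisions correspond to roughly 116 days of recording at 100 collisions per second, and each terabyte takes about 5.6 hours at 400 Mbps against about 2.2 hours at the full 1 Gbps. That gap gives a sense of why the protocol and stack tuning issues on the "But…" slide matter.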