QUESTnet 2005: Linking the World with Light
Massive Data Transfers
Copyright AARNet 2005
George McLaughlin
Mark Prior
AARNet
“Big Science” projects driving networks
• Billion-dollar, globally funded projects
• Massive data transfer needs

Large Hadron Collider
– Coming on-stream in 2007
– Particle collisions generating terabytes/second of “raw” data at a single, central, well-connected site
– Need to transfer data to global “tier 1” sites; a Tier 1 site must have a 10 Gbps path to CERN
– Tier 1 sites need to ensure gigabit capacity to the Tier 2 sites they serve

Square Kilometre Array
– Coming on-stream in 2010?
– Greater data generator than the LHC
– Up to 125 sites at remote locations; data need to be brought together for correlation
– Can’t determine “noise” prior to correlation
– Many logistic issues to be addressed

From very small to very big
Scientists and Network Engineers coming together
• The HEP community and the R&E network community have figured out mechanisms for interaction – probably because HEP is pushing network boundaries
• e.g. the ICFA workshops on HEP, Grids and the Global Digital Divide bring together scientists, network engineers and decision makers – and achieve results
• http://agenda.cern.ch/List.php
What’s been achieved so far
• A new generation of real-time Grid systems is emerging to support worldwide data analysis by the physics community
• Leading role of HEP in developing new systems and paradigms for data-intensive science
• Transformed view and theoretical understanding of TCP as an efficient, scalable protocol with a wide field of use
• Efficient standalone and shared use of 10 Gbps paths of virtually unlimited length; progress towards 100 Gbps networking
• Emergence of a new generation of “hybrid” packet- and circuit-switched networks
LHC data (simplified)
Per experiment (ATLAS, CMS, LHCb, ALICE):
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A Megabyte of digitised information for each collision = recording rate of 100 Megabytes/sec
• 1 billion collisions recorded = 1 Petabyte/year

Scale of the data:
• 1 Megabyte (1 MB): a digital photo
• 1 Gigabyte (1 GB) = 1000 MB: a DVD movie
• 1 Terabyte (1 TB) = 1000 GB: world annual book production
• 1 Petabyte (1 PB) = 1000 TB: 10% of the annual production by LHC experiments
• 1 Exabyte (1 EB) = 1000 PB: world annual information production
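The per-experiment figures above are self-consistent, which a quick arithmetic sketch makes visible (decimal units throughout, matching the slide; the implied ~10^7 seconds of beam time per year falls out of the numbers rather than being stated):

```python
# Back-of-envelope check of the LHC data figures on this slide.

MB = 10**6   # bytes, decimal units as used on the slide
PB = 10**15

kept_per_sec = 100            # collisions of interest after filtering
bytes_per_collision = 1 * MB  # digitised size of one collision

recording_rate = kept_per_sec * bytes_per_collision       # bytes/sec
annual_collisions = 1_000_000_000                         # 1 billion recorded
annual_volume = annual_collisions * bytes_per_collision   # bytes/year

print(recording_rate // MB, "MB/s")   # matches the slide's 100 MB/s
print(annual_volume / PB, "PB/year")  # matches the slide's 1 PB/year
# Implied running time: 1e9 collisions at 100/s = 1e7 seconds of beam time
print(annual_collisions / kept_per_sec, "seconds")
```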
LHC Computing Hierarchy
[Diagram of the tiered LHC computing model:]
• Online System at each experiment: ~PByte/sec of raw data, ~100-1500 MBytes/sec into the CERN Center (Tier 0+1), which holds PBs of disk and a tape robot
• CERN/Outside resource ratio ~1:2; Tier 0 / (all Tier 1) / (all Tier 2) ~1:1:1
• Tier 1 centres (IN2P3, INFN, RAL, FNAL) connected at ~2.5-10 Gbps
• Tier 2 centres connected to their Tier 1 at ~2.5-10 Gbps
• Tier 3: institutes with physics data caches; Tier 4: workstations, at 0.1 to 10 Gbps
• Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later
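The link speeds in the hierarchy set hard limits on how fast data can move between tiers. A rough sketch (decimal units, and assuming 100% link utilisation, which real transfers rarely achieve):

```python
# Time to move LHC-scale data over the tier link speeds above.

def transfer_days(bytes_to_move: float, link_gbps: float) -> float:
    """Days needed to push `bytes_to_move` over a `link_gbps` link at full rate."""
    bits = bytes_to_move * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / 86_400

PB = 10**15
print(f"1 PB over 10 Gbps:  {transfer_days(PB, 10):.1f} days")
print(f"1 PB over 2.5 Gbps: {transfer_days(PB, 2.5):.1f} days")
print(f"1 PB over 1 Gbps:   {transfer_days(PB, 1):.1f} days")
```

Even at a full 10 Gbps, a petabyte takes over nine days to move, which is why Tier 1 sites need dedicated 10 Gbps paths to CERN rather than shared IP capacity.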
Lightpaths for Massive data transfers
• From CANARIE: a small number of users with large data transfer needs can use more bandwidth than all other users
[Chart: CANARIE network traffic, Jun 1998 to Jun 2004, comparing Lightpaths against IP Peak and IP Average]
Why?
Cees de Laat classifies network users into 3 broad groups:
1. Lightweight users: browsing, mailing, home use. Need full Internet routing, one to many.
2. Business applications: multicast, streaming, VPNs, mostly LAN. Need VPN services and full Internet routing, several to several + uplink.
3. Scientific applications: distributed data processing, all sorts of grids. Need very fat pipes, limited multiple Virtual Organizations, few to few, peer to peer.

Type 3 users – High Energy Physics, astronomers (eVLBI), High Definition multimedia over IP – run massive data transfers from experiments running 24x7.
What is the GLIF?
• Global Lambda Integrated Facility - www.glif.is
• International virtual organization that supports persistent data-intensive scientific research and middleware development
• Provides the ability to create dedicated international point-to-point Gigabit Ethernet circuits for “fixed term” experiments
Huygens Space Probe – a practical example

Very Long Baseline Interferometry (VLBI) is a technique where widely separated radio telescopes observe the same region of the sky simultaneously to generate images of cosmic radio sources.

• Cassini spacecraft left Earth in October 1997 to travel to Saturn
• On Christmas Day 2004, the Huygens probe separated from Cassini
• Started its descent through the dense atmosphere of Titan on 14 Jan 2005
• Using this technique, 17 telescopes in Australia, China, Japan and the US were able to accurately position the probe to within a kilometre (Titan is ~1.5 billion kilometres from Earth)
• Need to transfer Terabytes of data between Australia and the Netherlands
AARNet - CSIRO ATNF contribution
• Created a “dedicated” circuit
• The data from two of the Australian telescopes (Parkes [The Dish] & Mopra) was transferred via light plane to CSIRO Marsfield (Sydney)
• CeNTIE-based fibre from CSIRO Marsfield to the AARNet3 GigaPOP
• SXTransPORT 10G to Seattle
• “Lightpath” to the Joint Institute for VLBI in Europe (JIVE) across CA*net4 and SURFnet optical infrastructure
But………..
• 9 organisations in 4 countries involved in “making it happen”
• Required extensive human-human interaction (mainly emails……. lots of them), although the time from concept to undertaking the scientific experiment was only 3 weeks
• Although a 1 Gbps path was available, maximum throughput was around 400 Mbps
• Issues with protocols, stack tuning, disk-to-disk transfer, firewalls, different formats, etc.
• Currently scientists and engineers need to test thoroughly before important experiments; not yet “turn up and use”
• Ultimate goal is for the control plane issues to be transparent to the end-user, who simply presses the “make it happen” icon
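The gap between the 1 Gbps path and the throughput actually achieved is typical of long-fat-network TCP behaviour. A sketch of the bandwidth-delay product for an Australia–Netherlands path (the ~300 ms round-trip time is an illustrative assumption, not a figure from this talk):

```python
# Bandwidth-delay product: how much data must be "in flight" to keep a
# long-distance path full. RTT of 0.3 s is an assumed Australia-Netherlands
# round trip, for illustration only.

def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes in flight needed to saturate the path (bandwidth x delay)."""
    return bandwidth_bps * rtt_seconds / 8

RTT = 0.3                      # seconds, assumed round-trip time
path = 1 * 10**9               # the 1 Gbps lightpath
needed = bdp_bytes(path, RTT)  # TCP window required to fill the path

print(f"Window needed: {needed / 10**6:.1f} MB")
# Throughput a small, untuned window actually delivers is window / RTT:
default_window = 64 * 1024     # a typical mid-2000s default TCP window
print(f"64 KB window gives: {default_window * 8 / RTT / 10**6:.1f} Mbit/s")
```

A single untuned stream would crawl at a few Mbit/s on such a path, so the 400 Mbps achieved already reflects substantial stack tuning; disk and firewall bottlenecks account for the rest of the shortfall.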
International path for Huygens transfer
[Map of the international network path used for the transfer]
EXPReS and Square Kilometre Array
• SKA will be a bigger data generator than the LHC, but in a remote location
• Australia is one of the countries bidding for the SKA – significant infrastructure challenges
• Also, the EU Commission-funded EXPReS project will link 16 radio telescopes around the world at gigabit speeds
In Conclusion
• Scientists and network engineers working together can exploit the new opportunities that high-capacity networking opens up for “big science”
• Need to solve issues associated with scalability, control plane, ease of use

QUESTIONS?