Scientific Computing at SLAC
Richard P. Mount
Director: Scientific Computing and Computing Services
Stanford Linear Accelerator Center
HEPiX, October 11, 2005

SLAC Scientific Computing Balancing Act
• Aligned with the evolving science mission of SLAC, but neither
• subservient to the science mission, nor
• unresponsive to SLAC mission needs

SLAC Scientific Computing Drivers
• BaBar (data-taking ends December 2008)
– The world's most data-driven experiment
– Data analysis challenges until the end of the decade
• KIPAC
– From cosmological modeling to petabyte data analysis
• Photon Science at SSRL and LCLS
– Ultrafast science, modeling and data analysis
• Accelerator Science
– Modeling electromagnetic structures (PDE solvers in a demanding application)
• The broader US HEP program (aka LHC)
– Contributes to the orientation of SLAC Scientific Computing R&D

DOE Scientific Computing Funding at SLAC
• Particle and Particle Astrophysics
– $14M SCCS
– $5M other
• Photon Science
– $0M SCCS
– $1M SSRL?
• Computer Science
– $1.5M

Scientific Computing: the relationship between science and the components of Scientific Computing
• Application Sciences: High-energy and Particle-Astro Physics, Accelerator Science, Photon Science …
• Issues addressable with "computing": Particle interactions with matter, Electromagnetic structures, Huge volumes of data, Image processing …
• Computing techniques (~20 SCCS FTE): PDE solving, Algorithmic geometry, Visualization, Meshes, Object databases, Scalable file systems …
• Computing architectures: Single system image, Low-latency clusters, Throughput-oriented clusters, Scalable storage …
• Computing hardware (~26 SCCS FTE): Processors, I/O devices, Mass-storage hardware, Random-access hardware, Networks and interconnects …

Scientific Computing: SLAC's goals for Scientific Computing
[Diagram: the same layer stack (Application Sciences down to Computing hardware), spanning SLAC science and SLAC + Stanford science, annotated with SLAC's goals: "The Science of Scientific Computing", "Computing for Data-Intensive Science", and "Collaboration with Stanford and Industry".]

Scientific Computing: current SLAC leadership and recent achievements in Scientific Computing (SLAC and SLAC + Stanford science)
• PDE solving for complex electromagnetic structures
• Huge-memory systems for data analysis
• World's largest database
• Scalable data management
• GEANT4: photon/particle interaction in complex structures (in collaboration with CERN)
• Internet2 Land-Speed Record; SC2004 Bandwidth Challenge

What does SCCS run (1)?
• Data analysis "farms" (also good for HEP simulation)
– ~4000 processors, including 256 dual-Opteron Sun V20zs
– Linux and Solaris
• Shared-memory multiprocessor
– SGI 3700
– 128 processors
– Linux
• Myrinet cluster
– 72 processors
– Linux

What does SCCS run (2)?
• Application-specific clusters
– each 32 to 128 processors
– Linux
• PetaCache prototype
– 64 nodes
– 16 GB memory per node
– Linux/Solaris
• And even …

What does SCCS run (3)?
• Disk servers
– About 120 Sun/Solaris servers
– About 500 TB
– Network-attached Sun FibreChannel disk arrays
– Mainly xrootd
– Some NFS
– Some AFS
• Tape storage
– 6 STK Powderhorn silos
– Up to 6 petabytes capacity
– Currently store 2 petabytes
– HPSS
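As a rough cross-check of the tape figures just quoted, the sketch below multiplies the silo count by assumed per-silo cartridge slots and per-cartridge capacity. Both assumed figures (roughly 5,000 usable slots per Powderhorn silo and 200 GB native per STK 9940B cartridge) are illustrative, not numbers taken from the slides; the point is only that they land near the quoted "up to 6 petabytes".

```python
# Rough sanity check of the quoted "up to 6 petabytes" tape capacity.
# Assumed figures (not from the slides): ~5,000 usable cartridge slots per
# STK Powderhorn silo, 200 GB native (uncompressed) per STK 9940B cartridge.
silos = 6
slots_per_silo = 5_000      # assumption
tb_per_cartridge = 0.2      # assumption: 200 GB native

capacity_pb = silos * slots_per_silo * tb_per_cartridge / 1000
print(f"Approximate native capacity: {capacity_pb:.1f} PB")   # prints ~6.0 PB
```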
What does SCCS run (4)?
• Networks
– 10 Gigabit/s to ESnet
– 10 Gigabit/s R&D
– 96 fibers to Stanford
– 10 Gigabit/s core in the computer center (as soon as we unpack the boxes)

SLAC Computing - Principles and Practice (Simplify and Standardize)
• Lights-out operation – no operators for the last 10 years
– Run 24x7 with 8x5 (in theory) staff
– (When there is a cooling failure on a Sunday morning, 10-15 SCS staff are on site by the time I turn up)
• Science (and business-Unix) raised-floor computing
– Adequate reliability essential
– Solaris and Linux
– Scalable "cookie-cutter" approach
– Only one type of farm CPU bought each year
– Only one type of file server + disk bought each year
– Highly automated OS installations and maintenance
• e.g. see the talk on how SLAC does clusters by Alf Wachsmann: http://www.slac.stanford.edu/~alfw/talks/RCcluster.pdf

SLAC-BaBar Computing Fabric
[Diagram: clients, disk servers and tape servers interconnected by a Cisco IP network.]
• Clients: 1700 dual-CPU Linux and 800 single-CPU Sun/Solaris nodes, running HEP-specific ROOT software (xrootd) + the Objectivity/DB object database, plus some NFS
• Disk servers: 120 dual/quad-CPU Sun/Solaris servers with ~400 TB of Sun FibreChannel RAID arrays (+ some SATA)
• Server software: HPSS plus SLAC enhancements to the ROOT and Objectivity server code
• Tape servers: 25 dual-CPU Sun/Solaris servers, 40 STK 9940B and 6 STK 9840A drives, 6 STK Powderhorn silos, over 1 PB of data

Scientific Computing Research Areas (1) (funded by DOE-HEP, DOE SciDAC and DOE-MICS)
• Huge-memory systems for data analysis (SCCS Systems group and BaBar)
– Expected major growth area (more later)
• Scalable data-intensive systems (SCCS Systems and Physics Experiment Support groups)
– "The world's largest database" (OK, not really a database any more)
– How to maintain performance with data volumes growing like "Moore's Law"?
– How to improve performance by factors of 10, 100, 1000, …? (intelligence plus brute force)
– Robustness, load balancing, troubleshootability in 1000 to 10,000-box systems
– Astronomical data analysis on a petabyte scale (in collaboration with KIPAC)

Scientific Computing Research Areas (2) (funded by DOE-HEP, DOE SciDAC and DOE-MICS)
• Grids and security (SCCS Physics Experiment Support, Systems and Security groups)
– PPDG: building the US HEP Grid
– Open Science Grid
– Security in an open scientific environment
– Accounting, monitoring, troubleshooting and robustness
• Network research and stunts (SCCS Network group – Les Cottrell et al.)
– Land-speed record and other trophies
• Internet monitoring and prediction (SCCS Network group)
– IEPM: Internet End-to-End Performance Monitoring (~5 years)
– INCITE: Edge-based Traffic Processing and Service Inference for High-Performance Networks

Scientific Computing Research Areas (3) (funded by DOE-HEP, DOE SciDAC and DOE-MICS)
• GEANT4: simulation of particle interactions in million- to billion-element geometries (SCCS Physics Experiment Support group – M. Asai, D. Wright, T. Koi, J. Perl …)
– BaBar, GLAST, LCD …
– LHC program
– Space
– Medical
• PDE solving for complex electromagnetic structures (Kwok Ko's Advanced Computing Department (ACD) + SCCS clusters)
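For a flavor of what "PDE solving for complex electromagnetic structures" means at its most elementary level, here is a toy sketch: a Jacobi relaxation of Laplace's equation on a small 2-D grid with fixed boundary potentials. It only illustrates the class of boundary-value problem; the production solvers referred to above are large parallel codes for realistic three-dimensional accelerator structures, and nothing in this sketch (grid size, boundary values, tolerance) comes from the slides.

```python
import numpy as np

# Toy illustration: Jacobi relaxation of Laplace's equation (del^2 V = 0)
# on a 2-D grid with fixed boundary potentials. Not the production approach;
# just the simplest possible example of an electrostatic boundary-value PDE.
n = 64
V = np.zeros((n, n))
V[0, :] = 1.0                 # hold the top edge at 1 V, other edges at 0 V

for sweep in range(20_000):
    V_new = V.copy()
    V_new[1:-1, 1:-1] = 0.25 * (V[:-2, 1:-1] + V[2:, 1:-1] +
                                V[1:-1, :-2] + V[1:-1, 2:])
    if np.max(np.abs(V_new - V)) < 1e-5:   # crude convergence test
        V = V_new
        break
    V = V_new

print(f"Sweeps: {sweep + 1}, potential at grid centre: {V[n // 2, n // 2]:.3f}")
```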
Growing Competences
• Parallel computing (MPI …)
– Driven by KIPAC (Tom Abel) and ACD (Kwok Ko)
– SCCS competence in parallel computing (= Alf Wachsmann currently)
– MPI clusters and the SGI SSI system
• Visualization
– Driven by KIPAC and ACD
– SCCS competence is currently experimental-HEP focused (WIRED, HEPREP …)
– (A polite way of saying that growth is needed)

A Leadership-Class Facility for Data-Intensive Science
Richard P. Mount
Director, SLAC Computing Services; Assistant Director, SLAC Research Division
Washington DC, April 13, 2004

PetaCache Goals
• The PetaCache architecture aims to revolutionize the query and analysis of scientific databases with complex structure
– Generally this applies to feature databases (terabytes to petabytes) rather than bulk data (petabytes to exabytes)
• The original motivation comes from HEP
– Sparse (~random) access to tens of terabytes today, petabytes tomorrow
– Access by thousands of processors today, tens of thousands tomorrow

PetaCache: The Team
• David Leith, Richard Mount, PIs
• Randy Melen, project leader
• Chuck Boeheim (Systems group leader)
• Bill Weeks, performance testing
• Andy Hanushevsky, xrootd
• Systems group members
• Network group members
• BaBar (Stephen Gowdy)

Latency and Speed – Random Access
[Chart: "Random-Access Storage Performance" – retrieval rate in Mbytes/s (log scale) versus log10 of object size in bytes, for PC2100 memory, a WD 200 GB disk and an STK 9940B tape drive. For small objects the rate is dominated by access latency, so memory outperforms disk, and disk outperforms tape, by many orders of magnitude; a simple quantitative model is sketched after the strategy bullets below.]

The PetaCache Strategy
• Sitting back and waiting for technology is a BAD idea
• Scalable petabyte memory-based data servers require much more than just cheap chips. Now is the time to develop:
– Data-server architecture(s) delivering latency and throughput cost-optimized for the science
– Scalable data-server software supporting a range of data-serving paradigms (file access, direct addressing, …)
– "Liberation" of entrenched legacy approaches to scientific data analysis that are founded on the "knowledge" that accessing small data objects is crazily inefficient
• Applications will take time to adapt not just their codes, but their whole approach to computing, to exploit the new architecture
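The random-access chart above can be summarized by a simple model: effective retrieval rate ≈ object size / (access latency + object size / streaming bandwidth). The sketch below evaluates that model for three device classes; the latency and bandwidth figures are illustrative assumptions, roughly representative of 2005-era DRAM, disk and tape, not values taken from the slides.

```python
# Toy model behind the "Random-Access Storage Performance" chart:
# effective rate = object_size / (access_latency + object_size / bandwidth).
# The device figures below are illustrative assumptions, not slide data.
devices = {
    "DRAM (PC2100-like)":  {"latency_s": 100e-9, "bw_bytes_s": 2.1e9},
    "Disk (200 GB-class)": {"latency_s": 10e-3,  "bw_bytes_s": 50e6},
    "Tape (9940B-class)":  {"latency_s": 60.0,   "bw_bytes_s": 30e6},
}

for size in (100, 10_000, 1_000_000, 100_000_000):     # object size in bytes
    rates = {name: size / (d["latency_s"] + size / d["bw_bytes_s"])
             for name, d in devices.items()}
    row = ", ".join(f"{name}: {rate / 1e6:.3g} MB/s" for name, rate in rates.items())
    print(f"{size:>11,d} B -> {row}")
```

The model reproduces the qualitative shape of the measured curves: for kilobyte-scale objects, memory beats disk, and disk beats tape, by many orders of magnitude, while for very large objects all three converge toward their streaming bandwidths.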
• Hence, three phases:
1. Prototype machine (in operation)
   – Commodity hardware
   – Existing "scalable data server software" (as developed for disk-based systems)
   – HEP-BaBar as co-funder and principal user
   – Tests of other applications (GLAST, LSST …)
   – Tantalizing "toy" applications only (too little memory for flagship analysis applications)
   – Industry participation
2. Development machine (next proposal)
   – Low-risk (purely commodity hardware) and higher-risk (flash-memory system requiring some hardware development) components
   – Data-server software: improvements to performance and scalability, investigation of other paradigms
   – HEP-BaBar as co-funder and principal user
   – Work to "liberate" BaBar analysis applications
   – Tests of other applications (GLAST, LSST …)
   – Major impact on a flagship analysis application
   – Industry partnerships, DOE Lab partnerships
3. Production machine(s)
   – Storage-class memory with a range of interconnect options matched to the latency/throughput needs of differing applications
   – Scalable data-server software offering several data-access paradigms to applications
   – Proliferation – machines deployed at several labs
   – Economic viability – cost-effective for programs needing dedicated machines
   – Industry partnerships transitioning to commercialization

Prototype Machine (Operational)
[Diagram:
• Clients (the existing HEP-funded BaBar systems): up to 2000 nodes, each with 2 CPUs and 2 GB memory, Linux
• Data servers (PetaCache, MICS + HEP-BaBar funding): 64-128 nodes, each a Sun V20z with 2 Opteron CPUs and 16 GB memory; up to 2 TB total memory; Solaris or Linux (mix and match)
• Interconnected by Cisco switches]

Object-Serving Software
• xrootd/olbd (Andy Hanushevsky, SLAC)
– Optimized for read-only access
– File-access paradigm (filename, offset, byte count) – a minimal sketch of this access pattern follows the measurement slides below
– Makes 1000s of servers transparent to user code
– Load balancing
– Self-organizing
– Automatic staging from tape
– Failure recovery
• Allows BaBar to start getting benefit from a new data-access architecture within months, without changes to user code
• The application can ignore the hundreds of separate address spaces in the data-cache memory

Prototype Machine: Performance Measurements
• Latency
• Throughput (transaction rate)
• (Aspects of) scalability

Latency (microseconds) versus data retrieved (bytes)
[Chart: stacked contributions to total latency, roughly 100-250 microseconds, for requests of 100 to 8100 bytes: server xrootd overhead, server xrootd CPU, client xroot overhead, client xroot CPU, TCP stack/NIC/switching, and minimum transmission time.]

Throughput Measurements
[Chart: transactions per second, up to roughly 90,000-100,000, versus number of clients for one server (1 to 50), for Linux client – Solaris server, Linux client – Linux server, and Linux client – Solaris server (bge) configurations; about 22 processor-microseconds per transaction. At 22 processor-microseconds per transaction, one CPU can service roughly 45,000 transactions/s, so a dual-CPU server saturating near 90,000 transactions/s is consistent with the plateau in the measurements.]

Storage-Class Memory
• New technologies coming to market in the next 3-10 years (Jai Menon, IBM)
• The current not-quite-crazy example is flash memory
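The latency and throughput measurements above exercise the xrootd file-access paradigm described on the Object-Serving Software slide: user code names a file and asks for a byte range, and the server layer hides which of the many data-server address spaces actually holds the bytes. The sketch below illustrates only that (filename, offset, byte count) access pattern using an ordinary local file; the function name, the file path and the offset list are made up for illustration, and this is not the xrootd client API itself.

```python
import os

def read_object(path, offset, nbytes):
    """Illustrative (filename, offset, byte-count) read -- the access pattern
    that xrootd presents to user code; here it is simply a local POSIX read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, nbytes, offset)    # read nbytes starting at offset
    finally:
        os.close(fd)

# Self-contained demo: create a small placeholder file, then do sparse reads.
# The path and the (offset, size) pairs are placeholders, not real BaBar data.
path = "/tmp/petacache_demo.dat"
with open(path, "wb") as f:
    f.write(os.urandom(1_000_000))             # 1 MB of dummy "event" data

for offset, size in [(0, 4096), (250_000, 4096), (900_000, 4096)]:
    data = read_object(path, offset, size)
    print(f"offset {offset:>7,d}: read {len(data):,d} bytes")
```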
Development Machine Plans
[Diagram:
• PetaCache data servers: 30 nodes, each with 2 Opteron CPUs and 1 TB of flash memory; ~30 TB total memory; Solaris/Linux
• Data servers: 80 nodes, each with 8 Opteron CPUs and 128 GB of memory; up to 10 TB total memory; Solaris/Linux
• Clients (SLAC-BaBar system): up to 2000 nodes, each with 2 CPUs and 2 GB memory, Linux
• Interconnect: switch with 10 Gigabit ports and the Cisco switch fabric]

Minor Details?
• 1970s
– SLAC Computing Center designed for ~35 Watts/square foot
– 0.56 MWatts maximum
– (0.56 MW at ~35 W/square foot corresponds to roughly 16,000 square feet of raised floor)
• 2005
– Over 1 MWatt by the end of the year
– Locally high densities (16 kW racks)
• 2010
– Over 2 MWatts likely needed
• Onwards
– Uncertain, but increasing power/cooling need is likely

Crystal Ball (1)
• The pessimist's vision:
– PPA computing winds down to about 20% of its current level as BaBar analysis ends in 2012 (GLAST is negligible, KIPAC is small)
– Photon Science is dominated by non-SLAC/non-Stanford scientists who do everything at their home institutions
– The weak drivers from the science base make SLAC unattractive for DOE computer-science funding

Crystal Ball (2)
• The optimist's vision:
– PPA computing in 2012 includes:
  • Vigorous/leadership involvement in LHC physics analysis using innovative computing facilities
  • Massive computational cosmology/astrophysics
  • A major role in LSST data analysis (petabytes of data)
  • Accelerator simulation for the ILC
– Photon Science computing includes:
  • A strong SLAC/Stanford faculty, leading much of LCLS science, fully exploiting SLAC's strengths in simulation, data analysis and visualization
  • A major BES accelerator research initiative
– Computer Science includes:
  • National/international leadership in computing for data-intensive science (supported at $25M to $50M per year)
– SLAC and Stanford:
  • University-wide support for establishing leadership in the science of scientific computing
  • A new SLAC/Stanford scientific computing institute