Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Advanced Computing Technology Overview Richard P. Mount Director: Scientific Computing and Computing Services Stanford Linear Accelerator Center May 25, 2005 Richard P. Mount, SLAC May 2005 1 Advanced Computing Technology My Viewpoint • Decades of computing for experimental HEP; • Decades of data-intensive computing; • Belief that the future will be even more data-intensive for HEP; • Belief that the many other sciences are also facing a data-intensive future. Richard P. Mount, SLAC May 2005 2 My History • Circa 1980 – – – – – The EMC experiment World’s largest collaboration (99 physicists) 10,000 tapes/year No advance planning for computing resources I had to invent a data-handling system to be able to do physics • 1982 – 1997 – The L3 experiment at LEP – Responsible for L3 computing at CERN • 1997 – now – BaBar at SLAC – Future HEP, Particle-astro and X-ray science at SLAC/Stanford Richard P. Mount, SLAC May 2005 3 CPU, Disk and Network History i.e. what I (and Harvey) have bought 10,000,000 1,000,000 100,000 Farm CPU box MIPS/$M 10,000 Doubling in 1.2 years Raid Disk GB/$M 1,000 Doubling in 1.4 years Transatlantic WAN kB/s per $M/yr 100 Doubling in 8.4 or 0.7 years 10 Richard P. Mount, SLAC May 2005 2005 2004 2003 2002 2001 2000 1999 1998 1997 1996 1995 1994 1993 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1 4 What about the future? • CPU – The clock-speed ramp up has run out of steam – Intel/AMD response: multicore chips • Fairly easy to use for HEP data processing – Intel predicts 25 Tflops/chip in 2015 (100 cores) • Close to the “doubling every 1.2 year” extrapolation (costs are very dependent on memory) • Disk – “increasing requirements for disk drive improvements provides a unending challenge to extend GMR technology to its limits, and then to look beyond” (Hitachi GST) – It seems to be working – no end to capacity growth is in sight yet. Richard P. Mount, SLAC May 2005 5 Generic HEP Computing Fabric Client Client Client Client Client IP Network Disk Server Disk Server Disk Server Disk Server Tape Server Tape Server Thousands of Farm CPUs HEP-specific software (e.g Root/Xrootd) Disk Server IP Network Tape Server Client Disk Server Hundreds of Disk Servers Mass Storage System Software Tape Server Tape Server Tens of Tape servers Tens of Tape drives Richard P. Mount, SLAC May 2005 6 Some Issues • While CPU power per $ has been doubling every 1.2 years, Watts per $ have been increasing too. – Infrastructure (power, cooling, space) now costs as much per year as the computers • Boxes per $ are also increasing – 10,000 – 100,000 box systems are in sight – Scalability is vital – Fault tolerance is a requirement • The raised-floor switched network (= system backplane) is a potential bottleneck – But if you have a few times $10M then a Cisco CSR-1 can provide about 10,000 non blocking 10Gbit Ethernet ports on a single switch fabric. Richard P. Mount, SLAC May 2005 7 Disks: It’s not just about capacity (1) Ed Grochowski: Hitachi Global Storage Technologies Richard P. Mount, SLAC May 2005 8 Disks: It’s not just about capacity (2) 10% CGR in speed Ed Grochowski: Hitachi Global Storage Technologies Richard P. Mount, SLAC May 2005 9 Disks: Gloom and Doom • Giora Tarnopolski (TarnoTek) – “I do not believe that there will be much shorter access times in the future” • Ed Grochowski (Hitachi GST) – “While rotation rates beyond 15K are possible in the future, these will likely occur at longer product time intervals” • BaBar reality: – Micro DST events are replicated about tenfold on disk – Millions of $$$ per year, and many months of delay, are spent on data reorganization to allow efficient access by thousands of concurrent jobs Richard P. Mount, SLAC May 2005 10 Technology Issues in Data Access • Latency • Speed/Bandwidth • (Cost) • (Reliabilty) Richard P. Mount, SLAC May 2005 11 Latency and Speed – Random Access Random-Access Storage Performance 10000 1000 100 Retreival Rate Mbytes/s 10 105 1 DRAM 675 MHz 0.1 0.01 Seagate 400GB SATA 0.001 0.0001 STK9940B 0.00001 0.000001 0.0000001 0.00000001 0.000000001 0 1 2 3 4 5 6 7 8 9 10 11 log10 (Obect Size Bytes) Richard P. Mount, SLAC May 2005 12 Latency and Speed – Random Access 10000 1000 100 DRAM 675 MHz Retrieval Rate MBytes/s 10 Seagate 400GB SATA 1 0.1 STK9940B 0.01 0.001 RAM 10 years ago 0.0001 0.00001 Disk 10 years ago 0.000001 Tape 10 years ago 0.0000001 0.00000001 0.000000001 0 1 2 3 4 5 6 7 8 9 10 11 log10 (Object Size Bytes) Richard P. Mount, SLAC May 2005 13 Death to Disks • 105 latency gap with respect to memory will be intolerable, eventually even in consumer applications; • Storage-class memory is in development (see Jai Menon’s talk at CHEP 2004); • In the meantime we can use DRAM or even Flash memory in latency-critical or throughput-critical applications; • For example, May 23, 2005 news item: – “Samsung develops flash-based 'disk' for PCs” • “It uses memory chips instead of a mechanical recording system” • http://www.computerworld.com/hardwaretopics/storage/story/0,10801,1 01946,00.html?source=NLT_AM&nid=101946 • Market forces are not yet aligned with scientific needs for massive, random access storage-class memory. Richard P. Mount, SLAC May 2005 14 Storage-Class Memory Architecture: SLAC Strategy (PetaCache) • • • • There is significant commercial interest in architectures including massive data-cache memory But: from interest to delivery will take 3-4 years And: applications will take time to adapt not just codes, but their whole approach to computing, to exploit the new randomaccess architecture Hence: two phases 1. Development phase (years 1,2,3) – – – – – – 2. Commodity hardware taken to its limits BaBar as principal user, adapting existing data-access software to exploit the configuration BaBar/SLAC contribution to hardware and manpower Publicize results Encourage other users Begin collaboration with industry to design the leadership-class machine Operational Facility (years 3,4,5) – – – New architecture Strong industrial collaboration Wide applicability Richard P. Mount, SLAC May 2005 15 Development Machine Deployment – Currently Funded Cisco Switch Clients up to 2000 Nodes, each 2 CPU, 2 GB memory Linux Data-Servers 64-128 Nodes, each Sun V20z, 2 Opteron CPU, 16 GB memory Up to 2TB total Memory Solaris Cisco Switch Richard P. Mount, SLAC May 2005 16 Development Machine Deployment – Possible Next Step Cisco Switch Clients up to 2000 Nodes, each 2 CPU, 2 GB memory Linux Data-Servers 80 Nodes, each 8 Opteron CPU, 128 GB memory Up to 10TB total Memory Solaris Cisco Switch Richard P. Mount, SLAC May 2005 17 Scalable Object-Serving Software Example • Xrootd (Andy Hanushevsky/SLAC) – – – – – – Optimized for read-only access Contributes ~29 microseconds to server latency Make 1000s of servers transparent to user code Load balancing Automatic staging from tape Failure recovery • Can allow BaBar to start getting benefit from a new data-access architecture within months without changes to user code • Minimizes impact of hundreds of separate address spaces in the data-cache memory Richard P. Mount, SLAC May 2005 18 Summary • Moore’s Law (at least the generalized version) is alive and well for CPU throughput and disk capacity; • Moore’s Law seems dead for single-threaded CPU power; • Moore’s Law never applied to random-access to data; • At constant cost, computing is getting hotter! • Disks are now playing the same role in HEP that tapes were in 1990 (i.e. they are not random-access devices); • Prepare for a random-access future; • Prepare for a 10,000 to 100,000 box future; • Scalability and fault tolerance are the challenges we must address. Richard P. Mount, SLAC May 2005 19