Connecting HPIO Capabilities with Domain-Specific Needs
Rob Ross
MCS Division, Argonne National Laboratory
[email protected]

I/O in an HPC system
[Diagram: clients running applications (100s-1000s) pass through I/O system software and a storage or system network to I/O devices or servers (10s-100s), which run their own I/O system software over the storage hardware.]
• Many cooperating tasks sharing I/O resources
• Relying on parallelism of hardware and software for performance

Motivation
• HPC applications increasingly rely on I/O subsystems
  – Large input datasets, checkpointing, visualization
• Applications continue to be scaled, putting more pressure on I/O subsystems
• Application programmers desire interfaces that match the domain
  – Multidimensional arrays, typed data, portable formats
• Two issues to be resolved by the I/O system
  – Very high performance requirements
  – Gap between application abstractions and hardware abstractions

I/O history in a nutshell
• I/O hardware has lagged behind, and continues to lag behind, all other system components
• I/O software has matured more slowly than other components (e.g., message-passing libraries)
  – Parallel file systems (PFSs) are not enough
• This combination has led to poor I/O performance on most HPC platforms
• Only in a few instances have I/O libraries presented abstractions matching application needs

Evolution of I/O software
(Not to scale or necessarily in the right order…)
• Goal is convenience and performance for HPC
• Slowly, capabilities have emerged
• Parallel high-level libraries bring together good abstractions and performance, maybe

I/O software stacks
[Stack diagram: Application → High-level I/O Library → MPI-IO Library → Parallel File System → I/O Hardware]
• Myriad I/O components are converging into layered solutions
• Insulate applications from eccentric MPI-IO and PFS details
• Maintain (most of) I/O performance
  – Some high-level library features do cost performance

Role of parallel file systems
• Manage storage hardware
  – Lots of independent components
  – Must present a single view
  – Provide fault tolerance
• Focus on concurrent, independent access
  – Difficult to pass knowledge of collectives to the PFS
• Scale to many clients
  – Probably means removing all shared state
  – Lock-free approaches
• Publish an interface that MPI-IO can use effectively
  – Not POSIX

Role of MPI-IO implementations
• Facilitate concurrent access by groups of processes
  – Understanding of the programming model
• Provide hooks for tuning the PFS
  – MPI_Info as an interface to PFS tuning parameters (sketched in the code example below)
• Expose a fairly generic interface
  – Good for building other libraries
• Leverage MPI-IO semantics
  – Aggregation of I/O operations
• Hide unimportant details of the parallel file system

Role of high-level libraries
• Provide an appropriate abstraction for the domain
  – Multidimensional, typed datasets
  – Attributes
  – Consistency semantics that match usage
  – Portable format
• Maintain the scalability of MPI-IO
  – Map data abstractions to datatypes
  – Encourage collective I/O
• Implement optimizations that MPI-IO cannot (e.g., header caching)
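The MPI_Info hints and collective operations mentioned above can be made concrete with a short example. The following is a minimal sketch, not taken from the talk: the file name, buffer sizes, and hint values are assumptions, and the hint keys shown (romio_cb_write, cb_buffer_size) are ROMIO conventions that a particular MPI-IO implementation may or may not honor.

/* Minimal sketch: collective MPI-IO write of one process's slab of a
 * shared 1-D array, with MPI_Info hints passed down toward the file
 * system layer.  File name, sizes, and hint values are illustrative;
 * error checking is omitted for brevity. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    const int local_n = 1024;              /* doubles per process (example) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *buf = malloc(local_n * sizeof(double));
    for (int i = 0; i < local_n; i++)
        buf[i] = rank;                     /* dummy data */

    /* Hints: keys shown are ROMIO conventions; unknown keys are ignored. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");   /* use collective buffering */
    MPI_Info_set(info, "cb_buffer_size", "16777216"); /* 16 MB aggregation buffer */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Each process writes a contiguous slab at its own offset; the
     * collective call lets the implementation see all requests at once
     * and aggregate them before touching the file system. */
    MPI_Offset offset = (MPI_Offset)rank * local_n * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, local_n, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    free(buf);
    MPI_Finalize();
    return 0;
}

The collective write is where the "aggregation of I/O operations" leverage described above comes from: because every process participates in the same call, the MPI-IO layer can reorganize and combine the requests rather than handing the file system many small, independent writes.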
Example: ASCI/Alliance FLASH
[Stack diagram: ASCI FLASH → Parallel netCDF → IBM MPI-IO → GPFS → Storage]
• FLASH is an astrophysics simulation code from the ASCI/Alliance Center for Astrophysical Thermonuclear Flashes
• Fluid dynamics code using adaptive mesh refinement (AMR)
• Runs on systems with thousands of nodes
• Three layers of I/O software between the application and the I/O hardware
• Example system: ASCI White Frost

FLASH data and I/O
[Diagram: an AMR block with a perimeter of ghost cells; each element holds 24 variables.]
• 3D AMR blocks
  – 16³ elements per block
  – 24 variables per element
  – Perimeter of ghost cells
• Checkpoint writes all variables
  – No ghost cells
  – One variable at a time (noncontiguous)
• Visualization output is a subset of variables
• Portability of data desirable
  – Postprocessing on a separate platform

Tying it all together
[Chart: FLASH I/O Benchmark results for HDF5 and PnetCDF at 16-256 processors.]
• FLASH tells PnetCDF that all of its processes want to write out regions of variables and store them in a portable format
• PnetCDF performs data conversion and calls the appropriate MPI-IO collectives (see the PnetCDF sketch at the end)
• MPI-IO optimizes writing of data to GPFS using data shipping and I/O agents
• GPFS handles moving data from agents to storage resources, storing the data, and maintaining file metadata
• In this case, PnetCDF is a better match to the application

Future of I/O system software
[Stack diagram: Application → Domain-Specific I/O Library → High-level I/O Library → MPI-IO Library → Parallel File System → I/O Hardware]
• More layers in the I/O stack
  – Better match the application's view of data
  – Map this view to PnetCDF or similar
  – Maintain collectives and rich descriptions
• More high-level libraries using MPI-IO
  – PnetCDF and HDF5 are great starts
  – These should be considered mandatory I/O system software on our machines
• Focusing component implementations on their roles
  – Less general-purpose file systems
    - Scalability and APIs of existing PFSs aren't up to the workloads and scales
  – More aggressive MPI-IO implementations
    - Lots can be done if we're not busy working around broken PFSs
  – More aggressive high-level library optimization
    - They know the most about what is going on

Future
• Creation and adoption of parallel high-level I/O libraries should make things easier for everyone
  – New domains may need new libraries or new middleware
  – High-level libraries that target database backends seem obvious; probably someone else is already doing this?
• Further evolution of components is necessary to get the best performance
  – Tuning/extending file systems for HPC (e.g., user metadata storage, better APIs)
• Aggregation, collective I/O, and leveraging semantics are even more important at larger scale
  – Reliability too, especially for kernel FS components
• Potential hardware changes (MEMS, active disks) are complementary
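To make the FLASH/PnetCDF pattern from the "Tying it all together" slide concrete, here is a minimal sketch of the collective define-and-write flow. It is not the FLASH I/O code: the file name, variable name, dimensions, and row-block decomposition are illustrative assumptions; only the PnetCDF calls themselves (ncmpi_create, ncmpi_def_dim, ncmpi_def_var, ncmpi_enddef, ncmpi_put_vara_double_all, ncmpi_close) are the library's real API.

/* Minimal PnetCDF sketch (not the FLASH code): processes collectively
 * define one 2-D double variable in a portable netCDF file, and each
 * writes its own block of rows with a collective call.  Names, sizes,
 * and the decomposition are illustrative; error checking is omitted. */
#include <mpi.h>
#include <pnetcdf.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimids[2], varid;
    const MPI_Offset NY_PER_PROC = 4, NX = 8;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Create the file collectively; an MPI_Info object could carry
     * file-system hints here, just as with plain MPI-IO. */
    ncmpi_create(MPI_COMM_WORLD, "flashlike.nc", NC_CLOBBER,
                 MPI_INFO_NULL, &ncid);

    /* Define the shared, typed, multidimensional dataset (define mode). */
    ncmpi_def_dim(ncid, "y", NY_PER_PROC * nprocs, &dimids[0]);
    ncmpi_def_dim(ncid, "x", NX, &dimids[1]);
    ncmpi_def_var(ncid, "density", NC_DOUBLE, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* Each process fills its own block of rows with dummy data. */
    double *buf = malloc(NY_PER_PROC * NX * sizeof(double));
    for (MPI_Offset i = 0; i < NY_PER_PROC * NX; i++)
        buf[i] = rank;

    /* Collective write of this process's region; PnetCDF converts the
     * data to the portable file format and issues MPI-IO collectives. */
    MPI_Offset start[2] = { rank * NY_PER_PROC, 0 };
    MPI_Offset count[2] = { NY_PER_PROC, NX };
    ncmpi_put_vara_double_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
    free(buf);
    MPI_Finalize();
    return 0;
}

Because the write is collective and each process's region is described with start/count vectors rather than a stream of explicit offsets, the library can hand MPI-IO enough information to aggregate the requests, which is exactly the scalability property the talk argues high-level libraries must preserve.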