Connecting HPIO Capabilities
with Domain Specific Needs
Rob Ross
MCS Division
Argonne National Laboratory
I/O in an HPC system
(Figure: 100s-1000s of clients running applications connect over a storage or system network to 10s-100s of I/O devices or servers; I/O system software sits between the application and the storage hardware.)
• Many cooperating tasks sharing I/O resources
• Relying on parallelism of hardware and software
for performance
Motivation
• HPC applications increasingly rely on I/O
subsystems
– Large input datasets, checkpointing, visualization
• Applications continue to be scaled, putting more
pressure on I/O subsystems
• Application programmers desire interfaces that
match the domain
– Multidimensional arrays, typed data, portable formats
• Two issues to be resolved by I/O system
– Very high performance requirements
– Gap between app. abstractions and HW abstractions
I/O history in a nutshell
• I/O hardware has lagged behind and
continues to lag behind all other system
components
• I/O software has matured more slowly than
other components (e.g. message passing
libraries)
– Parallel file systems (PFSs) are not enough
• This combination has led to poor I/O
performance on most HPC platforms
• Only in a few instances have I/O libraries
presented abstractions matching application
needs
Evolution of I/O software
(Not to scale or necessarily in the right order…)
• Goal is convenience and performance for HPC
• Slowly capabilities have emerged
• Parallel high-level libraries bring together good
abstractions and performance, maybe
I/O software stacks
(Figure: I/O software stack: Application → High-level I/O Library → MPI-IO Library → Parallel File System → I/O Hardware)
• Myriad I/O components are converging into
layered solutions
• Insulate applications from eccentric MPI-IO
and PFS details
• Maintain (most of) I/O performance
– Some HLL features do cost performance
Role of parallel file systems
• Manage storage hardware
– Lots of independent components
– Must present a single view
– Provide fault tolerance
• Focus on concurrent, independent access
– Difficult to pass knowledge of collectives to PFS
• Scale to many clients
– Probably means removing all shared state
– Lock-free approaches
• Publish an interface that MPI-IO can use
effectively
– Not POSIX
Role of MPI-IO implementations
• Facilitate concurrent access by groups of
processes
– Understanding of the programming model
• Provide hooks for tuning PFS
– MPI_Info as the interface to PFS tuning parameters (see the sketch below)
• Expose a fairly generic interface
– Good for building other libraries
• Leverage MPI-IO semantics
– Aggregation of I/O operations
• Hide unimportant details of parallel file
system
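A minimal sketch of these roles in code, assuming a ROMIO-style MPI-IO implementation: tuning parameters flow through MPI_Info (the hint names and values below, such as striping_factor and cb_buffer_size, are illustrative and implementation-dependent; unsupported hints are simply ignored), and a collective write exposes the whole group's access pattern so the library can aggregate it before it reaches the PFS.

#include <mpi.h>

/* Sketch: pass PFS tuning hints via MPI_Info and use a collective
 * write so MPI-IO can aggregate requests across processes.
 * Hint names/values are illustrative and implementation-dependent. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "16");     /* stripe over 16 servers  */
    MPI_Info_set(info, "cb_buffer_size", "4194304"); /* 4 MB collective buffers */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Each process writes one contiguous block at its own offset;
     * the collective call lets the implementation reorganize the
     * combined request before it reaches the file system. */
    int buf[1024];
    for (int i = 0; i < 1024; i++) buf[i] = rank;
    MPI_Offset offset = (MPI_Offset)rank * sizeof(buf);
    MPI_File_write_at_all(fh, offset, buf, 1024, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

Because both the hints and the collective request are visible to the MPI-IO layer, it can reorganize the access (e.g. two-phase aggregation through a subset of processes) without any change to application code.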
Role of high-level libraries
• Provide an appropriate abstraction for the
domain
– Multidimensional, typed datasets
– Attributes
– Consistency semantics that match usage
– Portable format
• Maintain the scalability of MPI-IO
– Map data abstractions to datatypes
– Encourage collective I/O
• Implement optimizations that MPI-IO cannot
(e.g. header caching)
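A sketch of what such an abstraction looks like at the API level, using Parallel netCDF; the dimension, variable, and attribute names are invented for the example and error checking is omitted.

#include <mpi.h>
#include <pnetcdf.h>

/* Sketch: a high-level library presents multidimensional, typed,
 * annotated datasets in a portable format. The names used here
 * ("x", "temperature", "units") are illustrative only. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int ncid, dimids[3], varid;

    /* Collectively create a portable, self-describing file. */
    ncmpi_create(MPI_COMM_WORLD, "fields.nc", NC_CLOBBER,
                 MPI_INFO_NULL, &ncid);

    /* A 3D double-precision dataset plus an attribute describing it. */
    ncmpi_def_dim(ncid, "x", 256, &dimids[0]);
    ncmpi_def_dim(ncid, "y", 256, &dimids[1]);
    ncmpi_def_dim(ncid, "z", 256, &dimids[2]);
    ncmpi_def_var(ncid, "temperature", NC_DOUBLE, 3, dimids, &varid);
    ncmpi_put_att_text(ncid, varid, "units", 6, "kelvin");
    ncmpi_enddef(ncid);

    /* The actual writes (not shown) go through collective calls such as
     * ncmpi_put_vara_double_all, which the library maps onto MPI-IO
     * collectives and MPI datatypes. */

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}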
Example: ASCI/Alliance FLASH
(Figure: I/O stack for this example: ASCI FLASH → Parallel netCDF → IBM MPI-IO → GPFS → Storage)
• FLASH is an astrophysics simulation code from the ASCI/Alliance Center for Astrophysical Thermonuclear Flashes
• Fluid dynamics code using adaptive mesh
refinement (AMR)
• Runs on systems with thousands of nodes
• Three layers of I/O software between the
application and the I/O hardware
• Example system: ASCI White Frost
FLASH data and I/O
• 3D AMR blocks
– 16³ elements per block
– 24 variables per element
– Perimeter of ghost cells
• Checkpoint writes all variables
– no ghost cells
– one variable at a time (noncontiguous; see the sketch below)
• Visualization output is a subset of
variables
• Portability of data desirable
– Postprocessing on separate platform
(Figure: AMR block showing the ghost-cell perimeter; each element holds 24 variables)
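A sketch of how one variable of one block could be written without its ghost cells, using an MPI subarray datatype to describe the noncontiguous interior region; the ghost-cell width of 1, the C ordering, and the file offset are illustrative assumptions rather than FLASH's actual parameters.

#include <mpi.h>

/* Sketch: describe the interior (non-ghost) elements of one block with
 * a subarray datatype so a single variable can be written noncontiguously,
 * without hand-packing. Ghost-cell width is assumed to be 1 per side. */
void write_variable(MPI_File fh, MPI_Offset var_offset, const double *block)
{
    const int ng = 1;                                     /* assumed ghost width */
    int sizes[3]    = {16 + 2*ng, 16 + 2*ng, 16 + 2*ng};  /* allocated block     */
    int subsizes[3] = {16, 16, 16};                       /* interior elements   */
    int starts[3]   = {ng, ng, ng};                       /* skip the perimeter  */

    MPI_Datatype interior;
    MPI_Type_create_subarray(3, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &interior);
    MPI_Type_commit(&interior);

    /* One noncontiguous write per variable; the collective form lets
     * MPI-IO aggregate the same variable across all processes. */
    MPI_File_write_at_all(fh, var_offset, block, 1, interior,
                          MPI_STATUS_IGNORE);
    MPI_Type_free(&interior);
}

Passing the derived datatype down to MPI-IO preserves the noncontiguous structure instead of flattening it into many small independent writes.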
Tying it all together
• FLASH tells PnetCDF that all its processes want to write out regions of variables and store them in a portable format
(Chart: FLASH I/O Benchmark write performance at 16, 32, 64, 128, and 256 processors, comparing HDF5 and PnetCDF)
• PnetCDF performs data conversion and calls appropriate
MPI-IO collectives
• MPI-IO optimizes writing of data to GPFS using data
shipping, I/O agents
• GPFS handles moving data from agents to storage
resources, storing the data, and maintaining file metadata
• In this case, PnetCDF is a better match to the application
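A hedged sketch of the first step in that chain, assuming each process owns one 16×16×16 slab of a shared 3D variable (the variable, its shape, and the decomposition are invented for the example): the start/count description plus the collective call are what let PnetCDF, and MPI-IO beneath it, see the whole access pattern at once.

#include <mpi.h>
#include <pnetcdf.h>
#include <stdlib.h>

/* Sketch: every process writes its own region of a shared variable
 * through one collective call. Assumes a variable of shape
 * (nprocs*16) x 16 x 16, decomposed along the first dimension. */
void write_region(int ncid, int varid, int rank)
{
    MPI_Offset start[3] = { (MPI_Offset)rank * 16, 0, 0 };
    MPI_Offset count[3] = { 16, 16, 16 };

    double *data = malloc(16 * 16 * 16 * sizeof(double));
    for (int i = 0; i < 16 * 16 * 16; i++)
        data[i] = (double)rank;

    /* Collective: PnetCDF converts the data to the portable file format
     * and hands the combined request to MPI-IO as a collective write. */
    ncmpi_put_vara_double_all(ncid, varid, start, count, data);
    free(data);
}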
Future of I/O system software
• More layers in the I/O stack
– Better match application view of data
– Mapping this view to PnetCDF or similar
– Maintaining collectives, rich descriptions
(Figure: extended I/O software stack: Application → Domain Specific I/O Library → High-level I/O Library → MPI-IO Library → Parallel File System → I/O Hardware)
• More high-level libraries using MPI-IO
– PnetCDF, HDF5 are great starts
– These should be considered mandatory I/O system software on our machines
• Focusing component implementations on their roles
– Less general-purpose file systems
- Scalability and APIs of existing PFSs aren’t up to workloads and scales
– More aggressive MPI-IO implementations
- Lots can be done if we’re not busy working around broken PFSs
– More aggressive high-level library optimization
- They know the most about what is going on
Future
• Creation and adoption of parallel high-level I/O
libraries should make things easier for everyone
– New domains may need new libraries or new middleware
– HLLs that target database backends seem obvious; someone else is probably already doing this
• Further evolution of components necessary to get
best performance
– Tuning/extending file systems for HPC (e.g. user metadata
storage, better APIs)
• Aggregation, collective I/O, and leveraging semantics
are even more important at larger scale
– Reliability too, especially for kernel FS components
• Potential HW changes (MEMS, active disk) are
complementary