Global Terascale Data Management
Scott Atchley, Micah Beck, Terry Moore
Vanderbilt ACCRE Workshop
7 April 2005
Talk Outline
» What is Terascale Data Management?
» Accessing Globally Distributed Data
» What is Logistical Networking?
» Performing Globally Distributed Computation
Terascale Data Management
» Scientific Data Sets are Growing Rapidly
• Growing from 100s of GBs to 10s of TBs
• Terascale Supernova Initiative (ORNL)
• Princeton Plasma Physics Lab fusion simulation
» Distributed resources
• Simulate on supercomputer at ORNL or NERSC
• Analyze/Visualize at ORNL, NCSU, PPPL, etc.
» Distributed Collaborators
• TSI has members at ORNL, NCSU, SUNYSB,
UCSD, FSU, FAU, UC-Davis, others
Terascale Data Management
» Scientists use NetCDF or HDF to manage datasets
• Scientist has a logical view of data as variables, dimensions, metadata, etc. (a minimal API sketch follows the diagram below)
• NetCDF or HDF manages serialization of data to a local file
• Scientist can query for specific variables, metadata, etc.
• Requires local file access to use (local disk, NFS, NAS, SAN, etc.)
• Datasets can be too large for the scientist to store locally; if so, the data can't even be browsed
NetCDF
[Diagram: the scientist generates data and uses NetCDF (Network Common Data Form) to organize it; the resulting NetCDF file stored on disk contains a header followed by Variable1, Variable2, and Variable3.]
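To make the logical view above concrete, here is a minimal sketch using the standard NetCDF C API; the file, dimension, and variable names are hypothetical, and error checking is omitted for brevity.

#include <stdio.h>
#include <netcdf.h>

int main(void) {
    int ncid, t_dim, temp_var;
    double temps[4] = {1.0, 2.0, 3.0, 4.0};

    /* Create a file; NetCDF handles the on-disk serialization. */
    nc_create("run01.nc", NC_CLOBBER, &ncid);

    /* Define the logical structure: a dimension and a variable over it. */
    nc_def_dim(ncid, "time", 4, &t_dim);
    nc_def_var(ncid, "temperature", NC_DOUBLE, 1, &t_dim, &temp_var);

    /* Attach metadata as an attribute. */
    nc_put_att_text(ncid, temp_var, "units", 6, "kelvin");

    nc_enddef(ncid);                          /* leave define mode */
    nc_put_var_double(ncid, temp_var, temps); /* write the data */
    nc_close(ncid);
    return 0;
}

The scientist only ever names variables and dimensions; the library decides how the header and data land in the file.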
Accessing Globally Distributed Data
» Collaboration requires a shared file system
• Nearly impossible over the wide-area network
» Other methods include HTTP, FTP, GridFTP, scp
• Can be painfully slow for transferring large data
• Can be cumbersome to set up/manage accounts
» How to manage replication?
• More copies can improve performance, but…
• If more than one copy exists, how do others learn of it?
• How to choose which replica to use?
What is Logistical Networking?
» An architecture for globally distributed, sharable resources
• Storage
• Computation
» A globally deployed infrastructure
• Over 400 storage servers in 30 countries
• Serving 30 TB of storage
» Open-source client tools and libraries
• Linux, Solaris, MacOSX, Windows, AIX, others
• Some Java tools
Logistical Networking
[Diagram: the Logistical Networking stack, from Applications at the top through the Logistical Runtime System, L-Bone, and exNode, down to IBP and the Physical Layer.]
» Modeled on the Internet Protocol
» IBP provides generic storage and generic computation
• Weak semantics
• Highly scalable
» L-Bone provides resource discovery
» exNode provides data aggregation (length and replication) and annotation (see the sketch below)
» LoRS provides fault tolerance, high performance, and security
» Multiple applications available
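As a rough illustration of what the exNode aggregates, consider a simplified C structure. This is not the actual exNode API (the real library serializes this information as XML with per-mapping metadata), and all names here are hypothetical.

/* Hypothetical, simplified view of an exNode. */
typedef struct {
    char *read_cap;   /* IBP capability (URL-like string) for reading */
    long  offset;     /* where this allocation maps into the file     */
    long  length;     /* how many bytes it covers                     */
} mapping_t;

typedef struct {
    long       total_length;  /* logical file length */
    int        num_mappings;
    mapping_t *mappings;      /* extents may overlap: overlapping
                                 extents express replication        */
} exnode_t;

The key idea is that length comes from concatenating extents and fault tolerance from overlapping them, all without the depots knowing anything about the file.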
Current Infrastructure Deployment
The public deployment includes 400 IBP depots in 30 countries serving 30 TB of storage (leveraging PlanetLab). There are also private deployments on the DOE, Brazilian, and Czech backbones.
Available LoRS Client Tools
Binaries for Windows and MacOSX. Source for Linux, Solaris, AIX, others.
LoDN - Web-based File System
Store files into the logistical network using Java upload/download tools. Manages exNode "warming" (lease renewal and migration). Provides per-account (single-user or group) as well as "world" permissions.
NetCDF/L
» Modified NetCDF that stores data in the logistical network (lors://)
» Uses libxio (Unix IO wrapper)
» Porting NetCDF required only 13 lines of code
» NetCDF 3.6 provides for >2 GB files (64-bit offsets)
» LoRS parameters are available via environment variables (usage sketch below)
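In principle, existing NetCDF client code then only needs to be handed a lors:// URL instead of a local path. A minimal sketch; the exact URL form shown is an assumption.

#include <stdio.h>
#include <netcdf.h>

int main(void) {
    int ncid;
    /* Open a dataset stored in the logistical network rather than on a
     * local file system; libxio intercepts the underlying Unix IO calls.
     * The lors:// URL shown is hypothetical. */
    if (nc_open("lors://depot.example.org/tsi/run01.nc", NC_NOWRITE, &ncid) != NC_NOERR) {
        fprintf(stderr, "could not open remote dataset\n");
        return 1;
    }
    /* ...query variables and metadata exactly as with a local file... */
    nc_close(ncid);
    return 0;
}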
[Screenshot: ncview using NetCDF/L]
Libxio
» Unix IO wrapper (open(), read(), write(), etc.)
» Developed in the Czech Republic for the Distributed Data Storage (DiDaS) project
» Port any Unix IO application by adding 12 lines (see the sketch below)
#ifdef HAVE_LIBXIO
#define open  xio_open
#define close xio_close
...
#endif
» DiDaS ported the transcode (video transcoding) and mplayer (video playback) apps using libxio
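A sketch of such a port, assuming libxio mirrors the Unix signatures; the slide names only xio_open and xio_close, so xio_read, xio_write, and the header name are assumptions.

#include <fcntl.h>
#include <unistd.h>

#ifdef HAVE_LIBXIO
#include <xio.h>            /* header name is an assumption */
#define open  xio_open
#define read  xio_read
#define write xio_write
#define close xio_close
#endif

/* The application's own IO code is unchanged; with HAVE_LIBXIO the
 * path may be a lors:// URL, otherwise it is a local file. */
int read_header(const char *path, char *buf, int len) {
    int fd = open(path, O_RDONLY);
    int n  = (fd < 0) ? -1 : (int) read(fd, buf, len);
    if (fd >= 0) close(fd);
    return n;
}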
High Performance Streaming for Large Scale Simulations
» Viraj Bhat, Scott Klasky, Scott Atchley, Micah Beck, Doug McCune, and Manish Parashar, "High Performance Threaded Data Streaming for Large Scale Simulations," in Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, PA, Nov 2004.
» Streamed data from NERSC to PPPL using LN
» Sent data from the previous timestep while computation proceeded on the next timestep (see the sketch below)
» Automatically adjusts to network latencies
• Adds threads when needed
» Imposed <3% overhead (compared to no IO)
» Writing to GPFS at NERSC imposed 3-10% overhead (depending on block size)
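A minimal sketch of the overlap idea, not the authors' implementation: one sender thread streams the previous timestep's buffer while the main loop computes the next one, using two buffers and per-buffer semaphores. send_to_depot() is a stand-in for the real LoRS/IBP transfer.

#include <pthread.h>
#include <semaphore.h>
#include <string.h>

#define STEPS 10
#define BUF_SIZE (1 << 20)

static char  bufs[2][BUF_SIZE];
static sem_t filled[2], empty[2];   /* per-buffer handoff semaphores */

/* Stand-ins for the real work. */
void compute_timestep(int step, char *buf) { memset(buf, step, BUF_SIZE); }
void send_to_depot(const char *buf)        { (void)buf; }

void *sender(void *arg) {
    (void)arg;
    for (int step = 0; step < STEPS; step++) {
        int i = step % 2;
        sem_wait(&filled[i]);      /* wait for timestep n's data      */
        send_to_depot(bufs[i]);    /* stream it while n+1 is computed */
        sem_post(&empty[i]);       /* buffer may be reused            */
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    for (int i = 0; i < 2; i++) {
        sem_init(&filled[i], 0, 0);
        sem_init(&empty[i], 0, 1);
    }
    pthread_create(&t, NULL, sender, NULL);
    for (int step = 0; step < STEPS; step++) {
        int i = step % 2;
        sem_wait(&empty[i]);              /* don't overwrite in-flight data */
        compute_timestep(step, bufs[i]);
        sem_post(&filled[i]);             /* hand timestep n to the sender  */
    }
    pthread_join(t, NULL);
    return 0;
}

The paper's system goes further, adding sender threads dynamically as latency demands; this sketch shows only the basic compute/send overlap.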
Failsafe Mechanisms for Data Streaming in Energy Fusion Simulation
[Diagram: data flows from the supercomputer nodes through depots on the simulation end and nearby depots to PPPL depots and postprocessing routines; replication and re-fetching failed transfers from GPFS/depots handle network failures, and a buffer-overflow signal triggers writing to GPFS or a simulation depot instead.]
LN in the Center for Plasma Edge Simulation
» Implementing workflow automation
• Reliable communication between stages
• Exposed mapping of parallel files
» Collaborative use of detailed simulation traces
• Terabytes per time step
• Accessed globally
• Distributed postprocessing and redistribution
» Visualization of partial datasets
• Hierarchy: workstation/cluster/disk cache/tape
• Access prediction and prestaging are vital
Performing Globally Distributed Computation
» Adding generic, restricted computing within the depot
» Side-effect-free programming only (no system calls or external libraries allowed, except malloc)
» Uses IBP capabilities to pass arguments
» Mobile code "oplets" run in a restricted environment (see the sketch below)
• C/compiler based
• Java byte code based
» Test applications include
• Text mining: parallel grep
• Medical visualization: brain fiber tracing
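A hedged sketch of what an oplet body might look like; the signature and the count-matches task are hypothetical, since the actual oplet interface passes inputs and outputs as IBP capabilities.

#include <stddef.h>
#include <string.h>

/* Hypothetical oplet body: pure computation over buffers the depot
 * supplies from IBP allocations. No system calls, no external
 * libraries; only malloc would be available, per the restricted
 * execution environment (this example needs no allocation at all). */
int count_matches(const char *data, size_t len,
                  const char *pattern, size_t plen) {
    int count = 0;
    if (plen == 0 || plen > len) return 0;
    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(data + i, pattern, plen) == 0)
            count++;
    return count;
}

Because the code is side-effect-free, the depot can run it safely and the result depends only on the input buffers, which makes replicated execution straightforward.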
Grid Computing for Distributed Viz: DTMRI Brain Scans (Jian Huang)
[Diagram: 1. send data, 2. request processing, 3. return results]
Transgrep
» Stores a 1.3 GB (web server) log file
» Data is striped and replicated on dozens of depots
» Transgrep breaks the job into uniform 10 MB blocks
» Has depots search within their blocks
» Depots return matches as well as partial first and last lines
» Client sorts results and searches lines that overlap block boundaries (see the sketch below)
» 15-20 times speedup versus local search
» Automatically handles slow or failed depots
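The boundary handling is the interesting part: a line that straddles a 10 MB block boundary is invisible to both depots, so the client glues each block's partial last line to the next block's partial first line and searches the joined string itself. A simplified sketch; the function name and sample lines are hypothetical.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Join block i's partial last line with block i+1's partial first
 * line and search the result; depots cannot see across the boundary. */
void check_boundary(const char *tail, const char *head, const char *pattern) {
    size_t n = strlen(tail) + strlen(head) + 1;
    char *joined = malloc(n);
    if (!joined) return;
    strcpy(joined, tail);
    strcat(joined, head);
    if (strstr(joined, pattern))
        printf("%s\n", joined);   /* a match that straddled the boundary */
    free(joined);
}

int main(void) {
    /* Hypothetical partial lines returned by two adjacent depots. */
    check_boundary("GET /index.ht", "ml HTTP/1.0 200", "index.html");
    return 0;
}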
Czech Republic LN Infrastructure
Brazil RNP LN Infrastructure
Proposed OneTenn Infrastructure
Logistical Networking First Steps
» Try LoDN by browsing as a guest (Java)
» Download the LoRS tools and try them out
» Download and use NetCDF/L
» Use libxio to port your Unix IO apps to use LN
» Run your own IBP depots (publicly or privately)
More information
http://loci.cs.utk.edu
[email protected]