Towards a Data Cauldron

Microsoft Research Faculty Summit 2008
Ian Foster
Computation Institute
University of Chicago & Argonne National Laboratory

"If you want to build a ship, don't drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea."
Antoine de Saint-Exupéry

Folker Meyer, Genome Sequencing vs. Moore's Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.

Data in
"No limits"
- Storage
- Computing
- Format
- Program
Programs & rules in
Results out
Allowing for
- Versioning
- Provenance
- Collaboration
- Annotation

- having the interior immediately accessible
- relatively free of obstructions to sight, movement, or internal arrangement
- generous, liberal, or bounteous
- in operation; live
- readily admitting new members
- not constipated

Rules
Parallel programs
Workflows
Swift, MapReduce, R, Dryad, MatLab, SQL, Octave, BPEL, SCFL

Virtualization: Run any program, store any data
Indexing: Automated maintenance
Provisioning: Policy-driven allocation of resources to competing demands

Data
Transform, Annotate, Search, Add to, Tag, Visualize, Discover, Extend, Group, Share

Astrophysics
Cognitive science
East Asian studies
Economics
Environmental science
Epidemiology
Genomic medicine
Neuroscience
Political science
Sociology
Solid state physics

1000 TB tape backup
500 TB reliable storage (data, metadata)
Diverse data sources
Data ingest
PADS: 180 TB, 180 GB/s, 17 Top/s analysis
Dynamic provisioning
Parallel analysis
Remote access
Diverse users
Offload to remote data centers

CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU yr
Average task time: 667 sec
Relative efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%, Overall: 78.3%
[Plot; axis: Time (secs)]
Ioan Raicu, Zhao Zhang, Mike Wilde

HPC systems software (MPICH, PVFS, ZeptOS)
Collaborative data tagging (GLOSS)
Data integration (XDTM)
HPC data analytics and visualization
Loosely coupled parallelism (Swift, Hadoop)
Dynamic provisioning (Falkon)
Service authoring (Introduce, caGrid, gRAVI)
Provenance recording and query (Swift)
Service composition and workflow (Taverna)
Virtualization management (Workspace Service)
Distributed data management (GridFTP, etc.)

Functional MRI
Ben Clifford, Mihael Hategan, Mike Wilde, Yong Zhao

Diverse experimental data & metadata
Browse data, Search, Content preview, Transcode, Download, Analyze
SIDgrid
Bennett Bertenthal, Mike Papka, Mike Wilde … and others
TeraGrid, PADS, …

Data in
"No limits"
- Storage
- Computing
- Format
- Program
Programs & rules in
Results out
Allowing for
- Versioning
- Provenance
- Collaboration
- Annotation
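
The run statistics quoted earlier in the deck (118784 CPU cores, 7257 sec elapsed, 21.43 CPU yr of compute time) can be cross-checked with a little arithmetic. The sketch below is not from the talk: it assumes that "overall utilization" means compute time divided by the core-time available over the whole run, and that a CPU year is 365 days. The function name and constant are hypothetical, chosen for illustration only.

```python
# Assumption: one CPU year is taken as 365 days of one core.
SECONDS_PER_YEAR = 365 * 24 * 3600


def overall_utilization(cores, elapsed_sec, compute_cpu_years):
    """Return the fraction of available core-time spent computing.

    Assumed definition (not stated in the slides):
    utilization = compute time / (cores * elapsed time).
    """
    available_core_sec = cores * elapsed_sec
    used_core_sec = compute_cpu_years * SECONDS_PER_YEAR
    return used_core_sec / available_core_sec


if __name__ == "__main__":
    # Figures as quoted on the slide.
    u = overall_utilization(cores=118784, elapsed_sec=7257, compute_cpu_years=21.43)
    print(f"Estimated overall utilization: {u:.1%}")
```

Under these assumptions the estimate comes out at roughly 78%, in line with the quoted overall figure of 78.3%; the small gap would be expected from rounding in the quoted numbers or from a slightly different definition on the slide.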
