IBM Research
Analytics in the cloud: Do we really need to reinvent the storage stack?
R. Ananthanarayanan, Karan Gupta, Prashant Pandey, Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, Renu Tewari
[Title image courtesy NASA / ESA]
Storage Systems Research © 2008 IBM Corporation

Data-Intensive Internet-Scale Applications
Typical applications:
• Web-scale search, indexing, mining
• Genomic sequencing
• Brain-scale network simulations

Key requirements:
• Scale to very large data sets
• Platform needs to scale to thousands of nodes
• Built of commodity hardware for cost efficiency
• Tolerate failures during "every" job execution
• Support data shipping to reduce network requirements

MapReduce for Analytics
• MapReduce is emerging as a model for large-scale analytics applications
• Important design goals are extreme scalability and fault tolerance
• The storage layer is separated and has well-defined requirements
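The map/shuffle/reduce model the slides refer to can be illustrated with a minimal single-process word count. This is a sketch for orientation only: the function names are invented for this example, and none of it is Hadoop's (or GPFS's) actual implementation.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Map: emit (key, value) pairs; here, (word, 1) for each word in a line.
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key; here, sum the counts.
    return key, sum(values)

def mapreduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = mapreduce(["to be or not", "to be"])
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In a real deployment the map and reduce tasks run in parallel across many nodes, and the storage layer's job (the focus of these slides) is to feed each map task its chunk of input at high bandwidth, ideally from a local disk.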
[Figure: MapReduce data flow. Image source: http://developer.yahoo.com/hadoop/tutorial/module1.html]

MapReduce Data-Store Requirements
• Provide a hierarchical namespace with directories and files
• Allow applications to read and write data to files
• Protect data availability and reliability in the face of node and disk failures
• Provide high-bandwidth access to reasonably sized chunks of data for all compute nodes (not necessarily all-to-all)
• Provide chunk access-affinity information to allow proper scheduling of tasks

Data Store Options: Cluster FS vs. Specialized FS

                                    Specialized FS    Cluster FS
  Scaling                           Yes               Yes
  Commodity-hardware compliant      Yes               Yes
  Traditional application support   No                Yes
  Mature management tools           No                Yes
  Tuned for Hadoop                  Yes               No

Modifying a Cluster Filesystem for MapReduce: GPFS
• Mature filesystem with many large production installations
• High performance, highly scalable
• Reliability features focused on SAN environments; supports rack-aware 2-way replication
• POSIX interface
• Supports shared-disk (SAN) and shared-nothing setups
• Not optimized for MapReduce workloads: does not expose data-location information, and the largest block size is 16 MB

Changes for Hadoop:
• Make blocks bigger
• Let the platform know where the big blocks are
• Optimize replication and placement to reduce network usage

Key Change: Metablocks
The scheme works for
many workloads:
• Small FS blocks (e.g., 512 KB)
• Large application blocks (e.g., 64 MB)

New allocation scheme:
• Metablock-size granularity for wide striping
• The block map operates on the large metablock size
• All FS operations operate on the regular small block size
• A new allocation policy groups FS blocks into an application "metablock"
• Additional changes provide block-location information and "write affinity"

MapReduce Performance
[Figure: run time (sec) of Grep, TeraGen, and Terasort for GPFS and HDFS, each at rack=1 and rack=2.]
Test bed:
• Hadoop version 0.18.1, 16 nodes, 160 GB data, replication factor = 2
• GPFS version pre-3.3
• iDataPlex: 42 nodes; 8 cores, 8 GB RAM, and 4+1 disks per node

Impact on Traditional Workloads
[Figure: normalized random and sequential-read performance for 512 KB blocks, 512 KB blocks with metablocks, and 16 MB blocks.]
Test bed: iDataPlex (42 nodes; 8 cores, 8 GB RAM, 4+1 disks per node), GPFS version pre-3.3, Bonnie filesystem benchmark.

Things That Didn't Work
• Large filesystem block size
• Turning off prefetching
• Aligning records to block boundaries
[Figure: run time (sec) by filesystem configuration (GPFS 16 MB variants vs. HDFS); normalized random and sequential-read performance for 512 KB, 16 MB, and 16 MB with prefetching off.]

Advantages of Traditional Filesystems
• Traditional filesystems have solved many hard problems, like access control, quotas, snapshots, ...
• They allow traditional and MapReduce applications to share the same input data
• They let users exploit filesystem tools and scripts built for "regular" filesystems
• They allow re-use of backup/archive solutions built around particular filesystems

Mixed Analytics Pipelines
Using a MapReduce-specific filesystem (e.g.
HDFS), data must cross filesystem boundaries at each stage:
Crawl → Load → Analyze → Output → Serve
(the crawler writes to a traditional filesystem, the data is loaded into the MapReduce filesystem for analysis, and the output is copied back to a traditional filesystem for serving)

Using a general-purpose filesystem (e.g., GPFS), the same pipeline runs in place:
Crawl → Analyze → Serve

Conclusion
• MapReduce platforms can use traditional filesystems without loss of performance.
• There are important reasons why traditional filesystems are attractive to users of MapReduce.
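The metablock scheme from the GPFS changes discussed earlier can be sketched numerically: allocation and the block map work at a large metablock granularity, while ordinary filesystem operations still address the small FS block size. The sizes below mirror the slides (512 KB FS blocks, 64 MB application blocks); the helper names are hypothetical and do not reflect GPFS internals.

```python
FS_BLOCK = 512 * 1024            # small FS block: 512 KB (as on the slides)
METABLOCK = 64 * 1024 * 1024     # large application block: 64 MB
BLOCKS_PER_META = METABLOCK // FS_BLOCK   # 128 FS blocks per metablock

def metablock_of(offset):
    # The allocator and block map resolve file offsets at metablock
    # granularity, so every FS block of one metablock lands on the same
    # node ("write affinity") and its location can be reported to the
    # MapReduce scheduler.
    return offset // METABLOCK

def fs_block_of(offset):
    # Regular FS operations still address the small block inside a metablock.
    return metablock_of(offset), (offset % METABLOCK) // FS_BLOCK

# Byte offset 70 MB falls in metablock 1 (the second 64 MB region),
# at FS block 12 within it (6 MB / 512 KB).
print(fs_block_of(70 * 1024 * 1024))   # -> (1, 12)
```

This split is why the approach keeps traditional workloads fast: random I/O still sees a 512 KB block, while Hadoop sees 64 MB chunks whose node locations are exposed for scheduling.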