Vertica to HDFS Capstone Project
Tharanga Gamaethige, Engineer, Data Management, Vertica
University of Pittsburgh
August 30th, 2013
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Agenda
• What is Vertica
• Bridge from Vertica to HDFS
• Success criteria
• Benefits to you

What Is Vertica
• Founded in 2005 by database researcher Michael Stonebraker and a small group of engineers
• Acquired by Hewlett-Packard in March 2011

What Is Vertica
• SQL database for real-time analytics
• Runs on x86 hardware
• MPP columnar architecture – scales to PBs!
• Reduced footprint via advanced compression
• Extensible analytics capabilities
• Easy to set up and use
• Elastic: grow/shrink as needed
• Extensive ecosystem of analytic tools
Speed, Scale, Simplicity

Bridge from Vertica to HDFS
[Diagram: Vertica database cluster and HDFS cluster]
• Use as a database-to-database export tool.
• Export data from Vertica tables into external targets, e.g. to HDFS.
• Extensible to facilitate different data formats, storage formats and data targets.

Bridge from Vertica to HDFS
[Diagram: Vertica database cluster and HDFS cluster]
Formatter > Tuples to Blocks
• Pipe delimited
• ORC file
• Etc.
Prism > Blocks to Blocks
• Zip
• TAR
• Etc.
Target > Blocks to Storage
• HDFS
• File system
• Etc.
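The three stages on the slide above (Formatter, Prism, Target) can be read as a chain of pluggable steps. The following is only a minimal sketch of that idea in Python, not the plugin's actual design or Vertica's API: every name here (`pipe_delimited_formatter`, `gzip_prism`, `BufferTarget`, `export`) is hypothetical, gzip stands in for the Zip/TAR prisms named on the slide, and an in-memory buffer stands in for an HDFS or file-system target.

```python
import gzip
from typing import Callable, Iterable, Iterator, Sequence

Row = Sequence[object]
Blocks = Iterable[bytes]


def pipe_delimited_formatter(rows: Iterable[Row], rows_per_block: int = 2) -> Iterator[bytes]:
    """Formatter stage (tuples to blocks): pipe-delimited text.

    Assumption: values contain no '|' or newline, so no escaping is done.
    """
    lines = []
    for row in rows:
        lines.append("|".join(str(value) for value in row) + "\n")
        if len(lines) == rows_per_block:
            yield "".join(lines).encode("utf-8")
            lines = []
    if lines:  # flush the last, possibly short, block
        yield "".join(lines).encode("utf-8")


def gzip_prism(blocks: Blocks) -> Iterator[bytes]:
    """Prism stage (blocks to blocks): compress each block as one gzip member."""
    for block in blocks:
        yield gzip.compress(block)


class BufferTarget:
    """Target stage (blocks to storage): hypothetical in-memory stand-in.

    A real target would write to HDFS or a file system instead.
    """

    def __init__(self) -> None:
        self.data = bytearray()

    def write(self, blocks: Blocks) -> int:
        written = 0
        for block in blocks:
            self.data.extend(block)
            written += len(block)
        return written


def export(
    rows: Iterable[Row],
    formatter: Callable[[Iterable[Row]], Blocks],
    prism: Callable[[Blocks], Blocks],
    target: BufferTarget,
) -> int:
    """Chain the stages lazily so only one block is in flight at a time."""
    return target.write(prism(formatter(rows)))


if __name__ == "__main__":
    target = BufferTarget()
    rows = [(1, "a"), (2, "b"), (3, "c")]
    export(rows, pipe_delimited_formatter, gzip_prism, target)
    print(gzip.decompress(bytes(target.data)).decode("utf-8"), end="")
```

Because each stage only consumes and produces an iterable of blocks, swapping in a different formatter, prism or target does not require touching the other two, which is the kind of extensibility the slide describes.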
Success criteria
a) A plugin that can read data from Vertica tables and export it to an external target, e.g. an HDFS cluster.
b) Design the plugin to be scalable enough to export terabytes of data.
c) Design the plugin to be extensible to support different data formats (pipe delimited, ORC files, etc.), storage formats (zip, tar, plain data, etc.) and data targets (HDFS, QFS, etc.).

Benefits to you
• Get hands-on experience in using Vertica and HDFS.
• Learn to provide real-life design and implementation for extensibility, in the face of big data and distributed processing.
• Recognition of being part of the open source community.
• Potential recognition from Vertica's 1000s of customers.
• Most importantly, free espressos, t-shirts and a coffee mug.

Thanks!
Tharanga Gamaethige : [email protected]
Sennott Square 5404