Vertica Real-Time Analytics Engine

The Vertica Real-Time Analytics Engine
• Leverages BI, ETL, Hadoop/MapReduce and OLTP investments
• Built-in redundancy that also speeds up queries
• No disk I/O bottleneck: simultaneously load and query
• Automatic setup, optimization, and DB management
• Native DB-aware clustering on low-cost x86 Linux nodes
• Up to 90% space reduction using 10+ algorithms
• 50x to 1000x faster than traditional RDBMS
• Scales from TB to PB with industry-standard hardware
• Simple integration with existing ETL and BI solutions
• SQL-99+ compliant
• Ultimate deployment flexibility
• Extended advanced analytics
• 24/7 load and query

Column Orientation
Vertica intelligently organizes data on disk for each column
• Only reads the columns involved in the query from disk instead of every row and column
• Reads and writes in very large block sizes
• All operations in the query engine are built for columnar execution
Ideal for load-/read-intensive workloads, with a dramatic reduction of disk I/O
Example query: SELECT avg(price) FROM tickstore WHERE symbol = 'AAPL' AND date = '5/06/09'
[Figure: the same tick data (symbols AAPL and BBY, exchange codes, prices, dates 5/05/09 and 5/06/09) laid out as a column store, which reads 3 columns for this query, and as a row store, which reads all columns]

Advanced Compression
Vertica replaces slower disk I/O with fast CPU cycles through aggressive compression
• Uses properties of the data like sorting and cardinality
• Can be operated upon without decoding first
• Implements late materialization
• Decoded intelligently and as late as possible
Encoding Mechanism
• Transaction Date (few values, sorted): RLE. A run of identical 5/05/2009 values is stored once together with its run length (16).
• Customer ID (many values, integer, maybe sorted): DeltaVal. Values such as 0000001, 0000003, 0000005, 0000011, 0000020, 0000026, 0000050, ... are stored as small offsets (0, 2, 2, 4, 10, 10, 19, 25, 49, ...).
• Many others…
[Chart: raw data vs. compressed data by data type, on a 0% to 100% axis. Values shown: Clickstream 10, Audit 10, Trading 5, SNMP 20, Network Logs 60, Marketing 20, Consumer 30, CDR 8]

Automatic Design & Administration
Database Designer recommends a physical database design that provides the best performance for the user's workload
• Minimizes the time that DBAs spend on physical database tuning
Re-run incrementally to optimize for changing workloads over time
• Background process runs "on-the-fly"
DBA provides
• Logical schema (CREATE TABLE)
• "Sample set" of typical queries
• Sample data
• K-safety level
Database Designer generates a physical schema and compression to
• Make queries in the sample set run fast
• Fit within trickle load requirements
• Ensure all SQL queries can be answered
[Diagram: table A B C with projections (A B C | A) and (B A C | B A)]

Real-time Analytics
Real-time analytics on large volumes of data
Hybrid storage architecture
• Concurrent load / query enabled by the asynchronous "tuple mover" process
Vertica achieves very low data latency (seconds) AND full context (store years of detailed history)
Load performance scales with cluster size, proven at 40+ TB per hour

Hybrid storage architecture
Write Optimized Store (WOS), fed by trickle loads
• Memory based
• Unsorted / uncompressed
• Segmented
• Low latency / small quick inserts
TUPLE MOVER: asynchronous data transfer from WOS to ROS
Read Optimized Store (ROS)
• On disk
• Sorted / compressed
• Segmented
• Large data loaded direct
[Diagram: table A B C stored as projection (A B C | A)]

Streaming Load from Apache Kafka
Load
• Vertica loads continuously, consuming from Kafka
• Near real-time (seconds)
• High volume: 2 TB/hr on a 3-node cluster
• Exactly-once (fault tolerant)
• JSON, Avro data formats
• CLI load scheduler for easy setup
• In-database monitoring
Export
• Vertica can also produce query results to Kafka
[Diagram: Kafka brokers connected to Vertica nodes through the Kafka plugin]

Massively Parallel Processing (MPP)
Parallel design leverages data projections to enable distributed storage and workload
• "Active" redundancy
• Automatic replication, failover and recovery
Shared-nothing, grid-based architecture provides scalability on clusters of commodity servers
• Add nodes to achieve optimal capacity and performance
• Nodes are peers (P2P); no specialized nodes
• Query/load to any node
• Continuous/real-time load and query
[Diagram: client network and private data network connecting Node 1, Node 2, and Node 3, each with 2 x 12 cores, 128+ GB RAM, 20 TB]

Concurrency & Workload Management
No leader node bottleneck!
• Query initiation work is evenly distributed between cluster nodes
• Concurrency scales as nodes are added to the cluster
Configurable resource manager
• Resource pools for different query workloads
• Limit or guarantee resource availability to targeted query workloads
• Set per-pool runtime priority, concurrency limits and runtime limits, and resource allocation guidelines
• Resource limits per pool, per user, or per session
Real-time administrative capabilities
• Adjust query priority in real time
• Kill runaway queries
[Diagram: tactical, general, and analytic workloads (Analytic User 1, Analytic User 2)]

Native High Availability
RAID-like functionality within the database
• Projections are organized so that if a node fails, a copy is available on one of the surviving nodes
• Automatically stores redundant data sets for query performance gains as well
Always-on queries & loads
• No need for manual log-based recovery
• System continues to load and query when nodes are down
• Recover missing data by querying other nodes
• Nodes recover individual tables
• No longer binary "node up or still recovering"
• Prioritize table recovery order
[Diagram: segments A1, B1, C1, A2, B2, C2, A3, B3, C3, each stored a second time on a different node]

Elastic Cluster Scale-Out
Simple process to add more servers
• Add nodes to increase performance or capacity
• Vertica automatically redistributes data in the background
No database downtime
• Database continues to support query requests while rebalance is in progress
High performance redistribution
• Elastic cluster and local segmentation enable fast cluster scaling
• E.g. one customer expanded their 11 TB database cluster from 16 nodes to 32 nodes in 65 minutes!
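
The Native High Availability slide above says that projections are arranged so a copy of every segment survives a node failure. A minimal sketch of that idea follows, assuming a simple ring layout in which segment i lives on node i and its buddy copies live on the next k nodes. The function names and the ring-offset rule are illustrative assumptions, not Vertica's actual placement algorithm.

```python
from typing import Dict, List, Set


def place_segments(num_nodes: int, k: int = 1) -> Dict[int, List[int]]:
    """Map each segment to the nodes that hold a copy of it.

    Assumption for illustration: segment i has its primary copy on node i
    and k buddy copies on the next k nodes around the ring.
    """
    if not 0 <= k < num_nodes:
        raise ValueError("k must satisfy 0 <= k < num_nodes")
    return {
        seg: [(seg + offset) % num_nodes for offset in range(k + 1)]
        for seg in range(num_nodes)
    }


def readable_segments(placement: Dict[int, List[int]], failed: Set[int]) -> Set[int]:
    """Segments that still have at least one copy on a surviving node."""
    return {
        seg
        for seg, nodes in placement.items()
        if any(node not in failed for node in nodes)
    }


def survives(placement: Dict[int, List[int]], failed: Set[int]) -> bool:
    """True if every segment can still be read after the given failures."""
    return readable_segments(placement, failed) == set(placement)


if __name__ == "__main__":
    placement = place_segments(num_nodes=3, k=1)
    print(placement)                    # segment -> nodes holding it
    print(survives(placement, {1}))     # one node down
    print(survives(placement, {0, 1}))  # two adjacent nodes down
```

With k = 1 on three nodes, any single failure leaves every segment readable, while losing two adjacent nodes does not. The redundant copy is also a second place to read the same data from, which is the sense in which the slide says redundancy can help query performance.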

Terrace Routing
• Efficient resource usage for larger clusters
• Shuffle within rack, then cross rack
• Potential ½ memory, 20% performance boost

Flexible Backup / Restore
File-based backup / restore utility
Full or incremental backup
• Back up only files that have changed since the previous backup run
Hot backup
• No lock contention with active database operations
Configurable backup options
• Configurable mappings from Vertica nodes to backup server(s)
• Optional encryption between database and backup locations
• Configurable number of restore points
Object-level backup / restore
• Configure backups per application / user / schema to meet individual SLAs
[Diagram: N to One, N to N, and N to M backup topologies over an IP network]

"Excavator" Backup / Restore Improvements
• Performance improvement up to several orders of magnitude
• Removes the need for file system hard linking
• Object restore from full backup
• Can implement active/active DR
[Chart: backup time in minutes (shorter bars are better) for full and minimal incremental backups, pre-release vs. 7.1.2; axis 0 to 400 minutes]
[Diagram: Vertica primary DB in the New York data center replicates schemas (store, sales, customers, products) to a Vertica secondary DB in the Chicago data center]

Vertica Management Console
• Manage multiple clusters from a single web-based console
• Real-time view of database activity and cluster status
• Correlate system and database activity
[Diagram: browser access to the Vertica Management Console, which manages Cluster 1, Cluster 2, and Cluster 3]

Additional Features in "Excavator"
Directed Queries
• If you like your plan, you can keep it
• Save the plan for a query, use it later
• Directly inform the optimizer how to execute a query
• Helpful during upgrade
• Can force inner/outer in hash joins
Automatic Eviction of Slow Nodes
• A "bad" or "hung" node impairs the cluster
• Configurable heartbeat interval to evict
• Immediate take-over by standby node
Primary Key Enforcement
• Optionally validate keys on load
• Performant due to auto projection design
• Precursor to many optimizer features
• Also validates Unique constraints
Operational Improvements
• Explicitly move a query between resource pools
• Optimizer support for live-aggregate projections
• Copy partitions between tables
• Admintools SSH handling improvements
• Mechanism for preserving tables of dropped users
• CUBE support
• Explain plans in JSON format

Vertica Analytics

A Rich Analytics Platform
ANSI SQL
• Window functions
• Statistical functions
• Benefit: standard functionality that performs at scale
HPE Vertica Extensions
• Pattern matching
• Event series joins
• Time series
• Benefits: sessionization, conversion analysis, fraud detection
SDKs
• C++
• R
• Java
• Python
• Benefits: Monte Carlo simulation, custom data mining, XML/JSON parsers and lots more
See online doc
• Analytic functions - http://tinyurl.com/v-analytic-functions
• SDK - http://tinyurl.com/v-analytic-SDK

SQL Analytics+ - Built for Big Data
Features
• Time series gap filling and interpolation
• Event window functions and sessionization
• Social graphing
• Pattern matching
• Event series join
• Statistical functions
• Geospatial functions
Benefits
• High performance (keep data close to CPU)
• Low cost (industry-standard building blocks)
• Ease of use (automated + available)
Use Cases
• Tickstore data cleanups
• CDR/VOD data analysis
• Clickstream sessionization
• Data aggregation and compression
• Monte Carlo simulation
• Graph algorithms
• Sensor data
• Process control time series
• SmartGrid
• …

Analytics Using HPE Vertica
Graph Analytics (HPE Vertica R UDFs)
Network centrality metrics for Twitter profiles using the "igraph" package of R
• Betweenness centrality
• Closeness centrality
• Eigenvector centrality
• Clustering coefficient
Text Analytics (HPE Vertica C++ UDFs)
Tweet text processing and mining using C++ functions
• @mention mining
• Retweet handle mining
• #tag mining
Statistical Modeling (HPE Vertica R UDFs)
Statistical scorecard-based modeling
• Metric normalization
• Outlier treatment
• Structural equation modeling
• Weighted scoring

Application Integration

HPE Vertica Integration with 3rd Party ETL and BI Tools
Simple and seamless integration with existing BI and ETL tools
• Vertica supports ODBC, JDBC, ADO.NET, and most ETL, BI, and visualization products
Leverage existing investments and lower TCO
[Diagram: ETL, replication, and data quality tools feed HPE Vertica through bulk and trickle loads; analytics and reporting tools connect through SQL, ODBC, and JDBC]

Hadoop Integration

Combining the Power of Vertica and Hadoop
HPE Vertica
• Optimized storage
• Designed for performance
• Interactive analytics
• A rich SQL ecosystem
Hadoop
• Designed for fault tolerance
• Batch analytics
• A rich programming model
Both
• Purpose-built, scalable analytics platforms

Core HPE Vertica Engine
Known for:
• High-performance columnar RDBMS
• Scale, concurrency
• ANSI SQL completeness + more
• Modular distributed scaling
• Predictable query execution
• Secure
• HA and resource utilization
[Stack diagram, top to bottom: Partner Ecosystem; Full ANSI SQL; Optimized Plan; Distributed MPP Execution; ROS Columnar Format, Encoded/Compressed; EXT4, Built-in HA, Resource Management; HPE Big Data Reference HW Architectures, x86, Cloud]

HPE Vertica for SQL on Hadoop
A new member of the Vertica product family
• Explore the data where it lives
• Query the data regardless of format or structure
• Runs on any flavor of Hadoop
• Enterprise-grade reliability and manageability
• Simplify the ecosystem with a single query engine
[Stack diagram, top to bottom: Partner Ecosystem; Full ANSI SQL; Optimized Plan; Distributed MPP Execution; Hadoop Open Formats (ORCFile, Parquet, Avro, JSON), Encoded/Compressed; HDFS; HPE Big Data Reference HW Architectures, x86, Cloud]

Pushing the performance envelope
Native ORC Reader
• Open source project in collaboration with the Hadoop community
• Locality: query where the data resides
• Column pruning
• Predicate pushdown
[Stack diagram, top to bottom: Partner Ecosystem; Full ANSI SQL; Optimized Plan; Distributed MPP Execution; ORC Reader; ORCFile, Encoded/Compressed; HDFS; HPE Big Data Reference HW Architectures, x86, Cloud]
NET: Even faster than before!
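
The ORC reader slide above names two optimizations, column pruning and predicate pushdown. A minimal sketch of both follows, assuming data is stored in stripes that carry per-column min/max statistics. The `Stripe` class, `make_stripe`, and `scan` are hypothetical names used only for illustration; they are not the API of the native ORC reader.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Stripe:
    """A hypothetical chunk of columnar data with per-column min/max stats."""
    columns: Dict[str, List[Any]]
    stats: Dict[str, Tuple[Any, Any]]


def make_stripe(columns: Dict[str, List[Any]]) -> Stripe:
    """Build a stripe and record (min, max) for every column."""
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return Stripe(columns, stats)


def scan(stripes: List[Stripe], wanted: List[str], filter_col: str,
         low: Any, high: Any) -> Tuple[List[tuple], int]:
    """Return matching rows and the number of stripes actually read.

    Predicate pushdown: a stripe whose [min, max] range for filter_col
    cannot overlap [low, high] is skipped without touching its data.
    Column pruning: only filter_col and the wanted columns are accessed.
    """
    rows: List[tuple] = []
    stripes_read = 0
    for stripe in stripes:
        col_min, col_max = stripe.stats[filter_col]
        if col_max < low or col_min > high:
            continue  # skipped using statistics alone
        stripes_read += 1
        filter_values = stripe.columns[filter_col]
        needed = [stripe.columns[name] for name in wanted]
        for i, value in enumerate(filter_values):
            if low <= value <= high:
                rows.append(tuple(col[i] for col in needed))
    return rows, stripes_read


if __name__ == "__main__":
    stripes = [
        make_stripe({"price": [10.0, 12.0], "qty": [1, 2], "venue": ["A", "B"]}),
        make_stripe({"price": [100.0, 105.0], "qty": [3, 4], "venue": ["C", "D"]}),
    ]
    print(scan(stripes, ["qty"], "price", 99.0, 101.0))
```

In this toy data set only the second stripe can contain prices between 99 and 101, so the first is never read, and the venue column is never touched because the query does not ask for it.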

Map/Reduce and HDFS Connectors
• Load data from HDFS directly into Vertica using the HDFS Connector
• Use the Map/Reduce Connector to stream data directly between Vertica and your M/R job
[Diagram, "Hadoop / Vertica: Advanced Analytics": a MapReduce / Pig job whose Map tasks read DFS blocks 1 to 3 and whose Reduce task exchanges data with Vertica]

HPE Vertica for SQL on Hadoop features and benefits
Query data, no matter where it is located
• Install HPE Vertica directly on your Hadoop infrastructure
• Uses the same architecture as HPE Vertica
• Ingest open source formats: Avro, JSON, etc.
• Store in highly optimized ROS, ORC, or Parquet (future)
• Query pre-existing data lakes
• Experience full-functionality ANSI SQL 99
• Run 100% of TPC-DS queries
• Support for leading Hadoop distros (Hortonworks, MapR and Cloudera)
• Proven enterprise-grade scalability and reliability via an innovative "no helper node" architecture that eliminates a key single point of failure
[Diagram: analytical applications (R, Java, Python, SQL) on top of the HPE Vertica Core Engine. Ingest: Avro, JSON, etc. Store: ROS. Query: ORC & Parquet]

HPE Vertica + Hadoop: Joint Use Cases
Hadoop for ETL, Vertica for Analytics
• Log parsing / tagging / filtering
• Convert JSON into relational tuples
HDFS for Storage, Vertica + Hadoop for Analytics
• Real-time analytics on Vertica (needs speed)
• Long-running / exploratory analytics on Hadoop (needs fault tolerance)
• Load from HDFS directly to Vertica
• Vertica SQL access to HDFS
Vertica for Storage and Analytics, Hadoop as a multi-purpose tool
• Hadoop as a scheduler / load-balancer
• Hadoop to convert to formats for other tools (e.g. STATA)
• Hadoop for backup via Sqoop

Flex Zone

HPE Vertica Flex Zone
Challenge: exploring varying and semi-structured data is time consuming and error prone
Solution: HPE Vertica Flex Zone
Benefits:
• Cost-effective way to store and explore semi-structured data
• Skip creation and maintenance of time-consuming schemas
• One-step performance gains by seamlessly moving Flex Zone data into Vertica
[Diagram: visualization on top of Vertica Analytics; columnar tables serve daily analytics, Flex Zone tables serve store and explore]

HPE Vertica Flex Zone
A new approach to loading, managing, and exploring semi-structured data
Features
• Auto-schematization for simple, semi-structured data loading
• Flexible parsers for JSON, Avro, Regex and delimited data
• Faster SQL querying on semi-structured data
• One-step schema for blazing-fast performance
Benefits
• Avoid time-consuming schema development
• Optimize selected data easily for accelerated performance
• Rely on a single system for structured & semi-structured data
• Respond quickly to changing business requests
• Simplify the ETL process
• Visualize semi-structured data using existing BI tools

HPE Vertica 7 – Turbocharging Hadoop
• Semi-structured data: load, store and explore with Auto Schematization™
• Data visualization: gain insight from all data using popular BI tools
• Vertica SQL: query with full, standard SQL
• Enterprise Hadoop storage
• Hadoop data: load and explore from Pig, MapReduce, HDFS or HCatalog
• Flex Zone Serve: operationalize data with one step
And Much More!

Feature Highlights
Areas: Improved Manageability; Enhanced Core Platform and Security
• HPE Vertica Flex Zone
• Expanded Workload Manageability
• Kerberized HDFS Connector
• HCatalog integration
• Intelligent Installation and Load Balancing
• Kerberized Client Drivers
• Data Collector Retention
• Integrated Spread Installation and Configuration
• Key / Value interface
• Parallelized Loader
• Large data types: Long Strings
• Rejection Data as a Table
• EC2 AMI Instances
• Database Designer API
• Improved Control and Performance of ODBC Drivers
• Secure Password Protection
• JDBC 4.0 Support
• Java SDK
• Improved Networking Infrastructure
• Faster Query Performance with USDF Predicates
• Optimized Merge

Vertica Technology - Summary
Next-generation platform core
• Columnar storage and execution
• MPP architecture scales to petabytes
Simple and flexible deployment, enterprise-class manageability
• Deploy in x86 hardware environments, scale to petabytes
• Robust backup, high availability, disaster recovery, and security solutions
• Service a range of workloads and users in one analytics hub
Advanced in-database analytics
• Native SQL extensions for common analytics use cases
• UDx framework for high-performance custom analytics
Seamless ecosystem integration
• Continue to leverage existing BI, ETL, and Hadoop investments
• Build custom plug-ins to load natively from varied data sources

HPE Vertica Express and Premium edition details
Analytics at Scale. No Limits. No Compromises.

Feature                                                                  Express ($15k/TB)   Premium ($25k/TB)
MPP Architecture                                                         ✓                   ✓
High Availability                                                        ✓                   ✓
Role-Based Security                                                      ✓                   ✓
User Function Creation (UDx)                                             ✓                   ✓
Standard SQL (ANSI 99)                                                   ✓                   ✓
Flex Tables                                                              ✓                   ✓
Workload Analyzer, DB Designer, Mgmt Console                             ✓                   ✓
Elastic Cluster                                                          ✓                   ✓
Advanced SQL Analytics (Time Series, SQL Windowing, Gap Filling, more)*  ✓                   ✓
Fault Groups                                                                                 ✓
Key Value Interface                                                                          ✓
Sentiment, Geospatial, R Extensions                                                          ✓
Column Security                                                                              ✓
Live Aggregate & Pre-Join Projections                                                        ✓

Thank you