Taming the Big Data Fire Hose
John Hugg, Sr. Software Engineer, VoltDB
VoltDB: the NewSQL database you’ll never outgrow

Big Data Defined
• Velocity
  + Moves at very high rates (think sensor-driven systems)
  + Valuable in its temporal, high velocity state
• Volume
  + Fast-moving data creates massive historical archives
  + Valuable for mining patterns, trends and relationships
• Variety
  + Structured (logs, business transactions)
  + Semi-structured and unstructured

Example Big Data Use Cases

Data Source                   | High-frequency operations                       | Lower-frequency operations
Capital markets               | Write/index all trades, store tick data         | Show consolidated risk across traders
Call initiation request       | Real-time authorization                         | Fraud detection/analysis
Inbound HTTP requests         | Visitor logging, analysis, alerting             | Traffic pattern analytics
Online game                   | Rank scores: defined intervals, player “bests”  | Leaderboard lookups
Real-time ad trading systems  | Match form factor, placement criteria, bid/ask  | Report ad performance from exhaust stream
Mobile device location sensor | Location updates, QoS, transactions             | Analytics on transactions

Big Data and You
• Incoming data streams are different than traditional business apps
  + You need to write data quickly and reliably, but …
• It’s not just about high speed writes
  + You need to validate in real-time
  + You need to count and aggregate
  + You need to analyze in real-time
  + You need to scale on demand
  + You may need to transact

Big Data Management Infrastructure
[Diagram. Labels: High Velocity (online gaming, ad serving, sensor data, financial trade, Internet commerce, SaaS, Web 2.0, mobile platforms); NewSQL; NoSQL; High Volume (analytic datastore, other OLAP data stores).]
• NewSQL
  + Structured data
  + ACID guarantees
  + Relational/SQL
  + Real-time analytics
• NoSQL
  + Unstructured data
  + Eventual consistency
  + Schemaless
  + KV, document

High Velocity Data Management

High Velocity DBMS Requirements
• Ingest at very high speeds and rates
• Scale easily to meet growth and demand peaks
• Support integrated fault tolerance
• Support a wide range of real-time (or “near-time”) analytics
• Integrate easily with high volume analytic datastores

High Speed Data Ingestion
• Support millions of write operations per second at scale
• Read and write latencies below 50 milliseconds
• Provide ACID-level consistency guarantees (maybe)
• Support one or more well-known application interfaces
  + SQL
  + Key/Value
  + Document

Scale to Meet Growth and Demand
• Scale-out on commodity hardware
• Built-in database partitioning
  + Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare
• Database must automatically implement defined partitioning strategy
  + Application should “see” a single database instance
• Database should encourage scalability best practices
  + For example, replication of reference data minimizes need for multi-partition operations

A Look Inside Partitioning
• single-partition:
    select count(*) from orders where customer_id = 5
• multi-partition:
    select count(*) from orders where product_id = 3
• single-partition:
    insert into orders (customer_id, order_id, product_id) values (3,303,2)
• multi-partition:
    update products set product_name = 'spork' where product_id = 3
• table orders (partitioned): customer_id (partition key), order_id, product_id
• table products (replicated): product_id, product_name
[Diagram: Partition 1, Partition 2 and Partition 3 each hold a slice of the orders table plus a full copy of the products table (1 knife, 2 spoon, 3 fork).]

Integrated Fault Tolerance
• Database should transparently support built-in “Tandem-style” HA
  + Users should be able to easily increase/decrease fault tolerance levels
• Database should be easily and quickly recoverable in the event of severe hardware failures
• Database should be able to automatically detect and manage a variety of partition fault conditions
• Downed nodes should be “rejoinable” without the need for service windows

Partition Detection & Recovery
[Diagrams: Server A, Server B, Server C.]
• Network fault protection
  + Detects partition event
  + Determines which side of fault to disable
  + Snapshots and disables orphaned node(s)
• Live node rejoin
  + Allows “downed” nodes to rejoin live cluster
  + Automatically re-synchs all node data
  + Coordinates transactions during re-synch

Real-time Analytics
• Database should support a wide variety of high performance reads
  + High-frequency single-partition
  + Lower-frequency multi-partition
• Common analytic queries should be optimized in the database
  + Multi-partition aggregations, limits, etc.
• Database should accommodate a flexible range of relational data operations
  + Particularly relevant to structured data

Integration with Analytic Datastores
• Database should offer high performance, transactional export
• Export should allow a wide variety of common data enrichment operations
  + Normalize and de-normalize
  + De-duplicate
  + Aggregate
• Architecture should support loosely-coupled integrations
  + Impedance mismatches
  + Durability

VoltDB Export Data Flow
[Diagram: High Velocity Database Cluster.]
• Loosely-coupled, asynchronous
• Queue must be durable
• Bi-directional durability

Summary
• Big Data infrastructures will usually require more than one engine
  + High velocity engine for “fast” data
  + Analytic engine for “deep” data
• Data characteristics will often determine which high velocity engine to use
  + NewSQL is often well-suited to structured data
  + NoSQL is often a good fit for unstructured data
• Choose solutions that suit your needs and are designed for interoperability
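
Appendix sketch: single-partition vs. multi-partition
The “A Look Inside Partitioning” slide labels four statements as single-partition or multi-partition. The Python sketch below is one possible way to express that rule for the slide's two tables. It is illustrative only: the names Table, partition_for and classify are hypothetical, the modulo placement rule is an assumption and not VoltDB's actual hashing, and the treatment of reads on a replicated table is an assumption the slide does not show.

```python
from dataclasses import dataclass
from typing import Optional

NUM_PARTITIONS = 3  # the slide's diagram shows three partitions


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str]  # None means the table is replicated to every partition


# Schema taken from the slide: orders is partitioned, products is replicated.
ORDERS = Table("orders", partition_key="customer_id")
PRODUCTS = Table("products", partition_key=None)


def partition_for(key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Toy placement rule (assumption): partition key modulo partition count."""
    return key % num_partitions


def classify(table: Table, key_column: str, is_write: bool) -> str:
    """Label a statement by the column it filters on (or supplies, for an insert)."""
    if table.partition_key is None:
        # Replicated table: a write must reach every copy.
        # Assumption: a read can be answered from any single copy.
        return "multi-partition" if is_write else "single-partition"
    if key_column == table.partition_key:
        # The partition key pins the statement to exactly one partition.
        return "single-partition"
    # Without the partition key, every partition has to be consulted.
    return "multi-partition"


if __name__ == "__main__":
    # The four statements from the slide, in order.
    print(classify(ORDERS, "customer_id", is_write=False))
    print(classify(ORDERS, "product_id", is_write=False))
    print(classify(ORDERS, "customer_id", is_write=True))
    print(classify(PRODUCTS, "product_id", is_write=True))
```

Under this rule, replicating small reference tables such as products keeps lookups local to whichever partition is doing the work, which is the scalability practice the “Scale to Meet Growth and Demand” slide recommends; the cost is that updates to the replicated table touch every partition.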