Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
SQL Server 2016 Operational Analytics Sponsorzy strategiczni Sponsorzy srebrni Łukasz Grala Microsoft MVP Data Platform | MCT | MCSE • Architect - Mentor Data Platform & Business Intelligence Solutions • Trainer Data Platform and Business Intelligence • University Lecturer • Author Webcasts and Publications • Microsoft MVP Data Platform • Leader PLSSUG Poznań • Phd Student on Poznan University of Technology, Faculty of Computing Science (topics – database and datawarehouse architecture, data mining, machine learning) [email protected] [email protected] Marcin Szeliga • Data Philosopher • BI Expert and Consultant • Data Platform Architect • 20 years of experience with SQL Server • Ph.D. Candidate at Politechnika Śląska • [email protected] Microsoft platform leads the way on-premises and cloud Leader in 2014 for Gartner Magic Quadrants Operational Database Management Systems Data Warehouse Database Management Systems Business Intelligence and Analytics Platforms x86 Server Virtualization Cloud Infrastructure as a Service Enterprise Application Platform as a Service Public Cloud Storage Do more. Achieve more. SQL Server 2016 improvements Performance Operational analytics Insights on operational data; Works with in-memory OLTP and disk-based OLTP In-memory OLTP enhancements Greater T-SQL surface area, terabytes of memory supported, and greater number of parallel CPUs Query data store Monitor and optimize query plans Native JSON Expanded support for JSON data Temporal database support Query data as points in time Security Always encrypted Sensitive data remains encrypted at all times with ability to query Row-level security Apply fine-grained access control to table rows Dynamic data masking Real-time obfuscation of data to prevent unauthorized access Other enhancements Audit success/failure of database operations TDE support for storage of inmemory OLTP tables Enhanced auditing for OLTP with ability to track history of record changes Availability Scalability Enhanced AlwaysOn Enhanced database caching Three synchronous replicas for auto failover across domains Cache data with automatic, multiple TempDB files per instance in multicore environments Round robin load balancing of replicas Automatic failover based on database health DTC for transactional integrity across database instances with AlwaysOn Support for SSIS with AlwaysOn Mission-critical performance Performance Operational analytics Insights on operational data; Works with in-memory OLTP and disk-based OLTP In-memory OLTP enhancements Greater T-SQL surface area, terabytes of memory supported, and greater number of parallel CPUs Query data store Monitor and optimize query plans Native JSON Expanded support for JSON data Temporal database support Query data as points in time Security Always encrypted Sensitive data remains encrypted at all times with ability to query Row-level security Apply fine-grained access control to table rows Dynamic data masking Real-time obfuscation of data to prevent unauthorized access Other enhancements Audit success/failure of database operations TDE support for storage of inmemory OLTP tables Enhanced auditing for OLTP with ability to track history of record changes Availability Scalability Enhanced AlwaysOn Enhanced database caching Three synchronous replicas for auto failover across domains Cache data with automatic, multiple TempDB files per instance in multicore environments Round robin load balancing of replicas Automatic failover based on database health DTC for transactional integrity across database instances with AlwaysOn Support for SSIS with AlwaysOn What does operational mean? • Refers to Operational Workload (i.e. OLTP) • Examples: • • • • Enterprise Resource Planning (ERP) – Inventory, Order, Sales, Machine Data – Data from machine operations on factory floor Online Stores (e.g. Amazon, Expedia) Stock/Security trades • Mission Critical • No downtime (High Availability) – impact on revenue • Low latency and high transaction throughput What does analytics mean? • Analytics • Studying past data (e.g. operational, social media) to identify potential trends • To analyze the effects of certain decisions or events (e.g. Ad campaign) • Analyze past/current data to predict outcomes (e.g. credit score) • Goals • Enhance the business by gaining knowledge to make improvements or changes Traditional BI architecture Key Issues • Complex Implementation • Requires two Servers (CapEx and OpEx) IIS Server • Data Latency in Analytics ETL • More businesses demand/require real-time Analytics Minimizing data latency for analytics Benefits • No Data Latency • No ETL • No Separate DW IIS Server Challenges • Analytics queries are resource intensive and can cause blocking This is • How toANALYTICS minimize Impact on Operational OPERATIONAL workload • Sub-optimal execution of Analytics on relational schema SQL Server 2016 Quick Recap: Columnstore Index Data stored as rows Data stored as columns C1 C2 C3 C4 C5 … Ideal for OLTP Efficient operation on small set of rows Improved compression: Data from same domain compress better Reduced I/O: Fetch only columns needed 16 Improved performance: More data fits in memory Optimized for CPU utilization Ideal for DW workload Clustered Columnstore Performance: TPC-H 17 Operational Analytics with columnstore index Delete bitmap Btree Index Delta rowgroups • • • • • 19 Minimizing CSI overhead DML Operations Btree Index Delete bitmap HOT • • • Delta rowgroups Order Management Application – CREATE NONCLUSTERED COLUMNSTORE INDEX ….. WHERE order_status = ‘SHIPPED’ 20 Operational Analytics with columnstore on In-Memory Tables Hash Index Range Index DRT In-Memory OLTP Table Tai l No explicit delta rowgroup Rows (tail) not in columnstore stay in in-memory OLTP table No columnstore index overhead when operating on tail Background task migrates rows from tail to columnstore in chunks of 1 million rows not changed in last 1 hour Deleted Rows Table (DRT) – tracks deleted rows Updateable CCI Columnstore data fully resident in memory Persisted together with operational data No application changes required 22 Query processing Demo time Performance improvments Scan type Elapsed time (s) Speedup Row store scan, interop 44.441 Row store scan, native 28.445 1.6x CSI scan, interop 0.802 55.4x Insert, Update, Delete costs and query time Operation Elasped time (s) with CSI CSI scan, interop 0.802 Insert 400 000 rows 53.5 CSI scan, interop 0.869 Update 400 000 rows 42.4 CSI scan, interop 1.181 Delete 400 000 rows 38.3 CSI scan, interop 1.231 Elasped time (s) No CSI Increase % Update Increase % Query BASE 47.8 11.9% 8.4% 28.9 46.7% 47.3% 30.5 25.6% 53.5% Single thread insert and update Operation Rows affected Row store (s) Secondary CSI (s) Primary CSI (s) 1000 updates 10 000 0.893 1.400 6.866 10% insert 18M 233.9 566 291.4 2% update 3.96M 123.2 314.3 275.9 Single thread scan Millions of rows Row store Secondary CSI Primary CSI New built 180 99.1 4.7 1.71 After 1000 updates 180 99.4 5.4 1.75 After 10% inserts 198 108.7 14.5 9.5 After 2% updates 198 109.5 16.8 10.0 Comparing performance Operation Billions of value per second No SIMD Billions of value per second SIMD Speedup Bit unpacking 6bits 2.08 11.55 5.55x Bit unpacking 12 bits 1.91 9.76 5.11x Bit unpacking 21 bits 1.96 5.29 2.70x Compaction 32 bits 1.24 6.70 5.40x Range predicate 16 bits 0.94 11.42 5.06x Sum 16 bit values 2.86 14.46 5.06x 128-bit bitmap filter 0.97 11.42 11.77x 64KB bitmap filter 1.01 2.37 2.35x Query performance (1) Predicate or aggregation Duration SQL2014 (ms) Duration SQL2016 (ms) Speedup Billion of rows per s Q1-Q4: select count(*) from LINEITEM where <predicate> L_ORDERKEY = 235236 220 140 1.57x 12.9 L_QUANTITY = 1900 664 68 9.76x 26.5 L_SHIPMODE='AIR' 694 147 4.72x 12.2 L_SHIPDATE between '01.01.1997' and '01.01.1998' 512 87 5.89x 20.7 Query performance (2) Predicate or aggregation Duration SQL2014 (ms) Duration SQL2016 (ms) Speedup Billion of rows per s Q5-Q6: select count(*) from PARTSUPP where <predicate> PS_AVAILQTY < 10 50 27 1.85x 8.9 PS_AVAILQTY = 10 45 15 3.00x 16 Q7-Q8: select <aggregates> from LINEITEM avg(L_DISCOUNT) 1272 196 6.49x 9.1 avg(L_DISCOUNT), min(L_ORDERKEY), max(L_ORDERKEY) 1978 356 5.56x 5.1 Availability Groups as data warehouse Always on Availability Group Secondary Replica Primary Replica Secondary Replica Secondary Replica Key points • Mission Critical Operational Workloads typically configured for High Availability using AlwaysOn Availability Groups • You can offload analytics to readable secondary replica Minimizing data latency for analytics IIS Server SSAS Enterprise Readiness: Tabular New DirectQuery DirectQuery for Oracle, Teradata, ASP DirectQuery support for MDX query(Excel Tools) High-end Server Hardware Sponsorzy strategiczni Sponsorzy srebrni