Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
In-Memory Columnstore Indexes--Make Your Data Warehouse Fly Joey D’Antoni PASS DBA Virtual Chapter 8 January 2014 About Me Solution Architect, Anexinet @jdanton – Twitter [email protected] Joedantoni.wordpress.com – Blog, Slides http://bit.ly/SQLColumnstore -- Slides, Resources Agenda Indexes—a basic overview Columnstore—an introduction Query Performance—Demo 2012 and 2014—What’s Changing? 2014—Demo Questions Indexes • Data Structure that allows us to speed data retrieval, by maintaining an extra copy of data • Can be filtered • Can be function based, or ordered • Penalty is that writes become more expensive • More storage required Indexes in SQL Server • Clustered vs. Nonclustered • Clustered Index—Index Organized Table • Non-clustered index “just an index” Clustered Index • Data is ordered as is inserted into pages • Data in clustered index is only stored on disk once (it’s the data from the tables) • Table without a clustered index is called a heap—no order at all Non-Clustered Index • Duplicate copy of the data in table • Provides point from index to table data • No specific order of data in index So Why All This Talk About Indexes? Data Warehouse Queries • Data Warehouses have a lot of data • Querying lots of a data can take a really long time • Processing data row by row—may not be the most efficient way to perform aggregations Traditional Approaches To Improving Performance • Partitioned Tables • Indexed Views • Data Compression Introducing Columnstore Indexes (SQL 2012) • Data is stored in columns, as opposed to rows • This allows a much higher rate of compression • Columns not used in a query a simply not scanned, nor returned • Recommended practice is to add most columns in a table to a index Columnar Data Storage Columnstore 2012 Demo So How is So Much Faster? • • • • Very good compression ratio for Column oriented data Better use of Memory Segment Elimination Skips Large Chunks of Data Batch Mode • Processes data in chunks of a 1000 row “batches” rather than row by row • 7-40x CPU savings with batch mode “The key to getting the best performance is to make sure your queries process the large majority of data in batch mode.” Columnstore All The Things? • Awesome performance—so what’s the negative? • Can’t update/insert in 2012 • Can only be nonclustered index—so we are storing more data on disk • Data types are somewhat limited • One index per table • Can’t be a sorted index Update Process (2012) Data To Be Loaded Staging Table Build Columnstore Index Fact Table Partition 1 Partition Switch Data From Staging to Fact Table Fact Table Partition 4 Fact Table Partition 2 Fact Table Partition 3 So Where To Use Columnstore Indexes? • Only on Large Tables—Fact tables and Dimension Tables > 3 Million Rows • Include Every Column • Structure Queries as star joins with grouping and aggregation More details here Columnstore 2014 Columnstore in 2014 • Fewer Data Type Limitations • Updateable • Can be Clustered Index • New Archival Compression Mode • Batch Mode Improvements Columnstore Updates (2014) Updates To Index Collected until they reach 1000 rows Tuple Movers Move into Index Columnstore Updates (2014) • Bulk Inserts go through special API • Updates are processed as inserts and deletes, so expensive operation Columnstore 2014 Demo What Do We Do Differently in 2014 • Best Practices are mostly the same • Batch mode gets enhanced and gains more query types • No need to worry about dropping and rebuilding indexes—just append data • Still focus on large tables where data is not frequently updated • Archival Compression Good for old unused data Questions Contact [email protected] Joedantoni.wordpress.com @jdanton http://bit.ly/SQLColumnstore -- Slides, Resources