Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Antonio Abalos Castillo How to load your data faster and safer using Change Tracking in SQL Server Thank you to our sponsors! Agenda 1. Why faster data loads? 2. What is Change Tracking? 3. Design overview 4. Demo/implementation 5. Extra hints Why faster data loads? Corporations load and replicate data in a variety of ways They become unreliable or miss data over time They use unsupported ways to identify increment of data They are difficult to maintain Not optimal when identifying the updated data Need extra programming effort Do not follow standards Why faster data loads? Benefits of using this approach No programming overhead at the source Avoid using timestamps, row GUIDs or any other programming artifact Change Tracking is transparent to applications Maintenance cost is 0 Very low performance impact in the source database Multiple target systems can get data from the same source DB using this approach We get just the latest version, according to our last status. All different row status in the middle are skipped Running the delta more often will decrease the execution time MERGE is the fastest data loading method (SCD remains as a bad example) Minimally logged operations will help performance (maybe more than you think) What is Change Tracking? Change tracking is a lightweight solution that provides an efficient change tracking mechanism for applications Available since SQL Server 2008 Requires Standard edition of SQL Server or higher Lightweight: The incremental performance overhead that is associated with using change tracking on a table is similar to the overhead incurred when an index is created for a table and needs to be maintained https://technet.microsoft.com/en-us/library/hh710064(v=sql.110).aspx https://msdn.microsoft.com/en-us/library/bb933875(v=sql.110).aspx What is Change Tracking? Each insert/update/delete in each table will be tracked by: 1. The ID columns used in the table 2. [optional] the columns that were updated Changes are accumulated and reported by SQL Server according to the last version we got https://msdn.microsoft.com/en-us/library/hh710064(v=sql.110).aspx What is Change Tracking? Enable Change Tracking Database level ALTER DATABASE AdventureWorks2012 SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON) For each audited table ALTER TABLE dbo.SalesOrderDetail ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON) What is Change Tracking? Get changes from Change Tracking Get current version SET @ver = CHANGE_TRACKING_CURRENT_VERSION(); Get minimum valid version SET @mvv = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Sales')); What is Change Tracking? Get changes from Change Tracking Get changes for one table DECLARE @last_ver BIGINT = 82; SELECT CT.SalesID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS FROM CHANGETABLE(CHANGES dbo.Sales, @last_ver) AS CT https://technet.microsoft.com/en-us/library/cc280358(v=sql.105).aspx Source • • Change Tracking enabled Isolation aware ETL • • Minimally logged operations Automatic delta/full load detection Target Design overview • • Staging area MERGE delta over target data Design overview Requirements: • SQL Server source database • Change Tracking enabled • Integration Services • MERGE statements (SQL 2008+) Other data sources: • Change Data Capture (Oracle) Demo Demo scenario Windows Azure VNET Server A Server B SQL Server Source database Change Tracking SQL Server Target database Logging SSIS Data flow Extra hints – Best practices Best practices • Transaction isolation strategy • • • • Index maintenance jobs can break big transactions at the source Watch out for complex data flows that may need to break down into simpler ones • • • Enable SNAPSHOT isolation in the source database Or create a source snapshot database The best is to have a one-to-one copy of the source table, but this is not always possible How do we deal with deleted rows? (joining tables) Do we need to track changes in columns? Extra hints - Trick list • • • • • • Use trace flag 610 (carefully) Use tab-lock in destination Use ORDER hint in destination Boost up DFT memory Boost up DFT number of rows Run parallel tasks The Data Loading Performance Guide https://msdn.microsoft.com/en-us/library/dd425070.aspx Extra hints - Other tricks • • • • • • • • Databases in “simple” recovery model Change page torn detection to NONE Create a DATA file group and set it as DEFAULT Create as many files as CPU in each file group (depends on storage) Separate the log file from the data files in different disks Consider using heaps for fast-load processes Consider using partitioned tables for regular tables Increase parallelism Extra hints - Security considerations Catalog views sys.change_tracking_databases sys.change_tracking_tables Permissions SELECT permission on at least the primary key columns on the change-tracked table to the table that is being queried VIEW CHANGE TRACKING permission on the table for which changes are being obtained https://msdn.microsoft.com/en-us/library/hh710064(v=sql.110).aspx Extra hints - Change Tracking Vs. Change Data Capture Change data capture (CDC) Change tracking (CT) Yes Yes Historical data Yes No Whether column was changed Yes Yes DML type Yes Yes Collects historical values, and therefore much more data than CT You have no idea on how many updates were made to a row, nor the values that were updated Tracked changes DML changes Tracked information https://technet.microsoft.com/en-us/library/cc280519(v=sql.105).aspx https://msdn.microsoft.com/en-us/library/bb933994.aspx Other references Brent Ozar’s guide to Change Tracking https://www.brentozar.com/archive/2014/06/performancetuning-sql-server-change-tracking/ Good guide for a data load using Change Tracking implementation https://www.timmitchell.net/post/2016/01/20/using-sqlserver-change-tracking-for-incremental-loads