Table of Contents
Overview
About Change Data Capture
Enable and Disable Change Data Capture
Administer and Monitor Change Data Capture
Work with Change Data
About Change Tracking
Enable and Disable Change Tracking
Manage Change Tracking
Work with Change Tracking
Change Data Capture and Other SQL Server Features
Track Data Changes (SQL Server)
SQL Server 2016 provides two features that track changes to data in a database: change data capture and change
tracking. These features enable applications to determine the DML changes (insert, update, and delete operations)
that were made to user tables in a database. Change data capture and change tracking can be enabled on the
same database; no special considerations are required. For the editions of SQL Server that support change data
capture and change tracking, see Features Supported by the Editions of SQL Server 2016. Change tracking is
supported by SQL Database.
Benefits of Using Change Data Capture or Change Tracking
The ability to query for data that has changed in a database is an important requirement for some applications to
be efficient. Typically, to determine data changes, application developers must implement a custom tracking
method in their applications by using a combination of triggers, timestamp columns, and additional tables.
Creating these applications usually involves a lot of work to implement, leads to schema updates, and often carries
a high performance overhead.
Using change data capture or change tracking in applications to track changes in a database, instead of developing
a custom solution, has the following benefits:
There is reduced development time. Because functionality is available in SQL Server 2016, you do not have
to develop a custom solution.
Schema changes are not required. You do not have to add columns, add triggers, or create side tables in
which to track deleted rows or to store change tracking information if columns cannot be added to the user
tables.
There is a built-in cleanup mechanism. Cleanup for change tracking is performed automatically in the
background. Custom cleanup for data that is stored in a side table is not required.
Functions are provided to obtain change information.
There is low overhead to DML operations. Synchronous change tracking will always have some overhead.
However, using change tracking can help minimize the overhead. The overhead will frequently be less than
that of using alternative solutions, especially solutions that require the use of triggers.
Change tracking is based on committed transactions. The order of the changes is based on transaction
commit time. This allows for reliable results to be obtained when there are long-running and overlapping
transactions. Custom solutions that use timestamp values must be specifically designed to handle these
scenarios.
Standard tools are available that you can use to configure and manage. SQL Server 2016 provides standard
DDL statements, SQL Server Management Studio, catalog views, and security permissions.
Feature Differences Between Change Data Capture and Change Tracking
The following table lists the feature differences between change data capture and change tracking. The tracking
mechanism in change data capture involves an asynchronous capture of changes from the transaction log so that
changes are available after the DML operation. In change tracking, the tracking mechanism involves synchronous
tracking of changes in line with DML operations so that change information is available immediately.
| Feature | Change data capture | Change tracking |
| --- | --- | --- |
| Tracked changes: DML changes | Yes | Yes |
| Tracked information: Historical data | Yes | No |
| Tracked information: Whether column was changed | Yes | Yes |
| Tracked information: DML type | Yes | Yes |
Change Data Capture
Change data capture provides historical change information for a user table by capturing both the fact that DML
changes were made and the actual data that was changed. Changes are captured by using an asynchronous
process that reads the transaction log and has a low impact on the system.
As shown in the following illustration, the changes that were made to user tables are captured in corresponding
change tables. These change tables provide an historical view of the changes over time. The change data
capture functions that SQL Server provides enable the change data to be consumed easily and systematically.
Security Model
This section describes the change data capture security model.
Configuration and Administration
To either enable or disable change data capture for a database, the caller of sys.sp_cdc_enable_db (Transact-SQL)
or sys.sp_cdc_disable_db (Transact-SQL) must be a member of the fixed server sysadmin role. Enabling and
disabling change data capture at the table level requires the caller of sys.sp_cdc_enable_table (Transact-SQL) and
sys.sp_cdc_disable_table (Transact-SQL) to either be a member of the sysadmin role or a member of the database
db_owner role.
Use of the stored procedures to support the administration of change data capture jobs is restricted to members
of the server sysadmin role and members of the database db_owner role.
Change Enumeration and Metadata Queries
To gain access to the change data that is associated with a capture instance, the user must be granted select access
to all the captured columns of the associated source table. In addition, if a gating role is specified when the capture
instance is created, the caller must also be a member of the specified gating role. Other general change data
capture functions for accessing metadata will be accessible to all database users through the public role, although
access to the returned metadata will also typically be gated by using select access to the underlying source tables,
and by membership in any defined gating roles.
DDL Operations to Change Data Capture Enabled Source Tables
When a table is enabled for change data capture, DDL operations can only be applied to the table by a member of
the fixed server role sysadmin, a member of the database role db_owner, or a member of the database role
db_ddladmin. Users who have explicit grants to perform DDL operations on the table will receive error 22914 if
they try these operations.
Data Type Considerations for Change Data Capture
All base column types are supported by change data capture. The following table lists the behavior and limitations
for several column types.
| Type of column | Changes captured in change tables | Limitations |
| --- | --- | --- |
| Sparse columns | Yes | Does not support capturing changes when using a columnset. |
| Computed columns | No | Changes to computed columns are not tracked. The column will appear in the change table with the appropriate type, but will have a value of NULL. |
| XML | Yes | Changes to individual XML elements are not tracked. |
| Timestamp | Yes | The data type in the change table is converted to binary. |
| BLOB data types | Yes | The previous image of the BLOB column is stored only if the column itself is changed. |
Change Data Capture and Other SQL Server Features
This section describes how the following features interact with change data capture:
Database mirroring
Transactional replication
Database restore or attach
Database Mirroring
A database that is enabled for change data capture can be mirrored. To ensure that capture and cleanup happen
automatically on the mirror, follow these steps:
1. Ensure that SQL Server Agent is running on the mirror.
2. Create the capture job and cleanup job on the mirror after the principal has failed over to the mirror. To
create the jobs, use the stored procedure sys.sp_cdc_add_job (Transact-SQL).
For more information about database mirroring, see Database Mirroring (SQL Server).
Transactional Replication
Change data capture and transactional replication can coexist in the same database, but population of the change
tables is handled differently when both features are enabled. Change data capture and transactional replication
always use the same procedure, sp_replcmds, to read changes from the transaction log. When change data
capture is enabled on its own, a SQL Server Agent job calls sp_replcmds. When both features are enabled on the
same database, the Log Reader Agent calls sp_replcmds. This agent populates both the change tables and the
distribution database tables. For more information, see Replication Log Reader Agent.
Consider a scenario in which change data capture is enabled on the AdventureWorks2012 database, and two
tables are enabled for capture. To populate the change tables, the capture job calls sp_replcmds. The database is
enabled for transactional replication, and a publication is created. Now, the Log Reader Agent is created for the
database and the capture job is deleted. The Log Reader Agent continues to scan the log from the last log
sequence number that was committed to the change table. This ensures data consistency in the change tables. If
transactional replication is disabled in this database, the Log Reader Agent is removed and the capture job is recreated.
NOTE
When the Log Reader Agent is used for both change data capture and transactional replication, replicated changes are first
written to the distribution database. Then, captured changes are written to the change tables. Both operations are
committed together. If there is any latency in writing to the distribution database, there will be a corresponding latency
before changes appear in the change tables.
Restoring or Attaching a Database Enabled for Change Data Capture
SQL Server uses the following logic to determine if change data capture remains enabled after a database is
restored or attached:
If a database is restored to the same server with the same database name, change data capture remains
enabled.
If a database is restored to another server, by default change data capture is disabled and all related
metadata is deleted.
To retain change data capture, use the KEEP_CDC option when restoring the database. For more
information about this option, see RESTORE.
If a database is detached and attached to the same server or another server, change data capture remains
enabled.
If a database is attached or restored with the KEEP_CDC option to any edition other than Enterprise, the
operation is blocked because change data capture requires SQL Server Enterprise. Error message 932 is
displayed:
SQL Server cannot load database '%.*ls' because change data capture is enabled. The currently installed
edition of SQL Server does not support change data capture. Either disable change data capture in the
database by using a supported edition of SQL Server, or upgrade the instance to one that supports change
data capture.
You can use sys.sp_cdc_disable_db to remove change data capture from a restored or attached database.
Change Tracking
Change tracking captures the fact that rows in a table were changed, but does not capture the data that was
changed. This enables applications to determine the rows that have changed with the latest row data being
obtained directly from the user tables. Therefore, change tracking is more limited in the historical questions it can
answer compared to change data capture. However, for those applications that do not require the historical
information, there is far less storage overhead because of the changed data not being captured. A synchronous
tracking mechanism is used to track the changes. This has been designed to have minimal overhead to the DML
operations.
The following illustration shows a synchronization scenario that would benefit by using change tracking. In the
scenario, an application requires the following information: all the rows in the table that were changed since the
last time that the table was synchronized, and only the current row data. Because a synchronous mechanism is
used to track the changes, an application can perform two-way synchronization and reliably detect any conflicts
that might have occurred.
Change Tracking and Sync Services for ADO.NET
Sync Services for ADO.NET enables synchronization between databases, providing an intuitive and flexible API that
enables you to build applications that target offline and collaboration scenarios. Sync Services for ADO.NET
provides an API to synchronize changes, but it does not actually track changes in the server or peer database. You
can create a custom change tracking system, but this typically introduces significant complexity and performance
overhead. To track changes in a server or peer database, we recommend that you use change tracking in SQL
Server 2016 because it is easy to configure and provides high performance tracking.
For more information about change tracking and Sync Services for ADO.NET, use the following links:
About Change Tracking (SQL Server)
Describes change tracking, provides a high-level overview of how change tracking works, and describes
how change tracking interacts with other SQL Server Database Engine features.
Microsoft Sync Framework Developer Center
Provides complete documentation for Sync Framework and Sync Services. In the documentation for Sync
Services, the topic "How to: Use SQL Server Change Tracking" contains detailed information and code
examples.
Related Tasks

| Task | Topic |
| --- | --- |
| Provides an overview of change data capture. | About Change Data Capture (SQL Server) |
| Describes how to enable and disable change data capture on a database or table. | Enable and Disable Change Data Capture (SQL Server) |
| Describes how to administer and monitor change data capture. | Administer and Monitor Change Data Capture (SQL Server) |
| Describes how to work with the change data that is available to change data capture consumers. This topic covers validating LSN boundaries, the query functions, and query function scenarios. | Work with Change Data (SQL Server) |
| Provides an overview of change tracking. | About Change Tracking (SQL Server) |
| Describes how to enable and disable change tracking on a database or table. | Enable and Disable Change Tracking (SQL Server) |
| Describes how to manage change tracking, configure security, and determine the effects on storage and performance when change tracking is used. | Manage Change Tracking (SQL Server) |
| Describes how applications that use change tracking can obtain tracked changes, apply these changes to another data store, and update the source database. This topic also describes the role change tracking plays when a failover occurs and a database must be restored from a backup. | Work with Change Tracking (SQL Server) |
See Also
Change Data Capture Functions (Transact-SQL)
Change Tracking Functions (Transact-SQL)
Change Data Capture Stored Procedures (Transact-SQL)
Change Data Capture Tables (Transact-SQL)
Change Data Capture Related Dynamic Management Views (Transact-SQL)
About Change Data Capture (SQL Server)
Change data capture records insert, update, and delete activity that is applied to a SQL Server table. This makes the
details of the changes available in an easily consumed relational format. Column information and the metadata
that is required to apply the changes to a target environment is captured for the modified rows and stored in
change tables that mirror the column structure of the tracked source tables. Table-valued functions are provided to
allow systematic access to the change data by consumers.
A good example of a data consumer that is targeted by this technology is an extraction, transformation, and
loading (ETL) application. An ETL application incrementally loads change data from SQL Server source tables to a
data warehouse or data mart. Although the representation of the source tables within the data warehouse must
reflect changes in the source tables, an end-to-end technology that refreshes a replica of the source is not
appropriate. Instead, you need a reliable stream of change data that is structured so that consumers can apply it to
dissimilar target representations of the data. SQL Server change data capture provides this technology.
Change Data Capture Data Flow
The following illustration shows the principal data flow for change data capture.
The source of change data for change data capture is the SQL Server transaction log. As inserts, updates, and
deletes are applied to tracked source tables, entries that describe those changes are added to the log. The log
serves as input to the capture process, which reads the log and adds information about changes to the tracked
table’s associated change table. Functions are provided to enumerate the changes that appear in the change tables
over a specified range, returning the information in the form of a filtered result set. The filtered result set is
typically used by an application process to update a representation of the source in some external environment.
Understanding Change Data Capture and the Capture Instance
Before changes to any individual tables within a database can be tracked, change data capture must be explicitly
enabled for the database. This is done by using the stored procedure sys.sp_cdc_enable_db. When the database is
enabled, source tables can be identified as tracked tables by using the stored procedure sys.sp_cdc_enable_table.
When a table is enabled for change data capture, an associated capture instance is created to support the
dissemination of the change data in the source table. The capture instance consists of a change table and up to two
query functions. Metadata that describes the configuration details of the capture instance is retained in the change
data capture metadata tables cdc.change_tables, cdc.index_columns, and cdc.captured_columns. This
information can be retrieved by using the stored procedure sys.sp_cdc_help_change_data_capture.
All objects that are associated with a capture instance are created in the change data capture schema of the
enabled database. The requirements for the capture instance name are that it be a valid object name and that it be
unique across the database capture instances. By default, the name is <schema name>_<table name> of the source
table. Its associated change table is named by appending _CT to the capture instance name. The function that is
used to query for all changes is named by prepending fn_cdc_get_all_changes_ to the capture instance name. If
the capture instance is configured to support net changes, the net_changes query function is also created and
named by prepending fn_cdc_get_net_changes_ to the capture instance name.
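For example (the table and capture instance names here are hypothetical), enabling dbo.MyTable with the default
capture instance name dbo_MyTable produces the change table cdc.dbo_MyTable_CT and the query function
cdc.fn_cdc_get_all_changes_dbo_MyTable. The configuration details of the capture instance can be reviewed as
follows:
USE MyDB
GO
-- Returns one row per capture instance defined on dbo.MyTable, including the
-- change table name, captured column list, and supports_net_changes setting.
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name = N'MyTable'
GO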
Change Table
The first five columns of a change data capture change table are metadata columns. These provide additional
information that is relevant to the recorded change. The remaining columns mirror the identified captured
columns from the source table in name and, typically, in type. These columns hold the captured column data that is
gathered from the source table.
Each insert or delete operation that is applied to a source table appears as a single row within the change table.
The data columns of the row that results from an insert operation contain the column values after the insert. The
data columns of the row that results from a delete operation contain the column values before the delete. An
update operation requires one row entry to identify the column values before the update, and a second row entry
to identify the column values after the update.
Each row in a change table also contains additional metadata to allow interpretation of the change activity. The
column __$start_lsn identifies the commit log sequence number (LSN) that was assigned to the change. The commit
LSN both identifies changes that were committed within the same transaction, and orders those transactions. The
column __$seqval can be used to order additional changes that occur in the same transaction. The column __$operation
records the operation that is associated with the change: 1 = delete, 2 = insert, 3 = update (before image), and 4 =
update (after image). The column __$update_mask is a variable bit mask with one defined bit for each captured
column. For insert and delete entries, the update mask will always have all bits set. Update rows, however, will only
have those bits set that correspond to changed columns.
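The following query is a minimal sketch (the capture instance name dbo_MyTable is hypothetical) that reads these
metadata columns directly from a change table to inspect recent change activity:
USE MyDB
GO
-- __$operation: 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image)
SELECT __$start_lsn, __$seqval, __$operation, __$update_mask
FROM cdc.dbo_MyTable_CT
ORDER BY __$start_lsn, __$seqval
GO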
Change Data Capture Validity Interval for a Database
The change data capture validity interval for a database is the time during which change data is available for
capture instances. The validity interval begins when the first capture instance is created for a database table, and
continues to the present time.
Data that is deposited in change tables will grow unmanageably if you do not periodically and systematically
prune the data. The change data capture cleanup process is responsible for enforcing the retention-based cleanup
policy. First, it moves the low endpoint of the validity interval to satisfy the time restriction. Then, it removes
expired change table entries. By default, three days of data is retained.
At the high end, as the capture process commits each new batch of change data, new entries are added to
cdc.lsn_time_mapping for each transaction that has change table entries. Within the mapping table, both a
commit Log Sequence Number (LSN) and a transaction commit time (columns start_lsn and tran_end_time,
respectively) are retained. The maximum LSN value that is found in cdc.lsn_time_mapping represents the high
water mark of the database validity window. Its corresponding commit time is used as the base from which
retention based cleanup computes a new low water mark.
Because the capture process extracts change data from the transaction log, there is a built-in latency between the
time that a change is committed to a source table and the time that the change appears within its associated
change table. While this latency is typically small, it is nevertheless important to remember that change data is not
available until the capture process has processed the related log entries.
Change Data Capture Validity Interval for a Capture Instance
Although it is common for the database validity interval and the validity intervals of individual capture instances to
coincide, this is not always true. The validity interval of the capture instance starts when the capture process
recognizes the capture instance and starts to log associated changes to its change table. As a result, if capture
instances are created at different times, each will initially have a different low endpoint. The start_lsn column of the
result set that is returned by sys.sp_cdc_help_change_data_capture shows the current low endpoint for each
defined capture instance. When the cleanup process cleans up change table entries, it adjusts the start_lsn values
for all capture instances to reflect the new low water mark for available change data. Only those capture instances
that have start_lsn values that are currently less than the new low water mark are adjusted. Over time, if no new
capture instances are created, the validity intervals for all individual instances will tend to coincide with the
database validity interval.
The validity interval is important to consumers of change data because the extraction interval for a request must
be fully covered by the current change data capture validity interval for the capture instance. If the low endpoint of
the extraction interval is to the left of the low endpoint of the validity interval, there could be missing change data
due to aggressive cleanup. If the high endpoint of the extraction interval is to the right of the high endpoint of the
validity interval, the capture process has not yet processed through the time period that is represented by the
extraction interval, and change data could also be missing.
The function sys.fn_cdc_get_min_lsn is used to retrieve the current minimum LSN for a capture instance, while
sys.fn_cdc_get_max_lsn is used to retrieve the current maximum LSN value. When querying for change data, if the
specified LSN range does not lie within these two LSN values, the change data capture query functions will fail.
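The following sketch (the capture instance name dbo_MyTable is hypothetical) retrieves the current validity interval
and uses it to query all changes for the capture instance:
USE MyDB
GO
DECLARE @from_lsn binary(10), @to_lsn binary(10)
-- Low and high endpoints of the validity interval for this capture instance.
SET @from_lsn = sys.fn_cdc_get_min_lsn(N'dbo_MyTable')
SET @to_lsn = sys.fn_cdc_get_max_lsn()
-- Query all changes that lie within the validity interval.
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all')
GO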
Handling Changes to Source Tables
Accommodating column changes in the source tables that are being tracked is a difficult issue for downstream
consumers. Although enabling change data capture on a source table does not prevent such DDL changes from
occurring, change data capture helps to mitigate the effect on consumers by allowing the delivered result sets that
are returned through the API to remain unchanged even as the column structure of the underlying source table
changes. This fixed column structure is also reflected in the underlying change table that the defined query
functions access.
To maintain the fixed column structure of the change table, the capture process responsible for populating the change
table will ignore any new columns that are not identified for capture when the source table was enabled for
change data capture. If a tracked column is dropped, null values will be supplied for the column in the subsequent
change entries. However, if an existing column undergoes a change in its data type, the change is propagated to
the change table to ensure that the capture mechanism does not introduce data loss to tracked columns. The
capture process also posts any detected changes to the column structure of tracked tables to the cdc.ddl_history
table. Consumers who want to be alerted of adjustments that might have to be made in downstream applications
can use the stored procedure sys.sp_cdc_get_ddl_history.
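For example, a downstream consumer could poll for recorded DDL events with a call such as the following (the
capture instance name is hypothetical):
USE MyDB
GO
-- Returns one row per DDL event recorded for the capture instance,
-- including the DDL command text and the LSN at which it occurred.
EXEC sys.sp_cdc_get_ddl_history
    @capture_instance = N'dbo_MyTable'
GO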
Typically, the current capture instance will continue to retain its shape when DDL changes are applied to its
associated source table. However, it is possible to create a second capture instance for the table that reflects the
new column structure. This allows the capture process to capture changes to the same source table into two distinct
change tables having two different column structures. Thus, while one change table can continue to feed current
operational programs, the second one can drive a development environment that is trying to incorporate the new
column data. Allowing the capture mechanism to populate both change tables in tandem means that a transition
from one to the other can be accomplished without loss of change data. This can happen any time the two change
data capture timelines overlap. When the transition is effected, the obsolete capture instance can be removed.
NOTE
The maximum number of capture instances that can be concurrently associated with a single source table is two.
Relationship Between the Capture Job and the Transactional Replication Logreader
The logic for change data capture process is embedded in the stored procedure sp_replcmds, an internal server
function built as part of sqlservr.exe and also used by transactional replication to harvest changes from the
transaction log. When change data capture alone is enabled for a database, the change data capture SQL Server
Agent capture job is created as the vehicle for invoking sp_replcmds. When replication is also present, the
transactional logreader alone is used to satisfy the change data needs for both of these consumers. This strategy
significantly reduces log contention when both replication and change data capture are enabled for the same
database.
The switch between these two operational modes for capturing change data occurs automatically whenever there
is a change in the replication status of a change data capture enabled database.
IMPORTANT
Both instances of the capture logic require SQL Server Agent to be running for the process to execute.
The principal task of the capture process is to scan the log and write column data and transaction related
information to the change data capture change tables. To ensure a transactionally consistent boundary across all
the change data capture change tables that it populates, the capture process opens and commits its own
transaction on each scan cycle. It detects when tables are newly enabled for change data capture, and
automatically includes them in the set of tables that are actively monitored for change entries in the log. Similarly,
disabling change data capture will also be detected, causing the source table to be removed from the set of tables
actively monitored for change data. When processing for a section of the log is finished, the capture process
signals the server log truncation logic, which uses this information to identify log entries eligible for truncation.
NOTE
When a database is enabled for change data capture, even if the recovery mode is set to simple recovery the log truncation
point will not advance until all the changes that are marked for capture have been gathered by the capture process. If the
capture process is not running and there are changes to be gathered, executing CHECKPOINT will not truncate the log.
The capture process is also used to maintain history on the DDL changes to tracked tables. The DDL statements
that are associated with change data capture make entries to the database transaction log whenever a change data
capture-enabled database or table is dropped or columns of a change data capture-enabled table are added,
modified, or dropped. These log entries are processed by the capture process, which then posts the associated DDL
events to the cdc.ddl_history table. You can obtain information about DDL events that affect tracked tables by
using the stored procedure sys.sp_cdc_get_ddl_history.
Change Data Capture Agent Jobs
Two SQL Server Agent jobs are typically associated with a change data capture enabled database: one that is used
to populate the database change tables, and one that is responsible for change table cleanup. Both jobs consist of a
single step that runs a Transact-SQL command. The Transact-SQL command that is invoked is a change data
capture defined stored procedure that implements the logic of the job. The jobs are created when the first table of
the database is enabled for change data capture. The Cleanup Job is always created. The capture job will only be
created if there are no defined transactional publications for the database. The capture job is also created when
both change data capture and transactional replication are enabled for a database, and the transactional logreader
job is removed because the database no longer has defined publications.
Both the capture and cleanup jobs are created by using default parameters. The capture job is started immediately.
It runs continuously, processing a maximum of 1000 transactions per scan cycle with a wait of 5 seconds between
cycles. The cleanup job runs daily at 2 A.M. It retains change table entries for 4320 minutes or 3 days, removing a
maximum of 5000 entries with a single delete statement.
The change data capture agent jobs are removed when change data capture is disabled for a database. The capture
job can also be removed when the first publication is added to a database, and both change data capture and
transactional replication are enabled.
Internally, change data capture agent jobs are created and dropped by using the stored procedures
sys.sp_cdc_add_job and sys.sp_cdc_drop_job, respectively. These stored procedures are also exposed so that
administrators can control the creation and removal of these jobs.
An administrator has no explicit control over the default configuration of the change data capture agent jobs. The
stored procedure sys.sp_cdc_change_job is provided to allow the default configuration parameters to be modified.
In addition, the stored procedure sys.sp_cdc_help_jobs allows current configuration parameters to be viewed. Both
the capture job and the cleanup job extract configuration parameters from the table msdb.dbo.cdc_jobs on startup.
Any changes made to these values by using sys.sp_cdc_change_job will not take effect until the job is stopped and
restarted.
Two additional stored procedures are provided to allow the change data capture agent jobs to be started and
stopped: sys.sp_cdc_start_job and sys.sp_cdc_stop_job.
NOTE
Starting and stopping the capture job does not result in a loss of change data. It only prevents the capture process from
actively scanning the log for change entries to deposit in the change tables. A reasonable strategy to prevent log scanning
from adding load during periods of peak demand is to stop the capture job and restart it when demand is reduced.
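The following sketch shows these administrative procedures in use; the parameter values are illustrative rather than
recommendations:
USE MyDB
GO
-- View the current configuration of the capture and cleanup jobs.
EXEC sys.sp_cdc_help_jobs
GO
-- Stop the capture job during a period of peak demand.
EXEC sys.sp_cdc_stop_job @job_type = N'capture'
GO
-- Adjust the capture job configuration. The change takes effect the next time the job starts.
EXEC sys.sp_cdc_change_job
    @job_type = N'capture',
    @maxtrans = 1000,
    @maxscans = 10,
    @continuous = 1,
    @pollinginterval = 5
GO
-- Restart the capture job when demand is reduced.
EXEC sys.sp_cdc_start_job @job_type = N'capture'
GO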
Both SQL Server Agent jobs were designed to be flexible enough and sufficiently configurable to meet the basic
needs of change data capture environments. In both cases, however, the underlying stored procedures that
provide the core functionality have been exposed so that further customization is possible.
Change data capture cannot function properly when the Database Engine service or the SQL Server Agent service
is running under the NETWORK SERVICE account. This can result in error 22832.
See Also
Track Data Changes (SQL Server)
Enable and Disable Change Data Capture (SQL Server)
Work with Change Data (SQL Server)
Administer and Monitor Change Data Capture (SQL Server)
Enable and Disable Change Data Capture (SQL Server)
This topic describes how to enable and disable change data capture for a database and a table.
Enable Change Data Capture for a Database
Before a capture instance can be created for individual tables, a member of the sysadmin fixed server role must
first enable the database for change data capture. This is done by running the stored procedure
sys.sp_cdc_enable_db (Transact-SQL) in the database context. To determine if a database is already enabled, query
the is_cdc_enabled column in the sys.databases catalog view.
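For example, the following query reports whether the MyDB database used in the templates in this topic is already
enabled:
-- Returns 1 in is_cdc_enabled when the database is enabled for change data capture.
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = N'MyDB'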
When a database is enabled for change data capture, the cdc schema, cdc user, metadata tables, and other system
objects are created for the database. The cdc schema contains the change data capture metadata tables and, after
source tables are enabled for change data capture, the individual change tables serve as a repository for change
data. The cdc schema also contains associated system functions used to query for change data.
Change data capture requires exclusive use of the cdc schema and cdc user. If either a schema or a database user
named cdc currently exists in a database, the database cannot be enabled for change data capture until the schema
and/or user is dropped or renamed.
See the Enable Database for Change Data Capture template for an example of enabling a database.
IMPORTANT
To locate the templates in SQL Server Management Studio, go to View, click Template Explorer, and then select SQL
Server Templates. Change Data Capture is a sub-folder. Under this folder, you will find all the templates referenced in this
topic. There is also a Template Explorer icon on the SQL Server Management Studio toolbar.
-- ====
-- Enable Database for CDC template
-- ====
USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO
Disable Change Data Capture for a Database
A member of the sysadmin fixed server role can run the stored procedure sys.sp_cdc_disable_db (Transact-SQL)
in the database context to disable change data capture for a database. It is not necessary to disable individual
tables before you disable the database. Disabling the database removes all associated change data capture
metadata, including the cdc user and schema and the change data capture jobs. However, any gating roles created
by change data capture will not be removed automatically and must be explicitly deleted. To determine if a
database is enabled, query the is_cdc_enabled column in the sys.databases catalog view.
If a change data capture enabled database is dropped, change data capture jobs are automatically removed.
See the Disable Database for Change Data Capture template for an example of disabling a database.
IMPORTANT
To locate the templates in SQL Server Management Studio, go to View, click Template Explorer, and then click SQL
Server Templates. Change Data Capture is a sub-folder where you will find all the templates that are referenced in this
topic. There is also a Template Explorer icon on the SQL Server Management Studio toolbar.
-- =======
-- Disable Database for Change Data Capture template
-- =======
USE MyDB
GO
EXEC sys.sp_cdc_disable_db
GO
Enable Change Data Capture for a Table
After a database has been enabled for change data capture, members of the db_owner fixed database role can
create a capture instance for individual source tables by using the stored procedure sys.sp_cdc_enable_table. To
determine whether a source table has already been enabled for change data capture, examine the
is_tracked_by_cdc column in the sys.tables catalog view.
The following options can be specified when creating a capture instance:
Columns in the source table to be captured.
By default, all of the columns in the source table are identified as captured columns. If only a subset of columns
need to be tracked, such as for privacy or performance reasons, use the @captured_column_list parameter to
specify the subset of columns.
A filegroup to contain the change table.
By default, the change table is located in the default filegroup of the database. Database owners who want to
control the placement of individual change tables can use the @filegroup_name parameter to specify a particular
filegroup for the change table associated with the capture instance. The named filegroup must already exist.
Generally, it is recommended that change tables be placed in a filegroup separate from source tables. See the
Enable a Table Specifying Filegroup Option template for an example showing use of the @filegroup_name
parameter.
-- =========
-- Enable a Table Specifying Filegroup Option Template
-- =========
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @role_name = N'MyRole',
    @filegroup_name = N'MyDB_CT',
    @supports_net_changes = 1
GO
A role for controlling access to a change table.
The purpose of the named role is to control access to the change data. The specified role can be an existing fixed
server role or a database role. If the specified role does not already exist, a database role of that name is created
automatically. Members of either the sysadmin or db_owner role have full access to the data in the change
tables. All other users must have SELECT permission on all the captured columns of the source table. In addition,
when a role is specified, users who are not members of either the sysadmin or db_owner role must also be
members of the specified role.
If you do not want to use a gating role, explicitly set the @role_name parameter to NULL. See the Enable a Table
Without Using a Gating Role template for an example of enabling a table without a gating role.
-- =========
-- Enable a Table Without Using a Gating Role template
-- =========
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @role_name = NULL,
    @supports_net_changes = 1
GO
A function to query for net changes.
A capture instance will always include a table valued function for returning all change table entries that occurred
within a defined interval. This function is named by appending the capture instance name to
"cdc.fn_cdc_get_all_changes_". For more information, see cdc.fn_cdc_get_all_changes_<capture_instance>
(Transact-SQL).
If the parameter @supports_net_changes is set to 1, a net changes function is also generated for the capture
instance. This function returns only one change for each distinct row changed in the interval specified in the call.
For more information, see cdc.fn_cdc_get_net_changes_<capture_instance> (Transact-SQL).
To support net changes queries, the source table must have a primary key or unique index to uniquely identify
rows. If a unique index is used, the name of the index must be specified using the @index_name parameter. The
columns defined in the primary key or unique index must be included in the list of source columns to be captured.
See the Enable a Table for All and Net Changes Queries template for an example demonstrating the creation
of a capture instance with both query functions.
-- =============
-- Enable a Table for All and Net Changes Queries template
-- =============
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyTable',
    @role_name = N'MyRole',
    @supports_net_changes = 1
GO
NOTE
If change data capture is enabled on a table with an existing primary key, and the @index_name parameter is not used to
identify an alternative unique index, the change data capture feature will use the primary key. Subsequent changes to the
primary key will not be allowed without first disabling change data capture for the table. This is true regardless of whether
support for net changes queries was requested when change data capture was configured. If there is no primary key on a
table at the time it is enabled for change data capture, the subsequent addition of a primary key is ignored by change data
capture. Because change data capture will not use a primary key that is created after the table was enabled, the key and key
columns can be removed without restrictions.
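If a source table has no primary key but does have a unique index, a sketch such as the following (the table and
index names are hypothetical) passes the index through the @index_name parameter so that net changes queries
can still be supported:
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'MyKeylessTable',
    @role_name = NULL,
    @index_name = N'UX_MyKeylessTable_BusinessKey',
    @supports_net_changes = 1
GO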
Disable Change Data Capture for a Table
Members of the db_owner fixed database role can remove a capture instance for individual source tables by using
the stored procedure sys.sp_cdc_disable_table. To determine whether a source table is currently enabled for
change data capture, examine the is_tracked_by_cdc column in the sys.tables catalog view. If there are no tables
enabled for the database after the disabling takes place, the change data capture jobs are also removed.
If a change data capture-enabled table is dropped, change data capture metadata that is associated with the table
is automatically removed.
See the Disable a Capture Instance for a Table template for an example of disabling a table.
-- =====
-- Disable a Capture Instance for a Table template
-- =====
USE MyDB
GO
EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@capture_instance = N'dbo_MyTable'
GO
See Also
Track Data Changes (SQL Server)
About Change Data Capture (SQL Server)
Work with Change Data (SQL Server)
Administer and Monitor Change Data Capture (SQL Server)
Administer and Monitor Change Data Capture (SQL Server)
This topic describes how to administer and monitor change data capture.
Capture Job
The capture job is initiated by running the parameterless stored procedure sp_MScdc_capture_job. This stored
procedure starts by extracting the configured values for maxtrans, maxscans, continuous, and pollinginterval for
the capture job from msdb.dbo.cdc_jobs. These configured values are then passed as parameters to the stored
procedure sp_cdc_scan, which in turn invokes sp_replcmds to perform the log scan.
Capture Job Parameters
To understand capture job behavior, you must understand how the configurable parameters are used by
sp_cdc_scan.
maxtrans Parameter
The maxtrans parameter specifies the maximum number of transactions that can be processed in a single scan
cycle of the log. If, during the scan, the number of transactions to be processed reaches this limit, no additional
transactions are included in the current scan. After a scan cycle is complete, the number of transactions that were
processed will always be less than or equal to maxtrans.
maxscans Parameter
The maxscans parameter specifies the maximum number of scan cycles that are attempted to drain the log before
either returning (continuous = 0) or executing a waitfor (continuous = 1).
continuous Parameter
The continuous parameter controls whether sp_cdc_scan relinquishes control after either draining the log or
executing the maximum number of scan cycles (one shot mode), or whether sp_cdc_scan continues to run until
explicitly stopped (continuous mode).
One Shot Mode
In one shot mode, the capture job requests that sp_cdc_scan perform up to maxscans scan cycles, each processing
up to maxtrans transactions, to try to drain the log and return. Any transactions that remain in the log are
processed in later scans.
One shot mode is used in controlled tests, where the volume of transactions to be processed is known, and there
are advantages to the fact that the job closes automatically when it is finished. One shot mode is not
recommended for production use, because it relies on the job schedule to manage how frequently the scan
cycle is run.
When running in one shot mode, you can compute an upper bound on expected throughput of the capture job,
expressed in transactions per second by using the following computation:
(maxtrans * maxscans) / number of seconds between scans
Even if the time that is required to scan the log and populate the change tables were not significantly different
from 0, the average throughput of the job could not exceed the value obtained by dividing the maximum allowed
transactions for a single scan multiplied by the maximum allowed scans by the number of seconds separating log
processing.
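For example, with maxtrans = 1000, maxscans = 10, and a job schedule that starts a scan every 60 seconds
(illustrative values, not recommendations), the capture job could not average more than (1000 * 10) / 60, or
roughly 167 transactions per second, no matter how quickly each individual scan completes.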
If one shot mode were to be used to regulate log scanning, the number of seconds between log processing would
have to be governed by the job schedule. When this kind of behavior is desired, running the capture job in
continuous mode is a better way to manage rescheduling the log scan.
Continuous Mode and the Polling Interval
In continuous mode, the capture job requests that sp_cdc_scan run continuously. This lets the stored procedure
manage its own wait loop by providing not only for maxtrans and maxscans but also a value for the number of
seconds between log processing (the polling interval). Running in this mode, the capture job remains active,
executing a WAITFOR between log scans.
NOTE
When the value of the polling interval is greater than 0, the same upper limit on throughput for the recurring one shot job
also applies to the job operation in continuous mode. That is, (maxtrans * maxscans) divided by a nonzero polling interval
will put an upper bound on the average number of transactions that can be processed by the capture job.
Capture Job Customization
For the capture job, you can apply additional logic to determine whether a new scan begins immediately or
whether a sleep is imposed before it starts a new scan, instead of relying on a fixed polling interval. The choice could
be based merely on the time of day, perhaps enforcing very long sleeps during peak activity times, and even
moving to a polling interval of 0 at the close of day when it is important to complete the day's processing and prepare
for nightly runs. Capture process progress could also be monitored to determine when all transactions committed
by midnight had been scanned and deposited in change tables. This lets the capture job end, to be restarted by a
scheduled daily restart. By replacing the delivered job step that calls sp_cdc_scan with a call to a user-written
wrapper for sp_cdc_scan, highly customized behavior can be obtained with little additional effort.
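The following is a minimal sketch of such a wrapper, assuming the goal is simply to avoid log scanning during a
hypothetical peak window. For simplicity it invokes the parameterless procedure sp_MScdc_capture_job described
earlier in this topic rather than calling sp_cdc_scan directly; the procedure name and schedule boundaries are
illustrative:
USE MyDB
GO
CREATE PROCEDURE dbo.usp_cdc_capture_wrapper
AS
BEGIN
    -- Sleep in five-minute increments while inside the peak window (08:00 to 17:59).
    WHILE DATEPART(HOUR, GETDATE()) BETWEEN 8 AND 17
        WAITFOR DELAY '00:05:00'

    -- Outside the peak window, run the delivered capture job logic,
    -- which invokes sp_cdc_scan with the configured parameters.
    EXEC sys.sp_MScdc_capture_job
END
GO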
Cleanup Job
This section provides information about how the change data capture cleanup job works.
Structure of the Cleanup Job
Change data capture uses a retention based cleanup strategy to manage change table size. The cleanup
mechanism consists of a SQL Server Agent Transact-SQL job that is created when the first database table is
enabled. A single cleanup job handles cleanup for all database change tables and applies the same retention value
to all defined capture instances.
The cleanup job is initiated by running the parameterless stored procedure sp_MScdc_cleanup_job. This stored
procedure starts by extracting the configured retention and threshold values for the cleanup job from
msdb.dbo.cdc_jobs. The retention value is used to compute a new low watermark for the change tables. The
specified number of minutes is subtracted from the maximum tran_end_time value from the
cdc.lsn_time_mapping table to obtain the new low water mark expressed as a datetime value. The
cdc.lsn_time_mapping table is then used to convert this datetime value to a corresponding lsn value. If the same
commit time is shared by multiple entries in the table, the lsn that corresponds to the entry that has the smallest
lsn is chosen as the new low watermark. This lsn value is passed to sp_cdc_cleanup_change_tables to remove
change table entries from the database change tables.
NOTE
The advantage of using the commit time of the most recently processed transaction as the base for computing the new low watermark is that
it lets the changes remain in change tables for the specified time. This happens even when the capture process is running
behind. All entries that have the same commit time as the current low watermark continue to be represented within the
change tables by choosing the smallest lsn that has the shared commit time for the actual low watermark.
When a cleanup is performed, the low watermark for all capture instances is initially updated in a single
transaction. It then tries to remove obsolete entries from the change tables and the cdc.lsn_time_mapping table.
The configurable threshold value limits how many entries are deleted in any single statement. Failure to perform
the delete on any individual table will not prevent the operation from being attempted on the remaining tables.
Cleanup Job Customization
For the cleanup job, the possibility for customization is in the strategy used to determine which change table
entries are to be discarded. The only supported strategy in the delivered cleanup job is a time-based one. In that
situation, the new low watermark is computed by subtracting the allowed retention period from the commit time
of the last transaction processed. Because the underlying cleanup procedures are based on lsn instead of time, any
number of strategies can be used to determine the smallest lsn to keep in the change tables. Only some of these
are strictly time-based. Knowledge about the clients, for example, could be used to provide a failsafe if
downstream processes that require access to the change tables cannot run. Also, although the default strategy
applies the same lsn to clean up all of the database's change tables, the underlying cleanup procedure can also be
called to clean up at the capture instance level.
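As an illustration, the following sketch cleans up a single capture instance by mapping a custom retention point to
an lsn and passing it to the underlying cleanup procedure sys.sp_cdc_cleanup_change_table (the capture instance
name, one-day retention window, and threshold are illustrative):
USE MyDB
GO
DECLARE @cutoff datetime = DATEADD(DAY, -1, GETDATE())
DECLARE @new_low_water_mark binary(10)
-- Map the cutoff time to the largest lsn whose commit time is at or before it.
SET @new_low_water_mark = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @cutoff)
-- Remove change table entries for this capture instance only,
-- deleting at most 5000 rows per statement.
EXEC sys.sp_cdc_cleanup_change_table
    @capture_instance = N'dbo_MyTable',
    @low_water_mark = @new_low_water_mark,
    @threshold = 5000
GO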
Monitor the Change Data Capture Process
Monitoring the change data capture process lets you determine if changes are being written correctly and with a
reasonable latency to the change tables. Monitoring can also help you to identify any errors that might occur. SQL
Server includes two dynamic management views to help you monitor change data capture:
sys.dm_cdc_log_scan_sessions and sys.dm_cdc_errors.
Identify Sessions with Empty Result Sets
Every row in sys.dm_cdc_log_scan_sessions represents a log scan session (except the row with an ID of 0). A log
scan session is equivalent to one execution of sp_cdc_scan. During a session, the scan can either return changes or
return an empty result. If the result set is empty, the empty_scan_count column in sys.dm_cdc_log_scan_sessions is
set to 1. If there are consecutive empty result sets, such as if the capture job is running continuously, the
empty_scan_count in the last existing row is incremented. For example, if sys.dm_cdc_log_scan_sessions already
contains 10 rows for scans that returned changes and there are five empty results in a row, the view contains 11
rows. The last row has a value of 5 in the empty_scan_count column. To determine sessions that had an empty
scan, run the following query:
SELECT * from sys.dm_cdc_log_scan_sessions where empty_scan_count <> 0
Determine Latency
The sys.dm_cdc_log_scan_sessions management view includes a column that records the latency for each capture
session. Latency is defined as the elapsed time between a transaction being committed on a source table and the
last captured transaction being committed on the change table. The latency column is populated only for active
sessions. For sessions with a value greater than 0 in the empty_scan_count column, the latency column is set to 0.
The following query returns the average latency for the most recent sessions:
SELECT latency FROM sys.dm_cdc_log_scan_sessions WHERE session_id = 0
You can use latency data to determine how fast or slow the capture process is processing transactions. This data is
most useful when the capture process is running continuously. If the capture process is running on a schedule,
latency can be high because of the lag between transactions being committed on the source table and the capture
process running at its scheduled time.
Another important measure of capture process efficiency is throughput. This is the average number of commands
per second that are processed during each session. To determine the throughput of a session, divide the value in
the command_count column by the value in the duration column. The following query returns the average
throughput for the most recent sessions:
SELECT command_count/duration AS [Throughput] FROM sys.dm_cdc_log_scan_sessions WHERE session_id = 0
Use Data Collector to Collect Sampling Data
The SQL Server data collector lets you collect snapshots of data from any table or dynamic management view and
build a performance data warehouse. When change data capture is enabled on a database, it is useful to take
snapshots of the sys.dm_cdc_log_scan_sessions view and the sys.dm_cdc_errors view at regular intervals for later
analysis. The following procedure sets up a data collector for collecting sample data from the
sys.dm_cdc_log_scan_sessions management view.
Configuring Data Collection
1. Enable data collector and configure a management data warehouse. For more information, see Manage
Data Collection.
2. Execute the following code to create a custom collector for change data capture.
USE msdb;
DECLARE @schedule_uid uniqueidentifier;
-- Collect and upload data every 5 minutes
SELECT @schedule_uid = (
    SELECT schedule_uid FROM sysschedules_localserver_view
    WHERE name = N'CollectorSchedule_Every_5min');
DECLARE @collection_set_id int;
EXEC dbo.sp_syscollector_create_collection_set
    @name = N'CDC Performance Data Collector',
    @schedule_uid = @schedule_uid,
    @collection_mode = 0,
    @days_until_expiration = 30,
    @description = N'This collection set collects CDC metadata',
    @collection_set_id = @collection_set_id OUTPUT;
-- Create a collection item using statistics from
-- the change data capture dynamic management view.
DECLARE @parameters xml;
DECLARE @collection_item_id int;
SELECT @parameters = CONVERT(xml,
    N'<TSQLQueryCollector>
        <Query>
          <Value>SELECT * FROM sys.dm_cdc_log_scan_sessions</Value>
          <OutputTable>cdc_log_scan_data</OutputTable>
        </Query>
      </TSQLQueryCollector>');
EXEC dbo.sp_syscollector_create_collection_item
    @collection_set_id = @collection_set_id,
    @collector_type_uid = N'302E93D1-3424-4BE7-AA8E-84813ECF2419',
    @name = N'CDC Performance Data Collector',
    @frequency = 5,
    @parameters = @parameters,
    @collection_item_id = @collection_item_id OUTPUT;
GO
3. In SQL Server Management Studio, expand Management, and then expand Data Collection. Right click
CDC Performance Data Collector, and then click Start Data Collection Set.
4. In the data warehouse you configured in step 1, locate the table custom_snapshots.cdc_log_scan_data. This
table provides a historical snapshot of data from log scan sessions. This data can be used to analyze latency,
throughput, and other performance measures over time.
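After the collector has uploaded a few snapshots, a query such as the following sketch can be run in the
management data warehouse to examine how latency and throughput trend over time (the columns shown come
from sys.dm_cdc_log_scan_sessions; any metadata columns added by the collector may differ):
SELECT session_id, start_time, end_time, duration, latency, command_count
FROM custom_snapshots.cdc_log_scan_data
ORDER BY start_time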
See Also
Track Data Changes (SQL Server)
About Change Data Capture (SQL Server)
Enable and Disable Change Data Capture (SQL Server)
Work with Change Data (SQL Server)
Work with Change Data (SQL Server)
Change data is made available to change data capture consumers through table-valued functions (TVFs). All
queries of these functions require two parameters to define the range of Log Sequence Numbers (LSNs) that are
eligible for consideration when developing the returned result set. Both the upper and lower LSN values that
bound the interval are considered to be included within the interval.
Several functions are provided to help determine appropriate LSN values for use in querying a TVF. The function
sys.fn_cdc_get_min_lsn returns the smallest LSN that is associated with a capture instance validity interval. The
validity interval is the time interval for which change data is currently available for its capture instances. The
function sys.fn_cdc_get_max_lsn returns the largest LSN in the validity interval. The functions
sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time are available to help place LSN values on a
conventional timeline. Because change data capture uses closed query intervals, it is sometimes necessary to
generate the next LSN value in a sequence to ensure that changes are not duplicated in consecutive query
windows. The functions sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn are useful when an incremental
adjustment to an LSN value is required.
Validating LSN Boundaries
We recommend validating the LSN boundaries that are to be used in a TVF query before their use. Null endpoints
or endpoints that lie outside the validity interval for a capture instance will force an error to be returned by a
change data capture TVF.
For example, the following error is returned for an all changes query when a parameter that defines the
query interval is not valid or out of range, or when the row filter option is invalid.
Msg 313, Level 16, State 3, Line 1
An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_all_changes_
...
The corresponding error returned for a net changes query is the following:
Msg 313, Level 16, State 3, Line 1
An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_net_changes_
...
NOTE
It is recognized that the message for Msg 313 is misleading and does not convey the actual cause of the failure. This
awkward usage stems from the inability to raise an explicit error from within a TVF. Nevertheless, the value of returning a
recognizable, if inaccurate, error was deemed preferable to simply returning an empty result. An empty result set would not
be distinguishable from a valid query returning no changes.
Authorization failures also result in errors when querying for all changes, as shown:
Msg 229, Level 14, State 5, Line 1
The SELECT permission was denied on the object 'fn_cdc_get_all_changes_...', database 'MyDB', schema 'cdc'.
The same is true when querying for net changes:
Msg 229, Level 14, State 5, Line 1
The SELECT permission was denied on the object 'fn_cdc_get_net_changes_...', database 'MyDB', schema 'cdc'.
See the template Enumerate Net Changes Using TRY CATCH for a demonstration of how to intercept these known
TVF errors and return more meaningful information about the failure.
NOTE
To locate change data capture templates in SQL Server Management Studio, on the View menu, click Template Explorer,
expand SQL Server Templates and then expand the Change Data Capture folder.
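The following is a minimal sketch of that error-handling approach, not the template itself; the capture instance name dbo_MyTable is an assumption, and the sketch simply reports the known error numbers instead of re-raising them.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

BEGIN TRY
    -- The TVF name depends on the capture instance; dbo_MyTable is an assumption.
    SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');
END TRY
BEGIN CATCH
    -- Msg 313 indicates an invalid LSN interval or row filter option;
    -- Msg 229 indicates a missing SELECT permission on the TVF.
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;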
Query Functions
Depending on the characteristics of the source table being tracked and the way in which its capture instance is
configured, either one or two TVFs for querying change data are generated.
The function cdc.fn_cdc_get_all_changes_ returns all changes that occurred for the specified interval. This
function is always generated. Entries are always returned sorted, first by the transaction commit LSN of the
change, and then by a value that sequences the change within its transaction. Depending on the row filter
option chosen, either the final row is returned on update (row filter option 'all') or both the new and old
values are returned on update (row filter option 'all update old').
The function cdc.fn_cdc_get_net_changes_ is generated when the parameter @supports_net_changes is set
to 1 when the source table is enabled.
NOTE
This option is only supported if the source table has a defined primary key or if the parameter @index_name has
been used to identify a unique index.
The net changes function returns one change per modified source table row. If more than one change is
logged for the row during the specified interval, the column values will reflect the final contents of the row.
To correctly identify the operation that is necessary to update the target environment, the TVF must
consider both the initial operation on the row during the interval and the final operation on the row. When
the row filter option 'all' is specified, the operations that are returned by a net changes query will either be
insert, delete, or update (new values). This option always returns the update mask as null because there is a
cost associated with computing an aggregate mask. If you require an aggregate mask that reflects all
changes to a row, use the 'all with mask' option. If downstream processing does not require inserts and
updates to be distinguished, use the 'all with merge' option. In this case, the operation value will only take
on two values: 1 for delete and 5 for an operation that could be either an insert or an update. This option
eliminates the additional processing needed to determine whether the derived operation should be an
insert or an update, and can improve the performance of the query when this differentiation is not
necessary.
The update mask that is returned from a query function is a compact representation that identifies all
columns that changed in a row of change data. Typically, this information is only required for a small
subset of the captured columns. Functions are available to assist in extracting information from the mask in
a form that is more directly usable by applications. The function sys.fn_cdc_get_column_ordinal returns the
ordinal position of a named column for a given capture instance, whereas the function sys.fn_cdc_is_bit_set
returns the parity of the bit in the provided mask based on the ordinal that was passed in the function call.
Together, these two functions allow information from the update mask to be efficiently extracted and
returned with the request for change data. See the template Enumerate Net Changes Using All With Mask
for a demonstration of how these functions are used.
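As a rough sketch of how these functions fit together (the capture instance dbo_MyTable and the column name ListPrice are assumptions), the ordinal of a column can be resolved once and then tested against the __$update_mask value of each change row:
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

-- Resolve the ordinal of the column of interest for this capture instance.
DECLARE @listprice_ordinal int = sys.fn_cdc_get_column_ordinal(N'dbo_MyTable', N'ListPrice');

SELECT
    __$start_lsn,
    __$operation,
    -- 1 if the ListPrice column changed in this row of change data; otherwise 0.
    sys.fn_cdc_is_bit_set(@listprice_ordinal, __$update_mask) AS listprice_changed
FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');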
Query Function Scenarios
The following sections describe common scenarios for querying change data capture data by using the query
functions cdc.fn_cdc_get_all_changes_ and cdc.fn_cdc_get_net_changes_.
Querying for All Changes Within the Capture Instance Validity Interval
The most straightforward request for change data is one that returns all of the current change data in a capture
instance’s validity interval. To make this request, first determine the lower and upper LSN boundaries of the
validity interval. Then, use these values to identify the parameters @from_lsn and @to_lsn passed to the query
function cdc.fn_cdc_get_all_changes_ or cdc.fn_cdc_get_net_changes_. Use the function sys.fn_cdc_get_min_lsn to
obtain the lower bound, and sys.fn_cdc_get_max_lsn to obtain the upper bound. See the template Enumerate All
Changes for the Valid Range for sample code to query for all current valid changes by using the query function
cdc.fn_cdc_get_all_changes_. See the template Enumerate Net Changes for the Valid Range for a similar example
of using the function cdc.fn_cdc_get_net_changes_.
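As a minimal sketch of this pattern (the capture instance name dbo_MyTable is an assumption, and the net changes call applies only if the capture instance was created with @supports_net_changes = 1):
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

-- All changes in the current validity interval.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

-- Net changes over the same interval.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');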
Querying for All New Changes Since the Last Set of Changes
For typical applications, querying for change data will be an ongoing process, making periodic requests for all of
the changes that occurred since the last request. For such queries, you can use the function
sys.fn_cdc_increment_lsn to derive the lower bound of the current query from the upper bound of the previous
query. This method ensures that no rows are repeated because the query interval is always treated as a closed
interval where both end-points are included in the interval. Then, use the function sys.fn_cdc_get_max_lsn to
obtain the high end-point for the new request interval. See the template Enumerate All Changes Since Previous
Request for sample code to systematically move the query window to obtain all changes since the last request.
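A hedged sketch of this sliding-window pattern follows; dbo_MyTable is an assumption, and the application is expected to persist the upper bound between runs.
-- @saved_to_lsn represents the upper bound persisted from the previous request.
-- For illustration it is seeded from the low end of the validity interval.
DECLARE @saved_to_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');

DECLARE @from_lsn binary(10) = sys.fn_cdc_increment_lsn(@saved_to_lsn);
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

-- If no new changes exist, @from_lsn can exceed @to_lsn; a production caller
-- should check for that condition before querying.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');

-- Persist @to_lsn; it becomes @saved_to_lsn for the next request.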
Querying for All New Changes Up Until Now
A typical constraint that is placed on the changes returned by a query function is to include only the changes that
occurred between the previous request until the current date and time. For this query, apply the function
sys.fn_cdc_increment_lsn to the @from_lsn value that was used in the previous request to determine the lower
bound. Because the upper bound on the time interval is expressed as a specific point in time, it must be converted
to an LSN value before it can be used by a query function. Before the datetime value can be converted to a
corresponding LSN value, you must ensure that the capture process has processed all changes that were
committed through the specified upper bound. This is required to ensure that all the qualifying changes have been
propagated to the change table. One way to do this is to structure a wait loop that periodically checks to see if the
current maximum commit lsn recorded for any database change table exceeds the desired end time of the request
interval.
After the delay loop verifies that the capture process has already processed all the relevant log entries, use the
function sys.fn_cdc_map_time_to_lsn to determine the new high end-point expressed as an LSN value. To ensure
that all entries that were committed through the specified time are retrieved, call the function
sys.fn_cdc_map_time_to_lsn, and use the option 'largest less than or equal'.
NOTE
In periods of inactivity, a dummy entry is added to the table cdc.lsn_time_mapping to mark the fact that the capture process
has processed the changes up to a given commit time. This prevents it from appearing that the capture process has fallen
behind when there are simply no recent changes to process.
The template Enumerate All Changes Up Until Now demonstrates how to use the previous strategy to query for
change data.
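A rough sketch of this wait-and-map approach is shown below; the capture instance dbo_MyTable and the five-second polling interval are assumptions, and a production caller would also add a timeout.
DECLARE @end_time datetime = GETDATE();
DECLARE @max_scan_time datetime = sys.fn_cdc_map_lsn_to_time(sys.fn_cdc_get_max_lsn());

-- Wait until the capture process has processed all changes committed through @end_time.
WHILE @max_scan_time IS NULL OR @max_scan_time < @end_time
BEGIN
    WAITFOR DELAY '00:00:05';
    SET @max_scan_time = sys.fn_cdc_map_lsn_to_time(sys.fn_cdc_get_max_lsn());
END;

DECLARE @to_lsn binary(10) = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');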
Adding a Commit Time to an All Changes Result Set
The commit time of each transaction with an associated entry in a database change table is available in the table
cdc.lsn_time_mapping. By joining the __$start_lsn value returned in a request for all changes with the start_lsn
value of a cdc.lsn_time_mapping table entry, you can return the tran_end_time along with the change data to
stamp the change with the commit time of the transaction at the source. The template Append Commit Time to All
Changes Result Set demonstrates how to perform this join.
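A minimal sketch of that join (the capture instance name dbo_MyTable is an assumption):
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

-- Stamp each change row with the commit time of its originating transaction.
SELECT map.tran_end_time AS commit_time, chg.*
FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all') AS chg
JOIN cdc.lsn_time_mapping AS map
    ON chg.__$start_lsn = map.start_lsn;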
Joining Change Data with Other Data from the Same Transaction
Occasionally, it is useful to join change data with other information gathered about the transaction when it
committed at the source. The tran_begin_lsn column in the table cdc.lsn_time_mapping provides the information
needed to perform such a join. When the update of the source occurs, the value for
database_transaction_begin_lsn from the system dynamic view sys.dm_tran_database_transactions must be saved
along with any other information to be joined with the change data. Use the function
fn_convertnumericlsntobinary to compare the database_transaction_begin_lsn and tran_begin_lsn values. The
code to create this function is available in the template Create Function fn_convertnumericlsntobinary. The
template Return All Changes with a Given tran_begin_lsn demonstrates how to effect the join.
Querying Using Datetime Wrapper Functions
A typical application scenario for querying for change data is to periodically request change data by using a
sliding window bounded by datetime values. For this class of consumers, change data capture provides the stored
procedure sys.sp_cdc_generate_wrapper_function that generates scripts to create custom wrapper functions for
the change data capture query functions. These custom wrappers allow the query interval to be expressed as a
datetime pair.
Calling options for the stored procedure allow for wrappers to be generated for all capture instances that the
caller has access to, or only a specified capture instance. Supported options also include the ability to specify
whether the high end-point of the capture interval should be open or closed, which of the available captured
columns should be included in the result set and which of the included columns should have associated update
flags. The procedure returns a result set with two columns: the generated function name, which is derivable from
the capture instance name, and the create statement for the wrapper stored procedure. The function to wrap the
all changes query is always generated. If the @supports_net_changes parameter was set when the capture
instance was created, the function to wrap the net changes function is also generated.
It is the responsibility of the application designer to call the script generation stored procedure to generate the
create statements for the wrapper stored procedures, and to execute the resulting create scripts to create the
functions. This does not occur automatically when a capture instance is created.
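The following is a rough sketch of that workflow; the capture instance name dbo_MyTable, the @capture_instance parameter usage, and the result set column names (function_name, create_script) are assumptions based on the description above and on sys.sp_cdc_generate_wrapper_function (Transact-SQL).
-- Capture the generated scripts; the column names are assumed from the documented result set.
DECLARE @wrappers TABLE (function_name sysname, create_script nvarchar(max));

INSERT INTO @wrappers (function_name, create_script)
EXEC sys.sp_cdc_generate_wrapper_function @capture_instance = N'dbo_MyTable';

-- Execute each generated script to create the wrapper functions.
DECLARE @script nvarchar(max);
DECLARE wrapper_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT create_script FROM @wrappers;
OPEN wrapper_cursor;
FETCH NEXT FROM wrapper_cursor INTO @script;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executesql @script;
    FETCH NEXT FROM wrapper_cursor INTO @script;
END;
CLOSE wrapper_cursor;
DEALLOCATE wrapper_cursor;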
Datetime wrappers are owned by the user and are created in the default schema of the caller. The generated
function is suitable without modification for most users. However, further customization can always be applied to
the generated script prior to creating the function.
The name of the function to wrap the all changes query is fn_all_changes_ followed by the capture instance name.
The prefix that is used for the net changes wrapper is fn_net_changes_. Both functions take three arguments, just
as their associated change data capture TVFs do. However, the query interval for the wrappers is bounded by two
datetime values rather than by two LSN values. The @row_filter_option parameter is the same for both sets of
functions.
The generated wrapper functions support the following convention for systematically walking the change data
capture timeline: It is expected that the @end_time parameter of the previous interval be used as the @start_time
parameter of the subsequent interval. The wrapper function takes care of mapping the datetime values to LSN
values and ensuring that no data is lost or repeated if this convention is followed.
The wrappers can be generated to support either a closed upper bound or an open upper bound on the specified
query window. That is, the caller can specify whether entries having a commit time equal to the upper bound of
the extraction interval are to be included within the interval. By default, the upper bound is included.
While the generated query TVFs fail if supplied a null value for either the @from_lsn value or the @to_lsn value,
the datetime wrapper functions use null to allow the datetime wrappers to return all current changes. That is, if
null is passed as the low end-point of the query window to the datetime wrapper, the low end point of the capture
instance validity interval is used in the underlying SELECT statement that is applied to the query TVF. Similarly, if
null is passed as the high end-point of the query window, the high end-point of the capture instance validity
interval is used when selecting from the query TVF.
The result set returned by a wrapper function includes all the requested columns followed by an operation
column, recoded as one or two characters to identify the operation that is associated with the row. If update flags
have been requested, they appear as bit columns after the operation code, in the order specified in the
@update_flag_list parameter. For information about the calling options for customizing the generated datetime
wrappers, see sys.sp_cdc_generate_wrapper_function (Transact-SQL).
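A hedged usage sketch follows; it assumes a wrapper named fn_all_changes_dbo_MyTable was created in the dbo schema from the generated script.
DECLARE @start_time datetime = DATEADD(hour, -1, GETDATE());
DECLARE @end_time datetime = GETDATE();

-- Changes committed during the last hour.
SELECT * FROM dbo.fn_all_changes_dbo_MyTable(@start_time, @end_time, N'all');

-- Passing NULL for both boundaries returns all changes currently in the
-- capture instance validity interval.
SELECT * FROM dbo.fn_all_changes_dbo_MyTable(NULL, NULL, N'all');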
The template Instantiate a Wrapper TVF With Update Flag shows how to customize a generated wrapper function
to append an update flag for a specified column to the result set returned by a net changes query. The template
Instantiate CDC Wrapper TVFs for a Schema shows how to instantiate the Datetime Wrappers for the Query TVFs
for all of the capture instances created for the source tables in a given database schema.
For an example that uses a datetime wrapper to query for change data, see the template Get Net Changes Using
Wrapper With Update Flags. This template demonstrates how to query for net changes with a wrapper function
when the wrapper is configured to return update flags. Note that the row filter option 'all with mask' is required
for the underlying query function to return a non-null update mask on update. Null values are passed for both the
lower and upper datetime interval boundaries to signal the function to use the low end point and the high end
point of the validity interval for the capture instance when performing the underlying LSN based query. The query
returns one row for each modification to a source row that occurred within the valid range for the capture
instance.
Using the Datetime Wrapper Functions to Transition Between Capture Instances
Change data capture supports up to two capture instances for a single tracked source table. The principal use of
this capability is to accommodate a transition between multiple capture instances when data definition language
(DDL) changes to the source table expand the set of available columns for tracking. When transitioning to a new
capture instance, one way to protect higher application levels from changes in the names of the underlying query
functions is to use a wrapper function to wrap the underlying call. Then, ensure that the name of the wrapper
function remains the same. When the switch is to occur, the old wrapper function can be dropped, and a new one
with the same name created that references the new query functions. By first modifying the generated script to
create a wrapper function of the same name, you can make the switch to a new capture instance without affecting
higher application layers.
See Also
Track Data Changes (SQL Server)
About Change Data Capture (SQL Server)
Enable and Disable Change Data Capture (SQL Server)
Administer and Monitor Change Data Capture (SQL Server)
About Change Tracking (SQL Server)
3/24/2017 • 4 min to read • Edit Online
THIS TOPIC APPLIES TO: SQL Server (starting with 2008) • Azure SQL Database • Azure SQL Data Warehouse • Parallel Data Warehouse
Change tracking is a lightweight solution that provides an efficient change tracking mechanism for applications.
Typically, to enable applications to query for changes to data in a database and access information that is related
to the changes, application developers had to implement custom change tracking mechanisms. Creating these
mechanisms usually involved a lot of work and frequently involved using a combination of triggers, timestamp
columns, new tables to store tracking information, and custom cleanup processes.
Different types of applications have different requirements for how much information they need about the
changes. Applications can use change tracking to answer the following questions about the changes that have
been made to a user table:
What rows have changed for a user table?
Only the fact that a row has changed is required, not how many times the row has changed or the
values of any intermediate changes.
The latest data can be obtained directly from the table that is being tracked.
Has a row changed?
The fact that a row has changed and information about the change must be available and recorded at
the time that the change was made in the same transaction.
NOTE
If an application requires information about all the changes that were made and the intermediate values of the changed
data, using change data capture, instead of change tracking, might be appropriate. For more information, see About Change
Data Capture (SQL Server).
One-Way and Two-Way Synchronization Applications
Applications that have to synchronize data with an instance of the SQL Server Database Engine must be able to
query for changes. Change tracking can be used as a foundation for both one-way and two-way synchronization
applications.
One -Way Synchronization Applications
One-way synchronization applications, such as a client or mid-tier caching application, can be built to use
change tracking. As shown in the following illustration, a caching application requires data to be stored in the
Database Engine and to be cached in other data stores. The application must be able to keep the cache up-to-date
with any changes that have been made to the database tables. There are no changes to pass back to the Database
Engine.
Two -Way Synchronization Applications
Two-way synchronization applications can also be built to use change tracking. In this scenario, the data in an
instance of the Database Engine is synchronized with one or more data stores. The data in those stores can be
updated and the changes must be synchronized back to the Database Engine.
A good example of a two-way synchronization application is an occasionally connected application. In this type of
application, a client application queries and updates a local store. When a connection is available between a client
and server, the application will synchronize with a server, and changed data flows in both directions.
The two-way synchronization applications must be able to detect conflicts. A conflict would occur if the same data
was changed in both data stores in the time between synchronizations. With the ability to detect conflicts, an
application can make sure that changes are not lost.
How Change Tracking Works
To configure change tracking, you can use DDL statements or SQL Server Management Studio. For more
information, see Enable and Disable Change Tracking (SQL Server). To track changes, change tracking must first be
enabled for the database and then enabled for the tables that you want to track within that database. The table
definition does not have to be changed in any way, and no triggers are created.
After change tracking is configured for a table, any DML statement that affects rows in the table will cause change
tracking information for each modified row to be recorded. To query for the rows that have changed and to obtain
information about the changes, you can use change tracking functions.
The values of the primary key column are the only information from the tracked table that is recorded with the
change information. These values identify the rows that have been changed. To obtain the latest data for those rows, an
application can use the primary key column values to join the source table with the tracked table.
Information about the change that was made to each row can also be obtained by using change tracking. For
example, you can determine the type of DML operation that caused the change (insert, update, or delete) and
which columns were changed as part of an update operation.
Change Tracking Cleanup
Change tracking information for all tables that are enabled for change tracking is stored in an in-memory
rowstore. The change tracking data associated with each change-tracked table is flushed on every checkpoint
from the in-memory rowstore to the corresponding on-disk internal table. During the checkpoint, the in-memory
rowstore is also purged after the rows are moved to the on-disk tables.
Each table that is enabled for change tracking has an internal on-disk table that is used by the change tracking
functions to determine the change version and the rows that have changed since a particular version. Every time
the auto cleanup thread wakes up, it scans all the user databases on the SQL Server instance to identify the
databases that have change tracking enabled. Based on the retention period setting of each database, the expired
records in its internal on-disk tables are purged.
A stored procedure was added in Service Packs for SQL Server 2014 and SQL Server 2016 to perform manual
cleanup of the internal change tracking tables. More information about the stored procedure is available in
KB173157.
See Also
Enable and Disable Change Tracking (SQL Server)
Work with Change Tracking (SQL Server)
Manage Change Tracking (SQL Server)
Track Data Changes (SQL Server)
Enable and Disable Change Tracking (SQL Server)
3/24/2017 • 3 min to read • Edit Online
THIS TOPIC APPLIES TO: SQL Server (starting with 2008) • Azure SQL Database • Azure SQL Data Warehouse • Parallel Data Warehouse
This topic describes how to enable and disable change tracking for a database and a table.
Enable Change Tracking for a Database
Before you can use change tracking, you must enable change tracking at the database level. The following example
shows how to enable change tracking by using ALTER DATABASE.
ALTER DATABASE AdventureWorks2012
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
You can also enable change tracking in SQL Server Management Studio by using the Database Properties
(ChangeTracking Page) dialog box.
You can specify the CHANGE_RETENTION and AUTO_CLEANUP options when you enable change tracking, and
you can change the values at any time after change tracking is enabled.
The change retention value specifies the time period for which change tracking information is kept. Change
tracking information that is older than this time period is removed periodically. When you are setting this value,
you should consider how often applications will synchronize with the tables in the database. The specified
retention period must be at least as long as the maximum time period between synchronizations. If an application
obtains changes at longer intervals, the results that are returned might be incorrect because some of the change
information has probably been removed. To avoid obtaining incorrect results, an application can use the
CHANGE_TRACKING_MIN_VALID_VERSION system function to determine whether the interval between
synchronizations has been too long.
You can use the AUTO_CLEANUP option to enable or disable the cleanup task that removes old change tracking
information. This can be useful when there is a temporary problem that prevents applications from synchronizing
and the process for removing change tracking information older than the retention period must be paused until
the problem is resolved.
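For example, the following statements show one way to adjust these options after change tracking is already enabled; the database name and the values shown are illustrative assumptions.
-- Lengthen the retention period.
ALTER DATABASE AdventureWorks2012
SET CHANGE_TRACKING (CHANGE_RETENTION = 5 DAYS);

-- Temporarily pause automatic cleanup, for example while a synchronization problem is investigated.
ALTER DATABASE AdventureWorks2012
SET CHANGE_TRACKING (AUTO_CLEANUP = OFF);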
For any database that uses change tracking, be aware of the following:
To use change tracking, the database compatibility level must be set to 90 or greater. If a database has a
compatibility level of less than 90, you can configure change tracking. However, the CHANGETABLE
function, which is used to obtain change tracking information, will return an error.
Using snapshot isolation is the easiest way for you to help ensure that all change tracking information is
consistent. For this reason, we strongly recommend that snapshot isolation be set to ON for the database.
For more information, see Work with Change Tracking (SQL Server).
Enable Change Tracking for a Table
Change tracking must be enabled for each table that you want tracked. When change tracking is enabled, change
tracking information is maintained for all rows in the table that are affected by a DML operation.
The following example shows how to enable change tracking for a table by using ALTER TABLE.
ALTER TABLE Person.Contact
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
You can also enable change tracking for a table in SQL Server Management Studio by using the Database
Properties (ChangeTracking Page) dialog box.
When the TRACK_COLUMNS_UPDATED option is set to ON, the SQL Server Database Engine stores extra
information about which columns were updated to the internal change tracking table. Column tracking can enable
an application to synchronize only those columns that were updated. This can improve efficiency and performance.
However, because maintaining column tracking information adds some extra storage overhead, this option is set
to OFF by default.
Disable Change Tracking for a Database or Table
Change tracking must first be disabled for all change-tracked tables before change tracking can be set to OFF for
the database. To determine the tables that have change tracking enabled for a database, use the
sys.change_tracking_tables catalog view.
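For example, a query similar to the following lists the change-tracked tables in the current database and whether column tracking is enabled for each:
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id) AS table_name,
       is_track_columns_updated_on
FROM sys.change_tracking_tables;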
When no tables in a database track changes, you can disable change tracking for the database. The following
example shows how to disable change tracking for a database by using ALTER DATABASE.
ALTER DATABASE AdventureWorks2012
SET CHANGE_TRACKING = OFF
The following example shows how to disable change tracking for a table by using ALTER TABLE.
ALTER TABLE Person.Contact
DISABLE CHANGE_TRACKING;
See Also
Database Properties (ChangeTracking Page)
ALTER DATABASE SET Options (Transact-SQL)
sys.change_tracking_databases (Transact-SQL)
sys.change_tracking_tables (Transact-SQL)
Track Data Changes (SQL Server)
About Change Tracking (SQL Server)
Work with Change Data (SQL Server)
Manage Change Tracking (SQL Server)
Manage Change Tracking (SQL Server)
3/24/2017 • 5 min to read • Edit Online
THIS TOPIC APPLIES TO: SQL Server (starting with 2008) • Azure SQL Database • Azure SQL Data Warehouse • Parallel Data Warehouse
This topic describes how to manage change tracking. It also describes how to configure security and determine the
effects on storage and performance when change tracking is used.
Managing Change Tracking
The following sections list catalog views, permissions, and settings that are relevant for managing change tracking.
Catalog Views
To determine which tables and databases have change tracking enabled, you can use the following catalog views:
sys.change_tracking_databases (Transact-SQL)
sys.change_tracking_tables (Transact-SQL)
Also, the sys.internal_tables catalog view lists the internal tables that are created when change tracking is
enabled for a user table.
Security
To access change tracking information by using the change tracking functions, the principal must have the
following permissions:
SELECT permission on at least the primary key columns of the change-tracked table that is being queried.
VIEW CHANGE TRACKING permission on the table for which changes are being obtained. The VIEW
CHANGE TRACKING permission is required for the following reasons:
Change tracking records include information about rows that have been deleted, specifically the
primary key values of the rows that have been deleted. A principal could have been granted SELECT
permission for a change tracked table after some sensitive data had been deleted. In this case, you
would not want that principal to be able to access that deleted information by using change tracking.
Change tracking information can store information about which columns have been changed by
update operations. A principal could be denied permission to a column that contains sensitive
information. However, because change tracking information is available, a principal can determine
that a column value has been updated, but the principal cannot determine the value of the column.
Understanding Change Tracking Overhead
When change tracking is enabled for a table, some administration operations are affected. The following table lists
the operations and the effects you should consider.
OPERATION: EFFECT WHEN CHANGE TRACKING IS ENABLED

DROP TABLE: All change tracking information for the dropped table is removed.

ALTER TABLE DROP CONSTRAINT: An attempt to drop the PRIMARY KEY constraint will fail. Change tracking must be
disabled before a PRIMARY KEY constraint can be dropped.

ALTER TABLE DROP COLUMN: If a column that is being dropped is part of the primary key, dropping the column is
not allowed, regardless of change tracking. If the column that is being dropped is not part of the primary key,
dropping the column succeeds. However, the effect on any application that is synchronizing this data should be
understood first. If column change tracking is enabled for the table, the dropped column might still be returned as
part of the change tracking information. It is the responsibility of the application to handle the dropped column.

ALTER TABLE ADD COLUMN: If a new column is added to the change-tracked table, the addition of the column is not
tracked. Only the updates and changes that are made to the new column are tracked.

ALTER TABLE ALTER COLUMN: Data type changes of non-primary key columns are not tracked.

ALTER TABLE SWITCH: Switching a partition fails if one or both of the tables has change tracking enabled.

DROP INDEX, or ALTER INDEX DISABLE: The index that enforces the primary key cannot be dropped or disabled.

TRUNCATE TABLE: Truncating a table can be performed on a table that has change tracking enabled. However, the
rows that are deleted by the operation are not tracked, and the minimum valid version is updated. When an
application checks its version, the check indicates that the version is too old and a reinitialization is required. This is
the same as change tracking being disabled and then reenabled for the table.
Using change tracking does add some overhead to DML operations because of the change tracking information
that is being stored as part of the operation.
Effects on DML
Change tracking has been optimized to minimize the performance overhead on DML operations. The incremental
performance overhead that is associated with using change tracking on a table is similar to the overhead incurred
when an index is created for a table and needs to be maintained.
For each row that is changed by a DML operation, a row is added to the internal change tracking table. The effect
of this relative to the DML operation depends on various factors, such as the following:
The number of primary key columns
The amount of data that is being changed in the user table row
The number of operations that are being performed in a transaction
Snapshot isolation, if used, also has an effect on performance for all DML operations, whether change
tracking is enabled or not.
Effects on Storage
Change tracking data is stored in the following types of internal tables:
Internal change table
There is one internal change table for each user table that has change tracking enabled.
Internal transaction table
There is one internal transaction table for the database.
These internal tables affect storage requirements in the following ways:
For each change to each row in the user table, a row is added to the internal change table. This row has a
small fixed overhead plus a variable overhead equal to the size of the primary key columns. The row can
contain optional context information set by an application. And, if column tracking is enabled, each changed
column requires 4 bytes in the tracking table.
For each committed transaction, a row is added to an internal transaction table.
As with other internal tables, you can determine the space used for the change tracking tables by using the
sp_spaceused stored procedure. The names of the internal tables can be obtained by using the
sys.internal_tables catalog view, as shown in the following example.
sp_spaceused 'sys.change_tracking_309576141'
sp_spaceused 'sys.syscommittab'
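If the internal table names are not known in advance, a query such as the following sketch can list them; the internal_type_desc filter value is an assumption about how change tracking side tables are categorized in sys.internal_tables.
-- List the internal change tracking tables (filter value assumed).
SELECT name AS internal_table_name, internal_type_desc
FROM sys.internal_tables
WHERE internal_type_desc = 'CHANGE_TRACKING';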
See Also
Track Data Changes (SQL Server)
ALTER TABLE (Transact-SQL)
Database Properties (ChangeTracking Page)
ALTER DATABASE SET Options (Transact-SQL)
sys.change_tracking_databases (Transact-SQL)
sys.change_tracking_tables (Transact-SQL)
Track Data Changes (SQL Server)
About Change Tracking (SQL Server)
Work with Change Data (SQL Server)
Work with Change Tracking (SQL Server)
3/24/2017 • 17 min to read • Edit Online
THIS TOPIC APPLIES TO: SQL Server (starting with 2008) • Azure SQL Database • Azure SQL Data Warehouse • Parallel Data Warehouse
Applications that use change tracking must be able to obtain tracked changes, apply these changes to another data
store, and update the source database. This topic describes how to perform these tasks, and also the role change
tracking plays when a failover occurs and a database must be restored from a backup.
Obtain Changes by Using Change Tracking Functions
Describes how to use the change tracking functions to obtain changes and information about the changes that
were made to a database.
About the Change Tracking Functions
Applications can use the following functions to obtain the changes that are made in a database and information
about the changes:
CHANGETABLE(CHANGES …) function
This rowset function is used to query for change information. The function queries the data stored in the internal
change tracking tables. The function returns a results set that contains the primary keys of rows that have changed
together with other change information such as the operation, columns updated and version for the row.
CHANGETABLE(CHANGES …) takes a last synchronization version as an argument. The last synchronization version
is obtained using the @last_synchronization_version variable. The semantics of the last synchronization version
are as follows:
The calling client has obtained changes and knows about all changes up to and including the last
synchronization version.
CHANGETABLE(CHANGES …) will therefore return all changes that have occurred after the last
synchronization version.
The following illustration shows how CHANGETABLE(CHANGES …) is used to obtain changes.
CHANGE_TRACKING_CURRENT_VERSION() function
Is used to obtain the current version that will be used the next time when querying changes. This version
represents the version of the last committed transaction.
CHANGE_TRACKING_MIN_VALID_VERSION()function
Is used to obtain the minimum valid version that a client can have and still obtain valid results from
CHANGETABLE(). The client should check the last synchronization version against the value that is returned
by this function. If the last synchronization version is less than the version returned by this function, the
client will be unable to obtain valid results from CHANGETABLE() and will have to reinitialize.
Obtaining Initial Data
Before an application can obtain changes for the first time, the application must send a query to obtain the initial
data and the synchronization version. The application must obtain the appropriate data directly from the table, and
then use CHANGE_TRACKING_CURRENT_VERSION() to obtain the initial version. This version will be passed to
CHANGETABLE(CHANGES …) the first time that changes are obtained.
The following example shows how to obtain the initial synchronization version and the initial data set.
DECLARE @synchronization_version bigint;

-- Obtain the current synchronization version. This will be used the next time that changes are obtained.
SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION();

-- Obtain the initial data set.
SELECT
    P.ProductID, P.Name, P.ListPrice
FROM
    SalesLT.Product AS P;
Using the Change Tracking Functions to Obtain Changes
To obtain the changed rows for a table and information about the changes, use CHANGETABLE(CHANGES…). For
example, the following query obtains changes for the SalesLT.Product table.
SELECT
CT.ProductID, CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
Usually, a client will want to obtain the latest data for a row instead of only the primary keys for the row. Therefore,
an application would join the results from CHANGETABLE(CHANGES …) with the data in the user table. For
example, the following query joins with the SalesLT.Product table to obtain the values for the Name and
ListPrice columns. Note the use of OUTER JOIN. This is required to make sure that the change information is
returned for those rows that have been deleted from the user table.
SELECT
CT.ProductID, P.Name, P.ListPrice,
CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,
CT.SYS_CHANGE_CONTEXT
FROM
SalesLT.Product AS P
RIGHT OUTER JOIN
CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
ON
P.ProductID = CT.ProductID
To obtain the version for use in the next change enumeration, use CHANGE_TRACKING_CURRENT_VERSION(), as
shown in the following example.
SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION()
When an application obtains changes, it must use both CHANGETABLE(CHANGES…) and
CHANGE_TRACKING_CURRENT_VERSION(), as shown in the following example.
-- Obtain the current synchronization version. This will be used the next time that CHANGETABLE(CHANGES ...) is called.
SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION();

-- Obtain incremental changes by using the synchronization version obtained the last time the data was synchronized.
SELECT
    CT.ProductID, P.Name, P.ListPrice,
    CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,
    CT.SYS_CHANGE_CONTEXT
FROM
    SalesLT.Product AS P
RIGHT OUTER JOIN
    CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
ON
    P.ProductID = CT.ProductID;
Version Numbers
A database that has change tracking enabled has a version counter that increases as changes are made to change
tracked tables. Each changed row has a version number that is associated with it. When a request is sent to an
application to query for changes, a function is called that supplies a version number. The function returns
information about all the changes that have been made since that version. In some ways, change tracking version
is similar in concept to the rowversion data type.
Validating the Last Synchronized Version
Information about changes is maintained for a limited time. The length of time is controlled by the
CHANGE_RETENTION parameter that can be specified as part of the ALTER DATABASE.
Be aware that the time specified for CHANGE_RETENTION determines how frequently all applications must
request changes from the database. If an application has a value for last_synchronization_version that is older than
the minimum valid synchronization version for a table, that application cannot perform valid change enumeration.
This is because some change information might have been cleaned up. Before an application obtains changes by
using CHANGETABLE(CHANGES …), the application must validate the value for last_synchronization_version that it
plans to pass to CHANGETABLE(CHANGES …). If the value of last_synchronization_version is not valid, that
application must reinitialize all the data.
The following example shows how to verify the validity of the value of last_synchronization_version for each table.
-- Check individual table.
IF (@last_synchronization_version < CHANGE_TRACKING_MIN_VALID_VERSION(
OBJECT_ID('SalesLT.Product')))
BEGIN
-- Handle invalid version and do not enumerate changes.
-- Client must be reinitialized.
END
As the following example shows, the validity of the value of last_synchronization_version can be checked against
all tables in the database.
-- Check all tables with change tracking enabled.
IF EXISTS (
    SELECT 1 FROM sys.change_tracking_tables
    WHERE min_valid_version > @last_synchronization_version )
BEGIN
    -- Handle the invalid version and do not enumerate changes.
    -- The client must be reinitialized.
END
Using Column Tracking
Column tracking enables applications to obtain the data for only the columns that have changed instead of the
whole row. For example, consider the scenario in which a table has one or more columns that are large, but rarely
change; and also has other columns that frequently change. Without column tracking, an application can only
determine that a row has changed and would have to synchronize all the data that includes the large column data.
However, by using column tracking, an application can determine whether the large column data changed and
only synchronize the data if it has changed.
Column tracking information appears in the SYS_CHANGE_COLUMNS column that is returned by the
CHANGETABLE(CHANGES …) function.
Column tracking can be used so that NULL is returned for a column that has not changed. If the column can be
changed to NULL, a separate column must be returned to indicate whether the column changed.
In the following example, the CT_ThumbnailPhoto column will be NULL if that column did not change. This column
could also be NULL because it was changed to NULL - the application can use the CT_ThumbNailPhoto_Changed
column to determine whether the column changed.
DECLARE @PhotoColumnId int = COLUMNPROPERTY(
    OBJECT_ID('SalesLT.Product'), 'ThumbNailPhoto', 'ColumnId');

SELECT
    CT.ProductID, P.Name, P.ListPrice, -- Always obtain these values.
    CASE
        WHEN CHANGE_TRACKING_IS_COLUMN_IN_MASK(
                 @PhotoColumnId, CT.SYS_CHANGE_COLUMNS) = 1
        THEN ThumbNailPhoto
        ELSE NULL
    END AS CT_ThumbNailPhoto,
    CHANGE_TRACKING_IS_COLUMN_IN_MASK(
        @PhotoColumnId, CT.SYS_CHANGE_COLUMNS) AS CT_ThumbNailPhoto_Changed,
    CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS,
    CT.SYS_CHANGE_CONTEXT
FROM
    SalesLT.Product AS P
INNER JOIN
    CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
ON
    P.ProductID = CT.ProductID AND
    CT.SYS_CHANGE_OPERATION = 'U';
Obtaining Consistent and Correct Results
Obtaining the changed data for a table requires multiple steps. Be aware that inconsistent or incorrect results
could be returned if certain issues are not considered and handled.
For example, to obtain the changes that were made to a Sales table and SalesOrders table, an application would
perform the following steps:
1. Validate the last synchronized version by using CHANGE_TRACKING_MIN_VALID_VERSION().
2. Obtain the version that can be used to obtain changes the next time by using
CHANGE_TRACKING_CURRENT_VERSION().
3. Obtain the changes for the Sales table by using CHANGETABLE(CHANGES …).
4. Obtain the changes for the SalesOrders table by using CHANGETABLE(CHANGES …).
Two processes are occurring in the database that can affect the results that are returned by the previous
steps:
The cleanup process runs in the background and removes change tracking information that is older than
the specified retention period.
The cleanup process is a separate background process that uses the retention period that is specified when
you configure change tracking for the database. The issue is that the cleanup process can occur in the time
between when the last synchronization version was validated and when the call to
CHANGETABLE(CHANGES…) is made. A last synchronization version that was just valid might no longer be
valid by the time the changes are obtained. Therefore, incorrect results might be returned.
Ongoing DML operations are occurring in the Sales and SalesOrders tables, such as the following
operations:
Changes can be made to the tables after the version for next time has been obtained by using
CHANGE_TRACKING_CURRENT_VERSION(). Therefore, more changes can be returned than expected.
A transaction could commit in the time between the call to obtain changes from the Sales table and
the call to obtain changes from the SalesOrders table. Therefore, the results for the SalesOrders table
could have a foreign key value that does not exist in the Sales table.
To overcome the previously listed challenges, we recommend that you use snapshot isolation. This will help
to ensure consistency of change information and avoid race conditions that are related to the background
cleanup task. If you do not use snapshot transactions, developing an application that uses change tracking
could require significantly more effort.
Using Snapshot Isolation
Change tracking has been designed to work well with snapshot isolation. Snapshot isolation must be enabled for
the database. All the steps that are required to obtain changes must be included inside a snapshot transaction. This
will ensure that all changes that are made to data while obtaining changes will not be visible to the queries inside
the snapshot transaction.
To obtain data inside a snapshot transaction, perform the following steps:
1. Set the transaction isolation level to snapshot and start a transaction.
2. Validate the last synchronization version by using CHANGE_TRACKING_MIN_VALID_VERSION().
3. Obtain the version to be used the next time by using CHANGE_TRACKING_CURRENT_VERSION().
4. Obtain the changes for the Sales table by using CHANGETABLE(CHANGES …)
5. Obtain the changes for the Salesorders table by using CHANGETABLE(CHANGES …)
6. Commit the transaction.
Some points to remember as all steps to obtain changes are inside a snapshot transaction:
If cleanup occurs after the last synchronization version is validated, the results from
CHANGETABLE(CHANGES …) will still be valid as the delete operations performed by cleanup will not be
visible inside the transaction.
Any changes that are made to the Sales table or the SalesOrders table after the next synchronization
version is obtained will not be visible, and the calls to CHANGETABLE(CHANGES …) will never return
changes with a version later than that returned by CHANGE_TRACKING_CURRENT_VERSION(). Consistency
between the Sales table and the SalesOrders table will also be maintained, because the transactions that
were committed in the time between calls to CHANGETABLE(CHANGES …) will not be visible.
The following example shows how snapshot isolation is enabled for a database.
-- The database must be configured to enable snapshot isolation.
ALTER DATABASE AdventureWorksLT
SET ALLOW_SNAPSHOT_ISOLATION ON;
A snapshot transaction is used as follows:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN
-- Verify that version of the previous synchronization is valid.
-- Obtain the version to use next time.
-- Obtain changes.
COMMIT TRAN
For more information about snapshot transactions, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).
Alternatives to Using Snapshot Isolation
There are alternatives to using snapshot isolation, but they require more work to make sure all application
requirements are met. To make sure the last_synchronization_version is valid and data is not removed by the
cleanup process before changes are obtained, do the following:
1. Check last_synchronization_version after the calls to CHANGETABLE().
2. Check last_synchronization_version as part of each query to obtain changes by using CHANGETABLE().
Changes can occur after the synchronization version for the next enumeration has been obtained. There are
two ways to handle this situation. The option that is used depends on the application and how it can handle
the side-effects of each approach:
Ignore changes that have a version larger than the new synchronization version.
This approach has the side effect that a new or updated row would be skipped if it was created or updated
before the new synchronization version, but then updated afterward. If there is a new row, a referential
integrity problem might occur if there was a row in another table that was created that referenced the
skipped row. If there is an updated existing row, the row will be skipped and not synchronized until the next
time.
Include all changes, even those that have a version larger than the new synchronization version.
The rows that have a version larger than the new synchronization version will be obtained again on the next
synchronization. This must be expected and handled by the application.
In addition to the previous two options, you can devise an approach that combines both options, depending on
the operation. For example, for some applications it might be best to ignore changes newer than the next
synchronization version when the row was created or deleted, but not to ignore updates.
NOTE
Choosing the approach that will work for the application when you are using change tracking (or any custom tracking
mechanism), requires significant analysis. Therefore, it is much simpler to use snapshot isolation.
How Change Tracking Handles Changes to a Database
Some applications that use change tracking perform two-way synchronization with another data store. That is,
changes that are made in the SQL Server database are updated in the other data store, and changes that are made
in the other store are updated in the SQL Server database.
When an application updates the local database with changes from another data store, the application must
perform the following operations:
Check for conflicts.
A conflict occurs when the same data is changed at the same time in both data stores. The application must
be able to check for a conflict and obtain enough information to enable the conflict to be resolved.
Store application context information.
The application stores data that has the change tracking information. This information would be available
together with other change tracking information when changes were obtained from the local database. A
common example of this contextual information is an identifier for the data store that was the source of the
change.
To perform the previous operations, a synchronization application can use the following functions:
CHANGETABLE(VERSION…)
When an application is making changes, it can use this function to check for conflicts. The function obtains
the latest change tracking information for a specified row in a change tracked table. The change tracking
information includes the version of the row that was last changed. This information enables an application
to determine whether the row was changed after the last time that the application was synchronized.
WITH CHANGE_TRACKING_CONTEXT
An application can use this clause to store context data.
Checking for Conflicts
In a two-way synchronization scenario, the client application must determine whether a row has not been updated
since the application last obtained the changes.
The following example shows how to use the CHANGETABLE(VERSION …) function to check for conflicts in the
most efficient way, without a separate query. In the example, CHANGETABLE(VERSION …) determines the
SYS_CHANGE_VERSION for the row specified by @product_id. CHANGETABLE(CHANGES …) can obtain the same
information, but that would be less efficient. If the value of SYS_CHANGE_VERSION for the row is larger than the value
of @last_sync_version, there is a conflict. If there is a conflict, the row will not be updated. The ISNULL() check is
required because there might be no change information available for the row. No change information would exist
if the row had not been updated since change tracking was enabled or since the change information was cleaned
up.
-- Assumption: @last_sync_version has been validated.
UPDATE
    SalesLT.Product
SET
    ListPrice = @new_listprice
FROM
    SalesLT.Product AS P
WHERE
    ProductID = @product_id AND
    @last_sync_version >= ISNULL (
        (SELECT CT.SYS_CHANGE_VERSION
         FROM CHANGETABLE(VERSION SalesLT.Product,
             (ProductID), (P.ProductID)) AS CT),
        0)
The following code can check the updated row count and can identify more information about the conflict.
-- If the change cannot be made, find out more information.
IF (@@ROWCOUNT = 0)
BEGIN
-- Obtain the complete change information for the row.
SELECT
CT.SYS_CHANGE_VERSION, CT.SYS_CHANGE_CREATION_VERSION,
CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS
FROM
CHANGETABLE(CHANGES SalesLT.Product, @last_sync_version) AS CT
WHERE
CT.ProductID = @product_id;
    ------
    -- Check CT.SYS_CHANGE_VERSION to verify that it really was a conflict.
    -- Check CT.SYS_CHANGE_OPERATION to determine the type of conflict:
    -- update-update or update-delete.
    -- The row that is specified by @product_id might no longer exist
    -- if it has been deleted.
END
Setting Context Information
By using the WITH CHANGE_TRACKING_CONTEXT clause, an application can store context information together
with the change information. This information can then be obtained from the SYS_CHANGE_CONTEXT column that
is returned by CHANGETABLE(CHANGES …).
Context information is typically used to identify the source of the changes. If the source of the change can be
identified, that information can be used by a data store to avoid obtaining changes when it synchronizes again.
-- Try to update the row and check for a conflict.
WITH CHANGE_TRACKING_CONTEXT (@source_id)
UPDATE
SalesLT.Product
SET
ListPrice = @new_listprice
FROM
SalesLT.Product AS P
WHERE
ProductID = @product_id AND
@last_sync_version >= ISNULL (
(SELECT CT.SYS_CHANGE_VERSION FROM CHANGETABLE(VERSION SalesLT.Product,
(ProductID), (P.ProductID)) AS CT),
0)
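The following sketch shows how the stored context might then be used when enumerating changes to skip rows that originated from the same data store; the variable values are illustrative assumptions.
DECLARE @last_sync_version bigint = 0;
DECLARE @source_id varbinary(128) = CAST(N'store-A' AS varbinary(128));

-- Skip changes whose context matches this data store's own identifier.
SELECT
    CT.ProductID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_CONTEXT
FROM
    CHANGETABLE(CHANGES SalesLT.Product, @last_sync_version) AS CT
WHERE
    CT.SYS_CHANGE_CONTEXT IS NULL
    OR CT.SYS_CHANGE_CONTEXT <> @source_id;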
Ensuring Consistent and Correct Results
An application must consider the cleanup process when it validates the value of @last_sync_version. This is
because data could have been removed after CHANGE_TRACKING_MIN_VALID_VERSION() was called, but before
the update was made.
IMPORTANT
We recommend that you use snapshot isolation and make the changes within a snapshot transaction.
-- Prerequisite is to ensure ALLOW_SNAPSHOT_ISOLATION is ON for the database.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN
-- Verify that last_sync_version is valid.
IF (@last_sync_version <
    CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('SalesLT.Product')))
BEGIN
    RAISERROR (N'Last_sync_version too old', 16, -1);
END
ELSE
BEGIN
-- Try to update the row.
-- Check @@ROWCOUNT and check for a conflict.
END
COMMIT TRAN
NOTE
There is a possibility that the row being updated within the snapshot transaction could have been updated in another
transaction after the snapshot transaction was started. In this case, a snapshot isolation update conflict will occur and lead to
the transaction being terminated. If this happens, retry the update. This will then lead to a change tracking conflict being
detected and no rows being changed.
Change Tracking and Data Restore
Applications that require synchronization must consider the case in which a database that has change tracking
enabled reverts to an earlier version of the data. This can occur after a database is restored from a backup, after
a failover to an asynchronous database mirror, or after a failure when log shipping is used. The following scenario
illustrates the issue:
1. Table T1 is change tracked, and the minimum valid version for the table is 50.
2. A client application synchronizes data at version 100 and obtains information about all changes between
versions 50 and 100.
3. Additional changes are made to table T1 after version 100.
4. At version 120, there is a failure and the database administrator restores the database with data loss. After
the restore operation, the table contains data up through version 70, and the minimum synchronized
version is still 50.
This means that the synchronized data store has data that no longer exists in the primary data store.
5. T1 is updated many times. This brings the current version to 130.
6. The client application synchronizes again and supplies a last-synchronized version of 100. The client
validates this number successfully because 100 is greater than 50.
The client obtains changes between version 100 and 130. At this point, the client is not aware that the
changes between 70 and 100 are not the same as before. The data on the client and server are not
synchronized.
Note that if the database was recovered to a point after version 100, there would be no problems with
synchronization. The client and server would synchronize data correctly during the next synchronization
interval.
Change tracking does not provide support for recovering from the loss of data. However, there are two
options for detecting these types of synchronization issues:
Store a database version ID on the server, and update this value whenever a database is recovered or
otherwise loses data. Each client application would store the ID and validate it when it synchronizes data. If
data loss occurs, the IDs will not match and the clients would reinitialize. One drawback is that if the data
loss had not crossed the last synchronized boundary, the client might perform an unnecessary
reinitialization. (A sketch of this option follows the list.)
When a client queries for changes, record the last synchronization version number for each client on the
server. If there is a problem with the data, the last synchronized version numbers would not match. This
indicates that a reinitialization is required.
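The following is a minimal sketch of the first option. The table, procedure, and parameter names are illustrative only; they are not part of the change tracking feature.
-- Server-side metadata: a single-row table that holds the database version ID.
CREATE TABLE dbo.SyncMetadata
(
    DatabaseVersionID uniqueidentifier NOT NULL
);

-- Set a new ID initially, and again whenever the database is restored or
-- otherwise loses data.
INSERT INTO dbo.SyncMetadata (DatabaseVersionID) VALUES (NEWID());
GO

-- A client passes the ID that it stored at its last synchronization.
-- A mismatch means the server lost data and the client must reinitialize.
CREATE PROCEDURE dbo.usp_ValidateDatabaseVersion
    @client_db_version uniqueidentifier
AS
BEGIN
    IF NOT EXISTS (SELECT 1 FROM dbo.SyncMetadata
                   WHERE DatabaseVersionID = @client_db_version)
    BEGIN
        RAISERROR (N'Database version mismatch; reinitialize the client.', 16, 1);
    END
END;
GO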
See Also
Track Data Changes (SQL Server)
About Change Tracking (SQL Server)
Manage Change Tracking (SQL Server)
Enable and Disable Change Tracking (SQL Server)
CHANGETABLE (Transact-SQL)
CHANGE_TRACKING_MIN_VALID_VERSION (Transact-SQL)
CHANGE_TRACKING_CURRENT_VERSION (Transact-SQL)
WITH CHANGE_TRACKING_CONTEXT (Transact-SQL)
Change Data Capture and Other SQL Server Features
3/24/2017 • 3 min to read • Edit Online
This topic describes how the following features interact with change data capture:
Change tracking
Database mirroring
Transactional replication
Restoring or Attaching a Database Enabled for Change Data Capture
Change Tracking
Change data capture and change tracking can be enabled on the same database. No special considerations are
required. For more information, see Work with Change Tracking (SQL Server).
Database Mirroring
A database that is enabled for change data capture can be mirrored. To ensure that capture and cleanup happen
automatically after a failover, follow these steps:
1. Ensure that SQL Server Agent is running on the new principal server instance.
2. Create the capture job and cleanup job on the new principal database (the former mirror database). To
create the jobs, use the sys.sp_cdc_add_job stored procedure, as shown in the example after this list.
To view the current configuration of a cleanup or capture job, use the sys.sp_cdc_help_jobs stored procedure
on the new principal server instance. For a given database, the capture job is named
cdc.database_name_capture, and the cleanup job is named cdc.database_name_cleanup, where
database_name is the name of the database.
To change the configuration of a job, use the sys.sp_cdc_change_job stored procedure.
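For example, a minimal sketch of these steps on the new principal, assuming the mirrored database is named AdventureWorks2012, might look like the following.
USE AdventureWorks2012;
GO
-- Re-create the capture and cleanup jobs after the failover.
EXEC sys.sp_cdc_add_job @job_type = N'capture';
EXEC sys.sp_cdc_add_job @job_type = N'cleanup';
GO
-- Review the current job configuration.
EXEC sys.sp_cdc_help_jobs;
GO
-- Optionally adjust a job, for example the cleanup retention period
-- (in minutes; 4320 = 72 hours).
EXEC sys.sp_cdc_change_job @job_type = N'cleanup', @retention = 4320;
GO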
For information about database mirroring, see Database Mirroring (SQL Server).
Transactional Replication
Change data capture and transactional replication can coexist in the same database, but population of the change
tables is handled differently when both features are enabled. Change data capture and transactional replication
always use the same procedure, sp_replcmds, to read changes from the transaction log. When change data capture
is enabled on its own, a SQL Server Agent job calls sp_replcmds. When both features are enabled on the same
database, the Log Reader Agent calls sp_replcmds. This agent populates both the change tables and the
distribution database tables. For more information, see Replication Log Reader Agent.
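One way to check which of the two features are enabled for a database is to query sys.databases; the is_cdc_enabled and is_published columns report change data capture and transactional replication (publishing), respectively.
SELECT name, is_cdc_enabled, is_published
FROM sys.databases
WHERE name = N'AdventureWorks2012';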
Consider a scenario in which change data capture is enabled on the AdventureWorks2012 database, and two
tables are enabled for capture. To populate the change tables, the capture job calls sp_replcmds. The database is
enabled for transactional replication, and a publication is created. Now, the Log Reader Agent is created for the
database and the capture job is deleted. The Log Reader Agent continues to scan the log from the last log sequence
number that was committed to the change table. This ensures data consistency in the change tables. If transactional
replication is disabled in this database, the Log Reader Agent is removed and the capture job is re-created.
NOTE
When the Log Reader Agent is used for both change data capture and transactional replication, replicated changes are first
written to the distribution database. Then, captured changes are written to the change tables. Both operations are committed
together. If there is any latency in writing to the distribution database, there will be a corresponding latency before changes
appear in the change tables.
The proc exec option of transactional replication is not available when change data capture is enabled.
Restoring or Attaching a Database Enabled for Change Data Capture
SQL Server uses the following logic to determine if change data capture remains enabled after a database is
restored or attached:
If a database is restored to the same server with the same database name, change data capture remains
enabled.
If a database is restored to another server, by default change data capture is disabled and all related
metadata is deleted.
To retain change data capture, use the KEEP_CDC option when restoring the database (see the example at the
end of this section). For more information about this option, see RESTORE.
If a database is detached and attached to the same server or another server, change data capture remains
enabled.
If a database is attached or restored with the KEEP_CDC option to any edition other than Enterprise, the
operation is blocked because change data capture requires SQL Server Enterprise. Error message 934 is
displayed:
SQL Server cannot load database '%.*ls' because Change Data Capture is enabled. The currently installed
edition of SQL Server does not support Change Data Capture. Either restore database without KEEP_CDC
option, or upgrade the instance to one that supports Change Data Capture.
You can use sys.sp_cdc_disable_db to remove change data capture from a restored or attached database.
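For example, a restore to another server that keeps change data capture enabled, followed by the alternative of removing it afterward, might look like the following sketch. The backup file path is a placeholder.
-- Keep change data capture enabled when restoring to another server.
RESTORE DATABASE AdventureWorks2012
FROM DISK = N'C:\Backups\AdventureWorks2012.bak'
WITH KEEP_CDC;
GO

-- Or, remove change data capture from a restored or attached database.
USE AdventureWorks2012;
GO
EXEC sys.sp_cdc_disable_db;
GO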
Change Data Capture and Always On
When you use Always On, change enumeration should be done on the secondary replica to reduce the disk load
on the primary.
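For example, if the availability group has a readable secondary replica, the change enumeration query can be run against that replica. This is a sketch only; the capture instance name dbo_MyTable is illustrative.
-- Run on a readable secondary replica to offload change enumeration.
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn(N'dbo_MyTable');
SET @to_lsn = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');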
See Also
Administer and Monitor Change Data Capture (SQL Server)