IBM InfoSphere Data Replication’s
Change Data Capture (CDC)
Disaster Recovery (DR) Considerations
Version 1.0
IBM Information
Management
TABLE OF CONTENTS
INTRODUCTION
DATABASE LOG READING CONCEPT
    CDC Bookmark
    CDC Log Reading Concept
    CDC Install/Instance creation on backup (DR) system
    CDC Fail-over considerations
PRODUCTION SYSTEM BEING RECOVERED
    Different DR Recovery Scenarios
    Overview of the Three Types of Disaster Recovery Replication Methods
        Synchronous Physical DR Replication Method
        Asynchronous Physical DR Replication Method
        Asynchronous Logical DR Replication Method
SYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
ASYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
    Asynch Physical DR Replication Case 1: CDC Target Same or Behind Source
    Asynch Physical DR Replication Case 2: CDC Target Ahead of Source
ASYNCHRONOUS LOGICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
KEEPING CDC INSTANCE SYNCHRONIZED
CHANGING CDC IP ADDRESS
HOW TO DETERMINE IF LOG POSITION VALID AFTER SWITCH-OVER
ADDITIONAL RECOVERY ASPECTS IF YOUR TARGET IS NOT A DATABASE
INFOSPHERE CDC – HOW TO PERFORM INITIAL SYNCHRONIZATION
PAGE 2 OF 24
INTRODUCTION
This document focuses on switching over a CDC replication instance in a Disaster
Recovery (DR) environment. It is meant to address true DR environments that span a long
distance, so simpler recovery environments, such as those using a shared disk, are not
covered.
Several considerations apply when CDC is used within an environment for which DR is
implemented. This document explores the topologies and considerations for operating CDC
in such an environment:
1. Which production system is being recovered:
   a. Recovery of source production system
   b. Recovery of target production system
   c. Recovery of both source and target production systems
2. What type of Disaster Recovery Replication Method is used:
   a. Synchronous Physical Method
   b. Asynchronous Physical Method
   c. Asynchronous Logical Method
3. Type of CDC target used:
   a. Database engine
   b. DataStage or Event Server engines which do not have a bookmark table in
      an external database
DATABASE LOG READING CONCEPT
CDC Bookmark
Before we can talk about recovery of a CDC instance, we will introduce the concept of
database log reading as it applies to CDC, and the concept of a bookmark that CDC uses to
track its replication progress.
When reading a DBMS log, there is the concept of a log position. A log position is a unique
position/point in the DBMS log. Examples of a simplistic representation of a log position are
an SCN for Oracle and an LSN for DB2.
The CDC bookmark consists of all relevant information required to be able to restart
replication at the appropriate log position (including the current log position and earliest open
log position). Data changes are scraped from the log and sent to the target. CDC will apply
the appropriate database operation (insert/update/delete), and in the same transaction
commit the bookmark to a metadata table. When CDC restarts after any normal or
abnormal shutdown, it will acquire the bookmark from the target system, and restart
replication at the appropriate point in the log. The mechanism is illustrated in the following
diagram:
CDC Log Reading Concept
The following diagram illustrates the log reading concept and the population of the CDC
bookmark table:
For the purposes of this document, we will use a simplistic representation of a log position
(in reality, this can differ quite significantly from one database to another). Here we show
that each operation has a corresponding log position. In this case, ‘insert a’ corresponds
to log position A01, ‘insert b’ corresponds to log position A02, and so on. When
CDC is replicating and applies data to the target database, it also writes an entry into the
bookmark table (within the same commit). The example above illustrates that CDC has
replicated all four inserts, and thus the bookmark contains the log position of the last
operation (in this case ‘insert d’).
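The "same commit" behavior described above can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module as a stand-in target database; the table names `target_table` and `bookmark`, the column names, and the function names are assumptions made for the example and do not reflect CDC's actual metadata layout.

```python
import sqlite3

def apply_change(conn, key, value, log_position):
    """Apply one replicated insert and record its log position atomically."""
    with conn:  # one transaction: both statements commit, or both roll back
        conn.execute("INSERT INTO target_table (k, v) VALUES (?, ?)", (key, value))
        conn.execute("UPDATE bookmark SET last_applied = ? WHERE id = 1", (log_position,))

def restart_position(conn):
    """Read the log position replication would restart from."""
    return conn.execute("SELECT last_applied FROM bookmark WHERE id = 1").fetchone()[0]

# Hypothetical target database with a single-row bookmark table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE bookmark (id INTEGER PRIMARY KEY, last_applied TEXT)")
conn.execute("INSERT INTO bookmark (id, last_applied) VALUES (1, NULL)")
conn.commit()

# Replay the four inserts from the example, log positions A01 to A04.
for key, pos in [("a", "A01"), ("b", "A02"), ("c", "A03"), ("d", "A04")]:
    apply_change(conn, key, "row " + key, pos)
```

Because the data change and the bookmark update share one transaction, a failed apply cannot leave the bookmark pointing past data that was never written.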
The above oversimplifies the bookmark. In actuality, multiple items make up the CDC
bookmark. As it pertains to CDC in a DR environment, there are two key aspects
that you need to be aware of:
1. The last applied log position (which was illustrated above)
2. The earliest open transaction log position
When CDC reads data from the logs, it first assembles the transactions of the source system
and does not send a transaction to the target system until it is complete. The earliest
open log position tracks the log position for the start of the oldest transaction which CDC
has started to process (read from the logs), but for which it has not yet seen an end of
transaction. This is an important concept for DR recovery because, when CDC restarts
replication, it may need to go back to the earliest open log position. This is another good
reason to follow the best practice of avoiding large and long-running transactions.
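To make the two bookmark components concrete, the following sketch scans a simplified log and derives both values. The record layout `(position, txn_id, kind)` and the function name are hypothetical, and the example assumes log positions that sort correctly as strings (A01, A02, ...), which matches this document's simplified notation but not real DBMS log positions.

```python
def bookmark_positions(log_records):
    """Return (last_committed_position, earliest_open_position) for a scanned log.

    Each record is (position, txn_id, kind), with kind in {"begin", "op", "commit"}.
    """
    open_txn_start = {}   # txn_id -> position where that transaction began
    last_committed = None
    for position, txn_id, kind in log_records:
        if kind == "begin":
            open_txn_start[txn_id] = position
        elif kind == "commit":
            open_txn_start.pop(txn_id, None)
            last_committed = position
    earliest_open = min(open_txn_start.values()) if open_txn_start else None
    return last_committed, earliest_open

# T1 starts first and is still open when T2 commits at A04.
log = [
    ("A01", "T1", "begin"),
    ("A02", "T2", "begin"),
    ("A03", "T2", "op"),
    ("A04", "T2", "commit"),
    ("A05", "T1", "op"),
]
```

Here the last committed position is A04, yet a restart has to re-read from A01 to rebuild T1. The longer a transaction stays open, the further back that restart point sits.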
CDC Install/Instance creation on backup (DR) system
How CDC is installed and how the instance is created depend on the DR
replication solution being used.
If you are using physical replication to your DR machine, it is best to also mirror the file
system that CDC is installed on to the DR site. In this case, no additional
install or instance creation is required on the DR system. In all other scenarios, where
the file system that CDC is installed on is not replicated to the DR system, you must
perform an additional install of CDC on the DR system and create the CDC
instance there as well.
CDC Fail-over considerations
The following are items that need to be considered when CDC is failed-over to another
system:
1. IP Address used to reach CDC
2. CDC configuration metadata (stored in an internal database)
3. CDC Operational metadata (stored in client database), most importantly the CDC
bookmark table
When connecting to a CDC instance, an IP address is specified. In a DR
scenario, it is easiest to use a Virtual IP Address, in which case no additional steps are
required. If you cannot use a Virtual IP Address, you will need to change the IP address in
the CDC configuration. Instructions on how to change the CDC IP address can be found in
the ‘Changing CDC IP Address’ section.
PRODUCTION SYSTEM BEING RECOVERED
Different DR Recovery Scenarios
For now, let’s consider that CDC is installed locally on the source and target database
servers. In the following diagram, the active production CDC replication is indicated by the
solid arrow. The dashed arrows numbered 1 to 3 represent possible CDC replication paths
after the source production server fails and is switched over to a backup server, the target
production server fails and is switched over, or both the source and target production
servers fail.
Overview of the Three Types of Disaster Recovery Replication Methods
The procedures for recovery of CDC to a DR site will differ depending on the type of DR
recovery method used. Following are descriptions of the three main DR replication
methods:
Synchronous Physical DR Replication Method
Although the replication may not be instantaneous, synchronous physical replication
ensures that the data on the production system and the DR system remains synchronized in
the event of a failure. Note that in the above example the DR log always contains the exact
same image as the Production log.
When using synchronous physical DR replication, you can replicate either the database
alone or the database together with the CDC instance directory. It is recommended that you
also replicate the CDC instance directory. You will then not need to do a separate install on
the DR system, and the CDC internal metadata is kept in sync.
Asynchronous Physical DR Replication Method
Using asynchronous physical replication means that the DR system may have an image
from an earlier point in time than the Production system. Note in
the above example that the log on the DR system is at a different point in time than the log
on the production system.
When using asynchronous physical DR replication, you can replicate either the database
alone or the database together with the CDC instance directory. It is recommended that you
also replicate the CDC instance directory. You will then not need to do a separate install on
the DR system, and the CDC internal metadata is kept in sync. Even though there is some
latency with asynchronous physical replication, the CDC metadata rarely changes (only
when configuration changes are made), so the CDC metadata on the DR site should
always be in sync.
Asynchronous Logical DR Replication Method
Using asynchronous logical DR replication makes recovery of CDC significantly more
complex than using physical DR replication. As illustrated in the example above, the
asynchronous nature means that the DR log will be at an older position than the
production system. The real complication with logical replication, however, is that the DR
system has a different log. The log positions on the DR system
do not correspond to the log positions on the primary production system. This means that
the CDC bookmark, which is stored on the target side, cannot be used to restart replication.
SYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
Using a synchronous physical replication method to mirror the production system to a DR
system is the best (simplest) approach when CDC is being used to replicate
data from or to those systems. The scenario is illustrated in the following diagram:
In the above example, depicted by the wide arrows, whole-system physical replication is
used between the production and DR servers. It is also key that a synchronous mode is
used for the replication. DBMS-level synchronization between the production and DR
systems is handled by the DR solution.
It is also ideal to use the physical replication to mirror the CDC instance directory. If you are
using synchronous replication for the DBMS, but not for the CDC instance directory, please
see the section ‘Keeping CDC Instance Synchronized’. If you do use physical mirroring for
the CDC instance, the entire CDC instance and database have an exact copy on the DR
system, which makes fail-over straightforward. As a result, there are no CDC-specific
considerations about log position and data availability for disaster recovery in this scenario.
The only CDC consideration is dealing with an IP address change. Using a virtual IP
address is recommended and simplifies the process. If a virtual IP address cannot be used,
you will need to follow the procedures outlined in the ‘Changing CDC IP Address’ section of
this document.
Following is a high-level flow diagram illustrating the choices for synchronous physical DR
replication:
ASYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
Utilizing asynchronous replication increases the complexity of a CDC DR solution
considerably. The reason for this is that there are multiple different recovery situations that
you may need to deal with. The following section will outline each of these scenarios
utilizing diagrams to illustrate the concepts.
The above diagram illustrates some of the possible outcomes at the point of failure. The data
replicated to the target, relative to the source (either of which may have failed over), can be
ahead, equal to, or behind. For instance, in the above diagram, if only the source production
system switched over to the DR box, the production target database is ahead of the source
and contains data that does not exist on the source. If only the target production machine
switched over to the DR target, then the data on the DR target would be behind the
production source, which is the easy situation to deal with. Refer to the section ‘How To
Determine If Log Position Valid After Switch-Over’ for information on determining which
scenario you fall into after a failover.
Lastly, the considerations for CDC instance synchronization and IP address change are the
same for the asynchronous physical DR replication method as for synchronous
physical DR replication, and as such are not repeated here.
Asynch Physical DR Replication Case 1: CDC Target Same or Behind Source
If you are using asynchronous physical DR replication, and the CDC target is the same as or
behind the source log after switchover, then you have a straightforward recovery case.
Below are two examples of situations which match this scenario:
or
In the first example, the source switches over to the DR site. In this case, the DR source
system has more recent data than the production target. In the second example, the
production target switches over to the DR target, and again, the source has more recent
data than the target. In both of these examples, since the data in the target is older than
what is available on the source, no special CDC consideration is required.
When you restart CDC, it goes back in the log based on the log position stored in the CDC
target bookmark, and restarts replication as per normal operating behavior.
Asynch Physical DR Replication Case 2: CDC Target Ahead of Source
If you are using asynchronous physical DR replication, and after switchover the CDC target
is ahead of the source log, then recovery is significantly more involved. The following
diagram illustrates this case:
In the above example, CDC had replicated the data up to log position A04. However, the
asynchronous source DR replication had only replicated up to the point in time of log
position A02. Thus, at the time of the fail-over to the DR source, the DR source is missing
data that has already been applied to the target system. Because of this, the bookmark on
the target system is invalid, as the log position it refers to doesn’t exist on the source system.
This situation is one you want to avoid at all costs. If, for instance, you know that the
asynchronous DR replication will be at most 2 minutes latent, one way that you may be able
to prevent this situation is by creating a CDC target user exit that delays the apply by a
set amount of time (in this case, for example, 3 minutes). There is a sample user exit
available on developerWorks, located here:
https://www.ibm.com/developerworks/community/files/app/file/f047a38c-734a-4071-8a3c4fe37c85baeb
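The timing logic behind such a delay can be sketched as follows. This is not the developerWorks sample and does not use the CDC user exit API; it only shows, under the 2-minute/3-minute assumption from the text, how long an apply would be held back. The names `APPLY_DELAY` and `seconds_to_wait` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed: the DR replication lags by at most 2 minutes, so hold applies for 3.
APPLY_DELAY = timedelta(minutes=3)

def seconds_to_wait(source_commit_time, now, delay=APPLY_DELAY):
    """Seconds to hold an apply so the DR copy of the source receives the change first."""
    release_time = source_commit_time + delay
    return max(0.0, (release_time - now).total_seconds())

# A change committed at 12:00:00 and seen by the target at 12:01:00 waits 2 more minutes.
commit_time = datetime(2024, 1, 1, 12, 0, 0)
wait = seconds_to_wait(commit_time, datetime(2024, 1, 1, 12, 1, 0))
```

The trade-off is that the target is deliberately kept behind by the delay, in exchange for never being ahead of the DR source.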
If you need to deal with the situation in the example above, then you are out of sync and
need to make a business decision on how to move forward. Here are some possible
options:
1) Source DR should be reconciled with the latest changes that were replicated to the target
but missed by the asynchronous DR replication solution
• In this case, the operations/transactions that were not captured by the DR replication
  solution need to be repopulated to the source DR location
• This reconciliation needs to be performed before restarting the source applications
  on the DR site
• Once the reconciliation is complete, you can mark the table capture point
• Next, you can allow access to the source applications on the DR site
• Lastly, you would restart CDC replication
Note: if everything was not perfectly reconciled, you may have replication errors and CDC
would shut down.
2) Perform a refresh of any table that has “newer” data than is available on the source. If
you don’t know which tables those are, you could mark the table capture point (before
allowing users on the production system) and start replication. Then, if CDC encounters an
issue applying to a table and shuts down, you will need to refresh that table and start
mirroring again. You may have to do this iteratively until all out-of-sync tables have been
refreshed.
3) If your business decision is to just continue replication and leave the additional data on
your target system, then you cannot continue to use standard mirroring, as it would fail as
soon as an operation hit data it did not expect (for instance, with a duplicate key error).
As such, you would have to mark the table capture point to reset the starting position to
‘now’. You would also have to enable Conflict Detection and Resolution (CDR) on all tables
and set the conflict method to “Source Wins”, so that CDC continues
replication when a conflict occurs and uses the value from the source. For example,
if there was an insert on the source, and the record already existed in the target, CDR
would change the insert to an update and the final image would be the same as on the
source system. Note that using CDR has a significant performance impact, so if your
data volumes are high, this may not be a feasible option.
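The insert-to-update conversion described for “Source Wins” can be illustrated with a small stand-alone sketch. It uses Python's built-in sqlite3 module and a hypothetical table `t`; it mimics the outcome only and is not how CDR is configured or implemented in CDC.

```python
import sqlite3

def apply_insert_source_wins(conn, key, value):
    """Apply a source insert; if the row already exists on the target, overwrite it."""
    with conn:
        try:
            conn.execute("INSERT INTO t (k, v) VALUES (?, ?)", (key, value))
            return "inserted"
        except sqlite3.IntegrityError:
            # Conflict: the target already holds this key, so the source image wins.
            conn.execute("UPDATE t SET v = ? WHERE k = ?", (value, key))
            return "converted to update"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT)")
# Row 'c' reached the target before the failover but is replayed from the DR source.
conn.execute("INSERT INTO t (k, v) VALUES ('c', 'image applied before failover')")
conn.commit()

results = [
    apply_insert_source_wins(conn, "c", "source image"),
    apply_insert_source_wins(conn, "e", "new row"),
]
```

Every conflicting operation costs an extra failed statement plus an update, which hints at why the text warns about the performance impact at high volumes.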
A high level summary of decision flow is as follows:
ASYNCHRONOUS LOGICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
As mentioned in the DR method overview section, using asynchronous logical DR
replication results in the most complex recovery of CDC. The following diagram will be used
to illustrate the scenarios:
In case ①, when the target fails, you will not be required to do any special processing
(beyond ensuring that the CDC instance configuration is up to date), and can just restart
CDC replication which will pick up from the last applied entry.
In cases ② and ③, the DR source has a different log and the log positions do not
correspond to the log positions found on the original production source. There is also the
added potential complexity that the Object IDs are different on the DR source. Additionally,
this example illustrates that the CDC target may be ahead of the DR replication, so
there is data on the target that does not exist on the source. The techniques to deal with
the newer data are the same as those outlined for asynchronous physical DR
replication.
Dealing with the potential difference in Object IDs requires the table mappings to be
reconfigured. The simplest way to accomplish this is by doing an export on the production
system and an import on the DR system. Note that after you import the table mappings, the
tables will be marked for refresh and the bookmark position will be ‘reset’. The bookmark
position is not valid since it corresponds to another log. You will need to change the tables
from refresh to mirroring, and mark the table capture point to set an
appropriate bookmark starting position.
This leaves the fact that the DR system has a different log. To simplify
the procedures, it is best to ensure that the CDC target is never ahead of the DR source
from a log processing perspective. Additionally, when you restart, you would either refresh
tables or use an adaptive apply technique. If this isn’t possible, the following procedure is
one way to mitigate this factor. However, it adds significant complexity and points of
failure, so using this procedure would not be a best practice:
• Note that this procedure will only work for databases and DR replication technology
  where CDC is able to run replication off the active DR database
• You will need to create an additional table, called CDC_Restart here. This table
  contains an ID and a timestamp column
• The CDC_Restart table will be replicated by CDC from the Production Source to the
  Production Target using an adaptive apply method. You will need to create a
  separate subscription for replicating this table.
    o On the target, this table will have an additional column to hold a CDC
      bookmark
• The CDC_Restart table will also be replicated to the DR Source via the asynchronous
  logical DR replication.
• Lastly, the CDC_Restart table will be replicated by CDC from the DR Source to the
  Production Target as well, using standard replication. The table on the target will be
  called CDC_DR_Restart. You will need to create a separate subscription for
  replicating this table. On the DR Source instance, keep only the subscription
  mirroring the CDC_Restart table active; all the other subscriptions must be kept idle
  in Refresh/parked state.
    o On the target, the CDC_DR_Restart table will have an insert trigger on it.
      The trigger will call the DMSHOWBOOKMARK command and will then insert
      and/or update the corresponding record in the CDC_Restart table (which is
      also on the target system).
• You will need to create a script that runs on the Production Source and inserts
  data into the CDC_Restart table at a set interval (e.g., every 5 seconds)
• When a failover happens, you will need to check for the latest entry in the
  CDC_Restart table that contains a bookmark. You would then use this bookmark to
  set the bookmark on the source system and restart replication, utilizing CDR to
  deal with conflicts, as some data will be replayed.
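The final lookup in the procedure above (find the latest CDC_Restart entry that has a bookmark) might look like the following sketch. It uses sqlite3 as a stand-in for the target database; the column names `id`, `ts` and `bookmark`, and the placeholder bookmark strings, are assumptions, since the text only says the table holds an ID, a timestamp and, on the target, a bookmark column.

```python
import sqlite3

def latest_restart_bookmark(conn):
    """Return (id, bookmark) of the newest heartbeat row that has a bookmark, or None."""
    return conn.execute(
        "SELECT id, bookmark FROM CDC_Restart "
        "WHERE bookmark IS NOT NULL ORDER BY id DESC LIMIT 1"
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CDC_Restart (id INTEGER PRIMARY KEY, ts TEXT, bookmark TEXT)")

# Heartbeats written every 5 seconds on the production source. Row 3 reached the
# target through CDC, but the DR replication had not yet delivered it, so the
# trigger on CDC_DR_Restart never filled in its bookmark.
rows = [
    (1, "2024-01-01 12:00:00", "bookmark-placeholder-1"),
    (2, "2024-01-01 12:00:05", "bookmark-placeholder-2"),
    (3, "2024-01-01 12:00:10", None),
]
conn.executemany("INSERT INTO CDC_Restart (id, ts, bookmark) VALUES (?, ?, ?)", rows)
conn.commit()
```

In this example the restart point is the bookmark on row 2. Changes made after that heartbeat are replayed, which is why the procedure pairs the restart with CDR.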
The following two diagrams illustrate the scenarios where the CDC replication is ahead
of the DR replication and is behind the DR replication:
A high-level summary of the decision flow is as follows:
Note that the above is very simplified, as the ‘Mark Correct log position’ step can involve
the complex setup that was listed previously.
KEEPING CDC INSTANCE SYNCHRONIZED
The ideal case is to use physical replication to mirror the CDC instance directory to the DR
machine (this would not be applicable to a MS Windows environment). If physical
replication of the CDC instance directory is not possible, then you will have to perform the
following:
1. You must install an instance of CDC on the DR machine
2. You will need to copy over the CDC metadata from the production server initially,
and then any time that configuration changes happen (such as table mapping or
table definition change)
   a. For each CDC instance in Production, run <install_dir>/bin/dmbackupmd -I
      <instance>
   b. Stop the Backup Server CDC instance, and remove all md*.* files from the
      <install_dir>/instance/<instance>/conf directory
   c. Copy the md$1.wal and md.dbn files created in step a in
      <install_dir>/instance/<instance>/conf/backup to each instance in DR
You need to be aware that CDC maintains a linkage between the instance operational
metadata in the user database and the configuration metadata by means of an instance ID
value. At instance start, the instance ID value stored in each is compared. If they do not
match, an error is generated and the instance fails to start. Thus, it is crucial that the
metadata is always properly synchronized. If you start up and get the mismatch error, you
will need to call IBM L2 support to get past it.
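Steps b and c of the metadata copy procedure above lend themselves to scripting. The sketch below is one possible approach, assuming the backup files from step a have already been transferred to a directory on the DR machine and that the DR instance is stopped; the function name and both directory arguments are hypothetical.

```python
import glob
import os
import shutil

def restore_metadata(backup_dir, conf_dir):
    """Replace the md*.* files in conf_dir with the md*.* files found in backup_dir."""
    # Step b: remove the existing metadata files from the DR instance conf directory.
    for old in glob.glob(os.path.join(conf_dir, "md*.*")):
        os.remove(old)
    # Step c: copy the backed-up metadata files (e.g. md$1.wal and md.dbn) into place.
    copied = []
    for src in sorted(glob.glob(os.path.join(backup_dir, "md*.*"))):
        shutil.copy2(src, conf_dir)
        copied.append(os.path.basename(src))
    return copied
```

Returning the list of copied file names gives the caller something to log or verify before the DR instance is started again.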
CHANGING CDC IP ADDRESS
As mentioned previously, it is recommended that a Virtual IP Address be used across the
production and DR servers, as this is transparent to CDC upon switchover (except that
you will need to restart Access Server due to caching of the DNS name). However, if
using a virtual IP address is not possible, the following steps must be taken upon
switchover to update the CDC instance with a new address. If only the CDC source instance
is switching over, you only need to do steps 1 to 5. If you have switched over the CDC
target instance, then you also need to do steps 6 to 12 for all related subscriptions.
If you are only dealing with a CDC source switch-over, and the version of CDC you are
using is IIDR 10.2 or newer, then you can simplify the procedure by utilizing the following
CHCCLP commands to replace steps 1 to 5 above:
modify access datastore name ds_name hostname new_hostname;
modify subscription name sub_name tcphostname new_hostname;
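When many subscriptions are involved, the two commands above can be generated rather than typed. The helper below is a hypothetical convenience that only formats the command text exactly as shown in this section; it does not run CHCCLP, and any session or connection commands a real script needs are outside its scope.

```python
def hostname_change_script(datastore, subscriptions, new_hostname):
    """Build the CHCCLP command lines for a source-side hostname change."""
    lines = [f"modify access datastore name {datastore} hostname {new_hostname};"]
    lines += [
        f"modify subscription name {sub} tcphostname {new_hostname};"
        for sub in subscriptions
    ]
    return "\n".join(lines)

# Placeholder datastore, subscription and host names for illustration only.
script = hostname_change_script("ds_name", ["sub_a", "sub_b"], "new_hostname")
```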
HOW TO DETERMINE IF LOG POSITION VALID AFTER SWITCH-OVER
If you are using asynchronous physical DR mirroring, the easiest way to know if the target is
ahead of the source is to start CDC replication (before making the source system available
to users). If, upon restart, the bookmark position is not found, the target is ahead of the
source. Since the CDC bookmark is specific to the source database, there are different
procedures to determine the right bookmark, and for some databases this can only be done
with the help of IBM L2 support. The following steps outline the procedure for some
databases.
1. Start with checking the target subscription bookmark position
   a. From the Target, run <install_dir>/bin/dmshowbookmark -I <instance> -s
      <subscription>
   b. Copy the resultant bookmark over to the source system
2. Check the SCN of the target bookmark
   a. From the CDC Source system, run <install_dir>/bin/dmdecodebookmark -I
      <instance> -b <bookmark information from step 1>
3. Check the current log position from the source DBMS
   a. From the source DBMS, run the SQL statement or command for your specific DBMS:
      Oracle: select current_scn from v$database;
      DB2: db2pd -d <db name> -logs | grep "Current LSN"
4. If the target bookmark’s current log position is larger (newer) than the
   source log position, the target has data that does not exist on the source.
   a. You would need to follow the directions from the section Asynch Physical DR
      Replication Case 2: CDC Target Ahead of Source
   b. From the source, before allowing users back on the database, run
      <install_dir>/bin/dmmarktablecapturepoint -I <instance> -s <subscription> -A
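The comparison in step 4 of the procedure above reduces to a single check once both positions are available as comparable numbers. The sketch below assumes you have already extracted an integer position (for example an Oracle SCN) from the decoded bookmark and from the source DBMS by hand; the function name and return labels are hypothetical.

```python
def classify_after_switchover(target_bookmark_position, source_current_position):
    """Classify the recovery case from two comparable integer log positions."""
    if target_bookmark_position > source_current_position:
        # The target holds changes that the failed-over source never received (Case 2).
        return "target_ahead"
    # The bookmark position exists in the source log, so a normal restart works (Case 1).
    return "target_same_or_behind"

# Example values only: the bookmark decodes to position 1045, the source reports 1020.
case = classify_after_switchover(1045, 1020)
```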
ADDITIONAL RECOVERY ASPECTS IF YOUR TARGET IS NOT A DATABASE
If your CDC target is not a database, then the CDC bookmark is not stored in the customer
target database, but rather in the internal CDC metadata. This aspect requires
special consideration for DR of the CDC target system. The following CDC engines are
examples of ones that have this characteristic:
• CDC Event Server
• CDC for DataStage if using FlatFile
Since the bookmark is stored in the CDC instance directory, it is very important to ensure
that the CDC target instance directory is mirrored to the DR target system for easier
recovery. The other approach is to run the dmbackupmd command on a very regular basis
and ensure that the backup is available for restore on the backup target system. Note that
with this technique the stored bookmark will be at a point in time behind what has already
been applied, so when replication is restarted, you will “replay” some transactions.