IBM InfoSphere Data Replication’s Change Data Capture (CDC) Disaster Recovery (DR) Considerations
Version 1.0
IBM Information Management

TABLE OF CONTENTS

INTRODUCTION
DATABASE LOG READING CONCEPT
    CDC Bookmark
    CDC Log Reading Concept
    CDC Install/Instance creation on backup (DR) system
    CDC Fail-over considerations
PRODUCTION SYSTEM BEING RECOVERED
    Different DR Recovery Scenarios
    Overview of the Three Types of Disaster Recovery Replication Methods
        Synchronous Physical DR Replication Method
        Asynchronous Physical DR Replication Method
        Asynchronous Logical DR Replication Method
SYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
ASYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
    Asynch Physical DR Replication Case 1: CDC Target Same or Behind Source
    Asynch Physical DR Replication Case 2: CDC Target Ahead of Source
ASYNCHRONOUS LOGICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION
KEEPING CDC INSTANCE SYNCHRONIZED
CHANGING CDC IP ADDRESS
HOW TO DETERMINE IF LOG POSITION VALID AFTER SWITCH-OVER
ADDITIONAL RECOVERY ASPECTS IF YOUR TARGET IS NOT A DATABASE

INFOSPHERE CDC – HOW TO PERFORM INITIAL SYNCHRONIZATION PAGE 2 OF 24

INTRODUCTION

This document focuses on switching over a CDC replication instance in a Disaster Recovery (DR) environment. Because it is meant to address true DR environments that span a long distance, simpler recovery environments, such as those using a shared disk, are not covered.

There are various considerations when CDC is used within an environment for which DR is implemented. This document explores the various topologies and considerations for operating CDC in such an environment:

1. Which production system is being recovered:
   a. Recovery of source production system
   b. Recovery of target production system
   c. Recovery of both source and target production systems
2. What type of Disaster Recovery Replication Method is used:
   a. Synchronous Physical Method
   b. Asynchronous Physical Method
   c. Asynchronous Logical Method
3. Type of CDC target used:
   a. Database engine
   b. DataStage or Event Server engines, which do not have a bookmark table in an external database

DATABASE LOG READING CONCEPT

CDC Bookmark

Before we can talk about recovery of a CDC instance, we will introduce the concept of database log reading as it applies to CDC, and the concept of a bookmark that CDC uses to track its replication progress.

When reading a DBMS log, there is the concept of a log position. A log position is a unique position/point in the DBMS log. Simplified examples of a log position are an SCN for Oracle and an LSN for DB2. The CDC bookmark consists of all relevant information required to restart replication at the appropriate log position (including the current log position and the earliest open log position).

Data changes are scraped from the log and sent to the target. CDC applies the appropriate database operation (insert/update/delete) and, in the same transaction, commits the bookmark to a metadata table. When CDC restarts after any normal or abnormal shutdown, it acquires the bookmark from the target system and restarts replication at the appropriate point in the log.
The mechanism is illustrated in the following diagram:

CDC Log Reading Concept

The following diagram illustrates the log reading concept and the population of the CDC bookmark table:

For the purposes of this document, we will use a simplified representation of a log position (in reality, this can differ quite significantly from one database to another). Here we show that each operation has a corresponding log position. In this case, ‘insert a’ corresponds to log position A01, ‘insert b’ corresponds to log position A02, and so on.

When CDC is replicating and applies data to the target database, it also writes an entry into the bookmark table (within the same commit). The example above illustrates that CDC has replicated all four inserts, and thus the bookmark contains the log position of the last operation (in this case ‘insert d’).

The above oversimplifies the bookmark. In actuality, multiple items make up the CDC bookmark. As it pertains to CDC in a DR environment, there are two key aspects that you need to be aware of:

1. The last applied log position (which was illustrated above)
2. The earliest open transaction log position

When CDC reads data from the logs, it first builds the transactions of the source system and does not send them to the target system until a complete transaction is built. The earliest open log position keeps track of the log position for the start of a transaction that CDC has started to process (read from the logs) but for which it has not yet seen an end of transaction. This is an important concept for DR because, when CDC restarts replication, it may need to go back to the earliest open log position. This is another good reason to follow the best practice of avoiding large and long-running transactions.
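The relationship between the two bookmark positions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Bookmark` class, the `restart_scan_position` function, and the `A01`-style labels are hypothetical and mirror the simplified log positions used in this document, not the real CDC bookmark format.

```python
from dataclasses import dataclass
from typing import Optional


def parse_position(label: str) -> int:
    """Turn a simplified label such as 'A04' into the integer 4 (illustrative only)."""
    return int(label[1:])


@dataclass
class Bookmark:
    # Hypothetical, simplified stand-in for the two bookmark items discussed above.
    last_applied: str                     # last log position committed on the target
    earliest_open: Optional[str] = None   # start of the oldest transaction still open


def restart_scan_position(bookmark: Bookmark) -> str:
    """Return the log position from which log reading would have to resume."""
    if bookmark.earliest_open is None:
        return bookmark.last_applied
    # Go back to whichever of the two positions is older.
    return min(bookmark.last_applied, bookmark.earliest_open, key=parse_position)


# No open transaction: resume from the last applied position.
print(restart_scan_position(Bookmark(last_applied="A04")))
# A transaction opened at A02 is still incomplete: reading must go back to A02.
print(restart_scan_position(Bookmark(last_applied="A04", earliest_open="A02")))
```

The second call shows why long-running transactions matter for DR: the older the earliest open position, the further back in the log a restart has to reach.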
CDC Install/Instance creation on backup (DR) system

How CDC is installed and how the instance should be created depend on the DR replication solution being used. If you are using physical replication to your DR machine, it is best to include the file system that CDC is installed on in the mirroring to the DR site as well. In this case, no additional install or instance creation is required on the DR system.

In all other scenarios, where the file system that CDC is installed on is not replicated to the DR system, you are required to do an additional install of CDC on the DR system and to create the CDC instance there as well.

CDC Fail-over considerations

The following items need to be considered when CDC is failed over to another system:

1. IP address used to reach CDC
2. CDC configuration metadata (stored in an internal database)
3. CDC operational metadata (stored in the client database), most importantly the CDC bookmark table

When connecting to a CDC instance, an IP address is specified. When dealing with a DR scenario, it is easiest to use a virtual IP address, in which case no additional steps are required. If you cannot use a virtual IP address, you will need to change the IP address in the CDC configuration. Instructions on how to change the CDC IP address can be found in ‘Changing CDC IP Address’.

PRODUCTION SYSTEM BEING RECOVERED

Different DR Recovery Scenarios

For now, let’s consider that CDC is installed locally on the source and target database servers. In the following diagram, the active production CDC replication is indicated by the solid arrow.
The dashed arrows numbered 1 to 3 represent possible CDC replication paths after the source production server fails and is switched over to a backup server, after the target production server fails and is switched over, or after both the source and target production servers fail.

Overview of the Three Types of Disaster Recovery Replication Methods

The procedures for recovery of CDC to a DR site differ depending on the type of DR replication method used. Following are descriptions of the three main DR replication methods:

Synchronous Physical DR Replication Method

Although the replication may not be instantaneous, synchronous physical replication ensures that the data on the production and DR systems remains synchronized in the event of a failure. Note that in the above example the DR log always contains the exact same image as the Production log.

When using synchronous physical DR replication, you can replicate only the database, or replicate both the database and the CDC instance directory. It is recommended that you also replicate the CDC instance directory. By also replicating the CDC instance directory, you will not need to do a separate install on the DR system, and the CDC internal metadata will be kept in sync.

Asynchronous Physical DR Replication Method

Using asynchronous physical replication means that the DR system may have an image from an earlier point in time than the production system. Note in the above example that the log on the DR system is at a different point in time than the log on the production system.

When using asynchronous physical DR replication, you can replicate only the database, or replicate both the database and the CDC instance directory. It is recommended that you also replicate the CDC instance directory.
By also replicating the CDC instance directory, you will not need to do a separate install on the DR system, and the CDC internal metadata will be kept in sync. Even though there is some latency with asynchronous physical replication, the CDC metadata rarely changes (only when configuration changes are made), so the CDC metadata on the DR site should always be in sync.

Asynchronous Logical DR Replication Method

Using asynchronous logical DR replication makes recovery of CDC significantly more complex than using physical DR replication. As illustrated in the example above, the asynchronous nature means that the DR log will be at an older position than the production system. The real complication with logical replication, however, is that the DR system has a different log. The log positions on the DR system do not correspond to the log positions on the primary production system. This means that the CDC bookmark stored on the target side cannot be used to restart replication.

SYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION

Using a synchronous physical replication method to mirror the production system to a DR system is the best (simplest) approach when CDC is being used to replicate data from or to those systems. The scenario is illustrated in the following diagram:

In the above example, depicted by the wide arrows, whole-system physical replication is used between the production and DR servers. It is also key that a synchronous mode is used for the replication. DBMS-level synchronization between the production and DR systems is handled by the DR solution. It is also ideal to use the physical replication to mirror the CDC instance directory.
If you are using synchronous replication for the DBMS but not for the CDC instance directory, please see section ‘Keeping CDC Instance Synchronized’.

If you do use physical mirroring for the CDC instance, fail-over is straightforward because the entire CDC instance and database have an exact copy on the DR system. As a result, there are no CDC-specific considerations about log position and data availability for disaster recovery in this scenario.

The only CDC consideration is dealing with an IP address change. Using a virtual IP address is recommended and simplifies the process. If a virtual IP address cannot be used, you will need to follow the procedures outlined in the ‘Changing CDC IP Address’ section of this document.

Following is a high level flow diagram to illustrate the choices for synchronous physical DR replication:

ASYNCHRONOUS PHYSICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION

Using asynchronous replication increases the complexity of a CDC DR solution considerably, because there are multiple different recovery situations that you may need to deal with. The following section outlines each of these scenarios, using diagrams to illustrate the concepts.

The above diagram illustrates some of the possible outcomes at the point of failure. Relative to the source (which may have failed over), the data replicated to the target can be ahead, equal, or behind. For instance, in the above diagram, if only the source production system switched over to the DR box, the production target database is ahead of the source and contains data that does not exist on the source. If only the target production machine switched over to the DR target, then the data on the DR target would be behind the production source, which is the easy situation to deal with.
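The three possible outcomes can be expressed as a small comparison. The sketch below is illustrative only: it assumes log positions have already been reduced to comparable integers (for example, A04 becomes 4), and the function names are hypothetical rather than part of CDC.

```python
def classify_target(target_position: int, source_position: int) -> str:
    """Classify the target bookmark position relative to the source log (simplified)."""
    if target_position > source_position:
        return "ahead"
    if target_position == source_position:
        return "equal"
    return "behind"


def recovery_case(target_position: int, source_position: int) -> str:
    """Map the classification onto the two recovery cases described in this document."""
    if classify_target(target_position, source_position) == "ahead":
        return "Case 2: CDC Target Ahead of Source"
    return "Case 1: CDC Target Same or Behind Source"


# Target applied up to position 4, but the failed-over source only reaches position 2.
print(classify_target(4, 2))
print(recovery_case(4, 2))
print(recovery_case(2, 4))
```

In practice the positions are database specific (an SCN or an LSN, for example), so the integer comparison stands in for whatever ordering the source DBMS defines.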
Refer to section ‘How To Determine If Log Position Valid After Switch-Over’ for information on determining which scenario you fall into after a failover.

Lastly, the considerations for CDC instance synchronization and IP address changes are the same for asynchronous physical DR replication as for synchronous physical DR replication, so they are not repeated here.

Asynch Physical DR Replication Case 1: CDC Target Same or Behind Source

If you are using asynchronous physical DR replication and the CDC target is the same as or behind the source log after switchover, then you have a straightforward recovery case. Below are two examples of situations which match this scenario:

In the first example, the source switches over to the DR site. In this case, the DR source system has more recent data than the production target. In the second example, the production target switches over to the DR target, and again, the source has more recent data than the target.

In both of these examples, since the data in the target is older than what is available on the source, no special CDC consideration is required. When you restart CDC, it goes back in the log based on the log position stored in the CDC target bookmark and restarts replication as per normal operating behavior.

Asynch Physical DR Replication Case 2: CDC Target Ahead of Source

If you are using asynchronous physical DR replication and, after switchover, the CDC target is ahead of the source log, then recovery is significantly more involved. The following diagram illustrates this case:

In the above example, CDC had replicated the data up to log position A04. However, the asynchronous source DR replication had only replicated up to the point in time of log position A02.
Thus, at the point in time of the fail-over to the DR source, the DR source is missing data that has already been applied to the target system. Because of this, the bookmark on the target system is invalid, as its log position doesn’t exist on the source system.

This situation is one you want to avoid at all costs. If, for instance, you know that the asynchronous DR replication will be at most 2 minutes latent, one way that you may be able to prevent this situation is by creating a CDC target user exit that delays the apply by a set amount of time (in this case, for example, 3 minutes). There is a sample user exit available on developerWorks, located here: https://www.ibm.com/developerworks/community/files/app/file/f047a38c-734a-4071-8a3c4fe37c85baeb

If you need to deal with the situation in the example above, then you are out of sync and need to make a business decision on how to move forward. Here are some possible options:

1) Reconcile the source DR system with the latest changes that were replicated to the target but missed by the asynchronous DR replication solution
   • In this case, the operations/transactions that were not captured by the DR replication solution need to be repopulated to the source DR location
   • This reconciliation needs to be performed before restarting the source applications on the DR site
   • Once the reconciliation is complete, you can mark the table capture point
   • Next, you can allow access to the source applications on the DR site
   • Lastly, you would restart CDC replication
   Note: if everything was not perfectly reconciled, you may have replication errors and CDC would shut down.

2) Perform a refresh of any table that has “newer” data than is available on the source. If you don’t know which tables those are, you could mark the table capture point (before allowing users on the production system) and start replication.
Then, if CDC encounters an issue applying to a table and shuts down, you will need to refresh that table and start mirroring again. You may have to do this iteratively until all out-of-sync tables have been refreshed.

3) If your business decision is to just continue replication and leave the additional data on your target system, then you cannot continue to use standard mirroring, as it would fail as soon as an operation hit data it did not expect (for instance, with a duplicate key error). As such, you would have to mark the table capture point to reset the starting position to ‘now’. You would also have to enable Conflict Detection and Resolution (CDR) on all tables and set the conflict method to “Source Wins”, so that CDC continues replication when a conflict occurs and the value from the source is used. For example, if there was an insert on the source and the record already existed in the target, CDR would change the insert to an update, and the final image would be the same as on the source system. Note that using CDR has a significant performance impact, so if your data volumes are high, this may not be a feasible option.

A high level summary of decision flow is as follows:

ASYNCHRONOUS LOGICAL REPLICATION METHOD USED FOR DR SYNCHRONIZATION

As mentioned in the DR method overview section, using asynchronous logical DR replication results in the most complex recovery of CDC. The following diagram will be used to illustrate the scenarios:

In case ①, when the target fails, you are not required to do any special processing (beyond ensuring that the CDC instance configuration is up to date) and can just restart CDC replication, which will pick up from the last applied entry.
In cases ② and ③, the DR source has a different log, and the log positions do not correspond to the log positions found on the original production source. There is also the added potential complexity that the Object IDs are different on the DR source. Additionally, this example illustrates that the CDC target may be ahead of the DR replication, so there is data on the target that does not exist on the source.

The techniques to deal with the newer data are the same as those outlined for asynchronous physical DR replication.

Dealing with the potential difference in Object IDs requires the table mappings to be reconfigured. The simplest way to accomplish this is by doing an export on the production system and an import on the DR system. Note that after you import the table mappings, the tables will be marked for refresh and the bookmark position will be ‘reset’. The bookmark position is not valid since it corresponds to another log. You will need to change from refresh to mirroring, and you will have to mark the table capture point to set an appropriate bookmark starting position.

So we are left with dealing with the fact that the DR system has a different log. To simplify the procedures, it is best to ensure that the CDC target is never ahead of the DR source from a log processing perspective. Additionally, when you restart, you would either refresh tables or use an adaptive apply technique. If this isn’t possible, the following procedure is one way to mitigate this factor. However, it adds significant complexity and points of failure, so using this procedure would not be a best practice:

• Note that this procedure will only work for databases and DR replication technologies where CDC is able to run replication off the active DR database
• You will need to create an additional table, referred to here as CDC_Restart.
This table will contain an ID and a timestamp column
• The CDC_Restart table will be replicated by CDC from the Production Source to the Production Target using an adaptive apply method. You will need to create a separate subscription for replicating this table.
   o On the target, this table will have an additional column to hold a CDC bookmark
• The CDC_Restart table will also be replicated to the DR Source via the asynchronous logical DR replication.
• Lastly, the CDC_Restart table will also be replicated by CDC from the DR Source to the Production Target, using standard replication. The table on the target will be called CDC_DR_Restart. You will need to create a separate subscription for replicating this table. On the DR Source instance, keep only the subscription mirroring the CDC_Restart table active; all the other subscriptions must be kept idle in a Refresh/parked state.
   o On the target, the CDC_DR_Restart table will have an insert trigger on it. The trigger will call the DMSHOWBOOKMARK command and will then insert and/or update the corresponding record in the CDC_Restart table (which is also on the target system).
• You will need to create a script that runs on the Production Source and inserts data into the CDC_Restart table at a set interval (e.g., every 5 seconds)
• When a failover happens, you will need to check for the latest entry in the CDC_Restart table that contains a bookmark. You would then use this bookmark to set the bookmark on the source system and restart replication, using CDR to deal with conflicts, as some data will be replayed.

The following two diagrams illustrate the scenarios where the CDC replication is ahead of the DR replication and where it is behind the DR replication:

A high level summary of decision flow is as follows:

Note that the above is very simplified, as the ‘Mark Correct log position’ step can involve the complex setup that was listed previously.
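Two parts of the CDC_Restart procedure described earlier in this section, the periodic heartbeat insert and the lookup of the latest entry that carries a bookmark, might be sketched as follows. This is a minimal illustration, not a tested DR implementation: it uses an in-memory SQLite database as a stand-in for the real source and target databases, collapses the source and target copies of CDC_Restart into one table, and simulates the target-side trigger with plain UPDATE statements. The column names (`id`, `ts`, `bookmark`) and the bookmark strings are assumptions.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional, Tuple


def create_restart_table(conn: sqlite3.Connection) -> None:
    """Create a stand-in CDC_Restart table (column names are assumptions)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS CDC_Restart ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, bookmark TEXT)"
    )


def insert_heartbeat(conn: sqlite3.Connection, heartbeat_id: int) -> None:
    """Insert one heartbeat row; a scheduler would call this at a set interval."""
    ts = datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT INTO CDC_Restart (id, ts) VALUES (?, ?)", (heartbeat_id, ts))
    conn.commit()


def latest_bookmarked_entry(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Return (id, bookmark) of the newest row that has a bookmark, or None."""
    return conn.execute(
        "SELECT id, bookmark FROM CDC_Restart "
        "WHERE bookmark IS NOT NULL ORDER BY id DESC LIMIT 1"
    ).fetchone()


conn = sqlite3.connect(":memory:")
create_restart_table(conn)
for i in (1, 2, 3):
    insert_heartbeat(conn, i)

# Simulate the target-side trigger having recorded bookmarks for rows 1 and 2 only;
# row 3 had not yet arrived through the DR path when the failure happened.
conn.execute("UPDATE CDC_Restart SET bookmark = ? WHERE id = ?", ("bookmark-1", 1))
conn.execute("UPDATE CDC_Restart SET bookmark = ? WHERE id = ?", ("bookmark-2", 2))

print(latest_bookmarked_entry(conn))
```

The row returned is the restart candidate: every heartbeat after it has no bookmark yet, which is why some data is replayed and CDR is needed on restart.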
KEEPING CDC INSTANCE SYNCHRONIZED

The ideal case is to use physical replication to mirror the CDC instance directory to the DR machine (this would not be applicable to an MS Windows environment). If physical replication of the CDC instance directory is not possible, then you will have to perform the following:

1. You must install an instance of CDC on the DR machine
2. You will need to copy over the CDC metadata from the production server initially, and then any time that configuration changes happen (such as a table mapping or table definition change)
   a. For each CDC instance in Production, run <install_dir>/bin/dmbackupmd -I <instance>
   b. Stop the Backup Server CDC instance, and remove all md*.* files from the <install_dir>/instance/<instance>/conf directory
   c. Copy the md$1.wal and md.dbn files created in step a in <install_dir>/instance/<instance>/conf/backup to each instance in DR

You need to be aware that CDC maintains a linkage between the instance operational metadata in the user database and the configuration metadata by means of an instance ID value. At instance start, the instance ID values stored in each are compared. If they do not match, an error is generated and the instance fails to start. Thus, it is crucial that the metadata is always properly synchronized. If you start up and get the mismatch error, you will need to call IBM L2 support to get past this.

CHANGING CDC IP ADDRESS

As mentioned previously, it is recommended that a virtual IP address is used across the production and DR servers, as this will be transparent to CDC upon switchover (other than that you will be required to restart Access Server due to caching of the DNS name). However, if using a virtual IP address is not possible, the following steps must be taken upon switchover to update the CDC instance with a new address.
If it is only the CDC source instance that is switching over, you only need to do steps 1 to 5. If you have switched over the CDC target instance, then you also need to do steps 6 to 12 for all related subscriptions.

If you are only dealing with a CDC source switch-over, and the version of CDC you are using is IIDR 10.2 or newer, then you can simplify the procedure by using the following CHCCLP commands to replace steps 1 to 5 above:

    modify access datastore name ds_name hostname new_hostname;
    modify subscription name sub_name tcphostname new_hostname;

HOW TO DETERMINE IF LOG POSITION VALID AFTER SWITCH-OVER

If you are using asynchronous physical DR mirroring, the easiest way to know if the target is ahead of the source is to start CDC replication (before making the source system available to users). If, upon restart, the bookmark position is not found, the target is ahead of the source.

Since the CDC bookmark is specific to the source database, there are different procedures to determine the right bookmark, and for some databases this can only be done with the help of IBM L2 support. The following steps outline the procedure for some databases.

1. Start with checking the target subscription bookmark position
   a. From the target, run <install_dir>/bin/dmshowbookmark -I <instance> -s <subscription>
   b. Copy the resultant bookmark over to the source system
2. Check the log position (the SCN, for Oracle) of the target bookmark
   a. From the CDC source system, run <install_dir>/bin/dmdecodebookmark -I <instance> -b <bookmark information from step 1>
3. Check the current log position from the source DBMS
   a. From the source DBMS, run the SQL or command for the specific DBMS:
      Oracle: select current_scn from v$database;
      DB2:    db2pd -d <db name> -logs | grep "Current LSN"
4. If the target bookmark’s current log position is larger (newer) than the source log position, the target has additional data that does not exist on the source.
   a. You would need to follow the directions from section ‘Asynch Physical DR Replication Case 2: CDC Target Ahead of Source’
   b. From the source, before allowing users back on the database, run <install_dir>/bin/dmmarktablecapturepoint -I <instance> -s <subscription> -A

ADDITIONAL RECOVERY ASPECTS IF YOUR TARGET IS NOT A DATABASE

If your CDC target is not a database, then the CDC bookmark is not stored in the customer target database but rather in the internal CDC metadata. This aspect requires special consideration for DR of the CDC target system. The following CDC engines are examples of ones that have this characteristic:

• CDC Event Server
• CDC for DataStage if using FlatFile

Since the bookmark is stored in the CDC instance directory, it is very important to ensure that the CDC target instance directory is mirrored to the DR target system for easier recovery.

The other approach is to use the dmbackupmd command on a very regular basis and ensure that the backup is available for restore on the backup target system. Note that with this technique the stored bookmark will be at a point in time behind what has already been applied, so when replication is restarted, you will “replay” some transactions.
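The replay effect of restoring an older metadata backup can be illustrated with a short sketch. It is only a model under stated assumptions: log positions are reduced to integers, and the `replayed_operations` function and the sample log are hypothetical, not CDC output.

```python
from typing import List, Tuple


def replayed_operations(
    log: List[Tuple[int, str]],
    restored_position: int,
    last_applied_position: int,
) -> List[str]:
    """Operations that were already applied but will be sent again after the restore.

    restored_position is the bookmark held in the restored metadata backup;
    last_applied_position is how far the target had actually been updated.
    """
    return [
        op for pos, op in log
        if restored_position < pos <= last_applied_position
    ]


# Simplified log: (position, operation)
log = [(1, "insert a"), (2, "insert b"), (3, "insert c"), (4, "insert d")]

# The backup was taken when the bookmark was at position 2, but the target had
# already applied everything up to position 4 before the failure.
print(replayed_operations(log, restored_position=2, last_applied_position=4))
```

The more frequently the metadata backup is taken, the smaller this replay window becomes, at the cost of running the backup more often.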