
Database

IMPLEMENTING ORACLE9I DATA GUARD
Michael New, Gradation LLC ([email protected])
Paper #36226

Abstract
Specific Data Guard Environment Presented
Overview Of Data Guard Concepts
    Operational Requirements
    Data Guard Architecture
Best Practices For Data Guard Configurations
    Implement Backup and Recovery Strategy First
    Set Appropriate Database Protection Mode
    Use FORCE LOGGING Mode
    Use Identical Primary And Secondary Sites
    Recommended Initialization Parameter Settings
    Recommended Oracle Network Structure
    Consider Standby Redo Logs
    Monitor Data Guard Configuration
    Consider Data Guard Broker
Create Data Guard Environment
    Implement FORCE LOGGING Mode (Pri)
    Create Identical Primary & Secondary Sites (Pri+Sec)
    Set Initialization Parameters - Use SPFILE (Pri+Sec)
    Create Oracle Network Configuration Files (Pri+Sec)
    Change To ARCHIVELOG Mode With Automatic Archiving (Pri)
    Backup Primary Database (Pri)
    Create Standby Control File (Pri)
    Copy Backup To Standby Site (Sec)
    Recover Standby Database (Pri,Sec)
    Enable Data Guard Broker (Pri+Sec)
Appendix 1: SPFILEs
    SPFILE for PROD1 on node1
    SPFILE for PROD1 on node2
    SPFILE for PROD2 on node1
    SPFILE for PROD2 on node2
Appendix 2: Network Configuration Files
    listener.ora for node1
    listener.ora for node2
    tnsnames.ora for node1
    tnsnames.ora for node2
    tnsnames.ora for Non-Privileged Clients
    tnsnames.ora for DBA Clients
Appendix 3: Monitoring Scripts for Standby Databases
    crontab For Monitoring Scripts
    STBY_gap_check.ksh
    setup_ora_STBY
    recovery_mode_check.sql
    archive_log_gap_check.sql
    STBY_integrity_check.ksh
    unrecoverable_change.sql
    STBY_gap_check.ksh Email Output
    STBY_integrity_check.ksh Email Output
    begin_managed_standby.ksh
    recover_managed_standby.ksh

ABSTRACT
Oracle has introduced many new features in Oracle9i Data Guard to enhance Oracle8i standby database functionality. This white paper covers how to implement Oracle9i Data Guard.
An actual production Data Guard environment is presented: a bidirectional physical standby database configuration between two Red Hat Linux 7.1 servers running two separate databases. However, the concepts presented apply to any platform. Step-by-step procedures and actual configuration files demonstrate a working Data Guard implementation.[1]

SPECIFIC DATA GUARD ENVIRONMENT PRESENTED
The environment presented here is taken from an implementation of Data Guard at a customer site. It was a standard bidirectional Data Guard configuration built for two large mission-critical data warehouses on two separate Linux servers, as shown in Figure 1 below:

    node1                        node2
    Primary database: PROD1      Primary database: PROD2
    Standby database: PROD2      Standby database: PROD1

Figure 1: Standard bidirectional Data Guard configuration implemented.

The implementation created a standby database on server node2 for an existing database, PROD1, on node1. Similarly, it created a standby database on server node1 for a separate existing database, PROD2, on node2. Because one implementation was the mirror image of the other, the generic steps to build each standby were the same. The PROD1 standby on node2 was built first, followed by the PROD2 standby on node1. Whenever specific commands or values are indicated here, they are for the PROD1 standby implementation on node2.

Each server was running a separate instance of Oracle 9.2.0.1.0 configured for dedicated server on a Red Hat Linux 7.1 platform. Each server was a Dell PowerEdge 2550 with two 1000 MHz P3 processors and 4 GB RAM. The two servers were located on a WAN, and the network provided high throughput between the servers. These servers were located in different geographical locations, thereby providing disaster recovery. The primary database on one server had its standby database on the other server to make efficient use of each system with no idle hardware.
If either primary database became incapacitated, the physical standby database at the other location could be failed over to the primary role so processing could continue.

[1] For questions or issues regarding this material, please feel free to contact me at [email protected].

OVERVIEW OF DATA GUARD CONCEPTS
Data Guard is software that maintains a standby database, or real-time copy of a primary database. Data Guard is an excellent High Availability (HA) solution, and can be used for Disaster Recovery (DR) when the standby site is in a different geographical location than the primary site. When the sites are identical, and the physical location of the production database is transparent to the user, the production and standby roles can easily switch between sites for many different types of unplanned or planned outages.[2]

Oracle Data Guard manages the two databases by providing remote archiving, managed recovery, switchover and failover features. A secondary site that is identical to the primary site allows predictable performance and response time after failing over or switching over from the primary site. An identical secondary site also allows for identical procedures, processes, and management between sites. The secondary site is leveraged for all unplanned outages not resolved automatically or quickly on the primary site, and for many planned outages when maintenance is required on the primary site.

Data Guard with a physical standby database provides benefits that fall into two broad classes:

• Availability and disaster protection - Provides protection from human errors, data failures, and physical corruptions due to device failure. Provides switchover operations for primary site maintenance, and different database protection modes to minimize data loss or to create no-data-loss environments.
  A specified delay of redo application at the standby database can be configured so that a logical corruption or error, such as dropping a table, can be detected before the change is applied to the standby database. Using the standby database, most database failures are resolved faster than by using on-disk backups since the amount of database recovery is dramatically reduced. The standby database can be geographically separate from the primary database, a feature that provides Disaster Recovery against local catastrophic events. Data Guard, therefore, provides a higher degree of availability than other HA methods that do not employ a second database, such as Real Application Clusters (RAC) or Highly Available Disk Arrays (HADA).

• Manageability - Provides a framework for remote archiving services and managed standby recovery, contains role management services such as switchover and failover, and allows offloading of backups and read-only activities from the production database. The Data Guard broker provides the Data Guard Manager GUI and a command-line interface to automate the management and monitoring of the Data Guard environment.

OPERATIONAL REQUIREMENTS
Below are operational requirements for maintaining a standby database. Some of these requirements are more lax than Data Guard best practices would dictate (see Best Practices For Data Guard Configurations below):

• The primary database must run in ARCHIVELOG mode.

• The primary and standby databases must be the same database release. To use the Data Guard broker, the database server must be licensed for Oracle9i Enterprise Edition or Personal Edition. The operating system on the primary and standby sites must be the same, but the operating system release does not need to be the same. The hardware and operating system architecture on the primary and standby locations must be the same.
  For example, a Data Guard configuration with a primary database on a 32-bit Linux system must be configured with a standby database on a 32-bit Linux system.

• The primary database can be a single instance database or a multi-instance Real Application Clusters database. The standby databases can be single instance databases or multi-instance Real Application Clusters databases, and these standby databases can be a mix of both physical and logical types.

• If using a physical standby database, log transport services must be configured to specify a dedicated server process rather than a shared server (dispatcher) process in managed recovery mode. Although read-only mode allows a shared server process, you must have a dedicated server once you return the database to managed recovery mode.[3]

• The hardware (for example, the number of CPUs, memory size, storage configuration) can be different between the primary and standby systems.

• Each primary database and standby database must have its own control file.

• If you place your primary and standby databases on the same system, you must adjust the initialization parameters correctly.

[2] Much of the material in this section is taken from Oracle9i Data Guard Concepts and Administration.
[3] Oracle9i Data Guard Concepts and Administration, Section 5.1 Introduction to Log Transport Services. This requirement is easy to miss; it is referenced only in a Note in that section.

DATA GUARD ARCHITECTURE
Oracle9i Data Guard uses several processes to achieve the automation necessary for disaster recovery and high availability:

• On the primary location, log transport services use the following processes:
  - Log writer process (LGWR) - This process collects transaction redo and updates the online redo logs.
  - Archiver processes (ARCn) - These processes create a copy of the online redo logs, either locally or remotely for standby databases.
  - Fetch archive log (FAL) process (physical standby databases only) - This process provides a client/server mechanism for resolving gaps detected in the range of archived redo logs generated at the primary database and received at the standby database. The FAL client requests the transfer of archived redo log files automatically when it detects a gap in the redo logs received by the standby database. The FAL server typically runs on the primary database and services the FAL requests coming from the FAL client. The FAL client and server are configured using the FAL_CLIENT and FAL_SERVER initialization parameters, which are set on the standby location.

• On the standby location, log transport services use the following processes:
  - Remote file server (RFS) - This process receives redo logs from the primary database.
  - Archiver process (ARCn) - This process archives the standby redo logs when standby redo logs and LGWR are used.

• On the standby location, log apply services use the following processes:
  - Managed recovery process (MRP) - For physical standby databases only, the MRP applies archived redo log information to the physical standby database.
  - Logical standby process (LSP) - For logical standby databases only, the LSP applies archived redo log information to the logical standby database, using SQL interfaces.

• On the primary and standby locations, the Data Guard broker uses the following processes:
  - Data Guard broker monitor (DMON) process - These processes work cooperatively to manage the primary and standby databases as a unified configuration. The DMON processes work together to execute switchover and failover operations, monitor the status of the databases, and manage log transport services and log apply services.

Figure 2 identifies the relationships of these processes to the operations they perform and the database objects on which they operate in the absence of the Data Guard broker.

Figure 2: Data Guard architecture.
In this figure, the standby redo logs are optionally configured for physical standby databases, except when running in Maximum Protection mode, which requires physical standby databases and standby redo logs. Logical standby databases do not use standby redo logs.

LOG TRANSPORT SERVICES
The log transport services component of the Data Guard environment is responsible for automatic archiving of primary database online redo logs. Log transport services provide for the management of archived redo log permissions, destinations, transmission, reception, and transmission failure resolution. In a Data Guard environment, the log transport services component coordinates its activities with log apply services and role management services for switchover and failover operations.

LOG APPLY SERVICES
Log apply services automatically apply archived redo logs to maintain transactional synchronization with the primary database, and allow transactionally consistent read-only access to the data. The main difference between physical and logical standby databases is the manner in which log apply services apply the archived redo logs. For physical standby databases, log apply services maintain the standby database in managed recovery mode or open read-only mode. For logical standby databases, log apply services maintain the standby database in SQL apply mode. The following list summarizes these modes:

• Managed recovery mode (physical standby databases only) - In this mode, log transport services archive logs to the standby site, and log apply services automatically apply these logs. If you want maximum protection against data loss or corruption, then maintain the standby database in managed recovery mode in a Data Guard environment.

• Read-only mode (physical standby databases only) - Use read-only mode for supplemental reporting of data contained in the primary database.
  If you want to use the standby database for reporting purposes, then open it in read-only mode in a Data Guard environment. Log apply services cannot apply archived redo logs to the standby database when it is in this mode, but you can still execute queries on the database. While the standby database is in read-only mode, it can continue to receive archived redo logs from the primary database.

• SQL apply mode (logical standby databases only) - Log apply services manage logical standby databases in SQL apply mode only. Only logical standby databases can be opened in read/write mode, but the target tables for the regenerated SQL statements are available only in read-only mode for reporting purposes. SQL apply mode supports the application of SQL statements and reporting activities simultaneously on the logical standby database.

ROLE MANAGEMENT SERVICES
A database operates in one of two mutually exclusive roles: primary or standby. Role management services operate in conjunction with log transport services and log apply services to change these roles dynamically, either as a planned transition called a switchover operation, or as a result of a database failure through a graceful failover or a forced failover operation. To summarize, here are the types of role transition:

• Database switchover - Switchover operations provide the means to transition database roles from a primary role to a standby role and vice versa.

• Graceful (no-data-loss) failover - A no-data-loss failover is possible if the corresponding primary database is operating in either the Maximum Protection or Maximum Availability data protection mode.

• Forced (minimal-data-loss) failover - Minimal-data-loss failover occurs when primary database modifications are not yet available on the standby site at the time the failover operation occurs. Minimal data loss is possible during a forced failover when operating in Maximum Performance mode.
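The role transitions summarized above map onto a small set of SQL statements for a physical standby. The following is a minimal sketch assuming Oracle9i Release 2 syntax, run from SQL*Plus with SYSDBA privileges; the site on which each statement runs is noted in the comments. It outlines the sequence under those assumptions and is not the exact procedure used at the customer site. The switchover and failover parts are alternatives, not one script to run from top to bottom.

```sql
-- SWITCHOVER (planned role reversal) for a physical standby.
-- Step 1, on the current primary: check that a switchover is currently possible.
SELECT switchover_status FROM v$database;

-- Step 2, on the current primary: convert it to the standby role, then remount.
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;
SHUTDOWN IMMEDIATE
STARTUP NOMOUNT
ALTER DATABASE MOUNT STANDBY DATABASE;

-- Step 3, on the original standby: take over the primary role and restart.
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
SHUTDOWN IMMEDIATE
STARTUP

-- Step 4, on the new standby: resume managed recovery in the background.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

-- FAILOVER (unplanned), run on the standby only.
-- Graceful: apply all available redo, then assume the primary role.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

-- Forced: use instead of the two statements above when they cannot complete.
-- Any redo not yet applied to the standby is lost.
-- ALTER DATABASE ACTIVATE STANDBY DATABASE;
```

After a failover, the old primary is no longer part of the configuration and would have to be rebuilt as a standby from a backup of the new primary.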
You should not fail over to a standby database except in an emergency, because the failover operation is an unplanned transition that may result in data loss. If you need to fail over before the primary and standby databases are resynchronized, and if the standby database becomes inaccessible, data on the primary database may diverge from data on standby databases. You can prevent this by using a combination of forced logging (to force the generation of redo records of changes against the database), standby redo logs, and the Maximum Protection mode with physical standby databases. The amount of data divergence or data loss you incur during a failover operation is directly related to how you set up the overall Data Guard configuration, and log transport services in particular, and can be prevented entirely by using Maximum Protection mode.

BEST PRACTICES FOR DATA GUARD CONFIGURATIONS
A robust Data Guard standby database setup is essential to ensuring that both databases work properly during switchover, failover and switchback operations. To this end, the following is a summary of the recommended Data Guard configuration options. The exact options and parameters specified are those implemented in the environment discussed below under Create Data Guard Environment.

A number of these best practice recommendations are attributed to an excellent Oracle White Paper called Maximum Availability Architecture (MAA).[4] However, an important distinction between the environment presented in the MAA White Paper and the one here is that the former recommends Maximum Protection mode, whereas the customer here chose Maximum Performance mode. So while most of these options are best practices for any type of Data Guard configuration, a few options were affected by the choice of Maximum Performance mode, and are so noted. Examples of these options are several LOG_ARCHIVE_DEST_2 attributes, and choosing not to use standby redo logs (SRLs).
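To make the dependence on protection mode concrete, the sketch below shows how a remote archive destination might be declared for Maximum Performance mode, with the stricter alternative shown in comments. The net service name PROD1_STBY and the REOPEN value are hypothetical, and the statements assume an instance started from an SPFILE. The attributes are the Oracle9i Release 2 LOG_ARCHIVE_DEST_n attributes discussed under Set Appropriate Database Protection Mode.

```sql
-- Maximum Performance: LGWR ships redo asynchronously and does not wait for
-- the standby to confirm the disk write.
-- PROD1_STBY is a hypothetical net service name pointing at the standby.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=PROD1_STBY LGWR ASYNC NOAFFIRM REOPEN=60' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_state_2 = 'ENABLE' SCOPE=BOTH;

-- Stricter alternative for Maximum Protection or Maximum Availability:
-- ALTER SYSTEM SET log_archive_dest_2 =
--   'SERVICE=PROD1_STBY LGWR SYNC AFFIRM' SCOPE=BOTH;

-- Declare the protection mode itself. This sketch assumes the statement is
-- issued on the primary while it is mounted but not open.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;

-- Confirm the mode in effect.
SELECT protection_mode FROM v$database;
```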
Some options not implemented here (such as SRLs) are presented for the sake of completeness, and are not necessarily inconsistent with Maximum Performance mode. See the MAA White Paper for settings more appropriate for Maximum Availability mode, or for more details on options not implemented in the particular environment presented here.

[4] Maximum Availability Architecture (MAA), July 2002, an Oracle White Paper.

While Maximum Protection mode is certainly the more robust choice for High Availability, the term best practices is relative and must be adapted to a particular client's environment and needs. In this case study, our customer had two separate data warehouse environments, both of which were recoverable from daily backups within the timeframe specified by their service level agreements. The benefit of increased database performance by using Maximum Performance mode outweighed the possibility of minimal data loss.

Data Guard is best implemented on Oracle Server 9.2.0.1 or higher. Oracle Server 9.0.x, particularly the base release, contains many bugs related to Data Guard, as well as to recovery and instance crash, many with Severe Loss of Service priority. Most of these bugs are fixed in Product Version 9.2.0.1, and a few are resolved in 9.0.1.x. So it is best to upgrade to the latest release of 9i. Following are some of the more serious Data Guard bugs found in Release 9.0:

• Bug No. 2083080 - ORA-1679 occurs when you activate the standby database, if you already opened the standby in read-only mode, and mounted it (by recovery).
• Bug No. 2151468 - Switchover from the standby database hangs, but works from the primary database.
• Bug No. 1034871 - BACKUP CONTROLFILE TO TRACE output is wrong if using a standby controlfile. In particular, read-only files may be listed when they should not be.
• Bug No. 1920673 - Standby database recovery may fail (end-of-redo error is possible) in delayed protection mode.
• Bug No. 1950279 - A network disconnect from a no-data-loss standby can corrupt primary online redo logs.

IMPLEMENT BACKUP AND RECOVERY STRATEGY FIRST

• Implement an efficient Backup and Recovery (B&R) architecture before setting up a standby database. A standby provides High Availability (HA) functionality. The bidirectional failover standby database design presented here is a common, practical, and stable HA configuration. However, good B&R mechanisms should be in place before implementing a standby.[5]

• It is extremely important to recognize the difference between HA and B&R functionality. The impetus for HA is to provide a high degree of access to database users. B&R, however, is concerned with the recovery of data in the event of physical or logical corruption. (A standby can prevent logical corruption, but only if the corruption is discovered before being propagated to the standby.) If no B&R mechanism exists, some failures would result in data loss despite a working standby.

• A standby is not considered a B&R solution because it may not always be available. Standby environments depend on multiple devices, which substantially increases the potential for hardware failure. It is common to experience problems in the network or the failover platform that prevent the transmission of archive logs from the primary to the standby. Naturally you do not want to shut down your primary production environment in these cases, thereby defeating the HA objective. Standby databases are also vulnerable as a B&R solution during failover and subsequent switchback. These operations require refreshing both primary and secondary to a current state. During these activities the standby capability is diminished or non-existent. Therefore, for the length of these periods, any concurrent hardware failure would result in data loss if no B&R mechanisms existed.

• When implementing backup and recovery, it is highly advantageous to use multiple, even overlapping, backup methods.
  This allows you to choose the most expedient mechanism to minimize downtime when database recovery is required. Experience shows that using a combination of binary and logical (i.e. database export) backups to tape and disk provides the maximum flexibility to recover from any failure scenario.

[5] Data Management Solutions Oracle High Availability Guidelines and Recommendations, M. Burke, 2003, ThinkSpark, LLP. General overview of Oracle-based High Availability techniques. Please contact ThinkSpark for copies and additional information.

SET APPROPRIATE DATABASE PROTECTION MODE
You can choose among three protection modes, with different logging options. Each mode has a different impact on availability, costs, data loss, performance, and scalability in your environment. Choose one of the following depending on your service level agreements:

• Maximum Protection mode with the LGWR SYNC AFFIRM option for an environment that requires no data loss and no divergence. Performance overhead is incurred.

• Maximum Availability mode with the LGWR SYNC AFFIRM option for an environment that needs no data loss but tolerates divergence when sites are temporarily inaccessible.

• Maximum Performance mode with the LGWR ASYNC (AFFIRM or NOAFFIRM) option for an environment that tolerates minimal data loss and divergence when sites are temporarily inaccessible. Performance overhead is minimized.

In terms of the log transport options listed above, the key difference between the Maximum Protection and Maximum Performance configurations is whether LGWR writes synchronously or asynchronously, respectively. For the environment presented here, Maximum Performance mode with LGWR ASYNC NOAFFIRM was chosen based upon client requirements.

USE FORCE LOGGING MODE
FORCE LOGGING mode, new in Oracle9i Release 2 (9.2), overrides any user operation that specifies NOLOGGING. This ensures that direct writes in the primary database are always logged and therefore propagate to the standby database.
Unlogged operations invalidate the standby database and may require substantial DBA intervention, including the need to propagate unlogged operations manually. Turn on FORCE LOGGING at the primary database before taking datafile backups for standby creation. Keep the database in FORCE LOGGING mode as long as the standby database is required. See Implement FORCE LOGGING Mode below for details on how to check a database's current logging mode, how to persistently change it to FORCE LOGGING, and for related advice when recreating the control file.

USE IDENTICAL PRIMARY AND SECONDARY SITES
An obvious premise of using identical primary and secondary sites is that the primary and standby databases are not located on the same server. That said, the idea behind identical configurations at each of the Data Guard sites is the interchangeability of primary and standby roles at each site. You can execute the same set of steps for backup, switchover and failover operations, which reduces maintenance complexity and minimizes the number of steps required, thereby reducing operator error. Ideally, the following are identical at primary and secondary sites.

IDENTICAL LOGICAL STRUCTURES
Logical structures include database-related files and other operating system (O/S) files. Hardware solutions usually involve a considerable investment in hardware or software. However, no such investment is necessary to implement identical file and directory structures. At the database level, datafiles, log files and control files should have matching names and paths, and use OFA naming conventions if at all possible. Archive directories should also be identical between sites, including size and structure. At the O/S level, maintain the same partition names, privileges, accounts, other software, and database and non-database directories and files. Duplicate logical structures allow for identical operational practices. The Data Guard environment covered in this paper used identical logical structures.
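One way to verify that the database-level structures match is to list file names at each site and compare the output. The queries below are a minimal sketch using standard dynamic performance views. Spooling the results at both sites and comparing them with a file comparison tool is an assumed workflow, not a step taken from this paper.

```sql
-- Run on both the primary and the mounted standby, then compare the listings.

-- Datafile names and sizes
SELECT file#, name, bytes FROM v$datafile ORDER BY file#;

-- Online redo log members
SELECT group#, member FROM v$logfile ORDER BY group#, member;

-- Control file copies
SELECT name FROM v$controlfile ORDER BY name;

-- Archive destination settings, which should also match between sites
SELECT name, value
  FROM v$parameter
 WHERE name IN ('log_archive_dest_1', 'log_archive_format', 'standby_archive_dest')
 ORDER BY name;
```

Any path that appears in one listing but not the other points to a structure that would need a file name conversion parameter or a manual step after a role change.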
IDENTICAL NETWORK CONFIGURATIONS

It is best to use the same Oracle and O/S network configurations at each site. This does not mean that the Oracle network configuration files can be identical – different server names at each site dictate that the HOST values in the network configuration files will differ, for example. But these network configuration files should have the same overall structure, i.e. the same number of listeners, and the same ports dedicated to the same functions at each site. Using the same network structure obviates the need to manually change the network configuration at each site when their roles switch. Instead, full Data Guard functionality can be built into the network configuration up-front. For instance, you can configure multiple listeners (and corresponding tnsnames.ora entries) at each site to handle normal SQL*Net connectivity requests as well as redo log traffic in either direction, depending upon the role of each site. The Oracle network configurations here were identically structured on both sites.

As with Oracle network configurations, O/S network infrastructures should be the same across all sites. This includes the number of NICs. You must ensure that the operating system network infrastructure between sites will support the redo traffic. Because the redo logs on the production database update the physical standby database, your network infrastructure must have sufficient bandwidth to handle the maximum redo traffic at peak load. Furthermore, network latency affects overall throughput and response time for OLTP and batch operations. Not only does a separate NIC at each site provide the required bandwidth for redo log transmission, it also guards against a single NIC point of failure. A loss of network communications between primary and backup systems due to a network card failure would prevent archive log transport from the primary to the standby.
Therefore, for redundancy purposes alone, you would do well to install two NICs in (or make a backup NIC available to) each server. For this Data Guard environment, redo traffic traveled on a 100GB dedicated backbone via a second NIC at each site, whereas regular database application traffic was directed through a 10GB network. The networks were distinguished by different HOST names in the Oracle network configuration files, which were resolved in the /etc/hosts file and routed to the appropriate NIC.

NEARLY IDENTICAL INITIALIZATION PARAMETER SETTINGS

For databases on both primary and secondary sites, initialization parameters can be nearly identical, including INSTANCE_NAME, SERVICE_NAMES and DB_NAME. This allows for full symmetry and ease of maintenance. The only difference between the sites should be the roles they play. Provided the primary and standby databases are on separate nodes and are logically identical, only four parameters need to differ between them. For a discussion of these four parameters, see Recommended Initialization Parameter Settings below.

Functionally, primary and secondary sites require defining many more than just four parameters differently. However, you can define parameters on the standby site that apply only to the primary site because the standby site will safely ignore them. Then, after a switchover or failover, when this standby site becomes the new primary site, these primary-only parameters, which are now required, take effect. The same is true for standby-only parameters defined on the primary site. Defining primary-only and standby-only parameters on both sites allows you to essentially build in a role reversal capability between these sites.

The server parameter file (SPFILE) makes the process of role reversal even easier. If you want to use OEM 9i to manage your Data Guard environment, the Data Guard broker employed by OEM actually requires an SPFILE.
Initialization parameters stored in an SPFILE are persistent, in that any changes made to the parameters while an instance is running can persist across instance shutdown and startup. This eliminates the need to manually update initialization parameters to make changes effected by ALTER SYSTEM statements persistent. This feature is useful for Data Guard environments because the four parameters that must be different between primary and secondary sites are all dynamic. So to accomplish a switchover or failover, you can persistently change these four parameter values at each site (perhaps in a script) without needing to manually update any init.ora files as follows:

SQL> ALTER SYSTEM SET <parameter_name> = <parameter_value> SCOPE = BOTH;

The combination of defining both primary-only and standby-only parameters on both sites, and of using SPFILE to change the four parameters that must be different between sites, greatly simplifies the process of role reversal. Each database in the Data Guard environment presented uses identical initialization parameters (with the four exceptions) at both primary and secondary sites.

IDENTICAL HARDWARE ENVIRONMENTS

In an ideal world, you would use the same hardware at all sites, including middle or application tiers. The same RAM, CPU and storage systems ensure that after a switchover or failover, the secondary site now in the production role has the capacity to handle the same load and provide the same level of fault tolerance. The primary and secondary site would then only differ by their current production or standby roles. The same operational procedures can be leveraged across both sites. If the hardware is different between servers hosting primary and standby systems, you may have to restrict work done after a switchover to a server with fewer resources. The storage options that exist today provide a variety of features and options.
The following features will increase high availability of your Data Guard environment:

• Full redundancy for all hardware components.
• Online parts replacement (hot swappable parts).
• Online patch application.
• Mirrored write cache with battery backup.
• Load balancing and failover capabilities across all host bus adaptors.
• Hardware mirroring and striping capabilities.

You may want to consider coupling Data Guard with remote mirroring technologies, which can be useful to:

• Synchronize non-database files such as Oracle binaries or other software between primary and secondary sites.
• Synchronize important flat or binary files such as SPFILEs or init.ora files between primary and secondary sites.

That said, Data Guard and physical standby databases provide the following benefits over database remote mirroring solutions with a third party technology:

• Protection from user error or data corruption.
• Reduced network utilization because only the redo traffic is transferred.
• Role management facilities provide simple and integrated switchover and failover procedures. Data Guard’s remote log transport and database protection modes provide a better switchover and failover no-data-loss solution without needing to customize or script, or use remote mirroring technologies.
• Simplified support and certification by using an Oracle-based solution.

These features often make Data Guard preferable to remote mirroring solutions. Our client here preferred to use relatively inexpensive identical hardware (see Specific Data Guard Environment Presented above), which was sufficient to avoid a further investment in remote mirroring technology.

RECOMMENDED INITIALIZATION PARAMETER SETTINGS

Many Data Guard configuration options are implemented as initialization parameter settings. Several parameter settings significantly impact the Data Guard environment.
Differences in initialization parameters can cause performance degradation at a standby database and, in some cases, halt database operations completely. Table 1 lists the initialization parameters both directly and indirectly related to Data Guard environments. It is indicated whether the parameter applies to the primary role, the secondary role, or both. Regardless of the role a parameter applies to, all parameters, except the four italicized whose values differ based upon role (see Four Parameters That Differ By Role below), are set equally on both primary and secondary sites to simplify role reversal. The Data Guard Broker was enabled (DG_BROKER_START=TRUE is the default) for this configuration, which is a best practices setting. LOG_ARCHIVE_DEST_STATE_2 is the only parameter you need to change during the course of building the standby database (see Dynamically Enable LOG_ARCHIVE_DEST_STATE_2 below). Initially, you need to set this parameter to DEFER, then you dynamically set it to ENABLE when you recover the standby database. By way of example, recommended values and attributes shown are those for the PROD1 primary database running in Maximum Performance mode on node1. All parameter settings except one, LOG_ARCHIVE_DEST_2, are Data Guard generic best practices that are irrespective of the database protection mode chosen. (Of course, net service and file/directory names for several parameters are environment-specific). LOG_ARCHIVE_DEST_2 attributes and values are considered best practices for Maximum Performance protection mode configurations, but must be tuned along with other values for each unique environment. Parameters are omitted that relate either to logical standby databases or to Data Guard environments where one system hosts both primary and standby databases. 
Parameter Name | Applies To | Recommended Values and Attributes
ARCHIVE_LAG_TARGET | Pri | 0
COMPATIBLE | Both | 9.2.0.1.0 (use latest release)
CONTROL_FILE_RECORD_KEEP_TIME | Both | 14
CONTROL_FILES | Both | Platform-specific
DB_CREATE_ONLINE_LOG_DEST_n | Both | "" (default)
DB_FILE_NAME_CONVERT | Sec | "" (default)
DB_FILES | Both | 471
DB_NAME | Both | PROD1 (set to same name)
DG_BROKER_START | Both | TRUE
FAL_SERVER | Sec | PROD1_node1
FAL_CLIENT | Sec | PROD1_node2
LOCAL_LISTENER | Both | ""
LOG_ARCHIVE_DEST_1 | Both | 'location=/arch1/prod1 arch async mandatory noreopen max_failure=0 alternate=log_archive_dest_3'
LOG_ARCHIVE_DEST_STATE_1 | Both | Enable
LOG_ARCHIVE_DEST_2 (Maximum Performance mode only) | Pri | 'service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
LOG_ARCHIVE_DEST_STATE_2 | Pri | enable (if role is production) - initially set to defer until recovering standby; defer (if role is standby)
LOG_ARCHIVE_DEST_3 | Both | 'location=/arch2/prod1'
LOG_ARCHIVE_DEST_STATE_3 | Both | alternate
LOG_ARCHIVE_FORMAT | Both | arch_%t_%S.arc
LOG_ARCHIVE_MAX_PROCESSES | Both | 1 (default – but tune for environment)
LOG_ARCHIVE_MIN_SUCCEED_DEST | Both | 1 (default – informed by choice of Maximum Performance mode)
LOG_ARCHIVE_START | Both | TRUE
REMOTE_ARCHIVE_ENABLE | Both | TRUE
REMOTE_LOGIN_PASSWORDFILE | Both | EXCLUSIVE
SERVICE_NAMES | Both | PROD1.domain.com
STANDBY_ARCHIVE_DEST | Sec | /arch1/prod1 (=LOG_ARCHIVE_DEST_1)
STANDBY_FILE_MANAGEMENT | Both | AUTO

Table 1: Recommended Data Guard initialization parameter settings for PROD1 in Maximum Performance mode on node1.

AVOID DEFINING PARAMETERS WITH ENVIRONMENT VARIABLES

When implementing Data Guard at a customer site, I came across a standby limitation with an easy workaround. The error does not occur until after building the standby and placing it in managed recovery. The error stems from using Unix environment variables to define initialization parameters, which has been possible since Oracle 7.0.
However, using environment variables, such as ORACLE_SID, to define USER_DUMP_DEST or BACKGROUND_DUMP_DEST causes the alert log error ARC0: Error 7446 attaching RFS server to standby instance at host '<FAL_SERVER_on_stby>'. This behavior is not documented in any Oracle Corporation sources as relating to standby databases. Error 7446 apparently denotes an ORA-7446: sdnfy: bad value ‘string’ for parameter string, which is in the Oracle Error Messages manual. The action stated there is to make sure the directory you have specified is a valid directory/file specification. An oblique reference to the cause of the problem is found in a generic Metalink Note [6] about the ORA-7446 error. This Note states that environment variables are not defined for remote SQL*Net connections. Since redo logs are transmitted from the primary to the standby database over SQL*Net, this error makes sense in the context of standby databases. Since environment variables cannot be expanded, the solution, of course, is to replace the environment variable with its expanded equivalent.

[6] Metalink Note 20480.1: OERR: ORA 7446 sdnfy: bad value '%s' for parameter %s.

FOUR PARAMETERS THAT DIFFER BY ROLE

The most robust Data Guard configuration uses the same parameter values wherever possible on both primary and standby sites to allow for easy role reversal. In other words, you can use nearly identical SPFILEs on each site for all production and standby instances of a particular database. As mentioned in Nearly Identical Initialization Parameter Settings above, in general, only four parameters need to be different between primary and secondary sites, assuming these sites are identical.
These parameters are:

FAL_CLIENT
FAL_SERVER
LOG_ARCHIVE_DEST_2
LOG_ARCHIVE_DEST_STATE_2

A diff between exported SPFILEs (see Appendix 1 for exported SPFILEs in their entirety) for PROD1 from primary and secondary sites shows just how slightly even these four values differ:

$ diff initPROD1.ora_frSPFILE_node1 initPROD1.ora_frSPFILE_node2
< *.fal_client='PROD1_node2'
< *.fal_server='PROD1_node1'
---
> *.fal_client='PROD1_node1'
> *.fal_server='PROD1_node2'
27c27
< *.log_archive_dest_2='service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
---
> *.log_archive_dest_2='service=PROD1_node1 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
30c30
< *.log_archive_dest_state_2='enable'#stby - set to defer when in stby role
---
> *.log_archive_dest_state_2='defer'#stby - set to enable when in pri role

Recall that for PROD1, node1 is the primary and node2 is the standby. You can see that the first three parameter values only differ by an Oracle Net service name, PROD1_node2 vs. PROD1_node1. For details on the Oracle Net service names, see Create Oracle Network Configuration Files below.

As is required, the value for the fourth parameter, LOG_ARCHIVE_DEST_STATE_2, is set to ENABLE on the primary (node1) so that LOG_ARCHIVE_DEST_2 can specify the standby destination for Oracle Net to transmit the archive logs. (See Set Initialization Parameters – Use SPFILE below about setting this parameter to DEFER until just before recovering the standby, when you dynamically set it to ENABLE). LOG_ARCHIVE_DEST_STATE_2 is set to DEFER on the standby (node2) to disable LOG_ARCHIVE_DEST_2 because this is a primary-only parameter. SPFILE comments for LOG_ARCHIVE_DEST_STATE_2 serve as reminders for the proper setting of this parameter depending upon the role of the site.
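As an illustration of the role-dependent setting, the statements below sketch how LOG_ARCHIVE_DEST_STATE_2 might be flipped during a role reversal using the ALTER SYSTEM ... SCOPE = BOTH form shown earlier. This is a hedged sketch and not the switchover procedure itself: the ordering relative to the other switchover steps is not shown, and the SHOW PARAMETER check is simply one convenient way to confirm the value from SQL*Plus.

```sql
-- Sketch only: each statement is run on a different site.

-- On the site that is moving into the standby role (for example, node1 after a switchover):
ALTER SYSTEM SET log_archive_dest_state_2 = DEFER SCOPE = BOTH;

-- On the site that is moving into the primary role (for example, node2 after a switchover):
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE SCOPE = BOTH;

-- On either site, confirm the value now in effect (SQL*Plus command):
SHOW PARAMETER log_archive_dest_state_2
```

Because SCOPE = BOTH changes the running instance and the SPFILE together, the new state survives the next instance restart without editing any init.ora file.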
PARAMETERS RELATED TO ARCHIVING STRATEGY

The archiving method on both primary and standby sites is a critical component of any Data Guard environment. Table 2 below describes the main rules of good archiving strategy (which this environment abides by) and the rationale behind each rule:

Archiving Rules | Description and Rationale
Archiving must be started and remote archiving enabled. | Maintaining a standby requires archiving to be enabled and started (LOG_ARCHIVE_START = TRUE). Additionally, remote archiving must be enabled (REMOTE_ARCHIVE_ENABLE = TRUE).
Use a consistent log format, LOG_ARCHIVE_FORMAT. | LOG_ARCHIVE_FORMAT should have a thread and sequence attribute and should be consistent across all instances. The “%S” instructs the format to zero-fill the sequence number piece of the log archive file name.
ARCH process archives locally to LOG_ARCHIVE_DEST_1 on both primary and standby. | Reduce work for LGWR. Postpone hang situations if local and alternate archive destinations are full. Allow more time to detect and fix space problems. LOG_ARCHIVE_DEST_STATE_1 should always be set to ENABLE.
LGWR should archive both online and archived redo logs from the primary to the standby (and, for RAC, to only one standby instance and node). | In all cases, use LGWR (not ARCn). All production instances in a RAC archive to the same standby destination using the same net service name. The net service name can have an address list that contains all standby instance nodes with the first node in the list being the primary standby instance that is normally running managed recovery. Use a backup net service name only if you want to switch to the secondary standby host. In this case, only one physical standby node exists.
Create identical local alternate archive destinations, LOG_ARCHIVE_DEST_1 and LOG_ARCHIVE_DEST_3, on both primary and standbys. Set LOG_ARCHIVE_DEST_STATE_3 = ALTERNATE. | In order to support role reversal, both primary and secondary sites archive locally to one device (LOG_ARCHIVE_DEST_1), and alternately archive locally to a separate device (LOG_ARCHIVE_DEST_3). ARCH can switch to the alternate if it encounters a write error or if the destination runs out of space.
Archive directory structure is identical across all production and standby nodes. Use only 2 archive destinations: /arch1/<DB_NAME> as the primary archive destination; /arch2/<DB_NAME> as the alternate archive destination. | If they are identical across the nodes, then it is predictable and easy to manage even after a switchover or failover operation.
Standby archive destination should use the LOG_ARCHIVE_DEST_1 directory. | To avoid confusion, set STANDBY_ARCHIVE_DEST = LOG_ARCHIVE_DEST_1 (local archive directory). If standby redo logs (SRLs) are present, the standby’s ARCH process writes to the local archive destination. If there is a gap, the fetch archive log (FAL) process writes to the standby archive destination.
Archive destinations should be sized to hold all the archived redo log files since the last on-disk backup. | For standby database instantiation, the on-disk backup accompanied with the local archived redo log files can be leveraged to re-create a new standby database. Any node can play the role of the primary standby node since switchover or node failures can occur.
Set LOG_ARCHIVE_DEST_STATE_2 to ENABLE on primary and to DEFER on standby. | The setting depends on the database role. In a production role, the state is enabled. When the database is in a physical standby role, the state is deferred.

Table 2: Archiving Rules - description and rationale.

Archiving strategy is intricately linked with the settings for LOG_ARCHIVE_DEST_2. This parameter applies to the remote standby destination because the SERVICE attribute of this parameter points to the standby database net alias.
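For reference, the complete LOG_ARCHIVE_DEST_2 value from Table 1 could be applied to the running primary along the following lines. Treat this as a sketch under the assumptions of this environment: the attribute values are those listed for PROD1 on node1, and the query against V$ARCHIVE_DEST is just one way to see how the instance interpreted the destination.

```sql
-- Sketch: define the remote standby destination on the primary (node1).
-- Attribute values are copied from Table 1; tune them for each environment.
ALTER SYSTEM SET log_archive_dest_2 =
  'service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
  SCOPE = BOTH;

-- Inspect the destination as the instance sees it, including any error text.
SELECT dest_id, status, destination, error
  FROM v$archive_dest
 WHERE dest_id = 2;
```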
Following is a description of all attributes of this parameter as tuned for this environment, but derived from best practices principles for Maximum Performance mode:

• OPTIONAL – Specifies that the primary can reuse online redo logs even if archiving to the standby fails. The customer here did not want primary database operations to halt if the primary database lost contact with the standby. The MANDATORY attribute of LOG_ARCHIVE_DEST_1 still required local primary archiving of redo logs to succeed.
• LGWR – Specifies that LGWR rather than ARCH is responsible for transmitting redo logs to the standby. This allows redo records generated on the primary to be transmitted at the record-level, allowing for minimal data loss. Otherwise, using ARCH, a redo log switch needs to occur so the redo log can be archived and transmitted to the standby.
• ASYNC=20480 - When using the primary database log writer process to archive redo logs, you can specify synchronous (SYNC) or asynchronous (ASYNC) network transmission of redo logs to archiving destinations. With ASYNC, control will be returned to the application processes immediately, even if the data has not reached the destination. This mode has a reasonable degree of data protection on the destination database, with minimal performance effect on the primary database. In general, for slower network connections, use larger block counts. ASYNC=20480 indicates to transmit the SGA network buffer in 20480 512-byte blocks. In Maximum Performance mode, this 10MB buffer size (the largest allowed) performs best in a WAN. (In a LAN ASYNC buffer size does not impact primary database throughput). Also, in a WAN, using the maximum buffer size reduces “Timing out” messages due to an async buffer full condition.
This is because the smaller the buffer, the greater the chance of the buffer filling up as latency increases. [7]
• NOAFFIRM - Specifies to perform asynchronous log archiving disk write I/O operations on the standby database. It is not necessary for the primary database to receive acknowledgment of the availability of the modifications on the standby database in a Maximum Performance environment. This attribute applies to local and remote archive destination disk I/O operations, and to standby redo log disk write I/O operations. However, the NOAFFIRM attribute has no effect on primary database online redo log disk I/O operations.
• REOPEN=15 MAX_FAILURE=10 – Denotes that if there is a connection failure, the network server reopens the connection after 15 seconds and retries up to 10 times. The maximum retry time for all failed operations is calculated as REOPEN multiplied by MAX_FAILURE, or 150 seconds (2.5 minutes).
• DELAY=30 – Specifies that recovery apply is delayed for 30 minutes from the time the log is archived on the physical standby, but the redo transfer to the standby is not delayed. The correct recovery delay is important in ensuring that a user error or corruption does not get propagated to the standby database, which would compromise your disaster recovery solution. The recovery delay setting is critical for standby configurations regardless of the protection mode. The delay allows the managed recovery process (MRP) on the standby database to intentionally lag behind in applying archived redo log files. Without a recovery delay, when the standby database is in managed recovery mode, archived redo is automatically applied upon a log switch. Reducing the delay time reduces standby recovery time due to the reduced number of archived redo log files required for standby recovery.
But a short delay time is possible only if you have a monitoring infrastructure that detects problems and stops the standby database within that timeframe (see Monitor Data Guard Configuration below). In the case of this client, OEM Data Guard Manager events monitored the configuration tightly enough to allow for a 30-minute delay.
• NET_TIMEOUT=30 – Designates that if there is no reply for a network operation within 30 seconds, then the network server errors out due to the network timeout instead of stalling for the default network timeout period (TCP timeout value). A NET_TIMEOUT of 30 seconds here provided enough cushion to accommodate the latency during peak redo traffic through the dedicated NIC on the WAN.

Data Guard functionality requires, or archiving strategy informs, the above archive-related parameter settings. However, Data Guard best practices address the following five additional parameter settings:

DISABLE ARCHIVE_LAG_TARGET

The initialization parameter ARCHIVE_LAG_TARGET limits the amount of data that can be lost and can effectively increase the availability of the standby database by forcing a log switch after a user-specified time period elapses. As with the Data Guard environment here, you would be better off disabling this time-based thread advance feature by setting it to zero to eliminate archive log switches based on time. Instead, as with any database, size redo logs such that log switches occur frequently enough to meet requirements for maximum allowable loss of data.

[7] Maximum Availability Architecture (MAA), Feb 2003, an Oracle White Paper.

TUNE CONTROL_FILE_RECORD_KEEP_TIME

CONTROL_FILE_RECORD_KEEP_TIME specifies the minimum number of days before a reusable record in the control file can be reused. Setting this parameter prevents the ARCHIVELOG mechanism from overwriting an archive log name (a serially reusable record) in the control file.
Setting this parameter value higher than the default of 7 days helps to ensure that data is made available on the standby database.

UNSET DB_CREATE_ONLINE_LOG_DEST_N

The DB_CREATE_ONLINE_LOG_DEST_n parameter should be unset, as it was in the Data Guard environment here. This parameter sets the default location for Oracle-managed control files and online redo logs. But it causes a problem for instantiation of standby databases using standby redo logs because the SRL names cannot be reused. In other words, when this parameter is set, Oracle dynamically creates a name for the online logs that cannot be manually created on the standby database.

SET REMOTE_LOGIN_PASSWORDFILE DEPENDING UPON BROKER

If you want to use the Data Guard broker to manage the standby database, as done here, you must set up a remote login password file on the standby (REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE). However, if not using the broker, use operating system (REMOTE_LOGIN_PASSWORDFILE=NONE) rather than password file authentication (see Create Identical Primary & Secondary Sites above).

RECOMMENDED ORACLE NETWORK STRUCTURE

You need to evaluate four Oracle network options on a case-by-case basis before constructing the Oracle network configuration. This section presents these four options, then the structure of and rationale behind the network configuration. Each of the four Oracle network options affects the structure of the network configuration files, so you need to decide which options to implement, if any, then construct your network configuration files accordingly.

The structure of this network configuration works for, but is not specific to, the standard bidirectional environment here. As pointed out in the beginning, each of the two bidirectional environments here is configured identically within itself, including network file structures.
So the best practices for a Data Guard network configuration presented here are just as applicable to single Data Guard environments as to bidirectional ones.

CONSIDER ORACLE NETWORK CONFIGURATION OPTIONS

The four network configuration options are dynamic service registration, an alternate standby connection, connect-time failover and the Session Data Unit (SDU) networking parameter. The Data Guard configuration presented here employs the last two options – connect-time failover and SDU. See each option below for the rationale as applied to this configuration. I discuss these options in relation to their ramifications in a Data Guard setting, particularly for those who have already configured one or more of them.

DYNAMIC SERVICE REGISTRATION (VS. STATIC SID)

The parameters LOG_ARCHIVE_DEST_2 (the remote archive destination), FAL_CLIENT and FAL_SERVER all require an Oracle Net service. This service is represented as a net service name entry in a local tnsnames.ora file. There are two ways to configure the listener for this net service name: using dynamic service registration, or via a static SID list in the listener configuration.

• Dynamic Service Registration [8]

The MAA White Paper already cited recommends dynamic service registration for Data Guard environments. But having worked with dynamic service registration in the past, I would caution that this functionality is somewhat unreliable and baffling. Sometimes, setting up and debugging service registration felt like rearranging the chairs on the deck of the Titanic.

[8] A helpful note on setting up dynamic service registration is Metalink Note 76636.1: Service Registration in Net 8i.

The idea of dynamic registration in Data Guard environments is that sites automatically assume different roles according to their availability. Using this Data Guard environment as an example, the PROD1 database on node1 is the production site. But if node1 crashes, the new PROD1 production site becomes node2.
If service registration works, there is no need to change network configuration files or start and stop listeners after a switchover or failover. The PROD1 database on node2 dynamically registers with the listener on that server, and the PROD1 database on node1 deregisters with the listener on node1. This is a reasonably good idea in theory. But it does not always work reliably. You may question whether you really need to automate this capability – after all, primary and standby roles change very infrequently. Moreover, you can easily simulate this functionality by manually controlling the listener states on the primary and secondary sites according to the role of these sites (see explanation of Static SID below).

Certainly, the biggest disadvantage to using automatic service registration is that it precludes you from using the Data Guard broker to manage the standby database. [9] An unavoidable though much smaller constraint of service registration is that if the listener is started after the instance, service registration does not occur immediately (though PMON should register the instance within a short time). In this case, you would need to issue the ALTER SYSTEM REGISTER statement on the database to instruct PMON to immediately register the instance with the listeners.

All that said, dynamic service registration requires setting the following initialization parameters correctly:

SERVICE_NAMES for the database service name
INSTANCE_NAME for the instance name
LOCAL_LISTENER (you only need this parameter if using a non-default listener address, i.e. a port other than 1521)

PMON registers a database service with the listener and resolves LOCAL_LISTENER by finding the corresponding net service name (alias) in the local tnsnames.ora file. You need a locally managed tnsnames.ora on both production and standby databases so you can use the same alias on each node.
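The registration-related settings above can be checked, and registration forced, from SQL*Plus. The following is a small sketch, assuming a session with sufficient privileges on the instance in question; it shows only the three parameters named above and the ALTER SYSTEM REGISTER statement already mentioned.

```sql
-- Sketch: review the parameters that drive dynamic service registration (SQL*Plus commands).
SHOW PARAMETER service_names
SHOW PARAMETER instance_name
SHOW PARAMETER local_listener

-- If the listener was started after the instance, ask PMON to register immediately
-- instead of waiting for its next registration attempt.
ALTER SYSTEM REGISTER;
```

Running lsnrctl services against the listener afterwards is one way to see whether the instance has registered.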
On the primary node, SERVICE_NAME or SID_NAME in listener.ora and SID in tnsnames.ora files should equal the SERVICE_NAMES initialization parameter. The listener.ora file needs to define a listener for redo log traffic (see Suggested Oracle Network Configuration below), but a separate listener is not required for database client connections on default port 1521. In fact, for database clients (barring the required listener.ora definition for redo traffic), a listener.ora file is not even required unless using a port other than 1521; in that case, you need to define an identical listener (except for HOST settings) on primary and secondary hosts, but do not need statically configured information.

You may decide to use both dynamic service registration and a static SID, because OEM requires a static SID to discover and manage listeners. A static SID does not override service registration - database clients still connect by means of automatic service registration - the static SID is an appendage only for OEM’s benefit. The customer here wanted to use the Data Guard broker, so automatic service registration was not an option.

• Static SID

The other method for configuring the listener is to use a static SID list. There are several advantages to using a static SID:

• A static SID is more reliable, less complex and more intuitive to manage than using dynamic service registration.
• A static SID containing the GLOBAL_DBNAME parameter is required by standby databases and listeners in order for Oracle Enterprise Manager (OEM) to automatically discover and manage them.
• A static SID allows you to manage the Data Guard configuration using the Data Guard broker through OEM.

In a Data Guard environment, a minor disadvantage to a static SID (compared with automatic service registration) is that it requires you to manually stop the listener on the primary site and start it on the secondary site during a switchover or failover.
However, this drawback is more than offset by the fact that a static SID allows you to use Data Guard Manager within OEM, which greatly simplifies the process of role reversal. As an example, following are the listener definition and the static SID list entry (SID_LIST_LSNRPROD1PUB) for the listener LSNRPROD1PUB, configured for non-privileged database access on PROD1:

    LSNRPROD1PUB =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1021)(SDU = 32768))
        )
      )

    SID_LIST_LSNRPROD1PUB =
      (SID_LIST =
        (SID_DESC =
          (ORACLE_HOME = /usr/oracle/product/9.0.1)
          (SID_NAME = PROD1)
        )
      )

9 Oracle9i Data Guard Concepts and Administration, Chapter 3.

ALTERNATE STANDBY CONNECTION
If the standby site needs to be shut down for maintenance, you can alter the LOG_ARCHIVE_DEST_2 setting on the production database to point to an alternate service name (alias) of another working standby, whose alias is resolved in the tnsnames.ora file. This allows the production database to remain open in any protection mode. If you need to shut down the standby node for maintenance, one of the following will occur:
• If the Data Guard environment is configured with Maximum Protection database mode, the production database incurs an outage unless you can switch to an available standby database.
• If the Data Guard environment is configured with Maximum Availability or Maximum Performance database mode, you can shut down the standby node for maintenance without impacting the production database. However, you lose fault tolerance with this architecture. The longer the outage, the further the standby will lag behind the production database.
Still, the choice of Maximum Performance mode means that it is not necessary to implement an alternate standby connection.

CONNECT-TIME FAILOVER
Connect-time failover occurs when a connection request is forwarded to a second listener address if the first listener address does not respond.
Connect-time failover requires that the service name in the tnsnames.ora file for database clients contain two addresses: one for the production node and one for the standby node. The second address allows for connect-time failover in case the first connection fails. Irrespective of whether you use dynamic service registration or a static SID, Data Guard best practices call for database clients to use connect-time failover to connect to the primary or the standby. With service registration, connect-time failover should work automatically, as only the primary database should be registered with its listener. When using a static SID, this is manually accomplished by making sure that only the listener on the primary site is running.

SESSION DATA UNIT (SDU) PARAMETER SETTING
Reducing the number of round trips across the network is key to optimizing the performance for transporting redo log data to a standby site. With Oracle Net Services it is possible to optimize data transfer by adjusting the size of the Oracle Net setting for SDU. In a WAN environment, Oracle recommends setting the SDU to the maximum setting of 32k (32768). The SDU parameter designates the size of the Oracle Net buffer into which data is placed before each buffer is delivered to the TCP/IP network layer for transmission across the network. Oracle Net sends the data in the buffer either when requested or when it is full. Oracle internal Data Guard testing demonstrates that the maximum setting of 32k performs best on a WAN. This was the setting used at the customer site reviewed here. The SDU parameter needs to be set at the listener and connection levels, i.e., in the tnsnames.ora and listener.ora. See the network configuration files in Appendix 2 for the syntax of the SDU parameter setting.

In addition to setting the SDU parameter, increasing the TCP send and receive window sizes can improve performance.
Use caution, however, because this may adversely affect networked applications that do not exhibit the same characteristics as archiving. Increasing the window sizes also consumes a large amount of system resources, so involve your network administrator in any TCP window sizing.

SUGGESTED ORACLE NETWORK CONFIGURATION
The first thing to keep in mind regarding Oracle Data Guard network configurations is that you may want or need to integrate new network file entries with existing entries rather than creating new network files altogether. Secondly, if you are using connect-time failover (see Connect-time Failover above) or Transparent Application Failover (TAF), Oracle recommends not setting the GLOBAL_DBNAME parameter in the SID_LIST_<standby_listener_names> clauses in the listener.ora file. If you decide you will never implement either failover method, and you are using Oracle Enterprise Manager (OEM) 9i, use the GLOBAL_DBNAME parameter: OEM 9i requires this parameter to distinguish the primary from the standby database so that OEM can discover and manage both.

Following are the main features of this network design.10 This design is just as applicable to single as to bidirectional Data Guard configurations. A bidirectional configuration simply doubles the number of entries in the network configuration files. So, to simplify the explanation, this section is couched in terms of a single Data Guard environment with a primary database on one site and a standby database on another site:
• The same set of Oracle Net configuration files is used regardless of the current role of the node. You do not need to change or swap any network files during a switchover or failover operation.
• Static SIDs for the listeners on each node allow Oracle Enterprise Manager (OEM) to discover and manage both primary and standby databases and listeners using OEM tools, and to use the Data Guard broker to manage the Data Guard environment within the OEM framework.
• No special initialization settings are required for the above Oracle Net configuration, except to set DB_NAME and DB_DOMAIN to the same values on both primary and secondary sites.

LISTENER.ORA FILES
Two listeners service the primary database and its standby:
• Listener for non-privileged database clients:
   • This listener is for non-privileged (other than DBA) clients to connect to the current primary database on a non-default port (not 1521). Using a non-default port prevents automatic service registration from occurring by accident.
   • Proper Data Guard operation relies upon starting this dedicated client listener only on the node where the primary database is currently running. You need to make sure you stop the corresponding listener on the secondary node.
   • To allow for transparent client connections to the database, clients use the same net service name (tnsnames.ora alias) regardless of which is currently the primary database. This is possible because this alias contains addresses for both primary and standby nodes; users can only connect to the primary node because only the primary listener will be running.
• Listener for redo log traffic and DBA clients:
   • This listener has two addresses, one for redo log traffic and the other for DBA access. The address for redo log traffic allows the standby database to fetch archive log gaps from the primary database through a dedicated network, on a different port from the one used for DBA access, which goes through the regular network.
   • Because redo log traffic goes through a special dedicated network connecting the two nodes, there is one more network interface and IP address on each of the two nodes. The two databases in this bidirectional environment share this dedicated NIC, which at any one time accommodates redo log traffic in each direction for each database. This special network is designated by different hostnames, resolved in the /etc/hosts file on each node.
   • This listener is configured and always running on both nodes to allow for role reversal. Only the listener on the current primary node is utilized for redo log traffic, but the listener for the current standby remains running so that you won’t need to start it if roles reverse.
   • SYSDBA connections must always be possible on both the primary and standby databases. So they cannot go through the listener for non-privileged database clients, which is only running on the node where the primary database is running.

10 Metalink Note 175122.1: Data Guard 9i - Net8 Configuration for a 2-node database environment.

TNSNAMES.ORA FILES
Build three types of tnsnames.ora files: for each local node, for non-privileged database clients, and for DBA access:
• The local tnsnames.ora file on each node contains an entry for the standby database on that node to fetch archive log gaps from the primary database on the other node. The net alias is the database setting for FAL_CLIENT on that node.
• The tnsnames.ora file for non-privileged clients contains aliases used to connect only to the primary database. Each alias contains two addresses, one for each node. As an example, for the PROD1 database, the alias looks like this:

    PROD1 =
      (DESCRIPTION =
        (SDU=32768)
        (ADDRESS_LIST =
          (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1021))
          (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1021))
        )
        (CONNECT_DATA =
          (SID = PROD1)
        )
      )

  Proper Data Guard operation relies upon starting the listener only on the node in the primary database role, so only one of these two addresses can be used: the address defining HOST=primary_node. We’re using connect-time failover, but with manual listener controls. Distributing a separate tnsnames.ora file to non-privileged clients is a good security practice, in that it only gives the required connectivity information, and nothing more.
• The tnsnames.ora file for DBAs can be used to connect to the database in either role on either node.
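Because several initialization parameters refer to aliases defined in these files, it can help to list them side by side with the tnsnames.ora entries. The query below is a minimal sketch for that cross-check, not part of the paper's scripts; it only reads V$PARAMETER and assumes nothing about the values found.

```sql
-- Run on each node. Every alias that appears in FAL_CLIENT, FAL_SERVER, or the
-- SERVICE= attribute of LOG_ARCHIVE_DEST_2 should resolve in that node's
-- local tnsnames.ora file.
SELECT NAME, VALUE
  FROM V$PARAMETER
 WHERE NAME IN ('fal_client', 'fal_server', 'log_archive_dest_2', 'service_names')
 ORDER BY NAME;
```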
CONSIDER STANDBY REDO LOGS
Consider creating standby redo logs (SRLs). Standby redo logs are required for Maximum Protection mode and recommended for Maximum Availability mode. In these protection modes LGWR on the primary host writes redo directly to the standby redo logs. This enables no-data-loss solutions, and in Maximum Availability mode it reduces the amount of data lost if the standby network is disconnected. SRLs are not required in Maximum Performance mode, nor were they used at this site. The rules of thumb for SRLs are as follows:
• If you employ SRLs, use them at all sites. In other words, if you configure standby redo logs on the standbys, then you should also configure standby redo logs on the primary database. Even though the standby redo logs are not used when the database is running in the primary role, configuring the standby redo logs on the primary database is recommended in preparation for an eventual switchover or failover.
• Create the same number of SRLs for both production and standby databases.
• Use the same names for SRLs on both production and standby databases.
• All of the production database’s online redo logs and SRLs should be the same size.
• The number of SRLs to create is one more than the sum of all online log groups per thread. Having one more log group than the production database’s online redo log groups reduces the likelihood that the production instance’s LGWR is blocked because an SRL cannot be allocated on the standby. For the Data Guard configuration described here, the primary database has one instance (thread) and each thread has five online log groups. So if SRLs had been used here, six such SRLs would have been created on both sites.
• Standby redo logs must be archived before the data can be applied to the standby database. The standby archival operation occurs automatically, even if the standby database is not in ARCHIVELOG mode.
However, the archiver process (ARCn) must be started on the standby database.
• Check MAXLOGFILES to ensure the number of SRL groups desired is less than the maximum number of groups of SRLs allowed.
• Check MAXLOGMEMBERS to ensure the number of SRL members desired is less than the maximum number of members of SRLs allowed.
See Create Standby Redo Logs, If Necessary below for the syntax used to create standby redo logs.

MONITOR DATA GUARD CONFIGURATION
If you use Oracle Enterprise Manager (OEM) 9i to manage your Data Guard environment, you can take advantage of several Diagnostics Pack event tests. These tests are located in the Database Specialized category under the Data Guard heading. The following events are currently available:
• Data Guard Actual Apply Delay
• Data Guard Data Not Applied
• Data Guard Logs Not Applied
• Data Guard Logs Not Shipped
• Data Guard Potential Data Loss
• Data Guard Status

The most common standby detection mechanism is to monitor the standby alert log for critical errors such as ORA-600 or ORA-1578, and to alert and react when the application detects a logical corruption like a missing table. But other non-ORA errors in the alert log can indicate trouble. The following alert log entries exemplify this:

    ARC0: Error 272 writing standby archive log file at host 'PROD1_node2'
    ARC0: I/O error 272 archiving log 5 to 'PROD1_node2'
    ARC0: Error 270 closing standby archive log file at host 'PROD1_node2'
    ARC0: Error 270 Closing archive log file 'PROD1_node2'

Therefore, you may want to consider parsing the alert log file for words like error (case-insensitive). Despite best intentions, you may not be able to monitor for all possible errors in the alert log. And other errors will not appear in the alert log at all, such as unrecoverable changes due to NOLOGGING operations when not in FORCE LOGGING mode.
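Conditions that never reach the alert log can often be seen in the dynamic performance views instead. The queries below are a minimal sketch of three such checks; they are illustrative only and are not the Appendix 3 scripts.

```sql
-- On the standby: report any gap in the archived logs received so far.
-- No rows returned means no gap is currently detected.
SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE#
  FROM V$ARCHIVE_GAP;

-- On the standby: show the Data Guard processes and what they are doing;
-- a managed recovery process (MRP) row indicates recovery is running.
SELECT PROCESS, STATUS, THREAD#, SEQUENCE#
  FROM V$MANAGED_STANDBY;

-- On the primary: datafiles that have received unlogged (NOLOGGING) changes.
SELECT NAME, UNRECOVERABLE_CHANGE#, UNRECOVERABLE_TIME
  FROM V$DATAFILE
 WHERE UNRECOVERABLE_CHANGE# > 0;
```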
If you are not managing your Data Guard environment with the OEM 9i Diagnostics Pack, you can monitor for these same event conditions by querying the dynamic performance (V$) views on the standby (and the primary). See Appendix 3 for scripts that check for archive log gaps between primary and standby, ensure the standby is in media recovery mode, and check for unrecoverable changes on the standby propagated by unrecoverable operations on the primary database.

CONSIDER DATA GUARD BROKER
Data Guard allows you to configure and manage a Data Guard configuration through several interfaces, including SQL statements, initialization parameters, a PL/SQL package, and the Oracle9i Data Guard broker. A broker configuration consists of the existing primary and standby databases, the sites, and database resources such as log transport and log apply services. You create a broker configuration using either the Data Guard Manager (OEM) or its command-line interface (CLI). With the Data Guard Manager, a Create Configuration Wizard defines the broker configuration. The broker controls the logical objects in the configuration, modifies their behavior at runtime, dynamically sets the protection mode across the configuration, monitors the overall health of the configuration, and reports any health and other operational characteristics up through the Enterprise Management notification mechanisms and the Data Guard Manager general property pages if you are using Data Guard Manager, or through SHOW commands if you are using the CLI.

In summary, the Data Guard broker provides the following benefits:
• Management - Provides primary and standby database management as one unified configuration. It allows you to configure and manage multiple sites from a single location.
• Automation - Automates opening a primary database, mounting a physical standby database, opening a logical standby database, and starting log transport and log apply services.
Automates switchover and failover operations, and provides a GUI for changing database states and Data Guard properties.
• Monitoring - Provides monitoring of database health and other runtime parameters. Provides a unified status through the Data Guard configuration log. Provides a tie-in to Oracle Enterprise Manager Data Guard-related events.

CREATE DATA GUARD ENVIRONMENT
Following are step-by-step instructions to create an Oracle9i Data Guard environment. These are common steps in that you will need (or want) to perform them regardless of whether you manually create the environment, as shown here, or whether you use OEM 9i Data Guard Manager. These general procedures also apply irrespective of the type of Data Guard configuration. Some of these steps are prerequisites for others, so perform them in the order listed. Otherwise you may be required to backtrack, and in some cases, start over.

These instructions assume you are using dedicated server rather than shared server architecture for log transport services (see Operational Requirements); that a thoroughly tested Backup & Recovery strategy is in place (see Best Practices For Data Guard Configurations); and that you have configured any Oracle networking options desired (see Consider Oracle Network Configuration Options).

Several of the preliminary steps require shutting down the primary database. For completeness, a SHUTDOWN command is therefore listed in each of these steps. However, to maximize primary database availability, you could defer shutting down until the last step requiring a shutdown. This last step will either be at the end of Set Initialization Parameters – Use SPFILE, or under Backup Primary Database if you are doing a cold backup to create the standby database. To remind you of this option, all but the last SHUTDOWN command are marked (can defer), and the last SHUTDOWN command is marked as such.
Just remember that if you defer bouncing the database as long as possible, you will still need to execute the commands that follow each intermediate SHUTDOWN in the steps you skipped over.

To generalize the steps required, appended to each title below is (Pri), (Sec), (Pri+Sec) or (Pri,Sec) to indicate whether to execute each step within that title on the primary, the secondary, both or a combination thereof, respectively. For the titles Set Database Protection Mode, Run In ARCHIVELOG Mode With Automatic Archiving, and Implement FORCE LOGGING Mode, only (Pri) is appended even though it may appear that these titles should apply to both primary and secondary (Pri+Sec). They do in fact apply to both sites, but you only need to perform these procedures on the primary database because the changes will be contained in the database backup used to build the standby.

The tag (9.2) denotes that the command or procedure is only available in Oracle Release 9.2 or later, and (9.0) indicates that it applies to any release of Oracle 9.0. If no release is indicated, the command applies to both versions.

IMPLEMENT FORCE LOGGING MODE (PRI)
FORCE LOGGING mode is a new feature in Oracle Release 9.2. This mode overrides any user transactions that specify NOLOGGING. Unlogged operations will invalidate the standby database and may require substantial DBA intervention, including the need to propagate unlogged operations manually. FORCE LOGGING is not the default mode, so you will probably need to change it. To check whether the primary database is in FORCE LOGGING mode, issue the following command:

    (9.2): SELECT FORCE_LOGGING FROM V$DATABASE;

    FORCE_LOGGING
    --------------------
    NO

To place the primary database in FORCE LOGGING mode, specify the following as SYSDBA:

    SQL> ALTER DATABASE FORCE LOGGING;
    Database altered.
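Once the mode has been changed, it may be worth confirming it and taking stock of the objects that were created with NOLOGGING, since those are the ones that could have generated unlogged changes beforehand. The following is a minimal sketch using standard dictionary views; it is an illustrative check, not a step from the original procedure.

```sql
-- (9.2) After ALTER DATABASE FORCE LOGGING, this should report YES.
SELECT FORCE_LOGGING FROM V$DATABASE;

-- Tables and indexes whose own logging attribute is NOLOGGING.
SELECT OWNER, TABLE_NAME
  FROM DBA_TABLES
 WHERE LOGGING = 'NO';

SELECT OWNER, INDEX_NAME
  FROM DBA_INDEXES
 WHERE LOGGING = 'NO';
```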
The FORCE LOGGING mode is a persistent attribute of the database. That is, if the database is shut down and restarted, it remains in the same logging mode state. However, if you recreate the control file, the database is not restarted in the FORCE LOGGING mode unless you specify the FORCE LOGGING clause in the CREATE CONTROLFILE statement.

Because FORCE LOGGING mode is not available in Oracle Server Release 9.0, the best you can do in Oracle 9.0 is to put all tablespaces in LOGGING mode (except temporary tablespaces, which are always set to NOLOGGING). To check if a tablespace is in LOGGING mode, execute the following in sqlplus (this works in 9.2 as well but is unnecessary if in FORCE LOGGING mode):

    (9.0): SELECT TABLESPACE_NAME, LOGGING FROM DBA_TABLESPACES;

    TABLESPACE_NAME                LOGGING
    ------------------------------ ---------
    SYSTEM                         LOGGING
    UNDOTBS1                       LOGGING
    TEMP                           NOLOGGING
    DRSYS                          LOGGING
    INDX                           LOGGING
    etc.

To put a tablespace in LOGGING mode, do the following:

    ALTER TABLESPACE <permanent tablespace_name> LOGGING;

This does not prevent the tablespace setting from being overridden by NOLOGGING specified at the object level, or by direct load inserts.

CREATE IDENTICAL PRIMARY & SECONDARY SITES (PRI+SEC)
For smooth role reversal after a failover or switchover, create identical operating system environments on both primary and secondary sites. The following components should be identically configured:
• O/S clocks - It is easy to overlook synchronizing the clocks of all servers in a Data Guard environment. Not doing so can make it very confusing later on when comparing log files (particularly alert logs) containing time stamps, because they will probably not dovetail. You may want to ask the System Administrator to implement enterprise-wide software to keep all server clocks synchronized all the time.
• Hosts files - A working Data Guard environment requires that the /etc/hosts files contain all hostnames used in the network configuration files so that they can be resolved to IP addresses. Even if the /etc/hosts files on both servers in a bidirectional standby configuration already contain both server names, as is often the case, you may need to add new entries for any new NICs dedicated to redo traffic (see Identical Oracle and O/S Network Configurations above). These dedicated networks must be distinguished by different HOST names in the Oracle network configuration files, and resolved in the /etc/hosts files. Also, as recommended above under Use Identical Primary and Secondary Sites, you may need to install a new NIC to prevent a single NIC point of failure.
• Environment variables and DBA scripts.
   • Initialize environments on both primary and secondary sites (preferably with scripts) to host either database in either role. Make sure to include all required environment variables (ORACLE_SID, etc), aliases, functions, and the like.
   • Place all primary DBA maintenance scripts on both sites. If you configured the SPFILE using the default name in the default directory as recommended above, then you can edit all scripts that specify a PFILE after the database STARTUP command so that they no longer do so. The default SPFILE will now be used if nothing is specified after the STARTUP command within scripts. This makes these scripts more generic. But you must be careful to initialize the O/S environments using the correct values for ORACLE_SID, etc.
• Directories - The directories (including owner & permissions) on primary and secondary sites for datafiles, control files, redo logs, standby redo logs and archive logs should match as follows:
   • Create identical datafile directories on the standby as on the primary.
   • Create identical control file directories on the standby as on the primary.
Provided the online redo log is multiplexed, it is good practice to store a control file copy on every disk drive that stores members of online redo log groups. By storing control files in these locations, you minimize the risk that all control files and all groups of online redo logs will be lost in a single disk failure.
   • Create identical redo log directories on the standby as on the primary.
   • Create identical standby redo log directories on both primary and secondary sites.
   • On both primary and secondary, create a local alternate archive log directory (LOG_ARCHIVE_DEST_3) on a separate device from LOG_ARCHIVE_DEST_1.
   • Create STANDBY_ARCHIVE_DEST on the secondary in the same location as LOG_ARCHIVE_DEST_1 on the primary.
• Password files - The Data Guard broker requires password file authentication, which was already the authentication method used here. However, if not using the broker, use operating system rather than password file authentication. Using password file authentication opens up the possibility, after role reversal, of mismatches between the SYSDBA/SYSOPER password or privilege information (as reported by V$PWFILE_USERS) and that stored in the password file. This can occur when you add or remove SYSDBA or SYSOPER users, or change passwords for these users on the primary, or if you ever need to rebuild the standby. SQL commands that change SYSOPER or SYSDBA user information on the primary database do propagate to the standby database, but not to the standby password file. If you cannot use OS authentication, but must use a password file, then create a password file on the primary (if not already done) and the standby as follows:

    $ orapwd file=orapw<ORACLE_SID> password=passwd entries=max_users

Set REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE. Then connect SYS/password as SYSDBA, and grant the SYSDBA or SYSOPER system privilege to the same users on the standby as granted on the primary.
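One way to keep the two password files aligned is to compare what each instance reports. The statements below are a minimal sketch under stated assumptions: the user name dba_ops is purely hypothetical, and the GRANT assumes REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE is already in effect on the instance where it is issued.

```sql
-- Run on each site and compare the results; the rows should match.
SELECT USERNAME, SYSDBA, SYSOPER
  FROM V$PWFILE_USERS
 ORDER BY USERNAME;

-- Add a privileged user to the password file (dba_ops is a made-up name).
GRANT SYSDBA TO dba_ops;
```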
If later you need to change passwords for SYS, SYSTEM, or any other SYSDBA or SYSOPER user, or if you add/remove any of these users, remember to do so on both primary and secondary sites.

SET INITIALIZATION PARAMETERS – USE SPFILE (PRI+SEC)
If you are using an initialization file (init.ora) on the primary database, Best Practices for Data Guard Configurations recommends migrating to a server parameter file (SPFILE), principally to facilitate role reversal during switchover operations, and because the Data Guard broker requires an SPFILE. However, when first setting up a Data Guard environment, you need to make many initialization parameter changes. It is easier to do so with an init.ora file than an SPFILE, which requires issuing ALTER SYSTEM statements to change parameter values. If you are using an SPFILE already, you may even consider exporting the SPFILE, editing the output text file, then re-creating the SPFILE.

In accordance with Best Practices for Data Guard Configurations, this Data Guard environment uses a nearly identical SPFILE for each database’s primary and secondary site. All in all, a standard bidirectional configuration such as this requires four SPFILEs: a slightly different SPFILE (where four parameter values are different) for PROD1 on each site, and another for PROD2 on each site. (The SPFILEs for PROD1 and PROD2 are also very similar - they differ by more than four parameter values, but this is chiefly because their instance names differ). Appendix 1 contains exports of these four SPFILEs in the form of exported text initialization parameter files. (Generic values were substituted for values containing client-specific information, such as DOMAIN_NAME).

For this implementation, four init.ora files were built in the default locations, then migrated to SPFILEs with default names and locations.
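For a database that already runs from an SPFILE, the export, edit, and re-create cycle described in this section might look like the following in SQL*Plus. This is only a sketch: the /tmp path and file name are hypothetical, and the shutdown is included because an SPFILE that the running instance is using cannot be overwritten.

```sql
-- 1. Export the current SPFILE to an editable text file (hypothetical path).
CREATE PFILE = '/tmp/initPROD1_edit.ora' FROM SPFILE;

-- 2. Edit /tmp/initPROD1_edit.ora with a text editor, then stop the instance.
SHUTDOWN IMMEDIATE;

-- 3. Re-create the SPFILE in the default location from the edited file,
--    then start the instance so that it picks up the new values.
CREATE SPFILE FROM PFILE = '/tmp/initPROD1_edit.ora';
STARTUP;

-- For a single change, one ALTER SYSTEM statement may be simpler.
ALTER SYSTEM SET LOG_ARCHIVE_START = TRUE SCOPE = SPFILE;
```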
It was possible to build the standby SPFILEs at this time because the CREATE SPFILE statement can be executed before or after instance startup, provided Oracle9i Server is already installed on the standby.

LOG_ARCHIVE_DEST_STATE_2 is the only initialization parameter you cannot set as required for standby functionality until you recover the standby database (see Dynamically Enable LOG_ARCHIVE_DEST_STATE_2 below). For now, set this parameter to DEFER. This preserves the LOG_ARCHIVE_DEST_2 destination attributes, but excludes the standby from archiving operations until enabled.

If you are using an initialization file (init.ora), and have already set all initialization parameters as required for Data Guard, now is the time to change the primary database to use a server parameter file (SPFILE). You can check if you are already using SPFILE as follows:

    SQL> SHOW PARAMETERS SPFILE;

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    spfile                               string      $ORACLE_HOME/dbs/SPFILE$ORACLE_SID.ora

If not null, then SPFILE is being used. Alternately, you can check as follows:

    SQL> SELECT COUNT(*) FROM V$SPPARAMETER WHERE VALUE IS NOT NULL;

    COUNT(*)
    ----------
    31

If count >0 then you are using SPFILE. To switch to SPFILE, do the following as SYSDBA:

    SQL> CREATE SPFILE FROM PFILE='<full_path_of_init.ora>';

(The path does not need to be specified if the PFILE has the default name and is in the default directory; CREATE SPFILE FROM PFILE is then sufficient). This creates an SPFILE called $ORACLE_HOME/dbs/SPFILE$ORACLE_SID.ora. You now need to bounce the database in order for it to use SPFILE:

    SQL> SHUTDOWN IMMEDIATE;        (last shutdown unless doing cold backup later)
    Database closed.
    Database dismounted.
    ORACLE instance shut down.
    SQL> STARTUP;
    ORACLE instance started.

    Total System Global Area  143727516 bytes
    Fixed Size                   453532 bytes
    Variable Size             109051904 bytes
    Database Buffers           33554432 bytes
    Redo Buffers                 667648 bytes
    Database mounted.
    Database opened.
    SQL> EXIT;
    Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
    With the Partitioning, OLAP and Oracle Data Mining options
    JServer Release 9.2.0.1.0 - Production

Create a backup of this SPFILE:

    $ cd $ORACLE_HOME/dbs
    $ cp SPFILE<ORACLE_SID>.ora SPFILE<ORACLE_SID>.ora_YYYYMMDD

Create a backup initialization parameter file (PFILE) from SPFILE:

    SQL> CREATE PFILE='init<ORACLE_SID>.ora_fromSPFILEYYYYMMDD' FROM SPFILE;

This file will be created in $ORACLE_HOME/dbs by default.

CREATE ORACLE NETWORK CONFIGURATION FILES (PRI+SEC)
The next step is to configure and start Oracle networking components. The Oracle Net Services configurations at each site must conform to Data Guard architecture requirements. SPFILE, /etc/hosts, listener.ora, tnsnames.ora (and sqlnet.ora, if used) files all play a part, and must be consistent. Several initialization parameters required by Data Guard depend on a proper Oracle Net configuration, including LOG_ARCHIVE_DEST_2, SERVICE_NAMES, FAL_SERVER, and FAL_CLIENT. For the parameter values used in this configuration, see Appendix 1 containing the two SPFILEs at each site. Appendix 2 lists all network configuration files for node1 and node2, including listener.ora and tnsnames.ora files, as well as tnsnames.ora files for DBA and non-privileged clients.

The hostnames node1b and node2b correspond to the dedicated network for standby redo log traffic between the nodes. The /etc/hosts file contains the following entries for the two nodes (generic IP addresses are used for client security reasons):

    192.168.0.1   node1    node1.domain.com
    192.168.0.2   node2    node2.domain.com
    10.10.0.1     node1b   node1b.domain.com
    10.10.0.2     node2b   node2b.domain.com

These network configuration files abide by best practices as explained above in Recommended Oracle Network Configuration. Creating a bidirectional Data Guard environment requires defining four rather than two listeners – two for PROD1 and two for PROD2.
Similarly, bidirectional environments call for two (not one) aliases in the local tnsnames.ora file for each node. All listener definitions utilize static SID lists. Table 3 below gives the listener names, the servers on which they are defined, the ports used, and a description of what these listeners service.

    Listener Name    Server(s) Where Located   Port   Purpose
    LSNRPROD1PUB     node1, node2              1021   Non-privileged client access to PROD1 primary
    LSNRPROD2PUB     node1, node2              1022   Non-privileged client access to PROD2 primary
    LSNRPROD1REDO    node1, node2              1121   Redo log traffic for PROD1 standby
                                               1221   DBA access to PROD1 in either role on either node
    LSNRPROD2REDO    node1, node2              1122   Redo log traffic for PROD2 standby
                                               1222   DBA access to PROD2 in either role on either node

    Table 3: Data Guard listener configuration.

Following is a description of the network configuration for PROD1. The same network structure applies to PROD2; therefore, for simplicity, only the PROD1 network configuration is covered.
• Listener for non-privileged database clients (LSNRPROD1PUB):
   • The listener is dedicated to non-privileged (other than DBA) clients to connect to the PROD1 primary database on non-default port 1021.
   • Proper Data Guard operation relies upon starting this dedicated client listener only on the node where the PROD1 primary database is running, which initially will be node1. You need to make sure you stop the corresponding listener on the secondary node, node2.
   • To allow for transparent client connections to the database, clients use the same net service name (tnsnames.ora alias) regardless of which is currently the primary database. This is possible because this alias contains addresses for both primary and standby nodes; users can only connect to the primary node because only the primary listener will be running.
• Listener for redo log traffic and DBA clients (LSNRPROD1REDO):
  - This listener allows the PROD1 standby database (initially on node2) to fetch archive log gaps from the PROD1 primary database (initially on node1) through port 1121.
  - All redo log traffic goes through a special dedicated network connecting the two nodes (designated by node1b and node2b). Thus, there is one more network interface and IP address on each of the two nodes (10.10.0.1 and 10.10.0.2, respectively).
  - This listener is always running on both nodes to allow for DBA intervention and role reversal. DBA clients can connect at any time through port 1221 on either node in either role. Only the listener on the current primary node (initially node1) is used for redo log traffic, but the listener on the standby node (node2) remains running so that you will not need to start it if the roles reverse.

CHANGE TO ARCHIVELOG MODE WITH AUTOMATIC ARCHIVING (PRI)

Maintaining a physical standby requires that the primary database be in ARCHIVELOG mode, and that archiving be enabled and started. To check whether all of these conditions are met, issue the following command on the primary as SYSDBA:

SQL> ARCHIVE LOG LIST;
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            /arch1/PROD1
Oldest online log sequence     3
Current log sequence           5

To enable automatic archiving, set the initialization parameter LOG_ARCHIVE_START = TRUE on the primary. (Do the same on the secondary in case of role reversal.) To place the primary database in ARCHIVELOG mode, issue the following in sqlplus as SYSDBA:

SQL> SHUTDOWN IMMEDIATE;        (can defer)
Database closed.
Database dismounted.
ORACLE instance shut down.

If you have not already done so, set the initialization parameter LOG_ARCHIVE_START=TRUE on both primary and secondary. Then execute the following on the primary:

SQL> STARTUP MOUNT EXCLUSIVE;
ORACLE instance started.
Total System Global Area  143727516 bytes
Fixed Size                   453532 bytes
Variable Size             109051904 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> ALTER DATABASE ARCHIVELOG;
Database altered.
SQL> ALTER DATABASE OPEN;
Database altered.

Now verify that the database is in ARCHIVELOG mode:

SQL> ARCHIVE LOG LIST;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /arch1/PROD1
Oldest online log sequence     3
Current log sequence           5
SQL> SHUTDOWN;                  (can defer)
Database closed.
Database dismounted.
ORACLE instance shut down.

Changing the archiving mode updates the control file, so it requires a cold backup of the primary database (the next step). Any previous backups are no longer usable because they were taken in NOARCHIVELOG mode.

BACKUP PRIMARY DATABASE (PRI)

First, make backups of your primary database datafiles. You will create the standby datafiles from these backups. As mentioned above in Change To ARCHIVELOG Mode With Automatic Archiving, you need to perform a cold backup if you just changed to ARCHIVELOG mode. If your primary database was already in ARCHIVELOG mode, you can take a hot backup now to avoid shutting down the primary database. Either way, you will use these backups and the subsequent archive logs to build the standby database.

CREATE STANDBY CONTROL FILE (PRI)

After you have created the backups that will be used as the standby datafiles, you can create the standby database control file. Create the control file only after the backup completes, so that it is newer than the latest timestamp of the backup datafiles. To create the standby database control file, connect to the primary database and issue:

SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/tmp/control01.ctl';

The filename of the standby control file must be different from the filename of the current control file of the primary database.
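The filename rule above is easy to check before issuing the statement. The following is a minimal sketch, assuming you have the primary's CONTROL_FILES value available as a comma-separated string (for example, copied from the exported PFILE). The function name standby_ctl_path_ok and its arguments are hypothetical and are not part of any Oracle utility.

```shell
#!/bin/sh
# Sketch only: refuse a standby control file path that matches any entry in
# the primary's CONTROL_FILES list. The function name is hypothetical.
# usage: standby_ctl_path_ok <target_path> <comma-separated control_files value>
standby_ctl_path_ok() {
    target=$1
    # Strip single quotes and spaces, then split the list on commas.
    list=$(printf '%s' "$2" | tr -d "' " | tr ',' '\n')
    for f in $list; do
        if [ "$f" = "$target" ]; then
            return 1    # collision with a live control file
        fi
    done
    return 0
}

# Example using the PROD1 control_files value listed in Appendix 1.
cf="'/db01/oracle/oradata/prod1/control01.ctl','/db02/oracle/oradata/prod1/control02.ctl','/db03/oracle/oradata/prod1/control03.ctl'"
if standby_ctl_path_ok /tmp/control01.ctl "$cf"; then
    echo "OK: /tmp/control01.ctl does not collide with a primary control file"
fi
```

The check compares full paths only; it does not detect two different paths that resolve to the same file through symbolic links.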
COPY BACKUP TO STANDBY SITE (SEC)

At the O/S level, copy the following from the primary site to identical locations on the secondary site:

• Backup datafiles (except temporary tablespace datafiles)
• Standby control file, to the same locations on the secondary site as on the primary site (specified by the CONTROL_FILES parameter)
• All archived redo logs, to STANDBY_ARCHIVE_DEST on the secondary (=LOG_ARCHIVE_DEST_1 on the primary). The required archive logs are those created after the last cold backup, or from the beginning of the last hot backup.
• All online redo logs (recommended for switchover and failover operations)

RECOVER STANDBY DATABASE (PRI,SEC)

MOUNT STANDBY DATABASE (SEC)

After you have configured the necessary parameter and network files, you can start the standby instance and mount the physical standby database. In sqlplus as SYSDBA, do the following:

SQL> STARTUP NOMOUNT;
SQL> ALTER DATABASE MOUNT STANDBY DATABASE;

CREATE STANDBY REDO LOG FILES, IF NECESSARY (PRI+SEC)

In this case, the customer chose not to use standby redo logs (SRLs), which are not required in Maximum Performance mode. However, if you decide to use SRLs, you would create them on both sites. You cannot create them on the primary database prior to the backup used to create the standby database, because the SRLs on the standby must be empty of primary database redo data. To create SRLs on each site, do the following:

• Check MAXLOGFILES to ensure that the number of SRL groups desired is less than the maximum number of groups allowed.
• Check MAXLOGMEMBERS to ensure that the number of SRL members desired is less than the maximum number of members allowed.
• Add standby log files using the following syntax:

alter database add standby logfile GROUP 1
  ('/db02/oracle/oradata/prod1/redo_stby_01a.log',
   '/db03/oracle/oradata/prod1/redo_stby_01b.log') size 250M;
alter database add standby logfile GROUP 2
  ('/db02/oracle/oradata/prod1/redo_stby_02a.log',
   '/db03/oracle/oradata/prod1/redo_stby_02b.log') size 250M;
etc.

START NEW LISTENERS (PRI+SEC)

Now that the network configuration files are in place, you can start the new networking components on both the primary and standby sites. To pick up the new listener definitions, stop and start the listeners on the primary that you added or changed, and start the new listeners on the standby (the commands below assume both primary listeners are new):

$ lsnrctl start LSNRPROD1PUB
$ lsnrctl start LSNRPROD2PUB
$ lsnrctl start LSNRPROD1REDO
$ lsnrctl start LSNRPROD2REDO

If you configured dynamic service registration for non-privileged database clients, you need to register the databases with these new listeners, because you started them after starting the primary and after mounting the standby. To register the databases with these listeners and confirm that they were registered, do the following on both the primary and standby databases:

SQL> ALTER SYSTEM REGISTER;
System altered.
SQL> EXIT;
$ lsnrctl status LSNRPROD1PUB
$ lsnrctl status LSNRPROD2PUB

The output of each of these commands should contain the following (the sample output portion is from the first command):

Service "PROD1" has 1 instance(s).
  Instance "PROD1", status UNKNOWN, has 1 handler(s) for this service.

RECOVER STANDBY DATABASE (SEC)

Standby database recovery will fail if a redo log file switch interrupts the next two steps (recovering the standby and placing it in managed recovery mode). To reduce this possibility, manually switch log files now, just prior to recovery:

SQL> ALTER SYSTEM SWITCH LOGFILE;
System altered.

Now you need to recover the standby database by applying the required archive logs to the standby.
To do so, issue the following command in sqlplus as SYSDBA:

SQL> RECOVER AUTOMATIC STANDBY DATABASE;

This recovers until no more logs are available, at which point you will get the following message:

ORA-00308: cannot open archived log '/arch1/prod2/log_1_4868.arc'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory

This should return you to sqlplus. At this point, all archived logs from production should have been applied to the standby.

PLACE STANDBY IN MANAGED RECOVERY MODE (SEC)

To allow log apply services to automatically apply future primary archive logs to the standby, issue the following command in sqlplus as SYSDBA:

SQL> RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

DISCONNECT FROM SESSION allows log apply services to run as a detached background server process and immediately returns control to the user. It does not disconnect the current SQL session. This command should eventually return:

Media recovery complete.

The last line in the standby alert log should show the following:

Media Recovery Waiting for thread 1 seq# 4868

DYNAMICALLY ENABLE LOG_ARCHIVE_DEST_STATE_2 (PRI)

To allow archiving to the standby site to begin once the standby is recovered, change LOG_ARCHIVE_DEST_STATE_2 from DEFER to ENABLE on the primary site as follows:

ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = ENABLE SCOPE=BOTH;

SET DATABASE PROTECTION MODE (PRI)

If you need to change the database protection mode from the default (Maximum Performance in Oracle9.2), you must do so at this time, after placing the standby database in managed recovery mode. In addition, changing the protection mode requires shutting down the primary database in order to mount it in EXCLUSIVE mode.
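Because a mode change costs a primary outage, it can be worth confirming that a change is actually needed before scheduling one. The sketch below assumes you have already obtained the current mode as text (for example, pasted from a query against V$DATABASE); the function name needs_mode_change is hypothetical, and the script does not connect to the database.

```shell
#!/bin/sh
# Sketch only: compare the current protection mode with the desired one.
# The function name is hypothetical; no database connection is made.
# usage: needs_mode_change <current_mode> <desired_mode>
# Returns 0 (true) when the modes differ, ignoring case and surrounding blanks.
needs_mode_change() {
    cur=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/^ *//; s/ *$//')
    want=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]' | sed 's/^ *//; s/ *$//')
    [ "$cur" != "$want" ]
}

current="MAXIMUM PERFORMANCE"    # value as reported by the database
if needs_mode_change "$current" "maximum performance"; then
    echo "Mode change needed: schedule a primary shutdown"
else
    echo "Already in the desired mode: no restart required"
fi
```

With the values shown, the script prints the second message, since the two strings differ only in case.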
To determine what protection mode is set, issue the following command in sqlplus:

SQL> SELECT PROTECTION_MODE FROM V$DATABASE;

PROTECTION_MODE
--------------------
MAXIMUM PERFORMANCE

If necessary, set the primary database to the desired mode, in this case MAXIMUM PERFORMANCE mode, as follows:

(9.2):
SQL> SHUTDOWN IMMEDIATE;        (can defer)
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP MOUNT EXCLUSIVE;
ORACLE instance started.

Total System Global Area  143727516 bytes
Fixed Size                   453532 bytes
Variable Size             109051904 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
Database altered.

Oracle9.0 has only two database modes, PROTECTED and UNPROTECTED. UNPROTECTED mode in Oracle9.0 is the closest equivalent to the Oracle9.2 MAXIMUM PERFORMANCE mode, so the statement in 9.0 would be as follows:

(9.0):
SQL> ALTER DATABASE SET STANDBY DATABASE UNPROTECTED;
Database altered.

VERIFY STANDBY AUTOMATIC ARCHIVING WORKS (PRI,SEC)

Now that the standby is in managed recovery mode, confirm that the standby is working. Specifically:

• Confirm from the primary site that automatic archiving to the standby is working:

  SQL> ALTER SYSTEM SWITCH LOGFILE;
  System altered.
  SQL> SELECT STATUS, ERROR FROM V$ARCHIVE_DEST WHERE DEST_ID=2;

  STATUS    ERROR
  --------- ----------------------------------------------------
  VALID

  If STATUS is VALID and ERROR is null as shown above, then automatic archiving from the primary is working. If not, possible causes are:
  - The standby may not be in managed recovery mode.
  - The primary may have switched a log file during the previous two steps, and skipped sending this log file to the standby.
  - Windows platforms only: a bug with the ARCH process requires you to bounce the archive process as a workaround:

    SQL> ALTER SYSTEM ARCHIVE LOG STOP;
    System altered.
    SQL> ALTER SYSTEM ARCHIVE LOG START;
    System altered.

• Confirm from the standby that the archive log from the previous step was copied to STANDBY_ARCHIVE_DEST. In addition, the alert log should show that the log was processed and that Media Recovery is waiting for the next log:

  Media Recovery Waiting for thread 1 seq# 4868
  Fri Aug 30 08:13:34 2002
  Media Recovery Log /arch/prod1/log_1_4868.arc
  Fri Aug 30 08:13:34 2002
  Media Recovery Waiting for thread 1 seq# 4869

  If the log was not copied or processed, then issue:

  SQL> RECOVER MANAGED STANDBY DATABASE CANCEL;

  Then begin automatic standby database recovery again at Recover Standby Database above.

ENABLE DATA GUARD BROKER (PRI+SEC)

You have already started the Data Guard monitor (DMON) process on each site by setting the initialization parameter DG_BROKER_START to TRUE on both sites. You now need to create and enable the broker configuration. As already explained under Consider Data Guard Broker above, you create and enable the broker configuration using either Data Guard Manager's Create Configuration Wizard or its command-line interface (CLI). Here, we used the Create Configuration Wizard, which leads you step by step through the process. If using the CLI, execute the DGMGRL commands CREATE CONFIGURATION to create the primary site, CREATE SITE to create the standby site, and ENABLE CONFIGURATION to enable the broker configuration.

APPENDIX 1: SPFILES

Below are exports of the four SPFILEs used in the standard bidirectional Data Guard environment presented in this paper. These files were created using the SQL statement CREATE PFILE. The first text initialization parameter file, for example, was created on node1 for the PROD1 database using the following statement:

CREATE PFILE='initPROD1.ora_fr_SPFILE_node1' FROM SPFILE;

Comments are indicated after the #.
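The node1 and node2 files for the same database should differ only in a handful of role-dependent parameters, so a quick comparison of two exported PFILEs can catch accidental drift. The following is a minimal sketch under that assumption. The function names normalize_pfile and pfile_diff_params are hypothetical, and the comment stripping is naive: it treats everything after the first # on a line as a comment.

```shell
#!/bin/sh
# Sketch only: list the parameters whose values differ between two exported
# PFILEs. Function names are hypothetical. Trailing "#..." comments are
# dropped before comparing, so a value containing "#" would be truncated.

# Strip comments, trailing blanks, and empty lines, then sort.
normalize_pfile() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" | sort
}

# usage: pfile_diff_params <pfile_a> <pfile_b>
pfile_diff_params() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    normalize_pfile "$1" > "$a"
    normalize_pfile "$2" > "$b"
    # diff marks differing lines with "< " or "> "; keep only the name
    # between the "*." prefix and the "=" sign.
    diff "$a" "$b" | sed -n 's/^[<>] \*\.\([^=]*\)=.*$/\1/p' | sort -u
    rm -f "$a" "$b"
}

# Example call (the second file name is an assumed counterpart on node2):
#   pfile_diff_params initPROD1.ora_fr_SPFILE_node1 initPROD1.ora_fr_SPFILE_node2
```

For the PROD1 files listed in this appendix, the differing parameters would be expected to be fal_client, fal_server, log_archive_dest_2, and log_archive_dest_state_2.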
SPFILE FOR PROD1 ON NODE1

*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod1/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod1/control01.ctl','/db02/oracle/oradata/prod1/control02.ctl','/db03/oracle/oradata/prod1/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=840M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD1'
*.fal_client='PROD1_node2'
*.fal_server='PROD1_node1'
*.global_names=false
*.instance_name='PROD1'
*.log_archive_dest_1='location=/arch1/prod1 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod1'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable'#stby - set to defer when in stby role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=4
*.pga_aggregate_target=840M
*.pre_page_sga=FALSE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD1.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod1'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod1/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'

SPFILE FOR PROD1 ON NODE2

*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod1/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod1/control01.ctl','/db02/oracle/oradata/prod1/control02.ctl','/db03/oracle/oradata/prod1/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=840M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD1'
*.fal_client='PROD1_node1'
*.fal_server='PROD1_node2'
*.global_names=false
*.instance_name='PROD1'
*.log_archive_dest_1='location=/arch1/prod1 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD1_node1 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod1'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='defer'#stby - set to enable when in pri role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=4
*.pga_aggregate_target=840M
*.pre_page_sga=FALSE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD1.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod1'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod1/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'

SPFILE FOR PROD2 ON NODE1

*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod2/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod2/control01.ctl','/db02/oracle/oradata/prod2/control02.ctl','/db03/oracle/oradata/prod2/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=200M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD2'
*.fal_client='PROD2_node2'
*.fal_server='PROD2_node1'
*.global_names=false
*.instance_name='PROD2'
*.log_archive_dest_1='location=/arch1/prod2 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD2_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod2'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='defer'#stby - set to enable when in pri role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=2
*.pga_aggregate_target=100M
*.pre_page_sga=TRUE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD2.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod2'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod2/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'

SPFILE FOR PROD2 ON NODE2

*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod2/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod2/control01.ctl','/db02/oracle/oradata/prod2/control02.ctl','/db03/oracle/oradata/prod2/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=200M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD2'
*.fal_client='PROD2_node1'
*.fal_server='PROD2_node2'
*.global_names=false
*.instance_name='PROD2'
*.log_archive_dest_1='location=/arch1/prod2 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD2_node1 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod2'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable'#stby - set to defer when in stby role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=2
*.pga_aggregate_target=100M
*.pre_page_sga=TRUE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD2.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod2'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod2/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'

APPENDIX 2: NETWORK CONFIGURATION FILES

Below are all network configuration files (listener.ora and tnsnames.ora) for the standard bidirectional Data Guard environment presented here.
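When maintaining files like the ones in this appendix, one quick sanity check is to list every host and port pair and confirm that no port is reused within a node's listener.ora. The following is a minimal sketch; the function names list_addresses and duplicate_ports are hypothetical, and the parser assumes that each ADDRESS clause sits on a single line, as it does in the files shown here.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the parser assumes each
# (ADDRESS = ...) clause is written on one line with HOST before PORT.

# Print "host port" for every ADDRESS line in a listener.ora or tnsnames.ora.
list_addresses() {
    sed -n 's/.*HOST *= *\([^)]*\)).*PORT *= *\([0-9][0-9]*\)).*/\1 \2/p' "$1"
}

# Print any port number that appears more than once in the file.
duplicate_ports() {
    list_addresses "$1" | awk '{ print $2 }' | sort | uniq -d
}

# Example calls (the path is an assumption about where the file lives):
#   list_addresses  $ORACLE_HOME/network/admin/listener.ora
#   duplicate_ports $ORACLE_HOME/network/admin/listener.ora
```

An empty result from duplicate_ports means each listener address on the node has its own port, which matches the layout in Table 3.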
LISTENER.ORA FOR NODE1

# Below is the PROD1 listener for non-privileged database clients.
# It listens on the network address for node1 on port 1021.
# It should be up when PROD1 is primary on node1 and down when
# PROD1 is primary on node2.
LSNRPROD1PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1021)(SDU=32768))
    )
  )
# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD1PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )

# Below is the PROD2 listener for non-privileged database clients.
# It listens on the network address for node1 on port 1022.
# It should be up when PROD2 is primary on node1 and down when
# PROD2 is primary on node2.
LSNRPROD2PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1022)(SDU=32768))
    )
  )
# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD2PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )

# The following listener for DBA clients and redo log traffic should always
# be up. DBA clients connect through port 1222. The PROD2 standby on node2
# fetches archive log gaps from the PROD2 primary on node1 through port 1122
# and goes on the special network, denoted by the node1b hostname.
LSNRPROD2REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1122)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1222)(SDU=32768))
    )
  )
# The GLOBAL_DBNAME parameter below is used to allow Oracle Enterprise Manager
# to discover PROD2 as both standby on node1 and as primary on node2 where a
# different GLOBAL_DBNAME is used.
SID_LIST_LSNRPROD2REDO =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = PROD2_node1.domain.com)
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )

# The following listener for DBA clients and redo log traffic should always
# be up. DBA clients connect through port 1221. The PROD1 standby on node2
# fetches archive log gaps from the PROD1 primary on node1 through port 1121
# and goes on the special network, denoted by the node1b hostname.
LSNRPROD1REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1121)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1221)(SDU=32768))
    )
  )
SID_LIST_LSNRPROD1REDO =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )

LISTENER.ORA FOR NODE2

# Below is the PROD1 listener for non-privileged database clients.
# It listens on the network address for node2 on port 1021.
# It should be up when PROD1 is primary on node2 and down when
# it is primary on node1.
LSNRPROD1PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1021)(SDU=32768))
    )
  )
# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD1PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )

# Below is the PROD2 listener for non-privileged database clients.
# It listens on the network address for node2 on port 1022.
# It should be up when PROD2 is primary on node2 and down when
# it is primary on node1.
LSNRPROD2PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1022)(SDU=32768))
    )
  )
# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD2PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )

# The following listener for DBA clients and redo log traffic should always
# be up. DBA clients connect through port 1221. The PROD1 standby on node1
# fetches archive log gaps from the PROD1 primary on node2 through port 1121
# and goes on the special network, denoted by the node2b hostname.
LSNRPROD1REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1121)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1221)(SDU=32768))
    )
  )
SID_LIST_LSNRPROD1REDO =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )

# The following listener for DBA clients and redo log traffic should always
# be up. DBA clients connect through port 1222. The PROD2 standby on node1
# fetches archive log gaps from the PROD2 primary on node2 through port 1122
# and goes on the special network, denoted by the node2b hostname.
LSNRPROD2REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1122)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1222)(SDU=32768))
    )
  )
SID_LIST_LSNRPROD2REDO =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )

TNSNAMES.ORA FOR NODE1

# The following entry is for the PROD2 standby on node1 to fetch archive log
# gaps from the PROD2 primary on node2. PROD2_node2 is the PROD2 setting for
# FAL_CLIENT on node1. This redo traffic goes through the special network
# connecting the two nodes.
PROD2_node2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1122))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )

# The following entry is for the PROD1 standby on node2 to fetch archive log
# gaps from the PROD1 primary on node1. PROD1_node2 is the PROD1 setting for
# FAL_CLIENT on node2. This redo traffic goes through the special network
# connecting the two nodes.
PROD1_node2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1121))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )

TNSNAMES.ORA FOR NODE2

# The following entry is for the PROD2 standby on node1 to fetch archive log
# gaps from the PROD2 primary on node2. PROD2_node1 is the PROD2 setting for
# FAL_CLIENT on node2. This redo traffic goes through the special network
# connecting the two nodes.
PROD2_node1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1122))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )

# Following is the entry for the PROD1 standby on node2 to fetch archive log
# gaps from the PROD1 primary on node1. PROD1_node1 is the PROD1 setting for
# FAL_CLIENT on node2. This redo traffic goes through the special network
# connecting the two nodes.
PROD1_node1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1121))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )

TNSNAMES.ORA FOR NON-PRIVILEGED CLIENTS

# Below are the PROD1 and PROD2 aliases for non-privileged clients.
# These aliases are used to connect only to the primary databases
# PROD1 and PROD2. Proper operation relies upon starting the listener
# only on the node in the primary database role.
PROD1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1021))
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1021))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )
PROD2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1022))
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1022))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )

TNSNAMES.ORA FOR DBA CLIENTS

# Below are the PROD1 and PROD2 aliases for DBA clients.
# These aliases can be used to connect to PROD1 and PROD2 in
# either role on either node.
PROD1NODE1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1221))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )
PROD2NODE1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1222))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )
PROD1NODE2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1221))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )
PROD2NODE2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1222))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )

APPENDIX 3: MONITORING SCRIPTS FOR STANDBY DATABASES

The following set of scripts has been implemented and tested at several customer sites. They have proven highly effective in managing Oracle8i standby database and Oracle9i Data Guard environments.[11]

[11] I am indebted to my colleague and mentor, Matthew Burke, Director of ThinkSpark's Data Management Services (DMS) Group, for writing, implementing and testing all scripts in Appendix 3.

CRONTAB FOR MONITORING SCRIPTS

Below are the crontab entries for the two monitoring scripts also listed in this Appendix.

# DBA's crontab ....
#
#min hour day month dow
#0-59 0-23 1-31 1-12 0-6 command
#----------------------------------------------
#
#
#
0 9 * * * /export/home/dmsdba/tools/STBY_gap_check.ksh > /export/home/dmsdba/tmp/cron_STBY_gap_check_$$.log 2>&1
#
#
#
0 9 * * * /export/home/dmsdba/tools/STBY_integrity_check.ksh > /export/home/dmsdba/tmp/cron_STBY_check_$$.log 2>&1
#

STBY_GAP_CHECK.KSH

#!/bin/ksh
#
# File: STBY_gap_check.ksh
# ***************************************************************************
#
#
typeset -x PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/ccs/bin:$PATH
#
typeset -x PATH=/export/home/dmsdba/tools:$PATH
#
typeset -x ORACLE_PATH=$HOME/active:$HOME/tools/sql:$HOME/tools/sql/misclib:/ora/admin/scripts \
           ORACLE_BASE=/ora
#
. /usr/local/bin/setup_ora_STBY
#
#
tempfile=/export/home/dmsdba/tmp/STBY_gap_check_$$.dat
#
#
# Check STBY for Recovery Mode
#
echo " " > $tempfile
echo "Checking STBY for Recovery Mode ------------------" >> $tempfile
echo " " >> $tempfile
#
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
set feedback off
@recovery_mode_check.sql
EOS
#
#
#
#
# Check STBY for Gaps in Archive log application
#
echo " " >> $tempfile
echo "Checking STBY for Gaps in Archive log applications ------------------" >> $tempfile
echo " " >> $tempfile
#
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
@archive_log_gap_check.sql
EOS
#
#
mailx -s "STBY Archive Log Gap Check" [email protected] < $tempfile
#
#
exit 0

SETUP_ORA_STBY

# File: setup_ora_STBY
#
#
# This script must be "dot" executed in order to
# apply correctly. Consequently, this file should
# NOT be given execute privilege - hopefully we'll
# avoid misleading any users.
#
# NOTE: In order to avoid version mismatches, these scripts
# always reset certain environment variables
#
#
# Remove any possible setting for TWO_TASK
#
unset TWO_TASK
#
#
# Setup common environment
#
ORACLE_BASE=/ora
ORAENV_ASK=NO
TMPDIR=/tmp
TNS_ADMIN=/var/opt/oracle
#
export ORACLE_BASE ORAENV_ASK TMPDIR TNS_ADMIN
# Setup to Oracle STBY Database
#
NLS_LANG=AMERICAN_AMERICA.UTF8
NLS_DATE_FORMAT=DD-MON-RR
NLS_DATE_LANGUAGE=AMERICAN
NLS_NUMERIC_CHARACTERS=".,"
ORACLE_SID=STBY
EXPORTS=/ora/EXPORTS
#
export NLS_LANG NLS_DATE_FORMAT NLS_DATE_LANGUAGE NLS_NUMERIC_CHARACTERS ORACLE_SID EXPORTS
#
#
#
#
# Setup to Java Runtime Environment
#
JRE_DIR=/ora/jre
#
PATH=$JRE_DIR/jre1.1.6/bin:$PATH
CLASSPATH=$CLASSPATH:$JRE_DIR/jre1.1.6/lib/rt.jar
#
#
export PATH CLASSPATH JRE_DIR
#
#
# Custom settings for Client
#
HOSTNAME=be2
HOSTTYPE=sparc
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/ucblib:/lib:/ora/product/817/lib
#
export HOSTNAME HOSTTYPE LD_LIBRARY_PATH
#
#
. /usr/local/bin/oraenv
#
#
# .eof

RECOVERY_MODE_CHECK.SQL

rem
rem
rem File: recovery_mode_check.sql
rem
rem
rem
prompt The presence of lock type MR below indicates that the database is
prompt in media recovery mode - either manual or managed - not simply started
prompt and mounted.
rem
#column "Lock Type" format a12
rem
select type "Lock Type", count(*) "Number of Locks"
from v$lock
group by type
/
rem
rem .eof

ARCHIVE_LOG_GAP_CHECK.SQL

rem
rem File: archive_log_gap_check.sql
rem
select high.thread#, "LogGap#", "HighGap#"
from
  ( select thread#, MIN(sequence#)-1 "HighGap#"
    from
      ( select a.thread#, a.sequence#
        from
          ( select * from v$archived_log ) a,
          ( select thread#, MAX(next_change#) gap1
            from v$log_history
            group by thread# ) b
        where a.thread# = b.thread#
        and a.next_change# > gap1
      )
    group by thread#
  ) high,
  ( select thread#, MIN(sequence#) "LogGap#"
    from
      ( select thread#, sequence#
        from v$log_history, v$datafile
        where checkpoint_change# <= next_change#
        and checkpoint_change# >= first_change#
      )
    group by thread#
  ) low
where low.thread# = high.thread#
/
rem
rem .eof

STBY_INTEGRITY_CHECK.KSH

#!/bin/ksh
#
# File: STBY_integrity_check.ksh
# ***************************************************************************
#
typeset -x PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/ccs/bin:$PATH
#
typeset -x PATH=/export/home/dmsdba/tools:$PATH
#
typeset -x ORACLE_PATH=$HOME/active:$HOME/tools/sql:$HOME/tools/sql/misclib:/ora/admin/scripts \
           ORACLE_BASE=/ora
#
. /usr/local/bin/setup_ora_STBY
#
tempfile=/export/home/dmsdba/tmp/STBY_check_$$.dat
#
# Check STBY for Recovery Mode
#
echo " " > $tempfile
echo "Checking STBY Recovery Mode------------------" >> $tempfile
echo " " >> $tempfile
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
set feedback off
@recovery_mode_check.sql
EOS
#
# Check PRI for Unrecoverable Datafiles
#
# Check STBY for Unrecoverable Datafiles
#
echo " " >> $tempfile
echo "Checking STBY Datafiles------------------" >> $tempfile
echo " " >> $tempfile
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
set feedback off
@unrecoverable_change
EOS
#
# Check PRI for Unrecoverable Datafiles
#
echo " " >> $tempfile
echo "Checking PRI Datafiles------------------" >> $tempfile
echo " " >> $tempfile
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd@PRI
set pause off
set timing off
set feedback off
@unrecoverable_change
EOS
#
# Check STBY for any corrupted blocks
#
alrtscan $ORACLE_BASE/admin/STBY/bdump/alert_STBY.log | tail -300 >> $tempfile
#
mailx -s "STBY Integrity Check" [email protected] < $tempfile
#
exit 0

UNRECOVERABLE_CHANGE.SQL

rem
rem File: unrecoverable_change.sql
rem
rem Check for unrecoverable changes in the Standby database caused by
rem unrecoverable operations in the primary database.
rem
select unrecoverable_change# as "Unrecoverable SCN",
       to_char(unrecoverable_time, 'mm-dd-yyyy hh24:mi:ss') as "Unrecoverable Timestamp",
       file# as "File Number",
       name as "File Name"
from v$datafile
where unrecoverable_time is not null
order by file#
/
rem
rem .eof

STBY_GAP_CHECK.KSH EMAIL OUTPUT

Following is sample email output from the script STBY_gap_check.ksh, listed above.
It indicates a gap in archive log application that needs to be addressed:

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Friday, August 15, 2003 9:07 AM
To: [email protected]
Subject: STBY Archive Log Gap Check

Checking STBY for Recovery Mode ------------------
Connected.
The presence of lock type MR below indicates that the database is
in media recovery mode - either manual or managed - not simply started
and mounted.

Lock Type    Number of Locks
------------ ------------------
FS                            1
IS                            1
MR                           27
RT                            1
WL                            1

Checking STBY for Gaps in Archive log applications ------------------
Connected.

THREAD#            LogGap#            HighGap#
------------------ ------------------ ------------------
                 1             28,015             28,016

1 row selected.

STBY_INTEGRITY_CHECK.KSH EMAIL OUTPUT

Below is email output from STBY_integrity_check.ksh:

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Friday, August 15, 2003 1:00 AM
To: [email protected]
Subject: STBY Integrity Check

Checking STBY Recovery Mode------------------
Connected.
The presence of lock type MR below indicates that the database is
in media recovery mode - either manual or managed - not simply started
and mounted.

Lock Type    Number of Locks
------------ ------------------
FS                            1
IS                            1
MR                           27
RT                            1

Checking STBY Datafiles------------------
Connected.
Unrecoverable SCN  Unrecoverable Times File Number        File Name
------------------ ------------------- ------------------ ----------------------------------------------------
       917,984,600 02-19-2003 11:04:35                  7 /d02/oradata/STBY/data/l1_ez_i01.dbf
       930,774,627 03-05-2003 10:04:21                 12 /d03/oradata/STBY/data/l2_ez_intf_d_01.dbf
       930,774,668 03-05-2003 10:04:28                 13 /d02/oradata/STBY/data/l1_ez_intf_i_01.dbf
       917,989,671 02-19-2003 11:12:24                 21 /d02/oradata/STBY/data/l1_ez_i_02.dbf
       917,989,801 02-19-2003 11:12:39                 22 /d02/oradata/STBY/data/l1_ez_i_03.dbf
       917,990,967 02-19-2003 11:15:20                 23 /d02/oradata/STBY/data/l1_ez_i_04.dbf

Checking PRI Datafiles------------------
Connected.

Unrecoverable SCN  Unrecoverable Times File Number        File Name
------------------ ------------------- ------------------ ----------------------------------------------------
       917,984,600 02-19-2003 11:04:35                  7 /CL1/oradata/PRI/data/l1_ez_i01.dbf
       930,774,627 03-05-2003 10:04:21                 12 /CL2/oradata/PRI/data/l2_ez_intf_d_01.dbf
       930,774,668 03-05-2003 10:04:28                 13 /CL1/oradata/PRI/data/l1_ez_intf_i_01.dbf
       917,989,671 02-19-2003 11:12:24                 21 /CL1/oradata/PRI/data/l1_ez_i_02.dbf
       917,989,801 02-19-2003 11:12:39                 22 /CL1/oradata/PRI/data/l1_ez_i_03.dbf
       917,990,967 02-19-2003 11:15:20                 23 /CL1/oradata/PRI/data/l1_ez_i_04.dbf

Media Recovery Log /ora/STBYarch/PRI_28010_1.arc
Thu Aug 14 16:17:00 2003
Media Recovery Waiting for thread 1 seq# 28011
Thu Aug 14 18:41:17 2003
Media Recovery Log /ora/STBYarch/PRI_28011_1.arc
Thu Aug 14 18:42:28 2003
Media Recovery Waiting for thread 1 seq# 28012
Thu Aug 14 21:01:15 2003
Media Recovery Log /ora/STBYarch/PRI_28012_1.arc
Thu Aug 14 21:02:06 2003
Media Recovery Waiting for thread 1 seq# 28013
Thu Aug 14 21:03:51 2003
Media Recovery Log /ora/STBYarch/PRI_28013_1.arc
Media Recovery Waiting for thread 1 seq# 28014
Thu Aug 14 21:05:48 2003
Media Recovery Log /ora/STBYarch/PRI_28014_1.arc
Thu Aug 14 21:06:18 2003
Media Recovery Waiting for thread 1 seq# 28015
Thu Aug 14 21:07:48 2003
Media Recovery Log /ora/STBYarch/PRI_28015_1.arc
Thu Aug 14 21:08:13 2003
Media Recovery Waiting for thread 1 seq# 28016
Thu Aug 14 21:14:46 2003
Shutting down instance (immediate)
License high water mark = 5
Thu Aug 14 21:14:58 2003
Media Recovery failed with error 1089
ORA-283 signalled during: ALTER DATABASE RECOVER managed standby database
Thu Aug 14 21:15:20 2003
ALTER DATABASE CLOSE NORMAL
Thu Aug 14 21:15:20 2003
ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
Thu Aug 14 21:15:20 2003
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
archiving is disabled
Thu Aug 14 21:15:22 2003
ARCH shutting down
ARC0: Archival stopped
Thu Aug 14 21:39:36 2003
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
LICENSE_MAX_USERS = 0
Starting up ORACLE RDBMS Version: 9.2.0.1.0
. . .
System parameters with non-default values:
processes = 100
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=10
Thu Aug 14 21:39:40 2003
ARCH: STARTING ARCH PROCESSES COMPLETE
Thu Aug 14 21:39:40 2003
ARC0: Archival started
Thu Aug 14 21:39:41 2003
alter database mount standby database
Thu Aug 14 21:39:45 2003
Successful mount of redo thread 1, with mount id 2715592097.
Thu Aug 14 21:39:45 2003
Standby Database mounted.
Completed: alter database mount standby database
Thu Aug 14 21:39:45 2003
ALTER DATABASE RECOVER managed standby database
Media Recovery Start: Managed Standby Recovery
Media Recovery Log
Media Recovery Waiting for thread 1 seq# 28016
<EOF>
...

This output reports six unrecoverable datafiles on both the primary and standby sites. It also shows that the standby is in media recovery mode but is still waiting for log sequence 28,016 from the primary database. The primary database probably lost network connectivity with the standby. Data Guard should automatically detect and resolve this archive gap by fetching all logs from sequence# 28,016 onward from the primary.
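Picking the awaited sequence out of a long alert log by eye is easy to get wrong, so the following is a minimal sketch of how it could be extracted automatically. The function name awaited_seq is hypothetical and is not one of the Appendix 3 scripts; it assumes only the "Media Recovery Waiting for thread N seq# M" line format shown in the alert log excerpt above, and it should run under ksh or any POSIX shell.

```shell
# Hypothetical helper (not one of the paper's scripts): read alert log text on
# stdin and print "<thread> <sequence>" taken from the last line of the form
# "Media Recovery Waiting for thread N seq# M". Prints nothing if no such
# line is present.
awaited_seq() {
    awk '/Media Recovery Waiting for thread/ {
             # Fields: Media Recovery Waiting for thread <N> seq# <M>
             thread = $6
             seq = $8
         }
         END { if (seq != "") print thread, seq }'
}

# Example using two lines copied from the alert log excerpt above.
printf '%s\n' \
    'Media Recovery Log /ora/STBYarch/PRI_28015_1.arc' \
    'Media Recovery Waiting for thread 1 seq# 28016' | awaited_seq
```

In practice the function could be fed the tail of alert_STBY.log, in the same way STBY_integrity_check.ksh pipes the alert log through tail before mailing it.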
If necessary (because the network problem is not fixed in a timely manner, for example), you can manually copy these archive logs to the standby, recover the standby, and place it in managed recovery mode again. To manually recover the standby, connect to the standby instance as SYSDBA (CONNECT INTERNAL is no longer supported in Oracle9i) and mount the standby database:

$ sqlplus "/ as sysdba"
SQL> STARTUP NOMOUNT;
SQL> ALTER DATABASE MOUNT STANDBY DATABASE;

After mounting the standby, the primary should resume automatically copying archive logs to the standby. Recover the standby manually so that it catches up with the primary's archive logs:

SQL> RECOVER AUTOMATIC STANDBY DATABASE;
SQL> CANCEL;
SQL> EXIT;

Then start a background process to place the standby in managed recovery mode by executing begin_managed_standby.ksh. This script is listed below along with the script it calls, recover_managed_standby.ksh:

BEGIN_MANAGED_STANDBY.KSH

#!/bin/ksh
#
# File: begin_managed_standby.ksh
# ***************************************************************************
#
typeset -x PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/ccs/bin:$PATH
#
. /usr/local/bin/setup_ora_STBY
#
unset ORACLE_PATH
#
nohup $ORACLE_BASE/admin/STBY/scripts/recover_managed_standby.ksh > $ORACLE_BASE/admin/STBY/script_logs/managed_recovery.log 2>&1 &
#
# .eof

RECOVER_MANAGED_STANDBY.KSH

#!/bin/ksh
#
# File: recover_managed_standby.ksh
#
# This script assumes the user's environment and the database
# are prepared for this command.
#
sqlplus "/ as sysdba" << EOS
set echo on
startup nomount;
alter database mount standby database;
recover managed standby database;
EOS
#
# .eof
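For the manual copy step described above, it can help to confirm which archive logs have actually reached the standby before starting recovery. The sketch below is one possible check, not part of the original scripts: the function name missing_logs is hypothetical, the file naming PRI_<sequence>_<thread>.arc is inferred from the alert log excerpt, and the upper sequence number 28018 in the example is an arbitrary value chosen for illustration.

```shell
# Hypothetical helper (not one of the paper's scripts): print every archive log
# in the range first..last that is not yet present in the given directory.
# Assumes the naming seen in the alert log: PRI_<sequence>_<thread>.arc.
# The body runs in a subshell so its variables do not leak into the caller.
missing_logs() (
    dir=$1
    thread=$2
    first=$3
    last=$4
    seq=$first
    while [ "$seq" -le "$last" ]; do
        file="$dir/PRI_${seq}_${thread}.arc"
        # Report the file only when it does not exist as a regular file.
        [ -f "$file" ] || echo "$file"
        seq=$((seq + 1))
    done
)

# Example: check thread 1, sequences 28016 through 28018 (28018 is an
# illustrative upper bound), in the standby archive directory.
missing_logs /ora/STBYarch 1 28016 28018
```

An empty result would suggest that every log in the range is in place and manual recovery can proceed; any path printed is a log that still has to be copied from the primary.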
