IBM PowerHA SystemMirror 7.1 for AIX

Learn how to plan for, install, and configure PowerHA with the Cluster Aware AIX component
See how to migrate to, monitor, test, and troubleshoot PowerHA 7.1
Explore the IBM Systems Director plug-in and disaster recovery

Dino Quintero, Shawn Bodily, Brandon Boles, Bernhard Buehler, Rajesh Jeyapaul, SangHee Park, Minh Pham, Matthew Radford, Gus Schlachter, Stefan Velica, Fabiano Zimmermann

ibm.com/redbooks

International Technical Support Organization
IBM PowerHA SystemMirror 7.1 for AIX
March 2011
SG24-7845-00

Note: Before using this information and the product it supports, read the information in “Notices” on page ix.

First Edition (March 2011)

This edition applies to IBM PowerHA SystemMirror Version 7.1 and IBM AIX Version 6.1 TL6 and 7.1 as the target.

© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices
Trademarks
Preface
The team who wrote this book
Now you can become a published author, too!
Comments welcome
Stay connected to IBM Redbooks
Chapter 1. PowerHA SystemMirror architecture foundation
Chapter 2. Features of PowerHA SystemMirror 7.1
Chapter 3. Planning a cluster implementation for high availability
Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX
Chapter 5. Configuring a PowerHA cluster
Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2
Chapter 7. Migrating to PowerHA 7.1
Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster
Chapter 9. Testing the PowerHA 7.1 cluster
Chapter 10. Troubleshooting PowerHA 7.1
Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in
Chapter 12. Creating and managing a cluster using IBM Systems Director
Chapter 13. Disaster recovery using DS8700 Global Mirror
Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator
Appendix A. CAA cluster commands
Appendix B. PowerHA SMIT tree
Appendix C. PowerHA supported hardware
Appendix D. The clmgr man page
Related publications
Index

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. 
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at:
http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AIX®, DB2®, Domino®, DS4000®, DS6000™, DS8000®, Enterprise Storage Server®, FileNet®, FlashCopy®, Global Technology Services®, HACMP™, IBM®, Lotus®, Power Systems™, POWER6®, POWER7®, PowerHA®, PowerVM®, POWER®, pureScale®, Redbooks®, Redbooks (logo)®, solidDB®, System i®, System p®, System Storage®, Tivoli®, WebSphere®, XIV®

The following terms are trademarks of other companies:

Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Snapshot, NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries.

Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Preface

IBM® PowerHA® SystemMirror 7.1 for AIX® is a major product announcement for IBM in the high availability space for IBM Power Systems™ servers. This release now has a deeper integration between the IBM high availability solution and IBM AIX. It features integration with the IBM Systems Director, SAP Smart Assist and cache support, the IBM System Storage® DS8000® Global Mirror support, and support for Hitachi storage.

This IBM Redbooks® publication contains information about the IBM PowerHA SystemMirror 7.1 release for AIX. This release includes fundamental changes, in particular departures from how the product has been managed in the past, which has necessitated this Redbooks publication.

This Redbooks publication highlights the latest features of PowerHA SystemMirror 7.1 and explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX component. It also introduces you to PowerHA SystemMirror Smart Assist for DB2®. This book guides you through migration scenarios and demonstrates how to monitor, test, and troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems Director for PowerHA 7.1 and how to install the IBM Systems Director Server and PowerHA SystemMirror plug-in.
Plus, it explains how to perform disaster recovery using DS8700 Global Mirror and Hitachi TrueCopy and Universal Replicator.

This publication targets all technical professionals (consultants, IT architects, support staff, and IT specialists) who are responsible for delivering and implementing high availability solutions for their enterprise.

The team who wrote this book

This book was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), Poughkeepsie Center.

Dino Quintero is a Project Leader and IT generalist with the ITSO in Poughkeepsie, NY. His areas of expertise include enterprise continuous availability planning and implementation, enterprise systems management, virtualization, and clustering solutions. He is currently an Open Group Master Certified IT Specialist - Server Systems. Dino holds a Master of Computing Information Systems degree and a Bachelor of Science degree in Computer Science from Marist College.

Shawn Bodily is a Certified Consulting IT Specialist for Advanced Technical Support Americas in Dallas, Texas. He has worked for IBM for 12 years and has 14 years of AIX experience, with 12 years specializing in High-Availability Cluster Multi-Processing (HACMP™). He is certified in both versions 4 and 5 of HACMP and ATE. He has written and presented on high availability and storage. Shawn has coauthored five other Redbooks publications.

Brandon Boles is a Development Support Specialist for PowerHA/HACMP in Austin, Texas. He has been with IBM for four years and has been doing support, programming, and consulting with PowerHA and HACMP for 11 years. Brandon has been working with AIX since version 3.2.5.

Bernhard Buehler is an IT Specialist for IBM in Germany. He is currently working for IBM STG Lab Services in La Gaude, France. He has worked at IBM for 29 years and has 20 years of experience in the AIX and availability field. His areas of expertise include AIX, PowerHA, High Availability architecture, script programming, and AIX security. Bernhard has coauthored several Redbooks publications and several courses in the IBM AIX curriculum.

Rajesh Jeyapaul is the technical lead for IBM Systems Director Power Server management. His focus is on improving PowerHA SystemMirror, DB2 pureScale®, and the AIX Runtime Expert plug-in for Systems Director. He has worked extensively with customers and specialized in performance analysis under the IBM System p® and AIX environment. His areas of expertise include IBM POWER® Virtualization, high availability, and system management. He has coauthored DS8000 Performance Monitoring and Tuning, SG24-7146, and Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821. Rajesh holds a Master in Software Systems degree from the University of BITS, India, and a Master of Business Administration (MBA) degree from the University of MKU, India.

SangHee Park is a Certified IT Specialist in IBM Korea. He is currently working for IBM Global Technology Services® in Maintenance and Technical Support. He has 5 years of experience in Power Systems. His areas of expertise include AIX, PowerHA SystemMirror, and PowerVM® Virtualization. SangHee holds a bachelor's degree in aerospace and mechanical engineering from Korea Aerospace University.

Minh Pham is currently a Development Support Specialist for PowerHA and HACMP in Austin, Texas.
She has worked for IBM for 10 years, including 6 years in System p microprocessor development and 4 years in AIX development support. Her areas of expertise include core and chip logic design for System p and AIX with PowerHA. Minh holds a Bachelor of Science degree in Electrical Engineering from the University of Texas at Austin.

Matthew Radford is a Certified AIX Support Specialist in IBM UK. He is currently working for IBM Global Technology Services in Maintenance and Technical Support. He has worked at IBM for 13 years and is a member of the UKI Technical Council. His areas of expertise include AIX and PowerHA. Matthew coauthored Personal Communications Version 4.3 for Windows 95, 98 and NT, SG24-4689. Matthew holds a Bachelor of Science degree in Information Technology from the University of Glamorgan.

Gus Schlachter is a Development Support Specialist for PowerHA in Austin, TX. He has worked with HACMP for over 15 years in support, development, and testing. Gus formerly worked for CLAM/Availant and is an IBM-certified Instructor for HACMP.

Stefan Velica is an IT Specialist who is currently working for IBM Global Technology Services in Romania. He has five years of experience in Power Systems. He is a Certified Specialist for IBM System p Administration, HACMP for AIX, High-end and Entry/Midrange DS Series, and Storage Networking Solutions. His areas of expertise include IBM System Storage, PowerVM, AIX, and PowerHA. Stefan holds a bachelor's degree in electronics and telecommunications engineering from Politechnical Institute of Bucharest.

Fabiano Zimmermann is an AIX/SAN/TSM Subject Matter Expert for Nestlé in Phoenix, Arizona. He has been working with AIX, High Availability, and System Storage since 2000. A former IBM employee, Fabiano has experience and expertise in the areas of Linux, DB2, and Oracle. Fabiano is a member of the L3 team that provides worldwide support for the major Nestlé data centers. Fabiano holds a degree in computer science from Brazil.

Front row from left to right: Minh Pham, SangHee Park, Stefan Velica, Brandon Boles, and Fabiano Zimmermann; back row from left to right: Gus Schlachter, Dino Quintero (project leader), Bernhard Buehler, Shawn Bodily, Matt Radford, and Rajesh Jeyapaul

Thanks to the following people for their contributions to this project:

Bob Allison, Catherine Anderson, Chuck Coleman, Bill Martin, Darin Meyer, Keith O'Toole, Ashutosh Rai
Hitachi Data Systems

David Bennin, Ella Buslovich, Richard Conway, Octavian Lascu
ITSO, Poughkeepsie Center

Patrick Buah, Michael Coffey, Mark Gurevich, Felipe Knop, Paul Moyer, Skip Russell, Stephen Tovcimak
IBM Poughkeepsie

Eric Fried, Frank Garcia, Kam Lee, Gary Lowther, Deb McLemore, Ravi A. Shankar, Stephen Tee, Tom Weaver, David Zysk
IBM Austin

Nick Fernholz, Steven Finnes, Susan Jasinski, Robert G. Kovacs, William E. (Bill) Miller, Rohit Krishna Prasad, Ted Sullivan
IBM USA

Philippe Hermes
IBM France

Manohar R Bodke, Jes Kiran, Anantoju Srinivas
IBM India

Claudio Marcantoni
IBM Italy

Now you can become a published author, too!

Here's an opportunity to spotlight your skills, grow your career, and become a published author, all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships.
Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base.

Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:

- Use the online Contact us review Redbooks form found at:
  ibm.com/redbooks
- Send your comments in an email to:
  [email protected]
- Mail your comments to:
  IBM Corporation, International Technical Support Organization
  Dept. HYTD Mail Station P099
  2455 South Road
  Poughkeepsie, NY 12601-5400

Stay connected to IBM Redbooks

- Find us on Facebook:
  http://www.facebook.com/IBMRedbooks
- Follow us on Twitter:
  http://twitter.com/ibmredbooks
- Look for us on LinkedIn:
  http://www.linkedin.com/groups?home=&gid=2130806
- Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks weekly newsletter:
  https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
- Stay current on recent Redbooks publications with RSS Feeds:
  http://www.redbooks.ibm.com/rss.html

Chapter 1. PowerHA SystemMirror architecture foundation

This chapter provides information about the new architecture of the IBM PowerHA SystemMirror 7.1 for AIX. It includes the differences from previous versions.

This chapter includes the following topics:

- Reliable Scalable Cluster Technology
- Cluster Aware AIX
- Cluster communication
- PowerHA 7.1 SystemMirror plug-in for IBM Systems Director

For an introduction to high availability and IBM PowerHA SystemMirror 7.1, see the “IBM PowerHA SystemMirror for AIX” page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html

1.1 Reliable Scalable Cluster Technology

Reliable Scalable Cluster Technology (RSCT) is a set of software components that together provide a comprehensive clustering environment for AIX, Linux, Solaris, and Microsoft Windows. RSCT is the infrastructure used by various IBM products to provide clusters with improved system availability, scalability, and ease of use.

This section provides an overview of RSCT, its components, and the communication paths between these components. Several helpful IBM manuals, white papers, and Redbooks publications are available about RSCT. This section focuses on the components that affect PowerHA SystemMirror.

To find the most current documentation for RSCT, see the RSCT library in the IBM Cluster Information Center at:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.rsct.doc%2Frsctbooks.html

1.1.1 Overview of the components for Reliable Scalable Cluster Technology

RSCT has the following main components:

- Topology Services: This component provides node and network failure detection.
- Group Services: This component provides cross-node or process coordination on some cluster configurations. For a detailed description about how Group Services work, see IBM Reliable Scalable Cluster Technology: Group Services Programming Guide, SA22-7888, at:
  http://publibfp.boulder.ibm.com/epubs/pdf/a2278889.pdf
- RSCT cluster security services: This component provides the security infrastructure that enables RSCT components to authenticate the identity of other parties.
- Resource Monitoring and Control (RMC) subsystem: This subsystem is the scalable, reliable backbone of RSCT. It runs on a single machine or on each node (operating system image) of a cluster. Also, it provides a common abstraction for the resources of the individual system or the cluster of nodes. You can use RMC for single system monitoring or for monitoring nodes in a cluster. However, in a cluster, RMC provides global access to subsystems and resources throughout the cluster. Therefore, it provides a single monitoring and management infrastructure for clusters.
- Resource managers: A resource manager is a software layer between a resource (a hardware or software entity that provides services to some other component) and RMC. A resource manager maps programmatic abstractions in RMC into the actual calls and commands of a resource.

For a more detailed description of the RSCT components, see the IBM Reliable Scalable Cluster Technology: Administration Guide, SA22-7889, at the following web address:
http://publibfp.boulder.ibm.com/epubs/pdf/22788919.pdf

1.1.2 Architecture changes for RSCT 3.1

RSCT version 3.1 is the first version that supports Cluster Aware AIX (CAA). Although this section provides a high-level introduction to the RSCT architecture changes that support CAA, you can find more details about CAA in 1.2, “Cluster Aware AIX” on page 7.

As shown in Figure 1-1 on page 3, RSCT 3.1 can operate without CAA in “non-CAA” mode. You use the non-CAA mode if you use one of the following products:

- PowerHA versions before PowerHA 7.1
- A mixed cluster with PowerHA 7.1 and older PowerHA versions
- Existing RSCT Peer Domains (RPD) that were created before RSCT 3.1 was installed
- A new RPD, when you specify during creation that the system must not use or create a CAA cluster

Figure 1-1 shows both modes in which RSCT 3.1 can be used (with or without CAA). The left part shows the non-CAA mode, which is equal to the older RSCT versions. The right part shows the CAA-based mode. The difference between these modes is that Topology Services has been replaced with CAA.

Important: On a given node, use only one RSCT version at a time.

Figure 1-1 RSCT 3.1

RSCT 3.1 is available for both AIX 6.1 and AIX 7.1. To use CAA with RSCT 3.1 on AIX 6.1, you must have TL 6 or later installed.

CAA on AIX 6.1 TL 6: The use of CAA on AIX 6.1 TL 6 is enabled only for PowerHA 7.1.

Figure 1-2 shows a high-level architectural view of how the IBM high availability (HA) applications PowerHA, IBM Tivoli® System Automation for Multiplatforms, and Virtual I/O Server (VIOS) Clustered Storage use the RSCT and CAA architecture.

Figure 1-2 HA applications using the RSCT and CAA architecture

1.1.3 PowerHA and RSCT

Figure 1-3 shows the non-CAA communication paths between PowerHA and RSCT. Non-CAA mode is still used when you have a PowerHA version 6.1 or earlier, even if you are using AIX 7.1. The main communication goes from PowerHA to Group Services (grpsvcs), then to Topology Services (topsvcs), and back to PowerHA. The communication path from PowerHA to RMC is used for PowerHA Process Application Monitors. Another case where PowerHA uses RMC is when a resource group is configured with the Dynamic Node Priority policy.

Figure 1-3 PowerHA using RSCT without CAA
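You can quickly verify which of the two stacks a node is running by querying the System Resource Controller for the relevant subsystem names. The following commands are a minimal check, not taken from the lab environment of this book; the subsystem that belongs to the stack in use is reported as active, and the others as inoperative (compare the full listing in Example 1-1):

# CAA-based stack: Group Services runs under the cthags subsystem
lssrc -s cthags
# Classic (non-CAA) stack: the grpsvcs and topsvcs subsystems are used instead
lssrc -s grpsvcs
lssrc -s topsvcs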
Figure 1-4 shows the new CAA-based communication paths of PowerHA, RSCT, and CAA. You use this architecture when you have PowerHA 7.1 or later. It is the same architecture for AIX 6.1 TL 6 and AIX 7.1 or later. As in the previous architecture, the main communication goes from PowerHA to Group Services. However, in Figure 1-4, Group Services communicates with CAA.

Figure 1-4 RSCT with Cluster Aware AIX (CAA)

Example 1-1 lists the cluster processes on a running PowerHA 7.1 cluster.

Group Services subsystem name: Group Services now uses the subsystem name cthags, which replaces grpsvcs. Group Services is now started by a different control script and under the new subsystem name cthags.

Example 1-1 Output of lssrc

# lssrc -a | egrep "rsct|ha|svcs|caa|cluster" | grep -v _rm
 cld              caa              4980920      active
 clcomd           caa              4915400      active
 clconfd          caa              5243070      active
 cthags           cthags           4456672      active
 ctrmc            rsct             5767356      active
 clstrmgrES       cluster          10813688     active
 solidhac         caa              10420288     active
 solid            caa              5832836      active
 clevmgrdES       cluster          5177370      active
 clinfoES         cluster          11337972     active
 ctcas            rsct                          inoperative
 topsvcs          topsvcs                       inoperative
 grpsvcs          grpsvcs                       inoperative
 grpglsm          grpsvcs                       inoperative
 emsvcs           emsvcs                        inoperative
 emaixos          emsvcs                        inoperative

1.2 Cluster Aware AIX

Cluster Aware AIX introduces fundamental clustering capabilities into the base AIX operating system. Such capabilities include the creation and definition of the set of nodes that comprise the cluster. CAA provides the tools and monitoring capabilities for the detection of node and interface health.

File sets: CAA is provided by the non-PowerHA file sets bos.cluster.rte, bos.ahafs, and bos.cluster.solid. The file sets are on the AIX installation media or in TL6 of AIX 6.1.

More information: For more information about CAA, see Cluster Management, SC23-6779, and the IBM AIX Version 7.1 Differences Guide, SG24-7910.

CAA provides a set of tools and APIs to enable clustering on the AIX operating system. CAA does not provide the application monitoring and resource failover capabilities that PowerHA provides; PowerHA uses the CAA capabilities. Other applications and software programs can use the APIs and command-line interfaces (CLIs) that CAA provides to make their applications and services “Cluster Aware” on the AIX operating system. Figure 1-2 on page 4 illustrates how applications can use CAA.

The following products and parties can use CAA technology:

- RSCT (3.1 and later)
- PowerHA (7.1 and later)
- VIOS (CAA support in a future release)
- Third-party ISVs, service providers, and software products

CAA provides the following features, among others:

- Central repository
  – Configuration
  – Security
- Quorumless operation (CAA does not require a quorum to be up and operational.)
- Monitoring capabilities for custom actions
- Fencing aids
  – Network
  – Storage
  – Applications

The following sections explain the concepts of the CAA central repository, RSCT changes, and how PowerHA 7.1 uses CAA.
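CAA also ships its own command-line interface, which is described in Appendix A, “CAA cluster commands”. As an illustration only (the output depends entirely on your cluster configuration), the lscluster command can be run on any node of a configured cluster for quick health checks:

# Show the cluster configuration from the local node's point of view
lscluster -c
# List all cluster nodes and their state as seen by CAA
lscluster -m
# List the network interfaces that CAA monitors
lscluster -i
# List the cluster storage interfaces, including the repository disk
lscluster -d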
chile:/ # lssrc -g caa Subsystem Group clcomd caa cld caa solid caa clconfd caa solidhac caa PID 4849670 7012500 11010276 7340038 10027064 Status active active active active active Figure 1-5 CAA services CAA includes the following services: clcomd This daemon is the cluster communications daemon, which has changed in PowerHA 7.1. In previous versions of PowerHA, it was called clcomdES. The location of the rhosts file that PowerHA uses has also changed. The rhosts file used by the clcomd service is in the /etc/cluster/rhosts directory. The old clcomdES rhosts file in the /usr/es/sbin/cluster/etc directory is not used. cld The cld daemon runs on each node and determines whether the local node must be the primary or the secondary solidDB® database server. solid The solid subsystem provides the database engine, and solidHAC is used for high availability of the IBM solidDB database. Both run on the primary and the secondary database servers. In a two-node cluster, the primary database is mounted on node 1 (/clrepos_private1), and the secondary database is mounted on node 2 (/clrepos_private2). These nodes have the solid and solidHAC subsystems running. In a three-node cluster configuration, the third node acts as a standby for the other two nodes. The solid subsystem (solid and solidHAC) is not running, and the file systems (/clrepos_private1 and /clrepos_private2) are not mounted. If a failure occurs on the primary or secondary nodes of the cluster, the third node activates the solid subsystem. It mounts either the primary or secondary file system, depending on the node that has failed. See 1.2.3, “The central repository” on page 9, for information about file systems. clconfd The clconfd subsystem runs on each node of the cluster. The clconfd daemon wakes up every 10 minutes to synchronize any necessary cluster changes. 1.2.2 RSCT changes IBM PowerHA now uses CAA, instead of RSCT, to handle the cluster topology, including heartbeating, configuration information, and live notification events. PowerHA still communicates with RSCT Group Services (grpsvcs replaced by cthags), but PowerHA has replaced the topsvcs function with the new CAA function. CAA reports the status of the 8 IBM PowerHA SystemMirror 7.1 for AIX topology to cthags, by using Autonomic Health Advisory File System API (AHAFS) events, which are fed up to cthagsrhosts. For information about the RSCT changes, see 1.1.2, “Architecture changes for RSCT 3.1” on page 3. 1.2.3 The central repository A major part of CAA is the central repository. The central repository is stored on a dedicated storage area network (SAN) disk that is shared between all participating nodes. This repository contains the following structures: Bootstrap repository (BSR) LV1, LV2, LV3 (private LVs) solidDB (primary location (/clrepos_private1) and secondary location (/clrepos_private2)) CAA repository disk: The CAA repository disk is reserved for use by CAA only. Do not attempt to change any of it. The information in this chapter is provided for information only to help you understand the purpose of the new disk and file system structure. Figure 1-6 shows an overview of the CAA repository disk and its structure. Figure 1-6 Cluster repository disk structure If you installed and configured PowerHA 7.1, your cluster repository disk is displayed as varied on (active) in lspv output as shown in Figure 1-7 on page 10. In this figure, the disk label has changed to caa_private0 to remind you that this disk is for private use by CAA only. 
Figure 1-7 on page 10 also shows a volume group, called caavg_private, which must always be varied on (active) when CAA is running. CAA is activated when PowerHA 7.1 is installed Chapter 1. PowerHA SystemMirror architecture foundation 9 and configured. If you are performing a migration or have an earlier level of PowerHA installed, CAA is not active. If you have a configured cluster and find that caavg_private is not varied on (active), your CAA cluster has a potential problem. See Chapter 10, “Troubleshooting PowerHA 7.1” on page 305, for guidance about recovery in this situation. chile:/ # lspv hdisk1 caa_private0 hdisk3 hdisk4 hdisk5 hdisk6 hdisk7 hdisk8 hdisk0 000fe4114cf8d1ce 000fe40163c54011 000fe4114cf8d2ec 000fe4114cf8d3a1 000fe4114cf8d441 000fe4114cf8d4d5 000fe4114cf8d579 000fe4114cf8d608 000fe40140a5516a None caavg_private None diskhb None None None ny_datavg rootvg active active Figure 1-7 lspv command showing the caa_private repository disk You can view the structure of caavg_private from the standpoint of a Logical Volume Manager (LVM) as shown in Figure 1-8. The lsvg command shows the structure of the file system. chile:/ # lsvg -l caavg_private caavg_private: LV NAME TYPE LPs caalv_private1 boot 1 caalv_private2 boot 1 caalv_private3 boot 4 fslv00 jfs2 4 /clrepos_private1 fslv01 jfs2 4 /clrepos_private2 powerha_crlv boot 1 PPs 1 1 4 4 PVs 1 1 1 1 LV STATE closed/syncd closed/syncd open/syncd open/syncd 4 1 closed/syncd 1 1 closed/syncd MOUNT POINT N/A N/A N/A N/A Figure 1-8 The lsvg output of CAA This file system has a special reserved structure. CAA mounts some file systems for its own use as shown in Figure 1-9 on page 11. The fslv00 file system contains the solidDB database mounted as /clrepos_private1 because the node is the primary node of the cluster. If you look at the output for the second node, you might have /clrepos_private2 mounted instead of /clrepos_private1. See 1.2.1, “CAA daemons” on page 8, for an explanation of the solid subsystem. Important: CAA creates a file system for solidDB on the default lv name (fslv00, fslv01). If you have a default name of lv for existing file systems that is outside of CAA, ensure that both nodes have the same lv names. For example, if node A has the names fslv00, fslv01, and fslv02, node B must have the same names. You must not have any default lv names in your cluster nodes so that CAA can use fslv00, fslv01 for the solidDB. Also a /aha, which is a special pseudo file system, is mounted in memory and used by the AHAFS. See “Autonomic Health Advisor File System” on page 11 for more information. 10 IBM PowerHA SystemMirror 7.1 for AIX Important: Do not interfere with this volume group and its file systems. For example, forcing a umount of /aha on a working cluster causes the node to halt. For more information about CAA, see Cluster Management, SC23-6779, at the following web address: http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/c lusteraware_pdf.pdf 1.2.4 Cluster event management With PowerHA 7.1, event management is handled by using a new pseudo file-system architecture called the Autonomic Health Advisor File System. With this pseudo file system, application programming interfaces (APIs) can program the monitoring of events by reading and writing events to the file system. Autonomic Health Advisor File System The AHAFS is part of the AIX event infrastructure for AIX and AIX clusters and is what CAA uses as its monitoring framework. 
The AHAFS file system is automatically mounted when you create the cluster (Figure 1-9). chile:/ # mount node mounted -------- --------------/dev/hd4 /dev/hd2 /dev/hd9var /dev/hd3 /dev/hd1 /dev/hd11admin /proc /dev/hd10opt /dev/livedump rw,log=/dev/hd8 /aha /dev/fslv00 rw,dio,log=INLINE mounted over vfs date options --------------- ------ ------------ --------------/ jfs2 Sep 30 13:37 rw,log=/dev/hd8 /usr jfs2 Sep 30 13:37 rw,log=/dev/hd8 /var jfs2 Sep 30 13:37 rw,log=/dev/hd8 /tmp jfs2 Sep 30 13:37 rw,log=/dev/hd8 /home jfs2 Sep 30 13:38 rw,log=/dev/hd8 /admin jfs2 Sep 30 13:38 rw,log=/dev/hd8 /proc procfs Sep 30 13:38 rw /opt jfs2 Sep 30 13:38 rw,log=/dev/hd8 /var/adm/ras/livedump jfs2 Sep 30 13:38 /aha ahafs Sep 30 13:46 rw /clrepos_private1 jfs2 Sep 30 13:52 Figure 1-9 AHAFS file system mounted Event handling entails the following process: 1. Create a monitor file based on the /aha directory. 2. Write the required information to the monitor file to represent the wait type (either a select() call or a blocking read() call). Indicate when to trigger the event, such as a state change of node down. 3. Wait in a select() call or a blocking read() call. 4. Read from the monitor file to obtain the event data. The event data is then fed to Group Services. The event information is retrieved from CAA, and any changes are communicated by using AHAFS events. RSCT Group Services uses the AHAFS services to obtain events on the Chapter 1. PowerHA SystemMirror architecture foundation 11 cluster. This information is provided by cluster query APIs and is fed to Group Services. Figure 1-10 shows a list of event monitor directories. drwxrwxrwt 1 root drwxrwxrwt 1 root drwxrwxrwt 1 root drwxrwxrwt 1 root drwxrwxrwt 1 root drwxrwxrwt 1 root chile:/aha/cluster # system system system system system system 0 1 1 0 1 1 Oct Oct Oct Oct Oct Oct 1 1 1 1 1 1 17:04 17:04 17:04 17:04 17:04 17:04 linkedCl.monFactory networkAdapterState.monFactory nodeAddress.monFactory nodeContact.monFactory nodeList.monFactory nodeState.monFactory Figure 1-10 Directory listing of /aha/cluster The AHAFS files used in RSCT The following AHAFS event files are used in RSCT: Node state, such as NODE_UP or NODE_DOWN /aha/cluster/nodeState.monFactory/nodeStateEvent.mon Node configuration, such as node added or deleted /aha/cluster/nodeList.monFactory/nodeListEvent.mon Adapter state, such as ADAPTER_UP or ADAPTER_DOWN and interfaces added or deleted /aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon Adapter configuration /aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon Process exit (Group Services daemon), such as PROCESS_DOWN /aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon Example of a NODE_DOWN event A NODE_DOWN event is written to the nodeStateEvent.mon file in the nodeState.monFactory directory. A NODE_DOWN event from the nodeStateEvent.mon file is interpreted as “a given node has failed.” In this situation, the High Availability Topology Services (HATS) API generates an Hb_Death event on the node group. Example of a network ADAPTER_DOWN event If a network adapter failure occurs, an ADAPTER_DOWN event is generated in the networkAdapterStateEvent.mon file. This event is interpreted as “a given network interface has failed.” In this situation, the HATS API generates an Hb_Death event on the adapter group. Example of Group Services daemon failure When you get a PROCESS_DOWN event because of a failure in Group Services, the event is generated in the hagsd.mon file. 
This event is treated as a NODE_DOWN event, which is similar to pre-CAA behavior. No PROCESS_UP event exists because, when the new Group Services daemon is started, it broadcasts a message to its peer daemons.

Filtering duplicated or invalid events
AHAFS handles duplicate or invalid events. For example, if a NODE_DOWN event is generated for a node that is already marked as down, the event is ignored. The same applies for "up" events and adapter events. Node events for local nodes are also ignored.

1.3 Cluster communication
Cluster Aware AIX indicates which nodes are in the cluster and provides information about these nodes, including their state. A special "gossip" protocol is used over the multicast address to determine node information and implement scalable, reliable multicast. No traditional heartbeat mechanism is employed. Gossip packets travel over all interfaces. The communication interfaces can be traditional networking interfaces (such as Ethernet) and storage fabrics (SANs with Fibre Channel, SAS, and so on). The cluster repository disk can also be used as a communication device.

Gossip protocol: The gossip protocol determines the node configuration and then transmits the gossip packets over all available networking and storage communication interfaces. If no storage communication interfaces are configured, only the traditional networking interfaces are used. For more information, see "Cluster Aware concepts" at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/claware_concepts.htm

1.3.1 Communication interfaces
The highly available cluster has several communication mechanisms. This section explains the following interface concepts:
IP network interfaces
SAN-based communication (SFWCOM) interface
Central cluster repository-based communication (DPCOM) interface
Output of the lscluster -i command
The RESTRICTED and AIX_CONTROLLED interface state
Point of contact

IP network interfaces
IBM PowerHA communicates over the available IP interfaces by using a multicast address. PowerHA uses all IP interfaces that are configured with an address and are in an UP state, as long as they are reachable across the cluster.

PowerHA SystemMirror management interfaces: PowerHA SystemMirror and Cluster Aware AIX use all network interfaces that are available for cluster communication. All of these interfaces are discovered by default and are used for health management and other cluster communication. You can use the PowerHA SystemMirror management interfaces to remove any interface that you do not want to be used for application availability. For additional information, see the "Cluster communication" topic in the AIX 7.1 Information Center at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.clusteraware/claware_comm_benifits.htm

Cluster communication requires the use of a multicast IP address. You can specify this address when you create the cluster, or you can have one generated automatically when you synchronize the initial cluster configuration.

Cluster topology configuration on the sydney node: The following PowerHA cluster topology is configured by using smitty sysmirror on the sydney node:
NODE perth:
Network ether01
perthb2 192.168.201.136
perth 192.168.101.136
NODE sydney:
Network ether01
sydneyb2 192.168.201.135
sydney 192.168.101.135

A default multicast address of 228.168.101.135 is generated for the cluster.
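Before relying on the generated multicast address, it can be worth confirming that multicast traffic actually flows between the nodes, because switches and routers sometimes block it (for example, through IGMP snooping settings). The following lines are only a sketch that uses the mping utility shipped with AIX; the address and node names are taken from this example topology, and the exact options can vary by AIX level, so check the mping usage text on your systems first.

sydney:/ # mping -v -r -a 228.168.101.135     (receiver: listen for test packets on the cluster multicast group)
perth:/ # mping -v -s -a 228.168.101.135      (sender: transmit test packets to the same group)

If the receiver reports the incoming packets, multicast forwarding between the nodes works. If nothing arrives, ask the network administrator to verify IGMP and multicast forwarding in the VLAN before you create the cluster.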
The default address is derived from the node IP address: PowerHA takes the IP address of the node and changes its most significant octet to 228, as shown in the following example:
x.y.z.t -> 228.y.z.t

An overlap of the multicast addresses might be generated by default in the case of two clusters with interfaces in the same virtual LAN (VLAN). This overlap occurs when their IP addresses are similar to the following example:
x1.y.z.t
x2.y.z.t

The netmon.cf configuration file is not required with CAA and PowerHA 7.1.

The range 224.0.0.0–224.0.0.255 is reserved for local purposes, such as administrative and maintenance tasks. Data sent to these groups is never forwarded by multicast routers. Similarly, the range 239.0.0.0–239.255.255.255 is reserved for administrative scoping. These special multicast groups are regularly published in the assigned numbers RFC (http://tools.ietf.org/html/rfc3171).

If multicast traffic is present in the adjacent network, you must ask the network administrator for a multicast IP address allocation for your cluster. Also, ensure that the multicast traffic generated by any of the cluster nodes is properly forwarded by the network infrastructure toward the other cluster nodes. The Internet Group Management Protocol (IGMP) must be enabled.

Interface states
Network interfaces can have any of the following common states. You can see the interface state in the output of the lscluster -i command, as shown in Example 1-2 on page 16.
UP The interface is up and active.
STALE The interface configuration data is stale, which happens when communication has been lost, but the interface was previously up at some point.
DOWN SOURCE HARDWARE RECEIVE / SOURCE HARDWARE TRANSMIT The interface is down because of a failure to receive or transmit, which can happen in the event of a cabling problem.
DOWN SOURCE SOFTWARE The interface is down in AIX software only.

SAN-based communication (SFWCOM) interface
Redundant high-speed communication channels can be established between the hosts through the SAN fabric. To use this communication path, you must complete additional setup for the Fibre Channel (FC) adapters. Configure the server FC ports in the same zone of the SAN fabric, and set their Target Mode Enable (tme) attribute to yes. Then enable dynamic tracking and fast fail. The SAS adapters do not require special setup. Based on this setup, the CAA Storage Framework provides a SAN-based heartbeat. This heartbeat is an effective replacement for all the non-IP heartbeat mechanisms used in earlier releases.

Enabling SAN fiber communication: To enable SAN fiber communication for cluster communication, you must configure the Target Mode Enable attribute for the FC adapters. See Example 4-4 on page 57 for details, and see the sketch at the end of this section for a summary of the idea.

Configure your cluster in an environment that supports SAN fabric-based communication. This approach provides another channel of redundancy to help reduce the risk of getting a partitioned (split) cluster. The Virtual SCSI (VSCSI) SAN heartbeat depends on VIOS 2.2.0.11-FP24 SP01.

Interface state
The SAN-based communication (SFWCOM) interface has only one state available, the UP state. The UP state indicates that the SFWCOM interface is active. You can see the interface state in the output of the lscluster -i command, as shown in Example 1-2 on page 16.

Unavailable SAN fiber communication: When SAN fiber communication is unavailable, the SFWCOM section is not listed in the output of the lscluster -i command. A DOWN state is not shown.
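The detailed procedure for enabling the SAN-based channel is shown in Example 4-4 on page 57. The following lines are only a rough sketch of the idea, assuming a physical FC adapter fcs0 with its protocol device fscsi0; attribute names, device numbers, and the need for a reboot can differ by adapter type and AIX level, so verify them against the product documentation before you apply them.

# chdev -l fcs0 -a tme=yes -P                                 (enable target mode; -P defers the change to the next boot)
# chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P  (enable dynamic tracking and fast fail)
# shutdown -Fr                                                (reboot so that the deferred attribute changes take effect)
# lsdev -C | grep sfwcomm                                     (after the cluster is created, the sfwcomm devices are expected to show as Available)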
Central cluster repository-based communication (DPCOM) interface Heartbeating and other cluster messaging are also achieved through the central repository disk. The repository disk is used as another redundant path of communication between the nodes. A portion of the repository disk is reserved for node-to-node heartbeat and message communication. This form of communication is used when all other forms of communication have failed. The CAA Storage Framework provides a heartbeat through the repository disk, which is only used when IP or SAN heartbeating no longer works. When the underlying hardware infrastructure is available, you can proceed with the PowerHA cluster topology configuration. The heartbeat starts right after the first successful “Verify and Synchronization” operation, when the CAA cluster is created and activated by the PowerHA. Interface states The Central cluster repository-based communication (DPCOM) interface has the following available states. You can see the interface state in the output of the lscluster -i command, which is shown in Example 1-2. UP AIX_CONTROLLED Indicates that the interface is UP, but under AIX control. The user cannot change the status of this interface. UP RESTRICTED AIXCONTROLLED Indicates that the interface is UP and under AIX system control, but is RESTRICTED from monitoring mode. STALE The interface configuration data is stale. This state occurs when communication is lost, but was up previously at some point. Chapter 1. PowerHA SystemMirror architecture foundation 15 Output of the lscluster -i command Example 1-2 shows the output from the lscluster -i command. The output shows the interfaces and the interface states as explained in the previous sections. Example 1-2 The lscluster -i output for one node lscluster -i Network/Storage Interface Query Cluster Name: au_cl Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a Number of nodes reporting = 2 Number of nodes expected = 2 Node sydney Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9a Smoothed rrt across interface = 8 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 110 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmas k 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm ask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9b Smoothed rrt across interface = 8 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 110 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmas k 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm ask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 0 Mean Deviation in network rrt across interface = 0 16 IBM PowerHA SystemMirror 7.1 for AIX Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags 
for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node perth Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182 Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d9 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmas k 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm ask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d8 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmas k 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netm ask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Chapter 1. PowerHA SystemMirror architecture foundation 17 Smoothed rrt across interface = 0 Mean Deviation in network rrt across interface = 0 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED The RESTRICTED and AIX_CONTROLLED interface state When the network and storage interfaces in the cluster are active and available, the cluster repository disk appears as restricted and controlled by AIX. (The restricted term identifies the disk as “not currently used.”) In the output from the lscluster commands, the term dpcom is used for the cluster repository disk as a communication device and is initially noted as UP RESTRICTED AIX_CONTROLLED. When the system determines that the node has lost the normal network or storage interfaces, the system activates (unrestrict) the cluster repository disk interface (dpcom) and begins using it for communications. At this point, the interface state changes to UP AIX_CONTROLLED (unrestricted, but still system controlled). Point of contact The output of the lscluster -m command shows a reference to a point of contact as shown in Example 1-3 on page 19. The local node is displayed as N/A, and the remote node is displayed as en0 UP. CAA monitors the state and points of contact between the nodes for both communication interfaces. 
A point of contact indicates that a node has received a packet from the other node over the interface. The point-of-contact status UP indicates that the packet flow is continuing. The point-of-contact monitor tracks the number of UP points of contact for each communication interface on the node. If this count reaches zero, the interface is marked as reachable through the cluster repository disk only. 1.3.2 Communication node status The node communication status is indicated by the State of Node value in the lscluster -m command output (Example 1-3 on page 19). The cluster node can have the following communication states: UP Indicates that the node is up. UP NODE_LOCAL Indicates that the node is up and is the local node in the cluster. UP NODE_LOCAL REACHABLE THROUGH REPOS DISK ONLY Indicates that the local node is up, but that it is reachable through the repository disk only. 18 IBM PowerHA SystemMirror 7.1 for AIX When a node can only communicate by using the cluster repository disk, the output from the lscluster command notes it as REACHABLE THROUGH REPOS DISK ONLY. When the normal network or storage interfaces become available again, the system automatically detects the restoration of communication interfaces, and again places dpcom in the restricted state. See “The RESTRICTED and AIX_CONTROLLED interface state” on page 18. UP REACHABLE THROUGH REPOS DISK ONLY Indicates that the local node is up. It is reachable through the repository disk only, but not through a local node. DOWN Indicates that the node is down. If the node does not have access to the cluster repository disk, the node is marked as down. Example 1-3 The lscluster -m output lscluster -m Calling node query for all nodes Node query number of nodes examined: 2 Node name: chile Cluster shorthand id for node: 1 uuid for node: 7067c3fa-ca95-11df-869b-a2e310452004 State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID newyork local 5f2f5d38-cd78-11df-b986-a2e310452003 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a -----------------------------Node name: serbia Cluster shorthand id for node: 2 uuid for node: 8a5e2768-ca95-11df-8775-a2e312537404 State of node: UP Smoothed rtt to node: 7 Mean Deviation in network rtt to node: 3 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID newyork local 5f2f5d38-cd78-11df-b986-a2e310452003 Number of points_of_contact for node: 1 Point-of-contact interface & contact state en0 UP Chapter 1. PowerHA SystemMirror architecture foundation 19 Interface up, point of contact down: This phrase means that an interface might be up but a point-of-contact might be down. In this state, no packets are received from the other node. 1.3.3 Considerations for the heartbeat configuration In previous versions of PowerHA, the heartbeat configuration was necessary to configure a non-IP heartbeat configuration, such as disk-based heartbeating. PowerHA 7.1 no longer supports disk-based heartbeat monitoring. CAA uses all available interfaces to perform heartbeat monitoring, including the Repository Disk-based and SAN fiber heartbeat monitoring methods. Both types of heartbeat monitoring are replacements for the previous non-IP heartbeat configuration. 
The cluster also performs heartbeat monitoring, similar to how it use to perform it, across all available network interfaces. Heartbeat monitoring is performed by sending and receiving gossip packets across the network with the multicast protocol. CAA uses heartbeat monitoring to determine communication problems that need to be reflected in the cluster information. 1.3.4 Deciding when a node is down: Round-trip time (rtt) CAA monitors the interfaces of each node by using the multicast protocol and gossip packets. Gossip packets are periodically sent from each node in the cluster for timing purposes. These gossip packets are automatically replied to by the other nodes of the cluster. The packet exchanges are used to calculate the round-trip time. The round-trip time value is shown in the output of the lscluster -i and lscluster -m commands. The mean deviation in network rtt is the average round-trip time, which is automatically managed by CAA. Unlike previous versions of PowerHA and HACMP, no heartbeat tuning is necessary. See Example 1-2 on page 16 and Figure 1-11 for more information. Smoothed rtt to node:7 Mean Deviation in network rtt to node: 3 Figure 1-11 Extract from the lscluster -m command output showing the rtt values Statistical projections are directly employed to compute node-down events. By using normal network dropped packet rates and the projected round-trip times with mean deviations, the cluster can determine when a packet was lost or not sent. Each node monitors the time when a response is due from other nodes in the cluster. If a node finds that a node is overdue, a node down protocol is initiated in the cluster to determine if the node is down or if network isolation has occurred. This algorithm is self-adjusting to load and network conditions, providing a highly reliable and scalable cluster. Expected round-trip times and variances rise quickly when load conditions cause delays. Such delays cause the system to wait longer before setting a node down state. Such a state provides for a high probability of valid state information. (Quantitative probabilities of errors can be computed.) Conversely, expected round-trip times and variances fall quickly when delays return to normal. The cluster automatically adjusts to variances in latency and bandwidth characteristics of various network and storage interfaces. 20 IBM PowerHA SystemMirror 7.1 for AIX 1.4 PowerHA 7.1 SystemMirror plug-in for IBM Systems Director PowerHA SystemMirror provides a plug-in to IBM Systems Director, giving you a graphical user interface to manage a cluster. This topic includes the following sections: Introduction to IBM Systems Director Advantages of using IBM Systems Director Basic architecture 1.4.1 Introduction to IBM Systems Director IBM Systems Director provides systems management personnel with a single-point-of-control, helping to reduce IT management complexity and cost. With IBM Systems Director, IT personnel can perform the following tasks: Optimize computing and network resources Quickly respond to business requirements with greater delivery flexibility Attain higher levels of services management with streamlined management of physical, virtual, storage, and network resources A key feature of IBM Systems Director is a consistent user interface with a focus on driving common management tasks. IBM Systems Director provides a unified view of the total IT environment, including servers, storage, and network. With this view, users can perform tasks with a single tool, IBM Systems Director. 
1.4.2 Advantages of using IBM Systems Director IBM Systems Director offers the following advantages: A single, centralized view into all PowerHA SystemMirror clusters – Centralized and secure access point Everyone logs in to the same machine, simplifying security and providing an audit trail of user activities. – Single sign-on (SSO) capability After the initial setup is done, using standard Director mechanisms, the password of each individual node being managed no longer must be provided. Customers log in to the Director server by using their account on that one machine only and have access to all PowerHA clusters under management by that server. Two highly accessible interfaces – Graphical interface The GUI helps to explain and show relationships. It also guides customers through the learning phase, improving their chances of success with Systems Director. • • • • Instant and nearly instant help for just about everything Maximum, interactive assistance with many tasks Maximum error checking SystemMirror enterprise health summary Chapter 1. PowerHA SystemMirror architecture foundation 21 – Textual interface As with all IBM Systems Director plug-ins, the textual interface (also known as the CLI) is available through the smcli utility of IBM Systems Director. The namespace (which is not needed) is sysmirror, for example smcli sysmirror help. • • Maximum speed Centralized, cross-cluster scripting A common, IBM unified interface (learn once, manage many) More IBM products are now plugging into Systems Director. Although each individual plug-in is different, the common framework around each on remains the same, reducing the education burden of customers. Another benefit is in the synergies that might be used by having multiple products all sharing a common data store on the IBM Systems Director server. To learn more about the advantages of IBM Systems Director, see the PowerHA 7.1 presentation by Peter Schenke at: http://www-05.ibm.com/ch/events/systems/pdf/6_PowerHA_7_1_News.pdf 1.4.3 Basic architecture Figure 1-12 shows the basic architecture of IBM Systems Director for PowerHA. IBM Systems Director is used to quickly and easily scan subnets to find and load AIX systems. When these systems are unlocked (when the login ID and password are provided), and if PowerHA is installed on any of these systems, they are automatically discovered and loaded by the plug-ins. Three-tier architecture provides scalability: User Interface Management Server Director Agent User Interface Director Agent Web-based interface Command-line interface Automatically installed on AIX 7.1 & AIX V6.1 TL06 AIX P D P D P D P D PowerHA Director Agent Secure communication P D P D P D Director Server Discovery of clusters and resources Figure 1-12 Basic architecture of IBM Systems Director for PowerHA 22 IBM PowerHA SystemMirror 7.1 for AIX Central point of control Supported on AIX, Linux, and Windows Agent manager 2 Chapter 2. Features of PowerHA SystemMirror 7.1 This chapter explains which previously supported features of PowerHA SystemMirror have been removed. It also provides information about the new features in PowerHA SystemMirror Standard Edition 7.1 for AIX. This chapter includes the following topics: Deprecated features New features Changes to the SMIT panel The rootvg system event Resource management enhancements CLUSTER_OVERRIDE environment variable CAA disk fencing PowerHA SystemMirror event flow differences © Copyright IBM Corp. 2011. All rights reserved. 
23 2.1 Deprecated features PowerHA SystemMirror 7.1 has removed support for the following previously available features: IP address takeover (IPAT) via IP replacement Locally administered address (LAA) for hardware MAC address takeover (HWAT) Heartbeat over IP aliases clcomdES with the /usr/es/sbin/cluster/etc/rhosts directory is replaced by the Cluster Aware AIX (CAA) clcomd with the /etc/cluster/rhosts directory The following IP network types: – ATM – FDDI – Token Ring The following point-to-point (non-IP) network types: – – – – – RS232 TMSCSI TMSSA Disk heartbeat (diskhb) Multi-node disk heartbeat (mndhb) Two-node configuration assistant WebSMIT (replaced with the IBM Systems Director plug-in) Site support in this version – Cross-site Logical Volume Manager (LVM) mirroring (available in PowerHA 7.1 SP3) IPV6 support in this version IP address takeover via IP aliasing is now the only supported IPAT option. SAN heartbeat, provided by the CAA repository disk, and FC heartbeat, as described in the following section, have replaced all point-to-point (non-IP) network types. 2.2 New features The new version of PowerHA uses much simpler heartbeat management. This method uses multicasting, which reduces the burden on the customer to define aliases for heartbeat monitoring. By default, it supports dual communication paths for most data center deployments by using both the IP network and the SAN connections (available in 7.1 SP3 and later). These communication paths are done through the CAA and the central repository disk. PowerHA SystemMirror 7.1 introduces the following features: SMIT panel enhancements The rootvg system event Systems Director plug-in Resource management enhancements – StartAfter – StopAfter User-defined resource type 24 IBM PowerHA SystemMirror 7.1 for AIX Dynamic node priority: Adaptive failover Additional disk fencing by CAA New Smart Assists for the following products: – – – – – SAP NetWeaver 7.0 (2004s) SR3 IBM FileNet® 4.5.1 IBM Tivoli Storage Manager 6.1 IBM Lotus® Domino® Server SAP MaxDB v7.6 and 7.7 The clmgr tool The clmgr tool is the new command-line user interface (CLI) with which an administrator can use a uniform interface to deploy and maintain clusters. For more information, see 5.2, “Cluster configuration using the clmgr tool” on page 104. 2.3 Changes to the SMIT panel PowerHA SystemMirror 7.1 includes several changes to the SMIT panel since the release of PowerHA 6.1. This topic focuses on the most used items on the panel and not the technical changes behind these items. These changes can help experienced system administrators to quickly find the paths to the functions they need to implement in their new clusters. In PowerHA SystemMirror 7.1, the SMIT panel has the following key changes: Separation of menus by function Addition of the Custom Cluster Configuration menu Removal of Extended Distance menus from the base product Removal of unsupported dialogs or menus Changes to some terminology New dialog for specifying repository and cluster IP address Many changes in topology and resource menus 2.3.1 SMIT tree The SMIT tree offers several changes that make it easier for system administrators to find the task they want to perform. For an overview of these changes, see Appendix B, “PowerHA SMIT tree” on page 483. To access a list of the SMIT tree and available fast paths, use the smitty path: smitty hacmp Can't find what you are looking for ?. Chapter 2. 
Features of PowerHA SystemMirror 7.1 25 2.3.2 The smitty hacmp command Figure 2-1 shows the SMIT screens that you see when you use the smitty hacmp command or the path: smitty Communications Applications and Services PowerHA SystemMirror. It compares PowerHA 5.5, PowerHA 6.1, and PowerHA SystemMirror 7.1. Figure 2-1 The screens shown after running the smitty hacmp command In PowerHA SystemMirror 7.1, the smitty sysmirror (or smit sysmirror) command provides a new fast path to the PowerHA start menu in SMIT. The old fast path (smitty hacmp) is still valid. 26 IBM PowerHA SystemMirror 7.1 for AIX Figure 2-2 shows, in more detail, where some of the main functions moved to. Minor changes have been made to the following paths, which are not covered in this Redbooks publication: System Management (C-SPOC) Problem Determination Tools Can’t find what you are looking for ? Not sure where to start ? The “Initialization and Standard Configuration” path has been split into two paths: Cluster Nodes and Networks and Cluster Applications and Resources. For more details about these paths, see 2.3.4, “Cluster Standard Configuration menu” on page 29. Some features for the Extended Configuration menu have moved to the Custom Cluster Configuration menu. For more details about custom configuration, see 2.3.5, “Custom Cluster Configuration menu” on page 30. smitty sysmirror Figure 2-2 PowerHA SMIT start panel Chapter 2. Features of PowerHA SystemMirror 7.1 27 2.3.3 The smitty clstart and smitty clstop commands The SMIT screens to start and stop a cluster did not change, and the fast path is still the same. Figure 2-3 shows the Start Cluster Services panels for PowerHA versions 5.5, 6.1, and 7.1. Although the SMIT path did not change, some of the wording has changed. For example, the word “HACMP” was replaced with “Cluster Services.” The path with the new wording is smitty hacmp System Management (C-SPOC) PowerHA SystemMirror Services, and then you select either the “Start Cluster Services” or “Stop Cluster Services” menu. Figure 2-3 The screens that are shown when running the smitty clstart command 28 IBM PowerHA SystemMirror 7.1 for AIX 2.3.4 Cluster Standard Configuration menu In previous versions, the “Cluster Standard Configuration” menu was called the “Initialization and Standard Configuration” menu. This menu is now split into the following menu options as indicated in Figure 2-2 on page 27: Cluster Nodes and Networks Cluster Applications and Resources This version has a more logical flow. The topology configuration and management part is in the “Cluster Nodes and Networks” menu. The resources configuration and management part is in the “Cluster Applications and Resources” menu. Figure 2-4 shows some tasks and where they have moved to. The dotted line shows where Smart Assist was relocated. The Two-Node Cluster Configuration Assistant no longer exists. Figure 2-4 Cluster standard configuration Chapter 2. Features of PowerHA SystemMirror 7.1 29 2.3.5 Custom Cluster Configuration menu The “Custom Cluster Configuration” menu is similar to the “Extended Configuration” menu in the previous release. Unlike the “Extended Configuration” menu, which contains entries that were duplicated from the standard menu path, the “Custom Cluster Configuration” menu in PowerHA SystemMirror 7.1 does not contain these duplicate entries. Figure 2-5 shows an overview of where some of the functions have moved to. 
The Custom Cluster Configuration menu is shown in the upper-right corner, and the main PowerHA SMIT menu is shown in the lower-right corner. Figure 2-5 Custom Cluster Configuration menu 30 IBM PowerHA SystemMirror 7.1 for AIX 2.3.6 Cluster Snapshot menu The content of the Cluster Snapshot menu did not change compared to PowerHA 6.1 (Figure 2-6). However, the path to this menu has changed to smitty sysmirror Cluster Nodes and Networks Manage the Cluster Snapshot Configuration. Snapshot Configuration Move cursor to desired item and press Enter. Create a Snapshot of the Cluster Configuration Change/Show a Snapshot of the Cluster Configuration Remove a Snapshot of the Cluster Configuration Restore the Cluster Configuration From a Snapshot Configure a Custom Snapshot Method Figure 2-6 Snapshot Configuration menu 2.3.7 Configure Persistent Node IP Label/Address menu The content of the SMIT panel to add or change a persistent IP address did not change compared to PowerHA 6.1 (Figure 2-7). However, the path to it changed to smitty hacmp Cluster Nodes and Networks Manage Nodes Configure Persistent Node IP Label/Addresses. Configure Persistent Node IP Label/Addresses Move cursor to desired item and press Enter. Add a Persistent Node IP Label/Address Change/Show a Persistent Node IP Label/Address Remove a Persistent Node IP Label/Address Figure 2-7 Configure Persistent Node IP Label/Addresses menu 2.4 The rootvg system event PowerHA SystemMirror 7.1 introduces system events. These events are handled by a new subsystem called clevmgrdES. The rootvg system event allows for the monitoring of loss of access to the rootvg volume group. By default, in the case of loss of access, the event logs an entry in the system error log and reboots the system. If required, you can change this option in the SMIT menu to log only an event entry and not to reboot the system. For further details about this event and a test example, see 9.4.1, “The rootvg system event” on page 286. Chapter 2. Features of PowerHA SystemMirror 7.1 31 2.5 Resource management enhancements PowerHA SystemMirror 7.1 offers the following new resource and resource group configuration choices. They provide more flexibility in administering resource groups across the various nodes in the cluster. Start After and Stop After resource group dependencies User-defined resource type Adaptive failover 2.5.1 Start After and Stop After resource group dependencies The previous version of PowerHA has the following types of resource group dependency runtime policies: Parent-child Online on the Same Node Online on Different Nodes Online On Same Site Location These policies are insufficient for supporting some complex applications. For example, the FileNet application server must be started only after its associated database is started. It does not need to be stopped if the database is brought down for some time and then started. The following dependencies have been added to PowerHA: Start After dependency Stop After dependency The Start After and Stop After dependencies use source and target resource group terminology. The source resource group depends on the target resource group as shown in Figure 2-8. db_rg Target Start After Source app_rg Figure 2-8 Start After resource group dependency 32 IBM PowerHA SystemMirror 7.1 for AIX For Start After dependency, the target resource group must be online on any node in the cluster before a source (dependent) resource group can be activated on a node. 
Resource groups can be released in parallel and without any dependency. Similarly, for a Stop After dependency, the target resource group must be offline on any node in the cluster before a source (dependent) resource group can be brought offline on a node. Resource groups are acquired in parallel and without any dependency.

A resource group can serve as both a target and a source resource group, depending on which end of a given dependency link it is placed. You can specify three levels of dependencies for resource groups. You cannot specify circular dependencies between resource groups.

A Start After dependency applies only at the time of resource group acquisition. During a resource group release, these resource groups do not have any dependencies. A Start After source resource group cannot be acquired on a node until its target resource group is fully functional. If the target resource group does not become fully functional, the source resource group goes into an OFFLINE DUE TO TARGET OFFLINE state. If you notice that a resource group is in this state, you might need to troubleshoot which resources must be brought online manually to resolve the resource group dependency. (The clRGinfo sketch at the end of this section shows one way to check the resource group state.)

When a resource group in a Start After target role falls over from one node to another, the resource groups that depend on it are unaffected. After the Start After source resource group is online, any operation (such as bring offline or move resource group) on the target resource group does not affect the source resource group. A manual resource group move or bring resource group online on the source resource group is not allowed if the target resource group is offline.

A Stop After dependency applies only at the time of a resource group release. During resource group acquisition, these resource groups have no dependency between them. A Stop After source resource group cannot be released on a node until its target resource group is offline. When a resource group in a Stop After source role falls over from one node to another, its related target resource group is released as a first step. Then the source (dependent) resource group is released. Next, both resource groups are acquired in parallel, assuming that no Start After or parent-child dependency exists between these resource groups. A manual resource group move or bring resource group offline on the Stop After source resource group is not allowed if the target resource group is online.

Summary: In summary, the Start After and Stop After source resource groups have the following dependencies on their targets:
Source Start After target: The source is brought online after the target resource group.
Source Stop After target: The source is brought offline after the target resource group.

A parent-child dependency can be seen as being composed of two parts with the newly introduced Start After and Stop After dependencies. Figure 2-9 shows this logical equivalence.

Figure 2-9 Comparing Start After, Stop After, and parent-child resource group (rg) dependencies

If you configure a Start After dependency between two resource groups in your cluster, the applications in these resource groups are started in the configured sequence. To ensure that this process goes smoothly, configure application monitors and use a Startup Monitoring mode for the application included in the target resource group. For a configuration example, see 5.1.6, "Configuring Start After and Stop After resource group dependencies" on page 96.
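One quick way to see whether a Start After source resource group is being held back by its target is to check the resource group states with the clRGinfo utility. The output below is abbreviated and purely illustrative (the group and node names are examples, and the real column layout differs slightly); the point is that a Start After source whose target is not yet online reports the OFFLINE DUE TO TARGET OFFLINE state.

# /usr/es/sbin/cluster/utilities/clRGinfo
Group Name      State                            Node
db_rg           OFFLINE                          node1
app_rg          OFFLINE DUE TO TARGET OFFLINE    node2

After db_rg (the target) is brought online, app_rg can be acquired normally.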
2.5.2 User-defined resource type With PowerHA, you can add your own resource types and specify management scripts to configure how and where PowerHA processes the resource type. You can then configure a user-defined resource instance for use in a resource group. A user-defined resource type is one that you can define for a customized resource that you can add to a resource group. A user-defined resource type contains several attributes that describe the properties of the instances of the resource type. When you create a user-defined resource type, you must choose processing order among existing resource types. PowerHA SystemMirror processes the user-defined resources at the beginning of the resource acquisition order if you choose the FIRST value. If you chose any other value, for example, VOLUME_GROUP, the user-defined resources are acquired after varying on the volume groups. Then they are released before varying off the volume groups. These resources are existing resource types. You can choose from a pick list in the SMIT menu. 34 IBM PowerHA SystemMirror 7.1 for AIX Figure 2-10 shows the existing resource type and acquisition or release order. A user-defined resource can be any of the following types: FIRST WPAR SERVICEIP TAPE (DISKS) VOLUME_GROUP FILE_SYSTEM APPLICATION DISK FILE SYSTEM Userdefined Resource Acquisition Order Release Order RSCT with CAA SERVICE IP APPLICATION Figure 2-10 Processing order of the resource type 2.5.3 Dynamic node priority: Adaptive failover The framework for dynamic node priority is already present in the previous versions of PowerHA. This framework determines the takeover node at the time of a failure according to one of the following policies: cl_highest_free_mem cl_highest_idle_cpu cl_lowest_disk_busy The cluster manager queries the Resource Monitoring and Control (RMC) subsystem every 3 minutes to obtain the current value of these attributes on each node. Then the cluster manager distributes them cluster-wide. For an architecture overview of PowerHA and RSCT, see 1.1.3, “PowerHA and RSCT” on page 5. Chapter 2. Features of PowerHA SystemMirror 7.1 35 The dynamic node priority feature is enhanced in PowerHA SystemMirror 7.1 to support the following policies: cl_lowest_nonzero_udscript_rc cl_highest_udscript_rc The return code of a user-defined script is used in determining the destination node. When you select one of the criteria, you must also provide values for the DNP script path and DNP timeout attributes for a resource group. PowerHA executes the supplied script and collects the return codes from all nodes. If you choose the cl_highest_udscript_rc policy, collected values are sorted. The node that returned the highest value is selected as a candidate node to fall over. Similarly, if you choose the cl_lowest_nonzero_udscript_rc policy, collected values are sorted. The node that returned lowest nonzero positive value is selected as a candidate takeover node. If the return value of the script from all nodes is the same or zero, the default node priority is considered. PowerHA verifies the script existence and the execution permissions during verification. Time-out value: When you select a time-out value, ensure that it is within the time period for running and completing a script. If you do not specify a time-out value, a default value equal to the config_too_long time is specified. For information about configuring the dynamic node priority, see 5.1.8, “Configuring the dynamic node priority (adaptive failover)” on page 102. 
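As a concrete illustration of the user-defined policies, the short ksh script that follows returns one plus the number of running instances of a hypothetical application process named appsrv; the script path and the process name are examples only and are not part of the product. Used with the cl_lowest_nonzero_udscript_rc policy, the node that reports the lowest value, that is, the node running the fewest instances, would be preferred as the takeover node. Because the value is passed back as an exit code, it must stay within the 0 - 255 range.

#!/usr/bin/ksh
# /usr/local/ha/dnp_app_load.sh  (hypothetical DNP script; set this path and a suitable
# timeout in the resource group attributes as described in 5.1.8)
# Count the appsrv processes currently running on this node.
count=$(ps -e -o comm | grep -c '^appsrv$')
# Return count + 1 so that the value is never zero, capped at 255 (the exit code limit).
rc=$((count + 1))
[ "$rc" -gt 255 ] && rc=255
exit $rc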
2.6 CLUSTER_OVERRIDE environment variable

In PowerHA SystemMirror 7.1, the use of several AIX commands on cluster resources can potentially impair the integrity of the cluster configuration. PowerHA SystemMirror 7.1 provides C-SPOC versions of these functions, which are safer to use in the cluster environment. The CLUSTER_OVERRIDE environment variable controls whether the underlying AIX commands can still be used outside of C-SPOC. By default, it is set to allow the use of these commands outside of C-SPOC. To restrict people from using these commands on the command line, change the default value from yes to no:
1. Locate the following line in the /etc/environment file:
CLUSTER_OVERRIDE=yes
2. Change the line to the following line:
CLUSTER_OVERRIDE=no

The following commands are affected by this variable:
chfs
crfs
chgroup
chlv
chpasswd
chuser
chvg
extendlv
extendvg
importvg
mirrorvg
mkgroup
mklv
mklvcopy
mkuser
mkvg
reducevg

If the CLUSTER_OVERRIDE variable has the value no, you see an error message similar to the one shown in Example 2-1.

Example 2-1 Error message when using CLUSTER_OVERRIDE=no
# chfs -a size=+1 /home
The command must be issued using C-SPOC or the override environment variable must be set.

In this case, use the equivalent C-SPOC CLI command called cli_chfs. See the C-SPOC man page for more details.

Deleting the CLUSTER_OVERRIDE variable: You also see the message shown in Example 2-1 if you delete the CLUSTER_OVERRIDE variable from your /etc/environment file.

2.7 CAA disk fencing

CAA introduces another level of disk fencing beyond what PowerHA and gsclvmd provide by using enhanced concurrent volume groups (ECVGs). In previous releases of PowerHA, when using ECVGs in fast disk takeover mode, the volume groups are in full read/write (active) mode on the node that owns the resource group. Any standby candidate node has the volume group varied on in read-only (passive) mode. The passive state allows only read access to a volume group special file and the first 4 KB of a logical volume. Write access through standard LVM is not allowed. However, low-level commands, such as dd, can bypass LVM and write directly to the disk.

The new CAA disk fencing feature prevents writes to the disk device from any other node, invalidating the potential for a lower-level operation, such as dd, to succeed. However, a system that has access to that disk might not be a member of the CAA cluster. Therefore, it is still important to zone the storage appropriately so that only cluster nodes have the disks configured.

The PowerHA SystemMirror 7.1 announcement letter explains this fencing feature as a storage framework that is embedded in the operating system to aid in storage device management. As part of the framework, fencing of disks or disk groups is supported. Fencing shuts off write access to the shared disks from any entity on the node (irrespective of the privileges associated with the entity trying to access the disk). Fencing is exploited by PowerHA SystemMirror to implement strict controls over shared disks so that they are accessed solely from one of the nodes that share the disk. Fencing ensures that, when the workload moves to another node to continue operations, access to the disks on the departing node is turned off for write operations.

2.8 PowerHA SystemMirror event flow differences

This section describes the flow of events when the PowerHA SystemMirror cluster starts, when another node joins the cluster, and when a node leaves the cluster.
2.8.1 Startup processing Start Cluste r servic es In this example, a resource group must be started on a node. The start server is not done until the necessary resources are acquired. Figure 2-11 illustrates the necessary steps to move the acquired resource groups during a node failure. 1)rg_move_acquire lls ca RC clstrmgrES Event Manager cal ls RC process_resources (NONE) for each RG: process_resources (ACQUIRE) process_resources (SERVICE_LABELS) acquire_service_addr acquire_aconn_service en0 net_ether_01 process_resources (DISKS) process_resources (VGS) process_resources (LOGREDO) process_resources (FILESYSTEMS) process_resources (SYNC_VGS) process_resources (TELINIT) process_resources (NONE) < Event Summary > 2) rg_move_complete for each RG: process resources (APPLICATIONS) start_server app01 process_resources (ONLINE) process_resources (NONE) < Event Summary > Figure 2-11 First node starting the cluster services TE_RG_MOVE_ACQUIRE is the SystemMirror event listed in the debug file. The /usr/es/sbin/cluster/events/rg_online.rp recovery program is listed in the HACMP rules Object Data Manager (ODM) file (Example 2-2). Example 2-2 The rg_online.rp file all "rg_move_fence" 0 NULL barrier # all "rg_move_acquire" 0 NULL # barrier # all "rg_move_complete" 0 NULL The following section explains what happens when a subsequent node joins the cluster. 38 IBM PowerHA SystemMirror 7.1 for AIX 2.8.2 Another node joins the cluster When another node starts, it must first join the cluster. If a resource group needs to fall back, then rg_move_release is done. If the resource group fallback is not needed, the rg_move_release is skipped. The numbers indicate the order of the steps. The same number means that parallel processing is taking place. Example 2-3 shows the messages on the process flow. 
Example 2-3 Debug file showing the process of another node joining the cluster
Debug file:
[TE_JOIN_NODE_DEP]
[TE_RG_MOVE_ACQUIRE]
[TE_JOIN_NODE_DEP_COMPLETE]
cluster.log file node1:
Nov 23 00:35:06 AIX: EVENT START: node_up node2
Nov 23 00:35:06 AIX: EVENT COMPLETED: node_up node2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_fence node1 2
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 00:35:11 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 00:35:15 AIX: EVENT START: rg_move_complete node1 2
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 00:35:18 AIX: EVENT START: node_up_complete node2
Nov 23 00:35:18 AIX: EVENT COMPLETED: node_up_complete node2 0
cluster.log file node2
Nov 23 00:35:06 AIX: EVENT START: node_up node2
Nov 23 00:35:08 AIX: EVENT COMPLETED: node_up node2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_fence node1 2
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 00:35:11 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 00:35:11 AIX: EVENT START: acquire_service_addr
Nov 23 00:35:13 AIX: EVENT START: acquire_aconn_service en2 appsvc_
Nov 23 00:35:13 AIX: EVENT COMPLETED: acquire_aconn_service en2 app
Nov 23 00:35:13 AIX: EVENT COMPLETED: acquire_service_addr 0
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 00:35:15 AIX: EVENT START: rg_move_complete node1 2
Nov 23 00:35:15 AIX: EVENT START: start_server appBctrl
Nov 23 00:35:16 AIX: EVENT COMPLETED: start_server appBctrl 0
Nov 23 00:35:16 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 00:35:18 AIX: EVENT START: node_up_complete node2
Nov 23 00:35:18 AIX: EVENT COMPLETED: node_up_complete node2 0
Figure 2-12 shows the process flow when another node joins the cluster. The figure shows the node that is already running cluster services and the node that is joining, with clstrmgrES on each node exchanging messages through the Event Manager:
1) rg_move_release: run only if a resource group must fall back to a higher-priority node (see the node-leaving flow); if no fallback is needed, rg_move_release is not run and nothing is done on the running node.
2) rg_move_acquire: nothing is done on the running node; the joining node runs the same sequence as the first node that started cluster services (previous figure).
3) rg_move_complete: on the joining node, for each resource group: process_resources (APPLICATIONS) with start_server app02, process_resources (ONLINE), and process_resources (NONE), followed by an event summary.
Figure 2-12 Another node joining the cluster
The next section explains what happens when a node leaves the cluster voluntarily.
2.8.3 Node down processing (normal with takeover)
In this example, a resource group is on the departing node and must be moved to one of the remaining nodes.
Node failure
The situation is slightly different if the node on the right fails suddenly. Because a failed node is not in a position to run any events, the calls to process_resources listed under the right node are not run, as shown in Figure 2-13.
ning run clstrmgrES p Sto ter s Clu vices ca ll ser clstrmgrES Event Manager Messages Event Manager ll ca C R 1) rg_move_release RC 2) rg_move_acquire RC RC ll ca ll ca nothing 1) rg_move_release for each RG: service address disks for each RG: process_resources (RELEASE) process_resources (APPLICATIONS) stop_server app02 process_resources (FILESYSTEMS) process_resources (VGS) process_resources (SERVICE_LABELS) release_service_addr < Event Summary > 2) rg_move_acquire 3) rg_move_complete start server nothing 3) rg_move_complete Figure 2-13 Node leaving the cluster (stopped) Example 2-4 shows details about the process flow from the clstrmgr.debug file. Example 2-4 clstrmgr.debug file clstrmgr.debug file: [TE_FAIL_NODE_DEP] [TE_RG_MOVE_RELEASE] [TE_RG_MOVE_ACQUIRE] [TE_FAIL_NODE_DEP_COMPLETE] cluster.log file node1 Nov 23 06:24:21 AIX: EVENT COMPLETED: rg_move node1 1 RELEASE 0 Nov 23 06:24:21 AIX: EVENT COMPLETED: rg_move_release node1 1 0 Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1 Nov 23 06:24:32 AIX: EVENT COMPLETED: rg_move_fence node1 1 0 Nov 23 06:24:34 AIX: EVENT START: rg_move_fence node1 2 Nov 23 06:24:34 AIX: EVENT COMPLETED: rg_move_fence node1 2 0 Nov 23 06:24:35 AIX: EVENT START: rg_move_acquire node1 2 Nov 23 06:24:35 AIX: EVENT START: rg_move node1 2 ACQUIRE Nov 23 06:24:35 AIX: EVENT START: acquire_service_addr Nov 23 06:24:36 AIX: EVENT START: acquire_aconn_service en2 appsvc_ Chapter 2. Features of PowerHA SystemMirror 7.1 41 Nov Nov Nov Nov Nov Nov Nov Nov Nov Nov Nov Nov Nov Nov 42 23 23 23 23 23 23 23 23 23 23 23 23 23 23 06:24:36 06:24:36 06:24:36 06:24:38 06:24:41 06:24:41 06:24:41 06:24:41 06:24:42 06:24:42 06:24:49 06:24:49 06:24:51 06:24:51 AIX: AIX: AIX: AIX: AIX: AIX: AIX: AIX: AIX: AIX: AIX: AIX: AIX: AIX: EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT COMPLETED: acquire_aconn_service en2 app COMPLETED: acquire_service_addr 0 START: acquire_takeover_addr COMPLETED: acquire_takeover_addr 0 COMPLETED: rg_move node1 2 ACQUIRE 0 COMPLETED: rg_move_acquire node1 2 0 START: rg_move_complete node1 2 START: start_server appActrl START: start_server appBctrl COMPLETED: start_server appBctrl 0 COMPLETED: start_server appActrl 0 COMPLETED: rg_move_complete node1 2 0 START: node_down_complete node2 COMPLETED: node_down_complete node2 0 cluster.log node2 Nov 23 06:24:21 AIX: Nov 23 06:24:21 AIX: Nov 23 06:24:21 AIX: Nov 23 06:24:21 AIX: Nov 23 06:24:22 AIX: Nov 23 06:24:24 AIX: Nov 23 06:24:27 AIX: Nov 23 06:24:28 AIX: Nov 23 06:24:29 AIX: Nov 23 06:24:30 AIX: Nov 23 06:24:30 AIX: Nov 23 06:24:30 AIX: Nov 23 06:24:32 AIX: Nov 23 06:24:32 AIX: Nov 23 06:24:34 AIX: Nov 23 06:24:35 AIX: Nov 23 06:24:35 AIX: Nov 23 06:24:35 AIX: Nov 23 06:24:35 AIX: Nov 23 06:24:35 AIX: Nov 23 06:24:41 AIX: Nov 23 06:24:41 AIX: Nov 23 06:24:51 AIX: Nov 23 06:24:52 AIX: EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT START: rg_move_release node1 1 START: rg_move node1 1 RELEASE START: stop_server appActrl START: stop_server appBctrl COMPLETED: stop_server appBctrl 0 COMPLETED: stop_server appActrl 0 START: release_service_addr COMPLETED: release_service_addr 0 START: release_takeover_addr COMPLETED: release_takeover_addr 0 COMPLETED: rg_move node1 1 RELEASE 0 COMPLETED: rg_move_release node1 1 0 START: rg_move_fence node1 1 COMPLETED: rg_move_fence node1 1 0 START: rg_move_fence node1 2 COMPLETED: rg_move_fence node1 2 0 START: rg_move_acquire node1 2 
START: rg_move node1 2 ACQUIRE COMPLETED: rg_move node1 2 ACQUIRE 0 COMPLETED: rg_move_acquire node1 2 0 START: rg_move_complete node1 2 COMPLETED: rg_move_complete node1 2 0 START: node_down_complete node2 COMPLETED: node_down_complete node2 0 IBM PowerHA SystemMirror 7.1 for AIX 3 Chapter 3. Planning a cluster implementation for high availability This chapter provides guidance for planning a cluster implementation for high availability with IBM PowerHA SystemMirror 7.1 for AIX. It explains the software, hardware, and storage requirements with a focus on PowerHA 7.1. For more details about planning, consider the following publications: PowerHA for AIX Cookbook, SG24-7739 PowerHA SystemMirror Version 7.1 for AIX Planning Guide, SC23-6758-01 This chapter includes the following topics: Software requirements Hardware requirements Considerations before using PowerHA 7.1 Migration planning Storage Network © Copyright IBM Corp. 2011. All rights reserved. 43 3.1 Software requirements Because PowerHA 7.1 for AIX uses Cluster Aware AIX (CAA) functionality, the following minimum versions of AIX and Reliable Scalable Cluster Technology (RSCT) are required: AIX 6.1 TL6 or AIX 7.1 RSCT 3.1 CAA cluster: PowerHA SystemMirror creates the CAA cluster automatically. You do not manage the CAA configuration or state directly. You can use the cluster commands to view the CAA status directly. Download and install the latest service packs for AIX and PowerHA from IBM Fix Central at: http://www.ibm.com/support/fixcentral 3.1.1 Prerequisite for AIX BOS and RSCT components The following Base Operating System (BOS) components for AIX are required for PowerHA: bos.adt.lib bos.adt.libm bos.adt.syscalls bos.ahafs bos.clvm.enh bos.cluster bos.data bos.net.tcp.client bos.net.tcp.server bos.rte.SRC bos.rte.libc bos.rte.libcfg bos.rte.libcur bos.rte.libpthreads bos.rte.lvm bos.rte.odm cas.agent (required for the IBM Systems Director plug-in) The following file sets on the AIX base media are required: rsct.basic.rte rsct.compat.basic.hacmp rsct.compat.clients.hacmp The appropriate versions of RSCT for the supported AIX releases are also supplied with the PowerHA installation media. 3.2 Hardware requirements The nodes of your cluster can be hosted on any hardware system on which installation of AIX 6.1 TL6 or AIX 7.1 is supported. They can be hosted as a full system partition or inside a logical partition (LPAR). 44 IBM PowerHA SystemMirror 7.1 for AIX The right design methodology can help eliminate network and disk single points of failure (SPOF) by using redundant configurations. Have at least two network adapters connected to different Ethernet switches in the same virtual LAN (VLAN). EtherChannel is supported with PowerHA. Employ dual-fabric SAN connections to the storage subsystems using at least two Fibre Channel (FC) adapters and appropriate multipath drivers. Use Redundant Array of Independent Disks (RAID) technology to protect data from any disk failure. This topic describes the hardware that is supported. 3.2.1 Supported hardware Your hardware, including the firmware and the AIX multipath driver, must be in a supported configuration. For more information about hardware, see Appendix C, “PowerHA supported hardware” on page 491. More information: For a list of the supported FC adapters, see “Setting up cluster storage communication” in the AIX 7.1 Information Center at: http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix. 
clusteraware/claware_comm_setup.htm See the readme files that are provided with the base PowerHA file sets and the latest service pack. See also the PowerHA SystemMirror 7.1 for AIX Standard Edition Information Center at: http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.doc/doc/base/ powerha.htm The nodes of your cluster can be any system on which the installation of AIX 6.1 TL6 or AIX 7.1 is supported, either as a full system partition or as a logical partition (LPAR). Design methodologies can help eliminate network and disk single points of failure (SPOF) by using redundant configurations. Use at least two network adapters connected to different Ethernet switches in the same virtual LAN (VLAN). (PowerHA also supports the use of EtherChannel.) Similarly, use dual-fabric storage area network (SAN) connections to the storage subsystems with at least two Fibre Channel (FC) adapters and appropriate multipath drivers. Also use Redundant Array of Independent Disks (RAID) technology to protect data from any disk failure. 3.2.2 Requirements for the multicast IP address, SAN, and repository disk Cluster communication requires the use of a multicast IP address. You can specify this address when you create the cluster, or you can have one be generated automatically. The ranges 224.0.0.0–224.0.0.255 and 239.0.0.0–239.255.255.255 are reserved for administrative and maintenance purposes. If multicast traffic is present in the adjacent network, you must ask the network administrator for a multicast IP address allocation. Also, ensure that the multicast traffic that is generated by each of the cluster nodes is properly forwarded by the network infrastructure to any other cluster node. If you use SAN-based heartbeat, you must have zoning setup to ensure connectivity between host FC adapters. You also must activate the Target Mode Enabled parameter on the involved FC adapters. Hardware redundancy at the storage subsystem level is mandatory for the Cluster Repository disk. Logical Volume Manager (LVM) mirroring of the repository disk is not supported. The disk Chapter 3. Planning a cluster implementation for high availability 45 must be at least 1 GB in size and not exceed 10 GB. For more information about supported hardware for the cluster repository disk, see 3.5.1, “Shared storage for the repository disk” on page 48. CAA support: Currently CAA only supports the repository disk Fibre Channel or SAS disks as described in the “Cluster communication” topic in the AIX 7.1 Information Center at: http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix. clusteraware/claware_comm_benifits.htm 3.3 Considerations before using PowerHA 7.1 You must be aware of the following considerations before planning to use PowerHA 7.1: You cannot change the host name in a configured cluster. After the cluster is synchronized, you are unable to change the host name of any of the cluster nodes. Therefore, changing the host name is not supported. You cannot change the cluster name in a configured cluster. After the cluster is synchronized, you are unable to change the name of the cluster. If you want to change the cluster name, you must completely remove and recreate the cluster. You cannot change the repository location or cluster IP address in a configured cluster. After the cluster is synchronized, you are unable to change the repository disk or cluster multicast IP address. 
To change the repository disk or the cluster multicast IP address, you must completely remove and recreate the cluster.
No IPv6 support is available, which is a restriction of the CAA implementation.
3.4 Migration planning
Before migrating your cluster, you must be aware of the following considerations:
The required software: AIX and Virtual I/O Server (VIOS)
Multicast address
Repository disk
FC heartbeat support
Support removed for all non-IP networks: RS232, TMSCSI, TMSSA, disk heartbeat (DISKHB), and multinode disk heartbeat (MNDHB)
Support removed for the following IP network types: Asynchronous Transfer Mode (ATM), Fiber Distributed Data Interface (FDDI), and token ring
IP Address Takeover (IPAT) via replacement support removed
Heartbeat over alias support removed
Site support not available in this version
IPv6 support not available in this version
You can migrate from High-Availability Cluster Multi-Processing (HACMP) or PowerHA versions 5.4.1, 5.5, and 6.1 only. If you are running a version earlier than HACMP 5.4.1, you must upgrade to a newer version first.
TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
Most migration scenarios require a two-part upgrade. First, you migrate AIX to the minimum version of AIX 6.1 TL6 on all nodes. You must reboot each node after upgrading AIX. Second, you migrate to PowerHA 7.1 by using the offline, rolling, or snapshot scenario as explained in Chapter 7, "Migrating to PowerHA 7.1" on page 151.
In addition, keep in mind the following considerations:
Multicast address
A multicast address is required for communication between the nodes (used by CAA). During the migration, you can specify this address or allow CAA to automatically generate one for you. Discuss the multicast address with your network administrator to ensure that such addresses are allowed on your network. Consider firewalls and routers that might not have this support enabled.
CAA repository disk
A shared disk that is zoned in and available to all nodes in the cluster is required. This disk is reserved for use by CAA only.
VIOS support
You can configure a PowerHA 7.1 cluster on LPARs that are using resources provided by a VIOS. However, the support of your CAA repository disk has restrictions.
Support for vSCSI: CAA repository disk support for virtual SCSI (vSCSI) is officially introduced in AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk repository at AIX 6.1 TL6 base levels, but not at SP1. Alternatively, direct SAN connection logical unit numbers (LUNs) or N_Port ID Virtualization (NPIV) LUNs are supported with all versions.
SAN heartbeat support
One of the new features of PowerHA 7.1 is the ability to use the SAN fabric for another communications route between hosts. This feature is implemented through CAA and replaces the non-IP support of previous versions.
Adapters for SAN heartbeat: This feature requires 4 GB or 8 GB adapters, which must be direct attached or virtualized. If the adapters are virtualized as vSCSI through VIOS or by using NPIV, VIOS 2.2.0.11-FP24 SP01 is required.
Heartbeat support for non-IP configurations (such as disk heartbeat)
Disk-based heartbeat, MNDHB, RS232, TMSCSI, and TMSSA are no longer supported configurations with PowerHA 7.1. When you migrate, be aware that you cannot keep these configurations.
When the migration is completed, these definitions are removed from the Object Data Manager (ODM). As an alternative, PowerHA 7.1 uses SAN-based heartbeat, which is configured automatically when you migrate. Removal of existing network hardware support FDDI, ATM, and token ring are no longer supported. You must remove this hardware before you begin the migration. IPAT via IP replacement IPAT via IP replacement for address takeover is no longer supported. You must remove this configuration before you begin the migration. Heartbeat over aliases Configurations using heartbeat over aliases are no longer supported. You must remove this configuration before you begin the migration. PowerHA SystemMirror for AIX Enterprise Edition (PowerHA/XD) configurations The latest version of PowerHA/XD is 6.1. You cannot migrate this version to PowerHA 7.1. 3.5 Storage This section provides details about storage planning considerations for high availability of your cluster implementation. 3.5.1 Shared storage for the repository disk You must dedicate a shared disk with a minimum size of 1 GB as a central repository for the cluster configuration data of CAA. For this disk, configure intrinsic data redundancy by using hardware RAID features of the external storage subsystems. For additional information about the shared disk, see the PowerHA SystemMirror Version 7.1 for AIX Standard Edition Concepts and Facilities Guide, SC23-6751. See also the PowerHA SystemMirror Version 7.1 announcement information or the PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide, SC23-6758-01, for a complete list of supported devices. The following disks are supported (through Multiple Path I/O (MPIO)) for the repository disk: All FC disks that configure as MPIO IBM DS8000, DS3000, DS4000®, DS5000, XIV®, ESS800, SAN Volume Controller (SVC) EMC: Symmetrix, DMX, CLARiiON HDS: 99XX, 96XX, OPEN series IBM System Storage N series/NetApp: All models of N series and all NetApp models common to N series VIOS vSCSI All IBM serial-attached SCSI (SAS) disks that configure as MPIO SAS storage 48 IBM PowerHA SystemMirror 7.1 for AIX The following storage types are known to work with MPIO but do not have a service agreement: HP SUN Compellent 3PAR LSI Texas Memory Systems Fujitsu Toshiba Support for third-party multipathing software: At the time of writing, some third-party multipathing software was not supported. 3.5.2 Adapters supported for storage communication At the time of this writing, only the 4 GB and 8 GB FC adapters are supported. Also the daughter card for IBM System p blades and Emulex FC adapters are supported. See PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide, SC23-6758-01, for additional information. 
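Because only 4 GB and 8 GB FC adapters are supported for this purpose, it can be useful to confirm the link speed of the installed adapters before you plan the SAN-based communication. The following check is a hedged sketch only (fcs0 is an example device name, and the exact output wording can vary by adapter type and driver level):
# fcstat fcs0 | grep -i "port speed"
Port Speed (supported): 8 GBIT
Port Speed (running):   8 GBIT
An adapter that reports a supported speed of 4 GBIT or 8 GBIT meets the requirement that is described in this section.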
The following FC and SAS adapters are supported for connection to the repository disk: 4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1905; CCIN 1910) 4 GB Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5758; CCIN 280D) 4 GB Single-Port Fibre Channel PCI-X Adapter (FC 5773; CCIN 5773) 4 GB Dual-Port Fibre Channel PCI-X Adapter (FC 5774; CCIN 5774) 4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1910; CCIN 1910) 4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5759; CCIN 5759) 8 Gb PCI Express Dual Port Fibre Channel Adapter (FC 5735; CCIN 577D) 8 Gb PCI Express Dual Port Fibre Channel Adapter 1Xe Blade (FC 2B3A; CCIN 2607) 3 Gb Dual-Port SAS Adapter PCI-X DDR External (FC 5900 and 5912; CCIN 572A) More information: For the most current list of supported storage adapters for shared disks other than the repository disk, contact your IBM representative. Also see the “IBM PowerHA SystemMirror for AIX” web page at: http://www.ibm.com/systems/power/software/availability/aix/index.html The PowerHA software supports the following disk technologies as shared external disks in a highly available cluster: SCSI drives, including RAID subsystems FC adapters and disk subsystems Data path devices (VPATH): SDD 1.6.2.0, or later Virtual SCSI (vSCSI) disks Support for vSCSI: CAA repository disk support for vSCSI is officially introduced in AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk repository at AIX 6.1 TL6 base levels, but not at SP1. Alternatively, direct SAN connection LUNs or NPIV LUNs are supported with all versions. You can combine these technologies within a cluster. Before choosing a disk technology, review the considerations for configuring each technology as described in the following section. Chapter 3. Planning a cluster implementation for high availability 49 3.5.3 Multipath driver AIX 7.1 does not support the IBM Subsystem Device Driver (SDD) for TotalStorage Enterprise Storage Server®, the IBM System Storage DS8000, and the IBM System Storage SAN Volume Controller. Instead, you can use the IBM Subsystem Device Driver Path Control Module (SDDPCM) or native AIX MPIO Path Control Module (PCM) for multipath support on AIX7.1. AIX MPIO is an architecture that uses PCMs. The following PCMs are all supported: SDDPCM HDLM PCM AIXPCM SDDPCM only supports DS6000™, DS8000, SVC, and some models of DS4000. HDLM PCM only supports Hitachi storage devices. AIXPCM supports all storage devices that System p servers and VIOS support. AIXPCM supports storage devices from over 25 storage vendors. Support for third-party multipath drivers: At the time of writing, other third-party multipath drivers (such as EMC PowerPath, and Veritas) are not supported. This limitation is planned to be resolved in a future release. See the “Support Matrix for Subsystem Device Driver, Subsystem Device Driver Path Control Module, and Subsystem Device Driver Device Specific Module” at: http://www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350 Also check whether the coexistence of different multipath drivers using different FC ports on the same system is supported for mixed cases. For example, the cluster repository disk might be a on storage or FC adapter other than the shared data disks. 
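To check which path control module actually manages a particular disk before you use it as the repository disk or as shared storage, you can query the device with standard AIX commands. The following is a generic sketch; hdisk1 is only an example device name, and the values that are returned depend on your storage type and driver level:
# lsdev -Cc disk              (the disk description indicates MPIO, SDDPCM, or another driver)
# lsattr -El hdisk1 -a PCM    (shows the path control module that owns the disk)
# lspath -l hdisk1            (lists the available paths for the disk)
If the PCM attribute points to SDDPCM or HDLM, the disk is handled by that module; otherwise, the default AIX PCM is in control.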
3.5.4 System Storage Interoperation Center To check the compatibility of your particular storage and SAN infrastructure with PowerHA, see the System Storage Interoperation Center (SSIC) site at: http://www.ibm.com/systems/support/storage/config/ssic 3.6 Network The networking requirements for PowerHA SystemMirror 7.1 differ from all previous versions. This section focuses specifically on the differences of the following requirements: Multicast address Network interfaces Subnetting requirements for IPAT via aliasing Host name and node name Other network considerations – Single adapter networks – Virtual Ethernet (VIOS) IPv6: IPv6 is not supported in PowerHA SystemMirror 7.1. For additional information, and details about common features between versions, see the PowerHA for AIX Cookbook, SG24-7739. 50 IBM PowerHA SystemMirror 7.1 for AIX 3.6.1 Multicast address The CAA functionality in PowerHA SystemMirror 7.1 employs multicast addressing for heartbeating. Therefore, the network infrastructure must handle and allow the use of multicast addresses. If multicast traffic is present in the adjacent network, you must ask the network administrator for a multicast IP address allocation. Also, ensure that the multicast traffic generated by each of the cluster nodes is properly forwarded by the network infrastructure toward any other cluster node. 3.6.2 Network interfaces Because PowerHA SystemMirror uses CAA, CAA forces the use of all common network (Ethernet, InfiniBand, or both) interfaces between the cluster nodes for communications. You cannot limit which interfaces are used or configured to the cluster. In previous versions, the network Failure Detection Rate (FDR) policy was tunable, which is no longer true in PowerHA SystemMirror 7.1. 3.6.3 Subnetting requirements for IPAT via aliasing In terms of subnetting requirements, IPAT via aliasing is now the only IPAT option available. IPAT via aliasing has the following subnet requirements: All base IP addresses on a node must be on separate subnets. All service IP addresses must be on a separate subnet from any of the base subnets. The service IP addresses can all be in the same or different subnets. The persistent IP address can be in the same or a different subnet from the service IP address. If the networks are a single adapter configuration, both the base and service IP addresses are allowed on the same subnet. 3.6.4 Host name and node name In PowerHA SystemMirror 7.1, both the cluster node name and AIX host name be the same. 3.6.5 Other network considerations Other network considerations for using PowerHA SystemMirror 7.1 include single adapter networks and virtual Ethernet. Single adapter networks Through the use of EtherChannel, Shared Ethernet Adapters (SEA), or both at the VIOS level, it is common today to have redundant interfaces act as one logical interface to the AIX client or cluster node. In these configurations, historically users configured a netmon.cf file to ping additional external interfaces or addresses. The netmon.cf configuration file is no longer required. Virtual Ethernet In previous versions, when using virtual Ethernet, users configured a special formatted netmon.cf file to ping additional external interfaces or addresses by using specific outbound interfaces. The netmon.cf configuration file no longer applies. Chapter 3. Planning a cluster implementation for high availability 51 52 IBM PowerHA SystemMirror 7.1 for AIX 4 Chapter 4. 
Installing PowerHA SystemMirror 7.1 for AIX
This chapter explains how to install the IBM PowerHA SystemMirror 7.1 for AIX Standard Edition software. This chapter includes the following topics:
Hardware configuration of the test environment
Installing PowerHA file sets
Volume group consideration
4.1 Hardware configuration of the test environment
Figure 4-1 shows a hardware overview of the test environment to demonstrate the installation and configuration procedures in this chapter. It consists of two IBM Power 570 logical partitions (LPARs), both SAN-attached to a DS4800 storage subsystem and connected to a common LAN segment.
Figure 4-1 PowerHA Lab environment
4.1.1 SAN zoning
In the test environment, the conventional SAN zoning is configured between each host and the storage subsystem to allow for the host attachment of the shared disks. For the cluster SAN-based communication channel, two extra zones are created as shown in Example 4-1. One zone includes the fcs0 ports of each server, and the other zone includes the fcs1 ports of each server.
Example 4-1 Host-to-host zoning for SAN-based channel
sydney:/ # for i in 0 1; do lscfg -vpl fcs$i|grep "Network Address";done
Network Address.............10000000C974C16E
Network Address.............10000000C974C16F
perth:/ # for i in 0 1; do lscfg -vpl fcs$i|grep "Network Address";done
Network Address.............10000000C97720D8
Network Address.............10000000C97720D9
Fabric1:
zone: Syndey_fcs0__Perth_fcs0
10:00:00:00:c9:74:c1:6e
10:00:00:00:c9:77:20:d8
Fabric2:
zone: Syndey_fcs1__Perth_fcs1
10:00:00:00:c9:74:c1:6f
10:00:00:00:c9:77:20:d9
This dual zone setup provides redundancy for the SAN communication channel at the Cluster Aware AIX (CAA) storage framework level. The dotted lines in Figure 4-2 represent the initiator-to-initiator zones added on top of the conventional ones, connecting host ports to storage ports.
Figure 4-2 Host-to-host zoning
4.1.2 Shared storage
Three Redundant Array of Independent Disks (RAID) logical drives are configured on the DS4800 storage subsystem and are presented to both AIX nodes. One logical drive hosts the cluster repository disk. On the other two drives, the shared storage space is configured for application data.
Example 4-2 shows that each disk is available through two paths on different Fibre Channel (FC) adapters.
Example 4-2 FC path setup on AIX nodes
sydney:/ # for i in hdisk1 hdisk2 hdisk3 ; do lspath -l $i;done
Enabled hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Enabled hdisk3 fscsi0
Enabled hdisk3 fscsi1
perth:/ # for i in hdisk1 hdisk2 hdisk3 ; do lspath -l $i;done
Enabled hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Defined hdisk3 fscsi0
Enabled hdisk3 fscsi1
The multipath driver being used is the AIX native MPIO. In Example 4-3, the mpio_get_config command shows identical LUNs on both nodes, as expected.
Example 4-3 MPIO shared LUNs on AIX nodes sydney:/ # mpio_get_config -Av Frame id 0: Storage Subsystem worldwide name: 60ab800114632000048ed17e Controller count: 2 Partition count: 1 Partition 0: Storage Subsystem Name = 'ITSO_DS4800' hdisk LUN # Ownership User Label hdisk1 7 B (preferred) PW-0201-L7 hdisk2 8 A (preferred) PW-0201-L8 hdisk3 9 B (preferred) PW-0201-L9 perth:/ # mpio_get_config -Av Frame id 0: Storage Subsystem worldwide name: 60ab800114632000048ed17e Controller count: 2 Partition count: 1 Partition 0: Storage Subsystem Name = 'ITSO_DS4800' hdisk LUN # Ownership User Label hdisk1 7 B (preferred) PW-0201-L7 hdisk2 8 A (preferred) PW-0201-L8 hdisk3 9 B (preferred) PW-0201-L9 56 IBM PowerHA SystemMirror 7.1 for AIX 4.1.3 Configuring the FC adapters for SAN-based communication To properly configure the FC adapters for the cluster SAN-based communication, follow these steps: X in fcsX: In the following steps, the X in fcsX represents the number of the FC adapters. You must complete this procedure for each FC adapter that is involved in cluster SAN-based communication. 1. Unconfigure fcsX: rmdev -Rl fcsX fcsX device busy: If the fcsX device is busy when you use the rmdev command, enter the following commands: chdev -P -l fcsX -a tme=yes chdev -P -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail Then restart the system. 2. Change tme attribute value to yes in the fcsX definition: chdev -l fcsX -a tme=yes 3. Enable the dynamic tracking and the fast-fail error recovery policy on the corresponding fscsiX device: chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail 4. Configure fcsX port and its associated Storage Framework Communication device: cfgmgr -l fcsX;cfgmgr -l sfwcommX 5. Verify the configuration changes by running the following commands: lsdev -C | grep -e fcsX -e sfwcommX lsattr -El fcsX | grep tme lsattr -El fscsiX | grep -e dyntrk -e fc_err_recov Example 4-4 illustrates the procedure for port fcs0 on node sydney. Example 4-4 SAN-based communication channel setup sydney:/ # lsdev -l fcs0 fcs0 Available 00-00 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03) sydney:/ # lsattr -El fcs0|grep tme tme no Target Mode Enabled True sydney:/ # rmdev -Rl fcs0 fcnet1 Defined sfwcomm0 Defined fscsi0 Defined fcs0 Defined sydney:/ # chdev -l fcs0 -a tme=yes fcs0 changed sydney:/ # chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX 57 fscsi0 changed sydney:/ # cfgmgr -l fcs0;cfgmgr -l sfwcomm0 sydney:/ # lsdev -C|grep -e fcs0 -e sfwcomm0 fcs0 Available 01-00 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03) sfwcomm0 Available 01-00-02-FF Fiber Channel Storage Framework Comm sydney:/ # lsattr -El fcs0|grep tme tme yes Target Mode Enabled True sydney:/ # lsattr -El fscsi0|grep -e dyntrk -e fc_err_recov dyntrk yes Dynamic Tracking of FC Devices True fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True 4.2 Installing PowerHA file sets At a minimum, you must have the following PowerHA runtime executable files: cluster.es.client cluster.es.server cluster.es.cspoc Depending on the functionality required for your environment, additional file sets might be selected for installation. Migration consideration: Installation on top of a previous release is considered a migration. Additional steps are required for migration including running the clmigcheck command. For more information about migration, see Chapter 7, “Migrating to PowerHA 7.1” on page 151. 
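If you are not using NIM, a minimal command-line installation of these file sets from a local directory can look like the following sketch. The /tmp/powerha71 path is only an example location for the installation images; adjust it to wherever the media was copied, and note that the cluster.license file set is also required, as described in 4.2.1:
# installp -agXY -d /tmp/powerha71 cluster.es.client cluster.es.server \
  cluster.es.cspoc cluster.license
# lslpp -L "cluster.es.*"
The -g flag pulls in any prerequisite file sets automatically, and lslpp verifies what was installed. The SMIT equivalent is the smitty install_latest fast path.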
PowerHA SystemMirror 7.1 for AIX Standard Edition includes the Smart Assists images. For more details about the Smart Assists functionality and new features, see 2.2, “New features” on page 24. The PowerHA for IBM Systems Director agent file set comes with the base installation media. To learn more about PowerHA SystemMirror for IBM Systems Director, see 5.3, “PowerHA SystemMirror for IBM Systems Director” on page 133. You can install the required packages in the following ways: From a CD From a hard disk to which the software has been copied From a Network Installation Management (NIM) server Installation from a CD is more appropriate for small environments. Use NFS export and import for remote nodes to avoid multiple CD maneuvering or image copy operations. The following section provides an example of how to use a NIM server to install the PowerHA software. 58 IBM PowerHA SystemMirror 7.1 for AIX 4.2.1 PowerHA software installation example This section guides you through an example of installing the PowerHA software. This example runs on the server configuration shown in 4.1, “Hardware configuration of the test environment” on page 54. Installing the AIX BOS components and RSCT Some of the prerequisite file sets might already be present, or they might be missing from previous installations, updates, and removals. To begin, a consistent AIX image must be installed. The test environment entailed starting with a “New and Complete Overwrite” of AIX 6.1.6.1 installation from a NIM server. Example 4-5 shows how to check the AIX version and the consistency of the installation. Example 4-5 Initial AIX image sydney:/ # oslevel -s 6100-06-01-1043 sydney:/ # lppchk -v sydney:/ # In Example 4-6, the lslpp command lists the prerequisites that are already installed and the ones that are missing in a single output. Example 4-6 Checking the installed and missing prerequisites sydney:/ # lslpp -L bos.adt.lib bos.adt.libm bos.adt.syscalls bos.clvm.enh \ > bos.cluster.rte bos.cluster.solid bos.data bos.ahafs bos.net.tcp.client \ > bos.net.tcp.server bos.rte.SRC bos.rte.libc bos.rte.libcfg \ > bos.rte.libcur bos.rte.libpthreads bos.rte.lvm bos.rte.odm \ > bos.rte.libcur bos.rte.libpthreads bos.rte.lvm bos.rte.odm \ > rsct.basic.rte rsct.compat.basic.hacmp rsct.compat.clients.hacmp Fileset Level State Type Description (Uninstaller) ---------------------------------------------------------------------------bos.adt.lib 6.1.2.0 C F Base Application Development Libraries lslpp: Fileset bos.adt.libm not installed. lslpp: Fileset bos.adt.syscalls not installed. bos.cluster.rte 6.1.6.1 C F Cluster Aware AIX bos.cluster.solid 6.1.6.1 C F POWER HA Business Resiliency solidDB lslpp: Fileset bos.clvm.enh not installed. lslpp: Fileset bos.data not installed. bos.net.tcp.client 6.1.6.1 C F TCP/IP Client Support bos.net.tcp.server 6.1.6.0 C F TCP/IP Server bos.rte.SRC 6.1.6.0 C F System Resource Controller bos.rte.libc 6.1.6.1 C F libc Library bos.rte.libcfg 6.1.6.0 C F libcfg Library bos.rte.libcur 6.1.6.0 C F libcurses Library bos.rte.libpthreads 6.1.6.0 C F pthreads Library bos.rte.lvm 6.1.6.0 C F Logical Volume Manager bos.rte.odm 6.1.6.0 C F Object Data Manager rsct.basic.rte 3.1.0.1 C F RSCT Basic Function rsct.compat.basic.hacmp 3.1.0.1 C F RSCT Event Management Basic Function (HACMP/ES Support) Chapter 4. 
Installing PowerHA SystemMirror 7.1 for AIX 59 rsct.compat.clients.hacmp 3.1.0.0 C F RSCT Event Management Client Function (HACMP/ES Support) Figure 4-3 shows selection of the appropriate lpp_source on the NIM server, aix6161, by following the path smitty nim Install and Update Software Install Software. You select all of the required file sets on the next panel. Install and Update Software Move cursor to desired item and press Enter. Install Software Update Installed Software to Latest Level (Update All) Install Software Bundle Update Software by Fix (APAR) •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• • Select the LPP_SOURCE containing the install images • • • • Move cursor to desired item and press Enter. • • • • aix7100g resources lpp_source • • aix7101 resources lpp_source • • aix6161 resources lpp_source • • ha71sp1 resources lpp_source • • aix6060 resources lpp_source • • aix6160-SP1-only resources lpp_source • • • • F1=Help F2=Refresh F3=Cancel • • Esc+8=Image Esc+0=Exit Enter=Do • F1• /=Find n=Find Next • Es•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Figure 4-3 Installing the prerequisites: Selecting lpp_source 60 IBM PowerHA SystemMirror 7.1 for AIX Figure 4-4 shows one of the selected file sets, bos.clvm. Although it is not required for another file set, bos.clvm is mandatory for PowerHA 7.1 because only enhanced concurrent volume groups (ECVGs) are supported. See 10.3.3, “The ECM volume group” on page 313, for more details. Ty•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Pr• Software to Install • • • [T• Move cursor to desired item and press Esc+7. Use arrow keys to scroll. • * • ONE OR MORE items can be selected. • * • Press Enter AFTER making all selections. • • • • [MORE...2286] • • + 6.1.6.1 POWER HA Business Resiliency solidDB • • + 6.1.6.0 POWER HA Business Resiliency solidDB • • • • > bos.clvm ALL • • + 6.1.6.0 Enhanced Concurrent Logical Volume Manager • • • • bos.compat ALL • • + 6.1.6.0 AIX 3.2 Compatibility Commands • • [MORE...4498] • [M• • • F1=Help F2=Refresh F3=Cancel • F1• Esc+7=Select Esc+8=Image Esc+0=Exit • Es• Enter=Do /=Find n=Find Next • Es•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Figure 4-4 Installing the prerequisites: Selecting the file sets After installing from the NIM server, ensure that each node remains at the initial version of AIX and RSCT, and check the software consistency, as shown in Example 4-7. Example 4-7 Post-installation check of the prerequisites sydney:/ # oslevel -s 6100-06-01-1043 sydney:/ # lppchk -v sydney:/ # lslpp -L rsct.basic.rte rsct.compat.basic.hacmp \ > rsct.compat.clients.hacmp Fileset Level State Type Description (Uninstaller) ---------------------------------------------------------------------------rsct.basic.rte 3.1.0.1 C F RSCT Basic Function rsct.compat.basic.hacmp 3.1.0.1 C F RSCT Event Management Basic Function (HACMP/ES Support) rsct.compat.clients.hacmp 3.1.0.0 C F RSCT Event Management Client Function (HACMP/ES Support) Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX 61 Installing the PowerHA file sets To prepare an lpp_source that contains the required base and updated file sets, follow these steps: 1. Copy the file set from the media to a directory on the NIM server by using the smit bffcreate command. 2. Apply the latest service pack in the same directory by using the smit bffcreate command. 3. Create an lpp_source resource that points to the directory on the NIM server. 
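Step 3 can be done through the SMIT NIM menus or directly from the command line. The following sketch uses the directory and resource name that appear in Example 4-8; substitute your own location and resource name as needed:
nimres1:/ # nim -o define -t lpp_source -a server=master \
  -a location=/nimrepo/lpp_source/HA71 ha71sp1
After the resource is defined, the lsnim -l ha71sp1 command shows its attributes, as illustrated in Example 4-8.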
Example 4-8 lists the contents of the lpp_source. As mentioned previously, both the Smart Assist file sets and PowerHA for IBM Systems Director agent file set come with the base media. Example 4-8 The contents of lpp_source in PowerHA SystemMirror nimres1:/ # lsnim -l ha71sp1 ha71sp1: class = resources type = lpp_source arch = power Rstate = ready for use prev_state = unavailable for use location = /nimrepo/lpp_source/HA71 alloc_count = 0 server = master nimres1:/nimres1:/ # ls /nimrepo/lpp_source/HA71 .toc cluster.adt.es cluster.doc.en_US.assist cluster.doc.en_US.assist.db2.html.7.1.0.1.bff cluster.doc.en_US.assist.oracle.html.7.1.0.1.bff cluster.doc.en_US.assist.websphere.html.7.1.0.1.bff cluster.doc.en_US.es cluster.doc.en_US.es.html.7.1.0.1.bff cluster.doc.en_US.glvm.html.7.1.0.1.bff cluster.es.assist cluster.es.assist.common.7.1.0.1.bff cluster.es.assist.db2.7.1.0.1.bff cluster.es.assist.domino.7.1.0.1.bff cluster.es.assist.ihs.7.1.0.1.bff cluster.es.assist.sap.7.1.0.1.bff cluster.es.cfs cluster.es.cfs.rte.7.1.0.1.bff cluster.es.client cluster.es.client.clcomd.7.1.0.1.bff cluster.es.client.lib.7.1.0.1.bff cluster.es.client.rte.7.1.0.1.bff cluster.es.cspoc cluster.es.director.agent cluster.es.migcheck cluster.es.nfs cluster.es.server cluster.es.server.diag.7.1.0.1.bff cluster.es.server.events.7.1.0.1.bff 62 IBM PowerHA SystemMirror 7.1 for AIX cluster.es.server.rte.7.1.0.1.bff cluster.es.server.utils.7.1.0.1.bff cluster.es.worksheets cluster.license cluster.man.en_US.es.data cluster.msg.en_US.assist cluster.msg.en_US.es rsct.basic_3.1.0.0 rsct.compat.basic_3.1.0.0 rsct.compat.clients_3.1.0.0 rsct.core_3.1.0.0 rsct.exp_3.1.0.0 rsct.opt.fence_3.1.0.0 rsct.opt.stackdump_3.1.0.0 rsct.opt.storagerm_3.1.0.0 rsct.sdk_3.1.0.0 Example 4-9 shows the file sets that were selected for the test environment and installed from the lpp_source that was prepared previously. Each node requires a PowerHA license. Therefore, you must install the license file set. Example 4-9 List of installed PowerHA file sets sydney:/ # lslpp -L cluster.* Fileset Level State Type Description (Uninstaller) ---------------------------------------------------------------------------Infrastructure cluster.es.client.lib 7.1.0.1 C F PowerHA SystemMirror Client Libraries cluster.es.client.rte 7.1.0.1 C F PowerHA SystemMirror Client Runtime cluster.es.client.utils 7.1.0.0 C F PowerHA SystemMirror Client Utilities cluster.es.client.wsm 7.1.0.0 C F Web based Smit cluster.es.cspoc.cmds 7.1.0.0 C F CSPOC Commands cluster.es.cspoc.dsh 7.1.0.0 C F CSPOC dsh cluster.es.cspoc.rte 7.1.0.0 C F CSPOC Runtime Commands cluster.es.migcheck 7.1.0.0 C F PowerHA SystemMirror Migration support cluster.es.server.cfgast 7.1.0.0 C F Two-Node Configuration Assistant cluster.es.server.diag 7.1.0.1 C F Server Diags cluster.es.server.events 7.1.0.1 C F Server Events cluster.es.server.rte 7.1.0.1 C F Base Server Runtime cluster.es.server.testtool 7.1.0.0 C F Cluster Test Tool cluster.es.server.utils 7.1.0.1 C F Server Utilities cluster.license 7.1.0.0 C F PowerHA SystemMirror Electronic License cluster.man.en_US.es.data 7.1.0.0 C F Man Pages - U.S. English cluster.msg.en_US.assist 7.1.0.0 C F PowerHA SystemMirror Smart Assist Messages - U.S. English cluster.msg.en_US.es.client 7.1.0.0 C F PowerHA SystemMirror Client Messages - U.S. English cluster.msg.en_US.es.server Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX 63 7.1.0.0 C F Recovery Driver Messages U.S. English Then verify the installed software as shown in Example 4-10. 
The prompt return by the lppchk command confirms the consistency of the installed file sets. Example 4-10 Verifying the installed PowerHA filesets consistency sydney:/ # lppchk -v sydney:/ # lppchk -c cluster.* sydney:/ # 4.3 Volume group consideration PowerHA 7.1 supports only the use of enhanced concurrent volume groups. If you try to add an existing non-current volume group to a PowerHA resource group, it fails if it is not already imported on the other node with the error message shown in Figure 4-5. Auto Discover/Import of Volume Groups was set to true. Gathering cluster information, which may take a few minutes. claddres: test_vg is not a shareable volume group. Could not perform all imports. No ODM values were changed. <01> Importing Volume group: test_vg onto node: chile: FAIL Verification to be performed on the following: Cluster Topology Cluster Resources Figure 4-5 Error message when adding a volume group To work around the problem shown in Figure 4-5, manually import the volume group on the other node by using the following command: importvg -L test_vg hdiskx After the volume group is added to the other node, the synchronization and verification are then completed. Volume group conversion: The volume group is automatically converted to an enhanced concurrent volume group during the first startup of the PowerHA cluster. 64 IBM PowerHA SystemMirror 7.1 for AIX 5 Chapter 5. Configuring a PowerHA cluster To configure a PowerHA cluster, you can choose from the following options. SMIT SMIT is the most commonly used way to manage and configure a cluster. The SMIT menus are available after the cluster file sets are installed. The learning cycle for using SMIT is shorter than the learning cycle for using the command-line interface (CLI). For more information about using SMIT to configure a cluster, see 5.1, “Cluster configuration using SMIT” on page 66. PowerHA SystemMirror plug-in for IBM Systems Director IBM Systems Director is for users who are ready to use and want to use it to manage and configure the PowerHA clusters. You might choose this option if you are working with large environments for central management of all clusters. You can choose from two methods, as explained in the following sections, to configure a cluster using IBM Systems Director: – 12.1.1, “Creating a cluster with the SystemMirror plug-in wizard” on page 334 – 12.1.2, “Creating a cluster with the SystemMirror plug-in CLI” on page 339 The clmgr CLI You can use the clmgr utility for configuration tasks. However, its purpose is to provide a uniform scripting interface for deployments in larger environments and to perform day-to-day cluster management. For more information about using this tool, see 5.2, “Cluster configuration using the clmgr tool” on page 104. You can perform most administration tasks with any of these options. The option that you choose depends on which one you prefer and which one meets the requirements of your environment. This chapter includes the following topics: Cluster configuration using SMIT Cluster configuration using the clmgr tool PowerHA SystemMirror for IBM Systems Director © Copyright IBM Corp. 2011. All rights reserved. 
65 5.1 Cluster configuration using SMIT This topic includes the following sections: SMIT menu changes Overview of the test environment Typical configuration of a cluster topology Custom configuration of the cluster topology Configuring resources and applications Configuring Start After and Stop After resource group dependencies Creating a user-defined resource type Configuring the dynamic node priority (adaptive failover) Removing a cluster 5.1.1 SMIT menu changes The SMIT menus for PowerHA SystemMirror 7.1 are restructured to simplify configuration and administration by grouping menus by function. Locating available options: If you are familiar with the SMIT paths from an earlier version, and need to locate a specific feature, use the “Can’t find what you are looking for ?” feature from the main SMIT menu to list and search the available options. To enter the top-level menu, use the new fast path, smitty sysmirror. The fast path on earlier versions, smitty hacmp, still works. From the main menu, the highlighted options shown in Figure 5-1 are available to help with topology and resources configuration. Most of the tools necessary to configure cluster components are under “Cluster Nodes and Networks” and “Cluster Applications and Resources.” Some terminology has changed, and the interface looks more simplified for easier navigation and management. PowerHA SystemMirror Move cursor to desired item and press Enter. Cluster Nodes and Networks Cluster Applications and Resources System Management (C-SPOC) Problem Determination Tools Custom Cluster Configuration Can't find what you are looking for ? Not sure where to start ? Figure 5-1 Top-level SMIT menu Because topology monitoring has been transferred to CAA, its management has been simplified. Support for non-TCP/IP heartbeat has been transferred to CAA and is no longer a separate configurable option. Instead of multiple menu options and dialogs for configuring non-TCP/IP heartbeating devices, a single option is available plus a window (Figure 5-2) to specify the CAA cluster repository disk and the multicast IP address. 66 IBM PowerHA SystemMirror 7.1 for AIX Up-front help information and navigation aids, similar to the last two items in the top-level menu in Figure 5-1 (Can't find what you are looking for ? and Not sure where to start ?), are now available in some of the basic panels. See the last menu option in Figure 5-2 (What are a repository disk and cluster IP address ?) for an example. The context-sensitive help (F1 key) in earlier versions is still available. Initial Cluster Setup (Typical) Move cursor to desired item and press Enter. Setup a Cluster, Nodes and Networks Define Repository Disk and Cluster IP Address What are a repository disk and cluster IP address ? F1=Help Esc+9=Shell F2=Refresh Esc+0=Exit F3=Cancel Enter=Do Esc+8=Image Figure 5-2 Help information The top resource menus keep only the commonly used options, and the less frequently used menus are deeper in the hierarchy, under a new Custom Cluster Configuration menu. This menu includes various customizable and advanced options, similar to the “Extended Configuration” menu in earlier versions. See 2.3, “Changes to the SMIT panel” on page 25, for a layout that compares equivalent menu screens in earlier versions with the new screens. The Verify and Synchronize functions now have a simplified form in most of the typical menus, while the earlier customizable version is available in more advanced contexts. 
Application server versus application controller: Earlier versions used the term application server to refer to the scripts that are used to start and stop applications under SystemMirror control. In version 7.1, these scripts are referred to as application controllers. A System Events dialog is now available in addition to the user-defined events and pre- and post-event commands for predefined events from earlier versions. For more information about this dialog, see 9.4, “Testing the rootvg system event” on page 286. SSA disks are no longer supported in AIX 6.1, and the RSCT role has been diminished. Therefore, some related menu options have been removed. See Chapter 2, “Features of PowerHA SystemMirror 7.1” on page 23, for more details about the new and obsolete features. For a topology configuration, SMIT provides two possible approaches that resemble the previous Standard and Extended configuration paths: typical configuration and custom configuration. Typical configuration The smitty sysmirror Cluster Nodes and Networks Initial Cluster Setup (Typical) configuration path provides the means to configure the basic components of a cluster in a few steps. Discovery and selection of configuration information is automated, and default values are provided whenever possible. If you need to use specific values instead of the default Chapter 5. Configuring a PowerHA cluster 67 paths that are provided, you can change them later or use the custom configuration path instead. Custom configuration Custom cluster configuration options are not typically required or used by most customers. However they provide extended flexibility in configuration and management options. These options are under the Custom Cluster Configuration option in the top-level panel. If you want complete control over which components are added to the cluster, and create them piece by piece, you can configure the cluster topology with the SMIT menus. Follow the path Custom Cluster Configuration Initial Cluster Setup (Custom). With this path, you can also set your own node and network names, other than the default ones. Alternatively, you can choose only specific network interfaces to support the clustered applications. (By default, all IP configured interfaces are used.) Resources configuration The Cluster Applications and Resources menu in the top-level panel groups the commonly used options for configuring resources, resource groups, and application controllers. Other resource options that are not required in most typical configurations are under the Custom Cluster Configuration menu. They provide dialogs and options to perform the following tasks: Configure a custom disk, volume group, and file system methods for cluster resources Customize resource recovery and service IP label distribution policy Customize and event Most of the resources menus and dialogs are similar to their counterparts in earlier versions. For more information, see the existing documentation about the previous releases listed in “Related publications” on page 519. 5.1.2 Overview of the test environment The cluster used in the test environment is a mutual-takeover, dual-node implementation with two resource groups, one on each node. Figure 5-3 on page 69 shows the cluster configuration on top of the hardware infrastructure introduced in 4.1, “Hardware configuration of the test environment” on page 54. 
Figure 5-3 Mutual-takeover, dual-node cluster
By using this setup, we can present various aspects of a typical production implementation, such as topology redundancy or more complex resource configuration. As an example, we configure SAN-based heartbeating and introduce the new Start After and Stop After resource group dependencies.
5.1.3 Typical configuration of a cluster topology
This section explains step-by-step how to configure a basic PowerHA cluster topology using the typical cluster configuration path. For an example of using the custom cluster configuration path, see 5.1.4, "Custom configuration of the cluster topology" on page 78.
Prerequisite: Before reading this section, you must have configured all your networks and storage devices as explained in 3.2, "Hardware requirements" on page 44.
The /etc/cluster/rhosts file must be populated with all cluster IP addresses before using PowerHA SystemMirror. This step was done automatically in earlier versions, but is now a required, manual process. The addresses that you enter in this file must include the addresses that resolve to the host name of the cluster nodes. If you update this file, you must refresh the clcomd subsystem with the refresh -s clcomd command.
In previous releases of PowerHA, the host name was not required to resolve to an IP address. According to the PowerHA release notes, the host name is now required to resolve.
Important: Previous releases used the clcomdES subsystem, which read information from the /usr/es/sbin/cluster/etc/rhosts file. The clcomdES subsystem is no longer used. Therefore, you must configure the clcomd subsystem as explained in this section.
Also, ensure that you have one unused shared disk available for the cluster repository. Example 5-1 shows the lspv command output on the systems sydney and perth. The first part shows the output from the node sydney, and the second part shows the output from perth.
Example 5-1 lspv command output before configuring PowerHA
sydney:/ # lspv
hdisk0 00c1f170488a4626 rootvg active
hdisk1 00c1f170fd6b4d9d dbvg
hdisk2 00c1f170fd6b50a5 appvg
hdisk3 00c1f170fd6b5126 None
---------------------------------------------------------------------------
perth:/ # lspv
hdisk0 00c1f1707c6092fe rootvg active
hdisk1 00c1f170fd6b4d9d dbvg
hdisk2 00c1f170fd6b50a5 appvg
hdisk3 00c1f170fd6b5126 None
Node names: The sydney and perth node names have no implication on extended distance capabilities. The names have been used only for node names.
Defining a cluster
To define a cluster, follow these steps:
1. Use the smitty sysmirror or smitty hacmp fast path.
2. In the PowerHA SystemMirror menu (Figure 5-4), select the Cluster Nodes and Networks option.
PowerHA SystemMirror
Move cursor to desired item and press Enter.
Cluster Nodes and Networks
Cluster Applications and Resources
System Management (C-SPOC)
Problem Determination Tools
Custom Cluster Configuration
Can't find what you are looking for ?
Not sure where to start ?
Figure 5-4 Menu that is displayed after entering smitty sysmirror
3. In the Cluster Nodes and Networks menu (Figure 5-5), select the Initial Cluster Setup (Typical) option.
Cluster Nodes and Networks
Move cursor to desired item and press Enter.
Initial Cluster Setup (Typical) Manage the Cluster Manage Nodes Manage Networks and Network Interfaces Discover Network Interfaces and Disks Verify and Synchronize Cluster Configuration Figure 5-5 Cluster Nodes and Networks menu 4. In the Initial Cluster Setup (Typical) menu (Figure 5-6), select the Setup a Cluster, Nodes and Networks option. Initial Cluster Setup (Typical) Move cursor to desired item and press Enter. Setup a Cluster, Nodes and Networks Define Repository Disk and Cluster IP Address What are a repository disk and cluster IP address ? Figure 5-6 Initial cluster setup (typical) 5. From the Setup a Cluster, Nodes, and Networks panel (Figure 5-7 on page 72), complete the following steps: a. Specify the repository disk and the multicast IP address. The cluster name is based on the host name of the system. You can use this default or replace it with a name you want to use. In the text environment, the cluster is named australia. b. In the New Nodes field, define the IP label that you want to use to communicate to the other systems. In this example, we plan to build a two-node cluster where the two systems are named sydney and perth. If you want to create a cluster with more than two nodes, you can specify more than one system by using the F4 key. The advantage is that you do not get typographical errors, and you can verify that the /etc/hosts file contains your network addresses. The Currently Configured Node(s) field lists all the configured nodes or lists the host name of the system you are working on if nothing is configured so far. c. Press Enter. Chapter 5. Configuring a PowerHA cluster 71 Setup Cluster, Nodes and Networks (Typical) Type or select values in entry fields. Press Enter AFTER making all desired changes. * Cluster Name New Nodes (via selected communication paths) Currently Configured Node(s) [Entry Fields] [australia] [perth] sydney Figure 5-7 Setup a Cluster, Nodes and Networks panel The COMMAND STATUS panel (Figure 5-8) indicates that the cluster creation completed successfully. COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion, additional instructions may appear below. [TOP] Cluster Name: australia_cluster Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: None Cluster IP Address: There are 2 node(s) and 1 network(s) defined NODE perth: Network net_ether_01 perth 192.168.101.136 NODE sydney: Network net_ether_01 sydney 192.168.101.135 No resource groups defined clharvest_vg: Initializing.... Gathering cluster information, which may take a few minutes... clharvest_vg: Processing... Storing the following information in file /usr/es/sbin/cluster/etc/config/clvg_config perth: [MORE...93] Figure 5-8 Cluster creation completed successfully 72 IBM PowerHA SystemMirror 7.1 for AIX + If you receive an error message similar to the example in Figure 5-9, you might have missed a step. For example, you might not have added the host names to /etc/cluster/rhosts directory or forgot to use the refresh -s clcomd command. Alternatively, you might have to change the host name in the /etc/cluster/rhosts directory to a full domain-based host name. Reminder: After you change the /etc/cluster/rhosts directory, enter the refresh -s clcomd command. COMMAND STATUS Command: failed stdout: yes stderr: no Before command completion, additional instructions may appear below. Warning: There is no cluster found. 
cllsclstr: No cluster defined cllsclstr: Error reading configuration Figure 5-9 Failure to set up the initial cluster When you look in more detail at the output, you might notice that the system adds your entries to the cluster configuration and runs a discovery on the systems. You also get information about the discovered shared disks that are listed. Configuring the repository disk and cluster multicast IP address After you configure the cluster, configure the repository disk and the cluster multicast IP address. 1. Go back to the Initial Cluster Setup (Typical) panel (Figure 5-6 on page 71). You can use the path smitty sysmirror Cluster Nodes and Networks Initial Cluster Setup (Typical) or the smitty cm_setup_menu fast path. 2. In the Initial Cluster Setup (Typical) panel, select the Define Repository and Cluster IP Address option. Chapter 5. Configuring a PowerHA cluster 73 3. In the Define Repository and Cluster IP Address panel (Figure 5-10), complete these steps: a. Press the F4 key to select the disk that you want to use as the repository disk for CAA. As shown in Example 5-1 on page 70, only one unused shared disk, hdisk3, remains. b. Leave the Cluster IP Address field empty. The system generates an appropriate address for you. The cluster IP address is a multicast address that is used for internal cluster communication and monitoring. Specify an address manually only if you have an explicit reason to do so. For more information about the cluster multicast IP address, see “Requirements for the multicast IP address, SAN, and repository disk” on page 45. Multicast address not specified: If you did not specify a multicast address, you can see the one that AIX chose for you in the output of the cltopinfo command. c. Press Enter. Define Repository and Cluster IP Address Type or select values in entry fields. Press Enter AFTER making all desired changes. * Cluster Name * Repository Disk Cluster IP Address [Entry Fields] australia [None] [] + +--------------------------------------------------------------------------+ | Repository Disk | | | | Move cursor to desired item and press Enter. | | | | hdisk3 | | | | F1=Help F2=Refresh F3=Cancel | F1| F8=Image F10=Exit Enter=Do | F5| /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 5-10 Define Repository and Cluster IP Address panel 74 IBM PowerHA SystemMirror 7.1 for AIX Then the COMMAND STATUS panel (Figure 5-11) opens. COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion, additional instructions may appear below. [TOP] Cluster Name: australia Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: hdisk3 Cluster IP Address: There are 2 node(s) and 1 network(s) defined NODE perth: Network net_ether_01 perth 192.168.101.136 NODE sydney: Network net_ether_01 sydney 192.168.101.135 No resource groups defined Current cluster configuration: [BOTTOM] Figure 5-11 COMMAND STATUS showing OK for adding a repository disk This process only updates the information in the cluster configuration. If you use the lspv command on any nodes in the cluster, each node still shows the same output as listed in Example 5-1 on page 70. When the cluster is synchronized the first time, both the CAA cluster and repository disk are created. 
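After that first synchronization, you can quickly confirm that CAA has taken over the repository disk. The following commands are only a sketch of such a check, with illustrative output based on the disks in Example 5-1; on your systems the device names and cluster name can differ, and CAA might rename the repository disk (for example, to caa_private0, as seen in Example 5-4):

sydney:/ # lspv | grep caavg_private
caa_private0    00c1f170fd6b5126                    caavg_private   active
sydney:/ # lscluster -c
Cluster query for cluster australia returns:
...

If lspv does not show the caavg_private volume group on any disk, the cluster has not yet been synchronized successfully.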
Creating a cluster with host names in the FQDN format In the testing environments, we create working clusters with both short and fully qualified domain name (FQDN) host names. To use the FQDN, you must follow this guidance: The /etc/hosts file has the FQDN entry first, right after the IP address, and then the short host name as an alias for each label. In this case, the FQDN name is used by CAA because CAA always uses the host name for its node names, regardless of whether the host name is short or FQDN. Define the PowerHA node names with the short names because dots are not accepted as part of a node name. As long as the /etc/hosts file contains the FQDN entry first, and then the short name as an alias, the host name can be either FQDN or short in your configuration. As long as the /etc/hosts file contains the FQDN entry first, and then the short name as an alias, the /etc/cluster/rhosts file can contain only the short name. This file is only used for the first synchronization of the cluster, when the Object Data Manager (ODM) classes are still not populated with the communication paths for the nodes. The same Chapter 5. Configuring a PowerHA cluster 75 function as /usr/es/sbin/cluster/etc/rhosts file exists in previous PowerHA and HACMP versions. When you are defining the interfaces to PowerHA, choose either the short or long name from the pick lists in SMIT. PowerHA always uses the short name at the end. The same guidance applies for service or persistent addresses. Logical partition (LPAR) names continue to be the short ones, even if you use FQDN for host names. Example 5-2 shows a configuration that uses host names in the FQDN format. Example 5-2 Configuration using host names in the FQDN format seoul.itso.ibm.com:/ # clcmd cat /etc/hosts ------------------------------NODE seoul.itso.ibm.com ------------------------------127.0.0.1 loopback localhost # loopback (lo0) name/address ::1 loopback localhost # IPv6 loopback (lo0) name/address 192.168.101.143 seoul-b1.itso.ibm.com seoul-b1 # Base IP label 1 192.168.101.144 busan-b1.itso.ibm.com busan-b1 # Base IP label 1 192.168.201.143 seoul-b2.itso.ibm.com seoul-b2 # Base IP label 2 192.168.201.144 busan-b2.itso.ibm.com busan-b2 # Base IP label 2 10.168.101.43 seoul.itso.ibm.com seoul # Persistent IP 10.168.101.44 busan.itso.ibm.com busan # Persistent IP 10.168.101.143 poksap-db.itso.ibm.com poksap-db # Service IP label 10.168.101.144 poksap-en.itso.ibm.com poksap-en # Service IP label 10.168.101.145 poksap-er.itso.ibm.com poksap-er # Service IP label ------------------------------NODE busan.itso.ibm.com ------------------------------127.0.0.1 loopback localhost # loopback (lo0) name/address ::1 loopback localhost # IPv6 loopback (lo0) name/address 192.168.101.143 seoul-b1.itso.ibm.com seoul-b1 # Base IP label 1 192.168.101.144 busan-b1.itso.ibm.com busan-b1 # Base IP label 1 192.168.201.143 seoul-b2.itso.ibm.com seoul-b2 # Base IP label 2 192.168.201.144 busan-b2.itso.ibm.com busan-b2 # Base IP label 2 10.168.101.43 seoul.itso.ibm.com seoul # Persistent IP 10.168.101.44 busan.itso.ibm.com busan # Persistent IP 10.168.101.143 poksap-db.itso.ibm.com poksap-db # Service IP label 10.168.101.144 poksap-en.itso.ibm.com poksap-en # Service IP label 10.168.101.145 poksap-er.itso.ibm.com poksap-er # Service IP label seoul.itso.ibm.com:/ # clcmd hostname ------------------------------NODE seoul.itso.ibm.com ------------------------------seoul.itso.ibm.com ------------------------------NODE busan.itso.ibm.com 
------------------------------busan.itso.ibm.com seoul.itso.ibm.com:/ # clcmd cat /etc/cluster/rhosts ------------------------------NODE seoul.itso.ibm.com ------------------------------seoul busan ------------------------------NODE busan.itso.ibm.com ------------------------------seoul busan 76 IBM PowerHA SystemMirror 7.1 for AIX seoul.itso.ibm.com:/ # clcmd lsattr ------------------------------NODE seoul.itso.ibm.com ------------------------------authm 65536 bootup_option no gateway hostname seoul.itso.ibm.com rout6 route net,,0,192.168.100.60 ------------------------------NODE busan.itso.ibm.com ------------------------------authm 65536 bootup_option no gateway hostname busan.itso.ibm.com rout6 route net,,0,192.168.100.60 seoul.itso.ibm.com:/ Adapter Name Global Name busan-b1 boot 24 busan-b2 boot 24 poksap-er service 255.255.255.0 24 poksap-en service 255.255.255.0 24 poksap-db service 255.255.255.0 24 seoul-b1 boot 24 seoul-b2 boot 24 poksap-er service 255.255.255.0 24 poksap-en service 255.255.255.0 24 poksap-db service 255.255.255.0 24 -El inet0 Authentication Methods Use BSD-style Network Configuration Gateway Host Name IPv6 Route Route True True True True True True Authentication Methods Use BSD-style Network Configuration Gateway Host Name IPv6 Route Route True True True True True True # cllsif Type Network Netmask net_ether_01 ether net_ether_01 ether Net Type Attribute Node IP Address Alias for HB Prefix Length public busan 192.168.101.144 public busan Hardware Address Interface 192.168.201.144 en0 255.255.255.0 en2 255.255.255.0 net_ether_01 ether public busan 10.168.101.145 net_ether_01 ether public busan 10.168.101.144 net_ether_01 ether public net_ether_01 ether public seoul 192.168.101.143 en0 255.255.255.0 net_ether_01 ether public seoul 192.168.201.143 en2 255.255.255.0 net_ether_01 ether public seoul 10.168.101.145 net_ether_01 ether public seoul 10.168.101.144 net_ether_01 ether public seoul 10.168.101.143 busan 10.168.101.143 seoul.itso.ibm.com:/ # cllsnode Node busan Interfaces to network net_ether_01 Communication Interface: Name Communication Interface: Name Communication Interface: Name Communication Interface: Name Communication Interface: Name busan-b1, Attribute public, IP address 192.168.101.144 busan-b2, Attribute public, IP address 192.168.201.144 poksap-er, Attribute public, IP address 10.168.101.145 poksap-en, Attribute public, IP address 10.168.101.144 poksap-db, Attribute public, IP address 10.168.101.143 Node seoul Interfaces to network Communication Communication Communication Communication Communication seoul-b1, Attribute public, IP address 192.168.101.143 seoul-b2, Attribute public, IP address 192.168.201.143 poksap-er, Attribute public, IP address 10.168.101.145 poksap-en, Attribute public, IP address 10.168.101.144 poksap-db, Attribute public, IP address 10.168.101.143 net_ether_01 Interface: Name Interface: Name Interface: Name Interface: Name Interface: Name # LPAR names seoul.itso.ibm.com:/ # clcmd uname -n ------------------------------NODE seoul.itso.ibm.com Chapter 5. 
Configuring a PowerHA cluster 77 ------------------------------seoul ------------------------------NODE busan.itso.ibm.com ------------------------------busan seoul.itso.ibm.com:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------sapdb ONLINE seoul OFFLINE busan sapen ONLINE OFFLINE seoul busan saper ONLINE OFFLINE busan seoul # The output below shows that CAA always use the hostname for its node names # The Power HA nodenames are: seoul, busan seoul.itso.ibm.com:/ # lscluster -c Cluster query for cluster korea returns: Cluster uuid: 02d20290-d578-11df-871d-a24e50543103 Number of nodes in cluster = 2 Cluster id for node busan.itso.ibm.com is 1 Primary IP address for node busan.itso.ibm.com is 10.168.101.44 Cluster id for node seoul.itso.ibm.com is 2 Primary IP address for node seoul.itso.ibm.com is 10.168.101.43 Number of disks in cluster = 2 for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2 cluster_major = 0 cluster_minor = 2 for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb cluster_major = 0 cluster_minor = 1 Multicast address for cluster is 228.168.101.43 5.1.4 Custom configuration of the cluster topology For the custom configuration path example, we use the test environment from 4.1, “Hardware configuration of the test environment” on page 54. As a preliminary step, add the base IP aliases in /etc/cluster/rhosts file on each node and refresh the CAA clcomd daemon. Example 5-3 illustrates this step on the node sydney. Example 5-3 Populating the /etc/cluster/rhosts file sydney:/ # cat /etc/cluster/rhosts sydney perth sydneyb2 perthb2 sydney:/ # stopsrc -s clcomd;startsrc -s clcomd 0513-044 The clcomd Subsystem was requested to stop. 0513-059 The clcomd Subsystem has been started. Subsystem PID is 4980906. 78 IBM PowerHA SystemMirror 7.1 for AIX Performing a custom configuration To perform a custom configuration, follow these steps: 1. Access the Initial Cluster Setup (Custom) panel (Figure 5-12) by following the path smitty sysmirror Custom Cluster Configuration Cluster Nodes and Networks Initial Cluster Setup (Custom). This task shows how to use each option on this menu. Initial Cluster Setup (Custom) Move cursor to desired item and press Enter. Cluster Nodes Networks Network Interfaces Define Repository Disk and Cluster IP Address Figure 5-12 initial Cluster Setup (Custom) panel for a custom configuration 2. Define the cluster: a. From the Initial Cluster Setup (Custom) panel (Figure 5-12), follow the path Cluster Add/Change/Show a Cluster. b. In the Add/Change/Show a Cluster panel (Figure 5-13), define the cluster name, australia. Add/Change/Show a Cluster Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] [australia] * Cluster Name Figure 5-13 Adding a cluster 3. Add the nodes: a. From the Initial Cluster Setup (Custom) panel, select the path Nodes Add a Node, b. In the Add a Node panel (Figure 5-14), specify the first node, sydney, and the path that is taken to initiate communication with the node. The cluster Node Name might be different from the host name of the node. c. Add the second node, perth, in the same way as you did for the sydney node. Add a Node Type or select values in entry fields. Press Enter AFTER making all desired changes. * Node Name Communication Path to Node Entry Fields] [sydney] [sydney] + Figure 5-14 Add a Node panel Chapter 5. 
Configuring a PowerHA cluster 79 4. Add a network: a. From the Initial Cluster Setup (Custom) panel, follow the path Networks Add a Network. b. In the Add a Network panel (Figure 5-15), For Network Type, select ether. c. Define a PowerHA logical network, ether01, and specify its netmask. This logical network is later populated with the corresponding base and service IP labels. You can define more networks if needed. Add a Network Type or select values in entry fields. Press Enter AFTER making all desired changes. * Network Name * Network Type * Netmask(IPv4)/Prefix Length(IPv6) [Entry Fields] [ether01] ether [255.255.252.0] Figure 5-15 Add a Network panel 5. Add the network interfaces: a. From the Initial Cluster Setup (Custom) panel, follow the path Network Interfaces Add a Network Interface. b. Select the logical network and populate it with the appropriate interfaces. In the example shown in Figure 5-16, we select the only defined ether01 network, and add the interface sydneyb2 on the sydney node. Add in all the other interfaces in the same way. Tip: You might find it useful to remember the following points: The sydneyb1 and perthb1 addresses are defined in the same subnet network. The sydnetb2 and perthb2 addresses are defined in another subnet network. All interfaces must have the same network mask. Add a Network Interface Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * IP Label/Address Network Type Network Name Node Name Network Interface Figure 5-16 Add a Network Interface panel 80 IBM PowerHA SystemMirror 7.1 for AIX [Entry Fields] [sydneyb2] ether ether01 [sydney] [] + + 6. Define the repository disk and cluster IP address: a. From the Initial Cluster Setup (Custom) panel, select the Define Repository Disk and Cluster IP Address option. b. Choose the physical disk that is used as a central repository of the cluster configuration and specify the multicast IP address to be associated with this cluster. In the example shown in Figure 5-17, we let the cluster automatically generate a default value for the multicast IP address. Define Repository and Cluster IP Address Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] australia [hdisk1] + [] * Cluster Name * Repository Disk Cluster IP Address Figure 5-17 Define Repository Disk and Cluster IP Address panel Verifying and synchronizing the custom configuration With the cluster topology defined, you can verify and synchronize the cluster for the first time. When the first Verify and Synchronize Cluster Configuration action is successful, the underlying CAA cluster is activated, and the heartbeat messages begin. We use the customizable version of the Verify and Synchronize Cluster Configuration command. Figure 5-18 shows an example where the Automatically correct errors found during verification? option changed from the default value of No to Yes. PowerHA SystemMirror Verification and Synchronization Type or select values in entry fields. Press Enter AFTER making all desired changes. * Verify, Synchronize or Both * Include custom verification library checks * Automatically correct errors found during verification? * Force synchronization if verification fails? * Verify changes only? * Logging [Entry Fields] [Both] [Yes] [Yes] + + + [No] [No] [Standard] + + + Figure 5-18 Verifying and synchronizing the cluster configuration (advanced) Chapter 5. 
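If you prefer the command line, the clmgr tool that is described in 5.2, “Cluster configuration using the clmgr tool”, also exposes verify and sync actions for the cluster object class. The following lines are only a sketch based on the action and class lists in 5.2.1 and 5.2.2; see the clmgr man page in Appendix D, “The clmgr man page” on page 501, for the exact syntax at your level:

perth:/ # clmgr verify cluster
perth:/ # clmgr sync cluster

The SMIT dialog in Figure 5-18 performs both operations in a single pass when the Verify, Synchronize or Both field is set to Both.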
Configuring a PowerHA cluster 81 Upon successful synchronization, check the PowerHA topology and the CAA cluster configuration by using cltopinfo and lscluster -c commands on any node. Example 5-4 shows usage of the PowerHA cltopinfo command. It also shows how the topology configured on the node sydney looks on the node perth after synchronization. Example 5-4 PowerHA cluster topology perth:/ # cltopinfo Cluster Name: australia Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: caa_private0 Cluster IP Address: There are 2 node(s) and 1 network(s) defined NODE perth: Network ether01 perthb2 192.168.201.136 perth 192.168.101.136 NODE sydney: Network ether01 sydneyb2 192.168.201.135 sydney 192.168.101.135 No resource groups defined Example 5-5 shows a summary configuration of the CAA cluster created during the synchronization phase. Example 5-5 CAA cluster summary configuration perth:/ # lscluster -c Cluster query for cluster australia returns: Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a Number of nodes in cluster = 2 Cluster id for node perth is 1 Primary IP address for node perth is 192.168.101.136 Cluster id for node sydney is 2 Primary IP address for node sydney is 192.168.101.135 Number of disks in cluster = 0 Multicast address for cluster is 228.168.101.135 For more details about the CAA cluster status, see the following section. Initial CAA cluster status Check the status of the CAA cluster by using lscluster command. As shown in Example 5-6, the lscluster -m command lists the node and point-of-contact status information. A point-of-contact status indicates that a node has received communication packets across this interface from another node. Example 5-6 CAA cluster node status sydney:/ # lscluster -m Calling node query for all nodes Node query number of nodes examined: 2 82 IBM PowerHA SystemMirror 7.1 for AIX Node name: perth Cluster shorthand id for node: 1 uuid for node: 15bef17c-cbcf-11df-951c-00145e5e3182 State of node: UP Smoothed rtt to node: 7 Mean Deviation in network rtt to node: 3 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID australia local 98f28ffa-cfde-11df-9a82-00145ec5bf9a Number of points_of_contact for node: 3 Point-of-contact interface & contact state sfwcom UP en2 UP en1 UP -----------------------------Node name: sydney Cluster shorthand id for node: 2 uuid for node: f6a81944-cbce-11df-87b6-00145ec5bf9a State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID australia local 98f28ffa-cfde-11df-9a82-00145ec5bf9a Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a sydney:/ # Example 5-7 shows detailed interface information provided by the lscluster -i command. It shows information about the network interfaces and the other two logical interfaces that are used for cluster communication: sfwcom dpcom The node connection to the SAN-based communication channel. The node connection to the repository disk. 
Example 5-7 CAA cluster interface status sydney:/ # lscluster -i Network/Storage Interface Query Cluster Name: australia Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a Number of nodes reporting = 2 Number of nodes expected = 2 Node sydney Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a Number of interfaces discovered = 4 Chapter 5. Configuring a PowerHA cluster 83 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9a Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 5 Probe interval for interface = 120 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9b Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 5 Probe interval for interface = 120 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 0 Mean Deviation in network rrt across interface = 0 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node perth Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182 Number of interfaces discovered = 4 84 IBM PowerHA SystemMirror 7.1 for AIX Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d9 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d8 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask 255.255.252.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 
3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 0 Mean Deviation in network rrt across interface = 0 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Chapter 5. Configuring a PowerHA cluster 85 5.1.5 Configuring resources and applications This section continues to build up the cluster by configuring its resources, resource groups, and application controllers. The goal is to prepare the setup that is needed to introduce the new Start After and Stop After resource group dependencies in PowerHA 7.1. For a configuration example for these dependencies, see 5.1.6, “Configuring Start After and Stop After resource group dependencies” on page 96. Adding storage resources and resource groups from C-SPOC To add storage resources and resource groups form C-SPOC, follow these steps: 1. Use the smitty cl_lvm fast path or follow the path smitty sysmirror System Management (C-SPOC) Storage to configure storage resources. 2. Create two volume groups, dbvg and appvg. In the Storage panel (Figure 5-19), select the path Volume Groups Create a Volume Group (smitty cl_createvg fast path). Storage Move cursor to desired item and press Enter. Volume Groups Logical Volumes File Systems Physical Volumes Figure 5-19 C-SPOC storage panel The Volume Groups option is the preferred method for creating a volume group, because it is automatically configured on all of the selected nodes. Since the release of PowerHA 6.1, most operations on volume groups, logical volumes, and file systems no longer require these objects to be in a resource group. Smart menus check for configuration and state problems and prevent invalid operations before they can be initiated. 86 IBM PowerHA SystemMirror 7.1 for AIX 3. In the Volume Groups panel, in the Node Names dialog (Figure 5-20), select the nodes for configuring the volume groups. Volume Groups Move cursor to desired item and press Enter. List All Volume Groups Create a Volume Group Create a Volume Group with Data Path Devices Set Characteristics of a Volume Group Enable a Volume Group for Fast Disk Takeover or Concurrent Access •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• • Node Names • • • • Move cursor to desired item and press Esc+7. • • ONE OR MORE items can be selected. • • Press Enter AFTER making all selections. • • • • > perth • • > sydney • • • • F1=Help F2=Refresh F3=Cancel • • Esc+7=Select Esc+8=Image Esc+0=Exit • F1• Enter=Do /=Find n=Find Next • Es•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Figure 5-20 Nodes selection Chapter 5. Configuring a PowerHA cluster 87 In the Volume Groups panel (Figure 5-21), only the physical shared disks that are accessible on the selected nodes are displayed (Physical Volume Names menu). 4. In the Physical Volume Names menu (inset in Figure 5-21), select the volume group type. Volume Groups Move cursor to desired item and press Enter. 
List All Volume Groups Create a Volume Group Create a Volume Group with Data Path Devices Set Characteristics of a Volume Group Enable a Volume Group for Fast Disk Takeover or Concurrent Access •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• • Physical Volume Names • • • • Move cursor to desired item and press Esc+7. • • ONE OR MORE items can be selected. • • Press Enter AFTER making all selections. • • • • 00c1f170674f3d6b ( hdisk1 on all selected nodes ) • • 00c1f1706751bc0d ( hdisk2 on all selected nodes ) • • • • F1=Help F2=Refresh F3=Cancel • • Esc+7=Select Esc+8=Image Esc+0=Exit • F1• Enter=Do /=Find n=Find Next • Es•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Figure 5-21 Shared disk selection PVID: This step automatically creates physical volume IDs (PVIDs) for the unused (no PVID) shared disks. A shared disk might have different names on selected nodes, but the PVID is the same. 88 IBM PowerHA SystemMirror 7.1 for AIX 5. In the Create a Volume Group panel (Figure 5-22), specify the volume group name and the resource group name. Use the Resource Group Name field to include the volume group into an existing resource group or automatically create a resource group to hold this volume group. After the resource group is created, synchronize the configuration for this change to take effect across the cluster. Create a Volume Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] Node Names Resource Group Name + PVID VOLUME GROUP name Physical partition SIZE in megabytes Volume group MAJOR NUMBER Enable Cross-Site LVM Mirroring Verification Enable Fast Disk Takeover or Concurrent Access Volume Group Type CRITICAL volume group? [Entry Fields] perth,sydney [dbrg] 00c1f170674f3d6b [dbvg] 4 [37] false Fast Disk Takeover Original no + # + + + Figure 5-22 Creating a volume group in C-SPOC Chapter 5. Configuring a PowerHA cluster 89 6. Leave the resource group field empty and create or associate the resource group later. When a volume group is known on multiple nodes, it is displayed in pick lists as <Not in a Resource Group>. Figure 5-23 shows an example of a pick list. Logical Volumes Move cursor to desired item and press Enter. List All Logical Volumes by Volume Group Add a Logical Volume Show Characteristics of a Logical Volume Set Characteristics of a Logical Volume Change a Logical Volume Remove a Logical Volume •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• • Select the Volume Group that will hold the new Logical Volume • • • • Move cursor to desired item and press Enter. • • • • #Volume Group Resource Group Node List • • appvg <Not in a Resource Group> perth,sydney • • caavg_private <Not in a Resource Group> perth,sydney • • dbvg dbrg perth,sydney • • • • F1=Help F2=Refresh F3=Cancel • • Esc+8=Image Esc+0=Exit Enter=Do • F1• /=Find n=Find Next • Es•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Figure 5-23 Adding a logical volume in C-SPOC 7. In the C-SPOC Storage panel (Figure 5-19 on page 86), define the logical volumes and file systems by selecting the Logical Volumes and File Systems options. The intermediate and final panels for these actions are similar to those panels in previous releases. You can list the file systems that you created by following the path C-SPOC Storage File Systems List All File Systems by Volume Group. The COMMAND STATUS panel (Figure 5-24) shows the list of file systems for this example. 
COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion, additional instructions may appear below. #File System /appmp /clrepos_private1 /clrepos_private2 /dbmp Volume Group appvg caavg_private caavg_private dbvg Figure 5-24 Listing of file systems in C-SPOC 90 IBM PowerHA SystemMirror 7.1 for AIX Resource Group <None> <None> <None> dbrg Node List sydney,perth sydney,perth sydney,perth sydney,perth Resources and resource groups By following the path smitty sysmirror Cluster Applications and Resources, you see the Cluster Applications and Resources menu (Figure 5-25) for resources and resource group management. Cluster Applications and Resources Move cursor to desired item and press Enter. Make Applications Highly Available (Use Smart Assists) Resources Resource Groups Verify and Synchronize Cluster Configuration Figure 5-25 Cluster Applications and Resources menu Smart Assists: The “Make Applications Highly Available (Use Smart Assists)” function leads to a menu of all installed Smart Assists. If you do not see the Smart Assist that you need, verify that the corresponding Smart Assist file set is installed. Configuring application controllers To configure the application controllers, follow these steps: 1. From the Cluster Applications and Resources menu, select Resources. 2. In the Resources menu (Figure 5-26), select the Configure User Applications (Scripts and Monitors) option to configure the application scripts. Alternatively, use the smitty cm_user_apps fast path or smitty sysmirror Cluster Applications and Resources Resources Configure User Applications (Scripts and Monitors). Resources Move cursor to desired item and press Enter. Configure User Applications (Scripts and Monitors) Configure Service IP Labels/Addresses Configure Tape Resources Verify and Synchronize Cluster Configuration Figure 5-26 Resources menu Chapter 5. Configuring a PowerHA cluster 91 3. In the Configure User Applications (Scripts and Monitors) panel (Figure 5-27), select the Application Controller Scripts option. Configure User Applications (Scripts and Monitors) Move cursor to desired item and press Enter. Application Controller Scripts Application Monitors Configure Application for Dynamic LPAR and CoD Resources Show Cluster Applications Figure 5-27 Configure user applications (scripts and monitors) 4. In the Application Controller Scripts panel (Figure 5-28), select the Add Application Controller Scripts option. Application Controller Scripts Move cursor to desired item and press Enter. Add Application Controller Scripts Change/Show Application Controller Scripts Remove Application Controller Scripts What is an "Application Controller" anyway ? Figure 5-28 Application controller scripts 92 IBM PowerHA SystemMirror 7.1 for AIX 5. In the Add Application Controller Scripts panel (Figure 5-29), which looks similar to the panels in previous versions, follow these steps: a. In the Application Controller Name field, type the name that you want use as a label for your application. In this example, we use the name dbac. b. As in previous versions, in the Start Script field, provide the location of your application start script. c. In the Stop Script field, specify the location of your stop script. In this example, we specify /HA71/db_start.sh as the start script and /HA71/db_stop.sh as the stop script. d. Optional: To monitor your application, in the Application Monitor Name(s) field, select one or more application monitors. 
However, you must define the application monitors before you can use them here. For an example, see “Configuring application monitoring for the target resource group” on page 98. Add Application Controller Scripts Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] [dbac] [/HA71/db_start.sh] [/HA71/db_stop.sh] * Application Controller Name * Start Script * Stop Script Application Monitor Name(s) + Figure 5-29 Adding application controller scripts The configuration of the applications is completed. The next step is to configure the service IP addresses. Configuring IP service addresses To configure the IP service addresses, follow these steps: 1. Return to the Resource panel (Figure 5-26 on page 91) by using the smitty cm_resources_menu fast path or smitty sysmirror Cluster Applications and Resources Resources. 2. In the Resource panel, select the Configure Service IP Labels/Addresses option. 3. In the Configure Service IP Labels/Addresses menu (Figure 5-30), select the Add a Service IP Label/Address option. Configure Service IP Labels/Addresses Move cursor to desired item and press Enter. Add a Service IP Label/Address Change/ Show a Service IP Label/Address Remove Service IP Label(s)/Address(es) Configure Service IP Label/Address Distribution Preferences Figure 5-30 Configure Service IP Labels/Addresses menu Chapter 5. Configuring a PowerHA cluster 93 4. In the Network Name subpanel (Figure 5-31), select the network to which you want to add the Service IP Address. In this example, only one network is defined. Configure Service IP Labels/Addresses Move cursor to desired item and press Enter. Add a Service IP Label/Address Change/ Show a Service IP Label/Address Remove Service IP Label(s)/Address(es) Configure Service IP Label/Address Distribution Preferences +--------------------------------------------------------------------------+ | Network Name | | | | Move cursor to desired item and press Enter. | | | | ether01 (192.168.100.0/22 192.168.200.0/22) | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | F1| /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 5-31 Network Name subpanel for the Add a Service IP Label/Address option 5. In the Add a Service IP Label/Address panel, which changes as shown in Figure 5-32, in the IP Label/Address field, select the service address that you want to add. Service address defined: As in previous versions, the service address must be defined in the /etc/hosts file. Otherwise, you cannot select it by using the F4 key. You can use the Netmask(IPv4)/Prefix Length(IPv6) field to define the netmask. With IPv4, you can leave this field empty. The Network Name field is prefilled. Add a Service IP Label/Address Type or select values in entry fields. Press Enter AFTER making all desired changes. * IP Label/Address Netmask(IPv4)/Prefix Length(IPv6) * Network Name [Entry Fields] sydneys [] ether01 + Figure 5-32 Details of the Add a Service IP Label/Address panel You have now finished configuring the resources. In this example, you defined one service IP address. If you need to add more service IP addresses, repeat the steps as indicated in this section. As explained in the following section, the next step is to configure the resource groups. 94 IBM PowerHA SystemMirror 7.1 for AIX Configuring resource groups To configure the resource groups, follow these steps: 1. Go to the Cluster Applications and Resources panel (Figure 5-25 on page 91). 
Alternatively, use the smitty cm_apps_resources fast path or smitty sysmirror Cluster Applications and Resources. 2. In the Cluster Applications and Resources panel, select Resource Groups. 3. In the Resource Groups menu (Figure 5-33), add a resource group by selecting the Add a Resource Group option. Resource Groups Move cursor to desired item and press Enter. Add a Resource Group Change/Show Nodes and Policies for a Resource Group Change/Show Resources and Attributes for a Resource Group Remove a Resource Group Configure Resource Group Run-Time Policies Show All Resources by Node or Resource Group Verify and Synchronize Cluster Configuration What is a "Resource Group" anyway ? Figure 5-33 Resource Groups menu 4. In the Add a Resource Group panel (Figure 5-34), as in previous versions of PowerHA, specify the resource group name, the participating nodes, and the policies. Add a Resource Group Type or select values in entry fields. Press Enter AFTER making all desired changes. * Resource Group Name * Participating Nodes (Default Node Priority) Startup Policy Fallover Policy Fallback Policy [Entry Fields] [dbrg] [sydney perth] + Online On Home Node O> + Fallover To Next Prio> + Fallback To Higher Pr> + Figure 5-34 Add a Resource Group panel Chapter 5. Configuring a PowerHA cluster 95 5. Configure the resources into the resource group. If you need more than one resource group, repeat the previous step to add a resource group. a. To configure the resources to the resource group, go back to the Resource Groups panel (Figure 5-33 on page 95), and select the Change/Show Resources and Attributes for a Resource Group. b. In the Change/Show Resources and Attributes for a Resource Group panel (Figure 5-35), define the resources for the resource group. Change/Show All Resources and Attributes for a Resource Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] Resource Group Name Participating Nodes (Default Node Priority) [Entry Fields] dbrg sydney perth Startup Policy Fallover Policy Fallback Policy Fallback Timer Policy (empty is immediate) Online On Home Node O> Fallover To Next Prio> Fallback To Higher Pr> [] + Service IP Labels/Addresses Application Controllers [sydneys] [dbac] + + [dbvg] false + + Volume Groups Use forced varyon of volume groups, if necessary [MORE...24] Figure 5-35 Change/Show All Resources and Attributes for a Resource Group panel You have now finished configuring the resource group. Next, you synchronize the cluster nodes. If the Verify and Synchronize Cluster Configuration task is successfully completed, you can start your cluster. However, you might first want to see if the CAA cluster was successfully created by using the lscluster -c command. 5.1.6 Configuring Start After and Stop After resource group dependencies In this section, you configure a Start After resource group dependency and similarly create a Stop After resource group dependency. For more information about Start After and Stop After resource group dependencies, see 2.5.1, “Start After and Stop After resource group dependencies” on page 32. 96 IBM PowerHA SystemMirror 7.1 for AIX You can manage Start After dependencies between resource groups by following the path smitty sysmirror Cluster Applications and Resources Resource Groups Configure Resource Group Run-Time Policies Configure Dependencies between Resource Groups Configure Start After Resource Group Dependency. Figure 5-36 shows the Configure Start After Resource Group Dependency menu. 
Configure Start After Resource Group Dependency Move cursor to desired item and press Enter. Add Start After Resource Group Dependency Change/Show Start After Resource Group Dependency Remove Start After Resource Group Dependency Display Start After Resource Group Dependencies Figure 5-36 Configuring Start After Resource Group dependency menu To add a new dependency, in the Configure Start After Resource Group Dependency menu, select the Add Start After Resource Group Dependency option. In this example, we already configured the dbrg and apprg resource groups. The apprg resource group is defined as the source (dependent) resource group as shown in Figure 5-37. Configure Start After Resource Group Dependency Move cursor to desired item and press Enter. Add Start After Resource Group Dependency Change/Show Start After Resource Group Dependency Remove Start After Resource Group Dependency Display Start After Resource Group Dependencies •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• • Select the Source Resource Group • • • • Move cursor to desired item and press Enter. • • • • apprg • • dbrg • • • • F1=Help F2=Refresh F3=Cancel • • Esc+8=Image Esc+0=Exit Enter=Do • F1• /=Find n=Find Next • Es•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Figure 5-37 Selecting the source resource group of a Start After dependency Chapter 5. Configuring a PowerHA cluster 97 Figure 5-38 shows dbrg resource group defined as the target resource group. Configure Start After Resource Group Dependency Move cursor to desired item and press Enter. Add Start After Resource Group Dependency Change/Show Start After Resource Group Dependency Remove Start After Resource Group Dependency Display Start After Resource Group Dependencies •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• • Select the Target Resource Group • • • • Move cursor to desired item and press Esc+7. • • ONE OR MORE items can be selected. • • Press Enter AFTER making all selections. • • • • dbrg • • • • F1=Help F2=Refresh F3=Cancel • • Esc+7=Select Esc+8=Image Esc+0=Exit • F1• Enter=Do /=Find n=Find Next • Es•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• Figure 5-38 Selecting the target resource group of a Start After dependency Example 5-8 shows the result. Example 5-8 Start After dependency configured sydney:/ # clrgdependency -t'START_AFTER' -sl #Source Target apprg dbrg sydney:/ # Configuring application monitoring for the target resource group The Start After dependency guarantees that only the source resource group is started after the target resource group is started. You might need the application in your source resource group (source startup script) to start only after a full and successful start of the application in your target resource group (after target startup script returns 0). In this case, you must configure the startup monitoring for your target application. The dummy scripts in Example 5-9 show the configuration of the test cluster. 
Example 5-9 Dummy scripts for target and source applications sydney:/HA71 # ls -l total 48 -rwxr--r-1 root -rwxr--r-1 root -rwxr--r-1 root -rwxr--r-1 root -rwxr--r-1 root -rwxr--r-1 root 98 IBM PowerHA SystemMirror 7.1 for AIX system system system system system system 226 283 233 201 274 229 Oct Oct Oct Oct Oct Oct 12 12 12 12 12 12 07:00 07:06 07:03 06:03 07:24 06:04 app_mon.sh app_start.sh app_stop.sh db_mon.sh db_start.sh db_stop.sh The remainder of this task continues from the configuration started in “Configuring application controllers” on page 91. You only have to add a monitor for the dbac application controller that you already configured. Follow the path smitty sysmirror Cluster Applications and Resources Resources Configure User Applications (Scripts and Monitors) Add Custom Application Monitor. The Add Custom Application Monitor panel (Figure 5-39) is displayed. We do not explain the fields here because they are the same as the fields in previous versions. However, keep in mind that the Monitor Mode value Both means both startup monitoring and long-running monitoring. Add Custom Application Monitor Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] [dbam] dbac [Both] [/HA71/db_mon.sh] [30] [] [120] [3] [] [fallover] [/HA71/db_stop.sh] [/HA71/db_start.sh] [] * * * * Monitor Name Application Controller(s) to Monitor Monitor Mode Monitor Method Monitor Interval Hung Monitor Signal * Stabilization Interval * Restart Count Restart Interval * Action on Application Failure Notify Method Cleanup Method Restart Method + + # # # # # + Figure 5-39 Adding the dbam custom application monitor Chapter 5. Configuring a PowerHA cluster 99 Similarly, you can configure an application monitor and an application controller for the apprg resource group as shown in Figure 5-40. Change/Show Custom Application Monitor Type or select values in entry fields. Press Enter AFTER making all desired changes. * Monitor Name Application Controller(s) to Monitor * Monitor Mode * Monitor Method Monitor Interval Hung Monitor Signal * Stabilization Interval Restart Count Restart Interval * Action on Application Failure Notify Method Cleanup Method Restart Method [Entry Fields] appam appac + [Long-running monitori> + [/HA71/app_mon.sh] [30] # [9] # [15] # [3] # [594] # [fallover] + [] [/HA71/app_stop.sh] [/HA71/app_start.sh] Figure 5-40 Configuring the appam application monitor and appac application controller For a series of tests performed on this configuration, see 9.8, “Testing a Start After resource group dependency” on page 297. 5.1.7 Creating a user-defined resource type Now create a user-defined resource type by using SMIT: 1. To define a user-defined resource type, follow the path smitty sysmirror Custom Cluster Configuration Resources Configure User Defined Resources and Types Add a User Defined Resource Type. Resource type management: PowerHA SystemMirror automatically manages most resource types. 100 IBM PowerHA SystemMirror 7.1 for AIX 2. In the Add a User Defined Resource Type panel (Figure 5-41), define a resource type. Also select the processing order from the pick list. Add a User Defined Resource Type Type or select values in entry fields. Press Enter AFTER making all desired changes. 
[Entry Fields] * Resource Type Name [my_resource_type] * Processing order [] + Verification Method [] Verification Type [Script] + Start Method [] Stop Method [] +--------------------------------------------------------------------------+ ¦ Processing order ¦ ¦ ¦ ¦ Move cursor to desired item and press Enter. ¦ ¦ ¦ ¦ FIRST ¦ ¦ WPAR ¦ ¦ VOLUME_GROUP ¦ ¦ FILE_SYSTEM ¦ ¦ SERVICEIP ¦ ¦ TAPE ¦ ¦ APPLICATION ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦ F1¦ Esc+8=Image Esc+0=Exit Enter=Do ¦ Es¦ /=Find n=Find Next ¦ Es+--------------------------------------------------------------------------+ Figure 5-41 Adding a user-defined resource type 3. After you create your own resource, add it to the resource group. The resource group can be shown in the pick list. This information is stored in the HACMresourcetype, HACMPudres_def, and HACMPudresouce cluster configuration files. Chapter 5. Configuring a PowerHA cluster 101 5.1.8 Configuring the dynamic node priority (adaptive failover) As mentioned in 2.5.3, “Dynamic node priority: Adaptive failover” on page 35, in PowerHA 7.1, you can decide node priority based on the return value of your own script. To configure the dynamic node priority (DNP), follow these steps: 1. Follow the path smitty sysmirror Cluster Applications and Resource Resource Groups Change/Show Resources and Attributes for a Resource Group (if you already have your resource group). As you can see in Change/Show Resources and Attributes for a Resource Group panel (Figure 5-42), the algeria_rg resource group has default node priority. The participating nodes are algeria, brazil, and usa. 2. To configure DNP, choose the dynamic node priority policy. In this example, we chose cl_lowest_nonzero_udscript_rc as the dynamic node priority. Usage of this DNP means the node that has the lowest return from the DNP script gets the highest priority among the nodes. Also define the DNP script path and timeout value. DNP script for the nodes: Ensure that all nodes have the DNP script and that the script has executable mode. Otherwise, you receive an error message while running the synchronization or verification process. For a description of this test scenario, see 9.9, “Testing dynamic node priority” on page 302. Change/Show All Resources and Attributes for a Resource Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] Resource Group Name Participating Nodes (Default Node Priority) * Dynamic Node Priority Policy DNP Script path DNP Script timeout value [Entry Fields] algeria_rg algeria brazil usa [cl_lowest_nonzero_uds> + <HTTPServer/bin/DNP.sh] / [20] # Startup Policy Fallover Policy Fallback Policy Fallback Timer Policy (empty is immediate) Online On Home Node O> Fallover Using Dynami> Fallback To Higher Pr> [] + [MORE...11] F1=Help Esc+5=Reset Esc+9=Shell F2=Refresh Esc+6=Command Esc+0=Exit Figure 5-42 Configuring DNP in a SMIT session 102 IBM PowerHA SystemMirror 7.1 for AIX F3=Cancel Esc+7=Edit Enter=Do F4=List Esc+8=Image 5.1.9 Removing a cluster You can remove your cluster by using the path smitty sysmirror Cluster Nodes and Networks Manage the Cluster Remove the Cluster Definition. Removing a cluster consists of deleting the PowerHA definition and deleting the CAA cluster from AIX. Removing the CAA cluster is the last step of the Remove operation as shown in Figure 5-43. COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion, additional instructions may appear below. Attempting to delete node "Perth" from the cluster... 
Attempting to delete the local node from the cluster... Attempting to delete the cluster from AIX ... F1=Help Esc+8=Image n=Find Next F2=Refresh Esc+9=Shell F3=Cancel Esc+0=Exit Esc+6=Command /=Find Figure 5-43 Removing the cluster Normally, deleting the cluster with this method removes both the PowerHA SystemMirror and the CAA cluster definitions from the system. If a problem is encountered while PowerHA is trying to remove the CAA cluster, you might need to delete the CAA cluster manually. For more information, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305. After you remove the cluster, ensure that the caavg_private volume group is no longer displayed as shown in Figure 5-44. --- before --# lspv caa_private0 hdisk6 hdisk7 hdisk0 --- after --# lspv hdisk5 hdisk6 hdisk7 hdisk0 000fe40120e16405 000fe4114cf8d258 000fe4114cf8d2ec 000fe411201305c3 caavg_private dbvg applvg rootvg 000fe40120e16405 000fe4114cf8d258 000fe4114cf8d2ec 000fe411201305c3 None dbvg applvg rootvg active active active Figure 5-44 The lspv command output before and after removing a cluster Chapter 5. Configuring a PowerHA cluster 103 5.2 Cluster configuration using the clmgr tool PowerHA 7.1 introduces the clmgr command-line tool. This tool is partially new. It is based on the clvt tool with the following improvements: Consistent usage across the supported functions Improved usability Improved serviceability Uses fully globalized message catalog Multiple levels of debugging Automatic help To see the possible values for the attributes, use the man clvt command. 5.2.1 The clmgr action commands The following actions are currently supported in the clmgr command: add delete manage modify move offline online query recover sync view For a list of actions, you can use clmgr command with no arguments. See “The clmgr command” on page 106 and Example 5-10 on page 106. Most of the actions in the list provide aliases. Table 5-1 shows the current actions and their abbreviations and aliases. Table 5-1 Command aliases 104 Actual Synonyms or aliases add a, create, make, mk query q, ls, get modify mod, ch, set delete de, rem, rm er online on, start offline off, stop move mov, mv recover rec sync sy verify ve view vi, cat IBM PowerHA SystemMirror 7.1 for AIX Actual Synonyms or aliases manage mg 5.2.2 The clmgr object classes The following object classes are currently supported: application_controller application_monitor cluster dependency fallback_timer file_collection file_system (incomplete coverage) interface logical_volume (incomplete coverage) method (incomplete coverage) network node persistent_ip physical_volume (incomplete coverage) report resource_group service_ip snapshot site tape volume_group (incomplete coverage) For a list, you can use clmgr with no arguments. See “The clmgr query command” on page 107 and Example 5-11 on page 107. Most of these object classes in the list provide aliases. Table 5-2 on page 105 lists the current object classes and their abbreviations and aliases. Table 5-2 Object classes with aliases Actual Minimum string Cluster cl site si node no interface in, if network ne, nw resource_group rg service_ip se persistent_ip pe, pi application_controller ac, app, appctl Chapter 5. Configuring a PowerHA cluster 105 5.2.3 Examples of using the clmgr command This section provides information about some of the clmgr commands. An advantage of the clmgr command compared to the clvt command is that it is not case-sensitive. 
For more details about the clmgr command, see Appendix D, “The clmgr man page” on page 501. For a list of the actions that are currently supported, see 5.2.1, “The clmgr action commands” on page 104. For a list, you can use clmgr command with no arguments. See “The clmgr command” on page 106 and Example 5-10 on page 106. For a list of object classes that are currently supported, see 5.2.2, “The clmgr object classes” on page 105. For a list, use the clmgr command with no arguments. See “The clmgr query command” on page 107 and Example 5-11 on page 107. For most of these actions and object classes, abbreviations and aliases are available. These commands are not case-sensitive. You can find more details about the actions and their aliases in “The clmgr action commands” on page 104. For more information about object classes, see “The clmgr object classes” on page 105. Error messages: At the time of writing, the clmgr error messages referred to clvt. This issue will be fixed in a future release so that it references clmgr. The clmgr command Running the clmgr command with no arguments or with the -h option shows the operations that you can perform. Example 5-10 shows the output that you see just by using the clmgr command. You see similar output if you use the -h option. The difference between the clmgr and clmgr -h commands is that, in the output of the clmgr -h command, the line with the error message is missing. For more details about the -h option, see “Using help in clmgr” on page 111. Example 5-10 Output of the clmgr command with no arguments # clmgr ERROR: an invalid operation was requested: clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \ [-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>] \ [-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n> ...] clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \ [-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] -M " <ACTION> <CLASS> [<NAME>] [<ATTR#1>=<VALUE#1> <ATTR#n>=<VALUE#n> ...] . . ." ACTION={add|modify|delete|query|online|offline|...} CLASS={cluster|site|node|network|resource_group|...} clmgr {-h|-?} [-v] clmgr [-v] help 106 IBM PowerHA SystemMirror 7.1 for AIX # Available actions for clvt: add delete help manage modify move offline online query recover sync verify view # The clmgr query command Running the clmgr command with only the query argument generates a list of the supported object classes as shown in Example 5-11. You see similar output if you use the -h option. The difference between the clmgr query and clmgr query -h commands is that, in the output of the clmgr query -h command, the lines with the object class names are indented. For more details about the -h option, see “Using help in clmgr” on page 111. Example 5-11 Output of the clmgr query command # clmgr query # Available classes for clvt action "query": application_controller application_monitor cluster dependency fallback_timer file_collection file_system interface log logical_volume method network node persistent_ip physical_volume resource_group service_ip site smart_assist snapshot tape volume_group # Chapter 5. Configuring a PowerHA cluster 107 The clmgr query cluster command You use the clmgr query cluster command to obtain detailed information about your cluster. Example 5-12 show the output from the cluster used in the test environment. 
Example 5-12 Output of the clmgr query cluster command # clmgr query cluster CLUSTER_NAME="hacmp29_cluster" CLUSTER_ID="1126895238" STATE="STABLE" VERSION="7.1.0.1" VERSION_NUMBER="12" EDITION="STANDARD" CLUSTER_IP="" REPOSITORY="caa_private0" SHARED_DISKS="" UNSYNCED_CHANGES="false" SECURITY="Standard" FC_SYNC_INTERVAL="10" RG_SETTLING_TIME="0" RG_DIST_POLICY="node" MAX_EVENT_TIME="180" MAX_RG_PROCESSING_TIME="180" SITE_POLICY_FAILURE_ACTION="fallover" SITE_POLICY_NOTIFY_METHOD="" DAILY_VERIFICATION="Enabled" VERIFICATION_NODE="Default" VERIFICATION_HOUR="0" VERIFICATION_DEBUGGING="Enabled" LEVEL="" ALGORITHM="" GRACE_PERIOD="" REFRESH="" MECHANISM="" CERTIFICATE="" PRIVATE_KEY="" # As mentioned previously, most clmgr actions and object classes provide aliases. Another helpful feature of the clmgr command is the ability to understand abbreviated commands. For example, the previous command can be shortened as follows: # clmgr q cl For more details about the capability of the clmgr command, see 5.2.1, “The clmgr action commands” on page 104, and 5.2.2, “The clmgr object classes” on page 105. See also the man pages listed in Appendix D, “The clmgr man page” on page 501. 108 IBM PowerHA SystemMirror 7.1 for AIX The enhanced search capability An additional feature of the clmgr command is that it provides an easy search capability with the query action. Example 5-13 shows how to list all the defined resource groups. Example 5-13 List of all defined resource groups # clmgr query rg rg1 rg2 rg3 rg4 rg5 rg6 # You can also use more complex search expressions. Example 5-14 shows how you can use simple regular expression command. In addition, you can search on more than one field, and only those objects that match all provided searches are displayed. Example 5-14 Simple regular expression command # clmgr query rg name=rg[123] rg1 rg2 rg3 # The -a option Some query commands produce a rather long output. You can use the -a (attributes) option to obtain shorter output and for information about a single value as shown in Example 5-15. You can also use this option to get information about several values as shown in Example 5-16. Example 5-15 List state of the cluster node munich:/ # clmgr -a state query cluster STATE="STABLE" munich:/ # Example 5-16 shows how to get information about the state and the location of a resource group. The full output of the query command for the nfsrg resource group is shown in Example 5-31 on page 123. Example 5-16 List state and location of a resource group munich:/ # clmgr -a STATE,Current query rg nfsrg STATE="ONLINE" CURRENT_NODE="berlin" munich:/ # Chapter 5. Configuring a PowerHA cluster 109 You can also use wildcards for getting information about some values as shown in Example 5-17. Example 5-17 The -a option and wildcards munich:/ # clmgr -a "*NODE*" query rg nfsrg CURRENT_NODE="berlin" NODES="berlin munich" PRIMARYNODES="" PRIMARYNODES_STATE="" SECONDARYNODES="" SECONDARYNODES_STATE="" NODE_PRIORITY_POLICY="default" munich:/ # The -v option The -v (verbose) option is helpful when used with the query action as shown in Example 5-18. You use this option almost exclusively in IBM Systems Director to scan the cluster for information. Example 5-18 The -v option for query all resource groups munich:/ # clmgr -a STATE,current -v STATE="ONLINE" CURRENT_NODE="munich" query rg STATE="ONLINE" CURRENT_NODE="berlin" munich:/ # If you do not use the -v option with the query action as shown in Example 5-18, you see an error message similar to the one in Example 5-19. 
Example 5-19 Error message when not using the -v option for query all resource groups munich:/ # clmgr -a STATE,current query rg ERROR: a name/label must be provided. munich:/ # Returning only one value You might want only one value returned for a clmgr command. This requirement mainly happens if you prefer to use the clmgr command in a script and do not like to get the ATTR="VALUE" format. You only need the VALUE. Example 5-20 shows how you can ensure that only one value is returned. The command has the following syntax: clmgr -cSa <ATTR> query <CLASS> <OBJECT> Example 5-20 The command to return a single value from the clmgr command # clmgr -cSa state query rg rg1 ONLINE # 110 IBM PowerHA SystemMirror 7.1 for AIX 5.2.4 Using help in clmgr You can use the -h option in combination with actions and object classes. For example, if you want to know how to add a resource group to an existing cluster, you can use the clmgr add resource_group -h command. Example 5-21 shows the output of using this command. For an example of using the clmgr add resource_group command, see Example 5-28 on page 121. Example 5-21 Help for adding resource group using the clmgr command # clmgr add resource_group -h # Available options for "clvt add resource_group": <RESOURCE_GROUP_NAME> NODES PRIMARYNODES SECONDARYNODES FALLOVER FALLBACK STARTUP FALLBACK_AT SERVICE_LABEL APPLICATIONS VOLUME_GROUP FORCED_VARYON VG_AUTO_IMPORT FILESYSTEM FSCHECK_TOOL RECOVERY_METHOD FS_BEFORE_IPADDR EXPORT_FILESYSTEM EXPORT_FILESYSTEM_V4 MOUNT_FILESYSTEM STABLE_STORAGE_PATH WPAR_NAME NFS_NETWORK SHARED_TAPE_RESOURCES DISK AIX_FAST_CONNECT_SERVICES COMMUNICATION_LINKS WLM_PRIMARY WLM_SECONDARY MISC_DATA CONCURRENT_VOLUME_GROUP NODE_PRIORITY_POLICY NODE_PRIORITY_POLICY_SCRIPT NODE_PRIORITY_POLICY_TIMEOUT SITE_POLICY # Object class names between the angle brackets (<>) are required information, which does not mean that all the other items are optional. Some items might not be marked because of other dependencies. In Example 5-22 on page 112, only CLUSTER_NAME is listed as required, but because of the new CAA dependency, the REPOSITORY (disk) is also required. For more details about how to create a cluster using the clmgr command, see “Configuring a new cluster using the clmgr command” on page 113. Chapter 5. Configuring a PowerHA cluster 111 Example 5-22 Help for creating a cluster # clmgr add cluster -h # Available options for "clvt add cluster": <CLUSTER_NAME> FC_SYNC_INTERVAL NODES REPOSITORY SHARED_DISKS CLUSTER_IP RG_SETTLING_TIME RG_DIST_POLICY MAX_EVENT_TIME MAX_RG_PROCESSING_TIME SITE_POLICY_FAILURE_ACTION SITE_POLICY_NOTIFY_METHOD DAILY_VERIFICATION VERIFICATION_NODE VERIFICATION_HOUR VERIFICATION_DEBUGGING 5.2.5 Configuring a PowerHA cluster using the clmgr command In this section, you configure the two-node mutual takeover cluster with a focus on the PowerHA configuration only. The system names are munich and berlin. This task does not include the preliminary steps, which include setting up the IP interfaces and the shared disks. For details and an example of the output, see “Starting the cluster using the clmgr command” on page 127. All the steps in the referenced section were executed on the munich system. To configure a PowerHA cluster by using the clmgr command, follow these steps: 1. Configure the cluster: # clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4 For details, see “Configuring a new cluster using the clmgr command” on page 113. 2. 
Configure the service IP addresses: # clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0 # clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0 For details, see “Defining the service address using the clmgr command” on page 118. 3. Configure the application server: # clmgr add application_controller http_app \ > STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \ > STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop" For details, see “Defining the application server using the clmgr command” on page 120. 4. Configure a resource group: # clmgr add resource_group httprg VOLUME_GROUP=httpvg NODES=munich,berlin \ > SERVICE_LABEL=alleman APPLICATIONS=http_app # > > > 112 clmgr add resource_group nfsrg VOLUME_GROUP=nfsvg NODES=berlin,munich \ SERVICE_LABEL=german FALLBACK=NFB RECOVERY_METHOD=parallel \ FS_BEFORE_IPADDR=true EXPORT_FILESYSTEM="/nfsdir" \ MOUNT_FILESYSTEM="/sap;/nfsdir" IBM PowerHA SystemMirror 7.1 for AIX For details, see “Defining the resource group using the clmgr command” on page 120. 5. Sync the cluster: clmgr sync cluster For details, see “Synchronizing the cluster definitions by using the clmgr command” on page 124. 6. Start the cluster: clmgr online cluster start_cluster BROADCAST=false CLINFO=true For details, see “Starting the cluster using the clmgr command” on page 127. Command and syntax of clmgr: To ensure a robust and easy-to-use SMIT interface, when using the clmgr command or CLI to configure or manage the PowerHA cluster, you must use the correct command and syntax. Configuring a new cluster using the clmgr command Creating a cluster using the clmgr command is similar to using the typical configuration through SMIT (described in 5.1.3, “Typical configuration of a cluster topology” on page 69). If you want a method that is similar to the custom configuration in SMIT, you must use a combination of the classical PowerHA commands and the clmgr command. The steps in the following sections use the clmgr command only. Preliminary setup Prerequisite: In this section, you must know how to set up the prerequisites for a PowerHA cluster. The IP interfaces are already defined and the shared volume groups and file systems have been created. The host names of the two systems are munich and berlin. Figure 5-45 shows the disks and shared volume groups that are defined so far. hdisk4 is used as the CAA repository disk. munich:/ # lspv hdisk1 00c0f6a012446137 hdisk2 00c0f6a01245190c hdisk3 00c0f6a012673312 hdisk4 00c0f6a01c784107 hdisk0 00c0f6a07c5df729 munich:/ # httpvg httpvg nfsvg None rootvg active Figure 5-45 List of available disks Chapter 5. Configuring a PowerHA cluster 113 Figure 5-46 shows the network interfaces that are defined on the munich system. munich:/ # netstat -i Name Mtu Network Address en0 1500 link#2 a2.4e.58.a0.41.3 en0 1500 192.168.100 munich en1 1500 link#3 a2.4e.58.a0.41.4 en1 1500 100.168.200 munichb1 en2 1500 link#4 a2.4e.58.a0.41.5 en2 1500 100.168.220 munichb2 lo0 16896 link#1 lo0 16896 127 localhost.locald lo0 16896 localhost6.localdomain6 munich:/ # Ipkts Ierrs 23992 0 23992 0 2 0 2 0 4324 0 4324 0 16039 0 16039 0 16039 0 Opkts Oerrs Coll 24516 0 0 24516 0 0 7 0 0 7 0 0 7 0 0 7 0 0 16039 0 0 16039 0 0 16039 0 0 Figure 5-46 Defined network interfaces Creating the cluster To begin, define the cluster along with the repository disk. If you do not remember all the options for creating a cluster with the clmgr command, use the clmgr add cluster -h command. 
Example 5-22 on page 112 shows the output of this command. Before you use the clmgr add cluster command, you must know which disk will be used for the CAA repository disk. Example 5-23 shows the command and its output. Table 5-3 provides more details about the command and arguments that are used. Table 5-3 Creating a cluster using the clmgr command Action, object class, or argument Value used Comment add Basic preferred action cluster Basic object class used CLUSTER_NAME de_cluster Optional argument name, but required value NODES munich, berlin Preferred node to use in the cluster REPOSITORY hdisk4 The disk for the CAA repository Example 5-23 Creating a cluster using the clmgr command munich:/ # clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4 Cluster Name: de_cluster Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: None Cluster IP Address: There are 2 node(s) and 2 network(s) defined NODE berlin: Network net_ether_01 berlinb2 100.168.220.141 berlinb1 100.168.200.141 Network net_ether_010 114 IBM PowerHA SystemMirror 7.1 for AIX berlin 192.168.101.141 NODE munich: Network net_ether_01 munichb2 100.168.220.142 munichb1 100.168.200.142 Network net_ether_010 munich 192.168.101.142 No resource groups defined clharvest_vg: Initializing.... Gathering cluster information, which may take a few minutes... clharvest_vg: Processing... Storing the following information in file /usr/es/sbin/cluster/etc/config/clvg_config berlin: Hdisk: hdisk1 PVID: 00c0f6a012446137 VGname: httpvg VGmajor: 100 Conc-capable: Yes VGactive: No Quorum-required:Yes Hdisk: hdisk2 PVID: 00c0f6a01245190c VGname: httpvg VGmajor: 100 Conc-capable: Yes VGactive: No Quorum-required:Yes munich: Hdisk: hdisk1 PVID: 00c0f6a012446137 VGname: httpvg VGmajor: 100 Conc-capable: Yes VGactive: No Quorum-required:Yes berlin: Hdisk: hdisk3 PVID: 00c0f6a012673312 VGname: nfsvg VGmajor: 200 Conc-capable: Yes VGactive: No Quorum-required:Yes munich: Hdisk: hdisk2 Chapter 5. Configuring a PowerHA cluster 115 PVID: 00c0f6a01245190c VGname: httpvg VGmajor: 100 Conc-capable: Yes VGactive: No Quorum-required:Yes berlin: Hdisk: hdisk4 PVID: 00c0f6a01c784107 VGname: None VGmajor: 0 Conc-capable: No VGactive: No Quorum-required:No munich: Hdisk: hdisk3 PVID: 00c0f6a012673312 VGname: nfsvg VGmajor: 200 Conc-capable: Yes VGactive: No Quorum-required:Yes berlin: Hdisk: hdisk0 PVID: 00c0f6a048cf8bfd VGname: rootvg VGmajor: 10 Conc-capable: No VGactive: Yes Quorum-required:Yes FREEMAJORS: 35..99,101..199,201... munich: Hdisk: hdisk4 PVID: 00c0f6a01c784107 VGname: None VGmajor: 0 Conc-capable: No VGactive: No Quorum-required:No Hdisk: hdisk0 PVID: 00c0f6a07c5df729 VGname: rootvg VGmajor: 10 Conc-capable: No VGactive: Yes Quorum-required:Yes FREEMAJORS: 35..99,101..199,201... 116 IBM PowerHA SystemMirror 7.1 for AIX Cluster Name: de_cluster Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: hdisk4 Cluster IP Address: There are 2 node(s) and 2 network(s) defined NODE berlin: Network net_ether_01 berlinb2 100.168.220.141 berlinb1 100.168.200.141 Network net_ether_010 berlin 192.168.101.141 NODE munich: Network net_ether_01 munichb1 100.168.200.142 munichb2 100.168.220.142 Network net_ether_010 munich 192.168.101.142 No resource groups defined Warning: There is no cluster found. 
cllsclstr: No cluster defined cllsclstr: Error reading configuration Communication path berlin discovered a new node. Hostname is berlin. Adding it to the configuration with Nodename berlin. Communication path munich discovered a new node. Hostname is munich. Adding it to the configuration with Nodename munich. Discovering IP Network Connectivity Discovered [10] interfaces IP Network Discovery completed normally Current cluster configuration: Discovering Volume Group Configuration Current cluster configuration: munich:/ # Chapter 5. Configuring a PowerHA cluster 117 To see the configuration up to this point, you can use the cltopinfo command. Keep in mind that this information is local to the system on which you are working. Example 5-24 shows the configuration up to this point. Example 5-24 Output of the cltopinfo command after creating cluster definitions munich:/ # cltopinfo Cluster Name: de_cluster Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: hdisk4 Cluster IP Address: There are 2 node(s) and 2 network(s) defined NODE berlin: Network net_ether_01 berlinb2 100.168.220.141 berlinb1 100.168.200.141 Network net_ether_010 berlin 192.168.101.141 NODE munich: Network net_ether_01 munichb1 100.168.200.142 munichb2 100.168.220.142 Network net_ether_010 munich 192.168.101.142 No resource groups defined munich:/ # Defining the service address using the clmgr command Next you define the service addresses. Example 5-25 on page 119 shows the command and its output. The clmgr add cluster command: The clmgr add cluster command automatically runs discovery on IP and volume group harvesting. It results in adding the IP network interfaces automatically to the cluster configuration. Table 5-4 provides more details about the command and arguments that are used. Table 5-4 Defining the service address using the clmgr command Action, object class, or argument 118 Value used Comment add Basic preferred action service_ip Basic object class used SERVICE_IP_NAME alleman german Optional argument name, but required value NETWORK net_ether_01 The network name from the cltopinfo command used previously IBM PowerHA SystemMirror 7.1 for AIX Action, object class, or argument Value used Comment NETMASK 255.255.255.0 Optional; when you specify a value, use the same one that you used in setting up the interface. Example 5-25 Defining the service address munich:/ # clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0 munich:/ # clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0 munich:/ # To check the configuration up to this point, use the cltopinfo command again. Example 5-26 shows the current configuration. 
Example 5-26 The cltopinfo output after creating cluster definitions munich:/ # cltopinfo Cluster Name: de_cluster Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: hdisk4 Cluster IP Address: There are 2 node(s) and 2 network(s) defined NODE berlin: Network net_ether_01 german 10.168.101.141 alleman 10.168.101.142 berlinb1 100.168.200.141 berlinb2 100.168.220.141 Network net_ether_010 berlin 192.168.101.141 NODE munich: Network net_ether_01 german 10.168.101.141 alleman 10.168.101.142 munichb1 100.168.200.142 munichb2 100.168.220.142 Network net_ether_010 munich 192.168.101.142 No resource groups defined munich:/ # Chapter 5. Configuring a PowerHA cluster 119 Defining the application server using the clmgr command Next you define the application server. Example 5-27 shows the command and its output. The application is named server http_app. Table 5-5 provides more details about the command and arguments that are used. Table 5-5 Defining the application server using the clmgr command Action, object class, or argument Value used Comment add Basic preferred action application_controller Basic object class used APPLICATION_SERVER_NAME http_app Optional argument name, but required value STARTSCRIPT "/usr/IBM/HTTPServer/bin/ apachectl -k start" The start script used for the application STOPSCRIPT "/usr/IBM/HTTPServer/bin/ apachectl -k stop" The stop script used for the application Example 5-27 Defining the service address munich:/ # munich:/ # clmgr add application_controller http_app \ > STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \ > STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop" munich:/ # Defining the resource group using the clmgr command Next you define the resource groups. Example 5-28 on page 121 shows the commands and their output. Compared to the smit functions, by using the clmgr command, you create a resource group and its resources in one step. Therefore, you must ensure that you have defined all the service IP addresses and your application servers. Two resource groups have been created. The first one uses only the items needed for this resource group (httprg), so that the system used the default values for the remaining arguments. Table 5-6 provides more details about the command and arguments that are used. Table 5-6 Defining the resource groups using the clmgr (httprg) command action, object class, or argument 120 Value used comment add Basic preferred action. resource_group Basic object class used. RESOURCE_GROUP_NAME httprg Optional argument name, but required value. VOLUME_GROUP httpvg The volume group used for this resource group. NODES munich,berlin The sequence of the nodes is important. The first node is the primary node. IBM PowerHA SystemMirror 7.1 for AIX action, object class, or argument Value used comment SERVICE_LABEL alleman The service address used for this resource group. APPLICATIONS http_app The application server label created in a previous step. For the second resource group in the test environment, we specified more details because we did not want to use the default values (nfsrg). Table 5-7 provides more details about the command and arguments that we used. Table 5-7 Defining the resource groups using the clmgr (nfsrg) command Action, object class, or argument Value used Comment add Basic preferred action. resource_group Basic object class used. 
RESOURCE_GROUP_NAME httprg Optional argument name, but required value. VOLUME_GROUP nfsvg The volume group use for this resource group. NODES berlin,munich The sequence of the nodes is important. The first node is the primary node. SERVICE_LABEL german The service address used for this resource group. FALLBACK NFB Never Fall Back (NFB) preferred for this resource group (the default is FBHPN). RECOVERY_METHOD parallel Parallel preferred as the recovery method for this resource group. (The default is sequential.) FS_BEFORE_IPADDR true Because we want to define an NFS cross mount, we must use the value true here. (The default is false.) EXPORT_FILESYSTEM /nfsdir The file system for NFS to export. MOUNT_FILESYSTEM "/sap;/nfsdir" Requires the same syntax because we used it in smit to define the NFS cross mount. Example 5-28 shows the commands that are used to define the resource groups listed in Table 5-6 on page 120 and Table 5-7. Example 5-28 Defining the resource groups munich:/ # clmgr add resource_group httprg VOLUME_GROUP=httpvg \ > NODES=munich,berlin SERVICE_LABEL=alleman APPLICATIONS=http_app Auto Discover/Import of Volume Groups was set to true. Gathering cluster information, which may take a few minutes. munich:/ # munich:/ # clmgr add resource_group nfsrg VOLUME_GROUP=nfsvg \ > NODES=berlin,munich SERVICE_LABEL=german FALLBACK=NFB \ > RECOVERY_METHOD=parallel FS_BEFORE_IPADDR=true EXPORT_FILESYSTEM="/nfsdir" \ > MOUNT_FILESYSTEM="/sap;/nfsdir" Chapter 5. Configuring a PowerHA cluster 121 Auto Discover/Import of Volume Groups was set to true. Gathering cluster information, which may take a few minutes. munich:/ # To see the configuration up to this point, use the clmgr query command. Example 5-29 shows how to check which resource groups you defined. Example 5-29 Listing the defined resource groups using the clmgr command munich:/ # clmgr query rg httprg nfsrg munich:/ # Next, you can see the content that you created for the resource groups. Example 5-30 shows the content of the httprg. As discussed previously, the default values for this resource group were used as much as possible. Example 5-30 Contents listing of httprg munich:/ # clmgr query rg httprg NAME="httprg" STATE="UNKNOWN" CURRENT_NODE="" NODES="munich berlin" PRIMARYNODES="" PRIMARYNODES_STATE="UNKNOWN" SECONDARYNODES="" SECONDARYNODES_STATE="UNKNOWN" TYPE="" APPLICATIONS="http_app" STARTUP="OHN" FALLOVER="FNPN" FALLBACK="FBHPN" NODE_PRIORITY_POLICY="default" SITE_POLICY="ignore" DISK="" VOLUME_GROUP="httpvg" CONCURRENT_VOLUME_GROUP="" FORCED_VARYON="false" FILESYSTEM="" FSCHECK_TOOL="fsck" RECOVERY_METHOD="sequential" EXPORT_FILESYSTEM="" SHARED_TAPE_RESOURCES="" AIX_CONNECTIONS_SERVICES="" AIX_FAST_CONNECT_SERVICES="" COMMUNICATION_LINKS="" MOUNT_FILESYSTEM="" SERVICE_LABEL="alleman" MISC_DATA="" SSA_DISK_FENCING="false" VG_AUTO_IMPORT="false" INACTIVE_TAKEOVER="false" CASCADE_WO_FALLBACK="false" 122 IBM PowerHA SystemMirror 7.1 for AIX FS_BEFORE_IPADDR="false" NFS_NETWORK="" MOUNT_ALL_FS="true" WLM_PRIMARY="" WLM_SECONDARY="" FALLBACK_AT="" RELATIONSHIP="" SRELATIONSHIP="ignore" GMD_REP_RESOURCE="" PPRC_REP_RESOURCE="" ERCMF_REP_RESOURCE="" SRDF_REP_RESOURCE="" TRUCOPY_REP_RESOURCE="" SVCPPRC_REP_RESOURCE="" GMVG_REP_RESOURCE="" EXPORT_FILESYSTEM_V4="" STABLE_STORAGE_PATH="" WPAR_NAME="" VARYON_WITH_MISSING_UPDATES="true" DATA_DIVERGENCE_RECOVERY="ignore" munich:/ # Now you can see the content that was created for the resource groups. Example 5-31 shows the content of the nfsrg resource group. 
Example 5-31 List the content of nfsrg resource group munich:/ # clmgr query rg nfsrg NAME="nfsrg" STATE="UNKNOWN" CURRENT_NODE="" NODES="berlin munich" PRIMARYNODES="" PRIMARYNODES_STATE="UNKNOWN" SECONDARYNODES="" SECONDARYNODES_STATE="UNKNOWN" TYPE="" APPLICATIONS="" STARTUP="OHN" FALLOVER="FNPN" FALLBACK="NFB" NODE_PRIORITY_POLICY="default" SITE_POLICY="ignore" DISK="" VOLUME_GROUP="nfsvg" CONCURRENT_VOLUME_GROUP="" FORCED_VARYON="false" FILESYSTEM="" FSCHECK_TOOL="fsck" RECOVERY_METHOD="parallel" EXPORT_FILESYSTEM="/nfsdir" SHARED_TAPE_RESOURCES="" AIX_CONNECTIONS_SERVICES="" AIX_FAST_CONNECT_SERVICES="" COMMUNICATION_LINKS="" MOUNT_FILESYSTEM="/sap;/nfsdir" Chapter 5. Configuring a PowerHA cluster 123 SERVICE_LABEL="german" MISC_DATA="" SSA_DISK_FENCING="false" VG_AUTO_IMPORT="false" INACTIVE_TAKEOVER="false" CASCADE_WO_FALLBACK="false" FS_BEFORE_IPADDR="true" NFS_NETWORK="" MOUNT_ALL_FS="true" WLM_PRIMARY="" WLM_SECONDARY="" FALLBACK_AT="" RELATIONSHIP="" SRELATIONSHIP="ignore" GMD_REP_RESOURCE="" PPRC_REP_RESOURCE="" ERCMF_REP_RESOURCE="" SRDF_REP_RESOURCE="" TRUCOPY_REP_RESOURCE="" SVCPPRC_REP_RESOURCE="" GMVG_REP_RESOURCE="" EXPORT_FILESYSTEM_V4="" STABLE_STORAGE_PATH="" WPAR_NAME="" VARYON_WITH_MISSING_UPDATES="true" DATA_DIVERGENCE_RECOVERY="ignore" munich:/ # Synchronizing the cluster definitions by using the clmgr command After you create all topology and resource information, synchronize the cluster. Verifying and propagating the changes: After using the clmgr command to modify the cluster configuration, enter the clmgr verify cluster and clmgr sync cluster commands to verify and propagate the changes to all nodes. Example 5-32 shows usage of the clmgr sync cluster command to synchronize the cluster and the command output. Example 5-32 Synchronizing the cluster using the clmgr sync cluster command munich:/ # clmgr sync cluster Verification to be performed on the following: Cluster Topology Cluster Resources Retrieving data from available cluster nodes. This could take a few minutes. Start data collection on node berlin Start data collection on node munich Waiting on node berlin data collection, 15 seconds elapsed Waiting on node munich data collection, 15 seconds elapsed Collector on node berlin completed 124 IBM PowerHA SystemMirror 7.1 for AIX Collector on node munich completed Data collection complete Verifying Cluster Topology... Completed 10 percent of the verification checks berlin munich net_ether_010 net_ether_010 Completed 20 percent of the verification checks Completed 30 percent of the verification checks Verifying Cluster Resources... Completed 40 percent of the verification checks http_app Completed Completed Completed Completed Completed Completed httprg 50 percent of the verification checks 60 percent of the verification checks 70 percent of the verification checks 80 percent of the verification checks 90 percent of the verification checks 100 percent of the verification checks Remember to redo automatic error notification if configuration has changed. Verification has completed normally. Committing any changes, as required, to all available nodes... Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and /etc/rc.net for IP Address Takeover on node munich. Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and /etc/rc.net for IP Address Takeover on node berlin. Verification has completed normally. 
WARNING: Multiple communication interfaces are recommended for networks that use IP aliasing in order to prevent the communication interface from becoming a single point of failure. There are fewer than the recommended number of communication interfaces defined on the following node(s) for the given network(s): Node: Network: ------------------------------------------------------------------WARNING: Not all cluster nodes have the same set of HACMP filesets installed. The following is a list of fileset(s) missing, and the node where the fileset is missing: Fileset: -------------------------------- Node: -------------------------------- Chapter 5. Configuring a PowerHA cluster 125 WARNING: There are IP labels known to HACMP and not listed in file /usr/es/sbin/cluster/etc/clhosts.client on node: berlin. Clverify can automatically populate this file to be used on a client node, if executed in auto-corrective mode. WARNING: There are IP labels known to HACMP and not listed in file /usr/es/sbin/cluster/etc/clhosts.client on node: munich. Clverify can automatically populate this file to be used on a client node, if executed in auto-corrective mode. WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on during HACMP startup on the following nodes: berlin munich WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on during HACMP startup on the following nodes: berlin munich WARNING: Application monitors are required for detecting application failures in order for HACMP to recover from them. Application monitors are started by HACMP when the resource group in which they participate is activated. The following application(s), shown with their associated resource group, do not have an application monitor configured: Application Server Resource Group -------------------------------- --------------------------------WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are not fully enabled on this node. Grace periods must be enabled before NFSv4 stable storage can be used. HACMP will attempt to fix this opportunistically when acquiring NFS resources on this node however the change won't take effect until the next time that nfsd is started. If this warning persists, the administrator should perform the following steps to enable grace periods on 126 IBM PowerHA SystemMirror 7.1 for AIX munich at the next planned downtime: 1. stopsrc -s nfsd 2. smitty nfsgrcperiod 3. startsrc -s nfsd munich:/ # When the migration finishes successfully, the CAA repository disk is now defined. Figure 5-47 shows the disks before the cluster synchronization, which are the same as those shown in Figure 5-45 on page 113. munich:/ # lspv hdisk1 00c0f6a012446137 hdisk2 00c0f6a01245190c hdisk3 00c0f6a012673312 hdisk4 00c0f6a01c784107 hdisk0 00c0f6a07c5df729 munich:/ # httpvg httpvg nfsvg None rootvg active Figure 5-47 List of available disks before sync Figure 5-48 shows the output of the lspv command after the synchronization. In our example, hdisk4 is now converted into a CAA repository disk and is listed as caa_private0. munich:/ # lspv hdisk1 hdisk2 hdisk3 caa_private0 hdisk0 munich:/ # 00c0f6a012446137 00c0f6a01245190c 00c0f6a012673312 00c0f6a01c784107 00c0f6a07c5df729 httpvg httpvg nfsvg caavg_private rootvg active active Figure 5-48 List of available disks after using the cluster sync command Starting the cluster using the clmgr command To determine whether the cluster is configured correctly, test the cluster. To begin, start the cluster nodes. 
Example 5-33 show the command that we used and some of the output from using this command. To start the clinfo command, we used the CLINFO=true argument. We did not want a broadcast message. Therefore, we also defined the BROADCAST=false argument. Example 5-33 Starting the cluster by using the clmgr command munich:/ # clmgr online cluster start_cluster BROADCAST=false CLINFO=true Warning: "WHEN" must be specified. Since it was not, a default of "now" will be used. Warning: "MANAGE" must be specified. Since it was not, a default of "auto" will be used. Chapter 5. Configuring a PowerHA cluster 127 /usr/es/sbin/cluster/diag/cl_ver_alias_topology[42] [[ high = high ]] --- skipped lines --- /usr/es/sbin/cluster/diag/cl_ver_alias_topology[335] return 0 WARNING: Multiple communication interfaces are recommended for networks that use IP aliasing in order to prevent the communication interface from becoming a single point of failure. There are fewer than the recommended number of communication interfaces defined on the following node(s) for the given network(s): Node: ---------------------------------berlin munich Network: ---------------------------------net_ether_010 net_ether_010 WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on during HACMP startup on the following nodes: munich WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on during HACMP startup on the following nodes: munich WARNING: Application monitors are required for detecting application failures in order for HACMP to recover from them. Application monitors are started by HACMP when the resource group in which they participate is activated. The following application(s), shown with their associated resource group, do not have an application monitor configured: Application Server Resource Group -------------------------------- --------------------------------http_app httprg /usr/es/sbin/cluster/diag/clwpardata[23] [[ high == high ]] --- skipped lines --- /usr/es/sbin/cluster/diag/clwpardata[325] exit 0 WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are not fully enabled on this node. Grace periods must be enabled before NFSv4 stable storage can be used. HACMP will attempt to fix this opportunistically when acquiring NFS resources on this node however the change won't take effect until the next time that nfsd is started. If this warning persists, the administrator should perform the following steps to enable grace periods on munich at the next planned downtime: 1. stopsrc -s nfsd 2. smitty nfsgrcperiod 3. startsrc -s nfsd berlin: start_cluster: Starting PowerHA SystemMirror 128 IBM PowerHA SystemMirror 7.1 for AIX berlin: berlin: berlin: berlin: munich: munich: munich: munich: munich: 2359456 - 0:09 syslogd Setting routerevalidate to 1 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is 10682520. 0513-059 The clinfoES Subsystem has been started. Subsystem PID is 10027062. start_cluster: Starting PowerHA SystemMirror 3408044 - 0:07 syslogd Setting routerevalidate to 1 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is 5505122. 0513-059 The clinfoES Subsystem has been started. Subsystem PID is 6029442. The cluster is now online. munich:/ # Starting all nodes in a cluster: The clmgr online cluster start_cluster command starts all nodes in a cluster by default. Example 5-49 shows that all nodes are now up and running. 
clstat - HACMP Cluster Status Monitor ------------------------------------Cluster: de_cluster (1126819374) Wed Oct 13 17:27:30 EDT 2010 State: UP SubState: STABLE Nodes: 2 Node: berlin State: UP Interface: berlinb1 (0) Interface: berlinb2 (0) Interface: berlin (1) Interface: german (0) Address: State: Address: State: Address: State: Address: State: 100.168.200.141 UP 100.168.220.141 UP 192.168.101.141 UP 10.168.101.141 UP State: On line Resource Group: nfsrg Node: munich State: UP Interface: munichb1 (0) Interface: munichb2 (0) Interface: munich (1) Interface: alleman (0) Resource Group: httprg Address: State: Address: State: Address: State: Address: State: 100.168.200.142 UP 100.168.220.142 UP 192.168.101.142 UP 10.168.101.142 UP State: On line ************************ f/forward, b/back, r/refresh, q/quit ***************** Figure 5-49 Output of the clstat -a command showing that all nodes are running Chapter 5. Configuring a PowerHA cluster 129 5.2.6 Alternative output formats for the clmgr command All of the previous examples use the ATTR="VALUE" format. However, two other formats are supported. One format is colon-delimited (by using -c). The other format is simple XML (by using -x). Colon-delimited format When using the colon-delimited output format (-c), you can use the -S option to silence or eliminate the header line. Example 5-34 shows the colon-delimited output format. Example 5-34 The colon-delimited output format # clmgr query ac appctl1 NAME="appctl1" MONITORS="" STARTSCRIPT="/bin/hostname" STOPSCRIPT="/bin/hostname" # clmgr -c query ac appctl1 # NAME:MONITORS:STARTSCRIPT:STOPSCRIPT appctl1::/bin/hostname:/bin/hostname # clmgr -cS query ac appctl1 appctl1::/bin/hostname:/bin/hostname Simple XML format Example 5-35 shows the simple XML-based output format. Example 5-35 Simple XML-based output format # clmgr -x query ac appctl1 <APPLICATION_CONTROLLERS> <APPLICATION_CONTROLLER> <NAME>appctl1</NAME> <MONITORS></MONITORS> <STARTSCRIPT>/bin/hostname</STARTSCRIPT> <STOPSCRIPT>/bin/hostname</STOPSCRIPT> </APPLICATION_CONTROLLER> </APPLICATION_CONTROLLERS> 5.2.7 Log file of the clmgr command The traditional PowerHA practice of setting VERBOSE_LOGGING to produce debug output is supported with the clmgr command. You can also set VERBOSE_LOGGING on a per-run basis with the clmgr -l command. The -l flag has the following options: 130 low Typically of interest to support personnel; shows simple function entry and exit. med Typically of interest to support personnel; shows the same information as the low option, but includes input parameters and return codes. high The recommended setting for customer use; turns on set -x in scripts (equivalent to VERBOSE_LOGGING=high) but leaves out internal utility functions. IBM PowerHA SystemMirror 7.1 for AIX max Turns on everything that the high option does and omits nothing. Is likely to make debugging more difficult because of the volume of output that is produced. Attention: The max value might have a negative impact on performance. The main log file for clmgr debugging is the /var/hacmp/log/clutils.log file. This log file includes all standard error and output from each command. The return codes used by the clmgr command are standard for all commands: RC_UNKNOWN=-1 A result is not known. It is useful as an initializer. RC_SUCCESS=0 No errors were detected; the operation seems to have been successful. RC_ERROR=1 A general error has occurred. RC_NOT_FOUND=2 A specified resource does not exist or could not be found. 
RC_MISSING_INPUT=3 Some required input was missing. RC_INCORRECT_INPUT=4 Some detected input was incorrect. RC_MISSING_DEPENDENCY=5 A required dependency does not exist. RC_SEARCH_FAILED=6 A specified search failed to match any data. Example 5-36 lists the format of the trace information in the clutils.log file. Example 5-36 The trace information in the clutils.log file <SENTINEL>:<RETURN_CODE>:<FILE>:<FUNCTION>[<LINENO>](<ELAPSED_TIME>): <TRANSACTION_ID>:<PID>:<PPID>: <SCRIPT_LINE> The following line shows an example of how the clutils.log file might be displayed: CLMGR:0:resource_common:SerializeAsAssociativeArray()[537](0.704):13327:9765002:90 44114: unset 'array[AIX_LEVEL0]' Example 5-37 shows some lines from the clutils.log file (not using trace). Example 5-37 The clutils.log file CLMGR STARTED (243:7667890:9437234): Wed Oct 6 23:51:22 CDT 2010 CLMGR USER (243:7667890:9437234): ::root:system CLMGR COMMAND (243:7012392:7667890): clmgr -T 243 modify cluster hacmp2728_cluster REPOSITORY=hdisk2 CLMGR ACTUAL (243:7012392:7667890): modify_cluster properties hdisk2 CLMGR RETURN (243:7012392:7667890): 0 CLMGR STDERR -- BEGIN (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010 Current cluster configuration: CLMGR STDERR -- END (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010 CLMGR ENDED (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010 CLMGR ELAPSED (243:7667890:9437234): 3.720 Chapter 5. Configuring a PowerHA cluster 131 5.2.8 Displaying the log file content by using the clmgr command You can use the clmgr action view command to view the log content. Defining the number of lines returned By using the TAIL argument, you can define the number of clmgr command-related lines that are returned from the clutils.log file. Example 5-38 shows how you can specify 1000 lines of clmgr log information. Example 5-38 Using the TAIL argument when viewing the content of the clmgr log file # clmgr view log clutils.log TAIL=1000 | wc -l 1000 # Filtering special items by using the FILTER argument You can use the FILTER argument to filter special items that you are looking for. Example 5-39 shows how to list just the last 10 clmgr commands that were run. Example 5-39 Listing the last 10 clmgr commands # clmgr view log clutils.log TAIL=10 FILTER="CLMGR COMMAND" CLMGR COMMAND (12198:13828308:15138846): clmgr -T 12198 add application_controller appctl1 start=/bin/hostname stop=/bin/hostname CLMGR COMMAND (2629:15138850:17891482): clmgr -T 2629 query application_controller appctl1 CLMGR COMMAND (4446:19464210:17891482): clmgr -c -T 4446 query application_controller appctl1 CLMGR COMMAND (23101:19464214:17891482): clmgr -c -S -T 23101 query application_controller appctl1 CLMGR COMMAND (24919:17826012:17891482): clmgr -x -T 24919 query application_controller appctl1 CLMGR COMMAND (464:14352476:15138926): clmgr -T 464 view log clutils.log CLMGR COMMAND (18211:15728818:15138928): clmgr -T 18211 view log clutils.log CLMGR COMMAND (10884:13828210:14156024): clmgr -T 10884 view log clutils.log CLMGR COMMAND (28631:17629296:14156026): clmgr -T 28631 view log clutils.log CLMGR COMMAND (19061:17825922:14156028): clmgr -T 19061 view log clutils.log TAIL=1000 # Example 5-40 shows how to list the last five clmgr query commands that were run. 
Example 5-40 Listing the last five clmgr query commands

# clmgr view log clutils.log TAIL= FILTER="CLMGR COMMAND",query
CLMGR COMMAND (9047:17825980:17891482): clmgr -x -T 9047 query resource_group rg1
CLMGR COMMAND (2629:15138850:17891482): clmgr -T 2629 query application_controller appctl1
CLMGR COMMAND (4446:19464210:17891482): clmgr -c -T 4446 query application_controller appctl1
CLMGR COMMAND (23101:19464214:17891482): clmgr -c -S -T 23101 query application_controller appctl1
CLMGR COMMAND (24919:17826012:17891482): clmgr -x -T 24919 query application_controller appctl1
#

5.3 PowerHA SystemMirror for IBM Systems Director
Using the web browser graphical user interface makes it easy to complete configuration and management tasks with mouse clicks. For example, you can easily create a cluster, verify and synchronize a cluster, and add nodes to a cluster. The Director client agent of PowerHA SystemMirror is installed on cluster nodes in the same manner as PowerHA SystemMirror itself, by using the installp command. The Director server and PowerHA server plug-in installation require a separate effort. You must download them from the external website and manually install them on a dedicated system. This system does not have to be a PowerHA system. To learn about installing the Systems Director and PowerHA components, and their use for configuration and management tasks, see Chapter 12, “Creating and managing a cluster using IBM Systems Director” on page 333.

Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2
PowerHA SystemMirror Smart Assist for DB2 is included in the base Standard Edition software. It simplifies, and minimizes the time and effort of, making a non-DPF DB2 database highly available. The Smart Assist automatically discovers DB2 instances and databases and creates start and stop scripts for the instances. The Smart Assist also creates process and custom PowerHA application monitors that help to keep the DB2 instances highly available. This chapter explains how to configure a hot standby two-node IBM PowerHA SystemMirror 7.1 cluster using the Smart Assist for DB2. The lab cluster korea is used for the examples, with the participating nodes seoul and busan. This chapter includes the following topics:
Prerequisites
Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1

6.1 Prerequisites
This section describes the prerequisites for the Smart Assist implementation.

6.1.1 Installing the required file sets
You must install two additional file sets, as shown in Example 6-1, before using Smart Assist for DB2.
Example 6-1 Additional file sets required for installing Smart Assist

seoul:/ # clcmd lslpp -l cluster.es.assist.common cluster.es.assist.db2

-------------------------------
NODE seoul
-------------------------------
Fileset                      Level    State      Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common     7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist Common Files
cluster.es.assist.db2        7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist for DB2

-------------------------------
NODE busan
-------------------------------
Fileset                      Level    State      Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.es.assist.common     7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist Common Files
cluster.es.assist.db2        7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist for DB2

6.1.2 Installing DB2 on both nodes
The DB2 versions supported by the PowerHA Smart Assist are versions 8.1, 8.2, 9.1, and 9.5. For the example in this chapter, DB2 9.5 has been installed on both nodes, seoul and busan, as shown in Example 6-2.

Example 6-2 DB2 version installed

seoul:/db2/db2pok # db2pd -v
Instance db2pok uses 64 bits and DB2 code release SQL09050 with level identifier 03010107
Informational tokens are DB2 v9.5.0.0, s071001, AIX6495, Fix Pack 0.

6.1.3 Importing the shared volume group and file systems
The storage must be accessible from both nodes, with the logical volume structures created and imported on both sides. If the volume groups are not imported on the secondary node, Smart Assist for DB2 imports them automatically, as shown in Example 6-3.

Example 6-3 Volume groups imported in the nodes

seoul:/db2/db2pok # clcmd lspv

-------------------------------
NODE seoul
-------------------------------
hdisk0          00c0f6a088a155eb    rootvg           active
caa_private0    00c0f6a01077342f    caavg_private    active
cldisk2         00c0f6a0107734ea    pokvg
cldisk1         00c0f6a010773532    pokvg

-------------------------------
NODE busan
-------------------------------
hdisk0          00c0f6a089390270    rootvg           active
caa_private0    00c0f6a01077342f    caavg_private    active
cldisk2         00c0f6a0107734ea    pokvg
cldisk1         00c0f6a010773532    pokvg

6.1.4 Creating the DB2 instance and database on the shared volume group
Before launching the PowerHA Smart Assist for DB2, you must have already created the DB2 instance and DB2 database on the volume groups that are shared by both nodes. In Example 6-4, the home for the POK database was created in the /db2/POK/db2pok shared file system of the volume group pokvg. The instance was created in the /db2/db2pok shared file system, which is the home directory for user db2pok. The instance was created on the primary node only, because its structures reside on a shared volume group.
Example 6-4 Displaying the logical volume groups of pokvg seoul:/ # lsvg -l pokvg pokvg: LV NAME TYPE LPs loglv001 jfs2log 1 poklv001 jfs2 96 poklv002 jfs2 192 poklv003 jfs2 32 poklv004 jfs2 48 poklv005 jfs2 64 poklv006 jfs2 64 poklv008 jfs2 32 poklv009 jfs2 4 poklv007 jfs2 32 PPs 1 96 192 32 48 64 64 32 4 32 PVs 1 1 2 1 1 1 1 1 1 1 LV STATE open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd MOUNT POINT N/A /db2/POK/db2pok /db2/POK/sapdata1 /db2/POK/sapdatat1 /db2/POK/log_dir /export/sapmnt/POK /export/usr/sap/trans /usr/sap/POK /db2/POK/db2dump /db2/db2pok seoul:/ # clcmd grep db2pok /etc/passwd ------------------------------NODE seoul ------------------------------- Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 137 db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh ------------------------------NODE busan ------------------------------db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh seoul:/ # /opt/IBM/db2/V9.5/instance/db2icrt -a SERVER -s ese -u db2fenc1 -p db2c_db2pok db2pok seoul:/ # su - db2pok seoul:/db2/db2pok # ls -ld sqllib drwxrwsr-t 19 db2pok db2iadm1 4096 Sep 21 13:12 sqllib seoul:/db2/db2pok # db2start seoul:/db2/db2pok # db2 "create database pok on /db2/POK/db2pok CATALOG TABLESPACE managed by database using (file '/db2/POK/sapdata1/catalog.tbs' 100000) EXTENTSIZE 4 PREFETCHSIZE 4 USER TABLESPACE managed by database using (file '/db2/POK/sapdata1/sapdata.tbs' 500000) EXTENTSIZE 4 PREFETCHSIZE 4 TEMPORARY TABLESPACE managed by database using (file '/db2/POK/sapdatat1/temp.tbs' 200000) EXTENTSIZE 4 PREFETCHSIZE 4" seoul:/db2/db2pok # db2 list db directory System Database Directory Number of entries in the directory = 1 Database 1 entry: Database alias = POK Database name = POK Local database directory = /db2/POK/db2pok Database release level = c.00 Comment = Directory entry type = Indirect Catalog database partition number = 0 Alternate server hostname = Alternate server port number = seoul:/db2/db2pok # db2 update db cfg for pok using NEWLOGPATH /db2/POK/log_dir seoul:/db2/db2pok # db2 update db cfg for pok using LOGRETAIN on seoul:/db2/db2pok # db2 backup db pok to /tmp seoul:/db2/db2pok # db2stop; db2start seoul:/db2/db2pok # db2 connect to pok Database Connection Information Database server SQL authorization ID Local database alias 138 = DB2/AIX64 9.5.0 = DB2POK = POK IBM PowerHA SystemMirror 7.1 for AIX seoul:/db2/db2pok # db2 connect reset DB20000I The SQL command completed successfully. Non-DPF database support: Smart Assist for DB2 supports only non-DPF databases. 6.1.5 Updating the /etc/services file on the secondary node When the instance is created on the primary node, the /etc/services file is updated with information for DB2 use. You must also add these lines to the /etc/services file on the secondary node as in the following example: db2c_db2pok DB2_db2pok DB2_db2pok_1 DB2_db2pok_2 DB2_db2pok_END 50000/tcp 60000/tcp 60001/tcp 60002/tcp 60003/tcp 6.1.6 Configuring IBM PowerHA SystemMirror You must configure the topology of the PowerHA cluster before using Smart Assist for DB2. In Example 6-5, the cluster korea was configured with two Ethernet interfaces in each node. 
Example 6-5 Cluster korea configuration seoul:/ # busan-b2 busan-b1 poksap-db seoul-b1 seoul-b2 poksap-db cllsif boot net_ether_01 ether public busan boot net_ether_01 ether public busan service net_ether_01 ether public busan boot net_ether_01 ether public seoul boot net_ether_01 ether public seoul service net_ether_01 ether public seoul 192.168.201.144 192.168.101.144 10.168.101.143 192.168.101.143 192.168.201.143 10.168.101.143 en2 en0 en0 en2 255.255.252.0 255.255.252.0 255.255.252.0 255.255.252.0 255.255.252.0 255.255.252.0 22 22 22 22 22 22 6.2 Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1 This section explains the preliminary steps that are required before you start Smart Assist for DB2. Then it explains how to start Smart Assist for DB2. 6.2.1 Preliminary steps Before starting Smart Assist for DB2, complete the following steps: 1. Stop the PowerHA cluster services on both nodes by issuing the lssrc -ls clstrmgrES command on both nodes as shown in Example 6-6 on page 140. A ST_INIT state indicates that the cluster services are stopped. The shared volume group is active, with file systems mounted, on the node where Smart Assist for DB2 is going to be installed. Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 139 Example 6-6 Checking for PowerHA stopped cluster services seoul:/ # lssrc -ls clstrmgrES Current state: ST_INIT sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1 0:34:17-05:00$" busan:/ # lssrc -ls clstrmgrES Current state: ST_INIT sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1 0:34:17-05:00$" 2. Mount the file systems as shown in Example 6-7 so that Smart Assist for DB2 can discover the available instances and databases. Example 6-7 Checking for mounted file systems in node seoul seoul:/db2/db2pok # lsvg -l pokvg pokvg: LV NAME TYPE LPs PPs loglv001 jfs2log 1 1 poklv001 jfs2 96 96 poklv002 jfs2 192 192 poklv003 jfs2 32 32 poklv004 jfs2 48 48 poklv005 jfs2 64 64 poklv006 jfs2 64 64 poklv008 jfs2 32 32 poklv009 jfs2 4 4 poklv007 jfs2 32 32 PVs 1 1 2 1 1 1 1 1 1 1 LV STATE open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd open/syncd MOUNT POINT N/A /db2/POK/db2pok /db2/POK/sapdata1 /db2/POK/sapdatat1 /db2/POK/log_dir /export/sapmnt/POK /export/usr/sap/trans /usr/sap/POK /db2/POK/db2dump /db2/db2pok The DB2 instance is active on the node where Smart Assist for DB2 is going to be executed as shown in Example 6-8. Example 6-8 Checking for active DB2 instances seoul:/ # su - db2pok seoul:/db2/db2pok # db2ilist db2pok seoul:/db2/db2pok # db2start 09/24/2010 11:38:53 0 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. seoul:/db2/db2pok # ps -ef | grep db2sysc | grep -v grep db2pok 15794218 8978496 0 11:38:52 - 0:00 db2sysc 0 seoul:/db2/db2pok # db2pd Database Partition 0 -- Active -- Up 0 days 00:00:10 140 IBM PowerHA SystemMirror 7.1 for AIX 3. After the instance is running, edit the $INSTHOME/sqllib/db2nodes.cfg file as shown in Example 6-9 to add the service IP label. This service IP label is going to be used in the IBM PowerHA resource group. If you edited it before, the database instance will not start because the service IP label is not configured on the network interface when PowerHA is down. 
Example 6-9 Editing and adding the service IP label to the db2nodes.cfg file seoul:/ # cat /db2/db2pok/sqllib/db2nodes.cfg 0 poksap-db 0 The .rhosts file (Example 6-10) for the DB2 instance owner has all the base, persistent, and service addresses. It also has the right permissions. Example 6-10 Checking the .rhosts file seoul:/ # cat /db2/db2pok/.rhosts seoul db2pok busan db2pok seoul-b1 db2pok busan-b1 db2pok seoul-b2 db2pok busan-b2 db2pok poksap-db db2pok seoul:/db2/db2pok # ls -ld .rhosts -rw------1 db2pok system 107 Oct 4 15:10 .rhosts 4. Find the path for the binary files and then export the variable as shown in Example 6-11. The DSE_INSTALL_DIR environment variable is exported as a root user with the actual path for the DB2 binary files. If more than one DB2 version is installed, choose the version that you to use for your high available instance. Example 6-11 Finding the DB2 binary files and exporting them seoul:/db2/db2pok # db2level DB21085I Instance "db2pok" uses "64" bits and DB2 code release "SQL09050" with level identifier "03010107". Informational tokens are "DB2 v9.5.0.0", "s071001", "AIX6495", and Fix Pack "0". Product is installed at "/opt/IBM/db2/V9.5". seoul:/ # export DSE_INSTALL_DIR=/opt/IBM/db2/V9.5 6.2.2 Starting Smart Assist for DB2 After completing the steps in 6.2.1, “Preliminary steps” on page 139, you are ready to start Smart Assist for DB2 as explained in the following steps: 1. Launch Smart Assist for DB2 by using the path for seoul: smitty sysmirror Cluster Applications and Resources Make Applications Highly Available (Use Smart Assists) Add an Application to the PowerHA SystemMirror Configuration. 2. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select a Smart Assist From the List of Available Smart Assists. Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 141 3. In the Select a Smart Assist From the List of Available Smart Assists panel (Figure 6-1), select DB2 UDB non-DPF Smart Assist. Select a Smart Assist From the List of Available Smart Assists Move cursor to desired item and press Enter. DB2 UDB non-DPF Smart Assist # busan seoul DHCP Smart Assist # busan seoul DNS Smart Assist # busan seoul Lotus Domino Smart Assist # busan seoul FileNet P8 Smart Assist # busan seoul IBM HTTP Server Smart Assist # busan seoul SAP MaxDB Smart Assist # busan seoul Oracle Database Smart Assist # busan seoul Oracle Application Server Smart Assist # busan seoul Print Subsystem Smart Assist # busan seoul SAP Smart Assist # busan seoul Tivoli Directory Server Smart Assist # busan seoul TSM admin smart assist # busan seoul TSM client smart assist # busan seoul TSM server smart assist # busan seoul WebSphere Smart Assist # busan seoul F1=Help Esc+8=Image /=Find F2=Refresh Esc+0=Exit n=Find Next F3=Cancel Enter=Do Figure 6-1 Selecting DB2 UDB non-DPF Smart Assist 4. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select Configuration Mode. 5. In the Select Configuration Mode panel (Figure 6-2), select Automatic Discovery and Configuration. Select Configuration Mode Move cursor to desired item and press Enter. Automatic Discovery And Configuration Manual Configuration F1=Help Esc+8=Image /=Find F2=Refresh Esc+0=Exit n=Find Next Figure 6-2 Selecting the configuration mode 142 IBM PowerHA SystemMirror 7.1 for AIX F3=Cancel Enter=Do 6. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select the Specific Configuration You Wish to Create. 7. 
In the Select the Specific Configuration You Wish to Create panel (Figure 6-3), select DB2 Single Instance. Select The Specific Configuration You Wish to Create Move cursor to desired item and press Enter. DB2 Single Instance F1=Help Esc+8=Image /=Find # busan seoul F2=Refresh Esc+0=Exit n=Find Next F3=Cancel Enter=Do Figure 6-3 Selecting the configuration to create 8. Select the DB2 instance name. In this case, only one instance, db2pok, is available as shown in Figure 6-4. Select a DB2 Instance Move cursor to desired item and press Enter. db2pok F1=Help Esc+8=Image /=Find F2=Refresh Esc+0=Exit n=Find Next F3=Cancel Enter=Do Figure 6-4 Selecting the DB2 instance name 9. Using the available pick lists (F4), edit the Takeover Node, DB2 Instance Database to Monitor, and Service IP Label fields as shown in Figure 6-5. Press Enter. Add a DB2 Highly Available Instance Resource Group Type or select values in entry fields. Press Enter AFTER making all desired changes. * Application Name [DB2_Instance_db2pok] * * * * * [seoul] [busan] db2pok POK [poksap-db] DB2 Instance Owning Node Takeover Node(s) DB2 Instance Name DB2 Instance Database to Monitor Service IP Label + + + + + Figure 6-5 Adding the DB2 high available instance resource group Tip: You can edit the Application Name field and change it to have a more meaningful name. Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 143 A new PowerHA resource group, called db2pok_ResourceGroup, is created. The volume group pokvg and the service IP label poksap-db are automatically added to the resource group as shown in Example 6-12. Example 6-12 The configured resource group for the DB2 instance seoul:/ # /usr/es/sbin/cluster/utilities/cllsres APPLICATIONS="db2pok_ApplicationServer" FILESYSTEM="" FORCED_VARYON="false" FSCHECK_TOOL="logredo" FS_BEFORE_IPADDR="false" RECOVERY_METHOD="parallel" SERVICE_LABEL="poksap-db" SSA_DISK_FENCING="false" VG_AUTO_IMPORT="false" VOLUME_GROUP="pokvg" USERDEFINED_RESOURCES="" seoul:/ # /usr/es/sbin/cluster/utilities/cllsgrp db2pok_ResourceGroup 10.Administrator task: Verify the start and stop scripts that were created for the resource group. a. To verify the scripts, use the odmget or cllsserv commands or the SMIT tool as shown in Example 6-13. Example 6-13 Verifying the start and stop scripts busan:/ # odmget HACMPserver HACMPserver: name = "db2pok_ApplicationServer" start = "/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok" stop = "/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok" min_cpu = 0 desired_cpu = 0 min_mem = 0 desired_mem = 0 use_cod = 0 min_procs = 0 min_procs_frac = 0 desired_procs = 0 desired_procs_frac = 0 seoul:/ # /usr/es/sbin/cluster/utilities/cllsserv db2pok_ApplicationServer /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok b. Follow the path on seoul: smitty sysmirror Cluster Applications and Resources Resources Configure User Applications (Scripts and Monitors) Application Controller Scripts Change/Show Application Controller Scripts. 144 IBM PowerHA SystemMirror 7.1 for AIX c. Select the application controller (Figure 6-8) and press Enter. Select Application Controller Move cursor to desired item and press Enter. db2pok_ApplicationServer F1=Help Esc+8=Image /=Find F2=Refresh Esc+0=Exit n=Find Next F3=Cancel Enter=Do Figure 6-6 Selecting the DB2 application controller The characteristics of the application controller displayed as shown in Figure 6-7. Change/Show Application Controller Scripts Type or select values in entry fields. 
Press Enter AFTER making all desired changes. Application Controller Name db2pok_ApplicationServer New Name [db2pok_ApplicationServer] Start Script [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok] Stop Script [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok] Application Monitor Name(s) db2pok_SQLMonitor db2pok_ProcessMonitor Figure 6-7 Change/Show Application Controller Scripts panel 11.Administrator task: Verify which custom and process application monitors were created by Smart Assist for DB2. In our example, the application monitors are db2pok_SQLMonitor and db2pok_ProcessMonitor. a. Run the following path for seoul: smitty sysmirror Cluster Applications and Resources Resources Configure User Applications (Scripts and Monitors) Application Monitors Configure Custom Application Monitors Change/Show Custom Application Monitor. b. In the Application Monitor to Change panel (Figure 6-8), select db2pok_SQLMonitor and press Enter. Application Monitor to Change Move cursor to desired item and press Enter. db2pok_SQLMonitor F1=Help Esc+8=Image /=Find F2=Refresh Esc+0=Exit n=Find Next F3=Cancel Enter=Do Figure 6-8 Selecting the application monitor to change Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 145 c. In the Change/Show Custom Application Monitor panel (Figure 6-9), you see the attributes of the application monitor. Change/Show Custom Application Monitor Type or select values in entry fields. Press Enter AFTER making all desired changes. * Monitor Name db2pok_SQLMonitor Application Controller(s) to Monitor db2pok_ApplicationServer + * Monitor Mode [Long-running monitoring] + * Monitor Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2cmon -i db2pok -A po> Monitor Interval [120] # Hung Monitor Signal [9] # * Stabilization Interval [240] # Restart Count [3] # Restart Interval [1440] # * Action on Application Failure [fallover] + Notify Method [] Cleanup Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok] Restart Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok] Figure 6-9 Change/Show Custom Application Monitor panel d. Run the following path for seoul: smitty sysmirror Cluster Applications and Resources Resources Configure User Applications (Scripts and Monitors) Application Monitors Configure Process Application Monitors Change/Show Process Application Monitor. e. In the Application Monitor to Change panel (Figure 6-10), select db2pok_ProcessMonitor and press Enter. Application Monitor to Change Move cursor to desired item and press Enter. db2pok_ProcessMonitor F1=Help Esc+8=Image /=Find F2=Refresh Esc+0=Exit n=Find Next Figure 6-10 Selecting the application monitor to change 146 IBM PowerHA SystemMirror 7.1 for AIX F3=Cancel Enter=Do In the Change/Show Process Application Monitor panel, you see the attributes of the application monitor (Figure 6-11). Change/Show Process Application Monitor Type or select values in entry fields. Press Enter AFTER making all desired changes. 
[Entry Fields] * Monitor Name db2pok_ProcessMonitor Application Controller(s) to Monitor db2pok_ApplicationServer + * Monitor Mode [Long-running monitoring] + * Processes to Monitor [db2sysc] * Process Owner [db2pok] Instance Count [1] # * Stabilization Interval [240] # * Restart Count [3] # Restart Interval [1440] # * Action on Application Failure [fallover] + Notify Method [] Cleanup Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok] Restart Method [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok] Figure 6-11 Change/Show Process Application Monitor panel 6.2.3 Completing the configuration After the Smart Assist for DB2 is started, complete the configuration: 1. Stop the DB2 instance on the primary node as shown in Example 6-14. Keep in mind that it was active only for the sake of the Smart Assist for DB2 discovery process. Example 6-14 Stopping the DB2 instance seoul:/ # su - db2pok seoul:/db2/db2pok # db2stop 09/24/2010 12:02:56 0 0 SQL1064N DB2STOP processing was successful. SQL1064N DB2STOP processing was successful. 2. Unmount the shared file systems as shown in Example 6-15. Example 6-15 Unmounting the shared file systems seoul:/db2/db2pok # lsvg -l pokvg pokvg: LV NAME TYPE LPs PPs loglv001 jfs2log 1 1 poklv001 jfs2 96 96 poklv002 jfs2 192 192 poklv003 jfs2 32 32 poklv004 jfs2 48 48 poklv005 jfs2 64 64 poklv006 jfs2 64 64 poklv008 jfs2 32 32 poklv009 jfs2 4 4 poklv007 jfs2 32 32 PVs 1 1 2 1 1 1 1 1 1 1 LV STATE closed/syncd closed/syncd closed/syncd closed/syncd closed/syncd closed/syncd closed/syncd closed/syncd closed/syncd closed/syncd MOUNT POINT N/A /db2/POK/db2pok /db2/POK/sapdata1 /db2/POK/sapdatat1 /db2/POK/log_dir /export/sapmnt/POK /export/usr/sap/trans /usr/sap/POK /db2/POK/db2dump /db2/db2pok Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 147 3. Deactivate the shared volume group as shown in Example 6-16. Example 6-16 Deactivating the shared volume group of pokvg seoul:/ # varyoffvg pokvg seoul:/ # lsvg -o caavg_private rootvg 4. Synchronize the PowerHA cluster by using SMIT: a. Follow the path smitty sysmirror Custom Cluster Configuration Verify and Synchronize Cluster Configuration (Advanced). b. In the PowerHA SystemMirror Verification and Synchronization panel (Figure 6-12), press Enter to accept the default option. PowerHA SystemMirror Verification and Synchronization Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * Verify, Synchronize or Both * Include custom verification library checks * Automatically correct errors found during verification? * Force synchronization if verification fails? * Verify changes only? * Logging [Both] [Yes] [Yes] + + + [No] [No] [Standard] + + + Figure 6-12 Accepting the default actions on the Verification and Synchronization panel 5. Start the cluster on both nodes, seoul and busan, by running smitty clstart. 6. In the Start Cluster Services panel (Figure 6-13 on page 149), complete these steps: a. b. c. d. e. f. g. h. 148 For Start now, on system restart or both, select now. For Start Cluster SErvices on these nodes, enter [seoul busan]. For Manage Resource Groups, select Automatically. For BROADCAST message at startup, select false. For Startup Cluster Information Daemon, select true. For Ignore verification errors, select false. For Automatically correct errors found during cluster start?, select yes. Press Enter. IBM PowerHA SystemMirror 7.1 for AIX Start Cluster Services Type or select values in entry fields. Press Enter AFTER making all desired changes. 
[Entry Fields] * Start now, on system restart or both Start Cluster Services on these nodes * Manage Resource Groups BROADCAST message at startup? Startup Cluster Information Daemon? Ignore verification errors? Automatically correct errors found during cluster start? now [seoul busan] Automatically false true false yes + + + + + + + Figure 6-13 Specifying the options for starting cluster services Tip: The log file for the Smart Assist is in the /var/hacmp/log/sa.log file. You can use the clmgr utility to easily view the log, as in the following example: clmgr view log sa.log When the PowerHA cluster starts, the DB2 instance is automatically started. The application monitors start after the defined stabilization interval as shown in Example 6-17. Example 6-17 Checking the status of the high available cluster and the DB2 instance seoul:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------db2pok_Resourc ONLINE seoul OFFLINE busan seoul:/ # ps -ef | grep /usr/es/sbin/cluster/clappmond | grep -v grep root 7340184 15728806 0 12:17:53 - 0:00 /usr/es/sbin/cluster/clappmond db2pok1_SQLMonitor root 11665630 4980958 0 12:17:53 - 0:00 /usr/es/sbin/cluster/clappmond db2pok_ProcessMonitor seoul:/ # su - db2pok seoul:/db2/db2pok # db2pd Database Partition 0 -- Active -- Up 0 days 00:19:38 Your DB2 instance and database are now configured for high availability in a hot-standby PowerHA SystemMirror configuration. Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 149 150 IBM PowerHA SystemMirror 7.1 for AIX 7 Chapter 7. Migrating to PowerHA 7.1 This chapter includes the following topics for migrating to PowerHA 7.1: Considerations before migrating Understanding the PowerHA 7.1 migration process Snapshot migration Rolling migration Offline migration © Copyright IBM Corp. 2011. All rights reserved. 151 7.1 Considerations before migrating Before migrating your cluster, you must be aware of the following considerations: The required software – AIX – Virtual I/O Server (VIOS) Multicast address Repository disk FC heartbeat support All non-IP networks support removed – – – – – RS232 TMSCSI TMSSA Disk heartbeat (DISKHB) Multinode disk heartbeat (MNDHB) IP networks support removed – Asynchronous transfer mode (ATM) – Fiber Distributed Data Interface (FDDI) – Token ring IP Address Takeover (IPAT) via replacement support removed Heartbeat over alias support removed Site support not available in this version IPV6 support not available in this version You can migrate from High-Availability Cluster Multi-Processing (HACMP) or PowerHA versions 5.4.1, 5.5, and 6.1 only. If you are running a version earlier than HACMP 5.4.1, you must upgrade to a newer version first. TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before migration. Use of AIX 6.1 TL6 SP2 or later is preferred. For more information about migration considerations, see 3.4, “Migration planning” on page 46. Only the following migration methods are supported: Snapshot migration (as explained in 7.3, “Snapshot migration” on page 161) Rolling migration (as explained in 7.4, “Rolling migration” on page 177) Offline migration (as explained in 7.5, “Offline migration” on page 191) Important: A nondisruptive upgrade is not available in PowerHA 7.1, because this version is the first one to use Cluster Aware AIX (CAA). 
152 IBM PowerHA SystemMirror 7.1 for AIX 7.2 Understanding the PowerHA 7.1 migration process Before you begin a migration, you must understand the migration process and all migration scenarios. The process is different from the previous versions of PowerHA (HACMP). With the introduction of PowerHA 7.1, you now use the features of CAA introduced in AIX 6.1 TL6 and AIX 7.1. For more information about the new features of this release, see 2.2, “New features” on page 24. The migration process now has two main cluster components: CAA and PowerHA. This process involves updating your existing PowerHA product and configuring the CAA cluster component. 7.2.1 Stages of migration Migrating to PowerHA 7.1 involves the following stages: Stage 1: Upgrading to AIX 6.1 TL6 or AIX 7.1 Before you can migrate, you must have working a cluster-aware version of AIX. You can perform this task as part of a two-stage rolling migration or upgrade to AIX first before you start the PowerHA migration. This version is required before you can start premigration checking (stage 2). Stage 2: Performing the premigration check (clmigcheck) During this stage, you use the clmigcheck command to upgrade PowerHA to PowerHA 7.1: a. Stage 2a: Run the clmigcheck command on the first node to choose Object Data Manager (ODM) or snapshot. Run it again to choose the repository disk (and optionally the IP multicast address). b. Stage 2b: Run the clmigcheck command on each node (including the first node) to see the “OK to install the new version” message and then upgrade the node to PowerHA 7.1. The clmigcheck command: The clmigcheck command automatically creates the CAA cluster when it is run on the last node. For a detailed explanation about the clmigcheck process, see 7.2.2, “Premigration checking: The clmigcheck program” on page 157. Chapter 7. Migrating to PowerHA 7.1 153 Stage 3: Upgrading to PowerHA 7.1 After stage 2 is completed, you upgrade to PowerHA 7.1 on the node. Figure 7-1 shows the state of the cluster in the test environment after updating to PowerHA 7.1 on one node. Topology services are still active so that the newly migrated PowerHA 7.1 node can communicate with the previous version, PowerHA 6.1. The CAA configuration has been completed, but the CAA cluster is not yet created. Figure 7-1 Mixed version cluster after migrating node 1 Stage 4: Creating the CAA cluster (last node) When you are on the last node of the cluster, you create the CAA cluster after running the clmigcheck command a final time. CAA is required for PowerHA 7.1 to work, making this task a critical step. Figure 7-2 shows the state of the environment after running the clmigcheck command on the last node of the cluster, but before completing the migration. Figure 7-2 Mixed version cluster after migrating node 2 At this stage, the clmigcheck process has run on the last node of the cluster. The CAA cluster is now created and CAA has established communication with the other node. 154 IBM PowerHA SystemMirror 7.1 for AIX However, PowerHA is still using the Topology Services (topsvcs) function because the migration switchover to CAA is not yet completed. Stage 5: Starting the migration protocol As soon as you create the CAA cluster and install PowerHA 7.1, you must start the cluster. The node_up event checks whether all nodes are running PowerHA 7.1 and starts the migration protocol. The migration protocol has two phases: – Phase 1 You call ha_gs_migrate_to_caa_prep(0) to start the migration from groups services to CAA. 
Ensure that each node can proceed with the migration. – Phase 2 During the second phase, you update the DCD and ACD ODM entries in HACMPnode and HACMPcluster to the latest version. You call ha_gs_migrate_to_caa_commit() to complete the migration and issue the following command: /usr/es/sbin/cluster/utilities/clmigcleanup The clmigcleanup process removes existing non-IP entries from the HACMPnetwork, HACMPadapter, and HACMPnim ODM entries, such as any diskhb entries. Figure 7-3 shows sections from the clstrmgr.debug log file showing the migration protocol stages. Migration phase one - extract from clstrmgr.debug Mon Sep 27 20:22:51 nPhaseCb: First phase of the migration protocol, call ha_gs_caa_migration_prep() Mon Sep 27 20:22:51 domainControlCb: Called, state=ST_STABLE Mon Sep 27 20:22:51 domainControlCb: Notification type: HA_GS_DOMAIN_NOTIFICATION Mon Sep 27 20:22:51 domainControlCb: HA_GS_MIGRATE_TO_CAA Mon Sep 27 20:22:51 domainControlCb: Sub-Type: HA_GS_DOMAIN_CAA_MIGRATION_COORD Mon Sep 27 20:22:51 domainControlCb: reason: HA_GS_VOTE_FOR_MIGRATION Mon Sep 27 20:22:51 domainControlCb: Called, state=ST_STABLE Mon Sep 27 20:22:51 domainControlCb: Notification type: HA_GS_DOMAIN_NOTIFICATION Mon Sep 27 20:22:51 domainControlCb: HA_GS_MIGRATE_TO_CAA Mon Sep 27 20:22:51 domainControlCb: Sub-Type: HA_GS_DOMAIN_CAA_MIGRATION_APPRVD Mon Sep 27 20:22:51 domainControlCb: reason: HA_GS_MIGRATE_TO_CAA_PREP_DONE Mon Sep 27 20:22:51 domainControlCb: Set RsctMigPrepComplete flag Mon Sep 27 20:22:51 domainControlCb: Voting to CONTINUE with RsctMigrationPrepMsg. Migration phase two - updating cluster version Mon Sep 27 20:22:51 DoNodeOdm: Called for DCD HACMPnode class Mon Sep 27 20:22:51 GetObjects: Called with criteria: name=chile Mon Sep 27 20:22:51 DoNodeOdm: Updating DCD HACMPnode stanza with node_id = 1 and version = 12 for object NAME_SERVER of node chile Mon Sep 27 20:22:51 DoNodeOdm: Updating DCD HACMPnode stanza with node_id = 1 and version = 12 for object DEBUG_LEVEL of node chile Finishing migration Mon Sep 27 20:23:51 Mon Sep 27 20:23:51 Mon Sep 27 20:23:51 - calling clmigcleanup finishMigrationGrace: resetting MigrationGracePeriod finishMigrationGrace: Calling ha_gs_migrate_to_caa_commit() finifhMigration Grace: execute clmigcleanup command Figure 7-3 Extract from the clstrmgr.debug file showing the migration protocol Chapter 7. Migrating to PowerHA 7.1 155 Stage 6: Switching over from Group Services (grpsvcs) to CAA When migration is complete, switch over the grpsvcs communication function from topsvcs to the new communication with CAA. The topsvcs function is now inactive, but the service is still part of Reliable Scalable Cluster Technology (RSCT) and is not removed. CAA communication: The grpsvcs SRC subsystem is active until you restart the system. This subsystem is now communicating with CAA and not topsvcs as shown in Figure 7-4. Figure 7-4 Switching over Group Services to use CAA Figure 7-5 shows the services that are running after migration, including cthags. chile:/ # lssrc -a | grep cluster clstrmgrES cluster clevmgrdES cluster 4391122 11862228 active active chile:/ # lssrc -a | grep cthags cthags cthags 7405620 active chile:/ # lssrc -a | grep caa cld caa clcomd caa solid caa clconfd caa solidhac caa 4063436 3670224 7864338 5505178 7471164 active active active active active Figure 7-5 Services running after migration 156 IBM PowerHA SystemMirror 7.1 for AIX Table 7-1 shows the changes to the SRC subsystem before and after migration. 
Table 7-1 Changes in the SRC subsystems Older PowerHA PowerHA 7.1 or later Topology Services topsvcs N/A Group Services grpsvcs cthags The clcomdES and clcomd subsystems When running in a mixed-version cluster, you must handle the changes in the clcomd subsystem. During a rolling or mixed-cluster situation, you can have two separate instances of the communication daemon running: clcomd and clcomdES. clcomd instances: You can have two instances of the clcomd daemon in the cluster, but never on a given node. After PowerHA 7.1 is installed on a node, the clcomd daemon is run, and the clcomdES daemon does not exist. AIX 6.1.6.0 and later with a back-level PowerHA version (before version 7.1) only runs the clcomdES daemon even though the clcomd daemon exists. The clcomd daemon uses port 16191, and the clcomdES daemon uses port 6191. When migration is complete, the clcomdES daemon is removed. The clcomdES daemon: The clcomdES daemon is removed when the older PowerHA software version is removed (snapshot migration) or overwritten by the new PowerHA 7.1 version (rolling or offline migration). 7.2.2 Premigration checking: The clmigcheck program Before starting migration, you must run the clmigcheck program to prepare the cluster for migration. The clmigcheck program has two functions. First, it validates the current cluster configuration (by using ODM or snapshot) for migration. If the configuration is not valid, the clmigcheck program notifies you of any unsupported elements, such as disk heartbeating or IPAT via replacement. It also indicates any actions that might be required before you can migrate. Second, this program prepares for the new cluster by obtaining the disk to be used for the repository disk and multicast address. Command profile: The clmigcheck command is not a PowerHA command, but the command is part of bos.cluster and is in the /usr/sbin directory. Chapter 7. Migrating to PowerHA 7.1 157 High-level overview of the clmigcheck process Figure 7-6 shows a high-level view of how the clmigcheck program works. The clmigcheck program must go through several stages to complete the cluster migration. Figure 7-6 High-level process of the clmigcheck command The clmigcheck program goes through the following stages: 1. Performing the first initial run When the clmigcheck program runs, it checks whether it has been run before by looking for a /var/clmigcheck/clmigcheck.txt file. If this file does not exist, the clmigcheck program runs and opens the menu shown in Figure 7-8 on page 159. 2. Verifying that the cluster configuration is suitable for migration From the clmigcheck menu, you can select options 1 or 2 to check your existing ODM or snapshot configuration to see if your environment is ready for migration. 3. Creating the CAA required configuration After performing option 1 or 2, choose option 3. Option 3 creates the /var/clmigcheck /clmigcheck.txt file with the information entered and is copied to all nodes in the cluster. 4. Performing the second run on the first node, or first run on any other node that is not the first or the last node in the cluster to be migrated If the clmigcheck program is run again and the clmigcheck.txt file already exists, a message is returned indicating that you can proceed with the upgrade of PowerHA. 158 IBM PowerHA SystemMirror 7.1 for AIX 5. 
Verifying whether the last node in the cluster is upgraded When the clmigcheck program runs, apart from checking for the presence of the clmigcheck.txt file, it verifies if it is the last node in the cluster to be upgraded. The lslpp command is run against each node in the cluster to establish whether PowerHA has been upgraded. If all other nodes are upgraded, this command confirms that this node is the last node of the cluster and can now create the CAA cluster. The clmigcheck program uses the mkcluster command and passes the cluster parameters from the existing PowerHA cluster, along with the repository disk and multicast address (if applicable). Figure 7-7 shows an example of the mkcluster command being called. usr/sbin/mkcluster -n newyork -r hdisk1 -m chile{cle_globid=4},scotland{cle_globid=5},serbia{cle_globid=6} Figure 7-7 The clmigcheck command calling the mkcluster command Running the clmigcheck command Figure 7-8 shows the main clmigcheck panel. You choose option 1 or 2 depending on which type of migration you want to perform. Option 1 is for a rolling or offline migration. Option 2 is for a snapshot migration. When you choose either option, a check of the cluster configuration is performed to verify if the cluster can be migrated. If any problems are detected, a warning or error message is displayed. ------------[ PowerHA SystemMirror Migration Check ]------------Please select one of the following options: 1 = Check ODM configuration. 2 = Check snapshot configuration. 3 = Enter repository disk and multicast IP addresses. Select one of the above,"x"to exit or "h" for help: Figure 7-8 The clmigcheck menu A warning message is displayed for certain unsupported elements, such as disk heartbeat as shown in Figure 7-9. ------------[ PowerHA SystemMirror Migration Check ]------------CONFIG-WARNING: The configuration contains unsupported hardware: Disk Heartbeat network. The PowerHA network name is net_diskhb_01. This will be removed from the configuration during the migration to PowerHA SystemMirror 7.1. Hit <Enter> to continue Figure 7-9 The disk heartbeat warning message when running the clmigcheck command Chapter 7. Migrating to PowerHA 7.1 159 Non-IP networks can be dynamically removed during the migration process by using the clmigcleanup command. However, other configurations, such as IPAT via replacement, require manual steps to remove or change them to a supported configuration. After the changes are made, run clmigcheck again to ensure that the error is resolved. The second function of the clmigcheck program is to prepare the CAA cluster environment. This function is performed when you select option 3 (Enter repository disk and multicast IP addresses) from the menu. When you select this option, the clmigcheck program stores the information entered in the /var/clmigcheck/clmigcheck.txt file. This file is also copied to the /var/clmigcheck directory on all nodes in the cluster. This file contains the physical volume identifier (PVID) of the repository disk and the chosen multicast address. If PowerHA is allowed to choose a multicast address automatically, the NULL setting is specified in the file. Figure 7-10 shows an example of the clmigcheck.txt file. CLUSTER_TYPE:STANDARD CLUSTER_REPOSITORY_DISK:000fe40120e16405 CLUSTER_MULTICAST:NULL Figure 7-10 Contents of the clmigcheck.txt file Upon running the clmigcheck command, the command checks to see if the clmigcheck.txt file exists. 
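If you lose track of which nodes have already been processed during a migration, you can reproduce the two checks that the clmigcheck program itself performs. The following sketch only reads state and changes nothing; the node names match the three-node example used later in this chapter:

cat /var/clmigcheck/clmigcheck.txt            # present on every node after option 3 has been completed
lslpp -L cluster.es.server.rte                # run on chile, scotland, and serbia; a node still reporting
                                              # a 6.1.x level has not been upgraded yet

The node that is still at the back level when all other nodes report 7.1 is the last node, and running clmigcheck there is what triggers creation of the CAA cluster.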
If the clmigcheck.txt file exists and the node is not the last node in the cluster to be migrated, the panel shown in Figure 7-11 is displayed. It contains a message indicating that you can now upgrade to the later level of PowerHA. ------------[ PowerHA SystemMirror Migration Check ]------------clmigcheck: This is not the first node or last node clmigcheck was run on. No further checking is required on this node. You can install the new version of PowerHA SystemMirror. Hit <Enter> to continue ----------------------------------------------------------------------Figure 7-11 The clmigcheck panel after it has been run once and before the PowerHA upgrade The clmigcheck program checks the installed version of PowerHA to see if it has been upgraded. This step is important to determine which node is the last node to be upgraded in the cluster. If it is the last node in the cluster, then additional configuration operations must be completed along with creating and activating the CAA cluster. Important: You must run the clmigcheck program before you upgrade PowerHA. Then upgrade PowerHA one node at a time, and run the clmigcheck program on the next node only after you complete the migration on the previous node. If you do not run the clmigcheck program specifically on the last node, the cluster is still in migration mode without creating the CAA cluster. For information about how to resolve this situation, see 10.4.7, “The ‘Cluster services are not active’ message” on page 323. 160 IBM PowerHA SystemMirror 7.1 for AIX After you upgrade PowerHA, if you run the clmigcheck program again, you see an error message similar to the one shown in Figure 7-12. The message indicates that all migration steps for this node of the cluster have been completed. ERROR: This program is intended for PowerHA configurations prior to version 7.1 The version currently installed appears to be: 7.1.0 Figure 7-12 clmigcheck panel after PowerHA has been installed on a node. Figure 7-13 shows an extract from the /tmp/clmigcheck/clmigcheck.log file that was taken when the clmigcheck command ran on the last node in a three-node cluster migration. This file shows the output by the clmigcheck program when checking whether this node is the last node of the cluster. ck_lastnode: Getting version of cluster.es.server.rte on node chile ck_lastnode: lslpp from node (chile) is /etc/objrepos:cluster.es.server.rte:7.1. 0.1::COMMITTED:F:Base Server Runtime: ck_lastnode: cluster.es.server.rte on node chile is (7.1.0.1) ck_lastnode: Getting version of cluster.es.server.rte on node serbia ck_lastnode: lslpp from node (serbia) is /etc/objrepos:cluster.es.server.rte:7.1 .0.1::COMMITTED:F:Base Server Runtime: ck_lastnode: cluster.es.server.rte on node serbia is (7.1.0.1) ck_lastnode: Getting version of cluster.es.server.rte on node scotland ck_lastnode: lslpp from node (scotland) is /etc/objrepos:cluster.es.server.rte:6 .1.0.2::COMMITTED:F:ES Base Server Runtime: ck_lastnode: cluster.es.server.rte on node scotland is (6.1.0.2) ck_lastnode: oldnodes = 1 ck_lastnode: This is the last node to run clmigcheck. clmigcheck: This is the last node to run clmigcheck, create the CAA cluster Figure 7-13 Extract from clmigcheck.log file showing the lslpp last node checking 7.3 Snapshot migration To illustrate a snapshot migration, the environment in this scenario entails a two-node AIX 6.1.3 and PowerHA 5.5 SP4 cluster being migrated to AIX 6.1 TL6 and PowerHA 7.1 SP1. The nodes are IBM POWER6® 550 systems and configured as VIO client partitions. 
Virtual devices are used for network and storage configuration. Chapter 7. Migrating to PowerHA 7.1 161 The network topology consists of one IP network and one non-IP network, which is the disk heartbeat network. The initial IPAT method is IPAT via replacement, which must be changed before starting the migration, because PowerHA 7.1 only supports IPAT via aliasing. Also the environment has one resource group that includes one service IP, two volume groups, and application monitoring. This environment also has an IBM HTTP server as the application. Figure 7-14 shows the relevant resource group settings. Resource Group Name Participating Node Name(s) Startup Policy Fallover Policy Fallback Policy Site Relationship Node Priority Service IP Label Volume Groups testrg algeria brazil Online On Home Node Only Fallover To Next Priority Node Never Fallback ignore algeria_svc algeria_vg brazil_vg Figure 7-14 Cluster resource group configuration using snapshot migration 7.3.1 Overview of the migration process A major difference from previous migration versions is the clmigcheck script, which is mandatory for the migration procedure. As stated in 1.2, “Cluster Aware AIX” on page 7, PowerHA 7.1 uses CAA for monitoring and event management. By running the clmigcheck script (option 3), you can specify a repository disk and a multicast address, which are required for the CAA service. The snapshot migration method requires all cluster nodes to be offline for some time. It requires removing previous versions of PowerHA and installing AIX 6.1 TL6 or later and the new version of PowerHA 7.1. In this scenario, to begin, PowerHA 5.5 SP4 is on AIX 6.1.3 and migrated to PowerHA 7.1 SP1 on AIX 6.1 TL6. The network topology consists of one IP network using IPAT via replacement and the disk heartbeat network. Both of these network types are no longer supported. However, if you have an IPAT via replacement configuration, the clmigcheck script generates an error message as shown in Figure 7-15. You must remove this configuration to proceed with the migration. ------------[ PowerHA SystemMirror Migration Check ]------------CONFIG-ERROR: The configuration contains unsupported options: IP Address Takeover via Replacement. The PowerHA network name is net_ether_01. This will have to be removed from the configuration before migration to PowerHA SystemMirror Hit <Enter> to continue Figure 7-15 The clmigcheck error message for IPAT via replacement IPAT via replacement configuration: If your cluster has an IPAT via replacement configuration, remove or change to the IPAT via alias method before starting the migration. 162 IBM PowerHA SystemMirror 7.1 for AIX 7.3.2 Performing a snapshot migration The next steps are followed to migrate the cluster. Creating a snapshot Create a snapshot by entering the smit cm_add_snap.dialog command while your cluster is running. Stopping the cluster Run the smit clstop command on all nodes to take down the cluster. Ensure that the cluster is down by using the lssrc -ls clstrmgrES command (Figure 7-16) for each node. # lssrc -ls clstrmgrES Current state: ST_INIT sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1 0:34:17-05:00$" Figure 7-16 The lssrc -ls clstrmgrES command to ensure that each cluster is down Installing AIX 6.1.6 and clmigcheck To install AIX 6.1.6 and the clmigcheck program, follow these steps: 1. By using the AIX 6.1.6 installation media or TL6 updates, perform a smitty update_all. 2. 
After updating AIX, check whether the bos.cluster and bos.ahafs file sets are correctly installed as shown in Figure 7-17. These two file sets are new for the CAA services. You might need to install them separately. brazil:/ # lslpp -l |grep bos.cluster bos.cluster.rte 6.1.6.1 APPLIED bos.cluster.solid 6.1.6.1 APPLIED bos.cluster.rte 6.1.6.1 APPLIED bos.cluster.solid 6.1.6.0 COMMITTED brazil:/ # Cluster Aware AIX POWER HA Business Resiliency Cluster Aware AIX POWER HA Business Resiliency Figure 7-17 Verifying additional required file sets The clcomd subsystem is now part of AIX and requires the fully qualified host names of all nodes in the cluster to be listed in the /etc/cluster/rhosts file. Because AIX was updated, a restart is required. 3. Because you updated the AIX image, restart the system before you continue with the next step. After restarting the system, you can see the clcomd subsystem from the caa subsystem group that is up and running. The clcomdES daemon, which is part of PowerHA, is also running as shown in Figure 7-18. algeria:/usr/es/sbin/cluster/etc # lssrc -a|grep com clcomd caa 4128960 active clcomdES clcomdES 2818102 active algeria:/usr/es/sbin/cluster/etc # Figure 7-18 Two clcomd daemons exist Chapter 7. Migrating to PowerHA 7.1 163 Now AIX 6.1.6 is installed and you ready for the clmigcheck step. 4. Run the clmigcheck command on the first node (algeria). Figure 7-19 shows the clmigcheck menu. ------------[ PowerHA SystemMirror Migration Check ]------------Please select one of the following options: 1 = Check ODM configuration. 2 = Check snapshot configuration. 3 = Enter repository disk and multicast IP addresses. Select one of the above,"x"to exit or "h" for help: Figure 7-19 Options on the clmigcheck menu The clmigcheck menu options: In the clmigcheck menu, option 1 and 2 review the cluster configurations. Option 3 gathers information that is necessary to create the CAA cluster during its execution on the last node of the cluster. In option 3, you define a cluster repository disk and multicast IP address. Selecting option 3 means that you are ready to start the migration. In option 3 of the clmigcheck menu, you select two configurations: The disk to use for the repository The multicast address for internal cluster communication Option 2: Checking the snapshot configuration When you choose option 2 from the clmigcheck menu, a prompt is displayed for you to provide the snapshot file name. The clmigcheck review specifies the snapshot file and shows an error or warning message if any unsupported elements are discovered. 164 IBM PowerHA SystemMirror 7.1 for AIX In the test environment, a disk heartbeat network is not supported in PowerHA 7.1. The warning message from clmigcheck is for the disk heartbeat configuration as Figure 7-20 shows. ------------[ PowerHA SystemMirror Migration Check ]------------h = help Enter snapshot name (in /usr/es/sbin/cluster/snapshots): snapshot_mig clsnapshot: Removing any existing temporary HACMP ODM entries... clsnapshot: Creating temporary HACMP ODM object classes... clsnapshot: Adding HACMP ODM entries to a temporary directory.. clsnapshot: Succeeded generating temporary ODM containing Cluster Snapshot: snapshot_mig ------------[ PowerHA SystemMirror Migration Check ]------------CONFIG-WARNING: The configuration contains unsupported hardware: Disk Heartbeat network. The PowerHA network name is net_diskhb_01. This will be removed from the configuration during the migration to PowerHA SystemMirror 7.1. 
Hit <Enter> to continue Figure 7-20 The clmigcheck warning message for a disk heartbeat configuration Figure 7-20 shows the warning message “This will be removed from the configuration during the migration”. Because it is only a warning message, you can continue with the migration. After completing the migration, verify that the disk heartbeat is removed. When option 2 of clmigcheck is completed without error, proceed with option 3 as shown in Figure 7-21. ------------[ PowerHA SystemMirror Migration Check ]------------The ODM has no unsupported elements. Hit <Enter> to continue Figure 7-21 clmigcheck passed for snapshot configurations Chapter 7. Migrating to PowerHA 7.1 165 Option 3: Entering the repository disk and multicast IP addresses In option 3, clmigcheck lists all shared disks on both nodes. In this scenario, hdisk1 is specified as the repository disk as shown in Figure 7-22. ------------[ PowerHA SystemMirror Migration Check ]------------Select the disk to use for the repository 1 2 3 4 5 = = = = = 000fe4114cf8d1ce(hdisk1) 000fe4114cf8d3a1(hdisk4) 000fe4114cf8d441(hdisk5) 000fe4114cf8d4d5(hdisk6) 000fe4114cf8d579(hdisk7) Select one of the above or "x" to exit: 1 Figure 7-22 Selecting the repository disk You can create a NULL entry for the multicast address. Then, AIX generates one such address as shown in Figure 7-23. Keep this value as the default so that AIX can generate the multicast address. ------------[ PowerHA SystemMirror Migration Check ]------------PowerHA SystemMirror uses multicast address for internal cluster communication and monitoring. These must be in the multicast range, 224.0.0.0 - 239.255.255.255. If you make a NULL entry, AIX will generate an appropriate address for you. You should only specify an address if you have an explicit reason to do so, but are cautioned that this address cannot be changed once the configuration is activated (i.e. migration is complete). h = help Enter the multicast IP address to use for network monitoring: Figure 7-23 Defining a multicast address 166 IBM PowerHA SystemMirror 7.1 for AIX The clmigcheck process is logged in the /tmp/clmigcheck/clmigcheck.log file (Figure 7-24). validate_disks: No sites, only one repository disk needed. validate_disks: Disk 000fe4114cf8d1ce exists prompt_mcast: Called prompt_mcast: User entered: validate_mcast: Called write_file: Called write_file: Copying /tmp/clmigcheck/clmigcheck.txt to algeria:/var/clmigcheck/clmigcheck.txt write_file: Copying /tmp/clmigcheck/clmigcheck.txt to brazil:/var/clmigcheck/clmigcheck.txt Figure 7-24 /tmp/clmigcheck/clmigcheck.log The completed clmigcheck program When the clmigcheck program is completed, it creates a /var/clmigcheck/clmigcheck.txt file on each node of the cluster. The text file contains a PVID of the repository disk and the multicast address for the CAA cluster as shown in Figure 7-25. # cat /var/clmigcheck/clmigcheck.txt CLUSTER_TYPE:STANDARD CLUSTER_REPOSITORY_DISK:000fe4114cf8d1ce CLUSTER_MULTICAST:NULL Figure 7-25 The /var/clmigcheck/clmigcheck.txt file When PowerHA 7.1 is installed, this information is used to create the HACMPsircol.odm file as shown in Figure 7-26. This file is created when you finish restoring the snapshot in this scenario. algeria:/ # odmget HACMPsircol HACMPsircol: name = "canada_cluster_sircol" id = 0 uuid = "0" repository = "000fe4114cf8d1ce" ip_address = "" nodelist = "brazil,algeria" backup_repository1 = "" backup_repository2 = "" algeria:/ # Figure 7-26 The HACMPsircol.odm file Chapter 7. 
Migrating to PowerHA 7.1 167 Running clmigcheck on one node: Compared to the rolling migration method, the snapshot migration method entails running the clmigcheck command on one node. Do not run the clmigcheck command on another node while you are doing a snapshot migration or the migration will fail. If you run the clmigcheck command on every node, the CAA cluster is created upon executing the clmigcheck command on the last node and goes into the rolling migration phase. Uninstalling PowerHA SystemMirror 5.5 To uninstall PowerHA SystemMirror 5.5, follow these steps: 1. Run smit install_remove and specify cluster.* from all nodes. Verify this step by running the following command to show that all PowerHA file sets are removed: lslpp -l cluster.* 2. Install PowerHA 7.1 by using the following command: smit install_all 3. Verify that the file sets are installed correctly: lslpp -l cluster.* After you install the new PowerHA 7.1 file sets, you can see that the clcomdES daemon has disappeared. You now have the clcomd daemon, which is part of CAA, instead of the clcomdES daemon. Updating the /etc/cluster/rhosts file After you complete the installation of PowerHA 7.1, update the /etc/cluster/rhosts file: 1. Update the /etc/cluster/rhosts file with the fully qualified domain name of each node in the cluster. (For example, you might use the output from the hostname command). 2. Restart the clcomd subsystem as shown in Figure 7-27. algeria:/ # stopsrc -s clcomd 0513-044 The clcomd Subsystem was requested to stop. algeria:/ # startsrc -s clcomd 0513-059 The clcomd Subsystem has been started. Subsystem PID is 12255420. algeria:/ # Figure 7-27 Restarting the clcomd subsystem on both nodes 3. Stop and start the clcomd daemon instead by using the following command: refresh -s clcomd 4. To verify that the clcomd subsystem is working, use the clrsh command. If it does not work, correct any problems before proceeding as explained in Chapter 10, “Troubleshooting PowerHA 7.1” on page 305. Converting the snapshot Now convert the snapshot from PowerHA 5.5. On PowerHA 7.1, run the clconvert_snapshot command before you restore it. (In some older versions of PowerHA, you do not need to run this command.) While converting the snapshot, the clconvert_snapshot command refers to the /var/clmigcheck/clmigcheck.txt file and adds the HACMPsircol stanza with the repository disk and multicast address, which are newly introduced in PowerHA 7.1. After you 168 IBM PowerHA SystemMirror 7.1 for AIX restore the snapshot, you can see that the HACMPsircol ODM contains this information as illustrated in Figure 7-26 on page 167. Restoring a snapshot To restore a snapshot, follow the path smitty hacmp Cluster Nodes and Networks Manage the Cluster Snapshot Configuration Restore the Cluster Configuration From a Snapshot for restoring a snapshot. Failure to restore a snapshot When you restore the snapshot with the default option, an error message about clcomd communication is displayed. Because there is no configuration, the snapshot fails at the communication_check function in the clsnapshot program as shown in Figure 7-28. cllsnode: Error reading configuration /usr/es/sbin/cluster/utilities/clsnapshot[2127]: apply_CS[116]: communication_check: line 49: local: not found Warning: unable to verify inbound clcomd communication from node "algeria" to the local node, "". 
/usr/es/sbin/cluster/utilities/clsnapshot[2127]: apply_CS[116]: communication_check: line 49: local: not fou nd Warning: unable to verify inbound clcomd communication from node "brazil" to the local node, "". clsnapshot: Verifying configuration using Cannot get local HACMPnode ODM. Cannot get local HACMPnode ODM. FATAL ERROR: CA_invoke_client nodecompath FATAL ERROR: CA_invoke_client nodecompath FATAL ERROR: CA_invoke_client nodecompath FATAL ERROR: CA_invoke_client nodecompath temporary PowerHA SystemMirror ODM entries... == == == == NULL! NULL! NULL! NULL! @ @ @ @ line: line: line: line: of of of of file: file: file: file: clver_ca_main.c clver_ca_main.c clver_ca_main.c clver_ca_main.c Figure 7-28 A failed snapshot restoration Chapter 7. Migrating to PowerHA 7.1 169 If you are at PowerHA 7.1 SP2, you should not see the failure message. However, some error messages concern the disk heartbeat network (Figure 7-29), which is not supported in PowerHA 7.1. You can ignore this error message. clsnapshot: Removing any existing temporary PowerHA SystemMirror ODM entries... clsnapshot: Creating temporary PowerHA SystemMirror ODM object classes... clsnapshot: Adding PowerHA SystemMirror ODM entries to a temporary directory..ODMDIR set to /tmp/snapshot Error: Network's network type diskhb is not known. Error: Interface/Label's network type diskhb is not known. cllsclstr: Error reading configuration Error: Network's network type diskhb is not known. Error: Interface/Label's network type diskhb is not known. cllsnode: Error reading configuration clodmget: Could not retrieve object for HACMPnode, odm errno 5904 /usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]: communication_check: line 52: local: not found Warning: unable to verify inbound clcomd communication from node "algeria" to the local node, " ". /usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]: communication_check: line 52: local: not found Warning: unable to verify inbound clcomd communication from node "brazil" to the local node, "" Figure 7-29 The snapshot restoring the error with the new clsnapshot command When you finish restoring the snapshot, the CAA cluster is created based on the repository disk and multicast address based in the /var/clmigcheck/clmigcheck.txt file. Sometimes the synchronization or verification fails because the snapshot cannot create the CAA cluster. If you see an error message similar to the one shown in Figure 7-30, look in the /var/adm/ras/syslog.caa file and correct the problem. ERROR: Problems encountered creating the cluster in AIX. Use the syslog facility to see output from the mkcluster command. ERROR: Creating the cluster in AIX failed. Check output for errors in local cluster configuration, correct them, and try synchronization again. ERROR: Updating the cluster in AIX failed. Check output for errors in local cluster configuration, correct them, and try synchronization again. cldare: Error detected during synchronization. Figure 7-30 Failure of CAA creation during synchronization or verification 170 IBM PowerHA SystemMirror 7.1 for AIX Figure 7-30 on page 170 shows a sample CAA creation failure, which is a clrepos_private1 file system mount point that is used for the CAA service. Assuming you have enabled syslog, you can easily find it in the syslog.caa file, which you can find by searching on “odmadd HACMPsircol.add.” After completing all the steps, check the CAA cluster configuration and status on both nodes. 
First, the caavg_private volume group is created and varied on as shown in Figure 7-31. algeria:/ # lspv hdisk2 000fe4114cf8d258 hdisk3 000fe4114cf8d2ec hdisk8 000fe4114cf8d608 caa_private0 000fe40120e16405 hdisk0 000fe4113f087018 algeria:/ # algeria_vg brazil_vg diskhb caavg_private rootvg active active Figure 7-31 The caavg_private volume group varied on Chapter 7. Migrating to PowerHA 7.1 171 From the lscluster command, you can see information about the CAA cluster including the repository disk, the multicast address, and so on, as shown in Figure 7-32. algeria:/ # lscluster -m Calling node query for all nodes Node query number of nodes examined: 2 Node name: algeria Cluster shorthand id for node: 1 uuid for node: 0410c158-c6ca-11df-88bc-c21e45bc6603 State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID canada_cluster local e8fbea82-c6c9-11df-b8d6-c21e4a9e5103 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a -----------------------------Node name: brazil Cluster shorthand id for node: 2 uuid for node: e8ff0dde-c6c9-11df-b8d6-c21e4a9e5103 State of node: UP Smoothed rtt to node: 7 Mean Deviation in network rtt to node: 3 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID canada_cluster local e8fbea82-c6c9-11df-b8d6-c21e4a9e5103 Number of points_of_contact for node: 2 Point-of-contact interface & contact state en1 UP en0 UP algeria:/mnt/HA71 # lscluster -c Cluster query for cluster canada_cluster returns: Cluster uuid: e8fbea82-c6c9-11df-b8d6-c21e4a9e5103 Number of nodes in cluster = 2 Cluster id for node algeria is 1 Primary IP address for node algeria is 192.168.101.101 Cluster id for node brazil is 2 Primary IP address for node brazil is 192.168.101.102 Number of disks in cluster = 0 Multicast address for cluster is 228.168.101.102 algeria:/mnt/HA71 # Figure 7-32 The lscluster command after creating the CAA cluster 172 IBM PowerHA SystemMirror 7.1 for AIX You can also check whether the multicast address is correctly defined for each interface by running the netstat -a -I en0 command as shown in Figure 7-33. algeria:/ # netstat -a -I en0 Name Mtu Network Address en0 1500 link#2 c2.1e.45.bc.66.3 01:00:5e:28:65:65 01:00:5e:7f:ff:fd 01:00:5e:00:00:01 en0 1500 192.168.100 algeria 228.168.101.101 239.255.255.253 224.0.0.1 en0 1500 10.168.100 algeria_svc 228.168.101.101 239.255.255.253 224.0.0.1 Ipkts Ierrs 1407667 0 Opkts Oerrs Coll 1034372 0 0 1407667 0 1034372 0 0 1407667 0 1034372 0 0 algeria:/ # netstat -a -I en1 Name Mtu Network Address Ipkts Ierrs en1 1500 link#3 c2.1e.45.bc.66.4 390595 0 01:00:5e:28:65:65 01:00:5e:7f:ff:fd 01:00:5e:00:00:01 en1 1500 192.168.200 algeria_boot 390595 0 228.168.101.101 239.255.255.253 224.0.0.1 Opkts Oerrs Coll 23 0 0 23 0 0 Figure 7-33 The multicast address for CAA service After the clmigcheck command is done running, you can remove the older version of PowerHA and install PowerHA 7.1. Optional: Adding a shared disk to the CAA services After the migration, the shared volume group is not included in the CAA service as shown in Figure 7-34. # lspv caa_private0 hdisk2 hdisk3 hdisk0 # 000fe40120e16405 000fe4114cf8d258 000fe4114cf8d2ec 000fe4113f087018 caavg_private algeria_vg brazil_vg rootvg active active Figure 7-34 The lspv output after restoring the snapshot Chapter 7. 
Migrating to PowerHA 7.1 173 To add the shared volume group disks to the CAA service, run the following command: chcluster -n <cluster_name> -d +hdiskX, hdiskY where: <cluster_name> +hdiskX hdsiskY is canada_cluster. is +hdisk2. is hdisk3. The two shared disks are now included in the CAA shared disk as shown in Figure 7-35. algeria: # chcluster -n canada_cluster -d +hdisk2,hdisk3 chcluster: Cluster shared disks are automatically renamed to names such as cldisk1, [cldisk2, ...] on all cluster nodes. However, this cannot take place while a disk is busy or on a node which is down or not reachable. If any disks cannot be renamed now, they will be renamed later by the clconfd daemon, when the node is available and the disks are not busy. algeria: # Figure 7-35 Using the chcluster command for shared disks Now hdisk2 and hdisk3 are changed to cldisk. The hdisk name from the lspv command shows the cldiskX instead of the hdiskX as shown in Figure 7-36. algeria:/ # lspv caa_private0 000fe40120e16405 cldisk1 000fe4114cf8d258 cldisk2 000fe4114cf8d2ec hdisk8 000fe4114cf8d608 hdisk0 000fe4113f087018 algeria:/ # caavg_private algeria_vg brazil_vg diskhb rootvg active active Figure 7-36 The lspv command showing cldisks for shared disks When you use the lscluster command to perform the check, you can see that the shared disks (cldisk1 and cldisk2) are monitored by the CAA service. Keep in mind that two types of disks are in CAA. One type is the repository disk that is shown as REPDISK, and the other type is the shared disk that is shown as CLUSDISK. See Figure 7-37 on page 175. 174 IBM PowerHA SystemMirror 7.1 for AIX algeria:/ # lscluster -d Storage Interface Query Cluster Name: canada_cluster Cluster uuid: 97833c9e-c5b8-11df-be00-c21e45bc6603 Number of nodes reporting = 2 Number of nodes expected = 2 Node algeria Node uuid = 88cff8be-c58f-11df-95ab-c21e45bc6604 Number of disk discovered = 3 cldisk2 state : UP uDid : 533E3E213600A0B80001146320000F1A74C18BDAA0F1815 FAStT03IBMfcp05VDASD03AIXvscsi uUid : 600a0b80-0011-4632-0000-f1a74c18bdaa type : CLUSDISK cldisk1 state : UP uDid : 533E3E213600A0B8000291B080000D3CB053B7EA60F1815 FAStT03IBMfcp05VDASD03AIXvscsi uUid : 600a0b80-0029-1b08-0000-d3cb053b7ea6 type : CLUSDISK caa_private0 state : UP uDid : uUid : 600a0b80-0029-1b08-0000-d3cd053b7f0d type : REPDISK Node Node uuid = 00000000-0000-0000-0000-000000000000 Number of disk discovered = 0 algeria:/ # Figure 7-37 The shared disks monitored by the CAA service Verifying the cluster To verify the snapshot migration, check the components shown in Table 7-2 on each node. Table 7-2 Components to verify after the snapshot migration Component Command The CAA services are active. lssrc -g caa lscluster -m The RSCT services are active. lssrc -s cthags Start the cluster service one by one. smitty clstart Chapter 7. Migrating to PowerHA 7.1 175 7.3.3 Checklist for performing a snapshot migration Because the entire migration can be confusing, Table 7-3 provides a step-by-step checklist for the snapshot migration of each node in the cluster. Table 7-3 Checklist for performing a snapshot migration Step Node 1 Node 2 0 Ensure that the cluster is running. Ensure that the cluster is running. 1 Create a snapshot. 2 Stop the cluster. Stop the cluster. lssrc -ls clstrmgrES 3 Update AIX 6.1.6. Update AIX 6.1.6. oslevel -s install bos.cluster and bos.ahafs filesets 4 Restart the system. Restart the system. 5 Select option 2 from the clmigcheck menu. Check for unsupported configurations. 6 Select option 3 from the clmigcheck menu. 
/var/clmigcheck/clgmicheck.txt 7 Remove PowerHA 5.5 and install PowerHA 7.1. 8 Convert the snapshot. 9 Restore the snapshot. 10 Start the cluster. 11 Remove PowerHA 5.5 and install PowerHA 7.1. Check lslpp -l | grep cluster clconvert_snapshot lssrc -ls clstrmgrES, hacmp.out Start the cluster. lssrc -ls clstrmgrES, hacmp.out 7.3.4 Summary A snapshot migration to PowerHA 7.1 entails running the clmigcheck program. Before you begin the migration, you must prepare for it by installing AIX 6.1.6 or later and checking if any part of the configuration is unsupported. Then you run the clmigcheck command to review your PowerHA configuration and verify that is works with PowerHA 7.1. After verifying the configuration, you specify a repository disk and multicast address for the CAA service, which are essential components for the CAA service. After you successfully complete the clmigcheck procedure, you can install PowerHA 7.1. The CAA service is made while you restore your snapshot. PowerHA 7.1 uses the newly configured CAA service for event monitoring and heartbeating. 176 IBM PowerHA SystemMirror 7.1 for AIX 7.4 Rolling migration This section explains how to perform a three-node rolling migration of AIX and PowerHA. The test environment begins with PowerHA 6.1 SP3 and AIX 6.1 TL3 versions. The step-by-step instructions in this topic explain how to perform a three-node rolling migration of AIX to 6.1 TL6 and PowerHA to 7.1 SP1 versions as illustrated in Figure 7-38. Figure 7-38 Three-node cluster before migration The cluster is using virtualized resources provided by VIOS for network and storage. Rootvg (hdisk0) is also hosted from the VIOS. The backing devices are provided from a DS4800 storage system. The network topology is configured as IPAT via aliasing. Also disk heartbeating is used over the shared storage between all the nodes. The cluster contains two resource groups: newyork_rg and test_rg. The newyork_rg resource group hosts the IBM HTTP Server application, and the test_rg resource group hosts a test script application. The node priority for newyork_rg is node chile, and test_rg is node serbia. Node scotland is running in a standby node capacity. Chapter 7. Migrating to PowerHA 7.1 177 Figure 7-39 shows the relevant attributes of the newyork_rg and test_rg resource groups. Resource Group Name Participating Node Name(s) Startup Policy Fallover Policy Fallback Policy Volume Groups Application Servers newyork_rg chile scotland serbia Online On Home Node Only Fallover To Next Priority Node Never Fallback ny_datavg httpd_app Resource Group Name Participating Node Name(s) Startup Policy Fallover Policy Fallback Policy Application Servers test_app_rg serbia chile scotland Online On Home Node Only Fallover To Next Priority Node Fallback To Higher Priority Node test_app Figure 7-39 Three-node cluster resource groups 7.4.1 Planning Before beginning a rolling migration, you must properly plan to ensure that you are ready to proceed. For more information, see 7.1, “Considerations before migrating” on page 152. The migration to PowerHA 7.1 is different from previous releases, because of the support for CAA integration. Therefore, see also 7.2, “Understanding the PowerHA 7.1 migration process” on page 153. Ensure that the cluster is stable on all nodes and is synchronized. 
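A quick way to confirm this from any node is shown in the following sketch, which only queries the cluster state. Expect every resource group to be online on its home node and the cluster manager to report ST_STABLE:

clRGinfo                                      # resource group states across all nodes
lssrc -ls clstrmgrES | grep "Current state"   # expect ST_STABLE on each node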
With a rolling migration, you must be aware of the following restrictions while performing the migration, because a mixed-software-version cluster is involved: Do not perform synchronization or verification while a mixed-software-version cluster exists. Such actions are not allowed in this case. Do not make any cluster configuration changes. Do not perform a Cluster Single Point Of Control (C-SPOC) operation while a mixed-software-version cluster exists. Such action is not allowed in this case. Try to perform the migration during one maintenance period, and do not leave your cluster in a mixed state for any significant length of time. 7.4.2 Performing a rolling migration In this example, a two-phase migration is performed in which you migrate AIX from version 6.1 TL3 to version 6.1 TL6, restart the system, and then migrate PowerHA. You perform this migration on one node at a time, ensuring that any resource group that the node is hosting is moved to another node first. 178 IBM PowerHA SystemMirror 7.1 for AIX Migrating the first node Figure 7-40 shows the cluster before upgrading AIX. Figure 7-40 Rolling migration: Scotland before the AIX upgrade To migrate the first node, follow these steps: 1. Shut down PowerHA services on the standby node (scotland). Specify the smitty clstop command to stop this node. Because this node is a standby node, no resource groups are hosted. Therefore, you do not need to perform any resource group operations first. Ensure that cluster services are stopped by running the following command: lssrc -ls clstrmgres Look for the ST_INIT status, which indicates that cluster services on this node are in a stopped state. 2. Update AIX to version 6.1 TL6 (scotland node). To perform this task, run the smitty update_all command by using the TL6 images, which you can download by going to: http://www.ibm.com/support/entry/portal/Downloads/IBM_Operating_Systems/AIX CAA-specific file sets: You must install the CAA specific bos.cluster and bos.ahafs file sets because update_all does not install them. After you complete the installation, restart the node. Chapter 7. Migrating to PowerHA 7.1 179 When AIX is upgraded, you are at the stage shown in Figure 7-41. Figure 7-41 Rolling migration: Scotland post AIX upgrade 3. Decide which shared disk you to use for the CAA private repository (scotland node). See 7.1, “Considerations before migrating” on page 152, for more information. Previous volume disk group: The disk must be a clean logical unit number (LUN) that does not contain a previous volume group. If you have a previous volume group on this disk, you must remove it. See 10.4.5, “Volume group name already in use” on page 320. 4. Run the clmigcheck command on the first node (scotland). You have now upgraded AIX to a CAA version and chosen the CAA disk. When you start the clmigcheck command, you see the panel shown in Figure 7-42 on page 181. For more information about the clmigcheck command, see 7.2, “Understanding the PowerHA 7.1 migration process” on page 153. 180 IBM PowerHA SystemMirror 7.1 for AIX ------------[ PowerHA SystemMirror Migration Check ]------------Please select one of the following options: 1 = Check ODM configuration. 2 = Check snapshot configuration. 3 = Enter repository disk and multicast IP addresses. Select one of the above,"x"to exit or "h" for help: Figure 7-42 Running the clmigcheck command first during a rolling migration a. Select option 1 (Check the ODM configuration). 
When choosing this option, the clmigcheck command checks your configuration and reports any problems that cannot be migrated. This migration scenario uses disk-based heartbeating. The clmigcheck command detects this method and shows a message similar to the one in Figure 7-43, indicating that this configuration will be removed during migration. ------------[ PowerHA SystemMirror Migration Check ]------------CONFIG-WARNING: The configuration contains unsupported hardware: Disk Heartbeat network. The PowerHA network name is net_diskhb_01. This will be removed from the configuration during the migration to PowerHA SystemMirror 7.1. Hit <Enter> to continue Figure 7-43 The disk heartbeat warning message from the clmigcheck command You do not need to take any action because the disk-based heartbeating is automatically removed during migration. Because three disk heartbeat networks are in the configuration, this warning message is displayed three times, once for each network. If no errors are detected, you see the message shown in Figure 7-44. ------------[ PowerHA SystemMirror Migration Check ]------------The ODM has no unsupported elements. Hit <Enter> to continue Figure 7-44 ODM no unsupported elements message Press Enter after this last panel, and you return to the main menu. Chapter 7. Migrating to PowerHA 7.1 181 b. Select option 3 to enter the repository disk. As shown in Figure 7-45, in this scenario, we chose option 1 to use hdisk1 (PVID 000fe40120e16405). -----------[ PowerHA SystemMirror Migration Check ]------------Select the disk to use for the repository 1 2 3 4 5 6 = = = = = = 000fe40120e16405(hdisk1) 000fe4114cf8d258(hdisk2) 000fe4114cf8d2ec(hdisk3) 000fe4013560cc77(hdisk5) 000fe4114cf8d4d5(hdisk6) 000fe4114cf8d579(hdisk7) Select one of the above or "x" to exit: Figure 7-45 Choosing a CAA disk c. Enter the multicast address as shown in Figure 7-46. You can specify a multicast, or you can have clmigcheck automatically assign one. For more information about multicast addresses, see 1.3.1, “Communication interfaces” on page 13. Press Enter and you return to the main menu. ------------[ PowerHA SystemMirror Migration Check ]------------PowerHA SystemMirror uses multicast address for internal cluster communication and monitoring. These must be in the multicast range, 224.0.0.0 - 239.255.255.255. If you make a NULL entry, AIX will generate an appropriate address for you. You should only specify an address if you have an explicit reason to do so, but are cautioned that this address cannot be changed once the configuration is activated (i.e. migration is complete). h = help Enter the multicast IP address to use for network monitoring: Figure 7-46 Choosing a multicast address d. Exit the clmigcheck tool. 182 IBM PowerHA SystemMirror 7.1 for AIX 5. Verify whether you are ready for the PowerHA upgrade on the node scotland by running the clmigcheck tool again. If you are ready, you see the panel shown in Figure 7-47. ------------[ PowerHA SystemMirror Migration Check ]------------clmigcheck: This is not the first node or last node clmigcheck was run on. No further checking is required on this node. You can install the new version of PowerHA SystemMirror. Hit <Enter> to continue Figure 7-47 Verifying readiness for migration 6. Upgrade PowerHA on the scotland node to PowerHA 7.1 SP1. Because the cluster services are down, you can perform a smitty update_all to upgrade PowerHA. 7. When this process is complete, modify the new rhosts definition for CAA as shown in Figure 7-48. 
Although in this scenario, we used network addresses, you can also add the short name for the host name into rhosts considering that you configured the /etc/hosts file correctly. See “Creating a cluster with host names in the FQDN format” on page 75, for more information. /etc/cluster # cat rhosts 192.168.101.111 192.168.101.112 192.168.101.113 Figure 7-48 Extract showing the configured rhosts file Populating the /etc/cluster/rhosts file: The /etc/cluster/rhosts file must be populated with all cluster IP addresses before using PowerHA SystemMirror. This process was done automatically in previous releases but is now a required, manual process. The addresses that you enter in this file must include the addresses that resolve to the host name of the cluster nodes. If you update this file, you must refresh the clcomd subsystem by using the following command: refresh -s clcomd Restarting the cluster: You do not need to restart the cluster after you upgrade PowerHA. 8. Start PowerHA on the scotland node by issuing the smitty clstart command. The node should be able to rejoin the cluster. However, you receive warning messages about mixed versions of PowerHA. After PowerHA is started on this node, move any resource groups that the next node is hosting onto this node so that you can migrate the second node in the cluster. In this scenario, the serbia node is hosting the test_app_rg resource group. Therefore, we perform a resource group move request to move this resource to the newly migrated scotland node. The serbia node is then available to migrate. Chapter 7. Migrating to PowerHA 7.1 183 You have now completed the first node migration of the three-node cluster. You have rejoined the cluster and are now in a mixed version. Figure 7-49 shows the starting point for migrating the next node in the cluster, with the test_app_rg resource group moved to the newly migrated scotland node. Figure 7-49 Rolling migration: Scotland post HA upgrade 184 IBM PowerHA SystemMirror 7.1 for AIX Migrating the second node Figure 7-50 shows that you are ready to proceed with migration of the second node (serbia). Figure 7-50 Rolling migration: Serbia before the AIX upgrade To migrate the second node, follow these steps: 1. Shut down PowerHA services on the serbia node. You must stop cluster services on this node before you begin the migration. 2. Upgrade to AIX 6.1 TL6 (serbia node) similar to the process you used for the scotland node. After the update is complete, ensure that AIX is rebooted. Chapter 7. Migrating to PowerHA 7.1 185 You are now in the state as shown in Figure 7-51. Figure 7-51 Rolling migration: Serbia post AIX upgrade 3. Run the clmigcheck command to ensure that the migration worked and that you can proceed with the PowerHA upgrade. This step is important even though you have already performed the cluster configuration migration check and CAA configuration on the first node (scotland) is complete. Figure 7-52 shows the panel that you see now. ------------[ PowerHA SystemMirror Migration Check ]------------clmigcheck: This is not the first node or last node clmigcheck was run on. No further checking is required on this node. You can install the new version of PowerHA SystemMirror. Hit <Enter> to continue Figure 7-52 The clmigcheck panel on the second node 4. Upgrade PowerHA on the serbia node to PowerHA 7.1 SP1. Follow the same migration procedure as in the first node. Reminder: Update the /etc/cluster/rhosts file so that it is the same as the first node that you upgraded. See step 6 on page 183. 
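As a concrete illustration of that reminder, the following sketch updates the file on the newly upgraded node and refreshes clcomd so the change takes effect. The IP addresses are the boot addresses used in this scenario (see Figure 7-48); substitute the addresses that resolve to your own cluster host names.

# On serbia, make /etc/cluster/rhosts identical to the file created on scotland
cat > /etc/cluster/rhosts <<EOF
192.168.101.111
192.168.101.112
192.168.101.113
EOF

# Refresh the cluster communication daemon so that it rereads the file
refresh -s clcomd

# Confirm that clcomd is active
lssrc -s clcomd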
186 IBM PowerHA SystemMirror 7.1 for AIX 5. Start PowerHA on the serbia node and rejoin this node to the cluster. After this node is started, check and move the newyork_rg resource group from the chile node to the scotland node. By performing this task, you are ready to proceed with migration of the final node in the cluster (the chile node). At this stage, two of the three nodes in the cluster are migrated to AIX 6.1 TL6 and PowerHA 7.1. The chile node is the last node in the cluster to be upgraded. Figure 7-53 shows how the cluster looks now. Figure 7-53 Rolling migration: The serbia node post HA upgrade Chapter 7. Migrating to PowerHA 7.1 187 Migrating the final node Figure 7-54 shows that you are ready to proceed with migration of the final node of the chile cluster. The newyork_rg resource group has been moved to the scotland node and the cluster services are down and ready for the AIX migration. Figure 7-54 Rolling migration: The chile node before the AIX upgrade To migrate the final node, follow these steps: 1. Shut down PowerHA services on the chile node. 2. Upgrade to AIX 6.1 TL6 (chile node). Remember to reboot the node after the upgrade. Then run the clmigcheck command for the last time. When the clmigcheck command is run for the last time, it recognizes that this node is the last node of the cluster to migrate. This command then initiates the final phase of the migration, which configures CAA. You see the message shown in Figure 7-55. clmigcheck: You can install the new version of PowerHA SystemMirror. Figure 7-55 Final message from the clmigcheck command 188 IBM PowerHA SystemMirror 7.1 for AIX If a problem exists at this stage, you might see the message shown in Figure 7-56. chile:/ # clmigcheck Verifying clcomd communication, please be patient. clmigcheck: Running /usr/sbin/rsct/install/bin/ct_caa_set_disabled_for_migration on each node in the cluster Creating CAA cluster, please be patient. ERROR: Problems encountered creating the cluster in AIX. Use the syslog facility to see output from the mkcluster command. Figure 7-56 Error condition from clmigcheck If you see a message similar to the one shown in Figure 7-56, the final mkcluster phase has failed. For more information about this problem, see 10.2, “Troubleshooting the migration” on page 308. At this stage, you have upgraded AIX and run the final clmigcheck process. Figure 7-57 shows how the cluster looks now. Figure 7-57 Rolling migration: Chile post AIX upgrade Chapter 7. Migrating to PowerHA 7.1 189 3. Upgrade PowerHA on the chile node by following the same procedure that you previously used. Reminder: Update the /etc/cluster/rhosts file so that it is the same as the other nodes that you upgraded. See step 6 on page 183. In this scenario, you started PowerHA on the chile node and performed a synchronization or verification of the cluster, which is the final stage of the migration. The newyork_rg resource group was moved back to the chile node. The cluster migration is now completed. Figure 7-58 shows how the cluster looks now. Figure 7-58 Rolling migration completed 190 IBM PowerHA SystemMirror 7.1 for AIX 7.4.3 Checking your newly migrated cluster After the migration is completed, perform the following checks to ensure that everything has migrated correctly: Verify that CAA is configured and running on all nodes. Check that CAA is working by running the lscluster -m command. This command returns information about your cluster from all your nodes. 
If a problem exists, you see a message similar to the one shown in Figure 7-59. # lscluster -m Cluster services are not active. Figure 7-59 Message indicating that CAA is not running If you receive this message, see 10.4.7, “The ‘Cluster services are not active’ message” on page 323, for details about how to fix this problem. Verify that CAA private is defined and active on all nodes. Check the lspv output to ensure that the CAA repository is defined and varied on for each node. You see output similar to what is shown in Figure 7-60. chile:/ # lspv caa_private0 hdisk2 000fe40120e16405 000fe4114cf8d258 caavg_private None active Figure 7-60 Extract from lspv showing the CAA repository disk Check conversion of PowerHA ODM. Review the /tmp/clconvert.log file to ensure that the conversion of the PowerHA ODM has been successful. For additional details about the log files and troubleshooting information, see 10.1, “Locating the log files” on page 306. Synchronize or verify the cluster. Run verification on your cluster to ensure that it operates as expected. Troubleshooting: For information about common problems and solutions, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305. 7.5 Offline migration This section explains how to perform an offline migration. The test environment begins with AIX 6.1.3.2 and PowerHA 6.1.0.2. The migration leads to AIX 7.1.0.1 and PowerHA 7.1.0.1. 7.5.1 Planning the offline migration Part of planning for any migration is to ensure that you meet all the hardware and software requirements. For more details, see 7.1, “Considerations before migrating” on page 152, and 7.2, “Understanding the PowerHA 7.1 migration process” on page 153. Chapter 7. Migrating to PowerHA 7.1 191 Starting configuration Figure 7-61 on page 192 shows a simplified layout of the cluster that is migrated in this scenario. Both systems are running AIX 6.1 TL3 SP 2. The installed PowerHA version is 6.1 SP 2. The cluster layout is a mutual takeover configuration. The munich system is the primary server for the HTTP application. The berlin system is the primary server for the Network File System (NFS), which is cross mounted by the system munich. Because of resource limitations, the disk heartbeat is using one of the existing shared disks. Two networks are defined: The net_ether_01 network is the administrative network and is used only by the system administration team. The net_ether_10 network is used by the applications and its users. Figure 7-61 Start point for offline migration 192 IBM PowerHA SystemMirror 7.1 for AIX Planned target configuration The plan is to update both systems to AIX 7.1 and to PowerHA SystemMirror 7.1. Because PowerHA SystemMirror 6.1 SP2 is not supported on AIX 7.1, the quickest way to update it is through an offline migration. A rolling migration is also possible, but requires the following migration steps: 1. Update to PowerHA 6.1 SP3 or later (which can be performed by using a nondisruptive upgrade method). 2. Migrate to AIX 7.1. 3. Migrate to PowerHA 7.1. PowerHA 6.1 support on AIX 7.1: PowerHA 6.1 SP2 is not supported on AIX 7.1. You need a minimum of PowerHA 6.1 SP3. As mentioned in 1.2.3, “The central repository” on page 9, an additional shared disk is required for the new CAA repository disk. Figure 7-62 shows the results of the completed migration. To perform the migration, see 7.5.3, “Performing an offline migration” on page 195. Figure 7-62 Planned configuration for offline migration Chapter 7. 
Migrating to PowerHA 7.1 193 7.5.2 Offline migration flow Figure 7-63 shows a high-level overview of the offline migration flow. First and most importantly, you must have fulfilled all the new hardware requirements. Then you ensure that AIX has been upgraded on all cluster nodes before continuing with the update of PowerHA. To perform the migration, see 7.5.3, “Performing an offline migration” on page 195. Figure 7-63 Offline migration flow 194 IBM PowerHA SystemMirror 7.1 for AIX 7.5.3 Performing an offline migration Before you start the migration, you must complete all hardware and software requirements. For a list of the requirements, see 7.1, “Considerations before migrating” on page 152. 1. Create a snapshot and copy it to a safe place and create a system backup (mksysb). The snapshot and the mksysb are not required to complete the migration, but they might be helpful if something goes wrong. You can also use the snapshot file to perform a snapshot migration. You can use the system backup to re-install the system back to its original starting point if necessary. 2. Stop cluster services on all nodes by running the smitty clstop command. Before you continue, ensure that cluster services are stopped on all nodes. 3. Update to AIX 6.1.6 or later. Alternatively perform a migration installation of AIX to version 7.1. or later. In this test scenario, a migration installation to version 7.1 is performed on both systems in parallel. 4. Ensure that the new AIX cluster file sets are installed, specifically the bos.ahafs and bos.cluster file sets. These file sets are not installed as part of the AIX migration. 5. Restart the systems. Important: You must restart the systems to ensure that all needed processes for CAA are running. 6. Verify that the new clcomd subsystem is running. If the clcomd subsystem is not running, a required file set is missing (see step 4). Figure 7-64 shows an example of the output indicating that the subsystems are running. # lssrc -a | grep clcom clcomd caa clcomdES clcomdES # 3866824 5243068 active active Figure 7-64 Verifying if the clcomd subsystem is running Beginning with PowerHA 6.1 SP3 or later, you can start the cluster if preferred, but we do not start it now in this scenario. 7. Run the clmigcheck program on one of the cluster nodes. Important: You must run the clmigcheck program (in the /usr/sbin/ directory) before you install PowerHA 7.1. Keep in mind that you must run this program on each node one-at-a-time in the cluster. Chapter 7. Migrating to PowerHA 7.1 195 The following steps are required for offline migration when running the clmigcheck program. The steps might differ slightly if you perform a rolling or snapshot migration. a. Select option 1 (check ODM configuration) from the first clmigcheck panel (Figure 7-65). ------------[ PowerHA SystemMirror Migration Check ]------------Please select one of the following options: 1 = Check ODM configuration. 2 = Check snapshot configuration. 3 = Enter repository disk and multicast IP addresses. Select one of the above,"x"to exit or "h" for help: 1 Figure 7-65 The clmigcheck main panel While checking the configuration, you might see warning or error messages. You must correct errors manually, but can clean up issues identified by warning messages during the migration process. In this case, a warning message (Figure 7-66) is displayed indicating the disk heartbeat network will be removed at the end of the migration. 
------------[ PowerHA SystemMirror Migration Check ]------------CONFIG-WARNING: The configuration contains unsupported hardware: Disk Heartbeat network. The PowerHA network name is net_diskhb_01. This will be removed from the configuration during the migration to PowerHA SystemMirror 7.1. Hit <Enter> to continue Figure 7-66 Warning message after selecting clmigcheck option 1 b. Continue with the next clmigcheck panel. Only one error or warning is displayed at a time. Press the Enter key, and any additional messages are displayed. In this case, only one warning message is displayed. Manually correct or fix all issues that are identified by error messages before continuing with the process. After you fix an issue, restart the system as explained in step 5 on page 195. 196 IBM PowerHA SystemMirror 7.1 for AIX c. Verify that you receive a message similar to the one in Figure 7-67 indicating that ODM has no supported elements. You must receive this message before you continue with the clmigcheck process and the installation of PowerHA. ------------[ PowerHA SystemMirror Migration Check ]------------The ODM has no unsupported elements. Hit <Enter> to continue Figure 7-67 ODM check successful message Press Enter, and the main clmigcheck panel (Figure 7-65 on page 196) is displayed again. d. Select option 3 (Enter repository disk and multicast IP addresses). The next panel (Figure 7-68) lists all available shared disks that might be used for the CAA repository disk. You need one shared disk for the CAA repository. ------------[ PowerHA SystemMirror Migration Check ]------------Select the disk to use for the repository 1 = 00c0f6a01c784107(hdisk4) Select one of the above or "x" to exit: 1 Figure 7-68 Selecting the repository disk e. Configure the multicast address as shown in Figure 7-69 on page 198. The system automatically creates an appropriate address for you. By default, PowerHA creates a multicast address by replacing the first octet of the IP communication path of the lowest node in the cluster by 228. Press Enter. Manually specifying an address: Only specify an address manually if you have an explicit reason to do so. Important: You cannot change the selected IP multicast address after the configuration is activated. You must set up any routers in the network topology to forward multicast messages. Chapter 7. Migrating to PowerHA 7.1 197 ------------[ PowerHA SystemMirror Migration Check ]------------PowerHA SystemMirror uses multicast address for internal cluster communication and monitoring. These must be in the multicast range, 224.0.0.0 - 239.255.255.255. If you make a NULL entry, AIX will generate an appropriate address for you. You should only specify an address if you have an explicit reason to do so, but are cautioned that this address cannot be changed once the configuration is activated (i.e. migration is complete). h = help Enter the multicast IP address to use for network monitoring: Figure 7-69 Configuring a multicast address f. From the main clmigcheck panel, type an x to exit the clmigcheck program. g. In the next panel (Figure 7-70), confirm the exit request by typing y. ------------[ PowerHA SystemMirror Migration Check ]------------You have requested to exit clmigcheck. Do you really want to exit? (y) y Figure 7-70 The clmigcheck exit confirmation message A warning message (Figure 7-71) is displayed as a reminder to complete all the previous steps before you exit. 
Note - If you have not completed the input of repository disks and multicast IP addresses, you will not be able to install PowerHA SystemMirror Additional details for this session may be found in /tmp/clmigcheck/clmigcheck.log. Figure 7-71 The clmigcheck exit warning message 198 IBM PowerHA SystemMirror 7.1 for AIX 8. Install PowerHA only on the node where the clmigcheck program was executed. If the clmigcheck program is not run, a failure message (Figure 7-72) is displayed when you try to install PowerHA 7.1. In this case, return to step 7 on page 195. COMMAND STATUS Command: failed stdout: yes stderr: no Before command completion, additional instructions may appear below. [MORE...94] restricted by GSA ADP Schedule Contract with IBM Corp. . . . . . << End of copyright notice for cluster.es.migcheck >>. . . . The /usr/sbin/clmigcheck command must be run to verify the back level configuration before you can install this version. If you are not migrating the back level configuration you must remove it before before installing this version. Failed /usr/sbin/clmigcheck has not been run instal: Failed while executing the cluster.es.migcheck.pre_i script. [MORE...472] F1=Help F8=Image n=Find Next F2=Refresh F9=Shell F3=Cancel F10=Exit F6=Command /=Find Figure 7-72 PowerHA 7.1 installation failure message 9. Add the host names of your cluster nodes to the /etc/cluster/rhosts file. The names must match the PowerHA node names. 10.Refresh the clcomd subsystem. refresh -s clcomd 11.Review the /tmp/clconvert.log file to ensure that a conversion of the PowerHA ODMs has occurred. 12.Start cluster services only on the node that you updated by using smitty clstart. 13.Ensure that the cluster services have started successfully on this node by using any of the following commands:. clstat -a lssrc -ls clstrmgrES | grep state clmgr query cluster | grep STATE 14.Continue to the next node. 15.Run the clmigcheck program on this node. Keep in mind that you must run the clmigcheck program on each node before you can install PowerHA 7.1. Follow the same steps as for the first system as explained in step 7 on page 195. Chapter 7. Migrating to PowerHA 7.1 199 An error message similar to the one shown in Figure 7-73 indicates that one of the steps was not performed. Often this message is displayed because the system was not restarted after the installation of the AIX cluster file sets. To correct this issue, return to step 4 on page 195. You might have to restart both systems, depending on which part was missed. # clmigcheck Saving existing /tmp/clmigcheck/clmigcheck.log to /tmp/clmigcheck/clmigcheck.log.bak rshexec: cannot connect to node munich ERROR: Internode communication failed, check the clcomd.log file for more information. # Figure 7-73 The clmigcheck execution error message Attention: Do not start the clcomd subsystem manually. Starting this system manually can result in further errors, which might require you to re-install this node or all the cluster nodes. 16.Install PowerHA only on this node in the same way as you did on the first node. See step 8 on page 199. 17.As on the first node, add the host names of your cluster nodes to the /etc/cluster/rhosts file. The names must be the same as the node names. 18.Refresh the clcomd subsystem. 19.Start the cluster services only on the node that you updated. 20.Ensure that the cluster services started successfully on this node. 21.If you have more than two nodes in you cluster, repeat step 15 on page 199 through step 20 until all of your cluster nodes are updated. 
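Before moving on, it can help to run a short verification pass on each migrated node. The commands below are only a sketch that gathers the checks described in this chapter into one place; the expected results (ST_STABLE, caavg_private varied on, no disk heartbeat networks) assume the migration completed cleanly.

# Run on each node after the offline migration
lscluster -m                               # CAA node status; all nodes should report UP
lspv | grep caavg_private                  # repository disk defined and varied on
lssrc -ls clstrmgrES | grep -i state       # expect ST_STABLE after cluster services start
/usr/es/sbin/cluster/utilities/cltopinfo   # disk heartbeat (non-IP) networks should be gone
more /tmp/clconvert.log                    # confirm that the PowerHA ODM conversion succeeded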
You now have a fully running cluster environment. Before going into production mode, test your cluster as explained in Chapter 9, “Testing the PowerHA 7.1 cluster” on page 259. Upon checking the topology information by using the cltopinfo command, all non-IP and disk heartbeat networks should be removed. If these networks are not removed, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305. When checking the RSCT subsystems, the topology subsystem should now be inactive as shown in Figure 7-74. # lssrc -a | grep svcs grpsvcs grpsvcs emsvcs emsvcs topsvcs topsvcs grpglsm grpsvcs emaixos emsvcs Figure 7-74 Checking for topology service 200 IBM PowerHA SystemMirror 7.1 for AIX 6684834 5898390 active active inoperative inoperative inoperative 8 Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster Monitoring plays an important role in managing issues when a cluster has duplicated hardware that can “hide” the failing components from the user. It is also essential for tracking the behavior of a cluster and helping to address performance issues or bad design implementations. The role of the administrator is to quickly find relevant information and analyze it to make the best decision in every situation. This chapter provides several examples that show how the PowerHA 7.1 administrator can gather information about the cluster by using several methods. For most of the examples in this chapter, the korea cluster from the test environment is used with the participating seoul and busan nodes. All the commands in the examples are executed as root user. This chapter includes the following topics: Collecting information before a cluster is configured Collecting information after a cluster is configured Collecting information after a cluster is running © Copyright IBM Corp. 2011. All rights reserved. 201 8.1 Collecting information before a cluster is configured Before you configure the cluster, you must collect the relevant information. Later, the administrator can use this information to see the changes that have been made after a configured IBM PowerHA SystemMirror 7.1 for AIX cluster is running. Ensure that this information is available to assist in troubleshooting and diagnosing the cluster in the future. This topic lists the relevant information that you might want to collect. The /etc/hosts file The /etc/hosts file must have all the IP addresses that are used in the cluster configuration, including the boot or base addresses, persistent addresses, and service addresses, as shown in Example 8-1. Example 8-1 A /etc/hosts sample configuration seoul, busan:/ # egrep "seoul|busan|poksap" /etc/hosts 192.168.101.143 seoul-b1 # Boot IP label 1 192.168.101.144 busan-b1 # Boot IP label 1 192.168.201.143 seoul-b2 # Boot IP label 2 192.168.201.144 busan-b2 # Boot IP label 2 10.168.101.43 seoul # Persistent IP 10.168.101.44 busan # Persistent IP 10.168.101.143 poksap-db # Service IP label The /etc/cluster/rhosts file The /etc/cluster/rhosts file (Example 8-2) in PowerHA 7.1 replaces the /usr/es/sbin/cluster/etc/rhosts file. This file is populated with the communication paths used at the moment of the nodes definition. Example 8-2 A /etc/cluster/rhosts sample configuration seoul, busan:/ # cat /etc/cluster/rhosts seoul # Persistent IP address used as communication path busan # Persistent IP address used as communication path CAA subsystems Cluster Aware AIX (CAA) introduces a new set of subsystems. 
When the cluster is not running, its status is inactive, except for the clcomd subsystem, which is active (Example 8-3). The clcomdES subsystem has been replaced by the clcomd subsystem and is no longer part of the cluster subsystems group. It is now part of the AIX Base Operating System (BOS), not PowerHA. Example 8-3 CAA subsystems status seoul, busan:/ # lssrc -a | grep caa clcomd caa 5505056 cld caa clconfd caa active inoperative inoperative busan:/ # lslpp -w /usr/sbin/clcomd File Fileset Type ---------------------------------------------------------------------------/usr/sbin/clcomd bos.cluster.rte File 202 IBM PowerHA SystemMirror 7.1 for AIX PowerHA groups IBM PowerHA 7.1 creates two operating system groups during installation. The group numbers must be consistent across cluster nodes as shown in Example 8-4. Example 8-4 Groups created while installing PowerHA file sets seoul, busan:/ # grep ha /etc/group hacmp:!:202: haemrm:!:203: Disk configuration With the current code level in AIX 7.1.0.1, the CAA repository cannot be created over virtual SCSI (VSCSI) disks. For the korea cluster, a DS4800 storage system is used and is accessed over N_Port ID Virtualization (NPIV). The rootvg volume group is the only one using VSCSI devices. Example 8-5 shows a list of storage disks. Example 8-5 Storage disks listing seoul:/ # lspv hdisk0 00c0f6a088a155eb hdisk1 00c0f6a077839da7 hdisk2 00c0f6a0107734ea hdisk3 00c0f6a010773532 rootvg None None None active busan:/ # lspv hdisk0 00c0f6a089390270 hdisk1 00c0f6a077839da7 hdisk2 00c0f6a0107734ea hdisk3 00c0f6a010773532 rootvg None None None active seoul, hdisk0 hdisk1 hdisk2 hdisk3 busan:/ # Available Available Available Available lsdev -Cc disk Virtual SCSI Disk Drive C5-T1-01 MPIO Other DS4K Array Disk C5-T1-01 MPIO Other DS4K Array Disk C5-T1-01 MPIO Other DS4K Array Disk Network interfaces configuration The boot or base address is configured as the initial address for each network interface. The future persistent IP address is aliased over the en0 interface in each node before the PowerHA cluster configuration. Example 8-6 shows a configuration of the network interfaces. Example 8-6 Network interfaces configuration seoul:/ # ifconfig -a en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.101.143 netmask 0xffffff00 broadcast 192.168.101.255 inet 10.168.101.43 netmask 0xffffff00 broadcast 10.168.101.255 tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1 en2: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.201.143 netmask 0xffffff00 broadcast 192.168.201.255 Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 203 lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR GESEND,CHAIN> inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255 inet6 ::1%1/0 tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1 busan:/ # ifconfig -a en0: en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.101.255 inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.101.255 tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1 en2: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.201.255 tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1 lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR GESEND,CHAIN> inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255 inet6 ::1%1/0 tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1 Routing table Keeping the routing table is an important source of information. As shown in 8.3.1, “AIX commands and log files” on page 216, the multicast address is not displayed in this table, even when the CAA and IBM PowerHA clusters are running. Example 8-7 shows the routing table for the seoul node. Example 8-7 Routing table seoul:/ # netstat -rn Routing tables Destination Gateway Flags Route tree for Protocol Family 2 (Internet): default 192.168.100.60 UG 10.168.100.0 10.168.101.43 UHSb 10.168.100/22 10.168.101.43 U 10.168.101.43 127.0.0.1 UGHS 10.168.103.255 10.168.101.43 UHSb 127/8 127.0.0.1 U 192.168.100.0 192.168.101.143 UHSb 192.168.100/22 192.168.101.143 U 192.168.101.143 127.0.0.1 UGHS 192.168.103.255 192.168.101.143 UHSb 192.168.200.0 192.168.201.143 UHSb 192.168.200/22 192.168.201.143 U 192.168.201.143 127.0.0.1 UGHS 192.168.203.255 192.168.201.143 UHSb 204 IBM PowerHA SystemMirror 7.1 for AIX Refs Use If 1 0 10 11 0 12 0 2 0 0 0 0 0 0 3489 0 39006 24356 0 10746 0 1057 16 39 0 2 4 0 en0 en0 en0 lo0 en0 lo0 en0 en0 lo0 en0 en2 en2 lo0 en2 Exp Groups - - => => => Route tree for Protocol Family 24 (Internet v6): ::1%1 ::1%1 UH 3 17903 lo0 - - Multicast information You can use the netstat command to display information about an interface for which multicast is enabled. As shown in Example 8-8 for en0, no multicast address is configured, other than the default 224.0.0.1 address before the cluster is configured. Example 8-8 Multicast information seoul:/ # netstat -a -I en0 Name Mtu Network Address en0 1500 link#2 a2.4e.50.54.31.3 01:00:5e:7f:ff:fd 01:00:5e:00:00:01 en0 1500 192.168.100 seoul-b1 239.255.255.253 224.0.0.1 en0 1500 10.168.100 seoul 239.255.255.253 224.0.0.1 Ipkts Ierrs 304248 0 Opkts 60964 Oerrs 0 Coll 0 304248 0 60964 0 0 304248 0 60964 0 0 Status of the IBM Systems Director common agent subsystems The two subsystems must be active in every node to be discovered and managed by IBM Systems Director as shown in Example 8-9. To monitor the cluster using the IBM Systems Director web and command-line interfaces (CLIs), see 8.3, “Collecting information after a cluster is running” on page 216. 
Example 8-9 Common agent subsystems status seoul:/ # lssrc -a | egrep "cim|platform" platform_agent 2359482 cimsys 3211362 active active busan:/ # lssrc -a | egrep "cim|platform" platform_agent 3014798 cimsys 2818190 active active Cluster status Before a cluster is configured, the state of every node is NOT_CONFIGURED as shown in Example 8-10. Example 8-10 PowerHA cluster status seoul:/ # lssrc -g cluster Subsystem Group clstrmgrES cluster PID 6947066 Status active seoul:/ # lssrc -ls clstrmgrES Current state: NOT_CONFIGURED sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1 0:34:17-05:00$" Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 205 busan:/ # lssrc -g cluster Subsystem Group clstrmgrES cluster PID 3342346 Status active busan:/ # lssrc -ls clstrmgrES Current state: NOT_CONFIGURED sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1 0:34:17-05:00$" Modifications in the /etc/syslogd.conf file During the installation of the PowerHA 7.1 file sets, entries are added to the /etc/syslogd.conf configuration file as shown in Example 8-11. Example 8-11 Modifications to the /etc/syslogd.conf file # PowerHA SystemMirror Critical Messages local0.crit /dev/console # PowerHA SystemMirror Informational Messages local0.info /var/hacmp/adm/cluster.log # PowerHA SystemMirror Messages from Cluster Scripts user.notice /var/hacmp/adm/cluster.log # PowerHA SystemMirror Messages from Cluster Daemons daemon.notice /var/hacmp/adm/cluster.log Lines added to the /etc/inittab file In PowerHA 7.1, the clcomd subsystem has a separate entry in the /etc/inittab file because the clcomd subsystem is no longer part of the cluster subsystem group. Two entries now exist as shown in Example 8-12. Example 8-12 Modification to the /etc/inittab file clcomd:23456789:once:/usr/bin/startsrc -s clcomd hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1 8.2 Collecting information after a cluster is configured After the configuration is done and the first cluster synchronization is performed, the CAA services become available. Also, the administrator can start using the clcmd utility that distributes every command passed as an argument to all the cluster nodes. As soon as the configuration is synchronized to all nodes and the CAA cluster is created, the administrator cannot change the cluster name or the cluster multicast address. Changing the repository disk: The administrator can change the repository disk with the procedure for replacing a repository disk provided in the PowerHA 7.1 Release Notes. 206 IBM PowerHA SystemMirror 7.1 for AIX Disk configuration During the first successful synchronization, the CAA repository is created over the chosen disk. In each node, the hdisk device is renamed according to the new cluster unified nomenclature. Is name changes to caa_private0. The repository volume group is called caavg_private and is in active state in every node. After the first synchronization, two other disks are added in the cluster storage by using the following command: chcluster -n korea -d+hdisk2,hdisk3 where hdisk2 is renamed to cldisk2, and hdisk3 is renamed to cldisk1. Example 8-13 shows the resulting disk listing. 
Example 8-13 Disk listing seoul:/ # clcmd lspv ------------------------------NODE seoul ------------------------------hdisk0 00c0f6a088a155eb caa_private0 00c0f6a077839da7 cldisk2 00c0f6a0107734ea cldisk1 00c0f6a010773532 ------------------------------NODE busan ------------------------------hdisk0 00c0f6a089390270 caa_private0 00c0f6a077839da7 cldisk2 00c0f6a0107734ea cldisk1 00c0f6a010773532 rootvg caavg_private None None active active rootvg caavg_private None None active active Attention: The cluster repository disk is a special device for the cluster. The use of Logical Volume Manager (LVM) commands over the repository disk is not supported. AIX LVM commands are single node commands and are not intended for use in a clustered configuration. Multicast information Compared with the multicast information collected when the cluster was not configured, the netstat command now shows the 228.168.101.43 address in the table (Example 8-14). Example 8-14 Multicast information seoul:/ # netstat -a -I en0 Name Mtu Network Address Ipkts Ierrs en0 1500 link#2 a2.4e.50.54.31.3 70339 0 01:00:5e:28:65:2b 01:00:5e:7f:ff:fd 01:00:5e:00:00:01 en0 1500 192.168.100 seoul-b1 70339 0 228.168.101.43 239.255.255.253 224.0.0.1 Opkts Oerrs Coll 44686 0 0 44686 0 0 Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 207 en0 1500 10.168.100 seoul 228.168.101.43 239.255.255.253 224.0.0.1 70339 0 44686 0 0 Cluster status The cluster status changes from NOT_CONFIGURED to ST_INIT as shown in Example 8-15. Example 8-15 PowerHA cluster status busan:/ # lssrc -ls clstrmgrES Current state: ST_INIT sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1 0:34:17-05:00$" CAA subsystem group active All the CAA subsystems become active after the first cluster synchronization as shown in Example 8-16. Example 8-16 CAA subsystems status seoul:/ # clcmd lssrc -g caa ------------------------------NODE seoul ------------------------------Subsystem Group cld caa clcomd caa clconfd caa solidhac caa solid caa ------------------------------NODE busan ------------------------------Subsystem Group cld caa clcomd caa solid caa clconfd caa solidhac caa PID 3735780 5439664 4915418 6947064 5701642 Status active active active active active PID 3211462 2687186 6160402 6488286 5439698 Status active active active active active Subsystem guide: cld determines whether the local node must become the primary or secondary solidDB server in a failover. The solid subsystem is the database engine. The solidhac subsystem is used for the high availability of the solidDB server. The clconfd subsystem runs every 10 minutes to put any missed cluster configuration changes into effect on the local node. 208 IBM PowerHA SystemMirror 7.1 for AIX Cluster information using the lscluster command CAA comes with a set of command-line tools, as explained in the following sections, that can be used to monitor the status and statistics of a running cluster. For more information about CAA and its functionalities, see Chapter 2, “Features of PowerHA SystemMirror 7.1” on page 23. Listing the cluster configuration: -c flag Example 8-17 shows the cluster configuration by using the lscluster -c command. 
Example 8-17 Listing the cluster configuration seoul:/ # lscluster -c Cluster query for cluster korea returns: Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103 Number of nodes in cluster = 2 Cluster id for node busan is 1 Primary IP address for node busan is 10.168.101.44 Cluster id for node seoul is 2 Primary IP address for node seoul is 10.168.101.43 Number of disks in cluster = 2 for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb cluster_major = 0 cluster_minor = 1 for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2 cluster_major = 0 cluster_minor = 2 Multicast address for cluster is 228.168.101.43 Tip: The primary IP address shown for each node is the IP address chosen as the communication path during cluster definition. In this case, the address is the same IP address that is used as the persistent IP address. The multicast address, when not specified by the administrator during cluster creation, is composed by the number 228 followed by the last three octets of the communication path from the node where the synchronization is executed. In this particular example, the synchronization was run from the seoul node that has the communication path 192.168.101.43. Therefore, the multicast address for the cluster becomes 228.168.101.43 as can be observed in the output of lscluster -c command. Listing the cluster nodes configuration: -m flag The -m flag has a different output in each node. In the output shown in Example 8-18, clcmd is used to distribute the command over all cluster nodes. Example 8-18 Listing the cluster nodes configuration seoul:/ # clcmd lscluster -m ------------------------------NODE seoul ------------------------------Calling node query for all nodes Node query number of nodes examined: 2 Node name: busan Cluster shorthand id for node: 1 uuid for node: e356646e-c0dd-11df-b51d-a24e57e18a03 State of node: UP Smoothed rtt to node: 7 Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 209 Mean Deviation in network rtt to node: 3 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID korea local a01f47fe-d089-11df-95b5-a24e50543103 Number of points_of_contact for node: 2 Point-of-contact interface & contact state en2 UP en0 UP -----------------------------Node name: seoul Cluster shorthand id for node: 2 uuid for node: 4f8858be-c0dd-11df-930a-a24e50543103 State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID korea local a01f47fe-d089-11df-95b5-a24e50543103 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a ------------------------------NODE busan ------------------------------Calling node query for all nodes Node query number of nodes examined: 2 Node name: busan Cluster shorthand id for node: 1 uuid for node: e356646e-c0dd-11df-b51d-a24e57e18a03 State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID korea local a01f47fe-d089-11df-95b5-a24e50543103 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a -----------------------------Node name: seoul Cluster shorthand id for node: 2 uuid for node: 4f8858be-c0dd-11df-930a-a24e50543103 State of node: UP Smoothed rtt to node: 7 Mean Deviation in network rtt to node: 3 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 210 IBM PowerHA SystemMirror 7.1 for AIX CLUSTER NAME korea TYPE SHID local UUID a01f47fe-d089-11df-95b5-a24e50543103 Number of points_of_contact for node: 2 Point-of-contact interface & contact state en2 UP en0 UP Zone: Example 8-18 on page 209 mentions zones. A zone is a concept that is planned for use in future versions of CAA, where the node can be part of different groups of machines. Listing the cluster interfaces: -i flag The korea cluster is configured with NPIV through the VIOS. To have SAN heartbeating, you must direct SAN connection through Fibre Channel (FC) adapters. In Example 8-19, a cluster with such requirements has been used to demonstrate the output. 
Example 8-19 Listing the cluster interfaces sydney:/ # lscluster -i Network/Storage Interface Query Cluster Name: au_cl Cluster uuid: 0252a470-c216-11df-b85d-6a888564f202 Number of nodes reporting = 2 Number of nodes expected = 2 Node sydney Node uuid = a6ac83d4-c1d4-11df-8953-6a888564f202 Number of interfaces discovered = 4 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 6a.88.85.64.f2.2 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 6a.88.85.64.f2.4 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state UP Number of regular addresses configured on interface = 1 Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 211 IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node perth Node uuid = c89d962c-c1d4-11df-aa87-6a888dd67502 Number of interfaces discovered = 4 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 6a.88.8d.d6.75.2 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 6a.88.8d.d6.75.4 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state UP Number of regular addresses configured on interface = 1 212 IBM PowerHA SystemMirror 7.1 for AIX IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 
0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED rtt: The round-trip time (rtt) is calculated by using a mean deviation formula. Some commands show rrt instead of rtt, which is believed to be a typographic error in the command. sfwcom: Storage Framework Communication (sfwcom) is the interface created by CAA for SAN heartbeating. To enable sfwcom, the following prerequisites must be in place: Each node must have either a 4 GB or 8 GB FC adapter. If you are using vSCSI or NPIV, VIOS 2.2.0.11-FP24 SP01 is the minimum level required. The adapters used for SAN heartbeating must have the tme (target mode enabled) parameter set to yes. The Fibre Channel controller must have the parameter dyntrk set to yes, and the parameter fc_err_recov set to fast_fail. All the adapters participating in the heartbeating must be in the same fabric zone. In the previous example, sydney-fcs0 and perth-fcs0 are in the same fabric zone; sydney-fcs1 and perth-fcs1 are in the same fabric zone. dpcomm: The dpcomm interface is the actual repository disk. It means that, on top of the Ethernet and the Fibre Channel adapters, the cluster also uses the repository disk as a physical media to exchange heartbeats among the nodes. Excluding configured interfaces: Currently you cannot exclude configured interfaces from being used for cluster monitoring and communication. All network interfaces are used for cluster monitoring and communication. Listing the cluster storage interfaces: -d flag Example 8-20 shows all storage disks that are participating in the cluster, including the repository disk. Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 213 Example 8-20 Listing cluster storage interfaces seoul:/ # clcmd lscluster -d ------------------------------NODE seoul ------------------------------Storage Interface Query Cluster Name: korea Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103 Number of nodes reporting = 2 Number of nodes expected = 2 Node seoul Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103 Number of disk discovered = 3 cldisk2 state : UP uDid : 3E213600A0B8000114632000009554C8E0B010F1815 uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2 type : CLUSDISK cldisk1 state : UP uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb type : CLUSDISK caa_private0 state : UP uDid : uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda type : REPDISK Node seoul Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103 Number of disk discovered = 3 cldisk2 state : UP uDid : 3E213600A0B8000114632000009554C8E0B010F1815 uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2 type : CLUSDISK cldisk1 state : UP uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb type : CLUSDISK caa_private0 state : UP uDid : uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda type : REPDISK ------------------------------NODE busan ------------------------------Storage Interface Query Cluster Name: Cluster uuid: 214 korea a01f47fe-d089-11df-95b5-a24e50543103 IBM PowerHA SystemMirror 7.1 for AIX FAStT03IBMfcp FAStT03IBMfcp FAStT03IBMfcp FAStT03IBMfcp Number of nodes reporting = 2 Number of nodes expected = 2 Node busan Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03 Number of disk discovered = 3 cldisk1 state : UP uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb type : CLUSDISK cldisk2 state : UP uDid : 3E213600A0B8000114632000009554C8E0B010F1815 uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2 type : CLUSDISK caa_private0 state : UP uDid : uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda type : REPDISK Node busan Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03 Number of disk discovered = 3 cldisk1 state : UP uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb type : CLUSDISK cldisk2 state : UP uDid : 3E213600A0B8000114632000009554C8E0B010F1815 uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2 type : CLUSDISK caa_private0 state : UP uDid : uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda type : REPDISK FAStT03IBMfcp FAStT03IBMfcp FAStT03IBMfcp FAStT03IBMfcp Listing the network statistics: -s flag Example 8-21 shows overall statistics about cluster heartbeating and the gossip protocol used for nodes communication. Example 8-21 Listing the network statistics seoul:/ # lscluster -s Cluster Statistics: Cluster Network Statistics: pkts seen:194312 IP pkts:126210 gossip pkts sent:22050 cluster address pkts:0 bad transmits:0 pkts passed:66305 UDP pkts:127723 gossip pkts recv:64076 CP pkts:127497 bad posts:0 Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 215 short pkts:0 cluster wide errors:0 dup pkts:3680 fragments queued:0 requests dropped:0 pkts pulled:0 rxmit requests recv:21 requests missed:0 requests reset sent:0 requests lnk reset send :0 rxmit requests sent:5 alive pkts sent:0 ahafs pkts sent:17 nodedown pkts sent:0 socket pkts sent:733 cwide pkts sent:230 socket pkts no space:0 stale pkts recv:0 storage pkts sent:1 storage out-of-range pkts recv:0 multicast pkts:127768 bad pkts:0 pkt fragments:0 fragments freed:0 pkts routed:0 no memory:0 requests found:21 ooo pkts:2 reset recv:0 reset lnk recv:0 alive pkts recv:0 ahafs pkts recv:7 nodedown pkts recv:0 socket pkts recv:414 cwide pkts recv:230 pkts recv notforhere:0 other cluster pkts:0 pkts recv:1 8.3 Collecting information after a cluster is running Up to this point, all the examples in this chapter collected information about a non-running PowerHA 7.1 cluster. This section explains how to obtain valuable information from a configured and running cluster. WebSMIT: WebSMIT is no longer a supported tool. 8.3.1 AIX commands and log files AIX 7.1, which is used in the korea cluster, provides a set of tools that can be used to collect relevant information about the cluster, cluster services, and cluster device status. This section shows examples of that type of information. Disk configuration All the volume groups controlled by a resource group are shown as concurrent on both sides as shown in Example 8-22. Example 8-22 Listing disks seoul:/ # clcmd lspv ------------------------------NODE seoul ------------------------------hdisk0 00c0f6a088a155eb caa_private0 00c0f6a077839da7 cldisk2 00c0f6a0107734ea cldisk1 00c0f6a010773532 ------------------------------NODE busan 216 IBM PowerHA SystemMirror 7.1 for AIX rootvg caavg_private pokvg pokvg active active concurrent concurrent ------------------------------hdisk0 00c0f6a089390270 caa_private0 00c0f6a077839da7 cldisk2 00c0f6a0107734ea cldisk1 00c0f6a010773532 rootvg caavg_private pokvg pokvg active active concurrent concurrent Multicast information When compared with the multicast information collected when the cluster is not configured, the netstat command shows that the 228.168.101.43 address is present in the table. See Example 8-23. Example 8-23 Multicast information seoul:/ # netstat -a -I en0 Name Mtu Network Address Ipkts Ierrs en0 1500 link#2 a2.4e.50.54.31.3 82472 0 01:00:5e:28:65:2b 01:00:5e:7f:ff:fd 01:00:5e:00:00:01 en0 1500 192.168.100 seoul-b1 82472 0 228.168.101.43 239.255.255.253 224.0.0.1 en0 1500 10.168.100 seoul 82472 0 228.168.101.43 239.255.255.253 224.0.0.1 seoul:/ # netstat -a -I en2 Name Mtu Network Address Ipkts Ierrs en2 1500 link#3 a2.4e.50.54.31.7 44673 0 01:00:5e:7f:ff:fd 01:00:5e:28:65:2b 01:00:5e:00:00:01 en2 1500 192.168.200 seoul-b2 44673 0 239.255.255.253 228.168.101.43 224.0.0.1 en2 1500 10.168.100 poksap-db 44673 0 239.255.255.253 228.168.101.43 224.0.0.1 Opkts Oerrs Coll 53528 0 0 53528 0 0 53528 0 0 Opkts Oerrs Coll 22119 0 0 22119 0 0 22119 0 0 Status of the cluster When the PowerHA cluster is running, its status changes from ST_INIT to ST_STABLE as shown in Example 8-24. Example 8-24 PowerHA cluster status seoul:/ # lssrc -ls clstrmgrES Current state: ST_STABLE sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T1 0:34:17-05:00$" Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 217 i_local_nodeid 1, i_local_siteid -1, my_handle 2 ml_idx[1]=0 ml_idx[2]=1 There are 0 events on the Ibcast queue There are 0 events on the RM Ibcast queue CLversion: 12 # Note: Version 12 represents PowerHA SystemMirror 7.1 local node vrmf is 7101 cluster fix level is "1" The following timer(s) are currently active: Current DNP values DNP Values for NodeId - 1 NodeName - busan PgSpFree = 1308144 PvPctBusy = 0 PctTotalTimeIdle = 98.105654 DNP Values for NodeId - 2 NodeName - seoul PgSpFree = 1307899 PvPctBusy = 0 PctTotalTimeIdle = 96.912367 Group Services information Previous versions of PowerHA use the grpsvcs subsystem. PowerHA 7.1 uses the cthags subsystem. The output of the lssrc -ls cthags command has similar information to what used to be presented by the lssrc -ls grpsvcs command. Example 8-25 shows this output. Example 8-25 Output of the lssrc -ls cthags command seoul:/ # lssrc -ls cthags Subsystem Group PID Status cthags cthags 6095048 active 5 locally-connected clients. Their PIDs: 6160578(IBM.ConfigRMd) 1966256(rmcd) 3604708(IBM.StorageRMd) 7078046(clstrmgr) 14680286(gsclvmd) HA Group Services domain information: Domain established by node 1 Number of groups known locally: 8 Number of Number of local Group name providers providers/subscribers rmc_peers 2 1 0 s00O3RA00009G0000015CDBQGFL 2 1 0 IBM.ConfigRM 2 1 0 IBM.StorageRM.v1 2 1 0 CLRESMGRD_1108531106 2 1 0 CLRESMGRDNPD_1108531106 2 1 0 CLSTRMGR_1108531106 2 1 0 d00O3RA00009G0000015CDBQGFL 2 1 0 Critical clients will be terminated if unresponsive Network configuration and routing table The service IP address is added to an interface on the node where the resource group is started. The routing table also keeps the service IP address. The multicast address is not displayed in the routing table. See Example 8-26. 
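Tip: To check the cluster manager state on all nodes in one pass, the lssrc query can be combined with the clcmd command, for example:

clcmd lssrc -ls clstrmgrES | egrep "NODE|Current state"

On a healthy running cluster, every node reports ST_STABLE, as shown in Example 8-24.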
Example 8-26 Network configuration and routing table seoul:/ # clcmd ifconfig -a ------------------------------NODE seoul ------------------------------- 218 IBM PowerHA SystemMirror 7.1 for AIX en0: en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.101.143 netmask 0xffffff00 broadcast 192.168.103.255 inet 10.168.101.43 netmask 0xffffff00 broadcast 10.168.103.255 tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1 en2: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.201.143 netmask 0xffffff00 broadcast 192.168.203.255 inet 10.168.101.143 netmask 0xffffff00 broadcast 10.168.103.255 tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1 lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR GESEND,CHAIN> inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255 inet6 ::1%1/0 tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1 ------------------------------NODE busan ------------------------------en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.103.255 inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.103.255 tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1 en2: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT ,CHECKSUM_OFFLOAD(ACTIVE),CHAIN> inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.203.255 tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1 lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR GESEND,CHAIN> inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255 inet6 ::1%1/0 tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1 seoul:/ # clcmd netstat -rn ------------------------------NODE seoul ------------------------------Routing tables Destination Gateway Flags Route tree for Protocol Family 2 (Internet): default 192.168.100.60 UG 10.168.100.0 10.168.101.43 UHSb 10.168.100.0 10.168.101.143 UHSb 10.168.100/22 10.168.101.43 U 10.168.100/22 10.168.101.143 U 10.168.101.43 127.0.0.1 UGHS 10.168.101.143 127.0.0.1 UGHS 10.168.103.255 10.168.101.43 UHSb Refs Use If 1 0 0 7 3 10 1 0 4187 0 0 56800 1770 33041 72 0 en0 en0 en2 en0 en2 lo0 lo0 en0 Exp Groups - - Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster => => => => 219 10.168.103.255 127/8 192.168.100.0 192.168.100/22 192.168.101.143 192.168.103.255 192.168.200.0 192.168.200/22 192.168.201.143 192.168.203.255 10.168.101.143 127.0.0.1 192.168.101.143 192.168.101.143 127.0.0.1 192.168.101.143 192.168.201.143 192.168.201.143 127.0.0.1 192.168.201.143 UHSb U UHSb U UGHS UHSb UHSb U UGHS UHSb en2 lo0 en0 en0 lo0 en0 en2 en2 lo0 en2 - - Route tree for Protocol Family 24 (Internet v6): ::1%1 ::1%1 UH 2 4180 lo0 - - ------------------------------NODE busan ------------------------------Routing tables Destination Gateway Refs Use If 1 0 23 10 0 19 0 3 0 2 0 0 0 0 2012 0 54052 5706 0 3803 0 1953 14 27 0 2 4 0 en0 en0 en0 lo0 en0 lo0 en0 en0 lo0 en0 en2 en2 lo0 en2 - - 876 lo0 - - Flags Route tree for Protocol Family 2 (Internet): default 192.168.100.60 UG 10.168.100.0 10.168.101.44 UHSb 10.168.100/22 10.168.101.44 U 10.168.101.44 127.0.0.1 UGHS 10.168.103.255 10.168.101.44 UHSb 127/8 127.0.0.1 U 192.168.100.0 192.168.101.144 UHSb 192.168.100/22 192.168.101.144 U 192.168.101.144 127.0.0.1 UGHS 192.168.103.255 192.168.101.144 UHSb 192.168.200.0 192.168.201.144 UHSb 192.168.200/22 192.168.201.144 U 192.168.201.144 127.0.0.1 UGHS 192.168.203.255 192.168.201.144 UHSb 0 15 0 2 0 0 0 0 0 0 Route tree for Protocol Family 24 (Internet v6): ::1%1 ::1%1 UH 6 0 16316 0 1201 18 43 0 2 4 0 Exp => => Groups => => => Using tcpdump, iptrace, and mping utilities to monitor multicast traffic With the introduction of the multicast address and the gossip protocol, the cluster administrator can use tools to monitor Ethernet heartbeating. The following sections explain how to use the native tcpdump, iptrace, and mping native AIX tools for this type of monitoring. The tcpdump utility You can dump all the traffic between the seoul node and the multicast address 228.168.101.43 by using the tcpdump utility. Observe that the UDP packets originate in the base or boot addresses of the interfaces, not in the persistent or service IP labels. Example 8-27 shows how to list the available interfaces and then capture traffic for the en2 interface. 
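Tip: For longer captures, it can be convenient to limit the number of packets and write them to a file for later analysis by using the standard tcpdump -c and -w options. The following lines are only a sketch; the packet count and the capture file name are arbitrary examples:

tcpdump -i en2 -c 200 -w /tmp/caa_mcast.cap ip and host 228.168.101.43
tcpdump -r /tmp/caa_mcast.cap -v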
Example 8-27 Multicast packet monitoring for the seoul node using the tcpdump utility seoul:/ # tcpdump -D 1.en0 220 IBM PowerHA SystemMirror 7.1 for AIX 2.en2 3.lo0 seoul:/ # tcpdump -t -i2 -v ip and host 228.168.101.43 tcpdump: listening on en0, link-type 1, capture size 96 bytes IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x02 ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP seoul-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) The same information is captured on the busan node as shown in Example 8-28. Example 8-28 Multicast packet monitoring for the busan node using the tcpdump utility busan:/tmp # tcpdump -D 1.en0 2.en2 3.lo0 busan:/ # tcpdump -t -i2 -v ip and host 228.168.101.43 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450 (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) (17), length: 1478) You can also see the multicast traffic for all the PowerHA 7.1 clusters in your LAN segment. The following command generates the output: seoul:/ # tcpdump -n -vvv port drmsfsd Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 221 The iptrace utility The iptrace utility provides a more detailed packet tracing information compared to the tcpdump utility. Both the en0 (MAC address A24E50543103) and en2 (MAC address A24E50543107) interfaces are generating packets toward the cluster multicast address 228.168.101.43 as shown in Example 8-29. 
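Tip: Instead of starting iptrace in the background and stopping it with kill -9, the trace can be run under the control of the System Resource Controller, which provides a cleaner way to stop it. The following is a sketch that uses the same options as Example 8-29; the log file name is an example:

startsrc -s iptrace -a "-a -s 228.168.101.43 -b /tmp/korea_cluster.log"
stopsrc -s iptrace
/usr/sbin/ipreport /tmp/korea_cluster.log | more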
Example 8-29 The iptrace utility for monitoring multicast packets seoul:/tmp # iptrace -a -s 228.168.101.43 -b korea_cluster.log; sleep 30 [10289364] seoul:/tmp # kill -9 10289364 seoul:/tmp # /usr/sbin/ipreport korea_cluster.log | more IPTRACE version: 2.0 ====( 1492 bytes transmitted on interface en0 )==== 12:49:17.384871427 ETHERNET packet : [ a2:4e:50:54:31:03 -> 01:00:5e:28:65:2b ] type 800 (IP) IP header breakdown: < SRC = 192.168.101.143 > (seoul-b1) < DST = 228.168.101.43 > ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0 ip_ttl=32, ip_sum=251c, ip_p = 17 (UDP) UDP header breakdown: <source port=4098(drmsfsd), <destination port=4098(drmsfsd) > [ udp length = 1458 | udp checksum = 0 ] 00000000 00000009 100234c8 00000030 00000000 |......4....0....| 00000010 1be40fb0 c19311df 920ca24e 50543103 |...........NPT1.| ******** 00000030 ffffffff ffffffff ffffffff ffffffff |................| 00000040 00001575 00000000 00000000 00000000 |...u............| 00000050 00000000 00000003 00000000 00000000 |................| 00000060 00000000 00000000 00020001 00020fb0 |................| 00000070 c19311df 1be40fb0 c19311df 920ca24e |...............N| 00000080 50543103 0000147d 00000000 4f8858be |PT1....}....O.X.| 00000090 c0dd11df 930aa24e 50543103 00000000 |.......NPT1.....| 000000a0 00000000 00000000 00000000 00000000 |................| ******** 000005a0 00000000 00000000 0000 |.......... | ====( 1492 bytes transmitted on interface en0 )==== 12:49:17.388085181 ETHERNET packet : [ a2:4e:50:54:31:03 -> 01:00:5e:28:65:2b ] type 800 (IP) IP header breakdown: < SRC = 192.168.101.143 > (seoul-b1) < DST = 228.168.101.43 > ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0 ip_ttl=32, ip_sum=251c, ip_p = 17 (UDP) UDP header breakdown: <source port=4098(drmsfsd), <destination port=4098(drmsfsd) > [ udp length = 1458 | udp checksum = 0 ] 00000000 00000004 10021002 00000070 00000000 |...........p....| 00000010 1be40fb0 c19311df 920ca24e 50543103 |...........NPT1.| ******** 00000030 ffffffff ffffffff ffffffff ffffffff |................| 00000040 00001575 00000000 00000000 00000000 |...u............| 00000050 f1000815 b002b8a0 00000000 00000000 |................| 00000060 00000000 00000000 0002ffff 00010000 |................| 222 IBM PowerHA SystemMirror 7.1 for AIX 00000070 00000080 00000090 000000a0 000000b0 000000c0 000000d0 000000e0 ******** 000005a0 00000000 00000000 00000000 00000000 00000000 00000001 50543103 00000000 00000000 00000d7a 00000000 00020000 00000000 4f8858be 00000001 00000000 00000000 00000000 00000000 00000000 00000000 c0dd11df 00000000 00000000 00000000 00000000 0000 00000000 00000000 00000000 00000000 00001575 930aa24e 00000000 00000000 |................| |.......z........| |................| |................| |...............u| |....O.X........N| |PT1.............| |................| |.......... 
| ====( 1492 bytes transmitted on interface en2 )==== 12:49:17.394219029 ETHERNET packet : [ a2:4e:50:54:31:07 -> 01:00:5e:28:65:2b ] type 800 (IP) IP header breakdown: < SRC = 192.168.201.143 > (seoul-b2) < DST = 228.168.101.43 > ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0 ip_ttl=32, ip_sum=c11b, ip_p = 17 (UDP) UDP header breakdown: <source port=4098(drmsfsd), <destination port=4098(drmsfsd) > [ udp length = 1458 | udp checksum = 0 ] 00000000 00000009 100234c8 00000030 00000000 |......4....0....| 00000010 a01f47fe d08911df 95b5a24e 50543103 |..G........NPT1.| ******** 00000030 ffffffff ffffffff ffffffff ffffffff |................| 00000040 00000fab 00000000 00000000 00000000 |................| 00000050 00000000 00000003 00000000 00000000 |................| 00000060 00000000 00000000 00020001 000247fe |..............G.| 00000070 d08911df a01f47fe d08911df 95b5a24e |......G........N| 00000080 50543103 000014b4 00000000 4f8858be |PT1.........O.X.| 00000090 c0dd11df 930aa24e 50543103 00000000 |.......NPT1.....| 000000a0 00000000 00000000 00000000 00000000 |................| ******** 000005a0 00000000 00000000 0000 |.......... | . . . Tip: You can observer the multicast address in the last line of the lscluster -c CAA command. The mping utility You can also use the mping utility to test the multicast connectivity. One node acts as a sender of packets, and the other node acts as a receiver of packets. You trigger the command on both nodes at the same time as shown in Example 8-30. Example 8-30 Using the mping utility to test multicast connectivity seoul:/ # mping -v -s -a 228.168.101.43 mping version 1.0 Localhost is seoul, 10.168.101.43 mpinging 228.168.101.43/4098 with ttl=32: 32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.260 ms 32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.326 ms 32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.344 ms Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 223 32 32 32 32 32 32 32 32 32 bytes bytes bytes bytes bytes bytes bytes bytes bytes from from from from from from from from from 10.168.101.44: 10.168.101.44: 10.168.101.44: 10.168.101.44: 10.168.101.44: 10.168.101.44: 10.168.101.44: 10.168.101.44: 10.168.101.44: seqno=1 seqno=2 seqno=2 seqno=2 seqno=2 seqno=3 seqno=3 seqno=3 seqno=3 ttl=32 ttl=32 ttl=32 ttl=32 ttl=32 ttl=32 ttl=32 ttl=32 ttl=32 time=0.361 time=0.235 time=0.261 time=0.299 time=0.317 time=0.216 time=0.262 time=0.282 time=0.300 busan:/ # mping -v -r -a 228.168.101.43 mping version 1.0 Localhost is busan, 10.168.101.44 Listening on 228.168.101.43/4098: Replying to mping from 10.168.101.43 bytes=32 Replying to mping from 10.168.101.43 bytes=32 Discarding receiver packet Discarding receiver packet Replying to mping from 10.168.101.43 bytes=32 Replying to mping from 10.168.101.43 bytes=32 Discarding receiver packet Discarding receiver packet Replying to mping from 10.168.101.43 bytes=32 Replying to mping from 10.168.101.43 bytes=32 Discarding receiver packet Discarding receiver packet ms ms ms ms ms ms ms ms ms seqno=1 ttl=32 seqno=1 ttl=32 seqno=2 ttl=32 seqno=2 ttl=32 seqno=3 ttl=32 seqno=3 ttl=32 8.3.2 CAA commands and log files This section explains the commands specifically for gathering CAA-related information and the associated log files. Cluster information The CAA comes with a set of command-line tools, as explained in “Cluster information using the lscluster command” on page 209. These tools can be used to monitor the status and statistics of a running cluster. 
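For example, a small script such as the following captures a point-in-time snapshot of the CAA view of the cluster that can be kept for later comparison. This is only a sketch; the output directory and file names are arbitrary examples:

SUFFIX=$(date +%Y%m%d_%H%M)
for flag in c i d s
do
    lscluster -$flag > /tmp/lscluster_$flag.$SUFFIX 2>&1
done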
For more information about CAA and its functionalities, see Chapter 2, “Features of PowerHA SystemMirror 7.1” on page 23. Cluster repository disk, CAA, and solidDB This section provides additional information about the cluster repository disk, CAA, and solidDB. UUID The UUID of the caa_private0 disk is stored as a cluster0 device attribute as shown in Example 8-31. Example 8-31 The cluster0 device attributes seoul:/ # lsattr -El cluster0 clvdisk 03e41dc1-3b8d-c422-3426-f1f61c567cda Cluster repository disk identifier True node_uuid 4f8858be-c0dd-11df-930a-a24e50543103 OS image identifier True Example 8-32 also shows the UUID. 224 IBM PowerHA SystemMirror 7.1 for AIX Example 8-32 UUID caa_private0 state uDid uUid type : UP : : 03e41dc1-3b8d-c422-3426-f1f61c567cda : REPDISK The repository disk contains logical volumes for the bootstrap and solidDB file systems as shown in Example 8-33. Example 8-33 Repository logical volumes seoul:/ # lsvg -l caavg_private caavg_private: LV NAME TYPE LPs caalv_private1 boot 1 caalv_private2 boot 1 caalv_private3 boot 4 fslv00 jfs2 4 fslv01 jfs2 4 powerha_crlv boot 1 PPs 1 1 4 4 4 1 PVs 1 1 1 1 1 1 LV STATE closed/syncd closed/syncd open/syncd closed/syncd open/syncd closed/syncd MOUNT POINT N/A N/A N/A /clrepos_private1 /clrepos_private2 N/A Querying the bootstrap repository Example 8-34 shows the bootstrap repository. Example 8-34 Querying the bootstrap repository seoul:/ # /usr/lib/cluster/clras dumprepos HEADER CLUSRECID: 0xa9c2d4c2 Name: korea UUID: a01f47fe-d089-11df-95b5-a24e50543103 SHID: 0x0 Data size: 1536 Checksum: 0xc197 Num zones: 0 Dbpass: a0305b84_d089_11df_95b5_a24e50543103 Multicast: 228.168.101.43 DISKS name cldisk1 FAStT03IBMfcp cldisk2 FAStT03IBMfcp devno 1 2 uuid fe1e9f03-005b-3191-a3ee-4834944fcdeb udid 3E213600A0B8000291B080000E90C05B0CD4B0F1815 428e30e8-657d-8053-d70e-c2f4b75999e2 3E213600A0B8000114632000009554C8E0B010F1815 NODES numcl 0 0 numz 0 0 uuid 4f8858be-c0dd-11df-930a-a24e50543103 e356646e-c0dd-11df-b51d-a24e57e18a03 shid 2 1 name seoul busan ZONES none The solidDB status You can use the command shown in Example 8-35 to check which node currently hosts the active solidDB database. Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 225 Example 8-35 The solidDB status seoul:/ # clcmd /opt/cluster/solidDB/bin/solcon -x pwdfile:/etc/cluster/dbpass -e "hsb state" "tcp 2188" caa ------------------------------NODE seoul ------------------------------IBM solidDB Remote Control - Version 6.5.0.0 Build 0010 (c) Solid Information Technology Ltd. 1993, 2009 SECONDARY ACTIVE ------------------------------NODE busan ------------------------------IBM solidDB Remote Control - Version 6.5.0.0 Build 0010 (c) Solid Information Technology Ltd. 1993, 2009 PRIMARY ACTIVE Tip: The solidDB database is not necessarily active in the same node where the PowerHA resource group is active. You can see this difference when comparing Example 8-35 with the output of the clRGinfo command: seoul:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------db2pok_Resourc ONLINE seoul OFFLINE busan In this case, the solidDB database has the primary database active in the busan node, and the PowerHA resource group is currently settled in the seoul node. Another way to check which node has solidDB active is to use the lssrc command. Example 8-36 shows that solidDB is active in the seoul node. 
Observe the line that says “Group Leader.” Example 8-36 Using the lssrc command to check where solidDB is active seoul:/ # lssrc -ls Subsystem : PID : Cluster Name : Node Number : Daemon start time : IBM.StorageRM IBM.StorageRM 7077950 korea 2 10/05/10 10:06:57 PeerNodes: 2 QuorumNodes: 2 Group IBM.StorageRM.v1: ConfigVersion: 0x24cab3184 Providers: 2 QuorumMembers: 2 Group Leader: seoul, 0xdc82faf0908920dc, 2 Information from malloc about memory use: Total Space : 0x00be0280 (12452480) Allocated Space: 0x007ec198 (8307096) 226 IBM PowerHA SystemMirror 7.1 for AIX Unused Space : 0x003ed210 (4117008) Freeable Space : 0x00000000 (0) Information about trace levels: _SEU Errors=255 Info=0 API=0 Buffer=0 SvcTkn=0 CtxTkn=0 _SEL Errors=255 Info=0 API=0 Buffer=0 Perf=0 _SEI Error=0 API=0 Mapping=0 Milestone=0 Diag=0 _SEA Errors=255 Info=0 API=0 Buffer=0 SVCTKN=0 CTXTKN=0 _MCA Errors=255 Info=0 API=0 Callbacks=0 Responses=0 RspPtrs=0 Protocol=0 APItoProto=0 PrototoRsp=0 CommPath=0 Thread=0 ThreadCtrl=0 RawProtocol=0 Signatures=0 _RCA RMAC_SESSION=0 RMAC_COMMANDGROUP=0 RMAC_REQUEST=0 RMAC_RESPONSE=0 RMAC_CALLBACK=0 _CAA Errors=255 Info=0 Debug=0 AUA_Blobs=0 AHAFS_Events=0 _GSA Errors=255 Info=2 GSCL=0 Debug=0 _SRA API=0 Errors=255 Wherever=0 _RMA Errors=255 Info=0 API=0 Thread=0 Method=0 Object=0 Protocol=0 Work=0 CommPath=0 _SKD Errors=255 Info=0 Debug=0 _SDK Errors=255 Info=0 Exceptions=0 _RMF Errors=255 Info=2 Debug=0 _STG Errors=255 Info=1 Event=1 Debug=0 /var/ct/2W7qV~q8aHtvMreavGL343/log/mc/IBM.StorageRM/trace -> spooling not enabled Using the solidDB SQL interface You can also retrieve some information shown by the lscluster command by using the solidDB SQL interface as shown in Example 8-37 and Example 8-38 on page 228. Example 8-37 The solidDB SQL interface (view from left side of code) seoul:/ # /opt/cluster/solidDB/bin/solsql -x pwdfile:/etc/cluster/dbpass "tcp 2188" caa IBM solidDB SQL Editor (teletype) - Version: 6.5.0.0 Build 0010 (c) Solid Information Technology Ltd. 1993, 2009 Connected to 'tcp 2188'. Execute SQL statements terminated by a semicolon. Exit by giving command: exit; list schemas; RESULT -----Catalog: CAA SCHEMAS: -------CAA 35193956_C193_11DF_A3EA_A24E50543103 36FC3B56_C193_11DF_A29A_A24E50543103 1 rows fetched. list tables; RESULT -----Catalog: CAA Schema: CAA TABLES: ------CLUSTERS Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 227 NODES REPOSNAMESPACE REPOSSTORES SHAREDDISKS INTERFACES INTERFACE_ATTRS PARENT_CHILD ENTITIES 1 rows fetched. select * from clusters; CLUSTER_ID CLUSTER_NAME ETYPE ESUBTYPE ---------- ----------------------1 SIRCOL_UNKNOWN 4294967296 32 2 korea 4294967296 32 2 rows fetched. select * from nodes; NODES_ID -------1 2 NODE_NAME --------busan seoul ETYPE ----8589934592 8589934592 ESUBTYPE -------0 0 GLOB_ID UUID ---------4294967297 00000000-0000-0000-0000-000000000000 4294967296 a01f47fe-d089-11df-95b5-a24e50543103 GLOB_ID ------8589934593 85899345944 UUID ---e356646e-c0dd-11df-b51d-a24e57e18a03 f8858be-c0dd-11df-930a-a24e50543103 2 rows fetched. select * from SHAREDDISKS; SHARED_DISK_ID DISK_NAME -------------- --------1 cldisk2 2 cldisk1 2 rows fetched. 
ETYPE ----34359738368 34359738368 GLOB_ID ------34359738370 34359738369 UUID ---428e30e8-657d-8053-d70e-c2f4b75999e2 fe1e9f03-005b-3191-a3ee-4834944fcdeb Example 8-38 Using the solidDB SQL interface (view from right side starting at CLUSTER_ID row) VERIFIED_STATUS --------------NULL NULL ESTATE -----1 1 VERSION_OPERATING ----------------1 1 VERIFIED_STATUS --------------NULL NULL PARENT_CLUSTER_ID ----------------2 2 VERIFIED_STATUS --------------NULL NULL PARENT_CLUSTER_ID ----------------2 2 VERSION_CAPABLE --------------1 1 MULTICAST --------0 0 ESTATE -----1 1 VERSION_OPERATING ----------------1 1 VERSION_CAPABLE --------------1 1 ESTATE -----1 1 VERSION_OPERATING ----------------1 1 VERSION_CAPABLE --------------1 1 SIRCOL: SIRCOL stands for Storage Interconnected Resource Collection. 228 IBM PowerHA SystemMirror 7.1 for AIX The /var/adm/ras/syslog.caa log file The mkcluster, chcluster and rmcluster commands (and their underlying APIs) use the syslogd daemon for error logging. The cld and clconfd daemons and the clusterconf command also use syslogd facility for error logging. For that purpose, when PowerHA 7.1 file sets are installed, the following line is added to the /etc/syslog.conf file: *.info /var/adm/ras/syslog.caa rotate size 1m files 10 This file keeps all the logs about CAA activity, including the error outputs from the commands. Example 8-39 shows an error caught in the /var/adm/ras/syslog.caa file during the cluster definition. The chosen repository disk has already been part of a repository in the past and had not been cleaned up. Example 8-39 Output of the /var/adm/ras/syslog.caa file Sep 16 08:58:14 seoul user:err|error syslog: validate_device: Specified device, hdisk1, is a repository. Sep 16 08:58:14 seoul user:warn|warning syslog: To force cleanup of this disk, use rmcluster -r hdisk1 # It also keeps track of all PowerHA SystemMirror events. Example: Sep 16 09:40:40 seoul user:notice PowerHA SystemMirror for AIX: EVENT acquire_service_addr 0 Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT rg_move seoul 1 ACQUIRE 0 Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT rg_move_acquire seoul 1 0 Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT rg_move_complete seoul 1 Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT rg_move_complete seoul 1 0 Sep 16 09:40:44 seoul user:notice PowerHA SystemMirror for AIX: EVENT node_up_complete seoul Sep 16 09:40:44 seoul user:notice PowerHA SystemMirror for AIX: EVENT node_up_complete seoul 0 COMPLETED: COMPLETED: COMPLETED: START: COMPLETED: START: COMPLETED: Tip: To capture debug information, you can replace *.info with *.debug in the /etc/syslog.conf file, followed by a syslogd daemon refresh. Given that the output in debug mode provides much information, redirect the syslogd output to a file system other than /, /var, or /tmp. The solidDB log files The solidDB daemons keep log files on file systems over the repository disk in every node inside the solidDB directory as shown in Example 8-40. Example 8-40 The solidDB log files and directories seoul:/ # lsvg -l caavg_private caavg_private: LV NAME TYPE LPs caalv_private1 boot 1 caalv_private2 boot 1 caalv_private3 boot 4 fslv00 jfs2 4 /clrepos_private1 PPs 1 1 4 4 PVs 1 1 1 1 LV STATE closed/syncd closed/syncd open/syncd closed/syncd MOUNT POINT N/A N/A N/A Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 229 fslv01 /clrepos_private2 powerha_crlv jfs2 4 4 1 open/syncd boot 1 1 1 closed/syncd seoul:/ # ls -lrt /clrepos_private2 total 8 drwxr-xr-x 2 root system drwxr-xr-x 4 bin bin N/A 256 Sep 16 09:05 lost+found 4096 Sep 17 14:32 solidDB seoul:/ # ls -lrt /clrepos_private2/solidDB total 18608 -r-xr-xr-x 1 root system 650 -r-xr-xr-x 1 root system 5246 -r-xr-xr-x 1 root system 5975 d--x-----2 root system 256 -r-x-----1 root system 322 drwxr-xr-x 2 bin bin 256 -rw------1 root system 8257536 -rw-r--r-1 root system 18611 -rw------1 root system 1054403 -rw------1 root system 166011 Feb Jun Aug Aug Sep Sep Sep Sep Sep Sep 20 6 7 7 17 17 17 17 17 17 2010 18:54 15:53 23:10 12:06 12:06 12:06 12:06 14:32 15:03 solid.lic caa.sql solid.ini .sec solidhac.ini logs solid.db hacmsg.out solmsg.bak solmsg.out seoul:/ # ls -lrt /clrepos_private2/solidDB/logs total 32 -rw------1 root system 16384 Sep 17 12:07 sol00002.log Explanation of file names: The solid daemon generates the solmsg.out log file. The solidhac daemon generates the hacmsg.out log file. The solid.db file is the database itself, and the logs directory contains the database transaction logs. The solid.ini files are the configuration files for the solid daemons; the solidhac.ini files are the configuration files for the solidhac daemons. Collecting CAA debug information for IBM support The CAA component is now included in the snap command. The snap -e and clsnap commands collect all the necessary information for IBM support. The snap command gathers the following files from each node, compressing them into a .pax file: LOG bootstrap_repository clrepos1_solidDB.tar dbpass lscluster_clusters lscluster_network_interfaces lscluster_network_statistics lscluster_nodes lscluster_storage_interfaces 230 IBM PowerHA SystemMirror 7.1 for AIX lscluster_zones solid_lssrc solid_lssrc_S solid_select_sys_tables solid_select_tables syslog_caa system_proc_version system_uname 8.3.3 PowerHA 7.1 cluster monitoring tools PowerHA 7.1 comes with many commands and utilities that an administrator can use to monitor the cluster. This section explains those tools that are most commonly used. Using the clstat utility The clstat utility is the most traditional and most used interactive tool to observe the cluster status. Before using the clstat utility, you must convert the Simple Network Management Protocol (SNMP) from version 3 to version 1, if it is not done yet. Example 8-41 shows the steps and sample outputs. Example 8-41 Converting SNMP from V3 to V1 seoul:/ # stopsrc -s snmpd 0513-044 The snmpd Subsystem was requested to stop. seoul:/ # ls -ld /usr/sbin/snmpd lrwxrwxrwx 1 root system snmpdv3ne 9 Sep 15 22:17 /usr/sbin/snmpd -> seoul:/ # /usr/sbin/snmpv3_ssw -1 Stop daemon: snmpmibd In /etc/rc.tcpip file, comment out the line that contains: snmpmibd In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2 Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1 Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne Start daemon: dpid2 seoul:/ # ls -ld /usr/sbin/snmpd lrwxrwxrwx 1 root system /usr/sbin/snmpdv1 17 Sep 20 09:49 /usr/sbin/snmpd -> seoul:/ # startsrc -s snmpd 0513-059 The snmpd Subsystem has been started. Subsystem PID is 8126570. The clstat utility in interactive mode With the new -i flag, you can now select the cluster ID from a list of available ones as shown in Example 8-42. 
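Tip: The SNMP version change is a node-local setting, so before relying on the clstat or cldump utilities it is usually worth confirming that the symbolic link points to snmpdv1 on every node, for example:

clcmd ls -l /usr/sbin/snmpd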
Example 8-42 The clstat command in interactive mode sydney:/ # clstat -i clstat - HACMP Cluster Status Monitor ------------------------------------Number of clusters active: 1 ID Name State 1108531106 korea UP Select an option: # - the Cluster ID 1108531106 q- quit Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 231 clstat - HACMP Cluster Status Monitor ------------------------------------Cluster: korea (1108531106) Tue Oct 5 11:01:17 2010 State: UP Nodes: 2 SubState: STABLE Node: busan State: UP Interface: busan-b1 (0) 192.168.101.144 UP 192.168.201.144 UP Address: State: Address: State: Address: State: 192.168.101.143 UP 192.168.201.143 UP 10.168.101.143 UP State: On line Interface: busan-b2 (0) Address: State: Address: State: Node: seoul State: UP Interface: seoul-b1 (0) Interface: seoul-b2 (0) Interface: poksap-db (0) Resource Group: db2pok_ResourceGroup The clstat utility with the -o flag You can use the clstat utility with the -o flag as shown in Example 8-43. This flag instructs the utility to run once and then exit. It is useful for scripts and cron jobs. Example 8-43 The clstat utility with the option to run only once sydney:/ # clstat -o clstat - HACMP Cluster Status Monitor ------------------------------------Cluster: au_cl (1128255334) Mon Sep 20 10:26:10 2010 State: UP SubState: STABLE Nodes: 2 Node: perth State: UP Interface: perth (0) Interface: perthb2 (0) Interface: perths (0) Address: State: Address: State: Address: State: 192.168.101.136 UP 192.168.201.136 UP 10.168.201.136 UP State: On line Resource Group: perthrg Node: sydney State: UP Interface: sydney (0) Interface: sydneyb2 (0) Interface: sydneys (0) 232 IBM PowerHA SystemMirror 7.1 for AIX Address: State: Address: State: Address: State: 192.168.101.135 UP 192.168.201.135 UP 10.168.201.135 UP Resource Group: sydneyrg State: On line sydney:/ # Tip: The sfwcom and dpcomm interfaces that are shown with the lscluster -i command are not shown in output of the clstat utility. The PowerHA 7.1 cluster is unaware of the CAA cluster that is present at the AIX level. Using the cldump utility Another traditional way to observe the cluster status is to use the cldump utility, which also relies on the SNMP infrastructure as shown in Example 8-44. Example 8-44 cldump command seoul:/ # cldump Obtaining information via SNMP from Node: seoul... _____________________________________________________________________________ Cluster Name: korea Cluster State: UP Cluster Substate: STABLE _____________________________________________________________________________ Node Name: busan State: UP Network Name: net_ether_01 State: UP Address: 192.168.101.144 Label: busan-b1 Address: 192.168.201.144 Label: busan-b2 Node Name: seoul State: UP State: UP State: UP Network Name: net_ether_01 State: UP Address: 10.168.101.143 Label: poksap-db Address: 192.168.101.143 Label: seoul-b1 Address: 192.168.201.143 Label: seoul-b2 State: UP State: UP State: UP Cluster Name: korea Resource Group Name: db2pok_ResourceGroup Startup Policy: Online On Home Node Only Fallover Policy: Fallover To Next Priority Node In The List Fallback Policy: Never Fallback Site Policy: ignore Node Group State ---------------------------- --------------seoul ONLINE busan OFFLINE Tools in the /usr/es/sbin/cluster/utilities/ file The administrator of a running PowerHA 7.1 cluster can use several tools that are provided with the cluster.es.server.utils file set. These tools are kept in the /usr/es/sbin/cluster/utilities/ directory. 
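Tip: Because the clstat -o flag runs the utility once and then exits, it lends itself to a simple periodic check driven by cron. The following crontab entry is only a sketch; the schedule and log file name are examples, the path assumes the default installation location, and the minutes are listed explicitly because classic AIX cron does not accept the */15 shorthand:

0,15,30,45 * * * * /usr/es/sbin/cluster/clstat -o >> /var/hacmp/log/clstat_cron.log 2>&1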
Examples of the tools are provided in the following sections. Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 233 Listing the PowerHA SystemMirror cluster interfaces Example 8-45 shows the list of interfaces in the cluster using the cllsif command. Example 8-45 Listing cluster interfaces using the cllsif command seoul:/ # /usr/es/sbin/cluster/utilities/cllsif Adapter Type Network Net Type Attribute Address Hardware Address Interface Name Global Name Alias for HB Prefix Length busan-b2 192.168.201.144 busan-b1 192.168.101.144 poksap-db 10.168.101.143 seoul-b1 192.168.101.143 seoul-b2 192.168.201.143 poksap-db 10.168.101.143 boot boot service boot boot service net_ether_01 ether en2 net_ether_01 ether en0 net_ether_01 ether net_ether_01 ether en0 net_ether_01 ether en2 net_ether_01 ether Node Netmask busan 255.255.255.0 public busan 255.255.255.0 public busan 255.255.255.0 public seoul 255.255.255.0 public seoul 255.255.255.0 public seoul 255.255.255.0 IP public 24 24 24 24 24 24 Listing the whole cluster topology information Example 8-46 shows the cluster topology information that is generated by using the cllscf command. Example 8-46 Cluster topology listing by using the cllscf command seoul:/ # /usr/es/sbin/cluster/utilities/cllscf Cluster Name: korea Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No There were 1 networks defined: net_ether_01 There are 2 nodes in this cluster NODE busan: This node has 1 service IP label(s): Service IP Label poksap-db: IP address: 10.168.101.143 Hardware Address: Network: net_ether_01 Attribute: public Aliased Address?: Enable Service IP Label poksap-db has 2 communication interfaces. (Alternate Service) Communication Interface 1: busan-b2 IP Address: 192.168.201.144 Network: net_ether_01 Attribute: public Alias address for heartbeat: (Alternate Service) Communication Interface 2: busan-b1 234 IBM PowerHA SystemMirror 7.1 for AIX IP Address: Network: Attribute: 192.168.101.144 net_ether_01 public Alias address for heartbeat: Service IP Label poksap-db has no communication interfaces for recovery. This node has 1 persistent IP label(s): Persistent IP Label busan: IP address: 10.168.101.44 Network: net_ether_01 NODE seoul: This node has 1 service IP label(s): Service IP Label poksap-db: IP address: 10.168.101.143 Hardware Address: Network: net_ether_01 Attribute: public Aliased Address?: Enable Service IP Label poksap-db has 2 communication interfaces. (Alternate Service) Communication Interface 1: seoul-b1 IP Address: 192.168.101.143 Network: net_ether_01 Attribute: public Alias address for heartbeat: (Alternate Service) Communication Interface 2: seoul-b2 IP Address: 192.168.201.143 Network: net_ether_01 Attribute: public Alias address for heartbeat: Service IP Label poksap-db has no communication interfaces for recovery. This node has 1 persistent IP label(s): Persistent IP Label seoul: IP address: 10.168.101.43 Network: net_ether_01 Breakdown of network connections: Connections to network net_ether_01 Node busan is connected to network net_ether_01 by these interfaces: busan-b2 busan-b1 poksap-db busan Node seoul is connected to network net_ether_01 by these interfaces: seoul-b1 Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 235 seoul-b2 poksap-db seoul Tip: The cltopinfo -m command is used to show the heartbeat rings in the previous versions of PowerHA. 
Because this concept no longer applies, the output of the cltopinfo -m command is empty in PowerHA 7.1. The PowerHA 7.1 cluster administrator must explore all the utilities in the /usr/es/sbin/cluster/utilities/ directory in a testing system. Most of the utilities are only informational tools. Remember to never trigger unknown commands in production systems. 8.3.4 PowerHA ODM classes Example 8-47 on page 236 provides a comprehensive list of PowerHA Object Data Manager (ODM) files. Never edit these files directly, unless you are directed by IBM support. However, you can use the odmget command to grab cluster configuration information directly from these files as explained in this section. Example 8-47 PowerHA ODM files seoul:/etc/es/objrepos # ls HACMP* HACMPadapter HACMPpprcconsistgrp HACMPcluster HACMPras HACMPcommadapter HACMPresource HACMPcommlink HACMPresourcetype HACMPcsserver HACMPrg_loc_dependency HACMPcustom HACMPrgdependency HACMPdaemons HACMPrresmethods HACMPdisksubsys HACMPrules HACMPdisktype HACMPsa HACMPercmf HACMPsa_metadata HACMPercmfglobals HACMPsdisksubsys HACMPevent HACMPserver HACMPeventmgr HACMPsircol HACMPfcfile HACMPsite HACMPfcmodtime HACMPsiteinfo HACMPfilecollection HACMPsna HACMPgpfs HACMPsp2 HACMPgroup HACMPspprc HACMPlogs HACMPsr HACMPmonitor HACMPsvc HACMPnetwork HACMPsvcpprc HACMPnim HACMPsvcrelationship HACMPnode HACMPtape HACMPnpp HACMPtc HACMPoemfilesystem HACMPtimer HACMPoemfsmethods HACMPtimersvc HACMPoemvgmethods HACMPtopsvcs HACMPoemvolumegroup HACMPude HACMPpager HACMPudres_def HACMPpairtasks HACMPudresource HACMPpathtasks HACMPx25 HACMPport HACMPxd_mirror_group 236 IBM PowerHA SystemMirror 7.1 for AIX HACMPpprc Use the odmget command followed by the name of the file in the /etc/es/objrepos directory. Example 8-48 shows how to retrieve information about the cluster. Example 8-48 Using the odmget command to grab cluster information seoul:/ # ls -ld /etc/es/objrepos/HACMPcluster -rw-r--r-1 root hacmp 4096 Sep 17 12:29 /etc/es/objrepos/HACMPcluster seoul:/ # odmget HACMPcluster HACMPcluster: id = 1108531106 name = "korea" nodename = "seoul" sec_level = "Standard" sec_level_msg = "" sec_encryption = "" sec_persistent = "" last_node_ids = "" highest_node_id = 0 last_network_ids = "" highest_network_id = 0 last_site_ids = "" highest_site_id = 0 handle = 2 cluster_version = 12 reserved1 = 0 reserved2 = 0 wlm_subdir = "" settling_time = 0 rg_distribution_policy = "node" noautoverification = 0 clvernodename = "" clverhour = 0 clverstartupoptions = 0 Tip: In previous versions of PowerHA, the ODM HACMPtopsvcs class kept information about the current instance number for a node. In PowerHA 7.1, this class always has the instance number 1 (instanceNum = 1 as shown in the following example) because topology services are not used anymore. This number never changes. seoul:/ # odmget HACMPtopsvcs HACMPtopsvcs: hbInterval = 1 fibrillateCount = 4 runFixedPri = 1 fixedPriLevel = 38 tsLogLength = 5000 gsLogLength = 5000 instanceNum = 1 Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 237 You can use the HACMPnode ODM class to discover which version of PowerHA is installed as shown in Example 8-49. 
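Tip: The odmget command also accepts an ODM search criterion with the -q flag, which limits the output to the matching stanzas. For example, by using the name attribute that appears in Example 8-48, the following query returns only the HACMPcluster stanza for the korea cluster. Attribute names differ between classes, so check the full odmget output of a class before building such a query:

odmget -q "name=korea" HACMPcluster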
Example 8-49 Using the odmget command to retrieve the PowerHA version seoul:/ # odmget HACMPnode | grep version | sort -u version = 12 The following version numbers and corresponding HACMP/PowerHA release are available: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: HACMP 4.3.1 HACMP 4.4 HACMP 4.4.1 HACMP 4.5 HACMP 5.1 HACMP 5.2 HACMP 5.3 HACMP 5.4 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 Querying the HACMPnode ODM class is useful during cluster synchronization after a migration, when PowerHA issues warning messages about mixed versions among the nodes. If the HACMPtopsvcs ODM class can no longer be used to discover if the configuration must be synchronized across the nodes, you can query the HACMPcluster ODM class. This class keeps a numeric attribute called handle. Each node has a different value for this attribute, ranging from 1 to 32. You can retrieve the handle values by using the odmget or clhandle commands as shown in Example 8-50. Example 8-50 Viewing the cluster handles seoul:/ # clcmd odmget HACMPcluster ------------------------------NODE seoul ------------------------------HACMPcluster: id = 1108531106 name = "korea" nodename = "seoul" sec_level = "Standard" sec_level_msg = "" sec_encryption = "" sec_persistent = "" last_node_ids = "" highest_node_id = 0 last_network_ids = "" highest_network_id = 0 last_site_ids = "" highest_site_id = 0 handle = 2 cluster_version = 12 reserved1 = 0 reserved2 = 0 wlm_subdir = "" 238 IBM PowerHA SystemMirror 7.1 for AIX settling_time = 0 rg_distribution_policy = "node" noautoverification = 0 clvernodename = "" clverhour = 0 clverstartupoptions = 0 ------------------------------NODE busan ------------------------------HACMPcluster: id = 1108531106 name = "korea" nodename = "busan" sec_level = "Standard" sec_level_msg = "" sec_encryption = "" sec_persistent = "" last_node_ids = "" highest_node_id = 0 last_network_ids = "" highest_network_id = 0 last_site_ids = "" highest_site_id = 0 handle = 1 cluster_version = 12 reserved1 = 0 reserved2 = 0 wlm_subdir = "" settling_time = 0 rg_distribution_policy = "node" noautoverification = 0 clvernodename = "" clverhour = 0 clverstartupoptions = 0 seoul:/ # clcmd clhandle ------------------------------NODE seoul ------------------------------2 seoul ------------------------------NODE busan ------------------------------1 busan When you perform a cluster configuration change in any node, that node receives a numeric value of 0 over its handle. Suppose that you want to add a new resource group to the korea cluster and that you make the change from the seoul node. After you do the modification, and before you synchronize Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 239 the cluster, the handle attribute in the HACMPcluster ODM class in the seoul node has a value of 0 as shown in Example 8-51. Example 8-51 Handle values after a change, before synchronization seoul:/ # clcmd odmget HACMPcluster | egrep "NODE|handle" NODE seoul handle = 0 NODE busan handle = 1 seoul:/ # clcmd clhandle ------------------------------NODE seoul ------------------------------0 seoul ------------------------------NODE busan ------------------------------1 busan After you synchronize the cluster, the handle goes back to its original value of 2 as shown in Example 8-52. 
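Tip: The handle attribute is not the only indicator of a pending synchronization. The clmgr query cluster output that is shown in Example 8-54 contains an UNSYNCED_CHANGES field, so a quick check such as the following can be used as well:

clmgr query cluster | grep UNSYNCED_CHANGES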
Example 8-52 Original handle values after synchronization seoul:/ # smitty sysmirror Custom Cluster Configuration Verify and Synchronize Cluster Configuration (Advanced) seoul:/ # clcmd odmget HACMPcluster | egrep "NODE|handle" NODE seoul handle = 2 NODE busan handle = 1 seoul:/ # clcmd clhandle ------------------------------NODE seoul ------------------------------2 seoul ------------------------------NODE busan ------------------------------1 busan If you experience a situation where more than one node has a handle with a 0 value, you or another person might have performed the changes from different nodes. Therefore, you must decide in which node you want to start the synchronization. As result, the cluster modifications made on the other nodes are then lost. 240 IBM PowerHA SystemMirror 7.1 for AIX 8.3.5 PowerHA clmgr utility The clmgr utility provides a new interface to PowerHA with consistency, usability, and serviceability. The tool is packed into the cluster.es.server.utils file set as shown in Example 8-53. Example 8-53 The clmgr utility file set seoul:/ # whence clmgr /usr/es/sbin/cluster/utilities/clmgr seoul:/ # lslpp -w /usr/es/sbin/cluster/utilities/clmgr File Fileset Type ---------------------------------------------------------------------------/usr/es/sbin/cluster/utilities/clmgr cluster.es.server.utils Hardlink The clmgr command generates a /var/hacmp/log/clutils.log log file. The clmgr command supports the actions as listed in 5.2.1, “The clmgr action commands” on page 104. For monitoring purposes, you can use the query and view actions. For a list of object classes, that are available for each action, see 5.2.2, “The clmgr object classes” on page 105. Example using the query action Example 8-54 shows the query action on the PowerHA cluster using the clmgr command. Example 8-54 Query action on the PowerHA cluster using the clmgr command seoul:/ # clmgr query cluster CLUSTER_NAME="korea" CLUSTER_ID="1108531106" STATE="STABLE" VERSION="7.1.0.1" VERSION_NUMBER="12" EDITION="STANDARD" CLUSTER_IP="" REPOSITORY="caa_private0" SHARED_DISKS="cldisk2,cldisk1" UNSYNCED_CHANGES="false" SECURITY="Standard" FC_SYNC_INTERVAL="10" RG_SETTLING_TIME="0" RG_DIST_POLICY="node" MAX_EVENT_TIME="180" MAX_RG_PROCESSING_TIME="180" SITE_POLICY_FAILURE_ACTION="fallover" SITE_POLICY_NOTIFY_METHOD="" DAILY_VERIFICATION="Enabled" VERIFICATION_NODE="Default" VERIFICATION_HOUR="0" VERIFICATION_DEBUGGING="Enabled" LEVEL="" ALGORITHM="" GRACE_PERIOD="" Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 241 REFRESH="" MECHANISM="" CERTIFICATE="" PRIVATE_KEY="" seoul:/ # clmgr query interface busan-b2 busan-b1 poksap-db seoul-b1 seoul-b2 seoul:/ # clmgr query node busan seoul seoul:/ # clmgr query network net_ether_01 seoul:/ # clmgr query resource_group db2pok_ResourceGroup seoul:/ # clmgr query volume_group caavg_private pokvg Tip: Another way to check the PowerHA version is to query the SNMP subsystem as follows: seoul:/ # snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs clstrmgrVersion clstrmgrVersion.1 = "7.1.0.1" clstrmgrVersion.2 = "7.1.0.1" Example using the view action Example 8-55 shows the view action on the PowerHA cluster using the clmgr command. 
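Tip: Because the query action returns its output as NAME="value" pairs, it is easy to use in scripts. The following minimal sketch extracts the cluster state; the message text is only an example:

STATE=$(clmgr query cluster | grep "^STATE=" | cut -d'"' -f2)
echo "The cluster is currently $STATE"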
Example 8-55 Using the view action on the PowerHA cluster using clmgr seoul:/ # clmgr view report cluster Cluster: korea Cluster services: active State of cluster: up Substate: stable ############# APPLICATIONS ############# Cluster korea provides the following applications: db2pok_ApplicationServer Application: db2pok_ApplicationServer db2pok_ApplicationServer is started by /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok db2pok_ApplicationServer is stopped by /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok 242 IBM PowerHA SystemMirror 7.1 for AIX Application monitors for db2pok_ApplicationServer: db2pok_SQLMonitor db2pok_ProcessMonitor Monitor name: db2pok_SQLMonitor Type: custom Monitor method: user Monitor interval: 120 seconds Hung monitor signal: 9 Stabilization interval: 240 seconds Retry count: 3 tries Restart interval: 1440 seconds Failure action: fallover Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok Monitor name: db2pok_ProcessMonitor Type: process Process monitored: db2sysc Process owner: db2pok Instance count: 1 Stabilization interval: 240 seconds Retry count: 3 tries Restart interval: 1440 seconds Failure action: fallover Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok This application is part of resource group 'db2pok_ResourceGroup'. Resource group policies: Startup: on home node only Fallover: to next priority node in the list Fallback: never State of db2pok_ApplicationServer: online Nodes configured to provide db2pok_ApplicationServer: seoul {up} busan {up} Node currently providing db2pok_ApplicationServer: seoul {up} The node that will provide db2pok_ApplicationServer if seoul fails is: busan Resources associated with db2pok_ApplicationServer: Service Labels poksap-db(10.168.101.143) {online} Interfaces configured to provide poksap-db: seoul-b1 {up} with IP address: 192.168.101.143 on interface: en0 on node: seoul {up} on network: net_ether_01 {up} seoul-b2 {up} with IP address: 192.168.201.143 on interface: en2 on node: seoul {up} on network: net_ether_01 {up} busan-b2 {up} with IP address: 192.168.201.144 on interface: en2 on node: busan {up} on network: net_ether_01 {up} Chapter 8. 
Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 243 busan-b1 {up} with IP address: 192.168.101.144 on interface: en0 on node: busan {up} on network: net_ether_01 {up} Shared Volume Groups: pokvg ############# TOPOLOGY ############# korea consists of the following nodes: busan seoul busan Network interfaces: busan-b2 {up} with IP address: 192.168.201.144 on interface: en2 on network: net_ether_01 {up} busan-b1 {up} with IP address: 192.168.101.144 on interface: en0 on network: net_ether_01 {up} seoul Network interfaces: seoul-b1 {up} with IP address: 192.168.101.143 on interface: en0 on network: net_ether_01 {up} seoul-b2 {up} with IP address: 192.168.201.143 on interface: en2 on network: net_ether_01 {up} seoul:/ # clmgr view report topology Cluster Name: korea Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: caa_private0 Cluster IP Address: NODE busan: Network net_ether_01 poksap-db busan-b2 busan-b1 NODE seoul: Network net_ether_01 poksap-db seoul-b1 seoul-b2 Network 244 Attribute IBM PowerHA SystemMirror 7.1 for AIX 10.168.101.143 192.168.201.144 192.168.101.144 Alias 10.168.101.143 192.168.101.143 192.168.201.143 Monitor method Node Adapter(s) net_ether_01 public busan-b1 poksap-db Enable Default monitoring busan seoul seoul-b1 seoul-b2 poksap-db Adapter Type Network Net Type Attribute Address Hardware Address Interface Name Global Name Alias for HB Prefix Length busan-b2 192.168.201.144 22 busan-b1 192.168.101.144 22 poksap-db 10.168.101.143 22 seoul-b1 192.168.101.143 22 seoul-b2 192.168.201.143 22 poksap-db 10.168.101.143 22 busan-b2 Node Netmask boot net_ether_01 ether en2 public busan 255.255.255.0 boot net_ether_01 ether en0 public busan 255.255.255.0 service net_ether_01 ether public busan 255.255.255.0 boot net_ether_01 ether en0 public seoul 255.255.255.0 boot net_ether_01 ether en2 public seoul 255.255.255.0 service net_ether_01 ether public seoul 255.255.255.0 IP You can also use the clmgr command to see the list of PowerHA SystemMirror log files as shown in Example 8-56. Example 8-56 Viewing the PowerHA cluster log files using the clmgr command seoul:/ # clmgr view log Available Logs: autoverify.log cl2siteconfig_assist.log cl_testtool.log clavan.log clcomd.log clcomddiag.log clconfigassist.log clinfo.log clstrmgr.debug clstrmgr.debug.long cluster.log cluster.mmddyyyy clutils.log clverify.log cspoc.log cspoc.log.long cspoc.log.remote dhcpsa.log dnssa.log domino_server.log emuhacmp.out Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 245 hacmp.out ihssa.log migration.log sa.log sax.log Tip: The output verbose level can be set by using the -l option as in the following example: clmgr -l {low|med|high|max} action object 8.3.6 IBM Systems Director web interface This section explains how to discover and monitor a cluster by using the IBM Systems Director 6.1 web interface. For the steps to install IBM Systems Director, the IBM PowerHA SystemMirror plug-in, and IBM System Director Common Agent, see Chapter 11, “Installing IBM Systems Director and the PowerHA SystemMirror plug-in” on page 325. Login page for IBM Systems Director When you point the web browser to the IBM Systems Director IP address, port 8422, you are presented with a login page. The root user and password are used to log on as shown in Figure 8-1 on page 247. Root user: Do not use the root user. 
The second person who logs on with the root user ID unlogs the first person, and so on. The logon is exclusive. For a production environment, create an AIX user ID for each person who must connect to the IBM Systems Director web interface. This user ID must belong to smadmin. Therefore, everyone can connect simultaneously to the IBM Systems Director web interface. For more information, see the “Users and user groups in IBM Systems Director” topic in the IBM Systems Director V6.1.x Information Center at: http://publib.boulder.ibm.com/infocenter/director/v6r1x/index.jsp?topic=/direct or.security_6.1/fqm0_c_user_accounts.html smadmin (Administrator group): Members of the smadmin group are authorized for all operations. 246 IBM PowerHA SystemMirror 7.1 for AIX Figure 8-1 IBM Systems Director 6.1 login page Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 247 Welcome page for IBM Systems Director On the welcome page, the administrator must first discover the systems with PowerHA to administer. Figure 8-2 shows the link underlined in red. Figure 8-2 IBM Systems Director 6.1 welcome page 248 IBM PowerHA SystemMirror 7.1 for AIX Discovery Manager In the Discovery Manager panel, the administrator must click the System discovery link as shown in Figure 8-3. Figure 8-3 IBM Systems Director 6.1 Discovery Manager Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 249 Selecting the systems and agents to discover In the System Discovery panel, complete the following actions: 1. For Select a discovery option, select the Range of IPv4 addresses. 2. Enter the starting and ending IP addresses. In Figure 8-4, only the two IP addresses for the seoul and busan nodes are used. 3. For Select the resource type to discover, leave the default of All. 4. Click the Discover now button. The discovery takes less than 1 minute in this case because the IP range is limited to two machines. Figure 8-4 Selecting the systems to discover 250 IBM PowerHA SystemMirror 7.1 for AIX IBM Systems Director availability menu In the left navigation bar, expand Availability and click the PowerHA SystemMirror link as shown in Figure 8-5. Figure 8-5 IBM Systems Director 6.1 availability menu Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 251 Initial panel of the PowerHA SystemMirror plug-in In the Health Summary list, you can see that two systems have an OK status with one resource group also having an OK status. Click the Manage Clusters link as shown in Figure 8-6. Figure 8-6 PowerHA SystemMirror plug-in initial menu 252 IBM PowerHA SystemMirror 7.1 for AIX PowerHA available clusters On the Cluster and Resource Group Management panel (Figure 8-7), the PowerHA plug-in for IBM Systems Director shows the available clusters. This information has been retrieved in the discovery process. Two clusters are shown: korea and ro_cl. In the korea cluster, the two nodes, seoul and busan, are visible and indicate a healthy status. The General tab on the right shows more relevant information about the selected cluster. Figure 8-7 PowerHA SystemMirror plug-in: Available clusters Cluster menu You can right-click all the objects to access options. Figure 8-8 shows an example of the options for the korea cluster. Figure 8-8 Menu options when right-clicking a cluster in the PowerHA SystemMirror plug-in Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster 253 PowerHA SystemMirror plug-in: Resource Groups tab The Resource Group tab (Figure 8-9) shows the available resource groups in the cluster. 
Figure 8-9 PowerHA SystemMirror plug-in: Resource Groups tab
Resource Groups menu
You can right-click the resource groups to access options such as those shown in Figure 8-10.
Figure 8-10 Options available when right-clicking a resource group in the PowerHA SystemMirror plug-in
PowerHA SystemMirror plug-in: Cluster tab
The Cluster tab has several tabs on the right that you can use to retrieve information about the cluster. These tabs include the Resource Groups tab, Networks tab, Storage tab, and Additional Properties tab as shown in the following sections.
Resource Groups tab
Figure 8-11 shows the Resource Groups tab and the information that is presented.
Figure 8-11 PowerHA SystemMirror plug-in: Resource Groups tab
Networks tab
Figure 8-12 shows the Networks tab and the information that is displayed.
Figure 8-12 PowerHA SystemMirror plug-in: Networks tab
Storage tab
Figure 8-13 shows the Storage tab and the information that is presented.
Figure 8-13 PowerHA SystemMirror plug-in: Storage tab
Additional Properties tab
Figure 8-14 shows the Additional Properties tab and the information that is presented.
Figure 8-14 PowerHA SystemMirror plug-in: Additional Properties tab
8.3.7 IBM Systems Director CLI (smcli interface)
The new web interface for IBM Systems Director is powerful and allows a systems management console to be opened from anywhere. However, it is often desirable to perform certain functions against the management server from a command line. Whether you are scripting a task to run against many systems or automating a process, the CLI is useful in a management environment such as IBM Systems Director.
Tip: To run the commands, the smcli interface requires you to be an IBM Systems Director superuser.
Example 8-57 runs the smcli command on the IBM Systems Director server (host name mexico) to list the PowerHA options that are available.
Example 8-57 Available options for PowerHA in IBM Systems Director CLI
mexico:/ # /opt/ibm/director/bin/smcli lsbundle | grep sysmirror
sysmirror/help
sysmirror/lsac
sysmirror/lsam
sysmirror/lsappctl
sysmirror/lsappmon
sysmirror/lscl
sysmirror/lscluster
sysmirror/lsdependency
sysmirror/lsdp
sysmirror/lsfc
sysmirror/lsfilecollection
sysmirror/lsif
sysmirror/lsinterface
sysmirror/lslg
sysmirror/lslog
sysmirror/lsmd
sysmirror/lsmethod
sysmirror/lsnd
. . .
All the configuration commands listed in Example 8-57 can be triggered by using the smcli command. Example 8-58 shows the commands that you can use.
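The subcommands that Example 8-58 (next) walks through individually can also be combined in a small shell script on the management server, for example to print a one-page summary of every managed cluster. The following minimal sketch is only an illustration: it assumes the smcli path used throughout this section and assumes that sysmirror/lscluster prints one cluster per line in the form "name (id)", as shown in Example 8-58.

#!/usr/bin/ksh
# Sketch: summarize every PowerHA cluster known to this IBM Systems Director server
SMCLI=/opt/ibm/director/bin/smcli

$SMCLI sysmirror/lscluster | while read cluster id
do
    echo "=== Cluster: $cluster ==="
    echo "--- Resource groups:"
    $SMCLI sysmirror/lsrg -c "$cluster"
    echo "--- Networks:"
    $SMCLI sysmirror/lsnw -c "$cluster"
    echo "--- Service labels:"
    $SMCLI sysmirror/lssi -c "$cluster"
done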
Example 8-58 Using #smcli to retrieve PowerHA information # Lists the clusters that can be managed by the IBM Systems Director: mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lscluster korea (1108531106) # Lists the service labels of a cluster: mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lssi -c korea poksap-db # Lists all interfaces defined in a cluster: mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsif -c korea busan-b1 busan-b2 seoul-b1 seoul-b2 # Lists resource groups of a cluster: mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsrg -c korea db2pok_ResourceGroup # Lists networks: mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsnw -c korea net_ether_01 # Lists application servers of a cluster: mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsac -c korea db2pok_ApplicationServer 258 IBM PowerHA SystemMirror 7.1 for AIX 9 Chapter 9. Testing the PowerHA 7.1 cluster This chapter takes you through several simulations for testing a PowerHA 7.1 cluster and then explains the cluster behavior and log files. This chapter includes the following topics: Testing the SAN-based heartbeat channel Testing the repository disk heartbeat channel Simulation of a network failure Testing the rootvg system event Simulation of a crash in the node with an active resource group Simulations of CPU starvation Simulation of a Group Services failure Testing a Start After resource group dependency Testing dynamic node priority © Copyright IBM Corp. 2011. All rights reserved. 259 9.1 Testing the SAN-based heartbeat channel This section explains how to check the redundant heartbeat through the storage area network (SAN)-based channel if the network communication between nodes is lost. The procedure is based on the test cluster shown in Figure 9-1. In this environment, the PowerHA cluster is synchronized, and the CAA cluster is running. Figure 9-1 Testing the SAN-based heartbeat Example 9-1 shows the working state of the CAA cluster. 
Example 9-1 Initial error-free CAA status sydney:/ # lscluster -i Network/Storage Interface Query Cluster Name: au_cl Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a Number of nodes reporting = 2 Number of nodes expected = 2 Node sydney Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9a Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 5 260 IBM PowerHA SystemMirror 7.1 for AIX Probe interval for interface = 120 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9b Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 5 Probe interval for interface = 120 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 0 Mean Deviation in network rrt across interface = 0 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node perth Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182 Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d9 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Chapter 9. 
Testing the PowerHA 7.1 cluster 261 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d8 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 0 Mean Deviation in network rrt across interface = 0 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED 262 IBM PowerHA SystemMirror 7.1 for AIX You can check the connectivity between nodes by using the socksimple command. This command provides a ping-type interface to send and receive packets over the cluster communications channels. Example 9-2 shows the usage output of running the socksimple command. Example 9-2 socksimple usage sydney:/ # socksimple Usage: socksimple -r|-s [-v] [-a address] [-p port] [-t ttl] -r|-s -a address -p port -p ttl -v Receiver or sender. Required argument, mutually exclusive Cluster address to listen/send on, overrides the default. (must be < 16 characters long) port to listen/send on, overrides the default of 12. Time-To-Live to send, overrides the default of 1. Verbose mode You can obtain the cluster address for the -a option of the socksimple command from the lscluster -c command output (Example 9-3). Example 9-3 Node IDs of the CAA cluster sydney:/ # lscluster -c Cluster query for cluster aucl returns: Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a Number of nodes in cluster = 2 Cluster id for node perth is 1 Primary IP address for node perth is 192.168.101.136 Cluster id for node sydney is 2 Primary IP address for node sydney is 192.168.101.135 Number of disks in cluster = 0 Multicast address for cluster is 228.168.101.135 To test the SAN-based heartbeat channel, follow these steps: 1. Check the cluster communication with all the network interfaces up (Example 9-4). 
Example 9-4 The socksimple test with the network channel up
sydney:/ # socksimple -s -a 1
socksimple version 1.2
socksimpleing 1/12 with ttl=1:
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=0.415 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=0.381 ms
1277 bytes from cluster host id = 1: seqno=1277 ttl=1 time=0.347 ms
--- socksimple statistics ---
3 packets transmitted, 3 packets received
round-trip min/avg/max = 0.347/0.381/0.415 ms
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
Replying to socksimple from cluster node id = 2 bytes=1275 seqno=1275 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1276 seqno=1276 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1277 seqno=1277 ttl=1
perth:/ #
2. Disconnect the network interfaces by pulling the cables on one node to simulate an Ethernet network failure. Example 9-5 shows the interface status.
Example 9-5 Ethernet ports down
perth:/ # entstat -d ent1 | grep -i link
Link Status: UNKNOWN
perth:/ # entstat -d ent2 | grep -i link
Link Status: UNKNOWN
3. Check the cluster communication by using the socksimple command as shown in Example 9-6.
Example 9-6 The socksimple test with Ethernet ports down
sydney:/ # socksimple -s -a 1
socksimple version 1.2
socksimpleing 1/12 with ttl=1:
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=1.075 ms
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=50.513 ms
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=150.663 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=0.897 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=50.623 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=150.791 ms
--- socksimple statistics ---
2 packets transmitted, 6 packets received
round-trip min/avg/max = 0.897/67.427/150.791 ms
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
Replying to socksimple from cluster node id = 2 bytes=1275 seqno=1275 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1276 seqno=1276 ttl=1
perth:/ #
4. Check the status of the cluster interfaces by using the lscluster -i command. Example 9-7 shows the status for both disconnected ports on the perth node. In this example, the status has changed from UP to DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT.
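Before reviewing the full lscluster -i output in Example 9-7, a quick filtered view of the node and interface states can be useful. This is only a convenience sketch; it uses the same egrep filtering that Figure 9-3 uses later in this chapter:

sydney:/ # lscluster -i | egrep "Interface|Node"

On the perth node, the two disconnected Ethernet interfaces are expected to report the DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT state, while the sfwcom and dpcom interfaces remain UP.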
Example 9-7 CAA cluster status with Ethernet ports down sydney:/ # lscluster -i Network/Storage Interface Query Cluster Name: Cluster uuid: 264 aucl 98f28ffa-cfde-11df-9a82-00145ec5bf9a IBM PowerHA SystemMirror 7.1 for AIX Number of nodes reporting = 2 Number of nodes expected = 2 Node sydney Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9a Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9b Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Chapter 9. 
Testing the PowerHA 7.1 cluster 265 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node perth Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182 Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d9 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d8 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 266 IBM PowerHA SystemMirror 7.1 for AIX Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED 5. Reconnect the Ethernet cables and check the port status as shown in Example 9-8. Example 9-8 Ethernet ports reconnected perth:/ # entstat -d ent1 | grep -i link Link Status: Up perth:/ # entstat -d ent2|grep -i link Link Status: Up 6. Check if the cluster status has recovered. Example 9-9 shows that both Ethernet ports on the perth node are now in the UP state. 
Example 9-9 CAA cluster status recovered sydney:/ # lscluster -i Network/Storage Interface Query Cluster Name: aucl Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a Number of nodes reporting = 2 Number of nodes expected = 2 Node sydney Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9a Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.c5.bf.9b Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Chapter 9. Testing the PowerHA 7.1 cluster 267 netmask Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node perth Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182 Number of interfaces discovered = 4 Interface number 1 en1 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d9 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.14.5e.e7.25.d8 268 IBM PowerHA SystemMirror 7.1 for AIX netmask Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 
3 sfwcom ifnet type = 0 ndd type = 304 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP Interface number 4 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED 9.2 Testing the repository disk heartbeat channel This section explains how to test the repository disk heartbeat channel. 9.2.1 Background When the entire PowerHA SystemMirror IP network fails, and either the SAN-based heartbeat network (sfwcom) does not exist, or it exists but has failed, CAA uses the heartbeat-over-repository-disk (dpcom) feature. The example in the next section describes dpcom heartbeating in a two-node cluster after all IP interfaces have failed. Chapter 9. Testing the PowerHA 7.1 cluster 269 9.2.2 Testing environment A two-node cluster is configured with the following topology: en0 is not included in the PowerHA cluster, but it is monitored by CAA. en3 through en5 are included in the PowerHA cluster and monitored by CAA. No SAN-based communication channel (sfwcom) is available. Initially, both nodes are online and running cluster services, all IP interfaces are online, and the service IP address has an alias on the en3 interface. This test scenario includes unplugging the cable of one interface at a time, starting with en3, en4, en5, and finally en0. As each cable is unplugged, the service IP correctly swaps to the next available interface on the same node. Each failed interface is marked as DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT as shown in Example 9-10. After the cables for the en3 through en5 interfaces are unplugged, a local network failure event occurs, leading to a selective failover of the resource group to the remote node. However, because the en0 interface is still up, CAA continues to heartbeat over the en0 interface. 
Example 9-10 Output of the lscluster -i command [hacmp27:HAES7101/AIX61-06 /] # lscluster -i Network/Storage Interface Query Cluster Name: ha71sp1_aixsp2 Cluster uuid: c37f7324-daff-11df-903e-0011257e4998 Number of nodes reporting = 2 Number of nodes expected = 2 Node hacmp27 Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998 Number of interfaces discovered = 5 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.7e.49.98 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en3 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b5 Smoothed rrt across interface = 8 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 110 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0 270 IBM PowerHA SystemMirror 7.1 for AIX Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 en4 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b6 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 4 en5 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b7 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 5 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 118 Mean Deviation in network rrt across interface = 81 Probe interval for interface = 1990 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node hacmp28 Node uuid = 15e86116-d173-11df-8bdf-0011257e4340 Number of interfaces discovered = 5 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.7e.43.40 Smoothed rrt across interface = 8 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 110 ms ifnet flags for 
interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0 Chapter 9. Testing the PowerHA 7.1 cluster 271 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en3 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.d Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 2 IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0 IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 en4 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.e Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 4 en5 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.f Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 5 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 1037 Mean Deviation in network rrt across interface = 1020 Probe interval for interface = 20570 ms 272 IBM PowerHA SystemMirror 7.1 for AIX ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Example 9-11 shows the output of the lscluster -m command. 
Example 9-11 Output of the lscluster -m command [hacmp27:HAES7101/AIX61-06 /] # lscluster -m Calling node query for all nodes Node query number of nodes examined: 2 Node name: hacmp27 Cluster shorthand id for node: 1 uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998 State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a -----------------------------Node name: hacmp28 Cluster shorthand id for node: 2 uuid for node: 15e86116-d173-11df-8bdf-0011257e4340 State of node: UP Smoothed rtt to node: 8 Mean Deviation in network rtt to node: 3 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 4 Point-of-contact interface & contact state en0 UP en5 DOWN en4 DOWN en3 DOWN Chapter 9. Testing the PowerHA 7.1 cluster 273 After the en0 cable is unplugged, CAA proceeds to heartbeat over the repository disk (dpcom). This action is indicated by the node status REACHABLE THROUGH REPOS DISK ONLY in the lscluster -m command (Example 9-12). Example 9-12 Output of the lscluster -m command [hacmp27:HAES7101/AIX61-06 /] # lscluster -m Calling node query for all nodes Node query number of nodes examined: 2 Node name: hacmp27 Cluster shorthand id for node: 1 uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998 State of node: UP NODE_LOCAL REACHABLE THROUGH REPOS DISK ONLY Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a -----------------------------Node name: hacmp28 Cluster shorthand id for node: 2 uuid for node: 15e86116-d173-11df-8bdf-0011257e4340 State of node: UP REACHABLE THROUGH REPOS DISK ONLY Smoothed rtt to node: 143 Mean Deviation in network rtt to node: 107 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 5 Point-of-contact interface & contact state dpcom UP en0 DOWN en5 DOWN en4 DOWN en3 DOWN [hacmp28:HAES7101/AIX61-06 /] # lscluster -m Calling node query for all nodes Node query number of nodes examined: 2 Node name: hacmp27 Cluster shorthand id for node: 1 uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998 274 IBM PowerHA SystemMirror 7.1 for AIX State of node: UP REACHABLE THROUGH REPOS DISK ONLY Smoothed rtt to node: 17 Mean Deviation in network rtt to node: 5 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 5 Point-of-contact interface & contact state dpcom UP en4 DOWN en5 DOWN en3 DOWN en0 DOWN -----------------------------Node name: hacmp28 Cluster shorthand id for node: 2 uuid for node: 15e86116-d173-11df-8bdf-0011257e4340 State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 
Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a Example 9-13 shows the output of the lscluster -i command with the dpcom status changing from UP RESTRICTED AIX_CONTROLLED to UP AIX_CONTROLLED. Example 9-13 Output of the lscluster -i command showing the dpcom status [hacmp27:HAES7101/AIX61-06 /] # lscluster -i Network/Storage Interface Query Cluster Name: ha71sp1_aixsp2 Cluster uuid: c37f7324-daff-11df-903e-0011257e4998 Number of nodes reporting = 2 Number of nodes expected = 2 Node hacmp27 Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998 Number of interfaces discovered = 5 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.7e.49.98 Smoothed rrt across interface = 8 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 110 ms Chapter 9. Testing the PowerHA 7.1 cluster 275 ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en3 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b5 Smoothed rrt across interface = 8 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 110 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 en4 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b6 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 4 en5 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b7 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 5 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 23 Mean Deviation in network rrt across interface = 11 276 IBM PowerHA SystemMirror 7.1 for AIX Probe 
interval for interface = 340 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP AIX_CONTROLLED Node hacmp28 Node uuid = 15e86116-d173-11df-8bdf-0011257e4340 Number of interfaces discovered = 5 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.7e.43.40 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en3 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.d Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 2 IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0 IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 en4 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.e Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 4 en5 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.f Smoothed rrt across interface = 7 Chapter 9. Testing the PowerHA 7.1 cluster 277 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 5 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 10 Mean Deviation in network rrt across interface = 7 Probe interval for interface = 170 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP AIX_CONTROLLED After any interface cable is reconnected, such as the en0 interface, CAA stops heartbeating over the repository disk and resumes heartbeating over the IP interface. Example 9-14 shows the output of the lscluster -m command after the en0 cable is reconnected. The dpcom status changes from UP to DOWN RESTRICTED, and the en0 interface status changes from DOWN to UP. 
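To watch this transition while it happens, rather than only inspecting the state afterward, the lscluster -m output can be polled in a loop. The following one-liner is only a sketch; the 10-second interval and the egrep pattern (tailored to the en0 and dpcom interfaces of this test environment) are arbitrary choices:

[hacmp27:HAES/AIX61-06 /] # while true; do date; lscluster -m | egrep "Node name|State of node|dpcom|en0"; sleep 10; done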
Example 9-14 Output of the lscluster -m command after en0 is reconnected [hacmp27:HAES/AIX61-06 /] # lscluster -m Calling node query for all nodes Node query number of nodes examined: 2 Node name: hacmp27 Cluster shorthand id for node: 1 uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998 State of node: UP NODE_LOCAL Smoothed rtt to node: 0 Mean Deviation in network rtt to node: 0 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 0 Point-of-contact interface & contact state n/a -----------------------------Node name: hacmp28 Cluster shorthand id for node: 2 uuid for node: 15e86116-d173-11df-8bdf-0011257e4340 State of node: UP Smoothed rtt to node: 7 278 IBM PowerHA SystemMirror 7.1 for AIX Mean Deviation in network rtt to node: 4 Number of zones this node is a member in: 0 Number of clusters node is a member in: 1 CLUSTER NAME TYPE SHID UUID ha71sp1_aixsp2 local c37f7324-daff-11df-903e-0011257e4998 Number of points_of_contact for node: 5 Point-of-contact interface & contact state dpcom DOWN RESTRICTED en0 UP en5 DOWN en4 DOWN en3 DOWN Example 9-15 shows the output of the lscluster -i command. The en0 interface is now marked as UP, and the dpcom returns to UP RESTRICTED AIX_CONTROLLED. Example 9-15 Output of the lscluster -i command [hacmp27:HAES/AIX61-06 /] # lscluster -i Network/Storage Interface Query Cluster Name: ha71sp1_aixsp2 Cluster uuid: c37f7324-daff-11df-903e-0011257e4998 Number of nodes reporting = 2 Number of nodes expected = 2 Node hacmp27 Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998 Number of interfaces discovered = 5 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.7e.49.98 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en3 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b5 Smoothed rrt across interface = 8 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 110 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0 Chapter 9. 
Testing the PowerHA 7.1 cluster 279 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 en4 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b6 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 4 en5 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cc.d.b7 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x630853 Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 5 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 120 Mean Deviation in network rrt across interface = 105 Probe interval for interface = 2250 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Node hacmp28 Node uuid = 15e86116-d173-11df-8bdf-0011257e4340 Number of interfaces discovered = 5 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.7e.43.40 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0 280 IBM PowerHA SystemMirror 7.1 for AIX Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en3 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.d Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 2 IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0 IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 en4 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.e Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.2.28 broadcast 
10.1.2.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 4 en5 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = 0.11.25.cb.e1.f Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x5e080863 ndd flags for interface = 0x63081b Interface state UP Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 5 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED
9.3 Simulation of a network failure
The following section explains the simulation of a network failure.
9.3.1 Background
In PowerHA 7.1, the heartbeat method has changed. Heartbeating between the nodes is now done by AIX. The newly introduced CAA takes over the role of heartbeating and event management. This simulation tests a network down scenario and looks at the log files of PowerHA and CAA monitoring.
This test scenario uses a two-node cluster; one network interface is brought down on one of the nodes by using the ifconfig command. The cluster has one IP heartbeat path and two non-IP heartbeat paths. One of the non-IP paths is the SAN-based heartbeat channel (sfwcom). The other is heartbeating over the repository disk (dpcom). Although IP connectivity is lost when the ifconfig command is used, PowerHA SystemMirror uses CAA to heartbeat over the two other channels. This process is similar to the rs232 or diskhb heartbeat networks in previous versions of PowerHA.
9.3.2 Testing environment
Before starting the network failure test, check the status of the cluster. The resource group myrg is on the riyad node as shown in Figure 9-2.
riyad:/ # netstat -i
Name Mtu   Network     Address          Ipkts Ierrs Opkts Oerrs Coll
en0  1500  link#2      a2.4e.5f.b4.5.2  74918     0 50121     0    0
en0  1500  192.168.100 riyad            74918     0 50121     0    0
en0  1500  10.168.200  saudisvc         74918     0 50121     0    0
lo0  16896 link#1                        3937     0  3937     0    0
lo0  16896 127         loopback          3937     0  3937     0    0
lo0  16896 loopback                      3937     0  3937     0    0
riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
myrg           ONLINE                       riyad
               OFFLINE                      jeddah
Figure 9-2 Status of the riyad node
The output of the lscluster -i command (Figure 9-3) shows that every adapter has the UP state.
riyad:/ # lscluster -i |egrep "Interface|Node"
Network/Storage Interface Query
Node riyad
Node uuid = 2f1590d0-cc02-11df-bf20-a24e5fb40502
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Node jeddah
Node uuid = 39710df0-cc04-11df-929f-a24e5f0d9e02
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Figure 9-3 Output of the lscluster -i command
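The test in 9.3.3 can be summarized as the following short command sequence. This is only a sketch of the flow; the final ifconfig en0 up step, which restores the interface after the test, is an assumption and is not shown in the section itself:

riyad:/ # clRGinfo                               # confirm that myrg is online on riyad
riyad:/ # ifconfig en0 down                      # simulate the IP network failure
riyad:/ # lscluster -i | egrep "Interface|Node"  # en0 goes DOWN; sfwcom and dpcom stay UP
riyad:/ # tail /var/hacmp/adm/cluster.log        # check the network_down events
riyad:/ # ifconfig en0 up                        # restore the interface after the test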
9.3.3 Testing a network failure
Now, the ifconfig en0 down command is issued on the riyad node. The lscluster command shows en0 in a DOWN state, and the resource group moves to the next available node in the chain as shown in Figure 9-4.
riyad:/ # lscluster -i |egrep "Interface|Node"
Network/Storage Interface Query
Node riyad
Node uuid = 2f1590d0-cc02-11df-bf20-a24e5fb40502
Interface number 1 en0
Interface state DOWN SOURCE SOFTWARE
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Node jeddah
Node uuid = 39710df0-cc04-11df-929f-a24e5f0d9e02
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Figure 9-4 The lscluster -i command after a network failure
The clRGinfo command shows that the myrg resource group moved to the jeddah node (Figure 9-5).
riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
myrg           OFFLINE                      riyad
               ONLINE                       jeddah
Figure 9-5 clRGinfo output during the network failure
You can also check the network down event in the /var/hacmp/adm/cluster.log file (Figure 9-6).
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START: network_down riyad net_ether_01
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: network_down riyad net_ether_01 0
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START: network_down_complete riyad net_ether_01
Oct 6 09:57:43 riyad user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: network_down_complete riyad net_ether_01 0
Figure 9-6 Network down event from the cluster.log file
You can also see this event by monitoring AHAFS events. You can monitor them by running the /usr/sbin/rsct/bin/ahafs_mon_multi command as shown in Figure 9-7.
jeddah:/ # /usr/sbin/rsct/bin/ahafs_mon_multi
=== write String : CHANGED=YES;CLUSTER=YES
=== files being monitored:
fd file
3 /aha/cluster/nodeState.monFactory/nodeStateEvent.mon
4 /aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
5 /aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
6 /aha/cluster/nodeList.monFactory/nodeListEvent.mon
7 /aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon
==================================
Loop 1: Event for /aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon has occurred.
BEGIN_EVENT_INFO TIME_tvsec=1286376025 TIME_tvnsec=623294923 SEQUENCE_NUM=0 RC_FROM_EVPROD=0 BEGIN_EVPROD_INFO EVENT_TYPE=ADAPTER_DOWN INTERFACE_NAME=en0 NODE_NUMBER=2 NODE_ID=0x2F1590D0CC0211DFBF20A24E5FB40502 CLUSTER_ID=0x93D8689AD0F211DFA49CA24E5F0D9E02 END_EVPROD_INFO END_EVENT_INFO ================================== Figure 9-7 Event monitoring from AHAFS With help from the caa_event, you can monitor the network failure event. You can see the CAA event by running the /usr/sbin/rsct/bin/caa_event -a command (Figure 9-8). # /usr/sbin/rsct/bin/caa_event -a EVENT: adapter liveness: event_type(0) node_number(2) node_id(0) sequence_number(0) reason_number(0) p_interface_name(en0) EVENT: adapter liveness: event_type(1) node_number(2) node_id(0) sequence_number(1) reason_number(0) p_interface_name(en0) Figure 9-8 Network failure in CAA event monitoring Chapter 9. Testing the PowerHA 7.1 cluster 285 In this test scenario, you can see that the non-IP based heartbeat channel is working. Compared to previous version, heartbeating is now performed by CAA. 9.4 Testing the rootvg system event This scenario tests the event monitoring capability of PowerHA 7.1 on the new rootvg system. Because events are now being monitored at the kernel level with CAA, you can monitor the loss of access to the rootvg volume group. 9.4.1 The rootvg system event As discussed previously, event monitoring is now at the kernel level. The /usr/lib/drivers/phakernmgr kernel extension, which is loaded by the clevmgrdES subsystem, monitors these events for loss of rootvg. It can initiate a node restart operation if enabled to do so as shown in Figure 9-9. PowerHA 7.1 has a new system event that is enabled by default. This new event allows for the monitoring of the loss of the rootvg volume group while the cluster node is up and running. Previous versions of PowerHA/HACMP were unable to monitor this type of loss. Also the cluster was unable to perform a failover action in the event of the loss of access to rootvg. An example is if you lose a SAN disk that is hosting the rootvg for this cluster node. The new option is available under the SMIT menu path smitty sysmirror Custom Cluster Configuration Events System Events. Figure 9-9 shows that the rootvg system event is defined and enabled by default in PowerHA 7.1. Change/Show Event Response Type or select values in entry fields. Press Enter AFTER making all desired changes. * Event Name * Response * Active [Entry Fields] ROOTVG Log event and reboot Yes + + + Figure 9-9 The rootvg system event The default event properties instruct the system to log an event and restart when a loss of rootvg occurs. This exact scenario is tested in the next section to demonstrate this concept. 9.4.2 Testing the loss of the rootvg volume group We simulate this test with a two-node cluster. The rootvg volume group is hosted on SAN storage through a Virtual I/O Server (VIOS). The test removes access to the rootvg file systems with the cluster node still up and running. This test can be done in several ways, from pulling the physical SAN connection to making the storage unavailable to the operating system. In this situation, the VSCSI resource is made unavailable on the VIOS. This scenario entails a two-node cluster with one resource group. The cluster is running on two nodes: sydney and perth. The rootvg volume group is hosted by the VIOS on a VSCSI disk. 286 IBM PowerHA SystemMirror 7.1 for AIX Cluster node status and mapping First, you check the VIOS for the client mapping. 
You can identify the client partition number by running the uname -L command on the cluster node. In this case, the client partition is 7. Next you run the lsmap -all command on the VIOS, as shown in Figure 9-10, and look up the client partition. Only one LUN is mapped through VIOS to the cluster node, because the shared storage is attached by using Fibre Channel (FC) adapters. lsmap -all SVSA Physloc Client Partition ID --------------- -------------------------------------------- -----------------vhost5 U9117.MMA.101F170-V1-C26 0x00000007 VTD Status LUN Backing device Physloc vtscsi13 Available 0x8100000000000000 lp5_rootvg Figure 9-10 VIOS output of the lsmap command showing the rootvg resource Check the node to ensure that you have the right disk as shown in Figure 9-11. sydney:/ # lscfg -l hdisk0 hdisk0 U9117.MMA.101F170-V7-C5-T1-L8100000000000000 Disk Drive sydney:/ # lspv hdisk0 caa_private0 hdisk2 hdisk3 00c1f170ff638163 00c0f6a0febff5d4 00c1f170674f3d6b 00c1f1706751bc0d rootvg caavg_private dbvg appvg Virtual SCSI active active Figure 9-11 PowerHA node showing the mapping of hdisk0 to the VIOS After the mapping is established, review the cluster status to ensure that the resource group is online as shown in Figure 9-12. sydney:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg ONLINE sydney OFFLINE perth sydney:/ # lssrc -ls clstrmgrES | grep "Current state" Current state: ST_STABLE Figure 9-12 Sydney cluster status Chapter 9. Testing the PowerHA 7.1 cluster 287 Testing by taking the rootvg volume group offline To perform the test, take the mapping offline on the VIOS by removing the virtual target device definition. You do this test while the PowerHA node is up and running as shown in Figure 9-13. $ rmvdev -vtd vtscsi13 $ lsmap -vadapter vhost5 SVSA Physloc Client Partition ID --------------- -------------------------------------------- -----------------vhost5 U9117.MMA.101F170-V1-C26 0x00000007 VTD NO VIRTUAL TARGET DEVICE FOUND Figure 9-13 VIOS: Taking the rootvg LUN offline You have now removed the virtual target device (VTD) mapping that maps the rootvg LUN to the client partition, which in this case, is the PowerHA node called sydney. You perform this operation while the node is up and running and hosting the resource group. This operation demonstrates what happens to the node when rootvg access has been lost. While checking the node, the node halted and failed the resource group over to the standby node perth (Figure 9-14). This behavior is new and expected in this situation. It is a result of the system event that monitors access to rootvg from the kernel. Checking perth shows that the failover happened. perth:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg OFFLINE sydney ONLINE perth Figure 9-14 Node status from the standby node showing that the node failed over 288 IBM PowerHA SystemMirror 7.1 for AIX 9.4.3 Loss of rootvg: What PowerHA logs To show that this event is recognized and that you took the correct action, check the system error report shown in Figure 9-15. 
LABEL:            KERNEL_PANIC
IDENTIFIER:       225E3B63

Date/Time:        Wed Oct 6 14:07:54 2010
Sequence Number:  2801
Machine Id:       00C1F1704C00
Node Id:          sydney
Class:            S
Type:             TEMP
WPAR:             Global
Resource Name:    PANIC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
ASSERT STRING

PANIC STRING
System Halt because of rootvg failure

Figure 9-15 System error report showing a rootvg failure

9.5 Simulation of a crash in the node with an active resource group

This section presents a scenario that simulates a node crash while the resource group is active. The scenario consists of a hot-standby cluster configuration with participating nodes seoul and busan, with only one Ethernet network and two Ethernet interfaces in each node. The halt -q command is triggered on the seoul node that is hosting the resource group. The result is that the resource group moved to the standby node as expected. Example 9-16 shows the relevant output that is written to the busan:/var/hacmp/adm/cluster.log file.

Example 9-16 Output of the resource move to the standby node
Sep 29 16:30:22 busan user:warn|warning cld[11599982]: Shutting down all services.
Sep 29 16:30:23 busan user:warn|warning cld[11599982]: Unmounting file systems.
Sep 29 16:30:28 busan daemon:err|error ConfigRM[10879056]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: a098bf90:::Details File: :::Location: RSCT,PeerDomain.C,1.99.1.519,17853:::CONFIGRM_PENDINGQUORUM_ER The operational quorum state of the active peer domain has changed to PENDING_QUORUM. This state usually indicates that exactly half of the nodes that are defined in the peer domain are online. In this state cluster resources cannot be recovered although none will be stopped explicitly.
Sep 29 16:30:28 busan local0:crit clstrmgrES[5701662]: Wed Sep 29 16:30:28 Removing 2 from ml_idx
Sep 29 16:30:28 busan user:notice PowerHA SystemMirror for AIX: EVENT START: node_down seoul
Sep 29 16:30:28 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down seoul 0
Sep 29 16:30:31 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_release busan 1
Sep 29 16:30:31 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move busan 1 RELEASE
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move busan 1 RELEASE 0
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_release busan 1 0
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence busan 1
Sep 29 16:30:33 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence busan 1 0
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence busan 1
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence busan 1 0
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_acquire busan 1
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move busan 1 ACQUIRE
Sep 29 16:30:36 busan user:notice PowerHA SystemMirror for AIX: EVENT START: acquire_takeover_addr
Sep 29 16:30:38 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: acquire_takeover_addr 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move busan 1 ACQUIRE 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_acquire busan 1 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_complete busan 1
Sep 29 16:30:46 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_complete busan 1 0

The cld messages are related to the solidDB. The cld subsystem determines whether the local node must become the primary or secondary solidDB server in a failover. Before the crash, solidDB was active on the seoul node as follows:

seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
  Group Leader: seoul, 0xdc82faf0908920dc, 2

As expected, after the crash, solidDB is active in the remaining busan node as follows:

busan:/ # lssrc -ls IBM.StorageRM | grep Leader
  Group Leader: busan, 0x564bc620973c9bdc, 1

With the absence of the seoul node, its interfaces are in STALE status as shown in Example 9-17.
Example 9-17 The lscluster -i command to check the status of the cluster busan:/ # lscluster -i Network/Storage Interface Query Cluster Name: korea Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103 Number of nodes reporting = 2 Number of nodes expected = 2 Node busan Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03 Number of interfaces discovered = 3 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = a2.4e.57.e1.8a.3 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state UP Number of regular addresses configured on interface = 2 IPV4 ADDRESS: 192.168.101.144 broadcast 192.168.103.255 255.255.255.0 290 IBM PowerHA SystemMirror 7.1 for AIX netmask IPV4 ADDRESS: 10.168.101.44 broadcast 10.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state UP RESTRICTED AIX_CONTROLLED Interface number 3 en2 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = a2.4e.57.e1.8a.7 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state UP Number of regular addresses configured on interface = 2 IPV4 ADDRESS: 192.168.201.144 broadcast 192.168.203.255 netmask 255.255.255.0 IPV4 ADDRESS: 10.168.101.143 broadcast 10.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask 0.0.0.0 Node seoul Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103 Number of interfaces discovered = 3 Interface number 1 en0 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = a2.4e.50.54.31.3 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state STALE Number of regular addresses configured on interface = 2 IPV4 ADDRESS: 192.168.101.143 broadcast 192.168.103.255 netmask 255.255.255.0 IPV4 ADDRESS: 10.168.101.43 broadcast 10.168.103.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 2 en2 Chapter 9. 
Testing the PowerHA 7.1 cluster 291 ifnet type = 6 ndd type = 7 Mac address length = 6 Mac address = a2.4e.50.54.31.7 Smoothed rrt across interface = 7 Mean Deviation in network rrt across interface = 3 Probe interval for interface = 100 ms ifnet flags for interface = 0x1e080863 ndd flags for interface = 0x21081b Interface state STALE Number of regular addresses configured on interface = 1 IPV4 ADDRESS: 192.168.201.143 broadcast 192.168.203.255 netmask 255.255.255.0 Number of cluster multicast addresses configured on interface = 1 IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask 0.0.0.0 Interface number 3 dpcom ifnet type = 0 ndd type = 305 Mac address length = 0 Mac address = 0.0.0.0.0.0 Smoothed rrt across interface = 750 Mean Deviation in network rrt across interface = 1500 Probe interval for interface = 22500 ms ifnet flags for interface = 0x0 ndd flags for interface = 0x9 Interface state STALE Results: The results were the same when issuing the halt command instead of the halt -q command. 9.6 Simulations of CPU starvation In previous versions of PowerHA, CPU starvation could activate the deadman switch, leading the starved node to a halt with a consequent move of the resource groups. In PowerHA 7.1, the deadman switch no longer exists, and its functionality is accomplished at the kernel interruption level. This test shows how the absence of the deadman switch can influence cluster behavior. Scenario 1 This scenario shows the use of a stress tool on the CPU of one node with more than 50 processes in the run queue and a duration of 60 seconds. Overview This scenario consists of a hot-standby cluster configuration with participating nodes seoul and busan with only one Ethernet network. Each node has two Ethernet interfaces. The resource group is hosted on seoul, and solidDB is active on the busan node. A tool is run to stress the seoul CPU with more than 50 processes in the run queue with a duration of 60 seconds as shown in Example 9-18 on page 293. 292 IBM PowerHA SystemMirror 7.1 for AIX Example 9-18 Scenario testing the use of a stress tool on one node seoul:/ # lssrc -ls IBM.StorageRM | grep Leader Group Leader: busan, 0x564bc620973c9bdc, 1 seoul:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------db2pok_Resourc ONLINE seoul OFFLINE busan Beneath the lpartstat output header, you see the CPU and memory configuration for each node: Seoul: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50 Busan: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50 Results Before the test, the seoul node is running within an average of 3% of its entitled capacity. The run queue is within an average of three processes as shown in Example 9-19. 
Example 9-19 The vmstat result of the seoul node seoul:/ # vmstat 2 System configuration: lcpu=2 mem=3584MB ent=0.50 kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------------------r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec 2 0 424045 10674 0 0 0 0 0 0 92 1508 359 1 2 97 0 0.02 3.4 3 0 424045 10674 0 0 0 0 0 0 84 1001 346 1 1 97 0 0.02 3.1 3 0 424044 10675 0 0 0 0 0 0 88 1003 354 1 1 97 0 0.02 3.1 3 0 424045 10674 0 0 0 0 0 0 91 1507 352 1 2 97 0 0.02 3.5 3 0 424047 10672 0 0 0 0 0 0 89 1057 370 1 2 97 0 0.02 3.3 3 0 424064 10655 0 0 0 0 0 0 94 1106 379 1 2 97 0 0.02 3.6 During the test, the entitled capacity raised to 200%, and the run queue raised to an average of 50 processes as shown in Example 9-20. Example 9-20 Checking the node status after running the stress test seoul:/ # vmstat 2 System configuration: lcpu=2 mem=3584MB ent=0.50 kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------------------r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec 52 0 405058 167390 0 0 0 0 0 0 108 988 397 42 8 50 0 0.25 50.6 41 0 405200 167248 0 0 0 0 0 0 78 140 245 99 0 0 0 0.79 158.1 49 0 405277 167167 0 0 0 0 0 0 71 206 249 99 0 0 0 1.00 199.9 50 0 405584 166860 0 0 0 0 0 0 73 33 241 99 0 0 0 1.00 199.9 48 0 405950 166491 0 0 0 0 0 0 71 297 244 99 0 0 0 1.00 199.8 As expected, the CPU starvation did not trigger a resource group move from the seoul node to the busan node. The /var/adm/ras/syslog.caa log file reported messages about solidDB daemons being unable to communicate, but the leader node continued to be the busan node as shown in Example 9-21 on page 294. Chapter 9. Testing the PowerHA 7.1 cluster 293 Example 9-21 Status of the nodes after triggering a CPU starvation scenario seoul:/ # lssrc -ls IBM.StorageRM | grep Leader Group Leader: busan, 0x564bc620973c9bdc, 1 seoul:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------db2pok_Resourc ONLINE seoul OFFLINE busan Scenario 2 This scenario shows the use of a stress tool on the CPU of two nodes with more than 50 processes in the run queue and a duration of 60 seconds. Overview This scenario consists of a hot-standby cluster configuration with participating nodes seoul and busan with only one Ethernet network. Each node has two Ethernet interfaces. Both the resource group and the solidDB are active in busan node. A tool is run to stress the CPU of both nodes with more than 50 processes in the run queue with a duration of 60 seconds as shown in Example 9-22. Example 9-22 Scenario testing the use of a stress tool on both nodes seoul:/ # lssrc -ls IBM.StorageRM | grep Leader Group Leader: busan, 0x564bc620973c9bdc, 1 seoul:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------db2pok_Resourc OFFLINE seoul ONLINE busan Results Before the test, both nodes have a low run queue and low entitled capacity as shown in Example 9-23. 
Example 9-23 Results of the stress test in scenario two seoul:/ # vmstat 2 System configuration: lcpu=2 mem=3584MB ent=0.50 kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------------------r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec 1 0 389401 181315 0 0 0 0 0 0 95 1651 302 2 2 97 0 0.02 3.5 1 0 389405 181311 0 0 0 0 0 0 91 960 316 1 2 97 0 0.02 3.3 1 0 389406 181310 0 0 0 0 0 0 88 953 299 1 1 97 0 0.02 3.1 1 0 389408 181308 0 0 0 0 0 0 97 1461 301 1 2 97 0 0.02 3.5 1 0 389411 181305 0 0 0 0 0 0 109 967 326 1 3 96 0 0.02 4.7 busan:/ # vmstat 2 System configuration: lcpu=2 mem=3584MB ent=0.50 kthr memory page faults 294 IBM PowerHA SystemMirror 7.1 for AIX cpu ----r b 1 0 1 0 1 0 1 0 1 0 ----------- -----------------------avm fre re pi po fr sr cy 450395 349994 0 0 0 0 0 450395 349994 0 0 0 0 0 450395 349994 0 0 0 0 0 450395 349994 0 0 0 0 0 450395 349994 0 0 0 0 0 ------------ ----------------------in sy cs us sy id wa pc ec 0 77 670 363 1 2 97 0 0.02 3.4 0 80 477 359 1 1 98 0 0.02 3.1 0 80 554 369 1 1 97 0 0.02 3.4 0 73 479 368 1 1 98 0 0.02 3.1 0 81 468 339 1 1 98 0 0.01 2.9 During the test, the seoul node kept an average of 50 processes in the run queue and an entitled capacity of 200% as shown in Example 9-24. Example 9-24 Seoul node vmstat results during the test seoul:/ # vmstat 2 System configuration: lcpu=2 mem=3584MB ent=0.50 kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------------------r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec 43 0 371178 199534 0 0 0 0 0 0 74 312 251 99 0 0 0 1.00 199.8 52 0 371178 199534 0 0 0 0 0 0 73 19 247 99 0 0 0 1.00 200.0 52 0 371176 199534 0 0 0 0 0 0 75 108 249 99 0 0 0 1.00 199.9 47 0 371075 199635 0 0 0 0 0 0 74 33 257 99 0 0 0 1.00 200.1 The busan node did not respond to the vmstat command during the test. When the CPU stress finished, it could throw just one line of output showing a run queue of 119 processes (Example 9-25). Example 9-25 Busan node showing only one line of output busan:/ # vmstat 2 System configuration: lcpu=2 mem=3584MB ent=0.50 kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------------------119 0 450463 349911 0 0 0 0 0 0 56 19 234 99 0 0 0 0.50 99.6 Both the resource group and solidDB database did not move from the busan node as shown in Example 9-26. Example 9-26 Status of the busan node seoul:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------db2pok_Resourc OFFLINE seoul ONLINE busan seoul:/ # lssrc -ls IBM.StorageRM | grep Leader Group Leader: busan, 0x564bc620973c9bdc, 1 Conclusion The conclusion of this test is that eventual peak performance degradation events do not cause resource group moves and unnecessary outages. Chapter 9. Testing the PowerHA 7.1 cluster 295 9.7 Simulation of a Group Services failure This scenario consists of a hot-standby cluster configuration with participating nodes seoul and busan with only one Ethernet network. Each node has two Ethernet interfaces. We end the cthags process in the seoul node that was hosting the resource group. As a result, the seoul node halted as expected, and the resource group is acquired by the remaining node as shown in Example 9-27. Example 9-27 Resource group movement seoul:/ # lssrc -ls cthags Subsystem Group PID Status cthags cthags 5963978 active 5 locally-connected clients. 
Their PIDs: 6095070(IBM.ConfigRMd) 6357196(rmcd) 5963828(IBM.StorageRMd) 7471354(clstrmgr) 12910678(gsclvmd) HA Group Services domain information: Domain established by node 2 Number of groups known locally: 8 Number of Number of local Group name providers providers/subscribers rmc_peers 2 1 0 s00O3RA00009G0000015CDBQGFL 2 1 0 IBM.ConfigRM 2 1 0 IBM.StorageRM.v1 2 1 0 CLRESMGRD_1108531106 2 1 0 CLRESMGRDNPD_1108531106 2 1 0 CLSTRMGR_1108531106 2 1 0 d00O3RA00009G0000015CDBQGFL 2 1 0 Critical clients will be terminated if unresponsive seoul:/ # ps -ef | grep cthagsd | grep -v grep root 5963978 3866784 4 17:02:33 - 0:00 /usr/sbin/rsct/bin/hagsd cthags seoul:/ # kill -9 5963978 The seoul:/var/adm/ras/syslog.caa log file recorded the messages before the crash. You can observe that the seoul node was halted after 1 second as shown in Example 9-28. Example 9-28 Message in the syslog.caa file in the seoul node Sep 29 17:02:33 seoul daemon:err|error RMCdaemon[6357196]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6XqlQl0dZucA/POE1DK4e.1...................:::Reference ID: :::Template ID: b1731da3:::Details File: :::Location: RSCT,rmcd_gsi.c,1.50,10 48 :::RMCD_2610_101_ER Internal error. Error data 1 00000001 Error data 2 00000000 Error data 3 dispatch_gs Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: Called, state=ST_STABLE, provider token 1 Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: GsToken 3, AdapterToken 4, rm_GsToken 1 Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: GRPSVCS announcment code=512; exiting Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (cthags) Sep 29 17:02:33 seoul daemon:err|error ConfigRM[6095070]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: 362b0a5f:::Details File: :::Location: RSCT,PeerDomain.C,1.99.1.519,21079:::CONFIGRM_EXIT_GS_ER The peer domain configuration manager daemon 296 IBM PowerHA SystemMirror 7.1 for AIX (IBM.ConfigRMd) is exiting due to the Group Services subsystem terminating. The configuration manager daemon will restart automatically, synchronize the nodes configuration with the domain and rejoin the domain if possible. Sep 29 17:02:34 seoul daemon:notice StorageRM[5963828]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: a8576c0d:::Details File: :::Location: RSCT,StorageRMDaemon.C,1.56,323 :::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped. Sep 29 17:02:34 seoul daemon:notice ConfigRM[6095070]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: de84c4db:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.55,346 :::CONFIGRM_STARTED_STIBM.ConfigRM daemon has started. Sep 29 17:02:34 seoul daemon:notice snmpd[3342454]: NOTICE: lost peer (SMUX ::1+51812+5) Sep 29 17:02:34 seoul daemon:notice RMCdaemon[15663146]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6eKora0eZucA/Xuo/D K4e.1...................:::Reference ID: :::Template ID: a6df45aa:::Details File: :::Location: RSCT,rmcd.c,1.75,225:::RMCD_INFO_0_ST The daemon is started. Sep 29 17:02:34 seoul user:notice PowerHA SystemMirror for AIX: clexit.rc : Unexpected termination of clstrmgrES Sep 29 17:02:34 seoul user:notice PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!! 
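After such a failure, the same commands used throughout this chapter can be scripted to confirm the outcome from the surviving node. The following ksh sketch is illustrative only; it assumes the node names and log path of this test environment (busan and /var/hacmp/adm/cluster.log) and that the PowerHA utilities, such as clRGinfo, are in the PATH as in the examples above.

#!/bin/ksh
# Post-failure check, run on the surviving node (busan in this test).
# Reports the resource group state, the Group Services daemon status,
# and the most recent node_down events recorded in cluster.log.

print "== Resource group state =="
clRGinfo

print "== Group Services (cthags) status =="
lssrc -ls cthags | head -6

print "== Recent node_down events =="
grep "node_down" /var/hacmp/adm/cluster.log | tail -4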
9.8 Testing a Start After resource group dependency This test uses the example that was configured in 5.1.6, “Configuring Start After and Stop After resource group dependencies” on page 96. Figure 9-16 shows a summary of the configuration. The dependency configuration of the Start After resource group is tested to see whether it works as expected. Figure 9-16 Start After dependency between the apprg and dbrg resource group Chapter 9. Testing the PowerHA 7.1 cluster 297 9.8.1 Testing the standard configuration of a Start After resource group dependency Example 9-29 shows the state of a resource group pair after a normal startup of the cluster on both nodes. Example 9-29 clRGinfo for a Start After resource group pair sydney:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg ONLINE sydney OFFLINE perth apprg ONLINE perth OFFLINE sydney With both resource groups online, the source (dependent) apprg resource group can be brought offline and then online again. Alternatively, it can be gracefully moved to another node without any influence on the target dbrg resource group. With both resource groups online, the source (dependent) apprg resource group can be brought offline and then online again. Alternatively, it can be gracefully moved to another node without any influence on the target dbrg resource group. Target resource group can also be brought offline. However, to bring the source resource group online, the target resource group must be brought online manually (if it is offline). If you start the cluster only on the home node of the source resource group, the apprg resource group in this case, the cluster waits until the dbrg resource group is brought online as shown in Example 9-30. The startup policy is Online On Home Node Only for both resource groups. Example 9-30 Offline because the target is offline from clRGinfo sydney:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg OFFLINE sydney OFFLINE perth apprg OFFLINE due to target offlin perth OFFLINE sydney 9.8.2 Testing application startup with Startup Monitoring configured For this test, both resource groups are started on the same node. This way their application scripts logs messages in the same file so that you can see the detailed sequence of their start and finish moments. The home node is temporarily modified to sydney for both resource groups. Then the cluster is started only on the sydney node with both resource groups. 298 IBM PowerHA SystemMirror 7.1 for AIX Example 9-31 shows the start, stop, and monitoring scripts. Note the syslog configuration that was made to log the messages through the local7 facility in the /var/hacmp/log/StartAfter_cluster.log file. Example 9-31 Dummy start, stop, and monitoring scripts sydney:/HA71 # ls app_mon.sh app_stop.sh app_start.sh db_mon.sh db_start.sh db_stop.sh sydney:/HA71 # cat db_start.sh #!/bin/ksh fp="local7.info" file="`expr "//$0" : '.*/\([^/]*\)'`" # cleanup if [ -f /dbmp/db.lck ]; then rm /dbmp/db.lck; fi logger -t"$file" -p$fp "Starting up DB... " sleep 50 echo "DB started at:\n\t`date`">/dbmp/db.lck logger -t"$file" -p$fp "DB is running!" exit 0 sydney:/HA71 # cat db_stop.sh #!/bin/ksh fp="local7.info" file="`expr "//$0" : '.*/\([^/]*\)'`" logger -t"$file" -p$fp "Shutting down DB... 
" sleep 20 # cleanup if [ -f /dbmp/db.lck ]; then rm /dbmp/db.lck; fi logger -t"$file" -p$fp "DB stopped!" exit 0 sydney:/HA71 # cat db_mon.sh #!/bin/ksh fp="local7.info" file="`expr "//$0" : '.*/\([^/]*\)'`" if [ -f /dbmp/db.lck ]; then logger -t"$file" -p$fp "DB is running!" exit 0 fi logger -t"$file" -p$fp "DB is NOT running!" exit 1 sydney:/HA71 # cat app_start.sh #!/bin/ksh fp="local7.info" file="`expr "//$0" : '.*/\([^/]*\)'`" # cleanup if [ -f /appmp/app.lck ]; then rm /appmp/app.lck; fi logger -t"$file" -p$fp "Starting up APP... " sleep 10 Chapter 9. Testing the PowerHA 7.1 cluster 299 echo "APP started at:\n\t`date`">/appmp/app.lck logger -t"$file" -p$fp "APP is running!" exit 0 sydney:/HA71 # cat app_stop.sh #!/bin/ksh fp="local7.info" file="`expr "//$0" : '.*/\([^/]*\)'`" logger -t"$file" -p$fp "Shutting down APP... " sleep 2 # cleanup if [ -f /appmp/app.lck ]; then rm /appmp/app.lck; fi logger -t"$file" -p$fp "APP stopped!" exit 0 sydney:/HA71 # cat app_mon.sh #!/bin/ksh fp="local7.info" file="`expr "//$0" : '.*/\([^/]*\)'`" if [ -f /appmp/app.lck ]; then logger -t"$file" -p$fp "APP is running!" exit 0 fi logger -t"$file" -p$fp "APP is NOT running!" exit 1 sydney:/HA71 # grep local7 /etc/syslog.conf local7.info /var/hacmp/log/StartAfter_cluster.log rotate size 256k files 4 Without Startup Monitoring, the APP startup script is launched before the DB startup script returns as shown Example 9-32. Example 9-32 Startup sequence without Startup monitoring mode ... Oct Oct Oct Oct Oct Oct Oct Oct ... 300 12 12 12 12 12 12 12 12 07:53:26 07:53:27 07:53:36 07:53:37 07:53:47 07:53:53 07:54:17 07:54:23 sydney sydney sydney sydney sydney sydney sydney sydney local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info IBM PowerHA SystemMirror 7.1 for AIX db_mon.sh: DB is NOT running! db_start.sh: Starting up DB... app_mon.sh: APP is NOT running! app_start.sh: Starting up APP... app_start.sh: APP is running! app_mon.sh: APP is running! db_start.sh: DB is running! app_mon.sh: APP is running! With Startup Monitoring, the APP startup script is launched after the DB startup script returns, as shown in Example 9-33, and as expected. Example 9-33 Startup sequence with Startup Monitoring ... Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct ... 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 08:02:38 08:02:39 08:02:39 08:02:45 08:02:51 08:02:57 08:03:03 08:03:09 08:03:15 08:03:21 08:03:27 08:03:29 08:03:33 08:03:49 08:03:50 08:04:00 sydney sydney sydney sydney sydney sydney sydney sydney sydney sydney sydney sydney sydney sydney sydney sydney local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info local7:info db_mon.sh: DB is NOT running! db_start.sh: Starting up DB... db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_mon.sh: DB is NOT running! db_start.sh: DB is running! db_mon.sh: DB is running! app_mon.sh: APP is NOT running! app_start.sh: Starting up APP... app_start.sh: APP is running! Example 9-34 shows the state change of the resource groups during this startup. 
Example 9-34 Resource group state during startup sydney:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg OFFLINE sydney OFFLINE perth apprg OFFLINE sydney OFFLINE perth sydney:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg ACQUIRING sydney OFFLINE perth apprg TEMPORARY ERROR sydney OFFLINE perth sydney:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg ONLINE sydney OFFLINE perth apprg ACQUIRING OFFLINE sydney perth Chapter 9. Testing the PowerHA 7.1 cluster 301 sydney:/ # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------dbrg ONLINE sydney OFFLINE perth apprg ONLINE sydney OFFLINE perth 9.9 Testing dynamic node priority This test has the algeria, brazil, and usa nodes, and one resource group in the cluster as shown in Figure 9-17. This resource group is configured to fail over based on a script return value. The DNP.sh script returns different values for each node. For details about configuring the dynamic node priority (DNP), see 5.1.8, “Configuring the dynamic node priority (adaptive failover)” on page 102. Figure 9-17 Dynamic node priority test environment Table 9-1 provides the cluster details. Table 9-1 Cluster details 302 Field Value Resource name algeria_rg Participating nodes algeria, brazil, usa Dynamic node priority policy cl_lowest_nonzero_udscript_rc IBM PowerHA SystemMirror 7.1 for AIX Field Value DNP script path /usr/IBM/HTTPServer/bin/DNP.sh DNP script timeout value 20 The default node priority is algeria first, then brazil, and then usa. The usa node gets the lowest return value from DNP.sh. When a resource group failover is triggered, the algeria_rg resource group is moved to the usa node, because the return value is the lowest one as shown in Example 9-35. Example 9-35 Expected return value for each nodes usa:/ # clcmd cat /usr/IBM/HTTPServer/bin/DNP.sh ------------------------------NODE usa ------------------------------exit 100 ------------------------------NODE brazil ------------------------------exit 105 ------------------------------NODE algeria ------------------------------exit 103 When the resource group fails over, algeria_rg moves from the algeria node to the usa node, which has the lowest return value in DNP.sh as shown in Figure 9-18. # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------algeria_rg ONLINE algeria OFFLINE brazil OFFLINE usa # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------algeria_rg OFFLINE algeria OFFLINE brazil ONLINE usa Figure 9-18 clRGinfo of before and after takeover Chapter 9. Testing the PowerHA 7.1 cluster 303 Then the DNP.sh script is modified to set brazil with the lowest return value as shown in Example 9-36. 
Example 9-36 Changing the DNP.sh file usa:/ # clcmd cat /usr/IBM/HTTPServer/bin/DNP.sh ------------------------------NODE usa ------------------------------exit 100 ------------------------------NODE brazil ------------------------------exit 101 ------------------------------NODE algeria ------------------------------exit 103 Upon resource group failover, the resource group moves to brazil as long as it has the lowest return value among the cluster nodes this time as shown in Figure 9-19. # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------algeria_rg OFFLINE algeria OFFLINE brazil ONLINE usa # clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------algeria_rg OFFLINE algeria ONLINE brazil OFFLINE usa Figure 9-19 Resource group moving To simplify the test scenario, DNP.sh is defined to simply return a value. In a real situation, you can replace this DNP.sh sample file with any customized script. Then, node failover is done based upon the return value of your own script. 304 IBM PowerHA SystemMirror 7.1 for AIX 10 Chapter 10. Troubleshooting PowerHA 7.1 This chapter shares the experiences of the writers of this IBM Redbooks publication and the lessons learned in all the phases of implementing PowerHA 7.1 to help you troubleshoot your migration, installation, configuration, and Cluster Aware AIX (CAA). This chapter includes the following topics: Locating the log files Troubleshooting the migration Troubleshooting the installation and configuration Troubleshooting problems with CAA © Copyright IBM Corp. 2011. All rights reserved. 305 10.1 Locating the log files This section explains where you can find the various log files in your PowerHA cluster to assist in managing problems with CAA and PowerHA. 10.1.1 CAA log files You can check the CAA clutils log file and the syslog file for error messages as explained in the following sections. The clutils file If you experience a problem with an operation, such as creating a cluster in CAA, check the /var/hacmp/log/clutils.log log file. The syslog facility The CAA service uses the syslog facility to log errors and debugging information. All CAA messages are written to the /var/adm/ras/syslog.caa file. For verbose logging information, you must enable debug mode by editing the /etc/syslog.conf configuration file and adding the following line as shown in Figure 10-1: *.debug /tmp/syslog.out rotate size 10m files 10 local0.crit /dev/console local0.info /var/hacmp/adm/cluster.log user.notice /var/hacmp/adm/cluster.log daemon.notice /var/hacmp/adm/cluster.log *.info /var/adm/ras/syslog.caa rotate size 1m files 10 *.debug /tmp/syslog.out rotate size 10m files 10 Figure 10-1 Extract from the /etc/syslog.conf file After you make this change, verify that a syslog.out file is in the in /tmp directory. If this file is not in the directory, create one by entering the touch /tmp/syslog.out command. After you create the file, refresh the syslog daemon by issuing the refresh -s syslogd command. When debug mode is enabled, you capture detailed debugging information in the /tmp/syslog.out file. This information can assist you in troubleshooting problems with commands, such as the mkcluster command during cluster migration. 
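The steps above can be gathered into a small ksh helper. The following sketch is only an example, assuming the same debug destination (/tmp/syslog.out) that is used in Figure 10-1; adjust the rule format to match your own /etc/syslog.conf conventions.

#!/bin/ksh
# Enable verbose syslog debug logging for CAA troubleshooting.
CONF=/etc/syslog.conf
OUT=/tmp/syslog.out

# Add the *.debug rule only if it is not already present
grep "^\*.debug" $CONF | grep -q "$OUT" || \
    print "*.debug\t\t$OUT rotate size 10m files 10" >> $CONF

# syslogd writes only to files that already exist
[ -f $OUT ] || touch $OUT

# Refresh the syslog daemon so that the new rule takes effect
refresh -s syslogd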
10.1.2 PowerHA log files The following PowerHA log files are most commonly used: /var/hacmp/adm/cluster.log One of the main sources of information for the administrator. This file tracks time-stamped messages of all PowerHA events, scripts, and daemons. /var/hacmp/log/hacmp.out Along with cluster.log file, this file is the most important source of information. Recent PowerHA releases are sending more details to this log file, including summaries of events and the location of resource groups. 306 IBM PowerHA SystemMirror 7.1 for AIX /var/log/clcomd/clcomd.log Includes information about communication that is exchanged among all the cluster nodes. Increasing the verbose logging level You can increase the verbose logging level in PowerHA by enabling the export VERBOSE_LOGGING=high setting. This setting enables a high level of logging for PowerHA. The result is that you see more information in the log files when this variable is exported in such logs as the hacmp.out and clmigcheck.log files. Listing the PowerHA log files by using the clmgr utility One of the common ways to have a list of all PowerHA log files is to use the clmgr command-line utility. First you run the clmgr view log command to access a list of the available logs as shown in Example 10-1. Then you run the clmgr view log logname command replacing logname with the log that you want to analyze. Example 10-1 Generating a list of PowerHA log files with the clmgr utility seoul:/ # clmgr view log ERROR: """" does not appear to exist! Available Logs: autoverify.log cl2siteconfig_assist.log cl_testtool.log clavan.log clcomd.log clcomddiag.log clconfigassist.log clinfo.log clstrmgr.debug clstrmgr.debug.long cluster.log cluster.mmddyyyy clutils.log clverify.log cspoc.log cspoc.log.long cspoc.log.remote dhcpsa.log dnssa.log domino_server.log emuhacmp.out hacmp.out ihssa.log migration.log sa.log sax.log seoul:/ # clmgr view log cspoc.log | more Warning: no options were provided for log "cspoc.log". Defaulting to the last 500 lines. 09/21/10 10:23:09 seoul: success: clresactive -v datavg 09/21/10 10:23:10 seoul: success: /usr/es/sbin/cluster/cspoc/clshowfs2 datavg 09/21/10 10:23:29 [========== C_SPOC COMMAND LINE ==========] Chapter 10. Troubleshooting PowerHA 7.1 307 09/21/10 10:23:29 /usr/es/sbin/cluster/sbin/cl_chfs -cspoc -nseoul,busan -FM -a size=+896 -A no /database/logdir 09/21/10 10:23:29 busan: success: clresactive -v datavg 09/21/10 10:23:29 seoul: success: clresactive -v datavg 09/21/10 10:23:30 seoul: success: eval LC_ALL=C lspv 09/21/10 10:23:35 seoul: success: chfs -A no -a size="+1835008" /database/logdir 09/21/10 10:23:36 seoul: success: odmget -q 'attribute = label and value = /database/logdir' CuAt 09/21/10 10:23:37 busan: success: eval varyonvg -n -c -A datavg ; imfs -lx lvdata09 ; imfs -l lvdata09; varyonvg -n -c -P datavg. 10.2 Troubleshooting the migration This section offers a collection of problems and solutions that you might encounter when migration testing. The information is based on the experience of the writers of this Redbooks publication. 10.2.1 The clmigcheck script The clmigcheck script writes all activity to the /tmp/clmigcheck.log file (Figure 10-2). Therefore, you must first look in this file for an error message if you run into any problems with the clmigcheck utility. mk_cluster: ERROR: Problems encountered creating the cluster in AIX. Use the syslog facility to see output from the mkcluster command. 
Error termination on: Wed Sep 22 15:47:43 EDT 2010 Figure 10-2 Output from the clmigcheck.log file 10.2.2 The ‘Cluster still stuck in migration’ condition When migration is completed, you might not progress to the update of the Object Data Manager (ODM) entries until the node_up event is run on the last node of the cluster. If you have this problem, start the node to see if this action completes the migration protocol and updates the version numbers correctly. For PowerHA 7.1, the version number must be 12 in the HACMPcluster class. You can verify this number by running odmget as shown in example 7-51. If the version number is less than 12, you are still stuck in migration and must call IBM support. 10.2.3 Existing non-IP networks The following section provides details about problems with existing non-IP networks that are not removed. It describes a possible workaround to remove disk heartbeat networks if they were not deleted as part of the migration process. 308 IBM PowerHA SystemMirror 7.1 for AIX After the migration, the output of the cltopinfo command might still show the disk heartbeat network as shown in Example 10-2. Example 10-2 The cltopinfo command with the disk heartbeat still being displayed berlin:/ # cltopinfo Cluster Name: de_cluster Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: caa_private0 Cluster IP Address: There are 2 node(s) and 3 network(s) defined NODE berlin: Network net_diskhb_01 berlin_hdisk1_01 /dev/hdisk1 Network net_ether_01 berlin 192.168.101.141 Network net_ether_010 alleman 10.168.101.142 german 10.168.101.141 berlinb1 192.168.200.141 berlinb2 192.168.220.141 NODE munich: Network net_diskhb_01 munich_hdisk1_01 /dev/hdisk1 Network net_ether_01 munich 192.168.101.142 Network net_ether_010 alleman 10.168.101.142 german 10.168.101.141 munichb1 192.168.200.142 munichb2 192.168.220.142 Resource Group http_rg Startup Policy Online On Home Node Only Fallover Policy Fallover To Next Priority Node In The List Fallback Policy Never Fallback Participating Nodes munich berlin Service IP Label alleman Resource Group nfs_rg Startup Policy Online On Home Node Only Fallover Policy Fallover To Next Priority Node In The List Fallback Policy Fallback To Higher Priority Node In The List Participating Nodes berlin munich Service IP Label german Chapter 10. Troubleshooting PowerHA 7.1 309 To remove the disk heartbeat network, follow these steps: 1. Stop PowerHA on all cluster nodes. You must perform this action because the removal does not work in a running cluster. Figure 10-3 shows the error message that is received when trying to remove the network in an active cluster. COMMAND STATUS Command: failed stdout: yes stderr: no Before command completion, additional instructions may appear below. cldare: Migration from PowerHA SystemMirror to PowerHA SystemMirror/ES detected. A DARE event cannot be run until the migration has completed. F1=Help F8=Image n=Find Next F2=Refresh F9=Shell F3=Cancel F10=Exit Figure 10-3 Cluster synchronization error message 310 IBM PowerHA SystemMirror 7.1 for AIX F6=Command /=Find 2. Remove the network: a. Follow the path smitty sysmirror Cluster Nodes and Networks Manage Networks and Network Interfaces Networks Remove a Network. b. On the SMIT panel, similar to the one shown in Figure 10-4, select the disk heartbeat network that you want to remove. 
You might have to repeat these steps if you have more than one disk heartbeat network. Networks Move cursor to desired item and press Enter. Add a Network Change/Show a Network Remove a Network +--------------------------------------------------------------------------+ | Select a Network to Remove | | | | Move cursor to desired item and press Enter. | | | | net_diskhb_01 | | net_ether_01 (192.168.100.0/22) | | net_ether_010 (10.168.101.0/24 192.168.200.0/24 192.168.220.0/24) | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | F1| /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 10-4 Removing the disk heartbeat network 3. Synchronize your cluster by selecting the path: smitty sysmirror Custom Cluster Configuration Verify and Synchronize Cluster Configuration (Advanced). 4. See if the network is deleted by using the cltopinfo command as shown in Example 10-3. Example 10-3 Output of the cltopinfo command after removing the disk heartbeat network berlin:/ # cltopinfo Cluster Name: de_cluster Cluster Connection Authentication Mode: Standard Cluster Message Authentication Mode: None Cluster Message Encryption: None Use Persistent Labels for Communication: No Repository Disk: caa_private0 Cluster IP Address: There are 2 node(s) and 2 network(s) defined NODE berlin: Network net_ether_01 berlin 192.168.101.141 Network net_ether_010 german 10.168.101.141 alleman 10.168.101.142 berlinb1 192.168.200.141 Chapter 10. Troubleshooting PowerHA 7.1 311 berlinb2 192.168.220.141 NODE munich: Network net_ether_01 munich 192.168.101.142 Network net_ether_010 german 10.168.101.141 alleman 10.168.101.142 munichb1 192.168.200.142 munichb2 192.168.220.142 Resource Group http_rg Startup Policy Online On Home Node Only Fallover Policy Fallover To Next Priority Node In The List Fallback Policy Never Fallback Participating Nodes munich berlin Service IP Label alleman Resource Group nfs_rg Startup Policy Online On Home Node Only Fallover Policy Fallover To Next Priority Node In The List Fallback Policy Fallback To Higher Priority Node In The List Participating Nodes berlin munich Service IP Label german berlin:/ # 5. Start PowerHA on all your cluster nodes by running the smitty cl_start command. 10.3 Troubleshooting the installation and configuration This section explains how you can recover from various installation and configuration problems on CAA and PowerHA. 10.3.1 The clstat and cldump utilities and the SNMP After installing and configuring PowerHA 7.1 in AIX 7.1, the clstat and cldump utilities do not work. If you experience this problem, convert the SNMP from version 3 to version 1. See Example 10-4 for all the steps to correct this problem. Example 10-4 The clstat utility not working under SNMP V3 seoul:/ # clstat -a Failed retrieving cluster information. There are a number of possible causes: clinfoES or snmpd subsystems are not active. snmp is unresponsive. snmp is not configured correctly. Cluster services are not active on any nodes. Refer to the HACMP Administration Guide for more information. seoul:/ # stopsrc -s snmpd 0513-044 The snmpd Subsystem was requested to stop. 
seoul:/ # ls -ld /usr/sbin/snmpd 312 IBM PowerHA SystemMirror 7.1 for AIX lrwxrwxrwx snmpdv3ne 1 root system 9 Sep 15 22:17 /usr/sbin/snmpd -> seoul:/ # /usr/sbin/snmpv3_ssw -1 Stop daemon: snmpmibd In /etc/rc.tcpip file, comment out the line that contains: snmpmibd In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2 Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1 Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne Start daemon: dpid2 seoul:/ # ls -ld /usr/sbin/snmpd lrwxrwxrwx 1 root system /usr/sbin/snmpdv1 17 Sep 20 09:49 /usr/sbin/snmpd -> seoul:/ # startsrc -s snmpd 0513-059 The snmpd Subsystem has been started. Subsystem PID is 8126570. 10.3.2 The /var/log/clcomd/clcomd.log file and the security keys You might find that you cannot start the clcomd daemon and its log file has messages indicating problems with the security keys as shown in Example 10-5. Example 10-5 The clcomd daemon indicating problems with the security keys 2010-09-23T00:02:07.983104: WARNING: Cannot read the key /etc/security/cluster/key_md5_des 2010-09-23T00:02:07.985975: WARNING: Cannot read the key /etc/security/cluster/key_md5_3des 2010-09-23T00:02:07.986082: WARNING: Cannot read the key /etc/security/cluster/key_md5_aes This problem means that the /etc/cluster/rhosts file is not completed correctly. On all cluster nodes, edit this file by using the IP addresses as the communication paths during cluster definition, before the first synchronization. Use the host name as the persistent address and the communication path. Then add the persistent addresses to the /etc/cluster/rhosts file. Finally, issue the startsrc -s clcomd command. 10.3.3 The ECM volume group When creating an ECM volume group by using the PowerHA C-SPOC menus, the administrator receives the message shown in Example 10-6 about the inability to create the group. Example 10-6 Error messages when trying to create an ECM volume group using C-SPOC seoul: 0516-1335 mkvg: This system does not support enhanced concurrent capable seoul: volume groups. seoul: 0516-862 mkvg: Unable to create volume group. seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more information cl_mkvg: An error occurred executing mkvg appvg on node seoul In /var/hacmp/log/cspoc.log, the messages are: Chapter 10. Troubleshooting PowerHA 7.1 313 09/14/10 17:41:40 [========== C_SPOC COMMAND LINE ==========] 09/14/10 17:41:40 /usr/es/sbin/cluster/sbin/cl_mkvg -f -n -B -cspoc -nseoul,busan -rdatarg -y datavg -s32 -V100 -lfalse E 00c0f6a0107734ea 00c0f6a010773532 00c0f6a0fed38de6 00c0f6a0fed3d324 00c0f6a0fed3ef8f 09/14/10 17:41:40 busan: success: clresactive -v datavg 09/14/10 17:41:40 seoul: success: clresactive -v datavg 09/14/10 17:41:41 cl_mkvg: cl_mkvg: An error occurred executing mkvg datavg on node seoul 09/14/10 17:41:41 seoul: FAILED: mkvg -f -n -B -y datavg -s 32 -V 100 -C cldisk4 cldisk3 cldisk1 cldisk2 cldisk5 09/14/10 17:41:41 seoul: 0516-1335 mkvg: This system does not support enhanced concurrent capable 09/14/10 17:41:41 seoul: volume groups. 09/14/10 17:41:41 seoul: 0516-862 mkvg: Unable to create volume group. 09/14/10 17:41:41 seoul: RETURN_CODE=1 09/14/10 17:41:41 seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more information 09/14/10 17:41:42 seoul: success: cl_vg_fence_init datavg rw cldisk4 cldisk3 cldisk1 cldisk2 cldisk5 In this case, install the bos.clvm.enh file set and any fixes for this file set for the system to stay in a consistent version state. 
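You can avoid this error by verifying the file set before using C-SPOC. The following check is a simple illustration; it uses the clcmd distributed command shown elsewhere in this book to query every cluster node, and the lslpp return code to flag a missing file set on the local node.

#!/bin/ksh
# Confirm that enhanced concurrent LVM support is installed
# before creating an ECM volume group with C-SPOC.

# Query all cluster nodes (clcmd is provided by CAA)
clcmd lslpp -L bos.clvm.enh

# On the local node, a non-zero return code means the file set is missing
if ! lslpp -L bos.clvm.enh >/dev/null 2>&1
then
    print "bos.clvm.enh is not installed on $(hostname)."
    print "Install it, and any fixes for it, before creating ECM volume groups."
fi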
10.3.4 Communication path If your cluster node communication path is misconfigured, you might see an error message similar to the one shown in Figure 10-5. ------------[ PowerHA SystemMirror Migration Check ]------------ERROR: Communications Path for node brazil must be set to hostname Hit <Enter> to continue ERROR: Figure 10-5 clmigcheck error for communication path 314 IBM PowerHA SystemMirror 7.1 for AIX If you see an error for communication path while running the clmigcheck program, verify that the /etc/hosts file includes the communication path for the cluster. Also check the communication path in the HACMPnode ODM class as shown in Figure 10-6. algeria:/ # odmget HACMPnode | grep -p COMMUNICATION HACMPnode: name = "algeria" object = "COMMUNICATION_PATH" value = "algeria" node_id = 1 node_handle = 1 version = 12 HACMPnode: name = "brazil" object = "COMMUNICATION_PATH" value = "brazil" node_id = 3 node_handle = 3 version = 12 Figure 10-6 Communication path definition at HACMPnode.odm Because the clmigcheck program is a ksh script, certain profiles can cause a similar problem. If the problem persists after you correct the /etc/hosts configuration file, try to remove the contents of the kshrc file because it might be affecting the behavior of the clmigcheck program. If your /etc/cluster/rhosts program is not configured properly, you see an error message similar to the one shown in Figure 10-7. The /etc/cluster/rhosts file must contain the fully qualified domain name of each node in the cluster (that is, the output from the host name command). After changing the /etc/cluster/rhosts file, run the stopsrc and startsrc commands on the clcomd subsystem. brazil:/ # clmigcheck lslpp: Fileset hageo* not installed. rshexec: cannot connect to node algeria ERROR: Internode communication failed, check the clcomd.log file for more information. brazil:/ # clrsh algeria date connect: Connection refused rshexec: cannot open socket Figure 10-7 The clcomd error message You can also check clcomd communication by using the clrsh command as shown in Figure 10-8. algeria:/ # clrsh algeria date Mon Sep 27 11:14:12 EDT 2010 algeria:/ # clrsh brazil date Mon Sep 27 11:14:15 EDT 2010 Figure 10-8 Checking the clcomd connection Chapter 10. Troubleshooting PowerHA 7.1 315 10.4 Troubleshooting problems with CAA In this chapter, we discuss various problems that you could encounter on configuration or installation of CAA, and provide recovery steps. 10.4.1 Previously used repository disk for CAA When defining a PowerHA cluster, you must define a disk to use as the repository for the CAA. If the specified disk was used previously as a repository by another cluster, upon synchronizing the cluster, you receive a message in the /var/adm/ras/syslog.caa file (or another file defined in /etc/syslog.conf). Example 10-7 shows the message that you receive. Example 10-7 CAA error message in the /var/adm/ras/syslog.caa file Sep 16 08:58:14 seoul user:err|error syslog: validate_device: Specified device, hdisk1, is a repository. Sep 16 08:58:14 seoul user:warn|warning syslog: To force cleanup of this disk, use rmcluster -r hdisk1 Example 10-8 shows the exact error message saved in the smit.log file. Example 10-8 CAA errors in the smit.log file ERROR: Problems encountered creating the cluster in AIX. Use the syslog facility to see output from the mkcluster command. ERROR: Creating the cluster in AIX failed. Check output for errors in local cluster configuration, correct them, and try synchronization again. 
The message includes the solution as shown in Example 10-7. You run the rmcluster command as shown in Example 10-9 to remove all CAA structures from the specified disk. Example 10-9 Removing CAA structures from a disk seoul:/ # rmcluster -r hdisk1 This operation will scrub hdisk1, removing any volume groups and clearing cluster identifiers. If another cluster is using this disk, that cluster will be destroyed. Are you sure? (y/[n]) y remove_cluster_repository: Couldn't get cluster repos lock. remove_cluster_repository: Force continue. After you issue the rmcluster command, the administrator can synchronize the cluster again. Tip: After running the rmcluster command, verify that the caa_private0 disk has been unconfigured and is not seen on other nodes. Run the lqueryvg -Atp command against the repository disk to ensure that the volume group definition is removed from the disk. If you encounter problems with the rmcluster command, see “Removal of the volume group when the rmcluster command does not” on page 320 for information about how to manually remove the volume group. 316 IBM PowerHA SystemMirror 7.1 for AIX 10.4.2 Repository disk replacement The information to replace a repository disk is currently only available in the /usr/es/sbin/cluster/README7.1.0.UPDATE file. However, the following information has been provided to assist you in solving this problem: 1. If necessary, add a new disk and ensure that it is recognized by AIX. The maximum size required is 10 GB. The disk must be zoned and masked to all cluster nodes. 2. Identify the current repository disk. You can use any of the following commands to obtain this information: lspv | grep caa_private cltopinfo lscluster -d 3. Stop cluster services on all nodes. Either bring resource groups offline or place them in an unmanaged state. 4. Remove the CAA cluster by using the following command: rmcluster -fn clustername 5. Verify that the AIX cluster is removed by running the following command in each node: lscluster -m 6. If the CAA cluster is still present, run the following command in each node: clusterconf -fu 7. Verify that the cluster repository is removed by using the lspv command. The repository disk (see step 2) must not belong to any volume group. 8. Define a new repository disk by following the path: smitty sysmirror Cluster Nodes and Networks Initial Cluster Setup (Typical) Define Repository Disk and Cluster IP Address. 9. Verify and synchronize the PowerHA cluster: #smitty cm_ver_and_sync 10.Verify that the AIX cluster is recreated by running the following command: #lscluster -m 11.Verify that the repository disk has changed by running the following command: lspv | grep caa_private 12.Start cluster services on all nodes: smitty cl_start 10.4.3 CAA cluster after the node restarts In some cases, the CAA cluster disappears after a system reboot or halt. If you encounter this situation, try the following solutions: Wait 10 minutes. If you have another node in your cluster, the clconfd daemon checks for nodes that need to join or sync up. It wakes up every 10 minutes. If the previous method does not work, run the clusterconf command manually. This solution works only if the system is aware of the repository disk location. You can check it by running the lsattr -El cluster0 command. Chapter 10. Troubleshooting PowerHA 7.1 317 See if clvdisk contains the repository disk UUID. Otherwise, you see the clusterconf error message as shown in Example 10-10. 
Example 10-10 The clusterconf error message riyad:/ # clusterconf -v _find_and_load_repos(): No repository candidate found. leave_sinc: Could not get cluster disk names from cache file /etc/cluster/clrepos_cache: No such file or directory leave_sinc: Could not find cluster disk names. Manually define the repository disk by using the following command: clusterconf -vr caa_private0 If you know that the repository disk is available, and you know that your node is listed in the configuration on the repository disk, use the -s flag on the clusterconf command to do a search for it. This utility examines all locally visible hard disk drives to find the repository disk. 10.4.4 Creation of the CAA cluster You might encounter an error message about creating the CAA cluster when the clmigcheck utility is run. You might also see such a message when trying to install PowerHA for the first time or when creating a CAA cluster configuration. Depending on whether you are doing a migration or a new configuration, you either see a problem in the clmigcheck.log file or on the verification of your cluster. One of the error messages that you see is “ERROR: Problems encountered creating the cluster in AIX.” This message indicates a problem with creating the CAA cluster. The clmigcheck program calls the mkcluster command to create the CAA cluster, which is what you must look for in the logs. To proceed with the troubleshooting, enable the syslog debugging as discussed in 10.2.1, “The clmigcheck script” on page 308. Incorrect entries in the /etc/filesystems file When the CAA cluster is created, the cluster creates a caavg_private volume group and the associated file systems for CAA. This information is kept in the /var/adm/ras/syslog.caa log file. Any problems that you face when running the mkcluster command are also logged in the /var/hacmp/clutils.log file. If you encounter a problem when creating your cluster, check these log files to ensure that the volume group and file systems are created without any errors. 318 IBM PowerHA SystemMirror 7.1 for AIX Figure 10-9 shows the contents of caavg_private volume group. # lsvg -l caavg_private caavg_private: LV NAME TYPE caalv_private1 boot caalv_private2 boot caalv_private3 boot fslv00 jfs2 fslv01 jfs2 powerha_crlv boot LPs 1 1 4 4 4 1 PPs 1 1 4 4 4 1 PVs 1 1 1 1 1 1 LV STATE closed/syncd closed/syncd open/syncd open/syncd closed/syncd closed/syncd MOUNT POINT N/A N/A N/A /clrepos_private1 /clrepos_private2 N/A Figure 10-9 Contents of the caavg_private volume group Figure 10-10 shows a crfs failure while creating the CAA cluster. This problem was corrected by removing incorrect entries in the /etc/filesystems file. Likewise, problems can happen when you already have the same logical volume name that must be used by the CAA cluster, for example. 
Sep 29 15:50:49 riyad user:info cluster[9437258]: stdout: caalv_private3
Sep 29 15:50:49 riyad user:info cluster[9437258]: stderr:
Sep 29 15:50:49 riyad user:info cluster[9437258]: cl_run_log_method: '/usr/lib/cluster/clreposfs ' returned 1
Sep 29 15:50:49 riyad user:info cluster[9437258]: stdout:
Sep 29 15:50:49 riyad user:info cluster[9437258]: stderr: crfs: /clrepos_private1 file system already exists
'/usr/sbin/crfs -v jfs2 -m /clrepos_private1 -g caavg_private -a options=dio -a logname=INLINE -a size=256M' failed with rc=1
Sep 29 15:50:49 riyad user:err|error cluster[9437258]: cluster_repository_init: create_clv failed
Sep 29 15:50:49 riyad user:info cluster[9437258]: cl_run_log_method: '/usr/sbin/varyonvg -b -u caavg_private' returned 0
Figure 10-10 The syslog.caa entries after a failure during CAA creation

Tip: When you look at the syslog.caa file, focus on the AIX commands (such as mkvg, mklv, and crfs) and their returned values. If you find non-zero return values, a problem exists.

Chapter 10. Troubleshooting PowerHA 7.1 319

10.4.5 Volume group name already in use
A volume group name that is already in use can cause the error message discussed in 10.4.4, “Creation of the CAA cluster” on page 318. When you encounter the error message, enable syslog debugging. The /tmp/syslog.out log file has the entries shown in Figure 10-11.

Sep 23 11:46:09 chile user:info cluster[21037156]: cl_run_log_method: '/usr/sbin/mkvg -f -y caavg_private -s 64 caa_private0' returned 1
Sep 23 11:46:09 chile user:info cluster[21037156]: stdout:
Sep 23 11:46:09 chile user:info cluster[21037156]: stderr: 0516-360 /usr/sbin/mkvg: The device name is already used; choose a different name.
Sep 23 11:46:09 chile user:err|error cluster[21037156]: cluster_repository_init: create_cvg failed
Figure 10-11 Extract from the syslog.out file

You can see that the volume group creation failed because the name is already in use. This problem can happen for several reasons. For example, it can occur if the disk was previously used as the CAA repository or if the disk has the volume group descriptor area (VGDA) information of another volume group on it.

Disk previously used by CAA volume group or third party
If the disk was previously used by CAA or AIX, you can recover from this situation by running the following command:
rmcluster -r hdiskx
For the full sequence of steps, see 10.4.1, “Previously used repository disk for CAA” on page 316. If you find that the rmcluster command has not removed your CAA definition from the disk, use the steps in the following section, “Removal of the volume group when the rmcluster command does not.”

Removal of the volume group when the rmcluster command does not
In this situation, you must use the Logical Volume Manager (LVM) commands, which you can do in one of two ways. The easiest method is to import the volume group, vary on the volume group, and then reduce it so that the VGDA is removed from the disk. If this method does not work, use the dd command to overwrite special areas of the disk.

Tip: Make sure that the data contained on the disk is not needed because the following steps destroy the volume group data on the disk.

Removing the VGDA from the disk
This method involves importing the volume group from the disk and then reducing the disk from the volume group to remove the VGDA information without losing the PVID.
If you are able to import the volume group, activate it by using the varyonvg command:
# varyonvg vgname
If the activation fails, run the exportvg command to remove the volume group definition from the ODM. Then try to import it with a different name as follows:
# exportvg vgname
# importvg -y new-vgname hdiskx

320 IBM PowerHA SystemMirror 7.1 for AIX

If you cannot activate the imported volume group, use the reducevg command as shown in Figure 10-12.

reducevg -df test_vg caa_private0
0516-1246 rmlv: If caalv_private1 is the boot logical volume, please run 'chpv -c <diskname>' as root user to clear the boot record and avoid a potential boot off an old boot image that may reside on the disk from which this logical volume is moved/removed.
rmlv: Logical volume caalv_private1 is removed.
0516-1246 rmlv: If caalv_private2 is the boot logical volume, please run 'chpv -c <diskname>' as root user to clear the boot record and avoid a potential boot off an old boot image that may reside on the disk from which this logical volume is moved/removed.
rmlv: Logical volume caalv_private2 is removed.
Figure 10-12 The reducevg command

After you complete the forced reduction, check whether the disk no longer contains a volume group by using the lqueryvg -Atp hdisk command. Also verify whether any previous volume group definition is still being displayed on the other nodes of your cluster by using the lspv command. If the lspv output shows the PVID with an associated volume group, you can fix it by running the exportvg vgname command. If you experience any problems with this procedure, try a force overwrite of the disk as described in “Overwriting the disk.”

Overwriting the disk
This method involves writing data to the top of the disk to overwrite the VGDA information, effectively cleaning the disk and leaving it ready for use by other volume groups.

Attention: Only attempt this method if the rmcluster and reducevg procedures fail and if AIX still has access to the disk. You can check this access by running the lquerypv -h /dev/hdisk command.

Enter the following command:
# dd if=/dev/zero of=/dev/hdiskx bs=4 count=1
This command zeros only the part of the disk that contains the repository offset. Therefore, you do not lose the PVID information.

In some cases, this procedure is not sufficient to resolve the problem. If you need to completely overwrite the disk, run the following procedure:

Attention: This procedure overwrites the entire disk structure including the PVID. You must follow the steps as shown to change the PVID if required during migration.

# dd if=/dev/zero of=/dev/hdiskn bs=512 count=9
# chdev -l hdiskn -a pv=yes
# rmdev -dl hdiskn
# cfgmgr

Chapter 10. Troubleshooting PowerHA 7.1 321

On any other node in the cluster, you must also update the disk:
# rmdev -dl hdiskn
# cfgmgr
Run the lspv command to check that the PVID is the same on both nodes. To ensure that you have the real PVID, query the disk as follows:
# lquerypv -h /dev/hdiskn
Look for the PVID, which is at offset 0x80 as shown in Figure 10-13.
chile:/ # lquerypv -h /dev/hdisk3
00000000   C9C2D4C1 00000000 00000000 00000000  |................|
00000010   00000000 00000000 00000000 00000000  |................|
00000020   00000000 00000000 00000000 00000000  |................|
00000030   00000000 00000000 00000000 00000000  |................|
00000040   00000000 00000000 00000000 00000000  |................|
00000050   00000000 00000000 00000000 00000000  |................|
00000060   00000000 00000000 00000000 00000000  |................|
00000070   00000000 00000000 00000000 00000000  |................|
00000080   000FE401 68921CEA 00000000 00000000  |....h...........|
Figure 10-13 PVID from the lquerypv command

The PVID should match the lspv output as shown in Figure 10-14.

chile:/ # lspv
hdisk1   000fe4114cf8d1ce   None
hdisk2   000fe40163c54011   None
hdisk3   000fe40168921cea   None
hdisk4   000fe4114cf8d3a1   None
hdisk5   000fe4114cf8d441   None
hdisk6   000fe4114cf8d4d5   None
hdisk7   000fe4114cf8d579   None
hdisk8   000fe4114cf8d608   ny_datavg
hdisk0   000fe40140a5516a   rootvg   active
Figure 10-14 The lspv output showing the PVID

10.4.6 Changed PVID of the repository disk
Your repository disk PVID might have changed because of a dd on the whole disk or a change in the logical unit number (LUN). If this change happened and you must complete the migration, follow the guidance in this section to change it.

322 IBM PowerHA SystemMirror 7.1 for AIX

If you are in a migration that has not yet been completed, change the PVID section in the /var/clmigcheck/clmigcheck.txt file (Figure 10-15). You must change this file on every node in your cluster.

CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 10-15 Changing the PVID in the clmigcheck.txt file

If this is post migration and PowerHA is installed, you must also modify the HACMPsircol ODM class (Figure 10-16) on all nodes in the cluster.

HACMPsircol:
        name = "newyork_sircol"
        id = 0
        uuid = "0"
        repository = "000fe4114cf8d258"
        ip_address = ""
        nodelist = "serbia,scotland,chile,"
        backup_repository1 = ""
        backup_repository2 = ""
Figure 10-16 The HACMPsircol ODM class

To modify the HACMPsircol ODM class, enter the following commands:
# odmget HACMPsircol > HACMPsircol.add
# vi HACMPsircol.add
Change the repository = "000fe4114cf8d258" line to your new PVID and save the file. Then replace the ODM entry as follows:
# odmdelete -o HACMPsircol
# odmadd HACMPsircol.add

10.4.7 The ‘Cluster services are not active’ message
After migration of PowerHA, if you notice that CAA cluster services are not running, you see the “Cluster services are not active” message when you run the lscluster command. You also notice that the CAA repository disk is not varied on. You might be able to recover by recreating the CAA cluster from the last CAA configuration (HACMPsircol class in ODM) as explained in the following steps:
1. Clear the CAA repository disk as explained in “Previously used repository disk for CAA” on page 316.
2. Perform a synchronization or verification of the cluster. Upon synchronizing the cluster, the mkcluster command is run to recreate the CAA cluster.
However, if the problem still persists, contact IBM support.

Chapter 10. Troubleshooting PowerHA 7.1 323

324 IBM PowerHA SystemMirror 7.1 for AIX

Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in

This chapter explains how to install IBM Systems Director Version 6.2. It also explains how to install the PowerHA SystemMirror plug-in for the IBM Systems Director, and the necessary agents on the client machines to be managed by Systems Director.
For detailed planning, prerequisites, and instructions, see Implementing IBM Systems Director 6.1, SG24-7694. This chapter includes the following topics: Installing IBM Systems Director Version 6.2 Installing the SystemMirror plug-in Installing the clients © Copyright IBM Corp. 2011. All rights reserved. 325 11.1 Installing IBM Systems Director Version 6.2 Before you configure the cluster using the SystemMirror plug-in, you must install and configure the IBM Systems Director. You can install the IBM Systems Director Server on AIX, Linux, or Windows operating system. For quick reference, this section provides the installation steps on AIX. See the information in the following topics in the IBM Systems Director Information Center for details about installation on other operating systems: The “IBM Systems Director V6.2.x” topic for general information http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib m.director.main.helps.doc/fqm0_main.html “Installing IBM Systems Director on the management server” topic for installation information http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib m.director.install.helps.doc/fqm0_t_installing.html The following section, “Hardware requirements”, explains the installation requirements of IBM Systems Director v6.2 on AIX. 11.1.1 Hardware requirements See the “Hardware requirements for running IBM Systems Director Server” topic in the IBM Systems Director Information Center for details about the recommended hardware requirements for installing IBM Systems Director: http://publib.boulder.ibm.com/infocenter/director/v6r2x/topic/com.ibm.director.pla n.helps.doc/fqm0_r_hardware_requirements_for_running_ibm_systems_director_server.h tml Table 11-1 lists the hardware requirements for IBM Systems Director Server running on AIX for a small configuration that has less than 500 managed systems. Table 11-1 Hardware requirements for IBM Systems Director Server on AIX 326 Resource Requirement CPU Two processors, IBM POWER5, POWER6 or POWER7®, or for partitioned systems: Entitlement = 1 Uncapped Virtual processors = 4 Weight = Default Memory 3 GB Disk storage 4 GB File system requirement (during installation) root = 1.2 GB /tmp = 2 GB /opt = 4 GB IBM PowerHA SystemMirror 7.1 for AIX More information: Disk storage requirements for running the IBM Systems Director Server are used by the /opt file system. Therefore, a total of 4 GB is required for the /opt file system while installing IBM Systems Director and during run time. 
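Before you start the installation, you can quickly compare a target AIX system against the values in Table 11-1 by using standard AIX commands. The following sequence is only a convenience sketch (it is not part of the official installation procedure), and the values in the notes are taken from Table 11-1:

# lsdev -Cc processor | grep -c Available
(Counts the available processors; Table 11-1 calls for two processors.)
# lsattr -El sys0 -a realmem
(Shows the real memory in KB; 3 GB corresponds to 3145728 KB.)
# df -g / /tmp /opt
(Shows the free space in GB for the root, /tmp, and /opt file systems listed in Table 11-1.)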
For more details about hardware requirements, see the “Recommended hardware requirements for IBM Systems Director Server running on AIX” topic in the IBM Systems Director Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ibm.director.plan.helps.doc/fqm0_r_hardware_requirements_servers_running_aix.html

11.1.2 Installing IBM Systems Director on AIX
For the prerequisites and complete steps for installing IBM Systems Director, see the following topics in the IBM Systems Director Information Center:
“Preparing to install IBM Systems Director Server on AIX”
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ibm.director.install.helps.doc/fqm0_t_preparing_to_install_ibm_director_on_aix.html
“Installing IBM Systems Director Server on AIX,” which provides the complete installation steps
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ibm.director.install.helps.doc/fqm0_t_installing_ibm_director_server_on_aix.html

The following steps summarize the process for installing IBM Systems Director on AIX:
1. Increase the file size limit:
ulimit -f 4194302 (or to unlimited)
2. Increase the number of file descriptors:
ulimit -n 4000
3. Verify the file system (/, /tmp, and /opt) sizes as mentioned in Table 11-1 on page 326:
df -g / /tmp /opt
4. Download IBM Systems Director from the IBM Systems Director Downloads page at:
http://www.ibm.com/systems/management/director/downloads/
5. Extract the content:
gzip -cd <package_name> | tar -xvf -
where <package_name> is the file name of the download package.
6. Install the content by using the script in the extracted package:
./dirinstall.server

Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in 327

11.1.3 Configuring and activating IBM Systems Director
To configure and activate IBM Systems Director, follow these steps:
1. Configure IBM Systems Director by using the following script:
/opt/ibm/director/bin/configAgtMgr.sh
Agent password: The script prompts for an agent password, for which you can consider giving the host system root password or any other common password of your choice. This password is used by IBM Systems Director for its internal communication and does not have any external impact.
2. Start IBM Systems Director:
/opt/ibm/director/bin/smstart
3. Monitor the activation process as shown in Figure 11-1. This process might take 2-3 minutes.
/opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active
Figure 11-1 Activation status for IBM Systems Director
Some subsystems are added as part of the installation process as follows:
Subsystem         Group   PID       Status
platform_agent            2752614   active
cimsys                    3080288   active
Some processes start automatically:
root 6553804 7995522 0 13:24:40 pts/0 0:00 /opt/ibm/director/jre/bin/java -Xverify:none -cp /opt/ibm/director/lwi/r
root 7340264 1 0 13:19:26 pts/2 3:14 /opt/ibm/director/jre/bin/java -Xms512m -Xmx2048m -Xdump:system:events=g
root 7471292 2949286 0 12:00:31 - 0:00 /opt/freeware/cimom/pegasus/bin/cimssys platform_agent
root 7536744 1 0 12:00:31 - 0:00 /opt/ibm/icc/cimom/bin/dirsnmpd
root 8061058 3604568 0 13:16:32 - 0:14 /var/opt/tivoli/ep/_jvm/jre/bin/java -Xmx384m -Xminf0.01 -Xmaxf0.4 -Dsun
4. Log in to IBM Systems Director by using the following address:
https://<hostname.domain.com or IP>:8422/ibm/console/logon.jsp
In this example, we use the following address:
https://indus74.in.ibm.com:8422/ibm/console/logon.jsp
5.
On the welcome page (Figure 12-4 on page 335) that opens, log in using root credentials. After completing the installation of IBM Systems Director, install the SystemMirror plug-in as explained in the following section. 328 IBM PowerHA SystemMirror 7.1 for AIX 11.2 Installing the SystemMirror plug-in The IBM Systems Director provides two sets of plug-ins: The SystemMirror server plug-in to be installed in the IBM Systems Director Server. The SystemMirror agent plug-in to be installed in the cluster nodes or the endpoints as discovered by IBM Systems Director. 11.2.1 Installing the SystemMirror server plug-in You must install the SystemMirror server plug-in in the IBM Systems Director Server. Table 11-2 on page 329 outlines the installation steps for the SystemMirror server plug-in depending on your operating system. You can find this table and more information about the installation in the SystemMirror installation steps chapter in “Configuring AIX Clusters for High Availability Using PowerHA SystemMirror for Systems Director,” which you can download from: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774 Table 11-2 Installation steps for the SystemMirror server plug-in Operating system Installation steps AIX and Linux Graphical installation: # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin Textual Installation: # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin -i console Silent mode installation Edit the installer.properties file. # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin -i silent Windows Graphical installation: IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe Textual installation: IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe -i console Silent installation: First, edit the installer.properties file. IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe -i silent export DISPLAY: export DISPLAY =<ip address of X Windows Server>:1 is required to export the display of the server running the X Window System server to use the graphical installation. Verifying the installation of the SystemMirror plug-in The interface plug-in of the subagent is loaded when the IBM System Director Server starts. To check the installation, run the following command depending on your environment: AIX / Linux: /opt/ibm/director/lwi/bin/lwiplugin.sh -status | grep mirror Windows: C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in 329 Figure 11-2 shows the output of the plug-in status. 94:RESOLVED:com.ibm.director.power.ha.systemmirror.branding:7.1.0.1:com.ibm.director.power.ha.systemmirr or.branding 95:ACTIVE:com.ibm.director.power.ha.systemmirror.common:7.1.0.1:com.ibm.director.power.ha.systemmirror.c ommon 96:ACTIVE:com.ibm.director.power.ha.systemmirror.console:7.1.0.1:com.ibm.director.power.ha.systemmirror. 
console 97:RESOLVED:com.ibm.director.power.ha.systemmirror.helps.doc:7.1.0.1:com.ibm.director.power.ha.systemmir ror.helps.doc 98:INSTALLED:com.ibm.director.power.ha.systemmirror.server.fragment:7.1.0.0:com.ibm.director.power.ha.sy stemmirror.server.fragment 99:ACTIVE:com.ibm.director.power.ha.systemmirror.server:7.1.0.1:com.ibm.director.power.ha.systemmirror.s erver Figure 11-2 Output of the plug-in status command If the subagent interface plug-in shows the RESOLVED status instead of the ACTIVE status, attempt to start the subagent. Enter the following commands by using the lwiplugin.sh script on AIX and Linux or the lwiplugin.bat script on Windows and the plug-in number (which is 94): AIX and Linux /opt/ibm/director/agent/bin/lwiplugin.sh -start 94 Windows C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat -start 94 If Systems Director was active during installation of the plug-in, you must stop it and restart it as follows: 1. Stop the IBM Systems Director Server: # /opt/ibm/director/bin/smstop 2. Start the IBM Systems Director Server: # /opt/ibm/director/bin/smstart 3. Monitor the startup process: # /opt/ibm/director/bin/smstatus -r Inactive Starting Active *** (the "Active" status can take a long time) 11.2.2 Installing the SystemMirror agent plug-in in the cluster nodes Install the cluster.es.director.agent file set by using SMIT. This file set is provided with the base PowerHA SystemMirror installable images. More information: See the SystemMirror agent installation section in Configuring AIX Clusters for High Availability Using PowerHA SystemMirror for Systems Director paper at: http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774 See also PowerHA SystemMirror for IBM Systems Director, SC23-6763. 330 IBM PowerHA SystemMirror 7.1 for AIX 11.3 Installing the clients You must perform the steps in the following sections in each node that is going to be managed by the PowerHA SystemMirror plug-in for IBM Systems Director. This topic includes the following sections: Installing the common agent Installing the PowerHA SystemMirror agent 11.3.1 Installing the common agent Perform these steps on each node that is going to be managed by the IBM Systems Director Server: 1. Extract the SysDir6_2_Common_Agent_AIX.jar file set: # /usr/java5/bin/jar -xvf SysDir6_2_Common_Agent_AIX.jar 2. Give execution permission to the repository/dir6.2_common_agent_aix.sh file: # chmod +x repository/dir6.2_common_agent_aix.sh 3. Execute the repository/dir6.2_common_agent_aix.sh file: # ./repository/dir6.2_common_agent_aix.sh Some subsystems are added as part of the installation process: platform_agent cimsys 3211374 2621604 active active Some process start automatically: root 421934 1 0 15:55:30 - 0:00 /opt/ibm/icc/cimom/bin/dirsnmpd root 442376 1 0 15:55:40 - 0:00 /usr/bin/cimlistener root 458910 1 0 15:55:31 - 0:00 /opt/freeware/cimom/pegasus/bin/CIM_diagd root 516216 204950 0 15:55:29 - 0:00 /opt/freeware/cimom/pegasus/bin/cimssys platform_agent root 524366 1 0 15:55:29 - 0:00 ./slp_srvreg -D root 581780 1 0 15:55:37 - 0:04 [cimserve] root 626740 204950 0 15:55:29 - 0:00 /opt/freeware/cimom/pegasus/bin/cimssys cimsys root 630862 1 0 15:55:29 - 0:00 /opt/ibm/director/cimom/bin/tier1slp Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in 331 11.3.2 Installing the PowerHA SystemMirror agent To install the PowerHA SystemMirror agent on the nodes, follow these steps: 1. Install the cluster.es.director.agent.rte file set: # smitty install_latest 2. 
Stop the common agent: # stopsrc -s platform_agent # stopsrc -s cimsys 3. Start the common agent: # startsrc -s platform_agent Tip: The cimsys subsystem starts along with the platform_agent subsystem. 332 IBM PowerHA SystemMirror 7.1 for AIX 12 Chapter 12. Creating and managing a cluster using IBM Systems Director The SystemMirror plug-in provided by IBM Systems Director is used to configure and manage the PowerHA cluster. This plug-in provides a state-of-the-art interface and a command-line interface (CLI) for cluster configuration. It includes wizards to help you create and manage the cluster and the resource groups. The plug-in also helps in seamless integration of Smart Assists and third-party application support. This chapter explains how to create and manage the PowerHA SystemMirror cluster with IBM Systems Director. This chapter includes the following topics: Creating a cluster with the SystemMirror plug-in wizard Creating a cluster with the SystemMirror plug-in CLI Performing cluster management Performing cluster management with the SystemMirror plug-in CLI Creating a resource group with the SystemMirror plug-in GUI wizard Resource group management using the SystemMirror plug-in wizard Managing a resource group with the SystemMirror plug-in CLI Verifying and synchronizing a configuration with the GUI Verifying and synchronizing with the CLI Performing cluster monitoring with the SystemMirror plug-in © Copyright IBM Corp. 2011. All rights reserved. 333 12.1 Creating a cluster You can create a cluster by using the wizard for the SystemMirror plug-in or by using the CLI commands for the SystemMirror plug-in. This topic explains how to use both methods. 12.1.1 Creating a cluster with the SystemMirror plug-in wizard To create the cluster by using the GUI wizard of the SystemMirror plug-in, follow these steps. 1. Go to your IBM Systems Director server. 2. On the login page (Figure 12-1), log in to IBM Systems Director with your user ID and password. Figure 12-1 Systems Director login console 3. In the IBM Systems Director console, in the left navigation pane, expand Availability and select PowerHA SystemMirror (Figure 12-2). Figure 12-2 Selecting the PowerHA SystemMirror link in IBM Systems Director 334 IBM PowerHA SystemMirror 7.1 for AIX 4. In the right pane, under Cluster Management, click Create Cluster (Figure 12-3). Figure 12-3 The Create Cluster link under Cluster Management 5. Starting with the Create Cluster Wizard, follow the wizard panes to create the cluster. a. In the Welcome pane (Figure 12-4), click Next. Figure 12-4 Create Cluster Wizard Chapter 12. Creating and managing a cluster using IBM Systems Director 335 b. In the Name the cluster pane (Figure 12-5), in the Cluster name field, provide a name for the cluster. Click Next. Figure 12-5 Entering the cluster name c. In the Choose nodes pane (Figure 12-6), select the host names of the nodes. Figure 12-6 Selecting the cluster nodes 336 IBM PowerHA SystemMirror 7.1 for AIX Common storage: The cluster nodes must have the common storage for the repository disk. To verify the common storage, in the Choose nodes window, click the Common storage button. The Common storage window (Figure 12-7) opens showing the common disks. Figure 12-7 Verifying common storage availability for the repository disk d. In the Configure nodes pane (Figure 12-8), set the controlling node. The controlling node in the cluster is considered to be the primary or home node. Click Next. Figure 12-8 Setting the controlling node Chapter 12. 
Creating and managing a cluster using IBM Systems Director 337 e. In the Choose repositories pane (Figure 12-9), choose the storage disk that is shared among all nodes in the cluster to use as the common storage repository. Click Next. Figure 12-9 Selecting the repository disk f. In the Configure security pane (Figure 12-10), specify the security details to secure communication within the cluster. Figure 12-10 Configuring the cluster security configuration 338 IBM PowerHA SystemMirror 7.1 for AIX g. In the Summary pane (Figure 12-11), verify the configuration details. Figure 12-11 Summary pane 6. Verify the cluster creation in the AIX cluster nodes by using either of the following commands: – The CAA command: /usr/sbin/lscluster -m – The PowerHA command: /usr/es/sbin/cluster/utilities/cltopinfo 12.1.2 Creating a cluster with the SystemMirror plug-in CLI IBM Systems Director provides a CLI to monitor and manage the system. This section explains how to create a cluster by using the SystemMirror plug-in CLI. Chapter 12. Creating and managing a cluster using IBM Systems Director 339 Overview of the CLI The CLI is executed by using a general-purpose smcli command. To list the available CLI commands for managing the cluster, run the smcli lsbundle command as shown in Figure 12-12. # smcli lsbundle | grep sysmirror sysmirror/help sysmirror/lsac sysmirror/lsam sysmirror/lsappctl sysmirror/lsappmon sysmirror/lscl sysmirror/lscluster sysmirror/lsdependency sysmirror/lsdp sysmirror/lsfc sysmirror/lsfilecollection sysmirror/lsif sysmirror/lsinterface sysmirror/lslg sysmirror/lslog sysmirror/lsmd sysmirror/lsmethod ..... ..... Figure 12-12 CLI commands specific to SystemMirror You can retrieve help information for the commands (Figure 12-12) as shown in Figure 12-13. # smcli lscluster --help smcli sysmirror/lscluster {-h|-?|--help} \ [-v|--verbose] smcli sysmirror/lscluster [-v|--verbose] \ [<CLUSTER>[,<CLUSTER#2>,...]] Command Alias: lscl Figure 12-13 CLI help option Creating a cluster with the CLI Before you create a cluster, ensure that you have all the required details to create the cluster: Cluster nodes Persistent IP (if any) Repository disk Controlling node Security options (if any) To verify the availability of the mkcluster command, you can use the smcli lsbundle command in IBM Systems Director as shown in Figure 12-12. 340 IBM PowerHA SystemMirror 7.1 for AIX To create a cluster, issue the smcli mkcluster command from the IBM Systems Director Server as shown in Example 12-1. Example 12-1 Creating a cluster with the smcli mkcluster CLI command smcli mkcluster -i 224.0.0.0 \ -r hdisk3 \ –n nodeA.xy.ibm.com,nodeB.xy.ibm.com \ DB2_Cluster You can use the -h option to list the commands that are available (Figure 12-14). # smcli mkcluster -h smcli sysmirror/mkcluster {-h|-?|--help} [-v|--verbose] smcli sysmirror/mkcluster [{-i|--cluster_ip} <multicast_address>] \ [{-S|--fc_sync_interval} <##>] \ [{-s|--rg_settling_time} <##>] \ [{-e|--max_event_time} <##>] \ [{-R|--max_rg_processing_time} <##>] \ [{-c|--controlling_node} <node>] \ [{-d|--shared_disks} <DISK>[,<DISK#2>,...] ] \ {-r|--repository} <disk> \ {-n|--nodes} <NODE>[, <NODE#2>,...] \ [<cluster_name>] Figure 12-14 The mkcluster -h command to list the available commands To verify that the cluster has been created, you can use the smcli lscluster command. 
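For example, after creating the cluster in Example 12-1, a quick verification might look like the following sequence. This is only an illustrative sketch: the cluster name DB2_Cluster is the one used in Example 12-1, and the -v flag follows the lscluster syntax shown in Figure 12-13.

# smcli lscluster -v DB2_Cluster
(Run on the IBM Systems Director Server to display the cluster definition.)
# /usr/sbin/lscluster -m
(Run on an AIX cluster node to confirm that the CAA cluster exists.)
# /usr/es/sbin/cluster/utilities/cltopinfo
(Run on an AIX cluster node to confirm the PowerHA topology.)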
Command help: To assistance with using the commands, you can use either of the following help options: smcli <command name> -help --verbose smcli <command name> -h -v 12.2 Performing cluster management You can perform cluster management by using the GUI wizard for the SystemMirror plug-in or by using the CLI commands for the SystemMirror plug-in. This topic explains how to use both methods. 12.2.1 Performing cluster management with the SystemMirror plug-in GUI wizard IBM Systems Director provides GUI wizards to manage the network, storage, and snapshots of a cluster. IBM Systems Director also provides functionalities to add nodes, view cluster services status changes, review reports, and verify and synchronize operations. The following sections guide you through these functionalities. Chapter 12. Creating and managing a cluster using IBM Systems Director 341 Accessing the Cluster Management Wizard To access the Cluster Management Wizard, follow these steps: 1. In the IBM Systems Director console, expand Availability and select PowerHA SystemMirror (Figure 12-3 on page 335). 2. In the right pane, under Cluster Management, click the Manage Clusters link (Figure 12-15). Figure 12-15 Manage cluster 342 IBM PowerHA SystemMirror 7.1 for AIX Cluster management functionality This section describes the cluster management functionality: Cluster Management window (Figure 12-16) After clicking the Manage Clusters link in the IBM Systems Director console, you see the Cluster Management pane. This pane contains a series of tabs to help you manage your cluster. Figure 12-16 Cluster Management pane Chapter 12. Creating and managing a cluster using IBM Systems Director 343 Edit Advanced Properties button Under the General tab, you can click the Edit Advanced Properties button to modify the cluster properties. For example, you can change the controlling node as shown in Figure 12-17. Figure 12-17 Editing the advanced properties, such as the controlling node Add Network tab Under the Networks tab, you can click the Add Network button to add a network as shown in Figure 12-18. Figure 12-18 Add Network function 344 IBM PowerHA SystemMirror 7.1 for AIX Storage management On the Storage tab, you can perform disk management tasks such as converting the hdisk into VPATH. From the View drop-down list, select Disks to modify the disk properties as shown in Figure 12-19. Figure 12-19 Cluster storage management Capture Snapshot You can capture and manage snapshots through the Snapshots tab. To capture a new snapshot, click the Create button on the Snapshots tab as shown in Figure 12-20. Figure 12-20 Capture Snapshot function Chapter 12. Creating and managing a cluster using IBM Systems Director 345 File collection and logs management You can manage file collection and logs on the Additional Properties tab. From the View drop-down list, select either File Collections or Log files as shown in Figure 12-21. Figure 12-21 Additional Properties tab: File Collections and Log files options Creating a file collection On the Additional Properties tab, when you select File Collections from the View drop-down list and click the Create button, you can create a file collection as shown in Figure 12-22. Figure 12-22 Creating a file collection 346 IBM PowerHA SystemMirror 7.1 for AIX Collect log files button On the Additional Properties tab, when you select Log files from the View drop-down list and click the Collect log files button, you can collect log files as shown in Figure 12-23. 
Figure 12-23 Collect log files

The Systems Director plug-in also provides a CLI to manage the cluster. The following section explains the available CLI commands and how you can find help for each of these commands.

12.2.2 Performing cluster management with the SystemMirror plug-in CLI
The SystemMirror plug-in provides a CLI for most of the cluster management functions. For a list of the available functions, use the following command:
smcli lsbundle | grep sysmirror
A few of the CLI commands are provided as follows for a quick reference:
Snapshot creation
You can use the smcli mksnapshot command to create a snapshot. Figure 12-24 on page 348 shows the command for obtaining detailed help about this command.
mkss: mkss is the alias for the mksnapshot command.

Chapter 12. Creating and managing a cluster using IBM Systems Director 347

# smcli mkss -h -v
smcli sysmirror/mksnapshot [-h|-?|--help] [-v|--verbose]
smcli sysmirror/mksnapshot {-c|--cluster} <CLUSTER> \
    {-d|--description} "<DESCRIPTION>" \
    [{-M|--methods} <METHOD>[,<METHOD#2>,...] ] \
    [-s|--save_logs] \
    <snapshot_name>
Figure 12-24 Help details for the mksnapshot command

Example 12-2 shows usage of the smcli mkss command.

Example 12-2 Usage of the mksnapshot command
smcli mkss -c selma04_cluster -d "Selma04 cluster snapshot taken on Sept2010" selma04_sep10_ss

Verify the snapshot by using the smcli lsss command as shown in Example 12-3.

Example 12-3 Verifying the snapshot
# smcli lsss -c selma04_cluster selma04_sep10_ss
NAME="selma04_sep10_ss"
DESCRIPTION="Selma04 cluster snapshot taken on Sept2010"
METHODS=""
SAVE_LOGS="false"
CAPTURE_DATE="Sep 29 09:47"
NODE="selma03"

File collection
You can use the smcli mkfilecollection command to create a file collection as shown in Example 12-4. A file collection helps to keep the files and directories synchronized on all nodes in the cluster.

Example 12-4 File collection
# smcli mkfilecollection -c selma04_cluster -C -d "File Collection for the selma04 cluster" -F /home selma04_file_collection
# smcli lsfilecollection -c selma04_cluster selma04_file_collection
NAME="selma04_file_collection"
DESCRIPTION="File Collection for the selma04 cluster"
FILE="/home"
SIZE="256"

Log files
You can use the smcli lslog command (Example 12-5) to list the available log files in the cluster. Then you can use the smcli vlog command to view the log files.

Example 12-5 Log file management
# smcli lslog -c selma04_cluster
Node: selma03
=============
autoverify.log

348 IBM PowerHA SystemMirror 7.1 for AIX

cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
....
....(output truncated)
# smcli vlog -c selma04_cluster -n selma03 -T 4 clverify.log
Collector succeeded on node selma03 (31610 bytes)
Collector succeeded on node selma03 (4250 bytes)
Collector succeeded on node selma03 (26 bytes)

Modification functionality: At the time of writing this IBM Redbooks publication, an edit or modification CLI command, such as to modify the controlling node, is not available for this initial release. Therefore, use the GUI wizards for the modification functionality.

12.3 Creating a resource group with the SystemMirror plug-in GUI wizard
You can configure the resource group by using the Resource Group Wizard as follows:
1. Log in to IBM Systems Director.
2. In the left navigation area, expand Availability and select PowerHA SystemMirror (Figure 12-25).
3. In the right pane, under Resource Group Management, click the Add a resource group link.
Figure 12-25 Resource group management

Chapter 12.
Creating and managing a cluster using IBM Systems Director 349

4. On the Clusters tab, click the Actions list and select Add Resource Group (Figure 12-26). Then select the cluster node, and click the Action button.
Alternative: You can select the resource group configuration wizard by selecting the cluster nodes, as shown in Figure 12-26.
Figure 12-26 Adding a resource group
5. In the Choose a cluster pane (Figure 12-27), choose the cluster where the resource group is to be created. Notice that this step is highlighted under Welcome in the left pane.
Figure 12-27 Choose the cluster for the resource group configuration
You can now choose to create either a custom resource group or a predefined resource group as explained in 12.3.1, “Creating a custom resource group” on page 351, and 12.3.2, “Creating a predefined resource group” on page 353.

350 IBM PowerHA SystemMirror 7.1 for AIX

12.3.1 Creating a custom resource group
To create a custom resource group, follow these steps:
1. In the Add a resource group pane (Figure 12-28), select the Create a custom resource group option, enter a resource group name, and click Next.
Figure 12-28 Adding a resource group
2. In the Choose nodes pane (Figure 12-29), select the nodes for which you want to configure the resource group.
Figure 12-29 Selecting the nodes for configuring a resource group

Chapter 12. Creating and managing a cluster using IBM Systems Director 351

3. In the Choose policies and attributes pane (Figure 12-30), select the policies to add to the resource group.
Figure 12-30 Selecting the policies and attributes
4. In the Choose resources pane (Figure 12-31), select the shared resources to define for the resource group.
Figure 12-31 Selecting the shared resources

352 IBM PowerHA SystemMirror 7.1 for AIX

5. In the Summary pane (Figure 12-32), review the settings and click the Finish button to create the resource group.
Figure 12-32 Summary pane of the Resource Creation wizard

12.3.2 Creating a predefined resource group
For a set of applications, such as IBM SAP, WebSphere®, DB2, HTTP Server, and Tivoli Directory Server, the SystemMirror plug-in facilitates the process of creating predefined resource groups. To configure the predefined resource groups, follow these steps:
1. In the Add a resource group pane (Figure 12-33 on page 354), select the Create predefined resource groups for one of the following discovered applications radio button. Then select the application for which the resource group is to be configured.
Application list: Only the applications installed in the cluster nodes are displayed under the predefined resource group list.

Chapter 12. Creating and managing a cluster using IBM Systems Director 353

Figure 12-33 Predefined resource group configuration
2. In the Choose components pane, for the predefined resource group, select the components of the application to create the resource group. In the example shown in Figure 12-34, the Tivoli Directory Server component is selected. Each component already has predefined properties such as the primary node and takeover node. Modify the properties per your configuration and requirements. Then create the resource group.
Figure 12-34 Application components

354 IBM PowerHA SystemMirror 7.1 for AIX

12.3.3 Verifying the creation of a resource group
To verify the creation of a resource group, follow these steps:
1. In the right pane, under Cluster Management, click the Manage Clusters link (Figure 12-15 on page 342).
2. Click the Resource Groups tab (Figure 12-35).
Figure 12-35 Resource Groups tab
3.
Enter the following base SystemMirror command to verify that the resource group has been created: /usr/es/sbin/cluster/utilities/clshowres 12.4 Managing a resource group You can manage a resource group by using the SystemMirror plug-in wizard or the SystemMirror plug-in CLI commands. This topic explains how to use both methods. 12.4.1 Resource group management using the SystemMirror plug-in wizard The SystemMirror plug-in wizard has simplified resource group management with the addition of the following functionalities: Checking the status of a resource group Moving a resource group across nodes Creating dependencies Accessing the resource group management wizard To access the Resource Group Management wizard, follow these steps: 1. Log in to IBM Systems Director. 2. In the left pane, expand Availability and select PowerHA SystemMirror (Figure 12-36 on page 356). Chapter 12. Creating and managing a cluster using IBM Systems Director 355 3. In the right pane, under Resource Group Management, select Manage Resource Groups (Figure 12-36). Figure 12-36 Resource group management link The Resource Group Management wizard opens as in Figure 12-37. Alternatively, you can access the Resource Group Management wizard by selecting Manage Cluster under Cluster Management (Figure 12-36). To access the Cluster and Resource Group Management wizard, select the Resource Groups tab as shown in Figure 12-37. Figure 12-37 Resource Group Management tab 356 IBM PowerHA SystemMirror 7.1 for AIX Resource group management functionality The Resource Group Management wizard includes the following functions: Create Dependency function a. Select the Clusters button to see the resource groups defined under the cluster. b. Click the Action list and select Create Dependency (as shown in Figure 12-38). Alternatively, right-click a cluster name and select Create Dependency. Figure 12-38 Selecting the Create Dependency function c. In the Parent-child window (Figure 12-39), select the dependency type to configure the dependencies. Figure 12-39 Parent-child window Chapter 12. Creating and managing a cluster using IBM Systems Director 357 Resource group removal Right-click the selected resource group, and click Remove to remove the resource group as shown in Figure 12-40. Figure 12-40 Cluster and Resource Group Management pane Application Availability and Configuration reports The Application Availability and Configuration reports show the configuration details of the resource group. The output of these reports is similar to the output produced by the clshowres command in the base PowerHA installation. You can also see the status of the application. To access these reports, right-click a resource group name, select Reports and then select Application Availability or Configuration as shown in Figure 12-41. Figure 12-41 Application Monitors 358 IBM PowerHA SystemMirror 7.1 for AIX Resource group status change To view move, online, and offline status changes, right-click a resource group name and select Advanced. Then select the option you need as shown in Figure 12-42. Figure 12-42 viewing a status change 12.4.2 Managing a resource group with the SystemMirror plug-in CLI Similar to the CLI commands for cluster creation and management, a set of CLI commands are provided for resource group management. To list the available CLI commands for managing the cluster, run the smcli lsbundle command (Figure 12-12 on page 340). 
The following commands are specific to resource groups: To remove the resource group in the controlling node: sysmirror/rmresgrp To start the resource group in online state: sysmirror/startresgrp To stop the resource group to an offline state: sysmirror/stopresgrp To move the resource group to an online state: sysmirror/moveresgrp To list all the configured resource groups: sysmirror/lsresgrp If the resource group name is used along with this command, it provides the details of the resource group. Chapter 12. Creating and managing a cluster using IBM Systems Director 359 Examples of CLI command usage This section shows examples using the CLI commands for resource group management. To list the resource groups, use the following command as shown in Example 12-6: smcli lsresgrp -c <cluster name> Example 12-6 The smcli lsresgrp command # smcli lsresgrp -c selma04_cluster myRG RG01_selma03 RG02_selma03 RG03_selma04 RG04_selma04_1 RG05_selma03_04 RG06_selma03_04 RG_dhe To remove the resource group, use the following command as shown in Example 12-7: smcli rmresgrp -c <cluster name> -C <RG_name> Example 12-7 The smcli rmresgrp command using the -C option to confirm the removal operation # smcli rmresgrp -c selma04_cluster Test_AltRG Removing this resource group will cause all user-defined PowerHA information to be DELETED. Removing objects is something which is not easily reversed, and therefore requires confirmation. If you are sure that you want to proceed with this removal operation, re-run the command using the "--confirm" or "-C" option. Consider creating a snapshot of the current cluster configuration first, though, since restoring a snapshot will be the only way to reverse any deletions. 12.5 Verifying and synchronizing a configuration You can verify and synchronize a cluster by using the wizard for the SystemMirror plug-in or by using the CLI commands for the SystemMirror plug-in. This topic explains how to use both methods. 12.5.1 Verifying and synchronizing a configuration with the GUI To verify and synchronize the configuration by using the Synchronization and Verification function of the SystemMirror plug-in, follow these steps: 1. Log in to IBM Systems Director. 2. Expand Availability and select PowerHA SystemMirror as shown in Figure 12-4 on page 335. 3. Under Cluster Management, select the Manage Clusters link. 360 IBM PowerHA SystemMirror 7.1 for AIX 4. In the Cluster and Resource Group Management wizard, select the cluster for which you want to perform the synchronize and verification function. Then select the Action button or right-click the cluster to access the Verify and Synchronize option as shown in Figure 12-43. Figure 12-43 Cluster management option list 5. In the Verify and Synchronize pane (Figure 12-44), select whether you want to synchronize the entire configuration, only the unsynchronized changes, or verify. Then click OK. Figure 12-44 Verify and Synchronize window Chapter 12. Creating and managing a cluster using IBM Systems Director 361 6. Optional: Undo the changes to the configuration after synchronization. a. To access this option, in the Cluster and Resource Group Management wizard, on the Clusters tab, select the cluster for which you want to perform the synchronize and verification function (Figure 12-43 on page 361). b. As shown in (Figure 12-45), select the Recovery Undo local changes of configuration. Figure 12-45 Recovering the configuration option c. When you see the Undo Local Changes of the Configuration message (Figure 12-46), click OK. 
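The start, stop, and move commands follow the same help convention as the rest of the plug-in CLI, and their full syntax is not reproduced in this chapter. The following sequence is therefore only a sketch: it assumes the same -c <cluster name> convention that lsresgrp and rmresgrp use, so confirm the exact flags with the -h -v help output before running the commands.

# smcli startresgrp -h -v
(Displays the full syntax and options for bringing a resource group online.)
# smcli startresgrp -c selma04_cluster myRG
(Hypothetical invocation to bring the myRG resource group online in the selma04_cluster cluster.)
# smcli moveresgrp -h -v
(Displays the options, including the target node, for moving a resource group.)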
Figure 12-46 Undo changes message window Snapshot for the undo changes option: The undo changes option creates a snapshot before it deletes the configuration since the last synchronization. 362 IBM PowerHA SystemMirror 7.1 for AIX 12.5.2 Verifying and synchronizing with the CLI This section shows examples of performing cluster verification and synchronization by using the CLI functionality: Synchronization You can use the synccluster command to verify and synchronize the cluster. This command copies the cluster configuration from the controlling node of the specified cluster to each of the other nodes in the cluster. The help option is available by using the smcli synccluster -h -v command as shown in Example 12-8. Here you see options such as to perform a verification or synchronization (see Example 12-9). Example 12-8 The help option of the smcli synccluster command # smcli sysmirror/synccluster -h -v smcli sysmirror/synccluster {-h|-?|--help} [-v|--verbose] smcli sysmirror/synccluster [-n|--no_verification}] \ <CLUSTER> smcli sysmirror/synccluster [-x|--fix_errors}] \ [-C|--changes_only}] \ [-t|--custom_tests_only}] \ [{-M|--methods} <METHOD>[,<METHOD#2>,...] ] \ [{-e|--maximum_errors} <##>] \ [-F|--force] \ [{-l|--logfile} <full_path_to_file>] \ <CLUSTER> Command Alias: sycl ..... ..... <output truncated> Example 12-9 shows how to synchronize cluster changes and to log the output in its own specific log file. Example 12-9 smcli synccluster changes only with the log file option # smcli synccluster -C -l /tmp/sync.log selma04_cluster Undo changes To restore the cluster configuration back to the configuration after any synchronization, use the smcli undochanges command. This operation restores the cluster configuration from the active configuration database. Typically, this command has the effect of discarding any unsynchronized changes. The help option is available by using the smcli undochanges -h -v command as shown in Example 12-10. Example 12-10 The help option for the smcli undochanges command # smcli undochanges -h -v smcli sysmirror/undochanges {-h|-?|--help} [-v|--verbose] smcli sysmirror/undochanges <CLUSTER> Chapter 12. Creating and managing a cluster using IBM Systems Director 363 Command Alias: undo -h|-?|--help Requests help for this command. -v|--verbose Requests maximum details in the displayed information. <CLUSTER> The label of a cluster to perform this operation on. ... <output truncated > 12.6 Performing cluster monitoring with the SystemMirror plug-in This topic explains how to monitor the status of the cluster and the resource group before and while the cluster services are active. It also covers problem determination steps and how to collect log files to analyze cluster issues. 12.6.1 Monitoring cluster activities before starting a cluster This section explains the features you can use to monitor for cluster activities before starting the cluster: Topology view After the cluster and its resource groups are configured, select the topology view to understand the overall status of cluster and its configuration: a. Log in to IBM Systems Director. b. Expand Availability and select PowerHA SystemMirror as shown in Figure 12-4 on page 335. c. In the right pane, select the cluster to be monitored and click Actions. Select Map View (Figure 12-47) to access the Map view of the cluster configuration. Figure 12-47 Map view of cluster configuration 364 IBM PowerHA SystemMirror 7.1 for AIX Map view: The map view is available for resource configuration. 
As shown in Figure 12-47 on page 364, select the Resource Groups tab. Click Action, and click Map View to see the map view of the resource group configuration as shown in Figure 12-48. Test_AhRG myRG RG_test_NChg _testinggg RG_testing11 RG_testing9 RG01_selma03 RG_testing6 selma_04_cluster RG_testing2 RG05_selma03_04 RG_TEST_4 RG06_selma03_04 Figure 12-48 Map view of resource group configuration Chapter 12. Creating and managing a cluster using IBM Systems Director 365 Cluster subsystem services status: You can view the status of PowerHA services, such as the clcomd subsystem, by using the Status feature. To access this feature, select the cluster for which the service status is to be viewed. Click the Action button and select Reports Status. You now see the cluster service status details, similar to the example in Figure 12-49. Figure 12-49 Cluster Services status Cluster Configuration Report Before starting the cluster services, access the cluster configuration report. Select the cluster for which the configuration report is to be viewed. Click the Action button and select Reports, which shows the Cluster Configuration Report page (Figure 12-50). Figure 12-50 Cluster Configuration Report 366 IBM PowerHA SystemMirror 7.1 for AIX You can also view the Cluster Topology Configuration Report by using the following command: /usr/es/sbin/cluster/utilities/cltopinfo Then select the cluster, click the Action button, and select Reports Configuration. You see the results in a format similar to the example in Figure 12-51. Figure 12-51 Cluster Topology Configuration Report Similarly you can view the configuration report for the resource group as shown in Figure 12-52. On the Resource Groups tab, select the resource group for which you want to view the configuration. Then click the Action button and select Reports. Figure 12-52 Resource Group Configuration Report Chapter 12. Creating and managing a cluster using IBM Systems Director 367 Application monitoring To locate the details of the application monitors that are configured and assigned to a resource group, select the cluster. Click the Action button and select Reports Applications. Figure 12-53 shows the status of the application monitoring. Figure 12-53 Application monitoring status Similarly you can view the configuration report for networks and interfaces by selecting the cluster, clicking the Action button, and selecting Reports Networks and Interfaces. 12.6.2 Monitoring an active cluster When the cluster service is active, to see the status of the resource group, select the cluster for which the status is to be viewed. Click the Action button and select Report Event Summary. You can now access the online status of the resource group and events summary as shown in Figure 12-54. Figure 12-54 Resource group online status 368 IBM PowerHA SystemMirror 7.1 for AIX 12.6.3 Recovering from cluster configuration issues To recover from cluster configuration issues, such as recovering from an event failure and undoing local changes, consider the following tips: Getting the proper GUI Select the cluster and click the Actions button. Then select Recovery and choose the appropriate action as shown in Figure 12-55. Figure 12-55 Recovery options Releasing cluster modification locks After you issue the release of the cluster modification locks, you see a message similar to the one shown in Figure 12-56. Before you perform the operation, save a snapshot of the cluster as indicated in the message. Figure 12-56 Release cluster modification locks Chapter 12. 
Creating and managing a cluster using IBM Systems Director 369 Recovering from an event failure After you issue a cluster recover from event failure, you see a message similar to the one shown in Figure 12-57. Verify that you have addressed all problems that led to the error before continuing with the operation. Figure 12-57 Recovery from an event failure Collecting problem determination data To collect problem determination data, select the Turn on debugging option and Collect the RSCT log files (Figure 12-58). Figure 12-58 Collect Problem Determination Data window Undoing local changes of a configuration To undo local changes of a configuration, see 12.5.1, “Verifying and synchronizing a configuration with the GUI” on page 360. 370 IBM PowerHA SystemMirror 7.1 for AIX 13 Chapter 13. Disaster recovery using DS8700 Global Mirror This chapter explains how to configure disaster recovery based on IBM PowerHA SystemMirror for AIX Enterprise Edition using IBM System Storage DS8700 Global Mirror as a replicated resource. This support was added in version 6.1 with service pack 3 (SP3). This chapter includes the following topics: Planning for Global Mirror Installing the DSCLI client software Scenario description Configuring the Global Mirror resources Configuring AIX volume groups Configuring the cluster Failover testing LVM administration of DS8000 Global Mirror replicated resources © Copyright IBM Corp. 2011. All rights reserved. 371 13.1 Planning for Global Mirror Proper planning is crucial to the success of any disaster recovery solution. This topic reveals the basic requirements to implement Global Mirror and integrate it with the IBM PowerHA SystemMirror for AIX Enterprise Edition. 13.1.1 Software prerequisites Global Mirror functionality works with all the AIX levels that are supported by PowerHA SystemMirror Standard Edition. The following software is required for the configuration of the PowerHA SystemMirror for AIX Enterprise Edition for Global Mirror: The following base file sets for PowerHA SystemMirror for AIX Enterprise Edition 6.1: – – – – – cluster.es.pprc.cmds cluster.es.pprc.rte cluster.es.spprc.cmds cluster.es.spprc.rte cluster.msg.en_US.pprc PPRC and SPPRC file sets: The PPRC and SPPRC file sets are not required for Global Mirror support on PowerHA. 
The following additional file sets included in SP3 (must be installed separately and require the acceptance of licenses during the installation):
– cluster.es.genxd
  cluster.es.genxd.cmds    6.1.0.0  Generic XD support - Commands
  cluster.es.genxd.rte     6.1.0.0  Generic XD support - Runtime
– cluster.msg.en_US.genxd
  cluster.msg.en_US.genxd  6.1.0.0  Generic XD support - Messages
AIX supported levels:
– 5.3 TL9, RSCT 2.4.12.0, or later
– 6.1 TL2 SP1, RSCT 2.5.4.0, or later
The IBM DS8700 microcode bundle 75.1.145.0 or later
DS8000 CLI (DSCLI) 6.5.1.203 or later client interface (must be installed on each PowerHA SystemMirror node):
– Java 1.4.1 or later
– APAR IZ74478, which removes the previous Java requirement
The path name for the DSCLI client in the PATH for the root user on each PowerHA SystemMirror node (must be added)

13.1.2 Minimum DS8700 requirements
Before you implement PowerHA SystemMirror with Global Mirror, you must ensure that the following requirements are met:
Collect the following information for all the HMCs in your environment:
– IP addresses
– Login names and passwords
– Associations with storage units

372 IBM PowerHA SystemMirror 7.1 for AIX

Verify that all the data volumes that must be mirrored are visible to all relevant AIX hosts.
Verify that the DS8700 volumes are appropriately zoned so that the IBM FlashCopy® volumes are not visible to the PowerHA SystemMirror nodes.
Ensure that all Hardware Management Consoles (HMCs) are accessible by using the Internet Protocol network for all PowerHA SystemMirror nodes where you want to run Global Mirror.

13.1.3 Considerations
The PowerHA SystemMirror Enterprise Edition using DS8700 Global Mirror has the following considerations:
AIX Virtual SCSI is not supported in this initial release.
No auto-recovery is available from a PPRC path or link failure. If the PPRC path or link between Global Mirror volumes breaks down, the PowerHA Enterprise Edition is unaware of it. (PowerHA does not process Simple Network Management Protocol (SNMP) notifications for volumes that use DS8K Global Mirror technology for mirroring.) In this case, the user must identify and correct the PPRC path failure. Depending on timing conditions, such an event can result in the corresponding Global Mirror session going to a “Fatal” state. If this situation occurs, the user must manually stop and restart the corresponding Global Mirror session by using the rmgmir and mkgmir DSCLI commands or an equivalent DS8700 interface.
Cluster Single Point Of Control (C-SPOC) cannot perform some Logical Volume Manager (LVM) operations on nodes at the remote site that contain the target volumes. Operations that require nodes at the target site to read from the target volumes result in an error message in C-SPOC. Such operations include changing the file system size, changing the mount point, and adding LVM mirrors. However, nodes on the same site as the source volumes can successfully perform these tasks, and the changes can be propagated later to the other site by using a lazy update.
Attention: For C-SPOC operations to work on all other LVM operations, you must perform all C-SPOC operations with the DS8700 Global Mirror volume pairs in a synchronized or consistent state. Alternatively, you must perform them in the active cluster on all nodes.
The volume group names must be listed in the same order as the DS8700 mirror group names in the resource group.
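If a Global Mirror session does reach the “Fatal” state that is described in the considerations above, the recovery is a manual stop and restart of the session with the rmgmir and mkgmir DSCLI commands. The following sequence is only an illustrative sketch: the LSS and session number are hypothetical values, and the flag names are assumptions based on common DSCLI usage, so confirm the exact syntax in the DSCLI documentation for your microcode level before you run the commands.

dscli> lssession -dev IBM.2107-75DC890 26
(Checks the state of the Global Mirror sessions that are defined on LSS 26.)
dscli> rmgmir -dev IBM.2107-75DC890 -lss 26 -session 02
(Stops the failed Global Mirror session after the PPRC path problem is identified.)
dscli> mkgmir -dev IBM.2107-75DC890 -lss 26 -session 02
(Restarts the Global Mirror session after the PPRC path is repaired.)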
13.2 Installing the DSCLI client software You can download the latest version of the DS8000 DSCLI client software from the following web page: ftp://ftp.software.ibm.com/storage/ds8000/updates/DS8K_Customer_Download_Files/CLI Install the DS8000 DSCLI software on each PowerHA SystemMirror node. By default, the installation process installs the DSCLI in the /opt/ibm/dscli directory. Add the installation directory of the DSCLI into the PATH environment variable for the root user. For more details about the DS8000 DSCLI, see the IBM System Storage DS8000: Command-Line Interface User’s Guide, SC26-7916. Chapter 13. Disaster recovery using DS8700 Global Mirror 373 13.3 Scenario description This scenario uses a three-node cluster named Txrmnia. Two nodes are in the primary site, Texas, and one node is in the site Romania. The jordan and leeann nodes are at the Texas site and the robert node is at the Romania site. The primary site, Texas, has both local automatic failover and remote recovery. Figure 13-1 provides a software and hardware overview of the tested configuration between the two sites. Txrmnia Figure 13-1 DS8700 Global Mirror test scenario For this test, the resources are limited. Each system has a single IP, an XD_ip network, and single Fibre Channel (FC) host adapters. Ideally, redundancy might exist throughout the system, including in the local Ethernet networks, cross-site XD_ip networks, and FC connectivity. This scenario has a single resource group, ds8kgmrg, which consists of a service IP address (service_1), a volume group (txvg), and a DS8000 Global Mirror replicated resource (texasmg). To configure the cluster, see 13.6, “Configuring the cluster” on page 385. 13.4 Configuring the Global Mirror resources This section explains how to perform the following tasks: Checking the prerequisites Identifying the source and target volumes Configuring the Global Mirror relationships For each task, the DS8000 storage units are already added to the storage area network (SAN) fabric and zoned appropriately. Also, the volumes are already provisioned to the nodes. 374 IBM PowerHA SystemMirror 7.1 for AIX For details about how to set up the storage units, see IBM System Storage DS8700 Architecture and Implementation, SG24-8786. 13.4.1 Checking the prerequisites To check the prerequisites, follow these steps: 1. Ensure that the DSCLI installation path is in the PATH environment variable on all nodes. 2. Verify that you have the appropriate microcode version on each storage unit by running the ver -lmc command in a DSCLI session as shown in Example 13-1. Example 13-1 Checking the microcode level (0) root @ r9r4m21: : / # dscli -cfg /opt/ibm/dscli/profile/dscli.profile.hmc1 Date/Time: October 6, 2010 2:15:33 PM CDT IBM DSCLI Version: 6.5.15.19 IBM.2107-75DC890 DS: dscli> ver -lmc Date/Time: October 6, 2010 2:15:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: Storage Image LMC ========================== IBM.2107-75DC890 5.5.1.490 dscli> 3. Check the code bundle level that corresponds to your LMC version on the “DS8700 Code Bundle Information” web page at: http://www.ibm.com/support/docview.wss?uid=ssg1S1003593 The code bundle level must be at version 75.1.145.0 or later. Also on the same page, verify that your displayed DSCLI version corresponds to the installed code bundle level or a later level. Example 13-2 shows the extra parameters inserted into the DSCLI configuration file for the storage unit in the primary site, /opt/ibm/dscli/profile/dscli.profile.hmc1. 
Adding these parameters saves you from having to type them each time they are required.
Example 13-2 Editing the DSCLI configuration file
username: redbook
password: r3dbook
hmc1: 9.3.207.122
devid: IBM.2107-75DC890
remotedevid: IBM.2107-75DC980
13.4.2 Identifying the source and target volumes
Figure 13-2 on page 376 shows the volume allocation in DS8000 units for the scenario in this chapter. Global Copy source volumes are attached to both nodes in the primary site, Texas, and the corresponding Global Copy target volumes are attached to the node in the secondary site, Romania. The gray volumes, FlashCopy targets, are not exposed to the hosts.
Figure 13-2 Volume allocation in DS8000 units (Global Copy data volumes 2600 and 2E00 at Texas are paired with 2C00 and 2800 at Romania; FlashCopy target volumes are 2604 and 0A08 at Texas, and 2C04 and 2804 at Romania)
Table 13-1 shows the association between the source and target volumes of the replication relationship and between their logical subsystems (LSS, the two most significant digits of a volume identifier). Table 13-1 also indicates the mapping between the volumes in the DS8000 units and their disk names on the attached AIX hosts.
Table 13-1 AIX hdisk to DS8000 volume mapping
Site Texas                     Site Romania
AIX disk    LSS/VOL ID         LSS/VOL ID    AIX disk
hdisk10     2E00               2800          hdisk2
hdisk6      2600               2C00          hdisk6
You can easily obtain this mapping by using the lscfg -vl hdiskX | grep Serial command as shown in Example 13-3. The hdisk serial number is a concatenation of the storage image serial number and the ID of the volume at the storage level.
Example 13-3 The hdisk serial number in the lscfg command output
# lscfg -vl hdisk10 | grep Serial
Serial Number...............75DC8902E00
# lscfg -vl hdisk6 | grep Serial
Serial Number...............75DC8902600
Symmetrical configuration: In an actual environment (and different from this sample environment), to simplify the management of your Global Mirror environment, maintain a symmetrical configuration in terms of both physical and logical elements. With this type of configuration, you can keep the same AIX disk definitions on all nodes. It also helps you during configuration and management operations of the disk volumes within the cluster.
13.4.3 Configuring the Global Mirror relationships
In this section, you configure the Global Mirror replication relationships by performing the following tasks:
Creating PPRC paths
Creating Global Copy relationships
Creating FlashCopy relationships
Selecting an available Global Mirror session identifier
Defining Global Mirror sessions for all involved LSSs
Including all the source and target volumes in the Global Mirror session
Creating PPRC paths
In this task, the appropriate FC links have already been configured between the storage units. Example 13-4 shows the FC links that are available for the setup.
Example 13-4 Available FC links dscli> lsavailpprcport -remotewwnn 5005076308FFC804 2e:28 Date/Time: October 5, 2010 5:48:09 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 Local Port Attached Port Type ============================= I0010 I0210 FCP I0013 I0203 FCP I0013 I0310 FCP I0030 I0200 FCP I0030 I0230 FCP I0030 I0330 FCP I0040 I0200 FCP I0040 I0230 FCP I0041 I0232 FCP I0041 I0331 FCP I0042 I0211 FCP I0110 I0203 FCP I0110 I0310 FCP I0110 I0311 FCP I0111 I0310 FCP I0111 I0311 FCP I0130 I0200 FCP I0130 I0230 FCP I0130 I0300 FCP I0130 I0330 FCP I0132 I0232 FCP I0132 I0331 FCP dscli> Complete the following steps: 1. Run the lssi command on the remote storage unit to obtain the remote wwnn parameter for the lsavailpprcport command. The last parameter is one possible pair of your source and target LSSs. 2. For redundancy and bandwidth, configure more FC links by using redundant SAN fabrics. Chapter 13. Disaster recovery using DS8700 Global Mirror 377 3. Among the multiple displayed links, choose two that have their ports on different adapters. Use them to create the PPRC path for the 2e:28 LSS pair (see Example 13-5). Example 13-5 Creating pprc paths dscli> mkpprcpath -remotewwnn 5005076308FFC804 -srclss 2e -tgtlss 28 I0030:I0230 I0110:I0203 Date/Time: October 5, 2010 5:55:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 CMUC00149I mkpprcpath: Remote Mirror and Copy path 2e:28 successfully established. dscli> lspprcpath 2e Date/Time: October 5, 2010 5:56:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 Src Tgt State SS Port Attached Port Tgt WWNN ========================================================= 2E 28 Success FF28 I0030 I0230 5005076308FFC804 2E 28 Success FF28 I0110 I0203 5005076308FFC804 dscli> 4. In a similar manner, configure one PPRC path for each other involved LSS pair. 5. Because the PPRC paths are unidirectional, create a second path, in the opposite direction, for each LSS pair. You use the same procedure, but work on the other storage unit (see Example 13-6). We select different FC links for this direction. Example 13-6 Creating PPRC paths in opposite directions dscli> mkpprcpath -remotewwnn 5005076308FFC004 -srclss 28 -tgtlss 2e I0311:I0111 I0300:I0130 Date/Time: October 5, 2010 5:57:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 CMUC00149I mkpprcpath: Remote Mirror and Copy path 28:2e successfully established. dscli> Creating Global Copy relationships Create Global Copy relationship between the source and target volumes and then check their status by using the commands shown in Example 13-7. Example 13-7 Creating Global Copy relationships dscli> mkpprc -type gcp 2e00:2800 2600:2c00 Date/Time: October 5, 2010 5:57:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 2E00:2800 successfully created. CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 2600:2C00 successfully created. 
dscli> lspprc 2e00:2800 2600:2c00 Date/Time: October 5, 2010 5:57:42 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ================================================================================================== 2600:2C00 Copy Pending Global Copy 26 60 Disabled True 2E00:2800 Copy Pending Global Copy 2E 60 Disabled True dscli> 378 IBM PowerHA SystemMirror 7.1 for AIX Creating FlashCopy relationships Create FlashCopy relationships on both DS8000 storage units as shown in Example 13-8. Example 13-8 Creating FlashCopy relationships dscli> mkflash -tgtinhibit -nocp -record 2e00:0a08 2600:2604 Date/Time: October 5, 2010 4:17:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 CMUC00137I mkflash: FlashCopy pair 2E00:0A08 successfully created. CMUC00137I mkflash: FlashCopy pair 2600:2604 successfully created. dscli> lsflash 2e00:0a08 2600:2604 Date/Time: October 5, 2010 4:17:31 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID SrcLSS SequenceNum Timeout ActiveCopy Recording Persistent Revertible SourceWriteEnabled TargetWriteEnabled BackgroundCopy =========================================================================================== 2E00:0A08 0A 0 60 Disabled Enabled Enabled Disabled Enabled Disabled Disabled 2600:2604 26 0 60 Disabled Enabled Enabled Disabled Enabled Disabled Disabled dscli> dscli> mkflash -tgtinhibit -nocp -record 2800:2804 2c00:2c04 Date/Time: October 5, 2010 4:20:14 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 CMUC00137I mkflash: FlashCopy pair 2800:2804 successfully created. CMUC00137I mkflash: FlashCopy pair 2C00:2C04 successfully created. dscli> lsflash 2800:2804 2c00:2c04 Date/Time: October 5, 2010 4:20:38 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 ID SrcLSS SequenceNum Timeout ActiveCopy Recording Persistent Revertible SourceWriteEnabled TargetWriteEnabled BackgroundCopy =========================================================================================== 2800:2804 28 0 60 Disabled Enabled Enabled Disabled Enabled Disabled Disabled 2C00:2C04 2C 0 60 Disabled Enabled Enabled Disabled Enabled Disabled Disabled dscli> Selecting an available Global Mirror session identifier Example 13-9 lists the Global Mirror sessions that are already defined on each DS8000 storage unit. In this scenario, we chose 03 as the session identifier because it is free on both storage units. Example 13-9 Sessions defined on both DS8000 storage units dscli> lssession 00-ff Date/Time: October 5, 2010 6:07:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== 04 77 Normal 0400 Join Pending Primary Copy Pending Secondary Simplex True Disable 0A 04 Normal 0A04 Join Pending Primary Suspended Secondary Simplex False Disable 16 05 Normal 1604 Join Pending Primary Suspended Secondary Simplex False Disable 16 05 Normal 1605 Join Pending Primary Suspended Secondary Simplex False Disable 18 02 Normal 1800 Join Pending Primary Suspended Secondary Simplex False Disable 1C 04 Normal 1C00 Join Pending Primary Suspended Secondary Simplex False Disable 1C 04 Normal 1C01 Join Pending Primary Suspended Secondary Simplex False Disable dscli> lssession 00-ff Date/Time: October 5, 2010 6:08:23 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 Chapter 13. 
Disaster recovery using DS8700 Global Mirror 379 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== 1A 20 Normal 1A00 Join Pending Primary Simplex Secondary Copy Pending True Disable 1C 01 30 77 Normal 3000 Join Pending Primary Simplex Secondary Copy Pending True Disable dscli> Defining Global Mirror sessions for all involved LSSs Define the Global Mirror sessions for all the LSSs associated with source and target volumes as shown in Example 13-10. The same freely available session identifier, determined in “Selecting an available Global Mirror session identifier” on page 379, is used on both storage units. Example 13-10 Defining the GM session for the source and target volumes dscli> mksession -lss Date/Time: October 5, CMUC00145I mksession: dscli> mksession -lss Date/Time: October 5, CMUC00145I mksession: 2e 03 2010 6:11:07 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 Session 03 opened successfully. 26 03 2010 6:11:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 Session 03 opened successfully. dscli> mksession -lss Date/Time: October 6, CMUC00145I mksession: dscli> mksession -lss Date/Time: October 6, CMUC00145I mksession: dscli> 28 03 2010 5:39:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 Session 03 opened successfully. 2c 03 2010 5:39:15 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 Session 03 opened successfully. Including all the source and target volumes in the Global Mirror session Add the volumes in the Global Mirror sessions and verify their status by using the commands shown in Example 13-11. Example 13-11 Adding source and target volumes to the Global Mirror sessions dscli> chsession -lss 26 -action add -volume 2600 03 Date/Time: October 5, 2010 6:15:17 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 CMUC00147I chsession: Session 03 successfully modified. dscli> chsession -lss 2e -action add -volume 2e00 03 Date/Time: October 5, 2010 6:15:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 CMUC00147I chsession: Session 03 successfully modified. dscli> lssession 26 2e Date/Time: October 5, 2010 6:16:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== 26 03 Normal 2600 Join Pending Primary Copy Pending Secondary Simplex True Disable 2E 03 Normal 2E00 Join Pending Primary Copy Pending Secondary Simplex True Disable dscli> dscli> chsession -lss Date/Time: October 6, CMUC00147I chsession: dscli> chsession -lss Date/Time: October 6, 380 2c -action add -volume 2c00 03 2010 5:41:12 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 Session 03 successfully modified. 28 -action add -volume 2800 03 2010 5:41:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 IBM PowerHA SystemMirror 7.1 for AIX CMUC00147I chsession: Session 03 successfully modified. 
dscli> lssession 28 2c
Date/Time: October 6, 2010 5:44:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading
===========================================================================================================
28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable
dscli>
13.5 Configuring AIX volume groups
In this scenario, you create a volume group and a file system on the hdisks that are associated with the DS8000 source volumes. These volumes are already identified in 13.4.2, “Identifying the source and target volumes” on page 375. They are hdisk6 and hdisk10 on the jordan node.
You must configure the volume groups and file systems on the cluster nodes. The application might require the same major number for the volume group on all nodes. Perform this configuration task because it can also be useful later for additional configuration of the Network File System (NFS).
For the nodes on the primary site, you can use the standard procedure: define the volume groups and file systems on one node and then import them to the other nodes. For the nodes on the secondary site, you must first suspend the replication on the involved target volumes.
13.5.1 Configuring volume groups and file systems on primary site
In this task, you create an AIX volume group on the hdisks associated with the DS8000 source volumes on the jordan node and import it on the leeann node. Follow these steps:
1. Choose the next free major number on all cluster nodes by running the lvlstmajor command on each cluster node. The next common free major number on all systems is 50 as shown in Example 13-12.
Example 13-12 Running the lvlstmajor command on all cluster nodes
root@leeann: lvlstmajor
50...
root@robert: lvlstmajor
44..54,56...
root@jordan: lvlstmajor
50...
2. Create a volume group, called txvg, and a file system, called /txro. These volumes are already identified in 13.4.2, “Identifying the source and target volumes” on page 375. They are hdisk6 and hdisk10 on the jordan node. Example 13-13 shows a list of commands to run on the jordan node.
Example 13-13 Creating txvg volume group on jordan
root@jordan: mkvg -V 50 -y txvg hdisk6 hdisk10
0516-1254 mkvg: Changing the PVID in the ODM.
txvg
root@jordan: chvg -a n txvg
root@jordan: mklv -e x -t jfs2 -y txlv txvg 250
txlv
root@jordan: mklv -e x -t jfs2log -y txloglv txvg 1
txloglv
root@jordan: crfs -v jfs2 -d /dev/txlv -a log=/dev/txloglv -m /txro -A no
File system created successfully.
1023764 kilobytes total disk space.
New File System size is 2048000
root@jordan: lsvg -p txvg
txvg:
PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk6    active     511         385        102..00..79..102..102
hdisk10   active     511         386        103..00..79..102..102
root@jordan: lspv | grep -e hdisk6 -e hdisk10
hdisk6    000a625afe2a4958    txvg    active
hdisk10   000a624a833e440f    txvg    active
root@jordan: varyoffvg txvg
root@jordan:
3. Import the volume group on the second node on the primary site, leeann, as shown in Example 13-14:
a. Verify that the shared disks have the same PVID on both nodes.
b. Run the rmdev -dl command for each hdisk.
c. Run the cfgmgr program.
d. Run the importvg command.
Example 13-14 Importing the txvg volume group on the leeann node root@leean: rmdev -dl hdisk6 hdisk6 deleted root@leean: rmdev -dl hdisk10 hdisk10 deleted root@leean: cfgmgr root@leean:lspv | grep -e hdisk6 -e hdisk10 hdisk6 000a625afe2a4958 hdisk10 000a624a833e440f root@leean: importvg -V 51 -y txvg hdisk6 txvg root@leean: lsvg -l txvg txvg: LV NAME TYPE LPs PPs txlv jfs2 250 250 txloglv jfs2log 1 1 382 IBM PowerHA SystemMirror 7.1 for AIX txvg txvg PVs 2 1 LV STATE open/syncd open/syncd MOUNT POINT /txro N/A root@leean: chvg -a n txvg root@leean: varyoffvg txvg 13.5.2 Importing the volume groups in the remote site To import the volume groups in the remote site, use the following steps. Example 13-15 shows the commands to run on the primary site. 1. Obtain a consistent replica of the data, on the primary site, by ensuring that the volume group is varied off as shown by the last command in Example 13-14. 2. Ensure that the Global Copy is in progress and that the Out of Sync count is 0. 3. Suspend the replication by using the pausepprc command. Example 13-15 Pausing the Global Copy relationship on the primary site dscli> lspprc -l 2600 2e00 Date/Time: October 6, 2010 3:40:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG PPRC CG isTgtSE DisableAutoResync =========================================================================================================== =========================================================================================================== 2600:2C00 Copy Pending Global Copy 0 Disabled Disabled Invalid 26 60 Disabled True Disabled Disabled N/A Disabled Unknown False 2E00:2800 Copy Pending Global Copy 0 Disabled Disabled Invalid 2E 60 Disabled True Disabled Disabled N/A Disabled Unknown False dscli> pausepprc 2600:2C00 2E00:2800 Date/Time: October 6, 2010 3:49:29 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2600:2C00 relationship successfully paused. CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2E00:2800 relationship successfully paused. dscli> lspprc -l 2600 2e00 Date/Time: October 6, 2010 3:49:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG PPRC CG isTgtSE DisableAutoResync =========================================================================================================== =========================================================================================================== 2600:2C00 Suspended Host Source Global Copy 0 Disabled Disabled Invalid 26 60 Disabled True Disabled Disabled N/A Disabled Unknown False 2E00:2800 Suspended Host Source Global Copy 0 Disabled Disabled Invalid 2E 60 Disabled True Disabled Disabled N/A Disabled Unknown False dscli> 4. To make the target volumes available to the attached hosts, use the failoverpprc command on the secondary site as shown in Example 13-16. Example 13-16 The failoverpprc command on the secondary site storage unit dscli> failoverpprc -type gcp 2C00:2600 2800:2E00 Date/Time: October 6, 2010 3:55:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 CMUC00196I failoverpprc: Remote Mirror and Copy pair 2C00:2600 successfully reversed. 
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2800:2E00 successfully reversed. dscli> lspprc 2C00:2600 2800:2E00 Date/Time: October 6, 2010 3:55:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 Chapter 13. Disaster recovery using DS8700 Global Mirror 383 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ==================================================================================================== 2800:2E00 Suspended Host Source Global Copy 28 60 Disabled True 2C00:2600 Suspended Host Source Global Copy 2C 60 Disabled True dscli> 5. Refresh and check the PVIDs. Then import and vary off the volume group as shown in Example 13-17. Example 13-17 Importing the volume group txvg on the secondary site node, robert root@robert: rmdev -dl hdisk2 hdisk2 deleted root@robert: rmdev -dl hdisk6 hdisk6 deleted root@robert: cfgmgr root@robert: lspv |grep -e hdisk2 -e hdisk6 hdisk2 000a624a833e440f hdisk6 000a625afe2a4958 root@robert: importvg -V 50 -y txvg hdisk2 txvg root@robert: lsvg -l txvg txvg: LV NAME TYPE LPs PPs txlv jfs2 250 250 txloglv jfs2log 1 1 root@robert: varyoffvg txvg txvg txvg PVs 2 1 LV STATE closed/syncd closed/syncd MOUNT POINT /txro N/A 6. Re-establish the Global Copy relationship as shown in Example 13-18. Example 13-18 Re-establishing the initial Global Copy relationship dscli> failbackpprc -type gcp 2600:2C00 2E00:2800 Date/Time: October 6, 2010 4:24:10 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back. CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back. dscli> lspprc 2600:2C00 2E00:2800 Date/Time: October 6, 2010 4:24:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ================================================================================================== 2600:2C00 Copy Pending Global Copy 26 60 Disabled True 2E00:2800 Copy Pending Global Copy 2E 60 Disabled True dscli> lspprc 2800 2c00 Date/Time: October 6, 2010 4:24:57 AM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ========================================================================================================= 2600:2C00 Target Copy Pending Global Copy 26 unknown Disabled Invalid 2E00:2800 Target Copy Pending Global Copy 2E unknown Disabled Invalid dscli> 384 IBM PowerHA SystemMirror 7.1 for AIX 13.6 Configuring the cluster To configure the cluster, you must complete all software prerequisites. Also you must configure the /etc/hosts file properly, and verify that the clcomdES subsystem is running on each node. To configure the cluster, follow these steps: 1. Add a cluster. 2. Add all three nodes. 3. Add both sites. 4. Add the XD_ip network. 5. Add the disk heartbeat network. 6. Add the base interfaces to XD_ip network. 7. Add the service IP address. 8. Add the DS8000 Global Mirror replicated resources. 9. Add a resource group. 10.Add a service IP, application server, volume group, and DS8000 Global Mirror Replicated Resource to the resource group. 13.6.1 Configuring the cluster topology Configuring a cluster entails the following tasks: Adding a cluster Adding nodes Adding sites Adding networks Adding communication interfaces Adding a cluster To add a cluster, follow these steps: 1. From the command line, type the smitty hacmp command. 2. 
In SMIT, select Extended Configuration Extended Topology Configuration Configure an HACMP Cluster Add/Change/Show an HACMP Cluster. 3. Enter the cluster name, which is Txrmnia in this scenario, as shown in Figure 13-3. Press Enter. Add/Change/Show an HACMP Cluster Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] [Txrmnia] * Cluster Name Figure 13-3 Adding a cluster in the SMIT menu The output is displayed in the SMIT Command Status window. Chapter 13. Disaster recovery using DS8700 Global Mirror 385 Adding nodes To add the nodes, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Topology Configuration Configure HACMP Nodes Add a Node to the HACMP Cluster. 3. Enter the desired node name, which is jordan in this case, as shown in Figure 13-4. Press Enter. The output is displayed in the SMIT Command Status window. Add a Node to the HACMP Cluster Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] [jordan] [] * Node Name Communication Path to Node + Figure 13-4 Add a Node SMIT menu 4. In this scenario, repeat these steps two more times to add the additional nodes of leeann and robert. Adding sites To add the nodes, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Topology Configuration Configure HACMP Sites Add a Site. 3. Enter the desired site name, which in this scenario is the Texas site with the nodes jordan and leeann, as shown in Figure 13-5. Press Enter. The output is displayed in the SMIT Command Status window. Add a Site Type or select values in entry fields. Press Enter AFTER making all desired changes. * Site Name * Site Nodes [Entry Fields] [Texas] jordan leeann Figure 13-5 Add a Site SMIT menu 4. In this scenario, repeat these steps to add the Romania site with the robert node. 386 IBM PowerHA SystemMirror 7.1 for AIX + + Example 13-19 shows the site definitions. The dominance information is displayed, but not relevant until a resource group is defined later by using the nodes. Example 13-19 cllssite information about site definitions ./cllssite ---------------------------------------------------Sitename Site Nodes Dominance --------------------------------------------------Texas jordan leeann Romania robert Protection Type NONE NONE Adding networks To add the nodes, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Topology Configuration Configure HACMP Networks Add a Network to the HACMP Cluster. 3. Choose the desired network type, which in this scenario is XD_ip. 4. Keep the default network name and press Enter (Figure 13-6). Add an IP-Based Network to the HACMP Cluster Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * Network Name Network Type Netmask(IPv4)/Prefix Length(IPv6) Enable IP Address Takeover via IP Aliases IP Address Offset for Heartbeating over IP Aliases [Entry Fields] [net_XD_ip_01] XD_ip [255.255.255.0] [Yes] [] + Figure 13-6 Add an IP-Based Network SMIT menu 5. Repeat these steps but select a network type of diskhb for the disk heartbeat network and keep the default network name of net_diskhb_01. Adding communication interfaces To add the nodes, follow these steps: 1. From the command line, type the smitty hacmp command. 2. 
In SMIT, select the path Extended Configuration Extended Topology Configuration Configure HACMP Communication Interfaces/Devices Add Communication Interfaces/Devices Add Pre-defined Communication Interfaces and Devices Communication Interfaces. 3. Select the previously created network, which in this scenario is net_XD_ip_01. Chapter 13. Disaster recovery using DS8700 Global Mirror 387 4. Complete the SMIT menu fields. The first interface in this scenario is for jordan is shown in Figure 13-7. Press Enter. The output is displayed in the SMIT Command Status window. Add a Communication Interface Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * [Entry Fields] [jordan_base] XD_ip net_XD_ip_01 [jordan] IP Label/Address Network Type Network Name Node Name + + Figure 13-7 Add communication interface SMIT menu 5. Repeat these steps and select Communication Devices to complete the disk heartbeat network. The topology is now configured. Also you can see all the interfaces and devices from the cllsif command output shown in Figure 13-8. Adapter jordan_base jordandhb leeann_base leeanndhb robert_base Type boot service boot service boot Network Net Type net_XD_ip_01 XD_ip net_diskhb_01 diskhb net_XD_ip_01 XD_ip net_diskhb_01 diskhb net_XD_ip_01 XD_ip Attribute public serial public serial public Node jordan jordan leeann leeann robert IP Address 9.3.207.209 /dev/hdisk8 9.3.207.208 /dev/hdisk8 9.3.207.207 Figure 13-8 Cluster interfaces and devices defined 13.6.2 Configuring cluster resources and resource group The test scenario has only one resource group, which contains the resources of the service IP address, volume group, and DS8000 replicated resources. Configure the cluster resources and resource group as explained in the following sections. Defining the service IP Define the service IP by following these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration HACMP Extended Resources Configuration Configure HACMP Service IP Labels/Addresses Add a Service IP Label/Address Configurable on Multiple Nodes. 3. Choose the net_XD_ip_01 network and press Enter. 4. Choose the appropriate IP label or address. Press Enter. The output is displayed in the SMIT Command Status window. 388 IBM PowerHA SystemMirror 7.1 for AIX In this scenario, we added serviceip_2, as shown in Figure 13-9. Add a Service IP Label/Address configurable on Multiple Nodes (extended) Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] * IP Label/Address serviceip_2 Netmask(IPv4)/Prefix Length(IPv6) [] * Network Name net_XD_ip_01 Alternate HW Address to accompany IP Label/Address [] Associated Site ignore + Figure 13-9 Add a Service IP Label SMIT menu In most true site scenarios, where each site is on different segments, it is common to create at least two service IP labels. You create one for each site by using the Associated Site option, which indicates the desire to have site-specific service IP labels. With this option, you can have a unique service IP label at each site. However, we do not use them in this test because we are on the same network segment. Defining the DS8000 Global Mirror resources To fully define the Global Mirror resources, follow these steps: 1. Add a storage agent or agents. 2. Add a storage system or systems. 3. Add a mirror group or groups. 
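Before you work through these SMIT panels, you might want to gather the values that they ask for. The sketch below is not part of the original procedure: it uses the DSCLI lssi command (also used in 13.4.3 to obtain the remote WWNN) to list the storage image ID and WWNN of each unit. The dscli.profile.hmc2 profile name is hypothetical, mirroring the hmc1 profile shown earlier.

# Storage image ID (IBM.2107-xxxxxxx) and WWNN of the Texas unit
dscli -cfg /opt/ibm/dscli/profile/dscli.profile.hmc1
dscli> lssi

# Same values for the Romania unit; the profile name is an assumption
dscli -cfg /opt/ibm/dscli/profile/dscli.profile.hmc2
dscli> lssi

The storage image ID goes into the Vendor Specific Identification field and the WWNN into the WWNN field of the Add a Storage System panel shown later in this section.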
Because these options are all new, define each one before you configure them: Storage agent A generic name given by PowerHA SystemMirror for an entity such as the IBM DS8000 HMC. Storage agents typically provide a one-point coordination point and often use TCP/IP as their transport for communication. You must provide the IP address and authentication information that will be used to communicate with the HMC. Storage system A generic name given by PowerHA SystemMirror for an entity such as a DS8700 Storage Unit. When using Global Mirror, you must associate one storage agent with each storage system. You must provide the IBM DS8700 system identifier for the storage system. For example, IBM.2107-75ABTV1 is a storage identifier for a DS8000 Storage System. Mirror group A generic name given by PowerHA SystemMirror for a logical collection of volumes that must be mirrored to another storage system that resides on a remote site. A Global Mirror session represents a mirror group. Adding a storage agent To add a storage agent, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration HACMP Extended Resources Configuration Configure DS8000 Global Mirror Resources Configure Storage Agents Add a Storage Agent. Chapter 13. Disaster recovery using DS8700 Global Mirror 389 3. Complete the menu appropriately and press Enter. Figure 13-10 shows the configuration for this scenario. The output is displayed in the SMIT Command Status window. Add a Storage Agent Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * [Entry Fields] [ds8khmc] [9.3.207.122] [redbook] [r3dbook] Storage Agent Name IP Addresses User ID Password Figure 13-10 Add a Storage Agent SMIT menu It is possible to have multiple storage agents. However, this test scenario has only one storage agent that manages both storage units. Important: The user ID and password are stored as flat text in the HACMPxd_storage_agent.odm file. Adding a storage system To add the storage systems, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration HACMP Extended Resources Configuration Configure DS8000 Global Mirror Resources Configure Storage Systems Add a Storage System. 3. Complete the menu appropriately and press Enter. Figure 13-11 shows the configuration for this scenario. The output is displayed in the SMIT Command Status window. Add a Storage System Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * * Storage System Name Storage Agent Name(s) Site Association Vendor Specific Identification WWNN Figure 13-11 Add a Storage System SMIT menu 390 IBM PowerHA SystemMirror 7.1 for AIX [Entry Fields] [texasds8k] ds8kmainhmc Texas [IBM.2107-75DC890] [5005076308FFC004] + + + + 4. Repeat these steps for the storage system at Romania site, and name it romaniads8k. Example 13-20 shows the configuration. Example 13-20 Storage systems definitions Storage System Name Storage Agent Name(s) Site Association Vendor Specific Identification WWNN Storage System Name Storage Agent Name(s) Site Association Vendor Specific Identification WWNN texasds8k ds8kmainhmc Texas IBM.2107-75DC890 5005076308FFC004 romaniads8k ds8kmainhmc Romania IBM.2107-75DC980 5005076308FFC804 Adding a mirror group You are now ready to add the storage systems. To add a storage system, perform the following steps: 1. 
From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration HACMP Extended Resources Configuration Configure DS8000 Global Mirror Resources Configure Mirror Groups Add a Mirror Group. 3. Complete the menu appropriately and press Enter. Figure 13-12 show the configuration for this scenario. The output is displayed in the SMIT Command Status window. Add a Mirror Group Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * Mirror Group Name Storage System Name Vendor Specific Identifier Recovery Action Maximum Coordination Time Maximum Drain Time Consistency Group Interval Time [Entry Fields] [texasmg] texasds8k romaniads8k [03] + automatic [50] [30] [0] + + Figure 13-12 Add a Mirror Group SMIT menu Vendor Specific Identifier field: For the Vendor Specific Identifier field, provide only the Global Mirror session number. Defining a resource group and Global Mirror resources Now that you have all the components configured that are required for the DS8700 replicated resource, you can create a resource group and add your resources to it. Chapter 13. Disaster recovery using DS8700 Global Mirror 391 Adding a resource group To add a resource group, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration HACMP Extended Resources Group Configuration Add a Resource Group. 3. Complete the menu appropriately and press Enter. Figure 13-13 shows the configuration in this scenario. Notice that for the Inter-Site Management Policy, we chose Prefer Primary Site. This option ensures that resource group starts automatically when the cluster is started in the primary Texas site. The output is displayed in the SMIT Command Status window. Add a Resource Group (extended) Type or select values in entry fields. Press Enter AFTER making all desired changes. * Resource Group Name [Entry Fields] [ds8kgmrg] Inter-Site Management Policy * Participating Nodes from Primary Site Participating Nodes from Secondary Site [Prefer Primary Site] [jordan leeann] [robert] Startup Policy Fallover Policy Fallback Policy + + + Online On Home Node Only+ Fallover To Next Priority Node > + Never Fallback Figure 13-13 Add a Resource Group SMIT menu Adding resources to a resource group To add resources to a resource group, perform the following steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration Change/Show Resources and Attributes for a Resource Group. 3. Choose the resource group, which in this example is ds8kgmrg. 4. Complete the menu appropriately and press Enter. Figure 13-13 shows the configuration for this scenario. The output is displayed in the SMIT Command Status window. In this scenario, we only added a service IP label, the volume group, and the DS8000 Global Mirror Replicated Resources as shown in the streamlined clshowres command output in Example 13-21. Volume group: The volume group names must be listed in the same order as the DS8700 mirror group names in the resource group. 
Example 13-21 Resource group attributes and resources Resource Group Name Inter-site Management Policy Participating Nodes from Primary Site 392 IBM PowerHA SystemMirror 7.1 for AIX ds8kgmrg Prefer Primary Site jordan leeann Participating Nodes from Secondary Site Startup Policy Fallover Policy Fallback Policy Service IP Label Volume Groups GENXD Replicated Resources robert Online On Home Node Only Fallover To Next Priority Node Never Fallback serviceip_2 txvg + texasmg + DS8000 Global Mirror Replicated Resources field: In the SMIT menu for adding resources to the resource group, notice that the appropriate field is named DS8000 Global Mirror Replicated Resources. However, when viewing the menu by using the clshowres command (Example 13-21 on page 392), the field is called GENXD Replicated Resources. You can now synchronize the cluster, start the cluster, and begin testing it. 13.7 Failover testing This section takes you through basic failover testing scenarios with the DS8000 Global Mirror replicated resources locally within the site and across sites. You must carefully plan the testing of a site cluster failover because more time is required to manipulate the secondary target LUNs at the recovery site. Also when testing the asynchronous replication, because of the nature of the asynchronous replication, it can also impact the data. In these scenarios, redundancy tests, such as on IP networks that have only a single network, cannot be performed. Instead you must configure redundant IP or non-IP communication paths to avoid isolation of the sites. The loss of all the communication paths between sites leads to a partitioned state of the cluster. Such a loss also leads to data divergence between sites if the replication links are also unavailable. Another specific failure scenario is the loss of replication paths between the storage subsystems while the cluster is running on both sites. To avoid this type of loss, configure a redundant PPRC path or links for the replication. You must manually recover the status of the pairs after the storage links are operational again. Important: If the PPRC path or link between Global Mirror volumes breaks down, the PowerHA Enterprise Edition is unaware. The reason is that PowerHA does not process SNMP for volumes that use DS8700 Global Mirror technology for mirroring. In such a case, you must identify and correct the PPRC path failure. Depending upon some timing conditions, such an event can result in the corresponding Global Mirror session going into a fatal state. In this situation, you must manually stop and restart the corresponding Global Mirror session (by using the rmgmir and mkgmir DSCLI commands) or an equivalent DS8700 interface. This topic takes you through the following tests: Graceful site failover Rolling site failure Site re-integration Each test, other than the re-integration test, begins in the same initial state of the primary site hosting the ds8kgmrg resource group on the primary node as shown in Example 13-22 on page 394. Before each test, we start copying data from another file system to the replicated file systems. After each test, we verify that the service IP address is online and that new data Chapter 13. Disaster recovery using DS8700 Global Mirror 393 is in the file systems. We also had a script that inserted the current time and date, along with the local node name, into a file on each file system. 
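The data-writer script mentioned above is not listed in this chapter. The following is a minimal sketch of what such a script might look like: the /txro file system comes from the scenario, while the log file name and the ksh loop are assumptions for illustration only.

#!/usr/bin/ksh
# Hypothetical test writer: append the date, time, and local node name
# to a file in each replicated file system so that data arrival at the
# recovery site can be checked after every failover test.
for fs in /txro
do
    print "$(date) $(hostname)" >> $fs/failover_test.log
done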
Example 13-22 Beginning of the test cluster resource group states jordan# clRGinfo ----------------------------------------------------------------------------Group Name State Node ----------------------------------------------------------------------------ds8kgmrg ONLINE jordan@Texas OFFLINE leeann@Texas ONLINE SECONDARY robert@Romania After each test, we show the Global Mirror states. Example 13-23 shows the normal running production status of the Global Mirror pairs from each site. Example 13-23 Beginning states of the Global Mirror pairs *******************From node jordan at site Texas*************************** dscli> lssession 26 2E Date/Time: October 10, 2010 4:00:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== ============== 26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True Disable 2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True Disable dscli> lspprc 2600 2E00 Date/Time: October 10, 2010 4:00:43 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ================================================================================================== 2600:2C00 Copy Pending Global Copy 26 60 Disabled True 2E00:2800 Copy Pending Global Copy 2E 60 Disabled True *******************From remote node robert at site Romania*************************** dscli> lssession 28 2c Date/Time: October 10, 2010 3:54:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== ====== 28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable 2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable dscli> lspprc 2800 2c00 Date/Time: October 10, 2010 3:55:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ========================================================================================================= 2600:2C00 Target Copy Pending Global Copy 26 unknown Disabled Invalid 2E00:2800 Target Copy Pending Global Copy 2E unknown Disabled Invalid 394 IBM PowerHA SystemMirror 7.1 for AIX 13.7.1 Graceful site failover Performing a controlled move of a production environment across sites is a basic test to ensure that the remote site can bring the production environment online. This test is done only during initial implementation testing or during a planned production outage of the site. In this test, we perform the graceful failover operation between sites by performing a resource group move. In a true maintenance scenario, you might most likely perform a graceful site failover by stopping the cluster on the local standby node first. Then you stop the cluster on the production node by using Move Resource Group. Moving the resource group to another site: In this scenario, because we only have one node at the Romania site, we use the option to move the resource group to another site. If multiple remote nodes are members of the resource, use the option to move the resource group to another node instead. 
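While the move is being processed, you can watch the resource group change state from another session. This is a simple sketch using the clRGinfo utility shown in the examples in this chapter; the 10-second interval is arbitrary.

# Watch the resource group states while the move runs (Ctrl-C to stop)
while true
do
    clRGinfo
    sleep 10
done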
During this move, the following operations are performed: Release the primary online instance of ds8kgmrg at the Texas site. This operation entails the following tasks: – – – – Executes the application server stop. Unmounts the file systems. Varies off the volume group. Removes the service IP address. Release the secondary online instance of ds8kgmrg at the Romania site. Acquire ds8kgmrg in the secondary online state at the Texas site. Acquire ds8kgmrg in the online primary state at the Romania site. To perform the resource group move by using SMIT, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path System Management (C-SPOC) Resource Groups and Applications Move a Resource Group to Another Node / Site Move Resource Groups to Another Site. Chapter 13. Disaster recovery using DS8700 Global Mirror 395 3. Select the ONLINE instance of ds8kgmrg to be moved as shown in Figure 13-14. Move a Resource Group to Another Node / Site Move cursor to desired item and press Enter. Move Resource Groups to Another Node Move +--------------------------------------------------------------------------+ | Select a Resource Group | | | | Move cursor to desired item and press Enter. Use arrow keys to scroll. | | | | # | | # Resource Group State Node(s) / Site | | # | | ds8kgmrg ONLINE jordan / Texas | | ds8kgmrg ONLINE SECONDARY robert / Romani | | | | # | | # Resource groups in node or site collocation configuration: | | # Resource Group(s) State Node / Site | | # | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | F1=Help| /=Find n=Find Next | F9=Shel+--------------------------------------------------------------------------+ Figure 13-14 Selecting a resource group 4. Select the Romania site from the next menu as shown in Figure 13-15. +--------------------------------------------------------------------------+ | Select a Destination Site | | | | Move cursor to desired item and press Enter. | | | | # *Denotes Originally Configured Primary Site | | Romania | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | | /=Find n=Find Next | +--------------------------------------------------------------------------+ Figure 13-15 Selecting a site for a resource group move 5. Verify the information in the final menu and Press Enter. 396 IBM PowerHA SystemMirror 7.1 for AIX Upon completion of the move, ds8kgmrg is online on the node robert as shown Example 13-24. Attention: During our testing, a problem was encountered. After performing the first resource group move between sites, we are unable to move it back due to the pick list for destination site is empty. We can move it back by node. Later in our testing, the by-site option started working. However, it moved the resource group to the standby node at the primary site instead of the original primary node. If you encounter similar problems, contact IBM support. Example 13-24 Resource group status after the site move to Romania ----------------------------------------------------------------------------Group Name State Node ----------------------------------------------------------------------------ds8kgmrg ONLINE SECONDARY jordan@Texas OFFLINE leeann@Texas ONLINE robert@Romania 6. Repeat the resource group move to move it back to its original primary site, Texas, and node, jordan, to return to the original starting state. However, instead of using the option to move it another site, use the option to move it to another node. 
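After a move completes in either direction, a quick check on the node that now hosts the resource group confirms that the group is online, the volume group is varied on, and the test data followed it. This is a sketch only; the log file name refers to the hypothetical test script shown earlier in 13.7.

# Confirm the resource group location and that the replicated data followed it
clRGinfo
lsvg -o | grep txvg                 # txvg should be varied on here
df -g /txro                         # the replicated file system is mounted
tail -3 /txro/failover_test.log     # hypothetical log written by the test script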
Example 13-25 shows that the Global Mirror statuses are now swapped, and the local site is showing the LUNs now as the target volumes. Example 13-25 Global Mirror status after the resource group move *******************From node jordan at site Texas*************************** dscli> lssession 26 2E Date/Time: October 10, 2010 4:04:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== ====== 26 03 Normal 2600 Active Primary Simplex Secondary Copy Pending True Disable 2E 03 Normal 2E00 Active Primary Simplex Secondary Copy Pending True Disable dscli> lspprc 2600 2E00 Date/Time: October 10, 2010 4:05:26 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ========================================================================================================= 2800:2E00 Target Copy Pending Global Copy 28 unknown Disabled Invalid 2C00:2600 Target Copy Pending Global Copy 2C unknown Disabled Invalid *******************From remote node robert at site Romania*************************** dscli> lssession 28 2C Date/Time: October 10, 2010 3:59:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== ============== 28 03 CG In Progress 2800 Active Primary Copy Pending Secondary Simplex True Disable 2C 03 CG In Progress 2C00 Active Primary Copy Pending Secondary Simplex True Disable Chapter 13. Disaster recovery using DS8700 Global Mirror 397 dscli> lspprc 2800 2C00 Date/Time: October 10, 2010 3:59:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ================================================================================================== 2800:2E00 Copy Pending Global Copy 28 60 Disabled True 2C00:2600 Copy Pending Global Copy 2C 60 Disabled True 13.7.2 Rolling site failure This scenario entails performing a rolling site failure of the Texas site by using the following steps: 1. 2. 3. 4. 5. 6. Halt the primary production node jordan at the Texas site. Verify that the resource group ds8kgmrg is acquired locally by the node leeann. Verify that the Global Mirror pairs are in the same status as before the system failure. Halt the node leeann to produce a site down. Verify that the resource group ds8kgmrg is acquired remotely by the robert node. Verify that the Global Mirror pair states are changed. Begin with all three nodes active in the cluster and the resource group online on the primary node as shown in Example 13-22 on page 394. On the node jordan, we run the reboot -q command. The node leeann acquires the ds8kgmrg resource group as shown in Example 13-26. Example 13-26 Local node failover within the site Texas root@leeann: clRGinfo -----------------------------------------------------------------------------Group Name State Node ----------------------------------------------------------------------------ds8kgmrg OFFLINE jordan@Texas ONLINE leeann@Texas ONLINE SECONDARY robert@Romania Example 13-27 shows that the statuses are the same as when we started. 
Example 13-27 Global Mirror pair status after a local failover *******************From node leeann at site Texas*************************** dscli> lssession 26 2E Date/Time: October 10, 2010 4:10:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== ============== 26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True Disable 2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True Disable dscli> lspprc 2600 2E00 Date/Time: October 10, 2010 4:10:43 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ================================================================================================== 2600:2C00 Copy Pending Global Copy 26 60 Disabled True 2E00:2800 Copy Pending Global Copy 2E 60 Disabled True 398 IBM PowerHA SystemMirror 7.1 for AIX *******************From remote node robert at site Romania*************************** dscli> lssession 28 2c Date/Time: October 10, 2010 4:04:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading =========================================================================================================== 28 03 Normal 2800 Join Pending Primary Simplex Secondary Copy Pending True Disable 2C 03 Normal 2C00 Join Pending Primary Simplex Secondary Copy Pending True Disable dscli> lspprc 2800 2c00 Date/Time: October 10, 2010 4:05:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980 ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status ========================================================================================================= 2600:2C00 Target Copy Pending Global Copy 26 unknown Disabled Invalid 2E00:2800 Target Copy Pending Global Copy 2E unknown Disabled Invalid Upon the cluster stabilization, we run the reboot -q command on the leeann node invoking a site_down event. The robert node at the Romania site acquires the ds8kgmrg resource group as shown in Example 13-28. Example 13-28 Hard failover between sites root@robert: clRGinfo ----------------------------------------------------------------------------Group Name State Node ----------------------------------------------------------------------------ds8kgmrg OFFLINE jordan@Texas OFFLINE leeann@Texas ONLINE robert@Romania You can also see that the replicated pairs are now in the suspended state at the remote site as shown in Example 13-29. 
You can also see that the replicated pairs are now in the suspended state at the remote site as shown in Example 13-29.

Example 13-29 Global Mirror pair status after site failover

*******************From remote node robert at site Romania***************************
dscli> lssession 28 2c
Date/Time: October 10, 2010 4:17:28 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading
28 03 Normal 2800 Join Pending Primary Suspended Secondary Simplex False Disable
2C 03 Normal 2C00 Join Pending Primary Suspended Secondary Simplex False Disable

dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:17:55 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2800:2E00 Suspended Host Source Global Copy 28 60 Disabled False
2C00:2600 Suspended Host Source Global Copy 2C 60 Disabled False

Important: Although the testing resulted in a site_down event, we never lost access to the primary storage subsystem. PowerHA does not check storage connectivity back to the primary site during this event.

Before moving back to the primary site, re-establish the replicated pairs and get them all back in sync. If you replace the storage, you might also have to change the storage agent, storage subsystem, and mirror groups to ensure that the new configuration is correct for the cluster.

13.7.3 Site re-integration

Before bringing the primary site node back into the cluster, the Global Mirror pairs must be placed back in sync by using the following steps:

Tip: These steps do not have to be followed exactly "as is" because you can accomplish the same results by using various methods.

1. Verify that the Global Mirror statuses at the primary site are suspended.
2. Fail back PPRC from the secondary site.
3. Verify that the Global Mirror status at the primary site shows the target status.
4. Verify that out-of-sync tracks are 0.
5. Stop the cluster to ensure that the volume group I/O is stopped.
6. Fail over the PPRC on the primary site.
7. Fail back the PPRC on the primary site.
8. Start the cluster.

Failing back the PPRC pairs to the secondary site

To fail back the PPRC pairs to the secondary site, follow these steps:
1. Verify the current state of the Global Mirror pairs at the primary site from the jordan node. The pairs are suspended as shown in Example 13-30.

Example 13-30 Suspended pair status in Global Mirror on the primary site after node restart

*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:27:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2600:2C00 Suspended Host Source Global Copy 26 60 Disabled True
2E00:2800 Suspended Host Source Global Copy 2E 60 Disabled True

2. On the remote node robert, fail back the PPRC pairs as shown in Example 13-31.
Example 13-31 Failing back PPRC pairs at the remote site

*******************From node robert at site Romania***************************
dscli> failbackpprc -type gcp 2C00:2600 2800:2E00
Date/Time: October 10, 2010 4:22:09 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2C00:2600 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2800:2E00 successfully failed back.

3. After executing the failback, check the status of the pairs again from the primary site to ensure that they are now shown as Target (Example 13-32).

Example 13-32 Verifying that the primary site LUNs are now target LUNs

*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:44:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2800:2E00 Target Copy Pending Global Copy 28 unknown Disabled Invalid
2C00:2600 Target Copy Pending Global Copy 2C unknown Disabled Invalid

4. Monitor the status of replication at the remote site by watching the Out of Sync Tracks field with the lspprc -l command. After the tracks reach 0, as shown in Example 13-33, the pairs are in sync. Then you can stop the remote site in preparation to move production back to the primary site.

Example 13-33 Verifying that the Global Mirror pairs are back in sync

dscli> lspprc -l 2800 2c00
Date/Time: October 10, 2010 4:22:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date Suspended SourceLSS
2800:2E00 Copy Pending Global Copy 0 Disabled Disabled Invalid 28
2C00:2600 Copy Pending Global Copy 0 Disabled Disabled Invalid 2C
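Rather than rerunning the command by hand, you can poll until the Out of Sync Tracks column reaches 0; a minimal sketch, assuming a dscli profile has been set up for the remote storage unit (the profile name and the 30-second interval are arbitrary assumptions):

   # poll the out-of-sync track count on the remote unit every 30 seconds (sketch)
   while true; do
       dscli -cfg /opt/ibm/dscli/profile/romania.profile lspprc -l 2800 2c00
       sleep 30
   done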
Failing over the PPRC pairs back to the primary site

To fail over the PPRC pairs back to the primary site, follow these steps:
1. Stop the cluster on node robert by using the smitty clstop command to bring the resource group down.
2. After the resources are offline, continue by failing over the PPRC on the primary site jordan node as shown in Example 13-34.

Example 13-34 Failover PPRC pairs at local primary site

*******************From node jordan at site Texas***************************
dscli> failoverpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:45:16 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2600:2C00 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2E00:2800 successfully reversed.

3. Again verify that the status is suspended on the primary site and that the remote site shows the copy state, as shown in Example 13-35.

Example 13-35 Global Mirror pairs suspended on the primary site

*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2E00
Date/Time: October 10, 2010 4:45:51 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2600:2C00 Suspended Host Source Global Copy 26 60 Disabled True
2E00:2800 Suspended Host Source Global Copy 2E 60 Disabled True

******************From node robert at site Romania***************************
dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:39:27 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2800:2E00 Copy Pending Global Copy 28 60 Disabled True
2C00:2600 Copy Pending Global Copy 2C 60 Disabled True

Failing back the PPRC pairs to the primary site

You can now complete the switchback to the primary site by performing a failback of the Global Mirror pairs to the primary site with the failbackpprc command as shown in Example 13-36.

Example 13-36 Failing back the PPRC pairs on the primary site

*******************From node jordan at site Texas***************************
dscli> failbackpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:46:49 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back.

Verify the status of the pairs at each site as shown in Example 13-37.

Example 13-37 Global Mirror pairs failed back to the primary site

*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:47:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2600:2C00 Copy Pending Global Copy 26 60 Disabled True
2E00:2800 Copy Pending Global Copy 2E 60 Disabled True

******************From node robert at site Romania***************************
dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:40:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2600:2C00 Target Copy Pending Global Copy 26 unknown Disabled Invalid
2E00:2800 Target Copy Pending Global Copy 2E unknown Disabled Invalid

Starting the cluster

To start the cluster, follow these steps:
1. Start all nodes in the cluster by using the smitty clstart command as shown in Figure 13-16.

   Start Cluster Services
   Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                                          [Entry Fields]
   * Start now, on system restart or both                  now
     Start Cluster Services on these nodes                 [jordan,leeann,robert]
   * Manage Resource Groups                                Automatically
     BROADCAST message at startup?                         true
     Startup Cluster Information Daemon?                   true
     Ignore verification errors?                           false
     Automatically correct errors found during             Interactively
     cluster start?

Figure 13-16 Restarting a cluster after a site failure
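Before checking the resource group placement, you can confirm from the command line that the cluster manager on each node has stabilized; a minimal sketch (the state string comes from the cluster manager subsystem and can vary slightly by level):

   # on each node, confirm the cluster manager has stabilized
   lssrc -ls clstrmgrES | grep -i "current state"      # expect ST_STABLE
   /usr/es/sbin/cluster/utilities/clRGinfo             # resource group placement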
Upon startup of the primary node jordan, the resource group is automatically started on jordan and is back at the original starting point as shown in Example 13-38.

Example 13-38 Resource group status after restart

-----------------------------------------------------------------------------
Group Name     State              Node
-----------------------------------------------------------------------------
ds8kgmrg       ONLINE             jordan@Texas
               OFFLINE            leeann@Texas
               ONLINE SECONDARY   robert@Romania

2. Verify the pair and session status on each site as shown in Example 13-39.

Example 13-39 Global Mirror pairs back to normal

*******************From node jordan at site Texas***************************
dscli> lssession 26 2e
Date/Time: October 10, 2010 5:02:11 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading
26 03 CG In Progress 2600 Active Primary Copy Pending Secondary Simplex True Disable
2E 03 CG In Progress 2E00 Active Primary Copy Pending Secondary Simplex True Disable

dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 5:02:26 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2600:2C00 Copy Pending Global Copy 26 60 Disabled True
2E00:2800 Copy Pending Global Copy 2E 60 Disabled True

******************From node robert at site Romania***************************
dscli> lssession 28 2C
Date/Time: October 10, 2010 4:56:11 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus SecondaryStatus FirstPassComplete AllowCascading
28 03 Normal 2800 Active Primary Simplex Secondary Copy Pending True Disable
2C 03 Normal 2C00 Active Primary Simplex Secondary Copy Pending True Disable

dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:56:30 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID State Reason Type SourceLSS Timeout (secs) Critical Mode First Pass Status
2600:2C00 Target Copy Pending Global Copy 26 unknown Disabled Invalid
2E00:2800 Target Copy Pending Global Copy 2E unknown Disabled Invalid

13.8 LVM administration of DS8000 Global Mirror replicated resources

This section provides the common scenarios for adding additional storage to an existing Global Mirror replicated environment. These scenarios work primarily with the Texas site and the ds8kgmrg resource group. You perform the following tasks:
- Adding a new Global Mirror pair to an existing volume group
- Adding a Global Mirror pair into a new volume group

Dynamically expanding a volume: This topic does not provide information about dynamically expanding a volume because this option is not supported.

13.8.1 Adding a new Global Mirror pair to an existing volume group

To add a new Global Mirror pair to an existing volume group, follow these steps:
1. Assign a new LUN to each site, add the FlashCopy devices, and add the new pair into the existing session as explained in 13.4.3, "Configuring the Global Mirror relationships" on page 377. Table 13-2 summarizes the LUNs that are used from each site.

Table 13-2 Summary of the LUNs used on each site
   Texas:    AIX disk hdisk11    LSS/VOL ID 2605
   Romania:  AIX disk hdisk10    LSS/VOL ID 2C06

2. Define the new LUNs:
   a. Run the cfgmgr command on the primary node jordan.
   b. Assign the PVID on the node jordan: chdev -l hdisk11 -a pv=yes
   c. Configure the disk and PVID on the local node leeann by using the cfgmgr command.
   d. Verify that the PVID is displayed by running the lspv command.
   e. Pause the PPRC on the primary site.
   f. Fail over the PPRC to the secondary site.
   g. Configure the disk and PVID on the remote node robert with the cfgmgr command.
   h. Verify that the PVID is displayed by running the lspv command.
   i. Fail back the PPRC to the primary site.

3. Add the new disk into the volume group by using C-SPOC as follows:

   Important: C-SPOC cannot perform certain LVM operations on nodes at the remote site (the nodes that contain the target volumes), specifically operations that require nodes at the target site to read from the target volumes. These operations cause an error message in C-SPOC and include functions such as changing the file system size, changing the mount point, and adding LVM mirrors. However, nodes on the same site as the source volumes can successfully perform these tasks. The changes can be propagated later to the other site by using a lazy update. For C-SPOC to work for all other LVM operations, perform all C-SPOC operations with the Global Mirror volume pairs in synchronized or consistent states and with the cluster ACTIVE on all nodes.

   a. From the command line, type the smitty cl_admin command.
   b. In SMIT, select the path System Management (C-SPOC) → Storage → Volume Groups → Add a Volume to a Volume Group.
   c. Select the txvg volume group from the pop-up menu.
   d. Select the disk or disks by PVID as shown in Figure 13-17.

      Set Characteristics of a Volume Group
        Add a Volume to a Volume Group
        Change/Show characteristics of a Volume Group
        Remove a Volume from a Volume Group
        Enable/Disable a Volume Group for Cross-Site LVM Mirroring Verification
      Physical Volume Names (move cursor to desired item and press Enter)
        000a624a987825c8 ( hdisk10 on node robert )
        000a624a987825c8 ( hdisk11 on nodes jordan,leeann )

Figure 13-17 Disk selection to add to the volume group

   e. Verify the menu information, as shown in Figure 13-18, and press Enter.

      Add a Volume to a Volume Group
                                          [Entry Fields]
        VOLUME GROUP name                  txvg
        Resource Group Name                ds8kgmrg
        Node List                          jordan,leeann,robert
        Reference node                     robert
        VOLUME names                       hdisk10

Figure 13-18 Add a Volume C-SPOC SMIT menu
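If you prefer the command line to SMIT, the same operation can be done with the C-SPOC CLI wrappers that are listed later in Example 13-43; a sketch, assuming cli_extendvg takes the same arguments as extendvg:

   # add the new disk to txvg cluster-wide from node jordan (arguments assumed to mirror extendvg)
   /usr/es/sbin/cluster/cspoc/cli_extendvg txvg hdisk11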
Upon completion of the C-SPOC operation, the local nodes have been updated, but the remote node has not, as shown in Example 13-40. This node was not updated because the target volumes are not readable until the relationship is swapped. You receive an error message from C-SPOC, as shown in the note after Example 13-40. However, the lazy update procedure at the time of failover pulls in the remaining volume group information.

Example 13-40 New disk added to volume group on all nodes

root@jordan: lspv |grep txvg
hdisk6     000a625afe2a4958    txvg
hdisk10    000a624a833e440f    txvg
hdisk11    000a624a987825c8    txvg

root@leeann: lspv |grep txvg
hdisk6     000a625afe2a4958    txvg
hdisk10    000a624a833e440f    txvg
hdisk11    000a624a987825c8    txvg

root@robert: lspv
hdisk2     000a624a833e440f    txvg
hdisk6     000a625afe2a4958    txvg
hdisk10    000a624a987825c8    none

Attention: When using C-SPOC to modify a volume group containing a Global Mirror replicated resource, you can expect to see the following error message:
cl_extendvg: Error executing clupdatevg txvg 000a624a833e440f on node robert

You do not need to synchronize the cluster because all of these changes are made to an existing volume group. However, consider running a verification.

Adding a new logical volume

Again you use C-SPOC to add a new logical volume. As noted earlier, this process updates the local nodes within the site. For the remote site, when a failover occurs, the lazy update process updates the volume group information as needed. This process also adds a bit of extra time to the failover.

To add a new logical volume, follow these steps:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC) → Storage → Logical Volumes → Add a Logical Volume.
3. Select the txvg volume group from the pop-up menu.
4. Select the newly added disk hdisk11 as shown in Figure 13-19.

   Logical Volumes
   Physical Volume Names (one or more items can be selected with F7; press Enter after making all selections)
     Auto-select
     jordan hdisk6
     jordan hdisk10
     jordan hdisk11

Figure 13-19 Choose disk for new logical volume creation

5. Complete the information in the final menu (Figure 13-20), and press Enter. We added a new logical volume, named pattilv, which consists of 100 logical partitions (LPs), and selected raw for the type. We left all other values at their defaults.

   Add a Logical Volume
                                                      [Entry Fields]
     Resource Group Name                               ds8kgmrg
     VOLUME GROUP name                                 txvg
     Node List                                         jordan,leeann,robert
     Reference node                                    jordan
   * Number of LOGICAL PARTITIONS                      [100]
     PHYSICAL VOLUME names                             hdisk11
     Logical volume NAME                               [pattilv]
     Logical volume TYPE                               [raw]
     POSITION on physical volume                       outer_middle
     RANGE of physical volumes                         minimum
     MAXIMUM NUMBER of PHYSICAL VOLUMES for allocation []
     Number of COPIES of each logical partition        1
   [MORE...15]

Figure 13-20 New logical volume C-SPOC SMIT menu

6. Upon completion of the C-SPOC operation, verify that the new logical volume is created locally on node jordan as shown in Example 13-41.
Example 13-41 Newly created logical volume

root@jordan: lsvg -l txvg
txvg:
LV NAME    TYPE      LPs   PPs   PVs   LV STATE        MOUNT POINT
txlv       jfs2      250   250   3     open/syncd      /txro
txloglv    jfs2log   1     1     1     open/syncd      N/A
pattilv    raw       100   100   1     closed/syncd    N/A

Similar to when you create the volume group, you see an error message (Figure 13-21) about being unable to update the remote node.

   COMMAND STATUS
   Command: OK          stdout: yes          stderr: no
   Before command completion, additional instructions may appear below.
   jordan: pattilv
   cl_mklv: Error executing clupdatevg txvg 000a625afe2a4958 on node robert

Figure 13-21 C-SPOC normal error upon logical volume creation

Increasing the size of an existing file system

Again you use C-SPOC to perform this operation. As noted previously, this process updates the local nodes within the site. For the remote site, when a failover occurs, the lazy update process updates the volume group information as needed. This process also adds a bit of extra time to the failover.

To increase the size of an existing file system, follow these steps:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC) → Storage → File Systems → Change / Show Characteristics of a File System.
3. Select the txro file system from the pop-up menu.
4. Complete the information in the final menu, and press Enter. In the example in Figure 13-22, notice that we change the size from 1024 MB to 1250 MB.

   Change/Show Characteristics of an Enhanced Journaled File System
                                                  [Entry Fields]
     Volume group name                             txvg
     Resource Group Name                           ds8kgmrg
   * Node Names                                    robert,leeann,jordan
   * File system name                              /txro
     NEW mount point                               [/txro]
     SIZE of file system
       Unit Size                                   Megabytes
       Number of Units                             [1250]
     Mount GROUP                                   []
     Mount AUTOMATICALLY at system restart?        no
     PERMISSIONS                                   read/write
     Mount OPTIONS                                 []
   [MORE...7]

Figure 13-22 Changing the file system size on the final C-SPOC menu

5. Upon completion of the C-SPOC operation, verify that the file system size on node jordan has increased from 250 LPs, as shown in Example 13-41 on page 409, to 313 LPs, as shown in Example 13-42.

Example 13-42 Newly increased file system size

root@jordan: lsvg -l txvg
txvg:
LV NAME    TYPE      LPs   PPs   PVs   LV STATE        MOUNT POINT
txlv       jfs2      313   313   3     open/syncd      /txro
txloglv    jfs2log   1     1     1     open/syncd      N/A
pattilv    raw       100   100   1     closed/syncd    N/A

A cluster synchronization is not required because technically the resources have not changed. All of the changes were made to an existing volume group that is already a resource in the resource group.

Testing the fallover after making the LVM changes

Because you do not know whether the cluster is going to work when you need it, repeat the steps from 13.7.2, "Rolling site failure" on page 398. The new logical volume pattilv and the additional space on /txro show up on each node. However, a noticeable difference is on the site failover, when the lazy update is performed to update the volume group changes.
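After the test failover completes, a quick way to confirm that the lazy update propagated the LVM changes is to inspect the volume group on the remote node; a short sketch:

   # on robert, after the site failover brings ds8kgmrg online there
   lsvg -l txvg       # pattilv should now be listed
   lsfs /txro         # the file system should reflect the increased size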
13.8.2 Adding a Global Mirror pair into a new volume group

The steps to add a new volume group begin the same as the steps in 13.5, "Configuring AIX volume groups" on page 381. However, for completeness, this section provides an overview of the steps again and then provides details about the new LUNs to be used. In this scenario, we re-use the LUNs from the previous section. We removed them from the volume group and removed the disks from all nodes except the main primary node jordan. In our process, we cleared the PVID and then assigned a new PVID for a clean start. Table 13-3 provides a summary of the LUNs that we implemented in each site.

Table 13-3 Summary of the LUNs implemented in each site
   Texas:    AIX disk hdisk11    LSS/VOL ID 2605
   Romania:  AIX disk hdisk10    LSS/VOL ID 2C06

Now continue with the following steps, which are the same as those for defining new LUNs:
1. Run the cfgmgr command on the primary node jordan.
2. Assign the PVID on the node jordan: chdev -l hdisk11 -a pv=yes
3. Configure the disk and PVID on the local node leeann by using the cfgmgr command.
4. Verify that the PVID shows up by using the lspv command.
5. Pause the PPRC on the primary site.
6. Fail over the PPRC to the secondary site.
7. Fail back the PPRC to the secondary site.
8. Configure the disk and PVID on the remote node robert by using the cfgmgr command.
9. Verify that the PVID shows up by using the lspv command.
10. Pause the PPRC on the secondary site.
11. Fail over the PPRC to the primary site.
12. Fail back the PPRC to the primary site.

The main difference between adding a new volume group and extending an existing one is that, when adding a new volume group, you must swap the pairs twice; when extending an existing volume group, you can get away with swapping only once. Adding a new volume group is similar to the original setup, where we created all LVM components on the primary site, swapped the PPRC pairs to the remote site to import the volume group, and then swapped them back. You can avoid performing two swaps, as we showed, by not including the third node when creating the volume group. Then you can swap the pairs, run cfgmgr for the new disk with the PVID, import the volume group, and swap the pairs back.

Creating a volume group

Create a volume group by using C-SPOC:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC) → Storage → Volume Groups → Create a Volume Group.
3. Select the specific nodes. In this case, we chose all three nodes as shown in Figure 13-23.

   Volume Groups
   Node Names (one or more items can be selected with F7; press Enter after making all selections)
     > jordan
     > leeann
     > robert

Figure 13-23 Adding a volume group node pick list

4. Select the disk or disks by PVID as shown in Figure 13-24.

   Volume Groups
   Physical Volume Names (one or more items can be selected with F7; press Enter after making all selections)
     000a624a9bb74ac3 ( hdisk10 on node robert )
     000a624a9bb74ac3 ( hdisk11 on nodes jordan,leeann )

Figure 13-24 Selecting the disk or disks for the new volume group pick list
5. Select the volume group type. In this scenario, we select Scalable as shown in Figure 13-25.

   Volume Group Type
     Legacy
     Original
     Big
     Scalable

Figure 13-25 Choosing the volume group type for the new volume group pick list

6. Select the proper resource group. We select ds8kgmrg as shown in Figure 13-26.

   Create a Scalable Volume Group
                                                      [Entry Fields]
     Node Names                                        jordan,leeann,robert
     Resource Group Name                               [ds8kgmrg]
     PVID                                              000a624a9bb74ac3
     VOLUME GROUP name                                 [princessvg]
     Physical partition SIZE in megabytes              4
     Volume group MAJOR NUMBER                         [51]
     Enable Cross-Site LVM Mirroring Verification      false
     Enable Fast Disk Takeover or Concurrent Access    Fast Disk Takeover or>
     Volume Group Type                                 Scalable
     Maximum Physical Partitions in units of 1024      32
     Maximum Number of Logical Volumes                 256

Figure 13-26 Create a Scalable Volume Group (final) menu

7. Select a volume group name. We select princessvg. Then press Enter.

Instead of using C-SPOC, you can perform the steps manually and then import the volume group on each node as needed. However, remember to add the volume group into the resource group after creating it. With C-SPOC, you can automatically add it to the resource group while you are creating the volume group.

You can also use the C-SPOC CLI commands (Example 13-43). These commands are in the /usr/es/sbin/cluster/cspoc directory, and all begin with the cli_ prefix. Similar to the SMIT menus, their operation output is also saved in the cspoc.log file.

Example 13-43 C-SPOC CLI commands

root@jordan: ls cli_*
cli_assign_pvids   cli_chfs        cli_chlv        cli_chvg
cli_crfs           cli_crlvfs      cli_extendlv    cli_extendvg
cli_importvg       cli_mirrorvg    cli_mklv        cli_mklvcopy
cli_mkvg           cli_on_cluster  cli_on_node     cli_reducevg
cli_replacepv      cli_rmfs        cli_rmlv        cli_rmlvcopy
cli_syncvg         cli_unmirrorvg  cli_updatevg
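As a usage illustration of these wrappers (a sketch; the flags are assumed to mirror the corresponding AIX LVM commands, and the cspoc.log path shown is an assumption):

   cd /usr/es/sbin/cluster/cspoc
   ./cli_mklv -y princesslv -t raw princessvg 38   # assumed to mirror mklv flags
   tail /var/hacmp/log/cspoc.log                   # review the C-SPOC operation output (path assumed)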
Upon completion of the C-SPOC operation, the local nodes are updated, but the remote node is not, as shown in Example 13-44. The remote node is not updated because the target volumes are not readable until the relationship is swapped. You see an error message from C-SPOC as shown in the note following Example 13-44. After you create all LVM structures, you swap the pairs back to the remote node and import the new volume group and logical volume.

Example 13-44 New disk added to volume group on all nodes

root@jordan: lspv |grep princessvg
hdisk11    000a624a9bb74ac3    princessvg

root@leeann: lspv |grep princessvg
hdisk11    000a624a9bb74ac3    princessvg

root@robert: lspv |grep princessvg

Attention: When using C-SPOC to add a new volume group that contains a Global Mirror replicated resource, you might see the following error message:
cl_importvg: Error executing climportvg 000a624a9bb74ac3 on node robert -V 51 -c -y princessvg -Q
This message is normal when remote nodes are selected. If you omit the remote nodes, you do not see the error message. Omitting them is acceptable because you manually import the volume group on the remote site anyway.

When creating the volume group, it is usually added automatically to the resource group, as shown in Example 13-45 on page 416. However, with the error message indicated in the previous attention box, it might not be automatically added. Therefore, double-check that the volume group was added into the resource group before continuing. Otherwise, no further resource group changes are needed. The new LUN pairs are added to the same storage subsystems and the same session (3) that is already defined in the mirror group texasmg.

Example 13-45 New volume group added to existing resource group

Resource Group Name                          ds8kgmrg
Inter-site Management Policy                 Prefer Primary Site
Participating Nodes from Primary Site        jordan leeann
Participating Nodes from Secondary Site      robert
Startup Policy                               Online On Home Node Only
Fallover Policy                              Fallover To Next Priority Node
Fallback Policy                              Never Fallback
Service IP Label                             serviceip_2
Volume Groups                                txvg princessvg
GENXD Replicated Resources                   texasmg

Adding a new logical volume on the new volume group

You repeat the steps in "Adding a new logical volume" on page 407 to create a new logical volume, named princesslv, on the newly created volume group, princessvg, as shown in Example 13-46.

Example 13-46 New logical volume on the newly added volume group

root@jordan: lsvg -l princessvg
princessvg:
LV NAME      TYPE   LPs   PPs   PVs   LV STATE        MOUNT POINT
princesslv   raw    38    38    1     closed/syncd    N/A

Importing the new volume group to the remote site

To import the volume group, follow the steps in 13.5.2, "Importing the volume groups in the remote site" on page 383. As a review, we perform the following steps:
1. Vary off the volume group on the local site.
2. Pause the PPRC pairs on the local site.
3. Fail over the PPRC pairs on the remote site.
4. Fail back the PPRC pairs on the remote site.
5. Import the volume group.
6. Vary off the volume group on the remote site.
7. Pause the PPRC pairs on the remote site.
8. Fail over the PPRC pairs on the local site.
9. Fail back the PPRC pairs on the local site.

Synchronizing and verifying the cluster configuration

You now synchronize the resource group change to include the new volume group. However, first run a verification only to check for errors. If you find errors, you must fix them manually because they are not automatically fixed in a running environment. Then synchronize and verify:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Verification and Synchronization.
3. Select the options as shown in Figure 13-27.
   HACMP Verification and Synchronization (Active Cluster Nodes Exist)
   Type or select values in entry fields. Press Enter AFTER making all desired changes.
                                [Entry Fields]
   * Verify changes only?        [No]
   * Logging                     [Standard]

Figure 13-27 Extended Verification and Synchronization SMIT menu

4. Verify that the information is correct, and press Enter. Upon completion, the cluster configuration is synchronized and can now be tested.

Testing the failover after adding a new volume group

Because you do not know whether the cluster is going to work when needed, repeat the steps from 13.7.2, "Rolling site failure" on page 398. The new volume group princessvg and logical volume princesslv show up on each node.

Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator

This chapter explains how to configure disaster recovery based on IBM PowerHA SystemMirror for AIX Enterprise Edition using Hitachi TrueCopy/Hitachi Universal Replicator (HUR) replication services. This support is added in version 6.1 with service pack 3 (SP3).

This chapter includes the following topics:
- Planning for TrueCopy/HUR management
- Overview of TrueCopy/HUR management
- Scenario description
- Configuring the TrueCopy/HUR resources
- Failover testing
- LVM administration of TrueCopy/HUR replicated pairs

14.1 Planning for TrueCopy/HUR management

Proper planning is crucial to the success of any implementation. Plan the storage deployment and replication necessary for your environment. This process is related to the applications and middleware that are being deployed in the environment, which can eventually be managed by PowerHA SystemMirror Enterprise Edition. This topic lightly covers site, network, storage area network (SAN), and storage planning, which are all key factors. However, the primary focus of this topic is the software prerequisites and support considerations.

14.1.1 Software prerequisites

The following software is required:
- One of the following AIX levels or later:
  – AIX 5.3 TL9 and RSCT 2.4.12.0
  – AIX 6.1 TL2 SP3 and RSCT 2.5.4.0
- Multipathing software:
  – AIX MPIO
  – Hitachi Dynamic Link Manager (HDLM)
- PowerHA 6.1 Enterprise Edition with SP3
  The following additional file sets are included in SP3, must be installed separately, and require acceptance of the license during the installation:
  – cluster.es.tc
    6.1.0.0 ES HACMP - Hitachi support - Runtime Commands
    6.1.0.0 ES HACMP - Hitachi support Commands
  – cluster.msg.en_US.tc (optional)
    6.1.0.0 HACMP Hitachi support Messages - U.S. English
    6.1.0.0 HACMP Hitachi support Messages - U.S. English IBM-850
    6.1.0.0 HACMP Hitachi support Messages - Japanese
    6.1.0.0 HACMP Hitachi support Messages - Japanese IBM-eucJP
- Hitachi Command Control Interface (CCI) Version 01-23-03/06 or later
- USPV Microcode Level 60-06-05/00 or later
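Before going further, it is worth confirming on every node that the required filesets and CCI level are actually in place; a quick sketch using standard AIX commands:

   oslevel -s                                    # AIX level and technology level
   lslpp -l "cluster.es.tc*" "cluster.msg*tc"    # PowerHA Hitachi support filesets
   /HORCM/usr/bin/raidqry -h                     # reports the installed CCI (RAID Manager) version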
14.1.2 Minimum connectivity requirements for TrueCopy/HUR

For TrueCopy/HUR connectivity, you must have the following minimum requirements in place:
- Ensure connectivity from the local Universal Storage Platform VM (USP VM) to the AIX host ports. The external storage ports on the local USP VMs (Data Center 1 and Data Center 2) are zoned and cabled to their corresponding existing storage systems.
- Present both the primary and secondary source devices to the local USP VMs. Primary and secondary source volumes in the migration group are presented from the existing storage systems to the corresponding local USP VMs. This step is transparent to the servers in the migration set. No devices are imported or accessed by the local USP VMs at this stage.
- Establish replication connectivity between the target storage systems. TrueCopy initiator and MCU target ports are configured on the pair of target USP VMs, and an MCU/RCU pairing is established to validate the configuration.
- Ensure replication connectivity from the local USP VMs to the remote USP VM TrueCopy/HUR initiator. Also ensure that MCU target ports are configured on the local and remote USP VMs. In addition, confirm that MCU and RCU pairing is established to validate the configuration.
- For HUR, configure Universal Replicator Journal Groups on the local and remote USP VM storage systems.
- Configure the target devices. Logical devices on the target USP VM devices are formatted and presented to front-end ports or host storage domains. This way, device sizes, logical unit numbers, host modes, and presentation worldwide names (WWNs) are identical on the source and target storage systems. Devices are presented to host storage domains that correspond to both production and disaster recovery standby servers.
- Configure the target zoning. Zones are defined between servers in the migration group and the target storage system front-end ports, but new zones are not activated at this point.

Ideally the connectivity is through redundant links, switches, and fabrics to the hosts and between the storage units themselves.

14.1.3 Considerations

Keep in mind the following considerations for mirroring with PowerHA SystemMirror Enterprise Edition and TrueCopy/HUR:
- AIX Virtual SCSI is not supported in this initial release.
- Logical Unit Size Expansion (LUSE) for Hitachi is not supported.
- Only fence-level NEVER is supported for synchronous mirroring.
- Only HUR is supported for asynchronous mirroring.
- The dev_name must map to a logical device, and the dev_group must be defined in the HORCM_LDEV section of the horcm.conf file.
- The PowerHA SystemMirror Enterprise Edition TrueCopy/HUR solution uses dev_group for any basic operation, such as the pairresync, pairevtwait, or horctakeover operation. If several dev_names are in a dev_group, the dev_group must be enabled for consistency.
- PowerHA SystemMirror Enterprise Edition does not trap Simple Network Management Protocol (SNMP) notification events for TrueCopy/HUR storage. If a TrueCopy link goes down when the cluster is up and the link is later repaired, you must manually resynchronize the pairs.
- The creation of pairs is done outside of cluster control. You must create the pairs before you start cluster services.
- Resource groups that are managed by PowerHA SystemMirror Enterprise Edition cannot contain volume groups with both TrueCopy/HUR-protected and non-TrueCopy/HUR-protected disks.
- All nodes in the PowerHA SystemMirror Enterprise Edition cluster must use the same horcm instance.
- You cannot use Cluster Single Point Of Control (C-SPOC) for the following Logical Volume Manager (LVM) operations on nodes at the remote site (the nodes that contain the target volumes):
  – Creating a volume group
  – Operations that require nodes at the target site to write to the target volumes
  For example, changing the file system size, changing the mount point, or adding LVM mirrors causes an error message in C-SPOC. However, nodes on the same site as the source volumes can successfully perform these tasks. The changes are then propagated to the other site by using a lazy update.
C-SPOC and other LVM operations: For C-SPOC operations to work for all other LVM operations, perform all C-SPOC operations when the cluster is active on all PowerHA SystemMirror Enterprise Edition nodes and the underlying TrueCopy/HUR pairs are in a PAIR state.

14.2 Overview of TrueCopy/HUR management

Hitachi TrueCopy/HUR storage management uses Command Control Interface (CCI) operations from the AIX operating system and the PowerHA SystemMirror Enterprise Edition environment. PowerHA SystemMirror Enterprise Edition uses these interfaces to discover and integrate the Hitachi replicated storage into its framework. With this integration, you can manage high availability disaster recovery (HADR) for applications by using the mirrored storage.

Integration of TrueCopy/HUR and PowerHA SystemMirror Enterprise Edition provides the following benefits:
- Support for the Inter-site Management policy of Prefer Primary Site or Online on Either Site
- Flexible user-customizable resource group policies
- Support for cluster verification and synchronization
- Limited support for C-SPOC in PowerHA SystemMirror Enterprise Edition
- Automatic failover and re-integration of server nodes attached to pairs of TrueCopy/HUR disk subsystems within sites and across sites
- Automatic management of TrueCopy/HUR links
- Management for switching the direction of the TrueCopy/HUR relationships when a site failure occurs, so that the backup site can take control of the managed resource groups in PowerHA SystemMirror Enterprise Edition from the primary site

14.2.1 Installing the Hitachi CCI software

Use the following steps as a guideline to help you install the Hitachi CCI on the AIX cluster nodes. You can also find this information in the /usr/sbin/cluster/release_notes_xd file. However, the release notes exist only if the PowerHA SystemMirror Enterprise Edition software is already installed. Always consult the latest version of the Hitachi Command Control Interface (CCI) User and Reference Guide, MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474

If you are installing CCI from a CD, use the RMinstsh and RMuninst scripts on the CD to automatically install and uninstall the CCI software.

Important: You must install the Hitachi CCI software into the /HORCM/usr/bin directory. Otherwise, you must create a symbolic link to this directory.

For other media, use the instructions in the following sections.

Installing the Hitachi CCI software into a root directory

To install the Hitachi CCI software into the root directory, follow these steps:
1. Insert the installation medium into the proper I/O device.
2. Move to the current root directory:
   # cd /
3. Copy all files from the installation medium by using the cpio command:
   # cpio -idmu < /dev/XXXX        XXXX = I/O device
   Preserve the directory structure (d flag) and file modification times (m flag), and copy unconditionally (u flag). For diskettes, load them sequentially and repeat the command. An I/O device name of "floppy disk" designates a surface partition of the raw device file (unpartitioned raw device file).
4. Execute the Hitachi Open Remote Copy Manager (HORCM) installation command:
   # /HORCM/horcminstall.sh
5. Verify installation of the proper version by using the raidqry command:
   # raidqry -h
   Model: RAID-Manager/AIX
   Ver&Rev: 01-23-03/06
   Usage: raidqry [options] for HORC

Installing the Hitachi CCI software into a non-root directory

To install the Hitachi CCI software into a non-root directory, follow these steps:
1. Insert the installation medium, such as a CD, into the proper I/O device.
2. Move to the desired directory for CCI. The specified directory must be on a partition other than the root disk or on an external disk:
   # cd /Specified Directory
3. Copy all files from the installation medium by using the cpio command:
   # cpio -idmu < /dev/XXXX        XXXX = I/O device
   Preserve the directory structure (d flag) and file modification times (m flag), and copy unconditionally (u flag). For diskettes, load them sequentially and repeat the command. An I/O device name of "floppy disk" designates a surface partition of the raw device file (unpartitioned raw device file).
4. Make a symbolic link to the /HORCM directory:
   # ln -s /Specified Directory/HORCM /HORCM
5. Run the HORCM installation command:
   # /HORCM/horcminstall.sh
6. Verify installation of the proper version by using the raidqry command:
   # raidqry -h
   Model: RAID-Manager/AIX
   Ver&Rev: 01-23-03/06
   Usage: raidqry [options] for HORC

Installing a newer version of the Hitachi CCI software

To install a newer version of the CCI software, follow these steps:
1. Confirm that HORCM is not running. If it is running, shut it down:
   One CCI instance:    # horcmshutdown.sh
   Two CCI instances:   # horcmshutdown.sh 0 1
   If Hitachi TrueCopy commands are running in the interactive mode, terminate the interactive mode and exit these commands by using the -q option.
2. Insert the installation medium, such as a CD, into the proper I/O device.
3. Move to the directory that contains the HORCM directory, as in the following example for the root directory:
   # cd /
4. Copy all files from the installation medium by using the cpio command:
   # cpio -idmu < /dev/XXXX        XXXX = I/O device
   Preserve the directory structure (d flag) and file modification times (m flag), and copy unconditionally (u flag). For diskettes, load them sequentially and repeat the command. An I/O device name of "floppy disk" designates a surface partition of the raw device file (unpartitioned raw device file).
5. Execute the HORCM installation command:
   # /HORCM/horcminstall.sh
6. Verify installation of the proper version by using the raidqry command:
   # raidqry -h
   Model: RAID-Manager/AIX
   Ver&Rev: 01-23-03/06
   Usage: raidqry [options] for HORC

14.2.2 Overview of the CCI instance

The CCI components on the storage system include the command device or devices and the Hitachi TrueCopy volumes, ShadowImage volumes, or both. Each CCI instance on a UNIX/PC server includes the following components:
- HORCM, which provides:
  – Log and trace files
  – A command server
  – Error monitoring and event reporting files
  – A configuration management feature
- A configuration definition file that is defined by the user
- The Hitachi TrueCopy user execution environment, ShadowImage user execution environment, or both, which contain the TrueCopy/ShadowImage commands, a command log, and a monitoring function

14.2.3 Creating and editing the horcm.conf files

The configuration definition file is a text file that is created and edited by using any standard text editor, such as the vi editor.
A sample configuration definition file, HORCM_CONF (/HORCM/etc/horcm.conf), is included with the CCI software. Use this file as the basis for creating your configuration definition files. The system administrator must copy the sample file, set the necessary parameters in the copied file, and place the copied file in the proper directory. For detailed descriptions of the configuration definition files for sample CCI configurations, see the Hitachi Command Control Interface (CCI) User and Reference Guide, MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474

Important: Do not edit the configuration definition file while HORCM is running. Shut down HORCM, edit the configuration file as needed, and then restart HORCM.

You might have multiple CCI instances, each of which uses its own specific horcm#.conf file. For example, instance 0 might use horcm0.conf, instance 1 (Example 14-1) might use horcm1.conf, and so on. The test scenario presented later in this chapter uses instance 2 and provides examples of the horcm2.conf file on each cluster node.

Example 14-1 The horcm.conf file

Example configuration files:

horcm1.conf file on local node
------------------------------
HORCM_MON
#ip_address => Address of the local node
#ip_address     service   poll(10ms)   timeout(10ms)
10.15.11.194    horcm1    12000        3000

HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19

HORCM_DEV
#Map dev_grp to LDEV#
#dev_group   dev_name   port#   TargetID   LU#   MU#
VG01         test01     CL1-B   1          5     0
VG01         work01     CL1-B   1          24    0
VG01         work02     CL1-B   1          25    0

HORCM_INST
#dev_group   ip_address      service
VG01         10.15.11.195    horcm1

horcm1.conf file on remote node
-------------------------------
HORCM_MON
#ip_address => Address of the local node
#ip_address     service   poll(10ms)   timeout(10ms)
10.15.11.195    horcm1    12000        3000

HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19

HORCM_DEV
#Map dev_grp to LDEV#
#dev_group   dev_name   port#   TargetID   LU#   MU#
VG01         test01     CL1-B   1          5     0
VG01         work01     CL1-B   1          21    0
VG01         work02     CL1-B   1          22    0

HORCM_INST
#dev_group   ip_address      service
VG01         10.15.11.194    horcm1

NOTE 1: For the horcm instance to be able to use any available command device if one of them fails, it is RECOMMENDED that, in your horcm file, under the HORCM_CMD section, the command device is presented in the following format, where 10133 is the serial number of the array:
\\.\CMD-10133:/dev/hdisk/
For example: \\.\CMD-10133:/dev/rhdisk19 /dev/rhdisk20 (note the space in between).

NOTE 2: The Device_File shows "-----" for the pairdisplay -fd command, which also causes verification to fail, if the ShadowImage license has not been activated on the storage system and the MU# column is not empty. It is therefore recommended that the MU# column be left blank if the ShadowImage license is NOT activated on the storage system.

Starting the HORCM instances

To start one instance of the CCI, follow these steps:
1. Modify the /etc/services file to register the port name/number (service) of the configuration definition file. Make the port name/number the same on all servers:
   horcm xxxxx/udp        xxxxx = the port name/number of horcm.conf
2. Optional: If you want HORCM to start automatically each time the system starts, add /etc/horcmstart.sh to the system automatic startup file (such as the /sbin/rc file).
3. Run the horcmstart.sh script manually to start the CCI instance:
   # horcmstart.sh
4. Set the log directory (HORCC_LOG) in the command execution environment as needed.
5. Optional: If you want to perform ShadowImage operations, set the HORCC_MRCF environment variable. Do not set this variable if you want to perform Hitachi TrueCopy operations.
   For the B shell:
   # HORCC_MRCF=1
   # export HORCC_MRCF
   For the C shell:
   # setenv HORCC_MRCF 1
   # pairdisplay -g xxxx        xxxx = group name
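As a concrete illustration of the single-instance case (a sketch; the port number 11000 is an arbitrary assumption, and VG01 is the dev_group from Example 14-1):

   # register the HORCM service port on every server (the number must match everywhere)
   echo "horcm           11000/udp" >> /etc/services
   horcmstart.sh                  # start the instance that uses /etc/horcm.conf
   pairdisplay -g VG01            # confirm that the instance answers for the dev_group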
To start two instances of the CCI, follow these steps:
1. Modify the /etc/services file to register the port name/number (service) of each configuration definition file. The port name/number must be different for each CCI instance:
   horcm0 xxxxx/udp        xxxxx = the port name/number for horcm0.conf
   horcm1 yyyyy/udp        yyyyy = the port name/number for horcm1.conf
2. Optional: If you want HORCM to start automatically each time the system starts, add /etc/horcmstart.sh 0 1 to the system automatic startup file (such as the /sbin/rc file).
3. Run the horcmstart.sh script manually to start the CCI instances:
   # horcmstart.sh 0 1
4. Set an instance number in the environment that executes a command:
   For the B shell:
   # HORCMINST=X        X = instance number = 0 or 1
   # export HORCMINST
   For the C shell:
   # setenv HORCMINST X
5. Set the log directory (HORCC_LOG) in the command execution environment as needed.
6. Optional: If you want to perform ShadowImage operations, set the HORCC_MRCF environment variable. Do not set this variable if you want to perform Hitachi TrueCopy operations.
   For the B shell:
   # HORCC_MRCF=1
   # export HORCC_MRCF
   For the C shell:
   # setenv HORCC_MRCF 1
   # pairdisplay -g xxxx        xxxx = group name

14.3 Scenario description

This scenario uses four nodes, two in each of the two sites: Austin and Miami. Nodes jessica and bina are in the Austin site, and nodes krod and maddi are in the Miami site. Each site provides local automatic failover, along with remote recovery for the other site, which is often referred to as a mutual takeover configuration. Figure 14-1 on page 428 provides a software and hardware overview of the tested configuration between the two sites.

Figure 14-1 Hitachi replication lab environment test configuration (courtesy of Hitachi Data Systems). The diagram shows the Austin and Miami sites, each running AIX 6.1 TL6, PowerHA 6.1 SP3, and CCI 01-23-03/06, with a Hitachi USP-V at Austin and a USP-VM at Miami connected by Fibre Channel links; the truesyncvg disks replicate with TrueCopy synchronous and the ursasyncvg disks replicate with HUR asynchronous.

Each site consists of two Ethernet networks. In this case, both networks are used for the public Ethernet and for cross-site networks. Usually the cross-site network is on separate segments and is an XD_ip network. It is also common to use site-specific service IP labels. Example 14-2 shows the interface list from the cluster topology.
Example 14-2 Test topology information

root@jessica: cllsif
Adapter      Type     Network        Net Type  Attribute  Node     IP Address
jessica      boot     net_ether_02   ether     public     jessica  9.3.207.24
jessicaalt   boot     net_ether_03   ether     public     jessica  207.24.1.1
service_1    service  net_ether_03   ether     public     jessica  1.2.3.4
service_2    service  net_ether_03   ether     public     jessica  1.2.3.5
bina         boot     net_ether_02   ether     public     bina     9.3.207.77
binaalt      boot     net_ether_03   ether     public     bina     207.24.1.2
service_1    service  net_ether_03   ether     public     bina     1.2.3.4
service_2    service  net_ether_03   ether     public     bina     1.2.3.5
krod         boot     net_ether_02   ether     public     krod     9.3.207.79
krodalt      boot     net_ether_03   ether     public     krod     207.24.1.3
service_1    service  net_ether_03   ether     public     krod     1.2.3.4
service_2    service  net_ether_03   ether     public     krod     1.2.3.5
maddi        boot     net_ether_02   ether     public     maddi    9.3.207.78
maddialt     boot     net_ether_03   ether     public     maddi    207.24.1.4
service_1    service  net_ether_03   ether     public     maddi    1.2.3.4
service_2    service  net_ether_03   ether     public     maddi    1.2.3.5

In this scenario, each node at each site has four unique disks, defined through the two separate Hitachi storage units. The jessica and bina nodes at the Austin site have two disks, hdisk38 and hdisk39. These disks are the primary source volumes that use TrueCopy synchronous replication for the truesyncvg volume group. The other two disks, hdisk40 and hdisk41, are used as the target secondary volumes that use HUR asynchronous replication from the Miami site for the ursasyncvg volume group.

The krod and maddi nodes at the Miami site have two disks, hdisk38 and hdisk39. These disks are the secondary target volumes for the TrueCopy synchronous replication of the truesyncvg volume group from the Austin site. The other two disks, hdisk40 and hdisk41, are used as the primary source volumes for the ursasyncvg volume group, which uses HUR for asynchronous replication.

14.4 Configuring the TrueCopy/HUR resources

This topic explains how to perform the following tasks to configure the resources for TrueCopy/HUR:
- Assigning LUNs to the hosts (host groups)
- Creating replicated pairs
- Configuring an AIX disk and dev_group association

For each of these tasks, the Hitachi storage units have been added to the SAN fabric and zoned appropriately. Also, the host groups have been created for the appropriate node adapters, and the LUNs have been created within the storage unit.

14.4.1 Assigning LUNs to the hosts (host groups)

In this task, you assign LUNs by using the Hitachi Storage Navigator. Although an overview of the steps is provided, always refer to the official Hitachi documentation for your version as needed. To begin, the Hitachi USP-V storage unit is at the Austin site. The host group, JessBina, is assigned to port CL1-E on the Hitachi storage unit with the serial number 45306. Usually the host group is assigned to multiple ports for full multipath redundancy.

To assign the LUNs to the hosts, follow these steps:
1. Locate the free LUNs and assign them to the proper host group:
   a. Verify whether LUNs are currently assigned by checking the number of paths associated with the LUN. If the fields are blank, the LUN is currently unassigned.
   b. Assign the LUNs. To assign one LUN, click and drag it to a free LUN/LDEV location. To assign multiple LUNs, hold down the Shift key and click each LUN. Then right-click the selected LUNs and drag them to a free location. This free location is indicated by a black and white disk image that contains no information in the corresponding attribute columns of LDEV/UUID/Emulation, as shown in Figure 14-2 on page 430.
This free location is indicated by a black and white disk image that also contains no information in the corresponding attribute columns of LDEV/UUID/Emulation as shown in Figure 14-2 on page 430. Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 429 Figure 14-2 Assigning LUNs to the Austin site nodes2 2. In the path verification window (Figure 14-3), check the information and record the LUN number and LDEV numbers. You use this information later. However, you can also retrieve this information from the AIX system after the devices are configured by the host. Click OK. Figure 14-3 Checking the paths for the Austin LUNs3 2 3 430 Courtesy of Hitachi Data Systems Courtesy of Hitachi Data Systems IBM PowerHA SystemMirror 7.1 for AIX 3. Back on the LUN Manager tab (Figure 14-4), click Apply for these paths to become active and the assignment to be completed. Figure 14-4 Applying LUN assignments for Austin4 You have completed assigning four more LUNs for the nodes at the Austin site. However the lab environment already had several LUNs, including both command and journaling LUNs in the cluster nodes. These LUNs were added solely for this test scenario. Important: If these LUNs are the first ones to be allocated to the hosts, you must also assign the command LUNs. See the appropriate Hitachi documentation as needed. For the storage unit at the Miami site, repeat the steps that you performed for the Austin site. The host group, KrodMaddi, is assigned to port CL1-B on the Hitachi UPS-VM storage unit with the serial number 35764. Usually the host group is assigned to multiple ports for full multipath redundancy. Figure 14-5 on page 432 shows the result of these steps. Again record both the LUN numbers and LDEV numbers so that you can easily refer to them as needed when creating the replicated pairs. The numbers are also required when you add the LUNs into device groups in the appropriate horcm.conf file. 4 Courtesy of Hitachi Data Systems Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 431 Figure 14-5 Miami site LUNs assigned5 14.4.2 Creating replicated pairs PowerHA SystemMirror Enterprise Edition does not create replication pairs by using the Hitachi interfaces. You must use the Hitachi Storage interfaces to create the same replicated pairs before using PowerHA SystemMirror Enterprise Edition to achieve an HADR solution. For information about setting up TrueCopy/HUR pairs, see the Hitachi Command Control Interface (CCI) User and Reference Guide, MK-90RD011, which you can download from: http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474 You must know exactly which LUNs from each storage unit will be paired together. They must be the same size. In this case, all of the LUNs that are used are 2 GB in size. The pairing of LUNs also uses the LDEV numbers. The LDEV numbers are hexadecimal values that also show up as decimal values on the AIX host. 5 432 Courtesy of Hitachi Data Systems IBM PowerHA SystemMirror 7.1 for AIX Table 14-1 translates the LDEV hex values of each LUN and its corresponding decimal value. 
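Because Storage Navigator reports each LDEV in hexadecimal CU:LDEV form, while AIX-side CCI commands such as inqraid report the same LDEV in decimal, a quick conversion helps when you build a mapping like Table 14-1, which follows. This is only an illustrative sketch that uses the bc command on the AIX host with LDEV values from this scenario:
# echo "ibase=16; 110" | bc
272
(CU:LDEV 00:01:10 is decimal 272)
# echo "obase=16; 272" | bc
110
(decimal 272 is hexadecimal 110, that is, CU:LDEV 00:01:10)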
Table 14-1 LUN number to LDEV number comparison Austin - 45306 Miami - 35764 LUN number LDEV-HEX LDEV-DEC number LUN number LDEV-HEX LDEV-DEC number 000A 00:01:10 272 001C 00:01:0C 268 000B 00:01:11 273 001D 00:01:0D 269 000C 00:01:12 274 001E 00:01:0E 271 000D 00:01:13 275 001F 00:01:0E 272 Although the pairing can be done by using the CCI, the example in this section shows how to create the replicated pairs through the Hitachi Storage Navigator. The appropriate commands are in the /HORCM/usr/bin directory. In this scenario, none of the devices have been configured to the AIX cluster nodes. Creating TrueCopy synchronous pairings Beginning with the Austin Hitachi unit, create two synchronous TrueCopy replicated pairings. 1. From within Storage Navigator (Figure 14-6), select Go TrueCopy Pair Operation. Figure 14-6 Storage Navigator menu options to perform a pair operation6 Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 433 2. In the TrueCopy Pair Operation window (Figure 14-7), select the appropriate port, CL-1E, and find the specific LUNs to use (00-00A and 00-00B). In this scenario, we have predetermined that we want to pair these LUNs with 00-01C and 00-01D from the Miami Hitachi storage unit on port CL1-B. Notice in the occurrence of SMPL in the Status column next to the LUNs. SMPL indicates simplex, meaning that no mirroring is being used with that LUN. 3. Right-click the first Austin LUN (00-00A), and select Paircreate Synchronize (Figure 14-7). Figure 14-7 Creating a TrueCopy synchronous pairing7 6 7 434 Courtesy of Hitachi Data Systems Courtesy of Hitachi Data Systems IBM PowerHA SystemMirror 7.1 for AIX 4. In the full synchronous Paircreate menu (Figure 14-8), select the proper port and LUN that you previously created and recorded. Click Set. Because we have only one additional remote storage unit, the RCU field already shows the proper one for Miami. 5. Repeat step 4 for the second LUN pairing. Figure 14-8 shows details of the two pairings. Figure 14-8 TrueCopy pairings8 8 Courtesy of Hitachi Data Systems Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 435 6. After you complete the pairing selections, on the Pair Operation tab, verify that the information is correct and click Apply to apply them all at one time. Figure 14-9 shows both of the source LUNs in the middle of the pane. It also shows an overview of which remote LUNs they are to be paired with. Figure 14-9 Applying TrueCopy pairings9 9 436 Courtesy of Hitachi Data Systems IBM PowerHA SystemMirror 7.1 for AIX This step automatically starts copying the LUNs from the local Austin primary source to the remote Miami secondary source LUNs. You can also right-click a LUN and select Detailed Information as shown in Figure 14-10. Figure 14-10 Detailed LUN pairing and copy status information10 10 Courtesy of Hitachi Data Systems Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 437 After the copy has completed, the status is displayed as PAIR as shown in Figure 14-11. You can also view this status from the management interface of either one of the storage units. Figure 14-11 TrueCopy pairing and copy completed11 11 438 Courtesy of Hitachi Data Systems IBM PowerHA SystemMirror 7.1 for AIX Creating a Universal Replicator asynchronous pairing Now switch over to the Miami Hitachi storage unit to create the asynchronous replicated pairings. 1. From the Storage Navigator, select Go Universal Replicator Pair Operation (Figure 14-12). 
Figure 14-12 Menu selection to perform the pair operation12 12 Courtesy of Hitachi Data Systems Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 439 2. In the Universal Replicator Pair Operation window (Figure 14-13), select the appropriate port CL-1B and find the specific LUNs that you want to use, which are 00-01E and 00-01F in this example). We have already predetermined that we want to pair these LUNs with 00-0C and 00-00D from the Austin Hitachi storage unit on port CL1-E. Right-click one of the desired LUNs and select Paircreate. Figure 14-13 Selecting Paircreate in the Universal Replicator13 13 440 Courtesy of Hitachi Data Systems IBM PowerHA SystemMirror 7.1 for AIX 3. In the full synchronous Paircreate window, complete these steps: a. Select the proper port and LUN that you previously created and recorded. b. Because we only have one additional remote storage unit, the RCU field already shows the proper one for Austin. c. Unlike when using TrueCopy synchronous replication, when using Universal Replicator, specify a master journal volume (M-JNL), a remote journal volume (R-JNL), and a consistency (CT) group. Important: If these are the first Universal Replicator LUNs to be allocated, you must also assign journaling groups and LUNs for both storage units. Refer to the appropriate Hitachi Universal Replicator documentation as needed. We chose ones that have been already previously created in the environment. d. Click Set e. Repeat these steps for the second LUN pairing. Figure 14-14 shows details of the two pairings. Figure 14-14 Paircreate details in Universal Replicator14 14 Courtesy of Hitachi Data Systems Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 441 4. After you complete the pairing selections, on the Pair Operation tab, verify that the information is correct and click Apply to apply them all at one time. When the pairing is established, the copy automatically begins to synchronize with the remote LUNs at the Austin site. The status changes to COPY, as shown in Figure 14-15, until the pairs are in sync. After the pairs are synchronized, their status changes to PAIR. Figure 14-15 Asynchronous copy in progress in Universal Replicator15 15 442 Courtesy of Hitachi Data Systems IBM PowerHA SystemMirror 7.1 for AIX 5. Upon completion of the synchronization of the LUNs, configure the LUNs into the AIX cluster nodes. Figure 14-16 shows an overview of the Hitachi replicated environment. Figure 14-16 Replicated Hitachi LUN overview16 14.4.3 Configuring an AIX disk and dev_group association Before you continue with the steps in this section, you must ensure that the Hitachi hdisks are made available to your nodes. You can run the cfgmgr command to configure the new hdisks. Also the CCI must already be installed on each cluster node. If you must install the CCI, see 14.2.1, “Installing the Hitachi CCI software” on page 422. In the test environment, we already have hdisk0-37 on each of the four cluster nodes. After running the cfgmgr command one each node, one at a time, we now have four additional disks, hdisk38-hdisk41, as shown in Example 14-3. Example 14-3 New Hitachi disks root@jessica: hdisk38 hdisk39 hdisk40 hdisk41 none none none none None None None None Although the LUN and LDEV numbers were written down during the initial LUN assignments, you must identify the correct LDEV numbers of the Hitachi disks and the corresponding AIX hdisks by performing the following steps: 16 Courtesy of Hitachi Data Systems Chapter 14. 
Disaster recovery using Hitachi TrueCopy and Universal Replicator 443 1. On the PowerHA SystemMirror Enterprise Edition nodes, select the Hitachi disks and the disks that will be used in the TrueCopy/HUR relationships by running the inqraid command. Example 14-4 shows hdisk38-hdisk41, which are the Hitachi disks that we just added. Example 14-4 Hitachi disks added root@jessica: # lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/inqraid hdisk38 -> [SQ] CL1-E Ser = 45306 LDEV = 272 [HITACHI HORC = P-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL RAID5[Group 1- 2] SSID = 0x0005 hdisk39 -> [SQ] CL1-E Ser = 45306 LDEV = 273 [HITACHI HORC = P-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL RAID5[Group 1- 2] SSID = 0x0005 hdisk40 -> [SQ] CL1-E Ser = 45306 LDEV = 274 [HITACHI HORC = S-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL RAID5[Group 1- 2] SSID = 0x0005 CTGID = 10 hdisk41 -> [SQ] CL1-E Ser = 45306 LDEV = 275 [HITACHI HORC = S-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL RAID5[Group 1- 2] SSID = 0x0005 CTGID = 10 ] [OPEN-V MU#2 = SMPL] ] ] [OPEN-V MU#2 = SMPL] ] ] [OPEN-V MU#2 = SMPL] ] ] [OPEN-V MU#2 = SMPL] ] 2. Edit the HORCM LDEV section in the horcm#.conf file to identify the dev_group that will be managed by PowerHA SystemMirror Enterprise Edition. In this example, we use the horcm2.conf file. Hdisk38 (ldev 272) and hdisk39 (ldev 273) are the pair for the synchronous replicated resource group, which is primary at the Austin site. Hdisk40 (ldev 275) and hdisk41 (ldev276) are the pair for an asynchronous replicated resource, which is primary at the Miami site. Specify the device groups (dev_group) in the horcm#.conf file. We are using dev_group htcdg01 with dev_names htcd01 and htcd02 for the synchronous replicated pairs. For the asynchronous pairs, we are using dev_group hurdg01 and dev_names hurd01 and hurd02. The device group names are needed later when checking that status of the replicated pairs and when defining the replicated pairs as a resource for PowerHA Enterprise Edition to control. Important: Do not edit the configuration definition file while HORCM is running. Shut down HORCM, edit the configuration file as needed, and then restart HORCM. Example 14-5 shows the horcm2.conf file from the jessica node, at the Austin site. Because two nodes are at the Austin site, the same updates were performed to the /etc/horcm2.conf file on the bina node. Notice that you can use either the decimal value of the LDEV or the hexidecimal value. We specifically did one pair each way just to show it and to demonstrate that it works. Although several groups were already defined, only those that are relevant to this scenario are shown. Example 14-5 Horcm2.conf file used for the Austin site nodes root@jessica: /etc/horcm2.conf HORCM_MON #Address of local node... #ip_address service 444 IBM PowerHA SystemMirror 7.1 for AIX poll(10ms) timeout(10ms) r9r3m11.austin.ibm.com 52323 1000 HORCM_CMD #hdisk of Command Device... #dev_name dev_name #UnitID 0 (Serial# 45306) #/dev/rhdisk10 \\.\CMD-45306:/dev/rhdisk10 /dev/rhdisk14 HORCM_LDEV #Map dev_grp #dev_group # #--------htcdg01 htcdg01 hurdg01 hurdg01 to LDEV#... dev_name Serial# --------htcd01 htcd02 hurd01 hurd02 ------45306 45306 45306 45306 CU:LDEV (LDEV#) -------272 273 01:12 01:13 3000 dev_name MU# --- siteA siteB hdisk -> hdisk -------------------- # Address of remote node for each dev_grp... 
HORCM_INST #dev_group ip_address service htcdg01 maddi.austin.ibm.com 52323 hurdg01 maddi.austin.ibm.com 52323 For the krod and maddi nodes at the Miami site, the dev_groups, dev_names, and the LDEV numbers are the same. The difference is the specific serial number of the storage unit at that site. Also, the remote system or IP address for the appropriate system in the Austin site. Example 14-6 shows the horcm2.conf file that we used for both nodes in the Miami site. Notice that, for the ip_address fields, fully qualified names are used instead of the IP address. As long as these names are resolvable, the format is still valid. However, the format is seen using the actual addresses as shown in Example 14-1 on page 425. Example 14-6 The horcm2.conf file used for the nodes in the Miami site root@krod: horcm2.conf HORCM_MON #Address of local node... #ip_address service r9r3m13.austin.ibm.com 52323 poll(10ms) 1000 HORCM_CMD #hdisk of Command Device... #dev_name dev_name #UnitID 0 (Serial# 35764) #/dev/rhdisk10 # /dev/hdisk19 \\.\CMD-45306:/dev/rhdisk11 /dev/rhdisk19 #HUR_GROUP htcdg01 htcdg01 hurdg01 HUR_103_153 45306 htcd01 35764 htcd02 35764 hurd01 35764 01:53 268 269 01:0E timeout(10ms) 3000 dev_name 0 Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 445 hurdg01 hurd02 35764 01:0F # Address of remote node for each dev_grp... HORCM_INST #dev_group htcdg01 hurdg01 ip_address service bina.austin.ibm.com bina.austin.ibm.com 52323 52323 3. Map the TrueCopy-protected hdisks to the TrueCopy device groups by using the raidscan command. In the following example, 2 is the HORCM instance number: lsdev -Cc disk|grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst The -find inst option of the raidscan command registers the device file name (hdisk) to all mirror descriptors of the LDEV map table for HORCM. This option also permits the matching volumes on the horcm.conf file in protection mode and is started automatically by using the /etc/horcmgr command. Therefore you do not need to use this option normally. This option is terminated to avoid wasteful scanning when the registration has been finished based on HORCM. Therefore, if HORCM no longer needs the registration, then no further action is taken and it exits. You can use the -find inst option with the -fx option to view LDEV numbers in the hexadecimal format. 4. Verify that the PAIRs are established by running either the pairvdisplay command or the pairvolchk command against the device groups htcdg01 and hurdg01. Example 14-7 shows how we use the pairvdisplay command. For device group htcdg01, the status of PAIR and fence of NEVER indicates that they are a synchronous pair. For device group hurdg01, the ASYNC fence option clearly indicates that it is in an asynchronous pair. Also notice that the CTG field shows the consistency group number for the asynchronous pair managed by HUR. 
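The pairdisplay output in Example 14-7, which follows, is the detailed view of the pair states. For a quick scripted check, the pairvolchk command summarizes the state of an entire device group and reflects it in its return code. The following lines are only a sketch; confirm the exact options and return codes in the Hitachi CCI User and Reference Guide for your CCI level:
# pairvolchk -g htcdg01 -IH2
(reports whether the local volumes of the group are P-VOL or S-VOL)
# pairvolchk -g hurdg01 -IH2 -s
(with -s, the check includes the pair status, so the return code distinguishes states such as PAIR, COPY, and PSUS)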
Example 14-7 The pairdisplay command to verify that the pair status is synchronized # pairdisplay -g htcdg01 -IH2 -fe Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# htcdg01 htcd01(L) (CL1-E-0, 0, 10)45306 272.P-VOL PAIR NEVER ,35764 268 htcdg01 htcd01(R) (CL1-B-0, 0, 28)35764 268.S-VOL PAIR NEVER ,----272 htcdg01 htcd02(L) (CL1-E-0, 0, 11)45306 273.P-VOL PAIR NEVER ,35764 269 htcdg01 htcd02(R) (CL1-B-0, 0, 29)35764 269.S-VOL PAIR NEVER ,----273 M CTG JID AP - 1 - - 1 - - # pairdisplay -g hurdg01 -IH2 -fe Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# hurdg01 hurd01(L) (CL1-E-0, 0, 12)45306 274.S-VOL PAIR ASYNC ,----270 hurdg01 hurd01(R) (CL1-B-0, 0, 30)35764 270.P-VOL PAIR ASYNC ,45306 274 hurdg01 hurd02(L) (CL1-E-0, 0, 13)45306 275.S-VOL PAIR ASYNC ,----271 hurdg01 hurd02(R) (CL1-B-0, 0, 31)35764 271.P-VOL PAIR ASYNC ,45306 275 M CTG JID AP - 10 3 1 - 10 3 2 - 10 3 1 - 10 3 2 To show the output in Example 14-7, we removed the last three columns of the output because it was not relevant to what we are checking. 446 IBM PowerHA SystemMirror 7.1 for AIX Unestablished pairs: If pairs are not yet established, the status is displayed as SMPL. To continue, you must create the pairs. For instructions about creating pairs from the command line, see the Hitachi Command Control Interface (CCI) User and Reference Guide, MK-90RD011, which you can download from: http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474 Otherwise, if you are using Storage Navigator, see 14.4.2, “Creating replicated pairs” on page 432. Creating volume groups and file systems on replicated disks After identifying the hdisks and dev_groups that will be managed by PowerHA SystemMirror Enterprise Edition, you must create the volume groups and file systems. To set up volume groups and file systems in the replicated disks, follow these steps: 1. On each of the four PowerHA SystemMirror Enterprise Edition cluster nodes, verify the next free major number by running the lvlstmajor command on each cluster node. Also verify that the physical volume name for the file system can also be used across sites. In this scenario, we use the major numbers 56 for the truesyncvg volume group and 57 for the ursasyncvg volume group. We use these numbers later when importing the volume to the other cluster nodes. Although the major numbers are not required to match, it is a preferred practice. We create the truesyncvg scalable volume group on the jessica node where the primary LUNs are located. We also create the logical volumes, jfslog, and file systems as shown in Example 14-8. Example 14-8 Details about the truesyncvg volume group root@jessica:lsvg truesyncvg VOLUME GROUP: truesyncvg 00cb14ce00004c000000012b564c41b9 VG STATE: active VG PERMISSION: read/write MAX LVs: 256 LVs: 3 OPEN LVs: 3 TOTAL PVs: 2 STALE PVs: 0 ACTIVE PVs: 2 MAX PPs per VG: 32768 LTG size (Dynamic): 256 kilobyte(s) HOT SPARE: no PV RESTRICTION: none root@jessica:lsvg -l truesyncvg lsvg -l truesyncvg truesyncvg: LV NAME TYPE LPs oreolv jfs2 125 majorlv jfs2 125 truefsloglv jfs2log 1 VG IDENTIFIER: PP SIZE: TOTAL PPs: FREE PPs: USED PPs: QUORUM: VG DESCRIPTORS: STALE PPs: AUTO ON: MAX PVs: AUTO SYNC: BB POLICY: PPs 125 125 1 PVs 1 1 1 4 megabyte(s) 988 (3952 megabytes) 737 (2948 megabytes) 251 (1004 megabytes) 2 (Enabled) 3 0 no 1024 no relocatable LV STATE closed/syncd closed/syncd closed/syncd MOUNT POINT /oreofs /majorfs N/A Chapter 14. 
Disaster recovery using Hitachi TrueCopy and Universal Replicator 447 We create the ursasyncvg big volume group on the krod node where the primary LUNs are located. We also create the logical volumes, jfslog, and file systems as shown in Example 14-9. Example 14-9 Ursasyncvg volume group information root@krod:lspv hdisk40 00cb14ce5676ad24 hdisk41 00cb14ce5676afcf ursasyncvg ursasyncvg root@krod:lsvg ursasyncvg VOLUME GROUP: ursasyncvg 00cb14ce00004c000000012b5676b11e VG STATE: active VG PERMISSION: read/write MAX LVs: 512 LVs: 3 OPEN LVs: 3 TOTAL PVs: 2 STALE PVs: 0 ACTIVE PVs: 2 MAX PPs per VG: 130048 MAX PPs per PV: 1016 LTG size (Dynamic): 256 kilobyte(s) HOT SPARE: no root@krod:lsvg -l ursasyncvg ursasyncvg: LV NAME TYPE ursfsloglv jfs2log hannahlv jfs2 julielv jfs2 LPs 2 200 220 active active VG IDENTIFIER: PPs 2 200 220 PP SIZE: TOTAL PPs: FREE PPs: USED PPs: QUORUM: VG DESCRIPTORS: STALE PPs: AUTO ON: 4 megabyte(s) 1018 (4072 megabytes) 596 (2384 megabytes) 422 (1688 megabytes) 2 (Enabled) 3 0 no MAX PVs: AUTO SYNC: BB POLICY: 128 no relocatable PVs 1 1 1 LV STATE closed/syncd closed/syncd closed/syncd MOUNT POINT N/A /hannahfs /juliefs 2. Vary off the newly created volume groups by running the varyoffvg command. To import the volume groups onto the other three systems, the pairs must be in sync. We execute the pairresync command as shown in Example 14-10 on the local disks and make sure that they are in the PAIR state. This process verifies that the local disk information has been copied to the remote storage. Notice that the command is being run on the respective node that contains the primary source LUNs and where the volume groups are created. Example 14-10 Pairresync command #root@jessica:pairresync -g htcdg01 -IH2 #root@krod:pairresync -g hurdg01 -IH2 Verify that the pairs are in sync with the pairdisplay command as shown in Example 14-7 on page 446. 448 IBM PowerHA SystemMirror 7.1 for AIX 3. Split the pair relationship so that the remote systems can import the volume groups as needed on each node. Run the pairsplit command against the device group as shown in Example 14-11. Example 14-11 The pairsplit command to suspend replication root@jessica: pairsplit -g htcdg01 -IH2 root@krod: pairsplit -g hurdg01 -IH2 To verify that the pairs are split, check the status by using the pairdisplay command. Example 14-12 shows that the pairs are in a suspended state. Example 14-12 Pairdisplay shows pairs suspended root@jessica: pairdisplay -g htcdg01 -IH2 -fe Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP htcdg01 htcd01(L) (CL1-E-0, 0, 10)45306 272.P-VOL PSUS NEVER ,35764 268 - 1 htcdg01 htcd01(R) (CL1-B-0, 0, 28)35764 268.S-VOL SSUS NEVER ,----272 - htcdg01 htcd02(L) (CL1-E-0, 0, 11)45306 273.P-VOL PSUS NEVER ,35764 269 - 1 htcdg01 htcd02(R) (CL1-B-0, 0, 29)35764 269.S-VOL SSUS NEVER ,----273 - root@krod: pairdisplay -g hurdg01 -IH2 -fe Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP hurdg01 hurd01(L) (CL1-B-0, 0, 30)35764 270.P-VOL PSUS ASYNC ,45306 274 - 10 3 2 hurdg01 hurd01(R) (CL1-E-0, 0, 12)45306 274.S-VOL SSUS ASYNC ,----270 - 10 3 1 hurdg01 hurd02(L) (CL1-B-0, 0, 31)35764 271.P-VOL PSUS ASYNC ,45306 275 - 10 3 2 hurdg01 hurd02(R) (CL1-E-0, 0, 13)45306 275.S-VOL SSUS ASYNC ,----271 - 10 3 1 4. 
To import the volume groups on the remaining nodes, ensure that the PVID is present on the disks by using one of the following options: – Run the rmdev -dl command for each hdisk and then run the cfgmgr command. – Run the appropriate chdev command against each disk to pull in the PVID. As shown in Example 14-13, we use the chdev command on each of the three additional nodes. Example 14-13 The chdev command to acquire the PVIDs root@jessica: chdev -l hdisk40 -a pv=yes root@jessica: chdev -l hdisk41 -a pv=yes root@bina: root@bina: root@bina: root@bina: chdev chdev chdev chdev -l -l -l -l hdisk38 hdisk39 hdisk40 hdisk41 -a -a -a -a pv=yes pv=yes pv=yes pv=yes root@krod: chdev -l hdisk38 -a pv=yes root@krod: chdev -l hdisk39 -a pv=yes root@maddi: root@maddi: root@maddi: root@maddi: chdev chdev chdev chdev -l -l -l -l hdisk38 hdisk39 hdisk40 hdisk41 -a -a -a -a pv=yes pv=yes pv=yes pv=yes Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 449 5. Verify that the PVIDs are correctly showing on each system by running the lspv command as shown in Example 14-14. Because all four of the nodes have the exact hdisk numbering, we show the output only from one node, the bina node. Example 14-14 LSPV listing to verify PVIDs are present bina@root: lspv hdisk38 00cb14ce564c3f44 hdisk39 00cb14ce564c40fb hdisk40 00cb14ce5676ad24 hdisk41 00cb14ce5676afcf none none none none 6. Import the volume groups on each node as needed by using the importvg command. Specify the major number that you used earlier. 7. Disable both the auto varyon and quorum settings of the volume groups by using the chvg command. 8. Vary off the volume group as shown in Example 14-15. Attention: PowerHA SystemMirror Enterprise Edition attempts to automatically set the AUTO VARYON to NO during verification, except in the case of remote TrueCopy/HUR. Example 14-15 Importing the replicated volume groups root@jessica: importvg -y ursasyncvg -V 57 hdisk40 root@jessica: chvg -a n -Q n ursasyncvg root@jessica: varyoffvg ursasyncvg root@bina: root@bina: root@bina: root@bina: root@bina: root@bina: importvg -y truesyncvg -V 56 hdisk38 importvg -y ursasyncvg -V 57 hdisk40 chvg -a n -Q n truesyncvg chvg -a n -Q n ursasyncvg varyoffvg truesyncvg varyoffvg ursasyncvg root@krod: importvg -y truesyncvg -V 56 hdisk38 root@krod: chvg -a n -Q n truesyncvg root@krod: varyoffvg truesyncvg root@maddi: root@maddi: root@maddi: root@maddi: root@maddi: root@maddi: importvg -y truesyncvg -V 56 hdisk38 importvg -y ursasyncvg -V 57 hdisk40 chvg -a n -Q n truesyncvg chvg -a n -Q n ursasyncvg varyoffvg truesyncvg varyoffvg ursasyncvg 9. Re-establish the pairs that you split in step 3 on page 449 by running the pairresync command again as shown in Example 14-10 on page 448. 10.Verify again if they are in sync by using the pairdisplay command as shown in Example 14-7 on page 446. 450 IBM PowerHA SystemMirror 7.1 for AIX 14.4.4 Defining TrueCopy/HUR managed replicated resource to PowerHA To add a replicated resource to be controlled by PowerHA consists of two specific steps per device group, and four steps overall: Adding TrueCopy/HUR replicated resources Adding the TrueCopy/HUR replicated resources to a resource group Verifying the TrueCopy/HUR configuration Synchronizing the cluster configuration In these steps, the cluster topology has been configured, including all four nodes, both sites, and networks. Adding TrueCopy/HUR replicated resources To define a TrueCopy replicated resource, follow these steps: 1. 
From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration TrueCopy Replicated Resources Add Hitachi TrueCopy/HUR Replicated Resource. 3. In the Ad Hitachi TrueCopy/HUR Replication Resource panel, press Enter. 4. Complete the available fields appropriately and press Enter. In this configuration, we created two replicated resources. One resource is for the synchronous device group, htcdg01, named trulee. The second resource for the asynchronous device group, hurdg01, named ursasyncRR. Example 14-16 shows both of the replicated resources. Example 14-16 TrueCopy/HUR replicated resource definitions Add a HITACHI TRUECOPY(R)/HUR Replicated Resource Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * * * * TRUECOPY(R)/HUR Resource Name TRUECOPY(R)/HUR Mode Device Groups Recovery Action Horcm Instance Horctakeover Timeout Value Pairevtwait Timeout Value [Entry Fields] [truelee] SYNC [htcdg01] AUTO [horcm2] [300] [3600] + + + # # Add a HITACHI TRUECOPY(R)/HUR Replicated Resource Type or select values in entry fields. Press Enter AFTER making all desired changes. * * * * TRUECOPY(R)/HUR Resource Name TRUECOPY(R)/HUR Mode Device Groups Recovery Action [Entry Fields] [ursasyncRR] ASYNC [hurdg01] AUTO Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator + + + 451 * Horcm Instance * Horctakeover Timeout Value * Pairevtwait Timeout Value [horcm2] [300] [3600] # # For a complete list of all of defined TrueCopy/HUR replicated resources, run the cllstc command, which is in the /usr/es/sbin/cluster/tc/cmds directory. Example 14-17 shows the output of the cllstc command. Example 14-17 The cllstc command to list the TrueCopy/HUR replicated resources root@jessica: cllstc -a Name CopyMode DeviceGrps truelee SYNC htcdg01 ursasyncRR ASYNC hurdg01 RecoveryAction AUTO AUTO HorcmInstance HorcTimeOut horcm2 300 horcm2 300 PairevtTimeout 3600 3600 Adding the TrueCopy/HUR replicated resources to a resource group To add a TrueCopy replicated resource to a resource group, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Resource Configuration Extended Resource Group Configuration. Depending on whether you are working with an existing resource group or creating a resource group, the TrueCopy Replicated Resources entry is displayed at the bottom of the page in SMIT. This entry is a pick list that shows the resource names that are created in the previous task. 3. Ensure that the volume groups that are selected on the Resource Group configuration display match the volume groups that are used in the TrueCopy/HUR Replicated Resource: – If you are changing an existing resource group, select Change/Show Resource Group. – If you are adding a resource group, select Add a Resource Group. 4. In the TrueCopy Replicated Resources field, press F4 for a list of the TrueCopy/HUR replicated resources that were previously added. Verify that this resource matches the volume group that is specified. Important: You cannot mix regular (non-replicated) volume groups and TrueCopy/HUR replicated volume groups in the same resource group. Press Enter. In this scenario, we changed an existing resource group, emlecRG, for the Austin site and specifically chose a site relationship, also known as an Inter-site Management Policy of Prefer Primary Site. 
We added a new resource group, valhallarg, for the Miami site and chose to use the same site relationship. We also added the additional nodes from each site. We configured both to failover locally within a site and failover between sites. If a site failure occurs, the node falls over to the remote site standby node, but never to the remote production node. 452 IBM PowerHA SystemMirror 7.1 for AIX Example 14-18 shows the relevant resource group information. Example 14-18 Resource groups for the TrueCopy/HUR replicated resources Resource Group Name Participating Node Name(s) Startup Policy Fallover Policy Fallback Policy Site Relationship Node Priority Service IP Label Volume Groups Hitachi TrueCopy Replicated Resources emlecRG jessica bina maddi Online On Home Node Only Fallover To Next Priority Node Never Fallback Prefer Primary Site Resource Group Name Participating Node Name(s) Startup Policy Fallover Policy Fallback Policy Site Relationship Node Priority Service IP Label Volume Groups Hitachi TrueCopy Replicated Resources valhallaRG krod maddi bina Online On Home Node Only Fallover To Next Priority Node Never Fallback Prefer Primary Site service_1 truesyncvg truelee service_2 ursasyncvg ursasyncRR Verifying the TrueCopy/HUR configuration Before synchronizing the new cluster configuration, verify the TrueCopy/HUR configuration: 1. To verify the configuration, run the following command: /usr/es/sbin/cluster/tc/utils/cl_verify_tc_config 2. Correct any configuration errors that are shown. If you see error messages such as those shown in Figure 14-17, usually these types of messages indicate that the raidscan command was not run or was run incorrectly. See step 3 on page 449 in “Creating volume groups and file systems on replicated disks” on page 447. 3. Run the script again. Figure 14-17 Error messages found during TrueCopy/HUR replicated resource verification Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 453 Synchronizing the cluster configuration You must verify the PowerHA SystemMirror Enterprise Edition cluster and the TrueCopy/HUR configuration before you can synchronize the cluster. To propagate the new TrueCopy/HUR configuration information and the additional resource group that were created across the cluster, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select Extended Configuration Extended Verification and Synchronization. 3. In the Verify Synchronize or Both field select Synchronize. In the Automatically correct errors found during verification field select No. Press Enter. The output is displayed in the SMIT Command Status window. 14.5 Failover testing This topic explains the basic failover testing of the TrueCopy/HUR replicated resources locally within the site and across sites. You must carefully plan the testing of the site cluster failover because it requires more time to manipulate the secondary target LUNs at the recovery site. Also when testing the asynchronous replication, because of the nature of asynchronous replication, testing can also impact the data. These scenarios do not entail performing a redundancy test with the IP networks. Instead you configure redundant IP or non-IP communication paths to avoid isolation of the sites. The loss of all the communication paths between sites leads to a partitioned state of the cluster and to data divergence between sites if the replication links are also unavailable. 
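Before starting the failover tests, it helps to capture a known-good baseline from one node so that every test can be compared against the same starting state. The following is only an illustrative checklist that reuses commands already shown in this chapter (instance number 2 and the device group names match this scenario):
# /usr/es/sbin/cluster/tc/utils/cl_verify_tc_config
(verifies the TrueCopy/HUR replicated resource definitions)
# pairdisplay -g htcdg01 -IH2 -fe
(the synchronous pairs should show a status of PAIR with a fence of NEVER)
# pairdisplay -g hurdg01 -IH2 -fe
(the asynchronous pairs should show a status of PAIR with a fence of ASYNC)
# clRGinfo
(records the resource group placement before each test)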
Another specific failure scenario is the loss of the replication paths between the storage subsystems while the cluster is running on both sites. To avoid this situation, configure redundant communication links for TrueCopy/HUR replication. You must manually recover the status of the pairs after the storage links are operational again. Important: PowerHA SystemMirror Enterprise Edition does not trap SNMP notification events for TrueCopy/HUR storage. If a TrueCopy link goes down when the cluster is up and the link is repaired later, you must manually resynchronize the pairs. This topic explains how to perform the following tests for each site and resource group: 454 Graceful site failover for the Austin site Rolling site failure of the Austin site Site re-integration for the Austin site Graceful site failover for the Miami site Rolling site failure of the Miami site Site re-integration for the Miami site IBM PowerHA SystemMirror 7.1 for AIX Each test, except for the last re-integration test, begins in the same initial state of each site hosting its own production resource group on the primary node as shown in Example 14-19. Example 14-19 Beginning of test cluster resource group states clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------emlecRG ONLINE jessica@Austin OFFLINE bina@Austin ONLINE SECONDARY maddi@Miami valhallaRG ONLINE krod@Miami OFFLINE maddi@Miami ONLINE SECONDARY bina@Austin Before each test, we start copying data from another file system to the replicated file systems. After each test, we verify that the site service IP address is online and new data is in the file systems. We also had a script that inserts the current time and date into a file on each file system. Because of the small amounts of I/O in our environment, we were unable to determine to have lost any data in the asynchronous replication either. 14.5.1 Graceful site failover for the Austin site Performing a controlled move of a production environment across sites is a basic test to ensure that the remote site can bring the production environment online. However, this task is done only during initial implementation testing or during a planned production outage of the site. You perform the graceful failover operation between sites by performing a resource group move. In a true maintenance scenario, you most likely perform this task by stopping the cluster on the local standby node first. Then you stop the cluster on the production node by using the Move Resource Group. You perform the following operations during this move: Releasing the primary online instance of emlecRG at the Austin site – – – – Executes application server stop Unmounts the file systems Varies off the volume group Removes the service IP address Releasing the secondary online instance of emlecRG at the Miami site. Acquire the emlecRG resource group in the secondary online state at Austin site. Acquire the emlecRG resource group in the online primary state at the Miami site. To move the resource group by using SMIT, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path System Management (C-SPOC) Resource Groups and Applications Move a Resource Group to Another Node / Site Move Resource Groups to Another Site. Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 455 3. 
In the Move a Resource Group to Another Node / Site panel (Figure 14-18), select the ONLINE instance of the emlecRG resource group to be moved. Move a Resource Group to Another Node / Site Move cursor to desired item and press Enter. +--------------------------------------------------------------------------+ | Select Resource Group(s) | | | | Move cursor to desired item and press Enter. Use arrow keys to scroll. | | | | # | | # Resource Group State Node(s) / Site | | # | | emlecRG ONLINE jessica / Austi | | emlecRG ONLINE SECONDARY maddi / Miami | | valhallarg ONLINE krod / Miami | | | | # | | # Resource groups in node or site collocation configuration: | | # Resource Group(s) State Node / Site | | # | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | F1| /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 14-18 Moving the Austin resource group across to site Miami 4. In the Select a Destination Site panel, select the Miami site as shown in Figure 14-19. +--------------------------------------------------------------------------+ | Select a Destination Site | | | | Move cursor to desired item and press Enter. | | | | # *Denotes Originally Configured Primary Site | | Miami | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | F1| /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 14-19 Selecting the site for resource group move 456 IBM PowerHA SystemMirror 7.1 for AIX 5. Verify the information in the final menu and Press Enter. Upon completion of the move, emlecRG is online on the maddi node at the Miami site as shown in Example 14-20. Example 14-20 Resource group status after a move to the Miami site root@maddi# clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------emlecRG ONLINE SECONDARY jessica@Austin OFFLINE bina@Austin ONLINE maddi@Miami valhallarg ONLINE OFFLINE OFFLINE krod@Miami maddi@Miami bina@Austin 6. Repeat the resource group move to move it back to its original primary site and node to return to the original starting state. Attention: In our environment, after the first resource group move between sites, we were unable to move the resource group back without leaving the pick list for the destination site empty. However, we were able to move it back by node, instead of by site. Later in our testing, the by-site option started working, but it moved it to the standby node at the primary site instead of the original primary node. If you encounter similar problems, contact IBM support. 14.5.2 Rolling site failure of the Austin site In this scenario, you perform a rolling site failure of the Austin site by performing the following tasks: 1. 2. 3. 4. Halt the primary production node jessica at the Austin site. Verify that the resource group emlecRG is acquired locally by the bina node. Halt the bina node to produce a site down. Verify that the resource group emlecRG is acquired remotely by the maddi node. To begin, all four nodes are active in the cluster and the resource groups are online on the primary node as shown in Example 14-19 on page 455. 1. On the jessica node, run the reboot -q command. The bina node acquires the emlecRG resource group as shown in Example 14-21. 
Example 14-21 Local node failover within the Austin site root@bina: clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------emlecRG OFFLINE jessica@Austin ONLINE bina@Austin OFFLINE maddi@Miami Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 457 valhallarg ONLINE OFFLINE ONLINE SECONDARY krod@Miami maddi@Miami bina@Austin 2. Run the pairdisplay command (as shown in Example 14-22) to verify that the pairs are still established because the volume group is still active on the primary site. Example 14-22 Pairdisplay status after a local site failover root@bina: pairdisplay -g htcdg01 -IH2 -fe Group htcdg01 htcdg01 htcdg01 htcdg01 PairVol(L/R) (Port#,TID, htcd01(L) (CL1-E-0, 0, htcd01(R) (CL1-B-0, 0, htcd02(L) (CL1-E-0, 0, htcd02(R) (CL1-B-0, 0, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# 10)45306 272.P-VOL PAIR NEVER ,35764 268 28)35764 268.S-VOL PAIR NEVER ,----- 272 11)45306 273.P-VOL PAIR NEVER ,35764 269 29)35764 269.S-VOL PAIR NEVER ,----- 273 M CTG JID AP - - 1 - - - - 1 - - - 3. Upon cluster stabilization, run the reboot -q command on the bina node. The maddi node at the Miami site acquires the emlecRG resource group as shown in Example 14-23. Example 14-23 Hard failover between sites root@maddi: clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------emlecRG OFFLINE jessica@Austin OFFLINE bina@Austin ONLINE maddi@Miami valhallarg ONLINE OFFLINE OFFLINE krod@Miami maddi@Miami bina@Austin 4. Verify that the replicated pairs are now in the suspended state from the command line as shown in Example 14-24. Example 14-24 Pairdisplay status after a hard site failover root@maddi: pairdisplay -g htcdg01 -IH2 -fe Group htcdg01 htcdg01 htcdg01 htcdg01 458 PairVol(L/R) (Port#,TID, htcd01(L) (CL1-B-0, 0, htcd01(R) (CL1-E-0, 0, htcd02(L) (CL1-B-0, 0, htcd02(R) (CL1-E-0, 0, IBM PowerHA SystemMirror 7.1 for AIX LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP 28)35764 268.S-VOL SSUS NEVER ,----- 272 W - - 1 10)45306 272.P-VOL PSUS NEVER ,35764 268 - - - 1 29)35764 269.S-VOL SSUS NEVER ,----- 273 W - - 1 11)45306 273.P-VOL PSUS NEVER ,35764 269 - - - 1 You can also verify that the replicated pairs are in the suspended state by using the Storage Navigator (Figure 14-20). Important: Although our testing resulted in a site_down event, we never lost access to the primary storage subsystem. In a true site failure, including loss of storage, re-establish the replicated pairs, and synchronize them before moving back to the primary site. If you must change the storage LUNs, modify the horcm.conf file, and use the same device group and device names. You do not have to change the cluster resource configuration. Figure 14-20 Pairs suspended after a site failover17 14.5.3 Site re-integration for the Austin site In this scenario, we restart both cluster nodes at the Austin site by using the smitty clstart command. Upon startup of the primary node jessica, the emlecRG resource group is automatically gracefully moved back to and returns to the original starting point as shown in Example 14-19 on page 455. 17 Courtesy of Hitachi Data Systems Chapter 14. 
Disaster recovery using Hitachi TrueCopy and Universal Replicator 459 Important: The resource group settings of the Inter-site Management Policy, also known as the site relationship, dictate the behavior of what occurs upon re-integration of the primary node. Because we chose Prefer Primary Site, the automatic fallback occurred. Initially we are unable to restart the cluster on the jessica node because of verification errors at startup, which are similar to the errors shown in Figure 14-17 on page 453. Of the two possible reasons for these errors, one reason is that we failed to include starting the horcm instance on bootup. The second is reason is that we also had to re-map the copy protected device groups by running the raidscan command again. Important: Always ensure that the horcm instance is running before rejoining a node into the cluster. In some cases, if all instances, cluster nodes, or both have been down, you might need to run the raidscan command again. 14.5.4 Graceful site failover for the Miami site This move scenario starts from the states shown in Example 14-19 on page 455. You repeat the steps from the previous three sections, one section at a time. However these steps are performed to test the asynchronous replication of the Miami site. The following tasks are performed during this move: 1. Release the primary online instance of valhallaRG at the Miami site. – – – – Executes the application server stop. Unmounts the file systems Varies off the volume group Removes the service IP address 2. Release the secondary online instance of valhallaRG at the Austin site. 3. Acquire valhallaRG in the secondary online state at the Miami site. 4. Acquire valhallaRG in the online primary state at the Austin site. Perform the resource group move by using SMIT as follows: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path System Management (C-SPOC) Resource Groups and Applications Move a Resource Group to Another Node / Site Move Resource Groups to Another Site. 3. Select the ONLINE instance of valhallaRG to be moved. 4. Select the Austin site from the pop-up menu. 5. Verify the information in the final menu and press Enter. Upon completion of the move, the valhallaRG resource group is online on the bina node at the Austin site. The resource group is online secondary on the local production krod node at the Miami site as shown in Example 14-25. Example 14-25 Resource group status after moving to the Austin site root@bina: clRGinfo Group Name Group State Node ----------------------------------------------------------------------------emlecRG ONLINE jessica@Austin OFFLINE bina@Austin ONLINE SECONDARY maddi@Miami 460 IBM PowerHA SystemMirror 7.1 for AIX valhallarg ONLINE SECONDARY OFFLINE ONLINE krod@Miami maddi@Miami bina@Austin 6. Repeat these steps to move a resource group back to the original primary krod node at the Miami site. Attention: In our environment, after the first resource group move between sites, we were unable to move the resource group back without leaving the pick list for the destination site empty. However, we were able to move it back by node, instead of by site. Later in our testing, the by-site option started working, but it moved it to the standby node at the primary site instead of the original primary node. If you encounter similar problems, contact IBM support. 14.5.5 Rolling site failure of the Miami site In this scenario, you perform a rolling site failure of the Miami site by performing the following tasks: 1. 2. 3. 4. 
Halt primary production node krod at site Miami Verify resource group valhallaRG is acquired locally by node maddi Halt node maddi to produce a site down Verify resource group valhallaRG is acquired remotely by node bina To begin, all four nodes are active in the cluster, and the resource groups are online on the primary node as shown in Example 14-19 on page 455. Follow these steps: 1. On the krod node, run the reboot -q command. The maddi node brings the valhallaRG resource group online, and the remote bina node maintains the online secondary status as shown in Example 14-26. This time the failover time was noticeably longer, specifically in the fsck portion. The longer amount of time is most likely a symptom of the asynchronous replication. Example 14-26 Local node fallover within the Miami site root@maddi: clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------emlecRG ONLINE jessica@Austin OFFLINE bina@Austin ONLINE SECONDARY maddi@Miami valhallarg OFFLINE ONLINE ONLINE SECONDARY krod@Miami maddi@Miami bina@Austin Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 461 2. Run the pairdisplay command as shown in Example 14-27 to verify that the pairs are still established because the volume group is still active on the primary site. Example 14-27 Status using the pairdisplay command after the local Miami site fallover root@maddi: pairdisplay -fd -g hurdg01 -IH2 -CLI Group PairVol L/R Device_File Seq# LDEV# P/S Status hurdg01 hurd01 L hdisk40 35764 270 P-VOL PAIR hurdg01 hurd01 R hdisk40 45306 274 S-VOL PAIR hurdg01 hurd02 L hdisk41 35764 271 P-VOL PAIR hurdg01 hurd02 R hdisk41 45306 275 S-VOL PAIR Fence Seq# P-LDEV# ASYNC 45306 274 ASYNC 270 ASYNC 45306 275 ASYNC 271 M - 3. Upon cluster stabilization, run the reboot -q command on the maddi node. The bina node at the Austin sites acquires the valhallaRG resource group as shown in Example 14-28. Example 14-28 Hard failover from Miami site to Austin site root@bina: clRGinfo ----------------------------------------------------------------------------Group Name Group State Node ----------------------------------------------------------------------------emlecRG ONLINE jessica@Austin OFFLINE bina@Austin OFFLINE maddi@Miami valhallarg OFFLINE OFFLINE ONLINE krod@Miami maddi@Miami bina@Austin Important: Although our testing resulted in a site_down event, we never lost access to the primary storage subsystem. In a true site failure, including loss of storage, re-establish the replicated pairs, and synchronize them before moving back to the primary site. If you must change the storage LUNs, modify the horcm.conf file, and use the same device group and device names. You do not have to change the cluster resource configuration. 14.5.6 Site re-integration for the Miami site In this scenario, we restart both cluster nodes at the Miami site by using the smitty clstart command. Upon startup of the primary node krod, the valhallaRG resource group is automatically gracefully moved back to and returns to the original starting point as shown in Example 14-19 on page 455. Important: The resource group settings of the Inter-site Management Policy, also known as the site relationship, dictate the behavior of what occurs upon re-integration of the primary node. Because we chose Prefer Primary Site policy, the automatic fallback occurred. 
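Re-integration is smoother when the HORCM instance is already running on the node that rejoins the cluster. The following is a hedged sketch of checks to run before smitty clstart; the horcmd_02 process name and instance number 2 are assumptions that match this scenario, so adjust them for your environment:
# ps -ef | grep horcmd_02 | grep -v grep || /etc/horcmstart.sh 2
(starts the instance only if it is not already running)
# lsdev -Cc disk | grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst
(re-maps the hdisks to their device groups)
# pairdisplay -g hurdg01 -IH2 -fe
(confirms the pair status before rejoining the node)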
Initially we are unable to restart the cluster on the jessica node because of verification errors at startup, which are similar to the errors shown in Figure 14-17 on page 453. Of the two possible reasons for these errors, the first reason is that we failed to include starting the horcm instance on bootup. The second is reason is that we also had to re-map the copy protected device groups by running the raidscan command again. 462 IBM PowerHA SystemMirror 7.1 for AIX Important: Always ensure that the horcm instance is running before rejoining a node into the cluster. In some cases, if all instances, cluster nodes, or both have been down, you might need to run the raidscan command again. 14.6 LVM administration of TrueCopy/HUR replicated pairs This topic explains common scenarios for adding additional storage to an existing replicated environment using Hitachi TrueCopy/HUR. In this scenario, you only work with the Austin site and the emlecRG resource group in a TrueCopy synchronous replication. Overall the steps are the same for both types of replication. The difference is the initial pair creation. You perform the following tasks: Adding LUN pairs to an existing volume group Adding a new logical volume Increasing the size of an existing file system Adding a LUN pair to a new volume group Important: This topic does not explain how to dynamically expand a volume through Hitachi Logical Unit Size Expansion (LUSE) because this option is not supported. 14.6.1 Adding LUN pairs to an existing volume group In this task, you assign a new LUN to each site as you did in 14.4.1, “Assigning LUNs to the hosts (host groups)” on page 429. Table 14-2 shows a summary of the LUNs that are used. Before continuing, the LUNs must already be established in a paired relationship, and the LUNs or hdisk must be available on the appropriate cluster nodes. Table 14-2 Summary of the LUNs implemented Austin - Hitachi USPV - 45306 Miami - Hitachi USPVM - 35764 Port CL1-E Port CL-1B CU 01 CU 01 LUN 000E LUN 001B LDEV 01:14 LDEV 01:1F jessica hdisk# hdisk42 krod hdisk# hdisk42 bina hdisk# hdisk42 maddi hdisk# hdisk42 Then follow the same steps from of defining new LUNs as follows: 1. Run the cfgmgr command on the primary node jessica. 2. Assign the PVID on the jessica node. chdev -l hdisk42 -a pv=yes 3. Run the pairsplit command on the replicated LUNs. 4. Run the cfgmgr command on each of the remaining three nodes. 5. Verify that the PVID shows up on each node by using the lspv command. 6. Run the pairresync command on the replicated LUNs. Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 463 7. Shut down the horcm2 instance on each node: /HORCM/usr/bin/horcmshutdown.sh 2 8. Edit the /etc/horcm2.conf file on each node as appropriate for each site: – The krod and maddi nodes on the Miami site added the following new line: htcdg01 htcd03 35764 01:1F – The jessica and bina nodes on the Austin site added the following new line: htcdg01 htcd03 45306 01:14 9. Restart horcm2 instance on each node: /HORCM/usr/bin/horcmstart.sh 2 10.Map the devices and device group on any node: lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst We ran this command on the jessica node. 11.Verify that the htcgd01 device group pairs are now showing the new pairs, which consist of hdisk42 on each system as shown in Example 14-29. 
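Steps 7 through 10 must be repeated on every node, so they lend themselves to a short per-node sequence. The following is only an illustrative sketch for the Austin nodes; the htcd03 line is the one added in step 8, and the horcm configuration file must not be edited while the instance is running:
# /HORCM/usr/bin/horcmshutdown.sh 2
(stops instance 2 before the file is edited)
# vi /etc/horcm2.conf
(add the line "htcdg01  htcd03  45306  01:14" to the HORCM_LDEV section)
# /HORCM/usr/bin/horcmstart.sh 2
(restarts the instance)
# lsdev -Cc disk | grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst
(re-maps the hdisks to the device groups)
On the Miami nodes, the corresponding line uses serial number 35764 and LDEV 01:1F, as described in step 8.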
Example 14-29 New LUN pairs in the htcgd01 device group root@jessica: pairdisplay -fd -g htcdg01 -IH2 -CLI Group PairVol L/R Device_File Seq# LDEV# P/S Status htcdg01 htcd01 L hdisk38 45306 272 P-VOL PAIR htcdg01 htcd01 R hdisk38 35764 268 S-VOL PAIR htcdg01 htcd02 L hdisk39 45306 273 P-VOL PAIR htcdg01 htcd02 R hdisk39 35764 269 S-VOL PAIR htcdg01 htcd03 L hdisk42 45306 276 P-VOL PAIR htcdg01 htcd03 R hdisk42 35764 287 S-VOL PAIR Fence Seq# P-LDEV# NEVER 35764 268 NEVER 272 NEVER 35764 269 NEVER 273 NEVER 35764 287 NEVER 276 M - You are now ready to use C-SPOC to add the new disk into the volume group: Important: You cannot use C-SPOC for the following LVM operations to configure nodes at the remote site that contain the target volume: Creating a volume group Operations that require nodes at the target site to write to the target volumes For example, changing the file system size, changing the mount point, or adding LVM mirrors cause an error message in C-SPOC. However, nodes on the same site as the source volumes can successfully perform these tasks. The changes are then propagated to the other site by using a lazy update. For C-SPOC operations to work on all other LVM operations, perform all C-SPOC operations with the (TrueCopy/HUR) volume pairs in the Synchronized or Consistent states or the cluster ACTIVE on all nodes. 1. From the command line, type the smitty cl_admin command. 2. In SMIT, select the path System Management (C-SPOC) Storage Volume Groups Add a Volume to a Volume Group 3. Select the volume group truesyncvg from the pop-up menu. 464 IBM PowerHA SystemMirror 7.1 for AIX 4. Select hdisk42 as shown in Figure 14-21. Set Characteristics of a Volume Group Move cursor to desired item and press Enter. Add a Volume to a Volume Group Change/Show characteristics of a Volume Group Remove a Volume from a Volume Group +--------------------------------------------------------------------------+ | Physical Volume Names | | | | Move cursor to desired item and press Enter. | | | | 000a621aaf47ce83 ( hdisk2 on nodes bina,jessica ) | | 000a621aaf47ce83 ( hdisk3 on nodes krod,maddi ) | | 000cf1da43e72fc2 ( hdisk5 on nodes bina,jessica ) | | 000cf1da43e72fc2 ( hdisk6 on nodes krod,maddi ) | | 00cb14ce74090ef3 ( hdisk42 on all selected nodes ) | | 00cb14ceb0f5bd25 ( hdisk4 on nodes bina,jessica ) | | 00cb14ceb0f5bd25 ( hdisk14 on nodes krod,maddi ) | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | F1| /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 14-21 Selecting a disk to add to the volume group 5. Verify the menu information, as shown in Figure 14-22, and press Enter. Add a Volume to a Volume Group Type or select values in entry fields. Press Enter AFTER making all desired changes. VOLUME GROUP name Resource Group Name Node List Reference node VOLUME names [Entry Fields] truesyncvg emlecRG bina,jessica,krod,mad> bina hdisk42 Figure 14-22 Adding a volume to a volume group The krod node does not need the volume group because it is not a member of the resource group. However, we started with all four nodes seeing all volume groups and decided to leave the configuration that way. This way we have additional flexibility later if we need to change the cluster configuration to allow the krod node to take over as a last resort. Chapter 14. 
Disaster recovery using Hitachi TrueCopy and Universal Replicator 465 Upon completion of the C-SPOC operation, all four nodes now have the new disk as a member of the volume group as shown in Example 14-30. Example 14-30 New disk added to the volume group on all nodes root@jessica: lspv |grep truesyncvg hdisk38 00cb14ce564c3f44 hdisk39 00cb14ce564c40fb hdisk42 00cb14ce74090ef3 truesyncvg truesyncvg truesyncvg root@bina: lspv |grep truesyncvg hdisk38 00cb14ce564c3f44 hdisk39 00cb14ce564c40fb hdisk42 00cb14ce74090ef3 truesyncvg truesyncvg truesyncvg root@krod: lspv |grep truesyncvg hdisk38 00cb14ce564c3f44 hdisk39 00cb14ce564c40fb hdisk42 00cb14ce74090ef3 truesyncvg truesyncvg truesyncvg root@maddi: lspv |grep truesyncvg hdisk38 00cb14ce564c3f44 hdisk39 00cb14ce564c40fb truesyncvg truesyncvg hdisk42 truesyncvg 00cb14ce74090ef3 active active active We do not need to synchronize the cluster because all of these changes are made to an existing volume group. However, you might want to run the cl_verify_tc_config command to verify the resources replicated correctly. 14.6.2 Adding a new logical volume To perform this task, again you use C-SPOC, which updates the local nodes within the site. For the remote site, when a failover occurs, the lazy update process updates the volume group information as needed. This process also adds a bit of extra time to the failover time. To add a new logical volume: 1. From the command line, type the smitty cl_admin command. 2. In SMIT, select the path System Management (C-SPOC) Storage Logical Volumes Add a Logical Volume. 3. Select the truesyncvg volume group from the pop-up menu. 466 IBM PowerHA SystemMirror 7.1 for AIX 4. Choose the newly added disk hdisk42 as shown in Figure 14-23. Logical Volumes Move cursor to desired item and press Enter. List All Logical Volumes by Volume Group Add a Logical Volume Show Characteristics of a Logical Volume Set Characteristics of a Logical Volume +--------------------------------------------------------------------------+ | Physical Volume Names | | | | Move cursor to desired item and press F7. | | ONE OR MORE items can be selected. | | Press Enter AFTER making all selections. | | | | Auto-select | | jessica hdisk38 | | jessica hdisk39 | | jessica hdisk42 | | | | F1=Help F2=Refresh F3=Cancel | | F7=Select F8=Image F10=Exit | F1| Enter=Do /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 14-23 Selecting a disk for new logical volume creation 5. Complete the information in the final menu and press Enter. We added a new logical volume, named micah, which consists of 50 logical partitions (LPARs) and selected a type of raw. We accepted the default values for all other fields as shown in Figure 14-24. Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] Resource Group Name VOLUME GROUP name Node List Reference node * Number of LOGICAL PARTITIONS PHYSICAL VOLUME names Logical volume NAME Logical volume TYPE POSITION on physical volume RANGE of physical volumes MAXIMUM NUMBER of PHYSICAL VOLUMES to use for allocation Number of COPIES of each logical [Entry Fields] emlecRG truesyncvg bina,jessica,krod,mad> jessica [50] # hdisk42 [micah] [raw] + outer_middle + minimum + [] # 1 + Figure 14-24 Defining a new logical volume Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 467 6. Upon completion of the C-SPOC operation, verify that the new logical was created locally on the jessica node as shown Example 14-31. 
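In addition to the verification shown in Example 14-31, a few command-line checks can be run on the jessica node. This is a sketch using the names from this scenario; remember that the Miami-site copies of the volume group are refreshed by lazy update, so the new logical volume is not expected to appear on the remote nodes until a fallover or a manual volume group update occurs.

   # On jessica, the node that currently owns truesyncvg
   lsvg -l truesyncvg                     # the new logical volume micah is listed with type raw
   lslv micah                             # show the logical volume attributes

   # Confirm that the underlying pairs are still synchronized
   pairdisplay -fd -g htcdg01 -IH2 -CLI   # all pairs should show a status of PAIR

   # Optionally, re-verify the TrueCopy/HUR replicated resources
   cl_verify_tc_config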
Example 14-31 Newly created logical volume root@jessica: lsvg -l truesyncvg truesyncvg: LV NAME TYPE LPs oreolv jfs2 125 majorlv jfs2 125 truefsloglv jfs2log 1 micah raw 50 PPs 125 125 1 50 PVs 1 1 1 1 LV STATE closed/syncd closed/syncd closed/syncd closed/syncd MOUNT POINT /oreofs /majorfs N/A N/A 14.6.3 Increasing the size of an existing file system To perform this task, again you use C-SPOC, which updates the local nodes within the site. For the remote site, when a failover occurs, the lazy update process updates the volume group information as needed. This process also adds a bit of extra time to the failover time. To increase the size of an existing file system, follow these steps: 1. From the command line, type the smitty cl_admin command. 2. In SMIT, select the path System Management (C-SPOC) Storage File Systems Change / Show Characteristics of a File System. 3. Select the oreofs file system from the pop-up menu. 4. Complete the information in the final menu as desired and press Enter. In this scenario, we roughly tripled the size of the file system from 500 MB (125 LPARs), as shown in Example 14-31, to 1536 MB as shown in Figure 14-25. Change/Show Characteristics of a Enhanced Journaled File System Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] Volume group name Resource Group Name * Node Names * File system name NEW mount point SIZE of file system Unit Size Number of units Mount GROUP Mount AUTOMATICALLY at system restart? PERMISSIONS Mount OPTIONS Figure 14-25 Changing the file system size 468 IBM PowerHA SystemMirror 7.1 for AIX [Entry Fields] truesyncvg emlecRG krod,maddi,bina,jessi> /oreofs [/oreofs] M [1536] [] no read/write [] / + # + + 5. Upon completion of the C-SPOC operation, verify the new file system size locally on the jessica node as shown in Example 14-32. Example 14-32 Newly increased file system size root@jessica: lsvg -l truesyncvg truesyncvg: LV NAME TYPE LPs oreolv jfs2 384 majorlv jfs2 125 truefsloglv jfs2log 1 micah raw 50 PPs 384 125 1 50 PVs 1 1 1 1 LV STATE closed/syncd closed/syncd closed/syncd closed/syncd MOUNT POINT /oreofs /majorfs N/A N/A You do not need to synchronize the cluster because all of these changes are made to an existing volume group. However, you might want to make sure that the replicated resources verify correctly. Use the cl_verify_tc_config command first to isolate the replicated resources specifically. Testing failover after making the LVM changes Because you do not know if the cluster is going to work when needed, repeat the steps from 14.5.2, “Rolling site failure of the Austin site” on page 457. The new logical volume micah and the additional space on /oreofs show up on each node. However, there is a noticeable difference in the total time involved during the site failover when the lazy update was performed to update the volume group changes. 14.6.4 Adding a LUN pair to a new volume group The steps for adding a new volume are the same as the steps in 14.6.1, “Adding LUN pairs to an existing volume group” on page 463. The differences are that you are creating a volume group, which is required to add a new volume group into a resource group. For completeness, the initial steps are documented here along with an overview of the new LUNs to be used: 1. Run the cfgmgr command on the primary node jessica. 2. Assign the PVID on the jessica node: chdev -l hdisk43 -a pv=yes 3. Run the pairsplit command on the replicated LUNs. 4. Run the cfgmgr command on each of the remaining three nodes. 5. 
Verify that the PVID shows up on each node by using the lspv command. 6. Run the pairresync command on the replicated LUNs. 7. Shut down the horcm2 instance on each node: /HORCM/usr/bin/horcmshutdown.sh 2 8. Edit the /etc/horcm2.conf file on each node as appropriate for each site: – On the Miami site, the krod and maddi nodes added the following new line: htcdg01 htcd04 45306 00:20 – On the Austin site, the jessica and bina nodes added the following new line: htcdg01 htcd04 35764 00:0A 9. Restart the horcm2 instance on each node: /HORCM/usr/bin/horcmstart.sh 2 Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 469 10.Map the devices and device group on any node. We ran the raidscan command on the jessica node. See Table 14-3 for additional configuration details. lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst Table 14-3 Details on the Austin and Miami LUNs Austin - Hitachi USPV - 45306 Miami - Hitachi USPVM - 35764 Port CL1-E Port CL-1B CU 00 CU 00 LUN 000F LUN 0021 LDEV 00:20 LDEV 00:0A jessica hdisk# hdisk43 krod hdisk# hdisk43 bina hdisk# hdisk43 maddi hdisk# hdisk43 11.Verify that the htcgd01 device group pairs are now showing the new pairs that consist of hdisk42 on each system as shown in Example 14-33. Example 14-33 New LUN pairs add to htcgd01 device group root@jessica: pairdisplay -fd -g htcdg01 -IH2 -CLI Group PairVol L/R Device_File Seq# LDEV# P/S Status htcdg01 htcd01 L hdisk38 45306 272 P-VOL PAIR htcdg01 htcd01 R hdisk38 35764 268 S-VOL PAIR htcdg01 htcd02 L hdisk39 45306 273 P-VOL PAIR htcdg01 htcd02 R hdisk39 35764 269 S-VOL PAIR htcdg01 htcd04 L hdisk43 45306 32 P-VOL PAIR htcdg01 htcd04 R hdisk43 35764 10 S-VOL PAIR Fence Seq# P-LDEV# NEVER 35764 268 NEVER 272 NEVER 35764 269 NEVER 273 NEVER 35764 10 NEVER 32 You are now ready to use C-SPOC to create a volume group: 1. From the command line, type the smitty cl_admin command. 2. In SMIT, select the path System Management (C-SPOC) Storage Volume Groups Create a Volume to a Volume Group. 470 IBM PowerHA SystemMirror 7.1 for AIX M - 3. In the Node Names panel, select the specific nodes. We chose all four as shown in Figure 14-26. Volume Groups Move cursor to desired item and press Enter. List All Volume Groups Create a Volume Group Create a Volume Group with Data Path Devices +--------------------------------------------------------------------------+ | Node Names | | | | Move cursor to desired item and press F7. | | ONE OR MORE items can be selected. | | Press Enter AFTER making all selections. | | | | > bina | | > jessica | | > krod | | > maddi | | | | F1=Help F2=Refresh F3=Cancel | | F7=Select F8=Image F10=Exit | F1| Enter=Do /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 14-26 Selecting a volume group node Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 471 4. In the Physical Volume Names panel (Figure 14-27), select hdisk43. Volume Groups Move cursor to desired item and press Enter. List All Volume Groups +--------------------------------------------------------------------------+ | Physical Volume Names | | | | Move cursor to desired item and press F7. | | ONE OR MORE items can be selected. | | Press Enter AFTER making all selections. 
| | | | 000a621aaf47ce83 ( hdisk2 on nodes bina,jessica ) | | 000a621aaf47ce83 ( hdisk3 on nodes krod,maddi ) | | 000cf1da43e72fc2 ( hdisk5 on nodes bina,jessica ) | | 000cf1da43e72fc2 ( hdisk6 on nodes krod,maddi ) | | 00cb14ce75bab41a ( hdisk43 on all selected nodes ) | | 00cb14ceb0f5bd25 ( hdisk4 on nodes bina,jessica ) | | 00cb14ceb0f5bd25 ( hdisk14 on nodes krod,maddi ) | | | | F1=Help F2=Refresh F3=Cancel | | F7=Select F8=Image F10=Exit | F1| Enter=Do /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 14-27 Selecting an hdisk for a new volume group 472 IBM PowerHA SystemMirror 7.1 for AIX 5. In the Volume Group Type panel, select the volume group type. We chose Scalable as shown in Figure 14-28. Volume Groups Move cursor to desired item and press Enter. List All Volume Groups Create a Volume Group Create a Volume Group with Data Path Devices Set Characteristics of a Volume Group +--------------------------------------------------------------------------+ | Volume Group Type | | | | Move cursor to desired item and press Enter. | | | | Legacy | | Original | | Big | | Scalable | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do | F1| /=Find n=Find Next | F9+--------------------------------------------------------------------------+ Figure 14-28 Selecting the volume group type for a new volume group 6. In the Create a Scalable Volume Group panel, select the proper resource group. We chose emlecRG as shown in Figure 14-29. Create a Scalable Volume Group Type or select values in entry fields. Press Enter AFTER making all desired changes. [TOP] Node Names Resource Group Name PVID VOLUME GROUP name Physical partition SIZE in megabytes Volume group MAJOR NUMBER Enable Cross-Site LVM Mirroring Verification Enable Fast Disk Takeover or Concurrent Access Volume Group Type Maximum Physical Partitions in units of 1024 Maximum Number of Logical Volumes [Entry Fields] bina,jessica,krod,mad> [emlecRG] 00cb14ce75bab41a [truetarahvg] 4 [58] false no Scalable 32 256 + + # + + + + Figure 14-29 Create a volume group final C-SPOC SMIT menu 7. Choose a volume group name. We chose truetarahvg. Press Enter. Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 473 8. Verify that the volume group is successfully created, which we do on all four nodes as shown in Example 14-34. Example 14-34 Newly created volume group on all nodes root@jessica: lspv |grep truetarahvg hdisk43 00cb14ce75bab41a truetarahvg root@bina: lspv |grep truetarahvg hdisk43 00cb14ce75bab41a truetarahvg root@krod: lspv |grep truetarahvg hdisk43 00cb14ce75bab41a truetarahvg root@maddi: lspv |grep truetarahvg hdisk43 00cb14ce75bab41a truetarahvg When creating the volume group, the volume group is automatically added to the resource group as shown in Example 14-35. However, we do not have to change the resource group any further, because the new disk and device are added to the same device group and TrueCopy/HUR replicated resource. Example 14-35 Newly added volume group also added to the resource group Resource Group Name Participating Node Name(s) Startup Policy Fallover Policy Fallback Policy Site Relationship Node Priority Service IP Label Volume Groups Hitachi TrueCopy Replicated Resources emlecRG jessica bina maddi Online On Home Node Only Fallover To Next Priority Node Never Fallback Prefer Primary Site service_1 truesyncvg truetarahvg truelee 9. 
Repeat the steps in 14.6.2, “Adding a new logical volume” on page 466, to create a new logical volume, named tarahlv on the newly created volume group truetarahvg. Example 14-36 shows the new logical volume. Example 14-36 New logical volume on newly added volume group root@jessica: lsvg -l truetarahvg truetarahvg: LV NAME TYPE LPs tarahlv raw 25 PPs 25 PVs 1 LV STATE closed/syncd MOUNT POINT N/A 10.Manually run the cl_verify_tc_config command to verify that the new addition of the replicated resources is complete. 474 IBM PowerHA SystemMirror 7.1 for AIX Important: During our testing, we encountered a defect after the second volume group was added to the resource group. The cl_verify_tc_config command produced the following error messages: cl_verify_tc_config: ERROR - Disk hdisk38 included in Device Group htcdg01 does not match any hdisk in Volume Group truetarahvg. cl_verify_tc_config: ERROR - Disk hdisk39 included in Device Group htcdg01 does not match any hdisk in Volume Group truetarahvg. cl_verify_tc_config: ERROR - Disk hdisk42 included in Device Group htcdg01 does not match any hdisk in Volume Group truetarahvg. Errors found verifying the HACMP TRUECOPY/HUR configuration. Status=3 These results incorrectly imply a one to one relationship between the device group/replicated resource and the volume group, which is not intended. To work around this problem, ensure that the cluster is down, do a forced synchronization, and then start the cluster but ignore the verification errors. Usually performing both a forced synchronization and then starting the cluster ignoring errors is not recommended. Contact IBM support to see if a fix is available. Synchronize the resource group change to include the new volume that you just added. Usually you can perform this task within a running cluster. However, because of the defect mentioned in the previous Important box, we had to have the cluster down to synchronize it. To perform this task, follow these steps: 1. From the command line, type the smitty hacmp command. 2. In SMIT, select the path Extended Configuration Extended Verification and Synchronization and Verification 3. In the HACMP Verification and Synchronization display (Figure 14-30), for Force synchronization if verification fails, select Yes. HACMP Verification and Synchronization Type or select values in entry fields. Press Enter AFTER making all desired changes. * Verify, Synchronize or Both * Automatically correct errors found during verification? * Force synchronization if verification fails? * Verify changes only? * Logging F1=Help F5=Reset F2=Refresh F6=Command [Entry Fields] [Both] [No] + + [Yes] [No] [Standard] + + + F3=Cancel F7=Edit F4=List F8=Image Figure 14-30 Extended Verification and Synchronization SMIT menu 4. Verify the information is correct, and press Enter. Upon completion, the cluster configuration is in sync and can now be tested. 5. Repeat the steps for a rolling system failure as explained in 14.5.2, “Rolling site failure of the Austin site” on page 457. In this scenario, the tests are successful. Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator 475 Testing failover after adding a new volume group Because you do not know if the cluster is going to work when needed, repeat the steps of a rolling site failure as explained in 14.5.2, “Rolling site failure of the Austin site” on page 457. The new volume group truetarahvg and new logical volume tarahlv are displayed on each node. 
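After the rolling site failure test, a few commands on the Miami node that acquired emlecRG (maddi in this scenario) confirm that the takeover picked up the LVM changes. This is a sketch; clRGinfo is the standard PowerHA utility for displaying resource group state, and the volume group and logical volume names are the ones used in this scenario.

   /usr/es/sbin/cluster/utilities/clRGinfo      # confirm that emlecRG is online at the Miami site
   lspv | grep -E "truesyncvg|truetarahvg"      # both replicated volume groups are varied on
   lsvg -l truetarahvg                          # tarahlv is present after the lazy update
   lsvg -l truesyncvg                           # micah and the resized oreolv are also present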
However, there is a noticeable difference in total time involved during the site failover when the lazy update is performed to update the volume group changes. 476 IBM PowerHA SystemMirror 7.1 for AIX A Appendix A. CAA cluster commands This appendix provides a list of the Cluster Aware AIX (CAA) administration commands, and examples of how to use them. The information about these commands has been gathered from the new AIX man pages and placed in this appendix for your reference. This list is not an exhaustive list of all new commands, but focuses on commands that you might come across during the administration of your PowerHA cluster. This appendix includes the following topics: The lscluster command The mkcluster command The rmcluster command The chcluster command The clusterconf command © Copyright IBM Corp. 2011. All rights reserved. 477 The lscluster command The lscluster command lists the cluster configuration information. Syntax lscluster -i [ -n ] | -s | -m | -d | -c Description The lscluster command shows the attributes that are associated with the cluster and the cluster configuration. Flags -i Lists the cluster configuration interfaces on the local node. -n Allows the cluster name to be queried for all interfaces (applicable only with the -i flag). -s Lists the cluster network statistics on the local node. -m Lists the cluster node configuration information. -d Lists the cluster storage interfaces. -c Lists the cluster configuration. Examples To list the cluster configuration for all nodes, enter the following command: lscluster -m To list the cluster statistics for the local node, enter the following command: lscluster -s To list the interface information for the local node, enter the following command: lscluster -i To list the interface information for the cluster, enter the following command: lscluster -i -n mycluster To list the storage interface information for the cluster, enter the following command: lscluster -d To list the cluster configuration, enter the following command: lscluster -c The mkcluster command The mkcluster command creates a cluster. Syntax mkcluster [ -n clustername ] [ -m node[,...] ] -r reposdev [-d shareddisk [,...]] [-s multaddr_local ] [-v ] 478 IBM PowerHA SystemMirror 7.1 for AIX Description The mkcluster command creates a cluster. Each node that is added to the cluster must have common storage area network (SAN) storage devices that are configured and zoned appropriately. The SAN storage devices are used for the cluster repository disk and for any clustered shared disks. (The shared disks that are added to a cluster configuration share the same name across all the nodes in the cluster.) A multicast address is used for cluster communications between the nodes in the cluster. Therefore, if any network considerations must be reviewed before creating a cluster, consult your network systems administrator. Flags -n clustername Sets the name of the local cluster being created. If no name is specified when you first run the mkcluster command, a default of SIRCOL_hostname is used, where hostname is the name (gethostname()) of the local host. -m node[,...] Lists the comma-separated resolvable host names or IP addresses for nodes that are members of the cluster. The local host must be included in the list. If the -m option is not used, the local host is implied, causing a one-node local cluster to be created. 
-r reposdev Specifies the name, such as hdisk10, of the SAN-shared storage device that is used as the central repository for the cluster configuration data. This device must be accessible from all nodes. This device is required to be a minimum of 1 GB in size and backed by a redundant and highly available SAN configuration. This flag is required when you first run the mkcluster command within a Storage Interconnected Resource Collection (SIRCOL), and cannot be used thereafter. -d shareddisk[,...] Specifies a comma-separated list of SAN-shared storage devices, such as hdisk12,hdisk34, to be incorporated into the cluster configuration. These devices are renamed with a cldisk prefix. The same name is assigned to this device on all cluster nodes from which the device is accessible. Specified devices must not be open when the mkcluster command is executed. This flag is used only when you first run the mkcluster command. -s multaddr_local Sets the multicast address of the local cluster that is being created. This address is used for internal communication within the local cluster. If the -s option is not specified when you first run the mkcluster command within a SIRCOL, a multicast address is automatically generated. This flag is used only when you first run the mkcluster command within a SIRCOL. -v Specifies the verbose mode. Examples To create a cluster of one node and use the default values, enter the following command: mkcluster -r hdisk1 The output is a cluster named SIRCOL_myhostname with a single node in the cluster. The multicast address is automatically generated, and no shared disks are created for this cluster. The repository device is set up on hdisk1, and this disk cannot be used by the Appendix A. CAA cluster commands 479 node for any other purpose. The repository device is now dedicated to being the cluster repository disk. To create a multinode cluster, enter the following command: mkcluster -n mycluster -m nodeA,nodeB,nodeC -r hdisk1 -d hdisk10,hdisk11,hdisk12 The output is a cluster of three nodes and uses the default values. The output also creates a cluster with the specified name, and the multicast address is automatically created. Three disks are created as shared clustered disks for this cluster, and these disks share the same name across all the nodes in this cluster. You can run the lspv command to see the new names after the cluster is created. The repository device is set up on hdisk1 and cannot be used by any of the nodes for any other purpose. The repository device is now dedicated to being the cluster repository disk. A volume group is created for the cluster repository disk. These logical volumes are used exclusively by the clustering subsystem. The rmcluster command The rmcluster command removes the cluster configuration. Syntax rmcluster -n name [-f] [-v] Description The rmcluster command removes the cluster configuration. The repository disk and all SAN Volume Controller (SVC) shared disks are released, and the SAN shared disks are re-assigned to a generic hdisk name. The generic hdisk name cannot be the same name that was initially used to add the disk to the cluster. Flags -n name Specifies the name of the cluster to be removed. -f Forces certain errors to be ignored. -v Specifies the verbose. Example To remove the cluster configuration, enter the following command: rmcluster -n mycluster The chcluster command The chcluster command is used to change the cluster configuration. Syntax chcluster [ -n name ] [{ -d | -m } [+|-] name [,....]] ..... 
[ -q ][ -f ][ -v ] Description The chcluster command changes the cluster configuration. With this command, SAN shared disks and nodes can be added and removed from the cluster configuration. 480 IBM PowerHA SystemMirror 7.1 for AIX Flags -d [+|-]shareddisk[,...] Specifies a comma-separated list of shared storage-device names to be added to or removed from a cluster configuration. The new shared disks are renamed with a cldisk prefix. The same name is assigned to this device on all cluster nodes from which the device can be accessed. Deleted devices are re-assigned a generic hdisk name. This newly reassigned hdisk name might not be the same as it was before it was added to the cluster configuration. The shared disks must not be open when the chcluster command is executed. -m [+|-]node[,...] Specifies a comma-separated list of node names to be added or removed from the cluster configuration. -n name Specifies the name of the cluster to be changed. If omitted, the default cluster is used. -q The quick mode option, which performs the changes on the local node only. If this option is used, the other nodes in the cluster configuration are asynchronously contacted and the changes are performed. -f The force option, which causes certain errors to be ignored. -v Verbose mode Examples To add shared disks to the cluster configuration, enter the following command: chcluster -n mycluster -d +hdisk20,+hdisk21 To remove shared disks from the cluster configuration, enter the following command: chcluster -n mycluster -d -hdisk20,-hdisk21 To add nodes to the cluster configuration, enter the following command: chcluster -n mycluster -m +nodeD,+nodeE To remove nodes from the cluster configuration, enter the following command: chcluster -n mycluster -m -nodeD,-nodeE The clusterconf command The clusterconf command is a service utility for administration of a cluster configuration. Syntax clusterconf [ -u [-f ] | -s | -r hdiskN ] [-v ] Description The clusterconf command allows administration of the cluster configuration. A node in a cluster configuration might indicate a status of DOWN (viewable by issuing the lscluster -m command). Alternatively, a node in a cluster might not be displayed in the cluster configuration, and you know the node is part of the cluster configuration (viewable from another node in the cluster by using the lscluster -m command). In these cases, the following flags allow the node to search and read the repository disk and take self-correcting actions. Do not use the clusterconf command option to remove a cluster configuration. Instead, use the rmcluster command for normal removal of the cluster configuration. Appendix A. CAA cluster commands 481 Flags If no flags are specified, the clusterconf command performs a refresh operation by retrieving the cluster repository configuration and performing the necessary actions. The following actions might occur: A cluster node joins a cluster of which the node is a member and for some reason was disconnected from the cluster (either from network or SAN problems) A cluster node might perform a resync with the cluster repository configuration (again from some problems in the network or SAN) A cluster node might leave the cluster configuration if the node was removed from the cluster repository configuration. The clusterconf command is a normal cluster service and is automatically handled during normal operation. 
This following flags are possible for this command: -r hdiskN Has the cluster subsystem read the repository device if you know where the repository disk is (lspv and look for cvg). It causes the node to join the cluster if the node is configured in the repository disk. -s Performs an exhaustive search for a cluster repository disk on all configured hdisk devices. It stops when a cluster repository disk is found. This option searches all disks that are looking for the signature of a repository device. If a disk is found with the signature identifying it as the cluster repository, the search is stopped. If the node finds itself in the cluster configuration on the disk, the node joins the cluster. If the storage network is dirty and multiple repositories are in the storage network (not supported), it stops at the first repository disk. If the node is not in that repository configuration, it does not join the cluster. Use the -v flag to see which disk was found. Then use the other options on the clusterconf command to clean up the storage network until the desired results are achieved. -u Performs the unconfigure operation for the local node. If the node is in the cluster repository configuration on the shared disk to which the other nodes have access, the other nodes in the cluster request this node to rejoin the cluster. The -u option is used when cleanup must be performed on the local node. (The node was removed from the cluster configuration. For some reason, the local node was either down or inaccessible from the network to be removed during normal removal operations such as when the chcluster -m -nodeA command was run). The updates to clean up the environment on the local node are performed by the unconfigure operation. -f The force option, which performs the unconfigure operation and ignores errors. -v Verbose mode. Examples To clean up the local node, the following command cleans up the nodes environment: clusterconf -fu To recover the cluster configuration and start cluster services, enter the following command: clusterconf -r hdisk1 To search for the cluster repository device and join the cluster, enter the following command: clusterconf -s 482 IBM PowerHA SystemMirror 7.1 for AIX B Appendix B. PowerHA SMIT tree This appendix includes the PowerHA v7.1 SMIT tree. Depending on the version of PowerHA that you have installed, you might notice some differences. Note the following explanation to help you understand how to read the tree: The number of right-pointing double quotation marks (») indicates the number of screens that you have to go down in the PowerHA SMIT tree. For example, » » » means that you must page down three screens. The double en dashes (--) are used as a separator between the SMIT text and the SMIT fast path. The parentheses (()) indicate the fast path. » Cluster Nodes and Networks -- (cm_cluster_nodes_networks) » » Initial Cluster Setup (Typical) -- (cm_setup_menu) » » » Setup a Cluster, Nodes and Networks -- (cm_setup_cluster_nodes_networks) » » » Define Repository Disk and Cluster IP Address -- cm_define_repos_ip_addr) » » » What are a repository disk and cluster IP address ? 
-- (cm_whatis_repos_ip_addr) » » Manage the Cluster -- (cm_manage_cluster) » » »PowerHA SystemMirror Configuration -- (cm_show_cluster_top) » » »Remove the Cluster Definition -- (cm_remove_cluster) » » » Snapshot Configuration -- (cm_cfg_snap_menu) » » » » Create a Snapshot of the Cluster Configuration -- (cm_add_snap.dialog) » » » » Change/Show a Snapshot of the Cluster Configuration -- (cm_show_snap.select) » » » » Remove a Snapshot of the Cluster Configuration -- (cm_rm_snap.select) » » » » Restore the Cluster Configuration From a Snapshot -- (cm_apply_snap.select) » » » » Configure a Custom Snapshot Method -- (clsnapshot_custom_menu) » » » » » Add a Custom Snapshot Method -- (clsnapshot_custom_dialog_add) » » » » » Change/Show a Custom Snapshot Method -- (clsnapshot_custom_dialog_cha.select) » » » » » Remove a Custom Snapshot Method -- (clsnapshot_custom_dialog_rem.select) » » Manage Nodes -- (cm_manage_nodes) » » » Show Topology Information by Node -- (cllsnode_menu) » » » » Show All Nodes -- (cllsnode.dialog) » » » » Select a Node to Show -- (cllsnode_select) » » » Add a Node -- (cm_add_node) » » » Change/Show a Node -- (cm_change_show_node) » » » Remove Nodes -- (cm_remove_node) » » » Configure Persistent Node IP Label/Addresses -- (cm_persistent_addresses) © Copyright IBM Corp. 2011. All rights reserved. 483 » » » » Add a Persistent Node IP Label/Address -(cm_add_a_persistent_node_ip_label_address_select) » » » » Change/Show a Persistent Node IP Label/Address -(cm_change_show_a_persistent_node_ip_label_address_select) » » » » Remove a Persistent Node IP Label/Address -(cm_delete_a_persistent_node_ip_label_address_select) » » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync) » » Manage Networks and Network Interfaces -- (cm_manage_networks_interfaces) » » » Networks -- (cm_manage_networks_menu) » » » » Add a Network -- (cm_add_network) » » » » Change/Show a Network -- (cm_change_show_network) » » » » Remove a Network -- (cm_remove_network) » » » Network Interfaces -- (cm_manage_interfaces_menu) » » » » Add a Network Interface -- (cm_add_interfaces) » » » » Change/Show a Network Interface -- (cm_change_show_interfaces) » » » » Remove a Network Interface -- (cm_remove_interfaces) » » » Show Topology Information by Network -- (cllsnw_menu) » » » » Show All Networks -- (cllsnw.dialog) » » » » Select a Network to Show -- (cllsnw_select) » » » Show Topology Information by Network Interface -- (cllsif_menu) » » » » Show All Network Interfaces -- (cllsif.dialog) » » » » Select a Network Interface to Show -- (cllsif_select) » » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync) » » Discover Network Interfaces and Disks -- (cm_discover_nw_interfaces_and_disks) » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync) » Cluster Applications and Resources -- (cm_apps_resources) » » Make Applications Highly Available (Use Smart Assists) -- (clsa) » » Resources -- (cm_resources_menu) » » » Configure User Applications (Scripts and Monitors) -- (cm_user_apps) » » » » Application Controller Scripts -- (cm_app_scripts) » » » » » Add Application Controller Scripts -- (cm_add_app_scripts) » » » » » Change/Show Application Controller Scripts -- (cm_change_show_app_scripts) » » » » » Remove Application Controller Scripts -- (cm_remove_app_scripts) » » » » » What is an "Application Controller" anyway ? 
-- (cm_app_controller_help) » » » » Application Monitors -- (cm_appmon) » » » » » Configure Process Application Monitors -- (cm_cfg_process_appmon) » » » » » » Add a Process Application Monitor -- (cm_add_process_appmon) » » » » » » Change/Show Process Application Monitor -- (cm_change_show_process_appmon) » » » » » » Remove a Process Application Monitor -- (cm_remove_process_appmon) » » » » » Configure Custom Application Monitors -- (cm_cfg_custom_appmon) » » » » » » Add a Custom Application Monitor -- (cm_add_custom_appmon) » » » » » » Change/Show Custom Application Monitor -- (cm_change_show_custom_appmon) » » » » » » Remove a Custom Application Monitor -- (cm_remove_custom_appmon) » » » » Configure Application for Dynamic LPAR and CoD Resources -- (cm_cfg_appondemand) » » » » » Configure Communication Path to HMC -- (cm_cfg_apphmc) » » » » » » Add HMC IP addresses for a node -- (cladd_apphmc.dialog) » » » » » » Change/Show HMC IP addresses for a node -- (clch_apphmc.select) » » » » » » Remove HMC IP addresses for a node -- (clrm_apphmc.select) » » » » » Configure Dynamic LPAR and CoD Resources for Applications -- (cm_cfg_appdlpar) » » » » » » Add Dynamic LPAR and CoD Resources for Applications -- (cm_add_appdlpar) » » » » » » Change/Show Dynamic LPAR and CoD Resources for Applications -(cm_change_show_appdlpar) » » » » » » Remove Dynamic LPAR and CoD Resources for Applications -- (cm_remove_appdlpar) » » » » Show Cluster Applications -- (cldisp.dialog) » » » Configure Service IP Labels/Addresses -- (cm_service_ip) » » » » Add a Service IP Label/Address -- (cm_add_a_service_ip_label_address.select_net) » » » » Change/Show a Service IP Label/Address -- (cm_change_service_ip.select) » » » » Remove Service IP Label(s)/Address(es) -- (cm_delete_service_ip.select) » » » » Configure Service IP Label/Address Distribution Preferences -- 484 IBM PowerHA SystemMirror 7.1 for AIX (cm_change_show_service_ip_distribution_preference_select) » » » Configure Tape Resources -- (cm_cfg_tape) » » » » Add a Tape Resource -- (cm_add_tape) » » » » Change/Show a Tape Resource -- (cm_change_tape) » » » » Remove a Tape Resource -- (cm_remove_tape) » » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync) » » Resource Groups -- (cm_resource_groups) » » » Add a Resource Group -- (cm_add_resource_group) » » » Change/Show Nodes and Policies for a Resource Group -(cm_change_show_rg_nodes_policies) » » » Change/Show Resources and Attributes for a Resource Group -(cm_change_show_rg_resources) » » » Remove a Resource Group -- (cm_remove_resource_group) » » » Configure Resource Group Run-Time Policies -(cm_config_resource_group_run-time_policies_menu_dmn) » » » » Configure Dependencies between Resource Groups -- (cm_rg_dependencies_menu) » » » » » Configure Parent/Child Dependency -- (cm_rg_dependencies) » » » » » » Add Parent/Child Dependency between Resource Groups -(cm_rg_dependencies add.select) » » » » » » Change/Show Parent/Child Dependency between Resource Groups -(cm_rg_dependencies ch.select) » » » » » » Remove Parent/Child Dependency between Resource Groups -(cm_rg_dependencies rm.select) » » » » » » Display All Parent/Child Resource Group Dependencies -(cm_rg_dependencies display.select) » » » » » Configure Start After Resource Group Dependency -(cm_rg_dependencies_startafter_main_menu) » » » » » » Add Start After Resource Group Dependency -- (cm_rg_dependencies add.select startafter) » » » » » » Change/Show Start After Resource Group Dependency -(cm_rg_dependencies ch.select startafter) » » » » 
» » Remove Start After Resource Group Dependency -(cm_rg_dependencies rm.select startafter) » » » » » » Display Start After Resource Group Dependencies -(cm_rg_dependencies display.select startafter) » » » » » Configure Stop After Resource Group Dependency -(cm_rg_dependencies_stopafter_main_menu) » » » » » » Add Stop After Resource Group Dependency -(cm_rg_dependencies add.select stopafter) » » » » » » Change/Show Stop After Resource Group Dependency -(cm_rg_dependencies ch.select stopafter) » » » » » » Remove Stop After Resource Group Dependency -(cm_rg_dependencies rm.select stopafter) » » » » » » Display Stop After Resource Group Dependencies -(cm_rg_dependencies display.select stopafter) » » » » » Configure Online on the Same Node Dependency -- (cm_rg_osn_dependencies) » » » » » » Add Online on the Same Node Dependency Between Resource Groups -(cm_rg_osn_dependencies add.dialog) » » » » » » Change/Show Online on the Same Node Dependency Between Resource Groups -(cm_rg_osn_dependencies ch.select) » » » » » » Remove Online on the Same Node Dependency Between Resource -(cm_rg_osn_dependencies rm.select) » » » » » Configure Online on Different Nodes Dependency -- (cm_rg_odn_dependencies.dialog) » » » » Configure Resource Group Processing Ordering -- (cm_processing_order) » » » » Configure PowerHA SystemMirror Workload Manager Parameters -- (cm_cfg_wlm_runtime) » » » » Configure Delayed Fallback Timer Policies -- (cm_timer_menu) » » » » » Add a Delayed Fallback Timer Policy -- (cm_timer_add.select) » » » » » Change/Show a Delayed Fallback Timer Policy -- (cm_timer_update.select) » » » » » Remove a Delayed Fallback Timer Policy -- (cm_timer_remove.select) » » » » Configure Settling Time for Resource Groups -- (cm_settling_timer_menu) » » » Show All Resources by Node or Resource Group -- Appendix B. PowerHA SMIT tree 485 (cm_show_all_resources_by_node_or_resource_group_menu_dmn » » » » Show Resource Information by Node -- (cllsres.select) » » » » Show Resource Information by Resource Group -- (clshowres.select) » » » » Show Current State of Applications and Resource Groups -(cm_show_current_state_application_resource_group_menu_dwn) » » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync) » » » What is a "Resource Group" anyway ? 
-- (cm_resource_group_help) » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync) » System Management (C-SPOC) -- (cm_system_management_cspoc_menu_dmn) » » Storage -- (cl_lvm) » » » Volume Groups -- (cl_vg) » » » » List All Volume Groups -- (cl_lsvgA) » » » » Create a Volume Group -- (cl_createvg) » » » » Create a Volume Group with Data Path Devices -- (cl_createvpathvg) » » » » Set Characteristics of a Volume Group -- (cl_vgsc) » » » » » Add a Volume to a Volume Group -- (cl_extendvg) » » » » » Change/Show characteristics of a Volume Group -- (cl_chshsvg) » » » » » Remove a Volume from a Volume Group -- (cl_reducevg) » » » » » Enable/Disable a Volume Group for Cross-Site LVM Mirroring Verification -(hacmp_sm_lv_svg_sc_ed) » » » » Enable a Volume Group for Fast Disk Takeover or Concurrent Access -- (cl_vgforfdto) » » » » Import a Volume Group -- (cl_importvg) » » » » Mirror a Volume Group -- (cl_mirrorvg) » » » » Unmirror a Volume Group -- (cl_unmirrorvg) » » » » Manage Critical Volume Groups -- (cl_manage_critical_vgs) » » » » » Mark a Volume Group as Critical -- (cl_mark_critical_vg.select) » » » » » Show all Critical volume groups -- (cl_show_critical_vgs) » » » » » Mark a Volume Group as non-Critical -- (cl_mark_noncritical_vg.select) » » » » » Configure failure actions for Critical Volume Groups -- (cl_set_critical_vg_response) » » » » Synchronize LVM Mirrors -- (cl_syncvg) » » » » » Synchronize by Volume Group -- (cl_syncvg_vg) » » » » » Synchronize by Logical Volume -- (cl_syncvg_lv) » » » » Synchronize a Volume Group Definition -- (cl_updatevg) » » » Logical Volumes -- (cl_lv) » » » » List All Logical Volumes by Volume Group -- (cl_lslv0) » » » » Add a Logical Volume -- (cl_mklv) » » » » Show Characteristics of a Logical Volume -- (cl_lslv) » » » » Set Characteristics of a Logical Volume -- (cl_lvsc) » » » » » Rename a Logical Volume -- (cl_renamelv) » » » » » Increase the Size of a Logical Volume -- (cl_extendlv) » » » » » Add a Copy to a Logical Volume -- (cl_mklvcopy) » » » » » Remove a Copy from a Logical Volume -- (cl_rmlvcopy) » » » » Change a Logical Volume -- (cl_chlv1) » » » » Remove a Logical Volume -- (cl_rmlv1) » » » File Systems -- (cl_fs) » » » » List All File Systems by Volume Group -- (cl_lsfs) » » » » Add a File System -- (cl_mkfs) » » » » Change / Show Characteristics of a File System -- (cl_chfs) » » » » Remove a File System -- (cl_rmfs) » » » Physical Volumes -- (cl_disk_man) » » » » Add a Disk to the Cluster -- (cl_disk_man add nodes) » » » » Remove a Disk From the Cluster -- (cl_disk_man rem nodes) » » » » Cluster Disk Replacement -- (cl_disk_man.replace) » » » » Cluster Data Path Device Management -- (cl_dpath_mgt) » » » » » Display Data Path Device Configuration -- (cl_dpls_cfg.select) » » » » » Display Data Path Device Status -- (cl_dp_stat.select) » » » » » Display Data Path Device Adapter Status -- (cl_dpdadapter_stat.select) » » » » » Define and Configure all Data Path Devices -- (cl_dpdefcfg_all.select) » » » » » Add Paths to Available Data Path Devices -- (cl_dpaddpaths.select) 486 IBM PowerHA SystemMirror 7.1 for AIX » » » » » Configure a Defined Data Path Device -- (cl_dpconfdef.select) » » » » » Remove a Data Path Device -- (cl_dprmvp.select) » » » » » Convert ESS hdisk Device Volume Group to an SDD VPATH Device -(cl_dphd2vp.select) » » » » » Convert SDD VPATH Device Volume Group to an ESS hdisk Device -(cl_dpvp2hd.select) » » » » Configure Disk/Site Locations for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds) » » » » » Add 
Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_ad) » » » » » Change/Show Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_cs) » » » » » Remove Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_rm) » » PowerHA SystemMirror Services -- (cl_cm_startstop_menu) » » » Start Cluster Services -- (clstart) » » » Stop Cluster Services -- (clstop) » » » Show Cluster Services -- (clshowsrv.dialog) » » Communication Interfaces -(cm_hacmp_communication_interface_management_menu_dmn) » » » Configure Communication Interfaces/Devices to the Operating System on a Node -(cm_config_comm_dev_node.select) » » » Update PowerHA SystemMirror Communication Interface with AIX Settings -(cm_update_hacmp_interface_with_aix_settings) » » » Swap IP Addresses between Communication Interfaces -- (cl_swap_adapter) » » » PCI Hot Plug Replace a Network Interface Card --(cl_pcihp) » » Resource Groups and Applications -(cm_hacmp_resource_group_and_application_management_menu) » » » Show the Current State of Applications and Resource Groups -(cm_show_current_state_application_resource_group_menu_dwn) » » » Bring a Resource Group Online -- (cl_resgrp_start.select) » » » Bring a Resource Group Offline -- (cl_resgrp_stop.select) » » » Move Resource Groups to Another Node -- (cl_resgrp_move_node.select) » » » Suspend/Resume Application Monitoring -- (cm_suspend_resume_menu) » » » » Suspend Application Monitoring -- (cm_suspend_appmon.select) » » » » Resume Application Monitoring -- (cm_resume_appmon.select) » » » Application Availability Analysis -- (cl_app_AAA.dialog) » » PowerHA SystemMirror Logs -- (cm_hacmp_log_viewing_and_management_menu_dmn) » » » View/Save/Delete PowerHA SystemMirror Event Summaries -- (cm_dsp_evs) » » » » View Event Summaries -- (cm_show_evs) » » » » Save Event Summaries to a file -- (dspevs.dialog) » » » » Delete Event Summary History -- (cm_del_evs) » » » View Detailed PowerHA SystemMirror Log Files -- (cm_log_menu) » » » » Scan the PowerHA SystemMirror for AIX Scripts log -- (cm_scan_scripts_log_select) » » » » Watch the PowerHA SystemMirror for AIX Scripts log -- (cm_watch_scripts_log.dialog) » » » » Scan the PowerHA SystemMirror for AIX System log -- (cm_scan_syslog.dialog) » » » » Watch the PowerHA SystemMirror for AIX System log -- (cm_watch_syslog.dialog) » » » » Scan the C-SPOC System Log File -- (cl_scan_syslog.dialog) » » » » Watch the C-SPOC System Log File -- (cl_watch_syslog.dialog) » » » Change/Show PowerHA SystemMirror Log File Parameters -- (cm_run_time.select) » » » Change/Show Cluster Manager Log File Parameters -- (cluster_manager_log_param) » » » Change/Show a Cluster Log Directory -- (clusterlog_redir.select) » » » Change All Cluster Logs Directory -- (clusterlog_redirall_cha) » » » Collect Cluster log files for Problem Reporting -- (cm_clsnap_dialog) » » File Collections -- (cm_filecollection_menu) » » » Manage File Collections -- (cm_filecollection_mgt) » » » » Add a File Collection -- (cm_filecollection_add) » » » » Change/Show a File Collection -- (cm_filecollection_ch) » » » » Remove a File Collection -- (cm_filecollection_rm) » » » » Change/Show Automatic Update Time -- (cm_filecollection_time) » » » Manage File in File Collections -- (cm_filesinfilecollection_mgt) » » » » Add Files to a File Collection -- (cm_filesinfilecollection_add) » » » » Remove Files from a File Collection -- (cm_filesfromfilecollection_selectfc) » » » Propagate Files in File Collections -- (cm_filecollection_prop) Appendix B. 
PowerHA SMIT tree 487 » » Security and Users -- (cl_usergroup) » » » PowerHA SystemMirror Cluster Security -- (cm_config_security) » » » » Configure Connection Authentication Mode -- (cm_config_security.connection) » » » » Configure Message Authentication Mode and Key Management -(cm_config_security.message) » » » » » Configure Message Authentication Mode -- (cm_config_security.message_dialog) » » » » » Generate/Distribute a Key -- (cm_config_security.message_key_dialog) » » » » » Enable/Disable Automatic Key Distribution -- (cm_config_security.keydist_message_dialog) » » » » » Activate the new key on all PowerHA SystemMirror cluster node -(cm_config_security.keyrefr_message_dialog) » » » Users in an PowerHA SystemMirror cluster -- (cl_users) » » » » Add a User to the Cluster -- (cl_mkuser) » » » » Change / Show Characteristics of a User in the Cluster -- (cl_chuser) » » » » Remove a User from the Cluster -- (cl_rmuser) » » » » List Users in the Cluster -- (cl_lsuser.hdr) » » » Groups in an PowerHA SystemMirror cluster -- (cl_groups) » » » » List All Groups in the Cluster -- (cl_lsgroup.hdr) » » » » Add a Group to the Cluster -- (cl_mkgroup) » » » » Change / Show Characteristics of a Group in the Cluster -- (cl_chgroup) » » » » Remove a Group from the Cluster -- (cl_rmgroup) » » » Passwords in an PowerHA SystemMirror cluster -- (cl_passwd) » » » » Change a User's Password in the Cluster -- (cl_chpasswd) » » » » Change Current Users Password -- (cl_chuserpasswd) » » » » Manage List of Users Allowed to Change Password -- (cl_manageusers) » » » » List Users Allowed to Change Password -- (cl_listmanageusers) » » » » Modify System Password Utility -- (cl_modpasswdutil) » » Open a SMIT Session on a Node -- (cm_open_a_smit_session_select) » Problem Determination Tools -- (cm_problem_determination_tools_menu_dmn) » » PowerHA SystemMirror Verification -- (cm_hacmp_verification_menu_dmn) » » » Verify Cluster Configuration -- (clverify.dialog) » » » Configure Custom Verification Method -- (clverify_custom_menu) » » » » Add a Custom Verification Method -- (clverify_custom_dialog_add) » » » » Change/Show a Custom Verification Method -- (clverify_custom_dialog_cha.select) » » » » Remove a Custom Verification Method -- (clverify_custom_dialog_rem.select) » » » Automatic Cluster Configuration Monitoring -- (clautover.dialog) » » View Current State -- (cm_view_current_state_menu_dmn) » » PowerHA SystemMirror Log Viewing and Management -(cm_hacmp_log_viewing_and_management_menu_dmn) » » » View/Save/Delete PowerHA SystemMirror Event Summaries -- (cm_dsp_evs) » » » » View Event Summaries -- (cm_show_evs) » » » » Save Event Summaries to a file -- (dspevs.dialog) » » » » Delete Event Summary History -- (cm_del_evs) » » » View Detailed PowerHA SystemMirror Log Files -- (cm_log_menu) » » » » Scan the PowerHA SystemMirror for AIX Scripts log -- (cm_scan_scripts_log_select) » » » » Watch the PowerHA SystemMirror for AIX Scripts log -- (cm_watch_scripts_log.dialog) » » » » Scan the PowerHA SystemMirror for AIX System log -- (cm_scan_syslog.dialog) » » » » Watch the PowerHA SystemMirror for AIX System log -- (cm_watch_syslog.dialog) » » » » Scan the C-SPOC System Log File -- (cl_scan_syslog.dialog) » » » » Watch the C-SPOC System Log File -- (cl_watch_syslog.dialog) » » » Change/Show PowerHA SystemMirror Log File Parameters -- (cm_run_time.select) » » » Change/Show Cluster Manager Log File Parameters -- (cluster_manager_log_param) » » » Change/Show a Cluster Log Directory -- (clusterlog_redir.select) » » » Change All 
Cluster Logs Directory -- (clusterlog_redirall_cha) » » » Collect Cluster log files for Problem Reporting -- (cm_clsnap_dialog) » » Recover From PowerHA SystemMirror Script Failure -- (clrecover.dialog.select) » » Restore PowerHA SystemMirror Configuration Database from Active Configuration -(cm_copy_acd_2dcd.dialog) » » Release Locks Set By Dynamic Reconfiguration -- (cldarelock.dialog) » » Cluster Test Tool -- (hacmp_testtool_menu) 488 IBM PowerHA SystemMirror 7.1 for AIX » » » Execute Automated Test Procedure -- (hacmp_testtool_auto_extended) » » » Execute Custom Test Procedure -- (hacmp_testtool_custom) » » PowerHA SystemMirror Trace Facility -- (cm_trace_menu) » » » Enable/Disable Tracing of PowerHA SystemMirror for AIX daemons -- (tracessys) » » » » Start Trace -- (tracessyson) » » » » Stop Trace -- (tracessysoff) » » » Start/Stop/Report Tracing of PowerHA SystemMirror for AIX Service -- (trace) » » » » START Trace -- (trcstart) » » » » STOP Trace -- (trcstop) » » » » Generate a Trace Report -- (trcrpt) » » » » Manage Event Groups -- (grpmenu) » » » » » List all Event Groups -- (lsgrp) » » » » » Add an Event Group -- (addgrp) » » » » » Change/Show an Event Group -- (chgrp) » » » » » Remove Event Groups -- (delgrp.hdr) » » » » Manage Trace -- (mngtrace) » » » » » Change/Show Default Values -- (cngtrace) » » » » » Reset Original Default Values -- (rstdflts) » » PowerHA SystemMirror Error Notification -- (cm_EN_menu) » » » Configure Automatic Error Notification -- (cm_AEN_menu) » » » » List Error Notify Methods for Cluster Resources -- (cm_aen_list.dialog) » » » » Add Error Notify Methods for Cluster Resources -- (cm_aen_add.dialog) » » » » Remove Error Notify Methods for Cluster Resources -- (cm_aen_delete.dialog) » » » Add a Notify Method -- (cm_add_notifymeth.dialog) » » » Change/Show a Notify Method -- (cm_change_notifymeth_select) » » » Remove a Notify Method -- (cm_del_notifymeth_select) » » » Emulate Error Log Entry -- (show_err_emulate.select) » » Stop RSCT Service -- (cm_manage_rsct_stop.dialog) » » AIX Tracing for Cluster Resources -- (cm_trc_menu) » » » Enable AIX Tracing for Cluster Resources -- (cm_trc_enable.select) » » » Disable AIX Tracing for Cluster Resources -- (cm_trc_disable.dialog) » » » Manage Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_man_cmdgrp_menu) » » » » List Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_ls_cmdgrp.dialog) » » » » Add a Command Group for AIX Tracing for Cluster Resources -- (cm_trc_add_cmdgrp.select) » » » » Change / Show a Command Group for AIX Tracing for Cluster Resou -(cm_trc_ch_cmdgrp.select) » » » » Remove Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_rm_cmdgrp.dialog) » » Open a SMIT Session on a Node -- (cm_open_a_smit_session_select) » Custom Cluster Configuration -- (cm_custom_menu) » » Cluster Nodes and Networks -- (cm_custom_cluster_nodes_networks) » » » Initial Cluster Setup (Custom) -- (cm_custom_setup_menu) » » » » Cluster -- (cm_custom_setup_cluster_menu) » » » » » Add/Change/Show a Cluster -- (cm_add_change_show_cluster) » » » » » Remove the Cluster Definition -- (cm_remove_cluster) » » » » Nodes -- (cm_custom_setup_nodes_menu) » » » » » Add a Node -- (cm_add_node) » » » » » Change/Show a Node -- (cm_change_show_node) » » » » » Remove a Node -- (cm_remove_node) » » » » Networks -- (cm_manage_networks_menu) » » » » » Add a Network -- (cm_add_network) » » » » » Change/Show a Network -- (cm_change_show_network) » » » » » Remove a Network -- (cm_remove_network) » » » » 
Network Interfaces -- (cm_manage_interfaces_menu) » » » » » Add a Network Interface -- (cm_add_interfaces) » » » » » Change/Show a Network Interface -- (cm_change_show_interfaces) » » » » » Remove a Network Interface -- (cm_remove_interfaces) » » » » Define Repository Disk and Cluster IP Address -- (cm_define_repos_ip_addr) » » » Manage the Cluster -- (cm_custom_mgt_menu) » » » » Cluster Startup Settings -- (cm_startup_options) Appendix B. PowerHA SMIT tree 489 » » » » Reset Cluster Tunables -- (cm_reset_cluster_tunables) » » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync) » » Resources -- (cm_custom_apps_resources) » » » Custom Disk Methods -- (cldisktype_custom_menu) » » » » Add Custom Disk Methods -- (cldisktype_custom_dialog_add) » » » » Change/Show Custom Disk Methods -- (cldisktype_custom_dialog_cha.select) » » » » Remove Custom Disk Methods -- (cldisktype_custom_dialog_rem.select) » » » Custom Volume Group Methods -- (cm_config_custom_volume_methods_menu_dmn) » » » » Add Custom Volume Group Methods -- (cm_dialog_add_custom_volume_methods) » » » » Change/Show Custom Volume Group Methods -(cm_selector_change_custom_volume_methods) » » » » Remove Custom Volume Group Methods -- (cm_dialog_delete_custom_volume_methods) » » » Custom File System Methods -- (cm_config_custom_filesystem_methods_menu_dmn) » » » » Add Custom File System Methods -- (cm_dialog_add_custom_filesystem_methods) » » » » Change/Show Custom File System Methods -(cm_selector_change_custom_filesystem_methods) » » » » Remove Custom File System Methods -- (cm_dialog_delete_custom_filesystem_methods) » » » Configure User Defined Resources and Types -- (cm_cludrestype_main_menu) » » » » Configure User Defined Resource Types -- (cm_cludrestype_sub_menu) » » » » » Add a User Defined Resource Type -- (cm_cludrestype_add) » » » » » Change/Show a User Defined Resource Type -- (cm_cludrestype_change) » » » » » Remove a User Defined Resource Type -- (cm_cludrestype_remove) » » » » Configure User Defined Resources -- (cm_cludres_sub_menu) » » » » » Add a User Defined Resource -- (cm_cludres_add) » » » » » Change/Show a User Defined Resource -- (cm_cludres_change) » » » » » Remove a User Defined Resource -- (cm_cludres_remove) » » » » » Change/Show User Defined Resource Monitor -- (cm_cludres_chmonitor) » » » » Import User Defined Resource Types and Resources Definition from XML file -(cm_cludrestype_importxml) » » » Customize Resource Recovery -- (_cm_change_show_resource_action_select) » » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync) » » Events -- (cm_events) » » » Cluster Events -- (cm_cluster_events) » » » » Configure Pre/Post-Event Commands -- (cm_defevent_menu) » » » » » Add a Custom Cluster Event -- (cladd_event.dialog) » » » » » Change/Show a Custom Cluster Event -- (clchsh_event.select) » » » » » Remove a Custom Cluster Event -- (clrm_event.select) » » » » Change/Show Pre-Defined Events -- (clcsclev.select) » » » » User-Defined Events -- (clude_custom_menu) » » » » » Add Custom User-Defined Events -- (clude_custom_dialog_add) » » » » » Change/Show Custom User-Defined Events -- (clude_custom_dialog_cha.select) » » » » » Remove Custom User-Defined Events -- (clude_custom_dialog_rem.select) » » » » Remote Notification Methods -- (cm_def_cus_pager_menu) » » » » » Configure a Node/Port Pair -- (define_node_port) » » » » » Remove a Node/Port Pair -- (remove_node_port) » » » » » Add a Custom Remote Notification Method -- (cladd_pager_notify.dialog) » » » » » 
Change/Show a Custom Remote Notification Method -- (clch_pager_notify) » » » » » Remove a Custom Remote Notification Method -- (cldel_pager_notify) » » » » » Send a Test Remote Notification -- (cltest_pager_notify) » » » » Change/Show Time Until Warning -- (cm_time_before_warning) » » » System Events -- (cm_system_events) » » » » Change/Show Event Response -- (cm_change_show_sys_event) » » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync) » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync) » Can't find what you are looking for ? -- (cm_tree) » Not sure where to start ? -- (cm_getting_started)

Appendix C. PowerHA supported hardware

Historically, each new version of PowerHA inherits the hardware support of its predecessors unless that support is explicitly removed, and over time it has become uncommon to remove support for older hardware. In general, if hardware was supported in the past and can run an AIX level that the current version of PowerHA supports, it remains supported. Because PowerHA 7.1 is not supported on any AIX level before 6.1 TL6, hardware that cannot run AIX 6.1 TL6 is, by definition, not supported by PowerHA 7.1. Also, if the hardware manufacturer has not stated support for AIX 7.1, that combination is not supported, even if the tables in this appendix suggest otherwise.

This appendix contains information about the IBM Power Systems servers, IBM storage, adapters, and AIX levels that are supported by current versions of High-Availability Cluster Multi-Processing (HACMP) 5.4.1 through PowerHA 7.1. It focuses on hardware from approximately the last five years, consisting mainly of IBM POWER5 systems and later. At the time of writing, the information was current and complete.

All POWER5 and later systems are supported on AIX 7.1 with HACMP 5.4.1 and later. AIX 7.1 support has the following specific requirements for HACMP and PowerHA:
HACMP 5.4.1 SP10
PowerHA 5.5 SP7
PowerHA 6.1 SP3
PowerHA 7.1

Full software support details are in the official support flash. The information in this appendix is available and maintained in the “PowerHA hardware support matrix” at:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
Most of the devices in the online documentation are linked to their corresponding support flash.

This appendix includes the following topics:
IBM Power Systems
IBM storage
Adapters

IBM Power Systems
The following sections provide details about the IBM Power Systems servers and the levels of PowerHA and AIX that are supported.

IBM POWER5 systems
Table C-1 lists the software versions for PowerHA with AIX supported on IBM POWER5 System p models. 
Table C-1 POWER5 System p model support for HACMP and PowerHA 492 System p models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 7037-A50 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9110-510 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9110-51A AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9111-285 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9111-520 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 r AIX 7.1 9113-550 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9115-505 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9116-561+ AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9117-570 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9118-575 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 r AIX 7.1 9119-590 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9119-595 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9131-52A AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9133-55A AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 IBM PowerHA SystemMirror 7.1 for AIX Table C-2 lists the software versions for PowerHA with AIX supported on IBM POWER5 System i® models. Table C-2 POWER5 System i model support for HACMP and PowerHA System i models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 9406-520 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9406-550 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9406-570 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9406-590 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9406-595 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 IBM POWER6 systems Table C-3 lists the software versions for PowerHA with AIX supported on POWER6 System p models. Table C-3 POWER6 System p support for PowerHA and AIX System p models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 8203-E4A AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 8203-E8A AIX 5.3 TL7 AIX6.1 TL0 SP AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 8234-EMA AIX 5.3 TL8 AIX 6.1 TL0 SP5 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9117-MMA AIX 5.3 TL6 AIX 6.1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9119-FHA AIX 5.3 TL8 AIX 6.1 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 9125-F2A AIX 5.3 TL8 AIX 6.1 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 Built-in serial ports: Built-in serial ports in POWER6 servers are not available for PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet (IVE) adapters are supported for PowerHA use. Appendix C. 
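When checking an existing system against the tables in this appendix, the machine type and model and the running AIX level can be read directly from AIX. The following commands are standard AIX commands; the values in the comments are only examples of the output format, and the fileset name pattern assumes a default PowerHA installation:

    uname -M                  # machine type-model, for example IBM,8233-E8B
    oslevel -s                # AIX level with TL and SP, for example 6100-06-01-1043
    lslpp -L "cluster.es.*"   # installed PowerHA filesets and their levels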
PowerHA supported hardware 493 IBM POWER7 Systems Table C-4 lists the software versions for HACMP and PowerHA with AIX supported on IBM POWER7 System p models. Table C-4 POWER7 System p support for HACMP and PowerHA System p models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 8202-E4B/720 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL12 AIX 6.1 TL5 AIX 5.3 TL12 AIX 6.1 TL5 AIX 6.1 TL6 AIX 7.1 8205-E6B/740 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL12 AIX 6.1 TL5 AIX 5.3 TL12 AIX 6.1 TL5 AIX 6.1 TL6 AIX 7.1 8231-E2B/710 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL12 AIX 6.1 TL5 AIX 5.3 TL12 AIX 6.1 TL5 AIX 6.1 TL6 AIX 7.1 8231-E2B/730 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL12 AIX 6.1 TL5 AIX 5.3 TL12 AIX 6.1 TL5 AIX 6.1 TL6 AIX 7.1 8233-E8B/750 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP3 AIX 5.3 TL11 AIX 6.1 TL4 SP3 AIX 6.1 TL6 r AIX 7.1 9117-MMB/770 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL11 AIX 6.1 TL4 SP3 AIX 5.3 TL11 AIX 6.1 TL4 SP3 AIX 6.1 TL6 AIX 7.1 9119-FHB/795 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL12 AIX 6.1 TL5 AIX 5.3 TL12 AIX 6.1 TL5 AIX 6.1 TL6 AIX 7.1 9179-FHB/780 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL11 AIX 6.1 TL4 SP3 AIX 5.3 TL11 or AIX 6.1 TL4 SP3 AIX 6.1 TL6 AIX 7.1 Built-in serial ports: Built-in serial ports in POWER7 Servers are not available for PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet (IVE) adapters are supported for PowerHA use. IBM POWER Blade servers Table C-5 lists the software versions for HACMP and PowerHA with AIX supported on IBM POWER Blade servers. Table C-5 IBM POWER Blade support for HACMP and PowerHA 494 System p models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 7778-23X/JS23 HACMP SP2 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 7778-43X/JS43 HACMP SP2 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 7998-60X/JS12 HACMP SP2 AIX 5.3 TL7 AIX 5.3 TL9 AIX 6.1 TL2 SP AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 7998-61X/JS22 HACMP SP2 AIX 5.3 TL6 AIX 5.3 TL7 AIX 6.1 TL2 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 IBM PowerHA SystemMirror 7.1 for AIX System p models HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 8406-70Y/PS700 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL12 AIX 6.1 TL5 AIX 5.3 TL12 AIX 6.1 TL5 AIX 6.1 TL6 AIX 7.1 8406-71Y/PS701 PS702 AIX 5.3 TL11 SP1 AIX 6.1 TL4 SP2 AIX 5.3 TL12 AIX 6.1 TL5 AIX 5.3 TL12 AIX 6.1 TL5 AIX 6.1 TL6 AIX 7.1 8844-31U/JS21 8844-51U/JS21 AIX 5.3. TL4 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 Blade support includes support for IVM and IVE on both POWER6 and POWER7 blades. The following adapter cards are supported in the POWER6 and POWER7 blades: 8240 Emulex 8Gb FC Expansion Card (CIOv) 8241 QLogic 4Gb FC Expansion Card (CIOv) 8242 QLogic 8Gb Fibre Channel Expansion Card (CIOv) 8246 SAS Connectivity Card (CIOv) 8251 Emulex 4Gb FC Expansion Card (CFFv) 8252 QLogic combo Ethernet and 4 Gb Fibre Channel Expansion Card (CFFh) 8271 QLogic Ethernet/8Gb FC Expansion Card (CFFh) IBM storage It is common to use multipathing drivers with storage. If using MPIO, SDD, SDDPCM, or all three types on any PowerHA controlled storage, you are required to use enhanced concurrent volume groups (ECVGs). This requirement also applies to vSCSI and NPIV devices. Fibre Channel adapters This section provides information about support for fibre channel (FC) adapters. 
DS storage units Table C-6 lists the DS storage unit support for HACMP and PowerHA with AIX. Table C-6 DS storage unit support for HACMP and PowerHA Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 DS3400 HACMP SP2 AIX 5.3 TL8 AIX 6.1 TL2 AIX 5.3 TL9 AIX TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS3500 HACMP SP2 AIX 5.3 TL8 AIX 6.1 TL2 AIX 5.3 TL9 AIX TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS4100 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS4200 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS4300 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 Appendix C. PowerHA supported hardware 495 Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 DS4400 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS4500 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS4700 AIX 5.3 TL5 AIX 6.1 AIX 5.3 TL9 AIX TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS4800 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS5020 HACMP SP2 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 DS6000 DS6800 AIX 5.3 TL5 AIX 6.1 AIX 5.3 TL9 AIX TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS5100 HACMP SP2 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 DS5300 HACMP SP2 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP1 AIX 6.1 TL6 AIX 7.1 DS8000 931,932,9B2 AIX 5.3 TL5 AIX 6.1 AIX 5.3 TL9 AIX TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 DS8700 HACMP SP2 AIX 5.3 TL8 AIX 6.1 TL2 AIX 5.3 TL9 AIX TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 IBM XIV Table C-7 lists the software versions for HAMCP and PowerHA with AIX supported on XIV storage. PowerHA requires XIV microcode level 10.0.1 or later. Table C-7 IBM XIV support for HACMP and PowerHA with AIX 496 Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 XIV 2810-A14 HACMP SP4 AIX 5.3 TL7 SP6 AIX 6.1 TL0 SP2 AIX 5.3 TL9 AIX TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 IBM PowerHA SystemMirror 7.1 for AIX SAN Volume Controller Table C-8 shows the software versions for HACMP and PowerHA with AIX supported on the SAN Volume Controller (SVC). SVC software levels are supported up through SVC v5.1. The levels shown in the table are the absolute minimum requirements for v5.1. Table C-8 SVC supported models for HACMP and PowerHA with AIX Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 2145-4F2 HACMP SP8 AIX 5.3 TL9 AIX 6.1 TL2 SP3 PowerHA SP6 AIX 5.3 TL9 AIX 6.1 TL2 SP3 PowerHA SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 2145-8F2 HACMP SP8 AIX 5.3 TL9 AIX 6.1 TL2 SP3 PowerHA SP8 AIX 5.3 TL9 AIX 6.1 TL2 SP3 PowerHA SP1 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 Network-attached storage Table C-9 shows the software versions for PowerHA and AIX supported on network-attached storage (NAS). 
Table C-9 NAS supported models for HACMP and PowerHA with AIX Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 N3700 (A20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N5200 (A20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N5200 (G20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N5300 HACMP SP3 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N5500 (A20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N5500 (G20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N5600 HACMP SP3 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N6040 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N6060 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N6070 AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N7600 (A20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N7600 (G20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 Appendix C. PowerHA supported hardware 497 Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 N7700 (A21) AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N7700 (G21) AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N7800 (A20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N7800 (G20) AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N7900 (A21) AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 N7900 (G21) AIX 5.3 TL7 AIX 6.1 TL0 SP2 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 Serial-attached SCSI Table C-10 lists the software versions for PowerHA and AIX supported on the serial-attached SCSI (SAS) model. Table C-10 SAS supported model for HACMP and PowerHA with AIX Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 5886 EXP12S HACMP SP5 AIX 5.3 TL9 AIX 6.1 TL2 SP3 HACMP SP2 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 SCSI Table C-11 shows the software versions for PowerHA and AIX supported on the SCSI model. Table C-11 SCSI supported model for HACMP and PowerHA with AIX Model HACMP 5.4.1 PowerHA 5.5 PowerHA 6.1 PowerHA 7.1 7031-D24 AIX 5.3 TL4 AIX 6.1 AIX 5.3 TL7 AIX 6.1 TL2 SP3 AIX 5.3 TL9 AIX 6.1 TL2 SP3 AIX 6.1 TL6 AIX 7.1 Adapters This following sections contain information about the supported adapters for PowerHA. 
Fibre Channel adapters
The following FC adapters are supported:
#1905 4 Gb Single Port Fibre Channel PCI-X 2.0 DDR Adapter
#1910 4 Gb Dual Port Fibre Channel PCI-X 2.0 DDR Adapter
#1957 2 Gigabit Fibre Channel PCI-X Adapter
#1977 2 Gigabit Fibre Channel PCI-X Adapter
#5273 LP 8 Gb PCI-Express Dual Port Fibre Channel Adapter*
#5276 LP 4 Gb PCI-Express Fibre Channel Adapter
#5716 2 Gigabit Fibre Channel PCI-X Adapter
#5735 8 Gb PCI-Express Dual Port Fibre Channel Adapter*
#5758 4 Gb Single Port Fibre Channel PCI-X 2.0 DDR Adapter
#5759 4 Gb Dual Port Fibre Channel PCI-X 2.0 DDR Adapter
#5773 Gigabit PCI Express Fibre Channel Adapter
#5774 Gigabit PCI Express Fibre Channel Adapter
#6228 1- and 2-Gigabit Fibre Channel Adapter for 64-bit PCI Bus
#6239 2 Gigabit FC PCI-X Adapter

#5273/#5735 PCI-Express Dual Port Fibre Channel Adapter: The 5273/5735 minimum requirements are PowerHA 5.4.1 SP2 or 5.5 SP1.

SAS
The following SAS adapters are supported:
#5278 LP 2x4port PCI-Express SAS Adapter 3 Gb
#5901 PCI-Express SAS Adapter
#5902 PCI-X DDR Dual-x4 Port SAS RAID Adapter
#5903 PCI-Express SAS Adapter
#5912 PCI-X DDR External Dual-x4 Port SAS Adapter

Table C-12 lists the SAS software support requirements.

Table C-12   SAS software support for HACMP and PowerHA with AIX
HACMP 5.4.1:  HACMP SP5, AIX 5.3 TL9 or AIX 6.1 TL2 SP3
PowerHA 5.5:  HACMP SP2, AIX 5.3 TL9 or AIX 6.1 TL2 SP3
PowerHA 6.1:  AIX 5.3 TL9 or AIX 6.1 TL2 SP3
PowerHA 7.1:  AIX 6.1 TL6 or AIX 7.1

Ethernet
The following Ethernet adapters are supported with PowerHA:
#1954 4-Port 10/100/1000 Base-TX PCI-X Adapter
#1959 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1978 IBM Gigabit Ethernet-SX PCI-X Adapter
#1979 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1981 IBM 10 Gigabit Ethernet-SR PCI-X Adapter
#1982 IBM 10 Gigabit Ethernet-LR PCI-X Adapter
#1983 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1984 IBM Dual Port Gigabit Ethernet-SX PCI-X Adapter
#1990 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X Adapter
#4961 IBM Universal 4-Port 10/100 Ethernet Adapter
#4962 IBM 10/100 Mbps Ethernet PCI Adapter II
#5271 LP 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
#5274 LP 2-Port Gigabit Ethernet-SX PCI Express Adapter
#5700 IBM Gigabit Ethernet-SX PCI-X Adapter
#5701 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5706 IBM 2-Port 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5707 IBM 2-Port Gigabit Ethernet-SX PCI-X Adapter
#5717 IBM 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
#5718 IBM 10 Gigabit -SR/-LR Ethernet PCI-X Adapter
#5719 IBM 10 Gigabit -SR/-LR Ethernet PCI-X Adapter
#5721 IBM 10 Gigabit Ethernet-SR PCI-X 2.0 Adapter
#5722 IBM 10 Gigabit Ethernet-LR PCI-X 2.0 Adapter
#5740 4-Port 10/100/1000 Base-TX PCI-X Adapter
#5767 2-Port 10/100/1000 Base-TX Ethernet PCI Express Adapter
#5768 2-Port Gigabit Ethernet-SX PCI Express Adapter

InfiniBand
The following InfiniBand adapters are supported with PowerHA:
#1809 IBM GX Dual-port 4x IB HCA
#1810 IBM GX Dual-port 4x IB HCA
#1811 IBM GX Dual-port 4x IB HCA
#1812 IBM GX Dual-port 4x IB HCA
#1820 IBM GX Dual-port 12x IB HCA

SCSI and iSCSI
The following SCSI and iSCSI adapters are supported with PowerHA:
#1912 IBM PCI-X DDR Dual Channel Ultra320 LVD SCSI Adapter
#1913 PCI-X DDR Dual Channel Ultra320 SCSI RAID Adapter
#1975 PCI-X Dual Channel Ultra320 SCSI RAID Adapter
#1986 1 Gigabit-TX iSCSI TOE PCI-X Adapter (copper connector)
#1987 1 Gigabit-SX iSCSI TOE PCI-X Adapter (optical connector)
#5703 PCI-X Dual Channel Ultra320 SCSI RAID Adapter
#5710 PCI-X Dual Channel Ultra320 SCSI Adapter
#5711 PCI-X Dual Channel Ultra320 SCSI RAID Blind Swap Adapter
#5712 PCI-X Dual Channel Ultra320 SCSI Adapter
#5713 1 Gigabit-TX iSCSI TOE PCI-X Adapter (copper connector)
#5714 1 Gigabit-SX iSCSI TOE PCI-X Adapter (optical connector)
#5736 IBM PCI-X DDR Dual Channel Ultra320 SCSI Adapter
#5737 PCI-X DDR Dual Channel Ultra320 SCSI RAID Adapter

PCI bus adapters
PowerHA 7.1 no longer supports RS-232 connections. Therefore, the following adapters are supported only up through PowerHA 6.1:
#2943 8-Port Asynchronous EIA-232/RS-422, PCI bus adapter
#2944 128-Port Asynchronous Controller, PCI bus adapter
#5277 IBM LP 4-Port Async EIA-232 PCIe Adapter
#5723 2-Port Asynchronous EIA-232/RS-422, PCI bus adapter
#5785 IBM 4-Port Async EIA-232 PCIe Adapter
The 5785 adapter is supported only by PowerHA 5.5 and 6.1.

Appendix D. The clmgr man page

At the time of writing, no documentation was available about the clmgr command except for the related man pages. To make it easier for those of you who do not have the product installed and want more details about the clmgr command, a copy of the man pages is provided as follows:

clmgr command
*************

Purpose
=======

clmgr: Provides a consistent, reliable interface for performing IBM PowerHA SystemMirror cluster operations via a terminal or script. All clmgr operations are logged in the "clutils.log" file, including the command that was executed, its start/stop time, and what user initiated the command.

The basic format for using clmgr is consistently as follows:

    clmgr <ACTION> <CLASS> [<NAME>] [<ATTRIBUTES...>]

This consistency helps make clmgr easier to learn and use. Further help is also available at each part of clmgr's command line. For example, just executing "clmgr" by itself will result in a list of the available ACTIONs supported by clmgr. Executing "clmgr ACTION" with no CLASS provided will result in a list of all the available CLASSes for the specified ACTION. Executing "clmgr ACTION CLASS" with no NAME or ATTRIBUTES provided is slightly different, though, since for some ACTION+CLASS combinations, that may be a valid command format. So to get help in this scenario, it is necessary to explicitly request it by appending the "-h" flag. So executing "clmgr ACTION CLASS -h" will result in a listing of all known attributes for that ACTION+CLASS combination being displayed. 
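For example, the following short session steps through that built-in assistance without changing anything in the cluster; "query" and "cluster" are shown only as one possible ACTION+CLASS combination:

    clmgr                    # list the ACTIONs that clmgr supports
    clmgr query              # list the CLASSes available for the query ACTION
    clmgr query cluster -h   # list the known attributes for this ACTION+CLASS combination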
That is where clmgr's ability to help ends, however; it cannot help with each individual attribute. If there is a question about what a particular attribute is for, or when to use it, the product documentation will need to be consulted.

Synopsis
========

    clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>]
          [-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}]
          <ACTION> <CLASS> [<NAME>]
          [-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n>]

    ACTION={add|modify|delete|query|online|offline|...}
    CLASS={cluster|site|node|network|resource_group|...}

    clmgr {-h|-?} [-v]

    clmgr [-v] help

ACTION
    A verb describing the operation to be performed. The following four ACTIONs are available on almost all the supported CLASSes (there are a few exceptions):

        add      (Aliases: a)
        query    (Aliases: q, ls, get)
        modify   (Aliases: mod, ch, set)
        delete   (Aliases: de, rm, er)

    The remaining ACTIONs are typically only supported on a small subset of the supported CLASSes:

        Cluster, Sites, Node, Resource Group:       online  (Aliases: on, start)
                                                    offline (Aliases: off, stop)
        Resource Group, Service IP, Persistent IP:  move    (Aliases: mv)
        Cluster, Log, Node, Snapshot:               manage  (Aliases: mg)
        Cluster, File Collection:                   sync    (Aliases: sy)
        Cluster, Method:                            verify  (Aliases: ve)
        Log, Report, Snapshot:                      view    (Aliases: vi)

    NOTE: ACTION is *not* case-sensitive.
    NOTE: all ACTIONs provide a shorter alias, such as "rm" in place of "delete". These aliases are provided for convenience/ease-of-use at a terminal, and are not recommended for use in scripts.

CLASS
    The type of object upon which the ACTION will be performed. The complete list of supported CLASSes is:

        cluster                 (Aliases: cl)
        site                    (Aliases: si)
        node                    (Aliases: no)
        interface               (Aliases: in, if)
        network                 (Aliases: ne, nw)
        resource_group          (Aliases: rg)
        service_ip              (Aliases: se)
        persistent_ip           (Aliases: pe)
        application_controller  (Aliases: ac, app)
        application_monitor     (Aliases: am, mon)
        tape                    (Aliases: tp)
        dependency              (Aliases: de)
        file_collection         (Aliases: fi, fc)
        snapshot                (Aliases: sn, ss)
        resource                (Aliases: rs)
        resource_type           (Aliases: rt)
        method                  (Aliases: me)
        volume_group            (Aliases: vg)
        logical_volume          (Aliases: lv)
        file_system             (Aliases: fs)
        physical_volume         (Aliases: pv)

    NOTE: CLASS is *not* case-sensitive.
    NOTE: all CLASSes provide a shorter alias, such as "fc" in place of "file_collection". These aliases are provided for convenience/ease-of-use at a terminal, and are not recommended for use in scripts.

NAME
    The specific object, of type "CLASS", upon which the ACTION is to be performed.

ATTR=VALUE
    Optional attribute/value pairs that are specific to the ACTION+CLASS combination. These may be used to specify configuration settings, or to adjust particular operations. When used with the "query" action, ATTR=VALUE specifications may be used to perform attribute-based searching/filtering. When used for this purpose, simple wildcards may be used. For example, "*" matches zero or more of any character, and "?" matches zero or one of any character.

    NOTE: an ATTR may not always need to be fully typed. Only the number of leading characters required to uniquely identify the attribute from amongst the set of attributes available for the specified operation need to be provided. So instead of "FC_SYNC_INTERVAL", for the "add/modify cluster" operation, "FC" could be used, and would have the same result.

-a
    Valid only with the "query", "add", and "modify" ACTIONs; requests that only the specified attribute(s) be displayed. 
NOTE: the specified order of these attributes is *not* guaranteed to be preserved in the resulting output. Appendix D. The clmgr man page 503 -c valid only with the "query", "add", and "modify" ACTIONs, requests all data to be displayed in colon-delimited format. -D disables the dependency mechanism in clmgr that will attempt to create any requisite resources if they are not already defined within the cluster. -f requests an override of any interactive prompts, forcing the current operation to be attempted (if forcing the operation is a possibility). -h requests that any available help information be displayed. An attempt is made to provide context-sensitive assistance. -l activates trace logging for serviceability: low: med: logs function entry/exit adds function entry parameters, as well as function return values high: adds tracing of every line of execution, only omitting routine, "utility" functions max: adds the routine/utility functions. Also adds a time/date stamp to the function entry/exit messages. All trace data is written into the "clutils.log" file. This option is typically only of interest when troubleshooting. -S valid only with the "query" ACTION and "-c" option, requests that all column headers be suppressed. -T a transaction ID to be applied to all logged output, to help group one of more activities into a single body of output that can be extracted from the log for analysis. This option is typically only of interest when troubleshooting. -v requests maximum verbosity in the output. NOTE: when used with the "query" action and no specific object name, queries all instances of the specified class. For example, "clmgr -v query node" will query and display *all* nodes and their attributes. When used with the "add" or "modify" operations, the final, resulting attributes after the operation is complete will be displayed (only if the operation was successful). -x valid only with the "query", "add", and "modify" ACTIONs, requests all data to be displayed in simple XML format. Operations ========== CLUSTER: clmgr add cluster \ [ <cluster_label> ] \ REPOSITORY=<hdisk#> \ 504 IBM PowerHA SystemMirror 7.1 for AIX clmgr clmgr clmgr clmgr SHARED_DISKS=<hdisk#>[,<hdisk#>,...] \ [ NODES=<host>[,<host#2>,<host#n>,...] ] \ [ CLUSTER_IP=<IP_Address> ] \ [ FC_SYNC_INTERVAL=## ] \ [ RG_SETTLING_TIME=## ] \ [ MAX_EVENT_TIME=### ] \ [ MAX_RG_PROCESSING_TIME=### ] \ [ SITE_POLICY_FAILURE_ACTION={fallover|notify} ] \ [ SITE_POLICY_NOTIFY_METHOD="<FULL_PATH_TO_FILE>" ] [ DAILY_VERIFICATION={Enabled|Disabled} ] \ [ VERIFICATION_NODE={Default|<node>} ] \ [ VERIFICATION_HOUR=<00..23> ] \ [ VERIFICATION_DEBUGGING={Enabled|Disabled} ] modify cluster \ [ NEWNAME=<new_cluster_label> ] \ [ SHARED_DISKS=<disk>[,<disk#2>,<disk#n>,...] ] \ [ NODES=<host>[,<host#2>,<host#n>,...] ] \ [ CLUSTER_IP=<IP_Address> ] \ [ FC_SYNC_INTERVAL=## ] \ [ RG_SETTLING_TIME=## ] \ [ MAX_EVENT_TIME=### ] \ [ MAX_RG_PROCESSING_TIME=### ] \ [ SITE_POLICY_FAILURE_ACTION={fallover|notify} ] \ [ SITE_POLICY_NOTIFY_METHOD="<FULL_PATH_TO_FILE>" ] [ DAILY_VERIFICATION={Enabled|Disabled} ] \ [ VERIFICATION_NODE={Default|<node>} ] \ [ VERIFICATION_HOUR=<00..23> ] \ [ VERIFICATION_DEBUGGING={Enabled|Disabled} ] query cluster delete cluster [ NODES={ALL|<node>[,<node#2>,<node#n>,...}] ] recover cluster NOTE: the "delete" action defaults to only deleting the cluster on the local node. clmgr sync cluster \ [ VERIFY={yes|no} ] \ [ CHANGES_ONLY={no|yes} ] \ [ DEFAULT_TESTS={yes|no} ] \ [ METHODS=<method#1>[,<method#n>,...] 
] \ [ FIX={no|yes} ] \ [ LOGGING={standard|verbose} ] \ [ LOGFILE=<PATH_TO_LOG_FILE> ] \ [ MAX_ERRORS=## ] \ [ FORCE={no|yes} ] NOTE: all options are verification parameters, so they are only valid when "VERIFY" is set to "yes". clmgr manage cluster {discover|reset|unlock} clmgr manage cluster security \ LEVEL={Disable|Low|Med|High} clmgr manage cluster security \ ALGORITHM={DES|3DES|AES} \ [ GRACE_PERIOD=<SECONDS> ] \ Appendix D. The clmgr man page 505 [ REFRESH=<SECONDS> ] clmgr manage cluster security \ MECHANISM={OpenSSL|SelfSigned|SSH} \ [ CERTIFICATE=<PATH_TO_FILE> ] \ [ PRIVATE_KEY=<PATH_TO_FILE> ] NOTE: "GRACE_PERIOD" defaults to 21600 seconds (6 hours). NOTE: "REFRESH" defaults to 86400 seconds (24 hours). clmgr verify cluster \ [ CHANGES_ONLY={no|yes} ] \ [ DEFAULT_TESTS={yes|no} ] \ [ METHODS=<method#1>[,<method#n>,...] ] \ [ FIX={no|yes} ] \ [ LOGGING={standard|verbose} ] \ [ LOGFILE=<PATH_TO_LOG_FILE> ] \ [ MAX_ERRORS=## ] [ SYNC={no|yes} ] \ [ FORCE={no|yes} ] NOTE: the "FORCE" option should only be used when "SYNC" is set to "yes". clmgr offline cluster \ [ WHEN={now|restart|both} ] \ [ MANAGE={offline|move|unmanage} ] \ [ BROADCAST={true|false} ] \ [ TIMEOUT=<seconds_to_wait_for_completion> ] clmgr online cluster \ [ WHEN={now|restart|both} ] \ [ MANAGE={auto|manual} ] \ [ BROADCAST={false|true} ] \ [ CLINFO={false|true|consistent} ] \ [ FORCE={false|true} ] \ [ FIX={no|yes|interactively} ] [ TIMEOUT=<seconds_to_wait_for_completion> ] NOTE: the "RG_SETTLING_TIME" attribute only affects resource groups with a startup policy of "Online On First Available Node". NOTE: an alias for "cluster" is "cl". SITE: clmgr add site <sitename> \ [ NODES=<node>[,<node#2>,<node#n>,...] ] clmgr modify site <sitename> \ [ NEWNAME=<new_site_label> ] \ [ {ADD|REPLACE}={ALL|<node>[,<node#2>,<node#n>,...}] ] At least one modification option must be specified. ADD attempts to append the specified nodes to the site. REPLACE attempts to replace the sites current nodes with the specified nodes. clmgr query site [ <sitename>[,<sitename#2>,<sitename#n>,...] ] clmgr delete site {<sitename>[,<sitename#2>,<sitename#n>,...] | ALL} clmgr recover site <sitename> clmgr offline site <sitename> \ [ WHEN={now|restart|both} ] \ 506 IBM PowerHA SystemMirror 7.1 for AIX [ MANAGE={offline|move|unmanage} ] \ [ BROADCAST={true|false} ] \ [ TIMEOUT=<seconds_to_wait_for_completion> ] clmgr online site <sitename> \ [ WHEN={now|restart|both} ] \ [ MANAGE={auto|manual} ] \ [ BROADCAST={false|true} ] \ [ CLINFO={false|true|consistent} ] \ [ FORCE={false|true} ] \ [ FIX={no|yes|interactively} ] [ TIMEOUT=<seconds_to_wait_for_completion> ] NOTE: an alias for "site" is "si". NODE: clmgr add node <node> \ [ COMMPATH=<ip_address_or_network-resolvable_name> ] \ [ RUN_DISCOVERY={true|false} ] \ [ PERSISTENT_IP=<IP> NETWORK=<network> {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \ [ START_ON_BOOT={false|true} ] \ [ BROADCAST_ON_START={true|false} ] \ [ CLINFO_ON_START={false|true|consistent} ] \ [ VERIFY_ON_START={true|false} ] clmgr modify node <node> \ [ NEWNAME=<new_node_label> ] \ [ COMMPATH=<new_commpath> ] \ [ PERSISTENT_IP=<IP> NETWORK=<network> {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \ [ START_ON_BOOT={false|true} ] \ [ BROADCAST_ON_START={true|false} ] \ [ CLINFO_ON_START={false|true|consistent} ] \ [ VERIFY_ON_START={true|false} ] clmgr query node [ {<node>|LOCAL}[,<node#2>,<node#n>,...] ] clmgr delete node {<node>[,<node#2>,<node#n>,...] 
| ALL} clmgr manage node undo_changes clmgr recover node <node>[,<node#2>,<node#n>,...] clmgr online node <node>[,<node#2>,<node#n>,...] \ [ WHEN={now|restart|both} ] \ [ MANAGE={auto|manual} ] \ [ BROADCAST={false|true} ] \ [ CLINFO={false|true|consistent} ] \ [ FORCE={false|true} ] \ [ FIX={no|yes|interactively} ] [ TIMEOUT=<seconds_to_wait_for_completion> ] clmgr offline node <node>[,<node#2>,<node#n>,...] \ [ WHEN={now|restart|both} ] \ [ MANAGE={offline|move|unmanage} ] \ [ BROADCAST={true|false} ] \ [ TIMEOUT=<seconds_to_wait_for_completion> ] NOTE: the "TIMEOUT" attribute defaults to 120 seconds. NOTE: an alias for "node" is "no". NETWORK: Appendix D. The clmgr man page 507 clmgr add network <network> \ [ TYPE={ether|XD_data|XD_ip|infiniband} ] \ [ {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \ [ IPALIASING={true|false} ] clmgr modify network <network> \ [ NEWNAME=<new_network_label> ] \ [ TYPE={ether|XD_data|XD_ip|infiniband} ] \ [ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \ [ ENABLE_IPAT_ALIASING={true|false} ] \ [ PUBLIC={true|false} ] \ [ RESOURCE_DIST_PREF={AC|C|CPL|ACPL} ] clmgr query network [ <network>[,<network#2>,<network#n>,...] ] clmgr delete network {<network>[,<network#2>,<network#n>,...] | ALL} NOTE: the TYPE defaults to "ether" if not specified. NOTE: when adding, the default is to construct an IPv4 network using a netmask of "255.255.255.0". To create an IPv6 network, specify a valid prefix. NOTE: AC == Anti-Collocation C == Collocation CPL == Collocation with Persistent Label ACPL == Anti-Collocation with Persistent Label NOTE: aliases for "network" are "ne" and "nw". INTERFACE: clmgr add interface <interface> \ NETWORK=<network> \ [ NODE=<node> ] \ [ TYPE={ether|infiniband} ] \ [ INTERFACE=<network_interface> ] clmgr modify interface <interface> \ NETWORK=<network> clmgr query interface [ <interface>[,<if#2>,<if#n>,...] ] clmgr delete interface {<interface>[,<if#2>,<if#n>,...] | ALL} NOTE: NOTE: NOTE: NOTE: NOTE: the "interface" may be either an IP address or label the "NODE" attribute defaults to the local node name. the "TYPE" attribute defaults to "ether" the "<network_interface>" might look like "en1", "en2", ... aliases for "interface" are "in" and "if". RESOURCE GROUP: clmgr add resource_group <resource_group> NODES=nodeA1,nodeA2,... [ SECONDARYNODES=nodeB2,nodeB1,... [ STARTUP={OHN|OFAN|OAAN|OUDP} [ FALLOVER={FNPN|FUDNP|BO} [ FALLBACK={NFB|FBHPN} [ NODE_PRIORITY_POLICY={default|mem|cpu| disk|least|most} [ NODE_PRIORITY_POLICY_SCRIPT=</path/to/script> [ NODE_PRIORITY_POLICY_TIMEOUT=### [ SITE_POLICY={ignore|primary|either|both} [ SERVICE_LABEL=service_ip#1[,service_ip#2,...] [ APPLICATIONS=appctlr#1[,appctlr#2,...] 508 IBM PowerHA SystemMirror 7.1 for AIX ] ] ] ] ] ] ] ] ] ] \ \ \ \ \ \ \ \ \ \ \ \ \ [ [ [ [ [ [ [ [ [ [ [ [ [ STARTUP: OHN ----OFAN ---OAAN ---OUDP ---- SHARED_TAPE_RESOURCES=<TAPE>[,<TAPE#2>,...] VOLUME_GROUP=<VG>[,<VG#2>,...] FORCED_VARYON={true|false} VG_AUTO_IMPORT={true|false} FILESYSTEM=/file_system#1[,/file_system#2,...] DISK=<hdisk>[,<hdisk#2>,...] FS_BEFORE_IPADDR={true|false} WPAR_NAME="wpar_name" EXPORT_FILESYSTEM=/expfs#1[,/expfs#2,...] EXPORT_FILESYSTEM_V4=/expfs#1[,/expfs#2,...] STABLE_STORAGE_PATH="/fs3" NFS_NETWORK="nfs_network" MOUNT_FILESYSTEM=/nfs_fs1;/expfs1,/nfs_fs2;,... 
Online Online Online Online ] ] ] ] ] ] ] ] ] ] ] ] ] \ \ \ \ \ \ \ \ \ \ \ \ Home Node (default value) on First Available Node on All Available Nodes (concurrent) Using Node Distribution Policy FALLOVER: FNPN ---- Fallover to Next Priority Node (default value) FUDNP --- Fallover Using Dynamic Node Priority BO ------ Bring Offline (On Error Node Only) FALLBACK: NFB ----- Never Fallback FBHPN --- Fallback to Higher Priority Node (default value) NODE_PRIORITY_POLICY: NOTE: this policy may only be established if if the FALLOVER policy has been set to "FUDNP". default - next node in the NODES list mem ----- node with most available memory disk ---- node with least disk activity cpu ----- node with most available CPU cycles least --- node where the dynamic node priority script returns the lowest value most ---- node where the dynamic node priority script returns the highest value SITE_POLICY: ignore -- Ignore primary - Prefer Primary Site either -- Online On Either Site both ---- Online On Both Sites NOTE: "SECONDARYNODES" and "SITE_POLICY" only apply when sites are configured within the cluster. NOTE: "appctlr" is an abbreviation for "application_controller". clmgr modify resource_group <resource_group> [ NEWNAME=<new_resource_group_label> [ NODES=nodeA1[,nodeA2,...] [ SECONDARYNODES=nodeB2[,nodeB1,...] \ ] \ ] \ ] \ Appendix D. The clmgr man page 509 [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ STARTUP={OHN|OFAN|OAAN|OUDP} ] FALLOVER={FNPN|FUDNP|BO} ] FALLBACK={NFB|FBHPN} ] NODE_PRIORITY_POLICY={default|mem|cpu| disk|least|most} ] SITE_POLICY={ignore|primary|either|both} ] SERVICE_LABEL=service_ip#1[,service_ip#2,...] ] APPLICATIONS=appctlr#1[,appctlr#2,...] ] VOLUME_GROUP=volume_group#1[,volume_group#2,...]] FORCED_VARYON={true|false} ] VG_AUTO_IMPORT={true|false} ] FILESYSTEM=/file_system#1[,/file_system#2,...] ] DISK=hdisk#1[,hdisk#2,...] ] FS_BEFORE_IPADDR={true|false} ] WPAR_NAME="wpar_name" ] EXPORT_FILESYSTEM=/expfs#1[,/expfs#2,...] ] EXPORT_FILESYSTEM_V4=/expfs#1[,/expfs#2,...] ] STABLE_STORAGE_PATH="/fs3" ] NFS_NETWORK="nfs_network" ] MOUNT_FILESYSTEM=/nfs_fs1;/expfs1,/nfs_fs2;,... ] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ NOTE: "SECONDARYNODES" and "SITE_POLICY" only apply when sites are configured within the cluster. NOTE: "appctlr" is an abbreviation for "application_controller". clmgr query resource_group [ <resource_group>[,<rg#2>,<rg#n>,...] ] clmgr delete resource_group {<resource_group>[,<rg#2>,<rg#n>,...] | ALL} clmgr online resource_group <resource_group>[,<rg#2>,<rg#n>,...] \ [ NODES=<node>[,<node#2>,...] ] clmgr offline resource_group <resource_group>[,<rg#2>,<rg#n>,...] \ [ NODES=<node>[,<node#2>,...] ] clmgr move resource_group <resource_group>[,<rg#2>,<rg#n>,...] \ {SITE|NODE}=<node_or_site_label> \ [ STATE={online|offline} ] \ [ SECONDARY={false|true} ] NOTE: the "SITE" and "SECONDARY" attributes are only applicable when sites are configured in the cluster. NOTE: the "SECONDARY" attribute defaults to "false". NOTE: the resource group STATE remains unchanged if "STATE" is not explicitly specified. NOTE: an alias for "resource_group" is "rg". 
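As a brief illustration of these resource group operations, the following commands use placeholder names (db_rg, nodeA, and nodeB) that would be replaced with the resource group and node names defined in a particular cluster:

    clmgr online resource_group db_rg NODES=nodeB             # bring the group online on a specific node
    clmgr move resource_group db_rg NODE=nodeA STATE=online   # move the group to another node, leaving it online
    clmgr query resource_group db_rg                          # display the group definition and its attributes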
FALLBACK TIMER: clmgr add fallback_timer <timer> \ [ YEAR=<####> ] \ [ MONTH=<{1..12 | Jan..Dec}> ] \ [ DAY_OF_MONTH=<{1..31}> ] \ [ DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \ HOUR=<{0..23}> \ MINUTE=<{0..59}> clmgr modify fallback_timer <timer> \ [ YEAR=<{####}> ] \ [ MONTH=<{1..12 | Jan..Dec}> ] \ 510 IBM PowerHA SystemMirror 7.1 for AIX [ [ [ [ [ DAY_OF_MONTH=<{1..31}> ] \ DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \ HOUR=<{0..23}> ] \ MINUTE=<{0..59}> ] \ REPEATS=<{0,1,2,3,4 | Never,Daily,Weekly,Monthly,Yearly}> ] clmgr query fallback_timer [<timer>[,<timer#2>,<timer#n>,...] ] clmgr delete fallback_timer {<timer>[,<timer#2>,<timer#n>,...] |\ ALL} NOTE: aliases for "fallback_timer" are "fa" and "timer". PERSISTENT IP/LABEL: clmgr add persistent_ip <persistent_IP> \ NETWORK=<network> \ [ NODE=<node> ] clmgr modify persistent_ip <persistent_label> \ [ NEWNAME=<new_persistent_label> ] \ [ NETWORK=<new_network> ] \ [ PREFIX=<new_prefix_length> ] clmgr query persistent_ip [ <persistent_IP>[,<pIP#2>,<pIP#n>,...] ] clmgr delete persistent_ip {<persistent_IP>[,<pIP#2>,<pIP#n>,...] | ALL} clmgr move persistent_ip <persistent_IP> \ INTERFACE=<new_interface> NOTE: an alias for "persistent_ip" is "pe". SERVICE IP/LABEL: clmgr add service_ip <service_ip> \ NETWORK=<network> \ [ {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \ [ HWADDR=<new_hardware_address> ] \ [ SITE=<new_site> ] clmgr modify service_ip <service_ip> \ [ NEWNAME=<new_service_ip> ] \ [ NETWORK=<new_network> ] \ [ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \ [ HWADDR=<new_hardware_address> ] \ [ SITE=<new_site> ] clmgr query service_ip [ <service_ip>[,<service_ip#2>,...] ] clmgr delete service_ip {<service_ip>[,<service_ip#2>,,...] | ALL} clmgr move service_ip <service_ip> \ INTERFACE=<new_interface> NOTE: if the "NETMASK/PREFIX" attributes are not specified, the netmask/prefix value for the underlying network is used. NOTE: an alias for "service_ip" is "se". APPLICATION CONTROLLER: clmgr add application_controller <application_controller> \ STARTSCRIPT="/path/to/start/script" \ STOPSCRIPT ="/path/to/stop/script" [ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ] Appendix D. The clmgr man page 511 clmgr modify application_controller <application_controller> \ [ NEWNAME=<new_application_controller_label> ] \ [ STARTSCRIPT="/path/to/start/script" ] \ [ STOPSCRIPT ="/path/to/stop/script" ] [ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ] clmgr query application_controller [ <appctlr>[,<appctlr#2>,...] ] clmgr delete application_controller {<appctlr>[,<appctlr#2>,...] | \ ALL} NOTE: "appctlr" is an abbreviation for "application_controller". NOTE: aliases for "application_controller" are "ac" and "app". APPLICATION MONITOR: clmgr add application_monitor <monitor> \ TYPE={Process|Custom} \ APPLICATIONS=<appctlr#1>[,<appctlr#2>,<appctlr#n>,...] \ MODE={continuous|startup|both} \ [ STABILIZATION="1 .. 3600" ] \ [ RESTARTCOUNT="0 .. 100" ] \ [ FAILUREACTION={notify|fallover} ] \ Process Arguments: PROCESSES="pmon1,dbmon,..." \ OWNER="<processes_owner_name>" \ [ INSTANCECOUNT="1 .. 1024" ] \ [ RESTARTINTERVAL="1 .. 3600" ] \ [ NOTIFYMETHOD="</script/to/notify>" ] \ [ CLEANUPMETHOD="</script/to/cleanup>" ] \ [ RESTARTMETHOD="</script/to/restart>" ] Custom Arguments: MONITORMETHOD="/script/to/monitor" \ [ MONITORINTERVAL="1 .. 1024" ] \ [ HUNGSIGNAL="1 .. 63" ] \ [ RESTARTINTERVAL="1 .. 
3600" ] \ [ NOTIFYMETHOD="</script/to/notify>" ] \ [ CLEANUPMETHOD="</script/to/cleanup>" ] \ [ RESTARTMETHOD="</script/to/restart>" ] NOTE: "STABILIZATION" defaults to 180 NOTE: "RESTARTCOUNT" defaults to 3 clmgr modify application_monitor <monitor> \ [ NEWNAME=<new_monitor_label> ] \ [ See the "add" action, above, for a list of supported modification attributes. ] clmgr query application_monitor [ <monitor>[,<monitor#2>,...] ] clmgr delete application_monitor {<monitor>[,<monitor#2>,...] | ALL} NOTE: "appctlr" is an abbreviation for "application_controller". NOTE: aliases for "application_monitor" are "am" and "mon". DEPENDENCY: 512 IBM PowerHA SystemMirror 7.1 for AIX # Temporal Dependency (parent ==> child) clmgr add dependency \ PARENT=<rg#1> \ CHILD="<rg#2>[,<rg#2>,<rg#n>...]" clmgr modify dependency <parent_child_dependency> \ [ TYPE=PARENT_CHILD ] \ [ PARENT=<rg#1> ] \ [ CHILD="<rg#2>[,<rg#2>,<rg#n>...]" ] # Temporal Dependency (start/stop after) clmgr add dependency \ {STOP|START}="<rg#2>[,<rg#2>,<rg#n>...]" \ AFTER=<rg#1> clmgr modify dependency \ [ TYPE={STOP_AFTER|START_AFTER} ] \ [ {STOP|START}="<rg#2>[,<rg#2>,<rg#n>...]" ] \ [ AFTER=<rg#1> ] # Location Dependency (colocation) clmgr add dependency \ SAME={NODE|SITE} \ GROUPS="<rg1>,<rg2>[,<rg#n>...]" clmgr modify dependency <colocation_dependency> \ [ TYPE=SAME_{NODE|SITE} ] \ GROUPS="<rg1>,<rg2>[,<rg#n>...]" # Location Dependency (anti-colocation) clmgr add dependency \ HIGH="<rg1>,<rg2>,..." \ INTERMEDIATE="<rg3>,<rg4>,..." \ LOW="<rg5>,<rg6>,..." clmgr modify dependency <anti-colocation_dependency> \ [ TYPE=DIFFERENT_NODES ] \ [ HIGH="<rg1>,<rg2>,..." ] \ [ INTERMEDIATE="<rg3>,<rg4>,..." ] \ [ LOW="<rg5>,<rg6>,..." ] # Acquisition/Release Order clmgr add dependency \ TYPE={ACQUIRE|RELEASE} \ { SERIAL="{<rg1>,<rg2>,...|ALL}" | PARALLEL="{<rg1>,<rg2>,...|ALL}" } clmgr modify dependency \ TYPE={ACQUIRE|RELEASE} \ { SERIAL="{<rg1>,<rg2>,...|ALL}" | PARALLEL="{<rg1>,<rg2>,...|ALL}" } clmgr query dependency [ <dependency> ] clmgr delete dependency {<dependency> | ALL} \ [ TYPE={PARENT_CHILD|STOP_AFTER|START_AFTER| \ SAME_NODE|SAME_SITE}|DIFFERENT_NODES} ] clmgr delete dependency RG=<RESOURCE_GROUP> NOTE: an alias for "dependency" is "de". Appendix D. The clmgr man page 513 TAPE: clmgr add tape <tape> \ DEVICE=<tape_device_name> \ [ DESCRIPTION=<tape_device_description> ] \ [ START="</script/to/start/tape/device>" ] \ [ START_SYNCHRONOUSLY={no|yes} ] \ [ STOP="</script/to/stop/tape/device>" ] \ [ STOP_SYNCHRONOUSLY={no|yes} ] clmgr modify tape <tape> \ [ NEWNAME=<new_tape_label> ] \ [ DEVICE=<tape_device_name> ] \ [ DESCRIPTION=<tape_device_description> ] \ [ START="</script/to/start/tape/device>" ] \ [ START_SYNCHRONOUSLY={no|yes} ] \ [ STOP="</script/to/stop/tape/device>" ] \ [ STOP_SYNCHRONOUSLY={no|yes} ] clmgr query tape [ <tape>[,<tape#2>,<tape#n>,...] ] clmgr delete tape {<tape> | ALL} NOTE: an alias for "tape" is "tp". FILE COLLECTION: clmgr add file_collection <file_collection> \ FILES="/path/to/file1,/path/to/file2,..." \ [ SYNC_WITH_CLUSTER={no|yes} ] \ [ SYNC_WHEN_CHANGED={no|yes} ] \ [ DESCRIPTION="<file_collection_description>" ] clmgr modify file_collection <file_collection> \ [ NEWNAME="<new_file_collection_label>" ] \ [ ADD="/path/to/file1,/path/to/file2,..." 
] \ [ DELETE={"/path/to/file1,/path/to/file2,..."|ALL} ] \ [ REPLACE={"/path/to/file1,/path/to/file2,..."|""} ] \ [ SYNC_WITH_CLUSTER={no|yes} ] \ [ SYNC_WHEN_CHANGED={no|yes} ] \ [ DESCRIPTION="<file_collection_description>" ] clmgr query file_collection [ <file_collection>[,<fc#2>,<fc#n>,...]] clmgr delete file_collection {<file_collection>[,<fc#2>,<fc#n>,...]| ALL} clmgr sync file_collection <file_collection> NOTE: the "REPLACE attribute replaces all existing files with the specified set NOTE: aliases for "file_collection" are "fc" and "fi". SNAPSHOT: clmgr add snapshot <snapshot> \ DESCRIPTION="<snapshot_description>" \ [ METHODS="method1,method2,..." ] \ [ SAVE_LOGS={false|true} ] clmgr modify snapshot <snapshot> \ [ NEWNAME="<new_snapshot_label>" ] \ [ DESCRIPTION="<snapshot_description>" ] clmgr query snapshot [ <snapshot>[,<snapshot#2>,<snapshot#n>,...] ] clmgr view snapshot <snapshot> \ [ TAIL=<number_of_trailing_lines> ] \ 514 IBM PowerHA SystemMirror 7.1 for AIX [ [ [ [ HEAD=<number_of_leading_lines> ] \ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] ] \ DELIMITER=<alternate_pattern_delimiter> ] \ CASE={insensitive|no|off|false} ] clmgr delete snapshot {<snapshot>[,<snapshot#2>,<snapshot#n>,...] | ALL} clmgr manage snapshot restore <snapshot> \ [ CONFIGURE={yes|no} ] \ [ FORCE={no|yes} ] NOTE: the "view" action displays the contents of the ".info" file for the snapshot, if that file exists. NOTE: CONFIGURE defaults to "yes"; FORCE defaults to "no". NOTE: an alias for "snapshot" is "sn". METHOD: clmgr add method <method_label> \ TYPE={snapshot|verify} \ FILE=<executable_file> \ [ DESCRIPTION=<description> ] clmgr modify method <method_label> \ TYPE={snapshot|verify} \ [ NEWNAME=<new_method_label> ] \ [ DESCRIPTION=<new_description> ] \ [ FILE=<new_executable_file> ] clmgr add method <method_label> \ TYPE=notify \ CONTACT=<number_to_dial_or_email_address> \ EVENT=<event>[,<event#2>,<event#n>,...] \ [ NODES=<node>[,<node#2>,<node#n>,...] ] \ [ FILE=<message_file> ] \ [ DESCRIPTION=<description> ] \ [ RETRY=<retry_count> ] \ [ TIMEOUT=<timeout> ] NOTE: "NODES" defaults to the local node. clmgr modify method <method_label> \ TYPE=notify \ [ NEWNAME=<new_method_label> ] \ [ DESCRIPTION=<description> ] \ [ FILE=<message_file> ] \ [ CONTACT=<number_to_dial_or_email_address> ] \ [ EVENT=<cluster_event_label> ] \ [ NODES=<node>[,<node#2>,<node#n>,...] ] \ [ RETRY=<retry_count> ] \ [ TIMEOUT=<timeout> ] clmgr query method [ <method>[,<method#2>,<method#n>,...] ] \ [ TYPE={notify|snapshot|verify} ] clmgr delete method {<method>[,<method#2>,<method#n>,...] | ALL} \ [ TYPE={notify|snapshot|verify} ] clmgr verify method <method> NOTE: the "verify" action can only be applied to "notify" methods. Appendix D. The clmgr man page 515 If more than one method exploits the same event, and that event is specified, then both methods will be invoked. NOTE: an alias for "method" is "me". LOG: clmgr modify logs ALL DIRECTORY="<new_logs_directory>" clmgr modify log {<log>|ALL} \ [ DIRECTORY="{<new_log_directory>"|DEFAULT} ] [ FORMATTING={none|standard|low|high} ] \ [ TRACE_LEVEL={low|high} ] [ REMOTE_FS={true|false} ] clmgr query log [ <log>[,<log#2>,<log#n>,...] ] clmgr view log [ {<log>|EVENTS} ] \ [ TAIL=<number_of_trailing_lines> ] \ [ HEAD=<number_of_leading_lines> ] \ [ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] 
] \ [ DELIMITER=<alternate_pattern_delimiter> ] \ [ CASE={insensitive|no|off|false} ] clmgr manage logs collect \ [ DIRECTORY="<directory_for_collection>" ] \ [ NODES=<node>[,<node#2>,<node#n>,...] ] \ [ RSCT_LOGS={yes|no} ] \ NOTE: when "DEFAULT: is specified for the "DIRECTORY" attribute, then the original, default IBM PowerHA SystemMirror directory value is restored. NOTE: the "FORMATTING" attribute only applies to the "hacmp.out" log, and is ignored for all other logs. NOTE: the "FORMATTING" and "TRACE_LEVEL" attributes only apply to the "hacmp.out" and "clstrmgr.debug" logs, and are ignored for all other logs. NOTE: when "ALL" is specified in place of a log name, then the provided DIRECTORY and REMOTE_FS modifications are applied to all the logs. NOTE: when "EVENTS" is specified in place of a log name, then an events summary report is displayed. VOLUME GROUP: clmgr query volume_group LOGICAL VOLUME: clmgr query logical_volume FILE_SYSTEM: clmgr query file_system PHYSICAL VOLUME: clmgr query physical_volume \ [ <disk>[,<disk#2>,<disk#n>,...] ] \ [ NODES=<node>,<node#2>[,<node#n>,...] ] \ [ ALL={no|yes} ] NOTE: "node" may be either a node name, or a networkresolvable name (i.e. hostname or IP address). 516 IBM PowerHA SystemMirror 7.1 for AIX NOTE: "disk" may be either a device name (e.g. "hdisk0") or a PVID (e.g. "00c3a28ed9aa3512"). NOTE: an alias for "physical_volume" is "pv". REPORT: clmgr view report [<report>] \ [ FILE=<PATH_TO_NEW_FILE> ] \ [ TYPE={text|html} ] clmgr view report {nodeinfo|rginfo|lvinfo| fsinfo|vginfo|dependencies} \ [ TARGETS=<target>[,<target#2>,<target#n>,...] ] \ [ FILE=<PATH_TO_NEW_FILE> ] \ [ TYPE={text|html} ] clmgr view report availability \ [ TARGETS=<appctlr>[,<appctlr#2>,<appctlr#n>,...] ] \ [ FILE=<PATH_TO_NEW_FILE> ] \ [ TYPE={text|html} ] \ [ BEGIN_TIME="YYYY:MM:DD" ] \ [ END_TIME="YYYY:MM:DD" ] NOTE: the currently supported reports are "basic", "cluster", "status", "topology", "applications", "availability", "events", "nodeinfo", "rginfo", "networks", "vginfo", "lvinfo", "fsinfo", and "dependencies". Some of these reports provide overlapping information, but each also provides its own, unique information, as well. NOTE: "appctlr" is an abbreviation for "application_controller". NOTE: "MM" must be 1 - 12. "DD" must be 1 - 31. NOTE: if no "BEGIN_TIME" is provided, then a report will be generated for the last 30 days prior to "END_TIME". NOTE: if no "END_TIME" is provided, then the current time will be the default. NOTE: an alias for "report" is "re". Usage Examples ============== clmgr query cluster * For output that is more easily consumed by other programs, alternative output formats, such as colon-delimited or XML, may be helpful: clmgr -c query cluster clmgr -x query node nodeA * Most multi-value lists can be specified in either a colon-delimited manner, or via quoted strings: clmgr -a cluster_id,version query cluster clmgr -a "cluster_id version" query cluster * Combinations of option flags can be used to good effect. For example, to retrieve a single value for a single attribute: clmgr -cSa "version" query cluster Appendix D. 
The clmgr man page 517 * Attribute-based searching can help filter out unwanted data, ensuring that only the desired results are returned: clmgr -v -a "name" q rg nodes="*nodeB*" clmgr query file_collection files="*rhosts*" * Application availability reports can help measure application uptime requirements: clmgr view report availability clmgr add cluster tryme nodes=nodeA,nodeB clmgr add application_controller manage_httpd \ start_script=/usr/local/bin/scripts/start_ihs.sh \ stop_script=/usr/local/bin/scripts/stop_ihs.sh clmgr add application_monitor monitor_httpd \ type=process \ applications=manage_httpd \ processes=httpd \ owner=root \ mode=continuous \ stabilization=300 \ restartcount=3 \ failureaction=notify \ notifymethod="/usr/local/bin/scripts/ihs_notification.sh" \ cleanupmethod="/usr/local/bin/scripts/web_app_cleanup.sh" \ restartmethod="/usr/local/bin/scripts/start_ihs.sh" clmgr add resource_group ihs_rg \ nodes=nodeA,nodeB \ startup=OFAN \ fallover=FNPN \ fallback=NFB \ node_priority_policy=mem \ applications=manage_httpd clmgr view log hacmp.out FILTER=Event: Suggested Reading ================= IBM PowerHA SystemMirror for AIX Troubleshooting Guide IBM PowerHA SystemMirror for AIX Planning Guide IBM PowerHA SystemMirror for AIX Installation Guide Prerequisite Information ======================== IBM PowerHA SystemMirror for AIX Concepts and Facilities Guide Related Information =================== IBM PowerHA SystemMirror for AIX Administration Guide 518 IBM PowerHA SystemMirror 7.1 for AIX Related publications The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book. IBM Redbooks The following IBM Redbooks publication provides additional information about the topic in this document. Note that it might be available in softcopy only. Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821 DS8000 Performance Monitoring and Tuning, SG24-7146 IBM AIX Version 7.1 Differences Guide, SG24-7910 IBM System Storage DS8700 Architecture and Implementation, SG24-8786 Implementing IBM Systems Director 6.1, SG24-7694 Personal Communications Version 4.3 for Windows 95, 98 and NT, SG24-4689 PowerHA for AIX Cookbook, SG24-7739 You can search for, view, download or order this document and other Redbooks, Redpapers, Web Docs, draft, and additional materials, at the following website: ibm.com/redbooks Other publications These publications are also relevant as further information sources: Cluster Management, SC23-6779 PowerHA SystemMirror for IBM Systems Director, SC23-6763 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Administration Guide, SC23-6750 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Concepts and Facilities Guide, SC23-6751 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Installation Guide, SC23-6755 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Master Glossary, SC23-6757 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide, SC23-6758-01 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Programming Client Applications, SC23-6759 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist Developer’s Guide, SC23-6753 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for DB2 user’s Guide, SC23-6752 © Copyright IBM Corp. 2011. All rights reserved. 
519 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for Oracle User’s Guide, SC23-6760 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for WebSphere User’s Guide, SC23-6762 PowerHA SystemMirror Version 7.1 for AIX Standard Edition Troubleshooting Guide, SC23-6761 Online resources These websites are also relevant as further information sources: IBM PowerHA SystemMirror for AIX http://www.ibm.com/systems/power/software/availability/aix/index.html PowerHA hardware support matrix http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638 IBM PowerHA High Availability wiki http://www.ibm.com/developerworks/wikis/display/WikiPtype/High%20Availability Implementation Services for Power Systems for PowerHA for AIX http://www.ibm.com/services/us/index.wss/offering/its/a1000032 IBM training classes for PowerHA SystemMirror for AIX http://www.ibm.com/training Help from IBM IBM Support and downloads ibm.com/support IBM Global Services ibm.com/services 520 IBM PowerHA SystemMirror 7.1 for AIX Index /etc/cluster/rhosts file 73 collection monitoring 202 populating 183 rolling migration 186 snapshot migration 168 /etc/filesystems file 318 /etc/hosts file, collection monitoring 202 /etc/inittab file, cluster monitoring 206 /etc/services file 139 /etc/syslogd.conf file, cluster monitoring 206 /tmp/clmigcheck/clmigcheck.log 161 /usr/es/sbin/cluster/utilities/ file 233 /var/adm/ras/syslog.caa log file 229 /var/hacmp/log/clutils.log file, clmgr debugging 131 /var/log/clcomd/clcomd.log file 313 #5273/#5735 PCI-Express Dual Port Fibre Channel Adapter 499 Application Availability and Configuration reports 358 application configuration 86 application controller configuration 91 versus application server 67 application list 353 application monitoring 368 Application Name field tip 143 application server clmgr command 120 versus application controller 67 application startup, testing with Startup Monitoring configured 298 architecture changes for RSCT 3.1 3 IBM Systems Director 22 PowerHA SystemMirror 1 Autonomic Health Advisor File System (AHAFS) 11 files used in RSCT 12 A B Symbols -a option clmgr command 109 wildcards 110 ADAPTER_DOWN event 12 adapters Ethernet 499 fibre channel 498 for the repository disk 49 InfiniBand 500 PCI bus 500 SAS 499 SCSI and iSCSI 500 adaptive failover 35, 102 Add Network tab 344 adding on new volume group 416 Additional Properties tab 257 agent password 328 AHAFS (Autonomic Health Advisor File System) 11 files used in RSCT 12 AIX commands and log files 216 disk and dev_group association 443 importing volume groups 383 installation of IBM Systems Director 327 volume group configuration 381 AIX 6.1 67 AIX 6.1 TL6 152 for migration 47 upgrading to 153 AIX 7.1 support of PowerHA 6.1 193 AIX BOS components installation 59 prerequisites for 44 AIX_CONTROLLED interface state 18 © Copyright IBM Corp. 2011. All rights reserved. 
Back cover

IBM PowerHA SystemMirror 7.1 for AIX

IBM PowerHA SystemMirror 7.1 for AIX is a major product announcement for IBM in the high availability space for IBM Power Systems servers. This release provides deeper integration between the IBM high availability solution and IBM AIX. It features integration with IBM Systems Director, SAP Smart Assist and cache support, IBM System Storage DS8000 Global Mirror support, and support for Hitachi storage.

This IBM Redbooks publication contains information about the IBM PowerHA SystemMirror 7.1 release for AIX. This release includes fundamental changes, in particular departures from how the product was managed in the past, which necessitated this publication.

This publication highlights the latest features of PowerHA SystemMirror 7.1 and explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX component. It also introduces you to PowerHA SystemMirror Smart Assist for DB2. This book guides you through migration scenarios and demonstrates how to monitor, test, and troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems Director for PowerHA 7.1 and how to install the IBM Systems Director server and the PowerHA SystemMirror plug-in.
Finally, it explains how to perform disaster recovery by using IBM DS8700 Global Mirror and Hitachi TrueCopy and Universal Replicator.

This publication targets all technical professionals (consultants, IT architects, support staff, and IT specialists) who are responsible for delivering and implementing high availability solutions for their enterprise.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, customers, and partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information:
ibm.com/redbooks

SG24-7845-00
ISBN 0738435120