Front cover
IBM PowerHA SystemMirror 7.1 for AIX
Learn how to plan for, install, and configure PowerHA with the Cluster Aware AIX component
See how to migrate to, monitor, test, and troubleshoot PowerHA 7.1
Explore the IBM Systems Director plug-in and disaster recovery
Dino Quintero
Shawn Bodily
Brandon Boles
Bernhard Buehler
Rajesh Jeyapaul
SangHee Park
Minh Pham
Matthew Radford
Gus Schlachter
Stefan Velica
Fabiano Zimmermann
ibm.com/redbooks
International Technical Support Organization
IBM PowerHA SystemMirror 7.1 for AIX
March 2011
SG24-7845-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page ix.
First Edition (March 2011)
This edition applies to IBM PowerHA SystemMirror Version 7.1 on IBM AIX Version 6.1 TL6 and IBM AIX Version 7.1.
© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Chapter 1. PowerHA SystemMirror architecture foundation. . . . . . . . . . . . . . . . . . . . . . 1
1.1 Reliable Scalable Cluster Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Overview of the components for Reliable Scalable Cluster Technology. . . . . . . . . 2
1.1.2 Architecture changes for RSCT 3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 PowerHA and RSCT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Cluster Aware AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 CAA daemons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 RSCT changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.3 The central repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.4 Cluster event management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Cluster communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.1 Communication interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.2 Communication node status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.3 Considerations for the heartbeat configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3.4 Deciding when a node is down: Round-trip time (rtt) . . . . . . . . . . . . . . . . . . . . . . 20
1.4 PowerHA 7.1 SystemMirror plug-in for IBM Systems Director . . . . . . . . . . . . . . . . . . . 21
1.4.1 Introduction to IBM Systems Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.2 Advantages of using IBM Systems Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.3 Basic architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 2. Features of PowerHA SystemMirror 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1 Deprecated features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 New features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Changes to the SMIT panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 SMIT tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.2 The smitty hacmp command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.3 The smitty clstart and smitty clstop commands. . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.4 Cluster Standard Configuration menu. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.5 Custom Cluster Configuration menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.6 Cluster Snapshot menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.7 Configure Persistent Node IP Label/Address menu . . . . . . . . . . . . . . . . . . . . . . . 31
2.4 The rootvg system event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Resource management enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.5.1 Start After and Stop After resource group dependencies . . . . . . . . . . . . . . . . . . . 32
2.5.2 User-defined resource type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5.3 Dynamic node priority: Adaptive failover. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 CLUSTER_OVERRIDE environment variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 CAA disk fencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.8 PowerHA SystemMirror event flow differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.8.1 Startup processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.8.2 Another node joins the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.8.3 Node down processing normal with takeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Chapter 3. Planning a cluster implementation for high availability . . . . . . . . . . . . . . . 43
3.1 Software requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.1.1 Prerequisite for AIX BOS and RSCT components . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Hardware requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.1 Supported hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.2 Requirements for the multicast IP address, SAN, and repository disk . . . . . . . . . 45
3.3 Considerations before using PowerHA 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Migration planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.1 Shared storage for the repository disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.2 Adapters supported for storage communication . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5.3 Multipath driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5.4 System Storage Interoperation Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6.1 Multicast address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.2 Network interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.3 Subnetting requirements for IPAT via aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.4 Host name and node name. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6.5 Other network considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX . . . . . . . . . . . . . . . . . . . . . . . 53
4.1 Hardware configuration of the test environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.1 SAN zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.2 Shared storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1.3 Configuring the FC adapters for SAN-based communication . . . . . . . . . . . . . . . . 57
4.2 Installing PowerHA file sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.1 PowerHA software installation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Volume group consideration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Chapter 5. Configuring a PowerHA cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.1 Cluster configuration using SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.1 SMIT menu changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.2 Overview of the test environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.1.3 Typical configuration of a cluster topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.1.4 Custom configuration of the cluster topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.1.5 Configuring resources and applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.1.6 Configuring Start After and Stop After resource group dependencies . . . . . . . . . 96
5.1.7 Creating a user-defined resource type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.1.8 Configuring the dynamic node priority (adaptive failover) . . . . . . . . . . . . . . . . . . 102
5.1.9 Removing a cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Cluster configuration using the clmgr tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.1 The clmgr action commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.2 The clmgr object classes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2.3 Examples of using the clmgr command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.2.4 Using help in clmgr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.2.5 Configuring a PowerHA cluster using the clmgr command. . . . . . . . . . . . . . . . . 112
5.2.6 Alternative output formats for the clmgr command . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.7 Log file of the clmgr command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.8 Displaying the log file content by using the clmgr command . . . . . . . . . . . . . . . 132
5.3 PowerHA SystemMirror for IBM Systems Director . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2 . . . . . . . . . . . . . . . . . . 135
6.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.1.1 Installing the required file sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.1.2 Installing DB2 on both nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.1.3 Importing the shared volume group and file systems . . . . . . . . . . . . . . . . . . . . . 137
6.1.4 Creating the DB2 instance and database on the shared volume group . . . . . . . 137
6.1.5 Updating the /etc/services file on the secondary node . . . . . . . . . . . . . . . . . . . . 139
6.1.6 Configuring IBM PowerHA SystemMirror . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.2 Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1 . . . . 139
6.2.1 Preliminary steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.2.2 Starting Smart Assist for DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.2.3 Completing the configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Chapter 7. Migrating to PowerHA 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.1 Considerations before migrating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.2 Understanding the PowerHA 7.1 migration process . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.2.1 Stages of migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.2.2 Premigration checking: The clmigcheck program . . . . . . . . . . . . . . . . . . . . . . . . 157
7.3 Snapshot migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.3.1 Overview of the migration process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.3.2 Performing a snapshot migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.3.3 Checklist for performing a snapshot migration . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.3.4 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.4 Rolling migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.4.1 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.4.2 Performing a rolling migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.4.3 Checking your newly migrated cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.5 Offline migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.5.1 Planning the offline migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.5.2 Offline migration flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.5.3 Performing an offline migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster . . . . . . . . . . . . . 201
8.1 Collecting information before a cluster is configured . . . . . . . . . . . . . . . . . . . . . . . . 202
8.2 Collecting information after a cluster is configured . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.3 Collecting information after a cluster is running . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.3.1 AIX commands and log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.3.2 CAA commands and log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
8.3.3 PowerHA 7.1 cluster monitoring tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.3.4 PowerHA ODM classes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8.3.5 PowerHA clmgr utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.3.6 IBM Systems Director web interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
8.3.7 IBM Systems Director CLI (smcli interface) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Chapter 9. Testing the PowerHA 7.1 cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
9.1 Testing the SAN-based heartbeat channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
9.2 Testing the repository disk heartbeat channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.2.2 Testing environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
9.3 Simulation of a network failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
9.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
9.3.2 Testing environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
9.3.3 Testing a network failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
9.4 Testing the rootvg system event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.1 The rootvg system event. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.2 Testing the loss of the rootvg volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.4.3 Loss of rootvg: What PowerHA logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
9.5 Simulation of a crash in the node with an active resource group . . . . . . . . . . . . . . . 289
9.6 Simulations of CPU starvation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
9.7 Simulation of a Group Services failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
9.8 Testing a Start After resource group dependency . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.8.1 Testing the standard configuration of a Start After resource group dependency 298
9.8.2 Testing application startup with Startup Monitoring configured. . . . . . . . . . . . . . 298
9.9 Testing dynamic node priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Chapter 10. Troubleshooting PowerHA 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
10.1 Locating the log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.1.1 CAA log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.1.2 PowerHA log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.2 Troubleshooting the migration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.2.1 The clmigcheck script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.2.2 The ‘Cluster still stuck in migration’ condition . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.2.3 Existing non-IP networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.3 Troubleshooting the installation and configuration . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.1 The clstat and cldump utilities and the SNMP. . . . . . . . . . . . . . . . . . . . . . . . . . 312
10.3.2 The /var/log/clcomd/clcomd.log file and the security keys . . . . . . . . . . . . . . . . 313
10.3.3 The ECM volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
10.3.4 Communication path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
10.4 Troubleshooting problems with CAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.4.1 Previously used repository disk for CAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.4.2 Repository disk replacement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
10.4.3 CAA cluster after the node restarts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
10.4.4 Creation of the CAA cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
10.4.5 Volume group name already in use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
10.4.6 Changed PVID of the repository disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
10.4.7 The ‘Cluster services are not active’ message . . . . . . . . . . . . . . . . . . . . . . . . . 323
Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in . . 325
11.1 Installing IBM Systems Director Version 6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.1 Hardware requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
11.1.2 Installing IBM Systems Director on AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
11.1.3 Configuring and activating IBM Systems Director. . . . . . . . . . . . . . . . . . . . . . . 328
11.2 Installing the SystemMirror plug-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.1 Installing the SystemMirror server plug-in. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
11.2.2 Installing the SystemMirror agent plug-in in the cluster nodes . . . . . . . . . . . . . 330
11.3 Installing the clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.1 Installing the common agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
11.3.2 Installing the PowerHA SystemMirror agent . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
Chapter 12. Creating and managing a cluster using IBM Systems Director . . . . . . . 333
12.1 Creating a cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
12.1.1 Creating a cluster with the SystemMirror plug-in wizard . . . . . . . . . . . . . . . . . . 334
12.1.2 Creating a cluster with the SystemMirror plug-in CLI . . . . . . . . . . . . . . . . . . . . 339
12.2 Performing cluster management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
12.2.1 Performing cluster management with the SystemMirror plug-in GUI wizard. . . 341
12.2.2 Performing cluster management with the SystemMirror plug-in CLI. . . . . . . . . 347
12.3 Creating a resource group with the SystemMirror plug-in GUI wizard . . . . . . . . . . 349
12.3.1 Creating a custom resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
12.3.2 Creating a predefined resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
12.3.3 Verifying the creation of a resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
12.4 Managing a resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
12.4.1 Resource group management using the SystemMirror plug-in wizard . . . . . . . 355
12.4.2 Managing a resource group with the SystemMirror plug-in CLI . . . . . . . . . . . . 359
12.5 Verifying and synchronizing a configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
12.5.1 Verifying and synchronizing a configuration with the GUI. . . . . . . . . . . . . . . . . 360
12.5.2 Verifying and synchronizing with the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
12.6 Performing cluster monitoring with the SystemMirror plug-in . . . . . . . . . . . . . . . . . 364
12.6.1 Monitoring cluster activities before starting a cluster . . . . . . . . . . . . . . . . . . . . 364
12.6.2 Monitoring an active cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
12.6.3 Recovering from cluster configuration issues . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Chapter 13. Disaster recovery using DS8700 Global Mirror . . . . . . . . . . . . . . . . . . . . 371
13.1 Planning for Global Mirror . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
13.1.1 Software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
13.1.2 Minimum DS8700 requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
13.1.3 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
13.2 Installing the DSCLI client software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
13.3 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
13.4 Configuring the Global Mirror resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
13.4.1 Checking the prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
13.4.2 Identifying the source and target volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
13.4.3 Configuring the Global Mirror relationships. . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
13.5 Configuring AIX volume groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
13.5.1 Configuring volume groups and file systems on primary site . . . . . . . . . . . . . . 381
13.5.2 Importing the volume groups in the remote site . . . . . . . . . . . . . . . . . . . . . . . . 383
13.6 Configuring the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
13.6.1 Configuring the cluster topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
13.6.2 Configuring cluster resources and resource group . . . . . . . . . . . . . . . . . . . . . . 388
13.7 Failover testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
13.7.1 Graceful site failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
13.7.2 Rolling site failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
13.7.3 Site re-integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
13.8 LVM administration of DS8000 Global Mirror replicated resources . . . . . . . . . . . . 404
13.8.1 Adding a new Global Mirror pair to an existing volume group. . . . . . . . . . . . . . 404
13.8.2 Adding a Global Mirror pair into a new volume group . . . . . . . . . . . . . . . . . . . . 411
Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator . . 419
14.1 Planning for TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.1 Software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.1.2 Minimum connectivity requirements for TrueCopy/HUR . . . . . . . . . . . . . . . . . . 420
14.1.3 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
14.2 Overview of TrueCopy/HUR management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.1 Installing the Hitachi CCI software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
14.2.2 Overview of the CCI instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
14.2.3 Creating and editing the horcm.conf files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
14.3 Scenario description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
14.4 Configuring the TrueCopy/HUR resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.1 Assigning LUNs to the hosts (host groups). . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
14.4.2 Creating replicated pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
14.4.3 Configuring an AIX disk and dev_group association. . . . . . . . . . . . . . . . . . . . . 443
14.4.4 Defining TrueCopy/HUR managed replicated resource to PowerHA . . . . . . . . 451
14.5 Failover testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
14.5.1 Graceful site failover for the Austin site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
14.5.2 Rolling site failure of the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
14.5.3 Site re-integration for the Austin site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
14.5.4 Graceful site failover for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
14.5.5 Rolling site failure of the Miami site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
14.5.6 Site re-integration for the Miami site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
14.6 LVM administration of TrueCopy/HUR replicated pairs. . . . . . . . . . . . . . . . . . . . . . 463
14.6.1 Adding LUN pairs to an existing volume group . . . . . . . . . . . . . . . . . . . . . . . . . 463
14.6.2 Adding a new logical volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
14.6.3 Increasing the size of an existing file system . . . . . . . . . . . . . . . . . . . . . . . . . . 468
14.6.4 Adding a LUN pair to a new volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Appendix A. CAA cluster commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
The lscluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
The mkcluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
The rmcluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
The chcluster command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
The clusterconf command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
Appendix B. PowerHA SMIT tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
Appendix C. PowerHA supported hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
IBM Power Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
IBM POWER5 systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
IBM POWER6 systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
IBM POWER7 Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
IBM POWER Blade servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
IBM storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Fibre Channel adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Network-attached storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
Serial-attached SCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
SCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Fibre Channel adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Ethernet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
InfiniBand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
SCSI and iSCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
PCI bus adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
Appendix D. The clmgr man page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®
DB2®
Domino®
DS4000®
DS6000™
DS8000®
Enterprise Storage Server®
FileNet®
FlashCopy®
Global Technology Services®
HACMP™
IBM®
Lotus®
Power Systems™
POWER6®
POWER7®
PowerHA®
PowerVM®
POWER®
pureScale®
Redbooks®
Redbooks (logo)®
solidDB®
System i®
System p®
System Storage®
Tivoli®
WebSphere®
XIV®
The following terms are trademarks of other companies:
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Snapshot, NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S.
and other countries.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other
countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel
SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
IBM® PowerHA® SystemMirror 7.1 for AIX® is a major product announcement for IBM in the
high availability space for IBM Power Systems™ servers. This release provides deeper
integration between the IBM high availability solution and IBM AIX. It features integration with
IBM Systems Director, SAP Smart Assist and cache support, IBM System Storage®
DS8000® Global Mirror support, and support for Hitachi storage.
This IBM Redbooks® publication contains information about the IBM PowerHA SystemMirror
7.1 release for AIX. This release includes fundamental changes, in particular departures from
how the product was managed in the past, which prompted this Redbooks publication.
This Redbooks publication highlights the latest features of PowerHA SystemMirror 7.1 and
explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX
component. It also introduces you to PowerHA SystemMirror Smart Assist for DB2®. This
book guides you through migration scenarios and demonstrates how to monitor, test, and
troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems Director for
PowerHA 7.1 and how to install the IBM Systems Director Server and PowerHA SystemMirror
plug-in. Plus, it explains how to perform disaster recovery using DS8700 Global Mirror and
Hitachi TrueCopy and Universal Replicator.
This publication targets all technical professionals (consultants, IT architects, support staff,
and IT specialists) who are responsible for delivering and implementing high availability
solutions for their enterprise.
The team who wrote this book
This book was produced by a team of specialists from around the world working at the
International Technical Support Organization (ITSO), Poughkeepsie Center.
Dino Quintero is a Project Leader and IT generalist with the ITSO in Poughkeepsie, NY. His
areas of expertise include enterprise continuous availability planning and implementation,
enterprise systems management, virtualization, and clustering solutions. He is currently an
Open Group Master Certified IT Specialist - Server Systems. Dino holds a Master of
Computing Information Systems degree and a Bachelor of Science degree in Computer
Science from Marist College.
Shawn Bodily is a Certified Consulting IT Specialist for Advanced Technical Support
Americas in Dallas, Texas. He has worked for IBM for 12 years and has 14 years of AIX
experience, with 12 years specializing in High-Availability Cluster Multi-Processing
(HACMP™). He is certified in both versions 4 and 5 of HACMP and ATE. He has written and
presented on high availability and storage. Shawn has coauthored five other Redbooks
publications.
Brandon Boles is a Development Support Specialist for PowerHA/HACMP in Austin, Texas.
He has been with IBM for four years and has been doing support, programming, and
consulting with PowerHA and HACMP for 11 years. Brandon has been working with AIX since
version 3.2.5.
Bernhard Buehler is an IT Specialist for IBM in Germany. He is currently working for IBM
STG Lab Services in La Gaude, France. He has worked at IBM for 29 years and has 20 years
of experience in the AIX and availability field. His areas of expertise include AIX, PowerHA,
High Availability architecture, script programming, and AIX security. Bernhard has coauthored
several Redbooks publications and several courses in the IBM AIX curriculum.
Rajesh Jeyapaul is the technical lead for IBM Systems Director Power Server management.
His focus is on improving PowerHA SystemMirror, DB2 pureScale®, and the AIX Runtime
Expert plug-in for System Director. He has worked extensively with customers and
specialized in performance analysis under the IBM System p® and AIX environment. His
areas of expertise includes IBM POWER® Virtualization, high availability, and system
management. He has coauthored DS8000 Performance Monitoring and Tuning, SG24-7146,
and Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821. Rajesh holds a
Master in Software Systems degree from the University of BITS, India, and a Master of
Business Administration (MBA) degree from the University of MKU, India.
SangHee Park is a Certified IT Specialist in IBM Korea. He is currently working for IBM
Global Technology Services® in Maintenance and Technical Support. He has 5 years of
experience in Power Systems. His areas of expertise include AIX, PowerHA SystemMirror,
and PowerVM® Virtualization. SangHee holds a bachelor's degree in aerospace and
mechanical engineering from Korea Aerospace University.
Minh Pham is currently a Development Support Specialist for PowerHA and HACMP in
Austin, Texas. She has worked for IBM for 10 years, including 6 years in System p
microprocessor development and 4 years in AIX development support. Her areas of expertise
include core and chip logic design for System p and AIX with PowerHA. Minh holds a
Bachelor of Science degree in Electrical Engineering from the University of Texas at Austin.
Matthew Radford is a Certified AIX Support Specialist in IBM UK. He is currently working for
IBM Global Technology Services in Maintenance and Technical Support. He has worked at
IBM for 13 years and is a member of the UKI Technical Council. His areas of expertise include
AIX, and PowerHA. Matthew coauthored Personal Communications Version 4.3 for Windows
95, 98 and NT, SG24-4689. Matthew holds a Bachelor of Science degree in Information
Technology from the University of Glamorgan.
Gus Schlachter is a Development Support Specialist for PowerHA in Austin, TX. He has
worked with HACMP for over 15 years in support, development, and testing. Gus formerly
worked for CLAM/Availant and is an IBM-certified Instructor for HACMP.
Stefan Velica is an IT Specialist who is currently working for IBM Global Technology
Services in Romania. He has five years of experience in Power Systems. He is a Certified
Specialist for IBM System p Administration, HACMP for AIX, High-end and Entry/Midrange
DS Series, and Storage Networking Solutions. His areas of expertise include IBM System
Storage, PowerVM, AIX, and PowerHA. Stefan holds a bachelor's degree in electronics and
telecommunications engineering from Politechnical Institute of Bucharest.
Fabiano Zimmermann is an AIX/SAN/TSM Subject Matter Expert for Nestlé in Phoenix,
Arizona. He has been working with AIX, High Availability and System Storage since 2000. A
former IBM employee, Fabiano has experience and expertise in the areas of Linux, DB2, and
Oracle. Fabiano is a member of the L3 team that provides worldwide support for the major
Nestlé data centers. Fabiano holds a degree in computer science from Brazil.
Front row from left to right: Minh Pham, SangHee Park, Stefan Velica, Brandon Boles, and Fabiano
Zimmermann; back row from left to right: Gus Schlachter, Dino Quintero (project leader), Bernhard
Buehler, Shawn Bodily, Matt Radford, and Rajesh Jeyapaul
Thanks to the following people for their contributions to this project:
Bob Allison
Catherine Anderson
Chuck Coleman
Bill Martin
Darin Meyer
Keith O'Toole
Ashutosh Rai
Hitachi Data Systems
David Bennin
Ella Buslovich
Richard Conway
Octavian Lascu
ITSO, Poughkeepsie Center
Patrick Buah
Michael Coffey
Mark Gurevich
Felipe Knop
Paul Moyer
Skip Russell
Stephen Tovcimak
IBM Poughkeepsie
Eric Fried
Frank Garcia
Kam Lee
Gary Lowther
Deb McLemore
Ravi A. Shankar
Stephen Tee
Tom Weaver
David Zysk
IBM Austin
Nick Fernholz
Steven Finnes
Susan Jasinski
Robert G. Kovacs
William E. (Bill) Miller
Rohit Krishna Prasad
Ted Sullivan
IBM USA
Philippe Hermes
IBM France
Manohar R Bodke
Jes Kiran
Anantoju Srinivas
IBM India
Claudio Marcantoni
IBM Italy
Now you can become a published author, too!
Here's an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Stay connected to IBM Redbooks
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on Twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
Chapter 1. PowerHA SystemMirror architecture foundation
This chapter describes the new architecture of IBM PowerHA SystemMirror 7.1 for AIX and
the differences from previous versions.
This chapter includes the following topics:
Reliable Scalable Cluster Technology
Cluster Aware AIX
Cluster communication
PowerHA 7.1 SystemMirror plug-in for IBM Systems Director
For an introduction to high availability and IBM PowerHA SystemMirror 7.1, see the “IBM
PowerHA SystemMirror for AIX” page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html
1.1 Reliable Scalable Cluster Technology
Reliable Scalable Cluster Technology (RSCT) is a set of software components that together
provide a comprehensive clustering environment for AIX, Linux, Solaris, and Microsoft
Windows. RSCT is the infrastructure used by various IBM products to provide clusters with
improved system availability, scalability, and ease of use.
This section provides an overview of RSCT, its components, and the communication paths
between these components. Several helpful IBM manuals, white papers, and Redbooks
publications are available about RSCT. This section focuses on the components that affect
PowerHA SystemMirror.
To find the most current documentation for RSCT, see the RSCT library in the IBM Cluster
Information Center at:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.
cluster.rsct.doc%2Frsctbooks.html
1.1.1 Overview of the components for Reliable Scalable Cluster Technology
RSCT has the following main components:
Topology Services
This component provides node and network failure detection.
Group Services
This component provides cross-node or process coordination on some cluster
configurations. For a detailed description about how Group Services work, see IBM
Reliable Scalable Cluster Technology: Group Services Programming Guide, SA22-7888,
at:
http://publibfp.boulder.ibm.com/epubs/pdf/a2278889.pdf
RSCT cluster security services
This component provides the security infrastructure that enables RSCT components to
authenticate the identity of other parties.
Resource Monitoring and Control (RMC) subsystem
This subsystem is the scalable, reliable backbone of RSCT. It runs on a single machine or
on each node (operating system image) of a cluster. Also, it provides a common
abstraction for the resources of the individual system or the cluster of nodes. You can use
RMC for single system monitoring or for monitoring nodes in a cluster. However, in a
cluster, RMC provides global access to subsystems and resources throughout the cluster.
Therefore, it provides a single monitoring and management infrastructure for clusters.
Resource managers
A resource manager is a software layer between a resource (a hardware or software entity
that provides services to some other component) and RMC. A resource manager maps
programmatic abstractions in RMC into the actual calls and commands of a resource.
For a more detailed description of the RSCT components, see the IBM Reliable Scalable
Cluster Technology: Administration Guide, SA22-7889, at the following web address:
http://publibfp.boulder.ibm.com/epubs/pdf/22788919.pdf
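As a quick illustration of the RMC layer described above, the following commands show how to check the RMC subsystem and query a resource class from the AIX command line. This is a minimal sketch; IBM.Host is used here only as a sample resource class, and the attributes shown are examples of what the resource managers expose on a typical node.

# Check that the RMC subsystem (ctrmc) is active on this node
lssrc -s ctrmc

# List the resource classes that the installed resource managers provide
lsrsrc

# Query the IBM.Host resource class for a few persistent attributes
lsrsrc IBM.Host Name NodeNameList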
1.1.2 Architecture changes for RSCT 3.1
RSCT version 3.1 is the first version that supports Cluster Aware AIX (CAA). Although this
section provides a high-level introduction to the RSCT architecture changes made to support
CAA, you can find more details about CAA in 1.2, “Cluster Aware AIX” on page 7.
As shown in Figure 1-1 on page 3, RSCT 3.1 can operate without CAA in “non-CAA” mode.
You use the non-CAA mode in any of the following cases:
PowerHA versions before PowerHA 7.1
A mixed cluster with PowerHA 7.1 and older PowerHA versions
Existing RSCT Peer Domains (RPD) that were created before RSCT 3.1 was installed
A new RPD, when you specify during creation that the system must not use or create a
CAA cluster
Figure 1-1 shows both modes in which RSCT 3.1 can be used (with or without CAA). The left
part shows the non-CAA mode, which is the same as in older RSCT versions. The right part
shows the CAA-based mode. The difference between these modes is that Topology Services
is replaced with CAA.
Important: On a given node, use only one RSCT version at a time.
[Figure content: two stacks are compared. RSCT without CAA: Resource Manager and Resource Monitoring and Control with Group Services (grpsvcs) and Topology Services, running on AIX. RSCT with CAA: Resource Manager and Resource Monitoring and Control with Group Services (cthags), running on AIX with CAA.]
Figure 1-1 RSCT 3.1
RSCT 3.1 is available for both AIX 6.1 and AIX 7.1. To use CAA with RSCT 3.1 on AIX 6.1,
you must have TL 6 or later installed.
CAA on AIX 6.1 TL 6: The use of CAA on AIX 6.1 TL 6 is enabled only for PowerHA 7.1.
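Before enabling CAA-based clustering, it can be useful to verify the AIX technology level and the installed RSCT file sets on each node. The following commands are a minimal verification sketch; the exact level strings on your systems depend on the installed maintenance level.

# Display the AIX release, technology level, and service pack
oslevel -s

# Display the installed RSCT file sets and confirm that they are at version 3.1 or later
lslpp -l rsct.core.rmc rsct.basic.rte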
Figure 1-2 shows a high-level architectural view of how IBM high availability (HA) applications
PowerHA, IBM Tivoli® System Automation for Multiplatforms, and Virtual I/O Server (VIOS)
Clustered Storage use the RSCT and CAA architecture.
Figure 1-2 HA applications using the RSCT and CAA architecture
1.1.3 PowerHA and RSCT
Figure 1-3 shows the non-CAA communication paths between PowerHA and RSCT.
Non-CAA mode is still used when you have a PowerHA version 6.1 or earlier, even if you are
using AIX 7.1.
The main communication goes from PowerHA to Group Services (grpsvcs), then to Topology
Services (topsvcs), and back to PowerHA. The communication path from PowerHA to RMC is
used for PowerHA Process Application Monitors. Another case where PowerHA uses RMC is
when a resource group is configured with the Dynamic Node Priority policy.
Figure 1-3 PowerHA using RSCT without CAA
Figure 1-4 shows the new CAA-based communication paths of PowerHA, RSCT, and CAA.
You use this architecture when you have PowerHA v7.1 or later. It is the same architecture for
AIX 6.1 TL 6 and AIX 7.1 or later. As in the previous architecture, the main communication
goes from PowerHA to Group Services. However, in Figure 1-4, Group Services
communicates with CAA.
Figure 1-4 RSCT with Cluster Aware AIX (CAA)
Example 1-1 lists the cluster processes on a running PowerHA 7.1 cluster.
Group Services subsystem name: Group Services now uses the subsystem name
cthags, which replaces grpsvcs. Group Services is started by a different control script and
therefore runs under the new subsystem name cthags.
Example 1-1 Output of lssrc
# lssrc -a | egrep "rsct|ha|svcs|caa|cluster" | grep -v _rm
 cld              caa              4980920      active
 clcomd           caa              4915400      active
 clconfd          caa              5243070      active
 cthags           cthags           4456672      active
 ctrmc            rsct             5767356      active
 clstrmgrES       cluster          10813688     active
 solidhac         caa              10420288     active
 solid            caa              5832836      active
 clevmgrdES       cluster          5177370      active
 clinfoES         cluster          11337972     active
 ctcas            rsct                          inoperative
 topsvcs          topsvcs                       inoperative
 grpsvcs          grpsvcs                       inoperative
 grpglsm          grpsvcs                       inoperative
 emsvcs           emsvcs                        inoperative
 emaixos          emsvcs                        inoperative
1.2 Cluster Aware AIX
Cluster Aware AIX introduces fundamental clustering capabilities into the base operating
system AIX. Such capabilities include the creation and definition of the set of nodes that
comprise the cluster. CAA provides the tools and monitoring capabilities for the detection of
node and interface health.
File sets: CAA is provided by the non-PowerHA file sets bos.cluster.rte, bos.ahafs, and
bos.cluster.solid. The file sets are on the AIX installation media or in AIX 6.1 TL 6.
More information: For more information about CAA, see Cluster Management,
SC23-6779, and the IBM AIX Version 7.1 Differences Guide, SG24-7910.
CAA provides a set of tools and APIs to enable clustering on the AIX operating system. CAA
does not provide the application monitoring and resource failover capabilities that PowerHA
provides. PowerHA uses the CAA capabilities. Other applications and software programs can
use the APIs and command-line interfaces (CLIs) that CAA provides to make their
applications and services “Cluster Aware” on the AIX operating system.
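As a hedged illustration of those command-line interfaces (options can vary by AIX level), two commonly used CAA commands are lscluster, which queries the cluster, and clcmd, which runs a command on every cluster node:

# Show the CAA cluster configuration, including the multicast address
lscluster -c

# Run a command on all cluster nodes and gather the output on this node
clcmd date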
Figure 1-2 on page 4 illustrates how applications can use CAA. The following products and
parties can use CAA technology:
RSCT (3.1 and later)
PowerHA (7.1 and later)
VIOS (CAA support in a future release)
Third-party ISVs, service providers, and software products
CAA provides the following features among others:
Central repository
– Configuration
– Security
Quorumless (CAA does not require a quorum to be up and operational.)
Monitoring capabilities for custom actions
Fencing aids
– Network
– Storage
– Applications
The following sections explain the concepts of the CAA central repository, RSCT changes,
and how PowerHA 7.1 uses CAA.
1.2.1 CAA daemons
When CAA is active in your cluster, you notice the daemon services running as shown in
Figure 1-5.
chile:/ # lssrc -g caa
Subsystem         Group            PID          Status
 clcomd           caa              4849670      active
 cld              caa              7012500      active
 solid            caa              11010276     active
 clconfd          caa              7340038      active
 solidhac         caa              10027064     active
Figure 1-5 CAA services
CAA includes the following services:
clcomd
This daemon is the cluster communications daemon, which has changed in
PowerHA 7.1. In previous versions of PowerHA, it was called clcomdES. The
location of the rhosts file that PowerHA uses has also changed. The rhosts file
used by the clcomd service is /etc/cluster/rhosts. The old clcomdES rhosts file
in the /usr/es/sbin/cluster/etc directory is not used (see the sketch that follows
this list of daemons).
cld
The cld daemon runs on each node and determines whether the local node
must be the primary or the secondary solidDB® database server.
solid
The solid subsystem provides the database engine, and solidHAC is used for
high availability of the IBM solidDB database. Both run on the primary and the
secondary database servers.
In a two-node cluster, the primary database is mounted on node 1
(/clrepos_private1), and the secondary database is mounted on node 2
(/clrepos_private2). These nodes have the solid and solidHAC subsystems
running.
In a three-node cluster configuration, the third node acts as a standby for the
other two nodes. The solid subsystem (solid and solidHAC) is not running,
and the file systems (/clrepos_private1 and /clrepos_private2) are not
mounted.
If a failure occurs on the primary or secondary nodes of the cluster, the third
node activates the solid subsystem. It mounts either the primary or secondary
file system, depending on the node that has failed. See 1.2.3, “The central
repository” on page 9, for information about file systems.
clconfd
The clconfd subsystem runs on each node of the cluster. The clconfd daemon
wakes up every 10 minutes to synchronize any necessary cluster changes.
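Because PowerHA 7.1 relies on the CAA clcomd daemon described above, the cluster node names must be listed in /etc/cluster/rhosts before the cluster is configured. The following minimal sketch assumes the two node names used later in this chapter; run it on every node:

# Populate the clcomd rhosts file with one host name or IP address per line
# (sydney and perth are the example nodes used in this chapter)
cat > /etc/cluster/rhosts <<EOF
sydney
perth
EOF

# Make the clcomd subsystem reread the file
refresh -s clcomd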
1.2.2 RSCT changes
IBM PowerHA now uses CAA, instead of RSCT, to handle the cluster topology, including
heartbeating, configuration information, and live notification events. PowerHA still
communicates with RSCT Group Services (grpsvcs replaced by cthags), but PowerHA has
replaced the topsvcs function with the new CAA function. CAA reports the status of the
topology by using Autonomic Health Advisor File System (AHAFS) events, which are fed up
to cthags.
For information about the RSCT changes, see 1.1.2, “Architecture changes for RSCT 3.1” on
page 3.
1.2.3 The central repository
A major part of CAA is the central repository. The central repository is stored on a dedicated
storage area network (SAN) disk that is shared between all participating nodes. This
repository contains the following structures:
Bootstrap repository (BSR)
LV1, LV2, LV3 (private LVs)
solidDB (primary location (/clrepos_private1) and secondary location
(/clrepos_private2))
CAA repository disk: The CAA repository disk is reserved for use by CAA only. Do not
attempt to change any of it. The information in this chapter is provided only to help you
understand the purpose of the new disk and file system structure.
Figure 1-6 shows an overview of the CAA repository disk and its structure.
Figure 1-6 Cluster repository disk structure
If you installed and configured PowerHA 7.1, your cluster repository disk is displayed as
varied on (active) in lspv output as shown in Figure 1-7 on page 10. In this figure, the disk
label has changed to caa_private0 to remind you that this disk is for private use by CAA only.
Figure 1-7 on page 10 also shows a volume group, called caavg_private, which must always
be varied on (active) when CAA is running. CAA is activated when PowerHA 7.1 is installed
and configured. If you are performing a migration or have an earlier level of PowerHA
installed, CAA is not active.
If you have a configured cluster and find that caavg_private is not varied on (active), your
CAA cluster has a potential problem. See Chapter 10, “Troubleshooting PowerHA 7.1” on
page 305, for guidance about recovery in this situation.
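A quick way to perform this check is to list only the varied-on volume groups, as in the following sketch:

# List the varied-on (active) volume groups; caavg_private must appear
# in this list while CAA is running
lsvg -o | grep caavg_private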
chile:/ # lspv
hdisk1          000fe4114cf8d1ce    None
caa_private0    000fe40163c54011    caavg_private    active
hdisk3          000fe4114cf8d2ec    None
hdisk4          000fe4114cf8d3a1    diskhb
hdisk5          000fe4114cf8d441    None
hdisk6          000fe4114cf8d4d5    None
hdisk7          000fe4114cf8d579    None
hdisk8          000fe4114cf8d608    ny_datavg
hdisk0          000fe40140a5516a    rootvg           active
Figure 1-7 lspv command showing the caa_private repository disk
You can view the structure of caavg_private from the standpoint of a Logical Volume
Manager (LVM) as shown in Figure 1-8. The lsvg command shows the structure of the file
system.
chile:/ # lsvg -l caavg_private
caavg_private:
LV NAME          TYPE    LPs    PPs    PVs    LV STATE        MOUNT POINT
caalv_private1   boot    1      1      1      closed/syncd    N/A
caalv_private2   boot    1      1      1      closed/syncd    N/A
caalv_private3   boot    4      4      1      open/syncd      N/A
fslv00           jfs2    4      4      1      open/syncd      /clrepos_private1
fslv01           jfs2    4      4      1      closed/syncd    /clrepos_private2
powerha_crlv     boot    1      1      1      closed/syncd    N/A
Figure 1-8 The lsvg output of CAA
This file system has a special reserved structure. CAA mounts some file systems for its own
use as shown in Figure 1-9 on page 11. The fslv00 file system contains the solidDB
database mounted as /clrepos_private1 because the node is the primary node of the
cluster. If you look at the output for the second node, you might have /clrepos_private2
mounted instead of /clrepos_private1. See 1.2.1, “CAA daemons” on page 8, for an
explanation of the solid subsystem.
Important: CAA creates the file systems for solidDB by using the default logical volume
names (fslv00 and fslv01). If you have existing file systems outside of CAA that use default
logical volume names, ensure that both nodes have the same names. For example, if node A
has the names fslv00, fslv01, and fslv02, node B must have the same names. Preferably,
avoid default logical volume names on your cluster nodes altogether so that CAA can use
fslv00 and fslv01 for solidDB.
Also, /aha, a special pseudo file system, is mounted in memory and used by AHAFS. See
“Autonomic Health Advisor File System” on page 11 for more information.
Important: Do not interfere with this volume group and its file systems. For example,
forcing a umount of /aha on a working cluster causes the node to halt.
For more information about CAA, see Cluster Management, SC23-6779, at the following web
address:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/clusteraware_pdf.pdf
1.2.4 Cluster event management
With PowerHA 7.1, event management is handled by using a new pseudo file-system
architecture called the Autonomic Health Advisor File System (AHAFS). With this pseudo file
system, applications can use application programming interfaces (APIs) to program the
monitoring of events by reading from and writing to files in the file system.
Autonomic Health Advisor File System
The AHAFS is part of the AIX event infrastructure for AIX and AIX clusters and is what CAA
uses as its monitoring framework. The AHAFS file system is automatically mounted when you
create the cluster (Figure 1-9).
chile:/ # mount
  node     mounted               mounted over          vfs    date         options
-------- --------------------  --------------------- ------ ------------ ---------------
         /dev/hd4              /                     jfs2   Sep 30 13:37 rw,log=/dev/hd8
         /dev/hd2              /usr                  jfs2   Sep 30 13:37 rw,log=/dev/hd8
         /dev/hd9var           /var                  jfs2   Sep 30 13:37 rw,log=/dev/hd8
         /dev/hd3              /tmp                  jfs2   Sep 30 13:37 rw,log=/dev/hd8
         /dev/hd1              /home                 jfs2   Sep 30 13:38 rw,log=/dev/hd8
         /dev/hd11admin        /admin                jfs2   Sep 30 13:38 rw,log=/dev/hd8
         /proc                 /proc                 procfs Sep 30 13:38 rw
         /dev/hd10opt          /opt                  jfs2   Sep 30 13:38 rw,log=/dev/hd8
         /dev/livedump         /var/adm/ras/livedump jfs2   Sep 30 13:38 rw,log=/dev/hd8
         /aha                  /aha                  ahafs  Sep 30 13:46 rw
         /dev/fslv00           /clrepos_private1     jfs2   Sep 30 13:52 rw,dio,log=INLINE
Figure 1-9 AHAFS file system mounted
Event handling entails the following process:
1. Create a monitor file based on the /aha directory.
2. Write the required information to the monitor file to represent the wait type (either a
select() call or a blocking read() call). Indicate when to trigger the event, such as a state
change of node down.
3. Wait in a select() call or a blocking read() call.
4. Read from the monitor file to obtain the event data. The event data is then fed to Group
Services.
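The following ksh sketch illustrates these steps against the node-state monitor file. The registration string and its keywords are assumptions based on the AIX event infrastructure documentation and can differ by AIX level; an event record can also span several lines.

# Minimal AHAFS sketch (run as root on a CAA cluster node)
MON=/aha/cluster/nodeState.monFactory/nodeStateEvent.mon

exec 3<> $MON                                   # step 1: create and open the monitor file
print -u3 "CHANGED=YES;WAIT_TYPE=WAIT_IN_READ"  # step 2: register the wait type (assumed syntax)
read -u3 event                                  # steps 3 and 4: block until an event arrives
print "First line of AHAFS event data: $event"
exec 3<&-                                       # close the monitor file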
The event information is retrieved from CAA, and any changes are communicated by using
AHAFS events. RSCT Group Services uses the AHAFS services to obtain events on the
cluster. This information is provided by cluster query APIs and is fed to Group Services.
Figure 1-10 shows a list of event monitor directories.
drwxrwxrwt    1 root     system    0 Oct  1 17:04 linkedCl.monFactory
drwxrwxrwt    1 root     system    1 Oct  1 17:04 networkAdapterState.monFactory
drwxrwxrwt    1 root     system    1 Oct  1 17:04 nodeAddress.monFactory
drwxrwxrwt    1 root     system    0 Oct  1 17:04 nodeContact.monFactory
drwxrwxrwt    1 root     system    1 Oct  1 17:04 nodeList.monFactory
drwxrwxrwt    1 root     system    1 Oct  1 17:04 nodeState.monFactory
chile:/aha/cluster #
Figure 1-10 Directory listing of /aha/cluster
The AHAFS files used in RSCT
The following AHAFS event files are used in RSCT:
Node state, such as NODE_UP or NODE_DOWN
/aha/cluster/nodeState.monFactory/nodeStateEvent.mon
Node configuration, such as node added or deleted
/aha/cluster/nodeList.monFactory/nodeListEvent.mon
Adapter state, such as ADAPTER_UP or ADAPTER_DOWN and interfaces added or deleted
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
Adapter configuration
/aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
Process exit (Group Services daemon), such as PROCESS_DOWN
/aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon
Example of a NODE_DOWN event
A NODE_DOWN event is written to the nodeStateEvent.mon file in the nodeState.monFactory
directory. A NODE_DOWN event from the nodeStateEvent.mon file is interpreted as “a given node
has failed.” In this situation, the High Availability Topology Services (HATS) API generates an
Hb_Death event on the node group.
Example of a network ADAPTER_DOWN event
If a network adapter failure occurs, an ADAPTER_DOWN event is generated in the
networkAdapterStateEvent.mon file. This event is interpreted as “a given network interface
has failed.” In this situation, the HATS API generates an Hb_Death event on the adapter group.
Example of Group Services daemon failure
When you get a PROCESS_DOWN event because of a failure in Group Services, the event is
generated in the hagsd.mon file. This event is treated as a NODE_DOWN event, which is similar to
pre-CAA behavior. No PROCESS_UP event exists because, when the new Group Services
daemon is started, it broadcasts a message to peer daemons.
Filtering duplicated or invalid events
AHAFS handles duplicate or invalid events. For example, if a NODE_DOWN event is generated for
a node that is already marked as down, the event is ignored. The same applies for “up” events
and adapter events. Node events for local nodes are also ignored.
1.3 Cluster communication
Cluster Aware AIX indicates which nodes are in the cluster and provides information about
these nodes including their state. A special “gossip” protocol is used over the multicast
address to determine node information and implement scalable reliable multicast. No
traditional heartbeat mechanism is employed. Gossip packets travel over all interfaces. The
communication interfaces can be traditional networking interfaces (such as an Ethernet) and
storage fabrics (SANs with Fibre Channel, SAS, and so on). The cluster repository disk can
also be used as a communication device.
Gossip protocol: The gossip protocol determines the node configuration and then
transmits the gossip packets over all available networking and storage communication
interfaces. If no storage communication interfaces are configured, only the traditional
networking interfaces are used. For more information, see “Cluster Aware concepts” at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/claware_concepts.htm
1.3.1 Communication interfaces
The highly available cluster has several communication mechanisms. This section explains
the following interface concepts:
IP network interfaces
SAN-based communication (SFWCOM) interface
Central cluster repository-based communication (DPCOM) interface
Output of the lscluster -i command
The RESTRICTED and AIX_CONTROLLED interface state
Point of contact
IP network interfaces
IBM PowerHA communicates over available IP interfaces using a multicast address. PowerHA
uses all IP interfaces that are configured with an address and are in an UP state as long as
they are reachable across the cluster.
PowerHA SystemMirror management interfaces: PowerHA SystemMirror and Cluster
Aware for AIX use all network interfaces that are available for cluster communication. All of
these interfaces are discovered by default and are used for health management and other
cluster communication. You can use the PowerHA SystemMirror management interfaces to
remove any interface that you do not want to be used for application availability. For
additional information, see the “Cluster communication” topic in the AIX 7.1 Information
Center at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.clusteraware/claware_comm_benifits.htm
Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or you can have one generated automatically when you
synchronize the initial cluster configuration.
Cluster topology configuration on sydney node: The following PowerHA cluster
topology is configured by using smitty sysmirror on the sydney node:
NODE perth:
Network ether01
perthb2 192.168.201.136
perth 192.168.101.136
NODE sydney:
Network ether01
sydneyb2 192.168.201.135
sydney 192.168.101.135
A default multicast address of 228.168.101.135 is generated for the cluster. PowerHA
takes the IP address of the node and changes its most significant part to 228 as shown in
the following example:
x.y.z.t -> 228.y.z.t
An overlap of the multicast addresses might be generated by default in the case of two
clusters with interfaces in the same virtual LAN (VLAN). This occurs when their IP
addresses are similar to the following example:
x1.y.z.t
x2.y.z.t
The netmon.cf configuration file is not required with CAA and PowerHA 7.1.
The range 224.0.0.0–224.0.0.255 is reserved for local purposes, such as administrative and
maintenance tasks. The data that these groups receive is never forwarded by multicast routers.
Similarly, the range 239.0.0.0–239.255.255.255 is reserved for administrative scoping.
These special multicast groups are regularly published in the assigned numbers RFC
(http://tools.ietf.org/html/rfc3171).
If multicast traffic is present in the adjacent network, you must ask the network administrator
for multicast IP address allocation for your cluster. Also, ensure that the multicast traffic
generated by any of the cluster nodes is properly forwarded by the network infrastructure
toward the other cluster nodes. The Internet Group Management Protocol (IGMP) must be
enabled.
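One way to verify that multicast traffic flows between the cluster nodes is the AIX mping test utility. The flags shown here are an assumption based on common usage and can vary by AIX level:

# On the receiving node (sydney in this example), listen on the cluster
# multicast address
mping -v -r -a 228.168.101.135

# On the sending node (perth), send test packets to the same address
mping -v -s -a 228.168.101.135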
Interface states
Network interfaces can have any of the following common states. You can see the interface
state in the output of the lscluster -i command, as shown in Example 1-2 on page 16.
UP
The interface is up and active.
STALE
The interface configuration data is stale, which happens when
communication has been lost, but was previously up at some point.
DOWN SOURCE HARDWARE RECEIVE / SOURCE HARDWARE TRANSMIT
The interface is down because of a failure to receive or transmit, which
can happen in the event of a cabling problem.
DOWN SOURCE SOFTWARE
The interface is down in AIX software only.
SAN-based communication (SFWCOM) interface
Redundant high-speed communication channels can be established between the hosts
through the SAN fabric. To use this communication path, you must complete additional setup
for the Fibre Channel (FC) adapters. Configure the server FC ports in the same zone of the
SAN fabric, and set their Target Mode Enable (tme) attribute to yes. Then enable the dynamic
tracking and fast fail. The SAS adapters do not require special setup. Based on this setup, the
CAA Storage Framework provides a SAN-based heartbeat. This heartbeat is an effective
replacement for all the non-IP heartbeat mechanisms used in earlier releases.
Enabling SAN fiber communication: To enable SAN fiber communication for cluster
communication, you must configure the Target Mode Enable attribute for FC adapters. See
Example 4-4 on page 57 for details.
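As a hedged sketch of that setup (the device names fcs0 and fscsi0 are examples; your adapter and protocol device names can differ), the attributes are typically changed with chdev and take effect after the devices are reconfigured, for example at the next reboot:

# Enable target mode on the FC adapter (deferred change with -P)
chdev -l fcs0 -a tme=yes -P

# Enable dynamic tracking and fast fail on the FC protocol device
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P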
Configure your cluster in an environment that supports SAN fabric-based communication.
This approach provides another channel of redundancy to help reduce the risk of getting a
partitioned (split) cluster.
The Virtual SCSI (VSCSI) SAN heartbeat depends on VIOS 2.2.0.11-FP24 SP01.
Interface state
The SAN-based communication (SFWCOM) interface has one state available, the UP state.
The UP state indicates that the SFWCOM interface is active. You can see the interface state
in the output of the lscluster -i command as shown in Example 1-2 on page 16.
Unavailable SAN fiber communication: When SAN fiber communication is unavailable,
the SFWCOM section is not listed in the output of the lscluster -i command. A DOWN
state is not shown.
Central cluster repository-based communication (DPCOM) interface
Heartbeating and other cluster messaging are also achieved through the central repository
disk. The repository disk is used as another redundant path of communication between the
nodes. A portion of the repository disk is reserved for node-to-node heartbeat and message
communication. This form of communication is used when all other forms of communication
have failed. The CAA Storage Framework provides a heartbeat through the repository disk,
which is only used when IP or SAN heartbeating no longer works.
When the underlying hardware infrastructure is available, you can proceed with the PowerHA
cluster topology configuration. The heartbeat starts right after the first successful “Verify and
Synchronization” operation, when the CAA cluster is created and activated by PowerHA.
Interface states
The Central cluster repository-based communication (DPCOM) interface has the following
available states. You can see the interface state in the output of the lscluster -i command,
which is shown in Example 1-2.
UP AIX_CONTROLLED
Indicates that the interface is UP, but under AIX control. The user
cannot change the status of this interface.
UP RESTRICTED AIX_CONTROLLED
Indicates that the interface is UP and under AIX system control, but is
RESTRICTED from monitoring mode.
STALE
The interface configuration data is stale. This state occurs when
communication is lost, but was up previously at some point.
Output of the lscluster -i command
Example 1-2 shows the output from the lscluster -i command. The output shows the
interfaces and the interface states as explained in the previous sections.
Example 1-2 The lscluster -i output for one node
lscluster -i
Network/Storage Interface Query
Cluster Name: au_cl
Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask 255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask 255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask 255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask 255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
The RESTRICTED and AIX_CONTROLLED interface state
When the network and storage interfaces in the cluster are active and available, the cluster
repository disk appears as restricted and controlled by AIX. (The restricted term identifies the
disk as “not currently used.”) In the output from the lscluster commands, the term dpcom is
used for the cluster repository disk as a communication device and is initially noted as UP
RESTRICTED AIX_CONTROLLED.
When the system determines that the node has lost the normal network or storage interfaces,
the system activates (unrestricts) the cluster repository disk interface (dpcom) and begins
using it for communications. At this point, the interface state changes to UP AIX_CONTROLLED
(unrestricted, but still system controlled).
Point of contact
The output of the lscluster -m command shows a reference to a point of contact as shown in
Example 1-3 on page 19. The local node is displayed as N/A, and the remote node is
displayed as en0 UP. CAA monitors the state and points of contact between the nodes for both
communication interfaces.
A point of contact indicates that a node has received a packet from the other node over the
interface. The point-of-contact status UP indicates that the packet flow is continuing. The
point-of-contact monitor tracks the number of UP points of contact for each communication
interface on the node. If this count reaches zero, the node is marked as reachable through
the cluster repository disk only.
1.3.2 Communication node status
The node communication status is indicated by the State of Node value in the lscluster -m
command output (Example 1-3 on page 19). The cluster node can have the following
communication states:
UP
Indicates that the node is up.
UP NODE_LOCAL
Indicates that the node is up and is the local node in the cluster.
UP NODE_LOCAL REACHABLE THROUGH REPOS DISK ONLY
Indicates that the local node is up, but that it is reachable through the
repository disk only.
When a node can only communicate by using the cluster repository
disk, the output from the lscluster command notes it as REACHABLE
THROUGH REPOS DISK ONLY.
When the normal network or storage interfaces become available
again, the system automatically detects the restoration of
communication interfaces, and again places dpcom in the restricted
state. See “The RESTRICTED and AIX_CONTROLLED interface
state” on page 18.
UP REACHABLE THROUGH REPOS DISK ONLY
Indicates that the node is up, but that it is reachable through the
repository disk only. This state is shown for a node other than the
local node.
DOWN
Indicates that the node is down. If the node does not have access to
the cluster repository disk, the node is marked as down.
Example 1-3 The lscluster -m output
lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: chile
Cluster shorthand id for node: 1
uuid for node: 7067c3fa-ca95-11df-869b-a2e310452004
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME     TYPE    SHID    UUID
newyork          local           5f2f5d38-cd78-11df-b986-a2e310452003
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
------------------------------
Node name: serbia
Cluster shorthand id for node: 2
uuid for node: 8a5e2768-ca95-11df-8775-a2e312537404
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME     TYPE    SHID    UUID
newyork          local           5f2f5d38-cd78-11df-b986-a2e310452003
Number of points_of_contact for node: 1
Point-of-contact interface & contact state
en0 UP
Interface up, point of contact down: This phrase means that an interface might be up but
a point-of-contact might be down. In this state, no packets are received from the other
node.
1.3.3 Considerations for the heartbeat configuration
In previous versions of PowerHA, you had to configure a non-IP heartbeat path, such as
disk-based heartbeating. PowerHA 7.1 no longer supports disk-based heartbeat monitoring.
CAA uses all available interfaces to perform heartbeat monitoring, including the repository
disk-based and SAN fiber heartbeat monitoring methods. Both types of heartbeat monitoring
are replacements for the previous non-IP heartbeat configuration. The cluster also performs
heartbeat monitoring, as it used to, across all available network interfaces.
Heartbeat monitoring is performed by sending and receiving gossip packets across the
network with the multicast protocol. CAA uses heartbeat monitoring to determine
communication problems that need to be reflected in the cluster information.
1.3.4 Deciding when a node is down: Round-trip time (rtt)
CAA monitors the interfaces of each node by using the multicast protocol and gossip packets.
Gossip packets are periodically sent from each node in the cluster for timing purposes. These
gossip packets are automatically replied to by the other nodes of the cluster. The packet
exchanges are used to calculate the round-trip time.
The round-trip time value is shown in the output of the lscluster -i and lscluster -m
commands. The smoothed rtt is the averaged round-trip time, and the mean deviation in
network rtt indicates how much it varies; both values are managed automatically by CAA.
Unlike previous versions of PowerHA and HACMP, no heartbeat tuning is necessary. See
Example 1-2 on page 16 and Figure 1-11 for more information.
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Figure 1-11 Extract from the lscluster -m command output showing the rtt values
Statistical projections are directly employed to compute node-down events. By using normal
network dropped packet rates and the projected round-trip times with mean deviations, the
cluster can determine when a packet was lost or not sent. Each node monitors the time when
a response is due from other nodes in the cluster. If a node finds that a node is overdue, a
node down protocol is initiated in the cluster to determine if the node is down or if network
isolation has occurred.
This algorithm is self-adjusting to load and network conditions, providing a highly reliable and
scalable cluster. Expected round-trip times and variances rise quickly when load conditions
cause delays. Such delays cause the system to wait longer before setting a node down state.
Such a state provides for a high probability of valid state information. (Quantitative
probabilities of errors can be computed.) Conversely, expected round-trip times and variances
fall quickly when delays return to normal.
The cluster automatically adjusts to variances in latency and bandwidth characteristics of
various network and storage interfaces.
1.4 PowerHA 7.1 SystemMirror plug-in for IBM Systems
Director
PowerHA SystemMirror provides a plug-in to IBM Systems Director, giving you a graphical
user interface to manage a cluster. This topic includes the following sections:
Introduction to IBM Systems Director
Advantages of using IBM Systems Director
Basic architecture
1.4.1 Introduction to IBM Systems Director
IBM Systems Director provides systems management personnel with a
single-point-of-control, helping to reduce IT management complexity and cost. With IBM
Systems Director, IT personnel can perform the following tasks:
Optimize computing and network resources
Quickly respond to business requirements with greater delivery flexibility
Attain higher levels of services management with streamlined management of physical,
virtual, storage, and network resources
A key feature of IBM Systems Director is a consistent user interface with a focus on driving
common management tasks. IBM Systems Director provides a unified view of the total IT
environment, including servers, storage, and network. With this view, users can perform tasks
with a single tool, IBM Systems Director.
1.4.2 Advantages of using IBM Systems Director
IBM Systems Director offers the following advantages:
A single, centralized view into all PowerHA SystemMirror clusters
– Centralized and secure access point
Everyone logs in to the same machine, simplifying security and providing an audit trail
of user activities.
– Single sign-on (SSO) capability
After the initial setup is done by using standard Director mechanisms, the password of
each individual node being managed no longer needs to be provided. Customers log in to
the Director server by using their account on that one machine only and have access to
all PowerHA clusters under management by that server.
Two highly accessible interfaces
– Graphical interface
The GUI helps to explain and show relationships. It also guides customers through the
learning phase, improving their chances of success with Systems Director.
• Instant and nearly instant help for just about everything
• Maximum, interactive assistance with many tasks
• Maximum error checking
• SystemMirror enterprise health summary
– Textual interface
As with all IBM Systems Director plug-ins, the textual interface (also known as the CLI)
is available through the smcli utility of IBM Systems Director. The namespace (which is
not needed) is sysmirror, for example smcli sysmirror help.
• Maximum speed
• Centralized, cross-cluster scripting
A common, IBM unified interface (learn once, manage many)
More IBM products are now plugging into Systems Director. Although each individual
plug-in is different, the common framework around each one remains the same, reducing
the education burden on customers. Another benefit is the synergy gained by having
multiple products share a common data store on the IBM Systems Director server.
To learn more about the advantages of IBM Systems Director, see the PowerHA 7.1
presentation by Peter Schenke at:
http://www-05.ibm.com/ch/events/systems/pdf/6_PowerHA_7_1_News.pdf
1.4.3 Basic architecture
Figure 1-12 shows the basic architecture of IBM Systems Director for PowerHA. IBM Systems
Director is used to quickly and easily scan subnets to find and load AIX systems. When these
systems are unlocked (when the login ID and password are provided), and if PowerHA is
installed on any of these systems, they are automatically discovered and loaded by the
plug-ins.
Figure 1-12 Basic architecture of IBM Systems Director for PowerHA (a three-tier architecture provides scalability: the user interface, which is web-based with a command-line interface; the Director server, which is the central point of control with the agent manager and the discovery of clusters and resources, supported on AIX, Linux, and Windows; and the Director agent with the PowerHA plug-in on each AIX node, automatically installed on AIX 7.1 and AIX 6.1 TL06 and communicating securely with the server)
Chapter 2. Features of PowerHA SystemMirror 7.1
This chapter explains which previously supported features of PowerHA SystemMirror have
been removed. It also provides information about the new features in PowerHA SystemMirror
Standard Edition 7.1 for AIX.
This chapter includes the following topics:
Deprecated features
New features
Changes to the SMIT panel
The rootvg system event
Resource management enhancements
CLUSTER_OVERRIDE environment variable
CAA disk fencing
PowerHA SystemMirror event flow differences
2.1 Deprecated features
PowerHA SystemMirror 7.1 has removed support for the following previously available
features:
IP address takeover (IPAT) via IP replacement
Locally administered address (LAA) for hardware MAC address takeover (HWAT)
Heartbeat over IP aliases
clcomdES with the /usr/es/sbin/cluster/etc/rhosts file is replaced by the Cluster
Aware AIX (CAA) clcomd with the /etc/cluster/rhosts file
The following IP network types:
– ATM
– FDDI
– Token Ring
The following point-to-point (non-IP) network types:
– RS232
– TMSCSI
– TMSSA
– Disk heartbeat (diskhb)
– Multi-node disk heartbeat (mndhb)
Two-node configuration assistant
WebSMIT (replaced with the IBM Systems Director plug-in)
Site support in this version
– Cross-site Logical Volume Manager (LVM) mirroring (available in PowerHA 7.1 SP3)
IPV6 support in this version
IP address takeover via IP aliasing is now the only supported IPAT option. The repository disk
heartbeat that CAA provides and the SAN (FC) heartbeat, as described in the following section,
have replaced all point-to-point (non-IP) network types.
2.2 New features
The new version of PowerHA uses much simpler heartbeat management. This method uses
multicasting, which reduces the burden on the customer to define aliases for heartbeat
monitoring. By default, it supports dual communication paths for most data center
deployments by using both the IP network and the SAN connections (available in 7.1 SP3 and
later). These communication paths are provided through CAA and the central repository disk.
PowerHA SystemMirror 7.1 introduces the following features:
SMIT panel enhancements
The rootvg system event
Systems Director plug-in
Resource management enhancements
– StartAfter
– StopAfter
User-defined resource type
Dynamic node priority: Adaptive failover
Additional disk fencing by CAA
New Smart Assists for the following products:
– SAP NetWeaver 7.0 (2004s) SR3
– IBM FileNet® 4.5.1
– IBM Tivoli Storage Manager 6.1
– IBM Lotus® Domino® Server
– SAP MaxDB v7.6 and 7.7
The clmgr tool
The clmgr tool is the new command-line user interface (CLI) with which an administrator
can use a uniform interface to deploy and maintain clusters. For more information, see 5.2,
“Cluster configuration using the clmgr tool” on page 104.
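As a brief, hedged illustration (see 5.2 for the supported syntax), a query and start sequence with clmgr looks like the following sketch:

# Display the cluster definition
clmgr query cluster

# Bring the cluster services online on all nodes
clmgr online cluster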
2.3 Changes to the SMIT panel
PowerHA SystemMirror 7.1 includes several changes to the SMIT panel since the release of
PowerHA 6.1. This topic focuses on the most used items on the panel and not the technical
changes behind these items. These changes can help experienced system administrators to
quickly find the paths to the functions they need to implement in their new clusters.
In PowerHA SystemMirror 7.1, the SMIT panel has the following key changes:
Separation of menus by function
Addition of the Custom Cluster Configuration menu
Removal of Extended Distance menus from the base product
Removal of unsupported dialogs or menus
Changes to some terminology
New dialog for specifying repository and cluster IP address
Many changes in topology and resource menus
2.3.1 SMIT tree
The SMIT tree offers several changes that make it easier for system administrators to find the
task they want to perform. For an overview of these changes, see Appendix B, “PowerHA
SMIT tree” on page 483. To access a list of the SMIT tree and available fast paths, use the
smitty path: smitty hacmp → Can't find what you are looking for ?.
2.3.2 The smitty hacmp command
Figure 2-1 shows the SMIT screens that you see when you use the smitty hacmp command
or the path: smitty → Communications Applications and Services → PowerHA
SystemMirror. It compares PowerHA 5.5, PowerHA 6.1, and PowerHA SystemMirror 7.1.
Figure 2-1 The screens shown after running the smitty hacmp command
In PowerHA SystemMirror 7.1, the smitty sysmirror (or smit sysmirror) command provides
a new fast path to the PowerHA start menu in SMIT. The old fast path (smitty hacmp) is still
valid.
Figure 2-2 shows, in more detail, where some of the main functions moved to. Minor changes
have been made to the following paths, which are not covered in this Redbooks publication:
System Management (C-SPOC)
Problem Determination Tools
Can’t find what you are looking for ?
Not sure where to start ?
The “Initialization and Standard Configuration” path has been split into two paths: Cluster
Nodes and Networks and Cluster Applications and Resources. For more details about these
paths, see 2.3.4, “Cluster Standard Configuration menu” on page 29. Some features for the
Extended Configuration menu have moved to the Custom Cluster Configuration menu. For
more details about custom configuration, see 2.3.5, “Custom Cluster Configuration menu” on
page 30.
Figure 2-2 PowerHA SMIT start panel (smitty sysmirror)
2.3.3 The smitty clstart and smitty clstop commands
The SMIT screens to start and stop a cluster did not change, and the fast path is still the
same. Figure 2-3 shows the Start Cluster Services panels for PowerHA versions 5.5, 6.1, and
7.1.
Although the SMIT path did not change, some of the wording has changed. For example, the
word “HACMP” was replaced with “Cluster Services.” The path with the new wording is smitty
hacmp → System Management (C-SPOC) → PowerHA SystemMirror Services, and then
you select either the “Start Cluster Services” or “Stop Cluster Services” menu.
Figure 2-3 The screens that are shown when running the smitty clstart command
2.3.4 Cluster Standard Configuration menu
In previous versions, the “Cluster Standard Configuration” menu was called the “Initialization
and Standard Configuration” menu. This menu is now split into the following menu options as
indicated in Figure 2-2 on page 27:
Cluster Nodes and Networks
Cluster Applications and Resources
This version has a more logical flow. The topology configuration and management part is in
the “Cluster Nodes and Networks” menu. The resources configuration and management part
is in the “Cluster Applications and Resources” menu.
Figure 2-4 shows some tasks and where they have moved to. The dotted line shows where
Smart Assist was relocated. The Two-Node Cluster Configuration Assistant no longer exists.
Figure 2-4 Cluster standard configuration
2.3.5 Custom Cluster Configuration menu
The “Custom Cluster Configuration” menu is similar to the “Extended Configuration” menu in
the previous release. Unlike the “Extended Configuration” menu, which contains entries that
were duplicated from the standard menu path, the “Custom Cluster Configuration” menu in
PowerHA SystemMirror 7.1 does not contain these duplicate entries. Figure 2-5 shows an
overview of where some of the functions have moved to. The Custom Cluster Configuration
menu is shown in the upper-right corner, and the main PowerHA SMIT menu is shown in the
lower-right corner.
Figure 2-5 Custom Cluster Configuration menu
2.3.6 Cluster Snapshot menu
The content of the Cluster Snapshot menu did not change compared to PowerHA 6.1
(Figure 2-6). However, the path to this menu has changed to smitty sysmirror → Cluster
Nodes and Networks → Manage the Cluster → Snapshot Configuration.
Snapshot Configuration
Move cursor to desired item and press Enter.
Create a Snapshot of the Cluster Configuration
Change/Show a Snapshot of the Cluster Configuration
Remove a Snapshot of the Cluster Configuration
Restore the Cluster Configuration From a Snapshot
Configure a Custom Snapshot Method
Figure 2-6 Snapshot Configuration menu
2.3.7 Configure Persistent Node IP Label/Address menu
The content of the SMIT panel to add or change a persistent IP address did not change
compared to PowerHA 6.1 (Figure 2-7). However, the path to it changed to smitty hacmp →
Cluster Nodes and Networks → Manage Nodes → Configure Persistent Node IP
Label/Addresses.
Configure Persistent Node IP Label/Addresses
Move cursor to desired item and press Enter.
Add a Persistent Node IP Label/Address
Change/Show a Persistent Node IP Label/Address
Remove a Persistent Node IP Label/Address
Figure 2-7 Configure Persistent Node IP Label/Addresses menu
2.4 The rootvg system event
PowerHA SystemMirror 7.1 introduces system events. These events are handled by a new
subsystem called clevmgrdES. The rootvg system event allows for the monitoring of loss of
access to the rootvg volume group. By default, in the case of loss of access, the event logs an
entry in the system error log and reboots the system. If required, you can change this option
in the SMIT menu to log only an event entry and not to reboot the system. For further details
about this event and a test example, see 9.4.1, “The rootvg system event” on page 286.
2.5 Resource management enhancements
PowerHA SystemMirror 7.1 offers the following new resource and resource group
configuration choices. They provide more flexibility in administering resource groups across
the various nodes in the cluster.
Start After and Stop After resource group dependencies
User-defined resource type
Adaptive failover
2.5.1 Start After and Stop After resource group dependencies
The previous version of PowerHA has the following types of resource group dependency
runtime policies:
Parent-child
Online on the Same Node
Online on Different Nodes
Online On Same Site Location
These policies are insufficient for supporting some complex applications. For example, the
FileNet application server must be started only after its associated database is started. It
does not need to be stopped if the database is brought down for some time and then started.
The following dependencies have been added to PowerHA:
Start After dependency
Stop After dependency
The Start After and Stop After dependencies use source and target resource group
terminology. The source resource group depends on the target resource group as shown in
Figure 2-8.
Figure 2-8 Start After resource group dependency (the source resource group app_rg starts after the target resource group db_rg)
For Start After dependency, the target resource group must be online on any node in the
cluster before a source (dependent) resource group can be activated on a node. Resource
groups can be released in parallel and without any dependency.
Similarly, for Stop After dependency, the target resource group must be offline on any node in
the cluster before a source (dependent) resource group can be brought offline on a node.
Resource groups are acquired in parallel and without any dependency.
A resource group can serve as both a target and a source resource group, depending on
which end of a given dependency link it is placed. You can specify three levels of
dependencies for resource groups. You cannot specify circular dependencies between
resource groups.
A Start After dependency applies only at the time of resource group acquisition. During a
resource group release, these resource groups do not have any dependencies. A Start After
source resource group cannot be acquired on a node until its target resource group is fully
functional. If the target resource group does not become fully functional, the source resource
group goes into an OFFLINE DUE TO TARGET OFFLINE state. If you notice that a resource group
is in this state, you might need to troubleshoot which resources need to be brought online
manually to resolve the resource group dependency.
When a resource group in a Start After target role falls over from one node to another, the
resource groups that depend on it are unaffected.
After the Start After source resource group is online, any operation (such as bring offline or
move resource group) on the target resource group does not affect the source resource
group. A manual resource group move or bring resource group online on the source resource
group is not allowed if the target resource group is offline.
A Stop After dependency applies only at the time of a resource group release. During
resource group acquisition, these resource groups have no dependency between them. A
Stop After source resource group cannot be released on a node until its target resource group
is offline.
When a resource group in a Stop After source role falls over from one node to another, its
related target resource group is released as a first step. Then the source (dependent)
resource group is released. Next, both resource groups are acquired in parallel, assuming
that no Start After or parent-child dependency exists between these resource groups.
A manual resource group move or bring resource group offline on the Stop After source
resource group is not allowed if the target resource group is online.
Summary: In summary, the source Start After and Stop After target resource groups have
the following dependencies:
Source Start After target: The source is brought online after the target resource group.
Source Stop After target: The source is brought offline after the target resource group.
A parent-child dependency can be seen as being composed of two parts with the newly
introduced Start After and Stop After dependencies. Figure 2-9 shows this logical
equivalence.
Figure 2-9 Comparing Start After, Stop After, and parent-child resource group (rg) dependencies
If you configure a Start After dependency between two resource groups in your cluster, the
applications in these resource groups are started in the configured sequence. To ensure that
this process goes smoothly, configure application monitors and use a Startup Monitoring
mode for the application included in the target resource group.
For a configuration example, see 5.1.6, “Configuring Start After and Stop After resource
group dependencies” on page 96.
2.5.2 User-defined resource type
With PowerHA, you can add your own resource types and specify management scripts to
configure how and where PowerHA processes the resource type. You can then configure a
user-defined resource instance for use in a resource group.
A user-defined resource type is one that you can define for a customized resource that you
can add to a resource group. A user-defined resource type contains several attributes that
describe the properties of the instances of the resource type.
When you create a user-defined resource type, you must choose its processing order among
the existing resource types. PowerHA SystemMirror processes the user-defined resources at
the beginning of the resource acquisition order if you choose the FIRST value. If you choose
any other value, for example, VOLUME_GROUP, the user-defined resources are acquired after the
volume groups are varied on, and they are released before the volume groups are varied off.
You can choose from a pick list of the existing resource types in the SMIT menu.
Figure 2-10 shows the existing resource types and the acquisition or release order. A
user-defined resource can be any of the following types:
FIRST
WPAR
SERVICEIP
TAPE (DISKS)
VOLUME_GROUP
FILE_SYSTEM
APPLICATION
Figure 2-10 Processing order of the resource type (the acquisition order runs through DISK, VOLUME_GROUP, FILE SYSTEM, SERVICE IP, and APPLICATION, with the user-defined resource placed at the chosen position; the release order is the reverse)
2.5.3 Dynamic node priority: Adaptive failover
The framework for dynamic node priority is already present in the previous versions of
PowerHA. This framework determines the takeover node at the time of a failure according to
one of the following policies:
cl_highest_free_mem
cl_highest_idle_cpu
cl_lowest_disk_busy
The cluster manager queries the Resource Monitoring and Control (RMC) subsystem every
3 minutes to obtain the current value of these attributes on each node. Then the cluster
manager distributes them cluster-wide. For an architecture overview of PowerHA and RSCT,
see 1.1.3, “PowerHA and RSCT” on page 5.
The dynamic node priority feature is enhanced in PowerHA SystemMirror 7.1 to support the
following policies:
cl_lowest_nonzero_udscript_rc
cl_highest_udscript_rc
The return code of a user-defined script is used in determining the destination node.
When you select one of these criteria, you must also provide values for the DNP script path
and DNP timeout attributes for a resource group. PowerHA executes the supplied script and
collects the return codes from all nodes. If you choose the cl_highest_udscript_rc policy,
the collected values are sorted, and the node that returned the highest value is selected as
the candidate takeover node. Similarly, if you choose the cl_lowest_nonzero_udscript_rc
policy, the collected values are sorted, and the node that returned the lowest nonzero positive
value is selected as the candidate takeover node. If the return value of the script from all
nodes is the same or zero, the default node priority is considered. PowerHA verifies the script
existence and the execution permissions during verification.
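The script itself is user supplied. The following hypothetical ksh sketch returns the number of running instances of an application process, so that with the cl_lowest_nonzero_udscript_rc policy a less loaded node is preferred (the process name appserver is an assumption for illustration only):

#!/bin/ksh
# Hypothetical DNP script: report how many "appserver" processes run locally.
# With cl_lowest_nonzero_udscript_rc, the node that returns the lowest nonzero
# value becomes the candidate takeover node; a zero return falls back to the
# default node priority.
count=$(ps -eo comm | grep -c "^appserver$")
exit $count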
Time-out value: When you select a time-out value, ensure that it is within the time period
for running and completing a script. If you do not specify a time-out value, a default value
equal to the config_too_long time is specified.
For information about configuring the dynamic node priority, see 5.1.8, “Configuring the
dynamic node priority (adaptive failover)” on page 102.
2.6 CLUSTER_OVERRIDE environment variable
In PowerHA SystemMirror 7.1, the use of several AIX commands on cluster resources can
potentially impair the integrity of the cluster configuration. PowerHA SystemMirror 7.1
provides C-SPOC versions of these functions, which are safer to use in the cluster
environment. You can prevent the use of these commands outside of C-SPOC with the
CLUSTER_OVERRIDE environment variable. By default, it is set to allow the use of these
commands outside of C-SPOC.
To restrict people from using these commands in the command line, you can change the
default value from yes to no:
1. Locate the following line in the /etc/environment file:
CLUSTER_OVERRIDE=yes
2. Change the line to the following line:
CLUSTER_OVERRIDE=no
The following commands are affected by this variable:
chfs
crfs
chgroup
chlv
chpasswd
chuser
chvg
extendlv
extendvg
importvg
mirrorvg
mkgroup
mklv
mklvcopy
mkuser
mkvg
reducevg
If the CLUSTER_OVERRIDE variable has the value no, you see an error message similar to the
one shown in Example 2-1.
Example 2-1 Error message when using CLUSTER_OVERRIDE=no
# chfs -a size=+1 /home
The command must be issued using C-SPOC or the override environment variable must
be set.
In this case, use the equivalent C-SPOC CLI called cli_chfs. See the C-SPOC man page for
more details.
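As a hedged sketch (the path shown is the typical C-SPOC location and can differ in your installation), the C-SPOC equivalent of the failing command is:

# C-SPOC command-line equivalent of chfs
/usr/es/sbin/cluster/cspoc/cli_chfs -a size=+1 /home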
Deleting the CLUSTER_OVERRIDE variable: You also see the message shown in
Example 2-1 if you delete the CLUSTER_OVERRIDE variable in your /etc/environment file.
2.7 CAA disk fencing
CAA introduces another level of disk fencing beyond what PowerHA and gsclvmd provide by
using enhanced concurrent volume groups (ECVGs). In previous releases of PowerHA when
using ECVGs in a fast disk takeover mode, the volume groups are in full read/write (active)
mode on the node owning the resource group. Any standby candidate node has the volume
group varied on in read only (passive) mode.
The passive state allows only read access to a volume group special file and the first 4 KB of
a logical volume. Write access through standard LVM is not allowed. However, low-level
commands, such as dd, can bypass LVM and write directly to the disk.
The new CAA disk fencing feature prevents writes to the disk device from any other node,
eliminating the possibility that a lower-level operation, such as dd, succeeds. However, any
system that has access to that disk might be a member of the CAA cluster. Therefore, it is
still important to zone the storage appropriately so that only cluster nodes have the disks
configured.
The PowerHA SystemMirror 7.1 announcement letter explains this fencing feature as a
storage framework that is embedded in the operating system to aid in storage device
management. As part of the framework, fencing of disks or disk groups is supported. Fencing
shuts off write access to the shared disks from any entity on the node, irrespective of the
privileges associated with the entity that tries to access the disk. PowerHA SystemMirror
exploits fencing to implement strict controls so that shared disks are accessed from only one
of the nodes that share them. Fencing ensures that, when the workload moves to another node
to continue operations, write access to the disks on the departing node is turned off.
2.8 PowerHA SystemMirror event flow differences
This section describes the event flow when the PowerHA SystemMirror cluster starts, when
another node joins, and when a node leaves the cluster.
2.8.1 Startup processing
In this example, a resource group must be started on a node. The application server is not
started until the necessary resources are acquired. Figure 2-11 illustrates the steps that
acquire the resource groups when the first node starts cluster services.
The figure shows the clstrmgrES process calling the Event Manager, which in turn calls the
recovery commands for the following sequence:

1) rg_move_acquire
   process_resources (NONE)
   for each RG:
      process_resources (ACQUIRE)
      process_resources (SERVICE_LABELS)
         acquire_service_addr
         acquire_aconn_service en0 net_ether_01
      process_resources (DISKS)
      process_resources (VGS)
      process_resources (LOGREDO)
      process_resources (FILESYSTEMS)
      process_resources (SYNC_VGS)
      process_resources (TELINIT)
   process_resources (NONE)
   < Event Summary >

2) rg_move_complete
   for each RG:
      process_resources (APPLICATIONS)
         start_server app01
      process_resources (ONLINE)
   process_resources (NONE)
   < Event Summary >

Figure 2-11 First node starting the cluster services
TE_RG_MOVE_ACQUIRE is the SystemMirror event listed in the debug file. The
/usr/es/sbin/cluster/events/rg_online.rp recovery program is listed in the HACMP rules
Object Data Manager (ODM) file (Example 2-2).
Example 2-2 The rg_online.rp file
all "rg_move_fence" 0 NULL
barrier
#
all "rg_move_acquire" 0 NULL
#
barrier
#
all "rg_move_complete" 0 NULL
The following section explains what happens when a subsequent node joins the cluster.
2.8.2 Another node joins the cluster
When another node starts, it must first join the cluster. If a resource group needs to fall back,
then rg_move_release is done. If the resource group fallback is not needed, the
rg_move_release is skipped. The numbers indicate the order of the steps. The same number
means that parallel processing is taking place. Example 2-3 shows the messages on the
process flow.
Example 2-3 Debug file showing the process of another node joining the cluster
Debug file:
[TE_JOIN_NODE_DEP]
[TE_RG_MOVE_ACQUIRE]
[TE_JOIN_NODE_DEP_COMPLETE]

cluster.log file node1:
Nov 23 00:35:06 AIX: EVENT START: node_up node2
Nov 23 00:35:06 AIX: EVENT COMPLETED: node_up node2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_fence node1 2
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 00:35:11 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 00:35:15 AIX: EVENT START: rg_move_complete node1 2
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 00:35:18 AIX: EVENT START: node_up_complete node2
Nov 23 00:35:18 AIX: EVENT COMPLETED: node_up_complete node2 0

cluster.log file node2:
Nov 23 00:35:06 AIX: EVENT START: node_up node2
Nov 23 00:35:08 AIX: EVENT COMPLETED: node_up node2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_fence node1 2
Nov 23 00:35:11 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 00:35:11 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 00:35:11 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 00:35:11 AIX: EVENT START: acquire_service_addr
Nov 23 00:35:13 AIX: EVENT START: acquire_aconn_service en2 appsvc_
Nov 23 00:35:13 AIX: EVENT COMPLETED: acquire_aconn_service en2 app
Nov 23 00:35:13 AIX: EVENT COMPLETED: acquire_service_addr 0
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 00:35:15 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 00:35:15 AIX: EVENT START: rg_move_complete node1 2
Nov 23 00:35:15 AIX: EVENT START: start_server appBctrl
Nov 23 00:35:16 AIX: EVENT COMPLETED: start_server appBctrl 0
Nov 23 00:35:16 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 00:35:18 AIX: EVENT START: node_up_complete node2
Nov 23 00:35:18 AIX: EVENT COMPLETED: node_up_complete node2 0
Figure 2-12 shows the process flow when another node joins the cluster.
The figure shows the clstrmgrES and Event Manager processes on both nodes exchanging
messages through the following sequence:

On the node that is already running cluster services:
1) rg_move_release - run only if a resource group must fall back to a higher-priority node
   (see the node-leaving flow); otherwise nothing is done
2) rg_move_acquire - nothing
3) rg_move_complete - nothing

On the joining node (same sequence as the first node starting, shown in Figure 2-11):
1) rg_move_release - not run if no fallback is needed
2) rg_move_acquire
3) rg_move_complete
   for each RG:
      process_resources (APPLICATIONS)
         start_server app02
      process_resources (ONLINE)
   process_resources (NONE)
   < Event Summary >

Figure 2-12 Another node joining the cluster
The next section explains what happens when a node leaves the cluster voluntarily.
2.8.3 Node down processing normal with takeover
In this example, a resource group is on the departing node and must be moved to one of the
remaining nodes.
Node failure
The situation is slightly different if the departing node fails suddenly. Because a failed
node cannot run any events, the calls to process_resources listed under that node in
Figure 2-13 are not run.
The figure shows the node that stops cluster services and the takeover node, each with its
clstrmgrES and Event Manager exchanging messages:

On the node where cluster services are stopped:
1) rg_move_release
   for each RG:
      process_resources (RELEASE)
      process_resources (APPLICATIONS)
         stop_server app02
      process_resources (FILESYSTEMS)
      process_resources (VGS)
      process_resources (SERVICE_LABELS)
         release_service_addr
   < Event Summary >
2) rg_move_acquire - nothing
3) rg_move_complete - nothing

On the takeover node:
1) rg_move_release - nothing
2) rg_move_acquire - for each RG: acquire the service address and the disks
3) rg_move_complete - start the server

Figure 2-13 Node leaving the cluster (stopped)
Example 2-4 shows details about the process flow from the clstrmgr.debug file.
Example 2-4 clstrmgr.debug file
clstrmgr.debug file:
[TE_FAIL_NODE_DEP]
[TE_RG_MOVE_RELEASE]
[TE_RG_MOVE_ACQUIRE]
[TE_FAIL_NODE_DEP_COMPLETE]
cluster.log file node1
Nov 23 06:24:21 AIX: EVENT COMPLETED: rg_move node1 1 RELEASE 0
Nov 23 06:24:21 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
Nov 23 06:24:32 AIX: EVENT COMPLETED: rg_move_fence node1 1 0
Nov 23 06:24:34 AIX: EVENT START: rg_move_fence node1 2
Nov 23 06:24:34 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 06:24:35 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 06:24:35 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 06:24:35 AIX: EVENT START: acquire_service_addr
Nov 23 06:24:36 AIX: EVENT START: acquire_aconn_service en2 appsvc_
Nov 23 06:24:36 AIX: EVENT COMPLETED: acquire_aconn_service en2 app
Nov 23 06:24:36 AIX: EVENT COMPLETED: acquire_service_addr 0
Nov 23 06:24:36 AIX: EVENT START: acquire_takeover_addr
Nov 23 06:24:38 AIX: EVENT COMPLETED: acquire_takeover_addr 0
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 06:24:41 AIX: EVENT START: rg_move_complete node1 2
Nov 23 06:24:41 AIX: EVENT START: start_server appActrl
Nov 23 06:24:42 AIX: EVENT START: start_server appBctrl
Nov 23 06:24:42 AIX: EVENT COMPLETED: start_server appBctrl 0
Nov 23 06:24:49 AIX: EVENT COMPLETED: start_server appActrl 0
Nov 23 06:24:49 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 06:24:51 AIX: EVENT START: node_down_complete node2
Nov 23 06:24:51 AIX: EVENT COMPLETED: node_down_complete node2 0
cluster.log node2
Nov 23 06:24:21 AIX: EVENT START: rg_move_release node1 1
Nov 23 06:24:21 AIX: EVENT START: rg_move node1 1 RELEASE
Nov 23 06:24:21 AIX: EVENT START: stop_server appActrl
Nov 23 06:24:21 AIX: EVENT START: stop_server appBctrl
Nov 23 06:24:22 AIX: EVENT COMPLETED: stop_server appBctrl 0
Nov 23 06:24:24 AIX: EVENT COMPLETED: stop_server appActrl 0
Nov 23 06:24:27 AIX: EVENT START: release_service_addr
Nov 23 06:24:28 AIX: EVENT COMPLETED: release_service_addr 0
Nov 23 06:24:29 AIX: EVENT START: release_takeover_addr
Nov 23 06:24:30 AIX: EVENT COMPLETED: release_takeover_addr 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move node1 1 RELEASE 0
Nov 23 06:24:30 AIX: EVENT COMPLETED: rg_move_release node1 1 0
Nov 23 06:24:32 AIX: EVENT START: rg_move_fence node1 1
Nov 23 06:24:32 AIX: EVENT COMPLETED: rg_move_fence node1 1 0
Nov 23 06:24:34 AIX: EVENT START: rg_move_fence node1 2
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_fence node1 2 0
Nov 23 06:24:35 AIX: EVENT START: rg_move_acquire node1 2
Nov 23 06:24:35 AIX: EVENT START: rg_move node1 2 ACQUIRE
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move node1 2 ACQUIRE 0
Nov 23 06:24:35 AIX: EVENT COMPLETED: rg_move_acquire node1 2 0
Nov 23 06:24:41 AIX: EVENT START: rg_move_complete node1 2
Nov 23 06:24:41 AIX: EVENT COMPLETED: rg_move_complete node1 2 0
Nov 23 06:24:51 AIX: EVENT START: node_down_complete node2
Nov 23 06:24:52 AIX: EVENT COMPLETED: node_down_complete node2 0
Chapter 3. Planning a cluster implementation for high availability
This chapter provides guidance for planning a cluster implementation for high availability with
IBM PowerHA SystemMirror 7.1 for AIX. It explains the software, hardware, and storage
requirements with a focus on PowerHA 7.1.
For more details about planning, consider the following publications:
PowerHA for AIX Cookbook, SG24-7739
PowerHA SystemMirror Version 7.1 for AIX Planning Guide, SC23-6758-01
This chapter includes the following topics:
Software requirements
Hardware requirements
Considerations before using PowerHA 7.1
Migration planning
Storage
Network
3.1 Software requirements
Because PowerHA 7.1 for AIX uses Cluster Aware AIX (CAA) functionality, the following
minimum versions of AIX and Reliable Scalable Cluster Technology (RSCT) are required:
AIX 6.1 TL6 or AIX 7.1
RSCT 3.1
CAA cluster: PowerHA SystemMirror creates the CAA cluster automatically. You do not
manage the CAA configuration or state directly, but you can use the CAA cluster commands
to view its status.
Download and install the latest service packs for AIX and PowerHA from IBM Fix Central at:
http://www.ibm.com/support/fixcentral
3.1.1 Prerequisite for AIX BOS and RSCT components
The following Base Operating System (BOS) components for AIX are required for PowerHA:
bos.adt.lib
bos.adt.libm
bos.adt.syscalls
bos.ahafs
bos.clvm.enh
bos.cluster
bos.data
bos.net.tcp.client
bos.net.tcp.server
bos.rte.SRC
bos.rte.libc
bos.rte.libcfg
bos.rte.libcur
bos.rte.libpthreads
bos.rte.lvm
bos.rte.odm
cas.agent (required for the IBM Systems Director plug-in)
The following file sets on the AIX base media are required:
rsct.basic.rte
rsct.compat.basic.hacmp
rsct.compat.clients.hacmp
The appropriate versions of RSCT for the supported AIX releases are also supplied with the
PowerHA installation media.
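As a quick pre-installation check on each node, you can verify the AIX level and the state of
the key prerequisite file sets with standard AIX commands, for example:
# oslevel -s
# lslpp -L bos.cluster.rte bos.ahafs bos.clvm.enh \
    rsct.basic.rte rsct.compat.basic.hacmp rsct.compat.clients.hacmp
Any file set that lslpp reports as not installed must be added before you install PowerHA.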
3.2 Hardware requirements
The nodes of your cluster can be hosted on any hardware system on which installation of AIX
6.1 TL6 or AIX 7.1 is supported. They can be hosted as a full system partition or inside a
logical partition (LPAR).
The right design methodology can help eliminate network and disk single points of failure
(SPOF) by using redundant configurations. Have at least two network adapters connected to
different Ethernet switches in the same virtual LAN (VLAN). EtherChannel is supported with
PowerHA. Employ dual-fabric SAN connections to the storage subsystems using at least two
Fibre Channel (FC) adapters and appropriate multipath drivers. Use Redundant Array of
Independent Disks (RAID) technology to protect data from any disk failure.
This topic describes the hardware that is supported.
3.2.1 Supported hardware
Your hardware, including the firmware and the AIX multipath driver, must be in a supported
configuration. For more information about hardware, see Appendix C, “PowerHA supported
hardware” on page 491.
More information: For a list of the supported FC adapters, see “Setting up cluster storage
communication” in the AIX 7.1 Information Center at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.clusteraware/claware_comm_setup.htm
See the readme files that are provided with the base PowerHA file sets and the latest service
pack. See also the PowerHA SystemMirror 7.1 for AIX Standard Edition Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.doc/doc/base/powerha.htm
3.2.2 Requirements for the multicast IP address, SAN, and repository disk
Cluster communication requires the use of a multicast IP address. You can specify this
address when you create the cluster, or have one generated automatically. The
ranges 224.0.0.0–224.0.0.255 and 239.0.0.0–239.255.255.255 are reserved for
administrative and maintenance purposes. If multicast traffic is present in the adjacent
network, you must ask the network administrator for a multicast IP address allocation. Also,
ensure that the multicast traffic that is generated by each of the cluster nodes is properly
forwarded by the network infrastructure to any other cluster node.
If you use SAN-based heartbeat, you must have zoning set up to ensure connectivity between
the host FC adapters. You must also activate the Target Mode Enabled (tme) parameter on the
involved FC adapters.
Hardware redundancy at the storage subsystem level is mandatory for the Cluster Repository
disk. Logical Volume Manager (LVM) mirroring of the repository disk is not supported. The disk
must be at least 1 GB in size and not exceed 10 GB. For more information about supported
hardware for the cluster repository disk, see 3.5.1, “Shared storage for the repository disk” on
page 48.
CAA support: Currently, CAA supports only Fibre Channel or SAS disks for the repository
disk, as described in the “Cluster communication” topic in the AIX 7.1 Information Center
at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.clusteraware/claware_comm_benifits.htm
3.3 Considerations before using PowerHA 7.1
You must be aware of the following considerations before planning to use PowerHA 7.1:
You cannot change the host name in a configured cluster.
After the cluster is synchronized, you are unable to change the host name of any of the
cluster nodes. Therefore, changing the host name is not supported.
You cannot change the cluster name in a configured cluster.
After the cluster is synchronized, you are unable to change the name of the cluster. If you
want to change the cluster name, you must completely remove and recreate the cluster.
You cannot change the repository location or cluster IP address in a configured cluster.
After the cluster is synchronized, you are unable to change the repository disk or cluster
multicast IP address. To change the repository disk or the cluster multicast IP address,
you must completely remove and recreate the cluster.
No IPv6 support is available, which is a restriction of the CAA implementation.
3.4 Migration planning
Before migrating your cluster, you must be aware of the following considerations:
The required software
– AIX
– Virtual I/O Server (VIOS)
Multicast address
Repository disk
FC heartbeat support
All non-IP networks support removed
– RS232
– TMSCSI
– TMSSA
– Disk heartbeat (DISKHB)
– Multinode disk heartbeat (MNDHB)
IP networks support removed
– Asynchronous transfer mode (ATM)
– Fiber Distributed Data Interface (FDDI)
– Token ring
IP Address Takeover (IPAT) via replacement support removed
Heartbeat over alias support removed
Site support not available in this version
IPV6 support not available in this version
You can migrate from High-Availability Cluster Multi-Processing (HACMP) or PowerHA
versions 5.4.1, 5.5, and 6.1 only. If you are running a version earlier than HACMP 5.4.1, you
must upgrade to a newer version first.
TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
Most migration scenarios require a two-part upgrade. First, you migrate AIX to the minimum
version of AIX 6.1 TL6 on all nodes. You must reboot each node after upgrading AIX. Second,
you migrate to PowerHA 7.1 by using the offline, rolling, or snapshot scenario as explained in
Chapter 7, “Migrating to PowerHA 7.1” on page 151.
In addition, keep in mind the following considerations:
Multicast address
A multicast address is required for communication between the nodes (used by CAA).
During the migration, you can specify this address or allow CAA to automatically generate
one for you.
Discuss the multicast address with your network administrator to ensure that such
addresses are allowed on your network. Consider firewalls and routers that might not have
this support enabled.
CAA repository disk
A shared disk that is zoned in and available to all nodes in the cluster is required. This disk
is reserved for use by CAA only.
VIOS support
You can configure a PowerHA 7.1 cluster on LPARs that are using resources provided by a
VIOS. However, the support of your CAA repository disk has restrictions.
Support for vSCSI: CAA repository disk support for virtual SCSI (vSCSI) is officially
introduced in AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk
repository at AIX 6.1 TL6 base levels, but not at SP1. Alternatively, direct SAN
connection logical unit numbers (LUNs) or N_Port ID Virtualization (NPIV) LUNs are
supported with all versions.
SAN heartbeat support
One of the new features of PowerHA 7.1 is the ability to use the SAN fabric for another
communications route between hosts. This feature is implemented through CAA and
replaces Non-IP support in previous versions.
Adapters for SAN heartbeat: This feature requires 4 Gb or 8 Gb FC adapters, which
must be direct attached or virtualized. If the adapters are virtualized as vSCSI through
VIOS or by using NPIV, VIOS 2.2.0.11-FP24 SP01 is required.
Heartbeat support for non-IP configurations (such as disk heartbeat)
Disk-based heartbeat, MNDHB, RS232, TMSCSI, and TMSSA are no longer supported
configurations with PowerHA 7.1. When you migrate, be aware that you cannot keep these
configurations. When the migration is completed, these definitions are removed from the
Object Data Manager (ODM).
As an alternative, PowerHA 7.1 uses SAN-based heartbeat, which is configured
automatically when you migrate.
Removal of existing network hardware support
FDDI, ATM, and token ring are no longer supported. You must remove this hardware
before you begin the migration.
IPAT via IP replacement
IPAT via IP replacement for address takeover is no longer supported. You must remove
this configuration before you begin the migration.
Heartbeat over aliases
Configurations using heartbeat over aliases are no longer supported. You must remove
this configuration before you begin the migration.
PowerHA SystemMirror for AIX Enterprise Edition (PowerHA/XD) configurations
The latest version of PowerHA/XD is 6.1. You cannot migrate this version to PowerHA 7.1.
3.5 Storage
This section provides details about storage planning considerations for high availability of
your cluster implementation.
3.5.1 Shared storage for the repository disk
You must dedicate a shared disk with a minimum size of 1 GB as a central repository for the
cluster configuration data of CAA. For this disk, configure intrinsic data redundancy by using
hardware RAID features of the external storage subsystems.
For additional information about the shared disk, see the PowerHA SystemMirror Version 7.1
for AIX Standard Edition Concepts and Facilities Guide, SC23-6751. See also the PowerHA
SystemMirror Version 7.1 announcement information or the PowerHA SystemMirror Version
7.1 for AIX Standard Edition Planning Guide, SC23-6758-01, for a complete list of supported
devices.
The following disks are supported (through Multiple Path I/O (MPIO)) for the repository disk:
All FC disks that configure as MPIO
IBM DS8000, DS3000, DS4000®, DS5000, XIV®, ESS800, SAN Volume Controller (SVC)
EMC: Symmetrix, DMX, CLARiiON
HDS: 99XX, 96XX, OPEN series
IBM System Storage N series/NetApp: All models of N series and all NetApp models
common to N series
VIOS vSCSI
All IBM serial-attached SCSI (SAS) disks that configure as MPIO
SAS storage
The following storage types are known to work with MPIO but do not have a service
agreement:
HP
SUN
Compellent
3PAR
LSI
Texas Memory Systems
Fujitsu
Toshiba
Support for third-party multipathing software: At the time of writing, some third-party
multipathing software was not supported.
3.5.2 Adapters supported for storage communication
At the time of this writing, only 4 Gb and 8 Gb FC adapters are supported, including the FC
daughter card for IBM System p blades and Emulex FC adapters. See PowerHA SystemMirror
Version 7.1 for AIX Standard Edition Planning Guide, SC23-6758-01, for additional
information.
The following FC and SAS adapters are supported for connection to the repository disk:
4 Gb Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1905; CCIN 1910)
4 Gb Single-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5758; CCIN 280D)
4 Gb Single-Port Fibre Channel PCI-X Adapter (FC 5773; CCIN 5773)
4 Gb Dual-Port Fibre Channel PCI-X Adapter (FC 5774; CCIN 5774)
4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 1910; CCIN 1910)
4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter (FC 5759; CCIN 5759)
8 Gb PCI Express Dual Port Fibre Channel Adapter (FC 5735; CCIN 577D)
8 Gb PCI Express Dual Port Fibre Channel Adapter 1Xe Blade (FC 2B3A; CCIN 2607)
3 Gb Dual-Port SAS Adapter PCI-X DDR External (FC 5900 and 5912; CCIN 572A)
More information: For the most current list of supported storage adapters for shared disks
other than the repository disk, contact your IBM representative. Also see the “IBM
PowerHA SystemMirror for AIX” web page at:
http://www.ibm.com/systems/power/software/availability/aix/index.html
The PowerHA software supports the following disk technologies as shared external disks in a
highly available cluster:
SCSI drives, including RAID subsystems
FC adapters and disk subsystems
Data path devices (VPATH): SDD 1.6.2.0, or later
Virtual SCSI (vSCSI) disks
Support for vSCSI: CAA repository disk support for vSCSI is officially introduced in
AIX 6.1 TL6 SP2 and AIX 7.1 SP2. You can create a vSCSI disk repository at AIX 6.1
TL6 base levels, but not at SP1. Alternatively, direct SAN connection LUNs or NPIV
LUNs are supported with all versions.
You can combine these technologies within a cluster. Before choosing a disk technology,
review the considerations for configuring each technology as described in the following
section.
3.5.3 Multipath driver
AIX 7.1 does not support the IBM Subsystem Device Driver (SDD) for TotalStorage Enterprise
Storage Server®, the IBM System Storage DS8000, and the IBM System Storage SAN Volume
Controller. Instead, you can use the IBM Subsystem Device Driver Path Control Module
(SDDPCM) or the native AIX MPIO Path Control Module (PCM) for multipath support on AIX 7.1.
AIX MPIO is an architecture that uses PCMs. The following PCMs are all supported:
SDDPCM
HDLM PCM
AIXPCM
SDDPCM supports only DS6000™, DS8000, SVC, and some models of DS4000. HDLM PCM
supports only Hitachi storage devices. AIXPCM supports all storage devices that System p
servers and VIOS support, from more than 25 storage vendors.
Support for third-party multipath drivers: At the time of writing, other third-party
multipath drivers (such as EMC PowerPath, and Veritas) are not supported. This limitation
is planned to be resolved in a future release.
See the “Support Matrix for Subsystem Device Driver, Subsystem Device Driver Path Control
Module, and Subsystem Device Driver Device Specific Module” at:
http://www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350
Also check whether the coexistence of different multipath drivers that use different FC ports
on the same system is supported for mixed cases. For example, the cluster repository disk
might be on a different storage subsystem or FC adapter than the shared data disks.
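To see which path control module currently manages a given disk, you can query the disk's
PCM attribute and list its paths, as in the following sketch (the disk name is an example):
# lsattr -El hdisk2 -a PCM
# lspath -l hdisk2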
3.5.4 System Storage Interoperation Center
To check the compatibility of your particular storage and SAN infrastructure with PowerHA,
see the System Storage Interoperation Center (SSIC) site at:
http://www.ibm.com/systems/support/storage/config/ssic
3.6 Network
The networking requirements for PowerHA SystemMirror 7.1 differ from all previous versions.
This section focuses specifically on the differences of the following requirements:
Multicast address
Network interfaces
Subnetting requirements for IPAT via aliasing
Host name and node name
Other network considerations
– Single adapter networks
– Virtual Ethernet (VIOS)
IPv6: IPv6 is not supported in PowerHA SystemMirror 7.1.
For additional information, and details about common features between versions, see the
PowerHA for AIX Cookbook, SG24-7739.
3.6.1 Multicast address
The CAA functionality in PowerHA SystemMirror 7.1 employs multicast addressing for
heartbeating. Therefore, the network infrastructure must handle and allow the use of multicast
addresses. If multicast traffic is present in the adjacent network, you must ask the network
administrator for a multicast IP address allocation. Also, ensure that the multicast traffic
generated by each of the cluster nodes is properly forwarded by the network infrastructure
toward any other cluster node.
3.6.2 Network interfaces
Because PowerHA SystemMirror uses CAA, all common network interfaces (Ethernet, InfiniBand,
or both) between the cluster nodes are used for communication. You cannot limit which
interfaces are used by or configured in the cluster.
In previous versions, the network Failure Detection Rate (FDR) policy was tunable, which is
no longer true in PowerHA SystemMirror 7.1.
3.6.3 Subnetting requirements for IPAT via aliasing
In terms of subnetting requirements, IPAT via aliasing is now the only IPAT option available.
IPAT via aliasing has the following subnet requirements:
All base IP addresses on a node must be on separate subnets.
All service IP addresses must be on a separate subnet from any of the base subnets.
The service IP addresses can all be in the same or different subnets.
The persistent IP address can be in the same or a different subnet from the service IP
address.
If the networks are a single adapter configuration, both the base and service IP addresses
are allowed on the same subnet.
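As an illustration of these rules, a hypothetical dual-adapter node could be addressed as
follows (all addresses and masks are examples only):
en0 base (boot) address     192.168.100.10/24   <- base subnet 1
en1 base (boot) address     192.168.101.10/24   <- base subnet 2
service IP alias            192.168.200.10/24   <- separate from both base subnets
persistent IP alias         192.168.200.11/24   <- can share the service subnet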
3.6.4 Host name and node name
In PowerHA SystemMirror 7.1, the cluster node name and the AIX host name must be the same.
3.6.5 Other network considerations
Other network considerations for using PowerHA SystemMirror 7.1 include single adapter
networks and virtual Ethernet.
Single adapter networks
Through the use of EtherChannel, Shared Ethernet Adapters (SEA), or both at the VIOS
level, it is common today to have redundant interfaces act as one logical interface to the AIX
client or cluster node. In these configurations, historically users configured a netmon.cf file to
ping additional external interfaces or addresses. The netmon.cf configuration file is no longer
required.
Virtual Ethernet
In previous versions, when using virtual Ethernet, users configured a special formatted
netmon.cf file to ping additional external interfaces or addresses by using specific outbound
interfaces. The netmon.cf configuration file no longer applies.
Chapter 4. Installing PowerHA SystemMirror 7.1 for AIX
This chapter explains how to install the IBM PowerHA SystemMirror 7.1 for AIX Standard
Edition software.
This chapter includes the following topics:
Hardware configuration of the test environment
Installing PowerHA file sets
Volume group consideration
4.1 Hardware configuration of the test environment
Figure 4-1 shows a hardware overview of the test environment to demonstrate the installation
and configuration procedures in this chapter. It consists of two IBM Power 570 logical
partitions (LPARs), both SAN-attached to a DS4800 storage subsystem and connected to a
common LAN segment.
Figure 4-1 PowerHA Lab environment
4.1.1 SAN zoning
In the test environment, the conventional SAN zoning is configured between each host and
the storage subsystem to allow for the host attachment of the shared disks.
For the cluster SAN-based communication channel, two extra zones are created as shown in
Example 4-1. One zone includes the fcs0 ports of each server, and the other zone includes
the fcs1 ports of each server.
Example 4-1 Host-to-host zoning for SAN-based channel
sydney:/ # for i in 0 1; do lscfg -vpl fcs$i|grep "Network Address";done
Network Address.............10000000C974C16E
Network Address.............10000000C974C16F
perth:/ # for i in 0 1; do lscfg -vpl fcs$i|grep "Network Address";done
Network Address.............10000000C97720D8
Network Address.............10000000C97720D9
Fabric1:
zone: Syndey_fcs0__Perth_fcs0
10:00:00:00:c9:74:c1:6e
10:00:00:00:c9:77:20:d8
Fabric2:
zone: Syndey_fcs1__Perth_fcs1
10:00:00:00:c9:74:c1:6f
10:00:00:00:c9:77:20:d9
This dual-zone setup provides redundancy for the SAN communication channel at the Cluster
Aware AIX (CAA) storage framework level. The dotted lines in Figure 4-2 represent the
initiator-to-initiator zones that are added on top of the conventional zones, which connect
host ports to storage ports.
Figure 4-2 Host-to-host zoning
4.1.2 Shared storage
Three Redundant Array of Independent Disks (RAID) logical drives are configured on the
DS4800 storage subsystem and are presented to both AIX nodes. One logical drive hosts the
cluster repository disk. On the other two drives, the shared storage space is configured for
application data.
Example 4-2 shows that each disk is available through two paths on different Fibre Channel
(FC) adapters.
Example 4-2 FC path setup on AIX nodes
sydney:/ # for i in hdisk1 hdisk2 hdisk3 ; do lspath -l $i;done
Enabled hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Enabled hdisk3 fscsi0
Enabled hdisk3 fscsi1
perth:/ # for i in hdisk1 hdisk2 hdisk3 ; do lspath -l $i;done
Enabled hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Defined hdisk3 fscsi0
Enabled hdisk3 fscsi1
The multipath driver being used is the AIX native MPIO. In Example 4-3, the mpio_get_config
command shows identical LUNs on both nodes, as expected.
Example 4-3 MPIO shared LUNs on AIX nodes
sydney:/ # mpio_get_config -Av
Frame id 0:
    Storage Subsystem worldwide name: 60ab800114632000048ed17e
    Controller count: 2
    Partition count: 1
    Partition 0:
    Storage Subsystem Name = 'ITSO_DS4800'
        hdisk      LUN #   Ownership        User Label
        hdisk1     7       B (preferred)    PW-0201-L7
        hdisk2     8       A (preferred)    PW-0201-L8
        hdisk3     9       B (preferred)    PW-0201-L9

perth:/ # mpio_get_config -Av
Frame id 0:
    Storage Subsystem worldwide name: 60ab800114632000048ed17e
    Controller count: 2
    Partition count: 1
    Partition 0:
    Storage Subsystem Name = 'ITSO_DS4800'
        hdisk      LUN #   Ownership        User Label
        hdisk1     7       B (preferred)    PW-0201-L7
        hdisk2     8       A (preferred)    PW-0201-L8
        hdisk3     9       B (preferred)    PW-0201-L9
4.1.3 Configuring the FC adapters for SAN-based communication
To properly configure the FC adapters for the cluster SAN-based communication, follow these
steps:
X in fcsX: In the following steps, the X in fcsX represents the number of the FC adapters.
You must complete this procedure for each FC adapter that is involved in cluster
SAN-based communication.
1. Unconfigure fcsX:
rmdev -Rl fcsX
fcsX device busy: If the fcsX device is busy when you use the rmdev command, enter
the following commands:
chdev -P -l fcsX -a tme=yes
chdev -P -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
Then restart the system.
2. Change tme attribute value to yes in the fcsX definition:
chdev -l fcsX -a tme=yes
3. Enable the dynamic tracking and the fast-fail error recovery policy on the corresponding
fscsiX device:
chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
4. Configure fcsX port and its associated Storage Framework Communication device:
cfgmgr -l fcsX;cfgmgr -l sfwcommX
5. Verify the configuration changes by running the following commands:
lsdev -C | grep -e fcsX -e sfwcommX
lsattr -El fcsX | grep tme
lsattr -El fscsiX | grep -e dyntrk -e fc_err_recov
Example 4-4 illustrates the procedure for port fcs0 on node sydney.
Example 4-4 SAN-based communication channel setup
sydney:/ # lsdev -l fcs0
fcs0 Available 00-00 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
sydney:/ # lsattr -El fcs0|grep tme
tme          no         Target Mode Enabled                     True
sydney:/ # rmdev -Rl fcs0
fcnet1 Defined
sfwcomm0 Defined
fscsi0 Defined
fcs0 Defined
sydney:/ # chdev -l fcs0 -a tme=yes
fcs0 changed
sydney:/ # chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail
fscsi0 changed
sydney:/ # cfgmgr -l fcs0;cfgmgr -l sfwcomm0
sydney:/ # lsdev -C|grep -e fcs0 -e sfwcomm0
fcs0       Available 01-00       8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
sfwcomm0   Available 01-00-02-FF Fiber Channel Storage Framework Comm
sydney:/ # lsattr -El fcs0|grep tme
tme          yes        Target Mode Enabled                     True
sydney:/ # lsattr -El fscsi0|grep -e dyntrk -e fc_err_recov
dyntrk       yes        Dynamic Tracking of FC Devices          True
fc_err_recov fast_fail  FC Fabric Event Error RECOVERY Policy   True
4.2 Installing PowerHA file sets
At a minimum, you must install the following PowerHA runtime file sets:
cluster.es.client
cluster.es.server
cluster.es.cspoc
Depending on the functionality required for your environment, additional file sets might be
selected for installation.
Migration consideration: Installation on top of a previous release is considered a
migration. Additional steps are required for migration including running the clmigcheck
command. For more information about migration, see Chapter 7, “Migrating to PowerHA
7.1” on page 151.
PowerHA SystemMirror 7.1 for AIX Standard Edition includes the Smart Assists images. For
more details about the Smart Assists functionality and new features, see 2.2, “New features”
on page 24.
The PowerHA for IBM Systems Director agent file set comes with the base installation media.
To learn more about PowerHA SystemMirror for IBM Systems Director, see 5.3, “PowerHA
SystemMirror for IBM Systems Director” on page 133.
You can install the required packages in the following ways:
From a CD
From a hard disk to which the software has been copied
From a Network Installation Management (NIM) server
Installation from a CD is more appropriate for small environments. Use NFS to export the
images and mount them on the remote nodes to avoid handling the CD multiple times or copying
the images to every node.
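If you install from a CD or from a local directory instead of NIM, the standard installp
command can be used. The following sketch assumes the images were copied to a directory such
as /tmp/ha71 (the directory name is an example):
# installp -acgXY -d /tmp/ha71 cluster.es.client cluster.es.server \
    cluster.es.cspoc cluster.license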
The following section provides an example of how to use a NIM server to install the PowerHA
software.
4.2.1 PowerHA software installation example
This section guides you through an example of installing the PowerHA software. This example
runs on the server configuration shown in 4.1, “Hardware configuration of the test
environment” on page 54.
Installing the AIX BOS components and RSCT
Some of the prerequisite file sets might already be present, or they might be missing from
previous installations, updates, and removals. To begin, a consistent AIX image must be
installed. The test environment started with a “New and Complete Overwrite” installation of
AIX 6.1.6.1 from a NIM server. Example 4-5 shows how to check the AIX version and
the consistency of the installation.
Example 4-5 Initial AIX image
sydney:/ # oslevel -s
6100-06-01-1043
sydney:/ # lppchk -v
sydney:/ #
In Example 4-6, the lslpp command lists the prerequisites that are already installed and the
ones that are missing in a single output.
Example 4-6 Checking the installed and missing prerequisites
sydney:/ # lslpp -L bos.adt.lib bos.adt.libm bos.adt.syscalls bos.clvm.enh \
> bos.cluster.rte bos.cluster.solid bos.data bos.ahafs bos.net.tcp.client \
> bos.net.tcp.server bos.rte.SRC bos.rte.libc bos.rte.libcfg \
> bos.rte.libcur bos.rte.libpthreads bos.rte.lvm bos.rte.odm \
> rsct.basic.rte rsct.compat.basic.hacmp rsct.compat.clients.hacmp
  Fileset                      Level  State  Type  Description (Uninstaller)
  ----------------------------------------------------------------------------
  bos.adt.lib                6.1.2.0    C     F    Base Application Development
                                                   Libraries
lslpp: Fileset bos.adt.libm not installed.
lslpp: Fileset bos.adt.syscalls not installed.
  bos.cluster.rte            6.1.6.1    C     F    Cluster Aware AIX
  bos.cluster.solid          6.1.6.1    C     F    POWER HA Business Resiliency
                                                   solidDB
lslpp: Fileset bos.clvm.enh not installed.
lslpp: Fileset bos.data not installed.
  bos.net.tcp.client         6.1.6.1    C     F    TCP/IP Client Support
  bos.net.tcp.server         6.1.6.0    C     F    TCP/IP Server
  bos.rte.SRC                6.1.6.0    C     F    System Resource Controller
  bos.rte.libc               6.1.6.1    C     F    libc Library
  bos.rte.libcfg             6.1.6.0    C     F    libcfg Library
  bos.rte.libcur             6.1.6.0    C     F    libcurses Library
  bos.rte.libpthreads        6.1.6.0    C     F    pthreads Library
  bos.rte.lvm                6.1.6.0    C     F    Logical Volume Manager
  bos.rte.odm                6.1.6.0    C     F    Object Data Manager
  rsct.basic.rte             3.1.0.1    C     F    RSCT Basic Function
  rsct.compat.basic.hacmp    3.1.0.1    C     F    RSCT Event Management Basic
                                                   Function (HACMP/ES Support)
  rsct.compat.clients.hacmp  3.1.0.0    C     F    RSCT Event Management Client
                                                   Function (HACMP/ES Support)
Figure 4-3 shows selection of the appropriate lpp_source on the NIM server, aix6161, by
following the path smitty nim → Install and Update Software → Install Software. You
select all of the required file sets on the next panel.
Install and Update Software

Move cursor to desired item and press Enter.

  Install Software
  Update Installed Software to Latest Level (Update All)
  Install Software Bundle
  Update Software by Fix (APAR)

  +--------------------------------------------------------------------+
  |        Select the LPP_SOURCE containing the install images         |
  |                                                                    |
  |  Move cursor to desired item and press Enter.                      |
  |                                                                    |
  |    aix7100g            resources          lpp_source               |
  |    aix7101             resources          lpp_source               |
  |    aix6161             resources          lpp_source               |
  |    ha71sp1             resources          lpp_source               |
  |    aix6060             resources          lpp_source               |
  |    aix6160-SP1-only    resources          lpp_source               |
  |                                                                    |
  |  F1=Help          F2=Refresh        F3=Cancel                      |
  |  Esc+8=Image      Esc+0=Exit        Enter=Do                       |
  |  /=Find           n=Find Next                                      |
  +--------------------------------------------------------------------+

Figure 4-3 Installing the prerequisites: Selecting lpp_source
Figure 4-4 shows one of the selected file sets, bos.clvm. Although no other file set requires
it as a prerequisite, bos.clvm is mandatory for PowerHA 7.1 because only enhanced concurrent
volume groups (ECVGs) are supported. See 10.3.3, “The ECM volume group” on page 313,
for more details.
  +--------------------------------------------------------------------+
  |                        Software to Install                         |
  |                                                                    |
  |  Move cursor to desired item and press Esc+7. Use arrow keys to    |
  |  scroll. ONE OR MORE items can be selected.                        |
  |  Press Enter AFTER making all selections.                          |
  |                                                                    |
  |  [MORE...2286]                                                     |
  |      + 6.1.6.1 POWER HA Business Resiliency solidDB                |
  |      + 6.1.6.0 POWER HA Business Resiliency solidDB                |
  |                                                                    |
  |  > bos.clvm                                              ALL       |
  |      + 6.1.6.0 Enhanced Concurrent Logical Volume Manager          |
  |                                                                    |
  |    bos.compat                                            ALL       |
  |      + 6.1.6.0 AIX 3.2 Compatibility Commands                      |
  |  [MORE...4498]                                                     |
  |                                                                    |
  |  F1=Help          F2=Refresh        F3=Cancel                      |
  |  Esc+7=Select     Esc+8=Image       Esc+0=Exit                     |
  |  Enter=Do         /=Find            n=Find Next                    |
  +--------------------------------------------------------------------+

Figure 4-4 Installing the prerequisites: Selecting the file sets
After installing from the NIM server, ensure that each node remains at the initial version of AIX
and RSCT, and check the software consistency, as shown in Example 4-7.
Example 4-7 Post-installation check of the prerequisites
sydney:/ # oslevel -s
6100-06-01-1043
sydney:/ # lppchk -v
sydney:/ # lslpp -L rsct.basic.rte rsct.compat.basic.hacmp \
> rsct.compat.clients.hacmp
  Fileset                      Level  State  Type  Description (Uninstaller)
  ----------------------------------------------------------------------------
  rsct.basic.rte             3.1.0.1    C     F    RSCT Basic Function
  rsct.compat.basic.hacmp    3.1.0.1    C     F    RSCT Event Management Basic
                                                   Function (HACMP/ES Support)
  rsct.compat.clients.hacmp  3.1.0.0    C     F    RSCT Event Management Client
                                                   Function (HACMP/ES Support)
Installing the PowerHA file sets
To prepare an lpp_source that contains the required base and updated file sets, follow these
steps:
1. Copy the file set from the media to a directory on the NIM server by using the smit
bffcreate command.
2. Apply the latest service pack in the same directory by using the smit bffcreate
command.
3. Create an lpp_source resource that points to the directory on the NIM server.
Example 4-8 lists the contents of the lpp_source. As mentioned previously, both the Smart
Assist file sets and PowerHA for IBM Systems Director agent file set come with the base
media.
Example 4-8 The contents of lpp_source in PowerHA SystemMirror
nimres1:/ # lsnim -l ha71sp1
ha71sp1:
   class       = resources
   type        = lpp_source
   arch        = power
   Rstate      = ready for use
   prev_state  = unavailable for use
   location    = /nimrepo/lpp_source/HA71
   alloc_count = 0
   server      = master
nimres1:/ # ls /nimrepo/lpp_source/HA71
.toc
cluster.adt.es
cluster.doc.en_US.assist
cluster.doc.en_US.assist.db2.html.7.1.0.1.bff
cluster.doc.en_US.assist.oracle.html.7.1.0.1.bff
cluster.doc.en_US.assist.websphere.html.7.1.0.1.bff
cluster.doc.en_US.es
cluster.doc.en_US.es.html.7.1.0.1.bff
cluster.doc.en_US.glvm.html.7.1.0.1.bff
cluster.es.assist
cluster.es.assist.common.7.1.0.1.bff
cluster.es.assist.db2.7.1.0.1.bff
cluster.es.assist.domino.7.1.0.1.bff
cluster.es.assist.ihs.7.1.0.1.bff
cluster.es.assist.sap.7.1.0.1.bff
cluster.es.cfs
cluster.es.cfs.rte.7.1.0.1.bff
cluster.es.client
cluster.es.client.clcomd.7.1.0.1.bff
cluster.es.client.lib.7.1.0.1.bff
cluster.es.client.rte.7.1.0.1.bff
cluster.es.cspoc
cluster.es.director.agent
cluster.es.migcheck
cluster.es.nfs
cluster.es.server
cluster.es.server.diag.7.1.0.1.bff
cluster.es.server.events.7.1.0.1.bff
cluster.es.server.rte.7.1.0.1.bff
cluster.es.server.utils.7.1.0.1.bff
cluster.es.worksheets
cluster.license
cluster.man.en_US.es.data
cluster.msg.en_US.assist
cluster.msg.en_US.es
rsct.basic_3.1.0.0
rsct.compat.basic_3.1.0.0
rsct.compat.clients_3.1.0.0
rsct.core_3.1.0.0
rsct.exp_3.1.0.0
rsct.opt.fence_3.1.0.0
rsct.opt.stackdump_3.1.0.0
rsct.opt.storagerm_3.1.0.0
rsct.sdk_3.1.0.0
Example 4-9 shows the file sets that were selected for the test environment and installed from
the lpp_source that was prepared previously. Each node requires a PowerHA license.
Therefore, you must install the license file set.
Example 4-9 List of installed PowerHA file sets
sydney:/ # lslpp -L cluster.*
  Fileset                        Level  State  Type  Description (Uninstaller)
  ----------------------------------------------------------------------------
  cluster.es.client.lib        7.1.0.1    C     F    PowerHA SystemMirror Client
                                                     Libraries
  cluster.es.client.rte        7.1.0.1    C     F    PowerHA SystemMirror Client
                                                     Runtime
  cluster.es.client.utils      7.1.0.0    C     F    PowerHA SystemMirror Client
                                                     Utilities
  cluster.es.client.wsm        7.1.0.0    C     F    Web based Smit
  cluster.es.cspoc.cmds        7.1.0.0    C     F    CSPOC Commands
  cluster.es.cspoc.dsh         7.1.0.0    C     F    CSPOC dsh
  cluster.es.cspoc.rte         7.1.0.0    C     F    CSPOC Runtime Commands
  cluster.es.migcheck          7.1.0.0    C     F    PowerHA SystemMirror Migration
                                                     support
  cluster.es.server.cfgast     7.1.0.0    C     F    Two-Node Configuration
                                                     Assistant
  cluster.es.server.diag       7.1.0.1    C     F    Server Diags
  cluster.es.server.events     7.1.0.1    C     F    Server Events
  cluster.es.server.rte        7.1.0.1    C     F    Base Server Runtime
  cluster.es.server.testtool   7.1.0.0    C     F    Cluster Test Tool
  cluster.es.server.utils      7.1.0.1    C     F    Server Utilities
  cluster.license              7.1.0.0    C     F    PowerHA SystemMirror
                                                     Electronic License
  cluster.man.en_US.es.data    7.1.0.0    C     F    Man Pages - U.S. English
  cluster.msg.en_US.assist     7.1.0.0    C     F    PowerHA SystemMirror Smart
                                                     Assist Messages - U.S. English
  cluster.msg.en_US.es.client  7.1.0.0    C     F    PowerHA SystemMirror Client
                                                     Messages - U.S. English
  cluster.msg.en_US.es.server  7.1.0.0    C     F    Recovery Driver Messages -
                                                     U.S. English
Then verify the installed software as shown in Example 4-10. The prompt returned by the
lppchk command, with no error output, confirms the consistency of the installed file sets.
Example 4-10 Verifying the installed PowerHA filesets consistency
sydney:/ # lppchk -v
sydney:/ # lppchk -c cluster.*
sydney:/ #
4.3 Volume group consideration
PowerHA 7.1 supports only the use of enhanced concurrent volume groups. If you try to add
an existing non-concurrent volume group to a PowerHA resource group, the operation fails
with the error message shown in Figure 4-5 if the volume group is not already imported on
the other node.
Auto Discover/Import of Volume Groups was set to true.
Gathering cluster information, which may take a few minutes.
claddres: test_vg is not a shareable volume group.
Could not perform all imports.
No ODM values were changed.
<01> Importing Volume group: test_vg onto node: chile: FAIL
Verification to be performed on the following:
Cluster Topology
Cluster Resources
Figure 4-5 Error message when adding a volume group
To work around the problem shown in Figure 4-5, manually import the volume group on the
other node by using the following command:
importvg -L test_vg hdiskx
After the volume group is added to the other node, the synchronization and verification are
then completed.
Volume group conversion: The volume group is automatically converted to an enhanced
concurrent volume group during the first startup of the PowerHA cluster.
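If you create a new shared volume group manually instead of through C-SPOC, you can make it
enhanced concurrent capable from the start, as in the following sketch (the volume group
name, partition size, and disk name are examples):
# mkvg -C -y datavg -s 64 hdisk2    # -C creates an enhanced concurrent capable VG
# varyoffvg datavg                  # leave it varied off so PowerHA can manage it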
Chapter 5. Configuring a PowerHA cluster
To configure a PowerHA cluster, you can choose from the following options.
SMIT
SMIT is the most commonly used way to manage and configure a cluster. The SMIT
menus are available after the cluster file sets are installed. The learning cycle for using
SMIT is shorter than the learning cycle for using the command-line interface (CLI). For
more information about using SMIT to configure a cluster, see 5.1, “Cluster configuration
using SMIT” on page 66.
PowerHA SystemMirror plug-in for IBM Systems Director
The IBM Systems Director plug-in is for users who already run IBM Systems Director and want
to use it to manage and configure PowerHA clusters. You might choose this option if you work
with large environments that need central management of all clusters.
You can choose from two methods, as explained in the following sections, to configure a
cluster using IBM Systems Director:
– 12.1.1, “Creating a cluster with the SystemMirror plug-in wizard” on page 334
– 12.1.2, “Creating a cluster with the SystemMirror plug-in CLI” on page 339
The clmgr CLI
You can use the clmgr utility for configuration tasks. However, its purpose is to provide a
uniform scripting interface for deployments in larger environments and to perform
day-to-day cluster management. For more information about using this tool, see 5.2,
“Cluster configuration using the clmgr tool” on page 104.
You can perform most administration tasks with any of these options. The option that you
choose depends on which one you prefer and which one meets the requirements of your
environment.
This chapter includes the following topics:
Cluster configuration using SMIT
Cluster configuration using the clmgr tool
PowerHA SystemMirror for IBM Systems Director
5.1 Cluster configuration using SMIT
This topic includes the following sections:
SMIT menu changes
Overview of the test environment
Typical configuration of a cluster topology
Custom configuration of the cluster topology
Configuring resources and applications
Configuring Start After and Stop After resource group dependencies
Creating a user-defined resource type
Configuring the dynamic node priority (adaptive failover)
Removing a cluster
5.1.1 SMIT menu changes
The SMIT menus for PowerHA SystemMirror 7.1 are restructured to simplify configuration
and administration by grouping menus by function.
Locating available options: If you are familiar with the SMIT paths from an earlier
version, and need to locate a specific feature, use the “Can’t find what you are looking for
?” feature from the main SMIT menu to list and search the available options.
To enter the top-level menu, use the new fast path, smitty sysmirror. The fast path on earlier
versions, smitty hacmp, still works. From the main menu, the highlighted options shown in
Figure 5-1 are available to help with topology and resources configuration. Most of the tools
necessary to configure cluster components are under “Cluster Nodes and Networks” and
“Cluster Applications and Resources.” Some terminology has changed, and the interface
looks more simplified for easier navigation and management.
PowerHA SystemMirror
Move cursor to desired item and press Enter.
Cluster Nodes and Networks
Cluster Applications and Resources
System Management (C-SPOC)
Problem Determination Tools
Custom Cluster Configuration
Can't find what you are looking for ?
Not sure where to start ?
Figure 5-1 Top-level SMIT menu
Because topology monitoring has been transferred to CAA, its management has been
simplified. Support for non-TCP/IP heartbeat has been transferred to CAA and is no longer a
separate configurable option. Instead of multiple menu options and dialogs for configuring
non-TCP/IP heartbeating devices, a single option is available plus a window (Figure 5-2) to
specify the CAA cluster repository disk and the multicast IP address.
Up-front help information and navigation aids, similar to the last two items in the top-level
menu in Figure 5-1 (Can't find what you are looking for ? and Not sure where to start ?), are
now available in some of the basic panels. See the last menu option in Figure 5-2 (What are a
repository disk and cluster IP address ?) for an example. The context-sensitive help (F1 key)
in earlier versions is still available.
Initial Cluster Setup (Typical)
Move cursor to desired item and press Enter.
Setup a Cluster, Nodes and Networks
Define Repository Disk and Cluster IP Address
What are a repository disk and cluster IP address ?
F1=Help            F2=Refresh         F3=Cancel          Esc+8=Image
Esc+9=Shell        Esc+0=Exit         Enter=Do
Figure 5-2 Help information
The top resource menus keep only the commonly used options, and the less frequently used
menus are deeper in the hierarchy, under a new Custom Cluster Configuration menu. This
menu includes various customizable and advanced options, similar to the “Extended
Configuration” menu in earlier versions. See 2.3, “Changes to the SMIT panel” on page 25,
for a layout that compares equivalent menu screens in earlier versions with the new screens.
The Verify and Synchronize functions now have a simplified form in most of the typical menus,
while the earlier customizable version is available in more advanced contexts.
Application server versus application controller: Earlier versions used the term
application server to refer to the scripts that are used to start and stop applications under
SystemMirror control. In version 7.1, these scripts are referred to as application
controllers.
A System Events dialog is now available in addition to the user-defined events and pre- and
post-event commands for predefined events from earlier versions. For more information about
this dialog, see 9.4, “Testing the rootvg system event” on page 286.
SSA disks are no longer supported in AIX 6.1, and the RSCT role has been diminished.
Therefore, some related menu options have been removed. See Chapter 2, “Features of
PowerHA SystemMirror 7.1” on page 23, for more details about the new and obsolete
features.
For a topology configuration, SMIT provides two possible approaches that resemble the
previous Standard and Extended configuration paths: typical configuration and custom
configuration.
Typical configuration
The smitty sysmirror → Cluster Nodes and Networks → Initial Cluster Setup (Typical)
configuration path provides the means to configure the basic components of a cluster in a few
steps. Discovery and selection of configuration information is automated, and default values
are provided whenever possible. If you need to use specific values instead of the default
paths that are provided, you can change them later or use the custom configuration path
instead.
Custom configuration
Custom cluster configuration options are not typically required or used by most customers.
However they provide extended flexibility in configuration and management options. These
options are under the Custom Cluster Configuration option in the top-level panel. If you want
complete control over which components are added to the cluster, and create them piece by
piece, you can configure the cluster topology with the SMIT menus. Follow the path Custom
Cluster Configuration → Initial Cluster Setup (Custom). With this path, you can also set
your own node and network names, other than the default ones. Alternatively, you can choose
only specific network interfaces to support the clustered applications. (By default, all IP
configured interfaces are used.)
Resources configuration
The Cluster Applications and Resources menu in the top-level panel groups the commonly
used options for configuring resources, resource groups, and application controllers.
Other resource options that are not required in most typical configurations are under the
Custom Cluster Configuration menu. They provide dialogs and options to perform the
following tasks:
Configure custom disk, volume group, and file system methods for cluster resources
Customize resource recovery and the service IP label distribution policy
Customize an event
Most of the resources menus and dialogs are similar to their counterparts in earlier versions.
For more information, see the existing documentation about the previous releases listed in
“Related publications” on page 519.
5.1.2 Overview of the test environment
The cluster used in the test environment is a mutual-takeover, dual-node implementation with
two resource groups, one on each node. Figure 5-3 on page 69 shows the cluster
configuration on top of the hardware infrastructure introduced in 4.1, “Hardware configuration
of the test environment” on page 54.
Figure 5-3 Mutual-takeover, dual-node cluster
By using this setup, we can present various aspects of a typical production implementation,
such as topology redundancy or more complex resource configuration. As an example, we
configure SAN-based heartbeating and introduce the new Start After and Stop After resource
group dependencies.
5.1.3 Typical configuration of a cluster topology
This section explains step-by-step how to configure a basic PowerHA cluster topology using
the typical cluster configuration path. For an example of using the custom cluster
configuration path, see 5.1.4, “Custom configuration of the cluster topology” on page 78.
Prerequisite: Before reading this section, you must have configured all your networks and
storage devices as explained in 3.2, “Hardware requirements” on page 44.
The /etc/cluster/rhosts file must be populated with all cluster IP addresses before you use
PowerHA SystemMirror. In earlier versions this step was performed automatically, but it is
now a required manual step. The addresses that you enter in this file must include the
addresses that resolve to the host names of the cluster nodes. Whenever you update this file,
you must refresh the clcomd subsystem with the refresh -s clcomd command.
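For example, on each node you might populate the file with the node IP labels and then
refresh clcomd. The following lines are a minimal sketch only; substitute the labels that
resolve to your own cluster host names:

   cat >> /etc/cluster/rhosts <<EOF
   sydney
   perth
   EOF
   refresh -s clcomd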
In previous releases of PowerHA, the host name was not required to resolve to an IP
address. According to the PowerHA 7.1 release notes, the host name must now resolve to an
IP address.
Important: Previous releases used the clcomdES subsystem, which read information from
the /usr/es/sbin/cluster/etc/rhosts file. The clcomdES subsystem is no longer
used. Therefore, you must configure the clcomd subsystem as explained in this section.
Also, ensure that you have one unused shared disk available for the cluster repository.
Example 5-1 shows the lspv command output on the systems sydney and perth. The first
part shows the output from the node sydney, and the second part shows the output from
perth.
Example 5-1 lspv command output before configuring PowerHA
sydney:/ # lspv
hdisk0          00c1f170488a4626                    rootvg          active
hdisk1          00c1f170fd6b4d9d                    dbvg
hdisk2          00c1f170fd6b50a5                    appvg
hdisk3          00c1f170fd6b5126                    None
---------------------------------------------------------------------------
perth:/ # lspv
hdisk0          00c1f1707c6092fe                    rootvg          active
hdisk1          00c1f170fd6b4d9d                    dbvg
hdisk2          00c1f170fd6b50a5                    appvg
hdisk3          00c1f170fd6b5126                    None
Node names: The sydney and perth node names have no implication on extended
distance capabilities. The names have been used only for node names.
Defining a cluster
To define a cluster, follow these steps:
1. Use the smitty sysmirror or smitty hacmp fast path.
2. In the PowerHA SystemMirror menu (Figure 5-4), select the Cluster Nodes and
Networks option.
PowerHA SystemMirror
Move cursor to desired item and press Enter.
Cluster Nodes and Networks
Cluster Applications and Resources
System Management (C-SPOC)
Problem Determination Tools
Custom Cluster Configuration
Can't find what you are looking for ?
Not sure where to start ?
Figure 5-4 Menu that is displayed after entering smitty sysmirror
3. In the Cluster Nodes and Networks menu (Figure 5-5), select the Initial Cluster Setup
(Typical) option.
Cluster Nodes and Networks
Move cursor to desired item and press Enter.
Initial Cluster Setup (Typical)
Manage the Cluster
Manage Nodes
Manage Networks and Network Interfaces
Discover Network Interfaces and Disks
Verify and Synchronize Cluster Configuration
Figure 5-5 Cluster Nodes and Networks menu
4. In the Initial Cluster Setup (Typical) menu (Figure 5-6), select the Setup a Cluster, Nodes
and Networks option.
Initial Cluster Setup (Typical)
Move cursor to desired item and press Enter.
Setup a Cluster, Nodes and Networks
Define Repository Disk and Cluster IP Address
What are a repository disk and cluster IP address ?
Figure 5-6 Initial cluster setup (typical)
5. From the Setup a Cluster, Nodes, and Networks panel (Figure 5-7 on page 72), complete
the following steps:
a. Specify the repository disk and the multicast IP address.
The cluster name is based on the host name of the system. You can use this default or
replace it with a name you want to use. In the test environment, the cluster is named
australia.
b. In the New Nodes field, define the IP label that you want to use to communicate to the
other systems. In this example, we plan to build a two-node cluster where the two
systems are named sydney and perth. If you want to create a cluster with more than
two nodes, you can specify more than one system by using the F4 key. The advantage
is that you do not get typographical errors, and you can verify that the /etc/hosts file
contains your network addresses.
The Currently Configured Node(s) field lists all the configured nodes or lists the host
name of the system you are working on if nothing is configured so far.
c. Press Enter.
Setup Cluster, Nodes and Networks (Typical)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Cluster Name                                          [australia]
  New Nodes (via selected communication paths)          [perth]
  Currently Configured Node(s)                          sydney
Figure 5-7 Setup a Cluster, Nodes and Networks panel
The COMMAND STATUS panel (Figure 5-8) indicates that the cluster creation completed
successfully.
COMMAND STATUS
Command: OK            stdout: yes           stderr: no
Before command completion, additional instructions may appear below.
[TOP]
Cluster Name: australia_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: None
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135
No resource groups defined
clharvest_vg: Initializing....
Gathering cluster information, which may take a few minutes...
clharvest_vg: Processing...
Storing the following information in file
/usr/es/sbin/cluster/etc/config/clvg_config
perth:
[MORE...93]
Figure 5-8 Cluster creation completed successfully
If you receive an error message similar to the example in Figure 5-9, you might have missed a
step. For example, you might not have added the host names to the /etc/cluster/rhosts
file, or you might have forgotten to run the refresh -s clcomd command. Alternatively, you
might have to change the host names in the /etc/cluster/rhosts file to fully qualified
domain-based host names.
Reminder: After you change the /etc/cluster/rhosts file, enter the refresh -s
clcomd command.
COMMAND STATUS
Command: failed        stdout: yes           stderr: no
Before command completion, additional instructions may appear below.
Warning: There is no cluster found.
cllsclstr: No cluster defined
cllsclstr: Error reading configuration
Figure 5-9 Failure to set up the initial cluster
When you look at the output in more detail, you can see that the system adds your entries
to the cluster configuration and runs a discovery on the systems. The discovered shared
disks are also listed.
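Before you retry the cluster setup, it can also help to confirm on each node that the file
content is what you expect and that the clcomd subsystem is active. A minimal check, using
only standard AIX SRC commands:

   cat /etc/cluster/rhosts      # one resolvable label per node
   lssrc -s clcomd              # subsystem must show as active
   refresh -s clcomd            # reread the file after any change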
Configuring the repository disk and cluster multicast IP address
After you configure the cluster, configure the repository disk and the cluster multicast IP
address.
1. Go back to the Initial Cluster Setup (Typical) panel (Figure 5-6 on page 71). You can use
the path smitty sysmirror → Cluster Nodes and Networks → Initial Cluster Setup
(Typical) or the smitty cm_setup_menu fast path.
2. In the Initial Cluster Setup (Typical) panel, select the Define Repository and Cluster IP
Address option.
3. In the Define Repository and Cluster IP Address panel (Figure 5-10), complete these
steps:
a. Press the F4 key to select the disk that you want to use as the repository disk for CAA.
As shown in Example 5-1 on page 70, only one unused shared disk, hdisk3, remains.
b. Leave the Cluster IP Address field empty. The system generates an appropriate
address for you.
The cluster IP address is a multicast address that is used for internal cluster
communication and monitoring. Specify an address manually only if you have an
explicit reason to do so. For more information about the cluster multicast IP address,
see “Requirements for the multicast IP address, SAN, and repository disk” on page 45.
Multicast address not specified: If you did not specify a multicast address, you
can see the one that AIX chose for you in the output of the cltopinfo command.
c. Press Enter.
Define Repository and Cluster IP Address

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Cluster Name                                          australia
* Repository Disk                                       [None]            +
  Cluster IP Address                                    []

  +--------------------------------------------------------------------------+
  |                              Repository Disk                             |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   hdisk3                                                                 |
  |                                                                          |
  | F1=Help       F2=Refresh     F3=Cancel                                   |
  | F8=Image      F10=Exit       Enter=Do                                    |
  | /=Find        n=Find Next                                                |
  +--------------------------------------------------------------------------+
Figure 5-10 Define Repository and Cluster IP Address panel
Then the COMMAND STATUS panel (Figure 5-11) opens.
COMMAND STATUS
Command: OK            stdout: yes           stderr: no
Before command completion, additional instructions may appear below.
[TOP]
Cluster Name: australia
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk3
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network net_ether_01
perth 192.168.101.136
NODE sydney:
Network net_ether_01
sydney 192.168.101.135
No resource groups defined
Current cluster configuration:
[BOTTOM]
Figure 5-11 COMMAND STATUS showing OK for adding a repository disk
This process only updates the information in the cluster configuration. If you use the lspv
command on any nodes in the cluster, each node still shows the same output as listed in
Example 5-1 on page 70. When the cluster is synchronized the first time, both the CAA
cluster and repository disk are created.
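After that first synchronization, a quick way to confirm that the CAA cluster and its
repository were created is to check for the caavg_private volume group and to query CAA
directly. A minimal sketch:

   lspv | grep caavg_private    # the repository disk now belongs to caavg_private
   lscluster -c                 # CAA view of the cluster
   cltopinfo                    # PowerHA view of the topology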
Creating a cluster with host names in the FQDN format
In the testing environments, we create working clusters with both short and fully qualified
domain name (FQDN) host names. To use the FQDN, you must follow this guidance:
The /etc/hosts file has the FQDN entry first, right after the IP address, and then the short
host name as an alias for each label. In this case, the FQDN name is used by CAA
because CAA always uses the host name for its node names, regardless of whether the
host name is short or FQDN.
Define the PowerHA node names with the short names because dots are not accepted as
part of a node name.
As long as the /etc/hosts file contains the FQDN entry first, and then the short name as
an alias, the host name can be either FQDN or short in your configuration.
As long as the /etc/hosts file contains the FQDN entry first, and then the short name as
an alias, the /etc/cluster/rhosts file can contain only the short name. This file is only
used for the first synchronization of the cluster, when the Object Data Manager (ODM)
classes are not yet populated with the communication paths for the nodes. This file serves
the same function as the /usr/es/sbin/cluster/etc/rhosts file in previous PowerHA and
HACMP versions.
When you define the interfaces to PowerHA, you can choose either the short or the long
name from the pick lists in SMIT; PowerHA ultimately uses the short name. The same
guidance applies to service and persistent addresses.
Logical partition (LPAR) names continue to be the short ones, even if you use FQDN for
host names.
Example 5-2 shows a configuration that uses host names in the FQDN format.
Example 5-2 Configuration using host names in the FQDN format
seoul.itso.ibm.com:/ # clcmd cat /etc/hosts

-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
127.0.0.1        loopback localhost                    # loopback (lo0) name/address
::1              loopback localhost                    # IPv6 loopback (lo0) name/address
192.168.101.143  seoul-b1.itso.ibm.com  seoul-b1       # Base IP label 1
192.168.101.144  busan-b1.itso.ibm.com  busan-b1       # Base IP label 1
192.168.201.143  seoul-b2.itso.ibm.com  seoul-b2       # Base IP label 2
192.168.201.144  busan-b2.itso.ibm.com  busan-b2       # Base IP label 2
10.168.101.43    seoul.itso.ibm.com     seoul          # Persistent IP
10.168.101.44    busan.itso.ibm.com     busan          # Persistent IP
10.168.101.143   poksap-db.itso.ibm.com poksap-db      # Service IP label
10.168.101.144   poksap-en.itso.ibm.com poksap-en      # Service IP label
10.168.101.145   poksap-er.itso.ibm.com poksap-er      # Service IP label

-------------------------------
NODE busan.itso.ibm.com
-------------------------------
127.0.0.1        loopback localhost                    # loopback (lo0) name/address
::1              loopback localhost                    # IPv6 loopback (lo0) name/address
192.168.101.143  seoul-b1.itso.ibm.com  seoul-b1       # Base IP label 1
192.168.101.144  busan-b1.itso.ibm.com  busan-b1       # Base IP label 1
192.168.201.143  seoul-b2.itso.ibm.com  seoul-b2       # Base IP label 2
192.168.201.144  busan-b2.itso.ibm.com  busan-b2       # Base IP label 2
10.168.101.43    seoul.itso.ibm.com     seoul          # Persistent IP
10.168.101.44    busan.itso.ibm.com     busan          # Persistent IP
10.168.101.143   poksap-db.itso.ibm.com poksap-db      # Service IP label
10.168.101.144   poksap-en.itso.ibm.com poksap-en      # Service IP label
10.168.101.145   poksap-er.itso.ibm.com poksap-er      # Service IP label

seoul.itso.ibm.com:/ # clcmd hostname

-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
seoul.itso.ibm.com

-------------------------------
NODE busan.itso.ibm.com
-------------------------------
busan.itso.ibm.com

seoul.itso.ibm.com:/ # clcmd cat /etc/cluster/rhosts

-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
seoul
busan

-------------------------------
NODE busan.itso.ibm.com
-------------------------------
seoul
busan
seoul.itso.ibm.com:/ # clcmd lsattr -El inet0

-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
authm          65536                  Authentication Methods                True
bootup_option  no                     Use BSD-style Network Configuration   True
gateway                               Gateway                               True
hostname       seoul.itso.ibm.com     Host Name                             True
rout6                                 IPv6 Route                            True
route          net,,0,192.168.100.60  Route                                 True

-------------------------------
NODE busan.itso.ibm.com
-------------------------------
authm          65536                  Authentication Methods                True
bootup_option  no                     Use BSD-style Network Configuration   True
gateway                               Gateway                               True
hostname       busan.itso.ibm.com     Host Name                             True
rout6                                 IPv6 Route                            True
route          net,,0,192.168.100.60  Route                                 True

seoul.itso.ibm.com:/ # cllsif
Adapter    Type     Network       Net Type  Attribute  Node   IP Address       Interface Name  Netmask        Prefix Length
busan-b1   boot     net_ether_01  ether     public     busan  192.168.101.144  en0             255.255.255.0  24
busan-b2   boot     net_ether_01  ether     public     busan  192.168.201.144  en2             255.255.255.0  24
poksap-er  service  net_ether_01  ether     public     busan  10.168.101.145                   255.255.255.0  24
poksap-en  service  net_ether_01  ether     public     busan  10.168.101.144                   255.255.255.0  24
poksap-db  service  net_ether_01  ether     public     busan  10.168.101.143                   255.255.255.0  24
seoul-b1   boot     net_ether_01  ether     public     seoul  192.168.101.143  en0             255.255.255.0  24
seoul-b2   boot     net_ether_01  ether     public     seoul  192.168.201.143  en2             255.255.255.0  24
poksap-er  service  net_ether_01  ether     public     seoul  10.168.101.145                   255.255.255.0  24
poksap-en  service  net_ether_01  ether     public     seoul  10.168.101.144                   255.255.255.0  24
poksap-db  service  net_ether_01  ether     public     seoul  10.168.101.143                   255.255.255.0  24

seoul.itso.ibm.com:/ # cllsnode
Node busan
    Interfaces to network net_ether_01
        Communication Interface: Name busan-b1, Attribute public, IP address 192.168.101.144
        Communication Interface: Name busan-b2, Attribute public, IP address 192.168.201.144
        Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
        Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
        Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143

Node seoul
    Interfaces to network net_ether_01
        Communication Interface: Name seoul-b1, Attribute public, IP address 192.168.101.143
        Communication Interface: Name seoul-b2, Attribute public, IP address 192.168.201.143
        Communication Interface: Name poksap-er, Attribute public, IP address 10.168.101.145
        Communication Interface: Name poksap-en, Attribute public, IP address 10.168.101.144
        Communication Interface: Name poksap-db, Attribute public, IP address 10.168.101.143
# LPAR names
seoul.itso.ibm.com:/ # clcmd uname -n

-------------------------------
NODE seoul.itso.ibm.com
-------------------------------
seoul

-------------------------------
NODE busan.itso.ibm.com
-------------------------------
busan

seoul.itso.ibm.com:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name           Group State          Node
-----------------------------------------------------------------------------
sapdb                ONLINE               seoul
                     OFFLINE              busan

sapen                ONLINE               seoul
                     OFFLINE              busan

saper                ONLINE               busan
                     OFFLINE              seoul

# The output below shows that CAA always uses the host name for its node names.
# The PowerHA node names are: seoul, busan
seoul.itso.ibm.com:/ # lscluster -c
Cluster query for cluster korea returns:
Cluster uuid: 02d20290-d578-11df-871d-a24e50543103
Number of nodes in cluster = 2
Cluster id for node busan.itso.ibm.com is 1
Primary IP address for node busan.itso.ibm.com is 10.168.101.44
Cluster id for node seoul.itso.ibm.com is 2
Primary IP address for node seoul.itso.ibm.com is 10.168.101.43
Number of disks in cluster = 2
for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2 cluster_major = 0 cluster_minor = 2
for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb cluster_major = 0 cluster_minor = 1
Multicast address for cluster is 228.168.101.43
5.1.4 Custom configuration of the cluster topology
For the custom configuration path example, we use the test environment from 4.1, “Hardware
configuration of the test environment” on page 54.
As a preliminary step, add the base IP aliases in /etc/cluster/rhosts file on each node and
refresh the CAA clcomd daemon. Example 5-3 illustrates this step on the node sydney.
Example 5-3 Populating the /etc/cluster/rhosts file
sydney:/ # cat /etc/cluster/rhosts
sydney
perth
sydneyb2
perthb2
sydney:/ # stopsrc -s clcomd;startsrc -s clcomd
0513-044 The clcomd Subsystem was requested to stop.
0513-059 The clcomd Subsystem has been started. Subsystem PID is 4980906.
Performing a custom configuration
To perform a custom configuration, follow these steps:
1. Access the Initial Cluster Setup (Custom) panel (Figure 5-12) by following the path smitty
sysmirror → Custom Cluster Configuration → Cluster Nodes and Networks → Initial
Cluster Setup (Custom). This task shows how to use each option on this menu.
Initial Cluster Setup (Custom)
Move cursor to desired item and press Enter.
Cluster
Nodes
Networks
Network Interfaces
Define Repository Disk and Cluster IP Address
Figure 5-12 Initial Cluster Setup (Custom) panel for a custom configuration
2. Define the cluster:
a. From the Initial Cluster Setup (Custom) panel (Figure 5-12), follow the path Cluster →
Add/Change/Show a Cluster.
b. In the Add/Change/Show a Cluster panel (Figure 5-13), define the cluster name,
australia.
Add/Change/Show a Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Cluster Name                                          [australia]
Figure 5-13 Adding a cluster
3. Add the nodes:
a. From the Initial Cluster Setup (Custom) panel, select the path Nodes → Add a Node.
b. In the Add a Node panel (Figure 5-14), specify the first node, sydney, and the path that
is taken to initiate communication with the node. The cluster Node Name might be
different from the host name of the node.
c. Add the second node, perth, in the same way as you did for the sydney node.
Add a Node

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Node Name                                             [sydney]
  Communication Path to Node                            [sydney]          +
Figure 5-14 Add a Node panel
4. Add a network:
a. From the Initial Cluster Setup (Custom) panel, follow the path Networks → Add a
Network.
b. In the Add a Network panel (Figure 5-15), for Network Type, select ether.
c. Define a PowerHA logical network, ether01, and specify its netmask. This logical
network is later populated with the corresponding base and service IP labels. You can
define more networks if needed.
Add a Network

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Network Name                                          [ether01]
* Network Type                                          ether
* Netmask(IPv4)/Prefix Length(IPv6)                     [255.255.252.0]
Figure 5-15 Add a Network panel
5. Add the network interfaces:
a. From the Initial Cluster Setup (Custom) panel, follow the path Network Interfaces →
Add a Network Interface.
b. Select the logical network and populate it with the appropriate interfaces. In the
example shown in Figure 5-16, we select the only defined ether01 network, and add
the interface sydneyb2 on the sydney node. Add in all the other interfaces in the same
way.
Tip: You might find it useful to remember the following points:
The sydneyb1 and perthb1 addresses are defined in the same subnet network.
The sydneyb2 and perthb2 addresses are defined in another subnet network.
All interfaces must have the same network mask.
Add a Network Interface

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                      [sydneyb2]        +
* Network Type                                          ether
* Network Name                                          ether01
* Node Name                                             [sydney]          +
  Network Interface                                     []
Figure 5-16 Add a Network Interface panel
6. Define the repository disk and cluster IP address:
a. From the Initial Cluster Setup (Custom) panel, select the Define Repository Disk and
Cluster IP Address option.
b. Choose the physical disk that is used as a central repository of the cluster configuration
and specify the multicast IP address to be associated with this cluster. In the example
shown in Figure 5-17, we let the cluster automatically generate a default value for the
multicast IP address.
Define Repository and Cluster IP Address

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Cluster Name                                          australia
* Repository Disk                                       [hdisk1]          +
  Cluster IP Address                                    []
Figure 5-17 Define Repository Disk and Cluster IP Address panel
Verifying and synchronizing the custom configuration
With the cluster topology defined, you can verify and synchronize the cluster for the first time.
When the first Verify and Synchronize Cluster Configuration action is successful, the
underlying CAA cluster is activated, and the heartbeat messages begin. We use the
customizable version of the Verify and Synchronize Cluster Configuration command.
Figure 5-18 shows an example where the Automatically correct errors found during
verification? option changed from the default value of No to Yes.
PowerHA SystemMirror Verification and Synchronization

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Verify, Synchronize or Both                           [Both]            +
* Include custom verification library checks            [Yes]             +
* Automatically correct errors found during             [Yes]             +
  verification?
* Force synchronization if verification fails?          [No]              +
* Verify changes only?                                  [No]              +
* Logging                                               [Standard]        +
Figure 5-18 Verifying and synchronizing the cluster configuration (advanced)
Upon successful synchronization, check the PowerHA topology and the CAA cluster
configuration by using cltopinfo and lscluster -c commands on any node. Example 5-4
shows usage of the PowerHA cltopinfo command. It also shows how the topology
configured on the node sydney looks on the node perth after synchronization.
Example 5-4 PowerHA cluster topology
perth:/ # cltopinfo
Cluster Name: australia
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE perth:
Network ether01
perthb2 192.168.201.136
perth 192.168.101.136
NODE sydney:
Network ether01
sydneyb2 192.168.201.135
sydney 192.168.101.135
No resource groups defined
Example 5-5 shows a summary configuration of the CAA cluster created during the
synchronization phase.
Example 5-5 CAA cluster summary configuration
perth:/ # lscluster -c
Cluster query for cluster australia returns:
Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes in cluster = 2
Cluster id for node perth is 1
Primary IP address for node perth is 192.168.101.136
Cluster id for node sydney is 2
Primary IP address for node sydney is 192.168.101.135
Number of disks in cluster = 0
Multicast address for cluster is 228.168.101.135
For more details about the CAA cluster status, see the following section.
Initial CAA cluster status
Check the status of the CAA cluster by using the lscluster command. As shown in Example 5-6,
the lscluster -m command lists the node and point-of-contact status information. A
point-of-contact status indicates that a node has received communication packets across this
interface from another node.
Example 5-6 CAA cluster node status
sydney:/ # lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: perth
Cluster shorthand id for node: 1
uuid for node: 15bef17c-cbcf-11df-951c-00145e5e3182
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME       TYPE    SHID    UUID
australia          local           98f28ffa-cfde-11df-9a82-00145ec5bf9a
Number of points_of_contact for node: 3
Point-of-contact interface & contact state
sfwcom UP
en2 UP
en1 UP
-----------------------------Node name: sydney
Cluster shorthand id for node: 2
uuid for node: f6a81944-cbce-11df-87b6-00145ec5bf9a
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME       TYPE    SHID    UUID
australia          local           98f28ffa-cfde-11df-9a82-00145ec5bf9a
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
sydney:/ #
Example 5-7 shows detailed interface information provided by the lscluster -i command. It
shows information about the network interfaces and the other two logical interfaces that are
used for cluster communication:
   sfwcom    The node connection to the SAN-based communication channel.
   dpcom     The node connection to the repository disk.
Example 5-7 CAA cluster interface status
sydney:/ # lscluster -i
Network/Storage Interface Query
Cluster Name: australia
Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5
Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5
Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask
255.255.252.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
5.1.5 Configuring resources and applications
This section continues to build up the cluster by configuring its resources, resource groups,
and application controllers. The goal is to prepare the setup that is needed to introduce the
new Start After and Stop After resource group dependencies in PowerHA 7.1. For a
configuration example for these dependencies, see 5.1.6, “Configuring Start After and Stop
After resource group dependencies” on page 96.
Adding storage resources and resource groups from C-SPOC
To add storage resources and resource groups from C-SPOC, follow these steps:
1. Use the smitty cl_lvm fast path or follow the path smitty sysmirror → System
Management (C-SPOC) → Storage to configure storage resources.
2. Create two volume groups, dbvg and appvg. In the Storage panel (Figure 5-19), select the
path Volume Groups → Create a Volume Group (smitty cl_createvg fast path).
Storage
Move cursor to desired item and press Enter.
Volume Groups
Logical Volumes
File Systems
Physical Volumes
Figure 5-19 C-SPOC storage panel
The Volume Groups option is the preferred method for creating a volume group because the
volume group is automatically configured on all of the selected nodes. Since the release of PowerHA 6.1,
most operations on volume groups, logical volumes, and file systems no longer require
these objects to be in a resource group. Smart menus check for configuration and state
problems and prevent invalid operations before they can be initiated.
3. In the Volume Groups panel, in the Node Names dialog (Figure 5-20), select the nodes for
configuring the volume groups.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups
  Create a Volume Group
  Create a Volume Group with Data Path Devices
  Set Characteristics of a Volume Group
  Enable a Volume Group for Fast Disk Takeover or Concurrent Access

  +--------------------------------------------------------------------------+
  |                                Node Names                                |
  |                                                                          |
  | Move cursor to desired item and press Esc+7.                             |
  |      ONE OR MORE items can be selected.                                  |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   > perth                                                                |
  |   > sydney                                                               |
  |                                                                          |
  | F1=Help         F2=Refresh       F3=Cancel                               |
  | Esc+7=Select    Esc+8=Image      Esc+0=Exit                              |
  | Enter=Do        /=Find           n=Find Next                             |
  +--------------------------------------------------------------------------+
Figure 5-20 Nodes selection
In the Volume Groups panel (Figure 5-21), only the physical shared disks that are
accessible on the selected nodes are displayed (Physical Volume Names menu).
4. In the Physical Volume Names menu (inset in Figure 5-21), select the volume group type.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups
  Create a Volume Group
  Create a Volume Group with Data Path Devices
  Set Characteristics of a Volume Group
  Enable a Volume Group for Fast Disk Takeover or Concurrent Access

  +--------------------------------------------------------------------------+
  |                           Physical Volume Names                          |
  |                                                                          |
  | Move cursor to desired item and press Esc+7.                             |
  |      ONE OR MORE items can be selected.                                  |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   00c1f170674f3d6b ( hdisk1 on all selected nodes )                      |
  |   00c1f1706751bc0d ( hdisk2 on all selected nodes )                      |
  |                                                                          |
  | F1=Help         F2=Refresh       F3=Cancel                               |
  | Esc+7=Select    Esc+8=Image      Esc+0=Exit                              |
  | Enter=Do        /=Find           n=Find Next                             |
  +--------------------------------------------------------------------------+
Figure 5-21 Shared disk selection
PVID: This step automatically creates physical volume IDs (PVIDs) for the unused (no
PVID) shared disks. A shared disk might have different names on selected nodes, but
the PVID is the same.
5. In the Create a Volume Group panel (Figure 5-22), specify the volume group name and
the resource group name.
Use the Resource Group Name field to include the volume group into an existing resource
group or automatically create a resource group to hold this volume group. After the
resource group is created, synchronize the configuration for this change to take effect
across the cluster.
Create a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Node Names                                            perth,sydney
  Resource Group Name                                    [dbrg]              +
  PVID                                                   00c1f170674f3d6b
  VOLUME GROUP name                                      [dbvg]
  Physical partition SIZE in megabytes                   4                   +
  Volume group MAJOR NUMBER                              [37]                #
  Enable Cross-Site LVM Mirroring Verification           false               +
  Enable Fast Disk Takeover or Concurrent Access         Fast Disk Takeover  +
  Volume Group Type                                      Original            +
  CRITICAL volume group?                                 no
Figure 5-22 Creating a volume group in C-SPOC
6. Alternatively, leave the Resource Group Name field empty and create or associate the resource group later.
When a volume group is known on multiple nodes, it is displayed in pick lists as <Not in a
Resource Group>. Figure 5-23 shows an example of a pick list.
Logical Volumes

Move cursor to desired item and press Enter.

  List All Logical Volumes by Volume Group
  Add a Logical Volume
  Show Characteristics of a Logical Volume
  Set Characteristics of a Logical Volume
  Change a Logical Volume
  Remove a Logical Volume

  +--------------------------------------------------------------------------+
  |       Select the Volume Group that will hold the new Logical Volume      |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   #Volume Group      Resource Group               Node List              |
  |    appvg             <Not in a Resource Group>    perth,sydney           |
  |    caavg_private     <Not in a Resource Group>    perth,sydney           |
  |    dbvg              dbrg                         perth,sydney           |
  |                                                                          |
  | F1=Help        F2=Refresh     F3=Cancel                                  |
  | Esc+8=Image    Esc+0=Exit     Enter=Do                                   |
  | /=Find         n=Find Next                                               |
  +--------------------------------------------------------------------------+
Figure 5-23 Adding a logical volume in C-SPOC
7. In the C-SPOC Storage panel (Figure 5-19 on page 86), define the logical volumes and
file systems by selecting the Logical Volumes and File Systems options. The
intermediate and final panels for these actions are similar to those panels in previous
releases.
You can list the file systems that you created by following the path C-SPOC Storage →
File Systems → List All File Systems by Volume Group. The COMMAND STATUS
panel (Figure 5-24) shows the list of file systems for this example.
COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

#File System           Volume Group     Resource Group    Node List
/appmp                 appvg            <None>            sydney,perth
/clrepos_private1      caavg_private    <None>            sydney,perth
/clrepos_private2      caavg_private    <None>            sydney,perth
/dbmp                  dbvg             dbrg              sydney,perth
Figure 5-24 Listing of file systems in C-SPOC
Resources and resource groups
By following the path smitty sysmirror → Cluster Applications and Resources, you see
the Cluster Applications and Resources menu (Figure 5-25) for resources and resource group
management.
Cluster Applications and Resources
Move cursor to desired item and press Enter.
Make Applications Highly Available (Use Smart Assists)
Resources
Resource Groups
Verify and Synchronize Cluster Configuration
Figure 5-25 Cluster Applications and Resources menu
Smart Assists: The “Make Applications Highly Available (Use Smart Assists)” function
leads to a menu of all installed Smart Assists. If you do not see the Smart Assist that you
need, verify that the corresponding Smart Assist file set is installed.
Configuring application controllers
To configure the application controllers, follow these steps:
1. From the Cluster Applications and Resources menu, select Resources.
2. In the Resources menu (Figure 5-26), select the Configure User Applications (Scripts
and Monitors) option to configure the application scripts.
Alternatively, use the smitty cm_user_apps fast path or smitty sysmirror → Cluster
Applications and Resources → Resources → Configure User Applications (Scripts
and Monitors).
Resources
Move cursor to desired item and press Enter.
Configure User Applications (Scripts and Monitors)
Configure Service IP Labels/Addresses
Configure Tape Resources
Verify and Synchronize Cluster Configuration
Figure 5-26 Resources menu
3. In the Configure User Applications (Scripts and Monitors) panel (Figure 5-27), select the
Application Controller Scripts option.
Configure User Applications (Scripts and Monitors)
Move cursor to desired item and press Enter.
Application Controller Scripts
Application Monitors
Configure Application for Dynamic LPAR and CoD Resources
Show Cluster Applications
Figure 5-27 Configure user applications (scripts and monitors)
4. In the Application Controller Scripts panel (Figure 5-28), select the Add Application
Controller Scripts option.
Application Controller Scripts
Move cursor to desired item and press Enter.
Add Application Controller Scripts
Change/Show Application Controller Scripts
Remove Application Controller Scripts
What is an "Application Controller" anyway ?
Figure 5-28 Application controller scripts
5. In the Add Application Controller Scripts panel (Figure 5-29), which looks similar to the
panels in previous versions, follow these steps:
a. In the Application Controller Name field, type the name that you want to use as a label for
your application. In this example, we use the name dbac.
b. As in previous versions, in the Start Script field, provide the location of your application
start script.
c. In the Stop Script field, specify the location of your stop script. In this example, we
specify /HA71/db_start.sh as the start script and /HA71/db_stop.sh as the stop script.
d. Optional: To monitor your application, in the Application Monitor Name(s) field, select
one or more application monitors. However, you must define the application monitors
before you can use them here. For an example, see “Configuring application
monitoring for the target resource group” on page 98.
Add Application Controller Scripts

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Application Controller Name                           [dbac]
* Start Script                                          [/HA71/db_start.sh]
* Stop Script                                           [/HA71/db_stop.sh]
  Application Monitor Name(s)                                                +
Figure 5-29 Adding application controller scripts
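The start and stop scripts themselves are supplied by the administrator; PowerHA only calls
them and expects a return code of 0 on success. The following lines are a minimal sketch of
what /HA71/db_start.sh and /HA71/db_stop.sh might contain, assuming a fictional database
instance controlled by dbstart and dbstop commands (those commands and the dbadmin user
are placeholders, not part of the test environment):

   #!/bin/ksh
   # /HA71/db_start.sh (illustrative only)
   su - dbadmin -c "/opt/db/bin/dbstart"     # placeholder start command
   exit 0

   #!/bin/ksh
   # /HA71/db_stop.sh (illustrative only)
   su - dbadmin -c "/opt/db/bin/dbstop"      # placeholder stop command
   exit 0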
The configuration of the applications is completed. The next step is to configure the service IP
addresses.
Configuring IP service addresses
To configure the IP service addresses, follow these steps:
1. Return to the Resource panel (Figure 5-26 on page 91) by using the
smitty cm_resources_menu fast path or smitty sysmirror → Cluster Applications and
Resources → Resources.
2. In the Resource panel, select the Configure Service IP Labels/Addresses option.
3. In the Configure Service IP Labels/Addresses menu (Figure 5-30), select the Add a
Service IP Label/Address option.
Configure Service IP Labels/Addresses
Move cursor to desired item and press Enter.
Add a Service IP Label/Address
Change/Show a Service IP Label/Address
Remove Service IP Label(s)/Address(es)
Configure Service IP Label/Address Distribution Preferences
Figure 5-30 Configure Service IP Labels/Addresses menu
4. In the Network Name subpanel (Figure 5-31), select the network to which you want to add
the Service IP Address. In this example, only one network is defined.
Configure Service IP Labels/Addresses

Move cursor to desired item and press Enter.

  Add a Service IP Label/Address
  Change/Show a Service IP Label/Address
  Remove Service IP Label(s)/Address(es)
  Configure Service IP Label/Address Distribution Preferences

  +--------------------------------------------------------------------------+
  |                               Network Name                               |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   ether01 (192.168.100.0/22 192.168.200.0/22)                            |
  |                                                                          |
  | F1=Help       F2=Refresh     F3=Cancel                                   |
  | F8=Image      F10=Exit       Enter=Do                                    |
  | /=Find        n=Find Next                                                |
  +--------------------------------------------------------------------------+
Figure 5-31 Network Name subpanel for the Add a Service IP Label/Address option
5. In the Add a Service IP Label/Address panel, which changes as shown in Figure 5-32, in
the IP Label/Address field, select the service address that you want to add.
Service address defined: As in previous versions, the service address must be
defined in the /etc/hosts file. Otherwise, you cannot select it by using the F4 key.
You can use the Netmask(IPv4)/Prefix Length(IPv6) field to define the netmask. With IPv4,
you can leave this field empty. The Network Name field is prefilled.
Add a Service IP Label/Address

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                      sydneys           +
  Netmask(IPv4)/Prefix Length(IPv6)                     []
* Network Name                                          ether01
Figure 5-32 Details of the Add a Service IP Label/Address panel
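Because the service label must be resolvable, make sure that an entry similar to the
following exists in /etc/hosts on all nodes before you open the pick list (the address shown
is illustrative only; use the service address planned for your own cluster):

   10.10.10.50   sydneys      # service IP label (example address)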
You have now finished configuring the resources. In this example, you defined one service IP
address. If you need to add more service IP addresses, repeat the steps as indicated in this
section.
As explained in the following section, the next step is to configure the resource groups.
Configuring resource groups
To configure the resource groups, follow these steps:
1. Go to the Cluster Applications and Resources panel (Figure 5-25 on page 91).
Alternatively, use the smitty cm_apps_resources fast path or smitty sysmirror → Cluster
Applications and Resources.
2. In the Cluster Applications and Resources panel, select Resource Groups.
3. In the Resource Groups menu (Figure 5-33), add a resource group by selecting the Add a
Resource Group option.
Resource Groups
Move cursor to desired item and press Enter.
Add a Resource Group
Change/Show Nodes and Policies for a Resource Group
Change/Show Resources and Attributes for a Resource Group
Remove a Resource Group
Configure Resource Group Run-Time Policies
Show All Resources by Node or Resource Group
Verify and Synchronize Cluster Configuration
What is a "Resource Group" anyway ?
Figure 5-33 Resource Groups menu
4. In the Add a Resource Group panel (Figure 5-34), as in previous versions of PowerHA,
specify the resource group name, the participating nodes, and the policies.
Add a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                   [dbrg]
* Participating Nodes (Default Node Priority)           [sydney perth]          +

  Startup Policy                                        Online On Home Node O>  +
  Fallover Policy                                       Fallover To Next Prio>  +
  Fallback Policy                                       Fallback To Higher Pr>  +
Figure 5-34 Add a Resource Group panel
5. Configure the resources into the resource group. If you need more than one resource
group, repeat the previous step to add a resource group.
a. To configure the resources to the resource group, go back to the Resource Groups
panel (Figure 5-33 on page 95), and select the Change/Show Resources and
Attributes for a Resource Group.
b. In the Change/Show Resources and Attributes for a Resource Group panel
(Figure 5-35), define the resources for the resource group.
Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                                   dbrg
  Participating Nodes (Default Node Priority)           sydney perth

  Startup Policy                                        Online On Home Node O>
  Fallover Policy                                       Fallover To Next Prio>
  Fallback Policy                                       Fallback To Higher Pr>
  Fallback Timer Policy (empty is immediate)            []                   +

  Service IP Labels/Addresses                           [sydneys]            +
  Application Controllers                               [dbac]               +

  Volume Groups                                         [dbvg]               +
  Use forced varyon of volume groups, if necessary      false                +
[MORE...24]
Figure 5-35 Change/Show All Resources and Attributes for a Resource Group panel
You have now finished configuring the resource group.
Next, you synchronize the cluster nodes. If the Verify and Synchronize Cluster Configuration
task is successfully completed, you can start your cluster. However, you might first want to
see if the CAA cluster was successfully created by using the lscluster -c command.
5.1.6 Configuring Start After and Stop After resource group dependencies
In this section, you configure a Start After resource group dependency and similarly create a
Stop After resource group dependency. For more information about Start After and Stop After
resource group dependencies, see 2.5.1, “Start After and Stop After resource group
dependencies” on page 32.
You can manage Start After dependencies between resource groups by following the path
smitty sysmirror → Cluster Applications and Resources → Resource Groups →
Configure Resource Group Run-Time Policies → Configure Dependencies between
Resource Groups → Configure Start After Resource Group Dependency.
shows the Configure Start After Resource Group Dependency menu.
Configure Start After Resource Group Dependency
Move cursor to desired item and press Enter.
Add Start After Resource Group Dependency
Change/Show Start After Resource Group Dependency
Remove Start After Resource Group Dependency
Display Start After Resource Group Dependencies
Figure 5-36 Configuring Start After Resource Group dependency menu
To add a new dependency, in the Configure Start After Resource Group Dependency menu,
select the Add Start After Resource Group Dependency option. In this example, we
already configured the dbrg and apprg resource groups. The apprg resource group is defined
as the source (dependent) resource group as shown in Figure 5-37.
Configure Start After Resource Group Dependency

Move cursor to desired item and press Enter.

  Add Start After Resource Group Dependency
  Change/Show Start After Resource Group Dependency
  Remove Start After Resource Group Dependency
  Display Start After Resource Group Dependencies

  +--------------------------------------------------------------------------+
  |                     Select the Source Resource Group                     |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   apprg                                                                  |
  |   dbrg                                                                   |
  |                                                                          |
  | F1=Help        F2=Refresh     F3=Cancel                                  |
  | Esc+8=Image    Esc+0=Exit     Enter=Do                                   |
  | /=Find         n=Find Next                                               |
  +--------------------------------------------------------------------------+
Figure 5-37 Selecting the source resource group of a Start After dependency
Figure 5-38 shows dbrg resource group defined as the target resource group.
Configure Start After Resource Group Dependency

Move cursor to desired item and press Enter.

  Add Start After Resource Group Dependency
  Change/Show Start After Resource Group Dependency
  Remove Start After Resource Group Dependency
  Display Start After Resource Group Dependencies

  +--------------------------------------------------------------------------+
  |                     Select the Target Resource Group                     |
  |                                                                          |
  | Move cursor to desired item and press Esc+7.                             |
  |      ONE OR MORE items can be selected.                                  |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   dbrg                                                                   |
  |                                                                          |
  | F1=Help         F2=Refresh       F3=Cancel                               |
  | Esc+7=Select    Esc+8=Image      Esc+0=Exit                              |
  | Enter=Do        /=Find           n=Find Next                             |
  +--------------------------------------------------------------------------+
Figure 5-38 Selecting the target resource group of a Start After dependency
Example 5-8 shows the result.
Example 5-8 Start After dependency configured
sydney:/ # clrgdependency -t'START_AFTER' -sl
#Source    Target
apprg      dbrg
sydney:/ #
Configuring application monitoring for the target resource group
The Start After dependency only guarantees that the source resource group is started after
the target resource group has been started. You might need the application in your source
resource group (its startup script) to start only after a full and successful start of the
application in the target resource group (that is, after the target startup script returns 0).
In this case, you must configure startup monitoring for your target application. The dummy
scripts in Example 5-9 show the configuration of the test cluster.
Example 5-9 Dummy scripts for target and source applications
sydney:/HA71 # ls -l
total 48
-rwxr--r--    1 root     system          226 Oct 12 07:00 app_mon.sh
-rwxr--r--    1 root     system          283 Oct 12 07:06 app_start.sh
-rwxr--r--    1 root     system          233 Oct 12 07:03 app_stop.sh
-rwxr--r--    1 root     system          201 Oct 12 06:03 db_mon.sh
-rwxr--r--    1 root     system          274 Oct 12 07:24 db_start.sh
-rwxr--r--    1 root     system          229 Oct 12 06:04 db_stop.sh
The remainder of this task continues from the configuration started in “Configuring application
controllers” on page 91. You only have to add a monitor for the dbac application controller that
you already configured.
Follow the path smitty sysmirror → Cluster Applications and Resources →
Resources → Configure User Applications (Scripts and Monitors) → Add Custom
Application Monitor. The Add Custom Application Monitor panel (Figure 5-39) is displayed.
We do not explain the fields here because they are the same as the fields in previous
versions. However, keep in mind that the Monitor Mode value Both means both startup
monitoring and long-running monitoring.
Add Custom Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Monitor Name                                          [dbam]
  Application Controller(s) to Monitor                  dbac                 +
* Monitor Mode                                          [Both]               +
* Monitor Method                                        [/HA71/db_mon.sh]
* Monitor Interval                                      [30]                 #
  Hung Monitor Signal                                   []                   #
* Stabilization Interval                                [120]                #
* Restart Count                                         [3]                  #
  Restart Interval                                      []                   #
* Action on Application Failure                         [fallover]           +
  Notify Method                                         []
  Cleanup Method                                        [/HA71/db_stop.sh]
  Restart Method                                        [/HA71/db_start.sh]
Figure 5-39 Adding the dbam custom application monitor
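The monitor method is again a user-supplied script: PowerHA treats a return code of 0 as
application healthy and any nonzero return code as a failure. A minimal sketch of what
/HA71/db_mon.sh might look like, assuming the database runs as a process named db_engine
(the process name is a placeholder):

   #!/bin/ksh
   # /HA71/db_mon.sh (illustrative only)
   if ps -ef | grep -v grep | grep -q db_engine ; then
       exit 0      # application is up
   else
       exit 1      # application is down; PowerHA reacts per the monitor settings
   fi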
Similarly, you can configure an application monitor and an application controller for the apprg
resource group as shown in Figure 5-40.
Change/Show Custom Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Monitor Name                                          appam
  Application Controller(s) to Monitor                  appac                   +
* Monitor Mode                                          [Long-running monitori> +
* Monitor Method                                        [/HA71/app_mon.sh]
  Monitor Interval                                      [30]                    #
  Hung Monitor Signal                                   [9]                     #
* Stabilization Interval                                [15]                    #
  Restart Count                                         [3]                     #
  Restart Interval                                      [594]                   #
* Action on Application Failure                         [fallover]              +
  Notify Method                                         []
  Cleanup Method                                        [/HA71/app_stop.sh]
  Restart Method                                        [/HA71/app_start.sh]
Figure 5-40 Configuring the appam application monitor and appac application controller
For a series of tests performed on this configuration, see 9.8, “Testing a Start After resource
group dependency” on page 297.
5.1.7 Creating a user-defined resource type
Now create a user-defined resource type by using SMIT:
1. To define a user-defined resource type, follow the path smitty sysmirror → Custom
Cluster Configuration → Resources → Configure User Defined Resources and
Types → Add a User Defined Resource Type.
Resource type management: PowerHA SystemMirror automatically manages most
resource types.
2. In the Add a User Defined Resource Type panel (Figure 5-41), define a resource type.
Also select the processing order from the pick list.
Add a User Defined Resource Type

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Type Name                                    [my_resource_type]
* Processing order                                      []                  +
  Verification Method                                   []
  Verification Type                                     [Script]            +
  Start Method                                          []
  Stop Method                                           []

  +--------------------------------------------------------------------------+
  |                             Processing order                             |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   FIRST                                                                  |
  |   WPAR                                                                   |
  |   VOLUME_GROUP                                                           |
  |   FILE_SYSTEM                                                            |
  |   SERVICEIP                                                              |
  |   TAPE                                                                   |
  |   APPLICATION                                                            |
  |                                                                          |
  | F1=Help        F2=Refresh     F3=Cancel                                  |
  | Esc+8=Image    Esc+0=Exit     Enter=Do                                   |
  | /=Find         n=Find Next                                               |
  +--------------------------------------------------------------------------+
Figure 5-41 Adding a user-defined resource type
3. After you create your own resource type and resources, add them to a resource group;
they appear in the pick lists for the resource group. This information is stored in the
HACMPresourcetype, HACMPudres_def, and HACMPudresource cluster configuration files.
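The start, stop, and verification methods of a user-defined resource type are ordinary
executables that you supply. A minimal sketch, assuming a hypothetical resource that is
considered started when a flag file exists (paths and logic are illustrative only):

   #!/bin/ksh
   # start method: bring the resource up and record its state
   touch /var/run/my_resource.up
   exit 0

   #!/bin/ksh
   # stop method: bring the resource down and clear its state
   rm -f /var/run/my_resource.up
   exit 0

   #!/bin/ksh
   # verification method: return 0 only if the resource looks healthy
   [ -f /var/run/my_resource.up ] && exit 0
   exit 1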
5.1.8 Configuring the dynamic node priority (adaptive failover)
As mentioned in 2.5.3, “Dynamic node priority: Adaptive failover” on page 35, in PowerHA
7.1, you can decide node priority based on the return value of your own script. To configure
the dynamic node priority (DNP), follow these steps:
1. Follow the path smitty sysmirror → Cluster Applications and Resources → Resource
Groups → Change/Show Resources and Attributes for a Resource Group (if you
already have your resource group).
As you can see in Change/Show Resources and Attributes for a Resource Group panel
(Figure 5-42), the algeria_rg resource group has default node priority. The participating
nodes are algeria, brazil, and usa.
2. To configure DNP, choose the dynamic node priority policy. In this example, we chose
cl_lowest_nonzero_udscript_rc as the dynamic node priority policy. With this policy, the
node whose DNP script returns the lowest nonzero value gets the highest priority among the
nodes. Also define the DNP script path and timeout value.
DNP script for the nodes: Ensure that all nodes have the DNP script and that the
script has executable mode. Otherwise, you receive an error message while running the
synchronization or verification process.
For a description of this test scenario, see 9.9, “Testing dynamic node priority” on
page 302.
        Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                             [Entry Fields]
  Resource Group Name                             algeria_rg
  Participating Nodes (Default Node Priority)     algeria brazil usa
* Dynamic Node Priority Policy                    [cl_lowest_nonzero_uds>  +
  DNP Script path                                 [<HTTPServer/bin/DNP.sh] /
  DNP Script timeout value                        [20]                     #

  Startup Policy                                  Online On Home Node O>
  Fallover Policy                                 Fallover Using Dynami>
  Fallback Policy                                 Fallback To Higher Pr>
  Fallback Timer Policy (empty is immediate)      []                       +
[MORE...11]

F1=Help        F2=Refresh      F3=Cancel       F4=List
Esc+5=Reset    Esc+6=Command   Esc+7=Edit      Esc+8=Image
Esc+9=Shell    Esc+0=Exit      Enter=Do

Figure 5-42 Configuring DNP in a SMIT session
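The content of the DNP script is entirely up to you; PowerHA evaluates only its return code. The following ksh sketch is an illustration, not the script used in this scenario: it returns the run queue length reported by vmstat, so that the least loaded node receives the highest priority under the cl_lowest_nonzero_udscript_rc policy. The script path and the choice of metric are assumptions for this example.

#!/usr/bin/ksh
# Hypothetical DNP script: return the current run queue length as the exit code.
# With cl_lowest_nonzero_udscript_rc, the node returning the lowest nonzero
# value is given the highest priority.
runq=$(vmstat 1 2 | tail -1 | awk '{print $1}')
# Never return 0, because only nonzero return codes are compared.
[[ $runq -eq 0 ]] && runq=1
exit $runq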
5.1.9 Removing a cluster
You can remove your cluster by using the path smitty sysmirror → Cluster Nodes and
Networks → Manage the Cluster → Remove the Cluster Definition.
Removing a cluster consists of deleting the PowerHA definition and deleting the CAA cluster
from AIX. Removing the CAA cluster is the last step of the Remove operation as shown in
Figure 5-43.
                                COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

Attempting to delete node "Perth" from the cluster...
Attempting to delete the local node from the cluster...
Attempting to delete the cluster from AIX ...

F1=Help        F2=Refresh      F3=Cancel       Esc+6=Command
Esc+8=Image    Esc+9=Shell     Esc+0=Exit      /=Find
n=Find Next

Figure 5-43 Removing the cluster
Normally, deleting the cluster with this method removes both the PowerHA SystemMirror and
the CAA cluster definitions from the system. If a problem is encountered while PowerHA is
trying to remove the CAA cluster, you might need to delete the CAA cluster manually. For
more information, see Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.
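Chapter 10 describes the full procedure. As a rough sketch of the kind of manual check and cleanup that is involved (the cluster name is a placeholder that you must replace, and you should confirm the procedure in Chapter 10 before running it on a production system):

# Check whether a CAA cluster definition is still present on this node
lscluster -m
# Remove the CAA cluster definition manually
rmcluster -n <cluster_name>
# Verify that the repository volume group has been released
lspv | grep caavg_private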
After you remove the cluster, ensure that the caavg_private volume group is no longer
displayed as shown in Figure 5-44.
--- before ---
# lspv
caa_private0    000fe40120e16405    caavg_private    active
hdisk6          000fe4114cf8d258    dbvg
hdisk7          000fe4114cf8d2ec    applvg
hdisk0          000fe411201305c3    rootvg           active

--- after ---
# lspv
hdisk5          000fe40120e16405    None
hdisk6          000fe4114cf8d258    dbvg
hdisk7          000fe4114cf8d2ec    applvg
hdisk0          000fe411201305c3    rootvg           active

Figure 5-44 The lspv command output before and after removing a cluster
5.2 Cluster configuration using the clmgr tool
PowerHA 7.1 introduces the clmgr command-line tool. The tool is not entirely new: it is based
on the clvt tool and adds the following improvements:
Consistent usage across the supported functions
Improved usability
Improved serviceability
Use of a fully globalized message catalog
Multiple levels of debugging
Automatic help
To see the possible values for the attributes, use the man clvt command.
5.2.1 The clmgr action commands
The following actions are currently supported in the clmgr command:
add
delete
manage
modify
move
offline
online
query
recover
sync
view
For a list of the actions, you can run the clmgr command with no arguments. See "The clmgr
command" on page 106 and Example 5-10 on page 106.
Most of the actions in the list provide aliases. Table 5-1 shows the current actions and their
abbreviations and aliases.
Table 5-1 Command aliases

Actual      Synonyms or aliases
add         a, create, make, mk
query       q, ls, get
modify      mod, ch, set
delete      de, rem, rm, er
online      on, start
offline     off, stop
move        mov, mv
recover     rec
sync        sy
verify      ve
view        vi, cat
manage      mg
5.2.2 The clmgr object classes
The following object classes are currently supported:
application_controller
application_monitor
cluster
dependency
fallback_timer
file_collection
file_system (incomplete coverage)
interface
logical_volume (incomplete coverage)
method (incomplete coverage)
network
node
persistent_ip
physical_volume (incomplete coverage)
report
resource_group
service_ip
snapshot
site
tape
volume_group (incomplete coverage)
For a list, you can run the clmgr command with only the query action. See "The clmgr query
command" on page 107 and Example 5-11 on page 107.
Most of these object classes in the list provide aliases. Table 5-2 on page 105 lists the current
object classes and their abbreviations and aliases.
Table 5-2 Object classes with aliases

Actual                     Minimum string
cluster                    cl
site                       si
node                       no
interface                  in, if
network                    ne, nw
resource_group             rg
service_ip                 se
persistent_ip              pe, pi
application_controller     ac, app, appctl
5.2.3 Examples of using the clmgr command
This section provides information about some of the clmgr commands. An advantage of the
clmgr command compared to the clvt command is that it is not case-sensitive. For more
details about the clmgr command, see Appendix D, “The clmgr man page” on page 501.
For a list of the actions that are currently supported, see 5.2.1, "The clmgr action commands"
on page 104, or run the clmgr command with no arguments. See "The clmgr command" on
page 106 and Example 5-10 on page 106.
For a list of the object classes that are currently supported, see 5.2.2, "The clmgr object
classes" on page 105, or run the clmgr query command. See "The clmgr query command" on
page 107 and Example 5-11 on page 107.
For most of these actions and object classes, abbreviations and aliases are available, and
they are not case-sensitive. You can find more details about the actions and their aliases in
"The clmgr action commands" on page 104. For more information about the object classes,
see "The clmgr object classes" on page 105.
Error messages: At the time of writing, the clmgr error messages referred to clvt. This
issue will be fixed in a future release so that they reference clmgr.
The clmgr command
Running the clmgr command with no arguments or with the -h option shows the operations
that you can perform. Example 5-10 shows the output that you see just by using the clmgr
command. You see similar output if you use the -h option. The difference between the clmgr
and clmgr -h commands is that, in the output of the clmgr -h command, the line with the
error message is missing. For more details about the -h option, see “Using help in clmgr” on
page 111.
Example 5-10 Output of the clmgr command with no arguments
# clmgr
ERROR: an invalid operation was requested:
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>] \
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n> ...]
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>] \
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] -M "
<ACTION> <CLASS> [<NAME>] [<ATTR#1>=<VALUE#1> <ATTR#n>=<VALUE#n> ...]
.
.
."
ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}
clmgr {-h|-?} [-v]
clmgr [-v] help
# Available actions for clvt:
add
delete
help
manage
modify
move
offline
online
query
recover
sync
verify
view
#
The clmgr query command
Running the clmgr command with only the query argument generates a list of the supported
object classes as shown in Example 5-11. You see similar output if you use the -h option. The
difference between the clmgr query and clmgr query -h commands is that, in the output of
the clmgr query -h command, the lines with the object class names are indented. For more
details about the -h option, see “Using help in clmgr” on page 111.
Example 5-11 Output of the clmgr query command
# clmgr query
# Available classes for clvt action "query":
application_controller
application_monitor
cluster
dependency
fallback_timer
file_collection
file_system
interface
log
logical_volume
method
network
node
persistent_ip
physical_volume
resource_group
service_ip
site
smart_assist
snapshot
tape
volume_group
#
The clmgr query cluster command
You use the clmgr query cluster command to obtain detailed information about your cluster.
Example 5-12 shows the output from the cluster used in the test environment.
Example 5-12 Output of the clmgr query cluster command
# clmgr query cluster
CLUSTER_NAME="hacmp29_cluster"
CLUSTER_ID="1126895238"
STATE="STABLE"
VERSION="7.1.0.1"
VERSION_NUMBER="12"
EDITION="STANDARD"
CLUSTER_IP=""
REPOSITORY="caa_private0"
SHARED_DISKS=""
UNSYNCED_CHANGES="false"
SECURITY="Standard"
FC_SYNC_INTERVAL="10"
RG_SETTLING_TIME="0"
RG_DIST_POLICY="node"
MAX_EVENT_TIME="180"
MAX_RG_PROCESSING_TIME="180"
SITE_POLICY_FAILURE_ACTION="fallover"
SITE_POLICY_NOTIFY_METHOD=""
DAILY_VERIFICATION="Enabled"
VERIFICATION_NODE="Default"
VERIFICATION_HOUR="0"
VERIFICATION_DEBUGGING="Enabled"
LEVEL=""
ALGORITHM=""
GRACE_PERIOD=""
REFRESH=""
MECHANISM=""
CERTIFICATE=""
PRIVATE_KEY=""
#
As mentioned previously, most clmgr actions and object classes provide aliases. Another
helpful feature of the clmgr command is the ability to understand abbreviated commands. For
example, the previous command can be shortened as follows:
# clmgr q cl
For more details about the capability of the clmgr command, see 5.2.1, “The clmgr action
commands” on page 104, and 5.2.2, “The clmgr object classes” on page 105. See also the
man pages listed in Appendix D, “The clmgr man page” on page 501.
The enhanced search capability
An additional feature of the clmgr command is that it provides an easy search capability with
the query action. Example 5-13 shows how to list all the defined resource groups.
Example 5-13 List of all defined resource groups
# clmgr query rg
rg1
rg2
rg3
rg4
rg5
rg6
#
You can also use more complex search expressions. Example 5-14 shows how to use a
simple regular expression. In addition, you can search on more than one field; only the
objects that match all of the provided search criteria are displayed.
Example 5-14 Simple regular expression command
# clmgr query rg name=rg[123]
rg1
rg2
rg3
#
The -a option
Some query commands produce rather long output. You can use the -a (attributes) option
to obtain shorter output, for example, to get a single value as shown in Example 5-15 or
several values as shown in Example 5-16.
Example 5-15 List state of the cluster node
munich:/ # clmgr -a state query cluster
STATE="STABLE"
munich:/ #
Example 5-16 shows how to get information about the state and the location of a resource
group. The full output of the query command for the nfsrg resource group is shown in
Example 5-31 on page 123.
Example 5-16 List state and location of a resource group
munich:/ # clmgr -a STATE,Current query rg nfsrg
STATE="ONLINE"
CURRENT_NODE="berlin"
munich:/ #
You can also use wildcards for getting information about some values as shown in
Example 5-17.
Example 5-17 The -a option and wildcards
munich:/ # clmgr -a "*NODE*" query rg nfsrg
CURRENT_NODE="berlin"
NODES="berlin munich"
PRIMARYNODES=""
PRIMARYNODES_STATE=""
SECONDARYNODES=""
SECONDARYNODES_STATE=""
NODE_PRIORITY_POLICY="default"
munich:/ #
The -v option
The -v (verbose) option is helpful when used with the query action, as shown in
Example 5-18. IBM Systems Director uses this option almost exclusively to scan the
cluster for information.
Example 5-18 The -v option for query all resource groups
munich:/ # clmgr -a STATE,current -v query rg
STATE="ONLINE"
CURRENT_NODE="munich"

STATE="ONLINE"
CURRENT_NODE="berlin"
munich:/ #
If you run the same query without the -v option, you see an error message similar to the one
in Example 5-19.
Example 5-19 Error message when not using the -v option for query all resource groups
munich:/ # clmgr -a STATE,current query rg
ERROR: a name/label must be provided.
munich:/ #
Returning only one value
Sometimes you want a clmgr command to return only the value itself, typically when you call
the clmgr command from a script and the ATTR="VALUE" format is inconvenient. Example 5-20
shows how to return only the value.
The command has the following syntax:
clmgr -cSa <ATTR> query <CLASS> <OBJECT>
Example 5-20 The command to return a single value from the clmgr command
# clmgr -cSa state query rg rg1
ONLINE
#
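As a hedged illustration of how this single-value form is typically consumed, the following ksh fragment waits for a resource group to come online. The resource group name rg1, the polling interval, and the timeout are example values only:

#!/usr/bin/ksh
# Poll the state of resource group rg1 until it is ONLINE or a timeout expires
rg=rg1
tries=0
while [[ "$(clmgr -cSa state query resource_group $rg)" != "ONLINE" ]]
do
    (( tries += 1 ))
    if [[ $tries -gt 60 ]]; then
        echo "ERROR: $rg did not come online"
        exit 1
    fi
    sleep 10
done
echo "$rg is ONLINE"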
5.2.4 Using help in clmgr
You can use the -h option in combination with actions and object classes. For example, if you
want to know how to add a resource group to an existing cluster, you can use the clmgr add
resource_group -h command. Example 5-21 shows the output of using this command. For an
example of using the clmgr add resource_group command, see Example 5-28 on page 121.
Example 5-21 Help for adding resource group using the clmgr command
# clmgr add resource_group -h
# Available options for "clvt add resource_group":
<RESOURCE_GROUP_NAME>
NODES
PRIMARYNODES
SECONDARYNODES
FALLOVER
FALLBACK
STARTUP
FALLBACK_AT
SERVICE_LABEL
APPLICATIONS
VOLUME_GROUP
FORCED_VARYON
VG_AUTO_IMPORT
FILESYSTEM
FSCHECK_TOOL
RECOVERY_METHOD
FS_BEFORE_IPADDR
EXPORT_FILESYSTEM
EXPORT_FILESYSTEM_V4
MOUNT_FILESYSTEM
STABLE_STORAGE_PATH
WPAR_NAME
NFS_NETWORK
SHARED_TAPE_RESOURCES
DISK
AIX_FAST_CONNECT_SERVICES
COMMUNICATION_LINKS
WLM_PRIMARY
WLM_SECONDARY
MISC_DATA
CONCURRENT_VOLUME_GROUP
NODE_PRIORITY_POLICY
NODE_PRIORITY_POLICY_SCRIPT
NODE_PRIORITY_POLICY_TIMEOUT
SITE_POLICY
#
The items shown between angle brackets (<>) are required. However, this does not mean
that all other items are optional: some required items are not marked because they depend
on other settings. In Example 5-22 on page 112, only CLUSTER_NAME is listed as required, but
because of the new CAA dependency, the REPOSITORY (disk) is also required. For more
details about how to create a cluster using the clmgr command, see "Configuring a new
cluster using the clmgr command" on page 113.
Example 5-22 Help for creating a cluster
# clmgr add cluster -h
# Available options for "clvt add cluster":
<CLUSTER_NAME>
FC_SYNC_INTERVAL
NODES
REPOSITORY
SHARED_DISKS
CLUSTER_IP
RG_SETTLING_TIME
RG_DIST_POLICY
MAX_EVENT_TIME
MAX_RG_PROCESSING_TIME
SITE_POLICY_FAILURE_ACTION
SITE_POLICY_NOTIFY_METHOD
DAILY_VERIFICATION
VERIFICATION_NODE
VERIFICATION_HOUR
VERIFICATION_DEBUGGING
5.2.5 Configuring a PowerHA cluster using the clmgr command
In this section, you configure a two-node mutual takeover cluster, focusing on the PowerHA
configuration only. The system names are munich and berlin. This task does not include the
preliminary steps, such as setting up the IP interfaces and the shared disks.
For details and an example of the output, see "Starting the cluster using the clmgr command"
on page 127. All the steps in this section were executed on the munich system.
To configure a PowerHA cluster by using the clmgr command, follow these steps:
1. Configure the cluster:
# clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
For details, see “Configuring a new cluster using the clmgr command” on page 113.
2. Configure the service IP addresses:
# clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
# clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
For details, see “Defining the service address using the clmgr command” on page 118.
3. Configure the application server:
# clmgr add application_controller http_app \
> STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
> STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
For details, see “Defining the application server using the clmgr command” on page 120.
4. Configure the resource groups:
# clmgr add resource_group httprg VOLUME_GROUP=httpvg NODES=munich,berlin \
> SERVICE_LABEL=alleman APPLICATIONS=http_app
# clmgr add resource_group nfsrg VOLUME_GROUP=nfsvg NODES=berlin,munich \
> SERVICE_LABEL=german FALLBACK=NFB RECOVERY_METHOD=parallel \
> FS_BEFORE_IPADDR=true EXPORT_FILESYSTEM="/nfsdir" \
> MOUNT_FILESYSTEM="/sap;/nfsdir"
For details, see “Defining the resource group using the clmgr command” on page 120.
5. Sync the cluster:
clmgr sync cluster
For details, see “Synchronizing the cluster definitions by using the clmgr command” on
page 124.
6. Start the cluster:
clmgr online cluster start_cluster BROADCAST=false CLINFO=true
For details, see “Starting the cluster using the clmgr command” on page 127.
Command and syntax of clmgr: Unlike the SMIT interface, which guides you with pick
lists and defaults, the clmgr CLI requires you to supply the correct command and syntax
when configuring or managing the PowerHA cluster.
Configuring a new cluster using the clmgr command
Creating a cluster using the clmgr command is similar to using the typical configuration
through SMIT (described in 5.1.3, “Typical configuration of a cluster topology” on page 69). If
you want a method that is similar to the custom configuration in SMIT, you must use a
combination of the classical PowerHA commands and the clmgr command. The steps in the
following sections use the clmgr command only.
Preliminary setup
Prerequisite: This section assumes that you already know how to set up the prerequisites
for a PowerHA cluster.
The IP interfaces are already defined and the shared volume groups and file systems have
been created. The host names of the two systems are munich and berlin. Figure 5-45 shows
the disks and shared volume groups that are defined so far. hdisk4 is used as the CAA
repository disk.
munich:/ # lspv
hdisk1          00c0f6a012446137    httpvg
hdisk2          00c0f6a01245190c    httpvg
hdisk3          00c0f6a012673312    nfsvg
hdisk4          00c0f6a01c784107    None
hdisk0          00c0f6a07c5df729    rootvg    active
munich:/ #

Figure 5-45 List of available disks
Figure 5-46 shows the network interfaces that are defined on the munich system.
munich:/ # netstat -i
Name  Mtu    Network                  Address             Ipkts  Ierrs   Opkts  Oerrs  Coll
en0   1500   link#2                   a2.4e.58.a0.41.3    23992      0   24516      0     0
en0   1500   192.168.100              munich              23992      0   24516      0     0
en1   1500   link#3                   a2.4e.58.a0.41.4        2      0       7      0     0
en1   1500   100.168.200              munichb1                2      0       7      0     0
en2   1500   link#4                   a2.4e.58.a0.41.5     4324      0       7      0     0
en2   1500   100.168.220              munichb2             4324      0       7      0     0
lo0   16896  link#1                                       16039      0   16039      0     0
lo0   16896  127                      localhost.locald    16039      0   16039      0     0
lo0   16896  localhost6.localdomain6                      16039      0   16039      0     0
munich:/ #

Figure 5-46 Defined network interfaces
Creating the cluster
To begin, define the cluster along with the repository disk. If you do not remember all the
options for creating a cluster with the clmgr command, use the clmgr add cluster -h
command. Example 5-22 on page 112 shows the output of this command.
Before you use the clmgr add cluster command, you must know which disk will be used for
the CAA repository disk. Example 5-23 shows the command and its output.
Table 5-3 provides more details about the command and arguments that are used.
Table 5-3 Creating a cluster using the clmgr command

Action, object class, or argument   Value used       Comment
add                                                  Basic preferred action
cluster                                              Basic object class used
CLUSTER_NAME                        de_cluster       Optional argument name, but required value
NODES                               munich,berlin    The nodes to use in the cluster
REPOSITORY                          hdisk4           The disk for the CAA repository
Example 5-23 Creating a cluster using the clmgr command
munich:/ # clmgr add cluster de_cluster NODES=munich,berlin REPOSITORY=hdisk4
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: None
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
    Network net_ether_01
        berlinb2    100.168.220.141
        berlinb1    100.168.200.141
    Network net_ether_010
        berlin      192.168.101.141
NODE munich:
    Network net_ether_01
        munichb2    100.168.220.142
        munichb1    100.168.200.142
    Network net_ether_010
        munich      192.168.101.142
No resource groups defined
clharvest_vg: Initializing....
Gathering cluster information, which may take a few minutes...
clharvest_vg: Processing...
Storing the following information in file
/usr/es/sbin/cluster/etc/config/clvg_config
berlin:
  Hdisk: hdisk1  PVID: 00c0f6a012446137  VGname: httpvg  VGmajor: 100  Conc-capable: Yes  VGactive: No   Quorum-required: Yes
  Hdisk: hdisk2  PVID: 00c0f6a01245190c  VGname: httpvg  VGmajor: 100  Conc-capable: Yes  VGactive: No   Quorum-required: Yes
munich:
  Hdisk: hdisk1  PVID: 00c0f6a012446137  VGname: httpvg  VGmajor: 100  Conc-capable: Yes  VGactive: No   Quorum-required: Yes
berlin:
  Hdisk: hdisk3  PVID: 00c0f6a012673312  VGname: nfsvg   VGmajor: 200  Conc-capable: Yes  VGactive: No   Quorum-required: Yes
munich:
  Hdisk: hdisk2  PVID: 00c0f6a01245190c  VGname: httpvg  VGmajor: 100  Conc-capable: Yes  VGactive: No   Quorum-required: Yes
berlin:
  Hdisk: hdisk4  PVID: 00c0f6a01c784107  VGname: None    VGmajor: 0    Conc-capable: No   VGactive: No   Quorum-required: No
munich:
  Hdisk: hdisk3  PVID: 00c0f6a012673312  VGname: nfsvg   VGmajor: 200  Conc-capable: Yes  VGactive: No   Quorum-required: Yes
berlin:
  Hdisk: hdisk0  PVID: 00c0f6a048cf8bfd  VGname: rootvg  VGmajor: 10   Conc-capable: No   VGactive: Yes  Quorum-required: Yes
  FREEMAJORS: 35..99,101..199,201...
munich:
  Hdisk: hdisk4  PVID: 00c0f6a01c784107  VGname: None    VGmajor: 0    Conc-capable: No   VGactive: No   Quorum-required: No
  Hdisk: hdisk0  PVID: 00c0f6a07c5df729  VGname: rootvg  VGmajor: 10   Conc-capable: No   VGactive: Yes  Quorum-required: Yes
  FREEMAJORS: 35..99,101..199,201...
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
    Network net_ether_01
        berlinb2    100.168.220.141
        berlinb1    100.168.200.141
    Network net_ether_010
        berlin      192.168.101.141
NODE munich:
    Network net_ether_01
        munichb1    100.168.200.142
        munichb2    100.168.220.142
    Network net_ether_010
        munich      192.168.101.142
No resource groups defined
Warning: There is no cluster found.
cllsclstr: No cluster defined
cllsclstr: Error reading configuration
Communication path berlin discovered a new node. Hostname is berlin. Adding it to
the configuration with Nodename berlin.
Communication path munich discovered a new node. Hostname is munich. Adding it to
the configuration with Nodename munich.
Discovering IP Network Connectivity
Discovered [10] interfaces
IP Network Discovery completed normally
Current cluster configuration:
Discovering Volume Group Configuration
Current cluster configuration:
munich:/ #
To see the configuration up to this point, you can use the cltopinfo command. Keep in mind
that this information is local to the system on which you are working. Example 5-24 shows the
configuration up to this point.
Example 5-24 Output of the cltopinfo command after creating cluster definitions
munich:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
    Network net_ether_01
        berlinb2    100.168.220.141
        berlinb1    100.168.200.141
    Network net_ether_010
        berlin      192.168.101.141
NODE munich:
    Network net_ether_01
        munichb1    100.168.200.142
        munichb2    100.168.220.142
    Network net_ether_010
        munich      192.168.101.142
No resource groups defined
munich:/ #
Defining the service address using the clmgr command
Next you define the service addresses. Example 5-25 on page 119 shows the command and
its output.
The clmgr add cluster command: The clmgr add cluster command automatically runs
IP and volume group discovery (harvesting). As a result, the IP network interfaces are
added to the cluster configuration automatically.
Table 5-4 provides more details about the command and arguments that are used.
Table 5-4 Defining the service address using the clmgr command

Action, object class, or argument   Value used        Comment
add                                                   Basic preferred action
service_ip                                            Basic object class used
SERVICE_IP_NAME                     alleman, german   Optional argument name, but required value
NETWORK                             net_ether_01      The network name from the cltopinfo command used previously
NETMASK                             255.255.255.0     Optional; when you specify a value, use the same one that you used in setting up the interface
Example 5-25 Defining the service address
munich:/ # clmgr add service_ip alleman NETWORK=net_ether_01 NETMASK=255.255.255.0
munich:/ # clmgr add service_ip german NETWORK=net_ether_01 NETMASK=255.255.255.0
munich:/ #
To check the configuration up to this point, use the cltopinfo command again. Example 5-26
shows the current configuration.
Example 5-26 The cltopinfo output after creating cluster definitions
munich:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk4
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
    Network net_ether_01
        german      10.168.101.141
        alleman     10.168.101.142
        berlinb1    100.168.200.141
        berlinb2    100.168.220.141
    Network net_ether_010
        berlin      192.168.101.141
NODE munich:
    Network net_ether_01
        german      10.168.101.141
        alleman     10.168.101.142
        munichb1    100.168.200.142
        munichb2    100.168.220.142
    Network net_ether_010
        munich      192.168.101.142
No resource groups defined
munich:/ #
Defining the application server using the clmgr command
Next you define the application server. Example 5-27 shows the command and its output. The
application server is named http_app.
Table 5-5 provides more details about the command and arguments that are used.
Table 5-5 Defining the application server using the clmgr command

Action, object class, or argument   Value used                                      Comment
add                                                                                 Basic preferred action
application_controller                                                              Basic object class used
APPLICATION_SERVER_NAME             http_app                                        Optional argument name, but required value
STARTSCRIPT                         "/usr/IBM/HTTPServer/bin/apachectl -k start"    The start script used for the application
STOPSCRIPT                          "/usr/IBM/HTTPServer/bin/apachectl -k stop"     The stop script used for the application
Example 5-27 Defining the application server
munich:/ # clmgr add application_controller http_app \
> STARTSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k start" \
> STOPSCRIPT="/usr/IBM/HTTPServer/bin/apachectl -k stop"
munich:/ #
Defining the resource group using the clmgr command
Next you define the resource groups. Example 5-28 on page 121 shows the commands and
their output.
Compared to the SMIT panels, with the clmgr command you create a resource group and its
resources in one step. Therefore, make sure that you have already defined all the service IP
addresses and application servers.
Two resource groups are created. For the first one (httprg), only the required items are
specified, so that the system uses the default values for the remaining arguments. Table 5-6
provides more details about the command and arguments that are used.
Table 5-6 Defining the resource groups using the clmgr (httprg) command

Action, object class, or argument   Value used       Comment
add                                                  Basic preferred action.
resource_group                                       Basic object class used.
RESOURCE_GROUP_NAME                 httprg           Optional argument name, but required value.
VOLUME_GROUP                        httpvg           The volume group used for this resource group.
NODES                               munich,berlin    The sequence of the nodes is important. The first node is the primary node.
SERVICE_LABEL                       alleman          The service address used for this resource group.
APPLICATIONS                        http_app         The application server label created in a previous step.
For the second resource group (nfsrg), we specified more details because we did not want to
use the default values. Table 5-7 provides more details about the command and arguments
that we used.
Table 5-7 Defining the resource groups using the clmgr (nfsrg) command

Action, object class, or argument   Value used       Comment
add                                                  Basic preferred action.
resource_group                                       Basic object class used.
RESOURCE_GROUP_NAME                 nfsrg            Optional argument name, but required value.
VOLUME_GROUP                        nfsvg            The volume group used for this resource group.
NODES                               berlin,munich    The sequence of the nodes is important. The first node is the primary node.
SERVICE_LABEL                       german           The service address used for this resource group.
FALLBACK                            NFB              Never Fall Back (NFB) is preferred for this resource group (the default is FBHPN).
RECOVERY_METHOD                     parallel         Parallel is preferred as the recovery method for this resource group (the default is sequential).
FS_BEFORE_IPADDR                    true             Because we want to define an NFS cross mount, we must use the value true here (the default is false).
EXPORT_FILESYSTEM                   /nfsdir          The file system for NFS to export.
MOUNT_FILESYSTEM                    "/sap;/nfsdir"   Uses the same syntax that is used in SMIT to define the NFS cross mount.
Example 5-28 shows the commands that are used to define the resource groups listed in
Table 5-6 on page 120 and Table 5-7.
Example 5-28 Defining the resource groups
munich:/ # clmgr add resource_group httprg VOLUME_GROUP=httpvg \
> NODES=munich,berlin SERVICE_LABEL=alleman APPLICATIONS=http_app
Auto Discover/Import of Volume Groups was set to true.
Gathering cluster information, which may take a few minutes.
munich:/ #
munich:/ # clmgr add resource_group nfsrg VOLUME_GROUP=nfsvg \
> NODES=berlin,munich SERVICE_LABEL=german FALLBACK=NFB \
> RECOVERY_METHOD=parallel FS_BEFORE_IPADDR=true EXPORT_FILESYSTEM="/nfsdir" \
> MOUNT_FILESYSTEM="/sap;/nfsdir"
Auto Discover/Import of Volume Groups was set to true.
Gathering cluster information, which may take a few minutes.
munich:/ #
To see the configuration up to this point, use the clmgr query command. Example 5-29
shows how to check which resource groups you defined.
Example 5-29 Listing the defined resource groups using the clmgr command
munich:/ # clmgr query rg
httprg
nfsrg
munich:/ #
Next, you can see the content that you created for the resource groups. Example 5-30 shows
the content of the httprg. As discussed previously, the default values for this resource group
were used as much as possible.
Example 5-30 Contents listing of httprg
munich:/ # clmgr query rg httprg
NAME="httprg"
STATE="UNKNOWN"
CURRENT_NODE=""
NODES="munich berlin"
PRIMARYNODES=""
PRIMARYNODES_STATE="UNKNOWN"
SECONDARYNODES=""
SECONDARYNODES_STATE="UNKNOWN"
TYPE=""
APPLICATIONS="http_app"
STARTUP="OHN"
FALLOVER="FNPN"
FALLBACK="FBHPN"
NODE_PRIORITY_POLICY="default"
SITE_POLICY="ignore"
DISK=""
VOLUME_GROUP="httpvg"
CONCURRENT_VOLUME_GROUP=""
FORCED_VARYON="false"
FILESYSTEM=""
FSCHECK_TOOL="fsck"
RECOVERY_METHOD="sequential"
EXPORT_FILESYSTEM=""
SHARED_TAPE_RESOURCES=""
AIX_CONNECTIONS_SERVICES=""
AIX_FAST_CONNECT_SERVICES=""
COMMUNICATION_LINKS=""
MOUNT_FILESYSTEM=""
SERVICE_LABEL="alleman"
MISC_DATA=""
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
INACTIVE_TAKEOVER="false"
CASCADE_WO_FALLBACK="false"
FS_BEFORE_IPADDR="false"
NFS_NETWORK=""
MOUNT_ALL_FS="true"
WLM_PRIMARY=""
WLM_SECONDARY=""
FALLBACK_AT=""
RELATIONSHIP=""
SRELATIONSHIP="ignore"
GMD_REP_RESOURCE=""
PPRC_REP_RESOURCE=""
ERCMF_REP_RESOURCE=""
SRDF_REP_RESOURCE=""
TRUCOPY_REP_RESOURCE=""
SVCPPRC_REP_RESOURCE=""
GMVG_REP_RESOURCE=""
EXPORT_FILESYSTEM_V4=""
STABLE_STORAGE_PATH=""
WPAR_NAME=""
VARYON_WITH_MISSING_UPDATES="true"
DATA_DIVERGENCE_RECOVERY="ignore"
munich:/ #
Now you can see the content that was created for the resource groups. Example 5-31 shows
the content of the nfsrg resource group.
Example 5-31 List the content of nfsrg resource group
munich:/ # clmgr query rg nfsrg
NAME="nfsrg"
STATE="UNKNOWN"
CURRENT_NODE=""
NODES="berlin munich"
PRIMARYNODES=""
PRIMARYNODES_STATE="UNKNOWN"
SECONDARYNODES=""
SECONDARYNODES_STATE="UNKNOWN"
TYPE=""
APPLICATIONS=""
STARTUP="OHN"
FALLOVER="FNPN"
FALLBACK="NFB"
NODE_PRIORITY_POLICY="default"
SITE_POLICY="ignore"
DISK=""
VOLUME_GROUP="nfsvg"
CONCURRENT_VOLUME_GROUP=""
FORCED_VARYON="false"
FILESYSTEM=""
FSCHECK_TOOL="fsck"
RECOVERY_METHOD="parallel"
EXPORT_FILESYSTEM="/nfsdir"
SHARED_TAPE_RESOURCES=""
AIX_CONNECTIONS_SERVICES=""
AIX_FAST_CONNECT_SERVICES=""
COMMUNICATION_LINKS=""
MOUNT_FILESYSTEM="/sap;/nfsdir"
SERVICE_LABEL="german"
MISC_DATA=""
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
INACTIVE_TAKEOVER="false"
CASCADE_WO_FALLBACK="false"
FS_BEFORE_IPADDR="true"
NFS_NETWORK=""
MOUNT_ALL_FS="true"
WLM_PRIMARY=""
WLM_SECONDARY=""
FALLBACK_AT=""
RELATIONSHIP=""
SRELATIONSHIP="ignore"
GMD_REP_RESOURCE=""
PPRC_REP_RESOURCE=""
ERCMF_REP_RESOURCE=""
SRDF_REP_RESOURCE=""
TRUCOPY_REP_RESOURCE=""
SVCPPRC_REP_RESOURCE=""
GMVG_REP_RESOURCE=""
EXPORT_FILESYSTEM_V4=""
STABLE_STORAGE_PATH=""
WPAR_NAME=""
VARYON_WITH_MISSING_UPDATES="true"
DATA_DIVERGENCE_RECOVERY="ignore"
munich:/ #
Synchronizing the cluster definitions by using the clmgr command
After you create all topology and resource information, synchronize the cluster.
Verifying and propagating the changes: After using the clmgr command to modify the
cluster configuration, enter the clmgr verify cluster and clmgr sync cluster commands
to verify and propagate the changes to all nodes.
Example 5-32 shows usage of the clmgr sync cluster command to synchronize the cluster
and the command output.
Example 5-32 Synchronizing the cluster using the clmgr sync cluster command
munich:/ # clmgr sync cluster
Verification to be performed on the following:
Cluster Topology
Cluster Resources
Retrieving data from available cluster nodes.
This could take a few minutes.
Start data collection on node berlin
Start data collection on node munich
Waiting on node berlin data collection, 15 seconds elapsed
Waiting on node munich data collection, 15 seconds elapsed
Collector on node berlin completed
Collector on node munich completed
Data collection complete
Verifying Cluster Topology...
Completed 10 percent of the verification checks
berlin
munich
net_ether_010
net_ether_010
Completed 20 percent of the verification checks
Completed 30 percent of the verification checks
Verifying Cluster Resources...
Completed 40 percent of the verification checks
http_app
httprg
Completed 50 percent of the verification checks
Completed 60 percent of the verification checks
Completed 70 percent of the verification checks
Completed 80 percent of the verification checks
Completed 90 percent of the verification checks
Completed 100 percent of the verification checks
Remember to redo automatic error notification if configuration has changed.
Verification has completed normally.
Committing any changes, as required, to all available nodes...
Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and
/etc/rc.net for IP Address
Takeover on node munich.
Adding any necessary PowerHA SystemMirror for AIX entries to /etc/inittab and
/etc/rc.net for IP Address
Takeover on node berlin.
Verification has completed normally.
WARNING: Multiple communication interfaces are recommended for networks that
use IP aliasing in order to prevent the communication interface from
becoming a single point of failure. There are fewer than the recommended
number of communication interfaces defined on the following node(s) for
the given network(s):

Node:                              Network:
--------------------------------   --------------------------------

WARNING: Not all cluster nodes have the same set of HACMP filesets installed.
The following is a list of fileset(s) missing, and the node where the
fileset is missing:

Fileset:                           Node:
--------------------------------   --------------------------------
WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on
node: berlin. Clverify can automatically populate this file to be used on a client
node, if executed in
auto-corrective mode.
WARNING: There are IP labels known to HACMP and not listed in file
/usr/es/sbin/cluster/etc/clhosts.client on
node: munich. Clverify can automatically populate this file to be used on a client
node, if executed in
auto-corrective mode.
WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on
during HACMP startup on the
following nodes:
berlin
munich
WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on
during HACMP startup on the
following nodes:
berlin
munich
WARNING: Application monitors are required for detecting application failures
in order for HACMP to recover from them. Application monitors are started
by HACMP when the resource group in which they participate is activated.
The following application(s), shown with their associated resource group,
do not have an application monitor configured:

Application Server                 Resource Group
--------------------------------   --------------------------------

WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are
not fully enabled on this node.
Grace periods must be enabled before NFSv4 stable storage can be used.
HACMP will attempt to fix this opportunistically when acquiring NFS resources on
this node however the change won't take effect until the next time that nfsd is
started.
If this warning persists, the administrator should perform the following steps to
enable grace periods on munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd
munich:/ #
When the synchronization finishes successfully, the CAA repository disk is defined. Figure 5-47
shows the disks before the cluster synchronization, which are the same as those shown in
Figure 5-45 on page 113.
munich:/ # lspv
hdisk1          00c0f6a012446137    httpvg
hdisk2          00c0f6a01245190c    httpvg
hdisk3          00c0f6a012673312    nfsvg
hdisk4          00c0f6a01c784107    None
hdisk0          00c0f6a07c5df729    rootvg    active
munich:/ #

Figure 5-47 List of available disks before sync
Figure 5-48 shows the output of the lspv command after the synchronization. In our example,
hdisk4 is now converted into a CAA repository disk and is listed as caa_private0.
munich:/ # lspv
hdisk1          00c0f6a012446137    httpvg
hdisk2          00c0f6a01245190c    httpvg
hdisk3          00c0f6a012673312    nfsvg
caa_private0    00c0f6a01c784107    caavg_private    active
hdisk0          00c0f6a07c5df729    rootvg           active
munich:/ #

Figure 5-48 List of available disks after using the cluster sync command
Starting the cluster using the clmgr command
To determine whether the cluster is configured correctly, test the cluster. To begin, start the
cluster nodes.
Example 5-33 shows the command that we used and some of its output. To start the clinfo
daemon, we used the CLINFO=true argument. Because we did not want a broadcast message,
we also specified the BROADCAST=false argument.
Example 5-33 Starting the cluster by using the clmgr command
munich:/ # clmgr online cluster start_cluster BROADCAST=false CLINFO=true
Warning: "WHEN" must be specified. Since it was not,
a default of "now" will be used.
Warning: "MANAGE" must be specified. Since it was not,
a default of "auto" will be used.
/usr/es/sbin/cluster/diag/cl_ver_alias_topology[42] [[ high = high ]]
---
skipped lines
---
/usr/es/sbin/cluster/diag/cl_ver_alias_topology[335] return 0
WARNING: Multiple communication interfaces are recommended for networks that
use IP aliasing in order to prevent the communication interface from
becoming a single point of failure. There are fewer than the recommended
number of communication interfaces defined on the following node(s) for
the given network(s):
Node:                              Network:
--------------------------------   --------------------------------
berlin                             net_ether_010
munich                             net_ether_010
WARNING: Network option "nonlocsrcroute" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:
munich
WARNING: Network option "ipsrcrouterecv" is set to 0 and will be set to 1 on during HACMP
startup on the following nodes:
munich
WARNING: Application monitors are required for detecting application failures
in order for HACMP to recover from them. Application monitors are started
by HACMP when the resource group in which they participate is activated.
The following application(s), shown with their associated resource group,
do not have an application monitor configured:
Application Server                 Resource Group
--------------------------------   --------------------------------
http_app                           httprg
/usr/es/sbin/cluster/diag/clwpardata[23] [[ high == high ]]
---
skipped lines
---
/usr/es/sbin/cluster/diag/clwpardata[325] exit 0
WARNING: Node munich has cluster.es.nfs.rte installed however grace periods are not fully
enabled on this node. Grace periods must be enabled before NFSv4 stable storage can be used.
HACMP will attempt to fix this opportunistically when acquiring NFS resources on this node
however the change won't take effect until the next time that nfsd is started.
If this warning persists, the administrator should perform the following steps to enable grace
periods on munich at the next planned downtime:
1. stopsrc -s nfsd
2. smitty nfsgrcperiod
3. startsrc -s nfsd
berlin: start_cluster: Starting PowerHA SystemMirror
berlin: 2359456      - 0:09 syslogd
berlin: Setting routerevalidate to 1
berlin: 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is 10682520.
berlin: 0513-059 The clinfoES Subsystem has been started. Subsystem PID is 10027062.
munich: start_cluster: Starting PowerHA SystemMirror
munich: 3408044      - 0:07 syslogd
munich: Setting routerevalidate to 1
munich: 0513-059 The clevmgrdES Subsystem has been started. Subsystem PID is 5505122.
munich: 0513-059 The clinfoES Subsystem has been started. Subsystem PID is 6029442.

The cluster is now online.
munich:/ #
Starting all nodes in a cluster: The clmgr online cluster start_cluster command
starts all nodes in a cluster by default.
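Instead of clstat, you can also make a quick scripted check of the overall cluster state with the clmgr command, as shown earlier in Example 5-15; the state changes to STABLE when startup has completed:

# Check the overall cluster state after startup
clmgr -a state query cluster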
Figure 5-49 shows that all nodes are now up and running.
                clstat - HACMP Cluster Status Monitor
                -------------------------------------
Cluster: de_cluster (1126819374)                  Wed Oct 13 17:27:30 EDT 2010
State: UP                 Nodes: 2
SubState: STABLE

Node: berlin              State: UP
   Interface: berlinb1 (0)         Address: 100.168.200.141
                                   State:   UP
   Interface: berlinb2 (0)         Address: 100.168.220.141
                                   State:   UP
   Interface: berlin (1)           Address: 192.168.101.141
                                   State:   UP
   Interface: german (0)           Address: 10.168.101.141
                                   State:   UP
   Resource Group: nfsrg           State:   On line

Node: munich              State: UP
   Interface: munichb1 (0)         Address: 100.168.200.142
                                   State:   UP
   Interface: munichb2 (0)         Address: 100.168.220.142
                                   State:   UP
   Interface: munich (1)           Address: 192.168.101.142
                                   State:   UP
   Interface: alleman (0)          Address: 10.168.101.142
                                   State:   UP
   Resource Group: httprg          State:   On line

************************ f/forward, b/back, r/refresh, q/quit *****************

Figure 5-49 Output of the clstat -a command showing that all nodes are running
5.2.6 Alternative output formats for the clmgr command
All of the previous examples use the ATTR="VALUE" format. However, two other formats are
supported. One format is colon-delimited (by using -c). The other format is simple XML (by
using -x).
Colon-delimited format
When using the colon-delimited output format (-c), you can use the -S option to silence or
eliminate the header line.
Example 5-34 shows the colon-delimited output format.
Example 5-34 The colon-delimited output format
# clmgr query ac appctl1
NAME="appctl1"
MONITORS=""
STARTSCRIPT="/bin/hostname"
STOPSCRIPT="/bin/hostname"
# clmgr -c query ac appctl1
# NAME:MONITORS:STARTSCRIPT:STOPSCRIPT
appctl1::/bin/hostname:/bin/hostname
# clmgr -cS query ac appctl1
appctl1::/bin/hostname:/bin/hostname
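The colon-delimited format is convenient for scripting. The following sketch is an illustration only; the field positions follow the header line that clmgr -c prints for this object class, so always check the header on your own system before relying on them:

# Extract only the start script of application controller appctl1
# (fields per the header: NAME:MONITORS:STARTSCRIPT:STOPSCRIPT)
startscript=$(clmgr -cS query ac appctl1 | awk -F: '{print $3}')
echo "Start script: $startscript"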
Simple XML format
Example 5-35 shows the simple XML-based output format.
Example 5-35 Simple XML-based output format
# clmgr -x query ac appctl1
<APPLICATION_CONTROLLERS>
<APPLICATION_CONTROLLER>
<NAME>appctl1</NAME>
<MONITORS></MONITORS>
<STARTSCRIPT>/bin/hostname</STARTSCRIPT>
<STOPSCRIPT>/bin/hostname</STOPSCRIPT>
</APPLICATION_CONTROLLER>
</APPLICATION_CONTROLLERS>
5.2.7 Log file of the clmgr command
The traditional PowerHA practice of setting VERBOSE_LOGGING to produce debug output is
supported with the clmgr command. You can also set VERBOSE_LOGGING on a per-run
basis with the clmgr -l command. The -l flag has the following options:
low     Typically of interest to support personnel; shows simple function entry and exit.
med     Typically of interest to support personnel; shows the same information as the low
        option, but includes input parameters and return codes.
high    The recommended setting for customer use; turns on set -x in scripts (equivalent
        to VERBOSE_LOGGING=high) but leaves out internal utility functions.
max     Turns on everything that the high option does and omits nothing. It is likely to make
        debugging more difficult because of the volume of output that is produced.
Attention: The max value might have a negative impact on performance.
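For example, to run a single query with the high debug level and capture the output in a file of your choice (the target file name here is only an example):

# Run one clmgr command with per-run debugging set to high
clmgr -l high query cluster > /tmp/clmgr_debug.out 2>&1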
The main log file for clmgr debugging is the /var/hacmp/log/clutils.log file. This log file
includes all standard error and output from each command.
The return codes used by the clmgr command are standard for all commands:
RC_UNKNOWN=-1
A result is not known. It is useful as an initializer.
RC_SUCCESS=0
No errors were detected; the operation seems to have
been successful.
RC_ERROR=1
A general error has occurred.
RC_NOT_FOUND=2
A specified resource does not exist or could not be
found.
RC_MISSING_INPUT=3
Some required input was missing.
RC_INCORRECT_INPUT=4
Some detected input was incorrect.
RC_MISSING_DEPENDENCY=5
A required dependency does not exist.
RC_SEARCH_FAILED=6
A specified search failed to match any data.
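The following ksh fragment is a hedged sketch of how these return codes can be used in automation; the resource group name and the actions taken on each code are examples only:

#!/usr/bin/ksh
clmgr query resource_group rg1 > /dev/null 2>&1
rc=$?
case $rc in
    0) echo "rg1 exists" ;;
    2) echo "rg1 was not found (RC_NOT_FOUND)" ;;
    *) echo "clmgr returned $rc; check /var/hacmp/log/clutils.log" ;;
esac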
Example 5-36 lists the format of the trace information in the clutils.log file.
Example 5-36 The trace information in the clutils.log file
<SENTINEL>:<RETURN_CODE>:<FILE>:<FUNCTION>[<LINENO>](<ELAPSED_TIME>):
<TRANSACTION_ID>:<PID>:<PPID>: <SCRIPT_LINE>
The following line shows an example of how the clutils.log file might be displayed:
CLMGR:0:resource_common:SerializeAsAssociativeArray()[537](0.704):13327:9765002:9044114: unset 'array[AIX_LEVEL0]'
Example 5-37 shows some lines from the clutils.log file (not using trace).
Example 5-37 The clutils.log file
CLMGR STARTED (243:7667890:9437234): Wed Oct 6 23:51:22 CDT 2010
CLMGR USER    (243:7667890:9437234): ::root:system
CLMGR COMMAND (243:7012392:7667890): clmgr -T 243 modify cluster hacmp2728_cluster REPOSITORY=hdisk2
CLMGR ACTUAL  (243:7012392:7667890): modify_cluster properties hdisk2
CLMGR RETURN  (243:7012392:7667890): 0
CLMGR STDERR -- BEGIN (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010
Current cluster configuration:
CLMGR STDERR -- END   (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010
CLMGR ENDED   (243:7667890:9437234): Wed Oct 6 23:51:26 CDT 2010
CLMGR ELAPSED (243:7667890:9437234): 3.720
5.2.8 Displaying the log file content by using the clmgr command
You can use the clmgr action view command to view the log content.
Defining the number of lines returned
By using the TAIL argument, you can define the number of clmgr command-related lines that
are returned from the clutils.log file. Example 5-38 shows how you can specify 1000 lines
of clmgr log information.
Example 5-38 Using the TAIL argument when viewing the content of the clmgr log file
# clmgr view log clutils.log TAIL=1000 | wc -l
1000
#
Filtering special items by using the FILTER argument
You can use the FILTER argument to filter special items that you are looking for.
Example 5-39 shows how to list just the last 10 clmgr commands that were run.
Example 5-39 Listing the last 10 clmgr commands
# clmgr view log clutils.log TAIL=10 FILTER="CLMGR COMMAND"
CLMGR COMMAND (12198:13828308:15138846): clmgr -T 12198 add
application_controller appctl1 start=/bin/hostname stop=/bin/hostname
CLMGR COMMAND (2629:15138850:17891482): clmgr -T 2629 query
application_controller appctl1
CLMGR COMMAND (4446:19464210:17891482): clmgr -c -T 4446 query
application_controller appctl1
CLMGR COMMAND (23101:19464214:17891482): clmgr -c -S -T 23101 query
application_controller appctl1
CLMGR COMMAND (24919:17826012:17891482): clmgr -x -T 24919 query
application_controller appctl1
CLMGR COMMAND (464:14352476:15138926): clmgr -T 464 view log clutils.log
CLMGR COMMAND (18211:15728818:15138928): clmgr -T 18211 view log clutils.log
CLMGR COMMAND (10884:13828210:14156024): clmgr -T 10884 view log clutils.log
CLMGR COMMAND (28631:17629296:14156026): clmgr -T 28631 view log clutils.log
CLMGR COMMAND (19061:17825922:14156028): clmgr -T 19061 view log clutils.log
TAIL=1000
#
Example 5-40 shows how to list the last five clmgr query commands that were run.
Example 5-40 Listing the last five clmgr query commands
# clmgr view log clutils.log TAIL= FILTER="CLMGR COMMAND",query
CLMGR COMMAND (9047:17825980:17891482): clmgr -x -T 9047 query resource_group rg1
CLMGR COMMAND (2629:15138850:17891482): clmgr -T 2629 query
application_controller appctl1
CLMGR COMMAND (4446:19464210:17891482): clmgr -c -T 4446 query
application_controller appctl1
CLMGR COMMAND (23101:19464214:17891482): clmgr -c -S -T 23101 query
application_controller appctl1
CLMGR COMMAND (24919:17826012:17891482): clmgr -x -T 24919 query
application_controller appctl1
#
5.3 PowerHA SystemMirror for IBM Systems Director
The web browser graphical user interface makes it easy to complete configuration and
management tasks with mouse clicks. For example, you can easily create a cluster, verify
and synchronize a cluster, and add nodes to a cluster.
The IBM Systems Director client agent of PowerHA SystemMirror is installed on the cluster
nodes in the same manner as PowerHA SystemMirror itself, by using the installp command.
The Director server and the PowerHA server plug-in must be installed separately: you
download them from the external website and manually install them on a dedicated system.
This system does not have to be a PowerHA system.
To learn about installing the Systems Director and PowerHA components, and their use for
configuration and management tasks, see Chapter 12, “Creating and managing a cluster
using IBM Systems Director” on page 333.
Chapter 6. IBM PowerHA SystemMirror Smart Assist for DB2
PowerHA SystemMirror Smart Assist for DB2 is included in the base Standard Edition
software. It simplifies the task of making a non-DPF DB2 database highly available and
minimizes the time and effort involved. The Smart Assist automatically discovers DB2
instances and databases and creates start and stop scripts for the instances. The Smart
Assist also creates process and custom PowerHA application monitors that help to keep the
DB2 instances highly available.
This chapter explains how to configure a hot standby two-node IBM PowerHA SystemMirror
7.1 cluster by using Smart Assist for DB2. The lab cluster korea, with the participating nodes
seoul and busan, is used for the examples.
This chapter includes the following topics:
Prerequisites
Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1
6.1 Prerequisites
This section describes the prerequisites for the Smart Assist implementation.
6.1.1 Installing the required file sets
You must install two additional file sets, as shown in Example 6-1, before using Smart Assist
for DB2.
Example 6-1 Additional file sets required for installing Smart Assist
seoul:/ # clcmd lslpp -l cluster.es.assist.common cluster.es.assist.db2

-------------------------------
NODE seoul
-------------------------------
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  cluster.es.assist.common   7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist Common Files
  cluster.es.assist.db2      7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist for DB2

-------------------------------
NODE busan
-------------------------------
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  cluster.es.assist.common   7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist Common Files
  cluster.es.assist.db2      7.1.0.1  COMMITTED  PowerHA SystemMirror Smart
                                                 Assist for DB2
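If the file sets are not installed yet, they can be added from the PowerHA installation media. The following is a minimal sketch that assumes the installation images are available in /mnt/powerha; adjust the device or directory for your environment:

# Install the Smart Assist common and DB2 file sets from the installation images
installp -agXY -d /mnt/powerha cluster.es.assist.common cluster.es.assist.db2
# Confirm the installation on both nodes
clcmd lslpp -l cluster.es.assist.db2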
6.1.2 Installing DB2 on both nodes
The DB2 versions supported by the PowerHA Smart Assist are versions 8.1, 8.2, 9.1, and 9.5.
For the example in this chapter, DB2 9.5 was installed on both nodes, seoul and busan, as
shown in Example 6-2.
Example 6-2 DB2 version installed
seoul:/db2/db2pok # db2pd -v
Instance db2pok uses 64 bits and DB2 code release SQL09050
with level identifier 03010107
Informational tokens are DB2 v9.5.0.0, s071001, AIX6495, Fix Pack 0.
6.1.3 Importing the shared volume group and file systems
The storage must be accessible from both nodes with the logical volume structures created
and imported on both sides. If the volume groups are not imported on the secondary node,
Smart Assist for DB2 does it automatically as shown in Example 6-3.
Example 6-3 Volume groups imported in the nodes
seoul:/db2/db2pok # clcmd lspv

-------------------------------
NODE seoul
-------------------------------
hdisk0          00c0f6a088a155eb    rootvg           active
caa_private0    00c0f6a01077342f    caavg_private    active
cldisk2         00c0f6a0107734ea    pokvg
cldisk1         00c0f6a010773532    pokvg

-------------------------------
NODE busan
-------------------------------
hdisk0          00c0f6a089390270    rootvg           active
caa_private0    00c0f6a01077342f    caavg_private    active
cldisk2         00c0f6a0107734ea    pokvg
cldisk1         00c0f6a010773532    pokvg
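If you prefer to import the volume group on the secondary node yourself rather than letting Smart Assist do it, a typical sequence looks like the following sketch. The disk name and the major number are assumptions for this example; use the values that match your environment:

# On the secondary node (busan): import the shared volume group with the
# same major number that it uses on the primary node (assumed to be 100)
importvg -y pokvg -V 100 cldisk1
# Disable automatic varyon and leave the volume group offline;
# PowerHA activates it when the resource group is brought online
chvg -an pokvg
varyoffvg pokvg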
6.1.4 Creating the DB2 instance and database on the shared volume group
Before launching the PowerHA Smart Assist for DB2, you must have already created the DB2
instance and DB2 database over the volume groups that are shared by both nodes.
In Example 6-4, the home for the POK database was created in the /db2/POK/db2pok shared
file system of the volume group pokvg. The instance was created in the /db2/db2pok shared
file system, which is the home directory for user db2pok. The instance was created on the
primary node only, because the structures reside on a shared volume group.
Example 6-4 Displaying the logical volume groups of pokvg
seoul:/ # lsvg -l pokvg
pokvg:
LV NAME    TYPE      LPs  PPs  PVs  LV STATE     MOUNT POINT
loglv001   jfs2log     1    1    1  open/syncd   N/A
poklv001   jfs2       96   96    1  open/syncd   /db2/POK/db2pok
poklv002   jfs2      192  192    2  open/syncd   /db2/POK/sapdata1
poklv003   jfs2       32   32    1  open/syncd   /db2/POK/sapdatat1
poklv004   jfs2       48   48    1  open/syncd   /db2/POK/log_dir
poklv005   jfs2       64   64    1  open/syncd   /export/sapmnt/POK
poklv006   jfs2       64   64    1  open/syncd   /export/usr/sap/trans
poklv008   jfs2       32   32    1  open/syncd   /usr/sap/POK
poklv009   jfs2        4    4    1  open/syncd   /db2/POK/db2dump
poklv007   jfs2       32   32    1  open/syncd   /db2/db2pok
seoul:/ # clcmd grep db2pok /etc/passwd

-------------------------------
NODE seoul
-------------------------------
db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh

-------------------------------
NODE busan
-------------------------------
db2pok:!:203:101::/db2/db2pok:/usr/bin/ksh
seoul:/ # /opt/IBM/db2/V9.5/instance/db2icrt -a SERVER -s ese -u db2fenc1 -p
db2c_db2pok db2pok
seoul:/ # su - db2pok
seoul:/db2/db2pok # ls -ld sqllib
drwxrwsr-t   19 db2pok   db2iadm1   4096 Sep 21 13:12 sqllib
seoul:/db2/db2pok # db2start
seoul:/db2/db2pok # db2 "create database pok on /db2/POK/db2pok CATALOG TABLESPACE
managed by database using (file '/db2/POK/sapdata1/catalog.tbs' 100000) EXTENTSIZE
4 PREFETCHSIZE 4 USER TABLESPACE managed by database using (file
'/db2/POK/sapdata1/sapdata.tbs' 500000) EXTENTSIZE 4 PREFETCHSIZE 4 TEMPORARY
TABLESPACE managed by database using (file '/db2/POK/sapdatat1/temp.tbs' 200000)
EXTENTSIZE 4 PREFETCHSIZE 4"
seoul:/db2/db2pok # db2 list db directory
 System Database Directory

 Number of entries in the directory = 1

Database 1 entry:

 Database alias                       = POK
 Database name                        = POK
 Local database directory             = /db2/POK/db2pok
 Database release level               = c.00
 Comment                              =
 Directory entry type                 = Indirect
 Catalog database partition number    = 0
 Alternate server hostname            =
 Alternate server port number         =
seoul:/db2/db2pok # db2 update db cfg for pok using NEWLOGPATH /db2/POK/log_dir
seoul:/db2/db2pok # db2 update db cfg for pok using LOGRETAIN on
seoul:/db2/db2pok # db2 backup db pok to /tmp
seoul:/db2/db2pok # db2stop; db2start
seoul:/db2/db2pok # db2 connect to pok
   Database Connection Information

 Database server        = DB2/AIX64 9.5.0
 SQL authorization ID   = DB2POK
 Local database alias   = POK

seoul:/db2/db2pok # db2 connect reset
DB20000I The SQL command completed successfully.
Non-DPF database support: Smart Assist for DB2 supports only non-DPF databases.
6.1.5 Updating the /etc/services file on the secondary node
When the instance is created on the primary node, the /etc/services file is updated with
information for DB2 use. You must also add these lines to the /etc/services file on the
secondary node as in the following example:
db2c_db2pok       50000/tcp
DB2_db2pok        60000/tcp
DB2_db2pok_1      60001/tcp
DB2_db2pok_2      60002/tcp
DB2_db2pok_END    60003/tcp
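One way to propagate these entries is to copy them from the primary node. The following sketch assumes that remote command execution from seoul to busan (for example, through ssh) is already set up:

seoul:/ # grep db2pok /etc/services | ssh busan "cat >> /etc/services"
busan:/ # grep db2pok /etc/services      # verify that the five entries are present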
6.1.6 Configuring IBM PowerHA SystemMirror
You must configure the topology of the PowerHA cluster before using Smart Assist for DB2. In
Example 6-5, the cluster korea was configured with two Ethernet interfaces in each node.
Example 6-5 Cluster korea configuration
seoul:/ # cllsif
busan-b2    boot     net_ether_01  ether  public  busan  192.168.201.144         en2  255.255.252.0  22
busan-b1    boot     net_ether_01  ether  public  busan  192.168.101.144         en0  255.255.252.0  22
poksap-db   service  net_ether_01  ether  public  busan  10.168.101.143               255.255.252.0  22
seoul-b1    boot     net_ether_01  ether  public  seoul  192.168.101.143         en0  255.255.252.0  22
seoul-b2    boot     net_ether_01  ether  public  seoul  192.168.201.143         en2  255.255.252.0  22
poksap-db   service  net_ether_01  ether  public  seoul  10.168.101.143               255.255.252.0  22
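You can also review the configured topology from either node with the cltopinfo utility, for example:

seoul:/ # /usr/es/sbin/cluster/utilities/cltopinfo

The output (not shown here) lists the cluster name, nodes, networks, and interfaces that were defined.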
6.2 Implementing a PowerHA SystemMirror cluster and Smart Assist for DB2 7.1
This section explains the preliminary steps that are required before you start Smart Assist for
DB2. Then it explains how to start Smart Assist for DB2.
6.2.1 Preliminary steps
Before starting Smart Assist for DB2, complete the following steps:
1. Ensure that the PowerHA cluster services are stopped on both nodes by issuing the
lssrc -ls clstrmgrES command on both nodes as shown in Example 6-6 on page 140. An ST_INIT
state indicates that the cluster services are stopped. The shared volume group must be active,
with its file systems mounted, on the node where Smart Assist for DB2 is going to be run.
Example 6-6 Checking for PowerHA stopped cluster services
seoul:/ # lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C,
hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T10:34:17-05:00$"

busan:/ # lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C,
hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T10:34:17-05:00$"
2. Mount the file systems as shown in Example 6-7 so that Smart Assist for DB2 can
discover the available instances and databases.
Example 6-7 Checking for mounted file systems in node seoul
seoul:/db2/db2pok # lsvg -l pokvg
pokvg:
LV NAME    TYPE      LPs   PPs   PVs   LV STATE      MOUNT POINT
loglv001   jfs2log   1     1     1     open/syncd    N/A
poklv001   jfs2      96    96    1     open/syncd    /db2/POK/db2pok
poklv002   jfs2      192   192   2     open/syncd    /db2/POK/sapdata1
poklv003   jfs2      32    32    1     open/syncd    /db2/POK/sapdatat1
poklv004   jfs2      48    48    1     open/syncd    /db2/POK/log_dir
poklv005   jfs2      64    64    1     open/syncd    /export/sapmnt/POK
poklv006   jfs2      64    64    1     open/syncd    /export/usr/sap/trans
poklv008   jfs2      32    32    1     open/syncd    /usr/sap/POK
poklv009   jfs2      4     4     1     open/syncd    /db2/POK/db2dump
poklv007   jfs2      32    32    1     open/syncd    /db2/db2pok
Ensure that the DB2 instance is active on the node where Smart Assist for DB2 is going to be
executed, as shown in Example 6-8.
Example 6-8 Checking for active DB2 instances
seoul:/ # su - db2pok
seoul:/db2/db2pok # db2ilist
db2pok
seoul:/db2/db2pok # db2start
09/24/2010 11:38:53     0   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.
seoul:/db2/db2pok # ps -ef | grep db2sysc | grep -v grep
  db2pok 15794218  8978496   0 11:38:52      -  0:00 db2sysc 0
seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:00:10
3. After the instance is running, edit the $INSTHOME/sqllib/db2nodes.cfg file as shown in
Example 6-9 to add the service IP label. This service IP label is going to be used in the
IBM PowerHA resource group. Do not edit the file earlier, because the database instance
cannot start while the service IP label is not yet configured on a network interface, which
is the case when PowerHA is down.
Example 6-9 Editing and adding the service IP label to the db2nodes.cfg file
seoul:/ # cat /db2/db2pok/sqllib/db2nodes.cfg
0 poksap-db 0
Verify that the .rhosts file (Example 6-10) for the DB2 instance owner contains all the base,
persistent, and service addresses, and that it has the correct permissions.
Example 6-10 Checking the .rhosts file
seoul:/ # cat /db2/db2pok/.rhosts
seoul db2pok
busan db2pok
seoul-b1 db2pok
busan-b1 db2pok
seoul-b2 db2pok
busan-b2 db2pok
poksap-db db2pok
seoul:/db2/db2pok # ls -ld .rhosts
-rw-------   1 db2pok   system   107 Oct  4 15:10 .rhosts
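If the file is missing the required entries or its permissions are too open, it can be corrected as in the following sketch (adjust the owner and group to match your instance user and its primary group):

seoul:/ # chown db2pok:db2iadm1 /db2/db2pok/.rhosts
seoul:/ # chmod 600 /db2/db2pok/.rhosts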
4. Find the path for the DB2 binary files and then export the variable as shown in Example 6-11.
The DSE_INSTALL_DIR environment variable is exported as the root user with the actual path
of the DB2 binary files. If more than one DB2 version is installed, choose the version that
you want to use for your highly available instance.
Example 6-11 Finding the DB2 binary files and exporting them
seoul:/db2/db2pok # db2level
DB21085I Instance "db2pok" uses "64" bits and DB2 code release "SQL09050" with
level identifier "03010107".
Informational tokens are "DB2 v9.5.0.0", "s071001", "AIX6495", and Fix Pack
"0".
Product is installed at "/opt/IBM/db2/V9.5".
seoul:/ # export DSE_INSTALL_DIR=/opt/IBM/db2/V9.5
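If several DB2 copies are installed on the node, the db2ls command (typically found in /usr/local/bin with DB2 9.1 and later) lists the installation paths so that you can choose the correct value for DSE_INSTALL_DIR, for example:

seoul:/ # /usr/local/bin/db2ls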
6.2.2 Starting Smart Assist for DB2
After completing the steps in 6.2.1, “Preliminary steps” on page 139, you are ready to start
Smart Assist for DB2 as explained in the following steps:
1. Launch Smart Assist for DB2 by using the following path on seoul: smitty sysmirror →
Cluster Applications and Resources → Make Applications Highly Available (Use Smart
Assists) → Add an Application to the PowerHA SystemMirror Configuration.
2. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
a Smart Assist From the List of Available Smart Assists.
3. In the Select a Smart Assist From the List of Available Smart Assists panel (Figure 6-1),
select DB2 UDB non-DPF Smart Assist.
            Select a Smart Assist From the List of Available Smart Assists

Move cursor to desired item and press Enter.

  DB2 UDB non-DPF Smart Assist              # busan seoul
  DHCP Smart Assist                         # busan seoul
  DNS Smart Assist                          # busan seoul
  Lotus Domino Smart Assist                 # busan seoul
  FileNet P8 Smart Assist                   # busan seoul
  IBM HTTP Server Smart Assist              # busan seoul
  SAP MaxDB Smart Assist                    # busan seoul
  Oracle Database Smart Assist              # busan seoul
  Oracle Application Server Smart Assist    # busan seoul
  Print Subsystem Smart Assist              # busan seoul
  SAP Smart Assist                          # busan seoul
  Tivoli Directory Server Smart Assist      # busan seoul
  TSM admin smart assist                    # busan seoul
  TSM client smart assist                   # busan seoul
  TSM server smart assist                   # busan seoul
  WebSphere Smart Assist                    # busan seoul

F1=Help          F2=Refresh       F3=Cancel
Esc+8=Image      Esc+0=Exit       Enter=Do
/=Find           n=Find Next
Figure 6-1 Selecting DB2 UDB non-DPF Smart Assist
4. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
Configuration Mode.
5. In the Select Configuration Mode panel (Figure 6-2), select Automatic Discovery and
Configuration.
                            Select Configuration Mode

Move cursor to desired item and press Enter.

  Automatic Discovery And Configuration
  Manual Configuration

F1=Help          F2=Refresh       F3=Cancel
Esc+8=Image      Esc+0=Exit       Enter=Do
/=Find           n=Find Next
Figure 6-2   Selecting the configuration mode
6. In the Add an Application to the PowerHA SystemMirror Configuration panel, select Select
the Specific Configuration You Wish to Create.
7. In the Select the Specific Configuration You Wish to Create panel (Figure 6-3), select DB2
Single Instance.
              Select The Specific Configuration You Wish to Create

Move cursor to desired item and press Enter.

  DB2 Single Instance                       # busan seoul

F1=Help          F2=Refresh       F3=Cancel
Esc+8=Image      Esc+0=Exit       Enter=Do
/=Find           n=Find Next
Figure 6-3   Selecting the configuration to create
8. Select the DB2 instance name. In this case, only one instance, db2pok, is available as
shown in Figure 6-4.
                             Select a DB2 Instance

Move cursor to desired item and press Enter.

  db2pok

F1=Help          F2=Refresh       F3=Cancel
Esc+8=Image      Esc+0=Exit       Enter=Do
/=Find           n=Find Next
Figure 6-4   Selecting the DB2 instance name
9. Using the available pick lists (F4), edit the Takeover Node, DB2 Instance Database to
Monitor, and Service IP Label fields as shown in Figure 6-5. Press Enter.
              Add a DB2 Highly Available Instance Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

* Application Name                            [DB2_Instance_db2pok]
* DB2 Instance Owning Node                    [seoul]                 +
* Takeover Node(s)                            [busan]                 +
* DB2 Instance Name                           db2pok                  +
* DB2 Instance Database to Monitor            POK                     +
* Service IP Label                            [poksap-db]             +
Figure 6-5   Adding the DB2 high available instance resource group
Tip: You can edit the Application Name field and change it to have a more meaningful
name.
A new PowerHA resource group, called db2pok_ResourceGroup, is created. The volume
group pokvg and the service IP label poksap-db are automatically added to the resource
group as shown in Example 6-12.
Example 6-12 The configured resource group for the DB2 instance
seoul:/ # /usr/es/sbin/cluster/utilities/cllsres
APPLICATIONS="db2pok_ApplicationServer"
FILESYSTEM=""
FORCED_VARYON="false"
FSCHECK_TOOL="logredo"
FS_BEFORE_IPADDR="false"
RECOVERY_METHOD="parallel"
SERVICE_LABEL="poksap-db"
SSA_DISK_FENCING="false"
VG_AUTO_IMPORT="false"
VOLUME_GROUP="pokvg"
USERDEFINED_RESOURCES=""
seoul:/ # /usr/es/sbin/cluster/utilities/cllsgrp
db2pok_ResourceGroup
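The same information can also be displayed with the clmgr utility that ships with PowerHA 7.1, for example:

seoul:/ # clmgr query resource_group db2pok_ResourceGroup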
10.Administrator task: Verify the start and stop scripts that were created for the resource
group.
a. To verify the scripts, use the odmget or cllsserv commands or the SMIT tool as shown
in Example 6-13.
Example 6-13 Verifying the start and stop scripts
busan:/ # odmget HACMPserver
HACMPserver:
name = "db2pok_ApplicationServer"
start = "/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok"
stop = "/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok"
min_cpu = 0
desired_cpu = 0
min_mem = 0
desired_mem = 0
use_cod = 0
min_procs = 0
min_procs_frac = 0
desired_procs = 0
desired_procs_frac = 0
seoul:/ # /usr/es/sbin/cluster/utilities/cllsserv
db2pok_ApplicationServer /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
b. Follow the path on seoul: smitty sysmirror → Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and
Monitors) → Application Controller Scripts → Change/Show Application
Controller Scripts.
c. Select the application controller (Figure 6-6) and press Enter.
                        Select Application Controller

Move cursor to desired item and press Enter.

  db2pok_ApplicationServer

F1=Help          F2=Refresh       F3=Cancel
Esc+8=Image      Esc+0=Exit       Enter=Do
/=Find           n=Find Next
Figure 6-6   Selecting the DB2 application controller
The characteristics of the application controller are displayed as shown in Figure 6-7.
                  Change/Show Application Controller Scripts

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

  Application Controller Name        db2pok_ApplicationServer
  New Name                           [db2pok_ApplicationServer]
  Start Script                       [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok]
  Stop Script                        [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok]
  Application Monitor Name(s)        db2pok_SQLMonitor db2pok_ProcessMonitor
Figure 6-7   Change/Show Application Controller Scripts panel
11.Administrator task: Verify which custom and process application monitors were created by
Smart Assist for DB2. In our example, the application monitors are db2pok_SQLMonitor
and db2pok_ProcessMonitor.
a. Follow this path on seoul: smitty sysmirror → Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and
Monitors) → Application Monitors → Configure Custom Application Monitors →
Change/Show Custom Application Monitor.
b. In the Application Monitor to Change panel (Figure 6-8), select db2pok_SQLMonitor
and press Enter.
                        Application Monitor to Change

Move cursor to desired item and press Enter.

  db2pok_SQLMonitor

F1=Help          F2=Refresh       F3=Cancel
Esc+8=Image      Esc+0=Exit       Enter=Do
/=Find           n=Find Next
Figure 6-8   Selecting the application monitor to change
c. In the Change/Show Custom Application Monitor panel (Figure 6-9), you see the
attributes of the application monitor.
                    Change/Show Custom Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

* Monitor Name                                db2pok_SQLMonitor
  Application Controller(s) to Monitor        db2pok_ApplicationServer            +
* Monitor Mode                                [Long-running monitoring]           +
* Monitor Method                              [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2cmon -i db2pok -A po>
  Monitor Interval                            [120]                               #
  Hung Monitor Signal                         [9]                                 #
* Stabilization Interval                      [240]                               #
  Restart Count                               [3]                                 #
  Restart Interval                            [1440]                              #
* Action on Application Failure               [fallover]                          +
  Notify Method                               []
  Cleanup Method                              [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok]
  Restart Method                              [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok]
Figure 6-9   Change/Show Custom Application Monitor panel
d. Follow this path on seoul: smitty sysmirror → Cluster Applications and
Resources → Resources → Configure User Applications (Scripts and
Monitors) → Application Monitors → Configure Process Application Monitors →
Change/Show Process Application Monitor.
e. In the Application Monitor to Change panel (Figure 6-10), select
db2pok_ProcessMonitor and press Enter.
                        Application Monitor to Change

Move cursor to desired item and press Enter.

  db2pok_ProcessMonitor

F1=Help          F2=Refresh       F3=Cancel
Esc+8=Image      Esc+0=Exit       Enter=Do
/=Find           n=Find Next
Figure 6-10   Selecting the application monitor to change
In the Change/Show Process Application Monitor panel, you see the attributes of the
application monitor (Figure 6-11).
                    Change/Show Process Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                              [Entry Fields]
* Monitor Name                                db2pok_ProcessMonitor
  Application Controller(s) to Monitor        db2pok_ApplicationServer            +
* Monitor Mode                                [Long-running monitoring]           +
* Processes to Monitor                        [db2sysc]
* Process Owner                               [db2pok]
  Instance Count                              [1]                                 #
* Stabilization Interval                      [240]                               #
* Restart Count                               [3]                                 #
  Restart Interval                            [1440]                              #
* Action on Application Failure               [fallover]                          +
  Notify Method                               []
  Cleanup Method                              [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok]
  Restart Method                              [/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok]
Figure 6-11   Change/Show Process Application Monitor panel
6.2.3 Completing the configuration
After the Smart Assist for DB2 is started, complete the configuration:
1. Stop the DB2 instance on the primary node as shown in Example 6-14. Keep in mind that
it was active only for the sake of the Smart Assist for DB2 discovery process.
Example 6-14 Stopping the DB2 instance
seoul:/ # su - db2pok
seoul:/db2/db2pok # db2stop
09/24/2010 12:02:56     0   0   SQL1064N  DB2STOP processing was successful.
SQL1064N  DB2STOP processing was successful.
2. Unmount the shared file systems as shown in Example 6-15.
Example 6-15 Unmounting the shared file systems
seoul:/db2/db2pok # lsvg -l pokvg
pokvg:
LV NAME    TYPE      LPs   PPs   PVs   LV STATE        MOUNT POINT
loglv001   jfs2log   1     1     1     closed/syncd    N/A
poklv001   jfs2      96    96    1     closed/syncd    /db2/POK/db2pok
poklv002   jfs2      192   192   2     closed/syncd    /db2/POK/sapdata1
poklv003   jfs2      32    32    1     closed/syncd    /db2/POK/sapdatat1
poklv004   jfs2      48    48    1     closed/syncd    /db2/POK/log_dir
poklv005   jfs2      64    64    1     closed/syncd    /export/sapmnt/POK
poklv006   jfs2      64    64    1     closed/syncd    /export/usr/sap/trans
poklv008   jfs2      32    32    1     closed/syncd    /usr/sap/POK
poklv009   jfs2      4     4     1     closed/syncd    /db2/POK/db2dump
poklv007   jfs2      32    32    1     closed/syncd    /db2/db2pok
3. Deactivate the shared volume group as shown in Example 6-16.
Example 6-16 Deactivating the shared volume group of pokvg
seoul:/ # varyoffvg pokvg
seoul:/ # lsvg -o
caavg_private
rootvg
4. Synchronize the PowerHA cluster by using SMIT:
a. Follow the path smitty sysmirror → Custom Cluster Configuration → Verify and
Synchronize Cluster Configuration (Advanced).
b. In the PowerHA SystemMirror Verification and Synchronization panel (Figure 6-12),
press Enter to accept the default option.
           PowerHA SystemMirror Verification and Synchronization

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Verify, Synchronize or Both                           [Both]          +
* Include custom verification library checks            [Yes]           +
* Automatically correct errors found during             [Yes]           +
  verification?
* Force synchronization if verification fails?          [No]            +
* Verify changes only?                                  [No]            +
* Logging                                               [Standard]      +
Figure 6-12   Accepting the default actions on the Verification and Synchronization panel
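If you prefer the command line, the verification and synchronization can also be run with the clmgr utility (a sketch; the action names shown are assumptions based on the 7.1 clmgr command set):

seoul:/ # clmgr verify cluster
seoul:/ # clmgr sync cluster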
5. Start the cluster on both nodes, seoul and busan, by running smitty clstart.
6. In the Start Cluster Services panel (Figure 6-13 on page 149), complete these steps:
a. For Start now, on system restart or both, select now.
b. For Start Cluster Services on these nodes, enter [seoul busan].
c. For Manage Resource Groups, select Automatically.
d. For BROADCAST message at startup, select false.
e. For Startup Cluster Information Daemon, select true.
f. For Ignore verification errors, select false.
g. For Automatically correct errors found during cluster start?, select yes.
h. Press Enter.
                            Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Start now, on system restart or both                  now               +
  Start Cluster Services on these nodes                 [seoul busan]     +
* Manage Resource Groups                                Automatically     +
  BROADCAST message at startup?                         false             +
  Startup Cluster Information Daemon?                   true              +
  Ignore verification errors?                           false             +
  Automatically correct errors found during             yes               +
  cluster start?
Figure 6-13   Specifying the options for starting cluster services
Tip: The log file for the Smart Assist is in the /var/hacmp/log/sa.log file. You can use the
clmgr utility to easily view the log, as in the following example:
clmgr view log sa.log
When the PowerHA cluster starts, the DB2 instance is automatically started. The application
monitors start after the defined stabilization interval as shown in Example 6-17.
Example 6-17 Checking the status of the high available cluster and the DB2 instance
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE                       seoul
               OFFLINE                      busan

seoul:/ # ps -ef | grep /usr/es/sbin/cluster/clappmond | grep -v grep
    root  7340184 15728806   0 12:17:53      -  0:00 /usr/es/sbin/cluster/clappmond db2pok1_SQLMonitor
    root 11665630  4980958   0 12:17:53      -  0:00 /usr/es/sbin/cluster/clappmond db2pok_ProcessMonitor
seoul:/ # su - db2pok
seoul:/db2/db2pok # db2pd -
Database Partition 0 -- Active -- Up 0 days 00:19:38
Your DB2 instance and database are now configured for high availability in a hot-standby
PowerHA SystemMirror configuration.
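As a final check, it is good practice to test a controlled fallover of the new resource group, for example by moving it to the standby node and back. A sketch with clmgr follows (the exact clmgr syntax is an assumption; the equivalent SMIT panels can be used instead):

seoul:/ # clmgr move resource_group db2pok_ResourceGroup node=busan
seoul:/ # clRGinfo
seoul:/ # clmgr move resource_group db2pok_ResourceGroup node=seoul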
Chapter 7. Migrating to PowerHA 7.1
This chapter includes the following topics for migrating to PowerHA 7.1:
Considerations before migrating
Understanding the PowerHA 7.1 migration process
Snapshot migration
Rolling migration
Offline migration
7.1 Considerations before migrating
Before migrating your cluster, you must be aware of the following considerations:
 The required software:
   – AIX
   – Virtual I/O Server (VIOS)
 Multicast address
 Repository disk
 FC heartbeat support
 All non-IP networks support removed:
   – RS232
   – TMSCSI
   – TMSSA
   – Disk heartbeat (DISKHB)
   – Multinode disk heartbeat (MNDHB)
 IP networks support removed:
   – Asynchronous transfer mode (ATM)
   – Fiber Distributed Data Interface (FDDI)
   – Token ring
 IP Address Takeover (IPAT) via replacement support removed
 Heartbeat over alias support removed
 Site support not available in this version
 IPV6 support not available in this version
You can migrate from High-Availability Cluster Multi-Processing (HACMP) or PowerHA
versions 5.4.1, 5.5, and 6.1 only. If you are running a version earlier than HACMP 5.4.1, you
must upgrade to a newer version first.
TL6: AIX must be at a minimum version of AIX 6.1 TL6 (6.1.6.0) on all nodes before
migration. Use of AIX 6.1 TL6 SP2 or later is preferred.
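A quick way to confirm the installed AIX level on each node before you start:

# oslevel -s       # must report 6100-06 (TL6) or later, or an AIX 7.1 level, on every node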
For more information about migration considerations, see 3.4, “Migration planning” on
page 46.
Only the following migration methods are supported:
Snapshot migration (as explained in 7.3, “Snapshot migration” on page 161)
Rolling migration (as explained in 7.4, “Rolling migration” on page 177)
Offline migration (as explained in 7.5, “Offline migration” on page 191)
Important: A nondisruptive upgrade is not available in PowerHA 7.1, because this version
is the first one to use Cluster Aware AIX (CAA).
7.2 Understanding the PowerHA 7.1 migration process
Before you begin a migration, you must understand the migration process and all migration
scenarios. The process is different from the previous versions of PowerHA (HACMP).
With the introduction of PowerHA 7.1, you now use the features of CAA introduced in AIX 6.1
TL6 and AIX 7.1. For more information about the new features of this release, see 2.2, “New
features” on page 24.
The migration process now has two main cluster components: CAA and PowerHA. This
process involves updating your existing PowerHA product and configuring the CAA cluster
component.
7.2.1 Stages of migration
Migrating to PowerHA 7.1 involves the following stages:
Stage 1: Upgrading to AIX 6.1 TL6 or AIX 7.1
Before you can migrate, you must have a working cluster-aware version of AIX. You can
perform this upgrade as part of a two-stage rolling migration, or you can upgrade AIX first
before you start the PowerHA migration. This AIX level is required before you can start the
premigration checking (stage 2).
Stage 2: Performing the premigration check (clmigcheck)
During this stage, you use the clmigcheck command to prepare each node for the upgrade to
PowerHA 7.1:
a. Stage 2a: Run the clmigcheck command on the first node to choose Object Data
Manager (ODM) or snapshot. Run it again to choose the repository disk (and optionally
the IP multicast address).
b. Stage 2b: Run the clmigcheck command on each node (including the first node) to see
the “OK to install the new version” message and then upgrade the node to
PowerHA 7.1.
The clmigcheck command: The clmigcheck command automatically creates the CAA
cluster when it is run on the last node.
For a detailed explanation about the clmigcheck process, see 7.2.2, “Premigration
checking: The clmigcheck program” on page 157.
Stage 3: Upgrading to PowerHA 7.1
After stage 2 is completed, you upgrade to PowerHA 7.1 on the node. Figure 7-1 shows
the state of the cluster in the test environment after updating to PowerHA 7.1 on one node.
Topology services are still active so that the newly migrated PowerHA 7.1 node can
communicate with the previous version, PowerHA 6.1. The CAA configuration has been
completed, but the CAA cluster is not yet created.
Figure 7-1 Mixed version cluster after migrating node 1
Stage 4: Creating the CAA cluster (last node)
When you are on the last node of the cluster, you create the CAA cluster after running the
clmigcheck command a final time. CAA is required for PowerHA 7.1 to work, making this
task a critical step. Figure 7-2 shows the state of the environment after running the
clmigcheck command on the last node of the cluster, but before completing the migration.
Figure 7-2 Mixed version cluster after migrating node 2
At this stage, the clmigcheck process has run on the last node of the cluster. The CAA
cluster is now created and CAA has established communication with the other node.
However, PowerHA is still using the Topology Services (topsvcs) function because the
migration switchover to CAA is not yet completed.
Stage 5: Starting the migration protocol
As soon as you create the CAA cluster and install PowerHA 7.1, you must start the cluster.
The node_up event checks whether all nodes are running PowerHA 7.1 and starts the
migration protocol. The migration protocol has two phases:
– Phase 1
You call ha_gs_migrate_to_caa_prep(0) to start the migration from Group Services to
CAA. This call ensures that each node can proceed with the migration.
– Phase 2
During the second phase, you update the DCD and ACD ODM entries in HACMPnode
and HACMPcluster to the latest version. You call ha_gs_migrate_to_caa_commit() to
complete the migration and issue the following command:
/usr/es/sbin/cluster/utilities/clmigcleanup
The clmigcleanup process removes existing non-IP entries from the HACMPnetwork,
HACMPadapter, and HACMPnim ODM entries, such as any diskhb entries. Figure 7-3
shows sections from the clstrmgr.debug log file showing the migration protocol stages.
Migration phase one - extract from clstrmgr.debug
Mon Sep 27 20:22:51 nPhaseCb: First phase of the migration protocol, call
ha_gs_caa_migration_prep()
Mon Sep 27 20:22:51 domainControlCb: Called, state=ST_STABLE
Mon Sep 27 20:22:51 domainControlCb: Notification type: HA_GS_DOMAIN_NOTIFICATION
Mon Sep 27 20:22:51 domainControlCb: HA_GS_MIGRATE_TO_CAA
Mon Sep 27 20:22:51 domainControlCb: Sub-Type: HA_GS_DOMAIN_CAA_MIGRATION_COORD
Mon Sep 27 20:22:51 domainControlCb: reason: HA_GS_VOTE_FOR_MIGRATION
Mon Sep 27 20:22:51 domainControlCb: Called, state=ST_STABLE
Mon Sep 27 20:22:51 domainControlCb: Notification type: HA_GS_DOMAIN_NOTIFICATION
Mon Sep 27 20:22:51 domainControlCb: HA_GS_MIGRATE_TO_CAA
Mon Sep 27 20:22:51 domainControlCb: Sub-Type: HA_GS_DOMAIN_CAA_MIGRATION_APPRVD
Mon Sep 27 20:22:51 domainControlCb: reason: HA_GS_MIGRATE_TO_CAA_PREP_DONE
Mon Sep 27 20:22:51 domainControlCb: Set RsctMigPrepComplete flag
Mon Sep 27 20:22:51 domainControlCb: Voting to CONTINUE with RsctMigrationPrepMsg.
Migration phase two - updating cluster version
Mon Sep 27 20:22:51 DoNodeOdm: Called for DCD HACMPnode class
Mon Sep 27 20:22:51 GetObjects: Called with criteria:
name=chile
Mon Sep 27 20:22:51 DoNodeOdm: Updating DCD HACMPnode stanza with node_id = 1 and
version = 12 for object NAME_SERVER of node chile
Mon Sep 27 20:22:51 DoNodeOdm: Updating DCD HACMPnode stanza with node_id = 1 and
version = 12 for object DEBUG_LEVEL of node chile
Finishing migration
Mon Sep 27 20:23:51 finishMigrationGrace: resetting MigrationGracePeriod
Mon Sep 27 20:23:51 finishMigrationGrace: Calling ha_gs_migrate_to_caa_commit()
Mon Sep 27 20:23:51 finishMigrationGrace: execute clmigcleanup command - calling clmigcleanup
Figure 7-3 Extract from the clstrmgr.debug file showing the migration protocol
Stage 6: Switching over from Group Services (grpsvcs) to CAA
When migration is complete, switch over the grpsvcs communication function from
topsvcs to the new communication with CAA. The topsvcs function is now inactive, but the
service is still part of Reliable Scalable Cluster Technology (RSCT) and is not removed.
CAA communication: The grpsvcs SRC subsystem is active until you restart the
system. This subsystem is now communicating with CAA and not topsvcs as shown in
Figure 7-4.
Figure 7-4 Switching over Group Services to use CAA
Figure 7-5 shows the services that are running after migration, including cthags.
chile:/ # lssrc -a | grep cluster
 clstrmgrES       cluster          4391122      active
 clevmgrdES       cluster          11862228     active

chile:/ # lssrc -a | grep cthags
 cthags           cthags           7405620      active

chile:/ # lssrc -a | grep caa
 cld              caa              4063436      active
 clcomd           caa              3670224      active
 solid            caa              7864338      active
 clconfd          caa              5505178      active
 solidhac         caa              7471164      active
Figure 7-5   Services running after migration
Table 7-1 shows the changes to the SRC subsystem before and after migration.
Table 7-1   Changes in the SRC subsystems

                      Older PowerHA    PowerHA 7.1 or later
 Topology Services    topsvcs          N/A
 Group Services       grpsvcs          cthags
The clcomdES and clcomd subsystems
When running in a mixed-version cluster, you must handle the changes in the clcomd
subsystem. During a rolling or mixed-cluster situation, you can have two separate instances
of the communication daemon running: clcomd and clcomdES.
clcomd instances: You can have two instances of the clcomd daemon in the cluster, but
never on a given node. After PowerHA 7.1 is installed on a node, the clcomd daemon is
run, and the clcomdES daemon does not exist. AIX 6.1.6.0 and later with a back-level
PowerHA version (before version 7.1) only runs the clcomdES daemon even though the
clcomd daemon exists.
The clcomd daemon uses port 16191, and the clcomdES daemon uses port 6191. When
migration is complete, the clcomdES daemon is removed.
The clcomdES daemon: The clcomdES daemon is removed when the older PowerHA
software version is removed (snapshot migration) or overwritten by the new PowerHA 7.1
version (rolling or offline migration).
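To confirm which communication daemon is listening on a given node during the migration, you can check the ports, for example:

# netstat -an | grep LISTEN | grep 16191     # clcomd (CAA)
# netstat -an | grep LISTEN | grep -w 6191   # clcomdES (back-level PowerHA)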
7.2.2 Premigration checking: The clmigcheck program
Before starting migration, you must run the clmigcheck program to prepare the cluster for
migration. The clmigcheck program has two functions. First, it validates the current cluster
configuration (by using ODM or snapshot) for migration. If the configuration is not valid, the
clmigcheck program notifies you of any unsupported elements, such as disk heartbeating or
IPAT via replacement. It also indicates any actions that might be required before you can
migrate. Second, this program prepares for the new cluster by obtaining the disk to be used
for the repository disk and multicast address.
Command profile: The clmigcheck command is not a PowerHA command, but the
command is part of bos.cluster and is in the /usr/sbin directory.
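You can confirm which file set delivers the command by using lslpp, for example:

# lslpp -w /usr/sbin/clmigcheck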
High-level overview of the clmigcheck process
Figure 7-6 shows a high-level view of how the clmigcheck program works. The clmigcheck
program must go through several stages to complete the cluster migration.
Figure 7-6 High-level process of the clmigcheck command
The clmigcheck program goes through the following stages:
1. Performing the first initial run
When the clmigcheck program runs, it checks whether it has been run before by looking
for a /var/clmigcheck/clmigcheck.txt file. If this file does not exist, the clmigcheck
program runs and opens the menu shown in Figure 7-8 on page 159.
2. Verifying that the cluster configuration is suitable for migration
From the clmigcheck menu, you can select options 1 or 2 to check your existing ODM or
snapshot configuration to see if your environment is ready for migration.
3. Creating the CAA required configuration
After performing option 1 or 2, choose option 3. Option 3 creates the /var/clmigcheck
/clmigcheck.txt file with the information entered and is copied to all nodes in the cluster.
4. Performing the second run on the first node, or first run on any other node that is not the
first or the last node in the cluster to be migrated
If the clmigcheck program is run again and the clmigcheck.txt file already exists, a
message is returned indicating that you can proceed with the upgrade of PowerHA.
158
IBM PowerHA SystemMirror 7.1 for AIX
5. Verifying whether the last node in the cluster is upgraded
When the clmigcheck program runs, apart from checking for the presence of the
clmigcheck.txt file, it verifies if it is the last node in the cluster to be upgraded. The lslpp
command is run against each node in the cluster to establish whether PowerHA has been
upgraded. If all other nodes are upgraded, this command confirms that this node is the last
node of the cluster and can now create the CAA cluster.
The clmigcheck program uses the mkcluster command and passes the cluster parameters
from the existing PowerHA cluster, along with the repository disk and multicast address (if
applicable). Figure 7-7 shows an example of the mkcluster command being called.
/usr/sbin/mkcluster -n newyork -r hdisk1 -m
chile{cle_globid=4},scotland{cle_globid=5},serbia{cle_globid=6}
Figure 7-7 The clmigcheck command calling the mkcluster command
Running the clmigcheck command
Figure 7-8 shows the main clmigcheck panel. You choose option 1 or 2 depending on which
type of migration you want to perform. Option 1 is for a rolling or offline migration. Option 2 is
for a snapshot migration. When you choose either option, a check of the cluster configuration
is performed to verify if the cluster can be migrated. If any problems are detected, a warning
or error message is displayed.
------------[ PowerHA SystemMirror Migration Check ]-------------

Please select one of the following options:

      1 = Check ODM configuration.
      2 = Check snapshot configuration.
      3 = Enter repository disk and multicast IP addresses.

Select one of the above, "x" to exit or "h" for help:
Figure 7-8 The clmigcheck menu
A warning message is displayed for certain unsupported elements, such as disk heartbeat as
shown in Figure 7-9.
------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk
Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration
to PowerHA SystemMirror 7.1.

Hit <Enter> to continue
Figure 7-9 The disk heartbeat warning message when running the clmigcheck command
Non-IP networks can be dynamically removed during the migration process by using the
clmigcleanup command. However, other configurations, such as IPAT via replacement,
require manual steps to remove or change them to a supported configuration. After the
changes are made, run clmigcheck again to ensure that the error is resolved.
The second function of the clmigcheck program is to prepare the CAA cluster environment.
This function is performed when you select option 3 (Enter repository disk and multicast IP
addresses) from the menu.
When you select this option, the clmigcheck program stores the information entered in the
/var/clmigcheck/clmigcheck.txt file. This file is also copied to the /var/clmigcheck
directory on all nodes in the cluster. This file contains the physical volume identifier (PVID) of
the repository disk and the chosen multicast address. If PowerHA is allowed to choose a
multicast address automatically, the NULL setting is specified in the file. Figure 7-10 shows
an example of the clmigcheck.txt file.
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 7-10 Contents of the clmigcheck.txt file
Upon running the clmigcheck command, the command checks to see if the clmigcheck.txt
file exists. If the clmigcheck.txt file exists and the node is not the last node in the cluster to
be migrated, the panel shown in Figure 7-11 is displayed. It contains a message indicating
that you can now upgrade to the later level of PowerHA.
------------[ PowerHA SystemMirror Migration Check ]-------------

clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.

Hit <Enter> to continue
------------------------------------------------------------------------
Figure 7-11   The clmigcheck panel after it has been run once and before the PowerHA upgrade
The clmigcheck program checks the installed version of PowerHA to see if it has been
upgraded. This step is important to determine which node is the last node to be upgraded in
the cluster. If it is the last node in the cluster, then additional configuration operations must be
completed along with creating and activating the CAA cluster.
Important: You must run the clmigcheck program before you upgrade PowerHA. Then
upgrade PowerHA one node at a time, and run the clmigcheck program on the next node
only after you complete the migration on the previous node. If you do not run the
clmigcheck program specifically on the last node, the cluster is still in migration mode
without creating the CAA cluster. For information about how to resolve this situation, see
10.4.7, “The ‘Cluster services are not active’ message” on page 323.
After you upgrade PowerHA, if you run the clmigcheck program again, you see an error
message similar to the one shown in Figure 7-12. The message indicates that all migration
steps for this node of the cluster have been completed.
ERROR: This program is intended for PowerHA configurations prior to version 7.1
The version currently installed appears to be: 7.1.0
Figure 7-12 clmigcheck panel after PowerHA has been installed on a node.
Figure 7-13 shows an extract from the /tmp/clmigcheck/clmigcheck.log file that was taken
when the clmigcheck command ran on the last node in a three-node cluster migration. This
file shows the output by the clmigcheck program when checking whether this node is the last
node of the cluster.
ck_lastnode: Getting version of cluster.es.server.rte on node chile
ck_lastnode: lslpp from node (chile) is
/etc/objrepos:cluster.es.server.rte:7.1.0.1::COMMITTED:F:Base Server Runtime:
ck_lastnode: cluster.es.server.rte on node chile is (7.1.0.1)
ck_lastnode: Getting version of cluster.es.server.rte on node serbia
ck_lastnode: lslpp from node (serbia) is
/etc/objrepos:cluster.es.server.rte:7.1.0.1::COMMITTED:F:Base Server Runtime:
ck_lastnode: cluster.es.server.rte on node serbia is (7.1.0.1)
ck_lastnode: Getting version of cluster.es.server.rte on node scotland
ck_lastnode: lslpp from node (scotland) is
/etc/objrepos:cluster.es.server.rte:6.1.0.2::COMMITTED:F:ES Base Server Runtime:
ck_lastnode: cluster.es.server.rte on node scotland is (6.1.0.2)
ck_lastnode: oldnodes = 1
ck_lastnode: This is the last node to run clmigcheck.
clmigcheck: This is the last node to run clmigcheck, create the CAA cluster
Figure 7-13   Extract from the clmigcheck.log file showing the lslpp last node checking
7.3 Snapshot migration
To illustrate a snapshot migration, the environment in this scenario entails a two-node AIX
6.1.3 and PowerHA 5.5 SP4 cluster being migrated to AIX 6.1 TL6 and PowerHA 7.1 SP1.
The nodes are IBM POWER6® 550 systems and configured as VIO client partitions. Virtual
devices are used for network and storage configuration.
The network topology consists of one IP network and one non-IP network, which is the disk
heartbeat network. The initial IPAT method is IPAT via replacement, which must be changed
before starting the migration, because PowerHA 7.1 only supports IPAT via aliasing.
The environment also has one resource group that includes one service IP, two volume
groups, and application monitoring, with an IBM HTTP server as the application.
Figure 7-14 shows the relevant resource group settings.
Resource Group Name                          testrg
Participating Node Name(s)                   algeria brazil
Startup Policy                               Online On Home Node Only
Fallover Policy                              Fallover To Next Priority Node
Fallback Policy                              Never Fallback
Site Relationship                            ignore
Node Priority
Service IP Label                             algeria_svc
Volume Groups                                algeria_vg brazil_vg
Figure 7-14   Cluster resource group configuration using snapshot migration
7.3.1 Overview of the migration process
A major difference from previous migration versions is the clmigcheck script, which is
mandatory for the migration procedure. As stated in 1.2, “Cluster Aware AIX” on page 7,
PowerHA 7.1 uses CAA for monitoring and event management. By running the clmigcheck
script (option 3), you can specify a repository disk and a multicast address, which are
required for the CAA service.
The snapshot migration method requires all cluster nodes to be offline for some time. It
requires removing previous versions of PowerHA and installing AIX 6.1 TL6 or later and the
new version of PowerHA 7.1.
In this scenario, to begin, PowerHA 5.5 SP4 is on AIX 6.1.3 and migrated to PowerHA 7.1
SP1 on AIX 6.1 TL6. The network topology consists of one IP network using IPAT via
replacement and the disk heartbeat network. Both of these network types are no longer
supported. However, if you have an IPAT via replacement configuration, the clmigcheck script
generates an error message as shown in Figure 7-15. You must remove this configuration to
proceed with the migration.
------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-ERROR: The configuration contains unsupported options: IP Address
Takeover via Replacement. The PowerHA network name is net_ether_01.
This will have to be removed from the configuration before
migration to PowerHA SystemMirror

Hit <Enter> to continue
Figure 7-15 The clmigcheck error message for IPAT via replacement
IPAT via replacement configuration: If your cluster has an IPAT via replacement
configuration, remove or change to the IPAT via alias method before starting the migration.
7.3.2 Performing a snapshot migration
Follow these steps to migrate the cluster.
Creating a snapshot
Create a snapshot by entering the smit cm_add_snap.dialog command while your cluster is
running.
Stopping the cluster
Run the smit clstop command on all nodes to take down the cluster. Ensure that the cluster
is down by using the lssrc -ls clstrmgrES command (Figure 7-16) for each node.
# lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13 43haes/usr/sbin/cluster/hacmprd/main.C,
hacmp, 61haes_r710, 1034A_61haes_r710 2010-08-19T10:34:17-05:00$"
Figure 7-16 The lssrc -ls clstrmgrES command to ensure that each cluster is down
Installing AIX 6.1.6 and clmigcheck
To install AIX 6.1.6 and the clmigcheck program, follow these steps:
1. By using the AIX 6.1.6 installation media or TL6 updates, perform a smitty update_all.
2. After updating AIX, check whether the bos.cluster and bos.ahafs file sets are correctly
installed as shown in Figure 7-17. These two file sets are new for the CAA services. You
might need to install them separately.
brazil:/ # lslpp -l |grep bos.cluster
  bos.cluster.rte       6.1.6.1  APPLIED     Cluster Aware AIX
  bos.cluster.solid     6.1.6.1  APPLIED     POWER HA Business Resiliency
  bos.cluster.rte       6.1.6.1  APPLIED     Cluster Aware AIX
  bos.cluster.solid     6.1.6.0  COMMITTED   POWER HA Business Resiliency
brazil:/ #
Figure 7-17 Verifying additional required file sets
The clcomd subsystem is now part of AIX and requires the fully qualified host names of all
nodes in the cluster to be listed in the /etc/cluster/rhosts file. Because AIX was
updated, a restart is required.
3. Because you updated the AIX image, restart the system before you continue with the next
step.
After restarting the system, you can see the clcomd subsystem from the caa subsystem
group that is up and running. The clcomdES daemon, which is part of PowerHA, is also
running as shown in Figure 7-18.
algeria:/usr/es/sbin/cluster/etc # lssrc -a|grep com
 clcomd           caa              4128960      active
 clcomdES         clcomdES         2818102      active
algeria:/usr/es/sbin/cluster/etc #
Figure 7-18 Two clcomd daemons exist
Now AIX 6.1.6 is installed, and you are ready for the clmigcheck step.
4. Run the clmigcheck command on the first node (algeria). Figure 7-19 shows the
clmigcheck menu.
------------[ PowerHA SystemMirror Migration Check ]-------------

Please select one of the following options:

      1 = Check ODM configuration.
      2 = Check snapshot configuration.
      3 = Enter repository disk and multicast IP addresses.

Select one of the above, "x" to exit or "h" for help:
Figure 7-19 Options on the clmigcheck menu
The clmigcheck menu options: In the clmigcheck menu, option 1 and 2 review the
cluster configurations. Option 3 gathers information that is necessary to create the CAA
cluster during its execution on the last node of the cluster. In option 3, you define a cluster
repository disk and multicast IP address. Selecting option 3 means that you are ready to
start the migration.
In option 3 of the clmigcheck menu, you select two configurations:
The disk to use for the repository
The multicast address for internal cluster communication
Option 2: Checking the snapshot configuration
When you choose option 2 from the clmigcheck menu, a prompt is displayed for you to
provide the snapshot file name. The clmigcheck program then reviews the specified snapshot
file and shows an error or warning message if any unsupported elements are discovered.
In the test environment, a disk heartbeat network is not supported in PowerHA 7.1. The
warning message from clmigcheck is for the disk heartbeat configuration as Figure 7-20
shows.
------------[ PowerHA SystemMirror Migration Check ]-------------

h = help

Enter snapshot name (in /usr/es/sbin/cluster/snapshots): snapshot_mig

clsnapshot: Removing any existing temporary HACMP ODM entries...
clsnapshot: Creating temporary HACMP ODM object classes...
clsnapshot: Adding HACMP ODM entries to a temporary directory..
clsnapshot: Succeeded generating temporary ODM containing Cluster Snapshot: snapshot_mig

------------[ PowerHA SystemMirror Migration Check ]-------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk
Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration
to PowerHA SystemMirror 7.1.

Hit <Enter> to continue
Figure 7-20 The clmigcheck warning message for a disk heartbeat configuration
Figure 7-20 shows the warning message “This will be removed from the configuration
during the migration”. Because it is only a warning message, you can continue with the
migration. After completing the migration, verify that the disk heartbeat is removed.
When option 2 of clmigcheck is completed without error, proceed with option 3 as shown in
Figure 7-21.
------------[ PowerHA SystemMirror Migration Check ]-------------

The ODM has no unsupported elements.

Hit <Enter> to continue
Figure 7-21 clmigcheck passed for snapshot configurations
Option 3: Entering the repository disk and multicast IP addresses
In option 3, clmigcheck lists all shared disks on both nodes. In this scenario, hdisk1 is
specified as the repository disk as shown in Figure 7-22.
------------[ PowerHA SystemMirror Migration Check ]-------------

Select the disk to use for the repository

      1 = 000fe4114cf8d1ce(hdisk1)
      2 = 000fe4114cf8d3a1(hdisk4)
      3 = 000fe4114cf8d441(hdisk5)
      4 = 000fe4114cf8d4d5(hdisk6)
      5 = 000fe4114cf8d579(hdisk7)

Select one of the above or "x" to exit: 1
Figure 7-22 Selecting the repository disk
You can create a NULL entry for the multicast address. Then, AIX generates one such
address as shown in Figure 7-23. Keep this value as the default so that AIX can generate the
multicast address.
------------[ PowerHA SystemMirror Migration Check ]-------------

PowerHA SystemMirror uses multicast address for internal
cluster communication and monitoring. These must be in the
multicast range, 224.0.0.0 - 239.255.255.255.

If you make a NULL entry, AIX will generate an appropriate address for you.
You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).

h = help

Enter the multicast IP address to use for network monitoring:
Figure 7-23 Defining a multicast address
The clmigcheck process is logged in the /tmp/clmigcheck/clmigcheck.log file (Figure 7-24).
validate_disks: No sites, only one repository disk needed.
validate_disks: Disk 000fe4114cf8d1ce exists
prompt_mcast: Called
prompt_mcast: User entered:
validate_mcast: Called
write_file: Called
write_file: Copying /tmp/clmigcheck/clmigcheck.txt to algeria:/var/clmigcheck/clmigcheck.txt
write_file: Copying /tmp/clmigcheck/clmigcheck.txt to brazil:/var/clmigcheck/clmigcheck.txt
Figure 7-24 /tmp/clmigcheck/clmigcheck.log
The completed clmigcheck program
When the clmigcheck program is completed, it creates a /var/clmigcheck/clmigcheck.txt
file on each node of the cluster. The text file contains a PVID of the repository disk and the
multicast address for the CAA cluster as shown in Figure 7-25.
# cat /var/clmigcheck/clmigcheck.txt
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe4114cf8d1ce
CLUSTER_MULTICAST:NULL
Figure 7-25 The /var/clmigcheck/clmigcheck.txt file
When PowerHA 7.1 is installed, this information is used to create the HACMPsircol.odm file as
shown in Figure 7-26. This file is created when you finish restoring the snapshot in this
scenario.
algeria:/ # odmget HACMPsircol
HACMPsircol:
name = "canada_cluster_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d1ce"
ip_address = ""
nodelist = "brazil,algeria"
backup_repository1 = ""
backup_repository2 = ""
algeria:/ #
Figure 7-26 The HACMPsircol.odm file
Running clmigcheck on one node: Compared to the rolling migration method, the
snapshot migration method entails running the clmigcheck command on one node. Do not
run the clmigcheck command on another node while you are doing a snapshot migration or
the migration will fail. If you run the clmigcheck command on every node, the CAA cluster
is created upon executing the clmigcheck command on the last node and goes into the
rolling migration phase.
Uninstalling PowerHA SystemMirror 5.5
To uninstall PowerHA SystemMirror 5.5, follow these steps:
1. Run smit install_remove and specify cluster.* on all nodes. Verify this step by
running the following command to confirm that all PowerHA file sets are removed:
lslpp -l cluster.*
2. Install PowerHA 7.1 by using the following command:
smit install_all
3. Verify that the file sets are installed correctly:
lslpp -l cluster.*
After you install the new PowerHA 7.1 file sets, you can see that the clcomdES daemon has
disappeared. You now have the clcomd daemon, which is part of CAA, instead of the clcomdES
daemon.
Updating the /etc/cluster/rhosts file
After you complete the installation of PowerHA 7.1, update the /etc/cluster/rhosts file:
1. Update the /etc/cluster/rhosts file with the fully qualified domain name of each node in
the cluster. (For example, you might use the output from the hostname command).
2. Restart the clcomd subsystem as shown in Figure 7-27.
algeria:/ # stopsrc -s clcomd
0513-044 The clcomd Subsystem was requested to stop.
algeria:/ # startsrc -s clcomd
0513-059 The clcomd Subsystem has been started. Subsystem PID is 12255420.
algeria:/ #
Figure 7-27 Restarting the clcomd subsystem on both nodes
3. Alternatively, instead of stopping and starting the clcomd daemon, you can refresh it by
using the following command:
refresh -s clcomd
4. To verify that the clcomd subsystem is working, use the clrsh command (see the example
after these steps). If it does not work, correct any problems before proceeding, as explained
in Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.
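A minimal check after updating the file might look like the following sketch (the host names shown are the ones used in this scenario; if the hostname command returns fully qualified names on your systems, use those instead):

algeria:/ # cat /etc/cluster/rhosts
algeria
brazil
algeria:/ # clrsh brazil date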
Converting the snapshot
Now convert the snapshot that was taken on PowerHA 5.5. On PowerHA 7.1, run the
clconvert_snapshot command before you restore the snapshot. (With some older versions of
PowerHA, you do not need to run this command.) While converting the snapshot, the
clconvert_snapshot command refers to the /var/clmigcheck/clmigcheck.txt file and adds the
HACMPsircol stanza with the repository disk and multicast address, which are newly
introduced in PowerHA 7.1. After you
restore the snapshot, you can see that the HACMPsircol ODM contains this information as
illustrated in Figure 7-26 on page 167.
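A typical invocation for this scenario might look like the following sketch (the -v flag names the PowerHA version on which the snapshot was taken, and snapshot_mig is the snapshot created earlier):

algeria:/ # /usr/es/sbin/cluster/conversion/clconvert_snapshot -v 5.5 -s snapshot_mig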
Restoring a snapshot
To restore a snapshot, follow the path smitty hacmp → Cluster Nodes and Networks →
Manage the Cluster → Snapshot Configuration → Restore the Cluster Configuration
From a Snapshot.
Failure to restore a snapshot
When you restore the snapshot with the default option, an error message about clcomd
communication is displayed. Because there is no configuration, the snapshot fails at the
communication_check function in the clsnapshot program as shown in Figure 7-28.
cllsnode: Error reading configuration
/usr/es/sbin/cluster/utilities/clsnapshot[2127]: apply_CS[116]: communication_check: line 49:
local: not found
Warning: unable to verify inbound clcomd communication from node "algeria" to the local node, "".
/usr/es/sbin/cluster/utilities/clsnapshot[2127]: apply_CS[116]: communication_check: line 49:
local: not found
Warning: unable to verify inbound clcomd communication from node "brazil" to the local node, "".
clsnapshot: Verifying configuration using temporary PowerHA SystemMirror ODM entries...
Cannot get local HACMPnode ODM.
Cannot get local HACMPnode ODM.
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
FATAL ERROR: CA_invoke_client nodecompath == NULL! @ line: of file: clver_ca_main.c
Figure 7-28 A failed snapshot restoration
If you are at PowerHA 7.1 SP2, you should not see the failure message. However, some error
messages concern the disk heartbeat network (Figure 7-29), which is not supported in
PowerHA 7.1. You can ignore this error message.
clsnapshot: Removing any existing temporary PowerHA SystemMirror ODM entries...
clsnapshot: Creating temporary PowerHA SystemMirror ODM object classes...
clsnapshot: Adding PowerHA SystemMirror ODM entries to a temporary directory..ODMDIR set to /tmp/snapshot
Error: Network's network type diskhb is not known.
Error: Interface/Label's network type diskhb is not known.
cllsclstr: Error reading configuration
Error: Network's network type diskhb is not known.
Error: Interface/Label's network type diskhb is not known.
cllsnode: Error reading configuration
clodmget: Could not retrieve object for HACMPnode, odm errno 5904
/usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]: communication_check: line 52: local: not found
Warning: unable to verify inbound clcomd communication from node "algeria" to the local node, "".
/usr/es/sbin/cluster/utilities/clsnapshot[2139]: apply_CS[125]: communication_check: line 52: local: not found
Warning: unable to verify inbound clcomd communication from node "brazil" to the local node, ""
Figure 7-29 The snapshot restoring the error with the new clsnapshot command
When you finish restoring the snapshot, the CAA cluster is created based on the repository
disk and multicast address based in the /var/clmigcheck/clmigcheck.txt file.
Sometimes the synchronization or verification fails because the snapshot cannot create the
CAA cluster. If you see an error message similar to the one shown in Figure 7-30, look in the
/var/adm/ras/syslog.caa file and correct the problem.
ERROR: Problems encountered creating the cluster in AIX. Use the syslog
facility to see output from the mkcluster command.
ERROR: Creating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.
ERROR: Updating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization again.
cldare: Error detected during synchronization.
Figure 7-30 Failure of CAA creation during synchronization or verification
Figure 7-30 on page 170 shows a sample CAA creation failure, which in this case involves the
clrepos_private1 file system mount point that is used for the CAA service. Assuming that you
have enabled syslog, you can easily find the details in the syslog.caa file by searching for
“odmadd HACMPsircol.add.”
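The following commands are a minimal sketch of how you might locate the relevant entries; they
assume that syslog is already writing CAA messages to /var/adm/ras/syslog.caa, as described above:

   # Locate the odmadd step that failed during CAA cluster creation
   grep -n "odmadd HACMPsircol" /var/adm/ras/syslog.caa
   # Review the most recent output from the mkcluster command
   grep -i mkcluster /var/adm/ras/syslog.caa | tail -20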
After completing all the steps, check the CAA cluster configuration and status on both nodes.
First, the caavg_private volume group is created and varied on as shown in Figure 7-31.
algeria:/ # lspv
hdisk2          000fe4114cf8d258    algeria_vg
hdisk3          000fe4114cf8d2ec    brazil_vg
hdisk8          000fe4114cf8d608    diskhb
caa_private0    000fe40120e16405    caavg_private    active
hdisk0          000fe4113f087018    rootvg           active
algeria:/ #
Figure 7-31 The caavg_private volume group varied on
From the lscluster command, you can see information about the CAA cluster including the
repository disk, the multicast address, and so on, as shown in Figure 7-32.
algeria:/ # lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: algeria
Cluster shorthand id for node: 1
uuid for node: 0410c158-c6ca-11df-88bc-c21e45bc6603
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME       TYPE    SHID   UUID
canada_cluster     local          e8fbea82-c6c9-11df-b8d6-c21e4a9e5103
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
------------------------------
Node name: brazil
Cluster shorthand id for node: 2
uuid for node: e8ff0dde-c6c9-11df-b8d6-c21e4a9e5103
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME       TYPE    SHID   UUID
canada_cluster     local          e8fbea82-c6c9-11df-b8d6-c21e4a9e5103
Number of points_of_contact for node: 2
Point-of-contact interface & contact state
en1 UP
en0 UP
algeria:/mnt/HA71 # lscluster -c
Cluster query for cluster canada_cluster returns:
Cluster uuid: e8fbea82-c6c9-11df-b8d6-c21e4a9e5103
Number of nodes in cluster = 2
Cluster id for node algeria is 1
Primary IP address for node algeria is 192.168.101.101
Cluster id for node brazil is 2
Primary IP address for node brazil is 192.168.101.102
Number of disks in cluster = 0
Multicast address for cluster is 228.168.101.102
algeria:/mnt/HA71 #
Figure 7-32 The lscluster command after creating the CAA cluster
You can also check whether the multicast address is correctly defined for each interface by
running the netstat -a -I en0 command as shown in Figure 7-33.
algeria:/ # netstat -a -I en0
Name  Mtu   Network      Address              Ipkts Ierrs    Opkts Oerrs  Coll
en0   1500  link#2       c2.1e.45.bc.66.3   1407667     0  1034372     0     0
                         01:00:5e:28:65:65
                         01:00:5e:7f:ff:fd
                         01:00:5e:00:00:01
en0   1500  192.168.100  algeria            1407667     0  1034372     0     0
                         228.168.101.101
                         239.255.255.253
                         224.0.0.1
en0   1500  10.168.100   algeria_svc        1407667     0  1034372     0     0
                         228.168.101.101
                         239.255.255.253
                         224.0.0.1

algeria:/ # netstat -a -I en1
Name  Mtu   Network      Address              Ipkts Ierrs    Opkts Oerrs  Coll
en1   1500  link#3       c2.1e.45.bc.66.4    390595     0       23     0     0
                         01:00:5e:28:65:65
                         01:00:5e:7f:ff:fd
                         01:00:5e:00:00:01
en1   1500  192.168.200  algeria_boot        390595     0       23     0     0
                         228.168.101.101
                         239.255.255.253
                         224.0.0.1
Figure 7-33 The multicast address for CAA service
After the clmigcheck command completes, you can remove the older version of PowerHA and
install PowerHA 7.1.
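The exact commands depend on your installed file sets and installation media; the following is
only a hedged sketch, with /mnt/HA71 used as an assumed mount point for the PowerHA 7.1 images:

   # List the currently installed PowerHA (cluster.*) file sets
   lslpp -l "cluster.*"
   # Remove the old PowerHA file sets
   installp -ug "cluster.*"
   # Install PowerHA 7.1 from the installation directory
   cd /mnt/HA71
   smitty install_all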
Optional: Adding a shared disk to the CAA services
After the migration, the shared volume group is not included in the CAA service as shown in
Figure 7-34.
# lspv
caa_private0    000fe40120e16405    caavg_private    active
hdisk2          000fe4114cf8d258    algeria_vg
hdisk3          000fe4114cf8d2ec    brazil_vg
hdisk0          000fe4113f087018    rootvg           active
#
Figure 7-34 The lspv output after restoring the snapshot
To add the shared volume group disks to the CAA service, run the following command:
chcluster -n <cluster_name> -d +hdiskX,hdiskY
where:
<cluster_name>   is canada_cluster.
+hdiskX          is +hdisk2.
hdiskY           is hdisk3.
The two shared disks are now included in the CAA shared disk as shown in Figure 7-35.
algeria: # chcluster -n canada_cluster -d +hdisk2,hdisk3
chcluster: Cluster shared disks are automatically renamed to names such as
cldisk1, [cldisk2, ...] on all cluster nodes. However, this cannot
take place while a disk is busy or on a node which is down or not
reachable. If any disks cannot be renamed now, they will be renamed
later by the clconfd daemon, when the node is available and the disks
are not busy.
algeria: #
Figure 7-35 Using the chcluster command for shared disks
Now hdisk2 and hdisk3 are renamed to cldisk names. The lspv command shows cldiskX
instead of hdiskX, as shown in Figure 7-36.
algeria:/ # lspv
caa_private0    000fe40120e16405    caavg_private    active
cldisk1         000fe4114cf8d258    algeria_vg
cldisk2         000fe4114cf8d2ec    brazil_vg
hdisk8          000fe4114cf8d608    diskhb
hdisk0          000fe4113f087018    rootvg           active
algeria:/ #
Figure 7-36 The lspv command showing cldisks for shared disks
When you use the lscluster command to perform the check, you can see that the shared
disks (cldisk1 and cldisk2) are monitored by the CAA service. Keep in mind that two types
of disks are in CAA. One type is the repository disk that is shown as REPDISK, and the other
type is the shared disk that is shown as CLUSDISK. See Figure 7-37 on page 175.
algeria:/ # lscluster -d
Storage Interface Query
Cluster Name: canada_cluster
Cluster uuid: 97833c9e-c5b8-11df-be00-c21e45bc6603
Number of nodes reporting = 2
Number of nodes expected = 2
Node algeria
Node uuid = 88cff8be-c58f-11df-95ab-c21e45bc6604
Number of disk discovered = 3
cldisk2
state : UP
uDid : 533E3E213600A0B80001146320000F1A74C18BDAA0F1815
FAStT03IBMfcp05VDASD03AIXvscsi
uUid : 600a0b80-0011-4632-0000-f1a74c18bdaa
type : CLUSDISK
cldisk1
state : UP
uDid : 533E3E213600A0B8000291B080000D3CB053B7EA60F1815
FAStT03IBMfcp05VDASD03AIXvscsi
uUid : 600a0b80-0029-1b08-0000-d3cb053b7ea6
type : CLUSDISK
caa_private0
state : UP
uDid :
uUid : 600a0b80-0029-1b08-0000-d3cd053b7f0d
type : REPDISK
Node
Node uuid = 00000000-0000-0000-0000-000000000000
Number of disk discovered = 0
algeria:/ #
Figure 7-37 The shared disks monitored by the CAA service
Verifying the cluster
To verify the snapshot migration, check the components shown in Table 7-2 on each node.
Table 7-2 Components to verify after the snapshot migration

Component                                  Command
The CAA services are active.               lssrc -g caa
                                           lscluster -m
The RSCT services are active.              lssrc -s cthags
Start the cluster service one by one.      smitty clstart
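The checks in Table 7-2 can also be run in sequence from the command line, as in the following
sketch; the expected results are noted as comments, and the commands are run on each node in turn:

   lssrc -g caa      # the CAA subsystems should be listed as active
   lscluster -m      # every cluster node should be reported with a state of UP
   lssrc -s cthags   # the RSCT cthags subsystem should be active
   smitty clstart    # then start cluster services on one node at a time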
7.3.3 Checklist for performing a snapshot migration
Because the entire migration can be confusing, Table 7-3 provides a step-by-step checklist for
the snapshot migration of each node in the cluster.
Table 7-3 Checklist for performing a snapshot migration

Step  Node 1                              Node 2                              Check
0     Ensure that the cluster is          Ensure that the cluster is
      running.                            running.
1     Create a snapshot.
2     Stop the cluster.                   Stop the cluster.                   lssrc -ls clstrmgrES
3     Update AIX 6.1.6.                   Update AIX 6.1.6.                   oslevel -s; install the bos.cluster
                                                                              and bos.ahafs filesets
4     Restart the system.                 Restart the system.
5     Select option 2 from the                                                Check for unsupported
      clmigcheck menu.                                                        configurations.
6     Select option 3 from the                                                /var/clmigcheck/clmigcheck.txt
      clmigcheck menu.
7     Remove PowerHA 5.5                                                      lslpp -l | grep cluster
      and install PowerHA 7.1.
8     Convert the snapshot.                                                   clconvert_snapshot
9     Restore the snapshot.
10    Start the cluster.                                                      lssrc -ls clstrmgrES, hacmp.out
11                                        Remove PowerHA 5.5 and              lssrc -ls clstrmgrES, hacmp.out
                                          install PowerHA 7.1, then
                                          start the cluster.
7.3.4 Summary
A snapshot migration to PowerHA 7.1 entails running the clmigcheck program. Before you
begin the migration, you must prepare for it by installing AIX 6.1.6 or later and checking if any
part of the configuration is unsupported.
Then you run the clmigcheck command to review your PowerHA configuration and verify that
it works with PowerHA 7.1. After verifying the configuration, you specify a repository disk and a
multicast address, which are essential components for the CAA service.
After you successfully complete the clmigcheck procedure, you can install PowerHA 7.1. The
CAA cluster is created while you restore your snapshot. PowerHA 7.1 uses the newly
configured CAA service for event monitoring and heartbeating.
7.4 Rolling migration
This section explains how to perform a three-node rolling migration of AIX and PowerHA. The
test environment begins with PowerHA 6.1 SP3 and AIX 6.1 TL3 versions. The step-by-step
instructions in this topic explain how to perform a three-node rolling migration of AIX to 6.1
TL6 and PowerHA to 7.1 SP1 versions as illustrated in Figure 7-38.
Figure 7-38 Three-node cluster before migration
The cluster is using virtualized resources provided by VIOS for network and storage. Rootvg
(hdisk0) is also hosted from the VIOS. The backing devices are provided from a DS4800
storage system.
The network topology is configured as IPAT via aliasing. Also disk heartbeating is used over
the shared storage between all the nodes.
The cluster contains two resource groups: newyork_rg and test_rg. The newyork_rg resource
group hosts the IBM HTTP Server application, and the test_rg resource group hosts a test
script application. The home node for newyork_rg is chile, and the home node for test_rg is
serbia. The scotland node runs in a standby capacity.
Figure 7-39 shows the relevant attributes of the newyork_rg and test_rg resource groups.
Resource Group Name                  newyork_rg
Participating Node Name(s)           chile scotland serbia
Startup Policy                       Online On Home Node Only
Fallover Policy                      Fallover To Next Priority Node
Fallback Policy                      Never Fallback
Volume Groups                        ny_datavg
Application Servers                  httpd_app

Resource Group Name                  test_app_rg
Participating Node Name(s)           serbia chile scotland
Startup Policy                       Online On Home Node Only
Fallover Policy                      Fallover To Next Priority Node
Fallback Policy                      Fallback To Higher Priority Node
Application Servers                  test_app
Figure 7-39 Three-node cluster resource groups
7.4.1 Planning
Before beginning a rolling migration, you must properly plan to ensure that you are ready to
proceed. For more information, see 7.1, “Considerations before migrating” on page 152. The
migration to PowerHA 7.1 is different from previous releases, because of the support for CAA
integration. Therefore, see also 7.2, “Understanding the PowerHA 7.1 migration process” on
page 153.
Ensure that the cluster is stable on all nodes and is synchronized. With a rolling migration,
you must be aware of the following restrictions while performing the migration, because a
mixed-software-version cluster is involved:
•  Do not perform synchronization or verification while a mixed-software-version cluster
   exists. Such actions are not allowed in this case.
•  Do not make any cluster configuration changes.
•  Do not perform a Cluster Single Point Of Control (C-SPOC) operation while a
   mixed-software-version cluster exists. Such action is not allowed in this case.
•  Try to perform the migration during one maintenance period, and do not leave your cluster
   in a mixed state for any significant length of time.
7.4.2 Performing a rolling migration
In this example, a two-phase migration is performed in which you migrate AIX from version
6.1 TL3 to version 6.1 TL6, restart the system, and then migrate PowerHA. You perform this
migration on one node at a time, ensuring that any resource group that the node is hosting is
moved to another node first.
Migrating the first node
Figure 7-40 shows the cluster before upgrading AIX.
Figure 7-40 Rolling migration: Scotland before the AIX upgrade
To migrate the first node, follow these steps:
1. Shut down PowerHA services on the standby node (scotland). Run the smitty clstop
command to stop cluster services on this node. Because this node is a standby node, no
resource groups are hosted. Therefore, you do not need to perform any resource group
operations first. Ensure that cluster services are stopped by running the following command:
lssrc -ls clstrmgrES
Look for the ST_INIT status, which indicates that cluster services on this node are in a
stopped state.
2. Update AIX to version 6.1 TL6 (scotland node). To perform this task, run the smitty
update_all command by using the TL6 images, which you can download by going to:
http://www.ibm.com/support/entry/portal/Downloads/IBM_Operating_Systems/AIX
CAA-specific file sets: You must install the CAA-specific bos.cluster and bos.ahafs
file sets because update_all does not install them.
After you complete the installation, restart the node.
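For example, assuming that the TL6 images were downloaded to /tmp/aix61tl6 (an illustrative
path), the CAA file sets can be added after update_all completes with a sequence similar to
the following sketch:

   # Install the CAA file sets, which smitty update_all does not pull in
   installp -agXYd /tmp/aix61tl6 bos.cluster bos.ahafs
   # Confirm that the file sets are now installed
   lslpp -l bos.cluster.rte bos.ahafs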
When AIX is upgraded, you are at the stage shown in Figure 7-41.
Figure 7-41 Rolling migration: Scotland post AIX upgrade
3. Decide which shared disk you want to use for the CAA private repository (scotland node). See
7.1, “Considerations before migrating” on page 152, for more information.
Previous volume group: The disk must be a clean logical unit number (LUN) that
does not contain a previous volume group. If you have a previous volume group on this
disk, you must remove it. See 10.4.5, “Volume group name already in use” on
page 320.
4. Run the clmigcheck command on the first node (scotland).
You have now upgraded AIX to a CAA version and chosen the CAA disk. When you start
the clmigcheck command, you see the panel shown in Figure 7-42 on page 181. For more
information about the clmigcheck command, see 7.2, “Understanding the PowerHA 7.1
migration process” on page 153.
------------[ PowerHA SystemMirror Migration Check ]------------

Please select one of the following options:

  1 = Check ODM configuration.
  2 = Check snapshot configuration.
  3 = Enter repository disk and multicast IP addresses.

Select one of the above, "x" to exit or "h" for help:
Figure 7-42 Running the clmigcheck command first during a rolling migration
a. Select option 1 (Check the ODM configuration).
When choosing this option, the clmigcheck command checks your configuration and
reports any problems that cannot be migrated.
This migration scenario uses disk-based heartbeating. The clmigcheck command
detects this method and shows a message similar to the one in Figure 7-43, indicating
that this configuration will be removed during migration.
------------[ PowerHA SystemMirror Migration Check ]------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk
Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration to PowerHA SystemMirror 7.1.

Hit <Enter> to continue
Figure 7-43 The disk heartbeat warning message from the clmigcheck command
You do not need to take any action because the disk-based heartbeating is
automatically removed during migration. Because three disk heartbeat networks are in
the configuration, this warning message is displayed three times, once for each
network. If no errors are detected, you see the message shown in Figure 7-44.
------------[ PowerHA SystemMirror Migration Check ]------------

The ODM has no unsupported elements.
Hit <Enter> to continue
Figure 7-44 ODM no unsupported elements message
Press Enter after this last panel, and you return to the main menu.
b. Select option 3 to enter the repository disk. As shown in Figure 7-45, in this scenario,
we chose option 1 to use hdisk1 (PVID 000fe40120e16405).
-----------[ PowerHA SystemMirror Migration Check ]------------

Select the disk to use for the repository

  1 = 000fe40120e16405(hdisk1)
  2 = 000fe4114cf8d258(hdisk2)
  3 = 000fe4114cf8d2ec(hdisk3)
  4 = 000fe4013560cc77(hdisk5)
  5 = 000fe4114cf8d4d5(hdisk6)
  6 = 000fe4114cf8d579(hdisk7)

Select one of the above or "x" to exit:
Figure 7-45 Choosing a CAA disk
c. Enter the multicast address as shown in Figure 7-46. You can specify a multicast address,
or you can have clmigcheck automatically assign one. For more information about
multicast addresses, see 1.3.1, “Communication interfaces” on page 13. Press Enter,
and you return to the main menu.
------------[ PowerHA SystemMirror Migration Check ]------------

PowerHA SystemMirror uses multicast address for internal
cluster communication and monitoring. These must be in the
multicast range, 224.0.0.0 - 239.255.255.255.

If you make a NULL entry, AIX will generate an appropriate address for you.

You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).

h = help
Enter the multicast IP address to use for network monitoring:
Figure 7-46 Choosing a multicast address
d. Exit the clmigcheck tool.
5. Verify whether you are ready for the PowerHA upgrade on the node scotland by running
the clmigcheck tool again. If you are ready, you see the panel shown in Figure 7-47.
------------[ PowerHA SystemMirror Migration Check ]------------

clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
Hit <Enter> to continue
Figure 7-47 Verifying readiness for migration
6. Upgrade PowerHA on the scotland node to PowerHA 7.1 SP1. Because the cluster
services are down, you can perform a smitty update_all to upgrade PowerHA.
7. When this process is complete, modify the new rhosts definition for CAA as shown in
Figure 7-48. Although we used IP addresses in this scenario, you can also add the short
host names to the rhosts file, provided that the /etc/hosts file is configured correctly.
See “Creating a cluster with host names in the FQDN format” on page 75, for
more information.
/etc/cluster
# cat rhosts
192.168.101.111
192.168.101.112
192.168.101.113
Figure 7-48 Extract showing the configured rhosts file
Populating the /etc/cluster/rhosts file: The /etc/cluster/rhosts file must be
populated with all cluster IP addresses before using PowerHA SystemMirror. This
process was done automatically in previous releases but is now a required, manual
process. The addresses that you enter in this file must include the addresses that
resolve to the host name of the cluster nodes. If you update this file, you must refresh
the clcomd subsystem by using the following command:
refresh -s clcomd
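For example, on the scotland node the file can be populated and the change picked up with
commands similar to the following sketch (the addresses are the ones used in this scenario):

   # Write the cluster IP addresses to the CAA rhosts file
   echo "192.168.101.111" >  /etc/cluster/rhosts
   echo "192.168.101.112" >> /etc/cluster/rhosts
   echo "192.168.101.113" >> /etc/cluster/rhosts
   # Make the clcomd subsystem reread the file
   refresh -s clcomd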
Restarting the cluster: You do not need to restart the cluster after you upgrade
PowerHA.
8. Start PowerHA on the scotland node by issuing the smitty clstart command. The node
should be able to rejoin the cluster. However, you receive warning messages about mixed
versions of PowerHA.
After PowerHA is started on this node, move any resource groups that the next node is
hosting onto this node so that you can migrate the second node in the cluster. In this
scenario, the serbia node is hosting the test_app_rg resource group. Therefore, we
perform a resource group move request to move this resource to the newly migrated
scotland node. The serbia node is then available to migrate.
You have now completed the first node migration of the three-node cluster. You have
rejoined the cluster and are now in a mixed version. Figure 7-49 shows the starting point
for migrating the next node in the cluster, with the test_app_rg resource group moved to
the newly migrated scotland node.
Figure 7-49 Rolling migration: Scotland post HA upgrade
Migrating the second node
Figure 7-50 shows that you are ready to proceed with migration of the second node (serbia).
Figure 7-50 Rolling migration: Serbia before the AIX upgrade
To migrate the second node, follow these steps:
1. Shut down PowerHA services on the serbia node. You must stop cluster services on this
node before you begin the migration.
2. Upgrade to AIX 6.1 TL6 (serbia node) similar to the process you used for the scotland
node. After the update is complete, ensure that AIX is rebooted.
You are now in the state as shown in Figure 7-51.
Figure 7-51 Rolling migration: Serbia post AIX upgrade
3. Run the clmigcheck command to ensure that the migration worked and that you can
proceed with the PowerHA upgrade. This step is important even though you have already
performed the cluster configuration migration check and CAA configuration on the first
node (scotland) is complete.
Figure 7-52 shows the panel that you see now.
------------[ PowerHA SystemMirror Migration Check ]------------

clmigcheck: This is not the first node or last node clmigcheck was run on.
No further checking is required on this node. You can install the new
version of PowerHA SystemMirror.
Hit <Enter> to continue
Figure 7-52 The clmigcheck panel on the second node
4. Upgrade PowerHA on the serbia node to PowerHA 7.1 SP1. Follow the same migration
procedure as in the first node.
Reminder: Update the /etc/cluster/rhosts file so that it is the same as the first node
that you upgraded. See step 6 on page 183.
5. Start PowerHA on the serbia node and rejoin this node to the cluster.
After this node is started, check and move the newyork_rg resource group from the chile
node to the scotland node. By performing this task, you are ready to proceed with
migration of the final node in the cluster (the chile node).
At this stage, two of the three nodes in the cluster are migrated to AIX 6.1 TL6 and PowerHA
7.1. The chile node is the last node in the cluster to be upgraded. Figure 7-53 shows how the
cluster looks now.
Figure 7-53 Rolling migration: The serbia node post HA upgrade
Migrating the final node
Figure 7-54 shows that you are ready to proceed with migration of the final node in the
cluster (the chile node). The newyork_rg resource group has been moved to the scotland node,
and the cluster services on chile are down and ready for the AIX migration.
Figure 7-54 Rolling migration: The chile node before the AIX upgrade
To migrate the final node, follow these steps:
1. Shut down PowerHA services on the chile node.
2. Upgrade to AIX 6.1 TL6 (chile node). Remember to reboot the node after the upgrade.
Then run the clmigcheck command for the last time.
When the clmigcheck command is run for the last time, it recognizes that this node is the
last node of the cluster to migrate. This command then initiates the final phase of the
migration, which configures CAA. You see the message shown in Figure 7-55.
clmigcheck: You can install the new version of PowerHA SystemMirror.
Figure 7-55 Final message from the clmigcheck command
If a problem exists at this stage, you might see the message shown in Figure 7-56.
chile:/ # clmigcheck
Verifying clcomd communication, please be patient.
clmigcheck: Running
/usr/sbin/rsct/install/bin/ct_caa_set_disabled_for_migration
on each node in the cluster
Creating CAA cluster, please be patient.
ERROR: Problems encountered creating the cluster in AIX.
Use the syslog facility to see output from the mkcluster command.
Figure 7-56 Error condition from clmigcheck
If you see a message similar to the one shown in Figure 7-56, the final mkcluster phase
has failed. For more information about this problem, see 10.2, “Troubleshooting the
migration” on page 308.
At this stage, you have upgraded AIX and run the final clmigcheck process. Figure 7-57
shows how the cluster looks now.
Figure 7-57 Rolling migration: Chile post AIX upgrade
3. Upgrade PowerHA on the chile node by following the same procedure that you previously
used.
Reminder: Update the /etc/cluster/rhosts file so that it is the same as the other
nodes that you upgraded. See step 6 on page 183.
In this scenario, we started PowerHA on the chile node and performed a synchronization and
verification of the cluster, which is the final stage of the migration. The newyork_rg resource
group was then moved back to the chile node. The cluster migration is now completed.
Figure 7-58 shows how the cluster looks now.
Figure 7-58 Rolling migration completed
7.4.3 Checking your newly migrated cluster
After the migration is completed, perform the following checks to ensure that everything has
migrated correctly:
•  Verify that CAA is configured and running on all nodes.
Check that CAA is working by running the lscluster -m command. This command returns
information about your cluster from all your nodes. If a problem exists, you see a message
similar to the one shown in Figure 7-59.
# lscluster -m
Cluster services are not active.
Figure 7-59 Message indicating that CAA is not running
If you receive this message, see 10.4.7, “The ‘Cluster services are not active’ message” on
page 323, for details about how to fix this problem.
•  Verify that the CAA private repository is defined and active on all nodes.
Check the lspv output to ensure that the CAA repository is defined and varied on for each
node. You see output similar to what is shown in Figure 7-60.
chile:/ # lspv
caa_private0    000fe40120e16405    caavg_private    active
hdisk2          000fe4114cf8d258    None
Figure 7-60 Extract from lspv showing the CAA repository disk
•  Check the conversion of the PowerHA ODM (see the sketch after this list).
Review the /tmp/clconvert.log file to ensure that the conversion of the PowerHA ODM has
been successful. For additional details about the log files and troubleshooting information,
see 10.1, “Locating the log files” on page 306.
•  Synchronize or verify the cluster.
Run verification on your cluster to ensure that it operates as expected.
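As a quick check of the ODM conversion mentioned in the list above, a sketch such as the
following can be used; it only inspects the log and makes no changes:

   # Show the end of the conversion log and search it for failures
   tail -40 /tmp/clconvert.log
   egrep -i "error|fail" /tmp/clconvert.log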
Troubleshooting: For information about common problems and solutions, see
Chapter 10, “Troubleshooting PowerHA 7.1” on page 305.
7.5 Offline migration
This section explains how to perform an offline migration. The test environment begins with
AIX 6.1.3.2 and PowerHA 6.1.0.2. The migration leads to AIX 7.1.0.1 and PowerHA 7.1.0.1.
7.5.1 Planning the offline migration
Part of planning for any migration is to ensure that you meet all the hardware and software
requirements. For more details, see 7.1, “Considerations before migrating” on page 152, and
7.2, “Understanding the PowerHA 7.1 migration process” on page 153.
Starting configuration
Figure 7-61 on page 192 shows a simplified layout of the cluster that is migrated in this
scenario. Both systems are running AIX 6.1 TL3 SP 2. The installed PowerHA version is 6.1
SP 2.
The cluster layout is a mutual takeover configuration. The munich system is the primary server
for the HTTP application. The berlin system is the primary server for the Network File
System (NFS), which is cross mounted by the system munich.
Because of resource limitations, the disk heartbeat is using one of the existing shared disks.
Two networks are defined:
•  The net_ether_01 network is the administrative network and is used only by the system
   administration team.
•  The net_ether_10 network is used by the applications and their users.
Figure 7-61 Start point for offline migration
Planned target configuration
The plan is to update both systems to AIX 7.1 and to PowerHA SystemMirror 7.1. Because
PowerHA SystemMirror 6.1 SP2 is not supported on AIX 7.1, the quickest way to update it is
through an offline migration. A rolling migration is also possible, but requires the following
migration steps:
1. Update to PowerHA 6.1 SP3 or later (which can be performed by using a nondisruptive
upgrade method).
2. Migrate to AIX 7.1.
3. Migrate to PowerHA 7.1.
PowerHA 6.1 support on AIX 7.1: PowerHA 6.1 SP2 is not supported on AIX 7.1. You
need a minimum of PowerHA 6.1 SP3.
As mentioned in 1.2.3, “The central repository” on page 9, an additional shared disk is
required for the new CAA repository disk. Figure 7-62 shows the results of the completed
migration. To perform the migration, see 7.5.3, “Performing an offline migration” on page 195.
Figure 7-62 Planned configuration for offline migration
7.5.2 Offline migration flow
Figure 7-63 shows a high-level overview of the offline migration flow. First and most
importantly, you must have fulfilled all the new hardware requirements. Then you ensure that
AIX has been upgraded on all cluster nodes before continuing with the update of PowerHA.
To perform the migration, see 7.5.3, “Performing an offline migration” on page 195.
Figure 7-63 Offline migration flow
7.5.3 Performing an offline migration
Before you start the migration, you must complete all hardware and software requirements.
For a list of the requirements, see 7.1, “Considerations before migrating” on page 152.
1. Create a snapshot, copy it to a safe place, and create a system backup (mksysb).
The snapshot and the mksysb are not required to complete the migration, but they might be
helpful if something goes wrong. You can also use the snapshot file to perform a snapshot
migration, and you can use the system backup to restore the system to its original
starting point if necessary.
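The following commands are a hedged sketch of one way to take these backups; the snapshot name,
the description, and the /backup target directory are illustrative assumptions:

   # Create a cluster snapshot (this can also be done through the SMIT snapshot menus)
   /usr/es/sbin/cluster/utilities/clsnapshot -c -n pre_migration -d "before PowerHA 7.1 migration"
   # Copy the snapshot files to a safe location off the cluster nodes
   cp /usr/es/sbin/cluster/snapshots/pre_migration.* /backup/
   # Create a bootable system backup of rootvg
   mksysb -i /backup/$(hostname).mksysb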
2. Stop cluster services on all nodes by running the smitty clstop command. Before you
continue, ensure that cluster services are stopped on all nodes.
3. Update to AIX 6.1.6 or later. Alternatively, perform a migration installation of AIX to
version 7.1 or later.
In this test scenario, a migration installation to version 7.1 is performed on both systems in
parallel.
4. Ensure that the new AIX cluster file sets are installed, specifically the bos.ahafs and
bos.cluster file sets. These file sets are not installed as part of the AIX migration.
5. Restart the systems.
Important: You must restart the systems to ensure that all needed processes for CAA
are running.
6. Verify that the new clcomd subsystem is running.
If the clcomd subsystem is not running, a required file set is missing (see step 4).
Figure 7-64 shows an example of the output indicating that the subsystems are running.
# lssrc -a | grep clcom
 clcomd           caa              3866824      active
 clcomdES         clcomdES         5243068      active
#
Figure 7-64 Verifying if the clcomd subsystem is running
Beginning with PowerHA 6.1 SP3, you can start the cluster at this point if preferred, but we
do not start it now in this scenario.
7. Run the clmigcheck program on one of the cluster nodes.
Important: You must run the clmigcheck program (in the /usr/sbin/ directory) before
you install PowerHA 7.1. Keep in mind that you must run this program on each node
in the cluster, one at a time.
The following steps are required for offline migration when running the clmigcheck
program. The steps might differ slightly if you perform a rolling or snapshot migration.
a. Select option 1 (check ODM configuration) from the first clmigcheck panel
(Figure 7-65).
------------[ PowerHA SystemMirror Migration Check ]------------

Please select one of the following options:

  1 = Check ODM configuration.
  2 = Check snapshot configuration.
  3 = Enter repository disk and multicast IP addresses.

Select one of the above, "x" to exit or "h" for help: 1
Figure 7-65 The clmigcheck main panel
While checking the configuration, you might see warning or error messages. You must
correct errors manually, but issues identified by warning messages are cleaned up
automatically during the migration process. In this case, a warning message (Figure 7-66)
is displayed, indicating that the disk heartbeat network will be removed at the end of the migration.
------------[ PowerHA SystemMirror Migration Check ]------------

CONFIG-WARNING: The configuration contains unsupported hardware: Disk
Heartbeat network. The PowerHA network name is net_diskhb_01. This will be
removed from the configuration during the migration to PowerHA SystemMirror 7.1.

Hit <Enter> to continue
Figure 7-66 Warning message after selecting clmigcheck option 1
b. Continue with the next clmigcheck panel.
Only one error or warning is displayed at a time. Press the Enter key, and any
additional messages are displayed. In this case, only one warning message is
displayed.
Manually correct or fix all issues that are identified by error messages before
continuing with the process. After you fix an issue, restart the system as explained in
step 5 on page 195.
c. Verify that you receive a message similar to the one in Figure 7-67, indicating that the ODM
has no unsupported elements. You must receive this message before you continue with
the clmigcheck process and the installation of PowerHA.
------------[ PowerHA SystemMirror Migration Check ]------------

The ODM has no unsupported elements.
Hit <Enter> to continue
Figure 7-67 ODM check successful message
Press Enter, and the main clmigcheck panel (Figure 7-65 on page 196) is displayed
again.
d. Select option 3 (Enter repository disk and multicast IP addresses).
The next panel (Figure 7-68) lists all available shared disks that might be used for the
CAA repository disk. You need one shared disk for the CAA repository.
------------[ PowerHA SystemMirror Migration Check ]------------

Select the disk to use for the repository

  1 = 00c0f6a01c784107(hdisk4)

Select one of the above or "x" to exit: 1
Figure 7-68 Selecting the repository disk
e. Configure the multicast address as shown in Figure 7-69 on page 198. The system
automatically creates an appropriate address for you. By default, PowerHA creates a
multicast address by replacing the first octet of the IP communication path of the lowest
node in the cluster with 228. Press Enter.
Manually specifying an address: Only specify an address manually if you have an
explicit reason to do so.
Important:
•  You cannot change the selected IP multicast address after the configuration is
   activated.
•  You must set up any routers in the network topology to forward multicast
   messages.
------------[ PowerHA SystemMirror Migration Check ]------------

PowerHA SystemMirror uses multicast address for internal
cluster communication and monitoring. These must be in the
multicast range, 224.0.0.0 - 239.255.255.255.

If you make a NULL entry, AIX will generate an appropriate address for you.

You should only specify an address if you have an explicit reason to do
so, but are cautioned that this address cannot be changed once the
configuration is activated (i.e. migration is complete).

h = help
Enter the multicast IP address to use for network monitoring:
Figure 7-69 Configuring a multicast address
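For example, if the communication path of the lowest node were 192.168.100.51 (a hypothetical
address), the generated multicast address would be 228.168.100.51. The substitution itself is
trivial, as the following illustrative sketch shows:

   # Replace the first octet of the communication path with 228 (illustration only)
   echo 192.168.100.51 | sed 's/^[0-9]*\./228./'
   # Output: 228.168.100.51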
f. From the main clmigcheck panel, type an x to exit the clmigcheck program.
g. In the next panel (Figure 7-70), confirm the exit request by typing y.
------------[ PowerHA SystemMirror Migration Check ]------------

You have requested to exit clmigcheck.
Do you really want to exit? (y) y
Figure 7-70 The clmigcheck exit confirmation message
A warning message (Figure 7-71) is displayed as a reminder to complete all the
previous steps before you exit.
Note - If you have not completed the input of repository disks and
multicast IP addresses, you will not be able to install
PowerHA SystemMirror
Additional details for this session may be found in
/tmp/clmigcheck/clmigcheck.log.
Figure 7-71 The clmigcheck exit warning message
8. Install PowerHA only on the node where the clmigcheck program was executed.
If the clmigcheck program is not run, a failure message (Figure 7-72) is displayed when
you try to install PowerHA 7.1. In this case, return to step 7 on page 195.
                                COMMAND STATUS

Command: failed        stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

[MORE...94]
  restricted by GSA ADP Schedule Contract with IBM Corp.
  . . . . . << End of copyright notice for cluster.es.migcheck >>. . . .

    The /usr/sbin/clmigcheck command must be run to
    verify the back level configuration before you can
    install this version. If you are not migrating the
    back level configuration you must remove it before
    before installing this version.

    Failed /usr/sbin/clmigcheck has not been run
instal: Failed while executing the cluster.es.migcheck.pre_i script.
[MORE...472]

F1=Help             F2=Refresh          F3=Cancel           F6=Command
F8=Image            F9=Shell            F10=Exit            /=Find
n=Find Next
Figure 7-72 PowerHA 7.1 installation failure message
9. Add the host names of your cluster nodes to the /etc/cluster/rhosts file. The names
must match the PowerHA node names.
10.Refresh the clcomd subsystem.
refresh -s clcomd
11.Review the /tmp/clconvert.log file to ensure that a conversion of the PowerHA ODMs
has occurred.
12.Start cluster services only on the node that you updated by using smitty clstart.
13.Ensure that the cluster services have started successfully on this node by using any of the
following commands:
clstat -a
lssrc -ls clstrmgrES | grep state
clmgr query cluster | grep STATE
14.Continue to the next node.
15.Run the clmigcheck program on this node.
Keep in mind that you must run the clmigcheck program on each node before you can
install PowerHA 7.1. Follow the same steps as for the first system as explained in step 7
on page 195.
An error message similar to the one shown in Figure 7-73 indicates that one of the steps
was not performed. Often this message is displayed because the system was not
restarted after the installation of the AIX cluster file sets.
To correct this issue, return to step 4 on page 195. You might have to restart both systems,
depending on which part was missed.
# clmigcheck
Saving existing /tmp/clmigcheck/clmigcheck.log to
/tmp/clmigcheck/clmigcheck.log.bak
rshexec: cannot connect to node munich
ERROR: Internode communication failed,
check the clcomd.log file for more information.
#
Figure 7-73 The clmigcheck execution error message
Attention: Do not start the clcomd subsystem manually. Starting this subsystem manually
can result in further errors, which might require you to re-install this node or all the
cluster nodes.
16.Install PowerHA only on this node in the same way as you did on the first node. See step 8
on page 199.
17.As on the first node, add the host names of your cluster nodes to the /etc/cluster/rhosts
file. The names must be the same as the node names.
18.Refresh the clcomd subsystem.
19.Start the cluster services only on the node that you updated.
20.Ensure that the cluster services started successfully on this node.
21.If you have more than two nodes in your cluster, repeat step 15 on page 199 through step
20 until all of your cluster nodes are updated.
You now have a fully running cluster environment. Before going into production mode, test
your cluster as explained in Chapter 9, “Testing the PowerHA 7.1 cluster” on page 259.
Upon checking the topology information by using the cltopinfo command, all non-IP and disk
heartbeat networks should be removed. If these networks are not removed, see Chapter 10,
“Troubleshooting PowerHA 7.1” on page 305.
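A quick post-migration topology check, assuming the standard PowerHA utility path, might look
like the following sketch; the second command should return nothing if the disk heartbeat
networks were removed:

   # Summarize the cluster topology
   /usr/es/sbin/cluster/utilities/cltopinfo
   # Look for any leftover disk heartbeat definitions (no output is expected)
   /usr/es/sbin/cluster/utilities/cltopinfo | grep -i diskhb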
When checking the RSCT subsystems, the topology subsystem should now be inactive as
shown in Figure 7-74.
# lssrc -a | grep svcs
 grpsvcs          grpsvcs          6684834      active
 emsvcs           emsvcs           5898390      active
 topsvcs          topsvcs                       inoperative
 grpglsm          grpsvcs                       inoperative
 emaixos          emsvcs                        inoperative
Figure 7-74 Checking for topology service
Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster
Monitoring plays an important role in managing issues when a cluster has duplicated
hardware that can “hide” the failing components from the user. It is also essential for tracking
the behavior of a cluster and helping to address performance issues or bad design
implementations.
The role of the administrator is to quickly find relevant information and analyze it to make the
best decision in every situation. This chapter provides several examples that show how the
PowerHA 7.1 administrator can gather information about the cluster by using several
methods.
For most of the examples in this chapter, the korea cluster from the test environment is used
with the participating seoul and busan nodes. All the commands in the examples are executed
as root user.
This chapter includes the following topics:
•  Collecting information before a cluster is configured
•  Collecting information after a cluster is configured
•  Collecting information after a cluster is running
8.1 Collecting information before a cluster is configured
Before you configure the cluster, you must collect the relevant information. Later, the
administrator can use this information to see the changes that have been made after a
configured IBM PowerHA SystemMirror 7.1 for AIX cluster is running. Ensure that this
information is available to assist in troubleshooting and diagnosing the cluster in the future.
This topic lists the relevant information that you might want to collect.
The /etc/hosts file
The /etc/hosts file must have all the IP addresses that are used in the cluster configuration,
including the boot or base addresses, persistent addresses, and service addresses, as
shown in Example 8-1.
Example 8-1 A /etc/hosts sample configuration
seoul, busan:/ # egrep "seoul|busan|poksap" /etc/hosts
192.168.101.143   seoul-b1    # Boot IP label 1
192.168.101.144   busan-b1    # Boot IP label 1
192.168.201.143   seoul-b2    # Boot IP label 2
192.168.201.144   busan-b2    # Boot IP label 2
10.168.101.43     seoul       # Persistent IP
10.168.101.44     busan       # Persistent IP
10.168.101.143    poksap-db   # Service IP label
The /etc/cluster/rhosts file
The /etc/cluster/rhosts file (Example 8-2) in PowerHA 7.1 replaces the
/usr/es/sbin/cluster/etc/rhosts file. This file is populated with the communication paths
that are used when the nodes are defined.
Example 8-2 A /etc/cluster/rhosts sample configuration
seoul, busan:/ # cat /etc/cluster/rhosts
seoul    # Persistent IP address used as communication path
busan    # Persistent IP address used as communication path
CAA subsystems
Cluster Aware AIX (CAA) introduces a new set of subsystems. When the cluster is not
running, their status is inoperative, except for the clcomd subsystem, which is active (Example 8-3).
The clcomdES subsystem has been replaced by the clcomd subsystem, which is no longer part of
the cluster subsystems group. It is now part of the AIX Base Operating System (BOS), not
PowerHA.
Example 8-3 CAA subsystems status
seoul, busan:/ # lssrc -a | grep caa
 clcomd           caa              5505056      active
 cld              caa                           inoperative
 clconfd          caa                           inoperative

busan:/ # lslpp -w /usr/sbin/clcomd
  File                              Fileset               Type
  ----------------------------------------------------------------------------
  /usr/sbin/clcomd                  bos.cluster.rte       File
PowerHA groups
IBM PowerHA 7.1 creates two operating system groups during installation. The group
numbers must be consistent across cluster nodes as shown in Example 8-4.
Example 8-4 Groups created while installing PowerHA file sets
seoul, busan:/ # grep ha /etc/group
hacmp:!:202:
haemrm:!:203:
Disk configuration
With the current code level in AIX 7.1.0.1, the CAA repository cannot be created over virtual
SCSI (VSCSI) disks. For the korea cluster, a DS4800 storage system is used and is accessed
over N_Port ID Virtualization (NPIV). The rootvg volume group is the only one using VSCSI
devices. Example 8-5 shows a list of storage disks.
Example 8-5 Storage disks listing
seoul:/ # lspv
hdisk0          00c0f6a088a155eb    rootvg    active
hdisk1          00c0f6a077839da7    None
hdisk2          00c0f6a0107734ea    None
hdisk3          00c0f6a010773532    None

busan:/ # lspv
hdisk0          00c0f6a089390270    rootvg    active
hdisk1          00c0f6a077839da7    None
hdisk2          00c0f6a0107734ea    None
hdisk3          00c0f6a010773532    None

seoul, busan:/ # lsdev -Cc disk
hdisk0 Available           Virtual SCSI Disk Drive
hdisk1 Available C5-T1-01  MPIO Other DS4K Array Disk
hdisk2 Available C5-T1-01  MPIO Other DS4K Array Disk
hdisk3 Available C5-T1-01  MPIO Other DS4K Array Disk
Network interfaces configuration
The boot or base address is configured as the initial address for each network interface. The
future persistent IP address is aliased over the en0 interface in each node before the
PowerHA cluster configuration. Example 8-6 shows a configuration of the network interfaces.
Example 8-6 Network interfaces configuration
seoul:/ # ifconfig -a
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.143 netmask 0xffffff00 broadcast 192.168.101.255
inet 10.168.101.43 netmask 0xffffff00 broadcast 10.168.101.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.143 netmask 0xffffff00 broadcast 192.168.201.255
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
busan:/ # ifconfig -a
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.101.255
inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.101.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.201.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
Routing table
Keep a record of the routing table because it is an important source of information. As shown
in 8.3.1, “AIX commands and log files” on page 216, the multicast address is not displayed in
this table, even when the CAA and IBM PowerHA clusters are running. Example 8-7 shows the
routing table for the seoul node.
Example 8-7 Routing table
seoul:/ # netstat -rn
Routing tables
Destination        Gateway            Flags    Refs     Use   If    Exp  Groups

Route tree for Protocol Family 2 (Internet):
default            192.168.100.60     UG          1    3489   en0     -     -
10.168.100.0       10.168.101.43      UHSb        0       0   en0              =>
10.168.100/22      10.168.101.43      U          10   39006   en0
10.168.101.43      127.0.0.1          UGHS       11   24356   lo0
10.168.103.255     10.168.101.43      UHSb        0       0   en0
127/8              127.0.0.1          U          12   10746   lo0
192.168.100.0      192.168.101.143    UHSb        0       0   en0              =>
192.168.100/22     192.168.101.143    U           2    1057   en0
192.168.101.143    127.0.0.1          UGHS        0      16   lo0
192.168.103.255    192.168.101.143    UHSb        0      39   en0
192.168.200.0      192.168.201.143    UHSb        0       0   en2              =>
192.168.200/22     192.168.201.143    U           0       2   en2
192.168.201.143    127.0.0.1          UGHS        0       4   lo0
192.168.203.255    192.168.201.143    UHSb        0       0   en2

Route tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1              UH          3   17903   lo0     -     -
Multicast information
You can use the netstat command to display information about an interface for which
multicast is enabled. As shown in Example 8-8 for en0, no multicast address is configured,
other than the default 224.0.0.1 address before the cluster is configured.
Example 8-8 Multicast information
seoul:/ # netstat -a -I en0
Name  Mtu   Network      Address             Ipkts Ierrs   Opkts Oerrs  Coll
en0   1500  link#2       a2.4e.50.54.31.3   304248     0   60964     0     0
                         01:00:5e:7f:ff:fd
                         01:00:5e:00:00:01
en0   1500  192.168.100  seoul-b1           304248     0   60964     0     0
                         239.255.255.253
                         224.0.0.1
en0   1500  10.168.100   seoul              304248     0   60964     0     0
                         239.255.255.253
                         224.0.0.1
Status of the IBM Systems Director common agent subsystems
The two subsystems must be active in every node to be discovered and managed by IBM
Systems Director as shown in Example 8-9. To monitor the cluster using the IBM Systems
Director web and command-line interfaces (CLIs), see 8.3, “Collecting information after a
cluster is running” on page 216.
Example 8-9 Common agent subsystems status
seoul:/ # lssrc -a | egrep "cim|platform"
 platform_agent                    2359482      active
 cimsys                            3211362      active

busan:/ # lssrc -a | egrep "cim|platform"
 platform_agent                    3014798      active
 cimsys                            2818190      active
Cluster status
Before a cluster is configured, the state of every node is NOT_CONFIGURED as shown in
Example 8-10.
Example 8-10 PowerHA cluster status
seoul:/ # lssrc -g cluster
Subsystem         Group            PID          Status
 clstrmgrES       cluster          6947066      active

seoul:/ # lssrc -ls clstrmgrES
Current state: NOT_CONFIGURED
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T10:34:17-05:00$"

busan:/ # lssrc -g cluster
Subsystem         Group            PID          Status
 clstrmgrES       cluster          3342346      active

busan:/ # lssrc -ls clstrmgrES
Current state: NOT_CONFIGURED
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T10:34:17-05:00$"
Modifications in the /etc/syslogd.conf file
During the installation of the PowerHA 7.1 file sets, entries are added to the
/etc/syslogd.conf configuration file as shown in Example 8-11.
Example 8-11 Modifications to the /etc/syslogd.conf file
# PowerHA SystemMirror Critical Messages
local0.crit /dev/console
# PowerHA SystemMirror Informational Messages
local0.info /var/hacmp/adm/cluster.log
# PowerHA SystemMirror Messages from Cluster Scripts
user.notice /var/hacmp/adm/cluster.log
# PowerHA SystemMirror Messages from Cluster Daemons
daemon.notice /var/hacmp/adm/cluster.log
Lines added to the /etc/inittab file
In PowerHA 7.1, the clcomd subsystem has a separate entry in the /etc/inittab file because
the clcomd subsystem is no longer part of the cluster subsystem group. Two entries now exist
as shown in Example 8-12.
Example 8-12 Modification to the /etc/inittab file
clcomd:23456789:once:/usr/bin/startsrc -s clcomd
hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1
8.2 Collecting information after a cluster is configured
After the configuration is done and the first cluster synchronization is performed, the CAA
services become available. Also, the administrator can start using the clcmd utility that
distributes every command passed as an argument to all the cluster nodes.
As soon as the configuration is synchronized to all nodes and the CAA cluster is created, the
administrator cannot change the cluster name or the cluster multicast address.
Changing the repository disk: The administrator can change the repository disk with the
procedure for replacing a repository disk provided in the PowerHA 7.1 Release Notes.
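For example, the following sketch shows the kind of commands that can be distributed with
clcmd; each node's output is printed under its own NODE header:

   # Run the same command on every node in the CAA cluster
   clcmd date
   clcmd lssrc -g caa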
Disk configuration
During the first successful synchronization, the CAA repository is created on the chosen
disk. On each node, the hdisk device is renamed according to the new cluster unified
nomenclature: its name changes to caa_private0. The repository volume group is called
caavg_private and is in active state in every node.
After the first synchronization, two other disks are added in the cluster storage by using the
following command:
chcluster -n korea -d+hdisk2,hdisk3
where hdisk2 is renamed to cldisk2, and hdisk3 is renamed to cldisk1. Example 8-13
shows the resulting disk listing.
Example 8-13 Disk listing
seoul:/ # clcmd lspv

-------------------------------
NODE seoul
-------------------------------
hdisk0          00c0f6a088a155eb    rootvg           active
caa_private0    00c0f6a077839da7    caavg_private    active
cldisk2         00c0f6a0107734ea    None
cldisk1         00c0f6a010773532    None

-------------------------------
NODE busan
-------------------------------
hdisk0          00c0f6a089390270    rootvg           active
caa_private0    00c0f6a077839da7    caavg_private    active
cldisk2         00c0f6a0107734ea    None
cldisk1         00c0f6a010773532    None
Attention: The cluster repository disk is a special device for the cluster. The use of Logical
Volume Manager (LVM) commands over the repository disk is not supported. AIX LVM
commands are single node commands and are not intended for use in a clustered
configuration.
Multicast information
Compared with the multicast information collected when the cluster was not configured, the
netstat command now shows the 228.168.101.43 address in the table (Example 8-14).
Example 8-14 Multicast information
seoul:/ # netstat -a -I en0
Name  Mtu   Network      Address             Ipkts Ierrs   Opkts Oerrs  Coll
en0   1500  link#2       a2.4e.50.54.31.3    70339     0   44686     0     0
                         01:00:5e:28:65:2b
                         01:00:5e:7f:ff:fd
                         01:00:5e:00:00:01
en0   1500  192.168.100  seoul-b1            70339     0   44686     0     0
                         228.168.101.43
                         239.255.255.253
                         224.0.0.1
en0   1500  10.168.100   seoul               70339     0   44686     0     0
                         228.168.101.43
                         239.255.255.253
                         224.0.0.1
Cluster status
The cluster status changes from NOT_CONFIGURED to ST_INIT as shown in Example 8-15.
Example 8-15 PowerHA cluster status
busan:/ # lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T10:34:17-05:00$"
CAA subsystem group active
All the CAA subsystems become active after the first cluster synchronization as shown in
Example 8-16.
Example 8-16 CAA subsystems status
seoul:/ # clcmd lssrc -g caa

-------------------------------
NODE seoul
-------------------------------
Subsystem         Group            PID          Status
 cld              caa              3735780      active
 clcomd           caa              5439664      active
 clconfd          caa              4915418      active
 solidhac         caa              6947064      active
 solid            caa              5701642      active

-------------------------------
NODE busan
-------------------------------
Subsystem         Group            PID          Status
 cld              caa              3211462      active
 clcomd           caa              2687186      active
 solid            caa              6160402      active
 clconfd          caa              6488286      active
 solidhac         caa              5439698      active
Subsystem guide:
•  cld determines whether the local node must become the primary or secondary solidDB
   server in a failover.
•  The solid subsystem is the database engine.
•  The solidhac subsystem is used for the high availability of the solidDB server.
•  The clconfd subsystem runs every 10 minutes to put any missed cluster configuration
   changes into effect on the local node.
Cluster information using the lscluster command
CAA comes with a set of command-line tools, as explained in the following sections, that can
be used to monitor the status and statistics of a running cluster. For more information about
CAA and its functionalities, see Chapter 2, “Features of PowerHA SystemMirror 7.1” on
page 23.
Listing the cluster configuration: -c flag
Example 8-17 shows the cluster configuration by using the lscluster -c command.
Example 8-17 Listing the cluster configuration
seoul:/ # lscluster -c
Cluster query for cluster korea returns:
Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103
Number of nodes in cluster = 2
Cluster id for node busan is 1
Primary IP address for node busan is 10.168.101.44
Cluster id for node seoul is 2
Primary IP address for node seoul is 10.168.101.43
Number of disks in cluster = 2
for disk cldisk1 UUID = fe1e9f03-005b-3191-a3ee-4834944fcdeb
cluster_major = 0 cluster_minor = 1
for disk cldisk2 UUID = 428e30e8-657d-8053-d70e-c2f4b75999e2
cluster_major = 0 cluster_minor = 2
Multicast address for cluster is 228.168.101.43
Tip: The primary IP address shown for each node is the IP address chosen as the
communication path during cluster definition. In this case, the address is the same IP
address that is used as the persistent IP address.
The multicast address, when not specified by the administrator during cluster creation, is
composed of the number 228 followed by the last three octets of the communication path
of the node where the synchronization is executed. In this particular example, the
synchronization was run from the seoul node, which has the communication path
192.168.101.43. Therefore, the multicast address for the cluster becomes 228.168.101.43,
as can be observed in the output of the lscluster -c command.
Listing the cluster nodes configuration: -m flag
The -m flag has a different output in each node. In the output shown in Example 8-18, clcmd is
used to distribute the command over all cluster nodes.
Example 8-18 Listing the cluster nodes configuration
seoul:/ # clcmd lscluster -m
------------------------------NODE seoul
------------------------------Calling node query for all nodes
Node query number of nodes examined: 2
Node name: busan
Cluster shorthand id for node: 1
uuid for node: e356646e-c0dd-11df-b51d-a24e57e18a03
State of node: UP
Smoothed rtt to node: 7
Chapter 8. Monitoring a PowerHA SystemMirror 7.1 for AIX cluster
209
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME
TYPE SHID
UUID
korea
local
a01f47fe-d089-11df-95b5-a24e50543103
Number of points_of_contact for node: 2
Point-of-contact interface & contact state
en2 UP
en0 UP
-----------------------------Node name: seoul
Cluster shorthand id for node: 2
uuid for node: 4f8858be-c0dd-11df-930a-a24e50543103
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME       TYPE   SHID   UUID
korea              local         a01f47fe-d089-11df-95b5-a24e50543103
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
------------------------------NODE busan
------------------------------Calling node query for all nodes
Node query number of nodes examined: 2
Node name: busan
Cluster shorthand id for node: 1
uuid for node: e356646e-c0dd-11df-b51d-a24e57e18a03
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME       TYPE   SHID   UUID
korea              local         a01f47fe-d089-11df-95b5-a24e50543103
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
-----------------------------Node name: seoul
Cluster shorthand id for node: 2
uuid for node: 4f8858be-c0dd-11df-930a-a24e50543103
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME       TYPE   SHID   UUID
korea              local         a01f47fe-d089-11df-95b5-a24e50543103
Number of points_of_contact for node: 2
Point-of-contact interface & contact state
en2 UP
en0 UP
Zone: Example 8-18 on page 209 mentions zones. A zone is a concept that is planned for
use in future versions of CAA, where the node can be part of different groups of machines.
Listing the cluster interfaces: -i flag
The korea cluster is configured with NPIV through the VIOS. To have SAN heartbeating, you
must route the SAN connection through Fibre Channel (FC) adapters. In Example 8-19, a cluster
that meets this requirement (the au_cl cluster) is used to demonstrate the output.
Example 8-19 Listing the cluster interfaces
sydney:/ # lscluster -i
Network/Storage Interface Query
Cluster Name: au_cl
Cluster uuid: 0252a470-c216-11df-b85d-6a888564f202
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = a6ac83d4-c1d4-11df-8953-6a888564f202
Number of interfaces discovered = 4
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.85.64.f2.2
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.85.64.f2.4
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = c89d962c-c1d4-11df-aa87-6a888dd67502
Number of interfaces discovered = 4
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.8d.d6.75.2
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 6a.88.8d.d6.75.4
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
rtt: The round-trip time (rtt) is calculated by using a mean deviation formula. Some
commands show rrt instead of rtt, which is believed to be a typographic error in the
command.
sfwcom: Storage Framework Communication (sfwcom) is the interface created by CAA for
SAN heartbeating. To enable sfwcom, the following prerequisites must be in place:
Each node must have either a 4 Gb or 8 Gb FC adapter. If you are using vSCSI or
NPIV, VIOS 2.2.0.11-FP24 SP01 is the minimum level required.
The adapters that are used for SAN heartbeating must have the tme (target mode enabled)
parameter set to yes. The Fibre Channel controller must have the dyntrk parameter set
to yes and the fc_err_recov parameter set to fast_fail, as shown in the example commands
after this list.
All the adapters participating in the heartbeating must be in the same fabric zone. In the
previous example, sydney-fcs0 and perth-fcs0 are in the same fabric zone;
sydney-fcs1 and perth-fcs1 are in the same fabric zone.
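The following commands are a minimal sketch of how these attributes can be set. The device
names fcs0 and fscsi0 are assumptions and must be replaced with the adapter instances on your
nodes; the -P flag defers the change until the devices are reconfigured or the node is restarted:
   chdev -l fcs0 -a tme=yes -P                                 # enable target mode on the FC adapter
   chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P  # dynamic tracking and fast I/O failure
   lsattr -El fcs0 -a tme                                      # verify the settings after reconfiguration
   lsattr -El fscsi0 -a dyntrk -a fc_err_recov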
dpcom: The dpcom interface is the actual repository disk. It means that, on top of the
Ethernet and the Fibre Channel adapters, the cluster also uses the repository disk as a
physical medium to exchange heartbeats among the nodes.
Excluding configured interfaces: Currently you cannot exclude configured interfaces
from being used for cluster monitoring and communication. All network interfaces are used
for cluster monitoring and communication.
Listing the cluster storage interfaces: -d flag
Example 8-20 shows all storage disks that are participating in the cluster, including the
repository disk.
Example 8-20 Listing cluster storage interfaces
seoul:/ # clcmd lscluster -d
------------------------------
NODE seoul
------------------------------
Storage Interface Query

Cluster Name:  korea
Cluster uuid:  a01f47fe-d089-11df-95b5-a24e50543103

Number of nodes reporting = 2
Number of nodes expected = 2

Node seoul
Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103
Number of disk discovered = 3
        cldisk2
          state : UP
           uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
           uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
           type : CLUSDISK
        cldisk1
          state : UP
           uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
           uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
           type : CLUSDISK
        caa_private0
          state : UP
           uDid :
           uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
           type : REPDISK

Node seoul
Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103
Number of disk discovered = 3
        cldisk2
          state : UP
           uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
           uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
           type : CLUSDISK
        cldisk1
          state : UP
           uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
           uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
           type : CLUSDISK
        caa_private0
          state : UP
           uDid :
           uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
           type : REPDISK

------------------------------
NODE busan
------------------------------
Storage Interface Query

Cluster Name:  korea
Cluster uuid:  a01f47fe-d089-11df-95b5-a24e50543103

Number of nodes reporting = 2
Number of nodes expected = 2

Node busan
Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03
Number of disk discovered = 3
        cldisk1
          state : UP
           uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
           uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
           type : CLUSDISK
        cldisk2
          state : UP
           uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
           uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
           type : CLUSDISK
        caa_private0
          state : UP
           uDid :
           uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
           type : REPDISK

Node busan
Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03
Number of disk discovered = 3
        cldisk1
          state : UP
           uDid : 3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
           uUid : fe1e9f03-005b-3191-a3ee-4834944fcdeb
           type : CLUSDISK
        cldisk2
          state : UP
           uDid : 3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp
           uUid : 428e30e8-657d-8053-d70e-c2f4b75999e2
           type : CLUSDISK
        caa_private0
          state : UP
           uDid :
           uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
           type : REPDISK
Listing the network statistics: -s flag
Example 8-21 shows overall statistics about cluster heartbeating and the gossip protocol
that is used for node communication.
Example 8-21 Listing the network statistics
seoul:/ # lscluster -s
Cluster Statistics:
Cluster Network Statistics:
pkts seen:194312                     pkts passed:66305
IP pkts:126210                       UDP pkts:127723
gossip pkts sent:22050               gossip pkts recv:64076
cluster address pkts:0               CP pkts:127497
bad transmits:0                      bad posts:0
short pkts:0                         multicast pkts:127768
cluster wide errors:0                bad pkts:0
dup pkts:3680                        pkt fragments:0
fragments queued:0                   fragments freed:0
requests dropped:0                   pkts routed:0
pkts pulled:0                        no memory:0
rxmit requests recv:21               requests found:21
requests missed:0                    ooo pkts:2
requests reset sent:0                reset recv:0
requests lnk reset send :0           reset lnk recv:0
rxmit requests sent:5
alive pkts sent:0                    alive pkts recv:0
ahafs pkts sent:17                   ahafs pkts recv:7
nodedown pkts sent:0                 nodedown pkts recv:0
socket pkts sent:733                 socket pkts recv:414
cwide pkts sent:230                  cwide pkts recv:230
socket pkts no space:0               pkts recv notforhere:0
stale pkts recv:0                    other cluster pkts:0
storage pkts sent:1                  storage pkts recv:1
out-of-range pkts recv:0
8.3 Collecting information after a cluster is running
Up to this point, all the examples in this chapter collected information about a non-running
PowerHA 7.1 cluster. This section explains how to obtain valuable information from a
configured and running cluster.
WebSMIT: WebSMIT is no longer a supported tool.
8.3.1 AIX commands and log files
AIX 7.1, which is used in the korea cluster, provides a set of tools that can be used to collect
relevant information about the cluster, cluster services, and cluster device status. This section
shows examples of that type of information.
Disk configuration
All the volume groups that are controlled by a resource group are shown as concurrent on both
nodes, as shown in Example 8-22.
Example 8-22 Listing disks
seoul:/ # clcmd lspv
------------------------------
NODE seoul
------------------------------
hdisk0          00c0f6a088a155eb    rootvg          active
caa_private0    00c0f6a077839da7    caavg_private   active
cldisk2         00c0f6a0107734ea    pokvg           concurrent
cldisk1         00c0f6a010773532    pokvg           concurrent

------------------------------
NODE busan
------------------------------
hdisk0          00c0f6a089390270    rootvg          active
caa_private0    00c0f6a077839da7    caavg_private   active
cldisk2         00c0f6a0107734ea    pokvg           concurrent
cldisk1         00c0f6a010773532    pokvg           concurrent
Multicast information
When compared with the multicast information collected when the cluster is not configured,
the netstat command shows that the 228.168.101.43 address is present in the table. See
Example 8-23.
Example 8-23 Multicast information
seoul:/ # netstat -a -I en0
Name  Mtu   Network       Address             Ipkts  Ierrs  Opkts  Oerrs  Coll
en0   1500  link#2        a2.4e.50.54.31.3    82472  0      53528  0      0
                          01:00:5e:28:65:2b
                          01:00:5e:7f:ff:fd
                          01:00:5e:00:00:01
en0   1500  192.168.100   seoul-b1            82472  0      53528  0      0
                          228.168.101.43
                          239.255.255.253
                          224.0.0.1
en0   1500  10.168.100    seoul               82472  0      53528  0      0
                          228.168.101.43
                          239.255.255.253
                          224.0.0.1

seoul:/ # netstat -a -I en2
Name  Mtu   Network       Address             Ipkts  Ierrs  Opkts  Oerrs  Coll
en2   1500  link#3        a2.4e.50.54.31.7    44673  0      22119  0      0
                          01:00:5e:7f:ff:fd
                          01:00:5e:28:65:2b
                          01:00:5e:00:00:01
en2   1500  192.168.200   seoul-b2            44673  0      22119  0      0
                          239.255.255.253
                          228.168.101.43
                          224.0.0.1
en2   1500  10.168.100    poksap-db           44673  0      22119  0      0
                          239.255.255.253
                          228.168.101.43
                          224.0.0.1
Status of the cluster
When the PowerHA cluster is running, its status changes from ST_INIT to ST_STABLE as
shown in Example 8-24.
Example 8-24 PowerHA cluster status
seoul:/ # lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "$Header: @(#) 61haes_r710_integration/13
43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1034A_61haes_r710
2010-08-19T10:34:17-05:00$"
i_local_nodeid 1, i_local_siteid -1, my_handle 2
ml_idx[1]=0
ml_idx[2]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 12 #
Note: Version 12 represents PowerHA SystemMirror 7.1
local node vrmf is 7101
cluster fix level is "1"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1 NodeName - busan
PgSpFree = 1308144 PvPctBusy = 0 PctTotalTimeIdle = 98.105654
DNP Values for NodeId - 2 NodeName - seoul
PgSpFree = 1307899 PvPctBusy = 0 PctTotalTimeIdle = 96.912367
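For scripting or for a quick check, you can extract only the state line from this output. The
following one-liner is a minimal sketch:
   seoul:/ # lssrc -ls clstrmgrES | grep "Current state"
   Current state: ST_STABLE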
Group Services information
Previous versions of PowerHA use the grpsvcs subsystem. PowerHA 7.1 uses the cthags
subsystem. The output of the lssrc -ls cthags command has similar information to what
used to be presented by the lssrc -ls grpsvcs command. Example 8-25 shows this output.
Example 8-25 Output of the lssrc -ls cthags command
seoul:/ # lssrc -ls cthags
Subsystem         Group            PID          Status
 cthags           cthags           6095048      active
5 locally-connected clients. Their PIDs:
6160578(IBM.ConfigRMd) 1966256(rmcd) 3604708(IBM.StorageRMd) 7078046(clstrmgr)
14680286(gsclvmd)
HA Group Services domain information:
Domain established by node 1
Number of groups known locally: 8
                                    Number of     Number of local
Group name                          providers     providers/subscribers
rmc_peers                               2               1       0
s00O3RA00009G0000015CDBQGFL             2               1       0
IBM.ConfigRM                            2               1       0
IBM.StorageRM.v1                        2               1       0
CLRESMGRD_1108531106                    2               1       0
CLRESMGRDNPD_1108531106                 2               1       0
CLSTRMGR_1108531106                     2               1       0
d00O3RA00009G0000015CDBQGFL             2               1       0
Critical clients will be terminated if unresponsive
Network configuration and routing table
The service IP address is added to an interface on the node where the resource group is
started. The routing table also keeps the service IP address. The multicast address is not
displayed in the routing table. See Example 8-26.
Example 8-26 Network configuration and routing table
seoul:/ # clcmd ifconfig -a
------------------------------NODE seoul
-------------------------------
en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.143 netmask 0xffffff00 broadcast 192.168.103.255
inet 10.168.101.43 netmask 0xffffff00 broadcast 10.168.103.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.143 netmask 0xffffff00 broadcast 192.168.203.255
inet 10.168.101.143 netmask 0xffffff00 broadcast 10.168.103.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
------------------------------NODE busan
------------------------------en0:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.101.144 netmask 0xffffff00 broadcast 192.168.103.255
inet 10.168.101.44 netmask 0xffffff00 broadcast 10.168.103.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en2:
flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT
,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
inet 192.168.201.144 netmask 0xffffff00 broadcast 192.168.203.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LAR
GESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
seoul:/ # clcmd netstat -rn

------------------------------
NODE seoul
------------------------------
Routing tables
Destination        Gateway            Flags     If

Route tree for Protocol Family 2 (Internet):
default            192.168.100.60     UG        en0
10.168.100.0       10.168.101.43      UHSb      en0
10.168.100.0       10.168.101.143     UHSb      en2
10.168.100/22      10.168.101.43      U         en0
10.168.100/22      10.168.101.143     U         en2
10.168.101.43      127.0.0.1          UGHS      lo0
10.168.101.143     127.0.0.1          UGHS      lo0
10.168.103.255     10.168.101.43      UHSb      en0
10.168.103.255     10.168.101.143     UHSb      en2
127/8              127.0.0.1          U         lo0
192.168.100.0      192.168.101.143    UHSb      en0
192.168.100/22     192.168.101.143    U         en0
192.168.101.143    127.0.0.1          UGHS      lo0
192.168.103.255    192.168.101.143    UHSb      en0
192.168.200.0      192.168.201.143    UHSb      en2
192.168.200/22     192.168.201.143    U         en2
192.168.201.143    127.0.0.1          UGHS      lo0
192.168.203.255    192.168.201.143    UHSb      en2

Route tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1              UH        lo0

------------------------------
NODE busan
------------------------------
Routing tables
Destination        Gateway            Flags     If

Route tree for Protocol Family 2 (Internet):
default            192.168.100.60     UG        en0
10.168.100.0       10.168.101.44      UHSb      en0
10.168.100/22      10.168.101.44      U         en0
10.168.101.44      127.0.0.1          UGHS      lo0
10.168.103.255     10.168.101.44      UHSb      en0
127/8              127.0.0.1          U         lo0
192.168.100.0      192.168.101.144    UHSb      en0
192.168.100/22     192.168.101.144    U         en0
192.168.101.144    127.0.0.1          UGHS      lo0
192.168.103.255    192.168.101.144    UHSb      en0
192.168.200.0      192.168.201.144    UHSb      en2
192.168.200/22     192.168.201.144    U         en2
192.168.201.144    127.0.0.1          UGHS      lo0
192.168.203.255    192.168.201.144    UHSb      en2

Route tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1              UH        lo0
Using tcpdump, iptrace, and mping utilities to monitor multicast traffic
With the introduction of the multicast address and the gossip protocol, the cluster
administrator can use tools to monitor Ethernet heartbeating. The following sections explain
how to use the native AIX tcpdump, iptrace, and mping tools for this type of monitoring.
The tcpdump utility
You can dump all the traffic between the seoul node and the multicast address
228.168.101.43 by using the tcpdump utility. Observe that the UDP packets originate in the
base or boot addresses of the interfaces, not in the persistent or service IP labels.
Example 8-27 shows how to list the available interfaces and then capture traffic for the en2
interface.
Example 8-27 Multicast packet monitoring for the seoul node using the tcpdump utility
seoul:/ # tcpdump -D
1.en0
2.en2
3.lo0
seoul:/ # tcpdump -t -i2 -v ip and host 228.168.101.43
tcpdump: listening on en0, link-type 1, capture size 96 bytes
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    seoul-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
The same information is captured on the busan node as shown in Example 8-28.
Example 8-28 Multicast packet monitoring for the busan node using the tcpdump utility
busan:/tmp # tcpdump -D
1.en0
2.en2
3.lo0
busan:/ # tcpdump -t -i2 -v ip and host 228.168.101.43
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b2.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
IP (tos 0x0, ttl 32, id 0, offset 0, flags [none], proto: UDP (17), length: 1478)
    busan-b1.drmsfsd > 228.168.101.43.drmsfsd: UDP, length 1450
You can also see the multicast traffic for all the PowerHA 7.1 clusters in your LAN segment.
The following command generates the output:
seoul:/ # tcpdump -n -vvv port drmsfsd
The iptrace utility
The iptrace utility provides more detailed packet tracing information than the tcpdump
utility. Both the en0 (MAC address A24E50543103) and en2 (MAC address A24E50543107)
interfaces are generating packets toward the cluster multicast address 228.168.101.43,
as shown in Example 8-29.
Example 8-29 The iptrace utility for monitoring multicast packets
seoul:/tmp # iptrace -a -s 228.168.101.43 -b korea_cluster.log; sleep 30
[10289364]
seoul:/tmp # kill -9 10289364
seoul:/tmp # /usr/sbin/ipreport korea_cluster.log | more
IPTRACE version: 2.0
====( 1492 bytes transmitted on interface en0 )==== 12:49:17.384871427
ETHERNET packet : [ a2:4e:50:54:31:03 -> 01:00:5e:28:65:2b ] type 800 (IP)
IP header breakdown:
< SRC = 192.168.101.143 > (seoul-b1)
< DST = 228.168.101.43 >
ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0
ip_ttl=32, ip_sum=251c, ip_p = 17 (UDP)
UDP header breakdown:
<source port=4098(drmsfsd), <destination port=4098(drmsfsd) >
[ udp length = 1458 | udp checksum = 0 ]
00000000
00000009 100234c8 00000030 00000000
|......4....0....|
00000010
1be40fb0 c19311df 920ca24e 50543103
|...........NPT1.|
********
00000030
ffffffff ffffffff ffffffff ffffffff
|................|
00000040
00001575 00000000 00000000 00000000
|...u............|
00000050
00000000 00000003 00000000 00000000
|................|
00000060
00000000 00000000 00020001 00020fb0
|................|
00000070
c19311df 1be40fb0 c19311df 920ca24e
|...............N|
00000080
50543103 0000147d 00000000 4f8858be
|PT1....}....O.X.|
00000090
c0dd11df 930aa24e 50543103 00000000
|.......NPT1.....|
000000a0
00000000 00000000 00000000 00000000
|................|
********
000005a0
00000000 00000000 0000
|..........
|
====( 1492 bytes transmitted on interface en0 )==== 12:49:17.388085181
ETHERNET packet : [ a2:4e:50:54:31:03 -> 01:00:5e:28:65:2b ] type 800 (IP)
IP header breakdown:
< SRC = 192.168.101.143 > (seoul-b1)
< DST = 228.168.101.43 >
ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0
ip_ttl=32, ip_sum=251c, ip_p = 17 (UDP)
UDP header breakdown:
<source port=4098(drmsfsd), <destination port=4098(drmsfsd) >
[ udp length = 1458 | udp checksum = 0 ]
00000000
00000004 10021002 00000070 00000000
|...........p....|
00000010
1be40fb0 c19311df 920ca24e 50543103
|...........NPT1.|
********
00000030
ffffffff ffffffff ffffffff ffffffff
|................|
00000040
00001575 00000000 00000000 00000000
|...u............|
00000050
f1000815 b002b8a0 00000000 00000000
|................|
00000060
00000000 00000000 0002ffff 00010000
|................|
00000070
00000000 00000000 00000000 00000000
|................|
00000080
00000000 00000d7a 00000000 00000000
|.......z........|
00000090
00000000 00000000 00000000 00000000
|................|
000000a0
00000000 00020000 00000000 00000000
|................|
000000b0
00000000 00000000 00000000 00001575
|...............u|
000000c0
00000001 4f8858be c0dd11df 930aa24e
|....O.X........N|
000000d0
50543103 00000001 00000000 00000000
|PT1.............|
000000e0
00000000 00000000 00000000 00000000
|................|
********
000005a0
00000000 00000000 0000
|..........
|
====( 1492 bytes transmitted on interface en2 )==== 12:49:17.394219029
ETHERNET packet : [ a2:4e:50:54:31:07 -> 01:00:5e:28:65:2b ] type 800 (IP)
IP header breakdown:
< SRC = 192.168.201.143 > (seoul-b2)
< DST = 228.168.101.43 >
ip_v=4, ip_hl=20, ip_tos=0, ip_len=1478, ip_id=0, ip_off=0
ip_ttl=32, ip_sum=c11b, ip_p = 17 (UDP)
UDP header breakdown:
<source port=4098(drmsfsd), <destination port=4098(drmsfsd) >
[ udp length = 1458 | udp checksum = 0 ]
00000000
00000009 100234c8 00000030 00000000
|......4....0....|
00000010
a01f47fe d08911df 95b5a24e 50543103
|..G........NPT1.|
********
00000030
ffffffff ffffffff ffffffff ffffffff
|................|
00000040
00000fab 00000000 00000000 00000000
|................|
00000050
00000000 00000003 00000000 00000000
|................|
00000060
00000000 00000000 00020001 000247fe
|..............G.|
00000070
d08911df a01f47fe d08911df 95b5a24e
|......G........N|
00000080
50543103 000014b4 00000000 4f8858be
|PT1.........O.X.|
00000090
c0dd11df 930aa24e 50543103 00000000
|.......NPT1.....|
000000a0
00000000 00000000 00000000 00000000
|................|
********
000005a0
00000000 00000000 0000
|..........
|
.
.
.
Tip: You can observe the multicast address in the last line of the output of the lscluster -c
CAA command.
The mping utility
You can also use the mping utility to test multicast connectivity. One node acts as a sender
of packets, and the other node acts as a receiver of packets. You run the command on
both nodes at the same time as shown in Example 8-30.
Example 8-30 Using the mping utility to test multicast connectivity
seoul:/ # mping -v -s -a 228.168.101.43
mping version 1.0
Localhost is seoul, 10.168.101.43
mpinging 228.168.101.43/4098 with ttl=32:
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.260 ms
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.326 ms
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.344 ms
32 bytes from 10.168.101.44: seqno=1 ttl=32 time=0.361 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.235 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.261 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.299 ms
32 bytes from 10.168.101.44: seqno=2 ttl=32 time=0.317 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.216 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.262 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.282 ms
32 bytes from 10.168.101.44: seqno=3 ttl=32 time=0.300 ms

busan:/ # mping -v -r -a 228.168.101.43
mping version 1.0
Localhost is busan, 10.168.101.44
Listening on 228.168.101.43/4098:
Replying to mping from 10.168.101.43 bytes=32 seqno=1 ttl=32
Replying to mping from 10.168.101.43 bytes=32 seqno=1 ttl=32
Discarding receiver packet
Discarding receiver packet
Replying to mping from 10.168.101.43 bytes=32 seqno=2 ttl=32
Replying to mping from 10.168.101.43 bytes=32 seqno=2 ttl=32
Discarding receiver packet
Discarding receiver packet
Replying to mping from 10.168.101.43 bytes=32 seqno=3 ttl=32
Replying to mping from 10.168.101.43 bytes=32 seqno=3 ttl=32
Discarding receiver packet
Discarding receiver packet
8.3.2 CAA commands and log files
This section explains the commands specifically for gathering CAA-related information and
the associated log files.
Cluster information
The CAA comes with a set of command-line tools, as explained in “Cluster information using
the lscluster command” on page 209. These tools can be used to monitor the status and
statistics of a running cluster. For more information about CAA and its functionalities, see
Chapter 2, “Features of PowerHA SystemMirror 7.1” on page 23.
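Because some lscluster flags produce node-specific output (as seen with the -m flag), you can
wrap the command with clcmd to query every cluster node at once. A minimal sketch:
   seoul:/ # clcmd lscluster -s     # gather the cluster network statistics from all nodes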
Cluster repository disk, CAA, and solidDB
This section provides additional information about the cluster repository disk, CAA, and
solidDB.
UUID
The UUID of the caa_private0 disk is stored as a cluster0 device attribute as shown in
Example 8-31.
Example 8-31 The cluster0 device attributes
seoul:/ # lsattr -El cluster0
clvdisk   03e41dc1-3b8d-c422-3426-f1f61c567cda  Cluster repository disk identifier  True
node_uuid 4f8858be-c0dd-11df-930a-a24e50543103  OS image identifier                 True
Example 8-32 also shows the UUID.
Example 8-32 UUID
caa_private0
  state : UP
   uDid :
   uUid : 03e41dc1-3b8d-c422-3426-f1f61c567cda
   type : REPDISK
The repository disk contains logical volumes for the bootstrap and solidDB file systems as
shown in Example 8-33.
Example 8-33 Repository logical volumes
seoul:/ # lsvg -l caavg_private
caavg_private:
LV NAME          TYPE   LPs   PPs   PVs   LV STATE      MOUNT POINT
caalv_private1   boot   1     1     1     closed/syncd  N/A
caalv_private2   boot   1     1     1     closed/syncd  N/A
caalv_private3   boot   4     4     1     open/syncd    N/A
fslv00           jfs2   4     4     1     closed/syncd  /clrepos_private1
fslv01           jfs2   4     4     1     open/syncd    /clrepos_private2
powerha_crlv     boot   1     1     1     closed/syncd  N/A
Querying the bootstrap repository
Example 8-34 shows the bootstrap repository.
Example 8-34 Querying the bootstrap repository
seoul:/ # /usr/lib/cluster/clras dumprepos
HEADER
CLUSRECID:   0xa9c2d4c2
Name:        korea
UUID:        a01f47fe-d089-11df-95b5-a24e50543103
SHID:        0x0
Data size:   1536
Checksum:    0xc197
Num zones:   0
Dbpass:      a0305b84_d089_11df_95b5_a24e50543103
Multicast:   228.168.101.43

DISKS
name      devno  uuid                                  udid
cldisk1   1      fe1e9f03-005b-3191-a3ee-4834944fcdeb  3E213600A0B8000291B080000E90C05B0CD4B0F1815 FAStT03IBMfcp
cldisk2   2      428e30e8-657d-8053-d70e-c2f4b75999e2  3E213600A0B8000114632000009554C8E0B010F1815 FAStT03IBMfcp

NODES
numcl  numz  uuid                                  shid  name
0      0     4f8858be-c0dd-11df-930a-a24e50543103  2     seoul
0      0     e356646e-c0dd-11df-b51d-a24e57e18a03  1     busan

ZONES
none
The solidDB status
You can use the command shown in Example 8-35 to check which node currently hosts the
active solidDB database.
Example 8-35 The solidDB status
seoul:/ # clcmd /opt/cluster/solidDB/bin/solcon -x pwdfile:/etc/cluster/dbpass -e "hsb state"
"tcp 2188" caa
------------------------------NODE seoul
------------------------------IBM solidDB Remote Control - Version 6.5.0.0 Build 0010
(c) Solid Information Technology Ltd. 1993, 2009
SECONDARY ACTIVE
------------------------------NODE busan
------------------------------IBM solidDB Remote Control - Version 6.5.0.0 Build 0010
(c) Solid Information Technology Ltd. 1993, 2009
PRIMARY ACTIVE
Tip: The solidDB database is not necessarily active in the same node where the PowerHA
resource group is active. You can see this difference when comparing Example 8-35 with
the output of the clRGinfo command:
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name             Group State                   Node
-----------------------------------------------------------------------------
db2pok_Resourc         ONLINE                        seoul
                       OFFLINE                       busan
In this case, the solidDB database has the primary database active in the busan node, and
the PowerHA resource group is currently settled in the seoul node.
Another way to check which node has solidDB active is to use the lssrc command.
Example 8-36 shows that solidDB is active in the seoul node. Observe the line that says
“Group Leader.”
Example 8-36 Using the lssrc command to check where solidDB is active
seoul:/ # lssrc -ls IBM.StorageRM
Subsystem         : IBM.StorageRM
PID               : 7077950
Cluster Name      : korea
Node Number       : 2
Daemon start time : 10/05/10 10:06:57
PeerNodes: 2
QuorumNodes: 2
Group IBM.StorageRM.v1:
ConfigVersion: 0x24cab3184
Providers: 2
QuorumMembers: 2
Group Leader: seoul, 0xdc82faf0908920dc, 2
Information from malloc about memory use:
Total Space
: 0x00be0280 (12452480)
Allocated Space: 0x007ec198 (8307096)
Unused Space
: 0x003ed210 (4117008)
Freeable Space : 0x00000000 (0)
Information about trace levels:
_SEU Errors=255 Info=0 API=0 Buffer=0 SvcTkn=0 CtxTkn=0
_SEL Errors=255 Info=0 API=0 Buffer=0 Perf=0
_SEI Error=0 API=0 Mapping=0 Milestone=0 Diag=0
_SEA Errors=255 Info=0 API=0 Buffer=0 SVCTKN=0 CTXTKN=0
_MCA Errors=255 Info=0 API=0 Callbacks=0 Responses=0 RspPtrs=0
Protocol=0 APItoProto=0 PrototoRsp=0 CommPath=0 Thread=0 ThreadCtrl=0
RawProtocol=0 Signatures=0
_RCA RMAC_SESSION=0 RMAC_COMMANDGROUP=0 RMAC_REQUEST=0 RMAC_RESPONSE=0
RMAC_CALLBACK=0
_CAA Errors=255 Info=0 Debug=0 AUA_Blobs=0 AHAFS_Events=0
_GSA Errors=255 Info=2 GSCL=0 Debug=0
_SRA API=0 Errors=255 Wherever=0
_RMA Errors=255 Info=0 API=0 Thread=0 Method=0 Object=0
Protocol=0 Work=0 CommPath=0
_SKD Errors=255 Info=0 Debug=0
_SDK Errors=255 Info=0 Exceptions=0
_RMF Errors=255 Info=2 Debug=0
_STG Errors=255 Info=1 Event=1 Debug=0
/var/ct/2W7qV~q8aHtvMreavGL343/log/mc/IBM.StorageRM/trace -> spooling not enabled
Using the solidDB SQL interface
You can also retrieve some information shown by the lscluster command by using the
solidDB SQL interface as shown in Example 8-37 and Example 8-38 on page 228.
Example 8-37 The solidDB SQL interface (view from left side of code)
seoul:/ # /opt/cluster/solidDB/bin/solsql -x pwdfile:/etc/cluster/dbpass "tcp 2188" caa
IBM solidDB SQL Editor (teletype) - Version: 6.5.0.0 Build 0010
(c) Solid Information Technology Ltd. 1993, 2009
Connected to 'tcp 2188'.
Execute SQL statements terminated by a semicolon.
Exit by giving command: exit;
list schemas;
RESULT
-----Catalog: CAA
SCHEMAS:
-------CAA
35193956_C193_11DF_A3EA_A24E50543103
36FC3B56_C193_11DF_A29A_A24E50543103
1 rows fetched.
list tables;
RESULT
-----Catalog: CAA
Schema: CAA
TABLES:
------CLUSTERS
NODES
REPOSNAMESPACE
REPOSSTORES
SHAREDDISKS
INTERFACES
INTERFACE_ATTRS
PARENT_CHILD
ENTITIES
1 rows fetched.
select * from clusters;
CLUSTER_ID  CLUSTER_NAME    ETYPE       ESUBTYPE  GLOB_ID     UUID
----------  --------------  ----------  --------  ----------  ------------------------------------
1           SIRCOL_UNKNOWN  4294967296  32        4294967297  00000000-0000-0000-0000-000000000000
2           korea           4294967296  32        4294967296  a01f47fe-d089-11df-95b5-a24e50543103
2 rows fetched.

select * from nodes;
NODES_ID  NODE_NAME  ETYPE       ESUBTYPE  GLOB_ID     UUID
--------  ---------  ----------  --------  ----------  ------------------------------------
1         busan      8589934592  0         8589934593  e356646e-c0dd-11df-b51d-a24e57e18a03
2         seoul      8589934592  0         8589934594  4f8858be-c0dd-11df-930a-a24e50543103
2 rows fetched.

select * from SHAREDDISKS;
SHARED_DISK_ID  DISK_NAME  ETYPE        GLOB_ID      UUID
--------------  ---------  -----------  -----------  ------------------------------------
1               cldisk2    34359738368  34359738370  428e30e8-657d-8053-d70e-c2f4b75999e2
2               cldisk1    34359738368  34359738369  fe1e9f03-005b-3191-a3ee-4834944fcdeb
2 rows fetched.
Example 8-38 Using the solidDB SQL interface (view from right side starting at CLUSTER_ID row)
VERIFIED_STATUS  ESTATE  VERSION_OPERATING  VERSION_CAPABLE  MULTICAST
---------------  ------  -----------------  ---------------  ---------
NULL             1       1                  1                0
NULL             1       1                  1                0

VERIFIED_STATUS  PARENT_CLUSTER_ID  ESTATE  VERSION_OPERATING  VERSION_CAPABLE
---------------  -----------------  ------  -----------------  ---------------
NULL             2                  1       1                  1
NULL             2                  1       1                  1

VERIFIED_STATUS  PARENT_CLUSTER_ID  ESTATE  VERSION_OPERATING  VERSION_CAPABLE
---------------  -----------------  ------  -----------------  ---------------
NULL             2                  1       1                  1
NULL             2                  1       1                  1
SIRCOL: SIRCOL stands for Storage Interconnected Resource Collection.
The /var/adm/ras/syslog.caa log file
The mkcluster, chcluster, and rmcluster commands (and their underlying APIs) use the
syslogd daemon for error logging. The cld and clconfd daemons and the clusterconf
command also use the syslogd facility for error logging. For that purpose, when the PowerHA 7.1
file sets are installed, the following line is added to the /etc/syslog.conf file:
*.info /var/adm/ras/syslog.caa rotate size 1m files 10
This file keeps all the logs about CAA activity, including the error outputs from the commands.
Example 8-39 shows an error caught in the /var/adm/ras/syslog.caa file during the cluster
definition. The chosen repository disk has already been part of a repository in the past and
had not been cleaned up.
Example 8-39 Output of the /var/adm/ras/syslog.caa file
Sep 16 08:58:14 seoul user:err|error syslog: validate_device: Specified device,
hdisk1, is a repository.
Sep 16 08:58:14 seoul user:warn|warning syslog: To force cleanup of this disk, use
rmcluster -r hdisk1
# It also keeps track of all PowerHA SystemMirror events. Example:
Sep 16 09:40:40 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
acquire_service_addr 0
Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
rg_move seoul 1 ACQUIRE 0
Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
rg_move_acquire seoul 1 0
Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT START:
rg_move_complete seoul 1
Sep 16 09:40:42 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
rg_move_complete seoul 1 0
Sep 16 09:40:44 seoul user:notice PowerHA SystemMirror for AIX: EVENT START:
node_up_complete seoul
Sep 16 09:40:44 seoul user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
node_up_complete seoul 0
Tip: To capture debug information, you can replace *.info with *.debug in the
/etc/syslog.conf file, followed by a refresh of the syslogd daemon (see the sketch that
follows this tip). Because the output in debug mode is verbose, redirect the syslogd output
to a file system other than /, /var, or /tmp.
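The following commands are a minimal sketch of that procedure. The log destination
/caalogs/syslog.caa.debug is an assumption; it must be a file in a file system that you created
for this purpose, because syslogd writes only to files that already exist:
   vi /etc/syslog.conf               # change the CAA line to: *.debug /caalogs/syslog.caa.debug rotate size 1m files 10
   touch /caalogs/syslog.caa.debug   # create the target log file
   refresh -s syslogd                # make syslogd reread its configuration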
The solidDB log files
The solidDB daemons keep their log files in the solidDB directory of the file systems that are
created on the repository disk, on every node, as shown in Example 8-40.
Example 8-40 The solidDB log files and directories
seoul:/ # lsvg -l caavg_private
caavg_private:
LV NAME          TYPE   LPs   PPs   PVs   LV STATE      MOUNT POINT
caalv_private1   boot   1     1     1     closed/syncd  N/A
caalv_private2   boot   1     1     1     closed/syncd  N/A
caalv_private3   boot   4     4     1     open/syncd    N/A
fslv00           jfs2   4     4     1     closed/syncd  /clrepos_private1
fslv01           jfs2   4     4     1     open/syncd    /clrepos_private2
powerha_crlv     boot   1     1     1     closed/syncd  N/A

seoul:/ # ls -lrt /clrepos_private2
total 8
drwxr-xr-x    2 root     system          256 Sep 16 09:05 lost+found
drwxr-xr-x    4 bin      bin            4096 Sep 17 14:32 solidDB

seoul:/ # ls -lrt /clrepos_private2/solidDB
total 18608
-r-xr-xr-x    1 root     system          650 Feb 20 2010  solid.lic
-r-xr-xr-x    1 root     system         5246 Jun  6 18:54 caa.sql
-r-xr-xr-x    1 root     system         5975 Aug  7 15:53 solid.ini
d--x------    2 root     system          256 Aug  7 23:10 .sec
-r-x------    1 root     system          322 Sep 17 12:06 solidhac.ini
drwxr-xr-x    2 bin      bin             256 Sep 17 12:06 logs
-rw-------    1 root     system      8257536 Sep 17 12:06 solid.db
-rw-r--r--    1 root     system        18611 Sep 17 12:06 hacmsg.out
-rw-------    1 root     system      1054403 Sep 17 14:32 solmsg.bak
-rw-------    1 root     system       166011 Sep 17 15:03 solmsg.out

seoul:/ # ls -lrt /clrepos_private2/solidDB/logs
total 32
-rw-------    1 root     system        16384 Sep 17 12:07 sol00002.log
Explanation of file names:
The solid daemon generates the solmsg.out log file.
The solidhac daemon generates the hacmsg.out log file.
The solid.db file is the database itself, and the logs directory contains the database
transaction logs.
The solid.ini files are the configuration files for the solid daemons; the solidhac.ini
files are the configuration files for the solidhac daemons.
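To watch the database activity on the node where these file systems are currently mounted, you
can tail the message logs. This is only a sketch that uses the mount point shown in Example 8-40:
   tail -f /clrepos_private2/solidDB/solmsg.out    # messages from the solid daemon
   tail -f /clrepos_private2/solidDB/hacmsg.out    # messages from the solidhac daemon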
Collecting CAA debug information for IBM support
The CAA component is now included in the snap command. The snap -e and clsnap
commands collect all the necessary information for IBM support; a usage sketch follows the
file list. The snap command gathers the following files from each node and compresses them
into a .pax file:
LOG
bootstrap_repository
clrepos1_solidDB.tar
dbpass
lscluster_clusters
lscluster_network_interfaces
lscluster_network_statistics
lscluster_nodes
lscluster_storage_interfaces
lscluster_zones
solid_lssrc
solid_lssrc_S
solid_select_sys_tables
solid_select_tables
syslog_caa
system_proc_version
system_uname
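A minimal usage sketch follows. The output location is an assumption that is based on the
default snap directory:
   seoul:/ # snap -e
   # The collected data is written under the default snap directory (/tmp/ibmsupt),
   # from where it can be packaged and sent to IBM support.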
8.3.3 PowerHA 7.1 cluster monitoring tools
PowerHA 7.1 comes with many commands and utilities that an administrator can use to
monitor the cluster. This section explains those tools that are most commonly used.
Using the clstat utility
The clstat utility is the most traditional and most frequently used interactive tool for observing
the cluster status. Before using the clstat utility, you must convert the Simple Network Management
Protocol (SNMP) daemon from version 3 to version 1, if this has not already been done.
Example 8-41 shows the steps and sample outputs.
Example 8-41 Converting SNMP from V3 to V1
seoul:/ # stopsrc -s snmpd
0513-044 The snmpd Subsystem was requested to stop.
seoul:/ # ls -ld /usr/sbin/snmpd
lrwxrwxrwx    1 root     system            9 Sep 15 22:17 /usr/sbin/snmpd -> snmpdv3ne

seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne
Start daemon: dpid2

seoul:/ # ls -ld /usr/sbin/snmpd
lrwxrwxrwx    1 root     system           17 Sep 20 09:49 /usr/sbin/snmpd -> /usr/sbin/snmpdv1

seoul:/ # startsrc -s snmpd
0513-059 The snmpd Subsystem has been started. Subsystem PID is 8126570.
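If you later need to return to SNMP version 3, the same script can switch the symbolic links back.
This is a sketch only, assuming the non-encrypted version 3 agent that AIX uses by default:
   seoul:/ # stopsrc -s snmpd
   seoul:/ # /usr/sbin/snmpv3_ssw -n     # -n selects the non-encrypted SNMP version 3 agent
   seoul:/ # startsrc -s snmpd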
The clstat utility in interactive mode
With the new -i flag, you can now select the cluster ID from a list of available ones as shown
in Example 8-42.
Example 8-42 The clstat command in interactive mode
sydney:/ # clstat -i
                clstat - HACMP Cluster Status Monitor
                -------------------------------------
Number of clusters active: 1

   ID             Name             State
   1108531106     korea            UP

Select an option:
   # - the Cluster ID
   q - quit
1108531106

                clstat - HACMP Cluster Status Monitor
                -------------------------------------
Cluster: korea (1108531106)                        Tue Oct 5 11:01:17 2010
State: UP                  Nodes: 2
SubState: STABLE

   Node: busan             State: UP
      Interface: busan-b1 (0)             Address: 192.168.101.144
                                          State:   UP
      Interface: busan-b2 (0)             Address: 192.168.201.144
                                          State:   UP

   Node: seoul             State: UP
      Interface: seoul-b1 (0)             Address: 192.168.101.143
                                          State:   UP
      Interface: seoul-b2 (0)             Address: 192.168.201.143
                                          State:   UP
      Interface: poksap-db (0)            Address: 10.168.101.143
                                          State:   UP
      Resource Group: db2pok_ResourceGroup        State:  On line
The clstat utility with the -o flag
You can use the clstat utility with the -o flag as shown in Example 8-43. This flag instructs
the utility to run once and then exit. It is useful for scripts and cron jobs.
Example 8-43 The clstat utility with the option to run only once
sydney:/ # clstat -o
                clstat - HACMP Cluster Status Monitor
                -------------------------------------
Cluster: au_cl (1128255334)                        Mon Sep 20 10:26:10 2010
State: UP                  Nodes: 2
SubState: STABLE

   Node: perth             State: UP
      Interface: perth (0)                Address: 192.168.101.136
                                          State:   UP
      Interface: perthb2 (0)              Address: 192.168.201.136
                                          State:   UP
      Interface: perths (0)               Address: 10.168.201.136
                                          State:   UP
      Resource Group: perthrg             State:  On line

   Node: sydney            State: UP
      Interface: sydney (0)               Address: 192.168.101.135
                                          State:   UP
      Interface: sydneyb2 (0)             Address: 192.168.201.135
                                          State:   UP
      Interface: sydneys (0)              Address: 10.168.201.135
                                          State:   UP
      Resource Group: sydneyrg            State:  On line
sydney:/ #
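Because the -o flag runs once and then exits, the clstat utility fits naturally into cron. The
following crontab entry is a sketch only; the log file path is an assumption:
   0,30 * * * * /usr/es/sbin/cluster/clstat -o >> /var/hacmp/log/clstat.out 2>&1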
Tip: The sfwcom and dpcom interfaces that are shown by the lscluster -i command are
not shown in the output of the clstat utility. The PowerHA 7.1 cluster is unaware of the CAA
cluster that is present at the AIX level.
Using the cldump utility
Another traditional way to observe the cluster status is to use the cldump utility, which also
relies on the SNMP infrastructure as shown in Example 8-44.
Example 8-44 cldump command
seoul:/ # cldump
Obtaining information via SNMP from Node: seoul...
_____________________________________________________________________________
Cluster Name: korea
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________
Node Name: busan                   State: UP

   Network Name: net_ether_01      State: UP

      Address: 192.168.101.144     Label: busan-b1          State: UP
      Address: 192.168.201.144     Label: busan-b2          State: UP

Node Name: seoul                   State: UP

   Network Name: net_ether_01      State: UP

      Address: 10.168.101.143      Label: poksap-db         State: UP
      Address: 192.168.101.143     Label: seoul-b1          State: UP
      Address: 192.168.201.143     Label: seoul-b2          State: UP

Cluster Name: korea

Resource Group Name: db2pok_ResourceGroup
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node                         Group State
---------------------------- ---------------
seoul                        ONLINE
busan                        OFFLINE
Tools in the /usr/es/sbin/cluster/utilities/ directory
The administrator of a running PowerHA 7.1 cluster can use several tools that are provided
with the cluster.es.server.utils file set. These tools are kept in the
/usr/es/sbin/cluster/utilities/ directory. Examples of the tools are provided in the
following sections.
Listing the PowerHA SystemMirror cluster interfaces
Example 8-45 shows the list of interfaces in the cluster using the cllsif command.
Example 8-45 Listing cluster interfaces using the cllsif command
seoul:/ # /usr/es/sbin/cluster/utilities/cllsif
Adapter     Type     Network        Net Type  Attribute  Node   IP Address        Hardware Address  Interface Name  Global Name  Netmask         Alias for HB  Prefix Length

busan-b2    boot     net_ether_01   ether     public     busan  192.168.201.144                     en2                          255.255.255.0                 24
busan-b1    boot     net_ether_01   ether     public     busan  192.168.101.144                     en0                          255.255.255.0                 24
poksap-db   service  net_ether_01   ether     public     busan  10.168.101.143                                                   255.255.255.0                 24
seoul-b1    boot     net_ether_01   ether     public     seoul  192.168.101.143                     en0                          255.255.255.0                 24
seoul-b2    boot     net_ether_01   ether     public     seoul  192.168.201.143                     en2                          255.255.255.0                 24
poksap-db   service  net_ether_01   ether     public     seoul  10.168.101.143                                                   255.255.255.0                 24
Listing the whole cluster topology information
Example 8-46 shows the cluster topology information that is generated by using the cllscf
command.
Example 8-46 Cluster topology listing by using the cllscf command
seoul:/ # /usr/es/sbin/cluster/utilities/cllscf
Cluster Name: korea
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There were 1 networks defined: net_ether_01
There are 2 nodes in this cluster
NODE busan:
This node has 1 service IP label(s):
Service IP Label poksap-db:
IP address:
10.168.101.143
Hardware Address:
Network:
net_ether_01
Attribute:
public
Aliased Address?:
Enable
Service IP Label poksap-db has 2 communication interfaces.
(Alternate Service) Communication Interface 1: busan-b2
IP Address:
192.168.201.144
Network:
net_ether_01
Attribute:
public
Alias address for heartbeat:
(Alternate Service) Communication Interface 2: busan-b1
IP Address:
Network:
Attribute:
192.168.101.144
net_ether_01
public
Alias address for heartbeat:
Service IP Label poksap-db has no communication interfaces for recovery.
This node has 1 persistent IP label(s):
Persistent IP Label busan:
IP address:
10.168.101.44
Network:
net_ether_01
NODE seoul:
This node has 1 service IP label(s):
Service IP Label poksap-db:
IP address:
10.168.101.143
Hardware Address:
Network:
net_ether_01
Attribute:
public
Aliased Address?:
Enable
Service IP Label poksap-db has 2 communication interfaces.
(Alternate Service) Communication Interface 1: seoul-b1
IP Address:
192.168.101.143
Network:
net_ether_01
Attribute:
public
Alias address for heartbeat:
(Alternate Service) Communication Interface 2: seoul-b2
IP Address:
192.168.201.143
Network:
net_ether_01
Attribute:
public
Alias address for heartbeat:
Service IP Label poksap-db has no communication interfaces for recovery.
This node has 1 persistent IP label(s):
Persistent IP Label seoul:
IP address:
10.168.101.43
Network:
net_ether_01
Breakdown of network connections:
Connections to network net_ether_01
Node busan is connected to network net_ether_01 by these interfaces:
busan-b2
busan-b1
poksap-db
busan
Node seoul is connected to network net_ether_01 by these interfaces:
seoul-b1
seoul-b2
poksap-db
seoul
Tip: The cltopinfo -m command was used to show the heartbeat rings in previous
versions of PowerHA. Because this concept no longer applies, the output of the cltopinfo
-m command is empty in PowerHA 7.1.
The PowerHA 7.1 cluster administrator should explore all the utilities in the
/usr/es/sbin/cluster/utilities/ directory on a test system. Most of the utilities are
informational tools only. Remember to never run unfamiliar commands on production systems.
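A simple way to see what is available is to list the directory on a test node; this is only a sketch:
   seoul:/ # ls /usr/es/sbin/cluster/utilities | more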
8.3.4 PowerHA ODM classes
Example 8-47 on page 236 provides a comprehensive list of PowerHA Object Data Manager
(ODM) files. Never edit these files directly, unless you are directed by IBM support. However,
you can use the odmget command to grab cluster configuration information directly from these
files as explained in this section.
Example 8-47 PowerHA ODM files
seoul:/etc/es/objrepos # ls HACMP*
HACMPadapter
HACMPpprcconsistgrp
HACMPcluster
HACMPras
HACMPcommadapter
HACMPresource
HACMPcommlink
HACMPresourcetype
HACMPcsserver
HACMPrg_loc_dependency
HACMPcustom
HACMPrgdependency
HACMPdaemons
HACMPrresmethods
HACMPdisksubsys
HACMPrules
HACMPdisktype
HACMPsa
HACMPercmf
HACMPsa_metadata
HACMPercmfglobals
HACMPsdisksubsys
HACMPevent
HACMPserver
HACMPeventmgr
HACMPsircol
HACMPfcfile
HACMPsite
HACMPfcmodtime
HACMPsiteinfo
HACMPfilecollection
HACMPsna
HACMPgpfs
HACMPsp2
HACMPgroup
HACMPspprc
HACMPlogs
HACMPsr
HACMPmonitor
HACMPsvc
HACMPnetwork
HACMPsvcpprc
HACMPnim
HACMPsvcrelationship
HACMPnode
HACMPtape
HACMPnpp
HACMPtc
HACMPoemfilesystem
HACMPtimer
HACMPoemfsmethods
HACMPtimersvc
HACMPoemvgmethods
HACMPtopsvcs
HACMPoemvolumegroup
HACMPude
HACMPpager
HACMPudres_def
HACMPpairtasks
HACMPudresource
HACMPpathtasks
HACMPx25
HACMPport
HACMPxd_mirror_group
HACMPpprc
Use the odmget command followed by the name of the file in the /etc/es/objrepos directory.
Example 8-48 shows how to retrieve information about the cluster.
Example 8-48 Using the odmget command to grab cluster information
seoul:/ # ls -ld /etc/es/objrepos/HACMPcluster
-rw-r--r--    1 root     hacmp          4096 Sep 17 12:29 /etc/es/objrepos/HACMPcluster
seoul:/ # odmget HACMPcluster
HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 2
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0
Tip: In previous versions of PowerHA, the ODM HACMPtopsvcs class kept information about
the current instance number for a node. In PowerHA 7.1, this class always has the instance
number 1 (instanceNum = 1 as shown in the following example) because topology services
are not used anymore. This number never changes.
seoul:/ # odmget HACMPtopsvcs
HACMPtopsvcs:
hbInterval = 1
fibrillateCount = 4
runFixedPri = 1
fixedPriLevel = 38
tsLogLength = 5000
gsLogLength = 5000
instanceNum = 1
You can use the HACMPnode ODM class to discover which version of PowerHA is installed as
shown in Example 8-49.
Example 8-49 Using the odmget command to retrieve the PowerHA version
seoul:/ # odmget HACMPnode | grep version | sort -u
version = 12
The following version numbers correspond to these HACMP/PowerHA releases:
2:  HACMP 4.3.1
3:  HACMP 4.4
4:  HACMP 4.4.1
5:  HACMP 4.5
6:  HACMP 5.1
7:  HACMP 5.2
8:  HACMP 5.3
9:  HACMP 5.4
10: PowerHA 5.5
11: PowerHA 6.1
12: PowerHA 7.1
Querying the HACMPnode ODM class is useful during cluster synchronization after a migration,
when PowerHA issues warning messages about mixed versions among the nodes.
Because the HACMPtopsvcs ODM class can no longer be used to discover whether the configuration
must be synchronized across the nodes, you can query the HACMPcluster ODM class instead. This
class keeps a numeric attribute called handle. Each node has a different value for this attribute,
ranging from 1 to 32. You can retrieve the handle values by using the odmget or clhandle
commands as shown in Example 8-50.
Example 8-50 Viewing the cluster handles
seoul:/ # clcmd odmget HACMPcluster
------------------------------NODE seoul
------------------------------HACMPcluster:
id = 1108531106
name = "korea"
nodename = "seoul"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 2
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0
------------------------------NODE busan
------------------------------HACMPcluster:
id = 1108531106
name = "korea"
nodename = "busan"
sec_level = "Standard"
sec_level_msg = ""
sec_encryption = ""
sec_persistent = ""
last_node_ids = ""
highest_node_id = 0
last_network_ids = ""
highest_network_id = 0
last_site_ids = ""
highest_site_id = 0
handle = 1
cluster_version = 12
reserved1 = 0
reserved2 = 0
wlm_subdir = ""
settling_time = 0
rg_distribution_policy = "node"
noautoverification = 0
clvernodename = ""
clverhour = 0
clverstartupoptions = 0
seoul:/ # clcmd clhandle
------------------------------NODE seoul
------------------------------2 seoul
------------------------------NODE busan
------------------------------1 busan
When you perform a cluster configuration change on any node, the handle of that node is set to
a numeric value of 0.
Suppose that you want to add a new resource group to the korea cluster and that you make
the change from the seoul node. After you make the modification, and before you synchronize
the cluster, the handle attribute in the HACMPcluster ODM class on the seoul node has a value
of 0, as shown in Example 8-51.
Example 8-51 Handle values after a change, before synchronization
seoul:/ # clcmd odmget HACMPcluster | egrep "NODE|handle"
NODE seoul
handle = 0
NODE busan
handle = 1
seoul:/ # clcmd clhandle
-------------------------------
NODE seoul
-------------------------------
0 seoul
-------------------------------
NODE busan
-------------------------------
1 busan
After you synchronize the cluster, the handle goes back to its original value of 2 as shown in
Example 8-52.
Example 8-52 Original handle values after synchronization
seoul:/ # smitty sysmirror -> Custom Cluster Configuration -> Verify and
Synchronize Cluster Configuration (Advanced)
seoul:/ # clcmd odmget HACMPcluster | egrep "NODE|handle"
NODE seoul
handle = 2
NODE busan
handle = 1
seoul:/ # clcmd clhandle
-------------------------------
NODE seoul
-------------------------------
2 seoul
-------------------------------
NODE busan
-------------------------------
1 busan
If more than one node has a handle value of 0, configuration changes were probably made from different nodes. In this case, you must decide from which node to start the synchronization. The cluster modifications that were made on the other nodes are then lost.
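If you prefer a single check for pending changes, the clmgr utility that is described in 8.3.5, "PowerHA clmgr utility" also reports this condition in the UNSYNCED_CHANGES field of its cluster query (compare Example 8-54, where the field shows false on a synchronized cluster):
seoul:/ # clmgr query cluster | grep UNSYNCED_CHANGES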
8.3.5 PowerHA clmgr utility
The clmgr utility provides a new, consistent command-line interface to PowerHA that improves usability and serviceability. The tool is packaged in the cluster.es.server.utils file set as shown in Example 8-53.
Example 8-53 The clmgr utility file set
seoul:/ # whence clmgr
/usr/es/sbin/cluster/utilities/clmgr
seoul:/ # lslpp -w /usr/es/sbin/cluster/utilities/clmgr
File                                     Fileset                    Type
----------------------------------------------------------------------------
/usr/es/sbin/cluster/utilities/clmgr     cluster.es.server.utils    Hardlink
The clmgr command generates a /var/hacmp/log/clutils.log log file.
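For example, you can follow this log in one session while you run clmgr commands in another:
seoul:/ # tail -f /var/hacmp/log/clutils.log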
The clmgr command supports the actions listed in 5.2.1, "The clmgr action commands" on page 104.
For monitoring purposes, you can use the query and view actions. For a list of the object classes that are available for each action, see 5.2.2, "The clmgr object classes" on page 105.
Example using the query action
Example 8-54 shows the query action on the PowerHA cluster using the clmgr command.
Example 8-54 Query action on the PowerHA cluster using the clmgr command
seoul:/ # clmgr query cluster
CLUSTER_NAME="korea"
CLUSTER_ID="1108531106"
STATE="STABLE"
VERSION="7.1.0.1"
VERSION_NUMBER="12"
EDITION="STANDARD"
CLUSTER_IP=""
REPOSITORY="caa_private0"
SHARED_DISKS="cldisk2,cldisk1"
UNSYNCED_CHANGES="false"
SECURITY="Standard"
FC_SYNC_INTERVAL="10"
RG_SETTLING_TIME="0"
RG_DIST_POLICY="node"
MAX_EVENT_TIME="180"
MAX_RG_PROCESSING_TIME="180"
SITE_POLICY_FAILURE_ACTION="fallover"
SITE_POLICY_NOTIFY_METHOD=""
DAILY_VERIFICATION="Enabled"
VERIFICATION_NODE="Default"
VERIFICATION_HOUR="0"
VERIFICATION_DEBUGGING="Enabled"
LEVEL=""
ALGORITHM=""
GRACE_PERIOD=""
REFRESH=""
MECHANISM=""
CERTIFICATE=""
PRIVATE_KEY=""
seoul:/ # clmgr query interface
busan-b2
busan-b1
poksap-db
seoul-b1
seoul-b2
seoul:/ # clmgr query node
busan
seoul
seoul:/ # clmgr query network
net_ether_01
seoul:/ # clmgr query resource_group
db2pok_ResourceGroup
seoul:/ # clmgr query volume_group
caavg_private
pokvg
Tip: Another way to check the PowerHA version is to query the SNMP subsystem as
follows:
seoul:/ # snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs
clstrmgrVersion
clstrmgrVersion.1 = "7.1.0.1"
clstrmgrVersion.2 = "7.1.0.1"
Example using the view action
Example 8-55 shows the view action on the PowerHA cluster using the clmgr command.
Example 8-55 Using the view action on the PowerHA cluster using clmgr
seoul:/ # clmgr view report cluster
Cluster: korea
Cluster services: active
State of cluster: up
Substate: stable
#############
APPLICATIONS
#############
Cluster korea provides the following applications: db2pok_ApplicationServer
Application: db2pok_ApplicationServer
db2pok_ApplicationServer is started by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
db2pok_ApplicationServer is stopped by
/usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Application monitors for db2pok_ApplicationServer:
db2pok_SQLMonitor
db2pok_ProcessMonitor
Monitor name: db2pok_SQLMonitor
Type: custom
Monitor method: user
Monitor interval: 120 seconds
Hung monitor signal: 9
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds
Failure action: fallover
Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
Monitor name: db2pok_ProcessMonitor
Type: process
Process monitored: db2sysc
Process owner: db2pok
Instance count: 1
Stabilization interval: 240 seconds
Retry count: 3 tries
Restart interval: 1440 seconds
Failure action: fallover
Cleanup method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2stop db2pok
Restart method: /usr/es/sbin/cluster/sa/db2/sbin/cl_db2start db2pok
This application is part of resource group 'db2pok_ResourceGroup'.
Resource group policies:
Startup: on home node only
Fallover: to next priority node in the list
Fallback: never
State of db2pok_ApplicationServer: online
Nodes configured to provide db2pok_ApplicationServer: seoul {up}
busan {up}
Node currently providing db2pok_ApplicationServer: seoul {up}
The node that will provide db2pok_ApplicationServer if seoul fails
is: busan
Resources associated with db2pok_ApplicationServer:
Service Labels
poksap-db(10.168.101.143) {online}
Interfaces configured to provide poksap-db:
seoul-b1 {up}
with IP address: 192.168.101.143
on interface: en0
on node: seoul {up}
on network: net_ether_01 {up}
seoul-b2 {up}
with IP address: 192.168.201.143
on interface: en2
on node: seoul {up}
on network: net_ether_01 {up}
busan-b2 {up}
with IP address: 192.168.201.144
on interface: en2
on node: busan {up}
on network: net_ether_01 {up}
busan-b1 {up}
with IP address: 192.168.101.144
on interface: en0
on node: busan {up}
on network: net_ether_01 {up}
Shared Volume Groups:
pokvg
#############
TOPOLOGY
#############
korea consists of the following nodes: busan seoul
busan
Network interfaces:
busan-b2 {up}
with IP address: 192.168.201.144
on interface: en2
on network: net_ether_01 {up}
busan-b1 {up}
with IP address: 192.168.101.144
on interface: en0
on network: net_ether_01 {up}
seoul
Network interfaces:
seoul-b1 {up}
with IP address: 192.168.101.143
on interface: en0
on network: net_ether_01 {up}
seoul-b2 {up}
with IP address: 192.168.201.143
on interface: en2
on network: net_ether_01 {up}
seoul:/ # clmgr view report topology
Cluster Name: korea
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
NODE busan:
    Network net_ether_01
        poksap-db        10.168.101.143
        busan-b2         192.168.201.144
        busan-b1         192.168.101.144

NODE seoul:
    Network net_ether_01
        poksap-db        10.168.101.143
        seoul-b1         192.168.101.143
        seoul-b2         192.168.201.143

Network        Attribute   Alias    Monitor method       Node    Adapter(s)
net_ether_01   public      Enable   Default monitoring   busan   busan-b1 poksap-db busan-b2
                                                         seoul   seoul-b1 seoul-b2 poksap-db

Adapter     Type      Network        Net Type   Attribute   Node    IP Address        Interface   Netmask         Prefix Length
busan-b2    boot      net_ether_01   ether      public      busan   192.168.201.144   en2         255.255.255.0   22
busan-b1    boot      net_ether_01   ether      public      busan   192.168.101.144   en0         255.255.255.0   22
poksap-db   service   net_ether_01   ether      public      busan   10.168.101.143                255.255.255.0   22
seoul-b1    boot      net_ether_01   ether      public      seoul   192.168.101.143   en0         255.255.255.0   22
seoul-b2    boot      net_ether_01   ether      public      seoul   192.168.201.143   en2         255.255.255.0   22
poksap-db   service   net_ether_01   ether      public      seoul   10.168.101.143                255.255.255.0   22
You can also use the clmgr command to see the list of PowerHA SystemMirror log files as
shown in Example 8-56.
Example 8-56 Viewing the PowerHA cluster log files using the clmgr command
seoul:/ # clmgr view log
Available Logs:
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log
Tip: The output verbosity level can be set by using the -l option, as in the following example:
clmgr -l {low|med|high|max} action object
8.3.6 IBM Systems Director web interface
This section explains how to discover and monitor a cluster by using the IBM Systems
Director 6.1 web interface. For the steps to install IBM Systems Director, the IBM PowerHA
SystemMirror plug-in, and IBM System Director Common Agent, see Chapter 11, “Installing
IBM Systems Director and the PowerHA SystemMirror plug-in” on page 325.
Login page for IBM Systems Director
When you point the web browser to the IBM Systems Director IP address, port 8422, you are
presented with a login page. The root user and password are used to log on as shown in
Figure 8-1 on page 247.
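For example, the console URL typically has the following form (the host name shown is a placeholder; /ibm/console is the default console path for IBM Systems Director 6.1):
https://director-server.example.com:8422/ibm/console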
Root user: Do not use the root user. The logon is exclusive: when a second person logs on with the root user ID, the first person is logged off, and so on. For a production environment, create an AIX user ID for each person who must connect to the IBM Systems Director web interface. Each user ID must belong to the smadmin group so that everyone can connect to the web interface simultaneously. For more information, see the "Users and user groups in IBM Systems Director" topic in the IBM Systems Director V6.1.x Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r1x/index.jsp?topic=/director.security_6.1/fqm0_c_user_accounts.html
smadmin (Administrator group): Members of the smadmin group are authorized for all
operations.
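A minimal sketch of creating such a user on AIX follows. The user name director1, the primary group staff, and existinguser are placeholder values; membership in the smadmin group is the only requirement described here:
# Create a new console user on the IBM Systems Director server
mexico:/ # mkuser pgrp=staff groups=smadmin director1
mexico:/ # passwd director1
# To add an existing user instead (groups= replaces the list, so repeat the current groups)
mexico:/ # chuser groups=staff,smadmin existinguser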
Figure 8-1 IBM Systems Director 6.1 login page
Welcome page for IBM Systems Director
On the welcome page, the administrator must first discover the systems with PowerHA to
administer. Figure 8-2 shows the link underlined in red.
Figure 8-2 IBM Systems Director 6.1 welcome page
Discovery Manager
In the Discovery Manager panel, the administrator must click the System discovery link as
shown in Figure 8-3.
Figure 8-3 IBM Systems Director 6.1 Discovery Manager
Selecting the systems and agents to discover
In the System Discovery panel, complete the following actions:
1. For Select a discovery option, select the Range of IPv4 addresses.
2. Enter the starting and ending IP addresses. In Figure 8-4, only the two IP addresses for
the seoul and busan nodes are used.
3. For Select the resource type to discover, leave the default of All.
4. Click the Discover now button. The discovery takes less than 1 minute in this case
because the IP range is limited to two machines.
Figure 8-4 Selecting the systems to discover
IBM Systems Director availability menu
In the left navigation bar, expand Availability and click the PowerHA SystemMirror link as
shown in Figure 8-5.
Figure 8-5 IBM Systems Director 6.1 availability menu
Initial panel of the PowerHA SystemMirror plug-in
In the Health Summary list, you can see that two systems have an OK status with one
resource group also having an OK status. Click the Manage Clusters link as shown in
Figure 8-6.
Figure 8-6 PowerHA SystemMirror plug-in initial menu
PowerHA available clusters
On the Cluster and Resource Group Management panel (Figure 8-7), the PowerHA plug-in
for IBM Systems Director shows the available clusters. This information has been retrieved in
the discovery process. Two clusters are shown: korea and ro_cl. In the korea cluster, the two
nodes, seoul and busan, are visible and indicate a healthy status. The General tab on the
right shows more relevant information about the selected cluster.
Figure 8-7 PowerHA SystemMirror plug-in: Available clusters
Cluster menu
You can right-click all the objects to access options. Figure 8-8 shows an example of the
options for the korea cluster.
Figure 8-8 Menu options when right-clicking a cluster in the PowerHA SystemMirror plug-in
PowerHA SystemMirror plug-in: Resource Groups tab
The Resource Group tab (Figure 8-9) shows the available resource groups in the cluster.
Figure 8-9 PowerHA SystemMirror plug-in: Resource Groups tab
Resource Groups menu
You can right-click the resource groups to access options such as those shown in
Figure 8-10.
Figure 8-10 Options available when right-clicking a resource group in PowerHA SystemMirror plug-in
PowerHA SystemMirror plug-in: Cluster tab
The Cluster tab has several tabs on the right that you can use to retrieve information about
the cluster. These tabs include the Resource Groups tab, Network tab, Storage tab, and
Additional Properties tab as shown in the following sections.
Resource Group tab
Figure 8-11 shows the Resource Groups tab and the information that is presented.
Figure 8-11 PowerHA SystemMirror plug-in: Resource Groups tab
Network tab
Figure 8-12 shows the Networks tab and the information that is displayed.
Figure 8-12 PowerHA SystemMirror plug-in: Networks tab
Storage tab
Figure 8-13 shows the Storage tab and the information that is presented.
Figure 8-13 PowerHA SystemMirror plug-in: Storage tab
Additional Properties tab
Figure 8-14 shows the Additional Properties tab and the information that is presented.
Figure 8-14 PowerHA SystemMirror plug-in Additional Properties tab
8.3.7 IBM Systems Director CLI (smcli interface)
The new web interface for IBM Systems Director is powerful because it allows a systems management console to be opened from anywhere. However, it is often desirable to perform certain functions against the management server from a command line. Whether you are scripting a task to run on many systems or automating a process, the CLI can be useful in a management environment such as IBM Systems Director.
Tip: To run the commands, the smcli interface requires you to be an IBM Systems Director
superuser.
Example 8-57 runs the smcli command on the IBM Systems Director server (host name mexico) to list the available PowerHA options.
Example 8-57 Available options for PowerHA in IBM Systems Director CLI
mexico:/ # /opt/ibm/director/bin/smcli lsbundle | grep sysmirror
sysmirror/help
sysmirror/lsac
sysmirror/lsam
sysmirror/lsappctl
sysmirror/lsappmon
sysmirror/lscl
sysmirror/lscluster
sysmirror/lsdependency
sysmirror/lsdp
sysmirror/lsfc
sysmirror/lsfilecollection
sysmirror/lsif
sysmirror/lsinterface
sysmirror/lslg
sysmirror/lslog
sysmirror/lsmd
sysmirror/lsmethod
sysmirror/lsnd
.
.
.
All of the commands listed in Example 8-57 can be run through the smcli command. Example 8-58 shows several of them in use.
Example 8-58 Using #smcli to retrieve PowerHA information
# Lists the clusters that can be managed by the IBM Systems Director:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lscluster
korea
(1108531106)
# Lists the service labels of a cluster:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lssi -c korea
poksap-db
# Lists all interfaces defined in a cluster:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsif -c korea
busan-b1
busan-b2
seoul-b1
seoul-b2
# Lists resource groups of a cluster:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsrg -c korea
db2pok_ResourceGroup
# Lists networks:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsnw -c korea
net_ether_01
# Lists application servers of a cluster:
mexico:/ # /opt/ibm/director/bin/smcli sysmirror/lsac -c korea
db2pok_ApplicationServer
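Because each bundle command writes plain text to standard output, the commands shown in Example 8-58 can also be combined in small scripts. The following sketch, which assumes the output formats shown above, prints the resource groups of every cluster that IBM Systems Director manages (the grep skips the parenthesized cluster ID lines):
#!/usr/bin/ksh
SMCLI=/opt/ibm/director/bin/smcli
$SMCLI sysmirror/lscluster | grep -v "^ *(" | while read cluster
do
    echo "Resource groups in cluster $cluster:"
    $SMCLI sysmirror/lsrg -c "$cluster"
done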
Chapter 9. Testing the PowerHA 7.1 cluster
This chapter takes you through several simulations for testing a PowerHA 7.1 cluster and then
explains the cluster behavior and log files. This chapter includes the following topics:
Testing the SAN-based heartbeat channel
Testing the repository disk heartbeat channel
Simulation of a network failure
Testing the rootvg system event
Simulation of a crash in the node with an active resource group
Simulations of CPU starvation
Simulation of a Group Services failure
Testing a Start After resource group dependency
Testing dynamic node priority
9.1 Testing the SAN-based heartbeat channel
This section explains how to check the redundant heartbeat through the storage area network
(SAN)-based channel if the network communication between nodes is lost. The procedure is
based on the test cluster shown in Figure 9-1. In this environment, the PowerHA cluster is
synchronized, and the CAA cluster is running.
Figure 9-1 Testing the SAN-based heartbeat
Example 9-1 shows the working state of the CAA cluster.
Example 9-1 Initial error-free CAA status
sydney:/ # lscluster -i
Network/Storage Interface Query
Cluster Name: au_cl
Cluster uuid: d77ac57e-cc1b-11df-92a4-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5
Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 5
Probe interval for interface = 120 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 0
Mean Deviation in network rrt across interface = 0
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
You can check the connectivity between nodes by using the socksimple command. This
command provides a ping-type interface to send and receive packets over the cluster
communications channels. Example 9-2 shows the usage output of running the socksimple
command.
Example 9-2 socksimple usage
sydney:/ # socksimple
Usage: socksimple -r|-s [-v] [-a address] [-p port] [-t ttl]
  -r|-s         Receiver or sender. Required argument, mutually exclusive
  -a address    Cluster address to listen/send on, overrides the default.
                (must be < 16 characters long)
  -p port       port to listen/send on, overrides the default of 12.
  -t ttl        Time-To-Live to send, overrides the default of 1.
  -v            Verbose mode
You can obtain the cluster address for the -a option of the socksimple command from the
lscluster -c command output (Example 9-3).
Example 9-3 Node IDs of the CAA cluster
sydney:/ # lscluster -c
Cluster query for cluster aucl returns:
Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a
Number of nodes in cluster = 2
Cluster id for node perth is 1
Primary IP address for node perth is 192.168.101.136
Cluster id for node sydney is 2
Primary IP address for node sydney is 192.168.101.135
Number of disks in cluster = 0
Multicast address for cluster is 228.168.101.135
To test the SAN-based heartbeat channel, follow these steps:
1. Check the cluster communication with all the network interfaces up (Example 9-4).
Example 9-4 The socksimple test with the network channel up
sydney:/ # socksimple -s -a 1
socksimple version 1.2
socksimpleing 1/12 with ttl=1:
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=0.415 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=0.381 ms
1277 bytes from cluster host id = 1: seqno=1277 ttl=1 time=0.347 ms
--- socksimple statistics ---
3 packets transmitted, 3 packets received
round-trip min/avg/max = 0.347/0.381/0.415 ms
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
Replying to socksimple from cluster node id = 2 bytes=1275 seqno=1275 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1276 seqno=1276 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1277 seqno=1277 ttl=1
perth:/ #
2. Disconnect the network interfaces by pulling the cables on one node to simulate an
Ethernet network failure. Example 9-5 shows the interface status.
Example 9-5 Ethernet ports down
perth:/ # entstat -d ent1 | grep -i link
Link Status: UNKNOWN
perth:/ # entstat -d ent2 | grep -i link
Link Status: UNKNOWN
3. Check the cluster communication by using the socksimple command as shown in
Example 9-6.
Example 9-6 The socksimple test with Ethernet ports down
sydney:/ # socksimple -s -a 1
socksimple version 1.2
socksimpleing 1/12 with ttl=1:
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=1.075 ms
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=50.513 ms
1275 bytes from cluster host id = 1: seqno=1275 ttl=1 time=150.663 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=0.897 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=50.623 ms
1276 bytes from cluster host id = 1: seqno=1276 ttl=1 time=150.791 ms
--- socksimple statistics ---
2 packets transmitted, 6 packets received
round-trip min/avg/max = 0.897/67.427/150.791 ms
perth:/ # socksimple -r -a 1
socksimple version 1.2
Listening on 1/12:
Replying to socksimple from cluster node id = 2 bytes=1275 seqno=1275 ttl=1
Replying to socksimple from cluster node id = 2 bytes=1276 seqno=1276 ttl=1
perth:/
4. Check the status of the cluster interfaces by using the lscluster -i command.
Example 9-7 shows the status for both disconnected ports on the perth node. In this
example, the status has changed from UP to DOWN SOURCE HARDWARE RECEIVE SOURCE
HARDWARE TRANSMIT.
Example 9-7 CAA cluster status with Ethernet ports down
sydney:/ # lscluster -i
Network/Storage Interface Query
Cluster Name: aucl
Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255
netmask
255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255
netmask
255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE
TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255
netmask
255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE
TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255
netmask
255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
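To check only the interface states without reading the full report, you can filter the lscluster -i output, for example (the pattern simply matches the lines shown in Example 9-7):
sydney:/ # lscluster -i | egrep "Node sydney|Node perth|Interface number|Interface state"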
5. Reconnect the Ethernet cables and check the port status as shown in Example 9-8.
Example 9-8 Ethernet ports reconnected
perth:/ # entstat -d ent1 | grep -i link
Link Status: Up
perth:/ # entstat -d ent2|grep -i link
Link Status: Up
6. Check if the cluster status has recovered. Example 9-9 shows that both Ethernet ports on
the perth node are now in the UP state.
Example 9-9 CAA cluster status recovered
sydney:/ # lscluster -i
Network/Storage Interface Query
Cluster Name: aucl
Cluster uuid: 98f28ffa-cfde-11df-9a82-00145ec5bf9a
Number of nodes reporting = 2
Number of nodes expected = 2
Node sydney
Node uuid = f6a81944-cbce-11df-87b6-00145ec5bf9a
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9a
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.135 broadcast 192.168.103.255
netmask
255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.c5.bf.9b
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.135 broadcast 192.168.203.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node perth
Node uuid = 15bef17c-cbcf-11df-951c-00145e5e3182
Number of interfaces discovered = 4
Interface number 1 en1
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d9
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.101.136 broadcast 192.168.103.255
netmask
255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.14.5e.e7.25.d8
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.136 broadcast 192.168.203.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface =
1
IPV4 MULTICAST ADDRESS: 228.168.101.135 broadcast 0.0.0.0
netmask 0.0.0.0
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
Interface number 4 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
9.2 Testing the repository disk heartbeat channel
This section explains how to test the repository disk heartbeat channel.
9.2.1 Background
When the entire PowerHA SystemMirror IP network fails, and either the SAN-based heartbeat
network (sfwcom) does not exist, or it exists but has failed, CAA uses the
heartbeat-over-repository-disk (dpcom) feature.
The example in the next section describes dpcom heartbeating in a two-node cluster after all
IP interfaces have failed.
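A quick way to confirm which heartbeat path is currently in use is to filter the lscluster -m output for the node state lines, for example (the pattern matches the lines shown in the examples that follow):
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -m | egrep "Node name|State of node"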
9.2.2 Testing environment
A two-node cluster is configured with the following topology:
en0 is not included in the PowerHA cluster, but it is monitored by CAA.
en3 through en5 are included in the PowerHA cluster and monitored by CAA.
No SAN-based communication channel (sfwcom) is available.
Initially, both nodes are online and running cluster services, all IP interfaces are online, and
the service IP address has an alias on the en3 interface.
This test scenario includes unplugging the cable of one interface at a time, starting with en3,
en4, en5, and finally en0. As each cable is unplugged, the service IP correctly swaps to the
next available interface on the same node. Each failed interface is marked as DOWN SOURCE
HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT as shown in Example 9-10. After the cables for
the en3 through en5 interfaces are unplugged, a local network failure event occurs, leading to
a selective failover of the resource group to the remote node. However, because the en0
interface is still up, CAA continues to heartbeat over the en0 interface.
Example 9-10 Output of the lscluster -i command
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query
Cluster Name: ha71sp1_aixsp2
Cluster uuid: c37f7324-daff-11df-903e-0011257e4998
Number of nodes reporting = 2
Number of nodes expected = 2
Node hacmp27
Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.49.98
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b5
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b6
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 118
Mean Deviation in network rrt across interface = 81
Probe interval for interface = 1990 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node hacmp28
Node uuid = 15e86116-d173-11df-8bdf-0011257e4340
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.43.40
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.d
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0
IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.e
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.f
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 1037
Mean Deviation in network rrt across interface = 1020
Probe interval for interface = 20570 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Example 9-11 shows the output of the lscluster -m command.
Example 9-11 Output of the lscluster -m command
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: hacmp27
Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
------------------------------
Node name: hacmp28
Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP
Smoothed rtt to node: 8
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 4
Point-of-contact interface & contact state
en0 UP
en5 DOWN
en4 DOWN
en3 DOWN
After the en0 cable is unplugged, CAA proceeds to heartbeat over the repository disk
(dpcom). This action is indicated by the node status REACHABLE THROUGH REPOS DISK ONLY in
the lscluster -m command (Example 9-12).
Example 9-12 Output of the lscluster -m command
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: hacmp27
Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998
State of node: UP NODE_LOCAL REACHABLE THROUGH REPOS DISK ONLY
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
------------------------------
Node name: hacmp28
Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP REACHABLE THROUGH REPOS DISK ONLY
Smoothed rtt to node: 143
Mean Deviation in network rtt to node: 107
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 5
Point-of-contact interface & contact state
dpcom UP
en0 DOWN
en5 DOWN
en4 DOWN
en3 DOWN
[hacmp28:HAES7101/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: hacmp27
Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998
State of node: UP REACHABLE THROUGH REPOS DISK ONLY
Smoothed rtt to node: 17
Mean Deviation in network rtt to node: 5
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 5
Point-of-contact interface & contact state
dpcom UP
en4 DOWN
en5 DOWN
en3 DOWN
en0 DOWN
------------------------------
Node name: hacmp28
Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
Example 9-13 shows the output of the lscluster -i command with the dpcom status
changing from UP RESTRICTED AIX_CONTROLLED to UP AIX_CONTROLLED.
Example 9-13 Output of the lscluster -i command showing the dpcom status
[hacmp27:HAES7101/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query
Cluster Name: ha71sp1_aixsp2
Cluster uuid: c37f7324-daff-11df-903e-0011257e4998
Number of nodes reporting = 2
Number of nodes expected = 2
Node hacmp27
Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.49.98
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b5
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b6
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 23
Mean Deviation in network rrt across interface = 11
Probe interval for interface = 340 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP AIX_CONTROLLED
Node hacmp28
Node uuid = 15e86116-d173-11df-8bdf-0011257e4340
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.43.40
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.d
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0
IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.e
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.f
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 10
Mean Deviation in network rrt across interface = 7
Probe interval for interface = 170 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP AIX_CONTROLLED
After any interface cable is reconnected, such as the en0 interface, CAA stops heartbeating
over the repository disk and resumes heartbeating over the IP interface.
Example 9-14 shows the output of the lscluster -m command after the en0 cable is
reconnected. The dpcom status changes from UP to DOWN RESTRICTED, and the en0 interface
status changes from DOWN to UP.
Example 9-14 Output of the lscluster -m command after en0 is reconnected
[hacmp27:HAES/AIX61-06 /]
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: hacmp27
Cluster shorthand id for node: 1
uuid for node: 66b66b10-d16e-11df-aa3f-0011257e4998
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
------------------------------
Node name: hacmp28
Cluster shorthand id for node: 2
uuid for node: 15e86116-d173-11df-8bdf-0011257e4340
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 4
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME      TYPE   SHID   UUID
ha71sp1_aixsp2    local         c37f7324-daff-11df-903e-0011257e4998
Number of points_of_contact for node: 5
Point-of-contact interface & contact state
dpcom DOWN RESTRICTED
en0 UP
en5 DOWN
en4 DOWN
en3 DOWN
Example 9-15 shows the output of the lscluster -i command. The en0 interface is now
marked as UP, and the dpcom returns to UP RESTRICTED AIX_CONTROLLED.
Example 9-15 Output of the lscluster -i command
[hacmp27:HAES/AIX61-06 /]
# lscluster -i
Network/Storage Interface Query
Cluster Name: ha71sp1_aixsp2
Cluster uuid: c37f7324-daff-11df-903e-0011257e4998
Number of nodes reporting = 2
Number of nodes expected = 2
Node hacmp27
Node uuid = 66b66b10-d16e-11df-aa3f-0011257e4998
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.49.98
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.27 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b5
Smoothed rrt across interface = 8
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 110 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.1.27 broadcast 10.1.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b6
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.27 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cc.d.b7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x630853
Interface state DOWN SOURCE HARDWARE RECEIVE SOURCE HARDWARE TRANSMIT
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.27 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 120
Mean Deviation in network rrt across interface = 105
Probe interval for interface = 2250 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Node hacmp28
Node uuid = 15e86116-d173-11df-8bdf-0011257e4340
Number of interfaces discovered = 5
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.7e.43.40
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.44.28 broadcast 9.3.44.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 en3
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.d
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 10.1.1.28 broadcast 10.1.1.255 netmask 255.255.255.0
IPV4 ADDRESS: 192.168.1.27 broadcast 192.168.1.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 3 en4
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.e
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.2.28 broadcast 10.1.2.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 4 en5
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 0.11.25.cb.e1.f
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x5e080863
ndd flags for interface = 0x63081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 10.1.3.28 broadcast 10.1.3.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.3.44.27 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 5 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
9.3 Simulation of a network failure
The following section explains the simulation of a network failure.
9.3.1 Background
In PowerHA 7.1, the heartbeat method has changed. Heartbeating between the nodes is now
done by AIX. The newly introduced CAA takes the role for heartbeating and event
management.
This simulation tests a network down scenario and looks at the PowerHA log files and CAA monitoring. The test scenario uses a two-node cluster, and one network interface is brought down on one of the nodes by using the ifconfig command.
This cluster has one IP heartbeat path and two non-IP heartbeat paths. One of the non-IP paths is a SAN-based heartbeat channel (sfwcom). The other non-IP path is heartbeating over the repository disk (dpcom). Although IP connectivity is lost when the ifconfig command is used, PowerHA SystemMirror uses CAA to continue heartbeating over the two other channels. These channels play a role similar to the rs232 and diskhb heartbeat networks in previous versions of PowerHA.
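For reference, the failure in this test is injected and later cleared with plain AIX commands, run as root on the node under test. The interface name en0 is specific to this environment; substitute the interface that carries your cluster traffic:

# Simulate the network failure on the test node
ifconfig en0 down

# After the test, restore the interface
ifconfig en0 up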
9.3.2 Testing environment
Before starting the network failover test, you check the status of the cluster. The resource
group myrg is on the riyad node as shown in Figure 9-2.
riyad:/ # netstat -i
Name  Mtu    Network       Address             Ipkts Ierrs  Opkts Oerrs  Coll
en0   1500   link#2        a2.4e.5f.b4.5.2     74918     0  50121     0     0
en0   1500   192.168.100   riyad               74918     0  50121     0     0
en0   1500   10.168.200    saudisvc            74918     0  50121     0     0
lo0   16896  link#1                             3937     0   3937     0     0
lo0   16896  127           loopback             3937     0   3937     0     0
lo0   16896  loopback                           3937     0   3937     0     0

riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
myrg           ONLINE                       riyad
               OFFLINE                      jeddah
Figure 9-2 Status of the riyad node
The output of the lscluster -i command (Figure 9-3) shows that every adapter has the UP
state.
riyad:/ # lscluster -i |egrep "Interface|Node"
Network/Storage Interface Query
Node riyad
Node uuid = 2f1590d0-cc02-11df-bf20-a24e5fb40502
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Node jeddah
Node uuid = 39710df0-cc04-11df-929f-a24e5f0d9e02
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Figure 9-3 Output of the lscluster -i command
9.3.3 Testing a network failure
Now, the ifconfig en0 down command is issued on the riyad node. The lscluster
command shows en0 in a DOWN state and the resource group of the cluster moves to the next
available node in the chain as shown in Figure 9-4.
riyad:/ # lscluster -i |egrep "Interface|Node"
Network/Storage Interface Query
Node riyad
Node uuid = 2f1590d0-cc02-11df-bf20-a24e5fb40502
Interface number 1 en0
Interface state DOWN SOURCE SOFTWARE
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Node jeddah
Node uuid = 39710df0-cc04-11df-929f-a24e5f0d9e02
Interface number 1 en0
Interface state UP
Interface number 2 sfwcom
Interface state UP
Interface number 3 dpcom
Interface state UP RESTRICTED AIX_CONTROLLED
Figure 9-4 The lscluster -i command after a network failure
The clRGinfo command shows that the myrg resource group moved to the jeddah node
(Figure 9-5).
riyad:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
myrg           OFFLINE                      riyad
               ONLINE                       jeddah
Figure 9-5 The clRGinfo output during the network failure
You can also check the network down event in the /var/hacmp/adm/cluster.log file
(Figure 9-6).
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START: network_down riyad net_ether_01
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: network_down riyad net_ether_01 0
Oct 6 09:57:42 riyad user:notice PowerHA SystemMirror for AIX: EVENT START: network_down_complete riyad net_ether_01
Oct 6 09:57:43 riyad user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: network_down_complete riyad net_ether_01 0
Figure 9-6 Network down event from the cluster.log file
You can also observe this event by monitoring the AHAFS event files. One way is to run the /usr/sbin/rsct/bin/ahafs_mon_multi sample program, as shown in Figure 9-7.
jeddah:/ # /usr/sbin/rsct/bin/ahafs_mon_multi
=== write String : CHANGED=YES;CLUSTER=YES
=== files being monitored:
fd file
3 /aha/cluster/nodeState.monFactory/nodeStateEvent.mon
4 /aha/cluster/nodeAddress.monFactory/nodeAddressEvent.mon
5 /aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon
6 /aha/cluster/nodeList.monFactory/nodeListEvent.mon
7 /aha/cpu/processMon.monFactory/usr/sbin/rsct/bin/hagsd.mon
==================================
Loop 1:
Event for
/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon has
occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1286376025
TIME_tvnsec=623294923
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_DOWN
INTERFACE_NAME=en0
NODE_NUMBER=2
NODE_ID=0x2F1590D0CC0211DFBF20A24E5FB40502
CLUSTER_ID=0x93D8689AD0F211DFA49CA24E5F0D9E02
END_EVPROD_INFO
END_EVENT_INFO
==================================
Figure 9-7 Event monitoring from AHAFS
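If the RSCT sample program is not available, the same AHAFS event can, in principle, be consumed directly from the /aha file system. The following ksh sketch is an illustration only: it assumes that AHAFS is mounted on /aha and that the WAIT_TYPE=WAIT_IN_READ option is accepted so that a plain read blocks until the next event. The C sample programs that are shipped with RSCT remain the reference implementation.

#!/bin/ksh
# Illustrative sketch: watch CAA network adapter state changes through AHAFS.
MON=/aha/cluster/networkAdapterState.monFactory/networkAdapterStateEvent.mon

exec 3<> "$MON"      # opening the .mon path registers the monitor
print -u3 "CHANGED=YES;WAIT_TYPE=WAIT_IN_READ;CLUSTER=YES"   # subscribe to cluster-wide changes
cat <&3              # blocks until an event such as ADAPTER_DOWN arrives, then prints it
exec 3<&-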
You can also monitor the network failure event at the CAA level by running the /usr/sbin/rsct/bin/caa_event -a command (Figure 9-8).
# /usr/sbin/rsct/bin/caa_event -a
EVENT: adapter liveness:
event_type(0)
node_number(2)
node_id(0)
sequence_number(0)
reason_number(0)
p_interface_name(en0)
EVENT: adapter liveness:
event_type(1)
node_number(2)
node_id(0)
sequence_number(1)
reason_number(0)
p_interface_name(en0)
Figure 9-8 Network failure in CAA event monitoring
In this test scenario, you can see that the non-IP heartbeat channels keep working when the IP network fails. Compared to previous versions, heartbeating is now performed by CAA.
9.4 Testing the rootvg system event
This scenario tests the new rootvg system event monitoring capability of PowerHA 7.1. Because events are now monitored at the kernel level with CAA, the loss of access to the rootvg volume group can be detected and acted on.
9.4.1 The rootvg system event
As discussed previously, event monitoring is now at the kernel level. The
/usr/lib/drivers/phakernmgr kernel extension, which is loaded by the clevmgrdES
subsystem, monitors these events for loss of rootvg. It can initiate a node restart operation if
enabled to do so as shown in Figure 9-9.
PowerHA 7.1 has a new system event that is enabled by default. This new event allows for the
monitoring of the loss of the rootvg volume group while the cluster node is up and running.
Previous versions of PowerHA/HACMP were unable to monitor this type of loss. Also the
cluster was unable to perform a failover action in the event of the loss of access to rootvg. An
example is if you lose a SAN disk that is hosting the rootvg for this cluster node.
The new option is available under the SMIT path smitty sysmirror -> Custom Cluster Configuration -> Events -> System Events. Figure 9-9 shows that the rootvg system event is defined and enabled by default in PowerHA 7.1.
                          Change/Show Event Response

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* Event Name                                       ROOTVG                   +
* Response                                         Log event and reboot     +
* Active                                           Yes                      +
Figure 9-9 The rootvg system event
The default event properties instruct the system to log an event and restart when a loss of
rootvg occurs. This exact scenario is tested in the next section to demonstrate this concept.
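Before relying on this behavior, you can confirm on each node that the pieces involved are in place. The following commands are a quick check only; genkex is part of the bos.perf.tools fileset and might not be installed on every system:

# Is the PowerHA event manager subsystem that loads the kernel extension running?
lssrc -s clevmgrdES

# Is the kernel extension that watches rootvg loaded?
genkex | grep phakernmgr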
9.4.2 Testing the loss of the rootvg volume group
We simulate this test with a two-node cluster. The rootvg volume group is hosted on SAN
storage through a Virtual I/O Server (VIOS). The test removes access to the rootvg file
systems with the cluster node still up and running. This test can be done in several ways, from
pulling the physical SAN connection to making the storage unavailable to the operating
system. In this situation, the VSCSI resource is made unavailable on the VIOS.
This scenario entails a two-node cluster with one resource group. The cluster is running on
two nodes: sydney and perth. The rootvg volume group is hosted by the VIOS on a VSCSI
disk.
Cluster node status and mapping
First, you check the VIOS for the client mapping. You can identify the client partition number
by running the uname -L command on the cluster node. In this case, the client partition is 7.
Next you run the lsmap -all command on the VIOS, as shown in Figure 9-10, and look up
the client partition. Only one LUN is mapped through VIOS to the cluster node, because the
shared storage is attached by using Fibre Channel (FC) adapters.
lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost5          U9117.MMA.101F170-V1-C26                     0x00000007

VTD                   vtscsi13
Status                Available
LUN                   0x8100000000000000
Backing device        lp5_rootvg
Physloc
Figure 9-10 VIOS output of the lsmap command showing the rootvg resource
Check the node to ensure that you have the right disk as shown in Figure 9-11.
sydney:/ # lscfg -l hdisk0
  hdisk0   U9117.MMA.101F170-V7-C5-T1-L8100000000000000   Virtual SCSI Disk Drive
sydney:/ # lspv
hdisk0          00c1f170ff638163    rootvg           active
caa_private0    00c0f6a0febff5d4    caavg_private    active
hdisk2          00c1f170674f3d6b    dbvg
hdisk3          00c1f1706751bc0d    appvg
Figure 9-11 PowerHA node showing the mapping of hdisk0 to the VIOS
After the mapping is established, review the cluster status to ensure that the resource group
is online as shown in Figure 9-12.
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           ONLINE                       sydney
               OFFLINE                      perth

sydney:/ # lssrc -ls clstrmgrES | grep "Current state"
Current state: ST_STABLE
Figure 9-12 Sydney cluster status
Testing by taking the rootvg volume group offline
To perform the test, take the mapping offline on the VIOS by removing the virtual target
device definition. You do this test while the PowerHA node is up and running as shown in
Figure 9-13.
$ rmvdev -vtd vtscsi13
$ lsmap -vadapter vhost5
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost5          U9117.MMA.101F170-V1-C26                     0x00000007

VTD                   NO VIRTUAL TARGET DEVICE FOUND
Figure 9-13 VIOS: Taking the rootvg LUN offline
You have now removed the virtual target device (VTD) mapping that maps the rootvg LUN to
the client partition, which in this case, is the PowerHA node called sydney. You perform this
operation while the node is up and running and hosting the resource group. This operation
demonstrates what happens to the node when rootvg access has been lost.
When checking the node, we found that it halted and that the resource group failed over to the standby node perth (Figure 9-14). This behavior is new and expected in this situation. It is a result of the system event that monitors access to rootvg from the kernel. Checking perth confirms that the failover happened.
perth:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           OFFLINE                      sydney
               ONLINE                       perth
Figure 9-14 Node status from the standby node showing that the node failed over
9.4.3 Loss of rootvg: What PowerHA logs
To show that this event is recognized and that the correct action was taken, check the system error report shown in Figure 9-15.
LABEL:           KERNEL_PANIC
IDENTIFIER:      225E3B63

Date/Time:       Wed Oct 6 14:07:54 2010
Sequence Number: 2801
Machine Id:      00C1F1704C00
Node Id:         sydney
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   PANIC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
ASSERT STRING

PANIC STRING
System Halt because of rootvg failure
Figure 9-15 System error report showing a rootvg failure
9.5 Simulation of a crash in the node with an active resource
group
This section presents a scenario that simulates a node crash while the resource group is active. The scenario consists of a hot-standby cluster configuration with participating nodes seoul and busan, one Ethernet network, and two Ethernet interfaces in each node. The halt -q command is issued on the seoul node, which is hosting the resource group.
The result is that the resource group moved to the standby node as expected. Example 9-16
shows the relevant output that is written to the busan:/var/hacmp/adm/cluster.log file.
Example 9-16 Output of the resource move to the standby node
Sep 29 16:30:22 busan user:warn|warning cld[11599982]: Shutting down all services.
Sep 29 16:30:23 busan user:warn|warning cld[11599982]: Unmounting file systems.
Sep 29 16:30:28 busan daemon:err|error ConfigRM[10879056]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID::::Template ID: a098bf90:::Details File: :::Location:
RSCT,PeerDomain.C,1.99.1.519,17853:::CONFIGRM_PENDINGQUORUM_ER The operational quorum state of the active
peer domain has changed to PENDING_QUORUM. This state usually indicates that exactly half of the nodes
that are defined in the peer domain are online. In this state cluster resources cannot be recovered
although none will be stopped explicitly.
Sep 29 16:30:28 busan local0:crit clstrmgrES[5701662]: Wed Sep 29 16:30:28 Removing 2 from ml_idx
Sep 29 16:30:28 busan user:notice PowerHA SystemMirror for AIX: EVENT START: node_down seoul
Sep 29 16:30:28 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: node_down seoul 0
Sep 29 16:30:31 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_release busan 1
Sep 29 16:30:31 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move busan 1 RELEASE
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move busan 1 RELEASE 0
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_release busan 1 0
Sep 29 16:30:32 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence busan 1
Sep 29 16:30:33 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence busan 1 0
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_fence busan 1
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_fence busan 1 0
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_acquire busan 1
Sep 29 16:30:35 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move busan 1 ACQUIRE
Sep 29 16:30:36 busan user:notice PowerHA SystemMirror for AIX: EVENT START: acquire_takeover_addr
Sep 29 16:30:38 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: acquire_takeover_addr 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move busan 1 ACQUIRE 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_acquire busan 1 0
Sep 29 16:30:45 busan user:notice PowerHA SystemMirror for AIX: EVENT START: rg_move_complete busan 1
Sep 29 16:30:46 busan user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED: rg_move_complete busan 1 0
The cld messages are related to solidDB. The cld subsystem determines whether the local node must become the primary or the secondary solidDB server in a failover. Before the crash, solidDB was active on the seoul node as follows:
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: seoul, 0xdc82faf0908920dc, 2
As expected, after the crash, solidDB is active in the remaining busan node as follows:
busan:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
With the absence of the seoul node, its interfaces are in STALE status as shown in
Example 9-17.
Example 9-17 The lscluster -i command to check the status of the cluster
busan:/ # lscluster -i
Network/Storage Interface Query
Cluster Name: korea
Cluster uuid: a01f47fe-d089-11df-95b5-a24e50543103
Number of nodes reporting = 2
Number of nodes expected = 2
Node busan
Node uuid = e356646e-c0dd-11df-b51d-a24e57e18a03
Number of interfaces discovered = 3
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.57.e1.8a.3
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 192.168.101.144 broadcast 192.168.103.255 netmask 255.255.255.0
IPV4 ADDRESS: 10.168.101.44 broadcast 10.168.103.255 netmask 255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask 0.0.0.0
Interface number 2 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP RESTRICTED AIX_CONTROLLED
Interface number 3 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.57.e1.8a.7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 192.168.201.144 broadcast 192.168.203.255 netmask
255.255.255.0
IPV4 ADDRESS: 10.168.101.143 broadcast 10.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask
0.0.0.0
Node seoul
Node uuid = 4f8858be-c0dd-11df-930a-a24e50543103
Number of interfaces discovered = 3
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.50.54.31.3
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state STALE
Number of regular addresses configured on interface = 2
IPV4 ADDRESS: 192.168.101.143 broadcast 192.168.103.255 netmask
255.255.255.0
IPV4 ADDRESS: 10.168.101.43 broadcast 10.168.103.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask
0.0.0.0
Interface number 2 en2
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = a2.4e.50.54.31.7
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state STALE
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 192.168.201.143 broadcast 192.168.203.255 netmask
255.255.255.0
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.168.101.43 broadcast 0.0.0.0 netmask
0.0.0.0
Interface number 3 dpcom
ifnet type = 0 ndd type = 305
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 750
Mean Deviation in network rrt across interface = 1500
Probe interval for interface = 22500 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state STALE
Results: The results were the same when issuing the halt command instead of the halt
-q command.
9.6 Simulations of CPU starvation
In previous versions of PowerHA, CPU starvation could activate the deadman switch, leading
the starved node to a halt with a consequent move of the resource groups. In PowerHA 7.1,
the deadman switch no longer exists, and its functionality is accomplished at the kernel
interruption level. This test shows how the absence of the deadman switch can influence
cluster behavior.
Scenario 1
This scenario shows the use of a stress tool on the CPU of one node with more than 50
processes in the run queue and a duration of 60 seconds.
Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul
and busan with only one Ethernet network. Each node has two Ethernet interfaces. The
resource group is hosted on seoul, and solidDB is active on the busan node. A tool is run to
stress the seoul CPU with more than 50 processes in the run queue with a duration of 60
seconds as shown in Example 9-18 on page 293.
Example 9-18 Scenario testing the use of a stress tool on one node
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE                       seoul
               OFFLINE                      busan
The lparstat output header shows the CPU and memory configuration for each node:
Seoul: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50
Busan: Power 6, type=Shared, mode=Uncapped, smt=On, lcpu=2, mem=3584MB, ent=0.50
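The stress tool itself is not part of PowerHA and is not shown in this book. A minimal ksh sketch that generates a comparable load (roughly 50 CPU-bound processes for 60 seconds) might look like the following; the process count and duration are parameters of this particular test, not requirements:

#!/bin/ksh
# cpu_stress.sh - spawn N busy loops for DURATION seconds, then clean up.
N=${1:-50}
DURATION=${2:-60}

i=0
while [ $i -lt $N ]; do
    while :; do :; done &      # one CPU-bound busy loop per background job
    i=$((i + 1))
done

sleep $DURATION
kill $(jobs -p) 2>/dev/null    # stop all busy loops
wait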
Results
Before the test, the seoul node is running at an average of 3% of its entitled capacity. The run queue averages three processes, as shown in Example 9-19.
Example 9-19 The vmstat result of the seoul node
seoul:/ # vmstat 2

System configuration: lcpu=2 mem=3584MB ent=0.50

kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa   pc    ec
 2  0 424045 10674   0   0   0   0   0   0  92 1508 359  1  2 97  0  0.02   3.4
 3  0 424045 10674   0   0   0   0   0   0  84 1001 346  1  1 97  0  0.02   3.1
 3  0 424044 10675   0   0   0   0   0   0  88 1003 354  1  1 97  0  0.02   3.1
 3  0 424045 10674   0   0   0   0   0   0  91 1507 352  1  2 97  0  0.02   3.5
 3  0 424047 10672   0   0   0   0   0   0  89 1057 370  1  2 97  0  0.02   3.3
 3  0 424064 10655   0   0   0   0   0   0  94 1106 379  1  2 97  0  0.02   3.6
During the test, the entitled capacity consumption rose to 200%, and the run queue rose to an average of 50 processes, as shown in Example 9-20.
Example 9-20 Checking the node status after running the stress test
seoul:/ # vmstat 2

System configuration: lcpu=2 mem=3584MB ent=0.50

kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa   pc    ec
52  0 405058 167390   0   0   0   0   0   0 108  988 397 42  8 50  0  0.25  50.6
41  0 405200 167248   0   0   0   0   0   0  78  140 245 99  0  0  0  0.79 158.1
49  0 405277 167167   0   0   0   0   0   0  71  206 249 99  0  0  0  1.00 199.9
50  0 405584 166860   0   0   0   0   0   0  73   33 241 99  0  0  0  1.00 199.9
48  0 405950 166491   0   0   0   0   0   0  71  297 244 99  0  0  0  1.00 199.8
As expected, the CPU starvation did not trigger a resource group move from the seoul node
to the busan node. The /var/adm/ras/syslog.caa log file reported messages about solidDB
daemons being unable to communicate, but the leader node continued to be the busan node
as shown in Example 9-21 on page 294.
Example 9-21 Status of the nodes after triggering a CPU starvation scenario
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
db2pok_Resourc ONLINE                       seoul
               OFFLINE                      busan
Scenario 2
This scenario shows the use of a stress tool on the CPU of two nodes with more than 50
processes in the run queue and a duration of 60 seconds.
Overview
This scenario consists of a hot-standby cluster configuration with participating nodes seoul and busan and only one Ethernet network. Each node has two Ethernet interfaces. Both the resource group and solidDB are active on the busan node. A tool is run to stress the CPU of both nodes with more than 50 processes in the run queue for a duration of 60 seconds, as shown in Example 9-22.
Example 9-22 Scenario testing the use of a stress tool on both nodes
seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
Group Leader: busan, 0x564bc620973c9bdc, 1
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
db2pok_Resourc OFFLINE                      seoul
               ONLINE                       busan
Results
Before the test, both nodes have a low run queue and low entitled capacity as shown in
Example 9-23.
Example 9-23 Results of the stress test in scenario two
seoul:/ # vmstat 2

System configuration: lcpu=2 mem=3584MB ent=0.50

kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa   pc    ec
 1  0 389401 181315   0   0   0   0   0   0  95 1651 302  2  2 97  0  0.02   3.5
 1  0 389405 181311   0   0   0   0   0   0  91  960 316  1  2 97  0  0.02   3.3
 1  0 389406 181310   0   0   0   0   0   0  88  953 299  1  1 97  0  0.02   3.1
 1  0 389408 181308   0   0   0   0   0   0  97 1461 301  1  2 97  0  0.02   3.5
 1  0 389411 181305   0   0   0   0   0   0 109  967 326  1  3 96  0  0.02   4.7

busan:/ # vmstat 2

System configuration: lcpu=2 mem=3584MB ent=0.50

kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa   pc    ec
 1  0 450395 349994   0   0   0   0   0   0  77  670 363  1  2 97  0  0.02   3.4
 1  0 450395 349994   0   0   0   0   0   0  80  477 359  1  1 98  0  0.02   3.1
 1  0 450395 349994   0   0   0   0   0   0  80  554 369  1  1 97  0  0.02   3.4
 1  0 450395 349994   0   0   0   0   0   0  73  479 368  1  1 98  0  0.02   3.1
 1  0 450395 349994   0   0   0   0   0   0  81  468 339  1  1 98  0  0.01   2.9
During the test, the seoul node kept an average of 50 processes in the run queue and an
entitled capacity of 200% as shown in Example 9-24.
Example 9-24 Seoul node vmstat results during the test
seoul:/ # vmstat 2

System configuration: lcpu=2 mem=3584MB ent=0.50

kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa   pc    ec
43  0 371178 199534   0   0   0   0   0   0  74  312 251 99  0  0  0  1.00 199.8
52  0 371178 199534   0   0   0   0   0   0  73   19 247 99  0  0  0  1.00 200.0
52  0 371176 199534   0   0   0   0   0   0  75  108 249 99  0  0  0  1.00 199.9
47  0 371075 199635   0   0   0   0   0   0  74   33 257 99  0  0  0  1.00 200.1
The busan node did not respond to the vmstat command during the test. When the CPU stress finished, it produced only one line of output, showing a run queue of 119 processes (Example 9-25).
Example 9-25 Busan node showing only one line of output
busan:/ # vmstat 2

System configuration: lcpu=2 mem=3584MB ent=0.50

kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
  r  b   avm   fre  re  pi  po  fr  sr  cy  in  sy  cs us sy id wa   pc    ec
119  0 450463 349911   0   0   0   0   0   0  56  19 234 99  0  0  0  0.50  99.6
Neither the resource group nor the solidDB database moved from the busan node, as shown in Example 9-26.
Example 9-26 Status of the busan node
seoul:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
db2pok_Resourc OFFLINE                      seoul
               ONLINE                       busan

seoul:/ # lssrc -ls IBM.StorageRM | grep Leader
   Group Leader: busan, 0x564bc620973c9bdc, 1
Conclusion
The conclusion of this test is that temporary peaks of performance degradation do not cause resource group moves and unnecessary outages.
9.7 Simulation of a Group Services failure
This scenario consists of a hot-standby cluster configuration with participating nodes seoul and busan and only one Ethernet network. Each node has two Ethernet interfaces. We kill the cthags process on the seoul node, which is hosting the resource group.
As a result, the seoul node halted as expected, and the resource group is acquired by the
remaining node as shown in Example 9-27.
Example 9-27 Resource group movement
seoul:/ # lssrc -ls cthags
Subsystem         Group            PID          Status
 cthags           cthags           5963978      active
5 locally-connected clients. Their PIDs:
6095070(IBM.ConfigRMd) 6357196(rmcd) 5963828(IBM.StorageRMd) 7471354(clstrmgr) 12910678(gsclvmd)
HA Group Services domain information:
Domain established by node 2
Number of groups known locally: 8
                                         Number of    Number of local
Group name                               providers    providers/subscribers
rmc_peers                                    2             1    0
s00O3RA00009G0000015CDBQGFL                  2             1    0
IBM.ConfigRM                                 2             1    0
IBM.StorageRM.v1                             2             1    0
CLRESMGRD_1108531106                         2             1    0
CLRESMGRDNPD_1108531106                      2             1    0
CLSTRMGR_1108531106                          2             1    0
d00O3RA00009G0000015CDBQGFL                  2             1    0
Critical clients will be terminated if unresponsive

seoul:/ # ps -ef | grep cthagsd | grep -v grep
    root 5963978 3866784   4 17:02:33      -  0:00 /usr/sbin/rsct/bin/hagsd cthags
seoul:/ # kill -9 5963978
The /var/adm/ras/syslog.caa log file on the seoul node recorded the following messages before the crash. You can observe that the seoul node was halted about one second later, as shown in Example 9-28.
Example 9-28 Message in the syslog.caa file in the seoul node
Sep 29 17:02:33 seoul daemon:err|error RMCdaemon[6357196]: (Recorded using libct_ffdc.a cv 2):::Error ID:
6XqlQl0dZucA/POE1DK4e.1...................:::Reference ID: :::Template ID: b1731da3:::Details File:
:::Location: RSCT,rmcd_gsi.c,1.50,10
48 :::RMCD_2610_101_ER Internal error. Error data 1 00000001 Error data 2 00000000 Error data 3 dispatch_gs
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: Called,
state=ST_STABLE, provider token 1
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: GsToken 3,
AdapterToken 4, rm_GsToken 1
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 announcementCb: GRPSVCS
announcment code=512; exiting
Sep 29 17:02:33 seoul local0:crit clstrmgrES[7471354]: Wed Sep 29 17:02:33 CHECK FOR FAILURE OF RSCT
SUBSYSTEMS (cthags)
Sep 29 17:02:33 seoul daemon:err|error ConfigRM[6095070]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID: :::Template ID: 362b0a5f:::Details File: :::Location:
RSCT,PeerDomain.C,1.99.1.519,21079:::CONFIGRM_EXIT_GS_ER The peer domain configuration manager daemon
(IBM.ConfigRMd) is exiting due to the Group Services subsystem terminating. The configuration manager
daemon will restart automatically, synchronize the nodes configuration with the domain and rejoin the
domain if possible.
Sep 29 17:02:34 seoul daemon:notice StorageRM[5963828]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID: :::Template ID: a8576c0d:::Details File: :::Location: RSCT,StorageRMDaemon.C,1.56,323
:::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.
Sep 29 17:02:34 seoul daemon:notice ConfigRM[6095070]: (Recorded using libct_ffdc.a cv 2):::Error ID:
:::Reference ID: :::Template ID: de84c4db:::Details File: :::Location: RSCT,IBM.ConfigRMd.C,1.55,346
:::CONFIGRM_STARTED_STIBM.ConfigRM daemon has started.
Sep 29 17:02:34 seoul daemon:notice snmpd[3342454]: NOTICE: lost peer (SMUX ::1+51812+5)
Sep 29 17:02:34 seoul daemon:notice RMCdaemon[15663146]: (Recorded using libct_ffdc.a cv 2):::Error ID:
6eKora0eZucA/Xuo/D
K4e.1...................:::Reference ID: :::Template ID: a6df45aa:::Details File: :::Location:
RSCT,rmcd.c,1.75,225:::RMCD_INFO_0_ST The daemon is started.
Sep 29 17:02:34 seoul user:notice PowerHA SystemMirror for AIX: clexit.rc : Unexpected termination of
clstrmgrES
Sep 29 17:02:34 seoul user:notice PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!!
9.8 Testing a Start After resource group dependency
This test uses the example that was configured in 5.1.6, “Configuring Start After and Stop
After resource group dependencies” on page 96. Figure 9-16 shows a summary of the
configuration. The dependency configuration of the Start After resource group is tested to see
whether it works as expected.
Figure 9-16 Start After dependency between the apprg and dbrg resource group
9.8.1 Testing the standard configuration of a Start After resource group
dependency
Example 9-29 shows the state of a resource group pair after a normal startup of the cluster on
both nodes.
Example 9-29 clRGinfo for a Start After resource group pair
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           ONLINE                       sydney
               OFFLINE                      perth

apprg          ONLINE                       perth
               OFFLINE                      sydney
With both resource groups online, the source (dependent) apprg resource group can be brought offline and then online again. Alternatively, it can be gracefully moved to another node without any influence on the target dbrg resource group. The target resource group can also be brought offline. However, to bring the source resource group online, the target resource group must first be brought online manually (if it is offline).
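These resource group operations can be driven through SMIT or with the clmgr utility. The following commands are only a sketch of the command-line form; confirm the exact action names with clmgr -h on your PowerHA level:

# Bring the dependent (source) resource group offline and back online
clmgr offline resource_group apprg
clmgr online resource_group apprg

# Move the source resource group to another node without touching the target
clmgr move resource_group apprg NODE=sydney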
If you start the cluster only on the home node of the source resource group, the apprg
resource group in this case, the cluster waits until the dbrg resource group is brought online
as shown in Example 9-30. The startup policy is Online On Home Node Only for both resource
groups.
Example 9-30 Offline because the target is offline from clRGinfo
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           OFFLINE                      sydney
               OFFLINE                      perth

apprg          OFFLINE due to target offlin perth
               OFFLINE                      sydney
9.8.2 Testing application startup with Startup Monitoring configured
For this test, both resource groups are started on the same node. This way, their application scripts log messages in the same file so that you can see the detailed sequence of their start and finish times. The home node is temporarily modified to sydney for both resource groups. Then the cluster is started only on the sydney node with both resource groups.
Example 9-31 shows the start, stop, and monitoring scripts. Note the syslog configuration that
was made to log the messages through the local7 facility in the
/var/hacmp/log/StartAfter_cluster.log file.
Example 9-31 Dummy start, stop, and monitoring scripts
sydney:/HA71 # ls
app_mon.sh    app_start.sh   app_stop.sh   db_mon.sh    db_start.sh   db_stop.sh
sydney:/HA71 # cat db_start.sh
#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
# cleanup
if [ -f /dbmp/db.lck ]; then rm /dbmp/db.lck; fi
logger -t"$file" -p$fp "Starting up DB... "
sleep 50
echo "DB started at:\n\t`date`">/dbmp/db.lck
logger -t"$file" -p$fp "DB is running!"
exit 0
sydney:/HA71 # cat db_stop.sh
#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
logger -t"$file" -p$fp "Shutting down DB... "
sleep 20
# cleanup
if [ -f /dbmp/db.lck ]; then rm /dbmp/db.lck; fi
logger -t"$file" -p$fp "DB stopped!"
exit 0
sydney:/HA71 # cat db_mon.sh
#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
if [ -f /dbmp/db.lck ]; then
logger -t"$file" -p$fp "DB is running!"
exit 0
fi
logger -t"$file" -p$fp "DB is NOT running!"
exit 1
sydney:/HA71 # cat app_start.sh
#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
# cleanup
if [ -f /appmp/app.lck ]; then rm /appmp/app.lck; fi
logger -t"$file" -p$fp "Starting up APP... "
sleep 10
Chapter 9. Testing the PowerHA 7.1 cluster
299
echo "APP started at:\n\t`date`">/appmp/app.lck
logger -t"$file" -p$fp "APP is running!"
exit 0
sydney:/HA71 # cat app_stop.sh
#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
logger -t"$file" -p$fp "Shutting down APP... "
sleep 2
# cleanup
if [ -f /appmp/app.lck ]; then rm /appmp/app.lck; fi
logger -t"$file" -p$fp "APP stopped!"
exit 0
sydney:/HA71 # cat app_mon.sh
#!/bin/ksh
fp="local7.info"
file="`expr "//$0" : '.*/\([^/]*\)'`"
if [ -f /appmp/app.lck ]; then
logger -t"$file" -p$fp "APP is running!"
exit 0
fi
logger -t"$file" -p$fp "APP is NOT running!"
exit 1
sydney:/HA71 # grep local7 /etc/syslog.conf
local7.info   /var/hacmp/log/StartAfter_cluster.log   rotate size 256k files 4
Without Startup Monitoring, the APP startup script is launched before the DB startup script returns, as shown in Example 9-32.
Example 9-32 Startup sequence without Startup monitoring mode
...
Oct 12 07:53:26 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 07:53:27 sydney local7:info db_start.sh: Starting up DB...
Oct 12 07:53:36 sydney local7:info app_mon.sh: APP is NOT running!
Oct 12 07:53:37 sydney local7:info app_start.sh: Starting up APP...
Oct 12 07:53:47 sydney local7:info app_start.sh: APP is running!
Oct 12 07:53:53 sydney local7:info app_mon.sh: APP is running!
Oct 12 07:54:17 sydney local7:info db_start.sh: DB is running!
Oct 12 07:54:23 sydney local7:info app_mon.sh: APP is running!
...
With Startup Monitoring, the APP startup script is launched after the DB startup script
returns, as shown in Example 9-33, and as expected.
Example 9-33 Startup sequence with Startup Monitoring
...
Oct 12 08:02:38 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:39 sydney local7:info db_start.sh: Starting up DB...
Oct 12 08:02:39 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:45 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:51 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:02:57 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:03 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:09 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:15 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:21 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:27 sydney local7:info db_mon.sh: DB is NOT running!
Oct 12 08:03:29 sydney local7:info db_start.sh: DB is running!
Oct 12 08:03:33 sydney local7:info db_mon.sh: DB is running!
Oct 12 08:03:49 sydney local7:info app_mon.sh: APP is NOT running!
Oct 12 08:03:50 sydney local7:info app_start.sh: Starting up APP...
Oct 12 08:04:00 sydney local7:info app_start.sh: APP is running!
...
Example 9-34 shows the state change of the resource groups during this startup.
Example 9-34 Resource group state during startup
sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           OFFLINE                      sydney
               OFFLINE                      perth

apprg          OFFLINE                      sydney
               OFFLINE                      perth

sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           ACQUIRING                    sydney
               OFFLINE                      perth

apprg          TEMPORARY ERROR              sydney
               OFFLINE                      perth

sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           ONLINE                       sydney
               OFFLINE                      perth

apprg          ACQUIRING                    sydney
               OFFLINE                      perth

sydney:/ # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
dbrg           ONLINE                       sydney
               OFFLINE                      perth

apprg          ONLINE                       sydney
               OFFLINE                      perth
9.9 Testing dynamic node priority
This test has the algeria, brazil, and usa nodes, and one resource group in the cluster as
shown in Figure 9-17. This resource group is configured to fail over based on a script return
value. The DNP.sh script returns different values for each node. For details about configuring
the dynamic node priority (DNP), see 5.1.8, “Configuring the dynamic node priority (adaptive
failover)” on page 102.
Figure 9-17 Dynamic node priority test environment
Table 9-1 provides the cluster details.
Table 9-1 Cluster details
Field                             Value
Resource name                     algeria_rg
Participating nodes               algeria, brazil, usa
Dynamic node priority policy      cl_lowest_nonzero_udscript_rc
DNP script path                   /usr/IBM/HTTPServer/bin/DNP.sh
DNP script timeout value          20
The default node priority is algeria first, then brazil, and then usa. The usa node gets the
lowest return value from DNP.sh. When a resource group failover is triggered, the algeria_rg
resource group is moved to the usa node, because the return value is the lowest one as
shown in Example 9-35.
Example 9-35 Expected return value for each nodes
usa:/ # clcmd cat /usr/IBM/HTTPServer/bin/DNP.sh
-------------------------------
NODE usa
-------------------------------
exit 100

-------------------------------
NODE brazil
-------------------------------
exit 105

-------------------------------
NODE algeria
-------------------------------
exit 103
When the resource group fails over, algeria_rg moves from the algeria node to the usa
node, which has the lowest return value in DNP.sh as shown in Figure 9-18.
# clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
algeria_rg     ONLINE                       algeria
               OFFLINE                      brazil
               OFFLINE                      usa

# clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
algeria_rg     OFFLINE                      algeria
               OFFLINE                      brazil
               ONLINE                       usa
Figure 9-18 clRGinfo of before and after takeover
Then the DNP.sh script is modified to set brazil with the lowest return value as shown in
Example 9-36.
Example 9-36 Changing the DNP.sh file
usa:/ # clcmd cat /usr/IBM/HTTPServer/bin/DNP.sh
-------------------------------
NODE usa
-------------------------------
exit 100

-------------------------------
NODE brazil
-------------------------------
exit 101

-------------------------------
NODE algeria
-------------------------------
exit 103
Upon the next resource group failover, the resource group moves to brazil, which now has the lowest return value among the cluster nodes, as shown in Figure 9-19.
# clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
algeria_rg     OFFLINE                      algeria
               OFFLINE                      brazil
               ONLINE                       usa

# clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
algeria_rg     OFFLINE                      algeria
               ONLINE                       brazil
               OFFLINE                      usa
Figure 9-19 Resource group moving
To simplify the test scenario, DNP.sh is defined to simply return a value. In a real situation, you
can replace this DNP.sh sample file with any customized script. Then, node failover is done
based upon the return value of your own script.
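For example, a customized DNP script could favor the node with the shortest run queue at the moment of the failover. The following ksh sketch is hypothetical and only illustrates the idea; the contract that matters to PowerHA is the exit code (a nonzero value when the cl_lowest_nonzero_udscript_rc policy is used), returned within the configured timeout:

#!/bin/ksh
# Hypothetical DNP script: exit with (run queue length + 1) so that the
# least loaded node returns the lowest nonzero value.
rq=$(vmstat 1 2 | tail -1 | awk '{print $1}')   # kthr "r" column of the last sample
[ -z "$rq" ] && exit 255                        # fall back to a high value on error
exit $((rq + 1))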
Chapter 10. Troubleshooting PowerHA 7.1
This chapter shares the experiences of the writers of this IBM Redbooks publication and the lessons learned in all phases of implementing PowerHA 7.1, to help you troubleshoot migration, installation, configuration, and Cluster Aware AIX (CAA) problems.
This chapter includes the following topics:
Locating the log files
Troubleshooting the migration
Troubleshooting the installation and configuration
Troubleshooting problems with CAA
10.1 Locating the log files
This section explains where you can find the various log files in your PowerHA cluster to
assist in managing problems with CAA and PowerHA.
10.1.1 CAA log files
You can check the CAA clutils log file and the syslog file for error messages as explained in
the following sections.
The clutils file
If you experience a problem with an operation, such as creating a cluster in CAA, check the
/var/hacmp/log/clutils.log log file.
The syslog facility
The CAA service uses the syslog facility to log errors and debugging information. All CAA
messages are written to the /var/adm/ras/syslog.caa file.
For verbose logging information, you must enable debug mode by editing the
/etc/syslog.conf configuration file and adding the following line as shown in Figure 10-1:
*.debug /tmp/syslog.out rotate size 10m files 10
local0.crit /dev/console
local0.info /var/hacmp/adm/cluster.log
user.notice /var/hacmp/adm/cluster.log
daemon.notice /var/hacmp/adm/cluster.log
*.info /var/adm/ras/syslog.caa rotate size 1m files 10
*.debug /tmp/syslog.out rotate size 10m files 10
Figure 10-1 Extract from the /etc/syslog.conf file
After you make this change, verify that a syslog.out file is in the /tmp directory. If this file is not there, create it by entering the touch /tmp/syslog.out command. After you create the file, refresh the syslog daemon by issuing the refresh -s syslogd command.
When debug mode is enabled, you capture detailed debugging information in the /tmp/syslog.out file. This information can assist you in troubleshooting problems with commands, such as the mkcluster command during cluster migration.
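The full sequence to enable and activate this debug logging is summarized below. Editing /etc/syslog.conf with an editor is equally valid; the echo form is only a compact illustration:

# Add the debug destination to syslog (or add the line with an editor)
echo "*.debug /tmp/syslog.out rotate size 10m files 10" >> /etc/syslog.conf

# Make sure the destination file exists, then activate the change
touch /tmp/syslog.out
refresh -s syslogd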
10.1.2 PowerHA log files
The following PowerHA log files are most commonly used:
/var/hacmp/adm/cluster.log
One of the main sources of information for the administrator. This file
tracks time-stamped messages of all PowerHA events, scripts, and
daemons.
/var/hacmp/log/hacmp.out
Along with cluster.log file, this file is the most important source of
information. Recent PowerHA releases are sending more details to
this log file, including summaries of events and the location of
resource groups.
/var/log/clcomd/clcomd.log
Includes information about communication that is exchanged among
all the cluster nodes.
Increasing the verbose logging level
You can increase the verbose logging level in PowerHA by exporting the VERBOSE_LOGGING=high setting. When this variable is exported, a higher level of logging is enabled and more information is written to log files such as hacmp.out and clmigcheck.log.
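For example, export the variable in the shell session before running the administrative command whose processing you want to trace (clmigcheck is shown here only as an illustration):

# Enable verbose PowerHA logging for commands run from this shell session
export VERBOSE_LOGGING=high
/usr/sbin/clmigcheck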
Listing the PowerHA log files by using the clmgr utility
A common way to list all PowerHA log files is to use the clmgr command-line utility. First, run the clmgr view log command to display the list of available logs, as shown in Example 10-1. Then run the clmgr view log logname command, replacing logname with the log that you want to analyze.
Example 10-1 Generating a list of PowerHA log files with the clmgr utility
seoul:/ # clmgr view log
ERROR: """" does not appear to exist!
Available Logs:
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
clconfigassist.log
clinfo.log
clstrmgr.debug
clstrmgr.debug.long
cluster.log
cluster.mmddyyyy
clutils.log
clverify.log
cspoc.log
cspoc.log.long
cspoc.log.remote
dhcpsa.log
dnssa.log
domino_server.log
emuhacmp.out
hacmp.out
ihssa.log
migration.log
sa.log
sax.log
seoul:/ # clmgr view log cspoc.log | more
Warning: no options were provided for log "cspoc.log".
Defaulting to the last 500 lines.
09/21/10 10:23:09 seoul: success: clresactive -v datavg
09/21/10 10:23:10 seoul: success: /usr/es/sbin/cluster/cspoc/clshowfs2
datavg
09/21/10 10:23:29 [========== C_SPOC COMMAND LINE ==========]
09/21/10 10:23:29 /usr/es/sbin/cluster/sbin/cl_chfs -cspoc -nseoul,busan -FM -a
size=+896 -A no /database/logdir
09/21/10 10:23:29 busan: success: clresactive -v datavg
09/21/10 10:23:29 seoul: success: clresactive -v datavg
09/21/10 10:23:30 seoul: success: eval LC_ALL=C lspv
09/21/10 10:23:35 seoul: success: chfs -A no -a size="+1835008" /database/logdir
09/21/10 10:23:36 seoul: success: odmget -q 'attribute = label and value =
/database/logdir' CuAt
09/21/10 10:23:37 busan: success: eval varyonvg -n -c -A datavg ; imfs -lx lvdata09 ; imfs -l lvdata09; varyonvg -n -c -P datavg.
10.2 Troubleshooting the migration
This section offers a collection of problems and solutions that you might encounter during migration testing. The information is based on the experience of the writers of this Redbooks publication.
10.2.1 The clmigcheck script
The clmigcheck script writes all activity to the /tmp/clmigcheck.log file (Figure 10-2).
Therefore, you must first look in this file for an error message if you run into any problems with
the clmigcheck utility.
mk_cluster: ERROR: Problems encountered creating the cluster in AIX.
Use the syslog facility to see output from the mkcluster command.
Error termination on: Wed Sep 22 15:47:43 EDT 2010
Figure 10-2 Output from the clmigcheck.log file
10.2.2 The ‘Cluster still stuck in migration’ condition
When migration is completed, you might not progress to the update of the Object Data Manager (ODM) entries until the node_up event is run on the last node of the cluster. If you have this problem, start the node to see whether this action completes the migration protocol and updates the version numbers correctly. For PowerHA 7.1, the version number must be 12 in the HACMPcluster class. You can verify this number by running odmget as shown in Example 7-51. If the version number is less than 12, you are still stuck in migration and must call IBM support.
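A quick way to display the version number is to query the HACMPcluster object class directly; the attribute names depend on your level, so check the full odmget output if the filter below returns nothing:

# Show the cluster version that PowerHA has recorded in the ODM
odmget HACMPcluster | grep -i version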
10.2.3 Existing non-IP networks
The following section provides details about problems with existing non-IP networks that are
not removed. It describes a possible workaround to remove disk heartbeat networks if they
were not deleted as part of the migration process.
After the migration, the output of the cltopinfo command might still show the disk heartbeat
network as shown in Example 10-2.
Example 10-2 The cltopinfo command with the disk heartbeat still being displayed
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 3 network(s) defined
NODE berlin:
    Network net_diskhb_01
        berlin_hdisk1_01    /dev/hdisk1
    Network net_ether_01
        berlin     192.168.101.141
    Network net_ether_010
        alleman    10.168.101.142
        german     10.168.101.141
        berlinb1   192.168.200.141
        berlinb2   192.168.220.141
NODE munich:
    Network net_diskhb_01
        munich_hdisk1_01    /dev/hdisk1
    Network net_ether_01
        munich     192.168.101.142
    Network net_ether_010
        alleman    10.168.101.142
        german     10.168.101.141
        munichb1   192.168.200.142
        munichb2   192.168.220.142

Resource Group http_rg
    Startup Policy          Online On Home Node Only
    Fallover Policy         Fallover To Next Priority Node In The List
    Fallback Policy         Never Fallback
    Participating Nodes     munich berlin
    Service IP Label        alleman

Resource Group nfs_rg
    Startup Policy          Online On Home Node Only
    Fallover Policy         Fallover To Next Priority Node In The List
    Fallback Policy         Fallback To Higher Priority Node In The List
    Participating Nodes     berlin munich
    Service IP Label        german
To remove the disk heartbeat network, follow these steps:
1. Stop PowerHA on all cluster nodes. You must perform this action because the removal
   does not work in a running cluster. Figure 10-3 shows the error message that is received
   when trying to remove the network in an active cluster.

                              COMMAND STATUS

Command: failed        stdout: yes        stderr: no

Before command completion, additional instructions may appear below.

cldare: Migration from PowerHA SystemMirror to PowerHA SystemMirror/ES detected.
A DARE event cannot be run until the migration has completed.

F1=Help          F2=Refresh       F3=Cancel        F6=Command
F8=Image         F9=Shell         F10=Exit         /=Find
n=Find Next
Figure 10-3 Cluster synchronization error message
2. Remove the network:
   a. Follow the path smitty sysmirror -> Cluster Nodes and Networks -> Manage
      Networks and Network Interfaces -> Networks -> Remove a Network.
   b. On the SMIT panel, similar to the one shown in Figure 10-4, select the disk heartbeat
      network that you want to remove.
   You might have to repeat these steps if you have more than one disk heartbeat network.

                                   Networks

Move cursor to desired item and press Enter.

  Add a Network
  Change/Show a Network
  Remove a Network

  +--------------------------------------------------------------------------+
  |                       Select a Network to Remove                          |
  |                                                                           |
  | Move cursor to desired item and press Enter.                              |
  |                                                                           |
  |   net_diskhb_01                                                           |
  |   net_ether_01 (192.168.100.0/22)                                         |
  |   net_ether_010 (10.168.101.0/24 192.168.200.0/24 192.168.220.0/24)       |
  |                                                                           |
  | F1=Help          F2=Refresh       F3=Cancel                               |
  | F8=Image         F10=Exit         Enter=Do                                |
F1| /=Find           n=Find Next                                              |
F9+--------------------------------------------------------------------------+
Figure 10-4 Removing the disk heartbeat network
3. Synchronize your cluster by selecting the path: smitty sysmirror → Custom Cluster
Configuration → Verify and Synchronize Cluster Configuration (Advanced).
4. See if the network is deleted by using the cltopinfo command as shown in Example 10-3.
Example 10-3 Output of the cltopinfo command after removing the disk heartbeat network
berlin:/ # cltopinfo
Cluster Name: de_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address:
There are 2 node(s) and 2 network(s) defined
NODE berlin:
    Network net_ether_01
        berlin             192.168.101.141
    Network net_ether_010
        german             10.168.101.141
        alleman            10.168.101.142
        berlinb1           192.168.200.141
        berlinb2           192.168.220.141
NODE munich:
    Network net_ether_01
        munich             192.168.101.142
    Network net_ether_010
        german             10.168.101.141
        alleman            10.168.101.142
        munichb1           192.168.200.142
        munichb2           192.168.220.142

Resource Group http_rg
    Startup Policy        Online On Home Node Only
    Fallover Policy       Fallover To Next Priority Node In The List
    Fallback Policy       Never Fallback
    Participating Nodes   munich berlin
    Service IP Label      alleman

Resource Group nfs_rg
    Startup Policy        Online On Home Node Only
    Fallover Policy       Fallover To Next Priority Node In The List
    Fallback Policy       Fallback To Higher Priority Node In The List
    Participating Nodes   berlin munich
    Service IP Label      german
berlin:/ #
5. Start PowerHA on all your cluster nodes by running the smitty cl_start command.
10.3 Troubleshooting the installation and configuration
This section explains how you can recover from various installation and configuration
problems on CAA and PowerHA.
10.3.1 The clstat and cldump utilities and the SNMP
After installing and configuring PowerHA 7.1 on AIX 7.1, the clstat and cldump utilities do not
work. If you experience this problem, convert SNMP from version 3 to version 1. Example 10-4
shows all the steps to correct this problem.
Example 10-4 The clstat utility not working under SNMP V3
seoul:/ # clstat -a
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
seoul:/ # stopsrc -s snmpd
0513-044 The snmpd Subsystem was requested to stop.
seoul:/ # ls -ld /usr/sbin/snmpd
lrwxrwxrwx   1 root   system   9 Sep 15 22:17 /usr/sbin/snmpd -> snmpdv3ne
seoul:/ # /usr/sbin/snmpv3_ssw -1
Stop daemon: snmpmibd
In /etc/rc.tcpip file, comment out the line that contains: snmpmibd
In /etc/rc.tcpip file, remove the comment from the line that contains: dpid2
Make the symbolic link from /usr/sbin/snmpd to /usr/sbin/snmpdv1
Make the symbolic link from /usr/sbin/clsnmp to /usr/sbin/clsnmpne
Start daemon: dpid2
seoul:/ # ls -ld /usr/sbin/snmpd
lrwxrwxrwx   1 root   system   17 Sep 20 09:49 /usr/sbin/snmpd -> /usr/sbin/snmpdv1
seoul:/ # startsrc -s snmpd
0513-059 The snmpd Subsystem has been started. Subsystem PID is 8126570.
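After switching to SNMP version 1, a quick recheck such as the following confirms that clstat works again. This is a sketch: the clinfoES restart is optional, and the -o flag (a single, non-interactive status report) is assumed to be available at your level:

# Restart the cluster information daemon so that it reconnects to snmpd
stopsrc -s clinfoES
startsrc -s clinfoES
# Request a one-time, non-interactive cluster status report
/usr/es/sbin/cluster/clstat -o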
10.3.2 The /var/log/clcomd/clcomd.log file and the security keys
You might find that you cannot start the clcomd daemon and its log file has messages
indicating problems with the security keys as shown in Example 10-5.
Example 10-5 The clcomd daemon indicating problems with the security keys
2010-09-23T00:02:07.983104: WARNING: Cannot read the key
/etc/security/cluster/key_md5_des
2010-09-23T00:02:07.985975: WARNING: Cannot read the key
/etc/security/cluster/key_md5_3des
2010-09-23T00:02:07.986082: WARNING: Cannot read the key
/etc/security/cluster/key_md5_aes
This problem means that the /etc/cluster/rhosts file is not populated correctly. On all
cluster nodes, edit this file, before the first synchronization, so that it lists the IP addresses
that are used as the communication paths during cluster definition. When the host name is
used as the persistent address and the communication path, add the persistent addresses to
the /etc/cluster/rhosts file. Then issue the startsrc -s clcomd command.
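As an illustration only, using node names that appear in the examples in this chapter, the /etc/cluster/rhosts file on every node might look like the following; afterward, restart clcomd:

# Contents of /etc/cluster/rhosts (one entry per cluster node)
seoul
busan

# Restart the cluster communication daemon to pick up the change
stopsrc -s clcomd
startsrc -s clcomd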
10.3.3 The ECM volume group
When creating an ECM volume group by using the PowerHA C-SPOC menus, the
administrator receives the message shown in Example 10-6 about the inability to create the
group.
Example 10-6 Error messages when trying to create an ECM volume group using C-SPOC
seoul: 0516-1335 mkvg: This system does not support enhanced concurrent capable
seoul:
volume groups.
seoul: 0516-862 mkvg: Unable to create volume group.
seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more
information
cl_mkvg: An error occurred executing mkvg appvg on node seoul
In /var/hacmp/log/cspoc.log, the messages are:
09/14/10 17:41:40 [========== C_SPOC COMMAND LINE ==========]
09/14/10 17:41:40 /usr/es/sbin/cluster/sbin/cl_mkvg -f -n -B -cspoc -nseoul,busan
-rdatarg -y datavg -s32 -V100 -lfalse E 00c0f6a0107734ea 00c0f6a010773532 00c0f6a0fed38de6 00c0f6a0fed3d324
00c0f6a0fed3ef8f
09/14/10 17:41:40 busan: success: clresactive -v datavg
09/14/10 17:41:40 seoul: success: clresactive -v datavg
09/14/10 17:41:41 cl_mkvg: cl_mkvg: An error occurred executing mkvg datavg on
node seoul
09/14/10 17:41:41 seoul: FAILED: mkvg -f -n -B -y datavg -s 32 -V 100 -C cldisk4
cldisk3 cldisk1 cldisk2 cldisk5
09/14/10 17:41:41 seoul: 0516-1335 mkvg: This system does not support enhanced
concurrent capable
09/14/10 17:41:41 seoul:
volume groups.
09/14/10 17:41:41 seoul: 0516-862 mkvg: Unable to create volume group.
09/14/10 17:41:41 seoul: RETURN_CODE=1
09/14/10 17:41:41 seoul: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log
for more information
09/14/10 17:41:42 seoul: success: cl_vg_fence_init datavg rw cldisk4 cldisk3
cldisk1 cldisk2 cldisk5
In this case, install the bos.clvm.enh file set, along with any fixes for it, so that the system
stays at a consistent version level.
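A quick way to confirm whether the file set is present before retrying the C-SPOC operation is shown in the following sketch; the installp source device is an example and depends on where your AIX installation media is located:

# Check the enhanced concurrent LVM file set on every node
lslpp -l bos.clvm.enh
# If it is missing, install it from the AIX media and then retry the C-SPOC operation
installp -aXYgd /dev/cd0 bos.clvm.enh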
10.3.4 Communication path
If your cluster node communication path is misconfigured, you might see an error message
similar to the one shown in Figure 10-5.
------------[ PowerHA SystemMirror Migration Check ]------------

ERROR: Communications Path for node brazil must be set to hostname

Hit <Enter> to continue

Figure 10-5 clmigcheck error for communication path
If you see an error for communication path while running the clmigcheck program, verify that
the /etc/hosts file includes the communication path for the cluster. Also check the
communication path in the HACMPnode ODM class as shown in Figure 10-6.
algeria:/ # odmget HACMPnode | grep -p COMMUNICATION
HACMPnode:
name = "algeria"
object = "COMMUNICATION_PATH"
value = "algeria"
node_id = 1
node_handle = 1
version = 12
HACMPnode:
name = "brazil"
object = "COMMUNICATION_PATH"
value = "brazil"
node_id = 3
node_handle = 3
version = 12
Figure 10-6 Communication path definition at HACMPnode.odm
Because the clmigcheck program is a ksh script, certain shell profiles can cause a similar
problem. If the problem persists after you correct the /etc/hosts file, try removing the
contents of the kshrc file because it might be affecting the behavior of the clmigcheck script.
If your /etc/cluster/rhosts file is not configured properly, you see an error message
similar to the one shown in Figure 10-7. The /etc/cluster/rhosts file must contain the fully
qualified domain name of each node in the cluster (that is, the output of the hostname
command). After changing the /etc/cluster/rhosts file, run the stopsrc and startsrc
commands against the clcomd subsystem.
brazil:/ # clmigcheck
lslpp: Fileset hageo* not installed.
rshexec: cannot connect to node algeria
ERROR: Internode communication failed,
check the clcomd.log file for more information.
brazil:/ # clrsh algeria date
connect: Connection refused
rshexec: cannot open socket
Figure 10-7 The clcomd error message
You can also check clcomd communication by using the clrsh command as shown in
Figure 10-8.
algeria:/ # clrsh algeria date
Mon Sep 27 11:14:12 EDT 2010
algeria:/ # clrsh brazil date
Mon Sep 27 11:14:15 EDT 2010
Figure 10-8 Checking the clcomd connection
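The individual checks described in this section can be combined into a short verification sequence, shown here as a sketch with the node names used in the figures:

# The host name must resolve (through /etc/hosts or DNS)
host $(hostname)
# The communication path in the ODM must match the host name
odmget HACMPnode | grep -p COMMUNICATION_PATH
# clcomd connectivity must work to every node in the cluster
clrsh algeria date
clrsh brazil date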
10.4 Troubleshooting problems with CAA
This section discusses various problems that you might encounter when installing or
configuring CAA and provides recovery steps.
10.4.1 Previously used repository disk for CAA
When defining a PowerHA cluster, you must define a disk to use as the repository for the
CAA. If the specified disk was used previously as a repository by another cluster, upon
synchronizing the cluster, you receive a message in the /var/adm/ras/syslog.caa file (or
another file defined in /etc/syslog.conf). Example 10-7 shows the message that you
receive.
Example 10-7 CAA error message in the /var/adm/ras/syslog.caa file
Sep 16 08:58:14 seoul user:err|error syslog: validate_device: Specified device,
hdisk1, is a repository.
Sep 16 08:58:14 seoul user:warn|warning syslog: To force cleanup of this disk, use
rmcluster -r hdisk1
Example 10-8 shows the exact error message saved in the smit.log file.
Example 10-8 CAA errors in the smit.log file
ERROR: Problems encountered creating the cluster in AIX. Use the syslog facility
to see output from the mkcluster command.
ERROR: Creating the cluster in AIX failed. Check output for errors in local
cluster configuration, correct them, and try synchronization
again.
The message includes the solution as shown in Example 10-7. You run the rmcluster
command as shown in Example 10-9 to remove all CAA structures from the specified disk.
Example 10-9 Removing CAA structures from a disk
seoul:/ # rmcluster -r hdisk1
This operation will scrub hdisk1, removing any volume groups and clearing cluster
identifiers.
If another cluster is using this disk, that cluster will be destroyed.
Are you sure? (y/[n]) y
remove_cluster_repository: Couldn't get cluster repos lock.
remove_cluster_repository: Force continue.
After you issue the rmcluster command, the administrator can synchronize the cluster again.
Tip: After running the rmcluster command, verify that the caa_private0 disk has been
unconfigured and is not seen on other nodes. Run the lqueryvg -Atp command against
the repository disk to ensure that the volume group definition is removed from the disk. If
you encounter problems with the rmcluster command, see “Removal of the volume group
when the rmcluster command does not” on page 320 for information about how to
manually remove the volume group.
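A minimal verification after the cleanup, assuming that hdisk1 was the old repository disk as in this example:

# The disk should no longer contain any volume group metadata
lqueryvg -Atp hdisk1
# The volume group column in lspv should show None for this disk on every node
lspv | grep hdisk1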
10.4.2 Repository disk replacement
The information about replacing a repository disk is currently available only in the
/usr/es/sbin/cluster/README7.1.0.UPDATE file. The following steps are provided to assist
you with this task:
1. If necessary, add a new disk and ensure that it is recognized by AIX. The maximum size
required is 10 GB. The disk must be zoned and masked to all cluster nodes.
2. Identify the current repository disk. You can use any of the following commands to obtain
this information:
lspv | grep caa_private
cltopinfo
lscluster -d
3. Stop cluster services on all nodes. Either bring resource groups offline or place them in an
unmanaged state.
4. Remove the CAA cluster by using the following command:
rmcluster -fn clustername
5. Verify that the AIX cluster is removed by running the following command in each node:
lscluster -m
6. If the CAA cluster is still present, run the following command in each node:
clusterconf -fu
7. Verify that the cluster repository is removed by using the lspv command. The repository
disk (see step 2) must not belong to any volume group.
8. Define a new repository disk by following the path: smitty sysmirror → Cluster Nodes
and Networks → Initial Cluster Setup (Typical) → Define Repository Disk and
Cluster IP Address.
9. Verify and synchronize the PowerHA cluster:
#smitty cm_ver_and_sync
10.Verify that the AIX cluster is recreated by running the following command:
#lscluster -m
11.Verify that the repository disk has changed by running the following command:
lspv | grep caa_private
12.Start cluster services on all nodes:
smitty cl_start
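The same procedure is shown next as a condensed command sketch. The cluster name is the one used earlier in this chapter, the SMIT steps are left as menu references because no single equivalent command is documented here, and you must still stop and restart cluster services as described in steps 3 and 12:

# Steps 1 and 2: identify the current repository disk
lspv | grep caa_private
# Steps 4 through 6: remove the CAA cluster and verify that it is gone
rmcluster -fn de_cluster
lscluster -m
clusterconf -fu          # only if lscluster -m still shows the cluster
# Step 7: the old repository disk must not belong to any volume group
lspv
# Steps 8 and 9: define the new repository disk (smitty sysmirror menus),
# then verify and synchronize (smitty cm_ver_and_sync)
# Steps 10 and 11: verify the recreated cluster and the new repository disk
lscluster -m
lspv | grep caa_private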
10.4.3 CAA cluster after the node restarts
In some cases, the CAA cluster disappears after a system reboot or halt. If you encounter this
situation, try the following solutions:
Wait 10 minutes. If you have another node in your cluster, the clconfd daemon checks for
nodes that need to join or sync up. It wakes up every 10 minutes.
If the previous method does not work, run the clusterconf command manually. This
solution works only if the system is aware of the repository disk location. You can check it
by running the lsattr -El cluster0 command.
See if clvdisk contains the repository disk UUID. Otherwise, you see the clusterconf error
message as shown in Example 10-10.
Example 10-10 The clusterconf error message
riyad:/ # clusterconf -v
_find_and_load_repos(): No repository candidate found.
leave_sinc: Could not get cluster disk names from cache file
/etc/cluster/clrepos_cache: No such file or directory
leave_sinc: Could not find cluster disk names.
Manually define the repository disk by using the following command:
clusterconf -vr caa_private0
If you know that the repository disk is available, and you know that your node is listed in
the configuration on the repository disk, use the -s flag on the clusterconf command to
do a search for it. This utility examines all locally visible hard disk drives to find the
repository disk.
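These checks can be combined as follows; the device and attribute names are the ones shown elsewhere in this chapter, and the repository disk name is an example:

# Check whether the cluster0 pseudo-device still knows the repository disk UUID
lsattr -El cluster0
# If the clvdisk attribute is set, try to reconfigure from the repository disk
clusterconf
# If it is empty but you know the repository disk, point clusterconf at it
clusterconf -vr caa_private0
# As a last resort, let clusterconf search all locally visible disks
clusterconf -s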
10.4.4 Creation of the CAA cluster
You might encounter an error message about creating the CAA cluster when the clmigcheck
utility is run. You might also see such a message when trying to install PowerHA for the first
time or when creating a CAA cluster configuration. Depending on whether you are doing a
migration or a new configuration, you either see a problem in the clmigcheck.log file or on the
verification of your cluster.
One of the error messages that you see is “ERROR: Problems encountered creating the
cluster in AIX.” This message indicates a problem with creating the CAA cluster. The
clmigcheck program calls the mkcluster command to create the CAA cluster, which is what
you must look for in the logs.
To proceed with the troubleshooting, enable the syslog debugging as discussed in 10.2.1,
“The clmigcheck script” on page 308.
Incorrect entries in the /etc/filesystems file
When the CAA cluster is created, the cluster creates a caavg_private volume group and the
associated file systems for CAA. This information is kept in the /var/adm/ras/syslog.caa log
file. Any problems that you face when running the mkcluster command are also logged in the
/var/hacmp/clutils.log file.
If you encounter a problem when creating your cluster, check these log files to ensure that the
volume group and file systems are created without any errors.
Figure 10-9 shows the contents of caavg_private volume group.
# lsvg -l caavg_private
caavg_private:
LV NAME          TYPE   LPs   PPs   PVs   LV STATE       MOUNT POINT
caalv_private1   boot   1     1     1     closed/syncd   N/A
caalv_private2   boot   1     1     1     closed/syncd   N/A
caalv_private3   boot   4     4     1     open/syncd     N/A
fslv00           jfs2   4     4     1     open/syncd     /clrepos_private1
fslv01           jfs2   4     4     1     closed/syncd   /clrepos_private2
powerha_crlv     boot   1     1     1     closed/syncd   N/A

Figure 10-9 Contents of the caavg_private volume group
Figure 10-10 shows a crfs failure while creating the CAA cluster. This problem was corrected
by removing incorrect entries from the /etc/filesystems file. Similar problems can occur
when, for example, a logical volume name that the CAA cluster must use already exists.
Sep 29 15:50:49 riyad user:info cluster[9437258]:
stdout: caalv_private3
Sep 29 15:50:49 riyad user:info cluster[9437258]:
stderr:
Sep 29 15:50:49 riyad user:info cluster[9437258]: cl_run_log_method:
'/usr/lib/cluster/clreposfs ' returned 1
Sep 29 15:50:49 riyad user:info cluster[9437258]:
stdout:
Sep 29 15:50:49 riyad user:info cluster[9437258]:
stderr: crfs:
/clrepos_private1 file system already exists '/usr/sbin/crfs -v jfs2 -m
/clrepos_private1 -g caavg_private -a options=dio -a logname=INLINE -a
size=256M' failed with rc=1
Sep 29 15:50:49 riyad user:err|error cluster[9437258]: cluster_repository_init:
create_clv failed
Sep 29 15:50:49 riyad user:info cluster[9437258]: cl_run_log_method:
'/usr/sbin/varyonvg -b -u caavg_private' returned 0
Figure 10-10 The syslog.caa entries after a failure during CAA creation
Tip: When you look at the syslog.caa file, focus on the AIX commands (such as mkvg, mklv,
and crfs) and their returned values. If you find non-zero return values, a problem exists.
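A hedged example of such a scan, assuming that syslog output has been directed to /var/adm/ras/syslog.caa as described earlier:

# Show the AIX commands that were run during CAA creation and their return codes;
# any non-zero 'returned' value points at the failing step
grep -E "mkvg|mklv|crfs|returned" /var/adm/ras/syslog.caa | tail -30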
10.4.5 Volume group name already in use
A volume group that is already in use can cause the error message discussed in 10.4.4,
“Creation of the CAA cluster” on page 318. When you encounter the error message, enable
syslog debugging. The /tmp/syslog.out log file has the entries shown in Figure 10-11.
Sep 23 11:46:09 chile user:info cluster[21037156]: cl_run_log_method:
'/usr/sbin/mkvg -f -y caavg_private -s 64 caa_private0' returned 1
Sep 23 11:46:09 chile user:info cluster[21037156]:
stdout:
Sep 23 11:46:09 chile user:info cluster[21037156]:
stderr: 0516-360
/usr/sbin/mkvg: The device name is already used; choose a different name.
Sep 23 11:46:09 chile user:err|error cluster[21037156]:
cluster_repository_init: create_cvg failed
Figure 10-11 Extract from the syslog.out file
You can see that the volume group creation failed because the name is already in use. This
problem can happen for several reasons. For example, it can occur if the disk was previously
used as the CAA repository or if the disk contains the volume group descriptor area (VGDA)
information of another volume group.
Disk previously used by CAA volume group or third party
If the disk was previously used by CAA or AIX, you can recover from this situation by running
the following command:
rmcluster -r hdiskx
For the full sequence of steps, see 10.4.1, “Previously used repository disk for CAA” on
page 316.
If you find that the rmcluster command has not removed your CAA definition from the disk,
use the steps in the following section, “Removal of the volume group when the rmcluster
command does not.”
Removal of the volume group when the rmcluster command does not
In this situation, you must use the Logical Volume Manager (LVM) commands, which you can
do in one of two ways. The easiest method is to import the volume group, vary on the volume
group, and then reduce it so that the VGDA is removed from the disk. If this method does not
work, use the dd command to overwrite special areas of the disk.
Tip: Make sure that the data contained on the disk is not needed because usage of the
following steps destroys the volume group data on the disk.
Removing the VGDA from the disk
This method involves importing the volume group from the disk and reducing it from the
volume group to remove the VGDA information without losing the PVID. If you are able to
import the volume group, activate it by using the varyonvg command:
# varyonvg vgname
If the activation fails, run the exportvg command to remove the volume group definition from
the ODM. Then try to import it with a different name as follows:
# exportvg vgname
# importvg -y new-vgname hdiskx
If you cannot activate the imported volume group, use the reducevg command as shown in
Figure 10-12.
reducevg -df test_vg caa_private0
0516-1246 rmlv: If caalv_private1 is the boot logical volume, please run 'chpv
-c <diskname>'
as root user to clear the boot record and avoid a potential boot
off an old boot image that may reside on the disk from which this
logical volume is moved/removed.
rmlv: Logical volume caalv_private1 is removed.
0516-1246 rmlv: If caalv_private2 is the boot logical volume, please run 'chpv
-c <diskname>'
as root user to clear the boot record and avoid a potential boot
off an old boot image that may reside on the disk from which this
logical volume is moved/removed.
rmlv: Logical volume caalv_private2 is removed.
Figure 10-12 The reducevg command
After you complete the forced reduction, check whether the disk no longer contains a volume
group by using the lqueryvg -Atp hdisk command.
Also verify whether any previous volume group definition is still being displayed on the other
nodes of your cluster by using the lspv command. If the lspv output shows the PVID with one
associated volume group, you can fix it by running the exportvg vgname command.
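The whole recovery can be summarized in the following sketch, which uses the names from this example (test_vg and caa_private0); substitute your own volume group and disk names:

importvg -y test_vg caa_private0     # import the old volume group under a temporary name
varyonvg test_vg                     # activate it if possible
reducevg -df test_vg caa_private0    # force removal of the logical volumes and the VGDA
lqueryvg -Atp caa_private0           # should no longer return volume group data
lspv                                 # also check the other nodes; run exportvg if needed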
If you experience any problems with this procedure, try a force overwrite of the disk as
described in “Overwriting the disk.”
Overwriting the disk
This method involves writing data to the top of the disk to overwrite the VGDA information and
effectively cleaning the disk, leaving it ready for use by other volume groups.
Attention: Only attempt this method if the rmcluster and reducevg procedures fail and if
AIX still has access to the disk. You can check this access by running the lquerypv -h
/dev/hdisk command.
Enter the following command:
# dd if=/dev/zero of=/dev/hdiskx bs=4 count=1
This command zeros only the part of the disk that contains the repository offset. Therefore,
you do not lose the PVID information.
In some cases, this procedure is not sufficient to resolve the problem. If you need to
completely overwrite the disk, run the following procedure:
Attention: This procedure overwrites the entire disk structure, including the PVID. If the
PVID must be changed during migration, follow the steps exactly as shown.
# dd if=/dev/zero of=/dev/hdiskn bs=512 count=9
# chdev -l hdiskn -a pv=yes
# rmdev -dl hdiskn
# cfgmgr
On any other node in the cluster, you must also update the disk:
# rmdev -dl hdiskn
# cfgmgr
Run the lspv command to check that the PVID is the same on both nodes. To ensure that you
have the real PVID, query the disk as follows:
# lquerypv -h /dev/hdiskn
Look for the PVID, which is in sector 80 as shown in Figure 10-13.
chile:/ # lquerypv -h /dev/hdisk3
00000000   C9C2D4C1 00000000 00000000 00000000  |................|
00000010   00000000 00000000 00000000 00000000  |................|
00000020   00000000 00000000 00000000 00000000  |................|
00000030   00000000 00000000 00000000 00000000  |................|
00000040   00000000 00000000 00000000 00000000  |................|
00000050   00000000 00000000 00000000 00000000  |................|
00000060   00000000 00000000 00000000 00000000  |................|
00000070   00000000 00000000 00000000 00000000  |................|
00000080   000FE401 68921CEA 00000000 00000000  |....h...........|

Figure 10-13 PVID from the lquerypv command
The PVID should match the lspv output as shown in Figure 10-14.
chile:/ # lspv
hdisk1   000fe4114cf8d1ce   None
hdisk2   000fe40163c54011   None
hdisk3   000fe40168921cea   None
hdisk4   000fe4114cf8d3a1   None
hdisk5   000fe4114cf8d441   None
hdisk6   000fe4114cf8d4d5   None
hdisk7   000fe4114cf8d579   None
hdisk8   000fe4114cf8d608   ny_datavg
hdisk0   000fe40140a5516a   rootvg     active

Figure 10-14 The lspv output showing PVID
10.4.6 Changed PVID of the repository disk
Your repository disk PVID might have changed because of a dd on the whole disk or a change
in the logical unit number (LUN). If this change happened and you must complete the
migration, follow the guidance in this section to change it.
If you are in a migration that has not yet been completed, change the PVID section in the
/var/clmigcheck/clmigcheck.txt file (Figure 10-15). You must change this file on every node
in your cluster.
CLUSTER_TYPE:STANDARD
CLUSTER_REPOSITORY_DISK:000fe40120e16405
CLUSTER_MULTICAST:NULL
Figure 10-15 Changing the PVID in the clmigcheck.txt file
If this is post migration and PowerHA is installed, you must also modify the HACMPsircol ODM
class (Figure 10-16) on all nodes in the cluster.
HACMPsircol:
name = "newyork_sircol"
id = 0
uuid = "0"
repository = "000fe4114cf8d258"
ip_address = ""
nodelist = "serbia,scotland,chile,"
backup_repository1 = ""
backup_repository2 = ""
Figure 10-16 The HACMPsircol ODM class
To modify the HACMPsircol ODM class, enter the following commands:
# odmget HACMPsircol > HACMPsircol.add
# vi HACMPsircol.add
In the editor, change the repository = "000fe4114cf8d258" line to your new PVID and save
the file. Then replace the stanza in the ODM:
# odmdelete -o HACMPsircol
# odmadd HACMPsircol.add
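After the odmadd, a quick check on every node confirms that the new PVID is in place:

# The repository field must now show the new PVID
odmget HACMPsircol | grep repository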
10.4.7 The ‘Cluster services are not active’ message
After migration of PowerHA, if you notice that CAA cluster services are not running, you see
the “Cluster services are not active” message when you run the lscluster command.
You also notice that the CAA repository disk is not varied on.
You might be able to recover by recreating the CAA cluster from the last CAA configuration
(HACMPsircol class in ODM) as explained in the following steps:
1. Clear the CAA repository disk as explained in “Previously used repository disk for CAA” on
page 316.
2. Perform a synchronization or verification of the cluster. Upon synchronizing the cluster, the
mkcluster command is run to recreate the CAA cluster. However, if the problem still
persists, contact IBM support.
Chapter 11. Installing IBM Systems Director and the PowerHA SystemMirror plug-in
This chapter explains how to install IBM Systems Director Version 6.2. It also explains how to
install the PowerHA SystemMirror plug-in for the IBM Systems Director, and the necessary
agents on the client machines to be managed by Systems Director. For detailed planning,
prerequisites, and instructions, see Implementing IBM Systems Director 6.1, SG24-7694.
This chapter includes the following topics:
Installing IBM Systems Director Version 6.2
Installing the SystemMirror plug-in
Installing the clients
11.1 Installing IBM Systems Director Version 6.2
Before you configure the cluster using the SystemMirror plug-in, you must install and
configure IBM Systems Director. You can install the IBM Systems Director Server on the AIX,
Linux, or Windows operating systems. For quick reference, this section provides the installation
steps for AIX. For details about installation on other operating systems, see the following
topics in the IBM Systems Director Information Center:
The “IBM Systems Director V6.2.x” topic for general information
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.main.helps.doc/fqm0_main.html
“Installing IBM Systems Director on the management server” topic for installation
information
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.install.helps.doc/fqm0_t_installing.html
The following section, “Hardware requirements”, explains the installation requirements of IBM
Systems Director v6.2 on AIX.
11.1.1 Hardware requirements
See the “Hardware requirements for running IBM Systems Director Server” topic in the IBM
Systems Director Information Center for details about the recommended hardware
requirements for installing IBM Systems Director:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/topic/com.ibm.director.pla
n.helps.doc/fqm0_r_hardware_requirements_for_running_ibm_systems_director_server.h
tml
Table 11-1 lists the hardware requirements for IBM Systems Director Server running on AIX
for a small configuration that has less than 500 managed systems.
Table 11-1 Hardware requirements for IBM Systems Director Server on AIX

Resource                   Requirement
CPU                        Two processors, IBM POWER5, POWER6, or POWER7®; or, for
                           partitioned systems:
                             Entitlement = 1
                             Uncapped Virtual processors = 4
                             Weight = Default
Memory                     3 GB
Disk storage               4 GB
File system requirement    root = 1.2 GB
(during installation)      /tmp = 2 GB
                           /opt = 4 GB
More information: The disk storage required for running IBM Systems Director Server is
allocated in the /opt file system. Therefore, a total of 4 GB is required for the /opt file
system both while installing IBM Systems Director and at run time.
For more details about hardware requirements, see the “Recommended hardware
requirements for IBM Systems Director Server running on AIX” topic in the IBM Systems
Director Information Center at:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.plan.helps.doc/fqm0_r_hardware_requirements_servers_running_aix.html
11.1.2 Installing IBM Systems Director on AIX
For the prerequisites and complete steps for installing IBM Systems Director, see the
following topics in the IBM Systems Director Information Center:
“Preparing to install IBM Systems Director Server on AIX”
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/
com.ibm.director.install.helps.doc/fqm0_t_preparing_to_install_ibm_director_on_
aix.html
“Installing IBM Systems Director Server on AIX,” which provides the complete installation
steps
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.install.helps.doc/fqm0_t_installing_ibm_director_server_on_aix.html
The following steps summarize the process for installing IBM Systems Director on AIX:
1. Increase the file size limit:
ulimit -f 4194302 (or to unlimited)
2. Increase the number of file descriptors:
ulimit -n 4000
3. Verify the file system (/, /tmp and /opt) size as mentioned in Table 11-1 on page 326:
df -g / /tmp /opt
4. Download IBM Systems Director from the IBM Systems Director Downloads page at:
http://www.ibm.com/systems/management/director/downloads/
5. Extract the content:
gzip -cd <package_name> | tar -xvf -
where <package_name> is the file name of the downloaded package.
6. Install the content by using the script in the extracted package:
./dirinstall.server
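The same sequence is shown below as a one-pass sketch; the package file name and the extracted directory layout are assumptions and differ by release, so adjust them to match your download:

# Raise the limits needed by the installer
ulimit -f unlimited
ulimit -n 4000
# Confirm the file system sizes against Table 11-1
df -g / /tmp /opt
# Extract the downloaded package (file name is an example)
gzip -cd SysDir6.2_Server_AIX.tar.gz | tar -xvf -
# Run the installer script from the extracted package
./dirinstall.server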
11.1.3 Configuring and activating IBM Systems Director
To configure and activate IBM Systems Director, follow these steps:
1. Configure IBM Systems Director by using the following script:
/opt/ibm/director/bin/configAgtMgr.sh
Agent password: The script prompts for an agent password, for which you can use the
host system root password or any other common password of your choice. This password
is used by IBM Systems Director for its internal communication and does not have any
external impact.
2. Start IBM Systems Director:
/opt/ibm/director/bin/smstart
3. Monitor the activation process as shown in Figure 11-1. This process might take 2-3
minutes.
/opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active
Figure 11-1 Activation status for IBM Systems Director
Some subsystems are added as part of the installation process as follows:

Subsystem         Group   PID       Status
platform_agent            2752614   active
cimsys                    3080288   active

Some processes start automatically:

root 6553804 7995522  0 13:24:40 pts/0  0:00 /opt/ibm/director/jre/bin/java -Xverify:none -cp /opt/ibm/director/lwi/r
root 7340264       1  0 13:19:26 pts/2  3:14 /opt/ibm/director/jre/bin/java -Xms512m -Xmx2048m -Xdump:system:events=g
root 7471292 2949286  0 12:00:31     -  0:00 /opt/freeware/cimom/pegasus/bin/cimssys platform_agent
root 7536744       1  0 12:00:31     -  0:00 /opt/ibm/icc/cimom/bin/dirsnmpd
root 8061058 3604568  0 13:16:32     -  0:14 /var/opt/tivoli/ep/_jvm/jre/bin/java -Xmx384m -Xminf0.01 -Xmaxf0.4 -Dsun
4. Log in to IBM Systems Director by using the following address:
https://<hostname.domain.com or IP>:8422/ibm/console/logon.jsp
In this example, we use the following address:
https://indus74.in.ibm.com:8422/ibm/console/logon.jsp
5. On the welcome page (Figure 12-4 on page 335) that opens, log in using root credentials.
After completing the installation of IBM Systems Director, install the SystemMirror plug-in as
explained in the following section.
11.2 Installing the SystemMirror plug-in
The IBM Systems Director provides two sets of plug-ins:
The SystemMirror server plug-in to be installed in the IBM Systems Director Server.
The SystemMirror agent plug-in to be installed in the cluster nodes or the endpoints as
discovered by IBM Systems Director.
11.2.1 Installing the SystemMirror server plug-in
You must install the SystemMirror server plug-in in the IBM Systems Director Server.
Table 11-2 on page 329 outlines the installation steps for the SystemMirror server plug-in
depending on your operating system. You can find this table and more information about the
installation in the SystemMirror installation steps chapter in “Configuring AIX Clusters for High
Availability Using PowerHA SystemMirror for Systems Director,” which you can download
from:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774
Table 11-2 Installation steps for the SystemMirror server plug-in

Operating system   Installation steps
AIX and Linux      Graphical installation:
                     # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
                     # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
                   Textual installation:
                     # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
                     # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin -i console
                   Silent installation:
                     Edit the installer.properties file.
                     # chmod 700 IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin
                     # IBMSystemsDirector-PowerHA_SystemMirror-AIX.bin -i silent
Windows            Graphical installation:
                     IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe
                   Textual installation:
                     IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe -i console
                   Silent installation:
                     First, edit the installer.properties file.
                     IBMSystemsDirector-PowerHA_SystemMirror-Windows.exe -i silent
export DISPLAY: To use the graphical installation, direct the display to an X Window System
server by setting export DISPLAY=<IP address of the X Window System server>:1.
Verifying the installation of the SystemMirror plug-in
The interface plug-in of the subagent is loaded when the IBM Systems Director Server starts.
To check the installation, run the following command, depending on your environment:
AIX / Linux:
/opt/ibm/director/lwi/bin/lwiplugin.sh -status | grep mirror
Windows:
C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat
Figure 11-2 shows the output of the plug-in status.
94:RESOLVED:com.ibm.director.power.ha.systemmirror.branding:7.1.0.1:com.ibm.director.power.ha.systemmirr
or.branding
95:ACTIVE:com.ibm.director.power.ha.systemmirror.common:7.1.0.1:com.ibm.director.power.ha.systemmirror.c
ommon
96:ACTIVE:com.ibm.director.power.ha.systemmirror.console:7.1.0.1:com.ibm.director.power.ha.systemmirror.
console
97:RESOLVED:com.ibm.director.power.ha.systemmirror.helps.doc:7.1.0.1:com.ibm.director.power.ha.systemmir
ror.helps.doc
98:INSTALLED:com.ibm.director.power.ha.systemmirror.server.fragment:7.1.0.0:com.ibm.director.power.ha.sy
stemmirror.server.fragment
99:ACTIVE:com.ibm.director.power.ha.systemmirror.server:7.1.0.1:com.ibm.director.power.ha.systemmirror.s
erver
Figure 11-2 Output of the plug-in status command
If the subagent interface plug-in shows the RESOLVED status instead of the ACTIVE status,
attempt to start the subagent. Enter the following command by using the lwiplugin.sh script
on AIX and Linux or the lwiplugin.bat script on Windows, and pass the plug-in number
(94 in this example):
AIX and Linux
/opt/ibm/director/agent/bin/lwiplugin.sh -start 94
Windows
C:/Program Files/IBM/Director/lwi/bin/lwiplugin.bat -start 94
If Systems Director was active during installation of the plug-in, you must stop it and restart it
as follows:
1. Stop the IBM Systems Director Server:
# /opt/ibm/director/bin/smstop
2. Start the IBM Systems Director Server:
# /opt/ibm/director/bin/smstart
3. Monitor the startup process:
# /opt/ibm/director/bin/smstatus -r
Inactive
Starting
Active *** (the "Active" status can take a long time)
11.2.2 Installing the SystemMirror agent plug-in in the cluster nodes
Install the cluster.es.director.agent file set by using SMIT. This file set is provided with the
base PowerHA SystemMirror installable images.
More information: See the SystemMirror agent installation section in Configuring AIX
Clusters for High Availability Using PowerHA SystemMirror for Systems Director paper at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101774
See also PowerHA SystemMirror for IBM Systems Director, SC23-6763.
11.3 Installing the clients
You must perform the steps in the following sections in each node that is going to be
managed by the PowerHA SystemMirror plug-in for IBM Systems Director. This topic includes
the following sections:
Installing the common agent
Installing the PowerHA SystemMirror agent
11.3.1 Installing the common agent
Perform these steps on each node that is going to be managed by the IBM Systems Director
Server:
1. Extract the SysDir6_2_Common_Agent_AIX.jar file set:
# /usr/java5/bin/jar -xvf SysDir6_2_Common_Agent_AIX.jar
2. Give execution permission to the repository/dir6.2_common_agent_aix.sh file:
# chmod +x repository/dir6.2_common_agent_aix.sh
3. Execute the repository/dir6.2_common_agent_aix.sh file:
# ./repository/dir6.2_common_agent_aix.sh
Some subsystems are added as part of the installation process:
platform_agent   3211374   active
cimsys           2621604   active
Some processes start automatically:

root 421934      1  0 15:55:30  -  0:00 /opt/ibm/icc/cimom/bin/dirsnmpd
root 442376      1  0 15:55:40  -  0:00 /usr/bin/cimlistener
root 458910      1  0 15:55:31  -  0:00 /opt/freeware/cimom/pegasus/bin/CIM_diagd
root 516216 204950  0 15:55:29  -  0:00 /opt/freeware/cimom/pegasus/bin/cimssys platform_agent
root 524366      1  0 15:55:29  -  0:00 ./slp_srvreg -D
root 581780      1  0 15:55:37  -  0:04 [cimserve]
root 626740 204950  0 15:55:29  -  0:00 /opt/freeware/cimom/pegasus/bin/cimssys cimsys
root 630862      1  0 15:55:29  -  0:00 /opt/ibm/director/cimom/bin/tier1slp
11.3.2 Installing the PowerHA SystemMirror agent
To install the PowerHA SystemMirror agent on the nodes, follow these steps:
1. Install the cluster.es.director.agent.rte file set:
# smitty install_latest
2. Stop the common agent:
# stopsrc -s platform_agent
# stopsrc -s cimsys
3. Start the common agent:
# startsrc -s platform_agent
Tip: The cimsys subsystem starts along with the platform_agent subsystem.
Chapter 12. Creating and managing a cluster using IBM Systems Director
The SystemMirror plug-in for IBM Systems Director is used to configure and manage the
PowerHA cluster. The plug-in provides a graphical interface and a command-line interface
(CLI) for cluster configuration. It includes wizards to help you create and manage the cluster
and the resource groups. The plug-in also provides seamless integration with Smart Assists
and support for third-party applications.
This chapter explains how to create and manage the PowerHA SystemMirror cluster with IBM
Systems Director.
This chapter includes the following topics:
Creating a cluster with the SystemMirror plug-in wizard
Creating a cluster with the SystemMirror plug-in CLI
Performing cluster management
Performing cluster management with the SystemMirror plug-in CLI
Creating a resource group with the SystemMirror plug-in GUI wizard
Resource group management using the SystemMirror plug-in wizard
Managing a resource group with the SystemMirror plug-in CLI
Verifying and synchronizing a configuration with the GUI
Verifying and synchronizing with the CLI
Performing cluster monitoring with the SystemMirror plug-in
12.1 Creating a cluster
You can create a cluster by using the wizard for the SystemMirror plug-in or by using the CLI
commands for the SystemMirror plug-in. This topic explains how to use both methods.
12.1.1 Creating a cluster with the SystemMirror plug-in wizard
To create the cluster by using the GUI wizard of the SystemMirror plug-in, follow these steps.
1. Go to your IBM Systems Director server.
2. On the login page (Figure 12-1), log in to IBM Systems Director with your user ID and
password.
Figure 12-1 Systems Director login console
3. In the IBM Systems Director console, in the left navigation pane, expand Availability and
select PowerHA SystemMirror (Figure 12-2).
Figure 12-2 Selecting the PowerHA SystemMirror link in IBM Systems Director
4. In the right pane, under Cluster Management, click Create Cluster (Figure 12-3).
Figure 12-3 The Create Cluster link under Cluster Management
5. Starting with the Create Cluster Wizard, follow the wizard panes to create the cluster.
a. In the Welcome pane (Figure 12-4), click Next.
Figure 12-4 Create Cluster Wizard
b. In the Name the cluster pane (Figure 12-5), in the Cluster name field, provide a name
for the cluster. Click Next.
Figure 12-5 Entering the cluster name
c. In the Choose nodes pane (Figure 12-6), select the host names of the nodes.
Figure 12-6 Selecting the cluster nodes
Common storage: The cluster nodes must have the common storage for the
repository disk. To verify the common storage, in the Choose nodes window, click
the Common storage button. The Common storage window (Figure 12-7) opens
showing the common disks.
Figure 12-7 Verifying common storage availability for the repository disk
d. In the Configure nodes pane (Figure 12-8), set the controlling node. The controlling
node in the cluster is considered to be the primary or home node. Click Next.
Figure 12-8 Setting the controlling node
e. In the Choose repositories pane (Figure 12-9), choose the storage disk that is shared
among all nodes in the cluster to use as the common storage repository. Click Next.
Figure 12-9 Selecting the repository disk
f. In the Configure security pane (Figure 12-10), specify the security details to secure
communication within the cluster.
Figure 12-10 Configuring the cluster security configuration
g. In the Summary pane (Figure 12-11), verify the configuration details.
Figure 12-11 Summary pane
6. Verify the cluster creation in the AIX cluster nodes by using either of the following
commands:
– The CAA command:
/usr/sbin/lscluster -m
– The PowerHA command:
/usr/es/sbin/cluster/utilities/cltopinfo
12.1.2 Creating a cluster with the SystemMirror plug-in CLI
IBM Systems Director provides a CLI to monitor and manage the system. This section
explains how to create a cluster by using the SystemMirror plug-in CLI.
Overview of the CLI
The CLI is executed by using a general-purpose smcli command. To list the available CLI
commands for managing the cluster, run the smcli lsbundle command as shown in
Figure 12-12.
# smcli lsbundle | grep sysmirror
sysmirror/help
sysmirror/lsac
sysmirror/lsam
sysmirror/lsappctl
sysmirror/lsappmon
sysmirror/lscl
sysmirror/lscluster
sysmirror/lsdependency
sysmirror/lsdp
sysmirror/lsfc
sysmirror/lsfilecollection
sysmirror/lsif
sysmirror/lsinterface
sysmirror/lslg
sysmirror/lslog
sysmirror/lsmd
sysmirror/lsmethod
.....
.....
Figure 12-12 CLI commands specific to SystemMirror
You can retrieve help information for the commands (Figure 12-12) as shown in Figure 12-13.
# smcli lscluster --help
smcli sysmirror/lscluster {-h|-?|--help} \
[-v|--verbose]
smcli sysmirror/lscluster [-v|--verbose] \
[<CLUSTER>[,<CLUSTER#2>,...]]
Command Alias: lscl
Figure 12-13 CLI help option
Creating a cluster with the CLI
Before you create a cluster, ensure that you have all the required details to create the cluster:
Cluster nodes
Persistent IP (if any)
Repository disk
Controlling node
Security options (if any)
To verify the availability of the mkcluster command, you can use the smcli lsbundle
command in IBM Systems Director as shown in Figure 12-12.
To create a cluster, issue the smcli mkcluster command from the IBM Systems Director
Server as shown in Example 12-1.
Example 12-1 Creating a cluster with the smcli mkcluster CLI command
smcli mkcluster -i 224.0.0.0 \
      -r hdisk3 \
      -n nodeA.xy.ibm.com,nodeB.xy.ibm.com \
      DB2_Cluster
You can use the -h option to list the available options for the command (Figure 12-14).
# smcli mkcluster -h
smcli sysmirror/mkcluster {-h|-?|--help} [-v|--verbose]
smcli sysmirror/mkcluster [{-i|--cluster_ip} <multicast_address>] \
[{-S|--fc_sync_interval} <##>] \
[{-s|--rg_settling_time} <##>] \
[{-e|--max_event_time} <##>] \
[{-R|--max_rg_processing_time} <##>] \
[{-c|--controlling_node} <node>] \
[{-d|--shared_disks} <DISK>[,<DISK#2>,...] ] \
{-r|--repository} <disk> \
{-n|--nodes} <NODE>[, <NODE#2>,...] \
[<cluster_name>]
Figure 12-14 The mkcluster -h command to list the available commands
To verify that the cluster has been created, you can use the smcli lscluster command.
Command help: For assistance with using the commands, you can use either of the
following help options:
smcli <command name> --help --verbose
smcli <command name> -h -v
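To confirm the result of Example 12-1 from the IBM Systems Director Server, a check such as the following can be used; the cluster name is the one from the example:

# List all clusters known to the SystemMirror plug-in
smcli lscluster
# Show the details of the new cluster
smcli lscluster -v DB2_Cluster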
12.2 Performing cluster management
You can perform cluster management by using the GUI wizard for the SystemMirror plug-in or
by using the CLI commands for the SystemMirror plug-in. This topic explains how to use both
methods.
12.2.1 Performing cluster management with the SystemMirror plug-in GUI
wizard
IBM Systems Director provides GUI wizards to manage the network, storage, and snapshots
of a cluster. IBM Systems Director also provides functionalities to add nodes, view cluster
services status changes, review reports, and verify and synchronize operations. The following
sections guide you through these functionalities.
Accessing the Cluster Management Wizard
To access the Cluster Management Wizard, follow these steps:
1. In the IBM Systems Director console, expand Availability and select PowerHA
SystemMirror (Figure 12-3 on page 335).
2. In the right pane, under Cluster Management, click the Manage Clusters link
(Figure 12-15).
Figure 12-15 Manage cluster
Cluster management functionality
This section describes the cluster management functionality:
Cluster Management window (Figure 12-16)
After clicking the Manage Clusters link in the IBM Systems Director console, you see the
Cluster Management pane. This pane contains a series of tabs to help you manage your
cluster.
Figure 12-16 Cluster Management pane
Edit Advanced Properties button
Under the General tab, you can click the Edit Advanced Properties button to modify the
cluster properties. For example, you can change the controlling node as shown in
Figure 12-17.
Figure 12-17 Editing the advanced properties, such as the controlling node
Add Network tab
Under the Networks tab, you can click the Add Network button to add a network as
shown in Figure 12-18.
Figure 12-18 Add Network function
Storage management
On the Storage tab, you can perform disk management tasks such as converting the
hdisk into VPATH. From the View drop-down list, select Disks to modify the disk properties
as shown in Figure 12-19.
Figure 12-19 Cluster storage management
Capture Snapshot
You can capture and manage snapshots through the Snapshots tab. To capture a new
snapshot, click the Create button on the Snapshots tab as shown in Figure 12-20.
Figure 12-20 Capture Snapshot function
File collection and logs management
You can manage file collection and logs on the Additional Properties tab. From the View
drop-down list, select either File Collections or Log files as shown in Figure 12-21.
Figure 12-21 Additional Properties tab: File Collections and Log files options
Creating a file collection
On the Additional Properties tab, when you select File Collections from the View
drop-down list and click the Create button, you can create a file collection as shown in
Figure 12-22.
Figure 12-22 Creating a file collection
Collect log files button
On the Additional Properties tab, when you select Log files from the View drop-down list
and click the Collect log files button, you can collect log files as shown in Figure 12-23.
Figure 12-23 Collect log files
The Systems Director plug-in also provides a CLI to manage the cluster. The following section
explains the available CLI commands and how you can find help for each of these commands.
12.2.2 Performing cluster management with the SystemMirror plug-in CLI
The SystemMirror plug-in provides a CLI for most of the cluster management functions. For a
list of the available functions, use the following command:
smcli lsbundle | grep sysmirror
A few of the CLI commands are provided as follows for a quick reference:
Snapshot creation
You can use the smcli mksnapshot command to create a snapshot. Figure 12-24 on
page 348 shows the command for obtaining detailed help about this command.
mkss: mkss is the alias for the mksnapshot command.
# smcli mkss -h -v
smcli sysmirror/mksnapshot [-h|-?|--help] [-v|--verbose]
smcli sysmirror/mksnapshot {-c|--cluster} <CLUSTER> \
{-d|--description} "<DESCRIPTION>" \
[{-M|--methods} <METHOD>[,<METHOD#2>,...] ] \
[-s|--save_logs] \
<snapshot_name>
Figure 12-24 Help details for the mksnapshot command
Example 12-2 shows usage of the smcli mkss command.
Example 12-2 Usage of the mksnapshot command
smcli mkss -c selma04_cluster -d "Selma04 cluster snapshot taken on Sept2010"
selma04_sep10_ss
Verify the snapshot by using the smcli lsss command as shown in Example 12-3.
Example 12-3 Verifying the snapshot
# smcli lsss -c selma04_cluster selma04_sep10_ss
NAME="selma04_sep10_ss"
DESCRIPTION="Selma04 cluster snapshot taken on Sept2010"
METHODS=""
SAVE_LOGS="false"
CAPTURE_DATE="Sep 29 09:47"
NODE="selma03"
File collection
You can use the smcli mkfilecollection command to create a file collection as shown in
Example 12-4. A file collection helps to keep the files and directories synchronized on all
nodes in the cluster.
Example 12-4 File collection
# smcli mkfilecollection -c selma04_cluster -C -d "File Collection for the
selma04 cluster" -F /home selma04_file_collection
# smcli lsfilecollection -c selma04_cluster selma04_file_collection
NAME="selma04_file_collection"
DESCRIPTION="File Collection for the selma04 cluster"
FILE="/home"
SIZE="256"
Log files
You can use the smcli lslog command (Example 12-5) to list the available log files in the
cluster. Then you can use the smcli vlog command to view the log files.
Example 12-5 Log file management
# smcli lslog -c selma04_cluster
Node: selma03
=============
autoverify.log
cl2siteconfig_assist.log
cl_testtool.log
clavan.log
clcomd.log
clcomddiag.log
....
....(output truncated)
# smcli vlog -c selma04_cluster -n selma03 -T 4 clverify.log
Collector succeeded on node selma03 (31610 bytes)
Collector succeeded on node selma03 (4250 bytes)
Collector succeeded on node selma03 (26 bytes)
Modification functionality: At the time of writing this IBM Redbooks publication, an edit or
modification CLI command (for example, to change the controlling node) is not available in
the initial release. Therefore, use the GUI wizards for modification tasks.
12.3 Creating a resource group with the SystemMirror plug-in
GUI wizard
You can configure the resource group by using the Resource Group Wizard as follows:
1. Log in to IBM Systems Director.
2. In the left navigation area, expand Availability and select PowerHA SystemMirror
(Figure 12-25).
3. In the right pane, under Resource Group Management, click the Add a resource group link.
Figure 12-25 Resource group management
4. On the Clusters tab, click the Actions list and select Add Resource Group
(Figure 12-26). Then select the cluster node, and click the Action button.
Alternative: You can select the resource group configuration wizard by selecting the
cluster nodes, as shown in Figure 12-26.
Figure 12-26 Adding a resource group
5. In the Choose a cluster pane (Figure 12-27), choose the cluster where the resource group
is to be created. Notice that this step is highlighted under Welcome in the left pane.
Figure 12-27 Choose the cluster for the resource group configuration
You can now choose to create either a custom resource group or a predefined resource group
as explained in 12.3.1, “Creating a custom resource group” on page 351, and 12.3.2,
“Creating a predefined resource group” on page 353.
12.3.1 Creating a custom resource group
To create a custom resource group, follow these steps:
1. In the Add a resource group pane (Figure 12-28), select the Create a custom resource
group option, enter a resource group name, and click Next.
Figure 12-28 Adding a resource group
2. In the Choose nodes pane (Figure 12-29), select the nodes for which you want to
configure the resource group.
Figure 12-29 Selecting the nodes for configuring a resource group
3. In the Choose policies and attributes pane (Figure 12-30), select the policies to add to the
resource group.
Figure 12-30 Selecting the policies and attributes
4. In the Choose resources pane (Figure 12-31), select the shared resources to define for
the resource group.
Figure 12-31 Selecting the shared resources
5. In the Summary pane (Figure 12-32), review the settings and click the Finish button to
create the resource group.
Figure 12-32 Summary pane of the Resource Creation wizard
12.3.2 Creating a predefined resource group
For a set of applications, such as SAP, IBM WebSphere®, DB2, HTTP Server, and Tivoli
Directory Server, the SystemMirror plug-in facilitates the process of creating predefined
resource groups.
To configure the predefined resource groups, follow these steps:
1. In the Add a resource group pane (Figure 12-33 on page 354), select the Create
predefined resource groups for one of the following discovered applications radio
button. Then select the application for which the resource group is to be configured.
Application list: Only the applications installed in the cluster nodes are displayed
under the predefined resource group list.
Figure 12-33 Predefined resource group configuration
2. In the Choose components pane, for the predefined resource group, select the
components of the application to create the resource group. In the example shown in
Figure 12-34, the Tivoli Directory Server component is selected. Each component already
has predefined properties, such as the primary node and takeover node.
Modify the properties per your configuration and requirements. Then create the resource
group.
Figure 12-34 Application components
12.3.3 Verifying the creation of a resource group
To verify the creation of a resource group, follow these steps:
1. In the right pane, under Cluster Management, click the Manage Clusters link
(Figure 12-15 on page 342).
2. Click the Resource Groups tab (Figure 12-35).
Figure 12-35 Resource Groups tab
3. Enter the following base SystemMirror command to verify that the resource group has
been created:
/usr/es/sbin/cluster/utilities/clshowres
12.4 Managing a resource group
You can manage a resource group by using the SystemMirror plug-in wizard or the
SystemMirror plug-in CLI commands. This topic explains how to use both methods.
12.4.1 Resource group management using the SystemMirror plug-in wizard
The SystemMirror plug-in wizard has simplified resource group management with the addition
of the following functionalities:
Checking the status of a resource group
Moving a resource group across nodes
Creating dependencies
Accessing the resource group management wizard
To access the Resource Group Management wizard, follow these steps:
1. Log in to IBM Systems Director.
2. In the left pane, expand Availability and select PowerHA SystemMirror (Figure 12-36 on
page 356).
3. In the right pane, under Resource Group Management, select Manage Resource Groups
(Figure 12-36).
Figure 12-36 Resource group management link
The Resource Group Management wizard opens as shown in Figure 12-37. Alternatively, you can
access the Resource Group Management wizard by selecting Manage Clusters under Cluster
Management (Figure 12-36).
To access the Cluster and Resource Group Management wizard, select the Resource
Groups tab as shown in Figure 12-37.
Figure 12-37 Resource Group Management tab
Resource group management functionality
The Resource Group Management wizard includes the following functions:
Create Dependency function
a. Select the Clusters button to see the resource groups defined under the cluster.
b. Click the Action list and select Create Dependency (as shown in Figure 12-38).
Alternatively, right-click a cluster name and select Create Dependency.
Figure 12-38 Selecting the Create Dependency function
c. In the Parent-child window (Figure 12-39), select the dependency type to configure the
dependencies.
Figure 12-39 Parent-child window
Resource group removal
Right-click the selected resource group, and click Remove to remove the resource group
as shown in Figure 12-40.
Figure 12-40 Cluster and Resource Group Management pane
Application Availability and Configuration reports
The Application Availability and Configuration reports show the configuration details of the
resource group. The output of these reports is similar to the output produced by the
clshowres command in the base PowerHA installation. You can also see the status of the
application. To access these reports, right-click a resource group name, select Reports
and then select Application Availability or Configuration as shown in Figure 12-41.
Figure 12-41 Application Monitors
Resource group status change
To view move, online, and offline status changes, right-click a resource group name and
select Advanced. Then select the option you need as shown in Figure 12-42.
Figure 12-42 Viewing a status change
12.4.2 Managing a resource group with the SystemMirror plug-in CLI
Similar to the CLI commands for cluster creation and management, a set of CLI commands
are provided for resource group management. To list the available CLI commands for
managing the cluster, run the smcli lsbundle command (Figure 12-12 on page 340).
The following commands are specific to resource groups:
To remove a resource group (the change is made on the controlling node):
sysmirror/rmresgrp
To bring a resource group online:
sysmirror/startresgrp
To take a resource group offline:
sysmirror/stopresgrp
To move a resource group to another node or site:
sysmirror/moveresgrp
To list all the configured resource groups:
sysmirror/lsresgrp
If a resource group name is passed to this command, it displays the details of that
resource group.
Examples of CLI command usage
This section shows examples using the CLI commands for resource group management.
To list the resource groups, use the following command as shown in Example 12-6:
smcli lsresgrp -c <cluster name>
Example 12-6 The smcli lsresgrp command
# smcli lsresgrp -c selma04_cluster
myRG
RG01_selma03
RG02_selma03
RG03_selma04
RG04_selma04_1
RG05_selma03_04
RG06_selma03_04
RG_dhe
To remove a resource group, use the following command:
smcli rmresgrp -c <cluster name> -C <RG_name>
Example 12-7 shows the confirmation message that is returned when the command is run without the -C option.
Example 12-7 The smcli rmresgrp command prompting for the -C option to confirm the removal operation
# smcli rmresgrp -c selma04_cluster Test_AltRG
Removing this resource group will cause all user-defined PowerHA information
to be DELETED.
Removing objects is something which is not easily reversed, and therefore
requires confirmation. If you are sure that you want to proceed with this
removal operation, re-run the command using the "--confirm" or "-C" option.
Consider creating a snapshot of the current cluster configuration first,
though, since restoring a snapshot will be the only way to reverse any
deletions.
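The remaining resource group commands follow the same pattern as Example 12-6 and Example 12-7. The following lines are a sketch only: the cluster and resource group names are taken from Example 12-6, the flags are assumed to parallel the lsresgrp and rmresgrp examples, and moveresgrp typically also requires a target node or site. Check smcli sysmirror/<command> -h -v for the exact syntax on your system:
# smcli startresgrp -c selma04_cluster RG01_selma03
# smcli stopresgrp -c selma04_cluster RG01_selma03
# smcli moveresgrp -c selma04_cluster RG01_selma03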
12.5 Verifying and synchronizing a configuration
You can verify and synchronize a cluster by using the wizard for the SystemMirror plug-in or
by using the CLI commands for the SystemMirror plug-in. This topic explains how to use both
methods.
12.5.1 Verifying and synchronizing a configuration with the GUI
To verify and synchronize the configuration by using the Synchronization and Verification
function of the SystemMirror plug-in, follow these steps:
1. Log in to IBM Systems Director.
2. Expand Availability and select PowerHA SystemMirror as shown in Figure 12-4 on
page 335.
3. Under Cluster Management, select the Manage Clusters link.
4. In the Cluster and Resource Group Management wizard, select the cluster for which you
want to perform the synchronize and verification function. Then select the Action button or
right-click the cluster to access the Verify and Synchronize option as shown in
Figure 12-43.
Figure 12-43 Cluster management option list
5. In the Verify and Synchronize pane (Figure 12-44), select whether you want to
synchronize the entire configuration, only the unsynchronized changes, or verify. Then
click OK.
Figure 12-44 Verify and Synchronize window
6. Optional: Undo the changes to the configuration after synchronization.
a. To access this option, in the Cluster and Resource Group Management wizard, on the
Clusters tab, select the cluster for which you want to perform the synchronize and
verification function (Figure 12-43 on page 361).
b. As shown in Figure 12-45, select Recovery → Undo local changes of configuration.
Figure 12-45 Recovering the configuration option
c. When you see the Undo Local Changes of the Configuration message (Figure 12-46),
click OK.
Figure 12-46 Undo changes message window
Snapshot for the undo changes option: The undo changes option creates a
snapshot before it discards the configuration changes made since the last synchronization.
12.5.2 Verifying and synchronizing with the CLI
This section shows examples of performing cluster verification and synchronization by using
the CLI functionality:
Synchronization
You can use the synccluster command to verify and synchronize the cluster. This
command copies the cluster configuration from the controlling node of the specified cluster
to each of the other nodes in the cluster.
The help output is available through the smcli synccluster -h -v command, as shown in
Example 12-8. It lists options that control whether verification, synchronization, or both
are performed (see Example 12-9).
Example 12-8 The help option of the smcli synccluster command
# smcli sysmirror/synccluster -h -v
smcli sysmirror/synccluster {-h|-?|--help} [-v|--verbose]
smcli sysmirror/synccluster [-n|--no_verification}] \
<CLUSTER>
smcli sysmirror/synccluster [-x|--fix_errors}] \
[-C|--changes_only}] \
[-t|--custom_tests_only}] \
[{-M|--methods} <METHOD>[,<METHOD#2>,...] ] \
[{-e|--maximum_errors} <##>] \
[-F|--force] \
[{-l|--logfile} <full_path_to_file>] \
<CLUSTER>
Command Alias: sycl
.....
.....
<output truncated>
Example 12-9 shows how to synchronize cluster changes and to log the output in its own
specific log file.
Example 12-9 smcli synccluster changes only with the log file option
# smcli synccluster -C -l /tmp/sync.log selma04_cluster
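A full synchronization with corrective actions enabled can be requested in a similar way. The following line is a sketch that uses the -x (--fix_errors) flag shown in the help output in Example 12-8:
# smcli synccluster -x selma04_cluster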
Undo changes
To restore the cluster configuration to the state it was in after the last synchronization,
use the smcli undochanges command. This operation restores the cluster configuration
from the active configuration database and typically discards any unsynchronized changes.
The help option is available by using the smcli undochanges -h -v command as shown in
Example 12-10.
Example 12-10 The help option for the smcli undochanges command
# smcli undochanges -h -v
smcli sysmirror/undochanges {-h|-?|--help} [-v|--verbose]
smcli sysmirror/undochanges <CLUSTER>
Command Alias: undo
-h|-?|--help
Requests help for this command.
-v|--verbose
Requests maximum details in the displayed information.
<CLUSTER> The label of a cluster to perform this operation on.
...
<output truncated >
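Based on the syntax shown in the help output, discarding the unsynchronized changes of the test cluster looks like the following sketch:
# smcli undochanges selma04_cluster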
12.6 Performing cluster monitoring with the SystemMirror
plug-in
This topic explains how to monitor the status of the cluster and the resource group before and
while the cluster services are active. It also covers problem determination steps and how to
collect log files to analyze cluster issues.
12.6.1 Monitoring cluster activities before starting a cluster
This section explains the features you can use to monitor for cluster activities before starting
the cluster:
Topology view
After the cluster and its resource groups are configured, select the topology view to
understand the overall status of the cluster and its configuration:
a. Log in to IBM Systems Director.
b. Expand Availability and select PowerHA SystemMirror as shown in Figure 12-4 on
page 335.
c. In the right pane, select the cluster to be monitored and click Actions. Select Map
View (Figure 12-47) to access the Map view of the cluster configuration.
Figure 12-47 Map view of cluster configuration
Map view: The map view is also available for the resource group configuration. As shown in
Figure 12-47 on page 364, select the Resource Groups tab. Click Action, and click
Map View to see the map view of the resource group configuration as shown in
Figure 12-48.
Figure 12-48 Map view of resource group configuration
Cluster subsystem services status
You can view the status of PowerHA services, such as the clcomd subsystem, by using the
Status feature. To access this feature, select the cluster for which the service status is to
be viewed. Click the Action button and select Reports → Status.
You now see the cluster service status details, similar to the example in Figure 12-49.
Figure 12-49 Cluster Services status
Cluster Configuration Report
Before starting the cluster services, access the cluster configuration report. Select the
cluster for which the configuration report is to be viewed. Click the Action button and
select Reports, which shows the Cluster Configuration Report page (Figure 12-50).
Figure 12-50 Cluster Configuration Report
You can also view the Cluster Topology Configuration Report, which corresponds to the
output of the following base command:
/usr/es/sbin/cluster/utilities/cltopinfo
To view it, select the cluster, click the Action button, and select Reports → Configuration.
You see the results in a format similar to the example in Figure 12-51.
Figure 12-51 Cluster Topology Configuration Report
Similarly you can view the configuration report for the resource group as shown in
Figure 12-52. On the Resource Groups tab, select the resource group for which you want
to view the configuration. Then click the Action button and select Reports.
Figure 12-52 Resource Group Configuration Report
Application monitoring
To locate the details of the application monitors that are configured and assigned to a
resource group, select the cluster. Click the Action button and select Reports →
Applications. Figure 12-53 shows the status of the application monitoring.
Figure 12-53 Application monitoring status
Similarly you can view the configuration report for networks and interfaces by selecting the
cluster, clicking the Action button, and selecting Reports → Networks and Interfaces.
12.6.2 Monitoring an active cluster
When the cluster service is active, to see the status of the resource group, select the cluster
for which the status is to be viewed. Click the Action button and select Report → Event
Summary. You can now access the online status of the resource group and events summary
as shown in Figure 12-54.
Figure 12-54 Resource group online status
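The same resource group status is also available from the command line of any cluster node through the base PowerHA utility shown here as a sketch (see its documentation for location-specific options):
/usr/es/sbin/cluster/utilities/clRGinfo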
12.6.3 Recovering from cluster configuration issues
To recover from cluster configuration issues, such as recovering from an event failure and
undoing local changes, consider the following tips:
Accessing the recovery options
Select the cluster and click the Actions button. Then select Recovery and choose the
appropriate action as shown in Figure 12-55.
Figure 12-55 Recovery options
Releasing cluster modification locks
After you issue the release of the cluster modification locks, you see a message similar to
the one shown in Figure 12-56. Before you perform the operation, save a snapshot of the
cluster as indicated in the message.
Figure 12-56 Release cluster modification locks
Recovering from an event failure
After you issue a cluster recovery from an event failure, you see a message similar to the one
shown in Figure 12-57. Verify that you have addressed all problems that led to the error
before continuing with the operation.
Figure 12-57 Recovery from an event failure
Collecting problem determination data
To collect problem determination data, select the Turn on debugging option and the Collect
the RSCT log files option (Figure 12-58).
Figure 12-58 Collect Problem Determination Data window
Undoing local changes of a configuration
To undo local changes of a configuration, see 12.5.1, “Verifying and synchronizing a
configuration with the GUI” on page 360.
13
Chapter 13.
Disaster recovery using DS8700
Global Mirror
This chapter explains how to configure disaster recovery based on IBM PowerHA
SystemMirror for AIX Enterprise Edition using IBM System Storage DS8700 Global Mirror as
a replicated resource. This support was added in version 6.1 with service pack 3 (SP3).
This chapter includes the following topics:
Planning for Global Mirror
Installing the DSCLI client software
Scenario description
Configuring the Global Mirror resources
Configuring AIX volume groups
Configuring the cluster
Failover testing
LVM administration of DS8000 Global Mirror replicated resources
13.1 Planning for Global Mirror
Proper planning is crucial to the success of any disaster recovery solution. This topic describes
the basic requirements to implement Global Mirror and integrate it with IBM PowerHA
SystemMirror for AIX Enterprise Edition.
13.1.1 Software prerequisites
Global Mirror functionality works with all the AIX levels that are supported by PowerHA
SystemMirror Standard Edition. The following software is required for the configuration of the
PowerHA SystemMirror for AIX Enterprise Edition for Global Mirror:
The following base file sets for PowerHA SystemMirror for AIX Enterprise Edition 6.1:
– cluster.es.pprc.cmds
– cluster.es.pprc.rte
– cluster.es.spprc.cmds
– cluster.es.spprc.rte
– cluster.msg.en_US.pprc
PPRC and SPPRC file sets: The PPRC and SPPRC file sets are not required for
Global Mirror support on PowerHA.
The following additional file sets are included in SP3 (they must be installed separately and
require license acceptance during installation):
– cluster.es.genxd
  cluster.es.genxd.cmds     6.1.0.0 Generic XD support - Commands
  cluster.es.genxd.rte      6.1.0.0 Generic XD support - Runtime
– cluster.msg.en_US.genxd
  cluster.msg.en_US.genxd   6.1.0.0 Generic XD support - Messages
AIX supported levels:
– 5.3 TL9, RSCT 2.4.12.0, or later
– 6.1 TL2 SP1, RSCT 2.5.4.0, or later
The IBM DS8700 microcode bundle 75.1.145.0 or later
DS8000 CLI (DSCLI) 6.5.1.203 or later client interface (must be installed on each
PowerHA SystemMirror node):
– Java 1.4.1 or later
– APAR IZ74478, which removes the previous Java requirement
The DSCLI client installation path must be added to the PATH environment variable for the
root user on each PowerHA SystemMirror node.
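To confirm on each node that the Generic XD file sets are installed at the required level, an lslpp query such as the following can be used (a sketch; the fileset names are those listed above, and the output format depends on your AIX level):
# lslpp -L cluster.es.genxd.cmds cluster.es.genxd.rte cluster.msg.en_US.genxd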
13.1.2 Minimum DS8700 requirements
Before you implement PowerHA SystemMirror with Global Mirror, you must ensure that the
following requirements are met:
Collect the following information for all the HMCs in your environment:
– IP addresses
– Login names and passwords
– Associations with storage units
Verify that all the data volumes that must be mirrored are visible to all relevant AIX hosts.
Verify that the DS8700 volumes are appropriately zoned so that the IBM FlashCopy®
volumes are not visible to the PowerHA SystemMirror nodes.
Ensure all Hardware Management Consoles (HMCs) are accessible by using the Internet
Protocol network for all PowerHA SystemMirror nodes where you want to run Global
Mirror.
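A quick way to confirm that a node can reach an HMC over the IP network is to run a single DSCLI command against it. The following line is only a sketch and assumes the DSCLI profile that is created later in Example 13-2:
# dscli -cfg /opt/ibm/dscli/profile/dscli.profile.hmc1 lssi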
13.1.3 Considerations
The PowerHA SystemMirror Enterprise Edition using DS8700 Global Mirror has the following
considerations:
AIX virtual SCSI disks are not supported in this initial release.
No auto-recovery is available from a PPRC path or link failure.
If the PPRC path or link between Global Mirror volumes breaks down, PowerHA
Enterprise Edition is unaware of it. (PowerHA does not process Simple Network
Management Protocol (SNMP) traps for volumes that use DS8K Global Mirror technology for
mirroring.) In this case, the user must identify and correct the PPRC path failure.
Depending on timing conditions, such an event can cause the corresponding Global
Mirror session to go into a “Fatal” state. If this situation occurs, the user must manually stop
and restart the corresponding Global Mirror session by using the rmgmir and mkgmir DSCLI
commands (a sketch follows at the end of this list) or an equivalent DS8700 interface.
Cluster Single Point Of Control (C-SPOC) cannot perform some Logical Volume
Manager (LVM) operations on nodes at the remote site that contain the target volumes.
Operations that require nodes at the target site to read from the target volumes result in an
error message in C-SPOC. Such operations include functions such as changing the file
system size, changing the mount point, and adding LVM mirrors. However, nodes on the
same site as the source volumes can successfully perform these tasks, and the changes
can be propagated later to the other site by using a lazy update.
Attention: For C-SPOC to work for all other LVM operations, perform the C-SPOC
operations with the DS8700 Global Mirror volume pairs in a synchronized or consistent
state, or perform them with the cluster active on all nodes.
The volume group names must be listed in the same order as the DS8700 mirror group
names in the resource group.
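For reference, the manual stop and restart of a failed Global Mirror session that is mentioned in the considerations above might look like the following sketch. The LSS (2e) and session identifier (03) are the values used later in this chapter and are assumptions here; verify the exact rmgmir and mkgmir syntax for your DSCLI level before running them:
dscli> rmgmir -lss 2e 03
dscli> mkgmir -lss 2e 03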
13.2 Installing the DSCLI client software
You can download the latest version of the DS8000 DSCLI client software from the following
web page:
ftp://ftp.software.ibm.com/storage/ds8000/updates/DS8K_Customer_Download_Files/CLI
Install the DS8000 DSCLI software on each PowerHA SystemMirror node. By default, the
installation process installs the DSCLI in the /opt/ibm/dscli directory. Add the installation
directory of the DSCLI into the PATH environment variable for the root user.
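For example, assuming the default installation directory, a line such as the following can be added to the profile of the root user (adjust the path if DSCLI is installed elsewhere):
export PATH=$PATH:/opt/ibm/dscli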
For more details about the DS8000 DSCLI, see the IBM System Storage DS8000:
Command-Line Interface User’s Guide, SC26-7916.
13.3 Scenario description
This scenario uses a three-node cluster named Txrmnia. Two nodes are in the primary site,
Texas, and one node is in the secondary site, Romania. The jordan and leeann nodes are at
the Texas site, and the robert node is at the Romania site. The primary site, Texas, has both
local automatic failover and remote recovery. Figure 13-1 provides a software and hardware
overview of the tested configuration between the two sites.
Figure 13-1 DS8700 Global Mirror test scenario
For this test, the resources are limited. Each system has a single IP address, one XD_ip
network, and a single Fibre Channel (FC) host adapter. Ideally, redundancy would exist
throughout the system, including in the local Ethernet networks, cross-site XD_ip networks,
and FC connectivity. This scenario has a single resource group, ds8kgmrg, which consists of
a service IP address (service_1), a volume group (txvg), and a DS8000 Global Mirror
replicated resource (texasmg). To configure the cluster, see 13.6, “Configuring the cluster” on page 385.
13.4 Configuring the Global Mirror resources
This section explains how to perform the following tasks:
Checking the prerequisites
Identifying the source and target volumes
Configuring the Global Mirror relationships
For each task, the DS8000 storage units are already added to the storage area network
(SAN) fabric and zoned appropriately. Also, the volumes are already provisioned to the nodes.
For details about how to set up the storage units, see IBM System Storage DS8700
Architecture and Implementation, SG24-8786.
13.4.1 Checking the prerequisites
To check the prerequisites, follow these steps:
1. Ensure that the DSCLI installation path is in the PATH environment variable on all nodes.
2. Verify that you have the appropriate microcode version on each storage unit by running
the ver -lmc command in a DSCLI session as shown in Example 13-1.
Example 13-1 Checking the microcode level
(0) root @ r9r4m21: : /
# dscli -cfg /opt/ibm/dscli/profile/dscli.profile.hmc1
Date/Time: October 6, 2010 2:15:33 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
dscli> ver -lmc
Date/Time: October 6, 2010 2:15:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS:
Storage Image    LMC
==========================
IBM.2107-75DC890 5.5.1.490
dscli>
3. Check the code bundle level that corresponds to your LMC version on the “DS8700 Code
Bundle Information” web page at:
http://www.ibm.com/support/docview.wss?uid=ssg1S1003593
The code bundle level must be at version 75.1.145.0 or later. Also on the same page,
verify that your displayed DSCLI version corresponds to the installed code bundle level or
a later level.
Example 13-2 shows the extra parameters inserted into the DSCLI configuration file for the
storage unit in the primary site, /opt/ibm/dscli/profile/dscli.profile.hmc1. Adding these
parameters avoids having to type them each time they are required.
Example 13-2 Editing the DSCLI configuration file
username:    redbook
password:    r3dbook
hmc1:        9.3.207.122
devid:       IBM.2107-75DC890
remotedevid: IBM.2107-75DC980
13.4.2 Identifying the source and target volumes
Figure 13-2 on page 376 shows the volume allocation in DS8000 units for the scenario in this
chapter. Global Copy source volumes are attached to both nodes in the primary site, Texas,
and the corresponding Global Copy target volumes are attached to the node in the secondary
site, Romania. The gray volumes, FlashCopy targets, are not exposed to the hosts.
(Figure: the Texas site data volumes 2600 and 2E00 are Global Copy sources for the Romania site data volumes 2C00 and 2800; the FlashCopy volumes are 2604 and 0A08 at Texas, and 2C04 and 2804 at Romania.)
Figure 13-2 Volume allocation in DS8000 units
Table 13-1 shows the association between the source and target volumes of the replication
relationship and between their logical subsystems (LSS, the two most significant digits of a
volume identifier). Table 13-1 also indicates the mapping between the volumes in the
DS8000 units and their disk names on the attached AIX hosts.
Table 13-1 AIX hdisk to DS8000 volume mapping
  Site Texas               Site Romania
  AIX disk   LSS/VOL ID    LSS/VOL ID   AIX disk
  hdisk10    2E00          2800         hdisk2
  hdisk6     2600          2C00         hdisk6
You can easily obtain this mapping by using the lscfg -vl hdiskX | grep Serial command
as shown in Example 13-3. The hdisk serial number is a concatenation of the storage image
serial number and the ID of the volume at the storage level.
Example 13-3 The hdisk serial number in the lscfg command output
# lscfg -vl hdisk10 | grep Serial
Serial Number...............75DC8902E00
# lscfg -vl hdisk6 | grep Serial
Serial Number...............75DC8902600
Symmetrical configuration: In an actual environment (and different from this sample
environment), to simplify the management of your Global Mirror environment, maintain a
symmetrical configuration in terms of both physical and logical elements. With this type of
configuration, you can keep the same AIX disk definitions on all nodes. It also helps you
during configuration and management operations of the disk volumes within the cluster.
13.4.3 Configuring the Global Mirror relationships
In this section, you configure the Global Mirror replication relationships by performing the
following tasks:
Creating PPRC paths
Creating Global Copy relationships
Creating FlashCopy relationships
Selecting an available Global Mirror session identifier
Defining Global Mirror sessions for all involved LSSs
Including all the source and target volumes in the Global Mirror session
Creating PPRC paths
For this task, the appropriate FC links are already configured between the storage units.
Example 13-4 shows the FC links that are available for the setup.
Example 13-4 Available FC links
dscli> lsavailpprcport -remotewwnn 5005076308FFC804 2e:28
Date/Time: October 5, 2010 5:48:09 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
Local Port Attached Port Type
=============================
I0010      I0210         FCP
I0013      I0203         FCP
I0013      I0310         FCP
I0030      I0200         FCP
I0030      I0230         FCP
I0030      I0330         FCP
I0040      I0200         FCP
I0040      I0230         FCP
I0041      I0232         FCP
I0041      I0331         FCP
I0042      I0211         FCP
I0110      I0203         FCP
I0110      I0310         FCP
I0110      I0311         FCP
I0111      I0310         FCP
I0111      I0311         FCP
I0130      I0200         FCP
I0130      I0230         FCP
I0130      I0300         FCP
I0130      I0330         FCP
I0132      I0232         FCP
I0132      I0331         FCP
dscli>
Complete the following steps:
1. Run the lssi command on the remote storage unit to obtain the value for the -remotewwnn
parameter of the lsavailpprcport command. The last parameter (2e:28 in Example 13-4) is
one possible pair of your source and target LSSs.
2. For redundancy and bandwidth, configure more FC links by using redundant SAN fabrics.
3. Among the multiple displayed links, choose two that have their ports on different adapters.
Use them to create the PPRC path for the 2e:28 LSS pair (see Example 13-5).
Example 13-5 Creating pprc paths
dscli> mkpprcpath -remotewwnn 5005076308FFC804 -srclss 2e -tgtlss 28 I0030:I0230 I0110:I0203
Date/Time: October 5, 2010 5:55:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00149I mkpprcpath: Remote Mirror and Copy path 2e:28 successfully established.
dscli> lspprcpath 2e
Date/Time: October 5, 2010 5:56:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
Src Tgt State   SS   Port  Attached Port Tgt WWNN
=========================================================
2E  28  Success FF28 I0030 I0230         5005076308FFC804
2E  28  Success FF28 I0110 I0203         5005076308FFC804
dscli>
4. In a similar manner, configure one PPRC path for each other involved LSS pair.
5. Because the PPRC paths are unidirectional, create a second path, in the opposite
direction, for each LSS pair. Use the same procedure, but work on the other storage
unit (see Example 13-6). Different FC links are selected for this direction.
Example 13-6 Creating PPRC paths in opposite directions
dscli> mkpprcpath -remotewwnn 5005076308FFC004 -srclss 28 -tgtlss 2e I0311:I0111 I0300:I0130
Date/Time: October 5, 2010 5:57:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00149I mkpprcpath: Remote Mirror and Copy path 28:2e successfully established.
dscli>
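To confirm that the reverse path is in place, the path list can also be checked on the secondary storage unit. A sketch (output omitted):
dscli> lspprcpath 28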
Creating Global Copy relationships
Create the Global Copy relationships between the source and target volumes, and then check
their status by using the commands shown in Example 13-7.
Example 13-7 Creating Global Copy relationships
dscli> mkpprc -type gcp 2e00:2800 2600:2c00
Date/Time: October 5, 2010 5:57:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 2E00:2800 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 2600:2C00 successfully created.
dscli> lspprc 2e00:2800 2600:2c00
Date/Time: October 5, 2010 5:57:42 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State        Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending -      Global Copy 26        60             Disabled      True
2E00:2800 Copy Pending -      Global Copy 2E        60             Disabled      True
dscli>
Creating FlashCopy relationships
Create FlashCopy relationships on both DS8000 storage units as shown in Example 13-8.
Example 13-8 Creating FlashCopy relationships
dscli> mkflash -tgtinhibit -nocp -record 2e00:0a08 2600:2604
Date/Time: October 5, 2010 4:17:13 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00137I mkflash: FlashCopy pair 2E00:0A08 successfully created.
CMUC00137I mkflash: FlashCopy pair 2600:2604 successfully created.
dscli> lsflash 2e00:0a08 2600:2604
Date/Time: October 5, 2010 4:17:31 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        SrcLSS SequenceNum Timeout ActiveCopy Recording Persistent Revertible SourceWriteEnabled TargetWriteEnabled BackgroundCopy
===========================================================================================
2E00:0A08 0A     0           60      Disabled   Enabled   Enabled    Disabled   Enabled            Disabled           Disabled
2600:2604 26     0           60      Disabled   Enabled   Enabled    Disabled   Enabled            Disabled           Disabled
dscli>
dscli> mkflash -tgtinhibit -nocp -record 2800:2804 2c00:2c04
Date/Time: October 5, 2010 4:20:14 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00137I mkflash: FlashCopy pair 2800:2804 successfully created.
CMUC00137I mkflash: FlashCopy pair 2C00:2C04 successfully created.
dscli> lsflash 2800:2804 2c00:2c04
Date/Time: October 5, 2010 4:20:38 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        SrcLSS SequenceNum Timeout ActiveCopy Recording Persistent Revertible SourceWriteEnabled TargetWriteEnabled BackgroundCopy
===========================================================================================
2800:2804 28     0           60      Disabled   Enabled   Enabled    Disabled   Enabled            Disabled           Disabled
2C00:2C04 2C     0           60      Disabled   Enabled   Enabled    Disabled   Enabled            Disabled           Disabled
dscli>
Selecting an available Global Mirror session identifier
Example 13-9 lists the Global Mirror sessions that are already defined on each DS8000
storage unit. In this scenario, we chose 03 as the session identifier because it is free on both
storage units.
Example 13-9 Sessions defined on both DS8000 storage units
dscli> lssession 00-ff
Date/Time: October 5, 2010 6:07:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus        SecondaryStatus   FirstPassComplete AllowCascading
===========================================================================================================
04     77      Normal 0400   Join Pending Primary Copy Pending Secondary Simplex True              Disable
0A     04      Normal 0A04   Join Pending Primary Suspended    Secondary Simplex False             Disable
16     05      Normal 1604   Join Pending Primary Suspended    Secondary Simplex False             Disable
16     05      Normal 1605   Join Pending Primary Suspended    Secondary Simplex False             Disable
18     02      Normal 1800   Join Pending Primary Suspended    Secondary Simplex False             Disable
1C     04      Normal 1C00   Join Pending Primary Suspended    Secondary Simplex False             Disable
1C     04      Normal 1C01   Join Pending Primary Suspended    Secondary Simplex False             Disable
dscli> lssession 00-ff
Date/Time: October 5, 2010 6:08:23 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus   SecondaryStatus        FirstPassComplete AllowCascading
===========================================================================================================
1A     20      Normal 1A00   Join Pending Primary Simplex Secondary Copy Pending True              Disable
1C     01
30     77      Normal 3000   Join Pending Primary Simplex Secondary Copy Pending True              Disable
dscli>
Defining Global Mirror sessions for all involved LSSs
Define the Global Mirror sessions for all the LSSs associated with source and target volumes
as shown in Example 13-10. The same freely available session identifier, determined in
“Selecting an available Global Mirror session identifier” on page 379, is used on both storage
units.
Example 13-10 Defining the GM session for the source and target volumes
dscli> mksession -lss 2e 03
Date/Time: October 5, 2010 6:11:07 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.
dscli> mksession -lss 26 03
Date/Time: October 5, 2010 6:11:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00145I mksession: Session 03 opened successfully.
dscli> mksession -lss 28 03
Date/Time: October 6, 2010 5:39:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00145I mksession: Session 03 opened successfully.
dscli> mksession -lss 2c 03
Date/Time: October 6, 2010 5:39:15 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00145I mksession: Session 03 opened successfully.
dscli>
Including all the source and target volumes in the Global Mirror session
Add the volumes in the Global Mirror sessions and verify their status by using the commands
shown in Example 13-11.
Example 13-11 Adding source and target volumes to the Global Mirror sessions
dscli> chsession -lss 26 -action add -volume 2600 03
Date/Time: October 5, 2010 6:15:17 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 2e -action add -volume 2e00 03
Date/Time: October 5, 2010 6:15:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00147I chsession: Session 03 successfully modified.
dscli> lssession 26 2e
Date/Time: October 5, 2010 6:16:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus        SecondaryStatus   FirstPassComplete AllowCascading
===========================================================================================================
26     03      Normal 2600   Join Pending Primary Copy Pending Secondary Simplex True              Disable
2E     03      Normal 2E00   Join Pending Primary Copy Pending Secondary Simplex True              Disable
dscli>
dscli> chsession -lss 2c -action add -volume 2c00 03
Date/Time: October 6, 2010 5:41:12 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00147I chsession: Session 03 successfully modified.
dscli> chsession -lss 28 -action add -volume 2800 03
Date/Time: October 6, 2010 5:41:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00147I chsession: Session 03 successfully modified.
dscli> lssession 28 2c
Date/Time: October 6, 2010 5:44:02 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus   SecondaryStatus        FirstPassComplete AllowCascading
===========================================================================================================
28     03      Normal 2800   Join Pending Primary Simplex Secondary Copy Pending True              Disable
2C     03      Normal 2C00   Join Pending Primary Simplex Secondary Copy Pending True              Disable
dscli>
13.5 Configuring AIX volume groups
In this scenario, you create a volume group and a file system on the hdisks associated with
the DS8000 source volumes. These volumes are already identified in 13.4.2, “Identifying the
source and target volumes” on page 375. They are hdisk6 and hdisk10 on the jordan node.
You must configure the volume groups and file systems on the cluster nodes. The application
might need the same major number for the volume group on all nodes. Using the same major
number is also useful later if you add Network File System (NFS) configuration.
For the nodes on the primary site, you can use the standard procedure. You define the
volume groups and file systems on one node and then import them to the other nodes. For
the nodes on the secondary site, you must first suspend the replication on the involved target
volumes.
13.5.1 Configuring volume groups and file systems on primary site
In this task, you create an AIX volume group on the hdisks associated with the DS8000
source volumes on the jordan node and import it on the leeann node. Follow these steps:
1. Choose the next free major number on all cluster nodes by running the lvlstmajor
command on each cluster node. The next common free major number on all systems is 50
as shown in Example 13-12.
Example 13-12 Running the lvlstmajor command on all cluster nodes
root@leeann: lvlstmajor
50...
root@robert: lvlstmajor
44..54,56...
root@jordan: # lvlstmajor
50...
2. Create a volume group, called txvg, and a file system, called /txro, on the disks identified
in 13.4.2, “Identifying the source and target volumes” on page 375 (hdisk6 and hdisk10 on
the jordan node). Example 13-13 shows the commands to run on the jordan node.
Example 13-13 Creating txvg volume group on jordan
root@jordan: mkvg -V 50 -y txvg hdisk6 hdisk10
0516-1254 mkvg: Changing the PVID in the ODM.
txvg
root@jordan: chvg -a n txvg
root@jordan: mklv -e x -t jfs2 -y txlv txvg 250
txlv
root@jordan: mklv -e x -t jfs2log -y txloglv txvg 1
txloglv
root@jordan: crfs -v jfs2 -d /dev/txlv -a log=/dev/txloglv -m /txro -A no
File system created successfully.
1023764 kilobytes total disk space.
New File System size is 2048000
root@jordan: lsvg -p txvg
txvg:
PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk6    active     511         385        102..00..79..102..102
hdisk10   active     511         386        103..00..79..102..102
root@jordan: lspv | grep -e hdisk6 -e hdisk10
hdisk6    000a625afe2a4958    txvg    active
hdisk10   000a624a833e440f    txvg    active
root@jordan: varyoffvg txvg
root@jordan:
3. Import the volume group on the second node on the primary site, leeann, as shown in
Example 13-14:
a. Verify that the shared disks have the same PVID on both nodes.
b. Run the rmdev -dl command for each hdisk.
c. Run the cfgmgr program.
d. Run the importvg command.
Example 13-14 Importing the txvg volume group on the leeann node
root@leean: rmdev -dl hdisk6
hdisk6 deleted
root@leean: rmdev -dl hdisk10
hdisk10 deleted
root@leean: cfgmgr
root@leean: lspv | grep -e hdisk6 -e hdisk10
hdisk6     000a625afe2a4958
hdisk10    000a624a833e440f
root@leean: importvg -V 51 -y txvg hdisk6
txvg
root@leean: lsvg -l txvg
txvg:
LV NAME   TYPE     LPs   PPs   PVs   LV STATE     MOUNT POINT
txlv      jfs2     250   250   2     open/syncd   /txro
txloglv   jfs2log  1     1     1     open/syncd   N/A
root@leean: chvg -a n txvg
root@leean: varyoffvg txvg
13.5.2 Importing the volume groups in the remote site
To import the volume groups in the remote site, follow these steps. Example 13-15
shows the commands to run on the primary site.
1. Obtain a consistent replica of the data, on the primary site, by ensuring that the volume
group is varied off as shown by the last command in Example 13-14.
2. Ensure that the Global Copy is in progress and that the Out of Sync count is 0.
3. Suspend the replication by using the pausepprc command.
Example 13-15 Pausing the Global Copy relationship on the primary site
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:40:56 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State        Reason Type        Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG PPRC CG  isTgtSE DisableAutoResync
===========================================================================================================
2600:2C00 Copy Pending -      Global Copy 0                  Disabled Disabled    Invalid     -              26        60             Disabled      True              Disabled           Disabled  N/A     Disabled Unknown False
2E00:2800 Copy Pending -      Global Copy 0                  Disabled Disabled    Invalid     -              2E        60             Disabled      True              Disabled           Disabled  N/A     Disabled Unknown False
dscli> pausepprc 2600:2C00 2E00:2800
Date/Time: October 6, 2010 3:49:29 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2600:2C00 relationship successfully paused.
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 2E00:2800 relationship successfully paused.
dscli> lspprc -l 2600 2e00
Date/Time: October 6, 2010 3:49:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State     Reason      Type        Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date Suspended SourceLSS Timeout (secs) Critical Mode First Pass Status Incremental Resync Tgt Write GMIR CG PPRC CG  isTgtSE DisableAutoResync
===========================================================================================================
2600:2C00 Suspended Host Source Global Copy 0                  Disabled Disabled    Invalid     -              26        60             Disabled      True              Disabled           Disabled  N/A     Disabled Unknown False
2E00:2800 Suspended Host Source Global Copy 0                  Disabled Disabled    Invalid     -              2E        60             Disabled      True              Disabled           Disabled  N/A     Disabled Unknown False
dscli>
4. To make the target volumes available to the attached hosts, use the failoverpprc
command on the secondary site as shown in Example 13-16.
Example 13-16 The failoverpprc command on the secondary site storage unit
dscli> failoverpprc -type gcp 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:19 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2C00:2600 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2800:2E00 successfully reversed.
dscli> lspprc 2C00:2600 2800:2E00
Date/Time: October 6, 2010 3:55:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State     Reason      Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
====================================================================================================
2800:2E00 Suspended Host Source Global Copy 28        60             Disabled      True
2C00:2600 Suspended Host Source Global Copy 2C        60             Disabled      True
dscli>
5. Refresh and check the PVIDs. Then import and vary off the volume group as shown in
Example 13-17.
Example 13-17 Importing the volume group txvg on the secondary site node, robert
root@robert: rmdev -dl hdisk2
hdisk2 deleted
root@robert: rmdev -dl hdisk6
hdisk6 deleted
root@robert: cfgmgr
root@robert: lspv | grep -e hdisk2 -e hdisk6
hdisk2     000a624a833e440f
hdisk6     000a625afe2a4958
root@robert: importvg -V 50 -y txvg hdisk2
txvg
root@robert: lsvg -l txvg
txvg:
LV NAME   TYPE     LPs   PPs   PVs   LV STATE       MOUNT POINT
txlv      jfs2     250   250   2     closed/syncd   /txro
txloglv   jfs2log  1     1     1     closed/syncd   N/A
root@robert: varyoffvg txvg
6. Re-establish the Global Copy relationship as shown in Example 13-18.
Example 13-18 Re-establishing the initial Global Copy relationship
dscli> failbackpprc -type gcp 2600:2C00 2E00:2800
Date/Time: October 6, 2010 4:24:10 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back.
dscli> lspprc 2600:2C00 2E00:2800
Date/Time: October 6, 2010 4:24:41 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
2600:2C00 Copy Pending        -      Global Copy 26        60             Disabled      True
2E00:2800 Copy Pending        -      Global Copy 2E        60             Disabled      True
dscli> lspprc 2800 2c00
Date/Time: October 6, 2010 4:24:57 AM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
2600:2C00 Target Copy Pending -      Global Copy 26        unknown        Disabled      Invalid
2E00:2800 Target Copy Pending -      Global Copy 2E        unknown        Disabled      Invalid
dscli>
13.6 Configuring the cluster
To configure the cluster, you must complete all software prerequisites. Also you must
configure the /etc/hosts file properly, and verify that the clcomdES subsystem is running on
each node.
To configure the cluster, follow these steps:
1. Add a cluster.
2. Add all three nodes.
3. Add both sites.
4. Add the XD_ip network.
5. Add the disk heartbeat network.
6. Add the base interfaces to XD_ip network.
7. Add the service IP address.
8. Add the DS8000 Global Mirror replicated resources.
9. Add a resource group.
10.Add a service IP, application server, volume group, and DS8000 Global Mirror Replicated
Resource to the resource group.
13.6.1 Configuring the cluster topology
Configuring a cluster entails the following tasks:
Adding a cluster
Adding nodes
Adding sites
Adding networks
Adding communication interfaces
Adding a cluster
To add a cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration → Extended Topology Configuration →
Configure an HACMP Cluster → Add/Change/Show an HACMP Cluster.
3. Enter the cluster name, which is Txrmnia in this scenario, as shown in Figure 13-3. Press
Enter.
Add/Change/Show an HACMP Cluster
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                     [Entry Fields]
* Cluster Name       [Txrmnia]
Figure 13-3 Adding a cluster in the SMIT menu
The output is displayed in the SMIT Command Status window.
Adding nodes
To add the nodes, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Nodes → Add a Node to the HACMP Cluster.
3. Enter the desired node name, which is jordan in this case, as shown in Figure 13-4. Press
Enter.
The output is displayed in the SMIT Command Status window.
Add a Node to the HACMP Cluster
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                  [Entry Fields]
* Node Name                       [jordan]
  Communication Path to Node      []                        +
Figure 13-4 Add a Node SMIT menu
4. In this scenario, repeat these steps two more times to add the additional nodes of leeann
and robert.
Adding sites
To add the sites, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Sites → Add a Site.
3. Enter the desired site name, which in this scenario is the Texas site with the nodes jordan
and leeann, as shown in Figure 13-5. Press Enter.
The output is displayed in the SMIT Command Status window.
Add a Site
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                   [Entry Fields]
* Site Name        [Texas]
* Site Nodes       jordan leeann
Figure 13-5 Add a Site SMIT menu
4. In this scenario, repeat these steps to add the Romania site with the robert node.
Example 13-19 shows the site definitions. The dominance information is displayed, but it is
not relevant until a resource group that uses the nodes is defined later.
Example 13-19 cllssite information about site definitions
./cllssite
-----------------------------------------------------------------
Sitename   Site Nodes      Dominance   Protection Type
-----------------------------------------------------------------
Texas      jordan leeann               NONE
Romania    robert                      NONE
Adding networks
To add the networks, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Networks → Add a Network to the HACMP Cluster.
3. Choose the desired network type, which in this scenario is XD_ip.
4. Keep the default network name and press Enter (Figure 13-6).
Add an IP-Based Network to the HACMP Cluster
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                       [Entry Fields]
* Network Name                                         [net_XD_ip_01]
* Network Type                                         XD_ip
* Netmask(IPv4)/Prefix Length(IPv6)                    [255.255.255.0]
* Enable IP Address Takeover via IP Aliases            [Yes]               +
  IP Address Offset for Heartbeating over IP Aliases   []
Figure 13-6 Add an IP-Based Network SMIT menu
5. Repeat these steps but select a network type of diskhb for the disk heartbeat network and
keep the default network name of net_diskhb_01.
Adding communication interfaces
To add the communication interfaces, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Topology
Configuration → Configure HACMP Communication Interfaces/Devices → Add
Communication Interfaces/Devices → Add Pre-defined Communication Interfaces
and Devices → Communication Interfaces.
3. Select the previously created network, which in this scenario is net_XD_ip_01.
4. Complete the SMIT menu fields. The first interface in this scenario, for the jordan node, is
shown in Figure 13-7. Press Enter.
The output is displayed in the SMIT Command Status window.
Add a Communication Interface
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                           [Entry Fields]
* IP Label/Address         [jordan_base]        +
* Network Type             XD_ip
* Network Name             net_XD_ip_01
* Node Name                [jordan]             +
Figure 13-7 Add communication interface SMIT menu
5. Repeat these steps and select Communication Devices to complete the disk heartbeat
network.
The topology is now configured. You can also see all the interfaces and devices in the
cllsif command output shown in Figure 13-8.
Adapter       Type     Network        Net Type  Attribute  Node    IP Address
jordan_base   boot     net_XD_ip_01   XD_ip     public     jordan  9.3.207.209
jordandhb     service  net_diskhb_01  diskhb    serial     jordan  /dev/hdisk8
leeann_base   boot     net_XD_ip_01   XD_ip     public     leeann  9.3.207.208
leeanndhb     service  net_diskhb_01  diskhb    serial     leeann  /dev/hdisk8
robert_base   boot     net_XD_ip_01   XD_ip     public     robert  9.3.207.207
Figure 13-8 Cluster interfaces and devices defined
13.6.2 Configuring cluster resources and resource group
The test scenario has only one resource group, which contains the resources of the service
IP address, volume group, and DS8000 replicated resources. Configure the cluster resources
and resource group as explained in the following sections.
Defining the service IP
Define the service IP by following these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Resource
Configuration → HACMP Extended Resources Configuration → Configure HACMP
Service IP Labels/Addresses → Add a Service IP Label/Address → Configurable on
Multiple Nodes.
3. Choose the net_XD_ip_01 network and press Enter.
4. Choose the appropriate IP label or address. Press Enter.
The output is displayed in the SMIT Command Status window.
In this scenario, we added serviceip_2, as shown in Figure 13-9.
Add a Service IP Label/Address configurable on Multiple Nodes (extended)
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                        [Entry Fields]
* IP Label/Address                                      serviceip_2
  Netmask(IPv4)/Prefix Length(IPv6)                     []
* Network Name                                          net_XD_ip_01
  Alternate HW Address to accompany IP Label/Address    []
  Associated Site                                       ignore             +
Figure 13-9 Add a Service IP Label SMIT menu
In most true multisite scenarios, where each site is on a different network segment, it is
common to create at least two service IP labels, one for each site, by using the Associated
Site option. This option provides site-specific service IP labels, so that each site can have a
unique service IP label. We do not use this option in this test because both sites are on the
same network segment.
Defining the DS8000 Global Mirror resources
To fully define the Global Mirror resources, follow these steps:
1. Add a storage agent or agents.
2. Add a storage system or systems.
3. Add a mirror group or groups.
Because these objects are new in this support, each one is defined here before you configure them:
Storage agent
A generic name given by PowerHA SystemMirror to an entity such as the IBM DS8000 HMC.
Storage agents typically provide a single coordination point and often use TCP/IP as their
transport for communication. You must provide the IP address and authentication
information that is used to communicate with the HMC.
Storage system
A generic name given by PowerHA SystemMirror to an entity such as a DS8700 storage unit.
When using Global Mirror, you must associate one storage agent with each storage system.
You must provide the IBM DS8700 system identifier for the storage system. For example,
IBM.2107-75ABTV1 is a storage identifier for a DS8000 storage system.
Mirror group
A generic name given by PowerHA SystemMirror to a logical collection of volumes that must
be mirrored to another storage system that resides on a remote site. A Global Mirror session
represents a mirror group.
Adding a storage agent
To add a storage agent, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Resource
Configuration → HACMP Extended Resources Configuration → Configure DS8000
Global Mirror Resources → Configure Storage Agents → Add a Storage Agent.
3. Complete the menu appropriately and press Enter. Figure 13-10 shows the configuration
for this scenario.
The output is displayed in the SMIT Command Status window.
Add a Storage Agent
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                            [Entry Fields]
* Storage Agent Name        [ds8khmc]
* IP Addresses              [9.3.207.122]
* User ID                   [redbook]
* Password                  [r3dbook]
Figure 13-10 Add a Storage Agent SMIT menu
It is possible to have multiple storage agents. However, this test scenario has only one
storage agent that manages both storage units.
Important: The user ID and password are stored as plain text in the
HACMPxd_storage_agent.odm file.
Adding a storage system
To add the storage systems, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Resource
Configuration → HACMP Extended Resources Configuration → Configure DS8000
Global Mirror Resources → Configure Storage Systems → Add a Storage System.
3. Complete the menu appropriately and press Enter. Figure 13-11 shows the configuration
for this scenario.
The output is displayed in the SMIT Command Status window.
Add a Storage System
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                        [Entry Fields]
* Storage System Name                   [texasds8k]
* Storage Agent Name(s)                 ds8kmainhmc            +
* Site Association                      Texas                  +
* Vendor Specific Identification        [IBM.2107-75DC890]
* WWNN                                  [5005076308FFC004]
Figure 13-11 Add a Storage System SMIT menu
4. Repeat these steps for the storage system at Romania site, and name it romaniads8k.
Example 13-20 shows the configuration.
Example 13-20 Storage systems definitions
Storage System Name                  texasds8k
Storage Agent Name(s)                ds8kmainhmc
Site Association                     Texas
Vendor Specific Identification       IBM.2107-75DC890
WWNN                                 5005076308FFC004

Storage System Name                  romaniads8k
Storage Agent Name(s)                ds8kmainhmc
Site Association                     Romania
Vendor Specific Identification       IBM.2107-75DC980
WWNN                                 5005076308FFC804
Adding a mirror group
You are now ready to add the mirror group. To add a mirror group, perform the
following steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration → Extended Resource
Configuration → HACMP Extended Resources Configuration → Configure DS8000
Global Mirror Resources → Configure Mirror Groups → Add a Mirror Group.
3. Complete the menu appropriately and press Enter. Figure 13-12 shows the configuration
for this scenario.
The output is displayed in the SMIT Command Status window.
Add a Mirror Group
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                       [Entry Fields]
* Mirror Group Name                    [texasmg]
* Storage System Name                  texasds8k romaniads8k     +
* Vendor Specific Identifier           [03]                      +
* Recovery Action                      automatic                 +
  Maximum Coordination Time            [50]
  Maximum Drain Time                   [30]
  Consistency Group Interval Time      [0]
Figure 13-12 Add a Mirror Group SMIT menu
Vendor Specific Identifier field: For the Vendor Specific Identifier field, provide only the
Global Mirror session number.
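If you do not remember the session number, you can confirm it with the DSCLI before completing the menu. The following sketch reuses the lssession command and the LSS numbers (26 and 2E) from this scenario:

dscli> lssession 26 2E
# The Session column (03 in this scenario) is the value to enter in the
# Vendor Specific Identifier field of the Add a Mirror Group menu.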
Defining a resource group and Global Mirror resources
Now that you have all the components configured that are required for the DS8700 replicated
resource, you can create a resource group and add your resources to it.
Adding a resource group
To add a resource group, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  HACMP Extended Resources Group Configuration  Add a
Resource Group.
3. Complete the menu appropriately and press Enter. Figure 13-13 shows the configuration
in this scenario. Notice that for the Inter-Site Management Policy, we chose Prefer
Primary Site. This option ensures that the resource group starts automatically when the
cluster is started in the primary Texas site.
The output is displayed in the SMIT Command Status window.
Add a Resource Group (extended)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                   [ds8kgmrg]
  Inter-Site Management Policy                          [Prefer Primary Site]
* Participating Nodes from Primary Site                 [jordan leeann]
  Participating Nodes from Secondary Site               [robert]

  Startup Policy                                        Online On Home Node Only
  Fallover Policy                                       Fallover To Next Priority Node
  Fallback Policy                                       Never Fallback

Figure 13-13 Add a Resource Group SMIT menu
Adding resources to a resource group
To add resources to a resource group, perform the following steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  Change/Show Resources and Attributes for a Resource Group.
3. Choose the resource group, which in this example is ds8kgmrg.
4. Complete the menu appropriately and press Enter.
The output is displayed in the SMIT Command Status window.
In this scenario, we only added a service IP label, the volume group, and the DS8000 Global
Mirror Replicated Resources as shown in the streamlined clshowres command output in
Example 13-21.
Volume group: The volume group names must be listed in the same order as the DS8700
mirror group names in the resource group.
Example 13-21 Resource group attributes and resources
Resource Group Name                          ds8kgmrg
Inter-site Management Policy                 Prefer Primary Site
Participating Nodes from Primary Site        jordan leeann
Participating Nodes from Secondary Site      robert
Startup Policy                               Online On Home Node Only
Fallover Policy                              Fallover To Next Priority Node
Fallback Policy                              Never Fallback
Service IP Label                             serviceip_2
Volume Groups                                txvg
GENXD Replicated Resources                   texasmg
DS8000 Global Mirror Replicated Resources field: In the SMIT menu for adding
resources to the resource group, notice that the appropriate field is named DS8000 Global
Mirror Replicated Resources. However, when viewing the menu by using the clshowres
command (Example 13-21 on page 392), the field is called GENXD Replicated Resources.
You can now synchronize the cluster, start the cluster, and begin testing it.
13.7 Failover testing
This section takes you through basic failover testing scenarios with the DS8000 Global Mirror
replicated resources locally within the site and across sites. You must carefully plan the
testing of a site cluster failover because more time is required to manipulate the secondary
target LUNs at the recovery site. Also, because of the nature of asynchronous replication,
testing a site failover can affect the data.
In these scenarios, redundancy tests cannot be performed on IP networks that have only a
single network. Instead, you must configure redundant IP or non-IP communication paths to
avoid isolation of the sites. The loss of all communication paths between sites leads to a
partitioned cluster. Such a loss also leads to data divergence between the sites if the
replication links are unavailable at the same time.
Another specific failure scenario is the loss of replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this type of loss, configure a
redundant PPRC path or links for the replication. You must manually recover the status of the
pairs after the storage links are operational again.
Important: If the PPRC path or link between Global Mirror volumes breaks down, the
PowerHA Enterprise Edition is unaware. The reason is that PowerHA does not process
SNMP for volumes that use DS8700 Global Mirror technology for mirroring. In such a case,
you must identify and correct the PPRC path failure. Depending upon some timing
conditions, such an event can result in the corresponding Global Mirror session going into
a fatal state. In this situation, you must manually stop and restart the corresponding Global
Mirror session (by using the rmgmir and mkgmir DSCLI commands) or an equivalent
DS8700 interface.
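Because PowerHA does not detect a broken PPRC path for these volumes, it is worth checking the replication state periodically from the DSCLI. This hedged sketch is built only from the lssession and lspprc commands already used in this chapter; the rmgmir and mkgmir options are not shown because they depend on your DS8700 configuration:

# From a node at the production site (LSS 26 and 2E in this scenario):
dscli> lspprc 2600 2E00      # pairs should be Copy Pending, not Suspended
dscli> lssession 26 2E       # session status should be CG In Progress
# If the pairs are Suspended because of a path failure, repair the PPRC path, then stop
# and restart the Global Mirror session (rmgmir and mkgmir) or use an equivalent
# DS8700 interface, as described in the preceding note.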
This topic takes you through the following tests:
Graceful site failover
Rolling site failure
Site re-integration
Each test, other than the re-integration test, begins in the same initial state of the primary site
hosting the ds8kgmrg resource group on the primary node as shown in Example 13-22 on
page 394. Before each test, we start copying data from another file system to the replicated
file systems. After each test, we verify that the service IP address is online and that new data
is in the file systems. We also had a script that inserted the current time and date, along with
the local node name, into a file on each file system.
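The script itself is not reproduced in this book. The following is a minimal sketch of an equivalent one; the mount point /txro is taken from the LVM examples later in this chapter, and the file name testlog.txt is a hypothetical choice:

#!/usr/bin/ksh
# Append a timestamp and the local node name to a marker file on the replicated
# file system so that the data can be checked after each failover test.
FS=/txro                   # replicated file system used in this scenario
LOGFILE=$FS/testlog.txt    # hypothetical marker file name
while true
do
    print "$(date) $(hostname)" >> $LOGFILE
    sleep 10
done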
Example 13-22 Beginning of the test cluster resource group states
jordan# clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
ds8kgmrg       ONLINE                       jordan@Texas
               OFFLINE                      leeann@Texas
               ONLINE SECONDARY             robert@Romania
After each test, we show the Global Mirror states. Example 13-23 shows the normal running
production status of the Global Mirror pairs from each site.
Example 13-23 Beginning states of the Global Mirror pairs
*******************From node jordan at site Texas***************************
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:00:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status         Volume VolumeStatus PrimaryStatus        SecondaryStatus   FirstPassComplete AllowCascading
==========================================================================================================================
26     03      CG In Progress 2600   Active       Primary Copy Pending Secondary Simplex True              Disable
2E     03      CG In Progress 2E00   Active       Primary Copy Pending Secondary Simplex True              Disable

dscli> lspprc 2600 2E00
Date/Time: October 10, 2010 4:00:43 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State        Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
===================================================================================================
2600:2C00 Copy Pending        Global Copy 26        60             Disabled      True
2E00:2800 Copy Pending        Global Copy 2E        60             Disabled      True

*******************From remote node robert at site Romania***************************
dscli> lssession 28 2c
Date/Time: October 10, 2010 3:54:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus   SecondaryStatus        FirstPassComplete AllowCascading
==================================================================================================================
28     03      Normal 2800   Join Pending Primary Simplex Secondary Copy Pending True              Disable
2C     03      Normal 2C00   Join Pending Primary Simplex Secondary Copy Pending True              Disable

dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 3:55:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==========================================================================================================
2600:2C00 Target Copy Pending        Global Copy 26        unknown        Disabled      Invalid
2E00:2800 Target Copy Pending        Global Copy 2E        unknown        Disabled      Invalid
13.7.1 Graceful site failover
Performing a controlled move of a production environment across sites is a basic test to
ensure that the remote site can bring the production environment online. This test is done
only during initial implementation testing or during a planned production outage of the site. In
this test, we perform the graceful failover operation between sites by performing a resource
group move.
In a true maintenance scenario, you would most likely perform a graceful site failover by
stopping the cluster on the local standby node first. Then you stop the cluster on the
production node by using the Move Resource Group option.
Moving the resource group to another site: In this scenario, because we only have one
node at the Romania site, we use the option to move the resource group to another site. If
multiple remote nodes are members of the resource group, use the option to move the resource
group to another node instead.
During this move, the following operations are performed:
Release the primary online instance of ds8kgmrg at the Texas site. This operation entails
the following tasks:
– Executes the application server stop.
– Unmounts the file systems.
– Varies off the volume group.
– Removes the service IP address.
Release the secondary online instance of ds8kgmrg at the Romania site.
Acquire ds8kgmrg in the secondary online state at the Texas site.
Acquire ds8kgmrg in the online primary state at the Romania site.
To perform the resource group move by using SMIT, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC)  Resource Groups and
Applications  Move a Resource Group to Another Node / Site  Move Resource
Groups to Another Site.
3. Select the ONLINE instance of ds8kgmrg to be moved as shown in Figure 13-14.
Move a Resource Group to Another Node / Site

Move cursor to desired item and press Enter.

  Move Resource Groups to Another Node
  Move +--------------------------------------------------------------------------+
       |                         Select a Resource Group                          |
       |                                                                          |
       | Move cursor to desired item and press Enter. Use arrow keys to scroll.   |
       |                                                                          |
       | # Resource Group            State                   Node(s) / Site       |
       |   ds8kgmrg                  ONLINE                  jordan / Texas       |
       |   ds8kgmrg                  ONLINE SECONDARY        robert / Romani      |
       |                                                                          |
       | # Resource groups in node or site collocation configuration:             |
       | # Resource Group(s)         State                   Node / Site          |
       |                                                                          |
       | F1=Help                     F2=Refresh              F3=Cancel            |
       | F8=Image                    F10=Exit                Enter=Do             |
F1=Help| /=Find                      n=Find Next                                  |
F9=Shel+--------------------------------------------------------------------------+

Figure 13-14 Selecting a resource group
4. Select the Romania site from the next menu as shown in Figure 13-15.
+--------------------------------------------------------------------------+
|                        Select a Destination Site                         |
|                                                                          |
| Move cursor to desired item and press Enter.                             |
|                                                                          |
| # *Denotes Originally Configured Primary Site                            |
|     Romania                                                              |
|                                                                          |
| F1=Help                 F2=Refresh               F3=Cancel               |
| F8=Image                F10=Exit                 Enter=Do                |
| /=Find                  n=Find Next                                      |
+--------------------------------------------------------------------------+

Figure 13-15 Selecting a site for a resource group move
5. Verify the information in the final menu and press Enter.
Upon completion of the move, ds8kgmrg is online on the node robert as shown in
Example 13-24.

Attention: During our testing, we encountered a problem. After performing the first
resource group move between sites, we were unable to move it back because the pick list
of destination sites was empty. We could still move it back by node. Later in our testing,
the by-site option started working. However, it moved the resource group to the standby
node at the primary site instead of the original primary node. If you encounter similar
problems, contact IBM support.
Example 13-24 Resource group status after the site move to Romania
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
ds8kgmrg       ONLINE SECONDARY             jordan@Texas
               OFFLINE                      leeann@Texas
               ONLINE                       robert@Romania
6. Repeat the resource group move to move it back to its original primary site, Texas, and
node, jordan, to return to the original starting state. However, instead of using the option
to move it to another site, use the option to move it to another node.
Example 13-25 shows that the Global Mirror statuses are now swapped, and the local site is
showing the LUNs now as the target volumes.
Example 13-25 Global Mirror status after the resource group move
*******************From node jordan at site Texas***************************
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:04:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status Volume VolumeStatus PrimaryStatus   SecondaryStatus        FirstPassComplete AllowCascading
==================================================================================================================
26     03      Normal 2600   Active       Primary Simplex Secondary Copy Pending True              Disable
2E     03      Normal 2E00   Active       Primary Simplex Secondary Copy Pending True              Disable

dscli> lspprc 2600 2E00
Date/Time: October 10, 2010 4:05:26 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==========================================================================================================
2800:2E00 Target Copy Pending        Global Copy 28        unknown        Disabled      Invalid
2C00:2600 Target Copy Pending        Global Copy 2C        unknown        Disabled      Invalid

*******************From remote node robert at site Romania***************************
dscli> lssession 28 2C
Date/Time: October 10, 2010 3:59:25 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status         Volume VolumeStatus PrimaryStatus        SecondaryStatus   FirstPassComplete AllowCascading
==========================================================================================================================
28     03      CG In Progress 2800   Active       Primary Copy Pending Secondary Simplex True              Disable
2C     03      CG In Progress 2C00   Active       Primary Copy Pending Secondary Simplex True              Disable

dscli> lspprc 2800 2C00
Date/Time: October 10, 2010 3:59:35 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State        Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
===================================================================================================
2800:2E00 Copy Pending        Global Copy 28        60             Disabled      True
2C00:2600 Copy Pending        Global Copy 2C        60             Disabled      True
13.7.2 Rolling site failure
This scenario entails performing a rolling site failure of the Texas site by using the following
steps:
1. Halt the primary production node jordan at the Texas site.
2. Verify that the resource group ds8kgmrg is acquired locally by the node leeann.
3. Verify that the Global Mirror pairs are in the same status as before the system failure.
4. Halt the node leeann to produce a site down.
5. Verify that the resource group ds8kgmrg is acquired remotely by the robert node.
6. Verify that the Global Mirror pair states are changed.
Begin with all three nodes active in the cluster and the resource group online on the primary
node as shown in Example 13-22 on page 394.
On the node jordan, we run the reboot -q command. The node leeann acquires the
ds8kgmrg resource group as shown in Example 13-26.
Example 13-26 Local node failover within the site Texas
root@leeann: clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
ds8kgmrg       OFFLINE                      jordan@Texas
               ONLINE                       leeann@Texas
               ONLINE SECONDARY             robert@Romania
Example 13-27 shows that the statuses are the same as when we started.
Example 13-27 Global Mirror pair status after a local failover
*******************From node leeann at site Texas***************************
dscli> lssession 26 2E
Date/Time: October 10, 2010 4:10:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status         Volume VolumeStatus PrimaryStatus        SecondaryStatus   FirstPassComplete AllowCascading
==========================================================================================================================
26     03      CG In Progress 2600   Active       Primary Copy Pending Secondary Simplex True              Disable
2E     03      CG In Progress 2E00   Active       Primary Copy Pending Secondary Simplex True              Disable

dscli> lspprc 2600 2E00
Date/Time: October 10, 2010 4:10:43 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State        Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
===================================================================================================
2600:2C00 Copy Pending        Global Copy 26        60             Disabled      True
2E00:2800 Copy Pending        Global Copy 2E        60             Disabled      True

*******************From remote node robert at site Romania***************************
dscli> lssession 28 2c
Date/Time: October 10, 2010 4:04:58 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus   SecondaryStatus        FirstPassComplete AllowCascading
==================================================================================================================
28     03      Normal 2800   Join Pending Primary Simplex Secondary Copy Pending True              Disable
2C     03      Normal 2C00   Join Pending Primary Simplex Secondary Copy Pending True              Disable

dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:05:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==========================================================================================================
2600:2C00 Target Copy Pending        Global Copy 26        unknown        Disabled      Invalid
2E00:2800 Target Copy Pending        Global Copy 2E        unknown        Disabled      Invalid
After the cluster stabilizes, we run the reboot -q command on the leeann node, invoking a
site_down event. The robert node at the Romania site acquires the ds8kgmrg resource group
as shown in Example 13-28.
Example 13-28 Hard failover between sites
root@robert: clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
ds8kgmrg       OFFLINE                      jordan@Texas
               OFFLINE                      leeann@Texas
               ONLINE                       robert@Romania
You can also see that the replicated pairs are now in the suspended state at the remote site as
shown in Example 13-29.
Example 13-29 Global Mirror pair status after site failover
*******************From remote node robert at site Romania***************************
dscli> lssession 28 2c
Date/Time: October 10, 2010 4:17:28 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus     SecondaryStatus   FirstPassComplete AllowCascading
===============================================================================================================
28     03      Normal 2800   Join Pending Primary Suspended Secondary Simplex False             Disable
2C     03      Normal 2C00   Join Pending Primary Suspended Secondary Simplex False             Disable

dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:17:55 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State     Reason      Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
=====================================================================================================
2800:2E00 Suspended Host Source Global Copy 28        60             Disabled      False
2C00:2600 Suspended Host Source Global Copy 2C        60             Disabled      False
Important: Although the testing resulted in a site_down event, we never lost access to the
primary storage subsystem. PowerHA does not check storage connectivity back to the
primary site during this event. Before moving back to the primary site, re-establish the
replicated pairs and get them all back in sync. If you replace the storage, you might also
have to change the storage agent, storage subsystem, and mirror groups to ensure that
the new configuration is correct for the cluster.
13.7.3 Site re-integration
Before bringing the primary site node back into the cluster, the Global Mirror pairs must be
placed back in sync by using the following steps:

Tip: You do not have to follow these steps exactly as written, because you can accomplish
the same results by using various methods.

1. Verify that the Global Mirror statuses at the primary site are suspended.
2. Fail back the PPRC from the secondary site.
3. Verify that the Global Mirror status at the primary site shows the target status.
4. Verify that the out-of-sync tracks are 0.
5. Stop the cluster to ensure that the volume group I/O is stopped.
6. Fail over the PPRC on the primary site.
7. Fail back the PPRC on the primary site.
8. Start the cluster.
Failing back the PPRC pairs to the secondary site
To fail back the PPRC pairs to the secondary site, follow these steps:
1. Verify the current state of the Global Mirror pairs at the primary site from the jordan node.
The pairs are suspended as shown in Example 13-30.
Example 13-30 Suspended pair status in Global Mirror on the primary site after node restart
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:27:48 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State     Reason      Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
=====================================================================================================
2600:2C00 Suspended Host Source Global Copy 26        60             Disabled      True
2E00:2800 Suspended Host Source Global Copy 2E        60             Disabled      True
2. On the remote node robert, fail back the PPRC pairs as shown in Example 13-31.
Example 13-31 Failing back PPRC pairs at the remote site
*******************From node robert at site Romania***************************
dscli> failbackpprc -type gcp 2C00:2600 2800:2E00
Date/Time: October 10, 2010 4:22:09 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2C00:2600 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2800:2E00 successfully failed back.
3. After executing the failback, check the status of the pairs again from the primary site to
ensure that they are now shown as Target (Example 13-32).
Example 13-32 Verifying that the primary site LUNs are now target LUNs
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:44:21 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==========================================================================================================
2800:2E00 Target Copy Pending        Global Copy 28        unknown        Disabled      Invalid
2C00:2600 Target Copy Pending        Global Copy 2C        unknown        Disabled      Invalid
4. Monitor the status of replication at the remote site by watching the Out of Sync Tracks
field with the lspprc -l command. After the tracks reach 0, as shown in Example 13-33,
the pairs are in sync. Then you can stop the remote site in preparation to move production
back to the primary site.
Example 13-33 Verifying that the Global Mirror pairs are back in sync
dscli> lspprc -l 2800 2c00
Date/Time: October 10, 2010 4:22:46 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State        Reason Type        Out Of Sync Tracks Tgt Read Src Cascade Tgt Cascade Date Suspended SourceLSS
=======================================================================================================================
2800:2E00 Copy Pending        Global Copy 0                  Disabled Disabled    Invalid                    28
2C00:2600 Copy Pending        Global Copy 0                  Disabled Disabled    Invalid                    2C
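Rather than rerunning the command by hand, you can loop it from a shell. This is a hedged sketch only: it assumes that your DSCLI installation supports single-shot command mode with the -cfg option, and the profile name dsprofile_romania is hypothetical:

#!/usr/bin/ksh
# Reissue lspprc -l every 30 seconds so that the Out Of Sync Tracks column can be
# watched until it reaches 0 for both pairs at the recovery site.
PROFILE=/opt/ibm/dscli/profile/dsprofile_romania   # hypothetical DSCLI profile
while true
do
    dscli -cfg $PROFILE lspprc -l 2800 2C00
    sleep 30
done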
Failing over the PPRC pairs back to the primary site
To fail over the PPRC pairs back to the primary site, follow these steps:
1. Stop the cluster on node robert by using the smitty clstop command to bring the
resource group down.
2. After the resources are offline, continue to fail over the PPRC on the primary site jordan
node as shown in Example 13-34.
Example 13-34 Failover PPRC pairs at local primary site
*******************From node jordan at site Texas***************************
dscli> failoverpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:45:16 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2600:2C00 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 2E00:2800 successfully reversed.
3. Verify again that the pairs are in the Suspended state on the primary site and that the
remote site shows them in the Copy Pending state, as shown in Example 13-35.
Example 13-35 Global Mirror pairs suspended on the primary site
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2E00
Date/Time: October 10, 2010 4:45:51 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State     Reason      Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
=====================================================================================================
2600:2C00 Suspended Host Source Global Copy 26        60             Disabled      True
2E00:2800 Suspended Host Source Global Copy 2E        60             Disabled      True

******************From node robert at site Romania***************************
dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:39:27 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State        Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
===================================================================================================
2800:2E00 Copy Pending        Global Copy 28        60             Disabled      True
2C00:2600 Copy Pending        Global Copy 2C        60             Disabled      True
Failing back the PPRC pairs to the primary site
You can now complete the switchback to the primary site by performing a failback of the
Global Mirror pairs to the primary site. Run the failbackpprc command as shown in
Example 13-36.
Example 13-36 Failing back the PPRC pairs on the primary site
*******************From node jordan at site Texas***************************
dscli> failbackpprc -type gcp 2600:2c00 2E00:2800
Date/Time: October 10, 2010 4:46:49 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2600:2C00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 2E00:2800 successfully failed back.
Verify the status of the pairs at each site as shown in Example 13-37.
Example 13-37 Global Mirror pairs failed back to the primary site
*******************From node jordan at site Texas***************************
dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 4:47:04 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State        Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
===================================================================================================
2600:2C00 Copy Pending        Global Copy 26        60             Disabled      True
2E00:2800 Copy Pending        Global Copy 2E        60             Disabled      True

******************From node robert at site Romania***************************
dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:40:44 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==========================================================================================================
2600:2C00 Target Copy Pending        Global Copy 26        unknown        Disabled      Invalid
2E00:2800 Target Copy Pending        Global Copy 2E        unknown        Disabled      Invalid
Starting the cluster
To start the cluster, follow these steps:
1. Start all nodes in the cluster by using the smitty clstart command as shown in
Figure 13-16.
Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Start now, on system restart or both                  now
  Start Cluster Services on these nodes                 [jordan,leeann,robert]
* Manage Resource Groups                                Automatically
  BROADCAST message at startup?                         true
  Startup Cluster Information Daemon?                   true
  Ignore verification errors?                           false
  Automatically correct errors found during             Interactively
  cluster start?

Figure 13-16 Restarting a cluster after a site failure
Upon startup of the primary node jordan, the resource group is automatically started on
jordan, returning to the original starting point as shown in Example 13-38.
Example 13-38 Resource group status after restart
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
ds8kgmrg       ONLINE                       jordan@Texas
               OFFLINE                      leeann@Texas
               ONLINE SECONDARY             robert@Romania
2. Verify the pair and session status on each site as shown in Example 13-39.
Example 13-39 Global Mirror pairs back to normal
*******************From node jordan at site Texas***************************
dscli> lssession 26 2e
Date/Time: October 10, 2010 5:02:11 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
LSS ID Session Status         Volume VolumeStatus PrimaryStatus        SecondaryStatus   FirstPassComplete AllowCascading
==========================================================================================================================
26     03      CG In Progress 2600   Active       Primary Copy Pending Secondary Simplex True              Disable
2E     03      CG In Progress 2E00   Active       Primary Copy Pending Secondary Simplex True              Disable

dscli> lspprc 2600 2e00
Date/Time: October 10, 2010 5:02:26 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC890
ID        State        Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
===================================================================================================
2600:2C00 Copy Pending        Global Copy 26        60             Disabled      True
2E00:2800 Copy Pending        Global Copy 2E        60             Disabled      True

******************From node robert at site Romania***************************
dscli> lssession 28 2C
Date/Time: October 10, 2010 4:56:11 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
LSS ID Session Status Volume VolumeStatus PrimaryStatus   SecondaryStatus        FirstPassComplete AllowCascading
==================================================================================================================
28     03      Normal 2800   Active       Primary Simplex Secondary Copy Pending True              Disable
2C     03      Normal 2C00   Active       Primary Simplex Secondary Copy Pending True              Disable

dscli> lspprc 2800 2c00
Date/Time: October 10, 2010 4:56:30 PM CDT IBM DSCLI Version: 6.5.15.19 DS: IBM.2107-75DC980
ID        State               Reason Type        SourceLSS Timeout (secs) Critical Mode First Pass Status
==========================================================================================================
2600:2C00 Target Copy Pending        Global Copy 26        unknown        Disabled      Invalid
2E00:2800 Target Copy Pending        Global Copy 2E        unknown        Disabled      Invalid
13.8 LVM administration of DS8000 Global Mirror replicated
resources
This section provides common scenarios for adding storage to an existing
Global Mirror replicated environment. These scenarios work primarily with the Texas site and
the ds8kgmrg resource group. You perform the following tasks:
Adding a new Global Mirror pair to an existing volume group
Adding a Global Mirror pair into a new volume group
Dynamically expanding a volume: This topic does not provide information about
dynamically expanding a volume because this option is not supported.
13.8.1 Adding a new Global Mirror pair to an existing volume group
To add a new Global Mirror pair to an existing volume group, follow these steps:
1. Assign a new LUN to each site, add the FlashCopy devices, and add the new pair into the
existing session as explained in 13.4.3, “Configuring the Global Mirror relationships” on
page 377. Table 13-2 summarizes the LUNs that are used from each site.
Table 13-2 Summary of the LUNs used on each site
Texas                              Romania
AIX DISK      LSS/VOL ID           AIX DISK      LSS/VOL ID
hdisk11       2605                 hdisk10       2C06
2. Define the new LUNs:
   a. Run the cfgmgr command on the primary node jordan.
   b. Assign the PVID on the node jordan:
      chdev -l hdisk11 -a pv=yes
   c. Configure the disk and PVID on the local node leeann by using the cfgmgr command.
   d. Verify that the PVID is displayed by running the lspv command.
   e. Pause the PPRC on the primary site.
   f. Fail over the PPRC to the secondary site.
   g. Configure the disk and PVID on the remote node robert with the cfgmgr command.
   h. Verify that the PVID is displayed by running the lspv command.
   i. Fail back the PPRC to the primary site.
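For the PPRC manipulation in steps e, f, and i, the DSCLI commands already shown in this chapter apply to the new pair. The following is a hedged sketch only: the pair ID 2605:2C06 is derived from Table 13-2, the pausepprc options should be confirmed for your DSCLI level, and the final failover/failback sequence follows the pattern shown in 13.7.3, "Site re-integration":

# Step e - on the Texas storage unit, pause the Global Copy pair for the new LUN:
dscli> pausepprc 2605:2C06

# Step f - on the Romania storage unit, fail the pair over so that hdisk10 becomes
# readable on node robert for cfgmgr and lspv:
dscli> failoverpprc -type gcp 2C06:2605

# Step i - after the PVID is visible on robert, return the pair to its original
# direction from the Texas storage unit, as in 13.7.3:
dscli> failoverpprc -type gcp 2605:2C06
dscli> failbackpprc -type gcp 2605:2C06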
3. Add the new disk into the volume group by using C-SPOC as follows:

   Important: C-SPOC cannot perform certain LVM operations on nodes at the remote site
   (the nodes that contain the target volumes). Such operations include those that require
   nodes at the target site to read from the target volumes, and they cause an error
   message in C-SPOC. This includes functions such as changing the file system size,
   changing the mount point, and adding LVM mirrors. However, nodes on the same site as
   the source volumes can successfully perform these tasks. The changes can be
   propagated later to the other site by using a lazy update.

   For C-SPOC to work for all other LVM operations, perform all C-SPOC operations with
   the Global Mirror volume pairs in synchronized or consistent states and with the cluster
   active on all nodes.
a. From the command line, type the smitty cl_admin command.
b. In SMIT, select the path System Management (C-SPOC)  Storage  Volume
Groups  Add a Volume to a Volume Group.
c. Select the txvg volume group from the pop-up menu.
d. Select the disk or disks by PVID as shown in Figure 13-17.
Set Characteristics of a Volume Group

Move cursor to desired item and press Enter.

  Add a Volume to a Volume Group
  Change/Show characteristics of a Volume Group
  Remove a Volume from a Volume Group
  Enable/Disable a Volume Group for Cross-Site LVM Mirroring Verification

  +--------------------------------------------------------------------------+
  |                          Physical Volume Names                           |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   000a624a987825c8 ( hdisk10 on node robert )                            |
  |   000a624a987825c8 ( hdisk11 on nodes jordan,leeann )                    |
  |                                                                          |
  | F1=Help                 F2=Refresh               F3=Cancel               |
  | F8=Image                F10=Exit                 Enter=Do                |
F1| /=Find                  n=Find Next                                      |
F9+--------------------------------------------------------------------------+

Figure 13-17 Disk selection to add to the volume group
e. Verify the menu information, as shown in Figure 13-18, and press Enter.
Add a Volume to a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  VOLUME GROUP name                                     txvg
  Resource Group Name                                   ds8kgmrg
  Node List                                             jordan,leeann,robert
  Reference node                                        robert
  VOLUME names                                          hdisk10

Figure 13-18 Add a Volume C-SPOC SMIT menu
Upon completion of the C-SPOC operation, the local nodes have been updated but the
remote node has not been updated as shown in Example 13-40. This node was not updated
because the target volumes are not readable until the relationship is swapped. You receive an
error message from C-SPOC, as shown in the note after Example 13-40. However, the lazy
update procedure at the time of failover pulls in the remaining volume group information.
Example 13-40 New disk added to volume group on all nodes
root@jordan: lspv |grep txvg
hdisk6          000a625afe2a4958          txvg
hdisk10         000a624a833e440f          txvg
hdisk11         000a624a987825c8          txvg

root@leeann: lspv |grep txvg
hdisk6          000a625afe2a4958          txvg
hdisk10         000a624a833e440f          txvg
hdisk11         000a624a987825c8          txvg

root@robert: lspv
hdisk2          000a624a833e440f          txvg
hdisk6          000a625afe2a4958          txvg
hdisk10         000a624a987825c8          none
Attention: When using C-SPOC to modify a volume group containing a Global Mirror
replicated resource, you can expect to see the following error message:
cl_extendvg: Error executing clupdatevg txvg 000a624a833e440f on node robert
You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, consider running a verification.
Adding a new logical volume
Again you use C-SPOC to add a new logical volume. As noted earlier, this process updates
the local nodes within the site. For the remote site, when a failover occurs, the lazy update
process updates the volume group information as needed. This process also adds a bit of
extra time to the failover time.
To add a new logical volume, follow these steps:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Logical
Volumes  Add a Logical Volume.
3. Select the txvg volume group from the pop-up menu.
4. Select the newly added disk hdisk11 as shown in Figure 13-19.
Logical Volumes

Move cursor to desired item and press Enter.

  List All Logical Volumes by Volume Group
  Add a Logical Volume
  Show Characteristics of a Logical Volume
  Set Characteristics of a Logical Volume

  +--------------------------------------------------------------------------+
  |                          Physical Volume Names                           |
  |                                                                          |
  | Move cursor to desired item and press F7.                                |
  |   ONE OR MORE items can be selected.                                     |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   Auto-select                                                            |
  |   jordan hdisk6                                                          |
  |   jordan hdisk10                                                         |
  |   jordan hdisk11                                                         |
  |                                                                          |
  | F1=Help                 F2=Refresh               F3=Cancel               |
  | F7=Select               F8=Image                 F10=Exit                |
F1| Enter=Do                /=Find                   n=Find Next             |
F9+--------------------------------------------------------------------------+

Figure 13-19 Choose disk for new logical volume creation
5. Complete the information in the final menu (Figure 13-20), and press Enter.
We added a new logical volume, named pattilv, which consists of 100 logical partitions
(LPs), and selected raw for the type. We left all other values with their defaults.
Add a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                                   ds8kgmrg
  VOLUME GROUP name                                     txvg
  Node List                                             jordan,leeann,robert
  Reference node                                        jordan
* Number of LOGICAL PARTITIONS                          [100]
  PHYSICAL VOLUME names                                 hdisk11
  Logical volume NAME                                   [pattilv]
  Logical volume TYPE                                   [raw]
  POSITION on physical volume                           outer_middle
  RANGE of physical volumes                             minimum
  MAXIMUM NUMBER of PHYSICAL VOLUMES                    []
    to use for allocation
  Number of COPIES of each logical partition            1
[MORE...15]

Figure 13-20 New logical volume C-SPOC SMIT menu
6. Upon completion of the C-SPOC operation, verify that the new logical volume is created
locally on node jordan as shown in Example 13-41.
Example 13-41 Newly created logical volume
root@jordan: lsvg -l txvg
txvg:
LV NAME      TYPE      LPs   PPs   PVs   LV STATE       MOUNT POINT
txlv         jfs2      250   150   3     open/syncd     /txro
txloglv      jfs2log   1     1     1     open/syncd     N/A
pattilv      raw       100   100   1     closed/syncd   N/A
Similar to when you created the volume group, you see an error message (Figure 13-21) about
being unable to update the remote node.
                                 COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

jordan: pattilv
cl_mklv: Error executing clupdatevg txvg 000a625afe2a4958 on node robert

Figure 13-21 C-SPOC normal error upon logical volume creation
Increasing the size of an existing file system
Again you use C-SPOC to perform this operation. As noted previously, this process updates
the local nodes within the site. For the remote site, when a failover occurs, the lazy update
process updates the volume group information as needed. This process also adds a bit of
extra time to the failover time.
To increase the size of an existing file system, follow these steps:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  File Systems 
Change / Show Characteristics of a File System.
3. Select the txro file system from the pop-up menu.
4. Complete the information in the final menu, and press Enter. In the example in
Figure 13-22, notice that we change the size from 1024 MB to 1250 MB.
Change/Show Characteristics of a Enhanced Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Volume group name                                     txvg
  Resource Group Name                                   ds8kgmrg
* Node Names                                            robert,leeann,jordan

* File system name                                      /txro
  NEW mount point                                       [/txro]
  SIZE of file system
          Unit Size                                     Megabytes
          Number of Units                               [1250]
  Mount GROUP                                           []
  Mount AUTOMATICALLY at system restart?                no
  PERMISSIONS                                           read/write
  Mount OPTIONS                                         []
[MORE...7]

Figure 13-22 Changing the file system size on the final C-SPOC menu
5. Upon completion of the C-SPOC operation, verify that the file system size locally on
node jordan has increased from 250 LPs, as shown in Example 13-41 on page 409, to
313 LPs, as shown in Example 13-42.
Example 13-42 Newly increased file system size
root@jordan: lsvg -l txvg
txvg:
LV NAME      TYPE      LPs   PPs   PVs   LV STATE       MOUNT POINT
txlv         jfs2      313   313   3     open/syncd     /txro
txloglv      jfs2log   1     1     1     open/syncd     N/A
pattilv      raw       100   100   1     closed/syncd   N/A
A cluster synchronization is not required, because technically the resources have not
changed. All of the changes were made to an existing volume group that is already a resource
in the resource group.
Testing the fallover after making the LVM changes
To confirm that the cluster still works when you need it, repeat the steps from 13.7.2,
"Rolling site failure" on page 398. The new logical volume pattilv and the additional space
on /txro show up on each node. However, the site failover takes noticeably longer because a
lazy update is performed to pick up the volume group changes.
13.8.2 Adding a Global Mirror pair into a new volume group
The steps to add a new volume begin the same as the steps in 13.5, “Configuring AIX volume
groups” on page 381. However, for completeness, this section provides an overview of the
steps again and then provides details about the new LUNs to be used.
In this scenario, we re-use the LUNs from the previous section. We removed them from the
volume group and removed the disks for all nodes except the main primary node jordan. In
our process, we cleared the PVID and then assigned a new PVID for a clean start.
Table 13-3 provides a summary of the LUNs that we implemented in each site.
Table 13-3 Summary of the LUNs implemented in each site
Texas                              Romania
AIX DISK      LSS/VOL ID           AIX DISK      LSS/VOL ID
hdisk11       2605                 hdisk10       2C06
Now continue with the following steps, which are the same as those steps for defining new
LUNs:
1. Run the cfgmgr command on the primary node jordan.
2. Assign the PVID on the node jordan:
chdev -l hdisk11 -a pv=yes
3. Configure the disk and PVID on the local node leeann by using the cfgmgr command.
4. Verify that PVID shows up by using the lspv command.
5. Pause the PPRC on the primary site.
6. Fail over the PPRC to the secondary site.
7. Fail back the PPRC to the secondary site.
8. Configure the disk and PVID on the remote node robert by using the cfgmgr command.
9. Verify that PVID shows up by using the lspv command.
10.Pause the PPRC on the secondary site.
11.Fail over the PPRC to the primary site.
12.Fail back the PPRC to the primary site.
The main difference between adding a new volume group and extending an existing one is
that, when adding a new volume group, you must swap the pairs twice. When extending an
existing volume group, you can get away with swapping only once.

The process is similar to the original setup, where we created all of the LVM components on
the primary site, swapped the PPRC pairs to the remote site to import the volume group, and
then swapped them back.

You can avoid performing two swaps, as we showed, by not including the third node when
creating the volume group. Then you can swap the pairs, run cfgmgr on the new disk with the
PVID, import the volume group, and swap the pairs back.
Creating a volume group
Create a volume group by using C-SPOC:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Volume
Groups  Create a Volume to a Volume Group.
3. Select the specific nodes. In this case, we chose all three nodes as shown in Figure 13-23.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups
  Create a Volume Group
  Create a Volume Group with Data Path Devices

  +--------------------------------------------------------------------------+
  |                                Node Names                                |
  |                                                                          |
  | Move cursor to desired item and press F7.                                |
  |   ONE OR MORE items can be selected.                                     |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   > jordan                                                               |
  |   > leeann                                                               |
  |   > robert                                                               |
  |                                                                          |
  | F1=Help                 F2=Refresh               F3=Cancel               |
  | F7=Select               F8=Image                 F10=Exit                |
F1| Enter=Do                /=Find                   n=Find Next             |
F9+--------------------------------------------------------------------------+

Figure 13-23 Adding a volume group node pick list
4. Select the disk or disks by PVID as shown in Figure 13-24.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups
  Create a Volume Group
  Create a Volume Group with Data Path Devices
  Set Characteristics of a Volume Group
  Enable a Volume Group for Fast Disk Takeover or Concurrent Access

  +--------------------------------------------------------------------------+
  |                          Physical Volume Names                           |
  |                                                                          |
  | Move cursor to desired item and press F7.                                |
  |   ONE OR MORE items can be selected.                                     |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   000a624a9bb74ac3 ( hdisk10 on node robert )                            |
  |   000a624a9bb74ac3 ( hdisk11 on nodes jordan,leeann )                    |
  |                                                                          |
  | F1=Help                 F2=Refresh               F3=Cancel               |
  | F7=Select               F8=Image                 F10=Exit                |
F1| Enter=Do                /=Find                   n=Find Next             |
F9+--------------------------------------------------------------------------+

Figure 13-24 Selecting the disk or disks for the new volume group pick list
5. Select the volume group type. In this scenario, we select scalable as shown in
Figure 13-25.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups
  Create a Volume Group
  Create a Volume Group with Data Path Devices
  Set Characteristics of a Volume Group

  +--------------------------------------------------------------------------+
  |                             Volume Group Type                            |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   Legacy                                                                 |
  |   Original                                                               |
  |   Big                                                                    |
  |   Scalable                                                               |
  |                                                                          |
  | F1=Help                 F2=Refresh               F3=Cancel               |
  | F8=Image                F10=Exit                 Enter=Do                |
F1| /=Find                  n=Find Next                                      |
F9+--------------------------------------------------------------------------+

Figure 13-25 Choosing the volume group type for the new volume group pick list
6. Select the proper resource group. We select ds8kgmrg as shown in Figure 13-26.
Create a Scalable Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Node Names                                            jordan,leeann,robert
  Resource Group Name                                   [ds8kgmrg]
  PVID                                                  000a624a9bb74ac3
  VOLUME GROUP name                                     [princessvg]
  Physical partition SIZE in megabytes                  4
  Volume group MAJOR NUMBER                             [51]
  Enable Cross-Site LVM Mirroring Verification          false
  Enable Fast Disk Takeover or Concurrent Access        Fast Disk Takeover or>
  Volume Group Type                                     Scalable
  Maximum Physical Partitions in units of 1024          32
  Maximum Number of Logical Volumes                     256

Figure 13-26 Create a Scalable Volume Group (final) menu

7. Select a volume group name. We select princessvg. Then press Enter.
Instead of using C-SPOC, you can perform the steps manually and then import the volume
groups on each node as needed. However, remember to add the volume group into the
resource group after creating it. With C-SPOC, you can automatically add it to the resource
group while you are creating the volume group.
You can also use the C-SPOC CLI commands (Example 13-43). These commands are in the
/usr/es/sbin/cluster/cspoc directory, and all begin with the cli_ prefix. Similar to the SMIT
menus, their operation output is also saved in the cspoc.log file.
Example 13-43 C-SPOC CLI commands
root@jordan: ls cli_*
cli_assign_pvids   cli_extendlv    cli_mkvg          cli_rmlv
cli_chfs           cli_extendvg    cli_on_cluster    cli_rmlvcopy
cli_chlv           cli_importvg    cli_on_node       cli_syncvg
cli_chvg           cli_mirrorvg    cli_reducevg      cli_unmirrorvg
cli_crfs           cli_mklv        cli_replacepv     cli_updatevg
cli_crlvfs         cli_mklvcopy    cli_rmfs
Upon completion of the C-SPOC operation, the local nodes are updated, but the remote node
is not, as shown in Example 13-44. The remote node is not updated because the target
volumes are not readable until the relationship is swapped. You see an error message from
C-SPOC as shown in the note following Example 13-44. After you create all LVM structures,
you swap the pairs back to the remote node and import the new volume group and logical
volume.
Example 13-44 New disk added to volume group on all nodes
root@jordan: lspv |grep princessvg
hdisk11         000a624a9bb74ac3          princessvg

root@leeann: lspv |grep princessvg
hdisk11         000a624a9bb74ac3          princessvg

root@robert: lspv |grep princessvg
Attention: When using C-SPOC to add a new volume group that contains a Global Mirror
replicated resource, you might see the following error message:

cl_importvg: Error executing climportvg -V 51 -c -y princessvg -Q 000a624a9bb74ac3 on node robert

This message is normal if you select any remote nodes. You can omit the remote nodes, in
which case you do not see the error message. Omitting them is acceptable because you
manually import the volume group on the remote node anyway.
When creating the volume group, it usually is automatically added to the resource group as
shown in Example 13-45 on page 416. However, with the error message indicated in the
previous attention box, it might not be automatically added. Therefore, double-check that the
volume group is added to the resource group before continuing. Otherwise, you do not have
to change the resource group any further. The new LUN pairs are added to the same storage
subsystems and the same session (3) that is already defined in the mirror group texasmg.
Example 13-45 New volume group added to existing resource group
Resource Group Name                          ds8kgmrg
Inter-site Management Policy                 Prefer Primary Site
Participating Nodes from Primary Site        jordan leeann
Participating Nodes from Secondary Site      robert
Startup Policy                               Online On Home Node Only
Fallover Policy                              Fallover To Next Priority Node
Fallback Policy                              Never Fallback
Service IP Label                             serviceip_2
Volume Groups                                txvg princessvg
GENXD Replicated Resources                   texasmg
Adding a new logical volume on the new volume group
You repeat the steps in “Adding a new logical volume” on page 407 to create a new logical
volume, named princesslv, on the newly created volume group, princessvg, as shown in
Example 13-46.
Example 13-46 New logical volume on the newly added volume group
root@jordan: lsvg -l princessvg
princessvg:
LV NAME      TYPE   LPs   PPs   PVs   LV STATE       MOUNT POINT
princesslv   raw    38    38    1     closed/syncd   N/A
Importing the new volume group to the remote site
To import the volume group, follow the steps in 13.5.2, “Importing the volume groups in the
remote site” on page 383. As a review, we perform the following steps:
1. Vary off the volume group on the local site.
2. Pause the PPRC pairs on the local site.
3. Fail over the PPRC pairs on the remote site.
4. Fail back the PPRC pairs on the remote site.
5. Import the volume group.
6. Vary off the volume group on the remote site.
7. Pause the PPRC pairs on the remote site.
8. Fail over the PPRC pairs on the local site.
9. Fail back the PPRC pairs on the local site.
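Step 5 is a standard AIX import on the remote node. This is a minimal sketch, assuming that the swapped pair presents the new LUN as hdisk10 on node robert (Table 13-3) and reusing the major number 51 chosen in Figure 13-26:

# On node robert, after the PPRC pairs have been failed over to the remote site:
cfgmgr                                  # pick up the now-readable LUN, if needed
lspv | grep 000a624a9bb74ac3            # confirm that the PVID is visible on hdisk10
importvg -V 51 -y princessvg hdisk10    # import with the same name and major number
varyoffvg princessvg                    # leave the volume group offline (step 6)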
Synchronizing and verifying the cluster configuration
You now synchronize the resource group change to include the new volume group that was
added. However, first run a verification only to check for errors. If you find errors, you must fix
them manually because they are not automatically fixed in a running environment.
Then synchronize and verify it:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Verification and
Synchronization.
3. Select the options as shown in Figure 13-27.
HACMP Verification and Synchronization (Active Cluster Nodes Exist)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Verify changes only?                                  [No]
* Logging                                               [Standard]

F1=Help       F2=Refresh      F3=Cancel      F4=List
F5=Reset      F6=Command      F7=Edit        F8=Image

Figure 13-27 Extended Verification and Synchronization SMIT menu
4. Verify that the information is correct, and press Enter.
Upon completion, the cluster configuration is synchronized and can now be tested.
Testing the failover after adding a new volume group
To confirm that the cluster still works when needed, repeat the steps from 13.7.2,
"Rolling site failure" on page 398. The new volume group princessvg and the logical volume
princesslv show up on each node.
Chapter 14. Disaster recovery using Hitachi TrueCopy and Universal Replicator
This chapter explains how to configure disaster recovery based on IBM PowerHA
SystemMirror for AIX Enterprise Edition using Hitachi TrueCopy/Hitachi Universal Replicator
(HUR) replication services. This support was added in version 6.1 with service pack 3 (SP3).
This chapter includes the following topics:
Planning for TrueCopy/HUR management
Overview of TrueCopy/HUR management
Scenario description
Configuring the TrueCopy/HUR resources
Failover testing
LVM administration of TrueCopy/HUR replicated pairs
14.1 Planning for TrueCopy/HUR management
Proper planning is crucial to the success of any implementation. Plan the storage deployment
and replication necessary for your environment. This process is related to the applications
and middleware that are being deployed in the environment, which can eventually be
managed by PowerHA SystemMirror Enterprise Edition. This topic lightly covers site, network,
storage area network (SAN), and storage planning, which are all key factors. However, the
primary focus of this topic is the software prerequisites and support considerations.
14.1.1 Software prerequisites
The following software is required:
One of the following AIX levels or later:
– AIX 5.3 TL9 and RSCT 2.4.12.0
– AIX 6.1 TL2 SP3 and RSCT 2.5.4.0
Multipathing software
– AIX MPIO
– Hitachi Dynamic Link Manager (HDLM)
PowerHA 6.1 Enterprise Edition with SP3
The following additional file sets are included in SP3, must be installed separately, and
require the acceptance of the license during the installation:
– cluster.es.tc
  6.1.0.0  ES HACMP - Hitachi support - Runtime Commands
  6.1.0.0  ES HACMP - Hitachi support Commands
– cluster.msg.en_US.tc (optional)
  6.1.0.0  HACMP Hitachi support Messages - U.S. English
  6.1.0.0  HACMP Hitachi Messages - U.S. English IBM-850
  6.1.0.0  HACMP Hitachi Messages - Japanese
  6.1.0.0  HACMP Hitachi Messages - Japanese IBM-eucJP
Hitachi Command Control Interface (CCI) Version 01-23-03/06 or later
USPV Microcode Level 60-06-05/00 or later
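Before you start configuring, you can confirm on every node that the required file sets are installed at the expected levels. A minimal sketch using a standard AIX command; only the file set names listed above are assumed:

# Check the Hitachi support file sets that are delivered with PowerHA 6.1 SP3:
lslpp -l | grep -E "cluster.es.tc|cluster.msg.en_US.tc"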
14.1.2 Minimum connectivity requirements for TrueCopy/HUR
For TrueCopy/HUR connectivity, you must have the following minimum requirements in place:
Ensure connectivity from the local Universal Storage Platform VM (USP VM) to the AIX
host ports.
The external storage ports on the local USP VMs (Data Center 1 and Data Center 2) are
zoned and cabled to their corresponding existing storage systems.
Present both the primary and secondary source devices to the local USP VMs.
Primary and secondary source volumes in the migration group are presented from the
existing storage systems to the corresponding local USP VMs. This step is transparent to
the servers in the migration set. No devices are imported or accessed by the local USP
VMs at this stage.
Establish replication connectivity between the target storage systems.
TrueCopy initiator and MCU target ports are configured on the pair of target USP VMs,
and an MCU/RCU pairing is established to validate the configuration.
Ensure replication connectivity from the local USP VMs to the remote USP VM
TrueCopy/HUR initiator. Also ensure that MCU target ports are configured on the local and
remote USP VMs. In addition, confirm that MCU and RCU pairing is established to
validate the configuration.
For HUR, configure Universal Replicator Journal Groups on local and remote USP VM
storage systems.
Configure the target devices.
Logical devices on the target USP VM devices are formatted and presented to front-end
ports or host storage domains. This way, device sizes, logical unit numbers, host modes,
and presentation worldwide names (WWNs) are identical on the source and target storage
systems. Devices are presented to host storage domains that correspond to both
production and disaster recovery standby servers.
Configure the target zoning.
Zones are defined between servers in the migration group and the target storage system
front-end ports, but new zones are not activated at this point.
Ideally the connectivity is through redundant links, switches, and fabrics to the hosts and
between the storage units themselves.
14.1.3 Considerations
Keep in mind the following considerations for mirroring PowerHA SystemMirror Enterprise
Edition with TrueCopy/HUR:
AIX Virtual SCSI is not supported in this initial release.
Logical Unit Size Expansion (LUSE) for Hitachi is not supported.
Only fence-level NEVER is supported for synchronous mirroring.
Only HUR is supported for asynchronous mirroring.
The dev_name must map to a logical device, and the dev_group must be defined in the
HORCM_LDEV section of the horcm.conf file.
The PowerHA SystemMirror Enterprise Edition TrueCopy/HUR solution uses dev_group
for any basic operation, such as the pairresync, pairevtwait, or horctakeover operation.
If several dev_names are in a dev_group, the dev_group must be enabled for consistency.
PowerHA SystemMirror Enterprise Edition does not trap Simple Network Management
Protocol (SNMP) notification events for TrueCopy/HUR storage. If a TrueCopy link goes
down while the cluster is up and the link is later repaired, you must manually
resynchronize the pairs (see the example at the end of this section).
The creation of pairs is done outside the cluster control. You must create the pairs before
you start the cluster services.
Resource groups that are managed by PowerHA SystemMirror Enterprise Edition cannot
contain volume groups with both TrueCopy/HUR-protected and
non-TrueCopy/HUR-protected disks.
All nodes in the PowerHA SystemMirror Enterprise Edition cluster must use the same HORCM
instance.
You cannot use Cluster Single Point Of Control (C-SPOC) for the following Logical Volume
Manager (LVM) operations to configure nodes at the remote site that contain the target
volume:
– Creating a volume group
– Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors cause an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.
C-SPOC and other LVM operations: For C-SPOC to work with all other LVM operations,
perform the C-SPOC operations when the cluster is active on all PowerHA SystemMirror
Enterprise Edition nodes and the underlying TrueCopy/HUR pairs are in a PAIR state.
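The following is a minimal sketch of such a manual resynchronization from the CCI command line, assuming the device group name (htcdg01) and HORCM instance number (2) that are used later in this chapter; substitute the names from your own configuration:
# pairresync -g htcdg01 -IH2
# pairdisplay -g htcdg01 -IH2 -fe       (confirm that the status returns to PAIR)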
14.2 Overview of TrueCopy/HUR management
Hitachi TrueCopy/HUR storage management uses Command Control Interface (CCI)
operations from the AIX operating system and PowerHA SystemMirror Enterprise Edition
environment. PowerHA SystemMirror Enterprise Edition uses these interfaces to discover and
integrate the Hitachi Storage replicated storage into the framework of PowerHA SystemMirror
Enterprise Edition. With this integration, you can manage high availability disaster recovery
(HADR) for applications by using the mirrored storage.
Integration of TrueCopy/HUR and PowerHA SystemMirror Enterprise Edition provides the
following benefits:
Support for the Inter-site Management policy of Prefer Primary Site or Online on Either
Site
Flexible user-customizable resource group policies
Support for cluster verification and synchronization
Limited support for the C-SPOC in PowerHA SystemMirror Enterprise Edition
Automatic failover and re-integration of server nodes attached to pairs of TrueCopy/HUR
disk subsystems within sites and across sites
Automatic management for TrueCopy/HUR links
Management for switching the direction of the TrueCopy/HUR relationships when a site
failure occurs. With this process, the backup site can take control of the managed
resource groups in PowerHA SystemMirror Enterprise Edition from the primary site.
14.2.1 Installing the Hitachi CCI software
Use the following steps as a guideline to help you install the Hitachi CCI on the AIX cluster
nodes. You can also find this information in the /usr/sbin/cluster/release_notes_xd file.
However, the release notes only exist if you already have the PowerHA SystemMirror
Enterprise Edition software installed. Always consult the latest version of the Hitachi
Command Control Interface (CCI) User and Reference Guide, MK-90RD011, which you can
download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474
If you are installing CCI from a CD, use the RMinstsh and RMuninst scripts on the CD to
automatically install and uninstall the CCI software.
Important: You must install the Hitachi CCI software into the /HORCM/usr/bin directory.
Otherwise, you must create a symbolic link to this directory.
For other media, use the instructions in the following sections.
Installing the Hitachi CCI software into a root directory
To install the Hitachi CCI software into the root directory, follow these steps:
1. Insert the installation medium into the proper I/O device.
2. Move to the current root directory:
# cd /
3. Copy all files from the installation medium by using the cpio command:
# cpio -idmu < /dev/XXXX XXXX = I/O device
Preserve the directory structure (d flag) and file modification times (m flag), and copy
unconditionally (u flag). For diskettes, load them sequentially, and repeat the command.
When specifying a diskette drive as the I/O device, use the raw device file that designates
the entire surface (the unpartitioned raw device file).
4. Execute the Hitachi Open Remote Copy Manager (HORCM) installation command:
# /HORCM/horcminstall.sh
5. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC
Installing the Hitachi CCI software into a non-root directory
To install the Hitachi CCI software into a non-root directory, follow these steps:
1. Insert the installation medium, such as a CD, into the proper I/O device.
2. Move to the desired directory for CCI. The specified directory must be on a file system
that resides on a disk other than the root disk, or on an external disk.
# cd /Specified Directory
3. Copy all files from the installation medium by using the cpio command:
# cpio -idmu < /dev/XXXX XXXX = I/O device
Preserve the directory structure (d flag) and file modification times (m flag), and copy
unconditionally (u flag). For diskettes, load them sequentially, and repeat the command.
When specifying a diskette drive as the I/O device, use the raw device file that designates
the entire surface (the unpartitioned raw device file).
4. Make a symbolic link to the /HORCM directory:
# ln -s /Specified Directory/HORCM /HORCM
5. Run the HORCM installation command:
# /HORCM/horcminstall.sh
6. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC
Installing a newer version of the Hitachi CCI software
To install a newer version of the CCI software:
1. Confirm that HORCM is not running. If it is running, shut it down:
One CCI instance: # horcmshutdown.sh
Two CCI instances: # horcmshutdown.sh 0 1
If Hitachi TrueCopy commands are running in the interactive mode, terminate the
interactive mode and exit these commands by using the -q option.
2. Insert the installation medium, such as a CD, into the proper I/O device.
3. Move to the directory that contains the HORCM directory as in the following example for
the root directory:
# cd /
4. Copy all files from the installation medium by using the cpio command:
# cpio -idmu < /dev/XXXX XXXX = I/O device
Preserve the directory structure (d flag) and file modification times (m flag) and copy
unconditionally (u flag). For diskettes, load them sequentially, and repeat the command.
When specifying a diskette drive as the I/O device, use the raw device file that designates
the entire surface (the unpartitioned raw device file).
5. Execute the HORCM installation command:
# /HORCM/horcminstall.sh
6. Verify installation of the proper version by using the raidqry command:
# raidqry -h
Model: RAID-Manager/AIX
Ver&Rev: 01-23-03/06
Usage: raidqry [options] for HORC
14.2.2 Overview of the CCI instance
The CCI components on the storage system include the command device or devices and the
Hitachi TrueCopy volumes, ShadowImage volumes, or both. Each CCI instance on a
UNIX/PC server includes the following components:
HORCM:
– Log and trace files
– A command server
– Error monitoring and event reporting files
– A configuration management feature
Configuration definition file that is defined by the user
The Hitachi TrueCopy user execution environment, ShadowImage user execution
environment, or both, which contain the TrueCopy/ShadowImage commands, a command
log, and a monitoring function.
14.2.3 Creating and editing the horcm.conf files
The configuration definition file is a text file that is created and edited by using any standard
text editor, such as the vi editor. A sample configuration definition file, HORCM_CONF
(/HORCM/etc/horcm.conf), is included with the CCI software. Use this file as the basis for
creating your configuration definition files. The system administrator must copy the sample
file, set the necessary parameters in the copied file, and place the copied file in the proper
directory. For detailed descriptions of the configuration definition files for sample CCI
configurations, see the Hitachi Command Control Interface (CCI) User and Reference Guide,
MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474
Important: Do not edit the configuration definition file while HORCM is running. Shut down
HORCM, edit the configuration file as needed, and then restart HORCM.
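For example, a minimal sketch of this edit cycle for a single instance (instance number 2 is illustrative):
# horcmshutdown.sh 2
# vi /etc/horcm2.conf
# horcmstart.sh 2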
You might have multiple CCI instances, each of which uses its own specific horcm#.conf file.
For example, instance 0 might be horcm0.conf, instance 1 (Example 14-1) might be
horcm1.conf, and so on. The test scenario presented later in this chapter uses instance 2 and
provides examples of the horcm2.conf file on each cluster node.
Example 14-1 The horcm.conf file
Example configuration files:

horcm1.conf file on local node
------------------------------
HORCM_MON
#ip_address => Address of the local node
#ip_address      service    poll(10ms)  timeout(10ms)
10.15.11.194     horcm1     12000       3000

HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19

HORCM_DEV
#Map dev_grp to LDEV#
#dev_group  dev_name  port#  TargetID  LU#  MU#
VG01        test01    CL1-B  1         5    0
VG01        work01    CL1-B  1         24   0
VG01        work02    CL1-B  1         25   0

HORCM_INST
#dev_group  ip_address     service
VG01        10.15.11.195   horcm1

horcm1.conf file on remote node
-------------------------------
HORCM_MON
#ip_address => Address of the local node
#ip_address      service    poll(10ms)  timeout(10ms)
10.15.11.195     horcm1     12000       3000

HORCM_CMD
#dev_name => hdisk of Command Device
#UnitID 0 (Serial# eg. 45306)
/dev/hdisk19

HORCM_DEV
#Map dev_grp to LDEV#
#dev_group  dev_name  port#  TargetID  LU#  MU#
VG01        test01    CL1-B  1         5    0
VG01        work01    CL1-B  1         21   0
VG01        work02    CL1-B  1         22   0

HORCM_INST
#dev_group  ip_address     service
VG01        10.15.11.194   horcm1
NOTE 1: So that the HORCM instance can use any available command device if one of
them fails, it is RECOMMENDED that the command devices be listed in the HORCM_CMD
section of your horcm file in the following format, where 10133 is the serial number
of the array:
\\.\CMD-10133:/dev/hdisk/
For example (note the space between the two device files):
\\.\CMD-10133:/dev/rhdisk19 /dev/rhdisk20

NOTE 2: If the ShadowImage license has not been activated on the storage system and the
MU# column is not empty, the Device_File column shows "-----" in the output of the
pairdisplay -fd command, which also causes verification to fail. It is therefore
recommended that the MU# column be left blank if the ShadowImage license is NOT
activated on the storage system.
Starting the HORCM instances
To start one instance of the CCI, follow these steps:
1. Modify the /etc/services file to register the port name/number (service) of the
configuration definition file. Make the port name/number the same on all servers.
horcm xxxxx/udp xxxxx = the port name/number of horcm.conf
2. Optional: If you want HORCM to start automatically each time the system starts, add
/etc/horcmstart.sh to the system automatic startup file (such as the /sbin/rc file).
3. Run the horcmstart.sh script manually to start the CCI instance:
# horcmstart.sh
4. Set the log directory (HORCC_LOG) in the command execution environment as needed.
5. Optional: Set the HORCC_MRCF environment variable only if you want to perform
ShadowImage operations. Do not set it if you want to perform Hitachi TrueCopy operations.
– For the B shell:
# HORCC_MRCF=1
# export HORCC_MRCF
– For the C shell:
# setenv HORCC_MRCF 1
# pairdisplay -g xxxx xxxx = group name
To start two instances of the CCI, follow these steps:
1. Modify the /etc/services file to register the port name/number (service) of each
configuration definition file. The port name/number must be different for each CCI
instance.
horcm0 xxxxx/udp xxxxx = the port name/number for horcm0.conf
horcm1 yyyyy/udp yyyyy = the port name/number for horcm1.conf
2. If you want HORCM to start automatically each time the system starts, add
/etc/horcmstart.sh 0 1 to the system automatic startup file (such as the /sbin/rc file).
3. Run the horcmstart.sh script manually to start the CCI instances:
# horcmstart.sh 0 1
4. Set an instance number to the environment that executes a command:
For the B shell:
# HORCMINST=X X = instance number = 0 or 1
# export HORCMINST
For the C shell:
# setenv HORCMINST X
5. Set the log directory (HORCC_LOG) in the command execution environment as needed.
6. Set the HORCC_MRCF environment variable only if you want to perform ShadowImage
operations. Do not set it if you want to perform Hitachi TrueCopy operations.
For B shell:
# HORCC_MRCF=1
# export HORCC_MRCF
For C shell:
# setenv HORCC_MRCF 1
# pairdisplay -g xxxx xxxx = group name
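If you choose automatic startup, the following is a minimal sketch of one way to do it with an /etc/inittab entry instead of the /sbin/rc file; the entry name hrcm and the instance numbers 0 and 1 are illustrative:
# mkitab "hrcm:2:once:/etc/horcmstart.sh 0 1 >/dev/null 2>&1"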
14.3 Scenario description
This scenario uses four nodes, two in each of the two sites: Austin and Miami. Nodes
jessica and bina are in the Austin site, and nodes krod and maddi are in the Miami site. Each
site provides local automatic failover, along with remote recovery for the other site, which is
often referred to as a mutual takeover configuration. Figure 14-1 on page 428 provides a
software and hardware overview of the tested configuration between the two sites.
Figure 14-1 Hitachi replication lab environment test configuration (courtesy of Hitachi Data Systems). The figure shows the Austin site (nodes jessica and bina) and the Miami site (nodes krod and maddi), each running AIX 6.1 TL6, PowerHA 6.1 SP3, and CCI 01-23-03/06, with FC links between the two Hitachi storage units: TrueCopy synchronous replication for the truesyncvg disks (hdisk38 and hdisk39) and Universal Replicator asynchronous replication for the ursasyncvg disks (hdisk40 and hdisk41).
Each site consists of two Ethernet networks. In this case, both networks are used for a
public Ethernet and for cross-site networks. Usually the cross-site network is on separate
segments and is an XD_ip network. It is also common to use site-specific service IP labels.
Example 14-2 shows the interface list from the cluster topology.
Example 14-2 Test topology information

root@jessica: cllsif
Adapter      Type     Network       Net Type  Attribute  Node     IP Address
jessica      boot     net_ether_02  ether     public     jessica  9.3.207.24
jessicaalt   boot     net_ether_03  ether     public     jessica  207.24.1.1
service_1    service  net_ether_03  ether     public     jessica  1.2.3.4
service_2    service  net_ether_03  ether     public     jessica  1.2.3.5
bina         boot     net_ether_02  ether     public     bina     9.3.207.77
bina alt     boot     net_ether_03  ether     public     bina     207.24.1.2
service_1    service  net_ether_03  ether     public     bina     1.2.3.4
service_2    service  net_ether_03  ether     public     bina     1.2.3.5
krod         boot     net_ether_02  ether     public     krod     9.3.207.79
krod alt     boot     net_ether_03  ether     public     krod     207.24.1.3
service_1    service  net_ether_03  ether     public     krod     1.2.3.4
service_2    service  net_ether_03  ether     public     krod     1.2.3.5
maddi        boot     net_ether_02  ether     public     maddi    9.3.207.78
maddi alt    boot     net_ether_03  ether     public     maddi    207.24.1.4
service_1    service  net_ether_03  ether     public     maddi    1.2.3.4
service_2    service  net_ether_03  ether     public     maddi    1.2.3.5
In this scenario, each node or site has four unique disks defined through each of the two
separate Hitachi storage units. The jessica and bina nodes at the Austin site have two disks,
hdisk38 and hdisk39. These disks are the primary source volumes that use TrueCopy
synchronous replication for the truesyncvg volume group. The other two disks, hdisk40 and
hdisk41, are to be used as the target secondary volumes that use HUR for asynchronous
replication from the Miami site for the ursasyncvg volume group.
The krod and maddi nodes at the Miami site have two disks, hdisk38 and hdisk39. These disks
are the secondary target volumes for the TrueCopy synchronous replication of the truesyncvg
volume group from the Austin site. The other two disks, hdisk40 and hdisk41, are to be used
as the primary source volumes for the ursasyncvg volume group that uses HUR for
asynchronous replication.
14.4 Configuring the TrueCopy/HUR resources
This topic explains how to perform the following tasks to configure the resources for
TrueCopy/HUR:
Assigning LUNs to the hosts (host groups)
Creating replicated pairs
Configuring an AIX disk and dev_group association
For each of these tasks, the Hitachi storage units have been added to the SAN fabric and
zoned appropriately. Also, the host groups have been created for the appropriate node
adapters, and the LUNs have been created within the storage unit.
14.4.1 Assigning LUNs to the hosts (host groups)
In this task, you assign LUNs by using the Hitachi Storage Navigator. Although an overview of
the steps is provided, always refer to the official Hitachi documentation for your version as
needed.
To begin, the Hitachi USP-V storage unit is at the Austin site. The host group, JessBina, is
assigned to port CL1-E on the Hitachi storage unit with the serial number 45306. Usually the
host group is assigned to multiple ports for full multipath redundancy.
To assign the LUNs to the hosts, follow these steps:
1. Locate the free LUNs and assign them to the proper host group.
a. Verify whether LUNs are currently assigned by checking the number of paths
associated with the LUN. If the fields are blank, the LUN is currently unassigned.
b. Assign the LUNs. To assign one LUN, click and drag it to a free LUN/LDEV location. To
assign multiple LUNs, hold down the Shift key and click each LUN. Then right-click the
selected LUNs and drag them to a free location.
This free location is indicated by a black and white disk image that also contains no
information in the corresponding attribute columns of LDEV/UUID/Emulation as shown
in Figure 14-2 on page 430.
Figure 14-2 Assigning LUNs to the Austin site nodes (courtesy of Hitachi Data Systems)
2. In the path verification window (Figure 14-3), check the information and record the LUN
number and LDEV numbers. You use this information later. However, you can also retrieve
this information from the AIX system after the devices are configured by the host. Click
OK.
Figure 14-3 Checking the paths for the Austin LUNs (courtesy of Hitachi Data Systems)
3. Back on the LUN Manager tab (Figure 14-4), click Apply for these paths to become active
and the assignment to be completed.
Figure 14-4 Applying LUN assignments for Austin (courtesy of Hitachi Data Systems)
You have completed assigning four more LUNs for the nodes at the Austin site. The lab
environment already had several LUNs, including both command and journaling LUNs,
assigned to the cluster nodes. These four LUNs were added solely for this test scenario.
Important: If these LUNs are the first ones to be allocated to the hosts, you must also
assign the command LUNs. See the appropriate Hitachi documentation as needed.
For the storage unit at the Miami site, repeat the steps that you performed for the Austin site.
The host group, KrodMaddi, is assigned to port CL1-B on the Hitachi USP-VM storage unit
with the serial number 35764. Usually the host group is assigned to multiple ports for full
multipath redundancy. Figure 14-5 on page 432 shows the result of these steps.
Again record both the LUN numbers and LDEV numbers so that you can easily refer to them
as needed when creating the replicated pairs. The numbers are also required when you add
the LUNs into device groups in the appropriate horcm.conf file.
Figure 14-5 Miami site LUNs assigned (courtesy of Hitachi Data Systems)
14.4.2 Creating replicated pairs
PowerHA SystemMirror Enterprise Edition does not create replication pairs by using the
Hitachi interfaces. You must use the Hitachi Storage interfaces to create the same replicated
pairs before using PowerHA SystemMirror Enterprise Edition to achieve an HADR solution.
For information about setting up TrueCopy/HUR pairs, see the Hitachi Command Control
Interface (CCI) User and Reference Guide, MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474
You must know exactly which LUNs from each storage unit will be paired together. They must
be the same size. In this case, all of the LUNs that are used are 2 GB in size. The pairing of
LUNs also uses the LDEV numbers. The LDEV numbers are hexadecimal values that also
show up as decimal values on the AIX host.
Table 14-1 lists the LDEV hex value of each LUN and its corresponding decimal value.

Table 14-1 LUN number to LDEV number comparison

Austin - 45306                              Miami - 35764
LUN number  LDEV-HEX   LDEV-DEC number      LUN number  LDEV-HEX   LDEV-DEC number
000A        00:01:10   272                  001C        00:01:0C   268
000B        00:01:11   273                  001D        00:01:0D   269
000C        00:01:12   274                  001E        00:01:0E   270
000D        00:01:13   275                  001F        00:01:0F   271
Although the pairing can be done by using the CCI, the example in this section shows how to
create the replicated pairs through the Hitachi Storage Navigator. The appropriate commands
are in the /HORCM/usr/bin directory. In this scenario, none of the devices have been
configured to the AIX cluster nodes.
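For reference, a hedged sketch of equivalent CCI commands follows, assuming the device groups (htcdg01 and hurdg01), HORCM instance 2, and journal IDs that appear later in this chapter; check the CCI guide for the exact options that apply to your environment:
# paircreate -g htcdg01 -vl -f never -IH2               (synchronous TrueCopy pair, run from the node that owns the primary volumes)
# paircreate -g hurdg01 -vl -f async -jp 3 -js 3 -IH2   (asynchronous HUR pair; the journal IDs are illustrative)
# pairdisplay -g htcdg01 -IH2 -fe                       (watch the initial copy until the status is PAIR)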
Creating TrueCopy synchronous pairings
Beginning with the Austin Hitachi unit, create two synchronous TrueCopy replicated pairings.
1. From within Storage Navigator (Figure 14-6), select Go  TrueCopy  Pair Operation.
Figure 14-6 Storage Navigator menu options to perform a pair operation (courtesy of Hitachi Data Systems)
2. In the TrueCopy Pair Operation window (Figure 14-7), select the appropriate port, CL1-E,
and find the specific LUNs to use (00-00A and 00-00B).
In this scenario, we have predetermined that we want to pair these LUNs with 00-01C and
00-01D from the Miami Hitachi storage unit on port CL1-B. Notice the occurrence of
SMPL in the Status column next to the LUNs. SMPL indicates simplex, meaning that no
mirroring is being used with that LUN.
3. Right-click the first Austin LUN (00-00A), and select Paircreate  Synchronize
(Figure 14-7).
Figure 14-7 Creating a TrueCopy synchronous pairing (courtesy of Hitachi Data Systems)
4. In the full synchronous Paircreate menu (Figure 14-8), select the proper port and LUN that
you previously created and recorded. Click Set.
Because we have only one additional remote storage unit, the RCU field already shows
the proper one for Miami.
5. Repeat step 4 for the second LUN pairing. Figure 14-8 shows details of the two pairings.
Figure 14-8 TrueCopy pairings (courtesy of Hitachi Data Systems)
6. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
Figure 14-9 shows both of the source LUNs in the middle of the pane. It also shows an
overview of which remote LUNs they are to be paired with.
Figure 14-9 Applying TrueCopy pairings (courtesy of Hitachi Data Systems)
This step automatically starts copying the LUNs from the local Austin primary source to the
remote Miami secondary source LUNs. You can also right-click a LUN and select Detailed
Information as shown in Figure 14-10.
Figure 14-10 Detailed LUN pairing and copy status information (courtesy of Hitachi Data Systems)
After the copy has completed, the status is displayed as PAIR as shown in Figure 14-11. You
can also view this status from the management interface of either one of the storage units.
Figure 14-11 TrueCopy pairing and copy completed (courtesy of Hitachi Data Systems)
Creating a Universal Replicator asynchronous pairing
Now switch over to the Miami Hitachi storage unit to create the asynchronous replicated
pairings.
1. From the Storage Navigator, select Go  Universal Replicator  Pair Operation
(Figure 14-12).
Figure 14-12 Menu selection to perform the pair operation (courtesy of Hitachi Data Systems)
2. In the Universal Replicator Pair Operation window (Figure 14-13), select the appropriate
port, CL1-B, and find the specific LUNs that you want to use (00-01E and 00-01F in this
example). We have already predetermined that we want to pair these LUNs with 00-00C
and 00-00D from the Austin Hitachi storage unit on port CL1-E.
Right-click one of the desired LUNs and select Paircreate.
Figure 14-13 Selecting Paircreate in the Universal Replicator (courtesy of Hitachi Data Systems)
3. In the Paircreate window, complete these steps:
a. Select the proper port and LUN that you previously created and recorded.
b. Because we only have one additional remote storage unit, the RCU field already shows
the proper one for Austin.
c. Unlike when using TrueCopy synchronous replication, when using Universal
Replicator, specify a master journal volume (M-JNL), a remote journal volume
(R-JNL), and a consistency (CT) group.
Important: If these are the first Universal Replicator LUNs to be allocated, you must
also assign journaling groups and LUNs for both storage units. Refer to the
appropriate Hitachi Universal Replicator documentation as needed.
We chose journal groups and a consistency group that were already created in the environment.
d. Click Set.
e. Repeat these steps for the second LUN pairing.
Figure 14-14 shows details of the two pairings.
Figure 14-14 Paircreate details in Universal Replicator (courtesy of Hitachi Data Systems)
4. After you complete the pairing selections, on the Pair Operation tab, verify that the
information is correct and click Apply to apply them all at one time.
When the pairing is established, the copy automatically begins to synchronize with the
remote LUNs at the Austin site. The status changes to COPY, as shown in Figure 14-15,
until the pairs are in sync. After the pairs are synchronized, their status changes to PAIR.
Figure 14-15 Asynchronous copy in progress in Universal Replicator (courtesy of Hitachi Data Systems)
5. Upon completion of the synchronization of the LUNs, configure the LUNs into the AIX
cluster nodes. Figure 14-16 shows an overview of the Hitachi replicated environment.
Figure 14-16 Replicated Hitachi LUN overview (courtesy of Hitachi Data Systems)
14.4.3 Configuring an AIX disk and dev_group association
Before you continue with the steps in this section, you must ensure that the Hitachi hdisks are
made available to your nodes. You can run the cfgmgr command to configure the new hdisks.
Also the CCI must already be installed on each cluster node. If you must install the CCI, see
14.2.1, “Installing the Hitachi CCI software” on page 422.
In the test environment, we already have hdisk0-37 on each of the four cluster nodes. After
running the cfgmgr command on each node, one node at a time, we now have four additional
disks, hdisk38-hdisk41, as shown in Example 14-3.
Example 14-3 New Hitachi disks

root@jessica:
hdisk38         none                                None
hdisk39         none                                None
hdisk40         none                                None
hdisk41         none                                None
Although the LUN and LDEV numbers were written down during the initial LUN assignments,
you must identify the correct LDEV numbers of the Hitachi disks and the corresponding AIX
hdisks by performing the following steps:
1. On the PowerHA SystemMirror Enterprise Edition nodes, select the Hitachi disks and the
disks that will be used in the TrueCopy/HUR relationships by running the inqraid
command. Example 14-4 shows hdisk38-hdisk41, which are the Hitachi disks that we just
added.
Example 14-4 Hitachi disks added

root@jessica:
# lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/inqraid
hdisk38 -> [SQ] CL1-E Ser =  45306 LDEV = 272 [HITACHI ] [OPEN-V ]
           HORC = P-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
           RAID5[Group 1- 2] SSID = 0x0005
hdisk39 -> [SQ] CL1-E Ser =  45306 LDEV = 273 [HITACHI ] [OPEN-V ]
           HORC = P-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
           RAID5[Group 1- 2] SSID = 0x0005
hdisk40 -> [SQ] CL1-E Ser =  45306 LDEV = 274 [HITACHI ] [OPEN-V ]
           HORC = S-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
           RAID5[Group 1- 2] SSID = 0x0005 CTGID = 10
hdisk41 -> [SQ] CL1-E Ser =  45306 LDEV = 275 [HITACHI ] [OPEN-V ]
           HORC = S-VOL HOMRCF[MU#0 = SMPL MU#1 = SMPL MU#2 = SMPL]
           RAID5[Group 1- 2] SSID = 0x0005 CTGID = 10
2. Edit the HORCM LDEV section in the horcm#.conf file to identify the dev_group that will be
managed by PowerHA SystemMirror Enterprise Edition. In this example, we use the
horcm2.conf file.
Hdisk38 (ldev 272) and hdisk39 (ldev 273) are the pair for the synchronous replicated
resource group, which is primary at the Austin site. Hdisk40 (ldev 274) and hdisk41
(ldev 275) are the pair for the asynchronous replicated resource, which is primary at the
Miami site.
Specify the device groups (dev_group) in the horcm#.conf file. We are using dev_group
htcdg01 with dev_names htcd01 and htcd02 for the synchronous replicated pairs. For the
asynchronous pairs, we are using dev_group hurdg01 and dev_names hurd01 and hurd02.
The device group names are needed later when checking the status of the replicated
pairs and when defining the replicated pairs as a resource for PowerHA Enterprise Edition
to control.
Important: Do not edit the configuration definition file while HORCM is running. Shut
down HORCM, edit the configuration file as needed, and then restart HORCM.
Example 14-5 shows the horcm2.conf file from the jessica node at the Austin site.
Because two nodes are at the Austin site, the same updates were performed to the
/etc/horcm2.conf file on the bina node. Notice that you can use either the decimal value
of the LDEV or the hexadecimal value. We configured one pair each way to demonstrate
that both formats work. Although several groups were already defined, only those that are
relevant to this scenario are shown.
Example 14-5 Horcm2.conf file used for the Austin site nodes

root@jessica:
/etc/horcm2.conf
HORCM_MON
#Address of local node...
#ip_address              service    poll(10ms)  timeout(10ms)
r9r3m11.austin.ibm.com   52323      1000        3000

HORCM_CMD
#hdisk of Command Device...
#dev_name                dev_name
#UnitID 0 (Serial# 45306)
#/dev/rhdisk10
\\.\CMD-45306:/dev/rhdisk10 /dev/rhdisk14

HORCM_LDEV
#Map dev_grp to LDEV#...
#dev_group  dev_name  Serial#  CU:LDEV(LDEV#)  MU#  # siteA hdisk -> siteB hdisk
#---------  --------  -------  --------------  ---  --------------------
htcdg01     htcd01    45306    272
htcdg01     htcd02    45306    273
hurdg01     hurd01    45306    01:12
hurdg01     hurd02    45306    01:13

# Address of remote node for each dev_grp...
HORCM_INST
#dev_group  ip_address            service
htcdg01     maddi.austin.ibm.com  52323
hurdg01     maddi.austin.ibm.com  52323
For the krod and maddi nodes at the Miami site, the dev_groups, dev_names, and the
LDEV numbers are the same. The differences are the serial number of the storage unit at
that site and the remote system (ip_address) entries, which point to the appropriate
systems at the Austin site.
Example 14-6 shows the horcm2.conf file that we used for both nodes in the Miami site.
Notice that, for the ip_address fields, fully qualified host names are used instead of
IP addresses. As long as these names are resolvable, the format is valid. The format with
the actual IP addresses is shown in Example 14-1 on page 425.
Example 14-6 The horcm2.conf file used for the nodes in the Miami site

root@krod:
horcm2.conf
HORCM_MON
#Address of local node...
#ip_address              service    poll(10ms)  timeout(10ms)
r9r3m13.austin.ibm.com   52323      1000        3000

HORCM_CMD
#hdisk of Command Device...
#dev_name                dev_name
#UnitID 0 (Serial# 35764)
#/dev/rhdisk10
# /dev/hdisk19
\\.\CMD-45306:/dev/rhdisk11 /dev/rhdisk19

HORCM_LDEV
#dev_group   dev_name     Serial#  CU:LDEV(LDEV#)  MU#
#HUR_GROUP   HUR_103_153  45306    01:53           0
htcdg01      htcd01       35764    268
htcdg01      htcd02       35764    269
hurdg01      hurd01       35764    01:0E
hurdg01      hurd02       35764    01:0F

# Address of remote node for each dev_grp...
HORCM_INST
#dev_group   ip_address           service
htcdg01      bina.austin.ibm.com  52323
hurdg01      bina.austin.ibm.com  52323
3. Map the TrueCopy-protected hdisks to the TrueCopy device groups by using the raidscan
command. In the following example, 2 is the HORCM instance number:
lsdev -Cc disk|grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst
The -find inst option of the raidscan command registers the device file name (hdisk) with
all mirror descriptors of the LDEV map table for HORCM and allows the volumes to be
matched against the horcm.conf file in protection mode. This registration is normally
started automatically by /etc/horcmgr, so you do not usually need to run the option
yourself. The option also stops scanning when the registration is finished to avoid
wasteful scans; if HORCM no longer needs the registration, no further action is taken and
the command exits. You can use the -find inst option with the -fx option to view LDEV
numbers in the hexadecimal format.
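For example, the same scan combined with the -fx option (HORCM instance 2 as above) displays the LDEV numbers in hexadecimal:
lsdev -Cc disk|grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst -fx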
4. Verify that the pairs are established by running either the pairdisplay command or the
pairvolchk command against the device groups htcdg01 and hurdg01.
Example 14-7 shows how we use the pairdisplay command. For device group htcdg01,
the status of PAIR and fence of NEVER indicates a synchronous pair. For device group
hurdg01, the ASYNC fence option clearly indicates an asynchronous pair. Also notice that
the CTG field shows the consistency group number for the asynchronous pair managed
by HUR.
Example 14-7 The pairdisplay command to verify that the pair status is synchronized

# pairdisplay -g htcdg01 -IH2 -fe
Group    PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence, Seq#,P-LDEV# M CTG JID AP
htcdg01  htcd01(L)    (CL1-E-0, 0, 10)45306   272.P-VOL PAIR  NEVER ,35764   268 -   -   -  1
htcdg01  htcd01(R)    (CL1-B-0, 0, 28)35764   268.S-VOL PAIR  NEVER ,-----   272 -   -   -  -
htcdg01  htcd02(L)    (CL1-E-0, 0, 11)45306   273.P-VOL PAIR  NEVER ,35764   269 -   -   -  1
htcdg01  htcd02(R)    (CL1-B-0, 0, 29)35764   269.S-VOL PAIR  NEVER ,-----   273 -   -   -  -

# pairdisplay -g hurdg01 -IH2 -fe
Group    PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence, Seq#,P-LDEV# M CTG JID AP
hurdg01  hurd01(L)    (CL1-E-0, 0, 12)45306   274.S-VOL PAIR  ASYNC ,-----   270 -  10   3  1
hurdg01  hurd01(R)    (CL1-B-0, 0, 30)35764   270.P-VOL PAIR  ASYNC ,45306   274 -  10   3  2
hurdg01  hurd02(L)    (CL1-E-0, 0, 13)45306   275.S-VOL PAIR  ASYNC ,-----   271 -  10   3  1
hurdg01  hurd02(R)    (CL1-B-0, 0, 31)35764   271.P-VOL PAIR  ASYNC ,45306   275 -  10   3  2
To fit the output in Example 14-7 on the page, we removed the last three columns because
they are not relevant to what we are checking.
Unestablished pairs: If pairs are not yet established, the status is displayed as SMPL. To
continue, you must create the pairs. For instructions about creating pairs from the
command line, see the Hitachi Command Control Interface (CCI) User and Reference
Guide, MK-90RD011, which you can download from:
http://communities.vmware.com/servlet/JiveServlet/download/1183307-19474
Otherwise, if you are using Storage Navigator, see 14.4.2, “Creating replicated pairs” on
page 432.
Creating volume groups and file systems on replicated disks
After identifying the hdisks and dev_groups that will be managed by PowerHA SystemMirror
Enterprise Edition, you must create the volume groups and file systems. To set up volume
groups and file systems in the replicated disks, follow these steps:
1. On each of the four PowerHA SystemMirror Enterprise Edition cluster nodes, verify the
next free major number by running the lvlstmajor command on each cluster node. Also
verify that the physical volume name for the file system can also be used across sites.
In this scenario, we use the major numbers 56 for the truesyncvg volume group and 57 for
the ursasyncvg volume group. We use these numbers later when importing the volume to
the other cluster nodes. Although the major numbers are not required to match, it is a
preferred practice.
We create the truesyncvg scalable volume group on the jessica node where the primary
LUNs are located. We also create the logical volumes, jfslog, and file systems as shown
in Example 14-8.
Example 14-8 Details about the truesyncvg volume group

root@jessica: lsvg truesyncvg
VOLUME GROUP:       truesyncvg           VG IDENTIFIER:  00cb14ce00004c000000012b564c41b9
VG STATE:           active               PP SIZE:        4 megabyte(s)
VG PERMISSION:      read/write           TOTAL PPs:      988 (3952 megabytes)
MAX LVs:            256                  FREE PPs:       737 (2948 megabytes)
LVs:                3                    USED PPs:       251 (1004 megabytes)
OPEN LVs:           3                    QUORUM:         2 (Enabled)
TOTAL PVs:          2                    VG DESCRIPTORS: 3
STALE PVs:          0                    STALE PPs:      0
ACTIVE PVs:         2                    AUTO ON:        no
MAX PPs per VG:     32768                MAX PVs:        1024
LTG size (Dynamic): 256 kilobyte(s)      AUTO SYNC:      no
HOT SPARE:          no                   BB POLICY:      relocatable
PV RESTRICTION:     none

root@jessica: lsvg -l truesyncvg
truesyncvg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
oreolv         jfs2     125  125  1    closed/syncd  /oreofs
majorlv        jfs2     125  125  1    closed/syncd  /majorfs
truefsloglv    jfs2log  1    1    1    closed/syncd  N/A
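The following is a minimal sketch of commands that could be used to create a volume group and file systems such as these, assuming hdisk38 and hdisk39 and major number 56; the logical volume sizes and options are illustrative, not the exact values that we used:
# mkvg -S -y truesyncvg -V 56 hdisk38 hdisk39
# mklv -y truefsloglv -t jfs2log truesyncvg 1
# logform /dev/truefsloglv
# mklv -y oreolv -t jfs2 truesyncvg 125
# crfs -v jfs2 -d oreolv -m /oreofs -a logname=/dev/truefsloglv
# mklv -y majorlv -t jfs2 truesyncvg 125
# crfs -v jfs2 -d majorlv -m /majorfs -a logname=/dev/truefsloglv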
We create the ursasyncvg big volume group on the krod node where the primary LUNs
are located. We also create the logical volumes, jfslog, and file systems as shown in
Example 14-9.
Example 14-9 Ursasyncvg volume group information

root@krod: lspv
hdisk40         00cb14ce5676ad24                    ursasyncvg      active
hdisk41         00cb14ce5676afcf                    ursasyncvg      active

root@krod: lsvg ursasyncvg
VOLUME GROUP:       ursasyncvg           VG IDENTIFIER:  00cb14ce00004c000000012b5676b11e
VG STATE:           active               PP SIZE:        4 megabyte(s)
VG PERMISSION:      read/write           TOTAL PPs:      1018 (4072 megabytes)
MAX LVs:            512                  FREE PPs:       596 (2384 megabytes)
LVs:                3                    USED PPs:       422 (1688 megabytes)
OPEN LVs:           3                    QUORUM:         2 (Enabled)
TOTAL PVs:          2                    VG DESCRIPTORS: 3
STALE PVs:          0                    STALE PPs:      0
ACTIVE PVs:         2                    AUTO ON:        no
MAX PPs per VG:     130048
MAX PPs per PV:     1016                 MAX PVs:        128
LTG size (Dynamic): 256 kilobyte(s)      AUTO SYNC:      no
HOT SPARE:          no                   BB POLICY:      relocatable

root@krod: lsvg -l ursasyncvg
ursasyncvg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
ursfsloglv     jfs2log  2    2    1    closed/syncd  N/A
hannahlv       jfs2     200  200  1    closed/syncd  /hannahfs
julielv        jfs2     220  220  1    closed/syncd  /juliefs
2. Vary off the newly created volume groups by running the varyoffvg command. To import
the volume groups onto the other three systems, the pairs must be in sync.
We execute the pairresync command as shown in Example 14-10 on the local disks and
make sure that they are in the PAIR state. This process verifies that the local disk
information has been copied to the remote storage. Notice that the command is being run
on the respective node that contains the primary source LUNs and where the volume
groups are created.
Example 14-10 Pairresync command
#root@jessica:pairresync -g htcdg01 -IH2
#root@krod:pairresync -g hurdg01 -IH2
Verify that the pairs are in sync with the pairdisplay command as shown in Example 14-7
on page 446.
3. Split the pair relationship so that the remote systems can import the volume groups as
needed on each node. Run the pairsplit command against the device group as shown in
Example 14-11.
Example 14-11 The pairsplit command to suspend replication
root@jessica: pairsplit -g htcdg01 -IH2
root@krod: pairsplit -g hurdg01 -IH2
To verify that the pairs are split, check the status by using the pairdisplay command.
Example 14-12 shows that the pairs are in a suspended state.
Example 14-12 Pairdisplay shows pairs suspended

root@jessica: pairdisplay -g htcdg01 -IH2 -fe
Group    PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence, Seq#,P-LDEV# M CTG JID AP
htcdg01  htcd01(L)    (CL1-E-0, 0, 10)45306   272.P-VOL PSUS  NEVER ,35764   268 -   -   -  1
htcdg01  htcd01(R)    (CL1-B-0, 0, 28)35764   268.S-VOL SSUS  NEVER ,-----   272 -   -   -  -
htcdg01  htcd02(L)    (CL1-E-0, 0, 11)45306   273.P-VOL PSUS  NEVER ,35764   269 -   -   -  1
htcdg01  htcd02(R)    (CL1-B-0, 0, 29)35764   269.S-VOL SSUS  NEVER ,-----   273 -   -   -  -

root@krod: pairdisplay -g hurdg01 -IH2 -fe
Group    PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence, Seq#,P-LDEV# M CTG JID AP
hurdg01  hurd01(L)    (CL1-B-0, 0, 30)35764   270.P-VOL PSUS  ASYNC ,45306   274 -  10   3  2
hurdg01  hurd01(R)    (CL1-E-0, 0, 12)45306   274.S-VOL SSUS  ASYNC ,-----   270 -  10   3  1
hurdg01  hurd02(L)    (CL1-B-0, 0, 31)35764   271.P-VOL PSUS  ASYNC ,45306   275 -  10   3  2
hurdg01  hurd02(R)    (CL1-E-0, 0, 13)45306   275.S-VOL SSUS  ASYNC ,-----   271 -  10   3  1
4. To import the volume groups on the remaining nodes, ensure that the PVID is present on
the disks by using one of the following options:
– Run the rmdev -dl command for each hdisk and then run the cfgmgr command.
– Run the appropriate chdev command against each disk to pull in the PVID.
As shown in Example 14-13, we use the chdev command on each of the three additional
nodes.
Example 14-13 The chdev command to acquire the PVIDs

root@jessica: chdev -l hdisk40 -a pv=yes
root@jessica: chdev -l hdisk41 -a pv=yes

root@bina: chdev -l hdisk38 -a pv=yes
root@bina: chdev -l hdisk39 -a pv=yes
root@bina: chdev -l hdisk40 -a pv=yes
root@bina: chdev -l hdisk41 -a pv=yes

root@krod: chdev -l hdisk38 -a pv=yes
root@krod: chdev -l hdisk39 -a pv=yes

root@maddi: chdev -l hdisk38 -a pv=yes
root@maddi: chdev -l hdisk39 -a pv=yes
root@maddi: chdev -l hdisk40 -a pv=yes
root@maddi: chdev -l hdisk41 -a pv=yes
5. Verify that the PVIDs are correctly showing on each system by running the lspv command
as shown in Example 14-14. Because all four of the nodes have the exact hdisk
numbering, we show the output only from one node, the bina node.
Example 14-14 LSPV listing to verify PVIDs are present

bina@root: lspv
hdisk38         00cb14ce564c3f44                    none
hdisk39         00cb14ce564c40fb                    none
hdisk40         00cb14ce5676ad24                    none
hdisk41         00cb14ce5676afcf                    none
6. Import the volume groups on each node as needed by using the importvg command.
Specify the major number that you used earlier.
7. Disable both the auto varyon and quorum settings of the volume groups by using the chvg
command.
8. Vary off the volume group as shown in Example 14-15.
Attention: PowerHA SystemMirror Enterprise Edition attempts to automatically set the
AUTO VARYON to NO during verification, except in the case of remote TrueCopy/HUR.
Example 14-15 Importing the replicated volume groups

root@jessica: importvg -y ursasyncvg -V 57 hdisk40
root@jessica: chvg -a n -Q n ursasyncvg
root@jessica: varyoffvg ursasyncvg

root@bina: importvg -y truesyncvg -V 56 hdisk38
root@bina: importvg -y ursasyncvg -V 57 hdisk40
root@bina: chvg -a n -Q n truesyncvg
root@bina: chvg -a n -Q n ursasyncvg
root@bina: varyoffvg truesyncvg
root@bina: varyoffvg ursasyncvg

root@krod: importvg -y truesyncvg -V 56 hdisk38
root@krod: chvg -a n -Q n truesyncvg
root@krod: varyoffvg truesyncvg

root@maddi: importvg -y truesyncvg -V 56 hdisk38
root@maddi: importvg -y ursasyncvg -V 57 hdisk40
root@maddi: chvg -a n -Q n truesyncvg
root@maddi: chvg -a n -Q n ursasyncvg
root@maddi: varyoffvg truesyncvg
root@maddi: varyoffvg ursasyncvg
9. Re-establish the pairs that you split in step 3 on page 449 by running the pairresync
command again as shown in Example 14-10 on page 448.
10.Verify again if they are in sync by using the pairdisplay command as shown in
Example 14-7 on page 446.
14.4.4 Defining TrueCopy/HUR managed replicated resource to PowerHA
Adding a replicated resource to be controlled by PowerHA consists of two specific steps per
device group, and four steps overall:
Adding TrueCopy/HUR replicated resources
Adding the TrueCopy/HUR replicated resources to a resource group
Verifying the TrueCopy/HUR configuration
Synchronizing the cluster configuration
In these steps, the cluster topology has been configured, including all four nodes, both sites,
and networks.
Adding TrueCopy/HUR replicated resources
To define a TrueCopy replicated resource, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  TrueCopy Replicated Resources  Add Hitachi TrueCopy/HUR
Replicated Resource.
3. In the Add Hitachi TrueCopy/HUR Replicated Resource panel, press Enter.
4. Complete the available fields appropriately and press Enter.
In this configuration, we created two replicated resources. One resource, for the
synchronous device group htcdg01, is named truelee. The second resource, for the
asynchronous device group hurdg01, is named ursasyncRR. Example 14-16 shows both of the
replicated resources.
Example 14-16 TrueCopy/HUR replicated resource definitions

                Add a HITACHI TRUECOPY(R)/HUR Replicated Resource

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* TRUECOPY(R)/HUR Resource Name                         [truelee]
* TRUECOPY(R)/HUR Mode                                  SYNC             +
* Device Groups                                         [htcdg01]        +
* Recovery Action                                       AUTO             +
* Horcm Instance                                        [horcm2]
* Horctakeover Timeout Value                            [300]            #
* Pairevtwait Timeout Value                             [3600]           #

                Add a HITACHI TRUECOPY(R)/HUR Replicated Resource

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* TRUECOPY(R)/HUR Resource Name                         [ursasyncRR]
* TRUECOPY(R)/HUR Mode                                  ASYNC            +
* Device Groups                                         [hurdg01]        +
* Recovery Action                                       AUTO             +
* Horcm Instance                                        [horcm2]
* Horctakeover Timeout Value                            [300]            #
* Pairevtwait Timeout Value                             [3600]           #
For a complete list of all defined TrueCopy/HUR replicated resources, run the cllstc
command, which is in the /usr/es/sbin/cluster/tc/cmds directory. Example 14-17 shows
the output of the cllstc command.
Example 14-17 The cllstc command to list the TrueCopy/HUR replicated resources

root@jessica: cllstc -a
Name        CopyMode  DeviceGrps  RecoveryAction  HorcmInstance  HorcTimeOut  PairevtTimeout
truelee     SYNC      htcdg01     AUTO            horcm2         300          3600
ursasyncRR  ASYNC     hurdg01     AUTO            horcm2         300          3600
Adding the TrueCopy/HUR replicated resources to a resource group
To add a TrueCopy replicated resource to a resource group, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Resource
Configuration  Extended Resource Group Configuration.
Depending on whether you are working with an existing resource group or creating a
resource group, the TrueCopy Replicated Resources entry is displayed at the bottom of
the page in SMIT. This entry is a pick list that shows the resource names that are created
in the previous task.
3. Ensure that the volume groups that are selected on the Resource Group configuration
display match the volume groups that are used in the TrueCopy/HUR Replicated
Resource:
– If you are changing an existing resource group, select Change/Show Resource
Group.
– If you are adding a resource group, select Add a Resource Group.
4. In the TrueCopy Replicated Resources field, press F4 for a list of the TrueCopy/HUR
replicated resources that were previously added. Verify that this resource matches the
volume group that is specified.
Important: You cannot mix regular (non-replicated) volume groups and TrueCopy/HUR
replicated volume groups in the same resource group.
Press Enter.
In this scenario, we changed an existing resource group, emlecRG, for the Austin site and
specifically chose a site relationship, also known as an Inter-site Management Policy of Prefer
Primary Site. We added a new resource group, valhallarg, for the Miami site and chose to
use the same site relationship. We also added the additional nodes from each site. We
configured both resource groups to fail over locally within a site and to fail over between
sites. If a site failure occurs, the resource group falls over to the standby node at the
remote site, but never to the remote production node.
Example 14-18 shows the relevant resource group information.
Example 14-18 Resource groups for the TrueCopy/HUR replicated resources

Resource Group Name                      emlecRG
Participating Node Name(s)               jessica bina maddi
Startup Policy                           Online On Home Node Only
Fallover Policy                          Fallover To Next Priority Node
Fallback Policy                          Never Fallback
Site Relationship                        Prefer Primary Site
Node Priority
Service IP Label                         service_1
Volume Groups                            truesyncvg
Hitachi TrueCopy Replicated Resources    truelee

Resource Group Name                      valhallaRG
Participating Node Name(s)               krod maddi bina
Startup Policy                           Online On Home Node Only
Fallover Policy                          Fallover To Next Priority Node
Fallback Policy                          Never Fallback
Site Relationship                        Prefer Primary Site
Node Priority
Service IP Label                         service_2
Volume Groups                            ursasyncvg
Hitachi TrueCopy Replicated Resources    ursasyncRR
Verifying the TrueCopy/HUR configuration
Before synchronizing the new cluster configuration, verify the TrueCopy/HUR configuration:
1. To verify the configuration, run the following command:
/usr/es/sbin/cluster/tc/utils/cl_verify_tc_config
2. Correct any configuration errors that are shown.
If you see error messages such as those shown in Figure 14-17, usually these types of
messages indicate that the raidscan command was not run or was run incorrectly. See
step 3 on page 449 in “Creating volume groups and file systems on replicated disks” on
page 447.
3. Run the script again.
Figure 14-17 Error messages found during TrueCopy/HUR replicated resource verification
Synchronizing the cluster configuration
You must verify the PowerHA SystemMirror Enterprise Edition cluster and the TrueCopy/HUR
configuration before you can synchronize the cluster. To propagate the new TrueCopy/HUR
configuration information and the additional resource group that were created across the
cluster, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select Extended Configuration  Extended Verification and
Synchronization.
3. In the Verify Synchronize or Both field select Synchronize. In the Automatically correct
errors found during verification field select No. Press Enter.
The output is displayed in the SMIT Command Status window.
14.5 Failover testing
This topic explains the basic failover testing of the TrueCopy/HUR replicated resources locally
within the site and across sites. You must carefully plan the testing of the site cluster failover
because it requires more time to manipulate the secondary target LUNs at the recovery site.
Also, because of the nature of asynchronous replication, testing it can impact the data.
These scenarios do not entail performing a redundancy test with the IP networks. Instead you
configure redundant IP or non-IP communication paths to avoid isolation of the sites. The loss
of all the communication paths between sites leads to a partitioned state of the cluster and to
data divergence between sites if the replication links are also unavailable.
Another specific failure scenario is the loss of the replication paths between the storage
subsystems while the cluster is running on both sites. To avoid this situation, configure
redundant communication links for TrueCopy/HUR replication. You must manually recover the
status of the pairs after the storage links are operational again.
Important: PowerHA SystemMirror Enterprise Edition does not trap SNMP notification
events for TrueCopy/HUR storage. If a TrueCopy link goes down when the cluster is up and
the link is repaired later, you must manually resynchronize the pairs.
This topic explains how to perform the following tests for each site and resource group:
Graceful site failover for the Austin site
Rolling site failure of the Austin site
Site re-integration for the Austin site
Graceful site failover for the Miami site
Rolling site failure of the Miami site
Site re-integration for the Miami site
Each test, except for the last re-integration test, begins in the same initial state, with each
site hosting its own production resource group on the primary node, as shown in Example 14-19.
Example 14-19 Beginning of test cluster resource group states

clRGinfo
-----------------------------------------------------------------------------
Group Name                   Group State                 Node
-----------------------------------------------------------------------------
emlecRG                      ONLINE                      jessica@Austin
                             OFFLINE                     bina@Austin
                             ONLINE SECONDARY            maddi@Miami

valhallaRG                   ONLINE                      krod@Miami
                             OFFLINE                     maddi@Miami
                             ONLINE SECONDARY            bina@Austin
Before each test, we start copying data from another file system to the replicated file systems.
After each test, we verify that the site service IP address is online and new data is in the file
systems. We also had a script that inserts the current time and date into a file on each file
system (a sketch of such a script follows this paragraph). Because of the small amount of
I/O in our environment, we could not detect any data loss in the asynchronous replication
either.
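The following is a minimal sketch of such a timestamp script, assuming ksh and the file system mount points used in this scenario; the marker file name is illustrative:
#!/bin/ksh
# Append the current date and time to a marker file on each replicated file system
for fs in /oreofs /majorfs /hannahfs /juliefs
do
    date >> $fs/dr_timestamp.log
done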
14.5.1 Graceful site failover for the Austin site
Performing a controlled move of a production environment across sites is a basic test to
ensure that the remote site can bring the production environment online. However, this task is
done only during initial implementation testing or during a planned production outage of the
site. You perform the graceful failover operation between sites by performing a resource group
move.
In a true maintenance scenario, you most likely perform this task by stopping the cluster on
the local standby node first, and then stopping cluster services on the production node with
the Move Resource Groups option. The following operations are performed during this move:
Releasing the primary online instance of emlecRG at the Austin site, which:
– Executes the application server stop
– Unmounts the file systems
– Varies off the volume group
– Removes the service IP address
Releasing the secondary online instance of emlecRG at the Miami site
Acquiring the emlecRG resource group in the online secondary state at the Austin site
Acquiring the emlecRG resource group in the online primary state at the Miami site
To move the resource group by using SMIT, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC)  Resource Groups and
Applications  Move a Resource Group to Another Node / Site  Move Resource
Groups to Another Site.
3. In the Move a Resource Group to Another Node / Site panel (Figure 14-18), select the
ONLINE instance of the emlecRG resource group to be moved.
Move a Resource Group to Another Node / Site

Move cursor to desired item and press Enter.

  Select Resource Group(s)
  Move cursor to desired item and press Enter. Use arrow keys to scroll.

  # Resource Group          State                Node(s) / Site
    emlecRG                 ONLINE               jessica / Austi
    emlecRG                 ONLINE SECONDARY     maddi / Miami
    valhallarg              ONLINE               krod / Miami

  # Resource groups in node or site collocation configuration:
  # Resource Group(s)       State                Node / Site

  F1=Help     F2=Refresh     F3=Cancel
  F8=Image    F10=Exit       Enter=Do
  /=Find      n=Find Next
Figure 14-18 Moving the Austin resource group across to site Miami
4. In the Select a Destination Site panel, select the Miami site as shown in Figure 14-19.
  Select a Destination Site

  Move cursor to desired item and press Enter.

  # *Denotes Originally Configured Primary Site
    Miami

  F1=Help     F2=Refresh     F3=Cancel
  F8=Image    F10=Exit       Enter=Do
  /=Find      n=Find Next
Figure 14-19 Selecting the site for resource group move
5. Verify the information in the final menu and press Enter.
Upon completion of the move, emlecRG is online on the maddi node at the Miami site as
shown in Example 14-20.
Example 14-20 Resource group status after a move to the Miami site
root@maddi# clRGinfo
-----------------------------------------------------------------------------
Group Name           Group State              Node
-----------------------------------------------------------------------------
emlecRG              ONLINE SECONDARY         jessica@Austin
                     OFFLINE                  bina@Austin
                     ONLINE                   maddi@Miami

valhallarg           ONLINE                   krod@Miami
                     OFFLINE                  maddi@Miami
                     OFFLINE                  bina@Austin
6. Repeat the resource group move to move it back to its original primary site and node to
return to the original starting state.
Attention: In our environment, after the first resource group move between sites, we were unable to move the resource group back by site because the pick list for the destination site was empty. However, we were able to move it back by node instead. Later in our testing, the by-site option started working, but it moved the resource group to the standby node at the primary site instead of the original primary node. If you encounter similar problems, contact IBM support.
14.5.2 Rolling site failure of the Austin site
In this scenario, you perform a rolling site failure of the Austin site by performing the following
tasks:
1. Halt the primary production node jessica at the Austin site.
2. Verify that the resource group emlecRG is acquired locally by the bina node.
3. Halt the bina node to produce a site down.
4. Verify that the resource group emlecRG is acquired remotely by the maddi node.
To begin, all four nodes are active in the cluster and the resource groups are online on the
primary node as shown in Example 14-19 on page 455.
1. On the jessica node, run the reboot -q command. The bina node acquires the emlecRG
resource group as shown in Example 14-21.
Example 14-21 Local node failover within the Austin site
root@bina: clRGinfo
-----------------------------------------------------------------------------
Group Name           Group State              Node
-----------------------------------------------------------------------------
emlecRG              OFFLINE                  jessica@Austin
                     ONLINE                   bina@Austin
                     OFFLINE                  maddi@Miami

valhallarg           ONLINE                   krod@Miami
                     OFFLINE                  maddi@Miami
                     ONLINE SECONDARY         bina@Austin
2. Run the pairdisplay command (as shown in Example 14-22) to verify that the pairs are
still established because the volume group is still active on the primary site.
Example 14-22 Pairdisplay status after a local site failover
root@bina: pairdisplay -g htcdg01 -IH2 -fe
Group    PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01  htcd01(L)    (CL1-E-0, 0, 10)45306   272.P-VOL PAIR NEVER ,35764   268 -  -  -  1
htcdg01  htcd01(R)    (CL1-B-0, 0, 28)35764   268.S-VOL PAIR NEVER ,-----   272 -  -  -  -
htcdg01  htcd02(L)    (CL1-E-0, 0, 11)45306   273.P-VOL PAIR NEVER ,35764   269 -  -  -  1
htcdg01  htcd02(R)    (CL1-B-0, 0, 29)35764   269.S-VOL PAIR NEVER ,-----   273 -  -  -  -
3. Upon cluster stabilization, run the reboot -q command on the bina node. The maddi node
at the Miami site acquires the emlecRG resource group as shown in Example 14-23.
Example 14-23 Hard failover between sites
root@maddi: clRGinfo
-----------------------------------------------------------------------------
Group Name           Group State              Node
-----------------------------------------------------------------------------
emlecRG              OFFLINE                  jessica@Austin
                     OFFLINE                  bina@Austin
                     ONLINE                   maddi@Miami

valhallarg           ONLINE                   krod@Miami
                     OFFLINE                  maddi@Miami
                     OFFLINE                  bina@Austin
4. Verify that the replicated pairs are now in the suspended state from the command line as
shown in Example 14-24.
Example 14-24 Pairdisplay status after a hard site failover
root@maddi: pairdisplay -g htcdg01 -IH2 -fe
Group    PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M CTG JID AP
htcdg01  htcd01(L)    (CL1-B-0, 0, 28)35764   268.S-VOL SSUS NEVER ,-----   272 W  -  -  1
htcdg01  htcd01(R)    (CL1-E-0, 0, 10)45306   272.P-VOL PSUS NEVER ,35764   268 -  -  -  1
htcdg01  htcd02(L)    (CL1-B-0, 0, 29)35764   269.S-VOL SSUS NEVER ,-----   273 W  -  -  1
htcdg01  htcd02(R)    (CL1-E-0, 0, 11)45306   273.P-VOL PSUS NEVER ,35764   269 -  -  -  1
You can also verify that the replicated pairs are in the suspended state by using the
Storage Navigator (Figure 14-20).
Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.
Figure 14-20 Pairs suspended after a site failover (courtesy of Hitachi Data Systems)
14.5.3 Site re-integration for the Austin site
In this scenario, we restart both cluster nodes at the Austin site by using the smitty clstart
command. Upon startup of the primary node jessica, the emlecRG resource group is automatically and gracefully moved back, returning to the original starting point as shown in Example 14-19 on page 455.
Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate the behavior of what occurs upon re-integration of the
primary node. Because we chose Prefer Primary Site, the automatic fallback occurred.
Initially, we were unable to restart the cluster on the jessica node because of verification errors at startup, which are similar to the errors shown in Figure 14-17 on page 453. These errors had two causes. First, we failed to start the horcm instance at boot time. Second, we had to re-map the copy-protected device groups by running the raidscan command again.
Important: Always ensure that the horcm instance is running before rejoining a node into
the cluster. In some cases, if all instances, cluster nodes, or both have been down, you
might need to run the raidscan command again.
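You can perform this check from the command line before starting cluster services. The following is a minimal sketch that uses the instance number and device group from this chapter:
/HORCM/usr/bin/horcmstart.sh 2        # start horcm instance 2 if it is not already running
pairdisplay -g htcdg01 -IH2 -CLI      # fails if the instance is not running
lsdev -Cc disk | grep hdisk | /HORCM/usr/bin/raidscan -IH2 -find inst   # re-map the device groups if needed
smitty clstart                        # then start cluster services on the node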
14.5.4 Graceful site failover for the Miami site
This move scenario starts from the states shown in Example 14-19 on page 455. You repeat the steps from the previous three sections, one section at a time. However, this time the steps test the asynchronous replication of the Miami site.
The following tasks are performed during this move:
1. Release the primary online instance of valhallaRG at the Miami site:
   – Executes the application server stop
   – Unmounts the file systems
   – Varies off the volume group
   – Removes the service IP address
2. Release the secondary online instance of valhallaRG at the Austin site.
3. Acquire valhallaRG in the secondary online state at the Miami site.
4. Acquire valhallaRG in the online primary state at the Austin site.
Perform the resource group move by using SMIT as follows:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path System Management (C-SPOC)  Resource Groups and
Applications  Move a Resource Group to Another Node / Site  Move Resource
Groups to Another Site.
3. Select the ONLINE instance of valhallaRG to be moved.
4. Select the Austin site from the pop-up menu.
5. Verify the information in the final menu and press Enter.
Upon completion of the move, the valhallaRG resource group is online on the bina node at
the Austin site. The resource group is online secondary on the local production krod node
at the Miami site as shown in Example 14-25.
Example 14-25 Resource group status after moving to the Austin site
root@bina: clRGinfo
-----------------------------------------------------------------------------
Group Name           Group State              Node
-----------------------------------------------------------------------------
emlecRG              ONLINE                   jessica@Austin
                     OFFLINE                  bina@Austin
                     ONLINE SECONDARY         maddi@Miami

valhallarg           ONLINE SECONDARY         krod@Miami
                     OFFLINE                  maddi@Miami
                     ONLINE                   bina@Austin
6. Repeat these steps to move a resource group back to the original primary krod node at
the Miami site.
Attention: In our environment, after the first resource group move between sites, we were unable to move the resource group back by site because the pick list for the destination site was empty. However, we were able to move it back by node instead. Later in our testing, the by-site option started working, but it moved the resource group to the standby node at the primary site instead of the original primary node. If you encounter similar problems, contact IBM support.
14.5.5 Rolling site failure of the Miami site
In this scenario, you perform a rolling site failure of the Miami site by performing the following
tasks:
1. Halt the primary production node krod at the Miami site.
2. Verify that the resource group valhallaRG is acquired locally by the maddi node.
3. Halt the maddi node to produce a site down.
4. Verify that the resource group valhallaRG is acquired remotely by the bina node.
To begin, all four nodes are active in the cluster, and the resource groups are online on the
primary node as shown in Example 14-19 on page 455. Follow these steps:
1. On the krod node, run the reboot -q command. The maddi node brings the valhallaRG
resource group online, and the remote bina node maintains the online secondary status as
shown in Example 14-26. This time, the failover took noticeably longer, specifically in the fsck portion. The extra time is most likely a symptom of the asynchronous replication.
Example 14-26 Local node fallover within the Miami site
root@maddi: clRGinfo
-----------------------------------------------------------------------------
Group Name           Group State              Node
-----------------------------------------------------------------------------
emlecRG              ONLINE                   jessica@Austin
                     OFFLINE                  bina@Austin
                     ONLINE SECONDARY         maddi@Miami

valhallarg           OFFLINE                  krod@Miami
                     ONLINE                   maddi@Miami
                     ONLINE SECONDARY         bina@Austin
2. Run the pairdisplay command as shown in Example 14-27 to verify that the pairs are still
established because the volume group is still active on the primary site.
Example 14-27 Status using the pairdisplay command after the local Miami site fallover
root@maddi: pairdisplay -fd -g hurdg01 -IH2 -CLI
Group    PairVol L/R Device_File  Seq#  LDEV# P/S   Status Fence Seq#  P-LDEV# M
hurdg01  hurd01  L   hdisk40      35764 270   P-VOL PAIR   ASYNC 45306 274     -
hurdg01  hurd01  R   hdisk40      45306 274   S-VOL PAIR   ASYNC -     270     -
hurdg01  hurd02  L   hdisk41      35764 271   P-VOL PAIR   ASYNC 45306 275     -
hurdg01  hurd02  R   hdisk41      45306 275   S-VOL PAIR   ASYNC -     271     -
3. Upon cluster stabilization, run the reboot -q command on the maddi node. The bina node at the Austin site acquires the valhallaRG resource group as shown in Example 14-28.
Example 14-28 Hard failover from Miami site to Austin site
root@bina: clRGinfo
-----------------------------------------------------------------------------
Group Name           Group State              Node
-----------------------------------------------------------------------------
emlecRG              ONLINE                   jessica@Austin
                     OFFLINE                  bina@Austin
                     OFFLINE                  maddi@Miami

valhallarg           OFFLINE                  krod@Miami
                     OFFLINE                  maddi@Miami
                     ONLINE                   bina@Austin
Important: Although our testing resulted in a site_down event, we never lost access to
the primary storage subsystem. In a true site failure, including loss of storage,
re-establish the replicated pairs, and synchronize them before moving back to the
primary site. If you must change the storage LUNs, modify the horcm.conf file, and use
the same device group and device names. You do not have to change the cluster
resource configuration.
14.5.6 Site re-integration for the Miami site
In this scenario, we restart both cluster nodes at the Miami site by using the smitty clstart
command. Upon startup of the primary node krod, the valhallaRG resource group is automatically and gracefully moved back, returning to the original starting point as shown in Example 14-19 on page 455.
Important: The resource group settings of the Inter-site Management Policy, also known
as the site relationship, dictate the behavior of what occurs upon re-integration of the
primary node. Because we chose Prefer Primary Site policy, the automatic fallback
occurred.
Initially, we were unable to restart the cluster on the krod node because of verification errors at startup, which are similar to the errors shown in Figure 14-17 on page 453. These errors had two causes. First, we failed to start the horcm instance at boot time. Second, we had to re-map the copy-protected device groups by running the raidscan command again.
Important: Always ensure that the horcm instance is running before rejoining a node into
the cluster. In some cases, if all instances, cluster nodes, or both have been down, you
might need to run the raidscan command again.
14.6 LVM administration of TrueCopy/HUR replicated pairs
This topic explains common scenarios for adding storage to an existing replicated environment that uses Hitachi TrueCopy/HUR. In this scenario, you work only with the Austin site and the emlecRG resource group in a TrueCopy synchronous replication. Overall, the steps are
the same for both types of replication. The difference is the initial pair creation. You perform
the following tasks:
Adding LUN pairs to an existing volume group
Adding a new logical volume
Increasing the size of an existing file system
Adding a LUN pair to a new volume group
Important: This topic does not explain how to dynamically expand a volume through
Hitachi Logical Unit Size Expansion (LUSE) because this option is not supported.
14.6.1 Adding LUN pairs to an existing volume group
In this task, you assign a new LUN to each site as you did in 14.4.1, “Assigning LUNs to the
hosts (host groups)” on page 429. Table 14-2 shows a summary of the LUNs that are used.
Before continuing, the LUNs must already be established in a paired relationship, and the corresponding hdisks must be available on the appropriate cluster nodes.
Table 14-2 Summary of the LUNs implemented
Storage subsystem                  Port     CU    LUN     LDEV
Austin - Hitachi USPV - 45306      CL1-E    01    000E    01:14
Miami - Hitachi USPVM - 35764      CL-1B    01    001B    01:1F

hdisk name on each node (jessica, krod, bina, maddi): hdisk42
Then follow the same steps as for defining new LUNs:
1. Run the cfgmgr command on the primary node jessica.
2. Assign the PVID on the jessica node.
chdev -l hdisk42 -a pv=yes
3. Run the pairsplit command on the replicated LUNs.
4. Run the cfgmgr command on each of the remaining three nodes.
5. Verify that the PVID shows up on each node by using the lspv command.
6. Run the pairresync command on the replicated LUNs.
7. Shut down the horcm2 instance on each node:
/HORCM/usr/bin/horcmshutdown.sh 2
8. Edit the /etc/horcm2.conf file on each node as appropriate for each site:
– The krod and maddi nodes on the Miami site added the following new line:
  htcdg01   htcd03   35764   01:1F
– The jessica and bina nodes on the Austin site added the following new line:
  htcdg01   htcd03   45306   01:14
(A sketch of where this entry sits in the horcm2.conf file follows Example 14-29.)
9. Restart the horcm2 instance on each node:
/HORCM/usr/bin/horcmstart.sh 2
10. Map the devices and device group on any node:
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
We ran this command on the jessica node.
11. Verify that the htcdg01 device group pairs now show the new pair, which consists of hdisk42 on each system, as shown in Example 14-29.
Example 14-29 New LUN pairs in the htcgd01 device group
root@jessica: pairdisplay -fd -g htcdg01 -IH2 -CLI
Group    PairVol L/R Device_File  Seq#  LDEV# P/S   Status Fence Seq#  P-LDEV# M
htcdg01  htcd01  L   hdisk38      45306 272   P-VOL PAIR   NEVER 35764 268     -
htcdg01  htcd01  R   hdisk38      35764 268   S-VOL PAIR   NEVER -     272     -
htcdg01  htcd02  L   hdisk39      45306 273   P-VOL PAIR   NEVER 35764 269     -
htcdg01  htcd02  R   hdisk39      35764 269   S-VOL PAIR   NEVER -     273     -
htcdg01  htcd03  L   hdisk42      45306 276   P-VOL PAIR   NEVER 35764 287     -
htcdg01  htcd03  R   hdisk42      35764 287   S-VOL PAIR   NEVER -     276     -
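For reference, these device definitions go in the HORCM_LDEV section of the /etc/horcm2.conf file. The following sketch shows the fragment on an Austin node after the addition. The htcd01 and htcd02 values are inferred from the pairdisplay output (LDEV# 272 = 01:10 and 273 = 01:11), so verify them against your own configuration:
HORCM_LDEV
#dev_group   dev_name   Serial#   CU:LDEV(LDEV#)   MU#
htcdg01      htcd01     45306     01:10
htcdg01      htcd02     45306     01:11
htcdg01      htcd03     45306     01:14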
You are now ready to use C-SPOC to add the new disk into the volume group:
Important: You cannot use C-SPOC for the following LVM operations to configure nodes at
the remote site that contain the target volume:
Creating a volume group
Operations that require nodes at the target site to write to the target volumes
For example, changing the file system size, changing the mount point, or adding LVM
mirrors cause an error message in C-SPOC. However, nodes on the same site as the
source volumes can successfully perform these tasks. The changes are then
propagated to the other site by using a lazy update.
For all other LVM operations to work through C-SPOC, perform them with the TrueCopy/HUR volume pairs in the Synchronized or Consistent states, or with the cluster ACTIVE on all nodes.
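Before a C-SPOC operation, you can confirm the pair states from any node that runs the horcm instance. The following is a quick check along these lines, using the device group from this chapter:
pairdisplay -fd -g htcdg01 -IH2 -CLI | grep -v PAIR
# Only the header line should remain. Any data row that is printed is a pair
# that is not in the PAIR state and should be resynchronized first.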
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Volume
Groups  Add a Volume to a Volume Group
3. Select the volume group truesyncvg from the pop-up menu.
4. Select hdisk42 as shown in Figure 14-21.
Set Characteristics of a Volume Group

Move cursor to desired item and press Enter.

  Add a Volume to a Volume Group
  Change/Show characteristics of a Volume Group
  Remove a Volume from a Volume Group

  Physical Volume Names
  Move cursor to desired item and press Enter.

    000a621aaf47ce83 ( hdisk2 on nodes bina,jessica )
    000a621aaf47ce83 ( hdisk3 on nodes krod,maddi )
    000cf1da43e72fc2 ( hdisk5 on nodes bina,jessica )
    000cf1da43e72fc2 ( hdisk6 on nodes krod,maddi )
    00cb14ce74090ef3 ( hdisk42 on all selected nodes )
    00cb14ceb0f5bd25 ( hdisk4 on nodes bina,jessica )
    00cb14ceb0f5bd25 ( hdisk14 on nodes krod,maddi )

  F1=Help     F2=Refresh     F3=Cancel
  F8=Image    F10=Exit       Enter=Do
  /=Find      n=Find Next
Figure 14-21 Selecting a disk to add to the volume group
5. Verify the menu information, as shown in Figure 14-22, and press Enter.
Add a Volume to a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                       [Entry Fields]
  VOLUME GROUP name                                    truesyncvg
  Resource Group Name                                  emlecRG
  Node List                                            bina,jessica,krod,mad>
  Reference node                                       bina
  VOLUME names                                         hdisk42
Figure 14-22 Adding a volume to a volume group
The krod node does not need the volume group because it is not a member of the resource
group. However, we started with all four nodes seeing all volume groups and decided to leave
the configuration that way. This way we have additional flexibility later if we need to change
the cluster configuration to allow the krod node to take over as a last resort.
Upon completion of the C-SPOC operation, all four nodes now have the new disk as a
member of the volume group as shown in Example 14-30.
Example 14-30 New disk added to the volume group on all nodes
root@jessica: lspv |grep truesyncvg
hdisk38         00cb14ce564c3f44    truesyncvg    active
hdisk39         00cb14ce564c40fb    truesyncvg    active
hdisk42         00cb14ce74090ef3    truesyncvg    active

root@bina: lspv |grep truesyncvg
hdisk38         00cb14ce564c3f44    truesyncvg
hdisk39         00cb14ce564c40fb    truesyncvg
hdisk42         00cb14ce74090ef3    truesyncvg

root@krod: lspv |grep truesyncvg
hdisk38         00cb14ce564c3f44    truesyncvg
hdisk39         00cb14ce564c40fb    truesyncvg
hdisk42         00cb14ce74090ef3    truesyncvg

root@maddi: lspv |grep truesyncvg
hdisk38         00cb14ce564c3f44    truesyncvg
hdisk39         00cb14ce564c40fb    truesyncvg
hdisk42         00cb14ce74090ef3    truesyncvg
We do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to run the cl_verify_tc_config command to verify that the replicated resources are configured correctly.
14.6.2 Adding a new logical volume
To perform this task, again you use C-SPOC, which updates the local nodes within the site.
For the remote site, when a failover occurs, the lazy update process updates the volume
group information as needed. This process also adds a bit of extra time to the failover time.
To add a new logical volume:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Logical
Volumes  Add a Logical Volume.
3. Select the truesyncvg volume group from the pop-up menu.
4. Choose the newly added disk hdisk42 as shown in Figure 14-23.
Logical Volumes

Move cursor to desired item and press Enter.

  List All Logical Volumes by Volume Group
  Add a Logical Volume
  Show Characteristics of a Logical Volume
  Set Characteristics of a Logical Volume

  Physical Volume Names
  Move cursor to desired item and press F7.
  ONE OR MORE items can be selected.
  Press Enter AFTER making all selections.

    Auto-select
    jessica hdisk38
    jessica hdisk39
    jessica hdisk42

  F1=Help     F2=Refresh     F3=Cancel
  F7=Select   F8=Image       F10=Exit
  Enter=Do    /=Find         n=Find Next
Figure 14-23 Selecting a disk for new logical volume creation
5. Complete the information in the final menu and press Enter.
We added a new logical volume, named micah, which consists of 50 logical partitions (LPs) and has a type of raw. We accepted the default values for all other fields as shown in Figure 14-24.
Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                  [Entry Fields]
  Resource Group Name                                  emlecRG
  VOLUME GROUP name                                    truesyncvg
  Node List                                            bina,jessica,krod,mad>
  Reference node                                       jessica
* Number of LOGICAL PARTITIONS                        [50]                     #
  PHYSICAL VOLUME names                                hdisk42
  Logical volume NAME                                 [micah]
  Logical volume TYPE                                 [raw]                    +
  POSITION on physical volume                          outer_middle            +
  RANGE of physical volumes                            minimum                 +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                  []                       #
    to use for allocation
  Number of COPIES of each logical                     1                       +
Figure 14-24 Defining a new logical volume
6. Upon completion of the C-SPOC operation, verify that the new logical volume was created locally on the jessica node as shown in Example 14-31.
Example 14-31 Newly created logical volume
root@jessica: lsvg -l truesyncvg
truesyncvg:
LV NAME        TYPE      LPs   PPs   PVs   LV STATE        MOUNT POINT
oreolv         jfs2      125   125   1     closed/syncd    /oreofs
majorlv        jfs2      125   125   1     closed/syncd    /majorfs
truefsloglv    jfs2log   1     1     1     closed/syncd    N/A
micah          raw       50    50    1     closed/syncd    N/A
14.6.3 Increasing the size of an existing file system
To perform this task, again you use C-SPOC, which updates the local nodes within the site.
For the remote site, when a failover occurs, the lazy update process updates the volume
group information as needed. This process also adds a bit of extra time to the failover time.
To increase the size of an existing file system, follow these steps:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  File Systems 
Change / Show Characteristics of a File System.
3. Select the oreofs file system from the pop-up menu.
4. Complete the information in the final menu as desired and press Enter.
In this scenario, we roughly tripled the size of the file system from 500 MB (125 LPs), as shown in Example 14-31, to 1536 MB as shown in Figure 14-25.
Change/Show Characteristics of a Enhanced Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                  [Entry Fields]
  Volume group name                                    truesyncvg
  Resource Group Name                                  emlecRG
* Node Names                                           krod,maddi,bina,jessi>
* File system name                                     /oreofs
  NEW mount point                                     [/oreofs]
  SIZE of file system
          Unit Size                                    M                       +
          Number of units                             [1536]                   #
  Mount GROUP                                         []
  Mount AUTOMATICALLY at system restart?               no                      +
  PERMISSIONS                                          read/write              +
  Mount OPTIONS                                       []                       +
Figure 14-25 Changing the file system size
5. Upon completion of the C-SPOC operation, verify the new file system size locally on the
jessica node as shown in Example 14-32.
Example 14-32 Newly increased file system size
root@jessica: lsvg -l truesyncvg
truesyncvg:
LV NAME        TYPE      LPs   PPs   PVs   LV STATE        MOUNT POINT
oreolv         jfs2      384   384   1     closed/syncd    /oreofs
majorlv        jfs2      125   125   1     closed/syncd    /majorfs
truefsloglv    jfs2log   1     1     1     closed/syncd    N/A
micah          raw       50    50    1     closed/syncd    N/A
You do not need to synchronize the cluster because all of these changes are made to an
existing volume group. However, you might want to make sure that the replicated resources still verify correctly. Run the cl_verify_tc_config command first to check the replicated resources specifically.
Testing failover after making the LVM changes
To confirm that the cluster still fails over as expected after the LVM changes, repeat the steps from 14.5.2, "Rolling site failure of the Austin site" on page 457. The new logical volume micah and the additional space in /oreofs show up on each node. However, there is a noticeable difference in the total time of the site failover while the lazy update applies the volume group changes.
14.6.4 Adding a LUN pair to a new volume group
The steps for adding a new LUN pair to a new volume group are mostly the same as the steps in 14.6.1, "Adding LUN pairs to an existing volume group" on page 463. The difference is that you create a new volume group, which you must then add to a resource group. For completeness, the initial steps are documented here along with an overview of the new LUNs to be used:
1. Run the cfgmgr command on the primary node jessica.
2. Assign the PVID on the jessica node:
chdev -l hdisk43 -a pv=yes
3. Run the pairsplit command on the replicated LUNs.
4. Run the cfgmgr command on each of the remaining three nodes.
5. Verify that the PVID shows up on each node by using the lspv command.
6. Run the pairresync command on the replicated LUNs.
7. Shut down the horcm2 instance on each node:
/HORCM/usr/bin/horcmshutdown.sh 2
8. Edit the /etc/horcm2.conf file on each node as appropriate for each site:
– On the Miami site, the krod and maddi nodes added the following new line:
  htcdg01   htcd04   35764   00:0A
– On the Austin site, the jessica and bina nodes added the following new line:
  htcdg01   htcd04   45306   00:20
9. Restart the horcm2 instance on each node:
/HORCM/usr/bin/horcmstart.sh 2
10. Map the devices and device group on any node. We ran the raidscan command on the
jessica node. See Table 14-3 for additional configuration details.
lsdev -Cc disk|grep hdisk|/HORCM/usr/bin/raidscan -IH2 -find inst
Table 14-3 Details on the Austin and Miami LUNs
Storage subsystem                  Port     CU    LUN     LDEV
Austin - Hitachi USPV - 45306      CL1-E    00    000F    00:20
Miami - Hitachi USPVM - 35764      CL-1B    00    0021    00:0A

hdisk name on each node (jessica, krod, bina, maddi): hdisk43
11. Verify that the htcdg01 device group pairs now show the new pair, which consists of hdisk43 on each system, as shown in Example 14-33.
Example 14-33 New LUN pairs added to the htcdg01 device group
root@jessica: pairdisplay -fd -g htcdg01 -IH2 -CLI
Group    PairVol L/R Device_File  Seq#  LDEV# P/S   Status Fence Seq#  P-LDEV# M
htcdg01  htcd01  L   hdisk38      45306 272   P-VOL PAIR   NEVER 35764 268     -
htcdg01  htcd01  R   hdisk38      35764 268   S-VOL PAIR   NEVER -     272     -
htcdg01  htcd02  L   hdisk39      45306 273   P-VOL PAIR   NEVER 35764 269     -
htcdg01  htcd02  R   hdisk39      35764 269   S-VOL PAIR   NEVER -     273     -
htcdg01  htcd04  L   hdisk43      45306 32    P-VOL PAIR   NEVER 35764 10      -
htcdg01  htcd04  R   hdisk43      35764 10    S-VOL PAIR   NEVER -     32      -
You are now ready to use C-SPOC to create a volume group:
1. From the command line, type the smitty cl_admin command.
2. In SMIT, select the path System Management (C-SPOC)  Storage  Volume Groups  Create a Volume Group.
3. In the Node Names panel, select the specific nodes. We chose all four as shown in
Figure 14-26.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups
  Create a Volume Group
  Create a Volume Group with Data Path Devices

  Node Names
  Move cursor to desired item and press F7.
  ONE OR MORE items can be selected.
  Press Enter AFTER making all selections.

    > bina
    > jessica
    > krod
    > maddi

  F1=Help     F2=Refresh     F3=Cancel
  F7=Select   F8=Image       F10=Exit
  Enter=Do    /=Find         n=Find Next
Figure 14-26 Selecting a volume group node
4. In the Physical Volume Names panel (Figure 14-27), select hdisk43.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups

  Physical Volume Names
  Move cursor to desired item and press F7.
  ONE OR MORE items can be selected.
  Press Enter AFTER making all selections.

    000a621aaf47ce83 ( hdisk2 on nodes bina,jessica )
    000a621aaf47ce83 ( hdisk3 on nodes krod,maddi )
    000cf1da43e72fc2 ( hdisk5 on nodes bina,jessica )
    000cf1da43e72fc2 ( hdisk6 on nodes krod,maddi )
    00cb14ce75bab41a ( hdisk43 on all selected nodes )
    00cb14ceb0f5bd25 ( hdisk4 on nodes bina,jessica )
    00cb14ceb0f5bd25 ( hdisk14 on nodes krod,maddi )

  F1=Help     F2=Refresh     F3=Cancel
  F7=Select   F8=Image       F10=Exit
  Enter=Do    /=Find         n=Find Next
Figure 14-27 Selecting an hdisk for a new volume group
5. In the Volume Group Type panel, select the volume group type. We chose Scalable as
shown in Figure 14-28.
Volume Groups

Move cursor to desired item and press Enter.

  List All Volume Groups
  Create a Volume Group
  Create a Volume Group with Data Path Devices
  Set Characteristics of a Volume Group

  Volume Group Type
  Move cursor to desired item and press Enter.

    Legacy
    Original
    Big
    Scalable

  F1=Help     F2=Refresh     F3=Cancel
  F8=Image    F10=Exit       Enter=Do
  /=Find      n=Find Next
Figure 14-28 Selecting the volume group type for a new volume group
6. In the Create a Scalable Volume Group panel, select the proper resource group. We chose
emlecRG as shown in Figure 14-29.
Create a Scalable Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                  [Entry Fields]
  Node Names                                           bina,jessica,krod,mad>
  Resource Group Name                                 [emlecRG]                +
  PVID                                                 00cb14ce75bab41a
  VOLUME GROUP name                                   [truetarahvg]
  Physical partition SIZE in megabytes                 4                       +
  Volume group MAJOR NUMBER                           [58]                     #
  Enable Cross-Site LVM Mirroring Verification         false                   +
  Enable Fast Disk Takeover or Concurrent Access       no                      +
  Volume Group Type                                    Scalable
  Maximum Physical Partitions in units of 1024         32                      +
  Maximum Number of Logical Volumes                    256                     +
Figure 14-29 Create a volume group final C-SPOC SMIT menu
7. Choose a volume group name. We chose truetarahvg. Press Enter.
8. Verify that the volume group is successfully created, which we do on all four nodes as
shown in Example 14-34.
Example 14-34 Newly created volume group on all nodes
root@jessica: lspv |grep truetarahvg
hdisk43         00cb14ce75bab41a    truetarahvg
root@bina: lspv |grep truetarahvg
hdisk43         00cb14ce75bab41a    truetarahvg
root@krod: lspv |grep truetarahvg
hdisk43         00cb14ce75bab41a    truetarahvg
root@maddi: lspv |grep truetarahvg
hdisk43         00cb14ce75bab41a    truetarahvg
When you create the volume group this way, it is automatically added to the resource group as shown in Example 14-35. We do not have to change the resource group any further because the new disk and device are added to the same device group and TrueCopy/HUR replicated resource.
Example 14-35 Newly added volume group also added to the resource group
Resource Group Name                        emlecRG
Participating Node Name(s)                 jessica bina maddi
Startup Policy                             Online On Home Node Only
Fallover Policy                            Fallover To Next Priority Node
Fallback Policy                            Never Fallback
Site Relationship                          Prefer Primary Site
Node Priority
Service IP Label                           service_1
Volume Groups                              truesyncvg truetarahvg
Hitachi TrueCopy Replicated Resources      truelee
9. Repeat the steps in 14.6.2, “Adding a new logical volume” on page 466, to create a new
logical volume, named tarahlv on the newly created volume group truetarahvg.
Example 14-36 shows the new logical volume.
Example 14-36 New logical volume on newly added volume group
root@jessica: lsvg -l truetarahvg
truetarahvg:
LV NAME        TYPE    LPs   PPs   PVs   LV STATE        MOUNT POINT
tarahlv        raw     25    25    1     closed/syncd    N/A
10. Manually run the cl_verify_tc_config command to verify that the addition of the new replicated resources is complete.
Important: During our testing, we encountered a defect after the second volume group
was added to the resource group. The cl_verify_tc_config command produced the
following error messages:
cl_verify_tc_config: ERROR - Disk hdisk38 included in Device Group htcdg01 does
not match any hdisk in Volume Group truetarahvg.
cl_verify_tc_config: ERROR - Disk hdisk39 included in Device Group htcdg01 does
not match any hdisk in Volume Group truetarahvg.
cl_verify_tc_config: ERROR - Disk hdisk42 included in Device Group htcdg01 does
not match any hdisk in Volume Group truetarahvg.
Errors found verifying the HACMP TRUECOPY/HUR configuration. Status=3
These results incorrectly imply a one-to-one relationship between the device group (replicated resource) and the volume group, which is not intended. To work around
this problem, ensure that the cluster is down, do a forced synchronization, and then start
the cluster but ignore the verification errors. Usually performing both a forced
synchronization and then starting the cluster ignoring errors is not recommended. Contact
IBM support to see if a fix is available.
Synchronize the resource group change to include the new volume group that you just added. Usually you can perform this task within a running cluster. However, because of the defect mentioned in the previous Important box, we had to have the cluster down to synchronize it. To perform this task, follow these steps:
1. From the command line, type the smitty hacmp command.
2. In SMIT, select the path Extended Configuration  Extended Verification and Synchronization.
3. In the HACMP Verification and Synchronization display (Figure 14-30), for Force
synchronization if verification fails, select Yes.
HACMP Verification and Synchronization

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                       [Entry Fields]
* Verify, Synchronize or Both                         [Both]                   +
* Automatically correct errors found during           [No]                     +
  verification?
* Force synchronization if verification fails?        [Yes]                    +
* Verify changes only?                                [No]                     +
* Logging                                             [Standard]               +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
Figure 14-30 Extended Verification and Synchronization SMIT menu
4. Verify that the information is correct, and press Enter. Upon completion, the cluster configuration is in sync and can now be tested.
5. Repeat the steps for a rolling site failure as explained in 14.5.2, "Rolling site failure of the Austin site" on page 457. In this scenario, the tests are successful.
Testing failover after adding a new volume group
To confirm that the cluster still fails over as expected after adding the new volume group, repeat the steps of a rolling site failure as explained in 14.5.2, "Rolling site failure of the Austin site" on page 457. The new volume group truetarahvg and the new logical volume tarahlv are displayed on each node. However, there is a noticeable difference in the total time of the site failover while the lazy update applies the volume group changes.
Appendix A. CAA cluster commands
This appendix provides a list of the Cluster Aware AIX (CAA) administration commands, and
examples of how to use them. The information about these commands has been gathered
from the new AIX man pages and placed in this appendix for your reference. This list is not an
exhaustive list of all new commands, but focuses on commands that you might come across
during the administration of your PowerHA cluster.
This appendix includes the following topics:
The lscluster command
The mkcluster command
The rmcluster command
The chcluster command
The clusterconf command
The lscluster command
The lscluster command lists the cluster configuration information.
Syntax
lscluster -i [ -n ] | -s | -m | -d | -c
Description
The lscluster command shows the attributes that are associated with the cluster and the
cluster configuration.
Flags
-i   Lists the cluster configuration interfaces on the local node.
-n   Allows the cluster name to be queried for all interfaces (applicable only with the -i flag).
-s   Lists the cluster network statistics on the local node.
-m   Lists the cluster node configuration information.
-d   Lists the cluster storage interfaces.
-c   Lists the cluster configuration.
Examples
To list the cluster configuration for all nodes, enter the following command:
lscluster -m
To list the cluster statistics for the local node, enter the following command:
lscluster -s
To list the interface information for the local node, enter the following command:
lscluster -i
To list the interface information for the cluster, enter the following command:
lscluster -i -n mycluster
To list the storage interface information for the cluster, enter the following command:
lscluster -d
To list the cluster configuration, enter the following command:
lscluster -c
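To inspect a single node, you can also filter the lscluster -m output. The following is a small illustration, where nodeA is a placeholder host name and grep -p on AIX prints the whole paragraph that contains the match:
lscluster -m | grep -p nodeA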
The mkcluster command
The mkcluster command creates a cluster.
Syntax
mkcluster [ -n clustername ] [ -m node[,...] ] -r reposdev [-d shareddisk [,...]]
[-s multaddr_local ] [-v ]
Description
The mkcluster command creates a cluster. Each node that is added to the cluster must have
common storage area network (SAN) storage devices that are configured and zoned
appropriately. The SAN storage devices are used for the cluster repository disk and for any
clustered shared disks. (The shared disks that are added to a cluster configuration share the
same name across all the nodes in the cluster.)
A multicast address is used for cluster communications between the nodes in the cluster.
Therefore, if any network considerations must be reviewed before creating a cluster, consult
your network systems administrator.
Flags
-n clustername
Sets the name of the local cluster being created. If no name is
specified when you first run the mkcluster command, a default of
SIRCOL_hostname is used, where hostname is the name
(gethostname()) of the local host.
-m node[,...]
Lists the comma-separated resolvable host names or IP addresses for
nodes that are members of the cluster. The local host must be
included in the list. If the -m option is not used, the local host is implied,
causing a one-node local cluster to be created.
-r reposdev
Specifies the name, such as hdisk10, of the SAN-shared storage
device that is used as the central repository for the cluster
configuration data. This device must be accessible from all nodes.
This device is required to be a minimum of 1 GB in size and backed by
a redundant and highly available SAN configuration. This flag is
required when you first run the mkcluster command within a Storage
Interconnected Resource Collection (SIRCOL), and cannot be used
thereafter.
-d shareddisk[,...] Specifies a comma-separated list of SAN-shared storage devices,
such as hdisk12,hdisk34, to be incorporated into the cluster
configuration.
These devices are renamed with a cldisk prefix. The same name is
assigned to this device on all cluster nodes from which the device is
accessible. Specified devices must not be open when the mkcluster
command is executed. This flag is used only when you first run the
mkcluster command.
-s multaddr_local
Sets the multicast address of the local cluster that is being created.
This address is used for internal communication within the local
cluster. If the -s option is not specified when you first run the
mkcluster command within a SIRCOL, a multicast address is
automatically generated. This flag is used only when you first run the
mkcluster command within a SIRCOL.
-v
Specifies the verbose mode.
Examples
To create a cluster of one node and use the default values, enter the following command:
mkcluster -r hdisk1
The output is a cluster named SIRCOL_myhostname with a single node in the cluster. The
multicast address is automatically generated, and no shared disks are created for this
cluster. The repository device is set up on hdisk1, and this disk cannot be used by the
node for any other purpose. The repository device is now dedicated to being the cluster
repository disk.
To create a multinode cluster, enter the following command:
mkcluster -n mycluster -m nodeA,nodeB,nodeC -r hdisk1 -d
hdisk10,hdisk11,hdisk12
The result is a three-node cluster with the specified name that otherwise uses the default values, and the multicast address is automatically generated. Three disks are created as shared clustered disks for this cluster, and these disks share
the same name across all the nodes in this cluster. You can run the lspv command to see
the new names after the cluster is created. The repository device is set up on hdisk1 and
cannot be used by any of the nodes for any other purpose. The repository device is now
dedicated to being the cluster repository disk. A volume group is created for the cluster
repository disk. These logical volumes are used exclusively by the clustering subsystem.
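You can also set the multicast address explicitly with the -s flag instead of letting it be generated automatically. The following is a sketch, where the multicast address is only an illustrative value:
mkcluster -n mycluster -m nodeA,nodeB,nodeC -r hdisk1 -s 228.1.1.10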
The rmcluster command
The rmcluster command removes the cluster configuration.
Syntax
rmcluster -n name [-f] [-v]
Description
The rmcluster command removes the cluster configuration. The repository disk and all SAN
Volume Controller (SVC) shared disks are released, and the SAN shared disks are
re-assigned to a generic hdisk name. The generic hdisk name cannot be the same name that
was initially used to add the disk to the cluster.
Flags
-n name    Specifies the name of the cluster to be removed.
-f         Forces certain errors to be ignored.
-v         Specifies verbose mode.
Example
To remove the cluster configuration, enter the following command:
rmcluster -n mycluster
The chcluster command
The chcluster command is used to change the cluster configuration.
Syntax
chcluster [ -n name ] [{ -d | -m } [+|-] name [,....]] ..... [ -q ][ -f ][ -v ]
Description
The chcluster command changes the cluster configuration. With this command, SAN shared
disks and nodes can be added and removed from the cluster configuration.
Flags
-d [+|-]shareddisk[,...]
     Specifies a comma-separated list of shared storage-device names to be added to or
     removed from a cluster configuration. The new shared disks are renamed with a cldisk
     prefix. The same name is assigned to this device on all cluster nodes from which the
     device can be accessed. Deleted devices are re-assigned a generic hdisk name. This
     newly reassigned hdisk name might not be the same as it was before it was added to the
     cluster configuration. The shared disks must not be open when the chcluster command is
     executed.
-m [+|-]node[,...]
     Specifies a comma-separated list of node names to be added to or removed from the
     cluster configuration.
-n name
     Specifies the name of the cluster to be changed. If omitted, the default cluster is used.
-q   The quick mode option, which performs the changes on the local node only. If this
     option is used, the other nodes in the cluster configuration are asynchronously
     contacted and the changes are performed.
-f   The force option, which causes certain errors to be ignored.
-v   Verbose mode.
Examples
To add shared disks to the cluster configuration, enter the following command:
chcluster -n mycluster -d +hdisk20,+hdisk21
To remove shared disks from the cluster configuration, enter the following command:
chcluster -n mycluster -d -hdisk20,-hdisk21
To add nodes to the cluster configuration, enter the following command:
chcluster -n mycluster -m +nodeD,+nodeE
To remove nodes from the cluster configuration, enter the following command:
chcluster -n mycluster -m -nodeD,-nodeE
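The syntax also allows the -d and -m flags to be combined in a single invocation. The following is a sketch; the node and disk names are illustrative:
chcluster -n mycluster -m +nodeD -d +hdisk20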
The clusterconf command
The clusterconf command is a service utility for administration of a cluster configuration.
Syntax
clusterconf [ -u [-f ] | -s | -r hdiskN ] [-v ]
Description
The clusterconf command allows administration of the cluster configuration. A node in a
cluster configuration might indicate a status of DOWN (viewable by issuing the lscluster -m
command). Alternatively, a node in a cluster might not be displayed in the cluster configuration,
and you know the node is part of the cluster configuration (viewable from another node in the
cluster by using the lscluster -m command). In these cases, the following flags allow the
node to search and read the repository disk and take self-correcting actions.
Do not use the clusterconf command option to remove a cluster configuration. Instead, use
the rmcluster command for normal removal of the cluster configuration.
Flags
If no flags are specified, the clusterconf command performs a refresh operation by retrieving
the cluster repository configuration and performing the necessary actions. The following
actions might occur:
A cluster node joins a cluster of which it is a member but from which it was disconnected for some reason (either network or SAN problems).
A cluster node might perform a resync with the cluster repository configuration (again, because of problems in the network or SAN).
A cluster node might leave the cluster configuration if the node was removed from the cluster repository configuration.
The clusterconf command is a normal cluster service and is handled automatically during normal operation. The following flags are possible for this command:
-r hdiskN
Has the cluster subsystem read the repository device if you know where the
repository disk is (lspv and look for cvg). It causes the node to join the cluster if
the node is configured in the repository disk.
-s
Performs an exhaustive search for a cluster repository disk on all configured
hdisk devices. It stops when a cluster repository disk is found. This option
searches all disks that are looking for the signature of a repository device. If a
disk is found with the signature identifying it as the cluster repository, the search
is stopped. If the node finds itself in the cluster configuration on the disk, the
node joins the cluster. If the storage network is dirty and multiple repositories
are in the storage network (not supported), it stops at the first repository disk. If
the node is not in that repository configuration, it does not join the cluster.
Use the -v flag to see which disk was found. Then use the other options on the
clusterconf command to clean up the storage network until the desired results
are achieved.
-u
Performs the unconfigure operation for the local node. If the node is in the
cluster repository configuration on the shared disk to which the other nodes
have access, the other nodes in the cluster request this node to rejoin the
cluster. The -u option is used when cleanup must be performed on the local
node. (The node was removed from the cluster configuration. For some reason,
the local node was either down or inaccessible from the network to be removed
during normal removal operations such as when the chcluster -m -nodeA
command was run). The updates to clean up the environment on the local node
are performed by the unconfigure operation.
-f
The force option, which performs the unconfigure operation and ignores errors.
-v
Verbose mode.
Examples
To clean up the local node's environment, enter the following command:
clusterconf -fu
To recover the cluster configuration and start cluster services, enter the following
command:
clusterconf -r hdisk1
To search for the cluster repository device and join the cluster, enter the following
command:
clusterconf -s
Appendix B. PowerHA SMIT tree
This appendix includes the PowerHA v7.1 SMIT tree. Depending on the version of PowerHA
that you have installed, you might notice some differences.
Note the following explanation to help you understand how to read the tree:
The number of right-pointing double quotation marks (») indicates the number of screens
that you have to go down in the PowerHA SMIT tree. For example, » » » means that you
must page down three screens.
The double en dashes (--) are used as a separator between the SMIT text and the SMIT
fast path.
The parentheses (()) indicate the fast path.
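For example, to jump directly to a menu, pass its fast path to the smitty command. The following illustration uses the first fast path in the tree; any other fast path listed below can be substituted, assuming it is valid on your level of PowerHA:
smitty cm_cluster_nodes_networks    # opens the Cluster Nodes and Networks menu directly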
» Cluster Nodes and Networks -- (cm_cluster_nodes_networks)
» » Initial Cluster Setup (Typical) -- (cm_setup_menu)
» » » Setup a Cluster, Nodes and Networks -- (cm_setup_cluster_nodes_networks)
» » » Define Repository Disk and Cluster IP Address -- cm_define_repos_ip_addr)
» » » What are a repository disk and cluster IP address ? -- (cm_whatis_repos_ip_addr)
» » Manage the Cluster -- (cm_manage_cluster)
» » » PowerHA SystemMirror Configuration -- (cm_show_cluster_top)
» » » Remove the Cluster Definition -- (cm_remove_cluster)
» » » Snapshot Configuration -- (cm_cfg_snap_menu)
» » » » Create a Snapshot of the Cluster Configuration -- (cm_add_snap.dialog)
» » » » Change/Show a Snapshot of the Cluster Configuration -- (cm_show_snap.select)
» » » » Remove a Snapshot of the Cluster Configuration -- (cm_rm_snap.select)
» » » » Restore the Cluster Configuration From a Snapshot -- (cm_apply_snap.select)
» » » » Configure a Custom Snapshot Method -- (clsnapshot_custom_menu)
» » » » » Add a Custom Snapshot Method -- (clsnapshot_custom_dialog_add)
» » » » » Change/Show a Custom Snapshot Method -- (clsnapshot_custom_dialog_cha.select)
» » » » » Remove a Custom Snapshot Method -- (clsnapshot_custom_dialog_rem.select)
» » Manage Nodes -- (cm_manage_nodes)
» » » Show Topology Information by Node -- (cllsnode_menu)
» » » » Show All Nodes -- (cllsnode.dialog)
» » » » Select a Node to Show -- (cllsnode_select)
» » » Add a Node -- (cm_add_node)
» » » Change/Show a Node -- (cm_change_show_node)
» » » Remove Nodes -- (cm_remove_node)
» » » Configure Persistent Node IP Label/Addresses -- (cm_persistent_addresses)
» » » » Add a Persistent Node IP Label/Address -- (cm_add_a_persistent_node_ip_label_address_select)
» » » » Change/Show a Persistent Node IP Label/Address -- (cm_change_show_a_persistent_node_ip_label_address_select)
» » » » Remove a Persistent Node IP Label/Address -- (cm_delete_a_persistent_node_ip_label_address_select)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » Manage Networks and Network Interfaces -- (cm_manage_networks_interfaces)
» » » Networks -- (cm_manage_networks_menu)
» » » » Add a Network -- (cm_add_network)
» » » » Change/Show a Network -- (cm_change_show_network)
» » » » Remove a Network -- (cm_remove_network)
» » » Network Interfaces -- (cm_manage_interfaces_menu)
» » » » Add a Network Interface -- (cm_add_interfaces)
» » » » Change/Show a Network Interface -- (cm_change_show_interfaces)
» » » » Remove a Network Interface -- (cm_remove_interfaces)
» » » Show Topology Information by Network -- (cllsnw_menu)
» » » » Show All Networks -- (cllsnw.dialog)
» » » » Select a Network to Show -- (cllsnw_select)
» » » Show Topology Information by Network Interface -- (cllsif_menu)
» » » » Show All Network Interfaces -- (cllsif.dialog)
» » » » Select a Network Interface to Show -- (cllsif_select)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » Discover Network Interfaces and Disks -- (cm_discover_nw_interfaces_and_disks)
» » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» Cluster Applications and Resources -- (cm_apps_resources)
» » Make Applications Highly Available (Use Smart Assists) -- (clsa)
» » Resources -- (cm_resources_menu)
» » » Configure User Applications (Scripts and Monitors) -- (cm_user_apps)
» » » » Application Controller Scripts -- (cm_app_scripts)
» » » » » Add Application Controller Scripts -- (cm_add_app_scripts)
» » » » » Change/Show Application Controller Scripts -- (cm_change_show_app_scripts)
» » » » » Remove Application Controller Scripts -- (cm_remove_app_scripts)
» » » » » What is an "Application Controller" anyway ? -- (cm_app_controller_help)
» » » » Application Monitors -- (cm_appmon)
» » » » » Configure Process Application Monitors -- (cm_cfg_process_appmon)
» » » » » » Add a Process Application Monitor -- (cm_add_process_appmon)
» » » » » » Change/Show Process Application Monitor -- (cm_change_show_process_appmon)
» » » » » » Remove a Process Application Monitor -- (cm_remove_process_appmon)
» » » » » Configure Custom Application Monitors -- (cm_cfg_custom_appmon)
» » » » » » Add a Custom Application Monitor -- (cm_add_custom_appmon)
» » » » » » Change/Show Custom Application Monitor -- (cm_change_show_custom_appmon)
» » » » » » Remove a Custom Application Monitor -- (cm_remove_custom_appmon)
» » » » Configure Application for Dynamic LPAR and CoD Resources -- (cm_cfg_appondemand)
» » » » » Configure Communication Path to HMC -- (cm_cfg_apphmc)
» » » » » » Add HMC IP addresses for a node -- (cladd_apphmc.dialog)
» » » » » » Change/Show HMC IP addresses for a node -- (clch_apphmc.select)
» » » » » » Remove HMC IP addresses for a node -- (clrm_apphmc.select)
» » » » » Configure Dynamic LPAR and CoD Resources for Applications -- (cm_cfg_appdlpar)
» » » » » » Add Dynamic LPAR and CoD Resources for Applications -- (cm_add_appdlpar)
» » » » » » Change/Show Dynamic LPAR and CoD Resources for Applications -- (cm_change_show_appdlpar)
» » » » » » Remove Dynamic LPAR and CoD Resources for Applications -- (cm_remove_appdlpar)
» » » » Show Cluster Applications -- (cldisp.dialog)
» » » Configure Service IP Labels/Addresses -- (cm_service_ip)
» » » » Add a Service IP Label/Address -- (cm_add_a_service_ip_label_address.select_net)
» » » » Change/Show a Service IP Label/Address -- (cm_change_service_ip.select)
» » » » Remove Service IP Label(s)/Address(es) -- (cm_delete_service_ip.select)
» » » » Configure Service IP Label/Address Distribution Preferences -- (cm_change_show_service_ip_distribution_preference_select)
» » » Configure Tape Resources -- (cm_cfg_tape)
» » » » Add a Tape Resource -- (cm_add_tape)
» » » » Change/Show a Tape Resource -- (cm_change_tape)
» » » » Remove a Tape Resource -- (cm_remove_tape)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » Resource Groups -- (cm_resource_groups)
» » » Add a Resource Group -- (cm_add_resource_group)
» » » Change/Show Nodes and Policies for a Resource Group -(cm_change_show_rg_nodes_policies)
» » » Change/Show Resources and Attributes for a Resource Group -(cm_change_show_rg_resources)
» » » Remove a Resource Group -- (cm_remove_resource_group)
» » » Configure Resource Group Run-Time Policies -(cm_config_resource_group_run-time_policies_menu_dmn)
» » » » Configure Dependencies between Resource Groups -- (cm_rg_dependencies_menu)
» » » » » Configure Parent/Child Dependency -- (cm_rg_dependencies)
» » » » » » Add Parent/Child Dependency between Resource Groups -(cm_rg_dependencies add.select)
» » » » » » Change/Show Parent/Child Dependency between Resource Groups -(cm_rg_dependencies ch.select)
» » » » » » Remove Parent/Child Dependency between Resource Groups -(cm_rg_dependencies rm.select)
» » » » » » Display All Parent/Child Resource Group Dependencies -(cm_rg_dependencies display.select)
» » » » » Configure Start After Resource Group Dependency -(cm_rg_dependencies_startafter_main_menu)
» » » » » » Add Start After Resource Group Dependency -- (cm_rg_dependencies add.select startafter)
» » » » » » Change/Show Start After Resource Group Dependency -(cm_rg_dependencies ch.select startafter)
» » » » » » Remove Start After Resource Group Dependency -(cm_rg_dependencies rm.select startafter)
» » » » » » Display Start After Resource Group Dependencies -(cm_rg_dependencies display.select startafter)
» » » » » Configure Stop After Resource Group Dependency -(cm_rg_dependencies_stopafter_main_menu)
» » » » » » Add Stop After Resource Group Dependency -(cm_rg_dependencies add.select stopafter)
» » » » » » Change/Show Stop After Resource Group Dependency -(cm_rg_dependencies ch.select stopafter)
» » » » » » Remove Stop After Resource Group Dependency -(cm_rg_dependencies rm.select stopafter)
» » » » » » Display Stop After Resource Group Dependencies -(cm_rg_dependencies display.select stopafter)
» » » » » Configure Online on the Same Node Dependency -- (cm_rg_osn_dependencies)
» » » » » » Add Online on the Same Node Dependency Between Resource Groups -(cm_rg_osn_dependencies add.dialog)
» » » » » » Change/Show Online on the Same Node Dependency Between Resource Groups -(cm_rg_osn_dependencies ch.select)
» » » » » » Remove Online on the Same Node Dependency Between Resource -(cm_rg_osn_dependencies rm.select)
» » » » » Configure Online on Different Nodes Dependency -- (cm_rg_odn_dependencies.dialog)
» » » » Configure Resource Group Processing Ordering -- (cm_processing_order)
» » » » Configure PowerHA SystemMirror Workload Manager Parameters -- (cm_cfg_wlm_runtime)
» » » » Configure Delayed Fallback Timer Policies -- (cm_timer_menu)
» » » » » Add a Delayed Fallback Timer Policy -- (cm_timer_add.select)
» » » » » Change/Show a Delayed Fallback Timer Policy -- (cm_timer_update.select)
» » » » » Remove a Delayed Fallback Timer Policy -- (cm_timer_remove.select)
» » » » Configure Settling Time for Resource Groups -- (cm_settling_timer_menu)
» » » Show All Resources by Node or Resource Group -- (cm_show_all_resources_by_node_or_resource_group_menu_dmn)
» » » » Show Resource Information by Node -- (cllsres.select)
» » » » Show Resource Information by Resource Group -- (clshowres.select)
» » » » Show Current State of Applications and Resource Groups -(cm_show_current_state_application_resource_group_menu_dwn)
» » » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» » » What is a "Resource Group" anyway ? -- (cm_resource_group_help)
» » Verify and Synchronize Cluster Configuration -- (cm_ver_and_sync)
» System Management (C-SPOC) -- (cm_system_management_cspoc_menu_dmn)
» » Storage -- (cl_lvm)
» » » Volume Groups -- (cl_vg)
» » » » List All Volume Groups -- (cl_lsvgA)
» » » » Create a Volume Group -- (cl_createvg)
» » » » Create a Volume Group with Data Path Devices -- (cl_createvpathvg)
» » » » Set Characteristics of a Volume Group -- (cl_vgsc)
» » » » » Add a Volume to a Volume Group -- (cl_extendvg)
» » » » » Change/Show characteristics of a Volume Group -- (cl_chshsvg)
» » » » » Remove a Volume from a Volume Group -- (cl_reducevg)
» » » » » Enable/Disable a Volume Group for Cross-Site LVM Mirroring Verification -(hacmp_sm_lv_svg_sc_ed)
» » » » Enable a Volume Group for Fast Disk Takeover or Concurrent Access -- (cl_vgforfdto)
» » » » Import a Volume Group -- (cl_importvg)
» » » » Mirror a Volume Group -- (cl_mirrorvg)
» » » » Unmirror a Volume Group -- (cl_unmirrorvg)
» » » » Manage Critical Volume Groups -- (cl_manage_critical_vgs)
» » » » » Mark a Volume Group as Critical -- (cl_mark_critical_vg.select)
» » » » » Show all Critical volume groups -- (cl_show_critical_vgs)
» » » » » Mark a Volume Group as non-Critical -- (cl_mark_noncritical_vg.select)
» » » » » Configure failure actions for Critical Volume Groups -- (cl_set_critical_vg_response)
» » » » Synchronize LVM Mirrors -- (cl_syncvg)
» » » » » Synchronize by Volume Group -- (cl_syncvg_vg)
» » » » » Synchronize by Logical Volume -- (cl_syncvg_lv)
» » » » Synchronize a Volume Group Definition -- (cl_updatevg)
» » » Logical Volumes -- (cl_lv)
» » » » List All Logical Volumes by Volume Group -- (cl_lslv0)
» » » » Add a Logical Volume -- (cl_mklv)
» » » » Show Characteristics of a Logical Volume -- (cl_lslv)
» » » » Set Characteristics of a Logical Volume -- (cl_lvsc)
» » » » » Rename a Logical Volume -- (cl_renamelv)
» » » » » Increase the Size of a Logical Volume -- (cl_extendlv)
» » » » » Add a Copy to a Logical Volume -- (cl_mklvcopy)
» » » » » Remove a Copy from a Logical Volume -- (cl_rmlvcopy)
» » » » Change a Logical Volume -- (cl_chlv1)
» » » » Remove a Logical Volume -- (cl_rmlv1)
» » » File Systems -- (cl_fs)
» » » » List All File Systems by Volume Group -- (cl_lsfs)
» » » » Add a File System -- (cl_mkfs)
» » » » Change / Show Characteristics of a File System -- (cl_chfs)
» » » » Remove a File System -- (cl_rmfs)
» » » Physical Volumes -- (cl_disk_man)
» » » » Add a Disk to the Cluster -- (cl_disk_man add nodes)
» » » » Remove a Disk From the Cluster -- (cl_disk_man rem nodes)
» » » » Cluster Disk Replacement -- (cl_disk_man.replace)
» » » » Cluster Data Path Device Management -- (cl_dpath_mgt)
» » » » » Display Data Path Device Configuration -- (cl_dpls_cfg.select)
» » » » » Display Data Path Device Status -- (cl_dp_stat.select)
» » » » » Display Data Path Device Adapter Status -- (cl_dpdadapter_stat.select)
» » » » » Define and Configure all Data Path Devices -- (cl_dpdefcfg_all.select)
» » » » » Add Paths to Available Data Path Devices -- (cl_dpaddpaths.select)
» » » » » Configure a Defined Data Path Device -- (cl_dpconfdef.select)
» » » » » Remove a Data Path Device -- (cl_dprmvp.select)
» » » » » Convert ESS hdisk Device Volume Group to an SDD VPATH Device -(cl_dphd2vp.select)
» » » » » Convert SDD VPATH Device Volume Group to an ESS hdisk Device -(cl_dpvp2hd.select)
» » » » Configure Disk/Site Locations for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds)
» » » » » Add Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_ad)
» » » » » Change/Show Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_cs)
» » » » » Remove Disk/Site Definition for Cross-Site LVM Mirroring -- (hacmp_sm_pv_xsm_ds_rm)
» » PowerHA SystemMirror Services -- (cl_cm_startstop_menu)
» » » Start Cluster Services -- (clstart)
» » » Stop Cluster Services -- (clstop)
» » » Show Cluster Services -- (clshowsrv.dialog)
» » Communication Interfaces -(cm_hacmp_communication_interface_management_menu_dmn)
» » » Configure Communication Interfaces/Devices to the Operating System on a Node -(cm_config_comm_dev_node.select)
» » » Update PowerHA SystemMirror Communication Interface with AIX Settings -(cm_update_hacmp_interface_with_aix_settings)
» » » Swap IP Addresses between Communication Interfaces -- (cl_swap_adapter)
» » » PCI Hot Plug Replace a Network Interface Card --(cl_pcihp)
» » Resource Groups and Applications -(cm_hacmp_resource_group_and_application_management_menu)
» » » Show the Current State of Applications and Resource Groups -(cm_show_current_state_application_resource_group_menu_dwn)
» » » Bring a Resource Group Online -- (cl_resgrp_start.select)
» » » Bring a Resource Group Offline -- (cl_resgrp_stop.select)
» » » Move Resource Groups to Another Node -- (cl_resgrp_move_node.select)
» » » Suspend/Resume Application Monitoring -- (cm_suspend_resume_menu)
» » » » Suspend Application Monitoring -- (cm_suspend_appmon.select)
» » » » Resume Application Monitoring -- (cm_resume_appmon.select)
» » » Application Availability Analysis -- (cl_app_AAA.dialog)
» » PowerHA SystemMirror Logs -- (cm_hacmp_log_viewing_and_management_menu_dmn)
» » » View/Save/Delete PowerHA SystemMirror Event Summaries -- (cm_dsp_evs)
» » » » View Event Summaries -- (cm_show_evs)
» » » » Save Event Summaries to a file -- (dspevs.dialog)
» » » » Delete Event Summary History -- (cm_del_evs)
» » » View Detailed PowerHA SystemMirror Log Files -- (cm_log_menu)
» » » » Scan the PowerHA SystemMirror for AIX Scripts log -- (cm_scan_scripts_log_select)
» » » » Watch the PowerHA SystemMirror for AIX Scripts log -- (cm_watch_scripts_log.dialog)
» » » » Scan the PowerHA SystemMirror for AIX System log -- (cm_scan_syslog.dialog)
» » » » Watch the PowerHA SystemMirror for AIX System log -- (cm_watch_syslog.dialog)
» » » » Scan the C-SPOC System Log File -- (cl_scan_syslog.dialog)
» » » » Watch the C-SPOC System Log File -- (cl_watch_syslog.dialog)
» » » Change/Show PowerHA SystemMirror Log File Parameters -- (cm_run_time.select)
» » » Change/Show Cluster Manager Log File Parameters -- (cluster_manager_log_param)
» » » Change/Show a Cluster Log Directory -- (clusterlog_redir.select)
» » » Change All Cluster Logs Directory -- (clusterlog_redirall_cha)
» » » Collect Cluster log files for Problem Reporting -- (cm_clsnap_dialog)
» » File Collections -- (cm_filecollection_menu)
» » » Manage File Collections -- (cm_filecollection_mgt)
» » » » Add a File Collection -- (cm_filecollection_add)
» » » » Change/Show a File Collection -- (cm_filecollection_ch)
» » » » Remove a File Collection -- (cm_filecollection_rm)
» » » » Change/Show Automatic Update Time -- (cm_filecollection_time)
» » » Manage File in File Collections -- (cm_filesinfilecollection_mgt)
» » » » Add Files to a File Collection -- (cm_filesinfilecollection_add)
» » » » Remove Files from a File Collection -- (cm_filesfromfilecollection_selectfc)
» » » Propagate Files in File Collections -- (cm_filecollection_prop)
» » Security and Users -- (cl_usergroup)
» » » PowerHA SystemMirror Cluster Security -- (cm_config_security)
» » » » Configure Connection Authentication Mode -- (cm_config_security.connection)
» » » » Configure Message Authentication Mode and Key Management -(cm_config_security.message)
» » » » » Configure Message Authentication Mode -- (cm_config_security.message_dialog)
» » » » » Generate/Distribute a Key -- (cm_config_security.message_key_dialog)
» » » » » Enable/Disable Automatic Key Distribution -- (cm_config_security.keydist_message_dialog)
» » » » » Activate the new key on all PowerHA SystemMirror cluster node -(cm_config_security.keyrefr_message_dialog)
» » » Users in an PowerHA SystemMirror cluster -- (cl_users)
» » » » Add a User to the Cluster -- (cl_mkuser)
» » » » Change / Show Characteristics of a User in the Cluster -- (cl_chuser)
» » » » Remove a User from the Cluster -- (cl_rmuser)
» » » » List Users in the Cluster -- (cl_lsuser.hdr)
» » » Groups in an PowerHA SystemMirror cluster -- (cl_groups)
» » » » List All Groups in the Cluster -- (cl_lsgroup.hdr)
» » » » Add a Group to the Cluster -- (cl_mkgroup)
» » » » Change / Show Characteristics of a Group in the Cluster -- (cl_chgroup)
» » » » Remove a Group from the Cluster -- (cl_rmgroup)
» » » Passwords in an PowerHA SystemMirror cluster -- (cl_passwd)
» » » » Change a User's Password in the Cluster -- (cl_chpasswd)
» » » » Change Current Users Password -- (cl_chuserpasswd)
» » » » Manage List of Users Allowed to Change Password -- (cl_manageusers)
» » » » List Users Allowed to Change Password -- (cl_listmanageusers)
» » » » Modify System Password Utility -- (cl_modpasswdutil)
» » Open a SMIT Session on a Node -- (cm_open_a_smit_session_select)
» Problem Determination Tools -- (cm_problem_determination_tools_menu_dmn)
» » PowerHA SystemMirror Verification -- (cm_hacmp_verification_menu_dmn)
» » » Verify Cluster Configuration -- (clverify.dialog)
» » » Configure Custom Verification Method -- (clverify_custom_menu)
» » » » Add a Custom Verification Method -- (clverify_custom_dialog_add)
» » » » Change/Show a Custom Verification Method -- (clverify_custom_dialog_cha.select)
» » » » Remove a Custom Verification Method -- (clverify_custom_dialog_rem.select)
» » » Automatic Cluster Configuration Monitoring -- (clautover.dialog)
» » View Current State -- (cm_view_current_state_menu_dmn)
» » PowerHA SystemMirror Log Viewing and Management -(cm_hacmp_log_viewing_and_management_menu_dmn)
» » » View/Save/Delete PowerHA SystemMirror Event Summaries -- (cm_dsp_evs)
» » » » View Event Summaries -- (cm_show_evs)
» » » » Save Event Summaries to a file -- (dspevs.dialog)
» » » » Delete Event Summary History -- (cm_del_evs)
» » » View Detailed PowerHA SystemMirror Log Files -- (cm_log_menu)
» » » » Scan the PowerHA SystemMirror for AIX Scripts log -- (cm_scan_scripts_log_select)
» » » » Watch the PowerHA SystemMirror for AIX Scripts log -- (cm_watch_scripts_log.dialog)
» » » » Scan the PowerHA SystemMirror for AIX System log -- (cm_scan_syslog.dialog)
» » » » Watch the PowerHA SystemMirror for AIX System log -- (cm_watch_syslog.dialog)
» » » » Scan the C-SPOC System Log File -- (cl_scan_syslog.dialog)
» » » » Watch the C-SPOC System Log File -- (cl_watch_syslog.dialog)
» » » Change/Show PowerHA SystemMirror Log File Parameters -- (cm_run_time.select)
» » » Change/Show Cluster Manager Log File Parameters -- (cluster_manager_log_param)
» » » Change/Show a Cluster Log Directory -- (clusterlog_redir.select)
» » » Change All Cluster Logs Directory -- (clusterlog_redirall_cha)
» » » Collect Cluster log files for Problem Reporting -- (cm_clsnap_dialog)
» » Recover From PowerHA SystemMirror Script Failure -- (clrecover.dialog.select)
» » Restore PowerHA SystemMirror Configuration Database from Active Configuration -(cm_copy_acd_2dcd.dialog)
» » Release Locks Set By Dynamic Reconfiguration -- (cldarelock.dialog)
» » Cluster Test Tool -- (hacmp_testtool_menu)
» » » Execute Automated Test Procedure -- (hacmp_testtool_auto_extended)
» » » Execute Custom Test Procedure -- (hacmp_testtool_custom)
» » PowerHA SystemMirror Trace Facility -- (cm_trace_menu)
» » » Enable/Disable Tracing of PowerHA SystemMirror for AIX daemons -- (tracessys)
» » » » Start Trace -- (tracessyson)
» » » » Stop Trace -- (tracessysoff)
» » » Start/Stop/Report Tracing of PowerHA SystemMirror for AIX Service -- (trace)
» » » » START Trace -- (trcstart)
» » » » STOP Trace -- (trcstop)
» » » » Generate a Trace Report -- (trcrpt)
» » » » Manage Event Groups -- (grpmenu)
» » » » » List all Event Groups -- (lsgrp)
» » » » » Add an Event Group -- (addgrp)
» » » » » Change/Show an Event Group -- (chgrp)
» » » » » Remove Event Groups -- (delgrp.hdr)
» » » » Manage Trace -- (mngtrace)
» » » » » Change/Show Default Values -- (cngtrace)
» » » » » Reset Original Default Values -- (rstdflts)
» » PowerHA SystemMirror Error Notification -- (cm_EN_menu)
» » » Configure Automatic Error Notification -- (cm_AEN_menu)
» » » » List Error Notify Methods for Cluster Resources -- (cm_aen_list.dialog)
» » » » Add Error Notify Methods for Cluster Resources -- (cm_aen_add.dialog)
» » » » Remove Error Notify Methods for Cluster Resources -- (cm_aen_delete.dialog)
» » » Add a Notify Method -- (cm_add_notifymeth.dialog)
» » » Change/Show a Notify Method -- (cm_change_notifymeth_select)
» » » Remove a Notify Method -- (cm_del_notifymeth_select)
» » » Emulate Error Log Entry -- (show_err_emulate.select)
» » Stop RSCT Service -- (cm_manage_rsct_stop.dialog)
» » AIX Tracing for Cluster Resources -- (cm_trc_menu)
» » » Enable AIX Tracing for Cluster Resources -- (cm_trc_enable.select)
» » » Disable AIX Tracing for Cluster Resources -- (cm_trc_disable.dialog)
» » » Manage Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_man_cmdgrp_menu)
» » » » List Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_ls_cmdgrp.dialog)
» » » » Add a Command Group for AIX Tracing for Cluster Resources -- (cm_trc_add_cmdgrp.select)
» » » » Change / Show a Command Group for AIX Tracing for Cluster Resou -(cm_trc_ch_cmdgrp.select)
» » » » Remove Command Groups for AIX Tracing for Cluster Resources -- (cm_trc_rm_cmdgrp.dialog)
» » Open a SMIT Session on a Node -- (cm_open_a_smit_session_select)
» Custom Cluster Configuration -- (cm_custom_menu)
» » Cluster Nodes and Networks -- (cm_custom_cluster_nodes_networks)
» » » Initial Cluster Setup (Custom) -- (cm_custom_setup_menu)
» » » » Cluster -- (cm_custom_setup_cluster_menu)
» » » » » Add/Change/Show a Cluster -- (cm_add_change_show_cluster)
» » » » » Remove the Cluster Definition -- (cm_remove_cluster)
» » » » Nodes -- (cm_custom_setup_nodes_menu)
» » » » » Add a Node -- (cm_add_node)
» » » » » Change/Show a Node -- (cm_change_show_node)
» » » » » Remove a Node -- (cm_remove_node)
» » » » Networks -- (cm_manage_networks_menu)
» » » » » Add a Network -- (cm_add_network)
» » » » » Change/Show a Network -- (cm_change_show_network)
» » » » » Remove a Network -- (cm_remove_network)
» » » » Network Interfaces -- (cm_manage_interfaces_menu)
» » » » » Add a Network Interface -- (cm_add_interfaces)
» » » » » Change/Show a Network Interface -- (cm_change_show_interfaces)
» » » » » Remove a Network Interface -- (cm_remove_interfaces)
» » » » Define Repository Disk and Cluster IP Address -- (cm_define_repos_ip_addr)
» » » Manage the Cluster -- (cm_custom_mgt_menu)
» » » » Cluster Startup Settings -- (cm_startup_options)
» » » » Reset Cluster Tunables -- (cm_reset_cluster_tunables)
» » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)
» » Resources -- (cm_custom_apps_resources)
» » » Custom Disk Methods -- (cldisktype_custom_menu)
» » » » Add Custom Disk Methods -- (cldisktype_custom_dialog_add)
» » » » Change/Show Custom Disk Methods -- (cldisktype_custom_dialog_cha.select)
» » » » Remove Custom Disk Methods -- (cldisktype_custom_dialog_rem.select)
» » » Custom Volume Group Methods -- (cm_config_custom_volume_methods_menu_dmn)
» » » » Add Custom Volume Group Methods -- (cm_dialog_add_custom_volume_methods)
» » » » Change/Show Custom Volume Group Methods -(cm_selector_change_custom_volume_methods)
» » » » Remove Custom Volume Group Methods -- (cm_dialog_delete_custom_volume_methods)
» » » Custom File System Methods -- (cm_config_custom_filesystem_methods_menu_dmn)
» » » » Add Custom File System Methods -- (cm_dialog_add_custom_filesystem_methods)
» » » » Change/Show Custom File System Methods -(cm_selector_change_custom_filesystem_methods)
» » » » Remove Custom File System Methods -- (cm_dialog_delete_custom_filesystem_methods)
» » » Configure User Defined Resources and Types -- (cm_cludrestype_main_menu)
» » » » Configure User Defined Resource Types -- (cm_cludrestype_sub_menu)
» » » » » Add a User Defined Resource Type -- (cm_cludrestype_add)
» » » » » Change/Show a User Defined Resource Type -- (cm_cludrestype_change)
» » » » » Remove a User Defined Resource Type -- (cm_cludrestype_remove)
» » » » Configure User Defined Resources -- (cm_cludres_sub_menu)
» » » » » Add a User Defined Resource -- (cm_cludres_add)
» » » » » Change/Show a User Defined Resource -- (cm_cludres_change)
» » » » » Remove a User Defined Resource -- (cm_cludres_remove)
» » » » » Change/Show User Defined Resource Monitor -- (cm_cludres_chmonitor)
» » » » Import User Defined Resource Types and Resources Definition from XML file -(cm_cludrestype_importxml)
» » » Customize Resource Recovery -- (_cm_change_show_resource_action_select)
» » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)
» » Events -- (cm_events)
» » » Cluster Events -- (cm_cluster_events)
» » » » Configure Pre/Post-Event Commands -- (cm_defevent_menu)
» » » » » Add a Custom Cluster Event -- (cladd_event.dialog)
» » » » » Change/Show a Custom Cluster Event -- (clchsh_event.select)
» » » » » Remove a Custom Cluster Event -- (clrm_event.select)
» » » » Change/Show Pre-Defined Events -- (clcsclev.select)
» » » » User-Defined Events -- (clude_custom_menu)
» » » » » Add Custom User-Defined Events -- (clude_custom_dialog_add)
» » » » » Change/Show Custom User-Defined Events -- (clude_custom_dialog_cha.select)
» » » » » Remove Custom User-Defined Events -- (clude_custom_dialog_rem.select)
» » » » Remote Notification Methods -- (cm_def_cus_pager_menu)
» » » » » Configure a Node/Port Pair -- (define_node_port)
» » » » » Remove a Node/Port Pair -- (remove_node_port)
» » » » » Add a Custom Remote Notification Method -- (cladd_pager_notify.dialog)
» » » » » Change/Show a Custom Remote Notification Method -- (clch_pager_notify)
» » » » » Remove a Custom Remote Notification Method -- (cldel_pager_notify)
» » » » » Send a Test Remote Notification -- (cltest_pager_notify)
» » » » Change/Show Time Until Warning -- (cm_time_before_warning)
» » » System Events -- (cm_system_events)
» » » » Change/Show Event Response -- (cm_change_show_sys_event)
» » » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)
» » Verify and Synchronize Cluster Configuration (Advanced) -- (cm_adv_ver_and_sync)
» Can't find what you are looking for ? -- (cm_tree)
» Not sure where to start ? -- (cm_getting_started)
C
Appendix C.
PowerHA supported hardware
Historically, newer versions of PowerHA inherited support from previous versions, unless
specific support was removed by the product. Over time, it has become uncommon to remove
support for old hardware. If the hardware was supported in the past and it can run a version of
AIX that is supported by the current version of PowerHA, the hardware is supported.
Because PowerHA 7.1 is not supported on any AIX level before AIX 6.1 TL6, hardware that is not
supported on AIX 6.1 TL6 is, by definition, not supported by PowerHA 7.1 either. Also, if the
hardware manufacturer has not made a statement of support for AIX 7.1, that support is not
valid until such a statement is made, even if the tables in this appendix indicate that
PowerHA supports the hardware.
This appendix contains information about IBM Power Systems, IBM storage, adapters, and
AIX levels supported by current versions of High-Availability Cluster Multi-Processing
(HACMP) 5.4.1 through PowerHA 7.1. It focuses on hardware support from around the last
five years and consists mainly of IBM POWER5 systems and later. At the time of writing, the
information was current and complete.
All POWER5 and later systems are supported on AIX 7.1 and HACMP 5.4.1 and later. AIX 7.1
support has the following specific requirements for HACMP and PowerHA:
HACMP 5.4.1, SP10
PowerHA 5.5, SP7
PowerHA 6.1, SP3
PowerHA 7.1
Full software support details are in the official support flash. The information in this appendix
is available and maintained in the “PowerHA hardware support matrix” at:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
Most of the devices in the online documentation are linked to their corresponding support flash.
This appendix includes the following topics:
IBM Power Systems
IBM storage
Adapters
IBM Power Systems
The following sections provide details about the IBM Power System servers and the levels of
PowerHA and AIX supported.
IBM POWER5 systems
Table C-1 lists the software versions for PowerHA with AIX supported on IBM POWER5
System p models.
Table C-1 POWER5 System p model support for HACMP and PowerHA
System p models | HACMP 5.4.1          | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
7037-A50        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9110-510        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9110-51A        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9111-285        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9111-520        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9113-550        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9115-505        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9116-561+       | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9117-570        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9118-575        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9119-590        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9119-595        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9131-52A        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9133-55A        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
Table C-2 lists the software versions for PowerHA with AIX supported on IBM POWER5
System i® models.
Table C-2 POWER5 System i model support for HACMP and PowerHA
System i models | HACMP 5.4.1          | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
9406-520        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9406-550        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9406-570        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9406-590        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9406-595        | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
IBM POWER6 systems
Table C-3 lists the software versions for PowerHA with AIX supported on POWER6 System p
models.
Table C-3 POWER6 System p support for PowerHA and AIX
System p models | HACMP 5.4.1                  | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
8203-E4A        | AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
8203-E8A        | AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
8234-EMA        | AIX 5.3 TL8, AIX 6.1 TL0 SP5 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9117-MMA        | AIX 5.3 TL6, AIX 6.1         | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9119-FHA        | AIX 5.3 TL8, AIX 6.1 SP1     | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
9125-F2A        | AIX 5.3 TL8, AIX 6.1 SP1     | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
Built-in serial ports: Built-in serial ports in POWER6 servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.
IBM POWER7 Systems
Table C-4 lists the software versions for HACMP and PowerHA with AIX supported on IBM
POWER7 System p models.
Table C-4 POWER7 System p support for HACMP and PowerHA
System p models | HACMP 5.4.1                       | PowerHA 5.5                       | PowerHA 6.1                   | PowerHA 7.1
8202-E4B/720    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL12, AIX 6.1 TL5         | AIX 5.3 TL12, AIX 6.1 TL5     | AIX 6.1 TL6, AIX 7.1
8205-E6B/740    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL12, AIX 6.1 TL5         | AIX 5.3 TL12, AIX 6.1 TL5     | AIX 6.1 TL6, AIX 7.1
8231-E2B/710    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL12, AIX 6.1 TL5         | AIX 5.3 TL12, AIX 6.1 TL5     | AIX 6.1 TL6, AIX 7.1
8231-E2B/730    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL12, AIX 6.1 TL5         | AIX 5.3 TL12, AIX 6.1 TL5     | AIX 6.1 TL6, AIX 7.1
8233-E8B/750    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP3 | AIX 5.3 TL11, AIX 6.1 TL4 SP3 | AIX 6.1 TL6, AIX 7.1
9117-MMB/770    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL11, AIX 6.1 TL4 SP3     | AIX 5.3 TL11, AIX 6.1 TL4 SP3 | AIX 6.1 TL6, AIX 7.1
9119-FHB/795    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL12, AIX 6.1 TL5         | AIX 5.3 TL12, AIX 6.1 TL5     | AIX 6.1 TL6, AIX 7.1
9179-FHB/780    | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2 | AIX 5.3 TL11, AIX 6.1 TL4 SP3     | AIX 5.3 TL11, AIX 6.1 TL4 SP3 | AIX 6.1 TL6, AIX 7.1
Built-in serial ports: Built-in serial ports in POWER7 Servers are not available for
PowerHA use. Instead, use disk heartbeating. However, note that the built-in Ethernet
(IVE) adapters are supported for PowerHA use.
IBM POWER Blade servers
Table C-5 lists the software versions for HACMP and PowerHA with AIX supported on IBM
POWER Blade servers.
Table C-5 IBM POWER Blade support for HACMP and PowerHA
System p models              | HACMP 5.4.1                             | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
7778-23X/JS23                | HACMP SP2, AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
7778-43X/JS43                | HACMP SP2, AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
7998-60X/JS12                | HACMP SP2, AIX 5.3 TL7                  | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
7998-61X/JS22                | HACMP SP2, AIX 5.3 TL6                  | AIX 5.3 TL7, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
8406-70Y/PS700               | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2       | AIX 5.3 TL12, AIX 6.1 TL5    | AIX 5.3 TL12, AIX 6.1 TL5    | AIX 6.1 TL6, AIX 7.1
8406-71Y/PS701, PS702        | AIX 5.3 TL11 SP1, AIX 6.1 TL4 SP2       | AIX 5.3 TL12, AIX 6.1 TL5    | AIX 5.3 TL12, AIX 6.1 TL5    | AIX 6.1 TL6, AIX 7.1
8844-31U/JS21, 8844-51U/JS21 | AIX 5.3 TL4                             | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
Blade support includes support for IVM and IVE on both POWER6 and POWER7 blades. The
following adapter cards are supported in the POWER6 and POWER7 blades:
8240 Emulex 8Gb FC Expansion Card (CIOv)
8241 QLogic 4Gb FC Expansion Card (CIOv)
8242 QLogic 8Gb Fibre Channel Expansion Card (CIOv)
8246 SAS Connectivity Card (CIOv)
8251 Emulex 4Gb FC Expansion Card (CFFv)
8252 QLogic combo Ethernet and 4 Gb Fibre Channel Expansion Card (CFFh)
8271 QLogic Ethernet/8Gb FC Expansion Card (CFFh)
IBM storage
It is common to use multipathing drivers with storage. If you use MPIO, SDD, or SDDPCM (in any
combination) on any PowerHA-controlled storage, you are required to use enhanced concurrent
volume groups (ECVGs). This requirement also applies to vSCSI and NPIV devices.
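As a quick check before you place a shared volume group into a resource group, you can verify that it is enhanced concurrent capable. The following sketch assumes a hypothetical volume group named sharevg; the exact output wording can vary by AIX level, but an ECVG typically reports a line similar to "Concurrent: Enhanced-Capable":

   # run on a node that has the volume group imported (sharevg is a placeholder name)
   lsvg sharevg | grep -i concurrent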
Fibre Channel adapters
This section provides information about support for fibre channel (FC) adapters.
DS storage units
Table C-6 lists the DS storage unit support for HACMP and PowerHA with AIX.
Table C-6 DS storage unit support for HACMP and PowerHA
Model                  | HACMP 5.4.1                             | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
DS3400                 | HACMP SP2, AIX 5.3 TL8, AIX 6.1 TL2     | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS3500                 | HACMP SP2, AIX 5.3 TL8, AIX 6.1 TL2     | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS4100                 | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS4200                 | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS4300                 | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS4400                 | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS4500                 | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS4700                 | AIX 5.3 TL5, AIX 6.1                    | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS4800                 | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS5020                 | HACMP SP2, AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
DS6000, DS6800         | AIX 5.3 TL5, AIX 6.1                    | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS5100                 | HACMP SP2, AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
DS5300                 | HACMP SP2, AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 5.3 TL9, AIX 6.1 TL2 SP1 | AIX 6.1 TL6, AIX 7.1
DS8000 (931, 932, 9B2) | AIX 5.3 TL5, AIX 6.1                    | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
DS8700                 | HACMP SP2, AIX 5.3 TL8, AIX 6.1 TL2     | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
IBM XIV
Table C-7 lists the software versions for HACMP and PowerHA with AIX supported on XIV
storage. PowerHA requires XIV microcode level 10.0.1 or later.
Table C-7 IBM XIV support for HACMP and PowerHA with AIX
Model        | HACMP 5.4.1                                 | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
XIV 2810-A14 | HACMP SP4, AIX 5.3 TL7 SP6, AIX 6.1 TL0 SP2 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
SAN Volume Controller
Table C-8 shows the software versions for HACMP and PowerHA with AIX supported on the
SAN Volume Controller (SVC). SVC software levels are supported up through SVC v5.1. The
levels shown in the table are the absolute minimum requirements for v5.1.
Table C-8 SVC supported models for HACMP and PowerHA with AIX
Model    | HACMP 5.4.1                             | PowerHA 5.5                               | PowerHA 6.1                               | PowerHA 7.1
2145-4F2 | HACMP SP8, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | PowerHA SP6, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | PowerHA SP1, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
2145-8F2 | HACMP SP8, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | PowerHA SP8, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | PowerHA SP1, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
Network-attached storage
Table C-9 shows the software versions for PowerHA and AIX supported on network-attached
storage (NAS).
Table C-9 NAS supported models for HACMP and PowerHA with AIX
Model       | HACMP 5.4.1                             | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
N3700 (A20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N5200 (A20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N5200 (G20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N5300       | HACMP SP3, AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N5500 (A20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N5500 (G20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N5600       | HACMP SP3, AIX 5.3 TL7, AIX 6.1 TL0 SP2 | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N6040       | AIX 5.3 TL7, AIX 6.1 TL0 SP2            | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N6060       | AIX 5.3 TL7, AIX 6.1 TL0 SP2            | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N6070       | AIX 5.3 TL7, AIX 6.1 TL0 SP2            | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7600 (A20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7600 (G20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7700 (A21) | AIX 5.3 TL7, AIX 6.1 TL0 SP2            | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7700 (G21) | AIX 5.3 TL7, AIX 6.1 TL0 SP2            | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7800 (A20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7800 (G20) | AIX 5.3 TL4, AIX 6.1                    | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7900 (A21) | AIX 5.3 TL7, AIX 6.1 TL0 SP2            | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
N7900 (G21) | AIX 5.3 TL7, AIX 6.1 TL0 SP2            | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
Serial-attached SCSI
Table C-10 lists the software versions for PowerHA and AIX supported on the serial-attached
SCSI (SAS) model.
Table C-10 SAS supported model for HACMP and PowerHA with AIX
Model       | HACMP 5.4.1                             | PowerHA 5.5                             | PowerHA 6.1                  | PowerHA 7.1
5886 EXP12S | HACMP SP5, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | HACMP SP2, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
SCSI
Table C-11 shows the software versions for PowerHA and AIX supported on the SCSI model.
Table C-11 SCSI supported model for HACMP and PowerHA with AIX
Model    | HACMP 5.4.1          | PowerHA 5.5                  | PowerHA 6.1                  | PowerHA 7.1
7031-D24 | AIX 5.3 TL4, AIX 6.1 | AIX 5.3 TL7, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
Adapters
The following sections contain information about the supported adapters for PowerHA.
Fibre Channel adapters
The following FC adapters are supported:
#1905 4 Gb Single Port Fibre Channel PCI-X 2.0 DDR Adapter
#1910 4 Gb Dual Port Fibre Channel PCI-X 2.0 DDR Adapter
#1957 2 Gigabit Fibre Channel PCI-X Adapter
#1977 2 Gigabit Fibre Channel PCI-X Adapter
#5273 LP 8 Gb PCI-Express Dual Port Fibre Channel Adapter*
#5276 LP 4 Gb PCI-Express Fibre Channel Adapter
#5716 2 Gigabit Fibre Channel PCI-X Adapter
#5735 8 Gb PCI-Express Dual Port Fibre Channel Adapter*
#5758 4 Gb Single Port Fibre Channel PCI-X 2.0 DDR Adapter
#5759 4 Gb Dual Port Fibre Channel PCI-X 2.0 DDR Adapter
#5773 Gigabit PCI Express Fibre Channel Adapter
#5774 Gigabit PCI Express Fibre Channel Adapter
#6228 1-and 2-Gigabit Fibre Channel Adapter for 64-bit PCI Bus
#6239 2 Gigabit FC PCI-X Adapter
#5273/#5735 PCI-Express Dual Port Fibre Channel Adapter: The 5273/5735 minimum
requirements are PowerHA 5.4.1 SP2 or 5.5 SP1.
SAS
The following SAS adapters are supported:
#5278 LP 2x4port PCI-Express SAS Adapter 3 Gb
#5901 PCI-Express SAS Adapter
#5902 PCI-X DDR Dual –x4 Port SAS RAID Adapter
#5903 PCI-Express SAS Adapters
#5912 PCI-X DDR External Dual – x4 Port SAS Adapter
Table C-12 lists the SAS software support requirements.
Table C-12 SAS software support for HACMP and PowerHA with AIX
HACMP 5.4.1                             | PowerHA 5.5                             | PowerHA 6.1                  | PowerHA 7.1
HACMP SP5, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | HACMP SP2, AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 5.3 TL9, AIX 6.1 TL2 SP3 | AIX 6.1 TL6, AIX 7.1
Ethernet
The following Ethernet adapters are supported with PowerHA:
#1954 4-Port 10/100/1000 Base-TX PCI-X Adapter
#1959 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1978 IBM Gigabit Ethernet-SX PCI-X Adapter
#1979 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#1981 IBM 10 Gigabit Ethernet-SR PCI-X Adapter
#1982 IBM 10 Gigabit Ethernet-LR PCI-X Adapter
#1983 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
#1984 IBM Dual Port Gigabit Ethernet-SX PCI-X Adapter
#1990 IBM 2-port 10/100/1000 Base-TX Ethernet PCI-X
#4961 IBM Universal 4-Port 10/100 Ethernet Adapter
#4962 IBM 10/100 Mbps Ethernet PCI Adapter II
#5271 LP 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
#5274 LP 2-Port Gigabit Ethernet-SX PCI Express
#5700 IBM Gigabit Ethernet-SX PCI-X Adapter
#5701 IBM 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5706 IBM 2-Port 10/100/1000 Base-TX Ethernet PCI-X Adapter
#5707 IBM 2-Port Gigabit Ethernet-SX PCI-X Adapter
#5717 IBM 4-Port Ethernet 10/100/1000 Base-TX PCI-X Adapter
#5718 IBM 10 Gigabit -SR/-LR Ethernet PCI-x adapters
#5719 IBM 10 Gigabit -SR/-LR Ethernet PCI-x adapters
#5721 IBM 10 Gigabit Ethernet-SR PCI-X 2.0 Adapter
#5722 IBM 10 Gigabit Ethernet-LR PCI-X 2.0 Adapter
#5740 4-Port 10/100/1000 Base-TX PCI-X Adapter
#5767 Adapter 2-Port 10/100/1000 Base-TX Ethernet PCI Express
#5768 Adapter 2-Port Gigabit Ethernet-SX PCI Express
InfiniBand
The following InfiniBand adapters are supported with PowerHA:
#1809 IBM GX Dual-port 4x IB HCA
#1810 IBM GX Dual-port 4x IB HCA
#1811 IBM GX Dual-port 4x IB HCA
#1812 IBM GX Dual-port 4x IB HCA
#1820 IBM GX Dual-port 12x IB HCA
SCSI and iSCSI
The following SCSI and iSCSI adapters are supported with PowerHA:
#1912 IBM PCI-X DDR Dual Channel Ultra320 LVD SCSI Adapter
#1913 PCI-X DDR Dual Channel Ultra320 SCSI RAID Adapter
#1975 PCI-X Dual Channel Ultra320 SCSI RAID Adapter
#1986 1 Gigabit-TX iSCSI TOE PCI-X adapter (copper connector)
#1987 1 Gigabit-SX iSCSI TOE PCI-X adapter (optical connector)
#5703 PCI-X Dual Channel Ultra320 SCSI RAID Adapter
#5710 PCI-X Dual Channel Ultra320 SCSI Adapter
#5711 PCI-X Dual Channel Ultra320 SCSI RAID Blind Swap Adapter
#5712 PCI-X Dual Channel Ultra320 SCSI Adapter
#5713 1 Gigabit-TX iSCSI TOE PCI-X adapter (copper connector)
#5714 1 Gigabit-SX iSCSI TOE PCI-X adapter (optical connector)
#5736 IBM PCI-X DDR Dual Channel Ultra320 SCSI Adapter
#5737 PCI-X DDR Dual Channel Ultra320 SCSI RAID Adapter
PCI bus adapters
PowerHA 7.1 no longer supports RS-232 connections. Therefore, the following adapters are
supported up through PowerHA 6.1 only:
#2943 8-Port Asynchronous EIA-232/RS-422, PCI bus adapter
#2944 128-Port Asynchronous Controller, PCI bus adapter
#5277 IBM LP 4-Port Async EIA-232 PCIe Adapter
#5723 2-Port Asynchronous EIA-232/RS-422, PCI bus adapter
#5785 IBM 4-Port Async EIA-232 PCIe adapter
The 5785 adapter is only supported by PowerHA 5.5 and 6.1.
D
Appendix D.
The clmgr man page
At the time of writing, no documentation was available for the clmgr command other than the
related man pages. To make it easier for those of you who do not have the product installed
and who want more details about the clmgr command, a copy of the man pages is provided as
follows:
clmgr command
************
Purpose
=======
clmgr: Provides a consistent, reliable interface for performing IBM PowerHA
SystemMirror cluster operations via a terminal or script. All clmgr
operations are logged in the "clutils.log" file, including the
command that was executed, its start/stop time, and what user
initiated the command.
The basic format for using clmgr is consistently as follows:
clmgr <ACTION> <CLASS> [<NAME>] [<ATTRIBUTES...>]
This consistency helps make clmgr easier to learn and use. Further
help is also available at each part of clmgr's command line. For
example, just executing "clmgr" by itself will result in a list of
the available ACTIONs supported by clmgr. Executing "clmgr ACTION"
with no CLASS provided will result in a list of all the available
CLASSes for the specified ACTION. Executing "clmgr ACTION CLASS"
with no NAME or ATTRIBUTES provided is slightly different, though,
since for some ACTION+CLASS combinations, that may be a valid
command format. So to get help in this scenario, it is necessary
to explicitly request it by appending the "-h" flag. So executing
"clmgr ACTION CLASS -h" will result in a listing of all known
attributes for that ACTION+CLASS combination being displayed.
That is where clmgr's ability to help ends, however; it can not
help with each individual attribute. If there is a question about
what a particular attribute is for, or when to use it, the product
documentation will need to be consulted.
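For example, the progressive help behavior described above can be explored from a terminal as follows (output is omitted here):

   clmgr                     # lists the available ACTIONs
   clmgr query               # lists the CLASSes that are valid for the "query" ACTION
   clmgr query cluster -h    # lists the known attributes for "query cluster"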
Synopsis
========
clmgr [-c|-x] [-S] [-v] [-f] [-D] [-l {low|med|high|max}] [-T <ID>]
[-a {<ATTR#1>,<ATTR#2>,<ATTR#n>,...}] <ACTION> <CLASS> [<NAME>]
[-h | <ATTR#1>=<VALUE#1> <ATTR#2>=<VALUE#2> <ATTR#n>=<VALUE#n>]
ACTION={add|modify|delete|query|online|offline|...}
CLASS={cluster|site|node|network|resource_group|...}
clmgr {-h|-?} [-v]
clmgr [-v] help
ACTION
a verb describing the operation to be performed
The following four ACTIONs are available on almost all the
supported CLASSes (there are a few exceptions):
add      (Aliases: a)
query    (Aliases: q, ls, get)
modify   (Aliases: mod, ch, set)
delete   (Aliases: de, rm, er)
The remaining ACTIONS are typically only supported on a small
subset of the supported CLASSes:
Cluster, Sites, Node, Resource Group:
    online   (Aliases: on, start)
    offline  (Aliases: off, stop)
Resource Group, Service IP, Persistent IP:
    move     (Aliases: mv)
Cluster, Log, Node, Snapshot:
    manage   (Aliases: mg)
Cluster, File Collection:
    sync     (Aliases: sy)
Cluster, Method:
    verify   (Aliases: ve)
Log, Report, Snapshot:
    view     (Aliases: vi)
NOTE: ACTION is *not* case-sensitive.
NOTE: all ACTIONs provide a shorter alias, such as "rm" in
place of "delete". These aliases are provided for
convenience/ease-of-use at a terminal, and are not
recommended for use in scripts.
CLASS
    the type of object upon which the ACTION will be performed.
    The complete list of supported CLASSes is:
cluster                 (Aliases: cl)
site                    (Aliases: si)
node                    (Aliases: no)
interface               (Aliases: in, if)
network                 (Aliases: ne, nw)
resource_group          (Aliases: rg)
service_ip              (Aliases: se)
persistent_ip           (Aliases: pe)
application_controller  (Aliases: ac, app)
application_monitor     (Aliases: am, mon)
tape                    (Aliases: tp)
dependency              (Aliases: de)
file_collection         (Aliases: fi, fc)
snapshot                (Aliases: sn, ss)
resource                (Aliases: rs)
resource_type           (Aliases: rt)
method                  (Aliases: me)
volume_group            (Aliases: vg)
logical_volume          (Aliases: lv)
file_system             (Aliases: fs)
physical_volume         (Aliases: pv)
NOTE: CLASS is *not* case-sensitive.
NOTE: all CLASSes provide a shorter alias, such as "fc" in
place of "file_collection". These aliases are provided
for convenience/ease-of-use at a terminal, and are not
recommended for use in scripts.
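As an illustration of the aliases, the following two commands are equivalent; the first uses the full ACTION and CLASS names (the form recommended for scripts), and the second uses the shorter terminal-friendly aliases:

   clmgr query resource_group
   clmgr q rg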
NAME
    the specific object, of type "CLASS", upon which the ACTION
    is to be performed.

ATTR=VALUE
    optional, attribute/value pairs that are specific to the
    ACTION+CLASS combination. These may be used to specify
    configuration settings, or adjust particular operations.
    When used with the "query" action, ATTR=VALUE specifications
    may be used to perform attribute-based searching/filtering.
    When used for this purpose, simple wildcards may be used.
    For example, "*" matches zero or more of any character, "?"
    matches zero or one of any character.
NOTE: an ATTR may not always need to be fully typed. Only the
number of leading characters required to uniquely identify
the attribute from amongst the set of attributes available
for the specified operation need to be provided. So instead
of "FC_SYNC_INTERVAL", for the "add/modify cluster"
operation, "FC" could be used, and would have the same
result.
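Building on that example, both of the following commands modify the same cluster attribute; the shortened form works because "FC" uniquely identifies "FC_SYNC_INTERVAL" for this operation (the value 30 is only a placeholder):

   clmgr modify cluster FC_SYNC_INTERVAL=30
   clmgr modify cluster FC=30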
-a
valid only with the "query", "add", and "modify" ACTIONs,
requests that only the specified attribute(s) be displayed.
NOTE: the specified order of these attributes is *not*
guaranteed to be preserved in the resulting output.
-c
valid only with the "query", "add", and "modify" ACTIONs,
requests all data to be displayed in colon-delimited format.
-D
disables the dependency mechanism in clmgr that will attempt to
create any requisite resources if they are not already defined
within the cluster.
-f
requests an override of any interactive prompts, forcing the
current operation to be attempted (if forcing the operation
is a possibility).
-h
requests that any available help information be displayed.
An attempt is made to provide context-sensitive assistance.
-l
    activates trace logging for serviceability:
        low:  logs function entry/exit
        med:  adds function entry parameters, as well as function
              return values
        high: adds tracing of every line of execution, only omitting
              routine, "utility" functions
        max:  adds the routine/utility functions. Also adds a time/date
              stamp to the function entry/exit messages.
    All trace data is written into the "clutils.log" file.
    This option is typically only of interest when troubleshooting.
-S
valid only with the "query" ACTION and "-c" option,
requests that all column headers be suppressed.
-T
a transaction ID to be applied to all logged output, to help
group one or more activities into a single body of output that
can be extracted from the log for analysis.
This option is typically only of interest when troubleshooting.
-v
requests maximum verbosity in the output.
NOTE: when used with the "query" action and no specific
object name, queries all instances of the specified
class. For example, "clmgr -v query node" will query
and display *all* nodes and their attributes. When
used with the "add" or "modify" operations, the
final, resulting attributes after the operation is
complete will be displayed (only if the operation
was successful).
-x
valid only with the "query", "add", and "modify" ACTIONs,
requests all data to be displayed in simple XML format.
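For scripted use, the output-shaping flags can be combined. A brief sketch using only the flags described above (the output itself is not shown):

   clmgr -v query node         # verbose query of all nodes and their attributes
   clmgr -c -S query cluster   # colon-delimited cluster attributes, column headers suppressed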
Operations
==========
CLUSTER:
clmgr add cluster \
    [ <cluster_label> ] \
    REPOSITORY=<hdisk#> \
    SHARED_DISKS=<hdisk#>[,<hdisk#>,...] \
    [ NODES=<host>[,<host#2>,<host#n>,...] ] \
    [ CLUSTER_IP=<IP_Address> ] \
    [ FC_SYNC_INTERVAL=## ] \
    [ RG_SETTLING_TIME=## ] \
    [ MAX_EVENT_TIME=### ] \
    [ MAX_RG_PROCESSING_TIME=### ] \
    [ SITE_POLICY_FAILURE_ACTION={fallover|notify} ] \
    [ SITE_POLICY_NOTIFY_METHOD="<FULL_PATH_TO_FILE>" ] \
    [ DAILY_VERIFICATION={Enabled|Disabled} ] \
    [ VERIFICATION_NODE={Default|<node>} ] \
    [ VERIFICATION_HOUR=<00..23> ] \
    [ VERIFICATION_DEBUGGING={Enabled|Disabled} ]
clmgr modify cluster \
    [ NEWNAME=<new_cluster_label> ] \
    [ SHARED_DISKS=<disk>[,<disk#2>,<disk#n>,...] ] \
    [ NODES=<host>[,<host#2>,<host#n>,...] ] \
    [ CLUSTER_IP=<IP_Address> ] \
    [ FC_SYNC_INTERVAL=## ] \
    [ RG_SETTLING_TIME=## ] \
    [ MAX_EVENT_TIME=### ] \
    [ MAX_RG_PROCESSING_TIME=### ] \
    [ SITE_POLICY_FAILURE_ACTION={fallover|notify} ] \
    [ SITE_POLICY_NOTIFY_METHOD="<FULL_PATH_TO_FILE>" ] \
    [ DAILY_VERIFICATION={Enabled|Disabled} ] \
    [ VERIFICATION_NODE={Default|<node>} ] \
    [ VERIFICATION_HOUR=<00..23> ] \
    [ VERIFICATION_DEBUGGING={Enabled|Disabled} ]
clmgr query cluster
clmgr delete cluster [ NODES={ALL|<node>[,<node#2>,<node#n>,...}] ]
clmgr recover cluster
NOTE: the "delete" action defaults to only deleting
the cluster on the local node.
clmgr sync cluster \
[ VERIFY={yes|no} ] \
[ CHANGES_ONLY={no|yes} ] \
[ DEFAULT_TESTS={yes|no} ] \
[ METHODS=<method#1>[,<method#n>,...] ] \
[ FIX={no|yes} ] \
[ LOGGING={standard|verbose} ] \
[ LOGFILE=<PATH_TO_LOG_FILE> ] \
[ MAX_ERRORS=## ] \
[ FORCE={no|yes} ]
NOTE: all options are verification parameters, so they
are only valid when "VERIFY" is set to "yes".
clmgr manage cluster {discover|reset|unlock}
clmgr manage cluster security \
LEVEL={Disable|Low|Med|High}
clmgr manage cluster security \
ALGORITHM={DES|3DES|AES} \
[ GRACE_PERIOD=<SECONDS> ] \
[ REFRESH=<SECONDS> ]
clmgr manage cluster security \
MECHANISM={OpenSSL|SelfSigned|SSH} \
[ CERTIFICATE=<PATH_TO_FILE> ] \
[ PRIVATE_KEY=<PATH_TO_FILE> ]
NOTE: "GRACE_PERIOD" defaults to 21600 seconds (6 hours).
NOTE: "REFRESH" defaults to 86400 seconds (24 hours).
clmgr verify cluster \
[ CHANGES_ONLY={no|yes} ] \
[ DEFAULT_TESTS={yes|no} ] \
[ METHODS=<method#1>[,<method#n>,...] ] \
[ FIX={no|yes} ] \
[ LOGGING={standard|verbose} ] \
[ LOGFILE=<PATH_TO_LOG_FILE> ] \
[ MAX_ERRORS=## ]
[ SYNC={no|yes} ] \
[ FORCE={no|yes} ]
NOTE: the "FORCE" option should only be used when "SYNC" is set
to "yes".
clmgr offline cluster \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr online cluster \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]
NOTE: the "RG_SETTLING_TIME" attribute only affects resource groups
with a startup policy of "Online On First Available Node".
NOTE: an alias for "cluster" is "cl".
SITE:
clmgr add site <sitename> \
[ NODES=<node>[,<node#2>,<node#n>,...] ]
clmgr modify site <sitename> \
[ NEWNAME=<new_site_label> ] \
[ {ADD|REPLACE}={ALL|<node>[,<node#2>,<node#n>,...}] ]
At least one modification option must be specified.
ADD attempts to append the specified nodes to the site.
REPLACE attempts to replace the sites current nodes with
the specified nodes.
clmgr query site [ <sitename>[,<sitename#2>,<sitename#n>,...] ]
clmgr delete site {<sitename>[,<sitename#2>,<sitename#n>,...] | ALL}
clmgr recover site <sitename>
clmgr offline site <sitename> \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr online site <sitename> \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]
NOTE: an alias for "site" is "si".
NODE:
clmgr add node <node> \
[ COMMPATH=<ip_address_or_network-resolvable_name> ] \
[ RUN_DISCOVERY={true|false} ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr modify node <node> \
[ NEWNAME=<new_node_label> ] \
[ COMMPATH=<new_commpath> ] \
[ PERSISTENT_IP=<IP> NETWORK=<network>
{NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ START_ON_BOOT={false|true} ] \
[ BROADCAST_ON_START={true|false} ] \
[ CLINFO_ON_START={false|true|consistent} ] \
[ VERIFY_ON_START={true|false} ]
clmgr query node [ {<node>|LOCAL}[,<node#2>,<node#n>,...] ]
clmgr delete node {<node>[,<node#2>,<node#n>,...] | ALL}
clmgr manage node undo_changes
clmgr recover node <node>[,<node#2>,<node#n>,...]
clmgr online node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={auto|manual} ] \
[ BROADCAST={false|true} ] \
[ CLINFO={false|true|consistent} ] \
[ FORCE={false|true} ] \
[ FIX={no|yes|interactively} ]
[ TIMEOUT=<seconds_to_wait_for_completion> ]
clmgr offline node <node>[,<node#2>,<node#n>,...] \
[ WHEN={now|restart|both} ] \
[ MANAGE={offline|move|unmanage} ] \
[ BROADCAST={true|false} ] \
[ TIMEOUT=<seconds_to_wait_for_completion> ]
NOTE: the "TIMEOUT" attribute defaults to 120 seconds.
NOTE: an alias for "node" is "no".
NETWORK:
clmgr add network <network> \
[ TYPE={ether|XD_data|XD_ip|infiniband} ] \
[ {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ IPALIASING={true|false} ]
clmgr modify network <network> \
[ NEWNAME=<new_network_label> ] \
[ TYPE={ether|XD_data|XD_ip|infiniband} ] \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ ENABLE_IPAT_ALIASING={true|false} ] \
[ PUBLIC={true|false} ] \
[ RESOURCE_DIST_PREF={AC|C|CPL|ACPL} ]
clmgr query network [ <network>[,<network#2>,<network#n>,...] ]
clmgr delete network {<network>[,<network#2>,<network#n>,...] | ALL}
NOTE: the TYPE defaults to "ether" if not specified.
NOTE: when adding, the default is to construct an IPv4
network using a netmask of "255.255.255.0". To
create an IPv6 network, specify a valid prefix.
NOTE: AC   == Anti-Collocation
      C    == Collocation
      CPL  == Collocation with Persistent Label
      ACPL == Anti-Collocation with Persistent Label
NOTE: aliases for "network" are "ne" and "nw".
INTERFACE:
clmgr add interface <interface> \
NETWORK=<network> \
[ NODE=<node> ] \
[ TYPE={ether|infiniband} ] \
[ INTERFACE=<network_interface> ]
clmgr modify interface <interface> \
NETWORK=<network>
clmgr query interface [ <interface>[,<if#2>,<if#n>,...] ]
clmgr delete interface {<interface>[,<if#2>,<if#n>,...] | ALL}
NOTE: the "interface" may be either an IP address or label.
NOTE: the "NODE" attribute defaults to the local node name.
NOTE: the "TYPE" attribute defaults to "ether".
NOTE: the "<network_interface>" might look like "en1", "en2", ...
NOTE: aliases for "interface" are "in" and "if".
RESOURCE GROUP:
clmgr add resource_group <resource_group> \
NODES=nodeA1,nodeA2,... \
[ SECONDARYNODES=nodeB2,nodeB1,... ] \
[ STARTUP={OHN|OFAN|OAAN|OUDP} ] \
[ FALLOVER={FNPN|FUDNP|BO} ] \
[ FALLBACK={NFB|FBHPN} ] \
[ NODE_PRIORITY_POLICY={default|mem|cpu|
                        disk|least|most} ] \
[ NODE_PRIORITY_POLICY_SCRIPT=</path/to/script> ] \
[ NODE_PRIORITY_POLICY_TIMEOUT=### ] \
[ SITE_POLICY={ignore|primary|either|both} ] \
[ SERVICE_LABEL=service_ip#1[,service_ip#2,...] ] \
[ APPLICATIONS=appctlr#1[,appctlr#2,...] ] \
[ SHARED_TAPE_RESOURCES=<TAPE>[,<TAPE#2>,...] ] \
[ VOLUME_GROUP=<VG>[,<VG#2>,...] ] \
[ FORCED_VARYON={true|false} ] \
[ VG_AUTO_IMPORT={true|false} ] \
[ FILESYSTEM=/file_system#1[,/file_system#2,...] ] \
[ DISK=<hdisk>[,<hdisk#2>,...] ] \
[ FS_BEFORE_IPADDR={true|false} ] \
[ WPAR_NAME="wpar_name" ] \
[ EXPORT_FILESYSTEM=/expfs#1[,/expfs#2,...] ] \
[ EXPORT_FILESYSTEM_V4=/expfs#1[,/expfs#2,...] ] \
[ STABLE_STORAGE_PATH="/fs3" ] \
[ NFS_NETWORK="nfs_network" ] \
[ MOUNT_FILESYSTEM=/nfs_fs1;/expfs1,/nfs_fs2;,... ]
STARTUP:
OHN ----- Online on Home Node (default value)
OFAN ---- Online on First Available Node
OAAN ---- Online on All Available Nodes (concurrent)
OUDP ---- Online Using Node Distribution Policy
FALLOVER:
FNPN ---- Fallover to Next Priority Node (default value)
FUDNP --- Fallover Using Dynamic Node Priority
BO ------ Bring Offline (On Error Node Only)
FALLBACK:
NFB ----- Never Fallback
FBHPN --- Fallback to Higher Priority Node (default value)
NODE_PRIORITY_POLICY:
NOTE: this policy may only be established if the FALLOVER
policy has been set to "FUDNP".
default - next node in the NODES list
mem ----- node with most available memory
disk ---- node with least disk activity
cpu ----- node with most available CPU cycles
least --- node where the dynamic node priority script
returns the lowest value
most ---- node where the dynamic node priority script
returns the highest value
SITE_POLICY:
ignore -- Ignore
primary - Prefer Primary Site
either -- Online On Either Site
both ---- Online On Both Sites
NOTE: "SECONDARYNODES" and "SITE_POLICY" only apply when sites are
configured within the cluster.
NOTE: "appctlr" is an abbreviation for "application_controller".
clmgr modify resource_group <resource_group> \
[ NEWNAME=<new_resource_group_label> ] \
[ NODES=nodeA1[,nodeA2,...] ] \
[ SECONDARYNODES=nodeB2[,nodeB1,...] ] \
[ STARTUP={OHN|OFAN|OAAN|OUDP} ] \
[ FALLOVER={FNPN|FUDNP|BO} ] \
[ FALLBACK={NFB|FBHPN} ] \
[ NODE_PRIORITY_POLICY={default|mem|cpu|
                        disk|least|most} ] \
[ SITE_POLICY={ignore|primary|either|both} ] \
[ SERVICE_LABEL=service_ip#1[,service_ip#2,...] ] \
[ APPLICATIONS=appctlr#1[,appctlr#2,...] ] \
[ VOLUME_GROUP=volume_group#1[,volume_group#2,...] ] \
[ FORCED_VARYON={true|false} ] \
[ VG_AUTO_IMPORT={true|false} ] \
[ FILESYSTEM=/file_system#1[,/file_system#2,...] ] \
[ DISK=hdisk#1[,hdisk#2,...] ] \
[ FS_BEFORE_IPADDR={true|false} ] \
[ WPAR_NAME="wpar_name" ] \
[ EXPORT_FILESYSTEM=/expfs#1[,/expfs#2,...] ] \
[ EXPORT_FILESYSTEM_V4=/expfs#1[,/expfs#2,...] ] \
[ STABLE_STORAGE_PATH="/fs3" ] \
[ NFS_NETWORK="nfs_network" ] \
[ MOUNT_FILESYSTEM=/nfs_fs1;/expfs1,/nfs_fs2;,... ]
NOTE: "SECONDARYNODES" and "SITE_POLICY" only apply when sites are
configured within the cluster.
NOTE: "appctlr" is an abbreviation for "application_controller".
clmgr query resource_group [ <resource_group>[,<rg#2>,<rg#n>,...] ]
clmgr delete resource_group {<resource_group>[,<rg#2>,<rg#n>,...] |
ALL}
clmgr online resource_group <resource_group>[,<rg#2>,<rg#n>,...] \
[ NODES=<node>[,<node#2>,...] ]
clmgr offline resource_group <resource_group>[,<rg#2>,<rg#n>,...] \
[ NODES=<node>[,<node#2>,...] ]
clmgr move resource_group <resource_group>[,<rg#2>,<rg#n>,...] \
{SITE|NODE}=<node_or_site_label> \
[ STATE={online|offline} ] \
[ SECONDARY={false|true} ]
NOTE: the "SITE" and "SECONDARY" attributes are only applicable
when sites are configured in the cluster.
NOTE: the "SECONDARY" attribute defaults to "false".
NOTE: the resource group STATE remains unchanged if "STATE" is
not explicitly specified.
NOTE: an alias for "resource_group" is "rg".
FALLBACK TIMER:
clmgr add fallback_timer <timer> \
[ YEAR=<####> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
[ DAY_OF_MONTH=<{1..31}> ] \
[ DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \
HOUR=<{0..23}> \
MINUTE=<{0..59}>
clmgr modify fallback_timer <timer> \
[ YEAR=<{####}> ] \
[ MONTH=<{1..12 | Jan..Dec}> ] \
[ DAY_OF_MONTH=<{1..31}> ] \
[ DAY_OF_WEEK=<{0..6 | Sun..Sat}> ] \
[ HOUR=<{0..23}> ] \
[ MINUTE=<{0..59}> ] \
[ REPEATS=<{0,1,2,3,4 |
            Never,Daily,Weekly,Monthly,Yearly}> ]
clmgr query fallback_timer [<timer>[,<timer#2>,<timer#n>,...] ]
clmgr delete fallback_timer {<timer>[,<timer#2>,<timer#n>,...] |\
ALL}
NOTE: aliases for "fallback_timer" are "fa" and "timer".
PERSISTENT IP/LABEL:
clmgr add persistent_ip <persistent_IP> \
NETWORK=<network> \
[ NODE=<node> ]
clmgr modify persistent_ip <persistent_label> \
[ NEWNAME=<new_persistent_label> ] \
[ NETWORK=<new_network> ] \
[ PREFIX=<new_prefix_length> ]
clmgr query persistent_ip [ <persistent_IP>[,<pIP#2>,<pIP#n>,...] ]
clmgr delete persistent_ip {<persistent_IP>[,<pIP#2>,<pIP#n>,...] |
ALL}
clmgr move persistent_ip <persistent_IP> \
INTERFACE=<new_interface>
NOTE: an alias for "persistent_ip" is "pe".
SERVICE IP/LABEL:
clmgr add service_ip <service_ip> \
NETWORK=<network> \
[ {NETMASK=<255.255.255.0 | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr modify service_ip <service_ip> \
[ NEWNAME=<new_service_ip> ] \
[ NETWORK=<new_network> ] \
[ {NETMASK=<###.###.###.###> | PREFIX=1..128} ] \
[ HWADDR=<new_hardware_address> ] \
[ SITE=<new_site> ]
clmgr query service_ip [ <service_ip>[,<service_ip#2>,...] ]
clmgr delete service_ip {<service_ip>[,<service_ip#2>,...] | ALL}
clmgr move service_ip <service_ip> \
INTERFACE=<new_interface>
NOTE: if the "NETMASK/PREFIX" attributes are not specified,
the netmask/prefix value for the underlying network
is used.
NOTE: an alias for "service_ip" is "se".
APPLICATION CONTROLLER:
clmgr add application_controller <application_controller> \
STARTSCRIPT="/path/to/start/script" \
STOPSCRIPT="/path/to/stop/script" \
[ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ]
clmgr modify application_controller <application_controller> \
[ NEWNAME=<new_application_controller_label> ] \
[ STARTSCRIPT="/path/to/start/script" ] \
[ STOPSCRIPT="/path/to/stop/script" ] \
[ MONITORS=<monitor>[,<monitor#2>,<monitor#n>,...] ]
clmgr query application_controller [ <appctlr>[,<appctlr#2>,...] ]
clmgr delete application_controller {<appctlr>[,<appctlr#2>,...] | \
ALL}
NOTE: "appctlr" is an abbreviation for "application_controller".
NOTE: aliases for "application_controller" are "ac" and "app".
APPLICATION MONITOR:
clmgr add application_monitor <monitor> \
TYPE={Process|Custom} \
APPLICATIONS=<appctlr#1>[,<appctlr#2>,<appctlr#n>,...] \
MODE={continuous|startup|both} \
[ STABILIZATION="1 .. 3600" ] \
[ RESTARTCOUNT="0 .. 100" ] \
[ FAILUREACTION={notify|fallover} ] \
Process Arguments:
PROCESSES="pmon1,dbmon,..." \
OWNER="<processes_owner_name>" \
[ INSTANCECOUNT="1 .. 1024" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]
Custom Arguments:
MONITORMETHOD="/script/to/monitor" \
[ MONITORINTERVAL="1 .. 1024" ] \
[ HUNGSIGNAL="1 .. 63" ] \
[ RESTARTINTERVAL="1 .. 3600" ] \
[ NOTIFYMETHOD="</script/to/notify>" ] \
[ CLEANUPMETHOD="</script/to/cleanup>" ] \
[ RESTARTMETHOD="</script/to/restart>" ]
NOTE: "STABILIZATION" defaults to 180
NOTE: "RESTARTCOUNT" defaults to 3
clmgr modify application_monitor <monitor> \
[ NEWNAME=<new_monitor_label> ] \
[ See the "add" action, above, for a list
of supported modification attributes. ]
clmgr query application_monitor [ <monitor>[,<monitor#2>,...] ]
clmgr delete application_monitor {<monitor>[,<monitor#2>,...] | ALL}
NOTE: "appctlr" is an abbreviation for "application_controller".
NOTE: aliases for "application_monitor" are "am" and "mon".
DEPENDENCY:
# Temporal Dependency (parent ==> child)
clmgr add dependency \
PARENT=<rg#1> \
CHILD="<rg#2>[,<rg#2>,<rg#n>...]"
clmgr modify dependency <parent_child_dependency> \
[ TYPE=PARENT_CHILD ] \
[ PARENT=<rg#1> ] \
[ CHILD="<rg#2>[,<rg#2>,<rg#n>...]" ]
# Temporal Dependency (start/stop after)
clmgr add dependency \
{STOP|START}="<rg#2>[,<rg#2>,<rg#n>...]" \
AFTER=<rg#1>
clmgr modify dependency \
[ TYPE={STOP_AFTER|START_AFTER} ] \
[ {STOP|START}="<rg#2>[,<rg#2>,<rg#n>...]" ] \
[ AFTER=<rg#1> ]
# Location Dependency (colocation)
clmgr add dependency \
SAME={NODE|SITE} \
GROUPS="<rg1>,<rg2>[,<rg#n>...]"
clmgr modify dependency <colocation_dependency> \
[ TYPE=SAME_{NODE|SITE} ] \
GROUPS="<rg1>,<rg2>[,<rg#n>...]"
# Location Dependency (anti-colocation)
clmgr add dependency \
HIGH="<rg1>,<rg2>,..." \
INTERMEDIATE="<rg3>,<rg4>,..." \
LOW="<rg5>,<rg6>,..."
clmgr modify dependency <anti-colocation_dependency> \
[ TYPE=DIFFERENT_NODES ] \
[ HIGH="<rg1>,<rg2>,..." ] \
[ INTERMEDIATE="<rg3>,<rg4>,..." ] \
[ LOW="<rg5>,<rg6>,..." ]
# Acquisition/Release Order
clmgr add dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }
clmgr modify dependency \
TYPE={ACQUIRE|RELEASE} \
{ SERIAL="{<rg1>,<rg2>,...|ALL}" |
PARALLEL="{<rg1>,<rg2>,...|ALL}" }
clmgr query dependency [ <dependency> ]
clmgr delete dependency {<dependency> | ALL} \
[ TYPE={PARENT_CHILD|STOP_AFTER|START_AFTER| \
SAME_NODE|SAME_SITE|DIFFERENT_NODES} ]
clmgr delete dependency RG=<RESOURCE_GROUP>
NOTE: an alias for "dependency" is "de".
TAPE:
clmgr add tape <tape> \
DEVICE=<tape_device_name> \
[ DESCRIPTION=<tape_device_description> ] \
[ START="</script/to/start/tape/device>" ] \
[ START_SYNCHRONOUSLY={no|yes} ] \
[ STOP="</script/to/stop/tape/device>" ] \
[ STOP_SYNCHRONOUSLY={no|yes} ]
clmgr modify tape <tape> \
[ NEWNAME=<new_tape_label> ] \
[ DEVICE=<tape_device_name> ] \
[ DESCRIPTION=<tape_device_description> ] \
[ START="</script/to/start/tape/device>" ] \
[ START_SYNCHRONOUSLY={no|yes} ] \
[ STOP="</script/to/stop/tape/device>" ] \
[ STOP_SYNCHRONOUSLY={no|yes} ]
clmgr query tape [ <tape>[,<tape#2>,<tape#n>,...] ]
clmgr delete tape {<tape> | ALL}
NOTE: an alias for "tape" is "tp".
FILE COLLECTION:
clmgr add file_collection <file_collection> \
FILES="/path/to/file1,/path/to/file2,..." \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr modify file_collection <file_collection> \
[ NEWNAME="<new_file_collection_label>" ] \
[ ADD="/path/to/file1,/path/to/file2,..." ] \
[ DELETE={"/path/to/file1,/path/to/file2,..."|ALL} ] \
[ REPLACE={"/path/to/file1,/path/to/file2,..."|""} ] \
[ SYNC_WITH_CLUSTER={no|yes} ] \
[ SYNC_WHEN_CHANGED={no|yes} ] \
[ DESCRIPTION="<file_collection_description>" ]
clmgr query file_collection [ <file_collection>[,<fc#2>,<fc#n>,...]]
clmgr delete file_collection {<file_collection>[,<fc#2>,<fc#n>,...]|
ALL}
clmgr sync file_collection <file_collection>
NOTE: the "REPLACE attribute replaces all existing
files with the specified set
NOTE: aliases for "file_collection" are "fc" and "fi".
SNAPSHOT:
clmgr add snapshot <snapshot> \
DESCRIPTION="<snapshot_description>" \
[ METHODS="method1,method2,..." ] \
[ SAVE_LOGS={false|true} ]
clmgr modify snapshot <snapshot> \
[ NEWNAME="<new_snapshot_label>" ] \
[ DESCRIPTION="<snapshot_description>" ]
clmgr query snapshot [ <snapshot>[,<snapshot#2>,<snapshot#n>,...] ]
clmgr view snapshot <snapshot> \
[ TAIL=<number_of_trailing_lines> ] \
[ HEAD=<number_of_leading_lines> ] \
[ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] ] \
[ DELIMITER=<alternate_pattern_delimiter> ] \
[ CASE={insensitive|no|off|false} ]
clmgr delete snapshot {<snapshot>[,<snapshot#2>,<snapshot#n>,...] |
ALL}
clmgr manage snapshot restore <snapshot> \
[ CONFIGURE={yes|no} ] \
[ FORCE={no|yes} ]
NOTE: the "view" action displays the contents of the ".info"
file for the snapshot, if that file exists.
NOTE: CONFIGURE defaults to "yes"; FORCE defaults to "no".
NOTE: an alias for "snapshot" is "sn".
METHOD:
clmgr add method <method_label> \
TYPE={snapshot|verify} \
FILE=<executable_file> \
[ DESCRIPTION=<description> ]
clmgr modify method <method_label> \
TYPE={snapshot|verify} \
[ NEWNAME=<new_method_label> ] \
[ DESCRIPTION=<new_description> ] \
[ FILE=<new_executable_file> ]
clmgr add method <method_label> \
TYPE=notify \
CONTACT=<number_to_dial_or_email_address> \
EVENT=<event>[,<event#2>,<event#n>,...] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ FILE=<message_file> ] \
[ DESCRIPTION=<description> ] \
[ RETRY=<retry_count> ] \
[ TIMEOUT=<timeout> ]
NOTE: "NODES" defaults to the local node.
clmgr modify method <method_label> \
TYPE=notify \
[ NEWNAME=<new_method_label> ] \
[ DESCRIPTION=<description> ] \
[ FILE=<message_file> ] \
[ CONTACT=<number_to_dial_or_email_address> ] \
[ EVENT=<cluster_event_label> ] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ RETRY=<retry_count> ] \
[ TIMEOUT=<timeout> ]
clmgr query method [ <method>[,<method#2>,<method#n>,...] ] \
[ TYPE={notify|snapshot|verify} ]
clmgr delete method {<method>[,<method#2>,<method#n>,...] | ALL} \
[ TYPE={notify|snapshot|verify} ]
clmgr verify method <method>
NOTE: the "verify" action can only be applied to "notify" methods.
If more than one method exploits the same event, and that
event is specified, then both methods will be invoked.
NOTE: an alias for "method" is "me".
LOG:
clmgr modify logs ALL DIRECTORY="<new_logs_directory>"
clmgr modify log {<log>|ALL} \
[ DIRECTORY={"<new_log_directory>"|DEFAULT} ] \
[ FORMATTING={none|standard|low|high} ] \
[ TRACE_LEVEL={low|high} ] \
[ REMOTE_FS={true|false} ]
clmgr query log [ <log>[,<log#2>,<log#n>,...] ]
clmgr view log [ {<log>|EVENTS} ] \
[ TAIL=<number_of_trailing_lines> ] \
[ HEAD=<number_of_leading_lines> ] \
[ FILTER=<pattern>[,<pattern#2>,<pattern#n>,...] ] \
[ DELIMITER=<alternate_pattern_delimiter> ] \
[ CASE={insensitive|no|off|false} ]
clmgr manage logs collect \
[ DIRECTORY="<directory_for_collection>" ] \
[ NODES=<node>[,<node#2>,<node#n>,...] ] \
[ RSCT_LOGS={yes|no} ]
NOTE: when "DEFAULT: is specified for the "DIRECTORY" attribute,
then the original, default IBM PowerHA SystemMirror directory
value is restored.
NOTE: the "FORMATTING" attribute only applies to the "hacmp.out"
log, and is ignored for all other logs.
NOTE: the "FORMATTING" and "TRACE_LEVEL" attributes only apply
to the "hacmp.out" and "clstrmgr.debug" logs, and are
ignored for all other logs.
NOTE: when "ALL" is specified in place of a log name, then the
provided DIRECTORY and REMOTE_FS modifications are applied
to all the logs.
NOTE: when "EVENTS" is specified in place of a log name,
then an events summary report is displayed.
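For example, the cluster logs (including the RSCT logs) could be collected
from two illustrative nodes into a hypothetical directory for problem
determination:
      clmgr manage logs collect DIRECTORY=/tmp/ha_logs NODES=nodeA,nodeB RSCT_LOGS=yes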
VOLUME GROUP:
clmgr query volume_group
LOGICAL VOLUME:
clmgr query logical_volume
FILE_SYSTEM:
clmgr query file_system
PHYSICAL VOLUME:
clmgr query physical_volume \
[ <disk>[,<disk#2>,<disk#n>,...] ] \
[ NODES=<node>,<node#2>[,<node#n>,...] ] \
[ ALL={no|yes} ]
NOTE: "node" may be either a node name, or a networkresolvable name (i.e. hostname or IP address).
NOTE: "disk" may be either a device name (e.g. "hdisk0")
or a PVID (e.g. "00c3a28ed9aa3512").
NOTE: an alias for "physical_volume" is "pv".
REPORT:
clmgr view report [<report>] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ]
clmgr view report {nodeinfo|rginfo|lvinfo|
fsinfo|vginfo|dependencies} \
[ TARGETS=<target>[,<target#2>,<target#n>,...] ] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ]
clmgr view report availability \
[ TARGETS=<appctlr>[,<appctlr#2>,<appctlr#n>,...] ] \
[ FILE=<PATH_TO_NEW_FILE> ] \
[ TYPE={text|html} ] \
[ BEGIN_TIME="YYYY:MM:DD" ] \
[ END_TIME="YYYY:MM:DD" ]
NOTE: the currently supported reports are "basic", "cluster",
"status", "topology", "applications", "availability",
"events", "nodeinfo", "rginfo", "networks", "vginfo",
"lvinfo", "fsinfo", and "dependencies".
Some of these reports provide overlapping information, but
each also provides its own, unique information, as well.
NOTE: "appctlr" is an abbreviation for "application_controller".
NOTE: "MM" must be 1 - 12. "DD" must be 1 - 31.
NOTE: if no "BEGIN_TIME" is provided, then a report will be
generated for the last 30 days prior to "END_TIME".
NOTE: if no "END_TIME" is provided, then the current time will
be the default.
NOTE: an alias for "report" is "re".
Usage Examples
==============
* Basic cluster information can be displayed with:
clmgr query cluster
* For output that is more easily consumed by other programs, alternative
output formats, such as colon-delimited or XML, may be helpful:
clmgr -c query cluster
clmgr -x query node nodeA
* Most multi-value lists can be specified either in a comma-delimited manner,
or via quoted strings:
clmgr -a cluster_id,version query cluster
clmgr -a "cluster_id version" query cluster
* Combinations of option flags can be used to good effect. For example, to
retrieve a single value for a single attribute:
clmgr -cSa "version" query cluster
* Attribute-based searching can help filter out unwanted data, ensuring that
only the desired results are returned:
clmgr -v -a "name" q rg nodes="*nodeB*"
clmgr query file_collection files="*rhosts*"
* Application availability reports can help measure application uptime
requirements:
clmgr view report availability
* A simple two-node cluster with one monitored application could be configured
with a sequence such as the following:
clmgr add cluster tryme nodes=nodeA,nodeB
clmgr add application_controller manage_httpd \
start_script=/usr/local/bin/scripts/start_ihs.sh \
stop_script=/usr/local/bin/scripts/stop_ihs.sh
clmgr add application_monitor monitor_httpd \
type=process \
applications=manage_httpd \
processes=httpd \
owner=root \
mode=continuous \
stabilization=300 \
restartcount=3 \
failureaction=notify \
notifymethod="/usr/local/bin/scripts/ihs_notification.sh" \
cleanupmethod="/usr/local/bin/scripts/web_app_cleanup.sh" \
restartmethod="/usr/local/bin/scripts/start_ihs.sh"
clmgr add resource_group ihs_rg \
nodes=nodeA,nodeB \
startup=OFAN \
fallover=FNPN \
fallback=NFB \
node_priority_policy=mem \
applications=manage_httpd
* Event-related entries can be extracted from the "hacmp.out" log with:
clmgr view log hacmp.out FILTER=Event:
Suggested Reading
=================
IBM PowerHA SystemMirror for AIX Troubleshooting Guide
IBM PowerHA SystemMirror for AIX Planning Guide
IBM PowerHA SystemMirror for AIX Installation Guide
Prerequisite Information
========================
IBM PowerHA SystemMirror for AIX Concepts and Facilities Guide
Related Information
===================
IBM PowerHA SystemMirror for AIX Administration Guide
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in this
document. Note that some might be available in softcopy only.
Best Practices for DB2 on AIX 6.1 for POWER Systems, SG24-7821
DS8000 Performance Monitoring and Tuning, SG24-7146
IBM AIX Version 7.1 Differences Guide, SG24-7910
IBM System Storage DS8700 Architecture and Implementation, SG24-8786
Implementing IBM Systems Director 6.1, SG24-7694
Personal Communications Version 4.3 for Windows 95, 98 and NT, SG24-4689
PowerHA for AIX Cookbook, SG24-7739
You can search for, view, download, or order these documents and other Redbooks, Redpapers,
Web Docs, drafts, and additional materials, at the following website:
ibm.com/redbooks
Other publications
These publications are also relevant as further information sources:
Cluster Management, SC23-6779
PowerHA SystemMirror for IBM Systems Director, SC23-6763
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Administration Guide,
SC23-6750
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Concepts and Facilities
Guide, SC23-6751
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Installation Guide,
SC23-6755
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Master Glossary, SC23-6757
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Planning Guide,
SC23-6758-01
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Programming Client
Applications, SC23-6759
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist Developer’s
Guide, SC23-6753
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for DB2 User's
Guide, SC23-6752
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for Oracle
User’s Guide, SC23-6760
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Smart Assist for WebSphere
User’s Guide, SC23-6762
PowerHA SystemMirror Version 7.1 for AIX Standard Edition Troubleshooting Guide,
SC23-6761
Online resources
These websites are also relevant as further information sources:
IBM PowerHA SystemMirror for AIX
http://www.ibm.com/systems/power/software/availability/aix/index.html
PowerHA hardware support matrix
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
IBM PowerHA High Availability wiki
http://www.ibm.com/developerworks/wikis/display/WikiPtype/High%20Availability
Implementation Services for Power Systems for PowerHA for AIX
http://www.ibm.com/services/us/index.wss/offering/its/a1000032
IBM training classes for PowerHA SystemMirror for AIX
http://www.ibm.com/training
Help from IBM
IBM Support and downloads
ibm.com/support
IBM Global Services
ibm.com/services
Index
/etc/cluster/rhosts file 73
collection monitoring 202
populating 183
rolling migration 186
snapshot migration 168
/etc/filesystems file 318
/etc/hosts file, collection monitoring 202
/etc/inittab file, cluster monitoring 206
/etc/services file 139
/etc/syslogd.conf file, cluster monitoring 206
/tmp/clmigcheck/clmigcheck.log 161
/usr/es/sbin/cluster/utilities/ file 233
/var/adm/ras/syslog.caa log file 229
/var/hacmp/log/clutils.log file, clmgr debugging 131
/var/log/clcomd/clcomd.log file 313
#5273/#5735 PCI-Express Dual Port Fibre Channel
Adapter 499
Application Availability and Configuration reports 358
application configuration 86
application controller
configuration 91
versus application server 67
application list 353
application monitoring 368
Application Name field tip 143
application server
clmgr command 120
versus application controller 67
application startup, testing with Startup Monitoring configured 298
architecture
changes for RSCT 3.1 3
IBM Systems Director 22
PowerHA SystemMirror 1
Autonomic Health Advisor File System (AHAFS) 11
files used in RSCT 12
-a option
clmgr command 109
wildcards 110
ADAPTER_DOWN event 12
adapters
Ethernet 499
fibre channel 498
for the repository disk 49
InfiniBand 500
PCI bus 500
SAS 499
SCSI and iSCSI 500
adaptive failover 35, 102
Add Network tab 344
adding on new volume group 416
Additional Properties tab 257
agent password 328
AHAFS (Autonomic Health Advisor File System) 11
files used in RSCT 12
AIX
commands and log files 216
disk and dev_group association 443
importing volume groups 383
installation of IBM Systems Director 327
volume group configuration 381
AIX 6.1 67
AIX 6.1 TL6 152
for migration 47
upgrading to 153
AIX 7.1 support of PowerHA 6.1 193
AIX BOS components
installation 59
prerequisites for 44
AIX_CONTROLLED interface state 18
bootstrap repository 225
built-in serial ports 493–494
C
-c flag 209
CAA (Cluster Aware AIX) 7, 224
/etc/filesystems file 318
AIX 6.1 and 7.1 3
central repository 9
changed PVID of repository disk 322
chcluster command 480
clcomdES 24
cluster after the node restarts 317
cluster commands 477
cluster creation 154, 318
cluster environment management 11
cluster services not active message 323
clusterconf command 481
collecting debug information for IBM support 230
commands and log files 224
communication 156
daemons 8
disk fencing 37
file sets 7, 179
initial cluster status 82
log files for troubleshooting 306
lscluster command 478
mkcluster command 478
previously used repository disk 316
removal of volume group when rmcluster does not
320
repository disk 9
repository disk replacement 317
rmcluster command 480
RSCT changes 8
services, adding a shared disk 173
subsystem group active 208
subsystem guide 208
subsystems 202
support in RSCT v3.1 3
switch from Group Services 156
troubleshooting 316
volume group
already in use 320
previously used 320
zone 211
Can’t find what you are looking for ? 66
Capture Snapshot window 345
CCI instance 424
central cluster repository-based communication (DPCOM) interface 15
states 15
central repository 9
chcluster command 480
description 480
examples 481
flags 481
syntax 480
checking the configuration 164
clcomd instances 157
clcomd subsystem 157
clcomdES daemon 157
clcomdES subsystem 157
clconfd subsystem 8
clconvert_snapshot command 168
cldump utility 233, 312
clevmgrdES subsystem 31
CLI
cluster creation 340
cluster creation with SystemMirror plug-in 339
command help 341
examples of command usage for resource group management 360
clinfo command 127
clmgr add cluster command 114, 118
clmgr add resource_group command 111
clmgr add resource_group -h command 111
clmgr command 131, 501
-a option 109
actions 104
alternative output formats 130
application server 120
cluster definition synchronization 124
cluster start 127
colon-delimited format 130
configuring a PowerHA cluster 112
displaying log file content 132
enhanced search capability 109
error messages 106
log file 130
new cluster configuration 113
object classes 105
resource group 120
return of only one value 110
service address 118
simple XML format 130
syntax 113
usage examples 106
using help 111
-v option 110
clmgr debugging, /var/hacmp/log/clutils.log file 131
clmgr man page 501
clmgr online cluster start_cluster command 129
clmgr query cluster command 108
clmgr query command 107, 122
clmgr sync cluster command 124
clmgr utility 65, 241
cluster configuration 104
PowerHA log files 307
query action 241
view action 242
clmgr verify cluster command 124
clmgr view log command 307
clmigcheck command 153
menu options 164
process overview 158
profile 157
program 157, 167
running 159
running on one node 168
clmigcheck script 308
clmigcheck.txt file 160
clmigcleanup process 155
clRGinfo command 284
clshowres command 393
clstat utility 231, 312
interactive mode 231
-o flag 232
cltopinfo command 82, 118, 309
cluster
adding 385
adding networks 387
adding sites 386
checking in rolling migration 191
configuration 385
configuration synchronization 454
creation
CAA 318
with CLI 340
event flow when a node joins 39
IP address 67
menu 253
multicast IP address, configuration 73
name 114
removal of 103
restarting 183
starting 403
status 205
topology, custom configuration 78
Cluster Applications and Resources 27
Cluster Aware AIX (CAA) 7, 224
/etc/filesystems file 318
AIX 6.1 and 7.1 3
central repository 9
chcluster command 480
clcomdES 24
cluster after node restarts 317
cluster commands 477
cluster creation 154, 318
cluster environment management 11
clusterconf command 481
collecting debug information for IBM support 230
commands and log files 224
communication 156
daemons 8
disk fencing 37
file sets 7, 179
initial cluster status 82
log files for troubleshooting 306
lscluster command 478
mkcluster command 478
repository disk 9
rmcluster command 480
RSCT changes 8
services, adding a shared disk 173
subsystem group active 208
subsystem guide 208
subsystems 202
support in RSCT v3.1 3
troubleshooting 316
volume group, previously used 320
zone 211
cluster communication 13
heartbeat configuration 20
interfaces 13
AIX_CONTROLLED 18
central cluster repository-based communication
(DPCOM) 15
IP network interfaces 13
RESTRICTED 18
SAN-based communication (SFWCOM) 14
node status 18
round-trip time 20
cluster configuration 65
clmgr utility 65, 104
defining 70
event failure recovery 370
node names 70
problem determination data collection 370
recovery from issues 369
resource groups 95
SMIT menu 65–66
custom configuration 68, 78
repository disk and cluster multicast IP address
73
resource group dependencies 96
resources and applications configuration 86
resources configuration 68
typical configuration 67, 69
starting all nodes 129
SystemMirror for IBM Systems Director 133
SystemMirror plug-in 65
test environment 68
undoing local changes 370
verification and synchronization 360
CLI 363
GUI 360
Cluster Configuration Report 366
cluster creation 333–334
common storage 337
host names in FQDN format 75
SystemMirror plug-in CLI 339
SystemMirror plug-in wizard 334
cluster event management 11
cluster implementation
hardware requirements 44
migration planning 46
network 50
planning for high availability 43
PowerHA 7.1 considerations 46
prerequisites for AIX BOS components 44
prerequisites for RSCT components 44
software requirements 44
storage 48
supported hardware 45
cluster interfaces listing 234
cluster management 333
functionality 343
modification functionality 349
storage management 345
SystemMirror plug-in 341
CLI 347
SystemMirror plug-in GUI wizard 341–342
Cluster Management window 343
Cluster Management Wizard 342
cluster modification locks 369
cluster monitoring 201
/etc/inittab file 206
/etc/syslogd.conf file 206
/usr/es/sbin/cluster/utilities/ file tools 233
/var/adm/ras/syslog.caa log file 229
active cluster 368
AIX commands and log files 216
application monitoring 368
CAA commands and log files 224
CAA debug information for IBM support 230
CAA subsystem group active 208
CAA subsystem guide 208
CAA subsystems 202
cldump utility 233
clmgr utility 241
clstat utility 231
Cluster Configuration Report 366
cluster modification locks 369
cluster status 205, 208, 217
Cluster Topology Configuration Report 367
common agent subsystems 205
disk configuration 203, 207, 216
Group Services 218
information collection
after cluster configuration 206
after cluster is running 216
before configuration 202
lscluster command for cluster information 209
map view 365
multicast information 205, 207, 217
network configuration and routing table 218
network interfaces configuration 203
ODM classes 236
of activities before starting a cluster 364
PowerHA groups 203
recovery from configuration issues 369
repository disk 206
repository disk, CAA, solidDB 224
routing table 204
solidDB log files 229
subsystem services status 366
SystemMirror plug-in 364
tcpdump, iptrace, mping utilities 220
tools 231
topology view 364
cluster node
installation of SystemMirror agent 330
status and mapping 287
Cluster Nodes and Networks 27
cluster resources, configuration 388
Cluster services are not active message 323
Cluster Snapshot menu 31
Cluster Standard Configuration menu 29
cluster status 208
cluster still stuck in migration condition 308
cluster testing 259, 297
CPU starvation 292
crash in node with active resource group 289
dynamic node priority 302
Group Services failure 296
loss of the rootvg volume group 286, 289
network failure 283
network failure simulation 282
repository disk heartbeat channel 269
rootvg system event 286
rootvg volume group offline 288
SAN-based heartbeat channel 260
cluster topology 385
Cluster Topology Configuration Report 367
cluster topology information 234
CLUSTER_OVERRIDE environment variable 36
deleting the variable 37
error message 37
clusterconf command 481
description 481
examples 482
flags 482
syntax 481
clutils file 306
Collect log files button 347
collection monitoring
/etc/cluster/rhosts file 202
/etc/hosts file 202
colon-delimited format of clmgr command 130
command
help 341
profile 157
common storage 337
communication interfaces, adding 387
communication node status 18
communication path 314
components, Reliable Scalable Cluster Technology
(RSCT) 2
configuration
AIX disk and dev_group association 443
cluster 385
adding 385
adding a node 386
adding communication interfaces 387
cluster resources and resource group 388
Hitachi TrueCopy/HUR resources 429
PowerHA cluster 65
recovery from issues 369
troubleshooting 312
verification and synchronization 360
CLI 363
GUI 360
verification of Hitachi TrueCopy/HUR 453
volume groups and file systems on primary site 381
Configure Persistent Node IP Label/Address menu 31
CPU starvation 292
Create Dependency function 357
creation
custom resource group 351
predefined resource group 353
resource group 349
verifying 355
C-SPOC
adding Global Mirror pair to existing volume group
405
creating a volume group 412
disaster recovery 373
on other LVM operations 422
storage resources and resource groups 86
cthags, grpsvcs 6
Custom Cluster Configuration menu 30
custom configuration 68, 78
verifying and synchronizing 81
D
-d flag 213
daemons
CAA 8
clcomd 8
clconfd 8
cld 8
failure in Group Services 12
data collection, problem determination 370
DB2
installation on nodes for Smart Assist 136
instance and database on shared 137
debug information, collecting for IBM support 230
default value 122
dev_group association and AIX disk configuration 443
disaster recovery
C-SPOC operations 373
DS8700 requirements 372
Global Mirror 371
adding a cluster 385
adding a logical volume 407
adding a node 386
adding a pair to a new volume group 411
adding a pair to an existing volume group 404
adding communication interfaces 387
adding networks 387
adding new logical volume on new volume group
416
adding sites 386
AIX volume group configuration 381
cluster configuration 385
cluster resources and resource group configuration 388
considerations 373
creating new volume group 412
DS8700 requirements 372
failover testing 393
FlashCopy relationship creation 379
Global Copy relationship creation 378
graceful site failover 395
importing new volume group to remote site 416
importing volume groups 383
installing DSCLI client software 373
LVM administration of replicated resources 404
mirror group 391
planning 372
PPRC path creation 377
relationship configuration 377
resource configuration 374–375
resource definition 389
resources and resource group definition 391
rolling site failure 398
service IP definition 388
session identifier 379
sessions for involved LSSs 380
site re-integration 400
size increase of existing file system 410
software prerequisites 372
source and target volumes 380
storage agent 389
storage system 390
symmetrical configuration 376
synchronization and verification of cluster configuration 416
testing fallover after adding new volume group
417
volume group and file system configuration 381
Hitachi TrueCopy/HUR 419
adding logical volume 466
adding LUN pair to new volume group 469
adding LUN pairs
to existing volume group 463
adding replicated resources 451
to a resource group 452
AIX disk and dev_group association 443
asynchronous pairing 439
CCI software installation 422
cluster configuration synchronization 454
configuration verification 453
considerations 421
creating volume groups and file systems on replicated disks 447
defining managed replicated resource to PowerHA
451
failover testing 454
graceful site failover 455, 460
HORCM instances 426
horcm.conf files 425
increasing size of existing file system 468
LVM administration of replicated pairs 463
management 422
minimum connectivity requirements 420
planning 420
replicated pair creation 432
resource configuration 429
rolling site failure 457, 461
site re-integration 459, 462
software prerequisites 420
Discovery Manager 249
disk configuration 203, 207
AIX 216
disk heartbeat network, removing 310
DNP (dynamic node priority)
configuration 102
script for the nodes 102
testing 302
DPCOM 15
dpcom node connection 83
dpcomm interface 213
DPF database support 139
DS storage units 495
DS8000 Global Mirror Replicated Resources field 393
DS8700 disaster recovery requirements 372
DSCLI client software 373
duplicated events, filtering 12
dynamic node priority (DNP)
adaptive failover 35
configuration 102
script for the nodes 102
testing 302
E
ECM volume group 313
Edit Advanced Properties button 344
error messages
clmgr command 106
CLUSTER_OVERRIDE variable 37
Ethernet 499
event failure recovery 370
event flow 38
node down processing normal with takeover 41
startup processing 38
when another node joins the cluster 39
export DISPLAY 329
F
failback of PPRC pairs
primary site 402
secondary site 400
failbackpprc command 402
failover of PPRC pairs back to primary site 401
failover testing 393
graceful site failover 395
Hitachi TrueCopy/HUR 454
rolling site failure 398
site re-integration 400
failoverpprc command 383
fallover testing
after adding new volume group 476
after making LVM changes 469
fast path, smitty cm_apps_resources 95
fcsX
device busy 57
X value 57
Fibre Channel adapters 495, 498
DS storage units 495
IBM XIV 496
SAN-based communication 57
SVC 497
file collection and logs management 346
file collection creation 346
file sets 7, 61
installation 58, 64
PowerHA 62
PPRC and SPPRC 372
Smart Assist 91
Smart Assist for DB2 136
file systems 121
configuration 381
creation with volume groups 447
importing for Smart Assist for DB2 137
increasing size 468
size increase 410
FILTER argument 132
FlashCopy relationship creation 379
FQDN format on host names 75
G
GENXD Replicated Resources field 393
Global Copy relationships 378
Global Mirror
adding a cluster 385
adding a logical volume 407
adding a node 386
adding a pair to a new volume group 411
adding a pair to an existing volume group 404
adding communication interfaces 387
adding networks 387
adding new logical volume on new volume group 416
adding sites 386
AIX volume group configuration 381
cluster configuration 385
cluster resources and resource group configuration
388
considerations for disaster recovery 373
creating new volume group 412
C-SPOC operations 373
disaster recovery 371
DS8700 requirements 372
failover testing 393
graceful site failover 395
rolling site failure 398
site re-integration 400
importing new volume group to remote site 416
importing volume groups 383
LVM administration of replicated resources 404
mirror group 391
planning for disaster recovery 372
software prerequisites 372
PPRC and SPPRC file sets 372
relationship configuration 377
FlashCopy relationship creation 379
Global Copy relationship creation 378
PPRC path creation 377
session identifier 379
sessions for involved LSSs 380
source and target volumes 380
resource configuration 374
prerequisites 375
source and target volumes 375
resource definition 389
resources and resource group definition 391
service IP definition 388
session identifier 379
sessions for all involved LSSs 380
size increase of existing file system 410
source and target volumes 380
storage agent 389
storage system 390
symmetrical configuration 376
synchronization and verification of cluster configuration 416
testing fallover after adding new volume group 417
gossip protocol 13
graceful site failover 395, 455, 460
moving resource group to another site 395
Group Services 2
daemon failure 12
failure 296
information 218
subsystem name
cthags 6
grpsvcs 6
switch to CAA 156
grpsvcs
cthags 6
SRC subsystem 156
H
HACMPtopsvcs class 237
halt -q command 289
hardware configuration
Fibre Channel adapters for SAN-based communication 57
SAN zoning 54
shared storage 55
test environment 54
hardware requirements 44
multicast IP address 45
repository disk 45
SAN 45
supported hardware 45
heartbeat
considerations for configuration 20
testing 260
heartbeat channel, repository disk 269
help in clmgr command 111
high availability, planning a cluster implementation 43
Hitachi CCI software 422
installation in a non-root directory 423
installation in root directory 423
installing a newer version 424
Hitachi TrueCopy/Hitachi Universal Replicator (Hitachi
TrueCopy/HUR) 419
Hitachi TrueCopy/HUR
adding LUN pairs to existing volume group 463
adding LUN pairs to new volume group 469
adding new logical volume 466
AIX disk and dev_group association 443
assigning LUNs to hosts 429
asynchronous pairing 439
CCI instance 424
CCI software installation 422
cluster configuration synchronization 454
considerations 421
creating volume groups and file systems on replicated
disks 447
failover testing 454
graceful site failover 455, 460
HORCM instances 426
horcm.conf files 425
increasing size of existing file system 468
management 422
minimum connectivity requirements 420
replicated pair creation 432
rolling site failure 457, 461
site re-integration 459, 462
software prerequisites 420
HORCM 444
instance 426
horcm.conf files 425
host groups, assigning LUNs 429
host names
FQDN format 75
network planning 51
hostname command 168
I
-i flag 211
IBM storage 495
Fibre Channel adapters 495
DS storage units 495
IBM XIV 496
SVC 497
NAS 497
SAS 498
SCSI 498
IBM support, collecting CAA debug information 230
IBM Systems Director 21
advantages 21
agent file 58
agent password 328
architecture 22
availability menu 251
CLI (smcli interface) 257
cluster configuration 133
cluster creation 333–334
cluster management 333
configuration 328
installation 325–326
AIX 327
hardware requirements 326
login page 246
root user 246
smadmin group 246
smcli utility 22
status of common agent subsystems 205
SystemMirror plug-in 21, 65, 329
systems and agents to discover 250
web interface 246
welcome page 248
IBM XIV 496
ifconfig en0 down command 283
IGMP (Internet Group Management Protocol) 14
InfiniBand 500
information collection after cluster is running 216
installation
AIX BOS components 59
common agent 331
DSCLI client software 373
hardware configuration 54
IBM Systems Director 325–326
hardware requirements 326
on AIX 327
PowerHA file sets 58, 62
PowerHA software example 59
PowerHA SystemMirror 7.1 for AIX Standard Edition
53
SystemMirror agent 332
SystemMirror plug-in 325, 329
agent installation 330
server installation 329
troubleshooting 312
volume group
consideration 64
conversion 64
installp command 133
interfaces
excluding configured 213
states 14
up, point of contact down 20
Internet Group Management Protocol (IGMP) 14
Inter-site Management Policy 452, 460, 462
invalid events, filtering 12
IP address 67, 94
snapshot migration 166
IP network interfaces 13, 118
states 14
IPAT via aliasing subnetting requirements 51
IPAT via replacement configuration 162
iptrace utility 220, 222
iSCSI adapters 500
L
LDEV hex values 433
log files 224
AIX 216
clmgr command 130
displaying content using clmgr command 132
PowerHA 306
troubleshooting 306
logical subsystem (LSS), Global Mirror session definition
380
logical volume 416, 466
adding 407
Logical Volume Manager (LVM)
administration of Global Mirror replicated resources
404
commands over repository disk 207
lppchk command 64
lsavailpprcport command 377
lscluster command 18, 82, 478
-c flag 82, 209
cluster information 209
-d flag 213
description 478
examples 478
-i flag 14–15, 20, 83, 211, 275
output 16
-m flag 18, 20, 82, 209, 273
-s flag 215
syntax 478
zone 211
lscluster -m command 18
lslpp command 59
lsmap -all command 287
lspv command 127
lssi command 377
lssrc -ls clstrmgrES command 163
lssrc -ls cthags command 218
lssrc -ls grpsvcs command 218
lsvg command 10
LUN pairs
adding to existing volume group 463
adding to new volume group 469
LUNs
assigning to hosts 429
LDEV hex values 433
LVM (Logical Volume Manager)
commands over repository disk 207
C-SPOC 422
Global Mirror replicated resources 404
lwiplugin.bat script 330
lwiplugin.sh script 330
M
-m flag 209
management interfaces 13
map view 365
migration
AIX 6.1 TL6 152
CAA cluster creation 154
clcomdES and clcomd subsystems 157
considerations 152
planning 46
AIX 6.1 TL6 47
PowerHA 7.1 151
premigration checking 153, 157
process 153
protocol 155
snapshot 161
SRC subsystem changes 157
stages 153
switch from Group Services to CAA 156
troubleshooting 308
clmigcheck script 308
cluster still stuck in migration condition 308
non-IP networks 308
upgrade to AIX 6.1 TL6 153
upgrade to PowerHA 7.1 154
mirror group 389, 391
mkcluster command 478
description 479
examples 479
flags 479
syntax 478
mksnapshot command 347
mkss alias 347
modification functionality 349
monitoring 201
/etc/cluster/rhosts file 202
/etc/hosts file 202
/etc/inittab file 206
/etc/syslogd.conf file 206
/usr/es/sbin/cluster/utilities/ file tools 233
/var/adm/ras/syslog.caa log file 229
AIX commands and log files 216
CAA commands and log files 224
CAA debug information for IBM support 230
CAA subsystem group active 208
CAA subsystem guide 208
CAA subsystems 202
cldump utility 233
clmgr utility 241
clstat utility 231
cluster status 205, 208, 217
common agent subsystems 205
disk configuration 203, 207, 216
Group Services 218
IBM Systems Director web interface 246
information collection
after cluster configuration 206
after cluster is running 216
before configuration 202
lscluster command for cluster information 209
multicast information 205, 207, 217
network configuration and routing table 218
network interfaces configuration 203
ODM classes 236
PowerHA groups 203
repository disk 206
repository disk, CAA, solidDB 224
routing table 204
solidDB log files 229
tcpdump, iptrace, mping utilities 220
tools 231
Move Resource Group 395, 455
mping utility 220, 223
mpio_get_config command 56
multicast address 51
multicast information 205, 207
netstat command 217
multicast IP address
configuration 73
hardware requirements 45
not specified 74
multicast traffic monitoring utilities 220
multipath driver 50
N
NAS (network-attached storage) 497
netstat command 217
network configuration 218
network failure simulation 282–283
testing environment 282
network planning 50
host name and node name 51
multicast address 51
network interfaces 51
single adapter networks 51
subnetting requirements for IPAT via aliasing 51
virtual Ethernet 51
Network tab 256
network-attached storage (NAS) 497
networks
addition of 387
interfaces 51, 118
configuration 203
Never Fall Back (NFB) 121
node
crash with an active resource group 289
down processing normal with takeover 41
event flow when joining a cluster 39
failure 41
status 18
node names
cluster configuration 70
network planning 51
NODE_DOWN event 12
nodes
adding 386
AIX 6.1 TL6 152
starting all in a cluster 129
non-DPF database support 139
non-IP networks 308
O
object classes
aliases 105
clmgr 105
supported 106
Object Data Manager (ODM) classes 236
ODM (Object Data Manager) classes 236
odmget command 237
OFFLINE DUE TO TARGET OFFLINE 33
offline migration 191
manually specifying an address 197
planned target configuration 193
planning 191
PowerHA 6.1 support on AIX 7.1 193
procedure 195
process flow 194
starting configuration 192
P
pausepprc command 383
PCI bus adapters 500
physical volume ID (PVID) 88
pick list 90
planning
cluster implementation for high availability 43
hardware requirements 44
migration 46
network 50
PowerHA 7.1 considerations 46
software requirements 44
storage 48
point of contact 18
down, interface up 20
point-of-contact status 82
POWER Blade servers 494
Power Systems 492
POWER5 systems 492
POWER6 systems 493
POWER7 Systems 494
PowerHA 1
available clusters 253
cluster configuration 65
clmgr command 112
clmgr utility 104
custom configuration 68, 78
resource group dependencies 96
resources and applications configuration 86
resources configuration 68
SMIT 66
SystemMirror for IBM Systems Director 133
typical configuration 67, 69
cluster topology with smitty sysmirror 14
defining Hitachi TrueCopy/HUR managed replicated
resource 451
groups, cluster monitoring 203
installation
AIX BOS components 59, 62
file sets 62
RSCT components 59
SMIT tree 483
supported hardware 491
SystemMirror
architecture foundation 1
management interfaces 13
SystemMirror 7.1 1
SystemMirror 7.1 features 23
PowerHA 6.1 support on AIX 7.1 193
PowerHA 7.1 36
considerations 46
file set installation 58
migration to 151–153
software installation 59
SystemMirror plug-in 21
volume group consideration 64
PPRC
failing back pairs to primary site 402
failing back pairs to secondary site 400
failing over pairs back to primary site 401
file sets 372
path creation 377
Prefer Primary Site policy 452, 462
premigration checking 153, 157
previous version 1
primary node 120–121
problem determination data collection 370
PVID (physical volume ID) 88, 115
of repository disk 322
Q
query action 241
R
raidscan command 463
Redbooks Web site, Contact us xiv
redundant heartbeat testing 260
refresh -s clcomd command 69
refresh -s syslogd command 306
relationship configuration 377
FlashCopy relationship creation 379
Global Copy relationship creation 378
PPRC path creation 377
session identifier 379
sessions for involved LSSs 380
source and target volumes 380
Reliable Scalable Cluster Technology (RSCT) 2
AHAFS files 12
architecture changes for v3.1 3
CAA support 3
cluster security services 2
components 2
Group Services (grpsvcs) 2
installation 59
PowerHA 5
prerequisites for 44
Remote Monitoring and Control 2
resource managers 2
Topology Services 2
remote site, importing new volume group 416
replicated disks, volume group and file system creation
447
replicated pairs 432
LVM administration 463
replicated resources
adding 451
adding to a resource group 452
defining to PowerHA 451
LVM administration of 404
reports
Application Availability and Configuration 358
Cluster Configuration Report 366
Cluster Topology Configuration 367
repository disk 9
changed PVID 322
cluster 224
cluster monitoring 206
configuration 73
hardware requirements 45
heartbeat channel testing 269
LVM command support 207
node connection 83
previously used for CAA 316
replacement 317
snapshot migration 166
resource configuration, Global Mirror 374
prerequisites 375
source and target volumes 375
symmetrical configuration 376
resource group
adding 392
adding from C-SPOC 86
adding Hitachi TrueCopy/HUR replicated resources
452
adding resources 392
application list 353
circular dependencies 33
clmgr command 120
configuration 95, 388
crash in node 289
creation
verifying 355
with SystemMirror plug-in GUI wizard 349
custom creation 351
definition 391
dependencies, Start After 32
management 355
CLI 359
CLI command usage 360
functionality 357
wizard access 355
moving to another site 395
mutual-takeover dual-node implementation 68
OFFLINE DUE TO TARGET OFFLINE 33
parent/child dependency 33
predefined creation 353
removal 358
status change 359
resource group dependencies
Start After and Stop After configuration 96
Stop After 32
Resource Group tab 255
Resource Groups menu 255
resource management 32
adaptive failover 35
dynamic node priority 35
Start After and Stop After resource group dependencies 32
user-defined resource type 34
resource managers 2
Resource Monitoring and Control (RMC) subsystem 2
resource type
management 100
user-defined 100
resources
adding to a resource group 392
configuration 68, 86
RESTRICTED interface state 18
RMC (Resource Monitoring and Control) subsystem 2
rmcluster command 316, 480
description 480
example 480
flags 480
removal of volume group 320
syntax 480
rolling migration 177
/etc/cluster/rhosts file 183, 186
checking newly migrated cluster 191
migrating the final node 188
migrating the first node 179
migrating the second node 185
planning 178
procedure 178
restarting the cluster 183
troubleshooting 191
rolling site failure 398, 457, 461
root user 246
rootvg system event 31
testing 286
rootvg volume group
cluster node status and mapping 287
PowerHA logs 289
testing offline 288
testing the loss of 286
round trip time (rtt) 20, 213
routing table 204, 218
RSCT (Reliable Scalable Cluster Technology) 2, 59
AHAFS files 12
architecture changes for v3.1 3
CAA support 3
changes 8
cluster security services 2
components 2
Group Services 2
PowerHA 5
prerequisites for 44
Remote Monitoring and Control subsystem 2
resource managers 2
Topology Services 2
rtt (round-trip time) 20, 213
S
-s flag 215
SAN
hardware requirements 45
zoning 54
SAN fiber communication
enabling 15
unavailable 15
SAN Volume Controller (SVC) 497
SAN-based communication
channel 54
node connection 83
Fibre Channel adapters 57
SAN-based communication (SFWCOM) interface 14
state 15
SAN-based heartbeat channel testing 260, 263
SAS (serial-attached SCSI) 498
adapters 499
SCSI 498
adapters 500
security keys 313
serial-attached SCSI (SAS) 498
adapters 499
service address 118
defined 94
service IP 388
SFWCOM 14
sfwcom
interface 213
node connection 83
shared disk, adding to CAA services 173
shared storage 55
for repository disk 48
shared volume group
importing for Smart Assist for DB2 137
Smart Assist for DB2 instance and database creation
137
simple XML format of the clmgr command 130
single adapter networks 51
SIRCOL 228
site re-integration 400, 459, 462
failback of PPRC pairs to primary site 402
failback of PPRC pairs to secondary site 400
failover of PPRC pairs back to primary site 401
starting the cluster 403
site relationship 460
sites, addition of 386
smadmin group 246
Smart Assist 91
new location 29
Smart Assist for DB2 135
configuration 147
DB2 installation on both nodes 136
file set installation 136
implementation with SystemMirror cluster 139
instance and database on shared volume group 137
log file 149
prerequisites 136
shared volume group and file systems 137
starting 141
steps before starting 139
SystemMirror configuration 139
updating /etc/services file 139
smcli command 257
smcli lslog command 348
smcli mkcluster command 341
smcli mkfilecollection command 348
smcli mksnapshot command 347
smcli synccluster -h -v command 363
smcli undochanges command 363
smcli utility 22
smit bffcreate command 62
smit clstop command 163
SMIT menu 65
changes 66
configuration 66
custom configuration 68, 78
locating available options 66
resource group dependencies configuration 96
resources and applications configuration 86
resources configuration 68
typical configuration 67, 69
SMIT panel 25
Cluster Snapshot menu 31
Cluster Standard Configuration menu 29
Configure Persistent Node IP Label/Address menu
31
Custom Cluster Configuration menu 30
SMIT tree 25
smitty clstart 28
smitty clstop 28
smitty hacmp 26
smit sysmirror fast path 66
SMIT tree 25, 483
smitty clstart command 28
smitty clstop command 28
smitty cm_apps_resources fast path 95
smitty hacmp command 26
smitty sysmirror command 26
PowerHA cluster topology 14
snapshot
conversion 168
failure to restore 169
restoration 169
snapshot migration 161, 164
/etc/cluster/rhosts file 168
adding shared disk to CAA services 173
AIX 6.1.6 and clmigcheck installation 163
checklist 176
clmigcheck program 167
cluster verification 175
conversion 168
procedure 163
process overview 162
repository disk and multicast IP addresses 166
restoration 169
snapshot creation 163
stopping the cluster 163
uninstalling SystemMirror 5.5 168
SNMP, clstat and cldump utilities 312
socksimple command 263
software prerequisites 372
software requirements 44
solid subsystem 8
solidDB 224
log file names 230
log files 229
SQL interface 227
status 225
source and target volumes
disaster recovery 375
including in Global Mirror session 380
SPPRC file sets 372
SRC subsystem changes during migration 157
Start After resource group dependency 32, 297
configuration 96
standard configuration testing 298
testing 297
Startup Monitoring, testing application startup 298
startup processing 38
Stop After resource group dependency 32
configuration 96
storage
agent 389
Fibre Channel adapters 495
management 345
NAS 497
resources, adding from C-SPOC 86
SAS 498
SCSI 498
system 389–390
Storage Framework Communication (sfwcom) 213
storage planning 48
adapters for the repository disk 49
multipath driver 50
shared storage for repository disk 48
System Storage Interoperation Center 50
Storage tab 256
subnetting requirements for IPAT via aliasing 51
subsystem services status 366
supported hardware 45, 491
supported storage, third-party multipathing software
49–50
SVC (SAN Volume Controller) 497
symmetrical configuration 376
synccluster command 363
synchronization of cluster configuration 360, 416
CLI 363
GUI 360
syslog facility 306
system event, rootvg 31
System Mirror 7.1
resource management 32
rootvg system event 31
System Storage Interoperation Center 50
SystemMirror
agent installation 332
cluster and Smart Assist for DB2 implementation 139
configuration for Smart Assist for DB2 139
SystemMirror 5.5, uninstalling 168
SystemMirror 7.1
CAA disk fencing 37
CLUSTER_OVERRIDE environment variable 36
deprecated features 24
event flow differences 38
features 23
installation of the Standard Edition 53
new features 24
planning a cluster implementation for high availability 43
SMIT panel 25
supported hardware 45
SystemMirror plug-in 21
agent installation 330
CLI for cluster creation 339
cluster creation and management 333
cluster management 341
CLI 347
Cluster Management Wizard 342
functionality 343
GUI wizard 341
cluster monitoring 364
activities before starting a cluster 364
cluster subsystem services status 366
Cluster tab 255
common storage 337
creation
custom resource group 351
predefined resource group 353
GUI wizard, resource group management 355
initial panel 252
installation 325, 329
verification 329
monitoring an active cluster 368
resource group
creation with GUI wizard 349
management, CLI 359
Resource Groups tab 254
server installation 329
verifying creation of a resource group 355
wizard for cluster creation 334
T
TAIL argument 132
takeover, node down processing normal 41
tcpdump utility 220
test environment 68
testing
application startup with Startup Monitoring configured 298
cluster 259
CPU starvation 292
crash in node with active resource group 289
dynamic node priority 302
failover 393
fallover after adding new volume group 417, 476
fallover after making LVM changes 469
fallover on a cluster after making LVM changes 411
Group Services failure 296
Hitachi TrueCopy/HUR 454
loss of the rootvg volume group 286, 289
network failure 283
network failure simulation 282
repository disk heartbeat channel 269
environment 270
rootvg system event 286
rootvg volume group offline 288
SAN-based heartbeat channel 260, 263
Start After resource group dependency 297
third-party multipathing software 49–50
timeout value 36
top-level menu 67
last two items 67
Topology Services (topsvcs) 2
topology view 364
touch /tmp/syslog.out command 306
troubleshooting 305
CAA 316
changed PVID of repository disk 322
cluster after node restarts 317
cluster creation 318
cluster services not active message 323
previously used repository disk 316
previously used volume group 320
removal of volume group 320
repository disk replacement 317
volume group already in use 320
installation and configuration 312
/var/log/clcomd/clcomd.log file and security keys 313
clstat and cldump utilities and SNMP 312
communication path 314
ECM volume group 313
log files 306
CAA 306
clutils file 306
PowerHA 306
syslog facility 306
migration 308
clmigcheck script 308
cluster still stuck in migration condition 308
non-IP networks 308
rolling migration 191
verbose logging level 307
TrueCopy synchronous pairings 433
TrueCopy/HUR
adding replicated resources 451
adding replicated resources to a resource group 452
configuration verification 453
defining managed replicated resource to PowerHA 451
disaster recovery 419
LVM administration of replicated pairs 463
planning for management 420
resource configuration 429
Two-Node Cluster Configuration Assistant 29
typical configuration 67, 69
clcomdES versus clcomd subsystem 70
node names 70
prerequisite 69
U
uname -L command 287
undo changes 363
undochanges command 363
undoing local changes of a configuration 370
unestablished pairs 447
Universal Replicator asynchronous pairing 439
user-defined resource type 34, 100
UUID 224
V
-v option, clmgr command 110
verbose logging level 307
verification
cluster configuration 360, 416
configuration
CLI 363
GUI 360
Hitachi TrueCopy/HUR configuration 453
verification of cluster configuration 360
VGDA, removal from disk 320
view action 242
virtual Ethernet, network planning 51
volume disk group, previous 180
volume groups 120
adding a Global Mirror pair 404, 411
adding LUN pairs 463, 469
adding new logical volume 416
already in use 320
configuration 381
consideration for installation 64
conversion during installation 64
creating 412
creation with file systems on replicated disks 447
importing in the remote site 383
importing to remote site 416
previously used 320
removal when rmcluster command does not 320
testing fallover after adding 417
Volume Groups option 86
volume, dynamically expanding 404
W
web interface 246
wildcards 110
Z
zone 211
Back cover
IBM PowerHA SystemMirror 7.1 for AIX
Learn how to plan for, install, and configure PowerHA with the Cluster Aware AIX component
IBM PowerHA SystemMirror 7.1 for AIX is a major product announcement for IBM in the high availability space for IBM Power Systems servers. This release delivers deeper integration between the IBM high availability solution and IBM AIX. It features integration with IBM Systems Director, SAP Smart Assist and cache support, IBM System Storage DS8000 Global Mirror support, and support for Hitachi storage.
See how to migrate to, monitor, test, and troubleshoot PowerHA 7.1
This IBM Redbooks publication contains information about the IBM PowerHA SystemMirror 7.1 release for AIX. This release includes fundamental changes, in particular departures from how the product was managed in the past, which prompted this Redbooks publication.
Explore the IBM Systems Director plug-in and disaster recovery
This Redbooks publication highlights the latest features of PowerHA SystemMirror 7.1 and explains how to plan for, install, and configure PowerHA with the Cluster Aware AIX component. It also introduces PowerHA SystemMirror Smart Assist for DB2. This book guides you through migration scenarios and demonstrates how to monitor, test, and troubleshoot PowerHA 7.1. In addition, it shows how to use IBM Systems Director for PowerHA 7.1 and how to install the IBM Systems Director Server and the PowerHA SystemMirror plug-in. It also explains how to perform disaster recovery by using IBM DS8700 Global Mirror and Hitachi TrueCopy and Universal Replicator.
This publication targets technical professionals (consultants, IT architects, support staff, and IT specialists) who are responsible for delivering and implementing high availability solutions for their enterprise.
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE
IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.
For more information: ibm.com/redbooks
SG24-7845-00
ISBN 0738435120