Failover and Recovery in WebSphere Application Server Advanced Edition 4.0

By: Tom Alcott, Michael Cheng, Sharad Cocasse, David Draeger, Melissa Modjeski, Hao Wang
Revision Date: December 18, 2001

Table of Contents

Preface
About the Authors
Chapter 1 - WebSphere Topologies for Failover and Recovery
  1.1 Introduction to High Availability
    1.1.1 The Basics
    1.1.2 The 5 Nines
    1.1.3 Considering the Context
    1.1.4 Types of Availability
    1.1.5 Fundamental Techniques
      1.1.5.1 Hardware Clustering
      1.1.5.2 Software Clustering
      1.1.5.3 Advantages and Disadvantages
      1.1.5.4 WebSphere Application Server clustering
    1.1.6 Hardware Availability
    1.1.7 Planned Maintenance
    1.1.8 Disaster Recovery
    1.1.9 Evaluating your HA Solutions
  1.2 Failover with WebSphere Application Server
    1.2.1 WebSphere Servlet Requests
    1.2.2 EJB Requests
    1.2.3 Administrative Server High Availability
    1.2.4 Discussions and Problem Spots
  1.3 Failover Topologies with WebSphere
    1.3.1 HTTP Requests
    1.3.2 Database Server
    1.3.3 LDAP Server
    1.3.4 Firewall
  1.4 Suggested Topologies
Chapter 2 - HTTP Server Failover
Chapter 3 - Web Container Failover
  3.1 WebSphere Server Groups
  3.2 The HTTP Server Plug-in
  3.3 Web Container Failover
  3.4 Session affinity and session persistence
Chapter 4 - EJB Container Failover
  4.1 The WLM plug-in
  4.2 EJB Container failover
  4.3 WLM and EJB Types
  4.4 WLM and EJB Caching
Chapter 5 - Administrative Server Failover
  5.1 Introduction
  5.2 Runtime Support in the Administrative Server
    5.2.1 Types of clients that use the admin server
      5.2.1.1 Application server
      5.2.1.2 EJB clients
    5.2.2 Enabling High Availability
      5.2.2.1 JNDI Client Failover Behavior
      5.2.2.2 Location Service Daemon (LSD) Behavior
      5.2.2.3 Security Server
      5.2.2.4 Transaction logging
      5.2.2.5 Starting an application server
  5.3 System Administration Support in the Administrative Server
    5.3.1 Types of System Administration Clients
      5.3.1.1 Administrative Console (admin console)
      5.3.1.2 XMLConfig
      5.3.1.3 WSCP
    5.3.2 Known limitations of System Administration Failover
  5.4 Configuration parameters which affect admin server failover
Chapter 6 - WebSphere Database Failover
  6.1 Introduction
  6.2 Application Databases
    6.2.1 StaleConnectionException
      6.2.1.1 Connections in auto-commit mode
  6.3 Session Database
    6.3.1 Expected Behavior - Servlet Service Method
    6.3.2 Expected Behavior - Manual Update
    6.3.3 Expected Behavior - Time Based Update
  6.4 Administrative Database
    6.4.1 System Administration
    6.4.2 Application Runtime Environment - JNDI caching
      6.4.2.1 XML/DTD syntax
      6.4.2.2 Configuring the JNDI cache for an application server
      6.4.2.3 Configuring the JNDI Cache for an Application Client
      6.4.2.4 Preloading JNDI Cache for Thin Client
      6.4.2.5 Operating Restrictions
      6.4.2.6 JNDI Cache Size Considerations
    6.4.3 HA Administrative Database
    6.4.4 Administrative Database Recovery
Chapter 7 - Integrated Hardware Clustering for Database High Availability
  7.1 HACMP and HACMP/ES on IBM AIX
    7.1.1 Introduction and Considerations
    7.1.2 Expected Reactions to Failures
    7.1.3 WebSphere HACMP configuration
    7.1.4 Tuning heartbeat and cluster parameters
  7.2 MC/ServiceGuard on the HP-UX Operating System
    7.2.1 Introduction and Considerations
    7.2.2 Expected Reactions to Failures
    7.2.3 WebSphere MC/ServiceGuard configuration
    7.2.4 Tuning heartbeat and cluster parameters
  7.3 Microsoft Clustered SQL Server on Windows
    7.3.1 Introduction and Considerations
    7.3.2 Expected Reactions to Failures
    7.3.4 Tuning heartbeat and cluster parameters
Chapter 8 - Failover for Other Components
  8.1 Firewalls
  8.2 LDAP Server
Appendix A - Installing Network Dispatcher (ND) on Windows NT
Appendix B - Configuring TCP Timeout parameters by OS
Appendix C - MC/ServiceGuard setup instructions
Appendix D - HACMP setup instructions
Appendix E - Microsoft Clustering Setup Instructions
Resources

Preface

This document discusses various options for making an IBM WebSphere Application Server V4.x environment Highly Available (HA).
It is meant for WebSphere architects and advanced WebSphere users who wish to make their WebSphere environments highly available. We will discuss recommended failover topologies as well as their implementation details.

In this document, we suggest a three-pronged approach to WebSphere failover:

1. Using WebSphere's built-in failover capability. Failover is achieved using the Workload Management (WLM) facility at the Web Module, EJB Module, and Administrative Server levels. These concepts are discussed in detail in Chapters 3, 4, and 5.

2. Application code best practices for failure-related exception handling and recovery. In the case of database failover, application code should be able to handle JDBC exceptions and roll over to the "healthy" database server. This is achieved by ensuring that the application code is "failover-ready". We discuss this and other database failover concepts in Chapter 6.

3. Using other IBM and third-party products in conjunction with WebSphere. A wide variety of IBM and third-party products can be used to enhance the failover capability of WebSphere, and the list continues to grow. These products include clustering software for the various operating systems, HA databases, and HA web server clusters. They are discussed in Chapters 2, 7, and 8.

Failover and recovery in WebSphere is an extremely broad subject. In this document we have attempted to provide hands-on instructions for some of the tasks, and have referenced on-line resources such as IBM Redbooks and the WebSphere InfoCenter for others.

About the Authors

This document was produced by a team of WebSphere specialists based in the United States.

Tom Alcott is an advisory I/T specialist in the United States. He has been a member of the World Wide WebSphere Technical Sales Support team since its inception. In this role, he spends most of his time trying to stay one page ahead of customers in the manual.
Before he started working with WebSphere, he was a systems engineer for IBM's Transarc Lab supporting TXSeries. His background includes over 20 years of application design and development on both mainframe-based and distributed systems. He has written and presented extensively on a number of WebSphere runtime and security issues.

Michael Cheng is a member of the WebSphere Architecture Board. He is currently working on the high availability architecture of the WebSphere Application Server. Previously, he was a significant contributor to IBM's ORB technology, and worked on supporting EJBs in IBM Component Broker. He received a Ph.D. in computer sciences from the University of Wisconsin-Madison.

Sharad Cocasse is a Staff Software Engineer on the WebSphere Execution Team. His experience includes several years of writing object-oriented applications and developing and delivering WebSphere education to customers and IBM field technical personnel. Most recently, he has been directly helping customers to be successful with their use of the WebSphere Application Server. Sharad holds a Master's degree in Electrical Engineering from the University of Alabama and lives and works in Rochester, Minnesota.

David Draeger is a WebSphere Software Engineer and member of the WebSphere Execution Team out of Rochester, MN. He has worked on-site with multiple customers during critical situations to resolve issues with WebSphere failover and other topics. Dave has also authored many white papers dealing with WebSphere Application Server. He has helped in the development of tools for WebSphere and has driven new innovations into the WebSphere Application Server software. Dave received a BS in Computer Science from the engineering college of the University of Illinois at Urbana-Champaign.

Melissa Modjeski is a member of the WebSphere Execution Team, working closely with customers designing and implementing WebSphere environments and applications. She has also been involved in developing WebSphere 3.0, 3.5, and 4.0 education for IBM field members and WebSphere customers. Melissa received a Bachelor of Science degree in Computer Science and Mathematics from Winona State University.

Hao Wang is a member of the WebSphere development team. He has worked on the San Francisco project, WebSphere Solutions Integration, and WebSphere Connection Management. His background includes a Ph.D. in Computer Science from Iowa State University. Before joining IBM in January 1999, he worked for the university as an associate professor and scientist for more than 10 years, instructing graduate-level courses such as the principles of database systems and conducting R&D on high-performance distributed and parallel computing, cluster computing, and computer simulation models; he also worked as an IT consultant.

Thanks to the following people for their invaluable contributions to this project: Tony Arcuri, Keys Botzum, Dave Cai, Utpal Dave, John Koehler, Makarand Kulkarni, Keith McGuinnes, and Kevin Zemanek.

Chapter 1 - WebSphere Topologies for Failover and Recovery

1.1 Introduction to High Availability

1.1.1 The Basics

Before discussing the considerations for providing a highly available WebSphere Application Server implementation, it's important to provide some background on the subject of high availability, to ensure a common understanding of the terms used later in this document.

To make a system "Highly Available" is to design the system or system infrastructure so as to eliminate or minimize the loss of service due to either unplanned or planned outages. While for purposes of this paper we'll be referring to highly available computer systems, the term can be applied to any number of systems, such as the local water utility or phone system. The meaning of the term "Highly Available" varies based on the service provided and the expected hours of operation.
For example, a system used by the local dry cleaner need only be available during normal business hours 5 to 6 days a week (typically 10 to 12 hours a day), while a system used to host an airline reservation system is expected to be available 7x24 (or nearly so).

This brings us to our next term, "Continuous Availability", which is equated with nonstop service. While Continuous Availability is certainly a laudable goal for some systems and services, in practice it is much like perfection, in other words "an absolute that is never achieved". Moreover, it's important to realize that "High Availability" does not equal "Continuous Availability".

So if "High Availability" does not equal "Continuous Availability", what does it mean? For purposes of discussion in this paper we'll adopt the definition "providing a computer service that satisfies a defined service level". What's a "service level"? As the dry cleaner and airline reservation examples above suggest, it means providing system availability as appropriate for the business. This includes provisions for both planned and unplanned outages, but does not strive to exceed the business needs.

By way of example, in a past employment one of the authors had to provide a system that satisfied a service level of "6x24 and 1x20, 99.95% of the time", which meant that the system was expected to be available Monday through Saturday, 24 hours a day, and on Sunday until 8:00 p.m., at which point 4 hours were allotted for system maintenance. The 99.95% requirement meant that the system had to be available for 163.92 of the 164 hours specified, so all unplanned outages could total no more than about 5 minutes a week. This is certainly a reasonable and cost-effective goal for most enterprises; a higher availability requirement tends to incur increasingly higher costs for hardware and system infrastructure.
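The arithmetic behind such a service level is simple enough to sketch in a few lines; the figures below reproduce the "6x24 and 1x20" example from the text:

```java
public class ServiceLevelBudget {
    public static void main(String[] args) {
        // "6x24 and 1x20": Monday-Saturday 24 hours, Sunday until 8:00 p.m.
        double scheduledHours = 6 * 24 + 20;                    // 164 hours/week
        double availability = 0.9995;                           // 99.95%
        double requiredUptime = scheduledHours * availability;  // 163.918 hours
        double outageBudgetMin = (scheduledHours - requiredUptime) * 60;
        System.out.printf("Required uptime: %.2f h/week%n", requiredUptime);
        System.out.printf("Unplanned outage budget: %.1f min/week%n", outageBudgetMin);
        // Just under the 5 minutes a week quoted in the text.
    }
}
```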
1.1.2 The 5 Nines

This leads to the next term, "5 nines", which refers to making a system available 99.999% of the time. This is generally the "gold standard" for high availability and is offered by most of the leading hardware manufacturers. Examples of products that claim this type of availability are:

• IBM with High Availability Cluster Multiprocessing (HACMP) on AIX
• Sun with Sun Cluster on Solaris
• Hewlett Packard with MultiComputer/ServiceGuard (MC/ServiceGuard) on HP-UX
• Microsoft with Windows 2000 and Windows NT Clustering
• Veritas Cluster Server

The availability offered by "5 nines" as compared to other service levels can be seen in the table below, for a 24x7x365 system (8,760 hours per year). As can be seen, "5 nines" provides for a system that is available for all but about 5 minutes per year.

Uptime     Hours Available    Hours Unavailable
99.0%      8,672.40           87.6
99.5%      8,716.20           43.8
99.95%     8,755.62           4.38
99.999%    8,759.91           0.09
100.0%     8,760.00           0

1.1.3 Considering the Context

When looking at measurements of system availability, it is important to consider the context of that measurement. For example, a "5 nines" operating system/hardware combination does not mean that your systems will now have only 5 minutes of downtime a year; it means that the operating system and associated hardware can achieve this. An entire system must take into account the network, software failures, human error, and a myriad of other factors. Thus, an availability figure quoted without context carries little meaning. To achieve a truly highly available system and accurately measure its availability, one must consider the entire system and all of its components, and then the business constraints and goals. Only then can one meaningfully speak of availability.
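The point about context can be made concrete: if the components a request passes through fail independently, their availabilities multiply, so the chain as a whole is always less available than its weakest link. A small illustrative sketch (the component figures below are invented for the example):

```java
public class ChainAvailability {
    // Overall availability of components in series is the product of the
    // individual availabilities, assuming independent failures.
    static double serial(double... availabilities) {
        double overall = 1.0;
        for (double a : availabilities) overall *= a;
        return overall;
    }

    public static void main(String[] args) {
        // A "5 nines" server is only one link in the chain.
        double overall = serial(0.99999,   // server hardware + OS
                                0.999,     // network
                                0.999,     // database
                                0.995);    // application software
        double downtimeHoursPerYear = 8760 * (1 - overall);
        System.out.printf("Overall availability: %.3f%%%n", overall * 100);
        System.out.printf("Expected downtime: %.0f hours/year%n", downtimeHoursPerYear);
        // Roughly 99.3% and about 61 hours/year, despite the 5-nines server.
    }
}
```

The weakest component, not the strongest, dominates the result, which is why quoting the hardware's availability alone says little about the system.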
Issues to consider when designing for high availability include:

• Disk failure
• CPU or machine failure
• Process crash/failure
• Network failure
• Catastrophic failure, e.g., the loss of an entire data center
• Human error
• Planned hardware and software maintenance and upgrades

For example, a fault-tolerant disk array will not normally address the loss of an entire data center. For some businesses, these issues must be considered. As the saying goes, a system is only as strong as its weakest link. Of course, any business must weigh the cost of a system outage against the cost of any high availability implementation. The incremental cost of providing a highly available architecture increases rapidly once an organization reaches availability of 99.9% or more.

1.1.4 Types of Availability

When thinking about availability, two aspects must be considered: process availability and data availability. Process availability is simply the state where processes exist that can process requests. Data availability means that data is preserved across process failures and remains available to the processes that continue to run. In many systems, data availability is crucial. For example, it is of little value for a banking system to remain available if it cannot access account data.

Data availability can be further broken down by the general types of data:

• static data – binaries, install images, etc.
• rarely changing data – configuration information, new document versions, passwords, etc.
• active data – rapidly changing data. This type of data usually represents the essence of the system; for a banking system it would include account data.

Each of these data types requires different actions to maintain its availability. For example, static data can usually just be installed at initial application load and perhaps copied to replica machines. Rarely changing data can often be manually updated on each replica.
Of course, in either case, manual processes can be automated to reduce human error, but the key point is that such data rarely changes, so maintaining its availability is not difficult. For active data, the problem is much harder: significant effort must be expended to maintain its availability. Common techniques include fault-tolerant disk arrays and automated replication.

1.1.5 Fundamental Techniques

For purposes of this document we'll define highly available systems as relying on the following technologies:

• Hardware Clustering
• Software Clustering

NOTE: This initial discussion is intended solely as a brief overview of the technologies used for high availability. It is neither a recommendation nor a statement of support.

1.1.5.1 Hardware Clustering

In hardware clustering HA systems, the processes and software to be made highly available are configured to run on one or more servers and, in the case of a failure, are moved from one server to another. Examples of hardware-based HA products include:

• HACMP
• Sun Cluster
• MC/ServiceGuard
• Veritas Cluster Server
• MS Clustering

The noted products provide a mechanism for clustering software applications across multiple machines, thus eliminating any given server as a Single Point of Failure (SPOF). In general, hardware clustering products provide a cluster manager process that periodically polls (also known as "checking the heartbeat of") the other software processes in the cluster to determine whether the software, and the hardware it's running on, is still active. If a "heartbeat" is not detected, the cluster manager moves the software process to another server in the cluster. While the movement of the process from one machine to another is not instantaneous, it can be accomplished in fairly short order, typically 30 seconds to a few minutes, depending on the hardware and software employed.
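The polling just described can be sketched schematically. This is not any product's actual API; the host name, port, and interval/threshold values below are invented for illustration:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class HeartbeatSketch {
    // Illustrative tuning values; real cluster managers expose these as parameters.
    static final int HEARTBEAT_INTERVAL_MS = 5_000;
    static final int CONNECT_TIMEOUT_MS = 2_000;
    static final int MISSED_BEATS_BEFORE_FAILOVER = 3;

    // One heartbeat: is the monitored node still reachable?
    static boolean heartbeat(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MS);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int missed = 0;
        while (true) {
            if (heartbeat("node1.example.com", 7000)) {   // hypothetical node
                missed = 0;                               // healthy: reset the count
            } else if (++missed >= MISSED_BEATS_BEFORE_FAILOVER) {
                // Here the cluster manager would move the process to another server.
                System.out.println("Heartbeat lost; initiating failover");
                break;
            }
            Thread.sleep(HEARTBEAT_INTERVAL_MS);
        }
    }
}
```

Tolerating a few missed beats before failing over is what trades detection speed against false alarms, which is why products make the interval and threshold tunable.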
Hardware clustering HA systems typically run in either an "active/active" or an "active/standby" mode of operation. Figure 1.1 depicts an "active/active" configuration, where all machines are active all the time, each running some portion of the overall workload during normal operation. Once a failure is detected, the application or process is moved off the failed server, and the remaining server continues to run the entire workload.

[Figure 1.1: Active/Active failover configuration. Before failover, Application 1 runs on Server 1 and Application 2 on Server 2, both attached to mirrored disks; after failover, Server 2 runs both applications.]

In this scenario, process availability is achieved by starting the appropriate application processes on the replica machine. Data availability is achieved by using fault-tolerant disk arrays; both machines have access to the same data.

The alternative for a hardware clustering HA system is an "active/standby" configuration, where only one server is actively running the workload. A failure triggers a move of all processes or applications from the active server to the standby server.

[Figure 1.2: Active/Standby failover configuration. Before failover, Application 1 runs on Server 1 while Server 2 stands by; after failover, Application 1 runs on Server 2.]

1.1.5.2 Software Clustering

Software clustering based HA systems most typically use some mechanism of data replication to provide database high availability. Examples of software-based HA products include:

• IBM UDB/DB2 Data Replication
• Oracle Parallel Server

With this type of technology the database manager is configured to replicate the database instance, so that all database updates occur on two instances. In normal operation, access to the database occurs via the primary database instance. When a problem occurs in one of the instances, the remaining instance is used.
[Figure 1.3: Database Replication failover configuration. Database Instance 1 on Server 1 is continuously replicated to Database Instance 2 (a copy of Instance 1) on Server 2; after failover, Instance 2 serves requests.]

In this scenario, process availability is achieved by running active database processes on both machines simultaneously; the clients accessing the database are presumed able to fail over to the backup database server processes as needed. Data availability is achieved by continuously replicating data (using database techniques) from one database to the secondary, so the second database holds a nearly current copy of the system data at all times.

1.1.5.3 Advantages and Disadvantages

While hardware clustering based HA can be used to make a database, other applications, and processes highly available, software clustering HA is usually limited to databases. Both approaches have advantages and disadvantages. As noted, hardware clustering can be used for applications and processes other than the database. Another advantage of hardware clustering is the light process overhead incurred by the cluster manager process and the "heartbeat" mechanism used to determine hardware or software "health". When compared to hardware clustering, the ongoing data replication employed by software clustering exacts a higher resource/performance cost. On the other hand, failover with software clustering occurs more rapidly than with hardware clustering: the database processes are already active with the former, while with the latter the process must be moved from one server to another and started. The amount of time required for failover will vary based on the time it takes to detect a failure, the time for process startup scripts to execute, any database transaction recovery, and so on.
As noted previously, failover can occur in roughly 30 seconds to a few minutes, depending on the hardware and software in use.

1.1.5.4 WebSphere Application Server clustering

Conspicuous by its absence in the preceding discussion is the clustering technology employed by WebSphere Application Server. WebSphere Application Server relies on "none of the above" to provide clustering capability. Rather, WebSphere Application Server provides inherent clustering capability via several technologies, as discussed in Chapters 3, 4, and 5.

1.1.6 Hardware Availability

WebSphere Application Server is part of a much larger environment that includes the network, power supply, and other components required of a computing infrastructure. Failure to consider all these additional components will negate any effort expended in deploying WebSphere Application Server in a highly available environment. While use of an HA product can eliminate a given machine as a single point of failure, it's also important to make provisions for redundant hardware on a given machine. Some examples of redundant hardware include:

• Disk Controllers/Drives
• Network Interface Cards
• Power Supplies/Sources

1.1.7 Planned Maintenance

Much of the discussion so far has dealt with the use of hardware and software clustering as a means of insuring against an unexpected outage. Clustering can also be employed to allow for service or upgrades on machines in a cluster while the remaining machines continue to provide service. This is a very important consideration: many businesses overlook the very real need to upgrade hardware and software on a regular basis, and upgrades are less risky if alternate systems can still provide service during them. Planned maintenance in WebSphere V4.x may involve the use of multiple domains and a special programming model. These techniques are outside the scope of this white paper.
1.1.8 Disaster Recovery

Cluster technology eliminates single points of failure; a related but different topic is disaster recovery. Disaster recovery refers to how the system recovers from catastrophic site failures. Examples of catastrophic failures include earthquakes, tornadoes, hurricanes, floods, and fires. In simple terms, disaster recovery involves replicating an entire site, not just pieces of hardware or sub-components. The service level requirements for disaster recovery depend on the application. Some applications may not have any disaster recovery requirement. Others may simply have backup data tapes from which they would rebuild a new working site over a period of weeks or months. Still others may have requirements to begin operations 24 or 48 hours after the disaster. The most stringent requirement is to keep the service running despite the disaster. Though disaster recovery is a very important topic when considering application availability, it too is beyond the scope of this paper.

1.1.9 Evaluating your HA Solutions

We have already discussed the concept of percentage of time available, such as "99.9% availability". This is the number most often quoted to express the probability that a system will be available. However, there are other factors that should be taken into consideration when putting together a highly available system. These factors include recovery time, recovery point, cost, programming model, setup and administration complexity, and special operating conditions and restrictions.

The recovery time is the duration before the application resumes useful processing again. For the purpose of this paper, the recovery time includes the time to detect a failure and the time to perform failover. As an example, we have already mentioned that a database utilizing software replication may be able to fail over faster than one that uses hardware clustering. But failover time is not limited to databases.
All services being used by the client, including the application and all the services that it uses, must be considered. Recovery time is related to availability in that, for a given probability of failure, the shorter the recovery time, the higher the percent availability. Recovery time is also related to the type of failure: a cascaded failure that brings down many components may take longer to recover from than a single failure. Recovery time for algorithms based on time-outs can often be tuned.

The recovery point refers to where useful processing resumes. It is directly related to the amount of work that is lost during failover. During database failover, the amount of data lost may be limited to uncommitted transactions. However, if asynchronous replication is used, some committed work may also be lost under some error conditions. The same applies to any other part of the application, whether or not a database is being used to store the data.

The cost factor is directly related to the first two factors. In general, the less stringent the availability, recovery time, and recovery point requirements, the lower the cost. A system offering 99.999% availability, with sub-second recovery time and a recovery point at which almost no transactions are lost during a normal failure, can cost orders of magnitude more than a system with 99.9% availability, recovery time measured in minutes, and a recovery point at which all pending transactions are lost. Taking disaster recovery into account, additional bandwidth and backup sites may have to be set up, adding even more to the cost.

The programming model refers to what the application code needs to do during failover. Some services are able to perform failover transparently; since the failure is masked from the application, there is nothing special that the application needs to do. Other services will require specific actions from the application.
For example, the application may get an error message during failover, with the requirement that the operation be retried after the failover is completed. Since application development, testing, and maintenance can involve significant costs, the simpler the programming model the better.

The setup and administration complexity should be considered when designing an HA system. Some solutions may require additional third-party prerequisites with their own setup and administration overhead. Others may be easy to configure, with no special setup necessary.

Usually there are multiple solutions to address a given failure, with different tradeoffs. Some solutions may have special restrictions, such as only working for certain failures, working better for some failures than others, or being unable to maintain a consistent view of data under certain failure sequences. The designer of an HA system needs to evaluate all the restrictions to ensure that critical data is not impacted and that there are ways to work around failure conditions not covered by a solution.

1.2 Failover with WebSphere Application Server

WebSphere Application Server is architected to provide for clustering. With prudent planning and deployment, the WebSphere runtime (and your applications) can be made highly available. When planning for high availability with WebSphere, there are a number of WebSphere components as well as supporting software components that must be taken into account. In brief, these components are:

• Network
• HTTP Server
• WebSphere Application Server (both web container and EJB container)
• WebSphere Administration Server (Security, naming (bootstrap and name server), Location Service Daemon, WLM, Serious Event Log, transaction log)
• Databases (application, administration, session)
• LDAP Server
• Firewall

These components are depicted below and will be discussed in more detail in subsequent chapters.
Figure 1.4: End-to-End failover configuration

Within the WebSphere Application Server runtime there are several distinct technologies employed to provide failover and workload management.

1.2.1 WebSphere Servlet Requests

WebSphere Application Server utilizes an HTTP server plug-in to dispatch servlet requests from the HTTP server to one or more web/servlet containers running inside an application server. In addition to providing for distribution of requests, the plug-in also provides for failover of requests once a web container is determined to be unavailable or not responding to requests. The specifics of how this determination occurs are discussed in chapter 3, but the diagram below depicts request distribution at a high level. The key point to recognize is that requests can be dispatched across 1 to N application servers residing on 1 to N physical servers, thus eliminating a SPOF. These replica processes provide process availability. Data availability is an application domain issue.

Figure 1.5: WebSphere Servlet Request Mechanism

1.2.2 EJB Requests

In a similar vein, WebSphere Application Server also provides for distribution of requests for Enterprise JavaBeans. These requests can come from a standalone Java client, another EJB, or from a servlet (or JSP), as is typically the case in a web client architecture. Again, the key point for purposes of this discussion is to recognize that requests can be dispatched among 1 to N WebSphere Application Servers residing on multiple physical machines, eliminating any SPOF. These replica processes provide process availability. Data availability is an application domain issue.
The specifics of the implementation of EJB failover are discussed further in chapter 4, but are depicted at a high level below.

Figure 1.6: WebSphere EJB Request Mechanism

These two technologies can be used singly or combined to provide either vertical clustering or horizontal clustering (these are sometimes referred to as vertical scalability and horizontal scalability).

• Vertical clustering refers to the practice of defining multiple clones of an application server on the same physical machine. In some cases a single application server, which is implemented by a single JVM process, cannot always fully utilize the CPU power of a large machine and drive CPU load up to 100%. Vertical cloning provides a straightforward mechanism to create multiple JVM processes that together can fully utilize all the processing power available, as well as providing process-level failover.

• Horizontal clustering refers to the more traditional practice of defining clones of an application server on multiple physical machines, thereby allowing a single WebSphere application to span several machines while presenting a single system image. Horizontal cloning can provide both increased throughput and failover.

Figure 1.7: Vertical and Horizontal Scalability

1.2.3 Administrative Server High Availability

In V4.x, administrative servers can participate in workload management. The WLM mechanism for the administrative servers behaves in a similar manner to that for EJB requests. This is provided by virtue of the WebSphere runtime's EJB and Java architecture.
The workload-managed administrative servers provide identical processes across the cluster that can be used to satisfy administration requests, eliminating any SPOF. This provides process availability, as discussed in chapter 5. However, one must keep in mind that the WebSphere infrastructure relies on data in order to function. Chapters 6 and 7 will address administration database availability.

A standalone Java client has a unique set of failover concerns. Whereas servlets, JSPs, and EJBs run in an application server and have some knowledge of the other administrative servers in the WebSphere domain, the standalone Java client relies on a bootstrap administrative server to provide this information. If the bootstrap administrative server fails, the Java client must re-bootstrap to a different administrative server. Additional information on this scenario is provided in Chapter 6.

1.2.4 Discussions and Problem Spots

By providing both vertical and horizontal scalability, the WebSphere Application Server runtime architecture eliminates a given application server process as a single point of failure. Horizontal scalability eliminates a single node as a single point of failure. Vertical scalability can be added to improve process availability, or in some cases may provide better utilization of resources on a single node. In fact, the only single point of failure in the WebSphere runtime is the database server where the WebSphere administrative repository resides. It is on the database server that a hardware-based high availability solution such as HACMP or MC/ServiceGuard should be configured. One should have already invested in making the application data highly available, and this investment should be leveraged for the administration and session databases as well. With WLM of administration servers, WebSphere bypasses many of the problems associated with failover that could result from the failure of an administration server process on a given node.
WebSphere does not currently support running its application servers with takeover on a hardware-based HA platform, because there is little to be gained from the additional investment in the infrastructure. One potential problem area arises when WebSphere is serving as the coordinator of a two-phase commit transaction. This is discussed in more detail in Chapter 5.

1.3 Failover Topologies with WebSphere

While the WebSphere Application Server provides robust failover and load distribution capability within the runtime, there are other components that need to be made highly available in order to provide a highly available web site. As noted previously when discussing failover with WebSphere Application Server, it is important to consider the entire production environment, not just the WebSphere Application Server itself. This includes the HTTP server, the database servers, and the WebSphere Administration Server. Now that we have discussed the fault tolerance mechanisms within the WebSphere runtime proper, let's return to a discussion of failover topologies for the WebSphere environment.

1.3.1 HTTP Requests

The first component to consider is the HTTP server. Some mechanism for providing load distribution/failover of incoming HTTP requests is required. Without such a mechanism, the HTTP server itself becomes a SPOF, and scalability is limited by the size of the hardware that can be used to host the HTTP server. For this, an IP sprayer such as WebSphere Edge Server (WES) Network Dispatcher needs to be employed. Like WebSphere Application Server, the WES architecture provides for cluster support, eliminating the need for a third-party hardware clustering product to provide cluster capability. Other HTTP load distribution products may or may not provide inherent clustering support. Those products that do not provide clustering support will require the use of a hardware-based clustering product to eliminate this component as a SPOF.
The subjects of HTTP server failover and Network Dispatcher (eND) failover are discussed further in chapter 2 and Appendix A. The diagram below depicts where eND fits into a highly available topology.

Figure 1.8: HTTP Request flows

1.3.2 Database Server

The next layer of the topology that needs to be made highly available is the database server. Without highly available application data, your web applications cannot provide dynamic and personalized data to your customers; order processing and transactions are not possible. In short, a web site could only serve static content for the most part. It is at this layer in the topology that either the hardware or software clustering technology discussed above needs to be employed. Database failover, and the specifics for handling it both within the WebSphere runtime for the administration database and within your applications for application recovery, will be discussed in more detail in chapter 6.

1.3.3 LDAP Server

A component that is sometimes overlooked when considering HA is the LDAP server. Much like the database server, the LDAP server represents a SPOF unless some manner of replication or clustering is provided. Some options for making the LDAP server highly available will be discussed in chapter 8.

1.3.4 Firewall

Firewalls are another major component of a WebSphere environment. Much like the network itself, failure of a firewall can result in catastrophic consequences. HA options for firewalls will be discussed in chapter 8.

1.4 Suggested Topologies

The diagram below depicts the various layers in a WebSphere topology and the mechanisms employed to provide high availability.
Figure 1.9: High Availability Mechanisms by Layer

This depicts a minimal topology for HA, with two physical servers deployed in each layer of the topology, for a total of 12 servers from the firewall to the database and LDAP servers. While the number of servers could be reduced further by compressing the various layers (e.g., Network Dispatcher could be co-located on each of the HTTP server machines), administration issues as well as security considerations (firewalls) probably preclude this in practice. An even more robust implementation is depicted below. In this case two WebSphere domains are employed to create a "gold standard" in high availability.

Figure 1.10: "Gold Standard" WebSphere Domain

While there are no "hard" limits on the number of nodes that can be clustered in a WebSphere domain, one may want to consider creating multiple WebSphere domains for a variety of reasons:

• Two (or more) domains can be employed to provide not only hardware failure isolation, but software failure isolation as well. This can come into play in a variety of situations:
  • Planned Maintenance
    - When deploying a new version of WebSphere.
      (Note that running nodes with WebSphere V3.x and V4.x in the same domain is not supported.)
    - When applying an e-fix or patch.
    - When rolling out a new application or a revision of an existing application.
• In cases where an unforeseen problem occurs with the new software, multiple domains prevent a catastrophic outage to an entire site. A rollback to the previous software version can also be accomplished more quickly. Of course, multiple domains imply the software has to be deployed more than once, which would not be the case with a single domain.
• Multiple smaller domains may provide better performance than a single large domain, since there will be less interprocess communication in a smaller domain.

Of course, multiple domains will require more effort for day-to-day operations, since administration must be performed on each domain. This can be mitigated through the use of scripts employing wscp and XMLConfig. Multiple domains also mean multiple administration repositories (databases), which means multiple backups here as well.

Chapter 2 - HTTP Server Failover

In a production WebSphere topology, static HTML content, servlets, and JSPs are invoked using the HTTP protocol through an external HTTP server, such as IBM HTTP Server or Microsoft Internet Information Server. This makes continuous and reliable operation of the HTTP server necessary in the design of a highly available WebSphere solution. The configuration in figure 2.1 has only one web server and thus does not offer any failover capability for the web server component.

Figure 2.1: Single web server configuration - no failover capability

In the event of this web server's failure, all requests from the HTTP client (web browser) would fail to be routed to the application server(s) running inside the WebSphere domain.
High availability can be achieved by using a cluster of HTTP servers and an IP sprayer to route requests to the multiple servers. An IP sprayer transparently redirects incoming requests from HTTP clients to a set of HTTP servers. Although the clients behave as if they are communicating directly with a given HTTP server, the IP sprayer is actually intercepting all requests and distributing them among all the available HTTP servers in the cluster. Most IP sprayers provide several different options for routing requests among the HTTP servers, from a basic round-robin algorithm to complex utilization algorithms.

Figure 2.2 shows a basic configuration that implements this solution. Each machine in the topology is configured with at least one physical IP address, and a loopback adapter configured with a shared virtual IP address, sometimes called a cluster address. HTTP clients make HTTP requests on this virtual IP address. These requests are routed to the primary sprayer, which in turn sprays them to the cluster of web servers. The cluster consists of identical web servers running on different physical machines. In the event of a failure of one of the HTTP servers, the other HTTP server(s) will successfully accept all future HTTP requests from the IP sprayer.

It is also necessary to provide some sort of failover mechanism for the IP sprayer, to prevent this machine from becoming a single point of failure. One option is to provide a backup IP sprayer, which maintains a heartbeat with the primary sprayer. If the primary sprayer fails, the backup sprayer will take over the virtual IP address and process requests from HTTP clients.
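The heartbeat-based failure detection described above can be illustrated with a small sketch. This is a hedged illustration only, not Network Dispatcher's actual implementation; the class name, the 6-second detection window, and the millisecond clock values are assumptions made for the example.

```java
// Illustrative heartbeat monitor (not the actual Network Dispatcher code).
// The backup sprayer records the arrival time of each heartbeat from the
// primary; when no beat has arrived within the timeout window, the backup
// concludes the primary has failed and takes over the virtual IP address.
public class HeartbeatMonitor {
    private final long timeoutMillis; // assumed detection window
    private long lastBeatMillis;

    public HeartbeatMonitor(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastBeatMillis = startMillis;
    }

    // Called each time a heartbeat message arrives from the primary.
    public void beatReceived(long nowMillis) {
        lastBeatMillis = nowMillis;
    }

    // True once the timeout window has elapsed with no heartbeat.
    public boolean primaryFailed(long nowMillis) {
        return nowMillis - lastBeatMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(6000, 0);
        monitor.beatReceived(2000);
        monitor.beatReceived(4000);
        System.out.println(monitor.primaryFailed(5000));  // false: beat 1s ago
        System.out.println(monitor.primaryFailed(11000)); // true: no beat for 7s
    }
}
```

In a real deployment the takeover step would also involve moving the virtual IP (cluster) address to the backup machine, as described above.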
Figure 2.2: Highly Available web server configuration

Some products also provide mutual high availability for IP sprayers. In this configuration, both IP sprayers route requests to the HTTP server machines. If one of the sprayers fails, the other will take over and route all requests. There are several IP sprayers available, including IBM's Network Dispatcher, a part of the WebSphere Edge Server product. For more information on working with Network Dispatcher, see Appendix A and "WebSphere Edge Server: Working with Web Traffic Express and Network Dispatcher" (SG246172), available from http://www.redbooks.ibm.com.

DNS round-robin may also be used to allow HTTP requests to be served by multiple HTTP servers. However, this solution can introduce failover problems because DNS resolution results may be cached by intermediate DNS servers. In the case of a failure, the cache may not be flushed quickly enough to allow resolution to a different, functional server. Furthermore, some DNS round-robin implementations do not provide failure detection for the nodes being resolved. Without this capability, a subset of DNS clients may be routed to the failed node.

Chapter 3 - Web Container Failover

Once an HTTP request has reached the HTTP server, a decision must be made. Some requests for static content may be handled by the HTTP server itself; requests for dynamic content, and some static content, will be passed to a web container running in a WebSphere Application Server. Whether the request should be handled locally or passed to WebSphere is decided by the WebSphere HTTP server plug-in, which runs in-process with the HTTP server.
For these WebSphere requests, high availability for the web container becomes an important piece of the failover solution.

Figure 3.1: Single Web Container configuration, no failover capability

The configuration in Figure 3.1 has only one web container and provides no failover support. In this scenario, if the web container fails, all HTTP requests that needed to be handled by WebSphere would fail. High availability of the web container is provided by a combination of two mechanisms: the server group support built into the WebSphere administration model, and the routing support built into the WebSphere plug-in for the HTTP server.

3.1 WebSphere Server Groups

WebSphere Application Server 4.0 provides support for creating an application server template, called a Server Group. Server groups can contain a web container, an EJB container (discussed in more detail in Chapter 4), or both. From a server group, a WebSphere administrator can create any number of application server instances, or clones, of this server group. These clones can all reside on a single node, or can be distributed across multiple nodes in the WebSphere domain. Clones can be administered as a single unit by manipulating the server group object. WebSphere clones can share application workload and provide failover support. If one of the clones fails, work can continue to be handled by the other clones in the server group, if they are still available. See section 7.2.4 of the WebSphere InfoCenter and Chapter 17 of the "WebSphere 4.0 Advanced Edition Handbook" (SG246176) for more information on working with server groups.

3.2 The HTTP Server Plug-in

As mentioned in the previous chapter, a WebSphere environment may include several HTTP server instances. Each HTTP server is configured to run the WebSphere HTTP plug-in in its process.
There is a plug-in of similar functionality designed for each HTTP server supported by WebSphere. Each request that comes into the web server is passed through this plug-in, which uses its configuration information to determine whether the request should be routed to WebSphere, and if so, which web container the request should be routed to. These web containers may be running on the same machine as the HTTP server, a different machine, or a combination of the two.

Figure 3.2: High Availability Web Container Configuration

The plug-in uses an XML configuration file (<WAS_HOME>/config/plugin-cfg.xml) to determine information about the WebSphere domain it is serving. This configuration file is initially generated by the WebSphere Administrative Server. If the web server is on a machine remote from the WebSphere Administrative Server, as in the above diagram, the plugin-cfg.xml file will need to be moved to the web server machine. See the WebSphere 4.0 InfoCenter for more details on WebSphere plug-in topologies.

When the HTTP server is started, the plug-in reads the information from the plugin-cfg.xml configuration file into memory. The plug-in then periodically checks to see if the file has been modified, and reloads the information if necessary. How often the plug-in checks this file is based on the RefreshInterval property defined in the file. If this attribute is not present, the default value is 60 seconds.

<Config RefreshInterval="60">

In a production environment, the RefreshInterval should be set to a relatively high number, as the overhead of the plug-in frequently checking for a new configuration file can adversely affect performance. The plug-in also uses this configuration file to determine how to route HTTP requests to WebSphere web containers. To examine more closely how the plug-in makes this determination, let's look at an example.
Assume the plug-in receives a request for the URL http://www.mycompany.com/myapplication/myservlet. The plug-in parses the request into two pieces, a virtual host (www.mycompany.com:80) and a URI (/myapplication/myservlet). The plug-in then checks the plugin-cfg.xml file, searching for a match for these two items:

<UriGroup Name="myapplication web module">
   <Uri Name="/myapplication"/>
</UriGroup>

<VirtualHostGroup Name="default_host">
   <VirtualHost Name="*:80"/>
   <VirtualHost Name="*:9080"/>
</VirtualHostGroup>

Next, the plug-in searches for a route entry that contains both the <UriGroup> and the <VirtualHostGroup> entry. This route is used to determine the <ServerGroup> that can handle the request.

<Route ServerGroup="MyApplication Server Group"
       UriGroup="myapplication web module"
       VirtualHostGroup="default_host"/>

A <ServerGroup> represents either a standalone server or a group of WebSphere clones available to service the request. Once the <ServerGroup> has been determined, the plug-in must choose a <Server> within the group to route the request to.

<ServerGroup Name="MyApplication Server Group">
   <Server CloneID="t31ojsis" Name="Server1">
      <Transport Hostname="myhost1.domain.com" Port="9089" Protocol="http"/>
   </Server>
   <Server CloneID="t31pl23p" Name="Server2">
      <Transport Hostname="myhost2.domain.com" Port="9084" Protocol="http"/>
   </Server>
</ServerGroup>

The plug-in first checks whether the client request has an HTTP session associated with it. If so, the CloneID is parsed from the end of the session ID and compared with the CloneIDs of the <Server>s in the <ServerGroup> until a match is found. The request is then routed to that <Server>. If no session ID is associated with the request, the plug-in sends the request to the next <Server> in its routing algorithm. The routing algorithm is either Round Robin or Random, based on the LoadBalance attribute of the <ServerGroup>. If this attribute is not present, the default value is Round Robin.
<ServerGroup Name="MyApplication Server Group" LoadBalance="Random">

Once the <Server> has been determined, the <Transport> must be chosen for communications. Transports define the characteristics of the connections between the web server and the application server, across which requests for applications are routed. If the <Server> only has one <Transport> configured, this decision is easy. It is possible, however, for the <Server> to have two transports configured, one for HTTP communication and one for HTTPS communication.

<Server CloneID="t31ojsis" Name="Server1">
   <Transport Hostname="myhost1.domain.com" Port="9089" Protocol="http"/>
   <Transport Hostname="myhost1.domain.com" Port="9079" Protocol="https">
      <Property name="keyring" value="C:\WebSphere\AppServer\etc\plugin-key.kdb"/>
      <Property name="stashfile" value="C:\WebSphere\AppServer\etc\plugin-key.sth"/>
   </Transport>
</Server>

In this case, the <Transport> communication between the plug-in and the web container is matched to the communication protocol used between the browser and the web server. The URL http://www.mycompany.com/myapplication/myservlet would be sent from the plug-in to the web container using the HTTP transport, while https://www.mycompany.com/myapplication/myservlet would be sent using the HTTPS transport.

3.3 Web Container Failover

When a web container fails, it is the responsibility of the HTTP server plug-in to detect this failure and mark the web container unavailable. Web container failures are detected based on the TCP response values, or lack of response, to a plug-in request. There are four types of failover scenarios for web containers:

• Expected outage of the application server - The application server containing the web application is stopped from one of the administrative interfaces (administrative console, XMLConfig, WSCP). There is currently an issue with application server shutdown under heavy load.
During the shutdown process, a small number of client requests may fail with a 404 - File Not Found error. This problem is being investigated and a fix will be made available.

• Unexpected outage of the application server - The application server crashes for an unknown reason. This can be simulated by killing the process from the operating system.

• Expected outage of the machine - WebSphere is stopped and the machine is shut down.

• Unexpected outage of the machine - The machine is removed from the network due to shutdown, network failure, hardware failure, etc.

In the first two cases, the physical machine where the web container is supposed to be running will still be available, although the web container port will not be. When the plug-in attempts to connect to the web container port to process a request for a web resource, the machine will refuse the connection, causing the plug-in to mark the <Server> unavailable.

In the last two cases, however, the physical machine is no longer available to provide any kind of response. In these cases, the plug-in must wait for the local operating system to time out the request before marking the <Server> unavailable. While the plug-in is waiting for this connection to time out, requests routed to the failed <Server> appear to hang. The default value for the TCP timeout varies based on the operating system. While these values can be modified at the operating system level, adjustments should be made with great care: modifications may have unintended consequences for both WebSphere and other network-dependent applications running on the machine. See Appendix B for details on viewing and setting the TCP timeout value for each operating system.

If a request to a <Server> in a <ServerGroup> fails, and there are other <Server>s in the group, the plug-in will transparently reroute the failed request to the next <Server> in the routing algorithm.
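The reroute-and-mark-unavailable behavior can be sketched as follows. This is an illustrative model only, not the plug-in's actual implementation (which runs as native code inside the HTTP server); the class and method names, and the idea of a server becoming routable again after a retry interval, are simplifications made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of Round Robin routing with failover marking (not the
// real plug-in implementation). A server marked unavailable is skipped by
// the routing algorithm until a retry interval has elapsed, after which it
// is offered a request again.
public class RoundRobinRouter {
    private final List<String> servers;
    private final long retryIntervalMillis;
    private final long[] downSince; // -1 means the server is available
    private int next = 0;

    public RoundRobinRouter(List<String> servers, long retryIntervalMillis) {
        this.servers = servers;
        this.retryIntervalMillis = retryIntervalMillis;
        this.downSince = new long[servers.size()];
        Arrays.fill(downSince, -1L);
    }

    // Called when a request to this server fails or times out.
    public void markUnavailable(String server, long nowMillis) {
        downSince[servers.indexOf(server)] = nowMillis;
    }

    // Returns the next routable server, skipping servers still inside their
    // retry interval; returns null if every server is unavailable.
    public String route(long nowMillis) {
        for (int i = 0; i < servers.size(); i++) {
            int idx = (next + i) % servers.size();
            long down = downSince[idx];
            if (down < 0 || nowMillis - down >= retryIntervalMillis) {
                next = (idx + 1) % servers.size();
                return servers.get(idx);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RoundRobinRouter router =
            new RoundRobinRouter(List.of("Server1", "Server2"), 60000);
        System.out.println(router.route(0));      // Server1
        System.out.println(router.route(0));      // Server2
        router.markUnavailable("Server1", 1000);
        System.out.println(router.route(2000));   // Server2 (Server1 skipped)
        System.out.println(router.route(70000));  // Server1 (retry interval up)
    }
}
```

As the example shows, once the retry interval elapses the failed server is offered a request again; if that request also fails, marking it unavailable again restarts the interval.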
The unresponsive <Server> is marked unavailable, and all new requests will be routed to the other <Server>s in the <ServerGroup>.

The amount of time the <Server> remains unavailable after a failure is configured by the RetryInterval attribute on the <ServerGroup> element. If this attribute is not present, the default value is 60 seconds.

<ServerGroup Name="MyApplication Server Group" RetryInterval="1800">

When this RetryInterval expires, the plug-in will add this <Server> back into the routing algorithm and attempt to send a request to it. If the request fails or times out, the <Server> is again marked unavailable for the length of a RetryInterval.

The proper setting for RetryInterval depends on the environment, particularly the value of the operating system TCP timeout and how many <Server>s are available in the <ServerGroup>. Setting the RetryInterval to a small value allows a <Server> which becomes available to quickly begin serving requests. However, too small a value can cause serious performance degradation, or even cause the plug-in to appear to stop serving requests, particularly in a machine outage situation.

To explain how this can happen, let's look at an example configuration with two machines, which we will call A and B. Each machine runs two cloned <Server>s. The HTTP server and plug-in are running on an AIX box with a TCP timeout of 75 seconds, the RetryInterval is set to 60 seconds, and the routing algorithm is Round Robin. If machine A fails, either expectedly or unexpectedly, the following process occurs when a request comes in to the plug-in:

1. The plug-in accepts the request from the HTTP server and determines the <ServerGroup>.
2. The plug-in determines that the request should be routed to clone 1 on machine A.
3. The plug-in attempts to connect to clone 1 on machine A.
   Because the physical machine is down, the plug-in waits 75 seconds for the operating system TCP timeout interval before determining that clone 1 is bad.
4. The plug-in attempts to route the same request to the next clone in its routing algorithm, clone 2 on machine A. Because machine A is still down, the plug-in must again wait 75 seconds for the operating system TCP timeout interval before determining that clone 2 is bad.
5. The plug-in attempts to route the same request to the next clone in its routing algorithm, clone 1 on machine B. This clone successfully returns a response to the client, over 150 seconds after the request was first submitted.
6. While the plug-in was waiting for the response from clone 2 on machine A, the 60 second RetryInterval for clone 1 on machine A expired, and the clone was added back into the routing algorithm. A new request will soon be routed to this clone, which is still unavailable, and the lengthy waiting process begins again.

To avoid this problem, we recommend a more conservative RetryInterval, related to the number of clones in your configuration. A good starting point is 10 seconds + (# of clones * TCP_Timeout). This ensures that the plug-in does not get stuck constantly trying to route requests to the failed clones. In the scenario above, this setting would cause the two clones on machine B to exclusively service requests for 235 seconds before the clones on machine A were retried, resulting in another 150 second wait.

3.4 Session affinity and session persistence

HTTP session objects can be used within a web application to maintain information about a client across multiple HTTP requests. For example, on an online shopping website, the web application needs to maintain information about what each client has placed in his or her shopping cart.
The session information is stored on the server, and a unique identifier for the session is sent back to the client as a cookie or through the URL rewriting mechanism. The behavior of applications that utilize HTTP sessions in a failover situation will depend on the HTTP session configuration within WebSphere. Each application server within WebSphere 4.0 has a Session Manager Service which is accessible from the Services tab in the WebSphere Administrative Console. WebSphere 4.0 has three basic session configuration options.

No sessions - If the application does not need to maintain information about a client across multiple requests, HTTP sessions are not required. Disabling this support within the WebSphere plug-in, while not necessary, will improve performance. To disable session checking, remove the CloneID values from the <Server> entries:

<ServerGroup Name="MyApplication Server Group" RetryInterval="240">
   <Server Name="Server1">
      <Transport Hostname="myhost1.domain.com" Port="9089" Protocol="http"/>
   </Server>
   <Server Name="Server2">
      <Transport Hostname="myhost2.domain.com" Port="9084" Protocol="http"/>
   </Server>
</ServerGroup>

Session Affinity (without Session Persistence) - HTTP session affinity is enabled by default. When this support is enabled, HTTP sessions are held in the memory of the application server containing the web application. The plug-in will route multiple requests for the same HTTP session to the same <Server> by examining the CloneID information which is stored with the session key. If this <Server> is unavailable, the request will be routed to another <Server> in the <ServerGroup>, but the information stored in the session is lost.

Session Persistence and Session Affinity - When Session Affinity and Session Persistence are both enabled, HTTP sessions are held in the memory of the application server containing the web application, and are also periodically persisted to a database.
The plug-in will route multiple requests for the same HTTP session to the same <Server>. This <Server> can then retrieve the information from its in-memory session cache. If this <Server> is unavailable, the request is routed to another <Server>, which reads the session information from a database. How much session information is lost depends on the frequency of the persistence, which is configurable on the application server. Persisting information to the database more frequently provides more transparent failover, but may adversely affect application performance. For more details on configuring session persistence, see Chapter 15 of the "WebSphere V4.0 Advanced Edition Handbook" (SG24-6176-00) available from http://www.redbooks.ibm.com.

By persisting HTTP sessions to a database, we introduce another point of failure: the database itself. High availability for databases will be discussed in Chapters 6 and 7.

Chapter 4 - EJB Container Failover

Many J2EE applications rely on Enterprise JavaBeans (EJBs) to provide business logic. EJB clients can be servlets, JSPs, stand-alone Java applications, or even other EJBs. When the EJB client is running in a web container in the same Java Virtual Machine (JVM) as the EJB, as is often the case with servlets and JSPs, the workload management of the web container resources will provide failover for both the HTTP resources and the EJB container. However, if the EJB client is running in a different JVM, failover for the EJB container becomes a critical piece of the overall failover of the environment.

Figure 4.1: Single EJB Container Configuration, no failover capability

The configuration in Figure 4.1 has only one EJB container and provides no failover support. In this scenario, if the EJB container fails, all EJB requests would also fail.
High availability of the EJB container is achieved using a combination of the WebSphere server group support and the Workload Management (WLM) plug-in to the WebSphere Object Request Broker (ORB). As discussed in the previous chapter, WebSphere server groups allow multiple instances of a WebSphere Application Server to be created from a template. These multiple application servers, or clones, have a single administration point and the ability to share workload.

4.1 The WLM plug-in

The mechanisms for routing workload managed EJB requests to multiple clones are handled on the client side of the application. In WebSphere 4.0, this functionality is supplied by a workload management plug-in to the client ORB. In contrast to previous WebSphere releases, there are no longer any changes required to the EJB jar file to provide workload management, greatly simplifying the process of deploying EJBs in a workload managed environment.

Figure 4.2: High Availability EJB Container Configuration

To better understand the EJB workload management process, let's consider a typical workload managed EJB call with the help of Figure 4.2.

1. An EJB client (servlet, Java client, another EJB) performs a JNDI lookup for the EJB:

   java.lang.Object o = initialContext.lookup("myEJB");

   The Interoperable Object Reference (IOR) of the EJB is returned to the client by the JNDI service running in the WebSphere Administrative Server. This IOR contains a WLM flag which indicates whether the EJB is participating in workload management. If the WLM flag is set, the IOR will also contain a server group name and routing policy information.
2. A method is invoked on the EJB. If this is the first time the WLM plug-in to the ORB has seen this server group, it makes a request to the Administrative Server to obtain the list of available clones. This list is stamped with an identifier called an epoch number.
3. The client ORB now has information about all the clones in the ServerGroup, and it also has information about the selection criteria for the EJBs. As the EJB client continues to make calls to the EJB object, the WLM plug-in to the ORB routes these requests to the appropriate clones using the specified routing policy.

For more detail on the routing policies available in WebSphere 4.0, see Chapter 17 of the "IBM WebSphere V4.0 Advanced Edition Handbook."

Figure 4.3: Workload Managed EJB Invocations

This chapter discusses what happens during failover of the EJB container. However, if the admin server fails during one of the above steps, other problems can occur. See Chapter 5 for information on handling admin server failover.

4.2 EJB Container failover

EJB container failover is handled in combination with the WebSphere Administrative Server and the WLM plug-in to the ORB. The administrative server is the parent process of all of the clones running on a machine. If an existing clone becomes unavailable or a new clone becomes available, the administrative server updates its server group information to reflect this, and assigns a new epoch number. This information is pushed to all of the available clones in the configuration periodically. How often this push occurs is configured by the parameter com.ibm.ejs.wlm.RefreshInterval set on the com.ibm.ejs.sm.util.process.Nanny.adminServerJvmArgs entry in the <WAS_HOME>\bin\admin.config file. The default value is 300 seconds. The administrative server does not push this information to the EJB clients, however. On the next client request, one of the following will happen:
1. The next clone in the WLM routing algorithm is available.
   i. The clone services the request.
   ii. If the clone detects that the epoch number of the client does not match its current epoch number, the new server group information is passed to the client along with the response.
2. The next clone in the WLM routing algorithm is no longer available.
   i. If the com.ibm.ejs.wlm.MaxCommFailures threshold has been reached for the clone, it is marked unusable. By default, the MaxCommFailures threshold is 0, meaning that after the first failure, the clone is marked unusable. This property can be set by specifying -Dcom.ibm.ejs.wlm.MaxCommFailures=<number> on the command line arguments for the client process.
   ii. The WLM plug-in will not route requests to this clone until new server group information is received, or until the expiration of the com.ibm.ejs.wlm.UnusableInterval. This property is set in seconds, with a default value of 900 seconds. It can be set by specifying -Dcom.ibm.ejs.wlm.UnusableInterval=<seconds> on the command line arguments for the client process.
   iii. The failed request is transparently routed to the next clone in the routing algorithm. If the WLM plug-in cycles through all of its known servers and none of the clones are available, it makes a request to the administrative server for the new server group information.

As with the HTTP server plug-in, the WLM plug-in needs to be able to detect both process and machine failures. When an EJB container process fails, the RMI/IIOP port it was listening on will no longer be available, and when an EJB client attempts to communicate with the EJB container, the connection will be refused. Machine failures are a bit more difficult to deal with. If the failure occurs before a connection has been established, the operating system TCP timeout value (as discussed in Appendix B) applies. If the failure occurs after the connection is established, the client ORB must time out the communications.
The amount of time the ORB will wait before timing out is configured by the property com.ibm.CORBA.requestTimeout. The default value for this property is 180 seconds. This default should only be modified if your application is experiencing timeout problems, and then great care must be taken to tune it properly. If the value is set too high, the failover process will be very slow; set it too low, and valid requests will be timed out before the EJB container has a chance to respond. The factors that affect a suitable timeout value are the amount of time needed to process an EJB request and the network latency between the client and the EJB server. The time to process a request depends on the application and the load on the EJB server. The network latency depends on the location of the client. Clients running within the same local area network as the application servers may use a smaller timeout value to provide faster failover. Clients situated farther away on the intranet or internet need to use a longer timeout to tolerate the inherent latency of long distance networks.

If the EJB client is running inside of a WebSphere Application Server, this property can be modified by editing the request timeout field on the Object Request Broker property sheet. If the client is a Java client, the property can be specified as a runtime option on the Java command line, for example:

   java -Dcom.ibm.CORBA.requestTimeout=<seconds> MyClient

4.3 WLM and EJB Types

When a client accesses a workload managed EJB, it may be routed to one of a number of instances of the EJB on a number of servers. Not all EJB references can be utilized this way. The table below shows the types of EJBs that can be workload managed.
EJB Type                                            Able to be Workload Managed
Entity Bean, commit time caching option A
   Home                                             No
   CMP                                              No
   BMP                                              No
Entity Bean, commit time caching options B and C
   Home                                             Yes
   CMP                                              Yes
   BMP                                              Yes
Session Bean
   Home                                             Yes
   Stateless                                        Yes
   Stateful                                         No

The only types of EJB that cannot be workload managed are instances of a given stateful session bean and entity beans with commit time caching option A. A stateful session bean is used to capture state information that must be shared across multiple client requests that are part of a logical sequence of operations. To ensure consistency in the logic flow of the application, a client must always return to the same instance of the stateful session bean, rather than having multiple requests workload managed across a group of available stateful session beans. Because of this restriction, stateful session bean instances are not considered failover-tolerant. If the process running the stateful session bean fails, the state information maintained in that bean is unrecoverable. It is the responsibility of the application to reconstruct a stateful EJB with the required state. If this is not possible, an alternative implementation, such as storing state in an entity EJB, should be considered.

4.4 WLM and EJB Caching

In the case of entity beans, the EJB caching option in use plays a role in workload management. WebSphere 4.0 supports three caching options:

1. Option A caching: The EJB container caches an instance of the EJB in a ready state between transactions. The data in the EJB is also cached, so the data does not need to be reloaded from the data store at the start of the next transaction. The entity bean must have exclusive access to the underlying database, which means that workload management will not behave properly in option A caching mode, since multiple clones of the same entity could potentially modify the entity data in inconsistent ways.
2. Option B caching: The EJB container caches an instance of the EJB in a ready state between transactions, but invalidates the state information for the EJB. At the start of the next transaction, the data will be reloaded from the database.
3. Option C caching (default): No ready instances are cached by the container; they are returned to the pool of available instances after the transaction has completed.

Entity beans can be workload managed if they are loaded from the database at the start of each transaction. By choosing either option B caching or option C caching (the default), the entity beans can be made to participate in WLM. These two caching options ensure that the entity bean is always reloaded from the database at the start of each transaction.

In WebSphere 4.0, the caching options are configured using the Application Assembly Tool (AAT). The caching options (A, B, or C) are determined by the combination of options selected in the drop-down menus Activate at and Load at. The table below shows the values that represent the three caching options.

Option      Activate at      Load at
A           Once             Activation
B           Once             Transaction
C           Transaction      Transaction
Invalid     Transaction      Activation

For more information on EJB caching options within WebSphere, see the WebSphere InfoCenter or the "WebSphere V4.0 Advanced Edition Handbook."

Chapter 5 - Administrative Server Failover

5.1 Introduction

The WebSphere Administrative Server (admin server) runs in a JVM on each node in a WebSphere Administrative Domain. The admin server is responsible for providing run-time support for the JNDI service, security, transaction logging, the Location Service Daemon, and EJB workload management. The admin server also provides system management support initiated from administrative interfaces, including the Administrative Console, XMLConfig, and WSCP. Communications with the admin server take place using RMI/IIOP.
In this chapter, we describe how the failure of the admin server affects the different types of clients, and how to enable high availability for these clients.

Figure 5.1: Administrative Server Failover

The first line of defense against admin server failure is the nanny process. If the admin server crashes, the majority of the time it is restarted automatically by the nanny process. Clients of the admin server will fail only for the duration required for failure detection and restart. In the very rare case where the admin server is unable to restart, or in the case where the node hosting the admin server crashes, WebSphere 4.0 provides the capability to workload manage admin servers, allowing the clients to route requests around a failed admin server to a different admin server on a different node. This requires a configuration consisting of at least two nodes, with one admin server on each node. The only case where admin server WLM is not supported is transaction logging for two phase commit transaction coordination; these logs are required to be on the same node as the transaction coordinator.

One does not explicitly create a server group for the WebSphere Administrative Server as is done with application servers. The process of installing WebSphere on multiple machines, all pointing to a single configuration database, creates a WebSphere cluster. The WLM capability is provided by setting the following property in admin.config:

   com.ibm.ejs.sm.adminServer.wlm=true

This property is set to true by default. Changing the value to false disables the workload management support. For correct behavior, this property must be set to the same value on all admin servers in the WebSphere domain. Admin server workload management uses a special workload management policy different from those used by regular EJBs.
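Pulling together the admin.config settings named in this document, a node's configuration file might contain entries like the following. This is a sketch: the property names come from the text, but the RefreshInterval value shown is illustrative, and omitting it entirely yields the documented defaults.

```properties
# <WAS_HOME>/bin/admin.config (fragment; values illustrative)

# Workload-manage the admin servers in the domain (true by default;
# must be set to the same value on every admin server in the domain).
com.ibm.ejs.sm.adminServer.wlm=true

# JVM arguments the nanny process passes to the admin server. Here the
# server group refresh push described in Chapter 4 is shortened from
# its 300-second default to 120 seconds.
com.ibm.ejs.sm.util.process.Nanny.adminServerJvmArgs=-Dcom.ibm.ejs.wlm.RefreshInterval=120
```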
Once a client establishes a connection to an admin server, it continues to communicate with that same admin server, unless that server becomes unavailable for some reason. In that case, the client will be routed to one of the other admin servers in the domain, and will continue making calls to that server unless it becomes unavailable. If the client to the workload managed admin servers is running on the same machine as one of the servers, by default the client will always communicate with this local server first.

5.2 Runtime Support in the Administrative Server

One of the primary functions of the admin server is to provide runtime support for applications running inside the WebSphere domain. The admin server provides support for the JNDI service, the Location Service Daemon, security, transaction logging, and EJB workload management. We will first describe how the different types of clients make use of the services in the admin server. We will then describe how to enable high availability for these services.

5.2.1 Types of clients that use the admin server

5.2.1.1 Application server

The application server interacts with the admin server in many different ways. During server initialization, it obtains configuration information from the admin server about the modules to host on the application server. It contacts the bootstrap server within the admin server to obtain the InitialContext, which is used by the application server to bind homes into the JNDI namespace. If the admin server fails during server initialization, the server is unable to start.

Applications within the application server, such as EJBs or servlets, may also contact the bootstrap server and the name server when making use of the JNDI service. If the application object has not yet obtained an InitialContext object when the failure occurs, JNDI lookups will be unsuccessful. If the InitialContext has been obtained, the application will be workload managed to another admin server, if one is available.
When communicating with a workload managed EJB home or EJB instance, the WLM runtime may call the admin server to obtain the most recent information about the servers in the server group. This occurs when first making use of an object reference or, as a last resort, when attempts to reach all the servers in the server group fail. Failure of the admin server can prevent the client WLM runtime from reaching a functioning server due to a lack of server group information.

If the application server calls the home of an EJB that is not workload managed, the call is first directed to the Location Service Daemon (LSD) on the admin server where the EJB is hosted. Failure of the LSD prevents the server from completing the call.

With security enabled, the application server contacts the security server within the admin server when validating or revalidating security credentials. Failure of the admin server prevents the application server from validating a new client, or from revalidating an old client whose credentials have expired.

The admin server is used by the transaction manager to perform logging for two phase commit operations. If the application server makes use of two phase commit and the administrative server crashes after all participants are in a prepare state and before the commit takes place, it is possible to have rows in the database locked until the transaction can be recovered using the transaction log.

5.2.1.2 EJB clients

EJB calls may be initiated from a servlet, an EJB, a J2EE application client, or an unmanaged thin client. These clients call the admin server's bootstrap and name server when making use of the JNDI service to look up EJB homes. If calling workload managed EJB homes or instances, the WLM runtime in the clients may also contact the admin server for server group information, as discussed previously. EJB clients also make use of the LSD in the admin server when calling an EJB home that is not workload managed.
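The client-side WLM behavior described above is tuned through the JVM properties introduced in Chapter 4. For a thin EJB client, they might be combined on a single command line as follows; the property names come from this document, while the values and the MyClient class name are illustrative only:

```
java -Dcom.ibm.ejs.wlm.MaxCommFailures=2 \
     -Dcom.ibm.ejs.wlm.UnusableInterval=300 \
     -Dcom.ibm.CORBA.requestTimeout=60 \
     MyClient
```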
5.2.2 Enabling High Availability

5.2.2.1 JNDI Client Failover Behavior

Access to many WebSphere objects, including EJBs and resources such as database connections, is provided through JNDI lookups. Access to resources does not pose a high availability concern, because resources are created locally in the same process as the clients that use them. Access to an EJB home reference involves a remote call to the name server residing in the admin server. The code for this lookup would look similar to the following:

1: p.put("javax.naming.Context.PROVIDER_URL",
2:       "iiop://myserver1.domain.com:900");
3: InitialContext ic = new InitialContext(p);
4: Object tmpObj = ic.lookup("ejb/myEJB");
5: myEJBHome beanhome = (myEJBHome) javax.rmi.PortableRemoteObject.narrow(
       (org.omg.CORBA.Object) tmpObj, myEJBHome.class);
6: myEJB mybean = beanhome.create();

Code Sample 5.1

The creation of the InitialContext object (line 3) is performed by the JNDI runtime by first bootstrapping to the bootstrap server at port 900 of the admin server at myserver1.domain.com. This returns to the client a workload managed InitialContext that can be used by the client to perform the lookup of EJB homes at line 4.

The first high availability consideration for using the JNDI service is the initial bootstrap, which may fail if the admin server or the node hosting the admin server fails while line 3 is being executed. In general, the client has no good way of knowing what has transpired to cause the failure. The best course of action is for the client to catch an exception and re-bootstrap to a different bootstrap server. This means that the application needs to be aware of more than one admin server in the domain, and be coded to bootstrap to a different bootstrap server. The code sample below provides an example of a client coded to bootstrap to multiple administrative servers. It replaces lines 1 and 2 of sample 5.1.
This code assumes that an array (hostURL) has been created which contains a list of bootstrap servers and bootstrap ports in colon separated format:

   localhost:900
   myserver:1800

The client attempts to bootstrap to the first server in the array. If this fails due to a NamingException, the next server in the list is tried. If all servers fail, the client prints a message and exits.

int i = 0;
Properties sysProps = System.getProperties();
Properties p = (Properties) sysProps.clone(); // won't change system properties
while (i < hostURL.length) {
    try {
        p.put(Context.PROVIDER_URL, "iiop://" + hostURL[i]);
        theRootContext = new InitialContext(p);
        break;
    } catch (NamingException ex) {
        if (i == (hostURL.length - 1)) {
            System.out.println("No servers are available, program terminating...");
            System.exit(1);
        } else {
            System.out.println("Server " + hostURL[i] + " is not available, will use next one");
            i++;
        }
    }
}

Code Sample 5.2

Bootstrapping can also occur implicitly. For example, application servers need to bootstrap to the admin server to get an InitialContext during server startup. If this bootstrap fails, the application server will be unable to start. We rely on clustering of the application servers to ensure that other servers are available to process client requests.

If line 1 of code sample 5.1 is omitted, the default bootstrap server is used, typically localhost at port 900. For an application client, failover is accomplished by restarting the client to bootstrap to a different admin server:

   LaunchClient ... -CCBootstrapHost=myserver2.domain.com -CCBootstrapPort=900

In this example, the launchClient command is used to start the client to bootstrap to myserver2.domain.com, at port 900.
The same is achieved by changing the application client properties file:

   BootstrapHost=myserver2.domain.com
   BootstrapPort=port

The default bootstrap location for a thin Java client may be changed by setting the ORB system properties when starting the client from the command line:

   java -Dcom.ibm.CORBA.BootstrapHost=myserver2.domain.com -Dcom.ibm.CORBA.BootstrapPort=900 . . .

Once the initial bootstrap is completed, the second high availability consideration is the actual lookup of the EJB home at line 4. If the admin server fails, the code performing the lookup will get an exception. This problem is addressed by turning on admin server WLM and writing additional code to retry. Since the InitialContext itself is workload managed, retrying line 4 allows the WLM runtime to fail over to a different admin server, allowing the lookup operation to complete successfully.

The call to the create method on line 6 is the first call which invokes the EJB server group. If this call fails, it indicates an EJB container failure. EJB container failures are addressed in more detail in Chapter 4.

There are four types of administrative server failover scenarios:

- Expected outage of the administrative server - The admin server is stopped from one of the admin interfaces (admin console, XMLConfig, WSCP).
- Unexpected outage of the administrative server - The administrative server crashes for an unknown reason. This can be simulated by killing the process from the operating system.
- Expected outage of the machine - WebSphere is stopped and the machine is shut down.
- Unexpected outage of the machine - The machine is removed from the network due to shutdown, network failure, hardware failure, etc.

In the first two cases, the physical machine where the admin server was running is still available. When a client attempts to make a connection to the failed admin server, the remote machine will refuse the connection attempt right away.
Bootstrapping to a different server or retrying the lookup allows the client to route the request to one of the other servers. In the second two cases, the physical machine where the admin server was running is no longer available. In this case, the client relies on the ORB to time out the connection. The same tuning parameters discussed for EJB workload management in Chapter 4 can be applied to admin server workload management.

5.2.2.2 Location Service Daemon (LSD) Behavior

When a WebSphere application server is started, it allocates a listener port dynamically to process ORB requests. This listener port is registered with the LSD. When an EJB home that is not workload managed is looked up through JNDI, the object reference contains the host and port of the LSD. The first request to an EJB home is routed to the LSD, which reroutes the request to the real EJB home. This extra level of indirection enables dynamic port assignment for EJB servers, while allowing the object reference, which is bound to the LSD, to be reusable across server restarts.

Since an EJB home that is not workload managed resides on only one application server, the node where the server resides already constitutes a single point of failure; the use of the LSD is just one more. To properly address high availability, at least the EJB home itself needs to be workload managed. If the bean is workload managed, retrying an operation after a failure allows the client to fail over. If only the home is workload managed, retrying on the home allows the client to fail over to a different home to create or find beans to work with. With workload managed homes or beans, the WLM runtime locates the current host/port of the application servers to route a request, bypassing the LSD altogether. This avoids the LSD as a single point of failure.

5.2.2.3 Security Server

An application server uses the security server in the admin server to validate or revalidate security credentials.
Enabling admin server WLM allows the security runtime in the application servers to fail over to a different security server. The client may have to retry credential validation during failover.

There is currently an issue with security failover in WebSphere 4.x: if the admin server fails, application servers on that node are unable to fail over to a backup admin server for credential validation. This is being fixed under APAR PQ55817. With this fix applied, failover should behave properly.

5.2.2.4 Transaction logging

If the application makes use of more than one transactional resource, transaction logging is performed via the admin server during two-phase commit. Note that the transaction manager will not initiate two-phase commit if an application touches just one resource, even for XA-capable resources.

If the admin server fails prior to the start of the two-phase commit protocol, the transaction manager will wait for the admin server to be restarted so that it is able to perform the logging. If the admin server fails after the start of the two-phase commit protocol, the transaction manager will also wait for the admin server to be restarted, so as to complete the logging required for two-phase commit. If the application server itself fails, it will be restarted to automatically complete the two-phase commit protocol.

If the node hosting the admin server fails after the start of the two-phase commit protocol, the transaction cannot be completed, since the log resides only on the failed node and there is no backup process capable of accessing the log. In this case, database rows used by the transaction remain locked until the transaction can be completed. When this occurs, the node needs to be fixed, and the admin and application servers restarted. As an alternative, another node with an identical configuration and the same host name may be configured, with the transaction log copied over to the new node. The admin server can then be restarted there to complete the transaction.
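The recovery-by-copy alternative just described can be sketched as shell commands. All paths here are hypothetical stand-ins for the real WebSphere installation roots on the failed and replacement nodes; this sketch stages its own fake log under /tmp so that it is self-contained.

```shell
# Illustration only: in practice, FAILED_NODE_ROOT would be the failed node's
# WebSphere installation root, reached via its mounted disk or a backup.
FAILED_NODE_ROOT=/tmp/failed-node/WebSphere/AppServer
NEW_NODE_ROOT=/tmp/new-node/WebSphere/AppServer

# Set-up for this sketch: pretend a transaction log survives on the failed node.
mkdir -p "$FAILED_NODE_ROOT/tranlog" "$NEW_NODE_ROOT"
echo "log records" > "$FAILED_NODE_ROOT/tranlog/tranlog1"

# Copy the transaction log to the identically configured replacement node,
# preserving permissions, before restarting the admin server there.
cp -rp "$FAILED_NODE_ROOT/tranlog" "$NEW_NODE_ROOT/"

ls "$NEW_NODE_ROOT/tranlog"
# prints: tranlog1
```

The replacement node must have the same host name and configuration as the failed node, as noted above, so that the restarted admin server finds the log where it expects it.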
The transaction logs are located in the "tranlog" directory of the WebSphere installation.

5.2.2.5 Starting an application server

If the admin server fails while starting an application server, the server is unable to start. Utilizing horizontal or vertical clustering allows the clients to fail over to a different server within the server group.

5.3 System Administration Support in the Administrative Server

The second major function of the admin server is to provide a single administration point for all of the objects within the WebSphere domain. The admin server allows for administration of information stored in the WebSphere repository.

5.3.1 Types of System Administration Clients

5.3.1.1 Administrative Console (admin console)

The admin console is the graphical user interface for systems administration. It uses the bootstrap server and the name server in the admin server to perform JNDI operations. User operations on the admin console are relayed to the EJBs on the admin server, and possibly indirectly to another admin server if the request affects another node. If security is enabled, the security runtime in the admin server uses the security server in the admin server to authenticate the user.

If the admin server fails during admin console initialization, the console will fail to bootstrap to the admin server, which prevents the console from starting up altogether. Failover is achieved by restarting the admin console so that it bootstraps to a different admin server. The command line options are:

    adminclient -host host [-port port]

where the default port number is 900. After the admin console successfully starts up, workload management of the admin server takes effect. If an operation on the admin console fails, retrying it allows the workload management runtime to reroute the request to a different admin server to complete the operation.
Note that requests involving the applications on the node where the admin server failed, such as querying the state of the servers on that node, will fail until the admin server is restarted. Requests affecting other nodes will succeed.

5.3.1.2 XMLConfig

XMLConfig allows the system administrator to perform application management functions, such as application installation. If the admin server fails, XMLConfig operations will fail as well. If the operations of XMLConfig affect the node where the admin server is running, the admin server needs to be restarted before those operations can succeed. However, if the operations do not affect the applications on the same node as the failed admin server, XMLConfig may be rerun to bootstrap to a different admin server with the following parameter:

    -adminNodeName host

There is currently a problem in which XMLConfig may attempt to contact the failed administrative server to process a cloned application, even if that application is also available on one of the other nodes. This is being investigated by development and will be fixed in a future release.

5.3.1.3 WebSphere Control Program

The WebSphere Control Program (WSCP) is used by the system administrator to perform administration operations, such as stopping and starting servers. If the operation affects the node where the admin server failed, the admin server needs to be restored before the operation can complete. If the operation does not affect the node of the failed admin server, wscp can be rerun to bootstrap to a different admin server by setting the following property, either from the command line or via the configuration property file:

    wscp.hostName=host

5.3.2 Known limitations of System Administration Failover

Due to the distributed nature of WebSphere, it is possible that an admin server may need to retrieve information about an application on a different node.
It may also have to delegate administration functions to the admin server hosted on the same node as the affected application servers. To do this, the admin server opens a communication link to the admin server on the remote node. If the admin server on the remote node is unavailable, the operations will not complete successfully.

5.4 Configuration parameters which affect admin server failover

Some settings can be configured on the WLM client to influence how failover occurs. These settings become especially important in the case where the network connection to an administrative server is lost. In general, the default values for these parameters can be accepted. However, if you see unusual behavior or have a very slow or very fast network, you may want to consider adjusting the following parameters on the client:

• TCP/IP timeout - affects the time required to get an exception while bootstrapping to a node that has been shut down or disconnected from the network. This setting is covered in Appendix B.
• com.ibm.CORBA.requestTimeout - This property specifies the time-out period for responding to workload management requests. This property is set in seconds.
• com.ibm.ejs.wlm.MaxCommFailures - This property specifies the number of attempts that a workload management client makes to contact the administrative server that manages workloads for the client. The workload management client run time does not identify an administrative server as unavailable until a certain number of attempts to access it have failed. This allows workload management to continue if the server suffers from transient errors that can briefly prevent it from communicating with a client. The default value is 0.
• com.ibm.ejs.wlm.UnusableInterval - This property specifies the time interval that the workload management client run time waits after it marks an administrative server as unavailable before it attempts to contact the server again. This property is set in seconds.
The default value is 900 seconds.

See the InfoCenter article http://www-4.ibm.com/software/webservers/appserv/doc/v40/ae/infocenter/was/070206.html for more information on these parameters.

Chapter 6 - WebSphere Database Failover

6.1 – Introduction

Figure 6.1: Single WebSphere Database Topology (HTTP clients, HTTP servers, and WebSphere application and administration servers, with the WAS, application, and session databases behind a firewall)

As noted in the introduction to this paper, there are two types of availability: data availability and process availability. High availability for a database server encompasses both of these. Data availability is the domain of the database manager, while process availability falls into the domain of one of the clustering technologies employed to make the database server process highly available. The specifics of configuring your database server for high availability are addressed in Chapter 7. However, it is important to note that even in a WebSphere environment which employs one of the HA mechanisms discussed in Chapter 7, there is still an interruption in service while the database is switched from a failed server to an available server. This chapter will focus on application code implications for HA, as well as administrative options in the WebSphere runtime that allow applications to tolerate a database outage.

WebSphere Application Server adds some key components that allow it to survive a database server failure, or more precisely the interruption and subsequent restoration of service by the database server. These components take two forms. The first is a set of IBM extensions to the JDBC 2.0 API. These extensions allow WebSphere, and applications running within WebSphere, to easily reconnect to a database server once it has recovered from a failure, or to recognize that the database is not responding to requests. The first JDBC extension is com.ibm.websphere.ce.cm.StaleConnectionException.
The StaleConnectionException maps the multiple SQL return codes that occur in the event of a database outage to a single exception. Not only does the WebSphere runtime (admin server) utilize this mechanism, but it is provided as part of the WebSphere programming model for application components (servlets, JSPs, and EJBs) running in WebSphere. This extension to JDBC allows applications running in WebSphere, and WebSphere itself, to reconnect to a database after service is restored.

com.ibm.ejs.cm.pool.ConnectionWaitTimeoutException is the second JDBC extension. This exception occurs if the connection timeout parameter for the datasource is exceeded. This parameter specifies the amount of time an application will wait to obtain a connection from the pool if the maximum number of connections has been reached and all of the connections are in use.

The second component enables applications running within WebSphere to continue to function while the WebSphere Administrative Repository database is unavailable. Because the WebSphere Administrative Repository stores the WebSphere JNDI name space, an outage of this repository can also cause an inability to perform JNDI lookups. WebSphere provides a runtime JNDI cache, available in application clients and application servers, which allows the JNDI service to continue while the administration database is unavailable.

6.2 – Application Databases

6.2.1 StaleConnectionException

From a WebSphere application code perspective, stale connections are those connections that for some reason can no longer be used. This can happen, for example, if the database server is shut down or if the network is experiencing problems. In all of these cases, the connections are no longer usable by the application and the connection pool needs to be flushed and rebuilt. This type of support was added in Version 3.5.2 and is improved in Version 4.0. More vendor-specific error codes have been added to the mapping that results in a StaleConnectionException.
In addition, when a StaleConnectionException is detected, the entire pool of connections is automatically destroyed.

Explicitly catching a StaleConnectionException is not required by most applications. Because applications are already required to catch java.sql.SQLException, and StaleConnectionException extends SQLException, StaleConnectionException is automatically caught in the general catch block. However, explicitly catching StaleConnectionException makes it possible for an application to perform additional recovery steps for bad connections.

Before continuing, it is important to understand that WAS will automatically reconnect to the database when future requests arrive, without any application intervention. The use of StaleConnectionException should be limited to applications that need an extra level of transparency in addition to the automatic recovery provided by WAS. This transparent recovery code can be very complex to develop in scenarios involving multiple resources or complex logic flows. The following discussion is aimed at teams that feel this additional complexity is warranted.

The most common time for a StaleConnectionException to be thrown is the first time that a connection is used, just after it is retrieved. Because connections are pooled, a database failure is not detected until the operation immediately following retrieval from the pool, which is the first time communication with the database is attempted; only when a failure is detected is the connection marked stale. StaleConnectionException occurs less often if each method that accesses the database gets a new connection from the pool, because when a database failure is detected, all connections currently handed out to an application are marked stale.
The more connections the application has, the more StaleConnectionExceptions occur. Generally, when a StaleConnectionException is caught, the transaction in which the connection was involved needs to be rolled back and a new transaction begun with a new connection. Details on how to do this can be broken down into three categories:

• A connection in auto-commit mode
• A connection not in auto-commit mode, with the transaction begun in the same method as the database access
• A connection not in auto-commit mode, with the transaction begun in a different method from the database access

6.2.1.1 Connections in auto-commit mode

By default, any connection obtained from a one-phase datasource (implementing javax.sql.ConnectionPoolDataSource) is in auto-commit mode when there is no scoping transaction. When in auto-commit mode, each database action is executed and committed in a single database transaction. Servlets often use connections in auto-commit mode, because transaction semantics are not necessary. Enterprise applications do not usually use connections in auto-commit mode. Auto-commit can be explicitly disabled by calling setAutoCommit(false) on a Connection object.

When a StaleConnectionException is caught from a connection in auto-commit mode, recovery is a simple matter of closing all of the associated JDBC resources and retrying the operation with a new connection.

Note: In some cases the cause of the database outage might be transient. In these cases, it might be worthwhile to add a pause to the retry logic to allow for database service restoration. The number of retries, as well as any pause, should be kept small so as not to keep a web site user waiting indefinitely.
An example of this follows:

    public void myConnPool() throws java.rmi.RemoteException {
        // retry indicates whether to retry or not
        // numOfRetries states how many retries have been attempted
        boolean retry = false;
        int numOfRetries = 0;
        java.sql.Connection conn = null;
        java.sql.Statement stmt = null;
        do {
            try {
                // Assumes that a datasource has already been obtained from JNDI
                conn = ds.getConnection();
                stmt = conn.createStatement();
                stmt.execute("INSERT INTO ORG VALUES (10, 'Pacific', '270', 'Western', 'Seattle')");
                retry = false;
            } catch (com.ibm.websphere.ce.cm.StaleConnectionException sce) {
                // if a StaleConnectionException is caught, retry the action
                if (numOfRetries < 2) {
                    retry = true;
                    numOfRetries++;
                    // add an optional pause
                    try {
                        Thread.sleep(10000);
                    } catch (InterruptedException ie) {
                        // ignore
                    }
                } else {
                    retry = false;
                }
            } catch (java.sql.SQLException sqle) {
                // deal with other database exceptions
            } finally {
                // always clean up JDBC resources
                try {
                    if (stmt != null) stmt.close();
                } catch (java.sql.SQLException sqle) {
                    // usually can be ignored
                }
                try {
                    if (conn != null) conn.close();
                } catch (java.sql.SQLException sqle) {
                    // usually can be ignored
                }
            }
        } while (retry);
    }

6.2.1.2 Connections not in auto-commit mode

If a connection does not have auto-commit enabled, multiple database statements can be executed in the same transaction. Because each transaction uses a significant number of resources, fewer transactions result in better performance. Therefore, if a connection is used for executing more than one statement, turn off auto-commit mode and use transactions to group a number of statements into one unit of work. Keep in mind that if a transaction has too many statements, the database can experience problems due to lack of memory.

6.2.1.2.1 Transactions started in the same method

If a transaction is begun in the same method as the database access, recovery is straightforward and similar to the case of using a connection in auto-commit mode.
When a StaleConnectionException is caught, the transaction is rolled back and the method retried. If a StaleConnectionException occurs somewhere during execution of the try block, the transaction is rolled back, the retry flag is set to true, and the transaction is retried. As is the case with connections in auto-commit mode, the number of retries should be limited, as well as any pause, because the exception might not be transient. This is illustrated in the following:

    do {
        try {
            // begin a transaction
            tran.begin();
            // Assumes that a datasource has already been obtained from JNDI
            conn = ds.getConnection();
            conn.setAutoCommit(false);
            stmt = conn.createStatement();
            stmt.execute("INSERT INTO ORG VALUES (10, 'Pacific', '270', 'Western', 'Seattle')");
            tran.commit();
            retry = false;
        } catch (com.ibm.websphere.ce.cm.StaleConnectionException sce) {
            // if a StaleConnectionException is caught, roll back and retry the action
            try {
                tran.rollback();
            } catch (java.lang.Exception e) {
                // deal with exception; in most cases, this can be ignored
            }
            // deal with other database exceptions and clean up as before
        }
    } while (retry);

6.2.1.2.2 Transactions started in a different method

When a transaction is begun in a different method from the database access, an exception needs to be thrown from the data access method to the transaction access method so that it can retry the operation. In an ideal situation, a method can throw an application-defined exception indicating that the failure can be retried. However, this is not always allowed, and often a method is defined to throw only a particular exception. This is the case with the ejbLoad and ejbStore methods on an enterprise bean.
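The application-defined-exception approach described above can be sketched as follows. This is a self-contained illustration, not WebSphere API: RetryableException, updateOrg(), and the simulated failure counter are hypothetical stand-ins for a data-access method that detects a stale connection, and the transaction calls are shown only as comments.

    // Sketch of the pattern above: the data-access method wraps a stale-connection
    // failure in an application-defined exception so that the transaction-owning
    // caller can roll back and retry. All names here are hypothetical, and the
    // failure is simulated with a counter so the example is runnable standalone.
    public class RetryableAccessExample {

        // Application-defined exception signalling that the operation may be retried.
        static class RetryableException extends Exception {
            RetryableException(String msg) { super(msg); }
        }

        private int failuresRemaining;

        RetryableAccessExample(int failures) { this.failuresRemaining = failures; }

        // Data-access method: translates a detected stale connection into
        // the application-defined RetryableException.
        void updateOrg() throws RetryableException {
            if (failuresRemaining > 0) {
                failuresRemaining--;
                throw new RetryableException("stale connection detected");
            }
            // ...real code would execute the SQL statement here...
        }

        // Transaction-owning method: begins the transaction, and on a retryable
        // failure rolls back and tries again, up to a small limit.
        public boolean updateWithRetry(int maxRetries) {
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    // tran.begin() would go here
                    updateOrg();
                    // tran.commit() would go here
                    return true;
                } catch (RetryableException re) {
                    // tran.rollback() would go here
                }
            }
            return false;
        }

        public static void main(String[] args) {
            // Fails twice, then succeeds: two retries are enough.
            System.out.println(new RetryableAccessExample(2).updateWithRetry(2));
        }
    }

As in the earlier examples, the retry count is kept small, since the failure might not be transient.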
A more comprehensive discussion of each of these scenarios, as well as code samples, is available in the whitepaper "WebSphere Connection Pooling", which is available from:
http://www-4.ibm.com/software/webservers/appserv/whitepapers/connection_pool.pdf

6.2.1.3 Connection Error Recovery

A servlet or Bean Managed Persistence (BMP) entity EJB coded to catch a StaleConnectionException, or a Container Managed Persistence (CMP) entity EJB, for which the container catches the exception, should recover from a StaleConnectionException unnoticed from an application perspective. Messages will be displayed in the Administrative Console as depicted below.

Figure 6.2: StaleConnectionException displayed in Administrative Console

6.2.2 ConnectionWaitTimeoutException

While the StaleConnectionException provides a mechanism for recovery once the database manager process is restored, a related issue is avoiding a "frozen" web site. A web site is "frozen", or appears to be, when servlet threads are waiting for connections from the connection pool, and as viewed from the web browser nothing is happening. This can occur for any number of reasons, but for database access it is important that applications be coded to catch com.ibm.ejs.cm.pool.ConnectionWaitTimeoutException. This exception occurs if the connection timeout parameter for the datasource is exceeded. This parameter specifies the amount of time an application will wait to obtain a connection from the pool if the maximum number of connections has been reached and all of the connections are in use. Though the web site has not actually suffered an outage, this appears to be the case from the perspective of the end user. Coding the application to return a message to the user, and tuning the connection timeout value for the datasource, can significantly improve the end user experience with a web site under heavy load. The default value for this parameter is 180 seconds (3 minutes), though any non-negative integer is a valid value.
Setting this value to 0 disables the connection timeout. This value can also be changed programmatically by calling setLoginTimeout() on the datasource. If setLoginTimeout() is called on the datasource, it sets the timeout for all applications that are using that datasource. For this reason, it is recommended that setLoginTimeout() not be used. Instead, the connection timeout property should be set on the datasource during configuration. The value for this parameter is specified on the Connection Pooling tab for each data source, as depicted below:

Figure 6.3: Connection timeout configuration

An application sample for handling this exception is depicted below:

    java.sql.Connection conn = null;
    javax.sql.DataSource ds = null;
    try {
        // Set up the properties for the JNDI naming service
        java.util.Properties parms = new java.util.Properties();
        parms.put(javax.naming.Context.INITIAL_CONTEXT_FACTORY,
                  "com.ibm.websphere.naming.WsnInitialContextFactory");
        // Create the initial naming context
        javax.naming.Context ctx = new javax.naming.InitialContext(parms);
        // Look up a DataSource object through the naming service
        ds = (javax.sql.DataSource) ctx.lookup("java:comp/env/jdbc/SampleDB");
        conn = ds.getConnection();
        // work on connection
    } catch (com.ibm.ejs.cm.pool.ConnectionWaitTimeoutException cw) {
        // notify the user that the system could not provide a
        // connection to the database
    } catch (javax.naming.NamingException ne) {
        // deal with JNDI lookup failure
    } catch (java.sql.SQLException sqle) {
        // deal with exception
    }

6.3 Session Database

The default for WebSphere Application Server is to store HTTP session objects in memory. Even with the improvements in the session affinity implementation in WebSphere V4.x, placing HTTP session objects in memory does not provide a mechanism for failover of requests associated with that HTTP session object in the event of the shutdown or failure of an application server process.
The Session Manager in WebSphere Application Server provides an option for using a database as the mechanism for storing HTTP session objects. This implementation, which updates HTTP session objects in memory and periodically writes session information to a database, provides a scalable mechanism for failover of the HTTP session object to an alternative application server. In this implementation, all application servers in a WebSphere cluster are candidates for failover of the request associated with the HTTP session object.

For applications that use the servlet HttpSession API, the database used to persist the HttpSession object represents a potential single point of failure. Fortunately, WebSphere V4.x includes a number of improvements in the Session Manager that allow tuning of the frequency of updates to the database used for persistence of the HTTP session object. These options provide better scalability and failover capabilities than were available in previous releases of WebSphere. In brief, the Session Manager may be tuned to update the database:

• At the end of the servlet service method (the default).
• Manually (this requires use of an IBM extension to the HttpSession API).
• At the end of a specified time interval.

The default behavior of updating the database at the end of the servlet service method implies that the database must be continuously available in order to service requests. Even highly available databases are not continuously available; hence, correct application design and tuning of the Session Manager, in conjunction with use of a highly available database, are the keys to minimizing disruptions in service in the event of a database outage.

6.3.1 Expected Behavior - Servlet Service Method

In the event of a failure of the session database, servlet requests are queued waiting for database service to be restored.
As with the rest of the WebSphere runtime, the Session Manager uses the WebSphere datasource implementation and the StaleConnectionException mechanism. Since the Session Manager is the database client, and the application is the client of the Session Manager, the application has no visibility of any outage of the database. The Session Manager handles all database failure and subsequent recovery transparently to the application. Once the database service is restored, the database connection pool is destroyed and recreated by the runtime, and all queued requests are serviced. The only visibility of the problem is the messages in the administration console, depicted below, and the web browser "hanging".

Figure 6.4: Get Connection Failure displayed in Administrative Console

6.3.2 Expected Behavior - Manual Update

Another option for session persistence updates in WebSphere involves the use of an IBM extension to the HttpSession API, known as manual update. Manual update allows the application to decide when a session should be stored persistently. With manual update, the Session Manager only sends changes to the persistent data store if the application explicitly requests a save of the session information.

Manual update requires that an application developer use the IBMSession class for managing sessions. When the application invokes the sync() method, the Session Manager writes the modified session data and last access time to the persistent session database. The session data that is written out to the database is controlled by the Write Contents option selected. If the servlet or JSP terminates without invoking the sync() method, the Session Manager saves the contents of the session object into the session cache (if caching is enabled), but does not update the modified session data in the session database. The Session Manager will update just the last access time in the database asynchronously at a later time.
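A minimal servlet sketch of the manual-update flow described above follows. This is an illustration only: it requires the WebSphere runtime classes (the IBMSession class with its sync() method) to compile, and the servlet name and attribute key are hypothetical.

    // Sketch only; requires the WebSphere runtime classes on the classpath.
    import java.io.IOException;
    import javax.servlet.http.*;
    import com.ibm.websphere.servlet.session.IBMSession;

    public class CartServlet extends HttpServlet {
        protected void doPost(HttpServletRequest req, HttpServletResponse res)
                throws IOException {
            // With manual update enabled, casting to IBMSession gives access to sync()
            IBMSession session = (IBMSession) req.getSession(true);
            session.setAttribute("cart.items", req.getParameter("item"));
            // Nothing is written to the session database until the application asks:
            session.sync();
        }
    }

If sync() were omitted here, the Session Manager would cache the change in memory (when caching is enabled) but not persist it, as described above.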
The expected behavior and recovery with manual update are the same as with the default update that occurs at the end of the servlet service method.

6.3.3 Expected Behavior - Time Based Update

With time based update, all updates of the HTTP session object occur in memory, as is the case with the default and manual update, but the database updates are deferred until the end of the specified time interval. By deferring the updates to the database, the probability that any disruption in database service will disrupt servlet requests is minimized. For example, the default interval for time based updates is 300 seconds, so a database failure that occurred immediately after the last update interval would not impact servlet requests for 300 seconds. If database service were restored during this interval, the outage would not even be noticed, aside from a warning message in the administration console as depicted below.

Figure 6.5: StaleConnectionException during Time Based Update

In the case where a database update is due to take place during an interruption in database service, the Session Manager functions in the same manner as with updates at the end of the servlet service method and manual updates, deferring servlet requests until the database service is restored. The amount of time appropriate for a given environment can only be determined through testing of the time required for database failover and recovery. While the default time interval of 300 seconds (5 minutes) should prove more than adequate for database failover and recovery in most cases, the time interval can be fine-tuned for your environment by using the manual persistence configuration dialog in the Session Manager.
Figure 6.6: Persistence Tuning Configuration Property Sheet

While specification of a longer time interval should lower the probability of an update coinciding with a database failure, interval tuning not only needs to consider the time required for database failover, but also the amount of traffic on the web site in that interval, as well as the resulting session objects cached by the Session Manager and the attendant JVM memory impact.

Despite the robust Session Manager implementation provided by WebSphere, there remain windows of vulnerability for HTTP session state maintenance and persistence. For example, in the case where time based update is utilized, if an application server process were to fail between updates, all updates in memory that had occurred since the last database update would be lost. Thus, in cases where the state information being stored is valuable ("state liability is high"), a recommended alternative to the storage of state information in the HTTP session is for the application to explicitly store the information in a database, via a direct JDBC call or an entity EJB, rather than relying on the persistence mechanism used by the Session Manager. This alternative provides for transactional updates of state information.

6.4 Administrative Database

The administrative database is a potential single point of failure for many operations involving both the system administration and application runtime environments. It stores the configuration information about application servers and the applications to run on those servers. The configuration information includes lists of servers in server groups for use by the workload management runtime. The administrative database also stores operational information about the state of application servers. The output of serious events is stored in the administrative database. Finally, it serves as the repository for the name server used to support the JNDI service.
An outage of the administrative database can affect all of these services.

Many customers make a distinction between the loss of administrative function and the loss of application function during an administrative database failure. They are willing to tolerate the loss of administrative function, but unwilling to tolerate the loss of application function. In this section we describe how to make use of JNDI caching to ensure application functions are uninterrupted despite failure of the administrative database.

6.4.1 System Administration

While the administrative database is down, administrative functions requiring reads and writes to this database will fail. These operations include stopping and starting application servers, querying the state of servers, and changing the application topology through any of the WebSphere administrative interfaces. New administrative instances, such as Administrative Consoles, XMLConfig instances, and WSCP instances, cannot be started while the administrative database is unavailable.

If an application server crashes unexpectedly while the administrative database is available, the administrative server will automatically restart the server. If the administrative database is unavailable, the application server cannot be restarted, because the administrative server cannot retrieve the appropriate information from the administrative database.

When the administrative database is functioning normally, WebSphere events are logged both to an activity.log file on the local node and to a table in the database. This database logging enables any administrative server in the WebSphere domain to view all serious events, rather than just those on the local node. When the administrative database is unavailable, events are logged only to the activity logs on the local nodes.
The following error message will be displayed in the administrative console:

Figure 6.7: Attempt to log serious event during database outage

6.4.2 Application Runtime Environment - JNDI caching

We often discuss application servers in terms of "started" and "stopped." However, the started phase can actually be divided into two separate sub-states: initialization and steady state. During initialization, the application is in the process of acquiring the resources required for smooth execution, including deployment descriptors for servlets and EJBs, workload management server group information, data sources, and EJB homes. At steady state, all of these resources are cached in memory, and the task of the application is to process user requests.

As mentioned in the previous section, when the administrative database is unavailable, new application servers cannot be started. Also, application servers in the initialization phase will not load successfully. However, if the JNDI cache is properly configured in both the application server and any Java clients in the environment, applications in steady state will continue to function properly.

The JNDI service in WebSphere runs inside the WebSphere Administrative Server and is backed by the administrative database. If this database fails, the JNDI service is unable to return lookup responses. In WebSphere V3.5.2, a client-side JNDI cache was added as a performance enhancement. As JNDI lookups were performed, the results were stored in a cache within the JNDI client process. The next time the process attempted to perform the same lookup, the response was retrieved from the cache rather than from the JNDI service. Primarily designed as a performance improvement, this cache did provide some measure of toleration for failure of the administrative database. In WebSphere 4.0.2, configuration options for this client-side JNDI cache are provided to enable it as a mechanism for tolerating administrative database failure.
The JNDI cache can be configured to perform in one of three ways. All of these options allow the application to populate the JNDI cache during initialization, allowing the application to function should the administrative database fail after the application is started.

The first configuration pre-loads all EJB references used by an application when the application is started. If at any point the JNDI server becomes unavailable, these references are already cached, and the application will continue to function. Note that explicit steps to cache resources looked up through resource references via the "java:comp" namespace are unnecessary. These resources reside in the same process as the application client or application server; the parameters used to create them are loaded into memory as part of application server or client initialization, and the actual lookup is performed locally, in process.

The second configuration pre-loads the entire JNDI name tree at application startup. This may be useful for applications which do not use the EJB reference mechanism provided by J2EE, but may adversely affect the performance and memory utilization of the application if the JNDI name tree is particularly large.

The third option allows an application to pre-load a specific portion of the JNDI name tree by specifying an XML file which contains additional instructions.

6.4.2.1 XML/DTD syntax

It is possible to create an XML file stating which portions of the JNDI name space you would like a client process to pre-load. The DTD for this XML file looks like:

<!ELEMENT JNDICachePreload (Provider)* >
<!ELEMENT Provider (Entry | Subtree)* >
<!ELEMENT Entry EMPTY>
<!ELEMENT Subtree EMPTY>
<!ATTLIST Provider
    INITIAL_CONTEXT_FACTORY CDATA #IMPLIED
    PROVIDER_URL CDATA #IMPLIED>
<!ATTLIST Entry Name CDATA #REQUIRED >
<!ATTLIST Subtree Name CDATA #IMPLIED >

Here is a sample XML file illustrating what is to be pre-loaded.
<?xml version="1.0"?>
<JNDICachePreload>
  <!-- This uses the default provider and preloads entry a/b/c and the subtree rooted at d/e/f -->
  <Provider>
    <Entry Name="a/b/c"/>
    <Subtree Name="d/e/f"/>
  </Provider>
  <!-- This uses a factory and a provider URL to preload entry X and the subtree at Y -->
  <Provider INITIAL_CONTEXT_FACTORY="com.ibm.websphere.naming.WsnInitialContextFactory"
            PROVIDER_URL="iiop://myserver1.domain.com:900">
    <Entry Name="X"/>
    <Subtree Name="Y"/>
  </Provider>
  <!-- This preloads the entire tree at a different provider -->
  <Provider PROVIDER_URL="iiop://myserver2.domain.com:900">
    <Subtree/>
  </Provider>
</JNDICachePreload>

6.4.2.2 Configuring the JNDI cache for an application server

The application deployer can install one of two WebSphere CustomServices to configure the JNDI cache for an application server. These will help ensure that an application server reaches steady state during server initialization. The first CustomService, com.ibm.websphere.naming.CustomServicePreloadCacheJavaURL, is used to pre-load all the EJB references used by an application. The second CustomService, com.ibm.websphere.naming.CustomServicePreloadCacheNameTree, is used to pre-load the entire name tree. Both of these CustomServices can take an additional property as input. The input property com.ibm.websphere.naming.JNDIPreload.configFile=<xmlfile> specifies the location of an XML file that contains additional JNDI entries to be pre-loaded. See section 6.6.15 in the WebSphere 4.0 InfoCenter for more information on configuring custom services.
6.4.2.3 Configuring the JNDI Cache for an Application Client

To pre-load the JNDI cache for an application client, use the following command line parameters:

# pre-load EJB references
-CCPreloadCacheJavaURL=true
# pre-load the entire name tree
-CCPreloadCacheNameTree=true
# config file with additional entries to load
-CCJNDIPreloadconfigFile=xmlfile

Or place them in the application client property file:

PreloadCacheJavaURL=true
PreloadCacheNameTree=true
JNDIPreloadconfigFile=xmlfile

6.4.2.4 Preloading the JNDI Cache for a Thin Client

Thin clients are non-managed clients not running within a client container. For these clients, it is the application writer's responsibility to write code that initializes and caches the JNDI lookups at the beginning of the program's initialization.

6.4.2.5 Operating Restrictions

To enable client-side caching, applications cannot bind to JNDI at runtime. Such a binding would cause the caches of already-running clients to become out of sync with the JNDI server. This should not be a problem for the majority of applications, since the J2EE specification does not require the server runtime to offer this capability. Applications have the option of pre-binding entries into the namespace via a separate utility; during application initialization, these entries may be cached into memory. Finally, steps must be taken to preload the JNDI cache, as previously described in this section.

Short-running application clients that are constantly being restarted will not fare well during a failover, since they have a greater chance of being in the initialization state when the administrative database goes down. The use of a highly available administrative database can alleviate this problem by shortening the amount of downtime.

As mentioned in section 6.4.1, system administration operations, including application server starts, will be unavailable during an administrative database failure. JNDI caching does not provide any relief for this restriction.
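The thin-client responsibility described in section 6.4.2.4 can be sketched as follows. This is a minimal illustration, not a WebSphere API: the cache class, its method names, and the JNDI names used are assumptions made for the example.

```java
// Minimal sketch of thin-client JNDI caching: perform lookups once during
// initialization, while the name server is reachable, and serve later
// references from memory. Class and method names are illustrative only.
import java.util.HashMap;
import java.util.Map;
import javax.naming.Context;
import javax.naming.NamingException;

public class ThinClientJndiCache {
    private final Map<String, Object> cache = new HashMap<String, Object>();

    // Store an already-resolved object under its JNDI name.
    public void put(String name, Object resolved) {
        cache.put(name, resolved);
    }

    // Call once at program start, while the JNDI service is available.
    public void preload(Context ctx, String... names) throws NamingException {
        for (String name : names) {
            put(name, ctx.lookup(name));
        }
    }

    // Later lookups are served from memory, and so survive an outage of the
    // administrative database backing the JNDI service.
    public Object lookup(String name) {
        return cache.get(name);
    }
}
```

A thin client would call preload() with an InitialContext during startup, for example cache.preload(new InitialContext(), "a/b/MyHome"), and use cache.lookup() thereafter instead of going back to the name server.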
6.4.2.6 JNDI Cache Size Considerations

The space used by the JNDI client cache is about 1.5 KB per entry. Calculating the actual memory requirement is complicated by the depth of the compound name, because each component of the name also takes up a cache entry. For example, an EJB home bound as "a/b/MyHome" requires three cache entries: "a", "b", and "MyHome". An EJB home bound as "a/b/MyOtherHome" requires only one additional entry, for "MyOtherHome", since both "a" and "b" are already cached. The actual space required for the JNDI client cache is therefore determined by the structure of the name tree. If each process accesses 1000 EJBs, the lower bound on the space requirement is 1000 * 1.5 KB = 1.5 MB per process. On the other hand, assuming that the depth of the compound name is D and the names never overlap, the upper bound would be D * 1000 * 1.5 KB. If the depth is 10, the upper bound is 15 MB. The actual space requirement will be somewhere between the lower and upper bounds.

6.4.3 HA Administrative Database

An HA administrative database can be hosted on the same nodes hosting the HA application database(s); in fact, this is the recommended deployment. When the administrative database is co-located with the HA application database(s), database administration and recovery are facilitated, since processes and procedures are already in place for the server. Further, it is generally easier to add the database to the existing HA monitoring configuration, providing the automatic database error detection and recovery required for a production environment.

However, even with a highly available database, there will still be a period of time during the failover process in which the administrative database is unreachable. During this failover period, services such as systems management, serious event logging, and the JNDI service will be unavailable.
This means that even with a highly available database configuration, the JNDI caching options outlined in the previous section are necessary to ensure uninterrupted application function. Highly available database options are discussed in more detail in Chapter 7.

6.4.4 Administrative Database Recovery

After a failover of the administrative database, clients of its services have to reconnect to resume normal operations. The WebSphere runtime services, such as the WLM runtime and the logging service, are already coded to tolerate failure of the administrative database. Clients of the WebSphere runtime, such as callers of the JNDI service, have to be coded to retry.

The first access to the administrative server that causes it to access the administrative database, such as running an administrative client, XMLConfig, or wscp, may result in an exception due to old stale connections still cached in the server. The administrative server will handle recovery from this StaleConnectionException, and the next administrative operation will succeed. An error message will be displayed in the administrative console, much the same as that for an application exception; however, the message detail will differ, as shown below.

Figure 6.8: StaleConnectionException in the Administrative Console

If an application server or administrative server crashed unexpectedly during a database outage, the normal nanny procedures would be unable to restart the server. It will be necessary to manually restart these application servers when the database comes back online. In a larger installation with many nodes, it is inconvenient to manually check the health of the administrative and application servers. Therefore, a wscp script is provided in WebSphere 4.0.2 to check these servers and, optionally, restart application servers. It is not currently possible to restart administrative servers from wscp.
To run the script, first change to the following directory:

%WAS_HOME%\AppServer\bin   (on NT)
$WAS_HOME/AppServer/bin    (on AIX)

Then invoke wscp to check for failed admin and application servers:

wscp -f restartServer.tcl -c restartServer

After restarting any failed admin server, use this command to restart failed application servers:

wscp -f restartServer.tcl -c restartServer 1

Chapter 7 - Integrated Hardware Clustering for Database High Availability

As discussed in previous chapters, WebSphere relies heavily on database functionality for its operations. In addition to customer application data, databases are used for the WebSphere Administrative Repository, as storage for persistent HTTP sessions, and as storage for some LDAP servers used with WebSphere security. Protecting data integrity and enhancing availability for the entire WebSphere processing environment can be achieved by using hardware clustering solutions to provide high availability for these databases.

A hardware cluster is a group of independent computers that work together to run a common set of applications, in this case database applications, and present the image of a single system to the client and application. The computers are physically connected by cables and programmatically connected by cluster software. These connections allow the computers to use failover and load balancing, which is not possible with a standalone computer.

To the WebSphere Application Server, connecting to a clustered remote database is no different from connecting to a non-clustered remote database. As the active node in the cluster moves from node to node, WebSphere will detect that the connections it is holding in its connection pool are no longer valid, flush the pool, and throw the appropriate exception (usually a StaleConnectionException) to the running application to indicate that the application should establish new connections.
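The reconnect requirement above, detecting that a pooled connection went stale during failover and establishing a new one, can be sketched as a generic bounded-retry wrapper. This is an illustrative pattern, not WebSphere code: a real application would catch WebSphere's StaleConnectionException specifically and obtain a fresh connection from the DataSource on each attempt.

```java
import java.util.concurrent.Callable;

// Generic bounded-retry helper illustrating the recovery pattern: when an
// operation fails because its pooled connection went stale during database
// failover, discard the connection and retry the whole operation.
public class StaleConnectionRetry {
    public static <T> T withRetry(Callable<T> work, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The work itself should call DataSource.getConnection() so
                // each attempt uses a fresh connection after a pool flush.
                return work.call();
            } catch (Exception e) {
                last = e; // treat as a stale connection; loop and retry
            }
        }
        throw last; // give up after maxAttempts failures
    }
}
```

Bounding the attempts matters: if the database failover itself takes longer than the retry window, the operation still fails, and the caller must surface the error rather than loop forever.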
WebSphere Application Server 4.0.x supports many different database technologies; this chapter discusses the high availability database solutions tested in the WebSphere labs. WebSphere components should perform properly with these solutions, assuming proper availability is maintained by the HA solution and the appropriate coding practices, discussed in Chapter 6, are followed.

7.1 HACMP and HACMP/ES on IBM AIX

7.1.1 Introduction and Considerations

IBM High Availability Cluster Multiprocessing (HACMP) and High Availability Cluster Multiprocessing Enhanced Scalability (HACMP/ES) provide the AIX platform with a high-availability solution for mission-critical applications. HACMP allows the creation of clusters of RS/6000 servers to provide high availability of databases and applications in spite of a hardware, software, or network failure. HACMP/ES includes all of the same functionality as HACMP, but enables larger clusters of machines and more granular event control.

The unit of failover in HACMP or HACMP/ES is a resource group. A resource group contains all the processes and resources needed to deliver a highly available service, and ideally should contain only those processes and resources. HACMP for AIX software supports a maximum of 20 resource groups per cluster.

As with MC/ServiceGuard, an HACMP cluster requires two network connections for a highly available heartbeat mechanism. A public network connects multiple nodes and allows clients to access these nodes, while a private network allows point-to-point communications between two or more nodes. Each node is typically configured with at least two network interfaces for each network to which it connects: a service interface that handles cluster traffic, and one or more standby interfaces to allow recovery from a network adapter failure. Recovery by a standby adapter is typically faster than recovery by another node, and thus provides less downtime.
The AIX IP address takeover facility is used to facilitate node failover. If one node should fail, a takeover node acquires the failed node's service address on its standby adapter, making the failure transparent to clients using that service IP address. To enable IP address takeover, a boot adapter IP address must be assigned to the service adapter on each cluster node. Nodes use this boot address after a system reboot and before the HACMP for AIX software is started. When the HACMP for AIX software is started on a node, the node's service adapter is reconfigured to use the service address instead of the boot address. To provide data redundancy, a RAID disk array can be shared among the nodes in the cluster.

When a failover occurs, HACMP invokes a stop script to stop the resource group and a start script to start the resource group. Samples of these scripts are included in Appendix D, but they can be customized to match the runtime environment.

7.1.2 Expected Reactions to Failures

If a failure occurs, the quickest way to recover is to fail over from the service adapter to a standby adapter. In many cases, for example a node power failure, this is not an option. In that case, the resource group will fail over to another node in the cluster. The node on which the resource group is recovered depends on the resource group configuration. Three types of resource groups are supported:

• Cascading resource groups - A resource may be taken over by one or more nodes in a resource chain according to the takeover priority assigned to each node. The available node within the cluster with the highest priority will own the resource. If this node fails, the node with the next highest priority will own the resource. You can also choose to allow the highest priority node to recover its resources when it is reintegrated into the cluster, or set a flag requiring the resource to remain on the active lower-priority node.
• Rotating resource groups - A resource is associated with a group of nodes and rotates among these nodes. When one node fails, the next available node acquires the resource group.

• Concurrent access resource groups - A resource can be managed by the HACMP for AIX Cluster Lock Manager and may be simultaneously shared among multiple applications residing on different nodes.

Cascading resources provide the best performance, since cascading resources ensure that an application is owned by a particular node whenever that node is active in the cluster. This ownership allows the node to cache data the application uses frequently. Rotating resources may minimize the downtime for failover.

When this failover occurs, the following steps are run on the new active node:

• run the stop script
• release the disk volume groups and other shared resources held by the primary node
• take over the service IP address
• mount the shared disk array and take over any other resources necessary
• run the start script

Figure 7.1: HACMP Cluster Configuration for the WebSphere System

Figure 7.2: HACMP Cluster for WebSphere after the primary node fails

7.1.3 WebSphere HACMP configuration

The lab environment used to test this configuration consisted of two IBM RS/6000 machines running AIX 4.3.3.
These machines were installed with the HACMP software; we tested HACMP 4.3.1, HACMP 4.4, and HACMP/ES 4.4 ptf 3. Both DB2 7.2.1 and Oracle 8.1.7 were installed. The machines shared an IBM 7133 Serial Storage Architecture Disk Subsystem. The public network connection was supplied by a 100 Mbps Ethernet LAN, while the private connection was supplied by a dedicated RS232 connection.

WebSphere components were also configured in a highly available topology, as previously described in this document. WebSphere resources requiring database access were configured to communicate with the service IP address. Applications requiring database connections were programmed in accordance with the recommendations in Chapter 6.

Figure 7.3: Test environment for integrated WebSphere and HACMP HA system

7.1.4 Tuning heartbeat and cluster parameters

HACMP 4.4 provides control over several tuning parameters that affect the cluster's performance. Setting these tuning parameters correctly to ensure throughput, and adjusting the HACMP failure detection rate, can help avoid "failures" caused by heavy network traffic.

• Adjusting the high and low watermarks for I/O pacing. The default values are 33 (high) and 24 (low).
• Adjusting the syncd frequency rate. The default value is 10.
• Adjusting the HACMP failure detection rate. Two parameters control the HACMP failure detection rate:
  - HACMP cycles to failure: the number of heartbeats that must be missed before detecting a failure.
  - HACMP heartbeat rate: the interval between heartbeats.
  For example, with a heartbeat rate of 1 second and 10 cycles, the failure detection rate would be 10 seconds. Faster heartbeat rates may lead to false failure detection, particularly on busy networks.
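The failure detection arithmetic above is simply the product of the two parameters. The helper below is a hypothetical illustration of that relationship, not part of HACMP or any of its tooling.

```java
// Illustration of the HACMP failure detection rate calculation described
// above: detection time = heartbeat interval * cycles to failure.
public class FailureDetectionRate {
    static double detectionSeconds(double heartbeatIntervalSeconds, int cyclesToFailure) {
        return heartbeatIntervalSeconds * cyclesToFailure;
    }
}
```

With a 1-second heartbeat and 10 cycles, detection takes 10 seconds; halving the interval to 0.5 seconds halves detection time to 5 seconds, at the cost of more heartbeat traffic and a higher chance of false detection on a busy network.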
Note that the new values become active the next time cluster services are started.

• AIX deadman switch timeout. If HACMP for AIX cannot get enough CPU resource to send heartbeats on the IP and serial networks, other nodes in the cluster will assume the node has failed and initiate takeover of the node's resources. To ensure a clean takeover, the deadman switch crashes the busy node if it is not reset within a given time period.

7.2 MC/ServiceGuard on the HP-UX Operating System

7.2.1 Introduction and Considerations

Hewlett-Packard Multi-Computer/ServiceGuard (MC/ServiceGuard) is a high-availability solution available for HP-UX systems. This product allows the creation of clusters of HP 9000 series computers that provide high availability for databases and applications in spite of a hardware, software, or network failure.

MC/ServiceGuard ensures availability of units called packages: collections of services, disk volumes, and IP addresses. When a cluster starts, a package starts up on its primary node. The MC/ServiceGuard package coordinator component decides when and where to run, halt, or move packages. User-defined control scripts are used to react to changes in the monitored resources.

A heartbeat mechanism is used to monitor node and database services and their dependencies. If the heartbeat mechanism detects a failure, recovery is started automatically. To provide a highly available heartbeat mechanism, two connections must be available between the MC/ServiceGuard nodes. These can be provided by dual network connections or, in a two-node configuration, by a single network connection and a dedicated serial heartbeat. This heartbeat redundancy prevents a false diagnosis of an active node failure.

Network failures are handled by the network manager component. Each active network interface on a node is assigned a static IP address.
A relocatable IP address, also known as a floating IP address or a package IP address, is assigned to each package. This provides transparent failover to clients by giving them a single IP address to connect to, no matter which node in the cluster is hosting the service. When a package starts up, the cmmodnet command in the package control script assigns this relocatable IP address to the primary LAN interface card in the primary node. Within the same node, both static and relocatable IP addresses will switch to a standby interface in the event of a LAN card failure. Only the relocatable IP address can be taken over by an adoptive node if control of the package is transferred.

At regular intervals, the network manager polls all the network interface cards specified in the cluster configuration file. The polling interface sends LAN packets to all other interfaces in the node that are on the same bridged net and receives packets back from them. If a network failure is detected, the network manager has two options:

• Move the package to a backup network interface on the same node. This is often referred to as a local switch. TCP connections to the relocatable IP address are not lost (except for IEEE 802.3, which does not have the rearp function).

• Move the package to another node in the cluster. This is often referred to as a remote switch. This switch causes all TCP connections to be lost. If a remote switch of a WebSphere database package is initiated, connections held in the WebSphere connection pools will be lost. These connections will be marked stale and will need to be recovered as described in Chapter 6.

Data redundancy is also an important part of this failover mechanism. There are two options available to provide data redundancy with MC/ServiceGuard:

• Disk mirroring with MirrorDisk/UX - MirrorDisk/UX is a software package for the HP-UX operating system which provides the capability to create up to three copies of your data on different disks.
If one disk should fail, MirrorDisk/UX will automatically keep the data available by accessing the other mirror. This access is transparent to the applications using the data. To protect against SCSI bus failures, each copy of the data must be accessed by a separate SCSI bus.

• Redundant Array of Independent Disks (RAID) levels and Physical Volume (PV) links - An array of redundant disks and redundant SCSI interfaces protects against a single point of failure in the I/O channel. PV links are used to configure the redundant SCSI interfaces.

You can monitor disks through the HP Event Monitoring Service, a system monitoring framework for the HP environment.

7.2.2 Expected Reactions to Failures

If a package needs to be moved to another node, the general MC/ServiceGuard process involves the following steps:

• Stop services and release resources on the failed node
• Unmount file systems
• Deactivate volume groups
• Activate volume groups on the standby node
• Mount file systems
• Assign package IP addresses to the LAN card on the standby node
• Start services and acquire the resources needed

While this failover process is occurring, the package is unavailable for client connections. If the package contains one or more of the WebSphere databases, errors may occur in this component. Once the failover is complete, WebSphere components and applications should successfully reconnect, assuming the proper programming practices outlined in Chapter 6 are followed.
Figure 7.4: MC/ServiceGuard cluster database configuration

Figure 7.5: MC/ServiceGuard cluster configuration after failover

7.2.3 WebSphere MC/ServiceGuard configuration

The lab environment used to test this configuration consisted of two HP 9000 K-class machines running HP-UX 11.0. These machines were installed with MC/ServiceGuard A11.12. Both DB2 7.2.1 and Oracle 8.1.7 were installed as well. The machines shared an AutoRAID disk array, where the DB2 or Oracle database instance was created. We used Fast/Wide SCSI interfaces to connect the two nodes to the disk array. The redundant heartbeat mechanism was set up on two Ethernet LANs: a public LAN connected to the rest of the machines in our configuration, and a private LAN for heartbeat only.

WebSphere components were also configured in a highly available topology, as previously described in this document. WebSphere resources requiring database access were configured to communicate with the relocatable IP address. Applications requiring database connections were programmed in accordance with the recommendations in Chapter 6.
Figure 7.6: Test environment for integrated WebSphere and MC/ServiceGuard HA system

7.2.4 Tuning heartbeat and cluster parameters

Sending and receiving heartbeat messages among the nodes in the cluster is a key part of the cluster management technique. If a cluster node does not receive heartbeat messages from the other node within the prescribed time, a cluster reformation is initiated. At the end of the reformation, if a new set of nodes forms a cluster, that information is passed to the package coordinator. Packages which were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes. There are several MC/ServiceGuard parameters you can tune for performance and reliability:

• Heartbeat interval is the normal interval between the transmission of heartbeat messages from one node to the other in the cluster. The default value is 1 second, with a maximum value of 30 seconds.

• Node timeout is the time after which a node may decide that the other node has become unavailable and initiate cluster reformation. The default value is 2 seconds, with a minimum of 2 * (heartbeat interval) and a maximum of 60 seconds. Small values of node timeout and heartbeat interval may increase the potential for spurious cluster reformations due to momentary system hangs or network load spikes.

• Network polling interval is the frequency at which the networks configured for MC/ServiceGuard are checked. The default value is 2 seconds, and the value can range from 1 to 30 seconds. Changing this value affects how quickly a network failure is detected.

• Choosing switching and failover behavior: switching the IP address from a failed LAN card to a standby LAN card on the same physical subnet may take place if Automatic Switching is set to Enabled in SAM. You can define the failover behavior and whether the package will fall back automatically as soon as the primary node is available.
• Resource polling interval is the frequency of monitoring a configured package resource. The default value is 60 seconds, with a minimum value of 1 second.

7.3 Microsoft Clustered SQL Server on Windows 2000

7.3.1 Introduction and Considerations

In the Windows 2000 Advanced Server and Datacenter Server operating systems, Microsoft introduces two clustering technologies that can be used independently or in combination, providing organizations with a complete set of clustered solutions that can be selected based on the requirements of a given application or service. The Windows clustering technologies are:

• Cluster service. This service is intended primarily to provide failover support for applications such as databases, messaging systems, and file/print services. Cluster service supports 2-node failover clusters in Windows 2000 Advanced Server and 4-node clusters in Datacenter Server. Cluster service is ideal for ensuring the availability of critical line-of-business and other back-end systems, such as Microsoft Exchange Server or a Microsoft SQL Server 7.0 database acting as a data store for an e-commerce web site.

• Network Load Balancing (NLB). This service load balances incoming IP (Internet Protocol) traffic across clusters of up to 32 nodes. Network Load Balancing enhances both the availability and scalability of Internet server-based programs such as web servers, streaming media servers, and Terminal Services. By acting as the load balancing infrastructure and providing control information to management applications built on top of Windows Management Instrumentation (WMI), Network Load Balancing can seamlessly integrate into existing web server farm infrastructures. Network Load Balancing will also serve as an ideal load balancing architecture for use with the Microsoft release of the upcoming Application Center in distributed web farm environments.
See Introducing Windows 2000 Clustering Technologies at http://www.microsoft.com/windows2000/techinfo/howitworks/cluster/introcluster.asp for more information.

The failover mechanism in Windows 2000 active/passive clustering works by assigning a virtual hostname / IP address to the active node. This is the hostname exposed to all external clients accessing the clustered resource. When a resource on the active node fails, the cluster fails over to the passive node, making it the active node and starting all the necessary services. Additionally, the virtual hostname / IP address is moved from the first node to the newly activated node, which then handles all new requests from the clients. While this transition is taking place, the database will not be available and requests will fail. However, once the transition is complete, the only indication to the client that something has failed is that the connections to the cluster are no longer valid and must be reestablished.

7.3.2 Expected Reactions to Failures

A failure of the Microsoft cluster can be triggered in several ways:

• Manual push to passive node. Within the cluster administrator, right-click on a group and click "Move Group" to move it to the passive node.
• Clean shutdown of active node. Without manually moving any of the groups, go to Start -> Shutdown and power down the active node.
• Unexpected power failure on active node. Physically pull the power cable from the active node.
• Public network failure on the active node. Pull only the public network cable from the active node.

When these failures occur, the cluster service should recognize that one (or more) of the resources failed on the active node and transition all of the components from the failing node to the alternate node. At this point, all the connections to WebSphere are broken, so on the next request from WebSphere, stale connections would be detected and a StaleConnectionException would be thrown to the WebSphere application (as described in Chapter 6).
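The first scenario, a manual move, can also be driven from a Windows 2000 command prompt with the cluster.exe utility that ships with the Cluster service. This is a hedged sketch: the group and node names are placeholders, and the exact switch syntax should be checked against your cluster.exe version.

```shell
REM Move a resource group to the other node, then check its status.
REM "SQL Group" and NODE2 are placeholders for your installation.
cluster group "SQL Group" /moveto:NODE2
cluster group "SQL Group" /status
```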
After the transition to the new active node has completed, WebSphere will reestablish connections to the database. Applications programmed according to the guidelines in Chapter 6 would also reconnect to the database.

If both the private and public networks on the active node fail, the cluster service will not transition the components to the alternate node. When both networks go down and all communication is lost, both nodes think the other node is down and try to take control of the shared disk. However, the disk is already locked by the active node, so the standby node's takeover request will fail. Meanwhile, the active node will attempt to move the cluster to another node. However, since all of its network connections were lost, the active node will see itself as the only node in the cluster.

7.3.3 WebSphere Microsoft Cluster Server Configuration

The lab environment used to test this configuration consisted of two IBM Netfinity 7000 M10 machines running Microsoft Windows 2000 and Microsoft SQL Server 2000 Enterprise Edition.

Database Server Node (hardware): IBM Netfinity 7000 M10; 4 internal 16 GB SCSI hard drives; Adaptec AIC-7895 SCSI controller; 2 GB RAM; 4 x 500 MHz Pentium III Xeon processors
Database Server Node (software): Microsoft Windows 2000 Advanced Server with Service Pack 2 applied (includes clustering software); Microsoft SQL Server 2000 Enterprise Edition or IBM DB2 7.2 Enterprise-Extended Edition
Shared Disk: IBM EXP15; 10-drive array; 2 SCSI interfaces
WebSphere Server (hardware): IBM M Pro; 1 x 933 MHz processor; 512 MB RAM
WebSphere Server (software): Windows NT 4.0 SP 6a

Notes:
• Even though the WebSphere server was tested on an NT platform, the same functionality occurs on any of the distributed WebSphere platforms.
• The nodes of the cluster were identical boxes with identical software installed on them, so this data is only presented once even though two boxes were used.
Figure 7.7: Microsoft Cluster Configuration (Active Directory / DNS server at 10.0.0.5, DNS IP 10.0.0.6; WebSphere 4.0x domain on the public network 10.0.0.x; database server cluster nodes at 10.0.0.61/10.0.0.62 public and 10.1.0.61/10.1.0.62 private, sharing a disk over the private network 10.1.0.x)

7.3.4 Tuning heartbeat and cluster parameters

To ensure proper performance of the heartbeat communications, it is recommended that you disable any unnecessary network protocols and ensure proper TCP/IP configuration. It is also recommended that the "Media Sense" configuration option be disabled in Windows 2000. For more information, see the Microsoft Support Article "Recommended Private "Heartbeat" Configuration on a Cluster Server (Q258750)" at http://support.microsoft.com/default.aspx?scid=kb;en-us;Q258750.

Chapter 8 - Failover for Other Components

Figure 8.1: WebSphere Topology with Firewalls

8.1 Firewalls

In a web application environment, a firewall is (or should be) a basic component of the infrastructure. In fact, most environments have multiple firewalls. As noted in the introduction to this paper, the outage of a firewall can result in a catastrophic outage, much like losing the network itself. High availability for a firewall is a requisite for an e-business environment. If a firewall is down, customers are unable to access applications on your website, and more importantly, in some configurations your web infrastructure is vulnerable to attacks from hackers during any outage. High availability for firewalls can be achieved in a number of different ways, but the two most common involve the use of either a hardware clustering implementation such as HACMP, or a load balancing cluster using a product such as the WebSphere Edge Server Network Dispatcher. The first approach, using hardware clustering, is depicted below.
Two separate servers are configured to run the firewall software as well as the hardware clustering software. Under normal operation all traffic is routed through the first firewall.

Figure 8.2: Normal HACMP Firewall Operations

When an outage is detected, the hardware clustering software is configured to effect an IP service address takeover. Migrating the service IP address to the backup node allows the client to continue to use the same IP address. The firewall machine that has the service IP address is the active node from a hardware cluster perspective, and it will get all the IP traffic. The result is depicted below.

Figure 8.3: HACMP Firewall Failover Operation

The second approach, using load balancing clustering, is depicted below. Two separate servers are configured to run the firewall software as well as the load balancing clustering software. Under normal operation all traffic is routed through the first firewall.

Figure 8.4: Normal eND Firewall Operations

Failover in this configuration occurs in much the same manner as with hardware clustering. In this scenario the cluster IP addresses are configured as an alias on the active firewall. In the case of a takeover, this alias is deleted (or replaced by aliases to the loopback device) and moved to the standby server. The only information exchange required between the two firewall servers is the heartbeat of ND. An advantage to using load balancing software for this purpose is that not only can traffic be routed around a firewall that is out of service, but the software can also be configured to distribute load between the firewalls, improving performance and throughput, as depicted below.
Figure 8.5: Load Balanced eND Firewall Operations

The major difference between these two approaches is that hardware clustering software can only be used to provide high availability, while load balancing software such as WebSphere Edge Server (WES) Network Dispatcher can provide high availability and load balancing. Arguably, configuration of hardware clustering is a bit more complex than that of WES Network Dispatcher. The downside to WES is that it has no concept of rotating resources; as a result, in the case of a failure of the primary firewall, there needs to be a second takeover once service is restored, since the primary WES machine has to be the active one. In the end, the choice of a high availability mechanism for a firewall isn't as important as having one.

8.2 LDAP Server

Another component that is sometimes overlooked when considering HA is the LDAP server. Much like the database server, the LDAP server represents a SPOF unless some manner of replication or clustering is provided. LDAP servers provide two mechanisms for scaling and availability: replication and referral.

Replication

Replication involves the designation of one server as the master server and one or more servers as replicas. The master is used for all updates and then propagates changes to the replicas. Replicas can serve bind, unbind, search, and compare requests. Any add, delete, modify, or modifyRDN/modifyDN requests that arrive at a replica are redirected to the master. This is depicted below. If the master server fails, replicas can handle requests for read-only operations; however, write operations will fail. Also, LDAP replication does not provide a mechanism for load distribution and/or failover. This is accomplished either manually, by designating the replica to be the master, or via an external load distribution mechanism.
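The read/write split above can be observed with the standard LDAP command-line tools. This is a sketch with hypothetical host names and files: a replica answers a search itself, while an update must ultimately reach the master.

```shell
# Searches succeed against either server (host names and base DN are
# placeholders for your directory).
ldapsearch -h ldap-replica.example.com -b "o=ibm,c=us" "(uid=wasadmin)"

# Updates must go to the master; sent to a replica, they are redirected.
ldapmodify -h ldap-master.example.com -f update.ldif
```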
Unless detection of an outage and re-designation of the master can be accomplished in an automatic and timely fashion, a loss of service will occur.

Figure 8.6: LDAP Master-Replica Configuration

Referral

Though not a supported configuration with WebSphere Application Server, referral is a variation of LDAP replication that employs multiple masters, typically one per organization or workgroup within a company. Unlike replication, where the master contains all data, with referral there is no single master directory. Instead there are a series of masters, each of which contains information specific to an organization or workgroup. Requests that arrive at a server for information that is contained in another server are referred to the appropriate server. Like replication, referral has no inherent provision for load distribution or failover. Some multi-master configurations also have a local replica to enable write HA; the writes are queued locally until the master comes back up again.

Figure 8.7: LDAP Referral Replication

Hardware HA

In an HA hardware cluster for LDAP, two separate servers are configured to run the hardware clustering software. Each server is also configured as an LDAP server, one as the master, the other as a replica. Under normal operation all traffic is routed to the master, or primary, LDAP server.

Figure 8.8: Hardware HA for LDAP - Normal Operation

When an outage is detected, the hardware clustering software is configured to effect an IP service address takeover.
The LDAP server that has the service IP address is the active node from a hardware cluster perspective, and it will get all the IP traffic. This LDAP server is also now designated as the master. The result is depicted below.

Figure 8.9: Hardware HA for LDAP - Failover Operation

Once service is restored to the primary LDAP server, it will be necessary to re-designate the master and the replica to restore the original configuration.

Load Balancing

DNS RoundRobin

A common mechanism for load distribution of LDAP requests is the use of DNS RoundRobin. With DNS RoundRobin, the server-side name server is modified to respond to translation requests with the IP addresses of different hosts in a round-robin fashion. This partitions client requests among the replicated hosts. Unfortunately, this approach suffers from a couple of drawbacks that preclude its use with WebSphere. Both the Java runtime and the OS cache DNS resolutions. This caching, combined with the fact that the DNS server is typically unaware of an LDAP directory failure, means that WebSphere Application Server could continue to attempt binds to failed as well as available LDAP servers. Of course a bind to a failed LDAP server will fail, and the user will not be authenticated.

Load Balancing Cluster

Failover in this configuration occurs in much the same manner as with hardware clustering. In this scenario the cluster IP addresses are configured as an alias on the active LDAP server. In the case of a takeover, this alias is deleted (or replaced by aliases to the loopback device) and moved to the standby server. The only information exchange required between the two LDAP servers is the heartbeat of WebSphere Edge Server Network Dispatcher (eND).
An advantage to using load balancing software for this purpose is that not only can traffic be routed around an LDAP server that is out of service, but the software can also be configured to distribute load between the two servers, improving performance and throughput. Though not depicted, another variation of this configuration would be to dedicate servers to running the eND component. This might be of value in an organization with a very large number of servers, simply for ease of administration. With a smaller number of servers, eND can be co-located with the LDAP servers, as shown in figure 8.10, saving the expense of additional servers dedicated to the task as is sometimes required with other load distribution products.

Figure 8.10: eND LDAP Configuration

The advantages and disadvantages of a load balancing cluster as compared to a hardware HA cluster are the same with LDAP as with a firewall.

Appendix A - Installing Network Dispatcher (ND) on Windows NT

Introduction

This appendix walks you through the installation of ND on an NT machine. The ND forwards requests to web servers which are installed on NT machines as well. The procedure for other platforms is quite similar to the one described here. Also, the configuration described here is based on Ethernet. If your network uses token-ring, the procedure would be similar, though there might be some differences. These instructions (for the most part) are also provided in the Redbook "WebSphere Edge Server: Working with Web Traffic Express and Network Dispatcher", SG24-6172-00, July 2001. Download it from http://www.redbooks.ibm.com before starting your installation, as we will be referencing it in this appendix. Chapter 3 of this Redbook deals with installation of the ND.

Obtain 3 NT machines:

Machine A: Machine on which you will install ND.
Machine B and Machine C: Machines on which you have your web servers.

                       /--------> Machine B (IP3)
Machine A (IP1, IP2)
                       \--------> Machine C (IP4)

You will need 4 IP addresses to set up this configuration. All four IP addresses must be in the same subnet; this means that the first 3 numbers in all four IP addresses must be the same. For example:

1st IP: 25.34.145.A1 - the primary IP address of machine A
2nd IP: 25.34.145.A2 - the IP address to be load balanced, used by machine A; also known as the Cluster Address or Virtual IP Address, and typed in a web browser's URL field to initiate the HTTP request
3rd IP: 25.34.145.B - IP address of machine B
4th IP: 25.34.145.C - IP address of machine C

Pre-Installation Setup

Machine A setup: This is the machine on which you will install ND. No special setup is required on this machine at this time. Before starting to install ND, make sure that you have an additional IP address, the cluster address, available for this machine. Contact your system administrator to obtain an additional IP address in the same subnet.

Note: You will use the 2nd IP address after you have installed ND. At no point will you have to manually add this 2nd IP address to the Network Properties in your NT system. For now, make sure that you have an additional IP address available to you. We will discuss the usage of this IP address in detail later.

Machines B and C setup: Before installing ND on machine A, configure the NT machines on which you have the two web servers. These machines have to be configured to receive and process HTTP requests which were originally meant for the cluster address represented by the ND machine. Therefore we need to install a loopback adapter that is configured with the Cluster Address that will be used by machine A.

Install a loopback adapter on each web server machine: Refer to Chapter 3.2, page 95 of the Redbook "WebSphere Edge Server: Working with Web Traffic Express and Network Dispatcher".
This section, titled "Configuration of the back-end servers", walks you through this step. No additional hardware or software is needed to install a loopback adapter. After installing the loopback adapter, your web server machines are equipped to accept requests originally sent to the cluster address.

Delete a route from the routing table: Continue with the instructions in the Redbook, and follow the instructions to delete the extra route. You are now ready to start installing Network Dispatcher on machine A.

Installation of ND

Through the years, ND has been shipped packaged as a part of other IBM products. Currently, ND is shipped as a part of WebSphere Edge Server. You can download WebSphere Edge Server (WES) from: http://www-4.ibm.com/software/webservers/edgeserver/

JDK 1.3 is a prerequisite for ND. If you have a WebSphere 4.0 installation on an NT machine, you can use its JDK with ND. Simply copy the directory C:\WebSphere\AppServer\java from your WebSphere machine to Machine A, on which we plan to install ND. If you do not have a WebSphere installation, you can download JDK 1.3 from: http://www.ibm.com/developerworks/java/jdk/index.html

Follow the instructions in the section titled "Installing Network Dispatcher on Windows NT" on page 40 of the Redbook to install the JDK. After installation, update your PATH environment variable to include JDK 1.3. If you have other JDKs on the system, make sure that JDK 1.3 comes before any other JDK in the PATH variable. After the JDK installation, the Redbook discusses ND installation. Continue with the installation of ND as described in the Redbook. At the end of the installation you will be required to restart your system - go ahead and restart it when prompted. In the next section we will configure ND to accept requests for the Cluster Address 25.34.145.A2.

Post-installation configuration of ND

On Windows NT, ND starts automatically as a service.
Before starting the configuration process, make sure that its status is "Started" by viewing the Services menu in the Control Panel.

Create keys: Open a command prompt and run the ndkeys command to create the key files. The successful completion message is: "Key files have been created successfully." This creates a key file required for administration purposes.

Configure the web server cluster: This is where we configure ND to receive requests on behalf of the Cluster Address 25.34.145.A2. This can be done using either a command-line utility called ndcontrol or a GUI-based tool. We will use the GUI to perform our configuration; the command-line tool can be used for the automation of administrative tasks. The configuration consists of the following steps:

1. Create a cluster for the IP address 25.34.145.A2.
2. Create a port number to be load balanced. In our case this is the default HTTP port, 80.
3. Add the servers to which ND will forward the HTTP requests. Here we add servers B and C.

Refer to section 4.1.2 of the Redbook, titled "Configuration", on page 71. After finishing the configuration, test your basic ND functionality:

1. Start the web servers on machines B and C. Make sure you can access the default page on the web servers by pointing a web browser directly at their IP addresses. In our case, we used IBM HTTP Server on both web server machines and edited the <title> tag in the file C:\IBM HTTP Server\htdocs\index.html on both machines so that we could identify the web server machine while using ND.
2. Open a web browser and type in the URL corresponding to the Cluster Address. If everything is installed and configured correctly, you will see an HTML page served by either machine B or machine C. You will be able to identify the machine by looking at the title of the web page. Refresh the page several times until you have seen the HTML pages from both web server machines.
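For scripted setups, the three configuration steps above map onto ndcontrol subcommands. This is a hedged sketch: the cluster:port:server argument forms shown here are typical of ndcontrol, but should be verified against your Network Dispatcher release; the addresses are the placeholders used throughout this appendix.

```shell
ndcontrol executor start
ndcontrol cluster add 25.34.145.A2
ndcontrol port add 25.34.145.A2:80
ndcontrol server add 25.34.145.A2:80:25.34.145.B
ndcontrol server add 25.34.145.A2:80:25.34.145.C
ndcontrol manager start
```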
Appendix B - Configuring TCP Timeout parameters by OS

Windows NT/2000 - On Windows NT/2000, the TCP timeout value defaults to 3000 milliseconds, or 3 seconds. However, the operating system retries the connection twice before failing, and on each retry it waits twice the value of the previous wait. This means that the default value of 3000 milliseconds is in actuality a timeout value of 3000 + 6000 + 12000 milliseconds, or 21 seconds. This value was hard coded prior to Windows NT Service Pack 5. In Windows NT Service Pack 5 and higher, or Windows 2000, the following procedure can be used to view or modify this value:

1. Start Registry Editor (Regedt32.exe).
2. Locate the following key in the registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
3. On the Edit menu, click Add Value, and then add the following registry value:
   Value Name: InitialRtt
   Data Type: REG_DWORD
   Value: <value> decimal (where <value> is the timeout value in milliseconds; for example, 5000 sets it to 5 seconds)
   Valid Range: 0-65535 (decimal)
   Default: 0xBB8 (3000 decimal)
   Description: This parameter controls the initial retransmission timeout used by TCP on each new connection. It applies to the connection request (SYN) and to the first data segment(s) sent on each connection.
4. Quit Registry Editor.
5. Restart the computer.

AIX - On AIX, the TCP timeout value is specified in half-seconds, and can be viewed or modified using the network options (no) command. The default value is 150 half-seconds (75 seconds). To view the timeout value, enter the command /usr/sbin/no -o tcp_keepinit. To set the timeout value, enter the command /usr/sbin/no -o tcp_keepinit=<value>, where <value> is the timeout value in half-seconds. This is a runtime value, which must be set again every time the machine reboots.

Solaris - On Solaris, the timeout value defaults to 180000 milliseconds, or 3 minutes. This value can be viewed or set using the ndd command.
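To see where the Windows figure of 21 seconds comes from, the retry doubling described above can be sketched as a small shell calculation (an illustration of the arithmetic only, not a system API):

```shell
# Sum the initial wait plus each doubled retry wait.
# rtt is the InitialRtt value in milliseconds; retries is the number of
# connection retries (2 by default on Windows NT/2000).
total_connect_timeout() {
  rtt=$1
  retries=$2
  total=0
  wait=$rtt
  i=0
  while [ "$i" -le "$retries" ]; do
    total=$((total + wait))
    wait=$((wait * 2))
    i=$((i + 1))
  done
  echo "$total"
}

total_connect_timeout 3000 2   # prints 21000 (3000 + 6000 + 12000)
```

By the same arithmetic, setting InitialRtt to 5000 with the default two retries gives 5000 + 10000 + 20000 = 35000 ms.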
To view the timeout value, enter the command ndd -get /dev/tcp tcp_ip_abort_cinterval. To set the timeout value, enter the command ndd -set /dev/tcp tcp_ip_abort_cinterval <value>, where <value> is the timeout value in milliseconds.

HP-UX - On HP-UX, the timeout value defaults to 75000 milliseconds, or 75 seconds. This value can be viewed or set using the ndd command. To view the timeout value, enter the command ndd -get /dev/tcp tcp_ip_abort_cinterval. To set the timeout value, enter the command ndd -set /dev/tcp tcp_ip_abort_cinterval <value>, where <value> is the timeout value in milliseconds.

Appendix C - MC/ServiceGuard setup instructions

• Install the MC/ServiceGuard software on each node with swinstall, choosing the B3935DA package. For MC/ServiceGuard installation details, see the manual shipped with the MC/ServiceGuard software.

• Configure and update each node for the MC/ServiceGuard cluster. Give security permissions for both machines by adding entries to the /etc/cmcluster/cmclnodelist file:

hp1.somecorp.com root # WebSphere database cluster
hp2.somecorp.com root # WebSphere database cluster

If you want to allow non-root users to run cmviewcl, you should also add the non-root user IDs to this file. Define the name resolution service. By default, MC/ServiceGuard uses /etc/resolv.conf to obtain the addresses of the cluster nodes. In case DNS is not available, you should configure the /etc/hosts file and configure /etc/nsswitch.conf to search /etc/hosts when other lookup strategies are not working.

• Set up and configure the shared disk array. Connect the shared disk array to both nodes. Create volume groups, logical volumes, and mirrors with pvcreate, vgcreate, vgextend, lvcreate, and lvextend. Create cluster lock disks. Distribute the volume groups to the other node.
You can distribute volume groups either with SAM or with LVM commands.

• Configure the MC/ServiceGuard cluster for the WebSphere databases. In SAM, select Cluster -> High Availability Cluster. Choose Cluster Configuration. From the Actions menu, choose Create Cluster Configuration and follow the instructions. Verify the cluster configuration with

cmcheckconf -k -v -C /etc/cmcluster/webspheredb2.config (for IBM DB2)
cmcheckconf -k -v -C /etc/cmcluster/websphereoracle.config (for Oracle)

Distribute the binary configuration file to the other node, either with SAM or with commands. Back up the volume group and cluster lock configuration data for possible replacement of disks later on.

• Configure packages and their services. Install DB2 or Oracle on both machines and LDAP on the shared disk. Create the database instances in the shared volume group. Use SAM to configure the packages. Customize the package control scripts for volume group activation, service IPs, volume groups, service start, and service stop. Since the control scripts are very long, we give the key functions of our sample scripts for DB2 and Oracle as follows.

For DB2, our sample service start script is:

function customer_defined_run_cmds
{
su - db2inst4 <<STARTDB
db2start
STARTDB
test_return 51
}

And our sample DB2 service stop script is:

function customer_defined_halt_cmds
{
su - db2inst4 <<STOPDB
db2 force applications all
sleep 1
db2stop
STOPDB
test_return 52
}

For Oracle, our sample service start script is:

function customer_defined_run_cmds
{
su - oracle <<STARTDB
lsnrctl start
export SIDS="APP ADMIN SESSION"
for SID in $SIDS ; do
export ORACLE_SID=$SID
echo "connect internal\nstartup\nquit" | svrmgrl
done
STARTDB
test_return 51
}

And our sample Oracle service stop script is:

function customer_defined_halt_cmds
{
su - oracle <<STOPDB
export SIDS="APP ADMIN SESSION"
for SID in $SIDS ; do
export ORACLE_SID=$SID
echo "connect internal\nshutdown\nquit" | svrmgrl
done
lsnrctl stop
STOPDB
test_return 52
}

Distribute the package configuration with SAM.

• Verify the cluster
operation and configuration, ensuring that:

- Heartbeat networks are OK and up
- Networks are OK and up
- All nodes are OK and up
- All properties configured are what you want
- All services, such as DB2, Oracle, and LDAP, are OK and up
- The logs have no errors

• Verify system failover from SAM by moving packages from one node to the other node.

Appendix D - HACMP setup instructions

• Install HACMP 4.3.1, HACMP 4.4, or HACMP/ES 4.4 ptf3. Use smit to install HACMP 4.3.1, HACMP 4.4, or HACMP/ES 4.4 ptf3 on both nodes. For installation details, please see the HACMP for AIX Installation Guide. You also can install HACMP after you configure the network adapters and the shared disk subsystem. Before you configure HACMP, the network adapters must be defined, the AIX operating system must be updated, and you must give the clustering nodes permission to access one another. Modify the following configuration files: /etc/netsvc.conf, /etc/hosts, and /.rhosts. Make sure that each node's service adapters and boot addresses are listed in the /.rhosts file on each cluster node so that the /usr/sbin/cluster/utilities/clruncmd command and /usr/sbin/cluster/godm can run.

• Service network configuration. The public network is used to provide services to clients (WebSphere, applications, LDAP); for example, we define two TCP/IP public networks in our configuration. A public network consists of a service/boot adapter and any standby adapters. It is recommended that you use one or more standby adapters. Define standby IP addresses and boot IP addresses. For each adapter, use smit mktcpip to define the IP label, IP address, and network mask. HACMP will define the service IP address. Since this configuration process also changes the hostname, configure the adapter with the desired default hostname last. Then use smit chinet to change the service adapters so that they will boot from the boot IP addresses. Check your configuration with "lsdev -Cc if". Finally, try to ping the nodes to test the public TCP/IP connections.
• Serial network configuration. A serial network is needed to exchange heartbeat messages between the nodes in an HACMP cluster. A serial network allows the Cluster Manager to continue to exchange keepalive packets should the TCP/IP-based subsystem, networks, or network adapters fail. The private network can be a raw RS232 serial line, a target mode SCSI bus, or a target mode SSA loop. The HACMP for AIX serial line (a null-modem, serial-to-serial cable) is used to connect the nodes. Use smit tty to create the tty device. After creating the tty device on both nodes, test communication over the serial line by entering the command stty < /dev/ttyx on both nodes (where /dev/ttyx is the newly added tty device). Both nodes should display their tty settings and return to the prompt if the serial line is OK. After testing, define the RS232 serial line to HACMP for AIX.

• Shared disk array installation and LVG configuration. The administrative data, application data, session data, LDAP data, log files, and other file systems that need to be highly available are stored on the shared disks, which use RAID technologies or are mirrored to protect the data. The shared disk array must be connected to both nodes with at least two paths to eliminate a single point of failure. We use the IBM 7133 Serial Storage Architecture (SSA) Disk Subsystem. You can configure the shared volume group for either concurrent or nonconcurrent access. Nonconcurrent access environments typically use journaled file systems to manage data, while concurrent access environments use raw logical volumes. There is a graphical interface called TaskGuide to simplify the task of creating a shared volume group within an HACMP cluster configuration. In version 4.4, the TaskGuide has been enhanced to automatically create a JFS log and display the physical location of available disks.
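For a nonconcurrent shared volume group, the same result can be reached with the base AIX LVM commands instead of the TaskGuide. This is a sketch under assumptions: the volume group, logical volume, and disk names (sharedvg, sharedlv, hdisk2) and the logical partition count are placeholders for your environment.

```shell
# On the first node: create the volume group on the shared SSA disk,
# vary it on, create a logical volume, then vary it off so the other
# node can import the definition.
mkvg -y sharedvg hdisk2
varyonvg sharedvg
mklv -y sharedlv sharedvg 100
varyoffvg sharedvg

# On the second node: import the volume group definition from the disk,
# then vary it off again so cluster software controls activation.
importvg -y sharedvg hdisk2
varyoffvg sharedvg
```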
After one node has been configured, import the volume groups on the other node by using smit importvg.

• DB2 or Oracle and LDAP installation, configuration, and instance and database creation. For installation details, see the manuals for these products. You can install these products either on the disks in both nodes or on the shared disk, but you must keep all shared data, such as database files, transaction log files, and other important files, on the shared disk array so that the other node can access the data when the current node fails. We chose to install these products on each node; in this case, you must install the same version of the products on both nodes. Create the DB2 or Oracle instances on the shared disk subsystem. We created three DB2 instances, for the WebSphere administrative, application, and session databases. In the Oracle test, we likewise created three Oracle instances for the administrative, application, and session databases. You may need to change applheapsz for the created DB2 databases, or the cursor parameters for Oracle; see the WebSphere installation guide for details. Install database clients on the WebSphere nodes and configure them to connect to the database server if "thick" database clients are used. For example, install the DB2 client on all WebSphere nodes and catalog the remote node and database server.

• Define the cluster topology and HACMP application servers. The cluster topology comprises the cluster definition, cluster nodes, network adapters, and network modules. The cluster topology is defined by entering information about each component in HACMP-specific ODM classes. These tasks can be done by using smit HACMP; for details, see the HACMP for AIX Installation Guide. An "application server", in HACMP or HACMP/ES, is a cluster resource that is made highly available by the HACMP or HACMP/ES software.
In our case, the application servers are the DB2 databases, Oracle databases, and LDAP servers (not to be confused with the WebSphere application server). Use smit HACMP to define each HACMP (HACMP/ES) application server by a name and its start and stop scripts.

• Start and stop scripts for DB2 and Oracle as the HACMP application servers

Our sample DB2 service start script is:

    db2start

And our sample DB2 service stop script is:

    db2 force applications all
    db2stop

For Oracle, our sample service start script is:

    lsnrctl start
    export SIDS="APP ADMIN SESSION"
    for SID in $SIDS ; do
        export ORACLE_SID=$SID
        echo "connect internal\nstartup\nquit" | svrmgrl
    done

And our sample Oracle service stop script is:

    export SIDS="APP ADMIN SESSION"
    for SID in $SIDS ; do
        export ORACLE_SID=$SID
        echo "connect internal\nshutdown\nquit" | svrmgrl
    done
    lsnrctl stop

You must run these scripts as the db2 or oracle user; otherwise you need to become that user with su.

• Define and configure resource groups

For HACMP and HACMP/ES to provide a highly available application server service, the service needs a set of cluster-wide resources essential to uninterrupted processing. A resource group can contain both hardware and software resources such as disks, volume groups, file systems, network addresses, and the application servers themselves. A resource group is configured to have a particular kind of relationship with a set of nodes. There are three kinds of node relationships: cascading, concurrent access, or rotating. For a cascading resource group, setting the cascading without fallback (CWOF) attribute minimizes client failover time; we used this configuration in our tests. Use smit to configure the resource groups and the resources in each group. Finally, synchronize the cluster resources to send the information contained on the current node to the other node.
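Because HACMP invokes an application server's start and stop scripts as root, the su step mentioned above is often folded into a small wrapper script. The sketch below is our assumption rather than part of the product documentation: the instance owner name db2inst1 is a placeholder, and combining start and stop into one script dispatched on an argument is just one convenient layout.

```shell
#!/bin/sh
# Illustrative HACMP start/stop wrapper for DB2. The instance owner name
# db2inst1 is an assumption; HACMP runs this script as root, so su switches
# to the instance owner before issuing the db2 commands.
DB2USER=${DB2USER:-db2inst1}

start_db2() {
    su - "$DB2USER" -c "db2start"
}

stop_db2() {
    su - "$DB2USER" -c "db2 force applications all; db2stop"
}

# Dispatch on the first argument; the script would be registered in smit as
# ".../db2ha.sh start" and ".../db2ha.sh stop" (assumed layout). Do nothing
# when run with no argument.
if [ $# -gt 0 ]; then
    case "$1" in
        start) start_db2 ;;
        stop)  stop_db2 ;;
        *)     echo "usage: $0 {start|stop}" >&2; exit 1 ;;
    esac
fi
```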
• Cluster verification

Use /usr/sbin/cluster/diag/clverify on one node to check that all cluster nodes agree on the cluster configuration and the assignment of HACMP for AIX resources. You can also use smit HACMP to verify the cluster. If the nodes do not agree on the cluster topology and you want to define the cluster as it is defined on the local node, you can force agreement of the cluster topology onto all nodes by synchronizing the cluster configuration. Once cluster verification succeeds, start the HACMP cluster services by using smit HACMP on both nodes, monitor the log file with tail -f /tmp/HACMP.out, and check the database processes with ps -ef | grep db2 (or ps -ef | grep ora).

• Takeover verification

To test a failover, use smit HACMP to stop the cluster services with the takeover option. On the other node, enter tail -f /tmp/HACMP.out to watch the takeover activity. There are several places to look for log information: /usr/adm/cluster.log provides a high-level view of current cluster status and is a good place to look first when diagnosing a cluster problem, while /tmp/HACMP.out is the primary source of information when investigating a problem. You can also configure a remote machine to connect to the HACMP cluster and use clstat (or xclstat) to monitor and test the cluster.

Appendix E - Microsoft Clustering Setup Instructions

The installation process must be followed exactly as presented here to ensure an accurate installation of the products. Read through the entire procedure and make sure you understand each step before starting; once you start, it is difficult to go back without starting from scratch.

Verifying the Hardware/Software

Before installing any part of this environment, verify that the hardware and software you are about to work with are the proper versions. The following links provide a list of prerequisites necessary for this installation.
• Hardware Compatibility List (HCL) for Microsoft Windows 2000 Advanced Server
  http://www.microsoft.com/windows2000/server/howtobuy/upgrading/compat/default.asp
• HCL for Microsoft Clustering Service
  http://www.microsoft.com/hcl/default.asp (search on "cluster")
• SQL Server Enterprise Edition prerequisites
  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/instsql/in_overview_74vn.as

NOTE: You must use SQL Server Enterprise Edition to get clustering support. The Developer Edition and Professional Edition are not cluster aware.

Setup SCSI Hardware for Shared Disk

For the testing done for this document, the IBM EXP 15 disk array was used as the shared disk. To get both nodes to use this disk array properly, the SCSI cards in each node needed to be configured to avoid a conflict. The following are the SCSI options that were changed. On the EXP 15, all of the DIP switches were set to off, which provided one chain of 10 drives in the array. On node 1 of the cluster, we set the SCSI ID (in the SCSI BIOS) to 6, and on the other node to 7. This was the only change from the default SCSI setup we needed to make.

Installing Microsoft Clustering Services

Once you have all of your prerequisites accounted for, installing the Microsoft Clustering Services in Windows 2000 is fairly straightforward. We recommend reading the documents listed in the Useful Resources section of this document to learn more about the process if you have any questions. During setup of our environment, we followed the Microsoft Clustering Step-by-Step guide (see Useful Resources); the process outlined in that paper is quite good. Additional notes we gathered while running through this process are below.

1. Make sure you have a server running the Active Directory service and a DNS server.
This is necessary for adding the cluster nodes to a Windows 2000 domain and for providing the domain-wide user accounts that many of these services will run under.
2. Add the cluster IP address and hostname to the DNS server or hosts file of the second node to allow it to find the active cluster during the installation.
3. SQL Server depends on the Microsoft Distributed Transaction Coordinator (MSDTC) for distributed queries and two-phase commit transactions, as well as for some replication functionality. At this point you need to install MSDTC by opening a command prompt and running comclust.exe on each node in the cluster. (comclust.exe can be found in the Winnt\system32 directory.)

Installing IBM DB2 Enterprise Extended Edition on the Clustered Windows Servers

Installing IBM DB2 in a Windows 2000 cluster is a little different from installing on a single node. Follow the instructions in the IBM whitepapers on implementing IBM DB2 on a Windows cluster (found in the resources section) along with the IBM DB2 documentation for installing to a cluster. We recommend reading both papers (2 node and 4 node), since each contains information the other does not. Additional notes we gathered while running through this process are below.

1. When creating a new datasource or WebSphere repository that will use the databases on the cluster, be sure to catalog the database using the virtual IP address of the DB2 servers as the remote host IP.
2. Add the virtual name/IP that you identified in the DB2 server setup to the DNS server or hosts files on all nodes that need to see this cluster.
3. The account used for the DB2 server service should be a domain login.
4. When running through the verification tests (later in this paper), be sure to set the DB2 environment variable DB2_FALLBACK to ON using db2set DB2_FALLBACK=ON; otherwise the system may not fail over while a client is connected.
5.
Be sure to add the DB2MPP-1 service as a clustered resource if it is not done through the DB2MSCS tool.

Installing SQL Server on the Clustered Windows Servers

Installing SQL Server in a Windows 2000 cluster is a little different from installing on a single node. Follow the instructions in the SQL Server documentation for installing to a cluster. Additional notes we gathered while running through this process are below.

1. Make sure each node can see the other node's c$ share before starting the install. This requires having "File and Print Sharing" installed on the network driver.
2. Add the virtual name/IP that you identified in the SQL Server setup to the DNS server or hosts files on all nodes that need to see this cluster.
3. The login for SQL Server should be a domain login.
4. Use both SQL Server authentication and NT authentication to allow the WebSphere JDBC driver to log in.

Installing Merant SequeLink Service on the SQL Cluster Servers

After installing Microsoft SQL Server on the cluster, you are ready to install the Merant SequeLink Server on both nodes of the SQL Server cluster. Since the Merant SequeLink Server is not cluster aware, you need to install it separately on each node of the cluster.

1. Download the Merant SequeLink Server from the WebSphere e-fix page. (A link to this page can be found in the Useful Resources section at the end of this document.)
2. Unzip the Merant SequeLink files to a temporary directory.
3. Run the SequeLink setup by opening a command prompt, changing to the directory where you extracted the files, and running setup /v"IPE=NO". (This command prevents the setup from asking for a registration key.)
4. Click through the Welcome page and accept the license.
5. At the following step in the install, you must change the drive that the Merant server is installed to, to the shared drive. Here the shared drive is drive U.
6. The next step asks for the Agent name/port, Server name/port, and an account to administer the server. We recommend leaving the default settings unless they would cause a port conflict or security requirements force you to change them. The most important item on this page is the user account: be sure to enter an account from the domain, not the local computer, so that when a failover occurs the account still has the proper permissions to administer the new node and remains available after a failure of the first node.
7. After clicking the Next button, the install will begin. This process installs Merant SequeLink on only one node; even though the files are installed to the shared drive, values are entered in the registry on each system.
8. At this point, we recommend backing up the swandm.ini file found on the shared drive in the Program Files\MERANT\slserver51\cfg directory.
9. To finish the cluster install of Merant, fail over to the second node and run through the exact same process to install the registry settings. Be sure to use the EXACT same install path. This will overwrite the files on the shared drive, but doing so does not cause any damage.
10. After the installation has completed, go to the Windows Services on each system and change both the SLAgent51 and SLSQLServer51 services to start manually.

Configuring Merant as a Cluster Resource

Once the Merant server is installed on each node, you need to identify it as a cluster resource so that if it fails on a node, the cluster will force a failover.

1. Start the Cluster Administrator (Start->Programs->Administrative Tools->Cluster Administrator).
2. Right-click on the SQL Server group and select New->Resource.
3. Fill in the appropriate values:
   Name: SequeLinkAgent
   Resource Type: Generic Service
   Group: SQL Server
4. Assign both nodes as possible owners of this resource.
5.
Add the following to the Resource Dependencies:
   Disk U: (quorum disk)
   SQL IP Address
   SQL Network Name
6. Set up the service parameters:
   Service Name: SLAgent51 (must match the service that is installed)
7. You need to add a registry key to be replicated on a failover. Click Add and enter the string SOFTWARE\MERANT\SequeLink51\SLSQLServer.
8. Click Finish.

Now you must follow the same process for the SequeLink listener service.

9. Follow steps 2-8 of the above procedure, but when you get to the Generic Service Parameters (step 6), enter the following:
   Service Name: SLSQLServer51
10. Click Finish. Merant is now set up as a resource in the cluster.

Additional Setup Steps for the Merant Driver

In addition to the setup already performed, the following steps need to be performed to get the environment to function properly.

Changing Merant SequeLink to use TCP/IP Sockets

Merant SequeLink 5.1 Server defaults to using named pipes to connect to MS SQL Server. To change it to use TCP/IP, follow the steps outlined in "How to Configure MERANT SequeLink Server for TCP/IP Sockets Net-Library" in the Useful Resources section.

Change the Node Hostnames to the Virtual Hostname in the swandm.ini File

After installing the Merant SequeLink Server, the configuration file swandm.ini (found at SharedDrive:\Program Files\MERANT\slserver51\cfg) was written using the hostnames of each node in the cluster. This will not work for the clustered environment, since everything needs to be referenced using the virtual hostname of the SQL Server. Change all instances of specific node names in swandm.ini to the virtual IP/hostname. (Some examples of where this occurs are the serviceHost and ServiceConnectInfo entries.)

Setting up SQL Server Tables / Usernames

To create a database and user for WebSphere, follow the steps below.

1. Start the SQL Server Enterprise Manager on the active SQL node.
2. Expand the Console Root until you get to the running SQL Server object.
3. Expand this object and right-click on the Databases folder.
4. Select New Database.
5. Enter the database name, change any necessary parameters, and click the OK button. This creates the new database.

Now you need to create a new user that WebSphere will use to access this database.

1. Using the tree in the left pane, expand the Security folder and right-click the Logins object.
2. Select New Login.
3. Enter a name for this login and have it use SQL Server Authentication. Select the database you just created as the default database.
4. Click on the Database Access tab and check the database you just created. Give this user the appropriate permissions on this database.
5. Click OK, and test that you can access the database with this user by using the Query Analyzer or some other database client.

Configuring WebSphere to use SQL Server as an Application Database

Configuring WebSphere to use the Merant drivers to access a clustered SQL Server is almost identical to setting up any other database driver/datasource. The main thing to remember is that you will reference the virtual IP of the SQL Server cluster, not each node individually.

1. Start by installing the Merant database driver on each WebSphere node within your WebSphere domain. To do this, start the WebSphere Administrative Console and expand the tree to WAS Domain->Resources->JDBC Providers.
2. Right-click on the JDBC Providers folder and select New.
3. Enter a name for the driver, such as "Merant Driver", and choose the Merant driver com.merant.sequelink.jdbcx.datasource.SequeLinkDataSource as the implementation class.
4. Click on the Nodes tab and install the Merant database driver to the WAS node, selecting the appropriate jar files: D:\WebSphere\AppServer\lib\sljc.jar;D:\WebSphere\AppServer\lib\sljcx.jar
5. After the driver has been installed, expand it, right-click on the Data Sources folder, and select New.
6. Fill in the appropriate information, such as the data source name, JNDI name, database name, user ID/password, etc. Additionally, be sure to add the following parameters to the Custom Properties:

   serverName = <SQL Server virtual name or IP>
   portNumber = 19996 (or whatever you specified when installing Merant)
   disable2Phase = true

7. Set up any beans in your test application to use this datasource with the JNDI name you specified.

Configuring WebSphere to use SQL Server as an Administrative Repository Database

If you would like to use the clustered SQL Server database as the database for your WebSphere Application Server repository, you have a couple of ways of specifying this. Starting with WAS 4.0, the installer gives you the option to use the Merant drivers as the JDBC driver for your repository; this is the easiest way to set up WAS to use SQL Server as your repository database. The other way is to install WebSphere using one of the other supported databases and then use the database conversion tool to change the appropriate settings and point WebSphere to a different database. (See the Useful Resources for a link to the database conversion tool.) Be sure to get the dbconfig4 conversion tool, not dbconfig, and then follow the included instructions.

Verifying your configuration

Once you have everything set up, you will want to verify that the environment fails over correctly. The following scenarios can be run to verify that if certain points in the environment fail, service rolls over to the functioning node.

• Manual push to passive node. Within the Cluster Administrator, right-click on a group and click "Move Group" to move it to the passive node.
• Clean shutdown of the active node. Without manually moving any of the groups, go to Start->Shutdown and power down the active node.
• Unexpected power failure on the active node. Physically pull the power cable from the active node.
• Public network cable failure on the active node.
Pull only the public network cable from the active node.

In each scenario, the Microsoft cluster should recognize that one (or more) of the resources failed on the active node and transition all of the components from the failing node to the alternate node. At this point, all connections from WebSphere are broken, so on the next request from WebSphere, stale connections are detected and a StaleConnectionException is thrown to the WebSphere application (as described in Chapter 6). After the transition to the new active node completes, WebSphere reestablishes its connections to the database, and applications programmed according to the guidelines in Chapter 6 also reconnect to the database.

Resources

• IBM Redbooks, available from http://www.redbooks.ibm.com
  WebSphere Edge Server: Working with Web Traffic Express and Network Dispatcher (SG24-6172)
  WebSphere V4.0 Advanced Edition Handbook (SG24-6176-00)
• WebSphere 4.0 InfoCenter
  http://www-4.ibm.com/software/webservers/appserv/infocenter.html
• Microsoft HCL for Windows 2000
  http://www.microsoft.com/windows2000/server/howtobuy/upgrading/compat/default.asp
• Microsoft HCL for Clustering
  http://www.microsoft.com/hcl/default.asp (search on "cluster")
• Introducing Windows 2000 Clustering Technologies
  http://www.microsoft.com/windows2000/techinfo/howitworks/cluster/introcluster.asp
• Microsoft Clustering Step-by-Step guide
  http://www.microsoft.com/windows2000/techinfo/planning/server/clustersteps.asp
• Recommended Private "Heartbeat" Configuration on a Cluster Server (Q258750)
  http://support.microsoft.com/default.aspx?scid=kb;en-us;Q258750
• WebSphere 4.0 Database Conversion Tool
  http://www-4.ibm.com/software/webservers/appserv/tools_intro.htm
• Handling WebSphere Connections Correctly
  http://www7.software.ibm.com/vad.nsf/data/document4382?OpenDocument&p=1&BCT=1&Footer=1
• How to Configure MERANT SequeLink Server 5.1 for MS SQL Server using TCP/IP Sockets Net-Library
  http://www-1.ibm.com/servlet/support/manager?rs=180&rt=0&org=SW&doc=1008271
• Merant Server 5.1 download on the WebSphere e-fix page
  http://www-3.ibm.com/software/webservers/appserv/efix-archive.html#fp353
• Configuring Merant to use TCP/IP instead of Named Pipes
  http://www-1.ibm.com/servlet/support/manager?rs=180&rt=0&org=SW&doc=1008271
• Creating a Merant Datasource in WebSphere
  http://www-1.ibm.com/servlet/support/manager?rs=180&rt=0&org=SW&doc=1008413
• Instructions on installing the Merant SequeLink 5.1 Server
  http://www7b.boulder.ibm.com/wsdd/library/techarticles/0109_hiranniah/0109_hiranniahpt1.html
• Implementing IBM DB2 Universal Database Enterprise Extended Edition with Microsoft Cluster Server (4 Node Cluster)
  http://www-4.ibm.com/software/data/pubs/papers/#mscseee
• Implementing IBM DB2 Universal Database Enterprise Edition with Microsoft Cluster Server (2 Node Cluster)
  http://www-4.ibm.com/software/data/pubs/papers/#mscs