IBM® Cloud and Smarter Infrastructure Software
SmartCloud Orchestrator
Version 2.3:
Capacity Planning, Performance,
and Management Guide
Document version 2.3.6
IBM SmartCloud Orchestrator Performance Team
© Copyright International Business Machines Corporation 2014.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
CONTENTS

Contents
List of Figures
Author List
Revision History
1 Introduction
2 SmartCloud Orchestrator 2.3 Overview
  2.1 Functional Overview
  2.2 Architectural Overview
3 Performance Overview
  3.1 Sample Benchmark Environment
  3.2 Key Performance Indicators
    3.2.1 Concurrent User Performance
    3.2.2 Provisioning Performance
4 Performance Benchmark Approaches
  4.1 Monitoring and Analysis Tools
    4.1.1 nmon Samples
  4.2 Infrastructure Benchmark Tools
  4.3 Cloud Benchmarks
5 Capacity Planning Recommendations
  5.1 Cloud Capacity Planning Spreadsheet
  5.2 SmartCloud Orchestrator Management Server Capacity Planning
  5.3 Provisioned Virtual Machines Capacity Planning
6 Cloud Configuration Recommendations
  6.1 OpenStack Keystone Cache Configuration
  6.2 OpenStack Keystone Worker Support
  6.3 IaaS Gateway Cluster Support
  6.4 IBM Workload Deployer Configuration
  6.5 Virtual Machine IO Scheduler Configuration
  6.6 Advanced Configuration and Power Interface Management
  6.7 Java Virtual Machine Heap Configuration
  6.8 Database Configuration
7 Cloud Maintenance Recommendations
  7.1 SmartCloud Orchestrator Volume Management
    7.1.1 Install Time Requirements
    7.1.2 Long Running System Requirements
  7.2 The SmartCloud Orchestrator Database and Schema Summary
  7.3 Database Management
    7.3.1 DBMS Versions
    7.3.2 Automatic Maintenance
    7.3.3 Operating System Configuration (Linux)
  7.4 Database Hygiene Overview
    7.4.1 Database Backup Management
    7.4.2 Database Statistics Management
    7.4.3 Database Reorganization
    7.4.4 Database Archiving
    7.4.5 Database Maintenance Automation
8 Summary Cookbook
  8.1 Base Installation Recommendations
  8.2 Post Installation Recommendations
  8.3 High Scale Recommendations
Appendix A: SmartCloud Orchestrator Monitoring Options
  A.1 OpenStack Monitoring
  A.2 SmartCloud Orchestrator Monitoring
  A.3 Infrastructure Monitoring
Appendix B: OpenStack Keystone Monitoring
  B.1 PvRequestFilter
  B.2 Enabling PvRequestFilter
Appendix C: IaaS Gateway Cluster Enablement
References
LIST OF FIGURES

Figure 1: Revision History
Figure 2: SCO Functional Overview
Figure 3: SCO Cloud Marketplace View
Figure 4: SCO Architecture Reference Topology
Figure 5: SCO Sample Benchmark Environment
Figure 6: Benchmark Data Model Population
Figure 7: Load Driving (User) Scenarios
Figure 8: Provisioning Performance in a Closed System
Figure 9: Monitoring and Analysis Tools
Figure 10: nmon Samples
Figure 11: Infrastructure Benchmark Tools
Figure 12: SCO Management Server Capacity Planning
Figure 13: Capacity Planning Tool: Inquiry Form
Figure 14: Capacity Planning Tool: User Demographic Information
Figure 15: Capacity Planning Tool: Systems and Storage
Figure 16: Capacity Planning Tool: System and Workload Options
Figure 17: Capacity Planning Tool: Virtual Machine Requirements
Figure 18: Planning Tool: Confirmation Screen
Figure 19: Planning Tool: System Summary
Figure 20: Keystone Worker Configuration
Figure 21: IWD Configuration
Figure 22: Modifying the IO Scheduler
Figure 23: Java Virtual Machine Heap Change Sets
Figure 24: Database Configuration Change Sets
Figure 25: SCO 2.3 Volume Management: Install Time Requirements
Figure 26: Long Running System Requirements: System A
Figure 27: Long Running System Requirements: System B
Figure 28: Long Running System Requirements Summary
Figure 29: Database and Schema Summary
Figure 30: DBMS Versions
Figure 31: Database Automatic Maintenance Configuration
Figure 32: Database Backup with Compression Command
Figure 33: Database Offline Backup Restore
Figure 34: Database Online Backup Schedule
Figure 35: Database Incremental Backup Enablement
Figure 36: Database Online Backup Manual Restore
Figure 37: Database Online Backup Automatic Restore
Figure 38: Database Log Archiving to Disk
Figure 39: Database Log Archiving to TSM
Figure 40: Database Roll Forward Recovery: Sample A
Figure 41: Database Roll Forward Recovery: Sample B
Figure 42: Database Backup Cleanup Command
Figure 43: Database Backup Automatic Cleanup Configuration
Figure 44: Database Statistics Collection Command
Figure 45: Database Statistics Collection Table Iterator
Figure 46: Database Reorganization Commands
Figure 47: Database Reorganization Table Iterator
Figure 48: Database Archiving Impact
Figure 49: Sample Database Maintenance Schedule
Figure 50: Sample Database Maintenance Crontab Entry
Figure 51: Base Installation Recommendations
Figure 52: Post Installation Recommendations
Figure 53: High Scale Recommendations
Figure 54: OpenStack Ceilometer Metrics
Figure 55: OpenStack Ceilometer Core Metrics
Figure 56: Infrastructure Core Metrics
Figure 57: Keystone Monitoring PvRequestFilter Format
Figure 58: Keystone Monitoring PvRequestFilter Sample Output
Figure 59: Keystone Monitoring Log Messages Example
Figure 60: Keystone Monitoring Statistics Example
AUTHOR LIST
This paper is the team effort of a number of cloud performance specialists comprising the
SmartCloud Orchestrator performance team. Additional recognition goes out to the entire
SmartCloud Orchestrator and OpenStack development teams.
Mark Leitch
(primary contact for this paper)
IBM Toronto Laboratory
Amadeus Podvratnik
Marc Schunk
Peter Altevogt
IBM Boeblingen Laboratory
Nate Rockwell
IBM USA
Tiarnán Ó Corráin
IBM Ireland
Alessandro Chiantera
Giorgio Corsetti
Massimo Marra
Michele Licursi
Paolo Cavazza
Ugo Madama
IBM Rome Laboratory
REVISION HISTORY

Date                 Version   Revised By   Comments
February 1, 2014     Draft     MDL          Initial version for review.
February 23, 2014    2.3.0     MDL          Initial version for distribution.
February 28, 2014    2.3.1     MDL          Update based on review comments.
March 18, 2014       2.3.2     MDL          Volume management update based on SCO 2.3.0.1 delivery. Addition of monitoring points in Appendix A.
March 27, 2014       2.3.3     MDL          Added maintenance crontab samples and scripts.
April 8, 2014        2.3.4     MDL          Added IWD configuration options.
August 20, 2014      2.3.5     MDL          Added Keystone monitoring reference material.
August 28, 2014      2.3.6     MDL          Added Keystone worker, IaaS gateway cluster material.

Figure 1: Revision History
1 Introduction
Capacity planning involves the specification of the various components of an installation to
meet customer requirements, often with growth or timeline considerations. A key aspect of
capacity planning for cloud, or virtualized, environments is the specification of sufficient
physical resources to provide the illusion of infinite resources in an environment that may
be characterized by highly variable demand. This document will provide an overview of
capacity planning for the IBM SmartCloud Orchestrator (SCO) Version 2.3. In addition, it
will offer management best practices to achieve a well performing installation that
demonstrates service stability.
SCO Version 2.3 offers end to end management of service offerings across a number of
cloud technology offerings including VMware, Kernel-based Virtual Machine (KVM), IBM
PowerVM, and IBM System z. A key implementation aspect is integration with OpenStack,
the de facto leading open virtualization technology. OpenStack offers the ability to control
compute, storage, and network resources through an open, community based architecture.
In this document we will provide an SCO 2.3 overview, including functionality, architecture,
and performance. We will then offer the capacity planning recommendations, including
considerations for hardware configuration, software configuration, and cloud maintenance
best practices. A summary “cookbook” is provided to manage installation and
configuration for specific instances of SCO.
Note: This document is considered a work in progress. Capacity planning
recommendations will be refined and updated as new SCO releases are available. While
the paper in general is considered suitable for all SCO Version 2.3 releases, it is best
oriented towards SCO Version 2.3.0.1. In addition, a number of references are provided in
the References section. These papers are highly recommended for readers who want
detailed knowledge of SCO server configuration, architecture, and capacity planning.
Note: Some artifacts are distributed with this paper. The distributions are in zip format. However, Adobe blocks attached files with a "zip" suffix. As a result, the file suffix of each distribution is set to "zap". To use these artifacts, simply rename the distribution to "zip" and process as usual.
2 SmartCloud Orchestrator 2.3 Overview
An overview of SCO Version 2.3 will be provided from the following perspectives:
1. Functional
2. Architectural
2.1 Functional Overview
The basic functional capability of SCO involves the management of cloud computing
resources for dynamic data centers. The following figure provides a functional (service
level) overview of SCO.
Figure 2: SCO Functional Overview
In a nutshell, SCO offers infrastructure, platform, and orchestration services that make it
possible to lower the cost of service delivery (both in terms of time and skill) while
delivering higher degrees of standardization and automation. A more detailed cloud
marketplace view of the SCO solution follows.
Figure 3: SCO Cloud Marketplace View
The core functional capabilities of SCO include the following.

• Workflow Orchestration.
  The Business Process Manager (BPM) component offers a standard library as well as a graphical editor for workflow orchestration. Overall, this provides a powerful mechanism for complex and custom business processes in the cloud context.

• Pattern Management.
  The IBM Workload Deployer (IWD) offers sophisticated pattern support for deploying multi node applications that may consist of complex middleware. Once again, graphical editor support for pattern management is provided.

• Image Management.
  This comprises an image construction and composition tool, as well as a Virtual Image Library (VIL) to facilitate image development and reduce image sprawl.

• Service Management.
  Service management options are available in the SCO Enterprise edition. The Enterprise edition provides a set of management utilities to further facilitate business process management.

• Not shown in the diagram is a Scalable Web Infrastructure to facilitate cloud self service offerings. For more information please consult the SCO information center (URL). In addition, the SCO resource center is available (URL).
2.2 Architectural Overview
The following diagram shows the reference deployment topology for SCO. A description of
the reference topology follows.
Figure 4: SCO Architecture Reference Topology
The reference topology is based on a core set of virtual machines:

• Central Server 1.
  This server hosts the DB2 Database Management System (DBMS). The performance of the DBMS is critical to the overall solution and is dealt with extensively in Section 7.3.

• Central Server 2.
  This server hosts OpenStack Keystone, providing identity, token, catalog, and policy services. In addition, it hosts the Virtual Image Library (VIL) and SCO gateway services. The most critical aspect of this server is managing the Keystone configuration as described in Section 6.1.

• Central Server 3.
  This server hosts the IBM Workload Deployer pattern engine and the Scalable Web UI. Performance configuration of these components is described in Section 6.

• Central Server 4.
  This server hosts the Business Process Manager engine. Performance configuration of these components is described in Section 6.

• Central Server 5.
  This server hosts the System Automation Application Manager. This is an optional virtual machine that can be used to manage automatic start and stop orchestration of the SCO management server itself.
Associated with these core server virtual machines are a number of region servers.
Region servers may represent a specific cluster or geographic zone of cloud compute
nodes. Sample compute nodes are shown for VMware, KVM, and PowerVM, with
associated communication paths. For example, for VMware the SCE driver is used to
drive the operation of the VMware cluster. For KVM, the OpenStack control node is used
to coordinate the KVM instance.
Given this is a virtual implementation, some considerations should be kept in mind:

• In general, it is more difficult to manage performance in a virtual environment due to the additional hypervisor management overhead and system configuration.

• Device parallelism via dedicated storage arrays/LUNs is preferred. Sample approaches, from most impactful to least impactful, are provided below.
  o Separate data stores for "managed from" and "managed to" environments.
  o Spread data stores across several physical disks to maximize storage capability.
  o Separate data stores for image templates and provisioned images.
  o Employ the "deadline" or "noop" scheduler algorithm for management server and provisioned VMs (see Section 6.5).
  o Optimize base storage capability (e.g. SSD with "VMDirectPath" enablement for VMware). Servers where this may be critical, due to their dependency on disk IO capabilities, are Central Server 1 and the VMware vCenter instances.

• Network optimization, for example 10GbE adoption. In addition, segment customer networks to an acceptable level to reduce address lookup impact.
3 Performance Overview
There are two distinct aspects of cloud performance:
1. Performance of the SCO management server itself.
This is the primary focus of this section.
2. Performance of the provisioned server instances.
This is more of a capacity planning statement, and is covered in Section 5.3.
We will provide a general overview of the Key Performance Indicators (KPIs) for the SCO
management server. The following sections will describe the general benchmark
environment, and the associated KPIs.
3.1 Sample Benchmark Environment
The following figure shows a sample configuration that has been used for SCO
benchmarks.
Figure 5: SCO Sample Benchmark Environment
The environment is characterized by the following features, broken down in terms of the SCO management server (aka "managed from") and the associated cloud (aka "managed to").

• Managed from:
  o Server configuration:
    - 4/5 HS22V Blades with 2 x 4 cores Intel Xeon x5570 2.93 GHz.
    - 8 physical cores per blade, 16 logical cores when hyper-threading is enabled.
    - 72 GB RAM per blade.
    - 2 x Redundant 10G Ethernet Networking (Janice HSSM).
    - 2 x Redundant 8G FC Network (Qlogic FC SM).
  o Storage configuration: 1 x DS3400 with 4 Exp with 12 Disk 600 GB SAS 10K each (48 x 600 GB = 28.8 TB raw).

• Managed to:
  o Server configuration:
    - Tens of HS22V Blades with 2 x 6 cores Intel Xeon x5670 2.93 GHz.
    - 12 physical cores per blade, 24 logical cores when hyper-threading is enabled.
    - 72 GB RAM per blade.
    - 2 x Redundant 10G Ethernet Networking (Janice HSSM).
    - 2 x Redundant 8G FC Network (Qlogic FC SM).
  o Storage configuration: 1 x Storwize v7000 with 3 Exp with 12 Disks 2 TB NL-SAS 7.2k each (36 x 2 TB = 72 TB raw).
  o Storage access has been configured to use the multi-path access granted by Storwize. In particular, VMware ESXi servers have been configured to use all of the 8 active paths to access LUNs using a round robin policy.
3.2 Key Performance Indicators
The following Key Performance Indicators are managed for SCO through a set of comprehensive benchmarks.

1. Concurrent User Performance, comprising:
   a. Average response time for SCO pages related to administrative tasks.
   b. Average response time for SCO pages related to end user tasks.
2. Provisioning throughput, comprising:
   a. Provisioning throughput for a vSys with a single part.
   b. Average service time for provisioned VMs.
3. LAMP (Linux, Apache, MySQL, Python) stack performance, comprising:
   a. vApp deployment time.
   b. vApp stop time.
   c. vApp deletion time.
4. Bulk Windows stack performance, comprising vSys with multiple parts (15 VMs) provisioning time.
5. Virtual Image Library performance, comprising:
   a. Registration discovery throughput.
   b. Registration basic indexing throughput.
   c. Image checkin time.
   d. Image checkout time.

A key aspect of the benchmarks is that they are run with associated background workloads and for a long duration (e.g. weeks or months). The rationale behind this is very simple: to run benchmarks that closely emulate the customer experience and drive "real world" results (versus overly optimistic lab based results). We will describe the concurrent user and provisioning throughput KPIs in more detail.
3.2.1 Concurrent User Performance
SCO User Interface performance is established through concurrent user benchmark tests.
In order to understand the applicability of such a benchmark, it is important to understand
what is meant by a concurrent user. Consider:
• P = total population for an instance of SCO (including cloud administrators, end users, etc.).

• C = the concurrent user population for an instance of SCO. Concurrent users are considered to be the set of users within the overall population P that are actively managing the cloud environment at a point in time (e.g. administrator operations in the User Interface, provisioning operations, etc.).
In general, P is a much larger value than C (i.e. P >> C). For example, it is not unrealistic
that a total population of 200 users may have a concurrent user population of 40 users (i.e.
20%).
For the concurrent user workload driven for SCO, there are three sets of criteria that drive
the benchmark:
1. Load driving parameters.
2. Data population.
3. Load driving (user) scenarios.
Load Driving Parameters
The following load driving parameters apply.
1. User transaction rate control.
The frequency that simulated users drive actions against the back end is managed
via loop control functions. Closed loop simulation approaches are used where a
new user will enter the system only when a previous user completes. Through the
closed loop system, steady state operations under load may be driven.
2. Think times.
Think times are the "pause" between user operations, meant to simulate the behavior of a human user. The think time interval used is [100%, 300%] (meaning the think times replayed by the load driver range from the recorded value up to three times the recorded value).
3. Bandwidth throttling.
In order to simulate low speed or high latency lines, bandwidth throttling is
employed for some client workloads. The throttle is set to a value that represents
a moderate speed ADSL connection (cable/DSL simulation setting of 1.5 Mbps
download, 384 Kbps upload).
Data Population Parameters
The benchmark is run against a data model that represents a large scale customer
environment. The following table shows a sample configuration where the system is
populated with data to represent a large number of users, active Virtual System instances,
and active Virtual Machines existing prior to SCO installation. Through this approach, the
workload for managing the solution is representative of some customer environments.
Benchmark Parameter            Value
Cloud Administrators           1
Cloud Domains                  1
Tenants                        1
Users                          200
Hypervisor Types               1 (VMware)
Cloud Groups                   1
Environment Profile            1
Image Templates                40 (20 Linux, 20 Windows)
vSys Patterns                  20 + 1 (20 Linux vSys patterns, 1 bulk Windows pattern)
vApp Patterns                  1 (LAMP vApp for VMware domain)
Flavors                        5 (1 flavor for RHEL, 3 flavors for Windows, 1 flavor for vApp)
Active vSys instances          20 (1 per Linux vSys Pattern)
Standalone (Unmanaged) VMs     400 (10 per image template: 200 Linux, 200 Windows)

Figure 6: Benchmark Data Model Population
Load Driving (User) Scenarios
The concurrent user population (i.e. C) is broken down into the following user profile
distribution and scenarios.
User Profile 1: 20 users (50% overall)
  User Type: End User
  Task Type: VM Provisioning
  Activity: vSys with single part (Linux) provisioning through Self-Service Catalog (SSC) offering on VMware.
  Scenario per User:
    1. Login.
    2. Provision vSys single part using SSC offering.
    3. Wait until available.
    4. Go to the vSys instance details page.
    5. Delete vSys using SSC offering.
    6. Wait until deletion complete.
    7. Logout.
    8. Enter next cycle according to arrival rate.

User Profile 2: 16 users (40% overall)
  User Type: End User
  Task Type: User Management
  Activity: End user operations through Self-Service Catalog (SSC) offering.
  Scenario per User:
    1. Login.
    2. Submit SSC offering "Create User in VM", selecting one of the VMs belonging to one of the pre-populated vSys.
    3. Wait until done.
    4. Submit SSC offering "Delete User in VM", selecting the same VM.
    5. Wait until done.
    6. Logout.
    7. Enter next cycle according to arrival rate.

User Profile 3: 2 users (5% overall)
  User Type: Administrator
  Task Type: Monitoring
  Activity: Administrative operations through the IBM Workload Deployer user interface.
  Scenario per User:
    1. Login.
    2. List hypervisors.
    3. Select a hypervisor.
    4. List VMs in hypervisor.
    5. Show all instances.
    6. Go to "My Requests".
    7. Sort the requests by status.
    8. View the trace log.
    9. Logout.

User Profile 4: 1 user (2.5% overall)
  User Type: End User
  Task Type: Provisioning
  Activity: vApp (LAMP) provisioning through the IBM Workload Deployer user interface on VMware.
  Scenario per User:
    1. Login.
    2. Provision vApp using the IWD UI.
    3. Wait until available.
    4. Stop vApp using the IWD UI.
    5. Wait until done.
    6. Delete vApp using the IWD UI.
    7. Wait until deletion complete.
    8. Logout.
    9. Enter next cycle according to arrival rate.

User Profile 5: 1 user (2.5% overall)
  User Type: End User
  Task Type: Provisioning
  Activity: vSys with multiple parts (bulk Windows) provisioning through Self-Service Catalog (SSC) offering on VMware.
  Scenario per User:
    1. Login.
    2. Provision vSys bulk Windows using SSC offering.
    3. Wait until available.
    4. Go to vSys instance details page.
    5. Delete vSys bulk Windows using SSC offering.
    6. Wait until deleted.
    7. Logout.
    8. Enter next cycle according to arrival rate.

Figure 7: Load Driving (User) Scenarios
In overall terms, 55% of the load driving activities are driving Virtual Machine provisioning
scenarios. The remaining 45% of scenarios are general administration and management
tasks. For the active workload, the user operations meet the following response time
thresholds.
• Administrative page response times: 90% of pages < 10s, 100% of pages < 15s.

• End user operations: 90% of pages < 2s, 100% of pages < 5s.
3.2.2 Provisioning Performance
Cloud provisioning is enormously complex in performance terms. Hardware configuration,
user workloads, image properties, and a multitude of other factors combine to determine
overall capability. SCO provisioning performance is typically measured via a closed
system, defined as an isolated system where we can demonstrate a constant sustained
provisioning workload. In order to achieve this, as requests complete within the system,
new requests are initiated.
Figure 8: Provisioning Performance in a Closed System
The performance systems running SCO workloads literally run for months. These systems
are treated like customer systems with 24x7 operations and field ready maintenance
approaches in place (as described in Section 7). In terms of provisioning performance, the following are sample statistics from a long run scenario driven for a number of weeks, once a period of operational stability had been reached based on the recommendations provided in this paper.

• Number of systems provisioned: 172,536 VMs.

• Provisioning rate (average): 187 VMs/hour.

• Service times (average): 3 minutes 28 seconds (IBM Workload Deployer with VMware linked clones).

• Workflow capability: on the order of 300 workflows per hour (generally short running workflows under a minute in duration).

• Success rate: 99.996%.

Given this is a sustained, continuous workload, higher peak workloads are, of course, possible. The success rate is considered especially noteworthy.
4 Performance Benchmark Approaches
As part of cloud management and capacity planning, it is valuable to manage cloud
benchmarks. Value propositions include:
• Understanding the capability of the cloud infrastructure (and potentially poorly configured or under performing components of the infrastructure).

• Understanding the base capability of the SCO implementation and associated customization.

• Understanding the long term performance stability of the system.
We will describe basic system monitoring approaches, infrastructure benchmarks, and
cloud benchmarks.
4.1 Monitoring and Analysis Tools
The following table shows the core recommended monitoring and analysis tools.
pdcollect
  SCO log collection tool.
  Documentation and recommended invocation: SCO Product Information Center

esxtop
  VMware performance collection tool.
  Documentation: URL
  Recommended invocation: esxtop -b -a -d 60 -n <number_of_samples> > <output file>

nmon
  nmon is a comprehensive system monitoring tool for the UNIX platform. It is highly useful for understanding system behavior.
  Documentation: URL
  Sample invocation: nmon -T -s <samplerate> -c <iterations> -F <output file>
  Note: On Windows systems, Windows perfmon may be used.

db2support
  Database support collection tool.
  Documentation: URL
  Recommended invocation: db2support <result directory> -d <database> -c -f -s -l

DBMS Snapshots
  DBMS snapshot monitoring can offer insight into SQL workload, and in particular expensive SQL statements.
  Documentation: URL

WAIT
  Java WAIT monitoring can provide a non invasive view of JVM performance through accumulated Java cores and analytic tools.
  Documentation and recommended invocation: URL

Figure 9: Monitoring and Analysis Tools
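As a concrete illustration of the nmon sample invocation above (a sketch only; the sample rate, iteration count, and output file name are arbitrary choices), the following collects a 24 hour profile at 60 second intervals:

# Collect 1440 samples at 60 second intervals (24 hours), with top process data (-T)
nmon -T -s 60 -c 1440 -F cs1_$(date +%Y%m%d).nmon

The resulting file may then be post-processed with the usual nmon analyser spreadsheets for the summary and fine grained views discussed in the next section.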
4.1.1 nmon Samples
The following figures represent nmon samples for a 22 concurrent user scenario (based on the user profiles in Section 3.2.1).
Figure 10: nmon Samples
Analysis of the samples follows.

• The samples show the summary utilization (CPU, IO) for Central Servers 1 through 4, and the Region Server.

• All servers have 8 vCPUs allocated, with the exception of Central Server 4, which has 4 vCPUs.

• In general, all nodes are consuming less than 1 vCPU. The exceptions are the Region Server (≈1.6 vCPUs) and Central Server 3 (≈2.4 vCPUs). This reflects an IBM Workload Deployer scenario.

• For IO, the bulk of the IO workload is associated with the database node. This is not surprising, and reinforces the recommendations for IO optimization on the DBMS node.

• While the summary view is valuable for an "at a glance" assessment, it is always recommended to look at the fine grained results in nmon to ensure processor utilization is healthy (e.g. minimal or no blocked processes, minimal or zero wait times, healthy multi processor utilization).
4.2 Infrastructure Benchmark Tools
The following table shows some recommended infrastructure benchmark tools.
iometer
  I/O subsystem measurement and characterization tool for single and clustered systems.
  Documentation: URL
  Recommended invocation: dynamo /m <client host name or ip>

iperf
  TCP and UDP measurement and characterization tool that reports bandwidth, delay, jitter, and datagram loss.
  Documentation: URL
  Recommended server invocation: iperf -s
  Recommended client invocation #1: iperf -c <server host name or ip>
  Recommended client invocation #2: iperf -c <server host name or ip> -R

UnixBench
  UNIX measurement and characterization tool, with reference benchmarks and evaluation scores.
  Documentation: URL
  Recommended invocation: ./Run

Figure 11: Infrastructure Benchmark Tools
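For example, a simple network baseline between a management server and a compute node might look as follows (a sketch; the host name is illustrative, and the -t and -P options select a 60 second run with 4 parallel TCP streams):

# On the server under test
iperf -s

# On the client: 60 second run, 4 parallel TCP streams
iperf -c region-server.example.com -t 60 -P 4

Recording such a baseline at installation time makes later regressions (e.g. a misconfigured adapter or path) easy to spot.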
4.3 Cloud Benchmarks
Cloud benchmarks should be based on enterprise utilization. Sample benchmarks that are
easy to manage include the following.
1. Single VM deployment times.
2. Small scale concurrent VM deployment times (e.g. 10 requests in parallel).
3. REST API response times.
It is recommended to establish a small load driver, record a baseline, and then use these
small benchmarks as a standard to assess ongoing cloud health. More complex
benchmarks, including client request monitoring approaches, may of course be
established.
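As a minimal sketch of such a load driver, the following loop times a REST call with curl; the endpoint URL and token handling are placeholders to be adapted to the local installation:

#!/bin/sh
# Record a response time baseline: 10 sequential GET requests.
# ENDPOINT and TOKEN are illustrative placeholders, not SCO defaults.
ENDPOINT="https://sco.example.com:5000/v2.0"
for i in $(seq 1 10); do
  curl -k -s -o /dev/null -w "%{time_total}s\n" \
       -H "X-Auth-Token: $TOKEN" "$ENDPOINT/tenants"
done

Running this at a fixed schedule and charting the results provides the ongoing health standard described above.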
For OpenStack specific benchmarks, OpenStack Rally may be leveraged (see the
References section for further detail). In addition, the Open Systems Group is involved in
cloud computing benchmark standards. A report, including the IBM CloudBench tool, is
available in the References section.
5 Capacity Planning Recommendations
We will provide capacity planning recommendations through three approaches.

• Static planning via a spreadsheet approach.

• Capacity planning for the SCO management server (aka the "managed from" infrastructure).

• Capacity planning for the provisioned Virtual Machines (aka the "managed to" infrastructure).

5.1 Cloud Capacity Planning Spreadsheet
In order to provide a desired hardware and software configuration for an SCO
implementation, a wide range of parameters must be understood. The following questions
are usually relevant.
1. What operations are expected to be performed with SCO?
2. What are the average and peak concurrent user workloads?
3. What is the enterprise network topology?
4. What is the expected workload for provisioned virtual servers, and how do they
map to the physical configuration?
5. For the provisioned servers:
a. What is the distribution size?
b. What are the application service level requirements?
A capacity planning spreadsheet is attached to this paper (“SCO Capacity Planning Profile
v2.3.3.xlsx”). The spreadsheet may be used to provide a cloud profile for further sizing
activities (e.g. a capacity planning activity in association with the document authors).
5.2 SmartCloud Orchestrator Management Server Capacity Planning

The SCO management server requirements are documented in the SCO Information Center (URL). The summary table is repeated here for discussion purposes [1].
Server & Configuration          Processor (vCPUs)   Memory (GB)   Storage (GB)
Central Server 1   Minimum      2                   6             100
                   Recommended  4                   12            200
Central Server 2   Minimum      2                   8             141
                   Recommended  4                   12            200
Central Server 3   Minimum      2                   4             80
                   Recommended  4                   8             160
Central Server 4   Minimum      2                   6             50
                   Recommended  2                   8             60
Central Server 5   Minimum      n/a                 n/a           n/a
(optional)         Recommended  2                   4             20
Region Server      Minimum      2                   4             76
                   Recommended  8                   8             160
Totals             Minimum      10                  28            447
                   Recommended  24                  52            800

Figure 12: SCO Management Server Capacity Planning
While further qualifiers are available in the Information Center, some comments apply.

• In general, the recommended vCPU and memory allocations should be met.

• To determine the ratio of virtual to physical CPUs, monitoring of the production system is required. For performance verification, a 1:1 mapping is used.

• For the physical mapping, it is important to distinguish between "real" cores and hyper threaded (HT) cores. External benchmarks suggest an HT core may yield 30% of the capability of a "real" core. For example, a blade with 8 real cores and hyper-threading enabled should be planned as roughly 8 + (8 x 0.3) ≈ 10.4 core equivalents, not 16.

• The recommended storage amounts are highly subjective. For example, the minimum recommendations are sufficient for performance verification systems driven for months (with some minor exceptions). Recommended volume management approaches are provided in Section 7.1.

[1] Provided values reflect the SCO 2.3.0.1 release.

5.3 Provisioned Virtual Machines Capacity Planning
Managing cloud workloads is typically driven as a categorization exercise where workload
“sizes” are used to determine the overall capacity requirements. A capacity planning tool is
available for managing the cloud workload sizes (URL). We will provide an overview of
using this tool.
The first step is to provide any relevant business opportunity information. In the absence of a defined opportunity, simple "not applicable" entries may be given (per the sample below). Once submitted, you must accept the usage agreement, which will bring up the demographic page.
Figure 13: Capacity Planning Tool: Inquiry Form
The demographic page simply asks for generic information about the submitter.
Figure 14: Capacity Planning Tool: User Demographic Information
When “Continue” is selected, the systems and storage page is provided.
Figure 15: Capacity Planning Tool: Systems and Storage
Then the target system and associated utilization and Virtual Machine requirements are selected. Note that for utilization we select 20% headroom to support peak cloud workloads.
Figure 16: Capacity Planning Tool: System and Workload Options
At this point, the virtual machine requirements may be selected. Note a number of entries
may be added.
Figure 17: Capacity Planning Tool: Virtual Machine Requirements
A confirmation screen is then provided to finalize the capacity planning request.
Figure 18: Planning Tool: Confirmation Screen
The summary capacity planning recommendation is then provided. The summary details
the compute node, CPU, memory, and storage requirements based on the selected
configuration and associated workloads.
Figure 19: Planning Tool: System Summary
6 Cloud Configuration Recommendations
The SCO 2.3 and 2.3.0.1 offerings provide suitable configuration as part of the default
installation. However, there are some specific configuration aspects that may improve the
capability. The configuration points follow.
1. OpenStack Keystone cache.
2. OpenStack Keystone worker support.
3. IaaS Gateway cluster support.
4. IBM Workload Deployer configuration.
5. Virtual Machine IO scheduler.
6. Advanced Configuration and Power Interface (ACPI) management.
7. Java Virtual Machine heap.
8. Database.
6.1 OpenStack Keystone Cache Configuration

SCO is deployed with a default two gigabyte Keystone cache (aka "memcached") configuration. The intent of the cache is to provide an in memory repository of Keystone tokens to improve system throughput, particularly under concurrent workloads. Assuming there is sufficient memory on the Keystone VM (Central Server 2), the recommendation is to double the cache configuration to four (4) gigabytes. Instructions on how to modify the cache setting are provided here.
An appendix is provided that offers guidance on low level Keystone monitoring to
determine health and throughput capability.
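As an illustrative sketch only (the product documentation procedure takes precedence), on a Red Hat based Central Server 2 the memcached cache size is typically controlled by the CACHESIZE entry in /etc/sysconfig/memcached:

# /etc/sysconfig/memcached -- assumed default location on RHEL
CACHESIZE="4096"   # megabytes; doubles the default 2 GB cache

# Restart the service to apply the change
service memcached restart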
6.2 OpenStack Keystone Worker Support
The initial SCO 2.3 offering contains a Keystone implementation that is characterized by a
single execution thread instance. Improvements have been made to exploit multiple
concurrent Keystone workers. This change is generally advised when Keystone exhibits
high request latency, or is seen to consume a significant amount of a virtual CPU (e.g. >
80%). In order to exploit this support, two steps are required.
1. Obtain the required SCO 2.3 limited availability fix or fixpack. The authors of this
paper may be contacted for further detail (this paper will be revised upon official
availability).
2. Revise the configuration to exploit multiple workers. Further detail on this is
provided below.
With the Keystone worker improvement in place, the following configuration change will allow a pool of four public workers and four administrative workers. This will permit increased concurrency, at the expense of virtual CPU consumption. As a result, the virtual CPU allocation should be increased based on monitoring data. In the "4+4" worker example below, the virtual CPU allocation is expected to increase on the order of two to four virtual CPUs.

Location: (Central Server 2) /etc/keystone/keystone.conf

# The number of worker processes to serve the public WSGI application
# (integer value).
public_workers=4

# The number of worker processes to serve the admin WSGI application
# (integer value).
admin_workers=4

Figure 20: Keystone Worker Configuration
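After restarting Keystone (a sketch; the service name is assumed to be the standard openstack-keystone and may differ on a given SCO installation), the additional worker processes can be confirmed from the process table:

service openstack-keystone restart
# Expect one parent process plus the configured public/admin workers
ps -ef | grep "[k]eystone"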
6.3 IaaS Gateway Cluster Support
Similar to the Keystone worker support in the previous section, the IaaS Gateway cluster
support permits the deployment of a scalable cluster of IaaS Gateway instances to drive
greater concurrency and reduce latency. In order to exploit this support, two steps are
required.
1. Obtain the required SCO 2.3 limited availability fix or fixpack. The authors of this
paper may be contacted for further detail (this paper will be revised upon official
availability).
2. Implement the cluster. See Appendix C for further details.
Similar to the Keystone worker support, the IaaS Gateway cluster will drive additional virtual CPU utilization. Monitor the system and increase the virtual CPU allocation based upon load.
6.4 IBM Workload Deployer Configuration
The IWD component offers a number of configuration options. One specific option
provides the ability to control a polling interval to refresh cloud information. Based on the
size of the cloud, this configuration option should be changed.
Location: (Central Server 3)
/opt/ibm/rainmaker/purescale.app/private/expanded/ibm/rainmaker.vmsupport4.0.0.1/config/vmpublish.properties
Original:
RuntimeInterval=12000
Recommended: RuntimeInterval=30000
Figure 21: IWD Configuration
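The change can be applied with a one line edit, for example (a sketch; back up the file first, and note the directory path is exactly as printed above):

cd /opt/ibm/rainmaker/purescale.app/private/expanded/ibm/rainmaker.vmsupport4.0.0.1/config
cp vmpublish.properties vmpublish.properties.bak
sed -i 's/RuntimeInterval=12000/RuntimeInterval=30000/' vmpublish.properties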
6.5 Virtual Machine IO Scheduler Configuration
Each Linux instance has an IO scheduler. The intent of the IO scheduler is to optimize IO
performance, potentially by clustering or sequencing requests to reduce the physical
impact of IO. In a virtual world, however, the operating system is typically disassociated
from the physical world through the hypervisor. As a result, it is recommended to alter the
IO scheduler algorithm so that it is more efficient in a virtual deployment, with scheduling
delegated to the hypervisor.
The default scheduling algorithm is typically “cfq” (completely fair queuing). Alternative and
recommended algorithms are “noop” and “deadline”. The “noop” algorithm, as expected,
does as little as possible with a first in, first out queue. The “deadline” algorithm is more
advanced, with priority queues and age as a scheduling consideration. System specific
benchmarks should be used to determine which algorithm is superior for a given workload.
In the absence of available benchmarks, we would recommend the “deadline” scheduler be
used.
The following console output shows how to display and modify the IO scheduler algorithm for a set of block devices. In the example, the "noop" scheduler algorithm is set. Note that to ensure the scheduler configuration persists, it should be enforced via the operating system configuration (e.g. /etc/rc.local).
Figure 22: Modifying the IO Scheduler
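For reference, the equivalent commands are sketched below for a device sda (device names vary per installation):

# Display the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler

# Select the "noop" scheduler for the device
echo noop > /sys/block/sda/queue/scheduler

# Persist the setting across reboots, e.g. via /etc/rc.local
echo 'echo noop > /sys/block/sda/queue/scheduler' >> /etc/rc.local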
6.6 Advanced Configuration and Power Interface Management
The Advanced Configuration and Power Interface (ACPI) operating system support may
exhibit high virtual CPU utilization and offers limited value in virtual environments. It is
recommended to disable ACPI on the SCO “managed from” nodes through the following
steps.
1. Disabling "kacpid".
   To switch off the kernel ACPI daemon, edit "/etc/grub.conf" and append "acpi=off" to the kernel boot command line. For example:

   title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
     root (hd0,0)
     kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35
     initrd /boot/initramfs-2.6.32-431.el6.x86_64.img

   becomes:

   title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
     root (hd0,0)
     kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35 acpi=off
     initrd /boot/initramfs-2.6.32-431.el6.x86_64.img

2. Disabling the user-space ACPI daemon.
   To disable user space ACPI on managed-from nodes:

   chkconfig acpid off

3. Reboot the nodes.
6.7 Java Virtual Machine Heap Configuration
The default Java Virtual Machine (JVM) heap sizes are intended to be economical.
However, in the presence of sufficient available memory, it is recommended to increase
the heap allocation. The three change sets below are recommended for application. They
apply to Central Server 3 and, in particular, the IBM Workload Deployer instance. The IWD
instance should be restarted once the changes are complete.
Location:
/opt/ibm/rainmaker/purescale.app/config/overrides.config
Original:
/config/zso/jvmargs = ["-Xms1024M","-Xmx1024M"]
Recommended: /config/zso/jvmargs = ["-Xms1536M","-Xmx1536M"]
Location:
/etc/rc.d/init.d/iwd-utils
Original:
sed -i -e 's/3072M/1024M/g' $ZERO_DIR/config/overrides.config
Recommended: sed -i -e 's/3072M/1536M/g' $ZERO_DIR/config/overrides.config
Location:
/opt/ibm/rainmaker/purescale.app/config/zero.config
Original:
"-Xms1024M","-Xmx1024M"
Recommended: "-Xms1536M","-Xmx1536M"
Figure 23: Java Virtual Machine Heap Change Sets
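A quick verification after editing (a sketch) is to confirm the new heap values appear in both configuration files before restarting IWD:

grep -n "Xmx" /opt/ibm/rainmaker/purescale.app/config/overrides.config \
              /opt/ibm/rainmaker/purescale.app/config/zero.config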
6.8 Database Configuration
SCO is deployed with a DB2 database. The performance of the database is critical to the overall capability of the solution. The following database configuration changes are recommended for a base SCO 2.3 installation. Note that some of these configuration changes are already in place for an SCO 2.3.0.1 installation. As a result, those specific steps are optional depending on the specific version deployed.
Type: Configuration
  For each relevant database (see Section 7.2) set:
    STMT_CONC = LITERALS
    LOCKTIMEOUT = 60
    NUM_IOCLEANERS = AUTOMATIC
    NUM_IOSERVERS = AUTOMATIC
    AUTO_REORG = ON
  For example:
    db2 UPDATE DB CFG FOR OPENSTAC USING LOCKTIMEOUT 60

Type: Index Addition
  A number of OpenStack database indexes are required. Please apply the "SCO_CREATE_INDEXES.sh" script provided with this paper. Note an "SCO_DROP_INDEXES.sh" script is provided in the event it is desired to drop the indexes.

Type: Foreign Key Modification
  An OpenStack foreign key should be modified to enable cascading deletes. Please apply the "SCO_MODIFY_FKEY.sh" script provided with this paper.

Figure 24: Database Configuration Change Sets
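As an illustrative sketch (run as the instance owner, typically db2inst1; the database list follows Section 7.2 and should be confirmed against the local catalog), the configuration settings can be applied in a loop:

#!/bin/sh
# Apply the recommended settings to each SCO database.
for DB in BPMDB CMNDB PDWDB OPENSTAC RAINMAKE STORHOUS; do
  db2 "UPDATE DB CFG FOR $DB USING STMT_CONC LITERALS LOCKTIMEOUT 60 \
NUM_IOCLEANERS AUTOMATIC NUM_IOSERVERS AUTOMATIC AUTO_REORG ON"
done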
7 Cloud Maintenance Recommendations

We will describe recommended maintenance approaches for the SCO file system volumes and the DB2 Database Management System.
7.1 SmartCloud Orchestrator Volume Management
We will outline the SCO 2.3 volume management requirements. We will first describe the
install time requirements, and then the requirements for a long running system.
7.1.1 Install Time Requirements
The following table describes the SCO volume requirements, both overall and installation time free space requirements [2]. The overall requirements are useful for initial hardware allocations. The free space requirements are part of the installer pre-requisite checks. The intent is to ensure basic system health for the minimal set of file systems (i.e. '/' and '/home').

Volume Requirements (GB)

Server             vCPUs   RAM (GB)   Overall   Free Space: '/'   Free Space: '/home'
Central Server 1   2       6          100       75                19
Central Server 2   2       8          141 [3]   55                30
Central Server 3   2       4          80        70                4
Central Server 4   2       6          50        40                4
Central Server 5   2       4          20        20                n/a [4]
Region Server      2       4          76        40                30

Figure 25: SCO 2.3 Volume Management: Install Time Requirements
Some comments on the installation requirements:

• These are the minimum installation requirements. The minimum and recommended requirements are provided in the product information center (URL).

• The root requirement excludes the home requirement.

• The /home file system on Central Server 2 and the Region Server is primarily consumed by the /home/library directory of the Virtual Image Library. This path may be symbolically linked to an external volume to simplify image volume management.

• It should be noted there is a gap between the overall numbers and the free space numbers reported. This is the result of the following factors.
  o The overall numbers describe the volume requirements at the hardware level, prior to base operating system installation.
  o The installer pre-requisite check is dealing with an installed system (i.e. post base operating system installation). As a result, approximately 6 GB is expected to be consumed by the base installation and related artifacts. Once this is factored in, the numbers align.

[2] Referenced requirements are for the SCO 2.3.0.1 release.
[3] Also requires 10 GB and 40 GB in the /opt and /tmp file systems, respectively.
[4] Central Server 5 is an optional component. It is not managed as part of the installation pre-requisite check and is listed here for completeness.
7.1.2 Long Running System Requirements
While the installation requirements are useful, the true management aspect arises from a
system under load for a significant period of time. The following tables show fine grained
disk requirements for systems running continuous workloads (the so called “24 x 7”
workloads) for months.
Volume Size in MB

Volume      Central    Central    Central    Central    Region
            Server 1   Server 2   Server 3   Server 4   Server
/bin/       10         10         8          10         10
/boot/      27         27         27         27         27
/data/      11273      -          -          -          1820
/drouter/   -          -          23403      -          -
/etc/       35         35         35         35         36
/home/      131153     24         1          1          23
/iaas/      8          -          -          -          7
/lib/       138        146        135        140        129
/lib64/     28         32         28         28         28
/opt/       2738       4075       1203       6444       611
/root/      6          2          2          2          1699
/sbin/      15         18         15         15         18
/tmp/       4          142        27         157        67
/usr/       3250       3587       3048       3062       3556
/var/       521        399        672        186        3908

Figure 26: Long Running System Requirements: System A
Volume Size in MB

Volume      Central    Central    Central    Central    Region
            Server 1   Server 2   Server 3   Server 4   Server
/bin/       10         10         8          10         10
/boot/      27         27         27         27         27
/data/      11273      -          -          -          1820
/drouter/   -          -          37263      -          -
/etc/       35         35         35         35         36
/home/      173042     3154       1          1          8240
/iaas/      8          -          -          -          7
/lib/       138        146        135        140        129
/lib64/     28         32         28         28         28
/opt/       2738       6856       1204       6503       611
/root/      2          2          2          2          447
/sbin/      15         18         15         15         18
/tmp/       67         142        90         280        152
/usr/       3250       3588       3048       3062       3556
/var/       179        13653      11406      1076       540

Figure 27: Long Running System Requirements: System B
This fine grained information is useful, but also a bit overwhelming. Let us look at a
summary view relative to the installation free space requirements. Please keep in mind the
free space requirements are typically 6 GB less than the overall (hardware) requirement, but
we consider the finer grained values more useful for comparison purposes.
Volume Management (GB)

Server             Volume    Install Free Space   System A Utilization   System B Utilization
Central Server 1   '/'               75                   18                     17
Central Server 1   '/home'           19                  128 *                  169 *
Central Server 2   '/'               55                    8                     24
Central Server 2   '/home'           30                   <1                      3
Central Server 3   '/'               70                   28                     52
Central Server 3   '/home'            4                   <1                     <1
Central Server 4   '/'               40                   10                     11
Central Server 4   '/home'            4                   <1                     <1
Region Server      '/'               40                   12                      7
Region Server      '/home'           30                   <1                      8 *

(* marks the notable values discussed in the text below.)

Figure 28: Long Running System Requirements Summary
The summary view, in the context of the installation free space requirements, shows some
surprising results.

• The installation requirements are generally overstated. While there is some
  factoring for maintaining large installation bundles, the values ensure long term
  operational health (with some exceptions, described below).
• For Central Server 1, the '/data' directory actually contains ~11 GB, which
  includes the RHEL ISO files required for installation.
• Notable issues are marked with an asterisk in Figure 28 and described below.
  o The '/home' volume is clearly out of control on both System A and System B. This
    is actually an error logging issue, and is described in the following section.
  o The '/home' on the System B Region Server is showing greater than expected
    utilization. This is associated with the Virtual Image Library management and is
    considered within the recommended allocation.
• Not all file systems are enumerated in the interests of brevity. These file systems
  can generally be considered noise, contributing on the order of a handful of
  megabytes per server. The one exception is the '/install' file system, which most
  notably consumes 20 GB on the System A region server and 61 GB on the System B
  region server.
It should be noted these results are for a specific installation. As always, different
installations may have different requirements based on usage. For example, images used
for the Virtual Image Library on Central Server 2 can contribute significantly to utilization.
Volume monitoring is always recommended as a best practice.
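As a minimal sketch of such monitoring (the paths and choice of checks are illustrative),
simple commands scheduled via cron can provide early warning on each server:

    # report free space on the volumes managed by the installer checks
    df -h / /home
    # identify the largest consumers under /home (e.g. /home/library on Central Server 2)
    du -sh /home/* 2>/dev/null | sort -h | tail -5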
Central Server 1 Error Logging Issue
A core question is: why is the Central Server 1 '/home' utilization so high? The simple
answer is that, for the systems in question, a program error was generating massive log
entry activity into the database. For example, the PDWDB database log entries alone are
consuming 147 GB (87%) of the overall space!
Is this normal? Absolutely not. A specific program error was triggered in our environment,
and suitable fixes have been put in place.
The following section provides a brief summary of the SCO 2.3 database structure, archive
logging, and some recommended database management approaches (including online
backup management).
7.2 The SmartCloud Orchestrator Database and Schema Summary

The SCO DB2 databases typically run under the default instance of DB2INST1. The
following table summarizes the individual SCO databases.
Database               Schema(s)                    Comments
BPMDB, CMNDB, PDWDB    BPMUser                      Business Process Manager (BPM) databases.
OPENSTAC               CIRnnnnn, GLEnnnnn,          OpenStack database. Note the "nnnnn" schema
                       NOAnnnnn, SCEnnnnn, KSDB     suffix is variable per region.
RAINMAKE               DB2INST1                     IBM Workload Deployer (IWD) database. Uses
                                                    the default schema for the database instance
                                                    (in this case, DB2INST1).
STORHOUS               DB2INST1                     IBM Workload Deployer (IWD) database. Uses
                                                    the default schema for the database instance
                                                    (in this case, DB2INST1).

Figure 29: Database and Schema Summary
7.3 Database Management

Generally speaking, the "out of the box" database configuration will achieve good results
for both large and small installations. The following recommendations are primarily in the
area of database maintenance.
7.3.1 DBMS Versions
The following DBMS versions are recommended. All versions should be 64 bit.

Version                   Notes
DB2 10.1 fp3 or later     DB2 10.5 and upward is not currently supported.

Figure 30: DBMS Versions
7.3.2 Automatic Maintenance
DB2 offers a number of automatic maintenance options. Automatic statistics collection
(aka runstats) is considered a basic and necessary configuration setting, and is enabled for
the product by default. Two other recommended configuration settings follow. It is
expected these configuration settings will be enabled by default in future versions of the
products.
1. Real time statistics. The default runstats configuration generally collects statistics
at two hour intervals. The real time statistics option provides far more granular
statistics collection, essentially generating statistics as required at statement
compilation time.
2. Automatic reorganization. Many customers ignore database reorganization and
system performance starts to decline. This can be especially critical in the cloud
space. The recommendation is to enable automatic reorganization support so it is
self managed by the DBMS. Further discussion of database reorganization is
covered in section 7.4.3.
The following commands may be used to enable these automatic maintenance options. At
the time of this writing, they are conditionally recommended. Each of these options has
runtime impact and should be monitored to ensure there is no unnecessary system impact.
In order to facilitate this, they should only be enabled once the system has been
established and monitored. In addition, automatic reorganization is dependent on the
definition of a maintenance window (see the DB2 Information Center for more detail).
update db cfg for OPENSTAC using AUTO_STMT_STATS ON
update db cfg for OPENSTAC using AUTO_REORG ON
Figure 31: Database Automatic Maintenance Configuration
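Automatic reorganization only runs inside a defined maintenance window. As a hedged
illustration using the DB2-provided policy procedures (the policy file name below is
arbitrary; the XML content is environment specific and is written to the instance's
sqllib/tmp directory):

    db2 connect to OPENSTAC
    db2 "call sysproc.automaint_get_policyfile('MAINTENANCE_WINDOW', 'maint_window.xml')"
    # edit the exported XML to define the window, then apply it back
    db2 "call sysproc.automaint_set_policyfile('MAINTENANCE_WINDOW', 'maint_window.xml')"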
7.3.3 Operating System Configuration (Linux)
The product installation guides have comprehensive instructions for Operating System
prerequisites and configuration. However, on Linux systems improper configuration is
common, so we will highlight specific issues.
The first configuration point to check is the file system ulimit for the maximum number of
open files allowed for a process (i.e. nofiles). The value for this kernel limit should be either
"unlimited" or "65536". The DB2 reference for this configuration setting is available in the
DB2 Information Center (URL).
In addition, the kernel semaphore and message queue specifications should be correct.
These configuration settings are a function of the physical memory available on the
machine. The DB2 reference for these configuration settings is available in the DB2
Information Center (URL).
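A quick way to verify these settings on a database server is sketched below. The values
shown are illustrative; the correct kernel parameters should be derived from the DB2
documentation and the physical memory of the machine.

    # verify the open file limit for the instance owner (expect 65536 or unlimited)
    su - db2inst1 -c 'ulimit -n'

    # sample /etc/security/limits.conf entries
    db2inst1  soft  nofile  65536
    db2inst1  hard  nofile  65536

    # inspect the current kernel semaphore and message queue limits
    ipcs -l
    sysctl kernel.sem kernel.msgmni kernel.msgmax kernel.msgmnb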
7.4 Database Hygiene Overview

The following database hygiene topics are described:
1. Database backup management.
2. Database statistics management.
3. Database reorganization.
4. Database archive management.
5. Database maintenance automation.
Steps make reference to recommended scheduling frequencies. The general purpose
"cron" scheduling utility may be used to achieve this. However, other scheduling utilities
may also be used. The key aspect of a cron'ed activity is that it is scheduled at regular
intervals (e.g. nightly, weekly) and typically does not require operator intervention.
Designated maintenance windows may be used for these activities.
7.4.1 Database Backup Management
It is recommended that nightly database backups be taken. The following figures offer a
sample database offline backup (utilizing compression), along with a sample restore.
backup db <dbname> user <user> using <password> to <backup directory> compress
Figure 32: Database Backup with Compression Command
restore db <dbname> from <backup directory> taken at <timestamp> without
prompting
Figure 33: Database Offline Backup Restore
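In practice the backup command is typically wrapped in a small scheduled script. The
following is a minimal sketch only; the database name, backup path, and the
force/deactivate handling are illustrative and should be adapted to local policy.

    #!/bin/sh
    # nightly offline backup sketch: an offline backup requires that
    # no applications remain connected to the database
    DB=OPENSTAC
    DIR=/backup/DB2
    db2 force application all
    db2 deactivate db $DB
    db2 backup db $DB to $DIR compress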
Online backups may be utilized as well. The following figure provides commands that
comprise a sample weekly schedule. With the given schedule, the best case scenario is a
restore requiring one image (a Monday failure using the Sunday night backup). The worst
case scenario would require four images (Sunday + Wednesday + Thursday + Friday). An
alternate approach would be to utilize an incremental (non-delta) backup each night, which
caps the worst case scenario at two images. The tradeoffs for the backup approaches are
the time to take the backup, the amount of disk space consumed, and the restore
dependencies. A best practice can be to start with nightly full online backups, and
introduce incremental backups if time becomes an issue.
(Sun)  backup db <dbname> online include logs use tsm
(Mon)  backup db <dbname> online incremental delta use tsm
(Tue)  backup db <dbname> online incremental delta use tsm
(Wed)  backup db <dbname> online incremental use tsm
(Thu)  backup db <dbname> online incremental delta use tsm
(Fri)  backup db <dbname> online incremental delta use tsm
(Sat)  backup db <dbname> online incremental use tsm

Figure 34: Database Online Backup Schedule
Note that to enable incremental backups, the database configuration must be updated to
track page modifications, and a full backup must be taken in order to establish a baseline.
update db cfg for OPENSTAC using TRACKMOD YES
Figure 35: Database Incremental Backup Enablement
To restore the online backups, either a manual or automatic approach may be used. For
the manual approach, you must start with the target image, and then revert to the oldest
relevant backup and move forward to finish with the target image. A far simpler approach
is to use the automatic option and let DB2 manage the images. A sample of each
approach is provided below, showing the restore based on the Thursday backup.
restore db <dbname> incremental use tsm taken at <Sunday full timestamp>
restore db <dbname> incremental use tsm taken at <Wednesday incremental
timestamp>
restore db <dbname> incremental use tsm taken at <Thursday incremental delta
timestamp>
Figure 36: Database Online Backup Manual Restore
restore db <dbname> incremental auto use tsm taken at <Thursday incremental delta
timestamp>
Figure 37: Database Online Backup Automatic Restore
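The <timestamp> values used in the restore commands may be taken from the recovery
history, for example:

    db2 list history backup all for <dbname>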
In order to support online backups, archive logging must be enabled. The next subsection
provides information on archive logging, including the capability to restore to a specific
point in time using a combination of database backups and archive logs.
Database Log Archiving
A basic approach we will advocate is archive logging with the capability to support online
backups. The online backups themselves may be full, incremental (based on the last full
backup), and incremental delta (based on the last incremental backup). In order to enable
log archiving to a location on disk, the following command may be used.
update db cfg for <dbname> using logarchmeth1 DISK:/path/logarchive
Figure 38: Database Log Archiving to Disk
Alternatively, in order to enable log archiving to TSM, the following command may be
used[5].
update db cfg for <dbname> using logarchmeth1 TSM
Figure 39: Database Log Archiving to TSM
Note that a “logarchmeth2” configuration parameter also exists. If both of the log archive
method parameters are set, each log file is archived twice (once per log archive method
configuration setting). This will result in two copies of archived log files in two distinct
locations (a useful feature based on the resiliency and availability of each archive location).
Once the online backups and log archive(s) are in effect, the recovery of the database may
be performed via a database restore followed by a roll forward through the logs. Several
restore options have been previously described in section 7.4.1. Once the restore has
been completed, roll forward recovery must be performed. The following are sample roll
forward operations.
[5] The log archive methods (logarchmeth1, logarchmeth2) have the ability to associate
configuration options with them (logarchopt1, logarchopt2) for further customization.
rollforward db <dbname> to end of logs
Figure 40: Database Roll Forward Recovery: Sample A
rollforward db <dbname> to 2012-02-23-14.21.56 and stop
Figure 41: Database Roll Forward Recovery: Sample B
It is worth noting the second example recovers to a specific point in time. For a
comprehensive description of the DB2 log archiving options, the DB2 information center
should be consulted (URL). A service window (i.e. stop the application) is typically
required to enable log archiving.
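A hedged outline of the enablement sequence follows. Note that switching a database from
circular to archive logging places it in backup pending state, so a full offline backup is
required before applications can reconnect.

    # illustrative sequence, performed during the service window
    db2 force application all
    db2 update db cfg for <dbname> using logarchmeth1 DISK:/path/logarchive
    # the database is now in backup pending state; take a full offline backup
    db2 backup db <dbname> to /backup/DB2 compress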
Database Backup Cleanup
Unless specifically pruned, database backups may accumulate and cause issues with disk
utilization or, potentially, a stream of failed backups. Unmonitored, failing backups may
make disaster recovery nearly impossible in the event of a hardware or disk failure. A
simple manual method to prune backups older than a week follows.
find /backup/DB2 -type f -mtime +7 | xargs -r rm
Figure 42: Database Backup Cleanup Command
A superior approach is to let DB2 automatically prune the backup history and delete your
old backup images and log files. A sample configuration is provided below.
update db cfg for OPENSTAC using AUTO_DEL_REC_OBJ ON
update db cfg for OPENSTAC using NUM_DB_BACKUPS 21
update db cfg for OPENSTAC using REC_HIS_RETENTN 180
Figure 43: Database Backup Automatic Cleanup Configuration
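The resulting settings may be verified against the database configuration, for example:

    db2 get db cfg for OPENSTAC | grep -i -e auto_del_rec_obj -e num_db_backups -e rec_his_retentn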
It is also generally recommended to have the backup storage independent from the
database itself. This provides a level of isolation in the event volume issues arise (e.g. it
ensures that a backup operation will not fill the volume hosting the tablespace containers,
which could possibly lead to application failures).
7.4.2 Database Statistics Management
As discussed in the previous “Automatic Maintenance” section, database statistics ensure
that the DBMS optimizer makes wise choices for database access plans. The DBMS is
typically configured for automatic statistics management. However, it may often be wise to
force statistics as part of a nightly or weekly database maintenance operation. A simple
command to update statistics for all tables in a database is the “reorgchk” command.
reorgchk update statistics on table all
Figure 44: Database Statistics Collection Command
One issue with the reorgchk command is that it does not enable full control over statistics
capturing options. For this reason, it may be beneficial to perform statistics updates on a
table by table level. However, this can be a daunting task for a database with hundreds of
tables. As a result, the following SQL statement may be used to generate administration
commands on a table by table basis.
select 'runstats on table ' || STRIP(tabschema) || '.' || tabname ||
       ' with distribution and detailed indexes all;'
  from SYSCAT.TABLES
 where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 45: Database Statistics Collection Table Iterator
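The generated statements can then be captured to a file and executed. A sample invocation
for the OPENSTAC database follows (the db2 -x option suppresses output headers; -tvf
executes the generated script):

    db2 connect to OPENSTAC
    db2 -x "select 'runstats on table ' || strip(tabschema) || '.' || tabname ||
            ' with distribution and detailed indexes all;' from syscat.tables
            where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB')" > runstats.sql
    db2 -tvf runstats.sql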
7.4.3 Database Reorganization
Over time, the space associated with database tables and indexes may become
fragmented. Reorganizing the table and indexes may reclaim space and lead to more
efficient space utilization and query performance. In order to achieve this, the table
reorganization command may be used. Note, as discussed in the previous “Automatic
Maintenance” section, automatic database reorganization may be enabled to reduce the
requirement for manual maintenance.
The following commands are examples of running a "reorg" on a specific table and its
associated indexes. Note the "reorgchk" command previously demonstrated includes a per
table indicator of which tables require a reorg. Using the reorgchk results, per table
reorganization may be achieved for optimal database space management and usage.
reorg table <table name> allow no access
reorg indexes all for table <table name> allow no access
Figure 46: Database Reorganization Commands
It is important to note there are many options and philosophies for doing database
reorganization. Every enterprise must establish its own policies based on usage, space
considerations, performance, etc. The above example is an offline reorg. However it is
possible to also do an online reorg via the “allow read access” or “allow write access”
options. The “notruncate” option may also be specified (indicating the table will not be
truncated in order to free space). The “notruncate” option permits more relaxed locking and
greater concurrency (which may be desirable if the space usage is small or will soon be
reclaimed). If full online access during a reorg is required, the “allow write access” and
“notruncate” options are both recommended.
Note it is also possible to use our table iteration approach to do massive reorgs across
hundreds of tables as shown in the following figure. The DB2 provided snapshot routines
and views (e.g. SNAPDB, SNAP_GET_TAB_REORG) may be used to monitor the status
of reorg operations.
select 'reorg table ' || STRIP(tabschema) || '.' || tabname || ' allow no access;'
  from SYSCAT.TABLES
 where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');

select 'reorg indexes all for table ' || STRIP(tabschema) || '.' || tabname ||
       ' allow no access;'
  from SYSCAT.TABLES
 where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 47: Database Reorganization Table Iterator
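As a sketch of the monitoring side, the snapshot table function mentioned above may be
queried while reorgs are running, for example:

    -- monitor in-flight and completed table reorganizations for the current database
    select substr(tabschema, 1, 12) as tabschema,
           substr(tabname, 1, 24) as tabname,
           reorg_status, reorg_completion
      from table(snap_get_tab_reorg('', -1)) as t;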
7.4.4 Database Archiving
Database archiving is the act of removing unnecessary or obsolete information in order to
preserve optimum performance. The intent is to keep table cardinality manageable so that
query performance is stable, and to minimize IO overhead. The following graph shows the
real world impact of proper database archiving.

Figure 48: Database Archiving Impact

The graph shows provisioning service times pre and post archiving. For the pre archiving
interval, not only are the average service times much higher (dark blue line), but the
distribution of service times is much wider (series of cyan data points). Once the archiving
is implemented, the service times are extremely stable with a much narrower time
distribution.
In order to achieve database archiving, an archive script and associated documentation
are provided with this paper[6] (see "ArchiveScripts.zip"). The archiving is an OpenStack
function and copies the historical content to a shadow database (implying the data is still
available and online). It is recommended the database archiving be part of a scheduled
maintenance activity via the crontab (see the next section for details).

[6] The archive scripts are also part of the SCO 2.3.0.1 distribution.
7.4.5 Database Maintenance Automation
For standard database maintenance, it is advisable to automate the scheduling and
execution of the maintenance activities via the crontab. The following table shows a
sample schedule for the maintenance operations for the relevant SCO databases.
Database     Statistics   Reorgs      Archiving
STORHOUS     Sunday       Saturday
PDWDB        Tuesday      Monday
BPMDB        Wednesday    Tuesday
OPENSTAC     Monday       Sunday      Saturday
RAINMAKE     Thursday     Wednesday
CMNDB        Friday       Thursday

Figure 49: Sample Database Maintenance Schedule
The following example demonstrates maintenance activities on the OPENSTAC database.
Similar examples are provided with this paper via the “CrontabScripts.zap” attachment. In
general, the sample cron entries schedule activities in disjoint time windows throughout the
week. This serves to provide fully online maintenance operations with minimal impact.
# Run runstats and reorgchk for openstac db
0 2 * * Mon db2inst1 /home/db2inst1/tools/gen_runstats.sh OPENSTAC /home/db2inst1/tools
30 2 * * Sun db2inst1 /home/db2inst1/tools/gen_reorg.sh OPENSTAC /home/db2inst1/tools
Figure 50: Sample Database Maintenance Crontab Entry
8 Summary Cookbook

The following tables provide a cookbook for the solution implementation. The cookbook
approach implies a set of steps the reader may "check off" as completed to provide a
stepwise implementation of the SCO solution. The recommendations will be provided in
three basic steps:
1. Base installation recommendations.
2. Post installation recommendations.
3. High scale recommendations.
All recommendations are provided in tabular format. The preferred order of implementing
the recommendations is from the first row of each table through to the last.
8.1 Base Installation Recommendations

The base installation recommendations are considered essential to a properly functioning
SCO instance. All steps should be implemented.

Identifier   Description                                                          Status
B1           Perform the base SCO installation, ensuring the recommended
             configuration described in Section 5.2 is achieved.
             A central DB2 server should be used (i.e. the region servers
             should not manage a local DBMS unless there are compelling
             geographic considerations). Where possible it is recommended
             to install the DBMS on bare metal, or in a DBA managed pool,
             to facilitate performance management.
B2           Enable the Keystone memcached implementation (Section 6.1).
B3           Enable the OpenStack Keystone worker support (Section 6.2).
B4           Enable the IaaS Gateway cluster support (Section 6.3).
B5           Optimize the IWD component (Section 6.4).
B6           Configure the Linux IO scheduler (Section 6.5).
B7           Disable the ACPI management (Section 6.6).
B8           Ensure the Java heaps are optimized (Section 6.7).
B9           Configure the central database (Section 6.8).
B10          Configure the database server Linux instance per Section 7.3.3.

Figure 51: Base Installation Recommendations
8.2 Post Installation Recommendations

The post installation recommendations will provide additional throughput and superior
functionality. All steps should be implemented.

Identifier   Description                                                          Status
P1           Perform a set of infrastructure and SCO benchmarks to determine
             the viability of the installation (see Sections 4.2 and 4.3).
P2           Implement the database statistics maintenance activity per
             Sections 7.4.2 and 7.4.5.
P3           Implement the database reorg maintenance activity per Sections
             7.4.3 and 7.4.5.
P4           Implement the database archiving maintenance activity per
             Sections 7.4.4 and 7.4.5.
P5           Implement a suitable backup and disaster recovery plan comprising
             regular backups of all critical server components (including the
             database and relevant file system objects). Guidelines are
             provided in the SCO Information Center (URL).

Figure 52: Post Installation Recommendations
8.3 High Scale Recommendations

The high scale recommendations should be incorporated once the production installation
needs to support the high water mark for scalability. All steps may be optionally
implemented over time based upon workload.

Identifier   Description                                                          Status
S1           Apply the latest SCO fixpack.
S2           Monitor the performance of the installation (Section 4.1) and
             adjust the management server to the recommended installation
             values (Section 5.2) as appropriate.
S3           Optimize Central Server 1 (DBMS) performance. A basic way to
             achieve this is to have dedicated, high performance storage
             allocated to the database containers and logs.

Figure 53: High Scale Recommendations
APPENDIX A: SMARTCLOUD ORCHESTRATOR MONITORING OPTIONS

Monitoring is important to understand and ensure the health of any cloud solution. A
number of monitoring approaches are available for SCO. The solutions are described via
the following summary sections, broken down into three categories.
1. OpenStack monitoring via Ceilometer.
2. SCO monitoring via IBM BPM.
3. Infrastructure monitoring via IBM Tivoli Monitoring (ITM) and third party solutions.
A separate appendix is provided that is specific to OpenStack Keystone monitoring.
A.1 OpenStack Monitoring

OpenStack monitoring is provided via the Ceilometer component. Ceilometer offers a
comprehensive and customizable infrastructure, including support for event and threshold
management. Note while Ceilometer is not part of the base SCO 2.3 distribution, it is a
constituent of the OpenStack Grizzly base, with continued enhancement in subsequent
OpenStack releases.
Ceilometer provides three distinct types of metrics:
1. Cumulative: counters that accumulate or increase over time.
2. Gauge: counters that offer discrete, point in time values.
3. Delta: differential counters showing change rates.
A vast array of metrics is provided by Ceilometer. An easy way to interactively derive the
set of available metrics is to query Ceilometer directly (see the sample below). In addition,
the Ceilometer documentation provides the default set, with associated attributes (URL).
ceilometer meter-list -s openstack
Figure 54: OpenStack Ceilometer Metrics
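Individual meters may then be examined with the same client. For example (meter names
as in the table below; the -m, -l, and -p flags are from the Grizzly-era
python-ceilometerclient and should be confirmed against the installed version):

    # recent samples for a meter
    ceilometer sample-list -m cpu_util -l 10
    # aggregated statistics over 10 minute periods
    ceilometer statistics -m cpu_util -p 600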
The following table provides a core set of recommended monitoring points for OpenStack.
A broader set may of course be used.
Component                             Meters
Nova (Compute Node Management)        cpu_util
                                      disk.read.requests.rate
                                      disk.write.requests.rate
                                      disk.read.bytes.rate
                                      disk.write.bytes.rate
                                      network.incoming.bytes.rate
                                      network.outgoing.bytes.rate
                                      network.incoming.packets.rate
                                      network.outgoing.packets.rate
                                      The following counters require enablement:
                                      compute.node.cpu.kernel.percent
                                      compute.node.cpu.idle.percent
                                      compute.node.cpu.user.percent
                                      compute.node.cpu.iowait.percent
Neutron (Network Management)          network.create
                                      network.update
                                      subnet.create
                                      subnet.update
Glance (Image Management)             image.update
                                      image.upload
                                      image.delete
Cinder (Volume Management)            volume.size
Swift (Object Storage Management)     storage.objects
                                      storage.objects.size
                                      storage.objects.containers
                                      storage.objects.incoming.bytes
                                      storage.objects.outgoing.bytes
Heat (Orchestration)                  stack.create
                                      stack.update
                                      stack.delete
                                      stack.suspend
                                      stack.resume

Figure 55: OpenStack Ceilometer Core Metrics
In addition, Ceilometer provides a REST API that allows cloud administrators to record
KPIs. For instance, infrastructure metrics could be placed in Ceilometer with a HTTP
POST request. As Ceilometer includes a data store, as well as some basic statistical
functionality, it is a candidate for an integration point for cloud monitoring data.
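As a hedged sketch of this integration (the endpoint, meter name, and resource id below
are hypothetical; the payload shape follows the Ceilometer v2 API):

    # POST a custom KPI sample to the Ceilometer v2 API
    curl -X POST "http://<ceilometer-host>:8777/v2/meters/sco.provision.time" \
      -H "X-Auth-Token: $TOKEN" \
      -H "Content-Type: application/json" \
      -d '[{"counter_name": "sco.provision.time",
            "counter_type": "gauge",
            "counter_unit": "s",
            "counter_volume": 312,
            "resource_id": "central-server-1"}]'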
A.2 SmartCloud Orchestrator Monitoring

SCO monitoring should be employed to address the solution layer "above" OpenStack.
The primary mechanism for SCO monitoring is enablement of the BPM performance data
warehouse (relevant information is available in the References section)[7]. The performance
data warehouse may be enabled via "autotracking", which will enable both custom KPIs as
well as the default total time KPIs. The core KPIs to understand BPM capability are:
• BPM processes executed per second.
• Average service times per BPM process.
It is important to note that given Ceilometer provides a general plugin and distribution
infrastructure, it may be combined with the SCO monitoring solution. A sample approach
for managing these monitoring points follows.
1. Derive a BPM plugin to retrieve raw times from the BPM performance data
warehouse (PDWDB) database. The preferred method is the provided REST
interface (versus direct database access).
2. Perform calculations based on the raw data. For example, converting a series of
milestones into performance KPIs, or calculating statistical quantities (e.g.
standard deviation, harmonic mean).
3. Push the results to Ceilometer as the meter distribution mechanism.
4. Read the results via the Ceilometer REST API and display in the visualization tool
of your choice.
A.3 Infrastructure Monitoring

Infrastructure monitoring can address the operating system and hypervisor health of the
cloud. Available tools include IBM Tivoli Monitoring (ITM) or the open source offering
Nagios. For example, ITM v6.2 provides the following infrastructure monitoring agents (for
reference, see URL).
1. IBM Tivoli Monitoring Endpoint.
2. Linux OS.
3. UNIX Logs.
4. UNIX OS.
5. Windows OS.
6. i5/OS®.
7. IBM Tivoli Universal Agent.
8. Warehouse Proxy.
9. Summarization and Pruning.
10. IBM Tivoli Performance Analyzer.

[7] It is worth noting that BPM is built on IBM WebSphere and as a result, WebSphere
monitoring capabilities also apply.
Critical KPIs to monitor at the infrastructure level are summarized in the following table
(VMware is provided as a representative hypervisor sample).

Component: Operating System
• CPU utilization including kernel, user, IO wait, and idle times.
• Disk utilization including read/write request and byte rates.
• Network utilization including incoming and outgoing packet and byte rates.
• Volume free space across the central and region servers. Special attention should be
  paid to the Virtual Image Library on Central Server 2 to ensure the "/home/library"
  space is well managed.

Component: DBMS: ITM for DB2 (URL)
• Application IO activity workspace.
• Application lock activity workspace.
• Application overview workspace.
• Buffer Pool workspace.
• Connection workspace.
• Database workspace.
• Database Lock Activity workspace.
• Historical Summarized Capacity Weekly workspace.
• Historical Summarized Performance Weekly workspace.
• Locking Conflict workspace.
• Tablespace workspace.

Component: Application Server: ITCAM Agent for WebSphere Applications (URL)
• WebSphere Agent Summary workspace.
• Application Server Summary workspace.

Component: J2EE: ITCAM Agent for J2EE (URL)
• Application Health Summary workspace.

Component: HTTP: ITCAM Agent for HTTP Servers (URL)
• Web Server Agent workspace.

Component: Hypervisor: ITM for Virtual Environments (URL)
• Server workspace.
• CPU workspace.
• Disk workspace.
• Memory workspace.
• Network workspace.
• Resource Pools workspace.
• Virtual Machines workspace.

Component: Hypervisor: VMware esxtop sample
• CPU: Run (%RUN), Wait (%WAIT), Ready (%RDY), Co-Stop (%CSTP).
• Network: Dropped packets (%DRPTX, %DRPRX).
• IO: Latency (DAVG, KAVG), Queue length (QUED).
• Memory: Memory reclaim (MCTLSZ), Swap (SWCUR, SWR/s, SWW/s).

Figure 56: Infrastructure Core Metrics
APPENDIX B: OPENSTACK KEYSTONE MONITORING

The Keystone component is critical to overall performance of SmartCloud Orchestrator.
For example, if one component saturates Keystone, the overall throughput of the system
will be impacted. This is magnified by the fact that Keystone has only a single execution
thread instance. In order to understand Keystone performance, the best method is to look
at the requests and responses via a proxy such as the IaaS Gateway. This provides the
ability to see requests that are dropped before being processed by Keystone.
We will describe an approach for monitoring Keystone via the PvRequestFilter.
B.1 PvRequestFilter

The PvRequestFilter was designed to output request and response data into the Keystone
log. When enabled it prints the data as warning messages, so it is not necessary to turn
up the default debug level to generate the log messages.
The format of the messages is as follows. All fields except “<duration>” are printed out
for both requests and responses. The duration of the request is printed only for the
response.
WARNING [REQUEST|RESPONSE] <millisecond timestamp to identify request>
<REMOTE_ADDR>:<REMOTE_PORT> <REQUEST_METHOD> <RAW_PATH_INFO> [<duration>]
Figure 57: Keystone Monitoring PvRequestFilter Format
Sample output follows.
2014-07-21 17:16:56.509 22811 WARNING keystone.contrib.pvt_filter.request [-]
REQUEST 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users
2014-07-21 17:16:56.785 22811 WARNING keystone.contrib.pvt_filter.request [-]
RESPONSE 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users 0.276294
2014-07-21 17:16:56.807 22811 WARNING keystone.contrib.pvt_filter.request [-]
REQUEST 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains
2014-07-21 17:16:56.824 22811 WARNING keystone.contrib.pvt_filter.request [-]
RESPONSE 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains 0.017691
2014-07-21 17:16:56.839 22811 WARNING keystone.contrib.pvt_filter.request [-]
REQUEST 2014-07-21_17:16:56.839 172.18.152.103:1279 GET
/v3/users/e92b94d7068843ef98d664521bd9c983/projects
2014-07-21 17:16:56.868 22811 WARNING keystone.contrib.pvt_filter.request [-]
RESPONSE 2014-07-21_17:16:56.839 172.18.152.103:1279 GET
/v3/users/e92b94d7068843ef98d664521bd9c983/projects 0.028558
Figure 58: Keystone Monitoring PvRequestFilter Sample Output
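The statistics script provided with this paper (keystoneStats.sh) processes these records.
As a rough illustration of the idea, per path average response times can also be derived
directly from the log; the field positions below assume the format shown above.

    # average duration and request count per path, from RESPONSE records
    grep ' RESPONSE ' /var/log/keystone/keystone.log | \
      awk '{sum[$(NF-1)] += $NF; n[$(NF-1)]++}
           END {for (p in sum) printf "%-50s %9.4f %6d\n", p, sum[p]/n[p], n[p]}'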
B.2 Enabling PvRequestFilter

The process to enable PvRequestFilter follows.

1. Log on to Central Server 2.
2. Extract the distribution provided with this paper (keystoneStats.zap).
3. Install the filter and backup the existing configuration:
   ./deployKeystoneFilter.sh
4. Make the following changes to the "/etc/keystone/keystone.conf" file.
   Note: Reversing the changes in this step will disable the filter.
   a. Add the following lines just above the line starting with "[filter:debug]".
      [filter:pvt]
      paste.filter_factory = keystone.contrib.pvt_filter.request:PvtRequestFilter.factory
   b. Add "pvt" to three of the pipeline statements:
      [pipeline:public_api]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth
                 xml_body json_body simpletoken ec2_extension user_crud_extension
                 pvt public_service
      [pipeline:admin_api]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth
                 xml_body json_body simpletoken ec2_extension s3_extension
                 crud_extension pvt admin_service
      [pipeline:api_v3]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth
                 xml_body json_body simpletoken ec2_extension s3_extension
                 pvt service_v3
   c. Restart the keystone service.
      service openstack-keystone restart
   d. Validate that "/var/log/keystone/keystone.log" is producing the appropriate
      log messages (sample below).
   e. Update the "hosts.table" file to reflect your environment.
   f. Run the workload or scenario for analysis.
   g. Generate the statistics for the request and response data in the
      "keystone.log" file (sample below):
      ./keystoneStats.sh /var/log/keystone/keystone.log > results
Figure 59: Keystone Monitoring Log Messages Example
Figure 60: Keystone Monitoring Statistics Example
APPENDIX C: IAAS GATEWAY CLUSTER ENABLEMENT

The following steps are required to enable the IaaS Gateway cluster.
1. Prepare the HTTP server as a load balancer.
   a. Ensure the HTTP server is installed.
      i. Check if there is already an HTTP server on Central Server 2:
         service httpd status
      ii. If there is already an HTTP server, stop it with the following command:
         service httpd stop
      iii. If there is no HTTP server installed, use the following command to
         install one:
         yum install httpd
   b. Update the "httpd.conf" with the load balancer configuration.
      i. Modify the file "/etc/httpd/conf/httpd.conf" with the following changes.
         1. Update the listen port to the gateway port:
            # Listen 80
            Listen 9973
         2. Append the load balancer configuration to the end of the file:
            <VirtualHost *:9973>
              ProxyRequests off
              <Proxy balancer://mycluster>
                # three node gateway cluster
                BalancerMember http://127.0.0.1:12001
                BalancerMember http://127.0.0.1:12002
                BalancerMember http://127.0.0.1:12003
                Order Deny,Allow
                Deny from none
                Allow from all
                ProxySet lbmethod=byrequests
              </Proxy>
              # path of requests to balance "/" -> everything
              ProxyPass / balancer://mycluster/
            </VirtualHost>
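Before proceeding, the syntax of the modified configuration can be checked (assuming
the stock Apache httpd tooling is installed):

    # verify the modified httpd configuration parses cleanly
    apachectl configtest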
2. Prepare the configuration file for cluster members, by performing the following
   commands.

   cd /etc/iaasgateway/
   cp iaasgateway.conf iaasgateway00.conf
   vi iaasgateway00.conf

   # It should look like below before applying this fix:
   [service]
   iaasgateway_listen = <central-server-2-ip>
   iaasgateway_listen_port = 9973

   # Update it to:
   iaasgateway_listen = 127.0.0.1
   iaasgateway_listen_port = 1200X
   iaasgateway_user_entry = <central-server-2-ip>
   iaasgateway_user_entry_port = 9973

   # copy the configuration files and update the port
   cp iaasgateway00.conf iaasgateway01.conf
   sed -i 's/1200X/12001/' iaasgateway01.conf
   cp iaasgateway00.conf iaasgateway02.conf
   sed -i 's/1200X/12002/' iaasgateway02.conf
   cp iaasgateway00.conf iaasgateway03.conf
   sed -i 's/1200X/12003/' iaasgateway03.conf
3. Prepare the init scripts and update the configuration file.

   cd /etc/init.d/
   cp openstack-iaasgateway openstack-iaasgateway01
   cp openstack-iaasgateway openstack-iaasgateway02
   cp openstack-iaasgateway openstack-iaasgateway03
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway01/' openstack-iaasgateway01
   sed -i 's/iaasgateway.conf/iaasgateway01.conf/' openstack-iaasgateway01
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway02/' openstack-iaasgateway02
   sed -i 's/iaasgateway.conf/iaasgateway02.conf/' openstack-iaasgateway02
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway03/' openstack-iaasgateway03
   sed -i 's/iaasgateway.conf/iaasgateway03.conf/' openstack-iaasgateway03
4. Start up the cluster, through the following commands.

   service openstack-iaasgateway stop
   Stopping openstack-iaasgateway:                        [  OK  ]
   service openstack-iaasgateway01 start
   Starting openstack-iaasgateway01:                      [  OK  ]
   service openstack-iaasgateway02 start
   Starting openstack-iaasgateway02:                      [  OK  ]
   service openstack-iaasgateway03 start
   Starting openstack-iaasgateway03:                      [  OK  ]
   service httpd start
   Starting httpd:                                        [  OK  ]
5. Ensure the cluster startup will persist across reboots.
# Turn the non-clustered gateway off.
chkconfig --level 2345 openstack-iaasgateway off
# Turn the clustered gateway on.
chkconfig --level 2345 openstack-iaasgateway01 on
chkconfig --level 2345 openstack-iaasgateway02 on
chkconfig --level 2345 openstack-iaasgateway03 on
chkconfig --level 2345 httpd on
6. Check the IaaS Gateway service status.
   a. Try to open the following link in a browser. The content should operate the
      same as prior to applying the cluster.
      http://<central-server-2-ip>:9973/providers
   b. Check for listening ports with the following command:

      netstat -nap | grep 1200 | grep LISTEN
      tcp   0   0 127.0.0.1:12001   0.0.0.0:*   LISTEN   7269/python
      tcp   0   0 127.0.0.1:12002   0.0.0.0:*   LISTEN   7286/python
      tcp   0   0 127.0.0.1:12003   0.0.0.0:*   LISTEN   7303/python

   c. Check whether the load balancer is listening:

      netstat -nap | grep 9973 | grep LISTEN
      tcp   0   0 :::9973   :::*   LISTEN   7321/httpd

   d. Verify you may log in to the SCO UI.

7. The IaaS Gateway cluster is now enabled.
REFERENCES
SmartCloud Orchestrator and Related Component References
IBM SmartCloud Orchestration Information Center
SCO 2.3 Information Center
IBM SmartCloud Orchestrator Resource Center
SCO Resource Center
IBM Business Process Manager V8.0 Performance Tuning and Best Practices
http://www.redbooks.ibm.com/redpapers/pdfs/redp4935.pdf
IBM Business Process Manager Performance Data Warehouse
http://pic.dhe.ibm.com/infocenter/dmndhelp/v8r5m0/topic/com.ibm.wbpm.admin.doc/topics/
managing_performance_servers.html
IBM Tivoli Monitoring Information Center
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.3fp1/welc
ome.htm
IBM DB2 10.1 Information Center
http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp?topic=/com
OpenStack References
OpenStack Performance Presentation (Folsom, Havana, Grizzly)
http://www.openstack.org/assets/presentation-media/openstackperformance-v4.pdf
OpenStack Ceilometer
http://docs.openstack.org/developer/ceilometer
OpenStack Rally
https://wiki.openstack.org/wiki/Rally
Hypervisor References
Performance Best Practices for VMware vSphere™ 5.0
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
Performance Best Practices for VMware vSphere™ 5.1
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf
VMware: Troubleshooting virtual machine performance issues
VMware Knowledge Base
VMware: Performance Blog
http://blogs.vmware.com/vsphere/performance
Linux on System x: Tuning KVM for Performance
KVM Performance Tuning
Kernel Virtual Machine (KVM): Tuning KVM for performance
http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaattuning_pdf.pdf
PowerVM Virtualization Performance Advisor
Developer Works PowerVM Performance
IBM PowerVM Best Practices
http://www.redbooks.ibm.com/redbooks/pdfs/sg248062.pdf
Benchmark References
Report on Cloud Computing to the OSG Steering Committee, SPEC Open Systems Group,
https://www.spec.org/osgcloud/docs/osgcloudwgreport20120410.pdf
© Copyright IBM Corporation 2014
IBM United States of America
Produced in the United States of America
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM
representative for information on the products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used.
Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be
used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program,
or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of
this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are
inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER “AS IS” WITHOUT WARRANTY OF
ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow
disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes may be made periodically to the
information herein; these changes may be incorporated in subsequent versions of the paper. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this paper at any time without notice.
Any references in this document to non-IBM Web sites are provided for convenience only and do not in any manner serve
as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product
and use of those Web sites is at your own risk.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of
this document does not give you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
4205 South Miami Boulevard
Research Triangle Park, NC 27709 U.S.A.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent
goals and objectives only.
This information is for planning purposes only. The information herein is subject to change before the products described
become available.
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in
the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in
this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks may also be registered or common law
trademarks in other countries. A current list of IBM trademarks is available on the web at "Copyright and trademark
information" at http://www.ibm.com/legal/copytrade.shtml.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Other company, product, or service names may be trademarks or service marks of others.