IBM® Cloud and Smarter Infrastructure Software
IBM Cloud Orchestrator Version 2.4:
Capacity Planning, Performance, and Management Guide
Document version 2.4.0
IBM Cloud Orchestrator Performance Team
© Copyright International Business Machines Corporation 2015.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
CONTENTS
List of Figures
Author List
Revision History
1 Introduction
2 IBM Cloud Orchestrator 2.4 Overview
  2.1 Functional Overview
  2.2 Architectural Overview
3 Performance Overview
  3.1 Sample Benchmark Environment
  3.2 Key Performance Indicators
    3.2.1 Concurrent User Performance
    3.2.2 Provisioning Performance
4 Performance Benchmark Approaches
  4.1 Monitoring and Analysis Tools
  4.2 Infrastructure Benchmark Tools
  4.3 Cloud Benchmarks
5 Capacity Planning Recommendations
  5.1 Cloud Capacity Planning Spreadsheet
  5.2 IBM Cloud Orchestrator Management Server Capacity Planning
  5.3 Provisioned Virtual Machines Capacity Planning
6 Cloud Configuration Recommendations
  6.1 OpenStack Keystone Worker Support
  6.2 Disabling the IWD Service
  6.3 IBM Workload Deployer Configuration
  6.4 Virtual Machine IO Scheduler Configuration
  6.5 Advanced Configuration and Power Interface Management
  6.6 Java Virtual Machine Heap Configuration
  6.7 Database Configuration
  6.8 Database Management
    6.8.1 DBMS Versions
    6.8.2 Automatic Maintenance
    6.8.3 Operating System Configuration (Linux)
  6.9 Database Hygiene Overview
    6.9.1 Database Backup Management
    6.9.2 Database Statistics Management
    6.9.3 Database Reorganization
7 Summary Cookbook
  7.1 Base Installation Recommendations
  7.2 Post Installation Recommendations
  7.3 High Scale Recommendations
Appendix A: IBM Cloud Orchestrator Monitoring Options
  A.1 OpenStack Monitoring
  A.2 IBM Cloud Orchestrator Monitoring
  A.3 Infrastructure Monitoring
Appendix B: OpenStack Keystone Monitoring
  B.1 PvRequestFilter
  B.2 Enabling PvRequestFilter
References
LIST OF FIGURES
Figure 1: Revision History
Figure 2: IBM Cloud Orchestrator Benefits Estimator
Figure 3: ICO Cloud Marketplace View
Figure 4: ICO Architecture Reference Topology
Figure 5: ICO Sample Benchmark Environment
Figure 6: Benchmark Data Model Population
Figure 7: Load Driving (User) Scenarios
Figure 8: Provisioning Performance in a Closed System
Figure 9: Monitoring and Analysis Tools
Figure 10: Infrastructure Benchmark Tools
Figure 11: ICO Management Server Capacity Planning
Figure 12: Capacity Planning Tool: Inquiry Form
Figure 13: Capacity Planning Tool: User Demographic Information
Figure 14: Capacity Planning Tool: Systems and Storage
Figure 15: Capacity Planning Tool: System and Workload Options
Figure 16: Capacity Planning Tool: Virtual Machine Requirements
Figure 17: Planning Tool: Confirmation Screen
Figure 18: Planning Tool: System Summary
Figure 19: Keystone Worker Configuration
Figure 20: IWD Configuration
Figure 21: Modifying the IO Scheduler
Figure 22: Java Virtual Machine Heap Change Sets
Figure 23: Database Configuration Change Sets
Figure 24: DBMS Versions
Figure 25: Database Automatic Maintenance Configuration
Figure 26: Database Backup with Compression Command
Figure 27: Database Offline Backup Restore
Figure 28: Database Online Backup Schedule
Figure 29: Database Incremental Backup Enablement
Figure 30: Database Online Backup Manual Restore
Figure 31: Database Online Backup Automatic Restore
Figure 32: Database Log Archiving to Disk
Figure 33: Database Log Archiving to TSM
Figure 34: Database Roll Forward Recovery: Sample A
Figure 35: Database Roll Forward Recovery: Sample B
Figure 36: Database Backup Cleanup Command
Figure 37: Database Backup Automatic Cleanup Configuration
Figure 38: Database Statistics Collection Command
Figure 39: Database Statistics Collection Table Iterator
Figure 40: Database Reorganization Commands
Figure 41: Database Reorganization Table Iterator
Figure 42: Base Installation Recommendations
Figure 43: Post Installation Recommendations
Figure 44: High Scale Recommendations
Figure 45: OpenStack Ceilometer Metrics
Figure 46: OpenStack Ceilometer Core Metrics
Figure 47: Infrastructure Core Metrics
Figure 48: Keystone Monitoring PvRequestFilter Format
Figure 49: Keystone Monitoring PvRequestFilter Sample Output
Figure 50: Keystone Monitoring Log Messages Example
Figure 51: Keystone Monitoring Statistics Example
AUTHOR LIST
This paper is the team effort of a number of cloud performance specialists comprising the IBM
Cloud Orchestrator performance team. Additional recognition goes out to the entire IBM Cloud
Orchestrator and OpenStack development teams.
Mark Leitch (primary contact for this paper)
IBM Toronto Laboratory
Amadeus Podvratnik
Marc Schunk
IBM Boeblingen Laboratory
Nate Rockwell
IBM USA
Tiarnán Ó Corráin
IBM Ireland
Alessandro Chiantera
Andrea Tortosa
Massimo Marra
Michele Licursi
Paolo Cavazza
Sandro Piccinini
IBM Rome Laboratory
REVISION HISTORY
Date              Version  Revised By  Comments
March 12th, 2015  Draft    MDL         Initial version for review.
April 1st, 2015   2.4.0    MDL         Incorporated review comments.
Figure 1: Revision History
1 Introduction
Capacity planning involves the specification of the various components of an installation to
meet customer requirements, often with growth or timeline considerations. A key aspect of
capacity planning for cloud, or virtualized, environments is the specification of sufficient
physical resources to provide the illusion of infinite resources in an environment that may
be characterized by highly variable demand. This document will provide an overview of
capacity planning for the IBM Cloud Orchestrator (ICO) Version 2.4. In addition, it will offer
management best practices to achieve a well performing installation that demonstrates
service stability.
ICO Version 2.4 offers end to end management of service offerings across a number of
cloud technology offerings including VMware, Kernel-based Virtual Machine (KVM), IBM
PowerVM, and IBM System z. A key implementation aspect is integration with OpenStack,
the de facto leading open virtualization technology. OpenStack offers the ability to control
compute, storage, and network resources through an open, community based architecture.
In this document we will provide an ICO 2.4 overview, including functionality, architecture,
and performance. We will then offer the capacity planning recommendations, including
considerations for hardware configuration, software configuration, and cloud maintenance
best practices. A summary “cookbook” is provided to manage installation and
configuration for specific instances of ICO.
Note: This document is considered a work in progress. Capacity planning
recommendations will be refined and updated as new ICO releases are available. While
the paper in general is considered suitable for all ICO Version 2.4 releases, it is best
oriented towards ICO Version 2.4.0.1. In addition, a number of references are provided in
the References section. These papers are highly recommended for readers who want
detailed knowledge of ICO server configuration, architecture, and capacity planning.
Note: Some artifacts are distributed with this paper. The distributions are in zip format.
However, because Adobe protects against files with a “zip” suffix, the file suffix of each
distribution is set to “zap”. To use these artifacts, rename the file suffix to “zip” and
process as usual.
2 IBM Cloud Orchestrator 2.4 Overview
An overview of ICO Version 2.4 will be provided from the following perspectives:
1. Functional
2. Architectural
2.1 Functional Overview
The basic functional capability of ICO involves the management of cloud computing
resources for dynamic data centers. In a nutshell, ICO offers infrastructure, platform, and
orchestration services that make it possible to lower the cost of service delivery (both in
terms of time and skill) while delivering higher degrees of standardization and automation.
In order to determine the benefits of deploying ICO in business terms, the IBM Cloud
Orchestrator Benefits Estimator (URL) is available. A screenshot of the estimator is
provided below.
Figure 2: IBM Cloud Orchestrator Benefits Estimator
A more detailed cloud marketplace view of the ICO solution follows.
Figure 3: ICO Cloud Marketplace View
The core functional capabilities of ICO include the following.
Workflow Orchestration.
The Business Process Manager (BPM) component offers a standard library as well
as a graphical editor for workflow orchestration. Overall, this provides a powerful
mechanism for building complex and custom business processes in the cloud context.
Pattern Management.
The IBM Workload Deployer (IWD) offers sophisticated pattern support for
deploying multi node applications that may consist of complex middleware. Once
again, graphical editor support for pattern management is provided.
Service Management.
Service management options are available in the ICO Enterprise edition. It
provides a set of management utilities to further facilitate business process
management.
Not shown in the diagram is a Scalable Web Infrastructure to facilitate cloud self
service offerings. For more information please consult the ICO knowledge center
(URL). In addition, the ICO resource center is available (URL).
2.2 Architectural Overview
The following diagram shows the reference deployment topology for ICO. A description of
the reference topology follows.
Figure 4: ICO Architecture Reference Topology
The reference topology is built on a core set of virtual machines:
Deployment Server.
The installation or deployment service for the ICO instance(s).
Central Server 1.
This server hosts the DB2 Database Management System (DBMS). The
performance of the DBMS is critical to the overall solution and is dealt with
extensively in Section 6.
Central Server 2.
This is essentially the “super node” for the ICO instance. This server hosts
OpenStack Keystone, providing identity, token, catalog, and policy services. It
also hosts Business Process Manager (BPM), the primary mechanism for driving
business process workflows. The most critical aspect of this server is managing
Keystone and BPM, as described in Section 6.
Central Server 3.
This server hosts the IBM Workload Deployer pattern engine. Performance
configuration of this component is described in Section 6.
Associated with these core server virtual machines are a number of region servers.
Region servers may represent a specific cluster or geographic zone of cloud compute
nodes. Sample compute nodes are shown for VMware, KVM, and PowerVM, with
associated communication paths. For example, for VMware the VMware community driver
is used to drive the operation of the VMware cluster. For KVM, the OpenStack control
node is used to coordinate the KVM instance.
Given this is a virtual implementation, some considerations should be kept in mind:
- In general, it is more difficult to manage performance in a virtual environment due to
  the additional hypervisor management overhead and system configuration.
- Device parallelism via dedicated storage arrays/LUNs is preferred. Sample
  approaches, from most impactful to least impactful, are provided below.
  o Separate data stores for “managed from” and “managed to” environments.
  o Spread data stores across several physical disks to maximize storage capability.
  o Separate data stores for image templates and provisioned images.
  o Employ the “deadline” or “noop” scheduler algorithm for management server
    and provisioned VMs (see Section 6.4).
  o Optimize base storage capability (i.e. SSD with “VMDirectPath”
    enablement for VMware). Servers where this may be critical, due to their
    dependency on disk IO capabilities, are Central Server 1 and the VMware
    vCenter instances.
- Network optimization, for example 10GbE adoption. In addition, segment customer
  networks to an acceptable level to reduce address lookup impact.
3 Performance Overview
There are two distinct aspects of cloud performance:
1. Performance of the ICO management server itself.
This is the primary focus of this section.
2. Performance of the provisioned server instances.
This is more of a capacity planning statement, and is covered in Section 5.3.
We will provide a general overview of the Key Performance Indicators (KPIs) for the ICO
management server. The following sections will describe the general benchmark
environment, and the associated KPIs.
3.1 Sample Benchmark Environment
The following figure shows a sample configuration that has been used for ICO
benchmarks.
Figure 5: ICO Sample Benchmark Environment
The environment is characterized by the following features, broken down in terms of the
ICO management server (aka “managed from”) and the associated cloud (aka “managed
to”).
Managed from:
- Server configuration:
  o 4/5 HS22V Blades with 2 x 4 cores Intel Xeon x5570 2.93 GHz.
  o 8 physical cores per blade, 16 logical cores when hyper-threading is enabled.
  o 72 GB RAM per blade.
  o 2 x Redundant 10G Ethernet Networking (Janice HSSM).
  o 2 x Redundant 8G FC Network (Qlogic FC SM).
- Storage configuration: 1 x DS3400 with 4 Exp with 12 Disk 600 GB SAS
  10K each (48 x 600 GB = 28.8 TB raw).
Managed to:
- Server configuration:
  o Tens of HS22V Blades with 2 x 6 cores Intel Xeon x5670 2.93 GHz.
  o 12 physical cores per blade, 24 logical cores when hyper-threading is enabled.
  o 72 GB RAM per blade.
  o 2 x Redundant 10G Ethernet Networking (Janice HSSM).
  o 2 x Redundant 8G FC Network (Qlogic FC SM).
- Storage configuration: 1 x Storwize v7000 with 3 Exp with 12 Disks 2 TB
  NL-SAS 7.2k each (36 x 2 TB = 72 TB raw).
- Storage access has been configured to use the multi-path access granted
  by Storwize. In particular, VMware ESXi servers have been configured to
  use all of the 8 active paths to access LUNs using a round robin policy.
3.2 Key Performance Indicators
The following Key Performance Indicators are managed for ICO through a set of
comprehensive benchmarks.
1. Concurrent User Performance, comprising:
a. Average response time for ICO pages related to administrative tasks.
b. Average response time for ICO pages related to end user tasks.
2. Provisioning throughput, comprising:
a. Provisioning throughput for a vSys with a single part.
b. Average service time for provisioned VMs.
3. LAMP (Linux, Apache, MySQL, Python) stack performance, comprising:
a. vApp deployment time.
b. vApp stop time.
c. vApp deletion time.
4. Bulk Windows stack performance, comprising vSys with multiple parts (15 VMs)
provisioning time.
A key aspect of the benchmarks is they are run with associated background workloads and
for a long duration (e.g. weeks or months). The rationale behind this is very simple: to run
benchmarks that closely emulate the customer experience and will drive “real world”
results (versus overly optimistic lab based results). We will describe the concurrent user
and provisioning throughput KPIs in more detail.
3.2.1 Concurrent User Performance
ICO User Interface performance is established through concurrent user benchmark tests.
In order to understand the applicability of such a benchmark, it is important to understand
what is meant by a concurrent user. Consider:
P = total population for an instance of ICO (including cloud administrators, end
users, etc.).
C = the concurrent user population for an instance of ICO. Concurrent users are
considered to be the set of users within the overall population P that are actively
managing the cloud environment at a point in time (e.g. administrator operations in
the User Interface, provisioning operations, etc.).
In general, P is a much larger value than C (i.e. P >> C). For example, it is not unrealistic
that a total population of 200 users may have a concurrent user population of 40 users (i.e.
20%).
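As a rough illustration of the relationship between P and C, the following Python sketch estimates the concurrent population from a total population and a concurrency ratio. The function name is hypothetical, and the 20% ratio is simply the example value quoted above, not a sizing rule.

```python
import math


def concurrent_users(total_population: int, concurrency_ratio: float) -> int:
    """Estimate C from P for an assumed concurrency ratio (0.0 to 1.0).

    The result is rounded up so the planned capacity is not underestimated.
    """
    if total_population < 0:
        raise ValueError("total_population must be non-negative")
    if not 0.0 <= concurrency_ratio <= 1.0:
        raise ValueError("concurrency_ratio must be between 0.0 and 1.0")
    # round() first so floating point noise does not push ceil() up by one
    return math.ceil(round(total_population * concurrency_ratio, 9))


if __name__ == "__main__":
    # Example from the text: P = 200 users, 20% concurrently active
    print(concurrent_users(200, 0.20))
```

The ratio itself should come from observation of the actual user community; the sketch only makes the arithmetic explicit.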
For the concurrent user workload driven for ICO, there are three sets of criteria that drive
the benchmark:
1. Load driving parameters.
2. Data population.
3. Load driving (user) scenarios.
Load Driving Parameters
The following load driving parameters apply.
1. User transaction rate control.
The frequency that simulated users drive actions against the back end is managed
via loop control functions. Closed loop simulation approaches are used where a
new user will enter the system only when a previous user completes. Through the
closed loop system, steady state operations under load may be driven.
2. Think times.
Think times are the “pause” between user operations, meant to simulate the
behavior of a human user. The think time interval used is [100%,300%] (meaning
each pause replayed by the load driver lasts between one and three times the
pause captured in the scenario recording).
3. Bandwidth throttling.
In order to simulate low speed or high latency lines, bandwidth throttling is
employed for some client workloads. The throttle is set to a value that represents
a moderate speed ADSL connection (cable/DSL simulation setting of 1.5 Mbps
download, 384 Kbps upload).
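The closed loop and think time parameters can be sketched as a small simulation. This is a minimal illustration and not the load driver used for the benchmarks: it assumes that the [100%,300%] interval scales each recorded pause by a random factor between 1 and 3, and the function names, service time, and recorded pause are hypothetical values chosen for the example.

```python
import random


def replay_think_time(recorded_seconds, rng, low=1.0, high=3.0):
    """Scale a recorded pause by a random factor in [low, high] (100% to 300%)."""
    return recorded_seconds * rng.uniform(low, high)


def simulate_closed_loop(users, duration_s, service_s, recorded_think_s, seed=0):
    """Count user cycles completed within duration_s in a closed loop.

    Each of the `users` slots runs its cycles back to back: a new cycle
    starts in a slot only after the previous one completes, so the number
    of active simulated users stays constant (steady state under load).
    """
    rng = random.Random(seed)
    completed = 0
    for _ in range(users):
        clock = 0.0
        while True:
            clock += service_s + replay_think_time(recorded_think_s, rng)
            if clock > duration_s:
                break
            completed += 1
    return completed


if __name__ == "__main__":
    # 40 concurrent users for one hour, 10 s of work and a 5 s recorded pause per cycle
    print(simulate_closed_loop(40, 3600, 10.0, 5.0))
```

Because the population is fixed, throughput in such a model is bounded by the cycle time per user rather than by an external arrival rate, which is what makes steady state measurements repeatable.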
Data Population Parameters
The benchmark is run against a data model that represents a large scale customer
environment. The following table shows a sample configuration where the system is
populated with data to represent a large number of users, active Virtual System instances,
and active Virtual Machines existing prior to ICO installation. Through this approach, the
workload for managing the solution is representative of some customer environments.
Benchmark Parameter          Value
Cloud Administrators         1
Cloud Domains                11
Tenants                      200
Users                        1000
Hypervisor Types             2 (KVM, VMware)
Cloud Groups                 1
Environment Profile          1
Image Templates              40 (20 Linux, 20 Windows)
vSys Patterns                20 + 1 (20 Linux vSys patterns, 1 bulk Windows pattern)
vApp Patterns                1 (LAMP vApp for VMware domain)
Flavors                      5 (1 flavor for RHEL, 3 flavors for Windows, 1 flavor for vApp)
Active vSys instances        20 (1 per Linux vSys Pattern)
Standalone (Unmanaged) VMs   400 (10 per image template: 200 Linux, 200 Windows)
Figure 6: Benchmark Data Model Population
Load Driving (User) Scenarios
The concurrent user population (i.e. C) is broken down into the following user profile
distribution and scenarios.
User Profile 1
Number of Users: 20 (50% overall)
User Type: End User
Task Type: VM Provisioning
Activity: vSys with single part (Linux) provisioning through Self-Service Catalog (SSC)
offering on VMware.
Scenario per User:
1. Login.
2. Provision vSys single part using SSC offering.
3. Wait until available.
4. Go to the vSys instance details page.
5. Delete vSys using SSC offering.
6. Wait until deletion complete.
7. Logout.
8. Enter next cycle according to arrival rate.

User Profile 2
Number of Users: 16 (40% overall)
User Type: End User
Task Type: User Management
Activity: End user operations through Self-Service Catalog (SSC) offering.
Scenario per User:
1. Login.
2. Submit SSC offering "Create User in VM", selecting one of the VMs belonging to one
of the pre-populated vSys.
3. Wait until done.
4. Submit SSC offering "Delete User in VM", selecting the same VM.
5. Wait until done.
6. Logout.
7. Enter next cycle according to arrival rate.

User Profile 3
Number of Users: 2 (5% overall)
User Type: Administrator
Task Type: Monitoring
Activity: Administrative operations through the IBM Workload Deployer user interface.
Scenario per User:
1. Login.
2. List hypervisors.
3. Select a hypervisor.
4. List VMs in hypervisor.
5. Show all instances.
6. Go to "My Requests".
7. Sort the requests by status.
8. View the trace log.
9. Logout.

User Profile 4
Number of Users: 1 (2.5% overall)
User Type: End User
Task Type: Provisioning
Activity: vApp (LAMP) provisioning through IBM Workload Deployer user interface on VMware.
Scenario per User:
1. Login.
2. Provision vApp using the IWD UI.
3. Wait until available.
4. Stop vApp using the IWD UI.
5. Wait until done.
6. Delete vApp using the IWD UI.
7. Wait until deletion complete.
8. Logout.
9. Enter next cycle according to arrival rate.

User Profile 5
Number of Users: 1 (2.5% overall)
User Type: End User
Task Type: Provisioning
Activity: vSys with multiple parts (bulk Windows) provisioning through Self-Service Catalog
(SSC) offering on VMware.
Scenario per User:
1. Login.
2. Provision vSys bulk Windows using SSC offering.
3. Wait until available.
4. Go to vSys instance details page.
5. Delete vSys bulk Windows using SSC offering.
6. Wait until deleted.
7. Logout.
8. Enter next cycle according to arrival rate.
Figure 7: Load Driving (User) Scenarios
In overall terms, 55% of the load driving activities are driving Virtual Machine provisioning
scenarios. The remaining 45% of scenarios are general administration and management
tasks. For the active workload, the user operations meet the following response time
thresholds.
Administrative page response times: 90% of pages < 10s, 100% of pages < 15s.
End user operations: 90% of pages < 2s, 100% of pages < 5s.
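The response time thresholds above can be checked mechanically against measured page times. The following Python sketch is one possible way to do so; the function name and the sample values are hypothetical, and only the threshold values come from the text.

```python
def meets_thresholds(response_times_s, p90_limit_s, max_limit_s):
    """Return True if at least 90% of samples are below p90_limit_s
    and every sample is below max_limit_s."""
    if not response_times_s:
        raise ValueError("at least one response time sample is required")
    under_p90 = sum(1 for t in response_times_s if t < p90_limit_s)
    # Integer comparison avoids floating point rounding at exactly 90%
    enough_fast_pages = under_p90 * 10 >= 9 * len(response_times_s)
    return enough_fast_pages and max(response_times_s) < max_limit_s


if __name__ == "__main__":
    # Hypothetical end user samples, checked against the 2 s / 5 s thresholds
    end_user_samples = [0.4, 0.6, 0.8, 1.1, 0.9, 1.5, 0.7, 1.9, 1.2, 4.2]
    print(meets_thresholds(end_user_samples, p90_limit_s=2.0, max_limit_s=5.0))
    # Administrative pages would use p90_limit_s=10.0 and max_limit_s=15.0
```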
3.2.2 Provisioning Performance
Cloud provisioning is enormously complex in performance terms. Hardware configuration,
user workloads, image properties, and a multitude of other factors combine to determine
overall capability. ICO provisioning performance is typically measured via a closed
system, defined as an isolated system where we can demonstrate a constant sustained
provisioning workload. In order to achieve this, as requests complete within the system,
new requests are initiated.
Figure 8: Provisioning Performance in a Closed System
The performance systems running ICO workloads literally run for months. These systems
are treated like customer systems with 24x7 operations and field ready maintenance
approaches in place (as described later in this paper). In terms of provisioning
performance, the following are sample statistics from a long run scenario
driven for a number of weeks, once a period of operational stability has been reached
based on the recommendations provided in this paper.
Number of systems provisioned: > 1,000,000 VMs.
Provisioning rate (average): > 400 VMs/hour.
Service times (average): 2 minutes 38 seconds (VMware non linked clones).
Workflow capability: On the order of 300 workflows per hour (generally short
running workflows under a minute in duration).
Success rate: > 99.99%
Given this is a sustained, continuous workload, higher peak workloads are, of course,
possible. The success rate is considered especially noteworthy.
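For a closed system in steady state, Little's Law relates the sustained provisioning rate and the average service time to the average number of requests in flight. The Python sketch below applies it to the sample figures above (400 VMs/hour and 2 minutes 38 seconds). The result, roughly 17 to 18 provisioning requests in flight, is an illustrative estimate derived here and not a measurement from the benchmark; it also assumes the quoted service time is representative of all provisioned VMs.

```python
def average_in_flight(rate_per_hour: float, service_time_s: float) -> float:
    """Little's Law: L = arrival rate x time in system.

    rate_per_hour is converted to requests per second before multiplying
    by the average service time in seconds.
    """
    if rate_per_hour < 0 or service_time_s < 0:
        raise ValueError("rate and service time must be non-negative")
    return rate_per_hour / 3600.0 * service_time_s


if __name__ == "__main__":
    service_time_s = 2 * 60 + 38  # 2 minutes 38 seconds
    print(round(average_in_flight(400, service_time_s), 1))
```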
4 Performance Benchmark Approaches
As part of cloud management and capacity planning, it is valuable to manage cloud
benchmarks. Value propositions include:
Understanding the capability of the cloud infrastructure (and potentially poorly
configured or underperforming components of the infrastructure).
Understanding the base capability of the ICO implementation and associated
customization.
Understanding the long term performance stability of the system.
We will describe basic system monitoring approaches, infrastructure benchmarks, and
cloud benchmarks.
4.1 Monitoring and Analysis Tools
The following table shows the core recommended monitoring and analysis tools.
Tool
Description
pdcollect
ICO log collection tool.
Documentation and recommended invocation: ICO Product Knowledge Center
esxtop
VMware performance collection tool.
Documentation: URL
Recommended invocation: esxtop -b -a -d 60 -n <number_of_samples> > <output file>
nmon
nmon is a comprehensive system monitoring tool for the UNIX platform. It is
highly useful for understanding system behavior.
Documentation: URL
Sample invocation: nmon -T -s <samplerate> -c <iterations> -F <output file>
Note: On Windows systems, Windows perfmon may be used.
db2support
Database support collection tool.
Documentation: URL
Recommended invocation: db2support <result directory> -d <database> -c -f -s -l
DBMS Snapshots
DBMS snapshot monitoring can offer insight into SQL workload, and in particular
expensive SQL statements.
Documentation: URL
WAIT
Java WAIT monitoring can provide a non-invasive view of JVM performance
through accumulated Java cores and analytic tools.
Documentation and recommended invocation: URL
Figure 9: Monitoring and Analysis Tools
4.2
Infrastructure Benchmark Tools
The following table shows some recommended infrastructure benchmark tools.
Tool
Description
iometer
I/O subsystem measurement and characterization tool for single and clustered
systems.
Documentation: URL
Recommended invocation: dynamo /m <client host name or ip>
iperf
TCP and UDP measurement and characterization tool that reports bandwidth,
delay, jitter, and datagram loss.
Documentation: URL
Recommended server invocation: iperf -s
Recommended client invocation #1: iperf -c <server host name or ip>
Recommended client invocation #2: iperf -c <server host name or ip> -R
UnixBench
UNIX measurement and characterization tool, with reference benchmarks and
evaluation scores.
Documentation: URL
Recommended invocation: ./Run
Figure 10: Infrastructure Benchmark Tools
4.3
Cloud Benchmarks
Cloud benchmarks should be based on enterprise utilization. Sample benchmarks that are
easy to manage include the following.
1. Single VM deployment times.
2. Small scale concurrent VM deployment times (e.g. 10 requests in parallel).
3. REST API response times.
It is recommended to establish a small load driver, record a baseline, and then use these
small benchmarks as a standard to assess ongoing cloud health. More complex
benchmarks, including client request monitoring approaches, may of course be
established.
For OpenStack specific benchmarks, OpenStack Rally may be leveraged (see the
References section for further detail). In addition, the Open Systems Group is involved in
cloud computing benchmark standards. A report, including the IBM CloudBench tool, is
available in the References section.
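The "small load driver plus baseline" approach described above can be sketched as follows. This is an illustrative outline and not an ICO tool: the function names, the 20% tolerance, and the placeholder operation are assumptions. In practice the operation would issue a REST request or a VM deployment against the cloud under test.

```python
import statistics
import time


def measure(operation, samples=10):
    """Run operation() repeatedly and return the elapsed seconds per call."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return timings


def compare_to_baseline(timings, baseline_median, tolerance=0.20):
    """Return (median, healthy). healthy is False when the median exceeds
    the recorded baseline by more than the tolerance (assumed 20%)."""
    median = statistics.median(timings)
    return median, median <= baseline_median * (1.0 + tolerance)


if __name__ == "__main__":
    # Placeholder operation standing in for a REST call or VM deployment.
    def sample_operation():
        time.sleep(0.01)

    median, healthy = compare_to_baseline(
        measure(sample_operation, samples=5), baseline_median=0.05
    )
    print(f"median={median:.3f}s healthy={healthy}")
```

Recording the median rather than the mean keeps a single slow request from distorting the health assessment. Either statistic may be used as long as the baseline and the ongoing measurements use the same one.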
5
Capacity Planning Recommendations
We will provide capacity planning recommendations through three approaches.
Static planning via a spreadsheet approach.
Capacity planning for the ICO management server (aka the “managed from”
infrastructure).
Capacity planning for the provisioned Virtual Machines (aka the “managed to”
infrastructure).
5.1
Cloud Capacity Planning Spreadsheet
In order to provide a desired hardware and software configuration for an ICO
implementation, a wide range of parameters must be understood. The following questions
are usually relevant.
1. What operations are expected to be performed with ICO?
2. What are the average and peak concurrent user workloads?
3. What is the enterprise network topology?
4. What is the expected workload for provisioned virtual servers, and how do they
map to the physical configuration?
5. For the provisioned servers:
a. What is the distribution size?
b. What are the application service level requirements?
A capacity planning spreadsheet is attached to this paper (“ICO Capacity Planning Profile
v2.4.0.xlsx”). The spreadsheet may be used to provide a cloud profile for further sizing
activities (e.g. a capacity planning activity in association with the document authors).
5.2
IBM Cloud Orchestrator Management Server Capacity Planning
The ICO management server requirements are documented in the ICO Knowledge Center
(URL). The summary table is repeated here for discussion purposes.
Server & Configuration                        Processor (vCPUs)  Memory (GB)  Free Storage (GB)
Deployment Server                Minimum      1                  4            117
                                 Recommended  2                  8            117
Central Server 1                 Minimum      2                  6            100
                                 Recommended  4                  12           200
Central Server 2                 Minimum      2                  8            50
                                 Recommended  6                  12           200
Central Server 3                 Minimum      2                  6            146
                                 Recommended  4                  8            160
SA Application Manager           Minimum      n/a                n/a          n/a
                                 Recommended  4                  8            20
Neutron Network Server           Minimum      2                  4            32
                                 Recommended  4                  8            32
Region Server: VMware            Minimum      2                  8            77
                                 Recommended  8                  8            160
Region Server: KVM, Power, z/VM  Minimum      2                  4            77
                                 Recommended  8                  8            160
KVM Compute Node                 Minimum      4                  32           160
                                 Recommended  Application Specific
Figure 11: ICO Management Server Capacity Planning
While further qualifiers are available in the Knowledge Center, some comments apply.
In general, the recommended vCPU and memory allocations should be met.
To determine the ratio of virtual to physical CPUs, monitoring of the production
system is required. For performance verification, a 1:1 mapping is used.
For the physical mapping, it is important to distinguish between “real” cores and
hyper threaded (HT) cores. External benchmarks suggest an HT core may yield
30% of the capability of a “real” core.
The recommended storage amounts are highly subjective. For example, the
minimum recommendations are sufficient for performance verification systems
driven for months (with some minor exceptions).
5.3
Provisioned Virtual Machines Capacity Planning
Managing cloud workloads is typically driven as a categorization exercise where workload
“sizes” are used to determine the overall capacity requirements. A capacity planning tool is
available for managing the cloud workload sizes (URL). We will provide an overview of
using this tool.
The first step is to provide any relevant business value. In the absence of a defined
opportunity, simple “not applicable” entries may be given (per the sample below). Once
submitted, you must accept the usage agreement which will bring up the demographic
page.
Figure 12: Capacity Planning Tool: Inquiry Form
The demographic page simply asks for generic information about the submitter.
Figure 13: Capacity Planning Tool: User Demographic Information
When “Continue” is selected, then the systems and storage page is provided.
Figure 14: Capacity Planning Tool: Systems and Storage
Then the target system and associated utilization and Virtual Machine requirements are
selected. Note for the utilization we select 20% headroom to support peak cloud
workloads.
Figure 15: Capacity Planning Tool: System and Workload Options
At this point, the virtual machine requirements may be selected. Note a number of entries
may be added.
Figure 16: Capacity Planning Tool: Virtual Machine Requirements
A confirmation screen is then provided to finalize the capacity planning request.
Figure 17: Planning Tool: Confirmation Screen
The summary capacity planning recommendation is then provided. The summary details
the compute node, CPU, memory, and storage requirements based on the selected
configuration and associated workloads.
Figure 18: Planning Tool: System Summary
6
Cloud Configuration Recommendations
The ICO 2.4 offering provides suitable configuration as part of the default installation.
However, there are some specific configuration aspects that may improve the capability.
The configuration points follow.
1. OpenStack Keystone worker support.
2. Disabling the IWD service.
3. IBM Workload Deployer configuration.
4. Virtual Machine IO scheduler.
5. Advanced Configuration and Power Interface (ACPI) management.
6. Java Virtual Machine heap.
7. Database configuration.
8. Database management.
6.1
OpenStack Keystone Worker Support
The initial SCO 2.3 offering contained a Keystone implementation that is characterized by
a single execution thread instance. For ICO 2.4, improvements have been made to exploit
multiple concurrent Keystone workers. This change offers advantages when Keystone
exhibits high request latency, or is seen to consume a significant amount of a virtual CPU
(e.g. > 80%). In order to exploit this support, it is necessary to revise the configuration to
exploit multiple workers. Further detail on this is provided below.
With the Keystone Worker improvement in place, the following configuration change will
allow a pool of four public workers, and four administrative workers. This will permit
increased concurrency, at the expense of virtual CPU consumption. As a result, the virtual
CPU allocation should be increased based on monitoring data. In the “4+4” worker
example below, it is expected to increase the virtual CPU allocation on the order of two to
four virtual CPUs.
Location: (Central Server 2)
/etc/keystone/keystone.conf
# The number of worker processes to serve the public WSGI application
# (integer value).
public_workers=4
# The number of worker processes to serve the admin WSGI application
#(integer value).
admin_workers=4
Figure 19: Keystone Worker Configuration
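Once the change is made, the configured worker counts can be verified programmatically. The following is a minimal sketch, assuming the two options reside in the [DEFAULT] section of keystone.conf (the section is not shown in the figure above); the function name is hypothetical.

```python
import configparser


def worker_counts(conf_text):
    """Return (public_workers, admin_workers) from keystone.conf text.

    Assumes the options live in the [DEFAULT] section; a value of None
    means the option is not set.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    defaults = parser["DEFAULT"]
    public = defaults.get("public_workers")
    admin = defaults.get("admin_workers")
    return (int(public) if public else None, int(admin) if admin else None)


if __name__ == "__main__":
    sample = "[DEFAULT]\npublic_workers=4\nadmin_workers=4\n"
    print(worker_counts(sample))
```

On a live system the text would be read from /etc/keystone/keystone.conf on Central Server 2.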
6.2
Disabling the IWD Service
The IWD service consumes significant resources across the ICO management stack. In
the event the service is not required, it should be disabled.
6.3
IBM Workload Deployer Configuration
The IWD component offers a number of configuration options. One specific option
provides the ability to control a polling interval to refresh cloud information. Based on the
size of the cloud, this configuration option should be changed.
Location: (Central Server 3)
/opt/ibm/rainmaker/purescale.app/private/expanded/ibm/rainmaker.vmsupport4.0.0.1/config/vmpublish.properties
Original: RuntimeInterval=12000
Recommended: RuntimeInterval=30000
Figure 20: IWD Configuration
6.4
Virtual Machine IO Scheduler Configuration
Each Linux instance has an IO scheduler. The intent of the IO scheduler is to optimize IO
performance, potentially by clustering or sequencing requests to reduce the physical
impact of IO. In a virtual world, however, the operating system is typically disassociated
from the physical world through the hypervisor. As a result, it is recommended to alter the
IO scheduler algorithm so that it is more efficient in a virtual deployment, with scheduling
delegated to the hypervisor.
The default scheduling algorithm is typically “cfq” (completely fair queuing). Alternative and
recommended algorithms are “noop” and “deadline”. The “noop” algorithm, as expected,
does as little as possible with a first in, first out queue. The “deadline” algorithm is more
advanced, with priority queues and age as a scheduling consideration. System specific
benchmarks should be used to determine which algorithm is superior for a given workload.
In the absence of available benchmarks, we would recommend the “deadline” scheduler be
used.
The following console output shows how to display and modify the IO scheduler algorithm
for a set of block devices. In the example, the “noop” scheduler algorithm is set. Note to
ensure the scheduler configuration persists, it should be enforced via the operating system
configuration (e.g. /etc/rc.local).
Figure 21: Modifying the IO Scheduler
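On Linux the scheduler is typically exposed per block device through /sys/block/<device>/queue/scheduler, with the active algorithm shown in square brackets. The following sketch displays and sets the scheduler under that assumption. The function names are hypothetical and the device name is supplied by the caller.

```python
def active_scheduler(scheduler_text):
    """Return the scheduler shown in square brackets.

    For example, 'noop deadline [cfq]' yields 'cfq'. Returns None when no
    bracketed entry is present.
    """
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def show_scheduler(device):
    """Read the active scheduler for a block device such as 'sda'."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())


def set_scheduler(device, scheduler):
    """Set the scheduler for a block device (requires root privileges).

    The setting does not persist across reboots.
    """
    with open(f"/sys/block/{device}/queue/scheduler", "w") as handle:
        handle.write(scheduler)
```

As noted above, a change made this way is lost at reboot unless it is also applied through the operating system configuration.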
6.5
Advanced Configuration and Power Interface Management
The Advanced Configuration and Power Interface (ACPI) operating system support may
exhibit high virtual CPU utilization and offers limited value in virtual environments. It is
recommended to disable ACPI on the ICO “managed from” nodes through the following
steps.
1. Disabling “kacpid”.
To switch off the kernel ACPI daemon, edit “/etc/grub.conf” and append "acpi=off" to
the kernel boot command line. For example:
title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35
initrd /boot/initramfs-2.6.32-431.el6.x86_64.img
becomes:
title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35 acpi=off
initrd /boot/initramfs-2.6.32-431.el6.x86_64.img
2. Disabling the user-space acpi daemon.
To disable user space ACPI on managed-from nodes:
chkconfig acpid off
3. Reboot the nodes.
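Where a number of managed-from nodes must be updated, the kernel line edit in step 1 may be scripted. The following is a minimal sketch that operates on the text of a legacy GRUB configuration file in the format shown above. The function name is hypothetical, and the original file should be backed up before any rewritten content is saved.

```python
def add_acpi_off(grub_text):
    """Append acpi=off to each kernel line that does not already carry it."""
    updated = []
    for line in grub_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel ") and "acpi=off" not in stripped.split():
            line = line.rstrip() + " acpi=off"
        updated.append(line)
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    sample = (
        "title Red Hat Enterprise Linux\n"
        "    root (hd0,0)\n"
        "    kernel /boot/vmlinuz ro root=UUID=example\n"
        "    initrd /boot/initramfs.img\n"
    )
    print(add_acpi_off(sample), end="")
```

The function is idempotent: applying it to an already updated file leaves the content unchanged.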
6.6
Java Virtual Machine Heap Configuration
The default Java Virtual Machine (JVM) heap sizes are intended to be economical.
However, in the presence of sufficient available memory, it is recommended to increase
the heap allocation. The three change sets below are recommended for application. They
apply to Central Server 3 and, in particular, the IBM Workload Deployer instance. The IWD
instance should be restarted once the changes are complete.
Location: /opt/ibm/rainmaker/purescale.app/config/overrides.config
Original: /config/zso/jvmargs = ["-Xms1024M","-Xmx1024M"]
Recommended: /config/zso/jvmargs = ["-Xms1536M","-Xmx1536M"]
Location: /etc/rc.d/init.d/iwd-utils
Original: sed -i -e 's/3072M/1024M/g' $ZERO_DIR/config/overrides.config
Recommended: sed -i -e 's/3072M/1536M/g' $ZERO_DIR/config/overrides.config
Location: /opt/ibm/rainmaker/purescale.app/config/zero.config
Original: "-Xms1024M","-Xmx1024M"
Recommended: "-Xms1536M","-Xmx1536M"
Figure 22: Java Virtual Machine Heap Change Sets
6.7
Database Configuration
ICO is deployed with a DB2 database. The performance of the database is critical to the
overall capability of the solution. The following database configuration changes are
recommended for a base ICO 2.4 installation.
Type
Configuration
Configuration
For each relevant database set:
STMT_CONC = LITERALS
LOCKTIMEOUT = 60
NUM_IOCLEANERS = AUTOMATIC
NUM_IOSERVERS = AUTOMATIC
AUTO_REORG = ON
For example:
db2 UPDATE DB CFG FOR OPENSTAC USING LOCKTIMEOUT 60
Foreign Key
Modification
An OpenStack foreign key should be modified to enable cascading
deletes. Please apply the “ICO_MODIFY_FKEY.sh” script provided with
this paper.
Figure 23: Database Configuration Change Sets
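Because the settings apply to each relevant database, the commands may be generated rather than typed individually. The following sketch emits one command per database and setting, following the form of the LOCKTIMEOUT example in the table. The helper name is hypothetical, and only OPENSTAC is named in this paper; other database names must be supplied by the administrator.

```python
# Settings taken from the configuration table above.
DB_CFG_SETTINGS = [
    ("STMT_CONC", "LITERALS"),
    ("LOCKTIMEOUT", "60"),
    ("NUM_IOCLEANERS", "AUTOMATIC"),
    ("NUM_IOSERVERS", "AUTOMATIC"),
    ("AUTO_REORG", "ON"),
]


def db_cfg_commands(databases, settings=DB_CFG_SETTINGS):
    """Return one 'db2 UPDATE DB CFG' command per database and setting."""
    return [
        f"db2 UPDATE DB CFG FOR {database} USING {name} {value}"
        for database in databases
        for name, value in settings
    ]


if __name__ == "__main__":
    for command in db_cfg_commands(["OPENSTAC"]):
        print(command)
```

The generated commands are intended for review before being run by the DB2 instance owner.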
6.8
Database Management
Generally speaking, the “out of the box” database configuration will achieve good results
for both large and small installations. The following recommendations are primarily in the
area of database maintenance.
6.8.1 DBMS Versions
The following DBMS versions are recommended. All versions should be 64 bit.
Version: DB2 10.5 fp5 or later
Notes: The minimum recommended fixpack level is 10.5 fp3a.
Figure 24: DBMS Versions
6.8.2 Automatic Maintenance
DB2 offers a number of automatic maintenance options. Automatic statistics collection
(aka runstats) is considered a basic and necessary configuration setting, and is enabled for
the product by default. Two other recommended configuration settings follow. It is
expected these configuration settings will be enabled by default in future versions of the
products.
1. Real time statistics. The default runstats configuration generally collects statistics
at two hour intervals. The real time statistics option provides far more granular
statistics collection, essentially generating statistics as required at statement
compilation time.
2. Automatic reorganization. Many customers ignore database reorganization and
system performance starts to decline. This can be especially critical in the cloud
space. The recommendation is to enable automatic reorganization support so it is
self managed by the DBMS. Further discussion of database reorganization is
covered in section 6.9.3.
The following commands may be used to enable these automatic maintenance options. At
the time of this writing, they are conditionally recommended. Each of these options has
runtime impact and should be monitored to ensure there is no unnecessary system impact.
In order to facilitate this, they should only be enabled once the system has been
established and monitored. In addition, automatic reorganization is dependent on the
definition of a maintenance window (see the DB2 Knowledge Center for more detail).
update db cfg for OPENSTAC using AUTO_STMT_STATS ON
update db cfg for OPENSTAC using AUTO_REORG ON
Figure 25: Database Automatic Maintenance Configuration
6.8.3 Operating System Configuration (Linux)
The product installation guides have comprehensive instructions for Operating System prerequisites and configuration. However, on Linux systems improper configuration is
common, so we will highlight specific issues.
The first configuration point to check is the file system ulimit for the maximum number of
open files allowed for a process (i.e. nofiles). The value for this kernel limit should be either
“unlimited” or “65536”. The DB2 reference for this configuration setting is available here.
In addition, the kernel semaphore and message queue specifications should be correct.
These configuration settings are a function of the physical memory available on the
machine. The DB2 reference for these configuration settings is available here.
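The open files limit can be checked from within a process using the Python standard library. The following is a minimal sketch for Linux and UNIX systems. It reports the limit of the process that runs it, so it should be run as the DB2 instance owner. Treating any value of 65536 or higher as acceptable is an assumption; the function name is hypothetical.

```python
import resource


def nofile_ok(soft_limit, required=65536):
    """True when the limit is unlimited or at least the required value."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"nofile soft={soft} hard={hard} ok={nofile_ok(soft)}")
```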
6.9
Database Hygiene Overview
This section describes the following database hygiene steps:
1. Database backup management.
2. Database statistics management.
3. Database reorganization.
4. Database archive management.
5. Database maintenance automation.
The steps make reference to recommended scheduling frequencies. The general purpose
“cron” scheduling utility may be used to achieve this, although other scheduling utilities
may also be used. The key aspect of a scheduled activity is that it runs at regular
intervals (e.g. nightly, weekly) and typically does not require operator intervention.
Designated maintenance windows may be used for these activities.
6.9.1 Database Backup Management
It is recommended that nightly database backups be taken. The following figures offer a
sample database offline backup (utilizing compression), along with a sample restore.
backup db <dbname> user <user> using <password> to <backup directory> compress
Figure 26: Database Backup with Compression Command
restore db <dbname> from <backup directory> taken at <timestamp> without prompting
Figure 27: Database Offline Backup Restore
Online backups may be utilized as well. The following figure provides commands that
comprise a sample weekly schedule. With the given schedule, the best case scenario is a
restore requiring one image (a Monday failure using the Sunday night backup).
The worst case scenario would require four images (Sunday + Wednesday + Thursday +
Friday). An alternate approach would be to take an incremental (non-delta) backup each
night, making the worst case scenario two images. The tradeoffs for the backup approaches
are the time to take the backup, the amount of disk space consumed, and the restore
dependencies. A best practice can be to start with nightly full online backups, and
introduce incremental backups if time becomes an issue.
(Sun) backup db <dbname> online include logs use tsm
(Mon) backup db <dbname> online incremental delta use tsm
(Tue) backup db <dbname> online incremental delta use tsm
(Wed) backup db <dbname> online incremental use tsm
(Thu) backup db <dbname> online incremental delta use tsm
(Fri) backup db <dbname> online incremental delta use tsm
(Sat) backup db <dbname> online incremental use tsm
Figure 28: Database Online Backup Schedule
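The restore dependencies implied by this schedule can be worked out mechanically. The following sketch models the weekly schedule (full on Sunday, incremental on Wednesday and Saturday, incremental delta otherwise) and returns the images needed to restore a given night's backup. The function name and data layout are hypothetical; the sketch is illustrative and does not invoke DB2.

```python
# Weekly schedule from the figure above, in order.
SCHEDULE = [
    ("Sun", "full"),
    ("Mon", "delta"),
    ("Tue", "delta"),
    ("Wed", "incremental"),
    ("Thu", "delta"),
    ("Fri", "delta"),
    ("Sat", "incremental"),
]


def restore_chain(target_day, schedule=SCHEDULE):
    """Return the backup images (oldest first) needed to restore target_day.

    Assumes the schedule starts with a full backup.
    """
    index = [day for day, _ in schedule].index(target_day)
    chain = []
    while True:
        day, kind = schedule[index]
        chain.append(day)
        if kind == "full":
            break
        if kind == "incremental":
            # Incremental covers all changes since the last full backup.
            index -= 1
            while schedule[index][1] != "full":
                index -= 1
        else:
            # Delta covers changes since the previous backup of any type.
            index -= 1
    return list(reversed(chain))


if __name__ == "__main__":
    for day, _ in SCHEDULE:
        print(day, restore_chain(day))
```

For the Friday backup the chain is Sunday, Wednesday, Thursday, and Friday, which matches the four image worst case described above.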
Note to enable incremental backups, the database configuration must be updated to track
page modifications, and a full backup taken in order to establish a baseline.
update db cfg for OPENSTAC using TRACKMOD YES
Figure 29: Database Incremental Backup Enablement
To restore the online backups, either a manual or automatic approach may be used. For
the manual approach, you must start with the target image, and then revert to the oldest
relevant backup and move forward to finish with the target image. A far simpler approach
is to use the automatic option and let DB2 manage the images. A sample of each
approach is provided below, showing the restore based on the Thursday backup.
restore db <dbname> incremental use tsm taken at <Thursday incremental delta timestamp>
restore db <dbname> incremental use tsm taken at <Sunday full timestamp>
restore db <dbname> incremental use tsm taken at <Wednesday incremental timestamp>
restore db <dbname> incremental use tsm taken at <Thursday incremental delta timestamp>
Figure 30: Database Online Backup Manual Restore
restore db <dbname> incremental auto use tsm taken at <Thursday incremental delta timestamp>
Figure 31: Database Online Backup Automatic Restore
In order to support online backups, archive logging must be enabled. The next subsection
provides information on archive logging, including the capability to restore to a specific
point in time using a combination of database backups and archive logs.
Database Log Archiving
A basic approach we will advocate is archive logging with the capability to support online
backups. The online backups themselves may be full, incremental (changes since the last
full backup), or incremental delta (changes since the most recent backup of any type). In
order to enable log archiving to a location on disk, the following command may be used.
update db cfg for <dbname> using logarchmeth1 DISK:/path/logarchive
Figure 32: Database Log Archiving to Disk
Alternatively, in order to enable log archiving to TSM, the following command may be
used (see footnote 1).
update db cfg for <dbname> using logarchmeth1 TSM
Figure 33: Database Log Archiving to TSM
Note that a “logarchmeth2” configuration parameter also exists. If both of the log archive
method parameters are set, each log file is archived twice (once per log archive method
configuration setting). This will result in two copies of archived log files in two distinct
locations (a useful feature based on the resiliency and availability of each archive location).
Once the online backups and log archive(s) are in effect, the recovery of the database may
be performed via a database restore followed by a roll forward through the logs. Several
restore options have been previously described. Once the restore has been completed,
roll forward recovery must be performed. The following are sample roll forward operations.
rollforward db <dbname> to end of logs
Figure 34: Database Roll Forward Recovery: Sample A
rollforward db <dbname> to 2012-02-23-14.21.56 and stop
Figure 35: Database Roll Forward Recovery: Sample B
It is worth noting the second example recovers to a specific point in time. For a
comprehensive description of the DB2 log archiving options, the DB2 Knowledge Center
should be consulted (URL). A service window (i.e. stop the application) is typically
required to enable log archiving.
Database Backup Cleanup
Unless specifically pruned, database backups may accumulate and cause issues with disk
utilization or, potentially, a stream of failed backups. If unmonitored backups begin to fail, it
may make disaster recovery near impossible in the event of a hardware or disk failure. A
simple manual method to prune backups follows.
find /backup/DB2 -type f -mtime +7 -exec rm -f {} +
Figure 36: Database Backup Cleanup Command
A superior approach is to let DB2 automatically prune the backup history and delete your
old backup images and log files. A sample configuration is provided below.
update db cfg for OPENSTAC using AUTO_DEL_REC_OBJ ON
update db cfg for OPENSTAC using NUM_DB_BACKUPS 21
update db cfg for OPENSTAC using REC_HIS_RETENTN 180
Figure 37: Database Backup Automatic Cleanup Configuration
It is also generally recommended to have the backup storage independent from the
database itself. This provides a level of isolation in the event volume issues arise (e.g. it
ensures that a backup operation will not fill the volume hosting the tablespace containers,
which could possibly lead to application failures).
Footnote 1: The log archive methods (logarchmeth1, logarchmeth2) have the ability to associate
configuration options with them (logarchopt1, logarchopt2) for further customization.
6.9.2 Database Statistics Management
As discussed in the previous “Automatic Maintenance” section, database statistics ensure
that the DBMS optimizer makes wise choices for database access plans. The DBMS is
typically configured for automatic statistics management. However, it may often be wise to
force statistics as part of a nightly or weekly database maintenance operation. A simple
command to update statistics for all tables in a database is the “reorgchk” command.
reorgchk update statistics on table all
Figure 38: Database Statistics Collection Command
One issue with the reorgchk command is it does not enable full control over statistics
capturing options. For this reason, it may be beneficial to perform statistics updates on a
table by table level. However, this can be a daunting task for a database with hundreds of
tables. As a result, the following SQL statement may be used to generate administration
commands on a table by table basis.
select 'runstats on table ' || STRIP(tabschema) || '.' || tabname ||
  ' with distribution and detailed indexes all;'
from SYSCAT.TABLES
where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 39: Database Statistics Collection Table Iterator
6.9.3 Database Reorganization
Over time, the space associated with database tables and indexes may become
fragmented. Reorganizing the table and indexes may reclaim space and lead to more
efficient space utilization and query performance. In order to achieve this, the table
reorganization command may be used. Note, as discussed in the previous “Automatic
Maintenance” section, automatic database reorganization may be enabled to reduce the
requirement for manual maintenance.
The following commands are examples of running a “reorg” on a specific table and its
associated indexes. Note the “reorgchk” command previously demonstrated also provides a
per table indicator of which tables require a reorg. Using the “reorgchk” results, per
table reorganization may be performed for optimal database space management and usage.
reorg table <table name> allow no access
reorg indexes all for table <table name> allow no access
Figure 40: Database Reorganization Commands
It is important to note there are many options and philosophies for doing database
reorganization. Every enterprise must establish its own policies based on usage, space
considerations, performance, etc. The above example is an offline reorg. However it is
possible to also do an online reorg via the “allow read access” or “allow write access”
options. The “notruncate” option may also be specified (indicating the table will not be
truncated in order to free space). The “notruncate” option permits more relaxed locking and
greater concurrency (which may be desirable if the space usage is small or will soon be
reclaimed). If full online access during a reorg is required, the “allow write access” and
“notruncate” options are both recommended.
Note it is also possible to use our table iteration approach to do massive reorgs across
hundreds of tables as shown in the following figure. The DB2 provided snapshot routines
and views (e.g. SNAPDB, SNAP_GET_TAB_REORG) may be used to monitor the status
of reorg operations.
select 'reorg table ' || STRIP(tabschema) || '.' || tabname ||
  ' allow no access;'
from SYSCAT.TABLES
where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
select 'reorg indexes all for table ' || STRIP(tabschema) || '.' || tabname ||
  ' allow no access;'
from SYSCAT.TABLES
where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 41: Database Reorganization Table Iterator
7
Summary Cookbook
The following tables provide a cookbook for the solution implementation. The cookbook
approach implies a set of steps the reader may “check off” as completed to provide a
stepwise implementation of the ICO solution. The recommendations will be provided in
three basic steps:
1. Base installation recommendations.
2. Post installation recommendations.
3. High scale recommendations.
All recommendations are provided in tabular format. The preferred order of implementing
the recommendations is in order from the first row of the table through to the last.
7.1
Base Installation Recommendations
The base installation recommendations are considered essential to a properly functioning
ICO instance. All steps should be implemented.
Identifier  Description                                                                Status
B1          Perform the base ICO installation, ensuring the recommended
            configuration described in Section 5.2 is achieved.
            A central DB2 server should be used (i.e. the region servers should not
            manage a local DBMS unless there are compelling geographic
            considerations). Where possible it is recommended to install the DBMS
            on bare metal, or in a DBA managed pool, to facilitate performance
            management.
B2          Enable the OpenStack Keystone worker support (Section 6.1).
B3          Disable the IWD service, if possible (Section 6.2).
B4          If the IWD service is required, optimize the IWD component (Section 6.3).
B5          Configure the Linux IO scheduler (Section 6.4).
B6          Disable the ACPI management (Section 6.5).
B7          Ensure the Java heaps are optimized (Section 6.6).
B8          Configure the central database (Section 6.7).
Figure 42: Base Installation Recommendations
7.2
Post Installation Recommendations
The post installation recommendations will provide additional throughput and superior
functionality. All steps should be implemented.
Identifier  Description                                                                Status
P1          Perform a set of infrastructure and ICO benchmarks to determine the
            viability of the installation (see Sections 4.2 and 4.3).
P2          Implement the database statistics maintenance activity per Section 6.9.2.
P3          Implement the database reorg maintenance activity per Section 6.9.3.
P4          Implement a suitable backup and disaster recovery plan comprising
            regular backups of all critical server components (including the database
            and relevant file system objects). Guidelines are provided in the ICO
            Knowledge Center (URL).
Figure 43: Post Installation Recommendations
7.3
High Scale Recommendations
The high scale recommendations should be incorporated once the production installation
needs to support its peak (high water mark) scalability. The steps are optional and may be
implemented over time based upon workload.
Identifier  Description                                                                Status
S1          Apply the latest ICO fixpack.
S2          Monitor the performance of the installation (Section 4.1) and adjust the
            management server to the recommended installation values (Section 5.2)
            as appropriate.
S3          Optimize Central Server 1 (DBMS) performance. A basic way to achieve
            this is to have dedicated, high performance storage allocated to the
            database containers and logs.
Figure 44: High Scale Recommendations
APPENDIX A: IBM CLOUD ORCHESTRATOR MONITORING OPTIONS
Monitoring is important to understand and ensure the health of any cloud solution. A
number of monitoring approaches are available for ICO. The solutions are described via
the following summary sections, broken down into three categories.
1. OpenStack monitoring via Ceilometer.
2. ICO monitoring via IBM BPM.
3. Infrastructure monitoring via IBM Tivoli Monitoring (ITM) and third party solutions.
A separate appendix is provided that is specific to OpenStack Keystone monitoring.
A.1 OpenStack Monitoring
OpenStack monitoring is provided via the Ceilometer component. Ceilometer offers a
comprehensive and customizable infrastructure, including support for event and threshold
management. Note that while Ceilometer is not enabled as part of the base ICO 2.4
distribution, it is a constituent of the OpenStack Grizzly base, with continued enhancement
in subsequent OpenStack releases.
Ceilometer provides three distinct types of metrics:
1. Cumulative: counters that accumulate or increase over time.
2. Gauge: counters that offer discrete, point in time values.
3. Delta: differential counters showing change rates.
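To make the distinction between the three types concrete, the following minimal sketch derives delta values and per-second rates from a series of cumulative samples. It is not part of Ceilometer; the function names and the sample values are illustrative assumptions. A gauge needs no such conversion, because each of its samples already stands on its own.

```python
from typing import List, Tuple

# Hypothetical sample layout: (timestamp in seconds, counter value).
Sample = Tuple[float, float]


def cumulative_to_deltas(samples: List[Sample]) -> List[Sample]:
    """Return the change in value between each pair of consecutive samples."""
    deltas = []
    for (_, v0), (t1, v1) in zip(samples, samples[1:]):
        deltas.append((t1, v1 - v0))
    return deltas


def cumulative_to_rates(samples: List[Sample]) -> List[Sample]:
    """Return the per-second rate of change between consecutive samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed > 0:  # skip duplicate or out-of-order timestamps
            rates.append((t1, (v1 - v0) / elapsed))
    return rates


if __name__ == "__main__":
    # Illustrative cumulative byte counter sampled every 10 seconds.
    cumulative = [(0.0, 1000.0), (10.0, 1500.0), (20.0, 1800.0)]
    print(cumulative_to_deltas(cumulative))
    print(cumulative_to_rates(cumulative))
```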
A vast array of metrics is provided by Ceilometer. An easy way to interactively derive the
set of available metrics is to query Ceilometer directly (see the sample below). In addition,
the Ceilometer documentation provides the default set, with associated attributes (URL).
ceilometer meter-list -s openstack
Figure 45: OpenStack Ceilometer Metrics
The following table provides a core set of recommended monitoring points for OpenStack.
A broader set may of course be used.
Component: Nova (Compute Node Management)
Meters: cpu_util, disk.read.requests.rate, disk.write.requests.rate,
disk.read.bytes.rate, disk.write.bytes.rate, network.incoming.bytes.rate,
network.outgoing.bytes.rate, network.incoming.packets.rate,
network.outgoing.packets.rate
The following counters require enablement: compute.node.cpu.kernel.percent,
compute.node.cpu.idle.percent, compute.node.cpu.user.percent,
compute.node.cpu.iowait.percent

Component: Neutron (Network Management)
Meters: network.create, network.update, subnet.create, subnet.update

Component: Glance (Image Management)
Meters: image.update, image.upload, image.delete

Component: Cinder (Volume Management)
Meters: volume.size

Component: Swift (Object Storage Management)
Meters: storage.objects, storage.objects.size, storage.objects.containers,
storage.objects.incoming.bytes, storage.objects.outgoing.bytes

Component: Heat (Orchestration)
Meters: stack.create, stack.update, stack.delete, stack.suspend, stack.resume
Figure 46: OpenStack Ceilometer Core Metrics
In addition, Ceilometer provides a REST API that allows cloud administrators to record
KPIs. For instance, infrastructure metrics could be placed in Ceilometer with an HTTP
POST request. Because Ceilometer includes a data store, as well as some basic statistical
functionality, it is a candidate integration point for cloud monitoring data.
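As an illustration of recording a KPI this way, the sketch below builds (but does not send) a POST request against the Ceilometer v2 meters resource. The endpoint host and port, the token value, the meter name "ico.deploy.time", and the resource identifier are all assumptions made for the example; a real deployment would obtain the endpoint and a valid token from Keystone.

```python
import json
import urllib.request


def build_sample_request(base_url, token, meter, resource_id, volume,
                         unit, meter_type="gauge"):
    """Build a POST request that records one sample for the given meter."""
    # The v2 API accepts a JSON list of samples for the named meter.
    payload = [{
        "counter_name": meter,
        "counter_type": meter_type,   # gauge, cumulative, or delta
        "counter_unit": unit,
        "counter_volume": volume,
        "resource_id": resource_id,
    }]
    return urllib.request.Request(
        url="%s/v2/meters/%s" % (base_url.rstrip("/"), meter),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_sample_request(
        "http://ceilometer.example.com:8777", "PLACEHOLDER_TOKEN",
        "ico.deploy.time", "central-server-1", 12.5, "s")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before any data is written to the Ceilometer data store.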
A.2 IBM Cloud Orchestrator Monitoring
ICO monitoring should be employed to address the solution layer “above” OpenStack. The
primary mechanism for ICO monitoring is enablement of the BPM performance data
warehouse (relevant information is available in the References section; see footnote 2).
The performance data warehouse may be enabled via “autotracking”, which enables both
custom KPIs and the default total time KPIs. The core KPIs to understand BPM capability are:
BPM processes executed per second.
Average service times per BPM process.
Because Ceilometer provides a general plugin and distribution infrastructure, it may be
combined with the ICO monitoring solution. A sample approach for managing these
monitoring points follows.
1. Derive a BPM plugin to retrieve raw times from the BPM performance data
warehouse (PDWDB) database. The preferred method is the provided REST
interface (versus direct database access).
2. Perform calculations based on the raw data. For example, converting a series of
milestones into performance KPIs, or calculating statistical quantities (e.g.
standard deviation, harmonic mean).
3. Push the results to Ceilometer as the meter distribution mechanism.
4. Read the results via the Ceilometer REST API and display in the visualization tool
of your choice.
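The calculation in step 2 can be sketched as follows. This is a minimal illustration that assumes the raw data has already been retrieved and reduced to (process name, start time, end time) records; that record layout is hypothetical and does not reflect the actual performance data warehouse schema or REST response format.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical record: (process name, start time in s, end time in s).
Record = Tuple[str, float, float]


def throughput_per_second(records: List[Record]) -> float:
    """Processes executed per second over the observed time window."""
    if not records:
        return 0.0
    window = max(r[2] for r in records) - min(r[1] for r in records)
    if window <= 0:
        raise ValueError("observation window must be positive")
    return len(records) / window


def service_time_stats(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Per-process service time statistics, keyed by process name."""
    by_process: Dict[str, List[float]] = {}
    for name, start, end in records:
        by_process.setdefault(name, []).append(end - start)
    stats = {}
    for name, times in by_process.items():
        stats[name] = {
            "count": len(times),
            "mean": statistics.mean(times),
            # Sample standard deviation needs at least two data points.
            "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
            "harmonic_mean": statistics.harmonic_mean(times),
        }
    return stats


if __name__ == "__main__":
    sample = [("deploy", 0.0, 2.0), ("deploy", 1.0, 5.0),
              ("delete", 2.0, 3.0), ("delete", 9.0, 10.0)]
    print(throughput_per_second(sample))
    print(service_time_stats(sample))
```

The resulting values correspond to the two core KPIs named earlier (processes executed per second and average service time per process) and could then be pushed to Ceilometer as described in step 3.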
A.3 Infrastructure Monitoring
Infrastructure monitoring can address the operating system and hypervisor health of the
cloud. Available tools include IBM Tivoli Monitoring (ITM) or the open source offering
Nagios. For example, ITM v6.2 provides the following infrastructure monitoring agents
(for reference, see URL).
1. IBM Tivoli Monitoring Endpoint.
2. Linux OS.
3. UNIX Logs.
4. UNIX OS.
5. Windows OS.
6. i5/OS®.
7. IBM Tivoli Universal Agent.
8. Warehouse Proxy.
9. Summarization and Pruning.
10. IBM Tivoli Performance Analyzer.
Footnote 2: It is worth noting that BPM is built on IBM WebSphere and, as a result,
WebSphere monitoring capabilities also apply.
Critical KPIs to monitor at the infrastructure level are summarized in the following table
(VMware is provided as a representative hypervisor sample).
Component: Operating System
Meters:
CPU utilization including kernel, user, IO wait, and idle times.
Disk utilization including read/write request and byte rates.
Network utilization including incoming and outgoing packet and byte rates.
Volume free space across the central and region servers. Special attention should be
paid to the Virtual Image Library on Central Server 2 to ensure the “/home/library”
space is well managed.

Component: DBMS: ITM for DB2 (URL)
Meters:
Application IO activity workspace.
Application lock activity workspace.
Application overview workspace.
Buffer Pool workspace.
Connection workspace.
Database workspace.
Database Lock Activity workspace.
Historical Summarized Capacity Weekly workspace.
Historical Summarized Performance Weekly workspace.
Locking Conflict workspace.
Tablespace workspace.

Components: Application Server: ITCAM Agent for WebSphere Applications (URL);
J2EE: ITCAM Agent for J2EE (URL)
Meters:
WebSphere Agent Summary workspace.
Application Server Summary workspace.
Application Health Summary workspace.

Component: HTTP: ITCAM Agent for HTTP Servers (URL)
Meters:
Web Server Agent workspace.

Component: Hypervisor: ITM for Virtual Environments (URL)
Meters:
Server workspace.
CPU workspace.
Disk workspace.
Memory workspace.
Network workspace.
Resource Pools workspace.
Virtual Machines workspace.

Component: Hypervisor: VMware esxtop sample
Meters:
CPU: Run (%RUN), Wait (%WAIT), Ready (%RDY), Co-Stop (%CSTP).
Network: Dropped packets (%DRPTX, %DRPRX).
IO: Latency (DAVG, KAVG), Queue length (QUED).
Memory: Memory reclaim (MCTLSZ), Swap (SWCUR, SWR/s, SWW/s).
Figure 47: Infrastructure Core Metrics
APPENDIX B: OPENSTACK KEYSTONE MONITORING
The Keystone component is critical to the overall performance of IBM Cloud Orchestrator.
For example, if one component saturates Keystone, the overall throughput of the system
will be impacted. This effect is magnified by the fact that Keystone has only a single
execution thread instance. The best way to understand Keystone performance is to examine
the requests and responses via a proxy such as the IaaS Gateway, which makes it possible
to see requests that are dropped before being processed by Keystone.
This appendix describes an approach for monitoring Keystone via the PvRequestFilter.
B.1 PvRequestFilter
The PvRequestFilter was designed to output request and response data into the Keystone
log. When enabled, it prints the data as warning messages, so it is not necessary to
raise the default debug level to generate the log messages.
The format of the messages is as follows. All fields except “<duration>” are printed
for both requests and responses. The duration of the request is printed only for the
response.
WARNING [REQUEST|RESPONSE] <millisecond timestamp to identify request>
<REMOTE_ADDR>:<REMOTE_PORT> <REQUEST_METHOD> <RAW_PATH_INFO> [<duration>]
Figure 48: Keystone Monitoring PvRequestFilter Format
Sample output follows.
2014-07-21 17:16:56.509 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users
2014-07-21 17:16:56.785 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users 0.276294
2014-07-21 17:16:56.807 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains
2014-07-21 17:16:56.824 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains 0.017691
2014-07-21 17:16:56.839 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.839 172.18.152.103:1279 GET /v3/users/e92b94d7068843ef98d664521bd9c983/projects
2014-07-21 17:16:56.868 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.839 172.18.152.103:1279 GET /v3/users/e92b94d7068843ef98d664521bd9c983/projects 0.028558
Figure 49: Keystone Monitoring PvRequestFilter Sample Output
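Messages in this format lend themselves to simple post-processing. The sketch below parses them and summarizes response durations per request method and path. It is an illustration only, not the script shipped with this paper, and it assumes that each message occupies a single line in the log file. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches the portion of a log line written by the filter.
LINE_RE = re.compile(
    r"WARNING\s+\S+\s+\[[^\]]*\]\s+"
    r"(?P<kind>REQUEST|RESPONSE)\s+(?P<request_id>\S+)\s+"
    r"(?P<addr>\S+):(?P<port>\d+)\s+(?P<method>\S+)\s+(?P<path>\S+)"
    r"(?:\s+(?P<duration>\d+(?:\.\d+)?))?\s*$"
)


def parse_line(line):
    """Return a dict of fields for a filter message, or None otherwise."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["duration"] is not None:
        fields["duration"] = float(fields["duration"])
    return fields


def summarize(lines):
    """Aggregate RESPONSE durations (seconds) by (method, path)."""
    durations = defaultdict(list)
    for line in lines:
        fields = parse_line(line)
        if (fields and fields["kind"] == "RESPONSE"
                and fields["duration"] is not None):
            durations[(fields["method"], fields["path"])].append(
                fields["duration"])
    return {
        key: {"count": len(vals), "mean": sum(vals) / len(vals),
              "max": max(vals)}
        for key, vals in durations.items()
    }


if __name__ == "__main__":
    with open("/var/log/keystone/keystone.log") as log:
        for key, stats in sorted(summarize(log).items()):
            print(key, stats)
```

Grouping by method and path highlights which Keystone calls dominate the load. Requests that never receive a matching RESPONSE line can be found by comparing the request identifiers of the two message kinds.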
B.2 Enabling PvRequestFilter
The process to enable PvRequestFilter follows.
1. Log on to Central Server 2.
2. Extract the distribution provided with this paper (keystoneStats.zap).
3. Install the filter and back up the existing configuration:
./deployKeystoneFilter.sh
4. Make the following changes to the “/etc/keystone/keystone.conf” file.
Note: Reversing these changes will disable the filter.
a. Add the following lines just above the line starting with "[filter:debug]".
[filter:pvt]
paste.filter_factory = keystone.contrib.pvt_filter.request:PvtRequestFilter.factory
b. Add "pvt" to three of the pipeline statements:
[pipeline:public_api]
pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension user_crud_extension pvt public_service
[pipeline:admin_api]
pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension s3_extension crud_extension pvt admin_service
[pipeline:api_v3]
pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension s3_extension pvt service_v3
c. Restart the keystone service.
service openstack-keystone restart
d. Validate that the “/var/log/keystone/keystone.log” is producing the
appropriate log messages (sample below).
e. Update the “hosts.table” file to reflect your environment.
f. Run the workload or scenario for analysis.
g. Generate the statistics for the request and response data in the
“keystone.log” file (sample below):
./keystoneStats.sh /var/log/keystone/keystone.log > results
Figure 50: Keystone Monitoring Log Messages Example
Figure 51: Keystone Monitoring Statistics Example
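In the pipeline statements shown in this section, "pvt" is placed immediately before the final element of each pipeline, which is the application itself. The following minimal sketch shows that edit, and its reversal, as string operations on a pipeline value. The helper names are hypothetical, and the sketch does not read or write the configuration file.

```python
def add_filter(pipeline_value, filter_name="pvt"):
    """Insert filter_name immediately before the final pipeline element."""
    parts = pipeline_value.split()
    if not parts:
        raise ValueError("empty pipeline")
    if filter_name in parts:
        # Already present: return the normalized value unchanged.
        return " ".join(parts)
    return " ".join(parts[:-1] + [filter_name, parts[-1]])


def remove_filter(pipeline_value, filter_name="pvt"):
    """Remove filter_name from the pipeline, which disables the filter."""
    return " ".join(p for p in pipeline_value.split() if p != filter_name)


if __name__ == "__main__":
    # Shortened pipeline value used purely for illustration.
    original = "access_log sizelimit url_normalize service_v3"
    enabled = add_filter(original)
    print(enabled)
    print(remove_filter(enabled) == original)
```

The keystone service still needs to be restarted after the configuration file is changed in either direction.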
REFERENCES
IBM Cloud Orchestrator and Related Component References
IBM Cloud Orchestration Knowledge Center
ICO 2.4 Knowledge Center
IBM Cloud Orchestrator Resource Center
ICO Resource Center
IBM Cloud Orchestrator Version 2.4: Security Hardening Guide
http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO85
SmartCloud Orchestrator Version 2.3: Capacity Planning, Performance, and Management Guide
http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO7P
SmartCloud Orchestrator Version 2.3: Security Hardening Guide
http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO7W
IBM Cloud Orchestrator Version 2.3: Database Movement Cookbook
http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO8T
IBM SmartCloud Orchestrator: Offline-backup approach using Tivoli Storage Manager for Virtual
Environments
http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO7Q
IBM Business Process Manager V8.0 Performance Tuning and Best Practices
http://www.redbooks.ibm.com/redpapers/pdfs/redp4935.pdf
IBM Business Process Manager Performance Data Warehouse
http://pic.dhe.ibm.com/infocenter/dmndhelp/v8r5m0/topic/com.ibm.wbpm.admin.doc/topics/managing_performance_servers.html
IBM Tivoli Monitoring Information Center
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.3fp1/welcome.htm
IBM DB2 10.5 Knowledge Center
DB2 10.5 Knowledge Center
OpenStack References
OpenStack Performance Presentation (Folsom, Havana, Grizzly)
http://www.openstack.org/assets/presentation-media/openstackperformance-v4.pdf
OpenStack Ceilometer
http://docs.openstack.org/developer/ceilometer
OpenStack Rally
https://wiki.openstack.org/wiki/Rally
Hypervisor References
Performance Best Practices for VMware vSphere™ 5.0
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
Performance Best Practices for VMware vSphere™ 5.1
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf
VMware: Troubleshooting virtual machine performance issues
VMware Knowledge Base
VMware: Performance Blog
http://blogs.vmware.com/vsphere/performance
Linux on System x: Tuning KVM for Performance
KVM Performance Tuning
Kernel Virtual Machine (KVM): Tuning KVM for performance
http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaattuning_pdf.pdf
PowerVM Virtualization Performance Advisor
Developer Works PowerVM Performance
IBM PowerVM Best Practices
http://www.redbooks.ibm.com/redbooks/pdfs/sg248062.pdf
Benchmark References
Report on Cloud Computing to the OSG Steering Committee, SPEC Open Systems Group,
https://www.spec.org/osgcloud/docs/osgcloudwgreport20120410.pdf
© Copyright IBM Corporation 2015
IBM United States of America
Produced in the United States of America
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM
representative for information on the products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used.
Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be
used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program,
or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of
this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are
inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER “AS IS” WITHOUT WARRANTY OF
ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow
disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes may be made periodically to the
information herein; these changes may be incorporated in subsequent versions of the paper. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this paper at any time without notice.
Any references in this document to non-IBM Web sites are provided for convenience only and do not in any manner serve
as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product
and use of those Web sites is at your own risk.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of
this document does not give you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
4205 South Miami Boulevard
Research Triangle Park, NC 27709 U.S.A.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent
goals and objectives only.
This information is for planning purposes only. The information herein is subject to change before the products described
become available.
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in
the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in
this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks may also be registered or common law
trademarks in other countries. A current list of IBM trademarks is available on the web at "Copyright and trademark
information" at http://www.ibm.com/legal/copytrade.shtml.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Other company, product, or service names may be trademarks or service marks of others.