IBM® Cloud and Smarter Infrastructure Software
SmartCloud Orchestrator
Version 2.3:
Capacity Planning, Performance,
and Management Guide
Document version 2.3.6
IBM SmartCloud Orchestrator Performance Team
© Copyright International Business Machines Corporation 2014.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
CONTENTS

Contents
List of Figures
Author List
Revision History
1 Introduction
2 SmartCloud Orchestrator 2.3 Overview
  2.1 Functional Overview
  2.2 Architectural Overview
3 Performance Overview
  3.1 Sample Benchmark Environment
  3.2 Key Performance Indicators
    3.2.1 Concurrent User Performance
    3.2.2 Provisioning Performance
4 Performance Benchmark Approaches
  4.1 Monitoring and Analysis Tools
    4.1.1 nmon Samples
  4.2 Infrastructure Benchmark Tools
  4.3 Cloud Benchmarks
5 Capacity Planning Recommendations
  5.1 Cloud Capacity Planning Spreadsheet
  5.2 SmartCloud Orchestrator Management Server Capacity Planning
  5.3 Provisioned Virtual Machines Capacity Planning
6 Cloud Configuration Recommendations
  6.1 OpenStack Keystone Cache Configuration
  6.2 OpenStack Keystone Worker Support
  6.3 IaaS Gateway Cluster Support
  6.4 IBM Workload Deployer Configuration
  6.5 Virtual Machine IO Scheduler Configuration
  6.6 Advanced Configuration and Power Interface Management
  6.7 Java Virtual Machine Heap Configuration
  6.8 Database Configuration
7 Cloud Maintenance Recommendations
  7.1 SmartCloud Orchestrator Volume Management
    7.1.1 Install Time Requirements
    7.1.2 Long Running System Requirements
  7.2 The SmartCloud Orchestrator Database and Schema Summary
  7.3 Database Management
    7.3.1 DBMS Versions
    7.3.2 Automatic Maintenance
    7.3.3 Operating System Configuration (Linux)
  7.4 Database Hygiene Overview
    7.4.1 Database Backup Management
    7.4.2 Database Statistics Management
    7.4.3 Database Reorganization
    7.4.4 Database Archiving
    7.4.5 Database Maintenance Automation
8 Summary Cookbook
  8.1 Base Installation Recommendations
  8.2 Post Installation Recommendations
  8.3 High Scale Recommendations
Appendix A: SmartCloud Orchestrator Monitoring Options
  A.1 OpenStack Monitoring
  A.2 SmartCloud Orchestrator Monitoring
  A.3 Infrastructure Monitoring
Appendix B: OpenStack Keystone Monitoring
  B.1 PvRequestFilter
  B.2 Enabling PvRequestFilter
Appendix C: IaaS Gateway Cluster Enablement
References
LIST OF FIGURES

Figure 1: Revision History
Figure 2: SCO Functional Overview
Figure 3: SCO Cloud Marketplace View
Figure 4: SCO Architecture Reference Topology
Figure 5: SCO Sample Benchmark Environment
Figure 6: Benchmark Data Model Population
Figure 7: Load Driving (User) Scenarios
Figure 8: Provisioning Performance in a Closed System
Figure 9: Monitoring and Analysis Tools
Figure 10: nmon Samples
Figure 11: Infrastructure Benchmark Tools
Figure 12: SCO Management Server Capacity Planning
Figure 13: Capacity Planning Tool: Inquiry Form
Figure 14: Capacity Planning Tool: User Demographic Information
Figure 15: Capacity Planning Tool: Systems and Storage
Figure 16: Capacity Planning Tool: System and Workload Options
Figure 17: Capacity Planning Tool: Virtual Machine Requirements
Figure 18: Planning Tool: Confirmation Screen
Figure 19: Planning Tool: System Summary
Figure 20: Keystone Worker Configuration
Figure 21: IWD Configuration
Figure 22: Modifying the IO Scheduler
Figure 23: Java Virtual Machine Heap Change Sets
Figure 24: Database Configuration Change Sets
Figure 25: SCO 2.3 Volume Management: Install Time Requirements
Figure 26: Long Running System Requirements: System A
Figure 27: Long Running System Requirements: System B
Figure 28: Long Running System Requirements Summary
Figure 29: Database and Schema Summary
Figure 30: DBMS Versions
Figure 31: Database Automatic Maintenance Configuration
Figure 32: Database Backup with Compression Command
Figure 33: Database Offline Backup Restore
Figure 34: Database Online Backup Schedule
Figure 35: Database Incremental Backup Enablement
Figure 36: Database Online Backup Manual Restore
Figure 37: Database Online Backup Automatic Restore
Figure 38: Database Log Archiving to Disk
Figure 39: Database Log Archiving to TSM
Figure 40: Database Roll Forward Recovery: Sample A
Figure 41: Database Roll Forward Recovery: Sample B
Figure 42: Database Backup Cleanup Command
Figure 43: Database Backup Automatic Cleanup Configuration
Figure 44: Database Statistics Collection Command
Figure 45: Database Statistics Collection Table Iterator
Figure 46: Database Reorganization Commands
Figure 47: Database Reorganization Table Iterator
Figure 48: Database Archiving Impact
Figure 49: Sample Database Maintenance Schedule
Figure 50: Sample Database Maintenance Crontab Entry
Figure 51: Base Installation Recommendations
Figure 52: Post Installation Recommendations
Figure 53: High Scale Recommendations
Figure 54: OpenStack Ceilometer Metrics
Figure 55: OpenStack Ceilometer Core Metrics
Figure 56: Infrastructure Core Metrics
Figure 57: Keystone Monitoring PvRequestFilter Format
Figure 58: Keystone Monitoring PvRequestFilter Sample Output
Figure 59: Keystone Monitoring Log Messages Example
Figure 60: Keystone Monitoring Statistics Example
AUTHOR LIST
This paper is the team effort of a number of cloud performance specialists comprising the
SmartCloud Orchestrator performance team. Additional recognition goes out to the entire
SmartCloud Orchestrator and OpenStack development teams.
Mark Leitch
(primary contact for this paper)
IBM Toronto Laboratory
Amadeus Podvratnik
Marc Schunk
Peter Altevogt
IBM Boeblingen Laboratory
Nate Rockwell
IBM USA
Tiarnán Ó Corráin
IBM Ireland
Alessandro Chiantera
Giorgio Corsetti
Massimo Marra
Michele Licursi
Paolo Cavazza
Ugo Madama
IBM Rome Laboratory
REVISION HISTORY

Date                 Version   Revised By   Comments
February 1, 2014     Draft     MDL          Initial version for review.
February 23, 2014    2.3.0     MDL          Initial version for distribution.
February 28, 2014    2.3.1     MDL          Update based on review comments.
March 18, 2014       2.3.2     MDL          Volume management update based on SCO 2.3.0.1 delivery. Addition of monitoring points in Appendix A.
March 27, 2014       2.3.3     MDL          Added maintenance crontab samples and scripts.
April 8, 2014        2.3.4     MDL          Added IWD configuration options.
August 20, 2014      2.3.5     MDL          Added Keystone monitoring reference material.
August 28, 2014      2.3.6     MDL          Added Keystone worker, IaaS gateway cluster material.

Figure 1: Revision History
1 Introduction
Capacity planning involves the specification of the various components of an installation to
meet customer requirements, often with growth or timeline considerations. A key aspect of
capacity planning for cloud, or virtualized, environments is the specification of sufficient
physical resources to provide the illusion of infinite resources in an environment that may
be characterized by highly variable demand. This document will provide an overview of
capacity planning for the IBM SmartCloud Orchestrator (SCO) Version 2.3. In addition, it
will offer management best practices to achieve a well performing installation that
demonstrates service stability.
SCO Version 2.3 offers end to end management of service offerings across a number of
cloud technology offerings including VMware, Kernel-based Virtual Machine (KVM), IBM
PowerVM, and IBM System z. A key implementation aspect is integration with OpenStack,
the de facto leading open virtualization technology. OpenStack offers the ability to control
compute, storage, and network resources through an open, community based architecture.
In this document we will provide an SCO 2.3 overview, including functionality, architecture,
and performance. We will then offer the capacity planning recommendations, including
considerations for hardware configuration, software configuration, and cloud maintenance
best practices. A summary “cookbook” is provided to manage installation and
configuration for specific instances of SCO.
Note: This document is considered a work in progress. Capacity planning
recommendations will be refined and updated as new SCO releases are available. While
the paper in general is considered suitable for all SCO Version 2.3 releases, it is best
oriented towards SCO Version 2.3.0.1. In addition, a number of references are provided in
the References section. These papers are highly recommended for readers who want
detailed knowledge of SCO server configuration, architecture, and capacity planning.
Note: Some artifacts are distributed with this paper. The distributions are in zip format. However, Adobe blocks attached files with a "zip" suffix. As a result, the file suffix of each distribution is set to "zap". To use these artifacts, simply rename the distribution to "zip" and process as usual.
2 SmartCloud Orchestrator 2.3 Overview
An overview of SCO Version 2.3 will be provided from the following perspectives:
1. Functional
2. Architectural
2.1 Functional Overview
The basic functional capability of SCO involves the management of cloud computing
resources for dynamic data centers. The following figure provides a functional (service
level) overview of SCO.
Figure 2: SCO Functional Overview
In a nutshell, SCO offers infrastructure, platform, and orchestration services that make it
possible to lower the cost of service delivery (both in terms of time and skill) while
delivering higher degrees of standardization and automation. A more detailed cloud
marketplace view of the SCO solution follows.
Figure 3: SCO Cloud Marketplace View
The core functional capabilities of SCO include the following.

• Workflow Orchestration.
  The Business Process Manager (BPM) component offers a standard library as well as a graphical editor for workflow orchestration. Overall, this provides a powerful mechanism for complex and custom business processes in the cloud context.

• Pattern Management.
  The IBM Workload Deployer (IWD) offers sophisticated pattern support for deploying multi node applications that may consist of complex middleware. Once again, graphical editor support for pattern management is provided.

• Image Management.
  This comprises an image construction and composition tool, as well as a Virtual Image Library (VIL) to facilitate image development and reduce image sprawl.

• Service Management.
  Service management options are available in the SCO Enterprise edition. The Enterprise edition provides a set of management utilities to further facilitate business process management.

• Not shown in the diagram is a Scalable Web Infrastructure to facilitate cloud self service offerings. For more information please consult the SCO information center (URL). In addition, the SCO resource center is available (URL).
2.2 Architectural Overview
The following diagram shows the reference deployment topology for SCO. A description of
the reference topology follows.
Figure 4: SCO Architecture Reference Topology
The reference topology is based on a core set of virtual machines:

• Central Server 1.
  This server hosts the DB2 Database Management System (DBMS). The performance of the DBMS is critical to the overall solution and is dealt with extensively in Section 7.3.

• Central Server 2.
  This server hosts OpenStack Keystone, providing identity, token, catalog, and policy services. In addition, it hosts the Virtual Image Library (VIL) and SCO gateway services. The most critical aspect of this server is managing the Keystone configuration as described in Section 6.1.

• Central Server 3.
  This server hosts the IBM Workload Deployer pattern engine and the Scalable Web UI. Performance configuration of these components is described in Section 6.

• Central Server 4.
  This server hosts the Business Process Manager engine. Performance configuration of these components is described in Section 6.

• Central Server 5.
  This server hosts the System Automation Application Manager. This is an optional virtual machine that can be used to manage automatic start and stop orchestration of the SCO management server itself.
Associated with these core server virtual machines are a number of region servers.
Region servers may represent a specific cluster or geographic zone of cloud compute
nodes. Sample compute nodes are shown for VMware, KVM, and PowerVM, with
associated communication paths. For example, for VMware the SCE driver is used to
drive the operation of the VMware cluster. For KVM, the OpenStack control node is used
to coordinate the KVM instance.
Given this is a virtual implementation, some considerations should be kept in mind:

• In general, it is more difficult to manage performance in a virtual environment due to the additional hypervisor management overhead and system configuration.

• Device parallelism via dedicated storage arrays/LUNs is preferred. Sample approaches, from most impactful to least impactful, are provided below.
  o Separate data stores for "managed from" and "managed to" environments.
  o Spread data stores across several physical disks to maximize storage capability.
  o Separate data stores for image templates and provisioned images.
  o Employ the "deadline" or "noop" scheduler algorithm for management server and provisioned VMs (see Section 6.5).
  o Optimize base storage capability (e.g. SSD with "VMDirectPath" enablement for VMware). Servers where this may be critical, due to their dependency on disk IO capabilities, are Central Server 1 and the VMware vCenter instances.

• Network optimization, for example 10GbE adoption. In addition, segment customer networks to an acceptable level to reduce address lookup impact.
3 Performance Overview
There are two distinct aspects of cloud performance:
1. Performance of the SCO management server itself.
This is the primary focus of this section.
2. Performance of the provisioned server instances.
This is more of a capacity planning statement, and is covered in Section 5.3.
We will provide a general overview of the Key Performance Indicators (KPIs) for the SCO
management server. The following sections will describe the general benchmark
environment, and the associated KPIs.
3.1 Sample Benchmark Environment
The following figure shows a sample configuration that has been used for SCO
benchmarks.
Figure 5: SCO Sample Benchmark Environment
The environment is characterized by the following features, broken down in terms of the SCO management server (aka "managed from") and the associated cloud (aka "managed to").

• Managed from:
  o Server configuration:
    - 4/5 HS22V Blades with 2 x 4 cores Intel Xeon x5570 2.93 GHz.
    - 8 physical cores per blade, 16 logical cores when hyper-threading is enabled.
    - 72 GB RAM per blade.
    - 2 x Redundant 10G Ethernet Networking (Janice HSSM).
    - 2 x Redundant 8G FC Network (Qlogic FC SM).
  o Storage configuration: 1 x DS3400 with 4 Exp with 12 Disk 600 GB SAS 10K each (48 x 600 GB = 28.8 TB raw).

• Managed to:
  o Server configuration:
    - Tens of HS22V Blades with 2 x 6 cores Intel Xeon x5670 2.93 GHz.
    - 12 physical cores per blade, 24 logical cores when hyper-threading is enabled.
    - 72 GB RAM per blade.
    - 2 x Redundant 10G Ethernet Networking (Janice HSSM).
    - 2 x Redundant 8G FC Network (Qlogic FC SM).
  o Storage configuration: 1 x Storwize v7000 with 3 Exp with 12 Disks 2 TB NL-SAS 7.2k each (36 x 2 TB = 72 TB raw).
  o Storage access has been configured to use the multi-path access granted by Storwize. In particular, VMware ESXi servers have been configured to use all of the 8 active paths to access LUNs using a round robin policy.
3.2 Key Performance Indicators
The following Key Performance Indicators are managed for SCO through a set of comprehensive benchmarks.

1. Concurrent User Performance, comprising:
   a. Average response time for SCO pages related to administrative tasks.
   b. Average response time for SCO pages related to end user tasks.
2. Provisioning throughput, comprising:
   a. Provisioning throughput for a vSys with a single part.
   b. Average service time for provisioned VMs.
3. LAMP (Linux, Apache, MySQL, Python) stack performance, comprising:
   a. vApp deployment time.
   b. vApp stop time.
   c. vApp deletion time.
4. Bulk Windows stack performance, comprising vSys with multiple parts (15 VMs) provisioning time.
5. Virtual Image Library performance, comprising:
   a. Registration discovery throughput.
   b. Registration basic indexing throughput.
   c. Image checkin time.
   d. Image checkout time.

A key aspect of the benchmarks is that they are run with associated background workloads and for a long duration (e.g. weeks or months). The rationale behind this is very simple: to run benchmarks that closely emulate the customer experience and drive "real world" results (versus overly optimistic lab based results). We will describe the concurrent user and provisioning throughput KPIs in more detail.
3.2.1 Concurrent User Performance
SCO User Interface performance is established through concurrent user benchmark tests.
In order to understand the applicability of such a benchmark, it is important to understand
what is meant by a concurrent user. Consider:
• P = total population for an instance of SCO (including cloud administrators, end users, etc.).

• C = the concurrent user population for an instance of SCO. Concurrent users are considered to be the set of users within the overall population P that are actively managing the cloud environment at a point in time (e.g. administrator operations in the User Interface, provisioning operations, etc.).
In general, P is a much larger value than C (i.e. P >> C). For example, it is not unrealistic
that a total population of 200 users may have a concurrent user population of 40 users (i.e.
20%).
For the concurrent user workload driven for SCO, there are three sets of criteria that drive
the benchmark:
1. Load driving parameters.
2. Data population.
3. Load driving (user) scenarios.
Load Driving Parameters
The following load driving parameters apply.
1. User transaction rate control.
The frequency that simulated users drive actions against the back end is managed
via loop control functions. Closed loop simulation approaches are used where a
new user will enter the system only when a previous user completes. Through the
closed loop system, steady state operations under load may be driven.
2. Think times.
Think times are the "pause" between user operations, meant to simulate the behavior of a human user. The think time interval used is [100%, 300%] (meaning the think times replayed by the load driver range from the recorded value up to three times the recorded value).
3. Bandwidth throttling.
In order to simulate low speed or high latency lines, bandwidth throttling is
employed for some client workloads. The throttle is set to a value that represents
a moderate speed ADSL connection (cable/DSL simulation setting of 1.5 Mbps
download, 384 Kbps upload).
Data Population Parameters
The benchmark is run against a data model that represents a large scale customer
environment. The following table shows a sample configuration where the system is
populated with data to represent a large number of users, active Virtual System instances,
and active Virtual Machines existing prior to SCO installation. Through this approach, the
workload for managing the solution is representative of some customer environments.
Benchmark Parameter            Value
Cloud Administrators           1
Cloud Domains                  1
Tenants                        1
Users                          200
Hypervisor Types               1 (VMware)
Cloud Groups                   1
Environment Profile            1
Image Templates                40 (20 Linux, 20 Windows)
vSys Patterns                  20 + 1 (20 Linux vSys patterns, 1 bulk Windows pattern)
vApp Patterns                  1 (LAMP vApp for VMware domain)
Flavors                        5 (1 flavor for RHEL, 3 flavors for Windows, 1 flavor for vApp)
Active vSys instances          20 (1 per Linux vSys Pattern)
Standalone (Unmanaged) VMs     400 (10 per image template: 200 Linux, 200 Windows)

Figure 6: Benchmark Data Model Population
Load Driving (User) Scenarios
The concurrent user population (i.e. C) is broken down into the following user profile
distribution and scenarios.
User Profile 1: 20 users (50% overall)
  User Type: End User
  Task Type: VM Provisioning
  Activity: vSys with single part (Linux) provisioning through Self-Service Catalog (SSC) offering on VMware.
  Scenario per User:
    1. Login.
    2. Provision vSys single part using SSC offering.
    3. Wait until available.
    4. Go to the vSys instance details page.
    5. Delete vSys using SSC offering.
    6. Wait until deletion complete.
    7. Logout.
    8. Enter next cycle according to arrival rate.

User Profile 2: 16 users (40% overall)
  User Type: End User
  Task Type: User Management
  Activity: End user operations through Self-Service Catalog (SSC) offering.
  Scenario per User:
    1. Login.
    2. Submit SSC offering "Create User in VM", selecting one of the VMs belonging to one of the pre-populated vSys.
    3. Wait until done.
    4. Submit SSC offering "Delete User in VM", selecting the same VM.
    5. Wait until done.
    6. Logout.
    7. Enter next cycle according to arrival rate.

User Profile 3: 2 users (5% overall)
  User Type: Administrator
  Task Type: Monitoring
  Activity: Administrative operations through the IBM Workload Deployer user interface.
  Scenario per User:
    1. Login.
    2. List hypervisors.
    3. Select a hypervisor.
    4. List VMs in hypervisor.
    5. Show all instances.
    6. Go to "My Requests".
    7. Sort the requests by status.
    8. View the trace log.
    9. Logout.

User Profile 4: 1 user (2.5% overall)
  User Type: End User
  Task Type: Provisioning
  Activity: vApp (LAMP) provisioning through the IBM Workload Deployer user interface on VMware.
  Scenario per User:
    1. Login.
    2. Provision vApp using the IWD UI.
    3. Wait until available.
    4. Stop vApp using the IWD UI.
    5. Wait until done.
    6. Delete vApp using the IWD UI.
    7. Wait until deletion complete.
    8. Logout.
    9. Enter next cycle according to arrival rate.

User Profile 5: 1 user (2.5% overall)
  User Type: End User
  Task Type: Provisioning
  Activity: vSys with multiple parts (bulk Windows) provisioning through Self-Service Catalog (SSC) offering on VMware.
  Scenario per User:
    1. Login.
    2. Provision vSys bulk Windows using SSC offering.
    3. Wait until available.
    4. Go to vSys instance details page.
    5. Delete vSys bulk Windows using SSC offering.
    6. Wait until deleted.
    7. Logout.
    8. Enter next cycle according to arrival rate.

Figure 7: Load Driving (User) Scenarios
In overall terms, 55% of the load driving activities are driving Virtual Machine provisioning
scenarios. The remaining 45% of scenarios are general administration and management
tasks. For the active workload, the user operations meet the following response time
thresholds.
• Administrative page response times: 90% of pages < 10s, 100% of pages < 15s.

• End user operations: 90% of pages < 2s, 100% of pages < 5s.
3.2.2 Provisioning Performance
Cloud provisioning is enormously complex in performance terms. Hardware configuration,
user workloads, image properties, and a multitude of other factors combine to determine
overall capability. SCO provisioning performance is typically measured via a closed
system, defined as an isolated system where we can demonstrate a constant sustained
provisioning workload. In order to achieve this, as requests complete within the system,
new requests are initiated.
Figure 8: Provisioning Performance in a Closed System
The performance systems running SCO workloads literally run for months. These systems
are treated like customer systems with 24x7 operations and field ready maintenance
approaches in place (as described in Section 7). In terms of provisioning performance, the following are sample statistics from a long run scenario driven for a number of weeks, once a period of operational stability had been reached based on the recommendations provided in this paper.

• Number of systems provisioned: 172,536 VMs.

• Provisioning rate (average): 187 VMs/hour.

• Service times (average): 3 minutes 28 seconds (IBM Workload Deployer with VMware linked clones).

• Workflow capability: on the order of 300 workflows per hour (generally short running workflows under a minute in duration).

• Success rate: 99.996%.

Given this is a sustained, continuous workload, higher peak workloads are, of course, possible. The success rate is considered especially noteworthy.
4 Performance Benchmark Approaches
As part of cloud management and capacity planning, it is valuable to manage cloud
benchmarks. Value propositions include:
• Understanding the capability of the cloud infrastructure (and potentially poorly configured or under performing components of the infrastructure).

• Understanding the base capability of the SCO implementation and associated customization.

• Understanding the long term performance stability of the system.
We will describe basic system monitoring approaches, infrastructure benchmarks, and
cloud benchmarks.
4.1 Monitoring and Analysis Tools
The following table shows the core recommended monitoring and analysis tools.
pdcollect
  SCO log collection tool.
  Documentation and recommended invocation: SCO Product Information Center

esxtop
  VMware performance collection tool.
  Documentation: URL
  Recommended invocation: esxtop -b -a -d 60 -n <number_of_samples> > <output file>

nmon
  nmon is a comprehensive system monitoring tool for the UNIX platform. It is highly useful for understanding system behavior.
  Documentation: URL
  Sample invocation: nmon -T -s <samplerate> -c <iterations> -F <output file>
  Note: On Windows systems, Windows perfmon may be used.

db2support
  Database support collection tool.
  Documentation: URL
  Recommended invocation: db2support <result directory> -d <database> -c -f -s -l

DBMS Snapshots
  DBMS snapshot monitoring can offer insight into SQL workload, and in particular expensive SQL statements.
  Documentation: URL

WAIT
  Java WAIT monitoring can provide a non invasive view of JVM performance through accumulated Java cores and analytic tools.
  Documentation and recommended invocation: URL

Figure 9: Monitoring and Analysis Tools
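As a concrete illustration of the nmon sample invocation above (a sketch only; the sample rate, iteration count, and output file name are arbitrary choices), the following collects a 24 hour profile at 60 second intervals:

# Collect 1440 samples at 60 second intervals (24 hours), with top process data (-T)
nmon -T -s 60 -c 1440 -F cs1_$(date +%Y%m%d).nmon

The resulting file may then be post-processed with the usual nmon analyser spreadsheets for the summary and fine grained views discussed in the next section.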
4.1.1 nmon Samples
The following figures represent nmon samples for a 22 concurrent user scenario (based on the user profiles in Section 3.2.1).
Figure 10: nmon Samples
Analysis of the samples follows.

• The samples show the summary utilization (CPU, IO) for Central Servers 1 through 4, and the Region Server.

• All servers have 8 vCPUs allocated, with the exception of Central Server 4, which has 4 vCPUs.

• In general, all nodes are consuming less than 1 vCPU. The exceptions are the Region Server (≈1.6 vCPUs) and Central Server 3 (≈2.4 vCPUs). This reflects an IBM Workload Deployer scenario.

• For IO, the bulk of the IO workload is associated with the database node. This is not surprising, and reinforces the recommendations for IO optimization on the DBMS node.

• While the summary view is valuable for an "at a glance" assessment, it is always recommended to look at the fine grained results in nmon to ensure processor utilization is healthy (e.g. minimal or no blocked processes, minimal or zero wait times, healthy multi processor utilization).
4.2 Infrastructure Benchmark Tools
The following table shows some recommended infrastructure benchmark tools.
iometer
  I/O subsystem measurement and characterization tool for single and clustered systems.
  Documentation: URL
  Recommended invocation: dynamo /m <client host name or ip>

iperf
  TCP and UDP measurement and characterization tool that reports bandwidth, delay, jitter, and datagram loss.
  Documentation: URL
  Recommended server invocation: iperf -s
  Recommended client invocation #1: iperf -c <server host name or ip>
  Recommended client invocation #2: iperf -c <server host name or ip> -R

UnixBench
  UNIX measurement and characterization tool, with reference benchmarks and evaluation scores.
  Documentation: URL
  Recommended invocation: ./Run

Figure 11: Infrastructure Benchmark Tools
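For example, a simple network baseline between a management server and a compute node might look as follows (a sketch; the host name is illustrative, and the -t and -P options select a 60 second run with 4 parallel TCP streams):

# On the server under test
iperf -s

# On the client: 60 second run, 4 parallel TCP streams
iperf -c region-server.example.com -t 60 -P 4

Recording such a baseline at installation time makes later regressions (e.g. a misconfigured adapter or path) easy to spot.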
4.3 Cloud Benchmarks
Cloud benchmarks should be based on enterprise utilization. Sample benchmarks that are
easy to manage include the following.
1. Single VM deployment times.
2. Small scale concurrent VM deployment times (e.g. 10 requests in parallel).
3. REST API response times.
It is recommended to establish a small load driver, record a baseline, and then use these
small benchmarks as a standard to assess ongoing cloud health. More complex
benchmarks, including client request monitoring approaches, may of course be
established.
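As a minimal sketch of such a load driver, the following loop times a REST call with curl; the endpoint URL and token handling are placeholders to be adapted to the local installation:

#!/bin/sh
# Record a response time baseline: 10 sequential GET requests.
# ENDPOINT and TOKEN are illustrative placeholders, not SCO defaults.
ENDPOINT="https://sco.example.com:5000/v2.0"
for i in $(seq 1 10); do
  curl -k -s -o /dev/null -w "%{time_total}s\n" \
       -H "X-Auth-Token: $TOKEN" "$ENDPOINT/tenants"
done

Running this at a fixed schedule and charting the results provides the ongoing health standard described above.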
For OpenStack specific benchmarks, OpenStack Rally may be leveraged (see the
References section for further detail). In addition, the Open Systems Group is involved in
cloud computing benchmark standards. A report, including the IBM CloudBench tool, is
available in the References section.
5 Capacity Planning Recommendations
We will provide capacity planning recommendations through three approaches.

• Static planning via a spreadsheet approach.

• Capacity planning for the SCO management server (aka the "managed from" infrastructure).

• Capacity planning for the provisioned Virtual Machines (aka the "managed to" infrastructure).

5.1 Cloud Capacity Planning Spreadsheet
In order to provide a desired hardware and software configuration for an SCO
implementation, a wide range of parameters must be understood. The following questions
are usually relevant.
1. What operations are expected to be performed with SCO?
2. What are the average and peak concurrent user workloads?
3. What is the enterprise network topology?
4. What is the expected workload for provisioned virtual servers, and how do they
map to the physical configuration?
5. For the provisioned servers:
a. What is the distribution size?
b. What are the application service level requirements?
A capacity planning spreadsheet is attached to this paper (“SCO Capacity Planning Profile
v2.3.3.xlsx”). The spreadsheet may be used to provide a cloud profile for further sizing
activities (e.g. a capacity planning activity in association with the document authors).
5.2 SmartCloud Orchestrator Management Server Capacity Planning

The SCO management server requirements are documented in the SCO Information Center (URL). The summary table is repeated here for discussion purposes [1].
Server & Configuration          Processor (vCPUs)   Memory (GB)   Storage (GB)
Central Server 1   Minimum      2                   6             100
                   Recommended  4                   12            200
Central Server 2   Minimum      2                   8             141
                   Recommended  4                   12            200
Central Server 3   Minimum      2                   4             80
                   Recommended  4                   8             160
Central Server 4   Minimum      2                   6             50
                   Recommended  2                   8             60
Central Server 5   Minimum      n/a                 n/a           n/a
(optional)         Recommended  2                   4             20
Region Server      Minimum      2                   4             76
                   Recommended  8                   8             160
Totals             Minimum      10                  28            447
                   Recommended  24                  52            800

Figure 12: SCO Management Server Capacity Planning
While further qualifiers are available in the Information Center, some comments apply.

• In general, the recommended vCPU and memory allocations should be met.

• To determine the ratio of virtual to physical CPUs, monitoring of the production system is required. For performance verification, a 1:1 mapping is used.

• For the physical mapping, it is important to distinguish between "real" cores and hyper threaded (HT) cores. External benchmarks suggest an HT core may yield 30% of the capability of a "real" core. For example, a blade with 8 real cores and hyper-threading enabled should be planned as roughly 8 + (8 x 0.3) ≈ 10.4 core equivalents, not 16.

• The recommended storage amounts are highly subjective. For example, the minimum recommendations are sufficient for performance verification systems driven for months (with some minor exceptions). Recommended volume management approaches are provided in Section 7.1.

[1] Provided values reflect the SCO 2.3.0.1 release.

5.3 Provisioned Virtual Machines Capacity Planning
Managing cloud workloads is typically driven as a categorization exercise where workload
“sizes” are used to determine the overall capacity requirements. A capacity planning tool is
available for managing the cloud workload sizes (URL). We will provide an overview of
using this tool.
The first step is to provide any relevant business opportunity information. In the absence of a defined opportunity, simple "not applicable" entries may be given (per the sample below). Once submitted, you must accept the usage agreement, which will bring up the demographic page.
Figure 13: Capacity Planning Tool: Inquiry Form
The demographic page simply asks for generic information about the submitter.
Figure 14: Capacity Planning Tool: User Demographic Information
When “Continue” is selected, the systems and storage page is provided.
Figure 15: Capacity Planning Tool: Systems and Storage
Then the target system and associated utilization and Virtual Machine requirements are selected. Note that for utilization we select 20% headroom to support peak cloud workloads.
Figure 16: Capacity Planning Tool: System and Workload Options
At this point, the virtual machine requirements may be selected. Note a number of entries
may be added.
Figure 17: Capacity Planning Tool: Virtual Machine Requirements
A confirmation screen is then provided to finalize the capacity planning request.
Figure 18: Planning Tool: Confirmation Screen
The summary capacity planning recommendation is then provided. The summary details
the compute node, CPU, memory, and storage requirements based on the selected
configuration and associated workloads.
Figure 19: Planning Tool: System Summary
6 Cloud Configuration Recommendations
The SCO 2.3 and 2.3.0.1 offerings provide suitable configuration as part of the default
installation. However, there are some specific configuration aspects that may improve the
capability. The configuration points follow.
1. OpenStack Keystone cache.
2. OpenStack Keystone worker support.
3. IaaS Gateway cluster support.
4. IBM Workload Deployer configuration.
5. Virtual Machine IO scheduler.
6. Advanced Configuration and Power Interface (ACPI) management.
7. Java Virtual Machine heap.
8. Database.
6.1 OpenStack Keystone Cache Configuration

SCO is deployed with a default two gigabyte Keystone cache (aka "memcached") configuration. The intent of the cache is to provide an in memory repository of Keystone tokens to improve system throughput, particularly under concurrent workloads. Assuming there is sufficient memory on the Keystone VM (Central Server 2), the recommendation is to double the cache configuration to four (4) gigabytes. Instructions on how to modify the cache setting are provided here.
An appendix is provided that offers guidance on low level Keystone monitoring to
determine health and throughput capability.
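As an illustrative sketch only (the product documentation procedure takes precedence), on a Red Hat based Central Server 2 the memcached cache size is typically controlled by the CACHESIZE entry in /etc/sysconfig/memcached:

# /etc/sysconfig/memcached -- assumed default location on RHEL
CACHESIZE="4096"   # megabytes; doubles the default 2 GB cache

# Restart the service to apply the change
service memcached restart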
6.2 OpenStack Keystone Worker Support
The initial SCO 2.3 offering contains a Keystone implementation that is characterized by a
single execution thread instance. Improvements have been made to exploit multiple
concurrent Keystone workers. This change is generally advised when Keystone exhibits
high request latency, or is seen to consume a significant amount of a virtual CPU (e.g. >
80%). In order to exploit this support, two steps are required.
1. Obtain the required SCO 2.3 limited availability fix or fixpack. The authors of this
paper may be contacted for further detail (this paper will be revised upon official
availability).
2. Revise the configuration to exploit multiple workers. Further detail on this is
provided below.
With the Keystone worker improvement in place, the following configuration change will allow a pool of four public workers and four administrative workers. This will permit increased concurrency, at the expense of virtual CPU consumption. As a result, the virtual CPU allocation should be increased based on monitoring data. In the "4+4" worker example below, the virtual CPU allocation is expected to increase on the order of two to four virtual CPUs.

Location: (Central Server 2) /etc/keystone/keystone.conf

# The number of worker processes to serve the public WSGI application
# (integer value).
public_workers=4

# The number of worker processes to serve the admin WSGI application
# (integer value).
admin_workers=4

Figure 20: Keystone Worker Configuration
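After restarting Keystone (a sketch; the service name is assumed to be the standard openstack-keystone and may differ on a given SCO installation), the additional worker processes can be confirmed from the process table:

service openstack-keystone restart
# Expect one parent process plus the configured public/admin workers
ps -ef | grep "[k]eystone"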
6.3 IaaS Gateway Cluster Support
Similar to the Keystone worker support in the previous section, the IaaS Gateway cluster
support permits the deployment of a scalable cluster of IaaS Gateway instances to drive
greater concurrency and reduce latency. In order to exploit this support, two steps are
required.
1. Obtain the required SCO 2.3 limited availability fix or fixpack. The authors of this
paper may be contacted for further detail (this paper will be revised upon official
availability).
2. Implement the cluster. See Appendix C for further details.
Similar to the Keystone worker support, the IaaS Gateway cluster will drive additional virtual CPU utilization. Monitor the system and increase the virtual CPU allocation based upon load.
6.4 IBM Workload Deployer Configuration
The IWD component offers a number of configuration options. One specific option
provides the ability to control a polling interval to refresh cloud information. Based on the
size of the cloud, this configuration option should be changed.
Location: (Central Server 3)
/opt/ibm/rainmaker/purescale.app/private/expanded/ibm/rainmaker.vmsupport4.0.0.1/config/vmpublish.properties
Original:
RuntimeInterval=12000
Recommended: RuntimeInterval=30000
Figure 21: IWD Configuration
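The change can be applied with a one line edit, for example (a sketch; back up the file first, and note the directory path is exactly as printed above):

cd /opt/ibm/rainmaker/purescale.app/private/expanded/ibm/rainmaker.vmsupport4.0.0.1/config
cp vmpublish.properties vmpublish.properties.bak
sed -i 's/RuntimeInterval=12000/RuntimeInterval=30000/' vmpublish.properties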
6.5 Virtual Machine IO Scheduler Configuration
Each Linux instance has an IO scheduler. The intent of the IO scheduler is to optimize IO
performance, potentially by clustering or sequencing requests to reduce the physical
impact of IO. In a virtual world, however, the operating system is typically disassociated
from the physical world through the hypervisor. As a result, it is recommended to alter the
IO scheduler algorithm so that it is more efficient in a virtual deployment, with scheduling
delegated to the hypervisor.
The default scheduling algorithm is typically “cfq” (completely fair queuing). Alternative and
recommended algorithms are “noop” and “deadline”. The “noop” algorithm, as expected,
does as little as possible with a first in, first out queue. The “deadline” algorithm is more
advanced, with priority queues and age as a scheduling consideration. System specific
benchmarks should be used to determine which algorithm is superior for a given workload.
In the absence of available benchmarks, we would recommend the “deadline” scheduler be
used.
The following console output shows how to display and modify the IO scheduler algorithm for a set of block devices. In the example, the "noop" scheduler algorithm is set. Note that to ensure the scheduler configuration persists, it should be enforced via the operating system configuration (e.g. /etc/rc.local).
Figure 22: Modifying the IO Scheduler
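For reference, the equivalent commands are sketched below for a device sda (device names vary per installation):

# Display the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler

# Select the "noop" scheduler for the device
echo noop > /sys/block/sda/queue/scheduler

# Persist the setting across reboots, e.g. via /etc/rc.local
echo 'echo noop > /sys/block/sda/queue/scheduler' >> /etc/rc.local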
6.6 Advanced Configuration and Power Interface Management
The Advanced Configuration and Power Interface (ACPI) operating system support may
exhibit high virtual CPU utilization and offers limited value in virtual environments. It is
recommended to disable ACPI on the SCO “managed from” nodes through the following
steps.
1. Disabling "kacpid".
   To switch off the kernel ACPI daemon, edit "/etc/grub.conf" and append "acpi=off" to the kernel boot command line. For example:

   title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
     root (hd0,0)
     kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35
     initrd /boot/initramfs-2.6.32-431.el6.x86_64.img

   becomes:

   title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
     root (hd0,0)
     kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35 acpi=off
     initrd /boot/initramfs-2.6.32-431.el6.x86_64.img

2. Disabling the user-space ACPI daemon.
   To disable user space ACPI on managed-from nodes:

   chkconfig acpid off

3. Reboot the nodes.
6.7 Java Virtual Machine Heap Configuration
The default Java Virtual Machine (JVM) heap sizes are intended to be economical.
However, in the presence of sufficient available memory, it is recommended to increase
the heap allocation. The three change sets below are recommended for application. They
apply to Central Server 3 and, in particular, the IBM Workload Deployer instance. The IWD
instance should be restarted once the changes are complete.
Location:
/opt/ibm/rainmaker/purescale.app/config/overrides.config
Original:
/config/zso/jvmargs = ["-Xms1024M","-Xmx1024M"]
Recommended: /config/zso/jvmargs = ["-Xms1536M","-Xmx1536M"]
Location:
/etc/rc.d/init.d/iwd-utils
Original:
sed -i -e 's/3072M/1024M/g' $ZERO_DIR/config/overrides.config
Recommended: sed -i -e 's/3072M/1536M/g' $ZERO_DIR/config/overrides.config
Location:
/opt/ibm/rainmaker/purescale.app/config/zero.config
Original:
"-Xms1024M","-Xmx1024M"
Recommended: "-Xms1536M","-Xmx1536M"
Figure 23: Java Virtual Machine Heap Change Sets
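A quick verification after editing (a sketch) is to confirm the new heap values appear in both configuration files before restarting IWD:

grep -n "Xmx" /opt/ibm/rainmaker/purescale.app/config/overrides.config \
              /opt/ibm/rainmaker/purescale.app/config/zero.config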
6.8 Database Configuration
SCO is deployed with a DB2 database. The performance of the database is critical to the overall capability of the solution. The following database configuration changes are recommended for a base SCO 2.3 installation. Note that some of these configuration changes are already in place for an SCO 2.3.0.1 installation. As a result, those specific steps are optional depending on the specific version deployed.
Type: Configuration
  For each relevant database (see Section 7.2) set:
    STMT_CONC = LITERALS
    LOCKTIMEOUT = 60
    NUM_IOCLEANERS = AUTOMATIC
    NUM_IOSERVERS = AUTOMATIC
    AUTO_REORG = ON
  For example:
    db2 UPDATE DB CFG FOR OPENSTAC USING LOCKTIMEOUT 60

Type: Index Addition
  A number of OpenStack database indexes are required. Please apply the "SCO_CREATE_INDEXES.sh" script provided with this paper. Note an "SCO_DROP_INDEXES.sh" script is provided in the event it is desired to drop the indexes.

Type: Foreign Key Modification
  An OpenStack foreign key should be modified to enable cascading deletes. Please apply the "SCO_MODIFY_FKEY.sh" script provided with this paper.

Figure 24: Database Configuration Change Sets
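As an illustrative sketch (run as the instance owner, typically db2inst1; the database list follows Section 7.2 and should be confirmed against the local catalog), the configuration settings can be applied in a loop:

#!/bin/sh
# Apply the recommended settings to each SCO database.
for DB in BPMDB CMNDB PDWDB OPENSTAC RAINMAKE STORHOUS; do
  db2 "UPDATE DB CFG FOR $DB USING STMT_CONC LITERALS LOCKTIMEOUT 60 \
NUM_IOCLEANERS AUTOMATIC NUM_IOSERVERS AUTOMATIC AUTO_REORG ON"
done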
7 Cloud Maintenance Recommendations

We will describe recommended maintenance approaches for the SCO file system volumes and the DB2 Database Management System.
7.1 SmartCloud Orchestrator Volume Management
We will outline the SCO 2.3 volume management requirements. We will first describe the
install time requirements, and then the requirements for a long running system.
7.1.1 Install Time Requirements
The following table describes the SCO volume requirements, both overall and installation time free space requirements [2]. The overall requirements are useful for initial hardware allocations. The free space requirements are part of the installer pre-requisite checks. The intent is to ensure basic system health for the minimal set of file systems (i.e. '/' and '/home').

Volume Requirements (GB)

Server             vCPUs   RAM (GB)   Overall   Free Space: '/'   Free Space: '/home'
Central Server 1   2       6          100       75                19
Central Server 2   2       8          141 [3]   55                30
Central Server 3   2       4          80        70                4
Central Server 4   2       6          50        40                4
Central Server 5   2       4          20        20                n/a [4]
Region Server      2       4          76        40                30

Figure 25: SCO 2.3 Volume Management: Install Time Requirements
Some comments on the installation requirements:

• These are the minimum installation requirements. The minimum and recommended requirements are provided in the product information center (URL).

• The root requirement excludes the home requirement.

• The /home file system on Central Server 2 and the Region Server is primarily consumed by the /home/library directory of the Virtual Image Library. This path may be symbolically linked to an external volume to simplify image volume management.

• It should be noted there is a gap between the overall numbers and the free space numbers reported. This is the result of the following factors.
  o The overall numbers describe the volume requirements at the hardware level, prior to base operating system installation.
  o The installer pre-requisite check is dealing with an installed system (i.e. post base operating system installation). As a result, approximately 6 GB is expected to be consumed by the base installation and related artifacts. Once this is factored in, the numbers align.

[2] Referenced requirements are for the SCO 2.3.0.1 release.
[3] Also requires 10 GB and 40 GB in the /opt and /tmp file systems, respectively.
[4] Central Server 5 is an optional component. It is not managed as part of the installation pre-requisite check and is listed here for completeness.
7.1.2 Long Running System Requirements
While the installation requirements are useful, the true management aspect arises from a
system under load for a significant period of time. The following tables show fine grained
disk requirements for systems running continuous workloads (the so called “24 x 7”
workloads) for months.
Volume Size in MB

Volume      Central    Central    Central    Central    Region
            Server 1   Server 2   Server 3   Server 4   Server
/bin/       10         10         8          10         10
/boot/      27         27         27         27         27
/data/      11273      -          -          -          1820
/drouter/   -          -          23403      -          -
/etc/       35         35         35         35         36
/home/      131153     24         1          1          23
/iaas/      8          -          -          -          7
/lib/       138        146        135        140        129
/lib64/     28         32         28         28         28
/opt/       2738       4075       1203       6444       611
/root/      6          2          2          2          1699
/sbin/      15         18         15         15         18
/tmp/       4          142        27         157        67
/usr/       3250       3587       3048       3062       3556
/var/       521        399        672        186        3908

Figure 26: Long Running System Requirements: System A
Volume Size in MB

Volume      Central    Central    Central    Central    Region
            Server 1   Server 2   Server 3   Server 4   Server
/bin/       10         10         8          10         10
/boot/      27         27         27         27         27
/data/      11273      -          -          -          1820
/drouter/   -          -          37263      -          -
/etc/       35         35         35         35         36
/home/      173042     3154       1          1          8240
/iaas/      8          -          -          -          7
/lib/       138        146        135        140        129
/lib64/     28         32         28         28         28
/opt/       2738       6856       1204       6503       611
/root/      2          2          2          2          447
/sbin/      15         18         15         15         18
/tmp/       67         142        90         280        152
/usr/       3250       3588       3048       3062       3556
/var/       179        13653      11406      1076       540

Figure 27: Long Running System Requirements: System B
This fine grained information is useful, but also a bit overwhelming. Let us look at a
summary view relative to the installation free space requirements. Please keep in mind the
free space requirements are typically 6 GB less than the overall (hardware) requirement, but
we consider the finer grained values more useful for comparison purposes.
Volume Management (GB)

Server             Volume    Install Free Space   System A Utilization   System B Utilization
Central Server 1   '/'               75                   18                     17
Central Server 1   '/home'           19                  128 *                  169 *
Central Server 2   '/'               55                    8                     24
Central Server 2   '/home'           30                   <1                      3
Central Server 3   '/'               70                   28                     52
Central Server 3   '/home'            4                   <1                     <1
Central Server 4   '/'               40                   10                     11
Central Server 4   '/home'            4                   <1                     <1
Region Server      '/'               40                   12                      7
Region Server      '/home'           30                   <1                      8 *

(* marks the notable values discussed in the text below.)

Figure 28: Long Running System Requirements Summary
The summary view, in the context of the installation free space requirements, shows some
surprising results.

• The installation requirements are generally overstated. While there is some
  factoring for maintaining large installation bundles, the values ensure long term
  operational health (with some exceptions, described below).
• For Central Server 1, the '/data' directory actually contains ~11 GB, which
  includes the RHEL ISO files required for installation.
• Notable issues are marked with an asterisk in Figure 28 and described below.
  o The '/home' volume is clearly out of control on both System A and System B. This
    is actually an error logging issue, and is described in the following section.
  o The '/home' on the System B Region Server is showing greater than expected
    utilization. This is associated with the Virtual Image Library management and is
    considered within the recommended allocation.
• Not all file systems are enumerated in the interests of brevity. These file systems
  can generally be considered noise, contributing on the order of a handful of
  megabytes per server. The one exception is the '/install' file system, which most
  notably consumes 20 GB on the System A region server and 61 GB on the System B
  region server.
It should be noted these results are for a specific installation. As always, different
installations may have different requirements based on usage. For example, images used
for the Virtual Image Library on Central Server 2 can contribute significantly to utilization.
Volume monitoring is always recommended as a best practice.
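As a minimal sketch of such monitoring (the paths and choice of checks are illustrative),
simple commands scheduled via cron can provide early warning on each server:

    # report free space on the volumes managed by the installer checks
    df -h / /home
    # identify the largest consumers under /home (e.g. /home/library on Central Server 2)
    du -sh /home/* 2>/dev/null | sort -h | tail -5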
Central Server 1 Error Logging Issue
A core question is: why is the Central Server 1 '/home' utilization so high? The simple
answer is that, for the systems in question, a program error was generating massive log
entry activity into the database. For example, the PDWDB database log entries alone are
consuming 147 GB (87%) of the overall space!
Is this normal? Absolutely not. A specific program error was triggered in our environment,
and suitable fixes have been put in place.
The following section provides a brief summary of the SCO 2.3 database structure, archive
logging, and some recommended database management approaches (including online
backup management).
7.2 The SmartCloud Orchestrator Database and Schema Summary

The SCO DB2 databases typically run under the default instance of DB2INST1. The
following table summarizes the individual SCO databases.
Database               Schema(s)                    Comments
BPMDB, CMNDB, PDWDB    BPMUser                      Business Process Manager (BPM) databases.
OPENSTAC               CIRnnnnn, GLEnnnnn,          OpenStack database. Note the "nnnnn" schema
                       NOAnnnnn, SCEnnnnn, KSDB     suffix is variable per region.
RAINMAKE               DB2INST1                     IBM Workload Deployer (IWD) database. Uses
                                                    the default schema for the database instance
                                                    (in this case, DB2INST1).
STORHOUS               DB2INST1                     IBM Workload Deployer (IWD) database. Uses
                                                    the default schema for the database instance
                                                    (in this case, DB2INST1).

Figure 29: Database and Schema Summary
7.3 Database Management

Generally speaking, the "out of the box" database configuration will achieve good results
for both large and small installations. The following recommendations are primarily in the
area of database maintenance.
7.3.1 DBMS Versions
The following DBMS versions are recommended. All versions should be 64 bit.

Version                   Notes
DB2 10.1 fp3 or later     DB2 10.5 and upward is not currently supported.

Figure 30: DBMS Versions
7.3.2 Automatic Maintenance
DB2 offers a number of automatic maintenance options. Automatic statistics collection
(aka runstats) is considered a basic and necessary configuration setting, and is enabled for
the product by default. Two other recommended configuration settings follow. It is
expected these configuration settings will be enabled by default in future versions of the
products.
1. Real time statistics. The default runstats configuration generally collects statistics
at two hour intervals. The real time statistics option provides far more granular
statistics collection, essentially generating statistics as required at statement
compilation time.
2. Automatic reorganization. Many customers ignore database reorganization and
system performance starts to decline. This can be especially critical in the cloud
space. The recommendation is to enable automatic reorganization support so it is
self managed by the DBMS. Further discussion of database reorganization is
covered in section 7.4.3.
The following commands may be used to enable these automatic maintenance options. At
the time of this writing, they are conditionally recommended. Each of these options has
runtime impact and should be monitored to ensure there is no unnecessary system impact.
In order to facilitate this, they should only be enabled once the system has been
established and monitored. In addition, automatic reorganization is dependent on the
definition of a maintenance window (see the DB2 Information Center for more detail).
update db cfg for OPENSTAC using AUTO_STMT_STATS ON
update db cfg for OPENSTAC using AUTO_REORG ON
Figure 31: Database Automatic Maintenance Configuration
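Automatic reorganization only runs inside a defined maintenance window. As a hedged
illustration using the DB2-provided policy procedures (the policy file name below is
arbitrary; the XML content is environment specific and is written to the instance's
sqllib/tmp directory):

    db2 connect to OPENSTAC
    db2 "call sysproc.automaint_get_policyfile('MAINTENANCE_WINDOW', 'maint_window.xml')"
    # edit the exported XML to define the window, then apply it back
    db2 "call sysproc.automaint_set_policyfile('MAINTENANCE_WINDOW', 'maint_window.xml')"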
7.3.3 Operating System Configuration (Linux)
The product installation guides have comprehensive instructions for Operating System
prerequisites and configuration. However, on Linux systems improper configuration is
common, so we will highlight specific issues.
The first configuration point to check is the file system ulimit for the maximum number of
open files allowed for a process (i.e. nofiles). The value for this kernel limit should be either
"unlimited" or "65536". The DB2 reference for this configuration setting is available in the
DB2 Information Center (URL).
In addition, the kernel semaphore and message queue specifications should be correct.
These configuration settings are a function of the physical memory available on the
machine. The DB2 reference for these configuration settings is available in the DB2
Information Center (URL).
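A quick way to verify these settings on a database server is sketched below. The values
shown are illustrative; the correct kernel parameters should be derived from the DB2
documentation and the physical memory of the machine.

    # verify the open file limit for the instance owner (expect 65536 or unlimited)
    su - db2inst1 -c 'ulimit -n'

    # sample /etc/security/limits.conf entries
    db2inst1  soft  nofile  65536
    db2inst1  hard  nofile  65536

    # inspect the current kernel semaphore and message queue limits
    ipcs -l
    sysctl kernel.sem kernel.msgmni kernel.msgmax kernel.msgmnb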
7.4 Database Hygiene Overview

The following database hygiene topics are described:
1. Database backup management.
2. Database statistics management.
3. Database reorganization.
4. Database archive management.
5. Database maintenance automation.
Steps make reference to recommended scheduling frequencies. The general purpose
"cron" scheduling utility may be used to achieve this. However, other scheduling utilities
may also be used. The key aspect of a cron'ed activity is that it is scheduled at regular
intervals (e.g. nightly, weekly) and typically does not require operator intervention.
Designated maintenance windows may be used for these activities.
7.4.1 Database Backup Management
It is recommended that nightly database backups be taken. The following figures offer a
sample database offline backup (utilizing compression), along with a sample restore.
backup db <dbname> user <user> using <password> to <backup directory> compress
Figure 32: Database Backup with Compression Command
restore db <dbname> from <backup directory> taken at <timestamp> without
prompting
Figure 33: Database Offline Backup Restore
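In practice the backup command is typically wrapped in a small scheduled script. The
following is a minimal sketch only; the database name, backup path, and the
force/deactivate handling are illustrative and should be adapted to local policy.

    #!/bin/sh
    # nightly offline backup sketch: an offline backup requires that
    # no applications remain connected to the database
    DB=OPENSTAC
    DIR=/backup/DB2
    db2 force application all
    db2 deactivate db $DB
    db2 backup db $DB to $DIR compress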
Online backups may be utilized as well. The following figure provides commands that
comprise a sample weekly schedule. With the given schedule, the best case scenario is a
restore requiring one image (a Monday failure using the Sunday night backup). The worst
case scenario would require four images (Sunday + Wednesday + Thursday + Friday). An
alternate approach would be to utilize an incremental (non-delta) backup each night, which
caps the worst case scenario at two images. The tradeoffs for the backup approaches are
the time to take the backup, the amount of disk space consumed, and the restore
dependencies. A best practice can be to start with nightly full online backups, and
introduce incremental backups if time becomes an issue.
(Sun)  backup db <dbname> online include logs use tsm
(Mon)  backup db <dbname> online incremental delta use tsm
(Tue)  backup db <dbname> online incremental delta use tsm
(Wed)  backup db <dbname> online incremental use tsm
(Thu)  backup db <dbname> online incremental delta use tsm
(Fri)  backup db <dbname> online incremental delta use tsm
(Sat)  backup db <dbname> online incremental use tsm

Figure 34: Database Online Backup Schedule
Note that to enable incremental backups, the database configuration must be updated to
track page modifications, and a full backup must be taken in order to establish a baseline.
update db cfg for OPENSTAC using TRACKMOD YES
Figure 35: Database Incremental Backup Enablement
To restore the online backups, either a manual or automatic approach may be used. For
the manual approach, you must start with the target image, and then revert to the oldest
relevant backup and move forward to finish with the target image. A far simpler approach
is to use the automatic option and let DB2 manage the images. A sample of each
approach is provided below, showing the restore based on the Thursday backup.
restore db <dbname> incremental use tsm taken at <Sunday full timestamp>
restore db <dbname> incremental use tsm taken at <Wednesday incremental
timestamp>
restore db <dbname> incremental use tsm taken at <Thursday incremental delta
timestamp>
Figure 36: Database Online Backup Manual Restore
restore db <dbname> incremental auto use tsm taken at <Thursday incremental delta
timestamp>
Figure 37: Database Online Backup Automatic Restore
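The <timestamp> values used in the restore commands may be taken from the recovery
history, for example:

    db2 list history backup all for <dbname>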
In order to support online backups, archive logging must be enabled. The next subsection
provides information on archive logging, including the capability to restore to a specific
point in time using a combination of database backups and archive logs.
Database Log Archiving
A basic approach we will advocate is archive logging with the capability to support online
backups. The online backups themselves may be full, incremental (based on the last full
backup), and incremental delta (based on the last incremental backup). In order to enable
log archiving to a location on disk, the following command may be used.
update db cfg for <dbname> using logarchmeth1 DISK:/path/logarchive
Figure 38: Database Log Archiving to Disk
Alternatively, in order to enable log archiving to TSM, the following command may be
used[5].
update db cfg for <dbname> using logarchmeth1 TSM
Figure 39: Database Log Archiving to TSM
Note that a “logarchmeth2” configuration parameter also exists. If both of the log archive
method parameters are set, each log file is archived twice (once per log archive method
configuration setting). This will result in two copies of archived log files in two distinct
locations (a useful feature based on the resiliency and availability of each archive location).
Once the online backups and log archive(s) are in effect, the recovery of the database may
be performed via a database restore followed by a roll forward through the logs. Several
restore options have been previously described in section 7.4.1. Once the restore has
been completed, roll forward recovery must be performed. The following are sample roll
forward operations.
[5] The log archive methods (logarchmeth1, logarchmeth2) have the ability to associate
configuration options with them (logarchopt1, logarchopt2) for further customization.
rollforward db <dbname> to end of logs
Figure 40: Database Roll Forward Recovery: Sample A
rollforward db <dbname> to 2012-02-23-14.21.56 and stop
Figure 41: Database Roll Forward Recovery: Sample B
It is worth noting the second example recovers to a specific point in time. For a
comprehensive description of the DB2 log archiving options, the DB2 information center
should be consulted (URL). A service window (i.e. stop the application) is typically
required to enable log archiving.
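A hedged outline of the enablement sequence follows. Note that switching a database from
circular to archive logging places it in backup pending state, so a full offline backup is
required before applications can reconnect.

    # illustrative sequence, performed during the service window
    db2 force application all
    db2 update db cfg for <dbname> using logarchmeth1 DISK:/path/logarchive
    # the database is now in backup pending state; take a full offline backup
    db2 backup db <dbname> to /backup/DB2 compress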
Database Backup Cleanup
Unless specifically pruned, database backups may accumulate and cause issues with disk
utilization or, potentially, a stream of failed backups. Unmonitored, failing backups may
make disaster recovery nearly impossible in the event of a hardware or disk failure. A
simple manual method to prune backups older than a week follows.
find /backup/DB2 -type f -mtime +7 | xargs -r rm
Figure 42: Database Backup Cleanup Command
A superior approach is to let DB2 automatically prune the backup history and delete your
old backup images and log files. A sample configuration is provided below.
update db cfg for OPENSTAC using AUTO_DEL_REC_OBJ ON
update db cfg for OPENSTAC using NUM_DB_BACKUPS 21
update db cfg for OPENSTAC using REC_HIS_RETENTN 180
Figure 43: Database Backup Automatic Cleanup Configuration
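The resulting settings may be verified against the database configuration, for example:

    db2 get db cfg for OPENSTAC | grep -i -e auto_del_rec_obj -e num_db_backups -e rec_his_retentn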
It is also generally recommended to have the backup storage independent from the
database itself. This provides a level of isolation in the event volume issues arise (e.g. it
ensures that a backup operation will not fill the volume hosting the tablespace containers,
which could possibly lead to application failures).
7.4.2 Database Statistics Management
As discussed in the previous “Automatic Maintenance” section, database statistics ensure
that the DBMS optimizer makes wise choices for database access plans. The DBMS is
typically configured for automatic statistics management. However, it may often be wise to
force statistics as part of a nightly or weekly database maintenance operation. A simple
command to update statistics for all tables in a database is the “reorgchk” command.
reorgchk update statistics on table all
Figure 44: Database Statistics Collection Command
One issue with the reorgchk command is that it does not enable full control over statistics
capturing options. For this reason, it may be beneficial to perform statistics updates on a
table by table level. However, this can be a daunting task for a database with hundreds of
tables. As a result, the following SQL statement may be used to generate administration
commands on a table by table basis.
select 'runstats on table ' || STRIP(tabschema) || '.' || tabname ||
       ' with distribution and detailed indexes all;'
  from SYSCAT.TABLES
 where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 45: Database Statistics Collection Table Iterator
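The generated statements can then be captured to a file and executed. A sample invocation
for the OPENSTAC database follows (the db2 -x option suppresses output headers; -tvf
executes the generated script):

    db2 connect to OPENSTAC
    db2 -x "select 'runstats on table ' || strip(tabschema) || '.' || tabname ||
            ' with distribution and detailed indexes all;' from syscat.tables
            where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB')" > runstats.sql
    db2 -tvf runstats.sql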
7.4.3 Database Reorganization
Over time, the space associated with database tables and indexes may become
fragmented. Reorganizing the table and indexes may reclaim space and lead to more
efficient space utilization and query performance. In order to achieve this, the table
reorganization command may be used. Note, as discussed in the previous “Automatic
Maintenance” section, automatic database reorganization may be enabled to reduce the
requirement for manual maintenance.
The following commands are examples of running a "reorg" on a specific table and its
associated indexes. Note the "reorgchk" command previously demonstrated includes a per
table indicator of which tables require a reorg. Using the reorgchk results, per table
reorganization may be achieved for optimal database space management and usage.
reorg table <table name> allow no access
reorg indexes all for table <table name> allow no access
Figure 46: Database Reorganization Commands
It is important to note there are many options and philosophies for doing database
reorganization. Every enterprise must establish its own policies based on usage, space
considerations, performance, etc. The above example is an offline reorg. However it is
possible to also do an online reorg via the “allow read access” or “allow write access”
options. The “notruncate” option may also be specified (indicating the table will not be
truncated in order to free space). The “notruncate” option permits more relaxed locking and
greater concurrency (which may be desirable if the space usage is small or will soon be
reclaimed). If full online access during a reorg is required, the “allow write access” and
“notruncate” options are both recommended.
Note it is also possible to use our table iteration approach to do massive reorgs across
hundreds of tables as shown in the following figure. The DB2 provided snapshot routines
and views (e.g. SNAPDB, SNAP_GET_TAB_REORG) may be used to monitor the status
of reorg operations.
select 'reorg table ' || STRIP(tabschema) || '.' || tabname || ' allow no access;'
  from SYSCAT.TABLES
 where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');

select 'reorg indexes all for table ' || STRIP(tabschema) || '.' || tabname ||
       ' allow no access;'
  from SYSCAT.TABLES
 where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 47: Database Reorganization Table Iterator
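As a sketch of the monitoring side, the snapshot table function mentioned above may be
queried while reorgs are running, for example:

    -- monitor in-flight and completed table reorganizations for the current database
    select substr(tabschema, 1, 12) as tabschema,
           substr(tabname, 1, 24) as tabname,
           reorg_status, reorg_completion
      from table(snap_get_tab_reorg('', -1)) as t;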
7.4.4 Database Archiving
Database archiving is the act of removing unnecessary or obsolete information in order to
preserve optimum performance. The intent is to keep table cardinality manageable so that
query performance is stable, and to minimize IO overhead. The following graph shows the
real world impact of proper database archiving.

Figure 48: Database Archiving Impact

The graph shows provisioning service times pre and post archiving. For the pre archiving
interval, not only are the average service times much higher (dark blue line), but the
distribution of service times is much wider (series of cyan data points). Once the archiving
is implemented, the service times are extremely stable with a much narrower time
distribution.
In order to achieve database archiving, an archive script and associated documentation
are provided with this paper[6] (see "ArchiveScripts.zip"). The archiving is an OpenStack
function and copies the historical content to a shadow database (implying the data is still
available and online). It is recommended the database archiving be part of a scheduled
maintenance activity via the crontab (see the next section for details).

[6] The archive scripts are also part of the SCO 2.3.0.1 distribution.
7.4.5 Database Maintenance Automation
For standard database maintenance, it is advisable to automate the scheduling and
execution of the maintenance activities via the crontab. The following table shows a
sample schedule for the maintenance operations for the relevant SCO databases.
Database     Statistics   Reorgs      Archiving
STORHOUS     Sunday       Saturday
PDWDB        Tuesday      Monday
BPMDB        Wednesday    Tuesday
OPENSTAC     Monday       Sunday      Saturday
RAINMAKE     Thursday     Wednesday
CMNDB        Friday       Thursday

Figure 49: Sample Database Maintenance Schedule
The following example demonstrates maintenance activities on the OPENSTAC database.
Similar examples are provided with this paper via the “CrontabScripts.zap” attachment. In
general, the sample cron entries schedule activities in disjoint time windows throughout the
week. This serves to provide fully online maintenance operations with minimal impact.
# Run runstats and reorgchk for openstac db
0 2 * * Mon db2inst1 /home/db2inst1/tools/gen_runstats.sh OPENSTAC /home/db2inst1/tools
30 2 * * Sun db2inst1 /home/db2inst1/tools/gen_reorg.sh OPENSTAC /home/db2inst1/tools
Figure 50: Sample Database Maintenance Crontab Entry
8 Summary Cookbook

The following tables provide a cookbook for the solution implementation. The cookbook
approach implies a set of steps the reader may "check off" as completed to provide a
stepwise implementation of the SCO solution. The recommendations will be provided in
three basic steps:
1. Base installation recommendations.
2. Post installation recommendations.
3. High scale recommendations.
All recommendations are provided in tabular format. The preferred order of implementing
the recommendations is from the first row of each table through to the last.
8.1 Base Installation Recommendations

The base installation recommendations are considered essential to a properly functioning
SCO instance. All steps should be implemented.

Identifier   Description                                                          Status
B1           Perform the base SCO installation, ensuring the recommended
             configuration described in Section 5.2 is achieved.
             A central DB2 server should be used (i.e. the region servers
             should not manage a local DBMS unless there are compelling
             geographic considerations). Where possible it is recommended
             to install the DBMS on bare metal, or in a DBA managed pool,
             to facilitate performance management.
B2           Enable the Keystone memcached implementation (Section 6.1).
B3           Enable the OpenStack Keystone worker support (Section 6.2).
B4           Enable the IaaS Gateway cluster support (Section 6.3).
B5           Optimize the IWD component (Section 6.4).
B6           Configure the Linux IO scheduler (Section 6.5).
B7           Disable the ACPI management (Section 6.6).
B8           Ensure the Java heaps are optimized (Section 6.7).
B9           Configure the central database (Section 6.8).
B10          Configure the database server Linux instance per Section 7.3.3.

Figure 51: Base Installation Recommendations
8.2 Post Installation Recommendations

The post installation recommendations will provide additional throughput and superior
functionality. All steps should be implemented.

Identifier   Description                                                          Status
P1           Perform a set of infrastructure and SCO benchmarks to determine
             the viability of the installation (see Sections 4.2 and 4.3).
P2           Implement the database statistics maintenance activity per
             Sections 7.4.2 and 7.4.5.
P3           Implement the database reorg maintenance activity per Sections
             7.4.3 and 7.4.5.
P4           Implement the database archiving maintenance activity per
             Sections 7.4.4 and 7.4.5.
P5           Implement a suitable backup and disaster recovery plan comprising
             regular backups of all critical server components (including the
             database and relevant file system objects). Guidelines are
             provided in the SCO Information Center (URL).

Figure 52: Post Installation Recommendations
8.3 High Scale Recommendations

The high scale recommendations should be incorporated once the production installation
needs to support the high water mark for scalability. All steps may be optionally
implemented over time based upon workload.

Identifier   Description                                                          Status
S1           Apply the latest SCO fixpack.
S2           Monitor the performance of the installation (Section 4.1) and
             adjust the management server to the recommended installation
             values (Section 5.2) as appropriate.
S3           Optimize Central Server 1 (DBMS) performance. A basic way to
             achieve this is to have dedicated, high performance storage
             allocated to the database containers and logs.

Figure 53: High Scale Recommendations
APPENDIX A: SMARTCLOUD ORCHESTRATOR MONITORING OPTIONS

Monitoring is important to understand and ensure the health of any cloud solution. A
number of monitoring approaches are available for SCO. The solutions are described via
the following summary sections, broken down into three categories.
1. OpenStack monitoring via Ceilometer.
2. SCO monitoring via IBM BPM.
3. Infrastructure monitoring via IBM Tivoli Monitoring (ITM) and third party solutions.
A separate appendix is provided that is specific to OpenStack Keystone monitoring.
A.1 OpenStack Monitoring

OpenStack monitoring is provided via the Ceilometer component. Ceilometer offers a
comprehensive and customizable infrastructure, including support for event and threshold
management. Note while Ceilometer is not part of the base SCO 2.3 distribution, it is a
constituent of the OpenStack Grizzly base, with continued enhancement in subsequent
OpenStack releases.
Ceilometer provides three distinct types of metrics:
1. Cumulative: counters that accumulate or increase over time.
2. Gauge: counters that offer discrete, point in time values.
3. Delta: differential counters showing change rates.
A vast array of metrics is provided by Ceilometer. An easy way to interactively derive the
set of available metrics is to query Ceilometer directly (see the sample below). In addition,
the Ceilometer documentation provides the default set, with associated attributes (URL).
ceilometer meter-list -s openstack
Figure 54: OpenStack Ceilometer Metrics
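Individual meters may then be examined with the same client. For example (meter names
as in the table below; the -m, -l, and -p flags are from the Grizzly-era
python-ceilometerclient and should be confirmed against the installed version):

    # recent samples for a meter
    ceilometer sample-list -m cpu_util -l 10
    # aggregated statistics over 10 minute periods
    ceilometer statistics -m cpu_util -p 600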
The following table provides a core set of recommended monitoring points for OpenStack.
A broader set may of course be used.
Component                             Meters
Nova (Compute Node Management)        cpu_util
                                      disk.read.requests.rate
                                      disk.write.requests.rate
                                      disk.read.bytes.rate
                                      disk.write.bytes.rate
                                      network.incoming.bytes.rate
                                      network.outgoing.bytes.rate
                                      network.incoming.packets.rate
                                      network.outgoing.packets.rate
                                      The following counters require enablement:
                                      compute.node.cpu.kernel.percent
                                      compute.node.cpu.idle.percent
                                      compute.node.cpu.user.percent
                                      compute.node.cpu.iowait.percent
Neutron (Network Management)          network.create
                                      network.update
                                      subnet.create
                                      subnet.update
Glance (Image Management)             image.update
                                      image.upload
                                      image.delete
Cinder (Volume Management)            volume.size
Swift (Object Storage Management)     storage.objects
                                      storage.objects.size
                                      storage.objects.containers
                                      storage.objects.incoming.bytes
                                      storage.objects.outgoing.bytes
Heat (Orchestration)                  stack.create
                                      stack.update
                                      stack.delete
                                      stack.suspend
                                      stack.resume

Figure 55: OpenStack Ceilometer Core Metrics
In addition, Ceilometer provides a REST API that allows cloud administrators to record
KPIs. For instance, infrastructure metrics could be placed in Ceilometer with a HTTP
POST request. As Ceilometer includes a data store, as well as some basic statistical
functionality, it is a candidate for an integration point for cloud monitoring data.
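As a hedged sketch of this integration (the endpoint, meter name, and resource id below
are hypothetical; the payload shape follows the Ceilometer v2 API):

    # POST a custom KPI sample to the Ceilometer v2 API
    curl -X POST "http://<ceilometer-host>:8777/v2/meters/sco.provision.time" \
      -H "X-Auth-Token: $TOKEN" \
      -H "Content-Type: application/json" \
      -d '[{"counter_name": "sco.provision.time",
            "counter_type": "gauge",
            "counter_unit": "s",
            "counter_volume": 312,
            "resource_id": "central-server-1"}]'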
A.2 SmartCloud Orchestrator Monitoring

SCO monitoring should be employed to address the solution layer "above" OpenStack.
The primary mechanism for SCO monitoring is enablement of the BPM performance data
warehouse (relevant information is available in the References section)[7]. The performance
data warehouse may be enabled via "autotracking", which will enable both custom KPIs as
well as the default total time KPIs. The core KPIs to understand BPM capability are:
• BPM processes executed per second.
• Average service times per BPM process.
It is important to note that given Ceilometer provides a general plugin and distribution
infrastructure, it may be combined with the SCO monitoring solution. A sample approach
for managing these monitoring points follows.
1. Derive a BPM plugin to retrieve raw times from the BPM performance data
warehouse (PDWDB) database. The preferred method is the provided REST
interface (versus direct database access).
2. Perform calculations based on the raw data. For example, converting a series of
milestones into performance KPIs, or calculating statistical quantities (e.g.
standard deviation, harmonic mean).
3. Push the results to Ceilometer as the meter distribution mechanism.
4. Read the results via the Ceilometer REST API and display in the visualization tool
of your choice.
A.3 Infrastructure Monitoring

Infrastructure monitoring can address the operating system and hypervisor health of the
cloud. Available tools include IBM Tivoli Monitoring (ITM) or the open source offering
Nagios. For example, ITM v6.2 provides the following infrastructure monitoring agents (for
reference, see URL).
1. IBM Tivoli Monitoring Endpoint.
2. Linux OS.
3. UNIX Logs.
4. UNIX OS.
5. Windows OS.
6. i5/OS®.
7. IBM Tivoli Universal Agent.
8. Warehouse Proxy.
9. Summarization and Pruning.
10. IBM Tivoli Performance Analyzer.

[7] It is worth noting that BPM is built on IBM WebSphere and as a result, WebSphere
monitoring capabilities also apply.
Critical KPIs to monitor at the infrastructure level are summarized in the following table
(VMware is provided as a representative hypervisor sample).

Component: Operating System
• CPU utilization including kernel, user, IO wait, and idle times.
• Disk utilization including read/write request and byte rates.
• Network utilization including incoming and outgoing packet and byte rates.
• Volume free space across the central and region servers. Special attention should be
  paid to the Virtual Image Library on Central Server 2 to ensure the "/home/library"
  space is well managed.

Component: DBMS: ITM for DB2 (URL)
• Application IO activity workspace.
• Application lock activity workspace.
• Application overview workspace.
• Buffer Pool workspace.
• Connection workspace.
• Database workspace.
• Database Lock Activity workspace.
• Historical Summarized Capacity Weekly workspace.
• Historical Summarized Performance Weekly workspace.
• Locking Conflict workspace.
• Tablespace workspace.

Component: Application Server: ITCAM Agent for WebSphere Applications (URL)
• WebSphere Agent Summary workspace.
• Application Server Summary workspace.

Component: J2EE: ITCAM Agent for J2EE (URL)
• Application Health Summary workspace.

Component: HTTP: ITCAM Agent for HTTP Servers (URL)
• Web Server Agent workspace.

Component: Hypervisor: ITM for Virtual Environments (URL)
• Server workspace.
• CPU workspace.
• Disk workspace.
• Memory workspace.
• Network workspace.
• Resource Pools workspace.
• Virtual Machines workspace.

Component: Hypervisor: VMware esxtop sample
• CPU: Run (%RUN), Wait (%WAIT), Ready (%RDY), Co-Stop (%CSTP).
• Network: Dropped packets (%DRPTX, %DRPRX).
• IO: Latency (DAVG, KAVG), Queue length (QUED).
• Memory: Memory reclaim (MCTLSZ), Swap (SWCUR, SWR/s, SWW/s).

Figure 56: Infrastructure Core Metrics
APPENDIX B: OPENSTACK KEYSTONE MONITORING

The Keystone component is critical to overall performance of SmartCloud Orchestrator.
For example, if one component saturates Keystone, the overall throughput of the system
will be impacted. This is magnified by the fact that Keystone has only a single execution
thread instance. In order to understand Keystone performance, the best method is to look
at the requests and responses via a proxy such as the IaaS Gateway. This provides the
ability to see requests that are dropped before being processed by Keystone.
We will describe an approach for monitoring Keystone via the PvRequestFilter.
B.1 PvRequestFilter

The PvRequestFilter was designed to output request and response data into the Keystone
log. When enabled it prints the data as warning messages, so it is not necessary to turn
up the default debug level to generate the log messages.
The format of the messages is as follows. All fields except “<duration>” are printed out
for both requests and responses. The duration of the request is printed only for the
response.
WARNING [REQUEST|RESPONSE] <millisecond timestamp to identify request>
<REMOTE_ADDR>:<REMOTE_PORT> <REQUEST_METHOD> <RAW_PATH_INFO> [<duration>]
Figure 57: Keystone Monitoring PvRequestFilter Format
Sample output follows.
2014-07-21 17:16:56.509 22811 WARNING keystone.contrib.pvt_filter.request [-]
REQUEST 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users
2014-07-21 17:16:56.785 22811 WARNING keystone.contrib.pvt_filter.request [-]
RESPONSE 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users 0.276294
2014-07-21 17:16:56.807 22811 WARNING keystone.contrib.pvt_filter.request [-]
REQUEST 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains
2014-07-21 17:16:56.824 22811 WARNING keystone.contrib.pvt_filter.request [-]
RESPONSE 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains 0.017691
2014-07-21 17:16:56.839 22811 WARNING keystone.contrib.pvt_filter.request [-]
REQUEST 2014-07-21_17:16:56.839 172.18.152.103:1279 GET
/v3/users/e92b94d7068843ef98d664521bd9c983/projects
2014-07-21 17:16:56.868 22811 WARNING keystone.contrib.pvt_filter.request [-]
RESPONSE 2014-07-21_17:16:56.839 172.18.152.103:1279 GET
/v3/users/e92b94d7068843ef98d664521bd9c983/projects 0.028558
Figure 58: Keystone Monitoring PvRequestFilter Sample Output
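The statistics script provided with this paper (keystoneStats.sh) processes these records.
As a rough illustration of the idea, per path average response times can also be derived
directly from the log; the field positions below assume the format shown above.

    # average duration and request count per path, from RESPONSE records
    grep ' RESPONSE ' /var/log/keystone/keystone.log | \
      awk '{sum[$(NF-1)] += $NF; n[$(NF-1)]++}
           END {for (p in sum) printf "%-50s %9.4f %6d\n", p, sum[p]/n[p], n[p]}'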
B.2 Enabling PvRequestFilter

The process to enable PvRequestFilter follows.

1. Log on to Central Server 2.
2. Extract the distribution provided with this paper (keystoneStats.zap).
3. Install the filter and backup the existing configuration:
   ./deployKeystoneFilter.sh
4. Make the following changes to the "/etc/keystone/keystone.conf" file.
   Note: Reversing the changes in this step will disable the filter.
   a. Add the following lines just above the line starting with "[filter:debug]".
      [filter:pvt]
      paste.filter_factory = keystone.contrib.pvt_filter.request:PvtRequestFilter.factory
   b. Add "pvt" to three of the pipeline statements:
      [pipeline:public_api]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth
                 xml_body json_body simpletoken ec2_extension user_crud_extension
                 pvt public_service
      [pipeline:admin_api]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth
                 xml_body json_body simpletoken ec2_extension s3_extension
                 crud_extension pvt admin_service
      [pipeline:api_v3]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth
                 xml_body json_body simpletoken ec2_extension s3_extension
                 pvt service_v3
   c. Restart the keystone service.
      service openstack-keystone restart
   d. Validate that "/var/log/keystone/keystone.log" is producing the appropriate
      log messages (sample below).
   e. Update the "hosts.table" file to reflect your environment.
   f. Run the workload or scenario for analysis.
   g. Generate the statistics for the request and response data in the
      "keystone.log" file (sample below):
      ./keystoneStats.sh /var/log/keystone/keystone.log > results
Figure 59: Keystone Monitoring Log Messages Example
Figure 60: Keystone Monitoring Statistics Example
APPENDIX C: IAAS GATEWAY CLUSTER ENABLEMENT

The following steps are required to enable the IaaS Gateway cluster.
1. Prepare the HTTP server as a load balancer.
   a. Ensure the HTTP server is installed.
      i. Check if there is already an HTTP server on Central Server 2:
         service httpd status
      ii. If there is already an HTTP server, stop it with the following command:
         service httpd stop
      iii. If there is no HTTP server installed, use the following command to
         install one:
         yum install httpd
   b. Update the "httpd.conf" with the load balancer configuration.
      i. Modify the file "/etc/httpd/conf/httpd.conf" with the following changes.
         1. Update the listen port to the gateway port:
            # Listen 80
            Listen 9973
         2. Append the load balancer configuration to the end of the file:
            <VirtualHost *:9973>
              ProxyRequests off
              <Proxy balancer://mycluster>
                # three node gateway cluster
                BalancerMember http://127.0.0.1:12001
                BalancerMember http://127.0.0.1:12002
                BalancerMember http://127.0.0.1:12003
                Order Deny,Allow
                Deny from none
                Allow from all
                ProxySet lbmethod=byrequests
              </Proxy>
              # path of requests to balance "/" -> everything
              ProxyPass / balancer://mycluster/
            </VirtualHost>
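Before proceeding, the syntax of the modified configuration can be checked (assuming
the stock Apache httpd tooling is installed):

    # verify the modified httpd configuration parses cleanly
    apachectl configtest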
2. Prepare the configuration file for cluster members, by performing the following
   commands.

   cd /etc/iaasgateway/
   cp iaasgateway.conf iaasgateway00.conf
   vi iaasgateway00.conf

   # It should look like below before applying this fix:
   [service]
   iaasgateway_listen = <central-server-2-ip>
   iaasgateway_listen_port = 9973

   # Update it to:
   iaasgateway_listen = 127.0.0.1
   iaasgateway_listen_port = 1200X
   iaasgateway_user_entry = <central-server-2-ip>
   iaasgateway_user_entry_port = 9973

   # copy the configuration files and update the port
   cp iaasgateway00.conf iaasgateway01.conf
   sed -i 's/1200X/12001/' iaasgateway01.conf
   cp iaasgateway00.conf iaasgateway02.conf
   sed -i 's/1200X/12002/' iaasgateway02.conf
   cp iaasgateway00.conf iaasgateway03.conf
   sed -i 's/1200X/12003/' iaasgateway03.conf
3. Prepare the init scripts and update the configuration file.

   cd /etc/init.d/
   cp openstack-iaasgateway openstack-iaasgateway01
   cp openstack-iaasgateway openstack-iaasgateway02
   cp openstack-iaasgateway openstack-iaasgateway03
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway01/' openstack-iaasgateway01
   sed -i 's/iaasgateway.conf/iaasgateway01.conf/' openstack-iaasgateway01
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway02/' openstack-iaasgateway02
   sed -i 's/iaasgateway.conf/iaasgateway02.conf/' openstack-iaasgateway02
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway03/' openstack-iaasgateway03
   sed -i 's/iaasgateway.conf/iaasgateway03.conf/' openstack-iaasgateway03
4. Start up the cluster, through the following commands.

   service openstack-iaasgateway stop
   Stopping openstack-iaasgateway:                        [  OK  ]
   service openstack-iaasgateway01 start
   Starting openstack-iaasgateway01:                      [  OK  ]
   service openstack-iaasgateway02 start
   Starting openstack-iaasgateway02:                      [  OK  ]
   service openstack-iaasgateway03 start
   Starting openstack-iaasgateway03:                      [  OK  ]
   service httpd start
   Starting httpd:                                        [  OK  ]
5. Ensure the cluster startup will persist across reboots.
# Turn the non-clustered gateway off.
chkconfig --level 2345 openstack-iaasgateway off
# Turn the clustered gateway on.
chkconfig --level 2345 openstack-iaasgateway01 on
chkconfig --level 2345 openstack-iaasgateway02 on
chkconfig --level 2345 openstack-iaasgateway03 on
chkconfig --level 2345 httpd on
6. Check the IaaS Gateway service status.
   a. Try to open the following link in a browser. The content should operate the
      same as prior to applying the cluster.
      http://<central-server-2-ip>:9973/providers
   b. Check for listening ports with the following command:

      netstat -nap | grep 1200 | grep LISTEN
      tcp   0   0 127.0.0.1:12001   0.0.0.0:*   LISTEN   7269/python
      tcp   0   0 127.0.0.1:12002   0.0.0.0:*   LISTEN   7286/python
      tcp   0   0 127.0.0.1:12003   0.0.0.0:*   LISTEN   7303/python

   c. Check whether the load balancer is listening:

      netstat -nap | grep 9973 | grep LISTEN
      tcp   0   0 :::9973   :::*   LISTEN   7321/httpd

   d. Verify you may log in to the SCO UI.

7. The IaaS Gateway cluster is now enabled.
REFERENCES
SmartCloud Orchestrator and Related Component References
IBM SmartCloud Orchestration Information Center
SCO 2.3 Information Center
IBM SmartCloud Orchestrator Resource Center
SCO Resource Center
IBM Business Process Manager V8.0 Performance Tuning and Best Practices
http://www.redbooks.ibm.com/redpapers/pdfs/redp4935.pdf
IBM Business Process Manager Performance Data Warehouse
http://pic.dhe.ibm.com/infocenter/dmndhelp/v8r5m0/topic/com.ibm.wbpm.admin.doc/topics/
managing_performance_servers.html
IBM Tivoli Monitoring Information Center
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.3fp1/welc
ome.htm
IBM DB2 10.1 Information Center
http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp?topic=/com
OpenStack References
OpenStack Performance Presentation (Folsom, Havana, Grizzly)
http://www.openstack.org/assets/presentation-media/openstackperformance-v4.pdf
OpenStack Ceilometer
http://docs.openstack.org/developer/ceilometer
OpenStack Rally
https://wiki.openstack.org/wiki/Rally
Hypervisor References
Performance Best Practices for VMware vSphere™ 5.0
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
Performance Best Practices for VMware vSphere™ 5.1
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf
VMware: Troubleshooting virtual machine performance issues
VMware Knowledge Base
VMware: Performance Blog
http://blogs.vmware.com/vsphere/performance
Linux on System x: Tuning KVM for Performance
KVM Performance Tuning
Kernel Virtual Machine (KVM): Tuning KVM for performance
http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaattuning_pdf.pdf
PowerVM Virtualization Performance Advisor
Developer Works PowerVM Performance
IBM PowerVM Best Practices
http://www.redbooks.ibm.com/redbooks/pdfs/sg248062.pdf
Benchmark References
Report on Cloud Computing to the OSG Steering Committee, SPEC Open Systems Group,
https://www.spec.org/osgcloud/docs/osgcloudwgreport20120410.pdf
© Copyright IBM Corporation 2014
IBM United States of America
Produced in the United States of America
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM
representative for information on the products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used.
Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be
used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program,
or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of
this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are
inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER “AS IS” WITHOUT WARRANTY OF
ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow
disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes may be made periodically to the
information herein; these changes may be incorporated in subsequent versions of the paper. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this paper at any time without notice.
Any references in this document to non-IBM Web sites are provided for convenience only and do not in any manner serve
as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product
and use of those Web sites is at your own risk.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of
this document does not give you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
4205 South Miami Boulevard
Research Triangle Park, NC 27709 U.S.A.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent
goals and objectives only.
This information is for planning purposes only. The information herein is subject to change before the products described
become available.
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in
the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in
this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks may also be registered or common law
trademarks in other countries. A current list of IBM trademarks is available on the web at "Copyright and trademark
information" at http://www.ibm.com/legal/copytrade.shtml.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Other company, product, or service names may be trademarks or service marks of others.