The Mathematics of Performance Management
and Capacity Planning - Overview
Descriptive and Predictive Analytics
in the Age of Virtual Systems
Tim Browning
Presented at the Greater Atlanta Computer
Measurement Group Fall Conference,
October 22, 2008
Mathematics of Performance & Capacity
On Mathematics & Statistics
There are two kinds of statistics, the kind you look up and the kind
you make up. ~Rex Stout, Death of a Doxy
How many times can you subtract 7 from 83, and what is left
afterwards?
You can subtract it as many times as you want, and it leaves 76
every time. ~Author Unknown
In ancient times, they had no statistics, so they had to fall back on
lies. ~Stephen B. Leacock
Tim Browning
October, 2008
Slide 2
Goals of Performance Engineering and
Capacity Management
• Goals of Performance Engineering
Monitor/Manage/Predict System Performance
Reflect and Understand Customer Experience
Foundation of evidence-based Capacity Management
• Goals of Capacity Management
Assure Computing Supply is available to Meet Business
Demand
Determine Best use of existing resources (optimization)
Probability, Probity and Authority
• Before the seventeenth century, legal evidence in Europe was
considered of greater weight if a person testifying had “probity”.
“Empirical evidence” was barely a concept. Probity was a
measure of authority, so evidence came from authority. A noble
person had probity. Yet today, probability is the very measure
of the weight of empirical evidence in science, arrived at from
inductive or statistical inference.
• The term 'probable' (Latin probabilis) meant approvable, and
was applied in that sense, to opinion and to action. A probable
action or opinion was one such as sensible people would
undertake or hold, in the circumstances.
• Even so, the jury of executive opinion, in the business-government
Enterprise, is most often swayed by the consensus of expert opinion,
usually at considerable cost.
• Probability and Statistics are not the same. They are related, but
circuitously related:
– Probability can be viewed either as the long-run frequency of occurrence
or as a measure of the plausibility of an event given incomplete
knowledge - but not both.
– Statistics are functions of the observations (data) that often have useful
and even surprising properties.
• So we see the relationship(s) between probability and statistics:
– From the observations we compute statistics that we use to estimate
population parameters, which index the probability density, from which we
can compute the probability of a future observation from that density.
– In general, probability asks what is likely to happen and statistics
describes what has already happened (and forms the basis for what is
likely)
– In statistics, you don’t know how a process works but are able to observe
the outcomes; in probability you already know how a process works but
want to know how to predict what will happen. The combination is the
foundation of statistical inference.
• Descriptive Statistics are used to describe the basic features
of the data gathered from an experimental study in various
ways. They provide simple summaries about the sample and
the measures. Together with simple graphics analysis, they
form the basis of virtually every quantitative analysis of data.
• Two objectives for formulating a summary statistic:
– To choose a statistic that shows how different units seem similar.
Statistical textbooks call one solution to this objective, a measure
of central tendency.
– To choose another statistic that shows how they differ. This kind
of statistic is often called a measure of statistical variability.
“Central Tendency”
Central – middle value, center
Tendency – Expected value, most frequent, representative
Arithmetic Mean
The arithmetic mean is the most common measure of central tendency.
It is simply the sum of the numbers divided by the number of numbers.
The symbol μ is used for the mean of a population. The symbol M is
used for the mean of a sample. The formula for μ is shown below:
$$\mu = \frac{\sum X}{N}$$
where ΣX is the sum of all the numbers and N is the number of numbers.
As an example, the mean of the numbers 1, 2, 3, 6, 8 is
$$\frac{1 + 2 + 3 + 6 + 8}{5} = \frac{20}{5} = 4$$
regardless of whether the numbers constitute the entire population or just a sample from the
population.
• Other, less common measures of central tendency:
– Median is the middle value: the point where half the values lie on each
side of the number, i.e. half are larger and half are smaller. It is the ‘middle’ of
the distribution of values, the number separating the higher half of a sample,
a population, or a probability distribution from the lower half.
If you divide a distribution into fourths (quartiles), then the median is the 2nd
quartile.
• Useful in performance management in the presence of outliers, where we are
more concerned about frequency of occurrence relative to a ‘central’ value than
a theoretical ‘average’ that may not even occur in the data. For example,
response time.
– Percentiles group data by putting equal numbers of data into each group.
The nth percentile is the point below which n% of the data are found.
• Useful in performance as it provides a very good view of the user’s experience.
• Useful in capacity planning for ‘sizing’ a system based on accommodation of its
historical high points. For example, the 90th percentile of CPU busy.
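The difference between the mean, the median, and a percentile is easy to see on a small sample. The sketch below is illustrative only: the response times are made-up values, and `percentile` is a hypothetical helper that uses the nearest-rank definition, which is one of several percentile conventions in common use.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]


# Hypothetical response times in seconds; 9.5 is a deliberate outlier.
response_times = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 9.5]

mean_rt = statistics.mean(response_times)      # pulled upward by the outlier
median_rt = statistics.median(response_times)  # middle of the sorted values
p90_rt = percentile(response_times, 90)        # 90% of the samples are at or below this

print(f"mean={mean_rt:.2f}s median={median_rt:.2f}s p90={p90_rt:.2f}s")
```

In this sample the single outlier moves the mean well above every typical value, while the median and the 90th percentile still describe what most users saw.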
• When to use the arithmetic mean:
– When your data contains no outliers (extreme values that are not
typical or normative).
– When the variability is low between values, for example in
utilization metrics where the variability is less than 20%.
• What can you do about outliers (dirty data)?
– Eliminate them (i.e. they are few and unlikely to reoccur).
– Use a weighted mean that discounts the outliers. The weighted
mean is similar to an arithmetic mean (the most common type of
average), where instead of each of the data points contributing
equally to the final average, some data points contribute more
than others.
– Use the Geometric Mean which has remarkable insensitivity to
outliers.
The Dirty Data Experiment with the Geometric Mean
The Dirty Data Experiment with the Weighted Mean
Discounted weight for the outlier: =(1/19)-(1/19)*0.2
Weight for each of the other 18 points: =(1/19)+((1/19)*0.2)/18
A convex combination is a linear combination of points (which can be vectors, scalars, etc.) where
all coefficients are non-negative and sum up to 1.
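The two spreadsheet weights can be checked with a short sketch. It assumes, as the formulas suggest, 19 data points of which one is an outlier discounted by 20%, with the removed weight spread evenly over the other 18. The data values and the helper names `discounted_weights` and `weighted_mean` are hypothetical.

```python
def discounted_weights(n, outlier_index, discount):
    """Equal weights of 1/n, with the outlier's weight cut by `discount`
    and the removed weight shared equally among the other n - 1 points."""
    base = 1 / n
    weights = [base + (base * discount) / (n - 1)] * n
    weights[outlier_index] = base - base * discount
    return weights


def weighted_mean(values, weights):
    """Convex combination of the values; the weights are expected to sum to 1."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical data: 18 typical measurements and one outlier in the last position.
values = [10.0] * 18 + [200.0]
weights = discounted_weights(len(values), outlier_index=18, discount=0.2)

plain_mean = sum(values) / len(values)
discounted_mean = weighted_mean(values, weights)

print(f"sum of weights={sum(weights):.6f}")
print(f"arithmetic mean={plain_mean:.2f} weighted mean={discounted_mean:.2f}")
```

Because the weights stay non-negative and still sum to 1, the result is a convex combination of the data, so it always lies between the smallest and largest value.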
“There are liars, outliers, and out-and-out liars.”
• What are ‘outliers’?
– Extreme values not typical of the group
– “Rare events” that do not fit within the range of other data values.
– Non-normative data, anomalous, exceptional, etc.
• How are they detected?
– Visually using statistical graphics
– Statistical Filtering
– Interquartile fencing: flag values that fall well below the lower quartile or
well above the upper quartile, commonly by more than 1.5 times the
interquartile range
– More advanced methods: Grubbs’ Test, etc.
There is no such thing as a simple test!
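Interquartile fencing can be sketched in a few lines. The multiplier of 1.5 is the conventional choice and is an assumption here, since the slide does not give one. The data and the function names `iqr_fences` and `find_outliers` are hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (low, high) fences placed k interquartile ranges outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Values that fall outside the interquartile fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical response times in seconds; 9.5 is a deliberate outlier.
response_times = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 9.5]

low, high = iqr_fences(response_times)
outliers = find_outliers(response_times)
print(f"fences=({low:.3f}, {high:.3f}) outliers={outliers}")
```

A fence only flags candidates. Whether a flagged value is dirty data or a real event worth keeping is still a judgment call.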
The Geometric Mean
$$GM = \sqrt[N]{x_1 \cdot x_2 \cdots x_N}$$
• Instead of adding the set of numbers and then dividing the sum
by the count of numbers in the set, N, the numbers are multiplied and
then the Nth root of the resulting product is taken.
$$GM = \left( \prod_{i=1}^{N} x_i \right)^{1/N} = \exp\left( \frac{1}{N} \sum_{i=1}^{N} \ln x_i \right) = \exp(\mathrm{mean}(\ln(X)))$$
• For instance, the geometric mean of two numbers, say 2 and 8, is just
the square root (i.e., the second root) of their product, 16, which is
4. As another example, the geometric mean of 1, ½, and ¼ is
the cube root (i.e., the third root) of their product (0.125), which is ½.
In SQL-eese:
SELECT
EXP(AVG(LN(Response_Time)))
as GEOMEAN
FROM
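The log form of the geometric mean, exp(mean(ln(x))), translates directly into code. This is a minimal sketch: `geometric_mean` is a hypothetical helper name and the dirty data set is made up. The log form is used because multiplying many values directly can overflow.

```python
import math


def geometric_mean(values):
    """Geometric mean via exp(mean(ln(x))); defined only for positive values."""
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The two worked examples from the slide.
print(geometric_mean([2, 8]))          # approximately 4
print(geometric_mean([1, 0.5, 0.25]))  # approximately 0.5

# Hypothetical dirty data: 18 typical measurements and one outlier.
values = [10.0] * 18 + [200.0]
arithmetic = sum(values) / len(values)
geometric = geometric_mean(values)
print(f"arithmetic mean={arithmetic:.2f} geometric mean={geometric:.2f}")
```

On the dirty data the geometric mean stays much closer to the typical value of 10 than the arithmetic mean does.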
The ‘geometry’ part of the Geometric Mean:
Consider a ‘line’ where the beginning is at point ‘A’
and the end is at point ‘C’. Where is the ‘middle’
(point ‘B’)?
The geometric ‘middle’ B is chosen so that A is to B as B is to C:
$$\frac{A}{B} = \frac{B}{C}$$
$$B \cdot B = A \cdot C$$
$$B^2 = A \cdot C$$
$$B = \sqrt{A \cdot C}$$
Measures of variability
• Variance – the amount of ‘spread’ in the data around the mean.
$$S^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$
• Standard Deviation – the square root of the variance.
In a normal distribution, approximately 2/3 of the data are within
one standard deviation of the mean on either side.
In performance work, large response time standard deviations are usually bad;
you want response time to be low and repeatable. Wide variations upset
people more than long but consistent times.
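As a worked instance of the variance formula, the sketch below computes the sample variance (the n − 1 form) by hand and compares it with Python's `statistics` module. The data are the five numbers used earlier for the arithmetic mean. `sample_variance` is a hypothetical helper name.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [1, 2, 3, 6, 8]  # mean is 4

variance = sample_variance(data)  # (9 + 4 + 1 + 4 + 16) / 4
std_dev = math.sqrt(variance)

print(f"variance={variance} standard deviation={std_dev:.3f}")
print(f"library check: {statistics.variance(data)} {statistics.stdev(data):.3f}")
```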
The Geometric Standard Deviation
• The antilog of the standard deviation of the natural log
transformed values of x, or
$$G_{sd} = \exp(\mathrm{stdev}(\ln(x)))$$
$$G_{sd} = \exp\left( \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (\ln x_i)^2 - \left( \frac{1}{N} \sum_{i=1}^{N} \ln x_i \right)^2 } \right)$$
$$G_{sd} = \exp\left( \mathrm{sqrt}\left( \mathrm{mean}(\ln(x)^2) - \mathrm{mean}(\ln(x))^2 \right) \right)$$
In SQL-eese:
SELECT
EXP(STDDEV(LN(Response_Time)))
as GEOSTDEV
FROM
the_data
WHERE
Response_Time>0
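The geometric standard deviation can be sketched the same way as the SQL. One assumption to note: the mean-of-squares form divides by N (a population standard deviation), while many SQL dialects implement `STDDEV` as the sample form (N − 1), so the sketch exposes both. The function names and data are hypothetical.

```python
import math
import statistics


def geometric_stdev(values, sample=True):
    """exp of the standard deviation of ln(x); values must be positive."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    if any(v <= 0 for v in values):
        raise ValueError("geometric standard deviation requires positive values")
    logs = [math.log(v) for v in values]
    spread = statistics.stdev(logs) if sample else statistics.pstdev(logs)
    return math.exp(spread)


def geometric_stdev_from_moments(values):
    """Population form: exp(sqrt(mean(ln(x)^2) - mean(ln(x))^2))."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    mean_sq = sum(v * v for v in logs) / len(logs)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.exp(math.sqrt(max(mean_sq - mean_log ** 2, 0.0)))


# Hypothetical response times in seconds.
response_times = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 9.5]

print(f"sample form:     {geometric_stdev(response_times):.4f}")
print(f"population form: {geometric_stdev(response_times, sample=False):.4f}")
print(f"moments form:    {geometric_stdev_from_moments(response_times):.4f}")
```

The result is a multiplicative factor, not a quantity in seconds: a typical range is the geometric mean divided by and multiplied by this factor.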
Correlation and Regression
• Correlation – How things vary together (or not); the
strength and direction of a linear relationship
between two random variables or the departure of
two variables from independence.
• There are several correlation coefficients; Pearson’s is the most
common in performance analysis (but mis-named)
• Probably the most misused statistical tool.
• Obtained by dividing the covariance of two variables
by the product of their standard deviations.
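The definition in the last bullet, covariance divided by the product of the standard deviations, can be written out directly. This is a minimal sketch. `pearson_r` is a hypothetical helper name and the workload figures are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance / (sd_x * sd_y)


# Hypothetical hourly samples: transactions per second and CPU busy percent.
transactions = [120, 150, 180, 210, 260, 300]
cpu_busy = [31, 38, 44, 52, 61, 73]

print(f"r = {pearson_r(transactions, cpu_busy):.4f}")
```

A value near +1 or −1 indicates a strong linear relationship only. It says nothing about causation, and a strong nonlinear relationship can still produce a value near 0.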
• Linear Regression and its cousins (non-linear, multiple, logistic, etc.)
are all methods for fitting curves or lines to data in
a statistically optimal manner. “The best way of drawing a line
since the invention of the straight edge” – Pat Artis.
• Often used by managers to observe ‘trends’ and predict the
future (or explain the past). Often misused for the same
purpose.
• In statistics, linear regression is a form of regression analysis
in which the relationship between one or more independent
variables and another variable, called the dependent variable, is
modeled by a least squares function, called the linear regression
equation. This function is a linear combination of one or more
model parameters, called regression coefficients. A linear
regression equation with one independent variable represents
a straight line. The results are subject to statistical analysis.
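For the one-variable case, the least squares line and its R² can be computed in closed form. The sketch below is illustrative: `fit_line` is a hypothetical helper name, and the data points are made up rather than taken from any chart in this presentation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x, with R squared."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Hypothetical measurements: workload (x) against resource usage (y).
xs = [20, 25, 30, 35, 40, 45, 50]
ys = [230, 265, 330, 370, 420, 490, 520]

slope, intercept, r_squared = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.4f}")

# Extrapolating beyond the observed range is where trending is most often misused.
print(f"predicted y at x=60: {intercept + slope * 60:.1f}")
```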
Linear regression in Excel:
Using Graphical techniques
[Chart: “Linear Regression”. Scatter plot of Data Points with the fitted trendline Linear (Data Points), y = 10.142x + 20.458, R² = 0.9638. X axis: X, 20 to 50. Y axis: Y, 0 to 600.]
Examples of Capacity/Performance Reporting in use now
Traditional time series line charts…
Advanced Statistical Graphics
3-D Performance Surface
Multi-temporal density plot
Expected high/low/actual
SAP – CCMS Metrics via SAS/Graph
Application Response Time Modeling
[Figure: application response time (vertical axis) against increasing application workload (horizontal axis). Annotated regions: “Large Changes, Small Impact”, “Small Changes, Large Impact”, and “System Unresponsive”.]
How does Modeling differ from Trending
in prediction?
[Figure: “Application Modeling vs. Linear Regression via Trending”. Application response time against application workload by month (Jan to Dec), showing the system load measurements, the SLA threshold, the date predicted via trending, and the date predicted via modeling.]
Modern Dynamic Systems
are Challenging to
Understand
• Response to Capacity/Performance Crisis:
1. System/Application tuning, re-engineering, and optimization:
– Benefit: Considerable gains are possible, sometimes improvements in the hundreds of percent.
Achieved via system administrative action (usually parametric changes for the OS) and by
algorithmic and parametric re-specification (for the application). No capital expense. Efficient use of
resources.
– Detriments: The effects may not endure for dynamic systems, as version/release changes and
application functionality changes can, and do, degrade performance tuning effects quickly. Often
system reinitialization (reboot, IPL) is required, which creates an availability/service delivery issue.
Application re-engineering for performance may be, and often is, cost prohibitive and/or
unsupported by executive management.
2. Capacity increase via upgrade/replacement or technology refresh:
– Benefit: Reduces risk of unsupported/unrecoverable infrastructure conditions. The effect is usually
long term. Accommodates increased application functionality for business utility.
– Detriments: Capital expense may be incurred. Inefficiencies remain. Risk management to avoid
undersizing or oversizing requires expensive predictive modeling tools. Predictive analytics requires
advanced skills in tech staffing. New technologies carry risks and may increase
complexity (e.g. virtualization). Costs may be unsupported by executive management.
Modeling? Why?
Reactive Problem Solving vs Modeling
• damage grows rapidly with time;
• the longer the error goes undiscovered, the more useless and damaging
work based on the error will be done;
• when the error is discovered, it and all the associated damage have to be
removed;
• the system will then need therapy to recover;
• the death rate increases dramatically with late discovery;
• alternatively, the survival rate increases dramatically with early discovery
"Crude measures of the right things are better than precise measures of
the wrong things."
- from Jim Clemmer's article, "Strategic Measurements Guide Change and
Improvement"
Summary of Performance Analysis Techniques
• Reactive problem solving
– Suitable: Almost never
– Unsuitable: Almost always
• Measurement
– Suitable: For a current status report
– Unsuitable: For predicting the performance of complex new systems and technology
in general and modern distributed information systems and computer systems in particular
• Consensus of expert guessing
– Suitable: Quick and dirty decisions where risk is low
– Unsuitable: High risk, complexity, high variance between experts
• Analytic Modeling
– Suitable: For models such as capacity plans where the models will be reusable
– Unsuitable: For modern, dynamic systems with complex, rapidly changing technology
• Simulation Modeling
– Suitable: For projects with a) new and untested architectures, b) new technology, or
c) complex, heterogeneous, highly distributed behavior
– Unsuitable: For rough, quick estimates for large numbers of configurations of simple,
mature systems where analytic modeling is suitable
• Benchmarking
– Suitable: To determine the performance of a particular workload on a particular
configuration
– Unsuitable: When many workloads, designs, and configurations must be analyzed
Predictive Analytics: Benefits
• Predictive analytics provide a practical way to detect problems and
allow early correction as well as avoid resource saturation conditions.
• Simulation provides a practical way to detect such problems and
allow early correction. Avoiding the use of simulation substantially
increases the risk of failure.
• Analytical modeling provides fast and accurate answers based on
existing performance data. It allows for a variety of what-if scenarios
to be easily crafted to determine the best course of action when
systems are experiencing change.
• Statistical Forecasting and Analysis provides descriptive and
predictive aspects of IT performance data topology through the use of
measures of central tendency, variability, correlation, linear
regression, and statistical pattern recognition.
SAP-specific Capacity Planning Methodology for CCE
• We want to acquire capacity to provide required service levels for
sustained busy periods. Typical examples:
– Month end closing
– Busy daily window (e.g., 09:00 to 11:00)
– Mondays
– Complete batch window on time to deliver operational reports or schedule
deliveries/shipment/print picking papers/etc
• The best approach is to choose the percentile you want to satisfy
– The 90th percentile of hourly MIPS across the month is reflective of busy
daily periods
– Likewise the 95th percentile reflects the sustained busy where there is a
pronounced financial systems month end closing effect
– In legacy OLTP we often see peak to average ratios between 1.5:1 and
2:1 based on the definition of peak (e.g., 90th vs 95th)
– This really is a view of sustained busy
– No one can afford to buy for absolute peaks (99th or 100th percentile)
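How the choice of percentile changes the sizing target can be sketched with a small sample. Everything here is an assumption for illustration: the hourly MIPS values are made up, and `percentile` is a hypothetical nearest-rank helper. Real sizing would use a full month of hourly data.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]


# Hypothetical hourly MIPS samples: mostly quiet, some busy hours, two spikes.
hourly_mips = [100] * 10 + [150] * 8 + [200, 300]

average = sum(hourly_mips) / len(hourly_mips)
for p in (90, 95, 100):
    peak = percentile(hourly_mips, p)
    print(f"p{p}: {peak} MIPS, peak to average ratio {peak / average:.2f}:1")
```

In this made-up sample, sizing for the 100th percentile would buy twice the capacity needed for the 90th, which is the cost of buying for absolute peaks.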
Capacity Planning for the Newly Virtual
Three Essential Elements
• measurement to ascertain critical data like IT
resource availability, utilization and usage
patterns
• second-level analysis to focus on the long-term
needs of the enterprise rather than the
immediate concern to bump up resources
• business realignment to ensure that IT is
keeping pace with business needs, not the
other way around
Capacity Planning for the Newly Virtual
• Over half (54%) of the virtual-server adopters have experienced a net
growth in capacity, while only 7% reported a net decrease (ESG
Research)
• Focus on understanding our “virtualization” factors
o Effect of non-concurrent peaks of multiple workloads
o Follow the sun in a global operation
o Better understanding of these effects can be gained by looking at the
90th/95th percentiles
o Landscape dimensions:
– a workload level,
– a platform (processor complex) level,
– a Sysplex / Cluster level,
– Server/LPAR level, etc.
• The ‘virtualization’ analysis will tell us how much we can over-commit
resources
– The 95th percentile of the sums vs the sum of the 95th percentiles
– It is often the case that we have the ability to load to 115% with the sum of the
95th percentiles
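The comparison between the 95th percentile of the sums and the sum of the 95th percentiles can be sketched for two workloads whose peaks do not coincide. The data are made up and deliberately extreme, so the ratio printed here is far larger than the 115% figure quoted above. `percentile` is a hypothetical nearest-rank helper.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]


# Hypothetical hourly demand for two workloads with non-concurrent peaks.
workload_a = [90, 90] + [10] * 18  # peaks in the first two hours
workload_b = [10] * 18 + [90, 90]  # peaks in the last two hours

# Sizing each workload separately, then adding the results.
sum_of_p95 = percentile(workload_a, 95) + percentile(workload_b, 95)

# Sizing the combined demand hour by hour.
combined = [a + b for a, b in zip(workload_a, workload_b)]
p95_of_sum = percentile(combined, 95)

print(f"sum of the 95th percentiles: {sum_of_p95}")
print(f"95th percentile of the sums: {p95_of_sum}")
print(f"over-commit factor: {sum_of_p95 / p95_of_sum:.2f}")
```

The gap between the two numbers is the headroom that sharing a platform recovers. It shrinks toward nothing as the workloads' peaks start to coincide.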
Organizational Support
Institutionalize the process
• The resource reporting and modeling is actually the easy part
of this
• The more difficult and more important part of institutionalizing
the process is connecting the application blueprinting/design
process to the capacity planning process:
– This creates the understanding of the business drivers, which is
key to scaling factors and calibration
– This is also a potential trigger for alerting the organization to the
need for a risk mitigation plan. For example, step function workload
increases with new workloads, which should lead to a performance testing
activity
Organizational Support for Capacity Planning
Market the lesser-known benefits of capacity planning
• Strengthened relationships with developers and end users.
Communication, negotiation, and a sense of joint ownership can all
combine to nurture a healthy, professional relationship between IT and its
customers
• Improved communications with suppliers. Involving key suppliers and
support staffs with your capacity plans can promote effective
communications among these groups
• Increased collaboration with other infrastructure groups. Network
services, technical support, database administration, operations, desktop
support, and even facilities may all play a role in capacity planning. In
order for the plan to be thorough and effective, all these various groups
must support and collaborate with each other.
• Promotion of a culture of strategic planning as opposed to tactical
firefighting. One of the most significant benefits of developing an overall
and ongoing capacity-planning program is the institutionalizing of a
strategic-planning culture
Author/Contact:
C. Tim Browning
Coca-Cola Enterprises
Technical Architecture
Enterprise Performance & Capacity Planning
Tel: (770) 370-8566 (OFFICE)
(404) 210-7051 (CELL)
MAIL: [email protected]
Go Green – Stop Global Whining