[White Paper]
THE MATURITY MODEL FOR INFRASTRUCTURE MONITORING.
Abstract:
The Maturity Model for Infrastructure Monitoring traces a path
through five levels, each of which adds functionalities and
capabilities that improve, streamline and automate monitoring while
reducing risk and lowering cost. The details of each level describe the impact on IT
staff and end users as an organization matures from ad-hoc
monitoring and resource availability to the ultimate goal of
optimized service delivery.
Infrastructure Monitoring – A New Model for a More Complex Environment
Monitoring today’s IT infrastructures has become so difficult
that most organizations only detect poor performance when
something goes wrong. The reason for this challenge lies in the
complexity of modern applications and networks, which are often
the result of expansions that occur over time to cope with growth
and advances in technology. These infrastructures contain both
physical and virtual components from multiple vendors, usually in
numerous locations, including both private and public clouds, and
operating on a variety of systems and platforms.
To make sense of infrastructure performance monitoring in this complex
environment, it helps to break down individual tasks into distinct goals,
functionalities and capabilities. The Maturity Model for Infrastructure
Monitoring describes the stages of controlled monitoring required to
track, report, react to and resolve infrastructure performance elements
comprehensively, regardless of the complexity of the network.
The Maturity Model helps:
. Reduce risk by closing visibility gaps
. Assist IT staff so they can become more efficient and eliminate
human error
. Decrease CAPEX and OPEX costs
. Lower the impact of performance issues on customers
. Reduce customer churn
. Provide a better handle on controlling infrastructure performance
“SevOne’s maturity model provides Enterprise and Service Provider
organizations with an effective, product-agnostic blueprint for achieving
optimized service delivery via performance monitoring. The functional
capabilities described, combined with a focus on process improvement,
provide a path to lower risk and improved end-user experience.”
– Shamus McGillicuddy
Senior Analyst, Network Management
Enterprise Management Associates
This white paper provides an overview of the Maturity Model for
Infrastructure Monitoring, and details the five levels and the advantages
of moving through them. It also points out the drawbacks and dangers
of leaving performance monitoring to basic tools. Lastly, it offers ways to
reach a state of optimized service delivery.
THE MATURITY MODEL’S FIVE LEVELS.
Starting with the ad hoc performance monitoring tools in Level One and
the basic availability tools in Level Two, the Maturity Model describes the
critical advances that come with the sophisticated standardization and
consolidation found in Level Three, the advanced visibility that comes
with Level Four, and the final optimized service delivery that results from
the monitoring platform in Level Five.
What’s at stake for an organization if it doesn’t take steps to move to
Level Five? Customer experience, application performance and capacity
planning, among other things. IT staff workloads will inevitably grow
more onerous, and problems from human error will remain on the
rise. Expansion and innovation will suffer. And, as a result of all these
unaddressed issues, IT costs will escalate across the board.
The Maturity Model provides insight into how to gain control over all
aspects of a network while reducing both risk and costs. For example,
achieving Level Five, as shown in Figure 1, will result in significant cost
savings in CAPEX and OPEX due to automation and the consolidation
of tools into a platform that addresses 80% or more of the monitoring
needs, thereby eliminating several redundant maintenance contracts.
Risk also diminishes as visibility gaps are closed and reliable multivariate
analytics are added. Returns and revenue may increase as well if the
savings are used to fund innovation and new initiatives. And finally, there
will be significant savings in IT staff time, which studies have shown leads
to improved employee satisfaction.
[White Paper] Maturity Model | PG 2
Figure 1. The Maturity Model for Infrastructure Performance Monitoring, showing the five levels and their associated
risks and costs. Moving up the levels can result in significant savings and increased revenue for funding innovation and growth.
LEVEL 1: AD HOC MONITORING.
Hardware vendors often supply monitoring tools for their products.
Unfortunately, these tools have limited functionality, and provide
little in the way of effective performance monitoring. Instead, they
frequently result in significant application and service disruptions
because they fail to account for the interaction of the product
with other components on the network. Additionally, their lack of
insight into effects on the overall infrastructure makes capacity
forecasting impossible.
In a typical Level One scenario, less than 20% of the infrastructure
is visible, and the focus of staff is on partial coverage of the critical
application delivery. Views are 5-minute snapshots, masking activity
spikes that occur at sub-minute intervals. Alerts occur only at upper
limit thresholds, and false positives generate a lot of noise. Ad hoc
reports are run only after performance events have happened,
and vendors’ canned reports are limited in scope and restrict
understanding of what’s going on. There is no service awareness and
almost no automation, and zero confidence that monitoring tools can
be scaled to cover a larger infrastructure.
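The masking effect of 5-minute snapshots can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the utilization figures are invented for illustration, and `rollup` is a hypothetical helper rather than part of any monitoring product.

```python
from statistics import mean

def rollup(samples, window):
    """Average consecutive samples in fixed-size, non-overlapping windows."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical link utilization (%), one sample every 10 seconds for 5 minutes:
# 27 quiet samples at 20% and a 30-second burst at 95%.
ten_second_samples = [20.0] * 12 + [95.0] * 3 + [20.0] * 15

five_minute_view = rollup(ten_second_samples, window=30)  # a single 5-minute average
one_minute_view = rollup(ten_second_samples, window=6)    # five 1-minute averages

print(five_minute_view)         # [27.5]
print(one_minute_view)          # [20.0, 20.0, 57.5, 20.0, 20.0]
print(max(ten_second_samples))  # 95.0
```

In this made-up case the 5-minute view reports 27.5% while the link actually peaked at 95%. The 1-minute view at least shows that something happened in the third minute.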
At this level IT staff are operating blind, completing everyday tasks at
a slow pace, and dealing with significant, unplanned downtime and
capacity issues. They find themselves frequently troubleshooting in
the dark. Inefficiencies are extremely costly, and IT operations
are chaotic at best.
Three ways to move to Level Two:
. Ditch the hardware vendor tools in favor of solutions that
function in multi-vendor environments
. Ratchet up polling to one-minute intervals for more granular
views of infrastructure performance
. Define the components of services that need monitoring
LEVEL 2: BASIC AVAILABILITY.
Adding tools to fill in the gaps caused by inadequate hardware vendor
tools, while a well-intended fix, actually results in more drastic problems
and decreased visibility. Demands on IT staff grow because they now
have to monitor more input sources. And costs often remain high and
may increase as even more new tools are purchased. This type of reactive
firefighting with multiple small hoses hooked to an array of disparate
monitoring tools can lead to a lot of smoke, while unresolved problems
continue to smolder.
Infrastructure visibility may now reach as high as 40%, but that still leaves
the majority of IT in the dark. Polling may increase from 5-minute cycles
in Level One to a single minute when required, but performance data
are averaged over time, resulting in poor capacity planning data and a
lack of historical reporting granularity. Too many false positive alerts
and swivel-chair troubleshooting across disparate tools still plague staff
and consume far too much of their time. Dashboards bring together
different components of service-related performance reporting, but offer
no true correlation. Overlapping and incomplete tools require costly and
redundant maintenance contracts, and agent-based monitoring adds to
the administrative burden and limits scalability.
Here in Level Two, IT staff are mainly reactive. They’re constantly putting
out fires rather than detecting sparks, still at the mercy of limited,
purpose-built tools. For end users, service is unreliable, causing a high
rate of customer churn. Staff are overworked, and job satisfaction is low.
Innovation and new initiatives are distant dreams.
Three ways to move to Level Three:
. Baseline all metrics and trigger alerts when there’s a deviation
from normal performance
. Correlate performance metrics with flow data to better
understand consumption of resources
. Crank up interoperability and automation by integrating with help
desk solutions
LEVEL 3: STANDARDIZATION & CONSOLIDATION.
Level Three is where recognizable, measurable, positive change begins.
Costs start to decrease, staff time is freed up, and a single source of
truth about the infrastructure’s performance emerges. A single, scalable,
future-proof monitoring platform provides visibility into about 60% of
the infrastructure, creating a consolidated metric for key performance
indicator (KPI) monitoring. There’s a clear indication of the health of the
service being provided, thanks to end-to-end testing with IP SLA or RPM.
At this stage, there’s end-to-end visualization of network, compute and
storage by business unit or customer, and it’s possible to view both
physical and virtual resources on one screen. Reports can be customized
on the fly, because they derive from a real-time, single source of truth.
A central, scalable monitoring platform addresses 80% or more of
monitoring needs, with point solutions for specific services. Though
you will always have point solutions for specific monitoring needs, the
majority of infrastructure monitoring at this stage is done without the
need for agents or probes, significantly decreasing administrative burden.
Finally, since there is now integration with help desk solutions such as
ServiceNow, Salesforce and Zendesk, seamless transfer of information
between platforms can occur, resulting in faster issue resolution.
However, there’s still room for improvement. For forecast needs and
capacity planning, staff continues to gather data from a number of
sources and manually enter them into spreadsheets. The ability to scale
to current monitoring demands has vastly improved, but at a significant
price tag because of investments in hardware like high-end servers,
pollers, data collectors and centralized database infrastructure.
Level Three also requires baselines for every metric collected. This
provides an accurate view of what’s “normal” at any given time.
When performance deviates from historical norms, an alert is sent.
Understanding change in this way is a key component of Level Three
because often these changes are not only a symptom of problems;
they’re a direct or indirect cause as well.
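As a rough illustration of baseline-driven alerting, the sketch below compares a new reading against the history for the same metric and flags it when it strays too far from normal in either direction. The three-standard-deviation tolerance, the function name and the sample response times are assumptions made for this example, not values prescribed by the model.

```python
from statistics import mean, stdev

def deviates_from_baseline(history, current, tolerance=3.0):
    """Return True when `current` lies more than `tolerance` standard
    deviations away from the mean of `history` (above or below)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current != baseline  # perfectly flat history: any change is a deviation
    return abs(current - baseline) / spread > tolerance

# Hypothetical response times (ms) observed at the same hour on previous days.
history = [102, 98, 101, 99, 100, 103, 97]

print(deviates_from_baseline(history, 104))  # False: within normal variation
print(deviates_from_baseline(history, 130))  # True: far above the baseline
print(deviates_from_baseline(history, 85))   # True: far below the baseline
```

Unlike the upper-limit thresholds of Level One, a check of this kind also catches a metric that drops unexpectedly, which can be just as telling as one that spikes.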
In Level Three, conditions across the infrastructure are normalized. Staff
are more comfortable and in greater control since they can see more than
half of the infrastructure at any given time. Future-proofing is in place,
and a measurable reduction in costs has begun to take place.
Three ways to move to Level Four:
. Incorporate visibility of applications and service delivery as
opposed to monitoring only individual infrastructure components
. Link your alerts to log analysis to spot unique logs or
trending conditions
. Define, monitor and alert on custom KPIs that don’t exist in the
MIB of monitored devices
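One way to picture a custom KPI is as a value derived from raw counters that a device does expose. The sketch below computes link utilization from two readings of an octet counter. The function name, parameters and figures are assumptions chosen for illustration; a real platform would define such KPIs through its own interface.

```python
def utilization_percent(octets_start, octets_end, interval_s, speed_bps, counter_bits=64):
    """Derive link utilization (%) from two readings of an octet counter."""
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    # Counters wrap to zero at 2**counter_bits; modular subtraction
    # handles a single wrap between the two readings.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 * 100 / (interval_s * speed_bps)

# Hypothetical readings taken 60 seconds apart on a 1 Gbps link.
print(utilization_percent(1_000_000, 376_000_000, 60, 1_000_000_000))  # 5.0

# A 32-bit counter that wrapped between readings still yields the right delta.
print(utilization_percent(4_294_967_000, 704, 1, 8000, counter_bits=32))  # 100.0
```

Once a derived value like this exists, it can be baselined and alerted on like any metric polled directly from the device.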
LEVEL 4: ADVANCED VISIBILITY.
At Level Four, service-level views and cross-platform processes ensure
that reliable metrics are the basis of business decision-making. And
mean time to repair (MTTR) is reduced significantly, resulting in fewer
staff-hours devoted to troubleshooting and issue resolution.
Here, 80% to 90% of the infrastructure is visible, including application
and service delivery instead of just component monitoring. Automated
discovery of L2 and L3 topology is available, and it’s possible to view
real-time status and SLA instrumentation, including packet loss, jitter
and congestion. Log analysis now triggers alerts, working with accurate
baselines and thresholds based on each unique environment. Single
clicks get staff from metric to flow to logs within the same interface,
greatly facilitating troubleshooting and reducing MTTR. In fact, the
monitoring platform makes it possible to resolve half of all issues
proactively before they produce any discernible impact. Organizations
know what’s happening on the network, where it’s happening, and when
it’s happening -- end to end.
At this level capacity planning and trending can be performed from a
single platform. For example, using reports like “days until threshold” and
log data analysis, staff can anticipate how user behavior on individual
applications will impact capacity needs of the underlying infrastructure.
They can then make necessary adjustments to avoid any user impact.
These proactive capacity planning insights can be especially helpful when
rolling out new applications or services.
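A “days until threshold” figure can be approximated by fitting a straight line to recent daily values and projecting it forward. The sketch below does this with a plain least-squares fit. It assumes growth is roughly linear, and the function name and sample data are hypothetical. A production report would likely use a more careful forecasting method.

```python
def days_until_threshold(daily_values, threshold):
    """Fit a least-squares line to one value per day and estimate how many
    days remain until the trend reaches `threshold`.

    Returns 0.0 if the latest value already meets the threshold and
    None if the trend is flat or falling."""
    n = len(daily_values)
    if n < 2:
        raise ValueError("need at least two daily values")
    if daily_values[-1] >= threshold:
        return 0.0
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_values) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_values))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    day_reached = (threshold - intercept) / slope
    return max(day_reached - (n - 1), 0.0)  # days counted from the latest sample

# Hypothetical disk usage (%) over five days, growing two points per day.
print(days_until_threshold([50, 52, 54, 56, 58], threshold=80))  # 11.0
print(days_until_threshold([60, 60, 60], threshold=80))          # None
```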
These reports reliably support business decisions, offering insights based
on KPIs defined by the organization. All time series data can be ingested,
regardless of source, and seamlessly graphed with other metrics, such
as SNMP and IP SLA. For example, an organization could correlate footfall
traffic to demand on a wireless network, or correlate transaction volume
to the stress it places on the underlying infrastructure.
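To make the idea of correlating two time series concrete, the sketch below computes a Pearson correlation coefficient between hourly footfall counts and wireless throughput. Both series are invented for illustration, and `pearson` is a hand-written helper rather than a feature of any particular platform.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_var = sum((x - x_mean) ** 2 for x in xs)
    y_var = sum((y - y_mean) ** 2 for y in ys)
    if x_var == 0 or y_var == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(x_var * y_var)

# Hypothetical hourly samples: visitors counted at the door and Wi-Fi throughput (Mbps).
footfall = [120, 340, 560, 610, 430, 150]
wifi_mbps = [35, 90, 160, 170, 115, 40]

r = pearson(footfall, wifi_mbps)
print(f"correlation: {r:.2f}")  # a value close to 1 means the two move together
```

A coefficient near 1 suggests the two series rise and fall together, which can justify sizing wireless capacity from footfall forecasts. Correlation alone does not prove that one drives the other.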
At Level Four, it’s possible to view all object metrics down to one-second
granularity, with zero degradation to the speed of reports, no matter
the size of the monitored domain. Ingestion of daily log volumes greater
than a terabyte is possible, with flows-per-second in the hundreds of
thousands. Without the need for human intervention, the platform allows
new devices to be added to the configuration management database
and integrated with data center orchestration and tools such as Ansible,
Puppet and Chef.
IT strategy and operations have been streamlined and are now proactive.
Interoperability means greatly reduced MTTR, and cost savings are
dramatically evident. The effects are now being felt by customers and
IT staff alike. Customers are seeing consistently reliable service, and
employees are experiencing the relief that comes with responsible
automation. The result is a positive impact on overall business. But
there’s still one more threshold to cross.
Three ways to move to Level Five:
. Tie alerts to multivariate analysis to spot trouble due to
multiple, related events
. Collect sub-second views of infrastructure performance from
probe-based solutions and report on these metrics from the
monitoring platform
. Incorporate service-centric status maps to create awareness of
all the components required to deliver the service successfully
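The first step above can be pictured with a deliberately simple stand-in for multivariate analysis: raise an alert only when several related metrics stray from their own history at the same time. The metric names, the z-score limit of 2 and the two-metric minimum are all assumptions made for this sketch. Real multivariate analytics would model the relationships between metrics more rigorously.

```python
from statistics import mean, stdev

def zscore(history, current):
    """Distance of `current` from the historical mean, in standard deviations."""
    spread = stdev(history)
    return 0.0 if spread == 0 else (current - mean(history)) / spread

def related_deviations(histories, current, z_limit=2.0):
    """Names of metrics whose current value is unusual compared with history."""
    return sorted(
        name for name, value in current.items()
        if abs(zscore(histories[name], value)) > z_limit
    )

def multi_metric_alert(histories, current, min_metrics=2, z_limit=2.0):
    """Alert only when at least `min_metrics` related metrics deviate together."""
    return len(related_deviations(histories, current, z_limit)) >= min_metrics

# Hypothetical history for three related metrics on one service path.
histories = {
    "latency_ms": [20, 22, 21, 19, 20, 21],
    "packet_loss_pct": [0.1, 0.2, 0.1, 0.1, 0.2, 0.1],
    "cpu_pct": [40, 42, 41, 39, 40, 43],
}

latency_only = {"latency_ms": 24, "packet_loss_pct": 0.1, "cpu_pct": 41}
latency_and_loss = {"latency_ms": 24, "packet_loss_pct": 0.3, "cpu_pct": 41}

print(multi_metric_alert(histories, latency_only))      # False: one metric alone
print(related_deviations(histories, latency_and_loss))  # ['latency_ms', 'packet_loss_pct']
print(multi_metric_alert(histories, latency_and_loss))  # True: two related deviations
```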
LEVEL 5: OPTIMIZED SERVICE DELIVERY.
Level Five is the ultimate goal in the Maturity Model for Infrastructure
Performance Monitoring. At this stage an effective platform incorporates
extensive automated functions and multivariate analytics. And there’s
full understanding and control of the entire infrastructure, end to
end, including hybrid cloud elements and all the on- and off-premises
components that make up the network.
With comprehensive automation and reliable real-time analytics,
infrastructure performance undergoes continuous improvement. Optimal
performance drives innovation and creative expansion, and with less of
their attention obligated to mundane monitoring tasks, IT staff have more
time to spend on innovation, which often leads to greater job satisfaction.
Visibility is now at maximum -- a full 99%. Even “shadow IT” is no longer in
the dark, so there’s awareness of everything impacting the infrastructure
resources. Organizations have insight into how environmental conditions,
transaction volume and energy consumption affect the underlying
infrastructure. For example, the platform monitors the temperature
inside and outside a datacenter, noting any differential in trending
conditions as a possible indication of an issue. The platform can even
monitor power strips to detect inefficient servers that draw more energy
than is necessary or normal.
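The temperature example can be sketched as a check on how the gap between inside and outside readings changes over a window. Everything here is an assumption for illustration: the readings, the function names and the 2 °C drift limit.

```python
def differential_drift(inside, outside):
    """Change in the inside-minus-outside temperature gap across the window."""
    if len(inside) != len(outside) or len(inside) < 2:
        raise ValueError("need two equally long series with at least 2 readings")
    gaps = [i - o for i, o in zip(inside, outside)]
    return gaps[-1] - gaps[0]

def cooling_issue_suspected(inside, outside, max_drift_c=2.0):
    """Flag a widening gap that outside conditions do not explain."""
    return differential_drift(inside, outside) > max_drift_c

# Hypothetical hourly readings in degrees Celsius.
outside = [18.0, 18.5, 19.0, 19.0, 18.5]
inside_tracking = [22.0, 22.5, 23.0, 23.0, 22.5]  # follows the outside trend
inside_climbing = [22.0, 23.0, 24.5, 26.0, 27.5]  # rises on its own

print(cooling_issue_suspected(inside_tracking, outside))  # False
print(differential_drift(inside_climbing, outside))       # 5.0
print(cooling_issue_suspected(inside_climbing, outside))  # True
```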
At this level, probe-based solutions deliver sub-second performance
views, including agent-based end-user experience metrics, and correlate
them with infrastructure performance. Comparative analysis and
multivariate metrics and analytics rule, providing increased forecasting
accuracy. The completely virtualized, all-in-one monitoring platform
also has the ability to spin up new monitoring capacity on demand
and as needed.
The platform detects and rectifies 80% of performance issues prior to
any significant end user impact, thanks primarily to service-centric status
maps that reveal all the components involved in successfully delivering
the service. Software-defined network (SDN) controllers subscribe to the
performance monitoring platform in order to receive recommendations
for optimizing the performance of the virtual infrastructure. Cloud,
virtualization and an all-IP connected environment mean it’s possible to
scale up to massive data collection at will. Adding monitoring
capacity is as simple as spinning up a new VM on demand.
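A service-centric status map can be thought of as a dependency graph in which a service inherits the worst status of any component it relies on. The sketch below walks such a graph. The service name, the components and the three status values are hypothetical, and components without a reported status are assumed to be healthy, which a real platform might treat differently.

```python
SEVERITY = {"ok": 0, "degraded": 1, "down": 2}

def service_status(service, depends_on, component_status):
    """Worst status among everything the service depends on, directly or indirectly."""
    worst = "ok"
    seen = set()
    stack = [service]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # shared dependencies are visited only once
        seen.add(node)
        status = component_status.get(node, "ok")  # assumption: unreported means healthy
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
        stack.extend(depends_on.get(node, []))
    return worst

# Hypothetical map: which components each element needs in order to work.
depends_on = {
    "online-banking": ["web-tier", "database"],
    "web-tier": ["load-balancer", "vm-cluster"],
    "database": ["storage-array", "vm-cluster"],
}
component_status = {
    "load-balancer": "ok",
    "vm-cluster": "ok",
    "storage-array": "degraded",
}

print(service_status("online-banking", depends_on, component_status))  # degraded
```

Because the storage array sits two hops below the service, a component-by-component view would not make its effect on online banking obvious. The roll-up surfaces it immediately.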
At this stage, organizations have nearly fully automated the monitoring
of their infrastructure performance, resulting in an unprecedented
level of confidence. With a renewed sense of job satisfaction, staff can
now spend time fine-tuning the particulars, and are free to create and
pursue continuous improvement of the application and service delivery.
Having maximized the value of the monitoring platform, and saved
considerable CAPEX and OPEX funds in doing so, it’s now possible to
explore new revenue streams through savings-funded innovation.
DETERMINING LEVELS ON THE MATURITY MODEL.
So, how can organizations find out what level they’re at? And what do they
need to do to get to Level Five?
First, they must recognize that they might be far along the model in some
areas but lagging behind in others. For example, a datacenter might be
able to collect time series data at an advanced level, but when an issue
arises, staff may be swivel-chair troubleshooting for hours, even days,
with a variety of disparate vendor tools. A good motivation for moving
an area up from the lower level is that the non-optimal area may be
negatively influencing areas already farther up the model.
This quick Online Assessment helps organizations find their level on the
Maturity Model. The information gained from this assessment provides
a sense of what’s already being done well and helps identify what can
be improved.
Get a quick Online Assessment of your Infrastructure Monitoring Maturity level.
Visit: info.sevone.com/maturitymodel
CHOOSING A COMPREHENSIVE PERFORMANCE MONITOR.
When looking for a performance monitoring platform that will move an
organization up the levels on the Maturity Model, it’s important to find
one with functionalities and capabilities that cover all five levels. Often,
simple monitoring solutions will only cover certain aspects of the first
couple of levels.
Staff must also examine all aspects of infrastructure monitoring -- the
inputs and outputs, as well as the various platforms in the network,
including SDNs and any specialty applications unique to the organization.
When moving through the levels, it’s important to take full advantage
of the various functionalities and capabilities of whatever performance
monitoring system is implemented. Moving up one, or even two levels,
can take as little as a couple of weeks. During this process, organizations
can implement functionalities on their own or take advantage of technical
help from system specialists.
Currently, no single solution covers everything needed to reach the top
of the Maturity Model. However, SevOne’s comprehensive infrastructure
monitoring platform comes close -- very close. By providing a fully
automated, comprehensive solution, it’s helping some of today’s largest,
most connected companies gain the security, cost savings and peace of
mind that come with approaching Level Five.
About SevOne.
SevOne provides the world’s most scalable infrastructure performance monitoring platform to the world’s most connected companies.
The patented SevOne Cluster™ architecture leverages distributed computing to scale infinitely and collect millions of objects. It provides
real-time reporting down to the second and provides the insight needed to prevent outages. SevOne customers include seven of today’s
13 largest banks, enterprises, CSPs, MSPs and MSOs. SevOne is backed by Bain Capital Ventures. More information can be found at
www.sevone.com. Follow SevOne on Twitter at @SevOneInc.
[ www.sevone.com | blog.sevone.com ]
SEV_WP_06_2015