Unit 2
Availability
• Availability is the process of optimizing the
readiness of production systems by accurately
measuring, analyzing, and reducing outages to
those production systems.
• Availability is one of the most commonly known
characteristics of any computer system.
• If the system is up and running, it is available to
the user; otherwise, it is not.
Ex:
Landline telephone system.
• Infrastructure analysts focus not only on the timely
recovery from outages to service,
but also on methods to reduce their frequency and
duration to maximize availability.
• Suppliers, by the nature of their responsibilities,
are interested in keeping their particular
components of the system up and running.
• Customers, or end-users, are primarily interested in
their system being up and running—that is,
available to them.
• Availability needs to be differentiated from related
terms such as uptime, downtime, slow response, and
high availability.
• Availability versus uptime:
availability: oriented toward customers
uptime: oriented toward suppliers
Uptime
• Uptime is a measure of the time that individual
components within a production system are
functionally operating.
• Uptime is oriented toward suppliers.
• It focuses on the individual components of a system.
• The suppliers are interested in keeping their
particular components of the system up and
running.
Availability
• Availability is the process of optimizing the
readiness of production systems by accurately
measuring, analyzing, and reducing outages to
those production systems.
• Availability is oriented toward customers.
• It focuses on the production system as a whole.
• Customers, or end-users, are primarily interested in
their system being up and running—that is,
available to them.
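The contrast between supplier-oriented uptime and customer-oriented availability can be illustrated with a small calculation. The following is a minimal sketch, not a method given in the slides: it assumes that the application needs every listed component, so any component outage counts as an application outage. The component names, outage times, and 30-day period are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_hour, end_hour) of one outage

def total_outage(intervals: List[Interval]) -> float:
    """Hours covered by the intervals, counting overlapping time only once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def percent_up(outage_hours: float, period_hours: float) -> float:
    """Percentage of the period that was free of outages."""
    return 100.0 * (period_hours - outage_hours) / period_hours

PERIOD_HOURS = 720.0  # hypothetical 30-day measurement period

# Hypothetical outage log per component, in hours since the period began.
outages: Dict[str, List[Interval]] = {
    "server": [(10.0, 12.0)],
    "network": [(11.0, 14.0), (300.0, 301.0)],
    "database": [],
}

# Supplier view: uptime of each individual component.
component_uptime = {
    name: percent_up(total_outage(ivs), PERIOD_HOURS)
    for name, ivs in outages.items()
}

# Customer view: the system is unavailable whenever ANY component is down
# (assumption: the application depends on all of them).
all_outages = [iv for ivs in outages.values() for iv in ivs]
system_availability = percent_up(total_outage(all_outages), PERIOD_HOURS)

for name, pct in component_uptime.items():
    print(f"{name} uptime: {pct:.2f}%")
print(f"system availability: {system_availability:.2f}%")
```

Under these assumptions every supplier can report a higher uptime figure than the availability the customer actually experiences, which is why the two terms are kept separate.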
• End-users mainly want assurances that the
application system they need to do their jobs is
available to them when and where they need it.
• Infrastructure specialists primarily want
assurances that the components of the system for
which they are responsible are meeting or exceeding
their uptime expectations.
Ex:
Systems administrators focus on keeping the
server hardware and software up and
operational.
Network administrators have a similar focus on
network hardware and software, and
database administrators do the same with their
database software.
Components of availability
1. Data center facility
2. Server hardware (processor, memory, channels)
3. Server system software (operating system, program
products)
4. Application software (program, database management)
5. Disk hardware (controllers, arrays, disk volumes)
6. Database software (data files, control files)
7. Network software
8. Network hardware (controllers, switches, lines, hubs,
routers, repeaters, modems)
9. Desktop software (operating system, program products,
applications)
10. Desktop hardware (processor, memory, disk, interface
cards)
Differentiating Slow Response from Downtime
1. Slow Response
Slow response refers to unacceptably long periods of
time for an online transaction to complete processing
and return results to the user.
The period of time deemed unacceptable varies
depending on the type of transaction involved.
Ex: For simple inquiries, a one-second response may
seem slow; for complex computations, two- or
three-second responses may be acceptable.
Slow response is usually a performance and tuning
problem requiring highly trained personnel with
specialized expertise.
The following factors can contribute to slow response
times:
• Growth of a database
• Traffic on the network
• Contention for disk volumes
• Disabling of processors or portions of main
memory in servers
Each of these conditions requires analysis and
resolution by infrastructure professionals.
• Users are normally unaware of these root causes
and sometimes interpret extremely slow response
as downtime to their systems.
• The root cause of these problems matters a
great deal to infrastructure analysts and
administrators.
• They are charged with identifying, correcting, and
permanently resolving the root causes of these
service disruptions.
2. Downtime
• Downtime refers to the total inoperability of a
hardware device, a software routine, or some other
critical component of a system that results in the
outage of a production application.
slowly responding software - slow response
malfunctioning hardware - downtime
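The distinction between slow response and downtime can be sketched as a simple classification rule. This is an illustrative sketch only: the threshold values, the transaction type names, and the convention that `None` means "no response at all" are assumptions, chosen to be consistent with the example that the acceptable time depends on the type of transaction.

```python
from typing import Optional

# Hypothetical longest acceptable response time, in seconds, per transaction type.
MAX_ACCEPTABLE_SECONDS = {
    "simple_inquiry": 0.5,
    "complex_computation": 3.0,
}

def classify(transaction_type: str, response_seconds: Optional[float]) -> str:
    """Classify one online transaction.

    response_seconds is None when the transaction never completed because a
    critical component was inoperable (assumed convention for this sketch).
    """
    if response_seconds is None:
        return "downtime"
    if response_seconds > MAX_ACCEPTABLE_SECONDS[transaction_type]:
        return "slow response"
    return "normal"

print(classify("simple_inquiry", 1.0))
print(classify("complex_computation", 2.5))
print(classify("simple_inquiry", None))
```

The same one-second measurement is slow for a simple inquiry yet would be normal for a complex computation, which mirrors the point that the unacceptable period varies by transaction type.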
Differentiating Availability from High Availability
1. High Availability
High availability refers to the design of a production
environment such that all single points of failure are
removed through redundancy to eliminate
production outages.
2. Fault Tolerant
• Fault tolerant refers to a production
environment in which all hardware and software
components are duplicated such that they can
automatically fail-over to their backup
component in the event of a fault.
Desired Traits of an Availability Process Owner
• In some instances, the availability process owner is
the operations manager; in others, it is a strong
technical lead in technical support.
• The process owner should be knowledgeable in a
variety of areas, including systems, networks,
databases, and facilities, and must also be able to
think and act tactically.
• Other desirable traits of an ideal candidate for
availability process owner are knowledge of
software and hardware configurations,
backup systems, and
desktop hardware and software.
Characteristics of an availability process owner
• Knowledge of system software and components
• Knowledge of network software and components
• Knowledge of database systems
• Knowledge of software and hardware configurations
• Knowledge of desktop software and hardware
• Knowledge of applications
• Ability to communicate effectively with IT
executives
• Ability to think and act tactically
The Seven Rs of High Availability
• The goal of all availability process owners is to
maximize the uptime of the various online systems.
• The following factors work against the goal of 100
percent availability:
Budget limitations
Component failures
Faulty code
Human error
Natural disasters
• There are several approaches that can be taken to
maximize availability:
1. Redundancy
2. Reputation
3. Reliability
4. Repairability
5. Recoverability
6. Responsiveness
7. Robustness
1. Redundancy
• Manufacturers have been designing redundancy
into their products for years in the
form of:
Power supplies
Multiple processors
Segmented memory
Redundant disks
• Infrastructure analysts can take a similar
approach by configuring disk and tape controllers,
and servers with dual paths, splitting network
loads over dual lines, and providing alternate
control consoles.
• In short, eliminate as far as possible any single
points of failure that could disrupt service
availability.
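The benefit of removing a single point of failure can be estimated with a standard probability calculation. This is a minimal sketch under two assumptions that the slides do not state: the redundant copies fail independently of one another, and failover to a surviving copy is instantaneous. The function name and the 99 percent figure are hypothetical.

```python
def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of a group of identical redundant components.

    The group is down only when every copy is down at the same time, so
    (assuming independent failures) its unavailability is the product of
    the individual unavailabilities.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component_availability) ** copies

single = redundant_availability(0.99, 1)  # one power supply, hypothetical 99%
dual = redundant_availability(0.99, 2)    # two power supplies in parallel
print(f"single: {single:.4%}, dual: {dual:.4%}")
```

In practice failures are rarely fully independent (shared power, shared software faults), so the result is best read as an upper bound on what redundancy alone can deliver.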
2. Reputation
• The reputation of key suppliers of servers, disk
storage systems, database management systems,
and network hardware and software plays a
principal role in striving for high availability.
• Reputations can be verified in several ways,
including the following:
Percent of market share
Reports from industry analysts
Publications such as the Wall Street Journal and
Computerworld
Track record of reliability and repairability
Customer references, which can confirm factors such as
cost,
service,
quality of the product, and
training of service personnel
3. Reliability
• The reliability of the hardware and software can
also be verified from customer references and
industry analysts.
• An analysis of problem logs should reveal any
unusual patterns of failure and should be studied
by supplier, product, using department, time and
day of failures, frequency of failures, and time to
repair.
Steps for component reliability analysis:
1. Review and analyze problem management logs.
2. Review and analyze supplier logs.
3. Acquire feedback from operations personnel.
4. Acquire feedback from support personnel.
5. Acquire feedback from supplier repair
personnel.
6. Compare experiences with other shops.
7. Study reports from industry analysts.
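The problem-log analysis described above (studying failures by supplier, product, using department, and time of failure) can be sketched as a set of simple tallies. The record layout, field names, and sample entries below are hypothetical; a real problem management system will have its own export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Hashable, List

@dataclass
class Failure:
    """One entry from a (hypothetical) problem management log."""
    supplier: str
    product: str
    department: str
    occurred_at: datetime
    hours_to_repair: float

# Hypothetical sample data for illustration only.
problem_log: List[Failure] = [
    Failure("Acme", "disk array", "finance", datetime(2024, 1, 1, 9, 30), 2.0),
    Failure("Acme", "disk array", "payroll", datetime(2024, 1, 8, 9, 45), 3.5),
    Failure("Globex", "router", "finance", datetime(2024, 1, 10, 14, 0), 1.0),
]

def failures_by(log: List[Failure], key: Callable[[Failure], Hashable]) -> Counter:
    """Count failures grouped by whatever attribute `key` extracts."""
    return Counter(key(f) for f in log)

by_supplier = failures_by(problem_log, lambda f: f.supplier)
by_product = failures_by(problem_log, lambda f: f.product)
by_department = failures_by(problem_log, lambda f: f.department)
# weekday(): 0 is Monday, 6 is Sunday
by_weekday = failures_by(problem_log, lambda f: f.occurred_at.weekday())
by_hour = failures_by(problem_log, lambda f: f.occurred_at.hour)

# most_common() puts the largest tallies first, which surfaces unusual patterns.
print(by_supplier.most_common())
print(by_weekday.most_common())
```

A cluster such as the same product failing on the same weekday at about the same hour is the kind of unusual pattern the analysis is meant to reveal.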
4. Repairability
• Repairability refers to how easily technicians can
resolve or replace failing components.
• Two common metrics are:
1. How long it takes to do the actual repair, i.e., the
average or mean time to repair (MTTR)
2. How often the repair work needs to be repeated
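Both repairability metrics can be computed from a list of repair records. The sketch below assumes a hypothetical record format in which each repair carries its duration in hours and a flag saying whether it repeated an earlier repair of the same fault; the sample values are invented for illustration.

```python
from statistics import mean
from typing import Dict, List, Union

Repair = Dict[str, Union[str, float, bool]]

# Hypothetical repair records.
repairs: List[Repair] = [
    {"component": "disk-1", "hours": 2.0, "repeat": False},
    {"component": "disk-1", "hours": 4.0, "repeat": True},
    {"component": "router-3", "hours": 1.5, "repeat": False},
    {"component": "server-2", "hours": 0.5, "repeat": False},
]

def mean_time_to_repair(records: List[Repair]) -> float:
    """Average repair duration in hours (MTTR)."""
    return mean(float(r["hours"]) for r in records)

def repeat_repair_rate(records: List[Repair]) -> float:
    """Fraction of repairs that had to be redone."""
    return sum(1 for r in records if r["repeat"]) / len(records)

print(f"MTTR: {mean_time_to_repair(repairs):.2f} hours")
print(f"Repeat repair rate: {repeat_repair_rate(repairs):.0%}")
```

Tracking the two figures together helps: a short MTTR is less reassuring when a large share of repairs has to be repeated.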
5. Recoverability
• Another characteristic of high availability is
recoverability.
• This refers to the ability to overcome a failure in
such a way that there is no impact on end-user
availability.
6. Responsiveness
• This trait is the sense of urgency all people
involved with high availability need to exhibit.
• This includes having well-trained suppliers and
in-house support personnel who can respond to
problems quickly and efficiently.
• Escalation is another aspect of responsiveness.
7. Robustness
• A robust process is able to withstand a
variety of forces, both internal and
external, that could easily disrupt and
undermine availability in a weaker
environment.
• Robustness puts a premium on training to
withstand the following:
• Technical changes as they relate to:
Platforms
Products
Services
Customers
• Personnel changes as they relate to:
Turnover
Expansion
Rotation
• Business changes as they relate to:
New direction
Acquisitions
Mergers