Unit 2: Availability

Availability
• Availability is the process of optimizing the readiness of production systems by accurately measuring, analyzing, and reducing outages to those production systems.
• Availability is one of the most commonly known characteristics of any computer system.
• If the system is up and running, it is available to the user; otherwise it is not. Ex: a landline telephone system.
• Infrastructure analysts focus not only on the timely recovery from outages to service, but also on methods to reduce their frequency and duration in order to maximize availability.
• Suppliers, by the nature of their responsibilities, are interested in keeping their particular components of the system up and running.
• Customers, or end-users, are primarily interested in their system being up and running, that is, available to them.
• Availability needs to be differentiated from related terms such as uptime, downtime, slow response, and high availability.

Differentiating Availability from Uptime
Availability: oriented toward customers
Uptime: oriented toward suppliers

Uptime
• Uptime is a measure of the time that individual components within a production system are functionally operating.
• Uptime is oriented toward suppliers.
• It focuses on the individual components of a system.
• Suppliers are interested in keeping their particular components of the system up and running.

Availability
• Availability is the process of optimizing the readiness of production systems by accurately measuring, analyzing, and reducing outages to those production systems.
• Availability is oriented toward customers.
• It focuses on the production system as a whole.
• Customers, or end-users, are primarily interested in their system being up and running, that is, available to them.
• End-users mainly want assurances that the application system they need to do their jobs is available to them when and where they need it.
• Infrastructure specialists primarily want assurances that the components of the system for which they are responsible are meeting or exceeding their uptime expectations.
Ex: Systems administrators focus on keeping the server hardware and software up and operational. Network administrators have a similar focus on network hardware and software, and database administrators do the same with their database software.

Components of Availability
1. Data center facility
2. Server hardware (processor, memory, channels)
3. Server system software (operating system, program products)
4. Application software (program, database management)
5. Disk hardware (controllers, arrays, disk volumes)
6. Database software (data files, control files)
7. Network software
8. Network hardware (controllers, switches, lines, hubs, routers, repeaters, modems)
9. Desktop software (operating system, program products, applications)
10. Desktop hardware (processor, memory, disk, interface cards)

Differentiating Slow Response from Downtime
1. Slow Response
Slow response refers to unacceptably long periods of time for an online transaction to complete processing and return results to the user. The period of time deemed unacceptable varies depending on the type of transaction involved.
Ex: For simple inquiries, a one-second response may seem slow; for complex computations, two- or three-second responses may be acceptable.
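As a minimal sketch of how a transaction-dependent limit could be expressed, the Python below flags a response as slow when it exceeds the longest acceptable time for its transaction type. The transaction-type names and the threshold values are illustrative assumptions loosely based on the example above; real limits would come from whatever service levels are agreed with users.

```python
# Hypothetical limits: the longest response time (in seconds) still
# considered acceptable for each transaction type. These values are
# assumptions for illustration, not prescribed standards.
MAX_ACCEPTABLE_SECONDS = {
    "simple_inquiry": 0.5,
    "complex_computation": 3.0,
}


def is_slow(transaction_type: str, response_seconds: float) -> bool:
    """Return True when the response took longer than the limit for its type."""
    return response_seconds > MAX_ACCEPTABLE_SECONDS[transaction_type]


# The same measured time can be slow for one transaction type and fine for another.
simple_is_slow = is_slow("simple_inquiry", 1.0)
complex_is_slow = is_slow("complex_computation", 2.5)
```

Keeping the limits in a lookup table rather than in the comparison itself makes it clear that "slow" is judged per transaction type, not against a single system-wide number.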
Slow response is usually a performance and tuning problem requiring highly trained personnel with specialized expertise.
The following factors can contribute to slow response times:
• Growth of a database
• Traffic on the network
• Contention for disk volumes
• Disabling of processors or portions of main memory in servers
Each of these conditions requires analysis and resolution by infrastructure professionals.
• Users are normally unaware of these root causes and sometimes interpret extremely slow response as downtime of their systems.
• The root cause of these problems matters a great deal to infrastructure analysts and administrators.
• They are charged with identifying, correcting, and permanently resolving the root causes of these service disruptions.

2. Downtime
• Downtime refers to the total inoperability of a hardware device, a software routine, or some other critical component of a system that results in the outage of a production application.
Slowly responding software: slow response
Malfunctioning hardware: downtime

Differentiating Availability from High Availability
1. High Availability
• High availability refers to the design of a production environment such that all single points of failure are removed through redundancy to eliminate production outages.
2. Fault Tolerant
• Fault tolerant refers to a production environment in which all hardware and software components are duplicated such that they can automatically fail over to their backup component in the event of a fault.

Desired Traits of an Availability Process Owner
• In some instances, the availability process owner is an operations manager; in others, it is a strong technical lead in technical support.
• The owner should be knowledgeable in a variety of areas, including systems, networks, databases, and facilities, and must also be able to think and act tactically.
• A desirable trait of an ideal candidate for availability process owner is knowledge of software and hardware configurations, backup systems, and desktop hardware and software.

Characteristics of an Availability Process Owner
• Knowledge of system software and components
• Knowledge of network software and components
• Knowledge of database systems
• Knowledge of software and hardware configurations
• Knowledge of desktop software and hardware
• Knowledge of applications
• Ability to communicate effectively with IT executives
• Ability to think and act tactically

The Seven Rs of High Availability
• The goal of all availability process owners is to maximize the uptime of the various online systems.
• The following factors work against the goal of 100 percent availability: budget limitations, component failures, faulty code, human error, and natural disasters.
• There are several approaches that can be taken to maximize availability:
1. Redundancy
2. Reputation
3. Reliability
4. Repairability
5. Recoverability
6. Responsiveness
7. Robustness

1. Redundancy
• Manufacturers have been designing redundancy into their products for years in the form of redundant power supplies, multiple processors, segmented memory, and redundant disks.
• Infrastructure analysts can take a similar approach by configuring disk and tape controllers and servers with dual paths, splitting network loads over dual lines, and providing alternate control consoles.
• In short, eliminate as far as possible any single points of failure that could disrupt service availability.

2. Reputation
• The reputation of key suppliers of servers, disk storage systems, database management systems, and network hardware and software plays a principal role in striving for high availability.
• Reputations can be verified in several ways, including the following:
  - Percent of market share
  - Reports from industry analysts
  - Publications such as the Wall Street Journal and Computerworld
  - Track record of reliability and repairability
  - Customer references (covering cost, service, quality of the product, and training of service personnel)

3. Reliability
• The reliability of the hardware and software can also be verified from customer references and industry analysts.
• An analysis of problem logs should reveal any unusual patterns of failure. The logs should be studied by supplier, product, using department, time and day of failures, frequency of failures, and time to repair.
Steps for component reliability analysis:
1. Review and analyze problem management logs.
2. Review and analyze supplier logs.
3. Acquire feedback from operations personnel.
4. Acquire feedback from support personnel.
5. Acquire feedback from supplier repair personnel.
6. Compare experiences with other shops.
7. Study reports from industry analysts.

4. Repairability
• Repairability refers to the relative ease with which technicians can resolve or replace failing components.
• Two common metrics are:
1. How long it takes to do the actual repair, i.e., the average or mean time to repair (MTTR).
2. How often the repair work needs to be repeated.

5. Recoverability
• Another characteristic of high availability is recoverability.
• This refers to the ability to overcome a failure in such a way that there is no impact on end-user availability.

6. Responsiveness
• This trait is the sense of urgency that all people involved with high availability need to exhibit.
• This includes having well-trained suppliers and in-house support personnel who can respond to problems quickly and efficiently.
• Escalation is another aspect of responsiveness.

7. Robustness
• A robust process is able to withstand a variety of forces, both internal and external, that could easily disrupt and undermine availability in a weaker environment.
• Robustness relies on training to withstand the following:
• Technical changes as they relate to platforms, products, services, and customers
• Personnel changes as they relate to turnover, expansion, and rotation
• Business changes as they relate to new direction, acquisitions, and mergers
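
The reliability and repairability points above call for studying problem logs by frequency of failures and time to repair. The following Python is a rough sketch of that kind of analysis, not a prescribed method. The `Outage` record, the component names, and the sample figures are hypothetical, and the availability formula (scheduled time minus downtime, divided by scheduled time) is a common convention assumed here rather than one stated in these notes. It also assumes that outages do not overlap and that every logged outage was visible to end-users.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One hypothetical problem-log entry."""
    component: str        # illustrative component name
    minutes_down: float   # time from failure until the repair was complete


def mean_time_to_repair(outages):
    """Average repair time in minutes, or None when the log is empty."""
    if not outages:
        return None
    return sum(o.minutes_down for o in outages) / len(outages)


def availability_percent(outages, scheduled_minutes):
    """Percentage of scheduled time during which the system was usable."""
    downtime = sum(o.minutes_down for o in outages)
    return 100.0 * (scheduled_minutes - downtime) / scheduled_minutes


def failures_by_component(outages):
    """Count failures per component to expose unusual patterns."""
    counts = {}
    for o in outages:
        counts[o.component] = counts.get(o.component, 0) + 1
    return counts


# Made-up sample log covering a 30-day month of round-the-clock service.
log = [
    Outage("disk array", 30),
    Outage("network switch", 90),
    Outage("disk array", 60),
]
scheduled = 30 * 24 * 60  # minutes of scheduled service in the month

mttr = mean_time_to_repair(log)
availability = availability_percent(log, scheduled)
counts = failures_by_component(log)
```

With this sample log, the per-component counts point at the disk array as the repeat offender, which is the kind of pattern the problem-log review is meant to surface.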