SIEMENS
SAP
Implementation for PIFRA
PIFRA Disaster Recovery Plan
Overview:
The most important aspect of the SAP R/3 implementation at PIFRA is establishing an
effective backup and recovery strategy. This process entails a restore of all or part of
the database after hardware or software errors, followed by a recovery in which the PIFRA
system(s) are brought up to a point just before the failure. Many situations may arise that
require a restore and a recovery. These are discussed in detail in this document.
The backup strategy should be as simple as possible. Complications in the backup
strategy can create difficult situations during restoration and recovery.
One of the main aspects of a disaster recovery plan is that the procedures, problem
identification, and handling must be well documented so that all individuals clearly
understand their roles and required tasks. The strategy should also not adversely
impact daily transactions at PIFRA.
This document discusses the backup and restore strategy for systems with respect to the
setup and implementation at PIFRA. A System Disaster may be considered "any
event that causes significant disruption in services for a period of time that affects the
organization". Therefore, this plan covers various levels of service interruption. The
information in this plan is organized so that other
sites can choose the pieces of the plan necessary for recovery depending on the type
of interruption. For example, in the event of a server failure, the DRP must provide
the contact numbers of the persons needed for the recovery. It should also
provide information on onsite and offsite backups. With this information, the System
Administration Team can begin the recovery process in the most
efficient and planned manner.
A sensible recovery plan may be the one thing that keeps the organization going. Here are
the basic steps to creating such a plan and making sure it will work when required. These
steps are derived from an extensive study of the ongoing processes at the PIFRA Test
Site.
Basic Steps
Make disaster recovery an integral part of the way PIFRA business processes run.
Someone at the top needs explicit responsibility for overseeing the plan, as it is too
easy to make dangerous mistakes when times are tough.
Prioritize
Decide which data and systems need to be recovered first. Each section thinks that its own
is the most important, but the decision has to be made -- and that usually ends up with
System Administration, which has the appropriate business insight. Don't forget
to look outside the data centre for things that need protecting. If employees have
heavily customized desktops to do their work, how will it affect them if they have to
start from scratch? Paper records are also always important.
SIEMENS SAP Implementation for PIFRA
Disaster Recovery Plan
Prepared By
Muhammad Bilal
Date
May 27, 2004
Page
1 of 25
Redundancy
Make sure the System Administrator has redundancy for critical systems, whether it's a
RAID storage system, server mirroring or even a complete duplicate data centre.
There should be no single point of failure, including power supplies,
telecommunications or even the office building itself, that will disrupt SAP Servers for
any length of time.
Backup
Along with redundancy, backup is the most important part of disaster recovery. Once
the System Administrator knows what needs to be backed up, he must decide when and how to
do the backups. A common scheme is to do a full backup at the beginning of each
week, followed by deltas -- backups of changes -- at least daily if not more often.
These can be differential backups, where the entire difference from the starting state
is copied each time, or incremental backups, where only the difference since the last
backup is stored. Incremental backups take less time but produce more individual backups
that have to be restored in order; with differential backups, the administrator has just
two restorations to make: the full backup and the latest differential.
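The difference in restore effort between the two schemes can be sketched in Python. This is a minimal illustration rather than PIFRA's actual procedure: the function name `restore_chain` and the day-indexed model (day 0 is the weekly full backup, days 1 to 6 are daily deltas) are assumptions made for the example.

```python
from typing import List


def restore_chain(last_backup_day: int, scheme: str) -> List[str]:
    """List the backups to restore, in order, given the last backup taken.

    Assumed model: day 0 is the weekly full backup, days 1..6 are deltas.
    `scheme` is either "differential" or "incremental".
    """
    if not 0 <= last_backup_day <= 6:
        raise ValueError("last_backup_day must be between 0 and 6")
    if scheme not in ("differential", "incremental"):
        raise ValueError(f"unknown scheme: {scheme}")

    chain = ["full(day 0)"]
    if last_backup_day == 0:
        return chain
    if scheme == "differential":
        # A differential holds every change since the full backup,
        # so only the most recent one is needed.
        chain.append(f"differential(day {last_backup_day})")
    else:
        # An incremental holds only the changes since the previous backup,
        # so every one of them must be restored, oldest first.
        chain.extend(f"incremental(day {d})" for d in range(1, last_backup_day + 1))
    return chain


print(restore_chain(4, "differential"))
print(restore_chain(4, "incremental"))
```

With the last backup taken on day 4, the differential scheme needs two restorations while the incremental scheme needs five, which is the trade-off described above.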
Offsite backups are essential, but difficult to manage -- especially for the smaller
organizations. Where teleworking is common, it may be possible to automate the
keeping of remote copies of information as part of the standard access
arrangements. Whatever the backup process -- and floppy disks, CD-Rs, removable
hard disks, tapes, leased lines and VPNs are all common -- ensure that access to
offsite backups isn't dependent on just one person. It is common to duplicate the
weekly backup and keep it offsite, and also to keep monthly backups.
Security
Don't neglect security. If the System Administrator needs to make backups of sensitive
information, is it adequately protected from attack if someone gets access to -- or
steals -- the backup? Conversely, if he has a secure backup protected by encryption
or strict access controls, is it possible to retrieve the information if key persons are
not available?
Regular Tests
Run regular tests to shake the bugs out of SAP servers -- and that means testing
absolutely everything. Tests that produce no errors aren't tough enough:
administrators are not testing to make sure it works, but to find out when it doesn't. This
will also tell you if your recovery procedure is working but too slow or cumbersome --
a system that comes back but takes two days to rebuild may be inappropriate.
Deciding backup and restoration strategies should be part of the initial architectural
planning of any major system and should influence bus types, storage devices and
the segmentation of the network.
When your SAP business processes change, reassess your plans. An acquisition,
new operating system installation or reorganization can trigger this. Also, when
PIFRA changes an underlying system and migrates data over, make sure you can
recover to the old system for as long as may be necessary -- it's no good having old
data desperately needed if the System Administrator no longer has a system that will
read it.
There's no point in having your data in the hands of inexperienced persons. Also
keep contact information up to date -- lists of employees with addresses and mobile
phone numbers, and supplier contacts -- and make everyone's role in the recovery plan
part of their basic training.
The Siemens Team offers a full range of recovery options depending on PIFRA's specific
needs and budget, from simple hardware replacement to complex mirroring.
Backups:
Backup means making copies of files from one location to another. The source
and the target can be on the same storage device or on different ones. It is best to copy
the files to a storage device other than the source. Data can be restored
from this backup if the original files are damaged or lost.
The backup strategy adopted at PIFRA is designed to minimize the chances of a
system disaster and to recover the system as early as possible if one
happens.
A System Disaster:
A System Disaster can mean anything: theft, fire, flood, an earthquake, a virus or
anything else that could keep users from accessing SAP Servers and hence the data. If
the System Administrator loses the entire system (possibly including hardware), then he
has to recover as much of the system as possible, as soon as possible.
Disaster Recovery Planning and Disaster Recovery
There are many sites where most attention is paid to database growth and little or none
at all to planning disaster recovery. We can divide disasters into 2 major categories:

Loss of data caused by failure of a hard disk drive or a disk controller. If
hardware redundancy and backup copies are provided, we should (usually) be able
to perform onsite recovery within a short period of time.

Disasters of a catastrophic nature such as fire, explosion or flood. In this case we
require a comprehensive plan organizing offsite recovery and continuation of
business.
The data recovery plan for the first case should be part of the comprehensive plan for
disaster recovery in the second case. In both cases it is important to test recovery plans
and to repeat the testing procedure periodically.
Critical Steps Involved In SAP Recovery


Database installation and maintenance
Planning and conducting of installation

Configuration and upgrades.

Rearranging data structure.

Periodic maintenance of database.

Planning platform and system upgrade and migration.
Plan
This plan follows the documentation required for PIFRA Test Site Services and was prepared
by the Disaster Recovery Team in co-ordination with the PIFRA Directorate.
Definition of need for a plan
A disaster recovery plan is needed to provide data security and user access
during normal operations and at times of disaster.
Disaster recovery Team
Members of the Disaster Recovery Team responsible for the Data Processing
component of disaster recovery include:



System Administrator
WAN Administrator
LAN Administrator
Members of the Disaster Recovery Team responsible for the administrative component of
disaster recovery include:

Manager of Administrative Services for Test as well as Pilot site.
Document goals and objective
The Information Technology Services of Siemens will plan and implement disaster
recovery techniques applicable to:




Computer hardware systems
Computer software systems
PIFRA project data maintained by computing systems located at the Central
Office (Test Site).
Data communications systems, including hardware, software, and lines.
As per the System data security audit, the Assistant Manager for Administrative
Services will be the primary administrator for the institutional disaster recovery plan.
Each computer-oriented Pilot Site will develop a disaster recovery and/or business
interruption plan to assure their continued operation when Central systems are
inoperable.
Equipment inventory
The Test Site at the Central Office is responsible for disaster recovery for the
following hardware and software systems at pilot Sites.









SAP Servers
Hardware, software, and data.
Goods inventory system hardware, software, and data.
Personnel responsible for hardware, software, and data tapes.
PIFRA electronic mail hardware and software.
WAN hardware, software, configurations, and lines for systems within.
Central Office LAN hardware, software, configurations, data, and wiring.
UPS equipment located at the Test Site Office.
Key client workstations at the Test Site Office.
Disaster Recovery Phases:
Mission Critical Elements and Applications
1. SAP Database Servers
2. PIFRA Information System
3. Local Area Network
4. Electronic Mail System
5. Inventory System
6. LAN/WAN
Systems Required To Restore Mission Critical Services
1. SAP Servers - Development server (based on multiprocessor - 512MB
memory with RAID Configuration)
2. Quality Assurance Server (8GB RAM, 120GB HD)
3. Production Server
4. Ethernet network
5. Intel based LAN file server with 128MB memory and 40GB disk storage
6. Test Site UPS (Server) – 2.2KVA
7. Intel based print server with 128RAM and 20GB hard disk storage
8. Blank tapes for backups
9. Data circuits between Pilot Site locations
Equipment and Space
Physical Space for Restoring Institutional Systems
If the current office space were unusable, new office space meeting the following
requirements would be necessary:





Minimum of 180 square feet of floor space
Temperature between 60 and 75 degrees Fahrenheit
Humidity - noncondensing
Ethernet or Radio Link/DXX access to backbone
Wiring capable of supporting Ethernet LAN would be necessary to support
new Central Office’s LAN
Equipment
PIFRA and SAP servers would need to be acquired from the supplier. The LAN server could
be any Intel based PC of sufficient size. Access to 2 ports on a link integrity hub
would be required for the PIFRA and electronic mail servers. A 48-port link integrity
switch would be required for restoring the new Central Office LAN.
Personnel
All System administration staff would be required to restore mission critical systems
to operation. Key members of other areas including the Functional Consultants, and
members of their staffs, would be required to assist in restoration of critical systems.
Record Storage
Offsite backup tapes are stored in a safe deposit box at the ITS Office of Siemens. Each
Monday the backups for the previous Friday are moved to this site and the previously
stored tapes are returned. The offsite tapes are accessible 24 hours a day. The System
Administrator has access to this box. The box also includes a copy of this plan, as
well as brief documentation on PIFRA and Inventory.
SAP Servers recovery includes the time needed to:




Find the problem
Repair the damage
Restore the System
Bring the system online for all users
Factors involved in a disaster recovery plan:





Business process cost of downtime to recover
Operational schedule
Global or local users
Number of transactions an hour
Budget
Steps involved in Disaster recovery plan:
A disaster recovery plan, sometimes referred to as a business continuity plan (BCP)
or business process contingency plan (BPCP), describes how the PIFRA System
Administrator is to deal with potential disasters. Just as a disaster is an event that
makes the continuation of normal functions impossible, a disaster recovery plan
consists of the precautions taken so that the effects of a disaster will be minimized
and the System Administration Team will be able to either maintain or quickly resume
mission-critical functions. Typically, disaster recovery planning involves an analysis
of business processes and continuity needs; it may also include a significant focus on
disaster prevention.
Risk Analysis
The first step in drafting a disaster recovery plan is conducting a thorough risk
analysis of the SAP R/3 systems. List all the possible risks that threaten system uptime
and evaluate how imminent they are at the Test Site. Anything that can cause a system
outage is a threat, from relatively common man-made threats like virus attacks and
accidental data deletions to rarer natural threats like floods and fires.
Establish the Budget:
Once the risks are identified, ask what can be done to mitigate them and how
much it will cost.
The result of the risk analysis should be a comprehensive list of possible threats, each
with its corresponding solution and cost.
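One way to keep such a list is a small risk register. The Python sketch below is illustrative only: the `Risk` fields, the 1-to-5 likelihood scale, and every figure in `register` are placeholder assumptions, not values taken from the PIFRA risk analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    threat: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (frequent)
    solution: str
    cost: float       # estimated cost of the solution, placeholder units


def prioritize(risks: List[Risk]) -> List[Risk]:
    """Order risks with the most likely first; cheaper solutions break ties."""
    return sorted(risks, key=lambda r: (-r.likelihood, r.cost))


def total_budget(risks: List[Risk]) -> float:
    """Sum the cost of every listed solution."""
    return sum(r.cost for r in risks)


# Placeholder entries based on the threat types named in the text.
register = [
    Risk("Virus attack", 4, "Antivirus software and patching", 500.0),
    Risk("Accidental data deletion", 4, "Daily backups", 300.0),
    Risk("Flood", 1, "Offsite backup storage", 1200.0),
    Risk("Fire", 1, "Duplicate systems at a recovery facility", 5000.0),
]

for risk in prioritize(register):
    print(f"{risk.threat}: {risk.solution} ({risk.cost})")
print("Total:", total_budget(register))
```

Sorting by likelihood first reflects the instruction to evaluate how imminent each threat is; a real register might also weigh the impact of each threat.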
Develop the Plan:
The recovery procedure should be written in a detailed plan or "script." Establish a
Recovery Team from among the System Administration Team and assign specific
recovery duties to each member.
Define how to deal with the loss of various aspects of the network (databases,
servers, bridges/routers, communications links, etc.) and specify who arranges for
repairs or reconstruction and how the data recovery process occurs. The script will
also outline priorities for the recovery: What needs to be recovered first? What is the
communication procedure for the initial respondents? To complement the script,
create a checklist or test procedure to verify that everything is back to normal once
repairs and data recovery have taken place.
Test, Test, Test:
Once the Disaster Recovery Plan is set, test it frequently. Eventually you'll need to
perform a component-level restoration of the largest databases to get a realistic
assessment of the recovery procedure, but a periodic walk-through of the procedure
with the Recovery Team will ensure that everyone knows their role. Regularly test the
systems you're going to use in recovery to validate that all the pieces work.
Always record your test results and update the Disaster Recovery Plan to address
any shortcomings.
Update the Disaster Recovery Plan:
It is very important to update the disaster recovery plan from time to time. How often
depends on how rapidly the organization changes with respect to
system architecture, system activities, etc.
Systems/devices to be protected from a disaster at PIFRA:
System
SID
IP Address
Category
Location
SAP Development Level
Development System
Play / Sandbox
Sindh Play System
R3T
PLY
R3D
192.168.1.251
192.168.1.240
C
B
Central Site
Central Site
AGPR s/o Sindh
C
B
C
C
-
Central Site
Siemens ITS
Siemens ITS
ATI, Lahore
FD Baloch
AGPR s/oSindh
AGPR s/oSindh
SAP Quality Assurance Level
Quality Assurance System
Data Migration System I
Data Migration System II
Training System II (Punjab)
Training System I (Baloch)
Training System III (Sindh)
Data Migration III (Baloch)
QAS
DMF
DMH
TR2
TR1
R3Q
UTH
192.168.243
192.168.1.9
192.168.1.212
SAP Production Level
Federal Production
Database Server
Application Server I
Application Server II
NWFP Production
Database Server
Application Server I
Application Server II
Punjab Production
Sindh Production
Balochistan Production
MoF Production
Database Server
Application Server
FD NWFP Production
Database Server
Application Server
FD Punjab Production
Database Server
Application Server
FD Sindh Production
FDN
FDN
192.168.10.151
192.168.10.150
A
C
C
A
B
B
A
B
B
B
B
B
B
FDP
FDP
192.168.18.2
192.180.18.3
B
B
FED
FED
FED
192.168.8.116
192.168.8.114
192.168.8.115
PSH
PSH
PSH
PRP
PRS
PRB
192.168.1.36
192.168.1.102
192.168.1.85
192.168.1.253
192.168.3.126
192.168.1.253
MOF
MOF
AGPR Islamabad
AGPR Islamabad
AGPR Islamabad
Central Site
Central Site
Central Site
Siemens ITS
AGPR s/o Sindh
AGPR s/o Sindh
MoF
MoF
FD NWFP
FD NWFP
FD Punjab
FD Punjab
Database Server                 FDS   192.168.1.79   B   FD Sindh
Application Server              FDS   192.168.1.33   B   FD Sindh
FD Baloch Production
Database Server                 FDB   -              B   Siemens ITS
Application Server              FDB   -              B   Siemens ITS
District Functionality System   ABF   -              C   Siemens ITS
C
Central Site
Central Site
Other Servers
Exchange Server
File Server
Network Devices
Switch 1
Switch 2
Switch 3
Hub 1
Hub 2
PIFRA
SIP
192.168.1.7
192.168.1.250
Brand
Allied
Telesyn
Allied
Telesyn
Allied
Telesyn
3-Com
3-Com
Ports
24 ports
Location
Server Room
24 ports
Server Room
16 ports
Computer Centre
8 ports
12 ports
SDC-124
Computer Centre
LAN Equipment:
- Centre COM 8-Port Hub, 10Base-T/BNC Port
- 24-Port Switchable Ethernet 10Base-T/100Base-TX Switch, Allied Telesyn
- 24-Port Switchable Ethernet 10Base-T/100Base-TX Switch, Allied Telesyn
- 16-Port Fast Ethernet Switch, 10Base-T
WIRELESS NETWORKING
-Wave, Wireless Networking-Speed LAN
Uninterruptible Power Supply (UPS)
- 2.2 KVA UPS for each server (R3T, QAS, Production, Play)
Application/system software inventory
PIFRA Information System:



System Software:
Application Software
Database Software: ORACLE, SAP
Personnel Desktop Publishing:




System Software: Windows 2000,XP
Application Software:
Database Software:
Backups: Incremental (M-TH), Full (F)
PIFRA Data Warehouse:




System Software: Windows 2000 ,XP
Application Software: Client preference
Database Software: SQL Server
Backups: Incremental
Preventive Measures for a System Disaster:




Rapid Recovery – Advanced options for managing data, recovering
quickly and minimizing data loss. This may include duplicate systems
at a Siemens recovery facility that mirror PIFRA's primary systems to
provide fast and current data recovery.
Fixed Site – Quick, complete recovery capability at a specialized, strategically
located site.
Mobile – Self-contained mobile trailers configured to PIFRA's requirements and
delivered to the location of choice.
Managed Delivery – Temporary replacement hardware delivered to the
location of the Pilot Site.
Backup and Recovery Considerations



Define business, operational, and technical requirements for a backup and
recovery strategy
Identify the components of a disaster recovery plan
Discuss the importance of testing a backup and recovery strategy
Database Failures:
There are certain cases in which the database of a system crashes or fails to open.
These database errors can occur due to the following:







Failures caused by user errors (such as logical errors)
Failures due to errors in upgrade
Failures due to errors in data transfer
Limited database free space
Corruption of data files
Data files going offline
Limited space in rollback segments
SAP Recovery Structures and Processes







Identify Oracle processes, file structures, and memory components as they
pertain to backup and recovery
Observe the importance of checkpoints, redo logs, and archives
Identify the process of synchronizing files during a checkpoint
Multiplex control files and redo logs
Oracle Backup and Recovery Configuration







Identify recovery implications of operating in “Noarchivelog” mode
Describe the differences between “Archivelog” mode and “Noarchivelog”
mode
Configure a database for “Archivelog” mode and automatic archiving
Use init.ora parameters to duplex archive log files
Oracle Recovery Manager Overview









Determine when to use RMAN
List the uses of Backup Manager
Identify the advantages of RMAN with and without a recovery catalog
Create a recovery catalog
Connect to Recovery Manager
Oracle Recovery Catalog Maintenance









Use Recovery Manager to register, resynch, and reset a database
Maintain the recovery catalog using change, delete, and catalog commands
Query the recovery catalog to generate reports and lists
Create and execute scripts to perform backup and recovery operations
Create, store, and run scripts
Physical Backups without Oracle Recovery Manager













Perform database backups using operating system commands
Describe the recovery implications of closed and open backups
Perform closed and open database backups
Identify the backup implications of the “Logging” and “Nologging” modes
Identify the different types of control file backups
Discuss backup issues associated with “read only” tablespaces
List the data dictionary views useful for backup operations
Physical Backups Using Oracle Recovery Manager









Identify types of RMAN backups
Describe backup concepts using RMAN
Perform incremental and cumulative backups
Troubleshoot backup problems
View information from the data dictionary
Types of Failures and Troubleshooting








List the types of failure that may occur in an Oracle database environment
Describe the structures for instance and media recovery
Use the DBVERIFY utility to validate the structure of an Oracle database file
Configure checksum operations
Use log and trace files to diagnose backup and recovery problems
Oracle Recovery without Archiving







Note the implications of media failure with a database in noarchivelog mode
Recover a database in noarchivelog mode after media failure
Restore files to a different location if media failure occurs
Recover a database in noarchivelog mode using RMAN
Complete Oracle Recovery with Archiving









Note the implications of instance failure with an archivelog database
Describe a complete recovery operation
Note the advantages and disadvantages of recovering an archivelog
database
Recover an archivelog database after media failure
Recover an archivelog database using RMAN and Backup Manager
Incomplete Oracle Recovery with Archiving









Identify the situations to use an incomplete recovery to recover the system
Perform an incomplete database recovery
Recover after losing current and active logs
Use RMAN in an incomplete recovery
Work with tablespace point-in-time recovery
Oracle Export and Import Utilities

Use the Export utility to create a complete logical backup of a database object

Use the Export utility to create an incremental backup of a database object



Invoke the direct-path method export
Use the Import utility to recover a database object
Additional Oracle Recovery Issues




List methods for minimizing downtime
Diagnose and recover from database corruption errors
Reconstruct a lost or damaged control file
List recovery issues associated with an offline or read-only tablespace
Additional Security Options
Backups:
Backup means making copies of files from one location to another. The source
and the target can be on the same storage device or on different ones. It is best to copy
the files to a storage device other than the source. Data can be restored
from this backup if the original files are damaged or lost.
The backup strategy adopted at PIFRA is designed to minimize the chances of a
system disaster and to recover the system as early as possible if one
happens.
System Backups





Both full and incremental backups
Nightly incremental (M-Th)
Weekly full backups
Weekly backups rotated offsite
Weekly exports of Oracle tables
Restore:
Usually restoration of data is carried out due to the following reasons:



Recover after a system disaster
Test your disaster recovery plan
Copy your database to another system
To keep business transactions running, the restoration procedures should be well
outlined and time-efficient so that the system becomes operational quickly.
RAID Technology:
The basic idea behind RAID (Redundant Array of Independent Disks) is to combine
multiple small, inexpensive disk drives into an array which yields performance
exceeding that of one large and expensive drive. This array of drives will appear to
the computer as a single logical storage unit or drive.
Since PIFRA has large quantities of data to keep, it would benefit from
using RAID technology. One of the primary reasons to use RAID is greater
efficiency in recovering from a disk failure. RAID therefore reduces the chances of a
system disaster.
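How an array can survive a disk failure may be illustrated with XOR parity. The plan does not state which RAID level PIFRA uses, so the Python sketch below assumes a parity-based layout purely for illustration; the function names `parity` and `rebuild` and the sample blocks are hypothetical.

```python
from functools import reduce
from typing import List


def parity(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte to produce a parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks: List[bytes], parity_block: bytes) -> bytes:
    """Recreate the one missing block from the survivors and the parity block."""
    return parity(list(surviving_blocks) + [parity_block])


# Three data "disks" holding one equal-length block each (sample data only).
data = [b"PIFR", b"A-DR", b"PLAN"]
parity_block = parity(data)

# Simulate losing the second disk and rebuilding its block.
recovered = rebuild([data[0], data[2]], parity_block)
print(recovered == data[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block, so a single failed disk can be replaced without restoring from tape.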
ERDs:
An Emergency Repair Disk (ERD) creation procedure is integrated with Microsoft
servers for use in case of registry corruption. The registry is the main database of the
operating system, which holds all the information related to hardware and software
installed on the machine. It is recommended that the ERD be updated frequently.
Backups on Tapes (DDS Tapes)
Data to backup:
SAP R/3 Systems:




Operating System files
SAP application files
Oracle Database
Log files
Other Systems (Domain Servers, File Servers etc):



Operating System files
PIFRA documents (Microsoft Office documents)
Miscellaneous
Types of Backup
A backup on DDS Tapes is taken in either of the two modes:


Offline
Online
Offline
In the case of the SAP R/3 System, an offline backup is taken with the application and
database stopped - that is, the users cannot work.
An offline backup of the complete database gives you a backup of the database
that is consistent. If you work with the database after the backup, the backup is still
consistent, but no longer up to date. In this case, you have to recover the database after
you restore the backup.
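The relationship between restore and recovery can be pictured with a toy Python model. This is only a sketch: the dictionary standing in for the database, the `redo_log` list, and the function name `restore_and_recover` are assumptions for illustration and do not reflect how Oracle or the SAP tools implement recovery.

```python
from typing import Dict, List, Tuple


def restore_and_recover(
    backup: Dict[str, str], redo_log: List[Tuple[str, str]]
) -> Dict[str, str]:
    """Restore a consistent backup, then replay the changes logged after it."""
    database = dict(backup)        # restore: start from a copy of the backup
    for key, value in redo_log:    # recover: apply later changes in order
        database[key] = value
    return database


# State captured by the offline backup (sample data only).
backup = {"invoice_1": "posted"}

# Changes made after the backup was taken, oldest first.
redo_log = [("invoice_2", "posted"), ("invoice_1", "reversed")]

print(restore_and_recover(backup, redo_log))
```

Restoring alone would return only the state held in `backup`; replaying the logged changes is what brings the restored copy up to date.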
Online
Online backup is taken with the application and database running - that is, the users
can continue to work normally. The management of database changes by the
corresponding Oracle background processes is not affected either.
Backup Utilities
Operating System backup utility:
In the case of SAP R/3 Systems, the offline backup is taken while the SAP application and
database are down. Similarly, for systems other than SAP, it is essential that
no users are connected to those systems during an offline backup. Since high capacity
tape drives are now more common, it is simple and safe to back up the entire server.
This full server backup eliminates the possibility of not backing up an important file.
In an offline backup the data in the database does not change while the backup is
being made, which means that you have a static “picture” of the database and do not
have to deal with the issue of data changing while the backup is being run. A full
server offline backup also gives you the most complete backup in the event of a
catastrophic disaster. On one tape, you have everything on the server.
SAP R/3 backup utilities:
SAP R/3 offers the utility programs BRBACKUP, BRARCHIVE and BRRESTORE.
Each of these programs has its own range of functions: backup, archiving of the
redo log files, and restore, respectively.
Backup Schedule to be followed at PIFRA
PIFRA SAP R/3 Development System
1. Monday       Online Backup (Online Tape # 1)
2. Tuesday      ----------
3. Wednesday    Online Backup (Online Tape # 2)
4. Thursday     ----------
5. Friday       Online Backup (Online Tape # 3)
6. Saturday     ----------
7. Sunday       Offline Backup (Offline Tape # 1)
On the next Monday the first tape is reused for online backups. DDS tapes for offline
backups are recycled on every 4th Sunday.
Note: We keep 3 days of Online Backups of the development system and Offline
Backups of 2 Sundays in hand. The first Offline Backup DDS tape is reused on every
4th Sunday.
Offsite backups are to be rotated on a weekly basis.
PIFRA SAP R/3 Quality Assurance System
1. Monday       Online Backup (Online Tape # 1)
2. Tuesday      ----------
3. Wednesday    Online Backup (Online Tape # 2)
4. Thursday     ----------
5. Friday       Online Backup (Online Tape # 3)
6. Saturday     ----------
7. Sunday       Offline Backup (Offline Tape # 1)
On the next Monday the first tape is reused for online backups. DDS tapes for offline
backups are recycled on every 4th Sunday.
Note: We keep 3 days of Online Backups of the Quality Assurance system and Offline
Backups of 2 Sundays in hand. The first Offline Backup DDS tape is reused on every
4th Sunday.
Offsite backups are to be rotated on a weekly basis.
PIFRA SAP R/3 Production Systems:
1. Monday       Online Backup (Online Tape # 1)
2. Tuesday      Online Backup (Online Tape # 2)
3. Wednesday    Online Backup (Online Tape # 3)
4. Thursday     Online Backup (Online Tape # 1)
5. Friday       Online Backup (Online Tape # 2)
6. Saturday     Online Backup (Online Tape # 3)
7. Sunday       Offline Backup (Offline Tape # 1)
8. Monday       Online Backup (Online Tape # 1)
9. Tuesday      Online Backup (Online Tape # 2)
10. Wednesday   Online Backup (Online Tape # 3)
11. Thursday    Online Backup (Online Tape # 1)
12. Friday      Online Backup (Online Tape # 2)
13. Saturday    Online Backup (Online Tape # 3)
14. Sunday      Offline Backup (Offline Tape # 2)
On every 15th day the first DDS tape is reused for online backups.
15. Monday      Online Backup (Online Tape # 1)
16. Tuesday     Online Backup (Online Tape # 2)
17. Wednesday   Online Backup (Online Tape # 3)
18. Thursday    Online Backup (Online Tape # 1)
19. Friday      Online Backup (Online Tape # 2)
20. Saturday    Online Backup (Online Tape # 3)
21. Sunday      Offline Backup (Offline Tape # 3)
22. Monday      Online Backup (Online Tape # 1)
23. Tuesday     Online Backup (Online Tape # 2)
24. Wednesday   Online Backup (Online Tape # 3)
25. Thursday    Online Backup (Online Tape # 1)
26. Friday      Online Backup (Online Tape # 2)
27. Saturday    Online Backup (Online Tape # 3)
28. Sunday      Offline Backup (Offline Tape # 1)
Note: This keeps 3 days of online backups and offline backups of 3 Sundays in hand.
The first DDS tape for offline backups is reused on the 4th Sunday. Offsite backups are
to be rotated on a monthly basis.
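The production rotation can be expressed as a small calculation: three online tapes are used in turn from Monday to Saturday, and three offline tapes are used in turn on successive Sundays. The sketch below reproduces the 28-day schedule for the Production Systems. The function names are hypothetical, the cycle start date is supplied by the caller, and continuing the same pattern beyond day 28 is an assumption.

```python
from datetime import date

ONLINE_TAPES = 3   # online tapes, used in turn Monday to Saturday
OFFLINE_TAPES = 3  # offline tapes, used in turn on successive Sundays


def production_backup(day_number: int) -> tuple[str, int]:
    """Return (backup type, tape number) for a 1-based schedule day.

    Day 1 is a Monday, as in the production schedule.
    """
    if day_number < 1:
        raise ValueError("day_number starts at 1")
    week, weekday = divmod(day_number - 1, 7)  # weekday 0 = Monday, 6 = Sunday
    if weekday == 6:
        return ("offline", week % OFFLINE_TAPES + 1)
    return ("online", weekday % ONLINE_TAPES + 1)


def backup_for_date(day: date, cycle_start: date) -> tuple[str, int]:
    """Map a calendar date onto the schedule; cycle_start must be a Monday."""
    if cycle_start.weekday() != 0:
        raise ValueError("cycle_start must be a Monday")
    return production_backup((day - cycle_start).days + 1)


if __name__ == "__main__":
    for number in range(1, 29):
        kind, tape = production_backup(number)
        print(f"Day {number:2d}: {kind} backup, {kind} tape # {tape}")
```

Printing days 1 to 28 gives a list that can be checked line by line against the schedule before the tapes are labelled.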
Other PIFRA Servers (File Servers, Domain Servers etc):
1. Tuesday      Complete Backup (Tape # 1)
2. Thursday     Complete Backup (Tape # 2)
3. Saturday     Complete Backup (Tape # 3)
Every Tuesday the first tape is reused for backups.
Note: This keeps 2 days of backups of these systems in hand.
Offsite backups are to be rotated on a weekly basis.
Possible Failures at Test Site
The system may fail or crash in various situations. The possible failures are:
Hardware Failures
Network Failures
Operating System Failure
SAP Software/Application Failure
Database Failure
Reasons for Failure
The possible reasons for failure are:
Fire
Theft
Earthquake
Flood
Electric Fluctuations
UPS Failure
Hard Disk problems
Problems in the hardware, e.g. RAM, Motherboard, VGA, Data Buses etc.
Abnormal shutdown of server
Virus Attack
Password Policy
Illegal Operation in the System
Precautions/Safety Measures at PIFRA
To protect the system from a possible crash, a number of measures must be taken into
consideration:

To protect against data loss and system disaster, RAID (software and hardware) is
configured.

For safety, there should be two array controllers, and the SCSI disks should be
distributed equally between them.

Hard disks should be hot-pluggable (i.e. in case of disk failure, the disk can be
replaced while the system is running).

There must be at least two network cards to protect the system against network card
failure.

To protect the system from electrical faults, the LAN setup needs a power system
dedicated solely to it. A separate electrical power system backed by a generator is
therefore a prerequisite.

If the UPS fails to support the servers, the operating system may survive, but in an
extreme case it may crash. In that case the Repair option of the operating system
installation must be run from the operating system CD.

If a hard disk failure occurs, first repair or replace the hard disk.

Do not install unnecessary software on the servers, and view the error logs daily so
that problems can be rectified.

Avoid abnormal shutdown of servers. If it happens, recover the system from the Last
Known Good Configuration settings.

To recover a crashed operating system, Emergency Repair Disks (ERDs) must be
prepared weekly along with the full backup. ERDs also store the registry information
needed for operating system recovery.

If the operating system crashes and a blue screen appears, a fresh installation of
the operating system is recommended on the same drive where the previous operating
system resides, under a new folder name.

Systems should be kept updated with the latest antivirus software in order to be
secure from virus attacks.

To secure the system and protect it from any hazard, the operating system password
must be changed frequently and must be difficult for others to guess. A proper
password policy should be maintained.
Benefits of Preventive Measures
Preventive measures relieve PIFRA of complex business continuity and recovery issues,
so that it can focus on its core business processes. They also:
Provide long-term cost efficiencies with complete project management and
professional task execution.
Ensure a comprehensive business continuity program consistent with PIFRA's policies.
Provide a single point of contact for improved communication and coordination
between client and customer.
WAN Connectivity Problems
Loss of site connectivity to the SAP servers (Dev, QAS, PRO).
Loss of electronic mail.
Loss of internet connectivity.
Central Office LAN Connectivity Problem
Loss of access to word processing software, spreadsheet software, database
software, and other shared software.
Loss of access to personnel data stored on the File & Print Server.
Loss of access to departmental data stored on the server.
Other Important Tasks To Avert Disaster
Performance tuning
Performance should be one of the main concerns during every phase of system
development: application analysis, design, and implementation. In practice, however,
it usually becomes an issue only once it has become a problem, when the system is
already running.
Solving performance problems requires a close look at the whole system to identify
potential bottlenecks.
Performance problems often arise from several different areas, such as:
hardware configuration
software configuration
application design
transaction and query design
Steps To Be Taken
Typically the action consists of the following steps:
Analysis of the overall function of the system
Server analysis: collection of OS and RDBMS statistics
Query analysis: execution plans and concurrency issues
Remote database monitoring and administration
A full-time database administrator is not always required. It can be enough to
monitor performance remotely and to trace potential sources of failure, such as a
lack of system resources. Many database administration services can be performed
remotely, and on-site assistance can be provided on an as-needed basis. For example,
a leased line or dial-up connection at Abbottabad can be used to connect to the Test
Site at Islamabad, so that all the servers at the Test Site can be accessed from
Abbottabad when required.
By combining on-site and online administration, costs can be cut to a minimum while
maintaining the same productivity and reliability of the system.
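One system resource that remote monitoring can watch is free disk space. The following is a minimal sketch using only the Python standard library. The 10% threshold, the checked path and the function names are illustrative assumptions, not values taken from this plan.

```python
import shutil


def is_low_on_space(free: int, total: int, min_free_fraction: float = 0.10) -> bool:
    """Return True when the free share of a volume is below the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return free / total < min_free_fraction


def volume_low_on_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """Check the volume that holds `path` (an illustrative choice of resource)."""
    usage = shutil.disk_usage(path)
    return is_low_on_space(usage.free, usage.total, min_free_fraction)


if __name__ == "__main__":
    # "." is a placeholder; in practice each database and archive volume is checked.
    print("low on space:", volume_low_on_space("."))
```

A check of this kind can be run on a schedule so that a shortage is noticed before it interrupts the database.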
Database audit
Existing database practices and procedures must be reviewed and evaluated.
Special attention is paid to backup and recovery, migration from development to
production, tape handling, upgrades and patch installations. The audit may also cover
any site-specific procedures that may affect data security and integrity.
TIPS & HINTS

Possible disaster: The server room may catch fire, or a natural calamity may occur.
Repercussions: Servers are burnt or become inoperative.
Feasible solutions/tools specific to PIFRA:
A fire alarm system must be installed in the server rooms.
The latest offsite and online backups must be available.
All software (SAP, Oracle, OS etc.) must be available.
Acquire new hardware, install the software and restore the complete data from backups.

Possible disaster: Hardware failure.
Repercussions: The server may not be available for users.
Feasible solutions/tools specific to PIFRA:
To avert hardware failure, RAID technology must be used, as well as dual network cards and SCSI.
Repair or replace the hardware as soon as possible.
Restore the system.

Possible disaster: UPS failure/shutdown.
Repercussions: An abnormal shutdown of the database may occur. The database may become corrupted.
Feasible solutions/tools specific to PIFRA:
The power supply must be secured through generators.
The UPS must be replaced.
Check hardware performance from time to time using tools.
A thorough inspection of the hardware should be made.
Availability of generators is a prerequisite.

Possible disaster: Operating system failure.
Repercussions: The operating system crashes.
Feasible solutions/tools specific to PIFRA:
A fresh installation of the operating system is recommended on the same drive where the previous operating system resides, but under a new folder name.
Use ERDs to recover the operating system.
Possible disaster: Abnormal shutdown of the database.
Repercussions: OS, SAP and database may be corrupted.
Feasible solutions/tools specific to PIFRA:
Servers must be kept under lock and key, in the charge of a responsible person.

Possible disaster: System variables are deleted or corrupted, intentionally or unintentionally.
Repercussions: The server may not be operative for SAP users.
Feasible solutions/tools specific to PIFRA:
Troubleshoot the exact damage (if any) to the system and recover the system.
A secure password policy should be implemented.
The passwords of the OS and sapadm users must be changed frequently.
A soft copy and a hard copy of the environment variables should always be available in case environment/system variables have to be added to the system manually.

Possible disaster: Database failure (Scenario 1).
Repercussions: An error occurred during an upgrade.
Feasible solutions/tools specific to PIFRA:
Restore the last complete backup without control files and online redo log files through SAPDBA.
Recover the database through SAPDBA.

Possible disaster: Database failure (Scenario 2).
Repercussions: A logical error occurred during normal database operations. The SAP System will remain unavailable to users.
Feasible solutions/tools specific to PIFRA:
Restore the last complete backup without control files and online redo log files through SAPDBA.
Import the redo logs and recover the database.

Possible disaster: Database failure (Scenario 3).
Repercussions: A logical error occurred during normal database operation and was only recognized later. The structure of the database was changed between the error and the last complete backup.
Feasible solutions/tools specific to PIFRA:
Restore the last complete backup without control files and online redo log files.
Recover the structural changes (ALTER DATABASE CREATE DATAFILE <filename> AS <filespec>).
Import the redo log files and recover the database.

Possible disaster: Database failure (Scenario 4).
Repercussions: A logical error occurred during normal database operation and was only recognized later. The database was reorganized between the occurrence of the error and its discovery. You want to recover the database to the point in time before the error.
Feasible solutions/tools specific to PIFRA:
SAPDBA restores the last complete backup without control files and online redo log files.
Restore the control files as they were before the reorganization. During the reorganization the control files were backed up in the directory <ORACLE_HOME>/sapreorg/.
Import the redo log files and recover the database (recovery with the option USING BACKUP CONTROLFILE).
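Possible disaster: Database failure (Scenario 5).
Repercussions: Datafile(s) may go offline and become inaccessible to the users.
Feasible solutions/tools specific to PIFRA:
Bring the datafile(s) online using the Server Manager tool.

Possible disaster: Database failure (Scenario 6).
Repercussions: Limited or no space is left in the database and rollback segments.
Feasible solutions/tools specific to PIFRA:
Increase the space of the database by either increasing the size of the datafile or adding a new datafile.

Possible disaster: Database failure (Scenario 7).
Repercussions: The system hangs because archiving is stuck.
Feasible solutions/tools specific to PIFRA:
Remove the old archive logs or back them up to a separate storage device.

Possible disaster: Database failure (Scenario 8).
Repercussions: The database could not be opened and is therefore not accessible to users.
Feasible solutions/tools specific to PIFRA:
Data files may be corrupted; recover the specific datafile.
If required, restore the datafile from a backup and then recover the database.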
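The corrective actions for Scenarios 5, 6 and 8 correspond to standard Oracle statements that can be entered in Server Manager. The sketch below only assembles the statement text. The helper functions are hypothetical and not part of SAPDBA, and the tablespace name, file paths and sizes passed to them are placeholders to be replaced with the values of the affected system.

```python
def _quoted(path: str) -> str:
    """Wrap a file path in single quotes for use inside a statement."""
    if "'" in path:
        raise ValueError("path must not contain a single quote")
    return f"'{path}'"


def bring_online(datafile: str) -> str:
    """Scenario 5: bring an offline datafile back online."""
    return f"ALTER DATABASE DATAFILE {_quoted(datafile)} ONLINE;"


def resize_datafile(datafile: str, size_mb: int) -> str:
    """Scenario 6, first option: increase the size of an existing datafile."""
    return f"ALTER DATABASE DATAFILE {_quoted(datafile)} RESIZE {size_mb}M;"


def add_datafile(tablespace: str, datafile: str, size_mb: int) -> str:
    """Scenario 6, second option: add a new datafile to a tablespace."""
    return f"ALTER TABLESPACE {tablespace} ADD DATAFILE {_quoted(datafile)} SIZE {size_mb}M;"


def recover_datafile(datafile: str) -> str:
    """Scenario 8: media recovery of one specific datafile."""
    return f"RECOVER DATAFILE {_quoted(datafile)};"


if __name__ == "__main__":
    example = "/oracle/SID/sapdata1/example_1/example.data1"  # hypothetical path
    print(recover_datafile(example))
    print(bring_online(example))
    print(add_datafile("EXAMPLE_TS", "/oracle/SID/sapdata2/example_2/example.data2", 500))
```

A datafile that went offline because of a media problem normally has to be recovered before it can be brought online, which is why the usage example prints the recovery statement first.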

Important Instructions:
The System Administration team is assigned the task of recovering the system in case
of a disaster. They should know the following regarding the disaster recovery plan:
Monitor your system closely and efficiently in order to reduce the chances of a system disaster.
Take regular backups of the systems according to the backup schedule.
Take immediate action in case of errors in a backup and resolve the backup errors.
Proper tagging of the backup tapes should be carried out.
The exact location of all the software necessary for recovering a complete system should be known.
The exact location of the onsite backup should be known.
There should be a sufficient number of tapes for backups.
Separate tapes should be kept for offline and online backups.
Weekly rotation of offsite backup tapes should be carried out.
The exact location of the offsite backup should be known.
Backup tapes should be kept in a secure place and should be easily accessible to the System Administrator in case of a system disaster.
In case of a disaster, find the cause of the problem.
Take appropriate action for the problem that occurred; some of these actions have already been mentioned above.
Check the logs and take precautionary action in order to avoid the same disaster in future.
Update the disaster recovery plan as per changes in the system architecture, system activities etc.
Present contact persons and numbers (Siemens Pakistan):
S.No.  Name                 Contact Address                Contact Number            Company
1.     Abdus Samad          PIFRA Central Site, Islamabad  051-9224036, 051-9224056  Siemens Pakistan
2.     Rehan Saleem         PIFRA Central Site, Islamabad  051-9224036, 051-9224056  Siemens Pakistan
3.     Muhammad Ali Shaikh  PIFRA Central Site, Islamabad  051-9224036, 051-9224056  Siemens Pakistan
We suggest that the PIFRA Directorate nominate some PIFRA system administrators to be
part of the PIFRA Disaster Recovery Team.