Download RAID 1

Document related concepts

Data analysis wikipedia , lookup

Data center wikipedia , lookup

Data vault modeling wikipedia , lookup

Expense and cost recovery system (ECRS) wikipedia , lookup

Business intelligence wikipedia , lookup

Information privacy law wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

Object storage wikipedia , lookup

3D optical data storage wikipedia , lookup

Disk formatting wikipedia , lookup

Transcript
Storage Management
W.lilakiatsakun
Storage Technology
• JBOD (Just Bunch Of Disk)
• RAID (Redundant arrays of inexpensive
disks)
• ESS (Enterprise Storage System)
• SSA (Serial Storage Architecture)
JBOD (Just Bunch Of Disk) (1)
JBOD (Just Bunch Of Disk) (2)
• Depending on the Host Bus Adapter a JBOD can be used
•
•
•
as individual disks or any RAID configuration supported
by the HBA.
Concatenation (SPAN)
Concatenation or Spanning of disks is not one of the
numbered RAID levels, but it is a popular method for
combining multiple physical disk drives into a single
virtual disk.
It provides no data redundancy. As the name implies,
disks are merely concatenated together, end to
beginning, so they appear to be a single large disk.
JBOD (Just Bunch Of Disk) (3)
• it consists of an array of independent disks, it
can be thought of as a distant relative of RAID.
Concatenation is sometimes used to turn several
odd-sized drives into one larger useful drive,
which cannot be done with RAID 0.
• For example, JBOD (Just a Bunch Of Disks)
could combine 3 GB, 15 GB, 5.5 GB, and 12 GB
drives into a logical drive at 35.5 GB, which is
often more useful than the individual drives
separately.
Redundant arrays of inexpensive
disks (RAID)
• The organization distributes the data across multiple
smaller disks, offering protection from a crash that
could wipe out all data on a single, shared disk.
• Benefits of RAID include the following
– Increased storage capacity per logical disk volume
– High data transfer or I/O rates that improve information
throughput
– Lower cost per megabyte of storage
RAID0 (stripe set or striped
volume)
• RAID Level 0 splits data evenly
•
•
•
across two or more disks (striped)
with no parity information for
redundancy.
It is important to note that RAID 0
provides zero data redundancy.
RAID 0 is normally used to increase
performance
A RAID0 can be created with disks
of differing sizes, but the storage
space added to the array by each
disk is limited to the size of the
smallest disk
RAID0 – Summary (1)
• RAID 0 uses a very simple design and is
easy to implement with a HUGE
performance advantage.
• I/O performance is greatly improved by
spreading the I/O load across many
channels and drives while the best
performance is achieved when data is
striped across multiple controllers with only
one drive per controller.
RAID0 – Summary (2)
• No parity calculation overhead is involved
• Not a "True" RAID because it is NOT faulttolerant. The failure of just one drive will
result in all data in an array being lost.
RAID1 (mirrorring)
• A RAID 1 creates an exact
•
•
•
copy of a set of data on two or
more disks.
This is useful when read
performance or reliability are
more important than data
storage capacity.
Such an array can only be as
big as the smallest member
disk.
A classic RAID 1 mirrored pair
contains two disks which
increases reliability
RAID1 – Summary (1)
• RAID Level 1 requires a minimum of 2 drives to
•
•
•
•
implement.
For highest performance, the controller must be
able to perform two concurrent separate Reads
per mirrored pair or two duplicate Writes per
mirrored pair.
100 redundancy of data means no rebuild is
necessary in case of a disk failure, just a copy to
the replacement disk.
Transfer rate per block is equal to that of a single
disk.
Simplest RAID storage subsystem design.
RAID1 – Summary (2)
• Highest disk overhead of all RAID types -
inefficient due to the duplication of Write tasks.
• Typically the RAID function is done by system
software, loading the CPU/Server and possibly
degrading throughput at high activity levels.
• Hardware implementation is strongly
recommended.
– May not support hot swap of failed disk when
implemented in "software".
RAID 0 +1 (A Mirror of Stripes)
• RAID Level 0+1 is
implemented as a
mirrored array whose
segments are RAID 0
arrays.
• RAID Level 0+1
requires a minimum
of 4 drives to
implement
RAID 0 +1 – Summary (1)
• RAID 0+1 provides high data transfer performance.
• It also has the same fault tolerance as RAID level 5.
• RAID 0+1 has the same overhead for fault-tolerance as
•
•
mirroring alone.
The high I/O rates are achieved thanks to multiple stripe
segments.
RAID 0+1 provides excellent solution for sites that need
high performance but are not concerned with achieving
maximum reliability.
RAID 0 +1 – Summary (2)
• A single drive failure will cause the whole array
to become a RAID Level 0 array.
It has a high overhead and is very expensive.
•
• All the drives must move in parallel to proper
track, thereby lowering sustained performance.
• It has very limited scalability at a very high
inherent cost.
RAID 10 (A Stripe of Mirrors)
• RAID 10 is
•
implemented as a
striped array whose
segments are RAID 1
arrays.
RAID Level 10
requires a minimum
of 4 drives to
implement.
RAID 10 – Summary (1)
• RAID 10 has as the same fault tolerance as RAID
•
•
•
•
•
level 1 and can achieve the same high I/O rates.
It has the same overhead for fault-tolerance as
mirroring alone.
It provides an excellent solution for sites that
would have otherwise gone with RAID 1 but need
some additional performance boost.
Very expensive with a high overhead.
All drives must move in parallel to proper track
lowering sustained performance.
Plus it has a very limited scalability at a very high
inherently cost.
RAID3 (Parallel access with a
dedicated parity disk)
• RAID Level 3uses byte-level
striping with a dedicated parity
disk.
• This comes about because any
single block of data will be
spread across all members of
the set and will reside in the
same location.
• So, any I/O operation requires
activity on every disk.
RAID3 – Summary (1)
• Level 3 only requires one dedicated disk in the
•
•
•
array to hold parity information.
The server's data is then striped across the
remaining drives, usually one byte at a time.
The parity drive then keeps track of all the info on
the striped drive(s) and uses it to restore info if
the drive should fail.
Because of the parity information that is stored
and because Write operations take place on a
byte level, Read/Write operations often take
longer than other RAID configurations.
RAID3 – Summary (2)
• RAID Level 3 requires a minimum of 3
drives to implement.
• Very high Read data transfer rate.
• Very high Write data transfer rate.
• Disk failure has an insignificant impact on
throughput.
• Low ratio of ECC (Parity) disks to data
disks means high efficiency.
RAID3 – Summary (3)
• Transaction rate equal to that of a single
disk drive at best (if spindles are
synchronized).
• Controller design is fairly complex.
• Very difficult and resource intensive to do
as a "software" RAID because of the parity
generation and checking
RAID5 (Independent access with
distributed parity)
• A RAID 5 uses block-level
striping with parity data
distributed across all member
disks.
• A minimum of 3 disks is
generally required for a complete
RAID 5 configuration.
• In the example, a read request
for block "A1" would be serviced
by disk 0.
• A simultaneous read request for
block B1 would have to wait, but
a read request for B2 could be
serviced concurrently by disk 1
RAID 5 – Summary (1)
• Level 5 also relies on parity information to provide
•
•
redundancy and fault tolerance using independent data
disks with distributed parity blocks.
Each entire data block is written onto a data disk; parity
for blocks in the same rank is generated on Writes,
recorded in a distributed location and checked on Reads.
Compared to RAID 3, RAID 5 uses striping to spread
parity information across multiple drives.
Requirements: RAID Level 5 requires a minimum of 3
drives to implement.
RAID 5 – Summary (2)
• It has the highest Read data transaction rate and
•
•
•
•
with a medium Write data transaction rate.
A low ratio of ECC (Parity) disks to data disks
means high efficiency along with a good
aggregate transfer rate.
Disk failure has a medium impact on throughput.
It also has the most complex controller design.
It's often difficult to rebuild in the event of a disk
failure (as compared to RAID level 1) and
individual block data transfer rate same as single
disk.
SSA (Serial Storage Architecture)
(1)
• Serial Storage Architecture (SSA) defines a high•
•
•
•
•
performance serial link for the attachment of input/output
devices.
It has been optimized for storage applications such as
hard disk drives, host adapter cards, and array controllers.
SSA has many advantages over existing parallel interfaces
such as the Small Computer Systems Interface (SCSI-2).
It uses compact cables and connectors, and it has better
performance, connectivity, and reliability.
However, to facilitate migration, SSA retains much of the
SCSI-2 logical protocol.
Current SSA implementations such as the IBM 7133
SSA (Serial Storage Architecture)
(2)
• Disk Subsystem provide a peak data rate of 20
MB/s in each direction.
• However, a typical loop configuration with one
host adapter can provide a total sustained
bandwidth of up to 73 MB/s, and higher speeds
are becoming available.
• The physical medium is usually a copper cable
up to 20 meters long, but fiber optics can also
be used for longer distances.
SSA (Serial Storage Architecture)
(3)
SSA (Serial Storage Architecture)
(4)
• Architecture overview
• SSA is defined in three layers:
• SSA-PH1 defines the electrical specifications,
•
•
cables, and connectors.
SSA-TL1 is a general-purpose transport layer. It
defines the transmission protocol, configuration,
and error recovery.
SSA-S2P is a mapping of the SCSI-2 queuing
model, command set, status, and sense bytes.
Storage Model
Storage Area Network
• The Storage Network Industry Association (SNIA)
defines the SAN as a network whose primary
purpose is the transfer of data between computer
systems and storage elements.
• A SAN consists of a communication infrastructure,
which provides physical connections; and a
management layer, which organizes the
connections, storage elements, and computer
systems so that data transfer is secure and robust.
SAN ‘s definition
• A SAN is a specialized, high-speed network
attaching servers and storage devices
• It is sometimes referred to as “the network
behind the servers.”
• A SAN introduces the flexibility of networking
to enable one server or many heterogeneous
servers to share a common storage utility,
which may comprise many storage devices,
including disk, tape, and optical storage.
SAN Component
• SAN Connectivity
– the connectivity of storage and server components
typically using Fibre Channel (FC).
• SAN Storage
– TAPE /RAID /JBOD (Just Bunch of Disk) /SSA
(Serial Storage Architecture)
• SAN Server
– Windows /Unix /Linux and etc
Switched Fabric
• An infrastructure specially designed to handle
storage communications called a fabric.
• A typical Fibre Channel SAN fabric is made up
of a number of Fibre Channel switches.
• Today, all major SAN equipment vendors also
offer some form of Fibre Channel routing
solution, and these bring substantial scalability
benefits to the SAN architecture by allowing
data to cross between different fabrics without
merging them.
Fiber Channel protocol
• Fibre Channel is a layered protocol. It consists of 5 layers,
•
•
•
•
•
namely:
FC0 The physical layer, which includes cables, fiber optics,
connectors, pinouts etc.
FC1 The data link layer, which implements the 8b/10b
encoding and decoding of signals.
FC2 The network layer, defined by the FC-PI-2 standard,
consists of the core of Fibre Channel, and defines the main
protocols.
FC3 The common services layer, a thin layer that could
eventually implement functions like encryption or RAID.
FC4 The Protocol Mapping layer. Layer in which other
protocols, such as SCSI, are encapsulated into an information
unit for delivery to FC2.
IP Storage Networking
• FCIP (Fiber Channel over IP)
– It is a method for allowing the transmission of Fibre
Channel information to be tunneled through the IP
network.
• iFCP (Internet Fiber Channel Protocol)
– It is a mechanism for transmitting data to and from
Fibre Channel storage devices in a SAN, or on the
Internet using TCP/IP
• Internet SCSI (iSCSI)
– It is a transport protocol that carries SCSI
commands from an initiator to a target.
FCIP (Fiber Channel over IP)
• FCIP encapsulates FC frames within TCP/IP, allowing
islands of FC SANs to be interconnected over an IPbased network
• TCP/IP is used as the underlying transport to provide
congestion control and in-order delivery FC Frames
• All classes of FC frames are treated the same as
datagrams
• End-station addressing, address resolution, message
routing, and other elements of the FC network
architecture remain unchanged
iFCP
• iFCP is a gateway-to-gateway protocol for
•
•
•
implementing a fibre channel fabric over a TCP/IP
Traffic between fibre channel devices is routed and
switched by TCP/IP network
The iFCP layer maps Fibre Channel frames to a
predetermined TCP connection for transport
FC messaging and routing services are terminated at
the gateways so the fabrics are not merged to one
another
iSCSI
• iSCSI is a SCSI transport protocol for mapping of
block-oriented storage data over TCP/IP networks
• The iSCSI protocol enables universal access to
storage devices and Storage Area Networks (SANs)
over standard TCP/IP networks
Storage Management
• Monitoring disk use
– Disk monitor agent scans the server volumes to
collect disk use information
• Hierarchical storage management
– Files will be archived according to certain criteria
• Prevention against Data Loss
– To protect and recovery from loss
• Outsourcing storage management
Monitoring disk use
• One or more the following categories of information
can be collected
– Volumes: Date and time data was collected, server name,
volumes scanned, capacity, total space used and available
– Directories: Date and time data was collected, server
volume and directory names, creation date and time, file
count directory size (in bytes), owner name, groups to
which owner is a member
– Directory and file owners: Date and time data was
collected, server and volume names, groups to which
owner is a member, total number of files, total space used
Hierarchical storage
management
• When disk space becomes exhausted , data files
•
•
need to be backup (as archived file or back up tape)
With the right tools, user are assured of having
enough disk space to accommodate new files
When a file system reaches a predefined threshold of
X percent full,
– automated procedure are initiated that determine which
files are eligible for archive and are currently backed up
– The file catalog is then updated to indicate that files have
been archived and deletes them from the disk file system
Prevention against data loss
(1/2)
• Backups sent off-site in regular intervals
– Includes software as well as all data information,
to facilitate recovery
• Create an insurance copy on Microfilm or
similar and store the records off-site.
– Use a Remote backup facility if possible to
minimize data loss
• Storage Area Networks (SANs) over multiple
sites make data immediately available without
the need to recover or synchronize it
Prevention against data loss
(2/2)
• Surge Protectors — to minimize the effect of
power surges on delicate electronic equipment
• Uninterruptible Power Supply (UPS) and/or
Backup Generator
• Fire Preventions — more alarms, accessible
extinguishers
• Anti-virus software and other security
measures
Techniques and technology
• Mirroring
– Disk mirroring : Redundant arrays of inexpensive disks 1
(RAID1)
– Server mirroring: web / ftp /email
• RAID : RAID0 – 6 and combination
• On-site data storage
– Back up - Tape / optical disk
• Off-site data storage (backup-site)
– Cold sites
– Warm sites
– Hot site
Mirroring
• Mirroring can occur locally or remotely.
– Locally means that a server has a second hard drive that
stores data.
– A remote mirror means that a remote server contains an
exact duplicate of the data. The second drive is called a
mirrored drive.
• Data is written to the original drive when a write
request is issued and then copied to the mirrored
drive, providing a mirror image of the primary drive.
• If one of the hard drives fails, all data is protected
from loss.
Disk mirroring (RAID1)
• The replication of logical
disk volumes onto separate
physical hard disks in real
time to ensure continuous
availability, currency and
accuracy.
• A mirrored volume is a
complete logical
representation of separate
volume copies
Server mirroring
• Mirror sites are most commonly used to provide multiple
•
sources of the same information, and are of particular value
as a way of providing reliable access to large downloads.
Web server
– To preserve a website or page, especially when it is closed or is about
to be closed
– Load balancing
• Email server
– To protect loss of email information
• ftp server
– To allow faster downloads for users at a specific geographical location
– Load balancing
Back up site
• A backup site is a location where a business can
easily relocate following a disaster, such as fire,
flood, or terrorist threat. This is an integral part of
the disaster recovery plan of a business.
• A backup site can be another location operated by
the business, or contracted via a company that
specializes in disaster recovery services.
• In some cases, a business will have an agreement
with a second business to operate a joint disaster
recovery facility.
Cold Sites
• A cold site is the most inexpensive type of backup
•
•
•
site for a business to operate.
It provides office spaces to operate
It does not include backed up copies of data and
information from the original location of the business,
nor does it include hardware already set up.
The lack of hardware contributes to the minimal
startup costs of the cold site, but requires additional
time following the disaster to have the operation
running at a capacity close to that prior to the
disaster.
Warm Sites
• A warm site is a location where the business
can relocate to after the disaster that is
already stocked with computer hardware
similar to that of the original site, but does
not contain backed up copies of data and
information.
Hot Sites
• A hot site is a duplicate of the original site of the
business, with full computer systems as well as nearcomplete backups of user data.
• Ideally, a hot site will be up and running within a
matter of hours. This type of backup site is the most
expensive to operate.
• Hot sites are popular with stock exchanges and other
financial institutions who may need to evacuate due
to potential bomb threats and must resume normal
operations as soon as possible.
How to choose
• Choosing the type is mainly decided by a
company's cost vs. benefit strategy.
• Hot sites are traditionally more expensive than
cold sites since much of the equipment the
company needs has already been purchased
and thus the operational costs are higher.
• However if the same company loses a
substantial amount of revenue for each day
they are inactive then it may be worth the
cost.
• The advantages of a cold site are simple--cost.
It requires much fewer resources to operate a
cold site because no equipment has been
bought prior to the disaster.
• The downside with a cold site is the potential
cost that must be incurred in order to make
the cold site effective.
• The costs of purchasing equipment on very
short notice may be higher and the disaster
may make the equipment difficult to obtain.
Disaster Recovery Planning
(DRP)
W.lilakiatsakun
Disaster Recovery Planning
(DRP)
• DRP is the process of regaining access to the
data, hardware and software necessary to
resume critical business operations after a
natural or human-induced disaster.
• DRP is part of a larger process known as
business continuity planning (BCP).
• Disaster recovery is the process by which you
resume business after a disruptive event.
What is the difference DRP and
BCP (1/2)
• The event might be
– something huge-like an earthquake or the
terrorist attacks on the World Trade Center
– something small, like malfunctioning software
caused by a computer virus.
• Many business executives are prone to
ignoring "disaster recovery" because disaster
seems an unlikely event.
What is the difference DRP and
BCP (2/2)
• "Business continuity planning" suggests a more
comprehensive approach to making sure you can
keep making money.
• Often, the two terms are married under the acronym
BC/DR.
• DR and/or BC determines how a company will keep
functioning after a disruptive event until its normal
facilities are restored.
What do these plans include
(1/2)
• All BC/DR plans need to encompass
– How employees will communicate
– Where they will go
– How they will keep doing their jobs.
• The details can vary greatly, depending on the
size and scope of a company and the way it
does business.
What do these plans include
(2/2)
• For example :The plan at one global
manufacturing company
– restore critical mainframes with vital data at a
backup site within four to six days of a disruptive
event,
– obtain a mobile PBX unit with 3000 telephones
within two days
– recover the company's 1000-plus LANs in order of
business need
– set up a temporary call center for 100 agents at a
nearby training facility.
Events that necessitate disaster
recovery
•
•
•
•
•
•
•
•
•
•
Natural disasters
Fire
Power failure
Terrorist attacks
Organized or deliberate disruptions
Theft
System and/or equipment failures
Human error
Computer viruses
Testing
Discovery Planning steps (1)
• I. Information Gathering
• Step One - Organize the Project
– Appoint coordinator/project leader, if the leader
is not the dean or chairperson.
– Determine most appropriate plan organization
for the unit (e.g., single plan at college level or
individual plans at unit level)
– Set project timetable
– Draft project plan, including assignment of task
responsibilities
Discovery Planning steps (2)
• Step Two – Conduct Business Impact Analysis
• In order to complete the business impact analysis,
most units will perform the following steps:
– Identify functions, processes and systems
– Interview information systems support personnel
– Interview business unit personnel
– Analyze results to determine critical systems, applications
and business processes
– Prepare impact analysis of interruption on critical systems
Discovery Planning steps (3)
• Step Three – Conduct Risk Assessment
• The risk assessment will assist in determining the
probability of a critical system becoming severely
disrupted and documenting the acceptability of these
risks to a unit.
– Review physical security (e.g. secure office, building access
off hours, etc.)
– Review backup systems
– Review data security
Discovery Planning steps (3/1)
– Review policies on personnel termination and
transfer
– Identify systems supporting mission critical
functions
– Identify vulnerabilities (Such as flood, tornado,
physical attacks, etc.)
– Assess probability of system failure or disruption
– Prepare risk and security analysis
Discovery Planning steps (4/1)
• Step Four - Develop Strategic Outline for
Recovery
1 Assemble groups as appropriate for:
– Hardware and operating systems
– Communications
– Applications
– Facilities
– Other critical functions and business processes as
identified in the Business Impact Analysis
Discovery Planning steps (4/2)
• For each system/process above quantify the
following processing requirements:
– Light, normal and heavy processing days
– Transaction volumes
• Dollar volume (if any)
• Estimated processing time
• Allowable delay (days, hours, minutes, etc.)
Discovery Planning steps (4/3)
3 Detail all the steps in your workflow for each critical
business function (e.g., for student payroll
processing each step that must be complete and
the order in which to complete them.)
4 Identify systems and applications
–
–
–
–
Component name and technical id (if any)
Type (online, batch process, script)
Frequency
Run time
–
Allowable delay (days, hours, minutes, etc.)
Discovery Planning steps (4/4)
• Identify vital records (e.g., libraries,
processing schedules, procedures, research,
advising records, etc.)
– Name and description
– Type (e.g., backup, original, master, history, etc.)
– Where are they stored
– Source of item or record
– Can the record be easily replaced from another
source (e.g., reference materials)
Discovery Planning steps (4/5)
– Backup
• Backup generation frequency
• Number of backup generations available onsite
• Number of backup generations available off-site
• Location of backups
• Media type
• Retention period
• Rotation cycle
• Who is authorized to retrieve the backups?
Discovery Planning steps (4/6)
6 Identify if a severe disruption occurred what would
be the minimum requirements/replacement needs
to perform the critical function during the
disruption.
– Type (e.g. server hardware, software, research materials,
etc.)
– Item name and description
– Quantity required
– Location of inventory, alternative, or offsite storage
– Vendor/supplier
Discovery Planning steps (4/7)
7 Identify if alternate methods of processing either
exist or could be developed, quantifying where
possible, impact on processing. (Include manual
processes.)
8 Identify person(s) who supports the system or
application
9 Identify primary person to contact if system or
application cannot function as normal
10 Identify secondary person to contact if system or
application cannot function as normal
Discovery Planning steps (4/8)
11 Identify all vendors associated with the system or
application
12 Document unit strategy during recovery
(conceptually how will the unit function?)
13 Quantify resources required for recovery, by time
frame (e.g., 1 pc per day, 3 people per hour, etc.)
14 Develop and document recovery strategy, including:
– Priorities for recovering system/function components
– Recovery schedule
Form – critical system processing requirement for recovery
Discovery Planning steps (5)
• Step Five – Review Onsite and Offsite Backup and
•
•
•
•
•
Recovery Procedures
The planning team as identified in Step 1 Task 3
would normally perform this task.
Review current records (OS, Code, System
Instructions, documented processes, etc.) requiring
protection
Review current offsite storage facility or arrange for
one
Review backup and offsite storage policy or create
one
Present to unit leader for approval
Discovery Planning steps (6)
• Step Six – Select Alternate Facility
• ALTERNATE SITE: A location, other than the normal
facility, used to process data and/or conduct critical
business functions in the event of a disaster.
– Determine resource requirements
– Assess platform uniqueness of unit systems (e.g.,
MacIntosh, IBM Compatible, Oracle database, Windows
3.1, etc.)
– Identify alternative facilities
– Review cost/benefit
– Evaluate and make recommendation
Discovery Planning steps (7/1)
II. Plan Development and Testing
• Step Seven – Develop Recovery Plan
• This step would ordinarily be completed by the
coordinator/Project Manager working with the
planning team.
• Sample Plan Outline
Discovery Planning steps (7/2)
1 Objective
2 Plan Assumptions
3 Criteria for invoking the plan
– Document emergency response procedures to occur during
and after an emergency
– Document procedures for assessment and declaring a state
of emergency
– Document notification procedures for alerting unit and
university officials
– Document notification procedures for alerting vendors
– Document notification procedures for alerting unit staff and
notifying of alternate work procedures or locations.
Discovery Planning steps (7/3)
4 Roles Responsibilities and Authority
– Identify unit personnel
– Recovery team description and charge
– Recovery team staffing
– Transportation schedules for media and teams
Discovery Planning steps (7/4)
5 Procedures for operating in contingency mode
– Process descriptions
– Minimum processing requirements
– Determine categories for vital records
– identify location of vital records
– Identify forms requirements
– Document critical forms
– Establish equipment descriptions
– Document equipment - in the recovery site
– Document equipment - in the unit
Discovery Planning steps (7/4)
–
–
–
–
–
–
–
–
–
–
Software descriptions
Software used in recovery
Software used in production
Produce logical drawings of communication and data
networks in the unit
Produce logical drawings of communication and data
networks during recovery
Vendor list
Review vendor restrictions
Miscellaneous inventory
Communication needs - production
Communication needs - in the recovery site
Discovery Planning steps (7/5)
6 Resource plan for operating in contingency mode
7 Criteria for returning to normal operating mode
8 Procedures for returning to normal operating mode
9 Procedures for recovering lost or damaged data
10 Testing and Training
– Document Testing Dates
– Complete disaster/disruption scenarios
– Develop action plans for each scenario
• Sample Testing Diagram
Discovery Planning steps (7/6)
11 Plan Maintenance
– Document Maintenance Review Schedule (yearly,
quarterly, etc.)
– Maintenance Review action plans
– Maintenance Review recovery teams
– Maintenance Review team activities
– Maintenance Review/revise tasks
– Maintenance Review/revise documentation
Discovery Planning steps (7/7)
12 Appendices for Inclusion
– inventory and report forms
– maintenance forms
– hardware lists and serial numbers
– software lists and license numbers
– contact list for vendors
– contact list for staff with home and work numbers
Discovery Planning steps (7/8)
– contact list for other interfacing departments
– network schematic diagrams
– equipment room floor grid diagrams
– contract and maintenance agreements
– special operating instructions for sensitive
equipment
– cellular telephone inventory and agreements
Discovery Planning steps (8)
Step Eight - Test the Plan
1 Develop test strategy
2 Develop test plans
3 Conduct tests
4 Modify the plan as necessary
• Samples
• Test Plan Strategy
• Test Plan Scenario
• Test Results/Test Evaluation
Discovery Planning steps (9)
• III. Ongoing Maintenance
• Step Nine - Maintain the Plan
• Dean/Director/Unit Administrator will be responsible
for overseeing this.
1 Review changes in the environment, technology,
and procedures
2 Develop maintenance triggers and procedures
3 Submit changes for systems development
procedures
4 Modify unit change management procedures
5 Produce plan updates and distribute
Discovery Planning steps (10)
Step Ten – Perform Periodic Audit
1 Establish periodic review and update
procedures
Important factors (1/3)
• Communication
– Personnel — notify all key personnel of the
problem and assign them tasks focused toward the
recovery plan.
– Customers — notifying clients about the problem
minimizes panic.
• Recall backups
– If backup tapes are taken offsite, these need to be
recalled. If using remote backup services, a
network connection to the remote backup location
(or the Internet) will be required.
Important factors (2/3)
• Facilities
– having backup hot sites or cold sites for larger companies.
– Mobile recovery facilities are also available from many
suppliers.
• Prepare your employees
– during a disaster, employees are required to work longer,
more stressful hours, and a support system should be in
place to alleviate some of the stress.
– Prepare them ahead of time to ensure that work runs
smoothly.
Important factors (3/3)
• Business information
– backups should be stored in a completely separate
location from the company
• Testing the plan
– provisions, directions, frequency for testing the
plan should be stipulated.