Download 1 Introduction

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data Protection Act, 2012 wikipedia , lookup

Expense and cost recovery system (ECRS) wikipedia , lookup

Data model wikipedia , lookup

Data analysis wikipedia , lookup

Data center wikipedia , lookup

Information privacy law wikipedia , lookup

Data vault modeling wikipedia , lookup

Business intelligence wikipedia , lookup

3D optical data storage wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

Disk formatting wikipedia , lookup

Transcript
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
SOHO
NETWORKING
Backup
V1.1 17/05/2007
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
V1.1 17/05/2007
Contents
1
2
Introduction ............................................................................................................................. 2
Disaster Examples ................................................................................................................... 2
2.1
Physical damage.............................................................................................................. 2
2.2
Logical damage ............................................................................................................... 3
3
Creating Disaster Recovery Plan ............................................................................................ 3
3.1
Data Loss Prevention ...................................................................................................... 3
3.2
Data Recovery................................................................................................................. 7
-1-
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
V1.1 17/05/2007
Author’s remarks
 Part of the materials in this set of handout is adapted from Wikipedia.
 This set of materials is essentially developed by Chung, C.F. Jeffrey.
1
Introduction
"He didn't backup the data, so our team lost last week's work"
Have you ever heard or experienced of that? Although disaster rarely happens, once it does
the price is always higher than what you thought. Try to image your personal computer suddenly
broke down and you lost all of your teaching materials, PowerPoint and worksheets for which
you have spent weeks or even months to prepare. The trouble is severe. Now imagine the serious
of the trouble if a computer network breaks down.
In order for you and your organization to effectively protect your resources from potential
disaster, you should invest in implementing a disaster recovery plan to minimize data loss and
allow data recovery even disaster happened.
Disaster recovery is the ability of an infrastructure or system to restart operations after a
disaster. Disaster recovery is used both in the context of data loss prevention and data recovery.
There are two primary metrics to demonstrate recoverability following failure:
 Recovery Point Objective (RPO) is the point in time that the restarted infrastructure will
reflect. Essentially, this is the roll-back that will be experienced as a result of the
recovery. Reducing RPO requires increasing synchronicity of data replication.
 Recovery Time Objective (RTO) is the amount of time that will pass before an
infrastructure is available. Reducing RTO requires data to be online and available at a
failover site.
2
Disaster Examples
Before a disaster recovery plan is created, we must understand what kinds of disaster we will
face and how to solve these problems. Here are some examples.
2.1
Physical damage
A wide variety of failures can cause physical damage to storage media. CD-ROMs can have
their metallic substrate or dye layer scratched off; hard disks can suffer any of several
mechanical failures, such as head crashes and failed motors; and tapes can simply break.
Physical damage often causes some data loss, and in many cases damages the logical structures
of the file system. This causes logical damage that must be dealt with before any files can be
used again.
-2-
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
V1.1 17/05/2007
Most physical damages cannot be repaired by end users. For example, opening a hard disk in
a normal environment can allow dust to settle on the surface, causing further damage to the
platters. End users generally do not have the right tools or technical expertise to make these sorts
of repairs.
2.2
Logical damage
Far more common than physical damage is logical damage to a file system. Logical damage is
primarily caused by power outages that prevent file system structures from being completely
written to the storage medium, but problems with hardware and drivers, as well as system
crashes, can have the same effect. The result is that the file system is left in an inconsistent state.
This can cause a variety of problems, such as drives reporting negative amounts of free space,
system crashes, or an actual loss of data. Various programs exist to correct these inconsistencies,
and most operating systems come with at least one rudimentary repair tool for their native file
systems. Linux, for instance, comes with the fsck utility, and Microsoft Windows provides
chkdsk. Third-party utilities are also available, and some can produce superior results by rescuing
data even when the disk cannot be recognized by the operating system’s repair utility.
Two main techniques are used by these repair programs. The first, consistency checking,
involves scanning the logical structure of the disk and checking to make sure that it is consistent
with its specification.
The second technique for file system repair is to assume very little about the state of the file
system to be analyzed and to rebuild the file system from scratch using any hints that any
undamaged file system structures might provide. This strategy involves scanning the entire drive
and making note of all file system structures and possible file boundaries, then trying to match
what was located to the specifications of a working file system. Some third-party programs use
this technique, which is notably slower than consistency checking. It can, however, rescue data
even when the logical structures are almost completely destroyed. This technique generally does
not repair the underlying file system, but merely allows for data to be extracted from it to another
storage device.
3
Creating Disaster Recovery Plan
As mentioned, disaster recovery should include data loss prevention and data recovery. In the
following paragraphs, some important techniques for data loss prevention and data recovery will
be highlighted.
3.1
Data Loss Prevention
Technique 1 – Using Uninterruptible Power Supply (UPS)
At power failures, disk controller may report file system structures have been saved to the
disk when it has not actually occurred. This can often occur if the drive stores data in its write
cache at the point of failure, resulting in a file system in an inconsistent state such that the
-3-
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
V1.1 17/05/2007
journal itself is damaged or incomplete. One solution to this problem is using disk controllers
equipped with a battery backup so that the waiting data can be written when power is restored.
Finally, the entire system can be equipped with a battery backup that may make it possible to
keep the system on in such situations, or at least to give enough time to shut down properly. This
battery backup is called “Uninterruptible Power Supply (UPS)”.
UPS, is a device or system that maintains a continuous supply of electric power to certain
essential equipment that must not be shut down unexpectedly. The equipment is inserted between
a primary power source and the equipment to be protected for the purpose of eliminating the
effects of a temporary power outage and transient anomalies. They are generally associated with
telecommunications equipment, computer systems, and other facilities such as airport landing
systems and air traffic control systems where even brief commercial power interruptions could
cause injuries or fatalities, serious business disruption or data loss.
In order to prevent blackouts, UPS will use a process called load shedding. This reduces the
amount of power being sent to the consumers but does not eliminate it entirely. This drop in
voltage is also sometimes called a voltage sag or a brownout. UPS will also protect equipment
upon the occurrence of a brownout by using its internal batteries to correct the drop in voltage.
The single biggest event that brought attention to the need for UPS power backup units was the
big power blackout of 2003 in the north-eastern US and eastern Canada.
There are nine standard power problems that a UPS may encounter. They are as follows:
1. Power failure.
2. Power sag (under voltage for up to a few seconds).
3. Power surge (over voltage for up to a few seconds).
4. Brownout (long term under voltage for minutes or days).
5. Long term over voltage for minutes or days.
6. Line noise superimposed on the power waveform.
7. Frequency variation of the power waveform.
8. Switching transient (under voltage or over voltage for up to a few nanoseconds).
9. Harmonic multiples of power frequency superimposed on the power waveform.
Technique 2 – Using Redundant Array of Independent Disks (RAID)
RAID is a system of using multiple hard drives for sharing or replicating data among the
drives. Depending on the version chosen, the benefit of RAID is one or more of increased data
integrity, fault-tolerance, throughput or capacity compared to single drives. Its key advantage is
the ability to combine multiple low-cost devices into an array that offered greater capacity,
reliability, or speed, or a combination of these things, than was affordably available in a single
device.
RAID specification suggested a number of prototype “RAID levels”, or combinations of
disks. Each has theoretical advantages and disadvantages. The most common as well as provide
functionality to prevent data loss and fault-tolerance one are RAID 1 and RAID 5. A simple
animation that illustrates how a RAID system delivers its fault-tolerance behavior can be found
at http://www.adtron.com/expertise/activeraid.html.
-4-
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
V1.1 17/05/2007
RAID 1
Figure 1. RAID 1 configuration.
RAID 1 creates an exact copy (or mirror) of a set of data on two or more disks. This is useful
when write performance is more important than minimizing the storage capacity used for
redundancy. The array can only be as big as the smallest member disk, however. A classic RAID
1 mirrored pair contains two identical disks, which increases reliability over a single disk, but it
is possible to have more than two disks. Since each member can be addressed independently if
the other fails, reliability increases with the number of disks in the array. To truly get the full
redundancy benefits of RAID 1, independent disk controllers are recommended, one for each
disk. Some refer to this practice as splitting or duplexing.
When reading, both disks can be accessed independently. Thus the average seek time is
reduced. The transfer rate would be almost doubled as data can be accessed in parallel. In
general, the more disks are used in the array, the better the performance of the disk array. The
only limit is how many disks can be connected to the controller and its maximum transfer speed.
RAID 1 has many administrative advantages. For instance, in some 365*24 environments, it
is possible to “Split the Mirror”: make one disk inactive, do a backup of that disk, and then
“rebuild” the mirror. This requires that the application supports recovery from the image of data
on the disk at the point of the mirror split. This procedure is less critical in the presence of the
“snapshot” feature of some file systems, in which some space is reserved for changes, presenting
a static point-in-time view of the file system. Alternatively, a set of disks can be kept in much the
same way as traditional backup tapes are.
-5-
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
V1.1 17/05/2007
RAID 5
Figure 2. RAID 5 configuration.
RAID 5 uses block-level striping with parity data distributed across all member disks. RAID
5 has achieved popularity due to its low cost of redundancy. Generally RAID-5 is implemented
with hardware support for parity calculations.
Every time a data block is written on a disk in an array, a parity block is generated within the
same stripe. A block is often composed of many consecutive sectors on a disk. A series of blocks
from each of the disks in an array is collectively called a “stripe”. If another block, or some
portion of a block, is written on that same stripe the parity block is recalculated and rewritten.
The disk used for the parity block is staggered from one stripe to the next, hence the term
“distributed parity blocks”. RAID-5 writes are expensive in terms of disk operations and traffic
between the disks and the controller.
The parity blocks are not read on data reads, since this would be unnecessary overhead and
would diminish performance. The parity blocks are read, however, when a read of a data sector
results in a cyclic redundancy check (CRC) error. In this case, the sector in the same relative
position within each of the remaining data blocks in the stripe and within the parity block in the
stripe are used to reconstruct the errant sector. The CRC error is logged down but this would not
hinder the operations of the computer system. Likewise, should a disk fail in the array, the parity
blocks from the surviving disks are combined mathematically with the data blocks from the
surviving disks to reconstruct the data on the failed drive “on the fly”. This is sometimes called
Interim Data Recovery Mode. The operating system knows that a disk drive has failed and it
notifies the administrator that a drive needs replacement; applications running on the computer
are unaware of the failure. Reading and writing to the drive array continues seamlessly, though
with some performance degradation. In RAID 5, where there is a single parity block per stripe,
the failure of a second drive results in total data loss.
The maximum number of drives in a RAID-5 redundancy group is theoretically unlimited, but
it is common practice to limit the number of drives. The tradeoffs of larger redundancy groups
are greater probability of a simultaneous double disk failure, the increased time to rebuild a
redundancy group, and the greater probability of encountering an unrecoverable sector during
RAID reconstruction. In a RAID 5 group, the mean time between failures (MTBF) can become
lower than that of a single disk. This happens when the likelihood of a second disk failing out of
-6-
ASCA & ALCS Curriculum
SOHO Network Management and Security - Backup
V1.1 17/05/2007
(N-1) dependent disks, within the time it takes to detect, replace and recreate a first failed disk,
becomes larger than the likelihood of a single disk failing.
RAID-5 implementations suffer from poor performance when faced with a workload which
includes many writes that are smaller than the capacity of a single stripe because parity must be
updated on each write, requiring read-modify-write sequences for both the data block and the
parity block. More complex implementations often include non-volatile write back cache to
reduce the performance impact of incremental parity updates.
In the event of a system failure while there are active writes, the parity of a stripe may become
inconsistent with the data. If this is not detected and repaired before a disk or block fails, data
loss may ensue as incorrect parity will be used to reconstruct the missing block in that stripe.
This potential vulnerability is sometimes known as the “write hole”. Battery-backed cache and
other techniques are commonly used to reduce the window of vulnerability of this occurring.
3.2
Data Recovery
Technique – Using Backup
Backup means copying of data for the purpose of having an additional copy of an original
source. If the original data is damaged or lost, the data may be copied back from that source, a
process which is known as data recovery or data restore. The “data” in question can be either
data as such, or stored program code, both of which are treated the same by the backup software.
Backups differ from an archive in that the data is necessarily duplicated, instead of simply
moved.
Backups are most often made from hard disk based production systems to large capacity
magnetic tape, hard disk storage, or optical write-once read-many (WORM) disk media like CDR, DVD-R and similar formats. As broadband access becomes more widespread, network and
remote backups are gaining in popularity. There are quite a few companies (found by Google
Keyword Search) offering Internet-based backup. During the period 1975–95, most
personal/home computer users associated backup mostly with copying floppy disks. However,
recent drop in hard disk prices, and its number one position as the most reliable re-writeable
media, makes it one of the most practical backup media.
To plan for Backup, several strategies should be considered:
 A backup should be easy to do.
 A backup should be automated and rely on as little human interaction as possible.
 Backups should be made regularly.
 A backup should rely on standard, well-established formats.
 A backup should not use compression. Uncompressed data is easier to recover if the
backup media is damaged or corrupted.
 A backup should be able to run without interrupting normal work.
 Rely on standard formats.
 If a backup spans multiple volumes, recovery should not rely on all volumes being
readable or present.
 If you use certain medium to do your backup on, you also need to have a drive available
that can read it.
-7-
ASCA & ALCS Curriculum





SOHO Network Management and Security - Backup
V1.1 17/05/2007
Backup media needs to be read from time to time, to make sure the data is still readable.
Also, the data needs to be copied to a new medium if it’s about to disappear. Will you be
able to read a CD-R in 10 years' time?
Each of the different media has benefits and drawbacks. Also consider the cost per
gigabyte when comparing different solutions.
Proper backup relies on at least two copies, stored on different media, kept at different
locations.
In the case of a disaster, no one will be able to think clearly, and act accordingly. For this
reason, checklists need to exist that outline what to do.
Staff needs proper training in what to do in case of disaster occurred.
To perform backup, you should use some backup software to help such as ntbackup in
Windows XP environment (see Figure 3).
Figure 3. ntbackup – a data backup and recovery tool in Windows XP.
To perform network backup, which means backup data to network backup server through the
network, you should set up your client computers and network backup server with proper IP and
subnet mask.
-8-