United Nations Regional Seminar on Census
Data Archiving for Africa
Addis Ababa, Ethiopia, 20-23 September 2011
Session 7 – Data storage, maintenance and
security
Presented by: Ayenika Godheart Mbiydzenyuy
African Centre for Statistics (ACS)
UNECA
Free Powerpoint Templates
Page 1
Presentation Plan
1. Introduction
2. Strategies for data storage
3. Institutional back-up policy
4. Procedures to safeguard the security of the data
5. Procedures for data transmission and encryption of the data
6. Conclusion
1. Introduction
Data storage is the holding of data in an electromagnetic form for
access by a computer processor. There are two main kinds of
storage:
(i) Primary storage is data that is held in random access memory
(RAM) and other memory devices that are built into computers and
(ii) Secondary storage is data that is stored on external storage
devices such as hard disks, tapes and CDs.
•Data being stored is maintained by adding, deleting, changing and
updating binary as well as high-level files.
•Data is maintained manually and/or through automated programs,
but at both its point of origin and its point of delivery the data must be
translated into a binary representation for storage.
•Data is usually edited at a slightly higher level in a format relevant to
the content of the data (such as text, images, or scientific or financial
information).
•The data being maintained must be secured to protect it from
corruption while permitting access in a suitable and controlled
manner.
2. Strategies for data storage
• In an information environment, the success of any census archiving
system is tightly coupled to its ability to store and manage
information. Storage systems are a critical part of an NSO's network
infrastructure. With the amount of data growing at an incredible rate
during the census, the storage strategy must keep pace.
• In designing a storage strategy for census data archiving, the choice
of the right technology for the primary storage system must be
guaranteed, as well as a solid backup procedure that ensures sound
system management.
• The Need for Storage:
– A computer's main memory uses Dynamic RAM (DRAM). It stores
data and provides almost instantaneous access to that data, but it is
limited in capacity and its contents are gone once the computer is turned off.
– Permanent storage holds the data and software that must be
preserved even when the computer is powered down. Permanent
storage needs can be immense. The present library of software
applications can easily exceed many gigabytes, and the quantity of
data can range into the terabytes. In designing a data storage
strategy for census archiving in a network, the stakes are
extremely high.
Storage Strategy Design Issues
• A well-constructed storage system should:
– Prevent data loss
– Offer adequate capacity that can easily scale as storage needs grow
– Provide fast access to data without interruptions
– Be prepared for equipment failures
– Use cost-effective technologies
Data Policies
• Not all issues that have an impact on an NSO's data strategy can be
solved with technology. Individuals must follow sound practices with
institutional data.
• Be sure that users place their data within the supported data
structures. Users cannot, for example, store vital data files on the local
drives of their computers if the organization's storage strategy
assumes that all data resides on network servers.
• End users are unlikely to perform frequent backups of their data or
follow other procedures that ensure that institutional data are secure.
• An important part of an NSO's storage strategy should involve the training
of users to ensure the network storage facilities are secure and
well-managed.
• Most backup software products will archive the contents of distributed
computer hard drives; however, few of the other requirements of a
well-managed storage strategy can be met with this approach.
General Considerations
• Several factors come into play when selecting storage options.
– Capacity
– Scalability
– Costs
– Performance
– Reliability
–
– Manageability
– Cost Analysis
– Risk Analysis
– Capacity Planning
3. Institutional back-up policy
• Backing up data is a basic precautionary step that everybody
working with computers should take. Backup copies are an
insurance policy against the possibility of your data being lost,
damaged or destroyed.
• A reliable backup mechanism is indispensable for every
institution engaged in digital preservation. Digital collections
prepared with so much effort and cost, with the aim of long-term
preservation, must be protected from all kinds of
natural or man-made disasters.
• Moreover, the back-up methodology adopted must have
long-term relevance and usage. It should not become obsolete or
redundant after a short period of time, which is a real risk because digital
preservation technology has not yet been standardized or finalized. From
obsolete floppies to CDs and DVDs, and on to tape drives, it has been
changing fast.
Understanding Requirements for a Good Backup Policy
A good backup policy will protect your data from a wide range of
mishaps. The range of events that you should consider when planning
how to back up your data includes:
• Accidental changes to data
• Accidental deletion of data
• Loss of data due to media or software faults
• Virus infections and interference by hackers
• Catastrophic events (fire, flood etc.)
A good backup policy should provide protection against all of these
threats.
Frequency of Backup
• Backups should be made regularly to ensure that they remain
up-to-date.
• The more frequently data is being changed the more frequently
backups should be made.
• If your data is changing significantly every day you should
consider a daily backup, but if you are prepared and can afford
to redo a longer period of work then less frequent backup may
be appropriate.
• As well as backing up frequently, you should keep several
backup copies made at different dates.
• Doing this guards against the danger that your backup copy will
incorporate a recent, but as yet undiscovered, problem from
your working copy.
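The advice to keep several backup copies made at different dates can be sketched as a small rotation routine. This is only an illustration in Python, not a procedure from the seminar: the function name `make_dated_backup`, the timestamped file-name scheme and the `keep` parameter are assumptions chosen for the example.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_dated_backup(source: Path, backup_dir: Path, keep: int = 3,
                      now: Optional[datetime] = None) -> Path:
    """Copy `source` into `backup_dir` under a dated name, then prune old copies.

    Hypothetical helper: the naming scheme and `keep` are illustrative choices.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"{source.stem}.{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps

    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(f"{source.stem}.*{source.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Run daily with `keep=3`, this would retain the three most recent daily copies, so a problem discovered a day or two late can still be undone from an older copy.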
Multiple Backup Copies
A backup copy may suffer the same mishaps as the working copy of
your data, so it is a good idea to spread the risk by maintaining
several backup copies. A minimum of two backup copies should be
maintained in addition to your working copy of the data.
Offsite Backups
More serious events, such as a fire in the office, will destroy both the
working copy of the data and any backup copies stored at the same
location. Some backup copies should be stored 'offsite' (offsite is a
relative term, dependent on the level of protection you want).
Media
Backup copies should be made on new media. Do not continue to use
media once they start to develop faults. In particular, floppy disks are
not a good medium for backup copies. If they are used, they should be
replaced often.
Multiple Formats
Store backup copies in both the software formats that you are using
and in exported formats (many spreadsheet and database packages
can export to delimited text, for example). This will help protect you
from subtle faults that can sometimes develop in complicated data
formats (such as database file formats) that may not become apparent
until after they have been included in both the working copy and the
backup copies.
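As an illustration of storing a backup in an exported format, the sketch below writes a database table to delimited text using Python's standard `sqlite3` and `csv` modules. SQLite is used only as a stand-in; the function name `export_table_to_csv` and the table layout in the usage note are assumptions, not part of the original presentation.

```python
import csv
import sqlite3
from pathlib import Path


def export_table_to_csv(db_path: Path, table: str, out_path: Path) -> int:
    """Write every row of `table` to a comma-delimited file; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        count = 0
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # first line names the columns
            for row in cursor:
                writer.writerow(row)
                count += 1
    finally:
        conn.close()
    return count
```

For a hypothetical table `households`, `export_table_to_csv(Path("census.db"), "households", Path("households.csv"))` would produce a plain-text copy that can be read without the database software.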
Institutional Backup Policy
• Census projects should never assume that their institution's policies
will be appropriate to their needs. Always check.
• Institutions may maintain backups for a limited period only.
• Institutions may only provide backups to protect against complete
loss of data, and not against individual users losing data.
• Institutions may not back up all data held on their network.
• Many organizations advise their users to make their own backups of
critical data. This is good advice and should be followed.
Check Your Backup
• A backup that does not actually work is of no use at all. Always test
your backup procedures to ensure that your backup can be retrieved
and is useable.
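One simple way to test that a backup can be retrieved intact is to restore it to a scratch location and compare checksums with the working copy. The sketch below uses Python's standard `hashlib`; the function names `sha256_of` and `backup_matches` are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

A matching checksum shows only that the restored copy is identical to the original. Opening the restored file in the software that normally uses it is still needed to confirm that it is usable.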
Backup is not Preservation
• A backup copy is an exact copy of the version of the data you
are working on. If your working copy becomes unusable, you
should be able to start using your backup copy immediately, on
the same computers, using the same software.
• In contrast, a preservation version of the data should be
designed to mitigate the effects of rapid technology change
that might otherwise make the data unusable within a few
years.
• Some of the prevalent devices for storing back-up data are hard
drives or disks in a computer, CDs, DVDs, tape drives (DAT tape,
DLT tape), removable disks (Zip and Jaz) and hard copies.
OTHER TYPES OF BACKUP TECHNOLOGY TO CONSIDER:
• VIRTUAL TAPE LIBRARY - A VTL is an archival backup solution that
combines traditional tape backup methodology (software or appliance
based) with low-cost disk technology to create an optimized backup and
recovery solution.
• NEAR-LINE DISK TARGET - A disk array that acts as a target or cache for
tape backup. These arrays typically offer faster backup and recovery times
when compared with tape and are cost effective because they're
increasingly based on low-cost Advanced Technology Attachment (ATA) disk
drives. Unlike virtual tape libraries, however, they typically require
configuration and process changes to existing backup/recovery operations.
• CONTENT-ADDRESSED STORAGE (CAS) - A disk-based storage system
that uses the content of the data as a locator for the information, eliminating
dependence on file system locators or volume/block/device descriptors to
identify and locate specific data.
• MASSIVE ARRAY OF IDLE DISKS (MAID) - A disk system in which disks
spin only when necessary (such as during read/write operations), reducing
total power consumption and enabling massive high-capacity disk systems
with comparable economics to tape libraries.
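The idea behind content-addressed storage, using the content of the data as its locator, can be illustrated with a toy in-memory store. This is a sketch only: the class name `ContentStore` and the choice of SHA-256 as the address function are assumptions, and real CAS products differ in many respects.

```python
import hashlib


class ContentStore:
    """Toy in-memory store: the SHA-256 digest of the content is its address."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)  # identical content is stored once
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read detects silent corruption of the stored bytes.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)
```

Because the address is derived from the content, storing the same file twice yields the same address and occupies space only once, and no file name or device descriptor is needed to find it again.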
• SNAPSHOTS AND INCREMENTAL CAPTURE - A snapshot is a copy of a
volume that is essentially empty but has pointers to existing files. When one
of the files changes, the snap volume creates a copy of the original file just
before the new file is written to disk on the original volume.
• INCREMENTAL CAPTURE - Vendors in this category can replace existing
backup technologies or co-exist with them. Incremental capture solutions
can take snapshots at the block, file, or volume level.
• CONTINUOUS CAPTURE - This segment of the data-protection market
includes software or appliances designed to capture every write made to
primary storage and make a time-stamped copy on a secondary device.
• ARRAY-BASED REPLICATION - These products have been around for a
long time and have traditionally come from large disk-array vendors such as
EMC, Hitachi Data Systems, and IBM. These products run on high-end
arrays and are very robust (and expensive).
• HOST-BASED REPLICATION - Host-based replication software runs on
servers. As writes are made to one array, they are also written to a second
array. Vendors in this category have eliminated many of the complexities in
their products, making them easier to deploy and manage.
• FABRIC-BASED REPLICATION - The new debate raging in the storage
industry revolves around the following question: "Where should storage
services, or applications, reside: on hosts, on arrays, or in the fabric on
switches or appliances?"
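The snapshot behaviour described in this list (an initially empty snap volume that copies an original file just before it is overwritten) is often called copy-on-write. The toy model below illustrates it at file level in Python. The class names `Volume` and `Snapshot` are assumptions, and the model ignores everything a real storage system must handle, such as blocks, concurrency and persistence.

```python
_MISSING = object()  # marks a file that did not exist when the snapshot was taken


class Snapshot:
    """Starts empty; stores an original file only when that file is about to change."""

    def __init__(self, volume):
        self._volume = volume
        self._saved = {}

    def preserve(self, name):
        # Keep only the first pre-change version seen after the snapshot was taken.
        if name not in self._saved:
            self._saved[name] = self._volume.files.get(name, _MISSING)

    def read(self, name):
        # Unchanged files are read through to the live volume (the "pointer").
        if name in self._saved:
            data = self._saved[name]
        else:
            data = self._volume.files.get(name, _MISSING)
        if data is _MISSING:
            raise KeyError(name)
        return data


class Volume:
    """Live volume: a mapping of file names to contents, plus its snapshots."""

    def __init__(self):
        self.files = {}
        self._snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self._snapshots.append(snap)
        return snap

    def write(self, name, data):
        for snap in self._snapshots:
            snap.preserve(name)  # copy the original just before it is overwritten
        self.files[name] = data
```

Taking the snapshot costs almost nothing because no data is copied up front; space is consumed only for files that change afterwards.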
4. Procedures to safeguard the security of the data
Steps to Improving Data Safeguards
Protecting data in dynamic and diverse environments is a formidable challenge.
You need to focus on categorized data inventory, sharing mechanisms, and
leak detection. The challenges of securing data in modern organizations are
vast.
First: Find and Understand Data
To determine how to secure your data, first identify which records warrant
protection, and where they reside. Finding the data typically involves
interviews and the review of existing documentation. Expand your findings
by scanning file servers within your organization for potentially sensitive
records. Budget-strapped? Free data discovery tools that can get you
started with this task include:
• Sensitive data plugins for Nessus
• Spider
• Firefly
• FindSSN and Find_CCN
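To give a feel for what data discovery tools of this kind do, the sketch below scans text for strings that look like identity or payment-card numbers. It does not reproduce the behaviour of any tool named above: the patterns (a US-style `ddd-dd-dddd` number and 13 to 19 digit card numbers checked with the Luhn algorithm) and the function names are assumptions, and a census office would substitute the identifiers relevant to its own records.

```python
import re

# Illustrative patterns only; real discovery tools use far richer rule sets.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list:
    """Return (kind, matched_text) pairs for strings that look sensitive."""
    findings = []
    for m in SSN_PATTERN.finditer(text):
        findings.append(("ssn-like", m.group()))
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):  # discard digit runs that fail the checksum
            findings.append(("card-like", m.group()))
    return findings
```

Applied to each file on a server, a routine like this yields a first inventory of where potentially sensitive records reside, which can then be reviewed by hand.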
Second: Help Users Share and Store Data
• How will people exchange and store data securely? Don't
expend your efforts on security controls without defining how
people will share the sensitive data to get work done.
Third: Detect the Data Leaks, to React Quickly
• Despite your best efforts, sensitive data may get exposed, often
because of an oversight in storing, sharing, or securing them.
Consider how you will detect the leak quickly to minimize the
incident's scope and severity.
• The data discovery process, as well as a security assessment,
can help discover data where they don't belong. In addition,
make use of web search engines to identify potentially
sensitive records accessible to the public over the internet.
• Keep an eye on public data breaches. Knowing what data breaches
have occurred can help you understand the leading causes of the
incidents, so you can adjust your security controls appropriately.
5. Procedures for data transmission and encryption of the data
• Data transmission refers to computer-mediated communication among system users,
and also with other systems. The basic functions of using on-line information systems
are entering data into a computer, displaying data from a computer, and controlling the
sequence of input-output transactions, with guidance for users throughout the
process.
• In considering data transmission functions, we must adopt a broad perspective. Data
that are transmitted via computer may include words and pictures as well as
numbers. And the procedures for data transmission may take somewhat different
forms for different system applications.
• Data might be transmitted by transferring a data file from one user to another,
perhaps with an accompanying message to indicate that such a file transfer has been
initiated.
• In some applications, computer-mediated data transmission may be a discrete,
task-defined activity.
• Effective communication is of critical importance in systems where information
handling requires coordination among groups of people. This will be true whether
communication is mediated by computer or by other means.
• In cryptography, encryption is the process of transforming information
(referred to as plaintext) using an algorithm (called a cipher) to make it
unreadable to anyone except those possessing special knowledge, usually
referred to as a key.
• The result of the process is encrypted information (in cryptography,
referred to as ciphertext). In many contexts, the word encryption also
implicitly refers to the reverse process, decryption (e.g. “software for
encryption” can typically also perform decryption), to make the encrypted
information readable again (i.e. to make it unencrypted).
• Encryption has long been used by militaries and governments to facilitate
secret communication. Encryption is now commonly used in protecting
information within many kinds of civilian systems.
• Encryption, by itself, can protect the confidentiality of messages, but other
techniques are still needed to protect the integrity and authenticity of a
message; for example, verification of a message authentication code (MAC)
or a digital signature. Standards and cryptographic software and hardware
to perform encryption are widely available, but successfully using encryption
to ensure security may be a challenging problem.
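The message authentication code mentioned above can be illustrated with Python's standard `hmac` module. The sketch covers integrity and authenticity only, not encryption, and assumes that sender and receiver already share a secret key; the function names `make_tag` and `verify_tag` are chosen for the example.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender transmits with the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag on receipt and compare it with the one that was sent."""
    # compare_digest runs in constant time to avoid leaking where the tags differ.
    return hmac.compare_digest(make_tag(key, message), tag)
```

If a transmitted data file is altered in transit, or the tag was produced with a different key, `verify_tag` returns `False` and the receiver knows not to trust the file.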
6. Conclusion
• Cloud computing is a technology that uses the internet and central
remote servers to maintain data and applications. Cloud computing
allows consumers and businesses to use applications without
installation and access their personal files at any computer with
internet access. This technology allows for much more efficient
computing by centralizing storage, memory, processing and
bandwidth.
• Cloud computing is essentially all about data: storing data securely,
managing data effectively, accessing data efficiently, integrating data
relative to needs, and using data analytics to improve business
intelligence and enhance business decision-making processes.
This sounds like a mouthful but, if you’re in business and
accumulating data, you’re more than likely doing that kind of
thing already. It’s just a matter of how effectively you’re doing it, and
whether cloud computing can offer you efficiencies of scale and
cost.
THANK YOU