What are the security issues in database management system?
Kristallynn D. Tolentino
MSIT
I. Introduction
“A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model,” as defined by E. F. Codd of IBM's San Jose Research Laboratory.
A relational DBMS is special system software that is used to manage the
organization, storage, access, security and integrity of data. This specialized
software allows application systems to focus on the user interface, data
validation and screen navigation. When there is a need to add, modify, delete or
display data, the application system simply makes a "call" to the RDBMS.
Although there are many different types of database management
systems, relational databases are by far the most common. Other types include
hierarchical databases and network databases.
Although database management systems have been around since the 1960s, relational databases didn't become popular until the 1980s, when the power of the computer skyrocketed and it became feasible to store data in sets of related tables and provide real-time data access.
RDBMSs have become the predominant choice since the 1980s for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data, and much more.
Relational databases have often replaced legacy hierarchical databases and network databases because they are easier to understand and use.
In the relational model, all data must be stored in relations (tables), and
each relation consists of rows and columns. Each relation must have a
header and body. The header is simply the list of columns in the relation. The
body is the set of data that actually populates the relation, organized into
rows. The intersection of one row and one column holds a single value; a complete row of such values is called a tuple.
The second major characteristic of the relational model is the usage of
keys. These are specially designated columns within a relation, used to order
data or relate data to other relations. One of the most important keys is the
primary key, which is used to uniquely identify each row of data. To make
querying for data easier, most relational databases go further and physically
order the data by the primary key. Foreign keys relate data in one relation to
the primary key of another relation.
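As a brief, hedged illustration of these keys (a minimal sketch in standard SQL; the customer and customer_order tables and their columns are hypothetical, not taken from any source cited here):

    CREATE TABLE customer (
        customer_id  INTEGER      NOT NULL PRIMARY KEY,   -- primary key: uniquely identifies each row
        full_name    VARCHAR(100) NOT NULL
    );

    CREATE TABLE customer_order (
        order_id     INTEGER      NOT NULL PRIMARY KEY,
        customer_id  INTEGER      NOT NULL,
        order_date   DATE         NOT NULL,
        -- foreign key: relates each order row to the primary key of the customer relation
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
    );

Each row in customer_order carries the customer_id of the customer it belongs to, which is exactly the foreign-key relationship described above.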
The relational model is concerned with what is required, which
separates it from concerns of how the model will be implemented. The how is
the concern of the relational DBMS. The relational model focuses on
representing data through relationships held between those data items. This
approach has its theoretical basis in set theory and predicate logic - data is
therefore represented in the form of tuples. The data in the relational model is
queried/manipulated using relational algebra or relational calculus.
The Relational DBMS is concerned with how the relational model will
be implemented. The data from the relational data model is represented in the
form of tables and rows. The data is queried using particular query languages, most commonly a language known as SQL (Structured Query Language).
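To make the declarative style concrete, here is a minimal, hypothetical SQL query against the customer and customer_order tables sketched earlier; it states which rows are wanted rather than how the DBMS should retrieve them:

    -- return every order placed by one customer, most recent first
    SELECT o.order_id, o.order_date
    FROM   customer_order AS o
           JOIN customer AS c ON c.customer_id = o.customer_id
    WHERE  c.full_name = 'Juan Dela Cruz'
    ORDER BY o.order_date DESC;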
II. Statement of the Problem
1. DBMS Architecture - “Open Problem” even up to now
1.1 What is DBMS in the Past? In the Present? And in the Future?
1.2 What are the security issues in database management system?
III. Related Literature
A database is generally designed "to operate large quantities of information by inputting, storing, retrieving, and managing that information. Databases are set up so that one set of software programs provides all users with access to all the data." A Database Management System (DBMS) allows users to define, create and maintain a database and provides controlled access for handling data, which includes adding, retrieving and updating data.
It helps the user to manage or manipulate data inside a database or other data storage. There are four (4) main types of Database Management System (DBMS), each with its own category, applicability and usefulness, and the structure of each type of DBMS is different. One of these types is the Relational Database Management System (RDBMS). Based upon the structure of an RDBMS, "the database relationships are treated in the form of a table. There are three keys in a relational DBMS: relation, domain and attributes. A network means it contains the fundamental constructs of sets and records. Sets contain one-to-many relationships, and records contain fields. A statistical table that is composed of rows and columns is used to organize the database and its structure and is actually a two-dimensional array in the computer memory." As has been said, RDBMSs are widely used around the world because they have often replaced legacy hierarchical and network databases and are easier to understand and use.
Nowadays, one of the types of Database Management System
(DBMS) is widely used around the world; it is the Relational Database
Management System (RDBMS).
There are different RDBMS products, such as Oracle, SQL Server, MySQL, Sybase, DB2, and others. The RDBMS architecture is a roughly thirty-year-old architecture running on an old platform. According to Michael Stonebraker, a computer scientist specializing in databases, "Traditional databases are slow not because SQL is slow. It's because of their architecture and the fact that they are running code that is 30 years old."
Here are the problems frequently encountered:
1. Performance level. It has a very poor performance level, which may cause lagging or even cause the database to crash.
2. Bloatware. Since it is an "Elephant" system, bloatware can occur. This affects the usefulness of the software because it consumes excessive disk space and large amounts of memory (RAM).
3. Scalability. The ability of a DBMS to handle big data is also an issue with old DBMSs, especially when handling multiple transactions at the same time.
An RDBMS uses execution plans to evaluate the data that the user requests at run time, and it uses a cost-based algorithm for handling task scheduling. Through this, it supports ad hoc queries over the defined elements of the database.
According to Tony Bain “Even though RDBMS have provided database
users with the best mix of simplicity, robustness, flexibility, performance,
scalability, and compatibility, their performance in each of these areas is
not necessarily better than that of an alternate solution pursuing one of
these benefits in isolation.”
The RDBMS has features to ensure concurrency, integrity, flexibility and other logical and operational qualities, but it still encounters one critical situation after another. After over thirty years of technological innovation, the relational database architecture is becoming obsolete: it is no longer capable of handling big data, scalability, performance, concurrency and the other issues encountered in using this kind of technology. Performance and scalability are important in any DBMS. "There are five factors that influence database performance: workload, throughput, resources, optimization, and contention… database performance can be defined as the optimization of resource use to increase throughput and minimize the contention, enabling the largest possible workload to be processed."
New ideas for database architecture are needed that can keep pace with the innovation of technology. This is demanding, but such designs still need to preserve the database capabilities that help improve performance and support higher scalability of the services provided by a database management system. Improving performance and scalability through modern, innovative software architecture can address the non-functional requirements of a Database Management System by means of a completely different implementation of a new design architecture.
Because the database architecture and its code base are old relative to the present capabilities of fast-changing technology and the service demands of users, running it is like running an "Elephant" system. This affects how fast data can be retrieved, transferred, updated and saved, and how well database or hardware crashes can be avoided when there are simultaneous and continuous transactions over big data. One of the biggest problems in databases is their existing legacy implementation.
Nowadays, there are newly developed database architectures and platforms, both proposed and already in use, that can support Online Transaction Processing (OLTP) or Online Analytical Processing (OLAP). The emphasis of "OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments and an effectiveness measured by number of transactions per second," while OLAP is "a category of software tools that provides analysis of data stored in a database. OLAP tools enable users to analyze different dimensions of multidimensional data. For example, it provides time series and trend analysis views. OLAP often is used in data mining." The old SQL, or old database structure, for OLTP is very slow and does not scale. Finance monitoring, website analytics, online gaming and more are the workloads that new OLTP systems handle, and all of these must respond on a real-time basis; processing data and validating it are all done in real time. Furthermore, it requires support for transactions that can span a network and may include more than one company. That is why the new OLTP software uses client-server processing and brokering software that allows transactions to run on different computer platforms in a network.
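The difference between the two workloads can be sketched with two hedged, hypothetical SQL statements (table and column names are assumptions): an OLTP system runs a very large number of short statements like the first, while an OLAP tool issues large aggregations like the second.

    -- OLTP style: a short transaction touching one or a few rows
    UPDATE account
    SET    balance = balance - 500.00
    WHERE  account_id = 12345;

    -- OLAP style: an analytical query scanning and summarizing many rows
    SELECT region,
           EXTRACT(YEAR FROM sale_date) AS sale_year,
           SUM(amount) AS total_sales
    FROM   sales
    GROUP BY region, EXTRACT(YEAR FROM sale_date)
    ORDER BY region, sale_year;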
At the present time, there are still users and clients that use old database management systems, the legacy RDBMSs, known as "Elephant" systems. Using such a DBMS can run into mediocre performance. Michael Stonebraker once said that these kinds of databases are "slow because they spend all of their time on not useful work" rather than useful work.
One of the solutions for the new era of technology is the development of a new kind of database known as NoSQL (short for "Not Only SQL"). It is "capable of high throughput". NoSQL does not adopt the Atomicity, Consistency, Isolation, Durability (ACID) and Structured Query Language (SQL) features, in order to deliver high-performance DBMS services. Tim Perdue, a Director of Information Technology for a mid-sized tech company in Northern Virginia, said that "The idea is that both technologies can coexist and each has its place. The NoSQL movement has been in the news in the past few years as many of the Web 2.0 leaders have adopted a NoSQL technology. Companies like Facebook, Twitter, Digg, Amazon, LinkedIn and Google all use NoSQL in one way or another." NoSQL emerged from the need to store more data and more interconnected data, and it can handle complex data structures. "Since the rise of the web, the volume of data stored about users, objects, products and events has exploded.
Data is also accessed more frequently, and is processed more
intensively – for example, social networks create hundreds of millions of
customized, real-time activity feeds for users based on their connections'
activities” , by this means NoSQL provides two (2) technical approaches to
9
address shortcomings of scalability and agility challenges these are
manual sharding and distributed cache. There are two recently proposed
NoSQL language standards namely CQL and UnQL.
According to Edmond Lau, “The main problems that a NoSQL aims
to solve typically revolve around issues of scale. When data no longer fits
on a single MySQL server or when a single machine can no longer handle
the query load, some strategy for sharding and replication is required.”
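Sharding itself is usually carried out by application logic or middleware rather than by a single SQL statement, but the underlying idea of splitting one logical table by key can be sketched with single-node partitioning syntax such as MySQL's (shown only as an assumed illustration; real sharding spreads the pieces across separate servers):

    -- MySQL-style hash partitioning: rows are split into 4 partitions by user_id
    CREATE TABLE user_event (
        user_id    INT NOT NULL,
        event_time DATETIME NOT NULL,
        payload    VARCHAR(255)
    )
    PARTITION BY HASH (user_id)
    PARTITIONS 4;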
NoSQL is better suited for Web applications because of its ability to cope with real-time services over the web. But there are still problematic moments that will be encountered when using this kind of database. "These databases
can be resource intensive. They demand higher CPU and RAM allocation
than any relational database… when your website gets a traffic boost, be
prepared to allocate more resources in a hurry. And that’s the problem
with these NoSQL databases. And that’s the reason most shared Web
hosting companies will not offer them to you on a shared hosting account.
You need a Cloud or VPS or a dedicated server.”
Another issue is that when giving up ACID, rolling back data on your own is hard and can be error-prone. The ACID properties are as follows: Atomicity follows an all-or-nothing rule, so the processing of a piece of data either completes fully or does not happen at all. Consistency ensures that only valid data is written. Isolation "ensures that transactions are securely and independently processed at the same time without interference, but it does not ensure the order of transactions." And lastly, Durability assures that committed transactions are performed completely and will survive permanently.
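In SQL these properties are exercised through transactions. The hedged sketch below, using a hypothetical account table, shows the all-or-nothing behavior of atomicity: either both updates are committed or, on error, both are rolled back.

    BEGIN;                                   -- start the transaction

    UPDATE account SET balance = balance - 100.00 WHERE account_id = 1;
    UPDATE account SET balance = balance + 100.00 WHERE account_id = 2;

    -- if both statements succeeded, make the change permanent (durability)
    COMMIT;

    -- if anything failed in between, the application would instead issue:
    -- ROLLBACK;   -- atomicity: neither balance is changed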
One of the enhancements made to overcome the problems with the scalability and performance of databases is a solution that prevents four (4) main overhead problems: online failover, online failback, Local Area Network (LAN) partitioning and Wide Area Network (WAN) partitioning. Addressing these problems also requires a solution for write-ahead logging. One consideration is node speed, so that the system can accommodate greater scale by reducing network traffic and reducing the probability of hardware failures. If the database has main-memory storage, is single-threaded and has durability, it will prevent locking/latching situations and can also avoid the log. NewSQL has this kind of ability. It preserves ACID and SQL but uses a new and different database architecture implemented on a new platform in order to provide performance and scalability. Its architecture provides much higher per-node performance, and it can handle the same workloads as NoSQL. Although NewSQL has these capabilities, it is still new to the industry and needs to be explored further.
Though NewSQL is good and has its own approach to consider, there are still a lot of users of NoSQL, and these users prefer NoSQL over the old SQL and NewSQL. Users support the development of NoSQL even though questions about its maturity remain.
development of NoSQL because of its maturity. “Enterprises want the
reassurance that if a key system fails, they will be able to get timely and
competent support. All RDBMS vendors go to great lengths to provide a
high level of enterprise support… NoSQL databases are becoming an
increasingly important part of the database landscape, and when used
appropriately, can offer real benefits. However, enterprises should
proceed with caution with full awareness of the legitimate limitations and
issues that are associated with these databases. ”
Performance and scalability are two main things that need to be considered in a Database Management System. Data partitioning is still the fundamental issue in high performance database processing. The data itself is getting more complex, including XML-based data, bio-informatics data and data streams. Another thing to consider is database security. According to Amichai Shulman, the co-founder and CTO of Imperva, Inc., there are ten database security threats; these are: "Excessive Privilege
Abuse, Legitimate Privilege Abuse, Privilege Elevation, Database Platform
Vulnerabilities, SQL Injection, Weak Audit Trail, Denial of Service,
Database Communication Protocol Vulnerabilities, Weak Authentication
and Backup Data Exposure”.
Excessive privilege abuse occurs when users (or applications) are granted database access privileges that exceed the requirements of their job function; these privileges may then be abused for malicious purposes. The solution to excessive privileges is query-level access control.
Users may also abuse legitimate database privileges for
unauthorized purposes. Attackers may take advantage of database
platform software vulnerabilities to convert access privileges from those of
an ordinary user to those of an administrator. Vulnerabilities may be found
in stored procedures, built-in functions, protocol implementations, and
even SQL statements. Vulnerabilities in underlying operating systems
(Windows 2000, UNIX, etc.) and additional services installed on a
database server may lead to unauthorized access, data corruption, or
denial of service. In a SQL injection attack, a perpetrator typically inserts
(or “injects”) unauthorized database statements into a vulnerable SQL
data channel. Typically targeted data channels include stored procedures
and Web application input parameters. These injected statements are
then passed to the database where they are executed. Using SQL
injection, attackers may gain unrestricted access to an entire database.
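As an assumed illustration of how the injection works, suppose a web form's input is concatenated directly into the following query; the attacker's input turns a narrow lookup into a statement that returns every row (table and column names are hypothetical):

    -- the query the application intends to run (user input: Smith)
    SELECT * FROM customer WHERE last_name = 'Smith';

    -- the query actually executed when the attacker enters:  ' OR '1'='1
    SELECT * FROM customer WHERE last_name = '' OR '1'='1';
    -- the OR '1'='1' condition is always true, so the whole table is returned

The usual defenses are to pass user input through bind parameters (prepared statements) and to validate Web application input, so that the input is never parsed as SQL.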
Automated recording of all sensitive and/or unusual database transactions
should be part of the foundation underlying any database deployment.
Weak database audit policy represents a serious organizational risk on
many levels.
Denial of Service (DOS) is a general attack category in which
access to network applications or data is denied to intended users. Denial
of service (DOS) conditions may be created via many techniques - many
of which are related to previously mentioned vulnerabilities. Database
communication protocol attacks can be defeated with technology
commonly referred to as protocol validation. Protocol validation technology
essentially parses (disassembles) database traffic and compares it to
expectations. In the event that live traffic does not match expectations,
alerts or blocking actions may be taken. Weak authentication schemes
allow attackers to assume the identity of legitimate database users by
stealing or otherwise obtaining login credentials. An attacker may employ
any number of strategies to obtain credentials. Lastly, there is backup data exposure: backup database storage media is often completely
unprotected from attack. As a result, several high profile security breaches
have involved theft of database backup tapes and hard disks.
John Ottman, the author of Save the Database, Save the World, notes that many approaches are network- or perimeter-based and not really centered on the database, and that might be the biggest issue. "We have spent, as an industry, as a society, billions of dollars over the last 15
to 20 years on building security solutions for our infrastructure. Almost all
of that has gone into network- and perimeter-oriented approaches. We
have done some work with operating systems and spam and things like
this. But it has really been focused on perimeter- and network-based
security solutions and as our research shows, only 10% of databases
have gotten that kind of focus so our message is that you have to protect
the data where it lives - in the database. It is kind of like a bank locking
the front door and leaving the bank vault open if you don't deal with the
issue of database security”, he said. Ottman also said that “Obviously, a
database administrator has universal access to the database and we have
to have somebody who has universal access to manage the database. But
the most common database security audit filing is a separation of duty
violation, where database administrators are deemed to be privileged
users who should have compensating control. In other words, it is
theoretically possible for a database administrator to turn off audit and
logging, do something to the database that might be nefarious - and then
turn the audit back on, after they are done, so there are no
footprints. That is a very typical SOX audit finding. And database activity
monitoring is a pretty standard solution to resolve that. I think part of the
issue is that the suggestion there is that database administrators are
somehow the problem. Database administrators are actually the solution
and there should be no thought of demonizing database administrators.
They are critical to the solution set. But - compensating control of people
who have privileged access to sensitive data is a critical issue in many
regulatory filings.” He also said that if they really want personally identifiable information to be protected, there has to be enforcement.
Next, database security professionals must also guard against inference. Basically, inference occurs when users are able to piece
together information at one security level to determine a fact that should
be protected at a higher security level.
IV. Findings and Analysis
DBMS Architecture - “Open Problem” even up to now.
Database management systems are complex pieces of software which were often developed and optimized over years. Since the birth of the DBMS thirty (30) years ago, DB researchers have faced the question of how to design a data-independent database management system (DBMS), that is, a DBMS which offers an appropriate application programming interface (API) to the user and whose architecture is open for criticism, updates, refinement and innovation. For this purpose, an architectural model based on successive data abstraction steps of record-oriented data was proposed as a kind of standard and later refined to a five-layer hierarchical DBMS model.
Furthermore, we consider the interplay of the layered model with the transactional Atomicity, Consistency, Isolation and Durability (ACID) properties and again outline the progress obtained.
In the seventies, the scientific discussion in the database (DB) area
was dominated by heavy arguments concerning the most suitable data
model. It essentially focused on the question of which abstraction level is
appropriate for a DB application programmer. The network data model seems
to be best characterized by “the more complex the pointer-based data
structure, the more accurate is the mini-world representation”. However, it
offers only very simple operations forcing the programmer to navigate through
cursor-controlled data spaces. At that time, the decision concerning the most appropriate data model could be pinpointed to "record orientation and pointer-based, navigational use" vs. "set orientation and value-based, declarative use".
Far ahead of the common belief of his time, E. F. Codd taught us that
simplicity is the secret of data independence—a property of the data model
and the database management system (DBMS) implementing it. A high
degree of data independence is urgently needed to let a system “survive” the
permanent change in computer science in general and in the DB area in
particular.
Nowadays, however, important DBMS requirements include data
streams, unstructured or semi-structured documents, time series, spatial
objects, and so on. What were the recommendations to achieve the system
properties for which the terms physical and logical data independence were
coined?
It is immediately clear that a monolithic approach to DBMS implementation is not very reasonable. It would mean mapping the data model functionality (e.g., SQL) in a single step to the interfaces offered by external storage devices, e.g., read/write block. Since the development of the Database Management System (DBMS), new system evolution requirements have been abundant: growing information demand led to enhanced standards with new object types, constraints, etc.; advances in research and development bred new storage structures and access paths; and rapid changes of the technologies used, and especially Moore's law, had far-reaching consequences on storage devices, memory, connectivity (e.g., the Web), and so on.
Developing a hierarchically structured system offers the following
important benefits:
• The implementation of higher-level system components is simplified
by the usage of lower-level system components.
• Lower-level system components are independent of functionality and
modifications in higher-level system components.
• Testing of lower-level system components is possible, before the
higher system levels are put into use.
The resulting abstraction hierarchy hides some properties of a system
level (an abstract machine) from higher-layer machines. Furthermore, the
implementation of higher-level operations extends the functionality of an abstract
machine. System evolution is often restricted to the internals of such abstract
machines when, for example, a function implementation is replaced by a more
efficient one. In case new functionality extends their interfaces, the invocation of
these operations implies “external” changes which are, however, limited to the
next higher layer.
Description of the DBMS mapping hierarchy

Level of abstraction                  | Objects                               | Auxiliary mapping data
Nonprocedural or algebraic access     | Tables, views, tuples                 | Logical schema description
Record-oriented, navigational access  | Records, sets, hierarchies, networks  | Logical and physical schema description
Record and access path management     | Physical records, access paths        | Free space tables, DB-key translation tables
Propagation control                   | Segments, pages                       | DB buffer, page tables
File management                       | Files, blocks                         | Directories, VTOCs, etc.
The architectural description embodies the major steps of dynamic
abstraction from the level of physical storage up to the user interface. At the
bottom, the database consists of huge volumes of bits stored on non-volatile
storage devices, which are interpreted by the DBMS into meaningful information
on which the user can operate. With each level of abstraction (proceeding
upwards), the objects become more complex, allowing more powerful operations
and being constrained by a growing number of integrity rules. The uppermost
interface supports a specific data model, in our case by a declarative data access
via SQL.
The bottom layer, called File Management, operates on the bit pattern
stored on some external, non-volatile storage device. Often in collaboration with
the operating system’s file management, this layer copes with the physical
characteristics of each type of storage device.
Propagation Control as the next higher layer introduces different types of
pages which are fixed-length partitions of a linear address space and mapped
into physical blocks which are, in turn, stored on external devices by the file
management. The strict distinction between pages and blocks offers additional
degrees of freedom for the propagation of modified pages. For example, a page
can be stored in different blocks during its lifetime in the database thereby
enabling atomic propagation schemes (supporting failure recovery based on
logical logging). To effectively reduce the physical I/O, this layer provides for a
(large) DB buffer which acts as a page-oriented interface (with fix/unfix
operations) to the fraction of the DB currently resident in memory.
The Record and Access Path Management implements mapping functions
much more complicated than those provided by the two subordinate layers. For
performance reasons, the partitioning of data into segments and pages is still
visible at this layer. It has to provide clustering facilities and maintain all physical
object representations, that is, data records, fields, etc. as well as access path
structures, such as B-trees, and internal catalog information.
It typically offers a variety of access paths of different types to the
navigational access layer. Especially with the clustering options and the provision
of flexibly usable access paths that are tailored to the anticipated workloads, this
layer plays a key role for the entire DBMS performance.
Extensions and Optimizations
While the explanation model concerning the DBMS architecture is still
valid, an enormous evolution/progress has been made during the last two
decades concerning functionality, performance, and scalability. The fact that all
these enhancements and changes could be adopted by the proposed
architecture is a strong indication that we refer to a salient DBMS model. We cannot elaborate on all extensions, let alone discuss them in detail, but we
want to sketch some major improvements/changes.
Thirty (30) years ago, SQL—not standardized at that time—and the underlying relational model were simple. Today, we have to refer to SQL:2013 and an object-relational model, which are complex and not well understood in all parts. Many of the new aspects and functions—such as user-defined types, type and table hierarchies, recursion, constraints, triggers—have to be adjusted. While query translation and optimization initially started from solid foundations that enabled the integration of new mechanisms, and could be successfully improved, in particular by using refined statistics (histograms), some of the new language concepts turn out to be very hard for the optimizer.
Furthermore, functionality for “arbitrary” join predicates, reuse of
intermediate query evaluation results, sorting (internally usually optimized for
relatively small sets of variable length records in memory as well as external
sort/merge), etc. was improved and much better integrated. In particular, space-adaptable algorithms contribute to great improvements and support load
balancing and optimized throughput, even for high multi-programming levels.
Moreover, it should not rely on special data preparation, and its optimal use should not require knowledge of system internals or expert experience. Furthermore, over-specialized use and tailoring to narrow applications do not promise practical success in DBMSs.
Finally, many of these methods disregard the DBMS environment, where dependencies on locking and recovery issues, integration into optimizer decisions, and support of mixed and unexpected workload characteristics have to be taken into account.
Finally, the huge DB buffer capacity facilitated the provision of buffer
partitions where each partition can individually be tailored to the anticipated
locality behavior of a specific workload. Nevertheless, the buffering demands of VITA applications, considered in various projects, cannot be integrated in any reasonable way, because they would require transferring huge data volumes through the layered architecture up to the application.
OS people proposed various improvements in file systems, where only some were helpful for DB management, e.g., distribution transparency. Log-structured files, for example, turned out to be totally unsuitable. Furthermore, there is still no transaction support available at this layer of abstraction. A lot of new storage technology was invented during the last thirty (30) years: disks of varying capacity, form and geometry, DVDs, WORM storage, electronic disks, etc. Their integration into our architectural model could be transparently performed as far as the standard file interfaces were concerned.
Architectural Variants
Up to now, we have intensively discussed the questions of data mapping
and transactional support in a centralized DBMS architecture. In the last thirty
(30) years, however, a variety of new data management scenarios emerged in
the DBMS area.
Architectural Requirements
So far, our architectural layers perfectly match the invariants of set-oriented, record-like database management, such that they could be reused more or less unchanged in the outlined DBMS variants. However, recent requirements strongly deviate from this processing paradigm. Integration efforts developed during the last twenty (20) years were primarily based on a kind of loose coupling of components—called Extenders, DataBlades, or Cartridges—and a so-called
extensibility infrastructure. Because these approaches could neither fulfill the
demands for seamless integration nor the overblown performance and scalability
expectations, future solutions may face major changes in the architecture.
The Ten (10) Commandments of Database Management System (DBMS)
General Rules
1. Recovery based on logical logging relies on a matching operation-consistent
state of the materialized DB at the time of recovery.
2. The lock granule must be at least as large as the log granule.
3. Crash recovery under non-atomic propagation schemes requires Redo
Winners resp.
4. State logging requires a WAL protocol (if pages are propagated before
Commit).
5. Non-atomic propagation combined with logical logging is generally not
applicable.
6. If the log granularity is smaller than the transfer unit of the system (block
size), a system crash may cause media recovery.
7. Partial rollback within a transaction potentially violates the 2PL protocol
8. Log information for Redo must be collected independently of measures for
Undo.
9. Log information for Redo must be written at the latest in phase 1 of Commit.
10. To guarantee repeatability of results of all transactions using Redo recovery
based on logical logging, their DB updates must be reproduced on a
transaction basis (in single-user mode) in the original Commit sequence.
CODD Rules
There are 13 (0 to 12) rules, which were presented by Dr. E. F. Codd in June 1970 in the ACM (Association for Computing Machinery).
Rule 0. Relational database management: “A relational database management system must use only its relational capabilities to manage the information stored in the database”.
Rule 1. The information rule: All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.
Rule 2. Logical accessibility: This rule is about the requirement for primary keys. Every individual value in the database must be logically addressable by specifying the name of the table, the column and the primary key value of the row.
Rule 3. Representation of null values: The DBMS is required to support a representation of "missing information and inapplicable information" that is distinct from all regular values (for example, distinct from zero or any other number). This type of information must be represented by the DBMS in a systematic way (for example, a NULL marker).
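A small, assumed SQL sketch of this systematic treatment: a missing value is stored as NULL, which is distinct from zero and must be tested with IS NULL rather than with the equals operator (table and column names are hypothetical).

    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        bonus       DECIMAL(10,2)          -- NULL here means "not known / not applicable"
    );

    INSERT INTO employee (employee_id, bonus) VALUES (1, 0.00);   -- a real zero bonus
    INSERT INTO employee (employee_id, bonus) VALUES (2, NULL);   -- missing information

    SELECT employee_id FROM employee WHERE bonus IS NULL;         -- finds employee 2 only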
Rule 4. Catalog facilities: The system is required to support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language.
Rule 5. Data languages: The system must support at least one relational language (it may support more than one) that
(a) has a linear syntax,
(b) can be used both interactively and within application programs, and
(c) supports data operations, security and integrity constraints, and transaction management operations (commit).
Rule 6. View updatability: All views that are theoretically updatable must be updatable by the system.
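A minimal sketch of an updatable view in standard SQL, with hypothetical names: because the view is a simple row-and-column subset of a single table, the system can apply an UPDATE issued against the view to the underlying base table.

    CREATE VIEW active_customer AS
        SELECT customer_id, full_name, status
        FROM   customer
        WHERE  status = 'ACTIVE';

    -- the system translates this into an update of the underlying customer table
    UPDATE active_customer
    SET    full_name = 'Maria Santos'
    WHERE  customer_id = 42;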
Rule 7. Update and delete: The system must support INSERT, UPDATE, and DELETE operators.
Rule 8. Physical data independence: Changes to the physical level (how the data is stored, whether in arrays or linked lists, etc.) must not require a change to an application based on the structure.
Rule 9. Logical data independence: Changes made to the structure of tables that preserve the stored data must not require changes to be made to application programs. Logical data independence is more difficult to achieve than physical data independence.
Rule 10. Integrity constraints: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints when needed without unnecessarily affecting existing applications.
Rule 11. Database distribution: The RDBMS may be spread across more than one system and across several networks; however, the tables should appear to every user in the same manner as they do to local users.
Rule 12. The nonsubversion rule: If the system provides a low-level interface, then that interface cannot be used to weaken the system, e.g., by bypassing a relational security or integrity constraint.
What is DBMS in the Past? In the Present? And in the Future?
In the late 1800s, Thomas Edison and George Westinghouse became
embroiled in what has become known as "The War of the Currents." Edison had
invested heavily in infrastructure, supporting the use of direct current for the
distribution of electricity. Westinghouse, having bought patents on the inventions
of Tesla, advocated alternating current. For a short period of time, there were two
sets of infrastructure that operated under different assumptions about how power
should be transported and consumed. Fortunately, the technology was young
and the infrastructure was immature, so the cost of competing standards was
relatively low.
Fast forward about 100 years, data is the power that runs a modern
business. When consumers of that power have different assumptions, a
transformation is required. Very bad things (including the loss of infrastructure
investment) can happen if that transformation is not carefully planned. Currently,
there are at least three major data management paradigms - ISAM,
SQL/Relational, and XML - in use, with XML poised to explode.
Each has different assumptions regarding how data models the
organization's view of the real world. When different models try to operate on the
same source of power, they must reconcile these differences. The process of
reconciling these assumptions can result in data loss, performance degradation,
system fragility, or feature unavailability. The ideal solution is for every consumer
to enjoy native and natural access to the power source, without one model
compromising another.
Because data is the power for business, DBMSs (and their associated
processes) tend to evolve more slowly than other, less mission-critical segments
of computing infrastructure. In fact, database management models have
remained relatively unchanged through the emergence and explosion of the
Internet. It usually requires a significant shift in business practice to engender
any change in database management. Distributed computing (such as web
services) appears to be one such shift; it is growing rapidly and many major
vendors expect distributed computing to be the next major paradigm in
application development. XML is establishing the system for distributed data
management, and this system does not neatly fit with the assumptions made by
existing infrastructure.
Database Concepts of Concern
The first assumptions involve three relatively basic (and familiar) database
concepts: entities, attributes, and relationships. Each model addresses
collections, operational efficiency, and the relationship of these concepts to the
rest of the computing environment. There are other issues involved with data
management (concurrency, operation atomicity, relational integrity, and so on),
but these issues are not necessarily differentiating factors between models.
The first basic concept is that of entity - the thing that is being stored and
is representative of something in the external world, such as a customer, invoice,
or inventory item. It may be thought of as the most granular representation of
data that retains context. The question of what exactly comprises an entity is
more frequently resolved through data/business analysis than through application
of the normalization formulas found in database textbooks.
The second concept is that of attribute - a descriptor of an entity.
Depending on your particular prejudices, you may think of attributes as fields or
columns. Attributes rely on the entity for context.
The third concept is that of relationships. A customer entity and three
order entities are useless in a business process unless you have some way of
making their relationship persistent; that is, some way of denoting that the order
entities were made by the person represented by the customer entity. In
relational theory, this can be represented by foreign key relationships.
Obviously, a database would not be of much use if it allowed you to have
one customer entity, one order entity, and one inventory entity. It would be
almost as useless if it did not let you access the collection of customer entities
independently from the collection of order entities.
Although a flat text file could act as the basis of these concepts, the
practical requirement is that specific information can be quickly and efficiently
retrieved. It is critical that the performance does not degrade as more entities are
added. This performance requirement is most often met through the use of
indexes or keys.
Finally, there is the relationship between the database system and the rest
of the computing environment - in particular, the operating system and
application. Early on, the database paradigm was represented by a set of
procedures and coding standards that dictated how a particular shop's
application code interacted with operating-system code. The evolution of
database systems has seen at least one consistent trend: the abstraction of the
database paradigm from application and operating-system constraints and the
encapsulation of that abstraction within a database infrastructure.
In other words, relational database application developers typically no
longer worry about the offset and length of a particular attribute or the particular
OS file that contains the attribute data. Those issues are abstracted within a
database infrastructure that generally is viewed as, if not a black box, then a
really, really dirty one with very tiny windows.
Stages of Database Evolution
Database technology has evolved through several stages, including ISAM,
SQL/Relational, and XML:
ISAM. Although ISAM has not been formally standardized as a data
model, thanks to the dominance of Cobol and the effect of that dominance on
database management, there is a common set of well-understood expectations
for an ISAM DBMS. In the ISAM paradigm, entities are records. Attributes are
understood to be data stored starting at a specific offset for a specific length. The
application is responsible for maintaining relationships, usually performed in
much the same way as the relational model, where entities are collected in OS
files, and the application (and thus the developer) is responsible for knowing
which set of records is in which file.
The application can include multiple types of records in one file, but any
differences in entity type within a file must be implemented, understood, and
maintained by the application. The DBMS does not understand any distinction
between different entity types within the same entity collection.
Efficiency is achieved through the use of indexes. Since the DBMS is
responsible for maintaining index information and the DBMS does not make any
distinction between entity types within an entity collection, an ISAM file indexes
the same attributes for an entire collection. This can result in added responsibility
for the application if multiple entity types are in the same collection. Furthermore,
since the DBMS is unaware of any nonindexed attributes of the entity, the same
entity can be viewed as having several different compositions, and there is no
guarantee that the attribute indexed by the DBMS is an attribute that is
meaningful to the application.
An ISAM application acts as if it is operating on the physical
representation of the record (which it is, in most implementations). As a result,
much of the database management of the ISAM paradigm is closely tied to both
the operating system and the application.
SQL/Relational. From a theoretical standpoint, the SQL paradigm and
relational model are not synonymous. In fact, SQL can be used to build result
sets that do not meet relational requirements. However, the average computing
professional is not interested in purely theoretical DBMSs, and when most people
use a relational database, they are almost invariably using SQL to manipulate the
data (whether directly or under the covers, as is often the case with ADO).
Collections of entities and attributes may be arbitrarily defined at run time through
SQL predicates (as well as through views). Relationships are persisted in much
the same way as the ISAM model (the constraints, such as primary key
uniqueness, are formalized in the relational model, but the ISAM model is similar
in practice).
To summarize, the relational model abstracts the database from the
operating system and to some extent from the application. There is no longer an
exploitable interaction between the operating system and the DBMS.
Furthermore, while the application may have foreknowledge of the database
composition, it is incapable of using that knowledge in a manner that is not
understood beforehand by the DBMS. An application can also be written that
derives all of its information about the database at run time, which is certainly not
the case with the ISAM paradigm. This abstraction frees the application and the
database administrator from a number of concerns regarding the internals of data
management, but it also demands that the application conform to the
expectations of the model.
The .NET framework is an example of the pipeline approach. The
architecture proposed by Microsoft has a SQL database (which generally
performs pipelined ISAM atomics) returning query results as a disconnected, in-memory database. This database can then be transformed as needed or viewed
as XML or as a record set. The power in this is the flexibility for developers; the
weaknesses involve the performance and concurrency issues of the multilayered
disconnected approach, the mapping that must be performed beforehand, and
the inability to adapt quickly to changes without losing data.
Architectural issues
With all the solutions currently being offered, how do you know which is
right? Much of the decision involves your need to communicate with legacy
databases. If you know that your need to interact with XML data is isolated from
your need to work with your legacy relational data, then an XML database
solution most directly addresses the paradigm in which you plan to work. The
most common cases, though, involve integration of XML data with legacy
databases. Inertia would suggest that the majority of adopted solutions are going
to be based on the current solution offered by the vendor of the existing legacy
store.
The problem with this is that some of the existing solutions, such as the
pipeline approach, will experience scalability problems that may not be apparent
upon initial deployment.
What are the security issues in database management system?
There are several interrelated activities in the database area and computer
architecture that make the discussion of database machines and their
implications on DBMS standards timely and meaningful.
First, in the database area there is a drive toward more powerful database
management systems which support high-level data models and languages. The
motive for this drive is the requirement to greatly improve user/programmer
productivity and to protect applications from changes in the user environment.
However, supporting these interfaces with software means often introduces
inefficiency in database management systems because of the many levels of
complex software which are required to map the high-level data representation
and languages to the low level storage representation and machine codes.
Second, the need for systems which handle very large databases is
increasing rapidly. Very large databases complicate the problems of retrieval,
update, data recovery, transaction processing, integrity, and security. Software
solutions to these problems work well for both small databases supporting many
applications and large databases supporting only a few applications. However,
the labor-intensive cost, time delays and reliability problems associated with
software development and maintenance will soon become prohibitive as large
and highly shared databases emerge. The search for hardware solutions to these
problems is a necessary and viable alternative for balancing functionality and
price/performance.
Third, the progress made in hardware technology in the past decade is
phenomenal. The cost of memories, processors, terminals and communication
devices has dropped and will continue to drop at a drastic rate. It is time for a
reevaluation of the traditional role of hardware and software in solving problems
of today and tomorrow in database management.
Database Security Issues
Daily Maintenance:
Database audit logs require daily review to make certain that there has
been no data misuse. This requires overseeing database privileges and then
consistently updating user access accounts. A database security manager also
provides different types of access control for different users and assesses new
programs that are performing with the database. If these tasks are performed on
a daily basis, you can avoid a lot of problems with users that may pose a threat
to the security of the database.
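What such a daily review might look for can be sketched with a hedged example query over a hypothetical audit_log table (the table, its columns and the chosen thresholds are all assumptions), flagging privileged activity that happened outside business hours:

    -- recent privileged actions recorded by the audit facility, outside 06:00-20:00
    SELECT log_time, db_user, action, object_name
    FROM   audit_log
    WHERE  log_time >= CURRENT_DATE - INTERVAL '1' DAY
      AND  (EXTRACT(HOUR FROM log_time) < 6 OR EXTRACT(HOUR FROM log_time) > 20)
      AND  action IN ('GRANT', 'DROP', 'ALTER', 'DELETE')
    ORDER BY log_time;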
Varied Security Methods for Applications:
More often than not, application developers will vary the methods of security for different applications that are being utilized within the database. This
can create difficulty with creating policies for accessing the applications. The
database must also possess the proper access controls for regulating the varying
methods of security otherwise sensitive data is at risk.
Post-Upgrade Evaluation:
When a database is upgraded it is necessary for the administrator to
perform a post-upgrade evaluation to ensure that security is consistent across all
programs. Failure to perform this operation opens up the database to attack.
Split the Position:
Sometimes organizations fail to split the duties between the IT
administrator and the database security manager. Instead the company tries to
cut costs by having the IT administrator do everything. This action can
significantly compromise the security of the data due to the responsibilities
involved with both positions. The IT administrator should manage the database
while the security manager performs all of the daily security processes.
Application Spoofing:
Hackers are capable of creating applications that resemble the existing
applications connected to the database. These unauthorized applications are
often difficult to identify and allow hackers access to the database via the
application in disguise.
Manage User Passwords:
Sometimes IT database security managers will forget to remove IDs and
access privileges of former users, which leads to password vulnerabilities in the database. Password rules and maintenance need to be strictly enforced to
avoid opening up the database to unauthorized users.
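A hedged sketch, in standard SQL, of the kind of cleanup this requires when an employee leaves (the user, role and table names are hypothetical):

    -- revoke everything the former employee's account could do, then remove it
    REVOKE ALL PRIVILEGES ON customer FROM former_employee;
    REVOKE clerk_role FROM former_employee;
    DROP USER former_employee;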
Windows OS Flaws:
Windows operating systems are not effective when it comes to database
security. Often theft of passwords is prevalent as well as denial of service issues.
The database security manager can take precautions through routine daily
maintenance checks.
These are just a few of the database security problems that exist within
organizations. The best way to avoid a lot of these problems is to employ
qualified personnel and separate the security responsibilities from the daily
database maintenance responsibilities.
NoSQL – Current Trends and Issues in DBMS
NoSQL databases already existed and were in use in the late 1960s; these databases are considered non-relational databases, but they were not yet popular. Today, NoSQL databases are popular on the market because companies and organizations that used relational databases are shifting to NoSQL. Some documents say that NoSQL databases are better to use than relational databases because they do not adopt ACID and SQL.
Also, according to Guy Harrison, "non-relational, 'cloud,' or 'NoSQL' databases are gaining mindshare as an alternative model for database management." This is because they have elastic scalability, can handle big data, require less management, and use cheaper commodity servers for handling multiple transactions. "Their primary advantage is that, unlike relational databases, they handle unstructured data such as word-processing files, e-mail, multimedia, and social media efficiently." There are a lot of NoSQL databases, but they take different approaches.
Either way, "Relational databases are based on Edgar F. Codd's relational
data model which assumes strictly structured data. The whole SQL language is
constructed around this model and the databases which implement it are
optimized for working that way. But in the past few years, there were attempts to
add features to SQL which allow to work with unstructured data, like the
SQL/XML extension which allows to store XML documents in fields of SQL tables
and query their document-trees transparently. Document-oriented databases like
MongoDB or CouchDB, on the other hand, were designed from the start to work
with unstructured data and their query languages were designed around this
concept, so when working with unstructured data they are usually much faster
and more convenient to use."
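The SQL/XML extension mentioned in the quotation can be sketched roughly as follows, under assumptions about the exact dialect (the XMLQUERY form shown is DB2-style, and the table and path are hypothetical): an XML column is declared on an ordinary table and then queried through its document tree.

    -- store whole XML documents in a column of an ordinary SQL table
    CREATE TABLE purchase_order (
        po_id INTEGER PRIMARY KEY,
        doc   XML
    );

    -- query inside the document tree with SQL/XML
    SELECT po_id,
           XMLQUERY('$d/order/customer/name' PASSING doc AS "d") AS customer_name
    FROM   purchase_order;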
Five advantages of NoSQL
1: Elastic scaling
For years, database administrators have relied on scale up -- buying
bigger servers as database load increases -- rather than scale out -- distributing
the database across multiple hosts as load increases. However, as transaction
rates and availability requirements increase, and as databases move into the
cloud or onto virtualized environments, the economic advantages of scaling out
on commodity hardware become irresistible.
RDBMS might not scale out easily on commodity clusters, but the new
breed of NoSQL databases are designed to expand transparently to take
advantage of new nodes, and they're usually designed with low-cost commodity
hardware in mind.
2: Big data
Just as transaction rates have grown out of recognition over the last
decade, the volumes of data that are being stored also have increased
massively. O'Reilly has cleverly called this the "industrial revolution of data."
RDBMS capacity has been growing to match these increases, but as with
transaction rates, the constraints of data volumes that can be practically
managed by a single RDBMS are becoming intolerable for some enterprises.
Today, the volumes of "big data" that can be handled by NoSQL systems, such
as Hadoop, outstrip what can be handled by the biggest RDBMS.
3: Goodbye DBAs (see you later?)
Despite the many manageability improvements claimed by RDBMS
vendors over the years, high-end RDBMS systems can be maintained only with
the assistance of expensive, highly trained DBAs. DBAs are intimately involved in
the design, installation, and ongoing tuning of high-end RDBMS systems.
NoSQL databases are generally designed from the ground up to require less
management: automatic repair, data distribution, and simpler data models lead
to lower administration and tuning requirements -- in theory. In practice, it's likely
that rumors of the DBA's death have been slightly exaggerated. Someone will
always be accountable for the performance and availability of any mission-critical
data store.
4: Economics
NoSQL databases typically use clusters of cheap commodity servers to
manage the exploding data and transaction volumes, while RDBMS tends to rely
on expensive proprietary servers and storage systems. The result is that the cost
per gigabyte or transaction/second for NoSQL can be many times less than the
cost for RDBMS, allowing you to store and process more data at a much lower
price point.
5: Flexible data models
Change management is a big headache for large production RDBMS.
Even minor changes to the data model of an RDBMS have to be carefully
managed and may necessitate downtime or reduced service levels.
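Even a change as small as the assumed sketch below (hypothetical table and column) has to be scheduled, tested and rolled out carefully on a large production RDBMS, because it may lock or rewrite the table while it runs:

    -- add a new attribute to an existing, heavily used table
    ALTER TABLE customer ADD COLUMN loyalty_tier VARCHAR(20);

    -- backfill the new column for existing rows
    UPDATE customer SET loyalty_tier = 'STANDARD' WHERE loyalty_tier IS NULL;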
NoSQL databases have far more relaxed -- or even nonexistent -- data
model restrictions. NoSQL Key Value stores and document databases allow the
application to store virtually any structure it wants in a data element. Even the
more rigidly defined BigTable-based NoSQL databases (Cassandra, HBase)
typically allow new columns to be created without too much fuss.
The result is that application changes and database schema changes do
not have to be managed as one complicated change unit. In theory, this will allow
applications to iterate faster, though, clearly, there can be undesirable side effects
if the application fails to manage data integrity.
Five challenges of NoSQL
The promise of the NoSQL database has generated a lot of enthusiasm,
but there are many obstacles to overcome before they can appeal to mainstream
enterprises. Here are a few of the top challenges.
1: Maturity
RDBMS systems have been around for a long time. NoSQL advocates will
argue that their advancing age is a sign of their obsolescence, but for most CIOs,
the maturity of the RDBMS is reassuring. For the most part, RDBMS systems are
stable and richly functional. In comparison, most NoSQL alternatives are in
pre-production versions with many key features yet to be implemented.
Living on the technological leading edge is an exciting prospect for many
developers, but enterprises should approach it with extreme caution.
2: Support
Enterprises want the reassurance that if a key system fails, they will be
able to get timely and competent support. All RDBMS vendors go to great
lengths to provide a high level of enterprise support.
In contrast, most NoSQL systems are open source projects, and although there
are usually one or more firms offering support for each NoSQL database, these
companies often are small start-ups without the global reach, support resources,
or credibility of an Oracle, Microsoft, or IBM.
3: Analytics and business intelligence
NoSQL databases have evolved to meet the scaling demands of modern
Web 2.0 applications. Consequently, most of their feature set is oriented toward
the demands of these applications. However, data in an application has value to
the business that goes beyond the insert-read-update-delete cycle of a typical
Web application. Businesses mine information in corporate databases to improve
their efficiency and competitiveness, and business intelligence (BI) is a key IT
issue for all medium to large companies.
NoSQL databases offer few facilities for ad-hoc query and analysis. Even
a simple query requires significant programming expertise, and commonly used
BI tools do not provide connectivity to NoSQL.
Some relief is provided by the emergence of solutions such as HIVE or PIG,
which can provide easier access to data held in Hadoop clusters and, perhaps
eventually, other NoSQL databases. Quest Software has developed a product --
Toad for Cloud Databases -- that can provide ad-hoc query capabilities to a
variety of NoSQL databases.
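To illustrate what "significant programming expertise" means in practice, the
sketch below contrasts a one-line SQL aggregation with the client-side code an
application must write when its document store offers no ad-hoc query
facility. The collection and field names are invented for the example.

# In an RDBMS the question is one line:
#   SELECT region, SUM(amount) FROM orders GROUP BY region;
# Against a store with no ad-hoc query support, the application has to
# scan and aggregate the documents itself.
from collections import defaultdict

def total_sales_by_region(order_documents):
    totals = defaultdict(float)
    for doc in order_documents:
        totals[doc["region"]] += doc["amount"]
    return dict(totals)

orders = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "APAC", "amount": 75.5},
    {"region": "EMEA", "amount": 40.0},
]
print(total_sales_by_region(orders))   # {'EMEA': 160.0, 'APAC': 75.5}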
4: Administration
The design goals for NoSQL may be to provide a zero-admin solution, but the
current reality falls well short of that goal. NoSQL today requires a lot of skill to
install and a lot of effort to maintain.
5: Expertise
There are literally millions of developers throughout the world, and in
every business segment, who are familiar with RDBMS concepts and
programming. In contrast, almost every NoSQL developer is in a learning mode.
This situation will resolve itself naturally over time, but for now, it's far easier to
find an experienced RDBMS programmer or administrator than a NoSQL expert.
NoSQL systems part ways with the hefty SQL standard and offer simpler
but piecemeal approaches to architecting storage solutions. These systems were
built with the belief that, by simplifying how a database operates over data, an
architect can better predict the performance of a query. In many NoSQL systems,
complex query logic is left to the application, resulting in a data store with more
predictable query performance because of the lack of variability in queries.
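A minimal sketch of that trade-off, using an in-memory dictionary to stand in
for a key-value store (all names are illustrative): a lookup by key is a
single, predictable operation, while any question the keys were not designed
to answer turns into a scan written inside the application.

# Toy stand-in for a key-value store.
users = {
    "user:1": {"name": "Ana",   "city": "Manila"},
    "user:2": {"name": "Ben",   "city": "Cebu"},
    "user:3": {"name": "Carla", "city": "Manila"},
}

# Supported by the store: a primary-key get with predictable cost.
print(users["user:2"])

# Not supported by the store: filtering on a non-key attribute becomes
# application code that scans every record.
manila_users = [u for u in users.values() if u["city"] == "Manila"]
print(manila_users)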
NoSQL systems part with more than just declarative queries over the
relational data. Transactional semantics, consistency, and durability are
guarantees that organizations such as banks demand of databases.
Transactions provide an all-or-nothing guarantee when combining several
potentially complex operations into one, such as deducting money from one
account and adding the money to another. Consistency ensures that when a
value is updated, subsequent queries will see the updated value. Durability
guarantees that once a value is updated, it will be written to stable storage
(such as a hard drive) and recoverable if the database crashes.
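For comparison, the following sketch shows the all-or-nothing transfer
described above in a relational setting, using SQLite from the Python
standard library; the table, account names, and amounts are made up for the
example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    # Both updates succeed together or neither is applied: the connection
    # used as a context manager commits on success and rolls back on error.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(conn, "alice", "bob", 30.0)
print(conn.execute("SELECT id, balance FROM accounts").fetchall())
# [('alice', 70.0), ('bob', 80.0)]

Many NoSQL stores have historically provided atomic updates only within a
single document or key, which is part of why workloads like this one have
tended to stay on the RDBMS.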
V.
Conclusion
No database solution is perfect, whether it follows the oldest trend or
the newest one, because there is no finite or definitive answer for every
DBMS. The reason is that innovation, and the current trends and issues in
technology, change from one minute to the next.
Organizations have a great deal of investment in their infrastructure
incorporating ISAM and relational models. There are a number of competitive
advantages that can be gained by distributed computing (such as web
services), and the common language of distributed computing is XML. The
problem is that the XML model makes different assumptions about data than
the ISAM and relational models. The result is that businesses are now tasked
with adapting existing infrastructure to a new, incompatible data model more
quickly than ever before.
There are several ways to accomplish this, but each has drawbacks.
Some of these drawbacks are more likely than others to only show up as the
system scales outward; other drawbacks are more obvious. Therefore, it is
essential that the integration of distributed data not merely coast along the
path of least resistance, but that it proceed in the manner best suited to the
needs of the business.
There are several possible futures. As unlikely as it seems, distributed
computing may turn out to be a fad. One particular mechanism of model
adaptation may improve to the point where it satisfactorily addresses the
needs of most businesses. One of the models may evolve to accept the
assumptions of the other models, making it the reference model. On the other
hand, the importance of distributed computing may simply force businesses to
accept the cost of inefficient data transformation. An ideal solution is a DBMS
that can apply the constraints of any particular model to the underlying data,
allowing existing infrastructure to perform at current levels while providing
native and natural support for new models as needed.
VI.
Recommendation
NoSQL databases are becoming an increasingly important part of the
database landscape, and when used appropriately, can offer real benefits.
However, enterprises should proceed with caution with full awareness of the
legitimate limitations and issues that are associated with these databases.
For a quarter of a century, the relational database (RDBMS) has been the
dominant model for database management. But, today, non-relational,
"cloud," or "NoSQL" databases are gaining mindshare as an alternative model
for database management. This paper has looked at 10 key aspects of these
non-relational NoSQL databases: the top five advantages and the top five
challenges.
It is clear that the RDBMS is not yet the final answer to the trends and
issues that databases face. We know that the RDBMS has been evolving since
its development, but some problems and issues are still not solved by the
system.
NuoDB, for example, is an innovative distributed database that can be
deployed in any datacenter, in any cloud, anywhere, without the compromises
inherent in other NewSQL solutions. It also eliminates the need for complex
database workarounds such as clustering, performance tuning, and sharding
that are typically associated with bringing applications to the cloud.
As the researcher, I therefore recommend that up-to-date tracking,
continuous evolution, and professional growth around every DBMS be embraced
not only in industry and the education sector, but also by every information
technologist, computer scientist, and everyone else responsible for the
evolution of the DBMS.