Download Enterprize database Managment

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Commitment ordering wikipedia , lookup

IMDb wikipedia , lookup

Microsoft Access wikipedia , lookup

SQL wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Oracle Database wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Ingres (database) wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Serializability wikipedia , lookup

Functional Database Model wikipedia , lookup

Database wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Relational model wikipedia , lookup

ContactPoint wikipedia , lookup

Concurrency control wikipedia , lookup

Clusterpoint wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Database model wikipedia , lookup

Transcript
CSC 532
Enterprise database management
Pranav Raj Sharma
Page 1
Database management system is the backbone of most companies from small size to large size
that handle their sales, inventory, day to day transaction, order history, purchase, financial data
and many more. An enterprise DBMS is designed to manage large number of users, huge
databases, and to integrate & run multiple types of software applications. Enterprise DBMS
offers several important features including a multi-processor support, parallel processing,
distributed transactions, concurrency control, data security, high scalability, and high availability
features such as clustering and replications. Enterprise database management system can directly
communicate with frontend application such as SAP, Oracle E-business, J.D Edwards, Dynamics
AX, Baan and so forth to perform daily operation of many small to large scale organizations.
Some of largest DBMS include Oracle, Microsoft SQL Server, Informix, DB2, and Sybase.
Today Oracle and Microsoft SQL Server are two of the largest DBMS vendors competing for
market share of medium to large organization. Therefore, database is one of the most crucial
entity of software engineering.
Enterprise database management ensures database quality, and performance which serves both
internal and external users. Continuous and growing data volumes, coupled with the legal and
regulatory requirements to save data are impacting application performance and straining
resources of the organizations.
In a typical database management system actual database is stored in a disk. In order to ensure
fault tolerance and higher performance various Redundant Array of Independent Disks- RAID
level are used in software engineering industry. RAID can be defined as redundant array of
independent disk or redundant array of inexpensive disk. Some of the basic Raid level that is
used in EDBMS is listed below.
1. Raid 0: It offers block striping or data striping meaning data will be spread across various
block in a given disk which promotes faster reads/write due to parallel processing. But
Page 2
there is no fault tolerance due to lack of redundancy meaning if a disk which contains
database fails, data will be lost permanently.
2. Raid 1: It offers redundancy of disk or mirroring but there is no data striping. Due to
redundancy feature, it offers faster read but slower write because writing involves 2
writing in 2 disks at the same time. On a positive note it offers fault tolerance due to
mirroring or redundant disk feature.
3. Raid 0+1 or Raid 10: It offers fault tolerance and mirroring. It offers excellent
performance due to parallel reads and writes. It also offers better durability due to its
mirroring feature.
4. Raid5: Just like Raid 0+1, Raid5 offers both fault tolerance and faster read/write access.
It involves parity blocks distributed over all disk.
Most industry EDBMS recommends that we use RAID 10 or at least Raid 5 for parallel
transaction processing and fault tolerance purposes especially for production environment.
In a typical DBMS, main memory/ buffer stores current data in use. Likewise, the actual
databases are stored in disk sub system and the archived data & backup files are stored in offsite
tape. Most Enterprise level Database Management system can do disk space and buffer
management. Data must be in RAM (buffer) for DBMS to operate on it. Most transactions
retrieve their data from main memory or buffer pool where the data pages are cached & flushed
out on periodic basis. Buffer pool acts as a mediator between disk data and transaction manager.
Because reading and writing data to & from disk can be very expensive, data pages stays in main
memory or buffer pool for better access. Therefore, the larger the server memory, the better the
faster the transaction will be. The size of main memory/ Ram is dictated by size of databases on
the server, the amount of transactions throughput, and the size of data pages that gets stored on
the buffer pool. The amount of data pages which get stored in buffer pool is called buffer cache.
Therefore if the buffer cache is larger than main memory or the server RAM, data pages will get
Page 3
flushed out of memory very quickly which will lead to expensive query and lower page life
expectancy of database pages. Low Page life expectancy is a good indicator of memory pressure.
Page life expectancy can be defined as the amount of time data page stays on buffer pool before
it gets flushed from the memory. In Microsoft SQL Server, if the average Page life Expectancy
(PLE) is consistently less than 300, they recommend that we either increase the server memory
or increase query efficiency in order to alleviate memory pressure.
All Enterprise DBMS have database object called Index which is basically an underlining
structure in a database that determines how data is stored and retrieved in a database thereby
expediting the query processing. Indexes are one of the most powerful tools for database
administrator when it comes to troubleshooting slowness and performance tuning. When rows on
indexes are not sorted such tables are called heap. The problem with heap table structure is it
involves table scan for query processing. If a table is small, table scan is not very costly.
However, if a table is substantially large, table scan will lead to high query wait time, blocking
locks and higher disk I/O which will all lead to system slowness. What indexing does is it not
only sort the database table but also avoid table scan by searching/ filtering records in a table
based on user’s query.
Most commercial Database Management Systems have two major types of index: Clustered
Index and Non-Clustered Index. A database table can have only one Clustered index. This is
because in clustered index, the leaf node contains the actual physical data. So, the Clustered
index will map to the physical data pages in tables. In non-clustered index, the logical structure
of index is different from the actual physical structure of the data on tables. Therefore, each table
can have several non-clustered indexes as opposed to single clustered index per table. In
enterprise Database Management system, most indexes come with databases that are generally
Page 4
created by ERP applications. However, during a course of time, as more and more users are
connected to databases and they run different queries/reports, new indexes must be added to
satisfy user’s query.
Clustered indexes are typically created on one key column or column used frequently such as
one that is referenced by where clause. Generally speaking clustered indexes are suitable for
range queries where large amount of data is queried. Also query that uses order by or group by
clause can largely benefit from clustered indexes. Moreover, clustered index can be more
effective when data is accessed sequentially/frequently or when data needs to be sorted
because data is automatically sorted when clustered index is added to a table and doesn’t have
to be re-sorted. Queries that involve small data set will benefit from creation of non-clustered
index. Nonclustered indexes can be added to address queries that are not covered by clustered
index. For a given table, you can have only one clustered index at most. As a good practice,
clustered index should be created before adding non-clustered indexes on a table. This is because
Nonclustered index relies on location of index data within the clustered table (Source: Database
Management Systems- Third Edition: Author RamaKrishnan, Gehrke).
In a database world, transaction is an atomic unit of work. Transactions are called queries when
they request read only access. In other words, read only transaction is called query. Multiple
concurrent transaction can interfere each other when they are both connected to the same
database resource performing either read or write operation. Generally speaking when 2 or more
concurrent transactions involve read operation from the same table; it should not result in
conflict. However, when there is at least one read transaction and multiple write transaction
trying to access same resource or multiple transactions trying to modify the same table, it results
in conflict. In order to maintain isolation level and eliminate transaction conflict, each database
Page 5
management system has feature called lock escalation and concurrency control which basically
regulate transactions in order to reduce conflicts. When multiple concurrent read transactions
access same table in a database they will use Shared lock (SLOCK). Likewise, when there is
multiple read/write transactions, they will each transaction will use exclusive lock in order to
maintain integrity of database. Concurrency control can be applied at various levels that include
filed level, record level, page level, extent level, file level, and database level. In SQL Server
there is 3 type of row versioning to reduce transaction conflicts which are: 1. Read committed
isolation level 2. Read committed snapshot isolation level 3. Snapshot isolation level.
Regardless of which vendors you use for managing your data, any Enterprise Database
Management system must guarantee the following four fundamental properties of transactions
known as ACID properties ad defined below.
1. Atomicity: Transaction is all or nothing. Transaction will both succeed to completion and
commit the transaction. Else it will rollback if the transaction is incomplete or
unsuccessful. The atomicity property of transaction is to ensure that integrity of a
database.
2. Consistency: Database Management System must take a transaction from one consistent
state to another. Transaction should not be in intermediate state. To ensure consistency
and integrity of a database, there are various constraint such as primary key constraint,
not null constraint, check constraint, referential constraint that includes primary &
foreign key.
Page 6
3. Isolation: EDBMS should give illusion to its user as being sole user of a database. In an
enterprise environment, hundreds of thousands of user are connected at the same time.
These users should not interfere with one another.
4. Durability: In Enterprise database Management system, data once committed and written
to disk must stay permanent unless somebody deletes them on purpose. In other words,
data in database should stay permanent unless somebody deletes them.
Most Enterprise Database Management vendors such as IBM, Informix, Oracle, SQL Server uses
a standard relational query language called SQL. SQL is a structured query language originally
developed by IBM in the 1970’s which currently uses the latest SQL -99 standard which
supports procedural construct (such as if, then, else) and Object Oriented constructs such as
inheritance, polymorphism.
A typical database management system has following layers described below.
1. Query optimization & execution layer
Produces most efficient query execution plan based on user queries.
2. Database Operator layer
Implements physical data model operators such as relational operators; select, project,
join.
3. Buffer management
Deals with buffer and manages how databases access it. So basically buffer management
layer partitions the main memory allocated to the DBMS into buffer page frame and
brings data pages to and from disk as requested by file manager.
4. Disk Space Management
Page 7
Supports the file concept to higher layers and supports access paths to the data in those
files such as indexes.
In today’s commercial world the colorful front end by itself wouldn't make any sense
without the robust backend database. The advancement in database has made our lives
easier.
References:
1. RamaKrishnan, Gehrke, “Database Management Systems,” Third Edition, 2009.
2. Microsoft Corporation
http://msdn.microsoft.com/en-us/library/ms174173.aspx
3. University of Pennsylvania
http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
4. Microsoft Corporation
http://www.microsoft.com/sqlserver/en/us/default.aspx
6. Wikipedia
http://en.wikipedia.org/wiki/Enterprise_database_management
http://en.wikipedia.org/wiki/RAID
5. Database Systems: The Complete Book, Stanford University:
http://infolab.stanford.edu/~ullman/dscb.html
Page 8