Data Mart - KV Institute of Management and Information Studies

Introduction - Database management system
• A database management system (DBMS) is software designed to assist in the
maintenance and utilization of large-scale collections of data.
• The DBMS came into existence in the early 1960s, when Charles Bachman
designed the Integrated Data Store (IDS), which is regarded as the first
general-purpose DBMS.
• In the late 1960s IBM brought out IMS, the Information Management System.
• In 1970 Edgar F. Codd at IBM proposed a new database model, the relational
model, on which the RDBMS is based.
• SQL (Structured Query Language) was developed in the 1970s and came into
wide use in the 1980s.
• From 1980 to 1990 there were further advances in DBMS products, e.g. DB2
and Oracle.
Database Management System

A Database Management System (DBMS) is a
collection of programs that enables users to create
and maintain a database.

The DBMS is hence a general-purpose software
system that facilitates the process of defining,
constructing and manipulating databases for
various applications.

A DBMS is efficient to use because it offers a wide
variety of sophisticated techniques to store and
retrieve data.
Organisation of DBMS

The data may be logically organized into

Characters.

Fields.

Records.

Files and

Database.
Characteristics of DBMS

Non Redundant Data

It avoids unnecessary duplication of data and effectively reduces the
total amount of data storage required.

Sharing Data

A database allows the sharing of data under its control by any number
of application programs or users.

Data Integrity

Data integrity means that the data contained in the database is both
accurate and consistent.
Data Security
Data is of vital importance to an organization and may be
confidential.
The DBA, who has the ultimate responsibility for the data in
the DBMS, can ensure that proper access procedures are followed,
including proper authentication schemes for access to the DBMS
and additional checks before permitting access to sensitive data.
Conflict Resolution
Since the database is under the control of the DBA, she or he
should resolve the conflicting requirements of various users and
applications.
Disadvantages of DBMS
• Unauthorized access.
• Threat of failure.
• Need to control data quality.
• Threat to data integrity.
• Enterprise vulnerability.
• Cost of using DBMS.
COMPONENTS OF DBMS
• DML precompiler.
• DDL compiler.
• File manager.
• Database manager.
• Query processor.
• Database administrator.
• Data dictionary.
• Storage manager.
• Database users.

Architecture of DBMS

Three Level Database Architecture

Data are actually stored as bits, or numbers and strings, but it is
difficult to work with data at this level.

It is necessary to view data at different levels of abstraction.

There are three levels, or layers, of DBMS architecture:

• External Level

•Conceptual Level

• Internal Level
Architecture of a DBMS Layers
External Level

The external level is the view that the
individual user of the database has.

This view is often a restricted view of the
database and the same database may
provide a number of different views for
different classes of users.
Conceptual Level

The conceptual view is the overall
community view of the database and it
includes all the information that is going to be
represented in the database.

The conceptual view is defined by the
conceptual schema which includes definitions
of each of the various types of data.
Internal Level

The internal view describes the actual
physical storage of data.

It tells us what data is stored in the database and
how it is stored.

The following aspects are considered at this level:

Storage allocation

Access paths

Miscellaneous
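
The difference between the conceptual and external levels can be sketched with a database view. This is only an illustration using Python's built-in sqlite3 module; the table `employee`, the view `employee_directory`, and the sample rows are hypothetical and do not come from the slides. The internal level (files, pages, access paths) is handled by SQLite itself and is not visible in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one table that holds every attribute of an employee.
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Vikram", "Finance", 61000.0)],
)

# External level: a restricted view for users who must not see salaries.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT name, dept FROM employee"
)

directory_rows = conn.execute(
    "SELECT * FROM employee_directory ORDER BY name"
).fetchall()
print(directory_rows)
```

Other classes of users could be given other views over the same table, which is the sense in which one database provides several external views.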
Categories of Data Model
• Record Based Models: Relational, Network, Hierarchical
• Object Based Models: Entity-Relationship model, Object-Oriented model
Relational database management system (RDBMS) Model

This model represents data and relationships among
data by a collection of tables known as relations, each of
which has a number of columns with unique names.

Example: consider the following wage table.

name    hours    rate    total
Raju    40       10      400
Sabi    38       8.75    332.50
Ram     42       9.25    388.50
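
The wage table can be tried out as an actual relation. The sketch below uses Python's built-in sqlite3 module; the table name `wage` and the column name `name` for the first column are assumptions, since the slide does not label them. Instead of storing `total`, the query derives it as hours times rate, which reproduces the totals in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relation is a table whose columns have unique names.
conn.execute("CREATE TABLE wage (name TEXT PRIMARY KEY, hours REAL, rate REAL)")
conn.executemany(
    "INSERT INTO wage VALUES (?, ?, ?)",
    [("Raju", 40, 10), ("Sabi", 38, 8.75), ("Ram", 42, 9.25)],
)

# Derive the total column from the other two columns.
wage_rows = conn.execute(
    "SELECT name, hours * rate AS total FROM wage ORDER BY total DESC"
).fetchall()

for name, total in wage_rows:
    print(name, total)
```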
Concepts of RDBMS Model

E. F. Codd of IBM proposed the relational model in
1970.

Some of the basic concepts of the relational model are:

The relational database is a collection of two-dimensional
tables.

Each table represents some real-world person, place, thing,
or event about which information is collected.

The organization of data into relational tables is known as the
logical view of the database.
Advantages of RDBMS

Ease of use

Flexibility

Precision

Security

Data independence

Data manipulation language.
Disadvantages of RDBMS

A major constraint, and therefore disadvantage,
in the use of a relational database system is
machine performance.

If the number of tables between which
relationships are to be established is large and the
tables themselves are voluminous, the
performance in responding to queries is
degraded.
Network database management system (NDBMS)

This model represents data by collections of records, and relationships
among data are represented by links, which can be viewed as pointers.

3 Basic Components:

Record type: it represents a finite number of entities of a similar type.

Data elements: entities are distinguished by the values of the data
elements with which the corresponding record type is associated.

Links: all relationships between the same or different record types are
restricted to binary, many-one relationships. These many-one
relationships are called links.
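
The three components can be sketched in a few lines of Python, with object references playing the role of pointers. The record types `Teacher` and `Course`, the helper `link`, and the direction of the link (many courses to one teacher) are assumptions made for illustration, not a definition taken from the slides.

```python
class Teacher:
    """Owner record type; its data element is the teacher's name."""

    def __init__(self, name):
        self.name = name
        self.courses = []      # links to the member records


class Course:
    """Member record type; each course links to at most one teacher."""

    def __init__(self, title):
        self.title = title
        self.teacher = None    # many-one link, held as a reference (pointer)


def link(course, teacher):
    """Create the many-one link between a course and its teacher."""
    if course.teacher is not None:
        course.teacher.courses.remove(course)   # a course keeps only one owner
    course.teacher = teacher
    teacher.courses.append(course)


rao = Teacher("Rao")
dbms = Course("DBMS")
sql = Course("SQL")
link(dbms, rao)
link(sql, rao)
print([c.title for c in rao.courses], dbms.teacher.name)
```

Following `rao.courses` or `dbms.teacher` is navigation along a link, which is how data is reached in a network database.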
Type level view in the network model

Many-one: Teachers and courses
Many-many: employees and project, connected through Work_in
Advantage of NDBMS model

Conceptual simplicity

Capability to handle more relationship types.

Ease of data access.

Data integrity.

Data independence.
Disadvantage of NDBMS model

System complexity.

Operational anomalies.

Absence of structural independence.
Hierarchical Database Management
System (HDBMS) Model
• This model is similar to the network model in the sense that data and
relationships among data are represented by records and links
respectively.
• The hierarchical data model uses tree structures to represent relationships
among records.
• A parent record can have many child records, but a child record can have
only one parent.
• There are no many-to-many relationships between records.
• No dependent record within a hierarchical data structure can exist without its
parent record.

A hierarchical database therefore consists of a
collection of records, which are connected with
each other through links.

Each record is a collection of fields (attributes),
each of which contains one data value.

A link is an association between precisely two
records.
Example: consider the employee hierarchy.

Root: Employee
First-level children: Compensation, Job Assignment, Benefits
Second-level children: Rating, Salary, Pension, Insurance, Health
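
A minimal Python sketch of such a tree follows. The class `Record` and its `path` method are hypothetical names, and placing Pension under Benefits is an assumption, since the transcript does not say which parent each second-level record belongs to. The constructor enforces the rule that a child has exactly one parent and cannot be created without it.

```python
class Record:
    """A record in a hierarchy: one parent (or none for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)   # the link between exactly two records

    def path(self):
        """Return the names from the root down to this record."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))


employee = Record("Employee")                       # root
compensation = Record("Compensation", employee)     # first-level children
job_assignment = Record("Job Assignment", employee)
benefits = Record("Benefits", employee)
pension = Record("Pension", benefits)               # assumed parent

print(pension.path())
```

Because every record stores a single `parent`, a many-to-many relationship cannot be expressed directly in this structure.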
Advantages of HDBMS Model

It is a simple, straightforward and natural method
of implementing record relationships.

Disadvantages of HDBMS Model

It cannot represent all the relationships that occur
in the real world.

It is used only when there is a hierarchical
character in the concerned database.
Concurrency Management

In computer science, concurrency is a property of
systems in which several computations are
executing simultaneously, and potentially interacting
with each other.

Concurrency control methods are required to ensure
that transaction updates do not result in an incorrect
execution.

E.g. the update of one transaction overwrites another's
update.

Concurrency control ensures both consistency and isolation.
Reasons for Concurrency Management

Concurrency management is used
for the following reasons:

To improve throughput and resource
utilisation.

To reduce waiting time.
Methods to avoid concurrency problems

The problem of concurrent access can be
solved in a number of ways. Some of them
are as follows:

Locking file

Locking record

Locking data field

Versioning.
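
Two of these methods, record locking and versioning, can be combined in a small sketch. The class `VersionedRecord` and the balance figures are hypothetical. Each write states which version it was based on, and a write based on a stale version is refused instead of silently overwriting the other update.

```python
import threading


class VersionedRecord:
    """A record guarded by a record-level lock and a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()   # record-level lock

    def read(self):
        with self._lock:
            return self.value, self.version

    def write(self, new_value, expected_version):
        """Apply the update only if nobody else has written since the read."""
        with self._lock:
            if self.version != expected_version:
                return False            # stale read: reject the update
            self.value = new_value
            self.version += 1
            return True


balance = VersionedRecord(100)
value_a, version_a = balance.read()     # transaction A reads
value_b, version_b = balance.read()     # transaction B reads the same version

first_ok = balance.write(value_a + 50, version_a)    # A commits first
second_ok = balance.write(value_b - 30, version_b)   # B is based on stale data
print(first_ok, second_ok, balance.read())
```

In this sketch transaction B would have to read again and retry, which is what prevents the lost update described earlier.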
Data Warehouse

A data warehouse is supposed to be a
place where data gets stored so that
applications can access and share it easily.

A data warehouse is itself a database, but it
contains summarized information.
A data warehousing system
[Figure labels: various company database; data warehouse software; data warehouse database; information discovery]
Features of Data Warehousing

A common way of introducing data warehousing is
to refer to the characteristics of a data warehouse,
which are:

• Subject oriented
• Integrated
• Time variant
• Nonvolatile
Warehouse data modeling levels

There are three levels of data modeling:

Physical.

Logical.

Data Mart.

Each level of data modeling has its own
purpose in data warehouse design.
Data Mart

A data mart is a simple form of a data
warehouse that is focused on a single
subject (or functional area), such as Sales,
Finance, or Marketing.

Data marts are often built and controlled
by a single department within an
organization.
What Are the Steps in Implementing a
Data Mart?


The major steps in implementing a data mart are:

"Designing"

"Constructing"

"Populating"

"Accessing"

"Managing"

That is: design the schema, construct the physical
storage, populate the data mart with data from source systems,
access it to make informed decisions, and manage it over time.
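
The first four steps can be walked through on a toy scale with Python's built-in sqlite3 module. Everything named here (the `sales_fact` table, the `source_orders` rows, the regions and products) is hypothetical. A real data mart would load from operational systems and would need the fifth step, ongoing management, which is not shown.

```python
import sqlite3

# Hypothetical rows extracted from a source system: (region, product, amount).
source_orders = [
    ("North", "Pen", 120.0),
    ("North", "Book", 300.0),
    ("South", "Pen", 80.0),
]

mart = sqlite3.connect(":memory:")

# Designing and constructing: decide on the schema and create the storage.
mart.execute("CREATE TABLE sales_fact (region TEXT, product TEXT, amount REAL)")

# Populating: load the data taken from the source system.
mart.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", source_orders)
mart.commit()

# Accessing: summarize the subject area to support a decision.
sales_by_region = mart.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)
```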
Types of Data mart

Multidimensional database.

It supports the management capability of
analytically looking at the same data in
different ways.

Relational OLAP

It contains both numeric and textual data.

It serves a much wider purpose than the
multidimensional database.
Query processing

Query processing is the procedure of
transforming a high-level query (like SQL) into
a correct and efficient execution plan expressed
in a low-level language that performs the required
retrieval and manipulation in the database.
Steps in query processing

[Figure: flow of query processing]
High-level query language (SQL)
-> Scanning & parsing (syntax analyzer)
-> Query decomposer
-> Algebraic expression
-> Query optimizer
-> Execution plan
-> Query code generator / query interpreter
-> Code to execute the query
-> Runtime database processor
Steps in query processing

1. Syntax analysis

An SQL query is analyzed and the server
produces either a parse tree for the syntax
or a syntax error.

When a statement is parsed, the
information necessary for its execution is
loaded into the statement cache.
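
The "parse tree or syntax error" outcome can be observed with SQLite through Python's built-in sqlite3 module. This is a sketch of one engine's behaviour, not of every server. The helper `check_syntax` and the table `wage` are hypothetical. Prefixing the statement with EXPLAIN makes SQLite parse and compile it without running the query itself. Note that the same exception type is also raised for other compile-time problems, such as a missing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wage (name TEXT, hours REAL, rate REAL)")


def check_syntax(sql):
    """Return None if SQLite can compile the statement, else the error text."""
    try:
        conn.execute("EXPLAIN " + sql)   # parse and compile only
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


good = check_syntax("SELECT name FROM wage")
bad = check_syntax("SELEC name FROM wage")    # misspelled keyword
print(good)
print(bad)
```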
2. Query decomposition

It is a phase of query processing whose aim is to transform a
high-level query into a relational algebra query.

It also checks whether the query is syntactically and semantically
correct.

The query decomposer works in five stages:

Query analysis,

Query Normalization,

Semantic Analysis,

Query Simplifier,

Query Restructuring.
Steps in Query Decomposition

[Figure: stages of query decomposition]
SQL
-> Query analysis
-> Query normalization
-> Semantic analysis
-> Query simplifier
-> Query restructuring
-> Algebraic expression
Supporting inputs shown in the figure: equivalence rules, data dictionary,
idempotency rules, transformation rules.
(i) Query analysis

During the query analysis phase, the query is
lexically and syntactically analyzed in order
to find any syntax errors.

A syntactically legal query is then validated
to ensure that all the database objects
(relations and attributes) referred to by the
query are defined in the database.
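
The validation step amounts to looking up each referenced name in the data dictionary. The sketch below assumes a deliberately simplified dictionary, a plain Python dict from relation name to attribute names, and a hypothetical helper `undefined_objects`. A real DBMS keeps far richer catalog information.

```python
# Hypothetical data dictionary: relation name -> set of attribute names.
data_dictionary = {"wage": {"name", "hours", "rate"}}


def undefined_objects(relation, attributes):
    """List the referenced objects that the data dictionary does not define."""
    if relation not in data_dictionary:
        return [relation]
    known = data_dictionary[relation]
    return [attr for attr in attributes if attr not in known]


print(undefined_objects("wage", ["name", "rate"]))      # every name is defined
print(undefined_objects("wage", ["name", "bonus"]))     # unknown attribute
print(undefined_objects("payroll", ["name"]))           # unknown relation
```

A query passes this stage only when the returned list is empty.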
(ii) Query Normalisation

The primary goal of the normalisation phase is
to avoid redundancy (duplicated or
insufficient data).

Normalisation converts the query into a
normalised form, typically conjunctive or
disjunctive normal form, that can be more easily
manipulated.
(iii) Semantic Analysis
• The main objective is to reduce the number
of predicates that must be evaluated by
rejecting incorrect or contradictory ones.
• The semantic analyzer rejects
normalized queries that are incorrectly
formulated or contradictory.
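
A contradictory query is one whose conditions can never all be true, for example salary > 10 AND salary < 5. The sketch below detects only this narrow case: strict comparisons of a single numeric attribute against constants, joined by AND. The function name `is_contradictory` and the condition format are assumptions for illustration. A real semantic analyzer covers many more cases.

```python
import math


def is_contradictory(conditions):
    """conditions: list of (op, constant) pairs on one attribute, joined by AND.

    Only the strict operators ">" and "<" are supported in this sketch.
    """
    low, high = -math.inf, math.inf
    for op, value in conditions:
        if op == ">":
            low = max(low, value)      # tightest lower bound so far
        elif op == "<":
            high = min(high, value)    # tightest upper bound so far
        else:
            raise ValueError("unsupported operator: " + op)
    # No value x satisfies low < x < high when low >= high.
    return low >= high


print(is_contradictory([(">", 10), ("<", 5)]))
print(is_contradictory([(">", 5), ("<", 10)]))
```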
(iv) Query simplifier

The objectives of a query simplifier are to
detect redundant qualifications, eliminate
common sub-expressions and transform
sub-graphs (queries) to semantically
equivalent but more easily and efficiently
computed forms.
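
Redundant qualifications are commonly removed with idempotency rules of Boolean algebra such as p AND p = p, p AND TRUE = p and p AND FALSE = FALSE. The sketch below applies just these three rules to a conjunction. Predicates are plain strings compared textually, which is a simplifying assumption, and `simplify_conjunction` is a hypothetical helper name.

```python
def simplify_conjunction(predicates):
    """Simplify a list of predicates that are joined by AND."""
    result = []
    for pred in predicates:
        if pred == "FALSE":
            return ["FALSE"]           # p AND FALSE = FALSE
        if pred == "TRUE":
            continue                   # p AND TRUE = p
        if pred in result:
            continue                   # p AND p = p
        result.append(pred)
    return result or ["TRUE"]          # an empty conjunction is TRUE


print(simplify_conjunction(["dept = 'Sales'", "dept = 'Sales'", "TRUE"]))
print(simplify_conjunction(["dept = 'Sales'", "FALSE"]))
```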
(v) Query Restructuring

The query can be restructured to give a
more efficient implementation.

Transformation rules are used to convert
one relational algebra expression into an
equivalent form that is more efficient.
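
One well-known transformation rule is to perform a selection before a join rather than after it. The sketch below uses plain Python lists as relations. The sample data and the function names are hypothetical. Both functions return the same rows, but the second joins against a smaller intermediate relation.

```python
employees = [("Asha", 1), ("Vikram", 2), ("Meera", 1)]    # (name, dept_id)
departments = [(1, "Sales"), (2, "Finance")]              # (dept_id, dept_name)


def join_then_select():
    """Join every employee with its department, then keep the Sales rows."""
    joined = [
        (name, dept_id, dept_name)
        for (name, dept_id) in employees
        for (d_id, dept_name) in departments
        if dept_id == d_id
    ]
    return [row for row in joined if row[2] == "Sales"]


def select_then_join():
    """Keep only the Sales department first, then join (selection pushed down)."""
    sales_only = [(d_id, d_name) for (d_id, d_name) in departments if d_name == "Sales"]
    return [
        (name, dept_id, dept_name)
        for (name, dept_id) in employees
        for (d_id, dept_name) in sales_only
        if dept_id == d_id
    ]


print(join_then_select() == select_then_join())
```

With three employees and two departments the first form compares six pairs and the second only three. The saving grows with the size of the tables.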
3. Query optimizer

Performs optimization by substituting
equivalent expressions for those in the query.

4. Query code generator

Generates the code for the query.

5. Runtime database processor

Once each candidate plan has been estimated and the
optimal plan selected, execution takes place here.
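
The effect of plan selection can be observed in SQLite, whose EXPLAIN QUERY PLAN statement reports the plan chosen for a query. The sketch below uses Python's built-in sqlite3 module; the table, the index name and the helper `plan_details` are hypothetical. The exact wording of the plan text differs between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
query = "SELECT name FROM employee WHERE dept = 'Sales'"


def plan_details(sql):
    """Return the descriptive text of each step in SQLite's chosen plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan_details(query)      # no index exists yet

# Give the optimizer an alternative access path and ask again.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_after = plan_details(query)

print(plan_before)
print(plan_after)
```

The same SQL text yields a different plan once the index is available, which is the optimizer choosing between equivalent ways of answering the query.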