Overview of Relational Database
What is Microsoft SQL Server?
• SQL Server is a Database Management System (DBMS) sold
by the Microsoft Corporation.
• It is one of the most popular DBMS products in the world.
Principal Editions of SQL Server
• Enterprise
• Business Intelligence
• Standard
• SQL Server Web
• Breadth editions
Principal Editions of SQL Server
• Enterprise
• Delivers comprehensive high-end data center capabilities with fast
performance.
• Unlimited virtualization.
• End-to-end business intelligence
• Enabling high service levels for mission-critical workloads and end user access to data
insights.
Business Intelligence
• Delivers a platform empowering organizations to build and
deploy secure, scalable and manageable Business intelligence
(BI) solutions.
• BI refers to data transformed into knowledge that can then be used to make
more informed business decisions.
Standard
• Delivers basic data management and business intelligence
database for departments and small organizations
• Enabling effective database management with minimal IT resources.
Other Editions of SQL Server
• SQL Server Web
• A specialized edition of SQL Server that targets business workloads.
• A low total-cost-of-ownership option for Web hosters and Web VAPs to provide
scalability and affordability.
• Manageability capabilities for small to large scale Web properties.
Breadth Editions of SQL Server.
• These editions are engineered for specific customer scenarios
and are offered FREE or at a very nominal cost.
Breadth editions of SQL Server.
• Developer
• Lets developers build any kind of application on top of SQL Server.
• It includes all the functionality of Enterprise edition.
• Is licensed for use as a development and test system.
• Not a production server.
• Express
• An entry-level, free database and ideal for learning and building desktop and
small server data-driven applications.
• It is a good choice for
• Independent software vendors.
• Developers.
• Hobbyists building client applications.
Data Persistence
• Computer programs process data in the main memory of a
computer.
• The changes made to data in main memory are temporary
• Do not persist beyond the execution of the program.
To preserve changes to data.
• The data must be written to disk.
There are two common approaches to data
persistence:
• Use simple operating system files
• Use a database
Simple operating system files are common to
• Spreadsheets.
• Word processors.
• Other productivity software.
The organization of such files is based on a
record
• A collection of data items.
Using simple operating system files
• Any record for any entity such as an employee has no defined
association with any record for another entity such as a
department.
• Users or programmers may recognize an association.
• It is not made explicit in the files.
Files and Databases
• A database management system differs fundamentally from a simple file system.
• A file is a collection of data items.
• A database is a collection of data items.
• A database also captures relationships between data items.
The manner in which relationships are captured
by a database depends on the type of system.
• Traditional network and hierarchical databases represent
relationships via pointers.
• Relational databases represent relationships taking a more
value-based approach via foreign keys.
Relationships have different cardinality
• One to One (1:1)
• One to Many (1:M)
• Many to Many (M:M)
Foreign key
• A relational database can directly represent a 1:1 or 1:M
relationship using a foreign key.
• An M:M relationship is similar to two 1:M relationships and is
represented in a separate table.
• A database explicitly captures such relationships.
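The foreign-key approach above can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the department, employee, project and assignment tables and their data are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (dept_id TEXT PRIMARY KEY, dname TEXT NOT NULL);

-- 1:M: each employee row carries the key value of its one department.
CREATE TABLE employee (
    emp_no  TEXT PRIMARY KEY,
    ename   TEXT NOT NULL,
    dept_id TEXT REFERENCES department(dept_id)
);

CREATE TABLE project (proj_id TEXT PRIMARY KEY, pname TEXT NOT NULL);

-- M:M: a separate table holding two foreign keys (two 1:M relationships).
CREATE TABLE assignment (
    emp_no  TEXT REFERENCES employee(emp_no),
    proj_id TEXT REFERENCES project(proj_id),
    PRIMARY KEY (emp_no, proj_id)
);

INSERT INTO department VALUES ('D1', 'Sales'), ('D2', 'Research');
INSERT INTO employee VALUES ('E1', 'Ann', 'D1'), ('E2', 'Bob', 'D1'), ('E3', 'Cy', 'D2');
INSERT INTO project VALUES ('P1', 'Website'), ('P2', 'Audit');
INSERT INTO assignment VALUES ('E1', 'P1'), ('E1', 'P2'), ('E3', 'P1');
""")

# Relationships are followed by matching values, not by pointers.
staff_by_dept = conn.execute("""
    SELECT d.dname, COUNT(e.emp_no)
    FROM department d LEFT JOIN employee e ON e.dept_id = d.dept_id
    GROUP BY d.dname ORDER BY d.dname
""").fetchall()

projects_of_ann = conn.execute("""
    SELECT p.pname FROM assignment a
    JOIN project p ON p.proj_id = a.proj_id
    WHERE a.emp_no = 'E1' ORDER BY p.pname
""").fetchall()

print(staff_by_dept)
print(projects_of_ann)
```

The assignment table is what makes the M:M relationship explicit: each of its rows pairs one employee key value with one project key value.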
A database is a collection of related data and
has the following implicit properties
• Represents some aspect of the real world
• Consists of a logically coherent collection of data with some
inherent meaning
• Designed for a specific purpose - an intended group of users
with some preconceived applications
A database management system (DBMS)
• Computer program (or collection of programs) which
• Manages a database and acts as an interface between application
programs and front-end tools which access the database.
The DBMS acts as an interface to data
stored in the database.
• Regardless of whether the “front-end” program is
• Running on the same machine as the DBMS (i.e. Server)
• On a different machine (i.e. Client).
Front-end Program
• The “front-end” program communicates its requests for data
through the DBMS.
• Programs do not directly access the operating system files that contain the data
entrusted to the DBMS.
• The front-end program could be an interactive interface for executing database commands
(e.g. Server Management Studio)
• It could be an application program that contains embedded database commands (e.g.
Visual Basic).
DBMS Responsibilities
• The DBMS adds a layer of software between the application
and the stored data.
• This layer adds overhead.
• DBMS performs many important functions.
A DBMS provides all of the following.
• Security
• Prevents unauthorized access to data
• Integrity
• Enforces business rules
• Concurrency
• Controls data access by multiple users
• Consistency
• Supports transaction processing
• Recovery
• Mechanisms to restore the database after a transaction, database, or media failure
RDBMS
• Many popular database systems are based on the relational
model.
• A DBMS based on the relational model is said to be an RDBMS.
Relational Database Systems
• DB2 UDB (IBM)
• Oracle Database (Oracle)
• SQL Server (Microsoft)
The relational model establishes criteria for
• The (logical) structure of the data within the DBMS.
• How users see the data
• The language used by the DBMS.
• How users operate on the data
• Database integrity.
• The enforcement of relationships and business rules
A logical view of data
• Does not impose any restrictions on how that data may be
physically stored or accessed.
Relational Data Structure
• Data are perceived as a collection of tables.
• Each table consists of columns and rows.
Data Processing Terminology (1 of 7)
• Table
• Row
• Column
• Relations
Table (2 of 7)
• A table is similar to a file
• A program reads from or writes to a file.
• When accessing a relational database, a program directs its access to a table.
Row (3 of 7)
• A row is similar to a record
• A record is a collection of data items that describe some entity.
• Its counterpart is a row.
Column (4 of 7)
• A column is similar to a field or data item in a record.
• Each column must have a unique name within its table.
• Each column contains a single ("atomic") data item.
• This means the DBMS treats that data item as a single unit of data.
• All values in a column must be of the same data type.
Column (5 of 7)
• The left-to-right order of columns is not important in the
relational model.
• How the data values are physically stored does not impose restrictions on how
we can retrieve that data in a query.
• This behavior is known as data independence.
• We will see that we can return the column data in any left-to-right order we
desire, regardless of the physical order in which these values are stored.
Row (6 of 7)
• Each row contains a single entry for each column.
• Each row-column intersection in a table holds exactly one value of some data
type.
• Rows do not have names and can appear in any order.
Relations (7 of 7)
• The relational model is based on mathematical relations or sets.
• A table is a computer representation of a relation.
• A computer representation may not always be completely
faithful to the object it represents (here, a mathematical set).
Relational Language
• SQL (Structured Query Language) is the standard relational
language used with Oracle and almost every RDBMS.
Relational Language
• SQL has two key features that distinguish it from conventional
third generation (3GL) languages such as COBOL and
languages used by traditional hierarchical and network DBMS
products.
• SQL is non-procedural
• SQL supports set-level processing
Non-procedural
• Non-procedural means that a SQL statement indicates which
rows and columns of a table are to be retrieved
• Does not specify how they are to be retrieved.
• Declarative language
• Query optimizer.
• This component makes the decisions of how to access the data based on what the user has
requested as well as characteristics of the data stored in the database.
Set-level processing
• A SQL statement can refer to many rows (a set of rows).
• Provides greater expressive power than traditional languages which could only
reference a single record at a time.
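A short sketch of set-level, non-procedural processing follows. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server; the employee table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE employee (emp_no TEXT PRIMARY KEY, dept_id TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E1", "D1", 1000.0), ("E2", "D1", 2000.0), ("E3", "D2", 3000.0)],
)

# One declarative statement: it states which rows change, not how to find them,
# and it applies to the whole set of matching rows at once.
cursor = conn.execute("UPDATE employee SET salary = salary * 2 WHERE dept_id = 'D1'")
rows_changed = cursor.rowcount

salaries = conn.execute("SELECT emp_no, salary FROM employee ORDER BY emp_no").fetchall()
print(rows_changed, salaries)
```

A record-at-a-time language would need an explicit loop that reads each employee record, tests the department, and rewrites the record; here the DBMS decides how to locate the rows.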
Relational Language
• Classified according to three categories:
• Data Definition (DDL)
• Data Control (DCL)
• Data Manipulation (DML)
Data Definition (DDL)
• DDL statements are used to
• Create.
• Drop.
• Modify
• DDL operations are generally performed by the database
administrator (DBA).
Data Control (DCL)
• DCL statements are used to grant or revoke privileges on
database objects.
• DCL operations are generally performed by the DBA or security
administrator.
Data Manipulation (DML)
• DML statements are used to
• Retrieve.
• Insert.
• Delete.
• Update.
• DML operations may be performed by
• Application developers.
• Business analysts.
• The DBA.
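The DDL and DML categories above can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the course table is hypothetical. SQLite has no GRANT or REVOKE, so DCL is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)

# DDL: create and modify a table definition.
conn.execute("CREATE TABLE course (course_no TEXT PRIMARY KEY, cname TEXT NOT NULL)")
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER DEFAULT 3")

# DML: insert, update, delete, retrieve.
conn.execute("INSERT INTO course (course_no, cname) VALUES ('C1', 'Databases')")
conn.execute("INSERT INTO course (course_no, cname) VALUES ('C2', 'Networks')")
conn.execute("UPDATE course SET credits = 4 WHERE course_no = 'C1'")
conn.execute("DELETE FROM course WHERE course_no = 'C2'")
remaining = conn.execute("SELECT course_no, cname, credits FROM course").fetchall()
conn.commit()

# DDL again: drop the table.
conn.execute("DROP TABLE course")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(remaining, tables_left)
```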
Indexes
• Allows efficient retrieval of the data stored in a table.
• An index might also be used to enforce an integrity constraint.
How many indexes
• A table can have
• One index.
• Many indexes.
• No indexes.
Index (1 of 3)
• The DBMS automatically updates an index when
• A new row is added to a table.
• An existing row is deleted from a table.
• An existing row is updated.
Index (2 of 3)
• An index can also be used to enforce data integrity.
• Enforcing uniqueness of values in a column.
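As a rough sketch of an index enforcing uniqueness, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (the student table and the index name ux_student_email are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE student (stud_no TEXT, email TEXT)")

# A unique index: the DBMS maintains it on every insert, update and delete,
# and rejects any change that would create a duplicate value.
conn.execute("CREATE UNIQUE INDEX ux_student_email ON student (email)")
conn.execute("INSERT INTO student VALUES ('S1', 'ann@example.com')")

try:
    conn.execute("INSERT INTO student VALUES ('S2', 'ann@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, row_count)
```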
Database Integrity (1 of 2)
• Three categories of database integrity:
• Entity Integrity
• Referential Integrity
• User-Defined Integrity
Database Integrity (2 of 2)
• SQL Server uses:
• PRIMARY KEYs (entity integrity)
• FOREIGN KEYs (referential integrity)
There are two ways that an RDBMS can
support integrity:
• Declarative integrity
• Business rules enforced as constraints defined as part of the definition of the
table
• Procedural integrity
• Business rules enforced programmatically through database triggers
Entity Integrity
• Entities are distinct and identifiable.
• An entity is typically represented by a row in a table.
Entity integrity
• The primary key:
• Must be a unique column (or group of columns)
• Must not be null
• A table can contain only one primary key constraint.
• Cannot exceed 16 columns or a total key length of 900 bytes.
Primary Key Index
• Enforces entity integrity.
• Cannot cause the number of indexes on the table to
• Exceed 999 non-clustered indexes AND
• 1 clustered index.
Referential integrity (1 of 3)
• Referential integrity pertains to an entity instance referencing
another "valid" entity instance.
• The values of the foreign key are constrained by the values of
the referenced primary key.
Referential integrity (2 of 3)
• A foreign key includes a DELETE RULE to inform the system on
how to handle an attempted deletion of a referenced row.
• CASCADE
• SET NULL
• NO ACTION
Referential integrity (3 of 3)
• SQL Server does not automatically create a corresponding index on the columns of a foreign key.
• Manually create an index on the columns of a foreign key.
Parent/Child
• The system either accepts or rejects the child based on the
existence of a referenced parent row.
DELETE Rule (1 of 3)
• A delete rule of CASCADE means the parent row and all child
rows will be deleted.
• could extend to grandchildren and beyond
DELETE Rule (2 of 3)
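• NO ACTION
• An attempted deletion of a parent row (that is the target of one or more child
references) will be rejected by the system.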
DELETE Rule (3 of 3)
• SET NULL
• An attempted deletion of a parent row (that is the target of one or more child
references) will be allowed by the system.
• The child row will not be deleted.
• The foreign key value of the referencing child rows will be set to a special value called
NULL.
• The NULL value means that the child has no parent.
• This is a different situation from the case where a child row “references” a non-existent parent
row.
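The three delete rules can be compared side by side. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite accepts the same ON DELETE keywords); the department and employee tables are hypothetical.

```python
import sqlite3

def run_delete(rule):
    """Delete a referenced parent row under the given rule; return outcome and child rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE department (dept_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE employee (emp_no TEXT PRIMARY KEY, "
        f"dept_id TEXT REFERENCES department(dept_id) ON DELETE {rule})"
    )
    conn.execute("INSERT INTO department VALUES ('D1')")
    conn.execute("INSERT INTO employee VALUES ('E1', 'D1')")
    try:
        conn.execute("DELETE FROM department WHERE dept_id = 'D1'")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    children = conn.execute("SELECT emp_no, dept_id FROM employee").fetchall()
    conn.close()
    return outcome, children

results = {rule: run_delete(rule) for rule in ("CASCADE", "SET NULL", "NO ACTION")}
for rule, (outcome, children) in results.items():
    print(rule, outcome, children)
```

Under CASCADE the child row disappears with its parent, under SET NULL the child survives with a NULL foreign key, and under NO ACTION the parent deletion is refused.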
User-Defined Integrity
• CHECK
• CHECK identifies a condition which must be satisfied by a column value for
the insertion of a new row or for the update of an existing row to be accepted
by the system.
• UNIQUE
• UNIQUE indicates that a column value must be unique within a table.
• A unique index is automatically created by the system to enforce a UNIQUE constraint.
• NOT NULL and DEFAULT
• clauses ensure that null values will not be stored in a table.
Triggers
• Triggers provide a means to implement procedural integrity.
• A trigger can be used to support complex business rules that could not be
implemented within the framework of declarative integrity.
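A small sketch of procedural integrity follows. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, so the trigger uses SQLite syntax rather than Transact-SQL; the rule "a salary may never be reduced" and the name trg_no_salary_cut are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.executescript("""
CREATE TABLE employee (emp_no TEXT PRIMARY KEY, salary NUMERIC NOT NULL);

-- Hypothetical rule: a salary may never be reduced. A CHECK constraint cannot
-- express this because it needs both the old and the new value of the row.
CREATE TRIGGER trg_no_salary_cut
BEFORE UPDATE OF salary ON employee
FOR EACH ROW
WHEN NEW.salary < OLD.salary
BEGIN
    SELECT RAISE(ABORT, 'salary may not be reduced');
END;

INSERT INTO employee VALUES ('E1', 5000);
""")

conn.execute("UPDATE employee SET salary = 5500 WHERE emp_no = 'E1'")  # a raise is allowed

try:
    conn.execute("UPDATE employee SET salary = 4000 WHERE emp_no = 'E1'")
    cut_rejected = False
except sqlite3.DatabaseError:
    cut_rejected = True

final_salary = conn.execute("SELECT salary FROM employee").fetchone()[0]
print(cut_rejected, final_salary)
```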
CONSTRAINT Example
• A constraint can be assigned a name with the CONSTRAINT
clause.
CREATE TABLE EMPLOYEE
(EMP_NO  CHAR(5)      NOT NULL,
 ENAME   CHAR(30)     NOT NULL,
 ESALARY NUMERIC(7,2) CONSTRAINT CK_ESALARY CHECK(ESALARY > 0),
 DEPTID  VARCHAR(4),
 CONSTRAINT PK_EMPLOYEE PRIMARY KEY(EMP_NO),
 CONSTRAINT FK_DEPTID FOREIGN KEY(DEPTID) REFERENCES DEPARTMENT)
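To see these declarative constraints at work, the EMPLOYEE definition can be run against a few test rows. This is a sketch only, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the DEPARTMENT table definition and all row values are assumptions, since the slides do not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Assumed parent table; the slides do not show its definition.
CREATE TABLE DEPARTMENT (DEPTID VARCHAR(4) PRIMARY KEY);
INSERT INTO DEPARTMENT VALUES ('D10');

CREATE TABLE EMPLOYEE
(EMP_NO  CHAR(5)      NOT NULL,
 ENAME   CHAR(30)     NOT NULL,
 ESALARY NUMERIC(7,2) CONSTRAINT CK_ESALARY CHECK(ESALARY > 0),
 DEPTID  VARCHAR(4),
 CONSTRAINT PK_EMPLOYEE PRIMARY KEY(EMP_NO),
 CONSTRAINT FK_DEPTID FOREIGN KEY(DEPTID) REFERENCES DEPARTMENT);
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

outcomes = {
    "valid row": try_insert(("E0001", "Ann", 4200.00, "D10")),
    "duplicate EMP_NO": try_insert(("E0001", "Bob", 3900.00, "D10")),   # PK_EMPLOYEE
    "non-positive salary": try_insert(("E0002", "Cy", 0, "D10")),      # CK_ESALARY
    "unknown DEPTID": try_insert(("E0003", "Di", 3100.00, "D99")),     # FK_DEPTID
    "missing ENAME": try_insert(("E0004", None, 3100.00, "D10")),      # NOT NULL
}
for case, accepted in outcomes.items():
    print(case, "accepted" if accepted else "rejected")
```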
SQL Server Management
Studio
What is SQL Server Management Studio?
• SQL Server Management Studio (SSMS) is an integrated
environment to
• Access.
• Configure.
• Manage.
• Administer.
• Develop components of SQL Server.
SQL Server Architecture
Fundamental Concepts (1 of 2)
• A database in SQL Server is made up of a collection of tables
that stores a specific set of structured data.
Fundamental Concepts (2 of 2)
• A table consists of a collection of rows and columns.
• Each column holds a particular type of information.
• Dates.
• Strings.
• Numbers.
Database Instances
• A computer can have one or more instances of SQL
Server installed.
• Each instance of SQL Server can contain one or many
databases.
Database
• Includes one or many object ownership groups called schemas.
• Each schema contains database objects
• Tables.
• Views.
• Stored procedures.
Permissions
• A user that has access to a database can be given permissions
to access the objects in the database.
Database Instances (1 of 3)
• Each instance of SQL Server has
• System databases
• One or more user databases.
Database Instances (2 of 3)
• An instance of the SQL Server Standard or Enterprise Edition
can
• Handle many users working in multiple databases at the same time.
Database Instances (3 of 3)
• Each instance of SQL Server makes
• All databases in the instance available to all users that connect to the instance.
• Subject to the defined security permissions.
The system databases
• Created by default when an instance of SQL Server is installed
• Master database
• tempdb database
• Model database
• msdb database
Master database
• The master database is the primary system database.
• Without it, SQL Server cannot start.
• Contains the most important information about objects within
the SQL Server instance.
tempdb database (1 of 5)
• The tempdb database is a global area for temporary objects
created by the internal processes that run SQL Server and
temporary objects that are created by users or applications.
tempdb database (2 of 5)
• Temporary objects include
• Temporary tables and stored procedures.
• Table variables.
• Global temporary tables.
• Cursors.
tempdb database (3 of 5)
• tempdb is re-created every time SQL Server is restarted.
tempdb database (4 of 5)
• tempdb stores row versions generated by
• Read-committed (using row versioning) or snapshot isolation transactions.
• Online index operations.
• AFTER triggers.
tempdb database (5 of 5)
• tempdb should never be used to store persistent information.
• Because tempdb is global, it is accessible to all databases on the SQL Server system.
Model database
• The model database is a model for all databases created on an
instance of SQL Server.
• It serves as a template each time a database is created.
msdb database
• The msdb database serves primarily as the back-end database
for Microsoft SQL Server Agent.
• Whenever a SQL Server Agent job is created or scheduled, the metadata for
that job is stored in this database.
Database Files (1 of 4)
• SQL Server maps a database over a set of operating system
files.
• Each database has at least two operating system files:
• Data file
• Log file
Database Files (2 of 4)
• Data files contain data and objects such as
• Tables.
• Indexes.
• Stored procedures.
• Views.
Database Files (3 of 4)
• Log files contain the information that is required to recover all
transactions in the database.
Database Files (4 of 4)
• Data files can be grouped together in file groups for allocation
and administration purposes.
SQL Server databases have three types of
files: (1 of 4)
• Primary
• The primary data file contains the startup information for the database and
points to the other files in the database.
• User data and objects can be stored in this file or in secondary data files.
SQL Server databases have three types of
files: (2 of 4)
• Every database has one primary data file.
• The recommended file name extension for primary data files is .mdf.
SQL Server databases have three types of
files: (3 of 4)
• Secondary
• Secondary data files are optional.
• User-defined.
• Store user data.
• Secondary files can be used to spread data across multiple disks by putting
each file on a different disk drive.
SQL Server databases have three types of
files: (4 of 4)
• If a database exceeds the maximum size for a single Windows
file, you can use secondary data files so the database can continue to grow.
• The recommended file name extension for secondary data files is .ndf.
Transaction Log
• The transaction log files hold the log information that is used to
recover the database.
• There must be at least one log file for each database.
• The recommended file name extension for transaction logs is
.ldf.
sysdatabases
• Information about the database is recorded in the sysdatabases
table of the master database.
When a new database is created (1 of 2)
• System objects are copied from the model database.
• The initial size of a database must be at least the size of the model database.
• The model database provides the starting point for all SQL Server databases on a system.
• Administrators may add additional objects to this database.
When a new database is created (2 of 2)
• All objects in the model database will automatically be copied to
the new database.
• This is one way that administrators can ensure all databases have certain
characteristics or objects.
Logical database
• Logical database design is concerned with the user's perception
of data.
• In a relational database the user sees data as tables.
• Logical database design is concerned with identifying the tables
for an application domain.
Implementation of a design
• The implementation of a design is concerned with physical
matters
• Indexes.
• Hashing.
• Ordering of rows.
• Size of a table.
• Size of physical blocks.
• Specification of a collection of base tables
Design Criteria
• The design should:
• Satisfy specified data requirements
• Be stable
• Business changes should be easily incorporated
• Be efficient
• Implementation should perform well.
• Resulting in good response time for queries
Additional database design criteria
• Logical Design
• Base Tables
• Views
• Physical Design
• Indexes
• Tablespaces or Filegroups
Database Models
• Logical Model
• Relational Design
• applicable to any RDBMS
• Physical Model
• Physical Design
• product specific
• The logical model provides the input to the physical model
Conceptual Model
• No reference to any DBMS
• A non-technical description of an application domain
• Relational Model
• Applies to any RDBMS
• Physical Model
• Specific to RDBMS product such as Microsoft SQL Server or Oracle Database
Design Challenges (1 of 2)
• Stability vs. Efficiency
• Almost impossible to maximize both objectives:
• Direct implementation of Conceptual Design
• Understandable design
• Simpler application programs
• Simpler user-written SQL queries
• But rejection of opportunities to improve machine efficiency
Design Challenges (2 of 2)
• Stability vs. Efficiency
• Almost impossible to maximize both objectives:
• Application of every efficiency technique
• More complex design
• Convoluted application programming
• Few user-written SQL queries
• Possible future machine inefficiencies
• Loss of stability
• Future design changes will be more difficult to implement
Understanding Semantics of Data
• Classic analysis problem enhanced
• Conceptual Design forces you to ask more comprehensive questions of users
• Management imposed time constraints
• Significant size and complexity of some application domains
• Difficulty of predicting performance
Transform to Logical Model
• (Relational Model)
• Each entity becomes a table
• Each Many-to-Many relationship becomes a table
• For a One-to-Many relationship, include the primary key of "one" in the "many" table
• Include attributes as columns in tables
SQL Server Datatypes
Common Datatypes | Maximum Size | Explanation
BIT | Integer that can be 0, 1, or NULL. |
CHAR(size) | Maximum size of 8,000 characters. | Where size is the number of characters to store. Fixed-length. Space padded on right to equal size characters. Non-Unicode data.
DEC(m,d) | m defaults to 18, if not specified. | Where m is the total digits and d is the number of digits after the decimal.
DECIMAL(m,d) | m defaults to 18, if not specified. | Where m is the total digits and d is the number of digits after the decimal.
FLOAT(n) | Floating point number. | Where n is the number of bits to store in scientific notation.
INT | -2,147,483,648 to 2,147,483,647 |
MONEY | -922,337,203,685,477.5808 to 922,337,203,685,477.5807 |
NUMERIC(m,d) | m defaults to 18, if not specified. | Where m is the total digits and d is the number of digits after the decimal.
NVARCHAR(size) or NVARCHAR(max) | Maximum size of 4,000 or max characters. | Where size is the number of characters to store. Variable-length. If max is specified, the maximum number of characters is 2GB. Unicode data.
SMALLDATETIME | Date values range from '1900-01-01' to '2079-06-06'. | Displayed as 'YYYY-MM-DD hh:mm:ss'
SMALLINT | -32768 to 32767 |
SMALLMONEY | -214,748.3648 to 214,748.3647 |
TEXT | Maximum size of 2GB. | Variable-length. Non-Unicode data.
VARCHAR(size) or VARCHAR(max) | Maximum size of 8,000 or max characters. | Where size is the number of characters to store. Variable-length. If max is specified, the maximum number of characters is 2GB. Non-Unicode data.
Integrity constraints include the following:
• Primary Key
• Foreign Key
• Check Conditions (including NOT NULL)
ALTER TABLE Statement
• The ALTER TABLE statement can be used to modify a table
definition
• Add columns
• Add constraints
• Drop columns
• Drop constraints
• Disable constraints
• Enable constraints
• Disable triggers
• Enable triggers
Database Design Approaches
• Top-Down
• Bottom-up
Database Design Approaches
• Top-Down
• Discover
• The entities.
• The relationships.
• The attributes or data items
Bottom-up approach (1 of 5)
• Produces a logical data model which is translated into a
physical data model to implement the database design.
• Focuses on the process model first.
• Making it a function-driven approach.
Bottom-up approach (2 of 5)
• All processes to be performed by the system are identified, as well as
the data they require.
Bottom-up approach (3 of 5)
• The initial focus is on applications rather than on data.
Bottom-up approach (4 of 5)
• A data model is constructed to satisfy this precise set of data
requirements.
Bottom-up approach (5 of 5)
• Techniques such as normalization are integral to this approach.
Bottom-Up Approach (Analysis Phase)
(1 of 2)
• Collecting local views
• Determining functional dependencies among data items
Bottom-Up Approach (Design Phase)
(2 of 2)
• Normalizing local views
• Synthesizing a global view
Normalization Theory (1 of 2)
• Normalization is an integral technique of a bottom-up approach.
• Normalization theory is used as a design verification technique
in a top-down approach.
Normalization Theory (2 of 2)
• “A record is in second and third normal form if every field is
either part of the key or provides a (single-valued) fact about
exactly the whole key and nothing else”
• “A table is in third normal form if every non-key field depends
upon
• The primary key.
• The whole key.
• And nothing but the key”
Scope of Normalization (1 of 4)
• This theory is applicable to any DBMS.
• Not just relational.
Scope of Normalization (2 of 4)
• It is applicable to the design of simple files as well.
Scope of Normalization (3 of 4)
• Normalization theory is a process of decomposing tables or
records from an un-normalized form to a normalized form.
• Each decomposition reduces the possibility of errors or anomalies
occurring during database update processing.
Scope of Normalization (4 of 4)
• By applying the normalization process, tables are transformed to a higher normal form.
• Note that 5th normal form (5NF) is a theoretical ideal.
Benefit and Cost of Normalization (1 of 4)
• The ultimate benefit derived from applying normalization theory
is a stable design.
• Reducing the chance of update errors.
Benefit and Cost of Normalization (2 of 4)
• In order to implement a database through the normalization
process, one must have a real understanding of the semantics (meaning) of
each data item.
• This forces the database designer to effectively communicate with the users of
the database.
Benefit and Cost of Normalization (3 of 4)
• Users understand clearly the meaning of data in relations so
they correctly formulate queries.
Benefit and Cost of Normalization (4 of 4)
• There is an extra effort of joining tables.
Functional Dependence (1 of 2)
• Given a relation R, attribute Y of R is functionally dependent on attribute X of R if each X-value
of R has associated with it precisely one Y-value of R (at any one time).
• Attributes X and Y may be composite.
Functional Dependence (2 of 2)
• Functional dependence means:
• "You tell me X and I'll tell you Y"
• Notation:
• R.X → R.Y
• Examples:
• STUDNO → SNAME
• STUDNO → SAGE
• COURSENO → CNAME
• POLICYNO → NAME-OF-INSURED
• CLAIMNO → POLICYNO
Functional Dependence
• Functional dependence is not limited to a single data item.
• It can be extended to a collection of fields.
• Notation:
• A,B → C
Functional dependence is a semantic notion.
• An understanding of the meaning of data items in the
application domain is required.
• Applying functional dependence enables the designer to identify
keys.
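The definition of functional dependence can be checked mechanically against sample data. The sketch below is plain Python; the function name holds and the student rows are hypothetical. Because functional dependence is a semantic notion, sample data can only refute a dependency, never prove that it holds in the application domain.

```python
def holds(rows, x, y):
    """Return True if the functional dependency x -> y holds in rows.

    rows is a list of dicts; x and y are tuples of column names, so
    composite determinants such as (A, B) -> C are supported.
    """
    seen = {}
    for row in rows:
        x_value = tuple(row[col] for col in x)
        y_value = tuple(row[col] for col in y)
        # Each X-value may be associated with precisely one Y-value.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True

students = [
    {"STUDNO": 1, "SNAME": "Ann", "SAGE": 20},
    {"STUDNO": 2, "SNAME": "Bob", "SAGE": 20},
    {"STUDNO": 3, "SNAME": "Ann", "SAGE": 22},
]

studno_to_sname = holds(students, ("STUDNO",), ("SNAME",))   # not contradicted by the data
sname_to_sage = holds(students, ("SNAME",), ("SAGE",))       # refuted: "Ann" maps to 20 and 22
composite = holds(students, ("SNAME", "SAGE"), ("STUDNO",))  # composite determinant
print(studno_to_sname, sname_to_sage, composite)
```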
Candidate key
• If a field uniquely identifies every other field in a given table,
it is a candidate key.
• Alternatively: If every field in a table is functionally dependent
on field k, then k is a candidate key.
Composite key
• Sometimes a candidate key is a composite key
• A single data item is not sufficient to determine the other data items of the
record.
First Normal Form
• For a table to be in First Normal Form (1NF), all data item values must be atomic.
• This means that there are no repeating groups.
• In a table, every row-column intersection holds a single value,
not a set of values.
Unnormalized table.
First Normal form
Second Normal form
Third Normal form
Transitive Dependency
SQL Timeline
• The logical sequence of operations is as follows:
1. Rows satisfying the conditions of the WHERE clause are
retrieved
2. Grouping of rows is performed as specified in the GROUP BY
clause
3. The aggregate functions are evaluated for each group
4. Groups are filtered as specified in the HAVING clause
5. Rows to be returned to the user are arranged according to the
ORDER BY clause
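The five logical steps can be traced on a small query. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Pens", 100), ("East", "Ink", 50), ("East", "Pens", 10),
        ("West", "Pens", 30), ("West", "Ink", 5),
        ("North", "Pens", 400),
    ],
)

result = conn.execute("""
    SELECT region, SUM(amount) AS total   -- 3. aggregate per group
    FROM sales
    WHERE amount >= 10                    -- 1. filter rows
    GROUP BY region                       -- 2. form groups
    HAVING SUM(amount) > 50               -- 4. filter groups
    ORDER BY total DESC                   -- 5. arrange the output
""").fetchall()
print(result)
```

The West group is dropped by HAVING only after WHERE has already removed its 5-unit row, which is why the order of the steps matters.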
Phantom Reads (1 of 7)
• The concept of a phantom read is best explained through an
example.
1. Assume there are two transactions.
• A and B.
Phantom Reads (2 of 7)
2. Transaction A performs a query.
• Such as summing account balances.
Phantom Reads (3 of 7)
3. Transaction B inserts information about a new account into
the table processed by Transaction A.
Phantom Reads (4 of 7)
4. Transaction A repeats its query to sum account balances.
• This time, the result is different.
Phantom Reads (5 of 7)
5. Transaction A sees something that did not exist the first time
• A phantom.
Phantom Reads (6 of 7)
6. In this scenario, serializability has been violated.
Phantom Reads (7 of 7)
7. The interleaved execution of these transactions is equivalent to neither A-then-B nor B-then-A.
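The scenario can be simulated without a database. The sketch below is plain Python; the account numbers, balances and the run helper are hypothetical, and a dictionary stands in for the accounts table.

```python
def sum_balances(accounts):
    """Transaction A's query: sum all account balances."""
    return sum(accounts.values())

def insert_account(accounts):
    """Transaction B: insert information about a new account."""
    accounts["ACC-3"] = 300

def run(schedule):
    """Run the steps in the given order and return what Transaction A observed."""
    accounts = {"ACC-1": 100, "ACC-2": 200}
    observed = []
    for step in schedule:
        if step == "A_read":
            observed.append(sum_balances(accounts))
        else:  # "B_insert"
            insert_account(accounts)
    return observed

a_then_b = run(["A_read", "A_read", "B_insert"])
b_then_a = run(["B_insert", "A_read", "A_read"])
interleaved = run(["A_read", "B_insert", "A_read"])

# A schedule is serializable only if its effect matches some serial order.
matches_a_serial_order = interleaved in (a_then_b, b_then_a)
print(a_then_b, b_then_a, interleaved, matches_a_serial_order)
```

In either serial order Transaction A sees the same sum twice; only the interleaved schedule lets the two reads disagree.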
Transactions (1 of 4)
• Transactions let users guarantee consistent changes to data,
as long as the SQL statements within a transaction are grouped
logically.
Transactions (2 of 4)
• Data in all referenced tables are in a consistent state before the
transaction begins and after it ends.
Transactions (3 of 4)
• Transactions should consist of only the SQL statements that
make one consistent change to the data.
Transactions (4 of 4)
• After a transaction is committed or rolled back,
the next transaction begins with the next SQL statement.
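A funds transfer is a common illustration of one consistent change made by two statements. This is a sketch only, assuming Python's built-in sqlite3 module (with its default transaction handling) as a stand-in for SQL Server; the account table, the CHECK rule and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute(
    "CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    """Move amount from A to B as one transaction; return True if committed."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = 'B'", (amount,))
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = 'A'", (amount,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # undo the credit to B as well
        return False

def balances():
    """Return the current balances as a dict keyed by account number."""
    return dict(conn.execute("SELECT acc_no, balance FROM account"))

ok = transfer(30)
after_ok = balances()
failed = transfer(500)       # would overdraw A, so the whole transfer is rolled back
after_failed = balances()
print(ok, after_ok, failed, after_failed)
```

The total of the two balances is the same before and after each call, whether the transaction commits or rolls back.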
Materialized Views (1 of 3)
• A materialized view provides indirect access to table data by
storing the results of a query in a separate object.
• A materialized view is also called a materialized query table.
• A conventional view does not require disk space and has no data of its
own.
Materialized Views (2 of 3)
• To resolve a query against a conventional view,
the DBMS retrieves the stored query from the system catalog and
executes it.
Materialized Views (3 of 3)
• A materialized view contains rows resulting from the execution
of a query against one or more base tables or views.
• The result set of the query is stored in the database.
• This means the contents of a materialized view are subject to aging and
could need to be periodically updated or refreshed.
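The difference between a conventional view and a materialized view can be sketched as follows. SQLite has no materialized views, so this example, which assumes Python's built-in sqlite3 module, emulates one with an ordinary table plus a hypothetical refresh function; all table and view names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; it has no real materialized views
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('East', 100), ('East', 50), ('West', 30);

-- Conventional view: only the query is stored; it has no data of its own.
CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

-- Emulated materialized view: the query result is stored as rows.
CREATE TABLE mv_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def refresh():
    """Complete refresh: recompute the stored result from the base table."""
    conn.executescript("""
        DELETE FROM mv_sales_by_region;
        INSERT INTO mv_sales_by_region
            SELECT region, SUM(amount) FROM sales GROUP BY region;
    """)

def totals(name):
    """Read region totals from the named view or table."""
    return dict(conn.execute(f"SELECT region, total FROM {name}"))

conn.execute("INSERT INTO sales VALUES ('West', 70)")
conn.commit()

view_now = totals("v_sales_by_region")   # re-executes the query, sees the new row
stale = totals("mv_sales_by_region")     # aged: still holds the old result
refresh()
fresh = totals("mv_sales_by_region")
print(view_now, stale, fresh)
```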
Applications of Materialized Views (1 of 6)
• Materialized views can be used to
• Summarize.
• Compute.
• Replicate.
• Distribute data.
Applications of Materialized Views (2 of 6)
• Materialized views are suitable in a variety of computing environments
• Data warehousing.
• Decision support.
• Distributed or mobile computing.
Applications of Materialized Views (3 of 6)
• Data warehouse: materialized views are used to compute and
store aggregated data
• Sums and averages.
• Compute joins with or without aggregations.
Applications of Materialized Views (4 of 6)
• Distributed environment
• materialized views are used to replicate data at distributed sites and
synchronize updates done at several sites with conflict resolution
methods.
Applications of Materialized Views (5 of 6)
• The materialized views as replicas provide local access to data
that otherwise has to be accessed from remote sites.
Applications of Materialized Views (6 of 6)
• Mobile computing
• Materialized views are used to download a subset of data from central
servers to mobile clients.
• With periodic refreshes from the central servers and propagation of
updates by clients back to the central servers.
Data Warehousing with Materialized Views
(1 of 5)
• Data flows from one or more online transaction processing
(OLTP) databases into a data warehouse
• Monthly.
• Weekly.
• Daily basis.
Data Warehousing with Materialized Views
(2 of 5)
• Data are normally processed in a staging file before being
added to the data warehouse.
Data Warehousing with Materialized Views
(3 of 5)
• Data warehouses commonly range in size from tens of
gigabytes to a few terabytes.
• Usually, the vast majority of the data is stored in a few very large fact
tables.
Data Warehousing with Materialized Views
Summaries (4 of 5)
• One technique employed in data warehouses to improve
performance is the creation of summaries.
Data Warehousing with Materialized Views
Summaries (5 of 5)
• Summaries are special kinds of aggregate views that improve
query execution times by pre-calculating expensive joins and
aggregation operations prior to execution and storing the results
in a table in the database.
• For example, you can create a table to contain the sums of sales by
region and by product.
• That is, a materialized view.
Data Warehousing with Materialized Views
(1 of 3)
• Materialized views that precompute and store aggregated data
are often referred to as summaries.
• They store summarized data.
• Summary management
• Eases the workload of the database administrator
• Means the user no longer needs to be aware of the summaries that
had been defined.
Data Warehousing with Materialized Views
(2 of 3)
• This mechanism reduces response time for returning results
from the query.
1. The database administrator creates one or more materialized
views.
• Which are the equivalent of a summary.
2. The end user queries the tables and views at the detail data
level.
3. The query rewrite mechanism automatically rewrites the query
to use the summary tables.
Data Warehousing with Materialized Views
(3 of 3)
• Materialized views within the data warehouse are transparent to
the end user or to the database application.
• Materialized views are usually accessed through the query
rewrite mechanism.
• However, queries can directly access summaries.
• Materialized views can also be used to precompute joins with or
without aggregations.
Query Rewrite Optimization
• Query transformation is transparent to an application.
• No reference to the materialized view is required in a SQL statement.
• Materialized views can be added or dropped without invalidating the SQL in the
application code.
• The optimizer transparently rewrites the request to use the materialized view.
• Queries are then directed to the materialized view and not to the underlying detail
tables or views.
• Query rewrite uses cost-based optimization.
• So it is important to collect statistics both on tables involved in the query and on the tables
representing materialized views.
• The concepts discussed above are generic and apply to Oracle and DB2.
• Details of the implementation of materialized views and their refresh mechanisms differ
between database systems.