Download Relational Databases

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Entity–attribute–value model wikipedia , lookup

Database wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Clusterpoint wikipedia , lookup

Functional Database Model wikipedia , lookup

SQL wikipedia , lookup

Relational algebra wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
Multimedia Information Systems
CS 4570
Outlines
• Introduction to DMBS
• Relational database and SQL
• B+- tree index structure
Database Management System
(DBMS)
• Collection of interrelated data
• Set of programs to access the data
• DBMS contains information about a
particular enterprise
• DBMS provides an environment that it both
convenient and efficient to use
Purpose of Database Systems
• To overcome the disadvantages of the typical
file–processing systems:
–
–
–
–
–
–
–
Data redundancy and inconsistency
Difficulty in accessing data
Data isolation  multiple files and formats
Integrity problems
Atomicity of updates
Concurrent access by multiple users
Security problems
Data and Data Models
• Data is really the “given facts”
• A data model is a collection of tools for describing:
–
–
–
–
Data
Data relationships
Data semantics
Data constraints
• Object-based logical models:
– object-oriented model, semantic model…
• Record-based logical models:
– relational model (e.g. SQL, DB2)…
Entity-Relationship (ER) Model
• A database can be modeled as:
– a collection of entities,
– relationships among entities
Relational Model
Relational Databases
• A relational database is a database that is
perceived be its users as a collection of
tables. ( and nothing but tables ).
Relation Instance
• The current values (relation instance) of a relation
are specified by a table.
• An element t of r is a tuple; represented by a row
in a table.
tuple
attribute
Structured Query Language (SQL)
• SQL is based on set and relational operations with certain
modifications and enhancements
• A typical SQL query has the form:
select A 1 , A 2 , ..., A n
from r 1 , r 2 , ..., r m
where P
– Ais represent attributes
– ris represent relations
– P is a predicate.
• The result of an SQL query is a relation.
Banking Example
branch(branch-name, branch-city, assets)
customer(customer-name, customer-street, customer-city)
account(branch-name, account-number, balance)
loan(branch-name, loan-number, amount)
depositor(customer-name, account-number)
borrower(customer-name, loan-number)
The Select Clause
• The select clause corresponds to the projection operation of the
relational algebra. It is used to list the attributes desired in the result
of a query.
• Find the names of all branches in the loan relation
select branch-name
from loan
• An asterisk in the select clause denotes”all attributes”
branch-name
loan-number amount
select
*
from loan
The Select Clause(Cont.)
• SQL allows duplicates in relations as well as in query
results.
• To force the elimination of duplicates, insert the keyword
distinct after select.
Find the names of all branches in the loan relation, and
remove duplicates
select distinct branch-name
from loan
• The keyword all specifies that duplicates not be removed.
Select all branch-name
from loan
The Select Clause(Cont.)
• The select clause can contain arithmetic expressions
involving the operators,+,-,*,/,and operating on constants
or attributes of tuples.
• The query:
select branch-name,loan-number,amount * 100
from loan
would return a relation which is the same as the loan
relation, except that the attribute amount is multiplied by
100
The from Clause
• The from clause corresponds to the Cartesian product
operation of the relational algebra. It lists the relations to
be scanned in the evaluation of the expression.
• Find the Cartesian product borrower  loan
select *
from borrower, loan
• Find
the name
and loan number
of all customers having a
borrower
(customer-name,
loan-number)
loan at the Perryridge branch.
loan (branch-name, loan-number, amount)
select distinct customer-name,borrower.loan-number
borrower
× loan
from borrower,loan
where borrower.loan-number=loan.loan-number
( borrower.customer-name,
borrower.loan-number, and
loan.branch-name,
loan.loan-number, loan.amount)
branch-name=“Perryridge”
The where Clause
• The where clause corresponds to the selection predicate of
the relational algebra. It consists of a predicate involving
attributes of the relations that appear in the from clause.
• Find all loan numbers for loans made at the Perryridge
branch with loan amounts greater than $1200.
select loan-number
from loan
where branch-name=“Perryridge” and amount>1200
• SQL uses the logical connectives and, or, and not. It
allows the use of arithmetic expressions as operands to the
comparison operators.
The where Clause(Cont.)
• SQL includes a between comparison operator in order to
simplify where clauses that specify that a value be less
than or equal to some value and greater than or equal to
some other value.
• Find the loan number of those loans with loan amounts
between $90,000 and $100,000(that is, $90,000 and 
$100,000)
select loan-number
from loan
where amount between 90000 and 100000
Example Query
• Find all branches that have greater assets than some branch
located in Brooklyn.
select distinct T.branch-name
from branch as T, branch as S
where T.assets > S.assets and
S.branch-city =“Brooklyn”
B+ -Tree Index Files
B+ -Tree indices are an alternative to indexed-sequential files.
• Disadvantage of indexed-sequential files: performance
degrades as file grows, both for index lookups and for
sequential scans through the data.
• Advantage of B+ -Tree index files: automatically reorganizes
itself with small, local, changes, in the face of insertions and
deletions.
• Disadvantage of B+ -Trees: extra insertion and deletion
overhead, space overhead.
• Advantages of B+ -Trees outweigh disadvantages, and they are
used extensively.
+
B
-Tree Index Files (Cont.)
A B+ -Tree is a rooted tree satisfying the following
properties:
•All paths from root to leaf are of the same length
•Each node that is not a root or a leaf has between n/ 2
and n children
•A leaf node has between (n-1)/2 and n-1 values
•Special cases:if the root is not a leaf it has at least 2
children; if the root is a leaf (that is there are no other
nodes in the tree) it can have between 0 and (n-1) values.
B+ -Tree Node Structure
•Typical node
–Ki are the search-key values
–Pi are pointers to children (for non-leaf nodes) or
pointers to records or buckets of records (for leaf
nodes).
•The search-keys in a node are ordered
K1 < K2 < K3 < ...
Leaf Nodes in B+ -Trees
Properties of a leaf node:
• For i = 1, 2, . . . , n-1, i either points to a file record with
search-key value Ki , or to a bucket of pointers to file records,
each record having search-key value Ki . Only need bucket
structure if search-key does not form a primary key.
• If Li , Lj are leaf nodes and i < j , Li ‘s search-key values are
less than Lj ’s search-key values
• Pn points to next leaf node in search-key order
Non-Leaf Nodes in B+ -Trees
• The nonleaf nodes form a multi-level sparse index on the leaf
nodes. For a non-leaf node with n pointers:
–All the search-keys in the subtree to which P1
points are less than K1
–For 2  i  n-1 all the search-key in the subtree to
which Pi points have values greater than or equal
to Ki-1 and less than Ki
–All the search-keys in the subtree to which Pn
points are greater than or equal to Kn-1
Example of a
+
B -tree
Queries on B+-Trees (Cont.)
• In processing a query, a path is traversed in the tree from the root to
some leaf node.
• If there are K search-key values in the file, the path is no longer
than log n/ 2 (K ).
• A node is generally the same size as a disk block, typically 4
kilobytes, and n is typically around 100 (40 bytes per index entry).
• With 1 million search key values and n = 100, at most log50 (1, 000,
000) = 4 nodes are accessed in a lookup.
• Contrast this with a balanced binary tree with 1 million search key
values --- around 20 nodes are accessed in a lookup
–above difference is significant since every node access may
need a disk I/O, costing around 30 millisecond!
Updates on B+-Trees: Insertion (Cont.)
• Splitting a node:
–take the n (search-key value, pointer) pairs (including the
one being inserted) in sorted order. Place the first n/ 2 in
the original node, and the rest in a new node.
–let the new node be p, and let k be the least key value in p.
Insert (k, p) in the parent of the node being split. If the
parent is full, split it and propagate the split further up.
• The splitting of nodes proceeds upwards till a node that is not
full is found. In the worst case the root node may be split
increasing the height of the tree by 1.
Updates on B+-Trees: Insertion (Cont.)
Self-Practice
+
•Construct a B -tree for the following set of
key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
with n = 6