Download Document

Document related concepts

IMDb wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Relational algebra wikipedia , lookup

Database wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
A database consists of an organized
collection of interrelated data for one or
more uses, typically in digital form.
Digital databases are managed using
database management systems, which
store database contents, allowing data
creation and maintenance, search and
other access.
Purpose of Database:
=>A database system should
be a repository of data needed
for
an
organization’s
data
processing.
=>This data should be accurate
,private and protected from
damage. it should be organized
so that diverse applications
with
different
data
requirements can employ the
data.



Different application programmers and
different end users will have different views
of the data which must be derived from a
common overall data structure .
The ways in which ends users want to utilize
existing data will constantly change, and in
some cases demands for new uses of the
data will arise rapidly and urgently.
The extent to which these demands can be
satisfied determines the overall value of the
database system.
Database concept
Disadvantage:
1.inconsistency
2.unsharable data
3.unstandardized data
4.unsecure data
5.incorrect data.
Database management system



It provides a centralized control of the data
IT is a software that responsible for storing,
maintaining and utilizing database. A database
along with a DBMS is referred to as a database
system.
Advantages
1.Databases reduces the data redundancy to a large
extent:
=>Duplication of data is known as Data
Redundancy.

The database systems do not maintain separate
copies of the same data. Rather, all the data are
kept at one place and all the application that
required data refer to centrally maintained
database


If any change is to be made to data, it will
be made at just one place and the same
changed information will be available to all
the application referring to it.
Thus redundancy gets controlled and so are
problems associated with it.
Database Management System
Database
End User
Centrally Controlled Database System
Application
Programs





2. Databases can control data
inconsistency
to
a
large
extent.
Inconsistent database provide incorrect or conflict
information.
When the redundancy is not controlled, there may be
occasions on which the two entries about the same data do
not agree (when one of them stores the updated
information and the other does not).
Data base said to be inconsistent.
By controlling redundancy, the inconsistency is also be
controlled.
Even if there is some redundancy retained in the database
due to some technical reasons, the database management
system ensures that any change made to either of the two
entries is automatically made to the other . This process is
known as propagating up dates
 3.Databases
facilitate sharing
of data: Individual pieces of data

in database may be shared by
different users and users may have
access to the same piece of data for
different purposes.
The database management system
makes sure that data not only the
existing application can
may be
satisfied without having to create
any new stored files
4.Databases enforce standards
The database management system
can ensures that all the data (that is
stored centrally ) follow the applicable
standards . There may be certain
standards laid by the company or
organization using database.
 Standards storing data formats is
particularly desirable as an aid to data
interchange or migration between
system.

5.Databases can ensure data
security: The information stored inside a
database is sometimes of great value to a
corporations.
Data Security: It refers to protection of data against
accidental or intentional disclosure to unauthorized
persons, or unauthorized modification or
destruction.
Privacy of Data :refers to the rights of individuals and
organizations to determine for themselves when
“how” and to what extent information about them is
to be transmitted to others.
6.Integrity can be maintained through database:
By integrated database we mean unification of several
otherwise distinct data files, with any redundancy
among those files partially or whole elimiminated.
1. A collection of interrelated files and a set of programs that
allow users to access and modify these files is known as a data
management system.
2.Database system is to provide the users only that much
information that is required by them.
3. A database abstraction layer is an application programming
interface which unifies the communication between a computer
application and databases such as MySQL, PostgreSQL, Oracle or
SQL. Traditionally, all database vendors provide their own
interface tailored to their products which leaves it to the
application programmer to implement code for all database
interfaces he would like to support. Database abstraction layers
reduce the amount of work by providing a consistent API to the
developer and hide the database specifics behind this interface
as much as possible. There exist many abstraction layers with
different interfaces in numerous programming languages.
Database management system
(DBMS)

A collection of interrelated files and a set of programs
that allow users to access and modify these files is
known as a DBMS.

End user is a person who is not a computer-trained

An application system analyst is the one who

person .
is concerned about the all of the database of logical
level. what are the relationships between the dataentities .
Physical Storage system analyst is concerned with
the physical implementation details of the database. i.e.
How would the database the stored on which storage
device. what will be the starting address of the
database.
=>Physical storage system analyst




It concerned with the physical
implementation details of the database
.i.e. how would the database be stored on
which storage device?
Levels of Database Implementation
INTERNAL LEVEL (Physical Level): The
lowest level of abstraction, the internal level, is
the one closest to physical storage. This level is
also sometimes termed as physical level.
It describes how the data are stored on
the storage medium

CONCEPTUAL LEVEL: This level of
describes what data are actually stored in
the database. It also describes the
relationships exists among data. The level
are not concerned with how these logical
in terms of simple data structure

EXTERNAL LEVEL : This is the level
closest to the users and is concerned
with the way in which the data are
viewed by individual user

Sales officer
View 1
Item-Name
Price
Purchase Officer
View 1
Item-Name
Price
Conceptual Level
Item-number Character(6)
Item-Name Characer(20)
Internal Level
External Level
Application
Programs
are
used to fetch
the
desired
information
Stored-Item
Length=40
Item #
Type=Byte(6),offset=0 Index=Ix
Name
Type=Byte(20),offset=6
Data Independence: The ability to modify a scheme
definition in one level without affecting a scheme in the next
higher level is called Data independence.
 There are two Levels of data independence:
1.Physical Data Independence: The ability to change the
physical schema without changing the logical schema
is called physical data independence. For example, a
change to the internal schema, such as using different
file organization or storage structures, storage
devices, or indexing strategy, should be possible
without having to change the conceptual.
2.LogicalData Independence: The ability to change the
logical (conceptual) schema without changing the
External schema (User View) is called logical data
independence. For example, the addition or removal
of new entities, attributes, or relationships to the
conceptual schema should be possible without having
to change existing external schemas.
=>It more difficult to achieve logical data independence than
physical data independence.
There are several models for databases:
• Tabular ("flat file") – data in a single
table,
eg. spreadsheet
• Hierarchical – e.g.. company
departments
• Relational
– Each row is a record (patient, child etc.)
– Each column is an attribute (age, sex
etc.)
=>Relational Data Model
A relational database is a set of related tables,
each concerning a specific topic. Its central
idea was to describe a database as a
collection of predicates over a finite set of
predicate variables, describing constraints on
the possible values and combinations of
values.
 The data is organized into tables (i.e. row and
columns).
 Tables are called Relations.
 A Row in a table represents a relationship among a
set of values. since a table is a collection of such
relationships
For example
 Suppliers (Supp#, supp-name, Status,
City)
 Underlined fields indicate primary key
Advantages:
 – Reduction of data redundancy
 – Eliminates inconsistencies (errors).
 – High data integrity and quality.
 – Data can be descriptive.
 – Allows implementation of security.

Primary-key is the key-field that can uniquely
identify a row in a relation.

Suppliers
Supp#
Supp-Name
Status
City
S1
Britannia
10
Delhi
S2
New Bakery 30
Mumbai
S3
10
Delhi
S4
Mother
Dairy
Cook
50
Bangalore
S5
Haldiram
40
Jaipur


Row of relations are known as tuples
and columns are known as Attribute
The relational data model is based on
a collection of tables (relations)
Conceptual design
Logical design
Physical design
Logical Data Model
Logical Process Model
Track
01
Country
TR
Physical
Implementation
Process
Database creation
CREATE DATABASE
CREATE TABLE
LOAD
Current systems evaluation
Development of Standards
Technological feasability
Operational feasability
Economical feasability



Network Data Model: The network
model is a database model conceived as a
flexible way of representing objects and their
relationships. Its distinguishing feature is that
the schema, viewed as a graph in which object
types are nodes and relationship types are arcs,
is not restricted to being a hierarchy or lattice.
A record is a collection of field (attribute),
each of which contains only one data
values.
Link mean that it is an association
between precisely two records
S1
Britannia
10
I2
20
Cake
I3
30
Bread
I6 Cream Biscuit
ADVANTAGES:
• Network Model is able to model complex relationships
and represents semantics of add/delete on the
relationships.
• Can handle most situations for modeling using record
types and relationship types.
• Language is navigational; uses constructs like FIND,
FIND member, FIND owner, FIND NEXT within set, GET
etc. Programmers can do optimal navigation through the
database.
DISADVANTAGES:
•Navigational and procedural nature of processing
•Database contains a complex array of pointers that
thread through a set of records.
• Little scope for automated "query optimization”
A hierarchical data model is a data model in which
the data is organized into a tree-like structure. The
structure allows repeating information using
parent/child relationships: each parent can have many
children but each child only has one parent. All
attributes of a specific record are listed under an entity
type.
Example:
Serial Num
Type
User EmpNo
3009734-4
Computer
100
10-L
3-23-283742
Monitor
100
Schultz
20-B
2-22-723423
Monitor
100
Drachmei
er
20-B
232342
Printer
100
Emp
No
First
Name
Last
Name
Dept.
Num
100
Sally
Baker
10-L
101
Jack
Douglas
102
Sarah
103
David

Hierarchical Form (Supplier Superior to items)
S1
Britannia
I2
I3
I6
10
Cake
Delhi
5.00
Bread 9.00
10
Cram Biscuit 10.00
20
20
ADVANTAGES:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically
organized domains - e.g., assemblies in manufacturing,
personnel organization in companies
• Language is simple; uses constructs like GET, GET
UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.
DISADVANTAGES:
•Navigational and procedural nature of processing
•Database is visualized as a linear arrangement of
records
•Little scope for "query optimization"
A relational database matches data by using
common characteristics found within the data
set. The resulting groups of data are organized
and are much easier for many people to
understand. For example, a data set containing
all the real-estate transactions in a town can be
grouped by the year the transaction occurred;
or it can be grouped by the sale price of the
transaction; or it can be grouped by the buyer's
last name; and so on.
Relational Model developed by E.F. Codd (IBM)




Relation: A relation is a table i.e. data
arranged in rows and columns.
Domain : A domain is a pool of values from
which the actual values appearing in a
given column are drawn.
A domain describes the set of possible
values for a given attribute, and can be
considered a constraint on the value of
the attribute.
A domain is said to be if atomic domain
are considered to be indivisible domain.




Tuple: The row of tables (relations).
Attributes: The columns of tables
(relations).
The number of attributes in a relation is
called Degree and the number of row in
a relation is known as Cardinality.
View: A view is a table that does not
really exist in its own right but is instead
derived from one or more underlying
base table (s).

The concept of views
‘name of view
Select * from Items ‘ Name of base table
where Price>12
‘ the condition
Create VIEW GoodItems AS
=>GoodItems (A virtual table
based on Items table)
Item#
Item-Name
Price
I1
Milk
15.00
I4
Milk Bread
14.00
I7
Ice Cream
16.00
I9
Namkeen
15.00





Primary Key: It is a set of one or more
attributes that can uniquely identify tuples
within the relation.
Supp# is the primary key for suppliers.
Primary key consists of more than one
attribute ,it is called composite-primary key.
Candidate keys: All attribute combinations
inside a relation that can serve as primary
key are candidate keys as they are
candidates for the primary key position.
For Example: Supp# and supp-Name are two
candidate key in the suppliers relation. and
also Item# and Item-Name are candidate
key.




In case of two or more candidate
keys, only one of them serves as the
primary key. the rest of them are
alternates key.
Alternate Key: A candidate key that is
not the primary key is called an
alternate key.
In suppliers table, supp-name is the
alternate key and in Item table ItemName is the alternate key
Key : A non-key attribute, whose
values are derived from



Foreign-key: A non –key attribute, whose
values are derived from the primary key of
some other table, is known as Foreign-key
in its current table.
Primary key of one table is a foreign key of
other table.
For Example:
Sid
Sname
Email
01
A
[email protected]
02
B
[email protected]
Sid
Total Fee Fee Paid FeeBal
01
6000
4000
2000
02
6000
2000
4000



Referential Integrity is a system of
rules that a DBMS uses to ensure that
relationships between records in related
tables are valid, and that users don’t
accidentally delete or change related data.
Relational Algebra: The relational
algebra is a collection of operations on
relations. Each operation takes one or
more relations as its operand(s) and
produces another relation as its result.
Select operation: The Select operation
selects tuples(horizontal subset) from a
relation that satisfy a given condition.
It is denoted by lower letter Greek letter Sigma
( σ
).

For Example :To select those tuples from Items
relation where the price is more than 14.00.
We shall write;
σ Price >14.00(Items)
Item#
I1
I7
Item-Name
Milk
Ice Cream
Price
15.00
16.00
I9
Namkeen
15.00


i.
ii.
iii.
Project Operation: The project operation yields a
“vertical” subset of a given relation .
In Result, Duplicating tuples are automatically removed
It is denoted by Greek letter Pi(л) .
For Example:
л Supp-Name, city(Suppliers)
Britannia
City
Delhi
New Bakery
Mumbai
Mother Dairy
Delhi
Cook
Haldiram
Bangalore
Supp-Name
Jaipur
Supp#
S1
S2
S3
S4
S5
SuppName
Britanni
a
New
Baker
y
Mother
Dairy
Cook
Status City
Haldira
m
40
10
Delhi
30
Mumba
i
10
Delhi
50
Bangal
ore
Jaipur





Cartesian Product operation: The
Cartesian product is a binary operation.
It is denoted by a cross(X).
The Cartesian product of two relations A
and B is written as A X B.
It produce a new relation which has a
degree equal to the sum of the degree of
the two relations operated upon.
All tuples of first relation are concatenated
with all the tuples of second relation to
form the tuples of the new relation.
Student
Stud#
S001
S002
S003
StudName
Suraj
Preeti
Asra
Hosteler
Y
N
N
Instructor
Inst#
Inst-Name
Subject
I01
KVS, jabalpur IP
I02
KVS, kurnool Math
Stud#
StudName
Hosteler Inst#
InstName
Subject
S001
Suraj
Y
I01
KVS,
IP
jabalpur
S001
Suraj
Y
I02
KVS
kurnool
Math
S002
Preeti
N
I01
IP
S002
Preeti
N
I02
S003
Asra
N
I01
S003
Asra
N
I02
KVS,
jabalpur
KVS
kurnool
KVS,
jabalpur
KVS
kurnool
Math
IP
Math
Union operation: The union operation is a
binary operation that requires two relations
as its operands. It produces a third relation
that contains tuples from both the operand
relations
 It is denoted by U.
 Union of two relations x and y is x U y
 Condition for union operation
1.The relations A and B must be of the same
degree. that is, they must have the same
number of attribute.
2. The domains of the ith attributes of A and
ith attribute of B must be the same.

Examples
Drama

Song
Roll No Name Age
13
Rush 15
17
Swati 14

Roll No
2
10
13
Name
Manya
Rishabh
Rush
Result of Drama U Song will be
Roll No
2
10
13
17
Name
Manya
Rishabh
Rush
Swati
Age
15
15
13
14
A
B
Age
15
15
13
The Set Difference Operation:
 The set difference operation gives
tuples that are in one relation but
not in another
 It is denoted by –(minus).
 The expression A-B results in a
relation containing those tuples in A
but in B.
 Result of Drama – Song will be
Roll No
Name
Age
17
Swati
14
A
B




The set Intersection operation
The set intersection operation finds
tuples that are common to the two
operand relations
It is denoted by ∏.
Result of Drama ∏ Song will be
Roll No
13
Name
kush
Age
15
END



The Join operation
The join operation joins two relations to form a new
relation on the basis of common column the two
operand relations have.
Suppliers
Supp#
Supp-Name
S1
S2
Britannia
10
New Bakery 30
Delhi
Mumbai
S3
Mother
Dairy
Cook
Haldiram
10
Delhi
50
Bangalore
40
Jaipur
S4
S5
Status
City
Clients relation

Clients#
C101
C102
C104

Client-Name
ABC Co.
XYZ Co.
Zigs Co.
City
Delhi
Jaipur
Jaipur
If tables suppliers and Clients are joined over
City columns ; the results will be
Supp
#
SuppName
Stat City
u
s
Clients#
ClientName
City
S1
Britannia
10
Delhi
C101
ABC Co.
Delhi
S3
Mother
Dairy
10
Delhi
C104
Zigs Co.
Jaipur
S5
Haldiram
40
Jaipur
C102
XYZ Co.
Jaipur
S5
Haldiram
40
C104
C104
Zigs Co.
Jaipur
Type of Joins
1.Inner or equi –join: Inner join is used to give
matching records from joined tables.




Normalization: It is a process by which one goes
from to another preferable form through a set of
rule and observations.
The objective of normalization on tables is to
remove the redundancy of data.
First normal form: A relation R is in First
Normal form(1NF) if and only if all underlying
domains of the relation atomic (indivisible)
values.
In every tuples of the relation R, no
attribute should have repeating
groups.
 In every tuple of the relation R, each
attribute must have a value and that
too an atomic(indivisible) values
 Method of Bringing Unnormalized
Relation into 1NF
1.Remove all repeating groups from the
relation
2. Decompose non-atomic attributes to
atomic attributes.

Student
Stud Name
#
101 Shobhit
102
Subject
English
Maths
Biology
Upasana Economic
s
English
Hindi
Stud Name
#
101 Shobhit
101 Shobhit
Subject
101 Shobhit
Biology
English
Maths
102 Upasana Econom
ics
102 Upasana English
102 Upasana Hindi
Second Normal Form(2NF)



Functional Dependence:
Functional Dependence is
a relationship that exists
between any two fields.
JK(K is functionally
dependent on j)
JL(L is not functionally
dependent on j)
J
K
L
X
1
0
X
1
6
Y
4
1
Y
4
9
Z
3
5




Second Normal Form(2NF): A relation R is in
Second Normal form(2NF) if and only if it is 1NF
and every non-key attribute is fully dependent
on the primary key.
An attribute is a non-key if it does not
participate in the primary key.
Partial dependence means a non-key attribute
is dependent upon a portion of the compositeprimary- key and not the whole primary key.
Fully functional dependency: A FD X Y is a
fully FD if remove of any attribute A from X
means that the dependency does not hold good
any more.

Teacher
T-Name
Subject
Child
Name
Gurinder Comp.Sc Saran
Child
Age
5
Anand
Child
Sex
F
Electroni Vivek
7
cs
Shridhar Physics Sandhya 4
M
Maya
Comp.Sc Vinayaka 7
M
Maya
Comp.Sc Sangeeta 5
F
Maya
Comp.Sc Ananth
M
3
F

Teacher
T-Name Subject
Gurinder Comp.Sc
Anand
Electronics
Shridhar Physics
Maya
Comp.Sc
Child
TName
Gurind
er
Anand
Child
Name
Saran
Child Child
Age Sex
5
F
Vivek
7
M
Shridh Sandh 4
ar
ya
Maya Vinaya 7
ka
Maya Sange 5
eta
Maya Anant 3
h
F
M
F
M



The third Normal Form: A Relation R is
said to be in Third Normal Form(3NF) if
and only if it is in 2NF and every non-key
attribute is non-transitively dependent
upon the primary key
Every non-key attribute depends upon the
primary key.
Teacher
T-Name
Gurinder
Anand
Shridhar
Maya
Subject
Comp.Sc
Electronics
Physics
Comp.Sc
Classes
5
4
5
2

Teacher
T-Name
Gurinder
Anand
Shridhar
Maya
Sbcl
Subject
Comp.Sc
Electronics
Physics
Comp.Sc
Subject
Comp.Sc
Electronics
Physics
Comp.Sc
Classes
5
4
5
2

Boyce- Codd Normal Form(BCNF): A
relation is in BCNF if it is in 3NF and
all of its determinants (i.e. the
attributes upon which other
attributes depend) are candidate key
(i.e. the attributes that posses the
properties to be a primary key but
could not have been chosen as keys).
Fields – different types of data
(number or text)
Records
Queries
Reports
Searching for novel patterns,
rules or relationships in data,
e.g.:
•correlations
•classification
•clustering
•visualization
Versus traditional statistics:
hypothesis testing
States Data Mart - Cognos