Unit: 2
Database
Concept of database
People need data, so we create all kinds of lists to store and organize it. A grocery list, a
phone book, a library’s card catalog, and an instructor’s list of students are all organized lists
of data. Likewise, a computer can be used to store and manage lists of data, and this is the reason
for the computerized database. In fact, many early attempts to build and program computers grew
out of a need to manage large lists of data. That is why you have probably heard the phrase data
processing: computers manipulate data, and that data is usually stored in a database.
Today’s computers are much more sophisticated than early computers, but they still need an
organized data source. This need applies to just about every type of computer program. For
example, a word processor maintains several databases, such as a dictionary of words for the spell
checker and thesaurus, a list of available fonts, and other types of data.
Data, Information, Database and Database Management Software (DBMS)
Data
Data can be anything, such as the bio-data of applicants when the computer is used for
recruiting personnel, or the marks obtained by students in various subjects when the computer
is used to prepare results. Thus, data is a collection of facts in raw form that becomes
information after proper organization or processing.
Information
When raw data is processed, we obtain information. The activity of processing data using a
computer is called data processing. Information is the processed form of data, which gives
complete meaning and is used in decision making.
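As a small illustration of data processing, the sketch below turns raw marks (data) into a
result (information). The student names, the marks, and the pass mark of 40 are made-up values
chosen only for this example.

```python
# Raw data: marks obtained by each student in three subjects (illustrative values).
raw_marks = {
    "Ram": [78, 90, 67],
    "Sita": [35, 52, 30],
}

PASS_MARK = 40  # assumed pass mark, for illustration only


def prepare_result(marks_by_student):
    """Process raw marks into information that can be used for decisions."""
    result = {}
    for name, marks in marks_by_student.items():
        total = sum(marks)
        result[name] = {
            "total": total,
            "average": total / len(marks),
            # A student passes only if every subject reaches the pass mark.
            "passed": all(mark >= PASS_MARK for mark in marks),
        }
    return result


result = prepare_result(raw_marks)
for name, info in result.items():
    print(name, info)
```

The list of marks alone says little; the total, average, and pass status are what a teacher
would actually use to make a decision.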
Database:
A database is a collection of data stored in a standardized format and designed to be shared by
multiple users. Databases are designed to manage large bodies of information. The management of
data involves both defining structures for the storage of information and providing mechanisms
for the manipulation of data. Examples of databases are a telephone diary, a library’s card
catalog, etc.
DBMS:
A database management system (DBMS) is software that defines a database, stores the data,
supports a query language, produces reports, and creates data entry screens. The set of programs
provided to help the user organize, create, delete, and manipulate data in a database is also
known as a DBMS. Examples of database management software are MS-Access, MySQL, SQL Server, etc.
Objectives of DBMS (Database management system)
In the database-oriented approach to organizing data, data from multiple related files is
integrated together in the form of a database, which has the following objectives.
a. Provides greater query flexibility.
b. Provides for mass storage of relevant data.
c. Reduces data redundancy.
d. Solves data integrity (inconsistency) problems.
e. Provides data security features at database level, record level, etc.
f. Allows multiple users to be active at a time for accessing data.
g. Makes data independent of the application programs.
h. Provides prompt response to user requests for data.
i. Allows updating a large volume of data at a time.
Database model
There are different types of database management systems, each characterized by the way in
which the data in the database is defined and structured. The database model defines the
manner in which the various files of a database are linked together. There are mainly three types
of database model in common use. They are:
a) Relational database model
b) Hierarchical database model
c) Network database model
a) Relational database model.
A database model in which the data elements are organized in the form of multiple tables,
and the data in one table is related to the data in another table through the use of a
common field (column).
Table 1          Table 2
StudentName      RollNo
RollNo           PaymentDate
DOB              FeeAmount
Gender           Remark
Address
Fig 2.1: Relational database model
In fig 2.1, the student information table (Table 1) is related to the fee table (Table 2)
through the common field RollNo.
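The relationship through RollNo can be tried out with Python's built-in sqlite3 module. This is
only a sketch: the table names student and fee, the column types, and all the row values are
assumptions made for the example; only the column names come from fig 2.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE student (
    RollNo      INTEGER PRIMARY KEY,
    StudentName TEXT,
    DOB         TEXT,
    Gender      TEXT,
    Address     TEXT
);
CREATE TABLE fee (
    RollNo      INTEGER,   -- common field shared with the student table
    PaymentDate TEXT,
    FeeAmount   REAL,
    Remark      TEXT
);
INSERT INTO student VALUES (1, 'Ram', '2005-01-15', 'M', 'Ktm');
INSERT INTO student VALUES (2, 'Sabita', '2005-06-02', 'F', 'Pok');
INSERT INTO fee VALUES (1, '2023-04-01', 1500, 'Paid');
INSERT INTO fee VALUES (2, '2023-04-03', 1500, 'Paid');
INSERT INTO fee VALUES (1, '2023-05-01', 1500, 'Paid');
""")

# Relate the two tables through the common RollNo column.
rows = conn.execute("""
    SELECT s.StudentName, f.PaymentDate, f.FeeAmount
    FROM student AS s
    JOIN fee AS f ON f.RollNo = s.RollNo
    ORDER BY s.RollNo, f.PaymentDate
""").fetchall()

for row in rows:
    print(row)
```

The student's name is stored once in the student table, yet it appears next to every payment
in the query output because the two tables are matched on RollNo.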
Advantages
a. It has less redundancy because of the primary key.
b. Normalization of the database is possible.
c. Database processing is rapid.
d. Searching for data is easy.
e. Referential integrity can be applied.
Disadvantages
a. It is more complex to maintain than the other models.
b. Many rules have to be applied.
c. It is not user friendly.
Hierarchical database model.
In a hierarchical database, the data elements are linked in the form of an inverted tree
structure, with the root at the top and the branches formed below. Below the single root data
element are subordinate elements, each of which, in turn, has one or more other elements. There
is a parent-child relationship among the data elements of a hierarchical database. A parent data
element has one or more subordinate data elements. The data elements below a parent are its
children.
ORGANIZATION
    Personal dept
        Staff
        Manager
    Technical dept
        Staff
        Manager
    Finance dept
        Staff
        Manager
Fig 2.2: Hierarchical database model
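The tree of fig 2.2 can be sketched in Python as nested dictionaries, where every element has
exactly one parent. The function name children_of is made up for this example, and the sketch
assumes that each department has its own Staff and Manager elements.

```python
# Each key is a data element; its value holds the children of that element.
organization = {
    "ORGANIZATION": {
        "Personal dept": {"Staff": {}, "Manager": {}},
        "Technical dept": {"Staff": {}, "Manager": {}},
        "Finance dept": {"Staff": {}, "Manager": {}},
    }
}


def children_of(tree, path):
    """Walk down from the root along `path` and return the children found there."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError if the path does not exist
    return sorted(node)


departments = children_of(organization, ["ORGANIZATION"])
finance = children_of(organization, ["ORGANIZATION", "Finance dept"])
print(departments)
print(finance)
```

Every lookup has to start at the root and follow the parents in order, which is why searching
is easy when the parent is known and awkward when it is not.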
Advantages
a. Searching is fast and easy if the parent is known.
b. It is the easiest model compared with the other database models.
c. It is very efficient in handling ‘one-to-many’ relationships.
Disadvantages
a. It is an old model of database.
b. Modifying and adding children is difficult.
c. It can’t handle ‘many-to-many’ relationships.
Network database model
A network database structure is an extension of the hierarchical database structure. In this
model, the data elements of a database are organized in the form of parent-child relationships,
and all types of relationships among the data elements must be determined when the database is
first designed. In the network database model, a child data element can have more than one parent
or no parent at all. In this model, the database management system permits the extraction of the
needed information by beginning from any data element in the database structure, instead of
starting from the root data element.
[Figure: a network of data elements: college; English, Computer, Maths, Account; Sabita,
Sunita, Rabi, Alin]
Fig 2.3: Network database model
In fig 2.3, a data element can have more than one parent, while the data element Ram has no
parent.
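A network structure can be sketched in Python by storing, for every data element, the list of
its parents. The element names follow fig 2.3, but the links between them are not recoverable
from the figure, so the edges below are assumptions made only for illustration.

```python
# child -> list of parents (the edges are assumed, not taken from the figure)
parents = {
    "English": ["college"],
    "Computer": ["college"],
    "Maths": ["college"],
    "Account": ["college"],
    "Sabita": ["English", "Computer"],  # a child with more than one parent
    "Sunita": ["Computer", "Maths"],
    "Rabi": ["Maths"],
    "Alin": ["Account"],
    "Ram": [],                          # an element with no parent at all
}


def children_of(parent):
    """Return every element that lists `parent` among its parents."""
    return sorted(child for child, ps in parents.items() if parent in ps)


# Access can begin at any element; there is no need to start from the root.
computer_students = children_of("Computer")
sabita_subjects = parents["Sabita"]
print(computer_students)
print(sabita_subjects)
```

Because Sabita belongs to two subjects and each subject has several students, this structure
represents a many-to-many relationship, which a strict tree cannot do.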
Advantages
a. It has more flexibility.
b. It reduces data redundancy.
c. Searching is fast because of multi-directional pointers.
d. It accepts many-to-many relationships.
Disadvantages
a. It is difficult to sort data.
b. It is a very complex type of database model.
c. It needs long programs to handle relationships.
Relational Data Model
A database is a collection of related items or facts arranged in a specific structure. To arrange
data in the proper structure, we define the following terms.
1. Field: The smallest unit of data in a database, used to group each piece or item of data
into a specific category. Fields are arranged in columns. All values in a field have the same
data type.
2. Record: A database row composed of related fields; a collection of records makes up the
database. A record gives the complete meaning of one item.
3. Table: A complete collection of records is called a table. A table contains rows and
columns. Each column of the table represents a field and each row represents a record.
4. Entity: An entity is a class of people, objects, events, or concepts in the real world that
is distinguishable from other objects. An entity is something about which the business needs
to store data. It is also called an entity class or entity type.
Person: doctor, teacher, student, etc.
Object: tool, machine, building, pen, etc.
Place: zone, region, country, etc.
5. Attributes: An attribute is a descriptive property possessed by each member of an entity.
Attributes are also called elements, properties, or fields.
Stid  Name          Address  Class  Sec
S001  Sabita regmi  Ngh      12     A
S002  Ram           Ktm      11     B
S003  Anjana        Pok      12     A
S004  Raju          Ngh      12     A
In the above table, Stid, Name, Address, Class, and Sec are attributes.
[Figure: the entity Student with the attributes Stid, Name, Address, Sex, and Class]
Fig 2.5: E-R diagram
The given figure is an E-R (Entity-Relationship) diagram. An E-R diagram shows the relations
between different entities, and between each entity and its attributes.
In an E-R diagram we also assign different keys to establish the relations between the tables of
different entities. The keys are as follows.

Primary key: A column or set of columns that identifies a particular row in a table. The
primary key makes each row unique in that table. It also helps to reduce data redundancy and
is used to set the relationships between tables. In an E-R diagram the primary key is shown
underlined. In fig 2.5, Stid is the primary key.
Foreign key: A foreign key is a field in a child table that refers to the primary key of a
master table. It is required for setting relationships between tables.
Unique key: A unique key is a key whose values can’t be repeated; every non-null value must be
unique. Some DBMSs allow a null value only once in a unique key column, while others allow
several.
Candidate key: A table may have a number of columns (or sets of columns) that could each
identify a row uniquely. All of them are candidate keys; one is chosen as the primary key, and
the others are then called alternate keys.
Concept of Normalization
The essence of normalization is to split your data into several tables that are
connected to each other based on the data within them. Relational databases operate on
tables of data. These tables must be carefully defined to obtain the advantages of the
database approach. The process of determining the proper tables for the database is called
normalization.
Normalization is the process of organizing data in a database to reduce redundancy. It also
includes creating tables and establishing relationships between those tables, using rules
designed to protect the data and to make the database flexible.
Unnormalized table
Student info: Studentid, Firstname, Lastname, Class, Subject, Marks, Sec, Roll
Normalized tables
Personal info: StudentId, Firstname, Lastname, Class, Sec, Roll
Subject table: SubjectId, Subject
Marks table: SubjectId, StudentId, Marks
Fig 2.6: Normalization of data
In fig 2.6 above, the first table is not in normalized form. Whenever we want to enter marks,
we have to enter all the other information again, so there is a chance of data redundancy and
the table is ineffective. The table is therefore split into three other tables, so that the data
becomes independent and the tables depend on each other only through keys. Normalization helps
us to make sure of the following:
a) Dependence between the data is identified.
b) Redundancy in the database is minimized.
c) The data model is made more flexible and easier to maintain.
Types of normalization
a) 1NF (first normal form)
When a table has no repeating groups of data, it is said to be in first normal
form. That means that for each cell in a table (one row and one column), there can be
only one value. This value should be atomic in the sense that it can’t be
decomposed into smaller pieces.
Name     Roll  Class  Sec  Sub1     Marks1  Sub2   Marks2  Sub3      Marks3
Sangita  2     12     A    English  78      Maths  90      Computer  78
Laxmi    1     11     B    English  90      Maths  89      Computer  67
Sabita   1     12     A    English  67      Maths  98      Computer  90
The above table is not in first normal form, because the subject and marks attributes appear
as repeating groups. To bring it into first normal form, we break up the table in the
following way.
Name     Roll  Class  Sec  Subject   Marks
Sangita  2     12     A    Maths     90
Laxmi    1     11     B    English   90
Sabita   1     12     A    Computer  90
Sangita  2     12     A    English   78
Laxmi    1     11     B    Computer  67
Sabita   1     12     A    Maths     98
Sangita  2     12     A    Computer  78
Laxmi    1     11     B    Maths     89
Sabita   1     12     A    English   67
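The step from the repeating-group table to first normal form can be sketched in Python. The
function name to_first_normal_form is made up for this example; the data is the three-student
table with the columns Sub1/Marks1 to Sub3/Marks3.

```python
# One row per student, with three repeating (subject, marks) groups.
wide_rows = [
    {"Name": "Sangita", "Roll": 2, "Class": 12, "Sec": "A",
     "Sub1": "English", "Marks1": 78, "Sub2": "Maths", "Marks2": 90,
     "Sub3": "Computer", "Marks3": 78},
    {"Name": "Laxmi", "Roll": 1, "Class": 11, "Sec": "B",
     "Sub1": "English", "Marks1": 90, "Sub2": "Maths", "Marks2": 89,
     "Sub3": "Computer", "Marks3": 67},
    {"Name": "Sabita", "Roll": 1, "Class": 12, "Sec": "A",
     "Sub1": "English", "Marks1": 67, "Sub2": "Maths", "Marks2": 98,
     "Sub3": "Computer", "Marks3": 90},
]


def to_first_normal_form(rows, groups=3):
    """Emit one row per (student, subject) so that every cell holds one value."""
    flat = []
    for row in rows:
        for i in range(1, groups + 1):
            flat.append({
                "Name": row["Name"],
                "Roll": row["Roll"],
                "Class": row["Class"],
                "Sec": row["Sec"],
                "Subject": row[f"Sub{i}"],
                "Marks": row[f"Marks{i}"],
            })
    return flat


flat_rows = to_first_normal_form(wide_rows)
for row in flat_rows:
    print(row)
```

The three wide rows become nine rows with a single Subject and a single Marks column. The rows
come out grouped by student, so their order differs from the printed table, but the content is
the same.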
b) 2NF (second normal form)
A table is in second normal form if it is in first normal form and every non-key column
depends on the entire key. To achieve this, split the table: pull out the columns that
depend on only part of the key, and remember to include that part of the key in the new
table. The new table must have a key or id that appears in both tables. Each attribute in
a table must depend on the whole key.
Table 1: student
Name     Roll  Class  Sec
Sangita  2     12     A
Laxmi    1     11     B
Sabita   1     12     A
Table 2: subject
Subject   Class
English   11
English   12
Computer  11
Computer  12
Maths     11
Maths     12
Table 3: marks
Name     Subject   Marks
Sangita  Maths     90
Laxmi    English   90
Sabita   Computer  90
Sangita  English   78
Laxmi    Computer  67
Sabita   Maths     98
Sangita  Computer  78
Laxmi    Maths     89
Sabita   English   67
Above, the whole table is split into three tables: marks, subject, and student. Interrelated
data are placed together in the same table. Name depends on roll + class + sec; subject depends
on class, not on roll; and name, subject, and marks are interrelated.
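Whether a column depends on the whole key or only on part of it can be checked against sample
data. The sketch below uses a hypothetical helper, determines, on the first-normal-form rows. It
treats Roll + Class + Sec + Subject as the key, which is an assumption made for this example. A
check like this only shows that a dependency is consistent with the sample rows, not that it
holds in general.

```python
COLUMNS = ("Name", "Roll", "Class", "Sec", "Subject", "Marks")
DATA = [
    ("Sangita", 2, 12, "A", "Maths", 90),
    ("Laxmi", 1, 11, "B", "English", 90),
    ("Sabita", 1, 12, "A", "Computer", 90),
    ("Sangita", 2, 12, "A", "English", 78),
    ("Laxmi", 1, 11, "B", "Computer", 67),
    ("Sabita", 1, 12, "A", "Maths", 98),
    ("Sangita", 2, 12, "A", "Computer", 78),
    ("Laxmi", 1, 11, "B", "Maths", 89),
    ("Sabita", 1, 12, "A", "English", 67),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def determines(rows, lhs, rhs):
    """Return True if equal values in the `lhs` columns always give the same `rhs` value."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Name depends on only part of the key, so it belongs in a separate student table.
name_on_part = determines(rows, ("Roll", "Class", "Sec"), "Name")
# Marks needs the whole key, so it stays in the marks table.
marks_on_part = determines(rows, ("Roll", "Class", "Sec"), "Marks")
marks_on_whole = determines(rows, ("Roll", "Class", "Sec", "Subject"), "Marks")
print(name_on_part, marks_on_part, marks_on_whole)
```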
c) 3NF (third normal form)
The logic, analysis, and design elements for third normal form (3NF) are
similar to those used in deriving 2NF. In particular, you still concentrate on the
issue of dependence. To be in 3NF, a table must be in 2NF, and every non-key
column must depend on nothing but the key.
Table 1: class
classid  Classname
1        11
2        12
Table 2: subject
subid  Subject
1      English
2      Computer
3      Maths
Table 3: student
stid  Name     Roll  Classid  Sec
1     Sangita  2     2        A
2     Laxmi    1     1        B
3     Sabita   1     2        A
Table 4: marks
stdid  subid  Marks
1      1      78
2      1      90
3      1      67
1      2      78
2      2      67
3      2      90
1      3      90
2      3      89
3      3      98
In the given tables, all the attributes depend on the key. In the class, subject, and student
tables, all the attributes depend on the primary key, while in the marks table the data depend
on stdid and subid together. So these four tables are the normalized form of the given
non-normalized table.
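The four normalized tables can be created and joined back together with Python's built-in
sqlite3 module. This is a sketch: the column types and the constraint declarations are
assumptions, while the table names, column names, and values follow the 3NF tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE class   (classid INTEGER PRIMARY KEY, classname INTEGER);
CREATE TABLE subject (subid INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE student (
    stid    INTEGER PRIMARY KEY,
    name    TEXT,
    roll    INTEGER,
    classid INTEGER REFERENCES class(classid),
    sec     TEXT
);
CREATE TABLE marks (
    stdid INTEGER REFERENCES student(stid),
    subid INTEGER REFERENCES subject(subid),
    marks INTEGER,
    PRIMARY KEY (stdid, subid)   -- marks depend on both ids together
);
INSERT INTO class VALUES (1, 11), (2, 12);
INSERT INTO subject VALUES (1, 'English'), (2, 'Computer'), (3, 'Maths');
INSERT INTO student VALUES
    (1, 'Sangita', 2, 2, 'A'), (2, 'Laxmi', 1, 1, 'B'), (3, 'Sabita', 1, 2, 'A');
INSERT INTO marks VALUES
    (1, 1, 78), (2, 1, 90), (3, 1, 67),
    (1, 2, 78), (2, 2, 67), (3, 2, 90),
    (1, 3, 90), (2, 3, 89), (3, 3, 98);
""")

# Joining on the keys rebuilds the original one-table view without storing it.
report = conn.execute("""
    SELECT st.name, st.roll, c.classname, st.sec, su.subject, m.marks
    FROM marks AS m
    JOIN student AS st ON st.stid = m.stdid
    JOIN subject AS su ON su.subid = m.subid
    JOIN class   AS c  ON c.classid = st.classid
    ORDER BY st.stid, su.subid
""").fetchall()

for row in report:
    print(row)
```

Each fact is stored once, and the nine rows of the first-normal-form table are recovered by
following the keys.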
Structured Query Language
Structured Query Language (SQL) is, in computer science, a database sublanguage used in
querying, updating, and managing relational databases. Derived from an IBM research project
that created Structured English Query Language (SEQUEL) in the 1970s, SQL is an accepted
standard in database products. Although it is not a programming language in the same sense as C
or Pascal, SQL can either be used in formulating interactive queries or be embedded in an
application as instructions for handling data. The SQL standard also contains components for
defining, altering, controlling, and securing data. SQL is designed for both technical and
nontechnical users.
Some database systems provide a special window or form for creating queries. Because almost all
databases are similar, a common type of query language was developed, which is called
Structured Query Language. The most commonly used command in SQL is the SELECT statement,
which is used to retrieve data from a table.
The basic structure of an SQL query is as follows.
SELECT field1, field2, ...
FROM table_name
WHERE condition;
Query from one table
SELECT name, class, roll, sec
FROM student
WHERE sec = 'A';
This query shows the name, class, roll, and sec of the students in the student table whose
section is A.
Query from two tables
SELECT name, subid, marks
FROM student, marks
WHERE student.stid = marks.stdid;
This query shows each student's name with the subject id and marks, by matching stid in the
student table with stdid in the marks table.
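A SELECT ... FROM ... WHERE query can be run from a program as well as typed interactively.
The sketch below uses Python's built-in sqlite3 module; the function name students_in_section
and the sample rows are assumptions made for this example. The value for the WHERE condition is
passed as a parameter (the ? placeholder) instead of being pasted into the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE student (name TEXT, class INTEGER, roll INTEGER, sec TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("Sangita", 12, 2, "A"), ("Laxmi", 11, 1, "B"), ("Sabita", 12, 1, "A")],
)


def students_in_section(connection, section):
    """Return name, class, roll and sec of the students in one section."""
    query = "SELECT name, class, roll, sec FROM student WHERE sec = ? ORDER BY name"
    return connection.execute(query, (section,)).fetchall()


section_a = students_in_section(conn, "A")
print(section_a)
```

Only the rows that satisfy the WHERE condition are returned, so Laxmi (section B) does not
appear in the result.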
Centralized Vs Distributed Database
A centralized database works on a client-server basis. In this system, a powerful machine with
a multiuser operating system functions as the server. Smaller computers, usually personal
computers, operate as the clients. The server holds the software and data that are shared by the
users. An individual client computer holds the data used by the individual working on that
machine. The control mechanism and the data are kept in a central location. A DBA (database
administrator) is appointed as the controller of the whole database. It is suitable only for
small organizations and small-scale operations.
Distributed database systems consist of multiple independent databases that operate on two or
more computers that are connected and share data over a network. The databases are generally in
different physical locations. Each database is controlled by an independent DBMS, which is
responsible for maintaining the integrity of its own database. The advantages of a distributed
database are as follows:
a. It provides high performance, since most updates and queries are handled locally.
b. The database is easy to expand.
c. It also supports transaction processing and decision support applications.
Data Security
A database collects a large amount of data in one location and makes it easy for people to
retrieve and change data. So databases are a critical resource that must be protected. Yet the
same factors that make a database so useful also make it more difficult to secure. Data may be
lost or compromised through crashes during transaction processing, unauthorized reading of
data, destruction of data, etc.
To protect the database, we must take security measures at several levels:
Physical: The site or sites containing the computers must be physically secured against
armed or surreptitious entry by intruders. Physical security also consists of regular
maintenance, insurance, protection from theft, etc.
Human: Database users must be authorized carefully. Poor handling of data privacy by users is
also a factor that may damage the database.
Operating system: The database can also be made secure through the policies of the operating
system. Today’s operating systems provide many security options.
Network: Security must also be applied at the network level, since all the databases are
connected through the network.
Data Integrity
Data integrity is a set of rules that govern a database for the purpose of correctness and
consistency. It guards against accidental damage to the database by ensuring that changes made
by authorized users do not result in a loss of data consistency.
The three types of data integrity are:
1. Domain Integrity
2. Entity Integrity
3. Referential Integrity
Domain Integrity
Domain integrity is a set of rules applied to cell-level data. It defines the set of values that
may be associated with an attribute (property/column). Domain constraints (rules) are the most
elementary form of data integrity. They are tested by the system whenever a new data item is
entered into the database. E.g.
Student
RollNo  Name  Class  Age
1       Ram   XI     20
2       Syam  XI     19
3       Hari  XI     18
aa      Amit  XI     55
Here ‘aa’ is not a valid roll no. It violates domain integrity, because the set of allowable
values for RollNo contains only numeric values.
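A domain rule such as "RollNo must be numeric" can be declared to the DBMS so that it is tested
on every insert. The sketch below uses Python's built-in sqlite3 module with a CHECK constraint.
The constraint and the helper try_insert are assumptions made for this example; other DBMSs
enforce column types in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE student (
        -- SQLite does not reject text in an INTEGER column by default,
        -- so the domain rule is stated explicitly with CHECK.
        RollNo INTEGER CHECK (typeof(RollNo) = 'integer'),
        Name   TEXT,
        Class  TEXT,
        Age    INTEGER
    )
""")


def try_insert(row):
    """Insert one row; return False if the DBMS rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


numeric_ok = try_insert((1, "Ram", "XI", 20))
text_ok = try_insert(("aa", "Amit", "XI", 55))
print(numeric_ok, text_ok)
```

The row with the roll no. ‘aa’ never reaches the table, so the invalid value cannot spread to
reports or other tables.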
Entity Integrity
Entity integrity is a set of rules applied to row-level (record-level) data. It ensures that
each row in a table is uniquely identifiable by some key. E.g.
Student
RollNo (Primary Key)  Name  Class  Age
1                     Ram   XI     20
2                     Syam  XI     19
3                     Hari  XI     18
3                     Amit  XI     55
Here the 4th row violates the entity integrity rule, because each key value used to identify a
row (record) must be unique in that column.
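Declaring RollNo as the primary key lets the DBMS enforce entity integrity by itself. The
sketch below loads the four example rows with Python's built-in sqlite3 module and collects the
rows that the database refuses; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE student (RollNo INTEGER PRIMARY KEY, Name TEXT, Class TEXT, Age INTEGER)"
)

rows = [
    (1, "Ram", "XI", 20),
    (2, "Syam", "XI", 19),
    (3, "Hari", "XI", 18),
    (3, "Amit", "XI", 55),  # repeats RollNo 3
]

rejected = []
for row in rows:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        # The primary key must be unique, so the duplicate is refused.
        rejected.append(row)

stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(stored, rejected)
```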
Referential Integrity
Referential integrity is a set of rules applied at table level. Every value in the column of
the reference table that refers to the primary key column of the main table must be present in
the main table. E.g.
Student (Main Table)
RollNo (Primary Key)  Name  Class  Age
1                     Ram   XI     20
2                     Syam  XI     19
3                     Hari  XI     18
4                     Amit  XI     55
Marks (Reference Table)
RollNo  Subject  Class  Age
1       Math     XI     20
2       Math     XI     19
3       Math     XI     18
        Math     XI     55
Here, the fourth row of the Marks table has no RollNo, although a RollNo that exists in the
Student table must be present for every row of marks information. Therefore the Marks table
violates the referential integrity rule.
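Referential integrity can be enforced by declaring the RollNo column of the Marks table as a
foreign key. The sketch below uses Python's built-in sqlite3 module. SQLite checks foreign keys
only after PRAGMA foreign_keys = ON, and the NOT NULL rule is added here as an assumption so
that a row with a missing RollNo is refused as well. The helper add_marks is made up for this
example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.execute("CREATE TABLE student (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE marks (
        RollNo  INTEGER NOT NULL REFERENCES student(RollNo),
        Subject TEXT
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "Ram"), (2, "Syam"), (3, "Hari"), (4, "Amit")],
)


def add_marks(roll_no, subject):
    """Insert a marks row; return False if the DBMS rejects it."""
    try:
        conn.execute("INSERT INTO marks VALUES (?, ?)", (roll_no, subject))
        return True
    except sqlite3.IntegrityError:
        return False


valid = add_marks(1, "Math")       # RollNo 1 exists in student
orphan = add_marks(9, "Math")      # no student has RollNo 9
missing = add_marks(None, "Math")  # RollNo left empty
print(valid, orphan, missing)
```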
Difference Between DBMS and RDBMS
DBMS                                            RDBMS
DBMS stands for Database Management System.     RDBMS stands for Relational Database Management System.
It is a flat-file approach to storing data.     It stores data in multiple tables, which are related to each other.
A DBMS is any system that manages databases.    An RDBMS is a subtype of DBMS that is limited to relational databases.
It supports a small number of users.            It supports a large number of users.
There is a chance of massive duplication of     There is less data duplication.
data records.