CHAPTER-1: DATABASE SYSTEM CONCEPTS
1) Define data, database and DBMS. List any two applications of DBMS.
Data: A set of isolated and unrelated raw facts with an implicit meaning.
Or
Data is a representation of facts, concepts or instructions in a formalized manner suitable
for communication, interpretation or processing by humans or by automated means. E.g., a
student database file contains name, age and enrollment number. All of this information is
the data stored in the database.
Database:
A database is a collection of information that is organized so that it can easily be
accessed, managed, and updated.
DBMS: A database-management system is a collection of interrelated data and a set of
programs to access those data.
Applications of DBMS:
1. Banking
2. Airlines and railways
3. Sales
4. Telecommunications
5. Universities.
6. Manufacturing
7. E-commerce
8. Credit card transactions.
2) Explain two disadvantages of file processing system.
Ans) Disadvantages of file processing system
1. Data redundancy and inconsistency:
Since the files and application programs are created by different programmers over a long
period, the various files are likely to have different formats and the programs may be
written in several programming languages. So, the same information may be duplicated
in several places (files). This repetition of information is known as redundancy. This
redundancy leads to higher storage and access cost.
In addition, it may lead to data inconsistency, that is, different copies of the same data
may have different values.
2. Difficulty in accessing data:
A conventional file processing system does not allow data to be accessed in a convenient
and efficient way. The data is scattered across different files, and whenever a need
arises, different application programs are written by different programmers in different
formats.
3. Data isolation:
Because the data is scattered across various files, and the files may be in different
formats, writing new application programs to retrieve the data is very difficult.
4. Integrity problems:
The data values stored in the database must satisfy certain types of consistency
constraints. When new constraints are added, it is difficult to change the programs to
enforce them.
5. Atomicity problems:
A computer system, like any other mechanical or electrical device, is subject to failure. If
a failure occurs, each transaction that was executing should either be executed completely
or not at all, so that the database remains in a consistent state. This is difficult to
ensure in a file processing system.
6. Concurrent-access anomalies:
To improve the performance of the system, multiple transactions must be executed
concurrently. Multiple transactions may update the same data at the same time. In such
a case, the data may be left in an inconsistent state.
7. Security problems:
Only authorized persons should be able to modify the data. Security should be maintained
at different levels, which is not possible in a file processing system.
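The atomicity problem above can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the account table, the transfer function and the simulated crash are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 8000), (2, 6000)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances(conn):
    return [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]

transfer(conn, 1, 2, 1000, fail_midway=True)
after_failure = balances(conn)   # the debit was undone, nothing changed
transfer(conn, 1, 2, 1000)
after_success = balances(conn)   # both updates applied together
```

With plain files, the program itself would have to detect the half-finished transfer and undo it; here the DBMS does that.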
3) Explain Data base Redundancy and Integrity.
Ans:
Data redundancy:
Data redundancy is the unnecessary repetition of data.
Since different programmers create the files and application programs over a long period,
the various files are likely to have different structures and programs may be written in
several programming languages. The same piece of information or program may be
duplicated in several places. E.g., the accounting department and the registration
department both keep student name, number and address.
Effects of data redundancy:
1. Increases the size of the database unnecessarily
2. Causes data inconsistency.
3. Increases the access cost and decreases efficiency of database.
4. May cause data corruption.
Such data redundancy in DBMS can be prevented by database normalization.
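As a rough sketch of how normalization removes this redundancy, the student details can be stored once and referenced by both departments. The example assumes Python's sqlite3 module; the table names, column names and sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Student details are stored exactly once.
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT);
-- Each department keeps only its own facts plus a reference to the student.
CREATE TABLE fee_payment (roll_no INTEGER REFERENCES student(roll_no), amount INTEGER);
CREATE TABLE registration (roll_no INTEGER REFERENCES student(roll_no), course TEXT);
INSERT INTO student VALUES (101, 'Asha', 'Pune');
INSERT INTO fee_payment VALUES (101, 12000);
INSERT INTO registration VALUES (101, 'DBMS');
""")

# One update is enough; there is no second copy that could go stale.
conn.execute("UPDATE student SET address = 'Mumbai' WHERE roll_no = 101")

accounting_address = conn.execute(
    "SELECT s.address FROM fee_payment f JOIN student s USING (roll_no)"
).fetchone()[0]
registration_address = conn.execute(
    "SELECT s.address FROM registration r JOIN student s USING (roll_no)"
).fetchone()[0]
```

Both departments now read the same address, so the inconsistency described above cannot arise for this field.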
Data integrity:
1. Data integrity refers to maintaining and assuring the accuracy and consistency
of data over its entire life-cycle.
2. Data integrity is usually imposed during the database design phase through the
use of standard procedures and rules.
3. Data integrity can be maintained through the use of various error checking
methods and validation procedures.
E.g., the balance of a certain type of bank account may never fall below a prescribed
amount (Rs. 5000).
We can handle this through program code and by declaring an integrity constraint along
with the data definition.
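The minimum-balance rule can be declared as part of the table definition, for example with a CHECK constraint. This is a minimal sketch assuming Python's sqlite3 module; the savings_account table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint is declared together with the table definition.
conn.execute(
    "CREATE TABLE savings_account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 5000))"
)
conn.execute("INSERT INTO savings_account VALUES (1, 9000)")
conn.commit()

try:
    # 9000 - 6000 = 3000, which violates the declared minimum of 5000.
    conn.execute("UPDATE savings_account SET balance = balance - 6000 WHERE acc_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
    conn.rollback()

balance = conn.execute("SELECT balance FROM savings_account WHERE acc_no = 1").fetchone()[0]
```

Because the rule lives in the database, every application program that updates the table is subject to it without extra code.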
4) Explain any four functions of DBMS.
Ans)
1. Database Communication Interfaces: The end-user's requests for database access
are transmitted to the DBMS in the form of communication messages.
2. Authorization / Security Management: The DBMS protects the database
against unauthorized access, either intentional or accidental. It furnishes
mechanisms to ensure that only authorized users can access the database.
3. Backup and Recovery Management: The DBMS provides mechanisms for
backing up data periodically and recovering from different types of failures. This
prevents the loss of data.
4. Concurrency Control Service: Since DBMSs support sharing of data among
multiple users, they must provide a mechanism for managing concurrent access to
the database. DBMSs ensure that the database is kept in a consistent state and that
the integrity of the data is preserved.
5. Transaction Management: A transaction is a series of database operations,
carried out by a single user or application program, which accesses or changes the
contents of the database. Therefore, a DBMS must provide a mechanism to ensure
either that all the updates corresponding to a given transaction are made or that
none of them is made.
6. Database Access and Application Programming Interfaces: All DBMSs provide
interfaces that enable applications to use DBMS services. They provide data access
via Structured Query Language (SQL). The DBMS query language contains
two components: (a) a Data Definition Language (DDL) and (b) a
Data Manipulation Language (DML).
7. Data integrity and consistency: To provide data integrity and data consistency,
the DBMS uses sophisticated algorithms to ensure that multiple users can access
the database concurrently without compromising the integrity of the database.
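The two query-language components named above, DDL and DML, can be seen side by side in a short sketch. It assumes Python's sqlite3 module as the application programming interface; the student table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure of the data.
conn.execute(
    "CREATE TABLE student (enroll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)

# DML: inserts, updates, deletes and retrieves the data itself.
conn.execute("INSERT INTO student VALUES (1, 'Ravi', 19)")
conn.execute("INSERT INTO student VALUES (2, 'Meena', 20)")
conn.execute("UPDATE student SET age = age + 1 WHERE enroll_no = 1")
conn.execute("DELETE FROM student WHERE enroll_no = 2")
rows = conn.execute("SELECT enroll_no, name, age FROM student").fetchall()
```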
5) Difference between DBMS and RDBMS.
Ans:
Sr. No. | DBMS | RDBMS
1 | Old version of software to handle databases. | Latest version of software for handling databases.
2 | In DBMS there is no relationship concept. | It is used to establish relationships between two database objects, i.e. tables.
3 | Data security is low as compared to RDBMS. | Level of data security is very high as compared to DBMS.
4 | Data storage capacity is less as compared to RDBMS. | Data storage capacity is very high.
5 | Not easy to maintain data integrity. | Data integrity is one of the most important features of RDBMS. It can be maintained easily in RDBMS.
6 | Works better in single-user or few-user systems. | Works very efficiently and gives good performance over the network.
7 | It supports 3 rules of E.F. Codd. | It supports minimum 6 rules of E.F. Codd.
8 | Normalization process will not be present. | RDBMS fully supports normalization.
9 | e.g. FoxPro, MS-Access | e.g. SQL Server, Oracle, IBM DB2
6) Describe data abstraction with neat diagram.
Ans: Three levels of abstraction are as follows:
1) Physical level
2) Logical level
3) View level
[Diagram: Three levels of data abstraction]
Explanation:
1) Physical Level:
a) It is the lowest level of abstraction.
b) This level describes the complex low-level data structures of the database system.
c) This level is hidden from the user.
d) It defines how the data are stored.
2) Logical Level:
a) The level next to the physical level is called the logical level.
b) This level defines what data are stored in the database and what the relationships
among these data are.
c) It fully describes the structure of the entire database.
3) View Level:
a) This level is used to show only a part of the database to the user.
b) The physical and logical levels are complex, so the user should not have to
interact with the complicated database directly.
c) Therefore, different views of the database can be created so that users can interact
with the database easily.
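A view that exposes only part of a table gives a concrete picture of the view level. The sketch below assumes Python's sqlite3 module; the employee table, the employee_directory view and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical level: the full table, including the sensitive salary column.
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES (1, 'Kiran', 'Sales', 42000), (2, 'Leela', 'IT', 55000);
-- View level: only the part of the database this group of users should see.
CREATE VIEW employee_directory AS SELECT emp_id, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

The user of employee_directory never sees the salary column or how the rows are stored on disk.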
7) What are instances and schemas?
Ans)
A) Schema:
The overall design of the database is known as the schema. Database schemas are
partitioned according to the levels of abstraction.
1. Physical Schema: Used to describe the database design at the physical level. It contains
the definitions of the records held in storage and the access methods for them.
2. Logical Schema: Used to describe the database design at the conceptual level. It is the
union of the individual subschemas with additional security and integrity constraints.
3. Subschemas: Used to describe the database design at the view level. A database may have
several schemas at this level. A subschema consists of the definitions of the logical
records and the relationships between them.
B) Instance:
The collection of information stored in the database at a particular moment is called
an instance.
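The difference between the two can be shown in a few lines: the schema stays fixed while the instance changes with every update. This sketch assumes Python's sqlite3 module, where the stored table definition can be read from the sqlite_master catalog table; the course table and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

def schema(conn):
    """Return the stored definition (the design) of the course table."""
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = 'course'").fetchone()[0]

def instance(conn):
    """Return the rows held in the course table at this moment."""
    return conn.execute("SELECT code, title FROM course ORDER BY code").fetchall()

schema_before = schema(conn)
instance_t0 = instance(conn)                                   # empty at first
conn.execute("INSERT INTO course VALUES ('CS101', 'DBMS')")
instance_t1 = instance(conn)                                   # a different instance
schema_after = schema(conn)                                    # same schema as before
```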
8) Describe data independence with its type.
Ans: Data independence:
The ability to modify a schema definition at one level without affecting a schema
definition at the next higher level is called data independence.
There are two types of data independence.
1. Physical data independence
Physical data independence is the ability to change the internal level without having to
change the conceptual or external level.
2. Logical data independence
Logical data independence is the ability to change the conceptual level without having to
change the external level or the application programs.
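Both kinds of independence can be sketched with one unchanged application query. The example assumes Python's sqlite3 module; the tables, the student_info view, the index name and APP_QUERY are all hypothetical. Adding an index stands in for a physical change, and splitting the table behind a view stands in for a logical change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_base (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO student_base VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Nashik');
CREATE VIEW student_info AS SELECT roll_no, name, city FROM student_base;
""")

# The application only ever runs this query against the view.
APP_QUERY = "SELECT name, city FROM student_info WHERE city = 'Pune'"
original = conn.execute(APP_QUERY).fetchall()

# Physical change: add an access path (index). The query text stays the same.
conn.execute("CREATE INDEX idx_student_city ON student_base(city)")
after_physical_change = conn.execute(APP_QUERY).fetchall()

# Logical change: split the table in two and redefine the view over the new tables.
conn.executescript("""
CREATE TABLE student_name (roll_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student_city (roll_no INTEGER PRIMARY KEY, city TEXT);
INSERT INTO student_name SELECT roll_no, name FROM student_base;
INSERT INTO student_city SELECT roll_no, city FROM student_base;
DROP VIEW student_info;
DROP TABLE student_base;
CREATE VIEW student_info AS
    SELECT n.roll_no, n.name, c.city
    FROM student_name n JOIN student_city c ON n.roll_no = c.roll_no;
""")
after_logical_change = conn.execute(APP_QUERY).fetchall()
```

The query returns the same rows in all three cases, which is what the two definitions above require.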
9) Draw diagram for overall architecture of DBMS.
Ans:
10) What are the components of DBMS? Explain in brief.
Ans) Components of a DBMS are classified into three categories:
1. Query Processor:
a) DML Compiler: It translates DML statements of a query language into low-level
instructions that the query evaluation engine understands.
b) Embedded DML Pre-Compiler: It converts DML statements embedded in an
application program into normal procedure calls in the host language.
c) DDL Interpreter: It interprets DDL statements and records them in a set of tables
containing metadata.
d) Query Evaluation Engine: It executes the low-level instructions generated by the DML
compiler.
2. Storage Manager Components:
a) Authorization and Integrity Manager: It tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
b) Transaction Manager: It ensures that the database remains in a consistent state
despite failures and that concurrent transaction executions proceed without
conflicting.
c) File Manager: It manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
d) Buffer Manager: It is responsible for fetching data from disk storage into main
memory and deciding what data to cache in memory.
3. Disk Storage:
a) Data Files: It stores the database.
b) Data Dictionary: It stores metadata about the structure of the database.
c) Indices: Provide fast access to data items that hold particular values.
d) Statistical Data: It stores statistical information about the data in the database.
This information is used by the query processor to select efficient ways to execute a
query.
11) Data Dictionary:
Ans) A data dictionary contains data definitions, their characteristics and entity
relationships. This may include the names and descriptions of the various tables and fields
within the database, as well as the data types and lengths of data items. Overall, a
well-designed data dictionary makes it easier to build and maintain a database.
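Most systems let the data dictionary be queried like ordinary data. As one concrete case, the sketch below assumes Python's sqlite3 module, where table names are listed in sqlite_master and column details come from PRAGMA table_info; the student table is invented for the example, and other DBMSs expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (enroll_no INTEGER PRIMARY KEY, name VARCHAR(40) NOT NULL, age INTEGER)"
)

# Names of the tables recorded in the catalog.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, declared_type)
    for _cid, name, declared_type, _notnull, _default, _pk
    in conn.execute("PRAGMA table_info(student)")
]
```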
12) List and explain types of DBMS users.
Ans: Types of DBMS users:
a) Naive users
b) Application programmers
c) Sophisticated users
d) Specialized users
Explanation:
a) Naive users:
Naive users are unsophisticated users. They interact with the system through
application programs.
They give data as input through an application program or get output data that is
generated by an application program.
Example: Bank cashier.
b) Application programmers:
Application programmers are the users who write application programs. These programmers
use programming tools to develop the programs. Rapid application development (RAD) tools
can be used to write the programs.
c) Sophisticated users:
Sophisticated users interact with the system by making requests in the form of a query
language. These queries are then submitted to the query processor.
The query processor converts the DML statements into lower-level instructions that are
understood by the storage manager. Analysts are an example of sophisticated users.
d) Specialized users:
These users are not traditional. They write specialized application programs that are
not regular applications.
Example: CAD systems, knowledge-based systems and expert systems.
13) What are the functions of DBA?
Ans)
1. Schema definition
The database administrator (DBA) creates the database schema by executing DDL statements.
The schema includes the logical structure of database tables (relations), such as data
types of attributes, lengths of attributes, integrity constraints etc.
2. Storage structure and access method definition
The DBA creates appropriate storage structures and access methods by writing a set of
definitions, which are translated by the data-storage and DDL compiler.
3. Schema and physical organization modification
The DBA writes a set of definitions to modify the database schema or the description of
the physical storage organization.
4. Granting authorization for data access
The DBA grants different access rights to users according to their level. Ordinary
users might have highly restricted access to data, while users higher in the hierarchy, up
to the administrator, get more access rights.
5. Integrity constraint specification
Integrity constraints are written by the DBA and stored in a special file, which is
accessed by the database manager whenever data is updated.
6. Routine maintenance
Some of the routine maintenance activities of a DBA are given below.
a) Taking backups of the database periodically.
b) Ensuring that enough disk space is available at all times.
c) Monitoring jobs running on the database.
d) Ensuring that performance is not degraded by expensive tasks submitted by some
users.
14) State the meaning of client server architecture. State the role of server.
Ans:
1. Computer networking allows some tasks to be executed on a server system and
some tasks on a client system. This led to the development of client-server
architecture. The clients are the machines that request services from the
server. The server is the machine that serves the clients.
2. There are different types of client/server architecture, such as two-tier and three-tier
architecture.
3. Role of server: The server is the machine that provides services to the client
machines, such as file access, printing, and database access. It manages
the database tables optimally among multiple clients who concurrently request
the same data from the server.
15) Explain two tier architecture with diagram
Ans)
1. In a two-tier architecture, the application is partitioned into a component that
resides at the client machine, which invokes database system functionality at the
server machine through query language statements.
2. Application program interface standards like ODBC and JDBC are used for
interaction between the client and the server.
3. Two-tier architecture is intended to improve usability by supporting a form-based,
user-friendly interface.
16) Explain three tier architecture with diagram.
Ans)
1. In a three-tier architecture, the client machine acts as merely a front end and does
not contain any direct database calls. Instead, the client end communicates with an
application server, usually through a forms interface.
2. The application server in turn communicates with a database system to access
data. The business logic of the application, which says what actions to carry out
under what conditions, is embedded in the application server, instead of being
distributed across multiple clients.
3. In a three-tier architecture, communication takes place from the client to the
application server, and then from the application server to the database system to access
the data.
4. The application server or web server is sometimes called the middle layer or
intermediate layer. The middle layer processes the application logic, and the database
server processes the queries.
5. This type of architecture is used in large applications and World Wide Web
applications. On the WWW, clients request data and servers serve it.
6. Multiple servers may be used, such as fax servers, proxy servers, mail servers etc.
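The division of work among the three tiers can be sketched in a single process. This is only an illustration, assuming Python's sqlite3 module as the database tier; the ApplicationServer class, the client_form_submit function and the minimum-balance rule are hypothetical, and a real deployment would place the tiers on separate machines.

```python
import sqlite3

# Tier 3: the database system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 9000)")
db.commit()

class ApplicationServer:
    """Tier 2: holds the business logic and is the only tier that issues SQL."""
    MIN_BALANCE = 5000  # illustrative business rule

    def __init__(self, conn):
        self._conn = conn

    def withdraw(self, acc_no, amount):
        row = self._conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
        ).fetchone()
        if row is None:
            return "unknown account"
        if row[0] - amount < self.MIN_BALANCE:
            return "rejected"
        with self._conn:  # commit the update as one transaction
            self._conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, acc_no)
            )
        return "ok"

def client_form_submit(server, acc_no, amount):
    """Tier 1: the front end. It contains no database calls at all."""
    return server.withdraw(acc_no, amount)

server = ApplicationServer(db)
first = client_form_submit(server, 1, 3000)    # 9000 -> 6000, allowed
second = client_form_submit(server, 1, 3000)   # would leave 3000, refused
final_balance = db.execute("SELECT balance FROM account WHERE acc_no = 1").fetchone()[0]
```

Because the rule sits in the application server, changing it does not require touching any client.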
17) Explain distributed database with advantages and disadvantages?
A distributed database appears to a user as a single database but is, in fact, a set of
databases stored on multiple computers. The data on several computers can be
simultaneously accessed and modified using a network.
Each database server in the distributed database is controlled by its local DBMS, and each
cooperates to maintain the consistency of the global database.
The distribution of data and applications has potential advantages over traditional
centralized database systems. Unfortunately, a distributed DBMS (DDBMS) also has
disadvantages.
The advantages of a DDBMS are as follows:
1. Reflects organizational structure
a. Many organizations are naturally distributed over several locations. For example, a
bank has many offices in different cities. It is natural for databases used in such an
application to be distributed over these locations.
b. A bank may keep a database at each branch office containing details such as
the staff that work at that location, the account information of customers etc.
c. The staff at a branch office will make local inquiries of the database. The company
headquarters may wish to make global inquiries involving access to data at all
or a number of branches.
2. Improved shareability and local autonomy
a. The geographical distribution of an organization can be reflected in the distribution
of the data; users at one site can access data stored at other sites.
b. Data can be placed at the site close to the users who normally use that data. In this
way, users have local control of the data, and they can consequently establish and
enforce local policies regarding the use of this data.
c. A global database administrator (DBA) is responsible for the entire system.
Generally, part of this responsibility is assigned to the local level, so that the local
DBA can manage the local DBMS.
3. Improved availability
a. In a centralized DBMS, a computer failure terminates the applications of the
DBMS.
b. However, a failure at one site of a DDBMS, or a failure of a communication link
making some sites inaccessible, does not make the entire system inoperable.
c. Distributed DBMSs are designed to continue to function despite such failures. If a
single node fails, the system may be able to reroute the failed node's requests to
another site.
4. Improved reliability
a. As data may be replicated so that it exists at more than one site, the failure of a
node or a communication link does not necessarily make the data inaccessible.
5. Improved performance
a. As the data is located near the site of 'greatest demand', and given the inherent
parallelism of distributed DBMSs, speed of database access may be better than that
achievable from a remote centralized database.
b. Furthermore, since each site handles only a part of the entire database, there may
not be the same contention for CPU and I/O services as characterized by a
centralized DBMS.
6. Economics
a. It is now generally accepted that it costs much less to create a system of smaller
computers with the equivalent power of a single large computer.
b. This makes it more cost-effective for corporate divisions and departments to obtain
separate computers.
c. It is also much more cost-effective to add workstations to a network than to update
a mainframe system.
d. The second potential cost saving occurs where databases are geographically remote
and the applications require access to distributed data.
e. In such cases, owing to the relative expense of data being transmitted across
the network as opposed to the cost of local access, it may be much more economical
to partition the application and perform the processing locally at each site.
7. Modular growth
a. In a distributed environment, it is much easier to handle expansion. New sites can
be added to the network without affecting the operations of other sites. This
flexibility allows an organization to expand relatively easily.
b. Adding processing and storage power to the network can usually handle
the increase in database size.
c. In a centralized DBMS, growth may entail changes to both hardware (the
procurement of a more powerful system) and software (the procurement of a more
powerful or more configurable DBMS).
The disadvantages of a DDBMS are as follows:
1. Complexity
a. A distributed DBMS that hides the distributed nature from the user and provides an
acceptable level of performance, reliability and availability is inherently more complex
than a centralized DBMS.
b. The fact that data can be replicated also adds an extra level of complexity to the
distributed DBMS.
2. Cost
Increased complexity means that we can expect the procurement and maintenance costs
for a DDBMS to be higher than those for a centralized DBMS. Furthermore, a distributed
DBMS requires additional hardware to establish a network between sites.
3. Security
a. In a centralized system, access to the data can be easily controlled.
b. However, in a distributed DBMS not only does access to replicated data have to be
controlled in multiple locations, but the network itself also has to be made secure.
4. Integrity control more difficult
a. Database integrity refers to the validity and consistency of stored data.
b. Integrity is usually expressed in terms of constraints, which are consistency rules
that the database is not permitted to violate.
c. Enforcing integrity constraints generally requires access to a large amount of data
that defines the constraints.
d. In a distributed DBMS, the communication and processing costs that are required
to enforce integrity constraints are high compared to a centralized system.
5. Lack of standards
a. Although distributed DBMSs depend on effective communication, we are only
now starting to see the appearance of standard communication and data access
protocols.
b. This lack of standards has significantly limited the potential of distributed
DBMSs.
c. There are also no tools or methodologies to help users convert a centralized DBMS
into a distributed DBMS.
6. Lack of experience
a. General-purpose distributed DBMSs have not been widely accepted,
although many of the protocols and problems are well understood.
b. Consequently, we do not yet have the same level of experience in industry as we
have with centralized DBMSs.
c. For a prospective adopter of this technology, this may be a significant deterrent.
7. Database design more complex
Besides the normal difficulties of designing a centralized database, the design of a
distributed database has to take account of fragmentation of data, allocation of
fragments to specific sites, and data replication.
18) List and explain Codd’s 12 rules.
Ans) Dr. Edgar F. Codd, after his extensive research on the relational model of database
systems, came up with twelve rules of his own which, according to him, a database must
obey in order to be regarded as a true relational database.
These rules apply to any database system that manages its stored data using only its
relational capabilities. This requirement is the foundation rule (Rule 0), which acts as a
base for all the other rules.
The rules are as follows:
Rule 1: Information Rule
The data stored in a database, whether user data or metadata, must be a value of some
table cell. Everything in a database must be stored in a table format.
Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a
combination of table-name, primary-key (row value), and attribute-name (column value).
No other means, such as pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is
a very important rule because a NULL can be interpreted as one of the following: data is
missing, data is not known, or data is not applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known
as the data dictionary, which can be accessed by authorized users. Users can use the same
query language to access the catalog that they use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can
be used directly or by means of some application. If the database allows access to data
without any help of this language, then it is considered a violation.
Rule 6: View Updating Rule
All the views of a database which can theoretically be updated must also be updatable by
the system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, updating and deletion. This must not be
limited to a single row, that is, it must also support union, intersection and minus
operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the
database. Any change in the physical structure of a database must not have any impact on
how the data is being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its user’s view (application). Any
change in logical data must not affect the applications using it. For example, if two tables
are merged or one is split into two different tables, there should be no impact or change on
the user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints
can be independently modified without the need of any change in the application. This
rule makes a database independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations.
Users should always get the impression that the data is located at one site only. This rule
has been regarded as the foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface
must not be able to subvert the system and bypass security and integrity constraints.
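As an illustration of Rule 3, SQL systems treat NULL uniformly as "no value" rather than as an ordinary value such as zero or an empty string. The sketch below assumes Python's sqlite3 module; the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT);
INSERT INTO student VALUES (1, 'Asha', '98220'), (2, 'Ravi', NULL);
""")

# Comparing with NULL using '=' is never true, so no row matches.
matched_with_equals = conn.execute(
    "SELECT COUNT(*) FROM student WHERE phone = NULL"
).fetchone()[0]

# IS NULL is the systematic way to ask for missing values.
missing_phone = conn.execute("SELECT name FROM student WHERE phone IS NULL").fetchall()

# COUNT(*) counts rows; COUNT(phone) skips rows where phone is NULL.
row_count, phone_count = conn.execute("SELECT COUNT(*), COUNT(phone) FROM student").fetchone()
```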
19) Explain data warehouse, data mining. List four features of data mining.
Ans:
Data Warehousing:
a. A data warehouse is a repository of information gathered from multiple sources,
stored under a unified schema, at a single site.
b. Once gathered, data are stored for a long time, permitting access to historical data.
c. Data warehouses provide the user a single consolidated interface to data, making
decision-support queries easier to write.
d. Moreover, by accessing information for decision support from a data warehouse,
decision makers ensure that online transaction-processing systems are not
affected by the decision-support workload.
Data Mining:
a. Data mining is the exploration and analysis of large quantities of data in order to
discover valid, novel, potentially useful and ultimately understandable patterns in
data. It is also known as “Knowledge Discovery in Databases”. When data is stored
in large quantities in a data warehouse, it is necessary to dig out of the warehouse
the data that is useful and required for further use.
b. For data mining, different software tools are used to analyze, filter and transfer the
data from the data warehouses.
Features of data mining:
1) Prediction
2) Identification
3) Classification
4) Optimization