BScIT – Semester 1
BT0066 – Database Management System – 3 Credits
Assignment Set – 1
Answer all questions:
1. Differentiate between physical data independence and logical
data independence.
A: Data independence is usually considered from two points of view:
physical data independence and logical data independence.
a) Physical data independence allows changes in the physical
storage devices or organization of the files to be made without
requiring changes in the conceptual view or any of the external
views and hence in the application programs using the
database. Thus, the file may migrate from one type of physical
media to another or the file structure may change without any
need for changes in the application programs.
b) Logical data independence implies that application programs
need not be changed if fields are added to an existing
record; nor do they have to be changed if fields not used by
application programs are deleted. Logical data independence
indicates that the conceptual schema can be changed without
affecting the existing external schemas. Data independence is
advantageous because it allows changes at one level of the
database without affecting other levels.
These changes are absorbed by the mappings between the
levels. Logical data independence is more difficult to achieve
than physical data independence, since application programs are
heavily dependent on the logical structure of the data they
access.
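The idea can be tried out in a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as the DBMS, and the table, view and column names are made up for illustration; the assignment itself does not name any system. A view plays the role of the external schema, and the application program only ever queries the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: a base table (all names here are illustrative).
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Sales')")

# External level: the view that the application program queries.
conn.execute("CREATE VIEW employee_names AS SELECT id, name FROM employee")


def application_query(connection):
    """Stands in for an application program written against the view."""
    return connection.execute(
        "SELECT id, name FROM employee_names ORDER BY id"
    ).fetchall()


before = application_query(conn)

# Logical change: a field is added to the existing record type.
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")

# Physical change: an index is added, which alters the access method only.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")

after = application_query(conn)
print(before == after)  # True
```

The application function is not edited at any point, yet it returns the same rows before and after both changes, because the view definition absorbs the difference between the levels.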
2. Explain the three level architecture of DBMS.
A: The three levels of the DBMS architecture are:
a) External level or Subschema: The external level is the highest level
of database abstraction where only those portions of the
database of concern to a user or application program are
included. Any number of users may exist for a given global or
conceptual view. Each external view is described by means of a
schema called an external schema or subschema. The external
schema consists of the definition of the logical records and the
relationships in the external view.
b) Conceptual level or Conceptual Schema: At this level of
database abstraction all the database entities and the
relationships among them are included. One conceptual view
represents the entire database. This conceptual view is defined by
the conceptual schema. It describes all the records and
relationships included in the conceptual view and, therefore, in
the database. There is only one conceptual schema per
database. This schema also contains the method of deriving the
objects in the conceptual view from the objects in the internal
view. The description of data at this level is in a format
independent of its physical representation. It also includes
features that specify the checks to retain data consistency and
integrity.
c) Internal level or Physical Schema: We find this view at the lowest
level of abstraction, closest to the physical storage method used.
It indicates how the data will be stored and describes the data
structures and access methods to be used by the database. The
internal view is expressed by the internal schema, which contains
the definition of the stored record, the method of representing the
data fields, and the access aids used.
Figure: The three-level architecture for a DBMS. External Schemas 1, 2 and 3 sit above a single Conceptual Schema, which sits above the Physical Schema and the disk.
3. Explain the distinction among the terms primary key, candidate
key and super key.
A: The distinctions are as follows:
Super key - A super key is a set of one or more attributes that, taken
collectively, allow us to uniquely identify a tuple in the relation. E.g. the
customer-id attribute of the relation customer is sufficient to
distinguish one customer tuple from another. Thus customer-id is a
super key.
Candidate key - Super keys for which no proper subset is a super key
are called candidate keys. Several distinct sets of attributes can
serve as candidate keys. E.g. suppose the combination
{customer-name, customer-street} uniquely identifies a tuple in the
customer relation, and {customer-id} also identifies a tuple uniquely.
Then both are candidate keys. The combination
{customer-name, customer-id} is a super key but not a candidate key,
because its proper subset {customer-id} is already a super key.
Primary key - It is a candidate key that is chosen by the database
designer as the principal means of identifying tuples within a relation.
The primary key should be chosen such that its values are never, or
very rarely, changed. E.g. the column containing department numbers
in the S_DEPT table is created as a primary key and therefore every
department number is different.
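The definitions of super key and candidate key can be checked mechanically against sample data. The following is a minimal sketch; the function names and the customer rows are hypothetical. Note that a test on one relation instance can only show that a set of attributes behaves like a key for that data; a real key is a constraint declared by the designer for all possible instances.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """A super key with no non-empty proper subset that is also a super key."""
    if not is_super_key(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_super_key(rows, subset):
                return False
    return True


# Hypothetical instance of the customer relation.
customers = [
    {"customer-id": 1, "customer-name": "Hayes", "customer-street": "Main"},
    {"customer-id": 2, "customer-name": "Hayes", "customer-street": "Park"},
    {"customer-id": 3, "customer-name": "Jones", "customer-street": "Main"},
]

print(is_candidate_key(customers, ["customer-id"]))                       # True
print(is_candidate_key(customers, ["customer-name", "customer-street"]))  # True
print(is_super_key(customers, ["customer-name", "customer-id"]))          # True
print(is_candidate_key(customers, ["customer-name", "customer-id"]))      # False
```

The last two lines mirror the example above: {customer-name, customer-id} identifies every tuple, but it is not minimal because {customer-id} alone already does.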
4. Explain the various storage devices and their characteristics.
A: Several types of data storage exist in most computer systems.
These storage media are classified by the speed with which the data
can be accessed, by the cost per unit of data to buy the medium,
and by the medium’s reliability. Among the media typically
available are these:
a) Cache:- The cache is the fastest and most costly form of storage.
Cache memory is small; its use is managed by the computer
system hardware.
b) Main Memory:- The storage medium used for data that are
available to be operated on is main memory. The general
purpose machine instructions operate on main memory. Although
main memory may contain many megabytes of data, or even
gigabytes of data in large server systems, it is generally too small
(or too expensive) for storing the entire database. The contents of
the main memory are usually lost if a power failure or system crash
occurs.
c) Flash memory:- Also known as electrically erasable
programmable read-only memory (EEPROM), flash memory differs
from main memory in that data survive power failure. Reading
data from flash memory takes less than 100 nanoseconds (a
nanosecond is 1/1000 of a microsecond), which is roughly as fast
as reading data from main memory.
d) Magnetic-disk storage:- The primary medium for the long-term online storage of data is the magnetic disk. Usually, the entire
database is stored on magnetic disk. The system must move the
data from disk to main memory, so that they can be accessed.
After the system has performed the designated operations, the
data that have been modified must be written to disk.
e) Optical storage:- The most popular forms of optical storage are
the compact disks (CD), which can hold about 640 megabytes of
data, and the digital video disk (DVD) which can hold 4.7 or 8.5
gigabytes of data per side of the disk (or up to 17 gigabytes on a
two-sided disk). Data are stored optically on a disk, and are read
by a laser. The optical disks used in read-only compact disks (CD-ROM) or read-only digital video disks (DVD-ROM) cannot be
written, but are supplied with data pre-recorded.
f) Tape storage:- Tape storage is used primarily for backup and
archival data. Although magnetic tape is much cheaper than
disks, access to data is much slower, because the tape must be
accessed sequentially from the beginning. For this reason, tape
storage is referred to as sequential-access storage. In contrast,
disk storage is referred to as direct-access storage because it is
possible to read data from any location on disk.
The fastest storage media – for example, cache and main memory –
are referred to as primary storage. The media in the next level in the
hierarchy – for example, magnetic disks – are referred to as
secondary storage, or online storage. The media in the lowest level in
the hierarchy – for example, magnetic tape and optical-disk
jukeboxes – are referred to as tertiary storage, or offline storage.
In addition to the speed and cost of the various storage systems,
there is also the issue of storage volatility. Volatile storage loses its
contents when the power to the device is removed. In the hierarchy
shown in Figure 4.1, the storage systems from main memory up are
volatile, whereas the storage systems below main memory are
nonvolatile. In the absence of expensive battery and generator
backup systems, data must be written to non-volatile storage for safe
keeping.
5. What are the benefits of making the system catalogs relations?
A: We can store a relation using one of several alternative file
structures, and we can create one or more indexes, each stored as a
file, on every relation. Conversely, in a relational DBMS, every file
contains either the tuples in a relation or the entries in an index. The
collection of files corresponding to users’ relations and indexes
represents the data in the database.
A fundamental property of a database system is that it maintains a
description of all the data that it contains. A relational DBMS
maintains information about every relation and index that it
contains. The DBMS also maintains information about views, for
which no tuples are stored explicitly; rather, a definition of the view is
stored and used to compute the tuples that belong in the view when
the view is queried. This information is stored in a collection of
relations, maintained by the system, called the Catalog relations.
The catalog relations are also called the System catalog, the
catalog, or the Data dictionary. The system catalog is sometimes
referred to as Metadata; that is, not data, but descriptive
information about the data. The information in the system catalog is
used extensively for query optimization.
There are several advantages to storing the system catalogs as
relations. Relational system catalogs take advantage of all of the
implementation and management benefits of relational tables:
effective information storage and rich querying capabilities. The
choice of what system catalogs to maintain is left to the DBMS
implementer.
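As an illustration of the querying benefit, the sketch below uses SQLite, which is an assumption made here and not a system named in the assignment. SQLite keeps its catalog in a table called sqlite_master, and that table can be queried with ordinary SQL. The sailors table, index and view are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# User data: one relation, one index on it, and one view (illustrative names).
conn.execute("CREATE TABLE sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER)")
conn.execute("CREATE INDEX idx_sailors_rating ON sailors (rating)")
conn.execute("CREATE VIEW top_sailors AS SELECT sid, sname FROM sailors WHERE rating > 8")

# The catalog is itself a relation, so the same query language applies to it.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
).fetchall()

for entry_type, name in catalog:
    print(entry_type, name)
```

No special catalog API is needed: selection, projection and ordering work on the metadata exactly as they do on user relations. The view appears in the catalog as a stored definition, with no tuples of its own.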
6. Explain the statement that relational algebra operators can be
composed. Why is the ability to compose operators important?
A: Every operator in relational algebra accepts one or more relation
instances as arguments, and the result is always a relation instance.
So the argument of one operator can be the result of another
operator. This is important because it makes it easy to write
complex queries by simply composing the relational algebra
operators.
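A small sketch makes the point concrete. Here a relation is modelled as a list of dictionaries, and select and project are hypothetical helper functions written for this example; the sailor rows are sample data only. Because both functions take a relation and return a relation, the output of one can be passed straight into the other.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Sample relation instance (illustrative data).
sailors = [
    {"sid": 22, "sname": "Dustin", "rating": 7},
    {"sid": 31, "sname": "Lubber", "rating": 8},
    {"sid": 58, "sname": "Rusty", "rating": 10},
]

# Composition: the result of select is the argument of project.
names = project(select(sailors, lambda r: r["rating"] > 7), ["sname"])
print(names)  # [{'sname': 'Lubber'}, {'sname': 'Rusty'}]
```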
7. What is an unsafe query? Give an example and explain why it is
important to disallow such queries.
A: An unsafe query is a query in relational calculus that has an
infinite number of results. An example of such a query is:
{S | ¬(S ∈ Sailors)}
The query is for all things that are not sailors which of course is
everything else. Clearly there is an infinite number of answers, and
this query is unsafe. It is important to disallow unsafe queries because
we want to be able to get back to users with a list of all the answers
to a query after a finite amount of time.
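The problem can be imitated in a short sketch, assuming for illustration that the domain is the non-negative integers and that Sailors is a hypothetical set of sailor ids. Enumerating the complement of Sailors never finishes, so the program can only ever take a finite prefix of the answer.

```python
from itertools import count, islice

sailors = {22, 31, 58}  # hypothetical sailor ids


def not_sailors():
    """Enumerate every integer that is not in sailors; this never terminates."""
    for candidate in count():
        if candidate not in sailors:
            yield candidate


# Asking for the whole answer would loop forever, so only a prefix is taken.
first_five = list(islice(not_sailors(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

A safe query, by contrast, only ranges over values that appear in the database, so its answer can be listed completely.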
8. Define the term functional dependency.
A: A functional dependency describes a relationship between
attributes in a single relation. An attribute is functionally dependent
on another if we can use the value of one attribute to determine the
value of another.
E.g. Employee_Name is functionally dependent on
Social_Security_Number because Social_Security_Number can be
used to determine the value of Employee_Name.
We use the symbol -> to indicate a functional dependency. -> is
read as "functionally determines".
Student_ID -> Student_Major
Student_ID, Course#, Semester# -> Grade
SKU -> Compact_Disk_Title, Artist
Model, Options, Tax -> Car_Price
Course_Number, Section -> Professor, Classroom, Number of Students
The attributes listed on the left hand side of the -> are called
determinants. One can read A -> B as, "A determines B".
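Whether a functional dependency is violated by a given set of rows can be tested directly from the definition. The sketch below is illustrative only: fd_holds is a hypothetical helper and the student rows are made-up data. A check of this kind can refute a dependency, but it cannot prove one, because a dependency is a statement about every legal instance of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes always agree on the rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this key fixes the expected value for the rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up rows for illustration.
students = [
    {"Student_ID": 1, "Student_Major": "IT", "Course": "BT0066", "Grade": "A"},
    {"Student_ID": 1, "Student_Major": "IT", "Course": "BT0067", "Grade": "B"},
    {"Student_ID": 2, "Student_Major": "Math", "Course": "BT0066", "Grade": "A"},
]

print(fd_holds(students, ["Student_ID"], ["Student_Major"]))    # True
print(fd_holds(students, ["Student_ID"], ["Grade"]))            # False
print(fd_holds(students, ["Student_ID", "Course"], ["Grade"]))  # True
```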
9. Discuss the relative advantages of centralized and distributed
database.
A: Advantages of Distributed Systems over Centralized ones are:
a) Incremental growth: Computing power can be added in small
increments
b) Reliability: If one machine crashes, the system as a whole can still
survive
c) Speed: A distributed system may have more total computing
power than a mainframe.
d) Open system: This is the most important point and the most
characteristic point of a distributed system. Since it is an open
system it is always ready to communicate with other systems. An
open system that scales has an advantage over a perfectly
closed and self-contained system.
e) Economics: Microprocessors offer a better
price/performance ratio than mainframes.
Disadvantages of distributed systems compared with centralized ones are:
a) Security: Distributed systems have an
inherent security issue.
b) Networking: If the network gets saturated then problems with
transmission will surface.
c) Software: There is currently less software support for
distributed systems.
d) Troubleshooting: Troubleshooting and diagnosing problems in a
distributed system can also become more difficult, because the
analysis may require connecting to remote nodes or inspecting
communication between nodes.
10. List a few requirements for multimedia data management.
A: The goal of a multimedia database management system
(MMDBMS) is to provide a suitable environment for using and
managing multimedia database information. Hence, it must include
the traditional DBMS functions (e.g., database definition and
creation, data retrieval, data access and organization, data
independence, privacy, integration, integrity control, version control
and concurrency support) but applied to various multimedia data
types. The functional requirements imposed on an MMDBMS can be
grouped into two categories [DATAPRO]: data representation
requirements and data manipulation requirements.
a) Support for Generalization/Specialization Hierarchy: A major
requirement that is imposed on multimedia applications is the
support for generalization/specialization hierarchy. This
hierarchy is used to define the type, subtype, and instance
relationships between the various entities. E.g. all the
documents could be grouped into a type called “document
type”. If a new document is created, then it is made an
instance of this type. Certain documents could have some
special properties. E.g. a book document could have
properties, which are different from a newspaper document.
Therefore, the collection of books could be grouped into a
type called “book document type”. Since a book is also a
document, the type “book document type” is made a subtype
of the type “document type”. Support for
generalization/specialization hierarchy also facilitates schema
evolution.
b) Attribute Specification: Support for specifying the properties of
a document (such as its title, author, font size, etc.) should be
provided. These properties are also called the attributes of a
document.
c) Specifications of Operations: Another requirement is the ability
to specify the operations that can be performed on a
multimedia document. E.g. it should be possible to change the
font size of the document, change the contents of a
document, retrieve the contents of the document, etc.
d) Support for Composite Objects: A major requirement for
modeling multimedia applications is the support for composite
objects. E.g. a document may be composed of front matter,
body, and back matter. The front matter may be in turn
composed of cover page, abstract, preface,
acknowledgments, and table of contents. The cover page
may consist of title, authors, organization, date of publication,
and sponsors, etc.
e) Object Sharing: Object sharing is the capability for different
documents to share parts of their contents. Such a capability is
especially necessary for multimedia documents as the amounts
of storage space required to store a document might be quite
large. It should be possible to represent the fact that different
documents share portions of their contents.
f) Ordering of Documents: The presentation of the paragraphs,
images, and drawings of a multimedia document could
depend on the users accessing the document. Usually
constraints are imposed on the presentation of the document.
g) Support for Multimedia Data: This major requirement includes
extensibility, where new multimedia devices as well as new
functions on multimedia information can be incorporated with
ease.
h) Data Independence: The database and its management
functions must be separated from the application programs.
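Several of the data representation requirements above (type and subtype, attributes, composite objects and object sharing) can be pictured with a small object model. This is only a sketch under assumed names: the classes Component, Document and BookDocument and all sample values are hypothetical and do not come from any particular MMDBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A part of a document; parts may contain further parts (composite object)."""
    title: str
    parts: list = field(default_factory=list)

    def count_parts(self):
        """Count this component plus everything nested inside it."""
        return 1 + sum(p.count_parts() for p in self.parts)


@dataclass
class Document:
    """The general 'document type' with its attributes."""
    title: str
    author: str
    parts: list = field(default_factory=list)


@dataclass
class BookDocument(Document):
    """A subtype of Document with an extra property of its own."""
    isbn: str = ""


# One figure object that is shared by two documents (object sharing).
shared_figure = Component("Figure 1")

front = Component("Front matter", [Component("Cover page"), Component("Abstract")])
report = Document("Report", "A. Author", [front, Component("Body", [shared_figure])])
book = BookDocument("Book", "A. Author", [Component("Body", [shared_figure])], isbn="n/a")

print(isinstance(book, Document))                          # True
print(front.count_parts())                                 # 3
print(report.parts[1].parts[0] is book.parts[0].parts[0])  # True
```

The shared figure is stored once and referenced from both documents, which is the storage saving that the object sharing requirement is about.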