BScIT – Semester 1
BT0066 – Database Management System – 3 Credits
Assignment Set – 1
Answer all questions:
1. Differentiate between physical data independence and logical data independence.
A: Data independence is usually considered from two points of view: physical data independence and logical data independence.
a) Physical data independence allows changes in the physical storage devices or in the organization of the files to be made without requiring changes in the conceptual view or any of the external views, and hence in the application programs using the database. Thus, a file may migrate from one type of physical media to another, or the file structure may change, without any need for changes in the application programs.
b) Logical data independence implies that application programs need not be changed if fields are added to an existing record, nor do they have to be changed if fields not used by the application programs are deleted. Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas.
Data independence is advantageous in the database environment because it allows changes to be made at one level of the database without affecting the other levels. These changes are absorbed by the mappings between the levels. Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data they access.
2. Explain the three-level architecture of DBMS.
A: The three levels of the DBMS architecture are:
a) External level or Subschema: The external level is the highest level of database abstraction, where only those portions of the database of concern to a user or application program are included. Any number of user views may exist for a given global or conceptual view. Each external view is described by means of a schema called an external schema or subschema. The external schema consists of the definition of the logical records and the relationships in the external view.
b) Conceptual level or Conceptual Schema: At this level of database abstraction, all the database entities and the relationships among them are included. One conceptual view represents the entire database. This conceptual view is defined by the conceptual schema, which describes all the records and relationships included in the conceptual view and, therefore, in the database. There is only one conceptual schema per database. This schema also contains the method of deriving the objects in the conceptual view from the objects in the internal view. The description of data at this level is in a format independent of its physical representation. It also includes features that specify the checks needed to retain data consistency and integrity.
c) Internal level or Physical Schema: This view is found at the lowest level of abstraction, closest to the physical storage method used. It indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal view is expressed by the internal schema, which contains the definition of the stored record, the method of representing the data fields, and the access aids used.
Figure: The three-level architecture for a DBMS (External Schema 1, External Schema 2, External Schema 3, Conceptual Schema, Disk).
3. Explain the distinction among the terms primary key, candidate key and super key.
A: The distinctions are as follows:
Super key: A super key is a set of one or more attributes that, taken collectively, allow us to uniquely identify a tuple in the relation. E.g., the customer-id attribute of the relation customer is sufficient to distinguish one customer tuple from another. Thus customer-id is a super key.
Candidate key: Super keys for which no proper subset is a super key are called candidate keys. Several distinct sets of attributes can serve as a candidate key. E.g., suppose the combination {customer-name, customer-street} uniquely identifies a tuple in the customer relation while neither attribute alone does, and {customer-id} also identifies a tuple uniquely. Then both {customer-name, customer-street} and {customer-id} are candidate keys. The combination {customer-name, customer-id}, although a super key, is not a candidate key, because customer-id alone is already a candidate key.
Primary key: A primary key is a candidate key that is chosen by the database designer as the principal means of identifying tuples within a relation. The primary key should be chosen such that its values are never, or only very rarely, changed. E.g., the column containing department numbers in the S_DEPT table is created as a primary key, and therefore every department number is different.
4. Explain the various storage devices and their characteristics.
A: Several types of data storage exist in most computer systems. These storage media are classified by the speed with which the data can be accessed, by the cost per unit of data to buy the medium, and by the medium's reliability. Among the media typically available are these:
a) Cache: The cache is the fastest and most costly form of storage. Cache memory is small; its use is managed by the computer system hardware.
b) Main memory: The storage medium used for data that are available to be operated on is main memory. The general-purpose machine instructions operate on main memory. Although main memory may contain many megabytes of data, or even gigabytes of data in large server systems, it is generally too small (or too expensive) for storing the entire database. The contents of main memory are usually lost if a power failure or system crash occurs.
c) Flash memory: Also known as electrically erasable programmable read-only memory (EEPROM), flash memory differs from main memory in that data survive power failure. Reading data from flash memory takes less than 100 nanoseconds (a nanosecond is 1/1000 of a microsecond), which is roughly as fast as reading data from main memory.
d) Magnetic-disk storage: The primary medium for the long-term online storage of data is the magnetic disk. Usually, the entire database is stored on magnetic disk. The system must move the data from disk to main memory so that they can be accessed. After the system has performed the designated operations, the data that have been modified must be written back to disk.
e) Optical storage: The most popular forms of optical storage are the compact disk (CD), which can hold about 640 megabytes of data, and the digital video disk (DVD), which can hold 4.7 or 8.5 gigabytes of data per side of the disk (or up to 17 gigabytes on a two-sided disk). Data are stored optically on a disk and are read by a laser. The optical disks used in read-only compact disks (CD-ROM) or read-only digital video disks (DVD-ROM) cannot be written, but are supplied with data pre-recorded.
f) Tape storage: Tape storage is used primarily for backup and archival data. Although magnetic tape is much cheaper than disks, access to data is much slower, because the tape must be accessed sequentially from the beginning. For this reason, tape storage is referred to as sequential-access storage. In contrast, disk storage is referred to as direct-access storage, because it is possible to read data from any location on disk.
The fastest storage media, for example cache and main memory, are referred to as primary storage. The media in the next level in the hierarchy, for example magnetic disks, are referred to as secondary storage, or online storage. The media in the lowest level in the hierarchy, for example magnetic tape and optical-disk jukeboxes, are referred to as tertiary storage, or offline storage.
In addition to the speed and cost of the various storage systems, there is also the issue of storage volatility. Volatile storage loses its contents when the power to the device is removed.
In the hierarchy shown in Figure 4.1, the storage systems from main memory up are volatile, whereas the storage systems below main memory are nonvolatile. In the absence of expensive battery and generator backup systems, data must be written to nonvolatile storage for safekeeping.
5. What are the benefits of making the system catalogs relations?
A: We can store a relation using one of several alternative file structures, and we can create one or more indexes on every relation, each stored as a file. Conversely, in a relational DBMS, every file contains either the tuples in a relation or the entries in an index. The collection of files corresponding to users' relations and indexes represents the data in the database.
A fundamental property of a database system is that it maintains a description of all the data that it contains. A relational DBMS maintains information about every relation and index that it contains. The DBMS also maintains information about views, for which no tuples are stored explicitly; rather, a definition of the view is stored and used to compute the tuples that belong in the view when the view is queried. This information is stored in a collection of relations, maintained by the system, called the catalog relations. The catalog relations are also called the system catalog, the catalog, or the data dictionary. The system catalog is sometimes referred to as metadata; that is, not data, but descriptive information about the data. The information in the system catalog is used extensively for query optimization.
There are several advantages to storing the system catalogs as relations. Relational system catalogs take advantage of all of the implementation and management benefits of relational tables: effective information storage and rich querying capabilities. For example, both users and the DBMS itself can query the catalog with the same query language that is used for ordinary relations. The choice of what system catalogs to maintain is left to the DBMS implementer.
6. Explain the statement that relational algebra operators can be composed.
Why is the ability to compose operators important?
A: Every operator in relational algebra accepts one or more relation instances as arguments, and the result is always a relation instance. So the argument of one operator can be the result of another operator. This is important because it makes it easy to write complex queries by simply composing the relational algebra operators.
7. What is an unsafe query? Give an example and explain why it is important to disallow such queries.
A: An unsafe query is a query in relational calculus that has an infinite number of results. An example of such a query is:
$\{S \mid \neg(S \in \mathit{Sailors})\}$
The query asks for all things that are not sailors, which of course is everything else. Clearly there is an infinite number of answers, and this query is unsafe. It is important to disallow unsafe queries because we want to be able to return to users a list of all the answers to a query after a finite amount of time.
8. Define the term functional dependency.
A: A functional dependency describes a relationship between attributes in a single relation. An attribute B is functionally dependent on an attribute A if each value of A is associated with exactly one value of B, so that we can use the value of A to determine the value of B. E.g., Employee_Name is functionally dependent on Social_Security_Number because Social_Security_Number can be used to determine the value of Employee_Name.
We use the symbol -> to indicate a functional dependency; -> is read "functionally determines". Examples:
Student_ID -> Student_Major
Student_ID, Course#, Semester# -> Grade
SKU -> Compact_Disk_Title, Artist
Model, Options, Tax -> Car_Price
Course_Number, Section -> Professor, Classroom, Number of Students
The attributes listed on the left-hand side of the -> are called determinants. One can read A -> B as "A determines B".
9. Discuss the relative advantages of centralized and distributed databases.
A: Advantages of distributed systems over centralized ones are:
a) Incremental growth: Computing power can be added in small increments.
b) Reliability: If one machine crashes, the system as a whole can still survive.
c) Speed: A distributed system may have more total computing power than a mainframe.
d) Open system: This is the most important and most characteristic point of a distributed system. Since it is an open system, it is always ready to communicate with other systems. An open system that scales has an advantage over a perfectly closed and self-contained system.
e) Economics: Microprocessors offer a better price/performance ratio than mainframes.
Disadvantages of distributed systems compared with centralized ones are:
a) Security: Distributed systems have inherent security issues, since data must be exchanged over a network.
b) Networking: If the network becomes saturated, problems with transmission will surface.
c) Software: There is currently comparatively little software support for distributed systems.
d) Troubleshooting: Troubleshooting and diagnosing problems in a distributed system can also be more difficult, because the analysis may require connecting to remote nodes or inspecting communication between nodes.
10. List a few requirements for multimedia data management.
A: The goal of a multimedia database management system (MMDBMS) is to provide a suitable environment for using and managing multimedia database information. Hence, it must include the traditional DBMS functions (e.g., database definition and creation, data retrieval, data access and organization, data independence, privacy, integration, integrity control, version control and concurrency support), but applied to various multimedia data types. The functional requirements imposed on an MMDBMS can be grouped into two categories [DATAPRO]: data representation requirements and data manipulation requirements.
a) Support for a generalization/specialization hierarchy: A major requirement imposed on multimedia applications is support for a generalization/specialization hierarchy. This hierarchy is used to define the type, subtype, and instance relationships between the various entities. E.g., all documents could be grouped into a type called "document type". If a new document is created, it is made an instance of this type. Certain documents could have special properties; e.g., a book document could have properties that differ from those of a newspaper document. Therefore, the collection of books could be grouped into a type called "book document type". Since a book is also a document, the type "book document type" is made a subtype of the type "document type". Support for a generalization/specialization hierarchy also facilitates schema evolution.
b) Attribute specification: Support for specifying the properties of a document (such as its title, author, font size, etc.) should be provided. These properties are also called the attributes of a document.
c) Specification of operations: Another requirement is the ability to specify the operations that can be performed on a multimedia document. E.g., it should be possible to change the font size of the document, change the contents of a document, retrieve the contents of the document, etc.
d) Support for composite objects: A major requirement for modeling multimedia applications is support for composite objects. E.g., a document may be composed of front matter, body, and back matter. The front matter may in turn be composed of a cover page, abstract, preface, acknowledgments, and table of contents. The cover page may consist of the title, authors, organization, date of publication, sponsors, etc.
e) Object sharing: Object sharing is the capability for different documents to share parts of their contents. Such a capability is especially necessary for multimedia documents, as the amount of storage space required to store a document might be quite large. It should be possible to represent the fact that different documents share portions of their contents.
f) Ordering of documents: The presentation of the paragraphs, images, and drawings of a multimedia document could depend on the users accessing the document. Usually, constraints are imposed on the presentation of the document.
g) Support for multimedia data: This major requirement includes extensibility, so that new multimedia devices as well as new functions on multimedia information can be incorporated with ease.
h) Data independence: The database and its management functions must be separated from the application programs.
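Requirements a) to e) above can be illustrated with ordinary classes. The following is a minimal in-memory Python sketch, not how any particular MMDBMS implements them: the class and member names (Document, BookDocument, Component, isbn, change_font_size, retrieve_contents) are hypothetical, and a real system would persist these objects rather than keep them in memory. It shows a subtype of a general document type, document attributes, operations on a document, a composite object built from nested parts, and one image object shared by two documents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical part of a composite document (cover page, abstract, image...)."""
    name: str
    content: str = ""
    parts: list[Component] = field(default_factory=list)

    def flatten(self) -> list[str]:
        """Return the names of this component and all nested parts, depth first."""
        names = [self.name]
        for part in self.parts:
            names.extend(part.flatten())
        return names


@dataclass
class Document:
    """Instance of the general 'document type'; title/author/font_size are its attributes."""
    title: str
    author: str
    font_size: int = 12
    parts: list[Component] = field(default_factory=list)

    # Operations that can be performed on a document (requirement c).
    def change_font_size(self, size: int) -> None:
        if size <= 0:
            raise ValueError("font size must be positive")
        self.font_size = size

    def retrieve_contents(self) -> list[str]:
        names: list[str] = []
        for part in self.parts:
            names.extend(part.flatten())
        return names


@dataclass
class BookDocument(Document):
    """Subtype of Document: a book has a property (hypothetical isbn) a newspaper lacks."""
    isbn: str = ""


# Composite object: front matter -> cover page -> title, authors.
cover = Component("cover page", parts=[Component("title"), Component("authors")])
front = Component("front matter", parts=[cover, Component("abstract"), Component("preface")])

# Object sharing: one image object referenced by two documents, stored only once.
logo = Component("logo image", content="<binary data>")

book = BookDocument("DBMS Notes", "A. Author",
                    parts=[front, Component("body", parts=[logo])], isbn="000")
report = Document("Annual Report", "B. Author", parts=[Component("body", parts=[logo])])
```

Because `logo` is a single object referenced from both documents, a change to it is visible in both, which is the storage saving that object sharing aims for; `isinstance(book, Document)` holds because the book type is a subtype of the document type.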