CHAPTER-1: DATABASE SYSTEM CONCEPTS

1) Define data, database and DBMS. List any two applications of DBMS.
Data: A set of isolated and unrelated raw facts with an implicit meaning. Or: data is a representation of facts, concepts or instructions in a formalized manner suitable for communication, interpretation or processing by humans or by automated means.
E.g. a student database file contains name, age and enroll-no. All of this information is called data stored in the database.
Database: A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.
DBMS: A database-management system is a collection of interrelated data and a set of programs to access those data.
Applications of DBMS:
1. Banking
2. Airlines and railways
3. Sales
4. Telecommunications
5. Universities
6. Manufacturing
7. E-commerce
8. Credit card transactions

2) Explain two disadvantages of file processing system.
Ans) Disadvantages of file processing system:
1. Data redundancy and inconsistency: Since the files and application programs are created by different programmers over a long period, the various files are likely to have different formats and the programs may be written in several programming languages. As a result, the same information may be duplicated in several places (files). This repetition of information is known as redundancy. Redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency, that is, different copies of the same data may have different values.
2. Difficulty in accessing data: A conventional file processing system does not allow data to be accessed in a convenient and efficient way. The data is scattered across different files, and whenever a new need arises, a new application program has to be written, often by a different programmer and in a different format.
3.
Data isolation: Because the data is scattered in various files, and the files may be in different formats, writing new application programs to retrieve the data is very difficult.
4. Integrity problems: The data values stored in the database must satisfy certain types of consistency constraints. When new constraints are added, it is difficult to change the programs to enforce them.
5. Atomicity problems: A computer system, like any other mechanical or electrical device, is subject to failure. If a failure occurs, each transaction that was executing must either be executed completely or not at all, so that the database remains in a consistent state. This is difficult to ensure in a file processing system.
6. Concurrent-access anomalies: To improve the performance of the system, multiple transactions must be executed concurrently. Multiple transactions may update the same data at the same time, which can leave the data in an inconsistent state.
7. Security problems: Only authorized persons should be able to modify the data. Security should be maintained at different levels, which was not possible in a file processing system.

3) Explain database redundancy and integrity.
Ans:
Data redundancy: Data redundancy is the unnecessary repetition of data. Since different programmers create the files and application programs over a long period, the various files are likely to have different structures and the programs may be written in several programming languages. The same piece of information or program may be duplicated in several places. E.g. the accounting department and the registration department both keep student name, number and address.
Data redundancy:
1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Increases the access cost and decreases the efficiency of the database.
4. May cause data corruption.
Such data redundancy in a DBMS can be prevented by database normalization.
Data integrity:
1.
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle.
2. Data integrity is usually imposed during the database design phase through the use of standard procedures and rules.
3. Data integrity can be maintained through the use of various error checking methods and validation procedures.
E.g. the balance of a certain type of bank account may never fall below a prescribed amount (Rs. 5000). We can handle this through program code, or by declaring an integrity constraint along with the data definition.

4) Explain any four functions of DBMS.
Ans)
Database Communication Interfaces: The end-user's requests for database access are transmitted to the DBMS in the form of communication messages.
1. Authorization / Security Management: The DBMS protects the database against unauthorized access, either intentional or accidental. It furnishes mechanisms to ensure that only authorized users can access the database.
2. Backup and Recovery Management: The DBMS provides mechanisms for backing up data periodically and recovering from different types of failures. This prevents the loss of data.
3. Concurrency Control Service: Since DBMSs support sharing of data among multiple users, they must provide a mechanism for managing concurrent access to the database. DBMSs ensure that the database is kept in a consistent state and that the integrity of the data is preserved.
4. Transaction Management: A transaction is a series of database operations, carried out by a single user or application program, which accesses or changes the contents of the database. Therefore, a DBMS must provide a mechanism to ensure either that all the updates corresponding to a given transaction are made or that none of them is made.
5. Database Access and Application Programming Interfaces: All DBMSs provide interfaces to enable applications to use DBMS services. They provide data access via Structured Query Language (SQL).
The DBMS query language contains two components: (a) a Data Definition Language (DDL) and (b) a Data Manipulation Language (DML).
6. Data integrity and consistency: To provide data integrity and data consistency, the DBMS uses sophisticated algorithms to ensure that multiple users can access the database concurrently without compromising the integrity of the database.

5) Difference between DBMS and RDBMS.
Ans:
| Sr. No. | DBMS | RDBMS |
| 1 | Old version of software to handle databases. | Latest version of software for handling databases. |
| 2 | There is no relationship concept in a DBMS. | It is used to establish relationships between two database objects, i.e. tables. |
| 3 | Data security is low as compared to RDBMS. | Level of data security is very high as compared to DBMS. |
| 4 | Data storage capacity is less as compared to RDBMS. | Data storage capacity is very high. |
| 5 | Not easy to maintain data integrity. | Data integrity is one of the most important features of RDBMS. It can be maintained easily in RDBMS. |
| 6 | Works better in single-user or few-user systems. | Works very efficiently and gives good performance over the network. |
| 7 | It supports 3 rules of E.F. Codd. | It supports a minimum of 6 rules of E.F. Codd. |
| 8 | The normalization process is not present. | RDBMS fully supports normalization. |
| 9 | E.g. FoxPro, MS-Access | E.g. SQL Server, Oracle, IBM DB2 |

6) Describe data abstraction with neat diagram.
Ans: Three levels of abstraction are as follows:
1) Physical level
2) Logical level
3) View level
[Figure: Three levels of data abstraction]
Explanation:
1) Physical Level:
a) It is the lowest level of abstraction.
b) This level describes the complex low-level data structures of the database system.
c) This level is hidden from the user.
d) It defines how the data are stored.
2) Logical Level:
a) The level next to the physical level is called the logical level.
b) This level defines what data are stored in the database and what the relationships among these data are.
c) It fully determines the structure of the entire database.
3) View Level:
a) This level is used to show a part of the database to the user.
b) The physical and logical levels are complex, so the user should not have to interact with them directly.
c) Different views of the database can therefore be created so that users can interact with the database easily.

7) What are instance and schema?
Ans)
A) Schema: The overall design of the database is known as the schema. Database schemas are partitioned according to the levels of abstraction.
1. Physical Schema: Used to describe the database design at the physical level. It contains the definitions of the records held in storage and gives the various access methods.
2. Logical Schema: Used to describe the database design at the conceptual level. It is the union of the individual subschemas, with additional security and integrity constraints.
3. Subschemas: Used to describe the database design at the view level. A database may have several schemas at this level. A subschema consists of the definitions of the logical records and the relationships between them.
B) Instance: The collection of information stored in the database at a particular moment is called an instance.

8) Describe data independence with its types.
Ans: Data independence: The ability to modify a schema definition at one level without affecting a schema definition at the next higher level is called data independence. There are two types of data independence.
1. Physical data independence: the ability to change the internal level without having to change the conceptual or external level.
2. Logical data independence: the ability to change the conceptual level without having to change the external level or the application programs.

9) Draw diagram for overall architecture of DBMS.
Ans: [Figure: overall architecture of DBMS]

10) What are the components of DBMS? Explain in brief.
Ans) Components of DBMS are classified in three categories:
1.
Query Processor:
a) DML Compiler: It translates DML statements of a high-level language into low-level instructions that the query evaluation engine understands.
b) Embedded DML Pre-Compiler: It converts DML statements embedded in an application program into normal procedure calls in the host language.
c) DDL Interpreter: It interprets DDL statements and records them in a set of tables containing metadata.
d) Query Evaluation Engine: It executes the low-level instructions generated by the DML compiler and DDL interpreter.
2. Storage Manager Components:
a) Authorization and Integrity Manager: It tests for satisfaction of integrity constraints and checks the authority of the user.
b) Transaction Manager: It ensures that the database remains in a consistent state despite failures and that concurrent transaction executions proceed without conflicting.
c) File Manager: It manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
d) Buffer Manager: It is responsible for fetching data from disk storage into main memory and deciding what data to cache in memory.
3. Disk Storage:
a) Data Files: They store the database itself.
b) Data Dictionary: It stores metadata about the structure of the database.
c) Indices: They provide fast access to data items that hold particular values.
d) Statistical Data: It stores statistical information about the data in the database. This information is used by the query processor to select efficient ways to execute a query.

11) Data Dictionary:
Ans) The data dictionary contains data definitions, their characteristics and entity relationships. This may include the names and descriptions of the various tables and fields within the database, as well as the data type and length of each data item. Overall, a well-designed data dictionary makes it easier to build and maintain the database.

12) List and explain types of DBMS users.
Ans: List of DBMS users:
a) Naive users
b) Application programmers
c) Sophisticated users
d) Specialized users
Explanation:
a) Naive users: Naive users are unsophisticated users. They interact with the system through application programs. They give data as input through the application program or receive output data generated by the application program. Example: bank cashier.
b) Application programmers: Application programmers are the users who write the application programs. They use programming tools, such as rapid application development (RAD) tools, to develop the programs.
c) Sophisticated users: Sophisticated users interact with the system by making requests in a query language. These queries are submitted to the query processor, which converts the DML statements into lower-level instructions that the storage manager understands. Analysts are an example of sophisticated users.
d) Specialized users: These users write specialized application programs that do not fit the traditional data-processing framework. Example: CAD, knowledge-based and expert systems.

13) What are the functions of DBA?
Ans)
1. Schema definition: The Database Administrator (DBA) creates the database schema by executing DDL statements. The schema includes the logical structure of the database tables (relations), such as the data types of attributes, the length of attributes, integrity constraints, etc.
2. Storage structure and access method definition: The DBA creates appropriate storage structures and access methods by writing a set of definitions, which is translated by the data storage and DDL compiler.
3. Schema and physical organization modification: The DBA writes a set of definitions to modify the database schema or the description of the physical storage organization.
4. Granting authorization for data access: The DBA provides different access rights to the users according to their level.
Ordinary users might have highly restricted access to data, while users higher up in the hierarchy, up to the administrator, get more access rights.
5. Integrity constraint specification: Integrity constraints are written by the DBA and stored in a special file, which is accessed by the database manager whenever data is updated.
6. Routine maintenance: Some of the routine maintenance activities of a DBA are given below.
a. Taking backups of the database periodically.
b. Ensuring that enough disk space is available at all times.
c. Monitoring jobs running on the database.
d. Ensuring that performance is not degraded by expensive tasks submitted by some users.

14) State the meaning of client server architecture. State the role of server.
Ans:
1. Computer networking allows some tasks to be executed on a server system and some tasks on a client system. This led to the development of client-server architecture. The clients are the machines that request services from the server. The server is the machine that serves the clients.
2. There are different types of client/server architecture, such as two-tier and three-tier architecture.
3. Role of server: The server is the machine that provides services to the client machines, such as file access, printing, and database access. It manages the database tables optimally among multiple clients who concurrently request the same data from the server.

15) Explain two tier architecture with diagram.
Ans)
1. In a two-tier architecture, the application is partitioned into a component that resides at the client machine, which invokes database system functionality at the server machine through query language statements.
2. Application program interface standards like ODBC and JDBC are used for interaction between the client and the server.
3. Two-tier architecture is intended to improve usability by supporting a form-based, user-friendly interface.

16) Explain three tier architecture with diagram.
Ans)
1.
In a three-tier architecture, the client machine acts merely as a front end and does not contain any direct database calls. Instead, the client end communicates with an application server, usually through a forms interface.
2. The application server in turn communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server instead of being distributed across multiple clients.
3. In three-tier architecture, communication takes place from the client to the application server, and then from the application server to the database system, to access the data.
4. The application server or web server is sometimes called the middle layer or intermediate layer. The middle layer processes the applications, and the database server processes the queries.
5. This type of communication system is used in large applications and World Wide Web applications. On the WWW, clients request data and the server serves it.
6. Multiple servers may be used, such as a fax server, proxy server, mail server, etc.

17) Explain distributed database with advantages and disadvantages.
A distributed database appears to a user as a single database but is, in fact, a set of databases stored on multiple computers. The data on several computers can be simultaneously accessed and modified using a network. Each database server in the distributed database is controlled by its local DBMS, and each cooperates to maintain the consistency of the global database.
The distribution of data and applications has potential advantages over traditional centralized database systems. Unfortunately, a distributed DBMS (DDBMS) also has disadvantages.
Advantages of DDBMS:
1. Reflects organizational structure
a. Many organizations are naturally distributed over several locations. For example, a bank has many offices in different cities.
It is natural for databases used in such an application to be distributed over these locations.
b. A bank may keep a database at each branch office containing details such as the staff who work at that location, the account information of customers, etc.
c. The staff at a branch office will make local inquiries of the database. The company headquarters may wish to make global inquiries involving access to data at all or a number of branches.
2. Improved shareability and local autonomy
a. The geographical distribution of an organization can be reflected in the distribution of the data; users at one site can access data stored at other sites.
b. Data can be placed at the site close to the users who normally use that data. In this way, users have local control of the data, and they can consequently establish and enforce local policies regarding the use of this data.
c. A global database administrator (DBA) is responsible for the entire system. Generally, part of this responsibility is assigned to the local level, so that the local DBA can manage the local DBMS.
3. Improved availability
a. In a centralized DBMS, a computer failure terminates the applications of the DBMS.
b. However, a failure at one site of a DDBMS, or a failure of a communication link making some sites inaccessible, does not make the entire system inoperable.
c. Distributed DBMSs are designed to continue to function despite such failures. If a single node fails, the system may be able to reroute the failed node's requests to another site.
4. Improved reliability
a. As data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
5. Improved performance
a. As the data is located near the site of 'greatest demand', and given the inherent parallelism of distributed DBMSs, speed of database access may be better than that achievable from a remote centralized database.
b.
Furthermore, since each site handles only a part of the entire database, there may not be the same contention for CPU and I/O services as is characteristic of a centralized DBMS.
6. Economics
a. It is now generally accepted that it costs much less to create a system of smaller computers with the equivalent power of a single large computer.
b. This makes it more cost-effective for corporate divisions and departments to obtain separate computers.
c. It is also much more cost-effective to add workstations to a network than to update a mainframe system.
d. The second potential cost saving occurs where databases are geographically remote and the applications require access to distributed data.
e. In such cases, owing to the relative expense of transmitting data across the network as opposed to the cost of local access, it may be much more economical to partition the application and perform the processing locally at each site.
7. Modular growth
a. In a distributed environment, it is much easier to handle expansion. New sites can be added to the network without affecting the operations of other sites. This flexibility allows an organization to expand relatively easily.
b. Adding processing and storage power to the network can usually handle the increase in database size.
c. In a centralized DBMS, growth may entail changes to both hardware (the procurement of a more powerful system) and software (the procurement of a more powerful or more configurable DBMS).
Disadvantages of DDBMS:
1. Complexity
a. A distributed DBMS that hides its distributed nature from the user and provides an acceptable level of performance, reliability and availability is inherently more complex than a centralized DBMS.
b. The fact that data can be replicated also adds an extra level of complexity to the distributed DBMS.
2.
Cost
a. Increased complexity means that we can expect the procurement and maintenance costs for a DDBMS to be higher than those for a centralized DBMS.
b. Furthermore, a distributed DBMS requires additional hardware to establish a network between sites.
3. Security
a. In a centralized system, access to the data can be easily controlled.
b. However, in a distributed DBMS, not only does access to replicated data have to be controlled in multiple locations, but the network itself also has to be made secure.
4. Integrity control more difficult
a. Database integrity refers to the validity and consistency of stored data.
b. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate.
c. Enforcing integrity constraints generally requires access to a large amount of data that defines the constraints.
d. In a distributed DBMS, the communication and processing costs required to enforce integrity constraints are high compared to a centralized system.
5. Lack of standards
a. Although distributed DBMSs depend on effective communication, we are only now starting to see the appearance of standard communication and data access protocols.
b. This lack of standards has significantly limited the potential of distributed DBMSs.
c. There are also no tools or methodologies to help users convert a centralized DBMS into a distributed DBMS.
6. Lack of experience
a. General-purpose distributed DBMSs have not been widely accepted, although many of the protocols and problems are well understood.
b. Consequently, we do not yet have the same level of experience in industry as we have with centralized DBMSs.
c. For a prospective adopter of this technology, this may be a significant deterrent.
7.
Database design more complex
a. Besides the normal difficulties of designing a centralized database, the design of a distributed database has to take account of fragmentation of data, allocation of fragments to specific sites, and data replication.

18) List and explain Codd's 12 rules.
Ans) Dr. Edgar F. Codd, after his extensive research on the relational model of database systems, came up with twelve rules which, according to him, a database must obey in order to be regarded as a true relational database.
Foundation rule (Rule 0): A system that qualifies as a relational database management system must be able to manage its stored data using only its relational capabilities. This rule acts as a base for all the other rules.
The rules are as follows:
Rule 1: Information Rule
The data stored in a database, whether user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.
Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a combination of table name, primary key (row value), and attribute name (column value). No other means, such as pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known as the data dictionary, which can be accessed by authorized users. Users can use the same query language to access the catalog that they use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data definition, data manipulation, and transaction management operations.
This language can be used directly or by means of some application. If the database allows access to data without the help of this language, it is considered a violation.
Rule 6: View Updating Rule
All the views of a database that can theoretically be updated must also be updatable by the system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, updating and deletion. This must not be limited to a single row; that is, it must also support union, intersection and minus operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database. Any change in the physical structure of a database must not have any impact on how the data is accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its user's view (application). Any change in the logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be independently modified without the need for any change in the application. This rule makes a database independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users should always get the impression that the data is located at one site only. This rule has been regarded as the foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not be able to subvert the system and bypass security and integrity constraints.

19) Explain data warehouse, data mining. List four features of data mining.
Ans:
Data Warehousing:
a. A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema, at a single site.
b. Once gathered, data are stored for a long time, permitting access to historical data.
c. Data warehouses provide the user with a single consolidated interface to data, making decision-support queries easier to write.
d. Moreover, by accessing information for decision support from a data warehouse, decision makers ensure that online transaction-processing systems are not affected by the decision-support workload.
Data Mining:
a. Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful and ultimately understandable patterns in data. It is also known as "Knowledge Discovery in Databases". When data is stored in large quantities in a data warehouse, it is necessary to dig out the data that is useful and required for further use.
b. For data mining, different software tools are used to analyze, filter and transfer the data from the data warehouse.
Features of data mining:
1) Prediction
2) Identification
3) Classification
4) Optimization
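
Illustration for question 19: the idea of gathering data from multiple sources under a unified schema can be sketched in a few lines of Python using the standard sqlite3 module. This is only a minimal sketch, not how a production warehouse is built. The table name account_fact, the branch names and the sample rows are hypothetical and do not come from these notes.

```python
import sqlite3


def build_warehouse(sources):
    """Load rows from several sources into one table with a unified schema.

    `sources` maps a (hypothetical) branch name to a list of
    (account_no, balance) rows, standing in for separate operational systems.
    """
    warehouse = sqlite3.connect(":memory:")
    # Unified schema: rows from every source are stored in the same shape.
    warehouse.execute(
        "CREATE TABLE account_fact (branch TEXT, account_no TEXT, balance REAL)"
    )
    for branch, rows in sources.items():
        warehouse.executemany(
            "INSERT INTO account_fact VALUES (?, ?, ?)",
            [(branch, account_no, balance) for account_no, balance in rows],
        )
    warehouse.commit()
    return warehouse


def total_balance_by_branch(warehouse):
    """A decision-support query that runs on the warehouse, not on the sources."""
    cursor = warehouse.execute(
        "SELECT branch, SUM(balance) FROM account_fact "
        "GROUP BY branch ORDER BY branch"
    )
    return cursor.fetchall()


# Hypothetical sample data for two branches.
sources = {
    "Pune": [("A-101", 7000.0), ("A-102", 5500.0)],
    "Mumbai": [("B-201", 9000.0)],
}
warehouse = build_warehouse(sources)
totals = total_balance_by_branch(warehouse)
print(totals)
```

Because the summary query reads only the consolidated table, the branch systems that handle day-to-day transactions are not touched by it, which is the point made in item d of the data warehousing answer.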
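
Illustration for questions 2 to 4: the minimum-balance example (Rs. 5000) and the all-or-nothing requirement for transactions can be tried out with Python's standard sqlite3 module. The sketch below assumes a hypothetical account table with two sample accounts. It declares the integrity constraint together with the table definition and relies on the connection's context manager to roll back a failed transfer.

```python
import sqlite3

MIN_BALANCE = 5000  # prescribed minimum balance from the example in question 3

conn = sqlite3.connect(":memory:")
# The integrity constraint is declared along with the table definition.
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    f" balance INTEGER NOT NULL CHECK (balance >= {MIN_BALANCE}))"
)
# Hypothetical sample accounts.
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 8000), ("B", 6000)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` from source to target; apply both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_no = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account number."""
    return dict(conn.execute("SELECT account_no, balance FROM account"))


ok = transfer(conn, "A", "B", 2000)        # leaves A at 6000, which is allowed
rejected = transfer(conn, "A", "B", 2000)  # would leave A at 4000
print(ok, rejected, balances(conn))
```

The second transfer credits account B before the debit on account A fails, yet B's balance is unchanged afterwards. This is the atomicity property described above: the database enforces the constraint itself, so the application program does not need its own balance check.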
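
Illustration for question 8 and Rule 9: logical data independence can be demonstrated with a view. In the sketch below (Python's standard sqlite3 module), a hypothetical student table with the fields name, age and enroll-no from question 1 is split into two tables, and the view that the application reads is redefined over them. The names student_view, student_name and student_age and the sample rows are assumptions made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original logical schema: one table, exposed to the application through a view.
conn.executescript("""
    CREATE TABLE student (enroll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO student VALUES (1, 'Asha', 19), (2, 'Ravi', 20);
    CREATE VIEW student_view AS SELECT enroll_no, name, age FROM student;
""")


def application_query(conn):
    """The 'application': it knows only the view, not the base tables."""
    return conn.execute(
        "SELECT enroll_no, name, age FROM student_view ORDER BY enroll_no"
    ).fetchall()


before = application_query(conn)

# Logical schema change: split `student` into two tables and redefine the view.
conn.executescript("""
    CREATE TABLE student_name (enroll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_age (enroll_no INTEGER PRIMARY KEY, age INTEGER);
    INSERT INTO student_name SELECT enroll_no, name FROM student;
    INSERT INTO student_age SELECT enroll_no, age FROM student;
    DROP VIEW student_view;
    DROP TABLE student;
    CREATE VIEW student_view AS
        SELECT n.enroll_no, n.name, a.age
        FROM student_name AS n
        JOIN student_age AS a ON n.enroll_no = a.enroll_no;
""")

after = application_query(conn)
print(before == after)
```

The function application_query is not modified between the two calls. Only the view definition changes, which is one simple way to keep the external level stable while the conceptual level is reorganized.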