Chapter One: Definition and Basic Concepts

What is Data?

Data are facts that can be recorded and that have implicit meaning, for example a name or a telephone number. Many facts could be recorded, but not everything is:
1. Only data that is used or useful needs to be recorded. For example, we need to record students' names and addresses, but we do not need to record their parents' jobs.
2. Only known facts need to be recorded.

What is a Database?

A database is a collection of interrelated data. A database has the following implicit properties:
  It represents some aspect of the real world, called the Universe of Discourse (UoD) or Mini World.
  It is a logically coherent collection of data with some inherent meaning. A random collection of data cannot correctly be referred to as a database.
  It is designed, built, and populated with data for a specific purpose and for specific users.
  It can be of any size, from a database of books in a library to worldwide statistical information about water resources.
  It is made independent of applications and is organized to provide a foundation for future application development.

Types of Databases and Database Applications

Traditional applications:
  Numeric and textual databases
More recent applications:
  Multimedia databases
  Geographic Information Systems (GIS)
  Data warehouses
  Real-time and active databases
  Many other applications

A Simple Example of a Database

Consider student records in a university environment where student grades are also recorded. We may have the following files to maintain student-grade information:
  Student file: where basic information about each student is recorded.
  Course file: where information about courses is recorded.
  Section file: where information about each section is stored.
  Grade-report file: where the grade of each student in each course is recorded.
  Prerequisite file: where information about prerequisite courses is recorded.
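The five student-grade files above can be sketched as tables in a small relational database. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in DBMS, and the table names, column names, keys, and the helper functions build_database and grades_for are all hypothetical choices, not part of the chapter. The sample rows are a subset of the chapter's student-records example, with course codes written in the CSC form.

```python
import sqlite3

# Hypothetical table and column names modeled on the five files listed above.
SCHEMA = """
CREATE TABLE student (
    st_id       INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    class_level INTEGER,
    major       TEXT
);
CREATE TABLE course (
    course_code TEXT PRIMARY KEY,
    course_name TEXT NOT NULL,
    cr_hr       INTEGER,
    department  TEXT
);
CREATE TABLE section (
    section_number INTEGER,
    course_code    TEXT REFERENCES course(course_code),
    semester       TEXT,
    year           INTEGER,
    instructor     TEXT,
    PRIMARY KEY (section_number, course_code, semester, year)
);
CREATE TABLE grade_report (
    st_id       INTEGER REFERENCES student(st_id),
    course_code TEXT REFERENCES course(course_code),
    grade       TEXT,
    PRIMARY KEY (st_id, course_code)
);
CREATE TABLE prerequisite (
    course_code       TEXT REFERENCES course(course_code),
    prerequisite_code TEXT REFERENCES course(course_code),
    PRIMARY KEY (course_code, prerequisite_code)
);
"""


def build_database():
    """Create an in-memory database and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?, ?)",
        [(980001, "Abc", 1, "CS"), (980002, "Xyz", 2, "CS")],
    )
    conn.executemany(
        "INSERT INTO course VALUES (?, ?, ?, ?)",
        [("CSC100", "Intro to Comp. Sc", 3, "CS"),
         ("CSC215", "Data Structure", 3, "CS")],
    )
    conn.executemany(
        "INSERT INTO grade_report VALUES (?, ?, ?)",
        [(980001, "CSC100", "A"), (980001, "CSC215", "B"),
         (980002, "CSC100", "B")],
    )
    conn.commit()
    return conn


def grades_for(conn, st_id):
    """Return (course name, grade) pairs for one student, ordered by course code."""
    rows = conn.execute(
        """SELECT c.course_name, g.grade
           FROM grade_report AS g
           JOIN course AS c ON c.course_code = g.course_code
           WHERE g.st_id = ?
           ORDER BY c.course_code""",
        (st_id,),
    )
    return rows.fetchall()
```

Calling grades_for(build_database(), 980001) joins the grade-report and course tables, which is the kind of cross-file lookup that is awkward when each file is kept separately.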
Database that stores student records and their grades

Student
  Name   St-Id#   Class   Major
  Abc    980001   1       CS
  Xyz    980002   2       CS

Course
  Course-Name         Course-code   CrHr   Department
  Intro to Comp. Sc   CSC100        3      CS
  Data Structure      CSC215        3      CS
  Logic Design        CSC202        4      EE
  Database            CSC385        4      CS

Section
  Section number   Course-code   Semester   Year   Instructor
  1                CSC100        II         2002   Dr. X
  2                CSC100        II         2002   Dr. Y
  1                CSC202        II         2002   Dr. A
  1                CSC385        II         2002   Dr. B
  1                CSC215        II         2002   Dr. X

Grade-report
  St-Id#   Course-code   Grade
  980001   CSC100        A
  980001   CSC215        B
  980002   CSC100        B
  980002   CSC215        C+
  980002   CSC385        A

Prerequisite
  Course-code   Prerequisite-code
  CSC215        CSC100
  CSC202        CSC100
  CSC385        CSC215

Database Management System (DBMS)

A database management system is a collection of programs that enables users to create and maintain a database. A DBMS is general-purpose software used to define, construct, and manipulate a database.

Define: Defining a database involves specifying the data types, structures, and constraints for the data to be stored.
a. A special language such as SQL is used for defining or declaring the database.
b. SQL has all the necessary commands to specify tables, fields, and data types, as well as some constraints.
c. One database can have many tables; a table can have one or more fields; each field can have only one data type, e.g., character, numeric, or any other type allowed by the DBMS.

Construct: Constructing a database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
a. How and where data is stored on disk is decided by the DBMS.
b. Data can be stored (inserted) directly through SQL or through a host language program such as Developer 2000 or VB.
c. How the DBMS interacts with the operating system and hardware does not concern developers or users.

Manipulate: Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database, and generating reports from the data.
a. All access to the database is done through the host DBMS and through SQL.
b. The DBA can improve the efficiency of storage and retrieval by using commands and facilities provided by the DBMS, such as indexing.
c. Reports can be generated directly with the SELECT statement in SQL or with a special package provided by the DBMS vendor. There is no limit to the variety of reports that SQL or the report writer can generate from the information in the database.

Also note that:
  A DBMS is normally bought as a ready-made package.
  DBMSs vary in cost depending on a number of factors, such as the level of security provided, the level of support, supporting tools for application development, Internet functions, and speed of storage and retrieval.
  A DBMS may be stand-alone or multi-user.
  The DBMS is like a black box to users and developers: they know how to use it for a variety of purposes but cannot modify its programs.
  A DBMS must provide basic facilities such as retrieve, insert, delete, and update. These four operations can be carried out on any data in a database managed by the DBMS.

A Simplified Database System Environment

Users/Programmers
  Users who use the application system.
  Programmers who develop the application programs.
Application Programs/Queries
  Application programs are developed using RAD tools such as VB. Queries are written in SQL.
DATABASE SYSTEM
  DBMS SOFTWARE
    Software to Process Queries/Programs: plays the role of a compiler.
    Software to Access Stored Data: DBMS specific.
  Stored Database Definition (Meta-Data)
  Stored Database

What is a Database System?

Database System = Database + DBMS + Application Programs

Brief History of Database Systems

1940's & 50's: Initial use of computers as calculators. Limited data, focus on algorithms. Science and military applications.
1960's: Business uses. Organizational data, customer data, sales, inventory, accounting, etc. File-system based, with high emphasis on application programs to extract and assimilate data. Larger amounts of data, relatively simple calculations.
Hierarchical or network database systems.
1970's & 80's: The relational model. Data separated into individual tables, related by keys. Initially required heavy system resources. Examples: Oracle, Sybase, Informix, Digital RDB, IBM DB2.
Late 1980's: Local area networks. Workgroups sharing resources such as files, printers, and e-mail. Client/server: the database resides on a central server, and application programs run on client PCs attached to the server over a LAN.
1990's: The Internet and the World Wide Web make databases of all kinds available from a single type of client, the Web browser. Object-oriented database systems? Distributed database systems? Knowledge-base systems?

Users or Actors on the Scene

There are mainly three kinds of users associated with a database.

1. Database Administrators (DBA)
The DBA is responsible for the overall control of the system at the technical level, and mainly for implementing and maintaining the database.
a. The DBA handles security issues such as passwords and levels of authority.
b. The DBA tunes the efficiency of the DBMS and the associated database.
c. The DBA can be responsible for backup and recovery procedures.

2. Database Designers
Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data.
a. Database designers may go by other titles, such as system analysts or design analysts. Normally they are responsible for designing both the data part and the functional part of the system.
b. Database designers follow a methodology, together with supporting techniques, to arrive at a sound database design. The Entity-Relationship model (ERM) is used to design the static part of the system, which is the data design. The designers then transform the ERM into relations (tables) and optimize these relations using a technique known as normalisation.

Certain tasks are undertaken before the database is actually implemented and populated with data:
a. The users must be satisfied with the structure of the database. The database must be able to hold all their required data and produce the necessary reports.
b. The DBA implements the database on the chosen DBMS.
c. Normally the application system has already been developed or is being developed.

3. End Users
End users are the people whose jobs require them to access the database for querying, updating, and generating reports; the database primarily exists for their use.

Types of end users:
  Casual end users: access the database occasionally, and may need different information each time they access it.
  Naïve or parametric end users: for example, bank tellers, who check account balances and process withdrawals and deposits, and reservation clerks for airlines, hotels, and car rental companies, who check availability for a given request and make reservations.
  Sophisticated end users: engineers, scientists, business analysts, and others who are thoroughly familiar with the DBMS and use it to implement their complex queries.
  Stand-alone users: mostly maintain personal databases using ready-to-use packaged applications. One example is the user of a tax program that creates its own internal database. Another is a user who maintains an address book.

Workers Behind the Scene

  DBMS system designers and implementers
  Tool developers
  Operators and maintenance personnel

Viewpoints and the issues each is concerned with:
  DBMS designer/implementer: develop a DBMS.
  Database designer: capture information structures in the real world and design the organization of a database (logical structure and physical structure).
  Database administrator (DBA): monitor operations on a database and keep the database system efficient.
  Application programmer: write programs that access a database.
  End user (casual, naïve/parametric, or sophisticated): enter data and manipulate data through a query language.

Benefits of the Database Approach

1. Redundancy can be reduced
In non-database systems each application has its own private files. This can often lead to considerable redundancy in stored data, with a resulting waste of storage space. In the database approach, the files for the entire application can be stored at a single location, so redundancy can be controlled carefully. However, sometimes there are sound reasons to maintain multiple copies of the same file.

2. Inconsistency can be avoided
If some data (say, the name of a student) is represented in two places in the database, it may happen that at some stage we update the name in one place but forget to update it in the other. At such times the database is said to be inconsistent. Inconsistency can be avoided in a well-designed database by keeping all the data at a single location.

3. The data can be shared
Different users and different applications can share data. This means not only that existing applications can share the data in the database, but also that new applications can be developed to operate against that same stored data.

4. Standards can be enforced
With central control of the database, the DBA can ensure that all applicable standards are observed in the representation of the data. This is crucial for the success of database applications in large organizations. Standards cover data item names, display formats, screens, report structures, meta-data (descriptions of data), Web page layouts, etc.

5. Security restrictions can be applied
Since the data is controlled and stored at a single location, the DBA can ensure that the only means of access to the database is through the proper channels. Different security checks can be established for each type of access.

6. Integrity can be maintained
The problem of integrity is the problem of ensuring that the data in the database is accurate. Data integrity is even more important in a multi-user database system.
Without appropriate controls, it would be possible for one user to update the database incorrectly, thereby generating bad data and so “infecting” other innocent users of that data. Centralized control of the database can help avoid such problems. The DBA can define checks that are carried out whenever data is updated or deleted.

Other Advantages

Conflicting requirements can be balanced
1. Data can be designed to accommodate users with conflicting requirements, and satisfactory reports and processing can be generated for all of them.
2. Since one designer designs one centralized database, all the different requirements can be modeled and reflected in the same database.

Backup and recovery
1. The DBMS provides built-in backup and recovery procedures, which the DBA can customize to suit the requirements of the organization.
2. Backups can be taken every minute, every hour, every day, every week, or even every month.
3. Special recovery procedures can be established to recover from a fault and continue database processing without loss of data or transactions.
4. For very important databases, disk mirroring can be established, where data is backed up immediately, i.e., as if two parallel systems were operating.

Data independence
1. The structure and the contents of the database are separate from the application programs, and a change in one should not necessarily require a change in the other.
2. Unlike in file processing systems, rewriting and recompiling a program does not mean you have to change or recreate data structures.

Flexibility - The database structure may evolve as new requirements are defined.
Speedy development - The incremental time to add each new application is reduced.
Up-to-date information - Extremely important for online transaction systems such as airline, hotel, and car reservations.
Economies of scale - Wasteful overlap of resources and personnel can be avoided by consolidating data and applications across departments.

When Not to Use a DBMS?
In spite of the advantages of using a DBMS, there are a few situations in which such a system may involve unnecessary overhead costs that would not be incurred in traditional file processing. These overhead costs come from:
  High initial investment in hardware, software, and training.
  The generality that a DBMS provides for defining and processing data.
  Overhead for providing security, concurrency control, recovery, and integrity functions.

It may therefore be preferable to use regular files when:
  Multiple-user access to the data is not required.
  The database and applications are simple, well defined, and not expected to change.
  There are stringent (strict) real-time requirements for some programs that may not be met because of DBMS overhead.

Two Major Topics

1. Database design (viewpoint of a database designer)
2. Algorithmic issues in a DBMS (viewpoint of a DBMS designer)