* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download DATABASES A database is a shared collection of logically related
Open Database Connectivity wikipedia , lookup
Relational algebra wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
DATABASES A database is a shared collection of logically related data with descriptions designed to satisfy the information needs of an organisation. The key idea behind the database concept is separating the data from the application program (data independence) Some common uses of databases are in large shops and supermarkets. The databases can be linked directly to the operations of the business. For example, bar codes on items for sale can be linked to products in the company's database so that stock levels can be maintained and pricing details can be applied centrally from the database rather than on each individual item. The database approach was developed and adopted because of the problems associated with data being stored within application programs or within file based systems. The problem with data being stored within application programs was that it was hard to access from other places and there were limits to what could be done with the data. There are three main areas within the relational model that can be focused on: 1. Data Structure 2. Data Integrity 3. Data Manipulation Databases and File Based Systems A file based system is a collection of application programs that perform services for the users wishing to access information. Each program within a file based system defines and manages its own data. Because of this, there are limits as to how that data can be used or transported. File based systems were developed as better alternatives to paper based filing systems. By having files stored on computers, the data could be accessed more efficiently. It was common practice for larger companies to have each of its departments looking after its own data. The problems that arise with this type of file based system are listed below: 1. Data separation and isolation 2. Data dependence 3. Data duplication 4. Incompatible data (different file formats) 5. Lack of flexibility in organising and querying the data 6. Increased number of different application programs Some advantages of database systems are outlined below: Sharing of data Consistency of data Integrity of data Security of data Data independence Allows for more analysis of the same amount of data Improved data access and system performance Potentially increased productivity Increased concurrency Improved data backups and recovery Some potential disadvantages of database systems are the cost of implementing them, the amount of effort needed to transfer data into the database from a current system, and also the impact on the whole company if the database fails (even if only for a relatively short period). DATABASE MANAGEMENT SYSTEM (DBMS) A DBMS is a software system that enables users to define, create and maintain a database. The DBMS also enforces necessary access restrictions and security measures in order to protect the database. The DBMS also has the job of controlling access to database. Various types of control systems within the DBMS make sure that the database continues to function properly: Integrity system Security system Concurrency control system Recovery control system Many DBMSs enable "views" of the database to be defined. A view is how the database appears to a certain user. Views offer the benefit of only having to show relevant information to different types of users and it increases security as certain users will not be able to see data which they are not meant to see. Views can also decrease the perceived complexity of the database from the user's point of view. Data Definition and Manipulation The DBMS makes use of a Data Definition Language (DDL) and a Data Manipulation Language (DML). The DML enables the specification of data types, structures and constraints that should be part of the database. All specifications defined by the DDL are stored in the database. The DML enables those with access to the database to insert, update, delete and retrieve data from it. Structured Query Language (SQL) is the standard DML used today. SQL is a nonprocedural language. Why Use a DBMS? •Data independence and efficient access. •Reduced application development time. •Data integrity and security. •Uniform data administration. •Concurrent access, recovery from crashes. Relational Database Principles and Terminology A relational database is a collection of 2-dimensional tables which consist of rows and columns. Tables are known as "relations", columns are known as "attributes" and rows (or records) are known as "tuples". Tables / Relations are a logical structure therefore they are an abstract concept and they do not represent how the data is stored on the physical computer system. Each column / attribute in a relation represents an attribute of an entity. A single row / tuple contains all the information (in the form of attributes) about a single entity. The "cardinality" of a relation is the number of row / tuples it has. The "degree" of a relation refers to the number of columns / attributes in that relation. The order of records and columns is irrelevant. Relations and columns should always be uniquely named and therefore uniquely indentifiable. No duplicate rows occur in a single relation. Each cell of a relation contains a single value or element which is atomic. This means that arrays or lists, for example, would not be stored under a single attribute. Multi-valued attributes are possible though but this involves a technique of referring to another relation which holds these multiple values. Attribute Domains Attribute domains define the set of all possible values an attribute within a relation can take. For example, the attribute "height" of a person may only take integer values with a 4 digit maximum (if measuring in millimetres). The attribute "gender" may only take a string of length 1 with the only two characters accepted being "m" or "f" representing male or female. Defining attribute domains allows the database to reject invalid inappropriate data. An attribute may in some cases be permitted to contain Null values. A Null value makes it possible to deal with missing or erroneous data. Null means that there is an absence of a value. It is different from using a blank space or zero value in an attribute. Keys (Primary, Candidate, and Foreign) A key (also known as a candidate key) is an attribute which uniquely identifies a row / tuple within a relation. The primary key is the key chosen by the database designer as the main identifier for all records in that relation (even though other keys could be used as the primary key). A foreign key is an attribute which provides a logical link between tables. A foreign key in one relation will correspond to a key within another relation. Candidate keys can consist of a single attribute or multiple attributes. Multiple attribute keys are known as composite keys. Database Design / Application Lifecycle Before a database design is set into motion, planning is an essential stage. This is typically the responsibility of management within the organisation. Good planning will enable the design and implementation to be successfully completed with the desired results. A general definition of the desired system should also be drawn up at this stage. This would include information such as what the database application should be able to do, to what areas it will be applied, and who will be using it. Three main factors that should be analysed in the planning are • What work will need to be done • The resources needed to complete the work • How much it will all cost Requirements Analysis and Collection Information can be gathered using various techniques. Some of these are listed below: • Interviewing individuals • Observation • Examining documents • Using questionnaires • Using expert knowledge and experience from other design work Interviewing many individuals can be time consuming but can result in quality feedback which can be used in the system design. Using questionnaires is a quicker and easier way to gain feedback from relevant people. It is hard to ensure that the answers people give are accurate though. Looking at the current documents and paperwork in circulation can be useful in determining what input/output screens should look like for example. Database Design The database design stage will result in a database model that will hopefully support the organisation's goals. The two design approaches that can be used are Bottom Up or Top Down. The Top Down approach starts with a very general overview of the system and details are added in an iterative manner. Bottom Up design involves getting all the details firs then constructing the overall design from the smaller more detailed parts. The aim of a database design should be to represent all relevant data and the relationships that exist between data items. The database model should also allow for all necessary transactions on the data to be possible. The following stages follow the database design stage: • DBMS Selection • Application Design • Prototyping • Implementation • Data Conversion and Loading (if data from an existing system needs to be put into the new database) • Testing • Operational Maintenance (follows the installation of the system) DATABASE MODELS Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. An example of this is the relational model: all serious implementations of the relational model allow the creation of indexes which provide fast access to rows in a table if the values of certain columns are known. A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as select, project, and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built. Flat model This may not strictly qualify as a data model, as defined above. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password that might be used as a part of a system security database. Each row would have the specific password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This model is, incidentally, a basis of the spreadsheet. Hierarchical model In a hierarchical model, data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one 1:N relationship between two types of data. This structure is very efficient to describe many relationships in the real world; recipes, table of contents, ordering of paragraphs/verses, any nested and sorted information. However, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to upward link and sort field) is not also included for each record. One limitation of the hierarchical model is its inability to efficiently represent redundancy in data. Network model The network model (defined by the CODASYL specification) organizes data using two fundamental constructs, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets. The network model is a variation on the hierarchical model, to the extent that it is built on the concept of multiple branches (lower-level structures) emanating from one or more nodes (higher-level structures), while the model differs from the hierarchical model in that branches can be connected to multiple nodes. The network model is able to represent redundancy in data more efficiently than is the hierarchical model. The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values. Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization. Most Object databases use the navigational concept to provide fast navigation across networks of objects, generally using Object Identifiers as "smart" pointers to related objects. Objectivity/DB, for instance, implements named 1:1, 1:many, Many:1 and Many:Many named relationships that can cross databases. Many object databases also support SQL, combining the strengths of both models. Relational model The relational model was introduced in an academic paper by E. F. Codd in 1970 as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory. The products that are generally referred to as relational databases in fact implement a model that is only an approximation to the mathematical model defined by Codd. Three key terms are used extensively in relational database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take. The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in columns and rows (also called tuples). Thus, the "relation" in "relational database" refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee. All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can't be identical tuples or rows in a table. And third, each tuple will contain a single value for each of its attributes. A relational database contains multiple tables, each similar to the one in the "flat" database model. One of the strengths of the relational model is that, in principle, any value occurring in two different records (belonging to the same table or to different tables), implies a relationship among those two records. Yet, in order to enforce explicit integrity constraints, relationships between records in tables can also be defined explicitly, by identifying or non-identifying parentchild relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set of attributes that can act as a "key", which can be used to uniquely identify each tuple in the table. A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly used to join or combine data from two or more tables. For example, an Employee table may contain a column named Location which contains a value that matches the key of a Location table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple columns can be grouped together into a compound key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one. A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number) is sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary or surrogate key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)