* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Flat Files and Relational Databases Presentation
Computer file wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Versant Object Database wikipedia , lookup
Data center wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Data analysis wikipedia , lookup
Clusterpoint wikipedia , lookup
3D optical data storage wikipedia , lookup
Information privacy law wikipedia , lookup
Business intelligence wikipedia , lookup
Open data in the United Kingdom wikipedia , lookup
Data vault modeling wikipedia , lookup
Databases Flat Files & Relational Databases Learning Objectives Describe flat files and databases. Explain the advantages that using a relational database gives over flat files. Flat Files A Flat File All the data is held in a single file which is two dimensional. Rows for records and columns for fields. E.g. A stock flat file Field Name Data Type Description String Cost Price Currency Selling Price Currency Number in Stock Integer Reorder Level Integer Supplier Name String Supplier Address String A Set / Series of Flat Files Flat files which have at least one common field which is duplicated but not linked. E.g. A Set / Series of Flat Files Purchasing Department Sales Department Accounts Department Programs to place orders when stocks are low Programs to record orders from customers Programs to record accounts of customers File containing Stock Code, Description, Re-order level, Cost Price, Sale Price, Supplier name and address, etc File containing Stock Code, Description, Number sold, Sale Price, Customer name and address, etc. File containing Customer name and address, amount owing, dates of orders, etc. Limitations of Flat Files Separation and isolation of data Which customers have bought parts produced by a particular supplier? Find the parts supplied by a particular supplier from one file and then use a second file to find which customers have bought those parts. Problem compounded if data is needed from more than two files. Duplication of data Supplier details duplicated if a supplier supplies more than one part. Customer details held in two different files. Data entered more than once, therefore time, space and money are wasted. Loss of data integrity: What happens if a customer changes his address? The Sales Department may update their files but the Accounts Department may not do this at the same time. If the Order Department order some parts and there is an increase in price. The Order Department increases the Cost and Sale prices but the Accounts Department do not, there is now a discrepancy. Data dependence If there is a need to change data formats, whole programs have to be changed. Different applications may hold the data in different forms, again causing a problem. If an extra field is needed in a file, all applications using that file have to be changed, even if they do not use that new item of data. Incompatibility of files If one department writes its applications in COBOL and another in C. Departments may find it difficult to read each other’s files. Fixed queries and the proliferation of application programs Each time a new query is needed, a new program has to be written. Difficult if the data needed to answer the query is in more than one file. Compounded if some files are incompatible. Relational Databases Note: The term database on its own means a relational database. Database Management System (DBMS) Relational Databases use DBMS to provide a structure at three different levels: Internal Level Conceptual Level External Level Internal Level The relational database physically stores on some form of storage medium all data in one file structure as a physical table with a data dictionary. Physical table: Contains all fields for all types of users (e.g. departments). Data dictionary: Holds information about the database necessary to enable it to work in different views for different types of users e.g. departments. Structures of logical / virtual tables or views, Relationships, Fields (names, data types, validation, descriptions etc…). Conceptual Level A manager uses a data description/definition language (DDL) to: 1. Create two or more logical/virtual tables or views and define one or more common linked fields called relationships. 2. Set the data types and write descriptions of each field. 3. Set validation. http://databases.about.com/od/sql/a/sqlfundamentals_2.htm External Level Users use a Data Manipulation Language (DML) to store, access, change, delete and search for data. Data Manipulation Language (DML) INSERT To add a new employee to a “personal_info” table: INSERT INTO personal_info values('bart','simpson',12345,$45000) SELECT Select everything from a “personal_info” table”: SELECT * FROM personal_info Extract a list of the last names of all employees in the company: SELECT last_name FROM personal_info Who has a salary value greater than $50,000? SELECT * FROM personal_info WHERE salary > $50000 UPDATE Gives all employees a 3% cost-of-living increase in their salary: UPDATE personal_info SET salary = salary * 1.03 An employee (ID = 12345) is due a $5,000 raise. UPDATE personal_info SET salary = salary + $5000 WHERE employee_id = 12345 DELETE An employee (ID = 12345) has been laid off. DELETE FROM personal_info WHERE employee_id = 12345 http://databases.about.com/od/sql/a/sqlfundamentals_3.htm http://en.wikipedia.org/wiki/Data_manipulation_language Advantages of Relational Databases Through the use of DBMS: Different logical/virtual tables or user views of data can be created. Views: A virtual or logical table composed of the result set of a query. Unlike ordinary tables a view is not part of the physical schema: it is a dynamic, virtual table computed or collated from data in the database. Easier to access the data because data is now accessible through queries using a Data Manipulation Language (DML). Advantages of Relational Databases All data can be accessed through this one relational database but different tables can be suited to different requirements (shows only relevant information). Less duplication of data held because data is not duplicated across different files, consequently there is less danger of data integrity being compromised. Less chance of one copy of data altered when another is not. Advantages of Relational Databases Data manipulation / input can be achieved more quickly as there is only one copy of each piece of data. Flat files need to be compatible, this problem does not arise with a database. Vocabulary for relational databases Fields / columns = attributes Records / rows = tuples Tables = relations Table description / shorthand notation / design: Example table: Employee (EMP) table for a company. Note that an NINumber is a unique number given by the UK government to every person who works in the UK, for tax purposes. EmpID NINumber Name Address Table: EMP (EmpID, NINumber, Name, Address) The words in brackets separated by a , = Fields / Columns / Attributes Keys Primary Keys Primary Key: Unique attribute (or set of attributes) used to identify the record or tuple. E.g. EMP (EmpID, NINumber, Name, Address) EmpID & NINumber could be primary keys as they are unique for each employee. However, it would be standard to choose EmpID to be the primary key. Note that it is conventional to underline the primary key and I will do so on the next few slides. Foreign Key An non-primary key attribute/field in one table which is also the primary key in another table and so links the two tables together. E.g. CINEMA (CID, Cname, Loc, MID) CID = Cinema ID, Cname = Cinema name, Loc = Location, MID = Manager ID & MANAGER (MID, Mname) Mname = Manager name MID occurs in CINEMA and is the primary key in MANAGER. When discussing CINEMA we say MID is the foreign key. Secondary Keys A different attribute (other than the primary key) that allows the data (the same record as the primary key or other groups of related records – so not necessarily unique) to be accessed in a different way. E.g. EMP (EmpID, NINumber, FName, LName, Address, Dept) NINumber could be a secondary key to allow access to the Employee’s record in a different way e.g. if EmpID is not known. Dept could also be a secondary key as it allows access to all employees in a particular department. The setup of one or more secondary keys allows these fields to be “indexed” so that searching by these fields is faster than ones that are not set to be possible secondary keys. For example, it is very unlikely that we would search the EMP table above by FName only, so we would not set FName to be a secondary key (even though it would still be possible to search by FName, it just would be slower than searching by a secondary key such as NINumber or Dept). However, Secondary keys can be a combination of fields e.g. FName+LName to allow a record to be accessed if neither the EmpID or the NINumber is known (but note that it is possible that two or Plenary What is meant by a flat file? Plenary All the data is held in a single file which is two dimensional. Rows for records and columns for fields. Plenary Describe the advantages of using a relational database over flat files. Plenary All data can be accessed through this one relational database but different tables can be suited to different requirements (shows only relevant information). Less duplication of data held because data is not duplicated across different files, consequently there is less danger of data integrity being compromised. Less chance of one copy of data altered when another is not. Plenary Data manipulation / input can be achieved more quickly as there is only one copy of each piece of data. Flat files need to be compatible, this problem does not arise with a database. Through the use of DBMS: Different user views of data can be created Easier to access the data because data is now accessible through relations / queries. Plenary What is meant by terms primary key, secondary key and foreign key? Primary Keys Primary Key: Unique attribute (or set of attributes) used to identify the record or tuple. E.g. EMP(EmpID, NINumber, Name, Address) EmpID & NINumber could be primary keys as they are unique for each employee. Secondary Keys Secondary A different attribute that allows the data to be accessed in a different order. E.g. EMP(EmpID, NINumber, Name, Address) If EmpID is the primary key then NINumber is the secondary key. Foreign Keys Foreign The primary key of another file / table / relation that is used to link files / tables / relations together. E.g. CINEMA(CID, Cname, Loc, MID) & MANAGER(MID, MName) MID occurs in CINEMA and is the primary key in MANAGER. When discussing CINEMA we say MID is the foreign key.