A Brief Look at Reality
datastructures 6.1

Aircraft Noise Monitoring System 1
Sydney Airport - 7000 complaints per day
Software Used:
- CA Windows/4GL - Application Development Tool
- CA-OpenIngres - DBMS
- Global Environment Management System
- MOSAIX Technologies Spatial Application Development Tool
datastructures 6.2

Aircraft Noise Monitoring System 2
• Developers - Lochard Environment Systems (Melbourne group, one of many involved)
Have installed noise and flight plan systems at: Sydney, Melbourne, Brisbane, Cairns, Perth, Adelaide, Coolangatta
All systems are integrated into one system monitored from the offices of Air Services, Canberra
datastructures 6.3

Aircraft Noise Monitoring System 3
Some Features:
- Environment Monitoring Units (EMU) (data source - 16 at Sydney - a reading every second)
- Exception Levels - report of noise level > permissible threshold
- Correlated with Secondary Surveillance Radar and Flight Plan Information from the Air Traffic Control System
datastructures 6.4

Aircraft Noise Monitoring System 4
- Noise and Flight Plan overlaid on a map of the surrounding area - viewed in real time
- Records stored in CA-OpenIngres database
- Geographic Information System interface
- GEMS database contains attributes such as: aircraft information, type, call sign, engine capacity, address of owner/operator
datastructures 6.5

Aircraft Noise Monitoring System
Hardware Platform
- Sun Microsystems: Quad processor SunSPARC 514, 256Mb RAM, 50Gbyte disk storage
Communications
- TCP/IP network - CA-Ingres/NET Communications software
datastructures 6.6

Aircraft Noise Monitoring System
[Diagram of system components: EMU's, MOSAIX Technologies Spatial Application Development, CA Windows/4GL, CA-Open Ingres, Global Environment Management System]
datastructures 6.7

6.0 Design and Data Structures
Data structures are the bricks and mortar that hold databases together.
Data structures (for the ANSI/SPARC standard) are defined in the internal model level and implemented in the physical data organisation.
datastructures 6.8

Design and Data Structures
- Data structures are often hidden from the application programmer, since they are primarily used by the DBMS and the operating system
- Understanding data structures is important for performance reasons, to improve program design, and to allow easier communication with DBMS specialists
datastructures 6.9

Goals of Relational Design
- To determine what Relations should exist and what Attributes they should contain
- Avoid Redundancy if possible - minimise storage space
- Avoid Anomalies
- Avoid Nulls
- Avoid Joins which produce misleading or incomplete rows
datastructures 6.10

Assumptions
- A group of attributes has a natural “inherent” structure
- This structure is independent of the way the data is used
Normalisation
• introduced by E.F. Codd together with relational database theory
• originally Codd defined three normal forms
• later expanded to Boyce-Codd and fourth and fifth normal forms
datastructures 6.11

Dependency Theory
"one truly scientific part of the field [of database design]" Date 6th Ed p.380 - needs more research
Relational Database Design - a mechanical approach to producing a database schema with certain desirable properties
- Review Normal Forms and the Problems they Solve
- Study Algorithms which produce Normalised Relations with Desirable Properties
datastructures 6.12

Normal Forms
- Formal measure of why one grouping of attributes may be better than another
- Each Normal Form requires that a Relation satisfies the criteria for that form; each form eliminates a different kind of redundancy
- Normalised Relations will remain consistent following database operations and will store each fact only once
- Database operations applied to unnormalised relations may lead to anomalies
datastructures 6.13

Anomalies
Relation Assign

Person_Id   Project_budget   Project   Time_Spent_on_Project
S75         32               P1        7
S75         40               P2        8
S79         32               P1        4
S79         27               P3        1
S80         40               P2        5
-           17               P4        -

Null Values are considered to be anomalies
datastructures 6.14

Anomalies
Insertion Anomaly
add row (ASSIGN, <S85,35,P1,9>) - two conflicting budgets for P1
    S75   32   P1   7
    S79   32   P1   4
Deletion Anomaly
delete row (ASSIGN, <S79,27,P3,1>) - removes project budget for P3
    S79   27   P3   1
(the previous ‘table’ was more of a ‘report’ format)
datastructures 6.15

Functional Dependencies
- the values of one set of attributes determine the values of another set of attributes
X -> Y
The value of X determines the value of Y
The value of Y depends on the value of X
The simplest case is one attribute determining another single attribute
datastructures 6.16

Functional Dependencies
Functional Dependency Diagram
Project -> Project Budget
Person-Id, Project -> Time Spent on Project
datastructures 6.17

Minimal Functional Dependencies
Employee -> Department      Employee determines Department
Department -> Location      Department determines Location
Employee -> Location        Employee determines Location - this Functional Dependency is redundant
datastructures 6.18

Normalisation
What is it?
- It is a method for increasing the quality of database design.
- It also doubles up as a theoretical base for definition of the properties of tables.
- It probably has its origins in the early days of moving from ‘files’ to database tables.
- It supports and verifies the database model.
datastructures 6.19

Normalisation
1st Normal Form
Repeating groups must not occur

Subject    Student No.   Result   Name
COT7180    9142717       C        Wilson
           9131618       D        Renoir
           9077184       P        Gilbey
COT2138    8967384       N        Breton
           8737980       P        Balzac
COT7700    9142717       P        Wilson
           6932475       HD       Gilbey
datastructures 6.20

Normalisation
Corrected Table

Subject    StudentNo   Result   Name
COT7180    9142717     C        Wilson
COT7180    9131618     D        Renoir
COT7180    9077184     P        Gilbey
COT2138    8967384     N        Breton
COT2138    8737980     P        Balzac
COT7700    9142717     P        Wilson
COT7700    6932475     HD       Gilbey

Formally expressed as: Results(subject code, studentno, result, name)
datastructures 6.21

Normalisation - Examples
1. 1NF or First Normal Form
Rule: Each Row MUST CONTAIN the same number of columns
Example: Course Instructor table
Columns: class code, lecturer, tutor, tutor, tutor
Class codes: c3576, k4567, b6745, r3289
Lecturers and tutors: Doe,J; Jones,R; Smith,V; Nguyen,P; Fabbri,M; Ong,W; Pratt,W; Archer,V; Barrat,N; Ng,K
The number of columns (attributes) is not consistent
Create a table of TUTORS with Class Codes
datastructures 6.22

Normalisation - 2
2. 2NF or Second Normal Form
(1) The table must be in 1NF
(2) Each attribute which is NOT part of the key must be functionally dependent on the whole key

Building   Room   Seats   No. of Levels
A          214    85      4
A          242A   25      4
B          213    135     6

The number of levels is functionally dependent on Building only, not on Building and Room
datastructures 6.23

Normalisation 3
3. 3NF or Third Normal Form
(1) Table must be in 2NF (and therefore 1NF)
(2) Every attribute which is NOT part of the Primary Key must be dependent ONLY on the P.K. (i.e. not dependent on any other NON-KEY attribute)

Class Code   Lecturer   Lecturer’s Office
c3567        Doe,J      101 A Building
k4567        Nguyen,L   209 F Building
b7654        Fabbri,M   312 B Building

NOT 3NF
datastructures 6.24

Normalisation 3a
Table 1
Class Code   Lecturer
c3576        Doe,J
k4567        Nguyen,L
b7645        Fabbri,M

Table 2
Lecturer   Lecturer’s Office
Doe,J      101 A Building
Nguyen,L   209 F Building
Fabbri,M   312 B Building

These tables are in 3NF.
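
The 3NF split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (fd_holds, decompose, rejoin) are hypothetical, the rows are the three sample rows of the "NOT 3NF" table, and a dependency that holds in sample data is merely consistent with, not proof of, a real functional dependency.

```python
# Rows of the "NOT 3NF" table: (class_code, lecturer, lecturers_office).
ROWS = [
    ("c3567", "Doe,J", "101 A Building"),
    ("k4567", "Nguyen,L", "209 F Building"),
    ("b7654", "Fabbri,M", "312 B Building"),
]

def fd_holds(rows, lhs, rhs):
    """True if column lhs determines column rhs in these rows (X -> Y)."""
    seen = {}
    for row in rows:
        # A second, different Y value for the same X breaks the dependency.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

def decompose(rows):
    """Split on the transitive dependency lecturer -> office."""
    table1 = sorted({(code, lecturer) for code, lecturer, _ in rows})
    table2 = sorted({(lecturer, office) for _, lecturer, office in rows})
    return table1, table2

def rejoin(table1, table2):
    """Join the two tables back on lecturer to check nothing was lost."""
    office_of = dict(table2)
    return sorted((code, lec, office_of[lec]) for code, lec in table1)

table1, table2 = decompose(ROWS)
print(fd_holds(ROWS, 1, 2))                   # lecturer -> office in the sample
print(rejoin(table1, table2) == sorted(ROWS)) # decomposition is lossless here
```

Because each office is stored once per lecturer in the second table, changing a lecturer's office touches a single row, which is the update anomaly the decomposition is meant to remove.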
datastructures 6.25

Functional Dependencies
[Diagram: supplier number (p.k.) with attributes supplier name, status, city; part number (p.k.) with attributes part name, colour, mass, city. Notice that these attributes are dependencies of the primary key. Not functionally dependent: city, assets]
datastructures 6.26

A Simple Test for 3NF
Each attribute should depend on:
• the key
• the whole key
• and nothing but the key
datastructures 6.27

Normal Forms Based on FD’s
Review of Kent's Process
1NF - the shape of the record
- fixed length
- no repeating groups
- Relational Model does not handle repeating groups
Relationship between key and non-key fields 1:1 or 1:N
2NF is violated when a non-key field is a fact about a subset of the key
datastructures 6.28

Normal Forms Based on FD’s
Problems with relations in 2NF
- repeated information
- update anomalies
- potential inconsistency
- delete anomalies
3NF is violated when a non-key field is a fact about another non-key field
Problems with relations in 3NF arise when a non-key field is a fact about a key field
datastructures 6.29

Modelling - some hints
1. Do not prematurely combine entities into tables
2. Concentrate on Access Mechanisms which can be shared among requests
3. Deviate from the data model in a responsible manner
4. Use table/view and column names which closely reflect data model names
5. Do not define multiple attributes as one (composite) column in a table (i.e. attribute = column)
6. Develop an ‘architecture’ to support business rules (rules, procedures, integrity)
7. Position your organisation for future technology (windows, CASE, optimisers)
datastructures 6.30

Physical Access Considerations
We are going to have a quick look at some aspects of accessing data from the database
datastructures 6.31

3 level architecture
[Diagram, top to bottom:
External Schemas: external view A (Cobol + DML), external view B (C + DML), external view C (DML), external view X (Assembler + DML)
Conceptual Schema: Conceptual Model
Internal Schema: Internal Model
Physical Level: operating system
The DBMS spans the levels. Interfaces shown: user interface, logical record interface, stored record interface, physical record interface]
datastructures 6.32

Transaction Processes
[Diagram of a transaction with steps numbered 1 to 6, including: Locate Record; Bring Block into Buffer; Rewrite CREDITS field xxx in buffer; Write block to disk]
datastructures 6.33

Database Access
DBMS FUNCTIONS
1. DBMS to File Manager: Request for Stored Record
2. File Manager to Disk Manager: Request for Stored Page
3. Disk Manager to Operating System: Disk I/O Operation
4. Data Read from Disk (DATA BASE)
5. Disk Manager to File Manager: Stored Page Returned
6. File Manager to DBMS: Stored Record returned for processing
DISK MANAGER - Physical Disk Addresses (data and free space)
FILE MANAGER - Files and Page Sets
datastructures 6.34

Implementation Design - File Organisation and Access
File Access and Organisation
datastructures 6.35

Table Access Calculation
Number of PAGES or BLOCKS: 300,000
No. of Disk I/O’s per second: 30
Assume 1 disk I/O per page
Assume HEAP storage structure
select * from employee where employee.name = 'Johnston';
[closure feature of SQL]
datastructures 6.36

Table Access Calculation
Time taken = 300000/30 = 10,000 seconds (nearly 3 hours!)
Indicates the need for a storage structure which provides access on a ‘key’
Modify existing structure to ? ? on what key (attribute or multiple attributes)
Other items: fillfactor, minpages, maxpages, compression, leaffill, location
datastructures 6.37

Indexes - Performance Improvers
Performance is dependent on how quickly an application can access table data (remember caching in Session 2?)
In Oracle there are these points to consider in Implementation Design:
1. Applications can access table data with or without indexes
2. IF an index is present and IF the index will help performance, Oracle will use the index
3. Oracle will automatically update an index to keep it in synchronisation with its table
datastructures 6.38

Indexes - Performance Improvers
It is not a good tactic to index every attribute set.
Indexes should be restricted to key attributes
Index maintenance generates overheads - time and storage
datastructures 6.39

File Organisation
A file organisation is a technique for physically arranging the records of a file on a secondary storage device.
File organisations
- Sequential
- Indexed
    - Sequential (block index)
        - Hardware-dependent (ISAM)
        - Hardware-independent (VSAM)
    - Nonsequential (full index)
- Direct
    - Relative-Addressed
    - Hash-Addressed
datastructures 6.40

Record Access Modes
Sequential Access
In sequential access, access starts at a designated point, usually the beginning, and proceeds in a linear sequence through the file. Each record can only be retrieved by accessing all the records that physically precede it.
Random Access
In random access, a given record is accessed "out of the blue" without referencing other records in the file.
datastructures 6.41

File Organisation and Access Mode
A file organisation is established when the file is created, and is rarely changed. However, the record access mode can change each time the file is used.

                     Record access mode
File Organisation    Sequential         Random
Sequential           Yes                No (impractical)
Indexed Seq.         Yes                Yes
Direct-Relative      Yes                Yes
Direct-Hashed        No (impractical)   Yes
datastructures 6.42

Indexed Sequential File Organisation
There are two basic implementations of the indexed sequential organisation:
- hardware-dependent: uses a block index on the key, a disk address to the prime area which contains the data records, and the track index for the cylinder
- hardware-independent: uses a control interval, which may be considered a virtual track; free space for new records is provided by distributed free space
datastructures 6.43

Relative Files
- Each record can be retrieved by specifying its relative record number.
- The relative record number is a number 0 to n that gives the position of the record relative to the beginning of the file.
- This provides a method of direct file organisation.
datastructures 6.46

Hashed Files
- Suited to applications which update and retrieve in random mode and rarely need sequential access to the data records (e.g. reservation systems).
- Hashed file organisation provides rapid access to individual records based on a key.
- The major disadvantage of hash organisation is that sequential access is not convenient, because the records are not stored in primary key sequence.
- Highly concurrent environments doing random access are suitable for hash organisation.
- The basis of a hash file is an addressing algorithm which transforms the record identifier into a relative address.
datastructures 6.47

Binary Trees
- A non-linear data structure, each element having several "next" elements (branching).
- A binary tree has a maximum of two branches per element or node.
- A node consists of some data and a maximum of two pointers: a left pointer to the left branch and a right pointer to the right branch.
- If there is no left or right branch then a nil pointer is used.
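
The node layout described above can be sketched as follows. This is a minimal illustration, not a DBMS implementation: the class and function names are hypothetical, Python's None stands in for the nil pointer, and the keys are illustrative product numbers.

```python
class Node:
    """One tree element: some data plus at most two pointers."""
    def __init__(self, key):
        self.key = key
        self.left = None    # nil pointer: no "less than" branch yet
        self.right = None   # nil pointer: no "greater than" branch yet

def insert(root, key):
    """Insert key below root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root             # duplicate keys are ignored

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

def search(root, key):
    """Return how many nodes were visited to find key, or None if absent."""
    visited, node = 0, root
    while node is not None:
        visited += 1
        if key == node.key:
            return visited
        node = node.left if key < node.key else node.right
    return None

tree = build([1000, 1600, 350, 2000, 975, 625])
print(search(tree, 625))    # 4 nodes visited: 1000, 350, 975, 625

chain = build([350, 625, 975, 1000, 1600, 2000])
print(search(chain, 2000))  # 6: sorted input leaves every left pointer nil
```

The second call shows why insertion order matters: the same six keys inserted in ascending order form a single right-hand chain, so a search degrades to visiting every node.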
datastructures 6.54

A Diagram of a Binary Tree
Basic binary tree record layout for PRODUCT:

PRODUCTNO          LLINK               RLINK
Primary Data Key   Less Than Pointer   Greater Than Pointer

[Diagram: the tree after each step - (1) Initial tree, (2) Insert 1000, (3) Insert 1600, (4) Insert 0350, (5) Insert 2000, (6) Insert 0975, (7) Insert 0625]
datastructures 6.55

An Example of a Binary Tree
[Diagram: binary tree holding the keys 1000, 1600, 0350, 0100, 0625, 0975, 1250, 1425, 2000, 1775]
Task: Indicate the different traversals on this diagram.
datastructures 6.57

B Trees
The problem with binary trees is balance: the tree can easily deteriorate into a linked list, and the reduced search times are then lost.
This problem is overcome in B-trees (B for Balanced), where all the leaves are the same distance from the root.
B-trees guarantee a predictable efficiency.
datastructures 6.58

B+ Trees
There are several varieties of B-trees; most applications use the B+ tree. The + indicates the presence of an Index.
- A B-tree index is an ordered tree of index nodes
- Each index node contains one or more index entries
- Each index entry corresponds to a row in a table
- The index entry contains:
    the indexed attribute value (or set) for the row
    the ROWID (physical disk location)
datastructures 6.59

Example of a B+ Tree
[Diagram: B+ tree over the keys 0350, 0625, 1000, 1250, 1300, 1425, 1600, 2000, with levels labelled Leaves and Actual Data Records]
datastructures 6.63

A Review of Trees
- Can permit rapid retrieval of data for both random and sequential processing.
- Can be used on primary or secondary keys.
- Trees are special cases of networks; in networks, records from different files are joined without a strict hierarchy being observed. This is addressed in the hierarchical and network model lectures.
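
To make the "random and sequential" point concrete, here is a small sketch of ordered index entries of the form (indexed value, ROWID). It is deliberately not a B+ tree: a sorted Python list searched with the standard bisect module stands in for the ordered leaf level only. The ROWID strings and function names are made up for illustration.

```python
import bisect

# Ordered index entries: (indexed attribute value, ROWID).
# The ROWID strings are placeholders, not real disk locations.
INDEX = [
    (350, "rowid-1"), (625, "rowid-2"), (1000, "rowid-3"), (1250, "rowid-4"),
    (1300, "rowid-5"), (1425, "rowid-6"), (1600, "rowid-7"), (2000, "rowid-8"),
]
KEYS = [key for key, _ in INDEX]

def lookup(key):
    """Random access: binary search for one key, return its ROWID or None."""
    i = bisect.bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return INDEX[i][1]
    return None

def range_scan(low, high):
    """Sequential access: ROWIDs of all keys with low <= key <= high."""
    start = bisect.bisect_left(KEYS, low)
    stop = bisect.bisect_right(KEYS, high)
    return [rowid for _, rowid in INDEX[start:stop]]

print(lookup(1300))            # rowid-5
print(range_scan(1000, 1425))  # ['rowid-3', 'rowid-4', 'rowid-5', 'rowid-6']
```

Because the entries are kept in key order, a single structure answers both an exact-match request and a range request, which a hashed file cannot do conveniently. A real B+ tree adds the upper index levels so that inserts do not require shifting one large list.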
datastructures 6.64

Other Methods
Bit Mapping
This is another table and its contents are
- a bit to indicate the presence of some value
- a row i.d. to reference the row
[Table: Row I.D. (Row 1, Row 2, Row 3, Row 4) against the values Green and Red, with a 1 marking each row that holds the value]
% of distinct values to total should be low
Useful in DSS/data warehouse applications
Not good for frequent update or insert applications
datastructures 6.65

Other Methods
Clustering - Oracle
Is a technique which ‘clusters’, or groups together, related rows of one or more tables in the same data block.
The objective is to store (on disk) rows of an application which are used together (e.g. Orders and Items) - this saves disk I/O on analysis applications
A cluster key is necessary for each cluster.
Not very successful for high volume processing
datastructures 6.66

Storage Structure Selection
Requirements Table size : small Heap 2 Hash 1 1 ISAM BTree 1 3
Table size : medium (modify disk space available) 4 1 1 1
Table size : large (>1/2 disk 2** 4 4 1
Deletes Frequent 4 1 1 3
Updates Frequent 4 1 1 2
1 1 Secondary Index Structure N/A 1
** secondary indexes used with a heap structure
datastructures 6.67

Storage Structure Selection 2

Requirements                   Heap   Hash   ISAM   BTree
Need Pattern Matching          4      4      1      1
Need Range Searches            4      4      1      1
Exact Match Key Retrieval      4      1      2      2
Sorted Data                    4      4      2      1
Concurrent Updates             4      1      1      2
Add Data - No Modify           2      3      3      1
Sequential Addition of Data    1**    2      5      1
datastructures 6.68

Storage Structure Selection
Requirements Heap Hash 3 ISAM BTree
Initial Bulk Copy of Data 1 2 2 2
Table Growth - nil/static N/A 1 1 2
Table growth - low (15%) Plan to modify periodically N/A 1 1 2
Table growth - high Too fast to modify 3 3 3 1
datastructures 6.69
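
One way to read the selection tables is as a lookup: pick the requirements that apply and compare the columns. The sketch below does this for five rows of the "Storage Structure Selection 2" table. Two things are assumptions and not stated on the slides: that a lower number means a more suitable structure (suggested by BTree scoring 1 for range searches and Hash scoring 1 for exact match), and that simply adding the ranks is a fair way to combine requirements. The function name rank_structures is hypothetical.

```python
STRUCTURES = ("Heap", "Hash", "ISAM", "BTree")

# Ranks copied from the "Storage Structure Selection 2" table.
# Assumption: 1 = most suitable, 4 = least suitable.
RANKS = {
    "pattern matching":          (4, 4, 1, 1),
    "range searches":            (4, 4, 1, 1),
    "exact match key retrieval": (4, 1, 2, 2),
    "sorted data":               (4, 4, 2, 1),
    "concurrent updates":        (4, 1, 1, 2),
}

def rank_structures(requirements):
    """Sum the ranks per structure; lowest total first (ties by name).

    Summing is an illustrative heuristic, not a rule from the slides.
    """
    totals = {name: 0 for name in STRUCTURES}
    for requirement in requirements:
        for name, rank in zip(STRUCTURES, RANKS[requirement]):
            totals[name] += rank
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))

print(rank_structures(["exact match key retrieval", "concurrent updates"]))
print(rank_structures(["range searches", "sorted data"]))
```

With these inputs the first call puts Hash first (total 2) and the second puts BTree first (total 2), matching the earlier slides: hashing for random key access, trees for ordered and range access.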