* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Physical Database Design
Survey
Document related concepts
Open Database Connectivity wikipedia , lookup
Oracle Database wikipedia , lookup
Relational algebra wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Ingres (database) wikipedia , lookup
Functional Database Model wikipedia , lookup
Concurrency control wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Versant Object Database wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
Transcript
IT 20303 • The Relational DBMS • Section 06 Relational Database Theory • Physical Database Design Relational Database Theory • Physical Database Design – Goals • Improve performance –By minimizing disk I/O • Improving management of the data –By grouping tables that can be managed as a group Relational Database Theory • Physical design decisions are based on: – Use of the data (volume, frequency) – Features supported by the specific RDBMS – Disk storage configuration Relational Database Theory • DBA initially sets up the physical database – Tunes physical parameters on a ongoing basis • As usage patterns change • As new hardware/software options become available Relational Database Theory • Steps in Physical Design Process – Determine which tables can be managed as a group • Many RDBMSs support the concept of a Container (Oracle Tablespace, db space, Access uses the .mdb) –A collection of tables, and indexes Relational Database Theory – Develop a plan for allocating tables to disk devices • Consider parallel disk controllers • Group tables together that are frequently joined • Distribute heavily accessed table to different disk devices –To avoid excessive head movement on one disk Relational Database Theory – Build indexes on table columns, based on frequency of use – Restructure tables if necessary • Fragment large tables into multiple smaller ones • De-normalize tables if appropriate Relational Database Theory Tablespace • Example of a Container Table 1 Table N Table 2 OS File Relational Database Theory • Managing a collection of Tables, Indexes – Purpose of container concept • Relate tables, indexes to physical disk files • Aid in the management of the database –Example: A tablespace can be taken offline, backed up, and restored while the remainder of the database is online Relational Database Theory – Support clustering data from related tables in the same file • So that related data is read with the same I/O request Relational Database Theory • How the RDBMS processes a user request – RDBMS parses, validates, and optimizes the SQL request – Determines disk file in which the table is written • Specific to each RDBMS & OS Relational Database Theory – Initiates I/O request to operating system, if necessary • I/O is requested if file is not currently in buffers – Processes execution plan using data in its buffer Relational Database Theory • Indexes – Index is a separate structure (table) • Points into the data table • Built on one or more columns in the data table Relational Database Theory • Comments on Indexing – An index can be built on any column or combination of columns – An index can be unique or non-unique – An index on the primary key is called the primary index – Most RDBMSs use an internal row id as the pointer to the row – Use of the index is transparent to the user Relational Database Theory • Use of an index – Provides access to a row based on data value(s) – Avoids duplicates – only way – Supports sequential processing on the indexed field – Improves performance Relational Database Theory • Use of an index improves performance on Retrieval – Processing an index is more efficient than processing a table – for reads • Index is usually small, relative to the table –Can be held entirely in memory • The smaller the index value, the more entries per block the more likely the index will be in memory Relational Database Theory • Most RDBMSs use a type of B-Tree Index – B-tree indexes were designed for efficient search of a sorted list – Algorithms exist for managing and maintaining B-trees Relational Database Theory • B-trees were introduced by Bayer (1972) and McCreight. – They are a special m-ary balanced tree used in databases because their structure allows records to be inserted, deleted, and retrieved with guaranteed worst-case performance Relational Database Theory • B-Tree Relational Database Theory • Use of index degrades performance on Updates – Inserting a row is the source of much disk I/O (overhead) • Every index on the table must be searched and updated also Relational Database Theory • Frequently inserting rows leads to index block overflow – Causes much disk I/O as overflow condition is processed Relational Database Theory • Techniques for managing volatile tables (many interests, deletes) – Partially fill index blocks when creating the index – Periodically restructure (Drop, Create) the indexes Relational Database Theory • Indexing: Strengths and Weaknesses – Strengths • Improves performance on retrieval of data • Can be built or dropped at any time • Usage is transparent to the user – Weaknesses • Degrades update performance Relational Database Theory • De-normalization – De-normalization means combining two (or more) tables • Usually done when tables are frequently joined – De-normalization (joining two tables) depends on usage • Depends on how applications and users access the data Relational Database Theory • De-normalization is done to improve performance – Tailors data structures for one specific application’s use – Improves performance of one type of access at expense of others Relational Database Theory • De-normalization Trade-Offs Normalization De-normalization Eliminates update anomalies Improves performance for specific application(s) Minimizes data redundancy Supports simpler logic Provides applicationindependent database design Encourages sharing of data Relational Database Theory • When to De-Normalize – This is EVIL, Do Not Do… – When does de-normalization have minimal impact? • Data is accessed primarily on a read-only basis • Data is accessed primarily by one application Relational Database Theory • When to de-normalize – After database design is done and tables are normalized to 3NF – After clustering related tables in the same logical container – After considering trade-offs and usage of data Relational Database Theory • Alternatives to de-normalization – Physical placement of data • Use of container • Can improve performance without impacting logical design – Selective hardware upgrades • More main memory, expanded storage, cache storage devices Relational Database Theory • Fragmentation – Better alternative to denormalization – Means breaking one table into two (or more) tables • Usually done when one table is very large • Or groups of user almost exclusively access a subset of data in a table Relational Database Theory • Fragmentation can be based on selection or projection – Must be able to reconstruct the original table – by union or join – Primary key column(s) must be included in all vertical fragments • Disadvantage is that the DBA must be aware of all the fragmented tables Relational Database Theory • Physical Design Review Relational Database Theory • Physical Database Design – Goals • Improve performance –By minimizing disk I/O • Improving management of the data –By grouping tables that can be managed as a group Relational Database Theory • Indexing: Strengths and Weaknesses – Strengths • Improves performance on retrieval of data • Can be built or dropped at any time • Usage is transparent to the user – Weaknesses • Degrades update performance Relational Database Theory • Questions?