Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
IMS 4212: Database Implementation Physical Database Implementation—Topics • Denormalization • Partitioning Tables (relations) • Parallel Processing & RAID Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 1 IMS 4212: Database Implementation Denormalization • Denormalizing is the process of reshuffling attributes and sometimes entities to create entities that violate the rules of normalization • We are trading off (again) storage efficiency and anomaly avoidance for better retrieval efficiency • Denormalizing includes: – Storing derived attributes explicitly – Allowing transitive dependencies (violating second, third, or Boyce-Codd normal form) – Merging entities in 1:1 relationships Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 2 IMS 4212: Database Implementation Denormalization (cont.) STUDENT StudentID LastName FirstName TotCrHr TotGP • Derived attributes – Storing derived attributes is one of the most common means of improving processing efficiency – How many tables/row examinations are avoided by storing total grade points and total credit hours with the STUDENT entity? – What new operations must be introduced to keep the data current? – Explicitly storing derived attributes gives rise to new operational business rules to enforce accuracy Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 3 IMS 4212: Database Implementation Denormalization (cont.) • 1:1 Relationships – It may be possible to collapse data from one entity in a 1:1 relationship into the other. OFFICE – Usually the pervasive entity survives Building Room Window – Alternately, both entities may be retained Length Width but the data from one may be copied into LastPaint FacultyID <FK> the other to avoid a table look-up LastName PROFESSOR FacultyID LastName FirstName Phone Dept. <FK> OFFICE Has Building Room Window Length Width LastPaint FacultyID <FK> Dr. Lawrence West, Management Dept., University of Central Florida [email protected] FirstName Phone Dept 4 IMS 4212: Database Implementation Denormalizing (cont.) • 1:M Relationships – You may consider moving or duplicating attributes from the “one” side of a 1:M relationship into the “many” side – This will result in considerable data duplication – Considerations • There should be many records on the “one” side • Frequent access should be directly into the “many” side Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 5 IMS 4212: Database Implementation Denormalization (cont.) • 1:M Relationships (cont.) ENROLLMENT SectionID <FK1> StudentID <FK2> Grade <FK3> LastName FirstName STUDENT Has StudentID LastName FirstName : • Similar technique may be used by collapsing or copying attributes into the associative entity between two entities in a M:M relationship Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 6 IMS 4212: Database Implementation Denormalization (cont.) • The goal of denormalizing is to avoid accessing a (large) table for high frequency critical transactions • Denormalizing usually requires additional business rules to guarantee that data remains accurate in the face of updates Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 7 IMS 4212: Database Implementation Partitioning • Partitioning entities divides one table into many – Horizontal partitioning • Each table has all fields from the original table • Each table has a subset of records – Vertical partitioning • Each table has the PK of the original table • Each table has all records • Each table has a subset of fields – May partition both vertically and horizontally • Very powerful technique with historical data Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 8 IMS 4212: Database Implementation Partitioning (cont.) • Horizontal Partitioning – How many records in the STUDENT table? – How many of them are currently enrolled? – How frequently do we need to access both current and former students in the same query or operation? – It may make sense to partition tables based on a historical context • Active records vs. archived records – May also partition based on geographic considerations – Whole table can be reconstructed using UNION query Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 9 IMS 4212: Database Implementation Partitioning (cont.) STUDENT E_MAIL StudentID E_Mail STUDENT HEALTHREC STUDENT StudentID LastName FirstName MiddleInitial StudentID BloodType Height Weight • Vertical Partitioning – Librarian, Registrar, Athletic Department, and Health Center may all need a different subset of fields from the STUDENT entity – It may make sense to create separate tables containing the necessary attributes for each view – Common PK creates 1:1 cardinality between all tables – Whole logical record can be assembled using SQL when needed – We are actually backing into a supertype/subtype relationship STUDENT ADDRESS StudentID HomeStreet HomeCity HomeState HomeCountry HomeZip LocalStreet LocalCity LocalState LocalCountry LocalZip LocalPhone E_Mail Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 10 IMS 4212: Database Implementation RAID Storage Devices • In conventional drives data is laid down sequentially along a track in the disk – Read/Write head must move along the track to read the data – Each read/write operation must finish before the next can begin – A drive failure can result in loss of all data Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 11 IMS 4212: Database Implementation RAID Storage Devices • RAID is for Redundant Array of Inexpensive Disks – Multiple disks appear as a single logical drive to the computer – May be implemented in hardware or software (OS) • Various RAID levels provide for different levels of performance and redundancy • Most RAID levels enable the rebuilding of entire lost physical drives through parity storage Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 12 IMS 4212: Database Implementation * RAID Storage Devices—Raid 3 • Records are striped across multiple physical devices – Part of each record is laid down across multiple physical drives – Much faster Read/Write time since disk rotation needed to read whole record/block is much shorter – However only one request can be serviced concurrently – Not commonly used in practice • A single parity disk allows reconstruction of data on damaged drives Dr. Lawrence West, Management Dept., University of Central Florida [email protected] * Image source: Wikipedia 13 IMS 4212: Database Implementation * RAID Storage Devices—Raid 4 • Blocks are stored independently on the drives – Block A1 can be serviced just by Drive 0 – Simultaneous requests for Blocks B2 or D3 can also be serviced • A single parity drive enables recovery of lost data • Write operations may be slower—simultaneous write operations to Drives 0-2 must wait on the parity calculation and writing on Drive 3 Dr. Lawrence West, Management Dept., University of Central Florida [email protected] * Image source: Wikipedia 14 IMS 4212: Database Implementation * RAID Storage Devices—Raid 5 • Similar to Raid 4 except that parity storage is distributed across multiple drives – Rotating allocation – Lessens the chance that writes on two drives will wait on parity updates on a single parity drive Dr. Lawrence West, Management Dept., University of Central Florida [email protected] * Image source: Wikipedia 15 IMS 4212: Database Implementation Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 16 P2 P3 One Processor • More and more computers support parallel processing (multiple CPUs on the same computer) 1 ... • Some tasks can be split among multiple 2 ... 3 ... processors 4 ... • In an SQL SELECT query the usual 5 ... method requires the RDBMS to scan 6 ... each record to determine if it matches 7 ... the WHERE clause or JOIN criteria 8 ... 9 ... • In parallel processing part of the whole table is passed to each processor • Availability depends on hardware, OS, and RDBMS P1 Parallel Processing