Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 9: Database Systems Computer Science: An Overview Tenth Edition by J. Glenn Brookshear Modified for UMBC’s CMSC 100, Fall 2008, by Marie desJardins Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 9: Database Systems • • • • • • • 9.1 Database Fundamentals 9.2 The Relational Model 9.3 Object-Oriented Databases 9.4 Maintaining Database Integrity 9.5 Traditional File Structures 9.6 Data Mining 9.7 Social Impact of Database Technology 1-2 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-2 Database A collection of data that is multidimensional in the sense that internal links between its entries make the information accessible from a variety of perspectives 1-3 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-3 Figure 9.1 A file versus a database organization 1-4 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-4 Figure 9.2 The conceptual layers of a database implementation 1-5 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-5 Schemas • Schema: A description of the structure of an entire database, used by database software to maintain the database • Subschema: A description of only that portion of the database pertinent to a particular user’s needs, used to prevent sensitive data from being accessed by unauthorized personnel 1-6 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-6 Database Management Systems • Database Management System (DBMS): A software layer that manipulates a database in response to requests from applications • Distributed Database: A database stored on multiple machines – DBMS will mask this organizational detail from its users • Data independence: The ability to change the organization of a database without changing the application software that uses it 1-7 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-7 Database Models • Database model: A conceptual view of a database – Relational database model – Object-oriented database model 1-8 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-8 Relational Database Model • Relation: A rectangular table – Attribute: A column in the table – Tuple: A row in the table 1-9 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-9 Figure 9.3 A relation containing employee information 1-10 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-10 Relational Design • Avoid multiple concepts within one relation – Can lead to redundant data – Deleting a tuple could also delete necessary but unrelated information 1-11 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-11 Improving a Relational Design • Decomposition: Dividing the columns of a relation into two or more relations, duplicating those columns necessary to maintain relationships – Lossless or nonloss decomposition: A “correct” decomposition that does not lose any information 1-12 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-12 Figure 9.4 A relation containing redundancy 1-13 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-13 Figure 9.5 An employee database consisting of three relations 1-14 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-14 Figure 9.6 Finding the departments in which employee 23Y34 has worked 1-15 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-15 Figure 9.7 A relation and a proposed decomposition 1-16 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-16 Relational Operations • Select: Choose rows • Project: Choose columns • Join: Assemble information from two or more relations 1-17 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-17 Figure 9.8 The SELECT operation 1-18 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-18 Figure 9.9 The PROJECT operation 1-19 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-19 Figure 9.10 The JOIN operation 1-20 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-20 Figure 9.11 Another example of the JOIN operation 1-21 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-21 Figure 9.12 An application of the JOIN operation 1-22 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-22 Maintaining Database Integrity • Transaction: A sequence of operations that must all happen together – Example: transferring money between bank accounts • Transaction log: A non-volatile record of each transaction’s activities, built before the transaction is allowed to execute – Commit point: The point at which a transaction has been recorded in the log – Roll-back: The process of undoing a transaction 1-29 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-29 Maintaining database integrity (continued) • Simultaneous access problems – Incorrect summary problem – Lost update problem • Locking = preventing others from accessing data being used by a transaction – Shared lock: used when reading data – Exclusive lock: used when altering data 1-30 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-30 Sequential Files • Sequential file: A file whose contents can only be read in order – Reader must be able to detect end-of-file (EOF) – Data can be stored in logical records, sorted by a key field • Greatly increases the speed of batch updates 1-31 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-31 Figure 9.14 The structure of a simple employee file implemented as a text file 1-32 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-32 Figure 9.15 A procedure for merging two sequential files 1-33 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-33 Figure 9.16 Applying the merge algorithm (Letters are used to represent entire records. The particular letter indicates the value of the record’s key field.) Indexed Files • Index: A list of key values and the location of their associated records 1-35 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-35 Figure 9.17 Opening an indexed file 1-36 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-36 Hashing • Each record has a key field • The storage space is divided into buckets • A hash function computes a bucket number for each key value • Each record is stored in the bucket corresponding to the hash of its key 1-37 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-37 Figure 9.18 Hashing the key field value 25X3Z to one of 41 buckets 1-38 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-38 Figure 9.19 The rudiments of a hashing system 1-39 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-39 Collisions in Hashing • Collision: The case of two keys hashing to the same bucket – Clustering problem: Poorly designed hashing function can have uneven distribution of keys into buckets – Collision also becomes a problem when there aren’t enough buckets (probability greatly increases as load factor (% of buckets filled) approaches 75%) – Solution: somewhere between 50% and 75% load factor, increase number of buckets and rehash all data 1-40 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-40 Data Mining • Data Mining: The area of computer science that deals with discovering patterns in collections of data • Data warehouse: A static data collection to be mined – Data cube: Data presented from many perspectives to enable mining 1-41 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-41 Data Mining Strategies • • • • • • Class description Class discrimination Cluster analysis Association analysis Outlier analysis Sequential pattern analysis 1-42 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-42 Social Impact of Database Technology • Problems – Massive amounts of personal data are being collected • Often without knowledge or meaningful consent of affected people – Data merging produces new, more invasive information – Errors are widely disseminated and hard to correct • Remedies – Existing legal remedies often difficult to apply – Negative publicity may be more effective 1-43 Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 9-43