Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microsoft Jet Database Engine wikipedia , lookup
Serializability wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
Functional Database Model wikipedia , lookup
Concurrency control wikipedia , lookup
Relational algebra wikipedia , lookup
eCommerce Technology 20-751 Databases 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Concepts • Relational model • SQL • DB construction – Normalization – ER diagrams • Transactions • Web support 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Critical Role of Data • Without data, an organization cannot function – especially in eCommerce • Initially, data was prepared for specific applications – payroll data for the payroll system – parts lists for the bill of materials system – sales data for statistical analysis • By 1970, clear that data had common properties • Data for many applications could be stored together in an organized way – database instead of separate collections 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS What is a Database? • No formal definition • A collection of related data allowing: – insert (add new data) – delete (delete existing data) – update (change existing data = delete + insert) – query (retrieve all data having a certain property) • What does “related” mean? 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Database Management System • Based on a data model, e.g. relational, object, hierarchical • Has data definition language (DDL) to identify data • Has data manipulation language (DML) for queries and updates • Separates structure of data from DB implementation • Enforces data structure and content rules • Handles transactions, concurrent operations • Allows backup and recovery from errors • Connects to other software 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS The Relational Model • A set is a collection of unique items { CS, HCII, ISRI, RI, LTI, CALD } Divisions of SCS { CS, HCII, CS, HCII, RI, CS } NOT A SET (repeated elements) • A relation on two sets A, B is a set of pairs of elements, one from A and one from B A = { 46-870, 20-751, 46-749, 20-753, 20-770 } B = { GSIA, SCS } R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) } • Relations can be defined on any number of sets 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS The Relational Model of Data A = { 46-870, 20-751, 46-749, 20-753, 20-770 } B = { GSIA, SCS } R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) } 20-770 46-870 46-749 20-753 20-751 GSIA SCS This is the graph of the relation R 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS The Relational Model of Data A = { 46-870, 20-751, 46-749, 20-753, 20-770 } B = { GSIA, SCS } R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) } CONTAINS ONLY COURSE NUMBERS COURSE SCHOOL 20-770 SCS 46-749 GSIA 20-753 SCS 20-751 SCS 46-870 GSIA CONTAINS ONLY SCHOOL NAMES This is a table of the relation R 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS The Relational Model of Data Relations are not necessarily binary. May involve many sets: COURSE SCHOOL REQ'D ROOM # FACULTY DEPT 20-770 SCS Y 152 64 STEENKISTE CS 46-749 GSIA N 150 62 GOETTLER GSIA 20-753 SCS N 150 57 NYBERG LTI 20-751 SCS Y 152 64 SHAMOS LTI 46-870 GSIA Y 152 64 MUKHOPADHYAY GSIA • Each row is a 7-tuple. Relation on 7 sets. • No implied ordering of either rows or columns. Sorting is irrelevant • Note: bad table design since “DEPT” is an attribute of “FACULTY”, not “COURSE” 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Tables • • • • A relation can be represented as a table One row for each tuple in the relation Easier to draw than a graph Table has implicit order (of rows and columns) – But: a relation has no ordering, either of tuples or attributes • The cardinality C(R) of a relation R is the number of tuples it contains = # of rows in its table • Relational model represents data as a collection of unordered two-dimensional tables 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Keys • Key: an attribute (or minimum set of attributes) that uniquely defines a tuple – In the example relation, “Course” is a key • A relation may have more than one key. • A set of attributes that can serve as a key is a candidate key. • One is chosen as the primary key. • Keys are used to reference (retrieve) tuples. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Foreign Keys • A key from one relation that is an attribute of another relation is a foreign key. • If we had a “Faculty” relation, then “Faculty” would be a foreign key in the “Courses” relation. • Foreign keys connect relations together. FOREIGN KEY: PRIMARY KEY PRIMARY KEY COURSE SCHOOL REQD RM # FACULTY 20-770 SCS Y 152 64 STEENKISTE FACULTY GOETTLER 46-749 GSIA N 150 62 GOETTLER MUKHOPADHYAY GSIA 20-753 SCS N 150 57 NYBERG NYBERG LTI 20-751 SCS Y 152 64 SHAMOS SHAMOS LTI 46-870 GSIA Y 152 64 MUKHOPADHYAY STEENKISTE CS 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 DEPT GSIA COPYRIGHT © 2003 MICHAEL I. SHAMOS Operations on Relations • Projection List specific attributes L (columns) of R, written L(R) E.g. show course number and room Course, Room(Courses) Courses COURSE SCHOOL REQD RM 20-770 SCS Y 152 # FACULTY 64 STEENKISTE DEPT CS COURSE RM 20-770 152 46-749 GSIA N 150 62 GOETTLER GSIA 46-749 150 20-753 SCS N 150 57 NYBERG LTI 20-753 150 20-751 SCS Y 152 64 SHAMOS LTI 20-751 152 46-870 GSIA Y 152 64 MUKHOPADHYAY GSIA 46-870 152 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Operations on Relations • Selection (extract horizontal slices) – List all tuples of relation R whose attributes satisfy condition C, written C(R) – E.g. show all tuples with Room = 152, Room=152(R) COURSE SCHOOL REQD RM 20-770 SCS Y 152 # FACULTY 64 STEENKISTE DEPT CS 46-749 GSIA N 150 64 GOETTLER GSIA 20-751 SCS Y 152 64 SHAMOS LTI • Projection & Selection are unary (1-table) operations 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Structured Query Language (SQL) • A data manipulation language for manipulating relational databases • SELECT queries the database • UPDATE modifies relations • DELETE removes tuples Syntax of the SQL SELECT command: SELECT { attributes } FROM { table } WHERE { attribute-conditions }; 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Structured Query Language (SQL) YIELDS DISTINCT TUPLES SINCE CourseNo IS A KEY • Projection – SQL: SELECT CourseNo, Room FROM Courses; – SQL: SELECT DISTINCT Room FROM Courses; MUST ASK FOR DISTINCT TUPLES SINCE Room IS NOT A KEY • Selection – SELECT * FROM Courses WHERE Room= “152”; – Give a table of all courses that meet in 152 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Join • The natural join A * B consists of tuples with matching attributes (names & values) in A and B • Natural join is a way of obtaining information across tables 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Natural Join A * B • Attribute names and values must match Statistics: Geography: Detroit MI = 1027974 * City State Pop % For Seattle WA 532900 13.1 3.4 City Seattle Atlanta Detroit Area 84 130 139 Elevation Radio 14 40 1050 27 601 59 City Seattle State Pop % For WA 532900 13.1 Area 84 Elevation 14 Radio 40 Detroit MI 139 601 59 1027974 3.4 • Also called “inner join”: Statistics * Geography • Cartesian product and join are binary operations 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Joins in SQL • SELECT City, State, Radio FROM Statistics INNER JOIN Geography ON Statistics.City = Geography.City Statistics: Geography: City Seattle Atlanta Detroit City State Pop % For Seattle WA 532900 13.1 Detroit MI Result of Query: 1027974 3.4 City Seattle State WA Detroit MI 20-751 ECOMMERCE TECHNOLOGY Area 84 130 139 Elevation Radio 14 40 1050 27 601 59 Radio 40 59 SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Other SQL Constructs • ORDERBY (sorting) – SELECT Company, OrderNumber FROM Orders ORDER BY Company; • BETWEEN – SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen‘; • ALTER TABLE (changes table structure) • Functions – SUM() – COUNT() – MAX() 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Database Constraints • Domain (data validity) constraints – All values in a column must be from the same domain – Example: all salaries are positive numeric dollar amounts. “Monthly” is invalid. • Entity Integrity – Every entity must have a unique primary key. (Otherwise, can’t access the entity) • Referential Integrity – Every foreign key value in a relation must match a primary key in the foreign relation table 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Functional Dependency • Attribute B is functionally dependent on attribute A if the value of A uniquely determines B – One-to-one relationship: two functional dependencies: A depends on B; B depends on A – Many-to-one relationship: one functional dependency: B depends on A – Many-to-many relationship: no dependencies: neither A nor B depends on the other • Functional dependencies are constraints between attributes or sets of attributes. They must be maintained or error or inconsistency will result. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Normalization • A relation is well-structured if it is non-redundant and allows INSERT, MODIFY and DELETE without error or inconsistency. • Normalization assists in maintaining functional dependencies and preventing errors and inconsistencies. • DELETE anomaly: Student Email Course Smith smithj@andrew 20-751 Room 152 Smith Jones 152 150 smithj@andrew 20-753 jonesj@cs 46-870 • Deleting “Jones” removes all information about course 46-870 (namely that its room is 150) • In the information is in another table, it shouldn’t be here also. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Normalization • MODIFY anomaly: Student Email Course Smith smithj@andrew 20-751 Room 152 Smith Jones 152 150 smithj@andrew 20-753 jonesj@cs 46-870 • Suppose Smith’s email address changes. Every line in the table corresponding to Smith must be changed or data will be inconsistent. • An attribute unique to a key should be entered only once in the database. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Normalization • Restructuring to produce smaller, well-structured equivalent relations, reduce data replication • First Normal Form. Make all attributes atomic. No multiple values. Name Phone Carbonell 83064, 87279 Dept CS, LTI Reddy CS, Robotics WeH 82597, 87170 Bldg WeH, NSH MULTIPLE VALUES FIRST NORMAL FORM: MULTIPLE VALUES Name Phone Carbonell 83064 Dept CS Bldg WeH Carbonell 87279 Reddy 82597 Reddy 87170 LTI CS Robotics NSH WeH WeH 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Second Normal Form • Eliminate partial functional dependencies. Every non-key attribute must depend on all key attributes (or redundancy can result). Capital KEY IS (City, State) NOT IN 2NF: 2NF: DECOMPOSE INTO TWO TABLES City Philadelphia State Capital PA Harrisburg Pittsburgh Detroit PA MI City Philadelphia State PA Pittsburgh Detroit PA MI Harrisburg Ann Arbor City Pop. 1478002 1336449 1027974 City Pop. 1478002 DEPENDS ON State ONLY, NOT CITY 1336449 1027974 State Capital PA Harrisburg MI Ann Arbor There are many other normal forms and normalization rules 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Entity-Relationship (ER) Diagrams • Must specify: – Entities (things to be represented in the database) – Attributes (properties of entities) – Relationships (relations among entities) • These can be modeled by entity-relationship diagrams • The diagrams are used as a guide to designing the database 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Entity-Relationship (ER) Diagrams • Entity types – Entity type: Store – Entities: Downtown Store, Squirrel Hill Store, Oakland Store • Relationships between entity types: • This is the “Has” relationship • Direction of arrow is important (“Branch has Staff,” not “Staff Has Branch”) EXAMPLE FROM CONNOLLY & BEGG 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Entity-Relationship (ER) Diagrams • Relationships need not be binary: • This is the “Arranges” relationship; it can be though of as a 4-tuple (Solicitor, Bid, Buyer, Institution) EXAMPLE FROM CONNOLLY & BEGG 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Entity-Relationship (ER) Diagrams EXAMPLE FROM CONNOLLY & BEGG 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Entity-Relationship (ER) Diagrams EXAMPLE FROM CONNOLLY & BEGG 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Web Database Connectivity JDBC = Java Database Connectivity SQLJ = Java-Embedded SQL SOURCE: CONNOLLY & BEGG Distributed Databases • Databases in which data is stored in more than one location but appears local to the user – Replicated: multiple copies of database – Partitioned: data is split among locations • Fragmentation – Information about fragments is stored in a distributed data catalog (DDC) – Horizontal v. vertical fragmentation 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Distributed Databases • Advantages – Reduced load on central DB – Lower cost (data spread among small machines) – Reliability (machine failure is not fatal) – Fast access to local data – Ease of growth • Disadvantages – Complexity. Difficult to maintain consistency – Security (many access points) – Telecommunications required 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Distributed Databases • Products – PeerDirect • Issues – Updating of information in a distributed database is a form of transaction processing 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS What is a Transaction? • An action requiring a series of steps and database updates. • Transactions are the basis of Ecommerce. • Transactions may be distributed. Steps processed on different computers. • Transactions may fail. One or more steps may be unsuccessful. • Transaction systems must be recoverable. Data and “state” must be restored after failure. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS ATM Transaction Database ACCT 536-0178 536-9112 PIN AEZ22 71BZZ BALANCE $ 1013.22 $ 25561.18 557-0308 82A6Z $ 599-8133 632-0012 K8LL0 R3TTP $ 1622.77 $ 12016.45 ACCOUNT MASTER STOLEN CARDS 542-1613 599-0028 3278.08 RECENT ACTIVITY POST I NG LOG ACCT 557-0308 536-9112 RECENT $ 150.00 $ 6700.00 542-0061 $ 240.00 ACCT 536-0178 536-0178 599-8133 610-0518 $ 5500.00 $ 50.00 107-0003 599-8133 20-751 ECOMMERCE TECHNOLOGY STOLEN 421-2254 536-9112 613-4299 667-0033 ATM TIME UID AMOUNT 543 10:08:32 0005148 -200.00 543 10:09:21 0005154 -200.00 391 10:10:06 0005167 422 10:11:15 0005174 SUMMER 2003 300.00 -75.00 COPYRIGHT © 2003 MICHAEL I. SHAMOS ATM Withdrawal 1 2 3 4 5 6 7 8 9 10 11 Check STOLEN Check PIN Check RECENT v. BALANCE Check ATM reserve Update RECENT Update BALANCE Write to Log Update ATM reserve Tell ATM to dispense money Check dispensing status Make updates permanent 20-751 ECOMMERCE TECHNOLOGY If card is stolen, ABORT If wrong, retry 3 times, ABORT Too much activity, ABORT If not enough money, ABORT Indicate new activity Debit bank account Record transaction Debit ATM balance Pay the man If failed, ABORT COMMIT the transaction (Think: “to memory”) SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS ACID • Four minimum requirements of a transaction T in a transaction system: • Atomic. T executes completely or not at all. • Consistent. T preserves database consistency and integrity. • Isolated. T executes as if it were running alone. Not affected by other concurrent transactions. • Durable. T’s results preserved during failure. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Atomicity • • • • Transaction: John pays Mary $100. Take $100 out of John’s account. Add $100 to Mary’s account Problems: – John or Mary might not have an account – John might not have $100 – System might fail after subtracting $100 from John • If failure occurs, must undo partial results • “Commit”: successful recording of a transaction • “Abort”: failure of a transaction. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Consistency • Maintain database constraints – – – – data validity unique primary keys referential integrity conservation conditions (debits = credits, total cash = sub of cash in all accounts, etc.) 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Isolation • Two transactions T1, T2 are interleaved if some steps of one are performed after the other starts but before it completes. T1 has steps A B C D E F; T2 has steps P Q R S A B P C D Q R S E F is an interleaved schedule. P Q R S A B C D E F is not interleaved. • A sequence of transactions is isolated if their steps can be interleaved without affecting the result. Transactions are blind to simultaneous execution. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Durability • Results must survive failure. • Logging. Maintaining a record of all data updates so databases can be repaired if failure occurs. • Updates must be logged before they are performed. If failure occurs, transaction can complete from failure point. • If abort is necessary, can undo logged transactions. • Without logging, can’t recover from some types of failures. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Simultaneous Transactions • If all TP were done by one single-threaded process, it would be easy. Just execute one step at a time. • With just two threads (or processes) it’s complicated. 2 transactions T1, T2: READ A; A = A+1; WRITE A; A in DB 5 5 T1 STEP READ A A in T1 5 T2 STEP A in T2 5 5 6 A=A+1 5 6 READ A 5 5 6 WRITE A 6 6 A=A+1 WRITE A 6 6 6 • Value of A is 6, but it should be 7! 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Locking • A solution is LOCKING. Associate a variable with one process at a time. LOCK the others out. • LOCK A; READ A; A = A+1; WRITE A; UNLOCK A; • If T1 starts first, it locks A. • When T2 tries to lock A, it can’t. It has to wait. • T1 finishes completely before T2 can lock A. • After T1 finishes, A = 6 • After T2 finishes, A = 7, the correct value • Locking achieves atomicity 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Deadlock T1: LOCK A; LOCK B; B=A+B; UNLOCK A; UNLOCK B; T2: LOCK B; LOCK A; B=A - B; UNLOCK B; UNLOCK A; A in DB B in DB 17 3 17 3 T1 STEP LOCK A T2 STEP 17 3 17 3 17 3 LOCK B CAN'T EXECUTE LOCK B LOCK A CAN'T EXECUTE This is deadlock. Neither transaction can complete. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Ways to Eliminate Deadlock 1. Require each transaction to request all locks at the same time. System either grants them all or none. • Problem: very restrictive. Transactions cannot be interleaved. Essentially serial execution. 2. Assign an ordering to the variables: A=1; B=2; C=3 … Require transactions to request locks in that order. 3. Do nothing. Periodically check for deadlock. If it exists, cancel out a transaction. 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS Q&A 20-751 ECOMMERCE TECHNOLOGY SUMMER 2003 COPYRIGHT © 2003 MICHAEL I. SHAMOS