Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CS4432: Database Systems II Course Introduction Mohamed Eltabakh 1 What Is a Relational Database Management System ? Database Management System = DBMS Relational DBMS = RDBMS • A collection of files that store the data – But: Files that we do not directly access or read • A big C program written by someone else that accesses and updates those files for you – But: Huge program containing 100s of 1000s of lines 2 Where are RDBMS used ? • Backend for traditional “database” applications • Backend for large Websites • Backend for Web services Airline reservation University registration Bank accounts and transactions Movie Database Hospital System 3 Example of a Traditional Database Application University Suppose we are building a system registration to store the information about: • students • courses • professors • who takes what, who teaches what 4 Can we do it without a DBMS ? Sure we can! Start by storing the data in files: students.txt courses.txt professors.txt Now write C or Java programs to implement specific tasks File System Approach 5 Doing it without a DBMS... • Enroll “Mary Johnson” in “CSE444”: Write a C program to do the following: Read ‘students.txt’ Read ‘courses.txt’ Find&update the record “Mary Johnson” Find&update the record “CSE444” Write “students.txt” Write “courses.txt” 6 What Can Go Wrong Several drawbacks of using file systems •Data redundancy and inconsistency – Multiple file formats, duplication of information in different files – Multiple records formats within the same file – No order enforced between fields •Difficulty in accessing data – Need to write a new program to carry out each new task – No indexes, always scan the entire file •Integrity problems – Modify one file (or field in a file), and not changing the dependent fields or files – Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly 7 What Can Go Wrong • Concurrent access by multiple users – Many users need to access/update the data at the same time (concurrent access) – Uncontrolled concurrent access can lead to inconsistencies • Example: Two people are updating the same bank account at the same time • Security problems – Hard to provide user access to some, but not all, data • Recovery from crashes – While updating the data the system crashes • Maintenance problems – Hard to search for or update a field – Hard to add new fields 8 Enters a DMBS “Two tier database system” Applications Connection (ODBC, JDBC) Data files Database server (someone else’s C program) 9 Functionality of a DBMS The programmer sees SQL, which has two components: • Data Definition Language - DDL Frontend • Data Manipulation Language – DML (CS3431) Behind the scene the DBMS has: • Query Optimizer Backend • Query Engine (CS4432) • Storage Management • Transaction Management (concurrency, recovery) 10 How the Programmer Sees the DBMS Frontend • Start with DDL to create tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) ... • Continue with DML to populate tables: INSERT INTO Students VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . . 11 How the Programmer Sees the DBMS Frontend • Tables: Students: SSN 123-45-6789 234-56-7890 Courses: CID CSE444 CSE541 Takes: Name Charles Dan … Category undergrad grad … Name Databases Operating systems SSN 123-45-6789 123-45-6789 234-56-7890 CID CSE444 CSE444 CSE142 … Quarter fall winter “data independence” = separate logical view from physical implementation 12 What is Hidden ??? CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) ... May create indexes INSERT INTO Students VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . . May update indexes Backend Creating file (Data) Updating catalog tables Reading and updating file from disk Check constraints 13 Queries Frontend • Find all courses that “Mary” takes SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid • We did not specify how to execute • We did not specify how to optimize Should be happy not to do it… 14 What is Hidden ??? Backend Imperative query execution plan: Declarative SQL query Cname SELECT C.name FROM Students S, Takes T, Courses C WHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid cid=cid sid=sid Complex cost model & many alternatives name=“Mary” Students Takes Courses The optimizer chooses the best execution plan for a query 15 Transactions Frontend • Enroll “Mary Johnson” in “CSE444”: BEGIN TRANSACTION; INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’ -- More updates here.... IF everything-went-OK THEN COMMIT; ELSE ROLLBACK If system crashes, the transaction is still either committed or aborted 16 Transactions • A transaction = sequence of statements that either all succeed, or all fail • Basic unit of processing • Transactions have the ACID properties: A = atomicity C = consistency I = independence (Isolation) D = durability 17 Transaction ACID Properties T1 T2 T3 T4 • Each transaction has a Start and End and does many things in between • “A” Atomic: Either the entire transaction is done (all its actions) or none. • “C” Consistency: A transaction must move the DB from one consistent state to another consistent state 18 Transaction ACID Properties (Cont’d) T1 T2 T3 T4 • What about interaction – Can T2 read what T1 is writing? – Can T3 read what T1 is reading? – Can T4 read what T1 wrote? • “I” Isolation: Although running concurrently, they should appear as if they run is a certain serial order 19 Transaction ACID Properties (Cont’d) T1 T2 T3 T4 • If T1 failed & T2 completed This means what? • T1 Rolledback & T2 Committed • “D” Durability: The effect of a committed transaction must be persistent (not lost) 20 Transactions BEGIN TRANSACTION; BEGIN TRANSACTION; BEGIN TRANSACTION; INSERT INTO Takes BEGIN TRANSACTION; INSERT INTO Takes Courses.CID SELECT Students.SSN, INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Courses.name = ‘CSE444’Johnson’ and FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’ WHERE Students.name = ‘Mary Johnson’ and = ‘CSE444’ -- More updates Courses.name here.... = ‘CSE444’ -- More updates Courses.name here.... -- More updates here.... IF everything-went-OK -- More updates here.... IFTHEN everything-went-OK COMMIT; IF everything-went-OK THEN COMMIT; ELSE IF ROLLBACK everything-went-OK THEN COMMIT; ELSE ROLLBACK THEN COMMIT; ELSE ROLLBACK ELSE ROLLBACK Backend Many transactions at the same time (Concurrency Control) Ensure the ACID properties Logging and Recovery Control 21 DBMS Backend Components • We will cover several of these components Chapter 1 in textbook 22 Topics To Be Covered… • File & System Structure Records in blocks, dictionary, buffer management,… • Indexing & Hashing B-Trees, hashing,… • Query Processing Query costs, join strategies,… • Crash Recovery Failures, stable storage,… • Concurrency Control Correctness, locks,… • Transaction Processing Logs, deadlocks,… 23 Database Material at WPI B, C terms CS 3431 D term (alternate) CS 4432 you are here CS525 MQP CS 542 CS 561 Grad. DB Advanced DB DON’T TAKE! DSRG Selected Topics Selected DB Project DB Research at WPI Varies Any time year round 24 Database Systems • The big commercial database vendors: – – – – Oracle IBM (with DB2) bought Informix recently Microsoft (SQL Server) Sybase • Some free database systems (Unix) : – Postgres – Mysql – Predator • In CS4432 we use Oracle & SimpleDB! 25