Databases and Database Programming
Robert M. Dondero, Ph.D.
Princeton University

Objectives
You will learn about:
- Databases and database management systems
- Database design
- Transactions
- SQL and MySQL
- Database programming in Java and Python
Note: Comprehensive coverage is impossible! Please supplement with the required reading.

Part 1: Databases and Database Management Systems

DB and DBMS
Some definitions...
Database (DB)
- A structured collection of data
- An abstract view of a file or collection of files stored on a disk
Database management system (DBMS)
- Software that maintains a database
- Usually, a server: it listens on a known host at a known port, and clients contact the server to perform queries and updates

Motivation for DB and DBMS
Question: Why not simply use files?
Answer: Centralized control. Database administrators (DBAs) control the data. See An Introduction to Database Systems (C. J. Date).
A good DBMS used by good DBAs can:
- Reduce redundancy
- Avoid inconsistencies
- Facilitate data sharing
- Enforce standards
- Apply security restrictions
- Maintain integrity
- Balance conflicting requirements
- Ensure safety (backups)

Types of DBs
In historical order:
- Navigational DBs
- Relational DBs

Navigational DBs
- Data are linked into a tree or network structure
- The user/program is given a "root" node
- The user/program follows links to the desired data

Example Navigational DB
[Diagram: record types BOOKS (isbn, title, quantity), AUTHORS (author), ORDERS (custid, quantity), CUSTOMERS (custname, street, zipcode), and ZIPCODES (city, state), connected by links]

Example Nav DB Query
Which customers purchased the book whose ISBN is 123?
Following links from the BOOKS record for ISBN 123:
BOOKS (isbn, title, quantity): 123 | The Practice of Programming | 50
  -> ORDERS (custid, quantity): 222 | 20  -> CUSTOMERS (custname, street, zipcode): Harvard | 1256 Mass Ave | 02138
  -> ORDERS (custid, quantity): 111 | 30  -> CUSTOMERS (custname, street, zipcode): Princeton | 114 Nassau St | 08540

Navigational DB Limitations
In the example:
- Good: Given a book, it is easy to find its customers
- Bad: Given a customer, it is hard to find its books
In general:
- Queries are biased
- The DB designer must anticipate queries to create appropriate links

Relational DBs
Edgar Codd, 1968-1970
- Applied some mathematics
- Eliminated all links!!!
Informally: a database consists of tables. Example:

BOOKS
isbn | title                       | quantity
123  | The Practice of Programming | 50
234  | The C Programming Language  | 100
345  | Algorithms in C             | 150

AUTHORS
isbn | author
123  | Kernighan
123  | Pike
234  | Kernighan
234  | Ritchie
345  | Sedgewick

ORDERS
isbn | custid | quantity
123  | 222    | 20
345  | 222    | 100
123  | 111    | 30

CUSTOMERS
custid | custname  | street        | zipcode
111    | Princeton | 114 Nassau St | 08540
222    | Harvard   | 1256 Mass Ave | 02138
333    | MIT       | 292 Main St   | 02142

ZIPCODES
zipcode | city      | state
08540   | Princeton | NJ
02138   | Cambridge | MA
02142   | Cambridge | MA

Relational DBs
- No links, so queries are unbiased
- However, the user can create links (indices) based upon anticipated usage patterns, and the DBMS can create links (indices) based upon observed usage patterns

Popular Relational DBMSs
Today relational DBs are the most popular kind. Popular relational DBMSs:
- Oracle (Oracle, commercial)
- SQL Server (Microsoft, commercial)
- Informix (IBM, commercial)
- MySQL (Sun/Oracle, free)
- SQLite (free)
Hereafter, we will limit our discussion to relational DBs and DBMSs, and use MySQL.

Part 2: Database Design

Example Database DB0
Design:
BOOKS: isbn, title, authors, quantity
ORDERS: isbn, custid, custname, street, city, state, zipcode, quantity
Example data:

BOOKS
isbn | title                       | authors           | quantity
123  | The Practice of Programming | Kernighan,Pike    | 500
234  | The C Programming Language  | Kernighan,Ritchie | 800
345  | Algorithms in C             | Sedgewick         | 650

ORDERS
isbn | custid | custname  | street        | city      | state | zipcode | quantity
123  | 222    | Harvard   | 1256 Mass Ave | Cambridge | MA    | 02138   | 20
345  | 222    | Harvard   | 1256 Mass Ave | Cambridge | MA    | 02138   | 100
123  | 111    | Princeton | 114 Nassau St | Princeton | NJ    | 08540   | 30

Normal Forms
Observation: The design seems wrong. Note the redundancy: Harvard's name and address are stored once per order.
Can we be more formal? Are there rules that we can apply to determine that the design is wrong, and to make it right?
Database researchers have developed theories to help DBAs develop "right" designs, or consciously decide to use a "wrong" design. In particular, they have proposed normal forms.

First Normal Form
Informally...
Def: A table is in first normal form iff each column contains only atomic values.

DB0 Not in First Normal Form
The authors column of BOOKS holds lists of names (for example, "Kernighan,Pike"), which are not atomic values.

Example Database DB1
Design:
BOOKS: isbn, title, quantity
AUTHORS: isbn, author
ORDERS: isbn, custid, custname, street, city, state, zipcode, quantity
Example data:

BOOKS
isbn | title                       | quantity
123  | The Practice of Programming | 500
234  | The C Programming Language  | 800
345  | Algorithms in C             | 650

AUTHORS
isbn | author
123  | Kernighan
123  | Pike
234  | Kernighan
234  | Ritchie
345  | Sedgewick

ORDERS: identical to the DB0 ORDERS table.

Candidate Keys
Suppose a table has columns {C1, …, Cn}.
Def: A set of columns {Ci, …, Ck} is a candidate key of that table if it satisfies two properties:
- Uniqueness: At any given time, no two distinct rows of the table have the same value for {Ci, …, Ck}.
- Minimality: None of {Ci, …, Ck} can be discarded without destroying the uniqueness property.

Candidate Keys in DB1
- BOOKS: {isbn}
- AUTHORS: {isbn, author}
- ORDERS: {isbn, custid}

Primary Keys
Def: We choose one candidate key to be the primary key; the others are called alternate keys.
In DB1, each table has only one candidate key, so it's obvious which to choose.

Functional Dependence
Def: A column C2 of a table is functionally dependent on a column C1 iff, for each row in the table, the value of C1 determines the value of C2; that is, any two rows with the same C1 value have the same C2 value.

DB1 Functional Dependencies
- BOOKS: isbn -> title, quantity
- AUTHORS: no non-key columns (the key is {isbn, author})
- ORDERS: {isbn, custid} -> quantity; custid -> custname, street, zipcode (and hence city, state); zipcode -> city, state

Second Normal Form
Informally...
A table is in second normal form iff:
- It is in first normal form, and
- Every non-key column is fully dependent on the primary key

DB1 Not in Second Normal Form
The primary key of ORDERS is {isbn, custid}, but custname, street, zipcode, city, and state depend on custid alone, that is, on only part of the primary key.

Example Database DB2
Design:
BOOKS: isbn, title, quantity
AUTHORS: isbn, author
CUSTOMERS: custid, custname, street, city, state, zipcode
ORDERS: isbn, custid, quantity
Example data:

BOOKS
isbn | title                       | quantity
123  | The Practice of Programming | 500
234  | The C Programming Language  | 800
345  | Algorithms in C             | 650

AUTHORS
isbn | author
123  | Kernighan
123  | Pike
234  | Kernighan
234  | Ritchie
345  | Sedgewick

ORDERS
isbn | custid | quantity
123  | 222    | 20
345  | 222    | 100
123  | 111    | 30

CUSTOMERS
custid | custname  | street        | city      | state | zipcode
111    | Princeton | 114 Nassau St | Princeton | NJ    | 08540
222    | Harvard   | 1256 Mass Ave | Cambridge | MA    | 02138
333    | MIT       | 292 Main St   | Cambridge | MA    | 02142

DB2 in Second Normal Form
- BOOKS: isbn -> title, quantity
- AUTHORS: no non-key columns (the key is {isbn, author})
- ORDERS: {isbn, custid} -> quantity
- CUSTOMERS: custid -> custname, street, zipcode, city, state; zipcode -> city, state

Third Normal Form
Informally...
A table is in third normal form iff:
- It is in second normal form, and
- Every non-key column is non-transitively dependent on the primary key

DB2 Not in Third Normal Form
In CUSTOMERS, city and state depend on zipcode, which in turn depends on the primary key custid. So city and state depend on custid only transitively.

Example Database DB3
Design:
BOOKS: isbn, title, quantity
AUTHORS: isbn, author
CUSTOMERS: custid, custname, street, zipcode
ZIPCODES: zipcode, city, state
ORDERS: isbn, custid, quantity
Example data:

BOOKS
isbn | title                       | quantity
123  | The Practice of Programming | 50
234  | The C Programming Language  | 100
345  | Algorithms in C             | 150

AUTHORS
isbn | author
123  | Kernighan
123  | Pike
234  | Kernighan
234  | Ritchie
345  | Sedgewick

ORDERS
isbn | custid | quantity
123  | 222    | 20
345  | 222    | 100
123  | 111    | 30

CUSTOMERS
custid | custname  | street        | zipcode
111    | Princeton | 114 Nassau St | 08540
222    | Harvard   | 1256 Mass Ave | 02138
333    | MIT       | 292 Main St   | 02142

ZIPCODES
zipcode | city      | state
08540   | Princeton | NJ
02138   | Cambridge | MA
02142   | Cambridge | MA

DB3 in Third Normal Form
- BOOKS: isbn -> title, quantity
- AUTHORS: no non-key columns (the key is {isbn, author})
- ORDERS: {isbn, custid} -> quantity
- CUSTOMERS: custid -> custname, street, zipcode
- ZIPCODES: zipcode -> city, state

Database Design Wrap-Up
Some additional points...
Database designers routinely violate normal forms. But a good designer does so:
- Only with a purpose, typically efficiency
- Knowing the consequences
A DBMS can enforce additional "consistency" constraints, e.g.:
- Primary key values cannot be null
- Primary key values must be unique within a table
- Within a table, "foreign" key values must correspond to primary key values in another table
There is a substantial mathematical theory of relational database design:
- More precise definitions of normal forms
- Several additional normal forms
- Relational algebra, relational calculus
- Foreign keys, integrity rules
- ...
See An Introduction to Database Systems (C. J. Date).

Part 3: Transactions

Motivation for Transactions
- Problem 1: Recovery. The DBMS must recover from HW/SW failures.
- Problem 2: Concurrency. The DBMS must handle updates from multiple concurrent clients.
Both problems are solved by transactions.

Problem 1: Recovery
The DBMS must preserve DB consistency in the presence of HW/SW failures.
Example: Customer 111 has purchased 1 copy of book 123.
- Must change BOOKS: subtract 1 from the quantity of the appropriate row
- Must change ORDERS: add 1 to the quantity of the appropriate row
Time line:
1. Decrement quantity of the BOOKS row whose isbn is 123
2. HW/SW error
3. (Never performed) Increment quantity of the ORDERS row whose isbn is 123 and custid is 111
The DB becomes inconsistent!!!

Solution 1: Transactions
Transaction:
- A logical unit of work
- A sequence of operations that the DBMS performs atomically: it executes all or none of the operations
- Transforms the DB from one consistent state to another, without necessarily preserving consistency at intermediate points
Pattern:
(1) Execute a BEGIN statement to begin the transaction
(2) Perform update(s)
(3) Execute either a COMMIT statement to commit the updates to the DB and end the transaction, or a ROLLBACK statement to discard the updates and end the transaction
Time line with a transaction:
1. BEGIN
2. Decrement quantity of the BOOKS row whose isbn is 123
3. HW/SW error
4. (Never performed) Increment quantity of the ORDERS row whose isbn is 123 and custid is 111
5. (Never performed) COMMIT
The DBMS rolls back the first update, so the DB remains consistent.

Problem 2: Concurrency
Multiple processes may access the DB concurrently. The DBMS must maintain DB consistency in the presence of concurrent updates.
Time line:
1. Process A: Fetch row R (quan=100)
2. Process B: Fetch row R (quan=100)
3. Process A: quan += 5; Update row R (quan=105)
4. Process B: quan += 5; Update row R (quan=105)
Process A's update is lost; the DB is inconsistent.

Solution 2: Transactions
A transaction is:
- A recovery mechanism (as previously described), and
- A locking mechanism
Oversimplification: a process can ask the DBMS to lock the DB rows that are involved in an active transaction; other processes cannot access locked DB rows.
Time line:
1. Process A: BEGIN
2. Process A: Fetch row R (quan=100); acquire lock on R
3. Process B: BEGIN
4. Process A: quan += 5; Update row R (quan=105)
5. Process A: COMMIT; release lock on R
6. Process B: Fetch row R (quan=105); acquire lock on R
7. Process B: quan += 5; Update row R (quan=110)
8. Process B: COMMIT; release lock on R
Process B is blocked until A commits.

The ACID Test
A DBMS transaction mechanism must pass the ACID test:
- Atomicity: each transaction is atomic
- Consistency: each transaction takes the DB from one consistent state to another
- Isolation: other transactions cannot access the data that has been modified by the current transaction
- Durability: no data is lost because of system failures

Part 4: SQL and MySQL

SQL and MySQL
SQL: Structured Query Language
- Initially developed by IBM
- Has been standardized (ISO/IEC 9075-1:2008)
- Now the de facto standard for communicating with DBMSs
MySQL:
- A popular free DBMS
- Uses SQL (no surprise!)
- Extends SQL with additional statements, as DBMSs typically do

Creating a MySQL Database
The CS Dept runs a MySQL DBMS. A DB exists for the COS 333 course:
Host: publicdb.cs.princeton.edu
Port: 3306
Database: cos333
User: cos333
Password: cos333
Use that DB for COS 333 programming assignments. Please don't change it!!!
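The book-purchase example from Part 3 (decrement BOOKS, then increment ORDERS) can be tried without the CS Dept server. The following is a minimal sketch that assumes Python's built-in sqlite3 module in place of MySQL (SQLite appears in the list of popular DBMSs); the tables are trimmed-down versions of the DB3 BOOKS and ORDERS tables. The helper names make_db, purchase, and quantities and the fail_midway flag are hypothetical; the raised exception plays the role of the HW/SW error between the two updates.

```python
import sqlite3


def make_db():
    """Build an in-memory BOOKS/ORDERS pair (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, quantity INTEGER)")
    conn.execute("CREATE TABLE orders (isbn TEXT, custid TEXT, quantity INTEGER, "
                 "PRIMARY KEY (isbn, custid))")
    conn.execute("INSERT INTO books VALUES ('123', 'The Practice of Programming', 50)")
    conn.execute("INSERT INTO orders VALUES ('123', '111', 30)")
    conn.commit()
    return conn


def purchase(conn, isbn, custid, fail_midway=False):
    """Hypothetical helper: move one copy of a book from BOOKS to ORDERS atomically."""
    try:
        # sqlite3 implicitly issues BEGIN before this first UPDATE
        conn.execute("UPDATE books SET quantity = quantity - 1 WHERE isbn = ?", (isbn,))
        if fail_midway:
            # stands in for the HW/SW error between the two updates
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE orders SET quantity = quantity + 1 "
                     "WHERE isbn = ? AND custid = ?", (isbn, custid))
        conn.commit()    # COMMIT: both updates become permanent together
    except Exception:
        conn.rollback()  # ROLLBACK: discard the partial update
        raise


def quantities(conn):
    """Return (BOOKS quantity for isbn 123, ORDERS quantity for isbn 123 / custid 111)."""
    book = conn.execute("SELECT quantity FROM books WHERE isbn = '123'").fetchone()[0]
    order = conn.execute("SELECT quantity FROM orders "
                         "WHERE isbn = '123' AND custid = '111'").fetchone()[0]
    return book, order


if __name__ == "__main__":
    conn = make_db()
    try:
        purchase(conn, "123", "111", fail_midway=True)
    except RuntimeError as err:
        print("purchase failed:", err)
    print(quantities(conn))  # (50, 30): the BOOKS decrement was rolled back
    purchase(conn, "123", "111")
    print(quantities(conn))  # (49, 31): both updates committed
```

In this sketch, sqlite3's default (legacy) transaction handling opens a transaction implicitly before the first UPDATE, so conn.commit() and conn.rollback() play the roles of the COMMIT and ROLLBACK statements; a real MySQL program would need an InnoDB table for the rollback to take effect.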
You may need to create another CS Dept database:
- To help you learn, or
- For your COS 333 project
If so:
- Complete and submit the form at https://csguide.cs.princeton.edu/requests/db
- Wait for e-mail from the CS system administrators
- Change your password: https://csguide.cs.princeton.edu/db/mysql

Using the MySQL Client
MySQL command-line client. From a penguins shell prompt:
  mysql dbname --host=publicdb --user=username --port=3306 --password
Or abbreviated:
  mysql dbname -h publicdb -u username -p
Type your password when prompted, then type (My)SQL statements at the "mysql> " prompt.

(My)SQL Statements
- Keywords are case insensitive
- Each statement ends with a semicolon
- CREATE, DROP, ALTER, SELECT, INSERT, DELETE, UPDATE, COMMIT, and ROLLBACK are standard SQL statements; standard SQL spells BEGIN as START TRANSACTION, which MySQL also accepts
- The others are specific to MySQL...

HELP statement;
  HELP SELECT;
  HELP CREATE;
  HELP CREATE TABLE;
QUIT;
SHOW DATABASES;
SHOW TABLES;

CREATE TABLE [IF NOT EXISTS] table (column datatype, …) ENGINE=engine CHARSET=latin1;
  CREATE TABLE books (isbn VARCHAR(20), title VARCHAR(255), quantity INT) ENGINE=InnoDB CHARSET=latin1;

(My)SQL Data Types
Some (My)SQL data types:
Data Type  | Bytes              | Notes
CHAR(n)    | n (<= 255)         |
VARCHAR(n) | Up to n (<= 255)   | Common
TEXT(n)    | Up to n (<= 65535) | Good for searches on limited-length prefixes
BLOB(n)    | Up to n (<= 65535) | Binary Large Object
INT        | 4                  | Can be UNSIGNED
DOUBLE     | 8                  |
DATE       |                    | YYYY-MM-DD
TIME       |                    | hh:mm:ss

(My)SQL Engines
Some (My)SQL storage engines:
- MyISAM: non-transactional, but fast
- InnoDB: transactional, but slower
- …
We'll use InnoDB.

(My)SQL Statements
DESCRIBE table;
  DESCRIBE books;
DROP TABLE [IF EXISTS] table;
  DROP TABLE books;
SOURCE file;
  SOURCE books.sql;
Incidentally: use this Linux (not MySQL) command to export a database to a text file:
  mysqldump dbName -h publicdb -u username -p > somefile

ALTER TABLE table specification [, specification] …;
  ALTER TABLE books ADD price DOUBLE FIRST;
  ALTER TABLE books ADD pages INT AFTER quantity;
  ALTER TABLE books DROP price, DROP pages;
ALTER TABLE table ADD INDEX (column);
  ALTER TABLE books ADD INDEX (isbn);
- Enables fast search on the isbn column
- Matters for large tables (see Assignments 2, …)

SELECT expr, … FROM table, … [WHERE condition] [ORDER BY column [ASC | DESC]];
  SELECT * FROM books;
  SELECT * FROM authors;
  SELECT * FROM customers;
  SELECT * FROM orders;
  SELECT * FROM zipcodes;
  SELECT isbn, title FROM books;
  SELECT * FROM books ORDER BY quantity DESC;
  SELECT * FROM books WHERE quantity = 650;
  SELECT * FROM books WHERE quantity >= 650;
  SELECT * FROM orders;
  SELECT * FROM orders WHERE isbn = 123 AND custid = 222;
  SELECT * FROM orders WHERE isbn = 123 OR custid = 222;
  SELECT * FROM books WHERE title LIKE 'The%';
  SELECT * FROM books WHERE title LIKE '%of%';
  SELECT * FROM books WHERE title LIKE 't_e%';
In LIKE patterns, % matches any sequence of characters and _ matches exactly one character.

  SELECT books.title, authors.author FROM books, authors;
- Cartesian product: 15 rows (3 books x 5 authors rows), but only 5 are meaningful!
  SELECT books.title, authors.author FROM books, authors WHERE books.isbn = authors.isbn;
  SELECT title, author FROM books, authors WHERE books.isbn = authors.isbn;
- Column names need qualifying only when they are ambiguous (isbn appears in both tables; title and author do not)
  SELECT custname, title, orders.quantity FROM books, customers, orders WHERE books.isbn = orders.isbn AND orders.custid = customers.custid;

INSERT INTO table (column, …) VALUES (expr, …);
  INSERT INTO books (isbn, title, quantity) VALUES ('456', 'Core Java', 120);
DELETE FROM table [WHERE condition];
  DELETE FROM books;   -- Be careful!!!
  DELETE FROM books WHERE title LIKE 'The%';
UPDATE table SET column1=expr1 [, column2=expr2 …] [WHERE condition];
  UPDATE books SET quantity=60 WHERE isbn=123;
  UPDATE books SET quantity=quantity+1 WHERE isbn=123;

Transactions in (My)SQL
BEGIN;     Begins a transaction
COMMIT;    Commits and ends the active transaction
ROLLBACK;  Rolls back and ends the active transaction

Transactions in MySQL
Run these four client sessions in turn, checking the quantity of book 123 after each:
Session 1: UPDATE books SET quantity = 11111 WHERE isbn = 123; QUIT;
Session 2: BEGIN; UPDATE books SET quantity = 22222 WHERE isbn = 123; QUIT;
Session 3: BEGIN; UPDATE books SET quantity = 33333 WHERE isbn = 123; COMMIT; QUIT;
Session 4: BEGIN; UPDATE books SET quantity = 44444 WHERE isbn = 123; ROLLBACK; QUIT;
With InnoDB and MySQL's default autocommit mode, the quantity is 11111 after session 1 (the lone UPDATE commits automatically), still 11111 after session 2 (quitting closes the connection, which discards the uncommitted update), 33333 after session 3, and still 33333 after session 4.

Part 5: Database Programming in Java and Python

Database Drivers
A program must use a database driver to communicate with a DBMS:
  Program <-> Database driver <-> DBMS

Java Database Drivers
Generally:    Java program <-> JDBC-compliant driver <-> DBMS
Specifically: Java program <-> mysql-connector-java-X.Y.Z-bin.jar <-> MySQL
mysql-connector-java-X.Y.Z-bin.jar must be in the CLASSPATH.

Python Database Drivers
Generally:    Python program <-> Python DB-API-compliant module <-> DBMS
Specifically: Python program <-> MySQLdb <-> MySQL
The MySQLdb module must be installed.

"SelectRows" Programs
Illustrate:
- Database connectivity
- Execution of "select" statements
- Handling of result tables

"SelectRows" in Java
See SelectRows.java:
- Accessing the CS Dept MySQL DBMS
- "Impedance mismatch": SQL produces whole result tables, while Java variables hold one value at a time
- Use of a "cursor" to iterate over the result table

"SelectRows" in Python
See selectrows.py:
- Accessing the CS Dept MySQL DBMS
- "Impedance mismatch"
- Use of a "cursor" to iterate over the result table
- The cursor can provide each row as a tuple (the default) or as a dictionary
Generalizing: Editorial...
In the relational model, columns are (in principle) unordered, so the cursor should provide each row as a dictionary, not a tuple.

"UpdateRows" Programs
Illustrate:
- Database connectivity
- Execution of "update" statements
- Transactions

"UpdateRows" in Java
See UpdateRows.java:
- Execution of SQL UPDATE statements
- Transactions:
  connection.setAutoCommit(false); tells the connection not to commit automatically after each update, so the program must commit explicitly
  The first update implicitly begins a transaction
  connection.commit() commits and ends the transaction

"UpdateRows" in Python
See updaterows.py:
- Execution of SQL UPDATE statements
- Transactions:
  The first update implicitly begins a transaction
  connection.commit() commits and ends the transaction

"Trans" Programs
Illustrate using transactions for recovery. See Trans.java and trans.py:
- An "error" occurs at an unpredictable time between updates
- Database consistency is preserved

SQL Injection Attacks
The problem (via an example)... Consider this SQL statement:
  SELECT * FROM books WHERE title='expr'
Suppose a malicious user provides this expr:
  x'; DROP TABLE books; --
Note that two hyphens indicate the start of an SQL comment.
Substituting the input for expr yields this statement:
  SELECT * FROM books WHERE title='x'; DROP TABLE books; --'
The result is a valid sequence of 2 SQL statements (the leftover quote is commented out), and it corrupts the database!!!
For more info: http://unixwiz.net/techtips/sql-injection.html

Prepared Statements
A solution: prepared statements.
- "Compile" the SQL statement with placeholders
- Send user data to the statement as parameters
- User data plays no part in parsing the SQL statement

"SelectRowsPrepared" Programs
Illustrate prepared statements. See SelectRowsPrepared.java and selectrowsprepared.py.
Prepared statements disallow SQL injection attacks.

Summary
We have covered...
- Databases and database management systems
- Database design
  - Normal forms
- Transactions
- SQL and MySQL
- Database programming in Java and Python
  - Transactions, SQL injection attacks

Appendix 1: Database Programming in C

C Database Programming
Generally:    C program <-> ODBC-compliant driver <-> DBMS
Specifically: C program <-> mysql.h, libmysqlclient.a <-> MySQL
You must tell gcc where to find mysql.h and libmysqlclient.a. Place in your .bashrc file:
  export MYSQL_INCLUDE_DIR=/usr/local/include/mysql
  export MYSQL_LIBRARY_DIR=/usr/local/lib/mysql
Then, to build:
  gcc -Wall -ansi -pedantic -I $MYSQL_INCLUDE_DIR program.c -L $MYSQL_LIBRARY_DIR -lmysqlclient -o program
(The linker searches a library only for symbols that are still undefined at that point on the command line, so -lmysqlclient belongs after program.c.)

Selecting rows: see selectrows.c
- Accessing the CS Dept MySQL DBMS
- Use of a cursor to iterate over the result table

Updating rows: see updaterows.c
- Execution of SQL UPDATE statements
- Transactions (similar to UpdateRows.java)

Transactions: see trans.c
- An "error" occurs at an unpredictable time between updates
- Database consistency is preserved
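The SQL injection problem from Part 5 and its prepared-statement fix can be compared side by side. The following is a minimal sketch that assumes Python's built-in sqlite3 module in place of MySQLdb; note that sqlite3 writes placeholders as ? whereas MySQLdb uses %s. Because sqlite3's execute() refuses to run more than one statement at a time, the sketch uses the classic input x' OR '1'='1, which rewrites the WHERE clause instead of appending DROP TABLE. The function names find_unsafe and find_safe are hypothetical.

```python
import sqlite3


def make_db():
    """Build an in-memory BOOKS table (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (isbn TEXT, title TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
        ("123", "The Practice of Programming", 50),
        ("234", "The C Programming Language", 100),
        ("345", "Algorithms in C", 150),
    ])
    conn.commit()
    return conn


def find_unsafe(conn, title):
    """Hypothetical VULNERABLE lookup: user input is pasted into the SQL text."""
    sql = "SELECT isbn FROM books WHERE title = '" + title + "'"
    return [row[0] for row in conn.execute(sql)]


def find_safe(conn, title):
    """Hypothetical safe lookup: the ? placeholder sends title as a parameter."""
    cursor = conn.execute("SELECT isbn FROM books WHERE title = ?", (title,))
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    print(find_unsafe(conn, attack))           # every isbn: WHERE is now always true
    print(find_safe(conn, attack))             # []: no title equals that literal string
    print(find_safe(conn, "Algorithms in C"))  # ['345']
```

With string concatenation the input becomes part of the SQL text, so the condition turns into title = 'x' OR '1'='1', which is true for every row. With the placeholder, the statement is parsed before the input arrives, and the same characters are compared against title as an ordinary string value.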