Databases and Database
Programming
Robert M. Dondero, Ph.D.
Princeton University
1
Objectives

You will learn about:

Databases and database management
systems

Database design

Transactions

SQL and MySQL

Database programming in Java and Python
2
Objectives

Note:

Comprehensive coverage is impossible!

Please supplement with the required reading
3
Part 1:
Databases and
Database Management Systems
4
DB and DBMS

Some definitions...

Database (DB)


A structured collection of data
An abstract view of a file or collection of files
stored on a disk
5
DB and DBMS

Database management system (DBMS)

Software that maintains a database

Usually, a server


Listens on a known host at a known port
Clients contact server to perform queries
and updates
6
Motivation for DB and DBMS

Question: Why not simply use files?

Answer: Centralized control


See An Introduction to Database Systems (C.
J. Date)
Database administrators (DBAs) control the
data
7
Motivation for DB and DBMS

A good DBMS used by good DBAs can:

Reduce redundancy

Avoid inconsistencies

Facilitate data sharing

Enforce standards

Apply security restrictions

Maintain integrity

Balance conflicting requirements

Ensure safety (backups)
8
Types of DBs

In historical order...

Navigational DBs

Relational DBs
9
Navigational DBs


Data are linked into tree or network
structure

User/program is given "root" node

User/program follows links to desired data
Example:
10
Example Navigational DB
[Diagram: linked record types and their fields]
BOOKS: isbn, title, quantity
AUTHORS: author
ORDERS: custid, quantity
CUSTOMERS: custname, street, zipcode
ZIPCODES: city, state
11
Example Nav DB Query
Which customers purchased the book whose ISBN is 123?
BOOKS (isbn, title, quantity)
123 | The Practice of Programming | 50
ORDERS (custid, quantity)
222 | 20
111 | 30
CUSTOMERS (custname, street, zipcode)
Princeton | 114 Nassau St | 08540
Harvard | 1256 Mass Ave | 02138
12
Navigational DB Limitations


In the example:

Good: Given book, easy to find customers

Bad: Given customer, hard to find books
In general:


Queries are biased
DB designer must anticipate queries to create
appropriate links
13
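The query bias can be sketched in a few lines of Python. This is only an illustration: the nested dictionaries, the link layout (BOOKS to ORDERS to CUSTOMERS), and the function names customers_of and books_of are assumptions made for the sketch, not part of any real navigational DBMS.

```python
# Hypothetical in-memory records: each BOOKS record links to its ORDERS
# records, and each ORDERS record links to one CUSTOMERS record.
princeton = {"custname": "Princeton", "street": "114 Nassau St", "zipcode": "08540"}
harvard = {"custname": "Harvard", "street": "1256 Mass Ave", "zipcode": "02138"}
book_123 = {
    "isbn": "123", "title": "The Practice of Programming", "quantity": 50,
    "orders": [
        {"custid": "222", "quantity": 20, "customer": harvard},
        {"custid": "111", "quantity": 30, "customer": princeton},
    ],
}
root = [book_123]  # the "root" node handed to the user/program


def customers_of(root, isbn):
    """Easy direction: follow links from the book down to its customers."""
    return [order["customer"]["custname"]
            for book in root if book["isbn"] == isbn
            for order in book["orders"]]


def books_of(root, custname):
    """Hard direction: no link leads from a customer to books, so scan everything."""
    return [book["title"]
            for book in root
            for order in book["orders"]
            if order["customer"]["custname"] == custname]


print(customers_of(root, "123"))
print(books_of(root, "Harvard"))
```

The second function has to visit every book and every order, which is the cost of asking a question the links were not designed for.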
Relational DBs

Edgar Codd, 1968-1970

Applied some mathematics

Eliminate all links!!!

Informally: Database consists of tables

Example...
14
Relational DBs
BOOKS
isbn | title | quantity
123 | The Practice of Programming | 50
234 | The C Programming Language | 100
345 | Algorithms in C | 150
AUTHORS
isbn | author
123 | Kernighan
123 | Pike
234 | Kernighan
234 | Ritchie
345 | Sedgewick
ORDERS
isbn | custid | quantity
123 | 222 | 20
345 | 222 | 100
123 | 111 | 30
CUSTOMERS
custid | custname | street | zipcode
111 | Princeton | 114 Nassau St | 08540
222 | Harvard | 1256 Mass Ave | 02138
333 | MIT | 292 Main St | 02142
ZIPCODES
zipcode | city | state
08540 | Princeton | NJ
02138 | Cambridge | MA
02142 | Cambridge | MA
15
Relational DBs

No links


Queries are unbiased
However:


User can create links (indices), based upon
anticipated usage patterns
DBMS can create links (indices), based upon
observed usage patterns
16
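A minimal sketch of "unbiased" queries, assuming Python's built-in sqlite3 module as a stand-in DBMS (the slides themselves use MySQL). The index name orders_custid is hypothetical. Both query directions use the same kind of statement, and the index is optional.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (isbn TEXT, title TEXT, quantity INTEGER);
    CREATE TABLE orders (isbn TEXT, custid TEXT, quantity INTEGER);
    CREATE TABLE customers (custid TEXT, custname TEXT);
    INSERT INTO books VALUES ('123', 'The Practice of Programming', 50),
                             ('234', 'The C Programming Language', 100),
                             ('345', 'Algorithms in C', 150);
    INSERT INTO orders VALUES ('123', '222', 20), ('345', '222', 100),
                              ('123', '111', 30);
    INSERT INTO customers VALUES ('111', 'Princeton'), ('222', 'Harvard'),
                                 ('333', 'MIT');
    -- Optional index (hypothetical name) for an anticipated usage pattern.
    CREATE INDEX orders_custid ON orders (custid);
""")

# Given a book, find its customers.
customers_of_123 = [row[0] for row in conn.execute(
    "SELECT custname FROM customers, orders "
    "WHERE customers.custid = orders.custid AND orders.isbn = '123' "
    "ORDER BY custname")]

# Given a customer, find its books: the same kind of query, no extra links.
books_of_222 = [row[0] for row in conn.execute(
    "SELECT title FROM books, orders "
    "WHERE books.isbn = orders.isbn AND orders.custid = '222' "
    "ORDER BY title")]

print(customers_of_123)
print(books_of_222)
```

Dropping the CREATE INDEX line changes only how fast the second query can run on a large table, not its result.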
Popular Relational DBMSs


Today relational DBs are the most popular
kind
Popular relational DBMSs:

Oracle (Oracle, commercial)

SQL Server (Microsoft, commercial)

Informix (IBM, commercial)

MySQL (Sun/Oracle, free)

SQLite (free)
17
Popular Relational DBMSs

Hereafter, we will:


Limit our discussion to relational DBs and
DBMSs
Use MySQL
18
Part 2:
Database Design
19
Example Database DB0

Design:


BOOKS: isbn, title, authors, quantity
ORDERS: isbn, custid, custname, street, city,
state, zipcode, quantity
20
Example Database DB0
Example data:
BOOKS
isbn | title | authors | quantity
123 | The Practice of Programming | Kernighan,Pike | 500
234 | The C Programming Language | Kernighan,Ritchie | 800
345 | Algorithms in C | Sedgewick | 650
ORDERS
isbn | custid | custname | street | city | state | zipcode | quantity
123 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 20
345 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 100
123 | 111 | Princeton | 114 Nassau St | Princeton | NJ | 08540 | 30
21
Normal Forms

Observation: Design seems wrong


Note redundancy
Can we be more formal?

Are there rules that we can apply to:


Determine that the design is wrong?
Make it right?
22
Normal Forms


Database researchers have developed
theories to help DBAs:

Develop "right" designs, or...

Consciously decide to use a "wrong" design
In particular, they have proposed normal
forms
23
First Normal Form


Informally...
Def: A table is in first normal form iff each
column contains only atomic values
24
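As a rough illustration of what "atomic values" means in practice, the sketch below splits a DB0-style authors column into one row per author. The function name to_first_normal_form is hypothetical, and plain Python tuples stand in for table rows.

```python
# DB0-style BOOKS rows: the authors column holds a comma-separated list.
books_db0 = [
    ("123", "The Practice of Programming", "Kernighan,Pike", 500),
    ("234", "The C Programming Language", "Kernighan,Ritchie", 800),
    ("345", "Algorithms in C", "Sedgewick", 650),
]


def to_first_normal_form(rows):
    """Split rows into BOOKS (without authors) and AUTHORS (one author per row)."""
    books, authors = [], []
    for isbn, title, author_list, quantity in rows:
        books.append((isbn, title, quantity))
        for author in author_list.split(","):
            authors.append((isbn, author.strip()))
    return books, authors


books, authors = to_first_normal_form(books_db0)
print(books)
print(authors)
```

After the split, every column of both tables holds a single value per row.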
DB0 Not in First Normal Form
BOOKS
isbn | title | authors | quantity
123 | The Practice of Programming | Kernighan,Pike | 500
234 | The C Programming Language | Kernighan,Ritchie | 800
345 | Algorithms in C | Sedgewick | 650
ORDERS
isbn | custid | custname | street | city | state | zipcode | quantity
123 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 20
345 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 100
123 | 111 | Princeton | 114 Nassau St | Princeton | NJ | 08540 | 30
25
Example Database DB1

Design

BOOKS: isbn, title, quantity

AUTHORS: isbn, author

ORDERS: isbn, custid, custname, street, city,
state, zipcode, quantity
26
Example Database DB1
Example data:
BOOKS
isbn | title | quantity
123 | The Practice of Programming | 500
234 | The C Programming Language | 800
345 | Algorithms in C | 650
AUTHORS
isbn | author
123 | Kernighan
123 | Pike
234 | Kernighan
234 | Ritchie
345 | Sedgewick
ORDERS
isbn | custid | custname | street | city | state | zipcode | quantity
123 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 20
345 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 100
123 | 111 | Princeton | 114 Nassau St | Princeton | NJ | 08540 | 30
27
Candidate Keys


Suppose a table has columns {C1, ..., Cn }
Def: A set of columns {Ci, …, Ck} is a
candidate key of that table if it satisfies
two properties:


Uniqueness: At any given time, no two distinct
rows of the table have the same value for {Ci,
… Ck}.
Minimality: None of {Ci, …, Ck} can be
discarded without destroying the uniqueness
property
28
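The two properties can be checked mechanically against a set of sample rows. The sketch below is an illustration only: the function names are hypothetical, and a check against sample data can refute a candidate key but cannot prove one, because the definition speaks of the table "at any given time".

```python
from itertools import combinations


def is_unique(rows, columns):
    """True if no two rows share the same values for the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, columns):
    """Uniqueness plus minimality, checked against the given rows only."""
    if not is_unique(rows, columns):
        return False
    # Minimality: no proper, non-empty subset of the columns is itself unique.
    for size in range(1, len(columns)):
        for subset in combinations(columns, size):
            if is_unique(rows, subset):
                return False
    return True


orders = [
    {"isbn": "123", "custid": "222", "quantity": 20},
    {"isbn": "345", "custid": "222", "quantity": 100},
    {"isbn": "123", "custid": "111", "quantity": 30},
]

print(is_candidate_key(orders, ("isbn", "custid")))
print(is_candidate_key(orders, ("isbn",)))
```

Note that quantity alone happens to be unique in these three rows. That is an accident of the sample, not a property of the design, which is why the designer and not the data decides what the keys are.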
Candidate Keys in DB1
Candidate keys:
BOOKS (candidate key: isbn)
isbn | title | quantity
123 | The Practice of Programming | 500
234 | The C Programming Language | 800
345 | Algorithms in C | 650
AUTHORS (candidate key: isbn, author)
isbn | author
123 | Kernighan
123 | Pike
234 | Kernighan
234 | Ritchie
345 | Sedgewick
ORDERS (candidate key: isbn, custid)
isbn | custid | custname | street | city | state | zipcode | quantity
123 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 20
345 | 222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138 | 100
123 | 111 | Princeton | 114 Nassau St | Princeton | NJ | 08540 | 30
29
Primary Keys


Def: We choose one candidate key to be
the primary key; the others are called
alternate keys
In DB1:

Each table has only one candidate key

It's obvious which to choose
30
Functional Dependence


Def: A column C2 of a table is functionally
dependent on a column C1 iff, for each
row in the table, the value of C1
determines the value of C2
In DB1...
31
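A functional dependency can be tested against sample rows with a small helper. This is a sketch under assumptions: the function name functionally_determines is hypothetical, the rows are a trimmed copy of the DB1 ORDERS data, and, as with keys, sample data can refute a dependency but not prove it.

```python
def functionally_determines(rows, c1, c2):
    """True if, in these rows, equal values of c1 always come with equal values of c2."""
    seen = {}
    for row in rows:
        if row[c1] in seen and seen[row[c1]] != row[c2]:
            return False
        seen[row[c1]] = row[c2]
    return True


orders_db1 = [
    {"isbn": "123", "custid": "222", "custname": "Harvard",
     "zipcode": "02138", "city": "Cambridge", "quantity": 20},
    {"isbn": "345", "custid": "222", "custname": "Harvard",
     "zipcode": "02138", "city": "Cambridge", "quantity": 100},
    {"isbn": "123", "custid": "111", "custname": "Princeton",
     "zipcode": "08540", "city": "Princeton", "quantity": 30},
]

print(functionally_determines(orders_db1, "custid", "custname"))
print(functionally_determines(orders_db1, "custid", "quantity"))
```

Here custid determines custname, but custid alone does not determine quantity, because customer 222 appears with two different quantities.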
DB1 Functional Dependencies
BOOKS: isbn -> title, quantity
AUTHORS: (isbn, author)
ORDERS: (isbn, custid) -> quantity
ORDERS: custid -> custname, street, zipcode
ORDERS: zipcode -> city, state
32
Second Normal Form

Informally...

A table is in second normal form iff:


It is in first normal form, and
Every non-key column is fully dependent on
the primary key
33
DB1 Not in Second Normal Form
BOOKS: isbn -> title, quantity
AUTHORS: (isbn, author)
ORDERS: (isbn, custid) -> quantity
ORDERS: custid -> custname, street, zipcode
ORDERS: zipcode -> city, state
34
Example Database DB2

Design

BOOKS: isbn, title, quantity

AUTHORS: isbn, author


CUSTOMERS: custid, custname, street, city,
state, zipcode
ORDERS: isbn, custid, quantity
35
Example Database DB2
BOOKS
isbn | title | quantity
123 | The Practice of Programming | 500
234 | The C Programming Language | 800
345 | Algorithms in C | 650
AUTHORS
isbn | author
123 | Kernighan
123 | Pike
234 | Kernighan
234 | Ritchie
345 | Sedgewick
ORDERS
isbn | custid | quantity
123 | 222 | 20
345 | 222 | 100
123 | 111 | 30
CUSTOMERS
custid | custname | street | city | state | zipcode
111 | Princeton | 114 Nassau St | Princeton | NJ | 08540
222 | Harvard | 1256 Mass Ave | Cambridge | MA | 02138
333 | MIT | 292 Main St | Cambridge | MA | 02142
36
DB2 in Second Normal Form
BOOKS: isbn -> title, quantity
AUTHORS: (isbn, author)
ORDERS: (isbn, custid) -> quantity
CUSTOMERS: custid -> custname, street, zipcode
CUSTOMERS: zipcode -> city, state
37
Third Normal Form

Informally...

A table is in third normal form iff:


It is in second normal form, and
Every non-key column is non-transitively
dependent on the primary key
38
DB2 Not in Third Normal Form
BOOKS: isbn -> title, quantity
AUTHORS: (isbn, author)
ORDERS: (isbn, custid) -> quantity
CUSTOMERS: custid -> custname, street, zipcode
CUSTOMERS: zipcode -> city, state
39
Example Database DB3

Design

BOOKS: isbn, title, quantity

AUTHORS: isbn, author

CUSTOMERS: custid, custname, street,
zipcode

ZIPCODES: zipcode, city, state

ORDERS: isbn, custid, quantity
40
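The step from DB2 to DB3 can be sketched as a small transformation that moves city and state out of CUSTOMERS and into a separate ZIPCODES table. The function name split_out_zipcodes is hypothetical, and plain tuples stand in for table rows.

```python
# DB2-style CUSTOMERS rows: custid, custname, street, city, state, zipcode.
customers_db2 = [
    ("111", "Princeton", "114 Nassau St", "Princeton", "NJ", "08540"),
    ("222", "Harvard", "1256 Mass Ave", "Cambridge", "MA", "02138"),
    ("333", "MIT", "292 Main St", "Cambridge", "MA", "02142"),
]


def split_out_zipcodes(rows):
    """Return (CUSTOMERS without city/state, ZIPCODES keyed by zipcode)."""
    customers = []
    zipcodes = {}
    for custid, custname, street, city, state, zipcode in rows:
        customers.append((custid, custname, street, zipcode))
        zipcodes[zipcode] = (city, state)  # stored once per zipcode
    zipcode_rows = sorted((z, c, s) for z, (c, s) in zipcodes.items())
    return customers, zipcode_rows


customers_db3, zipcodes_db3 = split_out_zipcodes(customers_db2)
print(customers_db3)
print(zipcodes_db3)
```

Each city/state pair is now stored once per zipcode rather than once per customer.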
Example Database DB3
BOOKS
isbn | title | quantity
123 | The Practice of Programming | 50
234 | The C Programming Language | 100
345 | Algorithms in C | 150
AUTHORS
isbn | author
123 | Kernighan
123 | Pike
234 | Kernighan
234 | Ritchie
345 | Sedgewick
ORDERS
isbn | custid | quantity
123 | 222 | 20
345 | 222 | 100
123 | 111 | 30
CUSTOMERS
custid | custname | street | zipcode
111 | Princeton | 114 Nassau St | 08540
222 | Harvard | 1256 Mass Ave | 02138
333 | MIT | 292 Main St | 02142
ZIPCODES
zipcode | city | state
08540 | Princeton | NJ
02138 | Cambridge | MA
02142 | Cambridge | MA
41
DB3 in Third Normal Form
BOOKS: isbn -> title, quantity
AUTHORS: (isbn, author)
ORDERS: (isbn, custid) -> quantity
CUSTOMERS: custid -> custname, street, zipcode
ZIPCODES: zipcode -> city, state
42
Database Design Wrap-Up


Some additional points...
Database designers routinely violate
normal forms

But a good one does so:


Only with a purpose – typically efficiency
Knowing the consequences
43
Database Design Wrap-Up

DBMS can enforce additional
"consistency" constraints; e.g.:



Primary key values cannot be null
Primary key values must be unique within a
table
Within a table, "foreign" key values must
correspond to primary key values in another
table
44
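The three constraints can be tried out in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS, not MySQL. Two SQLite-specific details are worth flagging: foreign keys are checked only after PRAGMA foreign_keys = ON, and NOT NULL is spelled out on the primary key because SQLite does not always imply it. The helper name rejected is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.execute("CREATE TABLE books (isbn TEXT NOT NULL PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE authors (isbn TEXT REFERENCES books (isbn), author TEXT)")
conn.execute("INSERT INTO books VALUES ('123', 'The Practice of Programming')")


def rejected(statement):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key = rejected("INSERT INTO books VALUES ('123', 'Another Title')")
null_key = rejected("INSERT INTO books VALUES (NULL, 'No ISBN')")
dangling_reference = rejected("INSERT INTO authors VALUES ('999', 'Nobody')")
valid_reference = rejected("INSERT INTO authors VALUES ('123', 'Kernighan')")

print(duplicate_key, null_key, dangling_reference, valid_reference)
```

The first three statements violate a constraint and are refused; the last one refers to an existing book and is accepted.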
Database Design Wrap-Up

There is a substantial mathematical theory
of relational database design

More precise definitions of normal forms

Several additional normal forms

Relational algebra, relational calculus

Foreign keys, integrity rules

...

See An Introduction to Database Systems (C.
J. Date)
45
Part 3:
Transactions
46
Motivation for Transactions

Problem 1: Recovery

DBMS must recover from HW/SW failures

Problem 2: Concurrency

DBMS must handle updates from multiple
concurrent clients

Both problems are solved by transactions
47
Problem 1: Recovery

Recovery


DBMS must recover from HW/SW failures
DBMS must preserve DB consistency in the
presence of HW/SW failures
48
Problem 1: Recovery

Example:



Customer 111 has purchased 1 copy of book
123
Must change BOOKS: subtract 1 from quantity
of appropriate row
Must change ORDERS: add 1 to quantity of
appropriate row
49
Problem 1: Recovery
Time
Decrement quantity of the BOOKS
row whose isbn is 123
HW/SW error
Increment quantity of the ORDERS
row whose isbn is 123 and custid is 111
DB becomes inconsistent!!!
50
Solution 1: Transactions

Transaction


A logical unit of work
A sequence of operations that the DBMS
performs atomically


DBMS executes all or none of the
operations
Transforms DB from one consistent state to
another

Without necessarily preserving consistency
at intermediate points
51
Solution 1: Transactions

Pattern:

(1) Execute BEGIN statement to begin
transaction

(2) Perform update(s)

(3) Execute:


COMMIT statement to commit updates to
DB and end transaction, or
ROLLBACK statement to discard updates
and end transaction
52
Solution 1: Transactions
BEGIN
Time
Decrement quantity of the BOOKS
row whose isbn is 123
HW/SW error
Increment quantity of the ORDERS
row whose isbn is 123 and custid is 111
COMMIT
DBMS rolls back first update
DB remains consistent
53
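The recovery scenario can be sketched with Python's built-in sqlite3 module, used here as a stand-in for MySQL. The failure is simulated with an exception and the rollback is issued by the program; after a real crash the DBMS itself would discard the uncommitted update. The function names and the fail_between_updates flag are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT, quantity INTEGER)")
conn.execute("CREATE TABLE orders (isbn TEXT, custid TEXT, quantity INTEGER)")
conn.execute("INSERT INTO books VALUES ('123', 500)")
conn.execute("INSERT INTO orders VALUES ('123', '111', 30)")
conn.commit()


def record_purchase(conn, fail_between_updates):
    """Move one copy of book 123 to customer 111 as a single transaction."""
    try:
        # In this module's default mode the first update implicitly
        # begins a transaction.
        conn.execute("UPDATE books SET quantity = quantity - 1 WHERE isbn = '123'")
        if fail_between_updates:
            raise RuntimeError("simulated HW/SW error")
        conn.execute("UPDATE orders SET quantity = quantity + 1 "
                     "WHERE isbn = '123' AND custid = '111'")
        conn.commit()
    except RuntimeError:
        conn.rollback()  # discard the first update


def quantities(conn):
    """Return (BOOKS quantity, ORDERS quantity) for the single sample rows."""
    books = conn.execute("SELECT quantity FROM books").fetchone()[0]
    orders = conn.execute("SELECT quantity FROM orders").fetchone()[0]
    return books, orders


record_purchase(conn, fail_between_updates=True)
after_failure = quantities(conn)
record_purchase(conn, fail_between_updates=False)
after_success = quantities(conn)
print(after_failure, after_success)
```

After the simulated failure both quantities are unchanged; after the successful run both have changed together.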
Problem 2: Concurrency

Concurrency


Multiple processes may access DB
concurrently
DBMS must maintain DB consistency in the
presence of concurrent updates
54
Problem 2: Concurrency
(Time runs downward)
Process A: Fetch row R (quan=100)
Process B: Fetch row R (quan=100)
Process A: quan+=5
Process A: Update row R (quan=105)
Process B: quan+=5
Process B: Update row R (quan=105)
Process A's update is lost; DB is inconsistent
55
Solution 2: Transactions

Transaction

A recovery mechanism (as previously
described), and

A locking mechanism

Oversimplification...


Process can ask DBMS to lock DB rows
that are involved in active transaction
Other processes cannot access locked DB
rows
56
Solution 2: Transactions
(Time runs downward)
Process A: BEGIN
Process A: Fetch row R (quan=100); acquire lock on R
Process B: BEGIN
Process A: quan += 5
Process A: Update row R (quan=105)
Process A: COMMIT; release lock on R
Process B: Fetch row R (quan=105); acquire lock on R
Process B: quan += 5
Process B: Update row R (quan=110)
Process B: COMMIT; release lock on R
Process B is blocked until A commits
57
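The lost update and its fix can be imitated in plain Python. This is an analogy, not DBMS code: a dictionary stands in for row R, threads stand in for processes, and threading.Lock stands in for the DBMS row lock. The function name add_five is hypothetical.

```python
import threading

row = {"quan": 100}

# Lost update: both processes fetch before either one updates.
fetched_by_a = row["quan"]
fetched_by_b = row["quan"]
row["quan"] = fetched_by_a + 5   # Process A updates
row["quan"] = fetched_by_b + 5   # Process B overwrites A's update
lost_update_result = row["quan"]

# With a lock: fetch, add, and update happen as one unit per process.
row = {"quan": 100}
lock = threading.Lock()


def add_five():
    with lock:                    # blocks while another thread holds the lock
        fetched = row["quan"]
        row["quan"] = fetched + 5


threads = [threading.Thread(target=add_five) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
locked_result = row["quan"]

print(lost_update_result, locked_result)
```

Without the lock one increment disappears; with it, the second thread waits and then works from the updated value.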
The ACID Test

DBMS transaction mechanism must pass
the ACID test




Atomicity: each trans is atomic
Consistency: each trans takes the DB from
one consistent state to another
Isolation: other trans cannot access the data
that has been modified by current trans until it commits
Durability: once a trans commits, its updates are
not lost because of system failures
58
Part 4:
SQL and MySQL
59
SQL and MySQL

SQL:

Structured Query Language

Initially developed by IBM


Has been standardized (ISO/IEC 9075-1:2008)
Now the de facto standard for communicating
with DBMSs
60
SQL and MySQL

MySQL

A popular free DBMS

Uses SQL (no surprise!)

Extends SQL with additional statements

As DBMSs typically do
61
Creating a MySQL Database

CS Dept runs MySQL DBMS

DB exists for COS 333 course





Host: publicdb.cs.princeton.edu
Port: 3306
Database: cos333
User: cos333
Password: cos333

Use that DB for COS 333 pgmming asgts

Please don't change!!!
62
Creating a MySQL Database

May need to create another CS Dept
database:

To help you learn, or

For your COS 333 project
63
Creating a MySQL Database

If so:

Complete and submit form



https://csguide.cs.princeton.edu/requests/db
Wait for e-mail from the CS system
administrators
Change your password:

https://csguide.cs.princeton.edu/db/mysql
64
Using the MySQL Client

MySQL command-line client

From a penguins shell prompt:


mysql dbname --host=publicdb --user=username --port=3306 --password
Or abbreviated:

mysql dbname -h publicdb -u username -p

Type password when prompted

Type (My)SQL statements at "mysql> " prompt
65
(My)SQL Statements

Keywords are case insensitive

Each statement ends with semicolon


CREATE, DROP, ALTER, SELECT,
INSERT, DELETE, UPDATE, BEGIN,
COMMIT, and ROLLBACK are standard
SQL statements
The others are specific to MySQL...
66
(My)SQL Statements
HELP statement;
HELP SELECT;
HELP CREATE;
HELP CREATE TABLE;
QUIT;
QUIT;
SHOW DATABASES;
SHOW DATABASES;
SHOW TABLES;
SHOW TABLES;
67
(My)SQL Statements
CREATE TABLE [IF NOT EXISTS] table
(column datatype, …) ENGINE=engine
CHARSET=latin1;
CREATE TABLE books (isbn VARCHAR(20),
title VARCHAR(255), quantity INT)
ENGINE=InnoDB CHARSET=latin1;
68
(My)SQL Data Types
Some (My)SQL data types:
Data Type | Bytes | Notes
CHAR(n) | n (<= 255) |
VARCHAR(n) | Up to n (<= 255) | Common
TEXT(n) | Up to n (<= 255) | Good for searches on limited-length prefixes
BLOB(n) | Up to n (<= 65535) | Binary Large Object
INT | 4 | Can be UNSIGNED
DOUBLE | 8 |
DATE | | yyyy-mm-dd
TIME | | hh:mm:ss
69
(My)SQL Engines


Some (My)SQL storage engines:

MyISAM: non-transactional, but fast

InnoDB: transactional, but slower

…
We'll use InnoDB
70
(My)SQL Statements
DESCRIBE table;
DESCRIBE books;
DROP TABLE [IF EXISTS] table;
DROP TABLE books;
SOURCE file;
SOURCE books.sql;
Incidentally: Use this Linux (not MySQL)
command to export database to text file:
mysqldump dbName -h publicdb -u username -p > somefile
71
(My)SQL Statements
ALTER TABLE table specification [, specification] …;
ALTER TABLE books ADD price DOUBLE FIRST;
ALTER TABLE books ADD pages INT AFTER quantity;
ALTER TABLE books DROP price, DROP pages;
ALTER TABLE table ADD INDEX(column);
ALTER TABLE books ADD INDEX (isbn);
• Enables fast search on isbn column
• Matters for large tables (see Assignments 2, …)
72
(My)SQL Statements
SELECT expr, … FROM table, … [WHERE condition]
[ORDER BY column [ASC | DESC]];
SELECT * FROM books;
SELECT * FROM authors;
SELECT * FROM customers;
SELECT * FROM orders;
SELECT * FROM zipcodes;
SELECT isbn, title FROM books;
SELECT * FROM books ORDER BY quantity DESC;
73
(My)SQL Statements
SELECT * FROM books WHERE quantity = 650;
SELECT * FROM books WHERE quantity >= 650;
SELECT * FROM orders
SELECT * FROM orders WHERE isbn = 123 AND custid = 222;
SELECT * FROM orders WHERE isbn = 123 OR custid = 222;
SELECT * FROM books WHERE title LIKE 'The%';
SELECT * FROM books WHERE title LIKE '%of%';
SELECT * FROM books WHERE title LIKE 't_e%';
74
(My)SQL Statements
SELECT books.title, authors.author from books, authors;
• Cartesian product; 15 rows, but only 5 are meaningful!
SELECT books.title, authors.author from books, authors
WHERE books.isbn = authors.isbn;
SELECT title, author from books, authors
WHERE books.isbn = authors.isbn;
SELECT custname, title, orders.quantity
FROM books, customers, orders
WHERE books.isbn = orders.isbn
AND orders.custid = customers.custid;
75
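The row counts for the Cartesian product and the join can be checked with a short script. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the SELECT statements are the ones from the slide, with an ORDER BY added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (isbn TEXT, title TEXT, quantity INTEGER);
    CREATE TABLE authors (isbn TEXT, author TEXT);
    INSERT INTO books VALUES ('123', 'The Practice of Programming', 500),
                             ('234', 'The C Programming Language', 800),
                             ('345', 'Algorithms in C', 650);
    INSERT INTO authors VALUES ('123', 'Kernighan'), ('123', 'Pike'),
                               ('234', 'Kernighan'), ('234', 'Ritchie'),
                               ('345', 'Sedgewick');
""")

# No WHERE clause: every book is paired with every author.
product = conn.execute(
    "SELECT books.title, authors.author FROM books, authors").fetchall()

# The WHERE clause keeps only the pairs whose isbn values match.
joined = conn.execute(
    "SELECT title, author FROM books, authors "
    "WHERE books.isbn = authors.isbn ORDER BY title, author").fetchall()

print(len(product), len(joined))
for title, author in joined:
    print(title, "|", author)
```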
(My)SQL Statements
INSERT INTO table (column, …) VALUES (expr, …);
INSERT INTO books (isbn, title, quantity) VALUES
('456', 'Core Java', 120);
DELETE FROM table [WHERE condition];
DELETE FROM books; -- Be careful!!!
DELETE FROM books WHERE title LIKE 'The%';
UPDATE table SET column1=expr1 [, column2=expr2 …]
[WHERE condition];
UPDATE books SET quantity=60 WHERE isbn=123;
UPDATE books SET quantity=quantity+1 WHERE isbn=123;
76
Transactions in (My)SQL
BEGIN;
Begins a transaction
COMMIT;
Commits and ends the active transaction
ROLLBACK;
Rolls back and ends the active transaction
77
Transactions in MySQL
UPDATE books SET quantity = 11111 WHERE isbn = 123;
QUIT;
BEGIN;
UPDATE books SET quantity = 22222 WHERE isbn = 123;
QUIT;
BEGIN;
UPDATE books SET quantity = 33333 WHERE isbn = 123;
COMMIT;
QUIT;
BEGIN;
UPDATE books SET quantity = 44444 WHERE isbn = 123;
ROLLBACK;
QUIT;
78
Part 5:
Database Programming
in Java and Python
79
Database Drivers

Program must use a database driver to
communicate with DBMS
Program <-> Database driver <-> DBMS
80
Java Database Drivers
Generally: Java program <-> JDBC-compliant driver <-> DBMS
Specifically: Java program <-> mysql-connector-java-X.Y.Z-bin.jar <-> MySQL
mysql-connector-java-X.Y.Z-bin.jar
must be in CLASSPATH
81
Python Database Drivers
Generally: Python program <-> Python DB-API-compliant module <-> DBMS
Specifically: Python program <-> MySQLdb <-> MySQL
MySQLdb module must be installed
82
"SelectRows" Programs

Illustrate:

Database connectivity

Execution of "select" statements

Handling of result tables
83
"SelectRows" in Java

See SelectRows.java

Accessing the CS Dept MySQL DBMS

"Impedance mismatch"

Use of "cursor" to iterate over result table
84
"SelectRows" in Python

See selectrows.py

Accessing the CS Dept MySQL DBMS

"Impedance mismatch"


Use of "cursor" to iterate over result table
Cursor can provide each row as list
(default) or dictionary
85
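The two row styles can be sketched with Python's built-in sqlite3 module, which follows the same DB-API pattern as MySQLdb but is a different driver (an assumption made so the example runs without a server). In sqlite3 the default row is a tuple, and setting row_factory to sqlite3.Row allows access by column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT, title TEXT, quantity INTEGER)")
conn.execute("INSERT INTO books VALUES ('123', 'The Practice of Programming', 500)")

# Default: each row is a tuple, so columns are reached by position.
cursor = conn.cursor()
cursor.execute("SELECT isbn, title, quantity FROM books")
positional_titles = [row[1] for row in cursor]

# With sqlite3.Row as the row factory, columns can be reached by name.
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
cursor.execute("SELECT isbn, title, quantity FROM books")
named_rows = [dict(row) for row in cursor]

print(positional_titles)
print(named_rows)
```

Access by name keeps working if the SELECT list is later reordered, whereas row[1] would silently pick up a different column.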
"SelectRows" in Python

Generalizing:



Editorial...
In relational model, columns are (in principle)
unordered
Cursor should provide each row as dictionary,
not list
86
"UpdateRows" Programs

Illustrate:

Database connectivity

Execution of "update" statements

Transactions
87
"UpdateRows" in Java

See UpdateRows.java

Execution of SQL UPDATE statements

Transactions

connection.setAutoCommit(false);




Commands connection not to automatically
commit after each update
Program must explicitly commit
First update implicitly begins a transaction
connection.commit()

Commits and ends transaction
88
"UpdateRows" in Python

See updaterows.py

Execution of SQL UPDATE statements

Transactions


First update implicitly begins a transaction
connection.commit()

Commits and ends transaction
89
"Trans" Programs

Illustrate:


Using transactions for recovery
See Trans.java, trans.py


"Error" occurs at unpredictable time between
updates
Database consistency is preserved
90
SQL Injection Attacks

The problem (via an example)...

Consider this SQL statement:


SELECT * from books where
title='expr'
Suppose malicious user provides this expr:

x'; DROP TABLE books; --
Note that two hyphens indicate the start of
a SQL comment
91
SQL Injection Attacks
SELECT * from books where title='expr'
x'; DROP TABLE books; --
SELECT * from books where title='x'; DROP TABLE books;--'
92
SQL Injection Attacks

Resulting statement is this:

SELECT * from books where
title='x'; DROP TABLE books; -- '

Valid sequence of 2 SQL statements

Corrupts the database!!!

For more info:

http://unixwiz.net/techtips/sql-injection.html
93
Prepared Statements

A Solution...

Prepared statements

"Compile" SQL statement with placeholders

Send user data to statement as parameters

User data is unrelated to SQL statement
parsing
94
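The difference between pasting user data into SQL text and passing it as a parameter can be sketched with Python's built-in sqlite3 module (a stand-in for MySQLdb, whose placeholder syntax differs). The payload here is a different one from the DROP TABLE example, chosen because sqlite3's execute() accepts only one statement per call; it turns the WHERE clause into a condition that is always true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT, title TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [("123", "The Practice of Programming"),
                  ("345", "Algorithms in C")])

user_input = "x' OR '1'='1"   # hostile input (not the payload from the slides)

# Unsafe: the input is pasted into the SQL text and changes its meaning.
unsafe_sql = "SELECT title FROM books WHERE title = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder sends the input as data, never as SQL text.
safe_rows = conn.execute(
    "SELECT title FROM books WHERE title = ?", (user_input,)).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated statement returns every row in the table, while the parameterized one looks for a title equal to the literal input string and finds none.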
"SelectRowsPrepared" Pgms

Illustrate:

Prepared statements

See SelectRowsPrepared.java

See selectrowsprepared.py

Prepared statements disallow SQL injection
attacks
95
Summary

We have covered...


Databases and database management
systems
Database design

Normal forms

Transactions

SQL and MySQL

Database programming in Java and Python

Transactions, SQL injection attacks
96
Appendix 1:
Database Programming in C
97
C Database Pgmming
Generally: C program <-> ODBC-compliant driver <-> DBMS
Specifically: C program <-> mysql.h, libmysqlclient.a <-> MySQL
Must tell gcc where to find mysql.h and
libmysqlclient.a
98
C Database Pgmming

Place in .bashrc file:



export MYSQL_INCLUDE_DIR=/usr/local/include/mysql
export MYSQL_LIBRARY_DIR=/usr/local/lib/mysql
Then, to build:

gcc -Wall -ansi -pedantic -I $MYSQL_INCLUDE_DIR program.c -o program -L $MYSQL_LIBRARY_DIR -lmysqlclient
99
C Database Pgmming

Selecting rows

See selectrows.c

Accessing the CS Dept MySQL DBMS

Use of cursor to iterate over result table
100
C Database Pgmming

Updating rows

See updaterows.c

Execution of SQL UPDATE statements

Transactions

(Similar to UpdateRows.java)
101
C Database Pgmming

Transactions



See trans.c
"Error" occurs at unpredictable time between
updates
Database consistency is preserved
102