Download PPT - NYU Stern School of Business

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Clusterpoint wikipedia , lookup

SQL wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Database model wikipedia , lookup

Transcript
C20.0046: Database
Management Systems
Lecture #1
M.P. Johnson
Stern School of Business, NYU
Spring, 2008
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
1
What Is a Database?


A large, integrated collection of data
which models a real-world enterprise:

Entities


Relationships




students, courses, instructors, TAs
Hillary is currently taking C20.0046
Barack is currently teaching C20.0046
John is currently TA-ing C20.0046 but took it last semester
A Database Management System (DBMS) is a
software package that stores and manages DBs
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
2
Databases are everywhere: non-web

criminal/terrorist: TIA

NYPD’s CompStat

Tracking crime stats by precinct

airline bookings

Retailers: Wal-Mart, etc.


when to re-order, purchase patterns, data-mining
Genomics
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
3
Databases are everywhere: web

retail: Amazon, etc.

data-mining: Page You Made

search engines

searchable DBs: IMDB, tvguide.com, etc.

Web2.0 sites:


flickr = images + tags
CMS systems (Wikis, blog & forum software, etc.)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
4
Databases involved in ordering a pizza?
1.
Pizza Hut’s DB
2.
Credit card records
3.
CC  approval by credit agencies
4.
phone company’s records

(“Pull his LUDs, Lenny.”)
Caller ID
5.

Error-checking, anticrime
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
5
Your wallet is full of DB records…








Driver’s license
Credit cards
NYUCard
Medical insurance card
Social security card
Money (serial numbers)
Photos (ids on back)
Etc…
“You may not be interested in
databases, but databases are
interested in you.” - Trotsky
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
6
Example of a Traditional DB App

Suppose we build a system

We store:




checking accounts
savings accounts
account holders
state of each of each person’s accounts
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
7
Can we do without a DBMS?
Sure! Start by storing the data in files:
checking.txt
savings.txt
customers.txt
Now write C or Java programs to implement
specific tasks…
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
8
Doing it without a DBMS...

Transfer $100 from George’s savings to
checking:
Write a C program to do the following:
Read savings.txt
Find&update the line w/“George”
balance -= 100
Write savings.txt
Read checking.txt
Find&update the line w/“George”
balance += 100
Write checking.txt
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
9
Problems without an DBMS...
1. System crashes:


Read savings.txt
Find&update the line w/ “George.”
Write savings.txt
Read checking.txt
Find&update the line w/ “George”
Write checking.txt
Same problem even if reordered
High-volume  (Rare  frequent)
CRASH!
2. Simultaneous access by many users


George and Dick visit ATMs at same time
Lock checking.txt before each use–what is the problem?
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
10
Problems without a DBMS...
3.Large data sets (100s of GBs, or TBs, …)

No indices


Finding “George” in huge flatfile is expensive
Modifications intractable without better data
structures


“George”  “Georgie” is very expensive
Deletions are very expensive
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
11
Problems without an DBMS...
5. Security?


File system may lack security features
File system security may be coarse
6. Application programming interface (API)?

Interfaces, interoperability
7. How to query the data?
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
12
In homebrew system, must support

failover/rovery
concurrent use
deal with large datasets?
security
interop?
querying in what?

 DBMS as application







Q: How does a DBMS solve these problems?
A: See third part of course, but for now…
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
13
One big issue: Transaction processing

Grouping of several queries (or other DB operations)
into one transaction

ACID test properties

Atomicity


Consistency


constraints on relationships
Isolation



all or nothing
concurrency control
simulated solipsism
Durability

Crash recovery
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
14
Atomicity & Durability


Avoiding inconsistent state
A DBMS prevents this outcome


xacts are all or nothing
One simple idea: log progress of and plans
for each xact


Durability: changes stay made (with log…)
Atomicity: entire xact is committed at once
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
15
Isolation

Many users  concurrent execution




Interweaving actions of different user programs
 but can lead to inconsistency:


Disk access is slow (compared to CPU)
 don’t waste CPU – keep running
e.g., two programs simultaneously withdraw from the same
account
For each user, should look like a single-user system

Simulated solipsism
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
16
Isolation

Contrast with a file in two Notepads




Strategy: ignore multiple users
whichever saves last wins
first save is overwritten
Contrast with a file in two Words



Strategy: blunt isolation
One can edit
To the other it’s read-only
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
17
Consistency

Each xact (on a consistent DB) must leave it
in a consistent state

can define integrity constraints

checks that the defined claims about the data

Only xacts obeying them are allowed
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
18
A level up: data models

Any DBMS uses a data model: collection of
concepts for describing data

Relational data model: basically universal

Oracle, DB2, SQLServer, other SQL DBMSs



Relations: table of rows & columns
a rel’s schema defines its fields
Though some have OO extensions…
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
19
Data Schemas

Schema: description of partic set of data,
using some data model

“Physical schema”


Schema


Physical files on disk
Set of relations/tables, with structure
Views (“external schema”)

Virtual tables generated for user types
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
20
Schema e.g.: college registrar

Schema:




Physical schema:



Students(ssn: string, name: string, login: string, age: int,
gpa: real)
Courses(cid: string, cname: string, credits: int)
Enrolled(sid:string, cid:string, grade: string)
Relations stored as unordered text files.
Indices on first column of each rel
Views:


My_courses(cname: string, grade: string, credits: int)
Course_info(ssn: string, name: string, status: string)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
21
How the programmer sees the DBMS

Start with SQL DDL to create tables:
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
);

Continue with SQL to populate tables:
INSERT INTO Students
VALUES('Hillary', '123456789', 'undergraduate');
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
22
How the programmer sees the DBMS
Takes:
Students:
SSN
123-45-6789
234-56-7890
Courses:
CID
C20.0046
C20.0056

Name
Hillary
Barak
…
Category
undergrad
grad
…
SSN
123-45-6789
CID
C20.0046
123-45-6789
C20.0056
234-56-7890
C20.0046
…
CName
Databases
Advanced Software
semester
Spring,
2004
Spring,
2004
Fall, 2003
Ultimately files, but complex
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
23
Querying: Structured Query Language

Find all the students who have taken C20.0046:
SELECT SSN FROM Takes
WHERE CID='C20.0046';

Find all the students who C20.0046 previously:
SELECT SSN FROM Takes
WHERE CID='C20.0046' AND
Semester='Fall, 2005';

Find the students’ names:
SELECT Name FROM Students, Takes
WHERE Students.SSN=Takes.SSN AND
CID='C20.0046' AND Semester='Fall, 2005';
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
24
Database Industry

Relational databases are based on set theory

Commercial DBMSs: Oracle, IBM’s DB2,
Microsoft’s SQL Server, etc.
Opensource: MySQL, PostgreSQL, etc.


DBAs manage these
Programmers write apps (CRUD, etc.)

XML (“semi-structured data”) also important

M.P. Johnson, DBMS, Stern/NYU, Spring 2008
25
The Study of DBMS

Primary aspects:




Data modeling
SQL
DB programming
DBMS implementation

This course covers all four (tho less of #4)

Also will look at some more advanced areas

XML, websearch, column-oriented DBs, RAID,
RegExs, MapReduce…
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
26
Course outline

Database design:



The relational model:



Entity/Relationship models
Modeling constraints
Relational algebra
Transforming E/R models to relational schemas
SQL

DDL & query language
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
27
Course outline


Programming for databases
Some DB implementation

Indexes, sorting, xacts

Advanced topics…

May change as course progresses


partly in response to audience
Also “current events”

Slashdot/whatever, Database Blog, etc.
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
28
Textbook: Database Management Systems

by Raghu Ramakrishnan &
Johannes Gehrke


3 edition (August 14, 2002)
Available:





NYU bookstore
Amazon/BN (may be cheaper)
Amazon.co.uk (may be cheaper
still)
Links on class page
Difficult but good
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
29
SQL Readings

Many SQL references available online

Good online (free) SQL tutorials include:

A Gentle Introduction to SQL (http://sqlzoo.net/)

SQL for Web Nerds
(http://philip.greenspun.com/sql/)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
30
Communications

M. P. Johnson



mjohnson@stern
Office hours: after class
To receive class mail:


Activate account: http://start.stern.nyu.edu
Forward mail: http://simon.stern.nyu.edu
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
31
Communications

Web page:
http://pages.stern.nyu.edu/~mjohnson/dbms/




syllabus
course policies
antecedent courses
Blackboard web site


Some materials will be available here
Discussion board



send general-interest messages here to benefit all
Go to http://sternclasses.nyu.edu
Click on C20.0046
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
32
Grading

Prerequisites:




Requirements & base score:







Light programming experience
A bit of “mathematical maturity”
Interest in IT/CS
Homework 15%: O(3)
Project: 30% - see below…
Midterm (closed book/notes): 20%
Final (closed book/notes; likely 2hrs in class): 25%
Class participation/pop-quizzes: 10%
Stern Curve
Consistent class attendance is required



Absences will seriously affect your total grade.
Final score = base score – 2n-1
where n = # missed quizzes (if n>0)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
33
The Project: design end-to-end DB web app

data model



creation of DB in Oracle/MySQL


Population with real(alistic) data
web app for accessing/modifying data



Identify entities & their relationships
 relations
Identification of “interesting” questions & actions
Produce DBMS interface
Work in pairs (/threes)


Choose topic on your own
Start forming your group today!
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
34
Collaboration model






Homework and exams done individually
Project done with your team members only, though
can in general use any tools
Non-cited use of others’ problem solutions, code,
etc. = plagiarism
See Stern’s stern academic honesty policy
Contact me if you’re at all unclear before a particular
case
Cite any materials used if you’re at all unclear after
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
35
On-going Feedback

Don’t be afraid to ask questions


Some parts will be abstract/mathematical
Topic selection will be partly based on
student interest
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
36
So what is this course about, really?



Languages: SQL (some XML …)
Data modeling
Some theory! (rereading…)




Some DBMS implementation (algs & data structs)
Algorithms and data structures (in the latter part)


Functional dependencies, normal forms
e.g., how to find most efficient schema for data
e.g., indices make data much faster to find – how?
Lots of DB implementation and hacking for the
project
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
37
For next time

Get the book

Skim chapter 1
Start reading chapter 2

M.P. Johnson, DBMS, Stern/NYU, Spring 2008
38
For right now/tonight: email survey

1.
2.
3.
4.
5.
Send to SOMEWHERE
name
previous cs/is/math/logic courses
previous programming experience
career plans: programmer, DBA, MBA, etc.
why taking class/what you’re interested in
learning about
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
39