Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
C20.0046: Database Management Systems Lecture #1 M.P. Johnson Stern School of Business, NYU Spring, 2008 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 1 What Is a Database? A large, integrated collection of data which models a real-world enterprise: Entities Relationships students, courses, instructors, TAs Hillary is currently taking C20.0046 Barack is currently teaching C20.0046 John is currently TA-ing C20.0046 but took it last semester A Database Management System (DBMS) is a software package that stores and manages DBs M.P. Johnson, DBMS, Stern/NYU, Spring 2008 2 Databases are everywhere: non-web criminal/terrorist: TIA NYPD’s CompStat Tracking crime stats by precinct airline bookings Retailers: Wal-Mart, etc. when to re-order, purchase patterns, data-mining Genomics M.P. Johnson, DBMS, Stern/NYU, Spring 2008 3 Databases are everywhere: web retail: Amazon, etc. data-mining: Page You Made search engines searchable DBs: IMDB, tvguide.com, etc. Web2.0 sites: flickr = images + tags CMS systems (Wikis, blog & forum software, etc.) M.P. Johnson, DBMS, Stern/NYU, Spring 2008 4 Databases involved in ordering a pizza? 1. Pizza Hut’s DB 2. Credit card records 3. CC approval by credit agencies 4. phone company’s records (“Pull his LUDs, Lenny.”) Caller ID 5. Error-checking, anticrime M.P. Johnson, DBMS, Stern/NYU, Spring 2008 5 Your wallet is full of DB records… Driver’s license Credit cards NYUCard Medical insurance card Social security card Money (serial numbers) Photos (ids on back) Etc… “You may not be interested in databases, but databases are interested in you.” - Trotsky M.P. Johnson, DBMS, Stern/NYU, Spring 2008 6 Example of a Traditional DB App Suppose we build a system We store: checking accounts savings accounts account holders state of each of each person’s accounts M.P. Johnson, DBMS, Stern/NYU, Spring 2008 7 Can we do without a DBMS? Sure! Start by storing the data in files: checking.txt savings.txt customers.txt Now write C or Java programs to implement specific tasks… M.P. Johnson, DBMS, Stern/NYU, Spring 2008 8 Doing it without a DBMS... Transfer $100 from George’s savings to checking: Write a C program to do the following: Read savings.txt Find&update the line w/“George” balance -= 100 Write savings.txt Read checking.txt Find&update the line w/“George” balance += 100 Write checking.txt M.P. Johnson, DBMS, Stern/NYU, Spring 2008 9 Problems without an DBMS... 1. System crashes: Read savings.txt Find&update the line w/ “George.” Write savings.txt Read checking.txt Find&update the line w/ “George” Write checking.txt Same problem even if reordered High-volume (Rare frequent) CRASH! 2. Simultaneous access by many users George and Dick visit ATMs at same time Lock checking.txt before each use–what is the problem? M.P. Johnson, DBMS, Stern/NYU, Spring 2008 10 Problems without a DBMS... 3.Large data sets (100s of GBs, or TBs, …) No indices Finding “George” in huge flatfile is expensive Modifications intractable without better data structures “George” “Georgie” is very expensive Deletions are very expensive M.P. Johnson, DBMS, Stern/NYU, Spring 2008 11 Problems without an DBMS... 5. Security? File system may lack security features File system security may be coarse 6. Application programming interface (API)? Interfaces, interoperability 7. How to query the data? M.P. Johnson, DBMS, Stern/NYU, Spring 2008 12 In homebrew system, must support failover/rovery concurrent use deal with large datasets? security interop? querying in what? DBMS as application Q: How does a DBMS solve these problems? A: See third part of course, but for now… M.P. Johnson, DBMS, Stern/NYU, Spring 2008 13 One big issue: Transaction processing Grouping of several queries (or other DB operations) into one transaction ACID test properties Atomicity Consistency constraints on relationships Isolation all or nothing concurrency control simulated solipsism Durability Crash recovery M.P. Johnson, DBMS, Stern/NYU, Spring 2008 14 Atomicity & Durability Avoiding inconsistent state A DBMS prevents this outcome xacts are all or nothing One simple idea: log progress of and plans for each xact Durability: changes stay made (with log…) Atomicity: entire xact is committed at once M.P. Johnson, DBMS, Stern/NYU, Spring 2008 15 Isolation Many users concurrent execution Interweaving actions of different user programs but can lead to inconsistency: Disk access is slow (compared to CPU) don’t waste CPU – keep running e.g., two programs simultaneously withdraw from the same account For each user, should look like a single-user system Simulated solipsism M.P. Johnson, DBMS, Stern/NYU, Spring 2008 16 Isolation Contrast with a file in two Notepads Strategy: ignore multiple users whichever saves last wins first save is overwritten Contrast with a file in two Words Strategy: blunt isolation One can edit To the other it’s read-only M.P. Johnson, DBMS, Stern/NYU, Spring 2008 17 Consistency Each xact (on a consistent DB) must leave it in a consistent state can define integrity constraints checks that the defined claims about the data Only xacts obeying them are allowed M.P. Johnson, DBMS, Stern/NYU, Spring 2008 18 A level up: data models Any DBMS uses a data model: collection of concepts for describing data Relational data model: basically universal Oracle, DB2, SQLServer, other SQL DBMSs Relations: table of rows & columns a rel’s schema defines its fields Though some have OO extensions… M.P. Johnson, DBMS, Stern/NYU, Spring 2008 19 Data Schemas Schema: description of partic set of data, using some data model “Physical schema” Schema Physical files on disk Set of relations/tables, with structure Views (“external schema”) Virtual tables generated for user types M.P. Johnson, DBMS, Stern/NYU, Spring 2008 20 Schema e.g.: college registrar Schema: Physical schema: Students(ssn: string, name: string, login: string, age: int, gpa: real) Courses(cid: string, cname: string, credits: int) Enrolled(sid:string, cid:string, grade: string) Relations stored as unordered text files. Indices on first column of each rel Views: My_courses(cname: string, grade: string, credits: int) Course_info(ssn: string, name: string, status: string) M.P. Johnson, DBMS, Stern/NYU, Spring 2008 21 How the programmer sees the DBMS Start with SQL DDL to create tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ); Continue with SQL to populate tables: INSERT INTO Students VALUES('Hillary', '123456789', 'undergraduate'); M.P. Johnson, DBMS, Stern/NYU, Spring 2008 22 How the programmer sees the DBMS Takes: Students: SSN 123-45-6789 234-56-7890 Courses: CID C20.0046 C20.0056 Name Hillary Barak … Category undergrad grad … SSN 123-45-6789 CID C20.0046 123-45-6789 C20.0056 234-56-7890 C20.0046 … CName Databases Advanced Software semester Spring, 2004 Spring, 2004 Fall, 2003 Ultimately files, but complex M.P. Johnson, DBMS, Stern/NYU, Spring 2008 23 Querying: Structured Query Language Find all the students who have taken C20.0046: SELECT SSN FROM Takes WHERE CID='C20.0046'; Find all the students who C20.0046 previously: SELECT SSN FROM Takes WHERE CID='C20.0046' AND Semester='Fall, 2005'; Find the students’ names: SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID='C20.0046' AND Semester='Fall, 2005'; M.P. Johnson, DBMS, Stern/NYU, Spring 2008 24 Database Industry Relational databases are based on set theory Commercial DBMSs: Oracle, IBM’s DB2, Microsoft’s SQL Server, etc. Opensource: MySQL, PostgreSQL, etc. DBAs manage these Programmers write apps (CRUD, etc.) XML (“semi-structured data”) also important M.P. Johnson, DBMS, Stern/NYU, Spring 2008 25 The Study of DBMS Primary aspects: Data modeling SQL DB programming DBMS implementation This course covers all four (tho less of #4) Also will look at some more advanced areas XML, websearch, column-oriented DBs, RAID, RegExs, MapReduce… M.P. Johnson, DBMS, Stern/NYU, Spring 2008 26 Course outline Database design: The relational model: Entity/Relationship models Modeling constraints Relational algebra Transforming E/R models to relational schemas SQL DDL & query language M.P. Johnson, DBMS, Stern/NYU, Spring 2008 27 Course outline Programming for databases Some DB implementation Indexes, sorting, xacts Advanced topics… May change as course progresses partly in response to audience Also “current events” Slashdot/whatever, Database Blog, etc. M.P. Johnson, DBMS, Stern/NYU, Spring 2008 28 Textbook: Database Management Systems by Raghu Ramakrishnan & Johannes Gehrke 3 edition (August 14, 2002) Available: NYU bookstore Amazon/BN (may be cheaper) Amazon.co.uk (may be cheaper still) Links on class page Difficult but good M.P. Johnson, DBMS, Stern/NYU, Spring 2008 29 SQL Readings Many SQL references available online Good online (free) SQL tutorials include: A Gentle Introduction to SQL (http://sqlzoo.net/) SQL for Web Nerds (http://philip.greenspun.com/sql/) M.P. Johnson, DBMS, Stern/NYU, Spring 2008 30 Communications M. P. Johnson mjohnson@stern Office hours: after class To receive class mail: Activate account: http://start.stern.nyu.edu Forward mail: http://simon.stern.nyu.edu M.P. Johnson, DBMS, Stern/NYU, Spring 2008 31 Communications Web page: http://pages.stern.nyu.edu/~mjohnson/dbms/ syllabus course policies antecedent courses Blackboard web site Some materials will be available here Discussion board send general-interest messages here to benefit all Go to http://sternclasses.nyu.edu Click on C20.0046 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 32 Grading Prerequisites: Requirements & base score: Light programming experience A bit of “mathematical maturity” Interest in IT/CS Homework 15%: O(3) Project: 30% - see below… Midterm (closed book/notes): 20% Final (closed book/notes; likely 2hrs in class): 25% Class participation/pop-quizzes: 10% Stern Curve Consistent class attendance is required Absences will seriously affect your total grade. Final score = base score – 2n-1 where n = # missed quizzes (if n>0) M.P. Johnson, DBMS, Stern/NYU, Spring 2008 33 The Project: design end-to-end DB web app data model creation of DB in Oracle/MySQL Population with real(alistic) data web app for accessing/modifying data Identify entities & their relationships relations Identification of “interesting” questions & actions Produce DBMS interface Work in pairs (/threes) Choose topic on your own Start forming your group today! M.P. Johnson, DBMS, Stern/NYU, Spring 2008 34 Collaboration model Homework and exams done individually Project done with your team members only, though can in general use any tools Non-cited use of others’ problem solutions, code, etc. = plagiarism See Stern’s stern academic honesty policy Contact me if you’re at all unclear before a particular case Cite any materials used if you’re at all unclear after M.P. Johnson, DBMS, Stern/NYU, Spring 2008 35 On-going Feedback Don’t be afraid to ask questions Some parts will be abstract/mathematical Topic selection will be partly based on student interest M.P. Johnson, DBMS, Stern/NYU, Spring 2008 36 So what is this course about, really? Languages: SQL (some XML …) Data modeling Some theory! (rereading…) Some DBMS implementation (algs & data structs) Algorithms and data structures (in the latter part) Functional dependencies, normal forms e.g., how to find most efficient schema for data e.g., indices make data much faster to find – how? Lots of DB implementation and hacking for the project M.P. Johnson, DBMS, Stern/NYU, Spring 2008 37 For next time Get the book Skim chapter 1 Start reading chapter 2 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 38 For right now/tonight: email survey 1. 2. 3. 4. 5. Send to SOMEWHERE name previous cs/is/math/logic courses previous programming experience career plans: programmer, DBA, MBA, etc. why taking class/what you’re interested in learning about M.P. Johnson, DBMS, Stern/NYU, Spring 2008 39