Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
ORCL1 Oracle 8i: SQL & PL/SQL Session #1 Matthew P. Johnson CISDD, CUNY Fall, 2004 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 1 Personnel Instructor: Matthew P. Johnson TA: Myo Zin mattpj at gmail.com myokyawzin at msn.com Admin: Dawn Kleinberger dkleinberger at gc.cuny.edu Matthew P. Johnson, OCL1, CISDD CUNY, F2004 2 Communications Web page: http://pages.stern.nyu.edu/~mjohnson/oracle/ syllabus course policies reading assignments etc. Matthew P. Johnson, OCL1, CISDD CUNY, F2004 3 Acknowledgements Thanks to Ramesh, Ullman, et al., Raghu and Johannes, Dan Suciu, Arthur Keller, David Kuijt for course materials See classpage for other related, antecedent DBMS courses Matthew P. Johnson, OCL1, CISDD CUNY, F2004 4 What is a Database? A very large, integrated collection of data. Models real-world enterprise. Entities Relationships students, courses, instructors, TAs George is currently taking OCL1 Dick is currently teaching OCL1 Condi is currently TA-ing OCL1 but took it last semester Database Management System (DBMS): large software package designed to store and manage databases Matthew P. Johnson, OCL1, CISDD CUNY, F2004 5 Databases are everywhere: Ordering a pizza Databases involved? Pizza Hut’s DB 1. stores previous orders by customer stores previous credit cards used Credit card records 2. huge databases of (attempted) purchases location, date, amount, parties Got approved by credit-report companies phone company’s records 3. 4. Local Usage Details (“Pull his LUDs, Lenny.”) Caller ID 5. ensures reported address matches destination Matthew P. Johnson, OCL1, CISDD CUNY, F2004 6 Your wallet is full of DB records Driver’s license Credit cards Medical insurance card Social security card Gym membership Individual checks Dollar bills (w/serial numbers) Maybe even photos (ids on back) Matthew P. Johnson, OCL1, CISDD CUNY, F2004 7 Databases are everywhere Q: Websites backed by DBMSs? retail: Amazon, etc. data-mining: “Page You Made” search engines: Google, etc. directories: Internic, etc. searchable DBs: IMDB, tvguide.com, etc. Q: Non-web examples of DBMSs? airline bookings criminal/terrorist: TIA NYPD’s CompStat all serious crime stats by precinct Retailers: Wal-Mart, etc. when to re-order, purchase patterns, data-mining Genomics! Matthew P. Johnson, OCL1, CISDD CUNY, F2004 8 Example of a Traditional DB App Suppose we are building a system to store the information about: checking accounts savings accounts account holders state of each of each person’s accounts Matthew P. Johnson, OCL1, CISDD CUNY, F2004 9 Can we do it without a DBMS? Sure we can! Start by storing the data in files: checking.txt savings.txt customers.txt Now write C or Java programs to implement specific tasks Matthew P. Johnson, OCL1, CISDD CUNY, F2004 10 Doing it without a DBMS... Transfer $100 from George’s savings acc to checking: Write a C program to do the following: Read ‘savings.txt’ Find & update the record “George” balance -= 100 Write ‘savings.txt’ Read ‘checking.txt’ Find & update the record “George” balance += 100 Write ‘checking.txt’ Matthew P. Johnson, OCL1, CISDD CUNY, F2004 11 Problems without an DBMS... 1. System crashes: Read ‘savings.txt’ Find&update the rec “George.” Write ‘savings.txt’ Read ‘checking.txt’ Find&update the rec “George” Write ‘checking.txt’ CRASH ! Q: What is the problem? A: George lost his $100 Same problem even if reordered 2. Simultaneous access by many users George and Dick visit ATMs at same Lock checking.txt before each use– what is the problem? Matthew P. Johnson, OCL1, CISDD CUNY, F2004 12 Problems without an DBMS... 3.Large data sets (say 50GB) No indices Why is this a problem? Finding “George” in huge flatfile is expensive Modifications intractable without better data structures “George” “Georgie” is very expensive Deletions are very expensive Matthew P. Johnson, OCL1, CISDD CUNY, F2004 13 Problems without an DBMS... 5.Security? File system may be insecure File system security may be coarse 6.Application programming interface (API)? suppose need other apps to access DB 7.How to interact with other DBMSs? Matthew P. Johnson, OCL1, CISDD CUNY, F2004 14 General problems to solve In building our own system, many Qs arise: how do we store the data? (file organization, etc.) how do we query the data? (write programs…) make sure that updates don’t mess things up? provide different views on the data? leave the DB “consistent” e.g., ATM user’s view v. bank teller’s view how do we deal with crashes? Too hard! Go buy Oracle! Q: How does a DBMS solve these problems? A: Long story; see other courses/books Matthew P. Johnson, OCL1, CISDD CUNY, F2004 15 Big issue: Transaction processing Grouping of several queries (or other database actions) into one transaction ACID properties Atomicity Consistency constraints on relationships Isolation all or nothing concurrency control Simulated solipsim Durability Crash recovery Matthew P. Johnson, OCL1, CISDD CUNY, F2004 16 Atomicity & Durability Saw how George lost $100 with makeshift DBMS DBMS prevents this outcome xacts are all or nothing One idea: Keep a log (history) of all actions in set of xacts Durability: Use log to redo or undo certain ops in crash recovery Atomicity: don’t really commit changes until end Then, all at once Matthew P. Johnson, OCL1, CISDD CUNY, F2004 17 Isolation Concurrent execution is essential for performance. Interleaving actions of different user programs can lead to inconsistency: Frequent, slow disk accesses don’t waste CPU – keep running e.g., two programs simultaneously withdraw from the same account DBMS ensures such problems don’t arise: users can pretend they are using a single-user system Matthew P. Johnson, OCL1, CISDD CUNY, F2004 18 Isolation Contrast with a file in two Notepads Strategy: ignore multiple users whichever saves last wins first save is overwritten Contrast with a file in two Words Strategy: blunt isolation One can edit To the other it’s read-only Matthew P. Johnson, OCL1, CISDD CUNY, F2004 19 Consistency Each xant (on a consistent DB) must leave it in a consistent state can define integrity constraints checks the defined claims about the data remain true Matthew P. Johnson, OCL1, CISDD CUNY, F2004 20 Data Models Any DBMS uses a data model: collection of concepts for describing data Schema: description of partic set of data, using some data model Relational data model: most widely used (by far) data model Oracle, DB2, SQLServer, other SQL DBMSs main concept: relation ~ table of rows & columns a rel’s schema defines its fields Matthew P. Johnson, OCL1, CISDD CUNY, F2004 21 Example: university database Conceptual schema: Physical schema: Students(ssn: string, name: string, login: string, age: int, gpa: real) Courses(cid: string, cname: string, credits: int) Enrolled(sid:string, cid:string, grade: string) Relations stored as unordered text files. Indices on first column of each rel External Schema (View): Course_info(ssn: string, name: string) My_courses(cname: string, grade: string) Matthew P. Johnson, OCL1, CISDD CUNY, F2004 22 How the programmer sees the DBMS Start with DDL to create tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) ... Continue with DML to populate tables: INSERT INTO Students VALUES(‘Howard, ‘123456789’, ‘undergraduate’) . . . . Matthew P. Johnson, OCL1, CISDD CUNY, F2004 23 How the programmer sees the DBMS Tables: Takes: Students: SSN 123-45-6789 234-56-7890 Courses: CID C20.0046 C20.0056 Name Howard Wesley … Category undergrad grad … SSN 123-45-6789 CID C20.0046 123-45-6789 C20.0056 234-56-7890 C20.0046 … CName Databases Advanced Software semester Spring, 2004 Spring, 2004 Fall, 2003 Still implemented as files, but behind the scenes can be quite complex “data independence” = separate logical view from physical implementation Matthew P. Johnson, OCL1, CISDD CUNY, F2004 24 Querying: Structured Query Language Find all the students who have taken OCL1: Find all the students who OCL1 last fall: SELECT SSN FROM Takes WHERE CID=“OCL1” AND Semester=“Fall, 2003” Find the students’ names: SELECT SSN FROM Takes WHERE CID=“OCL1” SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID=“OCL1” AND Semester=“Fall, 2003” Query processor does this efficiently Matthew P. Johnson, OCL1, CISDD CUNY, F2004 25 Database Industry Relational databases are a great success of theoretical ideas based on most “theoretical” type of math there is: set theory DBMS companies are among the largest software companies in the world Oracle, IBM (with DB2), Microsoft (SQL Server, Microsoft Access), Sybase Also opensource: MySQL, PostgreSQL, etc. $20B+ industry XML (“semi-structured data”) also important New lingua franca for exchanging data Matthew P. Johnson, OCL1, CISDD CUNY, F2004 26 The Study of DBMS Several aspects: This course covers all three Modeling and design of databases DBMS programming: querying and update DBMS implementation though more time on first two Also will look at some more advanced areas XML, data-mining, LDAP? Matthew P. Johnson, OCL1, CISDD CUNY, F2004 27 Databases are used by DB app programmers Database administrators (DBAs) desktop app programmers web developers design schemas security/authorization crash recovery tuning Everyone else (perhaps indirectly) “You may not be interested in databases, but databases are interested in you.” - Trotsky Matthew P. Johnson, OCL1, CISDD CUNY, F2004 28 Course outline Two biggest topics: SQL PL/SQL But also: Database design: Entity/Relationship models Modeling constraints The relational model: Relational algebra Transforming E/R models to relational schemas Matthew P. Johnson, OCL1, CISDD CUNY, F2004 29 Outline (Continued) SQL views and triggers Connecting to a database from a programming language Web apps XML May change as course progresses partly in response to audience Matthew P. Johnson, OCL1, CISDD CUNY, F2004 30 Textbook Mastering Oracle SQL Hardcover: 492 pages Publisher: O'Reilly ISBN: 0596006322 2nd Edition June 2004 Distributed in class, hopefully next time Matthew P. Johnson, OCL1, CISDD CUNY, F2004 31 SQL Readings Optional reference: SQL in a Nutshell Online (free) SQL tutorials include: SQL for Web Nerds (http://philip.greenspun.com/sql/) A Gentle Introduction to SQL (http://sqlzoo.net/) Matthew P. Johnson, OCL1, CISDD CUNY, F2004 32 On-going Feedback Don’t wait until the class is over to give feedback on improving it too late for you then! Send mail if you have questions or concerns “We’re in touch, so you be in touch.” Matthew P. Johnson, OCL1, CISDD CUNY, F2004 33 Summary DBMS used to maintain, query large datasets Benefits include recovery from system crashes, concurrent access, data integrity, security, and quick application development Database skills are critical in financial services, marketing and other business areas! Matthew P. Johnson, OCL1, CISDD CUNY, F2004 34 So what is this course about, really? A bit of everything! Languages: SQL, XPath, XQuery Data modeling Some theory! Functional dependencies, normal forms e.g., how to find most efficient schema for data Lots of implementation and hacking in end-ofcourse project Business DBMS examples/cases Most importantly: how to meet real-world needs Matthew P. Johnson, OCL1, CISDD CUNY, F2004 35 For right now: written survey name email previous cs/is/math/logic courses/background previous programming experience Perl? PHP? HTML? Job:: programmer, DBA, etc. why taking class Matthew P. Johnson, OCL1, CISDD CUNY, F2004 36 Agenda Last time: intro, RDBMS, ACID test This time: E/R model 6. Identify entity sets, relations and attributes One-one, one-many, many-many relations Simple ER diagrams to model a situation 3-way relationships; Converting to binary Entities with multiple roles Subclasses Design issues 1. Principles of faithfulness & simplicity in ER diagrams Redundancy Whether an element should be an attribute or entity set Replacing a relationships with entity sets 1. 2. 3. 4. 5. 2. 3. 4. Matthew P. Johnson, OCL1, CISDD CUNY, F2004 37 DB development path the World E/R design Relational schema Matthew P. Johnson, OCL1, CISDD CUNY, F2004 Relational DB 38 Entity Relationship (E/R) Model A popular data model – useful to database designers Graphical representation of miniworld Helps design the database, not implement it E/R design is translated to a relational design relational design then implemented in an RDBMS Elements of model Entities Entity Sets Attributes Relationships (!= relations!) Matthew P. Johnson, OCL1, CISDD CUNY, F2004 39 Elements of E/R Model: Entity Sets Entity: like an object Entity set: set of one sort of entities or a concept e.g. President Bush Particular instance of a concept e.g. World leaders Generally, same set of attributes World Leader Represented by a rectangle A “good” entity set – you decide Common properties Correspond to class of phys. or bus. objects (People, products, accounts, grades, etc.) Matthew P. Johnson, OCL1, CISDD CUNY, F2004 40 Elements of E/R Model: Attributes Properties of entities in entity set Like fields in a struct Like columns in a table/spreadsheet Like data members in an object Values in some domain (e.g., ints, strings) Represented by ovals: ID Name Assumed atomic But could have limited structure Ints, strings, etc. Matthew P. Johnson, OCL1, CISDD CUNY, F2004 Student 41 Elements of E/R Model: Relationships Connect two or more entity sets e.g. students enroll in courses Binary relationships: connect two entity sets – most common Multiway relationships: connect several entity sets Represented by diamonds Students Enroll Courses Matthew P. Johnson, OCL1, CISDD CUNY, F2004 42 Elms of E/R Model: Rel’ships (cont’d) Students Enroll in courses Courses are Held in rooms The E/R data model: Students ID Enroll Courses Name Held Rooms Matthew P. Johnson, OCL1, CISDD CUNY, F2004 43 A little set theory A mathematical set a collection of members A set is defined by its members Sets specified by listing: “Are you in or are you out?” No other structure, no order, no duplicates allowed {1, 2, 3, …} = N {1, 2, George Bush} (not useful in DBMS) Or by “set-builder” notation: { x in N: 2 divides x} = ? { x in Presidents | reelected(x)} = ? {2x: x in N} = ? Matthew P. Johnson, OCL1, CISDD CUNY, F2004 44 A little set theory One set can be a subset of another (which is a superset) ReelectedPresidents is a subset of Presidents Also, RP is a proper subset of Pres – some lost reelection Given two sets X and Y, the cross product or Cartesian product is X x Y = {(x,y): x in X, y in Y} = the set of all ordered pairs in which the first comes from X and the second comes from Y Important: (x,y) != {x,y} In an order pair or tuple Order matters Duplicates are allowed Matthew P. Johnson, OCL1, CISDD CUNY, F2004 45 A little set theory Mathematically, a relation(ship) between X and Y is just a subset of X x Y = all those pairs (x,y) s.t. x is related to y Example: owner-of O on People, Cats The equals relation E on N, N: O(MPJ, Izzy) holds E(3,3) holds because 3 = 3 E(3,4) does not hold E is still a set: E = {(1,1), (2,2), (3,3), …} Father of relation F on People, People: F(GHWB, GWB) holds F(GWB, GHWB) does not hold Relations aren’t necessarily symmetric Matthew P. Johnson, OCL1, CISDD CUNY, F2004 46 Multiplicity of Relationships Many-many Many-one One-one Representation of relationships No arrow: many-to-many Sharp arrow: many-to-one Rounded arrow: “exactly one” “key constraint” One-one: Matthew P. Johnson, OCL1, CISDD CUNY, F2004 47 Multiplicity of Relationships Many-to-many: Students Enrolls Courses Many to one: a student lives in <= 1 residence hall Student Live Residence hall Many to exactly one: a student must live in a residence hall Student Live Residence hall Matthew P. Johnson, OCL1, CISDD CUNY, F2004 48 Multiplicity, set-theoretically Assume no vars below are equal Many-one means: One-many means: if (x1, y1) in R then (x2,y1) cannot be in R One-one means: if (x1,y1) in R then (x1,y2) cannot be in R if (x1,y1) in R, then neither (x2,y1) nor (x1,y2) can be in R Notice: one-one is stronger than many-one One-one implies both many-one and onemany Matthew P. Johnson, OCL1, CISDD CUNY, F2004 49 E/R Diagram Name Name Students Enrolls Courses ID ID Assisting ID TA Matthew P. Johnson, OCL1, CISDD CUNY, F2004 Name 50 E/R Diagrams Works if each TA is a TA of all students. Connection student-TA is only through the course But what if students were divided into sections, each section with a separate TA? Then, a student in OCL1 would be related to only one of the TA's for OCL1. Which one? 3-way relationship is helpful here Matthew P. Johnson, OCL1, CISDD CUNY, F2004 51 Multiway Relationships Students Courses Enrolls Enrolls entries: Students Carol Richard Al … Courses OCL1 OCL1 OCL1 … TAs Donald Paul Colin … TAs NB: Enrolls determines TA: (student, course) at most one TA Matthew P. Johnson, OCL1, CISDD CUNY, F2004 52 Converting multiway relships to binary Some models (e.g. ODL) limit relationships to binary Multiway relationship – equivalent collection of binary, many to one relationships Replace relationship with connecting entity set Course-of Students Student-of NB: Enrolls has no attributes! Courses Enrolls TA- of Matthew P. Johnson, OCL1, CISDD CUNY, F2004 TAs 53 Second multiway e.g.: renting movies Scenario: a Customer Rents a Movie from a VideoStore on a certain date date VideoStore Rental Movie Customer Q: Which entity does date belong to? A: To the fact of the renting Relationships can have attributes always (implicitly) many-one Matthew P. Johnson, OCL1, CISDD CUNY, F2004 54 Second multiway e.g.: renting movies But they don’t have to Relationship attributes can be replaced with (trivial) new entities date Date VideoStore Rental Movie Customer Matthew P. Johnson, OCL1, CISDD CUNY, F2004 55 Second multiway e.g.: renting movies Where can we draw arrows? date VideoStore Rental Movie Customer (store, video, customer) date ? Date is a relship att, implicitly determinied (store, video, date) customer ? (store, date, customer) video ? (video, date, customer) store ? Matthew P. Johnson, OCL1, CISDD CUNY, F2004 56 Arrow-drawing Q: Why does it matter? Round arrow benefit: Regular arraw benefit: Obvious: One item takes less space than many Less obvious: easier to access one item x than set of one item {x} In programming: an int v. a linked list with just one int Mapping to a set of either one elm or none seems bad But not implemented this way Always one element, but that value may be NULL Lesson: it pays to identify your relship’s multiplicity Matthew P. Johnson, OCL1, CISDD CUNY, F2004 57 Second multiway e.g.: renting movies date VideoStore Convert to binary? Rental Movie Customer date Rental MovieOf Movie StoreOf Store BuyerOf Customer Matthew P. Johnson, OCL1, CISDD CUNY, F2004 58 Roles in relationships Entity set appears more than once in a relship Generally distinct entities Each appearance is in a different role Edges labeled by roles Successor Pre-req Course Prereq Course (Pre-req) Accounting Finance-I Finance-I Calculus Matthew P. Johnson, OCL1, CISDD CUNY, F2004 Course (Successor) Finance-I Derivatives Finance-II Derivatives 59 Subclasses in the E/R model Some entities are special cases of other Conversely: some are generalizations Humans are specialized mammals Grad students are specialized students And, in turn, specialized mammals NB: These aren’t examples but subclasses Subclass A isa B Represented by a triangle Always one-to-one, though arrows omitted Root is more general Multiple inheritance is allowed! A single entity may consist of all components (sets of fields) in aribtrary ESs and their ancestors Matthew P. Johnson, OCL1, CISDD CUNY, F2004 60 Subclasses length stars Voices title “Lion year King”: atts of Movies; relship Voices Movies isa Lion King TX isa Chainsaw Massacre Weapon MurderMysteries Cartoons “Roger Component Roger Rabbit Matthew P. Johnson, OCL1, CISDD CUNY, F2004 Rabbit”: atts of Movies; relship Voices; att weapon 61 E/R inheritance v. OO inheritance In a OOP class hierarchy, children also inherit “attributes” from parents In E/R, an entity may be composed of components from multiple, not-directly-related ESs But an object is an instance of one class Roger Rabbit is composed of components from Cartoons, Murder Mysteries, and Movies We could create a Cartoon Murder Mysteries ES if there were any atts specific to them So the real difference: In E/R, can have implicit multiple inheritance between any set of IS-Aconnected nodes (sharing a root) Matthew P. Johnson, OCL1, CISDD CUNY, F2004 62 Design Principles Faithfulness Avoiding redundancy Simplicity Choice of relationships Picking elements Matthew P. Johnson, OCL1, CISDD CUNY, F2004 63 Faithfulness Is the relationship many-many or many-one? Are the attributes appropriate? Are the relationships applicable to the entities? Examples Courses & instructors maybe many-one, maybe many-many Bosses & subordinates maybe one-many, maybe many-many Matthew P. Johnson, OCL1, CISDD CUNY, F2004 64 Simplicity Einstein: Theories as simple as possible, but not simpler. Use as few elements as possible Minimum required relations No unnecessary attributes (will you be using this attribute?) Eliminate “spinning wheels” Example: how can we simplify this? Movies Owned-by Ownings Owns Matthew P. Johnson, OCL1, CISDD CUNY, F2004 Studios 65 Avoiding redundancy Say everything exactly once Minimize database storage requirements More important: prevent possible update errors simplest but not only e.g.: modify data one place but not the other – more later Example: Spot the redundancy Studios Own Name Movies Length Address Name StudioName Matthew P. Johnson, OCL1, CISDD CUNY, F2004 66 Next time We’ll finish E/R models and begin the relational model Readings will be posted Matthew P. Johnson, OCL1, CISDD CUNY, F2004 67