Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CS351 Introduction to Database systems [email protected] 1 About me Quan Wang – developer at Oracle Corp. • • • JDBC Web Services Cloud 2 Content Introduction – Why databases? Data Model 3 Why databases? It used to be about boring stuff: employee records, bank records, etc. Today, the field covers all the largest sources of data, with many new ideas. Web search. Data mining. Scientific and medical databases. Integrating information. 4 Why databases? You may not notice it, but databases are behind almost everything you do on the Web. Google searches. Queries at Amazon, eBay, etc. 5 Why databases? Databases often have unique concurrency-control problems. Many activities (transactions) at the database at all times. Must not confuse actions, e.g., two withdrawals from the same account must each debit the account. 6 Why databases? Changing landscape System R: Oracle, MySQL, IBM, MS NoSQL: MangoDB NewSQL: VoltDB Cloud: Amazon RDS, Google Bigquery Bigdata: Hadoop, MapR 7 Why databases? Job perspectives – – Startups Massive industry 8 Content Introduction – – Why database? What to cover? Data Model 9 What to cover? Database – – a set of inter-related data pertaining to some organization airline, university, store Database Management System (DBMS) – – one or more databases algorithms accessing the databases 10 11 What to cover? Database Theory – – relational model and algebra norm forms Design of databases – – normalization E/R, UML 12 What to cover? Database programming – SQL, stored procedures. Integration? – – ODBC, JDBC O-R Mapping 13 What not covered? Not covered – – – – DB tuning DB administration(DBA) DB implementation Programming languages • May need to learn languages and the DB related aspects 14 Content Introduction – – – Why database? What to cover? Logistics Data Model 15 Logistics Email: [email protected] Office hour – – – Tuesday & Thursday 3 pm to 5pm 201L 16 Logistics No TA No detailed comments on assignments Distribute answer keys when appropriate Check discrepancies Raise issues 17 Logistics Course page: – – – Http://encs.vancouver.wsu.edu/~quwa ng weekly schedule assignments • • • due Thursdays before class paper submission preferred email submission ok 18 Logistics Assignments 30% Team project 20% Midterms 25% Final 25% 19 Logistics Team project – – – mini-ebay Max 3 people each team Single grade for each team Assignments and project – Late submission 10% penalty 20 Logistics Databases – – MySQL SQLite • /usr/bin/sqlite3 21 Logistics Exams – – – Lectures, textbook, and assigned readings Only areas covered in the lectures Not curved 22 Content Introduction Why database? What to cover? Logistics Data Model – flat files 23 Flat files data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 operations – – – check balance deposit or withdraw change address 24 Flat files data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – code coupled with data format • changing delimiter would require code change 25 Flat files data Savings.data: David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 Checkings.data: David, 789 NE Sandy Blvd, 1500, 08/20/1998, 0.001! Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0 problems – redundancy leads to inconsistency 26 Flat files data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – concurrency • withdraw calls for file lock – – withdraw $100 from the same account at the same time, synchronized withdraw $100 from different accounts at the same time, synchronized unnecessarily 27 Flat files data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 problems – user interface • file based algorithms 28 Flat files Why not flat files? – – – – code coupled with data redundancy leading to inconsistency low concurrency unfriendly user interface 29 Relational data model database as abstraction data model hides data format SQL hides algorithms runtime handles concurrency 30 Content Introduction Data Model – – flat files relational data model • • • relations schema keys 31 What is a Data Model? 1. Underlying structure of a database. 2. Mathematical representation of data. Examples: • • • • flat file model = files relational model = tables; semistructured model = trees/graphs; key-value model=key value pairs. 3. Operations on data. 4. Constraints. 32 A Relation is a Table Attributes (column headers) Tuples (rows) name manf Winterbrew Pete’s Bud Lite Anheuser-Busch Beers Relation name 33 Schemas Relation schema = relation name and attribute list. Optionally: types of attributes. Example: Beers(name, manf) or Beers(name: string, manf: string) Database = collection of relations. Database schema = set of all relation schemas in the database. 34 Why Relations? Very simple model. Often matches how we think about data. Abstract model that underlies SQL, the most important database language today. 35 Example Sells( Joe’s Joe’s Sue’s Sue’s bar, Bud Miller Bud Coors beer, price ) 2.50 2.75 2.50 3.00 Bars( name, addr Joe’s Maple St. Sue’s River Rd. 37 ) Database Schemas in SQL SQL is primarily a query language, for getting information from a database. But SQL also includes a data-definition component, DDL, for describing database schemas. 38 Creating (Declaring) a Relation Simplest form is: CREATE TABLE <name> ( <list of elements> ); To delete a relation: DROP TABLE <name>; 39 Elements of Table Declarations Most basic element: an attribute and its type. The most common types are: INT or INTEGER (synonyms). REAL or FLOAT (synonyms). CHAR(n ) = fixed-length string of n characters. VARCHAR(n ) = variable-length string of up to n characters. 40 Example: Create Table CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL ); 41 SQL Values Integers and reals are represented as you would expect. Strings are too, except they require single quotes. Two single quotes = real quote, e.g., ’Joe’’s Bar’. Any value can be NULL. 42 Dates and Times DATE and TIME are types in SQL. The form of a date value is: DATE ’yyyy-mm-dd’ Example: DATE ’2007-09-30’ for Sept. 30, 2007. 43 Times as Values The form of a time value is: TIME ’hh:mm:ss’ with an optional decimal point and fractions of a second following. Example: TIME ’15:30:02.5’ = two and a half seconds after 3:30PM. 44 45 Declaring Keys An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE. Either says that no two tuples of the relation may agree in all the attribute(s) on the list. There are a few distinctions to be mentioned later. 46 Declaring Single-Attribute Keys Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute. Example: CREATE TABLE Beers ( name CHAR(20) UNIQUE, manf CHAR(20) ); 47 Declaring Multiattribute Keys A key declaration can also be another element in the list of elements of a CREATE TABLE statement. This form is essential if the key consists of more than one attribute. May be used even for one-attribute keys. 48 Example: Multiattribute Key The bar and beer together are the key for Sells: CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL, PRIMARY KEY (bar, beer) ); 49 PRIMARY KEY vs. UNIQUE 1. There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes. 2. No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL. 50 Content Introduction Data Model flat files relational data model – – – relations schema keys relations vs files 51 relations vs files data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 52 relations vs files schema Savings( name CHAR(50), addr CHAR(50), balance REAL, opened DATE, interest REAL); 53 relations vs files relation name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 operations – – – check balance deposit or withdraw change address 54 relations vs files relation name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 advantage – code coupled with data format? 55 relations vs files relation name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 advantage – concurrency • simultaneous withdraw calls 56 relations vs files relations Savings(name, addr, balance, opened, interest) Checkings(name, addr, balance, opened, interest) advantage – redundancy? 57 relations vs files relations Savings name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 Checkings name addr balance opened interest David 123123 Sandy Blvd 1500 1998 0.001 Lisa 432 Alberta St 3500 1990 0 advantage – redundancy? 58 relations vs files relations Customers(cust_id, name, addr) Savings(cust_id, balance, opened, interest) Checkings(cust_id, balance, opened, interest) advantage – redundancy? 59 relations vs files Savings relations cust_id balance opened interest c1 500.75 2010 0.005 c2 4000.00 2001 0.006 Customers cust_id name addr c1 David 123 NW Flander St Checkings c2 Lisa 432 Alberta St advantage – cust_id balance opened interest c1 1200 1998 0.001 c2 3500 1990 0 redundancy? 60 relations vs files relation name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 advantage – user interface • natural language like support desirable 61 relations vs files relation NAME ADDR BALANCE OPENED David 123 NW 500.75 Flander St 2010 0.005 Lisa 432 Alberta St 2001 0.006 4000.00 INTEREST advantage – user interface SELECT name, addr FROM savings WHERE balance>2000 62 relations vs files relation NAME ADDR BALANCE OPENED David 123 NW 500.75 Flander St 2010 0.005 Lisa 432 Alberta St 2001 0.006 4000.00 INTEREST advantage – SELECT name, addr, balance FROM savings user interface ORDER BY balance 63 Content Introduction Data Model – – – flat files relational data model relations vs files 64 Content Introduction Data Model – – – – flat files relational data model relations vs files semi-structured data models • XML and JSON 65 XML <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher> </Book> 66 JSON { "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } } 67 XML and JSON, side-by-side { <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher> </Book> "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } } 68 XML Schema <xs:element name="Book"> <xs:complexType> <xs:sequence> <xs:element name="Title" type="xs:string" /> <xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Date" type="xs:gYear" /> <xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> 69 JSON Schema { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 70 { String type "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { <xs:element name="Title" type="xs:string" /> "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 71 List type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", <xs:element name="Authors"> <xs:complexType> "properties": { <xs:sequence> "Book": { <xs:element name="Author" type="xs:string" maxOccurs="5"/> "type": "object", </xs:sequence> </xs:complexType> "properties": { </xs:element> "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 72 Year type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { <xs:element name="Date" type="xs:gYear" /> "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 73 { Enumeration type "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false <xs:element name="Publisher" minOccurs="0"> } <xs:simpleType> }, <xs:restriction base="xs:string"> "required": ["Book"], <xs:enumeration value="Springer" /> "additionalProperties": false <xs:enumeration value="MIT Press" /> } <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType> </xs:element> 74 Semi-Structured Data Models XML Adoption – – Web Services (SOAP) XML DB JSON Adoption – – Web Services (REST) MongoDB 75 Content Introduction Data Model – – – – – flat files relational data model relations vs files semi-structured data models history 76 History Hierarchical (IMS): late 1960’s and 1970’s Network (CODASYL): 1970’s Relational: 1970’s and early 1980’s Entity-Relationship: 1970’s Object-oriented: late 1980’s and early 1990’s Object-relational: late 1980’s and early 1990’s Semi-structured (XML): late 1990’s to the present 77