Download ppt

CS351 Introduction to Database systems [email protected] 1 About me Quan Wang – developer at Oracle Corp. • • • JDBC Web Services Cloud 2 Content Introduction – Why databases? Data Model 3 Why databases? It used to be about boring stuff: employee records, bank records, etc. Today, the field covers all the largest sources of data, with many new ideas.  Web search.  Data mining.  Scientific and medical databases.  Integrating information. 4 Why databases? You may not notice it, but databases are behind almost everything you do on the Web.  Google searches.  Queries at Amazon, eBay, etc. 5 Why databases? Databases often have unique concurrency-control problems.  Many activities (transactions) at the database at all times.  Must not confuse actions, e.g., two withdrawals from the same account must each debit the account. 6 Why databases? Changing landscape  System R: Oracle, MySQL, IBM, MS  NoSQL: MangoDB  NewSQL: VoltDB  Cloud: Amazon RDS, Google Bigquery  Bigdata: Hadoop, MapR 7 Why databases? Job perspectives – – Startups Massive industry 8 Content Introduction – – Why database? What to cover? Data Model 9 What to cover? Database – – a set of inter-related data pertaining to some organization airline, university, store Database Management System (DBMS) – – one or more databases algorithms accessing the databases 10 11 What to cover? Database Theory – – relational model and algebra norm forms Design of databases – – normalization E/R, UML 12 What to cover? Database programming – SQL, stored procedures. Integration? – – ODBC, JDBC O-R Mapping 13 What not covered? Not covered – – – – DB tuning DB administration(DBA) DB implementation Programming languages • May need to learn languages and the DB related aspects 14 Content Introduction – – – Why database? What to cover? Logistics Data Model 15 Logistics   Email: [email protected] Office hour – – – Tuesday & Thursday 3 pm to 5pm 201L 16 Logistics  No TA   No detailed comments on assignments Distribute answer keys when appropriate   Check discrepancies Raise issues 17 Logistics Course page: – – – Http://encs.vancouver.wsu.edu/~quwa ng weekly schedule assignments • • • due Thursdays before class paper submission preferred email submission ok 18 Logistics     Assignments 30% Team project 20% Midterms 25% Final 25% 19 Logistics  Team project – – –  mini-ebay Max 3 people each team Single grade for each team Assignments and project – Late submission 10% penalty 20 Logistics  Databases – – MySQL SQLite • /usr/bin/sqlite3 21 Logistics  Exams – – – Lectures, textbook, and assigned readings Only areas covered in the lectures Not curved 22 Content Introduction Why database? What to cover? Logistics    Data Model – flat files 23 Flat files  data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006  operations – – – check balance deposit or withdraw change address 24 Flat files  data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006  problems – code coupled with data format • changing delimiter would require code change 25 Flat files  data Savings.data: David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 Checkings.data: David, 789 NE Sandy Blvd, 1500, 08/20/1998, 0.001! Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0  problems – redundancy leads to inconsistency 26 Flat files  data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006  problems – concurrency • withdraw calls for file lock – – withdraw $100 from the same account at the same time, synchronized withdraw $100 from different accounts at the same time, synchronized unnecessarily 27 Flat files  data David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005 Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006  problems – user interface • file based algorithms 28 Flat files  Why not flat files? – – – – code coupled with data redundancy leading to inconsistency low concurrency unfriendly user interface 29 Relational data model  database as abstraction    data model hides data format SQL hides algorithms runtime handles concurrency 30 Content Introduction Data Model – – flat files relational data model • • • relations schema keys 31 What is a Data Model? 1. Underlying structure of a database. 2. Mathematical representation of data.  Examples: • • • • flat file model = files relational model = tables; semistructured model = trees/graphs; key-value model=key value pairs. 3. Operations on data. 4. Constraints. 32 A Relation is a Table Attributes (column headers) Tuples (rows) name manf Winterbrew Pete’s Bud Lite Anheuser-Busch Beers Relation name 33 Schemas Relation schema = relation name and attribute list.  Optionally: types of attributes.  Example: Beers(name, manf) or Beers(name: string, manf: string) Database = collection of relations. Database schema = set of all relation schemas in the database. 34 Why Relations? Very simple model. Often matches how we think about data. Abstract model that underlies SQL, the most important database language today. 35 Example Sells( Joe’s Joe’s Sue’s Sue’s bar, Bud Miller Bud Coors beer, price ) 2.50 2.75 2.50 3.00 Bars( name, addr Joe’s Maple St. Sue’s River Rd. 37 ) Database Schemas in SQL SQL is primarily a query language, for getting information from a database. But SQL also includes a data-definition component, DDL, for describing database schemas. 38 Creating (Declaring) a Relation Simplest form is: CREATE TABLE <name> ( <list of elements> ); To delete a relation: DROP TABLE <name>; 39 Elements of Table Declarations Most basic element: an attribute and its type. The most common types are:  INT or INTEGER (synonyms).  REAL or FLOAT (synonyms).  CHAR(n ) = fixed-length string of n characters.  VARCHAR(n ) = variable-length string of up to n characters. 40 Example: Create Table CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL ); 41 SQL Values Integers and reals are represented as you would expect. Strings are too, except they require single quotes.  Two single quotes = real quote, e.g., ’Joe’’s Bar’. Any value can be NULL. 42 Dates and Times DATE and TIME are types in SQL. The form of a date value is: DATE ’yyyy-mm-dd’  Example: DATE ’2007-09-30’ for Sept. 30, 2007. 43 Times as Values The form of a time value is: TIME ’hh:mm:ss’ with an optional decimal point and fractions of a second following.  Example: TIME ’15:30:02.5’ = two and a half seconds after 3:30PM. 44 45 Declaring Keys An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE. Either says that no two tuples of the relation may agree in all the attribute(s) on the list. There are a few distinctions to be mentioned later. 46 Declaring Single-Attribute Keys Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute. Example: CREATE TABLE Beers ( name CHAR(20) UNIQUE, manf CHAR(20) ); 47 Declaring Multiattribute Keys A key declaration can also be another element in the list of elements of a CREATE TABLE statement. This form is essential if the key consists of more than one attribute.  May be used even for one-attribute keys. 48 Example: Multiattribute Key The bar and beer together are the key for Sells: CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL, PRIMARY KEY (bar, beer) ); 49 PRIMARY KEY vs. UNIQUE 1. There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes. 2. No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL. 50 Content Introduction Data Model flat files relational data model – –    – relations schema keys relations vs files 51 relations vs files  data savings.dat NAME,ADDR,BALANCE,OPENED,INTEREST David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005! Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006 52 relations vs files  schema Savings( name CHAR(50), addr CHAR(50), balance REAL, opened DATE, interest REAL); 53 relations vs files   relation name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 operations – – – check balance deposit or withdraw change address 54 relations vs files   relation name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 advantage – code coupled with data format? 55 relations vs files relation   name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 advantage – concurrency • simultaneous withdraw calls 56 relations vs files  relations Savings(name, addr, balance, opened, interest) Checkings(name, addr, balance, opened, interest)  advantage – redundancy? 57 relations vs files relations  Savings name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 Checkings  name addr balance opened interest David 123123 Sandy Blvd 1500 1998 0.001 Lisa 432 Alberta St 3500 1990 0 advantage – redundancy? 58 relations vs files  relations Customers(cust_id, name, addr) Savings(cust_id, balance, opened, interest) Checkings(cust_id, balance, opened, interest)  advantage – redundancy? 59 relations vs files Savings  relations cust_id balance opened interest c1 500.75 2010 0.005 c2 4000.00 2001 0.006 Customers cust_id name addr c1 David 123 NW Flander St Checkings c2 Lisa  432 Alberta St advantage – cust_id balance opened interest c1 1200 1998 0.001 c2 3500 1990 0 redundancy? 60 relations vs files relation   name addr balance opened interest David 123 NW Flander St 500.75 2010 0.005 Lisa 432 Alberta St 4000.00 2001 0.006 advantage – user interface • natural language like support desirable 61 relations vs files   relation NAME ADDR BALANCE OPENED David 123 NW 500.75 Flander St 2010 0.005 Lisa 432 Alberta St 2001 0.006 4000.00 INTEREST advantage – user interface SELECT name, addr FROM savings WHERE balance>2000 62 relations vs files   relation NAME ADDR BALANCE OPENED David 123 NW 500.75 Flander St 2010 0.005 Lisa 432 Alberta St 2001 0.006 4000.00 INTEREST advantage – SELECT name, addr, balance FROM savings user interface ORDER BY balance 63 Content Introduction Data Model – – – flat files relational data model relations vs files 64 Content Introduction Data Model – – – – flat files relational data model relations vs files semi-structured data models • XML and JSON 65 XML <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher> </Book> 66 JSON { "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } } 67 XML and JSON, side-by-side { <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher> </Book> "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" } } 68 XML Schema <xs:element name="Book"> <xs:complexType> <xs:sequence> <xs:element name="Title" type="xs:string" /> <xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Date" type="xs:gYear" /> <xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> 69 JSON Schema { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 70 { String type "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { <xs:element name="Title" type="xs:string" /> "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 71 List type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", <xs:element name="Authors"> <xs:complexType> "properties": { <xs:sequence> "Book": { <xs:element name="Author" type="xs:string" maxOccurs="5"/> "type": "object", </xs:sequence> </xs:complexType> "properties": { </xs:element> "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 72 Year type { "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { <xs:element name="Date" type="xs:gYear" /> "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false } 73 { Enumeration type "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false <xs:element name="Publisher" minOccurs="0"> } <xs:simpleType> }, <xs:restriction base="xs:string"> "required": ["Book"], <xs:enumeration value="Springer" /> "additionalProperties": false <xs:enumeration value="MIT Press" /> } <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType> </xs:element> 74 Semi-Structured Data Models XML Adoption – – Web Services (SOAP) XML DB JSON Adoption – – Web Services (REST) MongoDB 75 Content Introduction Data Model – – – – – flat files relational data model relations vs files semi-structured data models history 76 History  Hierarchical (IMS): late 1960’s and 1970’s  Network (CODASYL): 1970’s  Relational: 1970’s and early 1980’s  Entity-Relationship: 1970’s  Object-oriented: late 1980’s and early 1990’s  Object-relational: late 1980’s and early 1990’s  Semi-structured (XML): late 1990’s to the present 77

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download ppt