Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Relational algebra wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Functional Database Model wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Clusterpoint wikipedia , lookup
Using Databases in ICS4U Grant Hutchison TDSB Agenda • • • • Motivation Why teach Database Technologies? Teaching DBMS using Apache Derby? Alternative Data Technologies – Big Data / NoSQL / R Motivation There is a Data Explosion Occurring… and is NOT about to slowdown!! Volume Velocity Variety Where is the data coming from? and many more… Curriculum Integration – ICS4U A3.1 demonstrate the ability to read from, and write to, an external file (e.g., text file, binary file, database, XML file) from within a computer program; Pre-DB Units in ICS4U • Object-Oriented Design – UML, CRC – Software Engineering Principles – Implementation • Java, C++, Python, C# - typical languages • Algorithms – design : sort, search – analysis : predict and test runtime expectations • Data – Files : reading, writing, parsing (basics only) – Structures : Lists, Stacks, Queues, Trees (basics only) – Scalable data serving components : RDBMS, NoSQL, etc… Data (Preservation and Access) File based solutions Objects can be stored in flat files where each line of data represents an instance of an object. Problem : data (a simple order) Bob, Smith, 2 Main Street, Toronto, G, "Jan 15, 2012",C, Hammer, 2, 9.25, Lumber (4ft), 3, 10.85, .13, 57.69 When to use a Database in student projects? Applications that need to : store and retrieve data quickly and reliability while users interact with the provided user interface (UI). Problems with simply using Files for Objects • no query based access • highly structured Object Design Would you design a Class with these attributes? • • • • • • • • • cust_name cust_street cust_city order_num item_num description cost quantity total Customer Composition Item Order Relational Database 1980 New and Hot Quick adoption 1990 Defacto Standard DBMS Object Relational 2000 Relational / XML Hybrid DB 2010 ++ Object Oriented DBMS Parallel RDBMS Scalable Transactions (OLTP) Data warehouse In Memory DBs Open Source DBs NoSQL Analytical DBs Big Data • Volume • Velocity • Variety Introduction to SQL Declarative programming language designed for querying and managing data stored in RDBMS (Relational DBMS). Declarative ? Describe the what and not the how !! SQL Query SELECT <attributes> FROM <one or more relations> WHERE <conditions> • Not case-sensitive • Must know the schema of the database • Tools usually used to manage • Entity-Relationship Diagrams (ER Diagrams) SQL Data Definition Language (DDL) – Create/alter/delete tables and their attributes Data Manipulation Language (DML) – Access one or more tables – Insert/delete/modify tuples/records in tables CRUD – SQL operations Operation C R U D Create Read Update Delete SQL statement INSERT SELECT UPDATE DELETE SCHEMA - Table Structure Table Name Attribute / Column PRODUCT DESCRIPTION PRICE CATEGORY MANUFACTURER Data type CREATE TABLE PRODUCT ( DESCRIPTION VARCHAR(40), PRICE DECIMAL(9,2), CATEGORY VARCHAR (40), MANUFACTURER VARCHAR (40)) Table Structure PRODUCT DESCRIPTION PRICE CATEGORY MANUFACTURER Hammer $19.99 Hand Tools Stanley Lumber (4 ft) $29.99 Raw Materials ForestWorld INSERT INTO PRODUCT VALUES ( "Hammer", 19.99, "Hand Tools", "Stanley") INSERT INTO PRODUCT VALUES ( "Lumber (4 ft) ", 29.99, "Raw Materials", "ForestWorlds") Tables Explained The schema of a table is the table name and its attributes. A key is an attribute whose values are used to ensure uniqueness of the tuple or assist in defining relationships with out tables. Data Types in SQL Strings – CHAR, VARCHAR, LONG VARCHAR, CLOB Numeric – SMALLINT, INT, BIGINT, DECIMAL, REAL, FLOAT,DOUBLE Dates and Times – DATE, TIME, TIMESTAMP Binary – CHAR FOR BIT DATA, VARCHAR FOR BIT DATA, LONG VARCHAR FOR BIT DATA, BLOB SQL Query SELECT <attributes> FROM <one or more relations> WHERE <conditions> • Not case-sensitive • Must know the schema of the database • Tools usually used to manage • Entity-Relationship Diagrams (ER Diagrams) Simple SQL Query DESCRIPTION PRICE CATEGORY MANUFACTURER Hammer $19.99 Hand Tools Stanley Lumber (4 ft) $29.99 Raw Materials ForestWorld SELECT * FROM PRODUCT WHERE CATEGORY="Hand Tools" DESCRIPTION PRICE CATEGORY MANUFACTURER Hammer $19.99 Hand Tools Stanley Simple SQL Query DESCRIPTION PRICE CATEGORY MANUFACTURER Hammer $19.99 Hand Tools Stanley Lumber (4 ft) $29.99 Raw Materials ForestWorld SELECT DESCRIPTION FROM PRODUCT WHERE PRICE > 20 DESCRIPTION PRICE CATEGORY MANUFACTURER Lumber (4 ft) $29.99 Raw Materials ForestWorld The LIKE operator SELECT * FROM PRODUCTS WHERE DESCRIPTION LIKE ‘%Lumber%’ % = any sequence of characters _ = any single character Ordering the Results SELECT DESCRIPTION, PRICE FROM PRODUCT WHERE CATEGORY=‘Hand Tools’ AND PRICE > 20 ORDER BY PRICE Ordering is ascending, unless you append the DESC keyword. Normalization Process of designing tables to avoid data duplication and maintain consistency. 1NF – Remove multiple items for each tuple/row 2NF – Remove duplicate data for each tuple/row 3NF – Ensure attributes are dependant on the key and nothing but the key 4NF – Not usually needed Object Design Would you design a Class with these attributes? • • • • • • • • • cust_name cust_street cust_city order_num item_num description cost quantity total Customer Aggregate Item Order Joins Bob, Smith, 2 Main, Toronto, G, "Jan 15, 2012",C, .13, 57.69, "Hammer", 2, 9.25, "Lumber (4ft)", 3, 10.85 CUST (cid, fname, lname, street, city, type) ORDER (oid, cid, o_date, status, tax_r, total ) LINEITEM (oid, iid, prod_id, quantity) PRODUCT (pid, description, price) Complex Joins – Answer Questions CUST (cid, name, street, city, type) ORDER (oid, cid, o_date, status, tax_r, total ) LINEITEM (oid, iid, prod_id, quantity) PRODUCT (pid, description, price) Find the Average Order for preferred customers in 2012. SELECT FROM WHERE avg(o.total) CUST c, ORDER o c.type = "P" AND c.cid = o.cid AND YEAR(o.o_date) = "2012" SQL Aggregation Functions SELECT avg(price) FROM product WHERE maker=“Toyota” SELECT count(*) FROM product WHERE year > 1995 There are many more aggregation functions. Purchase Simple Aggregations Product Date Price Quantity Bagel 10/21 1 20 Banana 10/3 0.5 10 Banana 10/10 1 10 Bagel 10/25 1.50 20 SELECT Sum(price * quantity) FROM Purchase WHERE product = ‘bagel’ $50 ($20+$30) Deletions DELETE FROM CUST WHERE name = "Hutchison" Careful as DELETE could remove: • NO ROWS • ONE ROW • ALL ROWS Updates UPDATE SET WHERE PRODUCT price = price * .5 category = "Hand Tools" Apache Derby Apache derby is an open source, small footprint, SQL standards-based Java relational database engine. How Small? • only 2 MB (derby.jar) History of Cloudscape / Apache Derby 1997 Cloudscape Inc. 1999 Acquired by Informix 2001 Acquired by IBM 2004 Open Source Apache Derby Derby Modes Embedded or Networked Networked Mode The Derby server can be started in its own JVM and then multiple Java applications can use the same database. Connecting Java Apps to Derby • Add derby.jar to your CLASSPATH • Use Standard JDBC APIs to access data • JDBC documentation is in the JDK Alternative Data Technologies Big Data Systems • clustered file system (Hadoop) • analysis usually using MapReduce programs (not SQL) • Batch oriented • Analytics not transactions NoSQL • document style of data query and storage (JSON usually) • scalable across servers • complex queries can be challenging • Used primarily for web apps