COMP30311: Database Programming: Architectures and Issues
Norman Paton
University of Manchester
[email protected]

Observations
Databases are mostly accessed by applications:
  Transaction processing: many small query or update requests (e.g., flight booking, account management).
  Analytical processing: more complex queries, but less frequent updates (e.g., management information systems).
In practice, databases are hardly ever accessed by users typing a query language at a prompt.

Client-Server Architecture
The client:
  Runs the application.
  Invokes requests on the database using the query/update language.
The server:
  Manages concurrency, caching, etc.
[Figure: Client-1 and Client-2 connected over a network to a server and its database.]

Client-Server Issues
Classical model has a thick client:
  Process flow.
  Business rules.
  Constraints.
  ...
The server is essentially a shared fact store.
Thin clients involve more central code:
  Process flow.
  Business rules.
  Constraints.
  ...
Database servers are able to be much more than fact stores.

Classical Relational Database
Clients encode most application functionality.
Clients are written using embedded SQL.
Calls to the database use SQL-92.
[Figure: a client (C) sends SQL to a DBMS holding tables and views.]

Modern Relational Database
Application functionality is divided: there is no strict thin or thick client choice.
Clients are written using embedded languages.
The database has stored programs, using embedded languages or extensions to SQL.
[Figure: a client (C) sends SQL and calls to a DBMS holding tables, views, triggers and procedures.]

Multi-Tier Environment
[Figure: three layers (User/Application Layer, Middleware Layer, Database Server Layer) containing User Interface, Application, Application Library, Middleware and Database Server components.]

Multi-Tier Environments
Greater flexibility, and thus potentially scalability.
Data-intensive tasks near the database.
Compute-intensive tasks in the middle layer or on the client.
Example of multi-tier platform:
  A Web Browser interacts with a Web Server using CGI (Common Gateway Interface).
  The Web Server runs a Java Servlet that interacts with a DBMS using JDBC.

Where Does SQL Fit in?
SQL acts as the API to the database (if relational).
Features of SQL:
  Standardised.
  Declarative.
  Flexible (Queries, Updates, Administration).
Problems with SQL:
  Non-trivial to learn (not good for end users).
  Poor for repetitive tasks (e.g., for manual data entry).
  Of limited computational power (so used with other languages).

Programming Databases
Options include:
  Embed query language in existing programming language (e.g., JDBC, SQLJ).
  Extend query language with programming features (e.g., SQL-99, PL/SQL).
  Extend programming language with database features (no current products?).
  Map database constructs to programming constructs as in Object Databases and JDO (e.g., FastObjects, Objectivity).
  Provide database components for programming environments (e.g., Delphi, ADO.NET).

Embedded SQL Example
    EXEC SQL BEGIN DECLARE SECTION;
      VARCHAR name[20];                 // Data passed C <-> SQL
      VARCHAR type[20];                 // Result variable (declaration and size assumed; not on the slide)
    EXEC SQL END DECLARE SECTION;
    EXEC SQL SELECT type INTO :type     // Single valued result
             FROM station
             WHERE name=:name;          // Parameter
    type.arr[type.len] = '\0';
    printf("%s\n",type.arr);            // Note String Format

JDBC Example
    Connection conn = DriverManager.getConnection(url,args[0],args[1]);
    Statement stmt = conn.createStatement();
    ResultSet rset = stmt.executeQuery("select T# from TRAIN");
    while (rset.next())
      System.out.println(rset.getString(1));

PL/SQL Example
    declare
      cursor c1 is
        select t# from train where source = 'Edinburgh';
    begin
      for ed_train in c1 loop
        insert into edinburgh values (ed_train.t#);
      end loop;
    end;

Multi-Language Environments
Where two languages are used together, a mapping is required between their type systems.
    SQL-92     C
    INTEGER    int
    VARCHAR    char*
    tuple      struct
    table      array

Impedance Mismatch
The problems encountered linking two independently developed languages are known as the impedance mismatch, which has two aspects:
  A type system mismatch that affects programmer productivity.
  An evaluation strategy mismatch that affects performance.

Type System Mismatch
Database types are not supported directly in the programming language, so, for example, relations may have to be mapped to iterators.
Programming language types are not supported directly in the database, and thus have to be mapped, for example, to relations for storage.
The programming language type checker cannot check the legality of embedded calls, leading to runtime errors.

Evaluation Strategy Mismatch
Database operations typically act on and return collections.
Programming language operations typically act on and return single values.
Query results may be computed in their entirety and cached before any access from the programming language.
The database may retrieve data that is never consumed by the programming language.

Summary
There are many choices in database programming:
  Which technologies to use.
  Which architecture to use.
Many non-trivial decisions may significantly influence:
  System performance.
  Development and maintenance costs.

Further Reading
Oracle Database Application Developers Guide – Fundamentals [Chapter 1: Programmatic Environments].

JDBC: Programming Relational Databases from Java

Trains Database Schema
[Figure: Trains database schema relating Customer, Booking, Station, District, Visit and Train, with one-to-many cardinalities.]
See handout for the relational schema and example programs.

JDBC and SQLJ
There are two standard interfaces allowing relational databases to be accessed and manipulated from Java:
  JDBC: A class library that allows dynamic SQL statements to be called from Java.
  SQLJ: A preprocessor that allows static SQL statements to be embedded in Java.
JDBC is much more widely used.

JDBC
JDBC can be used in client applets or applications, or (in some database systems) for implementing server-side functionality.
JDBC involves no extensions to the syntax of Java.
The JDBC package is imported thus:
    import java.sql.*;
Specific database systems are accessed using vendor or third party drivers:
    DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());

JDBC Database Interaction
[Figure: an Application uses the Driver Manager, which loads drivers (Oracle Driver, mySQL Driver); a Connection creates Statement, Prepared Statement and Callable Statement objects, each of which yields a ResultSet.]

Connecting to a Database
Statements and transactions are associated with connections.
There are several ways of establishing a connection. An example is:
    String url = "jdbc:oracle:thin:@sr.cs.man.ac.uk:1526:teach";
    Connection conn = DriverManager.getConnection(url,username,password);

Connection URL
The URL is of the form:
    jdbc:oracle:<drivertype>:@<hoststring>
An example hoststring is:
    aardvark.cs.man.ac.uk:1526:teach
Different driver types use pure Java or include native code, and use generic or custom network protocols.
In the above, 1526 is the port, and teach is the system identifier.

Single Slide Example
    import java.sql.*;
    class Trains {
      public static void main (String args []) throws SQLException {
        DriverManager.registerDriver(...);
        String url = "...";
        Connection conn = DriverManager.getConnection(url,args[0],args[1]);
        Statement stmt = conn.createStatement();
        ResultSet rset = stmt.executeQuery("select T# from TRAIN");
        while (rset.next())
          System.out.println(rset.getString(1));
      }
    }

Statements
Queries are run against the database through the creation and execution of statements:
    Statement stmt = conn.createStatement();
    ResultSet rset = stmt.executeQuery("select T# from TRAIN");
Note that the query is a String, which could be constructed at runtime if required.
Note the potential for runtime errors if the query is invalid.

Query Results
The result of executing a query is a ResultSet, which supports:
  Iterator functionality, through boolean next(), boolean previous().
  Tuple access functionality, as described on the next slide.
  Update functionality, for results from simple queries, through deleteRow(), updateXXX().
  Control functionality, through setFetchSize(int rows).

Accessing Result Tuples
In JDBC there is no predefined Java type for the result of a query, so attribute values are retrieved by:
  getXXX() functions, where XXX is the result type.
  The argument to the function is either the column position (starting from 1) or its name.
Note the potential for runtime errors if the result is not as anticipated.

Prepared Statements - 1
A PreparedStatement object allows an SQL statement to be run multiple times, with different parameters, without the SQL being recompiled by the database.
Simple example:
    PreparedStatement pstmt = conn.prepareStatement(
      "select t# from train where source = ?");
    pstmt.clearParameters();
    pstmt.setString(1,args[2]);
    ResultSet rset = pstmt.executeQuery();

Prepared Statements - 2
Creating a prepared statement: formal parameters are identified by "?"s:
    PreparedStatement pstmt = conn.prepareStatement(
      "insert into booking values (?,?,?)");
Parameters are bound using setXXX(pos,val) (pos starts from 1):
    pstmt.setString(1,args[2]);
The request is executed using executeQuery() or executeUpdate().

Single Slide Update
    import java.sql.*;
    class MakeBooking {
      public static void main (String args []) throws SQLException {
        DriverManager.registerDriver(...);
        String url = "...";
        Connection conn = DriverManager.getConnection(url,args[0],args[1]);
        PreparedStatement pstmt = conn.prepareStatement(
          "insert into booking values (?, ?, ?)");
        pstmt.clearParameters();
        pstmt.setString(1,args[2]);
        pstmt.setString(2,args[3]);
        pstmt.setDate(3,java.sql.Date.valueOf(args[4]));
        pstmt.executeUpdate();
      }
    }

Update Results
Statement and PreparedStatement objects can be associated with queries and updates (as strings).
The result types of the outputs are different, however, so separate ResultSet executeQuery() and int executeUpdate() methods are required.
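
The difference between the two execution methods can be sketched as a pair of helper methods. This is a minimal sketch, not the course's own code: the class name BookingDao, the method names and the parameter names are hypothetical, and the meaning of the three booking columns is not given on the slides, so they are passed through as opaque values. It also uses try-with-resources so that statements and result sets are closed promptly.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper class; table and column names follow the slides. */
final class BookingDao {

    /** A query: executeQuery() returns a ResultSet that is iterated row by row. */
    static List<String> trainsFrom(Connection conn, String source) throws SQLException {
        try (PreparedStatement pstmt =
                 conn.prepareStatement("select t# from train where source = ?")) {
            pstmt.setString(1, source);                 // parameter positions start from 1
            try (ResultSet rset = pstmt.executeQuery()) {
                List<String> trains = new ArrayList<>();
                while (rset.next()) {
                    trains.add(rset.getString(1));      // column positions also start from 1
                }
                return trains;
            }
        }
    }

    /** An update: executeUpdate() returns the number of rows affected. */
    static int addBooking(Connection conn, String firstValue, String secondValue,
                          String isoDate) throws SQLException {
        try (PreparedStatement pstmt =
                 conn.prepareStatement("insert into booking values (?, ?, ?)")) {
            pstmt.setString(1, firstValue);             // column meanings assumed, not on the slide
            pstmt.setString(2, secondValue);
            pstmt.setDate(3, Date.valueOf(isoDate));    // expects the form yyyy-mm-dd
            return pstmt.executeUpdate();
        }
    }
}
```

A caller would obtain conn from DriverManager.getConnection as in the earlier slides; for a single-row insert, addBooking would normally be expected to return 1.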
Transactions
By default, each statement executes in a distinct transaction.
To group statements, where conn is a Connection, use:
  conn.setAutoCommit(false) to override the single-statement default and start a transaction.
  conn.commit() and conn.rollback() to complete a transaction.

Closing Things Down
The close() operation is supported on lots of things:
  Connection.
  Statement.
  ResultSet.
In all cases, close() reclaims resources; it is good practice to close all the above as soon as possible.

Handling Errors – Important!
    Connection conn = null;
    try {
      ...
    } catch (SQLException e) {
      System.out.println("SQL Exception: " + e.getMessage());
    } finally {
      if (conn != null) {
        try {
          conn.rollback();
          conn.close();
        } catch (SQLException sqlEx) {
          // ignore
        }
      }
    }

Summary
JDBC is the most widely used means of accessing relational databases from Java.
JDBC is a class library: there are no syntactic extensions to Java.
JDBC supports dynamic SQL (i.e., queries are strings): flexible, but with the possibility of runtime type errors.
Impedance mismatches? See tutorial sheet.

Further Reading
Oracle 10g JDBC Developers Guide and Reference, 2001 [Chapter 1: Overview; Chapter 3: Basic Features].
Sun JDBC Tutorial: http://java.sun.com/docs/books/tutorial/jdbc/

Object Relational Extensions to SQL

Data Model History
[Figure: timeline from 1970 to 2005 of data models: IMS, Network, Relational, Object, XML.]

Object-Relational Databases
Weaknesses of vanilla relational databases:
  Limited data modelling facilities.
  Limited application development facilities.
Object-relational databases aim to overcome these weaknesses.
"Object-Relational" is an umbrella term for assorted extensions.
  Model:
    Abstract data types (cartridges, blades, ...).
    Object type extensions.
  Programming:
    Programming language extensions to SQL.
    Active rules/triggers.

Object Relational Databases
These add to the relational model:
  Object types.
  Nested tables.
  References.
  Inheritance.
  Methods.
  Abstract data types.
The SQL:2003 standard covers all of the above; in what follows, examples are from Oracle 10g.

SQL:1999 and SQL:2003
The SQL-92 standard now characterises basic relational functionality (as taught in CS231).
SQL:1999 was the successor for object-relational databases, developed throughout the '90s.
SQL:2003 refined the many extensions in SQL:1999 and started to add XML support.
SQL:1999/SQL:2003:
  Are not uniformly adopted: many vendors have their own object-relational extensions developed since the early '90s.
  Cover model extensions, type extensions, programming extensions, triggers, etc.

Object-Relational in Oracle
Model:
  Type system extensions to support object types, encapsulation, references.
  Primitive type extensions as cartridges to support multimedia data, spatial data, etc.
Programming:
  PL/SQL adds imperative programming to SQL.
  Triggers allow PL/SQL programs to be executed reactively.

Relational Model and Types
Data type completeness: each type constructor can be applied uniformly to types in the type system.
In the basic relational model:
  There is only one type constructor (i.e. relation).
  That type constructor cannot be applied to itself.
Incorporating data type completeness into the relational model gives nested relations.
In addition, the type relation is essentially: Bag < Tuple >.
Separating out these type constructors provides further flexibility, such as tuple-valued attributes.

Object Types in Oracle
An object type is a user-defined data type, somewhat analogous to a class in object-oriented programming.
Types can be arranged in hierarchies, and instances of types can be referenced.
    create type visit_type as object (
      name varchar(20), /* the station */
      thetime number
    );

Nested Relations
Nested relations involve the storage of one relation as an attribute of another.
    create type visit_tab_type as table of visit_type;
    create table train (
      t# varchar(10) not null,
      type char(1) not null,
      visits visit_tab_type,
      primary key (t#))
      nested table visits store as visits_tab;

Populating Nested Tables
The name of the type can be used as a constructor for values of the type.
    update train
    set visits = visit_tab_type(
      visit_type('Edinburgh',950),
      visit_type('Aberdeen',720))
    where t# = '22403101';

Querying Nested Tables
Query operations such as unnesting allow access to the contents of a nested table.
The following query retrieves details of the trains that visit Inverness.
    select *
    from train t, table(t.visits) v
    where v.name = 'Inverness';

Abstract Data Types
Abstract data types allow new primitive types to be added to a DBMS (a.k.a. data blades, cartridges).
These primitive types can be defined by (skilled) users or vendors.
Oracle-supplied cartridges include:
  Time.
  Text.
  Image.
  Spatial.
  Video.

Oracle Spatial Cartridge
The spatial cartridge provides a collection of new primitive types.

Supporting the Spatial Types
Operations:
  Geometric (area, difference, ...).
  Topological.
Implementation:
  The cartridge uses specialised index structures such as R-trees.
  The optimiser knows the properties of the R-tree, and how it can be used to make queries faster.

Programming in SQL
The following PL/SQL program iterates through a query result.
    declare
      cursor c1 is
        select t# from train
        where source = 'Edinburgh' or dest = 'Edinburgh';
    begin
      for ed_train in c1 loop
        insert into edinburgh values (ed_train.t#);
      end loop;
    end;

Example Program: Comments-1
PL/SQL is a block structured language, with structure:
    [declare
      declarations]
    begin
      statements
    [exception
      handlers]
    end
There is no relation type in PL/SQL, so cursors iterate over query results.
Example Program: Comments-2
The for loop iterates over the result of the query associated with the cursor, fetching results one at a time.
Each tuple retrieved from the cursor has type: record(t# varchar(10))
The type of the variable ed_train is inferred.

PL/SQL: More Cursors/Loops
    declare
      cursor c1 is <as before>
      ed_tno train.t#%type;
    begin
      open c1;
      loop
        fetch c1 into ed_tno;
        exit when c1%notfound;
        insert into edinburgh values (ed_tno);
      end loop;
      close c1;
    end;

Loop Example: Comments
The declare section can introduce new cursors, types or variables.
Variables and cursors have attributes, such as %type, %rowtype and %notfound for accessing properties.
The cursor is explicitly opened, closed and fetched from (in contrast with the previous example).
The loop construct can mimic classical while-do and repeat-until loops.

Declaring Types
Types can be declared explicitly:
  As a choice, even if in the database.
  If there is no direct analogue in the database.
Other than records, there are object types and lookup tables.
    declare
      type ed_train_type is record
        (t# varchar(10), thetime number);
      ed_train ed_train_type;

Collection Types
Collections tend to be important in databases:
  Persistent data types tend to be bulk data types (e.g. relations).
  Operations on bulk data types tend to act on complete collections (e.g. there is no operation to update a tuple in SQL-92).
There are normally few built-in collection types in programming languages (e.g. array).
Collections are often provided in class libraries (e.g. java.util.Collection).

PL/SQL Collection Types
Declarations:
  type name is table of type-name.
  type name is varray (size-limit) of type-name.
  type name is table of type-name index by binary_integer.
Unlike tables, varrays:
  Have a maximum size.
  Are dense, so elements cannot be deleted.
Oracle can store varrays and (non-indexed) tables in the database.
Stored varrays cannot be manipulated directly by SQL: they must be retrieved first.
Lots of curious rules...
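
For readers more at home in Java, the contrast between a varray and an index-by table can be loosely mimicked with library collections. This is only an analogy under stated assumptions, not a description of how PL/SQL implements either type: the class names BoundedDenseList and CollectionSketch are hypothetical, and the second train number is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical Java analogue of a varray: dense, with a fixed maximum size. */
final class BoundedDenseList<T> {
    private final int limit;
    private final List<T> items = new ArrayList<>();

    BoundedDenseList(int limit) {
        this.limit = limit;
    }

    /** Appends at the end; fails once the declared size limit is reached. */
    void extend(T value) {
        if (items.size() >= limit) {
            throw new IllegalStateException("size limit " + limit + " exceeded");
        }
        items.add(value);
    }

    /** Positions are 1-based, as in PL/SQL. */
    T get(int position) {
        return items.get(position - 1);
    }

    int count() {
        return items.size();
    }
    // Deliberately no remove(position): interior elements cannot be deleted,
    // so positions 1..count() never contain gaps.
}

final class CollectionSketch {
    /** An index-by table behaves more like a sorted map: keys may be sparse. */
    static TreeMap<Integer, String> sparseTable() {
        TreeMap<Integer, String> table = new TreeMap<>();
        table.put(1, "22403101");
        table.put(7, "22403102");   // illustrative value; the gap between 1 and 7 is allowed
        table.remove(1);            // individual elements can be deleted
        return table;
    }
}
```

The design point is that the bounded list enforces its invariants by simply not offering the operations that would break them, whereas the map accepts arbitrary integer keys.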
Collection Type Example
    declare
      type ed_train_type is table of train.t#%type
        index by binary_integer;
      ed_table ed_train_type;
      i binary_integer := 0;
    begin
      for ed_train in c1 loop
        i := i + 1;
        ed_table(i) := ed_train.t#;
      end loop;
      ...
    end;

Stored Procedures/Functions
Oracle supports stored procedures, functions and packages.
Stored procedures can be called from each other, from triggers, from Java, from Web Services, ...
[Figure: a client (C) sends SQL and calls to a DBMS holding tables, views, triggers and procedures.]

Example Header
A procedure has no result type, whereas a function returns a result.
Function header from tutorial:
    function FastestTrain (src varchar, dst varchar) return varchar
The body of a function is a PL/SQL block.
Results are returned using return.

Calling PL/SQL from JDBC
    Connection conn =...
    // Create a CallableStatement
    CallableStatement cstmt =
      conn.prepareCall("{? = call FastestTrain(?,?)}");
    // Set its two parameters
    cstmt.setString(2, args[2]);
    cstmt.setString(3, args[3]);
    cstmt.registerOutParameter(1, Types.VARCHAR);
    // Execute the statement and print its result
    cstmt.execute();
    System.out.println("Fastest = " + cstmt.getString(1));

Summary
Claims for programming language extensions:
  Reduces impedance mismatches.
  Improves [Portfolio 01]:
    Performance.
    Programmer productivity.
    Portability.
    Security.
The reality:
  SQL extensions are often not elegant.
  They are widely used.
  They are not portable across products.
  Performance always has many facets.

Further Reading
Oracle 10g PL/SQL User Guide and Reference [Chapter 1: Overview].
Oracle 9i PL/SQL User Guide and Reference [Appendix 1: Example Programs].
M. Piattini, O. Diaz (eds), Advanced Database Technology and Design, Artech House, 2000 [Chapter 6: Object-Relational Database Systems].
M. Stonebraker, P. Brown, Object-Relational DBMSs, 2nd Edition, Morgan-Kaufmann, 1999.

Programming Language Extensions to SQL: Triggers

Triggers
An active database is one that can respond automatically to events.
The events to which a database may want to react are mostly within the database, but could in principle be outside.
Most relational products support active behaviour, and it is in SQL:2003.
Active behaviour is expressed using rules containing an event, an (optional) condition, and an action, a.k.a. ECA-rules.
These active rules are known as triggers in relational products and SQL:2003.

Applications of Triggers
Extending built-in behaviours:
  integrity constraints.
  auditing.
  authorisation.
  statistics.
  data derivation.
Triggers are thus generic mechanisms, powerful, but often harder to use than the built-in behaviour.
Supporting application functionality:
  Alerters: the user is informed when something significant happens.
  Business rules: an organisational behaviour is enforced or carried out as a reaction to database changes.

Business Rules
Recovering business rules:
  Indicate how the organisation recovers from a problem.
  Example: too many people have enrolled on a seminar for the space allocated.
  Reaction: book larger room, run two seminars in parallel, ...
Causal business rules:
  Bring about a behaviour when a condition is satisfied.
  Example: enough people enrol for a seminar to make it viable.
  Reaction: book a room, inform potential attendees, inform tutor.

Trigger Structure
Oracle trigger syntax:
    create or replace trigger name
      event
      [for each row]
      [when (condition)]
      action
In Oracle:
  The condition is a boolean expression (that does not access the database).
  The action is a PL/SQL block.

Rule Triggering
[Figure: within a transaction, update U1 triggers rule R1 (if C1), whose action issues U2 and U4; U2 in turn triggers rule R2 (if C2), whose action issues U3.]
Rulebase:
  R1: on U1 when C1 do U2, U4
  R2: on U2 when C2 do U3

Trigger Concepts - 1
Transition granularity. A rule may trigger:
  once per tuple change: row transition granularity.
  once per update statement: statement transition granularity.
Coupling mode. A rule condition may evaluate:
  as soon as the event has taken place: immediate coupling mode.
  at some time after the event took place: deferred coupling mode.

Trigger Concepts - 2
Priorities:
  A single event may trigger multiple rules.
  A collection of deferred rules may be triggered at the same time by different events.
  Priorities may be: unspecified, relative, absolute, by creation date.
Event types:
  A primitive event type is considered an atomic happening (e.g., the update to a tuple, a time of day).
  A composite event type is based on an algebra over primitive events (e.g., E1 OR E2, E1 AND E2, ...).

Oracle Triggers
Transition granularity:
  row triggers: FOR EACH ROW.
  statement triggers: no FOR EACH ROW.
Coupling mode: immediate.
Priorities: unspecified.
Event types:
  primitive (DML, DDL and system (e.g. startup/shutdown)).
  composite (but only OR).

DML Events
Follow database updates:
    [BEFORE|AFTER] INSERT ON table.
    [BEFORE|AFTER] DELETE ON table.
    [BEFORE|AFTER] UPDATE ON table.
    [BEFORE|AFTER] UPDATE OF column ON table.
Plus disjunction, e.g.: BEFORE INSERT OR UPDATE ON visit.

Condition
Row triggers can have conditions that guard the action.
The condition is a boolean expression (AND, OR, NOT, >, <, ...).
The condition refers to literals and to event properties through correlation variables, e.g.: WHEN (new.age < 21).
Correlation variables available depend on event types:
    Event     new   old
    INSERT    Y     N
    DELETE    N     Y
    UPDATE    Y     Y

Action
An action is a PL/SQL block.
An action:
  can refer to correlation variables, as :new, :old (row triggers only).
  can test the type of event being reacted to using inserting, updating, deleting.
  cannot use transaction control commands directly (but can raise exceptions).

Trigger Design Issues
Termination: Triggers can trigger each other recursively, which may lead to cycles (or to a threshold being reached, as in Oracle).
Confluence: The (arbitrary) order of selection for multiple triggered rules may lead to unanticipated behaviour.
Mutating tables: in Oracle, a row trigger cannot modify a table in mid-update.
    create or replace trigger t9
    before insert on visit
    for each row
    begin
      delete from visit where t# = :new.t#;
    end;

Example Triggers
Requirement: maintain a table numBookings of the numbers of bookings of each train on each date.
    create table numBookings (
      t# varchar(10) references train(t#),
      thedate date,
      num number,
      primary key (t#, thedate));
Events to monitor:
  insert on booking.
  delete on booking.
  update of t# on booking.
  update of thedate on booking.

Insert Case
    create or replace trigger numBookings2
    after insert on booking
    for each row
    declare
      numPresent integer;
    begin
      select count(*) into numPresent
      from numBookings
      where t# = :new.t# and thedate = :new.thedate;
      if (numPresent = 0) then
        insert into numBookings values (:new.t#, :new.thedate, 1);
      else
        update numBookings set num = num + 1
        where t# = :new.t# and thedate = :new.thedate;
      end if;
    end;

Comments on Insert Case
AFTER event, as numBookings should only be updated if booking has actually changed.
No use of condition, as the action must be carried out for every insert on booking.
Creates a numBookings tuple if none was present before (the corresponding delete action should remove it if no bookings remain).

Delete Case
    create or replace trigger numBookings1
    after delete on booking
    for each row
    declare
      currentNumber integer;
    begin
      select num into currentNumber
      from numBookings
      where t# = :old.t# and thedate = :old.thedate;
      if (currentNumber = 1) then
        delete from numBookings
        where t# = :old.t# and thedate = :old.thedate;
      else
        update numBookings set num = num - 1
        where t# = :old.t# and thedate = :old.thedate;
      end if;
    end;

Comments on Delete Case
Broadly the inverse of the insert case.
Many references to the :old correlation variable (cf. :new for the insert case).
Update case is broadly a delete then insert: see tutorial.
This problem can also be addressed using statement triggers: see tutorial.

Identifying Events
A single application functionality may need to monitor many events.
Example:
  Tables:
    emp(ename,bname,sal)
    boss(bname,sal)
  Constraint: no employee is paid more than his/her boss.
Quiz: what events may invalidate the constraint?
  NEW on ???
  UPDATE on ???
  UPDATE on ???
  UPDATE on ???

Choosing Reactions
Many reactions may be plausible, for example, to restore a constraint.
Different policies may be used in responding to different events.
  For example, may change boss if boss.sal is reduced, but raise the salary of the boss to match an increase in an employee's salary.
Quiz: what reactions could be used to resatisfy the constraint?
Possible reactions:
  Decrease ???
  Increase ???
  Change ???
  Delete ???
  Delete ???

Selecting Transition Granularity
Tuple:
  Access available to correlation variables.
  Precise response to specific changes possible.
  Often need many triggers to handle fine grained reactions.
Statement:
  No access to correlation variables.
  No possibility of precise response to changes.
  Often need fewer triggers, as the generic reaction is not very fine grained.

Summary on Triggers
Triggers:
  Extend the ways in which programming functionality can be stored in the database.
  Extend built-in facilities for integrity, security, etc.
  Are powerful ... but not always easy to develop or maintain.

Further Reading
Oracle 10g Database Concepts [Chapter 22: Triggers].
Oracle 10g Application Developers Guide [Chapter 9: Using Triggers].
M. Piattini, O. Diaz (eds), Advanced Database Technology and Design, Artech House, 2000 [Chapter 3: Active Databases].
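
The bookkeeping performed by the numBookings insert and delete triggers earlier in these notes can be traced without a database using a small in-memory model. This is a sketch under stated assumptions rather than a description of how Oracle executes triggers: the class NumBookings and its method names are hypothetical, a map stands in for the numBookings table, and the update case simply follows the remark that an update is broadly a delete followed by an insert.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the numBookings table, keyed by (t#, thedate). */
final class NumBookings {
    private final Map<List<Object>, Integer> counts = new HashMap<>();

    private static List<Object> key(String trainNo, LocalDate date) {
        return Arrays.<Object>asList(trainNo, date);
    }

    /** Mirrors the AFTER INSERT row trigger: create the row with 1, or increment it. */
    void afterInsert(String trainNo, LocalDate date) {
        counts.merge(key(trainNo, date), 1, Integer::sum);
    }

    /**
     * Mirrors the AFTER DELETE row trigger: remove the row when its count is 1,
     * otherwise decrement it. Does nothing if no row exists.
     */
    void afterDelete(String trainNo, LocalDate date) {
        counts.computeIfPresent(key(trainNo, date), (k, n) -> n == 1 ? null : n - 1);
    }

    /** Update of t# or thedate, treated as a delete of the old values then an insert of the new. */
    void afterUpdate(String oldTrain, LocalDate oldDate, String newTrain, LocalDate newDate) {
        afterDelete(oldTrain, oldDate);
        afterInsert(newTrain, newDate);
    }

    /** Number of bookings recorded, or 0 when there is no row. */
    int numFor(String trainNo, LocalDate date) {
        return counts.getOrDefault(key(trainNo, date), 0);
    }

    boolean hasRow(String trainNo, LocalDate date) {
        return counts.containsKey(key(trainNo, date));
    }
}
```

Calling afterInsert twice and afterDelete twice for the same train and date leaves no row behind, which is the invariant the two triggers are meant to maintain together.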