* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Presentation
Serializability wikipedia , lookup
Commitment ordering wikipedia , lookup
Microsoft Access wikipedia , lookup
Global serializability wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Oracle Database wikipedia , lookup
Relational algebra wikipedia , lookup
Ingres (database) wikipedia , lookup
Functional Database Model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Versant Object Database wikipedia , lookup
ContactPoint wikipedia , lookup
Database model wikipedia , lookup
Dynamic Database Integration in a JDBC Driver Terrence Mason and Dr. Ramon Lawrence Iowa Database and Emerging Application Laboratory University of Iowa 7th International Conference on Enterprise Information Systems ICEIS 2005 Miami, Florida Presentation Discuss the contributions of JDBC Driver Review the Architecture Step through an example integration and query (Partitioned TPC-H* dataset) Review the experimental results Demonstrate efficient Database Integration *http://www.tpc.org/tpch/default.asp Contributions to Database Integration Standard API for Integration (JDBC) Automatic generation of a global view of integrated data sources – Annotation done locally – Common Vocabulary (National Cancer Institute-EVS) – Scalable to build a global schema Simple Conceptual Query Language Automatic Join Determination for queries Allows evolution of data sources Detects inconsistent data across sources Unity JDBC Driver Architecture Java Application Semantic Query Results Unity JDBC Driver Embedded Database Engine JDBC JDBC JDBC SQL DB1 DB2 ............ DBn Extending Standard JDBC API for Integration Standard Java Interfaces for Single Database JDBC Connections extended to Multiple Databases – Connection – Driver Manager – Statement – Result Set Java Code for JDBC Integration import java.sql.*; public class JDBCApplication { public static void main(String[] args) { { String url = “jdbc:unity://sources.xml"; (1) Connection con; (2) // Load UnityDriver class try { { Class.forName(``unity.jdbc.UnityDriver");} } (3) catch (java.lang.ClassNotFoundException e) { System.exit(1); } (4) try { //Initiate connection (5) con = DriverManager.getConnection(url); (6) Statement stmt = con.createStatement(); (7) ResultSet rst = stmt.executeQuery(`SELECT Part.Name, (8) LineItem.Quantity, Customer.Name (9) WHERE Customer.Name='Customer_25’ ” ); } (10) System.out.println(``Part , Quantity, Customer"); (11) while (rst.next()) (12) { System.out.println(rst.getString(``Part.Name") (13) +”,”+rst.getString(``LineItem.Quantity") (14) +”,”+rst.getString(``Customer.Name") ); (15) } con.close(); (16) } (17) catch (SQLException ex) { System.exit(1); } (18) } } XML File to Reference Data Sources <SOURCES> <DATABASE> <URL>jdbc:microsoft:sqlserver://IDEALAB5.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=xxxxx</URL> <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER> <XSPEC>xspec/Order.xml</XSPEC> </DATABASE> <DATABASE> <URL>jdbc:microsoft:sqlserver://IDEALAB3.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=yyyyyy</URL> <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER> <XSPEC>xspec/Part.xml</XSPEC> </DATABASE> </SOURCES> - <XSPEC> <databaseName>Order</databaseName> <databaseProductVersion>Microsoft SQL Server 2000 </databaseProductVersion> - <TABLE> •Order.xml file (XSpec) •Schema Information •Table <semanticTableName>Customer</semanticTableName> <tableName>CUSTOMER</tableName> - <FIELD> <semanticFieldName>Customer.Id</semanticFieldName> <fieldName>C_CUSTKEY</fieldName> <dataTypeName>int</dataTypeName> </FIELD> - <FIELD> <semanticFieldName>Customer.Name</semanticFieldName> <fieldName>C_NAME</fieldName> <dataTypeName>varchar</dataTypeName> <fieldSize>25</fieldSize> •Fields XML created Semi-automatically •PrimaryDocument key </FIELD> - <FIELD> <semanticFieldName>Customer.Nation.Id</semanticFieldName> <fieldName>C_NATIONKEY</fieldName> <dataTypeName>int</dataTypeName> •Foreign key • Schema Information - Extracted Automatically from Database •Join </FIELD> - <PRIMARYKEY> <keyScope>4</keyScope> • Annotation <keyScopeName>Organization</keyScopeName> • Order Database and Scopes – Semi-automatically - <FIELDS> • Annotation-Semantic Names • Scope of Keys <fieldName>C_CUSTKEY</fieldName> </FIELDS> </PRIMARYKEY> - <FOREIGNKEY> (Global joins) <keyScope>4</keyScope> <keyScopeName>Organization</keyScopeName> - <FIELDS> <fieldName>C_NATIONKEY</fieldName> </FIELDS> <toTableName>NATION</toTableName> </FOREIGNKEY> - <JOIN> <joinName>CUSTOMER->NATION</joinName> <fromKeyName>FK__CUSTOMER__C_NATI__7A672E12</fromKeyName> <fromTableName>CUSTOMER</fromTableName> <toKeyName>PK__NATION__6E01572D</toKeyName> <toTableName>NATION</toTableName> <joinType>3</joinType> </JOIN> Part Database part(p_partkey, p_name, p_mfgr) supplier(s_suppkey, s_name, s_nationkey) partsupp(ps_partkey,ps_suppkey) nation(n_nationkey, n_name, n_regionkey) region(r_regionkey, r_name) Order Database customer(c_custkey, c_name, c_nationkey) orders(o_orderkey,o_custkey,o_orderdate) lineitem(l_orderkey,l_partkey,l_suppkey,l_linenum,l_qty) nation(n_nationkey, n_name,n_regionkey) region(r_regionkey, r_name) Build Global Schema On Local Database Annotations Global Schema Part.Id, Part.Name, Part.Manufacturer Supplier.Id, Supplier.Name, Supplier.Nation.Id Order.Id, Order.Customer.Id, Order.Date LineItem.Linenumber, LineItem.Order.Id LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id, Customer.Id, Customer.Name, Customer.Nation.Id Nation.Id, Nation.Name, Nation.Region.Id Region.Id, Region.Name Attribute Only SQL Global Schema Query Language on concepts in Global Schema No FROM clause – Tables not specified Selection conditions on concepts Order by Global Schema Part.Id, Part.Name, Part.Manufacturer Supplier.Id, Supplier.Name, Supplier.Nation.Id Order.Id, Order.Customer.Id, Order.Date LineItem.Linenumber, LineItem.Order.Id LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id, Customer.Id, Customer.Name, Customer.Nation.Id Nation.Id, Nation.Name, Nation.Region.Id Region.Id, Region.Name Query: SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025' Query Processing Steps Parse Semantic Query – Validate concepts – Create parse tree Map concepts to fields in local databases Determine joins to relate attributes in each local database Build Execution Tree (Relational Algebra) – Execute a sub-query to each local database – Find global join or union to relate sub-queries – Combine sub-queries into single result set Conceptual Query and Parse Tree Conceptual Query: SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025' Parse Tree: SELECT Identifier: Part.Name Identifier: LineItem.Quantity Identifier: Customer.Name WHERE Comparison_Op: = Identifier: Customer.Name String: 'Customer#000000025' Join Graph Construction Graph represents joins for each local database Edges directed as N:1joins Automatically extracted into XSpec or added to the XSpec. Used to calculate joins for each sub-query Database Join Graphs Part Database Line Item Part Supp Part Order Database Order Supplier Nation Region Nation Region Customer Map the Concepts in query to Relations Part.Name, LineItem.Quantity, Customer.Name Part Database Line Item Part Supp Part Order Database Order Supplier Nation Region Nation Region Customer Determine Local Joins Steiner Tree Approximation Algorithm Part Database Order Database Line Item Order Part Global Join LineItem.Part.Id is foreign key to Part.Id Semantic Query: SELECT Part.Name, LineItem.Quantity, Customer.Name Where Customer.Name = 'Customer#000000025' Customer Build Execution Tree Relational Algebra Projection – Concepts in SELECT portion of conceptual query – Sub-query projections of required fields (global joins) Selection – WHERE conditions of conceptual query Joins – Determined from Join Graphs – Global joins identified by key scopes Sub-queries Sent to Each Local Database SQL through JDBC Part Database: SELECT P.P_NAME, P.P_PARTKEY FROM PART AS P Order Database: SELECT L.L_QUANTITY, C.C_NAME, L.L_PARTKEY FROM LINEITEM AS L, CUSTOMER AS C, ORDERS AS O WHERE C.C_NAME = 'Customer#000000025' AND O.O_CUSTKEY = C.C_CUSTKEY AND L.L_ORDERKEY = O.O_ORDERKEY • Local joins determined from join graphs •Selection Condition • Elements added to queries in order for the global join to be executed in Unity Driver. Operator Execution Tree Idealab1 Client (Unity Driver) Unity Embedded Database Engine Idealab5 Idealab3 Database Server Database Server Experimental Results Dynamic integration is efficient and scalable Minimal overhead Multi-source query processing – Competitive with single source execution – Possible to execute queries on a global schema Schema Integration Results Multiple Copies of TPC-H (Seconds) Number of Schemas Integrate Schemas Connect Parse Sub-queries Total Time 2 0.219 0.203 0.110 0.532 3 0.235 0.023 0.125 0.594 10 0.485 0.437 0.406 1.328 100 1.375 1.719 5.063 8.157 • Integration of schemas occurs in linear time based on number of schemas integrated. • Integration and Connection executes only once at start up. Not for each query. Query Small Result Size (76 Tuples) 5 Time (seconds) 4 3 2 1.3859 1 0.4553 0.4227 Unity TPC-H JDBC TPC-H 0 Separate Computers * Only 76 tuples transported over network for single sub-query * Separate requires entire Part table imported to Unity for join Query Large Result Size (6,000,215 tuples) 70 Time (seconds) 60 59.35 59.33 46.91 50 40 30 20 10 0 Unity TPC-H JDBC TPC-H Separate Computers * Distributed execution of the queries on multiple computers executed faster than a single database server due to parallelism for this particular query. Conclusions Integration possible in a JDBC Driver Local Annotation permits scalable integrations Minimal Overhead to Process Queries Query Multiple Database on a Global View – No need to specify joins – No requirement to know underlying schemas Future Works AutoJoin – Scalable inference engine for join determination Improve global query inference Sophisticated Global Query Optimizer Extend to support Federated Database Queries – No global schema – Fully Specified Queries Queries to Test Unity Performance (Data Labels on Charts) TPC-H - Conceptual query executed through Unity driver against a single source TPC-H database. JDBC TPC-H - SQL query equivalent to conceptual query executed directly through SQL Server JDBC driver on a single source TPC-H database. Partitioned on One Computer - Conceptual query executed on TPC-H data set virtually partitioned into the Part and Order databases