Download Presentation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Serializability wikipedia , lookup

IMDb wikipedia , lookup

Commitment ordering wikipedia , lookup

Microsoft Access wikipedia , lookup

Global serializability wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Oracle Database wikipedia , lookup

SQL wikipedia , lookup

PL/SQL wikipedia , lookup

Relational algebra wikipedia , lookup

Ingres (database) wikipedia , lookup

Functional Database Model wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Database wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Versant Object Database wikipedia , lookup

ContactPoint wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

Transcript
Dynamic Database
Integration
in a JDBC Driver
Terrence Mason and Dr. Ramon Lawrence
Iowa Database and Emerging Application Laboratory
University of Iowa
7th International Conference on Enterprise Information Systems
ICEIS 2005 Miami, Florida
Presentation
Discuss the contributions of JDBC Driver
Review the Architecture
Step through an example integration and
query (Partitioned TPC-H* dataset)
Review the experimental results
Demonstrate efficient Database
Integration
*http://www.tpc.org/tpch/default.asp
Contributions to Database Integration
Standard API for Integration (JDBC)
Automatic generation of a global view of
integrated data sources
– Annotation done locally
– Common Vocabulary (National Cancer Institute-EVS)
– Scalable to build a global schema
Simple Conceptual Query Language
Automatic Join Determination for queries
Allows evolution of data sources
Detects inconsistent data across sources
Unity JDBC Driver Architecture
Java Application
Semantic Query
Results
Unity JDBC Driver
Embedded Database
Engine
JDBC
JDBC
JDBC
SQL
DB1
DB2
............
DBn
Extending Standard JDBC API for Integration
Standard Java Interfaces for Single
Database JDBC Connections extended to
Multiple Databases
– Connection
– Driver Manager
– Statement
– Result Set
Java Code for JDBC Integration
import java.sql.*;
public class JDBCApplication {
public static void main(String[] args) {
{
String url = “jdbc:unity://sources.xml";
(1)
Connection con;
(2)
// Load UnityDriver class
try { { Class.forName(``unity.jdbc.UnityDriver");} }
(3)
catch (java.lang.ClassNotFoundException e) { System.exit(1); }
(4)
try {
//Initiate connection
(5)
con = DriverManager.getConnection(url);
(6)
Statement stmt = con.createStatement();
(7)
ResultSet rst = stmt.executeQuery(`SELECT Part.Name, (8)
LineItem.Quantity, Customer.Name
(9)
WHERE Customer.Name='Customer_25’ ” ); }
(10)
System.out.println(``Part , Quantity, Customer");
(11)
while (rst.next())
(12)
{ System.out.println(rst.getString(``Part.Name")
(13)
+”,”+rst.getString(``LineItem.Quantity")
(14)
+”,”+rst.getString(``Customer.Name") ); (15)
}
con.close();
(16)
}
(17)
catch (SQLException ex) { System.exit(1); }
(18)
}
}
XML File to Reference Data
Sources
<SOURCES>
<DATABASE>
<URL>jdbc:microsoft:sqlserver://IDEALAB5.cs.uiowa.edu:1433;DatabaseName=TPC;
User=terry;Password=xxxxx</URL>
<DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>
<XSPEC>xspec/Order.xml</XSPEC>
</DATABASE>
<DATABASE>
<URL>jdbc:microsoft:sqlserver://IDEALAB3.cs.uiowa.edu:1433;DatabaseName=TPC;
User=terry;Password=yyyyyy</URL>
<DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>
<XSPEC>xspec/Part.xml</XSPEC>
</DATABASE>
</SOURCES>
- <XSPEC>
<databaseName>Order</databaseName>
<databaseProductVersion>Microsoft SQL Server 2000 </databaseProductVersion>
- <TABLE>
•Order.xml file (XSpec)
•Schema Information
•Table
<semanticTableName>Customer</semanticTableName>
<tableName>CUSTOMER</tableName>
- <FIELD>
<semanticFieldName>Customer.Id</semanticFieldName>
<fieldName>C_CUSTKEY</fieldName>
<dataTypeName>int</dataTypeName>
</FIELD>
- <FIELD>
<semanticFieldName>Customer.Name</semanticFieldName>
<fieldName>C_NAME</fieldName>
<dataTypeName>varchar</dataTypeName>
<fieldSize>25</fieldSize>
•Fields
XML
created Semi-automatically
•PrimaryDocument
key
</FIELD>
- <FIELD>
<semanticFieldName>Customer.Nation.Id</semanticFieldName>
<fieldName>C_NATIONKEY</fieldName>
<dataTypeName>int</dataTypeName>
•Foreign key
• Schema Information - Extracted Automatically from Database
•Join
</FIELD>
- <PRIMARYKEY>
<keyScope>4</keyScope>
• Annotation
<keyScopeName>Organization</keyScopeName>
• Order
Database and Scopes – Semi-automatically
- <FIELDS>
• Annotation-Semantic Names
• Scope of Keys
<fieldName>C_CUSTKEY</fieldName>
</FIELDS>
</PRIMARYKEY>
- <FOREIGNKEY>
(Global joins)
<keyScope>4</keyScope>
<keyScopeName>Organization</keyScopeName>
- <FIELDS>
<fieldName>C_NATIONKEY</fieldName>
</FIELDS>
<toTableName>NATION</toTableName>
</FOREIGNKEY>
- <JOIN>
<joinName>CUSTOMER->NATION</joinName>
<fromKeyName>FK__CUSTOMER__C_NATI__7A672E12</fromKeyName>
<fromTableName>CUSTOMER</fromTableName>
<toKeyName>PK__NATION__6E01572D</toKeyName>
<toTableName>NATION</toTableName>
<joinType>3</joinType>
</JOIN>
Part Database
part(p_partkey, p_name, p_mfgr)
supplier(s_suppkey, s_name, s_nationkey)
partsupp(ps_partkey,ps_suppkey)
nation(n_nationkey, n_name, n_regionkey)
region(r_regionkey, r_name)
Order Database
customer(c_custkey, c_name, c_nationkey)
orders(o_orderkey,o_custkey,o_orderdate)
lineitem(l_orderkey,l_partkey,l_suppkey,l_linenum,l_qty)
nation(n_nationkey, n_name,n_regionkey)
region(r_regionkey, r_name)
Build Global Schema
On Local Database
Annotations
Global Schema
Part.Id, Part.Name, Part.Manufacturer
Supplier.Id, Supplier.Name, Supplier.Nation.Id
Order.Id, Order.Customer.Id, Order.Date
LineItem.Linenumber, LineItem.Order.Id
LineItem.Quantity, LineItem.Part.Id,
LineItem.Supplier.Id, Customer.Id,
Customer.Name, Customer.Nation.Id
Nation.Id, Nation.Name, Nation.Region.Id
Region.Id, Region.Name
Attribute Only SQL
Global Schema
Query Language on
concepts in Global
Schema
No FROM clause
– Tables not specified
Selection conditions on
concepts
Order by
Global Schema
Part.Id, Part.Name, Part.Manufacturer
Supplier.Id, Supplier.Name, Supplier.Nation.Id
Order.Id, Order.Customer.Id, Order.Date
LineItem.Linenumber, LineItem.Order.Id
LineItem.Quantity, LineItem.Part.Id,
LineItem.Supplier.Id, Customer.Id,
Customer.Name, Customer.Nation.Id
Nation.Id, Nation.Name, Nation.Region.Id
Region.Id, Region.Name
Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name
WHERE Customer.Name = 'Customer#000000025'
Query Processing Steps
Parse Semantic Query
– Validate concepts
– Create parse tree
Map concepts to fields in local databases
Determine joins to relate attributes in each local
database
Build Execution Tree (Relational Algebra)
– Execute a sub-query to each local database
– Find global join or union to relate sub-queries
– Combine sub-queries into single result set
Conceptual Query and Parse
Tree
Conceptual Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name
WHERE Customer.Name = 'Customer#000000025'
Parse Tree:
SELECT
Identifier: Part.Name
Identifier: LineItem.Quantity
Identifier: Customer.Name
WHERE
Comparison_Op: =
Identifier: Customer.Name
String: 'Customer#000000025'
Join Graph Construction
Graph represents joins for each local
database
Edges directed as N:1joins
Automatically extracted into XSpec or
added to the XSpec.
Used to calculate joins for each sub-query
Database Join Graphs
Part Database
Line
Item
Part
Supp
Part
Order Database
Order
Supplier
Nation
Region
Nation
Region
Customer
Map the Concepts in query to Relations
Part.Name, LineItem.Quantity, Customer.Name
Part Database
Line
Item
Part
Supp
Part
Order Database
Order
Supplier
Nation
Region
Nation
Region
Customer
Determine Local Joins
Steiner Tree Approximation Algorithm
Part Database
Order Database
Line
Item
Order
Part
Global Join
LineItem.Part.Id is
foreign key to Part.Id
Semantic Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name
Where Customer.Name = 'Customer#000000025'
Customer
Build Execution Tree
Relational Algebra
Projection
– Concepts in SELECT portion of conceptual query
– Sub-query projections of required fields (global joins)
Selection
– WHERE conditions of conceptual query
Joins
– Determined from Join Graphs
– Global joins identified by key scopes
Sub-queries Sent to Each Local Database
SQL through JDBC
Part Database:
SELECT P.P_NAME, P.P_PARTKEY FROM PART AS P
Order Database:
SELECT L.L_QUANTITY, C.C_NAME, L.L_PARTKEY FROM
LINEITEM AS L, CUSTOMER AS C, ORDERS AS O WHERE
C.C_NAME = 'Customer#000000025' AND O.O_CUSTKEY
= C.C_CUSTKEY AND L.L_ORDERKEY = O.O_ORDERKEY
• Local
joins determined from join graphs
•Selection Condition
• Elements added to queries in order for the global
join to be executed in Unity Driver.
Operator Execution Tree
Idealab1 Client
(Unity Driver)
Unity Embedded
Database Engine
Idealab5
Idealab3
Database Server
Database Server
Experimental Results
Dynamic integration is efficient and scalable
Minimal overhead
Multi-source query processing
– Competitive with single source execution
– Possible to execute queries on a global schema
Schema Integration Results
Multiple Copies of TPC-H (Seconds)
Number of
Schemas
Integrate
Schemas
Connect
Parse
Sub-queries
Total
Time
2
0.219
0.203
0.110
0.532
3
0.235
0.023
0.125
0.594
10
0.485
0.437
0.406
1.328
100
1.375
1.719
5.063
8.157
• Integration of schemas occurs in linear time based on number of schemas integrated.
• Integration and Connection executes only once at start up. Not for each query.
Query Small Result Size
(76 Tuples)
5
Time (seconds)
4
3
2
1.3859
1
0.4553
0.4227
Unity TPC-H
JDBC TPC-H
0
Separate Computers
* Only 76 tuples transported over network for single sub-query
* Separate requires entire Part table imported to Unity for join
Query Large Result Size
(6,000,215 tuples)
70
Time (seconds)
60
59.35
59.33
46.91
50
40
30
20
10
0
Unity TPC-H
JDBC TPC-H
Separate
Computers
* Distributed execution of the queries on multiple computers executed faster
than a single database server due to parallelism for this particular query.
Conclusions
Integration possible in a JDBC Driver
Local Annotation permits scalable
integrations
Minimal Overhead to Process Queries
Query Multiple Database on a Global View
– No need to specify joins
– No requirement to know underlying schemas
Future Works
AutoJoin – Scalable inference engine for
join determination
Improve global query inference
Sophisticated Global Query Optimizer
Extend to support Federated Database
Queries
– No global schema
– Fully Specified Queries
Queries to Test Unity Performance
(Data Labels on Charts)
TPC-H - Conceptual query executed through
Unity driver against a single source TPC-H
database.
JDBC TPC-H - SQL query equivalent to
conceptual query executed directly through SQL
Server JDBC driver on a single source TPC-H
database.
Partitioned on One Computer - Conceptual
query executed on TPC-H data set virtually
partitioned into the Part and Order databases