Download Lecture Notes in Computer Science:

Selecting Objects: a Technique for Efficient Interaction
of OO Design with Relational Data Store
Polina Cherkasova
University of St.Petersburg, Russia
[email protected]
Abstract. This work considers the development of complex database systems from the viewpoint of performance and design. The question is how to implement data behavior. The proposed architecture «Selecting Objects» is based on the object SqlSelect, which encapsulates the way to obtain data in the form of an SQL query and can be passed through the call stack. This technique combines the fast data processing of the relational model with object-oriented design.
1 Introduction
Currently, the relational database system is the major tool for storing and managing data because it provides strong mechanisms for fast processing of large amounts of data. On the other hand, because of the high complexity of the applications in question, a single developer cannot keep all the details of such an application in mind. Object-oriented design provides a solution to this problem. That is why there is a desire to implement business functionality with an object-oriented model.
The problem known as Impedance Mismatch [7] is that the relational and the object-oriented models are different in nature. This means that data structures and the manner of coding must be transformed from one model to the other at some place and in some way.
Depending on how this transformation is done, various benefits of the models can be exploited. One of the most important benefits of the object-oriented model is freedom of design, while the benefit of the relational model is fast processing of large amounts of data.
At the beginning of the work we describe some statements that are needed for the further discussion. Then we analyze known approaches from the viewpoint of performance and design. After that we demonstrate that the proposed approach "Selecting Objects" has advantages over the commonly used ones.
2 Application Design
This section addresses questions that are important for the further analysis: the logical structure of the application, the advantage of bulk operations over single operations, and the applicability of bulk operations.
2.1 Logical Levels
The code of an application system can be split into three logical levels: Data Structures, Business Functionality and User Interface.
The Data Structures level declares the format for data storage. For example:
1. Tables in relational databases [3]
2. Object attributes in object databases [10]
3. XML structures [1]
The meaning and behavior of data are defined at the Business Functionality level in accordance with the business requirements of the application. The Business Functionality can be implemented as
1. Functions, procedures and views in a relational database [3]. It is important that triggers are also members of the Business Functionality level.
2. Object methods in an object database [10]
3. The middle tier in a three-tier model [16]
The User Interface level provides the user with tools for data access. For example:
1. Web application
2. Stand-alone application
3. Web service
We will consider systems where the Data Structures level is implemented in a relational database, because we are interested in fast data processing.
2.2 Single and Bulk Operations
Suppose there is a task to modify a set of records selected from a table by a condition. For example, “Move all employees of department X to department Y”. We will compare two methods of implementing the task.
The first method is to obtain the identifiers of all the records that meet the condition and then to execute the operation for each one.
The second method is to execute the operation using one SQL query “update… from…” that includes the condition in the “where” clause.
We will estimate the execution time of both methods, supposing that the table contains N records and S records meet the condition.
The first method consists of the following steps:
1. Obtain the list of identifiers:
   a. Starting the SQL query takes time StartTime [5].
   b. Searching for the records by the condition. If the condition is on an indexed field, the search takes time S + ln(N).
2. For each of the S records:
   a. Starting the SQL query takes time StartTime.
   b. Searching for the record by its identifier takes time ln(N).
   c. Updating the record takes time UpdateTime.
The total time is
StartTime + S + ln(N) + S ∙ (StartTime + ln(N) + UpdateTime)
The second method consists of the following steps:
1. Update the records:
   a. Starting the SQL query takes time StartTime.
   b. Searching for the records takes time S + ln(N).
   c. Updating each of the S records takes time S ∙ UpdateTime.
The total time is
StartTime + S + ln(N) + S ∙ UpdateTime
The same estimates hold for the “insert” and “delete” operations.
As S and N grow, the first total grows as S ∙ ln(N). This is worse than the growth of the second total, S + ln(N).
When S or N is small, the constant StartTime becomes significant. When the Business Functionality is implemented inside the DBMS, StartTime is the time of SQL query initialization. When the Business Functionality is implemented outside the DBMS, the inter-process communication time is added. When the Business Functionality is executed on a separate computer, the network latency is added.
Consequently, using single operations instead of bulk operations degrades performance.
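The two totals can be compared numerically. The following C# sketch only evaluates the two formulas above; the values chosen for StartTime and UpdateTime are arbitrary illustrative units and are assumptions, not measurements.

```csharp
using System;

public static class OperationCost
{
    // First method: one query to fetch the identifiers,
    // then one separate query per matching record.
    public static double SingleOperations(double n, double s, double startTime, double updateTime)
    {
        return startTime + s + Math.Log(n)
             + s * (startTime + Math.Log(n) + updateTime);
    }

    // Second method: a single bulk query that searches and updates.
    public static double BulkOperation(double n, double s, double startTime, double updateTime)
    {
        return startTime + s + Math.Log(n) + s * updateTime;
    }

    public static void Main()
    {
        // Assumed, purely illustrative values (arbitrary time units).
        double n = 1000000, startTime = 50, updateTime = 1;
        foreach (double s in new double[] { 1, 100, 10000 })
        {
            Console.WriteLine("S={0}: single={1:F0}, bulk={2:F0}", s,
                SingleOperations(n, s, startTime, updateTime),
                BulkOperation(n, s, startTime, updateTime));
        }
    }
}
```

The difference between the two totals is S ∙ (StartTime + ln(N)), so the penalty of single operations grows with both the number of affected records and the per-query start-up cost.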
2.3 When Can Bulk Operations Be Used?
The end user, as a rule, doesn’t start bulk operations explicitly. The user interface allows objects to be selected one by one using the mouse. Bulk operations are initiated implicitly because a selected object has a set of child objects that are to be modified.
These are examples of such operations:
1. Delete an object. Child objects must be either deleted or updated.
2. Move an object from one area to another. For example, when a department is moved from one office to another, the business cards of its employees should be updated.
3. Copy an object from one area to another. Child objects, as a rule, must be copied as well.
4. Create an object based on a template. For example, every new customer is provided with a base set of services that can be modified later.
There are cases when an operation is bulk per se:
1. Export or import
2. A bulk condition is explicitly specified by the user. For example: “Send the message to all managers”
3 Analysis of the Commonly Used Approaches
3.1 Object-Oriented Interface
Statement about the object-oriented interface (explained below): an object-oriented interface may degrade performance (1).
An essential property of the Business Functionality level is its complexity. The typical way to handle this complexity is to use levels of abstraction in the object-oriented model, i.e. the objects of the system are separated into levels and sublevels so that objects of lower levels don’t depend on objects of higher levels [8, 14].
This consideration suggests implementing Business Functionality in the form of an object-oriented model. The Business Functionality level thus becomes divided into sublevels (see Fig.1).
[Figure: the levels User Interface, Business Functionality, Data Structures]
Fig. 1. Levels of the object-oriented design embedded into logical levels of an application.
In object-oriented design, a commonly used technique is to represent a business object by a model object that encapsulates the values of its attributes or the knowledge of how to obtain them. The object exposes its attributes to clients through an interface in which every member deals with the value of one database attribute or one database record [16].
Such an architecture splits every bulk operation into single actions at the coding phase. As explained before, this decreases performance.
3.2 Call-Back from the Data Structures Level
Statement about triggers: the impossibility of a call-back from the Data Structures level may decrease performance (2).
There are two approaches to designing database applications. The first implements Business Functionality inside the DBMS using its language (PL/SQL, Transact-SQL). The second implements Business Functionality outside the DBMS using a high-level object-oriented language.
The fundamental difference between these approaches is that only the first allows a call-back¹ from the Data Structures level to the Business Functionality level. Such a call-back can be implemented using triggers: a trigger belongs to the Business Functionality level but is invoked by the Data Structures level.
In practice it is very important to be able to execute some code for every inserted (updated, deleted) record. That is why, when Business Functionality is implemented outside the DBMS, a special rule is established. According to this rule, a record can be inserted (updated, deleted) only by direct or indirect invocation of a single method of Business Functionality. The code in question is placed in this method.
This rule makes it impossible to modify more than one record with one SQL query, i.e. it rules out bulk modifications. Consequently, performance decreases.
3.3 Bulk Modifications
Statement about bulk modifications: existing approaches don’t allow bulk modifications to be combined with execution of the same code for various data sources (3).
Let a business operation be implemented as a method of the Business Functionality level. For example, the method CreateAccount inserts records into the tables Member, Contact and Account. Suppose there is a task to create a set of accounts based on data from some table (for example, in the context of data import). Existing approaches offer two possible solutions.
The first is to select the data from the table and to call the method for every record, looping over them with a cursor. The disadvantage of this solution is that single operations are used instead of bulk operations.
The second solution is to implement a method that creates accounts based on data from the table and uses the bulk modification operators insert-select, update-from and delete-from. The drawback of this solution is that every new data source requires a new implementation of the same logic [8].
3.4 Impedance Mismatch [7]
In systems where Business Functionality is implemented outside the DBMS, the transformation of relational structures into object-oriented structures is performed, as a rule, at the lowest level of Business Functionality, i.e. at the level that lies just above the Data Structures level.
Examples are:
1. The patterns “Metadata Mapping” and “Repository” in [12]
2. The majority of business systems based on EJB [16]
3. “The Hybrid Object-Relational Architecture” in [9]
4. “A Second Generation Object-Relational Enabler” in [4]
¹ “Call-back”, as we mean here, is the ability of one object to call another object while remaining independent of it. This allows a level to call higher levels synchronously.
3.5 Summary
Object-oriented design helps in building and maintaining complex systems, but traditional data access through an object interface decreases the performance of the relational system (1). In addition, if Business Functionality is implemented outside the DBMS, performance decreases because a call-back from the Data Structures level is impossible (2).
There is also the bulk modifications problem (3), which likewise affects performance.
4 Overview of the Architecture «Selecting Objects»
The goal of the architecture «Selecting Objects» is to obtain an object-oriented design while avoiding any performance penalty.
The architecture is based on objects of two categories: entities and data sets.
An entity is a class that is associated with a table in the database and encapsulates the behavior of the table’s data in its own methods.
A data set is a class that encapsulates an SQL “select…” query and thus represents a set of data records. The interface of this class allows the SQL query to be modified and executed.
Methods of entity-classes call one another, passing data sets as parameters or return values. Finally, the hidden query can be used in two ways:
1. The data set is passed as a result to the User Interface level, which performs the final adjustment of the query and executes it.
2. An SQL query for data modification is formed on the basis of the data set and executed.
Every data set goes through two phases:
1. The query is not executed yet:
   a. The query can be modified
   b. Data are not accessible
2. The query is executed:
   a. Data can be used
   b. To obtain other data, one more call to the database is needed
   c. If redundant data were requested, the resources spent cannot be recovered
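The two phases can be illustrated with a minimal C# sketch. The class LazyQuery and its members are hypothetical names introduced only for this illustration; they are not the SqlSelect class of the architecture, and the database is replaced by a delegate so that the example runs without a DBMS.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a data set; not the SqlSelect class of the paper.
public class LazyQuery
{
    private readonly string source;
    private readonly List<string> conditions = new List<string>();
    private IList<string> rows; // stays null until the query is executed

    public LazyQuery(string source) { this.source = source; }

    public bool IsExecuted { get { return rows != null; } }

    // Phase 1: the query can still be modified.
    public LazyQuery And(string condition)
    {
        if (IsExecuted)
            throw new InvalidOperationException("The query has already been executed.");
        conditions.Add(condition);
        return this;
    }

    public string ToSql()
    {
        string sql = "select * from " + source;
        if (conditions.Count > 0)
            sql += " where " + string.Join(" and ", conditions);
        return sql;
    }

    // Transition to phase 2: the delegate stands in for the database call.
    public IList<string> Execute(Func<string, IList<string>> database)
    {
        if (rows == null)
            rows = database(ToSql());
        return rows;
    }
}

public static class LazyQueryDemo
{
    public static void Main()
    {
        LazyQuery qry = new LazyQuery("Supplier").And("City_ID = 1");
        Console.WriteLine(qry.ToSql());
        // A fake database that simply echoes the SQL text it receives.
        IList<string> data = qry.Execute(sql => new List<string> { "row for: " + sql });
        Console.WriteLine(data[0]);
    }
}
```

Rejecting modification after execution is one possible design choice; it makes the boundary between the two phases explicit.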
4.1 Object-Oriented Interface (1)
The architecture «Selecting Objects» limits the object-oriented model so that inheritance and encapsulation are used for implementing data behavior, but not for the data itself. Instead of data, methods work with the way the data are obtained, which is represented by the SQL query encapsulated in the data set. Consequently, the object interface doesn't force the developer to split data into records; data can be managed as sets of records.
4.2 Call-Back from the Data Structures Level (2)
The proposed approach uses the ability to call object-oriented code from the DBMS. Object methods of Business Functionality can therefore be implemented inside the DBMS and called from the Data Structures level using triggers.
Note that an updating query leads to a single execution of the corresponding logic, regardless of the number of updated records.
4.3 Bulk Modifications (3)
A data set can be passed to a method as a parameter. The method can then be used to process various data sources and at the same time can use bulk modification constructs.
4.4 Impedance Mismatch [7]
In the context of the proposed architecture, the object-oriented model manages relational data without transforming them into its own format. The transformation is performed just before the data are used.
Figuratively speaking, the relational and object-oriented models are not merely "joined" with one another but "impregnate" one another [11].
4.5 Summary
The architecture «Selecting Objects» doesn't degrade relational performance, but this comes at the cost of a set of limitations on the object-oriented model:
1. Generally, atomic data cannot be obtained by a single invocation of a member of a business object.
2. Business objects don’t encapsulate data, only its behavior.
3. Data themselves are not passed through the call stack. Instead, the call stack passes the knowledge of how to obtain them. This knowledge is used to obtain the data just before they are used.
5 Some Details of the Architecture “Selecting Objects”
5.1 Types of Data Sets
The architecture «Selecting Objects» works with data sets. A data set represents a list of records of the same type. Data sets are distinguished by their types, in accordance with the types of their records. For example, “all suppliers” and “suppliers whose names start with ‘A’” are two different data sets of the same type.
Data set types can detail one another. A type B details a type A if its semantics don't contradict the semantics of A.
Data set types are declared at the Business Functionality level as classes derived from the class SqlSelect. If data set type A is detailed by data set type B, then the class of B must be derived from the class of A.
Every data set type declares a base way to obtain the data, in the form of an SQL query. The base SQL query can be either hard coded inside the constructor or passed as a parameter of the constructor. A derived class can declare its own base SQL query.
A data set is an instance of a data set type class. A data set can perform some operations on the base query (restriction, projection, sorting, extension and aggregation [3]), but cannot modify it.
A data set is similar to a relational view. It is more than a view because it provides an OO interface, inheritance and polymorphism. It is less than a view because it doesn’t allow some relational operations (for example, joining two data sets is restricted). A data set doesn’t replace a view: a view can be used inside the data set’s query.
The prototypes of the class SqlSelect are
1. The pattern Query Object in [12]
2. The class of the same name in AVIcode AX.NET Studio [2]
3. The class SqlCommand in Microsoft ADO.NET [13]
Example
The examples operate on the database of parts and suppliers from [3], using Microsoft SQL Server 2005 and C#.NET.
There are two tables (see Fig.2):
1. City: the city City_ID has the name CityName.
2. Supplier: the supplier Supplier_ID has the name SupplierName and is located in the city City_ID.
City: City_ID, CityName
Supplier: Supplier_ID, City_ID, SupplierName
Fig. 2. Sample database
The following data set types can be declared (see Fig.3):
1. NewCityQry: SqlSelect. A city with the name CityName.
2. CityQry: NewCityQry. A city with the name CityName and the identifier City_ID.
3. SupplierQry: SqlSelect. A supplier with the name SupplierName and the identifier Supplier_ID, located in the city City_ID.
4. NewSupplierQry: SqlSelect. Suppliers with the name SupplierName, located in the city with the name CityNameForNewSupplier.
5. SupplierExtQry: SupplierQry. Suppliers with the name SupplierName and the identifier Supplier_ID, located in the city City_ID with the name CityName.
NewCityQry: CityName
CityQry: City_ID, CityName
SupplierQry: Supplier_ID, City_ID, SupplierName
SupplierExtQry: Supplier_ID, City_ID, SupplierName, CityName
NewSupplierQry: SupplierName, CityNameForNewSupplier
Fig. 3. Data set types
The base SQL query for NewSupplierQry and NewCityQry is passed as a parameter of the constructor.
The base SQL queries for the other data set types are hard coded (see Table 1).
Table 1. Hard coded base SQL queries
Data set type     Base SQL query
CityQry           City
SupplierQry       Supplier
SupplierExtQry    select Supplier.*, CityName from Supplier inner join City on City.City_ID = Supplier.City_ID
5.2 Data Selecting
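To obtain data, one of the SqlSelect methods with the prefix Execute should be called.
Example
This code counts the suppliers whose names start with the letter “A”.
// Start from the base query of the SupplierQry data set type.
SupplierQry qry = new SupplierQry();
// Restrict the data set; the base query itself is not modified.
qry.Where.And(SupplierQry.Fields.SupplierName + " like 'A%'");
// The database is accessed only here.
int count = qry.ExecuteCount();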
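To make the relation between data set types, the fixed base query and restrictions more concrete, here is a self-contained C# sketch. All class and method names prefixed with Sketch, as well as AndWhere, SelectSql and CountSql, are hypothetical and are not the API of the class SqlSelect; wrapping the base query in a derived table and using "select * from Supplier" as the base query are assumptions made for illustration. Only the generated SQL text is shown; nothing is executed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of a data set type; not the paper's SqlSelect.
public class SketchSelect
{
    private readonly string baseSql; // fixed once the data set is constructed
    private readonly List<string> where = new List<string>();

    public SketchSelect(string baseSql) { this.baseSql = baseSql; }

    // Restriction: narrows the result without touching the base query.
    public void AndWhere(string condition) { where.Add(condition); }

    private string Wrap(string columns)
    {
        string sql = "select " + columns + " from (" + baseSql + ") as q";
        if (where.Count > 0)
            sql += " where " + string.Join(" and ", where);
        return sql;
    }

    public string SelectSql() { return Wrap("*"); }
    public string CountSql() { return Wrap("count(*)"); }
}

public class SketchSupplierQry : SketchSelect
{
    public SketchSupplierQry() : base("select * from Supplier") { }
    protected SketchSupplierQry(string baseSql) : base(baseSql) { }
}

// Details SketchSupplierQry: same columns plus CityName.
public class SketchSupplierExtQry : SketchSupplierQry
{
    public SketchSupplierExtQry()
        : base("select Supplier.*, CityName from Supplier " +
               "inner join City on City.City_ID = Supplier.City_ID") { }
}

public static class DataSetTypesDemo
{
    // Works for SketchSupplierQry and for every type that details it.
    // String concatenation is used for brevity only; real code should
    // pass the prefix as a query parameter.
    public static string CountByPrefixSql(SketchSupplierQry qry, string prefix)
    {
        qry.AndWhere("SupplierName like '" + prefix + "%'");
        return qry.CountSql();
    }

    public static void Main()
    {
        Console.WriteLine(CountByPrefixSql(new SketchSupplierQry(), "A"));
        Console.WriteLine(CountByPrefixSql(new SketchSupplierExtQry(), "A"));
    }
}
```

Because the derived type is passed where the base type is expected, the same restriction logic is reused for both data set types without knowing which base query lies underneath.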
5.3 Class Entity and Data Modifications
An entity-class derived from the class Entity should be created for every table.
Methods of entity-classes can be implemented using helper methods of the base class and methods of other entity-classes. The helper methods of the base class, InsertRows, UpdateRows and DeleteRows, have the access modifier “protected” so that a table can be modified only from its own entity-class.
5.4 Triggers
To process inserted, updated and deleted records, handlers for the appropriate events of the class Entity should be implemented. A handler receives two data sets as parameters: inserted and deleted.
Example
Suppose that for every new supplier name, leading spaces must be removed and the first letter must be capitalized.
To meet these requirements, the inserting and updating handlers call the method ProcessNewData.
The method performs the following steps:
1. Creates the data set wrongSuppliers based on the parameter
2. Creates an expression that calculates the new name
3. Calls an update by the identifiers from wrongSuppliers, specifying the expression for the update. This results in the execution of an "update..select.." query.
private void ProcessNewData(SupplierQry inserted)
{
    // Data set of inserted suppliers whose names are not valid.
    SupplierQry wrongSuppliers = new SupplierQry(inserted);
    wrongSuppliers.Where.And(String.Format("not IsValidSupplierName({0})",
        SupplierQry.Fields.SupplierName));
    // SQL expression that calculates the corrected name.
    string expr = String.Format("CorrectSupplierName({0})",
        SupplierQry.Fields.SupplierName);
    // One bulk update for all rows identified by wrongSuppliers.
    this.UpdateRowsById(wrongSuppliers,
        Assign.Expression(SupplierQry.Fields.SupplierName, expr));
}
Two static methods are declared in the class Supplier and are deployed into the DBMS as user-defined functions:
1. public static SqlBoolean IsValidSupplierName(SqlString name)
2. public static SqlString CorrectSupplierName(SqlString name)
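As a rough illustration of why an update driven by a data set costs a single statement regardless of the number of rows, the following C# sketch builds the kind of SQL text that such a call might produce. The helper UpdateByIdSql and the shape of the generated statement are assumptions made for this illustration, not the actual behavior of UpdateRowsById; the condition text mirrors the example above and has not been validated against a DBMS.

```csharp
using System;

public static class BulkUpdateSketch
{
    // Builds one update statement for all rows identified by the data set
    // query (hypothetical helper; the real UpdateRowsById may differ).
    public static string UpdateByIdSql(string table, string idColumn,
                                       string dataSetSql, string column,
                                       string expression)
    {
        return "update " + table +
               " set " + column + " = " + expression +
               " where " + idColumn + " in (select " + idColumn +
               " from (" + dataSetSql + ") as q)";
    }

    public static void Main()
    {
        // Data set "wrongSuppliers": inserted rows whose names fail validation.
        // The condition is copied from the paper's example as plain text.
        string wrongSuppliers =
            "select * from inserted where not IsValidSupplierName(SupplierName)";

        Console.WriteLine(UpdateByIdSql("Supplier", "Supplier_ID", wrongSuppliers,
            "SupplierName", "CorrectSupplierName(SupplierName)"));
    }
}
```

The data set is never materialized in the application: its query is embedded as a subquery, so the DBMS both selects and updates the rows within one statement.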
6 Designing. Where is the Benefit?
By restricting data encapsulation and data access through the OO interface, we have truncated the object-oriented model. The question is: does the truncated object-oriented model still make sense? Does it allow object-oriented decomposition?
We will give some examples that illustrate how object-oriented design can be used inside the truncated object-oriented model.
6.1 Polymorphism and Call-Back
Suppose there are two packages: the package Suppliers, which deals with suppliers and their contracts, and the package Projects, which deals with parts, supplements and projects. The package Projects depends on the package Suppliers, i.e. members of the former use members of the latter.
The task is: when a supplier is deleted, all its supplements must be replaced by supplements of other suppliers, or the project must be suspended. This means that the package Projects must be called synchronously from the package Suppliers. A direct call leads to a cyclic reference between the two packages and thus corrupts the OO design.
The truncated model allows the pattern Call-Back Interface [6] to be used: define the interface IProjectForSupplier, which is implemented by the table-wrapper Project and is used by the table-wrapper Supplier to invoke the required operations when a supplier is deleted (see Fig.4).
[Figure: package Projects (Part, Project_Part, Project, …) and package Suppliers (City, Supplier, …), connected through the interface IProjectForSupplier]
Fig. 4. Polymorphism and call-back
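The dependency inversion described in this subsection can be sketched in C# as follows. The interface name IProjectForSupplier and the classes Supplier and Project come from the text; the members Register, Delete, OnSuppliersDeleting and Received are hypothetical, and the data set is represented simply by its SQL text to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

// Package Suppliers (lower level): declares the interface it needs.
public interface IProjectForSupplier
{
    // Receives the data set (here: its SQL text) of suppliers being deleted.
    void OnSuppliersDeleting(string deletedSuppliersSql);
}

public class Supplier
{
    private readonly List<IProjectForSupplier> listeners =
        new List<IProjectForSupplier>();

    public void Register(IProjectForSupplier listener) { listeners.Add(listener); }

    public void Delete(string suppliersSql)
    {
        // Call back into higher-level packages without referencing them.
        foreach (IProjectForSupplier listener in listeners)
            listener.OnSuppliersDeleting(suppliersSql);

        // Stand-in for the bulk delete that would be sent to the database.
        Console.WriteLine("delete from Supplier where Supplier_ID in " +
            "(select Supplier_ID from (" + suppliersSql + ") as q)");
    }
}

// Package Projects (higher level): implements the interface.
public class Project : IProjectForSupplier
{
    public readonly List<string> Received = new List<string>();

    public void OnSuppliersDeleting(string deletedSuppliersSql)
    {
        // A real handler would replace supplements or suspend projects in bulk.
        Received.Add(deletedSuppliersSql);
    }
}

public static class CallBackDemo
{
    public static void Main()
    {
        Supplier supplier = new Supplier();
        Project project = new Project();
        supplier.Register(project);
        supplier.Delete("select * from Supplier where City_ID = 1");
        Console.WriteLine("Project was notified " + project.Received.Count + " time(s).");
    }
}
```

The class Supplier refers only to the interface declared in its own package, so the package Suppliers does not depend on the package Projects, while the handler still receives a whole data set rather than individual records.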
6.2 Inheritance and Reuse
Suppose that permissions for access to business entities must be checked, i.e. for every user and every entity a list of allowed actions is known. It is convenient to have a table-wrapper method that returns the list of actions that a user is permitted to perform on an entity.
Our architecture allows the creation of an abstract class SecureEntity that is derived from the class Entity. The method mentioned above can be implemented in this class. Table-wrappers of objects that require security checking should be derived from the class SecureEntity (see Fig.5).
[Figure: packages Projects, Suppliers (Supplier, …) and Security; the package Security contains the class SecureEntity with the method ActionQry ListActions(memberID, entityID)]
Fig. 5. Deriving and reuse
6.3 Encapsulation of Data and Complex Calculations
Data processing using the class SqlSelect is not always suitable.
Suppose that future prices should be predicted based on the supplements of a supplier. This task has two traits. First, the complexity of the calculations is high in comparison with the complexity of the data structures needed for the calculations. Second, sequential data processing is required. These traits call for the non-truncated object-oriented model.
For such tasks we define a data-class. An instance of this class:
1. Encapsulates the data that are associated with an instance of a business entity and are needed to execute the task.
2. Doesn’t access the database, but receives all the data as method parameters.
Thus, for the task of the example, the class SupplierPricingData should be implemented (see Fig.6). This class:
1. Is associated with the class Supplier
2. Is initialized with the identifier of the supplier and with its supplements
3. Encapsulates the data that are needed to predict future prices
Let us note that
1. Due to the task’s traits, abandoning the truncation of the object model doesn’t lead to worse performance.
2. Because the lifetime of the value object is no longer than the lifetime of the current transaction, the encapsulated data don’t require synchronization with the database.
[Figure: package Projects (Part, Project_Part, Project, …) and the class SupplierPricingData with the constructor SupplierPricingData(int supplierID, DataSet prices) and the method int PredictPrice(int partID, DateTime date)]
Fig. 6. Sample of the Data-Class
7 Conclusions and Future Work
Thus, two goals were set: fast data processing and the possibility of object-oriented design. The first goal has the higher priority, i.e. achieving the second goal should not affect performance.
The architecture “Selecting Objects” meets both goals in exactly this order of priority.
The following questions need further research:
1. Obtaining and analyzing practical results
2. Atomic parts of an SQL query and their nature
3. Questions of casting between data set types
4. Rules of interaction with the application and the structure of the interface level of Business Functionality
5. How to mitigate the risk of low concurrency [15] due to encapsulated data behavior?
References
1. Albrecht Schmidt, Florian Waas, Martin Kersten, Daniela Florescu, Michael J. Carey, Ioana Manolescu, Ralph Busse, Why and how to benchmark XML databases. ACM SIGMOD Record, v.30 n.3, September 2001
2. AVIcode LLC Web site, 2005. http://www.avicodeconsulting.com/home.htm
3. C. J. Date. An introduction to database systems (7th ed.), Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1999.
4. Charly Kleissner, Enterprise Objects Framework: a second generation object-relational enabler, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.455-459, May 22-25, 1995, San Jose, California, United States.
5. Dennis Shasha, Philippe Bonnet. Database Tuning: Principles, Experiments, and Troubleshooting Techniques. Morgan Kaufmann, 2002.
6. Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1995.
7. François Bancilhon, Object-oriented database systems, Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.152-162, March 1988, Austin, Texas, United States
8. Grady Booch, Object oriented design with applications, Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, 1990.
9. Jeff Sutherland, Matthew Pope, Ken Rugg, The Hybrid Object-Relational Architecture (HORA): an integration of object-oriented and relational technology, Proceedings of the 1993 ACM/SIGAPP symposium on Applied computing: states of the art and practice, p.326-333, February 14-16, 1993, Indianapolis, Indiana, United States.
10. Jen-Yao Chung, Yi-Jing Lin, Daniel T. Chang. Object and relational databases. ACM SIGPLAN OOPS Messenger, Addendum to the proceedings of the 10th annual conference on Object-oriented programming systems, languages, and applications (Addendum). 1995, Volume 6 Issue 4
11. Jim Gray. The Revolution in Database Architecture. SIGMOD 2004, June 13-18, 2004, Paris, France.
12. Martin Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2002.
13. MSDN Web site, 1994. Microsoft ADO.NET. http://msdn.microsoft.com/library/default.asp?url=/library/enus/cpguide/html/cpconoverviewofadonet.asp
14. O. L. Madsen, B. Moller-Pedersen, Virtual classes: a powerful mechanism in object-oriented programming, ACM SIGPLAN Notices, v.24 n.10, p.397-406, Oct. 1989.
15. Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman, Concurrency control and recovery in database systems, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1987.
16. Richard Monson-Haefel. Enterprise Java Beans. O’Reilly, 2000.