Selecting Objects: a Technique for Efficient Interaction of OO Design with Relational Data Store

Polina Cherkasova
University of St. Petersburg, Russia
[email protected]

Abstract. This work addresses the development of complex database systems from the viewpoint of performance and design, focusing on how data behavior is implemented. The proposed architecture «Selecting Objects» is based on the object SqlSelect, which encapsulates the way to obtain data in the form of an SQL query and can be passed through the call stack. This technique combines the fast data processing of the relational model with object-oriented design.

1 Introduction

Relational database systems are currently the major tool for storing and managing data because they provide strong mechanisms for fast processing of large amounts of data. On the other hand, the applications in question are so complex that a single developer cannot keep all their details in mind. Object-oriented design offers a solution to this problem, which is why there is a desire to implement business functionality with an object-oriented model.

The problem known as Impedance Mismatch [7] is that the relational and the object-oriented models are different in nature. Data structures and coding style must therefore be transformed from one model to the other at some place and in some way. Depending on how this transformation is done, different benefits of the two models can be exploited. One of the most important benefits of the object-oriented model is freedom of design, while the benefit of the relational model is fast processing of large amounts of data.

We begin by stating some facts that are needed for the further discussion. Then we analyze known approaches from the viewpoint of performance and design. After that we demonstrate that the proposed approach «Selecting Objects» has advantages over the commonly used ones.

2 Application Design

This section addresses the questions that are important for the further analysis: the logical structure of an application, the advantage of bulk operations over single operations, and the applicability of bulk operations.

2.1 Logical Levels

The code of an application system can be split into three logical levels: Data Structures, Business Functionality and User Interface.

The Data Structures level declares the format in which data are stored. For example:
1. Tables in relational databases [3]
2. Object attributes in object databases [10]
3. XML structures [1]

The meaning and behavior of data are defined at the Business Functionality level in accordance with the business requirements of the application. The Business Functionality can be implemented as
1. Functions, procedures and views in a relational database [3]. It is important that triggers are also members of the Business Functionality level.
2. Object methods in an object database [10]
3. The middle tier in a three-tier model [16]

The User Interface level provides the user with tools for data access. For example:
1. Web application
2. Stand-alone application
3. Web service

We will consider systems where the Data Structures level is implemented in a relational database, because we are interested in fast data processing.

2.2 Single and Bulk Operations

Suppose there is a task to modify a set of records selected from a table by a condition, for example “Move all employees of the department X to the department Y”. We will compare two methods of implementing the task. The first method is to obtain the identifiers of all records that meet the condition and then to execute the operation for each one. The second method is to execute the operation using one SQL query “update… from…” that includes the condition in the “where” clause.

We will estimate the execution time of both methods, supposing that the table contains N records and S records meet the condition.

The first method contains the following steps:
1. Obtain the list of identifiers:
   a. Starting the SQL query takes time StartTime [5].
   b. Searching for records by the condition. If the condition is on an indexed field, the search takes time S + ln(N).
2. For each of the S records:
   a. Starting the SQL query takes time StartTime.
   b. Searching for the record by its identifier takes time ln(N).
   c. Updating the record takes time UpdateTime.

The total time is
   StartTime + S + ln(N) + S ∙ (StartTime + ln(N) + UpdateTime)

The second method contains the following steps:
1. Update the records:
   a. Starting the SQL query takes time StartTime.
   b. Searching for the records takes time S + ln(N).
   c. Updating each of the S records takes time S ∙ UpdateTime.

The total time is
   StartTime + S + ln(N) + S ∙ UpdateTime

The same estimates hold for the operations “insert” and “delete”.

The two totals differ by S ∙ (StartTime + ln(N)). As S and N grow, the first time grows as S ∙ ln(N), which is worse than the growth of the second time, S + ln(N). When S or N is small, the constant StartTime becomes significant. When the Business Functionality is implemented inside the DBMS, StartTime is the time of SQL query initialization. When it is implemented outside the DBMS, the inter-process communication time is added. When it is executed on a separate computer, the network latency is added as well. Consequently, using single operations instead of bulk operations degrades performance.

2.3 When Can Bulk Operations Be Used?

As a rule, the end user does not start bulk operations explicitly: the user interface allows objects to be selected one by one with the mouse. Bulk operations are initiated implicitly because a selected object has a set of child objects that must be modified. Examples of such operations:
1. Delete an object. Child objects must be either deleted or updated.
2. Move an object from one area to another. For example, when a department is moved from one office to another, the visiting cards of its employees should be updated.
3. Copy an object from one area to another. As a rule, child objects must be copied as well.
4. Create an object based on a template. For example, every new customer is provided with a base set of services that can be modified later.

There are cases when an operation is bulk per se:
1. Export or import
2. A bulk condition is explicitly specified by the user, for example “Send the message to all managers”

3 Analysis of the Commonly Used Approaches

3.1 Object-Oriented Interface

The statement about the object-oriented interface (explained below): an object-oriented interface may degrade performance (1).

An essential property of the Business Functionality level is its complexity. The typical way to manage this complexity is to use levels of abstraction in the object-oriented model, i.e. objects of the system are separated into levels and sublevels so that objects of lower levels do not depend on objects of higher levels [8, 14]. This consideration suggests implementing Business Functionality in the form of an object-oriented model, so the Business Functionality level becomes divided into sublevels (see Fig. 1).

[Fig. 1. Levels of the object-oriented design embedded into logical levels of an application: User Interface, Business Functionality, Data Structures.]

In object-oriented design, a frequently used technique is to represent a business object by an object of the model that encapsulates the values of its attributes or the knowledge of how to obtain them. The object exposes its attributes to clients in the form of an interface, where every member of the interface deals with the value of one database attribute or one database record [16]. Such an architecture splits every bulk operation into single actions at the coding phase. As explained in Section 2.2, this decreases performance.

3.2 Call-Back from the Data Structures Level

The statement about triggers: the impossibility of a call-back from the Data Structures level may decrease performance (2).

There are two approaches to designing database applications.
The first one implements Business Functionality inside the DBMS using its language (PL/SQL, Transact-SQL). The second one implements Business Functionality outside the DBMS using a high-level object-oriented language. The fundamental difference between these approaches is that only the first allows a call-back¹ from the Data Structures level to the Business Functionality level. Such a call-back can be implemented using triggers: a trigger belongs to the Business Functionality level but is invoked by the Data Structures level.

In practice it is very important to be able to execute some code for every inserted (updated, deleted) record. That is why a special rule is established when Business Functionality is implemented outside the DBMS: a record can be inserted (updated, deleted) only by direct or indirect invocation of a single method of Business Functionality, and the code in question is placed into this method. This rule makes it impossible to modify more than one record with one SQL query, i.e. it rules out bulk modifications. Consequently, performance decreases.

3.3 Bulk Modifications

The statement about bulk modifications: existing approaches do not allow bulk modifications to be combined with execution of the same code for various data sources (3).

Let a business operation be implemented as a method of the Business Functionality level. For example, the method CreateAccount inserts records into the tables Member, Contact and Account. Suppose there is a task to create a set of accounts based on data from some table (for example, in the context of data import). Existing approaches offer two possible solutions. The first is to select data from the table and to call the method for every record, looping over them with a cursor. The disadvantage of this solution is that single operations are used instead of bulk operations.
The second solution is to implement a method that creates accounts based on data from the table and uses the bulk modification operators insert-select, update-from and delete-from. The defect of this solution is that every new data source requires a new implementation of the same logic [8].

3.4 Impedance Mismatch [7]

In systems where Business Functionality is implemented outside the DBMS, the transformation of relational structures to object-oriented structures is performed, as a rule, at the lowest level of Business Functionality, i.e. at the level that lies just above the Data Structures level. Examples are:
1. The patterns “Metadata Mapping” and “Repository” in [12]
2. The majority of business systems based on EJB [16]
3. “The Hybrid Object-Relational Architecture” in [9]
4. “A Second Generation Object-Relational Enabler” in [4]

¹ “Call-back”, as we mean it here, is the ability of one object to call another object while remaining independent of it. This allows a level to call higher levels synchronously.

3.5 Summary

Object-oriented design helps in building and supporting complex systems, but traditional data access through an object interface decreases the performance of the relational system (1). In addition, if Business Functionality is implemented outside the DBMS, performance decreases because a call-back from the Data Structures level is impossible (2). There is also the bulk modifications problem (3), which affects performance as well.

4 Overview of the Architecture «Selecting Objects»

The goal of the architecture «Selecting Objects» is to obtain an object-oriented design while avoiding any performance penalties. The architecture is based on objects of two categories: entities and data sets. An entity is a class that is associated with a table in the database and encapsulates the behavior of the table's data in its own methods. A data set is a class that encapsulates an SQL query “select…” and thus represents a set of data records. The interface of this class allows modification and execution of the SQL query.
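As a rough illustration of the data set idea, the sketch below wraps a select statement in an object, lets callers narrow it, and touches the database only when an execute method is called. It is written in Python with the standard sqlite3 module rather than the paper's C# and SQL Server, so that it runs without any setup. The method names (where_and, sql, execute, execute_count), the sample rows and the internals of the class are assumptions made for this example, not the author's API.

```python
import sqlite3


class SqlSelect:
    """Hypothetical data set: holds a SELECT as text and runs it only on demand."""

    def __init__(self, base_sql, conditions=(), params=()):
        self._base_sql = base_sql            # base query; never modified afterwards
        self._conditions = list(conditions)  # restrictions added by callers
        self._params = list(params)          # bound values for the restrictions

    def where_and(self, condition, *params):
        """Return a new, narrower data set; nothing is executed here."""
        return SqlSelect(self._base_sql,
                         self._conditions + [condition],
                         self._params + list(params))

    def sql(self, columns="*"):
        """Compose the final query text around the base query."""
        text = f"select {columns} from ({self._base_sql}) as q"
        if self._conditions:
            text += " where " + " and ".join(f"({c})" for c in self._conditions)
        return text

    def execute(self, conn):
        """Run the query and materialise the rows."""
        return conn.execute(self.sql(), self._params).fetchall()

    def execute_count(self, conn):
        """Let the database count the rows instead of fetching them."""
        return conn.execute(self.sql("count(*)"), self._params).fetchone()[0]


# Illustrative data only (assumed for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table Supplier (Supplier_ID integer primary key, "
             "City_ID integer, SupplierName text)")
conn.executemany("insert into Supplier values (?, ?, ?)",
                 [(1, 10, "Adams"), (2, 10, "Blake"), (3, 20, "Avery")])

all_suppliers = SqlSelect("select * from Supplier")
a_suppliers = all_suppliers.where_and("SupplierName like ?", "A%")

print(a_suppliers.execute_count(conn))    # prints 2
print(all_suppliers.execute_count(conn))  # prints 3
```

Returning a new object from where_and instead of mutating the receiver is a choice of this sketch; it keeps a data set that has been handed to another method from changing underneath its original owner.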

Methods of entity-classes call one another, passing data sets as parameters or returning them as results. Eventually the hidden query can be used in two ways:
1. The data set is passed as a result to the User Interface level, which performs the final adjustment of the query and executes it.
2. An SQL query for data modification is formed on the basis of the data set and executed.

Every data set goes through two phases:
1. The query is not executed yet:
   a. The query can be modified.
   b. The data are not accessible.
2. The query is executed:
   a. The data can be used.
   b. To obtain other data, one more call to the database is needed.
   c. If redundant data were requested, the spent resources cannot be recovered.

4.1 Object-Oriented Interface (1)

The architecture «Selecting Objects» limits the object-oriented model so that inheritance and encapsulation are used for implementing data behavior, but not for the data themselves. Instead of data, methods work with the way the data are obtained, which is represented by the SQL query encapsulated in the data set. Consequently, the object interface does not force the developer to split data into records; data can be managed as sets of records.

4.2 Call-Back from the Data Structures Level (2)

The proposed approach uses the possibility of calling object-oriented code from the DBMS. Object methods of Business Functionality can thus be implemented inside the DBMS and called from the Data Structures level using triggers. Note that an updating query leads to a single execution of the corresponding logic, regardless of the number of updated records.

4.3 Bulk Modifications (3)

A data set can be passed to a method as a parameter. The method can then be used to process various data sources and at the same time can use the constructs for bulk modifications.

4.4 Impedance Mismatch [7]

In the context of the proposed architecture, the object-oriented model manages relational data without transforming them to its own format. The transformation is performed just before the data are used.
Figuratively speaking, the relational and object-oriented models are not merely “joined” to each other but “impregnated” with each other [11].

4.5 Summary

The architecture «Selecting Objects» does not degrade relational performance, but this comes at the price of a set of limitations on the object-oriented model:
1. In general, atomic data cannot be obtained by a single invocation of a member of a business object.
2. Business objects do not encapsulate data, only their behavior.
3. Data themselves are not passed through the call stack. Instead, the call stack passes the knowledge of how to obtain them. This knowledge is used to obtain the data just before they are used.

5 Some Details of the Architecture «Selecting Objects»

5.1 Types of Data Sets

The architecture «Selecting Objects» works with data sets. A data set represents a list of records of the same type. Data sets are distinguished by their types, in accordance with the types of their records. For example, “all suppliers” and “suppliers whose names start with ‘A’” are two different data sets of the same type.

Data set types can detail one another. A type B details a type A if its semantics do not contradict the semantics of A.

Data set types are declared at the Business Functionality level as classes derived from the class SqlSelect. If data set type A is detailed by data set type B, then the class of B must be derived from the class of A.

Every data set type declares the base way to obtain the data in the form of an SQL query. The base SQL query can be either hard coded inside the constructor or passed as a parameter of the constructor. A derived class can declare its own base SQL query.

A data set is an instance of a data set type class. A data set can perform some operations on the base query (restriction, projection, sorting, extension and aggregation [3]), but cannot modify it.

A data set is similar to a relational view. A data set is more than a view because it provides an OO interface, inheritance and polymorphism.
A data set is less than a view because it does not allow some relational operations (for example, joining two data sets is restricted). A data set does not replace a view: a view can be used inside the data set's query.

The prototypes of the class SqlSelect are
1. The pattern Query Object in [12]
2. The class of the same name in AVIcode AX.NET Studio [2]
3. The class SqlCommand in Microsoft ADO.NET [13]

Example

The examples operate on the database of parts and suppliers from [3], using Microsoft SQL Server 2005 and C#.NET. There are two tables (see Fig. 2):
1. City: the city City_ID has the name CityName.
2. Supplier: the supplier Supplier_ID has the name SupplierName and is located in the city City_ID.

[Fig. 2. Sample database: City (City_ID, CityName); Supplier (Supplier_ID, City_ID, SupplierName).]

The following data set types can be declared (see Fig. 3):
1. NewCityQry : SqlSelect. A city with the name CityName.
2. CityQry : NewCityQry. A city with the name CityName and the identifier City_ID.
3. SupplierQry : SqlSelect. A supplier with the name SupplierName and the identifier Supplier_ID, located in the city City_ID.
4. NewSupplierQry : SqlSelect. Suppliers with the name SupplierName, located in the city with the name CityNameForNewSupplier.
5. SupplierExtQry : SupplierQry. Suppliers with the name SupplierName and the identifier Supplier_ID, located in the city City_ID with the name CityName.

[Fig. 3. Data set types: NewCityQry (CityName); CityQry (City_ID, CityName); SupplierQry (Supplier_ID, City_ID, SupplierName); SupplierExtQry (Supplier_ID, City_ID, SupplierName, CityName); NewSupplierQry (SupplierName, CityNameForNewSupplier).]

The base SQL query for NewSupplierQry and NewCityQry is passed as a parameter of the constructor. The base SQL queries of the other data set types are hard coded (see Table 1).

Table 1. Hard coded base SQL queries

Data set type    Base SQL query
CityQry          City
SupplierQry      Supplier
SupplierExtQry   select Supplier.*, CityName from Supplier inner join City on City.City_ID = Supplier.City_ID

5.2 Data Selecting

To obtain data, one of the SqlSelect methods with the prefix Execute should be called.

Example

This code counts the suppliers whose names start with the letter “A”.

    SupplierQry qry = new SupplierQry();
    // Restrict the base query to names beginning with "A".
    qry.Where.And(SupplierQry.Fields.SupplierName + " like 'A%'");
    // The query is sent to the database only here.
    int count = qry.ExecuteCount();

5.3 Class Entity and Data Modifications

An entity-class derived from the class Entity should be created for every table. Methods of entity-classes can be implemented using helper methods of the base class and methods of other entity-classes. The helper methods of the base class, InsertRows, UpdateRows and DeleteRows, have the access modifier “protected” so that a table can be modified only from its own entity-class.

5.4 Triggers

To process inserted, updated and deleted records, handlers for the appropriate events of the class Entity should be implemented. A handler receives two data sets as parameters: inserted and deleted.

Example

Suppose that, for every new supplier name, leading spaces must be removed and the first letter must be capitalized. To meet these requirements, the inserting and updating handlers call the method ProcessNewData. The method performs the following steps:
1. Creates the data set wrongSuppliers based on the parameter.
2. Creates an expression that calculates the new name.
3. Calls an update by the identifiers from wrongSuppliers, specifying the expression for updating. This leads to the execution of an "update..select.." query.

    private void ProcessNewData(SupplierQry inserted)
    {
        // Step 1: narrow the inserted rows to those with an invalid name.
        SupplierQry wrongSuppliers = new SupplierQry(inserted);
        wrongSuppliers.Where.And(String.Format("not IsValidSupplierName({0})",
            SupplierQry.Fields.SupplierName));
        // Step 2: SQL expression that computes the corrected name.
        string expr = String.Format("CorrectSupplierName({0})",
            SupplierQry.Fields.SupplierName);
        // Step 3: a single bulk update over all rows in wrongSuppliers.
        this.UpdateRowsById(wrongSuppliers,
            Assign.Expression(SupplierQry.Fields.SupplierName, expr));
    }

Two static methods are declared in the class Supplier and deployed into the DBMS as user-defined functions:
1. public static SqlBoolean IsValidSupplierName(SqlString name)
2. public static SqlString CorrectSupplierName(SqlString name)

6 Designing. Where is the Benefit?

By restricting data encapsulation and data access through the OO interface, we have truncated the object-oriented model. The question is: does the truncated object-oriented model still make sense? Does it allow object-oriented decomposition? We give some examples that illustrate how object-oriented design can be used inside the truncated object-oriented model.

6.1 Polymorphism and Call-Back

Suppose there are two packages: the package Suppliers, which deals with suppliers and their contracts, and the package Projects, which deals with parts, supplements and projects. The package Projects depends on the package Suppliers, i.e. members of the former use members of the latter. The task is: when a supplier is deleted, all its supplements must be replaced by supplements of other suppliers, or the project must be suspended. This means that the package Projects must be called synchronously from the package Suppliers. A direct call leads to a cyclic reference between the two packages and thus corrupts the OO design. The truncated model allows the use of the pattern Call-Back Interface [6]: the interface IProjectForSupplier is defined, implemented by the table-wrapper Project, and used by the table-wrapper Supplier to invoke the required operations when a supplier is deleted (see Fig. 4).

[Fig. 4. Polymorphism and call-back: package Projects (Part, Project_Part, Project); package Suppliers (City, Supplier); interface IProjectForSupplier.]

6.2 Inheritance and Reuse

Suppose that permissions for access to business entities must be checked, i.e. for every user and every entity a list of allowed actions is known. It is convenient to have a table-wrapper method that returns the list of actions that a user is permitted to perform on an entity. Our architecture allows the creation of an abstract class SecureEntity derived from the class Entity. The method in question can be implemented in this class. Table-wrappers of objects that require security checking should be derived from the class SecureEntity (see Fig. 5).
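The inheritance idea can be sketched as follows, again in Python with the standard sqlite3 module instead of the paper's C# so that the example is runnable as is. The Permission table, its columns, the sample rows and the method name list_actions are hypothetical and chosen only for illustration; the paper specifies only that SecureEntity derives from Entity and offers a method returning the permitted actions as a data set.

```python
import sqlite3


class SqlSelect:
    """Minimal hypothetical data set: query text plus bound parameters."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def execute(self, conn):
        return conn.execute(self.sql, self.params).fetchall()


class Entity:
    """Hypothetical base class of all table-wrappers."""
    table = None  # set by each concrete table-wrapper


class SecureEntity(Entity):
    """Adds a permission lookup that every derived table-wrapper reuses."""

    def list_actions(self, member_id, entity_id):
        # Returns a data set, not rows: the caller decides when to execute.
        # The Permission table is an assumption of this sketch.
        return SqlSelect(
            "select ActionName from Permission "
            "where EntityTable = ? and Member_ID = ? and Entity_ID = ? "
            "order by ActionName",
            (self.table, member_id, entity_id))


class Supplier(SecureEntity):
    table = "Supplier"


class Project(SecureEntity):
    table = "Project"


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("create table Permission (EntityTable text, Member_ID integer, "
             "Entity_ID integer, ActionName text)")
conn.executemany("insert into Permission values (?, ?, ?, ?)",
                 [("Supplier", 7, 1, "read"),
                  ("Supplier", 7, 1, "update"),
                  ("Project", 7, 1, "read")])

supplier_actions = [row[0] for row in Supplier().list_actions(7, 1).execute(conn)]
project_actions = [row[0] for row in Project().list_actions(7, 1).execute(conn)]
print(supplier_actions)  # prints ['read', 'update']
print(project_actions)   # prints ['read']
```

Both table-wrappers inherit the same method, so the permission logic is written once, and because it returns an unexecuted data set it can still be combined with bulk operations by the caller.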

[Fig. 5. Deriving and reuse: packages Projects, Suppliers (Supplier) and Security (SecureEntity with the method ActionQry ListActions(memberID, entityID)).]

6.3 Encapsulation of Data and Complex Calculations

Data processing using the class SqlSelect is not always suitable. Suppose that future prices should be predicted on the basis of the supplements of a supplier. This task has two traits. First, the complexity of the calculations is high in comparison with the complexity of the data structures needed for them. Second, sequential data processing is required. These traits call for the non-truncated object-oriented model. For such tasks we define a data-class. An instance of this class:
1. Encapsulates the data that are associated with an instance of a business entity and are needed to execute the task.
2. Does not access the database, but receives all the data as method parameters.

Thus, for the sample task the class SupplierPricingData should be implemented (see Fig. 6). This class:
1. Is associated with the class Supplier
2. Is initialized with the identifier of the supplier and with its supplements
3. Encapsulates the data that are needed to predict future prices

Let us note that
1. Due to the traits of the task, abandoning the truncation of the object model does not worsen performance.
2. Because the lifetime of the data-class instance (a value object) does not exceed the lifetime of the current transaction, the encapsulated data do not require synchronization with the database.

[Fig. 6. Sample of the data-class: package Projects (Part, Project_Part, Project); class SupplierPricingData with the members SupplierPricingData(int supplierID, DataSet prices) and int PredictPrice(int partID, DateTime date).]

7 Conclusions and Future Work

Two goals were set: fast data processing and the possibility of object-oriented design. The first goal has the higher priority, i.e. achieving the second goal should not affect performance. The architecture «Selecting Objects» meets both goals in exactly this order of priority.

The following questions need further research:
1. Obtaining and analyzing practical results
2. Atomic parts of an SQL query and their nature
3. Casting of data set types
4. Rules of interaction with the application and the structure of the interface level of Business Functionality
5. How to mitigate the risk of low concurrency [15] caused by encapsulated data behavior

References

1. Albrecht Schmidt, Florian Waas, Martin Kersten, Daniela Florescu, Michael J. Carey, Ioana Manolescu, Ralph Busse. Why and how to benchmark XML databases. ACM SIGMOD Record, v.30 n.3, September 2001.
2. AVIcode LLC Web site, 2005. http://www.avicodeconsulting.com/home.htm
3. C. J. Date. An Introduction to Database Systems (7th ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1999.
4. Charly Kleissner. Enterprise Objects Framework: a second generation object-relational enabler. Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.455-459, May 22-25, 1995, San Jose, California, United States.
5. Dennis Shasha, Philippe Bonnet. Database Tuning: Principles, Experiments, and Troubleshooting Techniques. Morgan Kaufmann, 2002.
6. Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1995.
7. François Bancilhon. Object-oriented database systems. Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.152-162, March 1988, Austin, Texas, United States.
8. Grady Booch. Object Oriented Design with Applications. Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, 1990.
9. Jeff Sutherland, Matthew Pope, Ken Rugg. The Hybrid Object-Relational Architecture (HORA): an integration of object-oriented and relational technology. Proceedings of the 1993 ACM/SIGAPP symposium on Applied computing: states of the art and practice, p.326-333, February 14-16, 1993, Indianapolis, Indiana, United States.
10. Jen-Yao Chung, Yi-Jing Lin, Daniel T. Chang. Object and relational databases. ACM SIGPLAN OOPS Messenger, Addendum to the proceedings of the 10th annual conference on Object-oriented programming systems, languages, and applications (Addendum), Volume 6 Issue 4, 1995.
11. Jim Gray. The Revolution in Database Architecture. SIGMOD 2004, June 13-18, 2004, Paris, France.
12. Martin Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2002.
13. MSDN Web site, 1994. Microsoft ADO.NET. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconoverviewofadonet.asp
14. O. L. Madsen, B. Moller-Pedersen. Virtual classes: a powerful mechanism in object-oriented programming. ACM SIGPLAN Notices, v.24 n.10, p.397-406, Oct. 1989.
15. Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1987.
16. Richard Monson-Haefel. Enterprise JavaBeans. O'Reilly, 2000.