Design of an Optimized GA Based DSS Query Execution Strategy in a Distributed Environment

Manik Sharma, Ph.D. Scholar, Punjab Technical University, Kapurthala, India. [email protected]
Gurvinder Singh, Professor & Head, DCSE, Guru Nanak Dev University, Amritsar, India. [email protected]
Rajinder Singh, Associate Professor, DCSE, Guru Nanak Dev University, Amritsar, India. [email protected]

ABSTRACT

Distributed query optimization is one of the key challenges in the field of database theory. Depending on the distribution of data, a query is categorized as a centralized query or a distributed query. The processing of a distributed query differs entirely from that of a centralized query, because in the former case the data is distributed over a number of sites. Distributed queries are of two types: Online Transaction Processing Queries (OLTPQ) and Decision Support System Queries (DSSQ). Joins and semi joins play an important role in the optimization of a distributed query. DSS queries are an important type of distributed query; they are complex and time consuming in nature. Because the data is decentralized and the queries are complex, the query execution plan of a distributed DSS query must be optimized. DSS queries can be optimized on the basis of the Total Costs or the Response Time of the query. In this paper, a set of DSS queries is designed on the basis of the TPC-DS (Transaction Processing Performance Council for Decision Support) benchmark. An effort is made to optimize the DSS queries on the basis of Total Costs, i.e. to minimize the system resources used by a DSS query. The queries are optimized stochastically using a Genetic Algorithm (GA) with a restricted growth encoding scheme. The set of DSS queries is further analyzed on the basis of join and semi join operations. The application of GA assists in achieving an optimal allocation of the sub operations of a query in almost constant time, independent of the complexity of the DSS query.
Experimental results indicate that the hybrid approach of join and semi join operations reduces the Total Costs by 1-20% in comparison to the use of join operations alone. The contribution of this effort helps in optimizing the distributed design of a DSS database.

Keywords: Distributed Database, Total Cost, Joins, Semi Joins, DSS Query, Genetic Algorithm.

1. INTRODUCTION

A database is a collection of interrelated data designed to meet the information needs of an enterprise. Depending upon the distribution of data, a database is categorized as centralized or distributed. In a centralized database system the complete database is placed on a single central site and is shared among users. A distributed database system, on the other hand, is defined as a collection of logically interrelated data distributed over several sites. The distributed database is one of the major advances in the field of database theory; the concept originated around three decades ago. Technically, a distributed database system is a cluster of distributed computers connected with one another through wired or wireless communication media. The data in a distributed database is distributed over the available sites by using fragmentation or replication techniques. In short, a distributed database system is the convergence of database systems and computer networks [1][4][19].

There are two types of queries in a distributed database system, viz. Online Transaction Processing (OLTP) queries and Decision Support System (DSS) queries. A DSS query is normally used to retrieve or explore data from two or more sites. DSS queries are long running, complex queries that normally touch a large amount of data compared with OLTP queries. They are normally executed on a quarterly, half-yearly or yearly basis for long term strategic planning.
DSS queries consume a significant amount of system resources and can even saturate the CPU or memory of the server [7][10][13][21].

Query processing and optimization is one of the major challenges in the field of distributed database systems. A distributed query can be executed in a number of ways, and each query execution plan may use a different amount of system resources. The core objective of query optimization is to find the best possible query execution strategy. Distributed queries can be optimized by minimizing the Total Costs (Total Time) or the Response Time of a query. To increase the throughput of the system, one should optimize the Total Costs of a query. Total Costs is computed from the Input Output Costs, the Processing Costs and the Cost of Communication [1][2][9]:

Total Cost = TCost_{io} + TCost_{cpu} + TCost_{comm}    (Eq. I)

Here the Input Output Cost is the sum of the time taken by all input-output operations, the Processing Cost is the sum of the time taken in processing the DSS query, and the Communication Cost is the sum of the time taken in transmitting data from one site to another.

Normally a query is composed of the Selection, Projection, Join and Semi Join operations of relational algebra. Join is one of the most important operations in database theory; it is used to extract information from two or more relations. Technically, a join is a special case of the cartesian product: unlike the cartesian product, the tuples of the joined tables are checked against a specified condition. There are various types of joins, such as equi-join, self join, inner join and outer join. Independent of type, all of them are used to extract data from two or more relations [11]. A semi-join is an important operation in relational theory that is used to optimize a join query: it reduces the size of a relation that is used as an operand. A semi-join from R_i to R_j on attribute A is denoted R_i ⋉_A R_j [20].
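The reducing effect of a semi-join can be illustrated with a small sketch. The code below is not taken from the paper; the sample rows, the helper names `semi_join` and `equi_join`, and the in-memory list-of-dicts representation are all assumptions made for illustration. It shows that joining the semi-join-reduced Customer relation with Sales yields the same result as joining the full Customer relation, while fewer Customer tuples would need to be shipped.

```python
from typing import Dict, List

Row = Dict[str, str]


def semi_join(left: List[Row], right: List[Row], attr: str) -> List[Row]:
    """Keep only the rows of `left` whose `attr` value also occurs in `right`."""
    keys = {row[attr] for row in right}
    return [row for row in left if row[attr] in keys]


def equi_join(left: List[Row], right: List[Row], attr: str) -> List[Row]:
    """Equi-join of `left` and `right` on the shared attribute `attr`."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[attr], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[attr], [])]


# Hypothetical sample data; attribute names follow the paper's schema.
customer = [
    {"Custid": "C1", "Cust_Fname": "A"},
    {"Custid": "C2", "Cust_Fname": "B"},
    {"Custid": "C3", "Cust_Fname": "C"},
    {"Custid": "C4", "Cust_Fname": "D"},
]
sales = [
    {"Saleid": "S1", "Custid": "C1"},
    {"Saleid": "S2", "Custid": "C3"},
    {"Saleid": "S3", "Custid": "C1"},
]

# Only customers that appear in Sales survive the semi-join.
reduced_customer = semi_join(customer, sales, "Custid")
print(len(customer), "customer tuples before,", len(reduced_customer), "after the semi-join")
```

In a distributed setting only the join-attribute values of Sales and the reduced Customer relation would have to travel between sites, which is where the saving in communication cost comes from.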
The rest of the paper is organized as follows. The objectives of the study are given in section 2. The third section provides information about the design of the DSS queries. The fourth section briefly explains the concept of query optimization. The fifth section describes the basic concepts and working of the Genetic Algorithm. The analysis of the DSS queries using a Genetic Algorithm with restricted growth encoding scheme is laid out in the sixth section. The last section concludes the paper.

2. AIMS OF THE STUDY

The aim of this work is to design an optimized GA based DSS query execution strategy in a distributed environment. The major objectives are as follows:

- To simulate a distributed database system by designing the statistics and the mathematical model of the distributed database.
- To design a set of experimental DSS queries, based on the TPC-DS benchmark, for the simulated environment.
- To design the simulation cost coefficients of a DSS query, i.e. Input-Output Costs, Processing Costs and Communication Costs.
- To optimize the DSS queries on the basis of the Total Costs of a query using a novel Genetic Algorithm.
- To study the effect of using joins or semi joins on the Total Cost of a DSS query.

3. DESIGN OF THE DSS QUERIES

A DSS query is an important type of distributed query that plays a major role in strategic planning. It accesses a larger portion of the database than an OLTP query. In general, a DSS query focuses on aggregated information, its output has a large number of tuples, and it acquires a larger number of locks during its execution. A set of DSS queries is designed on the basis of the TPC-DS benchmark.
The relations of the distributed database system on which the set of DSS queries is designed are given below [13][21][23]:

Store (Storeid, Sname, Manager, Market, Address, Company, City, State: Varchar; No_of_Emp: Number; S_date: Date)
Customer (Custid, Cust_Fname, Cust_Lname, DOB, Contact, Email: Varchar)
Cust_Address (Custid, HouseNo, Street, Street2, City, State, Country: Varchar)
Items (Itemcode, Name, Brand, Type, Size, Colour, Description, Ware_no: Varchar; Price: Number)
Sales (Saleid, Storeid, Store_name, City, Warehouse_id: Varchar; Item_code, Qty, Unit_price, Tax, Discount, Net_price: Number; Custid: Varchar)
Callcentre (CCID, Cen_name, Manager, Cen_Address: Varchar; No_ofEmp, Area_in_SQFT: Number)
Webstore (Website, Web_id, Web_mkt_mgr, Nature: Varchar)
Warehouse (Wareh_id, Wname, Wmanager, Address, Company, City, State: Varchar; No_of_Emp, Ware_size: Number; S_date: Date)
Marketing (Mark_id, Mark_item, Mark_promo_name, Mark_Manager, Warehid: Varchar; Expenditure: Number; Mark_sdate, Mark_edate: Date)
Shipping (Ship_id, Ship_mode, Ship_item, ship_address, Ship_cont_person: Varchar; Ship_date: Date; Ship_item_units: Number; itemcode: Number)

The parameters Input-Output, Processing and Communication play a major role in the optimization of a distributed query. For analyzing the set of DSS queries designed on the basis of the TPC-DS benchmark, the following distributed database statistics are considered.

Table 1: Database Statistics

S.No.  Parameter                       Value
1.     Degree of Relation              10
2.     Cardinality of Relation         1,450,000
3.     Size of Tuples (in Bytes)       72
4.     Size of Relation (in Kbytes)    102,000 (Approx.)
5.     Block Size (in Kbytes)          8
6.     Size of Relation in Blocks      12,750 (Approx.)

The following DSS queries are designed on the basis of the TPC-DS benchmark. Each DSS query has a different number of joins. The remainder of this section gives the relational algebra expression of each DSS query.
Query 1: DSS-1 Join
π(σ(Customer)) ⋈ π(σ(Cust_Address))

Query 2: DSS-2 Joins
π(σ(Customer)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Sales))

Query 3: DSS-3 Joins
π(σ(Customer)) ⋈ π(σ(Sales)) ⋈ π(σ(Sales)) ⋈ π(σ(Marketing))

Query 4: DSS-4 Joins
π(σ(Cust_Address)) ⋈ π(σ(Sales)) ⋈ π(σ(Sales)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Sales))

Query 5: DSS-6 Joins
π(σ(Store)) ⋈ π(σ(Customer)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Store)) ⋈ π(σ(Sales)) ⋈ π(σ(Items))

Query 6: DSS-6 Joins
π(σ(Sales)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Sales)) ⋈ π(σ(Items)) ⋈ π(σ(Marketing)) ⋈ π(σ(Sales)) ⋈ π(σ(Shipping))

Query 7: DSS-7 Joins
π(σ(Sales)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Items)) ⋈ π(σ(Warehouse)) ⋈ π(σ(Sales)) ⋈ π(σ(Marketing)) ⋈ π(σ(Shipping)) ⋈ π(σ(Webstore))

Query 9: DSS-9 Joins
π(σ(Sales)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Items)) ⋈ π(σ(Warehouse)) ⋈ π(σ(Sales)) ⋈ π(σ(Marketing)) ⋈ π(σ(Shipping)) ⋈ π(σ(Webstore)) ⋈ π(σ(Items)) ⋈ π(σ(Callcentre))

Query 10: DSS-10 Joins
π(σ(Sales)) ⋈ π(σ(Items)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Customer)) ⋈ π(σ(Store)) ⋈ π(σ(Sales)) ⋈ π(σ(Warehouse)) ⋈ π(σ(Sales)) ⋈ π(σ(Marketing)) ⋈ π(σ(Sales)) ⋈ π(σ(Shipping))

Query 11: DSS-12 Joins
π(σ(Sales)) ⋈ π(σ(Items)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Customer)) ⋈ π(σ(Store)) ⋈ π(σ(Sales)) ⋈ π(σ(Warehouse)) ⋈ π(σ(Sales)) ⋈ π(σ(Marketing)) ⋈ π(σ(Sales)) ⋈ π(σ(Shipping)) ⋈ π(σ(Sales)) ⋈ π(σ(Customer))

Query 12: DSS-15 Joins
π(σ(Sales)) ⋈ π(σ(Items)) ⋈ π(σ(Cust_Address)) ⋈ π(σ(Customer)) ⋈ π(σ(Store)) ⋈ π(σ(Sales)) ⋈ π(σ(Warehouse)) ⋈ π(σ(Sales)) ⋈ π(σ(Marketing)) ⋈ π(σ(Sales)) ⋈ π(σ(Shipping)) ⋈ π(σ(Sales)) ⋈ π(σ(Customer)) ⋈ π(σ(Marketing)) ⋈ π(σ(Store))

4. QUERY OPTIMIZATION

Query optimization is one of the dominant tasks in the field of database systems. The search space and the cost model are the key components of a query optimization process. The search space is composed of alternative query execution plans; different optimization strategies produce different search spaces. The cost model associates different weights with Input Output, Processing and Communication.

There are several ways to optimize a query in a distributed database. One can rewrite the query, change the order of its sub operations, or minimize the access and movement of data. In this work, to optimize DSS queries in a distributed database system, a number of query execution plans (QEPs) are generated by manipulating the order of the sites where the sub operations are to be executed. The cost of each query execution plan is computed and compared to obtain an optimal query execution plan [2][9][23].

A DSS query is optimized by using a cost based optimization model, with the Total Costs of system resources as the optimization parameter. In a distributed database system a cost is associated with each basic operation (I/O, Processing and Communication) to generate the Input Output Costs, Processing Costs and Communication Costs. To determine the Total Costs of system resources of a query, weights are associated with the different operations:

TC_{DSS} = ∑ IOCost + ∑ CPUCost + ∑ CommCost    (Eq. II)

Here TC_{DSS} denotes the Total Costs of system resources.

A typical class of DSS query is selected in which only selection, projection and join or semi join operations take place. Join is one of the foremost operations of a DSS query. Since the cost of communication depends heavily upon the number of joins, the join is one of the costliest operations in a distributed database system. A join operation can be processed as a distributed or a non distributed join.
In a distributed join, the join takes place between two or more fragments of a relation. A non distributed join, on the other hand, takes place between two or more relations rather than between fragments. In this paper the focus is on non distributed joins [24].

In a distributed database, the distribution of data over different sites increases the cost of communication of a DSS query, and hence its Total Cost. So one must try to reduce the communication cost of a DSS query. A semi join is used to reduce the transmission of data from one site to another. However, it is effective only when the amount of data to be transmitted is significant. In such cases, the cost of communication, and hence the Total Costs of a DSS query, can be reduced by using semi joins instead of joins.

5. WORKING OF GENETIC ALGORITHMS

A query can be analyzed and optimized by using different optimization techniques such as Exhaustive Enumeration, Dynamic Programming, Simulated Annealing and Evolutionary Algorithms. In this work, an effort is made to optimize the queries using one of the evolutionary algorithms, the Genetic Algorithm. The remaining part of this section briefly explains the concepts and working of the Genetic Algorithm.

The Genetic Algorithm, commonly abbreviated as GA, is an evolutionary algorithm used to solve complex problems. The concept of GA was given by John Holland. It works on the principle of survival of the fittest. Genetic Algorithms are effectively used for computation intensive and optimization problems [23][25]. One of the important parameters of a GA is the fitness function, which is the quantity to be optimized. In this paper, the Total Costs of system resources is taken as the fitness function; it depends upon the Input Output Costs, Processing Costs and Communication Costs. A GA starts with an initial population, followed by the selection, crossover and mutation operators.
Selection is an important operator of GA that selects parents from the population for crossover, in order to get effective offspring. The crossover operator takes two parents and combines them to get better offspring. The mutation operator further modifies an individual offspring generated by the crossover operator. The working of the Genetic Algorithm is shown in Figure 1 [18][22].

Figure 1: Working of Genetic Algorithm

The Genetic Algorithm provides a nearly optimal solution to the DSS query optimization problem by using an improved randomized search strategy similar to biological evolution. GA propagates the solution from one generation to the next, formed by crossover and mutation, to find a better solution. The use of a genetic algorithm helps in finding an effective query allocation plan quickly, compared with the significant time taken by traditional query optimization techniques [16].

6. ANALYSIS OF DSS QUERIES USING PROPOSED GENETIC ALGORITHM

In this paper, a Genetic Algorithm with restricted growth encoding scheme is proposed to compute the Total Costs of a DSS query. As stated earlier, the Total Costs of a query depends heavily upon the Input Output, Processing and Communication Costs. Some researchers have already used a genetic approach for optimizing queries. The novelty of the proposed genetic approach lies in the design of the chromosome. Here, the growth of the chromosome is restricted by the constraint that the projection operation of a query is performed on the same site where the corresponding selection operation is performed. The pseudocode used for finding an optimal query execution plan using the proposed genetic approach with restricted growth encoding scheme is given below:

// Input Data
    Select the DSS query based upon the TPC-DS benchmark database.
    Decompose the DSS query into sub queries based upon the different operations (selection, projection and join).
    Input the parameters: Number of Sites, Relations, Operations, Fragments, Selection and Projection Operations, I/O Costs, CPU Costs, Communication Costs, Size of Population, Number of Generations etc.
// Initial Population
    Restrict the design of the chromosome so that each projection operation is performed on the same site where the corresponding selection operation is executed.
    Randomly select the initial population of chromosomes of this restricted design.
// Analyze the Fitness
    Compute the fitness value of each chromosome of the population based upon the Total Costs of its query execution plan.
// Apply Genetic Operators
    Select the best two chromosomes, based upon the fitness value, to act as parents.
    Apply the one point crossover operation to the two selected parents.
    Apply the mutation operation to the result of the crossover operation.
// Termination
    Repeat from the step (Analyze the Fitness) until the specified number of generations is reached.
    Generate the DSS query allocation plan.

Furthermore, the set of DSS queries is analyzed using joins and semi joins, and the effect of these approaches on the Total Costs of a query is observed. The set of DSS queries designed in section 3 is analyzed by using Select-Project-Join operations and Select-Project-(Joins+SemiJoins) operations. All these operations are simulated in a custom designed simulator developed in the MATLAB environment. The simulator takes a number of input parameters such as the Number of Base Relations, Number of Operations, Number of Intermediate Fragments, I/O speed coefficients, CPU coefficients, Communication coefficient and Number of Joins. The simulator takes a text file as input and produces another text file with a number of alternative query execution plans for the DSS query. A query execution plan, commonly abbreviated as QEP, is a string of numerals. With the help of a query execution plan, one is able to identify the site where each operation is to be executed.
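The pseudocode earlier in this section can be sketched in a few lines of code. The following is a minimal illustration, not the authors' MATLAB simulator: it assumes four sites and a 3-join query (four selections, four projections, two encoded joins and a final join fixed at site 4), and it replaces the paper's cost model with a toy `total_cost` whose coefficients (`LOCAL_COST`, `TRANSFER_COST`) are invented for the example. Keeping the two best parents in the next generation is also an assumption added here so that the best plan is never lost. The restricted growth encoding is enforced in `expand`, which copies the selection sites into the projection positions.

```python
import random

N_SITES = 4          # sites are numbered 1..4
N_RELATIONS = 4      # 3-join query: 4 selections and 4 projections
N_FREE_JOINS = 2     # the final join is not encoded in the chromosome
FINAL_SITE = 4       # site of the final join, supplied as an input

# Hypothetical cost coefficients (not the paper's values).
LOCAL_COST = {1: 10, 2: 8, 3: 12, 4: 9}   # I/O + CPU cost of one operation per site
TRANSFER_COST = 16                        # cost of shipping one operand between sites


def expand(genes):
    """Build the full chromosome: projections reuse the selection sites."""
    selections, joins = genes[:N_RELATIONS], genes[N_RELATIONS:]
    return selections + selections + joins


def total_cost(chromosome):
    """Toy Total Cost: local cost of every operation plus operand transfers."""
    selections = chromosome[:N_RELATIONS]
    join_sites = chromosome[2 * N_RELATIONS:] + [FINAL_SITE]
    local = sum(LOCAL_COST[site] for site in chromosome) + LOCAL_COST[FINAL_SITE]
    comm = 0
    left_site = selections[0]              # left operand of the first join
    for k, site in enumerate(join_sites):
        right_site = selections[k + 1]     # right operand comes from the next relation
        comm += TRANSFER_COST * ((left_site != site) + (right_site != site))
        left_site = site                   # the join result stays where it was computed
    return local + comm


def evolve(pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Return (chromosome, cost) of the best plan found."""
    rng = random.Random(seed)
    n_genes = N_RELATIONS + N_FREE_JOINS
    population = [[rng.randint(1, N_SITES) for _ in range(n_genes)]
                  for _ in range(pop_size)]

    def cost(genes):
        return total_cost(expand(genes))

    for _ in range(generations):
        population.sort(key=cost)
        parent_a, parent_b = population[0], population[1]   # best two chromosomes
        children = []
        while len(children) < pop_size - 2:
            cut = rng.randint(1, n_genes - 1)                # one point crossover
            child = parent_a[:cut] + parent_b[cut:]
            child = [rng.randint(1, N_SITES) if rng.random() < mutation_rate else gene
                     for gene in child]                      # per-gene mutation
            children.append(child)
        population = [parent_a, parent_b] + children         # parents survive (assumption)
    best = min(population, key=cost)
    return expand(best), cost(best)


best_plan, best_cost = evolve()
print("QEP:", " ".join(map(str, best_plan)), "| Total Cost:", best_cost)
```

Because crossover and mutation act only on the selection and join genes, every chromosome produced by `expand` satisfies the restriction by construction. The sketch minimizes the cost directly rather than maximizing a separate fitness value.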
The length of the query execution plan (QEP), or chromosome, produced by the simulator is one less than the number of operations involved in the DSS query. The design of the chromosome for a 3-join DSS query, which has eleven operations, is given below:

QEP: 1 3 4 2 1 3 4 2 2 3

The length of this chromosome is ten, i.e. one less than the number of operations. The first four operations are selection operations, the next four are projection operations and the last two are join operations. The eleventh operation is the final join. The location of the final operation is predetermined and is not included in the query execution plan, since it is fixed and given as an input to the simulator. From the query execution plan, one can tell which operation is performed on which site.

Different experiments are performed on the set of DSS queries to find the effectiveness of the proposed genetic approach in optimizing DSS queries in a distributed database system. The default ratio between Input Output Cost and Communication Cost is assumed to be 1:1.6. It is assumed that the cost associated with the processing operations of a query is ten times the cost of an input output operation. The block size of the relation is used while computing the Total Cost of system resources. One of the important factors in the simulation is the allocation of data on different sites; it is assumed that the model places a base relation on two different sites. The DSS queries are analyzed by using the block structure of the base relations concerned. Table 2 shows part of the simulator's output: the different chromosomes generated for the 3-join DSS query, together with the computed costs of system resources.
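The chromosome layout described above (selection sites, then projection sites, then the encoded join sites, with the final join fixed) can be decoded mechanically. The sketch below is illustrative only; the function names `decode_qep` and `respects_restriction` are hypothetical, and the final join is placed on site 4 as stated for the 3-join example in this section.

```python
from typing import List, Tuple

Step = Tuple[int, str, int]   # (operation number, type of operation, site)


def decode_qep(qep: str, n_relations: int, final_site: int) -> List[Step]:
    """Map each gene of a QEP string to its operation type and site."""
    sites = [int(token) for token in qep.split()]
    n_free_joins = len(sites) - 2 * n_relations
    if n_free_joins < 0:
        raise ValueError("QEP is too short for the given number of relations")
    kinds = (["Selection"] * n_relations
             + ["Projection"] * n_relations
             + ["Join"] * n_free_joins)
    plan = [(i + 1, kind, site) for i, (kind, site) in enumerate(zip(kinds, sites))]
    # The final join is not part of the chromosome; its site is an input.
    plan.append((len(sites) + 1, "Join", final_site))
    return plan


def respects_restriction(plan: List[Step]) -> bool:
    """Check that each projection runs on the site of its selection."""
    selections = [site for _, kind, site in plan if kind == "Selection"]
    projections = [site for _, kind, site in plan if kind == "Projection"]
    return selections == projections


plan = decode_qep("1 3 4 2 1 3 4 2 2 3", n_relations=4, final_site=4)
for number, kind, site in plan:
    print(f"{number:2d}  {kind:<10}  site {site}")
print("restricted growth constraint holds:", respects_restriction(plan))
```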
Table 2: Simulator's Results

Chromosome   IO Cost   CPU Cost   Comm. Cost   Total Cost   Fitness Value
1331133144   1814100   179730     17600        2011430      4.971587
2341234144   1816650   179985     13200        2009835      4.975533
1241124144   1814100   179730     13200        2007030      4.982487
2241224144   1811550   179475     13200        2004225      4.98946
2341234141   1695450   167985     17600        1881035      5.316222
2341234114   1695450   167985     15800        1879235      5.321314
2241224141   1690350   167475     17600        1875425      5.332125
2241224114   1690350   167475     15200        1873025      5.338957
1241124114   1692900   167730     12000        1872630      5.340083
2341234142   1655050   163985     20800        1839835      5.43527
2241224133   1649950   163475     24600        1838025      5.440622
2241224142   1649950   163475     20800        1834225      5.451894
2341234124   1655050   163985     12600        1831635      5.459603
2241224124   1649950   163475     8800         1822225      5.487797
2341234121   1533850   151985     17000        1702835      5.87256
2241224121   1528750   151475     13200        1693425      5.905192
2341234122   1493450   147985     20200        1661635      6.018169

Table 3 lists the sites where the selection, projection and join operations are performed to get an optimal allocation plan.

Table 3: Query Execution Plan

S.No.  Type of Operation  Operation in QEP  Location of Site
1      Selection          1                 1
2      Selection          2                 3
3      Selection          3                 4
4      Selection          4                 2
5      Projection         5                 1
6      Projection         6                 3
7      Projection         7                 4
8      Projection         8                 2
9      Join               9                 2
10     Join               10                3

The last operation is fixed and is performed on site 4. Table 4 shows the effect of using the join operation alone and the hybrid approach of join and semi join operations on the Total Cost of system resources for the different DSS queries designed above.

Table 4: Total Cost Analysis using Joins and Semi Joins

S.No.  Number of Joins  Total Cost (Joins Only)  Total Cost (Joins & Semi Joins)
1.     2                1170765                  993080
2.     3                1661635                  1500865
3.     4                2172080                  1984440
4.     5                2487031                  2312487
5.     6                3024110                  2829780
6.     7                3393015                  3190875
7.     8                4199310                  3976541
8.     10               5222314                  5176124
9.     15               7913467                  7745237

Figure 2 compares the Total Costs of system resources for the set of DSS queries with joins alone and with the hybrid approach of joins and semi joins. Past research reveals that the use of semi joins is limited to some cases, as a semi join increases the local processing cost of a query. Therefore, a comparison is made between the use of joins and the hybrid approach of joins and semi joins. From Figure 2 it is clear that Total Costs is reduced by up to 20% by the hybrid use of joins and semi joins.

[Figure 2 plot, titled "Analysis of Total Costs using Join and Hybrid Approach of Join and Semi-Join Operations". Vertical axis: Total Costs of a Query (0 to 9000000). Horizontal axis: a set of DSS queries with different numbers of join operations (2, 3, 4, 5, 6, 7, 8, 10, 15). Series: Total Costs using Join Operation; Total Costs using hybrid approach.]

Figure 2: Analysis of Total Cost with Join versus Joins and Semi Joins

Figure 3 shows the amount of time taken by the simulator to provide an optimal sub query allocation plan for the different DSS queries. From Figure 3 it is clear that the Genetic Algorithm takes almost constant time to provide an optimal query execution plan for the set of DSS queries.

Figure 3: Analysis of Time Taken in Finding Optimal Plan

7. CONCLUSION

With the rapid increase in the size and complexity of distributed database systems, it becomes mandatory to optimize the queries in a distributed database system. Due to the complexity of DSS queries, traditional techniques such as exhaustive enumeration and dynamic programming are unable to optimize these types of queries. Because of their improved randomized search strategy, which is based on the biological evolution process, Genetic Algorithms are effectively used in the optimization of DSS queries in distributed database systems: they provide a nearly optimal solution in a finite amount of time, where the conventional techniques become almost intractable.
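The relative saving of the hybrid approach can be recomputed directly from the two cost columns of Table 4. The snippet below is a small arithmetic sketch; the names `TABLE_4` and `reduction_percent` are introduced here for illustration, while the cost values are copied from Table 4.

```python
# (number of joins, Total Cost with joins only, Total Cost with joins & semi joins)
TABLE_4 = [
    (2, 1170765, 993080),
    (3, 1661635, 1500865),
    (4, 2172080, 1984440),
    (5, 2487031, 2312487),
    (6, 3024110, 2829780),
    (7, 3393015, 3190875),
    (8, 4199310, 3976541),
    (10, 5222314, 5176124),
    (15, 7913467, 7745237),
]


def reduction_percent(joins_only: int, hybrid: int) -> float:
    """Percentage by which the hybrid approach lowers the Total Cost."""
    return 100.0 * (joins_only - hybrid) / joins_only


reductions = {n: reduction_percent(j, h) for n, j, h in TABLE_4}
for n_joins, pct in reductions.items():
    print(f"{n_joins:2d} joins: {pct:5.2f}% lower Total Cost with joins & semi joins")
```

For the values in Table 4, the computed reductions range from roughly 1% (the 10-join query) to roughly 15% (the 2-join query).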
From the above analysis it is clear that DSS queries can be effectively optimized with the help of a Genetic Algorithm. One of the interesting aspects of using a Genetic Algorithm in the optimization process is that one can get the optimal sub query allocation plan in a very short and almost constant time, independent of the complexity of the DSS query. Further, the results show that for moderate to complex queries the hybrid approach of join and semi join operations reduces the Total Cost of system resources by 1-20% compared with using only join operations with the select and project operations. However, for simple DSS queries (having three or fewer joins) this hybrid approach does not show any improvement in the Total Cost of system resources. So from the above analysis it is concluded that one should use the Select-Project-Join execution strategy for simple DSS queries (Joins <= 4). For moderate to complex DSS queries (Joins > 4) the Select-Project-(Join+Semijoins) execution strategy should be preferred.

REFERENCES:

[1] T.V. Vijay Kumar, Vikram Singh, Feb. 2011. Distributed Query Processing Plans Generation Using GA. IJCTE, Vol. 3, No. 1.
[2] Narasimhaiah Gorla, Suk-Kyu Song, 2010. Subquery Allocation in Distributed Database using GA. JCS & T, Vol. 10, No. 1.
[3] Deepak Shukla, Deepak Arora, 2011. An Efficient Approach of Block Nested Loop Algorithm based on Rate of Block Transfer. IJCA, Vol. 21, No. 3.
[4] Swati Gupta, Kuntal Saroha, Bhawna, 2011. Fundamental Research in Distributed Database. IJCSMS, Vol. 11, Issue 2.
[5] Reza Ghaemi, Amin MilaniFard, Hamid Tabatabee, 2008. Evolutionary Query Optimization for Heterogeneous Distributed Database System. WASET, 43.
[6] Johann Christoph Freytag, March 1989. The Basic Principles of Query Optimization in Relational Database Management System. Internal Report, IR-KB-59.
[7] Clark D. French, 1995. One Size Fits All - Database Architecture Do Not Work for DSS. SIGMOD 95, Published by ACM, USA.
[8] Sourabh Kumar, Gourav Khandelwal, Arjun Varshney et al., Sep. 2011. Cost-Based Query Optimization with Heuristics. International Journal of Scientific & Engineering Research, Vol. 2, Issue 9.
[9] Sangkyu Rho, Salvatore T. March, 1997. Optimizing Distributed Join Queries: A GA Approach. Annals of OR 71.
[10] Pedro Trancoso, Josep-L. Larriba-Pey, Zheng Zhang et al., 1997. The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors. IEEE Proceedings of the Third International Symposium on HPCA, San Antonio, USA.
[11] S. Vellev, 2009. Review of Algorithms for the Join Ordering Problems in Database Query Optimization. Information Technologies and Control, 2009.
[12] Rajinder Singh, Gurvinder Singh, Nov. 2011. A Stochastic Simulation of Optimized Access Strategies for a Distributed Database Design. IJSER, Vol. 2, Issue 11.
[13] TPC Benchmark DS, Version 1.1.0, April 2002. Online: www.tpc.org.
[14] M. Sinha, S.V. Chande, 2010. Query Optimization using Genetic Algorithm. Research Journal of Information Technology, Vol. 2, No. 3.
[15] Zehai Zhou, 2007. Using Heuristics and Genetic Algorithm for Large Scale Database Query Optimization. Journal of Information and Computing Science, Vol. 2, No. 4.
[16] Michael Steinbrunn, Guido Moerkotte, Alfons Kemper, 1997. Heuristics and randomized optimization for the join ordering problem. VLDB Journal (6).
[17] Vinay Harsora, Apurva Shah, 2011. A Modified GA for Process Scheduling in Distributed System. IJCA Special Issue on Artificial Intelligence Techniques - Novel Approach & Practices Applications.
[18] Kirti Nagpal, Vaishali Wadhwa, 2012. Proposed Algorithm for Optimization of Job Scheduling in Multiprocessor Systems Using Genetic Approach. International Journal of Computer Applications and Information Technology (IJCAIT), Vol. 1, No. 3.
[19] Garima Mahajan, 2012. Query Optimization in DDBS. International Journal of Computer Applications and Information Technology (IJCAIT), Vol. 1, No. 1.
[20] Manik Sharma, Gurdev Singh, 2012. Analysis of Joins and Semi Joins in Centralized and Distributed Database Queries. IEEE International Conference on Computing Sciences (ICCS).
[21] Said Elnaffar, Pat Martin et al., 2008. Is it DSS or OLTP: Automatically identifying DBMS Workload. Journal of Intelligent Information System, Vol. 30, Issue 3.
[22] Sookham R.P. Singh, Rajinder Singh Virk, 2013. Genetic Algorithm for Staging Cervical Cancer. International Journal of Computer Applications & Information Technology, Vol. 3, Issue II.
[23] Manik Sharma, Gurvinder Singh et al., 2013. Stochastic Analysis of DSS Queries for a Distributed Database Design. International Journal of Computer Applications (IJCA), Vol. 82, No. 5.
[24] F. Najjar, Y. Slimani, 1998. The enhancement of semi joins strategies in distributed query optimization. Lecture Notes in Computer Science, Volume 1470.
[25] Priti Punia, Maninder Kaur, 2013. A Parallel Evolutionary Approach for solving single variable optimization problems. International Journal of Computer Applications & Information Technology (IJCAIT), Vol. 3, Issue II.