Design of an Optimized GA Based DSS Query
Execution Strategy in a Distributed Environment
Manik Sharma
Ph.D. Scholar, Punjab Technical University, Kapurthala, India.
Gurvinder Singh
Professor & Head DCSE, Guru Nanak Dev University Amritsar, India.
Rajinder Singh
Associate Professor, DCSE Guru Nanak Dev University Amritsar, India.
ABSTRACT
Distributed query optimization is one of the key challenges in the field of database theory. On
the basis of the distribution of data, a query can be categorized as a centralized query or a
distributed query. The processing of a distributed query is entirely different from that of a
centralized query, because in the former case the data is distributed over a number of sites.
Distributed queries are of two types: Online Transaction Processing Query (OLTPQ) and Decision
Support System Query (DSSQ). Joins and semi joins play an important role in the optimization of
a distributed query. The Decision Support System Query (DSSQ) is one of the decisive types of
distributed query. DSS queries are complex and time consuming in nature. Owing to the
decentralization of data and the complexity of the query, it is essential to optimize the query
execution plan of a distributed DSS query. DSS queries can be optimized on the basis of the Total
Costs or the Response Time of the query. In this paper, a set of DSS queries is designed on the
basis of the TPC-DS (Transaction Processing Performance Council for Decision Support) benchmark.
An effort is made to optimize the DSS queries on the basis of Total Costs, i.e. to minimize the
usage of system resources by a DSS query. The queries are optimized stochastically using a
Genetic Algorithm with a restricted growth encoding scheme. The set of DSS queries is further
analyzed on the basis of join and semi join operations. The application of GA assists in
achieving an optimal allocation of the sub operations of a query in almost constant time,
independent of the complexity of the DSS query. Experimental results indicate that the hybrid
approach of join and semi join operations reduces the Total Costs by 1-20% in comparison with
the use of the join operation alone. The contribution of this effort helps in optimizing the
distributed design of a DSS database.
Keywords: Distributed Database, Total Cost, Joins, Semi Joins, DSS Query, Genetic
Algorithm.
1. INTRODUCTION
A database is a collection of interrelated data designed to meet the information needs of an
enterprise. Depending upon the distribution of data, a database is categorized as a centralized
or a distributed database. In a centralized database system the complete database is placed on a
single central site and is shared among users. A distributed database system, on the other hand,
is defined as a collection of logically interrelated data distributed over several sites. The
distributed database is one of the major advances in the field of database theory. The concept
of the distributed database originated around three decades ago. Technically, a distributed
database system is a cluster of distributed computers that are connected with one another by
some wired or wireless communication medium. The data in a distributed database is distributed
over a number of available sites by using fragmentation or replication techniques. In short, a
distributed database system is the convergence of database systems and computer
networks [1][4][19].
There are two types of queries in a distributed database system, viz. Online Transaction
Processing (OLTP) and Decision Support System (DSS) queries. The Decision Support System Query
(DSSQ) is one of the decisive types of distributed query. A DSS query is normally used to
retrieve or explore data from two or more sites. DSS queries are long-running, complex queries
that normally access a large amount of data compared with OLTP queries. DSS queries are normally
executed on a quarterly, half-yearly or yearly basis for long-term strategic planning. DSS
queries consume a significant amount of system resources and can even saturate the CPU or memory
of the server [7][10][13][21].
Query processing and optimization is one of the major challenges in the field of distributed
database systems. A distributed query can be executed in a number of ways, and each query
execution plan may use a different amount of system resources. The core objective of query
optimization is to find the best possible query execution strategy. Distributed queries can be
optimized by minimizing either the Total Costs (Total Time) or the Response Time of a query. To
increase the throughput of the system, one should optimize the Total Costs of a query. Total
Costs is computed from the Input Output Costs, the Processing Costs and the Communication Costs
[1][2][9].
Total Cost = TCost_{io} + TCost_{cpu} + TCost_{comm}        (Eq. I)
Here the Input Output Cost is the sum of the time taken by all input-output operations, the
Processing Cost is the sum of the time taken in processing the DSS query, and the Communication
Cost is the sum of the time taken in transmitting data from one site to another.
Normally a query is composed of the Selection, Projection, Join and Semi Join operations of
Relational Algebra. Join is one of the most important operations in database theory and is used
to extract information from two or more relations. Technically, the join operation is a special
case of the cartesian product: unlike in the cartesian product, the tuples of the joined tables
are checked against a specified condition. There are various types of joins, such as equi-join,
self join, inner join and outer join. Independent of type, all of these are used to extract data
from two or more relations [11].
A semi-join is an important operation in relational theory that is used to optimize a join
query. A semi join reduces the size of a relation that is used as an operand. A semi-join
from Ri to Rj on attribute A can be denoted as Ri ⋉ Rj [20].
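To make the reduction effect concrete, the following minimal Python sketch computes a semi-join on a shared attribute. The relation contents, the attribute name `custid` and the helper `semi_join` are illustrative assumptions and are not taken from the paper's simulator.

```python
def semi_join(left, right, attr):
    """Return the tuples of `left` whose `attr` value also appears in `right`."""
    right_keys = {row[attr] for row in right}
    return [row for row in left if row[attr] in right_keys]


# Hypothetical fragments of the Customer and Sales relations
customer = [
    {"custid": "C1", "name": "Asha"},
    {"custid": "C2", "name": "Ravi"},
    {"custid": "C3", "name": "Meena"},
]
sales = [
    {"saleid": "S1", "custid": "C1"},
    {"saleid": "S2", "custid": "C1"},
    {"saleid": "S3", "custid": "C3"},
]

# Only customers that have at least one sale survive the semi-join
reduced = semi_join(customer, sales, "custid")
print([row["custid"] for row in reduced])  # ['C1', 'C3']
```

Only the reduced operand (two tuples instead of three) would then need to be shipped to the site where the full join is evaluated.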
The rest of the paper is organized as follows. The objectives of the study are given in
Section 2. Section 3 provides information about the design of the DSS queries. Section 4 briefly
explains the concept of query optimization. Section 5 describes the basic concepts and working
of the Genetic Algorithm. The analysis of DSS queries using the Genetic Algorithm with
restricted growth encoding scheme is laid down in Section 6. The last section concludes the
paper.
2. AIMS OF THE STUDY
The aim of this work is to design an optimized GA based DSS query execution strategy in a
distributed environment. The major objectives to be achieved are given below:
• A distributed database system is to be simulated by designing the statistics and
  mathematical model of the distributed database.
• A set of experimental DSS queries based on the TPC-DS benchmark is to be designed for the
  simulated environment.
• The simulation cost coefficients of the DSS query, i.e. Input-Output Costs, Processing Costs
  and Communication Costs, are to be designed.
• The DSS queries are to be optimized on the basis of the Total Costs of the query using a novel
  Genetic Algorithm.
• The effect of using joins or semi joins on the Total Cost of the DSS query is to be studied.
3. DESIGN OF THE DSS QUERIES
A DSS query is an important type of distributed query that plays a major role in strategic
planning. It accesses a larger portion of the database than an OLTP query. In general, a DSS
query focuses on aggregated information, its output has a large number of tuples, and it
acquires a larger number of locks during its execution. A set of DSS queries is designed on the
basis of the TPC-DS benchmark. The relations of the distributed database system over which
these queries are designed are given below [13][21][23]:
Store (Storeid, Sname, Manager, Market, Address, Company, City, State: Varchar;
No_of_Emp:Number; S_date: Date;)
Customer (Custid, Cust_Fname, Cust_Lname, DOB, Contact, Email: Varchar;)
Cust_Address (Custid, HouseNo, Street, Street2, City, State, Country: Varchar)
Items (Itemcode, Name, Brand, Type, Size, Colour, Description, Ware_no:
Varchar; Price: Number)
Sales(Saleid, Storeid, Store_name, City, Warehouse_id: Varchar; Item_code, Qty,
Unit_price, Tax, Discount,Net_price:Number, Custid: Varchar;)
Callcentre (CCID, Cen_name, Manager, Cen_Address: Varchar; No_ofEmp,
Area_in_SQFT:Number;)
Webstore (Website, Web_id, Web_mkt_mgr, Nature: Varchar)
Warehouse (Wareh_id, Wname, Wmanager, Address, Company, City, State:
Varchar; No_of_Emp, Ware_size: Number; S_date: date;)
Marketing (Mark_id, Mark_item, Mark_promo_name, Mark_Manager, Warehid:
Varchar; Expenditure: Number; Mark_sdate, Mark_edate: Date)
Shipping (Ship_id, Ship_mode, Ship_item, ship_address, Ship_cont_person:
Varchar; Ship_date: Date; Ship_item_units: Number, itemcode: Number)
The parameters viz. Input-Output, Processing and Communication play a major role in the
optimization process of a distributed query. For analyzing the set of DSS queries that are
designed on the basis of the TPC-DS benchmark, the following distributed database statistics
are considered.
Table 1: Database Statistics
S.No.  Parameter                       Value
1.     Degree of Relation              10
2.     Cardinality of Relation         1,450,000
3.     Size of Tuples (in Bytes)       72
4.     Size of Relation (in Kbytes)    102,000 (Approx.)
5.     Block Size (in Kbytes)          8
6.     Size of Relation in Blocks      12,750 (Approx.)
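The figures in Table 1 are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes that 1 Kbyte is 1024 bytes and reads the cardinality as 1,450,000 tuples; neither assumption is stated explicitly in the paper.

```python
cardinality = 1_450_000   # tuples per relation (Table 1)
tuple_size_bytes = 72     # bytes per tuple (Table 1)
block_size_kb = 8         # Kbytes per block (Table 1)

# Assumption: 1 Kbyte = 1024 bytes
relation_size_kb = cardinality * tuple_size_bytes / 1024
blocks = relation_size_kb / block_size_kb

print(round(relation_size_kb))  # close to the tabulated 102,000 Kbytes
print(round(blocks))            # close to the tabulated 12,750 blocks
```

The tabulated block count follows exactly from the rounded relation size: 102,000 / 8 = 12,750.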
The following DSS queries are designed on the basis of the TPC-DS benchmark. Each DSS query
has a different number of joins. The remainder of this section gives the relational algebra
expression of each DSS query.
Query 1: DSS -1 Join
( 𝜋 (𝜎) Customer): Х: ( 𝜋 (𝜎) Cust_Address)
Query 2: DSS – 2Joins
( 𝜋 (𝜎) Customer): Х: ( 𝜋 (𝜎) Cust_Address): Х: ( 𝜋 (𝜎) Sales)
Query 3: DSS – 3Joins
( 𝜋 (𝜎)Customer ) :Х: ( 𝜋 (𝜎)Sales ) :Х: ( 𝜋 (𝜎)Sales ) :Х: ( 𝜋 (𝜎) Marketing )
Query 4: DSS-4Joins:
(𝜋(𝜎) Cust_Address) :Х: ( 𝜋 (𝜎)Sales ) :Х: (𝜋 (𝜎)Sales ) :Х: ( 𝜋 (𝜎) Cust_Address ) :Х: ( 𝜋 (𝜎)
Sales )
Query 5: DSS-6Joins
(𝜋 (𝜎)Store ) : X : (𝜋 (𝜎) Customer) :Х: ( 𝜋 (𝜎) Cust_Address ) :Х: (𝜋 (𝜎)Store ):Х: (𝜋 (𝜎)Sales )
:Х: ( 𝜋 (𝜎) Items )
Query 6: DSS-6Joins
( 𝜋 (𝜎) Sales ) :Х: ( 𝜋 (𝜎) Cust_Address ) :Х: ( 𝜋 (𝜎)Sales ) :Х: ( 𝜋 (𝜎)Item ) :Х: ( 𝜋 (𝜎)
Marketing ) :Х: ( 𝜋 ( 𝜎)Sales ) :Х: ( 𝜋 (𝜎)Shipping ).
Query 7: DSS-7Joins:
( 𝜋(𝜎) Sales) :Х: ( 𝜋 (𝜎) Cust_Address ):Х: ( 𝜋 (𝜎)Items ) :Х: ( 𝜋 (𝜎) Warehouse ):Х: ( 𝜋
(𝜎)Sales ) :Х: ( 𝜋 (𝜎)Marketing ) : Х: ( 𝜋 (𝜎)Shipping) :Х: ( 𝜋(𝜎)Webstore)
Query 9: DSS-9Joins:
( 𝜋 (𝜎) Sales) :Х: ( 𝜋 (𝜎)Cust_Address ):Х: ( 𝜋 (𝜎)Items ) :Х: ( 𝜋 (𝜎) Warehouse ):Х: ( 𝜋
(𝜎)Sales ) :Х: ( 𝜋 (𝜎)Marketing ) : Х: ( 𝜋 (𝜎)Shipping) :Х: ( 𝜋(𝜎)Webstore: Х: ( 𝜋 (𝜎)Items) :Х: (
𝜋(𝜎) callcentre)
Query 10: DSS-10 Joins:
4
( 𝜋(𝜎) Sales) :Х: ( 𝜋 (𝜎) Items ):Х: ( 𝜋 (𝜎)Cust_Address ):Х: ( 𝜋 (𝜎) Customer ) :Х: ( 𝜋 (𝜎) Store
):Х: ( 𝜋 (𝜎)Sales ) :Х: ( 𝜋 (𝜎) Warehouse ) :Х: ( 𝜋 (𝜎)Sales ): Х: ( 𝜋 (𝜎) Marketing) :Х: ( 𝜋(𝜎)
Sales : Х: ( 𝜋 (𝜎) shipping)
Query 11: DSS-12 Joins:
( 𝜋(𝜎) Sales) :Х: ( 𝜋 (𝜎) Items ):Х: ( 𝜋 (𝜎)Cust_Address ):Х: ( 𝜋 (𝜎) Customer ) :Х: ( 𝜋 (𝜎) Store
):Х: ( 𝜋 (𝜎)Sales ) :Х: ( 𝜋 (𝜎) Warehouse ) :Х: ( 𝜋 (𝜎)Sales ): Х: ( 𝜋 (𝜎) Marketing) :Х: ( 𝜋(𝜎)
Sales : Х: ( 𝜋 (𝜎) shipping) :Х: ( 𝜋(𝜎) Sales : Х: ( 𝜋 (𝜎) Customer)
Query 12: DSS-15 Joins:
( 𝜋(𝜎) Sales) :Х: ( 𝜋 (𝜎) Items ):Х: ( 𝜋 (𝜎)Cust_Address ):Х: ( 𝜋 (𝜎) Customer ) :Х: ( 𝜋 (𝜎) Store
):Х: ( 𝜋 (𝜎)Sales ) :Х: ( 𝜋 (𝜎) Warehouse ) :Х: ( 𝜋 (𝜎)Sales ): Х: ( 𝜋 (𝜎) Marketing) :Х: ( 𝜋(𝜎)
Sales : Х: ( 𝜋 (𝜎) shipping) :Х: ( 𝜋(𝜎) Sales : Х: ( 𝜋 (𝜎) Customer) :X: (𝜋 (𝜎) Marketing :X: 𝜋
(𝜎) Store)
4. QUERY OPTIMIZATION
Query optimization is one of the dominant tasks in the field of database systems. The search
space and the cost model are the key components of a query optimization process. The search
space is composed of alternative query execution plans; different optimization strategies
produce different search spaces. The cost model associates different weights with Input Output,
Processing and Communication. There are several ways to optimize a query in a distributed
database: one can rewrite the query, change the order of sub operations, or minimize the access
and movement of data. In this work, to optimize DSS queries in a distributed database system, a
number of query execution plans (QEPs) are generated by manipulating the order of the sites
where the sub operations are to be executed. The cost of each query execution plan is computed
and compared to obtain an optimal query execution plan [2][9][23].
Further, a DSS query is optimized by using a cost-based optimization model. The Total Costs of
system resources is taken as the optimization parameter. In a distributed database system a
cost is associated with each basic operation (I/O, Processing and Communication) to generate
the Input Output Costs, Processing Costs and Communication Costs. To determine the Total Costs
of system resources of a query, weights are associated with the different operations, i.e.
Input Output, Processing and Communication.
TC_{DSS} = ∑ IOCost + ∑ CPUCost + ∑ CommCost        (Eq. II)
Here TC_{DSS} denotes the Total Costs of system resources of a DSS query, and each sum is taken
over the sub operations of the query.
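Eq. II can be read as a plain sum over the sub operations of a plan. The Python sketch below illustrates this reading; the `OperationCost` record, the `total_cost` helper and the numeric costs are hypothetical and serve only as an example, not as the paper's simulator.

```python
from dataclasses import dataclass


@dataclass
class OperationCost:
    """Cost components of one sub operation (hypothetical record)."""
    io: float
    cpu: float
    comm: float


def total_cost(ops):
    """Eq. II: sum the I/O, CPU and communication costs over all sub operations."""
    return (sum(op.io for op in ops)
            + sum(op.cpu for op in ops)
            + sum(op.comm for op in ops))


# Hypothetical costs for a plan with three sub operations
plan = [
    OperationCost(io=1000, cpu=100, comm=0),    # selection, executed locally
    OperationCost(io=800, cpu=80, comm=0),      # projection, executed locally
    OperationCost(io=500, cpu=50, comm=160),    # join, needs a data transfer
]
print(total_cost(plan))  # 2690
```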
A typical class of DSS queries is selected in which only selection, projection and join or semi
join operations take place. Join is one of the foremost operations of a DSS query. Since the
cost of communication depends heavily upon the number of joins, the join operation is one of
the costliest operations in a distributed database system. A join operation can be processed as
a distributed or a non-distributed join. In a distributed join, the join takes place between two
or more fragments of a relation. A non-distributed join, on the other hand, takes place between
two or more relations rather than fragments. In this paper the focus is on non-distributed
joins [24].
In the case of a DSS query in a distributed database, the distribution of data over different
sites increases the cost of communication, and hence the Total Cost of the DSS query. So one
must try to reduce the communication cost of a DSS query. A semi join is used to reduce the
transmission of data from one site to another. However, it is effective only when the amount of
data to be transmitted is significant. In such cases, the cost of communication, and hence the
Total Costs of a DSS query, can be reduced by using semi joins instead of joins.
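The trade-off can be illustrated with a simple transfer-volume comparison for a join of R (at one site) with S (at another). The sketch below uses a common textbook formulation in which the semi join first ships the join column of S and then ships only the matching part of R. The function names, byte counts and the `selectivity` parameter are assumptions for illustration, and the extra local processing caused by the semi join is deliberately ignored.

```python
def join_transfer_bytes(r_bytes):
    """Plain join: ship the whole relation R to the site holding S."""
    return r_bytes


def semi_join_transfer_bytes(s_join_column_bytes, r_reduced_bytes):
    """Semi join: ship S's join column to R's site, then ship the reduced R back."""
    return s_join_column_bytes + r_reduced_bytes


def semi_join_pays_off(r_bytes, s_join_column_bytes, selectivity):
    """selectivity: assumed fraction of R that survives the semi join (0..1)."""
    reduced = r_bytes * selectivity
    return (semi_join_transfer_bytes(s_join_column_bytes, reduced)
            < join_transfer_bytes(r_bytes))


# Few tuples of R match: the semi join ships far less data
print(semi_join_pays_off(1_000_000, 50_000, 0.2))   # True
# Almost every tuple of R matches: the extra round trip is wasted
print(semi_join_pays_off(1_000_000, 50_000, 0.99))  # False
```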
5. WORKING OF GENETIC ALGORITHMS
A query can be analyzed and optimized by using different optimization techniques such as
Exhaustive Enumeration, Dynamic Programming, Simulated Annealing and Evolutionary Algorithms.
In this work, an effort is made to optimize the queries using one evolutionary algorithm, the
Genetic Algorithm. The remaining part of this section briefly explains the concepts and working
of the Genetic Algorithm.
The Genetic Algorithm, commonly abbreviated as GA, is an evolutionary algorithm used to solve
complex problems. The concept of GA was introduced by John Holland. It works on the principle
of survival of the fittest. Genetic Algorithms are used effectively for computation-intensive
and optimization problems [23][25].
One of the important parameters of a GA is the fitness function, which is the quantity to be
optimized. In this paper, the Total Costs of system resources is taken as the fitness function.
The Total Cost of system resources depends upon the Input Output Costs, Processing Costs and
Communication Costs.
A GA starts with an initial population, to which the selection, crossover and mutation
operators are then applied. Selection is an important operator of GA that chooses the parents
from the population for crossover so as to obtain effective offspring. The crossover operator
takes two parents and combines them to produce better offspring. The mutation operator further
modifies an individual offspring generated by the crossover operator. The working of the
Genetic Algorithm is given below [18][22]:
Figure 1: Working of Genetic Algorithm
The Genetic Algorithm provides a nearly optimal solution to the DSS query optimization problem
by using a randomized search strategy modelled on biological evolution. GA propagates solutions
from one generation to the next through crossover and mutation in order to find better
solutions. The use of a genetic algorithm helps in finding an effective query allocation plan
quickly, compared with the significant time taken by traditional query optimization techniques
[16].
6. ANALYSIS OF DSS QUERIES USING PROPOSED GENETIC ALGORITHM
In this paper, a Genetic Algorithm with a restricted growth encoding scheme is proposed to
compute the Total Costs of a DSS query. As stated earlier, the Total Costs of a query depends
heavily upon the Input Output, Processing and Communication Costs. Some researchers have
already used a genetic approach for optimizing queries. The novelty of the proposed genetic
approach lies in the design of the chromosome. Here, the growth of the chromosome is restricted
by the constraint that each projection operation of the query is performed on the same site
where the corresponding selection operation is performed. The pseudocode used for finding an
optimal query execution plan using the proposed genetic approach with restricted growth
encoding scheme is given below:
// Input Data
Select the DSS query based upon the TPC-DS benchmark database.
Decompose the DSS query into sub queries based upon the different operations, i.e. selection,
projection and join.
Input the various parameters such as Number of Sites, Relations, Operations, Fragments,
Selection and Projection Operations, I/O Costs, CPU Costs, Communication Costs, Size of
Population, Number of Generations etc.
// Initial Population
Restrict the design of the chromosome in such a way that each projection operation is
performed on the same site where the corresponding selection operation is executed.
Randomly generate the initial population of chromosomes under the above restriction.
// Analyze the fitness
Compute the fitness value of each chromosome in the population based upon the Total
Costs of its query execution plan.
// Apply Genetic Operators
Select the best two chromosomes, based upon the fitness value, to act as parents.
Apply the one-point crossover operation to the two selected parents.
Finally, apply the mutation operation to the result of the crossover operation.
// Termination
If the specified number of generations has not been reached, go to step (Analyze the fitness).
Otherwise, generate the DSS query allocation plan.
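The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB simulator: the cost function `toy_total_cost`, the site weights, the population size, the mutation rate and the decision to carry both parents into the next generation are all hypothetical. The restricted growth encoding is realised by evolving only the selection and join genes and copying the selection sites into the projection positions.

```python
import random

SITES = [1, 2, 3, 4]
N_SELECT = 4   # selection operations; each has one matching projection
N_JOIN = 2     # join operations whose sites are encoded in the chromosome


def expand(genes):
    """Restricted growth encoding: each projection reuses its selection's site."""
    sel, joins = genes[:N_SELECT], genes[N_SELECT:]
    return sel + sel + joins


def toy_total_cost(chromosome):
    """Hypothetical cost: a per-site weight plus a charge for every site change."""
    site_weight = {1: 4, 2: 1, 3: 3, 4: 2}   # assumed values, not from the paper
    local = sum(site_weight[s] for s in chromosome)
    moves = sum(1 for a, b in zip(chromosome, chromosome[1:]) if a != b)
    return local + 1.6 * moves


def fitness(genes):
    """Higher fitness corresponds to lower total cost."""
    return 1.0 / toy_total_cost(expand(genes))


def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_genes = N_SELECT + N_JOIN
    population = [[rng.choice(SITES) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        p1, p2 = population[0], population[1]       # best two act as parents
        children = []
        while len(children) < pop_size - 2:
            cut = rng.randint(1, n_genes - 1)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.choice(SITES) if rng.random() < mutation_rate else g
                     for g in child]                 # mutation of single genes
            children.append(child)
        population = [p1, p2] + children             # assumption: parents are kept
    best = max(population, key=fitness)
    return expand(best), toy_total_cost(expand(best))


plan, cost = evolve()
print(plan, cost)
```

Because crossover and mutation act only on the free genes, every chromosome produced by `expand` satisfies the projection-follows-selection constraint by construction, so no repair step is needed.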
Furthermore, the set of DSS queries is analyzed using Joins and Semi Joins, and the effect of
these approaches on the Total Costs of a query is observed. The set of DSS queries designed in
Section 3 is analyzed by using Select-Project-Join operations and
Select-Project-(Joins+SemiJoins) operations.
All the above operations are simulated in a custom-designed simulator developed in the MATLAB
environment. The simulator takes a number of input parameters, such as the number of base
relations, number of operations, number of intermediate fragments, I/O speed coefficients, CPU
coefficients, communication coefficients and number of joins. The simulator takes a text file
as input and produces another text file containing a number of alternative query execution
plans for the DSS query. A query execution plan, commonly abbreviated as QEP, is a string of
numbers. With the help of a query execution plan, one is able to identify the site at which
each operation is to be executed. The length of the query execution plan (QEP), or chromosome,
as produced by the simulator is one less than the number of operations involved in the DSS
query. The design of the chromosome for a 3-Joins DSS query, where the number of operations is
eleven, is given below:
QEP: 1 3 4 2 1 3 4 2 2 3
The length of this chromosome is ten, i.e. one less than the number of operations. The first
four genes correspond to selection operations, the next four to projection operations, and the
last two to join operations. The eleventh operation is the final join operation. Its location
is predetermined and is given as an input to the simulator, so it is not included in the query
execution plan. From the query execution plan, one can thus read off which operation is
performed on which site.
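Reading a QEP string back into an operation-to-site mapping is mechanical. The Python sketch below decodes the example chromosome given above; the helper name `decode_qep` and its parameters are illustrative assumptions.

```python
def decode_qep(qep, n_select, n_project):
    """Map each position of a QEP string to (position, operation type, site)."""
    sites = [int(tok) for tok in qep.split()]
    plan = []
    for position, site in enumerate(sites, start=1):
        if position <= n_select:
            kind = "Selection"
        elif position <= n_select + n_project:
            kind = "Projection"
        else:
            kind = "Join"
        plan.append((position, kind, site))
    return plan


plan = decode_qep("1 3 4 2 1 3 4 2 2 3", n_select=4, n_project=4)
for position, kind, site in plan:
    print(position, kind, site)
```

In the decoded plan each projection (positions 5 to 8) sits on the same site as its selection (positions 1 to 4), which is exactly the restricted growth constraint.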
Different experiments are performed on the set of DSS queries to assess the effectiveness of
the proposed genetic approach in optimizing DSS queries in a distributed database system. The
default ratio between the Input Output Cost and the Communication Cost is assumed to be 1:1.6.
It is assumed that the cost associated with the processing operations of a query is ten times
the cost of an input-output operation. The block size of the relation is used while computing
the Total Cost of system resources.
One of the important factors in the simulation is the allocation of data to the different
sites. In this case it is assumed that the model places each base relation on two different
sites. The above DSS queries are analyzed by using the block structure of the concerned base
relations. A partial screenshot of the simulator's output is given in Table 2. The table shows
the different chromosomes generated for a DSS query (3 Joins) together with the computed costs
of system resources.
Table 2: Simulator's Results
Chromosome   IO Cost   CPU Cost   Comm. Cost   Total Cost   Fitness Value
1331133144   1814100   179730     17600        2011430      4.971587
2341234144   1816650   179985     13200        2009835      4.975533
1241124144   1814100   179730     13200        2007030      4.982487
2241224144   1811550   179475     13200        2004225      4.98946
2341234141   1695450   167985     17600        1881035      5.316222
2341234114   1695450   167985     15800        1879235      5.321314
2241224141   1690350   167475     17600        1875425      5.332125
2241224114   1690350   167475     15200        1873025      5.338957
1241124114   1692900   167730     12000        1872630      5.340083
2341234142   1655050   163985     20800        1839835      5.43527
2241224133   1649950   163475     24600        1838025      5.440622
2241224142   1649950   163475     20800        1834225      5.451894
2341234124   1655050   163985     12600        1831635      5.459603
2241224124   1649950   163475     8800         1822225      5.487797
2341234121   1533850   151985     17000        1702835      5.87256
2241224121   1528750   151475     13200        1693425      5.905192
2341234122   1493450   147985     20200        1661635      6.018169
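The rows of Table 2 can be checked against Eq. II. The paper does not state how the fitness value is derived; the tabulated values are, however, consistent with fitness = 10^7 / Total Cost. The Python sketch below tests this on three rows. The scaling constant is an inference from the data and not the authors' stated definition.

```python
rows = [
    # (io_cost, cpu_cost, comm_cost, total_cost, fitness) copied from Table 2
    (1814100, 179730, 17600, 2011430, 4.971587),
    (1811550, 179475, 13200, 2004225, 4.98946),
    (1493450, 147985, 20200, 1661635, 6.018169),
]

for io, cpu, comm, total, fit in rows:
    # Eq. II: the total is the sum of the three cost components
    assert io + cpu + comm == total
    # Assumed scaling, inferred from the tabulated values
    inferred = 1e7 / total
    print(total, round(inferred, 6), fit)
```

Under this reading, a lower Total Cost gives a higher fitness, so the last row of Table 2 is the best plan listed.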
Table 3 shows the site at which each selection, projection and join operation is performed in
an optimal allocation plan.
Table 3: Query Execution Plan
S.No.  Type of Operation  Operation in QEP  Location of Site
1      Selection          1                 1
2      Selection          2                 3
3      Selection          3                 4
4      Selection          4                 2
5      Projection         5                 1
6      Projection         6                 3
7      Projection         7                 4
8      Projection         8                 2
9      Join               9                 2
10     Join               10                3
The last operation is fixed and is performed on site 4. The following table shows the effect
of using the Join operation alone and the hybrid approach of Join and Semi Join operations on
the Total Cost of system resources for the different DSS queries designed above.
Table 4: Total Cost Analysis using Joins and Semi Joins
S.No.  Number of Joins  Total Cost (Joins Only)  Total Cost (Joins & Semi Joins)
1.     2                1170765                  993080
2.     3                1661635                  1500865
3.     4                2172080                  1984440
4.     5                2487031                  2312487
5.     6                3024110                  2829780
6.     7                3393015                  3190875
7.     8                4199310                  3976541
8.     10               5222314                  5176124
9.     15               7913467                  7745237
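The relative saving of the hybrid approach can be computed directly from Table 4. The Python sketch below does so for every row; the helper name `reduction_percent` is an illustrative assumption.

```python
table4 = {
    # number of joins: (total cost with joins only, total cost with joins and semi joins)
    2: (1170765, 993080),
    3: (1661635, 1500865),
    4: (2172080, 1984440),
    5: (2487031, 2312487),
    6: (3024110, 2829780),
    7: (3393015, 3190875),
    8: (4199310, 3976541),
    10: (5222314, 5176124),
    15: (7913467, 7745237),
}


def reduction_percent(joins_only, hybrid):
    """Percentage by which the hybrid plan lowers the Total Cost."""
    return 100.0 * (joins_only - hybrid) / joins_only


for n_joins, (joins_only, hybrid) in table4.items():
    print(n_joins, round(reduction_percent(joins_only, hybrid), 2))
```

For the tabulated figures the computed reduction ranges from just under 1% (10 joins) to about 15% (2 joins).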
Figure 2 compares the Total Costs of system resources for the set of DSS queries with joins
alone and with the hybrid approach of joins and semi joins. Past research reveals that the use
of semi joins is limited to some cases, as a semi join increases the local processing cost of a
query. Therefore, a comparison is made between the use of joins and the hybrid approach of
joins and semi joins. From the following graph it is clear that the Total Costs is reduced by
up to 20% through the hybrid use of joins and semi joins.
[Chart: Analysis of Total Costs using Join and Hybrid Approach of Join and Semi-Join
Operations. Y-axis: Total Costs of a Query (0 to 9,000,000). X-axis: A Set of DSS Queries with
different Number of Join Operations (2, 3, 4, 5, 6, 7, 8, 10, 15). Series: Total Costs using
Join Operation; Total Costs using hybrid approach.]
Figure 2: Analysis of Total Cost with Join versus Joins and Semi Joins
Figure 3 shows the amount of time taken by the simulator to provide an optimal sub query
allocation plan for the different DSS queries. From Figure 3 it is clear that the Genetic
Algorithm takes almost constant time to provide optimal query execution plans for the set of
DSS queries.
Figure 3: Analysis of Time Taken in Finding Optimal Plan
7. CONCLUSION
With the rapid increase in the size and complexity of distributed database systems, it has
become essential to optimize queries in such systems. Owing to the complexity of DSS queries,
traditional techniques such as exhaustive enumeration and dynamic programming are unable to
optimize these types of queries. Owing to their randomized search strategy, which is based on
the biological evolution process, Genetic Algorithms are used effectively in the optimization
of DSS queries in distributed database systems, as they provide a nearly optimal solution in a
finite amount of time where the conventional techniques become almost intractable. From the
above analysis it is clear that DSS queries can be optimized effectively with the help of a
Genetic Algorithm. One of the interesting aspects of using a Genetic Algorithm in the
optimization process is that one can obtain the optimal sub query allocation plan in a very
short and almost constant time, independent of the complexity of the DSS query. Further, the
results show that for moderate to complex queries the hybrid approach of join and semi join
operations reduces the Total Cost of system resources by 1-20% compared with the use of join
operations alone with select and project operations. However, for simple DSS queries (those
with three or fewer joins) this hybrid approach does not show any improvement in the Total Cost
of system resources.
From the above analysis it is concluded that one should use the Select-Project-Join execution
strategy for simple DSS queries (Joins <= 4). For moderate to complex DSS queries (Joins > 4),
the Select-Project-(Join+Semijoins) execution strategy should be preferred.
REFERENCES:
[1] T.V.Vijay Kumar, Vikram Singh Feb 2011. Distributed Query Processing Plans Generation
Using GA. IJCTE, Vol 3. No.1.
[2] Narasimhaiah Gorla, Suk-Kyu Song 2010. Subquery allocation in Distributed Database using
GA.JCS & T, Vol. 10, No.1.
[3] Deepak Shukla, Dr. Deepak Arora 2011. An Efficient Approach of Block Nested Loop
Algorithm based on Rate of Block Transfer. IJCA, Vol.21, No.3.
[4] Swati Gupta, Kuntal Saroha, Bhawna 2011. Fundamental Research in Distributed
Database. IJCSMS, Vol. 11, Issue 2.
[5] Reza Ghaemi, Amin MilaniFard, Hamid tabatabee 2008. Evolutionary Query Optimization
For Heterogeneous Distributed Database System. WASET, 43.
[6] Johann Christoph Freytag March 1989. The Basic Principles of Query Optimization in
Relational Database Management System. Internal Report, IR-KB-59.
[7] Clark D. French 1995. One Size Fits All- Database Architecture Do Not Work for DSS.
SIGMOD 95, Published by ACM, USA.
[8] Sourabh Kumar, Gourav Khandelwal, Arjun Varshney et al. Sep. 2011. Cost-Based Query
Optimization with Heuristics. International Journal of Scientific & Engineering Research,
Vol. 2, Issue 9.
[9] Sangkyu Rho, Salvatore T. March 1997. Optimizing Distributed Join Queries: A GA
Approach. Annals of OR 71.
[10] Pedro Trancoso, Josep-L. Larriba-Pey, Zheng Zhang et al. 1997. The Memory
Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors. IEEE
proceeding of the third International Symposium on HPCA held at San Antonio, USA.
[11] S. Vellev 2009. Review of Algorithms for the Join Ordering Problems in Database Query
Optimization. Information Technologies and Control, 2009.
[12] Rajinder Singh, Gurvinder Singh Nov. 2011. A Stochastic Simulation of Optimized
Access Strategies for a Distributed Database Design. IJSER, Vol 2, Issue 11.
[13] TPC Benchmark DS, Version 1.1.0, April 2002 online: www.tpc.org.
[14] M. Sinha, SV Chande 2010. Query Optimization using Genetic Algorithm. Research
Journal of Information Technology Vol. 2 No. 3.
[15] Zehai Zhou 2007. Using Heuristics and Genetic Algorithm for Large Scale Database
Query Optimization. Journal of Information and Computing Science, Vol. 2, No. 4.
[16] Michael Steinbrunn, Guido Moerkotte, Alfons Kemper 1997. Heuristics and randomized
optimization for the join ordering problem. VLDB Journal (6).
[17] Vinay Harsora, Apurva Shah 2011. A Modified GA for Process Scheduling in Distributed
System. IJCA Special Issue on Artificial Intelligence Techniques- Novel Approach &
Practices Applications.
[18] Kirti Nagpal, Vaishali Wadhwa 2012. Proposed Algorithm For Optimization Of Job
Scheduling In Multiprocessor Systems Using Genetic Approach. International Journal of
Computer Applications and Information Technology (IJCAIT), Vol 1, No. 3.
[19] Garima Mahajan 2012. Query Optimization in DDBS. International Journal of
Computer Applications and Information Technology (IJCAIT), Vol. 1, No. 1.
[20] Manik Sharma, Gurdev Singh 2012. Analysis of Joins and Semi Joins in Centralized and
Distributed Database Queries. IEEE International Conference on Computing Sciences
(ICCS).
[21] Said Elnaffar, Pat Martin et al. 2008. Is it DSS or OLTP: Automatically identifying
DBMS Workload. Journal of Intelligent Information System. Vol. 30, Issue 3.
[22] Sookham R.P. Singh, Rajinder Singh Virk 2013. Genetic Algorithm for Staging Cervical
Cancer. International Journal of Computer Applications & Information Technology, Vol.3,
Issue II.
[23] Manik Sharma, Gurvinder Singh et al. 2013. Stochastic Analysis of DSS Queries for a
Distributed Database Design. International Journal of Computer Applications (IJCA), Vol.
82, No. 5.
[24] F. Najjar, Y. Slimani 1998. The enhancement of semi joins strategies in distributed query
optimization. Lecture Notes in Computer Science Volume 1470.
[25] Priti Punia, Maninder Kaur 2013. A Parallel Evolutionary Approach for solving single
variable optimization problems. International Journal of Computer Applications &
Information Technology (IJCAIT) Vol. 3, Issue II.