Using Machine Learning Technique to Parallelize Databases:
Where each query answered by a single node
Jozsef Patvarczki, Elke A. Rundensteiner, and Neil T. Heffernan
Abstract
Web-based applications often struggle to scale to higher loads because the database becomes a bottleneck. We propose rule-based data replication middleware that uses multiple database servers for web applications. Knowing each query template in advance allows us to propose better solutions for balancing load across multiple servers in the web-application scenario, above and beyond what is possible for traditional applications. Prior knowledge of all the incoming query templates and the workload gives us the ability to select a table placement in which each query template can be answered by a single database server. Our goal is to minimize the effective response time of the database by determining how to distribute the data effectively across multiple nodes.
Proposed Solution
Instead of relying on theory alone to design the database layout, we need a system that collects empirical data on when the horizontal partitioning (HP), vertical partitioning (VP), de-normalization (DN), and full replication (FR) operators are effective. We have implemented a brute-force search technique to try the different operators, and we then used this empirically measured data to check whether any speed-up occurred. After creating a large data set in which these four operators have been applied to produce different databases, we can employ machine learning to induce rules that help govern the physical design of the database across an arbitrary number of computer nodes. This, in turn, allows the database placement algorithm to converge quickly over time as it trains over a larger set of examples.
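The brute-force search over operator assignments can be sketched as follows. This is a minimal illustration, not the system's implementation: the operator names come from the text, but the table names, sizes, and the toy cost model are hypothetical stand-ins for actually running and timing the query workload against each candidate layout.

```python
from itertools import product

# The four layout operators from the text: full replication (FR), horizontal
# partitioning (HP), vertical partitioning (VP), and de-normalization (DN).
OPERATORS = ["FR", "HP", "VP", "DN"]

def brute_force_layout(tables, measure_cost):
    """Try every per-table operator assignment and keep the cheapest layout.

    `measure_cost` stands in for executing the workload against a candidate
    layout and measuring it empirically; here it is any callable mapping a
    {table: operator} dict to a numeric cost.
    """
    best_layout, best_cost = None, float("inf")
    for assignment in product(OPERATORS, repeat=len(tables)):
        layout = dict(zip(tables, assignment))
        cost = measure_cost(layout)
        if cost < best_cost:
            best_layout, best_cost = layout, cost
    return best_layout, best_cost

# Toy cost model (hypothetical): HP pays off for large tables,
# while small tables are served best by any non-HP operator.
sizes = {"students": 10_000, "answers": 5_000_000}

def toy_cost(layout):
    return sum(sizes[t] * (0.2 if (op == "HP") == (sizes[t] > 100_000) else 1.0)
               for t, op in layout.items())

best, cost = brute_force_layout(list(sizes), toy_cost)
```

The search is exponential in the number of tables, which is exactly why the poster proposes learning rules from these measured runs instead of re-searching from scratch.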
• We characterize the problem as an AI search over layouts;
• Our hypothesis is that we can learn rules that capture human-like expertise and use these rules to better partition a given database;
• With the help of the learned rules, we can fit layout characteristics, and layout generation becomes progressively faster;
• We perform the layout and empirically measure its cost, since we want to know what is effective and under what conditions;
• We explore multiple ways to represent this knowledge (e.g., decision trees);
• We apply cross-validation to prevent overfitting our rules to the training data.
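The rule-learning and cross-validation bullets can be sketched with a one-level decision tree (a decision stump) evaluated by k-fold cross-validation. Everything below is a synthetic stand-in: the table sizes, the labels, and the hidden "HP above 100k rows" rule are hypothetical, standing in for the empirically measured layout experiments described above.

```python
import random

# Synthetic "experiments": (table_row_count, best_operator) pairs, where the
# hidden rule to recover is "HP for tables over 100k rows, FR otherwise".
random.seed(0)
data = [(n, "HP" if n > 100_000 else "FR")
        for n in (random.randrange(1, 1_000_000) for _ in range(200))]

def fit_stump(train):
    """Pick the size threshold that best separates HP from FR on `train`."""
    best_t, best_acc = None, -1.0
    for t, _ in train:
        acc = sum(("HP" if n > t else "FR") == label
                  for n, label in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def cross_validate(data, k=5):
    """k-fold cross-validation: train on k-1 folds, score on the held-out fold."""
    accs = []
    for i in range(k):
        test = data[i::k]
        train = [d for j, d in enumerate(data) if j % k != i]
        t = fit_stump(train)
        accs.append(sum(("HP" if n > t else "FR") == label
                        for n, label in test) / len(test))
    return sum(accs) / k

mean_acc = cross_validate(data)
```

Scoring only on held-out folds is what keeps the induced rule honest: a threshold that merely memorized the training runs would fail on the unseen fold.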
Problem Statement
• A characteristic of web applications such as our ASSISTment Intelligent Tutoring System (www.ASSISTment.org) is that we know all the incoming query templates beforehand, as users typically interact with the system through a web interface [1];
• Prior knowledge of all the incoming query templates and the query workload gives us the ability to select an appropriate table placement;
• We are given a query workload that describes all the query templates for a web-based application and the percentage of queries of each template that the application typically processes;
• Given this workload and the optimization goal, we determine the best possible placement using the four operators (FR, HP, VP, and DN) and an arbitrary number of database servers, answering each query with a single node;
• Our optimization goal is to maximize the total system throughput [2].
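The single-node requirement above can be made concrete with a small feasibility check: a placement is acceptable only if, for every template, some one node holds every table the template touches. The templates, tables, and node IDs below are illustrative, and for a partitioned table "holds the table" would really mean holding the partition the query needs.

```python
# Workload: query template -> fraction of traffic (illustrative values).
workload = {
    "SELECT * FROM students WHERE id = ?": 0.6,
    "SELECT * FROM answers JOIN students ON ...": 0.4,
}
# Tables each template touches (illustrative).
tables_used = {
    "SELECT * FROM students WHERE id = ?": {"students"},
    "SELECT * FROM answers JOIN students ON ...": {"students", "answers"},
}
# placement[table] = set of nodes holding a copy (or needed partition) of it.
placement = {"students": {0, 1}, "answers": {1}}

def single_node_feasible(template):
    """True if at least one node holds every table the template touches."""
    node_sets = [placement[t] for t in tables_used[template]]
    return bool(set.intersection(*node_sets))

# Fraction of the workload answerable by a single node under this placement.
feasible_fraction = sum(share for tpl, share in workload.items()
                        if single_node_feasible(tpl))
```

Here the join template is feasible only because node 1 holds both `students` and `answers`; replicating `students` to every node is what buys that freedom, which is the intuition behind the FR operator.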
• Core parts of the system:
(a) A data placement algorithm that converges quickly over time as it trains over a set of examples and machine-learned rules;
(b) Parameterized, machine-learned rules that help govern the physical design of the database across an arbitrary number of computer nodes;
(c) Shared-nothing data replication middleware for web-based applications that can be built easily from low-cost existing resources to realize database scaling without expensive storage area networks.
References
1. Tobias Groothuyse, Swaminathan Sivasubramanian, and Guillaume Pierre, "GlobeTP: Template-Based Database Replication for Scalable Web Applications", WWW '07, Banff, Canada, pp. 301-310.
2. Jozsef Patvarczki, Murali Mani, and Neil Heffernan, "Performance Driven Database Design for Scalable Web Applications", in J. Grundspenkis, T. Morzy & G. Vossen (Eds.), Advances in Databases and Information Systems, Springer-Verlag: Berlin, ISBN 978-3-642-03972-0, pp. 43-58.
Contact: Jozsef Patvarczki, [email protected]