Virtual Database Performance Tuning

©1993-2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.

Abstract

When you create an SQL data service with a virtual schema and deploy it to a Data Integration Service, you create a virtual database. You can improve the performance of virtual database queries by following performance tuning guidelines. This article gives several guidelines for tuning the performance of a virtual database.

Supported Versions

• Informatica Data Services 9.5.1 through 10.0

Table of Contents

Overview
Virtual Database Design Guidelines
Mapping Design Guidelines
Virtual Database Caching
Result Set Caching
SQL Query Tuning

Overview

When you create an SQL data service with a virtual schema and deploy it to a Data Integration Service, you create a virtual database that end users can query. The virtual database contains virtual schemas and the virtual tables or stored procedures that define the database structure.
To ensure fast performance of this virtual database, follow these guidelines when you develop and tune it:

• Observe virtual database design guidelines. Prioritize performance tuning activities, design for pushdown, and set user expectations about the results.
• Observe mapping design guidelines. Design mappings in ways that optimize performance.
• Use caching for virtual tables. When you design a caching plan for the virtual database, you can choose among several options for the best performance.
• Use result set caching. When queries repeat exactly, result set caching reduces database calls and the latency they might cause.
• Tune SQL queries. When an SQL query contains a join, follow the guidelines to improve query performance.

Virtual Database Design Guidelines

When you design and tune a virtual database, keep the following guidelines in mind:

Prioritize

Agility and speed mean not sweating the details. You might be able to get a ten-second query down to under one second, but if the improvement has little impact, ignore it.

Design for pushdown

When you design batch mappings, you generally don't worry about memory usage or a difference of a few seconds in processing. Virtualization has higher concurrency, so there are more opportunities for queries to consume memory and CPU time. Your goal in designing virtual database mappings is to push down as much logic to the source systems as possible. This puts a higher load on the source systems, but it is necessary to support low latency. For more information about how to design mappings to support low latency, see Mapping Design Guidelines.

Avoid transformations that block pushdown

The optimizer is not always able to push logic to sources. You might know that system A has a completely different set of customers than system B, but if you try to join a union of those two systems, the join cannot push down.
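The effect can be sketched outside Informatica with Python's built-in sqlite3 module, where each in-memory database stands in for one source system. This is a minimal illustration under assumed sample data (the tables customers_a, customers_b, and orders are hypothetical), not a model of how the Data Integration Service optimizer works internally. When all tables sit behind one connection, the union, join, and aggregate run as a single statement at the source. When the customer data is split across two connections, every row must be fetched and combined in the client.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: system A and system B hold disjoint sets of
# customers, and the orders live in system A.
CUSTOMERS_A = [(1, "Acme"), (2, "Globex")]
CUSTOMERS_B = [(3, "Initech"), (4, "Umbrella")]
ORDERS = [(1, 100.0), (3, 250.0), (3, 50.0)]

DDL = {
    "customers_a": "CREATE TABLE customers_a (cid INTEGER, name TEXT)",
    "customers_b": "CREATE TABLE customers_b (cid INTEGER, name TEXT)",
    "orders": "CREATE TABLE orders (cid INTEGER, amount REAL)",
}


def load(conn, table, rows):
    """Create one source table and fill it with sample rows."""
    conn.execute(DDL[table])
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


# Same connection: the union, join, and aggregate run as one statement at the
# source, so only the final result rows come back to the caller.
same = sqlite3.connect(":memory:")
load(same, "customers_a", CUSTOMERS_A)
load(same, "customers_b", CUSTOMERS_B)
load(same, "orders", ORDERS)
pushed = same.execute(
    """SELECT u.name, SUM(o.amount)
       FROM (SELECT cid, name FROM customers_a
             UNION ALL
             SELECT cid, name FROM customers_b) AS u
       JOIN orders AS o ON u.cid = o.cid
       GROUP BY u.name
       ORDER BY u.name"""
).fetchall()

# Different connections: no single source holds all the rows, so every row of
# every table is fetched and the union, join, and aggregate run in the client.
sys_a = sqlite3.connect(":memory:")
sys_b = sqlite3.connect(":memory:")
load(sys_a, "customers_a", CUSTOMERS_A)
load(sys_a, "orders", ORDERS)
load(sys_b, "customers_b", CUSTOMERS_B)
customers = (sys_a.execute("SELECT cid, name FROM customers_a").fetchall()
             + sys_b.execute("SELECT cid, name FROM customers_b").fetchall())
orders = sys_a.execute("SELECT cid, amount FROM orders").fetchall()
rows_fetched = len(customers) + len(orders)

names = dict(customers)  # the union, keyed by customer ID
totals = defaultdict(float)
for cid, amount in orders:
    if cid in names:  # inner join on cid
        totals[names[cid]] += amount
client_side = sorted(totals.items())

print(pushed)                     # [('Acme', 100.0), ('Initech', 300.0)]
print(client_side)                # [('Acme', 100.0), ('Initech', 300.0)]
print(len(pushed), rows_fetched)  # 2 7
```

Both paths produce the same answer, but with this sample data the pushed-down query returns 2 rows while the split case reads 7. On real tables, that gap in rows transferred grows with the size of each source.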
Some transformations do not support pushdown optimization and cause optimization to stop. For example, Data Quality transformations do not support pushdown optimization, so you should avoid them. For more information about pushdown optimization, see the "Pushdown Optimization" section in the Informatica Performance Tuning Guide.

Set user expectations

When you present a virtual database, end users have high expectations for database performance. In reality, sometimes it is impossible to make a certain query or virtual table fast. Set the expectation with users that performance is limited by physical disks, networks, and the speed of light.

Mapping Design Guidelines

Use the following guidelines when you develop virtual table mappings:

• Use sorted joins when you source data from different systems. Otherwise, the mapping stops at the join until all data has been read. When you join data from the same connection or database, the joins should push down automatically.
• The advantage of logical data objects is their ease of reuse. However, the logic behind a logical data object is essentially pasted into the mapping at run time wherever it is used. Imagine a logical data object that has a lookup that generates a 2 GB cache file. If that logical data object is referenced three times in a mapping, then the mapping generates three 2 GB cache files, 6 GB in total, during each mapping run.
• Unions of different connections in mappings block pushdown. Unions of tables from the same connection push down. Unions used in virtual database SQL queries that are submitted from the client do not generally block pushdown.
• Avoid configuring Expression transformations with default values. Default values can block pushdown. Note: This effect may be limited in version 9.6.0 and later.
• Source the same data multiple times. This may not seem intuitive, but consider a large mapping with one source that feeds many different joins.
The optimizer cannot push a filter, a sort, or an aggregate from one pipeline back to the shared source, because doing so would affect the other pipelines. Reading the source separately for each pipeline lets each read carry its own pushed-down logic.
• An Aggregator transformation allows pass-through ports without aggregation expressions, but native SQL does not. For example, imagine reading a historical view of YearMonth, CustomerID, CustomerName, DollarValue and aggregating on YearMonth, CustomerID with sum(DollarValue). In this case, CustomerName can be passed through the Aggregator transformation, and Informatica returns the last occurrence of CustomerName. This logic is not possible in SQL and cannot be pushed down. The query log shows that pushdown stops at the Aggregator transformation. Avoid this problem by using an aggregation function on all ports that are not part of the aggregation key.

Virtual Database Caching

Caching makes logical data objects physical by replicating data sets into a relational database. Informatica Data Services manages this database. Cache tables are automatically rebuilt on a schedule or by a trigger.

Caching can have a dramatic impact on performance. When you design a caching plan for the virtual database, remember the first principle of tuning: prioritize. Find the queries that are slow and investigate them further. If caching is a quick solution, turn it on and move on.

When you address caching, consider the following options:

Virtual table with default caching

This is the default caching behavior in Informatica Data Services. The entire object is cached. Caches can be refreshed on a schedule or by an external trigger.

External caching

Enable external caching by turning on caching for an object and then specifying a cache table name. Informatica Data Services does not build the cache and instead uses this cache table whenever the object is referenced. In this case, you can use change data capture or other techniques to populate and update the cache table.
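As an illustration of the last point, the following is a minimal sketch of how a process outside Informatica Data Services might keep an external cache table current. It assumes a hypothetical change feed of insert (I), update (U), and delete (D) records; Python's sqlite3 module stands in for the cache database, and the table name customers_cache and the record layout are assumptions. Informatica Data Services does not maintain this table; it only uses it when the object is referenced.

```python
import sqlite3

# Hypothetical change records from a change data capture feed:
# (operation, customer ID, name, state). I = insert, U = update, D = delete.
CHANGES = [
    ("I", 1, "Acme", "TX"),
    ("I", 2, "Globex", "CA"),
    ("U", 1, "Acme Corp", "TX"),
    ("D", 2, None, None),
    ("I", 3, "Initech", "TX"),
]


def apply_changes(conn, changes):
    """Apply one batch of change records to the external cache table."""
    with conn:  # commit the whole batch at once, or roll back on error
        for op, cid, name, state in changes:
            if op in ("I", "U"):
                # Upsert: insert a new row or overwrite the row with the same key.
                conn.execute(
                    "INSERT OR REPLACE INTO customers_cache (cid, name, state) "
                    "VALUES (?, ?, ?)",
                    (cid, name, state),
                )
            elif op == "D":
                conn.execute("DELETE FROM customers_cache WHERE cid = ?", (cid,))
            else:
                raise ValueError(f"unknown operation {op!r}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_cache (cid INTEGER PRIMARY KEY, name TEXT, state TEXT)"
)
apply_changes(conn, CHANGES)
rows = conn.execute(
    "SELECT cid, name, state FROM customers_cache ORDER BY cid"
).fetchall()
print(rows)  # [(1, 'Acme Corp', 'TX'), (3, 'Initech', 'TX')]
```

Applying each batch in a single transaction means that, in a transactional cache database, readers see either the state before a batch or the state after it, never a partially applied batch.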
Granular caching

The same logical data object can be exposed as multiple virtual tables. You can create different schemas, such as “uncached.customers” and “cached.customers.” Or you can use instances with different names, such as “myschema.customers” and “myschema.customers_cached.” This lets users decide whether they need speed or freshness.

Another way to achieve granular caching is to split virtual table mappings into multiple logical data objects. This technique lets you cache segments of the virtual table that might otherwise slow down processing. It also lets you reuse cached data across multiple virtual tables that share common logic. For example, CUST_ORDERS and CUST_TICKETS both contain a union of customer data from CRM and ERP. You have detected that the CRM system is slow, and the union blocks pushdown. You can split the union logic into a logical data object, CUST_UNION, and share it between the two virtual tables. When this logical data object is cached, performance improves for both virtual tables.

Result Set Caching

Result set caching is configured in the Data Integration Service process properties and the SQL data service properties. When result set caching is enabled, the first query is processed normally and its results are cached in memory. Large results are cached to disk. Subsequent queries that exactly match the original query return data from the cache without requesting data from the sources. A subsequent query that does not exactly match the original query does not use the cache.

You can set the retention period for result set caches in the Result Set Cache Expiration Period property. After the retention period passes, the Data Integration Service purges the cache.

SQL Query Tuning

When you tune the performance of SQL queries, consider the following guideline:

SQL supports join statements in the FROM and WHERE clauses. A join condition in the WHERE clause makes it unclear whether the condition is a join or a filter.
Put join statements in the FROM clause, and isolate filters in the WHERE clause.

Example 1

In the following example, the join statement appears in the WHERE clause:

    SELECT *
    FROM customer C, orderh H, orderl L
    WHERE C.state = 'TX'
      AND C.cid = H.custid
      AND H.hid = L.headerid

Example 2

In the following example, the join statement correctly appears in the FROM clause:

    SELECT *
    FROM customer C
    JOIN orderh H ON C.cid = H.custid
    JOIN orderl L ON H.hid = L.headerid
    WHERE C.state = 'TX'

Author

Tim Smith
Senior Product Specialist