Virtual Database Performance Tuning
©1993-2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic,
photocopying, recording or otherwise) without prior consent of Informatica LLC. All other company and product names may be
trade names or trademarks of their respective owners and/or copyrighted materials of such owners.
Abstract
When you create an SQL data service with a virtual schema and deploy it to a Data Integration Service, you create a
virtual database. Performance of virtual database queries can be improved if you follow performance tuning guidelines.
This article gives several guidelines for tuning the performance of a virtual database.
Supported Versions
• Informatica Data Services 9.5.1 - 10.0
Table of Contents
Overview
Virtual Database Design Guidelines
Mapping Design Guidelines
Virtual Database Caching
Result Set Caching
SQL Query Tuning
Overview
When you create an SQL data service with a virtual schema and deploy it to a Data Integration Service, you create a
virtual database that end users can query. The virtual database contains virtual schemas and the virtual tables or
stored procedures that define the database structure. You can observe performance tuning guidelines and practices to
ensure fast performance of this virtual database.
Observe the following guidelines to develop and performance-tune a virtual database:
Observe virtual database design guidelines.
Prioritize performance tuning activities, design for pushdown, and set user expectations about the results.
Observe mapping design guidelines.
You can design mappings in ways that optimize performance.
Use caching for virtual tables.
When you design a caching plan for the virtual database, you can select from among several options for best
performance.
Use result set caching.
When repeated queries are identical, you can use result set caching to reduce database calls and the latency
they might cause.
Tune SQL queries.
When an SQL query contains a join statement, you can follow guidelines to improve SQL query performance.
Virtual Database Design Guidelines
When you design and tune a virtual database, keep the following guidelines in mind:
Prioritize
Agility and speed mean not sweating the details. You might be able to get a ten-second query down to
under one second, but if the improvement doesn't have a big impact, ignore it.
Design for pushdown
When designing mappings, you generally don't worry about memory usage or a difference of a few seconds in
processing. Virtualization, however, has higher concurrency, so there are more opportunities to use up memory
or CPU time. Your goal in designing virtual database mappings is to push down as much logic to the source
systems as possible. This does put a higher load on the source systems, but it is necessary to support low
latency. For more information about how to design mappings to support low latency, see Mapping Design Guidelines.
Avoid transformations that block pushdown
The optimizer is not always able to push logic to sources. For example, you might know that system A has a
completely different set of customers than system B, but the optimizer does not. If you try to join a union of
those two systems, the join cannot push down.
Some transformations do not support pushdown optimization and cause optimization to stop. For example,
Data Quality transformations do not support pushdown optimization, so you should avoid them.
For more information about pushdown optimization, see the "Pushdown Optimization" section in the Informatica
Performance Tuning Guide.
Set user expectations
When you present a virtual database, end users have high expectations for database performance. In reality,
sometimes it is impossible to make a certain query or virtual table fast. Set the expectation with users that
there is a limit to physical disks, networks and the speed of light.
Mapping Design Guidelines
Use the following guidelines when you develop virtual table mappings:
• Use sorted joins when you source data from different systems. Otherwise, the mapping stops at the join until
all data has been read. When you join data from the same connection or database, the joins should push down
automatically.
• Reuse logical data objects with care. The advantage of logical data objects is their ease of reuse. However,
the logic behind a logical data object is essentially pasted into the mapping at run time wherever it is used.
Imagine a logical data object that has a lookup that generates a 2 GB cache file. If that logical data object is
referenced three times in a mapping, then the mapping generates three 2 GB files during each mapping run.
• Unions of different connections in mappings block pushdown. Unions within the same connection push down.
Unions used in virtual database SQL queries that are submitted from the client do not generally block
pushdown.
• Avoid configuring Expression transformations with default values. These potentially block pushdown.
Note: This effect may be limited in version 9.6.0 and later.
• Source the same data multiple times. This may not seem intuitive. But consider a large mapping with one
source that feeds many different joins. The optimizer cannot push a filter, a sort, or an aggregate from one
pipeline to the shared source for fear of impacting the other pipelines. When each pipeline reads from its own
instance of the source, the optimizer can push that pipeline's logic down independently.
• Use an aggregation function on all ports that are not part of the aggregation key. An Aggregator
transformation allows passthrough ports without aggregation expressions, but this is not allowed in native SQL.
For example, imagine reading a historical view of YearMonth, CustomerID, CustomerName, DollarValue and
aggregating on YearMonth, CustomerID with sum(DollarValue). In this case, CustomerName could be passed
through the Aggregator transformation, and Informatica returns the last occurrence of CustomerName. This
logic is not possible in SQL and cannot be pushed down. The query log will show that pushdown stops at the
Aggregator transformation.
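As a sketch of the kind of SQL that becomes possible once every non-key port carries an aggregation function,
consider the query below. The table name customer_history is hypothetical, and MAX is only one possible choice
of function. MAX returns the greatest CustomerName in each group rather than the last occurrence, so confirm
that the substitution is acceptable for your data.

```sql
-- Hypothetical source table: customer_history
-- Every selected column is either part of the GROUP BY key or wrapped in an
-- aggregate function, so the statement is valid native SQL.
SELECT
    YearMonth,
    CustomerID,
    MAX(CustomerName) AS CustomerName,  -- assumption: the greatest name per group is acceptable
    SUM(DollarValue)  AS DollarValue
FROM customer_history
GROUP BY YearMonth, CustomerID;
```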
Virtual Database Caching
Caching makes logical data objects physical by replicating data sets into a relational database. This database is
managed by Informatica Data Services. Cache tables are automatically rebuilt on a schedule or via trigger. Caching
can have a dramatic impact on performance.
When you design a caching plan for the virtual database, remember the first principle of tuning: prioritize. Find the
queries that are slow and investigate them further. If caching is a quick solution, then turn it on and be done with it.
When you address caching, consider the following options:
Virtual table with default caching
This is the default for caching in Informatica Data Services. The entire object is cached. Caches can be
refreshed on a schedule or by an external trigger.
External caching
Enable external caching by turning caching on for an object and then giving a cache table name. Informatica
Data Services will not build the cache and instead will use this cache table whenever the object is referenced.
In this case, you can use change data capture or other techniques that populate and update a cache table.
Granular caching
The same logical data object can be exposed as multiple virtual tables. You can create different schemas like
“uncached.customers” and “cached.customers”. Or you can use instances with different names like
“myschema.customers” and “myschema.customers_cached”. This helps users to decide if they need speed
or freshness.
Another way to achieve granular caching is by splitting virtual table mappings into multiple logical data
objects. This technique allows you to cache segments of the virtual table that might otherwise cause
slowdowns in processing. It also allows you to reuse cached data across multiple virtual tables that share
common logic.
For example, CUST_ORDERS and CUST_TICKETS both contain a union of customer data from CRM and
ERP. You have detected that the CRM system is slow. The use of a union blocks pushdown. You can split
the logic into a logical data object CUST_UNION and share the logic between the two virtual tables. When
this logical data object is cached, performance improves for both virtual tables.
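From the client side, the schema-based form of granular caching might look like the following sketch. The
uncached and cached schema names are the ones suggested in this section. The columns cid and state are
assumptions chosen for illustration.

```sql
-- Freshness matters: query the uncached virtual table, which reads from the sources.
SELECT cid, state
FROM uncached.customers
WHERE state = 'TX';

-- Speed matters: query the cached virtual table and accept data as of the last cache refresh.
SELECT cid, state
FROM cached.customers
WHERE state = 'TX';
```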
Result Set Caching
Result set caching is configured in the Data Integration Service process properties and the SQL data service
properties.
When result set caching is enabled, the first query is processed normally and cached in memory. Large results are
cached to disk. Subsequent queries that exactly match the original query will return data from the cache without
requesting data from the sources. If a subsequent query does not exactly match the original query, it will not use the
cache.
You can set the retention period for result set caches in the Result Set Cache Expiration Period property. After the
retention period passes, the Data Integration Service purges the cache.
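The exact-match rule can be illustrated with the sequence of client queries below. This is a sketch that assumes
result set caching is enabled and that all three queries arrive within the expiration period. The customer table
and its columns are borrowed from the SQL examples in this article. The article does not state which textual
differences count as a mismatch, so the sketch changes the filter value, which clearly makes the query different.

```sql
-- First submission: processed normally, and the result is cached.
SELECT C.cid FROM customer C WHERE C.state = 'TX';

-- Identical query submitted again: returned from the cache without requesting data from the sources.
SELECT C.cid FROM customer C WHERE C.state = 'TX';

-- Different filter value: not an exact match, so the query does not use the cache.
SELECT C.cid FROM customer C WHERE C.state = 'CA';
```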
SQL Query Tuning
When you tune the performance of SQL queries, consider the following guideline:
SQL supports join statements in the FROM and WHERE clauses. Join statements in the WHERE clause produce
uncertainty about whether the statement is a join or a filter. Put join statements in the FROM clause, and isolate filters
in the WHERE clause.
Example 1
In the following example, the join statement appears in the WHERE clause:
SELECT *
FROM
    customer C,
    orderh H,
    orderl L
WHERE C.state = 'TX'
    AND C.cid = H.custid
    AND H.hid = L.headerid
Example 2
In the following example, the join statement correctly appears in the FROM clause:
SELECT *
FROM
    customer C
    JOIN orderh H ON C.cid = H.custid
    JOIN orderl L ON H.hid = L.headerid
WHERE C.state = 'TX'
Author
Tim Smith
Senior Product Specialist