Performance Tuning & Best Practices on PowerCenter 8.6
Informatica 8.6, Version 1.
Table of Contents
1. Introduction
2. Transformations & Best Practices
1) Source Qualifier
2) Expression
3) Lookup
4) Sequence Generator
5) Aggregator
6) Filter
7) Router
8) Joiner
9) Normalizer
10) Sorter
11) Rank
12) Update Strategy
13) External Procedure
14) Stored Procedure
15) XML Source Qualifier
16) Union
3. Performance Tuning
o Source Query Tuning
o Transformation Tuning
o Memory Optimization
o Session Tuning
o Session Partitioning
 Database Partitioning
 Hash Partitioning
 Key Range
 Pass-through
 Round-Robin
o Pushdown Optimization
o Identifying & Eliminating Bottlenecks
 Target Bottlenecks
 Source Bottlenecks
 Transformations/Mapping Bottlenecks
 Session Bottlenecks
1. Introduction
This document provides a brief overview of performance tuning to optimize Informatica performance in a real-time environment. It also covers frequently used transformations in detail, along with the best practices that can be used to optimize performance, and touches on other areas such as SQL/database tuning, debugging techniques, parallel processing, and pushdown optimization.
The primary objective of this document is to help Informatica developers deal with different scenarios in Informatica and optimize performance.
2. Transformations
Transformations are the Informatica repository objects that are used to build the business logic according to which we perform ETL.
Below is the list of frequently used Informatica transformations.
1) Source Qualifier
2) Expression
3) Lookup
4) Sequence Generator
5) Aggregator
6) Filter
7) Router
8) Joiner
9) Normalizer
10) Sorter
11) Rank
12) Update Strategy
13) External Procedure
14) Stored Procedure
15) XML Source Qualifier
16) Union
2.1 Source Qualifier
The Source Qualifier transformation is any data’s first entry point into a mapping. It is used to
perform the following tasks:
 Join data originating from the same source database.
 Filter records when the Informatica Server reads source data (i.e., a SQL WHERE condition).
 Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.
 Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
 Create a custom query (SQL override) to issue a special SELECT statement for the Informatica Server to read source data.
The following figure shows joining two tables with one Source Qualifier transformation:
Best Practices
 Only use SQL overrides if there is a substantial performance gain or complexity decrease. SQL overrides need to be maintained manually, and any change to the data structure will result in rewriting or modifying the SQL override. (A sketch of a minimal override follows this list.)
 Use the WHERE condition and sorted ports in the Source Qualifier if possible, rather than adding a Filter or Sorter transformation.
 Delete unused ports and only connect what is used. Reducing the number of records used throughout the mapping provides better performance by minimizing the amount of data moved.
 Tune Source Qualifier queries to return only the data you need.
 Perform large lookups in the Source Qualifier instead of through a traditional Lookup transformation.
 When applicable, generate the default SQL in the Source Qualifier and use the 'Validate' option to verify that the resulting SQL is valid.
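Where an SQL override is justified, keep it minimal. The following is an illustrative sketch only (the EMP and DEPT tables and their columns are hypothetical, not from this document), showing a join, a WHERE filter, and an ORDER BY of the kind the Source Qualifier would otherwise generate for sorted ports:

SELECT EMP.EMPNO,
       EMP.ENAME,
       EMP.SAL,
       DEPT.DNAME
FROM EMP
JOIN DEPT ON EMP.DEPTNO = DEPT.DEPTNO   -- join performed in the source database
WHERE EMP.SAL > 1000                    -- filter rows before they enter the mapping
ORDER BY EMP.EMPNO                      -- equivalent of specifying sorted ports

Returning only the needed columns, pre-filtered and pre-sorted, keeps the downstream mapping small.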
2.2 Expression
The Expression transformation is a passive transformation, used to calculate values in a single
row before you write to the target. For example, you might need to adjust employee salaries,
concatenate first and last names, or convert strings to numbers. You can use the Expression
transformation to perform any non-aggregate calculations. You can also use the Expression
transformation to test conditional statements before you output the results to target tables or
other transformations.
Local variables can be used in Expression transformations and greatly enhance the capabilities
of this transformation object.
Best Practices
 Calculate once, use many times. Avoid calculating or testing the same value over and over. Calculate it once in an expression, and set a true/false flag. Within an expression, use variables to calculate a value used several times.
 Create an anchor Expression transformation that maps the source table to an intermediary transformation using the source column names. Do simple processes (LTRIM/RTRIM, string/numeric conversions, testing for NULL, etc.) in this transformation. This enables an easier transition if the source table changes in the future.
 Watch your data types. The engine automatically converts compatible types. Sometimes conversion is excessive and happens on every transformation, which slows the mapping.
 Expression names should begin with "EXP" followed by descriptive words.
 Do not propagate ports out of an Expression transformation if they are not used further along in the mapping.
 Group input/output ports first, followed by variable ports and then output ports. Incorrectly ordering the ports in an Expression transformation can lead to errors and/or inaccurate results.
 If a reusable expression is being used to perform common calculations, consider using User-Defined Functions (UDFs). UDFs are a new feature in PowerCenter 8.x.
2.3 Lookup Transformation
The Lookup transformation is used to look up data in a relational table, view, synonym, or flat
file. When a lookup is used, the Informatica Server queries the lookup table based on the lookup
ports in the transformation. It compares Lookup transformation port values to lookup table
column values based on the lookup condition. Use the result of the lookup to pass to other
transformations and the target.
One common error encountered when using the Lookup transformation deals with using the
Informatica $Source and $Target variables for the relational connection needed for the lookup.
When a bulk loader is used, the $Target variable is not valid and must be replaced with the
proper connection. Likewise, care should be taken when using the $Source variable to ensure
that the proper database is being queried for results.
Lookups are dealt with in greater detail below.
Best Practices
 Avoid using the $Source and $Target variables in the Lookup Connection Information. Connection names have been set up to be generic across Production and Test. If possible, set the Connection Information in the Lookup transformation to one of these non-level-specific connection names.
 Set the connections in the session for ease of migration.
 Do not include any more ports in the Lookup than necessary. Reducing the amount of data processed provides better performance.
 Avoid date/time comparisons in lookups; replace them with string comparisons.
 Not all sources and targets treat strings with leading or trailing blanks the same. It may be necessary to RTRIM and LTRIM string data prior to using it in a Lookup.
 Lookups on small tables (<10,000 records) can be cached and use ODBC. Lookups on large tables should be cached. As a general practice, do not use uncached lookups.
 In place of lookups, tables can be joined in the Source Qualifier. However, this often necessitates left joins, which can complicate Source Qualifiers (see the sketch after this list). Weigh performance versus ease of maintenance when deciding between Source Qualifiers and lookups.
 When you create more than one lookup condition, place the conditions that use an equality operator first in order to optimize lookup performance.
 Where lookup data does not change frequently, consider using a persistent cache to improve performance; for example, when validating state codes in the United States of America.
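As a hedged illustration of the join-versus-lookup trade-off above, the following sketch (hypothetical EMP and DEPT tables) replaces a DEPT lookup with a left join in the Source Qualifier, so that source rows without a matching DEPT row are kept, just as a lookup returning NULL would keep them:

SELECT EMP.EMPNO,
       EMP.ENAME,
       DEPT.DNAME               -- value that would otherwise come from a Lookup
FROM EMP
LEFT OUTER JOIN DEPT
  ON EMP.DEPTNO = DEPT.DEPTNO   -- left join keeps EMP rows with no match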
2.4 Sequence Generator
The Sequence Generator transformation generates numeric values and is used to create unique
primary key values, replace missing primary keys, or cycle through a sequential range of
numbers.
It contains two output ports that you can connect to one or more transformations.
When NEXTVAL is connected to the input port of another transformation, the Integration Service
generates a sequence of numbers. When CURRVAL is connected to the input port of another
transformation, the Integration Service generates the NEXTVAL value plus the Increment By
value.
If you connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service
passes a constant value for each row.
The Sequence Generator transformation is unique among all transformations because you cannot add, edit, or delete the default ports, NEXTVAL and CURRVAL.
Best Practices
 Use a reusable Sequence Generator rather than separate Sequence Generators if you are using it to generate unique primary keys.
2.5 Aggregator
The Aggregator transformation allows you to perform aggregate calculations, such as averages
and sums. The Aggregator transformation is an active transformation, which means that it can
change the number of rows that pass through it, in contrast to the passive Expression
transformation.
Additionally, Informatica allows for incremental aggregation. When this feature is used, the Integration Service saves the aggregate values in its cache files so that the target table does not need to be queried during a mapping run.
Best Practices
 Factor out aggregate function calls where possible: SUM(A) + SUM(B) can become SUM(A+B), so the server only searches through and groups the data once (see the SQL sketch below).
 Do not use Aggregators for simple sorting; use the Sorter transformation or the sorted-ports option of the Source Qualifier.
 Minimize aggregate function calls by using "group by".
 Place Aggregators as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
 Wherever possible, sort incoming data to an Aggregator and use the 'Sorted Input' option to improve performance.
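The factoring tip can be pictured in SQL terms. This hypothetical query (an EMP table is assumed) shows both forms; note that they can differ when SAL or COMM is NULL, since per-column SUM skips NULLs while a factored sum would skip the whole row, so guard nullable columns before factoring:

SELECT DEPTNO,
       SUM(SAL) + SUM(COMM)                      AS two_aggregate_calls,
       SUM(COALESCE(SAL,0) + COALESCE(COMM,0))   AS one_aggregate_call  -- NULL-safe factored form
FROM EMP
GROUP BY DEPTNO;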
2.6 Filter Transformation
The Filter transformation allows you to filter rows in a mapping. You pass all the rows from a
source transformation through the Filter transformation, and then enter a filter condition for the
transformation. All ports in a Filter transformation are input/output and only rows that meet the
condition pass through the Filter transformation.
Best Practices
 Place Filters as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
 Use a Filter to screen rows that would be rejected by an Update Strategy. (Rejected rows from an update strategy are logged to the bad file, decreasing performance.)
 If you have an Aggregator transformation in the mapping, use a Filter before the aggregation to avoid unnecessary aggregation.
If you need to test the same input data based on multiple conditions, consider using a Router transformation instead of creating multiple Filter transformations. When you use a Router transformation, the Integration Service processes incoming data only once. When you use multiple Filter transformations, the Integration Service processes incoming data once for each transformation.
2.7 Router Transformation
A Router transformation is used to conditionally test data and route records based upon that conditional test. It is similar to a Filter transformation because both transformations allow you to use a condition to test data. However, a Filter transformation tests data for one condition and drops the rows that do not meet it, whereas a Router transformation tests data for one or more conditions and gives you the option to route rows that do not meet any of the conditions to a default output group.
Additionally, a Router transformation allows the programmer to test the same input data for multiple conditions; multiple Filter transformations would be needed to accomplish the same functionality.
If multiple routing conditions are needed, the Router transformation should be used in preference to multiple Filters, as it is more readable and more efficient since each row need only be tested once.
Best Practices
 Routers may not be the best choice if the load order of the target(s) is important, since it is not possible to control the load order of the legs from a Router.
 The target load method(s) must be carefully chosen when using Routers, especially if the data is loading to the same target, in order to avoid table locks and ensure that the data is loaded in the correct order.
2.8 Joiner Transformation
The Joiner transformation joins two related heterogeneous sources residing in different locations
or file systems. It can also join two tables from the same source. This is generally only done
when trying to avoid outer joins in the Source Qualifiers.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail
branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues
to the target.
One common point of confusion concerns the join types available. The following table summarizes the join types and their associated behavior:
Normal Join: The Informatica Server discards all rows of data from the master and detail source that do not match, based on the condition.
Master Outer Join: The Informatica Server keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source. Null values are inserted in the data stream where needed.
Detail Outer Join: The Informatica Server keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source. Null values are inserted in the data stream where needed.
Full Outer Join: The Informatica Server keeps all rows of data from both the master and detail sources. Null values are inserted in the data stream where needed.
Best Practices
 Whenever possible, perform joins in the database, i.e., in the Source Qualifier itself.
 Whenever possible, sort incoming data to a Joiner transformation and use the 'Sorted Input' option to improve performance.
 To improve the performance of an unsorted Joiner transformation, designate the source with fewer rows as the 'Master'.
2.9 Normalizer Transformation
The Normalizer transformation normalizes records from COBOL and other sources, allowing you
to organize data in different formats. The Normalizer transformation is used primarily with
COBOL sources, which are often stored in a de-normalized format. The OCCURS statement in
a COBOL file nests multiple records of information in a single record. Using the Normalizer
transformation, you break out repeated data within a record into separate records.
The Normalizer transformation can also be used with relational sources to create multiple rows
from a single row of data.
VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier
transformation for a COBOL source.
Pipeline Normalizer transformation. A transformation that processes multiple-occurring data
from relational tables or flat files.
For example, you might have a relational table that stores four quarters of sales by store. You
need to create a row for each sales occurrence. You can configure a Normalizer transformation
to return a separate row for each quarter.
The following source rows contain four quarters of sales by store:
Store1 100 300 500 700
Store2 250 450 650 850
The Normalizer returns a row for each store and sales combination. It also returns an index that identifies the quarter number:
Store1 100 1
Store1 300 2
Store1 500 3
Store1 700 4
Store2 250 1
Store2 450 2
Store2 650 3
Store2 850 4
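For readers who think in SQL, the following hypothetical query (assuming a QUARTERLY_SALES table with one sales column per quarter) produces the same normalized output as the transformation above, one row per store and quarter:

SELECT STORE, Q1_SALES AS SALES, 1 AS QUARTER FROM QUARTERLY_SALES
UNION ALL
SELECT STORE, Q2_SALES, 2 FROM QUARTERLY_SALES
UNION ALL
SELECT STORE, Q3_SALES, 3 FROM QUARTERLY_SALES
UNION ALL
SELECT STORE, Q4_SALES, 4 FROM QUARTERLY_SALES
ORDER BY STORE, QUARTER;   -- one output row per store/quarter, with the quarter index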
2.10 Sorter Transformation
The Sorter transformation is used to sort data. It can sort data from a source transformation in ascending or descending order according to a specified sort key, and it can be configured for case-sensitive sorting and for whether the output rows should be distinct.
Best Practices
 Whenever possible, sort source data in the database, i.e., in the Source Qualifier itself.
 The default Sorter cache size is 8 MB. If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. For best performance, configure the Sorter cache size with a value less than or equal to the amount of physical RAM available on the Integration Service machine.
2.11 Rank Transformation
The Rank transformation allows you to select only the top or bottom rank of data. You can use a
Rank transformation to return the largest or smallest numeric value in a port or group. You can
also use a Rank transformation to return the strings at the top or the bottom of a session sort
order.
The Rank transformation differs from the transformation functions MAX and MIN, in that it lets
you select a group of top or bottom values, not just one value. For example, use Rank to select
the top 10 salespersons in a given territory. Or, to generate a financial report, you might also use
a Rank transformation to identify the three departments with the lowest expenses in salaries and
overhead.
Rank Caches
During a session, the Integration Service compares an input row with rows in the data cache. If
the input row out-ranks a cached row, the Integration Service replaces the cached row with the
input row. If you configure the Rank transformation to rank across multiple groups, the
Integration Service ranks incrementally for each group it finds.
The Integration Service stores group information in an index cache and row data in a data
cache. If you create multiple partitions in a pipeline, the Integration Service creates separate
caches for each partition.
2.12 Update Strategy
The Update Strategy transformation is used to flag rows for insert, delete, update, or reject. The
Update Strategy transformation can check data conditions and use its findings to issue the
proper SQL statements to the target database. The Update Strategy transformation is particularly useful for implementing Slowly Changing Dimension load strategies, where each incoming record is flagged for INSERT, UPDATE, etc. Rejected records can be captured by enabling the Forward Rejected Rows option.
Operation   Constant    Numeric Value
Insert      DD_INSERT   0
Update      DD_UPDATE   1
Delete      DD_DELETE   2
Reject      DD_REJECT   3
Best Practices
 Do not code update strategies when all rows going to the target are inserts.
 Do include an update strategy when all rows going to the target are updates, unless a proof of concept shows that there is performance degradation. This adds clarity to the mapping for future developers.
 Rejected rows from an update strategy are logged to the bad file. Consider filtering them out beforehand if retaining these rows isn't critical, due to the performance hit caused by logging.
 Avoid loading to the same target from different data flows where possible.
 Do use conditional logic in an update strategy as necessary.
2.13 External Procedure
External Procedure transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality. Although the standard transformations provide a wide range of options, there are occasions when you might want to extend the functionality provided with PowerCenter. For example, the range of standard transformations, such as Expression and Filter transformations, may not provide the functionality you need. If you are an experienced programmer, you may want to develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.
To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. Using TX, you can create an Informatica External Procedure transformation and bind it to an external procedure that you have developed. You can bind External Procedure transformations to two kinds of external procedures:
 COM external procedures (available on Windows only)
 Informatica external procedures (available on Windows, AIX, HP-UX, Linux, and Solaris)
To use TX, you must be an experienced C, C++, or Visual Basic programmer. Use multi-threaded code in external procedures.
2.14 Stored Procedure
A Stored Procedure transformation is used to call a stored procedure on a relational database.
Stored procedures must exist in the database before creating a Stored Procedure
transformation, and the stored procedure can exist in a source, target, or any database with a
valid connection to the server.
A Stored Procedure transformation can run in the following five modes:
Normal: The stored procedure runs where the transformation exists in the mapping, on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source: Before the session retrieves data from the source, the stored
procedure runs. This is useful for verifying the existence of tables or performing joins of data in a
temporary table.
Post-load of the Source: After the session retrieves data from the source, the stored
procedure runs. This is useful for removing temporary tables.
Pre-load of the Target: Before the session sends data to the target, the stored procedure runs.
This is useful for verifying target tables or disk space on the target system.
Post-load of the Target: After the session sends data to the target, the stored procedure runs.
This is useful for re-creating indexes on the database.
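As an illustration of a procedure suited to target post-load mode, here is a minimal Oracle PL/SQL sketch; the procedure, index, and table names are hypothetical, not from this document:

CREATE OR REPLACE PROCEDURE REBUILD_EMP_TGT_IDX AS
BEGIN
  -- rebuild the target index after the session finishes loading
  EXECUTE IMMEDIATE 'ALTER INDEX EMP_TGT_EMPNO_IDX REBUILD';
END;
/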
2.15 XML Source Qualifier
The XML Source Qualifier represents the data elements that the Informatica Server reads when
it executes a session with XML sources.
Each group in an XML source definition is analogous to a relational table, and the Designer
treats each group within the XML Source Qualifier transformation as a separate source of data.
 You can link ports from one group in an XML Source Qualifier transformation to ports in one input group of another transformation. You can copy the columns of several groups to one transformation, but you can link the ports of only one group to the corresponding ports in the transformation.
 You can link multiple groups from one XML Source Qualifier transformation to different input groups in a transformation. This works with most multiple input group transformations, such as the Joiner or Custom transformations.
2.16 Union Transformation
The Union transformation is a multiple input group transformation that you use to merge data
from multiple pipelines or pipeline branches into one pipeline branch. It merges data from
multiple sources similar to the UNION ALL SQL statement to combine the results from two or
more SQL statements. Similar to the UNION ALL statement, the Union transformation does not
remove duplicate rows.
The Union transformation is a non-blocking multiple input group transformation. You can connect
the input groups to different branches in a single pipeline or to different source pipelines.
When you add a Union transformation to a mapping, you must verify that you connect the same
ports in all input groups. If you connect all ports in one input group, but do not connect a port in
another input group, the Integration Service passes NULLs to the unconnected port.
The following figure shows a mapping with a Union transformation:
Best practices:
 All input groups and the output group must have matching ports. The precision, data type, and scale must be identical across all groups.
 To remove duplicate rows, you must add another transformation, such as a Router or Filter transformation.
 You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
3.0 Performance Tuning
To ensure that valid data is extracted, transformed, and loaded in a timely manner, we have to optimize performance in the following areas:
 SQL/DB Tuning
 Transformation Tuning
 Session Tuning
 Identifying & Eliminating Bottlenecks
3.1 Source Query Tuning
 Optimize your DB query by running it directly on the database, using the DB tool provided by the client, such as TOAD or Teradata SQL Administrator.
 Run the query generated by the reader thread in the session log directly on the database and check the result.
 Restructure the statement according to how the business needs the data to be loaded into the target table.
 Minimize the use of DISTINCT; DISTINCT always creates a SORT.
 Use ORDER BY and GROUP BY clauses only where necessary.
 Queries that contain ORDER BY or GROUP BY clauses may benefit from creating an index on the ORDER BY or GROUP BY columns (see the sketch below).
 Use conditional filters to remove unnecessary data.
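As a sketch of the indexing point above (the table and column names are hypothetical):

-- Supports a query that groups or orders by DEPTNO, e.g.
-- SELECT DEPTNO, SUM(SAL) FROM EMP_SRC GROUP BY DEPTNO ORDER BY DEPTNO;
CREATE INDEX EMP_SRC_DEPTNO_IDX ON EMP_SRC (DEPTNO);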
3.2 Transformation Tuning
Follow the best practices to minimize the transformation errors.
3.2.1 Expression Transformation
Simplify nested functions when possible. Instead of:
IIF(condition1, result1, IIF(condition2, result2, IIF( ... )))
try:
DECODE(TRUE, condition1, result1, ..., conditionN, resultN)
3.2.2 Optimizing Aggregator Transformations
 Group by simple columns.
 Use sorted input.
 Use incremental aggregation.
 Filter data before you aggregate it.
 Limit port connections.
3.2.3 Optimizing Joiner Transformations
 Designate the master source as the source with fewer duplicate key values.
 Designate the master source as the source with fewer rows.
 Join sorted data when possible.
 Perform joins in a database when possible:
1. Create a pre-session stored procedure to join the tables in a database.
2. Use the Source Qualifier transformation to perform the join.
3.2.4 Optimizing Lookup Transformations
 Use the optimal database driver.
 Cache lookup tables.
 Optimize the lookup condition.
 Filter lookup rows.
 Index the lookup table.
 Optimize multiple lookups.
 Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup source.
3.2.5 Optimizing Sequence Generator Transformations
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property.
3.2.6 Optimizing Sorter Transformations
 Allocate enough memory to sort the data.
 Specify a different work directory for each partition in the Sorter transformation.
3.2.7 Optimizing Source Qualifier Transformations
Use the Select Distinct option in the Source Qualifier transformation if you want the Integration Service to select unique values from a source. Using Select Distinct filters unnecessary data earlier in the data flow, which can improve performance (see the sketch below).
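For illustration, with Select Distinct enabled the generated default query takes roughly this shape (a hypothetical CUSTOMERS source is assumed):

SELECT DISTINCT CUSTOMERS.CUST_ID,
                CUSTOMERS.REGION
FROM CUSTOMERS;   -- duplicates are removed in the database, before the data flow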
3.3 Memory Optimization
3.3.1 Tuning DTM Buffer
The DTM buffer is a temporary storage area for data and is divided into blocks. The buffer size and block size are both tunable; the default setting for each is Auto, which means the DTM estimates the optimal size.
Tuning the DTM Buffer:
 Determine the minimum DTM buffer size: (DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9
 Increase by a multiple of the block size.
 If performance does not improve, return to the previous setting.
 There is no universal formula for the optimal DTM buffer size.
 The Auto setting may be adequate for some sessions.
3.3.2 Transformation Caches
Transformation caches are temporary storage areas for certain transformations; except for the Sorter cache, each is divided into a data cache and an index cache. The size of each transformation cache is tunable, and the default setting for each cache is Auto.
Five transformations use caches while running: Aggregator, Joiner, Lookup, Rank, and Sorter.
3.3.2.1 Aggregator Caches:
Unsorted Input
 Must read all input before releasing any output rows
 Index cache contains group keys
 Data cache contains non-group-by ports
Sorted Input
 Releases output row as each input group is processed
 Does not require data or index cache (both =0)
 May run much faster than unsorted BUT must consider the expense of sorting
Aggregator caches manual tuning:
3.3.2.2 Joiner Caches
Unsorted Input:
 All master data loaded into cache
 Specify smaller data set as master
 Index cache contains join keys
 Data cache contains non-key connected outputs
Sorted Input:
 Both inputs must be sorted on join keys
 Specify data set with fewest records under a single key as master
 Index cache contains up to 100 keys
 Data cache contains non-key connected outputs associated with the 100 keys
Joiner Caches manual Tuning
3.3.2.3 Lookup Caches
Data cache:
 Only connected output ports are included in the data cache.
 For an unconnected lookup, only the "return" port is included in the data cache.
Index cache:
 Only lookup keys are included in the index cache.
Tuning:
 SQL override.
 Persistent cache (if the lookup data is static).
 Optimize the sort:
1. Default: lookup keys, then connected output ports in port order.
2. The sort can be commented out or overridden in the SQL override.
3. The indexing strategy on the table may impact performance.
 Build lookup caches concurrently:
1. May improve session performance when there is significant activity upstream from the lookup and the lookup cache is large.
2. This option applies to the individual session.
The Integration Service builds lookup caches at the beginning of the session run, even if no row has entered a Lookup transformation.
Lookup caches manual tuning:
3.3.2.4 Rank Caches
 Index cache contains group keys
 Data cache contains non-group-by ports
 Cache sizes related to the number of groups & the number of ranks.
3.3.2.5 Sorter Cache
 Sorter transformation:
1. May be faster than a DB sort or a 3rd-party sorter.
2. An index read from the RDB = pre-sorted data.
3. SQL SELECT DISTINCT may reduce the volume of data across the network versus a Sorter with the 'Distinct' property set.
 Single cache (no separation of index and data).
If a cache setting is too small, the DTM writes overflow to disk. To determine if transformation caches are overflowing:
 Watch the cache directory on the file system while the session runs.
 Use the session performance counters.
Options to tune:
 Increase the maximum memory allowed for Auto transformation cache sizes.
 Set the cache sizes for individual transformations manually.
3.3.2.6 Session Performance Counters
All transformations have counters. The Integration Service tracks the number of input rows,
output rows, and error rows for each transformation. Some transformations have performance
counters.
 Errorrows
 Readfromcache and Writetocache
 Readfromdisk and Writetodisk
 Rowsinlookupcache
We can collect session performance details by enabling the 'Collect performance data' and 'Write performance data to repository' session properties.
The image below shows how to find the performance counters for an Aggregator transformation.
Options to tune:
 Non-zero counts for readfromdisk and writetodisk indicate sub-optimal settings for the transformation index or data caches; this may indicate the need to tune the transformation caches manually.
 Any manual setting allocates memory outside of the previously set maximum.
 The cache calculators provide guidance for manual tuning of transformation caches.
3.4 Session Tuning
3.4.1 Sequential & Concurrent Batch
Sequential batch: runs sessions one by one.
Concurrent batch: runs sessions simultaneously.
Advantage of a concurrent batch:
It uses Informatica server resources in parallel and reduces the time it would take to run the sessions separately. Use this feature when multiple sources process large amounts of data in one session: split the session and put the parts into one concurrent batch to complete the work quickly.
Disadvantage of a concurrent batch:
It requires more shared memory; otherwise, sessions may fail.
3.4.2 Run Session on Grid
A grid is an alias assigned to a group of nodes that allows you to automate the distribution of
workflows and sessions across nodes.
 Balances the Integration Service workload.
 Processes concurrent sessions faster.
 Processes partitions faster.
The Integration Service requires CPU resources for parsing input data and formatting the output
data. A grid can improve performance when you have a performance bottleneck in the extract
and load steps of a session.
Running a session on a grid can improve throughput because the grid provides more resources
to run the session.
When you run multiple sessions on a grid, session subtasks share node resources with subtasks
of other concurrent sessions.
3.4.3 Session Partitioning
Session partitioning can increase session performance significantly. Informatica supports the following partition types, which you can define in the Workflow Manager:
1. Round-Robin
2. Database Partitioning
3. Pass-through
4. Hash Key
5. Key Range
1) Database Partitioning
The Integration Service queries the IBM DB2 or Oracle system for table partition information. It reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multi-node table space, and with DB2 targets.
2) Hash Partitioning
Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.
Hash Auto Keys:
The DTM applies a hash function to a partition key to group data among partitions.
Use hash partitioning to ensure that groups of rows are processed in the same
partition.
Hash User Keys:
This is similar to hash auto keys except the user specifies which ports make up the
partition key.
3) Key Range
You specify one or more ports to form a compound partition key. The Integration Service passes
data to each partition depending on the ranges you specify for each port. Use key range
partitioning where the sources or targets in the pipeline are partitioned by key range.
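Conceptually, each key range partition reads its own slice of the source. A hedged sketch, assuming a hypothetical ORDERS source partitioned on CUST_ID into two ranges (the exact SQL the Integration Service issues may differ):

-- Partition 1 reads:
SELECT * FROM ORDERS WHERE CUST_ID >= 1 AND CUST_ID < 50000;
-- Partition 2 reads:
SELECT * FROM ORDERS WHERE CUST_ID >= 50000 AND CUST_ID < 100000;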
4) Pass-through
The Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance but do not want to change the distribution of data across partitions.
5) Round-Robin
The Integration Service distributes data evenly among all partitions. Use round-robin partitioning
where you want each partition to process approximately the same number of rows.
3.4.4 Pushdown Optimization
Pushdown optimization, a concept introduced in Informatica PowerCenter, allows developers to balance the data transformation load among servers. It is a way of load-balancing among servers in order to achieve optimal performance. Suppose some ETL logic needs to filter out data based on a condition: one can either do it in the database, using a WHERE condition in the SQL query, or inside Informatica, using a Filter transformation. Sometimes we can even "push" some transformation logic to the target database instead of doing it on the source side.
How does Pushdown Optimization work?
One can push transformation logic to the source or target database using pushdown
optimization. The Integration Service translates the transformation logic into SQL queries and
sends the SQL queries to the source or the target database which executes the SQL queries to
process the transformations. The amount of transformation logic one can push to the database
depends on the database, transformation logic, and mapping and session configuration. The
Integration Service analyzes the transformation logic it can push to the database and executes
the SQL statement generated against the source or target tables, and it processes any
transformation logic that it cannot push to the database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that
the Integration Service can push to the source or target database. You can also use the
Pushdown Optimization Viewer to view the messages related to pushdown optimization.
For example, suppose a mapping contains a Filter transformation that filters out all employees except those with a DEPTNO greater than 40 (filter condition: DEPTNO > 40). The Integration Service can push the transformation logic to the database, generating the following SQL statement to process it:
INSERT INTO EMP_TGT (EMPNO, ENAME, SAL, COMM, DEPTNO)
SELECT EMP_SRC.EMPNO,
       EMP_SRC.ENAME,
       EMP_SRC.SAL,
       EMP_SRC.COMM,
       EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO > 40)
The Integration Service generates an INSERT SELECT statement and it filters the data using a
WHERE clause. The Integration Service does not extract data from the database at this time.
We can configure pushdown optimization in the following ways:
Using source-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the source
database. The Integration Service analyzes the mapping from the source to the target or until it
reaches a downstream transformation it cannot push to the source database and executes the
corresponding SELECT statement.
Using target-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the target database.
The Integration Service analyzes the mapping from the target to the source or until it reaches an
upstream transformation it cannot push to the target database. It generates an INSERT,
DELETE, or UPDATE statement based on the transformation logic for each transformation it can
push to the database and executes the DML.
Using full pushdown optimization:
The Integration Service pushes as much transformation logic as possible to both source and
target databases. If you configure a session for full pushdown optimization, and the Integration
Service cannot push all the transformation logic to the database, it performs source-side or
target-side pushdown optimization instead. Also the source and target must be on the same
database. The Integration Service analyzes the mapping starting with the source and analyzes
each transformation in the pipeline until it analyzes the target. When it can push all
transformation logic to the database, it generates an INSERT SELECT statement to run on the
database. The statement incorporates transformation logic from all the transformations in the
mapping. If the Integration Service can push only part of the transformation logic to the database, it does not fail the session; it pushes as much transformation logic as possible to the source and target databases and then processes the remaining transformation logic itself.
For example, a mapping contains the following transformations:
SourceDefn -> SourceQualifier -> Aggregator -> Rank -> Expression -> TargetDefn
Here the Aggregator computes SUM(SAL) and SUM(COMM) grouped by DEPTNO, the Rank transformation ranks on SAL, and the Expression computes TOTAL = SAL + COMM.
 The Rank transformation cannot be pushed to the database. If the session is configured for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source, processes the Rank transformation itself, and pushes the Expression transformation and target to the target database. A sketch of the source-side SQL follows.
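As a hedged illustration, the source-side statement the Integration Service might generate for the Source Qualifier and Aggregator in this example, reusing the EMP_SRC table from the earlier pushdown example (the exact generated statement may differ):

SELECT DEPTNO,
       SUM(SAL)  AS SUM_SAL,    -- Aggregator logic pushed to the source database
       SUM(COMM) AS SUM_COMM
FROM EMP_SRC
GROUP BY DEPTNO;                -- the Rank logic stays in the Integration Service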
When we use pushdown optimization, the Integration Service converts the expression in the
transformation or in the workflow link by determining equivalent operators, variables, and
functions in the database. If there is no equivalent operator, variable, or function, the Integration
Service itself processes the transformation logic. The Integration Service logs a message in the
workflow log and the Pushdown Optimization Viewer when it cannot push an expression to the
database. Use the message to determine the reason why it could not push the expression to the
database.
3.4.5 Identifying & Eliminating Bottlenecks
Depending upon which thread is busy, we can find the bottleneck. The first challenge is to identify it:
 Target
 Source
 Transformations/Mapping
 Session
Tuning the most severe bottleneck may reveal another one.
3.4.5.1 Target Bottleneck
The most common performance bottleneck occurs when the Integration Service writes to a
target database. Small checkpoint intervals, small database network packet sizes, or problems
during heavy loading operations can cause target bottlenecks.
To identify a target bottleneck:
Configure a copy of the session to write to a flat file target. If the session performance increases
significantly, you have a target bottleneck. If a session already writes to a flat file target, you
probably do not have a target bottleneck.
Eliminating Target Bottlenecks
 Drop indexes and key constraints before the load (a sketch follows this list).
 Perform bulk loading (bypasses the database log).
 Increase the database network packet size.
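As a hedged sketch of the first two points (the index and table names are hypothetical):

DROP INDEX EMP_TGT_EMPNO_IDX;                        -- drop before the load
-- ... run the session with bulk loading enabled ...
CREATE INDEX EMP_TGT_EMPNO_IDX ON EMP_TGT (EMPNO);   -- rebuild after the load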
3.4.5.2 Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a source database.
Inefficient query or small database network packet sizes can cause source bottlenecks.
Identifying Source Bottlenecks
If the session reads from a relational source, use the following methods to identify source bottlenecks:
 Filter transformation
 Read test mapping
 Database query
Using a Filter Transformation
 Add a Filter transformation after each Source Qualifier.
 Set the filter condition to false so that no data is processed past the Filter transformation.
 If the time it takes to run the new session remains about the same, you have a source bottleneck.
Using a Read Test Mapping
 Make a copy of the original mapping.
 In the copied mapping, keep only the sources, Source Qualifiers, and any custom joins or queries.
 Remove all other transformations.
 Connect the Source Qualifiers to a file target.
Using a Database Query
 Copy the read query directly from the session log and execute it against the source database with a query tool.
 Measure the query execution time and the time it takes for the query to return the first row (one way to take these measurements is sketched below).
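One way to take these measurements, assuming an Oracle source queried from SQL*Plus (other client tools such as TOAD or Teradata SQL Assistant have their own timers):

SET TIMING ON   -- SQL*Plus timer; reports elapsed time after each statement
-- paste the read query copied from the session log here, for example:
SELECT EMPNO, ENAME, SAL, COMM, DEPTNO FROM EMP_SRC WHERE DEPTNO > 40;

Compare the elapsed time and the time-to-first-row with the reader thread time in the session log.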
Eliminating Source Bottlenecks
 Have the database administrator optimize database performance by optimizing the query.
 Set the number of bytes the Integration Service reads per line if it reads from a flat file source.
 Increase the database network packet size.
 Configure index and key constraints wherever necessary.
3.4.5.3 Transformations/Mapping Bottlenecks
If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.
Identifying Mapping Bottlenecks
 Read the thread statistics and work time statistics in the session log. When the Integration Service spends more time on the transformation thread than on the writer or reader threads, you have a transformation bottleneck. When the Integration Service spends more time on one transformation, that transformation is the bottleneck in the transformation thread.
 Analyze the performance counters. High errorrows and rowsinlookupcache counters indicate a mapping bottleneck.
 Add a Filter transformation before each target definition and set the filter condition to false so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, you have a mapping bottleneck.
Eliminating Mapping Bottlenecks
To eliminate mapping bottlenecks, optimize transformation settings in mappings.
3.4.5.4 Session Bottlenecks
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.
Identifying Session Bottlenecks
To identify a session bottleneck, analyze the performance details. Performance details display
information about each transformation, such as the number of input rows, output rows, and error
rows.
Eliminating Session Bottlenecks
To eliminate session bottlenecks, optimize the session.