Performance Tuning & Best Practices for Informatica Developer, Version 1
Informatica 8.6 Performance Tuning & Best Practices on PowerCenter 8.6

Table of Contents
1. Introduction
2. Transformations & Best Practices
   1) Source Qualifier
   2) Expression
   3) Lookup
   4) Sequence Generator
   5) Aggregator
   6) Filter
   7) Router
   8) Joiner
   9) Normalizer
   10) Sorter
   11) Rank
   12) Update Strategy
   13) External Procedure
   14) Stored Procedure
   15) XML Source Qualifier
   16) Union
3. Performance Tuning
   o Source Query Tuning
   o Transformation Tuning
   o Memory Optimization
   o Session Tuning
   o Session Partitioning (Database Partitioning, Hash Partitioning, Key Range, Pass-through, Round-Robin)
   o Pushdown Optimization
   o Identifying & Eliminating Bottlenecks (Target, Source, Transformation/Mapping, and Session Bottlenecks)

1. Introduction
This document gives a brief overview of performance tuning and of how to optimize Informatica performance in a real-time environment. It covers the most frequently used transformations and the best practices that help optimize their performance, and it also touches on related areas such as SQL/database tuning, debugging techniques, parallel processing, and pushdown optimization. The primary objective of this document is to help Informatica developers optimize performance when dealing with different scenarios in Informatica.

2. Transformations
Transformations are the Informatica repository objects used to build the business logic according to which the ETL is performed. Below is the list of frequently used Informatica transformations.
1) Source Qualifier
2) Expression
3) Lookup
4) Sequence Generator
5) Aggregator
6) Filter
7) Router
8) Joiner
9) Normalizer
10) Sorter
11) Rank
12) Update Strategy
13) External Procedure
14) Stored Procedure
15) XML Source Qualifier
16) Union

2.1 Source Qualifier
The Source Qualifier transformation is any data's first entry point into a mapping. It is used to perform the following tasks:
- Join data originating from the same source database.
- Filter records when the Informatica Server reads source data (i.e., a SQL WHERE condition).
- Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.
- Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
- Create a custom query (SQL override) to issue a special SELECT statement for the Informatica Server to read source data.

(Figure: joining two tables with one Source Qualifier transformation.)

Best Practices
- Only use SQL overrides if there is a substantial performance gain or decrease in complexity. SQL overrides must be maintained manually, and any change to the data structure will require rewriting or modifying the override.
- Use the WHERE condition and sorted ports in the Source Qualifier where possible, rather than adding a Filter or Sorter transformation.
- Delete unused ports and only connect what is used. Reducing the number of records moved through the mapping improves performance by minimizing the amount of data moved.
- Tune Source Qualifier queries to return only the data you need.
- Perform large lookups in the Source Qualifier instead of through a traditional Lookup transformation.
- When applicable, generate the default SQL in the Source Qualifier and use the 'Validate' option to verify that the resulting SQL is valid.
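As a rough illustration of pushing filtering and sorting into the source query rather than doing it in the mapping, the following Python sketch uses an in-memory SQLite database; the table, columns, and data are invented for the example and are not from this document.

```python
import sqlite3

# In-memory stand-in for a source database (table name and rows are
# illustrative assumptions, not part of the original document).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "OPEN", 100.0), (2, "CLOSED", 50.0),
                  (3, "OPEN", 75.0), (4, "CANCELLED", 20.0)])

# Equivalent of a Source Qualifier with a WHERE filter and sorted ports:
# the database does the filtering and sorting, so fewer rows enter the mapping.
rows = conn.execute(
    "SELECT order_id, amount FROM orders "
    "WHERE status = 'OPEN' "   # source filter instead of a Filter transformation
    "ORDER BY order_id"        # sorted ports instead of a Sorter transformation
).fetchall()

print(rows)  # only the two OPEN orders, already sorted
```

The point of the sketch is that the pipeline never sees the CLOSED and CANCELLED rows at all, which is the same effect the WHERE-condition best practice above aims for.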
2.2 Expression
The Expression transformation is a passive transformation used to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculation, and also to test conditional statements before you output the results to target tables or other transformations. Local variables can be used in Expression transformations and greatly enhance the capabilities of this transformation object.

Best Practices
- Calculate once, use many times. Avoid calculating or testing the same value over and over: calculate it once in an expression and set a true/false flag. Within an expression, use variables to hold a value that is used several times.
- Create an anchor Expression transformation that maps the source table to an intermediary transformation using the source column names, and do simple processing (LTRIM/RTRIM, string/numeric conversions, testing for NULL, etc.) in it. This enables an easier transition if the source table changes in the future.
- Watch your data types. The engine automatically converts compatible types, but sometimes the conversion is excessive and happens in every transformation, which slows the mapping.
- Expression transformation names should begin with "EXP" followed by descriptive words.
- Do not propagate ports out of an Expression transformation if they are not used further on in the mapping.
- Group input/output ports first, followed by variable ports and then output ports. Incorrectly ordering the ports in an Expression transformation can lead to errors and/or inaccurate results.
- If a reusable expression is being used to perform common calculations, consider using User-Defined Functions (UDFs).
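The "calculate once, use many times" guideline can be sketched outside Informatica as plain Python; the row fields and the validity rule below are illustrative assumptions, with the one-time flag playing the role of a variable port.

```python
# Sketch of "calculate once, use many times": compute a flag once per row
# (like a variable port) instead of re-evaluating the same test in every
# output expression. Field names and the rule itself are made up.
rows = [
    {"first": "Ada", "last": "Lovelace", "salary": "1000"},
    {"first": "Alan", "last": "Turing", "salary": ""},
]

out = []
for row in rows:
    # Variable-port analogue: evaluate the repeated test ONCE per row.
    has_salary = row["salary"].strip() != ""
    out.append({
        "full_name": row["first"] + " " + row["last"],
        # Both outputs reuse the flag instead of re-testing the string.
        "salary": float(row["salary"]) if has_salary else 0.0,
        "salary_missing": not has_salary,
    })

print(out[1])  # {'full_name': 'Alan Turing', 'salary': 0.0, 'salary_missing': True}
```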
UDFs were introduced in PowerCenter 8.x.

2.3 Lookup Transformation
The Lookup transformation is used to look up data in a relational table, view, synonym, or flat file. When a lookup is used, the Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition, and the result of the lookup can be passed to other transformations and to the target.

One common error encountered when using the Lookup transformation involves using the Informatica $Source and $Target variables for the relational connection needed for the lookup. When a bulk loader is used, the $Target variable is not valid and must be replaced with the proper connection. Likewise, care should be taken when using the $Source variable to ensure that the proper database is being queried. Lookups are dealt with in greater detail below.

Best Practices
- Avoid using the $Source and $Target variables in the lookup connection information. Connection names have been set up to be generic across Production and Test; if possible, set the connection information in the Lookup transformation to one of these non-level-specific connection names. Set the connections in the session for ease of migration.
- Do not include any more ports in the lookup than necessary. Reducing the amount of data processed provides better performance.
- Avoid date/time comparisons in lookups; replace them with string comparisons.
- Not all sources and targets treat strings with leading or trailing blanks the same. It may be necessary to RTRIM and LTRIM string data before using it in a lookup.
- Lookups on small tables (fewer than 10,000 records) can be cached and use ODBC. Lookups on large tables should be cached.
- As a general practice, do not use uncached lookups.
- In place of lookups, tables can be joined in the Source Qualifier.
However, this often necessitates left joins, which can complicate Source Qualifiers; weigh performance against ease of maintenance when deciding between Source Qualifier joins and lookups.
- When you create more than one lookup condition, place the conditions with an equal sign (=) first in order to optimize lookup performance.
- Where the lookup data does not change frequently, consider using a persistent cache to improve performance; for example, when validating state codes for the United States of America.

2.4 Sequence Generator
The Sequence Generator transformation generates numeric values and is used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. It contains two output ports that you can connect to one or more transformations. When NEXTVAL is connected to the input port of another transformation, the Integration Service generates a sequence of numbers. When CURRVAL is connected to the input port of another transformation, the Integration Service generates the NEXTVAL value plus the Increment By value. If you connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row. The Sequence Generator transformation is unique among all transformations in that you cannot add, edit, or delete its default ports, NEXTVAL and CURRVAL.

Best Practices
- Use a reusable Sequence Generator rather than separate Sequence Generators when generating unique primary key values.

2.5 Aggregator
The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is an active transformation, which means that it can change the number of rows that pass through it, in contrast to the passive Expression transformation. Additionally, Informatica allows for incremental aggregation.
When this feature is used, the repository stores aggregate values so that the target table does not need to be queried during a mapping run.

Best Practices
- Factor out aggregate function calls where possible: SUM(A) + SUM(B) can become SUM(A + B), so the server only searches through and groups the data once.
- Do not use Aggregators for simple sorting; use the Sorter transformation or the sorted-ports option of the Source Qualifier.
- Minimize aggregate function calls by using "group by".
- Place Aggregators as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
- Wherever possible, sort the data coming into an Aggregator and use the 'Sorted Input' option to improve performance.

2.6 Filter Transformation
The Filter transformation allows you to filter rows in a mapping. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.

Best Practices
- Place Filters as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.
- Use a Filter to screen rows that would be rejected by an Update Strategy. (Rejected rows from an Update Strategy are logged to the bad file, decreasing performance.)
- If you have an Aggregator transformation in the mapping, filter before the aggregation to avoid aggregating unnecessary rows.
- If you need to test the same input data against multiple conditions, consider using a Router transformation instead of creating multiple Filter transformations. With a Router transformation, the Integration Service processes the incoming data only once; with multiple Filter transformations, it processes the incoming data once per transformation.
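The reason a Router beats several Filters can be sketched as a single pass over the rows versus one pass per condition; the row shape and the conditions below are illustrative assumptions.

```python
# Sketch of Router vs. multiple Filters: the router evaluates each row once
# and sends it to the first matching group, while separate filters each
# re-scan the whole input. Row values and thresholds are made up.
rows = [{"amount": a} for a in (5, 50, 500, 5000)]

# Router-style: one pass, one test chain per row, plus a default group.
groups = {"small": [], "medium": [], "default": []}
for row in rows:
    if row["amount"] < 100:
        groups["small"].append(row)
    elif row["amount"] < 1000:
        groups["medium"].append(row)
    else:
        groups["default"].append(row)  # rows matching no condition

# Filter-style: reproducing just ONE of the groups already costs a full pass.
small_again = [r for r in rows if r["amount"] < 100]

print(len(groups["small"]), len(groups["medium"]), len(groups["default"]))  # 2 1 1
```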
2.7 Router Transformation
A Router transformation is used to conditionally test data and route records based on that conditional test. It is similar to a Filter transformation in that both allow you to use a condition to test data, but a Filter transformation tests data for one condition and drops the rows that do not meet it, whereas a Router transformation tests data for one or more conditions and gives you the option to route the rows that meet none of the conditions to a default output group. Additionally, a Router transformation allows the programmer to test the same input data for multiple conditions; multiple Filter transformations would be needed to accomplish the same functionality. If multiple routing conditions are needed, the Router transformation should be preferred over multiple Filters, as it is more readable and more efficient, since each row need only be tested once.

Best Practices
- Routers may not be the best choice if the load order of the target(s) is important, since it is not possible to control the load order of the legs coming out of a router.
- The target load method(s) must be carefully chosen when using routers, especially if the data is loading to the same target, in order to avoid table locks and to ensure that the data is loaded in the correct order.

2.8 Joiner Transformation
The Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. It can also join two tables from the same source; this is generally only done when trying to avoid outer joins in the Source Qualifiers. The two input pipelines comprise a master pipeline and a detail pipeline (or a master and a detail branch). The master pipeline ends at the Joiner transformation, while the detail pipeline continues on to the target. One common point of confusion concerns the join types available.
The following table summarizes the join types available and their associated behavior:

- Normal Join: The Informatica Server discards all rows of data from the master and detail sources that do not match, based on the condition.
- Master Outer Join: The Informatica Server keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source. Null values are inserted in the data stream where needed.
- Detail Outer Join: The Informatica Server keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source. Null values are inserted in the data stream where needed.
- Full Outer Join: The Informatica Server keeps all rows of data from both the master and detail sources. Null values are inserted in the data stream where needed.

Best Practices
- Whenever possible, perform joins in the database, i.e., in the Source Qualifier itself.
- Whenever possible, sort the data coming into a Joiner transformation and use the 'Sorted Input' option to improve performance.
- To improve the performance of an unsorted Joiner transformation, designate the source with fewer rows as the master.

2.9 Normalizer Transformation
The Normalizer transformation normalizes records from COBOL and other sources, allowing you to organize data in different formats. It is used primarily with COBOL sources, which are often stored in a denormalized format; the OCCURS statement in a COBOL file nests multiple records of information in a single record. Using the Normalizer transformation, you break out repeated data within a record into separate records. The Normalizer transformation can also be used with relational sources to create multiple rows from a single row of data. There are two types:
- VSAM Normalizer transformation:
A non-reusable transformation that is a Source Qualifier transformation for a COBOL source.
- Pipeline Normalizer transformation: A transformation that processes multiple-occurring data from relational tables or flat files.

For example, you might have a relational table that stores four quarters of sales by store, and you need to create a row for each sales occurrence. You can configure a Normalizer transformation to return a separate row for each quarter. The following source rows contain four quarters of sales by store:

Store1 100 300 500 700
Store2 250 450 650 850

The Normalizer returns a row for each store and sales combination, along with an index that identifies the quarter number:

Store1 100 1
Store1 300 2
Store1 500 3
Store1 700 4
Store2 250 1
Store2 450 2
Store2 650 3
Store2 850 4

2.10 Sorter Transformation
The Sorter transformation is used to sort data. It can sort data from a source transformation in ascending or descending order according to a specified sort key, and can be configured for case-sensitive sorting and for whether the output rows should be distinct.

Best Practices
- Whenever possible, sort the source data in the database, i.e., in the Source Qualifier itself.
- The default Sorter cache size is 8 MB. If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. For best performance, configure a Sorter cache size less than or equal to the amount of physical RAM available on the Integration Service machine.

2.11 Rank Transformation
The Rank transformation allows you to select only the top or bottom rank of data. You can use a Rank transformation to return the largest or smallest numeric value in a port or group, or to return the strings at the top or bottom of the session sort order.
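The top-or-bottom-N selection described above can be sketched with Python's heapq; the salesperson rows and the choice of N are illustrative assumptions, not data from this document.

```python
import heapq

# Sketch of a Rank transformation: pick the top N rows by a numeric port.
# The data and the choice of N = 2 are illustrative.
sales = [
    {"rep": "Kim", "total": 900},
    {"rep": "Raj", "total": 1500},
    {"rep": "Ana", "total": 700},
    {"rep": "Lee", "total": 1200},
]

# Top 2 by total, like keeping only the two best-ranked rows.
top2 = heapq.nlargest(2, sales, key=lambda r: r["total"])
print([r["rep"] for r in top2])  # ['Raj', 'Lee']
```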
The Rank transformation differs from the MAX and MIN transformation functions in that it lets you select a group of top or bottom values, not just one value. For example, you can use Rank to select the top 10 salespersons in a given territory, or, to generate a financial report, to identify the three departments with the lowest expenses in salaries and overhead.

Rank Caches
During a session, the Integration Service compares an input row with rows in the data cache. If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If you configure the Rank transformation to rank across multiple groups, the Integration Service ranks incrementally for each group it finds. The Integration Service stores group information in an index cache and row data in a data cache. If you create multiple partitions in a pipeline, the Integration Service creates separate caches for each partition.

2.12 Update Strategy
The Update Strategy transformation is used to flag rows for insert, delete, update, or reject. The Update Strategy transformation can check data conditions and use its findings to issue the proper SQL statements to the target database. It is very useful for implementing a Slowly Changing Dimension load strategy, where incoming records must be flagged for INSERT, UPDATE, etc. Records rejected by the transformation can be captured by enabling the 'Forward Rejected Rows' option.

Operation   Constant    Numeric Value
Insert      DD_INSERT   0
Update      DD_UPDATE   1
Delete      DD_DELETE   2
Reject      DD_REJECT   3

Best Practices
- Do not code an update strategy when all rows to the target are inserts.
- Do include an Update Strategy when all rows to the target are updates, unless a proof of concept shows that it degrades performance; this adds clarity to the mapping for future developers.
- Rejected rows from an Update Strategy are logged to the bad file.
Consider filtering such rows out instead if retaining them is not critical, because logging them causes a performance hit.
- Avoid loading to the same target from different data flows where possible.
- Use conditional logic in an Update Strategy as necessary.

2.13 External Procedure
External Procedure transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality. Although the standard transformations provide a wide range of options, there are occasions when you might want to extend the functionality provided with PowerCenter; for example, the range of standard transformations, such as the Expression and Filter transformations, may not provide the functionality you need. If you are an experienced programmer, you may want to develop complex functions within a dynamic link library (DLL) or UNIX shared library instead of creating the necessary Expression transformations in a mapping.

To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. Using TX, you can create an Informatica External Procedure transformation and bind it to an external procedure that you have developed. You can bind External Procedure transformations to two kinds of external procedures:
- COM external procedures (available on Windows only)
- Informatica external procedures (available on Windows, AIX, HP-UX, Linux, and Solaris)

To use TX, you must be an experienced C, C++, or Visual Basic programmer. Use multi-threaded code in external procedures.
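Conceptually, binding to an external procedure resembles loading a shared library and declaring a function's signature before calling it. The following Python ctypes sketch does this with the C math library; it assumes a POSIX system where libm can be located, and it is only an analogy for the idea of binding, not the PowerCenter TX API itself.

```python
import ctypes
import ctypes.util

# Locate and load an external shared library (assumes a POSIX system;
# "libm.so.6" is a glibc-specific fallback, not from the document).
libm_path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_path)

# Declare the external procedure's signature before calling it,
# much as TX requires the procedure's interface to be defined.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0
```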
2.14 Stored Procedure
A Stored Procedure transformation is used to call a stored procedure on a relational database. Stored procedures must exist in the database before you create a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the server. A Stored Procedure transformation can run in the following modes:
- Normal: The stored procedure runs where the transformation exists in the mapping, on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.
- Pre-load of the Source: The stored procedure runs before the session retrieves data from the source. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
- Post-load of the Source: The stored procedure runs after the session retrieves data from the source. This is useful for removing temporary tables.
- Pre-load of the Target: The stored procedure runs before the session sends data to the target. This is useful for verifying target tables or disk space on the target system.
- Post-load of the Target: The stored procedure runs after the session sends data to the target. This is useful for re-creating indexes on the database.

2.15 XML Source Qualifier
The XML Source Qualifier represents the data elements that the Informatica Server reads when it executes a session with XML sources. Each group in an XML source definition is analogous to a relational table, and the Designer treats each group within the XML Source Qualifier transformation as a separate source of data. You can link ports from one group in an XML Source Qualifier transformation to ports in one input group of another transformation. You can copy the columns of several groups
to one transformation, but you can link the ports of only one group to the corresponding ports in that transformation. You can, however, link multiple groups from an XML Source Qualifier transformation to different input groups in most multiple-input-group transformations, such as the Joiner or Custom transformations.

2.16 Union Transformation
The Union transformation is a multiple-input-group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources in the same way that the UNION ALL SQL statement combines the results of two or more SELECT statements; like UNION ALL, the Union transformation does not remove duplicate rows. The Union transformation is non-blocking. You can connect the input groups to different branches in a single pipeline or to different source pipelines. When you add a Union transformation to a mapping, you must verify that you connect the same ports in all input groups; if you connect all ports in one input group but do not connect a port in another input group, the Integration Service passes NULLs to the unconnected port.

(Figure: a mapping with a Union transformation.)

Best Practices
- All input groups and the output group must have matching ports; the precision, data type, and scale must be identical across all groups.
- To remove duplicate rows, you must add another transformation downstream, such as a Router or Filter transformation.
- You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
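The UNION ALL semantics described above can be sketched as a simple concatenation of row streams; the branch rows below are illustrative assumptions.

```python
import itertools

# Sketch of Union-transformation semantics: merge two pipeline branches the
# way UNION ALL does, keeping duplicates. The row values are illustrative.
branch_a = [("Store1", 100), ("Store2", 250)]
branch_b = [("Store2", 250), ("Store3", 400)]   # one row duplicates branch_a

merged = list(itertools.chain(branch_a, branch_b))
print(len(merged))  # 4 -- the duplicate ("Store2", 250) row is retained

# Removing duplicates takes an extra downstream step, akin to adding
# another transformation after the Union:
distinct = list(dict.fromkeys(merged))
print(len(distinct))  # 3
```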
3. Performance Tuning
To extract, transform, and load the required valid data in a timely manner, we have to optimize performance in the following areas:
- SQL/database tuning
- Transformation tuning
- Session tuning
- Identifying and eliminating bottlenecks

3.1 Source Query Tuning
- Optimize your database query by running it directly on the database from the database tool provided by the client, such as TOAD or Teradata SQL Assistant.
- Run the query generated by the reader thread in the session log directly on the database and check the result.
- Restructure the statement according to how the business needs the data to be loaded into the target table.
- Minimize the use of DISTINCT; DISTINCT always creates a sort.
- Use ORDER BY and GROUP BY clauses only where necessary. Queries that contain ORDER BY or GROUP BY clauses may benefit from an index on the ORDER BY or GROUP BY columns.
- Use a conditional filter to remove unnecessary data.

3.2 Transformation Tuning
Follow the best practices below to minimize transformation errors and overhead.

3.2.1 Expression Transformation
Simplify nested functions when possible. Instead of:
IIF(condition1, result1, IIF(condition2, result2, IIF(...)))
try:
DECODE(TRUE, condition1, result1, ..., conditionN, resultN)

3.2.2 Optimizing Aggregator Transformations
- Group by simple columns.
- Use sorted input.
- Use incremental aggregation.
- Filter data before you aggregate it.
- Limit port connections.

3.2.3 Optimizing Joiner Transformations
- Designate the master source as the source with fewer duplicate key values.
- Designate the master source as the source with fewer rows.
- Join sorted data when possible.
- Perform joins in a database when possible, either by creating a pre-session stored procedure to join the tables in the database, or by using the Source Qualifier transformation to perform the join.

3.2.4 Optimizing Lookup Transformations
- Use the optimal database driver.
- Cache lookup tables.
- Optimize the lookup condition.
- Filter lookup rows.
- Index the lookup table.
- Optimize multiple lookups.
- Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup source.

3.2.5 Optimizing Sequence Generator Transformations
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property.

3.2.6 Optimizing Sorter Transformations
- Allocate enough memory to sort the data.
- Specify a different work directory for each partition in the Sorter transformation.

3.2.7 Optimizing Source Qualifier Transformations
Use the Select Distinct option in the Source Qualifier transformation if you want the Integration Service to select unique values from a source. Select Distinct filters unnecessary data earlier in the data flow, which can improve performance.

3.3 Memory Optimization

3.3.1 Tuning the DTM Buffer
The DTM buffer is a temporary storage area for data and is divided into blocks. Both the buffer size and the block size are tunable; the default setting for each is Auto, which means the DTM estimates the optimal size. To tune the DTM buffer:
- Determine the minimum DTM buffer size: (DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9
- Increase the buffer size by a multiple of the block size.
- If performance does not improve, return to the previous setting.
- There is no universal formula for the optimal DTM buffer size; the Auto setting may be adequate for some sessions.

3.3.2 Transformation Caches
Certain transformations use a temporary storage area (cache) while running; except for the Sorter cache, each cache is divided into a data cache and an index cache. The size of each transformation cache is tunable.
The default setting for each cache is Auto. Five transformations use caches while running: the Aggregator, Joiner, Lookup, Rank, and Sorter.

3.3.2.1 Aggregator Caches
- Unsorted input: the Aggregator must read all input before releasing any output rows. The index cache contains the group keys; the data cache contains the non-group-by ports.
- Sorted input: the Aggregator releases an output row as each input group is processed and requires no data or index cache (both are 0). It may run much faster than unsorted input, but you must consider the expense of sorting.
(Figure: manual tuning of Aggregator caches.)

3.3.2.2 Joiner Caches
- Unsorted input: all master data is loaded into the cache, so specify the smaller data set as the master. The index cache contains the join keys; the data cache contains the non-key connected outputs.
- Sorted input: both inputs must be sorted on the join keys; specify the data set with the fewest records under a single key as the master. The index cache contains up to 100 keys; the data cache contains the non-key connected outputs associated with those 100 keys.
(Figure: manual tuning of Joiner caches.)

3.3.2.3 Lookup Caches
- Data cache: only connected output ports are included in the data cache; for an unconnected lookup, only the 'return' port is included.
- Index cache: only the lookup keys are included in the index cache.
Tuning options:
1. Use a SQL override.
2. Use a persistent cache (if the lookup data is static).
3. Optimize the sort: by default the lookup query sorts by the lookup keys and then by the connected output ports in port order; this can be commented out or overridden in a SQL override. The indexing strategy on the table may also impact performance.
Lookup caches can be built concurrently:
1. This may improve session performance when there is significant activity upstream from the lookup and the lookup cache is large.
2.
This option applies to the individual session. Note that the Integration Service builds lookup caches at the beginning of the session run, even if no row has yet entered a Lookup transformation.
(Figure: manual tuning of Lookup caches.)

3.3.2.4 Rank Caches
The index cache contains the group keys; the data cache contains the non-group-by ports. The cache sizes are related to the number of groups and the number of ranks.

3.3.2.5 Sorter Cache
The Sorter transformation:
1. May be faster than a database sort or a third-party sorter.
2. An index read from the relational database provides pre-sorted data.
3. A SQL SELECT DISTINCT may reduce the volume of data sent across the network compared to a Sorter with the 'Distinct' property set.
The Sorter uses a single cache (no separation of index and data).

If a cache setting is too small, the DTM writes the overflow to disk. To determine whether transformation caches are overflowing:
- Watch the cache directory on the file system while the session runs.
- Use the session performance counters.
Options to tune:
- Increase the maximum memory allowed for Auto transformation cache sizes.
- Set the cache sizes for individual transformations manually.

3.3.2.6 Session Performance Counters
All transformations have counters; the Integration Service tracks the number of input rows, output rows, and error rows for each transformation. Some transformations also have performance counters, such as:
- Errorrows
- Readfromcache and Writetocache
- Readfromdisk and Writetodisk
- Rowsinlookupcache
Session performance details can be collected by enabling the 'Collect performance data' and 'Write performance data to repository' session options. (Figure: performance counters for an Aggregator transformation.)
Options to tune:
- Non-zero counts for Readfromdisk and Writetodisk indicate sub-optimal settings for the transformation index or data caches and may indicate the need to tune the transformation caches manually.
- Any manual setting allocates memory outside of the previously set maximum.
Cache Calculators provide guidance in the manual tuning of transformation caches.

3.4 Session Tuning

3.4.1 Sequential & Concurrent Batch

Sequential batch: runs the sessions one by one. Concurrent batch: runs the sessions simultaneously.

Advantage of a concurrent batch: it uses the Informatica server resources in parallel and reduces the time it would take to run the sessions separately. Use this feature when you have multiple sources that process large amounts of data in one session: split the session, put the pieces into one concurrent batch, and complete the work more quickly.

Disadvantage of a concurrent batch: it requires more shared memory; otherwise, sessions may fail.

3.4.2 Run Session on Grid

A grid is an alias assigned to a group of nodes that allows you to automate the distribution of workflows and sessions across nodes. Running on a grid:
- Balances the Integration Service workload.
- Processes concurrent sessions faster.
- Processes partitions faster.

The Integration Service requires CPU resources for parsing input data and formatting output data. A grid can improve performance when you have a performance bottleneck in the extract and load steps of a session. Running a session on a grid can improve throughput because the grid provides more resources to run the session. When you run multiple sessions on a grid, session subtasks share node resources with the subtasks of other concurrent sessions.

3.4.3 Session Partitioning

Session partitioning can increase session performance significantly. You can define the following partition types in the Workflow Manager:
1. Round-robin
2. Database partitioning
3. Pass-through
4. Hash key
5. Key range

1) Database Partitioning

The Integration Service queries the IBM DB2 or Oracle system for table partition information and reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multi-node table space.
Use database partitioning with DB2 targets as well.

2) Hash Partitioning

Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group; for example, when you need to sort items by item ID but do not know how many items have a particular ID number.
- Hash auto keys: the DTM applies a hash function to a partition key to group data among partitions. Use this to ensure that groups of rows are processed in the same partition.
- Hash user keys: similar to hash auto keys, except the user specifies which ports make up the partition key.

3) Key Range

You specify one or more ports to form a compound partition key, and the Integration Service passes data to each partition depending on the ranges you specify for each port. Use key-range partitioning where the sources or targets in the pipeline are partitioned by key range.

4) Pass-through

The Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance but do not want to change the distribution of data across partitions.

5) Round-Robin

The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.

3.4.4 Pushdown Optimization

Pushdown optimization, a newer concept in Informatica PowerCenter, allows developers to balance the data transformation load among servers: it is a way of load-balancing in order to achieve optimal performance. Suppose an ETL flow needs to filter out data based on some condition. One can either do it in the database by using a WHERE condition in the SQL query, or inside Informatica by using a Filter transformation.
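The trade-off just described, filtering in the database with a WHERE clause versus extracting everything and filtering inside the engine, can be sketched with Python's built-in sqlite3 module. The emp table, its columns, and its rows are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [(1, 10), (2, 20), (3, 50), (4, 60)])

# Pushdown-style: the database applies the WHERE clause,
# so only matching rows cross the wire.
pushed = con.execute("SELECT empno FROM emp WHERE deptno > 40").fetchall()

# Engine-style: all rows are extracted, then a filter drops the rest.
extracted = con.execute("SELECT empno, deptno FROM emp").fetchall()
filtered = [(empno,) for empno, deptno in extracted if deptno > 40]

assert pushed == filtered            # identical results either way
assert len(extracted) > len(pushed)  # but more rows moved without pushdown
```

The results are the same; the difference is how many rows leave the database, which is the cost pushdown optimization aims to avoid.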
Sometimes we can even "push" some transformation logic to the target database instead of doing it on the source side.

How does pushdown optimization work?

One can push transformation logic to the source or target database using pushdown optimization. The Integration Service translates the transformation logic into SQL queries and sends them to the source or target database, which executes them to process the transformations. The amount of transformation logic one can push to the database depends on the database, the transformation logic, and the mapping and session configuration. The Integration Service analyzes the transformation logic it can push to the database, executes the generated SQL against the source or target tables, and itself processes any transformation logic that it cannot push to the database.

Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database, and to view the messages related to pushdown optimization.

For example, suppose a mapping contains a Filter transformation that filters out all employees except those with a DEPTNO greater than 40; the filter condition used in the mapping is DEPTNO > 40. The Integration Service can push the transformation logic to the database and generates the following SQL statement to process it:

INSERT INTO EMP_TGT (EMPNO, ENAME, SAL, COMM, DEPTNO)
SELECT EMP_SRC.EMPNO, EMP_SRC.ENAME, EMP_SRC.SAL, EMP_SRC.COMM, EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO > 40)

The Integration Service generates an INSERT ... SELECT statement that filters the data using a WHERE clause; it does not extract data from the database at this time.
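The generated statement above can be exercised against a toy schema using Python's sqlite3 module. The EMP_SRC/EMP_TGT names come from the example; the column types and row contents are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP_SRC (EMPNO INT, ENAME TEXT, SAL REAL, COMM REAL, DEPTNO INT)")
con.execute("CREATE TABLE EMP_TGT (EMPNO INT, ENAME TEXT, SAL REAL, COMM REAL, DEPTNO INT)")
con.executemany("INSERT INTO EMP_SRC VALUES (?, ?, ?, ?, ?)",
                [(1, "KING", 5000, None, 10),
                 (2, "BLAKE", 2850, None, 30),
                 (3, "ADAMS", 1100, None, 50)])

# The statement the Integration Service would generate for this mapping:
con.execute("""
    INSERT INTO EMP_TGT (EMPNO, ENAME, SAL, COMM, DEPTNO)
    SELECT EMP_SRC.EMPNO, EMP_SRC.ENAME, EMP_SRC.SAL, EMP_SRC.COMM, EMP_SRC.DEPTNO
    FROM EMP_SRC
    WHERE (EMP_SRC.DEPTNO > 40)
""")

rows = con.execute("SELECT EMPNO, DEPTNO FROM EMP_TGT").fetchall()
# Only the DEPTNO 50 row survives the pushed-down filter.
assert rows == [(3, 50)]
```

The whole filter-and-load runs as one statement inside the database, which is exactly what "the Integration Service does not extract data" means here.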
We can configure pushdown optimization in the following ways:

Source-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the source database. It analyzes the mapping from the source toward the target until it reaches a downstream transformation it cannot push to the source database, and executes the corresponding SELECT statement.

Target-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the target database. It analyzes the mapping from the target toward the source until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push, and executes the DML.

Full pushdown optimization: the Integration Service pushes as much transformation logic as possible to both the source and target databases; the source and target must be on the same database. If you configure a session for full pushdown optimization and the Integration Service cannot push all the transformation logic to the database, it performs source-side or target-side pushdown optimization instead. The Integration Service analyzes the mapping starting at the source and analyzes each transformation in the pipeline until it analyzes the target. When it can push all transformation logic to the database, it generates an INSERT ... SELECT statement to run on the database that incorporates the transformation logic from all the transformations in the mapping. If the Integration Service can push only part of the transformation logic, it does not fail the session: it pushes as much transformation logic to the source and target databases as possible and then processes the remaining transformation logic itself.
For example, a mapping contains the following transformations:

SourceDefn -> SourceQualifier -> Aggregator -> Rank -> Expression -> TargetDefn

- Aggregator: SUM(SAL), SUM(COMM), group by DEPTNO
- Rank: RANK port on SAL
- Expression: TOTAL = SAL + COMM

The Rank transformation cannot be pushed to the database. If the session is configured for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source database, processes the Rank transformation itself, and pushes the Expression transformation and the target to the target database.

When we use pushdown optimization, the Integration Service converts the expression in the transformation or in the workflow link by determining equivalent operators, variables, and functions in the database. If there is no equivalent operator, variable, or function, the Integration Service itself processes the transformation logic. It logs a message in the workflow log and the Pushdown Optimization Viewer when it cannot push an expression to the database; use the message to determine why the expression could not be pushed.

3.4.5 Identifying & Eliminating Bottlenecks

Depending upon which thread is busy, we can find the bottlenecks. The first challenge is to identify the bottleneck:
- Target
- Source
- Transformations/Mapping
- Session

Tuning the most severe bottleneck may reveal another one.

3.4.5.1 Target Bottleneck

The most common performance bottleneck occurs when the Integration Service writes to a target database. Small checkpoint intervals, small database network packet sizes, or problems during heavy loading operations can cause target bottlenecks.

To identify a target bottleneck, configure a copy of the session to write to a flat file target. If the session performance increases significantly, you have a target bottleneck.
If a session already writes to a flat file target, you probably do not have a target bottleneck.

Eliminating target bottlenecks:
- Drop indexes and key constraints.
- Perform bulk loading (ignores the DB log).
- Increase the database network packet size.

3.4.5.2 Source Bottlenecks

Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query or a small database network packet size can cause source bottlenecks.

Identifying source bottlenecks: if the session reads from a relational source, use the following methods.
- Filter transformation
- Read test mapping
- Database query

Using a Filter transformation: add a Filter transformation after each source qualifier and set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.

Using a read test mapping: make a copy of the original mapping; in the copy, keep only the sources, source qualifiers, and any custom joins or queries; remove all other transformations; and connect the source qualifiers to a file target.

Using a database query: copy the read query directly from the session log, execute it against the source database with a query tool, and measure the query execution time and the time it takes for the query to return the first row.

Eliminating source bottlenecks:
- Have the database administrator optimize database performance by optimizing the query.
- Set the number of bytes the Integration Service reads per line if it reads from a flat file source.
- Increase the database network packet size.
- Configure indexes and key constraints wherever necessary.
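The database-query test above involves two distinct measurements: total execution time and time to first row. A rough sketch with sqlite3 and a made-up source table shows how to capture both; in practice you would run the query copied from the session log against the real source database:

```python
import sqlite3
import time

# Hypothetical stand-in for the source and the read query from the session log.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (id INTEGER)")
con.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(10000)])

start = time.perf_counter()
cursor = con.execute("SELECT id FROM src ORDER BY id")
first_row = cursor.fetchone()
first_row_time = time.perf_counter() - start   # time to first row
rest = cursor.fetchall()
total_time = time.perf_counter() - start       # total execution time

# A long wait for the first row hints at plan/sort cost; a long total with a
# fast first row hints at data volume or network transfer cost.
assert total_time >= first_row_time
```

Comparing the two numbers tells you whether to attack the query plan or the transfer path.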
3.4.5.3 Transformations/Mapping Bottlenecks

If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.

Identifying mapping bottlenecks:
- Read the thread statistics and work time statistics in the session log. When the Integration Service spends more time on the transformation thread than on the writer or reader threads, you have a transformation bottleneck. When it spends more time on one transformation, that transformation is the bottleneck within the transformation thread.
- Analyze performance counters. High Errorrows and Rowsinlookupcache counters indicate a mapping bottleneck.
- Add a Filter transformation before each target definition and set the filter condition to false so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, you have a mapping bottleneck.

Eliminating mapping bottlenecks: optimize the transformation settings in the mappings.

3.4.5.4 Session Bottlenecks

If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. A small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.

Identifying session bottlenecks: analyze the performance details, which display information about each transformation, such as the number of input rows, output rows, and error rows.

Eliminating session bottlenecks: optimize the session.
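The thread-statistics rule above (the busiest thread points at the bottleneck) can be sketched as a small helper. The busy times are hypothetical numbers of the kind read from a session log; this is an illustration of the rule, not an Informatica API:

```python
def find_bottleneck(reader_busy: float, transform_busy: float,
                    writer_busy: float) -> str:
    """Return which area dominates run time, per the thread-statistics rule:
    reader thread -> source, transformation thread -> mapping,
    writer thread -> target."""
    threads = {
        "source": reader_busy,
        "mapping/transformation": transform_busy,
        "target": writer_busy,
    }
    return max(threads, key=threads.get)

# A session whose transformation thread dominates has a mapping bottleneck;
# one whose writer thread dominates has a target bottleneck.
assert find_bottleneck(12.0, 95.0, 8.0) == "mapping/transformation"
assert find_bottleneck(10.0, 5.0, 80.0) == "target"
```

Whichever area the helper names is the one to tune first; rerun the comparison afterwards, since removing the worst bottleneck may reveal the next one.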