Informatica Questions/Answers

Level 1

1. How will you test your ETL mapping?
We test the Informatica mapping with the help of the debugger and breakpoints, with the tracing level set to verbose.

2. What is the purpose of creating slowly changing dimensions (SCD), and what is the logic to create an SCD?
An SCD captures data that changes slowly with respect to time. For example, the address of a customer may change in rare cases; it never changes frequently. There are three types of SCD:
Type 1 - only the most recent data is stored.
Type 2 - the recent data as well as all past data (historical data) is stored.
Type 3 - partially historical data and recent data are stored; it holds the most recent value and the most recent history. Since a data warehouse holds historical data, Type 2 is the most useful for it.

3. What is source-to-target mapping, and how is it helpful in developing ETL scripts?
If we want to populate data from the source, we create a source-to-target mapping.

4. Do you know shell scripting? How comfortable are you writing shell scripts?
Yes.

5. When do we use a Joiner in Informatica, and how many Joiners do we need to join 100 (N) tables?
99 Joiners (N-1 transformations).

6. What is the advantage of a star schema?
The main advantage of a star schema is optimized performance. A star schema keeps queries simple and provides fast response times because all the information about each level is stored in one row.

7. What is the debugger and why do we use it?
It is a wizard-driven tool that runs a test session. With the help of the debugger we can check the behavior of the source, target, and transformations, and we can load or discard the target load.

8. What is the difference between star and snowflake schemas?
Star schema: The main advantage of a star schema is optimized performance. A star schema keeps queries simple and provides fast response times because all the information about each level is stored in one row.
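The SCD Type 2 logic from question 2 can be sketched outside Informatica as well. Below is a minimal Python sketch (not Informatica code; the table layout and field names are illustrative assumptions) that versions a customer's address by expiring the current row and appending a new current row:

```python
from datetime import date

# Each row carries effective dates and a current flag (SCD Type 2).
history = [
    {"cust_id": 1, "address": "12 Oak St", "eff_from": date(2020, 1, 1),
     "eff_to": None, "is_current": True},
]

def apply_scd2(history, cust_id, new_address, change_date):
    """Expire the current row and append a new current row (Type 2)."""
    for row in history:
        if row["cust_id"] == cust_id and row["is_current"]:
            if row["address"] == new_address:
                return  # value unchanged: nothing to version
            row["eff_to"] = change_date   # close out the old version
            row["is_current"] = False
    history.append({"cust_id": cust_id, "address": new_address,
                    "eff_from": change_date, "eff_to": None,
                    "is_current": True})

apply_scd2(history, 1, "98 Elm Ave", date(2023, 6, 1))
# history now holds both the expired row and the new current row,
# so past and present addresses are both queryable.
```

This is why Type 2 suits a data warehouse: no information is overwritten, so history can always be reconstructed from the effective-date columns.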
Snowflake schema: The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy; the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a Product table, a Product_Category table, and a Product_Manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.

9. What is the difference between Filter and Router?
Filter: You can filter rows in a mapping with the Filter transformation. You pass all the rows from a source transformation through the Filter transformation and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through. As an active transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or FALSE for each row, depending on whether the row meets the specified condition. A Filter can have only one condition.
Router: A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition.
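The Filter semantics just described, one condition with non-matching rows silently dropped, can be imitated in plain Python (a conceptual sketch, not Informatica code; the sample rows are invented for illustration):

```python
rows = [
    {"emp": "A", "salary": 30000},
    {"emp": "B", "salary": 75000},
    {"emp": "C", "salary": 52000},
]

# A Filter transformation applies ONE boolean condition per row;
# rows returning False are dropped and are not available downstream.
def filter_transform(rows, condition):
    return [r for r in rows if condition(r)]

passed = filter_transform(rows, lambda r: r["salary"] > 50000)
# passed -> employees B and C; employee A is discarded, as a Filter would do.
```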
However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group. As an active transformation, the Router transformation may change the number of rows passed through it. A Router can have multiple conditions.

10. When do we use the Update Strategy transformation?
To update the target table.

11. How many types of transformations do we have in Informatica? How do you define these types?
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. The transformations available in Informatica include: Aggregator, Application Source Qualifier, Custom, Expression, Filter, Joiner, Lookup, Normalizer, Unstructured Data, Rank, Router, Sequence Generator, Sorter, Source Qualifier, Stored Procedure, Transaction Control, Union, Update Strategy, XML Generator, XML Parser, and XML Source Qualifier.

12. What is the difference between normalized and denormalized data?
Normalized data: reduces redundancy; large number of tables; many joins; less data per table; slower processing.
Denormalized data: improves performance; fewer tables; fewer joins; larger amount of data; faster processing.

13. Define the characteristics of a data warehouse.
It should be subject oriented, time variant, non-volatile, and integrated.

14. Contrast OLTP and a data warehouse.
Indexes: few in OLTP, many in a DW.
Joins: many in OLTP, few in a DW.
Data model: normalized in OLTP, duplicated/denormalized in a DW.
Derived/aggregate data: rare in OLTP, common in a DW.

15. What are the target options on the server?
Bulk load and normal load.

16. In a sequential batch, can you run a session if the previous session fails?
Yes, we can run the session if the previous session fails.

Level 2

17. Why do we create a synonym?
So that the user cannot connect to the database table directly.

18.
How do you improve the performance of a lookup?
We can improve lookup performance using the following methods:
Optimize the lookup condition: if you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.
Index the lookup table: create an index on the lookup table. The index needs to include every column used in a lookup condition.
Reduce the number of cached rows: use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. This allows you to reduce the number of rows included in the cache.
If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The PowerCenter Server then saves and reuses cache files from session to session, eliminating the time required to read the lookup source.
When using a dynamic lookup with a WHERE clause in the SQL override, make sure you add a filter before the lookup that removes rows which do not satisfy the WHERE clause. The reason: during dynamic lookups, the WHERE clause is not evaluated while records are inserted into the cache; only the join condition is evaluated, so the lookup cache and the table can fall out of sync. Putting a filter before the lookup ensures the cache contains only records satisfying both the join condition and the WHERE clause.

19. When do we cache a lookup table? Do you use a lookup on a large table? If so, how will you calculate the cache?
Set the "Cache Size" property to AUTO.

20. When do we use the Update Strategy transformation? Can we update the target without using an Update Strategy transformation in Informatica?
To update the target table. Yes, we can use an Expression transformation for updating the target, and we can also use a Post SQL at the session level to update the target table.

21. What kinds of indexes do you create in data warehousing?
What are the advantages and disadvantages of index creation?
B-tree index, bitmap index, unique index, non-unique index.

22. This is a scenario in which the source has two columns:
10 A
10 A
20 C
30 D
40 E
20 C
There should be two targets: one to show the duplicate values and another for the distinct rows.
T1 (duplicates): 10 A, 20 C
T2 (distinct): 10 A, 20 C, 30 D, 40 E
Which transformations can be used to load the targets? Use an Aggregator transformation and, with the count function, find out whether there are duplicates. After the Aggregator, use a Router and route the values based on the count.

23. What are data-driven sessions?
The Informatica server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data-driven option, the Informatica server ignores all Update Strategy transformations in the mapping.

24. What do you mean by tracing level, and what are its types?
For every transformation you will find the tracing level, which determines the amount of information written to the session log.
Normal: summarized information, not at row level.
Verbose Initialization: Normal, plus the names of index and data files used and detailed transformation statistics.
Verbose Data: Verbose Initialization, plus row-wise information for the data passing through the mapping.
Terse: initialization information and error messages, with notification of rejected data.

25. Is it possible to populate the fact before populating the dimension? If yes, how; if no, why?
No, it is not possible to load the fact without loading the dimension.

26. I have a condition Marks = 50. I created a Router transformation with two groups, g1 and g2. G1 is marks <= 50 and g2 is marks >= 50. Which condition will be satisfied first, and why?
Marks = 50 will satisfy both groups, because in Informatica every record is checked against all the group conditions; wherever it satisfies a condition, it goes to that group. (It is not an if ... else.)

27. Your business user says the revenue report doesn't tally with the source system report, even though the ETL process did not fail today. How will you identify the exact problem?
We can check the rejected records in the Workflow Monitor or the session log.

28. Give some scenarios where we use pre- and post-SQL.
Pre-SQL: to drop an index, to truncate a table, to insert some dummy values.
Post-SQL: to recreate the index, to delete duplicate data.

29. Can we join tables in a Source Qualifier? How?
Yes, we can use a Source Qualifier for joining tables. However, there are certain limitations: we need tables with a common field from a homogeneous source (the same database connection) to make the join.

30. Using a Filter transformation, how do you pass rows that do not satisfy the condition (discarded rows) to another target?
Connect the ports of the Filter transformation to the second target table and enable 'Forward Rejected Rows' in the properties of the transformation; rejected rows will be forwarded to this table.

31. What is dimensional modeling?
The dimensional data model concept involves two types of tables and is different from third normal form. It uses a fact table, which contains the measurements of the business, and dimension tables, which contain the context (the dimensions of calculation) of the measurements.

32. If denormalization improves data warehouse processes, why is the fact table in normal form?
The foreign keys of a fact table are the primary keys of the dimension tables. Since the fact table contains columns that are primary keys of other tables, that by itself makes it a normal form table.

Level 3

33. When do we use mapplets? Give one scenario where we need to create a mapplet.
A mapplet is a set of transformations, in which we get two additional transformations: mapplet input and mapplet output.
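The Router behavior from question 26, where a record lands in every group whose condition it meets, can be sketched in plain Python (a conceptual illustration, not Informatica code; the sample records are invented):

```python
records = [{"name": "x", "marks": 35},
           {"name": "y", "marks": 50},
           {"name": "z", "marks": 80}]

# Router groups are evaluated independently for every record
# (not if/elif), so one record can land in several groups at once.
groups = {
    "g1": lambda r: r["marks"] <= 50,
    "g2": lambda r: r["marks"] >= 50,
}

routed = {name: [] for name in groups}
default = []  # rows matching no group go to the default output group
for rec in records:
    matched = False
    for name, cond in groups.items():
        if cond(rec):
            routed[name].append(rec)
            matched = True
    if not matched:
        default.append(rec)
# The record with marks == 50 appears in BOTH g1 and g2.
```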
In between these two we can use other transformations. A mapplet can be reused anywhere in a mapping. If we require the same action to be performed again and again, we use a mapplet. Suppose in our mapping we have to look up a few tables to get certain values, and these lookup values are required in several mappings; then we may use a mapplet.

34. What are the different options we have in Informatica to improve performance?
The goal of performance tuning is to optimize session performance so sessions run within the available load window for the Informatica server. Increase session performance with the following steps:
The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster, so network connections often affect session performance. Avoid network connections where possible.
Flat files: if the flat files are stored on a machine other than the Informatica server, move those files to the machine that hosts the Informatica server.
Relational data sources: minimize the connections to sources, targets, and the Informatica server. Session performance may improve by moving the target database onto the server system.
Staging areas: if you use staging areas, you force the Informatica server to perform multiple data passes. In such cases, removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes per character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance.
Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure > Database Connections.
If the target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
Running parallel sessions using concurrent batches also reduces data loading time, so concurrent batches may increase session performance.
Partitioning the session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
If a session contains an Aggregator transformation, incremental aggregation can improve session performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve performance by enabling the lookup cache.
If the session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.
Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.

35. What are the reasons for SQL to take more than the expected time?
a. It may be using Cartesian products.
b. It is doing full table scans on large tables.
c. It may not follow SQL standards and conventions, so it takes time in parsing.
d. Lack of indexes on columns contained in the WHERE clause.
e. Joining too many tables.
f. Hints are not used appropriately.
g. The SHARED_CURSOR parameter is not used properly.
h.
Use the rule-based optimizer when it performs better than the cost-based optimizer.
i. Unnecessary sorting.
j. Monitor index browning (due to deletions; rebuild as necessary).
k. Use compound indexes with care (do not repeat columns).
l. Use different tablespaces for tables and indexes (as a general rule; this is somewhat old-school, but the main point is to reduce I/O contention).
m. Use table partitioning (and local indexes) when appropriate (partitioning is an extra-cost feature).
n. Avoid literals in the WHERE clause (use bind variables).
o. Keep statistics up to date.

36. What methods do you follow to fine-tune your SQL scripts?
Same as above.

37. How do you identify the reason for SQL taking more than the expected time?
a. Check the indexes on the columns, including composite indexes (check the composite primary key and index monitoring).
b. Check the index types, bitmap or B-tree; if bitmap, check the cardinality of the columns the index belongs to.
c. Check how many tables the fetched data references, then check the joins.
d. Check whether the data is fetched directly from tables or via a view; if via a view, check whether it is a simple or complex view.
e. If it is a complex view, check the relationships between the tables over which the view is created.
f. If materialized views are used, there is no need to check the underlying tables; the data is fetched directly from the materialized views.

38. How do you fine-tune a session that is taking a long time to complete?
a. Mapping level: use sorted input in the Source Qualifier transformation; use a dynamic cache for lookups where appropriate; tune the index cache and data cache; in a Joiner, the master should be small and the detail large; in an Update Strategy, use forward rejected rows; limit the number of connected ports.
b. Session level: partition the data (don't use key range); increase the commit interval; use incremental aggregation; don't collect performance data; don't use Verbose as the tracing level; cache small lookup tables; use a persistent lookup cache for static lookups.
c. Database level.

39.
When do you increase the buffer options in Informatica?
If there are millions of records at the source and the commit interval is increased from the default of 10000 to a higher number, then the buffer size has to be increased.

40. In a normal scenario, when do we get a database driver error?
There could be several cases where we get this error; commonly, when the target definition does not match the actual target, or when the key is not defined properly.

41. How will you update a record if the target table doesn't have a primary key?
In this case we write an Update Override query on the target table to update it.

42. When we try to run the workflow, we get the error message "no integration service". What does this mean and how do we fix it?
We get this error when we run a session without defining the integration service. To resolve it, define the integration service: click Workflow > Edit and define the integration service.

43. What are mapplets and their ports?
A mapplet is a set of transformations, in which we get two additional transformations: mapplet input and mapplet output. In between these two we can use other transformations. A mapplet can be reused anywhere in a mapping. However, Sequence Generator and Transaction Control transformations are not allowed.

44. How many Joiners do we need to join 100 tables?
Number of tables - 1, i.e. 99.

45. Your company has acquired a new company and asked you to upload their customer details into the existing customer database. In your existing database the primary key is numeric, while in theirs it is a string; additionally, some of their customer details already exist in your table. How will you handle this situation?
We will use a surrogate key.

46. When you extract data from the source during the night, your extract process keeps failing with a "snapshot too old" error. Why do we get this error and how will you fix the problem?
The snapshot-too-old exception is thrown when we perform a very large DML operation without committing; it can be resolved by increasing the undo retention period. Contact the DBA to check the undo retention period, extend the tablespace for the segment, etc.

47. Can we create a mapping without using a Source Qualifier?
Yes, we can create a mapping without a Source Qualifier if the source is COBOL; we can use a Normalizer.

48. How can you limit the number of running sessions in a workflow?
You can set this limit in the Informatica server setup properties. By default any number of sessions can run in parallel; you can set it to the required number depending on the memory of the server machine and the database's connection limit.

49. You want to attach a file as an e-mail attachment from a particular directory using the Email task in Informatica. How will you do that?
Use "%a" and then define the path of the filename.

50. Join two tables, pull two columns from each into the Source Qualifier, and from the Source Qualifier pull one column into an Expression; then generate the SQL in the Source Qualifier. How many columns will show up in the generated SQL?
It will generate only one column.

51. Which databases can PowerCenter Server on Windows connect to?
PowerCenter Server on Windows can connect to the following databases:
• IBM DB2
• Informix
• Microsoft Access
• Microsoft Excel
• Microsoft SQL Server
• Oracle
• Sybase
• Teradata

52. Which databases can PowerCenter Server on UNIX connect to?
PowerCenter Server on UNIX can connect to the following databases:
• IBM DB2
• Informix
• Oracle
• Sybase
• Teradata

53. What is a conformed dimension and what is its use?
A dimension table connected to more than one fact table is called a conformed dimension. Fact tables are connected by conformed dimensions; fact tables cannot be connected directly, so using a conformed dimension we can connect two fact tables.

54. What is a materialized view and how is it different from a normal view?
Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data. A materialized view provides indirect access to table data by storing the results of a query in a separate schema object. Unlike an ordinary view, which does not take up any storage space or contain any data, a materialized view stores data and occupies space. We can schedule the materialized view to refresh automatically to keep the data current.

55. Can we partition at the session level? What are the types?
Yes:
1. Round-robin
2. Pass-through
3. Hash partitioning
4. Key range
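Two of the partition types listed in question 55 can be illustrated conceptually. The sketch below is plain Python, not Informatica internals: round-robin spreads rows evenly across partitions, while hash partitioning keeps all rows with the same key in the same partition (the hash function here is a simple stand-in, chosen only for illustration):

```python
def round_robin(rows, n_partitions):
    """Distribute rows evenly by cycling through the partitions."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

def hash_partition(rows, key, n_partitions):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        # deterministic toy hash; Informatica uses its own internal function
        bucket = sum(ord(c) for c in str(row[key])) % n_partitions
        parts[bucket].append(row)
    return parts

rows = [{"cust": "A"}, {"cust": "B"}, {"cust": "A"}, {"cust": "C"}]
rr = round_robin(rows, 2)             # rows alternate between partitions
hp = hash_partition(rows, "cust", 2)  # both "A" rows land together
```

This is why hash partitioning suits Aggregator or Rank stages (rows that must be grouped together stay in one pipeline), while round-robin is a good default when rows are independent.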