Informatica Questions/Answers
Level 1
1. How will you test your ETL Mapping?
We test the Informatica mapping with the help of the debugger and breakpoints, with the tracing level set to Verbose.
2. What is the purpose of creating slowly changing dimensions (SCD), and what is the logic to create an SCD?
An SCD captures data that changes slowly with respect to time. For example, the address of a customer may change, but only in rare cases; it never changes frequently.
There are 3 types of SCD:
Type 1 - only the most recent data is stored; each change overwrites the old value.
Type 2 - the most recent data as well as all past (historical) data is stored.
Type 3 - partially historical data and recent data are stored, i.e., the most recent value and the most recent previous value.
Since a data warehouse holds historical data, Type 2 is the most useful for it.
3. What is source-to-target mapping, and how is it helpful in developing ETL scripts?
A source-to-target mapping specifies which source fields populate which target fields and what logic applies in between; if we want to populate data from the source, we first create the source-to-target mapping and develop the ETL from it.
4. Do you know shell scripting? How comfortable are you writing shell scripts?
Yes
5. When do we use a Joiner in Informatica, and how many Joiners do we need to join 100 (N) tables?
99 Joiners (N-1 Joiner transformations).
6. Advantage of star schema?
The main advantage of a star schema is optimized performance.
A star schema keeps queries simple and provides fast response time because all the
information about each level is stored in one row.
7. What is debugger and why do we use this?
It is a wizard-driven tool that runs a test session. With the help of the debugger we can check the behavior of the source, target and transformations. We can choose to load or discard the target rows.
8. Difference between star and snowflake?
Star Schema
The main advantage of a star schema is optimized performance.
A star schema keeps queries simple and provides fast response time because all the
information about each level is stored in one row.
SnowFlake Schema
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.
Snowflake schemas normalize dimensions to eliminate redundancy. That is, the
dimension data has been grouped into multiple tables instead of one large table. For
example, a product dimension table in a star schema might be normalized into a
Product table, a Product_Category table, and a Product_Manufacturer table in a
snowflake schema. While this saves space, it increases the number of dimension
tables and requires more foreign key joins. The result is more complex queries and
reduced query performance.
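To make the contrast concrete, here is a minimal DDL sketch of the snowflaked product dimension described above (Oracle-style syntax; table and column names are purely illustrative):

CREATE TABLE product_category (
    category_id    NUMBER PRIMARY KEY,
    category_name  VARCHAR2(50)
);

CREATE TABLE product_manufacturer (
    manufacturer_id    NUMBER PRIMARY KEY,
    manufacturer_name  VARCHAR2(50)
);

-- In a star schema all of this would live in one wide Product dimension;
-- the snowflake version keeps only keys and product-level attributes here.
CREATE TABLE product (
    product_id       NUMBER PRIMARY KEY,
    product_name     VARCHAR2(100),
    category_id      NUMBER REFERENCES product_category (category_id),
    manufacturer_id  NUMBER REFERENCES product_manufacturer (manufacturer_id)
);

A query that needs the category name must now join Product to Product_Category, which is exactly the extra foreign key join referred to above.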
9. Difference between Filter and Router?
Filter
You can filter rows in a mapping with the Filter transformation. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.
As an active transformation, the Filter transformation may change the number of rows passed through it.
A filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether the row meets the specified condition.
In a Filter we can have only one condition.
Router
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
As an active transformation, the Router transformation may change the number of rows passed through it.
In a Router we can have multiple conditions.
10. When do we use the Update Strategy transformation?
To update the target table.
11. How many types of transformations do we have in Informatica? How do you define these types?
A transformation is a repository object that generates, modifies, or passes data. The
Designer provides a set of transformations that perform specific functions. For
example, an Aggregator transformation performs calculations on groups of data.
Below are the various transformations available in Informatica:
Aggregator
Application Source Qualifier
Custom
Expression
Filter
Joiner
Lookup
Normalizer
Unstructured Data
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
12. Difference between normalized and denormalized data?
Normalized data: reduces redundancy; large number of tables; many joins; less (non-redundant) data; slower processing.
Denormalized data: improves performance; fewer tables; fewer joins; larger amount of data; faster processing.
13. Define the characteristics of a Datawarehouse?
It should be subject-oriented, time-variant, non-volatile and integrated.
14. Contrast between OLTP and Datawarehouse?
OLTP vs. DW:
Indexes: few in OLTP, many in a DW.
Joins: many in OLTP, few in a DW.
Data: normalized in OLTP, duplicated (denormalized) in a DW.
Derived/aggregate values: rare in OLTP, common in a DW.
15. What are Target Options on the Servers?
Bulk load and Normal Load.
16. In a sequential batch can you run the session if previous session fails?
Yes, we can run the session even if the previous session fails.
Level 2
17. Why do we create Synonym?
So that the user does not connect to the database table directly; the synonym hides the real table name and its owner.
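A minimal sketch, assuming an Oracle database and hypothetical schema and table names:

-- The owner grants access to the real table
GRANT SELECT ON hr.employees TO report_user;

-- A public synonym lets users query EMP without knowing the owner or real name
CREATE PUBLIC SYNONYM emp FOR hr.employees;

Users now write SELECT * FROM emp, and the physical table can later be moved or renamed by simply re-pointing the synonym.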
18. How do you improve performance of a lookup?
We can improve the lookup performance by using the following methods:
Optimizing the lookup condition:
If you include more than one lookup condition, place the conditions with an equal
sign first to optimize lookup performance.
Indexing the lookup table:
Create an index on the lookup table. The index needs to include every column used in a lookup condition.
Reducing the Number of Cached Rows:
Use the Lookup SQL Override option to add a WHERE clause to the default SQL
statement. This allows you to reduce the number of rows included in the cache.
If the lookup source does not change between sessions, configure the Lookup
transformation to use a persistent lookup cache. The Power Center Server then
saves and reuses cache files from session to session, eliminating the time required
to read the lookup source.
When using a dynamic lookup with a WHERE clause in the SQL override, make sure that you add a filter before the lookup. The filter should remove the rows which do not satisfy the WHERE clause.
Reason
During a dynamic lookup, while records are being inserted into the cache the WHERE clause is not evaluated; only the join condition is evaluated, so the lookup cache and the table fall out of sync: any record satisfying the join condition is inserted into the lookup cache. It is better to put a filter before the lookup that mirrors the WHERE clause, so that the cache contains only records satisfying both the join condition and the WHERE clause.
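A sketch of the two database-side steps above, assuming an Oracle lookup source and hypothetical table, column and port names (the override text goes into the Lookup SQL Override property):

-- Index every column used in the lookup condition
CREATE INDEX idx_cust_lkp ON customer_dim (customer_id, source_system);

-- Lookup SQL Override with a WHERE clause to reduce the cached rows
SELECT customer_key, customer_id, source_system
FROM   customer_dim
WHERE  active_flag = 'Y'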
19. When do we cache a lookup table? Do you use a lookup on a large table? If so, how will you calculate the cache size?
Set the "Cache Size" property to Auto so that the server calculates the required cache size.
20. When do we use the Update Strategy transformation? Can we update the target without using an Update Strategy transformation in Informatica?
To update the target table. Yes, we can use an Expression transformation for updating the target (with the session configured to treat source rows as Update). Also, we can use Post SQL at the session level to update the target table.
21. What kind of indexes do you create in data warehousing? What are the advantages and disadvantages of index creation?
B-tree index
Bitmap index
Unique index
Non-unique index
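Hedged examples of each index type, assuming Oracle syntax and hypothetical table and column names:

-- B-tree (the default): suits high-cardinality columns
CREATE INDEX idx_sales_date ON sales_fact (sale_date);

-- Bitmap: suits low-cardinality dimension columns in a warehouse
CREATE BITMAP INDEX idx_cust_gender ON customer_dim (gender);

-- Unique: enforces uniqueness as well as speeding up lookups
CREATE UNIQUE INDEX idx_cust_nk ON customer_dim (customer_id);

-- A non-unique index is simply CREATE INDEX without the UNIQUE keyword.

The main advantage of indexes is faster reads in queries and lookups; the main disadvantage is slower inserts/updates and extra storage, which is why indexes are often dropped before a bulk load and rebuilt afterwards.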
22. This is a scenario in which the source has 2 columns with the rows: 10 A, 10 A, 20 C, 30 D, 40 E, 20 C. There should be 2 targets, one for the duplicate rows and another for the distinct rows, i.e. T1: 10 A, 10 A, 20 C, 20 C and T2: 30 D, 40 E. Which transformations can be used to load the targets?
Use an Aggregator transformation and, using the COUNT function, determine whether each row has duplicates or not. After the Aggregator, use a Router and route the rows to the two targets based on the count.
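For illustration, the same split expressed in plain SQL (hypothetical table SRC with columns col1 and col2):

-- T1: rows whose (col1, col2) combination occurs more than once
SELECT col1, col2
FROM   src
WHERE  (col1, col2) IN (SELECT col1, col2
                        FROM   src
                        GROUP BY col1, col2
                        HAVING COUNT(*) > 1);

-- T2: rows that occur exactly once
SELECT col1, col2
FROM   src
GROUP BY col1, col2
HAVING COUNT(*) = 1;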
23. What are Data driven sessions?
The Informatica server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the Data Driven option, the Informatica server ignores all Update Strategy transformations in the mapping.
24. What do you mean by tracing level and their types?
For every transformation you will find the tracing level, which determines the amount
of information you need in the session log.
Normal: summarized information, but not down to row level.
Verbose Initialization: Normal, plus the names of the index and data files used and detailed transformation statistics.
Verbose Data: Verbose Initialization, plus row-wise information for the data passing through the mapping.
Terse: log information and error messages, with notification of rejected data.
25. Is it possible to populate the fact before populating the dimension? If yes, how, and if no, why?
No, it is not possible to load the fact table without loading the dimensions first, because the fact table's foreign keys reference the keys generated while loading the dimension tables.
26. I have a condition Marks = 50. I created a Router transformation with 2 groups, g1 and g2. G1 is marks <= 50 and g2 is marks >= 50. Which condition will be satisfied first, and why?
Marks = 50 will satisfy both groups, because in Informatica every record is checked against all the group conditions; the record goes to every group whose condition it satisfies (it is not an if ... else).
27. Your business user says that the revenue report doesn't tally with the source system report even though the ETL process did not fail today. How will you identify the exact problem?
We can check the rejected records in the Workflow Monitor or the session log.
28. Give some scenario where we use pre and post sql?
Pre SQL: to drop an index, to truncate a table, to insert some dummy values
Post SQL: to recreate the index, to delete duplicate data
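Hedged examples of the kind of statements that go into the session's Pre SQL and Post SQL properties (index, table and column names are hypothetical):

-- Pre SQL: speed up the bulk load
DROP INDEX idx_sales_fact_date;
TRUNCATE TABLE stg_sales;

-- Post SQL: restore the index and remove duplicates
CREATE INDEX idx_sales_fact_date ON sales_fact (sale_date);
DELETE FROM sales_fact a
WHERE  a.ROWID > (SELECT MIN(b.ROWID)
                  FROM   sales_fact b
                  WHERE  b.sale_id = a.sale_id);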
29. Can we join tables in source qualifier, how?
Yes, we can use the Source Qualifier for joining tables. However, there are certain limitations: the tables need a common field to make the join, and we can only join homogeneous sources, i.e., tables coming from the same database connection.
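For illustration, this is the kind of SQL the Source Qualifier generates when a user-defined join is set between two homogeneous tables (EMP and DEPT are hypothetical):

SELECT emp.empno, emp.ename, emp.deptno, dept.dname
FROM   emp, dept
WHERE  emp.deptno = dept.deptno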
30. Using a Filter transformation, how do you pass the rows that do not satisfy the condition (discarded rows) to another target?
The Filter transformation itself drops the rows that fail the condition without capturing them (the 'Forward Rejected Rows' option belongs to the Update Strategy transformation, not the Filter). To send the discarded rows to another target, use a Router transformation instead: connect the group that meets the condition to the first target, and connect the default group, which receives the rows that fail the condition, to the second target.
31. What is Dimensional Modeling?
The dimensional data model involves two types of tables and is different from the 3rd normal form. This concept uses a Fact table, which contains the measurements of the business, and Dimension tables, which contain the context (the dimensions of the calculation) of those measurements.
32. If de-normalization improves data warehouse processing, why is the fact table in normal form?
The foreign keys of a fact table are the primary keys of the dimension tables. Since the fact table consists of columns that are keys into other tables plus the measures, it is by construction in normal form.
Level 3
33. When do we use mapplets? Give one scenario where we need to create
mapplets?
A mapplet is a set of transformations, in which we get two additional transformations, Mapplet Input and Mapplet Output; in between these two we can use other transformations. A mapplet can be reused anywhere in a mapping. If we require the same logic to be performed again and again, we use a mapplet. For example, suppose that in our mappings we have to look up a few tables to get certain values, and these lookup values are required in several mappings; then we may use a mapplet.
34. What are the different options we have in informatica to improve performance?
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica server. You can increase session performance with the following steps:
Network: the performance of the Informatica server is tied to its network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster, so network connections often limit session performance; minimize them where possible.
Flat files: if flat files are stored on a machine other than the Informatica server, move those files to the machine hosting the Informatica server.
Relational data sources: minimize the connections between sources, targets and the Informatica server to improve session performance. Session performance may also improve by moving the target database onto the server system.
Staging areas: if you use staging areas, you force the Informatica server to perform multiple data passes. In such cases, removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance, because ASCII mode stores a character value in one byte whereas Unicode mode takes two bytes per character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimizations such as adding indexes.
We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure > Database Connections.
If the target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
Running parallel sessions using concurrent batches also reduces the data-loading time, so concurrent batches may increase session performance.
Partitioning the session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
If a session contains an Aggregator transformation, incremental aggregation can in some cases improve session performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
If the session contains a Filter transformation, create that Filter transformation as near to the sources as possible, or use a filter condition in the Source Qualifier.
Aggregator, Rank and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted input option.
35. What are the reasons for SQL to take more than the expected time?
a. It may be using Cartesian products
b. It is doing full table scans on large tables
c. It may not follow SQL standards and conventions, so extra time is spent parsing
d. Lack of indexes on the columns used in the WHERE clause
e. Too many tables joined together
f. Hints are not used appropriately
g. The CURSOR_SHARING parameter is not set properly
h. The rule-based optimizer is used where the cost-based optimizer would do better (or vice versa)
i. Unnecessary sorting
j. Index browning due to deletions (monitor and rebuild the indexes as necessary)
k. Careless compound indexes (do not repeat columns)
l. Tables and indexes placed in the same tablespace (as a general rule keep them separate; this is somewhat old-school, but the main point is to reduce I/O contention)
m. Table partitioning (and local indexes) not used where appropriate (partitioning is an extra-cost feature)
n. Literals used in the WHERE clause instead of bind variables
o. Statistics not kept up to date
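A small illustration of point n, assuming Oracle: each distinct literal forces a fresh hard parse, while a bind variable lets one shared cursor be reused for any value:

-- Hard-parsed again for every new customer id
SELECT * FROM orders WHERE customer_id = 1042;
SELECT * FROM orders WHERE customer_id = 1043;

-- One shared cursor, reused for all values
SELECT * FROM orders WHERE customer_id = :cust_id;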
36. What are the methods you follow to fine tune your SQl scripts?
Same as above
37. How do you identify the reason for SQL to take more than the expected time?
a. Check the indexes on the columns: whether there are composite indexes (check the composite primary key, and use index monitoring).
b. Check the index type, bitmap or B-tree; if bitmap, check the cardinality of the columns the index is built on.
c. Check how many tables the fetched data refers to, then check the joins.
d. Check whether the data is fetched directly from a table or via a view; if via a view, check whether it is a simple or a complex view.
e. If it is a complex view, check the relationships between the tables the view is created on.
f. If materialized views exist, there is no need to check the base tables; fetch the data directly from the materialized views.
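The execution plan is the usual starting point for all of the checks above; a sketch assuming Oracle and a hypothetical query:

EXPLAIN PLAN FOR
  SELECT o.order_id, c.customer_name
  FROM   orders o, customers c
  WHERE  o.customer_id = c.customer_id
  AND    o.order_date > DATE '2020-01-01';

-- Shows full table scans, index usage, join methods and row estimates
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);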
38. How do you fine-tune a session which is taking a long time to complete?
a. Mapping Level
Filter and sort the data in the Source Qualifier transformation (sorted input),
Use a dynamic cache in the case of a Lookup,
Tune the index cache and the data cache,
In a Joiner the master source should be small and the detail should be large,
In the Update Strategy use Forward Rejected Rows,
Limit the number of connected ports.
b. Session Level –
Partition the data (Don’t use Key range)
Increase the commit interval
Use Incremental Aggregation
Don’t collect the performance data
Don’t use Verbose as the tracing level
Cache small lookup tables.
Use a persistent lookup cache for static lookups.
c. Database Level
39. When do you increase the buffer options in Informatica?
If there are millions of records at the source and the commit interval is increased
from the default of 10000 to a higher number, then the buffer size has to be
increased.
40. In a normal scenario, when do we get a database driver error?
There could be several cases where we get this error; most commonly when the target definition does not match the actual target, and secondly when the key is not defined properly.
41. How will you update a record if the target table doesn’t have any primary key?
In this case we write a Target Update Override query on the target to update the table.
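A sketch of such a Target Update Override, using Informatica's :TU qualifier to reference target ports; the table, port and column names are hypothetical, and the WHERE clause uses business columns in place of the missing primary key:

UPDATE customer_dim
SET    address    = :TU.address,
       updated_on = :TU.updated_on
WHERE  customer_nk   = :TU.customer_nk
AND    source_system = :TU.source_system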
42. When we try to run the workflow, we get the error message "no integration service". What does this mean, and how do we fix it?
We get this error when we run a session without assigning an Integration Service. To resolve it, open the workflow (Workflows > Edit) and assign the Integration Service.
43. What is a mapplet and what are its ports?
A mapplet is a set of transformations, in which we get two additional transformations, Mapplet Input and Mapplet Output; in between these two we can use other transformations. A mapplet can be reused anywhere in a mapping. However, Sequence Generator and Transaction Control transformations are not allowed inside a mapplet.
44. How many joiners do we need to join 100 tables?
Number of tables - 1, i.e., 99 Joiners for 100 tables.
45. Your company has acquired a new company and has asked you to load their customer details into the existing customer database. In your existing database the primary key is numeric, while in theirs it is a string; secondly, you already have some of their customer details in your existing table. How will you handle this situation?
We will use a surrogate key.
46. When you extract data from the source during the night, your extract process keeps failing with a "snapshot too old" error. Why do we get this error, and how will you fix the problem?
The snapshot too old exception is thrown when a very large DML operation runs without committing and the undo data it needs has been overwritten. It can be resolved by increasing the undo retention period: contact the DBA to check the undo retention setting, extend the tablespace for this segment, etc.
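The DBA-side fixes, assuming an Oracle database (the retention value and datafile path are illustrative only):

-- Keep committed undo for at least 3 hours (value in seconds)
ALTER SYSTEM SET UNDO_RETENTION = 10800;

-- Give the undo tablespace enough room to honor that retention
ALTER DATABASE DATAFILE '/u01/oradata/undotbs01.dbf' RESIZE 4G;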
47. Can we create a mapping without using a source qualifier?
Yes, we can create a mapping without using a Source Qualifier if the source is COBOL; in that case we use a Normalizer transformation.
48. How can you limit the number of running sessions in a workflow?
You can set this limit in the Informatica server setup properties. By default, any number of sessions can run in parallel; you can set it to the required number depending on the memory of the server machine and the number of connections the database supports.
49. You want to attach a file as an e-mail attachment from a particular directory using the Email task in Informatica. How will you do that?
You need to use the "%a" format character and then specify the path and file name of the attachment.
50. Join two tables and pull two columns from each into the Source Qualifier; from the Source Qualifier pull one column into an Expression, and then generate the SQL in the Source Qualifier. How many columns will show up in the generated SQL?
It will generate only one column, because the Source Qualifier includes in the generated SQL only the ports that are connected downstream.
51. Which all databases PowerCenter Server on Windows can connect to?
PowerCenter Server on Windows can connect to the following databases:
• IBM DB2
• Informix
• Microsoft Access
• Microsoft Excel
• Microsoft SQL Server
• Oracle
• Sybase
• Teradata
52. Which all databases PowerCenter Server on UNIX can connect to?
PowerCenter Server on UNIX can connect to the following databases:
• IBM DB2
• Informix
• Oracle
• Sybase
• Teradata
53. What is conformed Dimension and what is the use?
If a dimension table is connected to more then one Fact table is called confirm
dimension.
Fact Tables are connected by conformed dimensions, Fact tables cannot be
connected directly, so using conformed dimension we can connect two fact tables.
54. What is Materialized view and how is it different from normal view?
Materialized views are schema objects that can be used to summarize, precompute,
replicate, and distribute data.
A materialized view provides indirect access to table data by storing the results of a query in a separate schema object, unlike an ordinary view, which does not take up any storage space or contain any data.
We can schedule the materialized view to refresh automatically to keep the data current.
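A minimal sketch of an Oracle materialized view with a scheduled refresh (object names, query and interval are illustrative):

CREATE MATERIALIZED VIEW mv_monthly_sales
  REFRESH COMPLETE
  START WITH SYSDATE NEXT SYSDATE + 1   -- refresh once a day
AS
  SELECT product_id,
         TRUNC(order_date, 'MM') AS sales_month,
         SUM(amount)             AS total_amount
  FROM   sales
  GROUP BY product_id, TRUNC(order_date, 'MM');

Unlike an ordinary view, the result set here is physically stored and is only as fresh as the last refresh.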
55. Can we do partition at session level, what are the types?
Yes
1. Round-robin
2. Pass-through
3. Hash partitioning
4. Key range