Extract, Transform and Load (ETL)

Extract, Transform and Load (ETL) refers to a process in database usage, and especially in data warehousing, that:
* extracts data from homogeneous or heterogeneous data sources;
* transforms the data into the proper format or structure for querying and analysis;
* loads it into the final target (a database; more specifically an operational data store, data mart, or data warehouse).

Usually all three phases execute in parallel. Because extraction takes time, the transformation process runs while data is still being pulled, processing the rows already received and preparing them for loading; as soon as some data is ready to be loaded into the target, loading kicks off without waiting for the earlier phases to complete (a pipelined sketch appears at the end of this section).

ETL systems commonly integrate data from multiple applications (systems), typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example, a cost accounting system may combine data from payroll, sales, and purchasing.

Commercially available ETL tools
* Informatica PowerCenter
* IBM DataStage
* Ab Initio
* MicroStrategy
* Oracle Data Integrator (ODI)
* Microsoft SQL Server Integration Services (SSIS)
* Pentaho Data Integration (Kettle)
* Talend
* FlyData

Oracle Data Integrator

ODI Studio Navigators are as follows:
* Designer defines declarative rules for data transformation and data integrity. All project development takes place in this module; this is where database and application metadata are imported and defined. The Designer module uses metadata and rules to generate data integration scenarios or load plans for production. It is the core module for developers and metadata administrators.
* Operator manages and monitors data integration processes in production. It is designed for operators and shows execution logs with error counts, the number of rows processed, execution statistics, the actual code that is executed, and so on. At design time, developers can also use the Operator module for troubleshooting.
* Topology defines the physical and logical architecture of the infrastructure. Infrastructure or project administrators register servers, database schemas and catalogs, and agents in the master repository through this module.
* Security manages user profiles, roles, and their privileges. It can also assign access authorization to objects and features. Security administrators generally use this module.

All modules store their information in the centralized repository.

Work Repositories

Project objects are stored in a Work Repository. Several Work Repositories can coexist in the same installation, which is useful for maintaining separate environments or for reflecting a particular versioning lifecycle, for example development, quality assurance, and production environments. A Work Repository stores information for:
* Models (i.e., metadata): data stores, columns, data integrity constraints, cross-references, data lineage, and impact analysis.
* Projects: mappings, packages, procedures, folders, knowledge modules, and variables.
* Runtime information: scenarios, load plans, scheduling information, and execution logs.
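To make the parallel execution of the three ETL phases concrete, here is a minimal pipelined sketch in Python, with one thread per phase and queues between the stages. All names here (extract, transform, load, SOURCES, the in-memory warehouse list) are illustrative assumptions, not the API of any tool listed above; a real ETL engine applies the same pipelining against actual databases.

```python
import queue
import threading

SENTINEL = object()  # marks the end of a stream between pipeline stages

# Hypothetical source systems (e.g., payroll and sales), standing in for
# heterogeneous databases or files.
SOURCES = {
    "payroll": [{"emp": "A", "salary": 100}, {"emp": "B", "salary": 120}],
    "sales":   [{"emp": "A", "revenue": 300}, {"emp": "B", "revenue": 250}],
}

def extract(out_q):
    """Pull rows from each source and hand them to the transform stage."""
    for source, rows in SOURCES.items():
        for row in rows:
            out_q.put({"source": source, **row})
    out_q.put(SENTINEL)

def transform(in_q, out_q):
    """Reshape rows into the target format while extraction is still running."""
    while (row := in_q.get()) is not SENTINEL:
        out_q.put({"emp": row["emp"],
                   "metric": row["source"],
                   "value": row.get("salary", row.get("revenue"))})
    out_q.put(SENTINEL)

def load(in_q, target):
    """Append transformed rows to the target as soon as they are ready."""
    while (row := in_q.get()) is not SENTINEL:
        target.append(row)

raw_q, clean_q, warehouse = queue.Queue(), queue.Queue(), []
stages = [threading.Thread(target=extract, args=(raw_q,)),
          threading.Thread(target=transform, args=(raw_q, clean_q)),
          threading.Thread(target=load, args=(clean_q, warehouse))]
for t in stages:
    t.start()
for t in stages:
    t.join()
print(warehouse)  # transformed rows, loaded without waiting for extraction to end
```

Because load consumes from its queue as soon as transform produces a row, data reaches the target before extraction of later sources has finished, which is exactly the phase overlap described above.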