Extract, Transform and Load

Extract, Transform and Load (ETL) refers to a process in database
usage, and especially in data warehousing, that:
1) extracts data from homogeneous or heterogeneous data sources;
2) transforms the data into a proper format or structure for querying and
analysis;
3) loads it into the final target (a database; more specifically, an
operational data store, data mart, or data warehouse).
Usually all three phases execute in parallel. Because data extraction
takes time, a transformation process runs while data is still being pulled,
processing the rows already received and preparing them for loading; as
soon as some data is ready to be loaded into the target, the load kicks off
without waiting for the earlier phases to complete.
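As a rough illustration of this pipelining (the thread-and-queue structure, the row contents, and the in-memory target list below are invented for the example and are not taken from any particular ETL tool), each phase can run concurrently and pass rows downstream as soon as they are available:

import queue
import threading

def extract(source_rows, out_q):
    # Extract: pull rows from the source and hand them to the next stage.
    for row in source_rows:
        out_q.put(row)
    out_q.put(None)  # sentinel: no more rows

def transform(in_q, out_q):
    # Transform: reshape each row as soon as it arrives.
    while (row := in_q.get()) is not None:
        out_q.put({"customer": row["name"].strip().title(),
                   "amount": row["amount_cents"] / 100})
    out_q.put(None)

def load(in_q, target):
    # Load: write transformed rows without waiting for extraction to finish.
    while (row := in_q.get()) is not None:
        target.append(row)  # stand-in for an INSERT into the warehouse

source = [{"name": " alice ", "amount_cents": 1250},
          {"name": "BOB", "amount_cents": 990}]
warehouse = []
q1, q2 = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=extract, args=(source, q1)),
           threading.Thread(target=transform, args=(q1, q2)),
           threading.Thread(target=load, args=(q2, warehouse))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(warehouse)  # [{'customer': 'Alice', 'amount': 12.5}, {'customer': 'Bob', 'amount': 9.9}]

Real ETL engines apply the same idea at a much larger scale, typically with batching, checkpointing, and several parallel workers per stage.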
ETL systems commonly integrate data from multiple applications
(systems), typically developed and supported by different vendors or
hosted on separate computer hardware. The disparate systems containing
the original data are frequently managed and operated by different
employees. For example, a cost accounting system may combine data from
payroll, sales and purchasing.
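A minimal sketch of that kind of consolidation, assuming pandas is available and using invented department keys and column names, might join extracts from the three source systems before loading the result:

import pandas as pd

# Hypothetical extracts from three separately managed source systems.
payroll = pd.DataFrame({"dept": ["D1", "D2"], "salary_cost": [50000, 42000]})
sales = pd.DataFrame({"dept": ["D1", "D2"], "revenue": [120000, 80000]})
purchasing = pd.DataFrame({"dept": ["D1", "D2"], "material_cost": [30000, 25000]})

# Transform: consolidate the three sources on the shared department key.
cost_accounting = payroll.merge(sales, on="dept").merge(purchasing, on="dept")
cost_accounting["margin"] = (cost_accounting["revenue"]
                             - cost_accounting["salary_cost"]
                             - cost_accounting["material_cost"])

# Load: in a real ETL job this frame would be written to the data warehouse.
print(cost_accounting)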
Commercially available ETL tools
Informatica PowerCenter
IBM DataStage
Ab Initio
MicroStrategy
Oracle Data Integrator (ODI)
Microsoft SQL Server Integration Services (SSIS)
Pentaho Data Integration (or Kettle)
Talend
FlyData
Oracle Data Integrator
ODI Studio Navigators are as follows:
• Designer defines declarative rules for data transformation and data
integrity. All project development takes place in this module; this is where
database and application metadata are imported and defined. The Designer
module uses metadata and rules to generate data integration scenarios
or load plans for production. This is the core module for developers and
metadata administrators.
• Operator manages and monitors data integration processes in
production. It is designed for operators and shows execution logs with
error counts, the number of rows processed, execution statistics, the actual
code that is executed, and so on. At design time, developers can also use
the Operator module for troubleshooting purposes.
• Topology defines the physical and logical architecture of the
infrastructure. Infrastructure or project administrators register
servers, database schemas and catalogs, and agents in the master repository
through this module.
• Security manages user profiles, roles, and their privileges. Security
can also assign access authorization to objects and features. Security
administrators generally use this module.
All modules store their information in the centralized repository.
Project objects are stored in a Work Repository. Several work
repositories can coexist in the same installation. This is useful for
maintaining separate environments or for reflecting a particular versioning
lifecycle, for example development, quality assurance, and production
environments. A work repository stores information for:
• Models (i.e. metadata), including data stores, columns, data integrity
constraints, cross-references, data lineage, and impact analysis.
• Projects, including mappings, packages, procedures, folders,
knowledge modules, and variables.
• Runtime information, including scenarios, load plans, scheduling
information, and execution logs.
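Purely as an illustrative sketch of this separation (the class and field names below are simplified and do not come from ODI's own repository schema), the three groups of objects can be pictured like this:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Model:                  # design-time metadata
    datastores: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

@dataclass
class Project:                # development objects
    mappings: List[str] = field(default_factory=list)
    packages: List[str] = field(default_factory=list)
    variables: List[str] = field(default_factory=list)

@dataclass
class RuntimeInfo:            # production objects
    scenarios: List[str] = field(default_factory=list)
    load_plans: List[str] = field(default_factory=list)
    execution_logs: List[str] = field(default_factory=list)

@dataclass
class WorkRepository:
    models: List[Model] = field(default_factory=list)
    projects: List[Project] = field(default_factory=list)
    runtime: RuntimeInfo = field(default_factory=RuntimeInfo)

# Several work repositories can coexist in one installation,
# for example one per environment in the versioning lifecycle.
environments: Dict[str, WorkRepository] = {
    "DEV": WorkRepository(),
    "QA": WorkRepository(),
    "PROD": WorkRepository(),
}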