Download PowerPoint - Clement Chang Web

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Operational transformation wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Big data wikipedia , lookup

Data Protection Act, 2012 wikipedia , lookup

Expense and cost recovery system (ECRS) wikipedia , lookup

Clusterpoint wikipedia , lookup

Data model wikipedia , lookup

Data center wikipedia , lookup

SAP IQ wikipedia , lookup

Forecasting wikipedia , lookup

Data analysis wikipedia , lookup

Information privacy law wikipedia , lookup

Database model wikipedia , lookup

3D optical data storage wikipedia , lookup

Data vault modeling wikipedia , lookup

Business intelligence wikipedia , lookup

Transcript
Data Warehouse Tools and
Technologies - ETL
By:
Issarachevawat, Raynoo
Romieh, Christian
Wongkamolchun, Siri
Zhang, Ying
What Is ETL?
Extract -- the process of reading data from a outer database.
Transform -- the process of converting extracted data to a
form useable by the target database.
– Occurs by using rules or lookup tables or by combining the data with
other data.
Load -- the process of writing the data into the target database.
What does ETL do?
 Extracts data from multiple data sources
 Migrates data from one DB to another
 Converts DB from one format or type to another.
 Transforms the data to make it accessible to business
analysis
 Forms data marts and data warehouses
 Enables loading of multiple target databases
 Performs at least three specific functions
– reads data from an input source ;
– passes the stream of information through either an ETL
engine- or code-based process to modify, enhance, or
eliminate data elements based on the instructions of the job;
– writes the resultant data set back out to a flat file, relational
table, etc.
What can ETL be used?

To acquire a temporary subset of data (like a
VIEW) for reports or other purposes.

a more permanent data set may be acquired for
other purposes such as: the population of a
data mart or data warehouse
ETL SYSTEM
Operational Data
OLAP End Users
Local Data Marts
OLAP End Users
Local Data Marts
OLAP End Users
Local Data Marts
OLAP End Users
Local Data Marts
ETL Engine
Outer Sources
Extract
Transform
Load
Filter
Different vendor
Different format
Data Warehouse
Data extracted from the data warehouse
provide faster processing
Issues that are key to an effective ETL tool
 Scheduling and job dependencies: particularly relies
on graphical environment.
 Session nesting: When developing an ETL session for
a particular part of the system, nesting eliminates
duplicate development.
 Robust SQL support: Increases speed over using code
to read and write to a database.
 Version management: enables quick roll back rather
than manually making code changes. In many cases,
the DB’s version control may not work on the ETL.
Key Issues … (Cont’d)

Debugging functionality: very useful for developer
support.

ETL should rely on underlying database security.

Transformation capabilities vs. cleansing capabilities:
seldom very strong in both.

Metadata support: must work with the overall
metadata strategy.
Current ETL Market Share
Total Market Share: $667 Million
ETL Evaluation
Throughout the following sections, each of the vendors and their ETL
products are evaluated, focusing on primary differences between such
products.
Ascential Software
 Formed in July 2001
 Focuses on improving, developing, and perfecting their ETL and
“back-end” tools
 Do not have current plans of entering the BI tool market.
 The Ascential DataStage product family
– highly scalable ETL solution
– uses end-to-end metadata management and data quality assurance
functions.
– can create and manage scalable, complex data integration for enterprise
applications such as CRM, ERP, SCM, BI/analytics, E-business and data
warehouses.
Cognos Corporation
 Founded in 1969
 Prefers that all components of the enterprise data
warehouse are Cognos Products
– DecisionStream easily integrates with Cognos BI tools, etc.
– has difficulty integrating with other vendor Products.
 DecisionStream is powerful ETL software
– Allows users to extract and unite data from disparate sources
and deliver coordinated Business Intelligence across your
organization.
– includes advanced data merging, aggregation and
transformation capabilities: let users unite data from different
sources, and transform it into information using best-practices
dimensional design.
Informatica PowerConnect
 An extension to Informatica PowerCenter, and PowerCenterRT data integration
software.
 Eliminates the need for customers to manually code data extraction programs
for their enterprise applications.
 Ensures that mission-critical operational data can be effectively used to inform
key business decisions across the enterprise.
 Allows companies to directly source and integrate:







ERP
CRM
Real-time message queue
Mainframe
AS/400
Remote data
Metadata
with other enterprise data and deliver it to:




Data warehouses
Operational data stores
Business intelligence tools
Packaged analytic applications.
Conclusion
 Issues analyzed:
–
–
–
–
–
development environments
version control
Securities
metadata exchanges standards
Cost
 Cognos could not compete
– based on the relative youth
– limitations of ETL tools.
 unable to show support for version or revision control
 security provided by the underlying database, favors non-Cognos Products.
 The ETL tools presented by Ascential and Informatica are comparable
in numerous ways
 it would be best to select Informatica as an ETL vendor.
– more mature and stable as a company
– more comprehensive ETL at an efficient price.
Questions?
For Copies of the paper, Please email
Christian Romieh, [email protected]