* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download PowerPoint - Clement Chang Web
Survey
Document related concepts
Operational transformation wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Clusterpoint wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
Information privacy law wikipedia , lookup
Database model wikipedia , lookup
3D optical data storage wikipedia , lookup
Transcript
Data Warehouse Tools and Technologies - ETL By: Issarachevawat, Raynoo Romieh, Christian Wongkamolchun, Siri Zhang, Ying What Is ETL? Extract -- the process of reading data from a outer database. Transform -- the process of converting extracted data to a form useable by the target database. – Occurs by using rules or lookup tables or by combining the data with other data. Load -- the process of writing the data into the target database. What does ETL do? Extracts data from multiple data sources Migrates data from one DB to another Converts DB from one format or type to another. Transforms the data to make it accessible to business analysis Forms data marts and data warehouses Enables loading of multiple target databases Performs at least three specific functions – reads data from an input source ; – passes the stream of information through either an ETL engine- or code-based process to modify, enhance, or eliminate data elements based on the instructions of the job; – writes the resultant data set back out to a flat file, relational table, etc. What can ETL be used? To acquire a temporary subset of data (like a VIEW) for reports or other purposes. a more permanent data set may be acquired for other purposes such as: the population of a data mart or data warehouse ETL SYSTEM Operational Data OLAP End Users Local Data Marts OLAP End Users Local Data Marts OLAP End Users Local Data Marts OLAP End Users Local Data Marts ETL Engine Outer Sources Extract Transform Load Filter Different vendor Different format Data Warehouse Data extracted from the data warehouse provide faster processing Issues that are key to an effective ETL tool Scheduling and job dependencies: particularly relies on graphical environment. Session nesting: When developing an ETL session for a particular part of the system, nesting eliminates duplicate development. Robust SQL support: Increases speed over using code to read and write to a database. Version management: enables quick roll back rather than manually making code changes. In many cases, the DB’s version control may not work on the ETL. Key Issues … (Cont’d) Debugging functionality: very useful for developer support. ETL should rely on underlying database security. Transformation capabilities vs. cleansing capabilities: seldom very strong in both. Metadata support: must work with the overall metadata strategy. Current ETL Market Share Total Market Share: $667 Million ETL Evaluation Throughout the following sections, each of the vendors and their ETL products are evaluated, focusing on primary differences between such products. Ascential Software Formed in July 2001 Focuses on improving, developing, and perfecting their ETL and “back-end” tools Do not have current plans of entering the BI tool market. The Ascential DataStage product family – highly scalable ETL solution – uses end-to-end metadata management and data quality assurance functions. – can create and manage scalable, complex data integration for enterprise applications such as CRM, ERP, SCM, BI/analytics, E-business and data warehouses. Cognos Corporation Founded in 1969 Prefers that all components of the enterprise data warehouse are Cognos Products – DecisionStream easily integrates with Cognos BI tools, etc. – has difficulty integrating with other vendor Products. DecisionStream is powerful ETL software – Allows users to extract and unite data from disparate sources and deliver coordinated Business Intelligence across your organization. – includes advanced data merging, aggregation and transformation capabilities: let users unite data from different sources, and transform it into information using best-practices dimensional design. Informatica PowerConnect An extension to Informatica PowerCenter, and PowerCenterRT data integration software. Eliminates the need for customers to manually code data extraction programs for their enterprise applications. Ensures that mission-critical operational data can be effectively used to inform key business decisions across the enterprise. Allows companies to directly source and integrate: ERP CRM Real-time message queue Mainframe AS/400 Remote data Metadata with other enterprise data and deliver it to: Data warehouses Operational data stores Business intelligence tools Packaged analytic applications. Conclusion Issues analyzed: – – – – – development environments version control Securities metadata exchanges standards Cost Cognos could not compete – based on the relative youth – limitations of ETL tools. unable to show support for version or revision control security provided by the underlying database, favors non-Cognos Products. The ETL tools presented by Ascential and Informatica are comparable in numerous ways it would be best to select Informatica as an ETL vendor. – more mature and stable as a company – more comprehensive ETL at an efficient price. Questions? For Copies of the paper, Please email Christian Romieh, [email protected]