Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Data Warehousing Tyler Helmle Introduction Definition History Processes and Terms Management Techniques Definition Collection of Data Subject Oriented Integrated Time-Variant Non-Volatile To Support Managements Decision Making Process Subject Oriented Particular Subject Instead of Companies Ongoing Operations Integrated Gathered from variety of Sources Merged into a Whole Time-Variant Specific Time Period Monthly Weekly Daily Etc. Non-Volatile Stable Never Remove Data Consistent Picture of Business Definition Bill Inmon Single Subject 1990 Data Mart Possible to be Volatile Size Contraints Rolling Lengths Definition Ralph Kimball “a copy of transaction data specifically structured for query and analysis.” Less insight Less depth Not Less Accurate History Concept Originated (Late 1980’s) IBM Researchers Barry Delvin and Paul Murphy “Business Data Warehouse” Operational Systems to Decision Support History Trends Corporations had multiple Decision Support Systems Expensive Redundant Get to one efficient warehouse Data Marts History Bill Inmon “Father of the Data Warehouse” Building of the Data Warehouse 1991 Taught Data Warehouse Design Designing and Rolling Out Warehouse Surajit Chaudhuri and Umeshwar Dayal Design Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers, and tools Integrate the servers, storage, and client tools Design the warehouse schema and views Define the physical warehouse organization, data placement, partitioning, and access methods Design (Cont.) Connect the sources using gateways, ODBC drivers, or other wrappers Design and implement scripts for data extraction, cleaning, transformation, load, and refresh Populate the repository with the schema and view definitions, scripts, and other metadata Design (Cont.) Design and implement end-user applications Roll out the warehouse and applications Design (Cont.) Database Designs Multidimensional Approach Normalized Approach Data Warehouse Raw Data Business will Decide Department? Market Data? Customer Data? Data Granularity Data Warehouse Data Extracting Extract from Source Systems Database Flat Files Converts into format for transformation Data Transforming Data Validation Difficult Common Problems Occur Data Transforming Selection of Columns Translation of Data Make all data consistent Derive New Calculated Value Filtering Sorting Joining of Data Data Transforming Transposing Splitting Columns Data Transforming Tools Data Migration Data Scrubbing Transformation of Data Domain Specific Knowledge Fixing or Eliminating Data Data Auditing Patterns Inconsistencies Data Load Batch Processing Visibility and Management Administrator Start, Cancel, Suspend, Resume Load Large Quick and Efficient Data Load Parallel Processing Splitting Data Files to provide Parallel Access Pipeline Component Simultaneous running of several components Simultaneous running of multiple processes All three types usually operate at same time Data Refresh When? How? Data Warehouse Data Mart Subset of Data Major Data Subject Increase Performance Separate Security Prove ROI before adding to Data Warehouse Data Mart Ease of Creation Lower Cost than Data Warehouse Data Mart Cons Limited Scalability Duplication of Data Data Inconsistency Low ability to leverage enterprise sources of data Data Warehouse Metadata “Data about Data” Clarifies Data Explains Data Helps Navigate Through Data Facilitates (Both Human and Computers) Understanding Usage Management Metadata Different from Data Sometimes can be both Point of view Types Descriptive Administrative Structural Technical Use Metadata Content Mutability Describe Resource Describe Content Immutable Mutable Logical Function Sub-symbolic Symbolic Logical Metadata Back End Extract, Transform, Load Front End Label Screens Create Reports Data Warehouse End Product Analysis Querying Reporting Data Mining Data Mining Requires Large Amount of Data Determine Patterns in Data Determine Relationships in Data Analyze Results and Conclude Standard Reporting Weekly/Monthly/Yearly Same Constant Data Recent Information Arrives at Expected Time on Expected Medium Configurable Reports Select Parameters to Show on Report Ex. Time Period Region Support of Data Remains the Same Ad Hoc Reporting End Users Pick and Choose Data Requires Training Knowledge of Data Do Not Want Users Miss Using the Data Dashboards / Scorecards Like Standard Reports Summarized Easy to Read Graphical Data Warehouse Data Warehouse Management Data Warehouse Administrator Visibility Knowledge of what is happening Loading, Extracting, Etc. Data Warehouse Management Responsibilities What is data being used for? Who is using the data? What is the response time? What kinds of activities are being submitted? Data Warehouse Management Data Management (Data Monitor) Counts of Data Analyzes Profiles of Data Determines Threshold of Data Makes DWA’s job easier Data Warehouse Management Security Confidential? Access Restriction Encryption Decryption Pros One Common Data Source Ease of Reporting and Analysis Inconsistencies Identified Right Away Retrieve Data Without Slowing Down Operational Systems Show Actual Performance Cons Data Is Not Always Most Current High Costs Installation Maintenance Support Hard To Stay Ahead Summary Data Warehousing can prove to be a success Common Knowledge of a database technology As technology and research continues, the possibilities are endless Questions?