Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Decision Support and Date Warehouse Jingyi Lu Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform, and Load (ETL) Decision Support System Information technology to help the knowledge worker(executive, manager, analyst) make faster and better decisions. – What were the sales volumes by region and product category for the last year? – Which orders should we fill to maximize revenues? – Will a 10% discount increase sales volume sufficiently? Decision Support Systems Created to facilitate the decision making process So much information that it is difficult to extract it all from a traditional database Need for a more comprehensive data storage facility Data Warehouse Decision Support Systems Extract Information from data to use as the basis for decision making Used at all levels of the Organization Tailored to specific business areas Ad Hoc queries to retrieve and display information Combines historical operation data with business activities Decision Support Systems OLAP vs. OLTP OLTP (On-line Transaction Processing): is characterized by a large number of short on-line transactions .-----> Operational database OLAP (On-line Analytical Processing):is characterized by relatively low volume of transactions. Queries are often very complex and involve aggregations.------> Data Warehouse OLAP vs. OLTP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date detailed, flat relational isolated repetitive historical, summarized, multidimensional integrated, consolidated ad-hoc lots of scans unit of work read/write index/hash on prim. key short, simple transaction # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response usage access complex query What is a Data Warehouse The repository for the DSS is the DATA WAREHOUSE Definition: Integrated, Subject-Oriented, Time-Variant, Nonvolatile database that provides support for decision making. Integrated The data warehouse is a centralized, consolidated database that integrated data derived from the entire organization Multiple Sources Diverse Sources Diverse Formats Subject-Oriented Data is arranged and optimized to provide answer to questions from diverse functional areas Data is organized and summarized by topic Sales / Marketing / Finance / Distribution / Etc. Time-Variant The Data Warehouse represents the flow of data through time Can contain projected data from statistical models Data is periodically uploaded then timedependent data is recomputed Nonvolatile Once data is entered it is NEVER removed Represents the company’s entire history Near term history is continually added to it Always growing Must support terabyte databases and multiprocessors Read-Only database for data analysis and query processing Dimensional Modeling Dimension dimension is a data element that categorizes each item in a data set into non-overlapping regions Facts a value or measurement, which represents a fact about the managed entity or system. typically numeric values that can be aggregated Dimensional Modeling Database is a set of facts (points) in a multidimensional space Fact tables contains business facts or measures and foreign keys which refer to primary keys in the dimension tables Dimension tables Each dimension table has a set of attributes e.g., Day, Month, Year of Date Attributes of a dimension may be related by partial order Hierarchy: e.g., Day > Month > Year Example of Star Schema Example of Snowflake Schema ETL ETL Extraction Transformation Loading – ETL To get data out of the source and load it into the data warehouse – simply a process of copying data from one database to other Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded into the data warehouse database Many data warehouses also incorporate data from nonOLTP systems such as text files, legacy systems, and spreadsheets; such data also requires extraction, transformation, and loading When defining ETL for a data warehouse, it is important to think of ETL as a process, not a physical implementation ETL ETL is often a complex combination of process and technology that consumes a significant portion of the data warehouse development efforts and requires the skills of business analysts, database designers, and application developers It is not a one time event as new data is added to the Data Warehouse periodically – monthly, daily, hourly Because ETL is an integral, ongoing, and recurring part of a data warehouse Automated Well documented Easily changeable ETL Staging Database ETL operations should be performed on a relational database server separate from the source databases and the data warehouse database Creates a logical and physical separation between the source systems and the data warehouse Minimizes the impact of the intense periodic ETL activity on source and data warehouse databases