CS 338 Data Warehousing and Business Analytics
Bojana Bislimovska
Winter 2016
Major research

Outline
• Terminology
• Data Warehouse Characteristics
• Comparison with Relational DBs
• Data Models for Data Warehouses
• Data Warehouse Functionality

Data Warehouse Terminology
• Data Warehouse
    Collection of information originating from multiple databases
    Allows complex analysis, knowledge discovery, and decision making based on historical data
• Supported applications
    OLAP (Online Analytical Processing): analysis of complex data from the data warehouse
      • Enables quick and straightforward querying of analytical data stored in data warehouses
    DSS (Decision-Support Systems)
      • Also known as EIS (Executive Information Systems)
      • Provides data and tools for complex decision-making
    Data mining
      • Knowledge discovery: searching data for unanticipated new knowledge

Data Warehouse Terminology
• Data Warehousing: collection of decision-support technologies aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions
• Online Transaction Processing (OLTP): supported by traditional DBs
    Includes data modifications
    Query requirements
• OLTP cannot be optimized for OLAP, DSS or data mining

Data Warehouse Characteristics
• Information in a data warehouse is typically not subject to modification
    Periodic updates: data are refreshed incrementally
• Warehouse insertions are handled by the ETL (extract, transform, load) process
    Reformatting of data before loading them into the warehouse
• Encompasses large volumes of data: an order of magnitude larger than the source DBs

Data Warehouses vs Relational DBs
• Operations
    Data warehouses are optimized to find data correlations and to support trend analyses
    Traditional databases are transactional: optimized for access, update, and integrity assurance
    Data warehouses are less volatile than relational DBs
• Data currency
    Relational DBs are required to maintain up-to-date, detailed data
    Data warehouses are characterized by historical data
    Information in a data warehouse is relatively coarse-grained (view from … ft.), and the refresh policy is carefully chosen, usually incremental
• Data volume
    Data warehouses may be exceptionally large (7 years of records)
• A data warehouse can be interpreted as a (special) view of the data

Classification of Data Warehouses
• The sheer volume of data is an issue; based on it, data warehouses can be classified as follows
    Enterprise-wide data warehouses
      • Huge projects requiring massive investment of time and resources
    Virtual data warehouses
      • Provide views of operational databases that are materialized for efficient access
    Logical data warehouses
      • Use data federation, virtualization and distribution techniques
    Data marts
      • Generally targeted to a subset of the organization, such as a department, and are more tightly focused

Data Modeling for Data Warehouses
• Traditional DBs generally represent data in two dimensions
    Rows and columns of a relational model
    Spreadsheets
• Data warehouses are usually multidimensional
    Data are stored in data cubes (hypercubes for more than three dimensions)
    Query performance can be better than in the relational model
    Direct querying of data in any combination of dimensions

Data Modeling for Data Warehouses
The multidimensional model involves two types of tables:
• Dimension table: consists of tuples of attributes of the dimension
• Fact table: contains tuples, one per recorded fact
    Each fact contains some measured (observed) variables and identifies them with pointers to dimension tables

Data Modeling for Data Warehouses
Two common multidimensional schemas
• Star schema: consists of a fact table with a single table for each dimension
• Fact constellation: set of fact tables that share some dimension tables

Data Modeling for Data Warehouses
Two common multidimensional schemas
• Snowflake schema
    Variation on the star schema
    Dimension tables from a star schema are organized into a hierarchy by normalizing them

Data Warehouse Functionality
• Influenced by SQL and spreadsheets
• Aggregate a measure over one or more tables
    Examples: find total sales, find total sales by region, find the top-5 most sold products
• Roll-up: summarizes data with increasing generalization
    Given total sales by city, can roll up to get the total sales by country
• Drill-down: reveals increasing levels of detail (the inverse of roll-up)
    Given total sales by country, can drill down to get the total sales by city
    Can also drill down on a different dimension to get total sales by product for each country

Roll-up vs Drill-down
[Figure: a three-dimensional data cube with axes Product (P123, P124, P125, P126, ...), Fiscal Quarter (Qtr 1 to Qtr 4) and Region (Reg 1 to Reg 3). Roll-up collapses it into a two-dimensional model of Product by Region (REG1 to REG3); drill-down goes the other way.]

Data Warehouse Functionality
• Pivoting (rotation): changing from one dimensional hierarchy to another
[Figure: pivot]

Data Warehouse Functionality
• Slice and dice: reduction of data into smaller chunks so that information is made visible from multiple points of view
• Sorting: data are sorted by ordinal value
• Selection: data are filtered by value or range
• Derived (computed) attributes: computed by operations on stored and derived values
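
Example (not part of the original slides): the star schema and the roll-up and drill-down operations described above can be tried out on a tiny in-memory database. This is only a minimal sketch using Python's standard sqlite3 module. The table names (fact_sales, dim_location, dim_product), the helper total_sales_by, and all sample rows are assumptions made up for illustration; a real warehouse would be loaded through ETL and would be far larger.

```python
import sqlite3

# In-memory database holding a toy star schema (all names and rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one table per dimension
    CREATE TABLE dim_location (loc_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE dim_product  (prod_id INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: one tuple per recorded fact, with pointers to the dimension tables
    CREATE TABLE fact_sales (
        loc_id  INTEGER REFERENCES dim_location(loc_id),
        prod_id INTEGER REFERENCES dim_product(prod_id),
        amount  INTEGER
    );

    INSERT INTO dim_location VALUES
        (1, 'Toronto', 'Canada'), (2, 'Waterloo', 'Canada'), (3, 'Boston', 'USA');
    INSERT INTO dim_product VALUES (1, 'P123'), (2, 'P124');
    INSERT INTO fact_sales VALUES
        (1, 1, 100), (1, 2, 50), (2, 1, 30), (3, 1, 70), (3, 2, 20);
""")

# Dimension attributes that may be used for grouping (whitelist, since the
# attribute names are spliced into the SQL text below).
ALLOWED = {"city", "country", "name"}


def total_sales_by(*attributes):
    """Aggregate the sales measure, grouped by the given dimension attributes."""
    if not attributes or set(attributes) - ALLOWED:
        raise ValueError(f"attributes must be chosen from {sorted(ALLOWED)}")
    cols = ", ".join(attributes)
    query = f"""
        SELECT {cols}, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_location AS l ON f.loc_id = l.loc_id
        JOIN dim_product  AS p ON f.prod_id = p.prod_id
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(query).fetchall()


# Detailed level: total sales by city.
by_city = total_sales_by("city")

# Roll-up: generalize from city to country.
by_country = total_sales_by("country")

# Drill-down on a different dimension: sales by product for each country.
by_country_product = total_sales_by("country", "name")

print(by_city)
print(by_country)          # [('Canada', 180), ('USA', 90)]
print(by_country_product)
```

Going from by_country back to by_city is the drill-down direction. Selection (slicing) would correspond to adding a WHERE clause on a dimension attribute, which this sketch leaves out to stay short.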