Download Chapter25 - members.iinet.com.au

Chapter #25 – Data Warehousing

Online Transaction Processing (OLTP) applications
• DBMSs are widely used by organisations to maintain the data that documents their everyday operations (operational data)
• Applications typically run transactions that each make small changes, and a large number of transactions must be processed reliably and efficiently
• DBMSs have been optimised extensively to perform well in such applications

Decision support applications
• Current and historical data is comprehensively analysed and explored to identify useful trends and to create summaries of the data that support high-level decision making
• DBMS vendors are adding features to their products to support such applications, in particular complex queries
  o New constructs
  o Novel indexing
  o Query optimisation techniques
• Views are used extensively in applications involving complex data analysis

Use of views
• Precomputing the result of a view definition can make queries run much faster

Data warehouse
• An organisation can consolidate information from several databases into a data warehouse by copying tables from many sources into one location
• A warehouse holds data from multiple databases
• The data is drawn from several databases maintained by different business units, together with historical and summary information
• Gives a comprehensive view of all aspects of an enterprise

Three classes of analysis tools available
1. Online Analytic Processing (OLAP)
  o Supports a class of stylised queries that involve group-by and aggregation operators
  o Provides excellent support for complex Boolean conditions, statistical functions and features for time-series analysis
  o Applications dominated by such queries are called OLAP applications
  o Supports a querying style in which the data is best thought of as a multidimensional array
2. DBMSs optimised for decision support applications
  o DBMSs that support traditional SQL-style queries but are designed to also support OLAP queries efficiently
  o Vendors of relational DBMSs are enhancing their products in this direction
3. Exploratory data analysis
  o Motivated by the desire to find interesting or unexpected trends and patterns in large data sets, rather than by the complex query characteristics of the previous two classes
  o The amount of data in many applications is too large to permit manual analysis or even traditional statistical analysis
  o The goal of data mining is to support exploratory analysis over very large data sets

Data warehousing
• OLAP or data mining queries over distributed data are likely to be slow
• For such complex analysis, which is often statistical in nature, it is not essential that the most current version of the data is used
• Data warehousing is the creation of a centralised repository of all the data
• The availability of a data warehouse facilitates the application of OLAP and data mining tools (analysis tools)

OLAP: Multidimensional data model
• OLAP applications are dominated by complex queries involving group-by and aggregation operators
• OLAP queries use a multidimensional data model
• The focus is on a collection of numeric measures
• Each measure depends on a set of dimensions
• Example:
  o The measure attribute is sales
  o The dimensions are Product (pid), Location (locid) and Time (timeid)
  o Given a product, a location and a time, there is one associated sales value
  o Think of the sales information as being arranged in a 3-D array Sales
• In OLAP applications, the bulk of the data can be represented in such a multidimensional array

Multidimensional OLAP (MOLAP)
• OLAP systems that use arrays to store multidimensional datasets

Representation using relations (fact tables)
• A multidimensional array can also be represented by a relation
• The relation that relates the dimensions (product, location and time) to the measure of interest (sales) is called a fact table

Dimensions
• Each dimension can have a set of associated attributes
  o e.g. the location dimension is identified by locid and has the attributes country, state and city
• Each dimension can be structured as a hierarchy
• Information about dimensions can also be represented by relations: locations(locid: integer, city: string, state: string, country: string)
  o These relations are much smaller than the fact table
  o They are called dimension tables

Relational OLAP (ROLAP)
• OLAP systems that store all information, including fact tables, as relations

Multidimensional Database Design
• The tables in a ROLAP system suggest a star schema
  o Centred at the fact table (Sales)
  o A combination of the fact table and the dimension tables
• The star schema pattern is very common in databases designed for OLAP
• The bulk of the data is typically in the fact table
  o The fact table has no redundancy (it is usually in BCNF)
• Information about dimension values is maintained in the dimension tables
• The size of a database used for OLAP is dominated by the fact table (Sales)
• Small response times for interactive querying are important in OLAP
• New storage structures and indexing techniques have been developed to support OLAP

Creating and maintaining a warehouse
• Since the source databases are often created and maintained by different groups, there are a number of semantic mismatches across these databases
  o Different names for the same attributes, differences in how tables are normalised and structured
  o These differences must be reconciled when the data is brought into the warehouse
• Extracted:
  o Data is extracted from operational databases and external sources
• Cleaned:
  o Data is cleaned to minimise errors and to fill in missing information if possible
• Transformed:
  o Data is transformed to reconcile semantic mismatches
  o Accomplished by defining a relational view over the tables in the data sources
• Loading:
  o Loading the data consists of materialising such views and storing them in the warehouse
  o Involves sorting and the generation of summary information
  o Data is partitioned and indexes are built for efficiency
  o A very slow process
• Refresh:
  o After data is loaded into a warehouse, the data in the warehouse must be periodically refreshed to
    • reflect updates to the data sources
    • purge old data
• Metadata repository:
  o An important task in maintaining a warehouse is keeping track of the data currently stored in it (bookkeeping)
  o Done by storing information about the warehouse data in the system catalogs
    • The catalogs are typically very large and are often stored and managed in a separate database called the metadata repository
  o The size and complexity of the catalogs is due to
    • the size and complexity of the warehouse itself
    • the amount of administrative information that must be maintained

Data Warehousing Architecture
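
Worked example: star schema and an OLAP-style query. The notes describe a Sales fact table keyed by pid, locid and timeid, plus a locations dimension table. The sketch below builds a tiny version of that star schema and runs a group-by/aggregation query over it. It is only an illustration under stated assumptions: the use of Python's standard sqlite3 module, the sample rows, the helper names sales_by_state and as_cube, and the city-to-state roll-up are all choices made for this example and do not come from the notes.

```python
import sqlite3

# Hypothetical sample data. Column names follow the notes
# (pid, locid, timeid, sales); the values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Locations (          -- dimension table (small)
        locid   INTEGER PRIMARY KEY,
        city    TEXT,
        state   TEXT,
        country TEXT
    );
    CREATE TABLE Sales (              -- fact table (bulk of the data)
        pid    INTEGER,
        locid  INTEGER REFERENCES Locations(locid),
        timeid INTEGER,
        sales  INTEGER,
        PRIMARY KEY (pid, locid, timeid)
    );
""")
conn.executemany(
    "INSERT INTO Locations VALUES (?, ?, ?, ?)",
    [(1, "Perth", "WA", "Australia"),
     (2, "Fremantle", "WA", "Australia"),
     (3, "Sydney", "NSW", "Australia")],
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [(11, 1, 1, 25), (11, 2, 1, 8), (11, 3, 1, 15),
     (12, 1, 1, 30), (12, 3, 2, 20)],
)


def sales_by_state(connection):
    """Join the fact table to the dimension table and total sales per state."""
    rows = connection.execute("""
        SELECT L.state, SUM(S.sales)
        FROM Sales S JOIN Locations L ON S.locid = L.locid
        GROUP BY L.state
        ORDER BY L.state
    """)
    return rows.fetchall()


def as_cube(connection):
    """View the fact table as a sparse array keyed by (pid, locid, timeid)."""
    rows = connection.execute("SELECT pid, locid, timeid, sales FROM Sales")
    return {(pid, locid, timeid): sales for pid, locid, timeid, sales in rows}


state_totals = sales_by_state(conn)
cube = as_cube(conn)
```

Here state_totals holds one total per state, and cube[(11, 1, 1)] looks up the single sales value for one product, location and time, which is the multidimensional-array view of the same fact table.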

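Worked example: transform, load and refresh. The notes say that transformation is accomplished by defining a relational view over the source tables, and that loading materialises such views in the warehouse. The following sketch shows one way this could look, again using Python's standard sqlite3 module. The source tables east_orders and west_sales, the view unified_sales, the table warehouse_sales and the refresh helper are hypothetical names invented here. Cleaning is simplified to dropping rows with a missing measure, and refresh is a full rebuild; a real warehouse might fill in missing values and refresh incrementally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Extract: two hypothetical operational sources that use
    -- different names for the same attributes.
    CREATE TABLE east_orders (prod_id INTEGER, store INTEGER,
                              day INTEGER, amount INTEGER);
    CREATE TABLE west_sales  (pid INTEGER, locid INTEGER,
                              timeid INTEGER, sales INTEGER);

    INSERT INTO east_orders VALUES (11, 1, 1, 25), (12, 1, 1, NULL);
    INSERT INTO west_sales  VALUES (11, 3, 1, 15);

    -- Transform: a relational view reconciles the naming mismatch.
    -- Clean (simplified): rows with a missing measure are dropped.
    CREATE VIEW unified_sales AS
        SELECT prod_id AS pid, store AS locid, day AS timeid, amount AS sales
        FROM east_orders WHERE amount IS NOT NULL
        UNION ALL
        SELECT pid, locid, timeid, sales
        FROM west_sales WHERE sales IS NOT NULL;

    -- Load: materialise the view as a warehouse table and index it.
    CREATE TABLE warehouse_sales AS SELECT * FROM unified_sales;
    CREATE INDEX idx_warehouse_sales_pid ON warehouse_sales (pid);
""")


def refresh(connection):
    """Full refresh: rebuild the warehouse table from the view."""
    with connection:  # commit both statements together, roll back on error
        connection.execute("DELETE FROM warehouse_sales")
        connection.execute(
            "INSERT INTO warehouse_sales SELECT * FROM unified_sales"
        )


# Contents of the warehouse straight after the initial load.
loaded = conn.execute(
    "SELECT pid, locid, timeid, sales FROM warehouse_sales ORDER BY pid, locid"
).fetchall()

# A source changes after the load; the warehouse only sees it after a refresh.
conn.execute("INSERT INTO west_sales VALUES (12, 3, 2, 20)")
refresh(conn)
refreshed_count = conn.execute(
    "SELECT COUNT(*) FROM warehouse_sales"
).fetchone()[0]
```

Because the warehouse table is a copy of the view's result rather than the view itself, queries against it do not touch the sources, which is why a periodic refresh step is needed.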