Data Warehousing
Willem Visser
RW334

Somebody is watching!
• Everybody seems to be recording your every move
• Loyalty cards
• Cookies
  – Facebook, Twitter, …
  – Check out the Collusion plug-in for Firefox
• They want to know how to market to you
• The same is true in business
  – Know your data, know your business

Data Warehousing
• Integrated repository of data to understand your business
• Separate from the operational database
• Supports decision making
• Subject oriented
• Time variant
• Non-volatile

Features
• Only the data necessary to allow modeling and decision making
• Coming from potentially many sources
• Time component
  – Even though operationally there might not be one
• Data doesn't change after loading
  – No operational updates
  – Periodic refresh

Operational vs Warehouse
• Operational
  – Optimized for online transaction processing (OLTP)
• Warehouse
  – Optimized for online analytical processing (OLAP)
  – Complex queries
  – Very large volumes

Data cube
• Multi-dimensional data
  – Not just 3D (but mostly shown as such)

Lattice of Cuboids
• 0-D (apex) cuboid: all
• 1-D cuboids: product; date; country
• 2-D cuboids: product, date; product, country; date, country
• 3-D (base) cuboid: product, date, country
How much of the cube is materialized before the query:
• Full (complete cuboid)
• None (materialized on the fly)
• Partial
Slide by Dr. Hany Saleeb

OLAP
• "Querying and presenting text and numeric data from data warehouses in a dimensional cube-style"
  – Slicing a dimension: per region, per product, per period
  – Drill-down: Country → Region → Town → Suburb
  – Drill-up, drill-around, etc.
  – Visualization
Slide by Cor Winkler

Multi-Tiered Architecture
[Figure: multi-tiered architecture, with the following tiers]
• Data Sources: operational DBs, other sources
  – Extract, Transform, Load, Refresh into data storage
• Data Storage: data warehouse, data marts, metadata, monitor & integrator
  – Serve the OLAP server
• OLAP Engine: OLAP server
• Front-End Tools: analysis, query, reports, data mining
Slide by Dr. Hany Saleeb

Steps
• Data extraction
  – Get data from multiple, heterogeneous, and external sources
• Data cleaning
  – Detect errors in the data and rectify them when possible
• Data transformation
  – Convert data from legacy or host format to warehouse format
• Load
  – Sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
• Refresh
  – Propagate the updates from the data sources to the warehouse

Star-Schema Example
• SALES FACT TABLE: time_key (FK), product_key (FK), store_key (FK), promo_key (FK), dollars, units, cost
• TIME DIMENSION: time_key (PK), SQL_date, day_of_week, week_number, month
• STORE DIMENSION: store_key (PK), store_ID, store_name, address, district, region
• PRODUCT DIMENSION: product_key (PK), SKU, description, brand, category, package_type, size, flavor
• PROMOTION DIMENSION: promotion_key (PK), promotion_name, promotion_type, price_treatment, ad_treatment, display_treatment, coupon_type

Sample report from the star schema:

District   Brand        Total Dollars   Total Cost   Gross Profit
Atherton   Clean Fast   $ 1,233         $ 1,058      $ 175
Atherton   More Power   $ 2,239         $ 2,200      $ 39
Atherton   Zippy        $ 848           $ 650        $ 198
Belmont    Clean Fast   $ 2,097         $ 1,848      $ 249
Belmont    More Power   $ 2,428         $ 2,350      $ 78
Belmont    Zippy        $ 633           $ 580        $ 53
Slide by Cor Winkler

Data Warehouse Usage
• Three kinds of data warehouse applications
  – Information processing
    • Supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
  – Analytical processing
    • Multidimensional analysis of data warehouse data
    • Supports basic OLAP operations: slice-dice, drilling, pivoting
  – Data mining
    • Knowledge discovery from hidden patterns
    • Supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
• Differences among the three tasks
Slide by Dr. Hany Saleeb

Information Dashboards
Slide by Cor Winkler

Information Exploitation
Business Intelligence (BI)
[Figure: BI stages; axis labels: value of decision, complexity →, ← # of users]
• Reporting: What happened?
  – Historical info
• Analysis: Why it happened?
  – Dynamic slice & dice
• Data Mining: What might happen?
  – Obscure data relationships and trends
• Intelligent agents: Make it happen!
  – Automatic response to business triggers
Slide by Cor Winkler
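
The lattice of cuboids from the Data cube slides can be enumerated mechanically: each cuboid is one subset of the dimensions. The sketch below is only an illustration (the function name `cuboid_lattice` is made up for this example) and uses the three dimensions named on the slide.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Group every subset of the dimensions by its size (0-D apex up to the base cuboid)."""
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions) + 1)
    }


# Dimensions taken from the slide; the helper name is hypothetical.
lattice = cuboid_lattice(["product", "date", "country"])
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)

# With n dimensions (and no hierarchies) there are 2**n cuboids in total.
total_cuboids = sum(len(cuboids) for cuboids in lattice.values())
```

Full materialization would precompute all of these cuboids, no materialization would compute each one on demand, and partial materialization would precompute only a chosen subset.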
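
To make the OLAP operations concrete, here is a minimal sketch of slicing and of drilling up and down a location hierarchy. The fact rows, the product names, and the helper names are all invented for illustration; a real OLAP server would do this over the warehouse, not over an in-memory list.

```python
from collections import defaultdict

# Hypothetical facts: (country, region, town, product, units). Illustrative data only.
FACTS = [
    ("ZA", "Western Cape", "Stellenbosch", "tea", 5),
    ("ZA", "Western Cape", "Paarl", "tea", 3),
    ("ZA", "Gauteng", "Pretoria", "coffee", 4),
    ("ZA", "Western Cape", "Stellenbosch", "coffee", 2),
]
LEVELS = ("country", "region", "town")  # coarse to fine


def aggregate(facts, level):
    """Total units grouped by the location hierarchy down to `level`."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in facts:
        totals[row[:depth]] += row[4]
    return dict(totals)


def slice_product(facts, product):
    """Slice: keep only the facts for one value of the product dimension."""
    return [row for row in facts if row[3] == product]


by_town = aggregate(FACTS, "town")        # drilled down to the finest level
by_region = aggregate(FACTS, "region")    # drilled up one level
tea_by_region = aggregate(slice_product(FACTS, "tea"), "region")
```

Drilling up is the same aggregation with a shorter grouping key, which is why a coarser cuboid can be derived from a finer one.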
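
The Steps slide (extract, clean, transform, load, refresh) can be sketched as a small pipeline. Everything below is an assumption made for illustration: the two source formats, the field names, and the in-memory dictionary standing in for the warehouse.

```python
from collections import defaultdict

# Hypothetical raw rows from two heterogeneous sources (illustrative data only).
SOURCE_A = [
    {"sku": "a1", "amount": "12.50", "day": "2013-03-01"},
    {"sku": "a1", "amount": "bad", "day": "2013-03-01"},  # unparseable amount
]
SOURCE_B = [{"SKU": "A1", "total_cents": 700, "date": "2013-03-02"}]


def extract():
    """Extraction: gather rows from every source, tagged with their origin."""
    return [("A", r) for r in SOURCE_A] + [("B", r) for r in SOURCE_B]


def clean(tagged_rows):
    """Cleaning: drop source-A rows whose amount is not a number."""
    kept = []
    for origin, row in tagged_rows:
        if origin == "A":
            try:
                float(row["amount"])
            except ValueError:
                continue
        kept.append((origin, row))
    return kept


def transform(tagged_rows):
    """Transformation: convert each source format to one warehouse format."""
    out = []
    for origin, row in tagged_rows:
        if origin == "A":
            out.append({"sku": row["sku"].upper(), "dollars": float(row["amount"]), "date": row["day"]})
        else:
            out.append({"sku": row["SKU"].upper(), "dollars": row["total_cents"] / 100, "date": row["date"]})
    return out


def load(warehouse, rows):
    """Load: append sorted rows (no in-place updates) and recompute a summary view."""
    warehouse["facts"].extend(sorted(rows, key=lambda r: r["date"]))
    summary = defaultdict(float)
    for r in warehouse["facts"]:
        summary[r["sku"]] += r["dollars"]
    warehouse["dollars_by_sku"] = dict(summary)
    return warehouse


warehouse = load({"facts": []}, transform(clean(extract())))
```

In this sketch a refresh would simply be another call to `load` with newly extracted rows, which appends to the facts and rebuilds the summary.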
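
The sample report on the Star-Schema Example slide is the kind of result a join between the fact table and its dimensions produces. The sketch below uses Python's built-in `sqlite3` module with a cut-down version of the schema: only the store and product dimensions, only the `dollars` and `cost` measures. It also assumes one store per district and one fact row per district and brand, neither of which the slide states; the store names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT, district TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact  (
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    dollars     INTEGER,
    cost        INTEGER
);
""")

# Store names are hypothetical; districts and brands come from the slide.
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                 [(1, "Store 1", "Atherton"), (2, "Store 2", "Belmont")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "Clean Fast"), (2, "More Power"), (3, "Zippy")])

# Assumption: one fact row per (product, store), carrying the slide's totals.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 1233, 1058), (2, 1, 2239, 2200), (3, 1, 848, 650),
    (1, 2, 2097, 1848), (2, 2, 2428, 2350), (3, 2, 633, 580),
])

# Join the fact table to its dimensions, then group by the dimension attributes.
report = conn.execute("""
    SELECT s.district, p.brand,
           SUM(f.dollars)               AS total_dollars,
           SUM(f.cost)                  AS total_cost,
           SUM(f.dollars) - SUM(f.cost) AS gross_profit
    FROM sales_fact f
    JOIN store_dim   s ON f.store_key   = s.store_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY s.district, p.brand
    ORDER BY s.district, p.brand
""").fetchall()

for row in report:
    print(row)
```

The fact table holds only keys and numeric measures, while descriptive attributes such as `district` and `brand` live in the dimension tables, so the same facts can be regrouped by any dimension attribute without changing the stored data.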