Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Dimensional Model vs. ER Model • ER(Entity-Relationship) Modeling – – – – “to take redundancy out of data” one-to-many or many-to-many relatioship normalized table simplifies update and insert operations many tables make complex query difficult • Dimensional Model – designed for easy queries and navigation – produce few tables and keys compared to ER Model Star Join Schema (Retail Store) Schema not Recommended (Retail Store) Converted Tables (Retail Store) • SalesFact(time_key, product_key, store_key, promotion_key, dollars_sold, units_sold, dollars_cost) – 4 foreign composite keys • Time(time_key, day_of_week, month, quarter, year, holiday_flag) – time hierarchy • Product(product_key, description, brand, category) • Store(store_key, store_name, address, floor_plan_type) • Promotion(promotion_key, promotion_name) Data Warehouse Query • Find all the products that were sold in ‘4th Q 1995’ • • • • • • • – Brand Dollar Sales Unit Sales – Axon 780 263 – Widget 213 444 – …. Select p.brand, sum(f.dollars_sold), sum(f.units_sold) from SaleFact f, Product p, Time t where f.product_key = p.product_key and f.time_key = t.time_key and t.quarter = 4 and t.year = 1995 group by p.brand order by p.brand Grocery Store Case (1) • Time Dimension -> Time Table – – – – – – – – – – – – – time_key day_of_week day_member_in_month day_member_overall week_member_in_year week_member_overall quarter fiscal_period holiday_flag weekday_flag last_day_in_month_flag season : Christmas, Thanksgiving, Easter, Valentine’s Day, July 4th event : Super Bowl Sunday, Labor Strike, Hurricane Hugo, etc.. Grocery Case (2) • Product Dimension -> Product table – – – – – – – – – – – – – – product_key SKU_description -- Green 3-pack Brawny Paper Towels SKU_number -- UPC # xxxxxxxxxxxxxx package_size -- 3-pack brand -- Brawny subcategory -- Paper Towels category -- Paper department -- Grocery (Bakery, Photo, Drug, etc.) weight weight_unit_of_measure units_per_retail_case units_per_shipping_case cases_per_pallet shelf_width, shelf_height, self_depth, etc... Grocery Case (3) • Store Dimension -> Store Table – – – – – – – – – – – – – – store_key store_name store_number store_street_address store_city store_county store_state store_zip store_district store_region store_manager store_phone floor_plan_type, first_opened_date, last_remodeled_date, … store_sqft, grocery_sqft, frozen_sqft, meat_sqft, etc…. Grocery Case (4) • Promotion Dimension -> Promotion Table – – – – – – – – – – – promotion_key promotion_name price_reduction_type ad_type display_type coupon_type ad_media_name display_provider promotion_cost promo_begin_date promo_end_date, etc... Grocery Case (5) • Grocery Store Fact --> SaleFact Table – Units_sold, dollars_sold, dollars_cost, customer_count – several formulas • gross profit • several ratios like gross margin • Please take a look at grocery_new.mdb – unzip “grocery_new.zip” from class web References • Ralph Kimbell – “The Data Warehouse Toolkit: How to Design Dimensional Data Warehouse”, John Wiley, 1996 • Ralph Kimbell’s DBMS article – http://www.dbmsmag.com/9510d05.html – http://www.rkimball.com/ – http://pwp.starnetinc.com/larryg/index.html OLAP(Online Analytic Processing) • Applies to all analytic functions • Steps of OLAP – – – – – Interface : get user’s selection Query :generate SQL code Process : data analysis on the result from query Format :properly label rows and columns Display : report and chart OLAP Capabilities • • • • Query and Reporting Multidimensional Analysis Statistical analysis (user-directed) *extra Data Mining(agent processing) *extra – * extra is functionality added recently – DW/OLAP/DM overlap each other Multidimensional Analysis • Let users start from single dimension and navigate to other dimensions – slice, dice • navigating hierarchy of dimension – drill-down/drill-up/drill-across • Involves massive calculations – How have advertising expenses affected sales? – Which products should be discontinued? Statistical Analysis • SAS and SPSS want to play BIG roles • Calculating average, regression, correlation, factoring, clustering, etc. • Example, relationship of “product sales” to “price”, “distribution”, “weather” Data Mining • • • • Uses same technique as statistical analysis Adds neural network/learning algorithms computer program directed New market – IBM Intelligent Miner – Information Discovery – DataMind Pivot Table • MS Excel provides a limited multidimensional data base capability – usually works with Excel Chart – drill down/up is very limited – users have to prepare one table which is a result of joining Fact and several dimension tables – PivotTable = page/Table(row/column) OLAP using Access & Excel • Use SQL to prepare tables from Access – By Joining several tables(dimension + fact) – Learn SQL enough to join several tables – Please take a look at handout on SQL or DB book • Use Pivot table from Excel for browsing – Source of Pivot table is joined table – Pivot table provides limited navigation Grocery Case • Use Grocery_new.mdb • Generate Pivot table which let us do the following analysis Want to do analysis on dollars sales by date(day/month), category, store_name on all the transaction which belongs to year 1998