Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Adv. DBMS & DW Chapter 7: DW for a large Bank Hachim Haddouti Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 General Motivation for projects • • • • • • • • • Forecasting Comparing performance of units Monitoring, detecting fraud Visualization Conducted survey to see what customers were interested in new model car Want to select customers for advertising campaign Selecting according to demographic parameters Mining of mobile phone calls with schema ( 7 dimensions, geographic location can be determined with a precision of 100 m, etc.) Call Center Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Bank • • • • • Datawarehouse for a large Bank with objective to build a household DW where we can track all the accounts owned, to see all the individual account holders and residential ad commercial groupings contrast to tangible product businesses (value chain) portfolio of services: checking/savings accts, loans, cards, etc. goal: market effectively to households with existing accounts requirements: 1. 2. 3. 4. 5. 6. 7. 5 years data on each account, by end-of-month snapshot valid snapshot as of yesterday for the current month primary balance/account type. group kinds of accounts to compare primary balances custom dimension attr, numeric facts/account type account -> household. Accounts, owners come & go several x/year/household differing records of individual names & addresses over years demographic information, by individuals and households; behavior scores/activity Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Bank core fact table • dimensions: account, household, branch, product, status, time (p 110) • Grain by fact/month; include primary balance, transaction counts, etc. • Why separating Account and Household dimensions, although correlated? Product and Account? Dirty dimensions “..a typical bank is doing well if it can find more than 80% of the actual instances where the same individual has multiple accounts.” if individual account holder were extracted as a separate dimension, it would have many duplicates and extraneous entries.(actually, household is usually equally dirty) Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Bank • Semiadditive account balances (over Time dim) • PD: « Average period balances in financial data warehouses and in inventory data warehouses can be calculated by generalizing the SQL AVG function to instead compute Average Period Sum. Until the DBMS vendors provide the functionality, Average Period Sum must be computed in the end user’s application. ». Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Heterogeneous products • highly varied nature of financial services -- would makes dimension with many attributes (p 112), because of heterogeneity of products • DP: “In data warehouses where a dimension must describe a large number of heterogeneous items, the recommended technique is to create a core fact table and a core dimension table in order to allow queries to cross the disparate types, and to create a custom fact table and a custom dimension table for querying each individual type in depth.” (Fig 7.3 p 114) • every core fact table entry is expanded in just one custom fact table entry (custom fact table as tail of respective records in a core table) • Why duplicating Primary Balance and Transaction Account in each custom fact table? •DP: »The primary core facts should be duplicated in the custom fact tables. This virtually eliminates the need to access two fact tables in a single query in a heterogeneous product schema.” This makes sense only if the num o f the core facts is small. •Using big-dimension techniques for Household and Account dim (minidimensions) Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Chapter 8: Subscription Businesses • Subscription Businsess: Cable TV supplier • Issue: relationship between receipt of money and counting it as income is complicated ( all pay-in-advance biz such as insurance, publisher, ) • Subscription transactions: Modeling a large metropolitan cable TV supplier with more than 1M customers. • Transactions for cable television include: • pen/change account; • purchase/upgrade/renew package, • purchase PPV • cancel package (with reason) • cancel ppw (with reason) • downgrade package (with reason), • refund purchase (with reason) • close account (with reason) Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Chapter 8: Subscription Businesses • Marketing e.g. want to fetch no of new subscribers monthly, no of renewers of packages, canceler or downgrader and why, if the promotion was profitable or not. Operations wants to see what call load is, call traffic in order to plan staffing..CEO wants to know revenue each month, slicing and dicing revenue numbers by customer, package, by promotion, by boradcast time, average minute of PayPerView • Dimensions: transaction entry date/time, effective data/time, customer, sales rep, product, promotion, transaction (Fig 8.1, p 119) • Grain: Every sales transaction • only one fact, amount (meaning variable by transaction) • date granularity = day • customer dimension cleaner than in Bank (household and account issue). Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Chapter 8: Subscription Businesses • • Payments in advance booked as liability, not asset -- must be paid back if service not delivered (ie, income earned) • earned income calculations -- may depend on days in specific month; may overlap year end, or beginning of database, complicated upgrade/downgrade transactions Transaction-grained fact table not practical for calculating earned income. PD: “Pay-in-advance business scenarios typically require the combination of a • • transaction-grained fact table as well as a monthly-snapshot-grained f act table in order to answer questions of transaction frequency and timing as well as questions of earned income in a given month.. » • monthly snapshot table must be built to store the earned income (batch job to calculate carefully the month’s earned money for each account) Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6