Best Practices in Higher Education
Student Data Warehousing Forum
Northwestern University
October 21, 2003
Mary Weisse
Team Leader
MIT Data Warehouse
[email protected]
Warehouse
 Overview
 Design
 Architecture and implementation
 Integrity checking and controls
Warehouse Overview
 Read only
 Integrated reporting
 Institute wide
 Multiple subject areas
 Varied modes of access
 Hub for data extraction by other systems
Warehouse Design
 Transaction vs reporting design
 Star schema
 Fact table
 Dimensions
 No user interface
Star Schema Benefits
 Intuitive joins
 Limit on dimensions
 Reuse of dimension tables in multiple star schemas
Star Schema Example
[Diagram: a central fact table surrounded by its dimension tables, shown across three slides]
No User Interface
 All security at the database level
 Naming of fields and tables is critical
 No place to code around problems, display messages, etc.
Design Assumptions
 Minimal support & operational costs
 Standard (open) interfaces & components
 Scalable / able to evolve over time
 Secure
Risks
 Runaway queries
 Poor data quality
 Misunderstanding of the data by users may lead to erroneous reporting results
Security
 Machine security
 Data encryption
 Oracle roles
 Access control
 Dynamic views
 Roles
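With no user interface, the dynamic views above carry the row-level access control. A hedged sketch of the idea, simulated in SQLite: Oracle would key the predicate on the connected user's context, while here a one-row session table stands in for that context, and all table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salaries (dept TEXT, name TEXT, salary INTEGER);
INSERT INTO salaries VALUES ('MATH', 'Alice', 70000), ('PHYS', 'Bob', 65000);

CREATE TABLE session_ctx (dept TEXT);   -- stands in for the connected user's role
INSERT INTO session_ctx VALUES ('MATH');

-- Users query the view, never the base table; the predicate limits
-- each user to the rows their role allows.
CREATE VIEW v_salaries AS
SELECT s.* FROM salaries s
WHERE s.dept = (SELECT dept FROM session_ctx);
""")

visible = conn.execute("SELECT name FROM v_salaries").fetchall()
print(visible)  # [('Alice',)]
```

Granting roles only on the views — never the base tables — keeps all security at the database level, as the slide requires.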
Roles Web
[Diagram: web of role assignments]
Architectural Components
 DBMS
– Store, manage, and control access to the data
 Metadata
– Data definitions, load control, data conversion rules
 Extract
– Data taken from source systems
 Transfer
– Data copied to the warehouse server securely
 Convert
– Data translated into reporting format & structures
 Load
– Data loaded into database & indexes created
 Transport
– Data securely transferred from the database to the desktop
 Query Tool
– Retrieve data & create exports
Data Load Processing
 Assumption: Information is better stale than incorrect
 Grouping data loads
 Error tolerances may vary
 Checking status at each stage
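The "better stale than incorrect" assumption, combined with varying error tolerances, can be sketched as a simple acceptance check (the thresholds here are invented): a load is rejected — keeping yesterday's data in place — when too many input rows fail validation.

```python
def within_tolerance(total_rows: int, bad_rows: int, max_error_rate: float) -> bool:
    """Accept the load only if the error rate stays under the tolerance."""
    if total_rows == 0:
        return False            # an empty extract is itself suspicious
    return bad_rows / total_rows <= max_error_rate

# Tolerances may vary by subject area: strict for payroll, looser elsewhere.
print(within_tolerance(10_000, 3, 0.001))   # True  -> load proceeds
print(within_tolerance(10_000, 50, 0.001))  # False -> keep stale data, alert staff
```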
Process Files (cron-driven)
1. Check file existence
2. Move to secure directory
3. Decrypt
4. Optional pre-conversion processing
5. Convert
6. Remove data
7. Remove indexes
8. Load data
9. Optional post-load processing
10. Restore indexes
11. Optional post-batch processing
– Compute Statistics
– Calculate & add fields
– Create aggregate tables
12. Archive files
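A minimal driver for the numbered steps above, on the assumption that each step is a function that raises on failure: the batch stops at the first failed stage, so later steps never run against bad data. The step names mirror the slide; the implementations are placeholders.

```python
def check_file_existence(): pass
def move_to_secure_directory(): pass
def decrypt(): pass
def convert(): pass
def load_data(): pass
def archive_files(): pass

STEPS = [check_file_existence, move_to_secure_directory, decrypt,
         convert, load_data, archive_files]

def run_batch(steps=STEPS):
    """Run each stage in order, checking status after every one."""
    completed = []
    for step in steps:
        try:
            step()
        except Exception as exc:
            return completed, f"failed at {step.__name__}: {exc}"
        completed.append(step.__name__)
    return completed, "ok"

done, status = run_batch()
print(status)  # ok
```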
[Diagram: data flow from external systems (SAP) and metadata through the numbered steps — encrypted transfer, decryption, conversion, load, and archive]
Extraction
 Minimize impact on production systems
 Minimal data transformation done on source system
 Performance
 Data transformed in only one place
 Incremental control
 Extracted by date, starting from the last successful run
 Control files to ensure that extracted data is complete
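The incremental-control idea can be sketched as a watermark that only advances on success (the data and field names here are invented): each run extracts records changed since the last date that ran successfully, so a failed night's data is picked up again the next night.

```python
from datetime import date

def rows_changed_since(watermark, all_rows):
    """Stand-in for the extract query run against the source system."""
    return [r for r in all_rows if r["changed"] > watermark]

SOURCE = [
    {"id": 1, "changed": date(2003, 10, 18)},
    {"id": 2, "changed": date(2003, 10, 20)},
]

last_success = date(2003, 10, 19)          # read from a control file in practice
batch = rows_changed_since(last_success, SOURCE)
print([r["id"] for r in batch])  # [2]

# Only after the load succeeds does the watermark advance:
last_success = max(r["changed"] for r in batch)
```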
Integrity Checks
 Correct files on hand before job runs
 Record & byte counts
 Comparisons of control file to data file
 Extract file structure is checked against metadata
 DBMS constraints enforced
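The record- and byte-count comparison against a control file might look like the following sketch (the file formats and totals are invented): the source system ships expected counts alongside the data file, and the load refuses any file whose actual counts disagree.

```python
import os
import tempfile

def verify_against_control(data_path, expected_records, expected_bytes):
    """Raise if the data file disagrees with the control-file totals."""
    actual_bytes = os.path.getsize(data_path)
    with open(data_path, "rb") as f:
        actual_records = sum(1 for _ in f)
    if actual_records != expected_records:
        raise ValueError(f"record count {actual_records} != {expected_records}")
    if actual_bytes != expected_bytes:
        raise ValueError(f"byte count {actual_bytes} != {expected_bytes}")
    return True

# Example: a three-record, six-byte file checked against matching totals.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"a\nb\nc\n")
tmp.close()
checked = verify_against_control(tmp.name, expected_records=3, expected_bytes=6)
os.unlink(tmp.name)
print(checked)  # True
```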
Control of Jobs
 Cron scheduling
 Error checking system
– What jobs should have run?
– Did they run successfully?
– Data scanned for discrepancies
– Mail sent to appropriate staff and users
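The error-checking pass can be sketched as a comparison between the jobs that should have run and the jobs that logged a success (the job names and statuses here are invented), producing the lists that would be mailed to staff and users.

```python
EXPECTED_JOBS = {"students_load", "payroll_load", "courses_load"}
RUN_LOG = {"students_load": "ok", "payroll_load": "failed"}

def find_problems(expected, run_log):
    """Return jobs that never ran and jobs that ran but failed."""
    missing = sorted(expected - run_log.keys())
    failed = sorted(j for j, status in run_log.items() if status != "ok")
    return missing, failed

missing, failed = find_problems(EXPECTED_JOBS, RUN_LOG)
print("never ran:", missing)   # never ran: ['courses_load']
print("failed:", failed)       # failed: ['payroll_load']
# In production these lists feed the mail sent to appropriate staff and users.
```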