Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Best Practices in Higher Education Student Data Warehousing Forum Northwestern University October 21, 2003 Mary Weisse Team Leader MIT Data Warehouse [email protected] Warehouse Overview Design Architecture and implementation Integrity checking and controls Warehouse Overview Read only Integrated reporting Institute wide Multiple subject areas Varied modes of access Hub for data extraction by other systems Warehouse Design Transaction vs reporting design Star schema Fact table Dimensions No user interface Star Schema Benefits Intuitive joins Limit on dimensions Reuse of dimension tables in multiple star schemas Star Schema Example Dimensions Fact Dimensions Star Schema Example Dimensions Fact Dimensions Star Schema Example Dimensions Fact Dimensions No User Interface All security at the database level Naming of fields, and tables critical No place to code around problems, give messages etc. Design Assumptions Minimal support & operational costs Standard (open) interfaces & components Scaleable / able to evolve over time Secure Risks Run away queries Poor data quality Misunderstanding of the data by users may lead to erroneous reporting results Security Machine security Data encryption Oracle roles Access control Dynamic views Roles Roles Web Architectural Components DBMS – Store, Manage, and Control Access to the Data Metadata Extract – Data definitions, load control, data conversion rules – Data taken from source systems Transfer – Data copied to the warehouse server securely Convert – Data translated into reporting format & structures Load – Data loaded into database & indexes created Transport – Data is securely transferred from the db to desktop Query Tool – Retrieve data & create export Data Load Processing Assumption: Information is better stale than incorrect Grouping data loads Error tolerances may vary Checking status at each stage Process Files Cron 1 2 3 4 5 6 7 8 9 10 11 Check file existence Move to secure directory Decrypt Optional pre-conversion processing Convert Remove data Remove indexes Load data Optional post-load processing Restore indexes Optional post-batch processing Compute Statistics Calculate & add fields Create aggregate tables 12 Archive files External Systems (SAP) Meta Data Data 1 6, 7, 8, 9 5 2 10 3 4 Transfers Encrypted Decrypted Converted Archive Extraction Minimize impact on production systems Minimal data transformation done on source system Performance Data transformed in only one place Incremental control Extracted by date from last date run successfully Control files to ensure that extracted data is complete Integrity Checks Correct files on hand before job runs Record & byte counts Comparisons of control file to data file Extract file structure is checked against metadata DBMS constraints enforced Control of jobs Cron–scheduling Error checking system What jobs should have run? Did they run successfully? Data scanned for discrepancies Mail sent to appropriate staff and users