Best Practices in Higher Education
Student Data Warehousing Forum
Northwestern University
October 21, 2003

Mary Weisse
Team Leader, MIT Data Warehouse
[email protected]

Warehouse
 Overview
 Design
 Architecture and implementation
 Integrity checking and controls

Warehouse Overview
 Read-only
 Integrated reporting
 Institute-wide
 Multiple subject areas
 Varied modes of access
 Hub for data extraction by other systems

Warehouse Design
 Transaction vs. reporting design
 Star schema
   Fact table
   Dimensions
 No user interface

Star Schema Benefits
 Intuitive joins
 Limit on dimensions
 Reuse of dimension tables in multiple star schemas

Star Schema Example
[diagram: a central fact table joined to surrounding dimension tables]

No User Interface
 All security at the database level
 Naming of fields and tables critical
 No place to code around problems, give messages, etc.

Design Assumptions
 Minimal support & operational costs
 Standard (open) interfaces & components
 Scalable / able to evolve over time
 Secure

Risks
 Runaway queries
 Poor data quality
 Misunderstanding of the data by users may lead to erroneous reporting results

Security
 Machine security
 Data encryption
 Oracle roles
   Access control
   Dynamic views
   Roles

Roles
[diagram: roles mediating web access to warehouse data]

Architectural Components
 DBMS – store, manage, and control access to the data
 Metadata – data definitions, load control, data conversion rules
 Extract – data taken from source systems
 Transfer – data copied to the warehouse server securely
 Convert – data translated into reporting formats & structures
 Load – data loaded into the database & indexes created
 Transport – data securely transferred from the database to the desktop
 Query tool – retrieve data & create exports

Data Load Processing
 Assumption: information is better stale than incorrect
 Grouping data loads
 Error tolerances may vary
 Checking status at each stage

Process Files (run under cron)
1. Check file existence
2. Move to secure directory
3. Decrypt
4. Optional pre-conversion processing
5. Convert
6. Remove data
7. Remove indexes
8. Load data
9. Optional post-load processing
10. Restore indexes
11. Optional post-batch processing
    – Compute statistics
    – Calculate & add fields
    – Create aggregate tables
12. Archive files

[diagram: data flow from external systems (SAP) through encrypted transfer, decryption, conversion, load, and archive, driven by metadata]

Extraction
 Minimize impact on production systems
 Minimal data transformation done on the source system
   Performance
   Data transformed in only one place
 Incremental control
   Extracted by date from the last date run successfully
   Control files to ensure that extracted data is complete

Integrity Checks
 Correct files on hand before a job runs
 Record & byte counts
 Comparison of control file to data file
 Extract file structure checked against metadata
 DBMS constraints enforced

Control of Jobs
 Cron scheduling
 Error-checking system
   What jobs should have run?
   Did they run successfully?
   Data scanned for discrepancies
   Mail sent to appropriate staff and users
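The "intuitive joins" benefit of the star schema above can be sketched with a toy fact table and two dimension tables. This is a minimal illustration using SQLite; the table and column names are invented, not MIT's actual schema.

```python
import sqlite3

# Hypothetical star schema: one enrollment fact table keyed to two
# dimension tables (term and subject).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_term    (term_key INTEGER PRIMARY KEY, term_name TEXT);
    CREATE TABLE dim_subject (subject_key INTEGER PRIMARY KEY, subject_name TEXT);
    CREATE TABLE fact_enrollment (
        term_key    INTEGER REFERENCES dim_term(term_key),
        subject_key INTEGER REFERENCES dim_subject(subject_key),
        headcount   INTEGER
    );
    INSERT INTO dim_term    VALUES (1, 'Fall 2003');
    INSERT INTO dim_subject VALUES (10, 'Mathematics');
    INSERT INTO fact_enrollment VALUES (1, 10, 250);
""")

# A reporting query is just the fact table joined to its dimensions --
# no complex navigation of a normalized transaction schema.
row = conn.execute("""
    SELECT t.term_name, s.subject_name, SUM(f.headcount)
    FROM fact_enrollment f
    JOIN dim_term    t ON t.term_key    = f.term_key
    JOIN dim_subject s ON s.subject_key = f.subject_key
    GROUP BY t.term_name, s.subject_name
""").fetchone()
print(row)  # ('Fall 2003', 'Mathematics', 250)
```

The same dimension tables can be reused by other fact tables, which is the reuse benefit the slides mention.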
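The "dynamic views" item under Security can be approximated as follows: a view joins each row of a sensitive table to an authorization table, so users see only the rows they were granted, with no WHERE clause needed in their queries. The real warehouse does this with Oracle roles; this SQLite sketch simulates the session user with a table, and all names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salaries (dept TEXT, amount INTEGER);
    CREATE TABLE dept_grants (username TEXT, dept TEXT);
    CREATE TABLE current_session (username TEXT);
    INSERT INTO salaries VALUES ('MATH', 50000), ('PHYS', 60000);
    INSERT INTO dept_grants VALUES ('alice', 'MATH');
    INSERT INTO current_session VALUES ('alice');
    -- The view filters on the current user, so access control is
    -- enforced at the database level, not in a user interface.
    CREATE VIEW my_salaries AS
        SELECT s.dept, s.amount
        FROM salaries s
        JOIN dept_grants g ON g.dept = s.dept
        WHERE g.username = (SELECT username FROM current_session);
""")
rows = conn.execute("SELECT * FROM my_salaries").fetchall()
print(rows)  # [('MATH', 50000)]
```

Because there is no user interface, this kind of database-level filtering is the only place security can live, which is the design point the slides make.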
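The twelve-step Process Files batch can be sketched as a sequence of stage functions where status is checked at every stage, matching the "checking status at each stage" principle: any failure halts the batch so a partial load never looks complete. The stage bodies here are no-op placeholders standing in for the real decrypt/convert/load steps.

```python
def run_load(stages):
    # Run each stage in order, recording what completed. A stage
    # signals failure by raising, which stops the whole batch.
    completed = []
    for name, fn in stages:
        fn()
        completed.append(name)
    return completed

# Placeholder stage functions; the real ones would shell out to
# decryption, conversion, and database load tools.
stages = [
    ("check file existence",     lambda: None),
    ("move to secure directory", lambda: None),
    ("decrypt",                  lambda: None),
    ("convert",                  lambda: None),
    ("remove indexes",           lambda: None),
    ("load data",                lambda: None),
    ("restore indexes",          lambda: None),
    ("archive files",            lambda: None),
]
print(run_load(stages))
```

Dropping indexes before the load and restoring them afterward, as in steps 7 and 10, is a standard way to make bulk loads faster than loading against live indexes.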
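The incremental control described under Extraction, selecting rows changed since the last date a run succeeded, can be sketched like this. Keying off the last *successful* run means a failed night's changes are automatically picked up by the next run. Row and field names are invented for illustration.

```python
from datetime import date

def rows_to_extract(rows, last_success):
    # Extract only rows changed since the last successful run.
    return [r for r in rows if r["changed_on"] > last_success]

rows = [
    {"id": 1, "changed_on": date(2003, 10, 18)},
    {"id": 2, "changed_on": date(2003, 10, 20)},
]
picked = rows_to_extract(rows, last_success=date(2003, 10, 19))
print([r["id"] for r in picked])  # [2]
```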
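The record-and-byte-count integrity check, comparing the control file shipped with an extract against the data file actually received, might look like the sketch below. The function and file names are invented; in the real pipeline the expected counts would be read from the control file.

```python
from pathlib import Path
import tempfile

def verify_extract(data_path, expected_records, expected_bytes):
    # Compare the extract file against the control file's counts;
    # any mismatch aborts the load before bad data reaches the DBMS.
    raw = data_path.read_bytes()
    records = raw.count(b"\n")
    if records != expected_records:
        raise ValueError(f"record count {records} != control {expected_records}")
    if len(raw) != expected_bytes:
        raise ValueError(f"byte count {len(raw)} != control {expected_bytes}")
    return True

# Demo with a small temporary file: two newline-terminated records,
# ten bytes total.
with tempfile.TemporaryDirectory() as d:
    data = Path(d) / "extract.dat"
    data.write_bytes(b"row1\nrow2\n")
    ok = verify_extract(data, expected_records=2, expected_bytes=10)
print("extract verified:", ok)
```

Checking the file before the job runs, rather than during the load, supports the "better stale than incorrect" assumption: a bad extract leaves yesterday's data in place instead of corrupting today's.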
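The error-checking system's two questions, what jobs should have run and did they run successfully, amount to comparing an expected-jobs list against a status log and reporting discrepancies (which the slides say are then mailed to staff and users). Job names and the log format here are invented.

```python
def missing_or_failed(expected_jobs, status_log):
    # Answer the two questions from the slides: did every expected
    # job run, and did each one succeed?
    problems = []
    for job in expected_jobs:
        outcome = status_log.get(job)
        if outcome is None:
            problems.append(f"{job}: did not run")
        elif outcome != "ok":
            problems.append(f"{job}: {outcome}")
    return problems

status_log = {
    "load_students": "ok",
    "load_courses": "failed: bad record count",
}
report = missing_or_failed(
    ["load_students", "load_courses", "load_grades"], status_log
)
print(report)
# ['load_courses: failed: bad record count', 'load_grades: did not run']
```

In practice this check would itself run from cron shortly after the load window closes, with the report going out by mail.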