Best Practices in Higher Education
Student Data Warehousing Forum
Northwestern University
October 21, 2003
Mary Weisse
Team Leader
MIT Data Warehouse
[email protected]
Warehouse
 Overview
 Design
 Architecture and implementation
 Integrity checking and controls
Warehouse Overview
 Read only
 Integrated reporting
 Institute wide
 Multiple subject areas
 Varied modes of access
 Hub for data extraction by other systems
Warehouse Design
 Transaction vs reporting design
 Star schema
 Fact table
 Dimensions
 No user interface
Star Schema Benefits
 Intuitive joins
 Limit on dimensions
 Reuse of dimension tables in multiple star schemas
Star Schema Example
[Diagram: a central fact table surrounded by its dimension tables, shown across three slides]
No User Interface
 All security at the database level
 Naming of fields and tables is critical
 No place to code around problems, display messages, etc.
Design Assumptions
 Minimal support & operational costs
 Standard (open) interfaces & components
 Scalable / able to evolve over time
 Secure
Risks
 Runaway queries
 Poor data quality
 Misunderstanding of the data by users may lead to erroneous reporting results
Security
 Machine security
 Data encryption
 Oracle roles
 Access control
 Dynamic views
 Roles
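With no user interface, the dynamic views above carry the row-level access control. A hedged sketch of the idea, simulated in SQLite: Oracle would key the predicate on the connected user's context, while here a one-row session table stands in for that context, and all table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salaries (dept TEXT, name TEXT, salary INTEGER);
INSERT INTO salaries VALUES ('MATH', 'Alice', 70000), ('PHYS', 'Bob', 65000);

CREATE TABLE session_ctx (dept TEXT);   -- stands in for the connected user's role
INSERT INTO session_ctx VALUES ('MATH');

-- Users query the view, never the base table; the predicate limits
-- each user to the rows their role allows.
CREATE VIEW v_salaries AS
SELECT s.* FROM salaries s
WHERE s.dept = (SELECT dept FROM session_ctx);
""")

visible = conn.execute("SELECT name FROM v_salaries").fetchall()
print(visible)  # [('Alice',)]
```

Granting roles only on the views — never the base tables — keeps all security at the database level, as the slide requires.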
Roles Web
[Diagram: web of role assignments]
Architectural Components
 DBMS
– Store, manage, and control access to the data
 Metadata
– Data definitions, load control, data conversion rules
 Extract
– Data taken from source systems
 Transfer
– Data copied to the warehouse server securely
 Convert
– Data translated into reporting format & structures
 Load
– Data loaded into database & indexes created
 Transport
– Data securely transferred from the database to the desktop
 Query Tool
– Retrieve data & create exports
Data Load Processing
 Assumption: Information is better stale than incorrect
 Grouping data loads
 Error tolerances may vary
 Checking status at each stage
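The "better stale than incorrect" assumption, combined with varying error tolerances, can be sketched as a simple acceptance check (the thresholds here are invented): a load is rejected — keeping yesterday's data in place — when too many input rows fail validation.

```python
def within_tolerance(total_rows: int, bad_rows: int, max_error_rate: float) -> bool:
    """Accept the load only if the error rate stays under the tolerance."""
    if total_rows == 0:
        return False            # an empty extract is itself suspicious
    return bad_rows / total_rows <= max_error_rate

# Tolerances may vary by subject area: strict for payroll, looser elsewhere.
print(within_tolerance(10_000, 3, 0.001))   # True  -> load proceeds
print(within_tolerance(10_000, 50, 0.001))  # False -> keep stale data, alert staff
```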
Process Files (cron-driven)
1. Check file existence
2. Move to secure directory
3. Decrypt
4. Optional pre-conversion processing
5. Convert
6. Remove data
7. Remove indexes
8. Load data
9. Optional post-load processing
10. Restore indexes
11. Optional post-batch processing
– Compute Statistics
– Calculate & add fields
– Create aggregate tables
12. Archive files
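A minimal driver for the numbered steps above, on the assumption that each step is a function that raises on failure: the batch stops at the first failed stage, so later steps never run against bad data. The step names mirror the slide; the implementations are placeholders.

```python
def check_file_existence(): pass
def move_to_secure_directory(): pass
def decrypt(): pass
def convert(): pass
def load_data(): pass
def archive_files(): pass

STEPS = [check_file_existence, move_to_secure_directory, decrypt,
         convert, load_data, archive_files]

def run_batch(steps=STEPS):
    """Run each stage in order, checking status after every one."""
    completed = []
    for step in steps:
        try:
            step()
        except Exception as exc:
            return completed, f"failed at {step.__name__}: {exc}"
        completed.append(step.__name__)
    return completed, "ok"

done, status = run_batch()
print(status)  # ok
```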
[Diagram: data flow from external systems (SAP) and metadata through the numbered steps — encrypted transfer, decryption, conversion, load, and archive]
Extraction
 Minimize impact on production systems
 Minimal data transformation done on source system
 Performance
 Data transformed in only one place
 Incremental control
 Extracted by date, starting from the last successful run
 Control files to ensure that extracted data is complete
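The incremental-control idea can be sketched as a watermark that only advances on success (the data and field names here are invented): each run extracts records changed since the last date that ran successfully, so a failed night's data is picked up again the next night.

```python
from datetime import date

def rows_changed_since(watermark, all_rows):
    """Stand-in for the extract query run against the source system."""
    return [r for r in all_rows if r["changed"] > watermark]

SOURCE = [
    {"id": 1, "changed": date(2003, 10, 18)},
    {"id": 2, "changed": date(2003, 10, 20)},
]

last_success = date(2003, 10, 19)          # read from a control file in practice
batch = rows_changed_since(last_success, SOURCE)
print([r["id"] for r in batch])  # [2]

# Only after the load succeeds does the watermark advance:
last_success = max(r["changed"] for r in batch)
```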
Integrity Checks
 Correct files on hand before job runs
 Record & byte counts
 Comparisons of control file to data file
 Extract file structure is checked against metadata
 DBMS constraints enforced
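The record- and byte-count comparison against a control file might look like the following sketch (the file formats and totals are invented): the source system ships expected counts alongside the data file, and the load refuses any file whose actual counts disagree.

```python
import os
import tempfile

def verify_against_control(data_path, expected_records, expected_bytes):
    """Raise if the data file disagrees with the control-file totals."""
    actual_bytes = os.path.getsize(data_path)
    with open(data_path, "rb") as f:
        actual_records = sum(1 for _ in f)
    if actual_records != expected_records:
        raise ValueError(f"record count {actual_records} != {expected_records}")
    if actual_bytes != expected_bytes:
        raise ValueError(f"byte count {actual_bytes} != {expected_bytes}")
    return True

# Example: a three-record, six-byte file checked against matching totals.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"a\nb\nc\n")
tmp.close()
checked = verify_against_control(tmp.name, expected_records=3, expected_bytes=6)
os.unlink(tmp.name)
print(checked)  # True
```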
Control of Jobs
 Cron scheduling
 Error checking system
– What jobs should have run?
– Did they run successfully?
– Data scanned for discrepancies
– Mail sent to appropriate staff and users
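The error-checking pass can be sketched as a comparison between the jobs that should have run and the jobs that logged a success (the job names and statuses here are invented), producing the lists that would be mailed to staff and users.

```python
EXPECTED_JOBS = {"students_load", "payroll_load", "courses_load"}
RUN_LOG = {"students_load": "ok", "payroll_load": "failed"}

def find_problems(expected, run_log):
    """Return jobs that never ran and jobs that ran but failed."""
    missing = sorted(expected - run_log.keys())
    failed = sorted(j for j, status in run_log.items() if status != "ok")
    return missing, failed

missing, failed = find_problems(EXPECTED_JOBS, RUN_LOG)
print("never ran:", missing)   # never ran: ['courses_load']
print("failed:", failed)       # failed: ['payroll_load']
# In production these lists feed the mail sent to appropriate staff and users.
```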