* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download warehouse_chapter13
		                    
		                    
								Survey							
                            
		                
		                
                            
                            
								Document related concepts							
                        
                        
                    
						
						
							Transcript						
					
					Transportation: Refreshing Warehouse Data Chapter 13 Developing a Refresh Strategy for Capturing Changed Data Consider load window Identify data volumes Identify cycle Know the technical infrastructure Plan a staging area Determine how to detect changes Operational databases User Requirements and Assistance     Users define the refresh cycle IT balances requirements against technical issues Document all tasks and processes Employ user skills Operational databases Load Window      Time available for entire ETT process Plan Test Prove Monitor Load Window 0 3am 6 User Access Period Load Window 9 12pm 3 6 9 12 Load Window       Plan and build processes according to a strategy. Consider volumes of data. Identify technical infrastructure. Ensure currency of data. Consider user access requirements first High availability requirements may mean a small load window User Access Period 0 3am 6 9 12pm 3 6 9 12 Scheduling the Load Window Requirements  Load cycle  Receive data File 1 File FTP 2 0 3 Control File File Names File types Number of files Number of loads First-time load or refresh Date of file Data range Records in file - counts Totals - amounts Control 4 process Open and read files to verify and analyze 3 am Scheduling the Load Window 6 5 Verify, analyze, reapply Load into warehouse File 1 File 2 3 am 8 7 Index data Create summaries 9 Update metadata Parallel load 6 am 9 am Scheduling the Load Window 11 10 Back up warehouse Create Views for Specialized tools 12 13 Users Access Publish Summary data User access 6 am 9 am Capturing Changed Data for Refresh      Capture new fact data Capture changed dimension data Determine method for capture of each Methods: - Wholesale data replacement - Comparison of database instances - Time stamping - Database triggers - Database log Hybird techniques Wholesale Data Replacement Operational databases T1     T2 Expensive Limited historical data, if any Data mart implementations Time period replacement T3 Comparison of Database Instance Yesterday’s Operational database Today’s Operational database   Database comparison Delta file holds Changed data Simple to perform, but expensive in time and processing Data file: - Changes to operational data since last refresh - Used by various techniques Time and Date Stamping Operational data    Delta file holds Changed data Fast scanning for records changed since last extraction Date Updated field No detection of deleted data Database Triggers Operation Server (DBMS) Trigger Trigger Trigger    Changed data intersected at the server level Extra I/O required Maintenance overhead Using a Database Log Operational data    Operational Server (DBMS) Log Log analysis And Data extraction Contains before and after images Requires system checkpoint Common technique Delta file holds Changed data Verdict    Consider each method on merit. Consider a hybrid approach if one approach is not suitable. Consider current technical, existing operational, and current application issues. Applying the Changes to Data You have a choice of techniques:  Overwrite a record  Add a record  Add a field  Maintain history  Add version numbers Overwriting a Record    Customer ID John Doe Single Customer ID John Doe Married Easy to implement Loses all history Not recommended Adding a New Record 1 Customer Id John Doe Single 1 Customer Id John Doe Single 1A Customer Id John Doe Married     History is preserved; dimensions grow. Time constraints are not required. Generalized key is created. Metadata tracks usage of keys. Adding a Current Field Customer Id John Doe Single Customer Id John Doe Single    Married 01-JAN-96 Maintains some history Loses intermediate values Is enhanced by adding an Effective Date field Limitations of Methods for Applying Changes    Complete history impossible Dimensions may grow large Maintenance overload 1234 Comer 1234 Comer 1 Main Street 200 First Ave 555-6789 222-3211 1234 Comer 1234-01 Comer 1 Main Street 200 First Ave 555-6789 222-3211 1234 Comer 1234-01 Comer 1 Main Street 200 First Ave Effective Date 555-6789 01-Apr-93 222-3212 01-Jun-97 Maintaining History HIST_CUST Time CUSTOMER Sales Product One-to-many relationship  Always retain current record  Consistently able to refer to record history History Preserved    History enables realistic analysis. History retains context of data. History provides for realistic historical analysis. - Reflect business changes - Maintain context between fact and dimension data - Retain sufficient data to relate old to new Version Numbering   Avoid double counting Facts hold version number Customer Time Sales Customer.CustId Version Customer Names 1234 1 Comer 1234 2 Comer Customer.CustId Version Sales Facts 1234 1 11,000 1234 2 12,000 Product Purging and Archiving Data   As data ages, its value depreciates. Remove old data from the warehouse: - Archive for later use - Purge without copy Techniques for Purging Data     TRUNCATE: Retains no rollback DELETE: Retains redo and rollback ALTER TABLE: Removes a partition PL/SQL: Uses database triggers Techniques for Archiving Data    Export to dump file from tables Import to tables from dump file ALTER TABLE EXCHANGE partitions EXP Database IMP .dmp Verdict   Defined by business requirements Must be managed Final Tasks    Update metadata - ETT - User Publish data - Availability - Changes - Subject area basis Use database roles to prevent and allow access Publishing Data     Control access using database roles 24-hour operation may be requested Compromise between load and access Consider - Staggering updates - Using temporary tables - Using separate tables ETT Tool Selection Criteria            Overlap with existing tools Availability of meta model Supported data sources Ease of modification and maintenance Required fine tuning of code Ease of change control Power of transformation logic Level of modularization Power of error, exception, resubmission features Intuitive documentation Performance of code ETT Tool Selection Criteria       Activity scheduling and sophistication Metadata generation Learning curve Flexibility Supported operation systems Cost Transportation Tools  Information Oracle  Platinum Technology  OpenBridge SQL*Loader Gateways PL/SQL Precompilers InfoPump Platinum Info Transport Replication Server Utilities  Oracle Symmetric and Heterogeneous Replication Gateways and Middleware        Brio Technology Information Co. Information Builders Oracle Platinum Technology Prism Software AG DataPrism OpenBridge EDA/SQL Gateways InfoHub Prism Manager Entire Transaction Propagator Summary This lesson discussed the following topics:  Capturing changed data  Applying the changes  Purging and archiving data  Publishing the data, controlling access, and automating processes  Identifying tools for transporting data into the warehouse
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            