Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Data Archiving Taming the Beast Dave Moore Neon Enterprise Software INTELLIGENCE. INNOVATION. INTEGRITY Agenda Archiving Defined Requirements and Solutions Oracle Archiving Strategies Oracle Row Removal Options Oracle Post Archive Operations INTELLIGENCE. INNOVATION. INTEGRITY Dave Oracle ACE Using Oracle since 1991 Product Author at Neon Enterprise Software Creator of OracleUtilities.com Author of “Oracle Utilities” from Rampant Tech Press Core competencies include performance, utilities and data management INTELLIGENCE. INNOVATION. INTEGRITY Database Archiving Database Archiving: The process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed. Purge INTELLIGENCE. INNOVATION. INTEGRITY Trends Impacting Archive Needs Data Retention Issues: Volume of data Length of retention requirement Varied types of data Security issues 0 Time Required 30+ Yrs INTELLIGENCE. INNOVATION. INTEGRITY Archiving All Types of Data Paper Blueprints Forms Claims Word Excel PDF XML IMS DB2 ORACLE SYBASE SQL Server IDMS VSAM Programs UNIX Files Outlook Lotus Notes Attachments Sound Pictures Video INTELLIGENCE. INNOVATION. INTEGRITY Data Archiving and ILM Create Operational Needed for completing business transactions Reference Archive Needed for reporting or expected queries Needed for compliance and business protection Discard Mandatory Retention Period INTELLIGENCE. INNOVATION. INTEGRITY Some Sample Regulations Impacting Data Retention INTELLIGENCE. INNOVATION. INTEGRITY What Does It All Mean? Enterprises must recognize that there is a business value in organizing their information and data. Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit. (Gartner, 20-April-2007, Research Note G00148170: Cost of E-Discovery Threatens to Skew Justice System) INTELLIGENCE. INNOVATION. INTEGRITY Operational Efficiency Database Archiving can be undertaken to improve operational efficiency Large volumes of data can interfere with production operations – efficiency of transactions – efficiency of utilities: BACKUP/RESTORE, REORG, etc. – Storage » Gartner: databases copied an average of 6 times! INTELLIGENCE. INNOVATION. INTEGRITY What Solutions Are Out There? Keep Data in Operational Database — Problems with authenticity of large amounts of data over long retention times Store Data in UNLOAD files (or backups) — Problems with schema change and reading archived data; using backups poses even more serious problems Move Data to a Parallel Reference Database — Combines problems of the previous two Move Data to a Database Archive INTELLIGENCE. INNOVATION. INTEGRITY Components of a Database Archiving Solution Production Database Data Extract Captured Structure Archive Policies Data Retention Archive Data Store and Retrieve Recall Database Data Recall Archive Store Archive Data Query Access Metadata Policies History Data & Metadata Archive Administration INTELLIGENCE. INNOVATION. INTEGRITY Archiving Requirements Policy based archiving: logical selection Keep data for very long periods of time Store very large amounts of data in archive Maintain Archives for ever changing operational systems Become independent from Applications/DBMS/Systems Protect authenticity of data Access data when needed; as needed Discard data after retention period automatically INTELLIGENCE. INNOVATION. INTEGRITY Policy based archiving Why : — Business objects are archived, not files — Rules for when something is ready can be complex — Data ready to be archived is distributed over database Implications: — User must provide policies for when something is to be archived How: — Full metadata description of data — Flexible specification of policy : “WHERE clause” INTELLIGENCE. INNOVATION. INTEGRITY For Example… Parts Master is the parent table to all other tables STORAGE INFO Part Number Bin Number Qty on Hand Qty on Order Qty Backorder PARTS MASTER Part Number Type Description Unit Type Cost Price Substitute Parts ORDER INFO Part Number PO Number Vendor ID Quantity Ordered Unit Cost Date Ordered Date Received SUMMARY BY QUARTER Part Number Year Q1 Disbursed Q2 Disbursed Q3 Disbursed Q4 Disbursed DISBURSEMENT Part Number Dept. ID CHIT ID Qty Disbursed Date Disbursed INTELLIGENCE. INNOVATION. INTEGRITY Keep Data for a Long Time Why: retention requirements in decades Implications: — Archive will outlive applications/DBMS/systems that generated them — Archive will outlive people who designed and managed operational systems — Archive will outlive media we store it on How: — Unique data store — Application/DBMS/system independence — Metadata independence — Continuous management of storage — Continuous management of archive content INTELLIGENCE. INNOVATION. INTEGRITY Maintain Archive for Changing Operational Systems Why : — Metadata changes frequently — Applications are re-engineered periodically – – – – Change DBMS platform Change System platform Replace with new application Consolidate after merger or acquisition Implications: — Archive must support multiple variations of an application — Archive must deal with metadata changes How: — Manage applications as major archive streams having multiple minor streams with metadata differences — Achieve independence from operating environment INTELLIGENCE. INNOVATION. INTEGRITY Achieve Metadata Independence Why : — Operational metadata is inadequate — Operational metadata changes — Operational systems keep only the “current” metadata — Data in archive often does not mirror data in operational structures Implications: — Archive must encapsulate metadata — Metadata must be improved How: — Metadata Capture, Validate, Enhance capabilities — Store structure that encapsulates with data — Keeps multiple versions of metadata INTELLIGENCE. INNOVATION. INTEGRITY Protect Authenticity of Data Why : — Potential use in lawsuits/ investigations — Potential use in business analysis Implications: — Protect from unwanted changes — Show original input — Cannot be managed in operational environment How: — — — — — — — SQL Access that does not support I/U/D Do not modify archive data on metadata changes Encryption as stored Checksum for detection of sabotage Limit access to functions Audit use of functions Maintain offsite backup copies for restore if sabotaged INTELLIGENCE. INNOVATION. INTEGRITY Access Data Directly From Archive Why : — Cannot depend on application environment Implications: — Full access capability within archive system How: — Industry standard interface (e.g. JDBC) — LOAD format output for – For load into a database – May be different from source database — Requires full and accurate metadata — Ability to review metadata — Ability to function across metadata changes INTELLIGENCE. INNOVATION. INTEGRITY Discard Function Why : — Legal exposure for data kept too long Implications: — Data cannot be kept in archive beyond retention period — Must be removed with no exposure to forensic software How: — Policy based discard — System level function — Tightly controlled and audited — True “zero out” capability — Discard from backups as well INTELLIGENCE. INNOVATION. INTEGRITY Database or Archive? Keep in DB Keep in Archive Performance Space Compliance INTELLIGENCE. INNOVATION. INTEGRITY Based on Data Availability Keep in DB Keep in Archive Purge Must be Available to App Must be Available Must Be Secure Not Needed INTELLIGENCE. INNOVATION. INTEGRITY Oracle Archiving Strategies Designed Up Front (Yeah, right) Determined by Application Owner Implemented by ____________ Utilize Oracle Features INTELLIGENCE. INNOVATION. INTEGRITY Finding Large Tables DBA_SEGMENTS (bytes) DBA_TABLES (num_rows) or based on I/O INTELLIGENCE. INNOVATION. INTEGRITY Rolling Windows Self Managing Mostly based on DATE Utilize DBMS Features — Partitioning — Transportable Tablespaces – – – – Exchange Partition Set tablespace read only Expdp Copy export file and data file INTELLIGENCE. INNOVATION. INTEGRITY Rolling Windows via Partitioning P1 … Data Profile Storage Profile Probably Never Accessed Cheap as you can get Read Only / Compressed … P47 Rarely Accessed Heavily Accessed Not so fast or expensive Fast, expensive Read Only Read / Write INTELLIGENCE. INNOVATION. INTEGRITY Why not use transportable tablespaces or Oracle exports for data retention? INTELLIGENCE. INNOVATION. INTEGRITY The Problem with Oracle Files Transportable Tablespaces Exports Backups Oracle Export Files & Datafiles Import Trans Tsp Version 16Z Year 2007 Year 2030 Not a good method for LT Data Retention INTELLIGENCE. INNOVATION. INTEGRITY Partitioning (Old ways) Range Partitioning Data is distributed based on partition key range of values – usually a date. Good When: Data is date-based. INTELLIGENCE. INNOVATION. INTEGRITY Partitioning (Old Ways) Hash Partitioning Uses hash algorithm to create equally sized buckets of data. Good When: No natural partition key and desire I/O balancing (hot spots). INTELLIGENCE. INNOVATION. INTEGRITY Partitioning (Old Ways) List Partitioning Data is distributed based on LIST of values in partition key. Good When: Have short list of values (States, Regions, Account Types) INTELLIGENCE. INNOVATION. INTEGRITY Partitioning (New Ways – 11G) Interval Partitioning Initial Partition is created manually, the rest are automatically created as new data arrives. Good When: Need a rolling window! INTELLIGENCE. INNOVATION. INTEGRITY Partitioning (New Ways – 11G) REF partitioning Related Tables benefit from same partitioning strategy, whether column exists in children or not! Good When: Desire related data to be partitioned in the same manner. INTELLIGENCE. INNOVATION. INTEGRITY Partitioning (New Ways – 11G) Virtual Column Partitioning Partition key may be based on virtual column Good When: Virtual column is required for partition key. INTELLIGENCE. INNOVATION. INTEGRITY Rows Gotta Go INTELLIGENCE. INNOVATION. INTEGRITY Row Removal Options SQL DELETE CTAS / DROP / RENAME TRUNCATE Row Marking INTELLIGENCE. INNOVATION. INTEGRITY SQL DELETE Good for small number of rows RI handled automatically Oracle was born to DELETE, better than any PL/SQL that you write. Issue with Un-indexed Foreign Keys INTELLIGENCE. INNOVATION. INTEGRITY DELETE Optimization Work in batches, committing (only when programmatically DELETING) Use parallel DML (Partitioned tables only) Drop Indexes before (if possible) Index FK columns INTELLIGENCE. INNOVATION. INTEGRITY CTAS Works well for PURGE, not archive Perfect when you want to keep low percentage of rows in the table Doesn’t handle RI – no DELETE was issued. Process 1. Create table with rows you want to keep 2. Drop old table 3. Rename table 4. Recreate indexes create table new_table unrecoverable as select * from old_table where ... INTELLIGENCE. INNOVATION. INTEGRITY TRUNCATE Congratulations if your application lends itself to TRUNCATE without losing new data What about RI? May truncate or drop individual partitions INTELLIGENCE. INNOVATION. INTEGRITY DROP DROP PARTITION What would you do before you drop it? Exchange partition with table Transportable tablespace. INTELLIGENCE. INNOVATION. INTEGRITY Things to Remember • Benchmark the best way for you • Benchmark against real data if possible • Use parallel DML INTELLIGENCE. INNOVATION. INTEGRITY Design Summary Create an architecture that lends itself to aging, archiving, deleting This architecture should compensate for business requirements — For instance, customer orders not accessible after 6 months … or — top query performance needed for all ‘ACTIVE’ accounts … etc Implement it – THE EASY PART INTELLIGENCE. INNOVATION. INTEGRITY Post Archive Challenges INTELLIGENCE. INNOVATION. INTEGRITY Post Archive Challenges “I have successfully deleted 10 billion rows from the table. HoooAhhhh! Performance will be great, space will be available, and I will get credit for optimizing our data warehouse application, saving the company billions of dollars” INTELLIGENCE. INNOVATION. INTEGRITY … 2 Days Later … INTELLIGENCE. INNOVATION. INTEGRITY Post Archive Challenges Hmmmmm. It looks like … - Queries are not any faster . . . - The Select count(*) took the same amount of time . . . - Space was not freed in Oracle (DBA_FREE_SPACE) . . . - Space was not freed in the operating system . . . WHY NOT ????? Where are the benefits ??? INTELLIGENCE. INNOVATION. INTEGRITY From Swiss to Provolone After DELETE After Maintenance INTELLIGENCE. INNOVATION. INTEGRITY Post Archive Challenges Statistics are not fresh High Water Marks are very high Space has not been freed within Oracle (if that’s what you want) Space has not been freed to the OS INTELLIGENCE. INNOVATION. INTEGRITY Refresh Statistics Help the optimizer, easy enough dbms_stats provides many options INTELLIGENCE. INNOVATION. INTEGRITY Automatic Stats Recommended by Oracle Calls DBMS_STATS_JOB_PROC Enabled via: Begin dbms_auto_task_admin.enable( client_name => ‘auto optimizer stats collection’, operation => NULL, window_name => NULL); END; / INTELLIGENCE. INNOVATION. INTEGRITY When do you go manual ? High transaction DELETEs or TRUNCATEs Bulk loads which add more than 10% of table size So there’s our answer – go manual. INTELLIGENCE. INNOVATION. INTEGRITY How do we Gather Them? NOT the Analyze Command Instead DBMS_STATS package exec dbms_stats.gather_table_stats(ownname => 'BDB', tabname => 'MASTER', estimate_percent => dbms_stats.auto_sample_size); INTELLIGENCE. INNOVATION. INTEGRITY High Water Mark INTELLIGENCE. INNOVATION. INTEGRITY High Water Mark INTELLIGENCE. INNOVATION. INTEGRITY Reset High Water Mark (HWM) DROP or TRUNCATE Multiple OTHER ways to do this depending on version In v9 … alter table move tablespace [tsp name]; — Row movement must be enabled — Tablespace must be a LMT — Can move into same tablespace — Will occupy 2X space temporarily — Must then rebuild indexes In v10 … alter table <table_name> shrink space; INTELLIGENCE. INNOVATION. INTEGRITY Freeing Allocated Space INTELLIGENCE. INNOVATION. INTEGRITY Create table, check space SQL> create table space_example as select * from dba_source; Table created. SQL> select count(*) from space_example; COUNT(*) ---------296463 SQL> exec dbms_space.unused_space(‘DAVE', 'SPACE_EXAMPLE'); Total blocks: 6328 Unused blocks: 1 Unused bytes: 8192 Last Used Block: 55 Last Used Block ID: 10377 Last Used Ext File ID: 4 INTELLIGENCE. INNOVATION. INTEGRITY Check datafile space Size Current Poss. FILE_NAME Poss. Size Savings -------------------------------------------------- -------- -------- -------/export/home/ora102/oradata/ora102/qasb001.dbf 29 46 17 /export/home/ora102/oradata/ora102/example01.dbf 69 100 31 /export/home/ora102/oradata/ora102/qasb002.dbf 41 41 0 /export/home/ora102/oradata/ora102/system01.dbf 493 500 7 /export/home/ora102/oradata/ora102/sysaux01.dbf 430 430 0 /export/home/ora102/oradata/ora102/undotbs01.dbf 91 175 84 /export/home/ora102/oradata/ora102/users01.dbf 44 83 39 /export/home/ora102/oradata/ora102/test.dbf 51 70 19 INTELLIGENCE. INNOVATION. INTEGRITY Delete rows, check space SQL> delete from space_example; 296463 rows deleted. SQL> commit; SQL> exec dbms_space.unused_space(‘DAVE', 'SPACE_EXAMPLE'); Total blocks: 6328 Unused blocks: 1 Unused bytes: 8192 Last Used Block: 55 Last Used Block ID: 10377 Last Used Ext File ID: 4 Nothing Changed ! INTELLIGENCE. INNOVATION. INTEGRITY Shrink it, check space SQL> alter table space_example enable row movement; SQL> alter table space_example shrink space; SQL> exec dbms_space.unused_space('BDB', 'SPACE_EXAMPLE'); Total blocks: 8 Unused blocks: 4 Unused bytes: 32768 Last Used Block: 4 Last Used Block ID: 5129 Last Used Ext File ID: 4 Space Freed From Table, but still in Oracle INTELLIGENCE. INNOVATION. INTEGRITY Check space again Size Current Poss. FILE_NAME Poss. Size Savings -------------------------------------------------- -------- -------- -------/export/home/ora102/oradata/ora102/qasb001.dbf 29 46 17 /export/home/ora102/oradata/ora102/example01.dbf 69 100 31 /export/home/ora102/oradata/ora102/qasb002.dbf 41 41 0 /export/home/ora102/oradata/ora102/system01.dbf 493 500 7 /export/home/ora102/oradata/ora102/sysaux01.dbf 430 430 0 /export/home/ora102/oradata/ora102/undotbs01.dbf 171 175 4 /export/home/ora102/oradata/ora102/users01.dbf 44 83 39 /export/home/ora102/oradata/ora102/test.dbf 1 70 69 This datafile should be resized to save 69 MB SQL> alter database datafile '/export/home/ora102/oradata/ora102/test.dbf' resize 1m; INTELLIGENCE. INNOVATION. INTEGRITY Free the Space Space is still reserved for future inserts and updates, just not freed back to the OS Space will not be automatically freed – confirm by checking DBA_FREE_SPACE Ways to set it free — drop — truncate — alter table move … — alter table shrink space … INTELLIGENCE. INNOVATION. INTEGRITY Unindexed Foreign Keys Example 1 Million Rows PARENT COL1 ON DELETE CASCADE CHILD COL1 COL1_PARENT 1 Million Rows SQL> DELETE FROM PARENT WHERE COL1 < 1000; Fky.sql INTELLIGENCE. INNOVATION. INTEGRITY Before Index delete from parent where col1 < 1000 call count ------- -----Parse 1 Execute 1 Fetch 0 ------- -----total 2 cpu elapsed disk query current -------- ---------- ---------- ---------- ---------0.01 0.08 2 27 0 0.90 0.80 4 2208799 6062 0.00 0.00 0 0 0 -------- ---------- ---------- ---------- ---------0.91 0.88 6 2208826 6062 rows ---------0 999 0 ---------999 delete from "DAVE"."CHILD" where "COL1_PARENT" = :1 call count ------- -----Parse 1 Execute 999 Fetch 0 ------- -----total 1000 cpu elapsed disk query current -------- ---------- ---------- ---------- ---------0.00 0.00 0 0 0 285.94 293.11 1543900 2208789 1029 0.00 0.00 0 0 0 -------- ---------- ---------- ---------- ---------285.94 293.11 1543900 2208789 1029 rows ---------0 999 0 ---------999 INTELLIGENCE. INNOVATION. INTEGRITY SQL> create index prnt_ndx on child(col1_parent); delete from parent where col1 < 1000 call count ------- -----Parse 1 Execute 1 Fetch 0 ------- -----total 2 cpu elapsed disk query current -------- ---------- ---------- ---------- ---------0.00 0.00 0 0 0 0.53 0.47 7 13 7053 0.00 0.00 0 0 0 -------- ---------- ---------- ---------- ---------0.53 0.47 7 13 7053 rows ---------0 999 0 ---------999 delete from "DAVE"."CHILD" where "COL1_PARENT" = :1 call count ------- -----Parse 1 Execute 999 Fetch 0 ------- -----total 1000 cpu elapsed disk query current -------- ---------- ---------- ---------- ---------0.00 0.00 0 0 0 0.42 0.46 2 3002 4058 0.00 0.00 0 0 0 -------- ---------- ---------- ---------- ---------0.42 0.46 2 3002 4058 rows ---------0 999 0 ---------999 INTELLIGENCE. INNOVATION. INTEGRITY Unindexed Foreign Keys Problem is not limited to DELETE statements Search database for unindexed FK columns Script is on asktom — Search for unindex.sql INTELLIGENCE. INNOVATION. INTEGRITY Summary Points Create sound Archiving strategy based on Oracle technical features as well as business and/or legal requirements Leverage partitioning Move partitions to cheap disk when appropriate Make partitions ‘read only’ and compressed Remove data via DROP or TRUNCATE if possible If SQL DELETE, make sure to perform maintenance operations Consider 3rd partyINTELLIGENCE. solutions INNOVATION. INTEGRITY “Well done is better than well said” Ben Franklin Questions? INTELLIGENCE. INNOVATION. INTEGRITY