Database
High Availability Management
“OPTIMAL USAGE OF ORACLE’S PARTITIONING OPTION”
Frank Bommarito, SageLogix, Inc.
PARTITIONING - THE BEGINNING
Partitioning, as a concept, has been in existence since the beginnings of large databases (e.g., data warehouses). The basic concept of partitioning is to divide one large table into multiple smaller units. Each of the smaller units (or partitions) can then be accessed and managed separately.
The huge growth in various industries in the last decade has resulted in a phenomenal increase in data and, therefore, in the size of the databases and tables that contain this data. Such large-scale growth introduced interesting challenges for database administrators. Operations such as rebuilding indexes became nearly impossible to complete within designated outage windows. Duplicating tables from production to test environments became unmanageable. Query tuning was complicated by the sheer volume of data, resulting in sub-optimal performance for both index and full table scans. Partitioning was introduced to relieve these issues and allow continued growth of these large tables while giving the database administrator the ability to manage the database with smaller maintenance windows.
Oracle-based partitioning was implemented fully with Oracle Enterprise Edition 8.0. From an
application perspective, Oracle masks the partitioning, allowing select, update, insert, and delete
operations against all the partitions without application modifications. Additionally, Oracle’s
optimizer is ‘partition aware’, meaning that the optimizer will avoid performing operations on
partitions that do not match the query criteria. This process is known as partition pruning, and adds
a new dimension of performance tuning that was previously unavailable for very large tables.
PARTITIONING CONCEPTS
When a partitioned table is created, two distinct object types are created for the table as well. These object types are known as GLOBAL and LOCAL. GLOBAL objects refer to the table as a whole and are not concerned with any of the individual pieces. LOCAL objects are the individual partitions themselves.
A standard Oracle table can have indexes, constraints, triggers, etc. These same features are available
for partitioned tables. However, the implementation of the indexes is slightly different for partitioned
tables. This difference stems from the fact that the table’s rows are physically stored in multiple
objects as opposed to one object.
Consider the following example:
Create table range_partition (
Part_key number,
Value1 varchar2(30),
Value2 number)
Partition by range (part_key)
(
partition p1 values less than (80000),
partition p2 values less than (160000),
partition pmax values less than (maxvalue)
);
In this example, a single table is created, but it contains three physical segments. Indexes need to be created so that they can access all of the table's segments.
The following statement creates a NON-PREFIXED LOCAL index. LOCAL indicates that the index is partitioned the same way as the table, so THREE separate index partitions are created, one for each table partition. NON-PREFIXED indicates that the index does not have the partition key as the leading column. The partition key is derived from the statement "Partition by range (part_key)"; in this example, the column part_key is the partition key.
Create index idx_example1 on range_partition (value2) LOCAL;
The following statement creates one single index. This index is known as a GLOBAL index, as it includes rows from all partitions.
Create index idx_example2 on range_partition (value1) GLOBAL;
The following statement creates a PREFIXED LOCAL index, because the partition key (part_key) is its leading column.
Create index idx_example3 on range_partition (part_key) LOCAL;
After executing the three index creation statements, three new database objects (indexes) and seven
new database segments (physical segments) will exist.
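One way to confirm the index and segment counts described above is to query the data dictionary. This is a sketch that assumes the three example indexes exist in the current schema (hence the USER_ views); on releases with deferred segment creation (11.2 and later), index segments may not appear in USER_SEGMENTS until the table holds rows.

```sql
-- LOCALITY (LOCAL/GLOBAL) and ALIGNMENT (PREFIXED/NON_PREFIXED) for the
-- partitioned indexes; both LOCAL indexes should report PARTITION_COUNT = 3.
SELECT index_name, locality, alignment, partition_count
FROM   user_part_indexes
WHERE  table_name = 'RANGE_PARTITION'
ORDER  BY index_name;

-- IDX_EXAMPLE2 is not partitioned, so it shows PARTITIONED = 'NO' here.
SELECT index_name, partitioned
FROM   user_indexes
WHERE  table_name = 'RANGE_PARTITION'
ORDER  BY index_name;

-- One segment per local index partition plus one for the global index.
SELECT segment_name, partition_name, segment_type
FROM   user_segments
WHERE  segment_name IN ('IDX_EXAMPLE1', 'IDX_EXAMPLE2', 'IDX_EXAMPLE3')
ORDER  BY segment_name, partition_name;
```

Local index partitions inherit the table partition names (P1, P2, PMAX) by default, which makes the two listings easy to line up.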
Performance Considerations:
Please look at the following example.
Facts:
The table range_partition is loaded with 256,000 rows. All three partitions have a near equal
distribution of those rows. All three columns have unique values. The three indexes above have
been created on the table.
20,000 queries are generated and executed for each column that is indexed.
Results
Column     Index Type            Total Time
Value1     GLOBAL                7 minutes 20 seconds
Value2     NON-PREFIXED LOCAL    12 minutes 30 seconds
Part_key   PREFIXED LOCAL        6 minutes 10 seconds
The same table was created without partitioning. Same table, same rows!
Column     Index Type   Total Time
Value1     Standard     7 minutes 20 seconds
Value2     Standard     7 minutes 20 seconds
Part_key   Standard     7 minutes 20 seconds
GLOBAL indexes perform like indexes on non-partitioned tables (7 minutes 20 seconds in both tests). Local prefixed indexes are the fastest option because partition pruning can occur. Non-prefixed local indexes are the slowest, because each lookup must probe every index partition (three in this example) instead of a single index.
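Partition pruning can be observed directly with EXPLAIN PLAN. A minimal sketch, assuming a PLAN_TABLE is available and DBMS_XPLAN exists (9.2 and later); the literal values and statement IDs are illustrative. In the output, the Pstart and Pstop columns show which partitions each step may touch.

```sql
-- Prefixed local index: the partition key is in the predicate, so Pstart and
-- Pstop should both name a single partition (100 falls in P1).
EXPLAIN PLAN SET STATEMENT_ID = 'PRUNE' FOR
  SELECT * FROM range_partition WHERE part_key = 100;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'PRUNE'));

-- Non-prefixed local index: no partition key in the predicate, so every
-- index partition (1 through 3) is expected to be probed.
EXPLAIN PLAN SET STATEMENT_ID = 'NOPRUNE' FOR
  SELECT * FROM range_partition WHERE value2 = 100;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'NOPRUNE'));
```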
PARTITION MAINTENANCE
Given the performance considerations, why would anyone utilize a non-prefixed local index?
The answer is maintenance. Partitions make it possible to perform maintenance on excessively large tables in a timely manner, largely because the table is split into smaller, more manageable units. Maintenance can be performed on each individual partition without affecting the other partitions, and because the partitions are independent, maintenance on several of them can take place simultaneously.
However, any maintenance on a partition does have an impact on GLOBAL table items. Each
GLOBAL table item potentially impacted by partition maintenance is identified below.
Maintenance operations may include:
1. Rebuild a specific partition’s data segments
2. Exchange a non-partitioned table with a partition
3. Merge two partitions together
4. Divide (split) one partition into two
5. Add new partitions to the table
6. Drop old partitions from the table
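The six operations listed above map onto ALTER TABLE clauses roughly as follows. This is a sketch against a hypothetical scratch table (MAINT_DEMO, with MAINT_SWAP as the exchange target) shaped like the range_partition example, so the real table is not modified; run in order, the steps leave partitions P2, P3 and PMAX.

```sql
-- Hypothetical scratch copy of the example table.
CREATE TABLE maint_demo (part_key NUMBER, value1 VARCHAR2(30), value2 NUMBER)
PARTITION BY RANGE (part_key)
( PARTITION p1   VALUES LESS THAN (80000),
  PARTITION p2   VALUES LESS THAN (160000),
  PARTITION pmax VALUES LESS THAN (MAXVALUE) );

-- Empty non-partitioned table with the same columns, for the exchange.
CREATE TABLE maint_swap AS SELECT * FROM maint_demo WHERE 1 = 0;

-- 1. Rebuild a partition's data segment (a TABLESPACE clause may be added).
ALTER TABLE maint_demo MOVE PARTITION p1;
-- 2. Exchange a non-partitioned table with a partition (segments are swapped).
ALTER TABLE maint_demo EXCHANGE PARTITION p1 WITH TABLE maint_swap;
-- 3. Merge two adjacent range partitions.
ALTER TABLE maint_demo MERGE PARTITIONS p1, p2 INTO PARTITION p1_2;
-- 4. Divide (split) one partition into two at a boundary value.
ALTER TABLE maint_demo SPLIT PARTITION p1_2 AT (80000)
  INTO (PARTITION p1, PARTITION p2);
-- 5. Add a partition. ADD PARTITION is rejected while a MAXVALUE partition
--    exists, so the new range is split off PMAX instead.
ALTER TABLE maint_demo SPLIT PARTITION pmax AT (240000)
  INTO (PARTITION p3, PARTITION pmax);
-- 6. Drop the oldest partition (its rows are removed with it).
ALTER TABLE maint_demo DROP PARTITION p1;
```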
GLOBAL INDEXES
Whenever a single ROW is affected by a partition maintenance operation, the ENTIRE global index becomes UNUSABLE. Starting with release 9i of Oracle, the "UPDATE GLOBAL INDEXES" clause can be applied to partition maintenance operations. This clause maintains only those entries of the GLOBAL index that are impacted. This extra index maintenance makes the operation itself slower; however, the slowdown is minimal compared to the impact of INVALIDATING important indexes.
Example:
Alter table range_partition move partition p1 tablespace new_tablespace;
This command will rebuild the partition and locate the newly rebuilt partition in the tablespace
"new_tablespace". If the partition "p1" has one or more rows in it, then any global indexes will become "UNUSABLE". This means that applications will begin to receive errors if they need to access the index. The following command will rebuild the unusable index.
Alter index idx_example2 rebuild;
However, this command re-indexes the entire table's contents rather than working only on the deltas.
The following command performs the same move, but adds the task of "fixing" the global index as part of the operation.
Release 9i and above
Alter table range_partition move partition p1 tablespace new_tablespace update global indexes;
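A MOVE PARTITION typically leaves the moved partition's pieces of the LOCAL indexes (IDX_EXAMPLE1 and IDX_EXAMPLE3) UNUSABLE as well, since UPDATE GLOBAL INDEXES covers only global indexes. A sketch, assuming the example objects above, for finding unusable indexes and rebuilding only the affected pieces:

```sql
-- Global (non-partitioned) indexes report VALID or UNUSABLE here;
-- partitioned (local) indexes show N/A and are tracked per partition.
SELECT index_name, status
FROM   user_indexes
WHERE  table_name = 'RANGE_PARTITION';

SELECT ip.index_name, ip.partition_name, ip.status
FROM   user_ind_partitions ip
JOIN   user_part_indexes pi ON pi.index_name = ip.index_name
WHERE  pi.table_name = 'RANGE_PARTITION'
AND    ip.status = 'UNUSABLE';

-- Rebuild whole global indexes only if unusable, and only the
-- affected partitions of local indexes.
BEGIN
  FOR r IN (SELECT index_name
            FROM   user_indexes
            WHERE  table_name = 'RANGE_PARTITION'
            AND    status = 'UNUSABLE') LOOP
    EXECUTE IMMEDIATE 'ALTER INDEX ' || r.index_name || ' REBUILD';
  END LOOP;

  FOR r IN (SELECT ip.index_name, ip.partition_name
            FROM   user_ind_partitions ip
            JOIN   user_part_indexes pi ON pi.index_name = ip.index_name
            WHERE  pi.table_name = 'RANGE_PARTITION'
            AND    ip.status = 'UNUSABLE') LOOP
    EXECUTE IMMEDIATE 'ALTER INDEX ' || r.index_name
                   || ' REBUILD PARTITION ' || r.partition_name;
  END LOOP;
END;
/
```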
CONSTRAINTS
Constraints are likely the largest single obstacle to partition maintenance. Most partition maintenance operations do not work when constraints are enabled. Typically, the constraint needs to be dropped and re-applied after the partition maintenance operations. To this end, Oracle has added some new syntax that is handy when disabling constraints.
Alter table table_name disable constraint pk_constraint keep index;
The KEEP INDEX clause disables the constraint without dropping the index that enforces it. The maintenance operations can proceed, and the index pieces that need rebuilding can be rebuilt. Once maintenance is complete, the constraint can be re-enabled in a relatively short time.
Example
CREATE TABLE part_test
(ID NUMBER NOT NULL, NUMB NUMBER)
PARTITION BY RANGE (ID)
(PARTITION P1 VALUES LESS THAN (10), PARTITION P2 VALUES LESS THAN (20));
CREATE unique INDEX part_test_pkx ON part_test (ID) LOCAL;
ALTER TABLE part_test ADD CONSTRAINT part_test_pk PRIMARY KEY (ID) USING INDEX part_test_pkx;
create table fk_table (id number, descr varchar2(30));
ALTER TABLE fk_table ADD CONSTRAINT fk_table_fk FOREIGN KEY (ID) REFERENCES
PART_TEST(ID);
create table part_exch (ID NUMBER NOT NULL, NUMB NUMBER);
insert into part_test values (1,1);
alter table part_test exchange partition p1 with table part_exch;
ERROR at line 1:
ORA-02266: unique/primary keys in table referenced by enabled foreign keys
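One way around ORA-02266, continuing the example above, is to disable the referencing foreign key for the duration of the exchange. This is a sketch under assumptions: the staging table also needs a matching primary key (otherwise the exchange is expected to fail with ORA-14130, a UNIQUE constraint mismatch), and INCLUDING INDEXES swaps that key's index with the P1 partition of the local index PART_TEST_PKX. The constraint name PART_EXCH_PK is hypothetical.

```sql
-- Give the staging table a primary key matching PART_TEST_PK, so the
-- UNIQUE constraints on both sides of the exchange agree.
ALTER TABLE part_exch ADD CONSTRAINT part_exch_pk PRIMARY KEY (ID);

-- The enabled foreign key is what raises ORA-02266; disable it for now.
ALTER TABLE fk_table DISABLE CONSTRAINT fk_table_fk;

-- Swap the segments. INCLUDING INDEXES swaps PART_EXCH_PK's index with the
-- P1 partition of PART_TEST_PKX instead of leaving that partition UNUSABLE.
ALTER TABLE part_test EXCHANGE PARTITION p1 WITH TABLE part_exch
  INCLUDING INDEXES;

-- Re-enable the foreign key (ENABLE VALIDATE by default).
ALTER TABLE fk_table ENABLE CONSTRAINT fk_table_fk;
```

Re-enabling the foreign key with the default ENABLE VALIDATE re-checks every existing child row, so on a large child table this step can dominate the maintenance window.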
Stored PL/SQL
Within Oracle databases, stored PL/SQL often exists. These program units have dependencies upon database objects. When a database object is modified, the dependent PL/SQL program units need recompilation to ensure that they are still valid. One might expect partition maintenance operations to be exempt from this, since the addition of a new partition does not appear to have any logical impact on stored PL/SQL. However, the addition of a new partition will invalidate any dependent PL/SQL program. Release 9i and above of Oracle handle this by automatically recompiling the invalidated programs.
When will PL/SQL become invalid? The answer is whenever the data dictionary needs to add or
remove a row resulting from the partition maintenance operation.
The following command does not cause invalidation as the data dictionary is simply updated.
Alter table range_partition exchange partition p1 with table no_partition;
The following command does cause invalidation as the data dictionary is removing a row.
Alter table range_partition drop partition p1;
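To see which stored units a partition maintenance operation may invalidate, and to recompile them immediately instead of on first use, the data dictionary can be queried as sketched below. COUNT_RANGE_ROWS is a hypothetical procedure used only to create a dependency on RANGE_PARTITION.

```sql
-- Hypothetical procedure that depends on RANGE_PARTITION.
CREATE OR REPLACE PROCEDURE count_range_rows (p_count OUT NUMBER) AS
BEGIN
  SELECT COUNT(*) INTO p_count FROM range_partition;
END count_range_rows;
/

-- Stored program units that depend on the table.
SELECT name, type
FROM   user_dependencies
WHERE  referenced_name = 'RANGE_PARTITION'
AND    referenced_type = 'TABLE';

-- After a partition maintenance operation, list anything left INVALID ...
SELECT object_name, object_type
FROM   user_objects
WHERE  status = 'INVALID';

-- ... and recompile it now rather than on its next call.
ALTER PROCEDURE count_range_rows COMPILE;
```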
TYPES OF PARTITIONING
The example shown above utilized a partitioning type known as RANGE partitioning.
Oracle supports two other types of partitions (HASH and LIST). A partition can also be divided further into sub-partitions; sub-partitioned tables, also referred to as composite-partitioned tables, can use either HASH or LIST sub-partitions.
RANGE
Range partitions are the most common. Table and index partitions are based on ranges of values in one or more columns (the partition key), which allows the database to store each occurrence in a given partition. These partitions are typically used within data warehousing systems. The most common range boundary is based on dates.
Each partition is defined with an upper boundary. The storage location of each occurrence is then
found by comparing the partitioning key of the occurrence with this upper boundary. This upper
boundary is non-inclusive; in other words, the key of each occurrence must be less than this limit for
the record to be stored in this partition.
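The non-inclusive upper boundary can be checked by inserting the boundary value itself and asking which partition physically holds each row. A sketch using the numeric range_partition table from earlier (P1 holds values below 80000); the marker value 'boundary-demo' is arbitrary, and the query maps each ROWID to its partition segment through DBMS_ROWID.ROWID_OBJECT and USER_OBJECTS.DATA_OBJECT_ID.

```sql
INSERT INTO range_partition (part_key, value1, value2) VALUES (79999, 'boundary-demo', NULL);
INSERT INTO range_partition (part_key, value1, value2) VALUES (80000, 'boundary-demo', NULL);
COMMIT;

-- Map each row's ROWID to the segment (partition) that physically holds it.
SELECT r.part_key, o.subobject_name AS partition_name
FROM   range_partition r
JOIN   user_objects o
  ON   o.data_object_id = DBMS_ROWID.ROWID_OBJECT(r.ROWID)
WHERE  r.value1 = 'boundary-demo'
ORDER  BY r.part_key;
-- 79999 is stored in P1; 80000 equals P1's limit, so it is stored in P2.
```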
HASH
Hash partitions are ideal when there is no real method to divide a table based on a range. Hash partitions utilize a hashing algorithm to programmatically take a column value and determine the partition in which that row is stored. Unlike range partitions, hash partitions have no upper boundaries; the hash value of the partitioning key alone decides where each occurrence is stored. This type of partitioning is recommended when it is difficult to define the criteria for the distribution of data.
LIST
List partitions have a hard-coded LIST of values that will exist within any partition. A common
usage would be with states. A state partition table would commonly have 50 partitions, one for each
state.
Sub-partitions are utilized most often when the partitioning strategy does not produce partition units small enough to achieve maintenance goals. When this is true, sub-partitions can further divide a table based on another column.
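As an illustration, a range-hash composite table could divide each monthly range partition into hash sub-partitions on a second column. COMPOSITE_PARTITION and its columns are hypothetical:

```sql
-- Months as range partitions, each divided into four hash sub-partitions.
CREATE TABLE composite_partition
( date_col   DATE,
  account_id VARCHAR2(30) )
PARTITION BY RANGE (date_col)
SUBPARTITION BY HASH (account_id) SUBPARTITIONS 4
(
  PARTITION p_jan_2001 VALUES LESS THAN (TO_DATE('01022001', 'ddmmyyyy')),
  PARTITION p_feb_2001 VALUES LESS THAN (TO_DATE('01032001', 'ddmmyyyy')),
  PARTITION pmax VALUES LESS THAN (MAXVALUE)
);

-- 3 partitions x 4 sub-partitions = 12 sub-partitions in total.
SELECT partition_name, COUNT(*) AS subpartitions
FROM   user_tab_subpartitions
WHERE  table_name = 'COMPOSITE_PARTITION'
GROUP  BY partition_name;
```

Maintenance such as TRUNCATE or MOVE can then be applied to a single sub-partition, which is a smaller unit than a whole month.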
Examples
RANGE: A max partition will capture any values beyond the stated ranges – including NULLS
Create table range_partition
( date_col date)
partition by RANGE (date_col)
(
partition p_jan_2001 values less than (to_date('01022001','ddmmyyyy')),
partition p_feb_2001 values less than (to_date('01032001','ddmmyyyy')),
partition pmax values less than (maxvalue)
);
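The MAXVALUE behavior can be checked with a self-contained copy of the example above. RANGE_DATE_DEMO is a hypothetical name, chosen so it does not collide with the numeric RANGE_PARTITION table created earlier; without the PMAX partition, the last two inserts would instead fail with ORA-14400 (inserted partition key does not map to any partition).

```sql
CREATE TABLE range_date_demo (date_col DATE)
PARTITION BY RANGE (date_col)
(
  PARTITION p_jan_2001 VALUES LESS THAN (TO_DATE('01022001', 'ddmmyyyy')),
  PARTITION p_feb_2001 VALUES LESS THAN (TO_DATE('01032001', 'ddmmyyyy')),
  PARTITION pmax VALUES LESS THAN (MAXVALUE)
);

INSERT INTO range_date_demo VALUES (TO_DATE('15012001', 'ddmmyyyy')); -- January 2001
INSERT INTO range_date_demo VALUES (TO_DATE('15062001', 'ddmmyyyy')); -- beyond the stated ranges
INSERT INTO range_date_demo VALUES (NULL);                            -- NULL partition key
COMMIT;

-- Both the out-of-range date and the NULL are stored in PMAX.
SELECT date_col FROM range_date_demo PARTITION (pmax);
```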
HASH – Hash partitions distribute rows most evenly when the number of partitions is a power of two, such as 8, 16, or 32.
Create table hash_partition
(account_id varchar2(30))
partition by HASH (account_id) partitions 16;
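To see how evenly the hash function spreads rows, one can load sample keys and count rows per physical partition. A sketch, assuming the HASH_PARTITION table above; the ACCTn account IDs are synthetic, and the partitions have system-generated names (SYS_Pnnn) because none were specified.

```sql
-- Load 10,000 distinct synthetic account ids.
INSERT INTO hash_partition (account_id)
SELECT 'ACCT' || LEVEL FROM dual CONNECT BY LEVEL <= 10000;
COMMIT;

-- Count rows per physical partition via each row's data object number.
SELECT o.subobject_name AS partition_name, COUNT(*) AS row_count
FROM   hash_partition h
JOIN   user_objects o
  ON   o.data_object_id = DBMS_ROWID.ROWID_OBJECT(h.ROWID)
GROUP  BY o.subobject_name
ORDER  BY o.subobject_name;
```

With 10,000 distinct keys, each of the 16 partitions would be expected to hold roughly 625 rows.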
LIST
Create table list_partition
(state_id varchar2(2))
partition by LIST (state_id)
(
partition P_MI values ('MI'),
partition P_CO values ('CO')
);
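A LIST partitioned table rejects any key that is not listed (ORA-14400). A sketch of the two usual remedies, assuming the LIST_PARTITION table above: add a partition for the new value, or (9i Release 2 and later) add a DEFAULT partition that catches every unlisted value. P_TX and P_OTHER are hypothetical partition names.

```sql
-- Without a matching partition this insert fails with ORA-14400:
-- INSERT INTO list_partition VALUES ('TX');

-- Give the new value its own partition ...
ALTER TABLE list_partition ADD PARTITION p_tx VALUES ('TX');
-- ... and catch everything not listed elsewhere.
ALTER TABLE list_partition ADD PARTITION p_other VALUES (DEFAULT);

INSERT INTO list_partition VALUES ('TX');  -- stored in P_TX
INSERT INTO list_partition VALUES ('OH');  -- stored in P_OTHER
COMMIT;

SELECT state_id FROM list_partition PARTITION (p_other);
```

P_TX is added first because, once a DEFAULT partition exists, new values can no longer be added with ADD PARTITION; they have to be split out of the DEFAULT partition instead.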
PRACTICAL PARTITIONING USAGES
There are four widely accepted usage models for partitioning. Each of these models is tailored for a
particular need. Usage of partitioning within the boundaries of these models allows for significant
application improvements in the area of performance, scalability, availability, and organization.
Partition Usage I – Data Warehousing
Partitions typically based on date ranges (daily or monthly).
Partition Usage II – OLTP
Partitions typically based upon a frequently accessed key.
Partition Usage III – ODS
Partitions typically based upon a date range and a key.
Partition Usage IV – Temporary Storage
Partitions rotate and are reused over time.
A typical example would be a table partitioned on the day of the month. Thirty-one partitions are created, and a date function is used to place each row in the partition for its day of the month. These partitions are read by another application that TRUNCATES each partition after reading its data.
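The rotating day-of-month design can be sketched with a LIST partitioned table whose 31 partitions are generated in a loop. ROTATE_STORAGE, DAY_NUM and PAYLOAD are hypothetical names, and deriving DAY_NUM with TO_CHAR(SYSDATE, 'DD') is one assumption about the "date function" mentioned above.

```sql
-- Build the CREATE TABLE statement for 31 list partitions (D01..D31),
-- one per day of the month, instead of typing each clause by hand.
DECLARE
  v_ddl VARCHAR2(4000);
BEGIN
  v_ddl := 'CREATE TABLE rotate_storage (day_num NUMBER(2) NOT NULL, '
        || 'payload VARCHAR2(100)) PARTITION BY LIST (day_num) (';
  FOR d IN 1 .. 31 LOOP
    v_ddl := v_ddl || 'PARTITION d' || LPAD(TO_CHAR(d), 2, '0')
                   || ' VALUES (' || TO_CHAR(d) || ')'
                   || CASE WHEN d < 31 THEN ', ' END;
  END LOOP;
  EXECUTE IMMEDIATE v_ddl || ')';
END;
/

-- The loading application derives the partition from the current date.
INSERT INTO rotate_storage (day_num, payload)
VALUES (TO_NUMBER(TO_CHAR(SYSDATE, 'DD')), 'example row');
COMMIT;

-- The reading application empties a day's partition once it has been
-- processed (D05 is only an example).
ALTER TABLE rotate_storage TRUNCATE PARTITION d05;
```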
STATISTICS
The cost-based optimizer of Oracle is partitioning-aware. In fact, the rule-based optimizer does not
“do” partitions.
The cost-based optimizer works off of statistics. Statistics on standard tables are easier to generate and comprehend than statistics on partitioned tables.
Statistics are the number one problem with partitioning implementations.
With partitions, there are LOCAL and GLOBAL statistics. GLOBAL statistics are utilized whenever
GLOBAL operations are performed. LOCAL statistics are utilized when the partition key is
available and partition elimination is possible.
Consider the following examples:
Select * from range_partition where value1 = :b1;
In this example, value1 is indexed GLOBALLY. This means that only global statistics are reviewed. The optimizer will then determine whether a full table scan or an index lookup is most appropriate.
Select * from range_partition
Where value1 = :b1
And value2 = :b2
And part_key = :b3;
In this example, local statistics are evaluated along with global statistics. Local statistics come into
play because the PART_KEY is within the where clause.
Statistics can be gathered LOCALLY or GLOBALLY. Once these are gathered, they are tied
together, in effect. This means that partition maintenance operations that impact GLOBAL
operations will also impact GLOBAL statistics. If a partition is added to an existing table, the
GLOBAL statistics will “disappear”.
The lowdown on statistics is as follows:
NO table statistics
If there are NO table statistics at all, then, the optimizer acts “relatively” rule-based.
Relatively Rule
Rule 1: If a GLOBAL index exists and can be used, it will be. LOCAL indexes are not considered
unless there is not a GLOBAL index that is usable.
Rule 2: If there are not any GLOBAL indexes, LOCAL indexes will be used if they exist.
This means that having NO statistics is a viable option if all indexes created on the table are good choices and any GLOBAL indexes are superior to LOCAL indexes. In conclusion, if the table is queried by a single column, and that column is the partition key and is indexed, then not gathering statistics at all is optimal.
Gathering Statistics - LOCALLY
If the following commands are used INITIALLY to gather statistics, and no other complementary command is used, then GLOBAL statistics are derived from the partition-level statistics.
execute dbms_stats.gather_table_stats(owner,'RANGE_PARTITION','P2',CASCADE=>TRUE);
or
execute dbms_stats.gather_table_stats(owner,'RANGE_PARTITION',
'P2',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 200');
The only difference between these two statistics commands is the generation of histograms.
Consideration when generating statistics locally:
GLOBAL statistics are populated after each running of a LOCAL script.
Before generating statistics on any of the partitions:
Select num_rows from dba_tables where table_name = 'RANGE_PARTITION';
NUM_ROWS=NULL
Select partition_name,num_rows from dba_tab_partitions where table_name = 'RANGE_PARTITION';
All rows have NUM_ROWS=NULL
After a SINGLE execution of a local partition statistic generation statement:
execute dbms_stats.gather_table_stats('SYSTEM','RANGE_PARTITION','P1',CASCADE=>TRUE);
Select num_rows from dba_tables where table_name = 'RANGE_PARTITION';
Global result: NUM_ROWS=250,000
Select partition_name,num_rows from dba_tab_partitions where table_name = 'RANGE_PARTITION';
PARTITION_NAME  NUM_ROWS
P1              79999
P2              NULL
PMAX            NULL
The GLOBAL statistics are “guessed” at and populated. Once all LOCAL statistics are generated,
the GLOBAL statistics are still an aggregate and not “reality”. What this means is that gathering
statistics this way still uses the relatively rule method of optimization. If a GLOBAL index exists, it
is used. LOCAL indexes are evaluated if the where clause allows for partition pruning.
Why is this? Because the above commands did not and do not account for GLOBAL table units
(The GLOBAL indexes were never analyzed).
Once GLOBAL indexes are analyzed, then, all needed “units” have statistics and the optimizer takes
over (heaven help us).
execute dbms_stats.gather_index_stats('SYSTEM','RANGE_PARTITION_DESC');
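Whether GLOBAL statistics were actually gathered or merely aggregated from partitions is recorded in the GLOBAL_STATS column of the dictionary views. A sketch, assuming the example objects live in the current schema (the text above queries the DBA_ views for SYSTEM):

```sql
-- GLOBAL_STATS = 'YES': statistics were gathered for the table as a whole.
-- GLOBAL_STATS = 'NO' : the table-level figures were aggregated from partitions.
SELECT table_name, num_rows, global_stats, last_analyzed
FROM   user_tables
WHERE  table_name = 'RANGE_PARTITION';

SELECT partition_name, num_rows, global_stats, last_analyzed
FROM   user_tab_partitions
WHERE  table_name = 'RANGE_PARTITION'
ORDER  BY partition_position;

-- The same flag exists for indexes, including the global one.
SELECT index_name, num_rows, global_stats, last_analyzed
FROM   user_indexes
WHERE  table_name = 'RANGE_PARTITION';
```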
Gathering Statistics - GLOBALLY
Usage of the following command to gather statistics will gather GLOBAL and LOCAL statistics.
execute dbms_stats.gather_table_stats(owner,'RANGE_PARTITION',
GRANULARITY=>'ALL',CASCADE=>TRUE);
This one command is equivalent to all of the commands above. This is the recommended approach
for the initial gathering of statistics on partitions as this ensures that ALL statistics are gathered.
Partition Maintenance Effects
The effects of performing partition maintenance vary by release. In release 8.x, the GLOBAL
statistics temporarily disappear (the num_rows value becomes NULL). In release 9.x, the GLOBAL
statistics do not change.
In either case, the statistics are no longer valid and need updating. One of the advantages of a partitioned table is the ability to perform maintenance work on smaller segments. This means that ONLY the modified partitions need their statistics updated. This update will correct the GLOBAL statistics for the table (including GLOBAL index statistics).
Once any partition modifications occur, be sure to gather statistics immediately on the affected partitions. Failure to do so can leave the optimizer unable to parse the SQL statement (which can appear as a hang). If this phenomenon occurs, the best corrective action is to remove all statistics and generate them again.
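The corrective action above (remove all statistics, then gather them again) might look like this with DBMS_STATS, assuming the example table is in the current schema. DELETE_TABLE_STATS also removes partition, column, and index statistics by default, and GRANULARITY => 'ALL' matches the global gathering command shown earlier.

```sql
BEGIN
  -- Remove table, partition, column, and index statistics.
  DBMS_STATS.DELETE_TABLE_STATS(ownname => USER,
                                tabname => 'RANGE_PARTITION');

  -- Regather GLOBAL and LOCAL statistics in one pass.
  DBMS_STATS.GATHER_TABLE_STATS(ownname     => USER,
                                tabname     => 'RANGE_PARTITION',
                                granularity => 'ALL',
                                cascade     => TRUE);
END;
/
```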
PARTITIONING OPTIONS
Enable Row Movement
This option, available with Oracle partitioning in release 8i and above, allows updates to the partition key when that update would "relocate" a row from one partition to another partition. Please note that such an operation WILL CHANGE the ROWID for the row. The ROWID changes because the partition identification is stored within the ROWID. This can affect application programs that rely on ROWID values.
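A small, self-contained illustration of row movement; ROW_MOVE_DEMO is a hypothetical table. Without ENABLE ROW MOVEMENT, the UPDATE is expected to fail with ORA-14402 (updating partition key column would cause a partition change).

```sql
CREATE TABLE row_move_demo
( part_key NUMBER,
  value1   VARCHAR2(30) )
PARTITION BY RANGE (part_key)
( PARTITION p1   VALUES LESS THAN (80000),
  PARTITION pmax VALUES LESS THAN (MAXVALUE) );

INSERT INTO row_move_demo VALUES (5, 'moving row');
COMMIT;

-- Note the ROWID while the row is in P1.
SELECT ROWID, part_key FROM row_move_demo;

ALTER TABLE row_move_demo ENABLE ROW MOVEMENT;

-- 100000 belongs in PMAX, so the row is relocated.
UPDATE row_move_demo SET part_key = 100000 WHERE value1 = 'moving row';
COMMIT;

-- The row now lives in PMAX and has a different ROWID.
SELECT ROWID, part_key FROM row_move_demo PARTITION (pmax);
```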
Exchange without validation
When a partition is exchanged with another table without validation:
Alter table part_table exchange partition p1 with table fk_table without validation;
it is possible that the rows from the exchanged table cannot be transparently queried. Validation ensures that the rows in "fk_table" qualify for the given partition. When validation is bypassed (for performance reasons), care must be taken to ensure that the new partition rows do not violate the partition boundaries.
Because partition pruning occurs before rows are selected, rows that violate the boundaries can cause a query to return incorrect results.
Example
CREATE TABLE part_test
(ID NUMBER NOT NULL) PARTITION BY RANGE (ID)
(PARTITION P1 VALUES LESS THAN (10), PARTITION P2 VALUES LESS THAN (20));
create table fk_table (id number not null);
insert into part_test values (5);
insert into fk_table values (5);
commit;
alter table part_test exchange partition p2 with table fk_table without validation;
-- Returns 2 rows - both with the value of 5
select * from part_test;
-- Returns 1 row - with the value of 5 (pruning reads only partition P1)
select * from part_test where id=5;
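Because the database does not check the rows when WITHOUT VALIDATION is used, a cheap safeguard is to check the staging table against the target partition's bounds before the exchange. A sketch with a hypothetical staging table STAGE_P2: P2 above holds values from 10 (P1's upper bound) up to, but not including, 20.

```sql
CREATE TABLE stage_p2 (id NUMBER NOT NULL);
INSERT INTO stage_p2 VALUES (5);   -- belongs in P1, not P2
INSERT INTO stage_p2 VALUES (15);  -- valid for P2
COMMIT;

-- Any non-zero count means an exchange WITHOUT VALIDATION would place rows
-- outside P2's boundaries.
SELECT COUNT(*) AS out_of_range
FROM   stage_p2
WHERE  id < 10 OR id >= 20;
```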
Oracle Initialization Parameters
Oracle's partitioning option converts one single table into many physical segments. More physical segments require more resources from the Oracle SGA. In particular, the Oracle initialization parameter DML_LOCKS must be set to accommodate partitioning. If a table has 1000 partitions, then DML_LOCKS must be set to at least 1000 or the table cannot be created.
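To compare the current DML_LOCKS setting with the partition counts in a schema, something like the following could be used; it assumes SELECT access to V$PARAMETER, and 2000 is a hypothetical value.

```sql
-- Current setting.
SELECT name, value
FROM   v$parameter
WHERE  name = 'dml_locks';

-- Partition counts per table in the current schema, largest first.
SELECT table_name, COUNT(*) AS partition_count
FROM   user_tab_partitions
GROUP  BY table_name
ORDER  BY partition_count DESC;

-- DML_LOCKS is a static parameter; with an spfile (9i and later) a new
-- value takes effect after the instance is restarted, for example:
-- ALTER SYSTEM SET dml_locks = 2000 SCOPE = SPFILE;
```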
CONCLUSION
With the advent of partitioning, maintenance operations can occur at the partition level rather than at the table or index level, allowing Database Administrators to provide improved SLAs. This alone makes partitioning a crucial aspect of any database containing large amounts of data. As maintenance windows decrease in length due to the cost of downtime, understanding methods to shorten database downtime is critical for success. After researching the various methods, each DBA should test the partitioning scheme best suited for his/her environment.
Performance and maintenance are the primary concerns to account for when implementing the partitioning option. The partitioning of large tables allows for faster data access, as well as shorter maintenance windows. Partitioning should not be taken lightly; however, it should be considered for any database with excessive data or when excessive growth is anticipated.
Please check out our website at www.sagelogix.com/partitioning. This location has a download zip
file containing source code, which will automate the maintenance of date-based range partitions.
Frank Bommarito – SageLogix, Inc.
Paper# 35697