Database Administration
Reviewed by Oracle Certified Master Korea Community
( http://www.ocmkorea.com http://cafe.daum.net/oraclemanager )
ANALYZE THAT DATABASE NOW
(USING DBMS_STATS)
ABSTRACT
The cost-based optimizer is responsible for determining the best method for executing a SQL statement. The key to
the optimizer is a current set of statistics that reflects the state of your database. This presentation discusses the
Oracle-provided package DBMS_STATS, which is used for statistics gathering. This tool replaces the ANALYZE command for
gathering statistics. That use of ANALYZE was deprecated in Oracle9i, and the command is now used only for specialized
functions. Also discussed are many of the new Oracle Database 10g enhancements.
INTRODUCTION
Statistics can be generated that quantify the data distribution and storage characteristics of tables, columns, indexes, and
partitions. The cost-based optimizer uses these statistics to estimate the “cost” of each possible execution plan. The
optimizer uses this information to choose a particular access method and to determine the optimal join order and join method.
As part of the design process for an application, a developer needs to determine how frequently statistics must be gathered.
This is generally based on how often the objects are modified in ways that change data volumes or column values. The
database administrator then gathers the statistics. Therefore, it is essential that both DBAs and developers
understand the requirements and the process.
Beginning with Oracle8i, Oracle provides the package DBMS_STATS to gather the appropriate statistics. This package has
been enhanced in Oracle9i and again in Oracle Database 10g. Prior to Oracle8i, statistics were gathered using the ANALYZE
command. Oracle now strongly recommends using DBMS_STATS to collect statistics. The cost-based optimizer will
eventually use only statistics that have been gathered with DBMS_STATS. ANALYZE should be used only for VALIDATE
and LIST CHAINED ROWS. ANALYZE COMPUTE and ESTIMATE were deprecated in Oracle9i and are supported for
backward compatibility only.
UNDERSTANDING STATISTICS
Statistics which quantify the data distribution and storage characteristics of tables, columns, and indexes are critical for the
cost-based optimizer to make appropriate decisions.
OVERVIEW OF THE COST-BASED OPTIMIZER
There is almost always more than one way for Oracle to execute a Data Manipulation Language (DML) statement (SELECT,
INSERT, UPDATE, MERGE, and DELETE). The objective of the optimizer is to choose the most efficient method of
executing the statement. The optimizer's tasks include:
• Transforming the statement, where possible, into an equivalent but more efficient form
• Deciding how to access the data stored in the database
• For statements that access more than one table, deciding how to perform the necessary joins
Paper # 515
To accomplish this, the cost-based optimizer:
• Determines which access paths are available for the statement by examining
o The conditions in the WHERE clause
o The FROM clause if SAMPLE or SAMPLE BLOCK is used
• Generates possible execution plans using the available access paths
• Estimates the cost of each plan using the statistics for the tables, indexes, and columns accessed by the statement
• Chooses the execution plan with the lowest estimated cost (the plan expected to have the lowest execution time)
THE IMPORTANCE OF GATHERING STATISTICS
The optimizer requires accurate information to create efficient execution plans. It is important to analyze all tables and
indexes. For example, the optimizer makes its worst choices on a join when one table is analyzed and the other is not. When
statistics do not exist, the optimizer uses Oracle-defined default values.
Statistics will need to be gathered periodically as they become stale over time due to changing data volumes or data
distribution. Existing statistics are updated the next time statistics are gathered. Any currently parsed statements that access
the object(s) are invalidated. The optimizer then chooses a new execution plan based on the new set of statistics.
Remember to analyze all tables and indexes. Don’t forget to generate statistics when:
• A table is rebuilt or moved
• An index is added
• Moving from development to production or vice-versa
To determine which tables and indexes have been analyzed, use the LAST_ANALYZED column in the data dictionary views.
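As a minimal sketch, the following queries list the tables and indexes in one schema together with the time they were last analyzed. Objects that have never been analyzed have a null LAST_ANALYZED value and sort first. The schema name SCOTT is only an example, and the USER_ views can be substituted when DBA privileges are not available.

```sql
-- Tables in the SCOTT schema, never-analyzed tables first
SELECT table_name,
       num_rows,
       last_analyzed
  FROM dba_tables
 WHERE owner = 'SCOTT'
 ORDER BY last_analyzed NULLS FIRST;

-- The same check for indexes
SELECT index_name,
       table_name,
       last_analyzed
  FROM dba_indexes
 WHERE owner = 'SCOTT'
 ORDER BY last_analyzed NULLS FIRST;
```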
STATISTICS GATHERED AND STORED
The statistics gathered by DBMS_STATS include:
• Table statistics
o Number of rows
o Number of blocks used (always exact)
o Average row length
• Column statistics
o Number of distinct values per column
o Number of nulls in the column
o Data distribution
• Index statistics
o Index level (always exact)
o Number of leaf blocks
o Clustering factor
o Minimum/maximum key values
• System statistics (new in Oracle9i)
o I/O and CPU performance and utilization
The statistics are stored in the data dictionary:
• DBA_TABLES
• DBA_TAB_HISTOGRAMS
• DBA_OBJECT_TABLES
• DBA_TAB_COL_STATISTICS
• DBA_INDEXES
• DBA_CLUSTERS
• DBA_TAB_PARTITIONS
• DBA_TAB_SUBPARTITIONS
• DBA_IND_PARTITIONS
• DBA_IND_SUBPARTITIONS
• DBA_PART_COL_STATISTICS
• DBA_PART_HISTOGRAMS
• DBA_SUBPART_COL_STATISTICS
• DBA_SUBPART_HISTOGRAMS
And the new Oracle Database 10g views:
• DBA_TAB_STATISTICS
• DBA_IND_STATISTICS
You may also use the equivalent USER_ or ALL_ views.
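As an illustrative sketch, you can inspect the column statistics listed earlier (distinct values, nulls, and histogram buckets) with a query like the one below. SCOTT.EMPLOYEE serves only as an example table.

```sql
-- Column statistics for one example table
SELECT column_name,
       num_distinct,
       num_nulls,
       num_buckets,
       last_analyzed
  FROM dba_tab_col_statistics
 WHERE owner = 'SCOTT'
   AND table_name = 'EMPLOYEE'
 ORDER BY column_name;
```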
USING THE DBMS_STATS PACKAGE
The DBMS_STATS PL/SQL package lets you generate and manage statistics for cost-based optimization. The package may be used to
gather, modify, view, export, import, and delete statistics. You can also save named sets of statistics, each identified by a
statistics ID (STATID). The package is used to gather
statistics on individual indexes, tables, columns, and partitions. You may also gather statistics for all objects in a schema or the
entire database. Statistics cannot be gathered for a cluster but may be gathered for the individual tables in the cluster. For
partitioned tables and indexes, statistics can be gathered for each partition or subpartition as well as global statistics for the
entire table or index. The optimizer can then choose to use either the partition statistics or the global statistics.
DBMS_STATS gathers only the statistics needed by the cost-based optimizer. It does not gather the number of chained rows,
average free space, or the number of unused data blocks, which were formerly gathered by the ANALYZE command.
Procedures in the DBMS_STATS package include:
Procedure                     Description
GATHER_INDEX_STATS            Collects index statistics
GATHER_TABLE_STATS            Collects table and column statistics (and index statistics when CASCADE => TRUE)
GATHER_SCHEMA_STATS           Collects statistics for all objects in a schema
GATHER_DATABASE_STATS         Collects statistics for all objects in a database
GATHER_SYSTEM_STATS           Collects statistics on I/O and CPU utilization
GATHER_DICTIONARY_STATS       Collects statistics for all tables in the data dictionary (new in Oracle Database 10g)
GATHER_FIXED_OBJECTS_STATS    Collects statistics for fixed objects (V$ views) (new in Oracle Database 10g)
Index statistics may also be gathered while creating or rebuilding an index by using the COMPUTE STATISTICS option. Beware:
to ensure that the statistics are correct, Oracle always uses the base table when creating the index, even if another index
that could be used is available.
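A minimal sketch of the COMPUTE STATISTICS option follows. The index name EMPLOYEE_NAME_IDX and the LAST_NAME column are hypothetical and used only for illustration.

```sql
-- Statistics are gathered as part of the index build
-- (hypothetical index on a hypothetical LAST_NAME column)
CREATE INDEX scott.employee_name_idx
    ON scott.employee (last_name)
    COMPUTE STATISTICS;

-- Rebuilding an existing index can also refresh its statistics
ALTER INDEX scott.employee_name_idx REBUILD COMPUTE STATISTICS;
```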
SYNTAX
Following is the syntax for the GATHER_TABLE_STATS procedure. The other procedures have similar syntax and options.
See the documentation referenced later for the complete syntax for all of the procedures.
Syntax:
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT NULL,
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
   degree           NUMBER   DEFAULT NULL,
   granularity      VARCHAR2 DEFAULT 'DEFAULT',
   cascade          BOOLEAN  DEFAULT FALSE,
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT FALSE);
Parameter Descriptions:
Parameter          Description
ownname            Schema of table to analyze.
tabname            Name of table.
partname           Name of partition.
estimate_percent   Percentage of rows to estimate (NULL means compute). The valid range is
                   [0.000001,100]. Use the constant DBMS_STATS.AUTO_SAMPLE_SIZE to have
                   Oracle determine the best sample size for good statistics.
block_sample       Whether or not to use random block sampling instead of random row sampling.
                   Random block sampling is more efficient, but if the data is not randomly
                   distributed on disk, then the sample values may be somewhat correlated.
                   Only pertinent when estimating statistics.
method_opt         Accepts:
                   FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
                   FOR COLUMNS [size_clause] column|attribute [size_clause]
                       [,column|attribute [size_clause]...]
                   where size_clause is defined as:
                   size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}
                   integer - Number of histogram buckets. Must be in the range [1,254].
                   REPEAT - Collects histograms only on the columns that already have histograms.
                   AUTO - Oracle determines the columns to collect histograms based on data
                   distribution and the workload of the columns.
                   SKEWONLY - Oracle determines the columns to collect histograms based on the
                   data distribution of the columns.
degree             Degree of parallelism. NULL means use the table default value specified by
                   the DEGREE clause in the CREATE TABLE or ALTER TABLE statement. Use the
                   constant DBMS_STATS.DEFAULT_DEGREE to specify the default value based on the
                   initialization parameters.
granularity        The granularity of statistics to collect (only pertinent if the table/index
                   is partitioned):
                   'DEFAULT' - gathers global and partition-level statistics
                   'SUBPARTITION' - gathers subpartition-level statistics
                   'PARTITION' - gathers partition-level statistics
                   'GLOBAL' - gathers global statistics
                   'ALL' - gathers all (subpartition, partition, and global) statistics
no_invalidate      Dependent cursors are not invalidated if this parameter is set to TRUE.
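The following call is a sketch that sets several of the parameters above explicitly, using named notation. The values are illustrative choices rather than recommendations, and SCOTT.EMPLOYEE is an example table.

```sql
BEGIN
  -- Gather table, column, and index statistics for one table
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'EMPLOYEE',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -- let Oracle pick the sample size
    method_opt       => 'FOR ALL COLUMNS SIZE 1',    -- one bucket: no histograms
    degree           => DBMS_STATS.DEFAULT_DEGREE,   -- parallelism from init parameters
    granularity      => 'DEFAULT',                   -- global + partition (if partitioned)
    cascade          => TRUE,                        -- also gather index statistics
    no_invalidate    => FALSE);                      -- invalidate dependent cursors
END;
/
```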
GENERATING STATISTICS
Statistics may be generated using:
• Estimation based on random data sampling
• Exact computation
• User-defined statistics collection methods
We will concentrate on the first two. User-defined statistics are defined using SET and GET operations. This topic is beyond the
scope of our discussion but is well documented in the Oracle documentation referenced later.
To perform an exact computation, Oracle requires enough space to perform a full-table scan and sort of the table. This may
require the use of temporary space. For estimations, sufficient space is required to scan and sort only the rows in the
requested sample. Gathering statistics on indexes requires less time and space.
Some statistics are always computed exactly, such as the number of data blocks currently containing data and the depth of an
index tree.
Oracle recommends estimating statistics by setting the ESTIMATE_PERCENT parameter to
DBMS_STATS.AUTO_SAMPLE_SIZE. This maximizes performance gains while achieving “necessary statistical accuracy.”
Oracle then determines the best sample size for good statistics. For example, to collect table and column statistics for all
tables in the SCOTT schema with auto-sampling (the trailing hyphen is the SQL*Plus line-continuation character):
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS -
   (OWNNAME => 'SCOTT', -
    ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE, -
    CASCADE => TRUE);
The CASCADE => TRUE option causes all indexes for the tables to also be analyzed.
ESTIMATING STATISTICS
DBMS_STATS allows the specification of a user-defined sampling percentage.
• Row sampling reads rows without regard to physical placement on disk
o Provides the most random data for estimates
o To use, specify ESTIMATE_PERCENT => 5
o Valid values from 0.000001 to 100
o A null value causes an exact computation (all rows analyzed)
o A value of 5% usually forces the optimizer to make the same decision as an exact computation
o A value greater than 15% will usually result in 100% of the rows being read and analyzed
For example:
EXECUTE DBMS_STATS.GATHER_TABLE_STATS -
   (OWNNAME => 'SCOTT', -
    TABNAME => 'EMPLOYEE', -
    ESTIMATE_PERCENT => 5);
• Block sampling reads a random sample of blocks and uses all of the rows in those blocks for estimates
o Reduces the amount of I/O activity for a given sample size
o Can reduce the randomness of the sample
o Set BLOCK_SAMPLE => TRUE and specify the sample size with ESTIMATE_PERCENT (BLOCK_SAMPLE is a BOOLEAN and is
not a parameter of GATHER_INDEX_STATS)
o For example:
EXECUTE DBMS_STATS.GATHER_TABLE_STATS -
   (OWNNAME => 'SCOTT', -
    TABNAME => 'EMPLOYEE', -
    ESTIMATE_PERCENT => 5, -
    BLOCK_SAMPLE => TRUE);
STATISTICS FOR THE DATA DICTIONARY
New in Oracle Database 10g is the ability to gather statistics for the data dictionary. The objective is to enhance the
performance of queries against the data dictionary. There are two basic types of dictionary base tables.
The statistics for normal base tables are gathered using GATHER_DICTIONARY_STATS. They may also be
gathered using GATHER_SCHEMA_STATS for the SYS schema. Oracle recommends gathering these statistics at a frequency
similar to that of your other database objects.
Statistics for fixed objects (the X$ tables on which the V$ views are based) are gathered using the GATHER_FIXED_OBJECTS_STATS
procedure. The initial collection of these statistics is normally sufficient; repeat it only if workload characteristics have changed
dramatically. The SYSDBA privilege, or both the ANALYZE ANY DICTIONARY and ANALYZE ANY privileges, is required to
execute the procedures for gathering data dictionary statistics.
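A minimal sketch of gathering both kinds of dictionary statistics follows. It assumes a session that holds the privileges listed above. Both procedures are called with all of their parameters left at the defaults.

```sql
BEGIN
  -- Statistics on the normal dictionary base tables
  DBMS_STATS.GATHER_DICTIONARY_STATS;

  -- Statistics on fixed objects (the X$ tables behind the V$ views);
  -- best gathered once while a representative workload is running
  DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/
```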
TRANSFERRING STATISTICS
Statistics can be saved into a statistics table and then loaded back into the data dictionary. This allows statistics to be saved
before new ones are gathered, so that the old statistics can be restored if performance degrades, for instance
after upgrading Oracle to a new release. Statistics tables may also be used to copy statistics to another instance. For example,
you may want to copy statistics from a production database to a test database. The smaller test database can then be used for
testing while producing the execution plans that would occur on the larger production database.
To transfer the statistics, use the following DBMS_STATS procedures:
Procedure              Description
CREATE_STAT_TABLE      Creates the statistics table
EXPORT_TABLE_STATS     Copies the statistics from the data dictionary into the statistics table
IMPORT_TABLE_STATS     Copies the statistics from the statistics table into the data dictionary (for example, of another instance)
To move statistics to another instance, the statistics table itself is copied between the databases, for example with the
Export and Import utilities. Equivalent procedures export and import column, index, schema, database, and system statistics.
Statistics tables may also be used to maintain and experiment with sets of statistics.
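The following is a sketch of a production-to-test transfer. It assumes the SCOTT.EMPLOYEE table exists in both databases. PROD_STATS and the STATID value 'PROD' are hypothetical names chosen for illustration.

```sql
-- On the production database: create a statistics table and copy the
-- dictionary statistics for SCOTT.EMPLOYEE (and its indexes) into it
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'SCOTT', stattab => 'PROD_STATS');
  DBMS_STATS.EXPORT_TABLE_STATS(
    ownname => 'SCOTT',
    tabname => 'EMPLOYEE',
    stattab => 'PROD_STATS',
    statid  => 'PROD',
    cascade => TRUE);
END;
/

-- Copy SCOTT.PROD_STATS to the test database (for example with the
-- Export/Import utilities), then run this on the test database:
BEGIN
  DBMS_STATS.IMPORT_TABLE_STATS(
    ownname => 'SCOTT',
    tabname => 'EMPLOYEE',
    stattab => 'PROD_STATS',
    statid  => 'PROD',
    cascade => TRUE);
END;
/
```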
Occasionally, gathering new statistics causes a degradation in performance. For specific, critical tables, it is possible to save the
old statistics before gathering new ones. You could then revert to the old statistics if necessary while analyzing and resolving
the reasons for the performance change.
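BEGIN
   -- Create a user statistics table to hold saved statistics
   DBMS_STATS.CREATE_STAT_TABLE
      (OWNNAME => 'USER1',
       STATTAB => 'SAVESTATS');
   -- Save the current EMPLOYEE statistics into SAVESTATS,
   -- then gather new statistics into the data dictionary
   DBMS_STATS.GATHER_TABLE_STATS
      (OWNNAME => 'USER1',
       TABNAME => 'EMPLOYEE',
       STATTAB => 'SAVESTATS');
END;
/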
If the new statistics are causing the optimizer to generate poor plans, restore the original stats while resolving the problem.
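BEGIN
   -- Remove the new statistics from the data dictionary
   DBMS_STATS.DELETE_TABLE_STATS
      (OWNNAME => 'USER1',
       TABNAME => 'EMPLOYEE');
   -- Reload the statistics saved in SAVESTATS
   DBMS_STATS.IMPORT_TABLE_STATS
      (OWNNAME => 'USER1',
       TABNAME => 'EMPLOYEE',
       STATTAB => 'SAVESTATS');
END;
/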
RESTORING AND PURGING STATISTICS
Beginning in Oracle Database 10g, statistics may be restored as of a specified time stamp. This may be useful when newly
collected statistics lead to sub-optimal execution plans. To revert to a previous set of statistics for a specific table, use the
RESTORE_TABLE_STATS procedure. This may also be done at the schema or database level. Similar procedures are
available for fixed objects, dictionary, and system statistics. Whenever statistics in the dictionary are modified, old versions are
automatically saved. Old statistics are purged automatically at regular intervals based on a configurable retention setting, or
they can be purged manually up to a specified time stamp using the PURGE_STATS procedure.
Dictionary views such as DBA_TAB_STATS_HISTORY display the time of statistics modifications. These views are useful in
determining the time stamp to be used for statistics restoration. The GET_STATS_HISTORY_RETENTION function returns the
current statistics history retention value. The GET_STATS_HISTORY_AVAILABILITY function returns the oldest time
stamp for which statistics history is available. Statistics cannot be restored to a time stamp older than this one.
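The following sketch shows how these pieces fit together for the example table SCOTT.EMPLOYEE. The one-day restore point and the 31-day purge cutoff are arbitrary example values. Check the availability time stamp before you choose a restore point.

```sql
-- Oldest restorable time stamp and the retention setting (in days)
SELECT DBMS_STATS.GET_STATS_HISTORY_AVAILABILITY AS oldest_restorable,
       DBMS_STATS.GET_STATS_HISTORY_RETENTION    AS retention_days
  FROM dual;

-- When were the statistics for SCOTT.EMPLOYEE changed?
SELECT stats_update_time
  FROM dba_tab_stats_history
 WHERE owner = 'SCOTT'
   AND table_name = 'EMPLOYEE'
 ORDER BY stats_update_time;

BEGIN
  -- Restore the statistics as they were one day ago
  DBMS_STATS.RESTORE_TABLE_STATS(
    ownname         => 'SCOTT',
    tabname         => 'EMPLOYEE',
    as_of_timestamp => SYSTIMESTAMP - INTERVAL '1' DAY);

  -- Purge saved statistics older than 31 days
  DBMS_STATS.PURGE_STATS(SYSTIMESTAMP - INTERVAL '31' DAY);
END;
/
```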
COLUMN HISTOGRAMS
It is essential that the optimizer be aware of columns with severely skewed data, or it may make bad decisions. This is
accomplished by providing histograms, which give the optimizer detailed information about data distributions that are
not uniform. Examples of where this may be an issue include:
• A column with a non-uniform data distribution, such as an employee name column where there may be more
SMITHs than ZABRISKIEs
• A column where a large percentage of the rows share the same value or a narrow range of values, such as a hire date
column when most employees were hired in the past two years
Histograms can result in significant performance improvements, but they require knowledge of the application and the data
distribution. As an example, to build histograms only for indexed columns:
METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE 10'
where SIZE specifies the number of histogram buckets.
Beginning with Oracle9i, Oracle provides additional options for automating the decision of which columns to build
histograms for and the definition of the appropriate number of histogram buckets. AUTO creates histograms based on data
distribution and the way the columns are being accessed by the application. This requires the application to run for some
time so that the workload can be captured. To use:
METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE AUTO'
The SIZE option SKEWONLY creates histograms based strictly on data distribution. This is useful while the workload is still
being captured. However, it is expensive in terms of resources, so it should be restricted to the initial collection of
statistics; afterwards it is generally better to use the AUTO option.
The final SIZE option, REPEAT, refreshes the existing column statistics, including histograms. If a column does not presently
have histograms, new ones will not be created.
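The following sketch applies the approach above to the example table SCOTT.EMPLOYEE. It uses SKEWONLY for the initial collection, and the AUTO and REPEAT variants appear as comments for later runs. The final query shows the number of histogram buckets recorded for each column.

```sql
BEGIN
  -- Initial collection: build histograms wherever the data is skewed
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SCOTT',
    tabname    => 'EMPLOYEE',
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');
END;
/

-- Later collections, once the workload has been captured:
--   method_opt => 'FOR ALL COLUMNS SIZE AUTO'    (distribution + usage)
--   method_opt => 'FOR ALL COLUMNS SIZE REPEAT'  (refresh existing histograms only)

SELECT column_name, num_buckets
  FROM dba_tab_col_statistics
 WHERE owner = 'SCOTT'
   AND table_name = 'EMPLOYEE'
 ORDER BY column_name;
```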
AUTOMATING STATISTICS COLLECTION
The DBMS_STATS procedures can be executed inside PL/SQL procedures. As an example:
CREATE OR REPLACE PROCEDURE gather_stats_proc AS
BEGIN
   -- Gather statistics for every object in the SCOTT schema
   DBMS_STATS.GATHER_SCHEMA_STATS
      (OWNNAME => 'SCOTT',
       ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE);
END;
/
The procedure could then be scheduled using the Oracle-provided package, DBMS_JOB in releases through Oracle9i,
DBMS_SCHEDULER in Oracle Database 10g, third-party scheduling tools, or other job scheduling tools at the operating
system level.
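As a sketch, the procedure GATHER_STATS_PROC above could be scheduled nightly with DBMS_SCHEDULER in Oracle Database 10g, or with DBMS_JOB in earlier releases. This assumes the procedure is in the job owner's schema. The job name GATHER_SCOTT_STATS and the 2:00 a.m. schedule are hypothetical.

```sql
-- Oracle Database 10g: DBMS_SCHEDULER
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'GATHER_SCOTT_STATS',          -- hypothetical job name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'GATHER_STATS_PROC',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2;BYMINUTE=0;BYSECOND=0', -- 02:00 daily
    enabled         => TRUE);
END;
/

-- Oracle8i/Oracle9i: DBMS_JOB
DECLARE
  l_job BINARY_INTEGER;
BEGIN
  DBMS_JOB.SUBMIT(
    job       => l_job,
    what      => 'gather_stats_proc;',
    next_date => TRUNC(SYSDATE) + 1 + 2/24,      -- tomorrow at 02:00
    interval  => 'TRUNC(SYSDATE) + 1 + 2/24');   -- re-evaluated after each run
  COMMIT;  -- DBMS_JOB submissions take effect on commit
END;
/
```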
In Oracle8i, the MONITORING clause of the CREATE (or ALTER) TABLE statement is used to start or stop the collection of
modification statistics:
ALTER TABLE table_name MONITORING|NOMONITORING
In Oracle9i, monitoring may also be enabled or disabled at the schema or database level using DBMS_STATS. For an individual
schema use ALTER_SCHEMA_TAB_MONITORING. For the entire database use ALTER_DATABASE_TAB_MONITORING.
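The example below enables monitoring for a single table and then for an entire schema in Oracle9i. SCOTT.EMPLOYEE is an example name only.

```sql
-- Oracle8i and Oracle9i: enable monitoring for one table
ALTER TABLE scott.employee MONITORING;

-- Oracle9i: enable monitoring for every table in a schema
BEGIN
  DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING(ownname => 'SCOTT', monitoring => TRUE);
END;
/
```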
All of the above monitoring options are obsolete in Oracle Database 10g. They may still be used, but they have no
effect. Beginning with Oracle Database 10g, the STATISTICS_LEVEL initialization parameter controls table monitoring at the
database level. It defaults to TYPICAL, which ensures collection of all major statistics. Set it to BASIC to disable monitoring.
MONITORING tracks the approximate number of INSERT, UPDATE, and DELETE operations for the table since the last
time statistics were gathered. The monitoring information is maintained in the System Global Area (SGA). Then, periodically
(about every three hours) the information is incorporated into the data dictionary. This information can be seen in the
DBA_TAB_MODIFICATIONS view. Oracle uses this information to identify tables with stale statistics. Statistics are
considered stale when 10% of the rows have changed.
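You can view the monitoring information with a sketch like the one below. FLUSH_DATABASE_MONITORING_INFO writes the in-memory information to the dictionary immediately instead of waiting for the periodic update. The SCOTT schema is an example.

```sql
BEGIN
  -- Write the in-memory monitoring information to the dictionary now
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
END;
/

-- Approximate DML activity since statistics were last gathered
SELECT table_name, inserts, updates, deletes, timestamp
  FROM dba_tab_modifications
 WHERE table_owner = 'SCOTT'
 ORDER BY table_name;
```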
Automated statistics gathering is then invoked by scheduling a recurring job that calls GATHER_SCHEMA_STATS or
GATHER_DATABASE_STATS with OPTIONS => 'GATHER STALE'. The job should be set to execute at an appropriate interval for the
specific application. Beginning with Oracle9i, 'GATHER STALE' can be replaced with 'GATHER AUTO', which lets Oracle determine
which objects need new statistics.
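Below is a sketch of such a job body, using the SCOTT schema as an example. The second block uses the 'LIST STALE' option to report, without gathering, which objects Oracle currently considers stale. In SQL*Plus, SET SERVEROUTPUT ON is needed to see its output.

```sql
BEGIN
  -- Regather only the objects whose statistics are stale
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'SCOTT',
    options => 'GATHER STALE',
    cascade => TRUE);
END;
/

DECLARE
  l_stale DBMS_STATS.ObjectTab;  -- filled in by the LIST option
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'SCOTT',
    options => 'LIST STALE',
    objlist => l_stale);
  FOR i IN 1 .. l_stale.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE(l_stale(i).objtype || ' ' || l_stale(i).objname);
  END LOOP;
END;
/
```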
LOCKING STATISTICS
Beginning in Oracle Database 10g, statistics may be locked. When a table's statistics are locked, all statistics on the table,
including table, column, and histogram statistics and the statistics on all dependent indexes, are locked. When the statistics are
locked, the GATHER_*, IMPORT_*, and DELETE_* procedures will raise an error. Procedures that operate on multiple objects
(such as GATHER_SCHEMA_STATS) will skip modifying the statistics of an object if it is locked. This feature can be used to freeze
the current set of statistics or to keep the statistics empty when required. Set locking using the procedures
LOCK_SCHEMA_STATS and/or LOCK_TABLE_STATS. Use the UNLOCK_* procedures to unlock the statistics.
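The sketch below locks and later unlocks the statistics of the example table SCOTT.EMPLOYEE. Between the two calls, the STATTYPE_LOCKED column of DBA_TAB_STATISTICS shows which tables have locked statistics.

```sql
BEGIN
  -- Freeze the current statistics for the table and its indexes
  DBMS_STATS.LOCK_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMPLOYEE');
END;
/

-- Non-null STATTYPE_LOCKED means the statistics are locked
SELECT table_name, stattype_locked
  FROM dba_tab_statistics
 WHERE owner = 'SCOTT';

BEGIN
  -- Allow the statistics to be gathered again
  DBMS_STATS.UNLOCK_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMPLOYEE');
END;
/
```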
GATHERING SYSTEM STATISTICS
System statistics enable the optimizer to consider a system’s I/O and CPU performance and utilization. System statistics
depend on many factors and do not remain constant over time. Therefore, capture statistics during the interval when
the system has its most common workload. For example, gather OLTP statistics for use during the daytime and batch
statistics for use during the nighttime. The appropriate set is then activated as required. Unlike new table, index, or column
statistics, new system statistics do not cause Oracle to invalidate already parsed SQL statements; only new statements are
parsed with the new statistics. You must have DBA privileges to gather system statistics.
For example:
BEGIN
   DBMS_STATS.GATHER_SYSTEM_STATS
      (GATHERING_MODE => 'INTERVAL',
       INTERVAL => 480,
       STATTAB => 'MYSTATS',
       STATID => 'OLTP');
END;
/
Statistics are gathered for 480 minutes, placed in a statistics table called MYSTATS (which must already have been created
with CREATE_STAT_TABLE), and given an identifier of OLTP. A similar job would gather statistics for nighttime operation.
Then activate the appropriate set of statistics at the right time using your preferred scheduling tool.
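The activation step can be sketched as follows, assuming the MYSTATS table above and a hypothetical job name. IMPORT_SYSTEM_STATS copies the 'OLTP' set from MYSTATS into the data dictionary, and a DBMS_SCHEDULER job (Oracle Database 10g) repeats this every morning. A matching job would import the nighttime set each evening.

```sql
BEGIN
  -- Make the daytime (OLTP) system statistics current right now
  DBMS_STATS.IMPORT_SYSTEM_STATS(stattab => 'MYSTATS', statid => 'OLTP');

  -- Repeat the import every morning at 07:00 (hypothetical job name)
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'ACTIVATE_OLTP_SYSSTATS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.IMPORT_SYSTEM_STATS(stattab => ''MYSTATS'', statid => ''OLTP''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=7;BYMINUTE=0;BYSECOND=0',
    enabled         => TRUE);
END;
/
```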
SUMMARY
As of Oracle Database 10g Release 1, there are nearly seventy procedures and functions in the DBMS_STATS package and
many additional options. We have discussed the ones that are generally the most useful. There are many others that may
be of value in specific instances; for example, there is a set of DELETE procedures. All are well documented in the Oracle
manual PL/SQL Packages and Types Reference. You may also want to consult the Oracle Database Performance Tuning Guide.
Since DBMS_STATS was introduced in Oracle8i, Oracle, and most individuals doing Oracle tuning, have recommended using
it to generate statistics for the cost-based optimizer. The Oracle documentation states, “The cost-based optimizer will
eventually use only statistics that have been collected by DBMS_STATS.” If your shop is still using the ANALYZE command,
you may want to seriously consider taking the time to make the switch. The package is easy to use. Conversion from existing
ANALYZE jobs is straightforward and generally not time-consuming.