Database Administration

Reviewed by Oracle Certified Master Korea Community
( http://www.ocmkorea.com http://cafe.daum.net/oraclemanager )

ANALYZE THAT DATABASE NOW (USING DBMS_STATS)

ABSTRACT

The cost-based optimizer has the responsibility for determining the best method for executing a SQL statement. The key to the optimizer is a current set of statistics which reflects the state of your database. This presentation discusses the Oracle-provided package DBMS_STATS which is used for statistics gathering. This tool replaces the ANALYZE command which was deprecated in Oracle9i and is now used only for specialized functions. Also discussed are many of the new Oracle Database 10g enhancements.

INTRODUCTION

Statistics can be generated that quantify the data distribution and storage characteristics of tables, columns, indexes, and partitions. The cost-based optimizer uses these statistics to estimate the “cost” of each possible execution plan. The optimizer uses this information to choose a particular access method and to determine the optimal join order and join method.

As a part of the design process for an application, a developer needs to design frequency requirements for statistics gathering. This is generally based on the frequency of modifications to the objects that change data volumes or column values. The database administrator then gathers the statistics. Therefore, it is essential that both DBAs and developers have an understanding of the requirements and the process.

Beginning with Oracle8i, Oracle provides the package DBMS_STATS to gather the appropriate statistics. This package has been enhanced in Oracle9i and again in Oracle Database 10g. Prior to Oracle8i, statistics were gathered using the ANALYZE command.

Oracle now strongly recommends using DBMS_STATS to collect statistics. The cost-based optimizer will eventually use only statistics that have been gathered with DBMS_STATS. ANALYZE should be used only for VALIDATE and LIST CHAINED ROWS.
ANALYZE COMPUTE and ESTIMATE were deprecated in Oracle9i and are supported for backward compatibility only.

UNDERSTANDING STATISTICS

Statistics which quantify the data distribution and storage characteristics of tables, columns, and indexes are critical for the cost-based optimizer to make appropriate decisions.

OVERVIEW OF THE COST-BASED OPTIMIZER

There is almost always more than one way for Oracle to execute a Data Manipulation Language (DML) statement (SELECT, INSERT, UPDATE, MERGE, and DELETE). The objective of the optimizer is to choose the most efficient method of executing the statement. The optimizer tasks include:
• Possibly transforming the statement into an equivalent, more efficient expression
• Deciding how to access the data stored in the database
• For statements that access more than one table, deciding how to perform the necessary joins

Paper # 515

To accomplish this, the cost-based optimizer chooses an execution plan as follows:
• Determines which access paths are available for the statement by examining
  o The conditions in the WHERE clause
  o The FROM clause if SAMPLE or SAMPLE BLOCK are used
• Generates possible execution plans for the available access paths
• Estimates the cost for each plan using the statistics for the tables, indexes, and columns accessible to the statement
• Chooses the execution plan with the lowest estimated cost (lowest estimated execution time)

THE IMPORTANCE OF GATHERING STATISTICS

The optimizer requires accurate information to create efficient execution plans. It is important to analyze all tables and indexes. For example, the optimizer makes its worst choices on a join when one table is analyzed and one is not. It will use Oracle-defined default values when statistics do not exist.

Statistics will need to be gathered periodically as they become stale over time due to changing data volumes or data distribution.
Existing statistics are updated the next time statistics are gathered. Any currently parsed statements that access the object(s) are invalidated. The optimizer then chooses a new execution plan based on the new set of statistics.

Remember to analyze all tables and indexes. Don't forget to generate statistics when:
• A table is rebuilt or moved
• An index is added
• Moving from development to production or vice versa

To determine which tables and indexes have been analyzed, use the LAST_ANALYZED column in the data dictionary views.

STATISTICS GATHERED AND STORED

The statistics gathered by DBMS_STATS include:
• Table statistics
  o Number of rows
  o Number of blocks used (always exact)
  o Average row length
• Column statistics
  o Number of distinct values per column
  o Number of nulls in the column
  o Data distribution
• Index statistics
  o Index level (always exact)
  o Number of leaf blocks
  o Clustering factor
  o Minimum/maximum key values
• System statistics (new in Oracle9i)
  o I/O and CPU performance and utilization

The statistics are stored in the data dictionary:
• DBA_TABLES
• DBA_TAB_HISTOGRAMS
• DBA_OBJECT_TABLES
• DBA_TAB_COL_STATISTICS
• DBA_INDEXES
• DBA_CLUSTERS
• DBA_TAB_PARTITIONS
• DBA_TAB_SUBPARTITIONS
• DBA_IND_PARTITIONS
• DBA_IND_SUBPARTITIONS
• DBA_PART_COL_STATISTICS
• DBA_PART_HISTOGRAMS
• DBA_SUBPART_COL_STATISTICS
• DBA_SUBPART_HISTOGRAMS

And the new Oracle Database 10g views:
• DBA_TAB_STATISTICS
• DBA_IND_STATISTICS

You may also use the equivalent USER_ or ALL_ views.

USING THE DBMS_STATS PACKAGE

The DBMS_STATS PL/SQL package lets you generate and manage statistics for cost-based optimization. The package may be used to gather, modify, view, export, import, and delete statistics. You can also name sets of gathered statistics.

The package is used to gather statistics on individual indexes, tables, columns, and partitions. You may also gather statistics for all objects in a schema or the entire database.
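Before gathering anything, it can help to see what is already stored. The following is a minimal sketch, not taken from the original paper, that uses the USER_TABLES view and its LAST_ANALYZED column to list the tables in the current schema together with their basic table statistics. Tables that have never been analyzed have a NULL LAST_ANALYZED value and are sorted first.

```sql
-- Sketch: show stored table statistics for the current schema.
-- Never-analyzed tables (LAST_ANALYZED IS NULL) appear first.
SELECT table_name,
       num_rows,
       blocks,
       avg_row_len,
       last_analyzed
FROM   user_tables
ORDER  BY last_analyzed NULLS FIRST, table_name;
```

Substituting DBA_TABLES (and adding the OWNER column) gives the same picture across schemas, assuming you have access to the DBA_ views.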
Statistics cannot be gathered for a cluster but may be gathered for the individual tables in the cluster.

For partitioned tables and indexes, statistics can be gathered for each partition or subpartition as well as global statistics for the entire table or index. The optimizer can then choose to use either the partition statistics or the global statistics.

DBMS_STATS gathers only the statistics needed by the cost-based optimizer. It does not gather the number of chained rows, average free space, or the number of unused data blocks, which were formerly gathered by the ANALYZE command.

Procedures in the DBMS_STATS package include:

Procedure                    Description
GATHER_INDEX_STATS           Collects index statistics
GATHER_TABLE_STATS           Collects table, column, and index statistics
GATHER_SCHEMA_STATS          Collects statistics for all objects in a schema
GATHER_DATABASE_STATS        Collects statistics for all objects in a database
GATHER_SYSTEM_STATS          Collects statistics on I/O and CPU utilization
GATHER_DICTIONARY_STATS      Gathers statistics for all tables in the data dictionary (new in Oracle Database 10g)
GATHER_FIXED_OBJECTS_STATS   Gathers statistics for fixed objects (V$ views) (new in Oracle Database 10g)

You may also gather index statistics while creating or rebuilding the index using the COMPUTE STATISTICS option. Beware: in order to ensure correctness of the statistics, Oracle always uses the base tables when creating the index even if another index is available that could be used.

SYNTAX

Following is the syntax for the GATHER_TABLE_STATS procedure. The other procedures have similar syntax and options. See the documentation referenced later for the complete syntax for all of the procedures.
Syntax:

    DBMS_STATS.GATHER_TABLE_STATS (
        ownname           VARCHAR2,
        tabname           VARCHAR2,
        partname          VARCHAR2 DEFAULT NULL,
        estimate_percent  NUMBER   DEFAULT NULL,
        block_sample      BOOLEAN  DEFAULT FALSE,
        method_opt        VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
        degree            NUMBER   DEFAULT NULL,
        granularity       VARCHAR2 DEFAULT 'DEFAULT',
        cascade           BOOLEAN  DEFAULT FALSE,
        stattab           VARCHAR2 DEFAULT NULL,
        statid            VARCHAR2 DEFAULT NULL,
        statown           VARCHAR2 DEFAULT NULL,
        no_invalidate     BOOLEAN  DEFAULT FALSE)

Parameter Descriptions:

Parameter         Description
ownname           Schema of table to analyze.
tabname           Name of table.
partname          Name of partition.
estimate_percent  Percentage of rows to estimate (NULL means compute). The valid range is [0.000001,100]. Use the constant DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine the best sample size for good statistics.
block_sample      Whether or not to use random block sampling instead of random row sampling. Random block sampling is more efficient, but if the data is not randomly distributed on disk, then the sample values may be somewhat correlated. Only pertinent when estimating statistics.
method_opt        Accepts:
                    FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
                    FOR COLUMNS [size_clause] column|attribute [size_clause] [,column|attribute [size_clause]...]
                  where size_clause is defined as:
                    size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}
                    integer: Number of histogram buckets. Must be in the range [1,254].
                    REPEAT: Collects histograms only on the columns that already have histograms.
                    AUTO: Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
                    SKEWONLY: Oracle determines the columns to collect histograms based on the data distribution of the columns.
degree            Degree of parallelism. NULL means use the table default value specified by the DEGREE clause in the CREATE TABLE or ALTER TABLE statement. Use the constant DBMS_STATS.DEFAULT_DEGREE to specify the default value based on the initialization parameters.
granularity       The granularity of statistics to collect (only pertinent if the table/index is partitioned):
                    'DEFAULT' - gathers global and partition-level statistics
                    'SUBPARTITION' - gathers subpartition-level statistics
                    'PARTITION' - gathers partition-level statistics
                    'GLOBAL' - gathers global statistics
                    'ALL' - gathers all (subpartition, partition, and global) statistics
no_invalidate     Dependent cursors are not invalidated if this parameter is set to TRUE.

GENERATING STATISTICS

Statistics may be generated using:
• Estimation based on random data sampling
• Exact computation
• User-defined statistics collection methods

We will concentrate on the first two. User-defined statistics are defined using SET and GET operations. These are beyond the scope of our discussion but are well documented in the Oracle documentation referenced later.

To perform an exact computation, Oracle requires enough space to perform a full-table scan and sort of the table. This may require the use of temporary space. For estimations, sufficient space is required to scan and sort only the rows in the requested sample. Indexes take less time and space.

Some statistics are always computed exactly, such as the number of data blocks currently containing data and the depth of an index tree.

Oracle recommends estimating statistics by setting the ESTIMATE_PERCENT parameter to DBMS_STATS.AUTO_SAMPLE_SIZE. This maximizes performance gains while achieving “necessary statistical accuracy.” Oracle then determines the best sample size for good statistics.

For example, to collect table and column statistics for all tables in the SCOTT schema with auto-sampling:

    EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS (OWNNAME => 'SCOTT', ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE, CASCADE => TRUE);

The CASCADE => TRUE option causes all indexes for the tables to also be analyzed.

ESTIMATING STATISTICS

DBMS_STATS allows the specification of a user-defined sampling percentage.
• Row sampling reads rows without regard to physical placement on disk
  o Provides the most random data for estimates
  o To use, specify ESTIMATE_PERCENT => 5
  o Valid values from 0.000001 to 100
  o A null value causes an exact computation (all rows analyzed)
  o A value of 5% usually forces the optimizer to make the same decision as an exact computation
  o A value greater than 15% will usually result in 100% of the rows being read and analyzed
  o For example:

    EXECUTE DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'SCOTT', TABNAME => 'EMPLOYEE', ESTIMATE_PERCENT => 5);

• Block sampling reads a random sample of blocks and uses all of the rows in those blocks for estimates
  o Reduces the amount of I/O activity for a given sample size
  o Can reduce the randomness of the sample
  o To use, set the BOOLEAN parameter BLOCK_SAMPLE to TRUE together with an appropriate ESTIMATE_PERCENT
  o For example:

    EXECUTE DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'SCOTT', TABNAME => 'EMPLOYEE', ESTIMATE_PERCENT => 5, BLOCK_SAMPLE => TRUE);

STATISTICS FOR THE DATA DICTIONARY

New in Oracle Database 10g is the ability to gather statistics for the data dictionary. The objective is to enhance the performance of queries. There are two basic types of dictionary base tables.

The statistics for normal base tables are gathered using GATHER_DICTIONARY_STATS. They may also be gathered using GATHER_SCHEMA_STATS for the SYS schema. Oracle recommends gathering these statistics at a similar frequency as your other database objects.

Statistics for fixed objects (the V$ views on the X$ tables) are gathered using the GATHER_FIXED_OBJECTS_STATS procedure. The initial collection of these statistics is normally sufficient. Repeat only if workload characteristics have changed dramatically.

The SYSDBA privilege or ANALYZE ANY DICTIONARY and ANALYZE ANY privileges are required to execute the procedures for gathering data dictionary statistics.
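The two dictionary-related procedures described above can be called with all parameters left at their defaults. The block below is a minimal sketch rather than a recommended maintenance script. It assumes Oracle Database 10g or later and a session with the privileges just listed.

```sql
-- Sketch only: requires SYSDBA, or ANALYZE ANY DICTIONARY plus ANALYZE ANY.
BEGIN
  -- Statistics for the normal data dictionary base tables
  DBMS_STATS.GATHER_DICTIONARY_STATS;

  -- Statistics for fixed objects (the X$ tables behind the V$ views);
  -- normally gathered once, under a representative workload
  DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/
```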
TRANSFERRING STATISTICS

Statistics can be saved into a statistics table and then loaded into the data dictionary. This allows statistics to be saved before gathering new ones. They could then be used to restore the old statistics in the event performance degrades, for instance, when upgrading Oracle to a new release.

A statistics table may also be used to copy statistics to another instance. As an example, you may want to copy statistics from a production database to a test database. The smaller database could then be used for testing but will see the execution plans that would occur on the larger production database.

To transfer the statistics, use the following DBMS_STATS procedures:

Procedure             Description
CREATE_STAT_TABLE     Creates the statistics table
EXPORT_TABLE_STATS    Copies the statistics into the table
IMPORT_TABLE_STATS    Copies the statistics into the data dictionary of the other instance

Similar procedures may be used to export/import column, index, schema, database, or system statistics. They may also be used to maintain and experiment with sets of statistics.

Occasionally, gathering new statistics causes a degradation in performance. For specific, critical tables, it is possible to save the old statistics before gathering new ones. You could then revert to the old statistics if necessary while analyzing and resolving the reasons for the performance change.

    BEGIN
      DBMS_STATS.CREATE_STAT_TABLE (OWNNAME => 'USER1', STATTAB => 'SAVESTATS');
      -- STATTAB saves the current statistics into SAVESTATS before they are replaced
      DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'USER1', TABNAME => 'EMPLOYEE', STATTAB => 'SAVESTATS');
    END;
    /

If the new statistics are causing the optimizer to generate poor plans, restore the original stats while resolving the problem.
    BEGIN
      DBMS_STATS.DELETE_TABLE_STATS (OWNNAME => 'USER1', TABNAME => 'EMPLOYEE');
      DBMS_STATS.IMPORT_TABLE_STATS (OWNNAME => 'USER1', TABNAME => 'EMPLOYEE', STATTAB => 'SAVESTATS');
    END;
    /

RESTORING AND PURGING STATISTICS

Beginning in Oracle Database 10g, statistics may be restored as of a specified time stamp. This may be useful when newly collected statistics lead to sub-optimal execution plans.

To revert to a previous set of statistics for a specific table, use the RESTORE_TABLE_STATS procedure. This may also be done at the schema or database level. Similar procedures are available for fixed objects, dictionary, and system statistics.

Whenever statistics in the dictionary are modified, old versions are automatically saved. Old statistics are automatically purged at regular intervals based on a configurable retention setting, or may be purged beyond a specified time stamp using the PURGE_STATS procedure.

There are dictionary views that display the time of statistics modifications. These views are useful in determining the time stamp to be used for statistics restoration.

The GET_STATS_HISTORY_RETENTION function can be used to get the current statistics history retention value. The GET_STATS_HISTORY_AVAILABILITY function gets the oldest time stamp where statistics history is available. Statistics cannot be restored to a time stamp older than the oldest available time stamp.

COLUMN HISTOGRAMS

It is essential that the optimizer be aware of columns that have severe data skewness or it may make bad decisions. This is accomplished by providing histograms, which give the optimizer detailed information about data distributions that are not uniform. Examples of where this may be an issue include:
• A column which has non-uniform data distribution, such as an employee name column where there may be more SMITHs than ZABRISKIEs
• A column where a large percentage of the rows have the same value, such as a hire date column where most employees were hired in the past two years
Histograms can result in significant performance improvements but do require a knowledge of the application and the data distribution. As an example, to develop histograms only for indexed columns:

    METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE 10'

where SIZE specifies the number of histogram buckets.

Beginning with Oracle9i, Oracle provides additional options for automating the decision of which columns to build histograms for and the definition of the appropriate number of histogram buckets.

AUTO creates histograms based on data distribution and the way columns are being accessed by the application. This does require the application to run for an amount of time to allow the workload to be captured. To use:

    METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE AUTO'

The SIZE option SKEWONLY creates histograms based strictly on data distribution. This is useful while the workload is still being captured. However, it is expensive in terms of resources, so it should be restricted to the initial collection of statistics; after that it is generally better to use the AUTO option.

The final SIZE option, REPEAT, refreshes the existing column statistics including histograms. If a column does not presently have histograms, new ones will not be created.

AUTOMATING STATISTICS COLLECTION

The DBMS_STATS procedures can be executed inside PL/SQL procedures. As an example:

    CREATE OR REPLACE PROCEDURE gather_stats_proc IS
    BEGIN
      DBMS_STATS.GATHER_SCHEMA_STATS (OWNNAME => 'SCOTT', ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE);
    END;
    /

The procedure could then be scheduled using the Oracle-provided package DBMS_JOB in releases through Oracle9i, DBMS_SCHEDULER in Oracle Database 10g, third-party scheduling tools, or other job scheduling tools at the operating system level.
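As an illustration of the DBMS_SCHEDULER route, the sketch below schedules the gather_stats_proc procedure shown above. The job name and the nightly 02:00 schedule are assumptions chosen for the example, not values from the original paper. It assumes Oracle Database 10g or later and a user with the CREATE JOB privilege.

```sql
-- Sketch only: job name and schedule are hypothetical.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB (
    job_name        => 'GATHER_SCOTT_STATS_JOB',   -- hypothetical job name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'GATHER_STATS_PROC',        -- procedure created earlier
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2; BYMINUTE=0; BYSECOND=0',
    enabled         => TRUE);
END;
/
```

The calendar string keeps the schedule readable and easy to change. A quiet period for the specific application is usually a better choice than any fixed default.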
In Oracle8i, the MONITORING clause in the CREATE (or ALTER) TABLE statement is set to start or stop the collection of modification statistics:

    ALTER TABLE table_name MONITORING|NOMONITORING

In Oracle9i, monitoring may also be enabled or disabled at the schema or database level using DBMS_STATS procedures. For an individual schema use ALTER_SCHEMA_TAB_MONITORING. For the entire database use ALTER_DATABASE_TAB_MONITORING.

All of the above monitoring options are obsolete in Oracle Database 10g. They may still be used; however, they will have no effect. Beginning with Oracle Database 10g, the STATISTICS_LEVEL initialization parameter sets table monitoring at the database level. It defaults to TYPICAL, which ensures collection of all major statistics. Set it to BASIC to disable monitoring.

MONITORING tracks the approximate number of INSERT, UPDATE, and DELETE operations for the table since the last time statistics were gathered. The monitoring information is maintained in the System Global Area (SGA). Then, periodically (about every three hours), the information is incorporated into the data dictionary. This information can be seen in the DBA_TAB_MODIFICATIONS view.

Oracle uses this information to identify tables with stale statistics. Statistics are considered stale when 10% of the rows have changed.

Automated statistics gathering is then invoked by setting up a recurring job specifying OPTIONS => 'GATHER STALE'. The job would be set to execute at an appropriate interval for the specific application. Beginning with Oracle9i, 'GATHER STALE' can be changed to 'GATHER AUTO' as discussed earlier.

LOCKING STATISTICS

Beginning in Oracle Database 10g, statistics may be locked.
When the statistics of a table are locked, all statistics on the table, including table, column, and histogram statistics and the statistics on all dependent indexes, are locked.

When the statistics are locked, the GATHER_*, IMPORT_*, and DELETE_* procedures will raise an error. The procedures that operate on multiple objects (such as GATHER_SCHEMA_STATS) will skip modifying the statistics of an object if it is locked.

This feature can be used to freeze the current set of statistics or to keep the statistics empty when required.

Set locking using the procedures LOCK_SCHEMA_STATS and/or LOCK_TABLE_STATS. Use the UNLOCK_* procedures to unlock the statistics.

GATHERING SYSTEM STATISTICS

System statistics enable the optimizer to consider a system’s I/O and CPU performance and utilization.

System statistics depend on many factors and do not stay constant at all times. Therefore, capture statistics during the interval of time when the system has the most common workload. For example, gather OLTP statistics for use during the daytime and batch statistics for use during the nighttime. The appropriate set is then activated as required.

Unlike table, index, or column statistics, gathering system statistics does not cause Oracle to invalidate already parsed SQL statements. Only new statements are parsed with the new statistics.

You must have DBA privileges to gather system statistics. For example:

    BEGIN
      DBMS_STATS.GATHER_SYSTEM_STATS (GATHERING_MODE => 'INTERVAL', INTERVAL => 480, STATTAB => 'MYSTATS', STATID => 'OLTP');
    END;
    /

Statistics are gathered for 480 minutes, placed in a table called MYSTATS, and given an identifier of OLTP. A similar job would gather statistics for nighttime operation. Then activate the appropriate set of statistics using your preferred scheduling tool.

SUMMARY

As of Oracle Database 10g, release 1, there are nearly seventy procedures and functions in the DBMS_STATS package and many additional options. We have discussed the ones that are generally the most useful.
There are many others that may be of value in specific instances. As an example, there is a set of DELETE procedures. All are well documented in the Oracle manual, PL/SQL Packages and Types Reference. You may also want to reference the Oracle Database Performance Tuning Guide manual.

Since DBMS_STATS was introduced in Oracle8i, Oracle, and most individuals doing Oracle tuning, have recommended using it to generate statistics for the cost-based optimizer. The Oracle documentation states, “The cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS.”

If your shop is still using the ANALYZE command, you may want to seriously consider taking the time to make the switch. The package is easy to use. Conversion from existing ANALYZE jobs is straightforward and generally not time-consuming.
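To make the conversion concrete, the sketch below pairs a typical ANALYZE statement with a roughly comparable DBMS_STATS call. The SCOTT.EMPLOYEE table is reused from the earlier examples. The two forms are not identical in every detail (for instance, in how column statistics and histograms are handled), so treat this as a starting point, not a drop-in replacement.

```sql
-- Old style, deprecated for optimizer statistics:
--   ANALYZE TABLE scott.employee COMPUTE STATISTICS;

-- A roughly comparable DBMS_STATS call: exact computation, indexes included
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    ownname          => 'SCOTT',
    tabname          => 'EMPLOYEE',
    estimate_percent => NULL,    -- NULL means compute (all rows analyzed)
    cascade          => TRUE);   -- also gather statistics on the table's indexes
END;
/
```

An existing ANALYZE ... ESTIMATE STATISTICS SAMPLE n PERCENT job would map to ESTIMATE_PERCENT => n in the same call, or to DBMS_STATS.AUTO_SAMPLE_SIZE if you prefer to let Oracle choose the sample size.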