Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Tuning Oracle RAC Guy Peleg President Maklee Engineering [email protected] Friday, May 12, 2017 Agenda RAC overview & Performance Expectations Performance Tips SQL Tuning Typical RAC configuration App Server1 App Server2 Client Public Network Database Instance 1 Private Network (Interconnect) Node 1 Database Instance 2 Node 2 Storage Network Local Storage Local Storage Shared Database Oracle RAC Oracle RAC provides two main features: Availability Scalability May operate in two modes: All nodes are active (load distributed between nodes) Active/Passive RAC scaling/performance considerations are similar to OpenVMS clustering scaling/performance considerations Interconnect Locks Sharing RAC Scaling – Maklee’s Golden Rules Application that does not scale on a standalone node – will not scale on RAC Start with single instance tuning shutdown all nodes measure scaling test scaling by adding CPUs Add one node at a time to measure scalability Scalability Benchmark 2 nodes cluster 1.3 Ghz rx2600, running OpenVMS V8.3-1H1 Oracle 10gR2 RAC Latest set of patches Test database contains information about 50,000 customers 200,000 customer orders 200,000 ordered items Scalability Benchmark PL/SQL procedure to fetch data about 2000 random customers Read only test All data in SGA No I/O CPU Bound Scalability Benchmark 1400 1200 1 Job 1000 800 600 400 200 6 Parallel jobs one instance 3 Parallel jobs instance 1 3 Parallel jobs instance 2 0 Elapsed time (seconds per job) to complete the test Less is better RAC Proof Of Concept MAKLEE Engineering recently performed a RAC proof of concept installation at a large chain of department stores in Switzerland. Benchmarked a single Alpha GS1280 (production node) vs. a RAC cluster running 2 Integrity servers rx6600. The goals were: Install RAC Get hands on experience with RAC Perform RAC scaling tests Make a go/no go decision on implementing RAC in production Hardware & Software Configuration Oracle RAC Configuration: 2 nodes OpenVMS Cluster Each node is rx6600 with 8 cores OpenVMS V8.3-1H1 EVA8000 storage Products installed: Oracle CRS (Cluster Ready Services) Oracle 10g R2 DBCA executed for configuring RAC enabled database Database patches 27 Parallel Database Import Jobs 160 Standalone GS1280 140 120 rx6600 RAC/standalone Itanium 100 80 60 40 20 0 Minutes to complete database import less is better Database Import Itanium outperformed Alpha Operating in RAC environment does not increase the throughput of the import operation Spreading the jobs across two nodes or running all jobs on one node yields identical performance/throughput No performance degradation witnessed Batch Processing Benchmark 45 54 jobs - Single Alpha GS1280 40 35 54 jobs - Single rx6600 30 54 jobs - Spread across the RAC 25 20 15 10 5 0 Minutes to complete batch processing cycle Less is better Batch Processing Benchmark Itanium outperformed Alpha RAC allows scaling outside of the box Second RAC node adds 40% more throughput Another Example – European Bank European Bank migrating from Alpha to Itanium 2 nodes AlphaServer ES47 -> 2 nodes rx7640 Migrating to Oracle 10gR2 RAC Availability is main concern Interactive users will be distributed between nodes No plans to distributed batch load between nodes Needed to verify that RAC does not degrade performance Another Example – European Bank Benchmarked various batch jobs – focusing on one specific batch job. Initial results did not favor Itanium. Batch Processing Benchmark 35 Alpha ES47 30 25 rx7640 out of the box 20 rx7640 after tuning 15 10 5 0 Minutes to complete selected batch job Less is better European Bank - Summary Tuning is critical for achieving optimal performance Don’t run “out of the box”. 66% improvement after (minimal) tuning The specific benchmark is running 52% faster on Itanium comparing to Alpha. European Bank - Summary All other batch jobs/applications witnessed similar improvement. RAC increases availability and does not degrade performance. RAC will go into production in few weeks Performance Tips CRS Base Priority CRS is running in batch Usually, runs in a dedicated batch queue By default, base priority of a batch queue is 4 On a system with thousands of processes, CRS may need to compete (and sometimes lose) for CPU resources CRS should be given high priority Set base priority of CRS queue to 12 RAC Cluster Interconnect The performance of the cluster interconnect is critical to the performance of the RAC. Interconnect used for Cluster management Locks Cache Fusion Oracle requires (at least one) dedicated cluster interconnect Gigabit Ethernet is highly recommended Enable Jumbo Frames Transfer rate of ~ 25MB per second (faster than some disks ;-) Cluster interconnect Performance Latency is CRITICAL for RAC performance Measure the latency of the interconnect: set numwidth 20 column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9 select b1.inst_id, b2.value "GCS CR BLOCKS RECEIVED", b1.value "GCS CR BLOCK RECEIVE TIME", ((b1.value/b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)" from gv$sysstat b1, gv$sysstat b2 where b1.name='gc cr block receive time' and b2.name='gc cr blocks received' and b1.inst_id=b2.inst_id; Cluster interconnect Performance Latency should be lower than 15ms OpenVMS achieved 0.5ms on blades RAC (BL860) V8.3-1H1 Gigabit Ethernet Jumbo Frames enabled Load distribution between instances set pagesize 60 space 2 numwidth 8 linesize 132 verify off feedback off column service_name format a20 truncated heading 'Service' column instance_name heading 'Instance' format a10 column service_time heading 'Service Time|mSec/Call' format 999999999 select service_name, instance_name, elapsedpercall service_time, cpupercall cpu_time, dbtimepercall db_time, callspersec throughput from gv$instance gvi, gv$active_services gvas, gv$servicemetric gvsm where gvas.inst_id=gvsm.inst_id and gvas.name_hash=gvsm.service_name_hash and gvi.inst_id=gvsm.inst_id and gvsm.group_id=10 order by service_name, gvi.inst_id; Standalone Database Import 8 rx6600 before tuning 7 6 5 rx6600 after tuning 4 3 2 1 0 Minutes to complete database import less is better Database import Install imp.exe as resident image with shared address space $ install add imp.exe/resident/share=addr Increase default quotas for BEQ’s mailboxes $ define/sys ORA_BEQ_MBXSIZ 64000 $ define/sys ORA_BEQ_MBXSBFQ 64000 Set DEFMBXBUFQUO to 64000 Set DEFMBXMXMSG to 64000 DBMS_STATS.GATHER_SCHEMA_STATS 100 90 80 70 60 50 40 30 20 10 0 rx6600 before tuning rx6600 after tuning Minutes to gather database statistics (350GB database) Less is better DBMS_STATS.GATHER_SCHEMA_STATS Calling gather_schema_stats results in a database server process being created The server process in not multithreaded Typically consumes 100% of one CPU Performance improvement achieved by affinitizing the server process to one CPU and increasing QUANTUM to 20. SORT Analyze the efficiency of sort operations Determine the number of optimal, one pass and multipass operations SELECT optimal_count, round(optimal_count*100/total, 2) optimal_perc, onepass_count, round(onepass_count*100/total, 2) onepass_perc, multipass_count, round(multipass_count*100/total, 2) multipass_perc FROM (SELECT decode(sum(total_executions), 0, 1, sum(total_executions)) total, sum(OPTIMAL_EXECUTIONS) optimal_count, sum(ONEPASS_EXECUTIONS) onepass_count, sum(MULTIPASSES_EXECUTIONS) multipass_count FROM v$sql_workarea_histogram WHERE low_optimal_size > 64*1024); Sizing the SGA Reserve memory for the SGA (SYSMAN) Avoid automatic memory management in the SGA whenever possible. The following query will help properly size the SGA select sga_size, sga_size_factor as size_factor, estd_physical_reads as estimated_physical_reads from v$sga_target_advice order by sga_size_factor; Sizing the SGA SQL> select sga_size, sga_size_factor as size_factor, 2 estd_physical_reads as estimated_physical_reads 3 from v$sga_target_advice order by sga_size_factor; SGA_SIZE SIZE_FACTOR ESTIMATED_PHYSICAL_READS ---------- ----------- -----------------------4356 ,75 44485808 5808 1 24659539 7260 1,25 24659539 8712 1,5 24659539 10164 1,75 24659539 SQL> What’s wrong in this picture? $ show memory System Memory Resources on 1-APR-2008 15:32:35.62 Physical Memory Usage (bytes): Main Memory (GB) Total 64.00 Free 58.27 In Use 5.69 Modified 0.02 Extended File Cache (Time of last reset: 31-MAR-2008 15:14:46.99) Allocated (MBytes) 397.03 Maximum size (MBytes) 32768.00 Free (MBytes) 17.82 Minimum size (MBytes) 3.12 In use (MBytes) 379.20 Percentage Read I/Os 77% Read hit rate 99% Write hit rate 0% Read I/O count 5368075 Write I/O count 1578011 Read hit count 5315683 Write hit count 0 Reads bypassing cache 79 Writes bypassing cache 241954 Files cached open 739 Files cached closed 2255 Vols in Full XFC mode 0 Vols in VIOC Compatible mode 52 Vols in No Caching mode 0 Vols in Perm. No Caching mode 0 .... Of the physical memory in use, 8.52 GB are permanently allocated to OpenVMS. $ SQL Tuning The next step in improving performance SQL Tuning ! With previous “Alpha Vs. Itanium” benchmarks we had to play it fare Not a single SQL statement was changed. SQL tuning may improve performance by magnitudes SQL Tuning All the tools that are required for SQL tuning are shipping with the database: Automatic Workload Repository (AWR) Endless amount of performance related information Enhanced version of statpak Active Session History (ASH) Automatic Database Diagnostic Monitor (ADDM) SQL Access Advisor SQL Tuning Advisor Statspack analyzer (not part of the DB but available for free) The power of SQL tuning AWR was used to analyze the “scalability benchmark” 97% of the time was spent executing single SQL statement After SQL tuning – elapsed time of the benchmark was reduced from 411 seconds to 3.18 seconds ! 130 times 450 400 350 300 250 200 150 100 50 0 Untuned version Tuned version faster!!!! The power of SQL tuning “Real life” example rx6600, Oracle 10g, DWH DB Single SQL statement required 140 minutes to complete By biasing the optimizer, elapsed time reduced to 10 minutes 140 120 100 80 60 40 20 0 Untuned version Tuned version Questions? See us at www.maklee.com for: • Performance improvements • Oracle Tuning • Platform Migration • Custom Engineering solutions • Custom Training