Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
RAC be Nimble, RAC be Quick Bert Scalzo, Domain Expert, Oracle Solutions [email protected] Copyright © 2006 Quest Software About the Author … Domain Expert & Product Architect for Quest Software Oracle Background: • Worked with Oracle databases for over two decades (starting with version 4) • Work history includes time at both “Oracle Education” and “Oracle Consulting” Academic Background: • Several Oracle Masters certifications • BS, MS and PhD in Computer Science • MBA (general business) • Several insurance industry designations Key Interests: • Data Modeling • Database Benchmarking • Database Tuning & Optimization • "Star Schema" Data Warehouses • Oracle on Linux – and specifically: RAC on Linux Articles for: • Oracle’s Technology Network (OTN) • Oracle Magazine, • Oracle Informant • PC Week (eWeek) Articles for: • Dell Power Solutions Magazine • The Linux Journal • www.linux.com • www.orafaq.com 1 Books by Bert … Coming in 2009 … Out Now … 2nd Edition Coming Soon 2 Agenda • RAC Challenges – “RAC in the Box” syndrome – Must Optimize every subsystem • RAC Optimization Approach – Optimization Approach Universe – Why a True Top-Down Approach – Top-Down Process & Tools (but NOT a sales pitch) • Real-world Scenario – Dell’s use of Quest’s Approach & Tools for Oracle RAC – “Best Practices” applied incrementally & results 3 Oracle RAC is Great, but … • Too often people expect RAC to “auto-magically” function out of the box with little to no optimization • During RAC optimization attempts, people far too often concentrate on just a single dimension – typically the Oracle database “stuff” (i.e. ratios, waits, parms, etc…) • During RAC optimization attempts, Oracle is often too heavily weighted as the primary source and reason for most, if not all, of the performance bottleneck issues • Not enough true “application nature” is identified and accounted for during the overall optimization process • Result: Too many people achieve sub-par results! 4 I call it “RAC in the BOX” syndrome • That’s just my stupid name for it (hope it catches) • But nearly half the RAC sites I visit are suffering from RAC performance issues related to this! • Still far too RAC experts among the general DBA population (although improving each & every day) • Really no simple, single button tools yet to make RAC “auto-magically” “fire on all cylinders” • Many people too often and readily bail on RAC, and fall-back to “big SMP boxes” (the evil that they know) • But with just a little manual “box winding”, anyone should be able to “pop the RAC weasel” free 5 RAC a is System, Must Tune its Entirety 6 RAC Performance = Sum of its Parts • Application Nature (affects everything else below) • Public Network • Storage Network • Storage Sub-System • Oracle Instance Configuration Traditional Focus • Oracle Cluster Configuration • Private Network (i.e. Interconnect) 7 Optimization Approaches (i.e. Focus) Top-Down Pro-Active Side-Ways (Ad-Hoc) Bottom-Up Re-Active 8 Optimization Approach Techniques/Tools DBA_, V$ and X$ Aggregate Information •Oracle OEM Diagnostics & Tuning Packs Top-Down •Confio Ignite for Oracle •BMC DBXray for Oracle •Quest Spotlight for Oracle & Spotlight for RAC Side-Ways (Ad-Hoc) V$, STATS$, DBA_HIST_ Response Time ~= Wait Events Bottom-Up •Cary Millsap (Hotsos) calls this “Method R” •Anjo Kolk & et al Oracle) call this “YAPP Method” •Kyle Haily (PerfVision) paper on “Waits Defined” Trace/Instrumentation 9 True Top-Down Optimization Focus • • • • “Low Hanging Fruit” Obvious yet Overlooked Subtle yet Highly Critical “Dumb Question” Method Radically Different than other RAC tuning sessions: Not going to delve into obscure hardware, OS, network, Oracle and RAC tuning parameters or configurations, Just easy stuff that makes a big difference 10 Why Top-Down Works (now don’t laugh…) Drive Dallas to Austin: •Initial route: •45 south to Houston, 610 west to 290, and then 290 west to Austin •Drive Time = 7 Hours! •Improved Route: •45 south, cut across 79 west half way to Houston, and then 35 south to Austin •Drive Time = 4.5 Hours •36% Reduction •Top-Down Route: •Dumb Question: Why Houston? •35 South •Drive Time = 3.5 Hours •50% Reduction 11 RAC Performance Testing Process (using tools) 1. Benchmark Factory 2. Spotlight on RAC Industry standard benchmark: TPC-C Key Metric = Avg Response Time Record before & after results Apply Top-Down Analysis & Revision Confirm improvements 3. TOAD with DBA Module AWR/ADDM & Stats Pack Again record before & after for improvements confirmation 12 Tool #1 – Load Generator Benchmark Factory® • Test Oracle RAC environments rapidly and reliably • Perform database “scalability” or “goal” testing to determine the most optimal RAC configuration • Tests: – TPC-C – TPC-H – Trace File playback – etc, etc, etc … • Let’s DBA concentrate on task at hand - Optimization 13 14 Tool #2 – Ad-Hoc Monitor (to confirm results) Spotlight® on RAC • Monitor Oracle RAC environments rapidly and reliably • Diagnose Oracle RAC environment health levels at – Node – Cluster – ASM – Instance – Interconnect • Intelligent performance alerts plus market-leading GUI for entire RAC to instances architecture & bottlenecks • Let’s DBA concentrate on task at hand - Diagnosing 15 16 17 18 Tool #3 – Ad-Hoc Monitor (to confirm results) TOAD® with DBA Module • Expedite typical DBA management & tuning tasks • Great Productivity Enhancing Features – Database Health Check – Database Probe – Database Monitor – AWR/ADDM Reports – UNIX Monitor – Stats Pack Reports • See Toad World paper – Title: “Maximize Database Performance Via Toad for Oracle” – http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx • Let’s DBA concentrate on task at hand – Correcting (i.e. Fixing) 19 20 21 Real-world Scenario Quest strategic partner & customer Dell uses Quest’s solution for Oracle RAC to test the performance of the Oracle RAC architecture running on Dell Power Edge servers and EMC Clarion SAN & iSCSI Disk Arrays 22 DELL Success Story www.Quest.com/success_stories/Dell-Quest.pdf 23 DELL Success Story Database Configuration used at DELL for the RAC test environment 24 Step-By-Step Example Apply Methodology, “Best Practices”, and Quest’s RAC tools to optimize and quantify the approximate percentage of the improvements Note – will quote some specific examples for a given RAC setup, your mileage will surely vary Test = TPC-C (OLTP) for 200-2000 users, 10 GB 25 Remember – must tune RAC as system Start Here Ask lots of Questions … 26 Step 1 - Application Nature Know Your Application Demands (this info flows downstream) • OLTP vs. Data Warehousing – Primarily Read vs. Write – Average Transaction Size – Likelihood of “Dead Lock” – Logging, Flashback and Recovery Requirements – etc, etc, etc … • Concurrent User Load Profile (i.e. user load over time) • Focus on User Response Time Requirements – For example, TPC-C must run each transaction <= 2 seconds – Response Time ~= Wait Events • Cary Millsap (Hotsos) calls this “Method R” • Anjo Kolk & et al (Oracle) call this “YAPP Method” • Kyle Haily (PerfVision) paper “Waits Defined” • Don’t skip this step – cost can be enormous – and that no network, OS, or database tuning can compensate for !!!!! 27 Application “Best Practices” Well known rules: • Write efficient SQL and/or PL/SQL code (explain plans) • Use bind variables to reduce unnecessary “re-parsing” • Often the underlying Application Code (e.g. Benchmark) NOT changeable, so you can’t do anything Deeper TPC-C Analysis (remember across all next steps): • Primarily Read with Some Writes • Small Average Transaction Size • High Concurrency with Potential Deadlocks • Logging for ACID compliance and no flashback 28 Public Network “Best Practices” Well known rules: • Isolate Network (for single or related applications only) • Use Gigabit Ethernet (consider “bonding” multiple cards) • Use Layer 2 or 3 Switches and verify Gigabit throughput TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = No jumbo frames, Standard SDU/TCU • High Concurrency = Multiple Ethernet Segments (collisions) • No Logging, etc… = Nothing 29 Storage Network “Best Practices” Well known rules: • Isolate Network (for single or related applications only) • Use Fiber Channel for SAN, 10GB Ethernet for NAS/iSCSI • Consider multiple pathways per storage controller and HBA • Consider TCP/IP offload engine (TOE) NIC’s or iSCSI HBA’s TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = Jumbo frames since “Block Level” IO • High Concurrency = Fiber Channel and Multiple Pathways (if budget) • No Logging, etc… = Nothing 30 Storage Sub-System “Best Practices” Well known rules: • More Smaller Disks generally higher overall throughput • More memory cache generally higher overall throughput • Avoid “write-back” mode if no backup power source (e.g. battery) • Align Stripe Boundaries: drive, OS block, LVM, file sys, database block, etc • Stripe Depth (i.e. size) from 256 KB to 1 MB • Stripe Width (i.e. # disks) between 4 and 16 • Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request • Average I/O <= Stripe Width X Stripe Depth • Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered) TPC-C Analysis Ramifications: • Primarily Reads = RAID 5, Adjust cache memory allocations & look-ahead algorithms • Small Transaction = Stripe Depth >= db_block_size X db_file_multiblock_read_count • High Concurrency = Low Deadlock, so spread DB objects and partitions across LUN’s • No Logging, etc… = Logging but no flashback, so write IO is reasonable, so RAID 5 OK 31 Oracle Instance “Best Practices” Well known rules: • Size & Tune the SGA appropriately for application nature • Choose reasonable block size based on application nature (?8K?) • Partition large objects & indexes across storage devices & spindles • Don’t assume any “golden rules” – i.e. test all assumptions! TPC-C Analysis Ramifications: • Primarily Reads = opt_index_caching=80, opt_index_adj_cost=20 • Small Transaction = Size redo logs correctly for small size X high load • High Concurrency = cursor_space_for_time=t, cursor_sharng=similar • No Logging, etc… = Turn off “Recycle Bin” but keep “LOGGING” 32 Oracle Cluster “Best Practices” Well known rules: • Increase default SGA size for all ASM instances (64M too small) • Interconnect is the most important bottleneck – many bonded NIC’s • Consider hash partitions & reverse indexes to spread IO across nodes • Don’t assume any “golden rules” – i.e. really test all assumptions!!! TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = Decrease db_file_multiblock_read_count • High Concurrency = Decrease block size (?4K?) – see next slide • No Logging, etc… = Nothing 33 Single Instance – No Block Contention 34 Cluster – Block Contention Costs 35 Private Network “Best Practices” Well known rules: • Isolate Network (for single cluster only – and 0% public) • Use 10GB Ethernet or Inifini-Band (Dell found 15% RTI) • Consider multiple pathways per HBA and storage controller • Jumbo frames since high “Block Level” IO between nodes TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = Nothing • High Concurrency = Lower block size until interconnect traffic OK – Consider increasing the OS priority of the global cache cluster services • No Logging, etc… = Nothing 36 Let’s Apply the Recommendations 1 Read Count 2 Block Size 3 4 Test 1 16 8 False Exact 0 100 False Test 2 2 8 False Exact 0 100 False Test 3 2 4 False Exact 0 100 False Test 4 2 4 True True 80 20 False Test 5 2 4 True True 80 20 True Cursor Cursor Index Space Share Cache Index Cost 5 Jumbo 37 Results – TPS (lesser interest here) Transactions / Second 35.00 30.00 25.00 Run 1 20.00 Run 2 Run 3 15.00 Run 4 Run 5 10.00 5.00 0.00 50 100 150 200 250 300 350 400 450 500 38 Results – Average Response Time Average Response Time 6.00 5.00 4.00 Run 1 Run 2 3.00 Run 3 Run 4 Run 5 2.00 1.00 Sub Second 0.00 50 100 150 200 250 300 350 400 450 500 39 Thank you Please offer any questions or comments Remember: • Eliminate “RAC in the Box” syndrome – eat low hanging fruit • Example was for TPC-C or OLTP type application • TPC-H or Data Warehouse would NOT be the same • Your mileage may well vary (especially percentages) Toad World Article: “Maximize Database Performance Via Toad for Oracle” http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx Dell Power Solutions article: http://www.quest.com/success_stories/Dell-Quest.pdf 40