Download Slide 0 - NoCOUG

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Airborne Networking wikipedia , lookup

Network tap wikipedia , lookup

Transcript
RAC be Nimble,
RAC be Quick 
Bert Scalzo, Domain Expert, Oracle Solutions
[email protected]
Copyright © 2006 Quest Software
About the Author …
Domain Expert & Product Architect for Quest Software
Oracle Background:
•
Worked with Oracle databases for well over two decades (starting with version 4)
•
Work history includes time at both Oracle Education and Oracle Consulting
Academic Background:
•
Several Oracle Masters certifications
•
BS, MS and PhD in Computer Science
•
MBA (general business)
•
Several insurance industry designations
Key Interests:
•
Data Modeling
•
Database Benchmarking
•
Database Tuning & Optimization
•
"Star Schema" Data Warehouses
•
Oracle on Linux – and specifically: RAC on Linux
Articles for:
•
Oracle’s Technology Network (OTN)
•
Oracle Magazine,
•
Oracle Informant
•
PC Week (eWeek)
Articles for:
•
Dell Power Solutions
Magazine
•
The Linux Journal
•
www.linux.com
•
www.orafaq.com
1
Books by Author …
Potentially in 2008 …
2
Agenda
• RAC Performance
– “RAC in the Box”  syndrome
– Must Optimize every subsystem
• RAC Optimization Approach
– RAC Optimization Methodology
– via Quest’s RAC Tools (just example - not a sales pitch)
• Real-world Scenario
– Dell’s use of Quest’s Solution for Oracle RAC
– “Best Practices” applied incrementally & results
3
Optimization Focus
• “Low Hanging Fruit”
• Obvious yet Overlooked
• Subtle yet Highly Critical
Radically Different than other RAC tuning sessions:
Not going to delve into obscure hardware, OS, network,
Oracle and RAC tuning parameters or configurations,
Just easy stuff that makes a big difference 
4
Oracle RAC is Great, but …
• Too often people expect RAC to “auto-magically”
function out of the box with little to no optimization
• During RAC optimization attempts, people far too
often concentrate on just a single dimension – the
Oracle database “stuff” (i.e. waits, parameters, etc)
• During RAC optimization attempts, Oracle is often
too heavily weighted as the source and reason for
most, if not all, of the performance bottlenecks
• Not enough “application nature” is identified and
accounted for during the optimization process
• Result: Too many people achieve sub-par results!
5
I call it “RAC in the BOX” syndrome 
• That’s just my stupid name for it (hope it catches)
• But nearly half the RAC sites I visit are suffering
from RAC performance issues related to this!
• Still far too RAC experts among the general DBA
population (although improving every day)
• Really no simple, single button tools yet to make
RAC “auto-magically” “fire on all cylinders”
• Many people bail on RAC far too soon and simply
fall-back to big SMP boxes (the evil that they know)
• But with just a little manual “box winding”, anyone
should be able to “pop the RAC weasel” free 
6
RAC is System, Must Tune its Entirety
7
RAC Performance = Sum of its Parts
• Application Nature (affects everything else below)
• Public Network
• Storage Network
• Storage Sub-System
• Oracle Instance Configuration
• Oracle Cluster Configuration
• Private Network (i.e. Interconnect)
8
RAC Performance Testing Methodology
1. Benchmark Factory
www.TPC.org tests
or Oracle trace file
playback generator
2. Spotlight on RAC
Monitor RAC system for
performance bottlenecks
RAC Performance
Testing Methodology
3. TOAD with DBA Module
“AWR/ADDM Reports “
Identify and correct
Top five wait events
9
Quest Solutions for Oracle RAC
Benchmark Factory®
• Test Oracle RAC environments rapidly and reliably
• Perform database “scalability” or “goal” testing to
determine the most optimal RAC configuration
• Tests:
– TPC-C
– TPC-H
– Trace File playback
– etc, etc, etc …
• Let’s DBA concentrate on task at hand - Optimization
10
11
Quest Solutions for Oracle RAC
Spotlight® on RAC
• Monitor Oracle RAC environments rapidly and reliably
• Diagnose Oracle RAC environment health levels at
– Node
– Cluster
– ASM
– Instance
– Interconnect
• Intelligent performance alerts plus market-leading GUI
for entire RAC to instances architecture & bottlenecks
• Let’s DBA concentrate on task at hand - Diagnosing
12
13
14
15
Quest Solutions for Oracle RAC
TOAD® with DBA Module
• Expedite typical DBA management & tuning tasks
• Great Productivity Enhancing Features
– Database Health Check
– Database Probe
– Database Monitor
– AWR/ADDM Reports
– UNIX Monitor
– Stats Pack Reports (coming soon)
• See Toad World paper
– Title: “Maximize Database Performance Via Toad for Oracle”
–
http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx
• Let’s DBA concentrate on task at hand – Correcting (i.e. Fixing)
16
17
Real-world Scenario
Quest strategic partner & customer
Dell uses Quest’s solution for Oracle
RAC to test the performance of the
Oracle RAC architecture running on
Dell Power Edge servers and EMC
Clarion SAN & iSCSI Disk Arrays
18
DELL Success Story
www.Quest.com/success_stories/Dell-Quest.pdf
19
DELL Success Story
Database Configuration used at DELL
for the RAC test environment
20
DELL Success Story
Near
Linear
Scalability!
21
Step-By-Step Example
Apply Methodology, “Best Practices”,
and Quest’s RAC tools to optimize
and quantify the approximate
percentage of the improvements
Note – will quote some specific examples for a
given RAC setup, your mileage will surely vary
Test = TPC-C (OLTP) for 200-2000 users, 10 GB
22
Step 1 - Application Nature
Know Your Application Demands (this info flows downstream)
•
OLTP vs. Data Warehousing
–
–
–
–
–
•
•
Concurrent User Load Profile (i.e. user load over time)
Focus on User Response Time Requirements
–
–
•
Primarily Read vs. Write
Average Transaction Size
Likelihood of “Dead Lock”
Logging, Flashback and Recovery Requirements
etc, etc, etc …
For example, TPC-C must run each transaction <= 2 seconds
Response Time ~= Wait Events
• Cary Millsap (Hotsos) calls this “Method R”
• Anjo Kolk & et al … (Oracle) call this “YAPP Method”
• Kyle Haily (PerfVision) paper “Waits Defined”
• etc, etc, etc …
Don’t skip this step – cost can be enormous – and that no
network, OS, or database tuning can compensate for !!!!!
23
Application “Best Practices”
Well known rules:
• Write efficient SQL and/or PL/SQL code (explain plans)
• Use bind variables to reduce unnecessary “re-parsing”
• Often the underlying Application Code (e.g. Benchmark)
NOT changeable, so you can’t do anything
Deeper TPC-C Analysis (remember across all next steps):
• Primarily Read with Some Writes
• Small Average Transaction Size
• High Concurrency with Potential Deadlocks
• Logging for ACID compliance and no flashback
24
Public Network “Best Practices”
Well known rules:
• Isolate Network (for single or related applications only)
• Use Gigabit Ethernet (consider “bonding” multiple cards)
• Use Layer 2 or 3 Switches and verify Gigabit throughput
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = No jumbo frames, Standard SDU/TCU
• High Concurrency = Multiple Ethernet Segments (collisions)
• No Logging, etc… = Nothing
25
Storage Network “Best Practices”
Well known rules:
• Isolate Network (for single or related applications only)
• Use Fiber Channel for SAN, 10GB Ethernet for NAS/iSCSI
• Consider multiple pathways per storage controller and HBA
• Consider TCP/IP offload engine (TOE) NIC’s or iSCSI HBA’s
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = Jumbo frames since “Block Level” IO
• High Concurrency = Fiber Channel and Multiple Pathways (if budget)
• No Logging, etc… = Nothing
26
Storage Sub-System “Best Practices”
Well known rules:
• More Smaller Disks generally higher overall throughput
• More memory cache generally higher overall throughput
• Avoid “write-back” mode if no backup power source (e.g. battery)
• Align Stripe Boundaries: drive, OS block, LVM, file sys, database block, etc
• Stripe Depth (i.e. size) from 256 KB to 1 MB
• Stripe Width (i.e. # disks) between 4 and 16
• Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request
• Average I/O <= Stripe Width X Stripe Depth
• Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered)
TPC-C Analysis Ramifications:
• Primarily Reads = RAID 5, Adjust cache memory allocations & look-ahead algorithms
• Small Transaction = Stripe Depth >= db_block_size X db_file_multiblock_read_count
• High Concurrency = Low Deadlock, so spread DB objects and partitions across LUN’s
• No Logging, etc… = Logging but no flashback, so write IO is reasonable, so RAID 5 OK
27
Oracle Instance “Best Practices”
Well known rules:
• Size & Tune the SGA appropriately for application nature
• Choose reasonable block size based on application nature (?8K?)
• Partition large objects & indexes across storage devices & spindles
• Don’t assume any “golden rules” – i.e. test all assumptions!
TPC-C Analysis Ramifications:
• Primarily Reads = opt_index_caching=80, opt_index_adj_cost=20
• Small Transaction = Size redo logs correctly for small size X high load
• High Concurrency = cursor_space_for_time=t, cursor_sharng=similar
• No Logging, etc… = Turn off “Recycle Bin” but keep “LOGGING”
28
Oracle Cluster “Best Practices”
Well known rules:
• Increase default SGA size for all ASM instances (64M too small)
• Interconnect is the most important bottleneck – many bonded NIC’s
• Consider hash partitions & reverse indexes to spread IO across nodes
• Don’t assume any “golden rules” – i.e. really test all assumptions!!!
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = Decrease db_file_multiblock_read_count
• High Concurrency = Decrease block size (?4K?) – see next slide
• No Logging, etc… = Nothing
29
Single Instance – No Block Contention
30
Cluster – Block Contention Costs
31
Private Network “Best Practices”
Well known rules:
• Isolate Network (for single cluster only – and 0% public)
• Use 10GB Ethernet or Inifini-Band (Dell found 15% RTI)
• Consider multiple pathways per HBA and storage controller
• Jumbo frames since high “Block Level” IO between nodes
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transaction = Nothing
• High Concurrency = Lower block size until interconnect traffic OK
– Consider increasing the OS priority of the global cache cluster services
• No Logging, etc… = Nothing
32
Let’s Apply the Recommendations
Read
Count
Block
Size
Cursor Cursor Index
Space Share Cache
Index
Cost
Jumbo
Test 1
16
8
False
Exact
0
100
False
Test 2
2
8
False
Exact
0
100
False
Test 3
2
4
False
Exact
0
100
False
Test 4
2
4
True
True
80
20
False
Test 5
2
4
True
True
80
20
True
33
Results – TPS (lesser interest here)
Transactions / Second
35.00
30.00
25.00
Run 1
20.00
Run 2
Run 3
15.00
Run 4
Run 5
10.00
5.00
0.00
50
100
150
200
250
300
350
400
450
500
34
Results – Average Response Time
Average Response Time
6.00
5.00
4.00
Run 1
Run 2
3.00
Run 3
Run 4
Run 5
2.00
1.00
Sub
Second
0.00
50
100
150
200
250
300
350
400
450
500
35
Thank you
Please offer any questions or comments
Remember:
• Eliminate “RAC in the Box” syndrome – eat low hanging fruit 
• Example was for TPC-C or OLTP type application
• TPC-H or Data Warehouse would NOT be the same
• Your mileage may well vary (especially percentages)
Toad World Article: “Maximize Database Performance Via Toad for Oracle”
http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx
Dell Power Solutions article:
http://www.quest.com/success_stories/Dell-Quest.pdf
36