RAC be Nimble,
RAC be Quick 
Bert Scalzo, Domain Expert, Oracle Solutions
Copyright © 2006 Quest Software
About the Author …
Domain Expert & Product Architect for Quest Software
Oracle Background:
• Worked with Oracle databases for over two decades (starting with version 4)
• Work history includes time at both “Oracle Education” and “Oracle Consulting”
Academic Background:
• Several Oracle Masters certifications
• BS, MS and PhD in Computer Science
• MBA (general business)
• Several insurance industry designations
Key Interests:
• Data Modeling
• Database Benchmarking
• Database Tuning & Optimization
• "Star Schema" Data Warehouses
• Oracle on Linux, and specifically RAC on Linux
Articles for:
• Oracle’s Technology Network (OTN)
• Oracle Magazine
• Oracle Informant
• PC Week (eWeek)
• Dell Power Solutions Magazine
• The Linux Journal
• www.linux.com
• www.orafaq.com
Books by Bert …
(Book covers labeled “Coming in 2009 …”, “Out Now …”, “2nd Edition” and “Coming Soon”)
Agenda
• RAC Challenges
– “RAC in the Box” syndrome
– Must Optimize every subsystem
• RAC Optimization Approach
– Optimization Approach Universe
– Why a True Top-Down Approach
– Top-Down Process & Tools (but NOT a sales pitch)
• Real-world Scenario
– Dell’s use of Quest’s Approach & Tools for Oracle RAC
– “Best Practices” applied incrementally & results
Oracle RAC is Great, but …
• Too often people expect RAC to “auto-magically” function out of the box with little to no optimization
• During RAC optimization attempts, people far too often concentrate on just a single dimension, typically the Oracle database “stuff” (e.g. ratios, waits, parameters, etc.)
• During RAC optimization attempts, Oracle is often too heavily weighted as the primary source of, and reason for, most if not all of the performance bottlenecks
• Not enough of the true “application nature” is identified and accounted for during the overall optimization process
• Result: Too many people achieve sub-par results!
I call it “RAC in the BOX” syndrome
• That’s just my stupid name for it (hope it catches on)
• But nearly half the RAC sites I visit are suffering from RAC performance issues related to this!
• There are still far too few RAC experts among the general DBA population (although that is improving every day)
• There are still no simple, single-button tools to make RAC “auto-magically” “fire on all cylinders”
• Too many people readily bail on RAC and fall back to “big SMP boxes” (the evil they know)
• But with just a little manual “box winding”, anyone should be able to “pop the RAC weasel” free
RAC is a System, Must Tune it in its Entirety
RAC Performance = Sum of its Parts
• Application Nature (affects everything else below)
• Public Network
• Storage Network
• Storage Sub-System
• Oracle Instance Configuration
• Oracle Cluster Configuration
• Private Network (i.e. Interconnect)
(Slide label “Traditional Focus” appears beside the Oracle Instance and Oracle Cluster Configuration items.)
Optimization Approaches (i.e. Focus)
• Top-Down (Pro-Active)
• Side-Ways (Ad-Hoc)
• Bottom-Up (Re-Active)
Optimization Approach Techniques/Tools
Top-Down
• Data: DBA_, V$ and X$ Aggregate Information
• Tools: Oracle OEM Diagnostics & Tuning Packs, Confio Ignite for Oracle, BMC DBXray for Oracle, Quest Spotlight for Oracle & Spotlight for RAC
Side-Ways (Ad-Hoc)
• Data: V$, STATS$, DBA_HIST_
Bottom-Up
• Data: Trace/Instrumentation
• Response Time ~= Wait Events
– Cary Millsap (Hotsos) calls this “Method R”
– Anjo Kolk et al. (Oracle) call this the “YAPP Method”
– Kyle Hailey (PerfVision) paper on “Waits Defined”
True Top-Down Optimization Focus
• “Low Hanging Fruit”
• Obvious yet Overlooked
• Subtle yet Highly Critical
• “Dumb Question” Method
Radically different from other RAC tuning sessions: this is not going to delve into obscure hardware, OS, network, Oracle and RAC tuning parameters or configurations, just easy stuff that makes a big difference.
Why Top-Down Works (now don’t laugh…)
Drive Dallas to Austin:
• Initial Route:
– 45 south to Houston, 610 west to 290, and then 290 west to Austin
– Drive Time = 7 Hours!
• Improved Route:
– 45 south, cut across on 79 west halfway to Houston, and then 35 south to Austin
– Drive Time = 4.5 Hours (36% Reduction)
• Top-Down Route:
– Dumb Question: Why Houston?
– 35 South
– Drive Time = 3.5 Hours (50% Reduction)
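The percentages on this slide are relative reductions against the 7-hour baseline. A quick Python check, using only the drive times from the slide:

```python
def percent_reduction(before_hours: float, after_hours: float) -> float:
    """Percentage drop from the original drive time to the new one."""
    return (before_hours - after_hours) / before_hours * 100.0


initial = 7.0  # 45 to Houston, 610, then 290 to Austin
routes = {
    "Improved route (cut across on 79)": 4.5,
    "Top-Down route (35 south)": 3.5,
}

for name, hours in routes.items():
    print(f"{name}: {hours} h, {percent_reduction(initial, hours):.0f}% reduction")
```

The improved route works out to 35.7%, which the slide rounds to 36%; the top-down route is exactly half the original time.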
RAC Performance Testing Process (using tools)
1. Benchmark Factory
– Industry standard benchmark: TPC-C
– Key Metric = Avg Response Time
– Record before & after results
2. Spotlight on RAC
– Apply Top-Down Analysis & Revision
– Confirm improvements
3. TOAD with DBA Module
– AWR/ADDM & Stats Pack
– Again record before & after for improvements confirmation
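The before/after comparison in this process boils down to averaging the recorded response times and comparing the two averages. A minimal sketch in Python; the sample timings below are hypothetical placeholders, not Benchmark Factory output, and real runs would record far more transactions per user load:

```python
from statistics import mean

# Hypothetical response times (seconds) at the same user load,
# recorded before and after one tuning change.
before = [2.4, 2.1, 2.8, 2.6, 2.2]
after = [1.1, 0.9, 1.3, 1.0, 1.2]

avg_before = mean(before)
avg_after = mean(after)
# Key metric is average response time, so improvement is measured on it.
improvement_pct = (avg_before - avg_after) / avg_before * 100.0

print(f"Avg response time before: {avg_before:.2f} s")
print(f"Avg response time after:  {avg_after:.2f} s")
print(f"Improvement: {improvement_pct:.1f}%")
```

Repeating this at each user load level gives the before/after curves the process asks you to record.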
Tool #1 – Load Generator
Benchmark Factory®
• Test Oracle RAC environments rapidly and reliably
• Perform database “scalability” or “goal” testing to determine the optimal RAC configuration
• Tests:
– TPC-C
– TPC-H
– Trace File playback
– etc, etc, etc …
• Lets the DBA concentrate on the task at hand: Optimization
Tool #2 – Ad-Hoc Monitor (to confirm results)
Spotlight® on RAC
• Monitor Oracle RAC environments rapidly and reliably
• Diagnose Oracle RAC environment health levels at the
– Node
– Cluster
– ASM
– Instance
– Interconnect
• Intelligent performance alerts plus a market-leading GUI covering the entire RAC architecture, from the cluster down to the instances, and its bottlenecks
• Lets the DBA concentrate on the task at hand: Diagnosing
Tool #3 – Ad-Hoc Monitor (to confirm results)
TOAD® with DBA Module
• Expedite typical DBA management & tuning tasks
• Great Productivity Enhancing Features
– Database Health Check
– Database Probe
– Database Monitor
– AWR/ADDM Reports
– UNIX Monitor
– Stats Pack Reports
• See Toad World paper
– Title: “Maximize Database Performance Via Toad for Oracle”
– http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx
• Lets the DBA concentrate on the task at hand: Correcting (i.e. Fixing)
Real-world Scenario
Quest strategic partner & customer Dell uses Quest’s solution for Oracle RAC to test the performance of the Oracle RAC architecture running on Dell PowerEdge servers and EMC CLARiiON SAN & iSCSI disk arrays.
DELL Success Story
www.Quest.com/success_stories/Dell-Quest.pdf
Database Configuration used at DELL for the RAC test environment
Step-By-Step Example
Apply the methodology, “Best Practices”, and Quest’s RAC tools to optimize, and quantify the approximate percentage of the improvements.
Note: the specific examples quoted are for one given RAC setup; your mileage will surely vary.
Test = TPC-C (OLTP) for 200-2000 users, 10 GB
Remember – must tune RAC as a system
(Diagram annotations: “Start Here” and “Ask lots of Questions …”)
Step 1 - Application Nature
Know Your Application Demands (this info flows downstream)
• OLTP vs. Data Warehousing
– Primarily Read vs. Write
– Average Transaction Size
– Likelihood of Deadlock
– Logging, Flashback and Recovery Requirements
– etc, etc, etc …
• Concurrent User Load Profile (i.e. user load over time)
• Focus on User Response Time Requirements
– For example, TPC-C must run each transaction <= 2 seconds
– Response Time ~= Wait Events
  • Cary Millsap (Hotsos) calls this “Method R”
  • Anjo Kolk et al. (Oracle) call this the “YAPP Method”
  • Kyle Hailey (PerfVision) paper “Waits Defined”
• Don’t skip this step: getting it wrong can be enormously costly, and no network, OS, or database tuning can compensate for it!
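“Response Time ~= Wait Events” means a transaction's elapsed time is roughly its CPU service time plus the time spent in each wait event, so ranking those components shows where tuning pays off. A minimal sketch in Python; the event names are real Oracle wait events, but every timing is hypothetical:

```python
# Hypothetical time profile for one business transaction, in seconds.
cpu_service_time = 0.30
wait_times = {
    "db file sequential read": 0.45,
    "gc cr block 2-way": 0.25,
    "log file sync": 0.15,
    "gc buffer busy": 0.05,
}

# Response time ~= service (CPU) time + sum of wait times.
response_time = cpu_service_time + sum(wait_times.values())
print(f"Response time ~= {response_time:.2f} s")

# Rank every component, largest first, so tuning effort goes where the time is.
components = {"CPU service": cpu_service_time, **wait_times}
profile = sorted(components.items(), key=lambda item: item[1], reverse=True)
for name, seconds in profile:
    print(f"{name:<25} {seconds:5.2f} s  {seconds / response_time:6.1%}")
```

Comparing the total against the response time requirement (for example the TPC-C target above) tells you whether tuning is needed at all; the ranking tells you where to start.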
Application “Best Practices”
Well known rules:
• Write efficient SQL and/or PL/SQL code (check explain plans)
• Use bind variables to reduce unnecessary “re-parsing”
• Often the underlying application code (e.g. a benchmark) is NOT changeable, so there is nothing you can do here
Deeper TPC-C Analysis (carry these through all of the following steps):
• Primarily Read with Some Writes
• Small Average Transaction Size
• High Concurrency with Potential Deadlocks
• Logging for ACID compliance and no flashback
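The bind-variable rule is about statement text: literal values make every execution a new statement to parse, while a placeholder lets one parsed statement be reused. Python's built-in sqlite3 module stands in for an Oracle driver here only so the sketch runs anywhere; it accepts the same :name placeholder style Oracle uses, but it does not model Oracle's shared pool or hard parses. The customer table is hypothetical.

```python
import sqlite3

# In-memory stand-in database with a hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_last TEXT)")
conn.executemany("INSERT INTO customer (c_id, c_last) VALUES (?, ?)",
                 [(i, f"NAME{i}") for i in range(1, 101)])

customer_ids = [7, 42, 99]

# Literals: each value yields a different statement text, which Oracle
# would treat as a new statement to hard-parse.
literal_texts = {f"SELECT c_last FROM customer WHERE c_id = {cid}" for cid in customer_ids}

# Bind variable: one statement text, reused with different values.
bind_sql = "SELECT c_last FROM customer WHERE c_id = :c_id"
names = [conn.execute(bind_sql, {"c_id": cid}).fetchone()[0] for cid in customer_ids]

print(f"{len(literal_texts)} distinct statement texts with literals")
print(f"1 statement text with a bind variable: {names}")
conn.close()
```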
Public Network “Best Practices”
Well known rules:
• Isolate the network (for a single application or related applications only)
• Use Gigabit Ethernet (consider “bonding” multiple cards)
• Use Layer 2 or 3 switches and verify Gigabit throughput
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transactions = No jumbo frames, standard SDU/TDU
• High Concurrency = Multiple Ethernet segments (to reduce collisions)
• Logging, No Flashback = Nothing
Storage Network “Best Practices”
Well known rules:
• Isolate the network (for a single application or related applications only)
• Use Fibre Channel for SAN, 10 Gigabit Ethernet for NAS/iSCSI
• Consider multiple pathways per storage controller and HBA
• Consider TCP/IP offload engine (TOE) NICs or iSCSI HBAs
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transactions = Jumbo frames, since the IO is “Block Level”
• High Concurrency = Fibre Channel and multiple pathways (if budget allows)
• Logging, No Flashback = Nothing
Storage Sub-System “Best Practices”
Well known rules:
• More, smaller disks generally give higher overall throughput
• More cache memory generally gives higher overall throughput
• Avoid “write-back” mode if there is no backup power source (e.g. battery)
• Align stripe boundaries: drive, OS block, LVM, file system, database block, etc.
• Stripe Depth (i.e. size) from 256 KB to 1 MB
• Stripe Width (i.e. # disks) between 4 and 16
• Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request
• Average I/O <= Stripe Width X Stripe Depth
• Write-intensive = RAID 0+1/1+0; Read-intensive = RAID 3 (sequential) or RAID 5 (scattered)
TPC-C Analysis Ramifications:
• Primarily Reads = RAID 5; adjust cache memory allocations & look-ahead algorithms
• Small Transactions = Stripe Depth >= db_block_size X db_file_multiblock_read_count
• High Concurrency = Low deadlock, so spread DB objects and partitions across LUNs
• Logging, No Flashback = Write IO is reasonable, so RAID 5 is OK
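Several of the striping rules of thumb on this slide can be turned into a quick layout check. A sketch in Python covering the depth range, the width range, the multiblock-read rule and the average-I/O rule; the function name and the example layouts are hypothetical, and the thresholds are simply the slide's numbers:

```python
def check_stripe(stripe_depth_kb: int, stripe_width_disks: int,
                 db_block_size: int, multiblock_read_count: int,
                 avg_io_kb: float) -> list[str]:
    """Return which of the slide's striping rules of thumb a layout breaks."""
    problems = []
    if not 256 <= stripe_depth_kb <= 1024:
        problems.append("stripe depth outside 256 KB to 1 MB")
    if not 4 <= stripe_width_disks <= 16:
        problems.append("stripe width outside 4 to 16 disks")
    # One multiblock read should fit inside a single stripe unit (one disk).
    multiblock_read_kb = db_block_size * multiblock_read_count // 1024
    if stripe_depth_kb < multiblock_read_kb:
        problems.append(f"stripe depth below multiblock read size ({multiblock_read_kb} KB)")
    # The average I/O should not exceed one full stripe across all disks.
    full_stripe_kb = stripe_width_disks * stripe_depth_kb
    if avg_io_kb > full_stripe_kb:
        problems.append(f"average I/O above full stripe size ({full_stripe_kb} KB)")
    return problems


good = check_stripe(256, 8, db_block_size=8192, multiblock_read_count=16, avg_io_kb=64)
bad = check_stripe(64, 2, db_block_size=8192, multiblock_read_count=16, avg_io_kb=256)
print(good)
print(bad)
```

With an 8K block and a multiblock read count of 16, one multiblock read is 128 KB, so a 256 KB stripe depth satisfies the TPC-C rule while a 64 KB depth does not.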
Oracle Instance “Best Practices”
Well known rules:
• Size & tune the SGA appropriately for the application nature
• Choose a reasonable block size based on the application nature (8K?)
• Partition large objects & indexes across storage devices & spindles
• Don’t assume any “golden rules”, i.e. test all assumptions!
TPC-C Analysis Ramifications:
• Primarily Reads = optimizer_index_caching=80, optimizer_index_cost_adj=20
• Small Transactions = Size the redo logs correctly for many small transactions under high load
• High Concurrency = cursor_space_for_time=true, cursor_sharing=similar
• Logging, No Flashback = Turn off “Recycle Bin” but keep “LOGGING”
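Sizing the redo logs for many small transactions under high load comes down to arithmetic: the measured redo generation rate at peak load times the interval you want between log switches. A sketch in Python; the 20-minute target, the 64 MB rounding and the 500,000 bytes/s rate are assumptions for illustration, with the rate in practice taken from the per-second “Redo size” figure in an AWR or Statspack report:

```python
import math


def redo_log_size_mb(redo_bytes_per_sec: float, target_switch_minutes: float,
                     round_to_mb: int = 64) -> int:
    """Size each online redo log so that, at the measured redo rate,
    a log switch happens roughly every target_switch_minutes."""
    needed_mb = redo_bytes_per_sec * target_switch_minutes * 60 / (1024 * 1024)
    # Round up to a convenient multiple (assumed 64 MB here).
    return math.ceil(needed_mb / round_to_mb) * round_to_mb


# Hypothetical peak redo rate of 500,000 bytes/s, assumed 20-minute switch target.
print(redo_log_size_mb(500_000, 20), "MB per redo log")
```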
Oracle Cluster “Best Practices”
Well known rules:
• Increase the default SGA size for all ASM instances (64M is too small)
• The interconnect is the most important bottleneck, so use many bonded NICs
• Consider hash partitions & reverse-key indexes to spread IO across nodes
• Don’t assume any “golden rules”, i.e. really test all assumptions!!!
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transactions = Decrease db_file_multiblock_read_count
• High Concurrency = Decrease block size (4K?), see the block contention slides that follow
• Logging, No Flashback = Nothing
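Why reverse-key indexes and hash partitions spread IO across nodes: sequential keys all land in the same “hot” right-hand index leaf block, while reversed or hashed keys scatter. A toy model in Python; it reverses decimal strings and uses modulo as the hash, whereas Oracle reverses the key's internal bytes and uses its own hash function, and the 10-leaf index is hypothetical:

```python
from collections import Counter


def leaf_for(key: str) -> str:
    # Toy B-tree with 10 leaf blocks, one per leading digit of the stored key.
    return key[0]


# Sequential order IDs, as a sequence would hand out to sessions on all nodes.
order_ids = [str(n) for n in range(1000, 1010)]

normal_leaves = Counter(leaf_for(k) for k in order_ids)
reverse_leaves = Counter(leaf_for(k[::-1]) for k in order_ids)  # reverse-key index

# Hash partitioning (modulo stands in for Oracle's hash) also spreads the keys.
PARTITIONS = 4
hash_parts = Counter(int(k) % PARTITIONS for k in order_ids)

print("normal index leaves:", dict(normal_leaves))
print("reverse-key leaves: ", dict(reverse_leaves))
print("hash partitions:    ", dict(hash_parts))
```

All ten inserts hit one leaf in the normal index, but ten different leaves with reversed keys. The trade-off is that reverse-key indexes can no longer serve index range scans on the key.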
Single Instance – No Block Contention
Cluster – Block Contention Costs
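Smaller blocks reduce cross-node block contention because each block holds fewer rows, so fewer concurrent sessions want the same block at once. A rough rows-per-block estimate in Python; the 100-byte block overhead, PCTFREE of 10 and 200-byte average row length are assumed values, not measurements from the Dell tests:

```python
def rows_per_block(block_size: int, avg_row_bytes: int = 200,
                   block_overhead: int = 100, pctfree: int = 10) -> int:
    """Rough number of rows that fit in one table block (integer arithmetic)."""
    # Space left after the assumed block header, minus the PCTFREE reserve.
    usable = (block_size - block_overhead) * (100 - pctfree) // 100
    return usable // avg_row_bytes


for size in (8192, 4096):
    print(f"{size // 1024}K block: about {rows_per_block(size)} rows per block")
```

Halving the block size roughly halves the number of rows, and hence potential contenders, per block; the cost is more blocks to read for larger scans.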
Private Network “Best Practices”
Well known rules:
• Isolate the network (for a single cluster only, with 0% public traffic)
• Use 10 Gigabit Ethernet or InfiniBand (Dell found 15% RTI)
• Consider multiple pathways per HBA and storage controller
• Jumbo frames, since there is heavy “Block Level” IO between nodes
TPC-C Analysis Ramifications:
• Primarily Reads = Nothing
• Small Transactions = Nothing
• High Concurrency = Lower the block size until interconnect traffic is OK
– Consider increasing the OS priority of the global cache cluster services
• Logging, No Flashback = Nothing
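Jumbo frames matter on the interconnect because each block shipped between nodes otherwise gets split across several standard Ethernet frames. A sketch in Python that counts frames for one block sent as a single UDP datagram over IPv4 with fragmentation; it ignores Oracle's own message header and any IP options, so treat the counts as approximate:

```python
import math

IPV4_HEADER = 20
UDP_HEADER = 8


def frames_per_block(block_size: int, mtu: int) -> int:
    """Ethernet frames needed to send one database block as a single UDP
    datagram over IPv4, counting IP fragmentation."""
    # Each fragment carries MTU minus the IP header, rounded down to a
    # multiple of 8 bytes (fragment offsets are in 8-byte units).
    payload_per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    datagram = block_size + UDP_HEADER
    return math.ceil(datagram / payload_per_fragment)


for block in (8192, 4096):
    for mtu in (1500, 9000):
        print(f"{block}-byte block, MTU {mtu}: {frames_per_block(block, mtu)} frame(s)")
```

An 8K block needs about six standard frames but only one jumbo frame; note that the 4K block size recommended for high concurrency also cuts the standard-frame count roughly in half.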
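Let’s Apply the Recommendations
(Column numbers give the recommendation step each setting belongs to; Block Size is in KB.)
Test   | 1 Read Count | 2 Block Size | 3 Cursor Space | 3 Cursor Share | 4 Index Cache | 4 Index Cost | 5 Jumbo
Test 1 | 16 | 8 | False | Exact | 0  | 100 | False
Test 2 | 2  | 8 | False | Exact | 0  | 100 | False
Test 3 | 2  | 4 | False | Exact | 0  | 100 | False
Test 4 | 2  | 4 | True  | True  | 80 | 20  | False
Test 5 | 2  | 4 | True  | True  | 80 | 20  | True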
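To script the five test runs, the parameter columns of the table above map onto Oracle initialization parameters. A sketch in Python that emits ALTER SYSTEM statements per test; it assumes the table's “True” under Cursor Share means cursor_sharing=SIMILAR (as on the Oracle Instance slide), uses SCOPE = SPFILE so static parameters take effect at the next restart, and leaves out block size (the standard block size is fixed at database creation) and jumbo frames (an OS and switch MTU setting):

```python
# Per-test settings transcribed from the table; block_size_kb and jumbo are
# kept for completeness but cannot be applied with ALTER SYSTEM.
TESTS = {
    1: {"read_count": 16, "block_size_kb": 8, "cursor_space": False, "cursor_share": "Exact",
        "index_cache": 0, "index_cost": 100, "jumbo": False},
    2: {"read_count": 2, "block_size_kb": 8, "cursor_space": False, "cursor_share": "Exact",
        "index_cache": 0, "index_cost": 100, "jumbo": False},
    3: {"read_count": 2, "block_size_kb": 4, "cursor_space": False, "cursor_share": "Exact",
        "index_cache": 0, "index_cost": 100, "jumbo": False},
    4: {"read_count": 2, "block_size_kb": 4, "cursor_space": True, "cursor_share": "True",
        "index_cache": 80, "index_cost": 20, "jumbo": False},
    5: {"read_count": 2, "block_size_kb": 4, "cursor_space": True, "cursor_share": "True",
        "index_cache": 80, "index_cost": 20, "jumbo": True},
}


def alter_statements(settings: dict) -> list[str]:
    """Build ALTER SYSTEM statements for the instance-level settings of one test."""
    # Assumption: the table's "True" under Cursor Share means cursor_sharing=SIMILAR.
    sharing = "EXACT" if settings["cursor_share"] == "Exact" else "SIMILAR"
    params = {
        "db_file_multiblock_read_count": settings["read_count"],
        "cursor_space_for_time": "TRUE" if settings["cursor_space"] else "FALSE",
        "cursor_sharing": sharing,
        "optimizer_index_caching": settings["index_cache"],
        "optimizer_index_cost_adj": settings["index_cost"],
    }
    # SCOPE = SPFILE because cursor_space_for_time is static; SID = '*' targets all RAC instances.
    return [f"ALTER SYSTEM SET {name} = {value} SCOPE = SPFILE SID = '*';"
            for name, value in params.items()]


for test_no, settings in TESTS.items():
    print(f"-- Test {test_no}")
    for stmt in alter_statements(settings):
        print(stmt)
```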
Results – TPS (lesser interest here)
[Chart: Transactions / Second (0.00 to 35.00) for Run 1 through Run 5, plotted against an x-axis running from 50 to 500]
Results – Average Response Time
[Chart: Average Response Time (0.00 to 6.00) for Run 1 through Run 5, plotted against an x-axis running from 50 to 500, with a “Sub Second” region marked near the bottom]
Thank you
Please offer any questions or comments
Remember:
• Eliminate “RAC in the Box” syndrome: pick the low hanging fruit
• The example was for a TPC-C or OLTP type application
• TPC-H or Data Warehouse would NOT be the same
• Your mileage may well vary (especially percentages)
Toad World Article: “Maximize Database Performance Via Toad for Oracle”
http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx
Dell Power Solutions article:
http://www.quest.com/success_stories/Dell-Quest.pdf