Performance Tuning
October 12, 2004
PROPRIETARY AND CONFIDENTIAL
Copyright © 2004 Mark Logic Corporation. All rights reserved.
Agenda
Mark Logic Sizing & Performance Framework
Introduction to Performance Tuning
Importance of Architecture
Layer Based Approach
Database Tuning
Application Tuning
Important Tips
Summary
Mark Logic Sizing and Performance Framework
Mark Logic query evaluation
Distributed architecture
Merges
Memory
Mark Logic Query Evaluation
e.g., collection("foo")//article/metadata/author[@authorid=123]
1. Query execution plan created
   Maps a query to an intersection of result sets from indexes plus verification
   In this example something like [collection("foo")] & [article/metadata] & [metadata/author] & [author/@authorid=123]
   Optimizer decides what set of predicates to evaluate prior to fragment inspection
2. Fetch list of candidate fragments from indexes identified in query plan
3. Intersects list of fragments
4. Fetches fragments which satisfy all indexes from plan
5. Verifies which fragments meet query criteria
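The intersect-then-verify flow can be sketched as a toy model in Java. This is an illustration only, not Mark Logic code: the class name, the fragment ids, and the "false positive" are all made-up assumptions. It shows why verification still runs after the index intersection: the indexes only narrow the candidate list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongPredicate;

/** Illustrative model of candidate-fragment selection; not Mark Logic code. */
final class CandidateFragments {

    /** Intersects the fragment-id sets returned by each index lookup in the plan. */
    static TreeSet<Long> intersect(List<Set<Long>> perIndexCandidates) {
        if (perIndexCandidates.isEmpty()) {
            return new TreeSet<>();
        }
        TreeSet<Long> result = new TreeSet<>(perIndexCandidates.get(0));
        for (Set<Long> candidates : perIndexCandidates.subList(1, perIndexCandidates.size())) {
            result.retainAll(candidates);
        }
        return result;
    }

    /** Fetch-and-verify step: keeps only candidates that really satisfy the query. */
    static List<Long> verify(Set<Long> candidates, LongPredicate matchesQuery) {
        List<Long> verified = new ArrayList<>();
        for (long id : candidates) {
            if (matchesQuery.test(id)) {
                verified.add(id);
            }
        }
        return verified;
    }

    public static void main(String[] args) {
        List<Set<Long>> plan = List.of(
                Set.of(1L, 2L, 3L, 5L),  // hypothetical hits for collection("foo")
                Set.of(2L, 3L, 5L, 8L),  // hypothetical hits for article/metadata
                Set.of(3L, 5L, 9L));     // hypothetical hits for author/@authorid=123
        TreeSet<Long> candidates = intersect(plan);
        // Pretend fragment 5 matched every index but fails the real query.
        System.out.println(verify(candidates, id -> id != 5L));
    }
}
```

The fewer candidates survive the intersection, the fewer fragments have to be fetched and inspected in the last two steps.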
Distributed architecture
[Diagram: a load balancer in front of four query evaluators; a fragment interface; four data managers, each with its own data store]
Increase number of evaluators to scale query processing power
Increase number of data managers to scale data set size
Distributed model allows customers to scale query evaluation and data management components independently
Scalability delivered cost-effectively through horizontal infrastructure expansion

Merges
Forests are composed of stands
Stands are physical sets of files on disk
Stands (data and index) are periodically merged to reduce query I/O
Merge occurs when:
sum(smaller stand size)*ratio >= larger stand size
Merge time linear in total size of stands being merged
Merge rate typically 25-100 GB/hour
Merges use lots of CPU (can peg a single CPU) while running
Can flood I/O bus on workstation-class hardware
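A minimal sketch of the merge rule and the linear merge-time estimate, in Java. It assumes one reading of the slide's formula (all smaller stands are summed and compared against a single larger stand); the class, method names, and stand sizes are hypothetical, and the real server logic may differ.

```java
/** Illustrative arithmetic only; not the server's actual merge policy code. */
final class MergeModel {

    /** Slide rule: sum(smaller stand sizes) * ratio >= larger stand size. */
    static boolean shouldMerge(double[] smallerStandsGb, double largerStandGb, double ratio) {
        double sum = 0.0;
        for (double size : smallerStandsGb) {
            sum += size;
        }
        return sum * ratio >= largerStandGb;
    }

    /** Merge time is linear in the total size merged: hours = total GB / (GB per hour). */
    static double mergeHours(double totalGb, double rateGbPerHour) {
        return totalGb / rateGbPerHour;
    }

    public static void main(String[] args) {
        double[] smaller = {1.0, 2.0};  // hypothetical stand sizes in GB
        double larger = 8.0;            // hypothetical larger stand in GB
        System.out.println("ratio 2 -> merge? " + shouldMerge(smaller, larger, 2.0));
        System.out.println("ratio 3 -> merge? " + shouldMerge(smaller, larger, 3.0));
        // Bounds from the slide's typical merge rate of 25-100 GB/hour.
        System.out.println("100 GB takes between " + mergeHours(100.0, 100.0)
                + " and " + mergeHours(100.0, 25.0) + " hours");
    }
}
```

Under this reading, a higher ratio makes merges fire earlier, which trades more merge work for fewer stands at query time.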
Memory
Typically need 1 GB RAM per 16-32 GB of data
For typical text-type XML
Smaller fragments will increase memory needs
Performance requirements will influence memory needs
Where the memory goes
Memory mapped indexes (must have)
Caches (nice to have if you care about performance)
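The rule of thumb above turns into a simple range calculation. The Java sketch below is only that arithmetic; the class and method names and the 256 GB data size are made up for illustration, and real needs shift with fragment size and performance goals, as the slide notes.

```java
/** Back-of-the-envelope RAM range from the 1 GB per 16-32 GB rule of thumb. */
final class RamSizing {

    /** Lower bound: 1 GB of RAM per 32 GB of data. */
    static double minRamGb(double dataGb) {
        return dataGb / 32.0;
    }

    /** Upper bound: 1 GB of RAM per 16 GB of data. */
    static double maxRamGb(double dataGb) {
        return dataGb / 16.0;
    }

    public static void main(String[] args) {
        double dataGb = 256.0;  // hypothetical data set size
        System.out.println(dataGb + " GB of data -> "
                + minRamGb(dataGb) + " to " + maxRamGb(dataGb) + " GB of RAM");
    }
}
```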
Performance tuning
A Simple Definition
Calibrating the system for desired response time is called
performance tuning.
Generally human scale response time is fine.
Response time requirements change with application
types:
UI based application
Back-end applications
Workflow applications
And many more types….
Important Factors in tuning
Know your data
Know your application
Define performance goals
Right application architecture
Identifying bottlenecks is important as well as difficult – use tools
DB/App tuning is only one aspect
Benchmark if necessary
Plan for the future
A good methodology is a strong base for performance tuning
Architecture and Design
System architecture and design is an important step toward the right performance. It includes:
Database Model
XML Structure
Fragmentation
Collections
Clustering
E-Node (Evaluator Node)
D-Node (Data Node)
XQuery and Java
For applications using XDBC
Connection Lifecycle
Architecture (continued…)
Indexes
Default Indexes
Stemmed searches, fast phrase searches, fast case-sensitive searches, fast element-word searches
Optional
Word searches, character searches, fast element character searches, range indexes
Application flow specific points
Using system resources efficiently
Multi-threading/Parallel Processing
Database contention consideration
Response size
Number of calls to database
Layer Based Approach to Tuning
It is important to tune each layer
Hardware, Network and OS provide the playground for database and application to operate.
Analogy: Even the best athlete may not perform well on a wet ground.
Bottleneck may not be in DB or App layer
Each layer may have sub-layers
Focus on DB and App tuning today
[Diagram: layer stack, top to bottom: Application Tuning, Database Tuning, Operating System Tuning, H/W & Network Tuning]
Let’s look at our tools
xdmp:query-trace
When query tracing is enabled, "info" level messages are
logged detailing the search optimizations performed.
xdmp:query-meters
Returns the current value of the resource meters for this
query.
Server Logs
ErrorLog, HTTP Logs, XDBC Logs
OS Tools
Top, glance, Task Manager, netstat, vmstat, perfmon etc.
Database Tuning
Fragmentation
One of the most important tuning parameters, affects
Query and Data Load performance
Memory Tuning
Group Caches (allocated per host)
In Memory Database parameters (per forest)
Indexes
CIS creates most of the indexes by default
Other database parameters
Merge-min ratio, journal pre-allocation, merge-min size
Database Tuning - Fragmentation
Key tuning parameter
Directly affects query performance
Impact on disk I/O
Also affects load performance
10-100K fragment size is recommended
Avoid over-fragmenting and under-fragmenting
No need to fragment each element you search
[Diagram: a document tree (doc root with link nodes) alongside the XQuery Engine, Search Indexes, and XML Datastore]
Database Tuning - Fragmentation
Consider the example document and query
XML Data:
<MedlineCitationSet>
<Status>XYZ</Status>
<MedlineCitation Owner="NLM" Status="Completed">
<MedlineID>21980102</MedlineID>
…………..
</MedlineCitation>
<MedlineCitation Owner="NLM" Status="Completed">
<MedlineID>21980104</MedlineID>
…………..
</MedlineCitation>
……
</MedlineCitationSet>
File Size is 7MB
Two configurations
A. No Fragmentation
B. Fragmentation on "MedlineCitation"
Query:
/MedlineCitationSet/MedlineCitation[MedlineID = "21980102"]
2.6 sec (A) vs. 140 ms (B)
Roughly 19 times faster (2.6 s / 0.14 s ≈ 18.6)
Database Tuning – Memory
Query
Group buffer caches
Cache data for queries, per host in the group
Expanded Tree Cache, Compressed Tree Cache, List Cache
Effect on query performance
Load/Update
Database in-memory parameters
In Memory List, In Memory Tree, In Memory Range Index
Impact on in-memory stand size (per forest)
Effect on load rate
Minimize disk writes and number of merges
[Diagram: the caches and in-memory structures sit above Disk]
Database Tuning – Indexes
CIS has 4 indexes turned “ON” by default
stemmed word searches, fast phrase, fast case-sensitive
and fast element word searches
cts functions use these indexes
cts:search(/foo, cts:word-query("bar", ("stemmed", "case-sensitive")))
Consider other indexes also if you need wild-carding
etc.
Indexes slow down the load rate
No need to drop default indexes unless you need a better
load rate
May affect your query performance if you drop indexes
Database Tuning – DB Parameters
Controlling Merge
Query performance is directly impacted by number of
stands
Merge Min Ratio
More stands can result in slow query performance
More merges can result in slow performance based on data
load rates
Merge min size
Journal
Pre-Allocate journal if possible
It will allow contiguous hard disk space for journal
Application Tuning – Database Design
Database Design
XML Structure
Avoid Contention
Collections
Query Tuning
Identifying slow queries using tools
Tune Queries
Java Layer Tuning
Re-Use Connections
Do more in single call if possible
Application Tuning – Database Design
Right XML Structure - Very Important
Similar to right table design
Searching on elements is faster than attributes – so try to define
commonly searched things as elements
For commonly searched condition – Try to generate metadata if
possible
Design the structure to avoid going bottom-up(..) in a document
while querying
Think about if/where fragmentation is needed
Avoid Contention
Design your document structure to avoid contention
Multiple calls trying to acquire locks on document at the same
time will be slow
Application Tuning – Database Design
Use of Collections is recommended
Collections help in narrowing the "scope" of a query within indexes
Better data management
A document can live in multiple collections
Be careful if collection membership is dynamic
[Diagram: Query 1, Query 2, Query 3 and collections A, B, C]
Application Tuning – Database Design
More on collections
Complex queries like
/A/B[element1 = "X"][not(element3 = "Z")][element4/@attr1 = "T"]/../bar[val = "XYZ"]
can be simplified as
collection("foo")/A/bar
Simpler and faster query
Gains can be on the order of 5-10 times
Application Tuning – Analysis
xdmp:query-meters
Display Elapsed Time
Display Cache Hits and Misses
Display Fragments Added and Deleted
Display Document and Fragment level cache
efficiency
For analyzing complex queries - Use elapsed time
between two function calls
Has some overhead
Use for analysis only
Application Tuning – Analysis
Examples
Simple Example
count(input()), xdmp:query-meters()
Another example
let $qm0 := xdmp:query-meters()
let $val1 := foo()
let $qm1 := xdmp:query-meters()
………
Difference between the "elapsed-time" values of $qm0 and $qm1 is the time taken to execute foo()
Application Tuning – Analysis
xdmp:query-trace()
Generates log message set detailing how a rooted XPath
is being evaluated. Example:
xdmp:query-trace(true()),
/MedlineCitationSet/MedlineCitation[MedlineID = "21980102"]
Tells you if a particular path used in query is searchable
or not
Has overhead, Turn on and off as required
Application Tuning – Analysis
2004-10-07 11:21:18 Info: line 4: Analyzing path: input()/child::MedlineCitationSet/child::MedlineCitation[child::MedlineID = "21980102"]
2004-10-07 11:21:18 Info: line 4: Step 1 is searchable: input()
2004-10-07 11:21:18 Info: line 3: Step 2 axis is conditionally searchable: child
2004-10-07 11:21:18 Info: line 3: Step 2 test is searchable: MedlineCitationSet
2004-10-07 11:21:18 Info: line 4: Step 2 is searchable: child::MedlineCitationSet
2004-10-07 11:21:18 Info: line 4: Step 3 axis is conditionally searchable: child
2004-10-07 11:21:18 Info: line 4: Step 3 test is searchable: MedlineCitation
2004-10-07 11:21:18 Info: line 4: Step 3 predicate 1 is conditionally searchable: child::MedlineID = "21980102"
2004-10-07 11:21:18 Info: line 4: Step 3 is searchable: child::MedlineCitation[child::MedlineID = "21980102"]
2004-10-07 11:21:18 Info: line 4: Path is searchable.
2004-10-07 11:21:18 Info: line 4: Gathering constraints.
2004-10-07 11:21:18 Info: line 3: Step 2 test contributed 1 constraint: MedlineCitationSet
2004-10-07 11:21:18 Info: line 4: Step 3 test contributed 1 constraint: MedlineCitation
2004-10-07 11:21:18 Info: line 3: Comparison contributed hash value constraint: MedlineID = "21980102"
2004-10-07 11:21:18 Info: line 4: Step 3 predicate 1 contributed 1 constraint: child::MedlineID = "21980102"
2004-10-07 11:21:18 Info: line 4: Executing search.
2004-10-07 11:21:18 Info: line 4: Selected 1 fragment to filter
Application Tuning – Query Tips
XPath starting with input(), //, collection(),
xdmp:directory, doc() are highly optimized
Avoid passing variables holding big sequences and big results
Compress let and where clauses into XPath
DON’T
define function bar($nodes as node()*) as xs:boolean {
…….
}
let $val1 := input()/foo[elementA = "XYZ"]
let $val := bar($val1)
DO
define function bar($value as xs:string) as xs:boolean {
input()/foo[elementA = $value] …….
}
let $val := bar("XYZ")
Application Tuning – Query Tips
Use simple child steps in the query evaluation
Avoid ".." before predicates
DON’T
/root/head[DocumentId = $docId]/../citations/citation[id = "ABC"]
DO
/root[head/DocumentId = $docId]/citations/citation[id = "ABC"]
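The DON'T and DO paths should select the same nodes, so the rewrite is safe. The Java sketch below checks this with the JDK's standard XPath 1.0 engine. It shows result equivalence only and says nothing about how Mark Logic uses its indexes. The sample document is hypothetical, and the literal '42' stands in for the $docId variable.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Checks result equivalence only; not a performance test. */
final class ParentStepRewrite {

    // Hypothetical sample document shaped like the paths on the slide.
    static final String XML =
            "<root><head><DocumentId>42</DocumentId></head><citations>"
            + "<citation><id>ABC</id></citation><citation><id>DEF</id></citation>"
            + "</citations></root>";

    /** Returns how many nodes the XPath 1.0 expression selects in XML. */
    static int count(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(XML)),
                            XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String dont = "/root/head[DocumentId = '42']/../citations/citation[id = 'ABC']";
        String doIt = "/root[head/DocumentId = '42']/citations/citation[id = 'ABC']";
        // Both forms select the single matching citation in this sample.
        System.out.println(count(dont) + " " + count(doIt));
    }
}
```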
Application Tuning – Query Tips
Be Specific
If you want first foo in database
DON’T
//foo[1]
DO
(//foo)[1]
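Second query could be faster by 2-1000 times depending on how much data you have
The two queries can return different results: //foo[1] selects every foo that is the first foo child of its parent, while (//foo)[1] selects only the first foo in document order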
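The difference between the two expressions is easy to see on a small document. The Java sketch below uses the JDK's standard XPath 1.0 engine, not Mark Logic; the sample XML and class name are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Shows that //foo[1] and (//foo)[1] select different node sets. */
final class FirstFoo {

    // Hypothetical document: two parents, each with its own first foo.
    static final String XML =
            "<root><a><foo>1</foo><foo>2</foo></a><b><foo>3</foo></b></root>";

    /** Returns the text content of every node the expression selects. */
    static List<String> select(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(XML)),
                            XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Selects two nodes: the first foo under <a> and the first foo under <b>.
        System.out.println(select("//foo[1]"));
        // Selects one node: the first foo in the whole document.
        System.out.println(select("(//foo)[1]"));
    }
}
```

The parenthesized form lets an engine stop after the first match, which is one plausible reason for the speed-up claimed on the slide.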
Application Tuning – Query Tips
XPath predicates that cross fragment boundaries cannot use indexes
Assume foo is fragment root
DON’T
/root[foo/bar = "1"]/status
DO
let $i := /root/foo[bar = "1"]/../status
Notice the ".." after the predicate, which is fine
Application Tuning – XDBC Tips
XDBC has in-built connection pooling.
Re-Use Connections
XDMPConnection conn = getConnection();
XDBCStatement stmt = conn.createStatement();
XDBCResultSequence result = stmt.executeQuery(query);
………….
……………..
finally {
// DON'T FORGET TO CLOSE result, stmt and connection.
result.close(); stmt.close(); conn.close();
}
You can monitor socket connections using "netstat"
Application Tuning – XDBC Tips
Execute more in one query - if you can
For example:
You have N documents to be updated using same
query
You have N results to be retrieved using same function
You want to delete N documents
Execute more in one query
You can delete documents in a collection using the "xdmp:collection-delete" function
Big queries may need cache tuning
Application Tuning – XDBC Tips
DON’T
// Execute the query multiple times
for (int i = 0; i < n; i++)
{
// Call XQuery function and pass ONE value each time
……………………………….
XDBCResultSequence result = stmt.executeQuery(query);
}
DO
List params = new ArrayList();
for (int i = 0; i < n; i++)
{
// Add params to the list or create a string representing sequence
……………………….
}
// Pass the sequence to the function which can also return a sequence
XDBCResultSequence result = stmt.executeQuery(query);
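One way to "create a string representing sequence" is to render the values as an XQuery sequence literal and embed it in a single query. The Java sketch below does only the string building. The escaping follows XQuery 1.0 string-literal rules, which is an assumption for the CIS dialect, and process-ids is a hypothetical function name. No XDBC call is made here.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Builds an XQuery sequence literal so N values travel in one query. */
final class SequenceLiteral {

    /** Quotes one value as a string literal (escaping assumed from XQuery 1.0). */
    static String quote(String value) {
        String escaped = value.replace("&", "&amp;").replace("\"", "\"\"");
        return "\"" + escaped + "\"";
    }

    /** Renders the values as ("a", "b", ...); an empty list becomes (). */
    static String toSequence(List<String> values) {
        return values.stream()
                .map(SequenceLiteral::quote)
                .collect(Collectors.joining(", ", "(", ")"));
    }

    /** Wraps the sequence in a call to a hypothetical XQuery function. */
    static String buildQuery(List<String> ids) {
        return "process-ids(" + toSequence(ids) + ")";
    }

    public static void main(String[] args) {
        // One round trip for all ids instead of one executeQuery call per id.
        System.out.println(buildQuery(List.of("21980102", "21980104")));
    }
}
```

The resulting string would then be passed once to stmt.executeQuery(query), as in the DO example.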
Summary
Know your application and data
Application architecture and design is important – design well
Analyze and identify bottlenecks
Tune each layer
Test and benchmark
See if desired goals are met
Go live
References
Mark Logic CIS Developer Guide
Mark Logic CIS Admin Guide
Mark Logic XQuery function reference
xqzone
support.marklogic.com
Thank You
ASHEESH MANGLA
t: 650.655.2331
f: 650.655.2310