Unlock Content Performance Tuning
October 12, 2004
PROPRIETARY AND CONFIDENTIAL
Copyright © 2004 Mark Logic Corporation. All rights reserved.

Agenda
* Mark Logic Sizing & Performance Framework
* Introduction to Performance Tuning
* Importance of Architecture
* Layer-Based Approach
* Database Tuning
* Application Tuning
* Important Tips
* Summary

Mark Logic Sizing and Performance Framework
* Mark Logic query evaluation
* Distributed architecture
* Merges
* Memory

Mark Logic Query Evaluation
Example: collection("foo")//article/metadata/author[@authorid=123]
1. Query execution plan created
   * Maps a query to an intersection of result sets from indexes, plus verification
   * In this example, something like [collection("foo")] & [article/metadata] & [metadata/author] & [author/@authorid=123]
   * Optimizer decides which set of predicates to evaluate prior to fragment inspection
2. Fetches the list of candidate fragments from the indexes identified in the query plan
3. Intersects the lists of fragments
4. Fetches the fragments which satisfy all indexes from the plan
5. Verifies which fragments meet the query criteria

Distributed Architecture
[Diagram: a load balancer in front of four Query Evaluators, connected through the fragment interface to four Data Managers and four Data Stores]
* Increase the number of evaluators to scale query processing power
* Increase the number of data managers to scale data set size
* The distributed model allows customers to scale query evaluation and data management components independently
* Scalability is delivered cost-effectively through horizontal infrastructure expansion

Merges
* Forests are composed of stands
* Stands are physical sets of files on disk
* Stands (data and index) are periodically merged to reduce query I/O
* A merge occurs when: sum(smaller stand sizes) * ratio >= larger stand size
* Merge time is linear in the total size of the stands being merged
* Merge rate is typically 25-100 GB/hour
* Merges use a lot of CPU while running (can peg a single CPU)
* Merges can flood the I/O bus on workstation-class hardware

Memory
* Typically need 1 GB RAM per 16-32 GB of data
  * For typical text-type XML
  * Smaller fragments will increase memory needs
  * Performance requirements will influence memory needs
* Where the memory goes
  * Memory-mapped indexes (must have)
  * Caches (nice to have if you care about performance)

Performance Tuning
A simple definition: calibrating the system for a desired response time is called performance tuning.
* Generally, human-scale response time is fine
* Response time requirements change with application type:
  * UI-based applications
  * Back-end applications
  * Workflow applications
  * And many more types

Important Factors in Tuning
* Know your data
* Know your application
* Define performance goals
* Choose the right application architecture
* Identifying bottlenecks is important as well as difficult; use tools
* DB/App tuning is only one of the aspects
* Benchmark if necessary
* Plan for the future
* A good methodology is a strong base for performance tuning

Architecture and Design
System architecture and design is an important step toward the right performance. It includes:
* Database model
  * XML structure
  * Fragmentation
  * Collections
* Clustering
  * E-Node (Evaluator Node)
  * D-Node (Data Node)
* XQuery and Java
  * For applications using XDBC
  * Connection lifecycle

Architecture (continued)
* Indexes
  * Default indexes: stemmed searches, fast phrase searches, fast case-sensitive searches, fast element-word searches
  * Optional: word searches, character searches, fast element-character searches, range indexes
* Application-flow-specific points
  * Using system resources efficiently
  * Multi-threading / parallel processing
  * Database contention considerations
  * Response size
  * Number of calls to the database

Layer-Based Approach to Tuning
[Diagram: stacked layers, top to bottom: Application Tuning, Database Tuning, Operating System Tuning, H/W & Network Tuning]
* It is important to tune each layer
* Hardware, network and OS provide the playground for the database and application to operate
* Analogy: even the best athlete may not perform well on wet ground
* The bottleneck may not be in the DB or App layer
* Each layer may have sub-layers
* Focus on DB and App tuning today

Let's Look at Our Tools
* xdmp:query-trace
  * When query tracing is enabled, "info" level messages are logged detailing the search optimizations performed
* xdmp:query-meters
  * Returns the current value of the resource meters for this query
* Server logs
  * ErrorLog, HTTP logs, XDBC logs
* OS tools
  * top, glance, Task Manager, netstat, vmstat, perfmon, etc.

Database Tuning
* Fragmentation
  * One of the most important tuning parameters; affects query and data load performance
* Memory tuning
  * Group caches (allocated per host)
  * In-memory database parameters (per forest)
* Indexes
  * CIS creates most of the indexes by default
* Other database parameters
  * Merge-min ratio, journal pre-allocation, merge-min size

Database Tuning - Fragmentation
[Diagram: a document root with link nodes, alongside the XQuery Engine, Search Indexes and XML Datastore]
* Key tuning parameter
* Directly affects query performance
* Impact on disk I/O
* Also affects load performance
* 10-100K fragment size is recommended
* Avoid over-fragmenting and under-fragmenting
* No need to fragment each element you search

Database Tuning - Fragmentation
Consider the example document and query.
XML data (file size is 7 MB):
    <MedlineCitationSet>
      <Status>XYZ</Status>
      <MedlineCitation Owner="NLM" Status="Completed">
        <MedlineID>21980102</MedlineID>
        …………..
      </MedlineCitation>
      <MedlineCitation Owner="NLM" Status="Completed">
        <MedlineID>21980104</MedlineID>
        …………..
      </MedlineCitation>
      ……
    </MedlineCitationSet>
Two configurations:
* A. No fragmentation
* B. Fragmentation on "MedlineCitation"
Query:
    /MedlineCitationSet/MedlineCitation[MedlineID = "21980102"]
Result: 2.6 sec (A) vs. 140 ms (B), roughly 19 times faster

Database Tuning - Memory
[Diagram: queries and loads/updates pass through the memory structures below before reaching disk]
* Query side: group buffer caches (Expanded Tree Cache, Compressed Tree Cache, List Cache)
  * Cache data for queries, per host in the group
  * Affect query performance
* Load/update side: database in-memory parameters (In Memory List, In Memory Tree, In Memory Range Index)
  * Impact on in-memory stand size (per forest)
  * Affect load rate
  * Minimize disk writes and the number of merges

Database Tuning - Indexes
* CIS has 4 indexes turned "ON" by default
  * Stemmed word searches, fast phrase, fast case-sensitive and fast element-word searches
* cts functions use these indexes
    cts:search(/foo, cts:word-query("bar", ("stemmed", "case-sensitive")))
* Consider other indexes also if you need wildcarding, etc.
* Indexes slow down the load rate
  * No need to drop default indexes unless you need a better load rate
  * Dropping indexes may hurt your query performance

Database Tuning - DB Parameters
* Controlling merges
  * Query performance is directly impacted by the number of stands
  * Merge min ratio
    * More stands can result in slower query performance
    * More merges can slow performance, depending on the data load rate
  * Merge min size
* Journal
  * Pre-allocate the journal if possible
  * This allows contiguous hard disk space for the journal

Application Tuning - Database Design
* Database design
  * XML structure
  * Avoid contention
  * Collections
* Query tuning
  * Identify slow queries using tools
  * Tune queries
* Java layer tuning
  * Re-use connections
  * Do more in a single call if possible

Application Tuning - Database Design
* Right XML structure: very important
  * Similar to the right table design
  * Searching on elements is faster than searching on attributes, so try to define commonly searched items as elements
  * For commonly searched conditions, try to generate metadata if possible
  * Design the structure to avoid going bottom-up (..) in a document while querying
  * Think about if and where fragmentation is needed
* Avoid contention
  * Design your document structure to avoid contention
  * Multiple calls trying to acquire locks on the same document at the same time will be slow

Application Tuning - Database Design
[Diagram: Query 1, Query 2 and Query 3 targeting collections A, B and C]
* Use of collections is recommended
* Collections help in narrowing the "scope" of a query within the indexes
* Better data management
* A document can live in multiple collections
* Be careful if collection membership is dynamic

Application Tuning - Database Design
More on collections
* Complex queries like
    /A/B[element1 = "X"][not(element3 = "Z")][element4/@attr1 = "T"]/../bar[val = "XYZ"]
  can be simplified to
    collection("foo")/A/bar
  when the documents matching those conditions have been placed in collection "foo"
* Simpler and faster query
* Gains can be on the order of 5-10 times

Application Tuning - Analysis
xdmp:query-meters
* Displays elapsed time
* Displays cache hits and misses
* Displays fragments added and deleted
* Displays document-level and fragment-level cache efficiency
* For analyzing complex queries, use the elapsed time between two function calls
* Has some overhead; use for analysis only

Application Tuning - Analysis Examples
* Simple example:
    count(input()), xdmp:query-meters()
* Another example:
    let $qm0 := xdmp:query-meters()
    let $val1 := foo()
    let $qm1 := xdmp:query-meters()
    ………
* The difference between the "elapsed-time" values of $qm0 and $qm1 is the time taken to execute foo()

Application Tuning - Analysis
xdmp:query-trace()
* Generates a set of log messages detailing how a rooted XPath is being evaluated.
* Example:
    xdmp:query-trace(true()),
    /MedlineCitationSet/MedlineCitation[MedlineID = "21980102"]
* Tells you whether a particular path used in a query is searchable
* Has overhead; turn on and off as required

Application Tuning - Analysis
Query trace log output for the example query:
    2004-10-07 11:21:18 Info: line 4: Analyzing path: input()/child::MedlineCitationSet/child::MedlineCitation[child::MedlineID = "21980102"]
    2004-10-07 11:21:18 Info: line 4: Step 1 is searchable: input()
    2004-10-07 11:21:18 Info: line 3: Step 2 axis is conditionally searchable: child
    2004-10-07 11:21:18 Info: line 3: Step 2 test is searchable: MedlineCitationSet
    2004-10-07 11:21:18 Info: line 4: Step 2 is searchable: child::MedlineCitationSet
    2004-10-07 11:21:18 Info: line 4: Step 3 axis is conditionally searchable: child
    2004-10-07 11:21:18 Info: line 4: Step 3 test is searchable: MedlineCitation
    2004-10-07 11:21:18 Info: line 4: Step 3 predicate 1 is conditionally searchable: child::MedlineID = "21980102"
    2004-10-07 11:21:18 Info: line 4: Step 3 is searchable: child::MedlineCitation[child::MedlineID = "21980102"]
    2004-10-07 11:21:18 Info: line 4: Path is searchable.
    2004-10-07 11:21:18 Info: line 4: Gathering constraints.
    2004-10-07 11:21:18 Info: line 3: Step 2 test contributed 1 constraint: MedlineCitationSet
    2004-10-07 11:21:18 Info: line 4: Step 3 test contributed 1 constraint: MedlineCitation
    2004-10-07 11:21:18 Info: line 3: Comparison contributed hash value constraint: MedlineID = "21980102"
    2004-10-07 11:21:18 Info: line 4: Step 3 predicate 1 contributed 1 constraint: child::MedlineID = "21980102"
    2004-10-07 11:21:18 Info: line 4: Executing search.
    2004-10-07 11:21:18 Info: line 4: Selected 1 fragment to filter

Application Tuning - Query Tips
* XPaths starting with input(), //, collection(), xdmp:directory or doc() are highly optimized
* Avoid passing variables holding big sequences and big results
* Compress let and where clauses into XPath
* DON'T
    define function bar($nodes as node()*) as xs:boolean { ……. }
    let $val1 := input()/foo[elementA = "XYZ"]
    let $val := bar($val1)
* DO
    define function bar($value as xs:string) as xs:boolean { input()/foo[elementA = $value]……. }
    let $val := bar("XYZ")

Application Tuning - Query Tips
* Use simple child steps in the query evaluation
* Avoid ".." before predicates
* DON'T
    /root/head[DocumentId = $docId]/../citations/citation[id = "ABC"]
* DO
    /root[head/DocumentId = $docId]/citations/citation[id = "ABC"]

Application Tuning - Query Tips
* Be specific
* If you want the first foo in the database
  * DON'T
      //foo[1]
  * DO
      (//foo)[1]
* The second query could be 2-1000 times faster, depending on how much data you have
* The two queries return different results: //foo[1] selects every foo that is the first foo child of its parent, while (//foo)[1] selects only the first foo overall

Application Tuning - Query Tips
* XPath predicates that cross fragment boundaries cannot use indexes
* Assume foo is a fragment root
* DON'T
    /root[foo/bar = "1"]/status
* DO
    let $i := /root/foo[bar = "1"]/../status
* Notice the ".." after the predicate, which is fine

Application Tuning - XDBC Tips
* XDBC has built-in connection pooling
* Re-use connections
    XDMPConnection conn = getConnection();
    XDBCStatement stmt = conn.createStatement();
    XDBCResultSequence result = stmt.executeQuery(query);
    ………….
    ……………..
    finally {
        // DON'T FORGET TO CLOSE result, stmt and connection.
        result.close();
        stmt.close();
        conn.close();
    }
* You can monitor socket connections using "netstat"

Application Tuning - XDBC Tips
* Execute more in one query, if you can
* For example:
  * You have N documents to be updated using the same query
  * You have N results to be retrieved using the same function
  * You want to delete N documents
* You can delete all the documents in a collection in one call using the "xdmp:collection-delete" function
* Cache tuning is needed to run big queries

Application Tuning - XDBC Tips
* DON'T
    for (int i = 0; i < n; i++) // Execute the query multiple times
    {
        // Call XQuery function and pass ONE value each time
        ……………………………….
        XDBCResultSequence result = stmt.executeQuery(query);
    }
* DO
    List params = new ArrayList();
    for (int i = 0; i < n; i++)
    {
        // Add params to the list or create a string representing the sequence
        ……………………….
    }
    // Pass the sequence to the function, which can also return a sequence
    XDBCResultSequence result = stmt.executeQuery(query);

Summary
* Know your application and data
* Application architecture and design are important; design well
* Analyze and identify bottlenecks
* Tune each layer
* Test and benchmark
* See if the desired goals are met
* GO-LIVE

References
* Mark Logic CIS Developer Guide
* Mark Logic CIS Admin Guide
* Mark Logic XQuery function reference
* xqzone
* support.marklogic.com

Thank You
ASHEESH MANGLA
[email protected]
t: 650.655.2331
f: 650.655.2310
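
Supplementary sketch: batching values into one query

The XDBC tip "Execute more in one query" suggests creating a string that represents a sequence and passing it to a single query. The sketch below shows one possible way to build such a string in plain Java. It is only an illustration under stated assumptions: the class SequenceLiteral, its methods, and the XQuery function name "lookup" are hypothetical and are not part of the XDBC API, and the escaping rules assume standard XQuery string literals (a double quote is doubled, "&" is written as "&amp;"). The resulting text would then be handed to stmt.executeQuery(query) once, instead of once per value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the XDBC API.
final class SequenceLiteral {
    private SequenceLiteral() {}

    // Wrap one value in double quotes, escaping it for an XQuery string literal
    // (assumption: '"' is escaped by doubling it and '&' as '&amp;').
    static String quote(String value) {
        String escaped = value.replace("&", "&amp;").replace("\"", "\"\"");
        return "\"" + escaped + "\"";
    }

    // Render the values as an XQuery sequence, for example ("a", "b").
    static String toSequence(List<String> values) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(quote(values.get(i)));
        }
        return sb.append(")").toString();
    }

    // Build a single query that applies functionName to every value,
    // so the application makes one database call instead of N.
    static String buildBatchQuery(String functionName, List<String> values) {
        return "for $v in " + toSequence(values) + " return " + functionName + "($v)";
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("21980102", "21980104");
        // Prints: for $v in ("21980102", "21980104") return lookup($v)
        System.out.println(buildBatchQuery("lookup", ids));
    }
}
```

Building the sequence on the client keeps the number of round trips at one regardless of N. As the deck notes, very large batches may need cache tuning, so splitting a huge list into a few moderately sized batches is a reasonable compromise.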