MarkLogic 8: Semantics
Stephen Buxton, John Snelson, Aries Li
November 2014
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

MarkLogic 8 Feature Presentations
Topics and product managers:
  – Developer Experience: Samplestack and Reference Architecture (Kasey Alderete)
  – Developer Experience: Node.js and Java Client APIs, Server-side JavaScript, and Native JSON (Justin Makeig)
  – REST Management API, Flexible Replication, Sizing, and Reference Hardware Architectures (Caio Milani)
  – Bitemporal (Jim Clark)
  – Semantics (Stephen Buxton)

Agenda
Semantics Overview – 1-3-10
  – MarkLogic Semantics in 1 slide
  – How MarkLogic Semantics supports the "pillars" – 3 slides
  – 10 slides to explain MarkLogic Semantics
SPARQL 1.1 support
SPARQL Update
Inference
Performance
Q&A

Semantics
Enterprise triple store, document store, and database combined
Store and query billions of facts and relationships; infer new facts
Facts and relationships provide context for better search
Flexible data modeling: integrate and link data from different sources
Standards-based for ease of use and integration
  – RDF, SPARQL, and standard REST interfaces
Even better with built-in Search and Bitemporal
  – Triples, documents, and data combined

Better Answers From Today's Data
Find more relevant information using facts as context
  – Example: Search for "cardiac catheter"; show documents about "devices that stimulate nerves" and "implantable devices"
Present more, better information for more productive users
  – Example: Search for "Ireland"; show facts about Ireland with search results
Publish information dynamically to web, print, or mobile
  – Example: BBC Sports page about a team, event, sport, or person
  – Example: Wiley Custom Select for customized learning materials

More Information, More Productivity

Intelligent Data Layer
Discover connections between entities
  – Example: Show me papers cited by Steven Pinker, and papers cited by…
Walk hierarchies
  – Example: Show me who directly owns Acme, and who ultimately owns them
Infer new facts for simple data modeling and powerful queries
  – Example: If Acme owns Amertek, then Amertek is owned by Acme
  – Example: If prod001 is a blue henley, it's also a blue shirt
Facts may be embedded in documents to keep context

More Intelligent Web Experiences
[Diagram, two panels: "Enforced Linear Journey" and "Contextual Journey"]

Simpler Data Integration
Flexible data modeling through triples
  – Triples are atomic and schema-less
  – Triples are easy to share, easy to combine, and readily available
Integrate data by adding links
  – Load triples as-is, then add triples to link entities or documents
  – No need to change the underlying data
  – Example: cust123 (source1) is the sameAs cust_id_456 (source2)
  – Example: cust123 hasOrderDoc /orders/ab42.json

Simpler Data Integration
[Diagram, two panels: "Disconnected Data" and "Semantics as the Glue". Each panel shows the sources Accounts, Support, and Acquired Company; entity labels: cust123, cust_id_456, order_ab12]

Example of an App Using Semantics
How does "Euro zone" relate to "European Union", "Europe OECD", or "Europe"?
How does a term such as "Small States" relate to "Least Developed Countries", "Lower Middle Income", or "Low & Middle Income"?

What Is "Linked Data"?
Example of RDF:
  <http://example.org/dir/js> <http://xmlns.com/foaf/0.1/firstname> "John" .
  <http://example.org/dir/js> <http://xmlns.com/foaf/0.1/lastname> "Smith" .
Example of SPARQL:
  SELECT ?person ?place
  WHERE {
    ?person <http://example.org/LivesIn> ?place .
    ?place <http://example.org/IsIn> "England" .
  }

MarkLogic Semantics Architecture
[Diagram of the stack, top to bottom]
Data: JSON, XML, RDF, Geo, Binaries
Interface Layer: mlcp, HTTP Protocol, REST API, JavaScript, XQuery, SQL, SPARQL
Query Layer: JS, SPARQL, XQuery
Indexes: Universal Index, Geospatial Index, Triple Index, Triple Cache
Storage Layer: Scalability and Elasticity, ACID Transactions, Bitemporal, Triple Store

Combination Query Example
A call comes into your call center:
  – "Some maniac in a blue van just tried to run me down"
  – "I got the first three letters of his license plate: ABC"
You need to look for similar incident reports
  – Reports that mention a "blue van"
    … around the same time
    … around the same place
    … with a license plate that starts with "ABC"

Combination Query Example
  <report>
    <title>Suspicious vehicle near airport</title>
    <date>2012-11-12Z</date>
    <type>observation/surveillance</type>
    <threat>
      <type>suspicious activity</type>
      <category>suspicious vehicle</category>
    </threat>
    <location>
      <lat>37.497075</lat>
      <long>-122.363319</long>
    </location>
    <description>A blue van… with license plate ABC 123 was observed parked behind the airport sign…
      <triple><subject>IRIID</subject><predicate>isa</predicate><object>license-plate</object></triple>
      <triple><subject>IRIID</subject><predicate>value</predicate><object>ABC 123</object></triple>
    </description>
  </report>

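The combination query above mixes four kinds of constraint: free text, time, location, and a fact held in triples. As a rough illustration only, the following Python sketch applies all four to a small in-memory list of reports. The `Report` class, the helper names, and the second sample report are hypothetical; this is not how MarkLogic evaluates such a query, which would go through its indexes.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass
class Report:
    """Hypothetical, simplified stand-in for an incident report document."""
    text: str
    when: date
    lat: float
    lon: float
    triples: list  # (subject, predicate, object) tuples embedded in the report


def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine formula)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def plate_starts_with(report, prefix):
    """True if the embedded triples describe a license plate whose value starts with prefix."""
    plates = {s for s, p, o in report.triples if p == "isa" and o == "license-plate"}
    return any(p == "value" and s in plates and o.startswith(prefix)
               for s, p, o in report.triples)


def similar_reports(reports, phrase, day, max_days, lat, lon, max_km, plate_prefix):
    """Apply the text, time, place, and triple constraints together."""
    return [r for r in reports
            if phrase in r.text.lower()
            and abs((r.when - day).days) <= max_days
            and km_between(r.lat, r.lon, lat, lon) <= max_km
            and plate_starts_with(r, plate_prefix)]


reports = [
    Report("A blue van with license plate ABC 123 was observed parked behind the airport sign",
           date(2012, 11, 12), 37.497075, -122.363319,
           [("plate1", "isa", "license-plate"), ("plate1", "value", "ABC 123")]),
    Report("A blue van was seen downtown",
           date(2012, 11, 13), 37.50, -122.36,
           [("plate2", "isa", "license-plate"), ("plate2", "value", "XYZ 999")]),
]

# Reports mentioning "blue van" within 7 days and 5 km, plate starting with "ABC".
hits = similar_reports(reports, "blue van", date(2012, 11, 14), 7,
                       37.5, -122.36, 5.0, "ABC")
print([r.text for r in hits])
```

Only the first report satisfies every constraint; the second mentions a blue van nearby but carries a different plate.
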
MarkLogic Semantics Use Cases
Information Delivery Platforms Open Government Initiatives – Dynamic Publishing Regulatory compliance – Custom Publishing Data provenance Data Integration – Customer 360 – Know Your Customer Enterprise Reference Data Research Management – Patient 360 Repository Decision Support Metadata Catalogs Intelligence, fraud detection

Better Together
Document Store + Data Store + Triple Store
Inference, Traversal

Tech Specs
MarkLogic 7:
  – Store and manage hundreds of billions of RDF triples
  – Query across documents, data, and triples
  – Triple index for sub-second search results
  – Triple cache for high performance across large clusters
  – Bulk-load triples via MarkLogic Content Pump
  – Provenance and reification by adding metadata
  – SPARQL 1.0+ over REST or XQuery
  – Standard SPARQL endpoint and graph store protocol support
  – XQuery helper modules for serializations and transitive closures
  – Updates, aggregates via MarkLogic APIs
  – Semantic enrichment with partners (e.g. Smartlogic, Temis, NetOwl)
MarkLogic 8: everything in MarkLogic 7, plus:
  – SPARQL 1.1 including updates and aggregates
  – Graph traversal with property paths and transitive closures
  – Automatic inference using rule sets
      – Supplied rule sets for RDFS, RDFS+, OWL Horst
      – Support for user-defined rule sets
  – SPARQL calls from server-side programs with query restrictions
  – SPARQL from server-side JavaScript, Node.js
Enterprise Features: ACID transactions, scalability and elasticity, HA/DR, government-grade security, monitoring and performance tools

Agenda
Semantics Overview – 1-3-10
SPARQL 1.1 support
  – Property paths
  – SPARQL Aggregates
SPARQL Update
Inference
Performance
Q&A

SPARQL 1.1 Support
MarkLogic 8.0-1 includes support for all the major features of SPARQL 1.1, including paths, aggregates, and SPARQL Update
Details: see https://wiki.marklogic.com/display/rootwiki/SPARQL+support+-+MarkLogic+8.0-1
Contact [email protected] with enhancement requests (or log a support ticket)

Property paths
SPARQL 1.1 Query – Property Paths
Property path operators added for MarkLogic 8:
  – ? – zero or one path
  – + – one or more path
  – * – zero or more path

Property paths - example
  ## find papers that cite paperA, and papers that cite papers that cite paperA, and so on
  SELECT ?s
  WHERE { ?s c:cites*/dc:title "Paper A" . }
  ORDER BY ?s
Note: taken from Bob DuCharme's "Learning SPARQL"
For more examples, see http://ea.marklogic.com/wp-content/uploads/2014/08/SPARQL-paths-examplesEA2.pdf

SPARQL aggregates
Aggregate functionality includes:
  – Functions: COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, SAMPLE
  – GROUP BY
  – GROUP BY .. HAVING <some aggregate variable>
  – GROUP BY <more than one item>
  – ORDER BY <some aggregate variable>

SPARQL aggregates - example
  ## count how many companies are in each industry sector
  SELECT ?industry ( COUNT( ?company ) AS ?count_companies )
  FROM <http://marklogic.com/semantics/sb/COMPANIES100/>
  WHERE { ?company demov:industry ?industry . }
  GROUP BY ?industry
For more examples, see http://ea.marklogic.com/wp-content/uploads/2014/09/SPARQL-aggregates-examples-EA2.pdf

Agenda
Semantics Overview – 1-3-10
SPARQL 1.1 support
SPARQL Update
  – SPARQL Update operations, examples
  – Managed triples
  – Graph permissions
  – Locking
Inference
Performance
Q&A

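To make the difference between the three property path operators from the SPARQL 1.1 section concrete, here is a small Python sketch over a hypothetical citation graph. The paper names and the function names are invented for illustration; a real engine answers these paths from its triple index rather than by scanning a list of edges.

```python
from collections import deque

# Hypothetical citation data: (citing paper, cited paper)
CITES = {
    ("paperB", "paperA"),
    ("paperC", "paperB"),
    ("paperD", "paperC"),
    ("paperE", "paperZ"),
}


def one_or_more(target, edges):
    """The `+` operator: papers that reach `target` through one or more citations."""
    found = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for citing, cited in edges:
            if cited == node and citing not in found:
                found.add(citing)
                queue.append(citing)
    return found


def zero_or_more(target, edges):
    """The `*` operator: as `+`, plus the zero-length path (the target itself)."""
    return {target} | one_or_more(target, edges)


def zero_or_one(target, edges):
    """The `?` operator: the target itself plus papers that cite it directly."""
    return {target} | {citing for citing, cited in edges if cited == target}


print(sorted(one_or_more("paperA", CITES)))
print(sorted(zero_or_more("paperA", CITES)))
print(sorted(zero_or_one("paperA", CITES)))
```

Note that `*` includes the starting node itself, so a query written with `cites*` also returns the paper being cited, whereas `cites+` returns only the papers that cite it directly or indirectly.
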
SPARQL UPDATE operations
1. GRAPH MANAGEMENT
Manipulate RDF graphs using the SPARQL 1.1 Update language
  – CREATE – create a graph
  – DROP – drop a graph and its contents
  – COPY – make the destination graph into a copy of the source graph; any content in the destination graph before this operation will be removed (think copy/paste)
  – MOVE – move the contents of the source graph into the destination graph, and remove them from the source graph; any content in the destination graph before this operation will be removed (think cut/paste)
  – ADD – add the contents of the source graph into the destination graph; keep the source graph intact; keep the initial contents of the destination graph intact

SPARQL UPDATE operations
2. GRAPH UPDATE
Delete, insert, and update (delete/insert) triples using the SPARQL 1.1 Update language.
  – INSERT DATA
  – DELETE DATA
  – DELETE .. INSERT WHERE
  – DELETE WHERE
  – INSERT WHERE
  – CLEAR

SPARQL Update – example[1]
  DROP SILENT GRAPH <graph-1> ;
  CREATE GRAPH <graph-1> ;
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX prod: <http://example.com/products/>
  PREFIX ex: <http://example.com/>
  INSERT DATA {
    GRAPH <graph-1> {
      prod:1001 rdf:type ex:Henley ;
                ex:color "blue" .
      prod:1002 rdf:type ex:Shirt ;
                ex:color "red" .
    }
  } ;

SPARQL Update – example[2]
  PREFIX prod: <http://example.com/products/>
  PREFIX ex: <http://example.com/>
  ## change all blue products to (only) "azure"
  ## ?c must be bound in WHERE, otherwise the DELETE template removes nothing
  WITH <http://marklogic.com/semantics/sb/products/>
  DELETE { ?prod ex:color ?c . }
  INSERT { ?prod ex:color "azure" . }
  WHERE  { ?prod ex:color "blue" .
           ?prod ex:color ?c . }

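In SPARQL 1.1 Update, the WHERE clause of a DELETE/INSERT operation produces the variable bindings that both templates are filled in with, and a template triple that still contains an unbound variable is skipped. The Python sketch below mimics that behavior on a set of tuples to show why it matters whether `?c` is bound in WHERE. The function `delete_insert_where` and its tiny pattern matcher are hypothetical helpers written for this illustration, not part of any MarkLogic API.

```python
def delete_insert_where(graph, delete_tpl, insert_tpl, where):
    """Apply one DELETE/INSERT/WHERE operation to a set of (s, p, o) triples.

    Terms starting with '?' are variables. `where` is a list of triple
    patterns; every solution of `where` instantiates both templates.
    """
    def match(patterns, binding):
        if not patterns:
            yield binding
            return
        first, rest = patterns[0], patterns[1:]
        for triple in graph:
            b = dict(binding)
            ok = True
            for term, value in zip(first, triple):
                if term.startswith("?"):
                    if b.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield from match(rest, b)

    def instantiate(template, binding):
        out = set()
        for pattern in template:
            triple = tuple(binding.get(t) if t.startswith("?") else t for t in pattern)
            if None not in triple:  # a triple with an unbound variable is skipped
                out.add(triple)
        return out

    solutions = list(match(where, {}))
    to_delete = set().union(*(instantiate(delete_tpl, s) for s in solutions))
    to_insert = set().union(*(instantiate(insert_tpl, s) for s in solutions))
    return (graph - to_delete) | to_insert


graph = {
    ("prod:1001", "rdf:type", "ex:Henley"), ("prod:1001", "ex:color", "blue"),
    ("prod:1002", "rdf:type", "ex:Shirt"), ("prod:1002", "ex:color", "red"),
}
delete_tpl = [("?prod", "ex:color", "?c")]
insert_tpl = [("?prod", "ex:color", "azure")]

# ?c never bound in WHERE: the delete template is skipped, "blue" survives.
unbound = delete_insert_where(graph, delete_tpl, insert_tpl,
                              [("?prod", "ex:color", "blue")])
# ?c bound in WHERE: existing colors of blue products are deleted first.
bound = delete_insert_where(graph, delete_tpl, insert_tpl,
                            [("?prod", "ex:color", "blue"), ("?prod", "ex:color", "?c")])
print(sorted(t for t in unbound if t[1] == "ex:color"))
print(sorted(t for t in bound if t[1] == "ex:color"))
```

With `?c` unbound, prod:1001 ends up with both "blue" and "azure"; with `?c` bound, it ends up with "azure" only. Products that are not blue are untouched in both cases.
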
SPARQL Update – more examples
GRAPH MANAGEMENT: http://ea.marklogic.com/wp-content/uploads/2014/08/SPARQL-update-examples-graph-EA2.pdf
GRAPH UPDATE: http://ea.marklogic.com/wp-content/uploads/2014/08/SPARQL-update-examples-EA2.pdf

SPARQL Update APIs
Query Console – new mode "SPARQL Update"
REST – post to /v1/graphs/sparql
Server-side built-ins
  – sem:sparql-update()
  – sem.sparqlUpdate()

Managed Triples
SPARQL Update operations operate over managed triples only
Managed triples are triples loaded into the database using
  – mlcp with -input_file_type RDF
  – sem:rdf-load()
  – sem:rdf-insert()
Compare embedded triples
  – Triples embedded in XML or JSON documents
  – SPARQL Update operations don't affect embedded triples
[Strictly, a managed triple is a sem:triple element in a document with root element sem:triples]

Graph Permissions
Set permissions when you create a graph:
  import module namespace sem = "http://marklogic.com/semantics"
    at "/MarkLogic/semantics.xqy";
  sem:sparql-update(
    'CREATE GRAPH <graphs/sb/graph-1>',
    (), (), (),
    ( xdmp:permission( "demo-reader", "read" ),
      xdmp:permission( "demo-writer", "update" ) )
  )
Note: arg5 is called $default-permissions, but you should set permissions explicitly
  – See also sem:graph-set-permissions()

Graph Permissions
Check permissions on the graph you just created:
  import module namespace sem = "http://marklogic.com/semantics"
    at "/MarkLogic/semantics.xqy";
  sem:graph-get-permissions( sem:iri("graphs/sb/graph-1") )
See also
  – sem:graph-set-permissions()
  – sem:graph-add-permissions()
  – sem:graph-remove-permissions()

Graph Permissions
Set permissions when you create a graph:
  var sem = require('/MarkLogic/semantics');
  sem.sparqlUpdate(
    'CREATE GRAPH <graphs/sb/graph-2>',
    [], [], [],
    [ xdmp.permission( "demo-reader", "read" ),
      xdmp.permission( "demo-writer", "update" ) ]
  )
Note: arg5 is called $default-permissions, but you should set permissions explicitly
  – See also sem.graphSetPermissions()

Graph Permissions
Check permissions on the graph you just created:
  var sem = require('/MarkLogic/semantics');
  sem.graphGetPermissions( sem.iri("graphs/sb/graph-2") )
See also
  – sem.graphSetPermissions()
  – sem.graphAddPermissions()
  – sem.graphRemovePermissions()

Graph Permissions
All SPARQL queries over managed triples are governed by the graph permissions
  – … because managed triples documents will inherit those permissions at ingest
Artefact: for each graph, you'll see a graph document
  – The database URI of the graph document is the URI of the graph
  – The graph document belongs to:
      – a collection whose name is the URI of the graph
      – a collection that contains all graph documents, i.e. http://marklogic.com/semantics#graphs

SPARQL Update locking
sem:sparql-update( $sparql, $bindings, $options, $store, $default-permissions ), "locking" option:
  – read-write: read-lock documents containing triples being accessed, write-lock documents being updated
  – write: only write-lock documents being updated
  – Default is locking=read-write
sem:sparql( $sparql, $bindings, $options, $store ), "locking" option:
  – read-write: read-lock documents containing triples being accessed
  – write: no locks (because sem:sparql() doesn't write)
  – Default is locking=read-write
Locking is ignored in a query transaction.

Agenda
Semantics Overview – 1-3-10
SPARQL 1.1 support
SPARQL Update
Inference
  – Backwards vs forwards chaining
  – What are rules, and how do you use them?
  – Tips on Inference
Performance
Q&A

Backward-chaining vs Forward-chaining
[Diagram: John livesIn London; London isIn England; inferred: John livesIn England]
Forward-chaining: at ingest time, insert a new triple (John livesIn England)
Backward-chaining: at query time, return results as if a new triple (John livesIn England) existed
  – MarkLogic 8 does inference this way

Backward-chaining vs Forward-chaining
Forward-chaining: at ingest time, insert a new triple
  – Ingest (and update) is very slow
  – More disk space
  – Materialize every possible inferred triple
  – Implications for security, ACID
  – .. But queries are fast
Backward-chaining: at query time, return results as if a new triple existed
  – Fast ingest, less disk space, security and ACID are straightforward, only do the work that's needed
  – .. But work is done at query time

Inference rules
Choose an appropriate ruleset – the right level of inference
Set a default ruleset for the database
Specify a ruleset as part of your query
Create your own ruleset

Choose an appropriate ruleset
Supplied rulesets are in $MARKLOGIC/Config with .rules extension
Common levels of inference:
  – rdfs, rdfs-plus, owl-horst
  – each has an optimized ruleset + "full"
Rulesets are built up in a modular way, using import

Rule - example
Excerpt from subClassOf.rules:
  rule "subClassOf rdfs9" construct {
    ?x a ?c2
  } {
    ?x a ?c1 .
    ?c1 rdfs:subClassOf ?c2 .
    filter(?c1!=?c2)
  }
Syntax is similar to SPARQL CONSTRUCT
We can read the rule as: for each x, c1, c2 where x is a c1 and c1 is a subclass of c2, construct "x is a c2"
e.g. prod001 is a henley; henley is a subclass of shirt; construct "prod001 is a shirt"

Ruleset – example – rdfs.rules
  – # is a comment line
  – Prefix – same as SPARQL
  – Import other rulesets – modular

Set a default ruleset(s) for the database
  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
  (: add "subClassOf.rules" as default ruleset for database "Documents" :)
  let $config := admin:get-configuration()
  let $dbid := admin:database-get-id($config, "Documents")
  let $rules := admin:database-ruleset("subClassOf.rules")
  let $c := admin:database-add-default-ruleset($config, $dbid, $rules)
  return admin:save-configuration($c)
  (: See also:
     admin:database-get-default-rulesets()
     admin:database-delete-default-ruleset() :)

Specify a ruleset(s) as part of your query
  (: create a store that uses the RDFS ruleset for inferencing :)
  let $rdfs-store := sem:ruleset-store("rdfs.rules", sem:store("no-default-rulesets"))
  return
    (: use the store you just created - pass it into sem:sparql() :)
    sem:sparql('
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ex: <http://example.com/>
      SELECT ?product
      FROM <http://marklogic.com/semantics/sb/products/inf-1>
      WHERE { ?product rdf:type ex:Shirt ; ex:color "blue" }
    ', (), (), $rdfs-store )
For full examples, see http://ea.marklogic.com/wp-content/uploads/2014/12/SPARQL-inference-examples-EA3-2.pdf

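The trade-off between forward and backward chaining can be seen on a toy scale with the rdfs9 (subClassOf) rule. The Python sketch below is only an illustration under simplifying assumptions: the data is hypothetical, only rdfs9 is implemented, and the functions `forward_chain` and `instances_backward` are invented names that do not reflect how MarkLogic's inference engine works internally.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# Hypothetical product data matching the henley/shirt example.
TRIPLES = {
    ("prod001", TYPE, "ex:Henley"),
    ("prod002", TYPE, "ex:Shirt"),
    ("ex:Henley", SUBCLASS, "ex:Shirt"),
    ("ex:Shirt", SUBCLASS, "ex:Clothing"),
}


def forward_chain(triples):
    """Materialize rdfs9 inferences until no new triple appears (ingest-time work)."""
    store = set(triples)
    while True:
        new = {(x, TYPE, c2)
               for (x, p1, c1) in store if p1 == TYPE
               for (c, p2, c2) in store if p2 == SUBCLASS and c == c1 and c1 != c2}
        if new <= store:
            return store
        store |= new


def instances_backward(triples, cls):
    """Answer '?x a cls' at query time without storing any inferred triple."""
    wanted, frontier = {cls}, [cls]
    while frontier:  # collect cls and all of its direct and indirect subclasses
        current = frontier.pop()
        for (sub, p, sup) in triples:
            if p == SUBCLASS and sup == current and sub not in wanted:
                wanted.add(sub)
                frontier.append(sub)
    return {x for (x, p, c) in triples if p == TYPE and c in wanted}


materialized = forward_chain(TRIPLES)
shirts_forward = {x for (x, p, c) in materialized if p == TYPE and c == "ex:Shirt"}
shirts_backward = instances_backward(TRIPLES, "ex:Shirt")
print(len(materialized) - len(TRIPLES), "inferred triples stored by forward chaining")
print(sorted(shirts_forward), sorted(shirts_backward))
```

Both approaches return the same shirts. Forward chaining stores extra triples up front and answers with a plain lookup; backward chaining stores nothing extra and does the hierarchy walk each time the query runs.
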
Create your own ruleset
  (: create a rules file and insert it into the Schemas database :)
  (: Note: run this from Query Console with "Content Source" set to "Schemas" :)
  xdmp:document-insert(
    '/rules/livesin.rules',
    text{ '
      # my rules for inference
      prefix ex: <http://example.com/>
      prefix gn: <http://www.geonames.org/ontology#>
      rule "lives in" construct {
        ?person ex:livesIn ?place2
      } {
        ?person ex:livesIn ?place1 .
        ?place1 gn:parentFeature ?place2
      }'
    }
  )

Use your own ruleset
  (: find places that John Smith lives in – with inferencing, using my ruleset :)
  let $my-store := sem:ruleset-store("/rules/livesin.rules", sem:store())
  return
    (: use the store you just created - pass it in to sem:sparql() :)
    sem:sparql('
      PREFIX ex: <http://example.com/>
      PREFIX gn: <http://www.geonames.org/ontology#>
      SELECT ?person ?placeName
      FROM <http://marklogic.com/semantics/sb/customers/inf-1>
      WHERE { ?person ex:livesIn ?place .
              ?place gn:name ?placeName }
      ORDER BY ?person
    ', (), (), $my-store )
For full examples, see http://ea.marklogic.com/wp-content/uploads/2014/12/SPARQL-inference-examples-EA3-2.pdf

Inference rules - Summary
Choose an appropriate ruleset – the right level of inference
  – can use more than one ruleset
Set a default ruleset for the database
  – Admin UI or XQuery/JavaScript API
Specify a ruleset as part of your query
  – create a sem:store using your ruleset location(s)
  – include or override default ruleset
  – ruleset location is resolved from Schemas database, then $MARKLOGIC/Config
Create your own ruleset
  – text file inserted in Schemas database

Tips on Inference
Use the fewest rules that you actually need
  – Query performance slows as you add rules
  – Database default + query-time ruleset(s) gives great flexibility
Consider doing inference in your query, possibly with paths
  – Gives you the most control, best performance, most predictable results
  ## find all blue shirts (including henleys) without inference
  SELECT ?product
  FROM <http://marklogic.com/semantics/sb/products/inf-1>
  WHERE { ?product rdf:type/rdfs:subClassOf* ex:Shirt .
          ?product ex:color "blue" }

Agenda
Semantics Overview – 1-3-10
SPARQL 1.1 support
SPARQL Update
Inference
Performance
  – Improvements 7 to 8
  – Inference performance
  – Diagnosing slow queries
Q&A

SPARQL Performance
8.0-1 will be ~10% faster than 7.0-4.1
  – Especially for queries with larger joins
  – Reduced overhead for common join algorithms
7.0-4.1 is ~10% faster than 7.0-4
  – Query optimization fix
7.0-4 is 65% faster than 7.0-1 (23% faster than 7.0-3)
  – Cost optimization improvements in 7.0-3
  – Execution efficiency improvements in 7.0-4
Significant investment in performance planned for 8.0-2 to 9.0

Inference Algorithms
Tableau or equivalent (Racer Pro, Pellet, etc.)
  – Most powerful
  – Severe scaling problems
Forward chaining
  – Pay costs during ingest in disk space and time
  – Hours and double disk space are not uncommon (bulk ingest, ontology changes)
  – Materialized triples make queries fast
Backward chaining
  – Pay costs at query time in extra triple index lookups, CPU, and memory
  – Ingest is fast

Architectural Reasons for Backwards Chaining
MarkLogic can query arbitrary subsets of triples (cts:query constraint)
  – MarkLogic security is an important example of subsetting triples
  – MVCC is also a subset based on document timestamps
Very hard to efficiently query materialized inferences if arbitrary subsets of the database might be queried

Benefits of Backwards Chaining
Flexibility!
Choose your ontology at query time
  – include or exclude triples with cts:query or named graphs
Choose your rulesets at query time
Perform inference across in-memory triples and database triples
Fast scale-out ingest

Inference Performance
Different performance profile from competitors that use forward-chaining inference
Restricted triple patterns will perform much better
  – ?s a :type
  – ?s :hasA ?o
Never query for ?s ?p ?o !
Fewer rules is better. Twenty rules is a lot (RDFS+), ten is better (RDFS)
  – Rules are applied recursively, with exponentially increasing complexity
  – Rulesets are modular and can be flexibly combined
Inference memory size may need increasing
  – To complete the query execution, not to get better performance

Alternatives to Automatic Inference
Property Paths in SPARQL
Instead of the pattern
  ?s a foaf:Document
evaluated with the rule
  rule "rdfs9" construct { ?x a ?c2 } { ?x a ?c1 . ?c1 rdfs:subClassOf ?c2 }
write the path in the query:
  ?s a/rdfs:subClassOf* foaf:Document

Alternatives to Automatic Inference
Bulk Materialization
If the rulesets, ontologies, security, and data are static or change infrequently
Consider materializing inferred triples as a one-off
  – Use a "?s ?p ?o" query with built-in inference, piped into sem:rdf-insert()
  – or use the rulesets as the basis for SPARQL CONSTRUCT queries

Recursive Rule Application
Query pattern: ?s a foaf:Document
Matches the rule below with ?x = ?s and ?c2 = foaf:Document:
  rule "rdfs9" construct { ?x a ?c2 } { ?x a ?c1 . ?c1 rdfs:subClassOf ?c2 }

Recursive Rule Application
Query pattern: ?s a foaf:Document
Rules in play:
  rule "rdfs9" construct { ?x a ?c2 } { ?x a ?c1 . ?c1 rdfs:subClassOf ?c2 }
  rule "rdfs11" construct { ?c1 rdfs:subClassOf ?c3 } { ?c1 rdfs:subClassOf ?c2 . ?c2 rdfs:subClassOf ?c3 }

Recursive Rule Application
Query pattern: ?s a foaf:Document
Rules in play:
  rule "rdfs9" construct { ?x a ?c2 } { ?x a ?c1 . ?c1 rdfs:subClassOf ?c2 }
  rule "rdfs2" construct { ?x a ?c } { ?x ?p ?y . ?p rdfs:domain ?c }
  rule "rdfs11" construct { ?c1 rdfs:subClassOf ?c3 } { ?c1 rdfs:subClassOf ?c2 . ?c2 rdfs:subClassOf ?c3 }

Making a SPARQL Query Fast
SPARQL with Inference?
Try using a smaller ruleset
  – owl-horst.rules > rdfs-plus.rules > rdfs.rules
  – or try combining the rulesets for the specific ontology predicates/types you're using
Try using a smaller ontology, or a smaller set of data
  – restrict to a named graph or cts:query
Turn on the "SPARQL Execution" trace flag to log the triple index lookups as they happen
Are you querying for a large result set?
  – Slowest inferences are just accessing a lot of data

Making a SPARQL Query Fast
Optimization Time
Optimization happens when a query is first seen
Cached by string value of the query
  – Re-optimized after ~5 minutes to read new statistics
Don't count optimization time in query time
Warm the cache up with the "prepare" option to sem:sparql()
Pass bindings to sem:sparql(), don't use string concatenation (safer and faster)
Use higher levels of optimization (i.e. "optimize=2") for bigger or problematic queries
  – Longer spent optimizing can find a better query plan
  – Trade-off between planning and doing

Making a SPARQL Query Fast
Serialization
RDF serialization can be slow (sem:rdf-serialize(), sem:query-results-serialize())
  – Avoid it and use the sequence of maps directly if possible
  – Return aggregates using fn:count() or SPARQL 1.1 Aggregates in ML 8.0
The SPARQL endpoint uses sem:query-results-serialize()
  – May get better results using XQuery and sem:sparql()

Making a SPARQL Query Fast
If You Need Support to Speed Up a Query
Contact MarkLogic
File a support case
Image credit: http://commons.wikimedia.org/wiki/User:Takkk