MarkLogic 8: Semantics
Stephen Buxton, John Snelson, Aries Li
November 2014
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
MarkLogic 8 Feature Presentations
Topics | Product Manager
Developer Experience: Samplestack and Reference Architecture | Kasey Alderete
Developer Experience: Node.js and Java Client APIs, Server-side JavaScript, and Native JSON | Justin Makeig
REST Management API, Flexible Replication, Sizing, and Reference Hardware Architectures | Caio Milani
Bitemporal | Jim Clark
Semantics | Stephen Buxton
SLIDE: 2
Agenda
 Semantics Overview – 1-3-10
– MarkLogic Semantics in 1 slide
– How MarkLogic Semantics supports the "pillars" – 3 slides
– 10 slides to explain MarkLogic Semantics
 SPARQL 1.1 support
 SPARQL Update
 Inference
 Performance
 Q&A
SLIDE: 3
Semantics
 Enterprise triple store, document store, database combined
 Store and query billions of facts and relationships; infer new facts
 Facts and relationships provide context for better search
 Flexible data modeling: integrate and link data from different sources
 Standards-based for ease of use and integration
– RDF, SPARQL, and standard REST interfaces
 Even better with Built-in Search and Bitemporal
– Triples, documents, and data combined
SLIDE: 4
Better Answers From Today’s Data
 Find more relevant information using facts as context
– Example: Search for "cardiac catheter"; show documents about "devices that stimulate
nerves" and "implantable devices"
 Present more, better information for more productive users
– Example: Search for "Ireland"; show facts about Ireland with search results
 Publish information dynamically to web or print or mobile
– Example: BBC Sports page about a team, event, sport, person
– Example: Wiley Custom Select for customized learning materials
SLIDE: 5
More Information, More Productivity
SLIDE: 6
Intelligent Data Layer
 Discover connections between entities
– Example: Show me papers cited by Steven Pinker, and papers cited by…
 Walk hierarchies
– Example: Show me who directly owns Acme, and who ultimately owns them
 Infer new facts for simple data modeling, powerful queries
– Example: If Acme owns Amertek, then Amertek is owned by Acme
– Example: If prod001 is a blue henley, it's also a blue shirt
 Facts may be embedded in documents to keep context
SLIDE: 7
More Intelligent Web Experiences
[Figure: Enforced Linear Journey vs. Contextual Journey]
SLIDE: 8
Simpler Data Integration
 Flexible data modeling through triples
– Triples are atomic and schema-less
– Triples are easy to share, easy to combine, and readily available
 Integrate data by adding links
– Load triples as-is, then add triples to link entities or documents
– No need to change the underlying data
– Example: cust123 (source1) is the sameAs cust_id_456 (source2)
– Example: cust123 hasOrderDoc /orders/ab42.json
SLIDE: 9
Simpler Data Integration
[Figure: two panels over the Accounts, Support, and Acquired Company systems. "Disconnected Data": cust123. "Semantics as the Glue": cust123, cust_id_456, order_ab12.]
SLIDE: 10
Example of an App Using Semantics
How does “Euro zone” relate to “European Union”, “Europe OECD”, or “Europe”?
How does a term such as “Small States” relate to “Least Developed Countries”, “Lower Middle Income”, or “Low & Middle Income”?
SLIDE: 11
What Is “Linked Data”?
Example of RDF
<http://example.org/dir/js> <http://xmlns.com/foaf/0.1/firstname> "John".
<http://example.org/dir/js> <http://xmlns.com/foaf/0.1/lastname> "Smith".
Example of SPARQL
SELECT ?person ?place
WHERE
{
?person <http://example.org/LivesIn> ?place .
?place <http://example.org/IsIn> "England" .
}
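A minimal sketch of how the two triple patterns in the SELECT query join on the shared ?place variable. It is a plain-Python model, not MarkLogic code; the store, the data, and the function name are illustrative assumptions.

```python
# Toy in-memory triple store: each triple is (subject, predicate, object).
# The data is made up for illustration.
TRIPLES = [
    ("ex:john", "ex:LivesIn", "ex:london"),
    ("ex:mary", "ex:LivesIn", "ex:paris"),
    ("ex:london", "ex:IsIn", "England"),
    ("ex:paris", "ex:IsIn", "France"),
]


def people_living_in(country, triples):
    """Join the two patterns of the SELECT query on the shared ?place variable."""
    # Pattern 2: ?place <IsIn> "country"
    places = {s for (s, p, o) in triples if p == "ex:IsIn" and o == country}
    # Pattern 1: ?person <LivesIn> ?place, restricted to the places found above
    return sorted(
        (s, o) for (s, p, o) in triples if p == "ex:LivesIn" and o in places
    )


print(people_living_in("England", TRIPLES))
```

Each returned pair corresponds to one (?person, ?place) solution of the query.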
SLIDE: 12
MarkLogic Semantics Architecture
[Figure: architecture diagram]
Data: JSON, XML, RDF, Geo, Binaries
Interface Layer: mlcp, HTTP Protocol, REST API, JavaScript, XQuery, SQL, SPARQL
Query Layer: JS, SPARQL, XQuery
Indexes: Universal Index, Geospatial Index, Triple Index, Triple Cache
Storage Layer: Scalability and Elasticity, ACID Transactions, Bitemporal, Triple Store
SLIDE: 13
Combination Query Example
 A call comes into your call center:
– "Some maniac in a blue van just tried to run me down"
– "I got the first three letters of his license plate: ABC"
 You need to look for similar incident reports
– Reports that mention a "blue van"
… around the same time
… around the same place
… with a license plate that starts with "ABC"
SLIDE: 14
Combination Query Example
<report>
<title>Suspicious vehicle near airport</title>
<date>2012-11-12Z</date>
<type>observation/surveillance</type>
<threat>
<type>suspicious activity</type>
<category>suspicious vehicle</category>
</threat>
<location>
<lat>37.497075</lat>
<long>-122.363319</long>
</location>
<description>A blue van with license plate ABC 123 was observed parked behind the airport sign…
<triple><subject>IRIID</subject><predicate>isa</predicate><object>license-plate</object></triple>
<triple><subject>IRIID</subject><predicate>value</predicate><object>ABC 123</object></triple>
</description>
</report>
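A rough sketch of the combination query idea: one filter that applies a text constraint, a time window, a location constraint, and a triple constraint together. This is a plain-Python model, not a MarkLogic query; the second report, the subject IDs, the function names, and the crude distance-in-degrees check are all assumptions made for illustration.

```python
from datetime import date
from math import hypot

# Illustrative data only: the second report and all subject IDs are made up.
REPORTS = [
    {
        "text": "A blue van was observed parked behind the airport sign",
        "date": date(2012, 11, 12),
        "lat": 37.497075,
        "long": -122.363319,
        "triples": [
            ("plate-1", "isa", "license-plate"),
            ("plate-1", "value", "ABC 123"),
        ],
    },
    {
        "text": "A red car was seen leaving the harbor",
        "date": date(2012, 6, 1),
        "lat": 40.0,
        "long": -100.0,
        "triples": [
            ("plate-2", "isa", "license-plate"),
            ("plate-2", "value", "XYZ 789"),
        ],
    },
]


def has_plate_prefix(triples, prefix):
    """True if a subject typed as a license plate has a value starting with prefix."""
    plates = {s for (s, p, o) in triples if p == "isa" and o == "license-plate"}
    return any(
        s in plates and p == "value" and o.startswith(prefix)
        for (s, p, o) in triples
    )


def similar_reports(reports, phrase, day, max_days, lat, long, max_degrees, prefix):
    """Keep reports that satisfy the text, time, place, and triple constraints together."""
    return [
        r
        for r in reports
        if phrase in r["text"].lower()
        and abs((r["date"] - day).days) <= max_days
        and hypot(r["lat"] - lat, r["long"] - long) <= max_degrees
        and has_plate_prefix(r["triples"], prefix)
    ]


hits = similar_reports(
    REPORTS, "blue van", date(2012, 11, 13), 7, 37.5, -122.36, 0.1, "ABC"
)
print([r["text"] for r in hits])
```

The point of the sketch is that all four constraints are evaluated against the same document, which is what embedding the triples in the report makes possible.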
SLIDE: 15
MarkLogic Semantics Use Cases
 Information Delivery Platforms
– Dynamic Publishing
– Custom Publishing
 Data Integration
– Customer 360
– Know Your Customer
– Patient 360 Repository
 Open Government Initiatives
 Regulatory compliance
 Data provenance
 Enterprise Reference Data
 Research Management
 Decision Support
 Metadata Catalogs
 Intelligence, fraud detection
SLIDE: 16
Better Together
Document Store + Data Store + Triple Store
Inference, Traversal
SLIDE: 17
Tech Specs
MarkLogic 7
 Store and manage hundreds of billions of RDF triples
 Query across documents, data, and triples
 Triple index for sub-second search results
 Triple cache for high performance across large clusters
 Bulk-load triples via MarkLogic Content Pump
 Provenance and reification by adding metadata
 SPARQL 1.0+ over REST or XQuery
 Standard SPARQL endpoint and graph store protocol support
 XQuery helper modules for serializations and transitive closures
 Updates, aggregates via MarkLogic APIs
 Semantic enrichment with partners (e.g. Smartlogic, Temis, NetOwl)
 Enterprise Features: ACID transactions, scalability and elasticity, HA/DR, government-grade security, monitoring and performance tools
MarkLogic 8
Everything in MarkLogic 7, plus:
 SPARQL 1.1 including updates and aggregates
 Graph traversal with property paths and transitive closures
 Automatic inference using rule sets
– Supplied rule sets for RDFS, RDFS+, OWL Horst
– Support for user-defined rule sets
 SPARQL calls from server-side programs with query restrictions
 SPARQL from server-side JavaScript, Node.js
SLIDE: 18
Agenda

Semantics Overview – 1-3-10

SPARQL 1.1 support
– Property paths
– SPARQL Aggregates

SPARQL Update

Inference

Performance

Q&A
SLIDE: 19
SPARQL 1.1 Support
 MarkLogic 8.0-1 includes support for all the major features of SPARQL 1.1
including paths, aggregates, and SPARQL Update
 Details: see https://wiki.marklogic.com/display/rootwiki/SPARQL+support+-+MarkLogic+8.0-1
 Contact [email protected] with enhancement requests
(or log a support ticket)
SLIDE: 20
Property paths
 SPARQL 1.1 Query – Property Paths
Property path operators added for MarkLogic 8:
– ? – zero or one path
– + – one or more path
– * – zero or more path
SLIDE: 21
Property paths - example
## find papers that cite paperA, and papers that cite papers that cite paperA, and so on
SELECT ?s
WHERE {
?s c:cites*/dc:title "Paper A" .
}
ORDER BY ?s
Note: taken from Bob DuCharme's "Learning SPARQL"
For more examples, see http://ea.marklogic.com/wp-content/uploads/2014/08/SPARQL-paths-examplesEA2.pdf
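A small sketch of what the + and * path operators compute over a citation graph, modeled in plain Python rather than SPARQL. The CITES data and the function name are assumptions for illustration. Note that * (zero or more) also matches the starting paper itself, while + (one or more) does not unless there is a citation cycle.

```python
# paper -> set of papers it cites (made-up data)
CITES = {
    "paperB": {"paperA"},
    "paperC": {"paperB"},
    "paperD": {"paperC", "paperE"},
    "paperE": set(),
}


def papers_reaching(target, cites, zero_hops):
    """Papers connected to target by a chain of cites edges.

    zero_hops=False models cites+ (one or more hops);
    zero_hops=True models cites* (zero or more hops, so target is included).
    """
    # Invert the edges so we can walk from a paper to the papers that cite it.
    cited_by = {}
    for paper, cited in cites.items():
        for c in cited:
            cited_by.setdefault(c, set()).add(paper)

    found = set()
    frontier = [target]
    while frontier:
        node = frontier.pop()
        for citer in cited_by.get(node, ()):
            if citer not in found:
                found.add(citer)
                frontier.append(citer)
    if zero_hops:
        found.add(target)
    return found


print(sorted(papers_reaching("paperA", CITES, zero_hops=False)))
print(sorted(papers_reaching("paperA", CITES, zero_hops=True)))
```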
SLIDE: 22
SPARQL aggregates
Aggregate functionality includes:
 COUNT
 SUM
 MIN
 MAX
 AVG
 SAMPLE
 GROUP_CONCAT
 GROUP BY
 GROUP BY <more than one item>
 GROUP BY .. HAVING <some aggregate variable>
 ORDER BY <some aggregate variable>
SLIDE: 23
SPARQL aggregates - example
## count how many companies are in each industry sector
SELECT ?industry ( COUNT ( ?company ) AS ?count_companies )
FROM <http://marklogic.com/semantics/sb/COMPANIES100/>
WHERE {
?company demov:industry ?industry .
}
GROUP BY ?industry
For more examples, see http://ea.marklogic.com/wp-content/uploads/2014/09/SPARQL-aggregates-examples-EA2.pdf
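A minimal sketch of what GROUP BY with COUNT computes for the query above, modeled in plain Python. The triples and the function name are made up for illustration; only the demov:industry predicate comes from the slide.

```python
from collections import Counter

# (company, predicate, industry) triples; the data is illustrative only.
TRIPLES = [
    ("co:1", "demov:industry", "Energy"),
    ("co:2", "demov:industry", "Energy"),
    ("co:3", "demov:industry", "Retail"),
    ("co:3", "demov:name", "Acme"),
]


def count_companies_by_industry(triples):
    """One solution per matching triple, grouped by ?industry and counted."""
    return Counter(o for (s, p, o) in triples if p == "demov:industry")


print(count_companies_by_industry(TRIPLES))
```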
SLIDE: 24
Agenda

Semantics Overview – 1-3-10

SPARQL 1.1 support

SPARQL Update
– SPARQL Update operations, examples
– Managed triples
– Graph permissions
– Locking

Inference

Performance

Q&A
SLIDE: 25
SPARQL UPDATE operations
1. GRAPH MANAGEMENT
Manipulate RDF graphs using the SPARQL 1.1 Update language
 CREATE – create a graph
 DROP – drop a graph and its contents
 COPY – make the destination graph into a copy of the source graph; any content in the destination
graph before this operation will be removed (think copy/paste)
 MOVE – move the contents of the source graph into the destination graph, and remove them from the
source graph; any content in the destination graph before this operation will be removed (think cut/paste)
 ADD – add the contents of the source graph into the destination graph; keep the source graph intact;
keep the initial contents of the destination graph intact
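The difference between COPY, MOVE, and ADD can be sketched with a dictionary that maps graph names to sets of triples. This is a plain-Python model of the semantics described above, not MarkLogic code; the function and graph names are hypothetical.

```python
def copy_graph(store, source, destination):
    """COPY: destination becomes a copy of source; prior destination content is lost."""
    if source != destination:
        store[destination] = set(store[source])


def move_graph(store, source, destination):
    """MOVE: like COPY, then the source graph is removed."""
    if source != destination:
        store[destination] = store.pop(source)


def add_graph(store, source, destination):
    """ADD: union source into destination; both keep their original triples."""
    store.setdefault(destination, set()).update(store[source])


store = {"g1": {("a", "p", "b")}, "g2": {("c", "p", "d")}}
add_graph(store, "g1", "g2")    # g2 now holds both triples, g1 is unchanged
copy_graph(store, "g1", "g3")   # g3 is a fresh copy of g1
move_graph(store, "g2", "g1")   # g1 takes g2's content, g2 is gone
print(sorted(store))
```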
SLIDE: 26
SPARQL UPDATE operations
2. GRAPH UPDATE
Delete, insert, and update (delete/insert) triples using the SPARQL 1.1 Update language.
 INSERT DATA
 DELETE DATA
 DELETE .. INSERT WHERE
 DELETE WHERE
 INSERT WHERE
 CLEAR
SLIDE: 27
SPARQL Update – example[1]
DROP SILENT GRAPH <graph-1> ;
CREATE GRAPH <graph-1> ;
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX prod: <http://example.com/products/>
PREFIX ex: <http://example.com/>
INSERT DATA
{
GRAPH <graph-1>
{
prod:1001 rdf:type ex:Henley ;
ex:color "blue" .
prod:1002 rdf:type ex:Shirt ;
ex:color "red" .
}
} ;
SLIDE: 28
SPARQL Update – example[2]
PREFIX prod: <http://example.com/products/>
PREFIX ex: <http://example.com/>
## change all blue products to (only) "azure"
WITH <http://marklogic.com/semantics/sb/products/>
DELETE
{
?prod ex:color ?c .
}
INSERT
{
?prod ex:color "azure" .
}
WHERE {
?prod ex:color "blue" .
## bind ?c to every color of the product so the DELETE template removes them all
?prod ex:color ?c .
}
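A sketch of the DELETE .. INSERT WHERE semantics behind the "change all blue products to (only) azure" update: the WHERE clause is evaluated first, then the matched triples are deleted and the new ones inserted. This is a plain-Python model over a set of triples; the function name and the data are assumptions.

```python
def recolor(graph, old_color, new_color):
    """Replace every color of products that have old_color with new_color only."""
    # WHERE: find the products that currently have the old color.
    products = {s for (s, p, o) in graph if p == "ex:color" and o == old_color}
    # DELETE: every color triple of those products.
    to_delete = {(s, p, o) for (s, p, o) in graph if p == "ex:color" and s in products}
    # INSERT: one new color triple per matched product.
    to_insert = {(s, "ex:color", new_color) for s in products}
    return (graph - to_delete) | to_insert


graph = {
    ("prod:1001", "ex:color", "blue"),
    ("prod:1001", "ex:color", "navy"),
    ("prod:1002", "ex:color", "red"),
}
print(sorted(recolor(graph, "blue", "azure")))
```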
SLIDE: 29
SPARQL Update – more examples
GRAPH MANAGEMENT:
 http://ea.marklogic.com/wp-content/uploads/2014/08/SPARQL-update-examples-graph-EA2.pdf
GRAPH UPDATE:
 http://ea.marklogic.com/wp-content/uploads/2014/08/SPARQL-update-examples-EA2.pdf
SLIDE: 30
SPARQL Update APIs
 Query Console – new mode "SPARQL Update"
 REST – post to /v1/graphs/sparql
 Server-side built-ins
– sem:sparql-update()
– sem.sparqlUpdate()
SLIDE: 31
Managed Triples
 SPARQL Update operations operate over managed triples only
 Managed triples are triples loaded into the database using
– mlcp with -input_file_type RDF
– sem:rdf-load()
– sem:rdf-insert()
 Compare with embedded triples
– Triples embedded in XML or JSON documents
– SPARQL Update operations don’t affect embedded triples
 [Strictly, a managed triple is a sem:triple element in a document with root element sem:triples]
SLIDE: 32
Graph Permissions
 Set permissions when you create a graph:
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
sem:sparql-update(
'CREATE GRAPH <graphs/sb/graph-1>',
(),(),(),
(
xdmp:permission( "demo-reader", "read" ),
xdmp:permission( "demo-writer", "update" )
)
)
 Note: arg5 is called $default-permissions , but you should set permissions explicitly
– See also sem:graph-set-permissions()
SLIDE: 33
Graph Permissions
 Check permissions on the graph you just created:
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
sem:graph-get-permissions(
sem:iri("graphs/sb/graph-1")
)
 See also
– sem:graph-set-permissions()
– sem:graph-add-permissions()
– sem:graph-remove-permissions()
SLIDE: 34
Graph Permissions
 Set permissions when you create a graph:
var sem = require('/MarkLogic/semantics');
sem.sparqlUpdate(
'CREATE GRAPH <graphs/sb/graph-2>',
[],[],[],
[
xdmp.permission( "demo-reader", "read" ),
xdmp.permission( "demo-writer", "update" )
]
)
 Note: arg5 is called $default-permissions , but you should set permissions explicitly
– See also sem.graphSetPermissions()
SLIDE: 35
Graph Permissions
 Check permissions on the graph you just created:
var sem = require('/MarkLogic/semantics');
sem.graphGetPermissions(
sem.iri("graphs/sb/graph-2")
)
 See also
– sem.graphSetPermissions()
– sem.graphAddPermissions()
– sem.graphRemovePermissions()
SLIDE: 36
Graph Permissions
 All SPARQL queries over managed triples are governed by the graph permissions
– … because managed triples documents will inherit those permissions at ingest
 Artefact: for each graph, you'll see a graph document
– The database URI of the graph document is the URI of the graph
– The graph document belongs to:
– a collection whose name is the URI of the graph
– a collection that contains all graph documents
i.e. http://marklogic.com/semantics#graphs
SLIDE: 37
SPARQL Update locking
 sem:sparql-update( $sparql , $bindings, $options, $store, $default-permissions),
“locking” option:
– read-write: read-lock documents containing triples being accessed, write-lock
documents being updated.
– write: Only write-lock documents being updated.
– Default is locking=read-write.
 sem:sparql( $sparql , $bindings, $options, $store), “locking” option:
– read-write: read-lock documents containing triples being accessed
– write: no locks (because sem:sparql() doesn't write)
– Default is locking=read-write. Locking is ignored in query transaction.
SLIDE: 38
Agenda

Semantics Overview – 1-3-10

SPARQL 1.1 support

SPARQL Update

Inference
– Backwards vs forwards chaining
– What are rules, and how do you use them?
– Tips on Inference

Performance

Q&A
SLIDE: 39
Backward-chaining vs Forward-chaining
Forward-chaining: At ingest time, insert a new triple
John livesIn England
[Figure: John livesIn London; London isIn England; inferred: John livesIn England]
Backward-chaining: At query time, return results as if a new triple existed
John livesIn England
 MarkLogic 8 does inference this way
SLIDE: 40
Backward-chaining vs Forward-chaining
Forward-chaining: At ingest time, insert a new triple
 Ingest (and update) is very slow
 More disk space
 Materialize every possible inferred triple
 Implications for security, ACID
 .. But queries are fast
[Figure: John livesIn London; London isIn England; inferred: John livesIn England]
Backward-chaining: At query time, return results as if a new triple existed
 Fast ingest, less disk space, security and ACID are straightforward, only do the work that's needed
 .. But work is done at query-time
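The trade-off can be sketched in plain Python using the John / London / England example. This is an illustrative model only, not how MarkLogic implements inference; the function names are assumptions. Forward chaining grows the stored set of triples, while backward chaining leaves the store untouched and does the extra lookups when the question is asked.

```python
BASE = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def forward_chain(triples):
    """Ingest-time inference: materialize inferred livesIn triples into the store."""
    store = set(triples)
    changed = True
    while changed:
        inferred = {
            (person, "livesIn", outer)
            for (person, p1, place) in store if p1 == "livesIn"
            for (inner, p2, outer) in store if p2 == "isIn" and inner == place
        }
        changed = not inferred <= store
        store |= inferred
    return store


def lives_in_backward(person, triples):
    """Query-time inference: answer as if the inferred triples existed, storing nothing."""
    places = {o for (s, p, o) in triples if s == person and p == "livesIn"}
    frontier = list(places)
    while frontier:
        place = frontier.pop()
        for (s, p, o) in triples:
            if s == place and p == "isIn" and o not in places:
                places.add(o)
                frontier.append(o)
    return places


print(len(forward_chain(BASE)))                  # stored triples after materialization
print(sorted(lives_in_backward("John", BASE)))   # same answer, nothing stored
```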
SLIDE: 41
Inference rules
 Choose an appropriate ruleset – the right level of inference
 Set a default ruleset for the database
 Specify a ruleset as part of your query
 Create your own ruleset
SLIDE: 42
Choose an appropriate ruleset
 Supplied rulesets are in $MARKLOGIC/Config
with .rules extension
 Common levels of inference:
– rdfs, rdfs-plus, owl-horst
– each has an optimized ruleset + "full"
 Rulesets are built up in a modular way, using import
SLIDE: 43
Rule - example

excerpt from subClassOf.rules
rule "subClassOf rdfs9" construct {
?x a ?c2
} {
?x a ?c1 .
?c1 rdfs:subClassOf ?c2 .
filter(?c1!=?c2)
}

Syntax is similar to SPARQL CONSTRUCT

We can read the rule as:
foreach x, c1, c2 where x is a c1 and c1 is a subclass of c2, construct "x is a c2"

e.g. prod001 is a henley; henley is a subclass of shirt; construct "prod001 is a shirt"
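To make the reading of the rule concrete, here is a single pass of the subClassOf rule over a set of triples, written as a plain-Python sketch. The function name and the data are assumptions; MarkLogic evaluates the rule itself at query time rather than through code like this.

```python
def apply_subclass_rule(triples):
    """One pass of: ?x a ?c1 . ?c1 rdfs:subClassOf ?c2 . filter(?c1 != ?c2) => ?x a ?c2"""
    return {
        (x, "a", c2)
        for (x, p, c1) in triples if p == "a"
        for (sub, q, c2) in triples
        if q == "rdfs:subClassOf" and sub == c1 and c1 != c2
    }


TRIPLES = {
    ("prod001", "a", "ex:Henley"),
    ("ex:Henley", "rdfs:subClassOf", "ex:Shirt"),
}
print(apply_subclass_rule(TRIPLES))
```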
SLIDE: 44
Ruleset – example – rdfs.rules
 # is a comment line
 Prefix – same as SPARQL
 Import other rulesets – modular
SLIDE: 45
Set a default ruleset(s) for the database
SLIDE: 46
Set a default ruleset(s) for the database
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
at "/MarkLogic/admin.xqy";
(: add "subClassOf.rules" as default ruleset for database "Documents" :)
let $config := admin:get-configuration()
let $dbid := admin:database-get-id($config, "Documents")
let $rules := admin:database-ruleset("subClassOf.rules")
let $c := admin:database-add-default-ruleset($config, $dbid, $rules)
return
admin:save-configuration($c)
(: See also:
admin:database-get-default-rulesets()
admin:database-delete-default-ruleset()
:)
SLIDE: 47
Specify a ruleset(s) as part of your query
(: create a store that uses the RDFS ruleset for inferencing :)
let $rdfs-store := sem:ruleset-store("rdfs.rules",sem:store( "no-default-rulesets" ) )
return
(: use the store you just created - pass it into sem:sparql() :)
sem:sparql('
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.com/>
SELECT ?product
FROM <http://marklogic.com/semantics/sb/products/inf-1>
WHERE
{
?product rdf:type ex:Shirt ;
ex:color "blue"
}
',
(),(),
$rdfs-store
)
 For full examples, see http://ea.marklogic.com/wp-content/uploads/2014/12/SPARQL-inference-examples-EA3-2.pdf
SLIDE: 48
Create your own ruleset
(: create a rules file and insert it into the Schemas database :)
(: Note: run this from Query Console with "Content Source" set to "Schemas" :)
xdmp:document-insert(
'/rules/livesin.rules' ,
text{
'
# my rules for inference
prefix ex: <http://example.com/>
prefix gn: <http://www.geonames.org/ontology#>
rule "lives in" construct {
?person ex:livesIn ?place2
} {
?person ex:livesIn ?place1 .
?place1 gn:parentFeature ?place2
}'
}
)
SLIDE: 49
Use your own ruleset
(: find places that John Smith lives in – with inferencing, using my ruleset :)
let $my-store := sem:ruleset-store("/rules/livesin.rules", sem:store() )
return
(: use the store you just created - pass it in to sem:sparql() :)
sem:sparql('
PREFIX ex: <http://example.com/>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?person ?placeName
FROM <http://marklogic.com/semantics/sb/customers/inf-1>
WHERE
{
?person ex:livesIn ?place .
?place gn:name ?placeName
}
ORDER BY ?person
', (), (),
$my-store
)
 For full examples, see http://ea.marklogic.com/wp-content/uploads/2014/12/SPARQL-inference-examples-EA3-2.pdf
SLIDE: 50
Inference rules - Summary
 Choose an appropriate ruleset
– the right level of inference
– can use more than one ruleset
 Set a default ruleset for the database
– Admin UI or XQuery/JavaScript API
 Specify a ruleset as part of your query
– create a sem:store using your ruleset location(s)
– include or override default ruleset
– ruleset location is resolved from Schemas database, then $MARKLOGIC/Config
 Create your own ruleset
– text file inserted in Schemas database
SLIDE: 51
Tips on Inference
 Use the fewest rules that you actually need
– Query performance slows as you add rules
– Database default + query-time ruleset(s) gives great flexibility
 Consider doing inference in your query, possibly with paths
– Gives you the most control, best performance, most predictable results
## find all blue shirts (including henleys) without inference
SELECT ?product
FROM <http://marklogic.com/semantics/sb/products/inf-1>
WHERE
{
?product rdf:type/rdfs:subClassOf* ex:Shirt .
?product ex:color "blue"
}
SLIDE: 53
Agenda

Semantics Overview – 1-3-10

SPARQL 1.1 support

SPARQL Update

Inference

Performance
– Improvements 7 to 8
– Inference performance
– Diagnosing slow queries

Q&A
SLIDE: 54
SPARQL Performance

8.0-1 will be ~10% faster than 7.0-4.1
– Especially for queries with larger joins
– Reduced overhead for common join algorithms

7.0-4.1 is ~10% faster than 7.0-4
– Query optimization fix

7.0-4 is 65% faster than 7.0-1 (23% faster than 7.0-3)
– Cost optimization improvements in 7.0-3
– Execution efficiency improvements in 7.0-4

Significant investment in performance planned for 8.0-2 to 9.0
SLIDE: 55
Inference Algorithms

Tableau or equivalent (Racer Pro, Pellet, etc.)
– Most powerful
– Severe scaling problems

Forward chaining
– Pay costs during ingest in disk space and time
– Hours and double disk space are not uncommon (bulk ingest, ontology changes)
– Materialized triples make queries fast

Backward chaining
– Pay costs during querying time in extra triple index lookups, CPU, and memory
– Ingest is fast
SLIDE: 56
Architectural Reasons for Backwards Chaining
 MarkLogic can query arbitrary subsets of triples (cts:query constraint)
– MarkLogic security is an important example of subsetting triples
– MVCC is also a subset based on document timestamps
 Very hard to efficiently query materialized inferences if arbitrary subsets of the database might
be queried
SLIDE: 57
Benefits of Backwards Chaining
 Flexibility!
 Choose your ontology at query time
– include or exclude triples with cts:query or named graphs
 Choose your rulesets at query time
 Perform inference across in memory triples and database triples
 Fast scale-out ingest
SLIDE: 58
Inference Performance
 Different performance profile from competitors that use forward-chaining inference
 Restricted triple patterns will perform much better
– ?s a :type
– ?s :hasA ?o
 Never query for ?s ?p ?o !
 Fewer rules are better. Twenty rules is a lot (RDFS+), ten is better (RDFS)
– Rules are applied recursively, with exponentially increasing complexity
– Rulesets are modular and can be flexibly combined
 Inference memory size may need increasing
– To complete the query execution, not to get better performance
SLIDE: 59
Alternatives to Automatic Inference
Property Paths in SPARQL
Query:
?s a foaf:Document
Rule:
rule "rdfs9" construct {
?x a ?c2
}{
?x a ?c1 .
?c1 rdfs:subClassOf ?c2
}
Equivalent query using a property path, without the rule:
?s a/rdfs:subClassOf* foaf:Document
SLIDE: 60
Alternatives to Automatic Inference
Bulk Materialization
 If the rulesets, ontologies, security, and data are static or change infrequently
 Consider materializing inferred triples as a one-off
– Use “?s ?p ?o” query with built in inference, piped into sem:rdf-insert()
– or use the rulesets as the basis for SPARQL CONSTRUCT queries
SLIDE: 61
Recursive Rule Application
?s a foaf:Document
rule "rdfs9" construct {
?x a ?c2
}{
?x a ?c1 .
?c1 rdfs:subClassOf ?c2
}
Bindings: ?x = ?s, ?c2 = foaf:Document
SLIDE: 62
Recursive Rule Application
?s a foaf:Document
rule "rdfs9" construct {
?x a ?c2
}{
?x a ?c1 .
?c1 rdfs:subClassOf ?c2
}
rule "rdfs11" construct {
?c1 rdfs:subClassOf ?c3
}{
?c1 rdfs:subClassOf ?c2 .
?c2 rdfs:subClassOf ?c3
}
SLIDE: 63
Recursive Rule Application
?s a foaf:Document
rule "rdfs9" construct {
?x a ?c2
}{
?x a ?c1 .
?c1 rdfs:subClassOf ?c2
}
rule "rdfs2" construct {
?x a ?c
}{
?x ?p ?y .
?p rdfs:domain ?c
}
rule "rdfs11" construct {
?c1 rdfs:subClassOf ?c3
}{
?c1 rdfs:subClassOf ?c2 .
?c2 rdfs:subClassOf ?c3
}
SLIDE: 64
Making a SPARQL Query Fast
SPARQL with Inference?
 Try using a smaller ruleset
– owl-horst.rules > rdfs-plus.rules > rdfs.rules
– or try combining the rulesets for the specific ontology predicates/types you’re using
 Try using a smaller ontology, or a smaller set of data
– restrict to a named graph or cts:query
 Turn on the “SPARQL Execution” trace flag to log the triple index lookups as they happen
 Are you querying for a large result set?
– Slowest inferences are just accessing a lot of data
SLIDE: 65
Making a SPARQL Query Fast
Optimization Time

Optimization happens when a query is first seen

Cached by string value of the query
– Re-optimized after ~5 minutes to read new statistics

Don’t count optimization time in query time

Warm the cache up with the “prepare” option to sem:sparql()

Pass bindings to sem:sparql(), don’t use string concatenation (safer and faster)

Use higher levels of optimization (i.e. “optimize=2”) for bigger or problematic queries
– More time spent optimizing can find a better query plan
– Trade-off between planning and doing
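Why bindings help can be sketched with a toy plan cache keyed by the exact query string, as described above. This is a plain-Python model under stated assumptions: the PlanCache class is hypothetical, and it ignores the periodic re-optimization. With bindings the query text stays constant, so it is optimized once; with string concatenation each value produces a new string and a new optimization.

```python
class PlanCache:
    """Toy model of a query-plan cache keyed by the exact query string."""

    def __init__(self):
        self.plans = {}
        self.optimizations = 0  # how many times a query had to be optimized

    def plan_for(self, query_text):
        if query_text not in self.plans:
            self.optimizations += 1
            self.plans[query_text] = "plan for: " + query_text
        return self.plans[query_text]


COLORS = ["blue", "red", "azure"]

# Bindings: the query text never changes, only the bound value does.
bound = PlanCache()
for color in COLORS:
    bound.plan_for("SELECT ?p WHERE { ?p ex:color ?color }")

# Concatenation: every value yields a new query string, so a new optimization.
concatenated = PlanCache()
for color in COLORS:
    concatenated.plan_for('SELECT ?p WHERE { ?p ex:color "' + color + '" }')

print(bound.optimizations, concatenated.optimizations)
```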
SLIDE: 66
Making a SPARQL Query Fast
Serialization

RDF serialization can be slow (sem:rdf-serialize(), sem:query-results-serialize())
– Avoid and use the sequence of maps directly if possible
– Return aggregates using fn:count() or SPARQL 1.1 Aggregates in ML 8.0

The SPARQL endpoint uses sem:query-results-serialize().
– May get better results using XQuery and sem:sparql()
SLIDE: 67
Making a SPARQL Query Fast
If You Need Support to Speed Up a Query

Contact MarkLogic

File a support case
credit: http://commons.wikimedia.org/wiki/User:Takkk
SLIDE: 68