Download 32N1634 DBSG Graph Query Language 07-02

JTC1 SC32N1634
The Graph Query Language: Towards a Unification of Graph Query Approaches
David Silberberg
443-778-6231
Outline
 Goals & Example Scenario
 Key Features of GQL
 Computational Complexity of Query Execution
 Future Directions
Heritage Style Viewgraphs
Goals of the Graph Query Language (GQL) Project
 To unify disparate graph query approaches into a single,
seamless, and declarative language
 Supports semantic search over graph data structures
represented by schemas
 Supports traditional graph algorithms that systematically
follow edges to discover interesting subgraphs (e.g., shortest
path, minimum spanning tree)
 Supports metrics-oriented graph algorithms (e.g., social
network analysis)
 Supports special commands tailored to analysis of graphs
 Supports ontology-assisted query
 To quantify the scalability of this type of language
Assumptions
 Data model is a typed graph that adheres to a schema
 Not XML – graphs tend to be more highly connected
 Not a semantic model – inference cannot, in general, be
performed on the schema
 Data graphs can be large
 Query languages are only an abstract representation of questions
 The object is finding the right abstraction for the way people
think about interacting with graphs
 Other query languages onto other data models will work – but
do those languages help facilitate or hinder the formulation of
those requests or the interpretation of the results?
 Algorithms are external to the graph management system
 There are too many algorithms
 New algorithms may be implemented or modified regularly
 We are not the experts in writing efficient algorithms
Graph Interaction Methods
 Graph interactions take many forms
 Browse
 One-step-at-a-time exploration of a graph
 Semantic Schema-Based Search
 Several-steps-at-a-time graph query
 Algorithms
 Find subgraphs
 Calculate graph metrics
 Analysis
 Hypothesis expressions, etc.
 GQL is a declarative graph query language for integrating all
these approaches!
Example Scenario
 Farmer Jones' lettuce crop did well this year, but few other
farmers did well. Why?
 First, find Farmer Jones. (Browsing)
[Figure: graph containing the vertex Jones]
Example Scenario
 Rabbits usually eat lettuce. Let's find the rabbits that ate
Farmer Jones' lettuce. (Semantic Schema-Based Search)
[Figure: graph of the vertices Jones, Prize, Roman, Icy, Bugs, and Harvey]
Example Scenario
 Let's look at all the farmers, and their locations, whose lettuce
was eaten by fewer than 5 rabbits. (Semantic Schema-Based
Search)
[Figure: graph of the vertices Jones, Smith, Harris, Prize, Roman, Icy, Leafy, Soft, Crispy, Green, Tasty, Bugs, Harvey, Peter, and Smalltown, USA]
Example Scenario
 What commonalities do the farmers have with each other and
with the rabbits? (Semantic and Algorithmic Search)
[Figure: the previous graph extended with the vertices Acme Rent-a-Fox, Red, and Sly]
Example Scenario
 If Fred fox ate Prize lettuce, what else would we learn?
(Analysis-specific Methods, Semantic Search, and
Algorithmic Search)
[Figure: the previous graph extended with the vertices Fox Enterprises, Fred, and Brer]
Outline
 Goals & Example Scenario
 Key Features of GQL
 Computational Complexity of Query Execution
 Future Directions
Related Work
 Four categories of graph query languages and
examples
1. Knowledge base (subject-predicate-object) query languages
 SPARQL, RQL, RAL, RDF Query Language
2. Graph reasoning query languages
 OWL-QL, GraphLog, Query and Inference Service for
RDF
3. Query languages with graph operators
 GOQL
 GRAM
4. Graphical user interface query language
 QGRAPH
Features of GQL that Support Analysis
 Schema-based graph query
 Returns a single graph or a set of graphs (not tables or XML files)
 Aliasing
 Graph exploration through wildcard search
 Embedded queries (helps achieve first order logic expressiveness)
 Creates new graph structures in query results
 Query over defined patterns (of activity or behavior, for example)
 Special commands tailored to analysis
 Hypothesis expressions
 Composite vertices (of vertices and edges)
 External algorithms that return graphs (e.g., shortest path)
 External algorithms that return metrics (e.g., social network
analysis)
 Ontology-assisted graph query
Example Graph Model
[Figure: graph schema and example instance graph]
Schema:
Fox (name, age) Chases (time) Rabbit (name, age)
Fox (name, age) Eats (time) Rabbit (name, age)
Rabbit (name, age) Eats (time) Lettuce (name)
Rabbit (name, age) Eats (time) Carrot (name)
Instance vertices:
Fox: fox1 (name: George, age: 3)
Fox: fox2 (name: Fred, age: 2)
Rabbit: rabbit1 (name: Peter, age: 2)
Rabbit: rabbit2 (name: Bugs, age: 4)
Rabbit: rabbit3 (name: Jack, age: 1)
Lettuce: lettuce1 (name: PrizeLettuce)
Lettuce: lettuce2 (name: Icy)
Carrot: carrot1 (name: CarrotTop)
Carrot: carrot2 (name: BigCarrot)
Instance edges:
Chases: chases1 (time: 2pm)
Chases: chases2 (time: 5pm)
Eats: eats1 (time: 3pm)
Eats: eats2 (time: 9am)
Eats: eats3 (time: 8am)
Eats: eats4 (time: 7pm)
Eats: eats5 (time: 7am)
Eats: eats6 (time: 8am)
GQL Operators - Overview
 Basic Syntax
 SUBGRAPH clause
 Finds a subgraph in the source graph
 CONSTRAINT clause
 Filters the subgraph based on property constraints
 RETURN clause
 Describes the resulting graph or sets of graphs to return
 Syntax for analysis
 ASSUME clause
 Supports hypothesis statements
 PATTERN clause
 Defines search patterns
Simple Query that Returns a Single Graph
SUBGRAPH Fox Chases Rabbit AND Fox Eats Rabbit
CONSTRAINT Chases.time < Eats.time
RETURN Fox Chases Rabbit AND Fox Eats Rabbit
Result:
Fox: fox1 (name: George, age: 3) Chases: chases1 (time: 2pm) Rabbit: rabbit1 (name: Peter, age: 2)
Fox: fox1 Eats: eats1 (time: 3pm) Rabbit: rabbit1
 The type name serves as the variable
 Motivated by languages like SQL
 In contrast to explicit variables such as (Fox ?f1)
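The query above can be read as a join followed by a filter. The following is a minimal Python sketch of that reading, not the GQL implementation: the tuple layout, the 24-hour encoding of times, and the function name are assumptions made for illustration, and the edge endpoints are taken from the results shown on the slides.

```python
from itertools import product

# Edges from the example instance graph: (edge id, fox, rabbit, hour on a 24-hour clock).
CHASES = [("chases1", "fox1", "rabbit1", 14), ("chases2", "fox1", "rabbit2", 17)]
EATS = [("eats1", "fox1", "rabbit1", 15), ("eats2", "fox2", "rabbit3", 9)]


def fox_chases_then_eats(chases, eats):
    """Return (chases id, eats id) pairs that share a Fox and a Rabbit
    and satisfy Chases.time < Eats.time."""
    matches = []
    for chase, eat in product(chases, eats):
        c_id, c_fox, c_rabbit, c_time = chase
        e_id, e_fox, e_rabbit, e_time = eat
        # SUBGRAPH: both path segments must bind the same Fox and the same Rabbit.
        same_endpoints = c_fox == e_fox and c_rabbit == e_rabbit
        # CONSTRAINT: the chase must precede the meal.
        if same_endpoints and c_time < e_time:
            matches.append((c_id, e_id))
    return matches


result = fox_chases_then_eats(CHASES, EATS)
print(result)
```

The nested loop over all pairs is deliberately naive; the path-segment algorithms later in the deck describe cheaper ways to evaluate each segment.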
Returning a Set of Graphs
 Can be done with edge expansion or joins in the RETURN clause
 Can be seamlessly integrated with non-graph expansion
expressions
 Any query can be returned as a set of graphs if desired
SUBGRAPH Fox Chases Rabbit
RETURN Fox Chases# Rabbit
Result graph 1: Fox: fox1 (name: George, age: 3) Chases: chases1 (time: 2pm) Rabbit: rabbit1 (name: Peter, age: 2)
Result graph 2: Fox: fox1 (name: George, age: 3) Chases: chases2 (time: 5pm) Rabbit: rabbit2 (name: Bugs, age: 4)
Aliasing
SUBGRAPH
Fox ALIAS ChasingFox Chases Rabbit AND
Fox ALIAS EatingFox Eats Rabbit
CONSTRAINT ChasingFox.name <> EatingFox.name
RETURN
ChasingFox Chases Rabbit AND
EatingFox Eats Rabbit
 If our graph had an additional edge in which George Fox chased
Jack Rabbit at 8 a.m., the result would look like:
Result:
Fox: fox1 (name: George, age: 3) Chases: chases3 (time: 8am) Rabbit: rabbit3 (name: Jack, age: 1)
Fox: fox2 (name: Fred, age: 2) Eats: eats2 (time: 9am) Rabbit: rabbit3
Embedded Queries
 Significant component of first order logic expressiveness
 To request the first fox that ate a rabbit, the following
existential query is formulated:
SUBGRAPH Fox Eats ALIAS E1 Rabbit
CONSTRAINT NOT EXISTS
(SUBGRAPH Fox Eats ALIAS E2 Rabbit
CONSTRAINT E1.time > E2.time)
RETURN Fox Eats Rabbit
Result: Fox: fox2 (name: Fred, age: 2) Eats: eats2 (time: 9am) Rabbit: rabbit3 (name: Jack, age: 1)
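The NOT EXISTS constraint keeps an Eats edge E1 only when no other Eats edge E2 is earlier. A small Python sketch of that semantics, assuming a hypothetical tuple layout and hours on a 24-hour clock:

```python
# Fox-Eats-Rabbit edges from the example instance graph: (edge id, fox, rabbit, hour).
EATS = [("eats1", "fox1", "rabbit1", 15), ("eats2", "fox2", "rabbit3", 9)]


def first_eats(eats):
    """Keep each edge E1 for which no edge E2 satisfies E1.time > E2.time."""
    return [e1 for e1 in eats if not any(e1[3] > e2[3] for e2 in eats)]


earliest = first_eats(EATS)
print(earliest)
```

Evaluated this way the embedded query costs a pass over all Eats edges per candidate edge, which is one reason the treatment of intermediate results matters for optimization.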
New Result Graph Structure Query
SUBGRAPH Fox Eats Rabbit AND Rabbit Eats Lettuce
RETURN Fox new(Ingests) Lettuce
Result:
Fox: fox1 (name: George, age: 3) Ingests: ingests1 Lettuce: lettuce1 (name: PrizeLettuce)
Fox: fox2 (name: Fred, age: 2) Ingests: ingests3 Lettuce: lettuce2 (name: Icy)
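One way to picture new(Ingests) is as a join on the shared Rabbit that emits a fresh edge for every Fox and Lettuce pair. The Python sketch below illustrates this; the function name is hypothetical, and the Rabbit-to-Lettuce endpoints are an assumption chosen to agree with the result shown on the slide.

```python
# Fox Eats Rabbit endpoints from the example instance graph.
FOX_EATS_RABBIT = [("fox1", "rabbit1"), ("fox2", "rabbit3")]
# Assumed endpoints for Rabbit Eats Lettuce, consistent with the slide's result.
RABBIT_EATS_LETTUCE = [("rabbit1", "lettuce1"), ("rabbit3", "lettuce2")]


def derive_ingests(fox_eats, rabbit_eats):
    """Join the two path segments on Rabbit and return new (fox, lettuce) edges."""
    lettuce_by_rabbit = {}
    for rabbit, lettuce in rabbit_eats:
        lettuce_by_rabbit.setdefault(rabbit, []).append(lettuce)
    return [
        (fox, lettuce)
        for fox, rabbit in fox_eats
        for lettuce in lettuce_by_rabbit.get(rabbit, [])
    ]


ingests = derive_ingests(FOX_EATS_RABBIT, RABBIT_EATS_LETTUCE)
print(ingests)
```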
Hypothesis Expressions
 Enables queries on hypothetical data
SUBGRAPH
Fox Chases Rabbit AND
Fox Eats Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Chases.time < ‘8am’
RETURN
Fox new(Ingests) Lettuce
ASSUME
EDGE Chases [NEW time = ‘7am’]
FROM Fox[CONSTRAINT name = ‘Fred’]
TO Rabbit[CONSTRAINT name = ‘Jack’]
 Motivated by OWL-QL
Composite Vertices
 Composite vertices
 Composed of vertices and edges
 Contained vertices can be composite as well
[Figure: the graph schema extended with a Place vertex, an OccuredAt edge, and a composite HuntingEvent vertex over the Fox, Rabbit, Lettuce, and Carrot schema]
Composite Vertex Queries - continued
SUBGRAPH HuntingEvent OccuredAt Place AND
HuntingEvent DIRECTLY CONTAINS Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Place.name = ‘Smith Game Park’
RETURN Rabbit Eats Lettuce
[Figure: result of the form Rabbit (name, age) Eats (time) Lettuce (name)]
 Addresses a subset of Harel's Higraphs
 Multiple hops
 CONTAINS or IS-CONTAINED-BY
 Feasible because of the hierarchy
Wildcard Queries
SUBGRAPH Fox * ALIAS InterestingEdge Rabbit
RETURN Fox InterestingEdge Rabbit
Result:
Fox: fox1 (name: George, age: 3) Chases: chases1 (time: 2pm) Rabbit: rabbit1 (name: Peter, age: 2)
Fox: fox1 Eats: eats1 (time: 3pm) Rabbit: rabbit1
Fox: fox1 Chases: chases2 (time: 5pm) Rabbit: rabbit2 (name: Bugs, age: 4)
Fox: fox2 (name: Fred, age: 2) Eats: eats2 (time: 9am) Rabbit: rabbit3 (name: Jack, age: 1)
 One edge wildcard queries
 Multiple hops
 May be computationally expensive in a graph
 Can be handled by an external AllPath() algorithm
Pattern Definition
 Assigns names to interesting graph patterns
 Can be reused in multiple queries
PATTERN Predator (Fox new(PreysUpon) Rabbit) =
SUBGRAPH Fox Chases Rabbit AND
Fox Eats Rabbit
CONSTRAINT Chases.time < Eats.time
RETURN
Fox new(PreysUpon) Rabbit
Pattern Use
 Query:
SUBGRAPH Predator(Fox PreysUpon Rabbit) AND
Rabbit Eats Lettuce
RETURN Fox new(Ingests) Lettuce
 Is evaluated as if it were:
SUBGRAPH Fox Chases Rabbit AND
Fox Eats Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Chases.time < Eats.time
RETURN Fox new(Ingests) Lettuce
External Graph Algorithms that Return Subgraphs
 Shortest Path
SUBGRAPH GameWarden Chases Fox AND
ShortestPath(Fox, Rabbit) ALIAS SP_alias AND
Rabbit Eats Lettuce
RETURN GameWarden Chases Fox AND
SP_alias AND
Rabbit Eats Lettuce
 Adjacent Vertices
SUBGRAPH AdjacentVertices(Rabbit) ALIAS AV_alias
CONSTRAINT count_edges(Rabbit) > 10
RETURN AV_alias
External Graph Algorithms that Return Metrics
 Centrality: Find the Foxes that eventually Eat the Rabbits, who play a
central role in the garden activities
SUBGRAPH Fox Eats Rabbit
CONSTRAINT Centrality (Fox, Rabbit, Lettuce) > .8
RETURN Fox Eats Rabbit
 Clustering Coefficient: Find the Foxes that are likely to work together
when Chasing Rabbits
SUBGRAPH Fox ALIAS Fox1 Chases Rabbit AND
Fox ALIAS Fox2 Chases Rabbit
CONSTRAINT ClusteringCoefficient (Fox1, Fox2) > .6
AND Fox1 <> Fox2
RETURN Fox Eats Rabbit
Some Issues with External Algorithms
 Algorithms do not filter results; they operate directly on the
graph and tie into the rest of the results
 Algorithms need to return a set of graphs (or a graph under some
circumstances) in a standard format
 Order of query execution
 No current way to refer to the result vertices and edges of
algorithms that are not specifically identified in the query
SUBGRAPH AdjacentVertices(Rabbit) ALIAS AV_alias
CONSTRAINT
ClusteringCoefficient (<Vertex1 ?>, <Vertex2 ?>) > .6
RETURN
<Vertex1 ?> <Edge1 ?> Rabbit AND
<Vertex2 ?> <Edge2 ?> Rabbit
Ontology Assisted Query
[Figure: an ontology (Organism, Animal, Vegetable, Carnivore, Herbivore, Wolf, Fox, Hare, Sheep, Lettuce, and Carrot, linked by isA edges, with Chases and Eats relations) connected by mappings to the graph schema (Fox, Rabbit, Lettuce, and Carrot with Chases and Eats edges)]
Ontology-Assisted Query Result
SUBGRAPH Carnivore Eats Herbivore AND
Herbivore Eats Vegetable
RETURN Carnivore new(Ingests) Vegetable
Result:
Fox: fox1 (name: George, age: 3) Ingests: ingests1 Lettuce: lettuce1 (name: PrizeLettuce)
Fox: fox2 (name: Fred, age: 2) Ingests: ingests3 Lettuce: lettuce2 (name: Icy)
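An ontology-assisted query can be pictured as rewriting each ontology concept into the schema vertex types that it subsumes before the graph is searched. The Python sketch below shows that expansion step only. The isA table and the concept-to-schema mapping (in particular Hare to Rabbit) are assumptions reconstructed for illustration, and the function name is hypothetical.

```python
# Assumed isA hierarchy: child concept -> parent concept.
IS_A = {
    "Animal": "Organism", "Vegetable": "Organism",
    "Carnivore": "Animal", "Herbivore": "Animal",
    "Wolf": "Carnivore", "Fox": "Carnivore",
    "Hare": "Herbivore", "Sheep": "Herbivore",
    "Lettuce": "Vegetable", "Carrot": "Vegetable",
}
# Assumed mapping from ontology concepts to graph schema vertex types.
MAPPING = {"Fox": "Fox", "Hare": "Rabbit", "Lettuce": "Lettuce", "Carrot": "Carrot"}


def schema_types(concept):
    """Return the schema vertex types mapped from `concept` or any concept below it."""
    def is_subsumed(candidate):
        while candidate is not None:
            if candidate == concept:
                return True
            candidate = IS_A.get(candidate)  # walk up the isA chain
        return False

    return sorted(MAPPING[c] for c in MAPPING if is_subsumed(c))


print(schema_types("Carnivore"), schema_types("Herbivore"), schema_types("Vegetable"))
```

Under these assumptions, Carnivore Eats Herbivore expands to Fox Eats Rabbit, and Herbivore Eats Vegetable expands to Rabbit Eats Lettuce or Rabbit Eats Carrot.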
Some Issues of Ontology-Assisted Query
 Why not just have an ontology query language?
 Performance issues?
 Scaling issues?
 Capitalize on features that semantics bring to bear on a graph
query language
 Semantic abstraction (e.g., subsumption, hierarchy)
 Use inference to create semantically consistent models
 Impose semantics on the graph model
Outline
 Goals & Example Scenario
 Key Features of GQL
 Computational Complexity of Query Execution
 Future Directions
Query Optimization
 Query execution time is the key to success for any query language
– GQL is no exception
 We apply relational database optimization techniques to graph
queries
 Optimization issues
 Addressed query optimization on a per path-segment basis – yes
 Address path-segment ordering – initial thoughts
 Address the management of large amounts of intermediate
results of a query – not yet
 Address incorporating external algorithms – not yet
 Address ontology elaboration performance – not yet
Query Optimization
 Query plan representations are used to define query
execution plans
 Query plan representations are constructed to optimize
the query execution time
 Via graph algebra
 Via graph statistics to estimate query costs for each
operation
 Query optimizer determines
 The best algorithm to execute each operation
 The best operation ordering to optimize overall query
execution time
Graph Statistics
 Estimating costs requires statistical knowledge of the graph
 We estimate the cost of the path segment operator
 One of the most common and costly operations
 Statistics that we initially considered useful:
 Vertex Cardinality: The number of vertices of type v is count(v) or just V.
 Vertex Edge Set Cardinality: The total number of edges e that emanate
from all vertices of type v is count(ev) or just EV.
 Edge Cardinality: The number of edges of type e is count(e) or just E.
 Edge Distribution: The number of different vertex type pairs that edges of
type e connect, or just ED.
 Selectivity Factor: The fraction of vertices or edges that match a
property constraint c is sel(c).
 Uniformity assumption
 Independence assumption
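The statistics above can be computed directly from a small in-memory graph. The following Python sketch does so for a hypothetical toy graph; the data layout and function names are assumptions made for illustration.

```python
# Hypothetical toy graph: vertex id -> type, and edges as (edge type, source, target).
VERTEX_TYPE = {
    "fox1": "Fox", "fox2": "Fox",
    "rabbit1": "Rabbit", "rabbit2": "Rabbit", "rabbit3": "Rabbit",
}
EDGES = [
    ("Chases", "fox1", "rabbit1"), ("Chases", "fox1", "rabbit2"),
    ("Eats", "fox1", "rabbit1"), ("Eats", "fox2", "rabbit3"),
]
RABBIT_AGE = {"rabbit1": 2, "rabbit2": 4, "rabbit3": 1}


def vertex_cardinality(v_type):
    """V: number of vertices of type v."""
    return sum(1 for t in VERTEX_TYPE.values() if t == v_type)


def vertex_edge_set_cardinality(v_type):
    """EV: total number of edges emanating from vertices of type v."""
    return sum(1 for _, source, _ in EDGES if VERTEX_TYPE[source] == v_type)


def edge_cardinality(e_type):
    """E: number of edges of type e."""
    return sum(1 for edge_type, _, _ in EDGES if edge_type == e_type)


def edge_distribution(e_type):
    """ED: number of distinct (source type, target type) pairs joined by edges of type e."""
    return len({(VERTEX_TYPE[s], VERTEX_TYPE[t]) for et, s, t in EDGES if et == e_type})


def selectivity(values, predicate):
    """sel: fraction of the values that satisfy the property constraint."""
    return sum(1 for value in values if predicate(value)) / len(values)


print(vertex_cardinality("Fox"), vertex_edge_set_cardinality("Fox"),
      edge_cardinality("Chases"), edge_distribution("Chases"),
      selectivity(RABBIT_AGE.values(), lambda age: age < 3))
```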
Path Segment – Vertex Search, No Indices
 Algorithm
 Iterate through a set of vertices of type v in O(V) time
 For each vertex, iterate through its edge list to find
edges of type e in O(EV/V) time
 Follow the edge to vertex w in constant time
 Execution time is O(V*(EV/V)) = O(EV)
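The unindexed vertex search can be sketched as two nested loops. This is a minimal Python illustration, assuming adjacency lists held in dictionaries; the names and data layout are hypothetical.

```python
def path_segment_vertex_scan(vertices_by_type, out_edges, v_type, e_type):
    """Find all (v, e_type, w) path segments by scanning every vertex of type v.

    vertices_by_type: vertex type -> list of vertex ids
    out_edges: vertex id -> unsorted list of (edge type, target id)
    """
    results = []
    for v in vertices_by_type.get(v_type, []):       # O(V) vertices
        for edge_type, w in out_edges.get(v, []):    # O(EV/V) edges per vertex on average
            if edge_type == e_type:
                results.append((v, e_type, w))       # following the edge is constant time
    return results


VERTICES_BY_TYPE = {"Fox": ["fox1", "fox2"]}
OUT_EDGES = {
    "fox1": [("Chases", "rabbit1"), ("Eats", "rabbit1"), ("Chases", "rabbit2")],
    "fox2": [("Eats", "rabbit3")],
}
segments = path_segment_vertex_scan(VERTICES_BY_TYPE, OUT_EDGES, "Fox", "Chases")
print(segments)
```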
Path Segment – Indices on Vertex Edge Set
 Requires each edge set to be indexed through a logarithmic-time search
tree (e.g., B+ tree)
 Next values are (virtually) collocated with the matching value
 Enables a constant time search for the next value(s)
 Algorithm
 Iterate through vertices of type v in time O(V)
 Find matching edge(s) in logarithmic time O(log(EV/V))
 Iterate through the matching edges in time O(E/(ED*V))
 Execution time is O(V * (log(EV/V) + E/(ED*V))) = O(V*log(EV/V) + E/ED)
 If ED ≈ E (i.e., one edge of type e emanates from each v), then the algorithm
tends to operate in time O(V*log(EV/V))
 If ED ≈ E and EV ≈ V, the algorithm tends to operate in time O(V)
 If ED ≈ E and EV >> V, the algorithm tends to operate in time O(V*log(EV))
 If ED >> E, then the algorithm tends to operate in time O(E/ED)
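The indexed variant replaces the inner scan with a logarithmic search followed by a walk over collocated matches. In the Python sketch below, a sorted list searched with `bisect_left` stands in for the B+ tree; that substitution, the names, and the data layout are assumptions made for illustration.

```python
from bisect import bisect_left


def matching_targets(sorted_edges, e_type):
    """sorted_edges: list of (edge type, target id) tuples in ascending order."""
    # Logarithmic search: the 1-tuple (e_type,) sorts before every (e_type, target).
    i = bisect_left(sorted_edges, (e_type,))
    targets = []
    # Matching edges are adjacent in the sorted list, so each next one is found in O(1).
    while i < len(sorted_edges) and sorted_edges[i][0] == e_type:
        targets.append(sorted_edges[i][1])
        i += 1
    return targets


def path_segment_indexed(vertices, edge_index, e_type):
    """Return (v, w) pairs for every vertex v whose indexed edge set contains e_type edges."""
    return [(v, w) for v in vertices for w in matching_targets(edge_index.get(v, []), e_type)]


EDGE_INDEX = {
    "fox1": sorted([("Chases", "rabbit1"), ("Eats", "rabbit1"), ("Chases", "rabbit2")]),
    "fox2": sorted([("Eats", "rabbit3")]),
}
chased = path_segment_indexed(["fox1", "fox2"], EDGE_INDEX, "Chases")
eaten = path_segment_indexed(["fox1", "fox2"], EDGE_INDEX, "Eats")
print(chased, eaten)
```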
Path Segment – Edge Indices, Constraint
 Beneficial when the query includes a constraint on an indexed property of
vertices of type v, with selectivity factor sel(v)
 Vertex edge sets are indexed as well
 Algorithm
 Perform a logarithmic-time search through the indexed property in time O(log(V))
 Iterate through vertices (collocated in the index) that satisfy the constraint in time
O(sel(v)*V)
 Perform a logarithmic-time search on the edges of each matching vertex in time
O(log(EV/V))
 Iterate through the matching edges in time O(E/(ED*V))
 Execution time is O(log(V) + sel(v)*V*(log(EV/V) + E/(ED*V))) = O(log(V) +
sel(v)*V*log(EV/V) + sel(v)*E/ED)
 If sel(v) ≈ 0, the dominant factor is the search for vertices, or O(log(V))
 If the selectivity factor is higher, the execution time approaches the times of the
previous slide
Path Segment – Edge Search, No Indices
 Algorithm
 Iterate over edges of type e and select those that connect v to w in
time O(E)
 Find the corresponding vertices in constant time
 Execution time is O(E)
Path Segment – Edge Search, Constraint
 Beneficial when the query statement includes a constraint on an
indexed property of edges of type e, with selectivity factor sel(e)
 Algorithm
 Perform a logarithmic-time search through properties to find the
first matching edge in time O(log(E))
 Perform a linear search through all subsequent matching edges in
time O(sel(e)*E)
 Find both vertices attached to each edge in constant time
 Execution time is O(log(E) + sel(e)*E)
 If sel(e) ≈ 0, the algorithm tends to an execution time of O(log(E))
 Otherwise, the algorithm tends to an execution time of O(E)
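The edge search with an indexed property can be sketched in the same way: locate the first matching edge with a logarithmic search, then walk forward while the constraint holds. The Python sketch assumes a sorted list as a stand-in for the index, a range constraint on time, and hours on a 24-hour clock; all names are hypothetical.

```python
from bisect import bisect_left


def edges_in_time_range(index, low, high):
    """index: list of (time, edge id, source, target) sorted by time.

    Returns the edges with low <= time < high.
    """
    i = bisect_left(index, (low,))                 # O(log(E)) search for the first match
    matches = []
    while i < len(index) and index[i][0] < high:   # O(sel(e)*E) walk over the matches
        matches.append(index[i])                   # both endpoints are stored on the edge
        i += 1
    return matches


EATS_BY_TIME = sorted([(15, "eats1", "fox1", "rabbit1"), (9, "eats2", "fox2", "rabbit3")])
morning = edges_in_time_range(EATS_BY_TIME, 0, 12)
afternoon = edges_in_time_range(EATS_BY_TIME, 12, 24)
print(morning, afternoon)
```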
Varying Number of Vertices per Vertex Type
Varying Number of Vertices per Vertex Type with Constraints
Execution Time (ms) per 1000 Iterations, by Number of Vertices per Vertex Type
             high property constraint selectivity   low property constraint selectivity
             100      1000     10000                100      1000     10000
Algorithm 2  192      315      2713                 184      342      2671
Algorithm 3  188      334      560                  47       43       45
Algorithm 4  152      144      146                  163      146      144
Algorithm 5  147      160      145                  34       33       35
Varying Number of Edges per Vertex
Varying Number of Edges per Vertex with Constraints
Execution Time (ms) per 1000 Iterations, by Number of Edges per Vertex
             high property constraint selectivity       low property constraint selectivity
             5     10    50    100   500   1000         5     10    50    100   500   1000
Algorithm 2  356   320   289   298   275   284          348   329   291   292   289   277
Algorithm 3  27    25    78    77    82    81           9     10    15    22    75    80
Algorithm 4  71    67    66    68    65    66           70    68    65    75    63    65
Algorithm 5  13    20    70    72    73    71           8     8     9     10    21    21
Varying Edge Types with Constraints
Varying Number of Edge Types with Constraints
Execution Time (ms) per 1000 Iterations, by Number of Edge Types
             high property constraint selectivity   low property constraint selectivity
             1       10      100     1000           1       10      100     1000
Algorithm 2  349     187     192     32             323     181     183     32
Algorithm 3  352     204     183     39             15      11      13      12
Algorithm 4  29342   1709    137     17             30231   1779    157     19
Algorithm 5  146     148     146     22             10      10      10      10
Path Segment Ordering
 Assume the following query
SUBGRAPH
Fox Chases Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Rabbit.age < 3
RETURN
Fox new(Ingests) Lettuce
 Query processing produces the following query execution plan
[Figure: query plan tree with π (Fox new(Ingests) Lettuce) at the root, over σ (Rabbit.age < 3), over the path segment Fox Chases Rabbit extended by Eats Lettuce]
Path Segment Execution Order Choice
 Which is more efficient?
[Figure: two candidate plans, each with π (Fox new(Ingests) Lettuce) over σ (Rabbit.age < 3). One evaluates Fox Chases Rabbit first and extends it with Eats Lettuce; the other evaluates Rabbit Eats Lettuce first and extends it with Fox Chases]
Execution Order Heuristics
 In simple terms
 Identify the path segment operation that promises to
return the fewest results
 Then identify the next operation that promises to
return the next-fewest results
 It is actually more complicated than this
 Need to search an exponential number of orderings to
find the most efficient ordering
 Heuristics can make this search tractable
Path-Segment Ordering Metric
 Order the path segment operators to return the fewest results
 Rough heuristic:
 If predicates on v, e, and w are applied to V, E and W respectively
 Start with V and use selectivity factors to estimate execution time
 Execution time is:
 V * sel(v) * (E/(ED*V)) * sel(e) * ((W*ED)/E) * sel(w)
 Or, sel(v) * sel(e) * sel(w) * W
 Use this formula to determine whether Fox Chases Rabbit should
precede or follow Rabbit Eats Lettuce
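The ordering metric can be applied to the two path segments of the example query as follows. This Python sketch uses made-up statistics purely for illustration (the vertex and edge counts and the 0.3 selectivity of Rabbit.age < 3 are assumptions, not measurements), and the function name is hypothetical.

```python
def estimate(V, E, ED, W, sel_v=1.0, sel_e=1.0, sel_w=1.0):
    """Rough path-segment estimate: V*sel(v) * E/(ED*V) * sel(e) * (W*ED)/E * sel(w).

    Algebraically this reduces to sel(v) * sel(e) * sel(w) * W.
    """
    return V * sel_v * (E / (ED * V)) * sel_e * ((W * ED) / E) * sel_w


# Hypothetical statistics. Rabbit.age < 3 is assumed to match 30% of rabbits.
fox_chases_rabbit = estimate(V=200, E=5000, ED=1, W=1000, sel_w=0.3)
rabbit_eats_lettuce = estimate(V=1000, E=4000, ED=1, W=50, sel_v=0.3)

# Run the segment with the smaller estimate first.
order = sorted(
    [("Fox Chases Rabbit", fox_chases_rabbit), ("Rabbit Eats Lettuce", rabbit_eats_lettuce)],
    key=lambda item: item[1],
)
print(order)
```

With these numbers Rabbit Eats Lettuce would run first. Different statistics can reverse the choice, which is why the optimizer needs the graph statistics described earlier.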
Outline
 Goals & Example Scenario
 Key Features of GQL
 Computational Complexity of Query Execution
 Future Directions
Prototype Implementation Schedule
 Currently Implemented
 Schema search returning a single graph
 Pattern matching
 Aliasing
 Ontology assisted graph query
 Next to be implemented – within approximately 6 months
 Externally defined functions
 Wildcard search
 Hypothesis expressions
 Future
 Return a set of graphs (instead of a single graph)
 Embedded queries
 Return new graph structures in query results
 Composite vertices (of vertices and edges)
 Predefined patterns
 Query Optimization
Future Work
 Relate GQL to a graphical interface
 Enables analysts to express queries through graphical means
 Can leverage several technologies (QGraph, Conceptual Graphs,
etc.)
 Augment GQL to include Uncertainty, Geospatial and Temporal
operators and data structures
 Address query optimization techniques
 Create a generic (as much as possible) back-end API to integrate
with data sources
 Relational
 Different graph approaches