Download d ienco - IndexMed 2016

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

IEEE 1355 wikipedia , lookup

Network tap wikipedia , lookup

Airborne Networking wikipedia , lookup

Transcript
An overview of Graph Categories and
Graph Primitives
Dino Ienco ([email protected])
https://sites.google.com/site/dinoienco/
Topics I’m interested in:
•
•
•
•
•
1
Graph Database and Graph Data Mining
Social Network Analysis
Semi-Supervised and Clustering
Deep Learning
Data Stream
Overview
What is a Graph and Why Graphs are useful nowadays?
Type of graphs:
• Directed vs Undirected
• Directed Acyclic Graph (DAG)
• Labeled Graph: Node and Edge Labeled
• Hypergraph
• Multigraph
Graph-Based approaches and methodologies:
• Graph Data Management
• Graph Mining
• (Social) Network Analysis
• Knowledge Graphs
• Graph Visualization
2
What is a Graph and Why is useful
A graph is a mathematical structure (and a
computer science abstract data type)
A graph is generally composed by a set of Vertices
and a set of Edges: G = (V,E)
An edge is a pair of nodes that are linked each other
Node
Edge
3
What is a Graph and Why is useful
Graph can be augmented with layer or attribute information
Why graphs are useful nowadays:
• Many real world data can be represented by this structure
• It allows to represent at the same time instances (nodes) and instances relation
(edges)
• We can associate additional information to each instances
Graphs are expressive structures that allow to model
complex information
4
Type of Graphs
Undirected Graphs:
Graphs in which edges between two nodes
do not have direction, edge (1,2) is equal
to edge (2,1)
This kind of graphs are usually employed to represent standard
relations without precedence:
•Biological Networks
•Social Networks
•Interaction Networks
•Co-Authorship Networks
•Image Segmentation Networks
5
Type of Graphs
Directed Graphs:
Graphs in which edges have direction,
edge (1,2) is different from edge (2,1)
This kind of graphs are usually employed to represent causality or
information propagation:
•Biological Networks
•Social Networks
•Communication Networks
•Co-Authorship Networks
•Citation Networks
6
Type of Graphs
Directed Acyclic Graphs:
A particular class of Directed Graph in
which there is no cycle (we cannot start
and end a graph traversal from and to the
same node)
This kind of graphs are usually employed to represent workflows or
describe temporal state evolutions:
•Any kind of workflows
•Particular case of human (or animal) interactions (spread
phenomena)
7
Type of Graphs
Attributed Graphs:
Directed (or Undirected) graphs that have
a vector of information (or a set of items)
associated to each node
This kind of graphs are usually employed to represent more
complex information regarding the instances (nodes) of the
graph:
•Social Networks
•Any kind of domain that associate to each node additional
information
8
Type of Graphs
Weighted (or edge-labeled) Graphs:
Directed (or undirected) graphs that have a numerical
weight (or a discrete label) associate to each edge
This kind of graphs allows to represent edge information such as
the strength of the connection between two nodes or a category
of edge:
•Social Networks
•Communication Networks
•Biological Networks
•Any kind of domain in which we are able to quantify the
strength between instances (distances or similarities)
9
Type of Graphs
HyperGraphs:
Graphs in which an edge can link more than two
nodes. A generalization of common graphs where an
edge is a connection between two nodes
This kind of graphs allows to represent complex interactions
among more than two entities:
•Social Networks
•Co-authorship network
•Relational Databases
•Any kind of domain in which we need to manage n-ary
relationships
10
Type of Graphs
MultiLayer Graphs:
Directed (or Undirected) graphs that have
different layers (edge set). Each layer is
associated to a different kind of interaction
between nodes.
This kind of graphs allow to represent multiple (semantically
different) interactions between two nodes:
•Social Networks (with different layers: social, family, work, etc)
•Co-Authorship networks (one layer for each conference)
•Biological networks (gene interactions with pathways spec.)
•Remote Sensing image networks (one layer for each spectral
indicators)
11
Graph-Based approaches
and Methodologies
Graph Data Management:
Techniques to store, index and query graph data
Graph Mining:
Techniques and approaches to extract interesting knowledge from graph
(Social) Network Analysis:
Techniques and approaches to supply information about the global
network structure but also inspecting the role of the nodes in the network
Knowledge Graphs:
Knowledge bases that can be represented as graph. Metadata and data
represented by RDF format or linked open data
Graph Visualization:
Techniques and approaches that can be used to perform Visual Analytics
on network data
12
Graph Data Management
Techniques to store, indexing and query graph data
Work at database level in order to supply fast implementation of graph
primitives and basic graph operations
Example of Operations:
• Exact Subgraph Queries
• Approximate Subgraph Queries
• Reachability Queries
• Regular Path Queries
• Etc…
Settings (How the graph database looks like?):
Single Large Graph (Social Networks, Biological Network, Ecological Network)
Collection of Graphs (Chemical networks, Human interaction patterns, Temporal
road networks)
13
Graph Data Management
Subgraph Queries:
Given a query graph Q and a data graph G, find all the
embeddings of Q in G
Applications:
Biochemical compounds matching,
Pattern matching in biological or social network,
Graph Database
Reachability Queries:
Given a data graph G and two nodes n1 and N2, verify if n1
(resp.n2) is reachable from n2 (resp. n1).
Applications:
Analyze Web graphs, Link analysis, Biological
networks
14
Graph Mining Techniques
Techniques to extract interesting knowledge from graph
Exploit graph primitives in order to mine knowledge or patterns that can
characterize the graph database
Example of Techniques:
• Frequent Pattern (Graph) Mining
• Enumerate Quasi-Cliques
• Graph Clustering
• Etc…
Techniques are available for both graph database settings (Single /
Collection) but, techniques for one setting do not directly work for
another setting
15
Graph Mining Techniques
Frequent Graph Mining:
Given a data graph G, find all the patterns (subgraphs) that
have a frequency greater than a certain threshold t
Applications:
Mine frequent behavior in any kind of network (biological,
ecological, co-authorship, remote sensing image graph,
social network)
Graph Clustering:
Given a data graph G, find k groups
of nodes such that the nodes
belonging to a group are structurally
and semantically similar
Applications:
Network Partitions, Module identification in
16
biological networks (gene-gene interactions)
(Social) Network Analysis
Techniques and approaches to supply information about
the global network structure but also inspecting the role
of the nodes in the network
Describe structural properties of the network at global and local scale
Example of Techniques:
• Community Detection
• Link Analysis
• Role Identification (hubs, outliers, etc..)
• Etc…
Techniques are mainly exploited to analyze single large graphs since
they were introduce in the context of social network analysis but, now,
they are exploited to analyze any kind of network
17
(Social) Network Analysis
Community Detection:
Given a data graph G, find a set l of (overlapping) group such
that each group contains nodes linked each other.
Applications:
Community extraction in Social Network, Extract plant
group in Phytosociology Mine gene (or protein) modules in
biological network
Link Analysis:
Given a data graph G, find a rank of
the graph nodes related to their
importance in the graph topology
Applications:
Leader identification in network, Node ranking
by structural importance
18
Knowledge Graphs
Knowledge bases that can be represented as graph.
Metadata and data represented by RDF format or linked
open data
Useful to store data and Metadata on which reasoning can be done
Example of Techniques:
• SPARQL Query
• Reasoning
• Etc…
The knowledge (ontology) is represented as graph and can be
navigated and browsed by logical operator and reasoning is possible
19
Knowledge Graphs
RDF format:
Data and Metadata are represented with triple (subject,
predicate, object).
Applications:
Web of data representation language, Linked Open
data, data publishing, metadata exchange and data
description in any domain (social, biological, chemical,
environmental, ecological)
SPARQL:
Language to query and manipulate
graph RDF. The answer will be a set
of triples that match the query
variable and structure
Applications:
Knowledge based Question Answering,
Knowledge-Based Reasoning
20
Graph Visualization
Techniques and Approaches that can be used to perform
Visual Analytics on network/graph data
Techniques developed to visualize graph structure highlighting
particular properties of these graphs.
Example of Techniques:
• Vertex Positioning
• Edge Bundling
• Etc…
These kind of techniques allow interaction to navigate and browse the
visual graph structure and perform zoom-in/zoom-out operations. The
general schema involves visual metaphors to abstract network
properties
21
Graph Visualization
1664
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS,
VOL. 22,
NO. 6,
JUNE
Vertex Positioning:
Given a graph G, locate the vertex V of the graph in a two (or
three) dimensional space in roder to preserve topological
properties and highlight graph specificity
Applications:
Graph Visualization, Visual
inspection of any kind of graph
Fig. 2. Evaluating the effectiveness of clustering coefficient on quadrilateral Simmelian backbone for a synthetic network with a hidden group s
ture. Highest clustering coefficient (a) denotes the parameter, where the groups just start breaking apart (d), which is also the point where the re
ing backbone is most similar to the ground-truth cluster graph. (e) Filtering removes more and more intra-cluster edges and destroys the rel
positioning of the groups.
Edge Bundling:
of how clear the cluster structure in the network is, but
without performing the actual clustering task. This quantification then allows to give visual support for manual selection and a fully automatic parameter extraction.
Instead of performing clustering by choosing among the
vast amount of existing methods [19], [20], we measure an
often observed side effect of clusters in networks, namely a
high average clustering coefficient [21]. The clustering coefficient captures to which extent the neighborhood of a vertex
is connected amongst themselves.
In a series of backbones, with varying sparsification
parameter, the main assumption is that a backbone with a
high clustering coefficient is more likely to contain cohesive
groups than a backbone with a low clustering coefficient. If
the quantification using the clustering coefficient is effective, its highest value should point us to the sparsification
parameter, where the resulting backbone is most similar to
a predefined cluster graph [22] representing the underlying
group structure.
More precisely, we use the phi coefficient as a similarity
measure to evaluate the effectiveness of the clustering coefficient. The phi coefficient can be understood as a correlation
measure between the entries of two matrices, where the first
Given a graph G, draw graph edges
to highlight graph specificities and
improve the visual result
Applications:
Highlight common interaction patterns, improve
visibility, support visual analysis of communities
22
3.1 Efficient Computation of the Clustering
Coefficient
We now investigate how the clustering coefficient can be c
puted efficiently for every possible sparsification paramet
The local clustering coefficient is defined as the perc
age of closed triples at a vertex v:
CðvÞ ¼
!ðvÞ jfðvi ; vj Þ 2 Ej vi ; vj 2 Nv gj
! "
¼
;
dðvÞ
tðvÞ
2
with !ðvÞ being the number of closed triples (triangles)
and tðvÞ the number of connected triples at v. For NðvÞ
we define the clustering coefficient to be zero, which p
ishes peripheral degree one vertices. The global (or aver
clustering coefficient is then
1 X
CðvÞ:
C! ¼
jV j v2V
While Sun et al. [23] investigate efficient updating sch
for network statistics, their update scheme for the cluste
coefficient is not clear to us. They give a runtime of Oð < d
Conclusions
Graphs are an ubiquitous structure to model many type of
data
Many useful tool to manage, analyze and visualize graph data
The Data Science field is working heavily on graph-based
approaches but sometimes the different communities do not
communicate each other
Main issue
Each field (social network, biology, ecology, remote sensing,
semantic web, etc..) has its own specificity and needs
specific graph primitives
23
Conclusions
What about Ecology and Graphs?
Many tools already exist, a possible road map to follow:
1) Understand which kind of graph representation I need
2) Understand which are the analysis I want to do on my
data (query, structure summarization, role identification,
frequent structure mining, classification, etc..)
3) Search for algorithms or methods that can fit my needs
4) Deploy and Adapt such methods to my problem
5) Maybe…come back to point 1) to adjust my previous
considerations
24
People I’m working with on Graph Analysis
MCF A. Sallaberry
Lirmm - France
(Graph Visualization)
Post-Doc R. Interdonato
Univ. Calabria - Italy
(Social Network Analysis)
MCF F. Ulliana
Lirmm - France
(Knowlege Graph & Reasoning)
Prof. Pascal Poncelet
Lirmm - France
(Graph Data Mining)
Researcher A. Tagarelli
Univ. Calabria - Italy
(Social Network Analysis)
PhD Student V. Ingalalli
Lirmm - France
(Graph Data Management)
Post-Doc F. Nor-Guttler
Dynafor - France
(Remote Sensing)
PhD Student L. Khiali
MTD - France
(Data Mining)
25
Thank you for your attention
Questions
26