Presented by: Omar Alqahtani
Spring 2016
Authors:
Publication: ICDE 2015
Type: Research Paper

Data exploration platforms help users discover interesting objects within large volumes of scientific and business data.

Data diversification is similar to top-k and skyline queries, but what is it?

Data diversification extracts from a query result a small set of non-redundant points that are diverse among themselves according to some distance measure.

The current approach is process-first-diversify-next: the full query result is computed first and only then diversified. Drawback?

Motivation: the need to efficiently provide users with effective insights during data
exploration.


Progressive Data Diversification (pDiverse) scheme.

The main idea is to detect and prune those data points in the query result that cannot be
included in the final diverse set.

Using partial distance computation, it reduces the CPU and I/O costs incurred during query diversification.
In addition, they propose:

The Progressive Greedy (pGreedy) heuristic, which forms the core of the pDiverse scheme.

An extension of pGreedy to column stores.

An integrated model that combines range-query processing with diversification.

Optimizations of pDiverse that incorporate novel techniques for dimension ordering and diversity approximation.
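The slides do not show how pDiverse organizes its partial distance computation. As a generic illustration of why computing distances partially can save CPU work, the sketch below uses early abandoning. This is a swapped-in technique, not the paper's pGreedy. It assumes squared Euclidean distance and a hypothetical helper `min_sq_dist_partial`, which accumulates one dimension at a time and stops as soon as the remaining work cannot change the outcome.

```python
import math

def min_sq_dist_partial(candidate, selected, stop_below=-math.inf):
    """Squared Euclidean distance from `candidate` to its nearest point in `selected`.

    Hypothetical helper (illustration only). Returns
    (distance, coordinate_differences_computed). If the scan stops because the
    minimum fell to `stop_below` or lower, the returned distance is only an
    upper bound on the true minimum, which is all a pruning caller needs.
    """
    best = math.inf
    touched = 0
    for s in selected:
        partial = 0.0
        for a, b in zip(candidate, s):
            partial += (a - b) ** 2
            touched += 1
            if partial >= best:
                break  # partial sums only grow, so s cannot be the nearest point
        else:
            best = partial  # full distance computed and smaller than best
        if best <= stop_below:
            break  # candidate already lost to a better one in a max-min round
    return best, touched

print(min_sq_dist_partial((0, 0), [(3, 4), (1, 0), (10, 10)]))  # (1.0, 5)
```

In the call shown, only 5 of the 6 coordinate differences are computed. The savings grow with dimensionality. In a column store, each skipped dimension is also a column value that need not be read.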

Diversification approaches mostly fall into three categories:

Content based

Novelty based

Semantic coverage based

Formal definition:

The problem is NP-hard, so greedy heuristics are the most widely used approaches.
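The formal definition on the slide was not captured, so the sketch below assumes one common formulation, max-min diversification: choose k points that maximize the minimum pairwise Euclidean distance. `greedy_maxmin` is a hypothetical name for the classic farthest-point greedy heuristic, seeded (arbitrarily) with the first point. It is not necessarily the exact variant the paper uses.

```python
import math
from typing import Sequence

def greedy_maxmin(points: Sequence[Sequence[float]], k: int) -> list:
    """Greedily pick k point indices that are spread out (assumed max-min objective)."""
    if k <= 0 or not points:
        return []
    chosen = [0]  # arbitrary seed: the first point
    # min_dist[i] = distance from point i to its nearest chosen point
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        # Add the point farthest from the chosen set (ties -> lowest index).
        nxt = max((i for i in range(len(points)) if i not in chosen),
                  key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Update nearest-chosen distances incrementally instead of recomputing.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

print(greedy_maxmin([(0, 0), (1, 0), (10, 0), (5, 0), (0, 10)], 3))  # [0, 2, 4]
```

Each round costs one pass over the points. This is the work that process-first-diversify-next spends on every point of the full query result.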
Presented by: Omar Alqahtani
Spring 2016
Authors:
Publication: ICDE 2015
Type: Research Paper

Query execution performance of database systems depends heavily on query
optimization decisions.

Finding the best possible plan usually requires a cost model that estimates the performance of the viable alternatives.

Cost models rely on statistics about the data, but accurate statistics are often unavailable or outdated.

As a result, commercial DBMSs often assume uniform data distributions and attribute value independence, which rarely holds in reality.

This leads to suboptimal plans and, in turn, subpar performance.
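Here is a small hypothetical example of how the attribute-value-independence assumption misleads a cardinality estimate. The table is made up: column `b` always equals column `a`. Under independence, the estimate for the conjunctive predicate is the product of the single-column selectivities, which is off by a factor of 10 here. `estimate_vs_actual` is an illustrative helper, not part of any DBMS.

```python
def estimate_vs_actual(rows, a_val, b_val):
    """Independence-based estimate vs. true count for `a = a_val AND b = b_val`."""
    n = len(rows)
    count_a = sum(1 for r in rows if r["a"] == a_val)
    count_b = sum(1 for r in rows if r["b"] == b_val)
    # Independence: sel(a AND b) = sel(a) * sel(b), so the estimated row count is
    # n * (count_a / n) * (count_b / n) = count_a * count_b / n
    estimated = count_a * count_b / n
    actual = sum(1 for r in rows if r["a"] == a_val and r["b"] == b_val)
    return estimated, actual

# Perfectly correlated columns: b always equals a (made-up data).
rows = [{"a": i % 10, "b": i % 10} for i in range(1000)]
print(estimate_vs_actual(rows, 3, 3))  # (10.0, 100)
```

An optimizer that expects 10 rows may favor an index look-up even though 100 rows qualify. That is exactly the kind of misestimate that produces a suboptimal plan.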

They define robustness in the context of query processing as:
The ability of a system to cope efficiently with unexpected and adverse conditions and to deliver near-optimal performance for all query inputs.
The approach is based on the following observations:

Understanding of the data distribution is a continuous process.

The observed distribution may also change as the execution of a query plan progresses.

A single execution strategy might not be optimal over the entire data set.
They propose:

A new class of morphable operators that continuously and seamlessly adjust their execution
strategy as the understanding of the data evolves.

The Smooth Scan operator, which morphs between an index look-up and a full table scan and:

achieves near-optimal performance regardless of the operator’s selectivity

does so obliviously to the existing data statistics.
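The slides do not give Smooth Scan's actual morphing policy. The toy simulation below only illustrates the general idea: an access path that starts as index look-ups and widens toward a sequential scan as the observed selectivity grows. The function `smooth_scan_sketch`, its doubling rule, and its density threshold are hypothetical assumptions, not the paper's design.

```python
def smooth_scan_sketch(pages, predicate, index_page_ids, density_threshold=0.5):
    """Toy morphing access path (illustration only, assumed policy).

    pages: list of pages, each a list of values.
    index_page_ids: page id of each qualifying tuple, in index (key) order.
    Returns (qualifying values, number of page reads).
    """
    visited = set()
    results = []
    width = 1  # pages fetched per index probe; 1 behaves like a plain index look-up
    pages_read = 0
    pages_with_hits = 0
    for pid in index_page_ids:
        if pid in visited:
            continue  # all qualifying tuples on this page were already produced
        # Fetch pid plus the following unvisited pages in a contiguous region.
        for q in range(pid, min(pid + width, len(pages))):
            if q in visited:
                continue
            visited.add(q)
            pages_read += 1
            hits = [v for v in pages[q] if predicate(v)]
            results.extend(hits)
            if hits:
                pages_with_hits += 1
        # Observed density is high: widen the region (assumed doubling rule).
        if pages_with_hits / pages_read > density_threshold:
            width *= 2
    return results, pages_read

pages = [[10 * i + j for j in range(10)] for i in range(10)]
ids = [v // 10 for v in range(100)]  # every tuple qualifies
res, reads = smooth_scan_sketch(pages, lambda v: True, ids)
print(len(res), reads)  # 100 10
```

A plain index look-up would issue one page access per qualifying tuple, which is 100 here. The sketch reads each of the 10 pages once, close to a full scan. For a highly selective predicate it stays a single index probe.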

Some works tackle the problem at the optimizer level, but:

in dynamic environments, they bring only partial benefits, because the environment keeps changing even after optimization.

Orthogonal approaches based on run-time adaptivity, however:

lack flexibility at the level of access paths.

remain sensitive to the accuracy of statistics.
Presented by: Zohreh Raghebi
Spring 2016
Authors:
Publication: ICDE 2015
Type: Research Paper

Rapid growth of event-based social network services

Meetup and Plancast

Connect people through events

Allow users to form online groups

Let users publish and announce events to other group members

1) Which groups would a particular user like to join?

2) Which tags might a group choose when constructing its profiles?

3) Who will attend an upcoming event?

To design recommendation systems for three specific tasks:

Tags to groups

Groups to users

Events to users

[1] Proposed a factorization model to exploit social and location features for event-based group recommendation

[2] Introduced a topic model to solve the tag recommendation problem for groups

[3] Used a simple graph-based approach to recommend users for an event, performing information diffusion over the user network

Lack of a general solution

Model the interactions among multiple entities: users, events, groups, and tags

Analyze the data to extract useful temporal patterns of user behavior

Convert the recommendation problem into a node proximity calculation problem

To evaluate node proximity:

The heterogeneous graph contains multiple types of entities

These entities influence each other via different types of interactions

The importance of these influences must be balanced in the proximity calculation

Their importance may vary from one recommendation problem to another

Random Walk with Restart (RWR) is used to calculate node proximity for recommendations

RWR is built on a univariate Markov chain for homogeneous graphs

As a generalization, a multivariate Markov chain (MMC) is used to model the random walk process in a heterogeneous graph

MMC can explicitly model the influences between different entities
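To make RWR concrete, here is a minimal power-iteration sketch on a homogeneous, unweighted graph. The `rwr` function and the restart probability of 0.15 are assumptions. The MMC generalization, which adds influence weights between entity types, is not shown. The stationary probabilities serve as proximity scores to the seed node.

```python
def rwr(adj, seed, restart=0.15, iters=200, tol=1e-12):
    """Random Walk with Restart by power iteration (hypothetical sketch).

    adj: dict node -> list of neighbours (undirected edges listed both ways).
    At each step the walker returns to `seed` with probability `restart`,
    otherwise it moves to a uniformly random neighbour.
    """
    nodes = list(adj)
    r = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            if adj[u]:
                share = (1 - restart) * r[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                nxt[seed] += (1 - restart) * r[u]  # dangling node: jump back to seed
        nxt[seed] += restart  # total mass is 1, so the restart mass is `restart`
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = rwr(graph, "a")
ranking = sorted((v for v in scores if v != "a"), key=scores.get, reverse=True)
print(ranking)  # ['b', 'c', 'd']
```

Recommendation then amounts to ranking candidate nodes of the wanted type (e.g., events for a user) by their score.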

Existing MMC-based methods need the influence weights between different types of entities to be set manually

Multiple types of entities exist

A learning scheme tries to find the optimal set of weights

A general model to handle multiple recommendation problems in an event-based social network

To avoid manual parameter assignment, a learning framework is proposed to find appropriate parameters for the model

The values of the learned parameters indicate the importance of different types of entities in different recommendation tasks

A better understanding of user behavior in an event-based social network
Presented by: Zohreh Raghebi
Spring 2016
Authors:
Publication: ICDE 2015
Type: Research Paper

Knowledge is represented as a graph

There is uncertainty in the presence of each edge in the graph

Uncertain graphs have been used extensively, e.g., in:

Communication networks

Social networks

Protein interaction networks

Identification of dense substructures within a graph

Clique: a completely connected subgraph

Maximal clique: a clique that is not contained within any other clique

Enumerating all maximal cliques is used for:

Finding overlapping communities from social networks

Finding overlapping multiple protein complexes

Analysis of email networks
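For deterministic graphs, the standard Bron-Kerbosch algorithm with pivoting enumerates all maximal cliques. The sketch below is that textbook algorithm, written as a hypothetical `bron_kerbosch` helper over an adjacency-set dictionary. It is not a method taken from the presented paper.

```python
def bron_kerbosch(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting).

    adj: dict vertex -> set of neighbours (undirected graph).
    R: current clique, P: vertices that can extend R, X: vertices already tried.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # nothing can extend r and it was not found before
            return
        # Choose the pivot covering most of P; neighbours of the pivot are skipped,
        # since any maximal clique containing them is reached via the pivot branch.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(sorted(sorted(c) for c in bron_kerbosch(adj)))  # [['a', 'b', 'c'], ['c', 'd']]
```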

Clique in an uncertain graph: a set of vertices that has a high probability of being a completely connected subgraph

Applications:

Finding such vertex sets helps unearth robust communities within an uncertain graph

For example, a group of proteins in which each protein is likely to interact with every other protein

A set of vertices U is an α-maximal clique if:

U is a clique with probability at least α, and

there does not exist a vertex set S such that U ⊂ S and S is a clique with probability at least α

When α = 1, we have the notion of a maximal clique in a deterministic graph
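Assume that edges exist independently. Then the probability that U is a clique is the product of the probabilities of all edges between pairs of vertices in U. Under that assumption, the definition above can be checked directly. `clique_prob` and `is_alpha_maximal` are hypothetical helpers, and the edge probabilities are made up for illustration. Only one-vertex extensions need to be tested: any superset S with probability at least α contains some U ∪ {v}, and that smaller set's product drops factors that are each at most 1, so its probability is at least as high.

```python
from itertools import combinations
from math import prod

def clique_prob(U, p):
    """Probability that U is a clique, assuming independent edges.

    p: dict frozenset({u, v}) -> edge probability; a missing pair means no edge.
    """
    return prod(p.get(frozenset(pair), 0.0) for pair in combinations(U, 2))

def is_alpha_maximal(U, vertices, p, alpha):
    U = set(U)
    if clique_prob(U, p) < alpha:
        return False  # U is not even an alpha-clique
    # Maximal iff no single extra vertex keeps the probability >= alpha.
    return all(clique_prob(U | {v}, p) < alpha for v in vertices if v not in U)

p = {frozenset("ab"): 0.9, frozenset("bc"): 0.8, frozenset("ac"): 0.5}
print(is_alpha_maximal("ab", "abc", p, 0.6))  # True
print(is_alpha_maximal("ab", "abc", p, 0.3))  # False
```

With α = 0.3, {a, b, c} has clique probability 0.9 × 0.8 × 0.5 = 0.36, so {a, b} is no longer maximal.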

Related problem: finding reliable subgraphs, i.e., subgraphs that are connected with a high probability

In contrast, this work is interested in subgraphs that are not just connected, but fully connected with a high probability

Rather than enumerating only the k cliques with the highest probability of existence, it focuses on enumerating all α-maximal cliques in a graph

Let f(n, α) be the maximum number of α-maximal cliques in an uncertain graph with n vertices

Proofs……………

The algorithm uses depth-first search (DFS) with backtracking:

It starts with a set of vertices C that is an α-clique

It incrementally adds vertices to C while retaining the property of C being an α-clique

It then backtracks to explore other possible vertices until all possible search paths have been explored

First, to avoid checking whether each new vertex v can be used to extend C:

Consider only those vertices that are already connected to every vertex within C

This leads to incrementally tracking the vertices that can still be used to extend C

Second, not all vertices that extend C into a clique preserve the property of C being an α-clique.

Adding a new vertex v to C decreases the clique probability by a factor equal to the product of the edge probabilities between v and every vertex in C.

This factor is maintained incrementally for each vertex v.
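Combining the DFS with both optimizations gives the minimal sketch below. It is modeled on the ideas on these slides, not the paper's exact pseudocode. The function `alpha_maximal_cliques` and its dictionary-based inputs are assumptions, as are independent edges and 0 < α ≤ 1. `I` holds the tracked extension candidates with their incrementally maintained factors. `X` holds earlier vertices that are used only to decide whether C is maximal.

```python
def alpha_maximal_cliques(vertices, p, alpha):
    """Enumerate every alpha-maximal clique of an uncertain graph (sketch).

    p: dict frozenset({u, v}) -> edge probability; a missing pair means no edge.
    Assumes independent edges and 0 < alpha <= 1.
    """
    out = []

    def prob(u, v):
        return p.get(frozenset((u, v)), 0.0)

    def narrow(pairs, u, q_new):
        # Keep v only if C + {u} + {v} is still an alpha-clique. The factor of v
        # (product of p(w, v) over w in C) is updated with one multiplication.
        kept = {}
        for v, factor in pairs:
            new_factor = factor * prob(u, v)  # 0.0 when u and v are not adjacent
            if q_new * new_factor >= alpha:
                kept[v] = new_factor
        return kept

    def expand(C, q, I, X):
        # I: vertices after C's last vertex that keep C + {v} an alpha-clique.
        # X: earlier vertices with the same property (used only for maximality).
        if not I and not X:
            out.append(C)  # no vertex extends C: it is alpha-maximal
            return
        cands = sorted(I.items())
        for i, (u, factor_u) in enumerate(cands):
            q_new = q * factor_u  # clique probability of C + {u}
            expand(C + [u], q_new,
                   narrow(cands[i + 1:], u, q_new),
                   narrow(list(X.items()) + cands[:i], u, q_new))

    expand([], 1.0, {v: 1.0 for v in vertices}, {})
    return out

p = {frozenset("ab"): 0.9, frozenset("bc"): 0.8, frozenset("ac"): 0.5, frozenset("cd"): 0.7}
print(alpha_maximal_cliques("abcd", p, 0.6))  # [['a', 'b'], ['b', 'c'], ['c', 'd']]
```

Vertices are added in sorted order, so each α-maximal clique is reached along exactly one DFS path and reported once.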