ELEN E6906 Network Algorithms and Dynamics
Influence Maximization
based on: Chen, Wei, Yajun Wang, and Siyu Yang.
"Efficient influence maximization in social networks."
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM,
2009.
Binyamin Stein
Abstract
Influence maximization is the problem of finding a small subset of nodes (seed nodes) in a social network
that could maximize the spread of influence. In this paper, we study efficient influence maximization
from two complementary directions. One is to improve the original greedy algorithm of [5] and its
improvement [7] to further reduce its running time, and the second is to propose new degree discount
heuristics that improve influence spread. We evaluate our algorithms by experiments on two large
academic collaboration graphs obtained from the online archival database arXiv.org. Our experimental
results show that (a) our improved greedy algorithm achieves better running time than the
improvement of [7] with matching influence spread, (b) our degree discount heuristics achieve much better
influence spread than classic degree and centrality-based heuristics, and when tuned for a specific influence
cascade model, they achieve almost matching influence spread with the greedy algorithm, and more
importantly (c) the degree discount heuristics run in only milliseconds while even the improved greedy
algorithms run in hours on our experiment graphs with a few tens of thousands of nodes.
Based on our results, we believe that fine-tuned heuristics may provide truly scalable solutions to the
influence maximization problem with satisfying influence spread and blazingly fast running time.
Therefore, contrary to what is implied by the conclusion of [5] that traditional heuristics are outperformed by
the greedy approximation algorithm, our results shed new light on the research of heuristic algorithms.
Influence Maximization
• In a social network, influence refers to the ability of a user to
promote or spread a concept to other users via a “word-of-mouth” effect.
• If an idea is introduced by multiple users of high influence, the
idea will spread more quickly.
• If there is a limit on the number of users that can
introduce the idea, then it is of interest to choose the subset of
users with the most influence.
• This problem is referred to as influence maximization.
A social network can be modeled as an undirected graph 𝐺 = (𝑉, 𝐸), where
• vertices 𝑉 represent the individual users
• 𝑛 is the number of vertices
• edges 𝐸 represent the relationships between users
• 𝑚 is the number of edges
Figure source: “Social Network Analysis (SNA) Diagrams,” Social Network Analysis (SNA) Software with Sentinel Visualizer Diagrams. [Online]. Available at: http://www.fmsasg.com/socialnetworkanalysis/. [Accessed: 13-May-2016].
The Challenge
• Modern social networks tend to
• be large-scale
• have complex connection structures
• be dynamic (change in time)
and therefore an efficient and scalable solution is required
• However, this problem is NP-hard, meaning it is only practical to
find approximate solutions.
• The standard to beat is the greedy algorithm and an
improvement called CELF optimization
The Greedy Algorithm
𝑅𝑎𝑛𝐶𝑎𝑠(𝑆) runs a random influence propagation from seed set 𝑆 and returns the set of vertices
influenced in that propagation.
Proposed by: D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence
through a social network. In Proceedings of the 9th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, pages 137–146, 2003.
The Greedy Algorithm
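Each call to 𝑅𝑎𝑛𝐶𝑎𝑠(𝑆) runs in 𝑂(𝑚) time.
Total time complexity:
𝑂(𝑘𝑛𝑅𝑚), for 𝑘 seeds, 𝑛 candidate vertices per round, and 𝑅 simulations per candidate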
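The greedy selection and its 𝑂(𝑘𝑛𝑅𝑚) cost can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes the independent cascade model with one uniform propagation probability `p`, a graph stored as a symmetric adjacency dict, and the hypothetical names `ran_cas` and `general_greedy`.

```python
import random

def ran_cas(adj, seeds, p, rng):
    """Run one random cascade from `seeds`; return the set of influenced vertices."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for w in adj[u]:
                # Each newly active vertex gets one chance to activate each neighbor.
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active

def general_greedy(adj, k, p, R, rng=None):
    """Pick k seeds; each round tries every vertex with R simulations (assumes k <= n)."""
    rng = rng or random.Random(0)
    seeds = set()
    for _round in range(k):                      # k rounds
        best_v, best_score = None, -1.0
        for v in adj:                            # n candidates per round
            if v in seeds:
                continue
            total = sum(len(ran_cas(adj, seeds | {v}, p, rng)) for _sim in range(R))
            score = total / R                    # estimated spread of seeds + v
            if score > best_score:
                best_v, best_score = v, score
        seeds.add(best_v)
    return seeds
```

The three nested loops (rounds, candidates, simulations) around an 𝑂(𝑚) cascade are where the 𝑘 · 𝑛 · 𝑅 · 𝑚 factor comes from.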
Cost-Effective Lazy Forward (CELF) Optimization
• Takes advantage of the submodularity of the influence
maximization objective:
• i.e., the fact that the incremental increase in influence due to
adding a vertex 𝑣 to seed set 𝑆 shrinks (or stays the same) as 𝑆 grows.
• Does so by only recomputing the marginal gain 𝑠𝑣 of a vertex 𝑣 in iteration 𝑖 if its
value from iteration 𝑖 − 1 is larger than all the values of 𝑠𝑣
already calculated in iteration 𝑖.
• Reported to make the greedy algorithm ~700x faster.
• Proposed by: J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen,
and N. S. Glance. Cost-effective outbreak detection in networks. In Proceedings
of the 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,
pages 420–429, 2007.
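The lazy evaluation described above can be sketched with a priority queue. This is an illustrative Python sketch, not the implementation from the cited paper: `celf` is a hypothetical name, and `spread` stands for any monotone submodular spread estimate supplied by the caller (for example, an average over random cascades).

```python
import heapq

def celf(vertices, k, spread):
    """Lazy greedy selection. Returns (seeds, number of spread evaluations)."""
    seeds, current = [], 0.0
    # Max-heap via negated gains. Each entry records the round in which its
    # gain was computed; the index i only breaks ties.
    heap = [(-spread({v}), i, v, 0) for i, v in enumerate(vertices)]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(seeds) < k:
        neg_gain, i, v, computed_in = heapq.heappop(heap)
        if computed_in == len(seeds):
            # The gain is fresh for the current seed set. By submodularity,
            # no stale entry below it can have a larger true gain.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain from an earlier round: recompute and re-insert.
            gain = spread(set(seeds) | {v}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds, evaluations
```

With a simple coverage function as the stand-in objective, the second round only re-evaluates the candidates that reach the top of the queue, instead of every remaining vertex.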
Proposal
• Find possible algorithms to beat the greedy-CELF
results:
• Without compromising the influence spread
• With a worthwhile compromise in influence spread
The Cascade Model
𝑅𝐺′(𝑆) returns the set of vertices reachable from vertex set 𝑆 in a randomly generated graph 𝐺′,
where 𝐺′ is obtained from 𝐺 by keeping each edge independently with probability 𝑝.
[Figure: graph 𝐺 with seed set 𝑆, the edges kept in 𝐺′, the reachable set 𝑅𝐺′(𝑆), and 𝑅𝐺′({𝑣}) for a vertex 𝑣 ∉ 𝑅𝐺′(𝑆)]
The Cascade Model
Computing 𝑅𝐺′(𝑆), and 𝑅𝐺′({𝑣}) for every vertex 𝑣, runs in 𝑂(𝑚) time per sampled graph.
Total time complexity:
𝑂(𝑘𝑅𝑚)
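A minimal Python sketch of this cascade-model greedy, under stated assumptions: the graph is an undirected adjacency dict with integer vertex labels, `p` is a uniform propagation probability, and the names `sample_components` and `new_greedy_ic` are illustrative. On an undirected sample 𝐺′, 𝑅𝐺′({𝑣}) is simply the connected component of 𝑣, so one traversal scores every candidate at once.

```python
import random

def sample_components(adj, p, rng):
    """Keep each undirected edge with probability p; label connected components."""
    kept = {v: [] for v in adj}
    for u in adj:
        for w in adj[u]:
            if u < w and rng.random() < p:   # visit each undirected edge once
                kept[u].append(w)
                kept[w].append(u)
    comp, sizes = {}, []
    for start in adj:
        if start in comp:
            continue
        cid = len(sizes)
        comp[start] = cid
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in kept[u]:
                if w not in comp:
                    comp[w] = cid
                    stack.append(w)
        sizes.append(size)
    return comp, sizes

def new_greedy_ic(adj, k, p, R, rng=None):
    """Pick k seeds using R sampled graphs per round (assumes k <= n)."""
    rng = rng or random.Random(0)
    seeds = set()
    for _round in range(k):
        score = {v: 0 for v in adj if v not in seeds}
        for _sample in range(R):
            comp, sizes = sample_components(adj, p, rng)
            reached = {comp[s] for s in seeds}       # components covered by R_G'(S)
            for v in score:
                if comp[v] not in reached:           # v adds its whole component
                    score[v] += sizes[comp[v]]
        # Dividing by R would not change the argmax, so it is skipped here.
        seeds.add(max(score, key=score.get))
    return seeds
```

Compared with simulating a cascade per candidate, each sampled graph is shared by all 𝑛 candidates, which is why the factor 𝑛 drops out of the running time.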
Degree Discount Heuristics
• Degree heuristics estimate the influence of a vertex from
its degree. They are very fast but do not produce as
large an influence spread as the greedy algorithm.
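• Degree discount heuristics take advantage of the fact that the
expected number of additional vertices in the neighborhood of a
vertex 𝑣 (counting 𝑣 itself) that are influenced by adding 𝑣 to the seed set is
\[ (1 - p)^{t_v} \cdot \bigl(1 + (d_v - t_v) \cdot p\bigr) = 1 + \bigl(d_v - 2t_v - (d_v - t_v)\, t_v\, p + o(t_v)\bigr)\, p \]
where $t_v$ is the number of seeds in the neighborhood of $v$, $d_v$ is the degree of $v$, and $p$ is the propagation probability.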
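The two sides of this expression can be compared numerically. In the product form, the first factor is the probability that 𝑣 has not already been influenced by its seed neighbors, and the second counts 𝑣 itself plus the non-seed neighbors it is expected to activate directly. The short Python sketch below uses hypothetical helper names and arbitrary example values to show how close the discounted form is when 𝑝 is small.

```python
def exact_gain(d_v, t_v, p):
    """Product form: P(v not yet influenced) * (v itself + expected direct activations)."""
    return (1 - p) ** t_v * (1 + (d_v - t_v) * p)

def discounted_gain(d_v, t_v, p):
    """Expanded form with the o(t_v) term dropped."""
    return 1 + (d_v - 2 * t_v - (d_v - t_v) * t_v * p) * p

# Example values chosen only for illustration: degree 10, 2 seed neighbors, p = 0.01.
gap = abs(exact_gain(10, 2, 0.01) - discounted_gain(10, 2, 0.01))
print(f"difference between the two forms: {gap:.6f}")
```

With no seed neighbors (𝑡𝑣 = 0) both forms reduce to 1 + 𝑑𝑣 · 𝑝, the plain degree estimate.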
The Degree Discount Algorithm
The Degree Discount Algorithm
Total time complexity:
𝑂(𝑘 log 𝑛 + 𝑚)
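A minimal Python sketch of the degree discount selection, assuming the discounted degree 𝑑𝑣 − 2𝑡𝑣 − (𝑑𝑣 − 𝑡𝑣) 𝑡𝑣 𝑝 from the expression given earlier, an adjacency dict with integer vertex labels, and the illustrative name `degree_discount_ic`. It uses a binary heap with lazy deletion for simplicity, so it does not reach the stated bound exactly; that bound relies on a heap with cheaper key updates.

```python
import heapq

def degree_discount_ic(adj, k, p):
    """Pick k seeds by repeatedly taking the vertex with the largest discounted degree."""
    d = {v: len(adj[v]) for v in adj}      # degree of each vertex
    t = {v: 0 for v in adj}                # number of already selected neighbors
    dd = dict(d)                           # discounted degree, initially the degree
    heap = [(-dd[v], v) for v in adj]      # max-heap via negated keys
    heapq.heapify(heap)
    seeds, chosen = [], set()
    while heap and len(seeds) < k:
        neg_key, u = heapq.heappop(heap)
        if u in chosen or -neg_key != dd[u]:
            continue                       # stale entry left over from an update
        chosen.add(u)
        seeds.append(u)
        for v in adj[u]:
            if v in chosen:
                continue
            t[v] += 1
            dd[v] = d[v] - 2 * t[v] - (d[v] - t[v]) * t[v] * p
            heapq.heappush(heap, (-dd[v], v))
    return seeds
```

On a small graph with two adjacent hubs, the discount can steer the second pick away from the hub next to the first seed, which is the difference from the plain degree heuristic.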
Experimental Results
Chen, Wei, Yajun Wang, and Siyu Yang. "Efficient influence maximization in social networks." Proceedings of the
15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
Results
• Combining the cascade model improvements with CELF
optimization results in a 15% to 34% improvement in running
time over the CELF-greedy algorithm, while matching its influence
spread.
• Degree discount heuristics improve on the running time of the
cascade model algorithm by more than six orders of magnitude
(~1,000,000× faster) while decreasing influence spread by only
~3.5%, a worthwhile compromise.
Thank You
Any Questions?