ELEN E6906 Network Algorithms and Dynamics
Influence Maximization
based on: Chen, Wei, Yajun Wang, and Siyu Yang. "Efficient influence maximization in social networks." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
Binyamin Stein

Abstract
Influence maximization is the problem of finding a small subset of nodes (seed nodes) in a social network that could maximize the spread of influence. In this paper, we study efficient influence maximization from two complementary directions. One is to improve the original greedy algorithm of [5] and its improvement [7] to further reduce its running time, and the second is to propose new degree discount heuristics that improve influence spread. We evaluate our algorithms by experiments on two large academic collaboration graphs obtained from the online archival database arXiv.org. Our experimental results show that (a) our improved greedy algorithm achieves better running time compared with the improvement of [7] with matching influence spread, (b) our degree discount heuristics achieve much better influence spread than classic degree and centrality-based heuristics, and when tuned for a specific influence cascade model, they achieve almost matching influence spread with the greedy algorithm, and more importantly (c) the degree discount heuristics run only in milliseconds while even the improved greedy algorithms run in hours in our experiment graphs with a few tens of thousands of nodes. Based on our results, we believe that fine-tuned heuristics may provide truly scalable solutions to the influence maximization problem with satisfying influence spread and blazingly fast running time. Therefore, contrary to what is implied by the conclusion of [5], that traditional heuristics are outperformed by the greedy approximation algorithm, our results shed new light on the research of heuristic algorithms.

Influence Maximization
• In a social network, influence refers to the ability of a user to promote or spread a concept to other users via a “word-of-mouth” effect.
• If an idea is introduced by multiple users of high influence, the idea will spread more quickly.
• If the number of users who can introduce the idea is limited, it is of interest to choose the subset of users with the most influence.
• This problem is referred to as influence maximization.

A social network can be modeled as an undirected graph $G = (V, E)$, where
• vertices $V$ represent the individual users
• $n$ is the number of vertices
• edges $E$ represent the relationships between users
• $m$ is the number of edges

“Social Network Analysis (SNA) Diagrams,” Social Network Analysis (SNA) Software with Sentinel Visualizer Diagrams. [Online]. Available at: http://www.fmsasg.com/socialnetworkanalysis/. [Accessed: 13-May-2016].

The Challenge
• Modern social networks tend to
  • be large-scale
  • have complex connection structures
  • be dynamic (change over time)
  and therefore an efficient and scalable solution is required.
• However, this problem is NP-hard, so for large networks only approximate solutions are practical.
• The standard to beat is the greedy algorithm with an improvement called CELF optimization.

The Greedy Algorithm
$\mathit{RanCas}(S)$ runs a random influence propagation from seed set $S$ and returns the set of vertices influenced in that propagation.
Proposed by: D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 137–146, 2003.
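The greedy procedure on this slide can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes the independent cascade model with a single propagation probability `p`, an adjacency-list graph stored as a dict, and `R` simulated cascades per candidate vertex. The names `ran_cas` and `greedy_seeds` are hypothetical and chosen only to mirror the slide's notation.

```python
import random


def ran_cas(graph, seeds, p, rng):
    """Run one random cascade from `seeds`; return the set of influenced vertices.

    Assumption: independent cascade model, where each newly activated vertex
    gets one chance to activate each inactive neighbor with probability p.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def greedy_seeds(graph, k, p, R, rng=None):
    """Pick k seeds, each time adding the vertex with the best average spread."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    seeds = set()
    for _ in range(k):
        best_v, best_score = None, -1.0
        for v in graph:
            if v in seeds:
                continue
            # Estimate the spread of seeds + {v} by averaging R random cascades.
            total = sum(len(ran_cas(graph, seeds | {v}, p, rng)) for _ in range(R))
            score = total / R
            if score > best_score:
                best_v, best_score = v, score
        seeds.add(best_v)
    return seeds
```

Each of the k rounds tries every remaining vertex and runs R cascades for it, which is why this approach becomes expensive on large graphs.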
The Greedy Algorithm
$\mathit{RanCas}(S)$ runs in $O(m)$ time.
Total time complexity: $O(knRm)$, where $k$ is the number of seeds to select and $R$ is the number of simulated cascades per candidate vertex.

Cost-Effective Lazy Forward (CELF) Optimization
• Takes advantage of the submodularity of the influence maximization objective, i.e., the fact that the incremental increase in influence from adding a vertex $v$ to the seed set $S$ shrinks (or stays the same) as $S$ grows.
• Does so by recomputing the estimated spread $s_v$ for a vertex $v$ in iteration $i$ only if its value from iteration $i - 1$ is larger than all the values of $s_v$ already calculated in iteration $i$.
• Makes the greedy algorithm ~700× faster.
• Proposed by: J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. S. Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 420–429, 2007.

Proposal
• Find possible algorithms to beat the greedy-CELF results:
  • without compromising the influence spread
  • with a worthwhile compromise in influence spread

The Cascade Model
$R_{G'}(S)$ returns the set of vertices reachable from vertex set $S$ in a randomly generated graph $G'$.
[Figure: graph $G$ with seed set $S$, the edges in $G'$, the reachable set $R_{G'}(S)$, and $R_{G'}(\{v\})$ for a vertex $v \notin R_{G'}(S)$]
Computing $R_{G'}(S)$ takes $O(m)$ time.
Total time complexity: $O(kRm)$

Degree Discount Heuristics
• Degree heuristics make the influence of a vertex a function of its degree. They are very fast but do not produce as large an influence spread as the greedy algorithm.
• Degree discount heuristics take advantage of the fact that the expected number of additional vertices in the neighborhood of a vertex $v$ that are influenced by adding $v$ to the seed set is
  $(1 - p)^{t_v} \cdot (1 + (d_v - t_v) \cdot p) = 1 + (d_v - 2t_v - (d_v - t_v)\, t_v\, p + o(t_v)) \cdot p$
  where $p$ is the propagation probability, $d_v$ is the degree of $v$, and $t_v$ is the number of seeds in the neighborhood of $v$.

The Degree Discount Algorithm
Total time complexity: $O(k \log n + m)$

Experimental Results
[Figures: experimental results from Chen, Wang, and Yang, 2009]

Results
• Combining the cascade model improvements with CELF optimization results in a 15% to 34% improvement in running time over the CELF-greedy model, while matching its influence spread.
• Degree discount heuristics can improve the running time of the cascade model by more than six orders of magnitude (~1,000,000× faster) while decreasing influence spread by only ~3.5%, which is a vast improvement.

Thank You
Any Questions?
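
Backup: the degree discount selection described earlier can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation. Each time a seed is chosen, every non-seed neighbor $v$ has its seed-neighbor count $t_v$ incremented and its score reset to the discounted degree $d_v - 2t_v - (d_v - t_v)\, t_v\, p$, which is the bracketed term of the expectation with the $o(t_v)$ part dropped. The function name `degree_discount_seeds` is hypothetical. The sketch swaps in a binary heap with lazy deletion from the standard library, so its running time is not the $O(k \log n + m)$ bound quoted on the slide. Vertex labels are assumed to be mutually comparable (for example, integers) so that heap ties can be broken.

```python
import heapq


def degree_discount_seeds(graph, k, p):
    """Select up to k seeds by repeatedly taking the largest discounted degree."""
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    t = {v: 0 for v in graph}   # number of already chosen seeds adjacent to v
    dd = dict(degree)           # current discounted degree of each vertex

    # heapq is a min-heap, so scores are stored negated.
    heap = [(-dd[v], v) for v in graph]
    heapq.heapify(heap)

    seeds = []
    chosen = set()
    while heap and len(seeds) < k:
        neg_score, u = heapq.heappop(heap)
        # Lazy deletion: skip entries that are stale or already selected.
        if u in chosen or -neg_score != dd[u]:
            continue
        chosen.add(u)
        seeds.append(u)
        for v in graph[u]:
            if v in chosen:
                continue
            t[v] += 1
            # Discount v's degree for the seeds now in its neighborhood.
            dd[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
            heapq.heappush(heap, (-dd[v], v))
    return seeds
```

On a graph where a degree-3 vertex is adjacent to the first seed, the discount can push it below a degree-2 vertex elsewhere in the graph, which is the behavior that separates this heuristic from plain degree ranking.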