EVOLVING SOCIAL GRAPHS CLUSTERING

Affiliation
Athena Vakali
Department of Informatics, Aristotle University, Thessaloniki, Greece
[email protected]

Synonyms
Evolving Community Detection

Glossary
SM-EC: Sequential Mapping-driven Evolving Clustering
TS-EC: Temporal Smoothing-driven Evolving Clustering
MD-EC: Milestones’ detection-driven Evolving Clustering
IA-EC: Incremental Adaptation-driven Evolving Clustering

Definition
Social graphs: In the current Web 2.0 or Social Web era, users’ intensive engagement in social networking and content sharing applications results in the daily formation of a massive number of new associations among the actors involved. The types of such associations vary depending on the application at hand, and they may correspond to explicit relationships invoked by users’ actions. Beyond users, implicit associations can also be inferred between other types of social application “actors”, such as web resources (e.g. images, articles, videos) annotated with the same metadata. In the context of a given web application, the complex network comprising the involved “actors”, which belong to different modes, and the associations existing between these modes constitutes the application’s social graph.

Associations formed in the context of social networking applications are often multiway, i.e. they involve multiple entities (e.g. User A commenting on Post P of User B), and are more precisely captured in a generalized graph structure (i.e. a hypergraph) whose (hyper)edges connect more than two nodes. However, for simplicity and depending on the required analysis task, they are usually projected into simple graph structures G = (V, E), where V is the set of vertices and E is the set of edges [1]. Due to the existence of different modes, independent subsets of nodes and edges are identified within V and E, respectively. For example, in a social tagging network, the modes of users U, resources R, and metadata (i.e.
tags) M can be represented in V with three different sets of nodes, V = {U; R; M}, while the bipartite associations existing between users-resources, resources-metadata and users-metadata can be represented with the corresponding sets of edges in E, i.e. E = {UR; RM; UM}. Depending on the creation process, social graph structures can be weighted or unweighted, unipartite, bipartite or multipartite, directed or undirected, etc. [2]. Figure 1 shows an example of transforming a tripartite tag-assignment graph into a unipartite graph.

Figure 1: Creation of a tripartite graph based on tag assignments in a social tagging network and its projection into a unipartite graph. u, r, and m stand for the User, Resource and Metadata modes, respectively, whereas ER and IR represent explicit and implicit associations between the corresponding modes.

Evolving social graphs: Entities and associations in social web applications pass through several states over time: new entities (e.g. users) can appear and old ones can disappear, while some associations last longer than others, e.g. two users may constantly exchange messages, whereas others may have communicated on just a single occasion. In order to capture the evolution in social data, evolving representation structures are needed that model the different data states in successive time-steps [3]. Figure 2 depicts an evolving social dataset as a three-layered stream G of successive snapshot graphs, each representing the data associations existing at a given time. The snapshot layer captures evolving social data as a sequence of (static) graph snapshots, the segment layer partitions the online graph stream into segments comprising successive similar snapshots and captures their connectivity in structures such as tensors (i.e.
generalized matrices with more than two dimensions), while the stream layer represents the graph stream directly. In the stream layer, the stream is modeled either as a tensor updated on each incoming snapshot, or as a simple matrix (using appropriate edge aggregation) or a multi-graph, updated either on the arrival of a new association (single update) or on an incoming graph snapshot (batch update). For the sake of simplicity, simple graphs are often used instead of tensors, with their edges aggregating past interactions under a given weight-updating scheme.

Figure 2: Layers for evolving social data graph structures [3].

Evolving social graphs clustering: Social graph clustering, or community detection, is the process of identifying clusters or latent communities in a social graph. Given a social graph G = (V, E), a community C can be coarsely defined as a subgraph of G comprising a set of entities that are associated with a common element of interest (e.g. a topic, an event, an activity or a cause) [2]. Social data clustering is approached as an evolving process across time, assuming that social data continuously undergo changes that generate successive correlated clustering states, and aiming to extract knowledge about the evolution of the identified patterns [3]. Evolving social graph clustering approaches have been classified [3] into four categories (given in Table 1): Sequential Mapping-driven Evolving Clustering (SM-EC), Temporal Smoothing-driven Evolving Clustering (TS-EC), Milestones’ detection-driven Evolving Clustering (MD-EC), and Incremental Adaptation-driven Evolving Clustering (IA-EC).
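As a rough illustration of two ideas from the definitions above, the following Python sketch (a) aggregates successive snapshots into one weighted graph with a batch update per snapshot, and (b) follows the sequential-mapping idea of clustering each snapshot independently and then linking clusters across consecutive time-steps. It is a minimal sketch under stated assumptions, not the method of any cited work: connected components stand in for a real community detection algorithm, the exponential decay factor and the Jaccard threshold are assumed values, and all function names are hypothetical.

```python
from collections import defaultdict


def aggregate_edges(snapshots, decay=0.5):
    """Fold snapshots into one weighted edge map (one batch update each).

    Assumed weight-updating scheme: existing weights are multiplied by
    `decay`, then every edge of the new snapshot adds 1.0 to its weight.
    """
    weights = defaultdict(float)
    for edges in snapshots:
        for key in list(weights):
            weights[key] *= decay
        for u, v in edges:
            weights[frozenset((u, v))] += 1.0  # undirected edge
    return dict(weights)


def connected_components(edges):
    """Stand-in clustering: each connected component is one cluster."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, clusters = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(frozenset(component))
    return clusters


def jaccard(a, b):
    """Overlap of two node sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def map_clusters(previous, current, threshold=0.3):
    """Link each current cluster to its best-overlapping predecessor."""
    mapping = {}
    for cluster in current:
        best = max(previous, key=lambda p: jaccard(p, cluster), default=None)
        if best is not None and jaccard(best, cluster) >= threshold:
            mapping[cluster] = best
        else:
            mapping[cluster] = None  # no predecessor: a newly born cluster
    return mapping


# Two toy snapshots of an evolving interaction graph (made-up data).
snapshots = [
    [("a", "b"), ("b", "c"), ("d", "e")],
    [("a", "b"), ("b", "c"), ("c", "f"), ("g", "h")],
]
clusterings = [connected_components(edges) for edges in snapshots]
links = map_clusters(clusterings[0], clusterings[1])
weights = aggregate_edges(snapshots)
```

In this toy run the cluster {a, b, c} grows into {a, b, c, f} and is linked to its predecessor, while {g, h} has no predecessor and is treated as newly formed. The aggregated weights favour edges that recur across snapshots over edges seen only in the past.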
Table 1: Proposed classification of evolving social graphs clustering approaches

SM-EC
• Representative approaches: [4]
• Data model: snapshot graphs/tensors
• Updating scheme: batch
• Evolution aspect: tracks individual cluster’s evolution
• Clusters’ temporal dependency: independent clustering at each time-step
• Clusters’ re-computation requirement: required

TS-EC
• Representative approaches: [5,6,7]
• Data model: snapshot graphs/tensors
• Updating scheme: batch
• Evolution aspect: captures long-term changes in clusters’ composition/ignores short-term concept drifts
• Clusters’ temporal dependency: assumes clusters evolve smoothly from the previous model/incorporates deviation as regularization term
• Clusters’ re-computation requirement: required

MD-EC
• Representative approaches: [8]
• Data model: snapshot graphs/tensors and segments
• Updating scheme: batch
• Evolution aspect: identifies time-points when clustering structure drastically changes
• Clusters’ temporal dependency: generates a new clustering model on an identified drastic change in data’s structure
• Clusters’ re-computation requirement: required

IA-EC
• Representative approaches: [9,10]
• Data model: snapshot graphs/tensors and change-stream
• Updating scheme: single/batch
• Evolution aspect: identifies individual clusters’ evolution through different new data-dependent adaptation processes
• Clusters’ temporal dependency: builds new clusters by adapting the previous ones
• Clusters’ re-computation requirement: not required

Cross-References
• Inferring Social Ties and Communities in Social Networks
• Modeling Social Preferences Based on Social Interactions
• Models of Social Networks
• Managing Social Graph Data: Flow of Retrieval and Maintenance

References
1. Peter Mika. 2007. Ontologies are us: A unified model of social networks and semantics. Web Semant. 5, 1 (March 2007), 5-15.
2. Symeon Papadopoulos, Yiannis Kompatsiaris, Athena Vakali, and Ploutarchos Spyridonos. 2012. Community detection in Social Media. Data Min. Knowl. Discov. 24, 3 (May 2012), 515-554.
3. M. Giatsoglou and A. Vakali. 2012. Capturing Social Data Evolution via Graph Clustering. IEEE Internet Computing. Preprint. DOI: 10.1109/MIC.2012.24.
4. G. Palla, A.-L. Barabási, T. Vicsek. 2007. Quantifying social group evolution. Nature 446:7136, 664-667.
5. D. Chakrabarti, R. Kumar, and A. Tomkins. 2006. Evolutionary clustering.
In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '06). ACM, New York, NY, USA, 554-560.
6. L. Tang, H. Liu, J. Zhang. 2011. Identifying Evolving Groups in Dynamic Multi-Mode Networks. IEEE Transactions on Knowledge and Data Engineering, 18 Jul. 2011, 72-85.
7. Y.-R. Lin, J. Sun, P. Castro, R. Konuru, H. Sundaram, and A. Kelliher. 2009. MetaFac: community discovery via relational hypergraph factorization. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 527-536.
8. J. Sun, C. Faloutsos, S. Papadimitriou, and P.S. Yu. 2007. GraphScope: parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '07). ACM, New York, NY, USA, 687-696.
9. J. Sun, D. Tao, S. Papadimitriou, P.S. Yu, and C. Faloutsos. 2008. Incremental tensor analysis: Theory and applications. ACM Trans. Knowl. Discov. Data 2, 3, Article 11 (October 2008), 37 pages.
10. H. Ning, W. Xu, Y. Chi, Y. Gong, and T.S. Huang. 2010. Incremental Spectral Clustering by Efficiently Updating the Eigen-System. Pattern Recognition, 43(1).