Social Network Mining for Digital Library Application

Dr. Thanachart Ritbumroong
King Mongkut’s University of Technology Thonburi

Assist. Prof. Dr. Satidchoke Phosaard
Suranaree University of Technology

Agenda
– Social Media Mining Concepts
– Data Extraction and Preparation
– Social Network Analysis
– Social Media Mining for Recommender System

SOCIAL MEDIA CONCEPTS

Social Media Mining
– The process of representing, analyzing, and extracting actionable patterns from social media data
– The study of how individuals (also known as social atoms) interact and how communities (i.e., social molecules) form

Social Media is …
“the group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content” (Kaplan and Haenlein, 2010)

Applications
– Facebook: People you may know
– Amazon: Other customers suggested these items
– Netflix: Movie suggestions for you
– Targeted marketing
– Online advertising

Major Components of a Network
Vertices
– Nodes, agents, entities, or items
– Representing people or social structures (workgroups, teams, organizations, institutions, states, or countries)
Edges
– Links, ties, connections, or relationships
– Connecting two vertices together
– Representing proximity, collaborations, partnerships, transactions, etc.

Directed Graph (figure: vertices v1, v2, v3)
Undirected Graph (figure: vertices v1, v2, v3)

Network Data
Network data differs from attribute data. There are two ways of presenting it:

– as a matrix (a 1 marks an edge from the row vertex to the column vertex)

          Nicole  Tim  Mike
  Nicole    0      1    1
  Tim       0      0    0
  Mike      1      0    0

– as an edge list

  Vertex 1  Vertex 2
  Nicole    Tim
  Nicole    Mike
  Mike      Nicole

Types of Networks
Full, Partial, and Egocentric Networks
– Full: contains all entities
– Partial: a subset, topic centric
– Egocentric: includes only individuals who are connected to a specified ego (person)
Unimodal, Multimodal, and Affiliation Networks
– Unimodal: one type of vertex
– Multimodal: many types of vertex (persons, posts, pictures, etc.)
– Affiliation: bimodal network
Multiplex Networks
– Multiple types of connection (following, reply to, mention, etc.)

Network Analysis Metrics
Aggregate Network Metrics: describing entire networks
– Density
  • the level of interconnectedness of the vertices
  • the number of relationships observed in a network divided by the total number of possible relationships that could be present
– Centralization
  • the extent to which the network is centered on one or a few important nodes

Density
With $e$ observed edges among $n$ vertices, $\text{density} = e / e_{max}$, where $e_{max}$ is the total number of possible relationships:
– directed graph: $e_{max} = n(n-1)$
– undirected graph: $e_{max} = n(n-1)/2$
Examples (figures):
– Directed graph on v1, v2, v3: density = 3/6 = 0.5
– Undirected graph on v1, v2, v3: density = 2/3 = 0.67

Centralization
Freeman’s general formula for centralization:

$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$$

where $C_D(n^*)$ is the maximum value in the network, $C_D(i)$ is the value for vertex $i$, and $g = N$ is the number of vertices.

Degree Centralization (figures: three example networks with $C_D = 0.167$, $C_D = 1.0$, and $C_D = 0.167$)

Network Analysis Metrics
Vertex-Specific Network Metrics: describing a specific vertex
– Degree Centrality
  • a simple count of the total number of connections linked to a vertex
  • for directed networks: in-degree (edges pointing inward) and out-degree (edges pointing outward)
– Betweenness Centrality
  • how often a vertex lies on the shortest paths between other pairs of vertices

Normalized Degree Centrality
Divide the degree by the maximum possible value, i.e., (N-1).

Betweenness Centrality
How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
Example (figure: vertices A, B, C, D, E connected in a line):
– A lies between no two other vertices
– B lies between A and 3 other vertices: C, D, and E
– C lies between 4 pairs of vertices: (A,D), (A,E), (B,D), (B,E)

NODEXL: INTRODUCTION

What is NodeXL?
– Open source software for social network analysis
– An extension to Microsoft Excel
– Easy to use
– Provides basic network analysis and visualization features
– www.codeplex.com/NodeXL

NodeXL Template
Edges
– Vertex 1 = source
– Vertex 2 = destination
– Attributes

NodeXL Edge List
Vertices will be automatically generated.

Showing the graph
There are several automatic layouts that can be selected from the control in the graph pane or in the NodeXL ribbon, for example:
– Fruchterman-Reingold
– Harel-Koren Fast Multiscale

Adding descriptive data
Color
– CSS color names
– RGB format (240, 12, 135)
Size
– Between 1 and 100
Shape
– 1 = Circle
– 2 = Disk
– 3 = Sphere
– 4 = Square
– 5 = Solid Square
– 6 = Diamond
– 7 = Solid Diamond
– 8 = Triangle
– 9 = Solid Triangle
– 10 = Label
– 11 = Image
Label Color

Autofilling
Allows you to provide instructions on how NodeXL should fill in the worksheet columns, such as those relating to size and shape.

Autofilling
Graph with details

Calculating Metrics
Network analysis metrics can be calculated automatically in NodeXL. Once completed, NodeXL displays each vertex-specific metric in a set of Graph Metrics columns in the Vertices worksheet.

Graph Metrics
– Graph type. Undirected or directed.
– Vertices. The total number of vertices.
– Unique edges. The number of unique edges found in the Edges worksheet.
– Edges with duplicates. The number of repeated vertex pairs in the Edges worksheet.
– Total edges. The total number of edges.
– Self-loops. The number of edges that connect a vertex with itself.

Graph Metrics (Cont')
– Connected components. The number of connected components (i.e., clusters of vertices that are connected to each other but separate from other vertices in the graph).
– Single-vertex connected components. The number of isolated vertices that are not connected to any other vertices in the graph.
– Maximum vertices in a connected component. The number of vertices in the connected component with the most vertices.
Graph Metrics (Cont')
– Maximum edges in a connected component. The number of edges in the connected component with the most edges.
– Maximum geodesic distance (diameter). The geodesic distance is the length of the shortest path between two people; the diameter is the largest geodesic distance in the graph.
– Average geodesic distance. The average of all geodesic distances. This value gives a sense of how “close” community members are to one another.
– Graph density. A number between 0 and 1 indicating how interconnected the vertices are in the network.

Vertex Specific Metrics
Degree
– The degree of a vertex (sometimes called degree centrality) is a count of the number of unique edges that are connected to it.
Betweenness Centrality
– How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
Closeness Centrality
– How close each person is to the other people in the network
– The inverse of the sum of the shortest distances between the vertex and all other vertices reachable from it

Vertex Specific Metrics
Eigenvector Centrality
– Takes into consideration not only how many connections a vertex has (i.e., its degree), but also the degree of the vertices that it is connected to
– A measure of the importance of a node in a network
– Assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes
PageRank
– The importance of each vertex within the graph, computed with a link analysis algorithm developed by Larry Page
Clustering Coefficient
– The clustering coefficient of a vertex quantifies how close the vertex and its neighbors are to being a clique (complete graph).
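
The density, degree centralization, degree, closeness, and betweenness definitions from the slides can be tried out on small graphs with a short Python sketch. This is only an illustration under stated assumptions, not how NodeXL computes its metrics: the function names are hypothetical, graphs are assumed to be undirected, unweighted, and stored as a dictionary of neighbor sets, and the two example graphs (a five-vertex line A-B-C-D-E and a five-vertex star) are chosen here for illustration.

```python
from collections import deque


def density(n_vertices, n_edges, directed):
    """Observed edges divided by the maximum possible number of edges."""
    e_max = n_vertices * (n_vertices - 1)
    if not directed:
        e_max /= 2
    return n_edges / e_max


def bfs(adj, source):
    """Return (distances, shortest-path counts) from source to reachable vertices."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                paths[w] += paths[u]
    return dist, paths


def degree(adj):
    """Number of edges attached to each vertex."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def degree_centralization(adj):
    """Freeman's formula for degree; assumes at least 3 vertices."""
    n = len(adj)
    deg = degree(adj)
    max_deg = max(deg.values())
    return sum(max_deg - d for d in deg.values()) / ((n - 1) * (n - 2))


def closeness(adj):
    """Inverse of the summed shortest distances to all reachable vertices."""
    result = {}
    for v in adj:
        total = sum(bfs(adj, v)[0].values())
        result[v] = 1 / total if total else 0.0
    return result


def betweenness(adj):
    """Pairs of other vertices whose shortest paths pass through each vertex
    (counted fractionally when several shortest paths tie)."""
    info = {v: bfs(adj, v) for v in adj}
    vertices = sorted(adj)
    score = {v: 0.0 for v in adj}
    for i, s in enumerate(vertices):
        dist_s, paths_s = info[s]
        for t in vertices[i + 1:]:
            if t not in dist_s:
                continue                 # s and t are not connected
            for v in vertices:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, paths_v = info[v]
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += paths_s[v] * paths_v[t] / paths_s[t]
    return score


# Illustrative graphs (assumptions): a line A-B-C-D-E and a star centered on "hub".
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}

print(density(3, 3, directed=True))              # 0.5
print(round(density(3, 2, directed=False), 2))   # 0.67
print(betweenness(path))   # {'A': 0.0, 'B': 3.0, 'C': 4.0, 'D': 3.0, 'E': 0.0}
print(round(degree_centralization(path), 3))     # 0.167
print(degree_centralization(star))               # 1.0
print(degree(path)["C"] / (len(path) - 1))       # normalized degree of C: 0.5
print(round(closeness(path)["C"], 3))            # 0.167
```

On the line graph the betweenness values agree with the A to E example (0 for A, 3 for B, 4 for C), and the line and star graphs give degree centralization values of 0.167 and 1.0.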

Import data from Social Media
Social Network Importer allows users to directly download and import different Facebook networks.
http://socialnetimporter.codeplex.com/

Installation Guide
– Close NodeXL
– Download the zip file from http://socialnetimporter.codeplex.com/
– Unzip the file; you will find two items: FacebookAPI.DLL and SocialNetImporter.DLL
– Copy these files to the NodeXL Plug-ins Directory specified in the "Import Options..." (Using third-party graph data importers in NodeXL Excel Template 2014)
– Restart NodeXL: you should see the Facebook Import option in the NodeXL>Data>Import menu

Exercise: Facebook
– Download data from your Facebook account
– Edge List
– Visualizing Social Network
– Calculate Metrics
– Vertex Specific Metrics

NODEXL: CLUSTERING

Clustering
NodeXL can automatically identify clusters based on the network structure. An algorithm looks for groups of densely clustered vertices that are only loosely connected to vertices in another cluster. The number of clusters is not predetermined; instead, the algorithm dynamically determines the number it thinks is best.

Clustering Results
Visualizing Clusters

NODEXL: MULTIMODAL NETWORK

Import data from FB Fanpage
Using Social Network Importer to download FB fanpage data

LibraryCMU FB Fanpage Data
Visualizing Likes & Comments

PERSONALIZATION AND RECOMMENDER SYSTEMS

Personalization
Information and services can be modified to meet the unique and specific needs of an individual or a community by changing presentation, content, and/or services based on a person’s task, background, history, device, information needs, location, etc. (the user’s context).

Recommender Systems
– A type of personalization that learns about a person’s needs and then proactively identifies and recommends information that matches those needs
– Useful when they identify information a person was previously unaware of
– Can be user-driven, which involves a user directly invoking and supporting the personalization process by providing explicit input

Content-Based and Collaborative-Filtering Systems
– Content-based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.
– Collaborative-filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

Simple way to do it
Transforming Multimodal Affiliation Networks into Unimodal Networks
– Bimodal affiliation networks can be transformed into two single-mode networks
– person-to-affiliation network → person-to-person network
– person-to-page network → person-to-person network
– person-to-item network → person-to-person network

Person-to-affiliation network
Example of person-to-affiliation network

  username       discussion
  Adam Kuban     B_InVideos
  Adam Kuban     B_SupposedTop10
  Adam Kuban     B_WomanFindsCell
  Adam Kuban     F_Portland
  ag3208         F_CuttingMelon
  AliceBlue      F_DoubleParked
  AliceBlue      F_Portions
  alliect        F_Vietnamese
  Alm25          F_BestFarmers
  Amandarama     F_CuttingMelon
  Amandarama     F_IveNeverTasted
  annatr         F_SundriedTomatoes
  annatr         F_BestFarmers
  annatr         F_Portions
  anniedra       F_IveNeverTasted
  annien         F_ChezLaurence
  arm1970        F_BestFarmers
  atom12         B_WomanFindsCell
  AuntJone       F_FearBroiling
  avryan         F_SundriedTomatoes
  BananaMonkey   F_BestFarmers
  BangieB        F_CheffTell
  Barbieri13     F_FearBroiling
  bebes          F_SundriedTomatoes
  bessfour       F_FearBroiling

Create person-to-affiliation matrix
Use a pivot table to create the matrix, with users and affiliations as the two axes and the count of relationships as the cell value.

Create an affiliation-to-affiliation matrix
Create the matrix by summing up the products of the relationships between two affiliations: each cell for a pair of affiliations holds the sum of products over all users.

Similarity measures
Cosine-based similarity
Also known as vector-based similarity, this formulation views two items and their ratings as vectors and defines the similarity between them as the cosine of the angle between these vectors:

$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$

Example
x = (4.75, 4.5, 5, 4.25, 4)
y = (4, 3, 5, 2, 1)

$$\lVert x \rVert = \sqrt{4.75^2 + 4.5^2 + 5^2 + 4.25^2 + 4^2} = 10.09$$

$$\lVert y \rVert = \sqrt{4^2 + 3^2 + 5^2 + 2^2 + 1^2} = 7.416$$

$$x \cdot y = (4.75 \times 4) + (4.5 \times 3) + (5 \times 5) + (4.25 \times 2) + (4 \times 1) = 70$$

$$\cos(x, y) = \frac{70}{10.09 \times 7.416} = 0.935$$

Bibliography
– Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann.
– Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media.
– Xu, G., Zhang, Y., & Li, L. (2010). Web mining and social networking: Techniques and applications (Vol. 6). Springer.
– Zafarani, R., Abbasi, M. A., & Liu, H. (2014). Social Media Mining: An Introduction. Cambridge University Press.
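
Appendix: checking the similarity example in code
The cosine-similarity example and the sum-of-products step from the recommender slides can be verified with a short Python sketch. It is a minimal illustration, not the pivot-table workflow the slides use: the function names and the dictionary layout are assumptions, and the three users are a small excerpt of the person-to-affiliation table with each relationship counted as 1.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def affiliation_to_affiliation(person_to_affiliation):
    """Sum of products of relationship counts for every pair of affiliations.

    `person_to_affiliation` maps each user to {affiliation: count}.
    """
    rows = person_to_affiliation.values()
    affiliations = sorted({a for row in rows for a in row})
    return {
        (a, b): sum(row.get(a, 0) * row.get(b, 0) for row in rows)
        for a in affiliations
        for b in affiliations
    }


# Worked example from the slides.
x = (4.75, 4.5, 5, 4.25, 4)
y = (4, 3, 5, 2, 1)
similarity = cosine_similarity(x, y)
print(round(similarity, 3))  # 0.935

# Excerpt of the person-to-affiliation table (each relationship counted once).
matrix = {
    "annatr": {"F_SundriedTomatoes": 1, "F_BestFarmers": 1, "F_Portions": 1},
    "AliceBlue": {"F_DoubleParked": 1, "F_Portions": 1},
    "Alm25": {"F_BestFarmers": 1},
}
co = affiliation_to_affiliation(matrix)
print(co[("F_BestFarmers", "F_Portions")])  # 1 (only annatr is in both)
print(co[("F_Portions", "F_Portions")])     # 2 (annatr and AliceBlue)
```

Each off-diagonal cell counts the users shared by two discussions, so the columns of the resulting matrix can themselves be compared with `cosine_similarity` to rank similar discussions.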