Social Network Analytics Using NodeXL
Dr. Thanachart Ritbumroong

Social Media Analytics
– The process of representing, analyzing, and extracting actionable patterns from social media data
– The study of how individuals (also known as social atoms) interact and how communities (i.e., social molecules) form

Social Media is …
“the group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content” (Kaplan and Haenlein, 2010)

What is NodeXL?
– An open source software for social network analysis
– Extension to Microsoft Excel
– Easy to use
– Provides basic network analysis and visualization features
– www.codeplex.com/NodeXL

Major Components of a Network
Vertices
– Nodes, agents, entities, or items
– Representing people, or social structures (workgroups, teams, organizations, institutions, states, or countries)
Edges
– Links, ties, connections, or relationships
– Connecting two vertices together
– Representing proximity, collaborations, partnerships, transactions, etc.

[Figure: a directed graph and an undirected graph, each on the vertices ʋ1, ʋ2, ʋ3]

Network Data
– Differs from attribute data
– Two ways of presenting network data: as a matrix or as an edge list

As a matrix (row = source, column = destination):
          Nicole  Tim  Mike
Nicole    0       1    1
Tim       0       0    0
Mike      1       0    0

As an edge list:
Vertex 1  Vertex 2
Nicole    Tim
Nicole    Mike
Mike      Nicole

NodeXL Template
Edges
– Vertex 1 = source
– Vertex 2 = destination
– Attributes

NodeXL Edge List
Vertices will be automatically generated

Showing the graph
There are several automatic layouts that can be selected from the control in the graph pane or in the NodeXL ribbon.
Fruchterman-Reingold
Harel-Koren Fast Multiscale

Exercise#1
Use NodeXL to visualize the following network

Vertex 1  Vertex 2
Nicole    Tim
Nicole    Mike
Mike      Nicole

[Figure: the resulting graph of Mike, Nicole, and Tim]

Types of Networks
Full, Partial, and Egocentric Networks
– Full: contains all entities
– Partial: a subset, topic centric
– Egocentric: includes only individuals who are connected to a specified ego (person)
Unimodal, Multimodal, and Affiliation Networks
– Unimodal: one type of vertex
– Multimodal: many types of vertex (persons, posts, pictures, etc.)
– Affiliation: bimodal network
Multiplex Networks
– Multiple types of connection (following, reply to, mention, etc.)

Adding descriptive data
Color
– CSS color names
– RGB format (240, 12, 135)
Size
– Between 1 and 100
Shape
– 1 = Circle
– 2 = Disk
– 3 = Sphere
– 4 = Square
– 5 = Solid Square
– 6 = Diamond
– 7 = Solid Diamond
– 8 = Triangle
– 9 = Solid Triangle
– 10 = Label
– 11 = Image

Label
Color

Autofilling
Allows you to provide instructions on how NodeXL should fill in the worksheet columns, such as those relating to size and shape.
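The Exercise#1 edge list and the matrix shown under Network Data describe the same network. The following is a minimal sketch in Python of converting between the two forms. The helper names edges_to_matrix and matrix_to_edges are hypothetical and are not part of NodeXL; the sketch assumes a directed, unweighted network where rows are sources and columns are destinations.

```python
def edges_to_matrix(edges):
    """Build a 0/1 adjacency matrix from a directed edge list.

    Hypothetical helper for illustration; rows are sources (Vertex 1),
    columns are destinations (Vertex 2).
    """
    names = []
    for source, target in edges:
        for name in (source, target):
            if name not in names:  # keep first-seen order, as in the worksheet
                names.append(name)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return names, matrix


def matrix_to_edges(names, matrix):
    """Read the edge list back out of the adjacency matrix, row by row."""
    return [
        (names[i], names[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]


# Edge list from Exercise#1
edge_list = [("Nicole", "Tim"), ("Nicole", "Mike"), ("Mike", "Nicole")]
names, matrix = edges_to_matrix(edge_list)

for name, row in zip(names, matrix):
    print(name, row)
```

The matrix grows with the square of the number of vertices while the edge list grows only with the number of edges, which is one plausible reason an edge-list worksheet is convenient for sparse social networks.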
Autofilling
Graph with details

Exercise#2
Use NodeXL to visualize the following relationship

Post                                                 Users who clicked like
Picture: Selfie                                      Tim, Mike, John, Bob, Nicole, Ann, Kate, Pam
Picture: Dinner                                      John, Bob, Nicole, Pam
Picture: Party with friends                          Tim, Nicole, Ann, Kate, Pam
Text: Complaining about weather                      Mike, Bob, Nicole
Text: Sharing information about discounted coupon    Tim, Mike, Ann, Kate, Pam

Network Analysis Metrics
Aggregate Network Metrics: describing entire networks
– Density
  • the level of interconnectedness of the vertices
  • the number of relationships observed in a network divided by the total number of possible relationships that could be present
– Centralization
  • the extent to which the network is centered on one or a few important nodes

Density
density = e / emax, where e is the number of edges present and emax is the total number of possible relationships among n vertices
– directed graph: emax = n*(n-1)
– undirected graph: emax = n*(n-1)/2

Directed Graph (ʋ1, ʋ2, ʋ3): density = 3/6 = 0.5
Undirected Graph (ʋ1, ʋ2, ʋ3): density = 2/3 ≈ 0.67

Calculating Metrics
Network analysis metrics can be automatically calculated in NodeXL. Once completed, NodeXL displays each vertex-specific metric in a set of Graph Metrics columns in the Vertices worksheet.

Calculate Metrics

Graph Metrics
Graph type. Undirected or directed.
Vertices. The total number of vertices.
Unique edges. The number of unique edges found in the edges worksheet.
Edges with duplicates. The number of repeated vertex pairs on the edges worksheet.
Total edges. The total number of edges.
Self-loops. The number of edges that connect a vertex with itself.

Graph Metrics (Cont')
Connected components. The number of connected components (i.e., clusters of vertices that are connected to each other but separate from other vertices in the graph).
Single vertex connected components. The number of isolated vertices that are not connected to any other vertices in the graph.
Maximum vertices in a connected component.
The number of vertices in the connected component with the most vertices.

Graph Metrics (Cont')
Maximum edges in a connected component. The number of edges in the connected component with the most edges.
Maximum geodesic distance (diameter). The geodesic distance is the length of the shortest path between two people; the diameter is the largest geodesic distance in the graph.
Average geodesic distance. The average of all geodesic distances. This value gives a sense of how “close” community members are to one another.
Graph density. A number between 0 and 1 indicating how interconnected the vertices are in the network.

Exercise#1
Determine Graph Density for each network
[Figure: three networks labeled a), b), and c), drawn over the vertices Mike, Nicole, Tim, and Bob]

Network Analysis Metrics
Vertex-Specific Network Metrics: describing a specific vertex
– Degree Centrality
  • a simple count of the total number of connections linked to a vertex
  • for directed networks: in-degree (edges pointing inward) and out-degree (edges pointing outward)
– Betweenness Centrality
  • how often a vertex lies on the shortest paths between other vertices

Vertex Specific Metrics
Degree – The degree of a vertex (sometimes called Degree Centrality) is a count of the number of unique edges that are connected to it.
Betweenness Centrality – How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
Closeness Centrality – How close each person is to the other people in the network; the inverse of the sum of the shortest distances between the vertex and all other vertices reachable from it.

Vertex Specific Metrics
Eigenvector Centrality
– takes into consideration not only how many connections a vertex has (i.e., its degree), but also the degree of the vertices that it is connected to
– a measure of the importance of a node in a network
– assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes
PageRank – the importance of each vertex within the graph, computed using a link analysis algorithm developed by Larry Page
Clustering Coefficient – quantifies how close a vertex and its neighbors are to being a clique (complete graph)

Let's use this network to demonstrate

Degree
The Degree of a vertex (sometimes called Degree Centrality) is a count of the number of edges that are connected to it. If we were using a directed graph (such as the Party Network), the single Degree metric would be split into two metrics:
– (1) In-Degree, which measures the number of edges that point toward the node of interest, and
– (2) Out-Degree, which measures the number of edges that the node of interest points toward.
Fill in the degree of each node

Betweenness Centrality
Vertices that are included in many of the shortest paths between other vertices have a higher Betweenness Centrality than those that are not included. In a case where Betweenness Centrality is 0, if this person were removed from the graph, everyone else would still be connected to everyone else and their shortest communication paths would not even be altered. High Betweenness Centrality indicates that the person acts as a “bridge” in passing information.
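The definitions of Degree, Betweenness Centrality, and Closeness Centrality given above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NodeXL's implementation: the network is undirected and unweighted, the five-person edge list is made up for the example, the function names are hypothetical, betweenness counts each unordered pair once, and closeness is 1 / (sum of shortest distances). NodeXL may scale or normalize its values differently.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Return (dist, sigma): hop distance from source to each reachable
    vertex, and the number of distinct shortest paths to it."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def vertex_metrics(edges):
    """Degree, betweenness, and closeness for an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    info = {v: bfs(adj, v) for v in adj}

    degree = {v: len(adj[v]) for v in adj}   # number of unique neighbors

    closeness = {}
    for v in adj:
        total = sum(d for u, d in info[v][0].items() if u != v)
        closeness[v] = 1 / total if total else 0.0

    betweenness = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):        # each unordered pair once
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue                         # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v is on a shortest s-t path when the two legs add up exactly
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                betweenness[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return degree, betweenness, closeness


# Hypothetical network: a square Nicole-Tim-Bob-Mike, with Ann attached to Nicole
edges = [("Nicole", "Tim"), ("Tim", "Bob"), ("Bob", "Mike"),
         ("Mike", "Nicole"), ("Nicole", "Ann")]
degree, betweenness, closeness = vertex_metrics(edges)

for v in degree:
    print(v, degree[v], betweenness[v], round(closeness[v], 3))
```

In this made-up network Nicole has degree 3 and betweenness 3.5, because every shortest path to Ann runs through her, while Ann has betweenness 0: removing Ann would not change anyone else's shortest paths.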
Fill in betweenness centrality of each node

Closeness Centrality
Another characteristic you may care about is how close each person is to the other people in the network. If information flowed through edges in the network, some people would be able to contact all the other people in only a few steps, while others may require many steps. Closeness Centrality is based on the shortest distances from a vertex to every other vertex. It is equal to 1 / (sum of the distances between that node and all other nodes), so a vertex that is only a few steps from everyone else scores higher.
Fill in closeness centrality of each node

Eigenvector Centrality
In many cases, a connection to a popular individual is more important than a connection to a loner. The Eigenvector Centrality metric attempts to take into consideration not only how many connections a node has (i.e., its Degree), but also the Degree of the nodes that it is connecting to.
Fill in eigenvector centrality of each node

Exercise#3
Analyze this network

Import data from Social Media
Social Network Importer allows users to directly download and import different Facebook networks
http://socialnetimporter.codeplex.com/

Installation Guide
– Close NodeXL
– Download the zip file from http://socialnetimporter.codeplex.com/
– Unzip the file: you will find two items, FacebookAPI.DLL and SocialNetImporter.DLL
– Copy these files to the NodeXL Plug-ins Directory specified in the "Import Options..." (Using third-party graph data importers in NodeXL Excel Template 2014)
– Restart NodeXL: you should see the Facebook Import option in the NodeXL>Data>Import menu.

Exercise: Facebook
Download data from your Facebook account

Edge List
Visualizing Social Network

Exercise#4
Analyze Facebook Network

NODEXL: CLUSTERING
Clustering
NodeXL can automatically identify clusters based on the network structure. An algorithm will look for groups of densely clustered vertices that are only loosely connected to vertices in another cluster.
The number of clusters is not predetermined; instead, the algorithm dynamically determines the number it thinks is best.

Clustering Results
Visualizing Clusters

Exercise#5
Analyze Amazon Purchase History

Bibliography
Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann.
Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media, Inc.
Xu, G., Zhang, Y., & Li, L. (2010). Web mining and social networking: techniques and applications (Vol. 6). Springer.
Zafarani, R., Abbasi, M. A., & Liu, H. (2014). Social Media Mining: An Introduction. Cambridge University Press.