Social Network Analytics
Using NodeXL
Dr. Thanachart Ritbumroong
Social Media Analytics
 The process of representing, analyzing, and extracting
actionable patterns from social media data
 The study of how individuals (also known as social atoms)
interact and how communities (i.e., social molecules) form
 Social Media is … “the group of internet-based applications
that build on the ideological and technological foundations of
Web 2.0, and that allow the creation and exchange of user-generated content” … (Kaplan and Haenlein, 2010)
What is NodeXL?
 Open-source software for
social network analysis
 Extension to Microsoft Excel
 Easy to use
 Provides basic network
analysis and visualization
features
 www.codeplex.com/NodeXL
Major Components of a Network
 Vertices
– Nodes, agents, entities, or items
– Representing people, or social structures (workgroups, teams,
organizations, institutions, states, or countries)
 Edges
– Links, ties, connections, or relationships
– Connecting two vertices together
– Representing proximity, collaborations, partnerships, transactions, etc.
Directed Graph
[Figure: directed graph on vertices ʋ1, ʋ2, ʋ3]
Undirected Graph
[Figure: undirected graph on vertices ʋ1, ʋ2, ʋ3]
Network Data
 Differ from attribute data
 Two ways of presenting network data
– as a matrix
          Nicole  Tim  Mike
Nicole    0       1    1
Tim       0       0    0
Mike      1       0    0
– as an edge list
Vertex 1  Vertex 2
Nicole    Tim
Nicole    Mike
Mike      Nicole
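The two presentations carry the same information, so one can be derived from the other. The following is a minimal Python sketch of that conversion, using the three-person example from this slide; the function names are illustrative and are not part of NodeXL.

```python
# Edge list from the slide: each pair is (Vertex 1 = source, Vertex 2 = destination).
edges = [("Nicole", "Tim"), ("Nicole", "Mike"), ("Mike", "Nicole")]
names = ["Nicole", "Tim", "Mike"]  # row and column order used in the matrix


def edge_list_to_matrix(edges, names):
    """Return a directed adjacency matrix: cell [i][j] is 1 if names[i] -> names[j]."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, destination in edges:
        matrix[index[source]][index[destination]] = 1
    return matrix


def matrix_to_edge_list(matrix, names):
    """Return one (source, destination) pair for every non-zero cell."""
    return [
        (names[i], names[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    ]


matrix = edge_list_to_matrix(edges, names)
for name, row in zip(names, matrix):
    print(f"{name:<8}", *row)
```

The matrix needs one cell per possible pair of vertices, while the edge list needs one row per relationship, so the edge list is usually the more compact of the two for sparse networks.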
NodeXL Template
 Edges
– Vertex 1 = source
– Vertex 2 = destination
– Attributes
NodeXL Edge List
Vertices will be automatically generated
Showing the graph
There are several automatic layouts that can be selected from the control in the
graph pane or in the NodeXL ribbon.
Fruchterman-Reingold
Harel-Koren Fast Multiscale
Exercise#1
 Use NodeXL to visualize the following network
Vertex 1  Vertex 2
Nicole    Tim
Nicole    Mike
Mike      Nicole
[Figure: the resulting graph with vertices Mike, Nicole, and Tim]
Types of Networks
 Full, Partial, and Egocentric Networks
– Full: contain all entities
– Partial: subset, topic centric
– Egocentric: include only individuals who are connected to a specified
ego (person)
 Unimodal, Multimodal, and Affiliation Networks
– Unimodal: one type of vertex
– Multimodal: many types of vertex (persons, posts, pictures, etc.)
– Affiliation: bimodal network
 Multiplex Networks
– multiple types of connection (following, reply to, mention, etc.)
Adding descriptive data
 Color
– CSS color names
– RGB format (240, 12, 135)
 Size
– Between 1 and 100
 Shape
– 1 = Circle
– 2 = Disk
– 3 = Sphere
– 4 = Square
– 5 = Solid Square
– 6 = Diamond
– 7 = Solid Diamond
– 8 = Triangle
– 9 = Solid Triangle
– 10 = Label
– 11 = Image
 Label
Color
Autofilling
 Allows you to provide
instructions on how
NodeXL should fill in the
worksheet columns, such
as those relating to size
and shape.
Autofilling
Graph with details
Exercise#2
 Use NodeXL to visualize the following relationship
Post                                                 Users who clicked like
Picture: Selfie                                      Tim, Mike, John, Bob, Nicole, Ann, Kate, Pam
Picture: Dinner                                      John, Bob, Nicole, Pam
Picture: Party with friends                          Tim, Nicole, Ann, Kate, Pam
Text: Complaining about weather                      Mike, Bob, Nicole
Text: Sharing information about discounted coupon    Tim, Mike, Ann, Kate, Pam
Network Analysis Metrics
 Aggregate Network Metrics: describing entire networks
– Density
• the level of interconnectedness of the vertices
• a count of the number of relationships observed to be present in a network
divided by the total number of possible relationships that could be present
– Centralization
• the extent to which the network is centered on one or a few important
nodes
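Density
 density = e / emax, where e is the number of edges present, emax is the
total number of possible relationships, and n is the number of vertices
 directed graph: emax = n*(n-1)
 undirected graph: emax = n*(n-1)/2
Directed Graph
[Figure: directed graph on vertices ʋ1, ʋ2, ʋ3 with 3 edges]
 density = 3/6 = 0.5
Undirected Graph
[Figure: undirected graph on vertices ʋ1, ʋ2, ʋ3 with 2 edges]
 density = 2/3 = 0.67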
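The density formulas can be checked with a few lines of Python. This is a minimal sketch; the function name and its arguments are illustrative, not a NodeXL API.

```python
def density(n_vertices, n_edges, directed):
    """Return e / emax for a graph with n_vertices vertices and n_edges edges."""
    if n_vertices < 2:
        return 0.0  # no relationship is possible with fewer than two vertices
    e_max = n_vertices * (n_vertices - 1)  # every ordered pair of distinct vertices
    if not directed:
        e_max //= 2  # an undirected edge covers both orderings of a pair
    return n_edges / e_max


directed_example = density(3, 3, directed=True)     # 3 edges out of 6 possible
undirected_example = density(3, 2, directed=False)  # 2 edges out of 3 possible
print(directed_example, round(undirected_example, 2))
```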
Calculating Metrics
 Network analysis metrics
can be automatically
calculated in NodeXL.
 Once completed, NodeXL
displays each vertex
specific metric in a set of
Graph Metrics columns in
the Vertices worksheet.
Calculate Metrics
Graph Metrics
 Graph type. Undirected or directed
 Vertices. The number of total
vertices
 Unique edges. The number of
unique edges found in the edges
worksheet.
 Edges with duplicates. The
number of repeated vertex pairs on
the edges worksheet.
 Total edges. The number of total
edges
 Self-loops. The number of edges
that connect a vertex with itself.
Graph Metrics (Cont')
 Connected components. The
number of connected components
(i.e., clusters of vertices that are
connected to each other but
separate from other vertices in the
graph).
 Single vertex connected
components. The number of
isolated vertices that are not
connected to any other vertices in
the graph.
 Maximum vertices in a connected
component. The number of
vertices in the connected
component with the most vertices.
Graph Metrics (Cont')
 Maximum edges in a connected
component. The number of edges
in the connected component with
the most edges.
 Maximum geodesic distance
(diameter). The geodesic distance
is the length of the shortest path
between two people; the diameter is
the largest such distance in the graph.
 Average geodesic distance. The
average of all geodesic distances.
This value gives a sense of how
“close” community members are
from one another.
 Graph density. A number
between 0 and 1 indicating how
interconnected the vertices are in
the network.
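To make some of these definitions concrete, the sketch below computes several of the metrics for a small undirected graph in plain Python. It is an illustration only: the vertex names and function names are made up, and NodeXL's own calculation may differ in details. In particular, this sketch assumes the average geodesic distance is taken over pairs of distinct vertices that can reach each other.

```python
from collections import deque


def build_adjacency(vertices, edges):
    """Map each vertex to the set of its neighbors (undirected, self-loops ignored)."""
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def bfs_distances(adjacency, start):
    """Geodesic (shortest-path) distance from start to every reachable vertex."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def graph_metrics(vertices, edges):
    adjacency = build_adjacency(vertices, edges)
    components, seen = [], set()
    for v in vertices:
        if v not in seen:
            component = set(bfs_distances(adjacency, v))
            seen |= component
            components.append(component)
    # Distances between distinct, mutually reachable vertices (each pair appears twice).
    pair_distances = [
        d for v in vertices for d in bfs_distances(adjacency, v).values() if d > 0
    ]
    return {
        "self_loops": sum(1 for a, b in edges if a == b),
        "connected_components": len(components),
        "single_vertex_components": sum(1 for c in components if len(c) == 1),
        "max_vertices_in_component": max((len(c) for c in components), default=0),
        "max_geodesic_distance": max(pair_distances, default=0),
        "average_geodesic_distance": (
            sum(pair_distances) / len(pair_distances) if pair_distances else 0.0
        ),
    }


# Hypothetical graph: a chain A-B-C, a vertex D with only a self-loop, and an isolated E.
metrics = graph_metrics(["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C"), ("D", "D")])
print(metrics)
```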
Exercise#1
 Determine Graph Density for each network
[Figure: three networks labeled a), b), and c), drawn on the vertices Mike, Nicole, Tim, and Bob]
Network Analysis Metrics
 Vertex-Specific Network Metrics: describing a specific
vertex
– Degree Centrality
• a simple count of the total number of connections linked to a vertex
• for directed networks: in-degree (edges pointing inward) and out-degree
(edges pointing outward)
– Betweenness Centrality
• how often a vertex lies on the shortest paths between other vertices
Vertex Specific Metrics
 Degree
– The degree of a vertex (sometimes
called Degree Centrality) is a count
of the number of unique edges that
are connected to it.
 Betweenness Centrality
– how many pairs of individuals
would have to go through you in
order to reach one another in the
minimum number of hops?
 Closeness Centrality
– How close each person is to the
other people in the network
– the inverse of the sum of the
shortest distances between the
vertex and all other vertices
reachable from it
Vertex Specific Metrics
 Eigenvector Centrality
– takes into consideration not only how
many connections a vertex has (i.e., its
degree), but also the degree of the
vertices that it is connected to
– a measure of the importance of a node
in a network
– It assigns relative scores to all nodes
in the network based on the principle
that connections to high-scoring nodes
contribute more to the score of the
node in question than equal
connections to low-scoring nodes
 PageRank
– the importance of each vertex within
the graph, computed with the link analysis
algorithm developed by Larry Page and Sergey Brin
 Clustering Coefficient
– quantifies how close a vertex and its
neighbors are to being a clique
(complete graph)
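As an illustration of the clustering coefficient, the sketch below computes it for one vertex as the fraction of pairs of its neighbors that are themselves connected. The small network and the function name are hypothetical, and the graph is assumed to be undirected.

```python
from itertools import combinations


def clustering_coefficient(adjacency, vertex):
    """Fraction of pairs of the vertex's neighbors that are directly connected."""
    neighbors = adjacency[vertex] - {vertex}
    if len(neighbors) < 2:
        return 0.0  # no pair of neighbors exists, so nothing can form a clique
    linked_pairs = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    possible_pairs = len(neighbors) * (len(neighbors) - 1) / 2
    return linked_pairs / possible_pairs


# Hypothetical undirected network: Nicole, Tim, and Mike form a triangle;
# Bob is tied to Nicole and Ann, who do not know each other.
friends = {
    "Nicole": {"Tim", "Mike", "Bob"},
    "Tim": {"Nicole", "Mike"},
    "Mike": {"Nicole", "Tim"},
    "Bob": {"Nicole", "Ann"},
    "Ann": {"Bob"},
}
coefficients = {name: clustering_coefficient(friends, name) for name in friends}
print(coefficients)
```

Tim's two neighbors know each other, so Tim sits in a clique; Bob's two neighbors do not, so his coefficient is zero.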
Let's use this network to demonstrate
Degree
 The Degree of a vertex (sometimes called Degree Centrality)
is a count of the number of edges that are connected to it. If we
were using a directed graph (unlike the Party Network),
the single Degree metric would be split into two metrics:
– (1) In-Degree, which measures the number of edges that point toward
the node of interest, and
– (2) Out-Degree, which measures the number of edges that the node of
interest points toward.
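The counts can be sketched in Python as follows, reusing the Nicole, Tim, and Mike edge list from the earlier slides. The function names are illustrative and not part of NodeXL; the undirected version counts each distinct neighbor once.

```python
from collections import Counter

# (source, destination) pairs from the earlier edge-list example.
edges = [("Nicole", "Tim"), ("Nicole", "Mike"), ("Mike", "Nicole")]


def degrees(edges):
    """Undirected view: Degree = number of distinct vertices tied to each vertex."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {vertex: len(others) for vertex, others in neighbors.items()}


def in_out_degrees(edges):
    """Directed view: count the edges pointing toward and away from each vertex."""
    in_degree, out_degree = Counter(), Counter()
    for source, destination in edges:
        out_degree[source] += 1
        in_degree[destination] += 1
    return in_degree, out_degree


degree = degrees(edges)
in_degree, out_degree = in_out_degrees(edges)
for name in degree:
    print(name, degree[name], in_degree[name], out_degree[name])
```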
Fill in the degree of each node
Betweenness Centrality
 Vertices that are included in many of the shortest paths
between other vertices have a higher Betweenness Centrality
than those that are not included.
 In a case where Betweenness Centrality is 0, if this person was
removed from the graph everyone would still be connected to
everyone else and their shortest communication paths would
not even be altered.
 High Betweenness Centrality indicates that the person acts as a
“bridge” in passing information.
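One common way to compute this metric is Brandes' algorithm, sketched below for an unweighted, undirected graph. This is an assumption about method, not a description of what NodeXL runs internally. The result is left unnormalized, so each value counts the pairs of other vertices whose shortest paths pass through the vertex, with a path counted fractionally when a pair has several shortest paths. The four-person chain is a hypothetical example.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths (sigma).
        order = []
        predecessors = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        sigma[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Walk back from the farthest vertices, accumulating path dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Every unordered pair was visited twice, once from each endpoint.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical chain: Ann - Bob - Kate - Pam.
chain = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Kate"],
    "Kate": ["Bob", "Pam"],
    "Pam": ["Kate"],
}
betweenness = betweenness_centrality(chain)
print(betweenness)
```

In the chain, Bob and Kate each bridge two pairs of people, while Ann and Pam sit at the ends and bridge none.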
Fill in betweenness centrality of each node
Closeness Centrality
 Another characteristic you may care about is how close each
person is to the other people in the network.
 If information flowed through edges in the network, some
people would be able to contact all the other people in only a
few steps, while others may require many steps.
 Closeness Centrality is based on the shortest
distances from a vertex to every other vertex.
 It is equal to 1 / (sum of the distances between that node and all
other nodes), so a higher value means the person is closer to everyone else
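The 1 / sum formula can be sketched with a breadth-first search that measures the distance from one vertex to every vertex it can reach. The chain network and the function name are hypothetical, and vertices that cannot be reached are simply left out of the sum in this sketch.

```python
from collections import deque


def closeness_centrality(adjacency, vertex):
    """Return 1 / (sum of shortest distances from vertex to all reachable vertices)."""
    distances = {vertex: 0}
    queue = deque([vertex])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    return 1 / total if total else 0.0  # an isolated vertex gets 0


# Hypothetical chain: Ann - Bob - Kate - Pam.
chain = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Kate"],
    "Kate": ["Bob", "Pam"],
    "Pam": ["Kate"],
}
closeness = {name: closeness_centrality(chain, name) for name in chain}
print(closeness)
```

Bob reaches the others in 1 + 1 + 2 = 4 steps, Ann needs 1 + 2 + 3 = 6, so Bob scores higher.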
Fill in closeness centrality of each node
Eigenvector Centrality
 In many cases, a connection to a popular individual is more
important than a connection to a loner.
 The Eigenvector Centrality metric attempts to take into
consideration not only how many connections a node has (i.e.,
its Degree), but also the Degree of the nodes that it is
connecting to.
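A standard way to obtain such scores is power iteration: start every vertex with the same score, repeatedly replace each score with its own value plus the sum of its neighbors' scores, and rescale. The sketch below assumes an undirected graph with at least one vertex; the network and the function name are hypothetical, and NodeXL's scaling of the final values may differ.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tolerance=1e-10):
    """Power iteration on (A + I); adding the vertex's own score aids convergence."""
    scores = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        updated = {
            v: scores[v] + sum(scores[w] for w in adjacency[v]) for v in adjacency
        }
        norm = math.sqrt(sum(s * s for s in updated.values()))
        updated = {v: s / norm for v, s in updated.items()}
        change = sum(abs(updated[v] - scores[v]) for v in adjacency)
        scores = updated
        if change < tolerance:
            break
    return scores


# Hypothetical network: Tim and Bob both have two connections, but Tim's
# friends (Nicole, Mike) are better connected than Bob's (Nicole, Ann).
friends = {
    "Nicole": ["Tim", "Mike", "Bob"],
    "Tim": ["Nicole", "Mike"],
    "Mike": ["Nicole", "Tim"],
    "Bob": ["Nicole", "Ann"],
    "Ann": ["Bob"],
}
eigenvector = eigenvector_centrality(friends)
for name, score in sorted(eigenvector.items(), key=lambda item: -item[1]):
    print(f"{name:<7}{score:.3f}")
```

Tim and Bob have the same Degree, yet Tim ends up with the higher score because his neighbors are themselves well connected.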
Fill in eigenvector centrality of each node
Exercise#3
 Analyze this network
Import data from Social Media
Social Network Importer
 Allows users to directly download and import different Facebook
networks
 http://socialnetimporter.codeplex.com/
 Installation Guide
– Close NodeXL
– Download the zip file from http://socialnetimporter.codeplex.com/
– Unzip the file: you will find two items, FacebookAPI.DLL and
SocialNetImporter.DLL
– Copy these files to the NodeXL Plug-ins Directory specified in "Import
Options..." (see "Using third-party graph data importers in NodeXL Excel
Template 2014")
– Restart NodeXL: you should see the Facebook Import option in the
NodeXL>Data>Import menu.
Exercise: Facebook
 Download data from your
Facebook account
Edge List
Visualizing Social Network
Exercise#4
 Analyze Facebook Network
NODEXL: CLUSTERING
Clustering
 NodeXL can automatically identify clusters based on the
network structure.
 An algorithm will look for groups of densely clustered vertices
that are only loosely connected to vertices in another cluster.
 The number of clusters is not predetermined; instead the
algorithm dynamically determines the number it thinks is best.
Clustering Results
Visualizing Clusters
Exercise#5
 Analyze Amazon Purchase History
Bibliography
 Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing social
media networks with NodeXL: Insights from a connected world. Morgan
Kaufmann.
 Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook,
Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media.
 Xu, G., Zhang, Y., & Li, L. (2010). Web mining and social networking:
Techniques and applications (Vol. 6). Springer.
 Zafarani, R., Abbasi, M. A., & Liu, H. (2014). Social Media Mining: An
Introduction. Cambridge University Press.