Download 2. dynamic social network analysis

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Hierarchal Structure of Community and Link Analysis
Seema Mishra
G C Nandi
Indian Institute of Information
Technology, Allahabad
Robotics and AI Lab
[email protected]
ABSTRACT
Discovering the hierarchy of organizational structure in a
dynamic social network can unveil significant patterns which
can help in network analysis. In this paper, we formulated a
methodology to establish the most influential person in a
temporal communication network from the perspective of
frequency of interactions which works on hierarchal structure.
With the help of frequency of interactions, we have calculated
the individual score of each person from Page Rank
algorithm. Subsequently, a graph is generated that showed the
influence of each individual in the network. Rigorous
experiments we performed using Enron data set to establish a
fact that our proposed methodology correctly identifies the
influential persons over the temporal network. We have used
Enron Company’s email data set that describes how
employees of company interacted with each other. We could
analyze from our methodology and verify from the facts in the
Company’s dataset since after bankruptcy, the result of
interactions and behaviors of the individual of the network are
absolutely known. Our result shows that the proposed
methodology is generic and can be applied to other data sets
of communication to identify influential at particular
instances.
General Terms
Data Mining, Link Mining, Link Analysis, Social Network
Analysis, Dynamic Social Network Analysis.
Keywords
Dynamic social network analysis, social network analysis,
hierarchical structure.
.
1. INTRODUCTION
A network structure is the perfect epitome provides a formal
way of representing data that emphasizes the association
between entities. This representation has a substantial
importance gives the insight of knowledge into the data. Since
for the work to be done many systems these days are
interconnected and behaviors of individuals reflect the
function of whole system at large extent. Networks are
primarily studied in mathematical framework i.e. graph [1].
In modern era, social network analysis is proliferated area of
research, has been in existence for quite some time and
experiencing a surge in popularity to understand the behavior
of the users at individual and group level [Wasserman &
Faust, 1994, Wellman, 1996]. Understanding the behavior of
Indian Institute of Information
Technology, Allahabad
Robotics and AI Lab
[email protected]
individual social networking methods assuage the analysts to
revealing hidden patterns from social communication. In order
to model the social network mathematically, most popular
data structure typically known as graphs are used where the
nodes depict the individual or group of person, or event or
organization
etc
and
each
link/edge
represents
connection/relationship between two individual. Social
network analysis attempts to understand the network and its
components like nodes (social entities commonly known as
actor or event) and connections (inter-connection, ties, and
links). It has main focus of analyzing individuals and their
interdependent relationships among them rather than
individuals and their attributes as we deal in conventional data
structure.
The target of this paper is to contribute:
a)
Proposing a hierarchical structure of social network
changing with time
1.1 TYPES OF SOCIAL NETWORK
Sociometric: It involves the entire population and focus on
global structure pattern of social network.
Egocentric: It focus individual interaction pattern for
analyzing social network. Limitation of this kind of analysis is
that it is very difficult to collect case by case data.
2. DYNAMIC SOCIAL NETWORK
ANALYSIS
Versatile power of social network is being applied to mining
pattern of social interaction in wide ranging applications
including: disease modeling [4] information transmission and
behavior analysis [2, 3, 16] and business management and
behavior analysis. Network analysis came in to picture as its
practical applications in intelligence and surveillance [5] and
has become popularized paradigm to uncover anti-social
network such like criminal, terrorist and fraud network
majorly after the tragically event of September 11th, 2001
which has shattered the whole world.
Social interaction could be in any form that depends on the
type of data available [6]. It might be verbal or written
communication (cell phones, emails, and blogs chatting),
scientific collaboration (co-authorship network, citation
network), browed websites, and group of animals. This
mathematical network model is very successful in analysis of
1
social network but major drawback is that it may miss the
temporal aspect of interaction because social interaction is
inherently dynamic in nature. The static model of interaction
can give the information that could be inaccurate and decision
made based merely on this contributed information might lead
analyst to faulty direction.
Several shortcomings can be highlighted when dealing with
static model of social network that could forbid
acknowledging the casual relationship of pattern of social
interaction [6]:
.

What is the rate of spreading diseases while modeling
diseases and who is the central person whom should be
vaccinated to control spread of it among group of person.

What are the causes and consequences of social structure
evolution?
Dynamic social network analysis is emerging research area
play a crucial role to fill gap between traditional social
network analysis and time domain. Dynamic study of network
includes classical network analysis, link analysis, and multiagent systems.
Dynamic network analysis facilitates the analysis of multiple
types of nodes (multi-node) and multiple types of links (multiplex) simultaneously. On the contrary, static network analysis
can only focus at most two mode data and analysis one type of
link at a time. There are several characteristics of dynamic
network:


Nature of nodes are dynamic, there properties changes
with respect to time.
Deals with meta-network.

Network evolution is consequent of agent-based
modeling.
Furthermore the network analysis exists in four levels [15]:
Attribute oriented analysis, Position oriented analysis, and
Structure oriented analysis, Dynamic network analysis. Fig1
shows the categorization. The attribute analysis captures the
properties of vertices and edges and finds the causal relation
with the structure of network.
Dynamic Social
Network Analysis
Structure
oriented
analysis
( t1 , t 2 ...t n )
Position
oriented
analysis
Fig 1: Level of network analysis
Position oriented analysis aims to investigate the mico level
stage of network. It looks into every single entity i.e. node or
organization and their characteristic in the network. Structure
analysis is macro level analysis of network and investigates
the average metrics of social network. Dynamic analysis can
utilizes the all measures of aforementioned analysis in order
to identify progressive behavior of network and individual.
2.1 ROLE OF DATA MINING IN
SOCIAL NETWORK
Social computing can make use of data mining techniques in
following analysis




Community Detection
Classification
Link Prediction
Network Modeling
2.2 COMMUNITY IN SOCIAL
NETWORK
A community is sub part of whole network between which
inter-community interaction is relatively frequent and strong
than intra-community interaction. It can be in any form for
example group, subgroup, and cluster. It may: a) citation
network represents related papers on single topics, b) web
pages on related topics.
Community detection is a classical problem in social network
analysis. Commonly, it can be the problem of identifying subgraphs of original graph and called vertex sparsifier [12].
These small networks uphold the relevant information of
original group. Four levels of analyses are being conducted in
community identification as shown in fig 1:
2.3 ANALYSIS OF PREVIOUS
RESEARCH
Hierarchical methods for community detection falls into two
categories: agglomerative and divisive. In former case each
node is assumed to be a community and repetitively group
together. Similarly, in later case initially whole network is
considered a community and divided subsequently into
smaller one. Most methods are graph clustering and partition.
Distance based structural equivalence [7] uses distance
metrics to identify similar entities. In graph partition methods
several algorithms has been proposed [8, 9]. Newman- Girvan
method and spectral clustering methods [10, 11] uses a notion
of modularity and utilizes edge between-ness metric to divide
into groups.
In analyzing dynamic pattern, many methods use the temporal
snapshots of interaction over the times [13, 14].
Attribute oriented analysis
2
3. DISCOVERING HEIRARCHY OF
GROUP
Before analyzing community hierarchy we define several
basic terminologies. Hierarchy of community provides the
power of each individual in a group. If somehow we know
this chain we can find the leader of group. Regarding this we
used well known algorithm PageRank which calculate the
individual score I_score of each person to represent
importancy of person. Higher the score of person more
powerful the person is.
Definition: If
P  { p1 , p2 ... pn }
be the collection of
persons involved in communication. For any members
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
float sum2 = 0
float e = 0
do loop:
del_prev=del
iter++
fort i = 1 to 150
R[1][i] = ((1 - d) / N) + d*fun(i)
for j = 0 to 150
sum2 += R[1][j]
sum1 += R[0][j]
e = sum1 - sum2
for j = 1 to 150
R[1][j] = R[1][j] + e*L[j]/sum_all
For j = 1 to 150
pi and
p j if I_score ( pi )  I_score( p j ) then pi can be the leader
of
pj .
3.1 Proposed Model
Figure 2 shows the architecture of system proposed .
Extracting
Names of
Distinct user
Enron Email
data
Pre processing
Socio matrix
January
February
March ...
De cember
Socio matrix for the twelve months
Graph of twelve months as shown in fig 3
Graph over the twelve
Fig 2: Model for extracting graph over months
First the Individual score of each member calculated with the
help of Page Rank algorithm described following.
Input: Social Network G.
Output: Individual score of each person
Steps:
1.
2.
3.
4.
5.
d = 0.85
epi = 0.01
del = 0,delta[150]
del_prev=0
float sum1 = 0
21.
22.
23.
24.
25
26.
27.
sum2 += R[1][j] - R[0][j]
del = sum2
for j = 1to 150
R[0][j] = R[1][j]
while(epi > del && del_prev!=del)
for i=0 to150
R[0][i]=10000.0*R[0][i]
In I_score (), d is damping factor generally assumed to be
around 0.85. R is vector stores the individual score of each
person. N denotes the total number f persons in the network.
Sum_all defines the sum of the weight of all edges. Fun(
3
defines
W ( p
p j M ( pi )
ij
).
Pk ( j )
L( p j )
weights of all edges linked with
edges linking
p i and
where
L( p j ) sum
of
p j and W ( pij ) weights of
pj.
4. PRACTICAL IMPLEMENTATION
AND ANALYSIS
In this section, we evaluate the capability of the proposed
approach on discovering organizational structure and to
exploring evolution of organizational structure in a dynamic
social network and link between individuals. We performed
the experiments on Enron email data set. Email
communication data has become a practical source for
research in network analysis like social network. Mostly the
experiments are carried out on the artificial data due to the
non-availability of real life communication data. The Enron
email data set [27] has become a benchmark for this sort of
research domain in network analysis. This data set was made
public and posted on web by the Federal Energy Regulatory
Commission during its investigation for fraud happened in
company, in order to make it test bed for validating and
testing the efficacy of methodologies developed for counterterrorism, fraud detection and link analysis. Data is about 150
users communication mostly senior managers organized into
folder. But this set has still lots of issues like integrity issue
and duplicate messages issue. For preprocessing, first the
names of distinct users were extracted and duplicated ids were
neglected.
The proposed approach is implemented in DEV C++. The
experiments were conducted on a 2.1GHz PC with
Core(TM)2 Dual- Core Pentium 4 processor with 2 GB RAM.
On examining the results in figure 3, we analyzed the number
of grouping in the month of May was maximum. Figure 2
shows that Jim was the person who headed the group from Jan
to Feb. Monique was leader throughout the months form Jan
to April.
Fig 3: Graph over the months columns
4
[8] C. Chekuri, A. Goldberg, D. Karger, M. Levin and C.
Stein. Experimental study of minimum cut algorithms. In
Proc. 8th SAIM Syposium on Discreet Algorithm, P324333, 1997.
[9] Andrew Y. Wu, et al., Mining scale-free networks using
geodesic clustering, In Proc. KDD’04, pages 719-724,
2004.
[10] M.E.J Newman and M. Girvan - Finding and evaluating
community structure in networks, Phys. Rev. E 69,
026113 (2004).
[11] M.E.J Newman - Modularity and community structure in
networks, PNAS June 6, 2006 vol. 103 no. 23 85778582.
[12] A.
Moitra.
Approximation
algorithms
for
multicommoditytype
problems
with
guarantees
independent of the graph size.FOCS, pages 3–12, 2009.
[13] Ding Zhou, Isaac Councill, Hongyuan Zha, C. Lee Giles.
Discovering Temporal Communities from Social
Network Documents. In Proc. of ICDM’07, pages 745
750, 2007.
[14] C.
5. CONCLUSION
In this paper we introduced a concept of hierarchy of positions
in group by taking temporal interaction data of twelve months
in organization that shows how the position and link of group
members changes when people joining and leaving the group.
This hierarchal group interaction is significant to facilitate
link analysis of individuals with the time period In the future
we are planning to improve the integrity issues of
preprocessed Enron data results of experiments.
Tantipathananandh, Tanya Berger-Wolf, David
Kempe. A Framework for Community Identification in
Dynamic Social Networks. In Proc. of KDD’07, pages
717 - 726, 2007.
[15] Mueller, C., Meuthrath, B., Baumgraß, A.: Analyzing
Wiki-based networks to improve knowledge processes in
organizations. Accepted contribution in J.UCS Journal of
Universal Computer Science, 2008.
[16] J. P. Scott. Social Network Analysis: A Handbook.
SAGE Publications, January 2000.
6. REFERENCES
[1] Hanneman, Robert A. and Mark Riddle. Introduction to
social network methods. Riverside, CA: University of
California
(published
in
digital
form
http://faculty.ucr.edu/~hanneman/, 2005.
[2] J. Baumes, M. Goldberg, M. Magdon-Ismail, and W.
Wallace. Discovering hidden Inform., 2004
[3] J. Tyler, D. Wilkinson, and B. Huberman. Email
asspectroscopy: Automated discovery of community
structure within organizations. Proc. 1st Intl. Conf. on
Comm. And Tech., 2003.
[4] M.
Kretzschmar and M. Morris. Measures of
concurrency innetworks and the spread of infectious
disease. Math. Biosci.133:165–195, 1996.
[5] J. Baumes, M. Goldberg, M. Magdon-Ismail, and W.
Wallace. Discovering hidden groups in communication
networks. Proc. 2nd NSF/NIJ Symp. on Intel. and
Security Inform., 2004.
[6] T. Y. Berger-Wolf and J. Saia. A framework for analysis
of dynamic social networks. In Proc. KDD’06, 523–528,
2006.
[7] Santo Fortunato and Claudio Castellano - Community
Structure in Graphs,chapter of Springer’s Encyclopedia
of Complexity and System Science (2008).
5