Resolution Limit in Community Detection

Yilin Shen
02/18/2009

Definition:
Given a network (graph G = (V, E)), a COMMUNITY is a subgraph of the network whose nodes are more tightly connected with each other than with nodes outside the subgraph.

Applications:
◦ Social networks
◦ Biochemical networks
◦ Internet
◦ Food webs

Definition:
Modularity is a quantitative measure that essentially compares the number of links inside a given module with the expected value for a randomized graph of the same size and degree sequence.

Objective:
Maximize the modularity, i.e., the sum over all modules of the difference between
◦ the number of links inside a given module, and
◦ the expected value for a randomized graph of the same size and degree sequence.

$$Q = \sum_{s=1}^{m}\left[\frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2\right] = \frac{1}{L}\sum_{s=1}^{m}\left(l_s - \frac{d_s^2}{4L}\right)$$

$m$: # modules
$l_s$: # links inside module s
$L$: # links in the network
$d_s$: the total degree of the nodes in module s
$\frac{d_s^2}{4L}$: expected # of links in module s
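To make the formula concrete, here is a minimal Python sketch that evaluates Q from an edge list. It assumes an undirected, unweighted graph without self-loops or repeated links, with the module assignment given as a dictionary; `modularity`, `edges` and `membership` are illustrative names, not part of any library.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Return Q = sum_s [l_s / L - (d_s / (2L))^2].

    edges: list of (u, v) pairs of an undirected, unweighted graph
           (no self-loops or repeated links assumed).
    membership: dict mapping each node to its module label.
    """
    L = len(edges)
    internal = defaultdict(int)  # l_s: links with both endpoints in module s
    degree = defaultdict(int)    # d_s: total degree of the nodes in module s
    for u, v in edges:
        su, sv = membership[u], membership[v]
        degree[su] += 1
        degree[sv] += 1
        if su == sv:
            internal[su] += 1
    return sum(internal[s] / L - (degree[s] / (2 * L)) ** 2 for s in degree)


# Two triangles joined by a single link: L = 7, and l_s = 3, d_s = 7 for both modules.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, membership))  # 2 * (3/7 - 1/4) = 5/14 ≈ 0.357
```

Computing $l_s$ and $d_s$ in a single pass over the links keeps the cost linear in L.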

$\frac{d_s}{2L}$ = probability that a randomly selected stub ends in module s

$\frac{d_s}{2L}\cdot\frac{d_s}{2L} = \left(\frac{d_s}{2L}\right)^2$ = probability that a link is internal to module s (both of its stubs end in s)

$L\left(\frac{d_s}{2L}\right)^2 = \frac{d_s^2}{4L}$ = expected number of links in module s
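The argument treats the two ends of a link as independent draws. A quick Monte Carlo check is sketched below in Python, assuming a configuration-model-style rewiring in which the 2L stubs are paired uniformly at random; `simulated_internal_links` is a hypothetical helper and the sizes are arbitrary examples. Exact stub matching gives $d_s(d_s-1)/(2(2L-1))$ internal links on average, which approaches $d_s^2/(4L)$ when L is large.

```python
import random


def simulated_internal_links(d_s, L, trials=1000, seed=0):
    """Average number of links with both stubs in module s under random stub matching.

    The network has 2L stubs, d_s of which belong to module s; in each trial
    the stubs are shuffled and consecutive pairs form the L links.
    """
    rng = random.Random(seed)
    stubs = [1] * d_s + [0] * (2 * L - d_s)  # 1 marks a stub of module s
    total = 0
    for _ in range(trials):
        rng.shuffle(stubs)
        # pairs (0, 1), (2, 3), ... are the L links; count those inside s
        total += sum(stubs[i] & stubs[i + 1] for i in range(0, 2 * L, 2))
    return total / trials


d_s, L = 200, 1000
print(simulated_internal_links(d_s, L))  # simulated average
print(d_s ** 2 / (4 * L))                # d_s^2 / (4L) = 10.0
```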

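Let $l_s$ be the number of links inside a subgraph S, $l_s^{out} = a\,l_s$ the number of links joining S to the rest of the network, and $L$ the number of links in the network.

Subgraph S is a module if its contribution to Q is positive:
$$\frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2 > 0$$

Since $d_s = 2l_s + l_s^{out} = 2l_s + a\,l_s = (a+2)\,l_s$,
$$\frac{l_s}{L} - \left(\frac{(a+2)\,l_s}{2L}\right)^2 > 0 \iff l_s < \frac{4L}{(a+2)^2}$$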
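The equivalence can be checked numerically with the sketch below (Python; `module_term` and `module_bound` are hypothetical helpers, and L = 1000 is an arbitrary example size): the term of S in Q is positive just below $4L/(a+2)^2$ and negative just above it.

```python
def module_term(l_s, a, L):
    """Contribution of S to Q when l_s^out = a * l_s, so d_s = (a + 2) * l_s."""
    d_s = (a + 2) * l_s
    return l_s / L - (d_s / (2 * L)) ** 2


def module_bound(a, L):
    """S is a module (positive term) exactly when l_s < 4L / (a + 2)^2."""
    return 4 * L / (a + 2) ** 2


L = 1000
for a in (0.5, 1.0, 2.0):
    bound = module_bound(a, L)
    below = module_term(0.99 * bound, a, L) > 0  # positive just below the bound
    above = module_term(1.01 * bound, a, L) < 0  # negative just above it
    print(f"a={a}: bound={bound:.1f}, below>0: {below}, above<0: {above}")
```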


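With $l_s$, $l_s^{out} = a\,l_s$ and $L$ as above, consider the "weak" definition of a community:
$$a \le 2 \iff l_s^{out} \le 2l_s \iff d_s^{out} \le d_s^{in}$$

Claim: if $l_s < L/4$ and $a \le 2$, then S is a module.

Since $a \le 2$,
$$\frac{4L}{(a+2)^2} \ge \frac{4L}{16} = \frac{L}{4}$$

Therefore, for each $l_s < L/4$, $l_s < \frac{4L}{(a+2)^2}$ holds.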
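A short numerical check of this argument, again in Python with a hypothetical `module_bound` helper and an arbitrary L: on a grid of weak-community values $0 \le a \le 2$, the bound $4L/(a+2)^2$ never drops below $L/4$.

```python
def module_bound(a, L):
    """Upper limit on l_s for S to be a module: 4L / (a + 2)^2."""
    return 4 * L / (a + 2) ** 2


L = 10_000
grid = [i / 100 for i in range(201)]  # a = 0.00, 0.01, ..., 2.00

# The bound decreases with a, so its smallest value on the grid is at a = 2.
smallest = min(module_bound(a, L) for a in grid)
print(smallest, L / 4)  # 2500.0 2500.0

# Hence every l_s < L/4 satisfies l_s < 4L / (a + 2)^2 for all a on the grid.
l_s = L / 4 - 1
print(all(l_s < module_bound(a, L) for a in grid))  # True
```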

A network made of m identical complete graphs (or "cliques"), disjoint from each other (in fact, the m connected components need not be cliques). Each component has l links, so $L = ml$ and $d_s = 2l$:
$$Q = m\left[\frac{l}{L} - \left(\frac{2l}{2L}\right)^2\right] = 1 - \frac{1}{m}$$
which converges to 1 as the number of cliques goes to infinity.
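The value $1 - 1/m$ can be reproduced by building the disjoint cliques explicitly. This Python sketch assumes each clique is a complete graph on a hypothetical k = 4 nodes (any k ≥ 2 gives the same Q) and repeats a small `modularity` helper so that the block runs on its own.

```python
from itertools import combinations


def modularity(edges, membership):
    """Q = sum_s [l_s / L - (d_s / (2L))^2] for an undirected, unweighted edge list."""
    L = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        su, sv = membership[u], membership[v]
        degree[su] = degree.get(su, 0) + 1
        degree[sv] = degree.get(sv, 0) + 1
        if su == sv:
            internal[su] = internal.get(su, 0) + 1
    return sum(internal.get(s, 0) / L - (degree[s] / (2 * L)) ** 2 for s in degree)


def disjoint_cliques(m, k):
    """m disjoint complete graphs on k nodes, each one its own module."""
    edges, membership = [], {}
    for c in range(m):
        nodes = range(c * k, (c + 1) * k)
        membership.update({v: c for v in nodes})
        edges.extend(combinations(nodes, 2))
    return edges, membership


for m in (2, 5, 10, 100):
    edges, membership = disjoint_cliques(m, k=4)
    print(m, round(modularity(edges, membership), 6), 1 - 1 / m)
```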
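
A connected network with N nodes and L links that maximizes modularity. Each of its m modules is joined to the rest of the network by two links (m inter-module links in total), so $d_s = 2l_s + 2$:
$$Q = \sum_{s=1}^{m}\left[\frac{l_s}{L} - \left(\frac{2l_s + 2}{2L}\right)^2\right], \qquad \text{where } \sum_{s=1}^{m} l_s = L - m$$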
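
For fixed m, Q is maximal when all modules have the same number of links, $l_s = l = L/m - 1$ (for a fixed total, $\sum_s (l_s + 1)^2$ is smallest when all terms are equal), which gives
$$Q_M(m, L) = 1 - \frac{m}{L} - \frac{1}{m}$$

For variable m,
$$\frac{dQ_M(m, L)}{dm} = -\frac{1}{L} + \frac{1}{m^2} = 0 \implies m^* = \sqrt{L}, \qquad Q_M(m^*, L) = 1 - \frac{2}{\sqrt{L}}$$

The corresponding number of links in each module is $l = \sqrt{L} - 1$.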
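A quick scan over integer m, sketched in Python with hypothetical helpers `q_max` and `best_m`, confirms that $Q_M(m, L)$ peaks at $m^* = \sqrt{L}$; the example sizes are perfect squares so that $m^*$ is an integer.

```python
import math


def q_max(m, L):
    """Q_M(m, L) = 1 - m/L - 1/m, the best modularity with m equal modules."""
    return 1 - m / L - 1 / m


def best_m(L):
    """Integer number of modules m in 1..L that maximizes Q_M(m, L)."""
    return max(range(1, L + 1), key=lambda m: q_max(m, L))


for L in (100, 400, 10_000):  # perfect squares, so m* = sqrt(L) is an integer
    m = best_m(L)
    print(L, m, math.isqrt(L), q_max(m, L), 1 - 2 / math.sqrt(L))
```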

The crucial point here is that modularity seems to have an intrinsic scale of order $\sqrt{L}$, which constrains the number and the size of the modules. For a given total number of nodes and links we could build many more than $\sqrt{L}$ modules, but the corresponding network would be less "modular", namely with a modularity lower than the maximum.
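
Consider two modules M1 and M2 with $l_1$ and $l_2$ internal links. The $a_1 l_1 = a_2 l_2$ links join M1 and M2, while $b_1 l_1$ and $b_2 l_2$ links join M1 and M2, respectively, to the rest of the network. Since M1 and M2 are modules by construction, we have
$$a_1 + b_1 \le 2, \qquad a_2 + b_2 \le 2, \qquad l_1, l_2 < L/4$$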

Let's consider the following two partitions:
• $Q_A$: M1 and M2 are separate modules
• $Q_B$: M1 and M2 are merged into a single module

$$\Delta Q = Q_B - Q_A = \frac{2La_1l_1 - (a_1+b_1+2)(a_2+b_2+2)\,l_1l_2}{2L^2}$$

Since both M1 and M2 are modules by construction, modularity optimization should keep them apart, so we need $\Delta Q = Q_B - Q_A < 0$, that is,
$$l_2 > \frac{2La_1}{(a_1+b_1+2)(a_2+b_2+2)}$$
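The expression for ΔQ can be checked by computing $Q_A$ and $Q_B$ term by term. In this Python sketch the parameter values are arbitrary examples, $a_2$ is derived from $a_2 l_2 = a_1 l_1$, and `term`, `delta_q_direct` and `delta_q_formula` are hypothetical helpers.

```python
def term(l_in, d, L):
    """Contribution of one module to Q: l_in / L - (d / (2L))^2."""
    return l_in / L - (d / (2 * L)) ** 2


def delta_q_direct(l1, l2, a1, b1, b2, L):
    """Q_B - Q_A computed from the module terms before and after merging."""
    a2 = a1 * l1 / l2          # the same a1*l1 links between M1 and M2, seen from M2
    d1 = (a1 + b1 + 2) * l1    # total degree of M1
    d2 = (a2 + b2 + 2) * l2    # total degree of M2
    q_a = term(l1, d1, L) + term(l2, d2, L)    # M1 and M2 separate
    q_b = term(l1 + l2 + a1 * l1, d1 + d2, L)  # M1 and M2 merged
    return q_b - q_a


def delta_q_formula(l1, l2, a1, b1, b2, L):
    """Closed form: [2 L a1 l1 - (a1+b1+2)(a2+b2+2) l1 l2] / (2 L^2)."""
    a2 = a1 * l1 / l2
    return (2 * L * a1 * l1 - (a1 + b1 + 2) * (a2 + b2 + 2) * l1 * l2) / (2 * L ** 2)


params = dict(l1=30, l2=20, a1=0.2, b1=0.5, b2=0.4, L=5000)
print(delta_q_direct(**params), delta_q_formula(**params))
```

With these example values ΔQ > 0, so modularity optimization would merge M1 and M2.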

Now let's see how this condition can fail for the constructed modules M1 and M2.
We consider the following two scenarios ($l_1 = l_2 = l$):
• The two modules have a perfect balance between internal and external degree ($a_1 + b_1 = 2$, $a_2 + b_2 = 2$), so they are on the edge between being or not being communities, in the weak sense.
• The two modules have the smallest possible external degree, which means that a single link connects each of them to the rest of the network and a single link connects them to each other ($a_1 = a_2 = b_1 = b_2 = 1/l$).

When $a_1 = a_2 = 2$ and $b_1 = b_2 = 0$, the right-hand side of
$$l_2 > \frac{2La_1}{(a_1+b_1+2)(a_2+b_2+2)}$$
reaches its maximum value $l_R^{\max} = L/4$.
In this case, $l < l_R^{\max} = L/4$ may happen, and then modularity optimization merges M1 and M2.
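
When $a_1 = a_2 = b_1 = b_2 = 1/l$, the condition becomes $(2l + 2)^2 > 2L$, so the two modules are merged if
$$l < l_R^{\min} = \sqrt{L/2} - 1 \approx \sqrt{L/2}$$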
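Both scenarios can be evaluated with a small Python sketch (hypothetical helpers `merge_threshold` and `smallest_separable_l`, example size L = 10000): the first scenario's extreme case reproduces $l_R^{\max} = L/4$, and the second finds the smallest l that keeps the modules apart, which lies just above $\sqrt{L/2} - 1$.

```python
import math


def merge_threshold(a1, b1, a2, b2, L):
    """M1 and M2 stay separate only if l2 > 2 L a1 / ((a1+b1+2)(a2+b2+2))."""
    return 2 * L * a1 / ((a1 + b1 + 2) * (a2 + b2 + 2))


def smallest_separable_l(L):
    """Scenario a1 = a2 = b1 = b2 = 1/l: smallest integer l with l > threshold."""
    l = 1
    while not l > merge_threshold(1 / l, 1 / l, 1 / l, 1 / l, L):
        l += 1
    return l


L = 10_000
print(merge_threshold(2, 0, 2, 0, L), L / 4)          # 2500.0 2500.0
print(smallest_separable_l(L), math.sqrt(L / 2) - 1)  # second scenario
```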
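
Example: a ring of n cliques, each a complete graph of m nodes, in which consecutive cliques are connected by a single link. Compare the partition into single cliques with the partition into pairs of adjacent cliques:
$$Q_{single} = 1 - \frac{2}{m(m-1)+2} - \frac{1}{n}$$
$$Q_{pairs} = 1 - \frac{1}{m(m-1)+2} - \frac{2}{n}$$
$$Q_{single} > Q_{pairs} \iff m(m-1) + 2 > n$$
Hence, when $n > m(m-1) + 2$, modularity optimization merges pairs of cliques, even though each clique is clearly a community.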
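The two closed forms can be compared with a direct computation. In this Python sketch, m denotes the clique size and n the number of cliques in a ring where consecutive cliques are joined by a single link (n is assumed even so that cliques can be paired); `ring_of_cliques` and `compare` are hypothetical helpers, and `modularity` is repeated so that the block is self-contained.

```python
from itertools import combinations


def modularity(edges, membership):
    """Q = sum_s [l_s / L - (d_s / (2L))^2] for an undirected, unweighted edge list."""
    L = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        su, sv = membership[u], membership[v]
        degree[su] = degree.get(su, 0) + 1
        degree[sv] = degree.get(sv, 0) + 1
        if su == sv:
            internal[su] = internal.get(su, 0) + 1
    return sum(internal.get(s, 0) / L - (degree[s] / (2 * L)) ** 2 for s in degree)


def ring_of_cliques(n, m):
    """n cliques of m nodes (c*m .. c*m+m-1); clique c is linked to clique c+1 mod n."""
    edges = []
    for c in range(n):
        edges.extend(combinations(range(c * m, (c + 1) * m), 2))
        edges.append(((c + 1) * m - 1, ((c + 1) % n) * m))  # single link in the ring
    return edges


def compare(n, m):
    """Numerical and closed-form Q for single cliques vs. pairs of cliques (n even)."""
    edges = ring_of_cliques(n, m)
    single = {v: v // m for v in range(n * m)}       # one module per clique
    pairs = {v: v // (2 * m) for v in range(n * m)}  # adjacent cliques merged
    k = m * (m - 1) + 2
    return {
        "single": (modularity(edges, single), 1 - 2 / k - 1 / n),
        "pairs": (modularity(edges, pairs), 1 - 1 / k - 2 / n),
    }


print(compare(30, 5))  # n = 30 > m(m-1)+2 = 22: pairs score higher
print(compare(10, 5))  # n = 10 < 22: single cliques score higher
```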

For example, in a network with two cliques of m nodes and two smaller cliques of p nodes, with p = 5 and m = 20, the maximal modularity of the network corresponds to the partition in which the two smaller cliques are merged.

Any two interconnected modules, fuzzy or not, are merged if the number of links inside each of them does not exceed $l_R^{\min}$.
If modularity optimization finds a module S with $l_S$ internal links, it may be that the latter is a combination of two or more smaller communities if
$$l_S < 2l_R^{\min} \approx \sqrt{2L}$$
The upper limit of $l_S$ can be much larger than $\sqrt{2L}$ if the substructures are on average more interconnected with each other.

• Modularity optimization is not always consistent with modularity's own notion of a module: it may favor network partitions in which groups of modules are combined into larger communities.
• The resolution limit of modularity does not rely on particular network structures, but only on the comparison between the sizes of interconnected communities and that of the whole network, where the sizes are measured by the number of links.
• An increase in the number of modules does not necessarily correspond to an increase in modularity, because the modules would be smaller and so would be each term of the sum.
• Quality functions are still helpful, but their role should probably be limited to the comparison of partitions with the same number of modules.