* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Resolution Limit in Community Detection
Survey
Document related concepts
Transcript
Yilin Shen 02/18/2009 1 Definition: Given a network (Graph G=(V,E)), A COMMUNITY is a subgraph of a network whose nodes are more tightly connected with each other than with nodes outside the subgraph. Applications: ◦ ◦ ◦ ◦ Social Networks Biochemical networks Internet Food webs 2 Definition: A quantitative measure to essentially compare the number of links inside a given module with the expected value for a randomized graph of the same size and degree sequence. Objective: Maximize the Modularity: The number of links inside a given module The expected value for a randomized graph of the same size and degree sequence 3 2 d ls d s 1 Q ls L s 1 4 L s 1 L 2 L m 2 s m ls : # links inside module s L : # links in the network ds : The total degree of the nodes in module s d s2 : Expected # of links in module s 2L 4 ds = 2L Probability that a stub, randomly selected, ends in module s 5 ds 2L ds ds = 2L 2L ds 2L Probability that the link is internal to module s ds ds d s2 L = 2L 2L 4L Expected number of links in module s 6 ls , a l out s als , L Subgraph S is a module 2 ls d s 0 L L Since ds 2ls lsout 2ls als 2 a ls ls a 2 ls 4L 0 ls 2 L L a 2 2 7 ls , a lsout als , L Consider “weak” definition for a community a 2 lsout 2ls d sout d sin L ls , a 2 4 L 4L Since a 2 L, 2 4 a 2 4L L Therefore for each ls , ls 2 4 a 2 holds. 8 A network made of m identical complete graphs (or ‘cliques’) (actually the m connected components are not necessarily cliques), disjoint from each other. l 2l 2 1 Q m 1 m L 2 L which converges to 1 when the number of cliques goes to infinity. 9 A connected network with N nodes and L links which maximizes modularity. ls 2ls 2 2 Q s 1 L 2 L where m m l s 1 s Lm 10 For fixed m, we easily know that Q reaches maximum when ls l L / m 1 m 1 QM m, L 1 L m For variable m, dQM m, L dm 1 1 2 * * 2 m L QM m , L 1 L m L The corresponding number of links in each module is l L 1 . 11 The crucial point here is that modularity seems to have some intrinsic scale of order L , which constrains the number and the size of the modules. For a given total number of nodes and links we could build many more than L modules, but the corresponding network would be less “modular”, namely with a value of the modularity lower than the maximum 12 Since M1 and M2 are constructed modules, we have a1 b1 2, a2 b2 2, l1 , l2 L / 4 13 • • Let’s consider the following case QA : M1 and M2 are separate modules QB : M1 and M2 is a single module Q QB QA 2 La1l1 a1 b1 2 a2 b2 2 l1l2 2L2 Since both M1 and M2 are modules by construction, we need Q QB QA 0 That is, 2La1 l2 a1 b1 2 a2 b2 2 14 Now let’s see how it contradicts the constructed modules M1 and M2 We consider the following two scenarios: ( l1 l2 l ) • • The two modules have a perfect balance between internal and external degree (a1+b1=2, a2+b2=2), so they are on the edge between being or not being communities, in the weak sense. The two modules have the smallest possible external degree, which means that there is a single link connecting them to the rest of the network and only one link connecting each other (a1=a2=b1=b2=1/l). 15 When a1 a2 2 and b1 0, b2 0 , the right side of l2 2 La1 a1 b1 2 a2 b2 2 can reach the maximum value lRmax L / 4 In this case, l lRmax L / 4 may happen. 16 a1=a2=b1=b2=1/l l l min R L 2 17 2 1 Qsin gle 1 m m 1 2 n 1 2 Qpairs 1 m m 1 2 n Qsin gle Qpairs m m 1 2 n 18 For example, p=5, m=20 The maximal modularity of the network corresponds to the partition in which the two smaller cliques are merged 19 Any two interconnected modules, fuzzy or not, are merged if the number of links inside each of them min does not exceed lR . If modularity optimization finds a module S with lS internal links, it may be that the latter is a combination of two or more smaller communities. lS 2lRmin 2L The upper limit of lS can be much larger than 2L , if the substructures are on average more interconnected with each other. 20 21 Modularity is actually not consistent with its optimization which may favor network partitions with groups of modules combined into larger communities. The resolution limit of modularity does not rely on particular network structures, but only on the comparison between the sizes of interconnected communities and that of the whole network, where the sizes are measured by the number of links. An increase of the number of modules does not necessarily correspond to an increase in modularity because the modules would be smaller and so would be each term of the sum. Quality functions are still helpful, but their role should be probably limited to the comparison of partitions with the same number of modules. 22