* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Slide
Network tap wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
IEEE 802.1aq wikipedia , lookup
Computer network wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Routing in delay-tolerant networking wikipedia , lookup
네트워크의 과학, 복잡한세상을 바라보는 새로운 시각 정하웅 KAIST, 물리학과 Albert László Barabási, Réka Albert (Univ. of Notre Dame) Zoltán N. Oltvai (Northwestern Univ.) www.nd.edu/~networks Austin Powers: The spy who shagged me Let’s make it legal Robert Wagner Wild Things What Price Glory Barry Norton A Few Good Man Monsieur Verdoux What is Complexity? A popular paradigm: Simple systems display complex behavior non-linear systems chaos fractals 3 Body Problem Earth( ) Jupiter ( ) Sun ( ) Main Entry: 1com·plex Function: noun Etymology: Late Latin complexus totality, from Latin, embrace, from complecti Date: 1643 1 : a whole made up of complicated or interrelated parts Society Internet Human body : NETWORKS! chemical network Society Nodes: individuals Links: social relationship (family/work/friendship/etc.) S. Milgram (1967) John Guare “Six Degrees of Separation” Social networks: Many individuals with diverse social interactions between them. 9-11 Terror Hijacker’s Network Communication networks The Earth is developing an electronic nervous system, a network with diverse nodes and links are -computers -phone lines -routers -TV cables -satellites -EM waves Communication networks: Many non-identical components with diverse connections between them. GENOME protein-gene interactions PROTEOME protein-protein interactions METABOLISM Bio-chemical reactions Citrate Cycle Complex systems Made of many non-identical elements connected by diverse interactions. NETWORK Erdös-Rényi model (1960) Connect with probability p p=1/6 N=10 k ~ 1.5 Pál Erdös (1913-1996) Degree Distribution P(k) : prob. that a certain node will have k links Poisson distribution - Random - Democratic ARE COMPLEX NETWORKS REALLY RANDOM? To test this: We need to pragmatically investigate the topology of large real networks. World Wide Web Nodes: WWW documents Links: URL links 800 million documents (S. Lawrence, Nature,1999) ROBOT: collects all URL’s found in a document and follows them recursively R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999) What did we expect? k ~ 6 P(k=500) ~ 10-99 NWWW ~ 109 N(k=500)~10-90 We find: out= 2.45 in = 2.1 P(k=500) ~ 10-6 NWWW ~ 109 N(k=500) ~ 103 Pout(k) ~ k-out Pin(k) ~ k- in What does it mean? Poisson distribution Exponential Network Power-law distribution Scale-free Network INTERNET BACKBONE Nodes: computers, routers Links: physical lines (Faloutsos, Faloutsos and Faloutsos, 1999) SEX-Web Nodes: people (females; males) Links: sexual relationships 4781 Swedes; 18-74; 59% response rate. (Liljeros et al. Nature 2001) ACTOR CONNECTIVITIES Nodes: actors Links: cast jointly Days of Thunder (1990) Far and Away (1992) Eyes Wide Shut (1999) N = 212,250 actors k = 28.78 P(k) ~k- =2.3 SCIENCE CITATION INDEX Nodes: papers Links: citations 1736 PRL papers (1988) P(k) ~k- ( = 3) (S. Redner, 1998) Econo network Nodes: individual, company, country... Links: economic activities Pi(t) : stock price at time t , Yi ln Pi (t ) ln Pi (t 1) GENOME protein-gene interactions PROTEOME protein-protein interactions METABOLISM Bio-chemical reactions Citrate Cycle protein-protein network (yeast) Jeong et al. Nature 411, 41 (2001) p53 network (mammals) metabolic network (E. coli) Jeong et al. Nature 407, 651 (2000). Other Examples of Scale-Free Network Email network Nodes: individual email address Links: email communication Phone-call networks Nodes: phone-number Links: completed phone call (Abello et al, 1999) Networks in linguistics Nodes: words Links: appear next or one word apart from each other (Ferrer et al, 2001) Networks in Electronic auction (eBay) Nodes: agents, individuals Links: bids for the same item (H. Jeong et al, 2001) Most real world networks have the same internal structure: Scale-free networks How? Why? ORIGIN OF SCALE-FREE NETWORKS (1) The number of nodes (N) is NOT fixed. Networks continuously expand by the addition of new nodes Examples: WWW : addition of new documents Citation : publication of new papers (2) The attachment is NOT uniform. A node is linked with higher probability to a node that already has a large number of links. Examples : WWW : new documents link to well known sites (CNN, YAHOO, NewYork Times, etc) Citation : well cited papers are more likely to be cited again (1) GROWTH : Scale-Free Model At every timestep we add a new node with m edges (connected to the nodes already present in the system). (2) PREFERENTIAL ATTACHMENT : The probability Π that a new node will be connected to node i depends on the connectivity ki of that node ki ( ki ) jk j P(k) ~k-3 A.-L.Barabási, R. Albert, Science 286, 509 (1999) Continuum Theory ki ki ki ( ki ) A , with initial condition ki (ti ) m t k 2 t j j t ki (t ) m ti m 2t m 2t m 2t P(ki (t ) k ) Pt (ti 2 ) 1 Pt (ti 2 ) 1 2 k k k (m0 t ) P(ki (t ) k ) 2m 2t 1 3 P(k ) ~ k k mo t k 3 γ=3 A.-L.Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999) Most real world networks have the same internal structure: Scale-free networks How? Why? Efficiency of resource usage With same number of nodes and links (same amount of resources), construct scale-free and exponential networks. Diameter (Scale-free) < Diameter (Exponential) (Diameter : average distance between two nodes) Scale-free network is more efficient than exponential network! Robustness Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns) 1 Relative size of largest cluster S fc 0 1 Fraction of removed nodes, f node failure Robustness of scale-free networks Extreme failure tolerance Failures Topological error tolerance 1 3 : fc=1 S 0 Attacks (R. Cohen et al PRL, 2000) fc f fc 1 Low survivability under attacks! Achilles’ Heel of complex network failure attack Internet Protein network R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000) Bio-informatics vs. Networks Human Genome Project completed! Inventory of all genes Only list of proteins Post Genome Era • Transcriptomics • Proteomics Needs information on interactions “Human Network Project” GENOME protein-gene interactions PROTEOME protein-protein interactions METABOLISM Bio-chemical reactions Citrate Cycle METABOLISM Bio-chemical reactions Citrate Cycle Nodes: chemicals (proteins, Metabolic Networks substrates) Links: bio-chem. reaction Metabolic networks Archaea Bacteria Eukaryotes Organisms from all three domains of life are scale-free networks! H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 407 651 (2000) Properties of metabolic networks Average distances are independent of organisms! by making more links between nodes. based on “design principles” of the cell through evolution. cf. Other scale-free network: D~log(N) GENOME protein-gene interactions PROTEOME protein-protein interactions METABOLISM Bio-chemical reactions Citrate Cycle PROTEOME protein-protein interactions Yeast protein network Nodes: proteins Links: physical interactions (binding) P. Uetz, et al. Nature 403, 623-7 (2000). Topology of the protein network k k0 P(k ) ~ (k k0 ) exp( ) k Power-law with exponential cut-off : (physical limitation) H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature, 411, 41 (2001) [3280 protein with 4434 interactions] P(k) ~ k- Uetz 2.4 Ito 2.3 www.nd.edu/~networks/cell While there is only 13% overlap between the Uetz et al and Ito et al data, their large-scale topology is identical. Ito et al, PNAS 97, 1143 (2000); PNAS 98, 4569 (2001). Yeast protein network - lethality and topological position - Highly connected proteins are more essential (lethal) than less connected proteins. GENOME protein-gene interactions PROTEOME protein-protein interactions METABOLISM Bio-chemical reactions Citrate Cycle protein-gene interactions Nature 408 307 (2000) “… since 1989 … there have been over 17,000 publications centered on p53 … this work has led to considerable confusion and controversy.” … “One way to understand the p53 network is to compare it to the Internet. The cell, like the Internet, appears to be a ‘scale-free network’.” p53 network (mammals) Complexity Network Science collaboration WWW Food Web Scale-free network Citation pattern Internet Cell UNCOVERING ORDER HIDDEN WITHIN COMPLEX SYSTEMS Conclusions • Scale-free networks are ubiquitous • Effective use of resources • Robust against errors and random failures • Vulnerable under attack • New way of understanding complexity • Applicable to many areas Outlook • Directed Networks • Weighted Networks • Dynamics on networks Austin Powers: The spy who shagged me Let’s make it legal Robert Wagner Wild Things What Price Glory Barry Norton A Few Good Man Monsieur Verdoux Bonus: Why Kevin Bacon? Kevin Bacon No. of movies : 46 No. of actors : 1811 Average separation: 2.79 Measure the average distance between Kevin Bacon and all other actors. Is Kevin Bacon the most connected actor? NO! Rod Steiger Donald Pleasence Martin Sheen Christopher Lee Robert Mitchum Charlton Heston Eddie Albert Robert Vaughn Donald Sutherland John Gielgud Anthony Quinn James Earl Jones Average distance 2.537527 2.542376 2.551210 2.552497 2.557181 2.566284 2.567036 2.570193 2.577880 2.578980 2.579750 2.584440 # of movies 112 180 136 201 136 104 112 126 107 122 146 112 # of links 2562 2874 3501 2993 2905 2552 3333 2761 2865 2942 2978 3787 Kevin Bacon Kevin Bacon 2.786981 2.786981 46 46 1811 1811 Rank Name 1 2 3 4 5 6 7 8 9 10 11 12 … 876 876 … #1 Rod Steiger #876 Kevin Bacon Donald #2 Pleasence #3 Martin Sheen References • R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999). • R. Albert, A.L. Barabasi, Science 286 509 (1999). • A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999) • R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000). • H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000). • H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001). • J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics (2001). • S.H. Yook, H. Jeong, A.-L. Barabasi, PNAS (2002) URL: http://stat.kaist.ac.kr Email: Hawoong Jeong [email protected] Paul Erdös has a Bacon number of 4! Paul Erdös was in N is a number (1993) with Gene Patterson Gene Patterson was in Box of Moonlight (1996) with John Turturro John Turturro was in Gung Ho (1986) with Clint Howard Clint Howard was in My Dog Skip (2000) with Kevin Bacon Information Tech. vs. Networks Relative Coverage of Search Engines (83% commercial content excluded Feb. 1999) NON-EQUAL ACCESS OF INFORMATION Based on network structure, such as number of incoming links & popularity, find better information! Designing better Internet! Spatial Distribution of Routers Fractal set Box counting: N(l) No. of boxes of size l that contain routers N(l) ~ l -Df Df=1.5 인터넷의 미래 The Grid Distributed, high performance(parallel) computing and data handling infrastructure NT ET 대형연구 장비 BT 연구망 (Gbps) Supercomputer 첨단대형 연구장비 원격제어 및 감시 가상실험 시스템 N*Grid 선박 CT 첨단연구장비 학제간공동 연구 기계 고성능 클러스터 대규모 데이터 (BT, NT, etc.) 반도체 데이터 그리드와 컴퓨터 그리드를 이용한 대규모 분산컴퓨팅 가능 ST 자동차 기상 (PC 등 컴퓨터의 유휴자원을 이용, 슈퍼컴퓨터의 수천-수만배 이상의 계산능력을 가짐.) 분산컴퓨터의 미래 분산컴퓨터 (Distributed Computer) 사용이 허가되어진 컴퓨터들에게 커다란 작업을 분할하여 계산 후 결과를 수집, 분석하는 방법. 단점: 컴퓨터 소유주가 자원사용을 허용해야함 (참여율 저조) 좀더 적극적인 접근 방법 스텔스-컴퓨터 (Stealth Computer) 분산 컴퓨터가 소유자의 허가를 받아야하는 것과 달리 소유자의 허가와 상관없이 네트워크에 연결된 모든 컴퓨터에 기본적으로 제공되는 표준 통신프로토콜을 이용, 모든 컴퓨터에 분산컴퓨팅 기법을 적용하는 방법. 문제점: 인터넷에 연결된 기본자원의 소유개념 등 여러가지 새로운 윤리적인 문제 발생. HTTP 프로토콜과 TCP/IP checksum을 이용한 스텔스 컴퓨팅의 한 예. Can we stop AIDS? Because of scale-free structure of the network it is hard to stop AIDS with standard methods. (There is no threshold.) By treating hubs (or at least, hub-like nodes) with higher probability, we can restore finite threshold. Food Web Nodes: trophic species Links: trophic interactions R.J. Williams, N.D. Martinez Nature (2000) SCIENCE COAUTHORSHIP Nodes: scientist (authors) Links: write paper together (Newman, 2000, H. Jeong et al 2001) Integrated drug target identification based on genomic data Cy5 Cy3 Taxonomy using networks Based on Topology of metabolic networks A: Archaea B: Bacteria E: Eukaryotes Taxonomy J. Podani et al, Nature Genetics (2001) Can we stop AIDS? Because of scale-free structure of the network it is hard to stop AIDS with standard methods. (There is no threshold.) By treating hubs (or at least, hub-like nodes) with higher probability, we can restore finite threshold. Modeling the Internet’s large scale topology S.H. Yook, H. Jeong, A.-L. Barabasi [cond-mat/0107417] INTERNET BACKBONE Nodes: computers, routers Links: physical lines (Faloutsos, Faloutsos and Faloutsos, 1999) Preferential Attachment • Compare maps taken at different times (Dt = 6 months) • Measure Dk(k), increase in No. of links for a node with k links Preferential Attachment: Dk(k) ~ k Spatial Distributions Router density Population density Spatial Distribution of Routers Fractal set Box counting: N(l) No. of boxes of size l that contain routers N(l) ~ l -Df Df=1.5 Distance dep. INTERNET (i, j ) ~ kj a d ij s # of links distance between two nodes N(l) ~ l-Df Df=1.5 Dk(k) ~ ka a=1 P(d) ~ d-s s=1 Good agreement with real Internet Data! Traditional modeling: Network as a static graph Given a network with N nodes and L links Create a graph with statistically identical topology RESULT: model the static network topology PROBLEM: real networks are not static! • new nodes Evolve through : local processes • new links between existing nodes • nodes removed • links removed Real networks are dynamical systems! Erdös-Rényi model (1960) Connect with probability p p=1/6 N=10 k ~ 1.5 - Random - Democratic Pál Erdös (1913-1996) Poisson distribution Watts-Strogatz Clustering: My friends will know each other with high probability! Probability to be connected C »p # of links between 1,2,…n neighbors C= n(n-1)/2 N nodes forms a regular lattice. With probability p, each edge is rewired randomly. (Nature 393, 440 (1998)) Small-World Phenomena C(p) : clustering coeff. L(p) : average path length Networks are clustered [large C(p)] but have a small characteristic path length [small L(p)]. Evolving networks OBJECTIVE: capture the network dynamics • identify the processes that contribute to the network topology METHOD : • measure their frequency from real data • develop dynamical models that capture these processes BONUS: get the topology correctly. (1) GROWTH : Scale-free model At every timestep we add a new node with m edges (connected to the nodes already present in the system). (2) PREFERENTIAL ATTACHMENT : The probability Π that a new node will be connected to node i depends on the connectivity ki of that node ki ( ki ) jk j P(k) ~k-3 A.-L.Barabási, R. Albert, Science 286, 509 (1999) Model - A (Without Growth) Model - B (Without Preferential Attachment) For given fixed number of nodes N, Number of nodes are increasing, links are added with preferential attachment. but links are added with random probability. Temporarily shows scale-free behavior, but ends up to exponential network Poisson distribution (exponential network) Preferential Attachment ki Dki ( ki ) ~ t Dt For given Dt,Dk (k) k vs. Dk : increment of # of links during unit time Citation network Internet Universality? WWW (in) = 2.1 WWW (out) = 2.45 Internet (domain) Internet (router) Actor Citation index Powergrid = 2.2 = 2.5 = 2.3 = 3 = 4 Extended Model • prob. p : internal links • prob. q : link deletion • prob. 1-p-q : add node P(k) ~ (k+(p,q,m))-(p,q,m) [1,) p=0.937 m=1 = 31.68 = 3.07 Actor network • Predict the network topology from microscopic processes with parameters (p,q,m) • Scaling but no universality Other Models • Model with non-linear preferential attachment : (k) ~ ka P(k) ~ no scaling for a1 a<1 : stretch-exponential a>1 : no-scaling (a>2 : “winner-takes-all”) (Krapivsky et al (2000).) • Model with initial attractiveness : (k) ~ A+ka P(k) ~ k- where =2 + A/m (Dorogovtsev et al (2000).) • Model with aging : each node has lifetime node cannot get links after finite time. (actor) P(k) : power-law with exponential cutoff (Amaral et al (2000).) Other Models (continued) • Model with limitation : each node has capacity limit. node cannot get links after finite # of links.(internet) P(k) : power-law with exponential cutoff (Amaral et al (2000).) • Model with fitness (i) : each node has an intrinsic ability to get links at the expense of other nodes : (ki) ~ iki C C m 1 P(k ) ( ) ( ) k (Bianconi et al (2000).) Other Models (continued) • Alternative mechanisms for preferential attachment: 1. Copying mechanism: (concept from WWW) : after adding new node, it is connected to with prob. p: randomly selected node with prob. 1-p: copied target from selected ‘prototye node’ 2. Attaching to edges: : after adding new node, instead of selecting target node, select target edges and connect new node to end point of the edges. Whole cellular network References • R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999). • R. Albert, A.L. Barabasi, Science 286 509 (1999). • A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999) • R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000). • H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000). • H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001). • J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics (2001). URL: http://www.nd.edu/~networks Diameter of WWW: 19 degrees of separation 3 l15=2 [125] 6 1 l17=4 [1346 7] 4 5 2 7 … < l > = ?? Finite size scaling: create a network with N nodes with Pin(k) and Pout(k) < l > = 0.35 + 2.06 log(N) 19 degrees of separation R. Albert et al Nature (99) nd.edu <l> based on 800 million webpages [S. Lawrence et al Nature (99)] IBM A. Broder et al WWW9 (00) Parameter (s,a,Df) dependence of P(k) and P(d) Robustness of scale-free networks attack Internet failure N~6209 <k> ~ 3.93 Fraction of nodes eliminated attack WWW failure Fraction of nodes eliminated N~325,729 <k> ~ 4.6 Taxonomy by using Cellular networks UPGMA (unweighted pair grouping method with arithmetic means) Three domains of Life dij = Jacard index or rank order A 1 d A3 2 3 d13 d 23 2