Technical Basics
ISWeb – Information Systems & Semantic Web, University of Koblenz-Landau
Sergej Sizov, Course Information Retrieval, Summer term 2009

Technical Basics: Outline
Probability Theory & Stochastics: events, probabilities, random variables, distributions, basics from information theory, Markov chains
Linear Algebra: vectors and matrices, eigenvectors, common decompositions
Network Analysis: properties of social networks

Part 1: Basics from Probability Theory and Stochastics

Basic Probability Theory
A probability space is a triple (Ω, E, P) with
• a set Ω of elementary events (sample space),
• a family E (event space) of subsets of Ω with Ω ∈ E that is closed under ∪, ∩, and ¬ with a countable number of operands (for finite Ω usually E = 2^Ω), and
• a probability measure P: E → [0, 1] with P[Ω] = 1 and P[∪_i A_i] = Σ_i P[A_i] for countably many, pairwise disjoint A_i.
Properties of P:
P[A] + P[¬A] = 1
P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
P[∅] = 0 (null/impossible event)
P[Ω] = 1 (true/certain event)

Independence and Conditional Probabilities
Two events A, B of a probability space are independent if P[A ∩ B] = P[A] · P[B].
A finite set of events A = {A1, ..., An} is independent if for every subset S ⊆ A the equation
P[∩_{Ai ∈ S} Ai] = Π_{Ai ∈ S} P[Ai]
holds.
The conditional probability P[A | B] of A under the condition (hypothesis) B is defined as:
P[A | B] = P[A ∩ B] / P[B].
Event A is conditionally independent of B given C if P[A | B ∩ C] = P[A | C].

Total Probability and Bayes' Theorem
Total probability theorem: for a partitioning of Ω into events B1, ..., Bn:
P[A] = Σ_{i=1}^{n} P[A | Bi] · P[Bi].
Bayes' theorem:
P[A | B] = P[A ∩ B] / P[B] = P[B | A] · P[A] / P[B].
P[A | B] is called the posterior probability; P[A] is called the prior probability.

Example: Probabilistic Retrieval with Term Independence
Ranking proportional to relevance odds:
sim(d, q) = O(R | d) = P[R | d] / P[¬R | d]    (odds for relevance, ratio of relevant documents)
  = (P[d | R] · P[R]) / (P[d | ¬R] · P[¬R])    (Bayes' theorem)
  ~ P[d | R] / P[d | ¬R]
  = Π_i P[X_i | R] / P[X_i | ¬R]    (term independence or linked dependence)
with X_i = 1 if d includes the i-th term, 0 otherwise. Taking logarithms over the query terms:
sim(d, q)' = log Π_{i ∈ q} P[X_i | R] / P[X_i | ¬R] = Σ_{i ∈ q} log P[X_i | R] − Σ_{i ∈ q} log P[X_i | ¬R].

Example: Naive Bayes Classification with Binary Features
Estimate:
P[d ∈ c_k | d has X] = P[d has X | d ∈ c_k] · P[d ∈ c_k] / P[d has X]
  ~ P[X | d ∈ c_k] · P[d ∈ c_k]
  = Π_{i=1}^{m} P[X_i | d ∈ c_k] · P[d ∈ c_k]    (feature independence or linked dependence)
  = Π_{i=1}^{m} p_ik^{X_i} (1 − p_ik)^{1 − X_i} · p_k
with empirically estimated p_ik = P[X_i = 1 | c_k] and p_k = P[c_k]. This yields
log P[c_k | d] ~ Σ_{i=1}^{m} X_i log (p_ik / (1 − p_ik)) + Σ_{i=1}^{m} log (1 − p_ik) + log p_k.
For binary classification, odds rather than probabilities are used for simplification.
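To make the log-probability formula above concrete, here is a minimal Python sketch of a Naive Bayes classifier with binary term features. It is not part of the original slides; the Laplace-smoothing constant `alpha`, the toy matrix `X`, and all function names are illustrative assumptions.

```python
import numpy as np

def train_binary_nb(X, y, alpha=1.0):
    """Estimate p_ik = P[X_i = 1 | c_k] and p_k = P[c_k] with Laplace smoothing."""
    classes = np.unique(y)
    p_k = np.array([(y == c).mean() for c in classes])
    p_ik = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                     for c in classes])
    return classes, p_k, p_ik

def log_posterior(x, p_k, p_ik):
    """log P[c_k | d] up to a constant: sum_i x_i log(p_ik/(1-p_ik)) + sum_i log(1-p_ik) + log p_k."""
    return x @ np.log(p_ik / (1 - p_ik)).T + np.log(1 - p_ik).sum(axis=1) + np.log(p_k)

# toy term-presence matrix (documents x terms) and class labels
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])
classes, p_k, p_ik = train_binary_nb(X, y)
print(classes[np.argmax(log_posterior(np.array([1, 1, 0]), p_k, p_ik))])
```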
Random Variables
A random variable (RV) X on the probability space (Ω, E, P) is a function X: Ω → M with M ⊆ ℝ such that {e | X(e) ≤ x} ∈ E for all x ∈ M (X is measurable).
F_X: M → [0, 1] with F_X(x) = P[X ≤ x] is the (cumulative) distribution function (cdf) of X. For a countable set M, the function f_X: M → [0, 1] with f_X(x) = P[X = x] is called the (probability) density function (pdf) of X; in general f_X(x) is F'_X(x).
For a random variable X with distribution function F, the inverse function F⁻¹(q) := inf{x | F(x) > q} for q ∈ [0, 1] is called the quantile function of X (the 0.5 quantile, i.e. the 50th percentile, is called the median).
Random variables with countable M are called discrete, otherwise they are called continuous. For discrete random variables, the density function is also referred to as the probability mass function.

Important Discrete Distributions
• Bernoulli distribution with parameter p:
  P[X = x] = p^x (1 − p)^{1−x} for x ∈ {0, 1}
• Uniform distribution over {1, 2, ..., m}:
  P[X = k] = f_X(k) = 1/m for 1 ≤ k ≤ m
• Binomial distribution (coin tossed n times; X: number of heads):
  P[X = k] = f_X(k) = (n choose k) p^k (1 − p)^{n−k}
• Poisson distribution (with rate λ):
  P[X = k] = f_X(k) = e^{−λ} λ^k / k!
• Geometric distribution (number of coin tosses until the first head):
  P[X = k] = f_X(k) = (1 − p)^k p

Important Continuous Distributions
• Uniform distribution on the interval [a, b]:
  f_X(x) = 1/(b − a) for a ≤ x ≤ b (0 otherwise)
• Exponential distribution (e.g. time until the next event of a Poisson process) with rate λ = lim_{Δt→0} (# events in Δt) / Δt:
  f_X(x) = λ e^{−λx} for x ≥ 0 (0 otherwise)
• Hyperexponential distribution:
  f_X(x) = p λ₁ e^{−λ₁ x} + (1 − p) λ₂ e^{−λ₂ x}

Multidimensional (Multivariate) Distributions
Let X1, ..., Xm be random variables over the same probability space with domains dom(X1), ..., dom(Xm).
The joint distribution of X1, ..., Xm has a density function f_{X1,...,Xm}(x1, ..., xm) with
  Σ_{x1 ∈ dom(X1)} ... Σ_{xm ∈ dom(Xm)} f_{X1,...,Xm}(x1, ..., xm) = 1   or
  ∫_{dom(X1)} ... ∫_{dom(Xm)} f_{X1,...,Xm}(x1, ..., xm) dxm ... dx1 = 1.
The marginal distribution of Xi in the joint distribution of X1, ..., Xm has the density function
  Σ_{x1} ... Σ_{x_{i−1}} Σ_{x_{i+1}} ... Σ_{xm} f_{X1,...,Xm}(x1, ..., xm)   or
  ∫_{X1} ... ∫_{X_{i−1}} ∫_{X_{i+1}} ... ∫_{Xm} f_{X1,...,Xm}(x1, ..., xm) dxm ... dx_{i+1} dx_{i−1} ... dx1.

Important Multivariate Distributions
Multinomial distribution (n trials with an m-sided dice):
  P[X1 = k1 ∧ ... ∧ Xm = km] = f_{X1,...,Xm}(k1, ..., km) = (n choose k1 ... km) p1^{k1} ... pm^{km}
  with (n choose k1 ... km) := n! / (k1! ... km!).
Multidimensional normal distribution:
  f_{X1,...,Xm}(x) = 1 / sqrt((2π)^m |Σ|) · exp(−½ (x − μ)^T Σ⁻¹ (x − μ))
  with covariance matrix Σ, where Σ_ij := Cov(Xi, Xj).

Moments
For a discrete random variable X with density f_X:
  E[X] = Σ_{k ∈ M} k · f_X(k) is the expectation value (mean) of X,
  E[X^i] = Σ_{k ∈ M} k^i · f_X(k) is the i-th moment of X,
  V[X] = E[(X − E[X])²] = E[X²] − E[X]² is the variance of X.
For a continuous random variable X with density f_X:
  E[X] = ∫ x f_X(x) dx is the expectation value of X,
  E[X^i] = ∫ x^i f_X(x) dx is the i-th moment of X,
  V[X] = E[(X − E[X])²] = E[X²] − E[X]² is the variance of X.
Theorem: expectation values are additive: E[X + Y] = E[X] + E[Y] (distributions are not).

Important Properties of Expectation and Variance
E[aX + b] = aE[X] + b for constants a, b
E[X1 + X2 + ... + Xn] = E[X1] + E[X2] + ... + E[Xn]
(i.e. expectation values are generally additive, but distributions are not!)
Var[aX + b] = a² Var[X] for constants a, b
Var[X1 + X2 + ... + Xn] = Var[X1] + Var[X2] + ... + Var[Xn] if X1, X2, ..., Xn are independent RVs.
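The additivity of expectations and, for independent random variables, of variances can be checked empirically with a small simulation. The following sketch is not from the slides; it draws Binomial(n, p) samples as sums of independent Bernoulli trials and compares the empirical mean and variance with n·p and n·p·(1−p). The parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 10, 0.3, 200_000

# Binomial(n, p) as a sum of n independent Bernoulli(p) random variables
bernoulli = rng.random((trials, n)) < p
samples = bernoulli.sum(axis=1)

print(samples.mean(), n * p)               # expectations add: E[sum] = n*p
print(samples.var(), n * p * (1 - p))      # variances add for independent RVs: n*p*(1-p)
```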
Correlation of Random Variables
Covariance of random variables Xi and Xj:
  Cov(Xi, Xj) := E[(Xi − E[Xi]) (Xj − E[Xj])],
  Var(Xi) = Cov(Xi, Xi) = E[Xi²] − E[Xi]².
Correlation coefficient of Xi and Xj:
  ρ(Xi, Xj) := Cov(Xi, Xj) / sqrt(Var(Xi) · Var(Xj)).

Markov Chains
Example (weather model with states 0: sunny, 1: cloudy, 2: rainy and the transition probabilities from the diagram). The stationary probabilities satisfy:
  p0 = 0.8 p0 + 0.5 p1 + 0.4 p2
  p1 = 0.2 p0 + 0.3 p2
  p2 = 0.5 p1 + 0.3 p2
  p0 + p1 + p2 = 1
  ⇒ p0 ≈ 0.657, p1 = 0.2, p2 ≈ 0.143.
State set: finite or infinite. State transition probabilities: p_ij. Time: discrete or continuous. State probabilities in step t: p_i(t) = P[S(t) = i].
Markov property: P[S(t) = i | S(0), ..., S(t−1)] = P[S(t) = i | S(t−1)].
We are interested in the stationary state probabilities:
  p_j := lim_{t→∞} p_j(t) = lim_{t→∞} Σ_k p_k(t−1) p_kj,   i.e.  p_j = Σ_k p_k p_kj  with  Σ_j p_j = 1,
which are guaranteed to exist for irreducible, aperiodic, finite Markov chains.

Example: GAFOR Regions for Germany
[Map of the GAFOR weather-forecast regions; region 38 = Neuwieder Becken.]

Markov Chains: Formal Definition
A stochastic process is a family of random variables {X(t) | t ∈ T}. T is called the parameter space, and the domain M of X(t) is called the state space. T and M can be discrete or continuous.
A stochastic process is called a Markov process if for every choice of t1, ..., tn+1 from the parameter space and every choice of x1, ..., xn+1 from the state space the following holds:
  P[X(tn+1) = xn+1 | X(t1) = x1 ∧ X(t2) = x2 ∧ ... ∧ X(tn) = xn] = P[X(tn+1) = xn+1 | X(tn) = xn].
A Markov process with a discrete state space is called a Markov chain. A canonical choice of the state space are the natural numbers.
Notation for Markov chains with discrete parameter space: Xn rather than X(tn), with n = 0, 1, 2, ...

Properties of Markov Chains with Discrete Parameter Space (1)
The Markov chain Xn with discrete parameter space is
• homogeneous if the transition probabilities p_ij := P[Xn+1 = j | Xn = i] are independent of n,
• irreducible if every state is reachable from every other state with positive probability:
  Σ_{n ≥ 1} P[Xn = j | X0 = i] > 0 for all i, j,
• aperiodic if every state i has period 1, where the period of i is the greatest common divisor of all (recurrence) values n for which
  P[Xn = i ∧ Xk ≠ i for k = 1, ..., n−1 | X0 = i] > 0.

Properties of Markov Chains with Discrete Parameter Space (2)
The Markov chain Xn with discrete parameter space is
• positive recurrent if for every state i the recurrence probability is 1 and the mean recurrence time is finite:
  Σ_{n ≥ 1} P[Xn = i ∧ Xk ≠ i for k = 1, ..., n−1 | X0 = i] = 1   and
  Σ_{n ≥ 1} n · P[Xn = i ∧ Xk ≠ i for k = 1, ..., n−1 | X0 = i] < ∞,
• ergodic if it is homogeneous, irreducible, aperiodic, and positive recurrent.

Results on Markov Chains with Discrete Parameter Space (1)
For the n-step transition probabilities p_ij^(n) := P[Xn = j | X0 = i] the following holds:
  p_ij^(n) = Σ_k p_ik^(n−1) p_kj   with p_ij^(1) := p_ij,
  p_ij^(n) = Σ_k p_ik^(n−l) p_kj^(l)   for 1 ≤ l ≤ n−1   (Chapman-Kolmogorov equation);
in matrix notation: P^(n) = P^n.
For the state probabilities after n steps, π_j^(n) := P[Xn = j], the following holds:
  π_j^(n) = Σ_i π_i^(0) p_ij^(n)   with initial state probabilities π_i^(0);
in matrix notation: π^(n) = π^(0) P^n.

Results on Markov Chains with Discrete Parameter Space (2)
Every homogeneous, irreducible, aperiodic Markov chain with a finite number of states is positive recurrent and ergodic.
For every ergodic Markov chain there exist stationary state probabilities π_j := lim_{n→∞} π_j^(n). These are independent of π^(0) and are the solutions of the following system of linear equations (balance equations):
  π_j = Σ_i π_i p_ij for all j,   Σ_j π_j = 1;
in matrix notation: π P = π and π 1 = 1 (with π a 1×n row vector and 1 a column vector of ones).
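A hypothetical sketch of how the stationary state probabilities can be computed in practice, either by solving the balance equations π = πP together with the normalization Σ_j π_j = 1, or by power iteration. The transition matrix below is read off the balance equations of the weather example above, so treat its exact entries as an assumption.

```python
import numpy as np

# Row-stochastic transition matrix (states: 0 = sunny, 1 = cloudy, 2 = rainy),
# as read off the balance equations of the example; an assumed reading of the diagram.
P = np.array([[0.8, 0.2, 0.0],
              [0.5, 0.0, 0.5],
              [0.4, 0.3, 0.3]])

# Solve pi = pi P together with sum(pi) = 1 as an (overdetermined) linear system.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)

# Alternatively: power iteration pi^(n) = pi^(0) P^n converges for ergodic chains.
pi_iter = np.full(3, 1 / 3)
for _ in range(200):
    pi_iter = pi_iter @ P
print(pi_iter)
```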
Part 2: Basics of Information Theory

Elementary Information Theory
Let f(x) be the probability (or relative frequency) of the x-th symbol in some text d. The entropy of the text (or of the underlying probability distribution f) is:
  H(d) = Σ_x f(x) log₂ (1 / f(x)).
H(d) is a lower bound for the number of bits per symbol needed with optimal coding (compression).
For two probability distributions f(x) and g(x), the relative entropy (Kullback-Leibler divergence) of f to g is:
  D(f ‖ g) := Σ_x f(x) log (f(x) / g(x)).
Relative entropy is a measure of the (dis-)similarity of two probability or frequency distributions. It corresponds to the average number of additional bits needed for coding information (events) with distribution f when using an optimal code for distribution g.
The cross entropy of f(x) to g(x) is:
  H(f, g) := H(f) + D(f ‖ g) = −Σ_x f(x) log g(x).

Compression
• Text is a sequence of symbols (with specific frequencies).
• Symbols can be letters or other characters from some alphabet, strings of fixed length (e.g. trigrams), or words, bits, syllables, phrases, etc.
Limits of compression: let p_i be the probability (or relative frequency) of the i-th symbol in text d. Then the entropy of the text,
  H(d) = Σ_i p_i log₂ (1 / p_i),
is a lower bound for the average number of bits per symbol in any compression (e.g. Huffman codes).
Note: compression schemes such as Ziv-Lempel (used in zip) are better because they consider context beyond single symbols; with appropriately generalized notions of entropy, the lower-bound theorem still holds.

The Entropy of Flickr
[Figure-only slide: entropy analysis of Flickr tag data; not reproduced here.]

The Entropy of Flickr (2)
[Figure-only slide; not reproduced here.]
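The entropy, KL-divergence, and cross-entropy definitions above translate directly into a few lines of NumPy. The following sketch is not from the slides; the example distributions f and g are made up. It computes all three quantities in bits and illustrates the identity H(f, g) = H(f) + D(f ‖ g).

```python
import numpy as np

def entropy(f):
    f = np.asarray(f, dtype=float)
    return -np.sum(f[f > 0] * np.log2(f[f > 0]))

def kl_divergence(f, g):
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    mask = f > 0
    return np.sum(f[mask] * np.log2(f[mask] / g[mask]))

def cross_entropy(f, g):
    # H(f, g) = H(f) + D(f || g) = -sum_x f(x) log2 g(x)
    return entropy(f) + kl_divergence(f, g)

f = [0.5, 0.25, 0.25]     # e.g. relative symbol frequencies in a text
g = [1/3, 1/3, 1/3]       # e.g. the code model actually used
print(entropy(f), kl_divergence(f, g), cross_entropy(f, g))
```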
Part 3: Using Vectors and Matrices

IR and Linear Algebra
The vector space model revisited: terms characterize documents (term-document matrix), but also: documents characterize terms; user/resource/tag relationships in folksonomies.
Alternate document/term representation forms are also possible: e.g. concept-based (using a thesaurus like WordNet), or with a rich background document corpus (using Wikipedia in ESA).
The matrix representation is also frequently used for representing connections in graph models (adjacency matrix), representing transitions in stochastic models (probabilistic matrix), and many other IR-relevant applications and methods.

Foundations from Linear Algebra
A set S of vectors is called linearly independent if no x ∈ S can be written as a linear combination of other vectors in S.
The rank of a matrix A is the maximal number of linearly independent row or column vectors.
A basis of an n×n matrix A is a set S of linearly independent row or column vectors such that all rows or columns are linear combinations of vectors from S.
A set S of n×1 vectors is an orthonormal basis if for all x, y ∈ S:
  ‖x‖₂ := sqrt(Σ_{i=1}^{n} x_i²) = 1 = ‖y‖₂   and   x · y = 0.

Eigenvalues and Eigenvectors
Let A be a real-valued n×n matrix, x a real-valued n×1 vector, and λ a real-valued scalar. Solutions x ≠ 0 and λ of the equation A x = λ x are called an eigenvector and eigenvalue of A. Eigenvectors of A are vectors whose direction is preserved by the linear transformation described by A.
The eigenvalues of A are the roots of the characteristic polynomial f(λ) of A:
  f(λ) = det(A − λI) = 0,
with the determinant (expanded along the i-th row):
  det(A) = Σ_{j=1}^{n} (−1)^{i+j} a_ij det(A^(ij)),
where the matrix A^(ij) is derived from A by removing the i-th row and the j-th column.
The real-valued n×n matrix A is symmetric if a_ij = a_ji for all i, j. A is positive definite if x^T A x > 0 for all n×1 vectors x ≠ 0. If A is symmetric, then all eigenvalues of A are real. If A is symmetric and positive definite, then all eigenvalues are positive.

Singular Value Decomposition (SVD)
Theorem: each real-valued m×n matrix A with rank r can be decomposed into the form A = U Σ V^T with an m×r matrix U with orthonormal column vectors, an r×r diagonal matrix Σ, and an n×r matrix V with orthonormal column vectors. This decomposition is called the singular value decomposition and is unique when the elements of Σ are sorted.
Theorem: in the singular value decomposition A = U Σ V^T of matrix A, the matrices U, Σ, and V can be derived as follows:
• Σ consists of the singular values of A, i.e. the positive roots of the eigenvalues of A^T A,
• the columns of U are the eigenvectors of A A^T,
• the columns of V are the eigenvectors of A^T A.

SVD for Regression
Theorem: let A be an m×n matrix with rank r, and let A_k = U_k Σ_k V_k^T, where the k×k diagonal matrix Σ_k contains the k largest singular values of A and the m×k matrix U_k and the n×k matrix V_k contain the corresponding singular vectors from the SVD of A. Among all m×n matrices C with rank at most k, A_k is the matrix that minimizes the Frobenius norm:
  ‖A − C‖_F² = Σ_{i=1}^{m} Σ_{j=1}^{n} (A_ij − C_ij)².
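A small illustration, not from the slides, of the two SVD theorems above: it computes the truncated rank-k SVD of a random matrix, checks that the Frobenius error of the rank-k approximation equals the root of the sum of the squared discarded singular values, and that the squared singular values are the eigenvalues of A^T A. The matrix shape and k are arbitrary choices.

```python
import numpy as np

A = np.random.default_rng(1).random((6, 4))    # any m x n matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation

# Frobenius error of the truncated SVD equals sqrt of the sum of the
# squared discarded singular values (the minimization theorem above).
print(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))

# Singular values are the positive roots of the eigenvalues of A^T A.
print(np.sort(s ** 2), np.sort(np.linalg.eigvalsh(A.T @ A)))
```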
SVD and Vector Space Model
A is the m×n term-document matrix, with row i corresponding to term t_i and column j to document d_j.
Observation: the dot product t_i^T · t_j gives the "similarity" of terms t_i and t_j over the documents. The matrix product A A^T contains all these dot products: element (x, y) (= element (y, x)) contains the dot product t_x^T · t_y.
The matrix A^T A contains the dot products between document vectors and gives their "similarity" over the terms: element (i, j) (= element (j, i)) = d_i^T · d_j.
SVD theorem: in the singular value decomposition A = U Σ V^T of matrix A, the matrices U, Σ, and V can be derived as follows:
• Σ consists of the singular values of A, i.e. the positive roots of the eigenvalues of A^T A,
• the columns of U are the eigenvectors of A A^T,
• the columns of V are the eigenvectors of A^T A.

Key Idea of Latent Concept Models
Objective: transformation of document vectors from the high-dimensional term vector space into a lower-dimensional topic vector space with
• exploitation of term correlations (e.g. "Web" and "Internet" frequently occur together),
• implicit differentiation of polysems that exhibit different term correlations for different meanings (e.g. "Java" with "Library" vs. "Java" with "Kona Blend" vs. "Java" with "Borneo").
Mathematically: given m terms, n docs (usually n > m) and an m×n term-document similarity matrix A, we need a largely similarity-preserving mapping of the column vectors of A into a k-dimensional vector space (k << m) for a given k.

Latent Semantic Indexing (LSI) [Deerwester et al. 1990]
A is the m×n term-document matrix, approximated by its rank-k SVD: A ≈ A_k = U_k Σ_k V_k^T, with U_k of size m×k, Σ_k the k×k diagonal matrix of the k largest singular values, and V_k^T of size k×n. Then:
• U and U_k are the m×r and m×k term-topic similarity matrices,
• V and V_k are the n×r and n×k document-topic similarity matrices,
• A A^T and A_k A_k^T are the term-term similarity matrices,
• A^T A and A_k^T A_k are the document-document similarity matrices.
Mapping of m×1 vectors into the latent-topic space:
  d_j' := U_k^T d_j,   q' := U_k^T q.
Scalar-product similarity in the latent-topic space:
  d_j'^T q' = ((Σ_k V_k^T)_{*j})^T q'.
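A hypothetical end-to-end sketch of the LSI mapping described above: a toy term-document matrix is decomposed, documents are represented as the columns of Σ_k V_k^T, a query is folded in via q' = U_k^T q, and documents are ranked by the scalar product d_j'^T q'. The matrix entries, the query, and k are arbitrary example choices, not data from the slides.

```python
import numpy as np

# toy term-document matrix A (m terms x n docs); entries could be tf or tf-idf weights
A = np.array([[2, 0, 1, 0],
              [1, 0, 2, 0],
              [0, 3, 0, 1],
              [0, 1, 0, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Sk, Vkt = U[:, :k], np.diag(s[:k]), Vt[:k, :]

docs_latent = Sk @ Vkt                # column j = representation of doc j in topic space
q = np.array([1, 1, 0, 0], dtype=float)
q_latent = Uk.T @ q                   # fold the query into the latent-topic space

sims = docs_latent.T @ q_latent       # scalar-product similarity d_j'^T q'
print(np.argsort(-sims))              # document ranking for q
```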
Part 4: Understanding Social Networks

Small World Experiment (Six Degrees of Separation)
Stanley Milgram, 1967: letters were handed out to people in Nebraska to be sent to a target (a stock broker) in Boston. People were instructed to pass the letters on to someone they knew on a first-name basis. The letters that reached the destination followed paths of average length 5.2 (i.e. around 6).
Duncan Watts, 2001: Milgram's experiment recreated on the Internet, using an e-mail message as the "package" that needed to be delivered, with 48,000 senders and 19 targets (in 157 countries). The average number of intermediaries was also around 6.
See also: the Kevin Bacon game, the Erdős number, etc.

Kevin Bacon Experiment
Craig Fass, Brian Turtle and Mike Ginelli, 1994: motivated by Bacon's most recent movie "The Air Up There" and the discussion of his career.
Vertices: actors and actresses; edge between u and v if they appeared in a movie together.
Kevin Bacon: number of movies: 46; number of actors (links): 1811; average separation: 2.79.
Is Kevin Bacon the most connected actor? See also: http://oracleofbacon.org/

Rank  Name               Average distance  # of movies  # of links
1     Rod Steiger        2.537527          112          2562
2     Donald Pleasence   2.542376          180          2874
3     Martin Sheen       2.551210          136          3501
4     Christopher Lee    2.552497          201          2993
5     Robert Mitchum     2.557181          136          2905
6     Charlton Heston    2.566284          104          2552
7     Eddie Albert       2.567036          112          3333
8     Robert Vaughn      2.570193          126          2761
9     Donald Sutherland  2.577880          107          2865
10    John Gielgud       2.578980          122          2942
11    Anthony Quinn      2.579750          146          2978
12    James Earl Jones   2.584440          112          3787
...
876   Kevin Bacon        2.786981          46           1811

Quantities of Interest
Connected components: how many, and how large?
Network diameter: maximum (worst-case) or average? exclude infinite distances (disconnected components)? the small-world phenomenon.
Clustering: to what extent do links tend to cluster "locally"? what is the balance between local and long-distance connections? what roles do the two types of links play?
Degree distribution: what is the typical degree in the network? what is the overall distribution?

Graph theory: undirected graph notation
Graph G = (V, E), with V = set of vertices and E = set of edges.
Example undirected graph on V = {1, ..., 5}: E = {(1,2), (1,3), (2,3), (3,4), (4,5)}.

Graph theory: directed graph notation
Graph G = (V, E), with V = set of vertices and E = set of edges.
Example directed graph on V = {1, ..., 5}: E = {⟨1,2⟩, ⟨2,1⟩, ⟨1,3⟩, ⟨3,2⟩, ⟨3,4⟩, ⟨4,5⟩}.

Undirected graph: degree distribution
Degree d(i) of node i: the number of edges incident on node i.
Degree sequence for the example graph: [d(1), d(2), d(3), d(4), d(5)] = [2, 2, 3, 2, 1].
Degree distribution: [(1,1), (2,3), (3,1)], i.e. one node of degree 1, three nodes of degree 2, one node of degree 3.

Directed graph: in-/out-degrees
In-degree d_in(i) of node i: the number of edges pointing to node i; out-degree d_out(i): the number of edges leaving node i.
For the example directed graph: in-degree sequence [1, 2, 1, 1, 1]; out-degree sequence [2, 1, 2, 1, 0].

Degree distributions
Frequency f_k = fraction of nodes with degree k = probability of a randomly selected node having degree k.
[Figure: histogram of f_k over the degree k.]
Problem: find the probability distribution that best fits the observed data.

Power-law distributions
The degree distributions of most real-life networks follow a power law p(k) = C k^{−α}.
This is a right-skewed, heavy-tailed distribution: there is a non-negligible fraction of nodes with very high degree (hubs); it is scale-free: there is no characteristic scale, and the average is not informative.
In contrast, a normal (or Poisson) degree distribution is highly concentrated around the mean, and the probability of very high-degree nodes is exponentially small.

Power-law signature
p(k) = C k^{−α} gives a line in the log-log plot:
  log p(k) = −α log k + log C.
[Figures: degree histogram on linear axes and on log-log axes; on the log-log plot the points fall on a line with slope −α.]
α is the power-law exponent (typically 2 ≤ α ≤ 3).

Power-law: Examples
[Figure taken from [Newman 2003]; not reproduced here.]

Collective Statistics (M. Newman 2003)
[Table of statistics for real networks; not reproduced here.]
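Following the log-log signature above, a power-law exponent can be estimated with a least-squares line fit on the log-log degree histogram. The sketch below is illustrative only and not from the slides (more robust estimators exist, e.g. maximum likelihood); the Zipf-distributed sample degrees are an assumption, not data from the lecture.

```python
import numpy as np
from collections import Counter

def fit_power_law(degrees):
    """Least-squares fit of log p(k) = -alpha * log k + log C on the log-log plot."""
    counts = Counter(d for d in degrees if d > 0)
    k = np.array(sorted(counts))
    p = np.array([counts[x] for x in k], dtype=float) / len(degrees)
    slope, intercept = np.polyfit(np.log(k), np.log(p), 1)
    return -slope, np.exp(intercept)      # alpha, C

# e.g. degrees drawn from a heavy-tailed (Zipf-like) distribution
rng = np.random.default_rng(0)
degrees = rng.zipf(2.5, size=10_000)
alpha, C = fit_power_law(degrees)
print(alpha)   # roughly in the typical range 2 <= alpha <= 3
```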
Web Structure: Power-Law Degrees
Study of the Web graph (Broder et al. 2000):
• power-law distributed degrees: P[degree = k] ~ (1/k)^α with α ≈ 2.1 for in-degrees and α ≈ 2.7 for out-degrees.

Clustering (Transitivity) Coefficient
Measures the density of triangles (local clusters) in the graph. There are two different ways to measure it.
The ratio of the means:
  C(1) = Σ_i (triangles centered at node i) / Σ_i (triples centered at node i).

Example
For the 5-node example graph (triangle 1-2-3 with additional edges 3-4 and 3-5):
  C(1) = 3 · 1 / (1 + 1 + 6) = 3/8.

Clustering (Transitivity) Coefficient (2)
Clustering coefficient of node i:
  C_i = (triangles centered at node i) / (triples centered at node i).
The mean of the ratios:
  C(2) = (1/n) Σ_i C_i.

Example
For the same graph:
  C(2) = (1/5)(1 + 1 + 1/6 + 0 + 0) = 13/30,   whereas C(1) = 3/8.
The two clustering coefficients give different measures; C(2) increases with nodes of low degree.

Collective Statistics (M. Newman 2003)
[Table of clustering statistics for real networks; not reproduced here.]
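To make the two definitions concrete, the sketch below computes C(1) and C(2) from an adjacency list. The 5-node graph is my reading of the example figure (a triangle 1-2-3 plus pendant edges 3-4 and 3-5), which reproduces the values 3/8 and 13/30 derived above; treat that reading as an assumption.

```python
from itertools import combinations

def clustering_coefficients(adj):
    """C(1): ratio of summed triangles to summed connected triples;
       C(2): mean of the per-node ratios C_i."""
    tri_total, triples_total, per_node = 0, 0, []
    for i, nbrs in adj.items():
        triples = len(nbrs) * (len(nbrs) - 1) // 2                      # triples centered at i
        tri = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])   # triangles centered at i
        tri_total += tri
        triples_total += triples
        per_node.append(tri / triples if triples else 0.0)
    return tri_total / triples_total, sum(per_node) / len(per_node)

# assumed example graph: triangle 1-2-3 plus edges 3-4 and 3-5
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3}, 5: {3}}
print(clustering_coefficients(adj))   # expected: (3/8, 13/30)
```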
Clustering coefficient for random graphs
The probability that two of your neighbors are also neighbors of each other is p, independent of the local structure, so the clustering coefficient is C = p. When the average degree z is fixed, C = z/n = O(1/n).

Graphs: paths
A path from node i to node j is a sequence of edges (directed or undirected) from node i to node j; the path length is the number of edges on the path. Nodes i and j are connected if such a path exists. A cycle is a path that starts and ends at the same node.

Graphs: shortest paths
The shortest path from node i to node j is also known as the BFS path or geodesic path.

Measuring the small-world phenomenon
Let d_ij be the shortest-path distance between i and j.
Diameter: d = max_{i,j} d_ij.
Characteristic path length: ℓ = (1 / (n(n−1)/2)) Σ_{i > j} d_ij.
Harmonic mean: ℓ^{−1} = (1 / (n(n−1)/2)) Σ_{i > j} d_ij^{−1}.

Collective Statistics (M. Newman 2003)
[Table of path-length statistics for real networks; not reproduced here.]

Degree correlations
Do high-degree nodes tend to link to high-degree nodes? Newman: compute the correlation coefficient of the degrees x_i, y_i of the two endpoints of each edge i.

Collective Statistics (M. Newman 2003)
[Table of degree-correlation statistics for real networks; not reproduced here.]

Graphs: fully connected cliques
The clique K_n is a graph that has all possible n(n−1)/2 edges.

Directed Graph
Strongly connected graph: there exists a directed path from every node i to every node j.
Weakly connected graph: if the edges are made undirected, the graph is connected.

Web Structure: Connected Components
Study of the Web graph (Broder et al. 2000); SCC = strongly connected component. The strongly connected core tends to have a small diameter.
[Bow-tie figure not reproduced. Source: A. Z. Broder et al., WWW 2000.]

Summary: real network properties
Most nodes have only a small number of neighbors (degree), but there are some nodes with very high degree (power-law degree distribution): scale-free networks.
If a node x is connected to y and z, then y and z are likely to be connected: high clustering coefficient.
Most nodes are just a few edges away from each other on average: small-world networks.

A "Canonical" Natural Network has...
Few connected components: often only 1 or a small number, independent of network size.
Small diameter: often a constant independent of network size (like 6), perhaps growing only logarithmically with network size, or even shrinking; infinite distances are typically excluded.
A high degree of clustering: considerably more than for a random network; in tension with the small diameter.
A heavy-tailed degree distribution: a small but reliable number of high-degree vertices, often of power-law form.
Is it possible that there is a unifying underlying generative process?

Digression: epidemics in social networks
Observations: many viruses do not appear in wild nature at all; some viruses are widely present in animal/human populations.
What is the virus propagation speed? What fraction of the population will be infected?
Problems:
• Communication patterns: influence the spread mechanism of the virus; may change over time.
• Transfer probability: may change over time.
• Death of the virus: virus effects (computer crash, human death); agent action (computer cleaning, human hospitalization, ...).

Infection models
SI model (rumor spreading): susceptible → infected.
SIS model (birth rate/death rate): susceptible → infected → susceptible.
SIR model: susceptible → infected → recovered.
Continuous models work deterministically and are solved through differential equations. Discrete models are graph-based, built on random contacts (walks, calls, etc.), and analyzed as Markov processes.

Infection models (2)
SI model (rumor spreading): at the beginning, one individual is infected; each contact infects one new individual; in each time step, β contacts between individuals take place.
SIS model (birth rate/death rate): like the SI model, but a fraction δ of all infected individuals recovers and can be infected again (alternatively: with probability δ an individual recovers and can be infected again).
SIR model: like the SIS model, but recovered individuals cannot be infected again (immunization).

Epidemic communication in arbitrary social networks: topology-independent epidemic model
Takes topological characteristics into account without being limited by them; discrete time; SIS-like infection.
A node is healthy at time t if it
• was healthy before t and not infected at t, OR
• was infected before t, cured, and not re-infected at t, OR
• was infected before t, therefore ignored re-infection attempts, and was subsequently cured at t.
[Diagram: node X with neighbors N1, N2, N3 in healthy/infected states.]
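A simplified, hypothetical simulation of the SIS-style model just described on an arbitrary topology: per step, each infected neighbor independently infects a healthy node with probability β, and infected nodes are cured with probability δ. The star graph, the parameter values, and the comparison against the epidemic threshold τ = 1/λ₁ (introduced on the following slides) are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

def sis_simulation(A, beta, delta, steps=250, seed=0):
    """Discrete-time SIS sketch on adjacency matrix A: each infected neighbor
    infects a healthy node with probability beta per step; infected nodes are
    cured with probability delta and can be re-infected in later steps."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    infected = np.zeros(n, dtype=bool)
    infected[0] = True                                # one initially infected node
    for _ in range(steps):
        p_escape = (1.0 - beta) ** (A @ infected)     # prob. of not being infected this step
        attacked = rng.random(n) > p_escape
        cured = rng.random(n) < delta
        infected = (infected & ~cured) | (~infected & attacked)
    return int(infected.sum())

# star graph with 50 leaves, similar to the experiments on the next slides
n = 51
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1.0

beta, delta = 0.016, 0.04
tau = 1.0 / np.max(np.linalg.eigvalsh(A))   # epidemic threshold tau = 1/lambda_1 (see below)
print(beta / delta, tau)                    # here beta/delta > tau: above the threshold
print(sis_simulation(A, beta, delta))       # infected count after 250 steps (typically > 0 here)
```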
Epidemic threshold
The epidemic threshold τ is the value such that for β/δ < τ there is no epidemic, where β = birth (infection) rate and δ = death (curing) rate.
What is this threshold for an arbitrary graph?
[Theorem 1] τ = 1/λ_{1,A}, where λ_{1,A} is the largest eigenvalue of the adjacency matrix A of the topology. λ_{1,A} alone captures the relevant property of the graph!

Epidemic threshold for various networks
Homogeneous networks: λ_{1,A} = ⟨k⟩, so τ = 1/⟨k⟩, where ⟨k⟩ is the average degree.
Star networks: λ_{1,A} = √d, so τ = 1/√d, where d is the degree of the central node.
Finite power-law networks: τ = 1/λ_{1,A}.

Epidemic threshold (2)
[Theorem 1] The epidemic threshold is given by τ = 1/λ_{1,A}.
How fast will an infection die out?
[Theorem 2] Below the epidemic threshold, the epidemic dies out exponentially: if β/δ < τ (β = birth rate, δ = death rate), then any local breakout of infection dies out exponentially fast.

Epidemic threshold experiments (Star)
[Plot: number of infected nodes over time for a star graph with β = 0.016 and δ ∈ {0.04, 0.08, 0.12, 0.16, 0.20}; the infection persists for β/δ > τ (above threshold), levels off for β/δ close to τ, and dies out for β/δ < τ (below threshold).]

Epidemic threshold experiments (Oregon)
[Plot: number of infected nodes over time for the Oregon graph with β = 0.001 and δ ∈ {0.05, 0.06, 0.07}; the same qualitative behavior above, at, and below the threshold.]

SI model: deterministic modeling with continuous time
The number n of individuals remains constant. S(t) = healthy individuals at time t; I(t) = infected individuals at time t. Relative fractions: s(t) := S(t)/n and i(t) := I(t)/n.
In each time step, an individual contacts β partners. Assumption: among the β contact partners of an infected agent, β s(t) are healthy. Assumption: all I(t) infected agents together produce β s(t) I(t) new infections.
Recursive relationships (absolute and relative fractions):
  I(t+1) = I(t) + β i(t) S(t),    i(t+1) = i(t) + β i(t) s(t),
  S(t+1) = S(t) − β i(t) S(t),    s(t+1) = s(t) − β i(t) s(t).

SI model (2)
i(t+1) = i(t) + β i(t) s(t),   s(t+1) = s(t) − β i(t) s(t).
Idea: treat i(t) as a continuous function; i(t+1) − i(t) is then (approximately) its first derivative, which gives the differential equation di/dt = β i(t)(1 − i(t)).
Solution (the logistic curve):
  i(t) = i(0) e^{βt} / (1 − i(0) + i(0) e^{βt}).

SI model: Interpretation
Theorem: until n/2 individuals are infected, the number of infected individuals grows exponentially; afterwards, the number of healthy individuals shrinks exponentially.
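As a closing illustration (not from the slides), the discrete SI recursion i(t+1) = i(t) + β·i(t)·s(t) can be iterated directly and compared with the logistic closed form of its continuous-time limit; β and i(0) are arbitrary example values.

```python
import numpy as np

def si_fractions(beta, i0, steps):
    """Iterate i(t+1) = i(t) + beta * i(t) * s(t) with s(t) = 1 - i(t)."""
    i = np.empty(steps + 1)
    i[0] = i0
    for t in range(steps):
        i[t + 1] = i[t] + beta * i[t] * (1 - i[t])
    return i

beta, i0, steps = 0.1, 0.01, 100
i = si_fractions(beta, i0, steps)

# closed-form logistic curve for the continuous-time limit di/dt = beta * i * (1 - i)
t = np.arange(steps + 1)
i_logistic = i0 * np.exp(beta * t) / (1 - i0 + i0 * np.exp(beta * t))

print(i[-1], i_logistic[-1])   # both approach 1: eventually everyone is infected
```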