ISWeb – Information Systems & Semantic Web
University of Koblenz ▪ Landau
Technical Basics
Sergej Sizov
Information Retrieval
Summer term 2009
Technical Basics: Outline
Probability Theory & Stochastics
Events, Probabilities, Random Variables, Distributions,
Basics from Information Theory, Markov chains
Linear Algebra
Vectors and matrices, eigenvectors, common decompositions
Network Analysis
Properties of social networks
Part 1: Basics from probability theory and stochastics
Basic Probability Theory
A probability space is a triple (Ω, E, P) with
• a set Ω of elementary events (sample space),
• a family E (event space) of subsets of Ω with Ω ∈ E which is closed
under ∪, ∩, and complement with a countable number of operands
(with finite Ω usually E = 2^Ω), and
• a probability measure P: E → [0,1] with P[Ω] = 1 and
P[∪_i A_i] = Σ_i P[A_i] for countably many, pairwise disjoint A_i
Properties of P:
P[A] + P[¬A] = 1
P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
P[∅] = 0 (null/impossible event)
P[Ω] = 1 (true/certain event)
Independence and Conditional Probabilities
Two events A, B of a prob. space are independent
if P[A ∩ B] = P[A] · P[B].
A finite set of events A = {A1, ..., An} is independent
if for every subset S ⊆ A the equation
P[∩_{Ai ∈ S} Ai] = ∏_{Ai ∈ S} P[Ai]
holds.
The conditional probability P[A | B] of A under the
condition (hypothesis) B is defined as:
P[A | B] = P[A ∩ B] / P[B]
Event A is conditionally independent of B given C
if P[A | B ∩ C] = P[A | C].
Total Probability and Bayes’ Theorem
Total probability theorem:
For a partitioning of Ω into events B1, ..., Bn:
P[A] = Σ_{i=1}^{n} P[A | Bi] · P[Bi]

Bayes' theorem: P[A | B] = P[A ∩ B] / P[B] = P[B | A] · P[A] / P[B]
P[A|B] is called posterior probability
P[A] is called prior probability
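A small numeric illustration in Python (a hypothetical diagnostic-test setting; all numbers are assumed, not from the slides) of the total probability theorem and Bayes' theorem:

```python
# Hypothetical example: prior P[disease] = 0.01, sensitivity
# P[pos | disease] = 0.9, false-positive rate P[pos | no disease] = 0.05.
p_d = 0.01
p_pos_given_d = 0.9
p_pos_given_not_d = 0.05

# Total probability: P[pos] = P[pos | D] P[D] + P[pos | not D] P[not D]
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: posterior P[D | pos]
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(p_d_given_pos)  # ~0.154
```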
Example: Probabilistic Retrieval with Term Independence
Ranking Proportional to Relevance Odds

sim(d, q) = O(R | d) = P[R | d] / P[¬R | d]      (odds for relevance, ratio of relevant documents)

= ( P[d | R] · P[R] ) / ( P[d | ¬R] · P[¬R] )      (Bayes' theorem)

~ P[d | R] / P[d | ¬R]

= ∏_i P[X_i | R] / P[X_i | ¬R]      (independence or linked dependence; X_i = 1 if d includes the i-th term, 0 otherwise)

sim(d, q) ~ Σ_{i ∈ q} log ( P[X_i | R] / P[X_i | ¬R] ) = Σ_{i ∈ q} ( log P[X_i | R] − log P[X_i | ¬R] )
Example: Naive Bayes classification with Binary Features

estimate: P[d ∈ ck | d has X] = P[d has X | d ∈ ck] · P[d ∈ ck] / P[d has X]

~ P[X | d ∈ ck] · P[d ∈ ck]

= ∏_{i=1}^{m} P[X_i | d ∈ ck] · P[d ∈ ck]      (with feature independence or linked dependence:
P[X | d ∈ ck] / P[X | d ∉ ck] = ∏_i P[X_i | d ∈ ck] / P[X_i | d ∉ ck])

= ∏_{i=1}^{m} p_ik^{X_i} · (1 − p_ik)^{1−X_i} · p_k      (with empirically estimated p_ik = P[X_i = 1 | ck], p_k = P[ck])

⇒ log P[ck | d] ~ Σ_{i=1}^{m} X_i · log( p_ik / (1 − p_ik) ) + Σ_{i=1}^{m} log(1 − p_ik) + log p_k

for binary classification with odds rather than probs for simplification
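A minimal sketch (hypothetical feature probabilities and class priors, names assumed) of the resulting log-posterior scoring rule:

```python
import math

def log_posterior(x, p_ik, p_k):
    """log P[c_k | d] up to an additive constant, for a binary feature
    vector x, with p_ik[i] ~ P[X_i = 1 | c_k] and prior p_k ~ P[c_k]."""
    score = math.log(p_k)
    for xi, p in zip(x, p_ik):
        score += xi * math.log(p / (1 - p)) + math.log(1 - p)
    return score

# Hypothetical estimates for two classes over m = 3 binary features
classes = {
    "sports":   ([0.7, 0.1, 0.4], 0.5),
    "politics": ([0.2, 0.6, 0.3], 0.5),
}
x = [1, 0, 1]  # document contains features 1 and 3
print(max(classes, key=lambda c: log_posterior(x, *classes[c])))
```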
Random Variables
A random variable (RV) X on the prob. space (Ω, E, P) is a function
X: Ω → M with M ⊆ ℝ s.t. {e | X(e) ≤ x} ∈ E for all x ∈ M
(X is measurable).
F_X: M → [0,1] with F_X(x) = P[X ≤ x] is the
(cumulative) distribution function (cdf) of X.
With countable set M the function f_X: M → [0,1] with f_X(x) = P[X = x]
is called the (probability) density function (pdf) of X;
in general f_X(x) is F'_X(x).
For a random variable X with distribution function F, the inverse function
F⁻¹(q) := inf{x | F(x) > q} for q ∈ [0,1] is called the quantile function of X.
(The 0.5 quantile (50th percentile) is called the median.)
Random variables with countable M are called discrete,
otherwise they are called continuous.
For discrete random variables the density function is also
referred to as the probability mass function.
Important Discrete Distributions
• Bernoulli distribution with parameter p:
P[X = x] = p^x (1 − p)^{1−x}  for x ∈ {0, 1}
• Uniform distribution over {1, 2, ..., m}:
P[X = k] = f_X(k) = 1/m  for 1 ≤ k ≤ m
• Binomial distribution (coin toss n times repeated; X: #heads):
P[X = k] = f_X(k) = (n choose k) p^k (1 − p)^{n−k}
• Poisson distribution (with rate λ):
P[X = k] = f_X(k) = e^{−λ} λ^k / k!
• Geometric distribution (#coin tosses until first head):
P[X = k] = f_X(k) = (1 − p)^k p
Important Continuous Distributions
• Uniform distribution in the interval [a, b]:
f_X(x) = 1/(b − a)  for a ≤ x ≤ b  (0 otherwise)
• Exponential distribution (e.g. time until the next event of a
Poisson process) with rate λ = lim_{Δt→0} (# events in Δt) / Δt:
f_X(x) = λ e^{−λx}  for x ≥ 0  (0 otherwise)
• Hyperexponential distribution:
f_X(x) = p λ1 e^{−λ1 x} + (1 − p) λ2 e^{−λ2 x}
Multidimensional (Multivariate) Distributions
Let X1, ..., Xm be random variables over the same prob. space
with domains dom(X1), ..., dom(Xm).
The joint distribution of X1, ..., Xm has a density function f_{X1,...,Xm}(x1, ..., xm) with
Σ_{x1 ∈ dom(X1)} ... Σ_{xm ∈ dom(Xm)} f_{X1,...,Xm}(x1, ..., xm) = 1  or
∫_{dom(X1)} ... ∫_{dom(Xm)} f_{X1,...,Xm}(x1, ..., xm) dxm ... dx1 = 1

The marginal distribution of Xi in the joint distribution
of X1, ..., Xm has the density function
Σ_{x1} ... Σ_{x_{i−1}} Σ_{x_{i+1}} ... Σ_{xm} f_{X1,...,Xm}(x1, ..., xm)  or
∫_{X1} ... ∫_{X_{i−1}} ∫_{X_{i+1}} ... ∫_{Xm} f_{X1,...,Xm}(x1, ..., xm) dxm ... dx_{i+1} dx_{i−1} ... dx1
Important Multivariate Distributions
multinomial distribution (n trials with m-sided dice):
P[X1 = k1 ∧ ... ∧ Xm = km] = f_{X1,...,Xm}(k1, ..., km) = (n choose k1 ... km) · p1^{k1} · ... · pm^{km}
with (n choose k1 ... km) := n! / (k1! · ... · km!)

multidimensional normal distribution:
f_{X1,...,Xm}(x⃗) = 1 / sqrt( (2π)^m · det(Σ) ) · exp( −(1/2) (x⃗ − μ⃗)^T Σ^{−1} (x⃗ − μ⃗) )
with covariance matrix Σ with Σij := Cov(Xi, Xj)
Moments
For a discrete random variable X with density f_X
E[X] = Σ_{k ∈ M} k · f_X(k)  is the expectation value (mean) of X
E[X^i] = Σ_{k ∈ M} k^i · f_X(k)  is the i-th moment of X
V[X] = E[(X − E[X])²] = E[X²] − E[X]²  is the variance of X

For a continuous random variable X with density f_X
E[X] = ∫_{−∞}^{+∞} x · f_X(x) dx  is the expectation value of X
E[X^i] = ∫_{−∞}^{+∞} x^i · f_X(x) dx  is the i-th moment of X
V[X] = E[(X − E[X])²] = E[X²] − E[X]²  is the variance of X

Theorem: Expectation values are additive: E[X + Y] = E[X] + E[Y]
(distributions are not)
Important Properties of Expectation and Variance
E[aX+b] = aE[X]+b for constants a, b
E[X1+X2+...+Xn] = E[X1] + E[X2] + ... + E[Xn]
(i.e. expectation values are generally additive, but distributions are not!)
Var[aX+b] = a2 Var[X] for constants a, b
Var[X1+X2+...+Xn] = Var[X1] + Var[X2] + ... + Var[Xn]
if X1, X2, ..., Xn are independent RVs
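A quick simulation check of both rules (uniform random samples, purely illustrative):

```python
import random

random.seed(0)
n = 100_000
x = [random.random() for _ in range(n)]   # X ~ Uniform(0, 1)
y = [random.random() for _ in range(n)]   # Y independent of X

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

s = [a + b for a, b in zip(x, y)]
print(mean(s), mean(x) + mean(y))   # E[X+Y] = E[X] + E[Y], both ~1.0
print(var(s), var(x) + var(y))      # independent X, Y: Var[X+Y] = Var[X] + Var[Y], both ~1/6
```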
Correlation of Random Variables
Covariance of random variables Xi and Xj:
Cov(Xi, Xj) := E[ (Xi − E[Xi]) · (Xj − E[Xj]) ]
Var(Xi) = Cov(Xi, Xi) = E[Xi²] − E[Xi]²

Correlation coefficient of Xi and Xj:
ρ(Xi, Xj) := Cov(Xi, Xj) / sqrt( Var(Xi) · Var(Xj) )
Markov Chains
[State transition diagram over states 0: sunny, 1: cloudy, 2: rainy, with transition probabilities 0.8, 0.5, 0.5, 0.4, 0.3, 0.3, 0.2 on the edges]
p0 = 0.8 p0 + 0.5 p1 + 0.4 p2
p1 = 0.2 p0 + 0.3 p2
p2 = 0.5 p1 + 0.3 p2
p0 + p1 + p2 = 1
 p0  0.657, p1 = 0.2, p2  0.143
state set: finite or infinite
state transition prob‘s: pij
time: discrete or continuous
state prob‘s in step t: pi(t) = P[S(t)=i]
Markov property: P[S(t)=i | S(0), ..., S(t-1)] = P[S(t)=i | S(t-1)]
interested in stationary state probabilities:
p_j := lim_{t→∞} p_j^(t) = lim_{t→∞} Σ_k p_k^(t−1) p_kj
⇒ p_j = Σ_k p_k · p_kj  and  Σ_j p_j = 1
guaranteed to exist for irreducible, aperiodic, finite Markov chains
Example: GAFOR Regions for Germany
[Map of the GAFOR weather-forecast regions for Germany; region 38 = Neuwieder Becken]
Markov Chains: Formal Definition
A stochastic process is a family of
random variables {X(t) | t ∈ T}.
T is called parameter space, and the domain M of X(t) is called
state space. T and M can be discrete or continuous.
A stochastic process is called Markov process if
for every choice of t1, ..., tn+1 from the parameter space and
every choice of x1, ..., xn+1 from the state space the following holds:
P [ X ( tn 1 )  xn 1 | X ( t1 )  x1  X ( t2 )  x2  ...  X ( tn )  xn ]
 P [ X ( tn 1 )  xn 1 | X ( tn )  xn ]
A Markov process with discrete state space is called Markov chain.
A canonical choice of the state space are the natural numbers.
Notation for Markov chains with discrete parameter space:
Xn rather than X(tn) with n = 0, 1, 2, ...
Properties of Markov Chains with Discrete Parameter Space (1)
The Markov chain Xn with discrete parameter space is
homogeneous if the transition probabilities
pij := P[Xn+1 = j | Xn=i] are independent of n
irreducible if every state is reachable from every other state
with positive probability:
Σ_{n ≥ 1} P[X_n = j | X_0 = i] > 0  for all i, j
aperiodic if every state i has period 1, where the
period of i is the greatest common divisor of all
(recurrence) values n for which
P[X_n = i ∧ X_k ≠ i for k = 1, ..., n−1 | X_0 = i] > 0
Properties of Markov Chains with Discrete Parameter Space (2)
The Markov chain Xn with discrete parameter space is
positive recurrent if for every state i the recurrence probability
is 1 and the mean recurrence time is finite:
Σ_{n ≥ 1} P[X_n = i ∧ X_k ≠ i for k = 1, ..., n−1 | X_0 = i] = 1
Σ_{n ≥ 1} n · P[X_n = i ∧ X_k ≠ i for k = 1, ..., n−1 | X_0 = i] < ∞
ergodic if it is homogeneous, irreducible, aperiodic, and
positive recurrent.
Results on Markov Chains with Discrete Parameter Space (1)
For the n-step transition probabilities p_ij^(n) := P[X_n = j | X_0 = i] the following holds:
p_ij^(n) = Σ_k p_ik^(n−1) · p_kj  with p_ij^(1) := p_ij
         = Σ_k p_ik^(n−l) · p_kj^(l)  for 1 ≤ l ≤ n−1  (Chapman-Kolmogorov equation)
in matrix notation: P^(n) = P^n

For the state probabilities after n steps π_j^(n) := P[X_n = j] the following holds:
π_j^(n) = Σ_i π_i^(0) · p_ij^(n)  with initial state probabilities π_i^(0)
in matrix notation: π^(n) = π^(0) · P^(n)
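A short numpy sketch with a hypothetical 3-state chain (matrix values assumed), applying the matrix form of the Chapman-Kolmogorov equation:

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
])

# Chapman-Kolmogorov in matrix form: the n-step transition matrix is P^n.
P5 = np.linalg.matrix_power(P, 5)

# State distribution after n steps: pi^(n) = pi^(0) P^n (row-vector convention).
pi0 = np.array([1.0, 0.0, 0.0])   # start in state 0
print(pi0 @ P5)
```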
Results on Markov Chains with Discrete Parameter Space (2)
Every homogeneous, irreducible, aperiodic Markov chain
with a finite number of states is positive recurrent and ergodic.
For every ergodic Markov chain there exist
stationary state probabilities π_j := lim_{n→∞} π_j^(n).
These are independent of π^(0)
and are the solutions of the following system of linear equations:
π_j = Σ_i π_i · p_ij  for all j  (balance equations)
Σ_j π_j = 1
in matrix notation: π = π · P  and  π · 1 = 1  (with 1 a vector of ones)
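A sketch of solving the balance equations numerically (the transition matrix is the same assumed example as above):

```python
import numpy as np

P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
])
n = P.shape[0]

# Balance equations pi = pi P, written as (P^T - I) pi^T = 0, plus the
# normalization sum(pi) = 1; least squares absorbs the redundant equation.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi, pi @ P)   # pi equals pi P up to numerical precision
```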
Part 2: Basics of Information Theory
Elementary Information Theory
Let f(x) be the probability (or relative frequency) of the x-th symbol
in some text d. The entropy of the text
(or the underlying prob. distribution f) is:
H(d) = Σ_x f(x) · log2 (1 / f(x))
H(d) is a lower bound for the bits per symbol
needed with optimal coding (compression).
For two prob. distributions f(x) and g(x) the
relative entropy (Kullback-Leibler divergence) of f to g is:
D(f || g) := Σ_x f(x) · log ( f(x) / g(x) )
Relative entropy is a measure for the (dis-)similarity of
two probability or frequency distributions.
It corresponds to the average number of additional bits
needed for coding information (events) with distribution f
when using an optimal code for distribution g.
The cross entropy of f(x) to g(x) is:
H(f, g) := H(f) + D(f || g) = − Σ_x f(x) · log g(x)
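A direct transcription of the three quantities in Python (the example distributions f and g are assumed):

```python
import math

def entropy(f):
    """H(f) = sum_x f(x) log2(1/f(x)), in bits."""
    return sum(p * math.log2(1 / p) for p in f.values() if p > 0)

def kl_divergence(f, g):
    """D(f || g); assumes g(x) > 0 wherever f(x) > 0."""
    return sum(p * math.log2(p / g[x]) for x, p in f.items() if p > 0)

def cross_entropy(f, g):
    """H(f, g) = H(f) + D(f || g) = -sum_x f(x) log2 g(x)."""
    return entropy(f) + kl_divergence(f, g)

f = {"a": 0.5, "b": 0.25, "c": 0.25}
g = {"a": 0.4, "b": 0.4, "c": 0.2}
print(entropy(f), kl_divergence(f, g), cross_entropy(f, g))
```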
Compression
• Text is sequence of symbols (with specific frequencies)
• Symbols can be
• letters or other characters from some alphabet 
• strings of fixed length (e.g. trigrams)
• or words, bits, syllables, phrases, etc.
Limits of compression:
Let pi be the probability (or relative frequency)
of the i-th symbol in text d.
Then the entropy of the text H(d) = Σ_i pi · log2 (1/pi)
is a lower bound for the average number of bits per symbol
in any compression (e.g. Huffman codes)
Note:
compression schemes such as Ziv-Lempel (used in zip)
are better because they consider context beyond single symbols;
with appropriately generalized notions of entropy
the lower-bound theorem does still hold
The Entropy of Flickr
The Entropy of Flickr (2)
Part 3: Using Vectors and Matrices
IR and Linear Algebra
Vector space model revisited..
 Terms characterize documents (term-document matrix)
 But also: documents characterize terms..
 User/resource/tag relationships in folksonomies
 Alternate document/term representation forms also possible:
 e.g. concept based (using thesaurus like WordNet)
 e.g. with rich background document corpus (using Wikipedia in ESA)
But matrix representation is also frequently used for
 Representing connections in graph models (adjacency matrix)
 Representing transitions in stochastic models (probabilistic matrix)
.. and many other IR relevant applications and methods
Foundations from Linear Algebra
A set S of vectors is called linearly independent if no
x ∈ S can be written as a linear combination of other vectors in S.
The rank of matrix A is the maximal number of
linearly independent row or column vectors.
A basis of an n×n matrix A is a set S of linearly independent row or
column vectors such that all rows or columns are linear combinations
of vectors from S.
A set S of n×1 vectors is an orthonormal basis if for all x, y ∈ S:
‖x‖₂ := sqrt( Σ_{i=1}^{n} x_i² ) = 1 = ‖y‖₂  and  x^T · y = 0 for x ≠ y
Eigenvalues and Eigenvectors
Let A be a real-valued n×n matrix, x a real-valued n×1 vector,
and λ a real-valued scalar. Solutions x ≠ 0 and λ of the equation
A·x = λ·x are called an Eigenvector and Eigenvalue of A.
Eigenvectors of A are vectors whose direction is preserved
by the linear transformation described by A.
The Eigenvalues of A are the roots (zeros) of the
characteristic polynomial f(λ) of A:
f(λ) = det(A − λI) = 0
with the determinant (expansion along the i-th row):
det(A) = Σ_{j=1}^{n} (−1)^{i+j} a_ij det(A^(ij)),  where matrix A^(ij) is derived from A by
removing the i-th row and the j-th column
The real-valued n×n matrix A is symmetric if a_ij = a_ji for all i, j.
A is positive definite if for all n×1 vectors x ≠ 0: x^T · A · x > 0.
If A is symmetric then all Eigenvalues of A are real.
If A is symmetric and positive definite then all Eigenvalues are positive.
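A small numpy check of the defining equation A·x = λ·x (the example matrix is assumed):

```python
import numpy as np

# A symmetric (hence real-eigenvalued) example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Eigenvalues are the roots of det(A - lambda I) = 0.
vals, vecs = np.linalg.eig(A)
for lam, v in zip(vals, vecs.T):
    # Check A v = lambda v for each eigenpair.
    print(lam, np.allclose(A @ v, lam * v))
```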
Singular Value Decomposition (SVD)
Theorem:
Each real-valued m×n matrix A with rank r can be decomposed
into the form A = U · Σ · V^T
with an m×r matrix U with orthonormal column vectors,
an r×r diagonal matrix Σ, and
an n×r matrix V with orthonormal column vectors.
This decomposition is called singular value decomposition
and is unique when the elements of Σ are sorted.
Theorem:
In the singular value decomposition A = U · Σ · V^T of matrix A
the matrices U, Σ, and V can be derived as follows:
• Σ consists of the singular values of A,
i.e. the positive square roots of the Eigenvalues of A^T · A,
• the columns of U are the Eigenvectors of A · A^T,
• the columns of V are the Eigenvectors of A^T · A.
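A numpy sketch checking the decomposition and the relation between singular values and the Eigenvalues of A^T·A (random example matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# numpy returns A = U @ diag(s) @ Vt with singular values s sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))

# Singular values are the positive square roots of the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)          # ascending order
print(np.allclose(np.sort(s**2), eigvals))
```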
SVD for Regression
Theorem:
Let A be an m×n matrix with rank r, and let A_k = U_k · Σ_k · V_k^T,
where the k×k diagonal matrix Σ_k contains the k largest singular values
of A and the m×k matrix U_k and the n×k matrix V_k contain the
corresponding Eigenvectors from the SVD of A.
Among all m×n matrices C with rank at most k,
A_k is the matrix that minimizes the Frobenius norm
‖A − C‖_F² = Σ_{i=1}^{m} Σ_{j=1}^{n} (A_ij − C_ij)²
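A short illustration of the theorem (random example matrix, k = 2 assumed): the truncated SVD gives the best rank-k approximation in Frobenius norm, here compared against an arbitrary rank-k matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Rank-k approximation A_k = U_k Sigma_k V_k^T (keep the k largest singular values).
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error; by the theorem no rank-k matrix does better.
print(np.linalg.norm(A - A_k, "fro"))
# A random rank-k matrix for comparison (typically much worse).
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 4))
print(np.linalg.norm(A - B, "fro"))
```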
SVD and Vector Space Model
Term-document matrix A (m×n): rows t_i (term i), columns d_j (doc j).
Observation: the dot product t_i^T · t_j gives the „similarity“ between t_i, t_j over the
documents. The matrix product A·A^T contains all these dot products. Element
(x,y) (= element (y,x)) contains the dot product t_x^T · t_y.
The matrix A^T·A contains the dot products between document vectors, and
gives their „similarity“ over terms: element (m,n) (= element (n,m)) = d_m^T · d_n.
SVD Theorem: In the singular value decomposition A = U · Σ · V^T of matrix A
the matrices U, Σ, and V can be derived as follows:
• Σ consists of the singular values of A,
i.e. the positive square roots of the Eigenvalues of A^T · A,
• the columns of U are the Eigenvectors of A·A^T,
• the columns of V are the Eigenvectors of A^T·A.
Key Idea of Latent Concept Models
Objective:
Transformation of document vectors from
high-dimensional term vector space into
lower-dimensional topic vector space with
• exploitation of term correlations
(e.g. „Web“ and „Internet“ frequently occur together)
• implicit differentiation of polysemes that exhibit
different term correlations for different meanings
(e.g. „Java“ with „Library“ vs. „Java“ with „Kona Blend“ vs. „Java“ with „Borneo“)
mathematically:
given: m terms, n docs (usually n > m) and a
mn term-document similarity matrix A,
needed: largely similarity-preserving mapping
of column vectors of A
into k-dimensional vector space (k << m) for given k
Latent Semantic Indexing (LSI)
[Deerwester et al. 1990]
A is the m×n term-document matrix. Then:
• U and U_k are the m×r and m×k term-topic similarity matrices,
• V and V_k are the n×r and n×k document-topic similarity matrices,
• A·A^T and A_k·A_k^T are the term-term similarity matrices,
• A^T·A and A_k^T·A_k are the document-document similarity matrices
[Diagram: A (m×n, rows = terms t_i, columns = docs d_j) ≈ A_k = U_k (m×k) · Σ_k (k×k, top-k singular values of the r×r matrix Σ) · V_k^T (k×n), with the reduced dimensions indexed by latent topics t]
mapping of m×1 vectors into latent-topic space:
U_k^T · d_j =: d_j'    and    U_k^T · q =: q'
scalar-product similarity in latent-topic space: d_j'^T · q' = ((Σ_k · V_k^T)_{*j})^T · q'
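A compact sketch of the whole pipeline (the term-document matrix, query, and k are all assumed toy values):

```python
import numpy as np

# A small hypothetical term-document matrix (m terms x n docs).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 1, 1, 2],
], dtype=float)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vkt = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Fold documents and a query into the k-dimensional latent-topic space.
docs_latent = Uk.T @ A                     # equals Sk @ Vkt
q = np.array([1, 0, 1, 0], dtype=float)    # hypothetical query over the 4 terms
q_latent = Uk.T @ q

# Rank documents by scalar-product similarity in latent-topic space.
scores = docs_latent.T @ q_latent
print(np.argsort(-scores), scores)
```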
Part 4: Understanding Social Networks
Small World Experiment (Six degrees of separation)
Stanley Milgram: 1967: Letters were handed out to people in Nebraska
to be sent to a target (stock broker) in Boston
 People were instructed to pass on the letters to someone they knew
on a first-name basis
 The letters that reached the destination followed paths of avg length
5.2 (i.e. around 6)
Duncan Watts: 2001: Milgram's experiment recreated on the internet
 using an e-mail message as the "package" that needed to be
delivered, with 48,000 senders and 19 targets (in 157 countries).
 the avg number of intermediaries was also around 6.
See also:
 The Kevin Bacon game
 The Erdös number
 etc.
Kevin Bacon Experiment
Craig Fass, Brian Turtle and Mike Ginelli: 1994: motivated by Bacon's
most recent movie „The Air Up There” and his career discussion
Vertices: actors and actresses
Edge between u and v if they appeared in a movie together
Kevin Bacon: No. of movies: 46, No. of actors: 1811, Average separation: 2.79
Is Kevin Bacon the most connected actor?
See also: http://oracleofbacon.org/

Rank  Name               Average distance  # of movies  # of links
1     Rod Steiger        2.537527          112          2562
2     Donald Pleasence   2.542376          180          2874
3     Martin Sheen       2.551210          136          3501
4     Christopher Lee    2.552497          201          2993
5     Robert Mitchum     2.557181          136          2905
6     Charlton Heston    2.566284          104          2552
7     Eddie Albert       2.567036          112          3333
8     Robert Vaughn      2.570193          126          2761
9     Donald Sutherland  2.577880          107          2865
10    John Gielgud       2.578980          122          2942
11    Anthony Quinn      2.579750          146          2978
12    James Earl Jones   2.584440          112          3787
…
876   Kevin Bacon        2.786981          46           1811
Quantities of Interest
Connected components:
 how many, and how large?
Network diameter:
 maximum (worst-case) or average?
 exclude infinite distances? (disconnected components)
 the small-world phenomenon
Clustering:
 to what extent do links tend to cluster “locally”?
 what is the balance between local and long-distance connections?
 what roles do the two types of links play?
Degree distribution:
 what is the typical degree in the network?
 what is the overall distribution?
Graph theory: undirected graph notation
Graph G=(V,E)
– V = set of vertices
– E = set of edges
[Figure: undirected graph on vertices {1, 2, 3, 4, 5}]
undirected graph
E = {(1,2), (1,3), (2,3), (3,4), (4,5)}
Graph theory: directed graph notation
Graph G=(V,E)
– V = set of vertices
– E = set of edges
[Figure: directed graph on vertices {1, 2, 3, 4, 5}]
directed graph
E = {‹1,2›, ‹2,1›, ‹1,3›, ‹3,2›, ‹3,4›, ‹4,5›}
undirected graph: degree distribution
degree d(i) of node i
– number of edges incident on node i
degree sequence
– [d(1), d(2), d(3), d(4), d(5)] = [2, 2, 3, 2, 1]
degree distribution
– [(1,1), (2,3), (3,1)]
[Figure: the undirected example graph on vertices {1, ..., 5}]
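The same quantities computed from the edge list of the example graph:

```python
from collections import Counter

# The undirected example graph from the slides.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

degree_sequence = [degree[i] for i in sorted(degree)]
degree_distribution = sorted(Counter(degree_sequence).items())
print(degree_sequence)       # [2, 2, 3, 2, 1]
print(degree_distribution)   # [(1, 1), (2, 3), (3, 1)]
```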
Directed graph: in/outdegrees
in-degree d_in(i) of node i
– number of edges pointing to node i
out-degree d_out(i) of node i
– number of edges leaving node i
in-degree sequence
– [1, 2, 1, 1, 1]
out-degree sequence
– [2, 1, 2, 1, 0]
[Figure: the directed example graph on vertices {1, ..., 5}]
Degree distributions
f_k = fraction of nodes with degree k
    = probability of a randomly selected node to have degree k
[Plot: degree distribution, frequency f_k versus degree k]
Problem: find the probability distribution that best fits the observed data
Power-law distributions
The degree distributions of most real-life networks follow a
power law
p(k) = C k^{−α}
Right-skewed/heavy-tail distribution:
 there is a non-negligible fraction of nodes that has very
high degree (hubs)
 scale-free: no characteristic scale, the average is not
informative
In contrast, a Poisson degree distribution (as in random graphs) is
 highly concentrated around the mean
 the probability of very high degree nodes is
exponentially small
Power-law signature
p(k) = C k^{−α}
The power-law distribution gives a line in the log-log plot:
log p(k) = −α log k + log C
[Plots: degree distribution on linear axes (frequency vs. degree) and on log-log axes, where the power law appears as a straight line with slope −α]
α: power-law exponent (typically 2 ≤ α ≤ 3)
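A sketch of estimating the exponent from this signature by a least-squares line fit in log-log space (synthetic frequencies, parameters assumed):

```python
import numpy as np

# Hypothetical observed degree frequencies f_k for degrees k = 1..50.
ks = np.arange(1, 51)
alpha_true, C = 2.5, 0.6
fk = C * ks ** (-alpha_true)

# log p(k) = -alpha * log k + log C: a straight line in log-log space,
# so fitting log f_k against log k recovers the exponent and constant.
slope, intercept = np.polyfit(np.log(ks), np.log(fk), 1)
print(-slope, np.exp(intercept))   # ~2.5 and ~0.6
```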
Power-law: Examples
Taken from [Newman 2003]
Collective Statistics (M. Newman 2003)
Web Structure: Power-Law Degrees
Study of Web Graph (Broder et al. 2000)
• power-law distributed degrees: P[degree = k] ~ (1/k)^α
with α ≈ 2.1 for indegrees and α ≈ 2.7 for outdegrees
Clustering (Transitivity) coefficient
Measures the density of triangles (local clusters) in the graph
Two different ways to measure it:
The ratio of the means:
C^(1) = Σ_i (# triangles centered at node i) / Σ_i (# triples centered at node i)
Example
[Figure: example graph on vertices {1, 2, 3, 4, 5}]
C^(1) = 3 / (1 + 1 + 6) = 3/8
Clustering (Transitivity) coefficient
Clustering coefficient for node i:
C_i = (# triangles centered at node i) / (# triples centered at node i)
The mean of the ratios:
C^(2) = (1/n) Σ_i C_i
Example
[Figure: the same example graph on vertices {1, 2, 3, 4, 5}]
C^(2) = (1/5) · (1 + 1 + 1/6) = 13/30
C^(1) = 3/8
The two clustering coefficients give different measures
C(2) increases with nodes with low degree
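A sketch computing both coefficients; the adjacency list is an assumed reconstruction of the example graph (chosen so that C^(1) = 3/8 and C^(2) = 13/30):

```python
from itertools import combinations

# Assumed example graph: nodes 1, 2, 3 form a triangle; 4 and 5 hang off node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3}, 5: {3}}

tri = {i: sum(1 for u, v in combinations(adj[i], 2) if v in adj[u]) for i in adj}
triples = {i: len(adj[i]) * (len(adj[i]) - 1) // 2 for i in adj}

c1 = sum(tri.values()) / sum(triples.values())                        # ratio of the means
c2 = sum(tri[i] / triples[i] for i in adj if triples[i]) / len(adj)   # mean of the ratios
print(c1, c2)   # 0.375 and 0.4333...
```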
Collective Statistics (M. Newman 2003)
Clustering coefficient for random graphs
The probability of two of your neighbors also being neighbors is p,
independent of local structure
 clustering coefficient C = p
 when the average degree z is fixed, C = z/n = O(1/n)
Graphs: paths
Path from node i to node j: a sequence of edges (directed or undirected) from
node i to node j
 path length: number of edges on the path
 nodes i and j are connected
 cycle: a path that starts and ends at the same node
[Figures: an undirected and a directed example graph on vertices {1, ..., 5}]
Graphs: shortest paths
Shortest Path from node i to node j
 also known as BFS path, or geodesic path
[Figures: example graphs with a shortest path between two nodes highlighted]
Measuring the small world phenomenon
d_ij = shortest path between i and j
Diameter: d = max_{i,j} d_ij
Characteristic path length: ℓ = ( 1 / (n(n−1)/2) ) Σ_{i > j} d_ij
Harmonic mean: ℓ^{−1} = ( 1 / (n(n−1)/2) ) Σ_{i > j} d_ij^{−1}
[Figure: example graph on vertices {1, ..., 5}]
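A sketch of these quantities via breadth-first search (a small assumed graph):

```python
from collections import deque

def bfs_distances(adj, s):
    """Shortest-path (hop) distances from s to all reachable nodes."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical small undirected graph.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
nodes = sorted(adj)
pairs = [(i, j) for i in nodes for j in nodes if i < j]
d = {(i, j): bfs_distances(adj, i)[j] for i, j in pairs}

diameter = max(d.values())
char_path_len = sum(d.values()) / len(pairs)
harmonic_mean = len(pairs) / sum(1 / x for x in d.values())
print(diameter, char_path_len, harmonic_mean)
```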
Collective Statistics (M. Newman 2003)
Degree correlations
Do high degree nodes tend to link to high degree nodes?
Newman
 compute the correlation coefficient of the degrees xi, yi of the
two endpoints of an edge i
Collective Statistics (M. Newman 2003)
Graphs: fully connected cliques
Clique Kn
A graph that has all possible n(n-1)/2 edges
[Figure: the clique K5 on vertices {1, ..., 5}]
Directed Graph
Strongly connected graph: there exists a path from every i to every j
Weakly connected graph: if edges are made to be undirected, the graph is connected
[Figure: a directed example graph on vertices {1, ..., 5}]
Web Structure: Connected Components
Study of Web Graph (Broder et al. 2000)
SCC = strongly connected component
Source: A.Z. Broder et al., WWW 2000
..strongly connected core tends to have small diameter
Summary: real network properties
Most nodes have only a small number of neighbors
(degree), but there are some nodes with very high degree
(power-law degree distribution)
 scale-free networks
If a node x is connected to y and z, then y and z are likely to
be connected
 high clustering coefficient
Most nodes are just a few edges away on average.
 small world networks
A “Canonical” Natural Network has…
Few connected components:
 often only 1 or a small number, indep. of network size
Small diameter:
 often a constant independent of network size (like 6)
 or perhaps growing only logarithmically with network size, or even shrinking?
 typically exclude infinite distances
A high degree of clustering:
 considerably more so than for a random network
 in tension with small diameter
A heavy-tailed degree distribution:
 a small but reliable number of high-degree vertices
 often of power law form
Is it possible that there is a unifying underlying generative process?
Digression: epidemics in social networks
Observations:
 Many viruses do not appear in the wild at all
 Some viruses are widely present in animal/human populations
What is the virus propagation speed?
What fraction of the population will be infected?
Problems:
 Communication patterns
• Influences the spread mechanism of the virus
• May change over time
 Transfer probability
• May change over time
 Death of the virus
• Virus effects (computer crash, human death)
• Agent action (computer cleaning, human hospitalization, …)
Infection models
SI model (rumor spreading)
 susceptible → infected
SIS model (birth rate/death rate)
 susceptible → infected → susceptible
SIR model
 susceptible → infected → recovered

Continuous models
 work deterministically
 solved through differential equations
Discrete models
 graph-based models
 based on random contacts (walks, calls, etc.)
 analysis of Markov processes
Infection models (2)
SI model (rumor spreading)
 susceptible → infected
 At the beginning, one individual is infected
 Each contact infects one new individual
 In each time step β contacts between individuals take place
SIS model (birth rate/death rate)
 susceptible → infected → susceptible
 Like the SI model, but a fraction δ of all infected individuals
recovers and can be infected again
 Alternatively: with probability δ the individual can be infected
again
SIR model
 susceptible → infected → recovered
 Like the SIS model, but recovered individuals cannot be infected
again (immunization)
Epidemic communication in arbitrary social networks
Takes topological characteristics into account without being
limited by them
Discrete time
SIS-like infection: a node is healthy at time t if it
 was healthy before t and not infected at t
[Figure: node X with neighbors N1, N2, N3, some healthy and some infected]
Topology-independent epidemic model
Takes topological characteristics into account without being
limited by them
Discrete time
A node is healthy at time t if it
 Was healthy before t and not infected at t OR
 Was infected before t, cured and not re-infected at t
Topology-independent epidemic model
Takes topological characteristics into account without being
limited by them
Discrete time
A node is healthy at time t if it
 Was healthy before t and not infected at t OR
 Was infected before t, cured and not re-infected at t OR
 Was infected before t, therefore ignored re-infection
attempts and was subsequently cured at t
Epidemic threshold
The epidemic threshold τ is the value such that
 β/δ < τ ⇒ there is no epidemic
 where β = birth rate, and δ = death rate
What is this threshold for an arbitrary graph?
[Theorem 1]
τ = 1/ λ1,A
where λ1,A is the largest eigenvalue of the
adjacency matrix A of the topology
λ1,A alone captures the
property of the graph!
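A numpy sketch of the threshold for an assumed star topology (which also matches the star formula λ1 = √d on the next slide):

```python
import numpy as np

# Adjacency matrix of a hypothetical topology: a star with d = 4 leaves.
A = np.zeros((5, 5))
A[0, 1:] = 1
A[1:, 0] = 1

# Theorem 1: tau = 1 / lambda_1(A), the reciprocal of the largest
# eigenvalue of the adjacency matrix (real, since A is symmetric).
lambda1 = np.linalg.eigvalsh(A).max()
tau = 1 / lambda1
print(lambda1, tau)   # for a star: lambda_1 = sqrt(d) = 2, tau = 0.5

beta, delta = 0.05, 0.2   # assumed birth and death rates
print("no epidemic" if beta / delta < tau else "epidemic possible")
```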
Epidemic threshold for various networks
Homogeneous networks
 λ1,A = <k>; τ = 1/<k>
 where <k> = average degree
Star networks
 λ1,A = √d; τ = 1/ √d
 where d = the degree of the central node
Finite power-law networks
 τ = 1/ λ1,A
Epidemic threshold
[Theorem 1] The epidemic threshold is given by
» τ = 1/ λ1,A
How fast will an infection die out?
[Theorem 2] Below the epidemic threshold, the epidemic
dies out exponentially
 If β/δ < τ (β = birth rate, δ = death rate) then
 any local breakout of infection dies out exponentially fast
Epidemic threshold experiments (Star)
[Plot: number of infected nodes over time (0–200) for a star topology with β = 0.016 and δ ∈ {0.04, 0.08, 0.12, 0.16, 0.20}; curves annotated β/δ > τ (above threshold), β/δ = τ (close to the threshold), and β/δ < τ (below threshold)]
Epidemic threshold experiments (Oregon)
[Plot: number of infected nodes over time (0–1000) for the Oregon topology with β = 0.001 and δ ∈ {0.05, 0.06, 0.07}; curves annotated β/δ > τ (above threshold), β/δ = τ (at the threshold), and β/δ < τ (below threshold)]
SI-Model: Deterministic modeling with continuous time
n: the number of individuals remains constant
S(t): healthy individuals at time t
I(t): infected individuals at time t
Relative fractions: s(t) := S(t)/n, i(t) := I(t)/n
In each time step, each individual contacts β partners
 Assumption: of the β contact partners of an infected agent, β · s(t) are healthy
 Assumption: all I(t) infected agents together produce β · s(t) · I(t) infections
Recursive relationships (absolute and relative fractions):
 I(t+1) = I(t) + β i(t) S(t)        i(t+1) = i(t) + β i(t) s(t)
 S(t+1) = S(t) − β i(t) S(t)        s(t+1) = s(t) − β i(t) s(t)
SI-Model (2)
 i(t+1) = i(t) + β i(t) s(t)
 s(t+1) = s(t) − β i(t) s(t)
Idea:
 treat i(t) as a continuous function
 i(t+1) − i(t) then corresponds to its first derivative: di/dt = β i(t) s(t) = β i(t) (1 − i(t))
Solution: the logistic function i(t) = i(0) e^{βt} / (1 − i(0) + i(0) e^{βt})
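A sketch comparing the discrete recursion with the continuous logistic solution stated above (β and i(0) are assumed values):

```python
import math

beta, i0, steps = 0.3, 0.01, 40

# Discrete SI recursion from the slides.
i = i0
trajectory = [i]
for _ in range(steps):
    s = 1 - i
    i = i + beta * i * s
    trajectory.append(i)

# Continuous logistic solution of di/dt = beta * i * (1 - i).
logistic = lambda t: i0 * math.exp(beta * t) / (1 - i0 + i0 * math.exp(beta * t))
for t in (0, 10, 20, 30, 40):
    print(t, round(trajectory[t], 3), round(logistic(t), 3))
```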
SI-Model: Interpretation
Theorem:
Until n/2 individuals are infected, the number of infected ones
grows exponentially
Then the number of healthy individuals shrinks exponentially