Programming for
Geographical Information Analysis:
Advanced Skills
Online mini-lecture: Introduction to Complex
Networks
Dr Andy Evans
This Lecture
Types of Network
Random
Spatial
Scale-free
Small-world
Network Statistics
Network types
Various types of abstract graph have been
suggested. We mentioned two in lecture four: the
tree and the lattice.
Some appear to be more useful for understanding
real world social and environmental networks.
The simplest of these is the Random Graph.
Nodes are connected randomly in some manner.
Erdős–Rényi Construction
Produces the simplest Random Graph.
Edges are progressively added, with each node having
the same probability of being involved.
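A minimal sketch of this construction in Python, assuming the variant in which a fixed number of edges is added one at a time between uniformly chosen pairs of nodes. The function name `random_graph`, its parameters and the dictionary-of-sets representation are illustrative choices, not part of the lecture.

```python
import random


def random_graph(n_nodes, n_edges, seed=None):
    """Add edges one at a time between uniformly chosen pairs of nodes.

    Returns the graph as {node: set of neighbours}. The name and the
    representation are assumptions made for this sketch.
    """
    max_edges = n_nodes * (n_nodes - 1) // 2
    if n_edges > max_edges:
        raise ValueError("more edges requested than distinct node pairs")
    rng = random.Random(seed)
    graph = {node: set() for node in range(n_nodes)}
    added = 0
    while added < n_edges:
        # Every node has the same chance of being an end of the new edge.
        a, b = rng.sample(range(n_nodes), 2)
        if b not in graph[a]:  # skip pairs that are already connected
            graph[a].add(b)
            graph[b].add(a)
            added += 1
    return graph


graph = random_graph(n_nodes=50, n_edges=100, seed=1)
mean_degree = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
print(mean_degree)  # 4.0, i.e. 2 * n_edges / n_nodes
```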
Spatial Graphs
Where the ability to connect between nodes is
constrained by space.
Generally this means a higher probability of connection
to nearby nodes.
Various types: including random-spatial.
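One way a random-spatial graph might be sketched: nodes are scattered over a unit square and each pair is connected with a probability that falls off with distance. The exponential decay, the `scale` parameter and the function name are assumptions chosen for illustration; other distance rules would serve equally well.

```python
import math
import random
from itertools import combinations


def random_spatial_graph(n_nodes, scale=0.1, seed=None):
    """Connect nodes in the unit square, favouring nearby pairs.

    The connection probability exp(-distance / scale) is an assumed
    decay rule, used here only to illustrate the idea.
    """
    rng = random.Random(seed)
    positions = {node: (rng.random(), rng.random()) for node in range(n_nodes)}
    graph = {node: set() for node in range(n_nodes)}
    for a, b in combinations(range(n_nodes), 2):
        distance = math.dist(positions[a], positions[b])
        if rng.random() < math.exp(-distance / scale):
            graph[a].add(b)
            graph[b].add(a)
    return positions, graph


positions, graph = random_spatial_graph(n_nodes=200, scale=0.1, seed=1)

# Compare the typical length of an edge with the typical distance
# between any two nodes: edges should be noticeably shorter.
edge_lengths = [math.dist(positions[a], positions[b])
                for a in graph for b in graph[a] if a < b]
pair_lengths = [math.dist(positions[a], positions[b])
                for a, b in combinations(range(200), 2)]
mean_edge_length = sum(edge_lengths) / len(edge_lengths)
mean_pair_length = sum(pair_lengths) / len(pair_lengths)
print(mean_edge_length < mean_pair_length)
```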
Caveman Graphs
Individual highly-connected groups.
No connection between groups.
Network statistics
Distribution/average of node degree.
Distances:
Eccentricity: distance from a node to the node furthest
from it.
Average path length: average shortest-path distance over all pairs of nodes.
Radius: minimum eccentricity in the graph.
Diameter: maximum eccentricity in the graph.
Global clustering: how many nodes are connected in
complete connection triangles (triadic closures) as a
proportion of the connected triplets in the graph.
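The statistics above can be sketched with a breadth-first search over an unweighted, connected graph. Here average path length is taken as the mean shortest-path distance over all pairs of nodes, and global clustering as closed triplets divided by connected triplets. The function names and the small example graph (a triangle with a two-node tail) are assumptions for illustration.

```python
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path distance (in edges) from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    return distances


def distance_statistics(graph):
    """Eccentricities, radius, diameter and APL; assumes a connected graph."""
    eccentricity = {}
    total = 0
    for node in graph:
        distances = bfs_distances(graph, node)
        eccentricity[node] = max(distances.values())
        total += sum(distances.values())
    n = len(graph)
    return {
        "eccentricity": eccentricity,
        "radius": min(eccentricity.values()),
        "diameter": max(eccentricity.values()),
        "average_path_length": total / (n * (n - 1)),
    }


def global_clustering(graph):
    """Closed triplets as a proportion of all connected triplets."""
    closed = 0
    triplets = 0
    for nbrs in graph.values():
        # Each pair of neighbours forms a connected triplet centred here.
        for a, b in combinations(nbrs, 2):
            triplets += 1
            if b in graph[a]:
                closed += 1
    return closed / triplets if triplets else 0.0


# Triangle 0-1-2 with a tail 2-3-4.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
stats = distance_statistics(graph)
print(stats["radius"], stats["diameter"])  # 2 3
print(stats["average_path_length"])        # 1.7
print(global_clustering(graph))            # 0.5
```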
Network statistics
Trees: low average degree, narrow degree distribution, low clustering, high APL.
Lattices: low average degree, narrow degree distribution, low clustering, high APL.
Network statistics
Random: low average degree, normal degree distribution, low clustering, low APL.
Spatial: medium average degree, narrow degree range, medium clustering, long APL.
Caveman: high average degree, narrow degree range, high clustering, infinite APL.
Scale-free Networks
Barabási and Albert looked at real networks, including the internet.
They saw that the distribution of links matched an inverse power law:
Number of nodes of degree k ∝ k^{-x}
This relationship holds whatever the value of k, i.e. the distribution is scale-free.
[Figure: percentage of nodes of various degrees (axis 0 to 90) against degree (axis 0 to 20), with curves for x = 1, x = 2 and x = 3.]
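To get a feel for the inverse power law, the percentage of nodes at each degree can be computed for different exponents. This sketch assumes degrees run from 1 to 20 and that the percentages are normalised to sum to 100 over that range; the function name is hypothetical.

```python
def power_law_percentages(x, max_degree=20):
    """Percentage of nodes at each degree k, with weight k ** -x."""
    weights = {k: k ** -x for k in range(1, max_degree + 1)}
    total = sum(weights.values())
    return {k: 100 * weight / total for k, weight in weights.items()}


for x in (1, 2, 3):
    percentages = power_law_percentages(x)
    # Show the share of nodes with degree 1, 5 and 20 for this exponent.
    print(x, [round(percentages[k], 2) for k in (1, 5, 20)])
```

A larger exponent concentrates more of the nodes at low degree, so high-degree nodes become rarer.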
Barabási–Albert construction
Attach more edges to those nodes that already have
more edges.
Probability of attachment proportional to node degree.
Produces a scale-free network.
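A minimal sketch of preferential attachment, assuming the graph starts from a small fully connected core and that each new node makes `m` links. The function name, the starting core and the parameter `m` are illustrative assumptions.

```python
import random


def preferential_attachment(n_nodes, m, seed=None):
    """Grow a graph in which new nodes prefer well-connected targets.

    Assumes n_nodes > m >= 1. Starts from a fully connected core of
    m + 1 nodes (an assumption of this sketch).
    """
    rng = random.Random(seed)
    graph = {node: set() for node in range(m + 1)}
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            graph[a].add(b)
            graph[b].add(a)
    # Each node appears once per edge end, so a uniform pick from this
    # list selects a node with probability proportional to its degree.
    edge_ends = [node for node, nbrs in graph.items() for _ in nbrs]
    for new_node in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(edge_ends))
        graph[new_node] = set()
        for target in sorted(targets):
            graph[new_node].add(target)
            graph[target].add(new_node)
            edge_ends.extend([new_node, target])
    return graph


graph = preferential_attachment(n_nodes=500, m=2, seed=1)
degrees = [len(nbrs) for nbrs in graph.values()]
print(max(degrees), sum(degrees) / len(degrees))  # largest hub vs mean degree
```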
Scale-free Networks
Still a fairly high number of nodes of degree 5 or more.
These are known as Hubs.
Basis (kinda) for the Google PageRank algorithm.
Networks have a high resistance to random failures.
High clustering, but the degree of clustering relates to network size: larger networks have lower clustering.
[Figure: percentage of nodes of various degrees (axis 0 to 90) against degree (axis 0 to 20), with curves for x = 1, x = 2 and x = 3.]
Scale-free Networks
Scale-free networks seem like the kinds of networks that
might be good for modelling people.
But, does social clustering really change with size of
network?
There is some evidence that human group sizes are
limited.
Dunbar Number
Robin Dunbar suggests that human brain size implies a limit of ~150
people, which seems to match pre-industrial communities.
But others have found a wide range of figures.
There is some evidence that once groups grow above this limit
the core group doesn’t scale, but a new hierarchy of group
management develops.
Either way, the core group size is unlikely to scale with the
network.
♫♪ It’s a small world after all ♫♪
How is it we often meet complete strangers with whom we have a
mutual acquaintance?
It’s said that you’re only six mutual associates away from anyone in
the world (“Six Degrees of Separation”).
Stanley Milgram (1967) sent packages to people in Nebraska and
Kansas, with instructions to pass them to people they thought
might be closer to targets in Massachusetts. Took an average of 5
steps to arrive.
How can this be possible given the following..?
Every person knows only around a thousand people.
There are six billion people on the globe.
The Kevin Bacon Game
Can you link any actor to Bacon via co-stars in films?
Anyone who’s co-starred in a film with Kevin Bacon has a Bacon
Number of one.
Anyone who’s been in a film with a co-star of Bacon has a Bacon
Number of two, etc.
Six Degrees of Kevin Bacon
Barbara Windsor has a Bacon number of three.
Barbara Windsor was in Comrades (1987) with Robert Stephens
Robert Stephens was in Chaplin (1992) with Diane Lane
Diane Lane was in My Dog Skip (2000) with Kevin Bacon
Steve McFadden has a Bacon
number of two
Steve McFadden was in Buster (1988)
with Phil Collins
Phil Collins was in Balto (1995) with
Kevin Bacon
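Bacon Numbers are shortest-path distances in a co-star graph, so they can be found with a breadth-first search. The sketch below uses only the five films and pairings listed above, so the numbers it gives are for that tiny cast list; the function names are assumptions for illustration.

```python
from collections import deque
from itertools import combinations

# Only the films and co-stars named in the examples above.
films = {
    "Comrades (1987)": ["Barbara Windsor", "Robert Stephens"],
    "Chaplin (1992)": ["Robert Stephens", "Diane Lane"],
    "My Dog Skip (2000)": ["Diane Lane", "Kevin Bacon"],
    "Buster (1988)": ["Steve McFadden", "Phil Collins"],
    "Balto (1995)": ["Phil Collins", "Kevin Bacon"],
}


def co_star_graph(films):
    """Link every pair of actors who appear in the same film."""
    graph = {}
    for cast in films.values():
        for actor in cast:
            graph.setdefault(actor, set())
        for a, b in combinations(cast, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bacon_numbers(graph, centre="Kevin Bacon"):
    """Breadth-first search outwards from the centre actor."""
    numbers = {centre: 0}
    queue = deque([centre])
    while queue:
        actor = queue.popleft()
        for co_star in graph[actor]:
            if co_star not in numbers:
                numbers[co_star] = numbers[actor] + 1
                queue.append(co_star)
    return numbers


numbers = bacon_numbers(co_star_graph(films))
print(numbers["Barbara Windsor"])  # 3
print(numbers["Steve McFadden"])   # 2
```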
Is Kevin Bacon the centre of the
Universe?
The Internet Movie Database has ~850,000 connected films. Each
film has an average of 61 actors.
Yet the maximum Bacon Number found so far is only 12.
The average number of films linking any actor to Bacon is only
2.980.
So why is this so?
Because social groups are a form of network known as Small World
graphs.
Small World graphs
A mix of strongly clustered groups with a few hub individuals who
know many groups (causing the social groups to overlap).
Like scale-free networks, they fall between the extremes in the level of
local clustering and average path length.
But with more realistic clustering, which doesn’t scale with network size.
Watts and Strogatz construction
Start with a ring network, with each point connected to
its k neighbours (i.e. start with strong clustering).
Rewire each edge to a randomly picked node, with
probability β.
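The two steps can be sketched as follows, assuming k is even (k/2 neighbours on each side of the ring), that there are more than k nodes, and that a rewired edge keeps one end and moves the other to a node it is not already linked to. The function name and these details are assumptions, not the lecture's own code.

```python
import random


def watts_strogatz(n_nodes, k, beta, seed=None):
    """Ring lattice with each edge rewired with probability beta.

    Assumes k is even and n_nodes > k.
    """
    rng = random.Random(seed)
    graph = {node: set() for node in range(n_nodes)}
    # Step 1: ring in which each node links to its k nearest neighbours.
    for node in range(n_nodes):
        for offset in range(1, k // 2 + 1):
            nbr = (node + offset) % n_nodes
            graph[node].add(nbr)
            graph[nbr].add(node)
    # Step 2: visit each lattice edge once and maybe rewire its far end.
    for offset in range(1, k // 2 + 1):
        for node in range(n_nodes):
            old = (node + offset) % n_nodes
            if old in graph[node] and rng.random() < beta:
                candidates = [c for c in range(n_nodes)
                              if c != node and c not in graph[node]]
                if candidates:
                    new = rng.choice(candidates)
                    graph[node].discard(old)
                    graph[old].discard(node)
                    graph[node].add(new)
                    graph[new].add(node)
    return graph


lattice = watts_strogatz(n_nodes=20, k=4, beta=0.0)
small_world = watts_strogatz(n_nodes=20, k=4, beta=0.2, seed=1)
print(sum(len(nbrs) for nbrs in small_world.values()) // 2)  # 40 edges
```

Rewiring moves edges rather than adding them, so the edge count stays at n_nodes * k / 2 whatever the value of beta.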
More characteristics
Average Path Length is proportional to ln(vertices).
Average Path Length is inversely proportional to
ln(associates).
The Average Path Length decreases extremely rapidly as
lynchpins / shortcuts increase slightly from nothing.
Shortcuts cross vast areas of variable space to link with
unexpected groups.
Very robust to random losses – at worst flows will route to
another hub.
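As a rough worked example of the two proportionalities, the figures quoted earlier (six billion people, around a thousand associates each) can be plugged into ln(vertices) / ln(associates). Taking the constant of proportionality as 1 is an assumption made purely for illustration, so the result is an order-of-magnitude guide rather than a prediction.

```python
import math

people = 6e9       # "six billion people on the globe"
associates = 1000  # "every person knows only around a thousand people"

# Assumed constant of proportionality of 1.
estimated_steps = math.log(people) / math.log(associates)
print(round(estimated_steps, 2))  # 3.26
```

The estimate has the same order of magnitude as the handful of steps reported for Milgram's packages.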
Spatial graphs
Shortcuts are rare (it’s easier to link to nearby nodes than
stretch to the other side of a net) so they rarely show Small
World characteristics.
In such networks the Average Path Length scales more
linearly with the number of vertices.
Example of a real network
Disease spread.
2001 UK Foot and Mouth epizootic.
Farm-to-farm spread by air: spatial network.
Farm-to-farm spread by cattle movements: small-world
network.
Foot and Mouth daily cases
Cutting movements improved on the 1967 outbreak.
Cases decreased when the probability of infection was lowered.
[Figure: daily cases (axis 0 to 50) from 24 Feb to 29 Apr, with 1967 for comparison; annotations: 24hr cull policy, Healthy cull policy, Initial May 5th predictions (400 d^-1).]
Source: BBC / MAFF 4 May 2001
Uses of Small World theory
The spread of disease (Watts, 1999).
Spreading is controlled by…
The length of time that someone is infectious.
The length of time someone is removed (sick but not infectious,
or if infinite = immune or dead).
The infection probability / rate between 0 and 1.
People are either Susceptible, Infectious or Removed.
Watts mapped the proportions of these groups in Small
World societies and physically limited networks for
different disease parameters.
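A minimal sketch of a Susceptible / Infectious / Removed simulation on a network, in discrete time steps. It assumes removal is permanent (the "immune or dead" case) and that each infectious node gets one chance per step to infect each susceptible neighbour. This is not Watts's own model; the function name and parameters are illustrative assumptions.

```python
import random


def simulate_sir(graph, infection_prob, infectious_steps, start, seed=None):
    """Run until nobody is infectious; return final states and counts.

    Assumes infectious_steps >= 1 and that removal is permanent.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in graph}
    state[start] = "I"
    remaining = {start: infectious_steps}  # steps left as infectious
    history = []
    while remaining:
        newly_infected = set()
        for node in graph:
            if state[node] != "I":
                continue
            for nbr in graph[node]:
                if state[nbr] == "S" and rng.random() < infection_prob:
                    newly_infected.add(nbr)
        # Age the current infections, removing those that have run out.
        for node in list(remaining):
            remaining[node] -= 1
            if remaining[node] == 0:
                state[node] = "R"
                del remaining[node]
        for node in newly_infected:
            state[node] = "I"
            remaining[node] = infectious_steps
        values = list(state.values())
        history.append({s: values.count(s) for s in "SIR"})
    return state, history


# A ring of ten people, each in contact with two neighbours.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
final_state, history = simulate_sir(ring, infection_prob=1.0,
                                    infectious_steps=1, start=0)
print(history[-1])  # {'S': 0, 'I': 0, 'R': 10}
```

Adding a few long-range shortcuts to `ring` and lowering `infection_prob` is one way to explore how shortcuts change the outcome.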
Violent deadly diseases
Such diseases reach equilibrium when people are removed faster than the disease spreads.
There’s a massive difference in deaths dependent on shortcuts.
Hence cutting off diseased population is vital.
[Figure: Small World. Curves against probability of infection (0 to 1) for fraction of shortcuts = 0 and fraction of shortcuts = 0.9; annotations: Tipping point (disease takes off), Everyone dies.]
Other characteristics of disease spread
If the disease infects the whole population, the time to do
so is also strongly dependent on the fraction of shortcuts.
In physically limited graphs, however, the spread is about
the same whatever the range over which vertices can
connect.
Diseases are worse in Small World situations, but more
easily controlled.
Other uses of Small World theory
Spread of information / fashion / “memes”.
The resilience of networks to attack.
The efficiency of distribution systems.
Software
Masses of software
E.g. Inflow
Network Centrality
Small-World Networks
Cluster Analysis
Network Density
Prestige / Influence
Structural Equivalence
Network Neighborhood
External / Internal Ratio
Weighted Average Path Length
Shortest Paths & Path Distribution
Other key statistics
Centrality: various measures, including degree, but two
are:
Betweenness centrality: number of shortest paths
passing through a node.
Closeness centrality: average of shortest paths to all
other nodes.
Node degree (or other) correlation: how similar are
nodes to their neighbours?
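The two centrality measures can be sketched for an unweighted, connected graph. Closeness is reported here as the reciprocal of the average shortest-path distance, which is a common convention and an assumption of this sketch, so that higher means more central. Betweenness uses Brandes' accumulation method, counting a fractional share where several shortest paths tie. Function names and the example graph are illustrative.

```python
from collections import deque


def closeness(graph, source):
    """Reciprocal of the mean shortest-path distance to other nodes."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    return (len(distances) - 1) / sum(distances.values())


def betweenness(graph):
    """Shortest paths through each node (undirected, unweighted)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                              # nodes in BFS order
        predecessors = {node: [] for node in graph}
        n_paths = {node: 0 for node in graph}   # shortest paths from source
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in distance:
                    distance[nbr] = distance[node] + 1
                    queue.append(nbr)
                if distance[nbr] == distance[node] + 1:
                    n_paths[nbr] += n_paths[node]
                    predecessors[nbr].append(node)
        # Work back from the furthest nodes, sharing out dependencies.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = n_paths[pred] / n_paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each unordered pair was counted from both ends.
    return {node: value / 2 for node, value in score.items()}


# Triangle 0-1-2 with a tail 2-3-4.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(betweenness(graph))   # {0: 0.0, 1: 0.0, 2: 4.0, 3: 3.0, 4: 0.0}
print(closeness(graph, 2))  # 0.8
```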