* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Information Diffusion
IEEE 802.1aq wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
Computer network wikipedia , lookup
Network tap wikipedia , lookup
Lecture 27:
Information diffusion
CS 765 Complex Networks
Slides are modified from Lada Adamic, David Kempe, Bill Hackbor
outline
factors influencing information diffusion
network structure: which nodes are connected?
strength of ties: how strong are the connections?
studies in information diffusion:
Granovetter: the strength of weak ties
J-P Onnela et al: strength of intermediate ties
Kossinets et al: strength of backbone ties
Davis: board interlocks and adoption of practices
network position and access to information
Burt: Structural holes and good ideas
Aral and van Alstyne: networks and information advantage
networks and innovation
Lazer and Friedman: innovation
factors influencing diffusion
network structure (unweighted)
density
degree distribution
clustering
connected components
community structure
strength of ties (weighted)
frequency of communication
strength of influence
spreading agent
attractiveness and specificity of information
Strong tie defined
A strong tie
frequent contact
affinity
many mutual contacts
“forbidden triad”:
strong ties are
likely to “close”
Less likely to be a bridge (or a local bridge)
Source: Granovetter, M. (1973). "The Strength of Weak Ties",
edge embeddeness
embeddeness: number of common neighbors the two
endpoints have
neighborhood overlap:
school kids and 1st through 8th choices of friends
snowball sampling:
will you reach more different kids by asking each kid to name
their 2 best friends, or their 7th & 8th closest friend?
Source: M. van Alstyne, S. Aral. Networks, Information & Social Capital
is it good to be embedded?
What are the advantages of occupying an embedded
position in the network?
What are the disadvantages of being embedded?
Advantages of being a broker (spanning structural
holes)?
Disadvantages of being a broker?
outline
factors influencing information diffusion
network structure: which nodes are connected?
strength of ties: how strong are the connections?
studies in information diffusion:
Granovetter: the strength of weak ties
J-P Onnela et al: strength of intermediate ties
Kossinets et al: strength of backbone ties
Davis: board interlocks and adoption of practices
network position and access to information
Burt: Structural holes and good ideas
Aral and van Alstyne: networks and information advantage
networks and innovation
Lazer and Friedman: innovation
how does strength of a tie influence diffusion?
M. S. Granovetter: The Strength of Weak Ties, AJS, 1973:
finding a job through a contact that one saw
frequently (2+ times/week) 16.7%
occasionally (more than once a year but < 2x week) 55.6%
rarely 27.8%
but… length of path is short
contact directly works for/is the employer
or is connected directly to employer
strength of tie: frequency of communication
Kossinets, Watts, Kleinberg, KDD 2008:
which paths yield the most up to date info?
how many of the edges form the “backbone”?
source: Kossinets et al. “The structure of information pathways in a social communication network”
the strength of intermediate ties
strong ties
frequent communication, but ties are redundant due to high
clustering
weak ties
reach far across network, but communication is infrequent…
“Structure and tie strengths in mobile communication networks”
use nation-wide cellphone call records and simulate diffusion
using actual call timing
Localized strong ties slow infection spread.
source: Onnela J. et.al. Structure and tie strengths in mobile communication networks
how can information diffusion be different from
simple contagion (e.g. a virus)?
simple contagion:
infected individual infects neighbors with information at some
rate
threshold contagion:
individuals must hear information (or observe behavior) from a
number or fraction of friends before adopting
in lab: complex contagion (Centola & Macy, AJS, 2007)
how do you pick individuals to “infect” such that your opinion
prevails
http://www.ladamic.com/netlearn/NetLogo4/DiffusionCompetition.html
Framework
The network consists of nodes (individual) and edges
(links between nodes)
Each node is in one of two states
Susceptible (in other words, healthy)
Infected
Susceptible-Infected-Susceptible (SIS) model
Cured nodes immediately become susceptible
Infected by neighbor
Susceptible
Infected
Cured
internally
Framework (Continued)
Homogeneous birth rate β on all edges between infected
and susceptible nodes
Homogeneous death rate δ for infected nodes
Healthy
Prob. δ
N2
Prob. β
N1
Infected
X
N3
SIR and SIS Models
An SIS model consists of two group
Susceptible: Those who may contract the disease
Infected: Those infected
An SIR model consists of three group
Susceptible: Those who may contract the disease
Infected: Those infected
Recovered: Those with natural immunity or those that have died.
Important Parameters
α is the transmission coefficient, which determines the rate at
which the disease travels from one population to another.
γ is the recovery rate: (I persons)/(days required to recover)
R0 is the basic reproduction number.
R0 So (So )1 /
(Number of new cases arising from one infective) x (Average duration of infection)
If R0 > 1 then ∆I > 0 and an epidemic occurs
SIR and SIS Models
SIS Model:
S SI I
I SI I
SIR Model:
S SI
I SI I
R I
Diffusion in networks: ER graphs
review: diffusion in ER graphs
http://www.ladamic.com/netlearn/NetLogo501/ERDiffusion.html
Quiz Q:
When the density of the network increases, diffusion in
the network is
faster
slower
unaffected
ER graphs: connectivity and density
nodes infected after 10 steps, infection rate = 0.15
average degree = 2.5
average degree = 10
Quiz Q:
When nodes preferentially attach to high degree nodes,
the diffusion over the network is
faster
slower
unaffected
Diffusion in “grown networks”
nodes infected after 4 steps, infection rate = 1
preferential attachment
non-preferential growth
http://www.ladamic.com/netlearn/NetLogo501/BADiffusion.html
Diffusion in small worlds
What is the role of the long-range links in diffusion over small
world topologies?
http://www.ladamic.com/netlearn/NetLogo4/SmallWorldDiffusionSIS.html
diffusion of innovation
surveys:
farmers adopting new varieties of hybrid corn by observing what
their neighbors were planting (Ryan and Gross, 1943)
doctors prescribing new medication (Coleman et al. 1957)
spread of obesity & happiness in social networks (Christakis and
Fowler, 2008)
online behavioral data:
Spread of Flickr photos & Digg stories
(Lerman, 2007)
joining LiveJournal groups & CS conferences
(Backstrom et al. 2006)
+ others e.g. Anagnostopoulos et al. 2008
Open question: how do we tell influence from
correlation?
approaches:
time resolved data: if adoption time is shuffled, does it yield the
same patterns?
if edges are directed: does reversing the edge direction yield
less predictive power?
27
Example: adopting new practices
poison pills
diffused through interlocks
geography had little to do with it
more likely to be influenced
by tie to firm doing something
similar & having similar centrality
golden parachutes
did not diffuse through interlocks
geography was a significant factor
more likely to follow “central” firms
why did one diffuse through the “network” while the other
did not?
Source: Corporate Elite Networks and Governance Changes in the 1980s.
Social Network and Spread of
Influence
Social network plays a
fundamental role as a medium
for the spread of INFLUENCE
among its members
Opinions, ideas, information,
innovation…
Direct Marketing takes the “word-of-mouth”
effects to significantly increase profits
Gmail, Tupperware popularization, Microsoft
Origami …
Problem Setting
Given
a limited budget B for initial advertising
e.g. give away free samples of product
estimates for influence between individuals
Goal
trigger a large cascade of influence
e.g. further adoptions of a product
Question
Which set of individuals should B target at?
Application besides product marketing
spread an innovation
detect stories in blogs
Models of Influence
First mathematical models
[Schelling '70/'78, Granovetter '78]
Large body of subsequent work:
[Rogers '95, Valente '95, Wasserman/Faust '94]
Two basic classes of diffusion models: threshold
and cascade
General operational view:
A social network is represented as a directed graph,
with each person (customer) as a node
Nodes start either active or inactive
An active node may trigger activation of neighboring nodes
Monotonicity assumption: active nodes never deactivate
Threshold dynamics
The network:
• aij is the adjacency matrix (N ×N)
• un-weighted
• undirected
aij {0,1}
aij a ji
The nodes:
• are labelled i , i from 1 to N;
• have a state ; vi (t ) {0,1}
• and a threshold ri from some distribution.
Threshold dynamics
Node i has state vi (t ) {0,1}
and threshold ri
Neighbourhood average: r i
1
ki
a v
ij
j
j
Updating:
if ri ri
1
vi
unchanged otherwise
The fraction of nodes in state vi=1 is r(t):
Example
Inactive Node
0.6
0.3
Active Node
0.2
X
Threshold
0.2
Active neighbors
0.1
0.4
U
0.5
w
0.3
0.5
Stop!
0.2
v
Independent Cascade Model
When node v becomes active, it has a single chance of
activating each currently inactive neighbor w.
The activation attempt succeeds with probability pvw .
Example
0.6
Inactive Node
0.3
0.2
X
0.4
0.5
w
0.2
U
0.1
0.3
0.2
0.5
v
Stop!
Active Node
Newly active
node
Successful
attempt
Unsuccessful
attempt
outline
factors influencing information diffusion
network structure: which nodes are connected?
strength of ties: how strong are the connections?
studies in information diffusion:
Granovetter: the strength of weak ties
J-P Onnela et al: strength of intermediate ties
Kossinets et al: strength of backbone ties
Davis: board interlocks and adoption of practices
network position and access to information
Burt: Structural holes and good ideas
Aral and van Alstyne: networks and information advantage
networks and innovation
Lazer and Friedman: innovation
Burt: structural holes and good ideas
Managers asked to come up with an idea to improve the
supply chain
Then asked:
whom did you discuss the idea with?
whom do you discuss supply-chain issues with in general
do those contacts discuss ideas with one another?
673 managers (455 (68%) completed the survey)
~ 4000 relationships (edges)
results
people whose networks bridge structural holes have
higher compensation
positive performance evaluations
more promotions
more good ideas
these brokers are
more likely to express ideas
less likely to have their ideas dismissed by judges
more likely to have their ideas evaluated as valuable
Aral & Alstyne: Study of a head hunter firm
Three firms initially
Unusually measurable inputs and outputs
1300 projects over 5 yrs and
125,000 email messages over 10 months (avg 20% of time!)
Metrics
(i) Revenues per person and per project,
(ii) number of completed projects,
(iii) duration of projects,
(iv) number of simultaneous projects,
(v) compensation per person
Main firm 71 people in executive search (+2 firms partial data)
27 Partners, 29 Consultants, 13 Research, 2 IT staff
Four Data Sets per firm
52 Question Survey (86% response rate)
E-Mail
Accounting
15 Semi-structured interviews
Email structure matters
Coefficientsa
New Contract Revenue
Unstandardized Coefficients
B
Ave. E-Mail Size
Colleagues’ Ave.
Response Time
a.
b.
Sig. F
B
Std. Error
0.40
(Base Model)
Best structural pred.
Unstandardized Coefficients
Adj. R2
Std. Error
Contract Execution Revenue Coefficientsa
12604.0***
-10.7**
-198947.0
Adj. R2
Sig. F
0.19
4454.0
0.52
.006
1544.0**
639.0
0.30
.021
4.9
0.56
.042
-9.3*
4.7
0.34
.095
168968.0
0.56
.248
157789.0
0.42
.026
Dependent Variable: Bookings02
Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO.
-368924.0**
a. Dependent Variable: Billings02
b.
N=39. *** p<.01, ** p<.05, * p<.1
Sending shorter e-mail helps get contracts and finish them.
Faster response from colleagues helps finish them.
diverse networks drive performance by
providing access to novel information
network structure (having high degree) correlates with
receiving novel information sooner (as deduced from
hashed versions of their email)
getting information sooner correlates with $$ brought in
controlling for # of
years worked
job level
….
Network Structure Matters
Coefficientsa
New Contract Revenue
Unstandardized Coefficients
B
Std. Error
Betweenness
a.
b.
Unstandardized Coefficients
Sig. F
B
Std. Error
0.40
(Base Model)
Size Struct. Holes
Adj. R2
Contract Execution Revenue Coefficientsa
Adj. R2
Sig. F
0.19
13770***
4647
0.52
.006
7890*
1297*
773
0.47
.040
1696**
Dependent Variable: Bookings02
Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO.
4656
0.24
.100
697
0.30
.021
a. Dependent Variable: Billings02
b.
N=39. *** p<.01, ** p<.05, * p<.1
Bridging diverse communities is significant.
Being in the thick of information flows is significant.
outline
factors influencing information diffusion
network structure: which nodes are connected?
strength of ties: how strong are the connections?
studies in information diffusion:
Granovetter: the strength of weak ties
J-P Onnela et al: strength of intermediate ties
Kossinets et al: strength of backbone ties
Davis: board interlocks and adoption of practices
network position and access to information
Burt: Structural holes and good ideas
Aral and van Alstyne: networks and information advantage
networks and innovation
Lazer and Friedman: innovation
networked coordination game
choice between two things, A and B
e.g. basketball and soccer
if friends choose A, they get payoff a
if friends choose B, they get payoff b
if one chooses A while the other chooses B, their payoff
is 0
coordinating with one’s friends
Let A = basketball, B = soccer. Which one should you learn to play?
fraction p = 3/5 play basketball
fraction p = 2/5 play soccer
which choice has higher payoff?
d neighbors
p fraction play basketball (A)
(1-p) fraction play soccer (B)
if choose A, get payoff
p * d *a
if choose B, get payoff
(1-p) * d * b
so should choose A if
p d a ≥ (1-p) d b
or
p ≥ b / (a + b)
two equilibria
everyone adopts A
everyone adopts B
what happens in between?
What if two nodes switch at random? Will a cascade
occur?
example:
a = 3, b = 2
payoff for nodes interaction using behavior A is 3/2
as large as what they get if they both choose B
nodes will switch from B to A if at least
q = 2/(3+2) = 2/5 of their neighbors are using A
how does a cascade occur
suppose 2 nodes start playing basketball due to
external factors
e.g. they are bribed with a free pair of shoes by some
devious corporation
Quiz Q:
Which node(s) will switch to playing basketball next?
the complete cascade
you pick the initial 2 nodes
A larger example (Easley/Kleinberg Ch. 19)
does the cascade spread throughout the network?
http://www.ladamic.com/netlearn/NetLogo412/CascadeModel.html
implications for viral marketing
if you could pay a small number of individuals to use
your product,
which individuals would you pick?
Quiz question:
What is the role of communities in complex contagion
enabling ideas to spread in the presence of thresholds
creating isolated pockets impervious to outside ideas
allowing different opinions to take hold in different parts of the
network
bilingual nodes
so far nodes could only choose between A and B
what if you can play both A and B,
but pay an additional cost c?
try it on a line
Increase the cost of being bilingual so that no node
chooses to do so. Let the cascade run
Now lower the cost.
What happens?
Quiz Q:
The presence of bilingual nodes
helps the superior solution to spread throughout the network
helps inferior options to persist in the network
causes everyone in the network to become bilingual
knowledge, thresholds, and collective action
nodes need to coordinate across a network, but
have limited horizons
can individuals coordinate?
each node will act if at least x people (including
itself) mobilize
nodes will not mobilize
mobilization
there will be some turnout
innovation in networks
network topology influences who talks to whom
who talks to whom has important implications for
innovation and learning
better to innovate or imitate?
brainstorming:
more minds together,
but also danger of groupthink
working in isolation:
more independence
slower progress
in a network context
modeling the problem space
Kauffman’s NK model
N dimensional problem space
N bits, each can be 0 or 1
K describes the smoothness of the fitness landscape
how similar is the fitness of sequences with only 1-2 bits flipped
(K = 0, no similarity, K large, smooth fitness)
Kauffman’s NK model
fitness
K large
distance
K medium
K small
Update rules
As a node, you start out with a random bit string
At each iteration
If one of your neighbors has a solution that is more fit than yours,
imitate (copy their solution)
Otherwise innovate by flipping one of your bits
Quiz Q:
Relative to the regular lattice, the network with many
additional, random connections has on average:
slower convergence to a local optimum
smaller improvement in the best solution relative to the initial
maximum
more oscillations between solutions
networks and innovation:
is more information diffusion always better?
linear network
fully connected network
Nodes can innovate on their own (slowly) or adopt
their neighbor’s solution
Best solutions propagate through the network
source: Lazer and Friedman, The Parable of the Hare and the Tortoise: Small Worlds, Diversity, and System Performance
networks and innovation
fully connected network
converges more quickly on
a solution, but if there are
lots of local maxima in the
solution space, it may get
stuck without finding
optimum.
linear network (fewer
edges) arrives at better
solution eventually
because individuals
innovate longer
Coordination: graph coloring
Application: coloring a map: limited set of colors, no
two adjacent countries should have the same color
graph coloring on a network
Each node is a human subject. Different experimental
conditions:
knowledge of neighbors’ color
knowledge of entire network
Compare:
regular ring lattice
small-world topology
scale-free networks
Kearns et al., ‘An Experimental Study of the Coloring Problem on Human Subject Networks’,
Science, 313(5788), pp. 824-827, 2006
simulation
Kearns et al. “An Experimental Study of the Coloring
Problem on Human Subject Networks”
network structure affects convergence in coordination games,
e.g. graph coloring
http://www.ladamic.com/netlearn/NetLogo4/GraphColoring.html
Quiz Q:
As the rewiring probability is increased from 0 to 1 the
following happens:
the solution time decreases
the solution time increases
the solution time initially decreases then increases again
to sum up
network topology influences processes occurring on
networks
what state the nodes converge to
how quickly they get there
process mechanism matters:
simple vs. complex contagion
coordination
learning
diffusion can be simple (person to person) or complex
(individuals having thresholds)
people in special network positions (the brokers) have an
advantage in receiving novel info & coming up with
“novel” ideas
in some scenarios, information diffusion may hinder
innovation