Lecture 27:
Information diffusion
CS 765 Complex Networks
Slides are modified from Lada Adamic, David Kempe, Bill Hackbor
outline
 factors influencing information diffusion
 network structure: which nodes are connected?
 strength of ties: how strong are the connections?
 studies in information diffusion:
 Granovetter: the strength of weak ties
 J-P Onnela et al: strength of intermediate ties
 Kossinets et al: strength of backbone ties
 Davis: board interlocks and adoption of practices
 network position and access to information
 Burt: Structural holes and good ideas
 Aral and van Alstyne: networks and information advantage
 networks and innovation
 Lazer and Friedman: innovation
factors influencing diffusion
 network structure (unweighted)
 density
 degree distribution
 clustering
 connected components
 community structure
 strength of ties (weighted)
 frequency of communication
 strength of influence
 spreading agent
 attractiveness and specificity of information
Strong tie defined
 A strong tie
 frequent contact
 affinity
 many mutual contacts
“forbidden triad”:
strong ties are
likely to “close”
 Less likely to be a bridge (or a local bridge)
Source: Granovetter, M. (1973). "The Strength of Weak Ties",
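edge embeddedness
 embeddedness: number of common neighbors the two
endpoints have
 neighborhood overlap: number of common neighbors of the two
endpoints, divided by the number of nodes adjacent to at least
one of them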
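Both edge measures can be computed directly from adjacency sets. The sketch below is illustrative only: the function names and the toy graph are hypothetical, and neighborhood overlap is taken to be the standard ratio (common neighbors divided by nodes adjacent to at least one endpoint, not counting the endpoints themselves).

```python
def embeddedness(adj, a, b):
    """Number of common neighbors of the edge endpoints a and b."""
    return len((adj[a] & adj[b]) - {a, b})


def neighborhood_overlap(adj, a, b):
    """Common neighbors divided by nodes adjacent to at least one endpoint."""
    common = (adj[a] & adj[b]) - {a, b}
    either = (adj[a] | adj[b]) - {a, b}
    return len(common) / len(either) if either else 0.0


# Hypothetical toy graph: a triangle A-B-C plus a pendant edge C-D.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

for edge in [("A", "B"), ("A", "C"), ("C", "D")]:
    print(edge, embeddedness(adj, *edge), neighborhood_overlap(adj, *edge))
```

An overlap of 0, as on the edge C-D here, marks a local bridge: the endpoints share no contacts.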
school kids and 1st through 8th choices of friends
 snowball sampling:
 will you reach more different kids by asking each kid to name
their 2 best friends, or their 7th & 8th closest friend?
Source: M. van Alstyne, S. Aral. Networks, Information & Social Capital
is it good to be embedded?
 What are the advantages of occupying an embedded
position in the network?
 What are the disadvantages of being embedded?
 Advantages of being a broker (spanning structural
holes)?
 Disadvantages of being a broker?
how does strength of a tie influence diffusion?
 M. S. Granovetter: The Strength of Weak Ties, AJS, 1973:
 finding a job through a contact that one saw
 frequently (2+ times/week) 16.7%
 occasionally (more than once a year but < 2x week) 55.6%
 rarely 27.8%
 but… length of path is short
 contact directly works for/is the employer
 or is connected directly to employer
strength of tie: frequency of communication
 Kossinets, Watts, Kleinberg, KDD 2008:
 which paths yield the most up to date info?
 how many of the edges form the “backbone”?
source: Kossinets et al. “The structure of information pathways in a social communication network”
the strength of intermediate ties
 strong ties
 frequent communication, but ties are redundant due to high
clustering
 weak ties
 reach far across network, but communication is infrequent…
 “Structure and tie strengths in mobile communication networks”
 use nation-wide cellphone call records and simulate diffusion
using actual call timing
Localized strong ties slow infection spread.
source: Onnela J. et al. Structure and tie strengths in mobile communication networks
how can information diffusion be different from
simple contagion (e.g. a virus)?
 simple contagion:
 infected individual infects neighbors with information at some
rate
 threshold contagion:
 individuals must hear information (or observe behavior) from a
number or fraction of friends before adopting
 in lab: complex contagion (Centola & Macy, AJS, 2007)
 how do you pick individuals to “infect” such that your opinion
prevails
http://www.ladamic.com/netlearn/NetLogo4/DiffusionCompetition.html
Framework
 The network consists of nodes (individual) and edges
(links between nodes)
 Each node is in one of two states
 Susceptible (in other words, healthy)
 Infected
 Susceptible-Infected-Susceptible (SIS) model
 Cured nodes immediately become susceptible
[Figure: SIS state diagram. Susceptible → Infected when infected by a neighbor; Infected → Susceptible when cured internally.]
Framework (Continued)
 Homogeneous birth rate β on all edges between infected
and susceptible nodes
 Homogeneous death rate δ for infected nodes
[Figure: infected node X with neighbors N1, N2 and N3. X infects a healthy neighbor with prob. β and is cured (becomes healthy) with prob. δ.]
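The SIS framework can be simulated on a graph in a few lines. This is a minimal sketch under stated assumptions: time is discretized into synchronous steps, β and δ are treated as per-step probabilities, and the function names and the ring network are hypothetical.

```python
import random


def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    new_infected = set()
    for node in adj:
        if node in infected:
            # An infected node is cured with probability delta and is
            # immediately susceptible again; otherwise it stays infected.
            if rng.random() >= delta:
                new_infected.add(node)
        else:
            # Each infected neighbor transmits independently with prob. beta.
            for nb in adj[node]:
                if nb in infected and rng.random() < beta:
                    new_infected.add(node)
                    break
    return new_infected


def run_sis(adj, seeds, beta, delta, steps, seed=0):
    """Return the number of infected nodes at each step (including step 0)."""
    rng = random.Random(seed)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history


# Hypothetical example: a ring of 20 nodes with one initially infected node.
n = 20
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
history = run_sis(ring, seeds={0}, beta=0.5, delta=0.2, steps=30)
print(history)
```

Raising β or lowering δ lets the infection persist; on a sparse ring it can also die out by chance.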
SIR and SIS Models
An SIS model consists of two groups
 Susceptible: Those who may contract the disease
 Infected: Those infected
An SIR model consists of three groups
 Susceptible: Those who may contract the disease
 Infected: Those infected
 Recovered: Those with natural immunity or those that have died.
Important Parameters
 α is the transmission coefficient, which determines the rate at
which the disease travels from one population to another.
 γ is the recovery rate: (I persons)/(days required to recover)
 R0 is the basic reproduction number.

$R_0 = \alpha S_0 / \gamma = (\alpha S_0)(1/\gamma)$

(Number of new cases arising from one infective) x (Average duration of infection)
If R0 > 1 then ∆I > 0 and an epidemic occurs
SIR and SIS Models
SIS Model:
$\Delta S = -\alpha S I + \gamma I$
$\Delta I = \alpha S I - \gamma I$
SIR Model:
$\Delta S = -\alpha S I$
$\Delta I = \alpha S I - \gamma I$
$\Delta R = \gamma I$
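The compartment models can be iterated numerically to see the R0 > 1 condition at work. The sketch below assumes the usual mass-action form (new infections αSI, recoveries γI per unit time), works with population fractions, and uses a hypothetical step size `dt`; the function names and parameter values are illustrative.

```python
def basic_reproduction_number(alpha, s0, gamma):
    """R0 = alpha * S0 / gamma."""
    return alpha * s0 / gamma


def simulate(model, s0, i0, alpha, gamma, steps, dt=0.1):
    """Iterate the SIS or SIR difference equations; returns a list of (S, I, R)."""
    s, i, r = s0, i0, 0.0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = alpha * s * i * dt
        recoveries = gamma * i * dt
        if model == "SIS":
            # Recovered individuals return to the susceptible pool.
            s, i = s - new_infections + recoveries, i + new_infections - recoveries
        elif model == "SIR":
            # Recovered individuals are removed from circulation.
            s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        else:
            raise ValueError("model must be 'SIS' or 'SIR'")
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters, as fractions of the population.
print(basic_reproduction_number(alpha=0.5, s0=0.99, gamma=0.2))
sir = simulate("SIR", s0=0.99, i0=0.01, alpha=0.5, gamma=0.2, steps=200)
print(max(i for _, i, _ in sir))
```

With α = 0.5 the infected fraction grows at first; dropping α to 0.1 puts R0 below 1 and the infected fraction shrinks from the first step.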
Diffusion in networks: ER graphs
 review: diffusion in ER graphs
http://www.ladamic.com/netlearn/NetLogo501/ERDiffusion.html
Quiz Q:
 When the density of the network increases, diffusion in
the network is
 faster
 slower
 unaffected
ER graphs: connectivity and density
nodes infected after 10 steps, infection rate = 0.15
average degree = 2.5
average degree = 10
Quiz Q:
 When nodes preferentially attach to high degree nodes,
the diffusion over the network is
 faster
 slower
 unaffected
Diffusion in “grown networks”
 nodes infected after 4 steps, infection rate = 1
preferential attachment
non-preferential growth
http://www.ladamic.com/netlearn/NetLogo501/BADiffusion.html
Diffusion in small worlds

What is the role of the long-range links in diffusion over small
world topologies?
http://www.ladamic.com/netlearn/NetLogo4/SmallWorldDiffusionSIS.html
diffusion of innovation
 surveys:
 farmers adopting new varieties of hybrid corn by observing what
their neighbors were planting (Ryan and Gross, 1943)
 doctors prescribing new medication (Coleman et al. 1957)
 spread of obesity & happiness in social networks (Christakis and
Fowler, 2008)
 online behavioral data:
 Spread of Flickr photos & Digg stories
(Lerman, 2007)
 joining LiveJournal groups & CS conferences
(Backstrom et al. 2006)
 + others e.g. Anagnostopoulos et al. 2008
Open question: how do we tell influence from
correlation?
 approaches:
 time resolved data: if adoption time is shuffled, does it yield the
same patterns?
 if edges are directed: does reversing the edge direction yield
less predictive power?
Example: adopting new practices
 poison pills
 diffused through interlocks
 geography had little to do with it
 more likely to be influenced
by tie to firm doing something
similar & having similar centrality
 golden parachutes
 did not diffuse through interlocks
 geography was a significant factor
 more likely to follow “central” firms
 why did one diffuse through the “network” while the other
did not?
Source: Corporate Elite Networks and Governance Changes in the 1980s.
Social Network and Spread of
Influence
 A social network plays a
fundamental role as a medium
for the spread of INFLUENCE
among its members
 Opinions, ideas, information,
innovation…
 Direct marketing exploits “word-of-mouth”
effects to significantly increase profits
 Gmail, Tupperware popularization, Microsoft
Origami …
Problem Setting
 Given
 a limited budget B for initial advertising
 e.g. give away free samples of product
 estimates for influence between individuals
 Goal
 trigger a large cascade of influence
 e.g. further adoptions of a product
 Question
 Which set of individuals should the budget B target?
 Application besides product marketing
 spread an innovation
 detect stories in blogs
Models of Influence
 First mathematical models
 [Schelling '70/'78, Granovetter '78]
 Large body of subsequent work:
 [Rogers '95, Valente '95, Wasserman/Faust '94]
 Two basic classes of diffusion models: threshold
and cascade
 General operational view:
 A social network is represented as a directed graph,
with each person (customer) as a node
 Nodes start either active or inactive
 An active node may trigger activation of neighboring nodes
 Monotonicity assumption: active nodes never deactivate
Threshold dynamics
The network:
• $a_{ij}$ is the adjacency matrix ($N \times N$)
• un-weighted: $a_{ij} \in \{0, 1\}$
• undirected: $a_{ij} = a_{ji}$
The nodes:
• are labelled $i$, $i$ from 1 to $N$;
• have a state $v_i(t) \in \{0, 1\}$;
• and a threshold $r_i$ from some distribution.
Threshold dynamics
Node $i$ has state $v_i(t) \in \{0, 1\}$ and threshold $r_i$
Neighbourhood average: $\bar{r}_i = \frac{1}{k_i} \sum_j a_{ij} v_j$, where $k_i$ is the degree of node $i$
Updating: $v_i \leftarrow 1$ if $\bar{r}_i \ge r_i$; unchanged otherwise
The fraction of nodes in state $v_i = 1$ is $r(t) = \frac{1}{N} \sum_i v_i(t)$
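The threshold update can be run as a short simulation. This is a sketch under assumptions: updates are synchronous, a node activates when the active fraction of its neighbors is at least its threshold (the choice of "at least" rather than "strictly more than" is an assumption), and the path graph and function names are hypothetical.

```python
def threshold_step(adj, state, thresholds):
    """One synchronous update: inactive nodes activate if enough neighbors are active."""
    new_state = dict(state)
    for i, nbrs in adj.items():
        if state[i] == 1 or not nbrs:
            continue  # active nodes never deactivate; isolated nodes never change
        neighborhood_average = sum(state[j] for j in nbrs) / len(nbrs)
        if neighborhood_average >= thresholds[i]:
            new_state[i] = 1
    return new_state


def run_threshold(adj, state, thresholds, max_steps=100):
    """Iterate until no node changes; returns final state and active fraction per step."""
    fractions = [sum(state.values()) / len(state)]
    for _ in range(max_steps):
        next_state = threshold_step(adj, state, thresholds)
        if next_state == state:
            break
        state = next_state
        fractions.append(sum(state.values()) / len(state))
    return state, fractions


# Hypothetical example: a path 0-1-2-3 with node 0 initially active.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
initial = {0: 1, 1: 0, 2: 0, 3: 0}
final, fractions = run_threshold(path, initial, {i: 0.5 for i in path})
print(fractions)
```

With every threshold at 0.5 the activation travels down the path one node per step; raising the thresholds to 0.6 stops it at the seed.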
Example
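[Figure: threshold model example on a small network with nodes u, v, w and x. Legend: inactive node, active node, threshold, active neighbors. Edge weights range from 0.1 to 0.6; the example ends at "Stop!".]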
Independent Cascade Model
 When node v becomes active, it has a single chance of
activating each currently inactive neighbor w.
 The activation attempt succeeds with probability pvw .
Example
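[Figure: independent cascade example on a small network with nodes u, v, w and x; edge probabilities range from 0.1 to 0.6. Legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The example ends at "Stop!".]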
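The independent cascade process fits in a few lines. The sketch below is illustrative: the dictionary `prob[v][w]` holding the probability that v activates w, the function name, and the toy probabilities are all hypothetical.

```python
import random
from collections import deque


def independent_cascade(prob, seeds, rng):
    """Run one cascade; prob[v][w] is the chance that v activates w."""
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        v = frontier.popleft()
        # v is processed exactly once, so it gets a single attempt
        # on each neighbor that is still inactive at that moment.
        for w, p_vw in prob.get(v, {}).items():
            if w not in active and rng.random() < p_vw:
                active.add(w)
                frontier.append(w)
    return active


# Hypothetical example with deterministic probabilities for readability.
prob = {
    "v": {"w": 1.0, "u": 0.0},
    "w": {"x": 1.0},
    "u": {"y": 1.0},
}
print(sorted(independent_cascade(prob, ["v"], random.Random(0))))
```

Because each attempt is made only once, averaging the size of the returned set over many runs estimates the expected spread from a given seed set.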
Burt: structural holes and good ideas
 Managers asked to come up with an idea to improve the
supply chain
 Then asked:
 whom did you discuss the idea with?
 whom do you discuss supply-chain issues with in general
 do those contacts discuss ideas with one another?
 673 managers (455 (68%) completed the survey)
 ~ 4000 relationships (edges)
results
 people whose networks bridge structural holes have
 higher compensation
 positive performance evaluations
 more promotions
 more good ideas
 these brokers are
 more likely to express ideas
 less likely to have their ideas dismissed by judges
 more likely to have their ideas evaluated as valuable
Aral & van Alstyne: Study of a headhunter firm
 Three firms initially
 Unusually measurable inputs and outputs
 1300 projects over 5 yrs and
 125,000 email messages over 10 months (avg 20% of time!)
 Metrics
(i) Revenues per person and per project,
(ii) number of completed projects,
(iii) duration of projects,
(iv) number of simultaneous projects,
(v) compensation per person
 Main firm 71 people in executive search (+2 firms partial data)
 27 Partners, 29 Consultants, 13 Research, 2 IT staff
 Four data sets per firm
 52-question survey (86% response rate)
 E-mail
 Accounting
 15 semi-structured interviews
Email structure matters
Regression results (unstandardized coefficients)

New Contract Revenue (dependent variable: Bookings02)
Predictor                        B             Std. Error   Adj. R2   Sig. F Δ
(Base Model)                                                0.40
Best structural pred.            12604.0***    4454.0       0.52      .006
Ave. E-Mail Size                 -10.7**       4.9          0.56      .042
Colleagues' Ave. Response Time   -198947.0     168968.0     0.56      .248

Contract Execution Revenue (dependent variable: Billings02)
Predictor                        B             Std. Error   Adj. R2   Sig. F Δ
(Base Model)                                                0.19
Best structural pred.            1544.0**      639.0        0.30      .021
Ave. E-Mail Size                 -9.3*         4.7          0.34      .095
Colleagues' Ave. Response Time   -368924.0**   157789.0     0.42      .026

Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO.
N=39. *** p<.01, ** p<.05, * p<.1
Sending shorter e-mail helps get contracts and finish them.
Faster response from colleagues helps finish them.
diverse networks drive performance by
providing access to novel information
 network structure (having high degree) correlates with
receiving novel information sooner (as deduced from
hashed versions of their email)
 getting information sooner correlates with $$ brought in
 controlling for
 # of years worked
 job level
 ….
Network Structure Matters
Regression results (unstandardized coefficients)

New Contract Revenue (dependent variable: Bookings02)
Predictor            B          Std. Error   Adj. R2   Sig. F Δ
(Base Model)                                 0.40
Size Struct. Holes   13770***   4647         0.52      .006
Betweenness          1297*      773          0.47      .040

Contract Execution Revenue (dependent variable: Billings02)
Predictor            B          Std. Error   Adj. R2   Sig. F Δ
(Base Model)                                 0.19
Size Struct. Holes   7890*      4656         0.24      .100
Betweenness          1696**     697          0.30      .021

Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO.
N=39. *** p<.01, ** p<.05, * p<.1
Bridging diverse communities is significant.
Being in the thick of information flows is significant.
networked coordination game
 choice between two things, A and B
 e.g. basketball and soccer
 if friends choose A, they get payoff a
 if friends choose B, they get payoff b
 if one chooses A while the other chooses B, their payoff
is 0
coordinating with one’s friends
Let A = basketball, B = soccer. Which one should you learn to play?
fraction p = 3/5 play basketball
fraction 1 - p = 2/5 play soccer
which choice has higher payoff?
 d neighbors
 p fraction play basketball (A)
 (1-p) fraction play soccer (B)
 if choose A, get payoff
p * d *a
 if choose B, get payoff
(1-p) * d * b
 so should choose A if
 p d a ≥ (1-p) d b
or
 p ≥ b / (a + b)
two equilibria
 everyone adopts A
 everyone adopts B
what happens in between?
 What if two nodes switch at random? Will a cascade
occur?
 example:
 a = 3, b = 2
 the payoff for two nodes interacting using behavior A is 3/2 times
as large as what they get if they both choose B
 nodes will switch from B to A if at least
q = 2/(3+2) = 2/5 of their neighbors are using A
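The switching rule can be turned into a small cascade simulation. This sketch assumes integer payoffs a and b, synchronous rounds, and that nodes which have switched to A never switch back (including the initial adopters); the function name and the path network are hypothetical.

```python
from fractions import Fraction


def coordination_cascade(adj, initial_adopters, a, b):
    """Spread behavior A from the initial adopters using threshold q = b / (a + b)."""
    q = Fraction(b, a + b)
    adopters = set(initial_adopters)
    while True:
        switching = set()
        for node, nbrs in adj.items():
            if node in adopters or not nbrs:
                continue
            using_a = sum(1 for nb in nbrs if nb in adopters)
            # Switch when the fraction of neighbors using A is at least q.
            if Fraction(using_a, len(nbrs)) >= q:
                switching.add(node)
        if not switching:
            return adopters
        adopters |= switching


# Hypothetical example: a path of five nodes, with nodes 0 and 1 starting on A.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(coordination_cascade(path, {0, 1}, a=3, b=2)))
print(sorted(coordination_cascade(path, {0, 1}, a=2, b=3)))
```

With a = 3, b = 2 the threshold is 2/5, so each node with one of two neighbors on A switches and the cascade covers the path. Swapping the payoffs raises the threshold to 3/5 and the cascade stalls at the two initial adopters.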
how does a cascade occur
 suppose 2 nodes start playing basketball due to
external factors
 e.g. they are bribed with a free pair of shoes by some
devious corporation
Quiz Q:
Which node(s) will switch to playing basketball next?
the complete cascade
you pick the initial 2 nodes
 A larger example (Easley/Kleinberg Ch. 19)
 does the cascade spread throughout the network?
http://www.ladamic.com/netlearn/NetLogo412/CascadeModel.html
implications for viral marketing
 if you could pay a small number of individuals to use
your product,
 which individuals would you pick?
Quiz question:
 What is the role of communities in complex contagion
 enabling ideas to spread in the presence of thresholds
 creating isolated pockets impervious to outside ideas
 allowing different opinions to take hold in different parts of the
network
bilingual nodes
 so far nodes could only choose between A and B
 what if you can play both A and B,
 but pay an additional cost c?
try it on a line
 Increase the cost of being bilingual so that no node
chooses to do so. Let the cascade run
 Now lower the cost.
 What happens?
Quiz Q:
 The presence of bilingual nodes
 helps the superior solution to spread throughout the network
 helps inferior options to persist in the network
 causes everyone in the network to become bilingual
knowledge, thresholds, and collective action
 nodes need to coordinate across a network, but
have limited horizons
can individuals coordinate?
 each node will act if at least x people (including
itself) mobilize
nodes will not mobilize
mobilization
 there will be some turnout
innovation in networks
 network topology influences who talks to whom
 who talks to whom has important implications for
innovation and learning
better to innovate or imitate?
brainstorming:
more minds together,
but also danger of groupthink
working in isolation:
more independence
slower progress
in a network context
modeling the problem space
 Kauffman’s NK model
 N dimensional problem space
 N bits, each can be 0 or 1
 K describes the smoothness of the fitness landscape
 how similar is the fitness of sequences with only 1-2 bits flipped
(K = 0, no similarity, K large, smooth fitness)
Kauffman’s NK model
[Figure: fitness plotted against distance, with curves for K large, K medium and K small.]
Update rules
 As a node, you start out with a random bit string
 At each iteration
 If one of your neighbors has a solution that is more fit than yours,
imitate (copy their solution)
 Otherwise innovate by flipping one of your bits
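The imitate-or-innovate rule can be sketched as below. Several pieces are assumptions for illustration and not part of the model as stated: the fitness function is a simple stand-in (count of 1 bits) and not an NK landscape, a flipped bit is kept only if it improves fitness, updates are synchronous, and all names and network shapes are hypothetical.

```python
import random

N_BITS = 8


def fitness(bits):
    """Stand-in fitness (an assumption): the number of 1 bits."""
    return sum(bits)


def update(adj, solutions, rng):
    """One synchronous round of imitate-or-innovate."""
    new_solutions = {}
    for node, nbrs in adj.items():
        mine = solutions[node]
        best = max((solutions[nb] for nb in nbrs), key=fitness, default=mine)
        if fitness(best) > fitness(mine):
            new_solutions[node] = best  # imitate the fittest neighbor
        else:
            # Innovate: flip one random bit, keeping it only if it helps
            # (the keep-if-better condition is an assumption).
            k = rng.randrange(len(mine))
            trial = mine[:k] + (1 - mine[k],) + mine[k + 1:]
            new_solutions[node] = trial if fitness(trial) > fitness(mine) else mine
    return new_solutions


def run(adj, steps, seed=0):
    """Return per-step dictionaries of node fitness, starting from random bit strings."""
    rng = random.Random(seed)
    solutions = {node: tuple(rng.randint(0, 1) for _ in range(N_BITS)) for node in adj}
    history = [{node: fitness(s) for node, s in solutions.items()}]
    for _ in range(steps):
        solutions = update(adj, solutions, rng)
        history.append({node: fitness(s) for node, s in solutions.items()})
    return history


# Hypothetical networks: a line of 10 nodes and a fully connected group of 10.
n = 10
line = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
full = {i: set(range(n)) - {i} for i in range(n)}
print(max(run(line, 50)[-1].values()), max(run(full, 50)[-1].values()))
```

Swapping in a rugged fitness function is what makes network structure matter; on this smooth stand-in every node simply climbs.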
Quiz Q:
 Relative to the regular lattice, the network with many
additional, random connections has on average:
 slower convergence to a local optimum
 smaller improvement in the best solution relative to the initial
maximum
 more oscillations between solutions
networks and innovation:
is more information diffusion always better?
linear network
fully connected network
 Nodes can innovate on their own (slowly) or adopt
their neighbor’s solution
 Best solutions propagate through the network
source: Lazer and Friedman, The Parable of the Hare and the Tortoise: Small Worlds, Diversity, and System Performance
networks and innovation
 fully connected network
converges more quickly on
a solution, but if there are
lots of local maxima in the
solution space, it may get
stuck without finding
optimum.
 linear network (fewer
edges) arrives at better
solution eventually
because individuals
innovate longer
Coordination: graph coloring
 Application: coloring a map: limited set of colors, no
two adjacent countries should have the same color
graph coloring on a network
 Each node is a human subject. Different experimental
conditions:
 knowledge of neighbors’ color
 knowledge of entire network
 Compare:
 regular ring lattice
 small-world topology
 scale-free networks
Kearns et al., ‘An Experimental Study of the Coloring Problem on Human Subject Networks’,
Science, 313(5788), pp. 824-827, 2006
simulation
 Kearns et al. “An Experimental Study of the Coloring
Problem on Human Subject Networks”
 network structure affects convergence in coordination games,
e.g. graph coloring
http://www.ladamic.com/netlearn/NetLogo4/GraphColoring.html
Quiz Q:
 As the rewiring probability is increased from 0 to 1 the
following happens:
 the solution time decreases
 the solution time increases
 the solution time initially decreases then increases again
to sum up
 network topology influences processes occurring on
networks
 what state the nodes converge to
 how quickly they get there
 process mechanism matters:
 simple vs. complex contagion
 coordination
 learning
 diffusion can be simple (person to person) or complex
(individuals having thresholds)
 people in special network positions (the brokers) have an
advantage in receiving novel info & coming up with
“novel” ideas
 in some scenarios, information diffusion may hinder
innovation