Statistical Analysis of Social Networks
1) From description to inference: confidence intervals for measures
2) QAP models (review)
   1) Networks as independent variables
   2) Networks as dependent variables
3) ERGM / Markov Chain Monte Carlo (MCMC)
4) Simulations from process to network
http://www.soc.duke.edu/~jmoody77/s884/notes/stochasticNets.ppt
Statistical Analysis of Social Networks
Confidence Intervals: Bootstraps and Jackknifes
(Snijders & Borgatti, 1999)
Goal: “Useful to have an indication of how precise a given description is,
particularly when making comparisons between groups.”
Assumes that “a researcher is interested in some descriptive statistic …
and wishes to have a standard error for this descriptive statistic without
making implausibly strong assumptions about how the network came
about.”
Jackknifes.
Given a dataset w. N sample elements, N artificial datasets are
created by deleting each sample element in turn from the
observed dataset.
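In standard practice, the formula for the standard error is then:
$$SE_j = \sqrt{\frac{N-1}{N}\sum_{i=1}^{N}\left(Z_{-i}-Z_{-\bullet}\right)^2}$$
where $Z_{-i}$ is the statistic computed with element i deleted and $Z_{-\bullet}$ is the average of the $Z_{-i}$.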
Jackknifes: Example on regular data
Obs      x     s1    s2    s3    s4    s5    s6    s7    s8    s9    s10
1       0.85   .    0.85  0.85  0.85  0.85  0.85  0.85  0.85  0.85  0.85
2       0.70  0.70   .    0.70  0.70  0.70  0.70  0.70  0.70  0.70  0.70
3       1.00  1.00  1.00   .    1.00  1.00  1.00  1.00  1.00  1.00  1.00
4       0.59  0.59  0.59  0.59   .    0.59  0.59  0.59  0.59  0.59  0.59
5       0.22  0.22  0.22  0.22  0.22   .    0.22  0.22  0.22  0.22  0.22
6       0.69  0.69  0.69  0.69  0.69  0.69   .    0.69  0.69  0.69  0.69
7       0.43  0.43  0.43  0.43  0.43  0.43  0.43   .    0.43  0.43  0.43
8       0.32  0.32  0.32  0.32  0.32  0.32  0.32  0.32   .    0.32  0.32
9       0.50  0.50  0.50  0.50  0.50  0.50  0.50  0.50  0.50   .    0.50
10      0.67  0.67  0.67  0.67  0.67  0.67  0.67  0.67  0.67  0.67   .
MEAN:   0.60  0.57  0.58  0.55  0.60  0.64  0.59  0.61  0.63  0.61  0.59
Jackknifes: Example on regular data
Jackknife standard error: SEj = 0.0753
Classical standard error of the mean: SE = 0.0753
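A minimal Python sketch of this check, using the ten rounded x values from the table. The function names are illustrative, not from the slides. For the mean, the jackknife and classical standard errors agree exactly; with the rounded data both come out near 0.075, close to the 0.0753 reported (which presumably used unrounded values).

```python
import math

# The ten observations from the example table (rounded to two decimals).
x = [0.85, 0.70, 1.00, 0.59, 0.22, 0.69, 0.43, 0.32, 0.50, 0.67]


def mean(values):
    return sum(values) / len(values)


def jackknife_se(values, statistic):
    """Standard jackknife SE: delete each element in turn, recompute the statistic."""
    n = len(values)
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    z_bar = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((z - z_bar) ** 2 for z in leave_one_out))


se_jack = jackknife_se(x, mean)

# Classical standard error of the mean: s / sqrt(N).
sd = math.sqrt(sum((v - mean(x)) ** 2 for v in x) / (len(x) - 1))
se_classical = sd / math.sqrt(len(x))

print(round(se_jack, 4), round(se_classical, 4))
```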
Jackknifes: For networks
For networks, we need to adjust the scaling parameter:
$$SE_j = \sqrt{\frac{N-2}{2N}\sum_{i=1}^{N}\left(Z_{-i}-Z_{-\bullet}\right)^2}$$
where $Z_{-i}$ is the network statistic calculated without vertex i,
and $Z_{-\bullet}$ is the average of $Z_{-1} \ldots Z_{-N}$.
This procedure will work for any network statistic Z, and
UCINET will use it to test differences in network density.
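A rough sketch of the network jackknife for density, assuming NumPy and a directed 0/1 adjacency matrix with an empty diagonal. This illustrates the formula only; it is not UCINET's code, and the random test network is made up.

```python
import numpy as np


def density(adj):
    """Share of off-diagonal cells that are 1."""
    n = adj.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return adj[off_diag].mean()


def network_jackknife_se(adj, statistic):
    """Vertex-deletion jackknife with the (N-2)/(2N) network scaling."""
    n = adj.shape[0]
    z = []
    for i in range(n):
        keep = np.delete(np.arange(n), i)           # drop vertex i
        z.append(statistic(adj[np.ix_(keep, keep)]))
    z = np.array(z)
    return np.sqrt((n - 2) / (2 * n) * np.sum((z - z.mean()) ** 2))


# Hypothetical 24-node network, for illustration only.
rng = np.random.default_rng(0)
adj = (rng.random((24, 24)) < 0.3).astype(int)
np.fill_diagonal(adj, 0)

se = network_jackknife_se(adj, density)
ci = (density(adj) - 1.96 * se, density(adj) + 1.96 * se)
```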
Jackknifes: For networks
An example based on the Trade data. Density, Std. Errors and
confidence intervals for each matrix.
Matrix   Density (DEN)   Jackknife SE (SEJ)   Upper bound (UB)   Lower bound (LB)
DIP      0.6684783       0.0636125            0.7931588          0.5437978
CRUDE    0.5561594       0.0676669            0.6887866          0.4235323
FOOD     0.5561594       0.0633776            0.6803794          0.4319394
MAN      0.5615942       0.0724143            0.7035263          0.4196621
MIN      0.2445652       0.0530224            0.3484891          0.1406414
Bootstrap
In general, bootstrap techniques effectively treat the given
sample as the population, then draw samples, with replacement,
from the observed distribution.
For networks, we draw random samples of the vertices, i(1)..i(N), creating
a new network Y*:
$$Y^{*}_{kh} = Y_{i(k)\,i(h)}, \quad \text{for } i(k) \neq i(h)$$
If i(k) = i(h) (the same vertex was drawn twice), then fill in that dyad at random from the set of
all possible dyads (i.e. fill in this cell with a random draw from
the population).
Bootstrap
For each bootstrap sample:
• Draw N random numbers, with replacement, from 1 to N,
denoted i(1)..i(N)
• Construct Y* based on i(1)..i(N)
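• Calculate the statistic of interest, called Z*(m).
Repeat this process M times (M in the thousands). The bootstrap standard error is:
$$SE_b = \sqrt{\frac{1}{M-1}\sum_{m=1}^{M}\left(Z^{*(m)} - Z^{*(\cdot)}\right)^2}$$
where $Z^{*(\cdot)}$ is the mean of the M bootstrap statistics.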
Bootstraps: Comparing density
BOOTSTRAP PAIRED SAMPLE T-TEST
--------------------------------------------------------------------------------
Density of trade_min is: 0.2446
Density of trade_dip is: 0.6685
Difference in density is: -0.4239
Number of bootstrap samples: 5000
Variance of ties for trade_min: 0.1851
Variance of ties for trade_dip: 0.2220
Classical standard error of difference: 0.0272
Classical t-test (indep samples): -15.6096
Estimated bootstrap standard error for density of trade_min: 0.0458
Estimated bootstrap standard error for density of trade_dip: 0.0553
Bootstrap standard error of the difference (indep samples): 0.0719
95% confidence interval for the difference (indep samples): [-0.5648, -0.2831]
bootstrap t-statistic (indep samples): -5.8994
Bootstrap SE for the difference (paired samples): 0.0430
95% bootstrap CI for the difference (paired samples): [-0.5082, -0.3396]
t-statistic: -9.8547
Average bootstrap difference: -0.3972
Proportion of absolute differences as large as observed: 0.0002
Proportion of differences as large as observed: 1.0000
Proportion of differences as small as observed: 0.0002
Measurement Sensitivity
A related question: How confident can you be in any measure
on an observed network, given the likelihood that observed ties
are, in fact, observed with error?
• Implies that some of the observed 0s are in fact 1s and
some of the 1s are in fact 0s.
• Suggests that we view the network not as a binary array of
0s and 1s, but instead as a set of probabilities, such that:
Pij = f(Aij)
We can then calculate the statistic of interest M times under
different realizations of the network given Pij and get a
distribution of the statistic of interest.
Measurement Sensitivity
A reasonable approach to assessing the effect of measurement error
on the ties in a network is to ask how the network measures would change if
the true ties differed from those observed. This question can be
answered simply with Monte Carlo simulations on the observed network.
Thus, the procedure I propose is to:
• Generate a probability matrix from the set of observed ties,
• Generate many realizations of the network based on these underlying
probabilities, and
• Compare the distribution of generated statistics to those observed in the
data.
• How do we set pij?
  • Range based on observed features (Sensitivity analysis)
  • Outcome of a model based on observed patterns (ERGM)
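One way to sketch this procedure in Python, assuming NumPy. Here Pij takes just two values: `p_keep` for observed ties and `p_add` for observed non-ties. Both are made-up sensitivity settings, not values from the slides.

```python
import numpy as np


def density(adj):
    n = adj.shape[0]
    return adj[~np.eye(n, dtype=bool)].mean()


def simulate_statistic(adj, statistic, p_keep=0.9, p_add=0.02, n_sims=500, seed=0):
    """Draw networks from P_ij = f(A_ij) and collect the statistic for each draw."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    prob = np.where(adj == 1, p_keep, p_add).astype(float)   # P_ij = f(A_ij)
    np.fill_diagonal(prob, 0.0)                               # no self-ties
    draws = np.empty(n_sims)
    for m in range(n_sims):
        realization = (rng.random((n, n)) < prob).astype(int)
        draws[m] = statistic(realization)
    return draws


# Hypothetical observed network, for illustration only.
rng = np.random.default_rng(3)
adj = (rng.random((30, 30)) < 0.1).astype(int)
np.fill_diagonal(adj, 0)

draws = simulate_statistic(adj, density)
low, high = np.percentile(draws, [2.5, 97.5])   # range of density under tie error
```

Setting `p_keep=1` and `p_add=0` reproduces the observed network on every draw, which is a quick sanity check.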
Measurement Sensitivity
As an example, consider the problem of defining “friendship”
ties in high schools.
Should we count nominations that are not reciprocated?
Measurement Sensitivity
[Figure panels: All ties; Reciprocated]
Statistical Analysis of Social Networks
Comparing multiple networks: QAP
The substantive question is how one set of relations (or
dyadic attributes) relates to another.
For example:
• Do marriage ties correlate with business ties in the
Medici family network?
• Are friendship relations correlated with joint
membership in a club?
(review)
Assessing the correlation is straightforward: we simply
correlate each corresponding cell of the two matrices:
Marriage
                 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 1 ACCIAIUOL     0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0
 2 ALBIZZI       0 0 0 0 0 1 1 0 1  0  0  0  0  0  0  0
 3 BARBADORI     0 0 0 0 1 0 0 0 1  0  0  0  0  0  0  0
 4 BISCHERI      0 0 0 0 0 0 1 0 0  0  1  0  0  0  1  0
 5 CASTELLAN     0 0 1 0 0 0 0 0 0  0  1  0  0  0  1  0
 6 GINORI        0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0
 7 GUADAGNI      0 1 0 1 0 0 0 1 0  0  0  0  0  0  0  1
 8 LAMBERTES     0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0
 9 MEDICI        1 1 1 0 0 0 0 0 0  0  0  0  1  1  0  1
10 PAZZI         0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0
11 PERUZZI       0 0 0 1 1 0 0 0 0  0  0  0  0  0  1  0
12 PUCCI         0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
13 RIDOLFI       0 0 0 0 0 0 0 0 1  0  0  0  0  0  1  1
14 SALVIATI      0 0 0 0 0 0 0 0 1  1  0  0  0  0  0  0
15 STROZZI       0 0 0 1 1 0 0 0 0  0  1  0  1  0  0  0
16 TORNABUON     0 0 0 0 0 0 1 0 1  0  0  0  1  0  0  0

Business
                 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 1 ACCIAIUOL     0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
 2 ALBIZZI       0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
 3 BARBADORI     0 0 0 0 1 1 0 0 1  0  1  0  0  0  0  0
 4 BISCHERI      0 0 0 0 0 0 1 1 0  0  1  0  0  0  0  0
 5 CASTELLAN     0 0 1 0 0 0 0 1 0  0  1  0  0  0  0  0
 6 GINORI        0 0 1 0 0 0 0 0 1  0  0  0  0  0  0  0
 7 GUADAGNI      0 0 0 1 0 0 0 1 0  0  0  0  0  0  0  0
 8 LAMBERTES     0 0 0 1 1 0 1 0 0  0  1  0  0  0  0  0
 9 MEDICI        0 0 1 0 0 1 0 0 0  1  0  0  0  1  0  1
10 PAZZI         0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0
11 PERUZZI       0 0 1 1 1 0 0 1 0  0  0  0  0  0  0  0
12 PUCCI         0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
13 RIDOLFI       0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
14 SALVIATI      0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0
15 STROZZI       0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0
16 TORNABUON     0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0

Correlation:
            Marriage    Business
Marriage    1           0.3718679
Business    0.3718679   1

Dyads (first 30 rows):
 i   j   Marriage  Business
 1   2   0         0
 1   3   0         0
 1   4   0         0
 1   5   0         0
 1   6   0         0
 1   7   0         0
 1   8   0         0
 1   9   1         0
 1  10   0         0
 1  11   0         0
 1  12   0         0
 1  13   0         0
 1  14   0         0
 1  15   0         0
 1  16   0         0
 2   1   0         0
 2   3   0         0
 2   4   0         0
 2   5   0         0
 2   6   1         0
 2   7   1         0
 2   8   0         0
 2   9   1         0
 2  10   0         0
 2  11   0         0
 2  12   0         0
 2  13   0         0
 2  14   0         0
 2  15   0         0
 2  16   0         0
Comparing multiple networks: QAP
But is the observed value statistically significant?
Can’t use standard inference, since the assumption of independent observations is violated: dyads that share a node are not independent.
Instead, we use a permutation approach.
Essentially, we are asking whether the observed correlation is
large (small) compared to that which we would get if the
assignment of variables to nodes were random, but the
interdependencies within variables were maintained.
Do this by randomly sorting the rows and columns of the matrix,
then re-estimating the correlation.
Comparing multiple networks: QAP
When you permute, you have to permute both the rows and the
columns simultaneously to maintain the interdependencies in
the data:
ORIG
ID   A  B  C  D  E
A    0  0  0  0  0
B    1  0  0  0  0
C    2  1  0  0  0
D    3  2  1  0  0
E    4  3  2  1  0

Sorted
ID   A  D  B  C  E
A    0  0  0  0  0
D    3  0  2  1  0
B    1  0  0  0  0
C    2  0  1  0  0
E    4  1  3  2  0
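A small NumPy check of the example above: applying the same ordering (A, D, B, C, E) to rows and columns at once reproduces the sorted matrix. The variable names are illustrative.

```python
import numpy as np

# ORIG matrix, rows and columns in the order A, B, C, D, E.
orig = np.array([
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [2, 1, 0, 0, 0],
    [3, 2, 1, 0, 0],
    [4, 3, 2, 1, 0],
])

order = np.array([0, 3, 1, 2, 4])           # new order: A, D, B, C, E

# np.ix_ applies the same permutation to rows and columns simultaneously.
sorted_mat = orig[np.ix_(order, order)]

# Permuting rows only would break the link between a node's row and its column.
rows_only = orig[order, :]
```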
Comparing multiple networks: QAP
Procedure:
1. Calculate the observed correlation
2. for K iterations do:
a) randomly sort one of the matrices
b) recalculate the correlation
c) store the outcome
3. compare the observed correlation to the distribution of
correlations created by the random permutations.
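The procedure can be sketched in Python with NumPy, using the marriage and business ties read off the two Florentine family matrices above (families numbered 1 to 16 as in the listing). This is an illustration of the logic, not UCINET's implementation. The observed correlation should match the 0.3719 reported; the permutation proportion depends on the random seed.

```python
import numpy as np

# Undirected ties, as (i, j) pairs with the family numbers used above.
marriage_ties = [(1, 9), (2, 6), (2, 7), (2, 9), (3, 5), (3, 9), (4, 7), (4, 11),
                 (4, 15), (5, 11), (5, 15), (7, 8), (7, 16), (9, 13), (9, 14),
                 (9, 16), (10, 14), (11, 15), (13, 15), (13, 16)]
business_ties = [(3, 5), (3, 6), (3, 9), (3, 11), (4, 7), (4, 8), (4, 11), (5, 8),
                 (5, 11), (6, 9), (7, 8), (8, 11), (9, 10), (9, 14), (9, 16)]


def to_matrix(ties, n=16):
    mat = np.zeros((n, n), dtype=int)
    for i, j in ties:
        mat[i - 1, j - 1] = mat[j - 1, i - 1] = 1
    return mat


def offdiag_corr(a, b):
    """Pearson correlation over corresponding off-diagonal cells."""
    off = ~np.eye(a.shape[0], dtype=bool)
    return np.corrcoef(a[off], b[off])[0, 1]


def qap_correlation(a, b, n_perm=2500, seed=0):
    rng = np.random.default_rng(seed)
    observed = offdiag_corr(a, b)                            # step 1
    perm_corrs = np.empty(n_perm)
    for k in range(n_perm):                                  # step 2
        p = rng.permutation(a.shape[0])
        perm_corrs[k] = offdiag_corr(a[np.ix_(p, p)], b)     # same order for rows and columns
    p_large = np.mean(perm_corrs >= observed)                # step 3
    return observed, perm_corrs, p_large


marriage = to_matrix(marriage_ties)
business = to_matrix(business_ties)
observed, perm_corrs, p_large = qap_correlation(business, marriage)
```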
Running QAP in UCINET:
QAP MATRIX CORRELATION
--------------------------------------------------------------------------------
Observed matrix:     PadgBUS
Structure matrix:    PadgMAR
# of Permutations:   2500
Random seed:         356

Univariate statistics
                      1        2
                PadgBUS  PadgMAR
                -------  -------
 1     Mean       0.125    0.167
 2  Std Dev       0.331    0.373
 3      Sum      30.000   40.000
 4 Variance       0.109    0.139
 5      SSQ      30.000   40.000
 6    MCSSQ      26.250   33.333
 7 Euc Norm       5.477    6.325
 8  Minimum       0.000    0.000
 9  Maximum       1.000    1.000
10 N of Obs     240.000  240.000

Hubert's gamma: 16.000

Bivariate Statistics
                                  1         2         3         4         5         6         7
                              Value    Signif       Avg        SD  P(Large)  P(Small)     NPerm
                          --------- --------- --------- --------- --------- --------- ---------
1    Pearson Correlation:     0.372     0.000     0.001     0.092     0.000     1.000  2500.000
2        Simple Matching:     0.842     0.000     0.750     0.027     0.000     1.000  2500.000
3    Jaccard Coefficient:     0.296     0.000     0.079     0.046     0.000     1.000  2500.000
4  Goodman-Kruskal Gamma:     0.797     0.000    -0.064     0.382     0.000     1.000  2500.000
5       Hamming Distance:    38.000     0.000    59.908     5.581     1.000     0.000  2500.000
Running QAP in UCINET: Regression
Using the same logic, we can estimate alternative models, such as
regression. The only complication is that you need to permute all of
the independent matrices in the same way each iteration.
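A bare-bones sketch of QAP regression in Python, assuming NumPy. Each iteration applies one shared node permutation to every independent matrix and refits OLS on the off-diagonal cells. UCINET's own routine may differ in detail; the synthetic data below are made up so that the true coefficients are known.

```python
import numpy as np


def vec(mat):
    """Off-diagonal cells as a flat vector."""
    off = ~np.eye(mat.shape[0], dtype=bool)
    return mat[off].astype(float)


def ols(y, xs):
    design = np.column_stack([np.ones(len(y))] + xs)   # intercept first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mrqap(y_mat, x_mats, n_perm=500, seed=0):
    rng = np.random.default_rng(seed)
    n = y_mat.shape[0]
    y = vec(y_mat)
    observed = ols(y, [vec(x) for x in x_mats])
    perm = np.empty((n_perm, len(observed)))
    for k in range(n_perm):
        p = rng.permutation(n)                          # one permutation for all X matrices
        perm[k] = ols(y, [vec(x[np.ix_(p, p)]) for x in x_mats])
    prop_as_large = (perm >= observed).mean(axis=0)
    prop_as_small = (perm <= observed).mean(axis=0)
    return observed, prop_as_large, prop_as_small


# Synthetic check: y = 1 + 2*x exactly, so OLS should recover (1, 2).
rng = np.random.default_rng(7)
x_demo = (rng.random((8, 8)) < 0.5).astype(int)
np.fill_diagonal(x_demo, 0)
y_demo = 1 + 2 * x_demo

coef, as_large, as_small = mrqap(y_demo, [x_demo])
```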
Simple Example:
ADJMAT
NODE   1 2 3 4 5 6 7 8 9
1      0 1 1 1 0 0 0 0 0
2      1 0 1 0 0 0 1 0 0
3      1 1 0 0 1 0 1 0 0
4      1 0 0 0 1 0 0 0 0
5      0 0 1 1 0 1 0 1 0
6      0 0 0 0 1 0 0 1 1
7      0 1 1 0 0 0 0 0 0
8      0 0 0 0 1 1 0 0 1
9      0 0 0 0 0 1 0 1 0

SAMERCE
NODE   1 2 3 4 5 6 7 8 9
1      0 1 0 0 1 0 0 0 1
2      1 0 0 0 1 0 0 0 1
3      0 0 0 1 0 1 1 1 0
4      0 0 1 0 0 1 1 1 0
5      1 1 0 0 0 0 0 0 1
6      0 0 1 1 0 0 1 1 0
7      0 0 1 1 0 1 0 1 0
8      0 0 1 1 0 1 1 0 0
9      1 1 0 0 1 0 0 0 0

SAMESEX
NODE   1 2 3 4 5 6 7 8 9
1      0 0 1 1 0 0 1 1 0
2      0 0 0 0 1 1 0 0 1
3      1 0 0 1 0 0 1 1 0
4      1 0 1 0 0 0 1 1 0
5      0 1 0 0 0 1 0 0 1
6      0 1 0 0 1 0 0 0 1
7      1 0 1 1 0 0 0 1 0
8      1 0 1 1 0 0 1 0 0
9      0 1 0 0 1 1 0 0 0
Simple Example:
NODE   Y
1      0.32
2      0.59
3      0.54
4      0.50
5      0.04
6      0.02
7      0.41
8      0.01
9     -0.17

Distance (Dij = abs(Yi - Yj))
NODE   1     2     3     4     5     6     7     8     9
1      .000  .277  .228  .181  .278  .298  .095  .307  .481
2      .277  .000  .049  .096  .555  .575  .182  .584  .758
3      .228  .049  .000  .047  .506  .526  .134  .535  .710
4      .181  .096  .047  .000  .459  .479  .087  .488  .663
5      .278  .555  .506  .459  .000  .020  .372  .029  .204
6      .298  .575  .526  .479  .020  .000  .392  .009  .184
7      .095  .182  .134  .087  .372  .392  .000  .401  .576
8      .307  .584  .535  .488  .029  .009  .401  .000  .175
9      .481  .758  .710  .663  .204  .184  .576  .175  .000
Simple Example, continuous dep. variable:
# of permutations:       2000
Diagonal valid?          NO
Random seed:             995
Dependent variable:      EX_SIM
Expected values:         C:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables:   EX_SSEX
                         EX_SRCE
                         EX_ADJ

Number of valid observations among the X variables = 72
N = 72
Number of permutations performed: 1999

MODEL FIT
R-square  Adj R-Sqr  Probability  # of Obs
--------  ---------  -----------  --------
   0.289      0.269        0.059        72

REGRESSION COEFFICIENTS
               Un-stdized      Stdized                 Proportion   Proportion
Independent   Coefficient  Coefficient  Significance     As Large     As Small
-----------   -----------  -----------  ------------  -----------  -----------
Intercept        0.460139     0.000000         0.034        0.034        0.966
EX_SSEX         -0.073787    -0.170620         0.140        0.860        0.140
EX_SRCE         -0.020472    -0.047338         0.272        0.728        0.272
EX_ADJ          -0.239896    -0.536211         0.012        0.988        0.012
We can also model the network as a dependent variable. In the
next example, I model diplomacy as a function of trade, using the
country data.
Note that using UCINET, this gives a linear probability model,
OLS on a 0-1 dependent variable. This is often not optimal…
MULTIPLE REGRESSION QAP TRADE DIPLOMACY
--------------------------------------------------------------------------------
# of permutations:       2000
Diagonal valid?          NO
Random seed:             1000
Dependent variable:      trade_dip
Expected values:         C:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables:   TRADE_FOOD
                         TRADE_CRUDE
                         TRADE_MAN
                         TRADE_MIN

Number of valid observations among the X variables = 552
N = 552
Number of permutations performed: 1999

MODEL FIT
R-square  Adj R-Sqr  Probability  # of Obs
--------  ---------  -----------  --------
   0.317      0.314        0.000       552

REGRESSION COEFFICIENTS
                  Un-stdized      Stdized                 Proportion   Proportion
Independent      Coefficient  Coefficient  Significance     As Large     As Small
--------------   -----------  -----------  ------------  -----------  -----------
Intercept           0.339308     0.000000         1.000        1.000        0.000
TRADE_FOOD          0.049975     0.052744         0.200        0.200        0.800
TRADE_CRUDE         0.109233     0.115284         0.033        0.033        0.967
TRADE_MAN           0.367435     0.387285         0.000        0.000        1.000
TRADE_MIN           0.140151     0.127965         0.058        0.058        0.942

Expected values saved as dataset mrqap-predicted
Valid observations saved as dataset mrqap-valid
One solution is to use a QAP logit model. It is not (yet) implemented
in UCINET, but you can use DAMN or SAS.
Let’s look at the country trade data again, using a logit model.
DAMN
*** REGRESSION PARAMETERS ***
man       coefficient=  1.80356   p-value=0.00000
food      coefficient=  0.30928   p-value=0.14867
crude     coefficient=  0.59624   p-value=0.00867
min       coefficient=  1.61641   p-value=0.00000
constant  coefficient= -0.78063   p-value=0.00000
Final Likelihood: -252.366402

QAP Results: Logit model coefficients
Significance results are equal to the proportion of QAP parameters
that are as *small* as the observed parameters.
PARAMETERS   COEF     SIG
INTERCEPT    -.949    .0010
TMAN         1.863    .9980
TFOOD        .3914    .8420
TCRUDE       .6667    .9800
TMIN         1.658    .9980
Modeling Social Networks parametrically:
Exponential Random Graph models – ERGM/p*
A long research tradition in statistics and random graph theory has
led to parametric models of networks.
These are models of the entire graph, though as we will see they
often work on the dyads in the graph to be estimated.
Substantively, the approach is to ask whether the graph in question
is an element of the class of all random graphs with the given
known elements. For example, all graphs with 5 nodes and 3
edges, or, put probabilistically, the probability of observing the
current graph given the conditions.
Random Graphs and Conditional Expectations
The basis for the statistical modeling of graphs rests on
random graph theory.
Simply put, random graph theory asks what properties we
expect when ties (Xij) form at random.
The simplest random graph is the Bernoulli random graph,
where P(Xij = 1) is constant and ties are independent: each
edge in the graph has the same probability of being present
(1 rather than 0), independently of every other edge.
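For illustration, a Bernoulli random graph can be drawn in a few lines of Python with NumPy. The size and tie probability below are arbitrary choices, not values from the slides.

```python
import numpy as np


def bernoulli_graph(n, p, seed=0):
    """Directed graph where each off-diagonal tie is 1 with probability p, independently."""
    rng = np.random.default_rng(seed)
    adj = (rng.random((n, n)) < p).astype(int)
    np.fill_diagonal(adj, 0)      # no self-ties
    return adj


g = bernoulli_graph(200, 0.1)
observed_density = g.sum() / (200 * 199)   # should be close to p
```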
Typically this is an uninteresting distribution of graphs, and
we want to know what the graph looks like conditional on
other features of the graph.
Random Graphs and Conditional Expectations
A Bernoulli graph is only conditional on the expected number of
edges. So effectively we ask “What is the probability of
observing the graph we have, given the set of all possible graphs
with the same number of edges.”
We might, instead, want to condition on the degree distribution
(sent or received) or all graphs with a particular dyad distribution
(same number of Mutual, Asymmetric and Null dyads).
Closed form solutions for some graph statistics (like the triad
census) are known for out-degree, in-degree and MAN (but not
all 3 simultaneously).
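As a small illustration of the dyad (MAN) distribution mentioned above, this Python sketch counts Mutual, Asymmetric and Null dyads in a directed 0/1 matrix, assuming NumPy. The three-node example is made up.

```python
import numpy as np


def dyad_census(adj):
    """Return (Mutual, Asymmetric, Null) dyad counts for a directed graph."""
    n = adj.shape[0]
    upper = np.triu_indices(n, k=1)           # each unordered pair once
    a, b = adj[upper], adj.T[upper]           # x_ij and x_ji
    mutual = int(np.sum((a == 1) & (b == 1)))
    asym = int(np.sum(a != b))
    null = int(np.sum((a == 0) & (b == 0)))
    return mutual, asym, null


# Arcs: 0->1, 1->0 (mutual), 0->2 (asymmetric); pair (1, 2) is null.
example = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
])
man = dyad_census(example)
```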
Random Graphs and Conditional Expectations
PAJEK gives you the unconditional expected values:
------------------------------------------------------------------------------
Triadic Census 2. i:\people\jwm\s884\homework\prison.net (67)
------------------------------------------------------------------------------
Working...
----------------------------------------------------------------------------
Type        Number of triads (ni)   Expected (ei)   (ni-ei)/ei
----------------------------------------------------------------------------
 1 - 003           39221              37227.47          0.05
 2 - 012            5860               9587.83         -0.39
 3 - 102            2336                205.78         10.35
 4 - 021D             61                205.78         -0.70
 5 - 021U             80                205.78         -0.61
 6 - 021C            103                411.55         -0.75
 7 - 111D            105                 17.67          4.94
 8 - 111U             69                 17.67          2.91
 9 - 030T             13                 17.67         -0.26
10 - 030C              1                  5.89         -0.83
11 - 201              12                  0.38         30.65
12 - 120D             15                  0.38         38.56
13 - 120U              7                  0.38         17.46
14 - 120C              5                  0.76          5.59
15 - 210              12                  0.03        367.67
16 - 300               5                  0.00      21471.04
----------------------------------------------------------------------------
Chi-Square: 137414.3919***
6 cells (37.50%) have expected frequencies less than 5.
The minimum expected cell frequency is 0.00.
Random Graphs and Conditional Expectations
SPAN gives you the (X|MAN) distributions:
Triad Census
Type    T       TPCNT    PU       EVT      VARTU    STDDIF
003     39221   0.8187   0.8194   39251    427.69   -1.472
012     5860    0.1223   0.1213   5810.8   1053.5   1.5156
102     2336    0.0488   0.0476   2278.7   321.01   3.1954
021D    61      0.0013   0.0015   70.949   67.37    -1.212
021U    80      0.0017   0.0015   70.949   67.37    1.1027
021C    103     0.0022   0.003    141.9    127.58   -3.444
111D    105     0.0022   0.0023   112.39   103.57   -0.727
111U    69      0.0014   0.0023   112.39   103.57   -4.264
030T    13      0.0003   0.0001   3.4292   3.3956   5.1939
030C    1       209E-7   239E-7   1.1431   1.1393   -0.134
201     12      0.0003   0.0009   42.974   38.123   -5.017
120D    15      0.0003   286E-7   1.3717   1.368    11.652
120U    7       0.0001   286E-7   1.3717   1.368    4.8122
120C    5       0.0001   573E-7   2.7433   2.7285   1.3662
210     12      0.0003   442E-7   2.1186   2.1023   6.8151
300     5       0.0001   549E-8   0.2631   0.2621   9.2522
Modeling Social Networks parametrically:
ERGM approaches
The earliest approaches are based on simple random graph theory,
but there’s been a flurry of activity in the last 10 years or so.
Key historical references:
- Holland and Leinhardt (1981) JASA
- Frank and Strauss (1986) JASA
- Wasserman and Faust (1994) – Chap 15 & 16
- Wasserman and Pattison (1996)
Best current update: http://www.jstatsoft.org/v24
Modeling Social Networks parametrically:
ERGM approaches
The “p1” model of Holland and Leinhardt is the classic foundation
– the basic idea is that you can generate a statistical model of the
network by predicting the counts of types of ties (asym, null, sym).
They formulate a log-linear model for these counts; but the model
is equivalent to a logit model on the dyads:
logit X ij  1   i   j   ( X ji )
Note the subscripts! This implies a distinct parameter for every
node i and j in the model, plus one for reciprocity.
Modeling Social Networks parametrically:
ERGM approaches
Results from SAS version on PROSPER datasets
Modeling Social Networks parametrically:
ERGM approaches
Once you know the basic model format, you can imagine other
specifications:
logit X ij  1   i   j   ( X ji ) (orig)
logit X ij  1   i   j   g ( X ji ) (different ial reciprocit y)
logit X ij  1   i   j   ( X ji )  (node chars)  (orig)
Key is to ensure that the specification doesn’t imply a linear
dependency of terms.
Model fit is hard to judge – newer work shows that the se’s are
“approximate” ;-)
Modeling Social Networks parametrically:
ERGM approaches
$$p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$
Where:
θ is a vector of parameters (like regression coefficients)
z is a vector of network statistics, conditioning the graph
κ is a normalizing constant, to ensure the probabilities sum to 1.
Modeling Social Networks parametrically:
ERGM approaches
The simplest graph is a Bernoulli random graph, where each Xij
is independent:
$$p(X = x) = \frac{\exp\{\sum_{i,j} \theta_{ij} x_{ij}\}}{\kappa(\theta)}$$
Where:
$\theta_{ij} = \operatorname{logit}[P(X_{ij} = 1)]$
$\kappa(\theta) = \prod_{i,j}[1 + \exp(\theta_{ij})]$
Note this is one of the few cases where κ(θ) can be written in closed form.
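A quick numerical check of this closed form in Python: for a 3-node directed graph, summing the numerator over all 64 possible graphs gives the same value as the product formula. The θ values are arbitrary illustrative numbers.

```python
import itertools
import math

# Arbitrary theta_ij values for the 6 ordered pairs of a 3-node directed graph.
theta = {(0, 1): -0.5, (1, 0): 0.3, (0, 2): 1.0,
         (2, 0): -1.2, (1, 2): 0.0, (2, 1): 0.7}


def kappa_brute_force(theta):
    """Sum exp{sum theta_ij x_ij} over every possible graph x."""
    dyads = list(theta)
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(dyads)):
        total += math.exp(sum(theta[d] * xi for d, xi in zip(dyads, x)))
    return total


def kappa_closed_form(theta):
    """Product over dyads of [1 + exp(theta_ij)]."""
    return math.prod(1.0 + math.exp(t) for t in theta.values())


brute = kappa_brute_force(theta)
closed = kappa_closed_form(theta)
```

The brute-force sum doubles with every added dyad, which is why κ(θ) becomes a problem once the model includes anything beyond independent ties.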
Modeling Social Networks parametrically:
ERGM approaches
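Typically, we add a homogeneity condition, so that all
isomorphic graphs are equally likely. The homogeneous
Bernoulli graph model:
$$p(X = x) = \frac{\exp\{\theta \sum_{i,j} x_{ij}\}}{\kappa(\theta)}$$
Where:
$\kappa(\theta) = [1 + \exp(\theta)]^{g(g-1)}$, since a directed graph on g nodes has g(g-1) ordered pairs.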
Modeling Social Networks parametrically:
ERGM approaches
If we want to condition on anything much more complicated than
density, the normalizing constant ends up being a problem. We
need a way to express the probability of the graph that doesn’t
depend on that constant. First some terms:
$X^{+}_{ij}$ = Sociomatrix with ij element forced to 1
$X^{-}_{ij}$ = Sociomatrix with ij element forced to 0
$X^{c}_{ij}$ = Sociomatrix with no tie coded between i and j (all ties other than ij)
Modeling Social Networks parametrically:
ERGM approaches
$$\exp(\omega_{ij}) = \frac{p(X_{ij} = 1 \mid X^{c}_{ij})}{p(X_{ij} = 0 \mid X^{c}_{ij})} = \frac{\exp\{\theta' z(x^{+}_{ij})\}}{\exp\{\theta' z(x^{-}_{ij})\}} = \exp\{\theta'[z(x^{+}_{ij}) - z(x^{-}_{ij})]\}$$
$$\omega_{ij} = \log\left\{\frac{p(X_{ij} = 1 \mid X^{c}_{ij})}{p(X_{ij} = 0 \mid X^{c}_{ij})}\right\} = \theta'[z(x^{+}_{ij}) - z(x^{-}_{ij})]$$
Modeling Social Networks parametrically:
ERGM approaches
$$\omega_{ij} = \log\left\{\frac{p(X_{ij} = 1 \mid X^{c}_{ij})}{p(X_{ij} = 0 \mid X^{c}_{ij})}\right\} = \theta'[z(x^{+}_{ij}) - z(x^{-}_{ij})]$$
Note that we can now model the conditional probability of
the graph, as a function of a set of difference statistics,
without reference to the normalizing constant. The model,
then, simply reduces to a logit model on the dyads.
Modeling Social Networks parametrically:
ERGM approaches
$$\omega_{ij} = \log\left\{\frac{p(X_{ij} = 1 \mid X^{c}_{ij})}{p(X_{ij} = 0 \mid X^{c}_{ij})}\right\} = \theta'[z(x^{+}_{ij}) - z(x^{-}_{ij})]$$
Consider the simplest possible model: the Bernoulli random graph
model, which says the only feature of interest is the number of
edges in the graph. What is the change statistic for that feature?
$z(x^{+}_{ij}) = 1$ (assume edge is present, so value is one)
$z(x^{-}_{ij}) = 0$ (assume edge is absent, so value is zero)
$z(x^{+}_{ij}) - z(x^{-}_{ij}) = 1$ (difference is 1 for all dyads)
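To make the change-statistic idea concrete, here is a small Python sketch (NumPy assumed) that computes z(x+) - z(x-) for two statistics: the number of edges and the number of mutual dyads. The mutual statistic is added only for contrast; the slide covers edges alone.

```python
import numpy as np


def network_stats(adj):
    """z(x) = [number of edges, number of mutual dyads]."""
    edges = adj.sum()
    mutual = np.sum(adj * adj.T) // 2      # each mutual dyad appears twice
    return np.array([edges, mutual])


def change_stats(adj, i, j):
    """z(x+_ij) - z(x-_ij): force the ij tie to 1, then to 0, and difference."""
    plus, minus = adj.copy(), adj.copy()
    plus[i, j] = 1
    minus[i, j] = 0
    return network_stats(plus) - network_stats(minus)


# Tiny example: the only tie is 1 -> 0.
adj = np.zeros((3, 3), dtype=int)
adj[1, 0] = 1

d_01 = change_stats(adj, 0, 1)   # adding 0 -> 1 would reciprocate 1 -> 0
d_02 = change_stats(adj, 0, 2)   # adding 0 -> 2 reciprocates nothing
```

The edges entry is 1 for every dyad, which is why the edges-only model is an intercept-only logit; the mutual entry equals x_ji.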
Modeling Social Networks parametrically:
ERGM approaches
Consider the simplest possible model: the Bernoulli random graph
model, which says the only feature of interest is the number of
edges in the graph. What is the change statistic for that feature?
The “Edges” parameter is simply an intercept-only model.
ADJMAT
NODE   1 2 3 4 5 6 7 8 9
1      0 1 1 1 0 0 0 0 0
2      1 0 1 0 0 0 1 0 0
3      1 1 0 0 1 0 1 0 0
4      1 0 0 0 1 0 0 0 0
5      0 0 1 1 0 1 0 1 0
6      0 0 0 0 1 0 0 1 1
7      0 1 1 0 0 0 0 0 0
8      0 0 0 0 1 1 0 0 1
9      0 0 0 0 0 1 0 1 0
Density: 0.361 (26 ties among 72 ordered pairs)
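proc logistic descending data=dydat;
  model nom = ;   /* intercept-only model */
run; quit;

/* See the results and copy the intercept coefficient (-0.5705 here) */
data chk;
  x = exp(-0.5705) / (1 + exp(-0.5705));   /* inverse logit: recovers the density */
run;

proc print data=chk;
run;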
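The same check can be written in Python using the 9-node example matrix: the intercept of an intercept-only logit is the log-odds of the density, and the inverse logit takes it back. This is a by-hand calculation, not a fitted model.

```python
import math

# The 9-node example adjacency matrix.
adjmat = [
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
]

n = len(adjmat)
ties = sum(sum(row) for row in adjmat)
dyads = n * (n - 1)                      # ordered pairs, diagonal excluded
density = ties / dyads

# The intercept of an intercept-only logit is the log-odds of a tie.
intercept = math.log(density / (1 - density))

# Inverse logit recovers the density, as in the SAS data step.
back = math.exp(intercept) / (1 + math.exp(intercept))
```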
Modeling Social Networks parametrically:
ERGM approaches
The logit model estimation procedure was popularized by Wasserman
& colleagues, and a good guide to this approach is:
A Practical Guide To Fitting p* Social Network Models
Via Logistic Regression
The site includes the PREPSTAR program for creating the
variables of interest. The following example draws from this
work, which nicely walks you through the logic of
constructing change variables, model fit and so forth.
But the estimates are not very good for any parameters other than
“dyad independent” parameters!
Modeling Social Networks parametrically:
ERGM approaches
Parameters that are often fit include:
1) Expansiveness and attractiveness parameters (dummies for
each sender/receiver in the network)
2) Degree distribution
3) Mutuality
4) Group membership (and all other parameters by group)
5) Transitivity / Intransitivity
6) K-in-stars, k-out-stars
7) Cyclicity
8) Node-level covariates (Matching, difference)
9) Edge-level covariates (dyad-level features such as exposure)
10) Temporal data – such as relations in prior waves.
Modeling Social Networks parametrically:
Exponential Random Graph Models
…and there are LOTS of terms…
An example:
[Figure: Network Model Coefficients, In school Networks (vertical axis from -0.6 to 0.8)]
Modeling Social Networks parametrically:
Exponential Random Graph Models
In practice, models estimated by logit are difficult to fit, and
we have no good sense of how approximate the pseudo-maximum-likelihood estimate (PMLE) is.
The STATNET generalization is to use MCMC methods to better
estimate the parameters. This is essentially a simulation
procedure working “under the hood” to explore the space of
graphs described by the model parameters; searching for the best
fit to the observed data.
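To give a feel for the simulation step, here is a toy Metropolis sampler in plain Python for a model with edges and mutual terms. It only draws graphs for given parameter values; it does not estimate anything and is not the statnet algorithm. All names and settings are illustrative.

```python
import math
import random


def sample_ergm(n, theta_edges, theta_mutual, n_steps=40000,
                burn_in=4000, thin=100, seed=0):
    """Toggle one random ordered pair per step; accept with the Metropolis rule."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    densities = []
    for step in range(n_steps):
        i, j = rng.sample(range(n), 2)            # random ordered pair, i != j
        sign = 1 if adj[i][j] == 0 else -1        # +1 adds the tie, -1 removes it
        # theta' times the change statistics: edges changes by 1, mutual by x_ji.
        delta = sign * (theta_edges + theta_mutual * adj[j][i])
        if delta >= 0 or rng.random() < math.exp(delta):
            adj[i][j] = 1 - adj[i][j]
        if step >= burn_in and step % thin == 0:
            ties = sum(map(sum, adj))
            densities.append(ties / (n * (n - 1)))
    return adj, densities


# With no mutual effect this is a Bernoulli graph, so the average density
# should sit near the inverse logit of theta_edges (about 0.36 here).
_, densities = sample_ergm(10, theta_edges=-0.5705, theta_mutual=0.0)
mean_density = sum(densities) / len(densities)
```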
Modeling Social Networks parametrically:
Exponential Random Graph Models
You can specify a model as a simple statement on terms:
Example 2: Hierarchy & Context in AH
Modeling Social Networks parametrically:
Exponential Random Graph Models: Degeneracy
"Assessing Degeneracy in Statistical Models of Social Networks" Mark S. Handcock, CSSS Working Paper #39
Generating Random Graph Samples
A conceptual merge between exponential random graph models and QAP/sensitivity
models is to attempt to identify a sample of graphs from the universe you are trying
to model.
$$p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$
That is, generate X empirically, then compare z(x) to see how
likely a measure on x would be given X. The difficulty,
however, is generating X.
Generating Random Graph Samples
The first option would be to generate all non-isomorphic graphs within a given
constraint.
This is possible for small graphs, but the number gets large fast. For a
network with 3 nodes, there are 16 possible directed graphs. For a
network with 4 nodes, there are 218; for 5 nodes, 9,608; for 6
nodes, 1,540,944; and so on…
So, the best approach is to sample from the universe, but, of course, if you
had the universe you wouldn’t need to sample from it. How do you
sample from a population you haven’t observed?
(a) Use a construction algorithm that generates a random graph with
known constraints, or (b) use an ERGM like above.
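The counts for 3 and 4 nodes can be checked by brute force in Python: enumerate every directed graph and keep one representative per isomorphism class. The canonical form used here (the smallest relabeled arc list) is just one simple choice, and the approach is hopeless beyond about 5 nodes, which is the point.

```python
from itertools import permutations, product


def count_digraph_classes(n):
    """Number of directed graphs on n nodes, counted up to isomorphism."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    perms = list(permutations(range(n)))
    seen = set()
    for bits in product((0, 1), repeat=len(pairs)):
        arcs = [p for p, b in zip(pairs, bits) if b]
        # Canonical form: the smallest sorted arc list over all relabelings.
        canon = min(
            tuple(sorted((perm[i], perm[j]) for i, j in arcs))
            for perm in perms
        )
        seen.add(canon)
    return len(seen)


classes_3 = count_digraph_classes(3)
classes_4 = count_digraph_classes(4)
```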
Generating Random Graph Samples
Tom Snijders has a program called ZO (Zero-One) for doing this.
http://stat.gamma.rug.nl/snijders/
The program only works well for smallish networks (less than ~100)
Generating Random Networks with Structural Constraints.
General strategy:
Assign arcs at random within the cells of an adjacency matrix until
the desired graph is achieved.
Process.
1) Define the pool of open arcs.
Any cells of the g by g matrix which are structurally zero are not allowed.
[Figure: 10 × 10 adjacency matrix with row totals 5, 3, 4, 5, 1, 3, 0, 2, 6, 7 and column totals 7, 1, 3, 6, 0, 5, 3, 4, 2, 5]
Generating Random Networks with Structural Constraints.
2) Randomly draw an element from the available set.
Generating Random Networks with Structural Constraints.
3) Check to see if selected cell meets the
structural condition.
4) If a condition is met, then remove any
implicated cells from the pool.
[Figure: two copies of the adjacency matrix with its row and column totals]
Generating Random Networks with Structural Constraints.
5) Check for Identification: Does the last arc
imply the set of arcs for another?
In this example, there are only 7 available spots left in the last row,
equal to the number needed to fill that row condition.
Generating Random Networks with Structural Constraints.
Process:
1) Identify the pool of open cells.
2) Randomly draw an arc from this pool.
3) Check the structural conditions against this arc.
4) If structural conditions are met, then remove implied cells from
the pool.
5) Check for identification of other arcs.
Types of constraints:
• Structural Patterns, such as the in and out degree,
prohibition against cycles, etc.
• Category Mixing Constraints. Nodes in category i
restricted to nodes from category j.
• Event Counts. Number of mutual arcs, number of ties
between group i and j, etc.
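A simplified Python sketch of the process for one type of constraint, fixed in- and out-degree. It covers steps 1 to 4 only. Instead of the identification check in step 5, it simply starts over when a draw sequence dead-ends, and it makes no claim to sample uniformly. This is not the ZO or RANFIX code. The degree sequence in the example is taken from the 9-node ADJMAT earlier, so a solution is known to exist.

```python
import random


def random_graph_fixed_degrees(out_deg, in_deg, max_tries=10000, seed=0):
    """Assign arcs at random until every row and column total is met."""
    rng = random.Random(seed)
    n = len(out_deg)
    for _ in range(max_tries):
        out_left, in_left = list(out_deg), list(in_deg)
        adj = [[0] * n for _ in range(n)]
        # Step 1: pool of open cells (the diagonal is structurally zero).
        pool = [(i, j) for i in range(n) for j in range(n) if i != j]
        # Step 2: a shuffled pool is a sequence of random draws without replacement.
        rng.shuffle(pool)
        for i, j in pool:
            # Step 3: does this arc still fit the row and column conditions?
            if out_left[i] > 0 and in_left[j] > 0:
                adj[i][j] = 1
                out_left[i] -= 1
                in_left[j] -= 1
            # Step 4 is implicit: cells in a filled row or column are skipped.
        if not any(out_left) and not any(in_left):
            return adj
        # Dead end: some totals cannot be met any more, so restart.
    raise RuntimeError("no graph found; the degree sequence may be infeasible")


degrees = [3, 3, 4, 2, 4, 3, 2, 3, 2]     # row and column totals of ADJMAT
graph = random_graph_fixed_degrees(degrees, degrees)
```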
Social Relations at “Holy Trinity” School.
[Figure: sociogram of the school network; nodes keyed by grade, 7th Grade through 12th Grade]
g = 74
l = 466
d = .086
M = 108
Transitivity: .357
Mean Degree: 6.3
Number of Mutual Dyads
2000 Networks, with fixed In and Out Degree
[Figure: histogram of Number of Networks (vertical axis, 0 to 250) by Number of Mutual Dyads (horizontal axis, 5 to 40); series: Z.O. and RANFIX]
Distribution of Selected Triad Types
Simulations compared to Observed
2000 random networks, with fixed in and out degree.
[Figure: Count (vertical axis, 0 to 350) by triad type (030T, 201, 120D, 120U, 120C, 210, 300); series: Z.O., RANFIX and Observed]
Romantic Networks