Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
The Generalized Exponential Random Graph Model for Weighted Networks James D. Wilson1 , Matthew J. Denny2 , Shankar Bhamidi3 , Skyler Cranmer4 , Bruce Desmarais2 1 University of San Fransisco 4 2 Penn State 3 UNC Chapel Hill OSU June 24th, 2016 [email protected], mjdenny.com Matt Denny The GERGM June 24th, 2016 1 / 23 Outline 1 ERGMs for Unweighted Graphs 2 The Generalized Exponential Random Graph Model (GERGM) 3 Metropolis Hastings: Expanded Applicability 4 Likelihood Degeneracy 5 Software! Matt Denny The GERGM June 24th, 2016 2 / 23 Assessing Network Structure Matt Denny The GERGM June 24th, 2016 3 / 23 Exponential Random Graph Models (ERGMs) Setting: Network x composed of n nodes and M edges Assume binary edges between nodes i, j: xi,j ∈ {0, 1} X : family of all graphs with n nodes and binary edges PX : probability measure on X Aim: Identify (and then estimate) the relational covariates that capture network structure through PX Matt Denny The GERGM June 24th, 2016 4 / 23 Exponential Random Graph Models (ERGMs) The probability (likelihood) of observing network x: PX (x, θ) = exp(θ T h(x)) T ∑ exp(θ h(z)) , x ∈ {0, 1}M z∈X h ∶ {0, 1}M → Rp : Network covariates θ ∈ Rp : unknown parameters (we have to estimate these!) Matt Denny The GERGM June 24th, 2016 5 / 23 Weighted Graphs Setting: Network y composed of n nodes and M edges Edges are continuous valued between nodes i, j: yi,j ∈ (−∞, ∞) Y : family of all graphs with n nodes PY : probability measure on Y Examples: Migration, Finance, Communication, Voting, Biological, Trade, etc. Matt Denny The GERGM June 24th, 2016 6 / 23 Generalized Exponential Random Graph Model Probability measure on family of weighted graphs w/ n nodes, M edges Model: Two Steps 1) Model joint structure of Y on restricted network X ∈ [0, 1]M : fX (x, θ) = exp (θ T h(x)) T ∫[0,1]m exp (θ h(z)) dz , x ∈ [0, 1]M h ∶ [0, 1]M → Rp : Network covariates (endogeneous or exogenous) θ ∈ Rp : unknown structural parameters Matt Denny The GERGM June 24th, 2016 7 / 23 Generalized Exponential Random Graph Model 2) Transform onto space of continuous weights: fY (y , θ, Λ) = exp (θ T h(T (y , β))) T ∫[0,1]m exp (β h(z)) dz ∏ tij (y , β), y ∈ RM ij T ∶ RM → [0, 1]m : parametric transformation function Monotonically increasing An appropriate choice: cumulative distribution functions β ∈ Rq : unknown transformation parameters Matt Denny The GERGM June 24th, 2016 8 / 23 Features of the GERGM Flexible model for any type of weighted network Estimation via Markov Chain Monte-Carlo or Maximum pseudo-likelihood Simplifies to multivariate linear regression when h(x) = 0 Matt Denny The GERGM June 24th, 2016 9 / 23 Model Specification: Weighted Network Statistics Network Statistic Parameter Value αR Reciprocity θR ⎛ ⎞ ∑ xij xji ⎝ i<j ⎠ Cyclic Triads θCT ⎞ ⎛ ∑ (xij xjk xki + xik xkj xji ) ⎠ ⎝i<j<k In-Two-Stars θITS ⎛ ⎞ ∑ ∑ xji xki ⎝ i j<k ≠i ⎠ θOTS ⎛ ⎞ ∑ ∑ xij xik ⎝ i j<k ≠i ⎠ αCT αITS αOTS Out-Two-Stars αE Edge Density θE Transitive Triads θTT ⎛ ⎞ ∑ xij ⎝ i≠j ⎠ ⎛ ∑ (xij xjk xik + xij xkj xki + xij xkj xik ) + ⎝i<j<k αTT ⎞ ∑ (xji xjk xki + xji xjk xik + xji xkj xki ) ⎠ i<j<k Matt Denny The GERGM June 24th, 2016 10 / 23 Inference on the GERGM Likelihood of GERGM is intractable; relies on MCMC Gibbs (Desmarais, et al., 2012) Major Issue: Restricts model specification by requiring first order network statistics ∂ 2 h(x) = 0, i, j ∈ [n] ∂xij2 Metropolis-Hastings (Wilson, et al. 2015): Removes above restriction on GERGM Matt Denny The GERGM June 24th, 2016 11 / 23 Metropolis-Hastings Sampling Framework: Acceptance/Rejection algorithm for weighted edges w/ multivariate truncated normal proposal distribution. Proposal: qσ (w∣x) = σ −1 φ( w−x σ ) −x Φ( 1−x σ ) − Φ( σ ) , 0≤w ≤1 Advantages: Flexible model specification Interaction effects Exponential weighting of covariates New available models can avoid likelihood degeneracy Matt Denny The GERGM June 24th, 2016 12 / 23 Comparison to Gibbs Sampling: U.S. Migration Data Describes the inter-state migration in U.S. from 2006 to 2007. Fit a GERGM with 5 network statistics, 11 demographic covariates Matt Denny The GERGM June 24th, 2016 13 / 23 Equivalent Covariate Estimates Matt Denny The GERGM June 24th, 2016 14 / 23 Goodness of Fit M-H and Gibbs have comparable (and good) performance Matt Denny The GERGM June 24th, 2016 15 / 23 Likelihood Degeneracy GERGM? 3 2 1 0 Density 4 5 Simulated Network Densities ERGM 0.0 0.2 0.4 0.6 0.8 1.0 Network Density Matt Denny The GERGM June 24th, 2016 16 / 23 Case Study: In-Two-Stars Model Model: fX (x, θ, α) = exp (θE hE (x) + θITS hITS (x)) , C(θE , θITS ) x ∈ [0, 1]m Edge density: hE (x) = ∑i≠j xij /m α In-Two-Stars: hITS (x, α) = (∑i ∑j<k ≠i xji xki ) Known to suffer from degeneracy issues in the binary case. Matt Denny The GERGM June 24th, 2016 17 / 23 Exponential Down-Weighting In-Two-Stars 350 α 300 0.1 0.25 0.5 0.75 0.9 1 Edge Density 1.0 0.8 250 0.6 200 150 0.4 100 α 0 8 7 6 5 4 3 2 1 0 −1 −2 −3 −4 −5 −6 −7 −8 −9 −10 9 8 10 7 6 5 4 3 2 1 0 −1 −2 −3 −4 −5 −6 −7 −8 −9 −10 0.0 9 0.1 0.25 0.5 0.75 0.9 1 10 0.2 50 Figure: Statistics from 1M simulated In-Two-Stars networks with various values of α weighting. Matt Denny The GERGM June 24th, 2016 18 / 23 Non-Degeneracy = 0.75 0.75 0.75 0.75 0.50 0.25 Scaled Frequency 1.00 0.00 0.50 0.25 6 7 0.50 0.25 0.00 0.00 5 8 30 Simulated In−Two−Stars Value 40 80 50 1.00 0.75 0.75 0.75 0.25 0.00 Scaled Frequency 1.00 1.00 0.50 0.50 0.25 0.3 0.4 Simulated Network Density 0.5 160 200 0.50 0.25 0.00 0.00 0.2 120 Simulated In−Two−Stars Value Simulated In−Two−Stars Value Scaled Frequency Scaled Frequency = 1.00 1.00 Scaled Frequency Scaled Frequency = 0.50 1.00 0.4 0.5 0.6 Simulated Network Density 0.7 0.5 0.6 0.7 0.8 Simulated Network Density Matt Denny GERGM and network density June 24th, 2016 19 Figure: Histograms of simulated in The two-stars statistics for/ 23 Defeating Degeneracy Causes: Bimodal density distribution vs. poor initialization. Solutions: Estimate intercept via transformation. Exponential down-weighting. Weighted MPLE. Adaptive grid search. Matt Denny The GERGM June 24th, 2016 20 / 23 GERGM Software in R github.com/matthewjdenny/GERGM Matt Denny The GERGM June 24th, 2016 21 / 23 Thank you! Funding: National Science Foundation Grants: DGE-1144860, DMS-1105581, DMS- 1310002, SES-1357622, SES-1357606, SES-1461493, and CISE-1320219 Materials: Paper: http://ssrn.com/abstract=2795219 Github: github.com/matthewjdenny/GERGM CRAN: GERGM Matt Denny The GERGM June 24th, 2016 22 / 23 GERGM - Metropolis Hastings Procedure 1 (t) (t) For i, j ∈ [n], generate proposal edge x̃i,j ∼ qσ (⋅∣xi,j ) independently across edges. 2 Set x (t+1) ⎧ (t) (t) ⎪ ⎪ ⎪x̃ = (x̃i,j )i,j∈[n] =⎨ ⎪ ⎪x (t) ⎪ ⎩ w.p. ρ(x (t) , x̃ (t) ) w.p. 1 − ρ(x (t) , x̃ (t) ) where ρ(x, y ) = min ( fX (y ∣θ) m qσ (xi ∣yi ) , 1) ∏ fX (x∣θ) i=1 qσ (yi ∣xi ) m = min (exp (θ′ (h(y ) − h(x))) ∏ i=1 Matt Denny The GERGM qσ (xi ∣yi ) , 1) qσ (yi ∣xi ) June 24th, 2016 23 / 23