Introductory Social Network Analysis with Pajek
November 4, 2016
Teaching Assistant: Anum Masood
SEIEE

Overview of Network Analysis Tools
• Pajek: network analysis and visualization, menu driven, suitable for large networks
– platforms: Windows (on Linux via Wine)
• NetLogo: agent-based modeling, recently added network modeling capabilities
– platforms: any (Java)
• GUESS: network analysis and visualization, extensible, script-driven (Jython)
– platforms: any (Java)

Other software tools that we will not be using but that you may find useful:
• visualization and analysis:
– UCInet - user-friendly social network visualization and analysis software (suitable for smaller networks)
– iGraph - if you are familiar with R, you can use iGraph as a module to analyze or create large networks, or you can directly use the C functions
– Jung - comprehensive Java library of network analysis, creation and visualization routines
– Graph package for Matlab (untested?) - if Matlab is the environment you are most comfortable in, here are some basic routines
– SIENA - for p* models and longitudinal analysis
– SNA package for R - all sorts of analysis + heavy-duty stats to boot
– NetworkX - Python-based free package for analysis of large graphs
– InfoVis Cyberinfrastructure - large agglomeration of network analysis tools/routines, partly menu driven
• visualization only:
– GraphViz - open source network visualization software (can handle large/specialized networks)
– TouchGraph - need to quickly create an interactive visualization for the web?
– yEd - free graph visualization and editing software
• specialized:
– CLAIR library - NLP and IR library (Perl based), includes network analysis routines

Tools Useful for Social Networks
• Pajek: extensive menu-driven functionality, including many network metrics and manipulations
– But not extensible
• GUESS: extensible, scriptable tool for exploratory data analysis, but more limited selection of built-in methods compared to Pajek
• NetLogo: general agent-based simulation platform with excellent network modeling support
– many of the demos in this course were built with NetLogo
• NetDraw: network visualization tool associated with UCInet. UCInet is not free, but NetDraw is.

Other Tools: Gephi / Cytoscape
• http://gephi.org
• primarily for visualization, has some nice touches
• Cytoscape is mainly used for visualization of biological networks, pathways, and interaction analysis.

Other Visualization Tools: Walrus
• developed at CAIDA, available under the GNU GPL.
• “…best suited to visualizing moderately sized graphs that are nearly trees.
A graph with a few hundred thousand nodes and only a slightly greater number of links is likely to be comfortable to work with.”
• Java-based
• Implemented Features
– rendering at a guaranteed frame rate regardless of graph size
– coloring nodes and links with a fixed color, or by RGB values stored in attributes
– labeling nodes
– picking nodes to examine attribute values
– generating subgraph: displaying a subset of nodes or links based on a user-supplied boolean attribute
– interactive pruning of the graph to temporarily reduce clutter and occlusion
– zooming in and out
Source: CAIDA, http://www.caida.org/tools/visualization/walrus/

Visualization Tool: GraphViz
• Takes descriptions of graphs in simple text languages
• Outputs images in useful formats
• Options for shapes and colors
• Standalone or use as a library
• dot: hierarchical or layered drawings of directed graphs, by avoiding edge crossings and reducing edge length
• neato (Kamada-Kawai) and fdp (Fruchterman-Reingold with heuristics to handle larger graphs)
• twopi: radial layout
• circo: circular layout
http://www.graphviz.org/

Dot (GraphViz)

Visualization Tools: yEd - Java™ Graph Editor
http://www.yworks.com/en/products_yed_about.htm
(good primarily for layouts, scales better, maybe free)

yEd and 26,000 Nodes (Takes a Few Seconds)

Visualization Tools: Prefuse
• (free) user interface toolkit for interactive information visualization
– built in Java using Java2D graphics library
– data structures and algorithms
– pipeline architecture featuring reusable, composable modules
– animation and rendering support
– architectural techniques for scalability
• requires knowledge of Java programming
• website: http://prefuse.sourceforge.net/
– CHI paper http://guir.berkeley.edu/pubs/chi2005/prefuse.pdf

Simple Prefuse Visualizations
Source: Prefuse, http://prefuse.sourceforge.net/

Prefuse Application: Flow Maps
A flow map of migration from California from 1995-2000, generated automatically by the Prefuse system using
edge routing but no layout adjustment.
http://graphics.stanford.edu/papers/flow_map_layout/

Prefuse Application: Vizster
http://jheer.org/vizster/

Visualization Tool: Manyeyes
• http://manyeyes.alphaworks.ibm.com/manyeyes/
• Only for visualization
• Not just for networks, but for many other data types
• Web based, very easy to use

Outline
• In Pajek
– visualization and layouts
– degree
– connected components
– snowball sampling
– one mode projections of bipartite graphs
– thresholding weighted graphs

Using Pajek for Exploratory Social Network Analysis
• Pajek (pronounced in Slovenian as Pah-yek) means ‘spider’
• website: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
• wiki: http://pajek.imfm.si/doku.php
– download application (free)
– tutorials
– lectures
– datasets
• Windows only (works on Linux via Wine, Mac via Darwine)
• Reference book: ‘Exploratory Social Network Analysis with Pajek’ by Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj

Pajek: Interface we’ll use today
• Drop down list of networks opened or created with Pajek. The active network is displayed.
• Drop down list of network partitions by discrete variables, e.g. degree, mode, label
• Drop down list of continuous node attributes, e.g.
centrality, clustering coefficients
• can be used for clustering
Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download

Pajek: Opening a Network File
• Click on the folder icon to open a file
• Save changes to your network, network partitions, etc., if you’d like to keep them

Pajek: Working with Network Files
• The active network, partition, etc. is shown on top of the drop down list
• Draw the network

Pajek data format
(Figure: example network with nodes Louise, Ada, Cora)
    *Vertices 26
    1 "Ada" 0.1646 0.2144 0.5000
    2 "Cora" 0.0481 0.3869 0.5000
    3 "Louise" 0.3472 0.1913 0.5000
    ..
    *Arcs
    1 3 2 c Black
    1 2 1 c Black
    2 1 1 c Black
    ..
    *Edges
    2 3 1 c Black
    ..
Annotations:
• *Vertices 26: number of vertices
• the three numbers after each vertex label: vertex x, y, z coordinates (optional)
• 1 3 2 c Black: directed edge from Ada (1) to Louise (3) with weight “2” and color Black
• undirected edge between Ada (1) and Cora (2) with weight “1” and color Black

Pajek: Let’s Get Started
• Opening a network
– File > Network > Read
• Visualization
– Draw > Draw
• Essential measurements

Pajek: Opening a File
• A planar graph and layouts in Pajek
• Download the file ‘NetScience.net’ from the website [http://vlado.fmf.uni-lj.si/pub/networks/data/collab/netscience.htm]
• Open it in Pajek by either clicking on the yellow folder icon under the word "Network" or by selecting File > Network > Read from the main menu panel
• A report window should pop up confirming that the graph has been read, and the filename and location will be displayed in the 'active' position of the network dropdown list

Pajek: Visualization & Manual Positioning
• Visualize the network using Pajek's Draw > Draw command from the main menu panel.
• This will bring up the 'draw' window with its own menu bar at the top
• Reposition the vertices by clicking on them and holding down the mouse button while dragging them to a new location.
Continue doing this until you have shown that the graph is planar (no edges have to cross)
• (If you think this is really fun to do in your spare time, go to http://www.planarity.net)

Pajek: Visualization & Layout Algorithms
• Now let Pajek do the work for you by selecting from the draw toolbar several layout algorithms under 'Layout > Energy'.
• Why did you select the layout algorithm you did?
• Did the layout leave any lines crossed? If you were to do this assignment over, what order would you do it in?

A Directed Network
• girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)
• first and second choices shown
(Figure: directed network of the 26 girls: Louise, Ada, Lena, Adele, Marion, Jane, Frances, Cora, Eva, Maxine, Mary, Anna, Ruth, Edna, Robin, Betty, Martha, Jean, Laura, Alice, Hazel, Helen, Ellen, Ella, Irene, Hilda)

Node Centrality: Degree
• Node network properties
– from immediate connections
• indegree: how many directed edges (arcs) are incident on a node
• outdegree: how many directed edges (arcs) originate at a node
• degree (in or out): number of edges incident on a node
(Figure labels: indegree=3, outdegree=2, degree=5)

Centrality: Degree
• More on degree and other centrality measures in the next lecture…
• Degree: calculate it
– Net > Partitions > Degree
• Visualize degree centrality
– Draw > Draw-Vector
– If nodes are not the right size, use the resize option
• Options > Size > of Vertices
• Adjust the default size

Connected Components
• Strongly connected components
– Any two nodes in the component can be reached from each other by following directed edges
– In the example figure (nodes A to H): {B, C, D, E}, {A}, {G, H}, {F}
• Weakly connected components: every node can be reached from every other node when the direction of the edges is ignored
– In the example figure: {A, B, C, D, E}, {G, H, F}
• In undirected networks one talks simply about “connected components”

The bowtie model of the Web
Broder et al.
(1999)
• SCC (strongly connected component):
– can reach all nodes from any other by following directed edges
• IN
– can reach SCC from any node in ‘IN’ component by following directed edges
• OUT
– can reach any node in ‘OUT’ component from SCC
• Tendrils and tubes
– connect to IN and/or OUT components but not SCC
• Disconnected
– isolated components

Bipartite networks
Going from a Bipartite to a One-mode Graph
(Figure: two-mode network with nodes in group 1 and group 2)
• One mode projection
– two nodes from the first group are connected if they link to the same node in the second group
– some loss of information
– naturally high occurrence of cliques

Pajek: Wrap Up
• Used frequently by sociologists
– UCInet is comparable and arguably more user friendly (but not free)
• Extensive functionality
– But not extensible
• What we covered
– visualization
– node properties: degree
– connected components
– k-neighbors
– converting two-mode networks to one-mode
– thresholding the network

Quick Overview: Pajek
Pajek: Program for Large Network Analysis.
Download page: http://pajek.imfm.si/doku.php?id=download
Manual: http://vlado.fmf.uni-lj.si/pub/networks/pajek/doc/pajekMan.pdf

Quick Overview: Pajek
• Draw “Network” with Pajek:
– List of neighbours (Arcslist/Edgeslist) (unweighted graph)
– Pairs of lines (Arcs/Edges) (weighted graph)
– Matrix

Quick Overview: Pajek
• List of neighbours (Arcslist/Edgeslist)
    *Vertices 5
    1 "a"
    2 "b"
    3 "c"
    4 "d"
    5 "e"
    *Arcslist
    1 2 4
    2 3
    3 1 4
    4 5
    *Edgeslist
    1 5
Words starting with * must be written in the first column of the line. The definition of the vertices follows the *Vertices line: each vertex is given a label. *Arcslist declares a list of directed lines from the selected vertices; *Edgeslist declares a list of undirected lines. No empty lines are allowed.
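The list-of-neighbours format described above can be read with a few lines of Python. This is only a minimal sketch, not part of Pajek: the function name `parse_pajek_lists` is hypothetical, and the example assumes the slide's lists read `1 2 4`, `2 3`, `3 1 4`, `4 5` (arcs) and `1 5` (edge), with plain double quotes around the labels.

```python
import shlex


def parse_pajek_lists(text):
    """Parse a *Vertices / *Arcslist / *Edgeslist description.

    Returns (labels, arcs, edges): labels maps vertex number to label,
    arcs is a list of directed (source, target) pairs, and edges is a
    list of undirected (u, v) pairs.
    """
    labels, arcs, edges = {}, [], []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # the format forbids empty lines; skip them defensively
        if line.startswith("*"):
            # Keywords such as *Vertices start a new section.
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            # shlex keeps a quoted label together even if it contains spaces.
            number, label = shlex.split(line)[:2]
            labels[int(number)] = label
        elif section in ("*arcslist", "*edgeslist"):
            # First number is the source; the rest are its neighbours.
            source, *targets = (int(tok) for tok in line.split())
            pairs = [(source, target) for target in targets]
            (arcs if section == "*arcslist" else edges).extend(pairs)
    return labels, arcs, edges


EXAMPLE = """*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcslist
1 2 4
2 3
3 1 4
4 5
*Edgeslist
1 5
"""

labels, arcs, edges = parse_pajek_lists(EXAMPLE)
print(labels)
print(arcs)
print(edges)
```

Each row of a list section expands into one pair per neighbour, so the four *Arcslist rows yield six arcs and the single *Edgeslist row yields one edge.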

Quick Overview: Pajek
1. Read the .net file
2. Draw the network

Quick Overview: Pajek
• Pairs of lines (Arcs/Edges)
    *Vertices 5
    1 "a"
    2 "b"
    3 "c"
    4 "d"
    5 "e"
    *Arcs
    1 2 1
    1 4 1
    2 3 2
    3 1 1
    3 4 2
    4 5 1
    *Edges
    1 5 1
Every arc/edge is defined separately on a new line: the initial and terminal vertex are given. Directed lines are defined using *Arcs, undirected lines are defined using *Edges, and the third number in each row gives the weight.

Quick Overview: Pajek
• Matrix
    *Vertices 5
    1 "a"
    2 "b"
    3 "c"
    4 "d"
    5 "e"
    *Matrix
    0 1 0 1 1
    0 0 2 0 0
    1 0 0 2 0
    0 0 0 0 1
    1 0 0 0 0
In this format directed lines are given in matrix form (*Matrix). We can transform bidirected arcs to edges.

Quick Overview: Pajek
• Export to bmp, eps…

Case Study: Pajek
• Computing indegree and outdegree using Pajek: double click Partitions

REFERENCES
• Graph and Digraph Glossary example:
– Derived from Bill Cherowitzo's Graph and Digraph Glossary.
• http://www-math.cudenver.edu/~wcherowi/courses/m4408/glossary.html
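
As a supplement to the case study on indegree and outdegree, the same quantities can be checked by hand from the *Matrix example. The sketch below is an illustration in plain Python, not what Pajek does internally; it assumes the matrix rows read `0 1 0 1 1`, `0 0 2 0 0`, `1 0 0 2 0`, `0 0 0 0 1`, `1 0 0 0 0`, and the helper name `degrees` is hypothetical. Degrees here count lines and ignore their weights.

```python
# Adjacency matrix of the five-vertex example: entry [i][j] is the weight
# of the arc from vertex i+1 to vertex j+1, and 0 means no arc.
MATRIX = [
    [0, 1, 0, 1, 1],
    [0, 0, 2, 0, 0],
    [1, 0, 0, 2, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]


def degrees(matrix):
    """Return (indegree, outdegree) lists, counting arcs regardless of weight."""
    size = len(matrix)
    # Outdegree of a vertex: nonzero entries in its row.
    outdegree = [sum(1 for weight in row if weight != 0) for row in matrix]
    # Indegree of a vertex: nonzero entries in its column.
    indegree = [
        sum(1 for row in matrix if row[col] != 0) for col in range(size)
    ]
    return indegree, outdegree


indegree, outdegree = degrees(MATRIX)
for vertex, (din, dout) in enumerate(zip(indegree, outdegree), start=1):
    print(f"vertex {vertex}: indegree={din}, outdegree={dout}")
```

The undirected line between vertices 1 and 5 appears in the matrix as a pair of opposite arcs, so it contributes to both the indegree and the outdegree of each endpoint.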