Assignment 4. Transitivity & Hierarchy.

1) The key to measuring transitivity in a network is identifying what proportion of all possible transitive relations are in fact transitive. That is, first identify every place where i sends to j and j sends to k:

    i -> j -> k

A transitive triad forms at such a place if i also sends to k. The transitivity ratio is the number of transitive triads over the number of possible transitive triads.

The first condition is a simple two-path, and we know how to find all paths of length two using the reach program. Thus, the number of possible transitive triples = the number of two-paths in the network. The number of transitive triples is equal to the number of two-paths that are also one-paths, that is, every place in the network where there is both a two-path and a direct arc. We can calculate this from the adjacency matrix A using the following formulas:

    Transitive relations                      = T3 = Sum(A^2 # A)
    Possible transitive relations (two-paths) = I3 = Sum(A^2) - Trace(A^2)
    Transitivity ratio                        = T3 / I3

where the # sign means element-wise multiplication, A^2 is the matrix product of A with itself, and the trace is the sum of the diagonal (subtracted to get rid of two-paths from ego back to ego).

To do this in IML, you would use the lines:

    T3 = sum((X**2)#X);
    I3 = sum(X**2) - trace(X**2);
    Tranrat = T3/I3;
    print Tranrat;

where X is the adjacency matrix you have entered into IML.

Use this formula to write a SAS IML program that calculates the transitivity ratio for the graduate student 'help' network. The program for reading the network and creating PAJEK files is in osugrd_read.sas. The transitivity ratio for help should be 0.3352228 (see triads1.sas for a sample program on different data).

Compare the transitivity ratio for (a) the high school friendship network to the graduate school best-friendship network, and (b) the graduate school best-friendship network to the graduate school help network. Hint: this will require two separate IML statements, one for the graduate school numbers and a second for the high school numbers.
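The formula in question 1 can be cross-checked outside SAS. The following is a minimal Python sketch, not the course's IML program: the helper names (matmul, transitivity_ratio) and the four-node toy network are made up for illustration, and the code assumes a binary adjacency matrix with a zero diagonal.

```python
def matmul(a, b):
    """Plain matrix product of two square lists-of-lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitivity_ratio(adj):
    """T3 / I3, following the formulas in question 1."""
    n = len(adj)
    a2 = matmul(adj, adj)  # a2[i][k] = number of two-paths from i to k
    # T3: two-paths i -> j -> k that are closed by a direct arc i -> k
    t3 = sum(a2[i][k] * adj[i][k] for i in range(n) for k in range(n))
    # I3: all two-paths, minus those that return to ego (the diagonal)
    i3 = sum(map(sum, a2)) - sum(a2[i][i] for i in range(n))
    return t3 / i3


# Hypothetical toy network with arcs 1->2, 1->3, 2->3, 3->4.
# Two-paths: 1->2->3, 1->3->4, 2->3->4; only 1->2->3 is closed by 1->3.
toy = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
ratio = transitivity_ratio(toy)
print(ratio)
```

On the toy network one of the three two-paths is closed, so the ratio is one third. The same function applied to the 'help' adjacency matrix should agree with the IML result, provided the matrix is read in the same way.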
2) To get at the global structure of a network, we would want to look at the triad census: the frequency count of every type of triad in the network. Calculate the triad census by hand for the small network below.

[Figure: the small five-node network]

To do this, you need to categorize each triad, i.e.:

    Node 1  Node 2  Node 3  Type
       1       2       3    021D
       1       2       4    012
       1       2       5    120C
       1       3       4    111D
       1       3       5    210
       1       4       5    201
       2       3       4    102
       2       3       5    111D
       2       4       5    111D
       3       4       5    300

Giving a triad census of:

    Type   Count
    003      0
    012      1
    102      1
    021D     1
    021U     0
    021C     0
    111D     3
    111U     0
    030T     0
    030C     0
    201      1
    120D     0
    120U     0
    120C     1
    210      1
    300      1

3) Of course, we can't calculate the triad census by hand for networks of any real size. Using PAJEK, calculate the triad census for (a) the prison network, (b) the high-school network, and (c) the osugrad_help network. To do this you need to:

a) Create the PAJEK files. You may have already done so and have them saved (from the other homeworks). Otherwise, use the same programs we've used in the earlier homeworks.

b) Once you have the PAJEK file open, go to INFO > NETWORK > TRIADIC CENSUS and you will get a copy of the census.
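Any census, whether built by hand as in question 2 or produced by PAJEK, can be sanity-checked: the counts must add up to the number of distinct triads, n(n-1)(n-2)/6 for n nodes. The sketch below tallies the hand-classified triads from question 2 and applies that check. It is only an illustration in Python; the names TRIAD_TYPES, hand_types and census_from_types are invented here and are not part of the course programs.

```python
from collections import Counter
from math import comb

# The 16 triad types in the order PAJEK reports them.
TRIAD_TYPES = ["003", "012", "102", "021D", "021U", "021C", "111D", "111U",
               "030T", "030C", "201", "120D", "120U", "120C", "210", "300"]

# Hand classification from question 2: (node 1, node 2, node 3) -> type.
hand_types = {
    (1, 2, 3): "021D", (1, 2, 4): "012", (1, 2, 5): "120C",
    (1, 3, 4): "111D", (1, 3, 5): "210", (1, 4, 5): "201",
    (2, 3, 4): "102", (2, 3, 5): "111D", (2, 4, 5): "111D",
    (3, 4, 5): "300",
}


def census_from_types(types_by_triad):
    """Count how many triads fall in each of the 16 types."""
    counts = Counter(types_by_triad.values())
    return [counts.get(t, 0) for t in TRIAD_TYPES]


census = census_from_types(hand_types)
n_nodes = 5
# Every unordered set of three nodes is classified exactly once.
total_ok = sum(census) == comb(n_nodes, 3)

for triad_type, count in zip(TRIAD_TYPES, census):
    print(triad_type, count)
print("counts sum to C(n,3):", total_ok)
```

The same total check applies to larger networks: add the 003 count to the sum of the other fifteen types and compare with C(n,3) for the number of nodes in the PAJEK file.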
Prison:

    --------------------------------------
      1 - 003      39221
    --------------------------------------
      2 - 012       5860
      3 - 102       2336
      4 - 021D        61
      5 - 021U        80
      6 - 021C       103
      7 - 111D       105
      8 - 111U        69
      9 - 030T        13
     10 - 030C         1
     11 - 201         12
     12 - 120D        15
     13 - 120U         7
     14 - 120C         5
     15 - 210         12
     16 - 300          5
    --------------------------------------
    Sum (2 - 16):   8684

Fake School:

    ----------------------------------------------------------
    Type        Number of triads (ni)    Expected (ei)
    ----------------------------------------------------------
      1 - 003        11856124             11688510.46
      2 - 012          484386               812746.29
      3 - 102          172755                 4709.44
      4 - 021D           1844                 4709.44
      5 - 021U           2147                 4709.44
      6 - 021C           2471                 9418.89
      7 - 111D           2075                  109.16
      8 - 111U           1731                  109.16
      9 - 030T            265                  109.16
     10 - 030C             12                   36.39
     11 - 201             571                    0.63
     12 - 120D            192                    0.63
     13 - 120U            156                    0.63
     14 - 120C            105                    1.26
     15 - 210             242                    0.01
     16 - 300              95                    0.00
    ----------------------------------------------------------

4) It turns out that the triad census contains a lot of information about the graph (see Wasserman and Faust, chapter 14). We can use this information by looking at frequencies of combinations of triads. In real data, however, the number of triads is somewhat random, so we want to control for the distribution in a "similar" random graph. The program triads1.sas shows you how to do this. The line

    s_tstat=tstat(tcen,1);

in the program will calculate a set of statistics for the triad census.
The output will look something like this (this is from a different graph):

    Triad Census

    Type     T    TPCNT      PU      EVT    VARTU   STDDIF
    003     21   0.25    0.2157   18.118   3.9576   1.4489
    012     26   0.3095  0.3235   27.176   11.838   -0.342
    102     11   0.131   0.1294   10.871   4.5459   0.0607
    021D     1   0.0119  0.0347   2.9118   2.5215   -1.204
    021U     5   0.0595  0.0347   2.9118   2.5215   1.3151
    021C     3   0.0357  0.0693   5.8235   4.2626   -1.368
    111D     2   0.0238  0.0616   5.1765   4.0205   -1.584
    111U     5   0.0595  0.0616   5.1765   4.0205   -0.088
    030T     3   0.0357  0.0126   1.0588   0.8292   2.1317
    030C     1   0.0119  0.0042   0.3529   0.3274   1.1308
    201      1   0.0119  0.0185   1.5529   1.0074   -0.551
    120D     1   0.0119  0.0063   0.5294   0.4946   0.6691
    120U     1   0.0119  0.0063   0.5294   0.4946   0.6691
    120C     1   0.0119  0.0126   1.0588   0.9196   -0.061
    210      1   0.0119  0.0084   0.7059   0.5754   0.3877
    300      1   0.0119  0.0006   0.0471   0.0448   4.5013

The first column (T) is the standard triad census, exactly what PAJEK produces (in fact, you could use the results from PAJEK as input into the statistics function). It gives the count of triads of each type (here I am using the small network that is the class logo), so we see there are 26 type 012 triads in this network. The second column (TPCNT) gives the proportion of triads of that type, the third (PU) the probability of observing a triad like this given the distribution of dyads, the next (EVT) is the expected value, and then comes the variance (VARTU). The last column (STDDIF) is the standardized difference between observed and expected, and is like a t-test (values > 2 indicate greater-than-chance numbers of the triad type). For example, we observe 1 complete triad (type 300) in the network, but would expect to observe about 0.05 (i.e. none), giving us a t-value of 4.50, meaning that there are more complete triads in this network than we would expect by chance.

In addition to the counts and individual triad frequencies, we can get the weighted sum of particular types to see how closely it matches an ideal distribution (such as the rank-cluster system).
To do this, we include a vector of zeros and ones that indicates whether each triad type is allowed or not in the ideal model, and calculate tau using the formula from the notes. The program calculates tau for the rank-cluster model and the transitivity model. A value of zero would indicate that the model does not fit any better than random; the larger the value, the better the fit.

Run the program and discuss the observed distribution of triads for the school friendship network. What do the values of tau tell us? If you were to modify this program to run on the OSU graduate student network, what would you have to do? (You don't need to do it, but it might be a nice challenge to see if you can.)

Your output should look something like this:

    Type        T    TPCNT       PU      EVT    VARTU   STDDIF
    003    1.19E7   0.9466   0.9466   1.19E7    11426   7.4868
    012    480137   0.0383   0.0384   480961    28844   -4.851
    102    176379   0.0141   0.0141   176149   7836.5    2.594
    021D     1858   0.0001   0.0001   1624.6   1600.3   5.8357
    021U     2125   0.0002   0.0001   1624.6   1600.3    12.51
    021C     2442   0.0002   0.0003   3249.1     3152   -14.38
    111D     2127   0.0002   0.0002   2381.9   2329.9   -5.282
    111U     1775   0.0001   0.0002   2381.9   2329.9   -12.57
    030T      266   212E-7   175E-8    21.93   21.916   52.136
    030C       12   958E-9   584E-9   7.3101   7.3085   1.7348
    201       593   473E-7   0.0001   870.36   846.09   -9.536
    120D      194   155E-7   642E-9   8.0455    8.044   65.565
    120U      158   126E-7   642E-9   8.0455    8.044   52.872
    120C      104    83E-7   128E-8   16.091   16.085   21.919
    210       250     2E-5    94E-8   11.769   11.763   69.459
    300        97   774E-8   114E-9   1.4302   1.4299   79.923

Note the higher and lower than-chance frequencies for certain triads. 030T appears about 52 standard deviations more often than expected by chance, and 300 nearly 80 standard deviations more often. 030C, though rare, occurs essentially at random (the rule of thumb is that to be significant you should have a standardized difference > 2). 111U and 201 occur less often than chance (likely indicating that when i and j agree on k as a friend, they nominate each other).
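The STDDIF column can be reproduced from the other columns of the output: it is the observed count minus the expected count, divided by the square root of the variance. The short Python sketch below checks this for three rows of the school output. The function name std_diff and the dictionary layout are illustrative choices, not something taken from triads1.sas.

```python
from math import sqrt


def std_diff(observed, expected, variance):
    """Standardized difference: (T - EVT) / sqrt(VARTU)."""
    return (observed - expected) / sqrt(variance)


# (T, EVT, VARTU) copied from the school friendship output.
school_rows = {
    "030T": (266, 21.93, 21.916),
    "030C": (12, 7.3101, 7.3085),
    "300": (97, 1.4302, 1.4299),
}

for triad_type, (t, evt, vartu) in school_rows.items():
    print(triad_type, round(std_diff(t, evt, vartu), 3))
```

The results match the printed STDDIF values to within rounding of the inputs. Note that this is a different quantity from the ratio of observed to expected counts: for 030T that ratio is 266 / 21.93, or roughly 12.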
    CVMAT
             003    012    102   021D   021U   021C   111D   111U   030T   030C    201   120D   120U   120C    210    300
    003     11E3   -2E4   -6E3   1496   1496   2991   2193   2193   40.9   13.6    805     15     15     30     22   2.67
    012     -2E4   29E3   4303   -3E3   -3E3   -6E3   -2E3   -2E3    -61    -20   43.6    -15    -15    -29    -10   0.11
    102     -6E3   4303   7837   29.8   29.8   59.7   -2E3   -2E3   0.61    0.2   -2E3   -7.4   -7.4    -15    -22     -4
    021D    1496   -3E3   29.8   1600    -24    -49    -15    -15   -.57   -.19   0.19   -.12   -.12   -.24   -.07      0
    021U    1496   -3E3   29.8    -24   1600    -49    -15    -15   -.57   -.19   0.19   -.12   -.12   -.24   -.07      0
    021C    2991   -6E3   59.7    -49    -49   3152    -30    -30   -1.1   -.38   0.37   -.24   -.24   -.48   -.14      0
    111D    2193   -2E3   -2E3    -15    -15    -30   2330    -52   -0.3   -0.1    -22   -.25   -.25   -0.5   -.46   -.05
    111U    2193   -2E3   -2E3    -15    -15    -30    -52   2330   -0.3   -0.1    -22   -.25   -.25   -0.5   -.46   -.05
    030T    40.9    -61   0.61   -.57   -.57   -1.1   -0.3   -0.3   21.9      0      0      0      0   -.01      0      0
    030C    13.6    -20    0.2   -.19   -.19   -.38   -0.1   -0.1      0   7.31      0      0      0      0      0      0
    201      805   43.6   -2E3   0.19   0.19   0.37    -22    -22      0      0    846   -.07   -.07   -.15   -.33   -.08
    120D      15    -15   -7.4   -.12   -.12   -.24   -.25   -.25      0      0   -.07   8.04      0      0      0      0
    120U      15    -15   -7.4   -.12   -.12   -.24   -.25   -.25      0      0   -.07      0   8.04      0      0      0
    120C      30    -29    -15   -.24   -.24   -.48   -0.5   -0.5   -.01      0   -.15      0      0   16.1      0      0
    210       22    -10    -22   -.07   -.07   -.14   -.46   -.46      0      0   -.33      0      0      0   11.8      0
    300     2.67   0.11     -4      0      0      0   -.05   -.05      0      0   -.08      0      0      0      0   1.43

The covariance matrix really doesn't help you much substantively. It is a statistical necessity for calculating tau, however.

    TAU_RC       TAU_TR
    19.005026    17.652434

These two values are very similar. They indicate that the rank-cluster model fits marginally better than the transitivity model, but not by much. The substantive difference between the two is that the transitivity model implies a single, unified hierarchy, whereas the ranked-cluster model allows multiple hierarchical tracks in a setting. This implies, then, that any hierarchy within the school is pretty unified.
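To see why the covariance matrix is needed, it may help to sketch how a tau statistic of this kind is typically assembled. The formula used below is an assumption, since the notes are not reproduced here: tau = (w'T - w'E) / sqrt(w' C w), where w is the zero/one weighting vector, T the observed census, E the expected counts and C the covariance matrix. The restriction to three triad types is purely for illustration (the real calculation uses all 16 types and the full CVMAT), and the function name tau is invented for this sketch.

```python
from math import sqrt


def tau(weights, observed, expected, cov):
    """Assumed form: weighted (observed - expected) over its standard error."""
    k = len(weights)
    numerator = sum(weights[i] * (observed[i] - expected[i]) for i in range(k))
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    return numerator / sqrt(variance)


# Illustrative subset of the school output: types 030T, 030C and 300.
observed = [266, 12, 97]
expected = [21.93, 7.3101, 1.4302]
# Variances on the diagonal; CVMAT prints 0 for the covariances among
# these three types at the displayed precision.
cov = [
    [21.916, 0.0, 0.0],
    [0.0, 7.3085, 0.0],
    [0.0, 0.0, 1.4299],
]

# A weight vector that selects a single type reduces tau to that type's STDDIF.
print(round(tau([0, 0, 1], observed, expected, cov), 3))
# Selecting two types pools their excess counts and their variances.
print(round(tau([1, 0, 1], observed, expected, cov), 3))
```

When the weight vector selects several types with non-zero covariances, the off-diagonal entries of CVMAT enter the denominator, which is why the full matrix has to be computed even though it is not interpreted directly. This sketch is not expected to reproduce TAU_RC or TAU_TR, which depend on the full vectors used in the program.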