Assignment 4. Transitivity & Hierarchy.
1) The key to measuring transitivity in a network is identifying what proportion of all potentially transitive
relations are actually transitive. That is:
First identify every place where i sends to j, and j sends to k (the two-path i -> j -> k).
A transitive triad forms if i also sends to k.
The transitivity ratio is the number of transitive triads over the number of possible transitive triads.
The first condition is a simple two-path, and we know how to find all paths of length two using the
reach program. Thus, the number of possible transitive triples = the number of two-paths in the network.
The number of transitive triples is equal to the number of two-paths that are also one-paths, that is, every
place in the network where there is both a two-path and a direct arc. We can calculate this from the
adjacency matrix A using the following formulas:
Transitive relations = T3 = Sum(A^2 # A)
Possible transitive relations (all two-paths) = I3 = Sum(A^2) - Trace(A^2)
Transitivity ratio = T3 / I3
where A^2 is the matrix product of A with itself, the # sign means element-wise multiplication, and the
trace is the sum of the diagonal (subtracted to get rid of two-paths from ego back to ego).
To do this in IML, you would use the lines:
T3 = sum((X**2)#X);
I3 = sum(X**2)-trace(X**2);
Tranrat = T3/I3;
Print Tranrat;
where X is the adjacency matrix you have entered into IML.
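For anyone who wants to check the arithmetic outside SAS, the same formula can be sketched in plain Python. This is only an illustration, not one of the course programs: the function names and the four-node example graph are made up for the purpose, and the sketch assumes the network has at least one two-path (otherwise I3 is zero).

```python
def matmul(a, b):
    """Matrix product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]


def transitivity_ratio(x):
    """Return T3 / I3 for a 0/1 adjacency matrix x (x[i][j] = 1 if i sends to j)."""
    n = len(x)
    x2 = matmul(x, x)  # x2[i][k] = number of two-paths i -> j -> k
    # T3: two-paths that are closed by a direct arc i -> k (element-wise product).
    t3 = sum(x2[i][k] * x[i][k] for i in range(n) for k in range(n))
    # I3: all two-paths, minus those that return to ego (the diagonal).
    i3 = sum(map(sum, x2)) - sum(x2[i][i] for i in range(n))
    return t3 / i3


# Hypothetical example graph: 0->1, 1->2, 0->2 (a transitive triple) and 2->3.
example = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
ratio = transitivity_ratio(example)
print(ratio)
```

The example graph has three two-paths (0->1->2, 0->2->3 and 1->2->3) and only the first is closed by a direct arc, so the ratio is 1/3.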
Use this formula to write a SAS IML program that calculates the transitivity ratio for the graduate student
‘help’ network. The program for reading the network & creating PAJEK files is in osugrd_read.sas
(see triads1.sas for a sample program on different data).
The transitivity ratio for help should be 0.3352228.
Compare the transitivity ratio for (a) the high school friendship network to the graduate school best
friendship network, and (b) the grad school best friendship network to the grad school help network.
Hint: This will require two separate IML statements: one for the graduate school numbers, a second for the
high school numbers.
2) To get at the global structure of a network, we would want to look at the triad census: the frequency
count of every type of triad in the network. Calculate the triad census by hand for the small network
below.
To do this, you need to categorize each triad i.e.:
Node 1  Node 2  Node 3  Type
1       2       3       021D
1       2       4       012
1       2       5       120C
1       3       4       111D
1       3       5       210
1       4       5       201
2       3       4       102
2       3       5       111D
2       4       5       111D
3       4       5       300

Giving a triad census of:
Type   Count
003    0
012    1
102    1
021D   1
021U   0
021C   0
111D   3
111U   0
030T   0
030C   0
201    1
120D   0
120U   0
120C   1
210    1
300    1
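Once each triad has been typed by hand, the census is just a tally of the Type column. A small Python sketch of that bookkeeping is shown here, using the ten types listed above; it does not classify the triads for you, and the variable names are only illustrative.

```python
from collections import Counter
from itertools import combinations

# The 16 triad types in the usual census order.
TRIAD_TYPES = ["003", "012", "102", "021D", "021U", "021C", "111D", "111U",
               "030T", "030C", "201", "120D", "120U", "120C", "210", "300"]

# Every set of three nodes out of five, in the same order as the table rows.
triples = list(combinations(range(1, 6), 3))

# Types assigned by hand, one per triple, copied from the Type column.
types_by_hand = ["021D", "012", "120C", "111D", "210",
                 "201", "102", "111D", "111D", "300"]

# Pair each triple with its type, then count how often each type occurs.
classified = dict(zip(triples, types_by_hand))
counts = Counter(types_by_hand)
census = {t: counts.get(t, 0) for t in TRIAD_TYPES}

for triad_type, count in census.items():
    print(triad_type, count)
```

A quick sanity check is that the counts add up to the number of triples, which is 10 for a five-node network.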
3) Of course, we can’t calculate the triad census by hand for networks of any real size. Using PAJEK,
calculate the triad census for (a) the prison network, (b) the high-school network, and (c) the
osugrad_help network. To do this you need to:
a) Create the PAJEK files. You may have already done so and have them saved (from the other
homeworks). Otherwise, use the same programs we’ve used in the earlier homeworks.
b) Once you have the PAJEK file open, go to: INFO > NETWORK > TRIADIC CENSUS
and you will get a copy of the census.
Prison:
--------------------------------------
 1 - 003      39221
--------------------------------------
 2 - 012       5860
 3 - 102       2336
 4 - 021D        61
 5 - 021U        80
 6 - 021C       103
 7 - 111D       105
 8 - 111U        69
 9 - 030T        13
10 - 030C         1
11 - 201         12
12 - 120D        15
13 - 120U         7
14 - 120C         5
15 - 210         12
16 - 300          5
--------------------------------------
Sum (2 - 16):  8684
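A quick way to check that a census has been copied correctly is to add it up: types 2 to 16 should give the reported sum, and all 16 types together should equal the number of ways to choose three nodes from the network. The Python sketch below does this for the prison counts above; the network size is inferred from the total rather than taken from the data file.

```python
from math import comb

# Prison triad census, copied from the PAJEK output above.
prison = {
    "003": 39221, "012": 5860, "102": 2336, "021D": 61, "021U": 80,
    "021C": 103, "111D": 105, "111U": 69, "030T": 13, "030C": 1,
    "201": 12, "120D": 15, "120U": 7, "120C": 5, "210": 12, "300": 5,
}

total = sum(prison.values())        # all triads, including empty ones
non_empty = total - prison["003"]   # PAJEK's "Sum (2 - 16)"

# Find the smallest n whose number of triples reaches the total.
n = 3
while comb(n, 3) < total:
    n += 1

print(non_empty, total, n, comb(n, 3) == total)
```

For these counts the non-empty triads sum to 8684, the total is 47905, and that total is exactly the number of triples among 67 nodes.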
Fake School:
-----------------------------------------------------------
Type         Number of triads (ni)    Expected (ei)
-----------------------------------------------------------
 1 - 003          11856124            11688510.46
 2 - 012            484386              812746.29
 3 - 102            172755                4709.44
 4 - 021D             1844                4709.44
 5 - 021U             2147                4709.44
 6 - 021C             2471                9418.89
 7 - 111D             2075                 109.16
 8 - 111U             1731                 109.16
 9 - 030T              265                 109.16
10 - 030C               12                  36.39
11 - 201               571                   0.63
12 - 120D              192                   0.63
13 - 120U              156                   0.63
14 - 120C              105                   1.26
15 - 210               242                   0.01
16 - 300                95                   0.00
-----------------------------------------------------------
4) It turns out that the triad census contains a lot of information about the graph (see Wasserman and
Faust, chapter 14). We can use this information by looking at frequencies of combinations of triads.
In real data, however, the number of triads is somewhat random, so we want to control for the
distribution in a “similar” random graph. The program triads1.sas shows you how to do this. The
line
s_tstat=tstat(tcen,1);
in the program will calculate a set of statistics for the triad census. The output will look something like
this (this is from a different graph):
Triad Census
Type   T    TPCNT    PU       EVT      VARTU    STDDIF
003    21   0.25     0.2157   18.118   3.9576    1.4489
012    26   0.3095   0.3235   27.176   11.838   -0.342
102    11   0.131    0.1294   10.871   4.5459    0.0607
021D    1   0.0119   0.0347   2.9118   2.5215   -1.204
021U    5   0.0595   0.0347   2.9118   2.5215    1.3151
021C    3   0.0357   0.0693   5.8235   4.2626   -1.368
111D    2   0.0238   0.0616   5.1765   4.0205   -1.584
111U    5   0.0595   0.0616   5.1765   4.0205   -0.088
030T    3   0.0357   0.0126   1.0588   0.8292    2.1317
030C    1   0.0119   0.0042   0.3529   0.3274    1.1308
201     1   0.0119   0.0185   1.5529   1.0074   -0.551
120D    1   0.0119   0.0063   0.5294   0.4946    0.6691
120U    1   0.0119   0.0063   0.5294   0.4946    0.6691
120C    1   0.0119   0.0126   1.0588   0.9196   -0.061
210     1   0.0119   0.0084   0.7059   0.5754    0.3877
300     1   0.0119   0.0006   0.0471   0.0448    4.5013
The first column (T) is the standard triad census, exactly what PAJEK produces (in fact, you could use the
results from PAJEK as input into the statistics function), and gives you the count of the number of triads
of each type (here I am using the small network that is the class logo), so we see there are 26 type 012
triads in this network. The second column (TPCNT) gives you the proportion, the third (PU) the probability
of observing a triad like this given the distribution of dyads, the next (EVT) is the expected value and then
(VARTU) the variance. The last column (STDDIF) is the standardized difference between observed and
expected, and is like a t-test (values > 2 indicate greater-than-chance numbers of the triad type). For
example, we see that we observe 1 complete triad (type 300) in the network, but would expect to observe
about .05 (i.e. none), giving us a t-value of 4.50, meaning that there are more complete triads in this
network than we would expect by chance.
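The standardized difference can be recomputed from the other columns as (T - EVT) / sqrt(VARTU). The Python sketch below does this for three rows of the small-network output above. It is only a check on the printed numbers, not the code inside tstat, and the results agree with the STDDIF column only up to rounding because the inputs are the rounded values as printed.

```python
from math import sqrt

# (T, EVT, VARTU) copied from the small-network output above.
rows = {
    "003": (21, 18.118, 3.9576),
    "030T": (3, 1.0588, 0.8292),
    "300": (1, 0.0471, 0.0448),
}


def std_diff(observed, expected, variance):
    """Observed minus expected count, in units of the standard deviation."""
    return (observed - expected) / sqrt(variance)


stddif = {t: std_diff(*values) for t, values in rows.items()}
for triad_type, value in stddif.items():
    print(triad_type, round(value, 2))
```

Note that this is a difference measured in standard deviations, not a ratio of observed to expected counts.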
In addition to the counts and individual triad frequencies, we can get the weighted sum of particular types
to see how closely it matches an ideal distribution (such as the rank-cluster system). To do this, we include
a vector of zeros and ones that indicate whether the triad should be allowed or not in the ideal model, and
calculate tau using the formula from the notes. The program calculates these values for the rank-cluster
model and the transitivity model. A value of zero would indicate that the model does not fit any better than
random; the larger the value, the better the fit.
Run the program and discuss the observed distribution of triads for the school friendship network. What do
the values of tau tell us? If you were to modify this program to run on the OSU graduate student network,
what would you have to do? (You don’t need to do it, but it might be a nice challenge to see if you can).
Your output should look something like this:
Type   T        TPCNT    PU       EVT      VARTU    STDDIF
003    1.19E7   0.9466   0.9466   1.19E7   11426     7.4868
012    480137   0.0383   0.0384   480961   28844    -4.851
102    176379   0.0141   0.0141   176149   7836.5    2.594
021D   1858     0.0001   0.0001   1624.6   1600.3    5.8357
021U   2125     0.0002   0.0001   1624.6   1600.3   12.51
021C   2442     0.0002   0.0003   3249.1   3152    -14.38
111D   2127     0.0002   0.0002   2381.9   2329.9   -5.282
111U   1775     0.0001   0.0002   2381.9   2329.9  -12.57
030T   266      212E-7   175E-8   21.93    21.916   52.136
030C   12       958E-9   584E-9   7.3101   7.3085    1.7348
201    593      473E-7   0.0001   870.36   846.09   -9.536
120D   194      155E-7   642E-9   8.0455   8.044    65.565
120U   158      126E-7   642E-9   8.0455   8.044    52.872
120C   104      83E-7    128E-8   16.091   16.085   21.919
210    250      2E-5     94E-8    11.769   11.763   69.459
300    97       774E-8   114E-9   1.4302   1.4299   79.923
Note the higher- and lower-than-chance counts for certain triads. 030T has a standardized difference of about 52
and 300 of nearly 80, so both occur far more often than chance would predict. 030C, though rare, occurs
essentially at random (the rule of thumb is that to be significant you should have a std dif > 2). 111U and 201
occur less often than expected (likely indicating that when i and j agree on k as a friend, they nominate each other).
CVMAT
       003    012    102    021D   021U   021C   111D   111U   030T   030C   201    120D   120U   120C   210    300
003    11E3   -2E4   -6E3   1496   1496   2991   2193   2193   40.9   13.6   805    15     15     30     22     2.67
012    -2E4   29E3   4303   -3E3   -3E3   -6E3   -2E3   -2E3   -61    -20    43.6   -15    -15    -29    -10    0.11
102    -6E3   4303   7837   29.8   29.8   59.7   -2E3   -2E3   0.61   0.2    -2E3   -7.4   -7.4   -15    -22    -4
021D   1496   -3E3   29.8   1600   -24    -49    -15    -15    -.57   -.19   0.19   -.12   -.12   -.24   -.07   0
021U   1496   -3E3   29.8   -24    1600   -49    -15    -15    -.57   -.19   0.19   -.12   -.12   -.24   -.07   0
021C   2991   -6E3   59.7   -49    -49    3152   -30    -30    -1.1   -.38   0.37   -.24   -.24   -.48   -.14   0
111D   2193   -2E3   -2E3   -15    -15    -30    2330   -52    -0.3   -0.1   -22    -.25   -.25   -0.5   -.46   -.05
111U   2193   -2E3   -2E3   -15    -15    -30    -52    2330   -0.3   -0.1   -22    -.25   -.25   -0.5   -.46   -.05
030T   40.9   -61    0.61   -.57   -.57   -1.1   -0.3   -0.3   21.9   0      0      0      0      -.01   0      0
030C   13.6   -20    0.2    -.19   -.19   -.38   -0.1   -0.1   0      7.31   0      0      0      0      0      0
201    805    43.6   -2E3   0.19   0.19   0.37   -22    -22    0      0      846    -.07   -.07   -.15   -.33   -.08
120D   15     -15    -7.4   -.12   -.12   -.24   -.25   -.25   0      0      -.07   8.04   0      0      0      0
120U   15     -15    -7.4   -.12   -.12   -.24   -.25   -.25   0      0      -.07   0      8.04   0      0      0
120C   30     -29    -15    -.24   -.24   -.48   -0.5   -0.5   -.01   0      -.15   0      0      16.1   0      0
210    22     -10    -22    -.07   -.07   -.14   -.46   -.46   0      0      -.33   0      0      0      11.8   0
300    2.67   0.11   -4     0      0      0      -.05   -.05   0      0      -.08   0      0      0      0      1.43
The covariance matrix (CVMAT) really doesn’t help you much substantively. It is a statistical necessity for
calculating tau, however.
TAU_RC      TAU_TR
19.005026   17.652434
These two values are very similar. The values indicate that the rank-cluster model fits marginally better than the
transitivity model, but not by much. The substantive difference between the transitivity and rank-cluster models is that the
transitivity model implies a single, unified hierarchy, whereas the rank-cluster model allows multiple hierarchical tracks in
a setting. This implies, then, that any hierarchy within the school is pretty unified.
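As a rough check on where TAU_TR comes from, the Python sketch below recomputes it from the school output printed above. It rests on two assumptions that are not stated in this handout: that the transitivity model forbids the seven standard intransitive triad types (021C, 111D, 111U, 030C, 201, 120C, 210), and that the sign is flipped so that fewer forbidden triads than expected gives a positive value. It also uses the rounded CVMAT entries as printed, so it can only match the program's value approximately.

```python
from math import sqrt

# Assumed forbidden (intransitive) triad types for the transitivity model.
forbidden = ["021C", "111D", "111U", "030C", "201", "120C", "210"]

# Observed counts (T) and expected counts (EVT) from the school output.
observed = {"021C": 2442, "111D": 2127, "111U": 1775, "030C": 12,
            "201": 593, "120C": 104, "210": 250}
expected = {"021C": 3249.1, "111D": 2381.9, "111U": 2381.9, "030C": 7.3101,
            "201": 870.36, "120C": 16.091, "210": 11.769}

# Rows and columns of CVMAT for the forbidden types only, in the same order.
cov = [
    [3152, -30, -30, -0.38, 0.37, -0.48, -0.14],
    [-30, 2330, -52, -0.1, -22, -0.5, -0.46],
    [-30, -52, 2330, -0.1, -22, -0.5, -0.46],
    [-0.38, -0.1, -0.1, 7.31, 0, 0, 0],
    [0.37, -22, -22, 0, 846, -0.15, -0.33],
    [-0.48, -0.5, -0.5, 0, -0.15, 16.1, 0],
    [-0.14, -0.46, -0.46, 0, -0.33, 0, 11.8],
]

# With a 0/1 weight vector, the weighted sums reduce to sums over these types.
diff = sum(observed[t] - expected[t] for t in forbidden)
variance = sum(sum(row) for row in cov)

# Sign flipped (an assumption): a shortfall of forbidden triads is a good fit.
tau_tr = -diff / sqrt(variance)
print(tau_tr)
```

Under these assumptions the result comes out at about 17.65, in line with the printed TAU_TR. The rank-cluster value would need its own zero/one vector, which is defined in the notes rather than here.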