
Supplementary material for "Modularity in the genetic disease-phenotype network"

Xingpeng Jiang, Bing Liu, Jiefeng Jiang, Huizhi Zhao, Ming Fan, Jing Zhang, Zhenjie Fan and Tianzi Jiang

Contents
I. Computation of dyadicity D and heterophilicity H
II. Decomposition of large modules into sub-modules
III. The nature of the phenotypic similarity measure and the implications for the findings
IV. Genes associated with phenotypes in a module are functionally similar
V. Computing P values for the disease class enrichment of modules in the phenotype network
VI. Constructing the pseudo-pleiotropic gene set
VII. Supplementary References
VIII. Supplementary Figures
IX. Supplementary Tables

I. Computation of dyadicity D and heterophilicity H

Each phenotype is labelled 1 if it belongs to a given disease class and 0 if it does not. There are therefore three types of links between phenotypes, 1-1, 1-0 and 0-0, whose numbers are denoted $m_{11}$, $m_{10}$ and $m_{00}$, respectively. The dyadicity D and heterophilicity H are defined as

$$D = \frac{m_{11}}{\bar{m}_{11}}, \qquad H = \frac{m_{10}}{\bar{m}_{10}},$$

where $\bar{m}_{11}$ and $\bar{m}_{10}$ are the expected values of $m_{11}$ and $m_{10}$, respectively. These parameters have been used successfully to characterize the modular structure of protein-protein interaction networks (Park, J. and Barabasi, A.L., 2007).

The expected values of $m_{11}$ and $m_{10}$ are computed as follows. Taking cancer as an example, let $n_1$ be the number of phenotypes belonging to cancer and $n_0$ the number of other phenotypes, so that $N = n_1 + n_0$ is the total number of phenotypes, and let M be the total number of edges in the network. The connectance

$$p = \frac{2M}{N(N-1)}$$

is the average probability that two phenotypes are connected in the network. For cancer, the three link types are cancer-cancer (1-1), cancer-other (1-0) and other-other (0-0), with counts $m_{11}$, $m_{10}$ and $m_{00}$.
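To make these definitions concrete, here is a minimal Python sketch; the function and variable names are our own and hypothetical. It counts the three link types from an edge list and computes D and H using the expected values of Park and Barabasi (2007), $\bar{m}_{11} = n_1(n_1-1)p/2$ and $\bar{m}_{10} = n_1(N-n_1)p$. As a check it uses the Bone row of Table S2. This rests on two assumptions suggested by the numbers rather than stated in the text: the network is the one built at threshold 0.5 in Table S1 (N = 4,146, M = 29,489), and $m_{10}$ equals the total links minus the intra-class connections.

```python
def link_counts(edges, in_class):
    """Count 1-1, 1-0 and 0-0 links.

    edges    -- iterable of (u, v) pairs, each undirected edge listed once
    in_class -- dict mapping each phenotype to True (in the class) or False
    """
    m11 = m10 = m00 = 0
    for u, v in edges:
        a, b = in_class[u], in_class[v]
        if a and b:
            m11 += 1
        elif a or b:
            m10 += 1
        else:
            m00 += 1
    return m11, m10, m00


def dyadicity_heterophilicity(m11, m10, n1, N, M):
    """Return (D, H) relative to a random assignment of the class label."""
    p = 2 * M / (N * (N - 1))        # connectance
    exp_m11 = n1 * (n1 - 1) / 2 * p  # expected number of 1-1 links
    exp_m10 = n1 * (N - n1) * p      # expected number of 1-0 links
    return m11 / exp_m11, m10 / exp_m10


# Toy network: phenotypes 0 and 1 are in the class, 2 and 3 are not.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
toy_labels = {0: True, 1: True, 2: False, 3: False}
print(link_counts(toy_edges, toy_labels))  # (1, 3, 1)

# Bone row of Table S2 (assumed: N = 4,146, M = 29,489, m10 = total - intra-class links)
D, H = dyadicity_heterophilicity(m11=48, m10=264 - 48, n1=38, N=4146, M=29489)
print(round(D, 3), round(H, 3))  # 19.895 0.403
```

Under these assumptions the sketch reproduces the tabulated Bone values D = 19.895 and H = 0.403.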
If every phenotype in the network has the same chance of being a cancer phenotype, the expected values of $m_{11}$ and $m_{10}$ are (Park, J. and Barabasi, A.L., 2007)

$$\bar{m}_{11} = \binom{n_1}{2} p = \frac{n_1(n_1-1)}{2}\, p, \qquad \bar{m}_{10} = \binom{n_1}{1}\binom{n_0}{1} p = n_1(N-n_1)\, p.$$

Statistically significant deviations of $m_{11}$ and $m_{10}$ from their expected values $\bar{m}_{11}$ and $\bar{m}_{10}$ imply that cancer phenotypes are not distributed randomly in the phenotype network. Dyadicity $D > 1$ ($D < 1$) indicates that phenotypes in the disease class connect more (less) densely among themselves than expected for a random configuration. Similarly, heterophilicity $H > 1$ ($H < 1$) means that phenotypes in the disease class have more (fewer) connections to phenotypes in other classes than expected at random. If $D > 1$ and $H < 1$, phenotypes in the disease class show a clear clustering tendency within the network.

II. Extracting the modules of the phenotype network and decomposition of large modules into sub-modules

For a given partition of a network into m modules, the modularity Q is defined as

$$Q = \sum_{i=1}^{m} \Big[ e_{ii} - \Big( \sum_{j} e_{ij} \Big)^2 \Big],$$

where $e_{ii}$ is the fraction of edges that connect two nodes inside module i, and $e_{ij}$ is the fraction of edges connecting nodes of module i to nodes of module j. The modularity Q of a partition is high when the number of intra-module edges is much larger than expected for a random partition. We identified modules by maximizing Q, so that there were many intra-module edges and few between-module edges.

However, this method cannot identify the hierarchical structure of the modules. We therefore decomposed all modules with at least 100 phenotypes into sub-modules. The number of final modules obtained at the secondary level of modularity may affect the results.
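The modularity Q defined above can be evaluated for a given partition with the short Python sketch below. It assumes an undirected, unweighted edge list and the usual symmetric convention in which, for i ≠ j, $e_{ij}$ counts half of the edges between modules i and j. Under that convention, $\sum_j e_{ij}$ is the fraction of edge ends that fall in module i. The sketch only scores a partition. It does not search for the partition that maximizes Q, which is what Newman's algorithm does.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Modularity Q of a partition of an undirected, unweighted network.

    edges     -- list of (u, v) pairs, each edge listed once, no self-loops
    module_of -- dict mapping every node to its module label
    """
    M = len(edges)
    intra = defaultdict(int)  # edges with both ends inside the module
    ends = defaultdict(int)   # edge ends (total degree) attached to the module
    for u, v in edges:
        cu, cv = module_of[u], module_of[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # e_ii = intra[c] / M and sum_j e_ij = ends[c] / (2M)
    return sum(intra[c] / M - (ends[c] / (2 * M)) ** 2 for c in ends)


# Two triangles joined by a single edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
print(modularity(edges, split))   # 5/14, about 0.357
print(modularity(edges, merged))  # 0.0
```

Putting every node into a single module always gives Q = 0. Splitting along the obvious two-triangle structure gives a positive Q, which is the behaviour that Q maximization exploits.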
To reduce the effect of the number of secondary-level modules on the results, we visually inspected each first-level module with at least 100 phenotypes while automatically decomposing the phenotype network with Newman's algorithm (Newman, M.E., 2006). The first partition divided the network into 28 modules. Sixteen of these modules had at least 100 phenotypes, and 11 of them had a significant secondary level of modularity according to Newman's algorithm (Table S5; these are the 11 modules with modularity above 0.5). We examined the sub-network of phenotypes in each of these modules using the spring-embedded layout in the Cytoscape software (Shannon, P. et al., 2003) to see whether it had a clear modular structure. In a spring-embedded layout, nodes joined by edges tend to be placed near each other, whereas nodes without edges between them tend to be spread apart (desJardins, M. et al., 2007). We found that a modularity of 0.5 is an appropriate threshold for extracting secondary-level modules. To illustrate, Suppl. Fig. S5 shows the different topologies of the sub-networks of Modu 3 and Modu 15. Their modularity values (0.423 and 0.501, respectively) are the closest to the 0.5 threshold. The sub-network of Modu 3 tends to form a single densely connected cluster, whereas the sub-network of Modu 15 can be partitioned into several parts (Suppl. Fig. S5).

In the end we identified 231 modules, most of which (214 of 231) come from the secondary level of modularity. We therefore believe that this decomposition reveals the actual modular structure of the phenotype network.

III. The nature of the phenotypic similarity measure and the implications for the findings

Here we discuss the nature of the similarity measure and its implications for our findings.

1) The nature of the similarity measure

van Driel et al. (2006) derived phenotype similarities from the "anatomy (A) and the disease (C) sections of the medical subject headings vocabulary (MeSH)" and the "full-text (TX) and clinical synopsis (CS) fields of all records that describe genetics disorders" in the OMIM database. In particular, the hierarchical MeSH tree can correct for differences in the level of detail of a phenotype description (Brunner, 2004). van Driel et al. (2006) then weighted each keyword by "term frequency-inverse document frequency" (tf-idf) after correcting for the length of the records. They computed the similarity of two phenotypes as the cosine of the angle between their feature vectors. In this scheme, keywords that provide more specific information about a phenotype, such as "Blood-Retinal Barrier", are weighted higher than less informative keywords such as "Eye". This property of the similarity measure ensures that we can detect overlapping phenotypes when they share similar clinical traits. We therefore expected that the phenotype network constructed from this measure would provide a large-scale landscape of phenotype relationships.

2) The implications of the similarity measure for the modularity of the phenotype network

First, the nature of the similarity measure ensures that phenotypes with similar clinical traits are connected, so that phenotypes in a disease class tend to group together in the phenotype network. Phenotypes in a disease class usually do not form a single module. Instead, they are divided among many different modules. For example, phenotypes in the neurological disease class are distributed over about ten modules. One of these modules contains primarily ataxia phenotypes, such as spinocerebellar ataxia and cerebellar ataxia, and another contains mostly Charcot-Marie-Tooth disease phenotypes. Modules are thus generally subclasses of the major disease classes, which were determined manually by Goh et al. (2007).
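The tf-idf weighting and cosine similarity described in part 1) can be illustrated with a generic Python sketch. The keyword lists are made up. Term frequency is divided by record length, and idf is taken as log(number of records / document frequency). This is an assumed common variant, not the exact pipeline of van Driel et al. (2006), which also maps the text to MeSH concepts.

```python
import math
from collections import Counter


def tfidf_vectors(records):
    """Length-normalized tf-idf weights for a list of keyword lists."""
    n_docs = len(records)
    df = Counter(term for rec in records for term in set(rec))  # document frequency
    vectors = []
    for rec in records:
        counts = Counter(rec)
        length = len(rec)  # correct for record length
        vectors.append({t: (c / length) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical keyword lists for three records.
records = [
    ["eye", "blood-retinal barrier"],
    ["eye", "blood-retinal barrier"],
    ["eye", "ataxia"],
]
vecs = tfidf_vectors(records)
print(round(cosine(vecs[0], vecs[1]), 6))  # 1.0: identical records
print(round(cosine(vecs[0], vecs[2]), 6))  # 0.0: they share only "eye"
```

The keyword "eye" appears in every record, so its idf, and therefore its weight, is zero. Two records that share only that uninformative keyword end up with zero similarity.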
Our results indicate that modularization of the phenotype network can yield a more detailed classification of disease phenotypes.

Second, in several cases a module contains phenotypes from several disease classes. For instance, one module contains both neurological phenotypes and metabolic phenotypes that overlap strongly with neurological diseases. An example of such a metabolic disease is sialic acid storage disease, whose primary features include mental retardation and clumsiness. Thus, phenotypes from different disease classes may be grouped together when they have similar clinical traits.

Third, phenotypes within a module are more densely connected than phenotypes in different modules, and they are more similar in clinical traits. Given the basic hypothesis that similar phenotypes have a similar genetic basis, we inferred that phenotypes in a module would share similar genetic mechanisms. Whether disease genes in a phenotype module are functionally similar is discussed in Section 3.4 of the main text. The result indicates that phenotype modules may be used to infer the genetic basis of phenotypes without known genes.

In short, the phenotypic similarity network provides a computational validation of the disease classification determined manually by Goh et al. (2007). It also provides a more specific classification of disease phenotypes. Moreover, these phenotypic modules offer a starting point for understanding the relationship between diseases and genes.

IV. Genes associated with phenotypes in a module are functionally similar

We investigated whether genes associated with phenotypes in any given module are functionally similar using GO annotations (Gene Ontology Consortium, 2006), by measuring the GO homogeneity of the modules. GO homogeneity (GH) has been used previously (Goh et al., 2007) to investigate whether genes associated with the same disorder share similar functional characteristics.
Similarly, GH is defined here as the maximum fraction of genes in the same module that share the same GO term:

$$GH_i = \max_j \frac{n_j^i}{n_i},$$

where $n_i$ denotes the number of genes in module i that have any GO annotation, and $n_j^i$ the number of those genes that have the specific GO term j. For each module we generated $10^3$ random controls to compute GO homogeneity. Each control was obtained by randomly picking the same number of genes from the disease genes with GO functional annotations. Suppl. Fig. S3 shows the histogram of GO homogeneity for the phenotype modules associated with at least 5 GO-annotated genes. As expected, the distribution of GO homogeneity in the modules is significantly higher than the random expectation.

V. Computing P values for the disease class enrichment of modules in the phenotype network

We used the disease classification dataset to test whether the disease phenotypes within a single module tend to fall within the same disease class. We followed the method of Wang and Zhang (2007), which computes a P value for the functional enrichment of modules in protein-protein interaction networks. Taking cancer as an example, for a given module M we randomly selected a set of phenotypes with the same number of members as M and counted how many of them are cancer phenotypes. The P value was calculated as the probability that the number of cancer phenotypes in a random group is equal to or greater than the number observed in M. We used 100,000 simulations to obtain the P values.

VI. Constructing the pseudo-pleiotropic gene set

We constructed a set of 394 pseudo-pleiotropic genes as a random control to investigate the distribution of pleiotropic genes in the phenotype network. Each pseudo-pleiotropic gene was assigned the same number of phenotypes as the corresponding known pleiotropic gene, except that these phenotypes were selected at random. We computed the mean module-specific coefficient (MSC) of the control set.
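The resampling procedures of Sections IV and V can be sketched in Python as follows. The function names, the toy gene-to-GO-term mapping and the toy class labels are hypothetical. `go_homogeneity` implements GH for one module. `enrichment_p_value` follows Section V: it draws random phenotype sets of the module's size and returns the fraction that contain at least the observed number of class members. The same pattern applies to the mean MSC of the pseudo-pleiotropic control set.

```python
import math
import random


def go_homogeneity(gene_terms):
    """GH = max_j n_j / n over the genes of one module that have any GO annotation.

    gene_terms -- dict mapping gene -> set of GO term IDs (may be empty)
    """
    annotated = [terms for terms in gene_terms.values() if terms]
    if not annotated:
        return 0.0
    counts = {}
    for terms in annotated:
        for t in terms:
            counts[t] = counts.get(t, 0) + 1
    return max(counts.values()) / len(annotated)


def enrichment_p_value(in_class, module_size, observed, n_sim=100_000, seed=0):
    """Fraction of random phenotype sets of size module_size that contain at
    least `observed` class members (in_class is a list of booleans)."""
    rng = random.Random(seed)
    population = range(len(in_class))
    hits = 0
    for _ in range(n_sim):
        sample = rng.sample(population, module_size)  # without replacement
        if sum(in_class[i] for i in sample) >= observed:
            hits += 1
    return hits / n_sim


def hypergeometric_tail(N, K, n, k):
    """Exact P(X >= k) for n draws without replacement from N items, K in the class."""
    total = math.comb(N, n)
    return sum(math.comb(K, x) * math.comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / total


module = {"g1": {"GO:1", "GO:2"}, "g2": {"GO:1"}, "g3": {"GO:3"}, "g4": set()}
print(go_homogeneity(module))  # 2 of the 3 annotated genes share GO:1

labels = [True] * 5 + [False] * 15  # toy network: 5 of 20 phenotypes in the class
print(enrichment_p_value(labels, module_size=4, observed=3, n_sim=20_000))
print(hypergeometric_tail(N=20, K=5, n=4, k=3))  # exactly 155/4845, about 0.032
```

Each random set is drawn without replacement from a fixed population. The simulated P value is therefore a Monte Carlo estimate of a hypergeometric tail probability, and `hypergeometric_tail` computes that probability exactly for comparison.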
The P value was calculated as the probability that the mean MSC of the random groups is equal to or greater than the observed value, based on 100,000 random simulations.

VII. Supplementary References

Park, J. and Barabasi, A.L. (2007). Distribution of node characteristics in complex networks. Proc. Natl. Acad. Sci. USA 104, 17916-20.
Newman, M.E. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103, 8577-82.
Shannon, P. et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498-504.
desJardins, M., MacGlashan, J. and Ferraioli, J. (2007). Interactive visual clustering. In Proceedings of the 12th International Conference on Intelligent User Interfaces. ACM, Honolulu, Hawaii, USA.
Gene Ontology Consortium (2006). The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 34, D322-D326.
Goh, K.I., Cusick, M.E., Valle, D., Childs, B., Vidal, M. and Barabasi, A.L. (2007). The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685-90.
Wang, Z. and Zhang, J. (2007). In search of the biological significance of modular structures in protein networks. PLoS Comput. Biol. 3, e107.

VIII. Supplementary Figures

Fig. S1: The distinctly modular structure of the phenotype network. We used 100 randomized networks as random controls. Each randomized network was generated by randomly shuffling the edges of the phenotype network (10,000 times) while keeping the degree of every node, and hence the degree distribution of the network, unchanged. The Q values of the randomized networks are all slightly greater than 0.2, whereas the actual network has Q = 0.78. The z-score measures how improbable it is that the observed result is due to chance. It is defined as $z = (Q - \bar{Q})/\sigma$, where $\bar{Q}$ and $\sigma$ are the mean and standard deviation of Q over the randomized networks. Here $z \approx 309$ ($z \gg 1$).
[Figure: histogram of the modularity Q of the randomized networks (x-axis: Modularity; y-axis: Frequency), with the observed Q = 0.78 marked; z-score = 309.]

Fig. S2: Cartographic representation of the phenotype network. Different colors indicate different disease classes within modules; unclassified phenotypes are not shown. Only modules with at least 15 phenotypes that have disease classifications are shown.

Fig. S3: GO homogeneity of modules for the GO category molecular function. Red bars denote the actual histogram. Blue bars represent the random control, obtained for each module by randomly choosing the same number of disease genes with GO functional annotations.

Fig. S4: The target proteins of 37 experimental drugs tend to be associated with module-specific disease phenotypes.
[Figure: histogram of the module-specific coefficient (x-axis: Module-specific coefficient; y-axis: Frequency); observed MSC* = 0.2045, $P < 10^{-8}$.]

Fig. S5: Selection of the threshold for the second level of modularity. (A) The sub-network of Modu 3 tends to form a dense, connected cluster. (B) The sub-network of Modu 15 tends to be partitioned into several parts.

IX. Supplementary Tables

Table S1: Selecting different thresholds to construct the disease phenotype network does not have a significant effect on the modularity.

Threshold   Nodes   Edges    Modularity
0.45        4,681   56,752   0.717
0.475       4,425   40,482   0.748
0.5         4,146   29,489   0.783
0.525       3,783   21,778   0.807
0.55        3,381   16,349   0.824
0.575       3,008   12,520   0.833
0.6         2,575   9,592    0.833
0.625       2,202   7,479    0.831
0.65        1,888   5,890    0.833
0.675       1,568   4,672    0.833

Table S2: Dyadicity and heterophilicity of 21 disease classes.
Disease class       Number of phenotypes  Intra-class connections  Total links  Average degree  Dyadicity  Heterophilicity
Bone                38                    48                       264          13.895          19.895     0.403
Cancer              60                    32                       173          5.767           5.268      0.168
Cardiovascular      50                    69                       274          10.96           16.413     0.292
Dermatological      68                    100                      658          19.353          12.791     0.586
Developmental       36                    15                       638          35.444          6.938      1.227
Ear, Nose, Throat   40                    482                      3071         153.55          180.06     4.593
Endocrine           47                    21                       182          7.745           5.661      0.244
Gastrointestinal    19                    12                       53           5.579           20.448     0.152
Hematological       40                    11                       180          9               4.11       0.3
Immunological       37                    31                       100          5.405           13.563     0.132
Metabolic           129                   158                      513          7.954           5.576      0.2
Muscular            61                    214                      618          20.262          34.074     0.472
Neurological        184                   508                      2058         22.37           8.792      0.62
Nutritional         2                     1                        11           11              291.382    0.352
Ophthalmological    100                   422                      1644         32.88           24.841     0.88
Psychiatric         13                    7                        175          26.923          26.15      0.911
Renal               32                    16                       144          9               9.4        0.283
Respiratory         6                     2                        28           9.333           38.851     0.305
Skeletal            66                    119                      1462         44.303          16.165     1.453
Multiple            128                   95                       1719         26.86           3.406      0.92
Tissue              28                    16                       214          15.286          12.334     0.5
Total               1,184                 2,739                    14,179       11.97

Table S3: Target proteins of FDA-approved drugs and their associated phenotype modules.
FDA-approved drugs            Target proteins             OMIM gene number                           Associated module ID
Amiloride                     SCNN1A;SCNN1B;SCNN1G        600228;600760;600761                       81;45;45
Bendroflumethiazide           CA2;CA4                     259730;114760                              21;6
Benzthiazide                  CA2;CA4                     259730;114760                              21;6
Bepridil                      CACNA1A;SCN5A               601011;600163                              1;14
Calcidiol                     CYP27B1;VDR                 609506;601769                              21;21
Chlorothiazide                CA2;CA4                     259730;114760                              21;6
Cinnarizine                   CACNA1A;DRD2                601011;126450                              1;10
Coagulation factor VIIa       F10;F9                      227600;306900                              216;28
Cocaine                       DRD3;SCN5A;SLC6A4           126451;600163;182138                       10;14;9
Cyclothiazide                 CA2;CA4                     259730;114760                              21;6
Desflurane                    ATP2C1;GABRA1;KCNA1         604384;137160;176260                       7;180;145
Desipramine                   ADRB1;SLC6A4                109630;182138                              172;9
Diazoxide                     CA2;CA4                     259730;114760                              21;6
Drotrecogin alfa              F5;F8                       227400;306700                              35;28
Enflurane                     ATP2C1;GABRA1;KCNA1         604384;137160;176260                       7;180;145
Eptifibatide                  ITGA2B;ITGB3                607759;173470                              216;216
Gliclazide                    ABCC8;KCNJ1                 600509;600359                              5;45
Halothane                     ATP2C1;GABRA1               604384;137160                              7;180
Heparin                       F10;SERPINC1                227600;107300                              216;11
Hydrochlorothiazide           CA2;CA4                     259730;114760                              21;6
Hydroflumethiazide            CA2;CA4;SLC12A1             259730;114760;600839                       21;6;45
Hydroxocobalamin              AMN;MMAA;MMAB;MTR;MTRR;MUT  605799;607481;607568;156570;602568;609058  164;94;94;98;98;162
Imatinib                      ABL1;KIT                    189980;164920                              125;1
Indapamide                    KCNE1;KCNQ1                 176261;607542                              175;175
Interferon gamma-1b           IFNGR1;IFNGR2               107470;147569                              218;218
Iron Dextran                  FTL;TF                      134790;190000                              5;65
Isoflurane                    ATP2C1;GABRA1;KCNA1         604384;137160;176260                       7;180;145
Levocarnitine                 CPT1A;CPT2;SLC22A4;SLC22A5  600528;600650;604190;603377                162;162;95;95
Levosimendan                  TNNI3;TNNT2                 191044;191045                              197;195
Liothyronine                  ALB;THRB                    103600;190160                              11;1
Menotropins                   FSHR;LHCGR                  136435;152790                              134;133
Methoxyflurane                ATP2C1;GABRA1;KCNA1         604384;137160;176260                       7;180;145
Methyclothiazide              CA2;CA4;SLC12A1             259730;114760;600839                       21;6;45
Minaprine                     DRD2;SLC6A4                 126450;182138                              10;9
Perhexiline                   CPT1A;CPT2                  600528;600650                              162;162
Pramipexole                   DRD2;DRD3                   126450;126451                              10;10
Ribavirin                     ENPP1;IMPDH1                173335;146690                              19;6
Risperidone                   ADRB1;DRD2                  109630;126450                              172;10
Ropinirole                    DRD2;DRD3                   126450;126451                              10;10
Sevoflurane                   ATP2C1;GABRA1;KCNA1         604384;137160;176260                       7;180;145
Tirofiban                     ITGA2B;ITGB3                607759;173470                              216;216
Tolbutamide                   ABCC8;KCNJ1                 600509;600359                              5;45
Topiramate                    CA2;CA4;GABRA1;SCN1A        259730;114760;137160;182389                21;6;180;180
Trichlormethiazide            CA2;CA4;SLC12A1             259730;114760;600839                       21;6;45
Vitamin A                     RDH12;RDH5;RLBP1            608830;601617;180090                       6;6;6
Vitamin B12                   AMN;MMAA;MMAB;MTR;MTRR      605799;607481;607568;156570;602568         164;94;94;98;98
Vitamin D2 (Ergocalciferol)   CYP27B1;CYP2R1;VDR          609506;608713;601769                       21;21;21
Vitamin D3 (Cholecalciferol)  CYP2R1;VDR                  608713;601769                              21;21
Ziprasidone                   ADRB1;DRD2;DRD3             109630;126450;126451                       172;10;10

Table S4: Target proteins of experimental drugs and their associated phenotype modules.

Experimental drugs                           Target proteins             OMIM gene number                           Associated module ID
3,5,3',5'-TETRAIODO-L-THYRONINE              ALB;THRB                    103600;190160                              11;1
Acetate Ion                                  ANTXR2;CA2;GM2A             608041;259730;272750                       60;21;168
Adenosine Monophosphate                      PDE4D;PYGL                  600129;232700                              175;162
Adenosine-5'-Diphosphate                     GSS;HK1;PMS2                601002;142600;600259                       22;22;9
Alpha-D-Mannose                              CTLA4;GLA;GP1BA;GUSB;IL12B;LDLR;SERPINC1  123890;300644;606672;253220;161561;606945;107300  18;16;114;16;218;174;11
Adenosine-5'-Triphosphate                    ABCA1;ABCB11;ABCC8;ABCC9;ABL1;ACVRL1;AMHR2  600046;603201;600509;601439;189980;601284;600956  174;11;5;195;125;119;210
Beta-Mercaptoethanol                         CA2;GNMT;UROD               259730;606628;176100                       21;166;65
Biotin                                       HLCS;MCCC1;PC;PCCA;PCCB     609018;609010;608786;232000;232050         162;162;162;162;162
Cacodylate Ion                               ITGB4;THRB                  147557;190160                              7;1
Citrulline                                   OTC;SLC25A13                300461;603859                              162;11
Ethylene Glycol                              GALE;GLA;HEXB               606953;300644;606873                       5;16;168
Flavin-Adenine Dinucleotide                  ACADM;IVD                   607008;607036                              162;163
Formic Acid                                  CA2;GPHN                    259730;603930                              21;168
Fucose                                       CTLA4;GLA;LTF               123890;300644;150210                       18;16;217
Glucose                                      HK1;PYGL                    142600;232700                              22;162
Glycine                                      GNMT;GSS                    606628;601002                              166;22
Isopropyl Alcohol                            GCH1;GDF5;GM2A              600225;601146;272750                       10;3;168
L-Arginine                                   ARG1;ASL                    608313;608310                              162;162
L-Aspartic Acid                              ASPA;SLC25A13               608034;603859                              168;11
L-Isoleucine                                 ACAT1;PCCA;PCCB             607809;232000;232050                       163;162;162
L-Leucine                                    HMGCL;IVD;MCCC1             246450;607036;609010                       162;163;162
L-Methionine                                 MTR;MTRR                    156570;602568                              98;98
L-Ornithine                                  ARG1;OAT;OTC;SLC25A15       608313;258870;300461;603861                162;168;162;168
L-Phenylalanine                              TAT;TH                      276600;191290                              166;145
L-Proline                                    PRODH;SLC6A14               606810;300444                              11;204
L-Threonine                                  PCCA;PCCB                   232000;232050                              162;162
L-Tyrosine                                   TAT;TH;YARS                 276600;191290;603623                       166;145;147
L-Valine                                     PCCA;PCCB                   232000;232050                              162;162
Lauric Acid                                  ALB;GM2A                    103600;272750                              11;168
N-Acetyl-D-Glucosamine                       ARSA;ARSB;CTLA4;GBA;GLA;GP1BA;GUSB;HEXB;IL12B;LDLR;LTF;SERPINC1;STS  607574;253200;123890;606463;300644;606672;253220;606873;161561;606945;150210;107300;308100  168;16;18;25;16;114;16;168;218;174;217;11;167
Nicotinamide-Adenine-Dinucleotide            GALE;HSD17B4;QDPR           606953;601860;261630                       5;167;166
Oxalate Ion                                  LTF;TF                      150210;190000                              217;65
Phosphoaminophosphonic Acid-Adenylate Ester  GALK1;HK1                   604313;142600                              5;22
Pyruvic Acid                                 PKLR;SLC16A1                609712;600682                              22;172
Vitamin B6                                   CTH;GAD1;GLDC;OAT;PYGL;TAT  607657;605363;238300;258870;232700;276600  166;179;162;168;162;166
Vitamin K3                                   F10;F9;VKORC1               227600;306900;608547                       216;28;192

Table S5: Decomposition of large modules into secondary-level modules.

Module ID   Size   Modularity
Modu 1      246    0.128
Modu 2      289    0.659
Modu 3      471    0.423
Modu 4      256    0.336
Modu 6      221    0.557
Modu 7      234    0.412
Modu 8      111    0.301
Modu 11     269    0.683
Modu 12     192    0.698
Modu 13     130    0.766
Modu 15     147    0.501
Modu 17     100    0.697
Modu 18     114    0.575
Modu 19     169    0.534
Modu 24     120    0.661
Modu 28     191    0.725
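The 0.5 modularity threshold of Section II can be applied directly to the values in Table S5. In this Python sketch, the strict comparison (modularity > 0.5) and the minimum size of 100 phenotypes are our reading of the text, and the helper name is hypothetical. With these choices it recovers the 11 modules given in Section II. It also recovers the two modules closest to the threshold, Modu 3 and Modu 15.

```python
# (module ID, size, modularity of its sub-network) as listed in Table S5
TABLE_S5 = [
    ("Modu 1", 246, 0.128), ("Modu 2", 289, 0.659), ("Modu 3", 471, 0.423),
    ("Modu 4", 256, 0.336), ("Modu 6", 221, 0.557), ("Modu 7", 234, 0.412),
    ("Modu 8", 111, 0.301), ("Modu 11", 269, 0.683), ("Modu 12", 192, 0.698),
    ("Modu 13", 130, 0.766), ("Modu 15", 147, 0.501), ("Modu 17", 100, 0.697),
    ("Modu 18", 114, 0.575), ("Modu 19", 169, 0.534), ("Modu 24", 120, 0.661),
    ("Modu 28", 191, 0.725),
]


def modules_to_decompose(table, min_size=100, threshold=0.5):
    """IDs of first-level modules whose sub-network modularity exceeds the threshold."""
    return [mid for mid, size, q in table if size >= min_size and q > threshold]


selected = modules_to_decompose(TABLE_S5)
print(len(selected))  # 11

# Modules closest to the threshold from below and from above.
below = max((q, mid) for mid, _, q in TABLE_S5 if q <= 0.5)
above = min((q, mid) for mid, _, q in TABLE_S5 if q > 0.5)
print(below, above)  # (0.423, 'Modu 3') (0.501, 'Modu 15')
```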
