Download Supplementary material for “Modularity in the genetic

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Fetal origins hypothesis wikipedia , lookup

RNA-Seq wikipedia , lookup

Gene expression programming wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Microevolution wikipedia , lookup

Gene expression profiling wikipedia , lookup

Behavioural genetics wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Designer baby wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Genome (book) wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Public health genomics wikipedia , lookup

Transcript
Supplementary material for "Modularity in the genetic
disease-phenotype network"
Xingpeng Jiang, Bing Liu, Jiefeng Jiang, Huizhi Zhao, Ming Fan, Jing Zhang and Zhenjie
Fan, Tianzi Jiang
Contents
I.
II.
III.
Computation of dyadicity D and heterophilicity H
Decomposition of large modules into sub-modules
The nature of the phenotypic similarity measure and the implications on
the findings
IV.
Genes associated with phenotypes in module are functional similar
V.
Computing P value for the disease class enrichment of modules in the
phenotype networks
VI.
Constructing pseudo-pleiotropic genes set
VII. Supplementary References
VIII. Supplementary Figures
IX.
Supplementary Tables
1
I.
Computation of dyadicity D and heterophilicity H
The value of a phenotype depends on whether it belongs (1) or does not belong (0)
to a disease class.
Thus three types of links between phenotypes exist: 1-1, 1-0, and
0-0; the number of these links are termed m11 , m10 and m00 , respectively.
The two parameters dyadicity D and heterophilicity H are defined as:
D
m11
m
and H  10 ,
m11
m10
where m11 , m10 represent the expected values of m11 , m10 respectively.
These
parameters can successfully characterize the modular structure of protein-protein
interaction networks
(Park, J. and Barabasi, A.L., 2007).
The expected value of m11 and m10 is computed next. If we take cancer as an
example, we can call
n1
the number of phenotypes belonging to cancer and
number of other phenotypes.
N  n1  n0
is the total number of phenotypes and
the total number of edges in the network. Let
p
2M
N ( N  1)
n0
M
the
is
represent the connectance
that indicates the average probability that two phenotypes are connected in the
network. The value of a phenotype depends on whether it belongs to a cancer class
(1), or does not (0). The three varieties of link styles between phenotypes are 1-1, 1-0,
and 0-0, and the number of these links can be labeled as m11 , m10 and m00
respectively.
If any phenotype in the network has an equal chance of being cancer, the expected
values of m11 and m10 are m11 and m10 respectively (Park, J. and Barabasi, A.L.,
2007).
2
n 
n (n  1)
m11 =  1   p  1 1
p,
2
2 
 n  n 
m10 =  1   0   p  n1 ( N  n1 ) p .
1 1 
Statistically significant deviations of m11 and m10 from their expected values of
m11 and m10 imply that cancer phenotypes are not distributed randomly in the
phenotype network.
Dyadicity D  1 ( D  1 ) indicates that phenotypes in the disease class tend to
connect more (less) densely among themselves than expected for a random
configuration. Similarly heterophilicity H  1 ( H  1 ) means that phenotypes in the
disease class have more (fewer) connections to phenotypes in other classes than
expected randomly.
If D  1 and H  1 , phenotypes in the specific disease class
must have a clear clustering tendency within the network.
II.
Extracting the modules of the phenotype network and decomposition of
large modules into sub-modules
For a given partition of a network, Q is defined as:
m 

Q   eii  ( ei j )2 
i 1 
j

where
m
is the number of modules, eii are the fraction of the edges that connect
two nodes inside a module i , and eij are the fraction of the edges connecting nodes
of module i to j . The modularity Q of a partition is high when the number of
intra-module edges is much larger than expected for a random partition.
We
identified modules by maximizing the modularity Q so that there were many
intra-module edges and few between-module edges.
3
However the method could not
identify the hierarchical structure of the modules. Therefore, we decomposed all
modules which had more than 100 phenotypes into sub-modules.
The number of final modules which are based in the secondary level of modularity
may affect the results.
We managed to reduce the effect by visually inspecting each
sub-network with more than 100 phenotypes in the first level modules while
automatically decomposing the phenotype network using Newman’s algorithm
(Newman, M.E., 2006).
The network was partitioned into 28 modules in the first partition.
We found 16
modules with at least 100 phenotypes, of which 11 modules had a significant
secondary level of modularity using Newman’s algorithm (Table S5, red color).
We
scrutinized
the
each
sub-network
of
phenotypes
in
the
modules
using
spring-embedded layout in Cytoscap software (Shannon, P. et al., 2003) to see if each
sub-network has a clear modular structure.
In the resulting spring-embedded layout,
nodes with edges between them tend to be situated near each other, whereas nodes
without edges between them tend to be spread apart (desJardins, M. et al., 2007).
We found that a modularity 0.5 is an appropriate threshold value for extracting
secondary level modules.
By way of illustration, we showed the different topology structures in Suppl. Fig S5
for the sub-networks of Modu 3 and Modu 15.
The modularity of Modu 3 is 0.423
and the value of Modu 15 is 0.501, which are the nearest modules to the threshold of
0.5. The sub-network of Modu 3 tends to form a densely connected cluster; however,
the sub-network of Modu 15 can be partitioned into several parts (Suppl. Fig S5).
4
We identified 231 modules in the end, most of which (214 of 231) are based on the
secondary level of modularity.
Thus, we believed that this decomposition method
will reveal the actual modularity of the phenotype network.
III.
The nature of the phenotypic similarity measure and the implications on
the findings
We discussed the nature of the similarity measure and the implications on our
findings respectively.
1) The nature of the similarity measure
van Driel et al. (2006) established phenotype similarities from the "anatomy (A)
and the disease (C) sections of the medical subject headings vocabulary (MeSH)" and
the "full-text (TX) and clinical synopsis (CS) fields of all records that describe
genetics disorders" in OMIM database.
Specifically, the MeSh hierarchical tree can
correct for differences in the level of detail of a phenotype description (Brunner,
2004). Then van Driel et al. (2006) chose the "term frequency-inverse document
frequency" (tf-idf) to weight each keyword after correcting for the length of the
records and determined the feature vector similarities by calculating the cosine of the
angle between the vector pairs.
In this method, keywords such as "Blood-Retinal Barrier" which can provide more
specific information about a phenotype are weighted higher than keywords which are
less informative such as "Eye".
The nature of the similarity measure ensures that we
can discover overlapping phenotypes if they have similar clinical traits.
We
expected that the phenotype network constructed by the similarity measure would
provide a large-scale landscape of phenotype relationships.
5
2) The implications of the similarity measure on the modularity of the phenotype
network
First, the nature of the similarity measure ensures that phenotypes with similar
clinical traits are connected and phenotypes in a disease class tend to group in the
phenotype network.
Phenotypes in a disease class are not usually formed into a
single module. However, they can be divided into many different modules.
For
example, phenotypes in the neurological disease class are distributed into about ten
modules; of them, one module contains primarily ataxia phenotypes, such as
spinocerebellar ataxia and cerebellar ataxia; and one module contains mostly
Charcot-Marie-Tooth disease phenotypes.
Thus modules generally are subclasses of
the major disease classes, which were determined manually by Goh et al.
Our
results indicate that the modularization of phenotypes network can identify detailed
classification of disease phenotypes.
Second, in several cases, a module can also contain several disease classes.
For
instance, a module contains neurological phenotypes and metabolic phenotypes,
which have a high degree of phenotypic overlap with neurological diseases.
An
example of such a metabolic disease is sialic acid storage disease, which has mental
retardation and clumsiness as primary features. It means that some phenotypes in
different disease classes may be grouped together because they have similar clinical
traits.
Third, phenotypes within a module are more densely connected than those across
modules and they are more similar in clinical traits.
Given the basic hypothesis that
similar phenotypes have a similar genetic foundation, we inferred that phenotypes in a
module would have similar genetic mechanisms.
6
We discussed whether disease
genes in a phenotype module are functionally similar in section 3.4. The result
indicates that phenotype modules may be used to infer the genetic foundations of
phenotypes without known genes.
In short, the results indicate that the phenotypic similarity network can not only
provide a computational validation of the disease classification which was determined
manually by Goh et al., but also provide a more specific classification of disease
phenotypes. Moreover, these phenotypic modules provide a candidate for
understanding the relationship between diseases and genes.
IV.
Genes associated with phenotypes in a module are functionally similar
We investigated whether genes associated with phenotypes in any given module are
functionally similar using GO annotation.
modules.
We measured the GO homogeneity of the
GO homogeneity (GH) has been used previously (Goh et al. 2007) to
investigate whether genes associated with the same disorder share similar functional
characteristics.
Similarly GH is defined here as the maximum fraction of genes in
the same module that have the same GO terms,
GH i  max j [
n ji
],
ni
where ni denotes the number of genes in the module i that have any GO
annotations, and n j i the number of genes that have the specific GO term j .
We generated 103 random controls for each module to compute GO homogeneity
by picking the same number of genes randomly in the disease genes using GO
7
functional annotation.
Suppl. Fig S3 shows the histogram of GO homogeneity in the
phenotype modules associated with at least 5 GO annotated genes.
As expected, we
found that the distribution of GO homogeneity in modules is significantly higher than
random expectations.
V.
Computing P value for the disease class enrichment of modules in the
phenotype networks
We used the disease classification dataset to see if the disease phenotypes within a
single module tended to fall within the same disease class.
We used the method
described in that computes a P value for the functional enrichment of modules in
protein-protein interaction networks (Wang et al., 2007).
Take cancer as an example.
For a given module M we randomly selected a set of phenotypes which had the same
number of members as M, and counted how many of them are cancer.
The P value
was calculated as the probability that the number of cancer phenotypes in a random
group would be equal to or greater than what we observed in M. We used 100,000
simulations to obtain the P values.
VI.
Constructing pseudo-pleiotropic genes set
We constructed a set of 394 pseudo-pleiotropic genes as a random control to
investigate the distribution of pleiotropic genes in the phenotype network. Each
pseudo-pleiotropic gene had the same number of phenotypes as was known for a
8
corresponding pleiotropic gene, except that these phenotypes were randomly selected.
We computed the mean MSC of the control set.
The P value was calculated as the
probability that the mean MSC in random groups is equal to or greater than the
observed after 100,000 random simulations.
VII.
Supplementary References
Park, J. and Barabasi, A.L. (2007) Distribution of node characteristics in complex
networks. Proc. Natl. Acad. Sci. USA 104, 17916-20.
Newman, M.E. (2006). Modularity and community structure in networks. Proc. Natl.
Acad. Sci. USA 103, 8577-82
Shannon, P. et al. (2003). Cytoscape: a software environment for integrated models of
biomolecular interaction networks. Genome Res. 13, 2498-504.
desJardins, M., MacGlashan, J., and Ferraioli, J. (2007) Interactive visual clustering.
In Proceedings of the 12th international conference on Intelligent user interfaces.
ACM, Honolulu, Hawaii, USA.
Gene Ontology Consortium. (2006). The Gene Ontology (GO) project in 2006.
Nucleic Acids Res. 34:D322-D326.
Goh, K.I., Cusick, M.E., Valle, D., Childs, B., Vidal, M. and Barabasi, A.L. (2007).
The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685-90.
Wang, Z. and Zhang, J. (2007). In search of the biological significance of modular
structures in protein networks. PLoS Comput. Biol. 3, e107.
9
VIII. Supplementary Figures
Fig. S1: The distinctly modular structure of the phenotype network. We used 100
randomized networks as random controls. Each randomized network was generated
by shuffling the edges randomly (10,000 times) in the phenotype network while
keeping both the degree of every node and the degree distribution of the network
unchanged. The Q values of the randomized networks are all slightly greater than
0.2, but the value of an actual network is 0.78. The z-score ( z  (Q  Q) /  ,
where Q and  are the mean and standard deviation of modularity Q ), which measures
the improbability that the observed result is due to chance, is ~309 (z 1 ).
0.3
z-score
= 309
Frequency
0.25
0.2
0.15
0.1
Observed Q = 0.78
0.05
0
0.2
0.3
0.4
0.5
Modularity
10
0.6
0.7
0.8
0.9
Fig. S2: Cartographic representation of the phenotype network. Different colors
indicate different disease classes in modules and unclassified phenotypes are not
shown. We only show here nodules which have at least 15 phenotypes with disease
classifications.
11
Fig. S3: The GO Homogeneity of modules for the GO categories molecular function.
Red bars denote the actual histogram and blue bars represent the random control
obtained for each module by randomly choosing the same number of disease genes
with GO functional annotations.
12
Fig. S4: The target proteins of 37 experimental drugs tend to associate with
module-specific disease phenotypes.
0.5
0.45
0.4
Frequency
0.35
0.3
-8
P <10
0.25
0.2
MSC* = 0.2045
0.15
0.1
0.05
0
0
0.05
0.1
0.15
0.2
0.25
Module-specific coefficient
13
0.3
0.35
0.4
Fig. S5: Selection of threshold for the second level of modularity.
sub-network of Modu 3 tends to form a dense, connected cluster.
sub-network of Modu 15 tends to be partitioned into several parts
A.
B.
14
A, The
B, The
Supplementary Tables
Table S1: The construction of disease phenotype networks by selecting different
thresholds does not have a significant effect on the modularity.
Threshol Nodes Edges Modularity
d
0.45
0.475
0.5
0.525
0.55
0.575
0.6
0.625
0.65
0.675
4,681
4,425
4,146
3,783
3,381
3,008
2,575
2,202
1,888
1,568
56,752
40,482
29,489
21,778
16,349
12,520
9,592
7,479
5,890
4,672
15
0.717
0.748
0.783
0.807
0.824
0.833
0.833
0.831
0.833
0.833
16
Table S2: Dyadicity and heterophilicity of 21 disease classes.
Disease class
Bone
Cancer
Cardiovascular
Dermatological
Developmental
Ear, Nose, Throat
Endocrine
Gastrointestinal
Hematological
Immunological
Metabolic
Muscular
Neurological
Nutritional
Ophthalmological
Psychiatric
Renal
Respiratory
Skeletal
Multiple
Tissue
Total
Number of
Intra-class
Phenotypes
connections
38
60
50
68
36
40
47
19
40
37
129
61
184
2
100
13
32
6
66
128
28
1,184
48
32
69
100
15
482
21
12
11
31
158
214
508
1
422
7
16
2
119
95
16
2,739
Total
Links
264
173
274
658
638
3071
182
53
180
100
513
618
2058
11
1644
175
144
28
1462
1719
214
14,179
17
Average
Degree
13.895
5.767
10.96
19.353
35.444
153.55
7.745
5.579
9
5.405
7.954
20.262
22.37
11
32.88
26.923
9
9.333
44.303
26.86
15.286
11.97
Dyadicity
19.895
5.268
16.413
12.791
6.938
180.06
5.661
20.448
4.11
13.563
5.576
34.074
8.792
291.382
24.841
26.15
9.4
38.851
16.165
3.406
12.334
Heterophilicity
0.403
0.168
0.292
0.586
1.227
4.593
0.244
0.152
0.3
0.132
0.2
0.472
0.62
0.352
0.88
0.911
0.283
0.305
1.453
0.92
0.5
Table S3: Target proteins of FDA-approved drugs and their associated phenotype
modules.
FDA-approved
Drugs
Target
proteins
OMIM Gene Number
Associated
Module ID
Amiloride
SCNN1A;SCN
N1B;
SCNN1G
600228;600760;600761
81;45;45
Bendroflumethiazide
CA2;CA4
259730;114760
21;6
Benzthiazide
CA2;CA4
259730;114760
21;6
Bepridil
CACNA1A;SC
N5A
601011;600163
1;14
Calcidiol
CYP27B1;VD
R
609506;601769
21;21
Chlorothiazide
CA2;CA4
259730;114760
21;6
Cinnarizine
CACNA1A;DR
D2
601011;126450
1;10
Coagulation factor VIIa
F10;F9
227600;306900
216;28
Cocaine
DRD3;SCN5A;
SLC6A4
126451;600163;182138
10;14;9
Cyclothiazide
CA2;CA4
259730;114760
21;6
Desflurane
ATP2C1;GAB
RA1;KCNA1
604384;137160;176260
7;180;145
Desipramine
ADRB1;SLC6
A4
109630;182138
172;9
Diazoxide
CA2;CA4
259730;114760
21;6
Drotrecogin alfa
F5;F8
227400;306700
35;28
Enflurane
ATP2C1;GAB
RA1;KCNA1
604384;137160;176260
7;180;145
Eptifibatide
ITGA2B;ITGB
3
607759;173470
216;216
Gliclazide
ABCC8;KCNJ
1
600509;600359
5;45
Halothane
ATP2C1;GAB
RA1
604384;137160
7;180
Heparin
F10;SERPINC1
227600;107300
216;11
Hydrochlorothiazide
CA2;CA4
259730;114760
21;6
Hydroflumethiazide
CA2;CA4;SLC
12A1
259730;114760;600839
21;6;45
Hydroxocobalamin
AMN;MMAA;
MMAB;MTR;
MTRR;MUT
605799;607481;607568;156
570;602568;609058
164;94;94;
98;98;162
18
Imatinib
ABL1;KIT
189980;164920
125;1
Indapamide
KCNE1;KCNQ
1
176261;607542
175;175
Interferon gamma-1b
IFNGR1;IFNG
R2
107470;147569
218;218
Iron Dextran
FTL;TF
134790;190000
5;65
Isoflurane
ATP2C1;GAB
RA1;KCNA1
604384;137160;176260
7;180;145
Levocarnitine
CPT1A;CPT2;
SLC22A4;
SLC22A5
600528;600650;604190;603
377
162;162;
95;95
Levosimendan
TNNI3;TNNT2
191044;191045
197;195
Liothyronine
ALB;THRB
103600;190160
11;1
Menotropins
FSHR;LHCGR
136435;152790
134;133
Methoxyflurane
ATP2C1;GAB
RA1;KCNA1
604384;137160;176260
7;180;145
Methyclothiazide
CA2;CA4;SLC
12A1
259730;114760;600839
21;6;45
Minaprine
DRD2;SLC6A4
126450;182138
10;9
Perhexiline
CPT1A;CPT2
600528;600650
162;162
Pramipexole
DRD2;DRD3
126450;126451
10;10
Ribavirin
ENPP1;IMPDH
1
173335;146690
19;6
Risperidone
ADRB1;DRD2
109630;126450
172;10
Ropinirole
DRD2;DRD3
126450;126451
10;10
Sevoflurane
ATP2C1;GAB
RA1;KCNA1
604384;137160;176260
7;180;145
Tirofiban
ITGA2B;ITGB
3
607759;173470
216;216
Tolbutamide
ABCC8;KCNJ
1
600509;600359
5;45
Topiramate
CA2;CA4;GAB
RA1;SCN1A
259730;114760;137160;182
389
21;6;
180;180
Trichlormethiazide
CA2;CA4;SLC
12A1
259730;114760;600839
21;6;45
Vitamin A
RDH12;RDH5;
RLBP1
608830;601617;180090
6;6;6
Vitamin B12
AMN;MMAA;
MMAB;MTR;
MTRR
605799;607481;607568;156
570;602568
164;94;94;
98;98
Vitamin
(Ergocalciferol)
D2
CYP27B1;CYP
2R1;VDR
609506;608713;601769
21;21;21
Vitamin
D3
CYP2R1;VDR
608713;601769
21;21
19
(Cholecalciferol)
Ziprasidone
ADRB1;DRD2;
DRD3
109630;126450;126451
20
172;10;10
Table S4: Target proteins of experimental drugs and their associated phenotype
modules.
Target
proteins
OMIM Gene Number
Associated
Module ID
3,5,3',5'-TETRAIODO-LTHYRONINE
ALB;THRB
103600;190160
11;1
Acetate Ion
ANTXR2;CA2;
GM2A
608041;259730;272750
60;21;168
Adenosine Monophosphate
PDE4D;PYGL
600129;232700
175;162
Adenosine-5'-Diphosphate
GSS;HK1;PMS
2
601002;142600;600259
22;22;9
Alpha-D-Mannose
CTLA4;GLA;G
P1BA;GUSB;IL
12B;LDLR;SER
PINC1
123890;300644;606672;
253220;161561;606945;
107300
18;16;114;
16;218;174;11
Adenosine-5'-Triphosphate
ABCA1;ABCB
11;ABCC8;AB
CC9;ABL1;AC
VRL1;AMHR2
600046;603201;600509;
601439;189980;601284;
600956
174;11;5;195;
125;119;210
Beta-Mercaptoethanol
CA2;GNMT;U
ROD
259730;606628;176100
21;166;65
Biotin
HLCS;MCCC1;
PC;PCCA;PCC
B
609018;609010;608786;
232000;232050
162;162;162;
162;162
Cacodylate Ion
ITGB4;THRB
147557;190160
7;1
Citrulline
OTC;SLC25A1
3
300461;603859
162;11
Ethylene Glycol
GALE;GLA;HE
XB
606953;300644;606873
5;16;168
Flavin-Adenine
Dinucleotide
ACADM;IVD
607008;607036
162;163
Formic Acid
CA2;GPHN
259730;603930
21;168
Fucose
CTLA4;GLA;L
TF
123890;300644;150210
18;16;217
Glucose
HK1;PYGL
142600;232700
22;162
Glycine
GNMT;GSS
606628;601002
166;22
Isopropyl Alcohol
GCH1;GDF5;G
M2A
600225;601146;272750
10;3;168
L-Arginine
ARG1;ASL
608313;608310
162;162
L-Aspartic Acid
ASPA;SLC25A
13
608034;603859
168;11
L-Isoleucine
ACAT1;PCCA;
607809;232000;232050
163;162;162
Experimental Drugs
21
PCCB
L-Leucine
HMGCL;IVD;
MCCC1
246450;607036;609010
162;163;162
L-Methionine
MTR;MTRR
156570;602568
98;98
L-Ornithine
ARG1;OAT;OT
C;SLC25A15
608313;258870;300461;
603861
162;168;162;
168
L-Phenylalanine
TAT;TH
276600;191290
166;145
L-Proline
PRODH;SLC6
A14
606810;300444
11;204
L-Threonine
PCCA;PCCB
232000;232050
162;162
L-Tyrosine
TAT;TH;YARS
276600;191290;603623
166;145;147
L-Valine
PCCA;PCCB
232000;232050
162;162
Lauric Acid
ALB;GM2A
103600;272750
11;168
N-Acetyl-D-Glucosamine
ARSA;ARSB;C
TLA4;GBA;GL
A;GP1BA;GUS
B;HEXB;IL12B
;LDLR;LTF;SE
RPINC1;STS
607574;253200;123890;
606463;300644;606672;
253220;606873;161561;
606945;150210;107300;
308100
168;16;18;
25;16;114;
16;168;218;1
74;217;
11;167
Nicotinamide-AdenineDinucleotide
GALE;HSD17B
4;QDPR
606953;601860;261630
5;167;166
Oxalate Ion
LTF;TF
150210;190000
217;65
Phosphoaminophosphonic
Acid-Adenylate Ester
GALK1;HK1
604313;142600
5;22
Pyruvic Acid
PKLR;SLC16A
1
609712;600682
22;172
Vitamin B6
CTH;GAD1;GL
DC;OAT;PYGL
;TAT
607657;605363;238300;
258870;232700;276600
166;179;162;
168;162;166
Vitamin K3
F10;F9;VKORC
1
227600;306900;608547
216;28;192
22
Table S5: Decomposition of large modules into secondary level modules.
Module
ID
Size
Modularity
Module
ID
Size
Modularity
Modu 1
Modu 2
Modu 3
Modu 4
Modu 6
Modu 7
Modu 8
Modu 11
246
289
471
256
221
234
111
269
0.128
0.659
0.423
0.336
0.557
0.412
0.301
0.683
Modu 12
Modu 13
Modu 15
Modu 17
Modu 18
Modu 19
Modu 24
Modu 28
192
130
147
100
114
169
120
191
0.698
0.766
0.501
0.697
0.575
0.534
0.661
0.725
23