Download supplementary information - Molecular Systems Biology

SUPPLEMENTARY INFORMATION

Strain Construction

Our seed genes were TEC1, CUP9, SFL1, SOK2, and SKN7. Tec1 is in the filamentation MAP-kinase pathway. Sfl1 and Sok2 are involved in the Ras-cAMP pathway. Skn7 is in a redox-sensing pathway. Cup9 is involved in ion homeostasis. The chosen transcription factors have sets of known protein-DNA interactions, and they overlap substantially in the genes they bind. This facilitated mapping of the molecular interactions that mediate their combined influences on gene expression and phenotype.

All strains are derivatives of a filamentation-competent Σ1278b wild-type strain, G85 (MATa/α ura3Δ0/ura3Δ0 his3Δ0::hisG/his3Δ0::hisG). Gene deletions were made using a PCR-based strategy in which KanMX4 “barcode” deletion alleles (Winzeler et al., 1999) or natMX4 alleles (Goldstein and McCusker, 1999) were amplified, transformed into G85, and verified by PCR and tetrad dissection. Standard methods (Guthrie and Fink, 1991) were used for all transformations and crosses to construct homozygous diploid mutant derivatives. Strains tec1::bcKanMX4, sfl1::bcKanMX4, sok2::bcKanMX4, phd1::bcKanMX4, and rox1::bcKanMX4 are from Drees et al. (2005). The deletion-insertion alleles cup9::bcKanMX4, skn7::bcKanMX4, cin5::NatMX4, mot3::NatMX4, sko1::NatMX4, and yap6::NatMX4, and all double mutants, were constructed for this study. Deletion of UME6 proved inviable in the Σ1278b genetic background.

The CUP9 gene deletion was constructed using primers CUP9-up-F (5’-GCC TCC TGT TTC TGT TAA TTG G-3’) and CUP9-down-R (5’-TCA GAC CAG GTT TCG ATG AAG-3’). The SKN7 primers were SKN7-F1 (5’-AGG CTT GCT GCT TTT GTT TG-3’) and SKN7-R1 (5’-AAT TTG AGA GCG GCA GAA AG-3’). The CIN5 primers were CIN5-NatF (5’-AGA ATA ACA GCT TGG AAC AAG AAG GAA AAC CAA AAA CCT ACT CAA GCA TAG GCC ACT AGT GGA TCT G-3’) and CIN5-NatR (5’-TGA AAA CTT TTA AGA TGT TAC TAG TAC TAA TAA TTA TTC ATT ATT CAG CTG AAG CTT CGT ACG C-3’).
The MOT3 primers were MOT3-NatF (5’-AGG CAA CAG TAG GCA AAT AGT AAA GGG ACA TAT CAT ATT TGA GCA GCA TAG GCC ACT AGT GGA TCT G-3’) and MOT3-NatR (5’-GTT AAA TGA GTG GGA AGG GAT ATT TTG TGT GTC TAT AAA GTC TAT CAG CTG AAG CTT CGT ACG C-3’). The SKO1 primers were SKO1-NatF (5’-CAT TCC AAA TAC ACC TGC CCA GTC TCT AGA CCC TGC TTA ATC ATT GCA TAG GCC ACT AGT GGA TCT G-3’) and SKO1-NatR (5’-AAA GCA TCA GAT AGA AGA CTA TTT AAG AAC CCC GTC GCT ATC TCG CAG CTG AAG CTT CGT ACG C-3’). The YAP6 primers were YAP6-NatF (5’-GAA ATT TCA ATA AAC AAC AGA ATA ACG AAG AGT GCT AAG GGA CAA GCA TAG GCC ACT AGT GGA TCT G-3’) and YAP6-NatR (5’-GAT CTT CCA GTA CTA GAG ATC AAT ATC TGC TCC CTA TTT ATT GTA CAG CTG AAG CTT CGT ACG C-3’).

Assays of filamentous growth

Synthetic Low-Ammonium Dextrose (SLAD) agar with uracil and histidine was used to induce filamentous growth (Gimeno et al., 1992). Synthetic Complete Dextrose (SCD) was used for yeast-form growth. To test filamentation phenotypes, strains were streaked on SLAD plates, incubated at 30 °C for 8 hours, and imaged using a Nikon Coolpix 990 digital camera mounted on a Nikon TS100 inverted microscope with a 40X objective. Colonies were examined for elongated cell morphology, unipolar budding patterns, and invasive growth into the solid medium, and were scored for overall filamentation relative to the wild-type strain. Phenotype inequalities were deduced by comparing filamentation of the four relevant strains on the same plate, to avoid plate-to-plate variation.

Yeast-filamentation expression-profiling experiments

We collected expression profiles in triplicate (Thompson et al., 2005) of wild-type and mutant strains grown under filamentous-form conditions for 10 hours, as previously described (Prinz et al., 2004). Target labeling with the GeneChip® One-Cycle Target Labeling kit and hybridization to Yeast Genome S98 Arrays were performed according to the manufacturer’s protocols (www.affymetrix.com).
Microarray data were collected for yeast diploid strains of 16 genotypes: wild type, deletion mutants for each of the 5 transcription factors, and double-deletion mutants for all 10 double-mutant combinations of the 5 transcription factors. Microarray data were normalized using robust multi-array averaging (RMA) (Irizarry et al., 2003) as implemented in the BioConductor software package (Gentleman et al., 2004). Each gene expression data point was taken as the mean of the three corresponding biological replicates. We verified that the mean variation between biological replicates was less than the mean variation between different strains (data not shown). Expression intensities for each gene were transformed into log2 ratios relative to yeast-form wild-type expression for all subsequent analysis. We restricted all subsequent analysis to a set of 1863 genes with differential expression, defined as at least a two-fold difference between their lowest and highest expression intensities.

Genetic Influences Decomposition

The matrix decomposition outlined in Figure 1 is readily expanded to the case of more than two seed genes and an unlimited number of expression profiles. For example, the case of three seed genes would be written as

$$
\begin{pmatrix}
X^{WT} & X^{A} & X^{B} & X^{C} & X^{AB} & X^{AC} & X^{BC} \\
Y^{WT} & Y^{A} & Y^{B} & Y^{C} & Y^{AB} & Y^{AC} & Y^{BC} \\
Z^{WT} & Z^{A} & Z^{B} & Z^{C} & Z^{AB} & Z^{AC} & Z^{BC}
\end{pmatrix}
=
\begin{pmatrix}
x_0 & x_A & x_B & x_C \\
y_0 & y_A & y_B & y_C \\
z_0 & z_A & z_B & z_C
\end{pmatrix}
\cdot
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
g_A^{WT} & 0 & g_A^{B} & g_A^{C} & 0 & 0 & g_A^{BC} \\
g_B^{WT} & g_B^{A} & 0 & g_B^{C} & 0 & g_B^{AC} & 0 \\
g_C^{WT} & g_C^{A} & g_C^{B} & 0 & g_C^{AB} & 0 & 0
\end{pmatrix}
\qquad \text{(Eq. S1)}
$$

Subscripts denote the influencer gene and superscripts denote genetic backgrounds, in which labels (A, AB, etc.) indicate the deleted genes. Our data involved five seed genes.
The form of matrix G specified in Equation S1 guarantees the existence of a unique best-fit solution, owing to the strict arrangement of ones and zeros required by the genotypes (i.e., the rows of matrix G are linearly independent and cannot be transformed to yield a similar format). We used singular value decomposition (SVD) to aid in finding the best-fit solution. SVD dimensionally reduces the expression data set to a small number of modes, each with a unique eigengene (Alter et al., 2000; Carter et al., 2006). Of the 16 SVD modes, the first 6 modes account for 96% of the information in the data set. This supports the suitability of a linear model, because it is consistent with the dimensional reduction of 16 experimental conditions to 6 linearly independent modes (the 5 perturbed genes plus the collective remainder of the genome), plus a small noise component. For comparison, in SVD analysis of single-gene perturbations in the Ras-cAMP system (Carter et al., 2006) the 9 conditions effectively reduced to 7 modes, a substantially higher fraction than in the present case. Genetic-influences decomposition can be readily performed on this much smaller data matrix. The full influences matrix X can then be determined by multiplying the results by the SVD eigenarrays. Finding the best-fit solution thus becomes a tractable problem using commercial software on a PC. In matrix notation, this procedure is summarized as

$$
D = u \cdot v \cdot w^{T} \approx u \cdot v \cdot x \cdot G = X \cdot G \qquad \text{(Eq. S2)}
$$

where the symbol $\approx$ denotes a best-fit solution. The matrices u, v, and $w^{T}$ are the singular value matrices for the first six modes. The 6 x 6 square matrix x encodes the expression influences for the first six eigengenes. SVD results are discussed in greater detail below. The 1863 x 6 matrix X contains the expression influences for each gene (Equation S1).
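As a reading aid, the dimensional bookkeeping behind Equation S2 can be written out explicitly. This is a sketch rather than part of the original derivation: the matrix shapes are inferred from the numbers quoted in the text (1863 genes, 16 genotypes, six retained modes, five seed genes plus the background term), and the subscript 6 marking the truncation to six modes is notation introduced here.

```latex
\begin{align*}
D_{(1863 \times 16)} &\approx u_{6}\, v_{6}\, w_{6}^{T},
  & u_{6} &: 1863 \times 6,\quad v_{6} : 6 \times 6,\quad w_{6}^{T} : 6 \times 16 \\
w_{6}^{T} &\approx x\, G,
  & x &: 6 \times 6,\quad G : 6 \times 16 \\
X &= u_{6}\, v_{6}\, x,
  & X &: 1863 \times 6
\end{align*}
```

Under these assumptions, the best-fit search is carried out only on the 6 x 16 eigengene matrix rather than on the full 1863 x 16 data matrix, and the gene-level influences are recovered afterwards by multiplying by $u_{6} v_{6}$.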
Goodness of fit

To assess the goodness of fit for the genetic influences decomposition, we compared fitted double-mutant expression values with those predicted by an additive control model (Equation S8). The control estimates the expression of every double mutant as the sum of effects for the two single mutants. We then compared both models to the double-mutant expression measurements. Reduced chi-square values were 0.035 for the model fit compared to 0.41 for the additive control (chi-square variances for experimental data were estimated by mean variation between biological replicates), and the subsequent relative chi-square probability was negligible. For each expression profile, we computed the Pearson correlation between the experimental data and both models. The mean correlation for the model fit was 0.87, compared to 0.49 for the additive control. Distributions of the correlation coefficients are shown in Figure S1, which shows that the model fits the data well and substantially better than the additive control.

Identification of significant expression influences

The matrix X (Equations 1 and S2) is a table of influences from each seed gene (plus one influence from the genetic background) to each gene in our expression set. To determine which influences are significant, we performed a series of bootstrap cross-validations. We performed influence decompositions (Equation 1) for 5000 subsets of the data matrix D, each involving half of the 1863 rows chosen at random. This gave approximately 2500 solutions for each matrix element (influence coefficient), forming distributions that reflect the variability in the data. We then examined the probability that two influence coefficients were identical as a function of their difference. We performed Welch’s approximate t-tests for $5 \times 10^{5}$ randomly chosen pairs of coefficients (testing all pairs would have been unnecessary and computationally prohibitive).
The p-values were binned based on the difference between the two corresponding coefficients, and we computed the mean p-value as a function of coefficient difference. The mean p-value decreases monotonically from a value of p = 1 for negligible coefficient differences. We chose p < 0.001 as a cutoff, and found that this corresponded to coefficient differences of 0.1 or greater. Thus, we consider every influence coefficient with magnitude of 0.1 or greater to be significantly different from zero.

We then individually examined columns 2 through 6 of the X matrix to identify genes that receive significant influences from the five seed genes. For each of the five seed genes, we determined negative-influence and positive-influence gene sets. Thus we obtained ten gene sets in total, with many overlapping elements. The genes are listed in Table S2. The average set had 280 genes, although set sizes ranged from 47 (TEC1-Negative and CUP9-Negative) to 980 (SFL1-Positive) genes. Positive-influence gene sets were larger than negative-influence sets for TEC1, CUP9, and SFL1, while SKN7 was the source of more than twice as many negative influences as positive ones. We queried each gene set for over-represented targets of transcription factors based on a hypergeometric distribution. Results are listed in Table S3. We constructed regulatory networks based on this analysis as described below, mapping putative pathways of transcriptional influence from each seed gene to the sets of genes it positively and negatively influences. A representative network is shown in Figure 3.

Network construction

We constructed transcriptional regulation networks connecting each seed gene to its influenced genes via the enriched transcription factors (Carter et al., 2006). Public databases were queried for protein-protein and protein-DNA interactions.
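The over-representation query described earlier in this section (enriched transcription-factor targets under a hypergeometric distribution, with the Bonferroni correction reported in Table S3) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the choice of background population, and the toy numbers are assumptions made here.

```python
from math import comb

def hypergeom_enrichment_p(population, targets, set_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the background (assumed choice)
    targets:    background genes that are targets of one transcription factor
    set_size:   number of genes in the influence gene set
    overlap:    targets observed inside the gene set
    """
    upper = min(targets, set_size)
    total = comb(population, set_size)
    tail = sum(
        comb(targets, k) * comb(population - targets, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def bonferroni(p_value, n_tests):
    """Bonferroni correction, capped at 1."""
    return min(1.0, p_value * n_tests)

# Toy numbers (hypothetical, not taken from Table S3).
p = hypergeom_enrichment_p(population=1000, targets=40, set_size=50, overlap=10)
print(p, bonferroni(p, n_tests=100))
```

A transcription factor would be called enriched for a gene set when the corrected value falls below the chosen cutoff (p < 0.001 in Table S3).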
Because our seed genes were transcription factor genes, we often found these factors among the enriched regulators of the co-regulated genes. Each subnetwork was constructed using the shortest molecular interaction paths connecting the seed gene to the positive- or negative-influence gene set via enriched transcription factors (Table S3). Pathways involving five or more links were discarded because they are longer than the average shortest connection between any two elements in the global network, and thus are of questionable biological relevance. Thus every path from a seed gene to its influence targets passes through one or more of the enriched transcription factors, so that all molecular paths in the network are supported by statistical evidence of co-regulation. The exclusion of alternate paths that bypass the enriched transcription factors was justified a posteriori by the inaccuracy of the predictions generated from them (data not shown). This provides indirect evidence of modularity in transcriptional regulation, suggesting that genetic co-expression results from coordinated activity of small groups of transcription factors. Shortest paths were chosen because they are most likely to be biologically active (Steffen et al., 2002). Each resulting subnetwork (Figure S2) traces a distinct putative influence from seed genes, through specific molecular interactions, to a gene set that received either a positive or negative expression influence.

Modeling knockouts in the seed gene network

The genotype matrix G encodes activity levels of each seed gene for each of the mutant strains involving seed gene knockouts. Since changes in activity levels result from influences in the biochemical network, seed gene activities can be modified by additional genetic perturbations. To model this, we next infer and quantify the influences the seed genes exert on each other's activity levels. This is a further dimensional reduction of the genotype matrix G.
Starting again with the case of two seed genes, A and B, we can write for their activity levels:

$$
A = A_0 + m_{AB}\, B, \qquad B = B_0 + m_{BA}\, A \qquad \text{(Eq. S3)}
$$

The variables A and B define generalized activity levels of the seed genes, and the parameters $A_0$ and $B_0$ represent basal input not directly due to genes A or B. The $m_{ij}$ account for influences between A and B. Self-influences, such as $m_{AA}$, are not included because they cannot be numerically distinguished from the basal input. The model can be readily generalized to the case of N perturbed genes. For a vector of gene activities, we replace {A, B} with $g = \{g_1, g_2, \ldots, g_N\}$ and write

$$
g_i = g_{0i} + \sum_{j} m_{ij}\, g_j \qquad \text{(Eq. S4)}
$$

where $g_0$ is a vector of base activity and the $m_{ij}$ form an N x N matrix encoding the influence of the jth gene on the ith gene. Equation S4 can be solved for the activities, giving the vector solution

$$
g^{WT} = (1 - m)^{-1} \cdot g_0 \qquad \text{(Eq. S5)}
$$

where 1 is the N x N identity matrix. The vector $\{1, g^{WT}\}$ is the first column of the genotype matrix G.

In this formulation, the deletion of a seed gene requires setting both its base activity and its influences on other seed genes to zero. This corresponds to replacing the appropriate entry in $g_0$ with zero and the appropriate column in m with zeros. This can be achieved by rewriting Equation S5 in terms of a diagonal base activity matrix, $G_0$, formed by placing the elements of the vector $g_0$ along the diagonal, and a scaled influence matrix with elements $M_{ij} = m_{ij} / (g_0)_i$. Defining the vector $\mathbf{1} = \{1, 1, \ldots, 1, 1\}$ of length N, we then have

$$
g^{WT} = \left[ (G_0)^{-1} - M \right]^{-1} \cdot \mathbf{1} \qquad \text{(Eq. S6)}
$$

In this form, a deletion of gene A is modeled by taking the limit as $(G_0)_{AA} \to 0$, which means setting the basal activity of that gene to zero. The resulting $g^{A}$ (with a 1 prepended) corresponds to the second column of the matrix G. Multiple deletions are modeled by taking multiple zero limits for entries of the diagonal matrix $G_0$. This effectively removes all traces of the deleted genes from the system.
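The activity model of Equations S4 to S6 can be illustrated with a short sketch. The code below is not from the original study: the function names and the two-gene toy parameters are hypothetical. It solves $(1 - m) g = g_0$ directly, and it models a deletion by dropping the deleted gene's row and column and fixing its activity at zero, which is an assumption that matches the zero limit of Equation S6 for the surviving genes.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def activities(m, g0, deleted=()):
    """Seed-gene activities g = g0 + m g, with deleted genes held at zero.

    m[i][j] is the influence of gene j on gene i (zero diagonal).
    """
    n = len(g0)
    kept = [i for i in range(n) if i not in deleted]
    a = [[(1.0 if i == j else 0.0) - m[i][j] for j in kept] for i in kept]
    b = [g0[i] for i in kept]
    solution = solve_linear(a, b) if kept else []
    g = [0.0] * n
    for i, value in zip(kept, solution):
        g[i] = value
    return g

# Hypothetical two-gene example: B represses A, A weakly activates B.
m = [[0.0, -0.5],
     [0.2, 0.0]]
g0 = [1.0, 1.0]
print([round(v, 4) for v in activities(m, g0)])               # wild type
print([round(v, 4) for v in activities(m, g0, deleted={1})])  # B deleted
```

In the toy example, deleting B releases A from repression, so the activity of A returns to its basal value; this is the kind of background-dependent activity that the non-zero entries of G encode.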
Note that by fixing $g^{WT} = \{1, 1, 1, \ldots, 1\}$ we can find a solution for the matrix elements of M and the base activities $G_0$ using the matrix elements of the best-fit solution for G as described above. Since this is a further dimensional reduction, reducing $N^2(N-1)/2$ parameters to $N^2$, we must again find a best-fit solution. For N = 5 perturbed genes the task was easily performed by a desktop PC. For the ordering of seed genes {TEC1, CUP9, SFL1, SOK2, SKN7}, we found

M = [5 x 5 matrix with zero diagonal; the row and column layout and the signs of the entries are not recoverable from the extracted text. Magnitudes in extraction order: 0.16, 0.49, 0, 0, 1.37, 0.39, 0.2, 0.28, 0, 0.41, 0.68, 0.25, 0.11, 0.19, 0.81, 0.08, 0.09, 0.39, 1.09, 0.31, 0.38, 0, 0.10, 0.09, 0] (Eq. S7)

For example, TEC1 activity receives an influence of –0.49 from SFL1. These influences are depicted graphically in Figure 4A. For each of these influences, the shortest paths were found in the global physical network following the procedure described above. These paths represent putative biomolecular mechanisms for influence between the seed genes. Select paths are shown in Figure S2.

The YAP6 gene deletion had multiple effects on M. As with the gene-influences network for Mode 2 (Figure 4B), Yap6 was the primary candidate for all influences from CUP9 and many from SFL1. Specifically, the second and third columns of M correspond to influences from CUP9 and SFL1 to the other seed genes, respectively. The paths that putatively transmit these influences involve Yap6 (Figure S2). To model strains with YAP6 gene deletions, we initially set all $M_{i2} = 0$ and all $M_{i3} = 0$ except for $M_{53}$ (since we did not find a candidate path involving YAP6 for the SFL1-SKN7 influence). As explained in the main text, we set six of 20 nonzero elements in the matrix M to zero for the YAP6 gene deletion, which led to changes throughout the G matrix when its columns were computed with Equation S6.
This modified G matrix represented the activities of our seed genes in the yap6Δ genetic background, and the columns corresponding to yap6Δ, cup9Δyap6Δ, sfl1Δyap6Δ, and sok2Δyap6Δ were used to compute expression predictions along with the yap6Δ version of X (main text).

Assessment of Gene Expression Predictions

To assess the accuracy of our predictions, we performed chi-square tests with the error ranges serving as a measure of data variance. Experimental uncertainties were taken from the median average deviation between biological replicates. Theoretical uncertainties were estimated from the genome-wide goodness of fit of our original model solution. Likelihoods of fit goodness were calculated from a chi-square distribution. The chi-square fits proved to be uniformly excellent. However, it is possible that trends in expression across all genes are readily recovered with most linear modeling procedures. As a control, we computed similar predictions based on a model without genetic interactions between the perturbed genes. In this control model, perturbed genes influence gene expression patterns, but they do not influence each other. Thus each double-mutant prediction is the direct sum of the single-mutant measurements, all relative to wild-type expression. Following the notation used in Table 1, the expression of gene X in a double-mutant background is:

$$
(X^{AB} - X^{WT}) = (X^{A} - X^{WT}) + (X^{B} - X^{WT}) \qquad \text{(Eq. S8)}
$$

This control model retains general trends in gene expression but inherently lacks genetic interactions and is thus an appropriate test of our approach. We repeated the chi-square test for both assessments of accuracy to determine whether the model with genetic interactions is consistently more predictive than the additive control model. To quantify this, we computed the relative probability of chi-square goodness of fit for our model versus the additive control.
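The additive control of Equation S8 and the two comparison statistics used in this supplement (reduced chi-square and Pearson correlation) can be sketched as follows. This is an illustrative sketch only: the function names and the toy expression values are hypothetical, the per-gene variances are assumed to be supplied by the caller, and the reduced chi-square here simply divides by the number of data points without subtracting fitted parameters.

```python
import math

def additive_control(wt, single_a, single_b):
    """Eq. S8 solved for the double mutant: X_AB = X_A + X_B - X_WT."""
    return [a + b - w for w, a, b in zip(wt, single_a, single_b)]

def reduced_chi_square(observed, predicted, variances):
    """Chi-square per data point (no correction for fitted parameters)."""
    chi2 = sum((o - p) ** 2 / v
               for o, p, v in zip(observed, predicted, variances))
    return chi2 / len(observed)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy log2 expression values for three genes (hypothetical).
wt = [0.0, 1.0, -0.5]
single_a = [0.5, 1.5, -0.5]
single_b = [0.2, 0.6, 0.1]
measured_ab = [0.9, 1.0, 0.4]

predicted_ab = additive_control(wt, single_a, single_b)
print(predicted_ab)
print(reduced_chi_square(measured_ab, predicted_ab, [0.04, 0.04, 0.04]))
print(pearson(measured_ab, predicted_ab))
```

The same two statistics would be evaluated for the model with genetic interactions, so that the two sets of predictions can be compared against the same double-mutant measurements.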
Singular Value Decomposition Analysis

In parallel with the genetic influence decomposition described above, we analyzed the expression data with SVD. This was done to identify a co-expression pattern that best correlated with filamentous-growth phenotype observations for all strains. We then hypothesized that the transcriptional network constructed for the genes showing this pattern would be a basis for filamentous-growth phenotype predictions.

We performed SVD, a linear algebra method that rearranges microarray data into a series of composite expression patterns, or eigengenes, and sets of genes that show those patterns. We follow the methods of Carter et al. (2006). Briefly, each mode has a set of genes that exhibits the pattern with positive coefficient, and another set with negative coefficient. These coefficients are the matrix elements of u in Equation S2. The patterns (eigengenes, the matrix $w^{T}$ in Equation S2) and overall mode weights (eigenvalues, diagonal elements of v in Equation S2) for the first six modes (of sixteen total; see discussion following Equation S1) are shown in Figure S3. Because each of the six relevant modes has positive and negative sets, there are 12 gene sets. Joint membership of any gene in more than one SVD mode is possible. Roughly one third of the genes have no gene-set memberships, one third are members of one set, and one third are grouped into more than one mode. The joint membership of a gene in more than one mode indicates that the expression pattern of the gene is a weighted composite of the modes of which it is a member.

To identify which mode serves as the best expression proxy for phenotype, we computed the Pearson correlation coefficient between each expression pattern (eigengene) and a discretized version of our filamentous-growth observations. The discretization uses the minimal number of integer values consistent with our strain-by-strain comparisons (see above).
For example, the phenotype inequality for the pair tec1 and cup9 is P(tec1 cup9) = P(tec1) < P(wt) < P(cup9), as observed on a plate comparing filamentous growth of all four strains (here P(A) is the phenotype value of strain A). The results are listed in Table S1. The correlation coefficients for the phenotype measurements and each of the 16 SVD modes were, in mode order: {-0.31, 0.74, 0.44, -0.12, 0.0037, 0.19, -0.10, 0.17, 0.086, 0.18, 0.30, -0.00049, 0.21, 0.032, -0.0040, -0.0035}. The SVD gene set that is the best candidate proxy for the phenotype is thus the Mode 2-Positive gene set (Table S4). Note that this is not the dominant expression pattern: it is globally weighted less than the first expression pattern (Figure S3A). This provides a posteriori justification of our choice of SVD analysis, as this expression profile would be more difficult to identify using exclusive clustering methods based on non-decomposed expression profiles. Moreover, the best-correlated mode was among the first six modes, and is thus one of those previously deemed biologically relevant. We analyzed the Mode-2 gene sets for over-representation of transcription factor targets, as described above for the influence sets. Results are reported in Table S5.

To construct a biochemical network for the regulation of the Mode-2 gene set, we determined which seed genes had a substantial influence on that expression pattern. The influences on each SVD set were determined above, and are encoded in the 6 x 6 matrix x in Equation S2. To determine which of the seed genes had the strongest influence on a given mode, we were not able to repeat the procedure used for the gene-by-gene influence coefficients (see above) because of the limited number of pairwise comparisons. Instead, we examined the distributions of the coefficients in x from the bootstrap subsolutions. These distributions had a mean standard deviation of 0.14.
Thus we considered all coefficients with magnitude greater than 0.14 to be substantially different from zero. Influence coefficients for the Mode-2 gene set compose the second row of x. The values in this row were: $x_{TEC1} = 0.28$, $x_{CUP9} = 0.19$, $x_{SFL1} = 0.18$, $x_{SOK2} = 0.09$, and $x_{SKN7} = -0.11$. Thus we identified positive influences from TEC1, CUP9, and SFL1, and constructed our network with these (Figure 4B). Network construction was performed as described above, mapping influences from the seed genes (multiple in this case) to the SVD set genes receiving the influences, via the over-represented transcription factors.

Phenotype Predictions

We made predictions of the genetic interactions and phenotypes of 13 novel double knockouts (including the four for which we collected microarray data) based on the topology of the Mode-2 molecular network (Figure 4B). Most of these involved a newly implicated gene with one of the original five seed genes. To make specific predictions we identified three network configurations (or motifs) describing the relationship between the two perturbed genes: (1) serial, in which one gene is in the only molecular path by which an influence is passed from the upstream gene to the Mode-2 gene set; (2) intermediate, in which the serial gene lies in one of two or more molecular paths from the upstream gene; and (3) parallel, in which the two perturbed genes lie in two separate branches of molecular influence. These are summarized in Figure 4C. For the serial motif, we expected the downstream perturbation to mask effects of the upstream perturbation; thus the double perturbation should be identical to a single perturbation of the downstream gene. For the intermediate motif, the effects of the downstream perturbation would only partly mask the upstream perturbation, and the double mutant will have a phenotype slightly less than the sum of the single-mutant effects.
For the parallel motif, we proposed that the genes act independently and hence the double-mutant phenotype will be the sum of effects from the two single mutants. We matched one of these motifs to each double mutant based on the two genes’ relative position in the Mode-2 network. For example, YAP6 is downstream of CUP9, so the double-mutant phenotype should resemble that of a YAP6 single deletion.

We assessed the accuracy of the model predictions by comparing with the results expected from a training set of 1809 genetic interactions for a closely related phenotype (Drees et al., 2005). Using these genetic interactions as a training set, we identified the most probable phenotype for each novel double-mutant perturbation by determining the probabilities for every possible double-mutant outcome given the wild-type and single-mutant phenotypes. For example, for a hypo-invasive A mutant and a hyper-invasive B mutant, we found that the probabilities for the AB phenotype were 28% hypo-invasive, 50% hyper-invasive, and 22% wild-type. We compared the number of correct predictions (NC, out of 13 possible) with the expected number correct obtained from the training set. To do this, we summed the training-set probabilities of all possible outcomes (Bernoulli trials) with NC or more correct to compute the likelihood of correctly predicting NC or more phenotypes (a multinomial distribution is inadequate because the outcome probabilities vary for each single-mutant combination).

SUPPLEMENTARY REFERENCES

Alter, O., Brown, P.O. and Botstein, D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A, 97, 10101-10106.
Carter, G.W., Rupp, S., Fink, G.R. and Galitski, T. (2006) Disentangling information flow in the Ras-cAMP signaling network. Genome Res, 16, 520-526.
Drees, B.L., Thorsson, V., Carter, G.W., Rives, A.W., Raymond, M.Z., Avila-Campillo, I., Shannon, P. and Galitski, T. (2005) Derivation of genetic interaction networks from quantitative phenotype data. Genome Biol, 6, R38.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y. and Zhang, J. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, 5, R80.
Gimeno, C.J., Ljungdahl, P.O., Styles, C.A. and Fink, G.R. (1992) Unipolar cell divisions in the yeast S. cerevisiae lead to filamentous growth: regulation by starvation and RAS. Cell, 68, 1077-1090.
Goldstein, A.L. and McCusker, J.H. (1999) Three new dominant drug resistance cassettes for gene disruption in Saccharomyces cerevisiae. Yeast, 15, 1541-1553.
Guthrie, C. and Fink, G.R. (1991) Guide to yeast genetics and molecular biology. Academic Press, New York.
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. and Speed, T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249-264.
Prinz, S., Avila-Campillo, I., Aldridge, C., Srinivasan, A., Dimitrov, K., Siegel, A.F. and Galitski, T. (2004) Control of yeast filamentous-form growth by modules in an integrated molecular network. Genome Res, 14, 380-390.
Steffen, M., Petti, A., Aach, J., D'Haeseleer, P. and Church, G. (2002) Automated modelling of signal transduction networks. BMC Bioinformatics, 3, 34.
Thompson, K.L., Rosenzweig, B.A., Pine, P.S., Retief, J., Turpaz, Y., Afshari, C.A., Hamadeh, H.K., Damore, M.A., Boedigheimer, M., Blomme, E., Ciurlionis, R., Waring, J.F., Fuscoe, J.C., Paules, R., Tucker, C.J., Fare, T., Coffey, E.M., He, Y., Collins, P.J., Jarnagin, K., Fujimoto, S., Ganter, B., Kiser, G., Kaysser-Kranich, T., Sina, J. and Sistare, F.D. (2005) Use of a mixed tissue RNA design for performance assessments on multiple microarray formats. Nucleic Acids Res, 33, e187.
Winzeler, E.A., Shoemaker, D.D., Astromoff, A., Liang, H., Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J.D., Bussey, H., Chu, A.M., Connelly, C., Davis, K., Dietrich, F., Dow, S.W., El Bakkoury, M., Foury, F., Friend, S.H., Gentalen, E., Giaever, G., Hegemann, J.H., Jones, T., Laub, M., Liao, H., Liebundguth, N., Lockhart, D.J., Lucau-Danila, A., Lussier, M., M'Rabet, N., Menard, P., Mittmann, M., Pai, C., Rebischung, C., Revuelta, J.L., Riles, L., Roberts, C.J., Ross-MacDonald, P., Scherens, B., Snyder, M., Sookhai-Mahadeo, S., Storms, R.K., Veronneau, S., Voet, M., Volckaert, G., Ward, T.R., Wysocki, R., Yen, G.S., Yu, K., Zimmermann, K., Philippsen, P., Johnston, M. and Davis, R.W. (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science, 285, 901-906.
www.affymetrix.com.

SUPPLEMENTARY TABLES

Table S1. Discretized phenotype measurements for the initial strain set. Values were converted to integers based on phenotype assays, by taking the most parsimonious set of values consistent with strain-by-strain comparisons. The origin is defined as wild-type filamentous growth and the overall scale is arbitrary.

Strain        Filamentation Value
wild-type      0
tec1          -2
cup9           1
sfl1           3
sok2           2
skn7          -1
tec1 cup9     -2
tec1 sfl1     -2
tec1 sok2     -2
tec1 skn7     -2
cup9 sfl1      3
cup9 sok2      2
cup9 skn7     -1
sfl1 sok2      2
sfl1 skn7      3
sok2 skn7      2

Table S2. Sets of genes with expression influenced by the seed genes. See attached file CarterTableS2.xls.

Table S3. Analysis of gene sets influenced by the seed genes. Gene sets are composed of genes with influence coefficients with magnitude of 0.1 or greater (see Materials and Methods); transcription factors are those with enriched targets in the set (Bonferroni-corrected p < 0.001 enrichment).
Network genes are the subset of genes to which molecular paths could be mapped from the seed gene.

Gene Set        Genes   Network Genes   Transcription Factors
TEC1-Positive   305     69              Nrg1, Rox1, Skn7, Sko1, Sok2, Ste12, Tec1
TEC1-Negative   47      33              Ace2, Fkh2, Flo8, Swi5
CUP9-Positive   275     70              Nrg1, Phd1, Skn7, Sko1, Sok2, Ste12, Tec1
CUP9-Negative   47      23              Ace2
SFL1-Positive   980     341             Abf1, Dal82, Flo8, Phd1, Skn7, Sok2, Ste12, Tec1
SFL1-Negative   519     280             Cin5, Fkh1, Fkh2, Flo8, Gcn4, Mbp1, Nrg1, Phd1, Rcs1, Sok2, Ste12, Swi5, Yap6
SOK2-Positive   115     68              Cin5, Nrg1, Phd1, Rcs1, Skn7, Yap6
SOK2-Negative   123     92              Cin5, Fkh2, Flo8, Mga1, Nrg1, Phd1, Skn7, Sok2, Ste12, Sut1, Tec1, Yap6
SKN7-Positive   122     54              Cin5, Flo8, Mga1, Nrg1, Yap6
SKN7-Negative   270     161             Flo8, Msn2, Nrg1, Phd1, Skn7, Sko1, Sok2, Ste12, Sut1, Tec1

Table S4. Genes in the SVD Mode-2 set. See attached file CarterTableS4.tsv.

Table S5. Enriched transcription factors for the Mode-2 SVD gene set. Transcription factors with an over-representation of targets in the set (p < 0.001 enrichment).

Transcription Factor   Number of Targets   -Log10(p) (a)
Mot3                   14                  3.15
Phd1                   25                  5.53
Rox1                   18                  6.28
Skn7                   17                  3.88
Sko1                   7                   3.60
Sok2                   23                  7.00
Ste12                  32                  10.70
Tec1                   21                  6.01

(a) Bonferroni-corrected -log10 probability (annotation significance).

SUPPLEMENTARY FIGURE LEGENDS

Figure S1. Goodness of fit for linear influences decomposition. Histograms are shown for Pearson correlation coefficients between experimental data and the linear influences decomposition fit (red) and the additive control (blue). Correlations are computed for the expression of each gene for all double-mutant strains.

Figure S2. Molecular networks for seed genes. Putative interaction paths are shown that transmit the influences (Figure 4A) from seed gene (A) TEC1, (B) CUP9, (C) SFL1, (D) SOK2, and (E) SKN7 to the other seed genes. Interactions are colored as follows: protein-protein in blue, protein-DNA in orange, and protein phosphorylation in violet.
Black arrows denote inferred influences for which no molecular path with fewer than five interactions was found.

Figure S3. SVD eigenvalues and eigengenes matrix. (A) Bar chart of eigenvalues and (B) raster plot of the eigengenes matrix are shown for the first six SVD modes. Contributions are either positive (red) or negative (green), with intensity proportional to magnitude.

Figure S4. Phenotype measurements and the Mode-2 expression component. Discretized filamentous growth measurements (blue) plotted with the second eigengene (red). The vertical scale corresponds to fractional gene expression, and the phenotype scale is arbitrary (see text and Table S1).
