Download Differential Network Analysis

Tutorial Differential Network Analysis Tova Fuller, Steve Horvath Correspondence: [email protected], [email protected] Abstract Here we illustrate differential network analysis by comparing the connectivity and module structure of two networks based on the liver expression data of lean and heavy mice. This unbiased method for comparing two phenotypically distinct subgroups of mouse samples serves as a method for understanding the underlying differential gene co-expression network topology giving rise to altered biological pathways. This work is in press: Tova Fuller, Anatole Ghazalpour, Jason Aten, Thomas A. Drake, Aldons J. Lusis, Steve Horvath (2007) Weighted gene coexpression network analysis strategies applied to mouse weight. Mamm Genome, in press. The data are described in: Anatole Ghazalpour, Sudheer Doss, Bin Zang, Susanna Wang,Eric E. Schadt, Thomas A. Drake, Aldons J. Lusis, Steve Horvath (2006) Integrating Genetics and Network Analysis to Characterize Genes Related to Mouse Weight. PloS Genetics We provide the statistical code used for generating the weighted gene co-expression network results. Thus, the reader be able to reproduce all of our findings. This document also serves as a tutorial to differential weighted gene co-expression network analysis. Some familiarity with the R software is desirable but the document is fairly self-contained. This document and data files can be found at the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/DifferentialNetworkAnalysi s More material on weighted network analysis can be found here: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ Method Description: The data are described in the PLoS article cited above [1]. Please also refer to the citations above and below for more information regarding weighted gene co-expression network analysis (WGCNA). Here we attempt to show that networks may be constructed from two phenotypically different subgroups of samples from a prior WGCNA experiment on mice. Here we identify 30 mice at both extremes of the weight spectrum in the BxH data and construct weighted gene coexpression networks from each. Network Construction: In co-expression networks, network nodes correspond to genes and connection strengths are determined by the pairwise correlations between expression profiles. In contrast to unweighted networks, weighted networks use soft thresholding of the Pearson correlation matrix for determining the connection strengths between two genes. Soft thresholding of the Pearson correlation preserves the gene co-expression information and leads to weighted coexpression networks that are highly robust with respect to the construction method [2]. The network construction algorithm is described in detail elsewhere [2]. Briefly, a gene coexpression similarity measure (absolute value of Pearson’s product moment correlation) was used to relate every pairwise gene-gene relationship. An adjacency matrix was then constructed using a `soft’ power adjacency function aij = Power(sij, )  |sij|  where sij is the co-expression similarity, and aij represents the resulting adjacency that measures the connection strengths. The power  is chosen using the scale free topology criterion proposed in Zhang and Horvath (2005). Briefly, the power was chosen such the resulting network exhibited approximate scale free topology and a high mean number of connections. The scale free topology criterion led us to choose a power of  = 6 based on the preliminary network built from the 8000 most varying genes. However, since we are using a weighted network as opposed to an unweighted network, the biological findings are highly robust with respect to the choice of this power [2]. Topological Overlap Matrix and Gene Modules The adjacency matrix was then used to define a network distance measure or more precisely a measure of node dissimilarity based on the topological overlap matrix [2]. Specifically the topological overlap matrix is given by lij  aij ij  min{ki , k j }  1  aij where lij   aiu auj denotes the number of nodes to which both i and j are connected, and u u indexes the nodes of the network. The topological overlap matrix (TOM) is given by Ω=[ωij]. ωij is a number between 0 and 1 and is symmetric (i.e, ωij= ωji). The rationale for considering this similarity measure is that nodes that are part of highly integrated modules are expected to have high topological overlap with their neighbors. Network Module Identification. Gene "modules" are groups of nodes that have high topological overlap. Module identification was based on the topological overlap matrix Ω=[ωij] defined above. To use it in hierarchical clustering, it was turned into a dissimilarity measure by subtracting it from one (i.e, the topological overlap based dissimilarity measure is defined by dij  1  ij ). Based on the dissimilarity matrix we can use hierarchical clustering to discriminate one module from another. We used a dynamic cut-tree algorithm for automatically and precisely identifying modules in hierarchical clustering dendrogram (the details of the algorithm could be found at http://www.genetics.ucla.edu/labs/horvath/binzhang/DynamicTreeCut). The algorithm takes into account an essential feature of cluster occurrence and makes use of the internal structure in a dendrogram. Specifically, the algorithm is based on an adaptive process of cluster decomposition and combination and the process is iterated until the number of clusters becomes stable. No claim is made that our module construction method is optimal. A comparison of different module construction methods is beyond the scope of this paper.  Network Comparison Measures: For the ith gene, we denote the whole network connectivity by k1(i) and k2(i) in networks 1 and 2, respectively. To facilitate the comparison between the connectivity measures of each network, we divide each gene connectivity by the maximum network connectivity, i.e. k (i) k2 (i) and K 2 (i)  . We utilize the following measure of differential K1(i)  1 max( k1 ) max( k2 ) connectivity as DiffK(i) = K1(i) – K2(i), but other measures of differential connectivity could also be considered. To measure differential gene expression between the lean and heavy mice, we use the absolute value of the Student t-test statistic.  Sector plots and permutation test: Plotting DiffK versus the t-test statistic value for each gene gives a visual demonstration of how difference in connectivity relates to a more traditional t-statistic describing difference in expression between the two networks. To determine whether membership in each of these sectors was significant, I permute high or low weight status among our 60 mice in Networks 1 and 2, and reconstruct each network accordingly. I obtained values DiffKperm and ttestperm as shown above, taking into account permuted high or low weight status. This permutation process was repeated 1000 times (no.perm = 1000), each time noting the amount of genes that could be found in each of the m = 8 sectors as n mperm . Similarly, the sector counts for unpermuted, observed data was noted as n mobs. Then, I found a value Nm for each sector representing the number of iterations for which n mobs  n mperm . The p-value for each Nm  1  sector was found as . no.perm  1   Functional Analysis Notable genes were analyzed for pathway enrichment using the Database for Annotation,  Visualization, and Integrated Discovery (DAVID) [3] (http://david.niaid.nih.gov/david/ease.htm). 1. 2. 3. Ghazalpour, A., et al., Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS Genet, 2006. 2(8): p. e130. Zhang, B. and S. Horvath, A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol, 2005. 4: p. Article17. Dennis, G., Jr., et al., DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol, 2003. 4(5): p. P3. Statistical References To cite this tutorial or the statistical methods please use 1. Zhang B, Horvath S (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17. http://www.bepress.com/sagmb/vol4/iss1/art17 For the generalized topological overlap matrix as applied to unweighted networks see 2. Yip A, Horvath S (2006) Generalized Topological Overlap Matrix and its Applications in Gene Co-expression Networks. Proceedings Volume. Biocomp Conference 2006, Las Vegas. Technical report at http://www.genetics.ucla.edu/labs/horvath/GTOM/. For some additional theoretical insights consider 3. Horvath S, Dong J, Yip A (2006) The Relationship between Intramodular Connectivity and Gene Significance. Proceedings Volume. Biocomp Conference 2006, Las Vegas. Technical report at http://www.genetics.ucla.edu/labs/horvath/ModuleConformity/ 4. Horvath, Dong, Yip (2006) Using Module Eigengenes to Understand Connectivity and Other Network Concepts in Co-expression Networks. Submitted. # Absolutely no warranty on the code. Please contact TF or SH with suggestions. # Downloading the R software # 1) Go to http://www.R-project.org, download R and install it on your computer # After installing R, you need to install several additional R library packages: # For example to install Hmisc, open R, # go to menu "Packages\Install package(s) from CRAN", # then choose Hmisc. R will automatically install the package. # When asked "Delete downloaded files (y/N)? ", answer "y". # Do the same for some of the other libraries mentioned below. But note that # several libraries are already present in the software so there is no need to re-install them. # To get this tutorial and data files, go to the following webpage # http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/DifferentialNetworkAnalysi s # Download the zip file containing: # 1) R function file: "NetworkFunctions.txt", which contains several R functions # needed for Network Analysis. # Unzip all the files into the same directory, ## The user should copy and paste the following script into the R session. ## Text after "#" is a comment and is automatically ignored by R. # read in the R libraries library(MASS) # standard, no need to install library(class) # standard, no need to install library(cluster) library(sma) # install it for the function plot.mat library(impute)# install it for imputing missing value library(scatterplot3d) # Note: alter the following file paths to point towards where your files are. source("/Users/TovaFuller/Documents/HorvathLab2006/NetworkFunctions/Network Functions.txt") setwd("/Users/TovaFuller/Documents/HorvathLab2007/MouseProject2.0/DiffNetwo rkAnalysis/") # The following 3421 probe set were arrived at using the following steps #1) reduce to the 8000 most varying, 2) 3600 most connected, 3) focus on unique genes dat0=read.table("cnew_liver_bxh_f2female_8000mvgenes_p3600_UNIQUE_tommodule s.xls",header=T) names(dat0) # this contains information on the genes datSummary=dat0[,c(1:8,144:150)] # the following data frame contains the gene expression data: columns are genes, rows are # arrays (samples) datExpr <- t(dat0[,9:143]) no.samples <- dim(datExpr)[[1]] dimnames(datExpr)[[2]]=datSummary[,1] dim(datExpr) # We read in the clinical data datClinicalTraits=read.csv("BXH_ClinicalTraits_361mice_forNewBXH.csv",heade r=T) #Now we order the mice so that trait file and expression file agree restrictMice=is.element(datClinicalTraits$MiceID,dimnames(datExpr)[[1]]) table(restrictMice) datClinicalTraits=datClinicalTraits[restrictMice,] orderMiceTraits=order(datClinicalTraits$MiceID) orderMiceExpr=order(dimnames(datExpr)[[1]]) datClinicalTraits =datClinicalTraits[orderMiceTraits,] datExpr =datExpr[orderMiceExpr,] # From the following table, we verify that all 135 mice are in order table(datClinicalTraits$MiceID==dimnames(datExpr)[[1]]) rm(dat0);collect_garbage() BodyWeight=as.numeric(datClinicalTraits$WeightG) # Now we find the 30 heaviest and leanest mice. Network 1 will refer to modules defined by # the 30 leanest mice, and Network 2 will refer to modules defined by the 30 heaviest mic. rest1=rank(BodyWeight,ties="first")<=30 rest2=rank(-BodyWeight,ties="first")<=30 # We check to make sure there are 30 of each, with no overlap between the groups table(rest1) table(rest2) table(rest1==T & rest2==T) # FALSE # 135 # We separate the expressions for the fat and lean groups: datExpr1=datExpr[rest1,] datExpr2=datExpr[rest2,] # Now we find the whole network connectivity measures for each: k1=SoftConnectivity(datExpr1,6) k2=SoftConnectivity(datExpr2,6) # We would like to normalize the connectivity measures. We do this by dividing by the # maximum values. K1=k1/max(k1) K2=k2/max(k2) # Now we find the difference between the two connectivity values. DiffK=K1-K2 # Note that we did not take the absolute value here. # Negative values of this difference imply that normalized Lean connectivity (k2) is # greater than K1, fat normalized connectivity. poolRest=as.logical(rest1+rest2) factorLevels=rep(NA,length(poolRest)) factorLevels[rest1]="lowWeight" # trait 1 is low weight factorLevels[rest2]="highWeight" # trait 2 is high weight ttest=rep(NA, dim(datExpr)[[2]]) # Let's determine the t-statistic. for (i in 1:dim(datExpr)[[2]]){ ttest[i]=t.test(datExpr[poolRest,i]~factorLevels[poolRest])$statistic } # Permuted Status # Let's create a control DiffK with permuted fat/lean status. We choose from the same # mice that were considered in the fat and lean groups. Note results from this following # analysis will be different each time because we are sampling randomly. temp1=sample(which(poolRest==T),30,replace=F) Rest1=rep(F,length(poolRest)) Rest1[temp1]=T table(Rest1) Rest2=as.logical(poolRest-Rest1) table(Rest2) table(Rest1==T & Rest2==T) # FALSE # 135 good! factorLevels2=rep(NA,length(poolRest)) factorLevels2[Rest1]="permLowWeight" factorLevels2[Rest2]="permHighWeight" datExpr1.2=datExpr[Rest1,] datExpr2.2=datExpr[Rest2,] # Now we find the whole network connectivity measures for each: k1.2=SoftConnectivity(datExpr1.2,6) k2.2=SoftConnectivity(datExpr2.2,6) # We would like to normalize the permuted connectivity measures. We do this by # dividing by the maximum values. K1.2=k1.2/max(k1.2) K2.2=k2.2/max(k2.2) # Now we find the difference between the two connectivity values. DiffK.2=K1.2-K2.2 # Note that we did not take the absolute value here. ttest2=rep(NA, dim(datExpr)[[2]]) # Let's determine the t-statistic. for (i in 1:dim(datExpr)[[2]]){ ttest2[i]=t.test(datExpr[poolRest,i]~factorLevels2[poolRest])$statistic } # Plotting DiffK versus ttest in Unpermuted and Permuted par(mfrow=c(1,2)) plot(DiffK,ttest, main=paste("Unpermuted: cor=", signif(cor(DiffK,ttest, use="pairwise.complete.obs") ,3)),xlim=range(DiffK),ylim=range(ttest)) # colorGroup1 ?? abline(h=1.96) abline(h=-1.96) abline(v=.4) abline(v=-.4) plot(DiffK.2,ttest2, main=paste("Permuted: cor=", signif(cor(DiffK.2,ttest2, use="pairwise.complete.obs") ,3)), xlim=range(DiffK),ylim=range(ttest)) # colorGroup1 ?? abline(h=1.96) abline(h=-1.96) abline(v=.4) abline(v=-.4) 1 2 # We repeat this permutation 1000 times, each time # counting the amount of "dots" in the quadrants # shown at left. 4 8 7 3 6 5 # Now we wish to obtain a p-value for the DiffK # when there is permutation versus when there is # not. # Here are the sector counts for unpermuted: sector1obs = (ttest>1.96) & (DiffK<(-0.4)) n1obs=sum(sector1obs) sector2obs = (ttest>1.96) &(DiffK>(-0.4)) & (DiffK<0.4) n2obs=sum(sector2obs) sector3obs = (ttest>1.96) & (DiffK>0.4) n3obs=sum(sector3obs) sector4obs = (ttest>(-1.96)) & (ttest<1.96) & (DiffK>0.4) n4obs=sum(sector4obs) sector5obs = (ttest<(-1.96)) & (DiffK>0.4) n5obs=sum(sector5obs) sector6obs = (ttest<(-1.96)) &(DiffK>(-0.4)) & (DiffK<0.4) n6obs=sum(sector6obs) sector7obs = (ttest<(-1.96)) & (DiffK<(-0.4)) n7obs=sum(sector7obs) sector8obs = (ttest>(-1.96)) & (ttest<1.96) & (DiffK<(-0.4)) n8obs=sum(sector8obs) # We will find a vector of these same counts for each permutation, but the niobs value # will always stay the same (scalar) no.perms=1000 # We started with 100, and then redid the experiment with no.perms=1000 n1perm=rep(NA,no.perms) n2perm=rep(NA,no.perms) n3perm=rep(NA,no.perms) n4perm=rep(NA,no.perms) n5perm=rep(NA,no.perms) n6perm=rep(NA,no.perms) n7perm=rep(NA,no.perms) n8perm=rep(NA,no.perms) # We turn now to obtaining sector counts for each permutation for (i in c(1:no.perms)) { temp1=sample(which(poolRest==T),30,replace=F) Rest1=rep(F,length(poolRest)) Rest1[temp1]=T Rest2=as.logical(poolRest-Rest1) factorLevels2=rep(NA,length(poolRest)) factorLevels2[Rest1]="fakefat" factorLevels2[Rest2]="fakelean" datExprFat.2=datExpr[Rest1,] datExprLean.2=datExpr[Rest2,] k1.2=SoftConnectivity(datExprFat.2,6) k2.2=SoftConnectivity(datExprLean.2,6) K1.2=k1.2/max(k1.2) K2.2=k2.2/max(k2.2) DiffK.2=K1.2-K2.2 # Note that we did not take the absolute value here. ttest2=rep(NA, dim(datExpr)[[2]]) # Let's determine the t-statistic. for (j in 1:dim(datExpr)[[2]]){ ttest2[j]=t.test(datExpr[poolRest,j]~factorLevels2[poolRest])$statistic } sector1perm = (ttest2>1.96) & (DiffK.2<(-0.4)) n1perm[i]=sum(sector1perm) sector2perm = (ttest2>1.96) &(DiffK.2>(-0.4)) & (DiffK.2<0.4) n2perm[i]=sum(sector2perm) sector3perm = (ttest2>1.96) & (DiffK.2>0.4) n3perm[i]=sum(sector3perm) sector4perm = (ttest2>(-1.96)) & (ttest2<1.96) & (DiffK.2>0.4) n4perm[i]=sum(sector4perm) sector5perm = (ttest2<(-1.96)) & (DiffK.2>0.4) n5perm[i]=sum(sector5perm) sector6perm = (ttest2<(-1.96)) &(DiffK.2>(-0.4)) & (DiffK.2<0.4) n6perm[i]=sum(sector6perm) sector7perm = (ttest2<(-1.96)) & (DiffK.2<(-0.4)) n7perm[i]=sum(sector7perm) sector8perm = (ttest2>(-1.96)) & (ttest2<1.96) & (DiffK.2<(-0.4)) n8perm[i]=sum(sector8perm) } logicalSum1=sum(n1obs<=n1perm) logicalSum2=sum(n2obs<=n2perm) logicalSum3=sum(n3obs<=n3perm) logicalSum4=sum(n4obs<=n4perm) logicalSum5=sum(n5obs<=n5perm) logicalSum6=sum(n6obs<=n6perm) logicalSum7=sum(n7obs<=n7perm) logicalSum8=sum(n8obs<=n8perm) pval1=(logicalSum1+1)/(no.perms+1) pval1 pval2=(logicalSum2+1)/(no.perms+1) pval2 pval3=(logicalSum3+1)/(no.perms+1) pval3 pval4=(logicalSum4+1)/(no.perms+1) pval4 pval5=(logicalSum5+1)/(no.perms+1) pval5 pval6=(logicalSum6+1)/(no.perms+1) pval6 pval7=(logicalSum7+1)/(no.perms+1) pval7 pval8=(logicalSum8+1)/(no.perms+1) pval8 # For no.perms=100: # # # # # # # # # # # # # # # # pval1 [1] 0.5940594 pval2 [1] 0.00990099 pval3 [1]0.00990099 pval4 [1] 0.930693 pval5 [1] 0.00990099 pval6 [1] 0.00990099 pval7 [1] 0.4950495 pval8 [1] 0.8118812 # For no.perms=1000 pval1 [1] 0.4645355 pval2 [1] 0.000999001 pval3 [1] 0.000999001 pval4 [1] 0.947053 pval5 [1] 0.00999001 pval6 [1] 0.000999001 pval7 [1] 0.3346653 pval8 [1] 0.7972028 # Because we are dividing by no.perms, the p-values are 10-fold different for 100 or 1000 # iterations. # The result we achieve makes sense; regions with significantly high or low ttest values # (p < 0.05) tend to be significant. However, since there are few data points that have a # DiffK value < -0.4, the upper and lower right hand corner sectors do not prove # significant. This could be because DiffK is defined as "normalized low weight # connectivity" minus "normalized high weight connectivity" and there are few genes that in # fact are more connected in the fat than in the lean that also have extreme ttest scores. # Module Identification beta1=6 # Note: we use a beta1 = 6 as this was the value used in our previous analysis # Finding hierGTOM in our Network 1 (low weight). datExpr1=data.frame(datExpr1) AdjMat1=abs(cor(datExpr1,use="p"))^beta1 collect_garbage() dissGTOM1=TOMdist1(AdjMat1) collect_garbage() hierGTOM1 <- hclust(as.dist(dissGTOM1),method="average") par(mfrow=c(1,1)) plot(hierGTOM1,labels=F) rm(AdjMat1) # Finding hierGTOM in our Network 2 (high weight). datExpr2=data.frame(datExpr2) AdjMat2=abs(cor(datExpr2,use="p"))^beta1 collect_garbage() dissGTOM2=TOMdist1(AdjMat2) collect_garbage() hierGTOM2 <- hclust(as.dist(dissGTOM2),method="average") par(mfrow=c(1,1)) plot(hierGTOM2,labels=F) rm(AdjMat2) # Dynamic Cut-Tree Algorithm # "We used a dynamic cut-tree algorithm for selection branches of the hierarchical # clustering dendrogram (the details of the algorithm can be found at the following link: # www.genetics.ucla.edu/labs/horvath/binzhang/DynamicTreeCut. The algorithm takes # into account an # essential feature of cluster occurrence and makes use of the internal # structure in a dendrogram. Specifically, the algorithm is based on an adaptive process # of cluster decomposition and combination and the process is iterated until the number # of clusters becomes stable. " (From first mouse tutorial) myheightcutoff =0.98 mydeepSplit = FALSE # fine structure within module myminModuleSize = 90 # modules must have this minimum number of genes # Note that we get a "too many modules" warning if we have a smaller minimum size, or # higher height cutoff. # new way for identifying modules based on hierarchical clustering dendrogram color1=cutreeDynamic(hierclust=hierGTOM1, deepSplit=mydeepSplit,maxTreeHeight =myheightcutoff,minModuleSize=myminModuleSize) table(color1) color1 turquoise 355 black 218 salmon 113 blue brown 336 271 pink magenta 184 155 cyan midnightblue 110 92 yellow 255 purple 139 grey 485 green 241 greenyellow 126 red 224 tan 117 pdf(file="modules1.pdf") par(mfrow=c(2,1)) plot(hierGTOM1, main="BXH Mouse Network 1",labels=F,xlab="",sub=""); hclustplot1(hierGTOM1,color1,title1="Colored by Network 1 Modules") dev.off() # Now we define modules in the Network 2. color2=cutreeDynamic(hierclust=hierGTOM2, deepSplit=mydeepSplit,maxTreeHeight =myheightcutoff,minModuleSize=myminModuleSize) table(color2) color2 turquoise 337 black 175 salmon 102 grey 612 blue brown 243 234 pink magenta 172 161 cyan midnightblue 101 100 yellow 210 purple 116 lightcyan 98 green 184 greenyellow 108 grey60 92 red 184 tan 102 lightgreen 90 pdf(file="modules2.pdf") par(mfrow=c(2,1)) plot(hierGTOM2, main="BXH Mouse Network 2",labels=F,xlab="",sub=""); hclustplot1(hierGTOM2,color2,title1="Colored by Network 2 Modules") dev.off() # We also want to look at modules in Network 2 colored by Network 1: pdf(file="modules2by1.pdf") par(mfrow=c(2,1)) plot(hierGTOM2, main="BXH Mouse Network 2",labels=F,xlab="",sub=""); hclustplot1(hierGTOM2,color1,title1="Colored by Network 1 Modules") dev.off() # We look at DiffK versus ttest in different modules. par(mfrow=c(1,2)) plot(DiffK,ttest,col=as.character(color1),main="Colored by Network 1") plot(DiffK,ttest,col=as.character(color2),main="Colored by Network 2") # Here we create a Network 1 plot that is colored by module definitions, with sectors # delineated. par(mfrow=c(1,1)) plot(DiffK,ttest,col=as.character(color1),main=paste("Colored by Network 1 Modules, cor=", signif(cor(DiffK,ttest,use="pairwise.complete.obs"),3)), xlim=range(DiffK),ylim=range(ttest),xlab="Difference in Connectivity",ylab="t-test statistic") abline(h=1.96) abline(h=-1.96) abline(v=.4) abline(v=-.4) text(c(-0.5,-0.2,0.7,0.7,0.7,-0.2,-0.5,-0.5),c(7,7,7,0,-9,-9,9,0),labels=c("1","2","3","4","5","6","7","8"),cex=3) # Here we create a Network 2 plot. par(mfrow=c(1,1)) plot(DiffK.2,ttest2, main=paste("Permuted, cor=", signif(cor(DiffK.2,ttest2, use="pairwise.complete.obs") ,3)), xlim=range(DiffK),ylim=range(ttest),xlab="Difference in Connectivity",ylab="t-test statistic") abline(h=1.96) abline(h=-1.96) abline(v=.4) abline(v=-.4) # Here we create the permuted plot that appears in the Mammalian Genome article: par(mfrow=c(1,2),cex.lab=1.3,mar=c(5,5,4,2)+0.1) plot(DiffK,ttest,col=as.character(color1),main=paste("Colored by Network 1 Modules, cor=", signif(cor(DiffK,ttest,use="pairwise.complete.obs"),3)), xlim=range(DiffK),ylim=range(ttest),xlab="Difference in Connectivity",ylab="t-test statistic") abline(h=1.96) abline(h=-1.96) abline(v=.4) abline(v=-.4) text(c(-0.5,-0.2,0.7,0.7,0.7,-0.2,-0.5,-0.5),c(7,7,7,0,-9,-9,9,0),labels=c("1","2","3","4","5","6","7","8"),cex=3) plot(DiffK.2,ttest2, main=paste("Permuted, cor=", signif(cor(DiffK.2,ttest2, use="pairwise.complete.obs") ,3)), xlim=range(DiffK),ylim=range(ttest),xlab="Difference in Connectivity",ylab="t-test statistic") abline(h=1.96) abline(h=-1.96) abline(v=.4) abline(v=-.4) # FUNCTIONAL ANALYSIS OF HIGHLY DIFFERENTIAL MODULES The yellow module in lean mice and the grey module in fat mice is particularly interesting because it is member to sector 3 – which is found to be significantly different in DiffK versus ttest at the p=9.9e-4 level. Furthermore, high connectivity in low weight mice in alongside low connectivity in high weight mice demonstrates that network structure either affects or is affected by mouse weight. We use an EASE analysis of these genes, looking at: 1. yellow genes as defined by lean modules 2. the intersection of yellow genes as defined by network 1 modules and grey genes defined by network 2 modules 3. #1, limited by sector 3 boundaries 4. #2, limited by sector 3 boundaries. # Let's find the genes we are interested in: # Number 1 (see above) Yellow1=color1=="yellow" table(Yellow1) # Yellow1 # FALSE TRUE # 3166 255 Grey2=color2=="grey" table(Grey2) # Grey2 # FALSE TRUE # 2809 612 # Number 2 Y1G2=Yellow1 & Grey2 table(Y1G2) # FALSE TRUE # 3250 171 # Number 3 Y1S3= ((ttest>1.96) & (DiffK>0.4) & Yellow1) table(Y1S3) # YleanS3 # FALSE TRUE # 3342 79 # Number 4 Y1G2S3=ttest>1.96 & (DiffK>0.4) & Y1G2 table(Y1G2S3) # Y1G2S3 # FALSE TRUE # 3360 61 # dimnames(datExpr)[[2]] are the gene names. We can use the "locus link # IDs", otherwise known as Entrez Gene IDs in the DAVID database to get a functional # annotation. EntrezGeneIDs=datSummary[,3] # Background write.table(EntrezGeneIDs,"background.txt",sep=" ",row.names=F,col.names=F) # Number 1 write.table(datSummary[,3][Yellow1],"Yellow1.txt", sep=" ",row.names=F,col.names=F) # Number 2 write.table(datSummary[,3][Y1G2],"Y1G2.txt", sep=" ",row.names=F,col.names=F) # Number 3 write.table(datSummary[,3][Y1S3],"Y1S3.txt", sep=" ",row.names=F,col.names=F) # Number 4 write.table(datSummary[,3][Y1G2S3],"Y1G2S3.txt", sep=" ",row.names=F,col.names=F) Because a 0 is placed in lieu of a missing ID, I manually opened these files and deleted "0" entries. Then I used the entire gene list of the 3421 most connected genes as a background and entered this information into DAVID for a functional analysis. In the yellow lean gene list, 250 of the 255 yellow lean genes were recognized as mus musculus genes. 168 of yellow lean / grey fat out of 171 were recognized. All of the sector 1 genes, whether only yellow lean genes, or the intersection of lean and fat genes, were recognized. The smallest group of genes – those that met criteria 4 were highly specific for extracellular matrix, extracellular region, epidermal growth factors and its related proteins, cell adhesion, and glycoproteins. On the next page you will find a table of these genes sorted by increasing p-value. Genes with p-value < 0.05 are shaded. In criteria 4 we added the additional requirement of being a grey fat gene. The genes that do not meet this requirement (only meet criteria 3, being yellow genes in sector 1) are involved in blood vessel morphogenesis, metalloprotease inhibition, anion transport, or other similar functions seen in Criteria 4. CRITERIA 4 GENE FUNCTIONAL ANNOTATION – yellow in network 1, grey in network 2, sector 3 Category Term GOTERM_CC_ALL extracellular region Count 23 37.70% 1.75E-04 SP_PIR_KEYWORDS UP_SEQ_FEATURE signal signal peptide 22 22 36.07% 36.07% 3.55E-04 5.39E-04 GOTERM_CC_ALL GOTERM_BP_ALL extracellular space cell adhesion 21 10 34.43% 16.39% 5.69E-04 7.69E-04 UP_SEQ_FEATURE UP_SEQ_FEATURE domain:EGF-like 1 glycosylation site:N-linked (GlcNAc...) 5 21 8.20% 34.43% 8.66E-04 0.00123567 SP_PIR_KEYWORDS UP_SEQ_FEATURE glycoprotein domain:EGF-like 3 21 4 34.43% 6.56% 0.001564983 0.001605736 SP_PIR_KEYWORDS SP_PIR_KEYWORDS cell adhesion collagen 7 5 11.48% 8.20% 0.001704503 0.001819918 GOTERM_BP_ALL UP_SEQ_FEATURE cell-cell adhesion domain:EGF-like 2 5 4 8.20% 6.56% 0.00376904 0.006019022 SP_PIR_KEYWORDS GOTERM_MF_ALL immunoglobulin domain extracellular matrix structural constituent 7 4 11.48% 6.56% 0.006493147 0.009692797 INTERPRO_NAME UP_SEQ_FEATURE IPR007110:Immunoglobulin-like disulfide bond 7 16 11.48% 26.23% 0.010018458 0.011240382 SMART_NAME SP_PIR_KEYWORDS SM00181:EGF extracellular matrix 5 5 8.20% 8.20% 0.012956896 0.014607793 SMART_NAME SP_PIR_KEYWORDS SM00179:EGF_CA egf-like domain 4 5 6.56% 8.20% 0.014804082 0.016893234 GOTERM_MF_ALL INTERPRO_NAME structural molecule activity IPR000742:EGF-like, type 3 7 5 11.48% 8.20% 0.017782941 0.018214759 INTERPRO_NAME INTERPRO_NAME IPR006210:EGF IPR001881:EGF-like calcium-binding 5 4 8.20% 6.56% 0.02067224 0.02409503 INTERPRO_NAME INTERPRO_NAME IPR006209:EGF-like IPR013032:EGF-like region 5 5 8.20% 8.20% 0.029234871 0.029234871 GOTERM_BP_ALL INTERPRO_NAME regulation of signal transduction IPR013106:Immunoglobulin V-set 4 4 6.56% 6.56% 0.033918607 0.043000016 INTERPRO_NAME GOTERM_BP_ALL IPR013111:EGF, extracellular negative regulation of signal transduction 3 3 4.92% 4.92% 0.043093768 0.044205578 GOTERM_CC_ALL GOTERM_CC_ALL extracellular matrix (sensu Metazoa) extracellular matrix 5 5 8.20% 8.20% 0.0465188 0.0465188 SMART_NAME UP_SEQ_FEATURE SM00643:C345C domain:VWFC 2 2 3.28% 3.28% 0.060483805 0.067182734 INTERPRO_NAME INTERPRO_NAME IPR013091:EGF calcium-binding IPR008160:Collagen triple helix repeat 3 3 4.92% 4.92% 0.074400865 0.0801662 SP_PIR_KEYWORDS GOTERM_BP_ALL calcium binding phosphate transport 3 3 4.92% 4.92% 0.082536708 0.088210508 UP_SEQ_FEATURE SP_PIR_KEYWORDS domain:TSP N-terminal duplication 2 3 3.28% 4.92% 0.088585807 0.088868263 PIR_SUPERFAMILY_NAME GOTERM_BP_ALL SF005770:regulator of G protein signaling, RGS4 type organ morphogenesis 2 5 3.28% 8.20% INTERPRO_NAME IPR001134:Netrin, C-terminal 2 3.28% write.table(datSummary[Y1G2S3,],"Y1G2S3summary.csv",sep=",") % PValue 0.089493679 0.094055231 0.095041582

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Differential Network Analysis