Gene Co-Expression Network Design from RNA-seq Data in Arabidopsis thaliana

Ximena Contreras, Dan Nichol and Michelle Parker
Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, United Kingdom

Motivation

Correlation networks are a useful tool for identifying trends in large biological data sets. In particular, correlation networks have been used extensively to study gene co-expression, the process by which genes are expressed in coordination to produce proteins. Here we construct and analyse the gene co-expression network of the model plant Arabidopsis thaliana. Weighted gene correlation network analysis (WGCNA) is used to find modules of highly correlated genes and to summarise each module by its eigengene, the first principal component of the module's expression profiles. We use eigengenes to collapse modules with similar expression profiles and to identify the biological process common to the genes that contribute the most to the global connectivity of each module.

Methodology

Figure (left): the methodology for constructing the gene co-expression network and identifying modules. Figure (right): the topological overlap heat plot for the genes of 3 modules.

Analysing the results

In this project we have built a weighted gene co-expression network for A. thaliana, an organism for which such a network has not previously been built. In doing so we used co-expression values obtained with RNA-Seq, which, as a new technology, has not previously been used in co-expression network construction. From our network we were able to identify 33 statistically significant and independent modules. Using the eigengene expression profile summary for each of these modules, we were able to identify the biological functionality of ten of them. We retrieve global processes, plant-related processes and plant-specific processes.
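The quantities named in the Motivation and Methodology (correlation-based adjacency, topological overlap, and the eigengene as first principal component) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the pipeline used for the poster: the unsigned network, the soft-threshold power beta=6, the function names and the simulated expression data are all assumptions. The cutoff of 0.8 for merging modules is the one quoted in the Results.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency: |Pearson correlation| ** beta.

    expr has shape (samples, genes); beta=6 is an assumed power.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj

def topological_overlap(adj):
    """Topological overlap matrix for an adjacency with zero diagonal."""
    k = adj.sum(axis=1)             # connectivity of each gene
    shared = adj @ adj              # weight of neighbours shared by i and j
    min_k = np.minimum.outer(k, k)
    tom = (shared + adj) / (min_k + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

def module_eigengene(expr_module):
    """First principal component of the standardised module expression.

    Assumes constant genes were filtered out (non-zero standard deviation).
    """
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # Orient the eigengene so it correlates positively with mean expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def should_merge(me_1, me_2, cutoff=0.8):
    """Merge two modules when their eigengenes correlate above the cutoff."""
    return np.corrcoef(me_1, me_2)[0, 1] > cutoff

# Simulated data (hypothetical): 20 samples, one co-expressed module that a
# tree cut has split into two halves, plus five unrelated genes.
rng = np.random.default_rng(0)
signal = rng.normal(size=20)
module_a = signal[:, None] + 0.3 * rng.normal(size=(20, 5))
module_a2 = signal[:, None] + 0.3 * rng.normal(size=(20, 5))
unrelated = rng.normal(size=(20, 5))

expr = np.hstack([module_a, unrelated])
adj = soft_threshold_adjacency(expr)
tom = topological_overlap(adj)

me_a = module_eigengene(module_a)
me_a2 = module_eigengene(module_a2)
print("merge the two halves:", should_merge(me_a, me_a2))
```

Raising the absolute correlation to a power suppresses weak correlations without imposing a hard threshold, which is why the network is weighted rather than binary.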
The modules with an associated biological function have a significantly (p-value < 0.01) lower mean clustering coefficient and higher heterogeneity than the modules without one.

Results

Figure (left): module eigengene dendrogram. The red line indicates the cutoff for merging modules with a correlation higher than 0.8. Arrows indicate modules with a statistically significant shared biological function. Eigengenes from modules with light-related processes cluster together (right corner). Figure (right): circle plots for the gene co-expression networks of 9 modules. Global processes: translation, programmed cell death, circadian rhythm and nutrient reservoir activity; plant-related processes: response to light, photorespiration, fungi response; plant-specific processes: photosynthesis and glucosinolate biosynthesis.

Conclusions

From our results we find that weighted correlation co-expression network construction [2] is well suited to large RNA-Seq datasets and can be used to find biologically meaningful gene modules. However, the discovery of a number of gene modules for which no biological function could be identified suggests that the methods of WGCNA may be too crude and may identify modules that can be attributed to coincidence or to imperfections in the raw data set. The hierarchical clustering and tree-cutting steps are especially prone to oversensitivity and can split a module in two, which is why an eigengene-based recombination step is required. Having found that the modules with an associated function had a significantly lower mean clustering coefficient than those without, we are led to believe that the mean clustering coefficient could be a useful heuristic for finding biologically meaningful modules.
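As an illustration of the two topology measures compared above, the sketch below computes a mean clustering coefficient and a heterogeneity value for a weighted module network. The poster does not state which definitions were used, so both are assumptions: the clustering coefficient follows the weighted generalisation commonly used in weighted co-expression analysis, and heterogeneity is taken to be the coefficient of variation of node connectivity. The function names and toy networks are hypothetical.

```python
import numpy as np

def weighted_clustering_coefficients(adj):
    """Per-node clustering coefficient for a weighted network.

    adj must be symmetric with entries in [0, 1] and a zero diagonal.
    C_i = sum_{j,k} a_ij a_jk a_ki / ((sum_j a_ij)^2 - sum_j a_ij^2)
    """
    k = adj.sum(axis=1)
    triangles = np.einsum("ij,jk,ki->i", adj, adj, adj)
    denom = k ** 2 - (adj ** 2).sum(axis=1)
    # Nodes with fewer than two neighbours get a coefficient of 0.
    return np.divide(triangles, denom,
                     out=np.zeros_like(triangles), where=denom > 0)

def mean_clustering_coefficient(adj):
    """Average of the per-node clustering coefficients."""
    return weighted_clustering_coefficients(adj).mean()

def heterogeneity(adj):
    """Coefficient of variation of connectivity (assumed definition)."""
    k = adj.sum(axis=1)
    return k.std() / k.mean()

# Toy networks: a fully connected module and a hub-and-spoke module.
complete = np.ones((4, 4)) - np.eye(4)
star = np.zeros((5, 5))
star[0, 1:] = 1.0
star[1:, 0] = 1.0

for name, network in [("complete", complete), ("star", star)]:
    print(name,
          "mean clustering:", mean_clustering_coefficient(network),
          "heterogeneity:", heterogeneity(network))
```

On these toy inputs the hub-and-spoke network has the lower mean clustering coefficient and the higher heterogeneity, the same combination the poster reports for modules with an identified function.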
In particular, a tree-cut algorithm that takes into account the mean clustering coefficient, and perhaps other measures of network topology, might help to solve the problem of oversensitivity at the module detection step of the network construction. This is a possible avenue for further research.

Acknowledgements

This work was carried out as part of the Oxford Summer School in Computational Biology, 2012, in conjunction with the Department of Plant Sciences, and with support from the Department of Zoology. We thank the Oxford Supercomputing Centre for providing computational resources.

References

[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1), 25.
[2] Langfelder, P. et al. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9(1), 559.
[3] Iancu, O.D. et al. (2012). Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics, 28(12), 1592–1597.