Gene Co-Expression Network Design from RNA-Seq Data in Arabidopsis thaliana
Ximena Contreras, Dan Nichol and Michelle Parker
Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, United Kingdom
Motivation
Correlation networks are a useful tool for identifying trends in large biological data sets. In particular, correlation networks have been used extensively to study gene co-expression, the process by which genes are expressed in coordination to produce proteins. Here we construct and analyse the gene co-expression network of the model plant Arabidopsis thaliana. Weighted gene correlation network analysis (WGCNA) is used to find modules of highly correlated genes and to summarize each module by its eigengene, the first principal component of the module's expression profiles. We use eigengenes to merge modules with similar expression profiles and to identify the biological process common to the genes that contribute the most to the global connectivity of each module.
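As an illustration of the eigengene idea described above, here is a minimal sketch in Python with NumPy. It assumes a module is given as a samples-by-genes expression matrix; the function name `module_eigengene`, the standardization step and the toy data are assumptions made for this example, not the code used for the poster.

```python
import numpy as np

def module_eigengene(expr):
    """Return the eigengene of one module and the variance it explains.

    expr: array of shape (n_samples, n_genes) holding the expression of the
    module's genes. The eigengene is taken as the first principal component
    of the standardized matrix, giving one value per sample.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene (column) to zero mean and unit variance.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # Columns of u are the singular vectors in sample space.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # The sign of a singular vector is arbitrary, so align it with the
    # average standardized expression of the module.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    var_explained = s[0] ** 2 / np.sum(s ** 2)
    return eigengene, var_explained

# Toy data (made up): 6 samples, 4 genes following a common trend plus noise.
rng = np.random.default_rng(0)
trend = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
toy = trend[:, None] + 0.1 * rng.standard_normal((6, 4))
eigengene, var_explained = module_eigengene(toy)
```

For a tightly co-expressed module such as this toy one, the eigengene tracks the shared trend and explains most of the variance, which is what makes it a reasonable one-vector summary of the module.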
Methodology
Left: the methodology for constructing the gene co-expression network and identifying modules. Right: the topological overlap heat plot for the genes of 3 modules.
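The methodology figure itself is not reproduced in this transcript. As a rough sketch of the two quantities it refers to, the following Python code computes a soft-thresholded (unsigned) adjacency matrix and the topological overlap matrix in the form commonly used with WGCNA. The soft-threshold power `beta=6`, the function names and the random toy data are assumptions for illustration; the poster does not state the parameters that were used.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(gene_i, gene_j)| ** beta.

    expr: array of shape (n_samples, n_genes). beta is an assumed value.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj

def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij is the sum over u of a_iu * a_uj (shared neighbours) and k_i is the
    connectivity of gene i. adj must have a zero diagonal.
    """
    k = adj.sum(axis=1)
    shared = adj @ adj  # zero diagonal removes the u = i and u = j terms
    denom = np.minimum.outer(k, k) + 1.0 - adj
    tom = (shared + adj) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Toy data (made up): 8 samples, 5 genes.
rng = np.random.default_rng(1)
expr = rng.standard_normal((8, 5))
adj = soft_threshold_adjacency(expr)
tom = topological_overlap(adj)
dissimilarity = 1.0 - tom  # the usual input to hierarchical clustering
```

Clustering on `1 - TOM` rather than on raw correlation groups genes that share many strong neighbours, which is what the block structure of a topological overlap heat plot displays.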
Analysing the results
In this project we built a weighted gene co-expression network for A. thaliana, an organism for which such a network had not previously been built. To do so we used co-expression values obtained with RNA-Seq, which, as a new technology, had not previously been used in co-expression network construction. From our network we identified 33 statistically significant and independent modules. Using the eigengene expression profile summary of each module, we identified the biological function of ten of them. We retrieve global processes, plant-related processes and plant-specific processes. The modules with an associated function have a significantly (p-value < 0.01) lower mean clustering coefficient and higher heterogeneity than the modules without one.
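The poster does not say how the two topology measures were computed. The sketch below assumes the definitions commonly used for weighted networks: the clustering coefficient generalised to weighted adjacencies, and heterogeneity as the coefficient of variation of connectivity. The function names and the two toy graphs are illustrative only.

```python
import numpy as np

def clustering_coefficients(adj):
    """Weighted clustering coefficient of every node.

    C_i = sum_{j,k} a_ij a_jk a_ki / ((sum_j a_ij)^2 - sum_j a_ij^2),
    with C_i set to 0 where the denominator is 0.
    """
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)
    numer = np.diag(a @ a @ a).copy()
    denom = a.sum(axis=1) ** 2 - (a ** 2).sum(axis=1)
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)

def heterogeneity(adj):
    """Coefficient of variation of the connectivity k_i = sum_j a_ij."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)
    k = a.sum(axis=1)
    return k.std() / k.mean()

# Two toy 4-node networks (made up): a complete graph and a star.
complete = np.ones((4, 4))
star = np.zeros((4, 4))
star[0, 1:] = 1.0
star[1:, 0] = 1.0

mean_cc_complete = clustering_coefficients(complete).mean()
mean_cc_star = clustering_coefficients(star).mean()
het_complete = heterogeneity(complete)
het_star = heterogeneity(star)
```

The two toy graphs sit at opposite extremes: the complete graph has maximal clustering and no heterogeneity, while the hub-dominated star has no clustering and high heterogeneity. Under these assumed definitions, the modules with an associated function reported above lie closer to the star-like end than the modules without one.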
Results
Left: module eigengene dendrogram. The red line indicates the cutoff for merging modules whose eigengenes have a correlation higher than 0.8. Arrows indicate modules with a statistically significant shared biological function. Eigengenes from modules with light-related processes cluster together (right corner). Right: circle plots of the gene co-expression networks of 9 modules. Global processes: translation, programmed cell death, circadian rhythm and nutrient reservoir activity; plant-related processes: response to light, photorespiration and response to fungi; plant-specific processes: photosynthesis and glucosinolate biosynthesis.
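The merging rule in the dendrogram caption can be sketched as follows. This simplified example groups modules whose eigengene correlation exceeds 0.8, which corresponds to cutting a single-linkage dendrogram built on the dissimilarity 1 - correlation at height 0.2. The linkage actually used for the poster is not stated, so single linkage is an assumption here, and an average-linkage cut may group modules differently. The function name and toy eigengenes are made up.

```python
import numpy as np

def merge_groups(eigengenes, min_corr=0.8):
    """Group modules whose eigengenes are linked by correlation > min_corr.

    eigengenes: array of shape (n_samples, n_modules).
    Returns one integer group label per module; modules sharing a label
    would be merged. Uses a small union-find over all module pairs.
    """
    corr = np.corrcoef(eigengenes, rowvar=False)
    n = corr.shape[0]
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] > min_corr:
                parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]

# Toy eigengenes (made up): modules 0 and 1 share a trend, module 2 does not.
base = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
zigzag = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
eigengenes = np.column_stack([base, base + 0.1 * zigzag, zigzag])
groups = merge_groups(eigengenes)
```

After merging, the eigengene of each merged module would normally be recomputed from the combined gene set before any further comparison.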
Conclusions
Our results show that weighted correlation co-expression network construction [2] is well suited to large RNA-Seq datasets and can be used to find biologically meaningful gene modules. However, the discovery of a number of gene modules for which no biological function could be identified suggests that the methods of WGCNA may be too crude and may identify modules that can be attributed to coincidence or to imperfections in the raw data set. In particular, the hierarchical clustering and tree-cutting steps are prone to oversensitivity and can split modules in half, which is why an eigengene-based recombination step is required. Because the modules with an associated function had a significantly lower mean clustering coefficient than those without one, we are led to believe that the mean clustering coefficient could be a useful heuristic for finding biologically meaningful modules. In particular, a tree-cut algorithm that takes into account the mean clustering coefficient, and perhaps other measures of network topology, could help solve the problem of oversensitivity at the module detection step of network construction. This is a possible avenue for further research.
Acknowledgements
This work was carried out as part of the Oxford Summer School in Computational Biology, 2012, in conjunction with the Department of Plant Sciences, and with support from the Department of Zoology. We thank the Oxford Supercomputing Centre for providing computational resources.
References
[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1), 25–29.
[2] Langfelder, P. and Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9(1), 559.
[3] Iancu, O.D. et al. (2012). Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics, 28(12), 1592–1597.