Clustering of co-expressed genes based on RNA-Seq data
_______________________________________________________________________
Position: Post-doctoral
Duration: 12 months
Starting date: September 2016
Environment: The candidate will be based at the Toulouse Mathematics Institute (Institut de Mathématiques de Toulouse, IMT) in Toulouse, France
Contact: Cathy Maugis-Rabusseau ([email protected])
         Andrea Rau ([email protected])
MixStatSeq ANR project: http://perso.math.univ-toulouse.fr/maugis/mixstatseq/
______________________________________________________________________

Topic: Significant advances in next-generation sequencing technologies have made RNA sequencing (RNA-seq) a popular choice for studies of gene expression. Although microarrays and RNA-seq both aim to characterize transcriptional activity, the statistical tools developed for analyzing microarray data are ill-suited to RNA-seq data, and methodological developments specific to RNA-seq have been an active area of research in recent years.

In the French National Research Agency (ANR) project MixStatSeq, we are interested in using RNA-seq data to detect clusters of co-expressed genes, that is, genes that share similar expression profiles across several experimental conditions. Identifying these groups of co-expressed genes is of great biological interest, as they may share similar transcriptional regulatory mechanisms. Such co-expression analyses also pose a variety of statistical challenges. In the MixStatSeq project, we focus on model-based clustering methods to explore RNA-seq data, but several obstacles must still be addressed in this context.

The post-doctoral researcher will first focus on identifying the most appropriate strategy for co-expression analyses of RNA-seq data, including the choice of suitable data transformations and collections of mixture models, as well as the definition of an adapted criterion for selecting the number of clusters present in the data.
Second, the post-doctoral researcher will focus on comparing and aggregating related clustering results of co-expressed genes obtained from RNA-seq, microarray, and functional annotation data, in order to improve the biological interpretability and robustness of co-expression analyses. Throughout this work, novel statistical or computational methods will be developed as needed, including an R package with graphical tools for data visualization. The post-doc will use publicly available RNA-seq data as well as data generated in the Animal Genetics and Integrative Biology (GABI) research unit at INRA.

Keywords: clustering, mixture models, model selection, clustering aggregation, RNA-seq and microarray datasets

Skills: By the start date, the candidate should hold a Ph.D. or equivalent degree in biostatistics or statistics, and should have written proficiency in English. We are looking for a highly motivated and skilled candidate with a strong interest in challenging research topics and applications in biology. Strong programming skills in R are expected, and some experience in Python would be appreciated. Familiarity with RNA-seq data or mixture models is desirable but not required.

Additional information: To apply, send an email to Cathy Maugis-Rabusseau with a CV, a letter of motivation describing your background and interest in the project, and the names of two references.