ODD-Genes: Accelerating data-driven scientific discovery
NeSC Review 2003
NeSC 2003-09-30

Introduction
* ODD-Genes Background
* Science enabled by ODD-Genes
  * Automating routine statistical conditioning of highly variable microarray results
  * Discovering related data sources
  * Querying discovered data sources for relevant data
  * Identifying significant targets for focussed investigation
* Caveats & further work

ODD-Genes Background
* ODD-Genes is a demonstrator
  * Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
  * SunDCG’s TOG software allows for job submission on remote compute resources
  * OGSA-DAI provides access, control and discovery of data resources
* ODD-Genes used to investigate Wilms Tumour
  * Routine statistical conditioning of microarray results
  * Data-driven discovery of novel targets for investigation and potential therapy
* Collaborative project
  * NeSC/EPCC, Edinburgh, UK
  * Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)
  * Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)

SunDCG – Enabling Routine Statistical Conditioning
* Choose analysis to perform
* Automates analysis process
* Provides predetermined workflow
* Can run more than one analysis at a time
* Multiple reproducible avenues for investigation
* Reduces cost (human, machine), increases availability
* TOG enables this by allowing access to HPC resources

SunDCG - Conditioning Results
* Results of conditioning can be analysed and investigated
* Researcher has potentially several views of data to explore, all presented simultaneously in parallel (cf. traditional serialised, manual process)
* Researcher can reproduce this initial condition for repeated analyses
* Researcher need not perform each step manually and serially, or ask dedicated statistician to do so
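
The conditioning slides describe predetermined workflows that run side by side and can be reproduced on demand. The following is a minimal sketch of that idea only. It assumes three illustrative conditioning steps (`log2_transform`, `median_centre`, `z_score`) and uses a local thread pool as a stand-in for TOG job submission to remote HPC resources. None of these names, steps or workflows come from TOG or ODD-Genes.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import statistics


# Hypothetical conditioning steps; the real ODD-Genes steps are not given in the slides.
def log2_transform(values):
    return [math.log2(v) for v in values]


def median_centre(values):
    m = statistics.median(values)
    return [v - m for v in values]


def z_score(values):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Each predetermined workflow is a fixed, named sequence of steps,
# so rerunning it on the same input reproduces the same view.
WORKFLOWS = {
    "log2+median": [log2_transform, median_centre],
    "log2+zscore": [log2_transform, z_score],
}


def run_workflow(steps, values):
    for step in steps:
        values = step(values)
    return values


def condition_all(values, workflows=WORKFLOWS):
    """Run every workflow concurrently and return one result per workflow name."""
    # A local thread pool stands in for job submission to remote compute resources.
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(run_workflow, steps, values)
            for name, steps in workflows.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Made-up intensities, for illustration only.
raw = [2.0, 4.0, 8.0, 16.0, 32.0]
views = condition_all(raw)
```

Here `views` holds one conditioned view of the data per workflow, which corresponds to the several views the researcher can explore in parallel. Because the workflows are fixed, calling `condition_all(raw)` again gives the same starting point for repeated analyses.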
OGSA-DAI - Results Investigation
* Multiple views of data
  * Raw
  * Heat Map
  * Cluster Map
* Wilms Tumour study takes a new direction
  * Two genes appear significant in early development
  * Researchers would like more info on these genes…

OGSA-DAI - Data Resource Discovery
* OGSA-DAI uses keywords to locate relevant data resources
* May return data resources previously unknown to researcher
* Researcher selects most interesting data resource to query for information about gene
* Researcher selects Mouse atlas – narrow, deep database of spatial gene expression in mouse embryonic development
* Contrast with GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions

OGSA-DAI - Data Resource Query
* OGSA-DAI returns data from query
* Data and annotation displayed
* Data contains references to related images
* Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression
* These show that the genes are stem cell markers
* Targets for focussed investigation, potential therapy

ODD-Genes Caveats & Further Work
* ODD-Genes is a demonstrator
  * Need to develop production applications for both routine statistical processing and data resource discovery and query
  * Need to parameterise routine conditioning appropriately to complete automation
* ODD-Genes requires GRID infrastructure
  * Participating researchers need to partner with centres who host application front-ends (or, host the infrastructure themselves)
  * However, alternatives often proprietary, expensive, less flexible
* ODD-Genes requires registration by data-hosts
  * Critical mass of registered data sources
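
The discovery slides say that data hosts register their resources and that researchers then locate them by keyword. The following is a minimal sketch of that pattern. It is not the OGSA-DAI API: the `Registry` and `DataResource` classes, the ranking by number of matching keywords, and the keyword sets are all hypothetical. Only the two resource descriptions are taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DataResource:
    name: str
    description: str
    keywords: set = field(default_factory=set)


class Registry:
    """Hypothetical in-memory registry; only registered resources can be found."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def discover(self, *terms):
        """Return resources matching at least one term, best match first."""
        wanted = {t.lower() for t in terms}
        scored = []
        for resource in self._resources:
            hits = len(wanted & {k.lower() for k in resource.keywords})
            if hits:
                scored.append((hits, resource))
        # The sort is stable, so ties keep registration order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [resource for _, resource in scored]


registry = Registry()
registry.register(DataResource(
    name="GTI database",
    description="Broad, shallow genome-wide gene expression",
    keywords={"gene expression", "genome-wide", "multi-organism"},  # assumed keywords
))
registry.register(DataResource(
    name="Mouse atlas",
    description="Narrow, deep spatial gene expression in mouse embryonic development",
    keywords={"gene expression", "mouse", "embryo", "spatial"},  # assumed keywords
))

found = registry.discover("gene expression", "mouse")
names = [resource.name for resource in found]
```

In this sketch a query returns nothing for keywords that no host has registered, which illustrates why the caveat about a critical mass of registered data sources matters.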