Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories
Simon Twigger, Ph.D.
MCW Department of Physiology
Human & Molecular Genetics Center
http://rgd.mcw.edu

Meet the client

Rat researchers ask...
• Has anyone done any expression studies using congenic rats?
• What tissue is this gene expressed in?
• What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
• Are any of these genes associated with my phenotype?
• Has this gene been seen in the brain?
• What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the breast, breast carcinoma...)?

Biological Data Warehouse
Really important piece of data...

Problem...
Where, what, when?

(one) Solution?
Where, what, when?

How to create the index?
Examine One by One?
Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway males following treatment with the synthetic estrogen diethylstilbestrol (DES).
Copenhagen = COP
Brown Norway = BN

NCBO ontology services
http://bioportal.bioontology.org/annotator
Open Biomedical Annotator
http://www.bioontology.org/wiki/index.php/Annotator_Web_service

Initial Ontologies & Workflow
• Datasets
• Series
• Samples

Phase 1
Small Scale Testing
Initial Test Load:
30 Rat Dataset records (GDS) out of 236
32 Series records (GSE) out of 750
587 Sample records (GSM) out of 7288
RubyOnRails web application to view data
http://gminer.mcw.edu/

Parallel Annotation Workflow

Concurrent Annotation Results
August
October

Cloud-enabled Workflow?
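The "Examine One by One?" example can be illustrated with a small dictionary-based annotator. This is only a sketch of the general idea of mapping free text to canonical terms through synonyms. It does not call the NCBO Annotator service, and the `SYNONYMS` table and `annotate` function are hypothetical names built from the strain and compound names on the slide.

```python
import re

# Hypothetical mini-dictionary: canonical term -> accepted names.
# The entries come from the slide's example; a real index would be
# loaded from ontologies rather than typed by hand.
SYNONYMS = {
    "ACI": ["ACI"],
    "COP": ["COP", "Copenhagen"],
    "BN": ["BN", "Brown Norway"],
    "diethylstilbestrol": ["diethylstilbestrol", "DES"],
    "anterior pituitary gland": [
        "anterior pituitary gland",
        "anterior pituitary glands",
        "anterior pituitary",
    ],
}


def annotate(text):
    """Return (start, end, matched_text, canonical_term) tuples in text order."""
    hits = []
    for canonical, names in SYNONYMS.items():
        for name in names:
            # Whole-word, case-insensitive match for each known name.
            pattern = r"\b" + re.escape(name) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((m.start(), m.end(), m.group(0), canonical))

    # Sort by position, longest match first, then drop overlapping shorter hits.
    hits.sort(key=lambda h: (h[0], -(h[1] - h[0])))
    kept, last_end = [], -1
    for hit in hits:
        if hit[0] >= last_end:
            kept.append(hit)
            last_end = hit[1]
    return kept


EXAMPLE = (
    "Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown "
    "Norway males following treatment with the synthetic estrogen "
    "diethylstilbestrol (DES)."
)

if __name__ == "__main__":
    for start, end, matched, canonical in annotate(EXAMPLE):
        print(f"{start:3d}-{end:3d}  {matched!r} -> {canonical}")
```

Case-insensitive matching keeps the sketch short, but it also makes short symbols such as COP or BN easy to confuse with unrelated words, so a production index would need stricter rules for them.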
Results/Demo

Initial Observations
Synonyms
DES
Ept6
Searching with synonyms can be great:
Ept6 = ACI.COP-(D3Mgh16D3Rat119)/Shul
DES = Diethylstilbestrol

Initial Observations
Synonyms
Searching with synonyms can cause problems:
Estrogen-induced pituitary tumorigenesis = EPT
Ethanolaminephosphotransferase activity = EPT

Initial Observations 2
Rat Strain symbols
AT, AN, AS, A, B, CD
G (1000 x g)
C (˚C)
TX (Abbreviation for Texas)
Train classifier on real strain phrases?
Look for relevant neighboring terms?
...pituitary gland of the ACI, Copenhagen and Brown Norway Rat.
...16 month-old Sprague-Dawley females that...
...expression data from female SD rats with access to lifelong...
...Strain or Line: F344/NCrl ...
...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat....
...kidneys from Dahl salt-sensitive males...

Initial Observations
Anatomy
In GEO records            Corresponding MA term
White Adipose Tissue      White Fat
Brown Adipose Tissue      Brown Fat
Ulnar bone                Ulna bone
Skeletal Muscle           Set of Skeletal Muscle
Anterior Pituitary        Anterior Pituitary Gland
Calvarial Bone            Chondrocranium
Left Ventricle            Heart Left Ventricle
Potential synonyms that could be added to MA

Phase 2
All Rat Affy Samples
1 ontology (Anatomy)
Larger scale data load
0 Rat Dataset records (GDS)
479 Series records (GSE)
12,012 Sample records (GSM)
Targeted Indexing
Mouse Adult Gross Anatomy Ontology

Results/Demo

Linking annotations to data
Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples x ca. 25,000 genes/sample = 1.5B data points

Probeset results on GMiner
Gabdr

RDF Data integration
Probeset to MA
Rat Genes & xrefs
Probeset to Mouse Anatomy Ontology
RGD ID
OpenRDF Sesame
Virtuoso Open Source Triple Store

Ongoing
• Work on term recognition, strains, etc.
• Evaluation of Probeset-to-Anatomy results
• Curation interface to add additional terms
• RDF formats, Triple Store implementation
• Integrate Strain and tissue results into RGD

Education & Outreach
Meet the student
Video #3 is being shot this week

Future Videos
Target is the scientist!
• Solve common tasks
• Use annotation tools
• Evaluate annotations
• Intro to specific ontologies
• Interview ontology teams
• Ideas?
• What does your community

Acknowledgements
• Joey Geiger - Development of GMiner
• Jennifer Smith - Video creation, data curation
• Rajni Nigam - Rat Strain Ontology
• Clement Jonquet - NCBO OBA tools
• Trish Whetzel - Video script feedback
• Mark Musen & NIH Roadmap Initiative - Our Funding!

Links
• http://twigger.hmgc.mcw.edu/ncbo/ - Project webpage
• http://gminer.mcw.edu - Web application
• http://github.com/mcwbbc/gminer - GMiner code
• http://github.com/simont/MCW-RDF - RDFizer code
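
The statements shown under "Linking annotations to data" (for example, Hbb is_expressed_in rat kidney) map naturally onto RDF triples. The sketch below writes them as N-Triples using only the Python standard library. It is not the project's RDFizer code: the `BASE` namespace, the `iri` helper, and the `to_ntriples` function are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace; the real project would use its own IRIs
# (for example, RGD IDs and Mouse Anatomy Ontology term IDs).
BASE = "http://example.org/gminer/"


def iri(local_name):
    """Wrap a local name as an N-Triples IRI, percent-encoding unsafe characters."""
    return "<" + BASE + quote(local_name, safe="") + ">"


def to_ntriples(statements):
    """Serialize (subject, predicate, object) name tuples as N-Triples lines."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in statements)


# The two statements from the slide.
STATEMENTS = [
    ("Hbb", "is_expressed_in", "rat kidney"),
    ("Tm2d1", "is_expressed_in", "rat kidney"),
]

if __name__ == "__main__":
    print(to_ntriples(STATEMENTS))
```

Plain N-Triples output like this can be bulk-loaded into a triple store such as the OpenRDF Sesame or Virtuoso systems named in the slides, which is one reason to keep the serialization step this simple.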