Introduction to the W3C for Semantic Web and Life Sciences Interest Group
Eric Prud’hommeaux

What is the Mission of HCLS IG?
The mission of HCLS is to develop, advocate for, and support the use of Semantic Web technologies for biological science, translational medicine and health care. These domains stand to gain tremendous benefit from adopting Semantic Web technologies, as they depend on the interoperability of information from many domains and processes for efficient decision support.

Task Forces
• Terminology: Semantic Web representation of existing resources
    • Task lead: John Madden
• BioRDF: integrated neuroscience knowledge base
    • Task lead: Kei Cheung
• Linking Open Drug Data: aggregation of Web-based drug data
    • Task lead: Chris Bizer
• Scientific Discourse: building communities through networking
    • Task leads: Tim Clark, John Breslin
• Clinical Observations Interoperability: patient recruitment in trials
    • Task lead: Vipul Kashyap
• Other Projects: Clinical Decision Support, URI Workshop, Collaborations with CDISC & HL7

Terminology: Overview
• The goal is to identify use cases and methods for extracting Semantic Web representations from existing, standard medical record terminologies, e.g. UMLS
• Methods should be reproducible and, to the extent possible, not lossy
• Identify and document issues along the way related to identification schemes and the expressiveness of the relevant languages
• The initial effort will start with SNOMED-CT and UMLS Semantic Networks and focus on a particular subdomain (e.g. pharmacological classification)

BioRDF: Answering Questions
• Goals: get answers to questions posed to a body of collective knowledge in an effective way
• Knowledge used: publicly available databases, and text mining
• Strategy: integrate knowledge using careful modeling, exploiting Semantic Web standards and technologies

BioRDF: Looking for Targets for Alzheimer’s
• Signal transduction pathways are considered to be rich in “druggable” targets
• CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease
• Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?
Source: Alan Ruttenberg

BioRDF: Integrating Heterogeneous Data
[Figure: integrated data sources: PDSPki, Gene Ontology, NeuronDB, Reactome, BAMS, Antibodies, Entrez Gene, Allen Brain Atlas, MESH, Literature, Mammalian Phenotype, SWAN, AlzGene, BrainPharm, PubChem, Homologene]
Source: Susie Stephens

BioRDF: SPARQL Query
[Figure: SPARQL query]
Source: Alan Ruttenberg

BioRDF: Results: Genes, Processes

Gene     Entrez Gene ID   Process
DRD1     1812             adenylate cyclase activation
ADRB2    154              adenylate cyclase activation
ADRB2    154              arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP   50632            dopamine receptor signaling pathway
DRD1     1812             dopamine receptor, adenylate cyclase activating pathway
DRD2     1813             dopamine receptor, adenylate cyclase inhibiting pathway
GRM7     2917             G-protein coupled receptor protein signaling pathway
GNG3     2785             G-protein coupled receptor protein signaling pathway
GNG12    55970            G-protein coupled receptor protein signaling pathway
DRD2     1813             G-protein coupled receptor protein signaling pathway
ADRB2    154              G-protein coupled receptor protein signaling pathway
CALM3    808              G-protein coupled receptor protein signaling pathway
HTR2A    3356             G-protein coupled receptor protein signaling pathway
DRD1     1812             G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5    6755             G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A   4543             G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2     1269             G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6     3362             G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2    2898             glutamate signaling pathway
GRIN1    2902             glutamate signaling pathway
GRIN2A   2903             glutamate signaling pathway
GRIN2B   2904             glutamate signaling pathway
ADAM10   102              integrin-mediated signaling pathway
GRM7     2917             negative regulation of adenylate cyclase activity
LRP1     4035             negative regulation of Wnt receptor signaling pathway
ADAM10   102              Notch receptor processing
ASCL1    429              Notch signaling pathway
HTR2A    3356             serotonin receptor signaling pathway
ADRB2    154              transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG    5793             transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4    2043             transmembrane receptor protein tyrosine kinase signaling pathway
NRTN     4902             transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1   1500             Wnt receptor signaling pathway

Many of the genes are related to AD through gamma secretase (presenilin) activity.
Source: Alan Ruttenberg

LODD: Introduction
Use Semantic Web technologies to
1. publish structured data on the Web
2. set links from data in one data source to data in other data sources
[Figure: Linked Data browsers, Linked Data mashups and search engines operating over Things held in data sources A to E and connected by typed links]
Source: Chris Bizer

LODD: Potential Links between Data Sets
[Figure]
Source: Chris Bizer

LODD: Data Set Evaluation
[Figure]
Source: Chris Bizer

LODD: Potential questions to answer
• Physicians and Pharmacists
    • What are alternative drugs for a given indication (disease)?
    • What are equivalent drugs (generic version of a brand name, or the chemical name of an active ingredient)?
    • Are there ongoing clinical trials for a drug?
• Patients
    • What background information is available about a drug?
    • What are the contraindications of a drug?
    • Which alternative drugs are available?
    • What are the results of clinical trials for a drug?
• Pharmaceutical Companies
    • What are other companies with drugs in similar areas?
    • Which companies have a similar therapeutic focus?
Source: Chris Bizer

LODD: Linked Version of ClinicalTrials.gov
• Total number of triples: 6,998,851
• Number of trials: 61,920
• RDF links to other data sources: 177,975
• Links to:
    • DBpedia and YAGO (from interventions and conditions)
    • GeoNames (from locations)
    • Bio2RDF.org's PubMed (from references)
Source: Chris Bizer

LODD: Mashing Clinical Trials and Geo
[Figure labels: Classification of Places, Geo Coordinates]
Source: Chris Bizer

Scientific Discourse: Overview
[Figure]
Source: Tim Clark

Scientific Discourse: Goals
• Provide a Semantic Web platform for scientific discourse in biomedicine
    • Linked to: key concepts, entities and knowledge
    • Specified: by ontologies
    • Integrated with: existing software tools
    • Useful to: Web communities of working scientists
Source: Tim Clark

Scientific Discourse: Some Parameters
• Discourse categories: research questions, scientific assertions or claims, hypotheses, comments and discussion, and evidence
• Biomedical categories: genes, proteins, antibodies, animal models, laboratory protocols, biological processes, reagents, disease classifications, user-generated tags, and bibliographic references
• Driving biological project: cross-application of discoveries, methods and reagents in stem cell, Alzheimer and Parkinson disease research
• Informatics use cases: interoperability of web-based research communities with (a) each other, (b) key biomedical ontologies, (c) algorithms for bibliographic annotation and text mining, (d) key resources
Source: Tim Clark

Scientific Discourse: SWAN+SIOC
• SIOC
    • Represents activities and contributions of online communities
    • Integration with blogging, wiki and CMS software
    • Use of existing ontologies, e.g. FOAF, SKOS, DC
• SWAN
    • Represents scientific discourse (hypotheses, claims, evidence, concepts, entities, citations)
    • Used to create the SWAN Alzheimer knowledge base
    • Active beta participation of 144 Alzheimer researchers
    • Ongoing integration into SCF Drupal toolkit
Source: Tim Clark

Scientific Discourse: SIOC Ontology
[Figure]
Source: John Breslin

Scientific Discourse: SWAN KB
[Figure]
Source: Tim Clark

COI: Bridging Bench to Bedside
• How can existing Electronic Health Record (EHR) formats be reused for patient recruitment?
• Quasi-standard formats for clinical data:
    • HL7/RIM/DCM: healthcare delivery systems
    • CDISC/SDTM: clinical trial systems
• How can we map across these formats?
• Can we ask questions in one format when the data is represented in another format?
Source: Holger Stenzhorn

COI: Use Case
• Pharmaceutical companies pay a lot to test drugs
• Pharmaceutical companies express protocols in CDISC
  -- precipitous gap --
• Hospitals exchange information in HL7/RIM
• Hospitals have relational databases
Source: Eric Prud’hommeaux

Inclusion Criteria
Type 2 diabetes on diet and exercise therapy or monotherapy with metformin, insulin secretagogue, or alpha-glucosidase inhibitors, or a low-dose combination of these at 50% maximal dose. Dosing is stable for 8 weeks prior to randomization. …

    ?patient takes metformin .

Source: Holger Stenzhorn

Exclusion Criteria
Use of warfarin (Coumadin), clopidogrel (Plavix) or other anticoagulants. …

    ?patient doesNotTake anticoagulant .

Source: Holger Stenzhorn

Criteria in SPARQL

    ?medication1 sdtm:subject ?patient ;
                 spl:activeIngredient ?ingredient1 .
    ?ingredient1 spl:classCode 6809 .        # metformin
    OPTIONAL {
      ?medication2 sdtm:subject ?patient ;
                   spl:activeIngredient ?ingredient2 .
      ?ingredient2 spl:classCode 11289 .     # anticoagulant
    }
    FILTER (!BOUND(?medication2))

Source: Holger Stenzhorn

Getting Involved
• Benefits to getting involved include:
    • early access to use cases and best practice
    • influence on standards recommendations
    • cost-effective exploration of new technology through collaboration
• Get involved by contacting the chairs:
    • [email protected]
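
Worked example for the "Criteria in SPARQL" slide: the OPTIONAL block followed by FILTER (!BOUND(?medication2)) expresses "the patient takes metformin and there is no anticoagulant record for that patient". The same selection logic can be sketched in plain Python over a handful of records. This is an illustration only, not the task force's implementation: the patient names, medication labels and record layout are invented, and only the class codes 6809 and 11289 are taken from the slide.

```python
# Class codes as annotated on the slide.
METFORMIN = 6809
ANTICOAGULANT = 11289

# Hypothetical records: (patient, medication, active-ingredient class code).
# Patients and medication labels are made up for this sketch.
RECORDS = [
    ("patient1", "med-a", METFORMIN),
    ("patient2", "med-b", METFORMIN),
    ("patient2", "med-c", ANTICOAGULANT),
    ("patient3", "med-d", ANTICOAGULANT),
]


def eligible_patients(records):
    """Return patients on metformin with no anticoagulant record, sorted."""
    # Inclusion: corresponds to the required ?medication1 pattern.
    on_metformin = {patient for patient, _, code in records if code == METFORMIN}
    # Exclusion: corresponds to OPTIONAL {...} FILTER (!BOUND(?medication2)),
    # i.e. keep a patient only if no anticoagulant match exists for them.
    on_anticoagulant = {
        patient for patient, _, code in records if code == ANTICOAGULANT
    }
    return sorted(on_metformin - on_anticoagulant)


if __name__ == "__main__":
    print(eligible_patients(RECORDS))
```

Note that, as in the SPARQL pattern, a patient is treated as not taking an anticoagulant simply because no such record is present, so the result is only as reliable as the completeness of the underlying data.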