Second Order Semantic Enrichment and the role of Wiki’s for Professionals
Barend Mons

The Consortium
• Peter Bram ‘t Hoen
• Ellen Sterrenburg
• Herman van Haagen
• Allessandro Botelho-Bovo
• Judith Boer
• Johan den Dunnen
• Gert Jan van Ommen
• Erik van Mulligen
• Martijn Schuemie
• Rob Jelier
• Antoine Velthoven
• Christina Hettne
• Jan Kors
• Johan van der Lei
• Christine Chichester
• Erik van Mulligen
• Marc Weeber
• Kevin Kalupsen
• Reuben Christie
• Jacintha van Beemen
• Nickolas Barris
• Albert Mons
• Gerard Meijssen
• Erik Moeller
• Peter Jan Roes
• Karsten Uil
• Siebrand Mazeland
• Sabine Cretella

Open Access Semantic Support Technology for on-line Knowledge Tracking, Discovery and Management

WikiProfessional
Semantic Web workspaces for scientists, enabling real-time knowledge exchange and exploration
The Million Minds Approach

Why?
Many challenges in current biomedical research
• Volume of data (both high-throughput and text)
• Complexity
• Distributed systems and databases
• Incompatible data formats
• Multi-disciplinarity
• Multi-linguality
• Ambiguity of terminology
• Inability to share knowledge
• Globalization of knowledge

• Repetition of facts is of great value for the readability of individual papers,
• but the fact itself is a single unit of information and needs no repetition.

The Million Minds Approach
– A defining characteristic of wiki technology is the ease with which pages can be created and updated. Generally, there is no review before modifications are accepted.

Websites such as www.dmd.nl are increasingly cited in the literature (personal communication, Johan den Dunnen).

The majority of (SP) proteins have more than one research group associated
[Bar chart: number of genes/proteins (axis 0 to 6000) with 1 research group versus 2 or more groups]

So... can we use wikis for this?

First order semantic enrichment
Second order semantic enrichment
• Contextual annotation of web pages for interactive browsing. van Mulligen E, Diwersy M, Schijvenaars B, Weeber M, van der Eijk CC, Jelier R, Schuemie M, Kors J, Mons B. Medinfo 2004, 11:94-8
• Which gene did you mean? Mons B. BMC Bioinformatics 2005 Jun 7, 6:142

The Knowlet
What does a Knowlet look like ‘under the hood’?
<Source concept>
<Target concept>
<Relations>:
<Type a1> Database facts (multiple attributes)
<Type a2> Community annotations (WikiProf)
<Type b1> Co-occurrence in a sentence
<Type b2> Co-occurrence in an abstract
<Type c1> Concept profile match
<Type c2> Sequence similarity (BLAST score; genes and proteins only)
<Type c3> Co-expression with (genes from expression databases)

[Figure: Knowlet building block, Knowlet of core concept, and Knowlet space; relation types shown are factual, co-occurrence, and associative]

• Rules to combine different sources of information into a single relationship
• Time-stamped information
• The relationship to the original texts or database entries

The Knowlet
• A Knowlet represents a unit of thought interconnected with other units of thought; in other words, a ‘cloud’ of concepts that have one or more relationship types with the central (selected) concept
• The interconnection reflects a semantic relationship derived:
– From facts in a database
– From co-occurrence in a text
– From other associations
• Relations have a strength
– Based on the source of the relationship
– Based on the amount of «evidence»
• Knowlets belong to one or more semantic classes: proteins, diseases, authors, organizations, journals, experiments, etc.
• Each Knowlet is uniquely identified by a URL or URI (Uniform Resource Identifier)
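
The Knowlet description above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the WikiProfessional implementation: the class names `Knowlet` and `Relation`, the `urn:example:` identifiers, the weights in `SOURCE_WEIGHT`, and the strength formula are all hypothetical. The slides say only that strength depends on the source of the relationship and on the amount of evidence.

```python
from dataclasses import dataclass, field
from datetime import date

# The three relation categories come from the slides; the numeric weights
# are illustrative assumptions, not values from the source.
SOURCE_WEIGHT = {
    "factual": 1.0,        # types a1/a2: database facts, community annotations
    "co-occurrence": 0.5,  # types b1/b2: same sentence or same abstract
    "associative": 0.25,   # types c1-c3: profile match, similarity, co-expression
}


@dataclass
class Relation:
    target_uri: str
    category: str
    # Each evidence item is a (reference, date) pair, so the link back to the
    # original text or database entry and its time stamp are both kept.
    evidence: list = field(default_factory=list)

    def strength(self) -> float:
        # Assumed formula: source weight scaled by a saturating function
        # of the number of evidence items.
        n = len(self.evidence)
        return SOURCE_WEIGHT[self.category] * n / (n + 1)


@dataclass
class Knowlet:
    uri: str                # unique identifier of the central concept
    semantic_classes: set   # e.g. {"disease"}
    relations: list = field(default_factory=list)

    def add_evidence(self, target_uri, category, reference, when):
        """Attach evidence to an existing relation or create a new one."""
        for rel in self.relations:
            if rel.target_uri == target_uri and rel.category == category:
                rel.evidence.append((reference, when))
                return rel
        rel = Relation(target_uri, category, [(reference, when)])
        self.relations.append(rel)
        return rel

    def strongest(self, k=3):
        """Return the k relations with the highest strength."""
        return sorted(self.relations, key=lambda r: r.strength(), reverse=True)[:k]


# Usage with placeholder identifiers and references.
malaria = Knowlet("urn:example:malaria", {"disease"})
malaria.add_evidence("urn:example:chloroquine", "factual", "db-entry-1", date(2007, 2, 15))
malaria.add_evidence("urn:example:mosquitoes", "co-occurrence", "abstract-1", date(2007, 2, 15))
malaria.add_evidence("urn:example:mosquitoes", "co-occurrence", "abstract-2", date(2007, 2, 16))
for rel in malaria.strongest():
    print(rel.target_uri, rel.category, round(rel.strength(), 3))
```

Keeping every evidence item with its reference and date, rather than a single precomputed score, means strengths can be recomputed whenever the combination rules change.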
Building an association matrix of large data sources
[Figure: association matrix of 1 million by 1 million concepts; example objects: person, organisation, gene, disease, drug]

[Figure: protein function categories: Function unknown, Chaperones, Chromatin structure, Fibrous proteins, mRNA metabolism, Others, Ribosomal proteins, Ribosome biogenesis, Translation; labelled proteins include PARN and SRP]
• Assignment of protein function and discovery of new nucleolar proteins based on automatic analysis of MEDLINE. Martijn Schuemie, Christine Chichester, Frederique Lisaceck, Yohann Coute, Peter-Jan Roes, Jean Charles Sanchez, Barend Mons. Special issue on Systems Biology in Proteomics, 2008 (accepted for publication)

Kappa-based clustering based on Gene ID
Cluster studies on the basis of Homologene IDs
Cluster 1: Mdx mice; Dysferlin-deficient mice
Cluster 2: myositis
Cluster 3: DMD
Cluster 4: EOM-specific genes in mdx
Cluster 5: Development of EOM muscle and rat atrophy
GeneSet Clusterer, Rob Jelier, Erasmus MC

Clustering of genes based on similarity of concept profiles
Cluster 1: atrophy and myopathy
Cluster 2: extraocular muscle of mdx
Cluster 3: human and mouse muscular dystrophies and myositis
Cluster 4: long gene lists
Cluster 5: muscle differentiation; Ky-mutant and Fxr-/- mice
Cluster 6: ageing and sarcopenia
GeneSet Clusterer, Rob Jelier, Erasmus MC

Evaluate biological processes that bring studies together
No overlap on GeneID level
Annotate
Many associations on concept profile level
DatasetComparer, Rob Jelier, Erasmus MC

• OmegaWiki (terminology system)
• Wiki Authors
• Wiki Medical/Clinical
• Wiki Proteins
• Wiki Chemicals
• Wiki Etc.

Allow for:
• Community annotation
• Quick growth of terminology systems
• Semantic linking between concepts

[Workflow diagram, labels in source order: Association Matrix; Literature; Meta-analysis; Knowlet; Expert Challenge; Protein A; Update; Expert comments; U.W. Fingerprint; WikiZ/P; Peer to Peer Review; Final Approval; Central Annotation; Proposals to Databases?; Discussion; Voting in Wiki; Solid (a), Liquid (b), Gas (c), with values 0.1, 0.9, 0.4; Reduction of False Positives; Meta-analysis; Proximity measures; 1st order Semantic enrichment; New publications or annotations]

Science Wikis
• REGISTRATION (1X)
• Unique Author ID
• E-mail address
• PHP/userpage
• People Knowlets
• Unique concept ID
• Language variants
• Homonyms
• Definitions (brief)
• Object Knowlets
• UID from WiktionaryZ
• Research information
• Talk-page
• Liquid Threads
• Object Knowlets
• UID from WiktionaryZ
• Articles about UIDs
• Encyclopaedic/NPOV
• Anonymous allowed

Nature News, February 15, 2007

[Figure: Knowlet for the core concept Malaria (mean distance 5); related concepts: chloroquine, primaquine, New Drug ????, para-amino-benzoic acid, Cellular Membrane (GO), Mosquitoes]