Discovering Disease Associations using a Biomedical Semantic Web: Integration and Ranking

Ranga Chandra Gudivada1,2, Xiaoyan A. Qu1,2, Anil G Jegga2,3,4, Eric K. Neumann5, Bruce J Aronow1,2,3,4

Departments of Biomedical Engineering1 and Pediatrics2, University of Cincinnati, Center for Computational Medicine3 and Division of Biomedical Informatics4, Cincinnati Children's Hospital Medical Center, Cincinnati OH-45229, USA and Teranode Corporation5, Seattle, WA 98104

Abstract

One of the principal goals of biomedical research is to elucidate the complex network of gene interactions underlying common human diseases. Although integrative genomics based approaches have been shown to be successful in understanding the underlying pathways and biological processes in normal and disease states, most current biomedical knowledge is spread across different databases in different formats. Semantic Web principles, standards and technologies provide an ideal platform to integrate such heterogeneous information and to bring forth implicit relations hitherto embedded in these large integrated biomedical and genomic datasets. Semantic Web query languages such as SPARQL can be used effectively to mine the biological entities underlying complex diseases through richer and more complex queries on this integrated data. However, the end results are frequently large and unmanageable. Thus, there is a great need for techniques to rank resources on the Semantic Web, which can later be used to retrieve and rank the results and prevent information overload. Such ranking can be used to prioritize the discovered novel disease-gene, disease-pathway or disease-process relationships. We implemented an existing Semantic Web based knowledge mining technique which not only discovers the underlying genes, processes and pathways of diseases but also determines the importance of the resources, so that the results of a search can be ranked while the semantic associations are determined.

Biological Problem

* Disease genes discovered to date likely represent the easy ones. Discovering the genetic basis of the remaining Mendelian and complex gene-X-gene-X-environment disorders will be challenging and will require consideration of many more features and causal relationships.
* Identifying modifier genes, i.e. the gene networks underlying diseases (pathways, biological processes and functions), is challenging.
* No gene operates in a vacuum; all gene, protein and pathway interactions can lead to modifier gene effects.

Computational Problem

* Data complexity poses a formidable challenge to efforts to integrate, formally model, and simulate biological systems behaviors.
* Data integration: biological feature complexity is deep, heterogeneous, and extensive.
* Likelihood ranking requires mining and prioritization of entities and events that function in the context of biological networks.

Benefits of Semantic Web

* Semantic Web standards such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) facilitate semantic integration of heterogeneous multi-source data.
* SPARQL, a Semantic Web query language capable of querying higher-order relationships in multi-dimensional data, can be used to mine Bio-RDF graphs.
* Prioritization of biological entities on the Semantic Web can be accomplished by extending [2] and applying existing graph algorithms, such as the Kleinberg algorithm [1].

Data Integration - RDF Model

[Figure: RDF data model. Node labels: Disease (Disease Name, Disease CUI; OMIM); Gene / Protein Annotations (Gene Symbol; Entrez Gene, SwissProt); Gene Ontology terms for Biological Process, Molecular Function and Cellular Component (GO ID, Description); Pathways (Pathway Id, Pathway Description; BioCarta, KEGG, BioCyc, Reactome, Nature Pathway Interaction Database); Molecular Interactions (BIND); Mouse Phenotype (Mammalian Phenotype; Mouse Phenotype ID, Description); Anatomy (Anatomy CUI, Anatomy Name); others. Edge labels include hasAssociated Gene and hasAssociated Anatomy.]

Ranking on Semantic Web

Kleinberg algorithm [1]
* Hub node: a node that points to many authoritative nodes; each such link increases its hub score (high hub score).
* Authoritative node: a node pointed to by good hubs; each such link increases its authoritative score (high authoritative score).

Extending the Kleinberg algorithm [2] for the Semantic Web
* Every property carries a subjectivity weight (applied on the subject side) and an objectivity weight (applied on the object side).
* Non-symmetric property, e.g. gene --associatedPathway--> Pathway: subjectivity weight > objectivity weight. A single gene participating in multiple biological pathways is considered more sensitive to perturbation than a single pathway having a large number of nodes (different weights for non-symmetric properties).
* Symmetric property, e.g. geneA --interacts--> geneB: subjectivity weight = objectivity weight. GeneA interacting with various genes has the same significance as GeneB interacting with various genes (equal weights for symmetric properties).

Case Study - Prioritizing Modifier Genes, Pathways and Biological Processes for CARDIOMYOPATHY, DILATED

Step 1
  Disease (OMIM rdfs:label): CARDIOMYOPATHY, DILATED, X-LINKED
  Primary genes (1): DMD
  Interacting partners (16)
  Biological processes (4):
    GO_0006936  muscle contraction
    GO_0007016  cytoskeletal anchoring
    GO_0043043  peptide biosynthesis
    GO_0007517  muscle development
  Pathways (1):
    h_agrPathway  Agrin in Postsynaptic Differentiation

Step 2
  Primary genes + interacting partners (1+16)
  Biological processes (27)
  Pathways (28)
  Modifier genes (16)

SPARQL QUERY

  PREFIX CCHMC:<http://www.cchmc.com/test.owl#>
  PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  SELECT DISTINCT ?pathway
  where {
    ?pathway rdf:type CCHMC:Pathway .
    ?resource ?PROPERTY ?pathway .
  }

QUERY RESULT WITH PRIORITIZATION

Modifier Genes (16)

  Rank  GeneSymbol  Score
  1     UTRN        21.89344952
  2     FASLG       17.42028994
  3     ACTA1       12.36025539
  4     DTNA        8.888475658
  5     DAG1        5.893112758
  6     KCNJ12      4.838225059
  7     SNTA1       4.623228312

  Pubmed Evidence (PMIDs as listed): 12868498, 10423348, 11186993, 16168288, 16080838, 16945537, 10508519, 16644324, 16427346, 15117830, 14564412; one entry is labelled "Novel Gene" (16427346).

Pathways (28)

  Rank  Pathway Id           Pathway Description                                                     Score
  1     h_agrPathway         Agrin in Postsynaptic Differentiation                                   1.134984242
  2     h_hsp27Pathway       Stress Induction of HSP Regulation                                      0.139887918
  3     h_actinYPathway      Y branching of actin filaments                                          0.093908976
  3     h_no1Pathway         Actions of Nitric Oxide in the Heart                                    0.093908976
  3     h_nfatPathway        NFAT and Hypertrophy of the heart (Transcription in the broken heart)   0.093908976
  3     h_metPathway         Signaling of Hepatocyte Growth Factor Receptor                          0.093908976
  3     h_salmonellaPathway  How does salmonella hijack a cell                                       0.093908976
  3     h_mCalpainPathway    mCalpain and friends in Cell motility                                   0.093908976
  3     h_PDZsPathway        Synaptic Proteins at the Synaptic Junction                              0.093908976
  3     h_rabPathway         Rab GTPases Mark Targets In The Endocytotic Machinery                   0.093908976

Biological Processes (27)

  Rank  GO ID       Description                        Score
  1     GO_0006936  muscle contraction                 1.5385859
  2     GO_0007517  muscle development                 0.3562762
  3     GO_0007165  signal transduction                0.1139403
  4     GO_0048741  skeletal muscle fiber development  0.1102909
  4     GO_0030240  muscle thin filament assembly      0.1102909
  4     GO_0043043  peptide biosynthesis               0.1027902
  4     GO_0007016  cytoskeletal anchoring             0.1027902

Conclusion

We have shown that related yet heterogeneous information can be integrated using RDF-OWL and that this approach can support mechanistic analyses of diseases. Specifically, we have uncovered additional genes and pathways that could play a role in the onset and treatment of cardiomyopathy. We intend to expand our analyses into additional modalities such as anatomy, cellular type, and symptoms/phenotypes.

References

1. Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5 (Sep. 1999).
2. Bamba, B., Mukherjea, S. Utilizing Resource Importance for Ranking Semantic Web Query Results. SWDB 2004: 185-198.
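
Illustrative sketch (not part of the original poster)

The hub and authority scheme described under "Ranking on Semantic Web" can be illustrated with a small Python sketch. It assumes a simple weighted variant of Kleinberg's iteration in which each property scales the authority update by its objectivity weight and the hub update by its subjectivity weight. This update rule, the function name weighted_hits, the toy triples and the weight values are all assumptions made for illustration; they are not the exact formulation of [2] nor the data used in the case study.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def weighted_hits(triples, subj_w, obj_w, iterations=50):
    """Kleinberg-style hub/authority iteration over RDF-like triples.

    triples: iterable of (subject, property, object)
    subj_w:  property -> subjectivity weight (assumed to scale hub updates)
    obj_w:   property -> objectivity weight (assumed to scale authority updates)
    """
    triples = list(triples)
    nodes = {n for s, _, o in triples for n in (s, o)}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of subjects pointing at the node.
        new_auth = {n: 0.0 for n in nodes}
        for s, p, o in triples:
            new_auth[o] += obj_w[p] * hub[s]
        # Hub: sum of authority scores of the objects the node points at.
        new_hub = {n: 0.0 for n in nodes}
        for s, p, o in triples:
            new_hub[s] += subj_w[p] * new_auth[o]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: placeholder names, not data from the poster.
TRIPLES = [
    ("geneA", "associatedPathway", "pathway1"),
    ("geneA", "associatedPathway", "pathway2"),
    ("geneB", "associatedPathway", "pathway1"),
    ("geneA", "interacts", "geneB"),
    ("geneB", "interacts", "geneA"),
]
# Non-symmetric property: subjectivity weight > objectivity weight.
# Symmetric property: equal weights. The values themselves are arbitrary.
SUBJ_W = {"associatedPathway": 1.0, "interacts": 0.5}
OBJ_W = {"associatedPathway": 0.5, "interacts": 0.5}

hub, auth = weighted_hits(TRIPLES, SUBJ_W, OBJ_W)
ranked_genes = sorted(("geneA", "geneB"), key=hub.get, reverse=True)
print(ranked_genes)
```

In this toy graph geneA participates in two pathways and geneB in one, so geneA receives the higher hub score, and pathway1, which is pointed to by both genes, receives a higher authority score than pathway2. Sorting SPARQL result bindings by such scores is one way the prioritized result lists could be produced.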