BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
Pavan Kumar A
Big Data Analytics Team
C-DAC KP, Bengaluru

Outline
- Pharmacogenomics
- Biological Data Repositories
- Graph Databases (What and Why)
- Big Data Platform for Pharmacogenomics Databases
  - Neo4j & Pharmacogenomics Graph Database
  - MapReduce: BLAST
  - Web Application for querying and visualization

Pharmacogenomics
Pharmacogenomics = Pharma + Gene + Omics
Drug therapy consists of three major processes:
- Pharmacokinetic process
- Pharmacodynamic process
- Therapeutic process

Pharmacogenomics
Pharmacogenomics led us to Personalized Medicine
What and Why Personalized Medicine?

Pharmacogenomics: ADR
Errors in Health Care
Some facts about ADRs (adverse drug reactions):
- 1.6–41.4 % of patients undergoing therapy are prone to ADRs
- $17–29 billion is spent annually on preventable ADRs
- In the US, ADRs are responsible for ~100,000 deaths annually

Pharmacogenomics: ADR
Factors for ADRs
I. Genetic Factors
   - Pharmacokinetics
   - Pharmacodynamics
   - SNPs (Single Nucleotide Polymorphisms)
II. Environmental Factors
   - Tobacco, alcohol, pollution, diet habits and so on
III. Physiological Factors
   - Age, gender, disease state, pregnancy, starvation, microbial composition and so on

Pharmacogenomics: Pharmacokinetics
Pharmacokinetics: what the body does to the drug.
This is captured by the movement of the drug into the body, through the body and out of the body, referred to as ABSORPTION, BIOAVAILABILITY, DISTRIBUTION, METABOLISM and EXCRETION.

Pharmacogenomics: Pharmacodynamics
Pharmacodynamics: what the drug does to the body.
This is captured by actions such as:
- Receptor binding
- Post-receptor effects
- Chemical interactions

Pharmacogenomics: SNPs
SNPs: Single Nucleotide Polymorphisms
The most common type of genetic variation among people

Pharmacogenomics: SNPs
Diseases caused by SNPs
- Autoimmune Diseases
- Neuropsychological
- Genetic Diseases
- Cancers
- Neurodegenerative Disorders
- Cardiovascular Diseases
- Digestive Disorders
- Addiction Dependence
- Female-Specific Diseases

Pharmacogenomics: Microbial Composition
Microbial Composition
Microbes in our body make up to 100 trillion cells (10-fold the number of human cells)
Image source: http://www.freegrab.net/Immune Digestive System Connection.htm

Pharmacogenomics
Finally… Pharmacogenomics
- SNPs
- Protein structural variations
- Microbial composition
- Metabolomics
- Environmental factors: chemicals, diet, tobacco, alcohol etc.
- Gene expression
- Physiological factors: age, gender, disease state, pregnancy, circadian rhythm, starvation

How We Study?
- Data related to different domains are stored as Open Data Repositories
- Download the data (data formats: XML, CSV or Excel)
- Query a database via a web application

Biological Data Repositories
The following are some of the pharmacogenomics databases:
- PharmGKB - Pharmacogenomics Knowledge Base
- DrugBank - chemical, pharmaceutical and pharmacological data
- IGVdb - Indian Genome Variation Database
- CTD - Comparative Toxicogenomics Database
- STITCH (Search Tool for Interactive Chemicals) - Chemical Protein Interaction Networks
- TTD - Therapeutic Target Database
- KEGG (Kyoto Encyclopaedia of Genes and Genomes)

Integration of Biological Data Repositories
Data is spread across many repositories.
The user has to navigate many pages on the web or across many websites.
So there is a need to integrate all the data to get consolidated information in one place.

Interlinked Biological Data
- Databases
- Consortiums
- Tools
- Information from articles and literature
Pasha and Scaria et al. 2013, Omics for personalized medicine

Integrating Databases
Integrating many databases based on Internationalized Resource Identifiers (IRI)
Sample for SCN5A (Sodium channel protein type 5 subunit alpha)

Database | Gene  | Organism | Len  | Interacting Chemical | Disease/Disorder    | Pathway/s             | Chrom_Start | Chrom_End
Uniprot  | SCN5A | Human    | 2016 |                      |                     |                       |             |
CTD      | SCN5A |          |      | sodium arsenite      | Atrial Fibrillation | Developmental Biology |             |
PharmGKB | SCN5A |          |      |                      |                     |                       | 38564558    | 38666167

Integrated record:

Database | Gene  | Organism | Len  | Interacting Chemical | Disease/Disorder    | Pathway/s             | Chrom_Start | Chrom_End
My_DB    | SCN5A | Human    | 2016 | sodium arsenite      | Atrial Fibrillation | Developmental Biology | 38564558    | 38666167

NoSQL database family

Graph Databases
Graph databases belong to the NoSQL database family.
Pictorial representation of data in the form of nodes and edges (with or without properties)
Image Source: https://www.3pillarglobal.com/insights/exploring-thedifferent-types-of-nosql-databases

Why Graph Databases?
Graph databases are well suited for interconnected data.
Some of the use cases of graph databases:
- Fraud Detection
- Graph-Based Search
- Network and IT Operations
- Real-Time Recommendation Engines
- Social Network
- Identity and Access Management
Ref: https://neo4j.com/use-cases/

Graph Databases: Properties
Two important properties of graph database technologies:
- Native Graph Storage
  - Some serialize to RDBMS
- Native Graph Processing (a.k.a. "index-free adjacency")
  - Connected nodes physically "point" to each other

Graph Database: Neo4j
Most biological data is interconnected, so graph databases are well suited to it.
- World's leading graph database: open source, with a welcoming UI
- Native graph storage with a native GPE (Graph Processing Engine)
- Easy to represent connected data
- Faster retrieval, traversal and navigation of highly connected data
- Represents semi-structured data

Graph Database: Neo4j
In Neo4j, the Cypher Query Language (CQL) is used to create nodes, labels, edges and properties.
Example: [figure]

Pharmacogenomics Graph Database

BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
Tools and Technologies: [figure]

BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
BDPGx has
- 4 107 474 Nodes
- 3 994 226 Properties
- 46 840 614 Relationships
- 15 Relationship types

Conclusion
- Biological data is generated from various sources and is available in different formats
- Finding correlations among the available data can give better insights
- BDPGx gives the researcher user-friendly access to the most appropriate information

THANK YOU
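
Supplementary sketch: Integrating Databases
The integration step from the "Integrating Databases" slide can be illustrated with a minimal Python sketch. It is not the BDPGx implementation. It assumes each source database yields a partial record for a gene and that the records can be joined on the gene symbol, which stands in here for the IRI-based linking mentioned on the slide. The names `integrate`, `SOURCES` and `COLUMNS` are hypothetical, and the "first non-empty value wins" rule is an assumption made for illustration.

```python
from typing import Dict, List

# Columns of the integrated table, in the order shown on the slide.
COLUMNS = [
    "Gene", "Organism", "Len", "Interacting Chemical",
    "Disease/Disorder", "Pathway/s", "Chrom_Start", "Chrom_End",
]

Record = Dict[str, str]


def integrate(records: List[Record], key: str = "Gene",
              target: str = "My_DB") -> List[Record]:
    """Merge partial records that share the same key into one row each.

    Assumption: the first non-empty value seen for a column is kept;
    conflicting values from later sources are ignored.
    """
    merged: Dict[str, Record] = {}
    for rec in records:
        # One output row per distinct key value (here: gene symbol).
        row = merged.setdefault(rec[key], {"Database": target, key: rec[key]})
        for col in COLUMNS:
            value = rec.get(col)
            if value and col not in row:
                row[col] = value
    return list(merged.values())


# Partial records for SCN5A, one per source database (values from the slide).
SOURCES: List[Record] = [
    {"Database": "Uniprot", "Gene": "SCN5A", "Organism": "Human", "Len": "2016"},
    {"Database": "CTD", "Gene": "SCN5A",
     "Interacting Chemical": "sodium arsenite",
     "Disease/Disorder": "Atrial Fibrillation",
     "Pathway/s": "Developmental Biology"},
    {"Database": "PharmGKB", "Gene": "SCN5A",
     "Chrom_Start": "38564558", "Chrom_End": "38666167"},
]

integrated = integrate(SOURCES)

if __name__ == "__main__":
    for row in integrated:
        print(row)
```

In a graph database the same merged record would typically become a gene node with edges to chemical, disease and pathway nodes; the flat dictionary is used here only to mirror the table on the slide.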