BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
Pavan Kumar A
Big Data Analytics Team
C-DAC KP, Bengaluru

Outline
• Pharmacogenomics
• Biological Data Repositories
• Graph Databases (What and Why)
• Big Data Platform for Pharmacogenomics Databases
  – Neo4j & Pharmacogenomics Graph Database
  – MapReduce: BLAST
  – Web application for querying and visualization

Pharmacogenomics
• Pharmacogenomics = Pharma + Gene + Omics
• Drug therapy consists of three major processes:
  – Pharmacokinetic process
  – Pharmacodynamic process
  – Therapeutic process

Pharmacogenomics
• Pharmacogenomics leads us to personalized medicine
• Personalized medicine: what and why?

Pharmacogenomics: ADR
• Errors in health care
• Some facts about ADRs (adverse drug reactions):
  – 1.6–41.4% of patients undergoing therapy are prone to ADRs
  – $17–29 billion is spent annually on preventable ADRs
  – In the US, ADRs are responsible for ~100,000 deaths annually

Pharmacogenomics: ADR
• Factors behind ADRs:
  I. Genetic factors
     – Pharmacokinetics
     – Pharmacodynamics
     – SNPs (single nucleotide polymorphisms)
  II. Environmental factors
     – Tobacco, alcohol, pollution, dietary habits and so on
  III. Physiological factors
     – Age, gender, disease state, pregnancy, starvation, microbial composition and so on

Pharmacogenomics: Pharmacokinetics
• Pharmacokinetics: what the body does to the drug.
• It is captured by the movement of the drug into, through and out of the body, referred to as absorption, bioavailability, distribution, metabolism and excretion.

Pharmacogenomics: Pharmacodynamics
• Pharmacodynamics: what the drug does to the body.
• This is captured by actions such as:
  – Receptor binding
  – Post-receptor effects
  – Chemical interactions

Pharmacogenomics: SNPs
• SNPs: single nucleotide polymorphisms
• The most common type of genetic variation among people

Pharmacogenomics: SNPs
• Diseases caused by SNPs:
  – Autoimmune diseases
  – Neuropsychological disorders
  – Genetic diseases
  – Cancers
  – Neurodegenerative disorders
  – Cardiovascular diseases
  – Digestive disorders
  – Addiction and dependence
  – Female-specific diseases

Pharmacogenomics: Microbial Composition
• Microbes in our body number up to 100 trillion cells (10-fold the number of human cells)
Image source: http://www.freegrab.net/Immune Digestive System Connection.htm

Pharmacogenomics
• Finally… Pharmacogenomics:
  – SNPs
  – Protein structural variations
  – Microbial composition
  – Metabolomics
  – Environmental factors: chemicals, diet, tobacco, alcohol etc.
  – Gene expression
  – Physiological factors: age, gender, disease state, pregnancy, circadian rhythm, starvation

How We Study?
• Data related to different domains are stored in open data repositories
• Download the data
  – Data format: XML, CSV or Excel
• Query a database via a web application

Biological Data Repositories
• The following are some pharmacogenomics databases:
  – PharmGKB: Pharmacogenomics Knowledge Base
  – DrugBank: chemical, pharmaceutical and pharmacological data
  – IGVdb: Indian Genome Variation Database
  – CTD: Comparative Toxicogenomics Database
  – STITCH (Search Tool for Interactions of Chemicals): chemical-protein interaction networks
  – TTD: Therapeutic Target Database
  – KEGG (Kyoto Encyclopedia of Genes and Genomes)

Integration of Biological Data Repositories
• Data is spread across many repositories.
• Users have to navigate many pages on the web or across many websites.
• So there is a need to integrate all the data to get consolidated information in one place.

Interlinked Biological Data
• Databases
• Consortiums
• Tools
• Information from articles and literature
Pasha and Scaria et al. 2013, Omics for personalized medicine

Integrating Databases
• Integrating many databases based on Internationalized Resource Identifiers (IRIs)
• Sample for SCN5A (sodium channel protein type 5 subunit alpha):

Source records:
Database  Gene   Organism  Len   Interacting Chemical  Disease/Disorder     Pathway/s              Chrom_Start  Chrom_End
Uniprot   SCN5A  Human     2016
CTD       SCN5A                  sodium arsenite       Atrial Fibrillation  Developmental Biology
PharmGKB  SCN5A                                                                                    38564558     38666167

Integrated record:
Database  Gene   Organism  Len   Interacting Chemical  Disease/Disorder     Pathway/s              Chrom_Start  Chrom_End
My_DB     SCN5A  Human     2016  sodium arsenite       Atrial Fibrillation  Developmental Biology  38564558     38666167

NoSQL database family

Graph Databases
• Graph databases belong to the NoSQL database family.
• They represent data pictorially as nodes and edges (with or without properties).
Image source: https://www.3pillarglobal.com/insights/exploring-thedifferent-types-of-nosql-databases

Why Graph Databases?
• Graph databases are well suited for interconnected data.
• Some use cases of graph databases:
  – Fraud detection
  – Graph-based search
  – Network and IT operations
  – Real-time recommendation engines
  – Social networks
  – Identity and access management
Ref: https://neo4j.com/use-cases/

Graph Databases: Properties
• Two important properties of graph database technologies:
  – Native graph storage (some systems instead serialize the graph to an RDBMS)
  – Native graph processing, a.k.a. “index-free adjacency” (connected nodes physically “point” to each other)

Graph Database: Neo4j
• Most biological data is interconnected, so graph databases are well suited to it.
• World’s leading graph database:
  – Open source, with a welcoming UI
  – Native graph storage with a native GPE (Graph Processing Engine)
  – Easy to represent connected data
  – Faster retrieval, traversal and navigation of highly connected data
  – Represents semi-structured data

Graph Database: Neo4j
• In Neo4j, the Cypher Query Language (CQL) is used to create nodes, labels, edges and properties
• Example: [figure]

Pharmacogenomics Graph Database
[Six image-only slides]

BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
• Tools and technologies: [figure]

BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
• BDPGx has:
  – 4 107 474 nodes
  – 3 994 226 properties
  – 46 840 614 relationships
  – 15 relationship types

Conclusion
• Biological data is generated from various sources and is available in different formats
• Finding correlations among the available data can give better insights
• BDPGx
  – User-friendly access that gets the most appropriate information to the researcher

THANK YOU
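
Backup sketch for the "Integrating Databases" slide: the SCN5A sample can be read as merging per-repository records that share a gene identifier into one consolidated record, and then viewing that record as nodes and edges. The Python below is a minimal illustration under stated assumptions, not the BDPGx implementation. The dictionary keys, the assignment of each value to a source repository, the node labels and the relationship names (`INTERACTS_WITH`, `ASSOCIATED_WITH`, `PARTICIPATES_IN`) are all hypothetical; only the values themselves come from the slide.

```python
# Per-repository records for SCN5A. Values come from the slide; the key names
# and the split of values across repositories are assumptions for illustration.
SOURCE_RECORDS = [
    {"database": "Uniprot", "gene": "SCN5A", "organism": "Human", "len": 2016},
    {"database": "CTD", "gene": "SCN5A", "interacting_chemical": "sodium arsenite",
     "disease": "Atrial Fibrillation", "pathway": "Developmental Biology"},
    {"database": "PharmGKB", "gene": "SCN5A",
     "chrom_start": 38564558, "chrom_end": 38666167},
]


def integrate(records, key="gene", target="My_DB"):
    """Merge records that share the same key into one consolidated record each."""
    merged = {}
    for record in records:
        row = merged.setdefault(record[key], {"database": target, key: record[key]})
        for field, value in record.items():
            if field not in ("database", key):
                row.setdefault(field, value)  # first source to supply a field wins
    return merged


def to_graph(row):
    """View one consolidated record as a tiny property graph (nodes and typed edges)."""
    gene = ("Gene", row["gene"])
    gene_props = ("organism", "len", "chrom_start", "chrom_end")
    nodes = {gene: {k: row[k] for k in gene_props if k in row}}
    edges = []
    # Hypothetical labels and relationship types, chosen only for this sketch.
    links = (("interacting_chemical", "Chemical", "INTERACTS_WITH"),
             ("disease", "Disease", "ASSOCIATED_WITH"),
             ("pathway", "Pathway", "PARTICIPATES_IN"))
    for field, label, relationship in links:
        if field in row:
            other = (label, row[field])
            nodes[other] = {}
            edges.append((gene, relationship, other))
    return nodes, edges


consolidated = integrate(SOURCE_RECORDS)
nodes, edges = to_graph(consolidated["SCN5A"])
print(consolidated["SCN5A"])
for source, relationship, target in edges:
    print(f"{source} -[{relationship}]-> {target}")
```

In a graph database such as Neo4j the same shape would be expressed with nodes, labels, relationships and properties rather than Python tuples; the sketch only shows how the consolidated record maps onto that model.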