BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
Pavan Kumar A
Big Data Analytics Team
C-DAC KP, Bengaluru
Outline
• Pharmacogenomics
• Biological Data Repositories
• Graph Databases (What and Why)
• Big Data Platform for Pharmacogenomics Databases
  – Neo4j & Pharmacogenomics Graph Database
  – MapReduce : BLAST
  – Web Application for querying and visualization
Pharmacogenomics
• Pharmacogenomics = Pharma + Gene + Omics
• Drug therapy consists of three major processes
  – Pharmacokinetic process
  – Pharmacodynamic process
  – Therapeutic process
Pharmacogenomics
• Pharmacogenomics leads us to Personalized Medicine
• Personalized Medicine: What and Why?
Pharmacogenomics : ADR
Errors in Health Care
• Some facts about ADRs (adverse drug reactions)
  – 1.6–41.4 % of patients undergoing therapy are prone to ADRs
  – $17–29 billion is spent annually on preventable ADRs
  – In the US, ADRs are responsible for ~100,000 deaths annually
Pharmacogenomics : ADR
• Factors contributing to ADRs
I. Genetic Factors
  – Pharmacokinetics
  – Pharmacodynamics
  – SNPs (Single Nucleotide Polymorphisms)
II. Environmental Factors
  – Tobacco, alcohol, pollution, dietary habits and so on
III. Physiological Factors
  – Age, gender, disease state, pregnancy, starvation, microbial composition and so on
Pharmacogenomics : Pharmacokinetics
• Pharmacokinetics
  – What the body does to the drug.
  – This covers the movement of the drug into, through and out of the body, referred to as ABSORPTION, BIOAVAILABILITY, DISTRIBUTION, METABOLISM and EXCRETION
Pharmacogenomics : Pharmacodynamics
• Pharmacodynamics
  – What the drug does to the body.
  – This is captured by actions like
    – Receptor Binding
    – Post-receptor effects
    – Chemical Interactions
Pharmacogenomics : SNPs
• SNPs
  – Single Nucleotide Polymorphisms
  – Most common type of genetic variation among people
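As a minimal illustration of what a SNP is, the sketch below compares two aligned DNA sequences of equal length and reports every position where a single nucleotide differs. The sequences and the function name find_snps are made up for illustration; they are not real genomic data and do not come from any of the databases discussed in these slides.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are already aligned and have the same length;
    positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical toy sequences that differ at exactly one position.
reference = "ATGCCGTA"
sample = "ATGTCGTA"
snps = find_snps(reference, sample)
print(snps)
```

Real variant calling works on sequencing reads with quality scores and population frequencies; this only shows the single-base difference that the term describes.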
Pharmacogenomics : SNPs
• Diseases caused by SNPs
  – Autoimmune Diseases
  – Neuropsychological
  – Genetic Diseases
  – Cancers
  – Neurodegenerative Disorders
  – Cardiovascular Diseases
  – Digestive Disorders
  – Addiction Dependence
  – Female-Specific Diseases
Pharmacogenomics : SNPs
Pharmacogenomics : Microbial Composition
• Microbial Composition
  – Microbes in our body make up to 100 trillion cells (10-fold the number of human cells)
Image source: http://www.freegrab.net/Immune Digestive System Connection.htm
Pharmacogenomics
• Finally…
• Pharmacogenomics
  – SNPs
  – Protein structural variations
  – Microbial Composition
  – Metabolomics
  – Environmental factors: Chemicals, Diet, Tobacco, Alcohol etc.
  – Gene Expression
  – Physiological factors: Age, Gender, Disease state, Pregnancy, Circadian rhythm, Starvation
How We Study?
• Data related to different domains are stored as Open Data Repositories
  – Download the data
    – Data Format : XML, CSV or Excel
  – Query a database via web application
Biological Data Repositories
• The following are some pharmacogenomics databases
  – PharmGKB - Pharmacogenomics Knowledge Base
  – DrugBank - chemical, pharmaceutical and pharmacological data
  – IGVdb - Indian Genome Variation Database
  – CTD - Comparative Toxicogenomics Database
  – STITCH (Search Tool for Interactions of Chemicals) - Chemical-Protein Interaction Networks
  – TTD - Therapeutic Target Database
  – KEGG (Kyoto Encyclopedia of Genes and Genomes)
Integration of Biological Data Repositories
• Data is spread across many repositories.
• Users have to navigate many pages or many websites to find it.
• So there is a need to integrate all the data to get consolidated information in one place
Interlinked Biological Data
• Databases
• Consortiums
• Tools
• Information from Articles, Literature
Pasha and Scaria et al., 2013, Omics for personalized medicine
Integrating Databases
• Integrating many databases based on Internationalized Resource Identifiers (IRI)
• Sample for SCN5A (Sodium channel protein type 5 subunit alpha)

Per-database records:

Database  Gene   Organism  Len   Interacting Chemical  Disease/Disorder     Pathway/s              Chrom_Start  Chrom_End
Uniprot   SCN5A  Human     2016  -                     -                    -                      -            -
CTD       SCN5A  -         -     sodium arsenite       Atrial Fibrillation  Developmental Biology  -            -
PharmGKB  SCN5A  -         -     -                     -                    -                      38564558     38666167

Integrated record:

Database  Gene   Organism  Len   Interacting Chemical  Disease/Disorder     Pathway/s              Chrom_Start  Chrom_End
My_DB     SCN5A  Human     2016  sodium arsenite       Atrial Fibrillation  Developmental Biology  38564558     38666167
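The integration step for the SCN5A sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene symbol stands in for the shared IRI, the assignment of fields to Uniprot, CTD and PharmGKB is one reading of the sample table, and the function name integrate is hypothetical rather than part of BDPGx.

```python
def integrate(records: list[dict], key: str = "Gene") -> dict[str, dict]:
    """Merge per-database records that share the same key into one record each."""
    merged: dict[str, dict] = {}
    for record in records:
        target = merged.setdefault(record[key], {})
        for field, value in record.items():
            if field == "Database":
                continue  # the source-database column is dropped in the merged view
            target.setdefault(field, value)  # first source to supply a field wins
    return merged


# Records as read from the sample table; the gene symbol is used here
# as a stand-in for the shared identifier (an assumption of this sketch).
records = [
    {"Database": "Uniprot", "Gene": "SCN5A", "Organism": "Human", "Len": 2016},
    {"Database": "CTD", "Gene": "SCN5A",
     "Interacting Chemical": "sodium arsenite",
     "Disease/Disorder": "Atrial Fibrillation",
     "Pathway/s": "Developmental Biology"},
    {"Database": "PharmGKB", "Gene": "SCN5A",
     "Chrom_Start": 38564558, "Chrom_End": 38666167},
]

my_db = integrate(records)
print(my_db["SCN5A"])
```

Keeping the first value seen for each field is a deliberately simple conflict rule; a real integration would need to decide how to reconcile sources that disagree.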
NoSQL database family
Graph Databases
• Graph databases belong to the NoSQL family of databases.
• Data is represented pictorially as nodes and edges (with or without properties)
Image Source : https://www.3pillarglobal.com/insights/exploring-thedifferent-types-of-nosql-databases
Why Graph Databases?
• Graph databases are well suited for interconnected data.
• Some use cases of graph databases
  – Fraud Detection
  – Graph-Based Search
  – Network and IT Operations
  – Real-Time Recommendation Engines
  – Social Networks
  – Identity and Access Management
• Ref : https://neo4j.com/use-cases/
Graph Databases : Properties
• Two important properties of graph database technologies
  – Native Graph Storage
    – Some graph databases instead serialize the graph to an RDBMS
  – Native Graph Processing (a.k.a. “index-free adjacency”)
    – Connected nodes physically “point” to each other
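The idea of connected nodes "pointing" to each other can be pictured with a small in-memory analogy. In this sketch each node keeps direct references to its neighbours, so a traversal follows references instead of looking nodes up in a global index. The class, function and relationship-type names are hypothetical, and this is not how Neo4j lays out data on disk.

```python
class Node:
    """A graph node that stores direct references to its neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Each entry is (relationship type, target node).
        self.neighbours: list[tuple[str, "Node"]] = []

    def connect(self, rel_type: str, other: "Node") -> None:
        self.neighbours.append((rel_type, other))


def reachable(start: Node, max_hops: int) -> set[str]:
    """Names of nodes reachable within max_hops by following references only."""
    seen = {start.name}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for _rel_type, target in node.neighbours:
                if target.name not in seen:
                    seen.add(target.name)
                    next_frontier.append(target)
        frontier = next_frontier
    seen.discard(start.name)
    return seen


# Toy graph built from the SCN5A sample; relationship types are made up.
chemical = Node("sodium arsenite")
gene = Node("SCN5A")
disease = Node("Atrial Fibrillation")
chemical.connect("INTERACTS_WITH", gene)
gene.connect("ASSOCIATED_WITH", disease)

print(sorted(reachable(chemical, 1)))
print(sorted(reachable(chemical, 2)))
```

Each hop costs work proportional to the number of neighbours visited, not to the total size of the graph, which is the property the slide attributes to native graph processing.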
Graph Databases
Graph Database : Neo4j
• Most biological data is interconnected, so graph databases are well suited to it.
• World’s Leading Graph Database :
  – Open Source and Welcoming UI
  – Native graph storage with native GPE (Graph Processing Engine)
  – Easy to represent connected data
  – Faster retrieval, traversal and navigation of highly connected data
  – Represents semi-structured data
Graph Database : Neo4j
• In Neo4j, the Cypher Query Language (CQL) is used to create nodes, labels, edges and properties
• Example:
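A minimal sketch of what such Cypher statements could look like, using the SCN5A sample. The statements are kept as plain Python strings so the sketch runs without a Neo4j server. The labels Gene and Chemical, the property names, the INTERACTS_WITH relationship type and the helper build_statements are all hypothetical and are not taken from the BDPGx schema.

```python
# Cypher statements with $parameters; labels, properties and the
# relationship type are illustrative assumptions, not the BDPGx schema.
MERGE_GENE = (
    "MERGE (g:Gene {symbol: $symbol}) "
    "SET g.organism = $organism, g.length = $length"
)
MERGE_CHEMICAL = "MERGE (c:Chemical {name: $name})"
MERGE_INTERACTION = (
    "MATCH (g:Gene {symbol: $symbol}), (c:Chemical {name: $name}) "
    "MERGE (c)-[:INTERACTS_WITH]->(g)"
)
FIND_CHEMICALS = (
    "MATCH (c:Chemical)-[:INTERACTS_WITH]->(g:Gene {symbol: $symbol}) "
    "RETURN c.name AS chemical"
)


def build_statements(symbol: str, organism: str, length: int, chemical: str):
    """Pair each write statement with the parameters it expects."""
    return [
        (MERGE_GENE, {"symbol": symbol, "organism": organism, "length": length}),
        (MERGE_CHEMICAL, {"name": chemical}),
        (MERGE_INTERACTION, {"symbol": symbol, "name": chemical}),
    ]


statements = build_statements("SCN5A", "Human", 2016, "sodium arsenite")
for query, params in statements:
    # With the official neo4j Python driver, each pair could be passed to
    # session.run(query, params); that step is left out of this sketch.
    print(query, params)
```

MERGE is used instead of CREATE so that re-running the statements does not duplicate nodes; FIND_CHEMICALS shows the kind of read query that would then list the chemicals linked to a gene.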
Pharmacogenomics Graph Database
BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
• Tools and Technologies:
BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data
• BDPGx has
  – 4,107,474 Nodes
  – 3,994,226 Properties
  – 46,840,614 Relationships
  – 15 Relationship types
Conclusion
• Biological data is generated from various sources and is available in different formats
• Finding correlations among the available data can give better insights
• BDPGx
  – Gives researchers user-friendly access to the most appropriate information
THANK YOU