Quantitative Structure-Transcription-Activity Relationships (QSTAR)
Günter Klambauer
Institute of Bioinformatics, Johannes Kepler University, Linz, Austria

The QSTAR project
● Sub-projects for specific drug targets
  ● PDE10: phosphodiesterase inhibitors
  ● Macrocycles – EGFR inhibitors
  ● MTP – microsomal triglyceride transfer protein
  ● ROS1
  ● mGLU-R2 PAM: schizophrenia
  ● FGFR
● ~1600 microarrays, ~750 compounds profiled
● Analysis with machine learning methods

PDE10 inhibitors project

Macrocycles / EGFR inhibitors project
● Identified a fingerprint feature in the MCX/EGFR data set that explains inactivity
[Figure: two chemical structures labeled 62513 and 89668]

EGFR: FABIA for identification of compound-induced transcriptional modules
● Transcriptional module containing genes related to the MAPK/ERK pathway and cell cycle
● Transcriptional module containing mitochondrial genes → potential adverse side effect

MTP: microsomal triglyceride transfer protein

FABIA for identification of compound-induced transcriptional modules
● Module with genes encoding proteins of the SREBP pathway
[Figure panels: HepG2, LNCaP]

ROS1: Selection of scaffold with low promiscuity

mGLU-R2 PAM: Detection of transcriptional side-effect
● CHAC1 stress effect in the mGLU-R2 PAM project and in the Connectivity Map (CMap)

FGFR: Biclustering for identification of compound-induced transcriptional modules
● Transcriptional module with genes encoding MAPK/ERK inhibiting proteins
● One transcriptionally inactive compound

Methods
● Detection of differential
expression, sparse signals
  ● FARMS, Laplace-FARMS (microarrays)
  ● DEXUS (RNA-Seq)
● Biclustering
  ● FABIA
● Molecule kernels, similarity measures for molecules

Gene expression data matrix
● Rows: genes, transcripts, probes
● Columns: samples
● Entry: expression value (intensity, read count)
● Symbol: X

FARMS & metaFARMS: The model
● The gene expression data x of the genes in the module is explained by $x = l z + e$
● l measures how much a gene contributes to the module
● z measures how much the gene module is expressed in the sample
● e is independent noise
● Difference to FARMS: metaFARMS does not correct the covariance matrix and allows negative loading values

Laplace FARMS for detection of sparse signals in gene expression data
● Extension of the FARMS algorithm with a Laplacian prior
● Variational approach and exact computation
● Detection of rare side-effects of compounds

DEXUS: Detection of differential expression in RNA-Seq data
● Statistical model for RNA-Seq read counts
● Detection of differential expression
● Negative binomial distribution for read counts
● Previous methods: supervised setting
  ● Case-control studies
  ● Multiple groups
  ● Replicates
● DEXUS: unsupervised setting
  ● All study designs
  ● Unknown groups
  ● No replicates

The DEXUS Model
● m_i is the mean of condition i
● r_i controls the variance of condition i (small r → high variance)
● The probability of observing condition i is a_i (mixture weights)
● MAP parameters are estimated using EM

Distribution of Read Counts Per Transcript
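
The DEXUS model slides describe the read counts of one transcript as a mixture of negative binomials, one component per condition, with mixture weights a_i, means m_i and size parameters r_i. The sketch below shows what an E-step could look like under that description: it computes, for a single read count, the posterior probability of each condition. It assumes the standard mean/size parameterization of the negative binomial (variance m + m²/r, which matches "small r → high variance"). The function names and the example numbers are hypothetical and are not taken from the DEXUS package.

```python
import math


def nb_log_pmf(x, mean, size):
    """Log-probability of count x under a negative binomial with given mean and size r."""
    return (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
            + size * math.log(size / (size + mean))
            + x * math.log(mean / (size + mean)))


def nb_variance(mean, size):
    """Variance in the mean/size parameterization: small size means high variance."""
    return mean + mean ** 2 / size


def responsibilities(x, weights, means, sizes):
    """E-step for one read count: posterior probability of each condition."""
    log_terms = [math.log(a) + nb_log_pmf(x, m, r)
                 for a, m, r in zip(weights, means, sizes)]
    top = max(log_terms)                       # subtract the maximum for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical transcript with two conditions (illustrative values only).
weights = [0.7, 0.3]     # a_i
means = [20.0, 200.0]    # m_i
sizes = [10.0, 10.0]     # r_i

post_low = responsibilities(18, weights, means, sizes)
post_high = responsibilities(190, weights, means, sizes)
print(post_low, post_high)
```

A count near 20 is assigned almost entirely to the first condition and a count near 200 to the second. A transcript whose samples split across components in this way is the kind of evidence for multiple conditions that the later slides use to call differential expression.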
Mixture Model of Read Counts

Unknown Conditions
● Differential expression (conditions known): M-step
  ● more than one condition is present
  ● conditions are different with respect to mean read counts
● For unknown conditions: E-step
  ● must detect conditions first
  ● multiple conditions can only be detected if the transcript is DE
→ Simultaneous detection of DE and conditions!

Prior distribution
● Prior on the size parameter: exponential distribution
● Prior on the condition probabilities: Dirichlet distribution

MAP and ML Estimators for the Size Parameter
[Figure: r drawn from a Gaussian (1, 0.1); m = 20; n = 5; h = 0.8]

Update rules derived from the EM algorithm
[Equations: E-step; M-step with updates for m, r and a]

Determining Differential Expression in DEXUS
[Figure panels: Strong Signal, Weak Signal, Weak Signal]

Calling Differential Expression
● I/NI (informative/non-informative) call
  ● Evidence for multiple components (low FDR due to Dirichlet prior)
  ● Evidence for different means

Real world data: Pickrell et al.
● Genes on the Y chromosome
● Genes with eQTLs (known from other studies) with high MAF
● High ranks: genes on the X chromosome
● High ranks: genes with CNVs as eQTLs

Real world data: maize plant leaves
● Different locations of the maize leaf:
  ● -1 cm from base
  ● base
  ● +4 cm
  ● tip
  ● bundle sheath
  ● mesophyll
● Illumina Genome Analyzer II
● Mapped to ZmB73v2 with GSNAP, read counts per gene

FABIA: Factor analysis for bicluster acquisition
● Model: $X = L Z + U$
● L is the sparse loading matrix (Laplace distribution, mean zero, variance one)
● Z is the sparse factor matrix (Laplace distribution, mean zero, variance one)
● U is additive noise

Biclustering of bioassay and fingerprint data
Goals
● FABIA using a sparse algebra for efficient biclustering of sparse data
● Reducing the dimensionality of the chemical and the bioassay data
● Finding “building blocks” in the chemical data
● Finding assays correlated on a subset of compounds

Sparse FABIA for biclustering of bioassay and fingerprint data
RESULTS

Rchemcpp: Molecular similarity by kernels
● Molecule kernels
  ● Measuring the similarity of molecules (compounds)
  ● Measure is based on the number of common substructures of two compounds
  ● Result is a similarity matrix or (positive semi-definite) kernel matrix
● Rchemcpp
  ● Implementation of various types of molecule kernels
  ● R package for easy handling of the data

Molecule kernels for structural analoging
● R implementation of different types of molecule kernels and visualization
  ● Walk-based kernels
  ● Tanimoto kernels
  ● MinMax kernels
  ● Pharmacophore kernels
● Similarity measures for
  ● Clustering
  ● Machine learning methods (e.g.
SVMs)
● Dimensionality reduction of the chemical space

Rchemcpp: Molecular similarity by kernels
● Input is a set of molecules in SDF format
● Output is a numeric similarity matrix (Fig.)
● Reducing the dimensionality of the chemical space (clustering)
● Prediction of properties/activity
● Does not scale to databases like ChEMBL

Rchemcpp
● Molecule kernels available in R via Rchemcpp
  ● Easy handling of similarities of compounds
  ● Alternative approach to fingerprints
● Prediction of properties/activity
● Reduction of dimensionality by clustering of compounds
● Identification of “exemplars” for each cluster
● Dataset-wise approach

Molecule kernels for structural analoging
● Bioconductor package Rchemcpp
● Web service for finding structural analogs in ChEMBL
● Winning method of the NIEHS-NCATS-UNC Toxicogenetics Challenge

Summary
● Application of machine learning techniques in drug discovery
● Gene expression signatures of compounds
● Analysis of gene expression data
  ● Insights into target-related effects
  ● Insights into side-effects (off-target effects)
● All methods thoroughly compared against competing methods in separate publications

References
● Klambauer, G., Verbist, B., Vervoort, L., Talloen, W., QSTAR Consortium, Shkedy, Z., Thas, O., Bender, A., Göhlmann, H. W. H., & Hochreiter, S. (2015). Using transcriptomics to guide lead optimization in drug discovery projects. Drug Discovery Today, 20(5).
● Klambauer, G., Wischenbart, M., Mahr, M., Unterthiner, T., Mayr, A., & Hochreiter, S. (2015). Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map. Bioinformatics, 31(20), 3392-3394.
● Hochreiter, S., Bodenhofer, U., Heusel, M., Mayr, A., Mitterecker, A., Kasim, A., ... & Bijnens, L. (2010).
FABIA: factor analysis for bicluster acquisition. Bioinformatics, 26(12), 1520-1527.
● Hochreiter, S., Clevert, D. A., & Obermayer, K. (2006). A new summarization method for Affymetrix probe level data. Bioinformatics, 22(8), 943-949.

THANK YOU!

[Figure panel: HepG2]

P-SVM for prediction of transcriptional response
● Prediction of transcriptional response by chemical features
● Obtaining information about structure-activity relationships
● Construction and verification of gene modules:
  ● Prediction of gene modules by gene expression features
  ● Prediction of primary assay by gene expression features

metaFARMS for summarizing gene modules
● Modification of the FARMS algorithm for summarization of gene modules
● Visualization of gene modules (“gene plots”)
● Expression value of a compound for a gene module
● Ranking compounds with respect to a gene module
● Reducing noise in the measurements

Logistic regression for target prediction
● Development of a new method
  ● Logistic regression
  ● Fast pipeline using fingerprint features and bioassay data
● Revealing the biomolecular target of a new compound
● Identifying the mechanism of action
● Repurposing known drugs for new indications

Logistic regression for predicting compound targets

Method   Naive Bayes   Parzen-Rosenblatt   Logistic Regression
AUC      0.661         0.630               0.599

Method   Scoring       SVM                 Deep Networks
AUC      0.663         0.671               ?
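
The comparison above reports one AUC per method. As a reminder of what that number measures, here is a minimal sketch that computes the AUC as the fraction of (target, non-target) pairs in which the true target receives the higher score, with ties counted as one half. The function name, scores and labels are made up for illustration and are not the data behind the slide.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical prediction scores for five compound-target pairs.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2]
example_labels = [1, 0, 1, 1, 0]
example_auc = auc(example_scores, example_labels)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so values in the 0.6 to 0.7 range indicate a modest but real enrichment of true targets at the top of the list.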