Drug target prioritization by perturbed gene expression and network information
Zerrin Isik, Christoph Baldow, Carlo Vittorio Cannistraci & Michael Schroeder
Nature Scientific Reports 5, Article number: 17417 (2015)
doi:10.1038/srep17417
http://www.nature.com/articles/srep17417
Presented By: Jaya Thomas

Abstract
Drugs bind to their target proteins, which interact with downstream effectors and ultimately perturb the transcriptome of a cancer cell. These perturbations reveal information about their source, i.e., drugs’ targets. Here, we investigate whether these perturbations and protein interaction networks can uncover drug targets and key pathways. We performed the first systematic analysis of over 500 drugs from the Connectivity Map. First, we show that the gene expression of drug targets is usually not significantly affected by the drug perturbation. Hence, expression changes after drug treatment on their own are not sufficient to identify drug targets. However, ranking of candidate drug targets by network topological measures prioritizes the targets. We introduce a novel measure, local radiality, which combines perturbed genes and functional interaction network information. The new measure outperforms other methods in target prioritization and proposes cancer-specific pathways from drugs to affected genes for the first time. Local radiality identifies more diverse targets with fewer neighbors and possibly less side effects.

Introduction
• Drugs interact with targets and off-targets, which trigger downstream signaling cascades causing perturbations in the cell’s transcriptome.
• The term “target” can refer either to proteins physically binding to the drug or to proteins that are only functionally related.
• Drug-induced perturbations have been uncovered at very large scale in the Connectivity Map (CMap) for 1300 compounds on four human cancer cell lines.
Introduction
• A drug modulates the activity of a target protein, which subsequently regulates downstream proteins.
• Protein-protein interaction (PPI) networks capture such downstream relationships between targets and proteins through physical contacts, genetic interactions and functional relationships.
• Questions raised:
  – Can drug targets be identified from network information and expression alterations induced by a drug?
  – Does a global or a local network feature give higher target prediction accuracy?
  – Does the target prediction performance depend on the definition of a target protein?

Method
• A drug target prioritization method was developed around a new network measure, local radiality (LR), which integrates both topological data and perturbed gene information.
• Radiality is a well-known centrality measure describing how reachable a node is via the different shortest paths of the network.
• The new measure describes the reachability of a target protein via the shortest paths to deregulated genes. Thus, it is a new, locally constrained radiality.

Method
• The LR measure helps to test the initial hypothesis that deregulated genes might be close to drug targets in terms of the network topology. It uses a network G and a set of deregulated genes DG as input.
• The score of a node n in the network G is calculated as defined in Equation 1.
• If the network is unweighted, |sp| is the minimum number of nodes needed to connect dg and n.
• LR utilizes both drug perturbation data (i.e., deregulated genes) and topological information (i.e., shortest path distance).
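The slides refer to Equation 1 without reproducing it, so the following is only a minimal sketch of how an LR-style score could be computed. It assumes that the score of a node is the mean shortest-path length (in hops) from that node to the deregulated genes it can reach, and that a smaller score means "closer"; both the exact formula and the ranking direction are assumptions, and the function names and the toy network are hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def local_radiality(graph, deregulated):
    """Assumed LR form: mean hop distance from each node to the reachable
    deregulated genes (infinity if none of them can be reached)."""
    totals = {node: 0 for node in graph}
    counts = {node: 0 for node in graph}
    for dg in deregulated:
        if dg not in graph:
            continue  # deregulated gene absent from the network: skipped
        for node, d in shortest_path_lengths(graph, dg).items():
            totals[node] += d
            counts[node] += 1
    return {
        node: (totals[node] / counts[node]) if counts[node] else float("inf")
        for node in graph
    }


# Hypothetical toy network (undirected, stored as adjacency lists).
toy_network = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
toy_deregulated = ["A", "D"]

scores = local_radiality(toy_network, toy_deregulated)
# Candidate targets sorted from closest to farthest (assumed ranking direction).
ranked = sorted(scores, key=scores.get)
```

In a full analysis the ranked list would be cut at a chosen percentile to obtain candidate targets; that threshold is not part of this sketch.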
Method
Other existing network centrality measures
• Stress calculates the frequency of a node in any shortest path of the network, where sps(s, t) is the set of all shortest paths from s to t; and ca(n, sp) is defined as follows:
• Radiality shows the level of reachability of a node via different shortest paths of the network (i.e., the closer a node is to the rest of the nodes, the easier it is to reach).

Method
Target Prioritization
• Gene expression and network topological data are integrated to predict all possible drug targets.
• Drug perturbation data (i.e., control vs. treatment) is incorporated into the calculation of the topological proximity by using either gene expression values or deregulated genes.
• The possible targets of a given drug are predicted as a list sorted by closeness score.
• Target prediction aims to eliminate as many false positive target predictions as possible.
• Proteins predicted in the 1st percentile of the rank list are the most probable drug targets. If a known drug target is ranked in the 1st percentile of all possible targets, the prediction is accepted as a true positive.

Results
Gene expression is not sufficient for target prediction

Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations
Carlos L Araya, Can Cenik, Jason A Reuter, Gert Kiss, Vijay S Pande, Michael P Snyder & William J Greenleaf
Nature Genetics aop, (2015) | doi:10.1038/ng.3471
http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3471.html

Abstract
Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs).
SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ~15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.

Introduction
• In cancer, driver mutations alter functional elements of diverse nature and size.
• A significant proportion of regulatory elements in the genome are located proximal to or even in exons, suggesting that many may be captured by whole-exome sequencing.
• Systematic analyses of genomic regulatory activity in animals have identified substantial tissue and developmental stage specificity, suggesting that mutations in cancer type–specific regulatory features may be significant noncoding drivers of cancer.
• Existing studies (TCGA, ICGC) have paid little attention to systematically analyzing the positional distribution of coding mutations or characterizing noncoding alterations.
*TCGA: The Cancer Genome Atlas; ICGC: International Cancer Genome Consortium

Introduction
• Algorithms to identify cancer driver genes often examine nonsynonymous-to-synonymous mutation rates across the gene body or recurrently mutated amino acids called mutation hotspots.
• These analyses ignore recurrent alterations in the vast intermediate scale of functional coding elements, such as those affecting protein subunits or interfaces.
• To examine mutation clustering, analyses have employed windows of fixed length or identified clusters of nonsynonymous mutations, assuming that driver mutations exclusively influence protein sequence and ignoring the importance of exon-embedded regulatory elements.

Introduction
• The presented approach permitted the unbiased identification of variably sized genomic regions recurrently altered by somatic mutation, termed significantly mutated regions (SMRs).
• SMRs were associated with noncoding elements, protein structures, molecular interfaces, and transcriptional and signaling profiles, thereby providing insight into the molecular consequences of accumulating somatic mutations in these regions.
• SMRs identified a rich spectrum of coding and noncoding elements recurrently targeted by somatic alterations that complement gene- and pathway-centric analyses.

Method
• For each tumor type and gene, the authors calculated multiple distinct mutation probabilities.
• First, they calculated the frequency of transitions and transversions within the mappable, exonic regions of each gene to derive ‘exonic’ mutation probabilities for each gene in the hg19 human genome assembly using whole-exome sequencing data.
• For each gene and in each tumor type, they identified the set of genes most similar in expression, replication time and GC content (gene-level features).
• They compiled expression and replication timing data and derived feature-specific weights, defined as the rank correlation between gene features and the observed exonic mutation probabilities in each tumor type.

Method
• Genes were sorted sequentially on the basis of the gene feature weights, and the neighborhood of the 500 closest genes was selected for each query gene.
• The sum of correlation-weighted, absolute feature distances between gene pairs was measured within the 500-gene rank neighborhood.
• For each gene, the ≤200 most similar genes with a normalized distance score ≤1 were selected. Lastly, the ‘exonic’ mutation probability per transition/transversion was averaged over these genes to derive a set of ‘matched’ mutation probabilities.
• A Bayesian framework was applied to derive posterior mutation probabilities for each transition and transversion per gene in each of the analyzed cancer types.
• The distributions of whole-exome sequencing–derived (‘exonic’, ‘matched’ and ‘global’) as well as whole-genome sequencing–derived (‘Bayesian’) mutation probabilities varied strongly between cancer types and among genes within individual cancer types, highlighting the importance of such cancer- and gene-specific treatment of background mutation probabilities.

Method
Mutation cluster identification
• Density-based spatial clustering of applications with noise (DBSCAN) was deployed to detect clusters of ≥2 SNVs within exonic domains, evaluating density reachability within ε base pairs in each cancer type.
• The reachability parameter ε was dynamically defined with ε = dp/ds, where dp and ds refer to the number of mutated positions (base pairs) and the base-pair size of the domain d, thresholded to 10 ≤ ε ≤ 500 bp.
• Detected mutation clusters were refined where subclusters of ≥2 SNVs with significantly higher (P < 0.05, binomial test) mutation densities (mutated tumor samples per kilobase) existed.
Mutation cluster filtering
• As a final step in calling SMRs, the authors selected clusters with density scores (Pdensity) at the 5% FDR threshold that were mutated in ≥2% of samples in each cancer type.
*FDR: false-discovery rate

Method
Mutation cluster annotation
• SMRs were annotated on the basis of mutation impact on coding, transcribed and gene-associated regions (see “Uniform variant annotation”).
• SMRs associated with multiple genes (overlapping annotations) were preferentially assigned to
  – (i) previously known cancer driver genes or
  – (ii) the gene affected by the most severe type of mutation.
• Where mutation impact was insufficient to resolve the assignment among multiple genes, the gene affected by the largest number of mutations within the SMR was selected.
• On this basis, each SMR was assigned to a single gene, recording the types of mutation impacts on the gene and the class of region affected.
• Region classes included exon (coding region and noncoding gene), intron, splice, upstream, 5′ UTR, 3′ UTR, downstream and other (intergenic).
• Mutation impacts (from snpEff) included, in order of severity, rare amino acid, splice site acceptor, splice site donor, start lost, stop lost, stop gain, nonsynonymous coding, splice-site branch U12, nonsynonymous start, nonsynonymous stop, splice-site region, splice-site branch, start gain, synonymous coding, synonymous start, synonymous stop, noncoding gene (exon), 3′ UTR, 5′ UTR, miRNA, intron, upstream, downstream and intergenic.

Results
Multiscale detection of SMRs
• (a) 79.0% (n = 2,431,360) of these somatic mutations do not alter protein-coding sequences or their splicing and thus were not previously considered in the analysis of cancer driver mutations.
• (b) To identify both coding and noncoding cancer drivers, the authors applied an annotation-independent, density-based clustering technique and identified 198,247 variably sized clusters of somatic mutations within exon-proximal domains of the human genome.
• Moreover, ~10% of genes associated with SMRs in the quintile with the top density scores were not found previously in a gene-level analysis.
Thus, high density scores are enriched for known cancer genes but also nominate potentially new drivers.

Results
SMRs implicate diverse noncoding regulatory features
• A significant proportion (31.2%; P < 2.2 × 10^−16, proportions test) of SMRs are not predicted to affect protein sequences, highlighting the potential to discover pathological noncoding variation in whole-exome sequencing data.

Results
• The authors sought to identify SMRs that might affect the molecular interfaces of protein-protein and DNA-protein interactions, a recognized yet understudied mechanism of cancer driver mutation.
• They examined intermolecular distances between SMR residues and interacting proteins or DNA and identified 17 SMRs that likely alter molecular interfaces.

A spectral approach integrating functional genomic annotations for coding and noncoding variants
Iuliana Ionita-Laza, Kenneth McCallum, Bin Xu & Joseph D Buxbaum
Nature Genetics 48, 214–220 (2016)
doi:10.1038/ng.3477
http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3477.html

Abstract
Over the past few years, substantial effort has been put into the functional annotation of variation in human genome sequences. Such annotations can have a critical role in identifying putatively causal variants for a disease or trait among the abundant natural variation that occurs at a locus of interest. The main challenges in using these various annotations include their large numbers and their diversity. Here we develop an unsupervised approach to integrate these different annotations into one measure of functional importance (Eigen) that, unlike most existing methods, is not based on any labeled training data. We show that the resulting meta-score has better discriminatory ability using disease-associated and putatively benign variants from published studies (in both coding and noncoding regions) than the recently proposed CADD score.
Across varied scenarios, the Eigen score performs generally better than any single individual annotation, representing a powerful single functional score that can be incorporated in fine-mapping studies.

Introduction
• Annotations are important because they can
  – help predict the functional effect of a variant, and
  – be further combined with population-level genetic data to identify the variants at a locus of interest that are more likely to have a causal role in disease.
• Important challenges
  – Different annotations can measure different properties of a variant, such as the degree of evolutionary conservation, the effect of an amino acid change on the protein function or structure in the case of coding variants, or the potential effect on regulatory elements in the case of noncoding variants.
  – It is not known a priori which of the different annotations is more predictive of the most relevant functional effect of a particular variant.
  – Another problem is that there is a high degree of correlation among annotations of the same type (for example, evolutionary conservation scores or regulatory-type annotations).
  – Therefore, despite their potential to be useful for identifying functional variants, most of these annotations tend to be used in a subjective manner.

Method
• An unsupervised spectral approach (Eigen) for scoring variants that does not make use of labeled training data.
• Training uses a large set of variants, with a diverse set of annotations for each of these variants but no label as to their functional status.
• Assume that the variants can be partitioned into two distinct groups, functional and non-functional (although the partition is unknown to us), and that for each annotation the distribution is a two-component mixture, corresponding to the two groups.
• The key assumption in the Eigen approach is that of blockwise conditional independence between annotations given the true state of a variant (either functional or non-functional).
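One consequence of the conditional-independence assumption can be written out with the law of total covariance. The notation here is introduced only for illustration and is not taken from the slides: C is the unknown functional status with P(C = 1) = \pi, Z_i and Z_j are two annotations from different blocks, and \mu_{i1}, \mu_{i0} are the means of Z_i among functional and non-functional variants.

```latex
\begin{align*}
\operatorname{Cov}(Z_i, Z_j)
  &= \mathbb{E}\big[\operatorname{Cov}(Z_i, Z_j \mid C)\big]
   + \operatorname{Cov}\big(\mathbb{E}[Z_i \mid C],\, \mathbb{E}[Z_j \mid C]\big) \\
  &= 0 + \pi (1 - \pi)\,(\mu_{i1} - \mu_{i0})\,(\mu_{j1} - \mu_{j0}).
\end{align*}
```

The first term vanishes because Z_i and Z_j are conditionally independent given C, so the covariance across blocks is driven entirely by how far each annotation's mean shifts between the two groups.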
Method
• This assumption implies that any correlation between annotations in different blocks is due to differences in the annotation means between functional and non-functional variants.
• Because of this, the correlation structure among the different functional annotations can be used to determine how well each annotation separates functional and non-functional variants (that is, the predictive accuracy of each annotation).
• Subsequently, a weighted linear combination of annotations is constructed, based on these estimated accuracies.
• The authors illustrate the discriminatory ability of the proposed meta-score using numerous examples of disease-associated variants and putatively benign variants from the literature.
• In addition, they consider a related but conceptually simpler meta-score, Eigen-PC, which is based on the eigendecomposition of the annotation covariance matrix and uses the lead eigenvector to weight the individual annotations.
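The Eigen-PC idea described in the Method slides (weight the annotations by the lead eigenvector of their covariance matrix) can be illustrated with a small sketch. This is not the authors' implementation: the standardization step, the use of power iteration to find the lead eigenvector, the function names and the toy annotation values are all assumptions made for illustration.

```python
import math


def standardize(rows):
    """Center each annotation column and scale it to unit sample variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(m)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def covariance(z):
    """Sample covariance matrix of already-centered columns."""
    n, m = len(z), len(z[0])
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def lead_eigenvector(matrix, iterations=500):
    """Power iteration: unit-norm eigenvector of the largest eigenvalue."""
    m = len(matrix)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def eigen_pc_like_scores(rows):
    """Weighted sum of standardized annotations, weights = lead eigenvector."""
    z = standardize(rows)
    weights = lead_eigenvector(covariance(z))
    return weights, [sum(w * x for w, x in zip(weights, r)) for r in z]


# Hypothetical toy data: 5 variants (rows) x 3 annotations (columns).
toy_annotations = [
    [0.1, 0.2, 0.0],
    [0.9, 0.8, 1.0],
    [0.2, 0.1, 0.1],
    [0.8, 0.9, 0.9],
    [0.5, 0.4, 0.6],
]
weights, meta_scores = eigen_pc_like_scores(toy_annotations)
```

The sign of an eigenvector is arbitrary in general. Starting the iteration from an all-positive vector keeps the weights positive here only because the toy annotations are all positively correlated.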
