Drug target prioritization by
perturbed gene expression and
network information
Zerrin Isik, Christoph Baldow, Carlo Vittorio
Cannistraci & Michael Schroeder
Nature Scientific Reports 5, Article number: 17417 (2015)
doi:10.1038/srep17417
http://www.nature.com/articles/srep17417
Presented By:
Jaya Thomas
1
Abstract
Drugs bind to their target proteins, which interact with downstream effectors and
ultimately perturb the transcriptome of a cancer cell. These perturbations reveal
information about their source, i.e., drugs’ targets. Here, we investigate whether
these perturbations and protein interaction networks can uncover drug targets and
key pathways. We performed the first systematic analysis of over 500 drugs from the
Connectivity Map. First, we show that the gene expression of drug targets is usually
not significantly affected by the drug perturbation. Hence, expression changes after
drug treatment on their own are not sufficient to identify drug targets. However,
ranking of candidate drug targets by network topological measures prioritizes the
targets. We introduce a novel measure, local radiality, which combines perturbed
genes and functional interaction network information. The new measure outperforms
other methods in target prioritization and proposes cancer-specific pathways from
drugs to affected genes for the first time. Local radiality identifies more diverse
targets with fewer neighbors and possibly fewer side effects.
2
Introduction
• Drugs interact with targets and off-targets, which trigger downstream
signaling cascades causing perturbations in the cell’s transcriptome.
• The term “target” can refer either to proteins physically binding to the
drug or to proteins that are only functionally related.
• Drug-induced perturbations have been uncovered at very large scale in the
Connectivity Map (CMap) for 1300 compounds on four human cancer cell
lines.
3
Introduction
• A drug modulates the activity of a target protein, which
subsequently regulates downstream proteins.
• Protein-protein interaction (PPI) networks capture these downstream relationships between targets and other proteins through physical
contacts, genetic interactions and functional relationships.
• Questions raised:
– Can drug targets be identified from network information and expression
alterations induced by a drug?
– Does a global or a local network feature give higher target prediction accuracy?
– Does the target prediction performance depend on the definition of a target
protein?
4
Method
• A drug target prioritization method was developed around a
new network measure, local radiality (LR), which integrates both
topological data and perturbed gene information.
• Radiality is a well-known centrality measure describing how easily
a node can be reached from the other nodes via the shortest paths of the
network.
• The new measure describes the reachability of a target protein via
the shortest paths to deregulated genes. Thus, it is a locally
constrained radiality.
5
Method
• The LR measure helps to test the initial hypothesis that deregulated genes might
be close to drug targets in terms of the network topology.
• It uses a network G and a set of deregulated genes DG as input.
• The score of a node n in the network G is calculated as defined in Equation 1.
• If the network is unweighted, |sp| is the minimum number of nodes needed to connect
dg and n.
• LR utilizes both drug perturbation data (i.e., deregulated genes) and topological
information (i.e., shortest path distance).
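Equation 1 is not reproduced in this transcript, so the following is only a minimal sketch of the idea. It assumes that the LR score of a node is the mean shortest-path length from that node to the deregulated genes, that path length is counted in edges on an unweighted, undirected network, and that unreachable genes are skipped. The function names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def local_radiality(graph, deregulated_genes):
    """Assumed form of LR: mean shortest-path length to the deregulated genes.

    `graph` maps each node to the set of its neighbours (undirected, so the
    adjacency must be symmetric). Lower scores mean the node is closer to
    the deregulated genes. Unreachable genes are ignored (an assumption).
    """
    per_gene = [bfs_distances(graph, dg) for dg in deregulated_genes if dg in graph]
    scores = {}
    for node in graph:
        lengths = [dist[node] for dist in per_gene if node in dist]
        if lengths:
            scores[node] = sum(lengths) / len(lengths)
    return scores


# Toy network: A - B - C - D, with E attached to B.
toy_graph = {
    "A": {"B"},
    "B": {"A", "C", "E"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"B"},
}
toy_scores = local_radiality(toy_graph, deregulated_genes={"C", "E"})
```

In this toy network, B is adjacent to both deregulated genes, so it receives a lower (better) score than the peripheral nodes A and D.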
7
Method
Other existing network centrality measures
• Stress counts how often a node appears on the shortest paths of the network,
where sps(s, t) is the set of all shortest paths from s to t and ca(n, sp)
indicates whether node n lies on the path sp.
• Radiality shows the level of reachability of a node via different shortest paths
of the network (i.e., the closer a node is to the rest of the nodes, the easier it is to reach).
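The formulas for these two measures did not survive in the transcript. As a reference point, the commonly used textbook definitions can be written as below; this is an assumption about what the slide showed, not a copy of it. Here N is the number of nodes, Δ_G is the network diameter, and |sp(n, t)| is the shortest-path length between n and t.

```latex
\[
\mathrm{stress}(n) \;=\; \sum_{s \neq n \neq t} \; \sum_{sp \,\in\, sps(s,t)} ca(n, sp),
\qquad
ca(n, sp) \;=\;
\begin{cases}
1 & \text{if } n \text{ lies on } sp,\\
0 & \text{otherwise,}
\end{cases}
\]
\[
\mathrm{radiality}(n) \;=\; \frac{\sum_{t \neq n} \bigl(\Delta_G + 1 - |sp(n,t)|\bigr)}{N - 1}.
\]
```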
8
Method
Target Prioritization
• Gene expression and network topological data are integrated to predict all
possible drug targets.
• Drug perturbation data (i.e., control vs. treatment) is incorporated into
calculation of the topological proximity by using either gene expression
values or deregulated genes.
• The possible targets of a given drug are predicted by a sorted list according
to the closeness scores.
• Target prediction aims to eliminate as many false positive target predictions
as possible.
• Proteins ranked in the 1st percentile of the list are the most probable
drug targets. If a known drug target is ranked in the 1st percentile of all
possible targets, the prediction is counted as a true positive.
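The evaluation rule above can be sketched as follows. This is an illustrative sketch only: it assumes that lower scores mean closer proximity (as for a shortest-path-based score), and the function names, the rounding of the percentile cutoff and the toy scores are hypothetical.

```python
import math


def top_percentile(scores, percent=1.0, lower_is_better=True):
    """Return the candidates that fall in the top `percent` of the ranked list."""
    ranked = sorted(scores, key=scores.get, reverse=not lower_is_better)
    # Rounding up and keeping at least one candidate is an assumption.
    cutoff = max(1, math.ceil(len(ranked) * percent / 100.0))
    return ranked[:cutoff]


def is_true_positive(scores, known_targets, percent=1.0):
    """True if any known target of the drug is ranked within the top percentile."""
    top = set(top_percentile(scores, percent))
    return any(target in top for target in known_targets)


# Toy example: 200 candidate proteins, where P0 has the best (lowest) score.
toy_ranking_scores = {f"P{i}": float(i) for i in range(200)}
hit = is_true_positive(toy_ranking_scores, known_targets={"P1"})
miss = is_true_positive(toy_ranking_scores, known_targets={"P5"})
```

With 200 candidates, the 1st percentile contains two proteins, so a known target ranked second counts as a true positive while one ranked sixth does not.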
9
Results
Gene expression is not sufficient for target prediction
10
Identification of significantly mutated
regions across cancer types highlights a
rich landscape of functional molecular
alterations
Carlos L Araya, Can Cenik, Jason A Reuter, Gert Kiss,
Vijay S Pande, Michael P Snyder & William J Greenleaf
Nature Genetics aop, (2015) | doi:10.1038/ng.3471
http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3471.html
13
Abstract
Cancer sequencing studies have primarily identified cancer driver genes by
the accumulation of protein-altering mutations. An improved method would
be annotation independent, sensitive to unknown distributions of functions
within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized
significantly mutated regions (SMRs). SMRs reveal recurrent alterations
across a spectrum of coding and noncoding elements, including
transcription factor binding sites and untranslated regions mutated in up to
~15% of specific tumor types. SMRs demonstrate spatial clustering of
alterations in molecular domains and at interfaces, often with associated
changes in signaling. Mutation frequencies in SMRs demonstrate that
distinct protein regions are differentially mutated across tumor types, as
exemplified by a linker region of PIK3CA in which biophysical simulations
suggest that mutations affect regulatory interactions. The functional
diversity of SMRs underscores both the varied mechanisms of oncogenic
misregulation and the advantage of functionally agnostic driver
identification.
14
Introduction
• In cancer, driver mutations alter functional elements of diverse nature and size.
• A significant proportion of regulatory elements in the genome are located
proximal to or even in exons, suggesting that many may be captured by whole-exome sequencing.
• Systematic analyses of genomic regulatory activity in animals have identified
substantial tissue and developmental stage specificity, suggesting that mutations
in cancer type–specific regulatory features may be significant noncoding
drivers of cancer.
• Existing studies (TCGA, ICGC) have focused little attention on systematically
analyzing the positional distribution of coding mutations or characterizing
noncoding alterations.
*TCGA: The Cancer Genome Atlas
ICGC: International Cancer Genome Consortium
15
Introduction
• Algorithms to identify cancer driver genes often examine nonsynonymous-to-synonymous mutation rates across the gene body or recurrently mutated amino
acids called mutation hotspots.
• These analyses ignore recurrent alterations in the vast intermediate scale of
functional coding elements, such as those affecting protein subunits or
interfaces.
• To examine mutation clustering, analyses have employed windows of fixed
length or identified clusters of nonsynonymous mutations, assuming that
driver mutations exclusively influence protein sequence and ignoring the
importance of exon-embedded regulatory elements.
16
Introduction
• The presented approach permitted the unbiased identification of
variably sized genomic regions recurrently altered by somatic mutation,
termed as significantly mutated regions (SMRs).
• SMRs were associated with noncoding elements, protein structures,
molecular interfaces, and transcriptional and signaling profiles, thereby
providing insight into the molecular consequences of accumulating
somatic mutations in these regions.
• SMRs identified a rich spectrum of coding and noncoding elements
recurrently targeted by somatic alterations that complement gene- and
pathway-centric analyses.
17
Method
• For each tumor type and gene, the authors calculated multiple distinct mutation
probabilities.
• First, they calculated the frequency of transitions and transversions within the
mappable, exonic regions of each gene to derive ‘exonic’ mutation probabilities
for each gene in the hg19 human genome assembly using whole-exome
sequencing data.
• For each gene and in each tumor type, they identified the set of genes most
similar in expression, replication time and GC content (gene-level features).
• They compiled expression and replication timing data and derived feature-specific
weights, defined as the rank correlation between gene features and the observed
exonic mutation probabilities in each tumor type.
18
Method
• Genes were sorted sequentially on the basis of the gene feature weights, and the
neighborhood of the 500 closest genes was selected for each query gene.
• The sum of correlation-weighted, absolute feature distances was measured between
gene pairs within the 500-gene rank neighborhood.
• For each gene, the ≤200 most similar genes with a normalized distance
score ≤1 were selected. Lastly, the ‘exonic’ mutation probabilities of these genes were averaged per
transition/transversion to derive a set of ‘matched’ mutation probabilities.
• A Bayesian framework was used to derive posterior mutation probabilities for
each transition and transversion per gene in each of the analyzed cancer types.
• The distributions of whole-exome sequencing–derived (‘exonic’, ‘matched’ and
‘global’) as well as whole-genome sequencing–derived (‘Bayesian’) mutation
probabilities varied strongly between cancer types and among genes within
individual cancer types, highlighting the importance of such cancer- and gene-specific treatment of background mutation probabilities.
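The ‘matched’ probability step described above can be illustrated with a rough sketch. It makes several simplifying assumptions that are not in the slides: features are already scaled so that a distance threshold of 1 is meaningful, the absolute value of each correlation weight is used, the 500-gene rank neighborhood is skipped, and a gene with no similar neighbors falls back to its own ‘exonic’ probability. All names and numbers are hypothetical.

```python
def feature_distance(a, b, weights):
    """Sum of correlation-weighted absolute feature differences between two genes."""
    return sum(abs(w) * abs(a[name] - b[name]) for name, w in weights.items())


def matched_probability(query, genes, weights, max_neighbors=200, max_distance=1.0):
    """Average the 'exonic' probability over the genes most similar to `query`.

    `genes` maps a gene name to {"features": {...}, "exonic_p": float}.
    """
    q_features = genes[query]["features"]
    scored = sorted(
        (feature_distance(q_features, info["features"], weights), name)
        for name, info in genes.items()
        if name != query
    )
    chosen = [name for dist, name in scored if dist <= max_distance][:max_neighbors]
    if not chosen:
        # Fallback when no gene is similar enough (an assumption).
        return genes[query]["exonic_p"]
    return sum(genes[name]["exonic_p"] for name in chosen) / len(chosen)


# Toy data: two genes resemble the query, one does not.
toy_genes = {
    "QUERY": {"features": {"expression": 0.5, "replication": 0.5}, "exonic_p": 1e-6},
    "G1": {"features": {"expression": 0.6, "replication": 0.5}, "exonic_p": 2e-6},
    "G2": {"features": {"expression": 0.4, "replication": 0.6}, "exonic_p": 4e-6},
    "G3": {"features": {"expression": 5.0, "replication": 5.0}, "exonic_p": 1e-4},
}
toy_weights = {"expression": 0.5, "replication": 0.3}
toy_matched = matched_probability("QUERY", toy_genes, toy_weights)
```

Here G3 is too dissimilar to contribute, so the matched probability is the mean of the G1 and G2 values.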
19
Method
Mutation cluster identification
• The authors deployed density-based spatial clustering of applications with noise (DBSCAN) to detect
clusters of ≥2 SNVs within exonic domains, evaluating density reachability within ε base
pairs in each cancer type.
• The reachability parameter ε was dynamically defined with ε = dp/ds where dp and ds refer
to the number of mutated positions (base pairs) and the base-pair size of the domain d,
thresholded to 10 ≤ ε ≤ 500 bp.
• Detected mutation clusters were refined where subclusters of ≥2 SNVs with significantly
higher (P < 0.05, binomial test) mutation densities (mutated tumor samples per kilobase)
existed.
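The clustering step can be sketched in a few lines for a single domain. On one-dimensional positions with a minimum cluster size of 2, DBSCAN reduces to chaining sorted positions whose gaps are at most ε, which is what this sketch does instead of calling a DBSCAN library. One assumption deserves emphasis: the sketch computes ε as domain size divided by the number of mutated positions, because only that ratio gives values in the stated 10 to 500 bp range; the slide's notation writes the ratio as dp/ds. Function names and coordinates are hypothetical.

```python
def adaptive_eps(domain_size_bp, n_mutated_positions, lower=10, upper=500):
    """Reachability distance for one domain, clamped to [lower, upper] bp.

    Assumes eps = domain size / number of mutated positions (see lead-in).
    """
    raw = domain_size_bp / n_mutated_positions
    return min(max(raw, lower), upper)


def cluster_positions(positions, eps):
    """Group sorted SNV positions into clusters of >= 2 points.

    Equivalent to 1-D DBSCAN with a minimum of 2 points per cluster:
    consecutive positions are linked when their gap is <= eps, and
    isolated positions are treated as noise.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > eps:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Toy domain of 2,000 bp with six mutated positions.
toy_positions = [100, 105, 112, 900, 1500, 1510]
toy_eps = adaptive_eps(2000, len(set(toy_positions)))
toy_clusters = cluster_positions(toy_positions, toy_eps)
```

In the toy domain ε is about 333 bp, so positions 100 to 112 and 1500 to 1510 form two clusters and the isolated SNV at position 900 is discarded as noise.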
Mutation cluster filtering
• As a final step in calling SMRs, the authors selected clusters whose density
scores (Pdensity) passed the 5% FDR threshold and that were mutated in ≥2% of samples in each
cancer type.
*FDR: false-discovery rate
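The filtering rule can be sketched as below. The slides do not say which FDR procedure was used, so this sketch assumes the Benjamini-Hochberg step-up procedure; the dictionary keys, function names and toy values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking which p-values pass BH at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            max_rank = rank  # largest rank that satisfies the BH condition
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            passed[idx] = True
    return passed


def call_smrs(clusters, n_samples, fdr=0.05, min_fraction=0.02):
    """Keep clusters that pass the FDR threshold and the sample-frequency cutoff."""
    keep = benjamini_hochberg([c["p_density"] for c in clusters], fdr)
    return [
        c for c, ok in zip(clusters, keep)
        if ok and c["mutated_samples"] / n_samples >= min_fraction
    ]


# Toy clusters from a cohort of 100 tumor samples.
toy_cluster_calls = [
    {"name": "A", "p_density": 0.001, "mutated_samples": 5},
    {"name": "B", "p_density": 0.02, "mutated_samples": 1},
    {"name": "C", "p_density": 0.03, "mutated_samples": 3},
    {"name": "D", "p_density": 0.5, "mutated_samples": 10},
]
toy_smrs = [c["name"] for c in call_smrs(toy_cluster_calls, n_samples=100)]
```

Cluster B passes the FDR threshold but is mutated in only 1% of samples, and cluster D is frequent but not significant, so only A and C are retained.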
20
Method
Mutation cluster annotation.
• SMRs were annotated on the basis of mutation impact on coding, transcribed and gene-associated
regions (see “Uniform variant annotation”).
• For SMRs associated with multiple genes (overlapping annotations), SMRs were preferentially assigned to
– (i) previously known cancer driver genes or
– (ii) the gene affected by the most severe type of mutation.
• Where mutation impact was insufficient to resolve assignment among multiple genes, the gene
affected by the largest number of mutations within the SMR was selected.
• On this basis, each SMR was assigned to a single gene, recording the types of mutation impacts on the gene
and the class of region affected.
• Region classes included exon (coding region and noncoding gene), intron, splice, upstream, 5′ UTR, 3′
UTR, downstream and other (intergenic).
• Mutation impacts (from snpEff) included, in order of severity, rare amino acid, splice site acceptor, splice
site donor, start lost, stop lost, stop gain, nonsynonymous coding, splice-site branch U12,
nonsynonymous start, nonsynonymous stop, splice-site region, splice-site branch, start gain, synonymous
coding, synonymous start, synonymous stop, noncoding gene (exon), 3′ UTR, 5′ UTR, miRNA, intron,
upstream, downstream and intergenic.
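The gene-assignment rules above can be sketched as a small priority function. The severity order is taken from the slide; everything else is an assumption for illustration, including the function name, the input layout, applying the severity rule among several known drivers, and breaking any remaining tie by input order.

```python
# Impact labels in decreasing order of severity, as listed on the slide.
SEVERITY = [
    "rare amino acid", "splice site acceptor", "splice site donor",
    "start lost", "stop lost", "stop gain", "nonsynonymous coding",
    "splice-site branch U12", "nonsynonymous start", "nonsynonymous stop",
    "splice-site region", "splice-site branch", "start gain",
    "synonymous coding", "synonymous start", "synonymous stop",
    "noncoding gene (exon)", "3' UTR", "5' UTR", "miRNA", "intron",
    "upstream", "downstream", "intergenic",
]


def assign_gene(candidates, known_drivers):
    """Pick one gene for an SMR that overlaps several genes.

    `candidates` maps a gene name to the list of impact labels of the
    SMR's mutations on that gene.
    """
    drivers = [gene for gene in candidates if gene in known_drivers]
    pool = drivers or list(candidates)  # rule (i): prefer known driver genes

    def priority(gene):
        impacts = candidates[gene]
        worst = min(SEVERITY.index(impact) for impact in impacts)  # rule (ii)
        return (worst, -len(impacts))  # tie-break: most mutations in the SMR

    return min(pool, key=priority)


toy_candidates = {
    "GENE_A": ["intron", "intron", "intron"],
    "GENE_B": ["nonsynonymous coding"],
}
by_severity = assign_gene(toy_candidates, known_drivers=set())
by_driver = assign_gene(toy_candidates, known_drivers={"GENE_A"})
by_count = assign_gene({"X": ["stop gain"], "Y": ["stop gain", "intron"]}, set())
```

Without driver information the nonsynonymous hit wins on severity, a known driver overrides severity, and equal severity is resolved by the number of mutations.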
21
Results
Multiscale detection of SMRs
• (a) 79.0% (n = 2,431,360) of these somatic
mutations do not alter protein-coding
sequences or their splicing and thus were not
previously considered in the analysis of
cancer driver mutations.
• (b) To identify both coding and noncoding cancer
drivers, the authors applied an annotation-independent
density-based clustering technique, which
identified 198,247 variably sized clusters of
somatic mutations within exon-proximal
domains of the human genome.
• Moreover, ~10% of genes associated with
SMRs in the quintile with the top density
scores were not found previously in a gene-level analysis. Thus, high density scores are
enriched for known cancer genes but also
nominate potentially new drivers.
22
Results
SMRs implicate diverse noncoding regulatory features
• A significant proportion (31.2%; P < 2.2 × 10^−16, proportions test) of SMRs are not predicted to affect protein
sequences, highlighting the potential to discover pathological noncoding variation in whole-exome sequencing data.
23
Results
• The authors sought to identify SMRs that might affect the molecular interfaces of protein-protein and DNA-protein interactions, a recognized yet understudied mechanism of cancer driver mutation.
• They examined intermolecular distances between SMR residues and interacting proteins or DNA and
identified 17 SMRs that likely alter molecular interfaces.
24
A spectral approach integrating functional
genomic annotations for coding and
noncoding variants
Iuliana Ionita-Laza, Kenneth McCallum, Bin Xu &
Joseph D Buxbaum
Nature Genetics 48, 214–220 (2016) doi:10.1038/ng.3477
http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3477.html
25
Abstract
Over the past few years, substantial effort has been put into the
functional annotation of variation in human genome sequences. Such
annotations can have a critical role in identifying putatively causal
variants for a disease or trait among the abundant natural variation that
occurs at a locus of interest. The main challenges in using these various
annotations include their large numbers and their diversity. Here we
develop an unsupervised approach to integrate these different
annotations into one measure of functional importance (Eigen) that,
unlike most existing methods, is not based on any labeled training data.
We show that the resulting meta-score has better discriminatory ability
using disease-associated and putatively benign variants from published
studies (in both coding and noncoding regions) than the recently
proposed CADD score. Across varied scenarios, the Eigen score
performs generally better than any single individual annotation,
representing a powerful single functional score that can be incorporated
in fine-mapping studies.
26
Introduction
• Annotations are important because they can
– help predict the functional effect of a variant, and
– be further combined with population-level genetic data to identify the variants at a
locus of interest that are more likely to have a causal role in disease.
• Important challenges
– Different annotations can measure different properties of a variant, such as the degree of
evolutionary conservation, the effect of an amino acid change on the protein function or
structure in the case of coding variants, or the potential effect on regulatory elements in
the case of noncoding variants.
– It is not known a priori which of the different annotations is more predictive of the most
relevant functional effect of a particular variant.
– Another problem is that there is a high degree of correlation among annotations of the
same type (for example, evolutionary conservation scores or regulatory-type
annotations).
– Therefore, despite their potential to be useful for identifying functional variants, most of
these annotations tend to be used in a subjective manner.
27
Method
• An unsupervised spectral approach (Eigen) for scoring variants that does
not make use of labeled training data.
• Training using a large set of variants, with a diverse set of annotations for
each of these variants but no label as to their functional status
• Assume that the variants can be partitioned into two distinct groups,
functional and non-functional (although the partition is unknown to us),
and that for each annotation the distribution is a two-component mixture,
corresponding to the two groups.
• The key assumption in the Eigen approach is that of blockwise
conditional independence between annotations given the true state of a
variant (either functional or non-functional).
28
Method
• This assumption implies that any correlation between annotations in different blocks is
due to differences in the annotation means between functional and non-functional
variants.
• Because of this, the correlation structure among the different functional annotations can be used to
determine how well each annotation separates functional and non-functional variants
(that is, the predictive accuracy of each annotation).
• Subsequently, a weighted linear combination of annotations is constructed, based on these
estimated accuracies.
• The authors illustrate the discriminatory ability of the proposed meta-score using numerous
examples of disease-associated variants and putatively benign variants from the
literature.
• In addition, they consider a related but conceptually simpler meta-score, Eigen-PC,
which is based on eigendecomposition of the annotation covariance matrix and uses
the lead eigenvector to weight the individual annotations.
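The Eigen-PC idea in the last bullet can be sketched without external libraries. This is only an illustration under stated assumptions: the lead eigenvector is approximated by power iteration rather than a full eigendecomposition, annotations are centered but not otherwise rescaled, and the score is the weighted sum of centered annotation values. The function names and the toy annotation matrix are hypothetical and do not reproduce the authors' pipeline.

```python
def column_means(rows):
    """Mean of each annotation (column) across variants (rows)."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows):
    """Sample covariance matrix of the annotations."""
    n, k = len(rows), len(rows[0])
    means = column_means(rows)
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def lead_eigenvector(matrix, iterations=500):
    """Approximate the lead eigenvector by power iteration.

    Starts from a uniform vector, which works when that vector is not
    orthogonal to the lead eigenvector and the matrix is not all zeros.
    """
    k = len(matrix)
    v = [1.0 / k ** 0.5] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def eigen_pc_scores(rows):
    """Weight each centered annotation by its lead-eigenvector entry and sum."""
    weights = lead_eigenvector(covariance_matrix(rows))
    means = column_means(rows)
    scores = [sum(w * (x - m) for w, x, m in zip(weights, row, means)) for row in rows]
    return weights, scores


# Toy matrix: four variants, three annotations; the first two are strongly correlated.
toy_annotations = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 1.0],
]
toy_eigen_weights, toy_eigen_scores = eigen_pc_scores(toy_annotations)
```

In the toy matrix the two correlated annotations dominate the lead eigenvector, so the variant with the highest values on both receives the highest score.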
29
Thank You
34