Download - Wiley Online Library

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nucleic acid double helix wikipedia , lookup

DNA damage theory of aging wikipedia , lookup

Minimal genome wikipedia , lookup

Cancer epigenetics wikipedia , lookup

DNA supercoil wikipedia , lookup

Oncogenomics wikipedia , lookup

Transposable element wikipedia , lookup

Whole genome sequencing wikipedia , lookup

Molecular cloning wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Gene wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

DNA vaccination wikipedia , lookup

Cell-free fetal DNA wikipedia , lookup

Genome evolution wikipedia , lookup

Epigenomics wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

Genetic engineering wikipedia , lookup

Point mutation wikipedia , lookup

Extrachromosomal DNA wikipedia , lookup

Designer baby wikipedia , lookup

Metagenomics wikipedia , lookup

Human genome wikipedia , lookup

Microevolution wikipedia , lookup

RNA-Seq wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Non-coding DNA wikipedia , lookup

Primary transcript wikipedia , lookup

NEDD9 wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Genomics wikipedia , lookup

Helitron (biology) wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Genomic library wikipedia , lookup

History of genetic engineering wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

CRISPR wikipedia , lookup

Genome editing wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Transcript
STATE-OF-THE-ART REVIEW
Genetic screens and functional genomics using CRISPR/
Cas9 technology
Ella Hartenian1 and John G. Doench2
1 Department of Molecular and Cellular Biology, University of California Berkeley, Berkeley, CA, USA
2 Broad Institute of MIT and Harvard, Cambridge, MA, USA
Keywords
Cas9; CRISPR; functional genomics; gene
editing; genetic screen; off-target effects; ontarget activity; pooled screens; RNAi; sgRNA
Correspondence
J. Doench, Broad Institute of MIT and
Harvard, Cambridge, MA, 02142-1403, USA
Fax: +1 617 800 1766
Tel: +1 617 714 7260
E-mail: [email protected]
(Received 19 December 2014, revised 1
February 2015, accepted 23 February 2015)
Functional genomics attempts to understand the genome by perturbing the
flow of information from DNA to RNA to protein, in order to learn how
gene dysfunction leads to disease. CRISPR/Cas9 technology is the newest
tool in the geneticist’s toolbox, allowing researchers to edit DNA with
unprecedented ease, speed and accuracy, and representing a novel means to
perform genome-wide genetic screens to discover gene function. In this
review, we first summarize the discovery and characterization of CRISPR/
Cas9, and then compare it to other genome engineering technologies. We
discuss its initial use in screening applications, with a focus on optimizing
on-target activity and minimizing off-target effects. Finally, we comment on
future challenges and opportunities afforded by this technology.
doi:10.1111/febs.13248
Introduction
Protection of a host genome from invading genetic
material is a fundamental challenge faced by all organisms. In prokaryotes, a classical example of a solution
to this problem is use of restriction enzymes to recognize and cut specific sequences in invasive DNA, leading to their degradation. Remarkably, an additional
solution present in a wide variety of prokaryotes went
undetected until very recently: the CRISPR adaptive
immune system. Just as the discovery and re-purposing
of restriction enzymes ushered in the recombinant
DNA revolution in the 1970s, novel methods of engineering the CRISPR system have already resulted in
new tools and techniques that have dramatically accelerated our ability to functionally annotate the genome.
Introduction to CRISPR
Analysis of the Escherichia coli genome in 1987
revealed loci containing repeat sequences with
unknown function [1]. When similarly structured loci
were found in other prokaryotes, the acronym
CRISPR (clustered regularly interspaced short palindromic repeats) was coined [2]. Several years later,
in silico analysis showed that portions of these
sequences mapped to viral and phage genomes, suggesting a role in host genome defense, and, by 2007,
functional studies had validated this hypothesis [3–6].
CRISPR loci have now been identified and characterized in a wide range of both bacteria and archaea, and
function as an adaptive immune system.
As their name suggests, CRISPR loci are organized as
repetitive arrays, with variable sequences of approximately 20 nucleotides known as spacers preceding a relatively invariant common sequence. Endogenous
CRISPR loci are classified into three groups based on
the presence of specific CRISPR-associated (Cas) protein-coding genes located in close proximity to the repetitive arrays [7]. Because both type I and type III
Abbreviations
Cas, CRISPR-associated (protein); CRISPR, clustered regularly interspaced short palindromic repeats; crRNA, CRISPR RNA; DSB, doublestrand DNA break; HDR, homology-directed repair; NHEJ, non-homologous end joining; PAM, protospacer adjacent motif; sgRNA, single
guide RNA; shRNA, short hairpin RNA; tracrRNA, trans-activating crRNA.
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
1383
Use of CRISPR/Cas9 technology
CRISPR systems require multiple proteins to find and
cut DNA, they are not readily amenable to re-purposing.
In contrast, the type II CRISPR systems from Streptococcus thermophilus and Streptococcus pyogenes have
been extensively studied because only a single protein,
Cas9, is required to locate target sequences and perform
DNA cleavage; Cas9 of S. pyogenes, in particular, is the
most widely used due to its minimal additional sequence
requirements, as discussed below [8–13].
Cas proteins perform the two distinct functions of
CRISPR loci: spacer acquisition and target cutting.
Spacer sequences arise from the recognition, processing and insertion of invading genetic material into the
CRISPR loci, serving as a molecular memory to protect from subsequent invasion by the same vector. The
target-cutting phase begins with transcription of the
repetitive array, followed by processing of the spacers
into individual CRISPR RNAs (crRNAs). The processed crRNA hybridizes to a separately transcribed
trans-activating crRNA (tracrRNA), which together
associate with Cas9 and are sufficient for nuclease
activity. The crRNA guides Cas9 to complementary
DNA, which is cleaved by the RuvC and HNH nuclease domains of the Cas9 enzyme [10–12].
The three components necessary for target cutting
(Cas9 protein, tracrRNA and crRNA) may be programmed to cut a DNA sequence of interest simply by
altering the 20 nt variable spacer sequence of the
crRNA [11,12] (Fig. 1). The only sequence constraint
is the presence of a protospacer adjacent motif (PAM)
immediately proximal to the region of crRNA homology on the DNA target; this sequence is NGG for
S. pyogenes Cas9, but other Cas9 proteins have
Fig. 1. Schematic of interaction between sgRNA and DNA. The
crRNA sequence (blue) and tracrRNA sequence (green) are fused
together by a short loop (purple) to create an sgRNA. The 20
underlined nucleotides can be programmed to recognize any DNA
sequence of interest. For S. pyogenes Cas9, the NGG PAM is
required immediately downstream of the target site. The two
strands of DNA are cut by the HNH and RuvC nuclease domains of
Cas9.
1384
E. Hartenian and J. G. Doench
alternative PAM requirements [12,14]. This system
may be further simplified by fusion of the tracrRNA
and crRNA at the DNA level by inclusion of a linker
sequence resulting in transcription of a single guide
RNA (sgRNA) encoding the functionalities of both
RNAs [12]. Expression of these two components (Cas9
and sgRNA) is both necessary and sufficient for applying CRISPR technology to a wide variety of biological
contexts, including human cells [15–17] and mouse
models [18].
Genome editing
For genome editing purposes, generation of a targeted
double-strand DNA break (DSB) is the key event that
opens up multiple repair options both for the cell and
the genome engineer [19,20]. In mammalian cells, such
breaks are generally repaired by one of two pathways,
homology-directed repair (HDR) or non-homologous
end joining (NHEJ). Cells use NHEJ more frequently
than HDR because the latter requires a template
homologous to the regions flanking the break. In all
phases of the cell cycle other than S phase, a homologous region of the chromosome is rarely in close
enough proximity to act as this template, and thus
NHEJ acts as a stop gap to quickly repair the break
and maintain chromosomal integrity. NHEJ is an
error-prone process that uses ligases, nucleases and
polymerases to reseal a break, and generally results in
nucleotides being inserted or deleted (indels) in an
unpredictable process. If the break occurs in a proteincoding region, these indels will often result in a frameshift mutation and subsequent premature stop codon,
abrogating the protein’s function. A properly targeted
DSB engineered in a coding sequence of a protein of
interest may thus result in a loss-of-function allele. As
the efficacy of this process varies based on the target
site and experimental conditions, a mixture of homozygous null, heterozygous and unmodified cells will be
present, and this heterogeneity must be taken into
account when interpreting observed phenotypes.
Alternatively, a DSB may be repaired via HDR by
using homologous DNA as a template. Researchers
may co-opt a cell’s ability to perform high-fidelity repair
to introduce transgenes, tags of interest, or even single
base changes by providing an exogenous template,
either in the form of a short single-strand oligonucleotide or a longer double-strand DNA template, such as a
plasmid. When this DNA contains complementarity to
both sides of the broken DNA, a proportion of cells will
repair the DSB by using the exogenous DNA as a template, thereby incorporating the new information into
the chromosome in a permanent manner. While the
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
E. Hartenian and J. G. Doench
mechanism of HDR is generally understood, it is not
clear how to bias the cell to prefer this pathway over
NHEJ. This remains an active area of research, as current rates of incorporation of templates are often in single digits [21].
Earlier technologies to induce site-specific DNA
breaks, such as zinc-finger nucleases and transcription
activator-like effector nucleases, created DNA targeting specificity by concatenating protein modules, which
were then tethered to the DNA endonuclease domain
of the restriction enzyme FokI. While zinc-finger nucleases and transcription activator-like effector nucleases continue to be important technologies, they are
expensive to access commercially and are laborious to
construct in-house, especially for laboratories that do
not specialize in genome editing technologies. Thus,
one attribute of CRISPR technology that sets it apart
from other genome editing technologies is the simplicity of creating the targeting construct: cloning a short
DNA oligonucleotide into an appropriate plasmid creates a functional CRISPR system.
Introduction to genetic screens
The discovery that CRISPR technology may be easily
adapted to work in mammalian cells and delivered by
lentiviral vectors led to the possibility of developing
CRISPR as a new screening technology. Functional
genetic screens have been performed using previous
technologies for many years, and many of the lessons
and challenges from these screens apply to CRISPR. It
is therefore instructive to briefly review these previous
approaches.
The discovery of RNA interference (RNAi) a decade
ago suggested the possibility of determining loss-offunction phenotypes systematically across many cell
types. Here, delivery of a double-stranded RNA of
approximately 21 nt triggers a pathway that is endogenous to most eukaryotic cells, in which a protein of
the Argonaute family uses one strand of the exogenous
RNA as a guide to cleave and thereby degrade complementary mRNAs. Many discoveries have been
made with RNAi-based screening, using small interfering RNAs directly or DNA-encoded transcriptional
units that produce short hairpin RNAs (shRNAs) or
long microRNA-like transcripts. The technology has
continued to improve, such that effective knockdown
is usually easily achieved [22]. While the successes are
far too numerous to list here, RNAi also has an inherent weakness that arises when the introduced small
RNA enters the microRNA pathway and affects the
expression of many mRNAs that have a complementary ‘seed’ region rather than the full 21 nt of compleFEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
Use of CRISPR/Cas9 technology
mentarity [23–25]. While off-target effects have long
been recognized as a potential confounder in RNAi
experiments, their pervasiveness only started to be
widely appreciated recently [26–29]. Given that any
single RNAi reagent may function in both on- and
off-target pathways, relying on a single RNA sequence
as a surrogate for a gene may lead to erroneous conclusions. Proper interpretation of RNAi experiments
requires that multiple independent sequences targeting
the same gene result in the same phenotype; this
remains true for CRISPR technology [25,30].
Another important aspect of genetic screens, first
with RNAi and now with CRISPR, is that the targeting information is contained in a short stretch of nucleotides that may easily be created on a large scale
using oligonucleotide synthesis technology. This allows
production of large pooled libraries rapidly and less
expensively than an arrayed library of individual
clones (Fig. 2). Pooled libraries comprising many perturbations are then introduced into retroviral vectors
through traditional cloning methods. Retroviral vectors, including lentivirus, have the important feature of
stably incorporating the DNA sequence of the perturbation into the genome of the host cell, resulting in a
permanent, heritable genomic modification. Screening
all perturbations in a pooled format relies on the ability to physically separate cells with the intended phenotype from the remainder of the population.
Pooled screening assays fall into two broadly defined
types: positive and negative selection screens. The simplest example of a negative selection phenotype is cell
viability: perturbations that cause the cell to die are
depleted from the population over time. Such an
approach has been used to profile genes that are
broadly essential to cell viability, as well as to find
genes that are essential to specific cell types [31,32].
This has been especially useful in the context of cancer
biology to identify specific vulnerabilities of cancer
cells from a certain lineage or carrying a particular
oncogene, [33,34]. A limiting feature of negative selection assays is that, by definition, a perturbation may
only be depleted to the extent that it was present in
the starting library. For example, in a library of
100 000 perturbations, each perturbation is present at
10 parts per million (p.p.m.) on average. Therefore,
the most it may be depleted in a screen is down to
0 p.p.m.. This means that negative selection screens
may have more signal when the library size is small
with a corresponding higher relative abundance of
each perturbation. Larger libraries may thus be broken
into smaller sub-pools, or a more targeted library may
be used as a follow-up to the initial screen. Positive
selection screens, however, rely on enrichment of cells,
1385
Use of CRISPR/Cas9 technology
E. Hartenian and J. G. Doench
Fig. 2. Overview of pooled screening. Oligonucleotides are
synthesized in array format (1), cloned into a plasmid library (2),
and then packaged into a retroviral vector (3). Cells are then
infected en masse, typically at low multiplicity of infection, such
that most cells receive either zero or one viral insert (4). Selective
pressure is then applied to either enrich or deplete particular
perturbations (5). Genomic DNA is then harvested and PCRamplified using primers that add the necessary adaptors (6) for
next-generation sequencing (7). The abundance of each sgRNA
under each condition (initial abundance, positive selection, negative
selection, etc.) is the final output of the screen and the starting
point for analysis (8).
1. Oligo synthesis
2. Plasmid pool
3. Virus production
4. Introduce perturbations
to cells
Positive selection:
enrichment of clones
Negative selection:
loss of essential genes
5. Conduct a screen with positive or negative
selection
gDNA vector sgRNA vector gDNA
6. PCR genomic DNA
7. Next generation
sequencing
sgRNA Init.
16
Pos.
0
Neg.
0
19
100
19
17
0
18
18
0
26
14
0
5
14
0
0
8. Matrix of sgRNA
abundances
1386
and therefore have a much larger dynamic range: in
the same library, a single perturbation may enrich
from 10 p.p.m. to 100% of the final population, representing a 100 000-fold enrichment. Because of this,
positive selection screens often yield more easily interpreted results than negative selection screens. Examples of positive selection screens include growth assays
to rescue from a drug that is toxic to the starting cell
population, and flow cytometry assays to purify a population of cells that modulates expression of a sortable
marker.
Regardless of the direction of selection, the identity
of the perturbations in the population(s) must be
determined following physical separation of cells into
two or more populations. Previously, this was often
performed using hybridization techniques such as
microarrays, but, more recently, massively parallel
sequencing has become the method of choice as costs
have dramatically fallen. Constant vector sequences
are used as priming sites for PCR to amplify the perturbations from viral integration sites in genomic
DNA, and the resulting pool of products is sequenced.
Here the perturbation of interest (shRNA or sgRNA)
acts as its own ‘barcode’ as each sequence is unique
and may be mapped to the original gene without any
additional sequence information. The PCR primers
themselves may also incorporate an additional ‘barcode’ to label samples arising from different conditions, allowing multiple independent cell populations
to be pooled into the same sequencing lane to minimize cost. Care must be taken to avoid introducing
bias during the amplification and sequencing steps,
both in terms of use of optimized protocols and inclusion of the appropriate control populations to correct
for these biases when they do occur [35]. Sequencing
produces raw read counts of each perturbation, and
thus the end of a screen (and the start of data analysis)
is a matrix of read counts where the relative abundance of each perturbation is determined for each
sample.
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
Use of CRISPR/Cas9 technology
E. Hartenian and J. G. Doench
Fig. 2. Overview of pooled screening. Oligonucleotides are
synthesized in array format (1), cloned into a plasmid library (2),
and then packaged into a retroviral vector (3). Cells are then
infected en masse, typically at low multiplicity of infection, such
that most cells receive either zero or one viral insert (4). Selective
pressure is then applied to either enrich or deplete particular
perturbations (5). Genomic DNA is then harvested and PCRamplified using primers that add the necessary adaptors (6) for
next-generation sequencing (7). The abundance of each sgRNA
under each condition (initial abundance, positive selection, negative
selection, etc.) is the final output of the screen and the starting
point for analysis (8).
1. Oligo synthesis
2. Plasmid pool
3. Virus production
4. Introduce perturbations
to cells
Positive selection:
enrichment of clones
Negative selection:
loss of essential genes
5. Conduct a screen with positive or negative
selection
gDNA vector sgRNA vector gDNA
6. PCR genomic DNA
7. Next generation
sequencing
sgRNA Init.
16
Pos.
0
Neg.
0
19
100
19
17
0
18
18
0
26
14
0
5
14
0
0
8. Matrix of sgRNA
abundances
1386
and therefore have a much larger dynamic range: in
the same library, a single perturbation may enrich
from 10 p.p.m. to 100% of the final population, representing a 100 000-fold enrichment. Because of this,
positive selection screens often yield more easily interpreted results than negative selection screens. Examples of positive selection screens include growth assays
to rescue from a drug that is toxic to the starting cell
population, and flow cytometry assays to purify a population of cells that modulates expression of a sortable
marker.
Regardless of the direction of selection, the identity
of the perturbations in the population(s) must be
determined following physical separation of cells into
two or more populations. Previously, this was often
performed using hybridization techniques such as
microarrays, but, more recently, massively parallel
sequencing has become the method of choice as costs
have dramatically fallen. Constant vector sequences
are used as priming sites for PCR to amplify the perturbations from viral integration sites in genomic
DNA, and the resulting pool of products is sequenced.
Here the perturbation of interest (shRNA or sgRNA)
acts as its own ‘barcode’ as each sequence is unique
and may be mapped to the original gene without any
additional sequence information. The PCR primers
themselves may also incorporate an additional ‘barcode’ to label samples arising from different conditions, allowing multiple independent cell populations
to be pooled into the same sequencing lane to minimize cost. Care must be taken to avoid introducing
bias during the amplification and sequencing steps,
both in terms of use of optimized protocols and inclusion of the appropriate control populations to correct
for these biases when they do occur [35]. Sequencing
produces raw read counts of each perturbation, and
thus the end of a screen (and the start of data analysis)
is a matrix of read counts where the relative abundance of each perturbation is determined for each
sample.
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
Use of CRISPR/Cas9 technology
essential genes were classified by their ability to deplete
from a population of cells as a surrogate for their ability to induce null alleles. These two approaches gave
similar results, and showed that the choice of sgRNA
sequence may be optimized for greater activity. The
next study to address this issue utilized a tiling
approach to create libraries covering all possible target
sites for a selection of mouse and human cell-surface
markers [47]. Flow cytometry allowed direct detection
and isolation of sgRNAs that led to bi-allelic loss-offunction of the protein of interest. This set of over
1800 sgRNAs of varying efficacy was then used to
characterize sequence features that led to increased
activity. Interestingly, an extended PAM sequence
beyond the canonical NGG motif was shown to affect
activity, with CGGH being the most-optimal sequence
(where H = A, C or T). This quantitative analysis led
to a predictive model for designing optimal sgRNA
sequences for any target of interest.
Initial studies have shown that there are sequence
features that affect the ability of Cas9 to bind sgRNAs, cleave DNA, and result in a loss-of-function
allele. Further characterization of these rules, and
incorporation of them into library design, will be critical for successful deployment of genome-wide libraries,
and will result in libraries with progressively higher
fractions of active sgRNAs. Indeed, a critical analytical step in determining hits from a screen is the
requirement for multiple independent sgRNAs targeting the same gene to score; thus, sgRNAs with low
activity increase the false-negative rates of a screen.
While one approach is simply to increase the size of a
library to increase statistical power, this comes with
the trade-off of requiring more cells for a screen. Larger libraries increase the cost of a screen, and, for
many biological models, such as use of primary cells
or in vivo screens, increased cell numbers simply may
not be feasible.
Minimizing off-target activity
While optimizing on-target activity has clear ramifications for use of CRISPR technology, understanding
off-target effects is equally important to avoid erroneous interpretation of experimental results. Initial
experiments in mammalian cells showed that S. pyogenes Cas9 cleavage activity tolerates a number of mismatches between the sgRNA and the DNA target
[48,49]. In general, mismatches closer to the 50 end of
the RNA are more tolerated than mismatches close to
the PAM. Currently, there are not enough data to create fully predictive models of when an sgRNA will
lead to appreciable levels of off-target DNA cleavage,
1388
E. Hartenian and J. G. Doench
as the exact base composition of the mismatch appears
to affect activity [48,50]. Further, non-symmetric pairing between the RNA and DNA, e.g. creation of
bulges on either side of the duplex, is also tolerated by
Cas9 [51]. These interactions are particularly troubling
as current rapid alignment algorithms that scan for
potential off-target sites are blind to these interactions,
meaning that all current design software ignores this
type of potential off-target activity. Given the tolerance of mismatches by Cas9 and the likelihood of finding certain sequence motifs in multiple locations in the
genome, it is essential to consider possible off-target
locations when designing sgRNA sequences.
Several strategies have been developed to minimize
the off-target effects of Cas9. One uses a shortened
sgRNA of 17 nt, rather than the standard 20 nt
sequence; under the conditions tested, the 17 nt
sequence was shown to have comparable on-target
activity but decreased tolerance to mismatches [52]. To
date, no large-scale libraries have been created using
shorter sgRNAs, and this will be an important test of
the generalizability of the efficacy of this approach. An
additional approach borrows conceptually from zincfinger and transcription activator-like effector nuclease
designs, by requiring two sgRNAs to find target sites
in close proximity in the genome. One version of this
approach involves a ‘nickase’ version of Cas9 (Cas9n),
where one of the two DNA cleavage domains is inactivated, and thus each sgRNA performs a single-strand
cut [53]. When two nicks occur physically close to each
other on each strand, the result is a DSB. This design
minimizes off-target modifications, as single nicks in
genomic DNA at off-target sites are often repaired
without introduction of indels. Similarly, monomers of
FokI may be fused to dCas9 such that two sgRNA
sequences are required to reconstitute active FokI
activity and initiate a DSB [54]. Off-target rates have
been shown to decrease > 1000 fold using these
approaches. The need to deliver two sgRNA sequences
may limit the utility of this approach for large-scale
screens, but this approach is preferred in cases where
specificity is especially important (e.g. the creation of
edited cell lines).
A series of genome-wide chromatin immunoprecipitation experiments followed by massively parallel
DNA sequencing (ChIP-seq) of dCas9 programmed
with various sgRNAs has shown that Cas9 is more
likely to associate with DNA in an open chromatin
state [55,56]. Surprisingly, a small seed region of 5–8
nucleotides was shown to be sufficient to allow binding
of Cas9 to hundreds of DNA target sites, raising the
possibility of rampant off-target effects. Importantly,
however, the same sites did not show evidence of
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
E. Hartenian and J. G. Doench
cutting when nuclease-active Cas9 was used, reaffirming that near-perfect sequence complementarity is
required for robust endonuclease activity. In vitro data
have also shown that Cas9 first interrogates PAM sites
in DNA and then spends progressively more time
associated with DNA containing 30 ?50 complementarity to the sgRNA sequence [57]. These data, combined
with the specificity shown by CRISPRa and CRISPRi
studies discussed above, suggest that the initial step(s)
of Cas9 target-site recognition, while of a time scale
long enough to be detected by ChIP-seq, are largely
transient in nature unless there is complementarity
throughout the length of the sgRNA.
Two studies have performed whole-genome sequencing of cells following editing with Cas9 [58,59]. While
this approach is not feasible as a routine test, and the
number of unique sgRNA sequences surveyed was limited, the degree of editing at potential off-target sites
was no greater than at other unrelated sites in the genome. Importantly, these studies suggest that mutations
that arise during clonal expansion of single cells are as
much a concern as any off-target effects of Cas9.
Analogous to the requirement for multiple independent sgRNAs targeting a gene to score before considering that gene as validated, experiments that rely on
edited single cell clones should utilize multiple independent clones to decrease the chances of a random
mutation interfering with interpretation.
Continued development and future
challenges
The amount of progress made with CRISPR technology in the past several years is staggering, and the
future offers many opportunities and challenges. For
example, Cas9 from S. pyogenes was one of the first
Cas9 enzymes studied and continues to be extensively
used. While a small number of Cas9 proteins from
other species have begun to be explored, there are
thousands of Cas9 proteins endogenous to various
prokaryotes that may be better suited to a variety of
purposes. S. pyogenes Cas9 is quite large and thus is
difficult to efficiently package into many viral vectors.
Smaller Cas9 proteins have been described but are not
fully developed for widespread use in screening and
routine genome editing. Additional considerations
include the tolerance for appending additional functionalities, such as fusions with transcriptional
domains, as well as the ability to fold and localize efficiently in mammalian cells. Likewise, the search space
of the genome available to Cas9 remains an open
question, as the ChIP-seq experiments discussed above
suggest the possibility that sequences in closed
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
Use of CRISPR/Cas9 technology
chromatin may be more refractory to modification.
Furthermore, the NGG PAM requirement of S. pyogenes Cas9 limits the sequence space accessible for targeting, which becomes increasingly problematic when
the location of editing is of utmost importance, such
as when knocking in a reporter construct at the N- or
C-terminus of a protein, or modeling a specific SNP.
It is worth noting, however, that S. pyogenes Cas9
may soon become fixed in its usage in practice, as any
new Cas9 protein will require a great deal of specificity
and activity characterization before it may be used
appropriately.
The RNA component of the CRISPR system will
also undergo further refinement. Through creation of
larger datasets and enhanced modeling techniques, the
variable sequence of the sgRNA will continue to
undergo design optimization to increase on-target
activity and avoid off-target effects. Likewise, the tracrRNA-derived constant region of the sgRNA that
associates with Cas9 tolerates mutations in some
regions, thus opening the possibility that novel
sequence modifications may be found that increase the
efficacy of this module [45,60]. Library design and testing lends itself to rapid cycles of improvement and
innovation when pooled screens are a possibility.
However, assays such as gene expression or imagebased profiling require arrayed reagents, i.e. a library
of one perturbation per well. Here, achieving full penetrance in the population of cells that receive the perturbation is critical. Further, as the costs of producing
such arrayed libraries are an order of magnitude more
expensive than for pooled libraries, a careful evaluation of the robustness of CRISPR technology for these
approaches is required.
Despite improvements in creating site-specific
dsDNA breaks, first with zinc-finger nucleases, then
transcription activator-like effector nucleases, and now
with Cas9, site-specific incorporation of an exogenously provided DNA sequence remains a highly inefficient process (< 1% efficiency) in many cell types.
While strong selection may be used for certain types of
edits, such as knocking in a fluorescent protein and
using flow cytometry to enrich for edited cells, many
desired gene editing events do not lend themselves to
any means of physically separating the small fraction
of edited cells from the remainder of the population.
For example, it would be highly desirable to model
single base mutations discovered in cancer genome
sequencing at high throughput [61]. Likewise, the
majority of significant associations from genome-wide
association studies are found in non-protein-coding
regions of the genome, and thus gene editing of the
endogenous locus is the only way to fully understand
1389
Use of CRISPR/Cas9 technology
essential genes were classified by their ability to deplete
from a population of cells as a surrogate for their ability to induce null alleles. These two approaches gave
similar results, and showed that the choice of sgRNA
sequence may be optimized for greater activity. The
next study to address this issue utilized a tiling
approach to create libraries covering all possible target
sites for a selection of mouse and human cell-surface
markers [47]. Flow cytometry allowed direct detection
and isolation of sgRNAs that led to bi-allelic loss-offunction of the protein of interest. This set of over
1800 sgRNAs of varying efficacy was then used to
characterize sequence features that led to increased
activity. Interestingly, an extended PAM sequence
beyond the canonical NGG motif was shown to affect
activity, with CGGH being the most-optimal sequence
(where H = A, C or T). This quantitative analysis led
to a predictive model for designing optimal sgRNA
sequences for any target of interest.
Initial studies have shown that there are sequence
features that affect the ability of Cas9 to bind sgRNAs, cleave DNA, and result in a loss-of-function
allele. Further characterization of these rules, and
incorporation of them into library design, will be critical for successful deployment of genome-wide libraries,
and will result in libraries with progressively higher
fractions of active sgRNAs. Indeed, a critical analytical step in determining hits from a screen is the
requirement for multiple independent sgRNAs targeting the same gene to score; thus, sgRNAs with low
activity increase the false-negative rates of a screen.
While one approach is simply to increase the size of a
library to increase statistical power, this comes with
the trade-off of requiring more cells for a screen. Larger libraries increase the cost of a screen, and, for
many biological models, such as use of primary cells
or in vivo screens, increased cell numbers simply may
not be feasible.
Minimizing off-target activity
While optimizing on-target activity has clear ramifications for use of CRISPR technology, understanding
off-target effects is equally important to avoid erroneous interpretation of experimental results. Initial
experiments in mammalian cells showed that S. pyogenes Cas9 cleavage activity tolerates a number of mismatches between the sgRNA and the DNA target
[48,49]. In general, mismatches closer to the 50 end of
the RNA are more tolerated than mismatches close to
the PAM. Currently, there are not enough data to create fully predictive models of when an sgRNA will
lead to appreciable levels of off-target DNA cleavage,
1388
E. Hartenian and J. G. Doench
as the exact base composition of the mismatch appears
to affect activity [48,50]. Further, non-symmetric pairing between the RNA and DNA, e.g. creation of
bulges on either side of the duplex, is also tolerated by
Cas9 [51]. These interactions are particularly troubling
as current rapid alignment algorithms that scan for
potential off-target sites are blind to these interactions,
meaning that all current design software ignores this
type of potential off-target activity. Given the tolerance of mismatches by Cas9 and the likelihood of finding certain sequence motifs in multiple locations in the
genome, it is essential to consider possible off-target
locations when designing sgRNA sequences.
Several strategies have been developed to minimize
the off-target effects of Cas9. One uses a shortened
sgRNA of 17 nt, rather than the standard 20 nt
sequence; under the conditions tested, the 17 nt
sequence was shown to have comparable on-target
activity but decreased tolerance to mismatches [52]. To
date, no large-scale libraries have been created using
shorter sgRNAs, and this will be an important test of
the generalizability of the efficacy of this approach. An
additional approach borrows conceptually from zincfinger and transcription activator-like effector nuclease
designs, by requiring two sgRNAs to find target sites
in close proximity in the genome. One version of this
approach involves a ‘nickase’ version of Cas9 (Cas9n),
where one of the two DNA cleavage domains is inactivated, and thus each sgRNA performs a single-strand
cut [53]. When two nicks occur physically close to each
other on each strand, the result is a DSB. This design
minimizes off-target modifications, as single nicks in
genomic DNA at off-target sites are often repaired
without introduction of indels. Similarly, monomers of
FokI may be fused to dCas9 such that two sgRNA
sequences are required to reconstitute active FokI
activity and initiate a DSB [54]. Off-target rates have
been shown to decrease > 1000 fold using these
approaches. The need to deliver two sgRNA sequences
may limit the utility of this approach for large-scale
screens, but this approach is preferred in cases where
specificity is especially important (e.g. the creation of
edited cell lines).
A series of genome-wide chromatin immunoprecipitation experiments followed by massively parallel
DNA sequencing (ChIP-seq) of dCas9 programmed
with various sgRNAs has shown that Cas9 is more
likely to associate with DNA in an open chromatin
state [55,56]. Surprisingly, a small seed region of 5–8
nucleotides was shown to be sufficient to allow binding
of Cas9 to hundreds of DNA target sites, raising the
possibility of rampant off-target effects. Importantly,
however, the same sites did not show evidence of
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
E. Hartenian and J. G. Doench
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
thermophilus CRISPR/Cas system provides immunity in
Escherichia coli. Nucleic Acids Res 39, 9275–9282.
Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna
JA & Charpentier E (2012) A programmable dualRNA-guided DNA endonuclease in adaptive bacterial
immunity. Science 337, 816–821.
Gasiunas G, Barrangou R, Horvath P & Siksnys V (2012)
Cas9-crRNA ribonucleoprotein complex mediates specific
DNA cleavage for adaptive immunity in bacteria. Proc
Natl Acad Sci USA 109, E2579–E2586.
Esvelt KM, Mali P, Braff JL, Moosburner M, Yaung
SJ & Church GM (2013) Orthogonal Cas9 proteins for
RNA-guided gene regulation and editing. Nat Methods
10, 1116–1121.
Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo
JE, Norville JE & Church GM (2013) RNA-guided
human genome engineering via Cas9. Science 339,
823–826.
Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N,
Hsu PD, Wu X, Jiang W, Marraffini LA et al. (2013)
Multiplex genome engineering using CRISPR/Cas
systems. Science 339, 819–823.
Jinek M, East A, Cheng A, Lin S, Ma E & Doudna J
(2013) RNA-programmed genome editing in human
cells. eLife 2, e00471–e00471.
Yang H, Wang H, Shivalila CS, Cheng AW, Shi L &
Jaenisch R (2013) One-step generation of mice carrying
reporter and conditional alleles by CRISPR/Casmediated genome engineering. Cell 154, 1370–1379.
Heyer W-D, Ehmsen KT & Liu J (2010) Regulation of
homologous recombination in eukaryotes. Annu Rev
Genet 44, 113–139.
Renkawitz J, Lademann CA & Jentsch S (2014)
Mechanisms and principles of homology search during
recombination. Nat Rev Mol Cell Biol 15, 369–383.
Yang L, G€
uell M, Byrne S, Yang JL, De Los Angeles
A, Mali P, Aach J, Kim-Kiselak C, Briggs AW, Rios X
et al. (2013) Optimization of scarless human stem cell
genome editing. Nucleic Acids Res 41, 9049–9061.
Knott SRV, Maceli AR, Erard N, Chang K, Marran
K, Zhou X, Gordon A, El Demerdash O, Wagenblast
E, Kim S et al. (2014) A computational algorithm to
predict shRNA potency. Mol Cell 56, 796–807.
Jackson AL, Burchard J, Schelter J, Chau BN, Cleary
M, Lim L & Linsley PS (2006) Widespread siRNA ‘offtarget’ transcript silencing mediated by seed region
sequence complementarity. RNA 12, 1179–1187.
Kaelin WG (2012) Molecular biology. Use and abuse
of RNAi to study mammalian gene function. Science
337, 421–422.
Sigoillot FD & King RW (2011) Vigilance and
validation: keys to success in RNAi screening. ACS
Chem Biol 6, 47–60.
Jackson AL, Bartz SR, Schelter J, Kobayashi SV,
Burchard J, Mao M, Li B, Cavet G & Linsley PS
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
Use of CRISPR/Cas9 technology
27
28
29
30
31
32
33
34
35
36
37
38
(2003) Expression profiling reveals off-target gene
regulation by RNAi. Nat Biotechnol 21, 635–637.
Doench JG, Petersen CP & Sharp PA (2003) siRNAs
can function as miRNAs. Genes Dev 17, 438–442.
Buehler E, Khan AA, Marine S, Rajaram M, Bahl A,
Burchard J & Ferrer M (2012) siRNA off-target effects
in genome-wide screens identify signaling pathway
members. Sci Rep 2, 428.
Sigoillot FD, Lyman S, Huckins JF, Adamson B,
Chung E, Quattrochi B & King RW (2012) A
bioinformatics method identifies prominent off-targeted
transcripts in RNAi screens. Nat Methods 9, 363–366.
Echeverri CJ, Beachy PA, Baum B, Boutros M,
Buchholz F, Chanda SK, Downward J, Ellenberg J,
Fraser AG, Hacohen N et al. (2006) Minimizing the
risk of reporting false positives in large-scale RNAi
screens. Nat Methods 3, 777–779.
Westbrook TF, Martin ES, Schlabach MR, Leng Y,
Liang AC, Feng B, Zhao JJ, Roberts TM, Mandel G,
Hannon GJ et al. (2005) A genetic screen for candidate
tumor suppressors identifies REST. Cell 121, 837–848.
Luo B, Cheung HW, Subramanian A, Sharifnia T,
Okamoto M, Yang X, Hinkle G, Boehm JS,
Beroukhim R, Weir BA et al. (2008) Highly parallel
identification of essential genes in cancer cells. Proc
Natl Acad Sci USA 105, 20380–20385.
Cheung HW, Cowley GS, Weir BA, Boehm JS, Rusin
S, Scott JA, East A, Ali LD, Lizotte PH, Wong TC
et al. (2011) Systematic investigation of genetic
vulnerabilities across cancer cell lines reveals lineagespecific dependencies in ovarian cancer. Proc Natl Acad
Sci USA 108, 12372–12377.
Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody
SE, Dunn IF, Schinzel AC, Sandy P, Meylan E, Scholl
C et al. (2009) Systematic RNA interference reveals
that oncogenic KRAS-driven cancers require TBK1.
Nature 462, 108–112.
Licon A, Haimes J, Spayd KJ, Patel KM,
Strezoska Z,
Sullivan K, Jastrzebski K, Simpson KJ, Leake D, van
Brabant Smith A et al. (2012) Optimized PCR
conditions and increased shRNA fold representation
improve reproducibility of pooled shRNA screens.
PLoS One 7, e42341.
Wang T, Wei JJ, Sabatini DM & Lander ES (2014)
Genetic screens in human cells using the CRISPR-Cas9
system. Science 343, 80–84.
Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA,
Mikkelsen TS, Heckl D, Ebert BL, Root DE, Doench
JG et al. (2014) Genome-scale CRISPR-Cas9 knockout
screening in human cells. Science 343, 84–87.
Whittaker SR, Theurillat JP, Van Allen E, Wagle N,
Hsiao J, Cowley GS, Schadendorf D, Root DE &
Garraway LA (2013) A genome-scale RNA interference
screen implicates NF1 loss in resistance to RAF
inhibition. Cancer Discov 3, 350–362.
1391
Use of CRISPR/Cas9 technology
39 Huang S, H€
olzel M, Knijnenburg T, Schlicker A,
Roepman P, McDermott U, Garnett M, Grernrum W,
Sun C, Prahallad A et al. (2012) MED12 controls the
response to multiple cancer drugs through regulationof
TGF-b receptor signaling. Cell 151, 937–950.
40 Koike-Yusa H, Li Y, Tan E-P, Del Castillo VelascoHerrera M & Yusa K (2014) Genome-wide recessive
genetic screening in mammalian cells with a lentiviral
CRISPR-guide RNA library. Nat Biotechnol 32, 267–
273.
41 Zhou Y, Zhu S, Cai C, Yuan P, Li C, Huang Y & Wei
W (2014) High-throughput screening of a CRISPR/
Cas9 library for functional genomics in human cells.
Nature 509, 487–491.
42 Heckl D, Kowalczyk MS, Yudovich D, Belizaire R,
Puram RV, McConkey ME, Thielke A, Aster JC,
Regev A & Ebert BL (2014) Generation of mouse
models of myeloid malignancy with combinatorial
genetic lesions using CRISPR-Cas9 genome editing.
Nat Biotechnol 32, 941–946.
43 Gilbert LA, Larson MH, Morsut L, Liu Z, Brar GA,
Torres SE, Stern-Ginossar N, Brandman O, Whitehead
EH, Doudna JA et al. (2013) CRISPR-mediated
modular RNA-guided regulation of transcription in
eukaryotes. Cell 154, 442–451.
44 Gilbert LA, Horlbeck MA, Adamson B, Villalta JE,
Chen Y, Whitehead EH, Guimaraes C, Panning B,
Ploegh HL, Bassik MC et al. (2014) Genome-scale
CRISPR-mediated control of gene repression and
activation. Cell 159, 647–661.
45 Konermann S, Brigham MD, Trevino AE, Joung J,
Abudayyeh OO, Barcena C, Hsu PD, Habib N,
Gootenberg JS, Nishimasu H et al. (2014) Genomescale transcriptional activation by an engineered
CRISPR-Cas9 complex. Nature 517, 583–588.
46 Zalatan JG, Lee ME, Almeida R, Gilbert LA,
Whitehead EH, La Russa M, Tsai JC, Weissman JS,
Dueber JE, Qi LS et al. (2015) Engineering complex
synthetic transcriptional programs with CRISPR RNA
scaffolds. Cell 160, 339–350.
47 Doench JG, Hartenian E, Graham DB, Tothova Z,
Hegde M, Smith I, Sullender M, Ebert BL, Xavier RJ
& Root DE (2014) Rational design of highly active
sgRNAs for CRISPR-Cas9-mediated gene inactivation.
Nat Biotechnol 32, 1262–1267.
48 Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann
S, Agarwala V, Li Y, Fine EJ, Wu X, Shalem O et al.
(2013) DNA targeting specificity of RNA-guided Cas9
nucleases. Nat Biotechnol 31, 827–832.
49 Fu Y, Foden JA, Khayter C, Maeder ML, Reyon D,
Joung JK & Sander JD (2013) High-frequency offtarget mutagenesis induced by CRISPR-Cas nucleases
in human cells. Nat Biotechnol 31, 822–826.
50 Fu BXH, Hansen LL, Artiles KL, Nonet ML & Fire
AZ (2014) Landscape of target:guide homology effects
1392
E. Hartenian and J. G. Doench
51
52
53
54
55
56
57
58
59
60
61
62
63
on Cas9-mediated cleavage. Nucleic Acids Res 42,
13778–13787.
Lin Y, Cradick TJ, Brown MT, Deshmukh H, Ranjan
P, Sarode N, Wile BM, Vertino PM, Stewart FJ & Bao
G (2014) CRISPR/Cas9 systems have off-target activity
with insertions or deletions between target DNA and
guide RNA sequences. Nucleic Acids Res 42, 7473–
7485.
Fu Y, Sander JD, Reyon D, Cascio VM & Joung JK
(2014) Improving CRISPR-Cas nuclease specificity using
truncated guide RNAs. Nat Biotechnol 32, 279–284.
Ran FA, Hsu PD, Lin C-Y, Gootenberg JS,
Konermann S, Trevino AE, Scott DA, Inoue A,
Matoba S, Zhang Y et al. (2013) Double nicking by
RNA-guided CRISPR Cas9 for enhanced genome
editing specificity. Cell 154, 1380–1389.
Guilinger JP, Thompson DB & Liu DR (2014) Fusion
of catalytically inactive Cas9 to FokI nuclease improves
the specificity of genome modification. Nat Biotechnol
32, 577–582.
Kuscu C, Arslan S, Singh R, Thorpe J & Adli M
(2014) Genome-wide analysis reveals characteristics of
off-target sites bound by the Cas9 endonuclease. Nat
Biotechnol 32, 677–683.
Wu X, Scott DA, Kriz AJ, Chiu AC, Hsu PD, Dadon
DB, Cheng AW, Trevino AE, Konermann S, Chen S
et al. (2014) Genome-wide binding of the CRISPR
endonuclease Cas9 in mammalian cells. Nat Biotechnol
32, 670–676.
Sternberg SH, Redding S, Jinek M, Greene EC &
Doudna JA (2014) DNA interrogation by the CRISPR
RNA-guided endonuclease Cas9. Nature 507, 62–67.
Smith C, Gore A, Yan W, Abalde-Atristain L, Li Z,
He C, Wang Y, Brodsky RA, Zhang K, Cheng L et al.
(2014) Whole-genome sequencing analysis reveals high
specificity of CRISPR/Cas9 and TALEN-based genome
editing in human iPSCs. Cell Stem Cell 15, 12–13.
Veres A, Gosis BS, Ding Q, Collins R, Ragavendran A,
Brand H, Erdin S, Talkowski ME & Musunuru K
(2014) Brief report. Stem Cell 15, 27–30.
Nishimasu H, Ran FA, Hsu PD, Konermann S,
Shehata SI, Dohmae N, Ishitani R, Zhang F & Nureki
O (2014) Crystal structure of Cas9 in complex with
guide RNA and target DNA. Cell 156, 935–949.
Lawrence MS, Stojanov P, Mermel CH, Robinson JT,
Garraway LA, Golub TR, Meyerson M, Gabriel SB,
Lander ES & Getz G (2014) Discovery and saturation
analysis of cancer genes across 21 tumour types. Nature
505, 495–501.
ENCODE Project Consortium (2012) An integrated
encyclopedia of DNA elements in the human genome.
Nature 489, 57–74.
Shrivastav M, De Haro LP & Nickoloff JA (2008)
Regulation of DNA double-strand break repair
pathway choice. Cell Res 18, 134–147.
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
E. Hartenian and J. G. Doench
64 Lin S, Staahl BT, Alla RK & Doudna JA (2014)
Enhanced homology-directed human genome
engineering by controlled timing of CRISPR/Cas9
delivery. eLife 4, e04766.
65 Chen B, Gilbert LA, Cimini BA, Schnitzbauer J, Zhang
W, Li G-W, Park J, Blackburn EH, Weissman JS, Qi
LS et al. (2013) Dynamic imaging of genomic lociin
FEBS Journal 282 (2015) 1383–1393 ª 2015 FEBS
Use of CRISPR/Cas9 technology
living human cells by an optimized CRISPR/Cas
system. Cell 155, 1479–1491.
66 Tanenbaum ME, Gilbert LA, Qi LS, Weissman JS &
Vale RD (2014) A protein-tagging system for signal
amplification in gene expression and fluorescence
imaging. Cell 159, 635–646.
1393