Download Intrinsic sequence specificity of the Cas1 integrase directs

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Homologous recombination wikipedia , lookup

DNA replication wikipedia , lookup

Helicase wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

DNA polymerase wikipedia , lookup

United Kingdom National DNA Database wikipedia , lookup

DNA nanotechnology wikipedia , lookup

Microsatellite wikipedia , lookup

CRISPR wikipedia , lookup

Replisome wikipedia , lookup

Helitron (biology) wikipedia , lookup

Transcript
1
Intrinsic sequence specificity of the Cas1 integrase directs new spacer
2
acquisition
3
Clare Rollie1, Stefanie Schneider2, Anna Sophie Brinkmann 3, Edward L Bolt3 and Malcolm
4
F White1
5
1
6
Fife KY16 9ST, UK. 2 Institute of Cell Biology (Cancer Research), Faculty of Medicine, University of
7
Duisburg-Essen, Essen, Germany.
8
School, Nottingham, UK.
9
*for correspondence: email [email protected]; Tel +44-1334 463432.
Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews,
3
Life Sciences, University of Nottingham, QMC Medical
10
11
Competing Interests statement: The authors declare that they have no competing interests.
12
13
Abstract
14
The adaptive prokaryotic immune system CRISPR-Cas provides RNA-mediated protection from
15
invading genetic elements. The fundamental basis of the system is the ability to capture small
16
pieces of foreign DNA for incorporation into the genome at the CRISPR locus, a process known as
17
Adaptation, which is dependent on the Cas1 and Cas2 proteins. We demonstrate that Cas1
18
catalyses an efficient trans-esterification reaction on branched DNA substrates, which represents
19
the reverse- or disintegration reaction. Cas1 from both Escherichia coli and Sulfolobus solfataricus
20
display sequence specific activity, with a clear preference for the nucleotides flanking the
21
integration site at the leader-repeat 1 boundary of the CRISPR locus. Cas2 is not required for this
22
activity and does not influence the specificity. This suggests that the inherent sequence specificity
23
of Cas1 is a major determinant of the adaptation process.
24
25
Introduction
26
The CRISPR-Cas system is an adaptive immune system present in many archaeal and bacterial
27
species. It provides immunity against viruses and other mobile genetic elements mediated through
28
sequence homology-directed detection and destruction of foreign nucleic acid species (reviewed in
29
(Sorek et al., 2013, Barrangou and Marraffini, 2014, van der Oost et al., 2014)). Organisms with an
30
active CRISPR-Cas system encode one or more CRISPR loci in their genomes. These comprise a
31
leader sequence followed by an array of short, direct, often palindromic repeats interspersed by
32
short “spacer” sequences, together with a variable number of CRISPR associated (cas) genes.
33
Spacers are derived from mobile genetic elements and constitute a “memory” of past infections.
34
The CRISPR locus is transcribed from the leader, generating pre-CRISPR RNA (pre-crRNA) that is
35
subsequently processed into unit length crRNA species and loaded into large ribonucleoprotein
36
complexes. These complexes, known as “Interference”, “Effector” or “Surveillance” complexes,
1
37
utilize crRNA to detect and degrade cognate invading genetic entities, providing immunity against
38
previously encountered viruses and plasmids.
39
40
The process of foreign DNA capture and integration into the CRISPR locus is known as
41
“Acquisition” or “Adaptation” (reviewed in (Fineran and Charpentier, 2012, Heler et al., 2014)). This
42
process underpins the whole CRISPR-Cas system, but is the least well understood aspect.
43
Fundamentally, acquisition can be separated into two steps: the capture of new DNA sequences
44
from mobile genetic elements, followed by the integration of that DNA into the host genome. New
45
spacers are incorporated proximal to the leader sequence and result in the duplication of the first
46
repeat (Goren et al., 2012, Yosef et al., 2012). The leader sequence proximal to the repeat is
47
important for integration, but transcription of the locus is not required (Yosef et al., 2012). New
48
spacers are frequently incorporated in both possible orientations (Shmakov et al., 2014). The
49
integration process in E. coli results in staggered cleavage of the first CRISPR repeat, where the
50
3’-end of one strand of the incoming DNA is joined to the end of the CRISPR repeat in a trans-
51
esterification (TES) reaction, with another TES reaction occurring on the other strand (Diez-
52
Villasenor et al., 2013) (Figure 1, numbered yellow arrows). Intermediates in this reaction have
53
been observed in E. coli, and the sequence of the leader:repeat1 junction is important for the
54
activity (Arslan et al., 2014). The order of the two integration steps shown in Figure 1 is not yet
55
known, although it has been suggested that the reaction on the minus strand (site 2) occurs first in
56
E. coli (Nunez et al., 2015). The sequence at site 2 is flanked by the end of the first repeat and the
57
first spacer, and is therefore inherently less well conserved. In E. coli, the last nucleotide of the
58
newly copied repeat is derived from the first nucleotide of the incoming spacer, which imposes
59
further sequence requirements on that system (Swarts et al., 2012).
60
61
Although shown in Figure 1 as fully double-stranded, the incoming spacer DNA could have a
62
partially single-stranded character. Recent work by Sorek and colleagues has shown that the E.
63
coli RecBCD helicase-nuclease complex, which processes DNA double-strand breaks for
64
recombination and degrades foreign DNA, is implicated in the generation of viral DNA fragments
65
captured by Cas1 and incorporated into the CRISPR locus as new spacers (Levy et al., 2015).
66
This confirms previous observations linking Cas1 with RecBCD (Babu et al., 2011) and raises
67
some intriguing questions as RecBCD generates ssDNA fragments asymmetrically, generating
68
fragments tens to hundreds of nucleotides long from the 3’ terminated strand and much longer
69
fragments from the 5’ terminated strand (reviewed in (Dillingham and Kowalczykowski, 2008)). The
70
Cas4 enzyme, which is a 5’ to 3’ ssDNA exonuclease (Lemak et al., 2013, Zhang et al., 2012),
71
may provide ssDNA fragments for Cas1 in systems lacking RecBCD. However, it is difficult to see
72
how two integration reactions could occur without two 3’ hydroxyl termini (Figure 1) and half-site
73
integration is not observed with a ssDNA substrate (Nunez et al., 2015). Potentially, the ssDNA
2
74
fragments generated by these nucleases may re-anneal and experience further processing to
75
generate partially duplex molecules of defined size prior to integration by Cas1.
76
77
Adaptation requires the products of the cas1 and cas2 genes in vivo and these are the most
78
universally conserved of all the cas genes (Makarova et al., 2006). Cas1 is a homodimeric enzyme
79
with a two-domain structure and a canonical metal dependent nuclease active site in the large
80
domain formed by a cluster of highly conserved residues (Wiedenheft et al., 2009) (Figure 1C). E.
81
coli Cas1 has nuclease activity in vitro, with activity observed against double- and single-stranded
82
DNA and RNA substrates (Babu et al., 2011). Some specificity was observed for branched DNA
83
substrates, in particular for DNA constructs resembling replication forks (Babu et al., 2011). Initial
84
biochemical analyses of a panel of archaeal Cas2 enzymes revealed an endonucleolytic activity
85
against ssRNA substrates that could be abrogated by mutation of conserved residues
86
(Beloglazova et al., 2008). In contrast, Cas2 from Bacillus halodurans has been shown to be
87
specific for cleavage of dsDNA substrates (Nam et al., 2012). Recently however, Doudna and
88
colleagues demonstrated that E. coli Cas1 and Cas2 form a stable complex in vitro and that the
89
“active site” of Cas2 was not required for spacer acquisition, suggesting that Cas2 may not have a
90
catalytic role in spacer acquisition in vivo (Nunez et al., 2014). It is probable that Cas2 acts as an
91
adaptor protein, either bringing two Cas1 dimers together or mediating interactions with other
92
components necessary for spacer acquisition. Recently, Nunez et al. demonstrated that E. coli
93
Cas1 can integrate a protospacer into a supercoiled plasmid in vitro, in a reaction stimulated by
94
Cas2. Integration events were observed at the boundaries of most CRISPR repeats and in other
95
areas of the DNA close to palindromic regions, implicating a role for palindromic DNA structure in
96
the adaptation process (Nunez et al., 2015).
97
98
In order to further mechanistic understanding of the spacer acquisition process, we have
99
undertaken a systematic analysis of the underlying biochemistry. We demonstrate that Cas1
100
catalyses transesterification of branched DNA substrates efficiently in vitro in a reaction that
101
represents the reverse- or disintegration of an incoming spacer from the CRISPR locus. The
102
disintegration reactions catalysed by diverse integrases have proven a powerful model system for
103
the understanding of the mechanism of integration. For Cas1, the reaction is strongly sequence
104
dependent with the specificity matching the predicted integration site 1 for both E. coli and S.
105
solfataricus Cas1, and does not require Cas2.
106
107
Results
108
Cas1 catalyzes a transesterification reaction on branched DNA substrates
109
The Cas1 and Cas2 proteins from S. solfataricus (CRISPR-Cas subtype IA) and E. coli (CRISPR-
110
Cas subtype IE) were expressed in E. coli with N-terminal polyhistidine tags and purified to
111
homogeneity by metal affinity and gel filtration chromatography. Previously, it was demonstrated
3
112
that E. coli Cas1 (EcoCas1) cleaved branched DNA substrates preferentially (Babu et al., 2011).
113
We investigated the activity of S. solfataricus Cas1 (SsoCas1) against a DNA substrate with a 5’-
114
flap structure (Figure 2). By labelling the single-stranded 5’-flap of the downstream duplex at the 5’-
115
end with radioactive phosphorus, we observed cleavage of the flap by SsoCas1, releasing an 18 nt
116
product (Figure 2B). A variant of SsoCas1 where the active site metal ligand glutamic acid 142 was
117
changed to an alanine (E142A) was inactive, implicating the canonical nuclease active site of Cas1
118
in the reaction. This result appeared consistent with the earlier studies for EcoCas1 (Babu et al.,
119
2011) and suggested that the ssDNA flap was removed at or close to the branch point. The activity
120
was independent of the presence or absence of SsoCas2, suggesting that Cas2 is not involved in
121
122
123
this nuclease activity in vitro.
124
detail. The continuous (black) strand was not cleaved by SsoCas1 (Figure 2A). However, when the
125
23 nt (green) strand of the upstream duplex presenting a 3’-hydroxyl end at the junction point was
126
labelled we observed the formation of a new, larger DNA species (Figure 2C). This observation
127
was consistent with a joining or transesterification (TES) of the upstream DNA to the downstream
128
DNA strand by Cas1. By switching to a label at the 3’ end of the downstream duplex we confirmed
129
that the reaction catalyzed by Cas1 involved the joining of the upstream and downstream DNA
130
duplexes with concomitant loss of the 5’-flap (Figure 2D). The lack of evidence for any shorter DNA
131
species in Figure 2D was consistent with the conclusion that we were observing a TES rather than
132
a nuclease reaction. Again, the TES reaction was unaffected by the presence or absence of Cas2
133
and was dependent on the wild-type active site of Cas1, as the E142A variant was inactive.
134
In order to define the product of the TES reaction in more detail, we modified the sequence of the
135
branch point to introduce an interrupted site for the restriction enzyme SacI across the junction. A
136
precise TES reaction at the branch point to generate duplex DNA would result in the “repair” of the
137
restriction site, generating a substrate for SacI. SsoCas1 processed this substrate generating the
138
53 bp TES reaction product. On addition of the SacI restriction enzyme, the 53 bp species was
139
converted to a 25 bp product (Figure 2E). This confirmed that the Cas1-mediated reaction resulted
140
in the formation of a functional SacI site in vitro, consistent with a precise TES reaction at the
141
branch point.
142
It is likely that the TES reaction catalyzed by Cas1 with branched DNA substrates in vitro
143
represents the reverse or disintegration of an integration intermediate, as observed recently for
144
EcoCas1 (Nunez et al., 2015). We therefore tested EcoCas1 with the same set of branched
145
substrates, showing that manganese, magnesium and cobalt all supported the same robust
146
disintegration activity in the absence of Cas2, with no nuclease activity observed (Figure 3A).
147
Given that Eco and SsoCas1 are divergent members of the Cas1 family, this suggests that the
148
disintegration activity is likely to be a general property of Cas1 enzymes. Experiments where the
149
concentration of Cas1 was titrated against a fixed concentration of substrate (50 nM), showed that
We proceeded to label the other strands of the substrate to follow the reaction products in more
4
150
maximal activity plateaued above 250 nM for EcoCas1 and was maximal from 100 – 500 nM for
151
SsoCas1 (Figure 3B, C).
152
153
To characterise the specificity of the disintegration reaction in more detail, we examined SsoCas1
154
activity for a variety of substrates differing in the nature of the 5’-flap or duplex arm released by the
155
reaction (Figure 4). SsoCas1 was indifferent to the presence of duplex or single-stranded DNA in
156
the 5’-flap, processing a nicked 3-way junction with a similar efficiency to the 5’-flap substrate. The
157
disintegration reaction required the presence of the 3’-hydroxyl moiety at the branch point as a
158
three-way (or Y) junction was not a substrate for Cas1. To confirm this observation we studied a 5’-
159
flap substrate with a phosphate moiety at the 3’ end of the upstream strand in place of a hydroxyl
160
group. As expected, this substrate did not support disintegration activity for either Sso or EcoCas1,
161
with no larger TES product detected (right hand lanes). Tellingly, neither enzyme cleaved the 5’-
162
flap of this substrate to generate shorter products (left hand lanes), confirming that Cas1 catalyses
163
a concerted TES reaction rather than a sequential “cut and join” activity.
164
165
Sequence specificity of the disintegration reaction
166
Disintegration reactions are commonly seen for integrases and transposases, and represent a very
167
useful means to study the underlying integration mechanism (Chow et al., 1992, Delelis et al.,
168
2008) as the local arrangement of the DNA in the integrase active site is typically the same for the
169
two reactions (Gerton et al., 1999). One prediction of this hypothesis is that the disintegration
170
reaction could demonstrate some sequence preference if integration, which must happen at a
171
specific, defined site in the CRISPR locus, is partly driven by the sequence specificity of Cas1. We
172
therefore designed a family of related disintegration substrates by systematically varying the
173
nucleotides flanking the TES site and tested how efficiently Cas1 could disintegrate these
174
substrates. Having demonstrated conclusively that we could follow the progress of the
175
disintegration reaction by monitoring the liberation of a displaced DNA strand from a 5’-flap
176
substrate, we used this assay for all subsequent investigations.
177
178
The +1 position
179
We first tested the importance of the nucleotide acting as an acceptor for the attacking 3’-hydroxyl
180
of the incoming DNA strand (the +1 position) by varying the nucleotide at this position in the model
181
substrates (Figure 5). In vivo, this acceptor nucleotide should represent the position in the CRISPR
182
locus where new spacers are joined to the end of the repeat. The S. solfataricus CRISPR repeat
183
has a guanine at one end and a cytosine at the other, each of which are predicted to act as
184
acceptors for a new phosphodiester bond formed with the incoming spacer (Figure 1B). Consistent
185
with this, we observed the most efficient disintegration activity with SsoCas1 when substrates had
186
a guanine at the +1 position (Figure 5A). Assays were carried out in triplicate and the reaction
187
products quantified, confirming the qualitative observation of a preference for guanine, followed by
5
188
cytosine, adenine and thymine, with rates of 0.06, 0.013, 0.0058 and 0.0009 min-1, respectively
189
(Figure 5B).
190
191
For E. coli, the first nucleotide of the repeat is a guanine. Although the corresponding position at
192
the other end of the repeat on the minus strand is a cytosine, it has been demonstrated that the
193
new spacer joins at the penultimate residue, which is also a guanine (Swarts et al., 2012).
194
EcoCas1 displayed a striking preference for a guanine at the +1 position for the disintegration
195
reaction, with all three alternative nucleotides strongly disfavoured at this position (Figure 5C), in
196
good agreement with the prediction based on the repeat sequence. For EcoCas1 the reaction did
197
not go to completion and accordingly we fitted the data with a variable end-point as described in
198
the methods (Figure 5D). Although the reaction rates could not be determined accurately, guanine
199
at position +1 supported rates at least ten-fold higher than any other nucleotide. We also tested the
200
effect of inclusion of Cas2 on the sequence specificity of EcoCas1, and observed that Cas2 had no
201
effect, with strong preference for a guanine at +1 still observed (Figure 5E).
202
203
The -1 position
204
We proceeded to investigate the sequence dependence of the nucleotide at the 3’-end of the
205
attacking DNA strand (the -1 position) in the disintegration reactions. For the first integration site,
206
this should correspond to the last nucleotide of the leader sequence, which is an adenine in both
207
S. solfataricus and E. coli. Although both Cas1 enzymes catalyzed the disintegration of substrates
208
with an adenine at this position, clear sequence preference was not apparent (Figure 6). For S.
209
solfataricus, the -1’ position on the minus strand is variable in sequence. However, in E. coli, site 2
210
occurs at the penultimate nucleotide of the repeat and therefore has a cytosine at the -1’ position
211
(Swarts et al., 2012). In this situation, the incoming 3’ terminal nucleotide of the spacer has to be a
212
cytosine in order to complete the original repeat sequence. To mimic the TES substrate at this site
213
more closely, we tested substrates where the first nucleotide of the 5’ flap, equivalent to the
214
incoming nucleotide in the forward reaction, was a cytosine, but a cytosine at position -1 was still
215
not favoured by EcoCas1 (Figure 6C). This may suggest that the disintegration reaction is not a
216
good model for integration at site 2, which is further discussed later.
217
218
The -2 position
219
The nucleotide at the -2 position, which is part of the conserved leader sequence for integration
220
site 1, is also a potential determinant of integration specificity. Accordingly, we varied this residue
221
systematically and investigated the efficiency of the disintegration reaction for both Cas1 enzymes
222
(Figure 7). In S. solfataricus, the -2 position in the leader is a cytosine, which supported the
223
strongest disintegration activity (Figure 7A). In E. coli, the -2 position in the leader is a guanine. A
224
clear preference for guanine over all other nucleotides was observed for EcoCas1 (Figure 7B),
225
confirmed by a more detailed kinetic analysis (Figure 7C) which was fitted as for Figure 5D. These
6
226
data are consistent with a role for sequence discrimination at the -2 position by both enzymes,
227
which is relevant for integration site 1 but not site 2, where this position varies depending on the
228
sequence of the last spacer inserted.
229
230
The incoming nucleotide
231
We next checked for specificity at the 3’ end of the 5’ flap in the disintegration product, which
232
corresponds to the 3’ end of the incoming spacer during integration. No sequence preference was
233
detected for SsoCas1 (data not shown), which is consistent with the essentially random nature of
234
the incoming DNA. During adaptation in E. coli, the incoming nucleotide at integration site 1 is
235
expected to be random, but at site two is always a cytosine, where it completes the repeat
236
sequence (Swarts et al., 2012). For EcoCas1, an adenine or cytosine was strongly favoured over
237
guanine and particularly thymine (Figure 8), suggesting discrimination by EcoCas1 at this position.
238
239
Disintegration of authentic E. coli integration intermediates
240
The substrates examined so far in this study do not correspond to the actual sequences
241
encountered by EcoCas1 during integration. Accordingly, we constructed a pair of substrates that
242
correspond to the products of integration when spacer 3 in the E. coli CRISPR array is integrated
243
at site 1 (top strand) or site 2 (bottom strand) (Figure 9). These were constructed from
244
oligonucleotides as before to generate disintegration substrates with a 5’ flap. Disintegration was
245
analysed by denaturing gel electrophoresis, phosphorimaging and quantification of triplicate
246
experiments. EcoCas1 disintegrated the site 1 substrate quickly, with the reaction reaching
247
approximately 75% conversion in 3 min, which compares favorably with the best model sequence
248
studied. Site 2 was also a good disintegration substrate, though it was converted significantly more
249
slowly than site 1, perhaps due to the presence of a disfavored cytosine at position -1.
250
251
Discussion
252
Studies of the CRISPR spacer acquisition process in vivo have yielded many key insights, but they
253
are complicated by the fact that it is very difficult to separate the two distinct steps of spacer
254
capture and spacer integration. Consequently we still do not have a clear understanding of the
255
roles of Cas1, Cas2 and host proteins in the acquisition mechanism. In this study we investigated
256
the integration reaction by focusing on the biochemistry of the Cas1 protein from S. solfataricus
257
and E. coli. Efficient trans-esterification of branched DNA substrates with a 5’-flap or duplex arm is
258
clearly possible for both S. solfataricus and E. coli Cas1 in vitro. This is a very precise reaction
259
requiring attack by a 3’-hydroxyl at the branch point, generating a perfect DNA duplex. The
260
reaction almost certainly represents the disintegration reaction that is the reverse of the spacer
261
integration step, as observed for many integrases and transposases where it represents a very
262
useful means to study the underlying integration mechanism (Gerton et al., 1999). Evidence for a
263
disintegration activity was recently described by Doudna and colleagues for EcoCas1, but the
7
264
activity observed was relatively weak, most likely because the branched substrate studied had a
265
non-optimal DNA sequence around the branch point (Nunez et al., 2015).
266
267
For the CRISPR adaptation process in vivo, integration occurs at the junction between the first
268
repeat and the leader sequence, which immediately suggests a role for sequence specificity in the
269
reaction. It has also been suggested that CRISPR repeat sequences, which are often palindromic,
270
form four-way DNA junctions in supercoiled DNA, acting as a recognition signal for Cas1, a
271
possibility that is supported by the observation that EcoCas1 can cut four-way DNA junctions in
272
vitro (Babu et al., 2011), and the finding that spacer integration in a plasmid lacking a CRISPR
273
locus occurs preferentially next to a palindromic site (Nunez et al., 2015). However, a palindrome
274
alone is not sufficient to support spacer insertion in E. coli in vivo (Arslan et al., 2014), and this also
275
holds for the type II CRISPR system of Streptococcus thermophilus (Wei et al., 2015), suggesting
276
that local sequence helps determine the integration site. Furthermore, some CRISPR repeat
277
families, including many in archaea, have little or no palindromic nature and thus cannot form
278
stable hairpin structures (Kunin et al., 2007). Cas1 therefore might be expected to act as a
279
sequence specific integrase, although local DNA structure could also play a part.
280
281
In support of this hypothesis, both S. solfataricus and E. coli Cas1 catalyse a disintegration
282
reaction with distinct, sequence specific properties. In particular, there is a clear preference for a
283
guanine at position +1, corresponding to the first nucleotide of the repeat, suggesting that this
284
residue is recognised specifically in the active site of Cas1. The specificity is particularly strong for
285
EcoCas1, consistent with the presence of a guanine in the repeat sequence at the +1 site in both
286
plus and minus strands. Preference for a guanine at the +1 nucleotide for EcoCas1 catalysed
287
integration events has also been observed (Nunez et al., 2015). For SsoCas1, a guanine at this
288
position was preferred over a cytosine, which is the nucleotide present at the +1’ position on the
289
minus strand, by a factor of five. Although the -1 position might also be expected to play a role in
290
the selection of integration sites, deep sequencing data for integration catalysed by EcoCas1
291
revealed no sequence preference at this position (Nunez et al., 2015). In agreement with this
292
finding, we observed little evidence for sequence discrimination at the -1 position for the
293
disintegration reactions catalysed by either enzyme, with the exception that EcoCas1 disfavours
294
cytosine at this position (Figure 6B). A cytosine at position -1 is the expected residue on the minus
295
strand, suggesting that the disintegration reaction may better reflect the reversal of integration at
296
site 1 in the leader:repeat junction. Deep sequencing data for integration reactions catalysed by
297
EcoCas1 in vitro did reveal a marked preference for a guanine at position -2 in the integration site
298
(Nunez et al., 2015). This corresponds well with the -2 position in the plus strand, which is part of
299
the leader sequence and is a guanine in E. coli, but cannot hold for the minus strand where the -2’
300
position is inherently variable in nature. Disintegration of substrates mimicking the integration
301
intermediates relevant for the integration of spacer 3 into the E. coli CRISPR array reinforce these
8
302
conclusions, with site 1 on the plus strand processed significantly more quickly than site 2 on the
303
bottom strand (Figure 9). Taken together, both the disintegration specificity and the deep
304
sequencing data for integration support the hypothesis that integration is targeted to the
305
leader:repeat 1 junction on the plus strand at least in part by the inherent sequence specificity of
306
Cas1, which presumably involves specific recognition of these bases within the active site of the
307
enzyme (Figure 10).
308
309
For E. coli integration reactions in vitro, a marked preference for cytosine over thymine at the 3’
310
end of the protospacer was observed (Nunez et al., 2015). Furthermore, protospacers with a
311
cytosine at the 3’ end were preferentially incorporated into the minus strand at the junction
312
between repeat 1 and spacer 1. These data are consistent with the requirement for protospacers
313
to supply the final cytosine of the repeat on the minus strand during integration (Swarts et al.,
314
2012). The deep sequencing data also revealed a marked preference for adenine over thymine at
315
the 3’ end of the protospacer (Nunez et al., 2015). For disintegration by EcoCas1, we observed a
316
clear preference for adenine or cytosine at the equivalent position, whilst thymine did not support
317
the disintegration reaction (Figure 8). Thus EcoCas1 appears to recognise the nucleotide at the 3’
318
end of the incoming DNA, although no such discrimination was observed for SsoCas1. A
319
dinucleotide sequence “AA” motif over-represented at the 3’ end of protospacers incorporated in
320
E. coli strain BL21 has been described previously (Yosef et al., 2013).
321
322
Although the CRISPR spacer integration system has been compared to the integration and
323
transposition reactions carried out by mobile genetic elements, there is one key difference in the
324
two processes – the length of DNA integrated. The persistence length of DNA, the distance over
325
which it behaves as a fairly rigid rod, is estimated as 35-50 nm (100-150 bp) under conditions
326
found in cells (Brinkers et al., 2009, Hagerman, 1988). This means that the two ends of a viral
327
genome of several kilobases can be looped around and brought close together relatively easily,
328
but a new spacer of 30-40 base pairs of dsDNA cannot be manipulated in the same way.
329
Considering the scheme in Figure 1, the molecular origami required to achieve the second TES
330
reaction looks challenging. Several related enzymes, including Mu transposase (Savilahti et al.,
331
1995), V(D)J recombinase (Ramsden et al., 1996) and HIV integrase (Gerton et al., 1999) are
332
known to disrupt base pairing of DNA substrates and make sequence-specific contacts during the
333
integration reaction. It is likely that Cas1 also manipulates the local DNA duplex structure, which
334
may help in positioning the DNA strands correctly for the TES reaction. The observation that the
335
incoming DNA (the 5’-flap in the disintegration reaction) can be single stranded, partially or fully
336
duplex in nature suggests that there is some flexibility in the recognition of the incoming spacer.
337
This is also consistent with the recent observation of a link between RecBCD, which generates
338
ssDNA products, and Cas1 in E. coli (Levy et al., 2015). There is no formal requirement that the
339
protospacer should be fully duplex in nature, although current understanding of the integration
9
340
reaction requires that new spacers have two intact 3’-ends for the two integration reactions so
341
must be at least partially duplex in nature. Many integrases process the ends of the integrated
342
DNA using a nuclease activity, which occurs at the same active site as the integrase activity
343
(Gerton et al., 1999). There is no reason to expect that Cas1 will differ in that regard, and indeed
344
the reaction products of the RecBCD nuclease are on average much longer than the DNA
345
molecules integrated by Cas1, suggesting the requirement for further processing.
346
347
In conclusion, we have shown that Cas1 from both S. solfataricus and E. coli have robust TES
348
activities in vitro which reflect the reversal, or disintegration, of the integration reaction.
349
Disintegration is strongly sequence specific, and the specificity fits with the expected sequence for
350
the plus strand at the leader:repeat junction (Figure 9). This site is the logical place for the initiation
351
of integration, as it has a unique, defined sequence, in contrast to the repetitive and more variable
352
sequence found at the second integration site on the minus strand. Doudna and colleagues
353
recently proposed a model based on an initial attack at site 2 on the minus strand (Nunez et al.,
354
2015). However, this preference was only significant for spacers with a cytosine at the 3’ terminus,
355
and does not explain the marked preference observed by the authors for a guanine at position +2,
356
which is a feature of the positive strand. Future studies of both the forward and reverse integration
357
reactions catalyzed by Cas1 will help to address these issues and delineate the mechanism of
358
spacer integration in the CRISPR system.
359
360
Materials and Methods
361
362
Cloning, gene expression and protein purification
363
The sso1450 (cas1) gene and sso1450a (cas2) genes were amplified from Sulfolobus solfataricus
364
P2 genomic DNA by PCR using primer pairs (5’-ATATAACCATGGCAAGCGTGAGGACTT; 5’-
365
TATTGGATCCTCACATCACCAACTTGAAACCC)
366
GGCCGGATCCTTGAAATTATTGGTAGTATATGAC),
367
the pEHisTEV vector (Liu and Naismith, 2009) downstream of a cleavable His6-tag using NcoI and
368
BamHI restriction sites. Site-directed mutagenesis was carried out on the vector containing
369
sso1450 to mutate active site residue glutamic acid 142 to an alanine using the oligonucleotide
370
sequence 5’-GTTGGATAAGGATGCACCGGCTGCTGCTAG. Standard site-directed mutagenesis protocols
371
(QuikChange®, Stratagene) were followed. Mutations were confirmed by sequencing (GATC
372
Biotech). The constructs were expressed in C43 (DE3) Escherichia coli cells grown in LB (Luria-
373
Bertani) medium supplemented with 35 µg/ml kanamycin to an OD600 of 0.6-0.8 at 37 °C.
374
Expression was then induced by the addition of 0.4 mM isopropyl-β-D-thiogalactopyranoside
375
(IPTG) and overnight incubation with shaking at 25 °C. Cells were harvested (8,000 x g, 20 min)
376
and resuspended in lysis buffer (4.5 mM NaH2PO4, 15 mM Na2HPO4 (pH 7.5), 500 mM NaCl, 30
377
mM imidazole, 1 % Triton-X and protease inhibitors (Roche Applied Science)). Cells were lysed by
and (5’-GCGCCATGGTTACACTAACCATTCCTCTAATC; 5’respectively. The amplified genes were cloned into
10
378
sonication, the lysate cleared by ultracentrifugation (90,000 x g, 35 min) and the supernatant
379
filtered though a 0.22 µm syringe filter and loaded on to a HP HisTrap 5 ml column (GE
380
Healthcare) equilibrated in buffer A (4.5 mM NaH2PO4, 15 mM Na2HPO4 (pH 7.5), 500 mM NaCl,
381
30 mM imidazole). The His-tagged protein of interest was eluted with a linear gradient from 30 to
382
500 mM imidazole. Fractions containing Cas1 (or Cas2) were concentrated and buffer exchanged
383
into buffer A, using centrifugal concentrators (30 kDa cutoff, Vivaspin). His-tags were cleaved by
384
the addition of TEV protease (1:10 ratio of TEV:protein) and overnight incubation at room
385
temperature. The cleaved protein was passed through a HisTrap column in buffer A, and the
386
cleaved protein collected in the flow through. The final purification step was gel filtration on a 26/60
387
Superdex 200 prep grade column (GE Healthcare) in buffer C (20 mM Tris-HCL (pH 7.5), 500 mM
388
NaCl, 0.5 mM DTT, 1 mM EDTA, 10 % glycerol). Purified and concentrated protein samples were
389
flash frozen and stored at -80 °C.
390
Genes encoding E. coli K-12 MG1655 Cas1 (ygbT) and Cas2 (ygbF) were amplified by PCR from
391
genomic DNA using the PCR primer pairs (5’-GGAATTCCATATGACCTGGCTTCCCCTTAATC; 5’-
392
GGAATTCTCAGCTACTCCGATGGCCTGC)
393
GGAATTCTCAAACAGGTAAAAAAGACAC),
394
expression plasmid pET14b using restriction enzyme NdeI and EcoRI. EcoCas1 and Cas2 proteins
395
were over-produced individually in strain BL21 AI (Life Technologies), each with N-terminal (His)6
396
tags. Cells were grown at 37 oC to OD600 0.5-0.6 in LB broth containing ampicillin (50 µg/mL) and
397
induced using IPTG (1 mM) and arabinose (0.2 % w/v), with growth continued for 3 h after
398
induction. Cas1 or Cas2 expressing cells were harvested for re-suspension in buffer H (20 mM
399
Tris.HCl pH7.5, 500 mM NaCl, 5 mM imidazole, 10 % glycerol) and flash frozen in liquid nitrogen
400
for storage at -80 oC until required. The first purification step was identical for both Cas1 and Cas2:
401
sonicated biomass was clarified by centrifugation (90,000 x g, 25 min) and soluble extract was
402
passed into a 5 mL Hi-Trap Nickel chelating column (GE Healthcare) equilibrated with buffer H.
403
Cas1 or Cas2 eluted at 70-100 mM imidazole in a linear imidazole gradient. Sodium chloride was
404
reduced to 50 mM by dialysis against buffer S (20 mM Tris.HCl pH7.5, 50 mM NaCl, 1mM DTT, 10
405
% glycerol). Cas1 was loaded onto a 5 mL Hi-Trap heparin column and eluted in a gradient of
406
NaCl at 200-300 mM in buffer S. Pooled Cas1 fractions were loaded directly onto a S300 size
407
exclusion column equilibrated in buffer S with 250 mM NaCl. Cas1 fractions were pooled for
408
storage at -80 oC in buffer S containing 250 mM NaCl and 40 % glycerol. Desalted Cas2 eluted
409
from Ni-NTA was dialyzed into buffer S containing 1.5 M NaCl and applied to a 5 mL Hi-Trap butyl-
410
Sepharose column (GE Healthcare), eluting in the flow through. Cas2 fractions were pooled and
411
loaded directly onto a S300 size exclusion column equilibrated in buffer S with 250 mM NaCl.
412
Following isocratic elution, Cas2 fractions were pooled and stored as for Cas1.
413
Sequence and preparation of DNA substrates
414
Substrates were purchased from Integrated DNA Technologies (IDT) either unlabeled or with a 3’-
415
fluorescein label (Table 1). Oligonucleotides were 5’-32P-radiolabelled and gel purified as described
and
(5’-GGAATTCCATATGAGTATGTTGGTCGTGGTCAC;
5’-
respectively. Each PCR product was subcloned into protein
11
416
previously (Hutton et al., 2010). Labelled oligonucleotides were annealed with complementary
417
strands by heating with an excess of unlabelled strands at 95 °C for 5 min and then slow cooling to
418
room temperature overnight in a heating block. The assembled substrates (Table 2) were purified
419
by native polyacrylamide (12 %) gel electrophoresis with 1X Tris-borate-EDTA (TBE) buffer,
420
followed by band excision, gel extraction and ethanol precipitation before being resuspended in
421
nuclease free water, as described previously (Hutton et al., 2010). The final substrate
422
concentration was measured using the extinction coefficient and absorbance at 260 nm in a UV-
423
Vis spectrophotometer (Varian Cary) and DNA diluted to ~50 nM or ~200 nM final concentration for
424
use in assays.
425
Disintegration reactions
426
Reactions were typically carried out under single turnover conditions. Titration of SsoCas1 (Figure
427
3B, C) showed evidence for inhibition at enzyme:substrate ratios above 10:1. For standard assays,
428
2 μM Cas1 protein was mixed with 200 nM substrate in cleavage buffer (20 mM Tris (pH 7.5), 10
429
mM NaCl, 1 mM DTT and 5 mM MnCl2) and incubated at 55 °C (for SsoCas1) or 37 °C (for
430
EcoCas1). For reactions with Cas1 and Cas2, the proteins were mixed in equimolar concentration
431
and incubated together at either 37 or 55 °C for 30 min before being added to the reaction. At
432
indicated times, reactions were quenched by the addition of EDTA to 20 mM final concentration
433
and 1 μl 20 mg/ml Proteinase K (Promega) and incubation at 37 °C for 30 min. The DNA was then
434
separated
435
phenol:chloroform:isoamyl alcohol (Sigma-Aldrich) was added and the reaction vortexed for 30 s.
436
The sample was then centrifuged (15,000 x g, 1 min) and the upper aqueous phase, containing the
437
DNA, collected. Formamide loading dye (100 % formamide with 0.25 % bromophenol blue, 0.25 %
438
xylene cyanol) was added (5 µl) and the sample heated at 95 °C for 2 min before being chilled on
439
ice. Reaction products were resolved on a pre-run 20 % denaturing (7 M urea) polyacrylamide gel
440
containing 1X TBE in 1X TBE buffer. Gels were run at 80 W, 45 °C for 90 min before overnight
441
exposure to a phosphorimaging plate and imaging with a FLA-5000 Imaging System (FUJIFILM
442
Life Science).
443
SacI site repair
444
Assays with the SacI junction substrate were carried out under standard conditions with SsoCas1
445
for 30 min. FastDigest SacI enzyme (1 μl) and 1 μl FastDigest Buffer (Thermo Scientific) were then
446
added and the reaction incubated at 37 °C for 30 min. Product extraction, separation and
447
visualization was then carried out as described above.
448
Disintegration reaction time courses
449
For the time course assays, the concentration of DNA substrates was 50 nM and the concentration
450
of Cas1 protein 50 nM for SsoCas1 or 500 nM for EcoCas1. Reactions were performed as
451
described above with the omission of the Proteinase K step. Following phosphorimaging,
452
substrates and products were quantified using Image Gauge software (FUJIFILM) and the reaction
453
course was plotted using Kaleidagraph (Synergy Software). Experiments were carried out in
from
the
reaction
by
phenol
chloroform
extraction.
60
μl
neutral
12
454
triplicate and the mean and standard error calculated for each point. For SsoCas1, the data were
455
fitted using a single exponential (Niewoehner et al., 2014). For EcoCas1 the reactions did not go to
456
completion and were therefore fitted with a floating end-point, as described previously (Niewoehner
457
et al., 2014).
458
459
460
Acknowledgements
This work was supported by a grant from the Biotechnology and Biological Sciences Research
461
Council (REF: BB/M000400/1 to MFW). Thanks to Shirley Graham and Kotryna Temcinaite for
462
technical support, and Jing Zhang and Agnes Tello for helpful discussions.
463
464
References
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
ARSLAN, Z., HERMANNS, V., WURM, R., WAGNER, R. & PUL, U. 2014. Detection and
characterization of spacer integration intermediates in type I-E CRISPR-Cas system.
Nucleic Acids Res, 42, 7884-93.
BABU, M., BELOGLAZOVA, N., FLICK, R., GRAHAM, C., SKARINA, T., NOCEK, B.,
GAGARINOVA, A., POGOUTSE, O., BROWN, G., BINKOWSKI, A., PHANSE, S.,
JOACHIMIAK, A., KOONIN, E. V., SAVCHENKO, A., EMILI, A., GREENBLATT, J.,
EDWARDS, A. M. & YAKUNIN, A. F. 2011. A dual function of the CRISPR-Cas system in
bacterial antivirus immunity and DNA repair. Mol Microbiol, 79, 484-502.
BARRANGOU, R. & MARRAFFINI, L. A. 2014. CRISPR-Cas systems: Prokaryotes upgrade to
adaptive immunity. Mol Cell, 54, 234-44.
BELOGLAZOVA, N., BROWN, G., ZIMMERMAN, M. D., PROUDFOOT, M., MAKAROVA, K. S.,
KUDRITSKA, M., KOCHINYAN, S., WANG, S., CHRUSZCZ, M., MINOR, W., KOONIN, E.
V., EDWARDS, A. M., SAVCHENKO, A. & YAKUNIN, A. F. 2008. A novel family of
sequence-specific endoribonucleases associated with the clustered regularly interspaced
short palindromic repeats. J Biol Chem, 283, 20361-71.
BRINKERS, S., DIETRICH, H. R., DE GROOTE, F. H., YOUNG, I. T. & RIEGER, B. 2009. The
persistence length of double stranded DNA determined using dark field tethered particle
motion. J Chem Phys, 130, 215105.
CHOW, S. A., VINCENT, K. A., ELLISON, V. & BROWN, P. O. 1992. Reversal of integration and
DNA splicing mediated by integrase of human immunodeficiency virus. Science, 255, 7236.
DELELIS, O., CARAYON, K., SAIB, A., DEPREZ, E. & MOUSCADET, J. F. 2008. Integrase and
integration: biochemical activities of HIV-1 integrase. Retrovirology, 5, 114.
DIEZ-VILLASENOR, C., GUZMAN, N. M., ALMENDROS, C., GARCIA-MARTINEZ, J. & MOJICA,
F. J. 2013. CRISPR-spacer integration reporter plasmids reveal distinct genuine acquisition
specificities among CRISPR-Cas I-E variants of Escherichia coli. RNA Biol, 10, 792-802.
DILLINGHAM, M. S. & KOWALCZYKOWSKI, S. C. 2008. RecBCD enzyme and the repair of
double-stranded DNA breaks. Microbiol Mol Biol Rev, 72, 642-71, Table of Contents.
FINERAN, P. C. & CHARPENTIER, E. 2012. Memory of viral infections by CRISPR-Cas adaptive
immune systems: acquisition of new information. Virology, 434, 202-9.
GERTON, J. L., HERSCHLAG, D. & BROWN, P. O. 1999. Stereospecificity of reactions catalyzed
by HIV-1 integrase. J Biol Chem, 274, 33480-7.
GOREN, M., YOSEF, I., EDGAR, R. & QIMRON, U. 2012. The Bacterial CRISPR/Cas System as
Analog of the Mammalian Adaptive Immune System. RNA biology, 9, 549-554.
HAGERMAN, P. J. 1988. Flexibility of DNA. Annu Rev Biophys Biophys Chem, 17, 265-86.
HELER, R., MARRAFFINI, L. A. & BIKARD, D. 2014. Adapting to new threats: the generation of
memory by CRISPR-Cas immune systems. Mol Microbiol, 93, 1-9.
HUTTON, R. D., CRAGGS, T. D., WHITE, M. F. & PENEDO, J. C. 2010. PCNA and XPF
cooperate to distort DNA substrates. Nucleic Acids Res, 38, 1664-1675.
KUNIN, V., SOREK, R. & HUGENHOLTZ, P. 2007. Evolutionary conservation of sequence and
secondary structures in CRISPR repeats. Genome Biol, 8, R61.
13
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
LEMAK, S., BELOGLAZOVA, N., NOCEK, B., SKARINA, T., FLICK, R., BROWN, G., POPOVIC,
A., JOACHIMIAK, A., SAVCHENKO, A. & YAKUNIN, A. F. 2013. Toroidal structure and
DNA cleavage by the CRISPR-associated [4Fe-4S] cluster containing Cas4 nuclease
SSO0001 from Sulfolobus solfataricus. J Am Chem Soc, 135, 17476-87.
LEVY, A., GOREN, M. G., YOSEF, I., AUSTER, O., MANOR, M., AMITAI, G., EDGAR, R.,
QIMRON, U. & SOREK, R. 2015. CRISPR adaptation biases explain preference for
acquisition of foreign DNA. Nature 520, 505-510.
LIU, H. & NAISMITH, J. H. 2009. A simple and efficient expression and purification system using
two newly constructed vectors. Protein Expr Purif, 63, 102-11.
MAKAROVA, K. S., GRISHIN, N. V., SHABALINA, S. A., WOLF, Y. I. & KOONIN, E. V. 2006. A
putative RNA-interference-based immune system in prokaryotes: computational analysis of
the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and
hypothetical mechanisms of action. Biol Direct, 1, 7.
NAM, K. H., DING, F., HAITJEMA, C., HUANG, Q., DELISA, M. P. & KE, A. 2012. Doublestranded endonuclease activity in Bacillus halodurans clustered regularly interspaced short
palindromic repeats (CRISPR)-associated Cas2 protein. J Biol Chem, 287, 35943-52.
NIEWOEHNER, O., JINEK, M. & DOUDNA, J. A. 2014. Evolution of CRISPR RNA recognition and
processing by Cas6 endonucleases. Nucleic Acids Res, 42, 1341-53.
NUNEZ, J. K., KRANZUSCH, P. J., NOESKE, J., WRIGHT, A. V., DAVIES, C. W. & DOUDNA, J.
A. 2014. Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas
adaptive immunity. Nat Struct Mol Biol, 21, 528-34.
NUNEZ, J. K., LEE, A. S., ENGELMAN, A. & DOUDNA, J. A. 2015. Integrase-mediated spacer
acquisition during CRISPR-Cas adaptive immunity. Nature, 519, 193-8.
RAMSDEN, D. A., MCBLANE, J. F., VAN GENT, D. C. & GELLERT, M. 1996. Distinct DNA
sequence and structure requirements for the two steps of V(D)J recombination signal
cleavage. EMBO J, 15, 3197-206.
SAVILAHTI, H., RICE, P. A. & MIZUUCHI, K. 1995. The phage Mu transpososome core: DNA
requirements for assembly and function. EMBO J, 14, 4893-903.
SHMAKOV, S., SAVITSKAYA, E., SEMENOVA, E., LOGACHEVA, M. D., DATSENKO, K. A. &
SEVERINOV, K. 2014. Pervasive generation of oppositely oriented spacers during CRISPR
adaptation. Nucleic Acids Res, 42, 5907-16.
SOREK, R., LAWRENCE, C. M. & WIEDENHEFT, B. 2013. CRISPR-mediated adaptive immune
systems in bacteria and archaea. Annu Rev Biochem, 82, 237-66.
SWARTS, D. C., MOSTERD, C., VAN PASSEL, M. W. & BROUNS, S. J. 2012. CRISPR
interference directs strand specific spacer acquisition. PLoS ONE, 7, e35888.
VAN DER OOST, J., WESTRA, E. R., JACKSON, R. N. & WIEDENHEFT, B. 2014. Unravelling the
structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol, 12, 479-92.
WEI, Y., TERNS, R. M. & TERNS, M. P. 2015. Cas9 function and host genome sampling in Type
II-A CRISPR-Cas adaptation. Genes Dev, 29, 356-61.
WIEDENHEFT, B., ZHOU, K., JINEK, M., COYLE, S. M., MA, W. & DOUDNA, J. A. 2009.
Structural basis for DNase activity of a conserved protein implicated in CRISPR-mediated
genome defense. Structure, 17, 904-12.
YOSEF, I., GOREN, M. G. & QIMRON, U. 2012. Proteins and DNA elements essential for the
CRISPR adaptation process in Escherichia coli. Nucleic Acids Res, 40, 5569-76.
YOSEF, I., SHITRIT, D., GOREN, M. G., BURSTEIN, D., PUPKO, T. & QIMRON, U. 2013. DNA
motifs determining the efficiency of adaptation into the Escherichia coli CRISPR array. Proc
Natl Acad Sci U S A, 110, 14396-401.
ZHANG, J., KASCIUKOVIC, T. & WHITE, M. F. 2012. The CRISPR associated protein Cas4 Is a 5'
to 3' DNA exonuclease with an iron-sulfur cluster. PLoS One, 7, e47232.
557
Figure legends
558
559
Figure 1. CRISPR spacer acquisition and Cas1
14
560
A. 1. The 3’-end of an incoming protospacer attacks the chromosomal CRISPR locus at the
561
boundary between the leader sequence and repeat 1. A transesterification reaction (yellow arrow
562
1) catalyzed by Cas1 joins the protospacer to the 5’ end of repeat 1. For many integrases a
563
(reverse) disintegration reaction can be observed in vitro. 2. Another transesterification reaction
564
(yellow arrow 2) joins the other strand of the protospacer to the 5’ end of repeat 1 on the bottom
565
(minus) strand, resulting in the formation of a gapped DNA duplex. 3. The gapped duplex is
566
repaired by the host cell DNA replication machinery, resulting in the addition of a new spacer at
567
position 1 and replication of CRISPR repeat 1.
568
B. Sequences flanking the two trans-esterification reaction sites at repeat 1 in S. solfataricus and
569
E. coli are shown. The leader is in blue, repeat in black and spacer 1 in teal. The number of central
570
nucleotides of the repeat omitted from the sequence is shown in parentheses.
571
C. Structure of Cas1 from Pyrococcus horikoshii (PDB 4WJ0) with subunits coloured blue and
572
cyan, showing the dimeric “butterfly” conformation with the active site residues highlighted in
573
green.
574
575
Figure 2. Disintegration of a branched DNA substrate by SsoCas1
576
Denaturing gel electrophoresis was used to analyse the products generated by SsoCas1 with a
577
branched DNA substrate (Substrate 1). The 5’ flap (18 nt) was released when the phosphodiester
578
backbone was attacked by the 3’-hydroxyl group at the branch point. The reaction required active
579
Cas1 and was independent of Cas2. DNA lengths are shown in blue (nt). The TES site is indicated
580
with a yellow arrow and the labelled strand with a red star. A shows reactions with the continuous
581
strand (black) labelled; B with the flap (grey) strand labelled and C with the upstream (green)
582
strand labelled, each on the 5’ end. Lanes: 1, control with no added protein; 2, WT Cas1; 3, Cas2;
583
4, Cas1+ Cas2; 5, Cas1 E142A variant; 6, E142A Cas1 + Cas2. D The 5’-flap strand was labelled
584
on the 3’ end with a fluorescein moiety, and the flap reduced to 14 nt (Substrate 1-FAM). Cas1
585
catalyses the TES reaction generating a 53 nt labelled strand. Lane C: control incubation in
586
absence of Cas1. E TES reactions were carried out with SsoCas1, or the E142A active site
587
mutant, on a fork substrate containing a nicked SacI restriction site spanning the branch point
588
(SacI substrate). A TES product of 53 nucleotides is visible in lane 2 containing Cas1. The right-
589
hand panel shows the effect of adding SacI restriction enzyme after the TES reaction. The TES
590
product is no longer visible, but a shorter product of 25 nucleotides is present indicating
591
regeneration of the SacI recognition sequence by the TES reaction.
592
593
Figure 3. TES activity of E. coli and S. solfataricus Cas1
594
A. E. coli Cas1 also catalyses an efficient metal dependent disintegration reaction. TES reactions
595
were carried out under standard conditions, using Substrate 3 and varying the divalent metal ion
596
as indicated. EcoCas1 showed robust TES activity in the presence of cobalt, magnesium and
597
manganese. Each of the three strands of the substrate was labelled individually as for Figure 2 (5’
15
598
label indicated by a star). Lanes were: c, substrate alone; substrate incubated with Cas1 and 5 mM
599
of E, EDTA; Co, cobalt chloride; Ca, calcium chloride; Mg, magnesium chloride; Mn, manganese
600
chloride for 30 min at 37 °C. B. Concentration dependence of Cas1 TES activity. Substrate 3 (50
601
nM) was incubated with the indicated concentration of Sso or EcoCas1 for 30 min under standard
602
assay conditions and the reactants were analysed by denaturing gel electrophoresis and
603
phosphorimaging. SsoCas1 showed maximal activity at 250 nM, representing a 5-fold molar
604
excess of enzyme over substrate, with a decline in activity above 500 nM enzyme. EcoCas1 had
605
maximal activity that plateaued above 250 nM enzyme. C. Quantification of the data. These data
606
are representative of duplicate experiments.
607
608
Figure 4. Importance of flap and 3’ terminus structure.
609
The importance of the released 25 nt 5’-flap structure was investigated by varying the length of
610
duplex DNA in that arm from 0 (canonical 5’-flap) to a full 25 bp (nicked 3-way junction) (left hand
611
panel, all based on substrate 19). All supported robust disintegration activity by SsoCas1. An intact
612
Y- junction did not support TES activity. Lanes: C, substrate alone (1, 5, 9, 13, 17); E, SsoCas1
613
E142A variant 30 min incubation (2, 6, 10, 14, 18); incubation with wild-type SsoCas1 for 10 and
614
30 min (other lanes).
615
The right hand panel shows the effect of replacing the attacking 3’-hydroxyl moiety at the branch
616
point with a phosphate group (3’phos substrate) no TES or nuclease activity was observed for
617
either Sso or EcoCas1. C, substrate alone; E, SsoCas1 E142A variant.
618
619
Figure 5. Sequence specificity of the disintegration reaction at the +1 position
620
The nucleotide at the acceptor (+1) position was varied systematically to assess the sequence
621
dependence of the disintegration reaction carried out by Cas1 from S. solfataricus (A, B) and E.
622
coli (C, D)) (Substrates 3, 6, 7, 8). In the gels on the left (A and C) each substrate was incubated
623
with Cas1 for 1, 2, 3, 5, 10, 15, 20 and 30 min in reaction buffer prior to electrophoresis to separate
624
the cleaved 5’-flap from the intact substrate. C – control with no Cas1 added. The plots on the right
625
(B and D) show quantification of these assays. Data points represent the means of triplicate
626
experiments with standard errors shown. The data were fitted to an exponential equation, as
627
described in the methods, and for EcoCas1 a variable floating end point was included to allow
628
fitting as the reaction did not go to completion. The effect of Cas2 (150 nM) on EcoCas1 (150 nM)
629
sequence specificity for substrates (50 nM) varying at position +1 (substrates 3, 6, 7, 8) was also
630
tested (E). The second panel from the right is a composite image from two phosphorimages of the
631
same time course as indicated by a black line.
632
633
Figure 6. Sequence specificity of the disintegration reaction at the -1 position
634
The nucleotides participating in the disintegration reaction were varied systematically at the -1
635
position (substrates 3, 9, 10, 11). For SsoCas1 (A) there was some preference for adenine at this
16
636
position, consistent with integration site 1. For EcoCas1 (B, C), a cytosine at position -1 was
637
disfavoured over all other possibilities, even when the residue equivalent to the “incoming”
638
nucleotide was also a cytosine (substrates 15, 16, 17, 18). Each substrate was incubated with
639
Cas1 for 5, 10 and 30 min in reaction buffer prior to electrophoresis. C – control with no Cas1
640
added.
641
642
Figure 7. Sequence specificity of the disintegration reaction at the -2 position
643
The nucleotides participating in the disintegration reaction were varied systematically at the -2
644
position, which is a cytosine (Sso) or guanine (Eco) at integration site 1, and variable at integration
645
site 2 (substrates 10, 12, 13, 14). A. SsoCas1; B. EcoCas1. Each substrate was incubated with
646
Cas1 for 5, 10 and 30 min in reaction buffer prior to electrophoresis. C – control with no Cas1
647
added. C. For EcoCas1, the clear preference for guanine at position -2 was confirmed by more
648
detailed kinetic analysis as described for Figure 5.
649
650
Figure 8. Sequence specificity of the EcoCas1 disintegration reaction for the incoming
651
nucleotide
652
The nucleotide corresponding to the incoming 3’ end of the new spacer, which is the nucleotide at
653
the 3’ end of the 5’-flap in the disintegration substrate, was varied systematically to determine its
654
effect on the disintegration reaction catalysed by EcoCas1 (substrates 2, 3, 4, 5). C – control with
655
no Cas1 added. Time points were 5, 10 and 30 min.
656
657
Figure 9. Disintegration of authentic E. coli integration intermediates
658
Disintegration substrates corresponding to the expected site 1 and site 2 integration products
659
arising from the integration of spacer 3 into the CRISPR array were constructed and tested (spacer
660
3-1 and spacer 3-2 substrates). EcoCas1 processed both, with the rate of reaction significantly
661
higher for the substrate corresponding to site 1 (the top strand) at the leader:repeat junction. Data
662
points represent the means of triplicate experiments with standard errors shown.
663
664
Figure 10. Reaction scheme for spacer integration and disintegration by E. coli Cas1. The
665
Cas 1-2 complex integrates new spacers via two joining reactions (1 and 2) at either end of the first
666
CRISPR repeat, which differ in their sequence context. Disintegration activity by E. coli Cas1
667
shows clear preference for the sequence at site 1, with the guanines at position +1 and -2
668
particularly important. At site 2, the sequence context is not optimal for disintegration in vitro,
669
leading to slower reaction rates. In the active site of Cas1, these nucleobases likely make specific
670
interactions with catalytic residues, and the DNA duplex structure may be distorted.
671
17
672
Table 1. Sequence of oligonucleotides used for substrate construction
Strand
Sequence 5’-->3’
Length
1a
TAGTAAGAGATTAATAAACCCTCAGATAATCTCTTATAGAATTGAAAGTTCGG
53
1b
TTTTTTTTTTTTTTTTTTATTATCTGAGGGTTTATTAATCTCTTACTA
48
1c
CCGAACTTTCAATTCTATAAGAG
23
2a
TAGTAAGAGATTAATAAACCCTCAGATAACCTCTTATAGAATTGAAAGTTCGG
53
2b
TTTTTTTTTTTTTTTTTTGTTATCTGAGGGTTTATTAATCTCTTACTA
48
3b
TTTTTTTTTTTTTTTTTAGTTATCTGAGGGTTTATTAATCTCTTACTA
48
4b
TTTTTTTTTTTTTTTTTCGTTATCTGAGGGTTTATTAATCTCTTACTA
48
5b
TTTTTTTTTTTTTTTTTGGTTATCTGAGGGTTTATTAATCTCTTACTA
48
6a
TAGTAAGAGATTAATAAACCCTCAGATAAGCTCTTATAGAATTGAAAGTTCGG
53
6b
TTTTTTTTTTTTTTTTTACTTATCTGAGGGTTTATTAATCTCTTACTA
48
7a
TAGTAAGAGATTAATAAACCCTCAGATAAACTCTTATAGAATTGAAAGTTCGG
53
7b
TTTTTTTTTTTTTTTTTATTTATCTGAGGGTTTATTAATCTCTTACTA
48
8b
TTTTTTTTTTTTTTTTTAATTATCTGAGGGTTTATTAATCTCTTACTA
48
9a
TAGTAAGAGATTAATAAACCCTCAGATAACATCTTATAGAATTGAAAGTTCGG
53
9c
CCGAACTTTCAATTCTATAAGAT
23
10a
TAGTAAGAGATTAATAAACCCTCAGATAACTTCTTATAGAATTGAAAGTTCGG
53
10c
CCGAACTTTCAATTCTATAAGAA
23
11a
TAGTAAGAGATTAATAAACCCTCAGATAACGTCTTATAGAATTGAAAGTTCGG
53
11c
CCGAACTTTCAATTCTATAAGAC
23
12a
TAGTAAGAGATTAATAAACCCTCAGATAACTCCTTATAGAATTGAAAGTTCGG
53
12c
CCGAACTTTCAATTCTATAAGGA
23
13a
TAGTAAGAGATTAATAAACCCTCAGATAACTGCTTATAGAATTGAAAGTTCGG
53
13c
CCGAACTTTCAATTCTATAAGCA
23
14a
TAGTAAGAGATTAATAAACCCTCAGATAACTACTTATAGAATTGAAAGTTCGG
53
14c
CCGAACTTTCAATTCTATAAGTA
23
SacI-a
TAGTAAGAGATTAATAAACCCTCAGATGAGCTCTTATAGAATTGAAAGTTCGG
53
SacI-b
TTTTTTTTTTTTTTCTCATCTGAGGGTTTATTAATCTCTTACTA
44
1b-3'-FAM
TTTTTTTTTTTTTTATTATCTGAGGGTTTATTAATCTCTTACTA-FAM
48
19a
CCTCGAGGGATCCGTCCTAGCAAGCCGCTGCTACCGGAAGCTTCTGGACC
50
19b
GCTCGAGTCTAGACTGCAGTTGAGAGCTTGCTAGGACGGATCCCTCGAGG
50
19c
GGTCCAGAAGCTTCCGGTAGCAGCG
25
20d-10
AGTCTAGACTCGAGC
15
18
20d-5
ACTGCAGTCTAGACTCGAGC
20
20d
TCTCAACTGCAGTCTAGACTCGAGC
25
25c-d
GGTCCAGAAGCTTCCGGTAGCAGCGTCTCAACTGCAGTCTAGACTCGAGC
50
1c-3'P
CCGAACTTTCAATTCTATAAGAG-phos
25
Sp3-1a
CTGGCGCGGGGAACTCTCTAAAAGTATACATTTGTTCTT
39
Sp3-1b
TGTAATTGATAATGTTGAGAGTTCCCCGCGCCAG
34
Sp3-1c
AAGAACAAATGTATACTTTTAGA
23
Sp3-2a
CCAGCGGGGATAAACCGTTTGGATCGGGTCTGGAATTTC
39
Sp3-2b
TGTTCCGACAGGGAGCCCGGTTTATCCCCGCTGG
34
Sp3-2c
GAAATTCCAGACCCGATCCAAAC
23
673
674
19
675
Table 2. DNA constructs used in this study.
676
The sequence of the central portion of the junction (positions -2, -1, 1 and the incoming nucleotide
677
(IC)) for each substrate is shown. The oligonucleotides used to assemble the complete substrate
678
are indicated.
Junction sequence
Substrate
Oligonucleotide components
-2
-1
1
IC
Substrate 1
1a, 1b, 1c
A
G
A
T
Substrate 1-FAM
1a, 1b-3'-FAM, 1c
A
G
A
T
SacI substrate
SacI-a, SacI-b, 1c
A
G
C
T
Substrate 2
2a, 2b, 1c
A
G
G
T
Substrate 3
2a, 3b, 1c
A
G
G
A
3’-phos substrate
2a, 3b, 1c-3'P
A
G
G
A
Substrate 4
2a, 4b, 1c
A
G
G
C
Substrate 5
2a, 5b, 1c
A
G
G
G
Substrate 6
6a, 6b, 1c
A
G
C
A
Substrate 7
7a, 7b, 1c
A
G
T
A
Substrate 8
1a, 8b, 1c
A
G
A
A
Substrate 9
9a, 3b, 9c
A
T
G
A
Substrate 10
10a, 3b, 10c
A
A
G
A
Substrate 11
11a, 3b, 11c
A
C
G
A
Substrate 12
12a, 3b, 12c
G
A
G
A
Substrate 13
13a, 3b, 13c
C
A
G
A
Substrate 14
14a, 3b, 14c
T
A
G
A
Substrate 15
2a, 4b, 1c
A
G
G
C
Substrate 16
11a, 4b, 11c
A
C
G
C
Substrate 17
10a, 4b, 10c
A
A
G
C
Substrate 18
9a, 4b, 9c
A
T
G
C
Substrate 19
19a, 19b, 19c
C
G
G
A
Gap10
19a, 19b, 19c, 20d-10
C
G
G
A
Gap5
19a, 19b, 19c, 20d-5
C
G
G
A
Nick
19a, 19b, 19c, 20d
C
G
G
A
20
Y-junction
19a, 19b, 20c-d
C
G
G
A
Spacer 3-1 substrate
Sp3-1a, Sp3-1b, Sp3-1c
G
A
G
A
Spacer 3-2 substrate
Sp3-2a, Sp3-2b, Sp3-2c
A
C
G
C
679
680
681
682
Table legends:
683
Table 1. Sequence of oligonucleotides used for substrate construction
684
Table 2. DNA constructs used in this study.
685
686
687
688
689
690
691
692
693
694
695
696
697
698
Source data:
Figure 3-source data 1: Concentration Cas1
Figure 5-source data 1: Nucleotide at +1 position
Figure 5-source data 2: Nucleotide at +1 position
Figure 7-source data 1: Nucleotide at -2 position
Figure 9-source data 1: E.coli Site 1 vs Site 2 time course
21
A
B
protospacer
+ strand
Sso
1
5
Leader
1
Repeat 1
1st integration
Spacer 1 CRISPR
locus
5
+ strand
Eco
C
5
2
2
-1 +1
TAGAGAGT(21)ACCGNNNN
ATCTCTCA(21)TGGCNNNN
+1’-1’
2nd integration
-1 +1
CTCAGATA(16)AAAGNNNN
GAGTCTAT(16)TTTCNNNN
1
disintegration
Cas1
± Cas2
Repeat 1 Spacer 1
+1’-1’ - strand
2
3
Leader
1
active
site
2
- strand
5’
5’
18
Cas1
30
OH
23
m
rea
53
53
A
5’
5’
14
30
OH
23
C
2
2
as
as
2
1 2 1+ 2A 2A
as as as E14 E14
C C C C
+
C
53
48
2
1 2 1+ 2A 2A
as as as E14 E14
C C C C
+
C
5’
E
+ SacI
C
53
TES
product
Cas1
E142A
C
Cas1
E142A
OH
23
53
23
25
23
1 2 3 4 5 6
SacI
product
53
SacI
25
1 2 3 4 5 6
5’
Cas1
5’
18
1 2 3 4 5 6
5’
53
44
2
+
53
53
as
2
1 2 1+ 2A 2A
as as as E14 E14
C C C C
C
Cas1
+
Cas1 Cas2 Cas2 E142A
5’
B
C
D
OH
5’
downstream
st
up
18
A
B
C
5’-flap 25 nt
OH
OH
C E
1
OH
C E
2
3
4
5
nicked 3way
junction
5 nt gap
10 nt gap
C E
6
7
8
3-way
junction
P
OH
C E
3’-phosphate
C E
9 10 11 12 13 14 15 16 17 18 19 20
C E
P
Sso
Sso
Eco C E
Eco
+1 position
5’
G
G
G
A
A
5’
A
G
5’
A
A
A
5’
C
G
5’
5’
5’
5’
C
C
C
C
B
T
SsoCas1
C
D
EcoCas1
E
EcoCas1 +
Cas2
-1 position
A
5’
5’
AG
5’
Leader
Repeat 1
Spacer 1
A
CG
5’
5’
A
A
A
GG
5’
5’
TG
1
5’
-2 -1 +1
C
C
C
CTCAGATA(16)AAAGNNNN
Sso
GAGTCTAT(16)TTTCNNNN
C
+1’ -1’-2’
2
1
SsoCas1
-2 -1 +1
Eco TAGAGAGT(21)ACCGNNNN
ATCTCTCA(21)TGGCNNNN
+1’ -1’-2’
2
B
5’
5’
5’
C
5’
C
5’
C
EcoCas1
CG
AG
5’
C
C
GG
5’
C
TG
5’
C
C
5’
C
C
AG
5’
5’
C
A
CG
5’
5’
A
A
A
GG
5’
5’
C
TG
5’
C
-2 position
A
5’
5’
C
G
AA
5’
C
Leader
A
G
CA
5’
5’
A
A
A
G
GA
5’
5’
G
TA
Spacer 1
1
5’
C
Repeat 1
-2 -1 +1
C
CTCAGATA(16)AAAGNNNN
Sso
GAGTCTAT(16)TTTCNNNN
+1’ -1’-2’
2
1
SsoCas1
-2 -1 +1
Eco TAGAGAGT(21)ACCGNNNN
ATCTCTCA(21)TGGCNNNN
+1’ -1’-2’
2
B
5’
5’
C
AG
C
5’
C
AG
A
5’
C
C
A
AG
G
5’
A
A
A
AG
5’
5’
T
5’
C
EcoCas1
incoming nucleotide - E. coli
5’
5’
GG
5’
C
G
C
GG
5’
5’
C
A
T
GG
5’
5’
GG
5’
C
C
incoming spacer 3
site 1
5
site 2
Leader
1
Repeat
Spacer 4
2
5
A
AT
spacer3
GT
TG
repeat
A
GAGTTCCC
site 1 substrate
A
TAGCTCTCAAGGG
T
T
T
5AC AAA
TG
leader
5
GG
GA spacer3
GC
CC
repeat
GGTTTATC
spacer4
C
site 2 substrate
AATATGCCAAATAG
C
C
5 AT GGT
TA
protospacer
Leader
GAG
Cas1-2
p
Repeat
1
Cas1 active site
p
Spacer
1
GCN
Integration
GA G
G
Disintegration
GCN
G
Integration 2
GA
G
GC
N
CN