DIMACS Series in Discrete Mathematics and Theoretical Computer Science

Creating 3-Dimensional Graph Structures with DNA

Natasa Jonoska¹, Stephen A. Karl, and Masahico Saito¹

Abstract. We propose solving computational problems with DNA molecules by physically constructing 3-dimensional graph structures. Building blocks consisting of intertwined strands of DNA are used to represent graph edges and vertices. Different blocks would be combined to form all possible 3-dimensional structures representing a graph. The solution to the Hamiltonian cycle problem provided requires a constant number of steps regardless of the number of vertices. If a solution to the graph problem exists, then a fully closed circular molecule would be formed and can be isolated. This paper introduces a method of using 3D structures in computing which might significantly improve the efficiency of computations with DNA.

1. Introduction

In 1994, Adleman [1] described a laboratory experiment involving DNA in which an example of a Directed Hamiltonian Path Problem was solved. That paper opened the field of practical DNA computing. Lipton [5] demonstrated that a large class of NP-complete problems also could be solved by encoding the problem in DNA molecules. Shortly after, several authors suggested applications of DNA methodology for computational purposes (see for example [4] and [8]). Generally, DNA molecules have been treated as linear strings where much of the information content is encoded in the order of nucleotides that make up the DNA. These algorithms require polynomial increases in the number of steps necessary to identify a solution with increasing size of the problem (e.g., the number of vertices in a graph in [1] and the number of variables in a formula in [5]). Recently reported research [10], however, has demonstrated that it is possible to form higher order, three dimensional (3D) structures with DNA molecules. In this paper we explore the possibility of using 3D DNA structures.
We show that, theoretically, the use of 3D DNA structures could significantly reduce the time and the number of steps needed to identify a solution. In fact, 3D structures allow the Hamiltonian cycle problem to be solved with a constant number of steps, regardless of the number of vertices in the graph.

In [12] a method was proposed to solve the Hamiltonian Path Problem in a constant number of steps by self-assembling double-crossover molecules into larger structures such that, if a solution exists, there is a closed DNA molecule which encodes the answer and which can be isolated. Self-assembly of these molecules builds complex DNA structures that can be viewed as lying on a 2D surface, and the computation does not make use of 3D DNA structures. Our approach is different in that we propose the use of flexible 3D structures of DNA for solving the problem geometrically.

In this paper, we review the Hamiltonian cycle problem and first describe our approach through a simple example. A general case also is considered and we present the algorithm for arbitrary graphs. Finally, we discuss the general feasibility of the method.

Acknowledgement. We are grateful to N.C. Seeman and E. Winfree for valuable comments and discussions.

¹ Supported, in part, by the University of South Florida Research and Creative Scholarship Grant Program under grants number 1249933RO and 1249932RO.

2. The Hamiltonian Cycle Problem

2.1. Definition. Let $G$ be a finite graph with $V(G)$ the set of vertices and $E(G)$ the set of edges. A Hamiltonian cycle $c$ of $G$ is a cycle (a loop on the graph without self-intersection points, when we regard $G$ as a 1-complex) that goes through every vertex exactly once. The Hamiltonian cycle problem (HCP) asks whether a given graph $G$ has a Hamiltonian cycle.

Figure 1. A simple four vertex graph and all individual cycles

2.2. Example.
An example of a graph is depicted in Fig. 1 (left). This is the complete graph with 4 vertices (i.e., $C_4$). Individual cycles through the graph are illustrated to the right of the whole graph. Since none of the four cycles in the top right of the figure pass through all vertices, they are non-Hamiltonian cycles. The three cycles illustrated in the bottom right of the figure are Hamiltonian cycles.

The HCP is a well known member of a larger class of problems known as NP-complete. NP-complete problems have posed a significant challenge to computational scientists since all known conventional algorithms require (in the worst case) an exponential increase in the number of steps (or time) relative to the size of the problem. Identifying a solution to even a moderately simple problem, therefore, can require a prohibitively large number of steps. In fact, the excitement over DNA as a computational tool rests in the ability to perform some of the computational steps in parallel. Thus, the number of steps to identify a solution does not increase exponentially with increases in the size of the graph.

Figure 2. Type I edge building block

2.3. Review of Adleman's Algorithm. Adleman's solution involves 4 basic steps:

1. Form random DNA paths in the graph.
2. Extract paths that go through exactly $n$ (possibly repeating) vertices of the graph, where $n$ is the number of vertices.
3. For each vertex $i$ ($i = 1, \ldots, n$) extract the paths that go through vertex $i$ (i.e., extract paths that visit every vertex).
4. If a path remains, a Hamiltonian cycle exists.

The first step presumably generates an exponential number of paths, among them the result. Steps 2 and 3 are used to isolate and detect the result generated by step 1. Step 3 of this algorithm must be repeated $n$ times for a graph of $n$ vertices. The laboratory procedure suggested for this step uses biotin-labeled oligos that are complementary in DNA sequence to the vertices.
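The four steps listed above can be mimicked in software to make the role of the repeated extraction visible. The following is only a sketch under stated assumptions: exhaustive enumeration of walks stands in for random ligation in the tube, vertices are labeled 0 to n-1, the function name `adleman_sketch` is hypothetical, and the final test that the last vertex is joined to the first by an edge is an assumption added here so that the survivors correspond to cycles.

```python
from itertools import product

def adleman_sketch(n, edges):
    """Mimic the four steps on vertex labels 0..n-1.

    Returns (survivors, extraction_rounds). Exhaustive enumeration replaces
    the random formation of DNA paths, so this is an illustration only.
    """
    adjacent = set(edges) | {(v, u) for (u, v) in edges}
    # Steps 1-2: every walk with exactly n (possibly repeating) vertices.
    pool = [w for w in product(range(n), repeat=n)
            if all((a, b) in adjacent for a, b in zip(w, w[1:]))]
    rounds = 0
    # Step 3: one extraction round per vertex, hence n sequential rounds.
    for v in range(n):
        pool = [w for w in pool if v in w]
        rounds += 1
    # Step 4 (assumption of this sketch): keep walks whose two ends are
    # joined by an edge, so that each survivor closes into a cycle.
    survivors = [w for w in pool if (w[-1], w[0]) in adjacent]
    return survivors, rounds

complete4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
survivors, rounds = adleman_sketch(4, complete4)
print(len(survivors) > 0, rounds)
```

The loop over vertices is the part that grows with the size of the graph: the number of extraction rounds equals the number of vertices.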
Paths containing a specific vertex are removed from the mix when hybridized to the biotin-labeled oligo that is conjugated to paramagnetic beads. The resulting mixture is sequentially treated in a similar way with each vertex-specific oligo. The extraction procedure is undoubtedly less than 100% effective. With repeated use, errors will tend to accumulate and could result in false positives (i.e., concluding that a Hamiltonian cycle exists when in fact one does not). For this reason, other extraction techniques such as restriction endonuclease digestion have been suggested [2]. Although these methods may be more efficient than biotin beads, the repetitive use of the extraction technique is not avoided and errors still can accumulate.

2.4. Constructing 3D graphs with DNA. We have chosen a simple four vertex Hamiltonian cycle problem (Fig. 1) to illustrate how to construct 3D graphs with DNA. Each edge of the graph is represented by the 3D DNA structure shown in Fig. 2. The orientation of the strands of DNA is indicated with arrowheads placed at the 3' end. Hydrogen bonds between the anti-parallel, complementary Watson-Crick (WC) paired strands are depicted as dotted segments between the strands. As can be seen in the figure, each edge is a complex of intertwined single and double stranded segments of DNA. The double stranded sections hold the 3D structure of the edge together and can vary from 25 to 35 nucleotides in length, depending on the double-stranded stability of the sequence. For simplicity, the double helix of each of the molecules is not presented in the figure. This DNA structure is referred to as a type I edge block. A different block of this kind is needed for each edge of the graph. The 3' ends of the DNA strands in the edge block end with single stranded segments of 20 to 30 nucleotides in length that are vertex-edge specific. They have complementary sequence tails such that compatible edge blocks can form WC paired double-stranded DNA segments and be joined together.
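The compatibility of two single stranded tails can be illustrated with a short check. This is a minimal sketch, assuming both tails are written 5' to 3' and that only perfect Watson-Crick pairing counts; the function names and the 20-nucleotide example sequence are hypothetical and are not taken from any actual block design.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the Watson-Crick partner of seq, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def tails_compatible(tail_a, tail_b):
    """Two 5'->3' tails pair anti-parallel exactly when one is the
    reverse complement of the other (perfect pairing assumed)."""
    return tail_a == reverse_complement(tail_b)

# Hypothetical 20-nucleotide vertex-edge specific tail and its partner.
tail_1 = "ACGTTGCAAGGCTTAGCCAT"
tail_2 = reverse_complement(tail_1)
print(tails_compatible(tail_1, tail_2))
```

In a real design the tails would also have to be chosen so that unintended partial pairings between different vertex-edge tails are avoided, which this sketch does not examine.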
Figure 3 shows the formation of a vertex in the graph. For example, the 3' ends labeled 1 of the bottom edge are anti-parallel and complementary to the 3' ends labeled 2 of the top right edge. To form the graph, all edge blocks are combined and their compatible ends are allowed to form double-stranded DNA. Once formed, the edges are locked together by sealing all open "nicks" in the DNA strands with DNA ligase. Three-dimensional DNA structures that do not contain open ends are referred to as graph structures.

Figure 3. Construction of a trivalent vertex

2.5. Solving the Hamiltonian path problem: specific case.

1. Combine the edge blocks in a single mix, allow them to hybridize and then ligate them together.
2. Remove partially formed 3D DNA structures with open ends that have not been matched. Biotin-labeled oligos complementary to the junction points of all of the edge blocks can be used to remove incomplete structures while leaving closed structures intact. By this process paths that visit a vertex more than once are removed, except those that are part of a covering space of the graph. See Section 4.3 for a discussion of covering graphs and how to remove them.
3. Choose a vertex in the graph (e.g., $v_0$; Fig. 4). Primers specific to this vertex are used in polymerase chain reaction (PCR) amplification of the graph. Primers for each of the six double strands of DNA that pass through $v_0$ are used in PCR, and all circular strings corresponding to cycles passing through $v_0$ are amplified and isolated. Other circular strings not passing through $v_0$ (and hence not Hamiltonian cycles) will remain in the mixture at negligible concentrations.
4. Test for the presence of a Hamiltonian cycle. The lengths of DNA fragments after PCR amplification are determined by gel electrophoresis. If any of the fragments are of the appropriate length for a Hamiltonian cycle (i.e., the number of vertices times the length of each building block), then a Hamiltonian cycle exists for the graph. If, for example, the strands that make a building block of an edge are 50 base pairs long and the number of vertices of the graph is 4, a Hamiltonian cycle is 200 bp in length. No further assays are needed.

Figure 4. Cycles formed by DNA

Figure 5. A more complex graph requiring type I and type II blocks

Figure 4 illustrates the cycles formed by DNA. To simplify the figure, each line represents a double stranded DNA molecule. The graph $C_4$ is presented with thin dotted lines and the vertices are labeled. In this example, every Hamiltonian cycle is realized in the graph. By following the above procedure, we could determine whether this graph has a Hamiltonian cycle. In general, however, edge blocks cannot form every possible Hamiltonian cycle. Our proposed procedure gives a solution for this particular graph because each vertex belongs to exactly 6 cycles (and there are 6 double strands "passing" through a vertex) and all possible cycles in the graph are produced. Since this may not always be the case, a more generalized procedure is needed.

Figure 6. Type II edge building block

Figure 7. One of the DNA patterns

Figure 8. A DNA strand passing through the vertex $v$ three times

3. Solving the Hamiltonian cycle problem: general case

In this section we present a general solution to the Hamiltonian cycle problem. When considering the graph presented in Fig. 5, one can easily see that it is impossible to produce all cycles in the graph using only type I edge blocks. To obtain all cycles of this graph, some of the internal cycles must contain non-crossover DNA segments. In order to circumvent this problem, we propose a type II edge block as shown in Fig. 6. By using both type I and type II blocks, all possible cycles of the graph can be generated. A specific pattern of cycles obtained by these blocks is presented in Fig. 7.
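For small graphs, the set of cycles that the edge blocks are meant to realize can be listed by brute force. The sketch below is an illustration only: it assumes an undirected simple graph with vertices labeled 0 to n-1, the function name `simple_cycles` is hypothetical, and the enumeration is exponential, so it says nothing about the DNA procedure itself. Applied to the four-vertex example of Section 2.5, it reproduces the counts quoted there (seven cycles in total, three of them Hamiltonian, six through each vertex) and the 200 bp fragment length expected for 50 bp blocks.

```python
from itertools import combinations, permutations

def simple_cycles(n, edges):
    """All undirected simple cycles (length >= 3), each listed once."""
    adjacent = set(edges) | {(v, u) for (u, v) in edges}
    cycles = []
    for size in range(3, n + 1):
        for subset in combinations(range(n), size):
            first, rest = subset[0], subset[1:]
            for order in permutations(rest):
                if order[0] > order[-1]:
                    continue  # mirror image of a cycle listed elsewhere
                walk = (first,) + order
                closed = walk[1:] + (first,)
                if all((a, b) in adjacent for a, b in zip(walk, closed)):
                    cycles.append(walk)
    return cycles

n = 4
block_bp = 50  # block length used in the example of Section 2.5
complete4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cycles = simple_cycles(n, complete4)
hamiltonian = [c for c in cycles if len(c) == n]
through_v0 = [c for c in cycles if 0 in c]
expected_length_bp = n * block_bp
print(len(cycles), len(hamiltonian), len(through_v0), expected_length_bp)
```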
In Fig. 7, the vertices of the graph are represented by black dots and the edges are represented by thin dotted segments. In this pattern Hamiltonian cycles are realized.

Figure 9. A building block for vertices

With this example, however, another problem arises. Although all cycles through the graph can be produced by using both types of edge blocks, circular molecules that pass through a vertex up to three times (Fig. 8) also are possible. Each of the inner three double strands presented in Fig. 3 can be a part of the same circular double stranded molecule. We therefore need to include an additional DNA structure: a vertex building block. A vertex building block of degree 3 is shown in Fig. 9. Shaded areas in the double stranded molecules of the block indicate segments that encode sequences for sites that can be cleaved by restriction endonuclease enzymes. Both sites are encoded with the same recognition site and can be cleaved simultaneously. Only the "inner" strands of the blocks are important to the graph solution. The "outer" strands are needed only as support for the inner ones. Three blocks are constructed for every vertex combination such that all cyclically rotated versions are represented. This allows a configuration where every pair of inner double strands of a vertex block can be cut by a restriction endonuclease.

Building blocks for higher degree vertices are constructed in a manner similar to that shown in Fig. 9. For a given vertex $v$ (with degree at least 3), a separate building block is needed for every combination of two edges $e_i$, $e_j$ that pass through the vertex. For example, if a vertex has degree 5 with edges $\{e_1, \ldots, e_5\}$ passing through $v$, then a building block of the form presented in Fig. 10 is needed for each $\{i, j\}$ ($i \neq j$), where $\{k, k', k''\} = \{1, \ldots, 5\} \setminus \{i, j\}$. Figure 10 represents the inner double strands of the building block; the actual structure is essentially similar to the one presented in Fig. 9. The shaded areas in Fig. 10 again represent restriction endonuclease recognition sites. The nonshaded areas of the block are double stranded molecules encoding the path $e_i e_j$ (or $e_j e_i$) that could be part of a cycle passing through vertex $v$. This block is considered compatible with $e_i$, $e_j$ and is denoted by $B_v(e_i, e_j)$. Note that if a graph has a degree 1 vertex, then there is no Hamiltonian cycle, and if a vertex has degree 2 shared by edges $e_1$ and $e_2$, then the path $e_1 e_2$ can be substituted by an edge block.

Figure 10. A degree five vertex $v$ building block (edges labeled $e_i$, $e_j$, $e_k$, $e_{k'}$, $e_{k''}$)

3.1. Remark. If the degree of a vertex $v$ is $k_v \geq 3$, then the number of blocks needed is

$$\sum_{i=1}^{n} \binom{k_{v_i}}{2} + 2m,$$

where $v_1, \ldots, v_n$ are the vertices and $m$ is the number of edges. This expression is bounded by a cubic polynomial in the number of vertices $n$.

3.2. Proposition. Let $C = \{G_1, \ldots, G_m\}$ be the set of graph structures $G_i$ that are obtained by adding all building block molecules in a tube and allowing them to join and be ligated. Assume that $C$ contains all possible graph structures. Let $c = v_0 v_1 \cdots v_k v_0$ be a cycle in the graph $G$. Then there is $G_j \in C$ that contains a circular double stranded DNA molecule that encodes $c$ and does not contain a restriction site.

Proof: Let $e_i$ be the edge from $v_i$ to $v_{i+1}$ ($i + 1$ is taken modulo $k + 1$) for $i = 0, \ldots, k$. If a vertex $v_i$ has degree 2, then there are edge building blocks for the path $e_{i-1} e_i$. Thus we can assume that all vertices have degree at least 3. We form a graph structure $G_j$ containing vertex blocks $B_0, \ldots, B_k$, where $B_i = B_{v_i}(e_{i-1}, e_i)$ ($i - 1$ is taken modulo $k + 1$), and edge blocks for edges $e_0, \ldots, e_k$. If $G$ has more than $k + 1$ vertices, then for $G_j$ we use arbitrary blocks for vertices other than $v_0, \ldots, v_k$. A building block for $e_i$ connects blocks $B_i$ and $B_{i+1}$.
The inner double strands of the two edge building blocks (type I and II) of $e_i$ give four possibilities of paths to be constructed: $e_{i-1} e_i e_{i+1}$, $e_{i-1} e_i e'_k$, $e_k e_i e_{i+1}$ and $e_k e_i e'_k$, where $e_k$, $e'_k$ are some other edges incident to vertices $v_i$ and $v_{i+1}$ respectively. Each type (I or II) of building block for $e_i$ allows exactly two of these paths to be encoded. The structure $G_j$ is then constructed such that the building block for $e_i$ always assures a DNA segment encoding a path $e_{i-1} e_i e_{i+1}$ from block $B_i$ to $B_{i+1}$ for $i = 0, \ldots, k - 1$. Hence, the structure $G_j$ contains a double stranded circular DNA molecule encoding the cycle $c$, and by the specific choice of $B_0, \ldots, B_k$ (each $B_i$ is compatible with $e_{i-1} e_i$), $c$ does not contain a restriction endonuclease recognition site.

Figure 11. Incompatible orientations

3.3. Procedure. The following procedure describes the DNA based algorithm for solving the Hamiltonian cycle problem:

1. Combine all building block molecules in a tube and allow them to hybridize and then be ligated.
2. Remove molecules that are partially annealed or are not part of a graph structure.
3. Remove larger "covering graphs" by gel electrophoresis. Covering graphs will be discussed below.
4. Digest the DNA with a restriction endonuclease. This cleaves cycles that pass through a vertex more than once.
5. PCR amplify using primers specific to a chosen vertex $v_0$.
6. Size sort the DNA fragments by gel electrophoresis to determine if a Hamiltonian cycle is present.

In step 1 it is assumed that every possible graph structure is obtained. By Proposition 3.2, every cycle in the graph is represented by a circular, double-stranded DNA molecule in one of the graph structures obtained. In step 2 we remove the partially formed structures. This is done with biotin beads, as before. In step 4 all closed paths that pass through a vertex more than once are linearized.
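Steps 4 to 6 of the procedure act as a sequence of filters on the circular molecules present after ligation, and their logic can be sketched in software. This is only an idealized sketch under stated assumptions: each circular molecule is represented as a tuple of the vertices it passes through (the last vertex joins the first), every passage through a vertex contributes one block length of 50 bp as in the example of Section 2.5, all enzymatic steps are error free, and the function name `hamiltonian_detected` is hypothetical.

```python
def hamiltonian_detected(molecules, n, v0=0, block_bp=50):
    """Idealized steps 4-6 on closed walks given as vertex tuples."""
    # Step 4: a molecule revisiting a vertex carries a restriction site
    # and is linearized by the digestion, so it is no longer circular.
    circular = [m for m in molecules if len(set(m)) == len(m)]
    # Step 5: primers specific to v0 amplify only circles through v0.
    amplified = [m for m in circular if v0 in m]
    # Step 6: gel readout; a band at n * block_bp signals a Hamiltonian cycle.
    lengths = {len(m) * block_bp for m in amplified}
    return n * block_bp in lengths

# A triangle, a Hamiltonian cycle, and a circle passing twice through vertex 0.
with_solution = [(0, 1, 2), (0, 1, 2, 3), (0, 1, 0, 2)]
without_solution = [(0, 1, 2), (0, 1, 0, 2)]
print(hamiltonian_detected(with_solution, 4),
      hamiltonian_detected(without_solution, 4))
```

The molecule `(0, 1, 0, 2)` has the same length as a Hamiltonian cycle of a four-vertex graph, which is why the digestion in step 4 is needed before the size test in step 6.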
Any circular molecules that remain after digestion represent a cycle in the graph. The Hamiltonian cycle is obtained by PCR amplification in the same way as previously described. It is important to note that none of the steps performed depends on the number of vertices or edges in the graph. This procedure, therefore, describes a constant time algorithm for solving the NP-complete HCP.

Figure 12. T-shaped structure

Figure 13. A physical model of type I edge block

4. Discussion of the feasibility of the method

4.1. Complexity of the building blocks. The idea of using different building blocks to depict the entire graph structure was inspired by the work of N. Seeman and his research group [10], where they describe the construction of different 3D DNA structures. The building blocks proposed here are somewhat different, but contain double parallel crossover molecules which have been previously discussed (see references in [10]). Currently, a major limitation of our approach is that the algorithm presented assumes that the desired building blocks already are available. This is likely to be the most difficult and complex part of the procedure, and specific laboratory experiments are necessary to determine the degree of difficulty involved in forming these building blocks. Seeman [9] suggested two physical models for exploring 3D DNA structures and the plausibility of their construction. We built the structure we propose (type I edge building block) using one of his models, as shown in Fig. 13. This supports the plausibility of our construction.

It could be simpler to encode building blocks with single stranded molecules which would carry the information of the double stranded molecules employed in our building blocks. Figure 11 shows an example of a possible single-stranded structure forming a degree 3 vertex. Since the "inner" and "outer" segments are switched at the top, the argument of Proposition 3.2 cannot be applied.
In order to apply the argument of Proposition 3.2, additional building blocks for the vertices are needed that would realize all the possible extra permutations between the "inner" and the "outer" segments. Furthermore, a difficulty arises in avoiding circular single stranded molecules that pass through a vertex more than once. This difficulty could be overcome by adding a hairpin-like structure to the single strands which would contain a restriction site (Fig. 12). This additional structure was suggested to us by Seeman. The shaded part indicates the site that can be cleaved by restriction endonuclease enzymes and can be used to eliminate circular molecules that pass through a vertex more than once. This restriction site corresponds to the restriction site (shaded area of Fig. 9) of the vertex block. With these alterations of the single-stranded building blocks, Proposition 3.2 holds and the algorithm suggested in Procedure 3.3 solves the HCP in constant time.

Figure 14. A double cover of a graph

Another possible approach, suggested by Winfree [13], is the use of $k$-armed junction molecules with sticky ends as vertex building blocks. These molecules could be formed from partially phosphorylated oligos such that, after formation of the graph structure by hybridization, the ligation process generates only circular molecules that encode cycles (paths that visit a vertex at most once). This simple design of vertex building blocks somewhat increases the number of building blocks needed, but its simplicity provides a more feasible way of using 3D DNA structures and solving the HCP in constant time.

4.2. Generating graphs. Step 1 of the procedure described in the previous section assumes that all possible combinations of graph structures are generated. This step is similar to the first step in many DNA based algorithms, including Adleman's algorithm described in Section 2.
Studies have shown, however, that even in a fairly simple mixture, appropriate reaction conditions for annealing are important and difficult to determine [6]. To date, we are unaware of laboratory experiments of this type being performed on 3D DNA structures. It is impossible, therefore, to assess the feasibility of generating all possible graph structures.

4.3. Covering graphs. All of the graph constructions proposed here are local in the sense that we start from vertex and edge building blocks that are assembled to form graphs. Graphs other than the ones intended, however, can be obtained (Fig. 14) and may consist of two (or more) copies of the given graph with a pair of edges crossing each other to form a new, larger graph. The graph in Fig. 14 is a double covering space of $C_4$ when the graphs are considered as topological spaces (1-dimensional complexes). Theoretically, any covering graph of the intended graph is possible.

4.4. Proposition. Let $G = (V, E)$ be a graph. Let $\hat{E}$ be a set of building blocks obtained by DNA molecules that encode the edges of $G$ and $\hat{V}$ a set of building blocks obtained by DNA molecules that encode the vertices of $G$. Assume that the building blocks have single stranded 3' ends such that the building blocks of each vertex $v$ in $V$ have Watson-Crick compatible regions with the edges that are incident to $v$. If multiple copies of building blocks $\hat{V}$ and $\hat{E}$ are allowed to hybridize and ligate in a tube, then the resulting graph structures are, topologically, covering spaces of $G$.

This proposition follows from the definition of a covering space and the fact that the structure that is formed is obtained locally (see for example [7] (Section 8.3)). Since the structures representing a covering space of a graph are much larger than the structures representing the original graph, it may be possible to remove them by gel electrophoresis.
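The local nature of a covering graph can be made concrete with a small construction. The sketch below is an illustration under stated assumptions: a double cover is built combinatorially by taking two copies ("sheets") of the graph and letting the two lifts of selected edges cross between the sheets, and the covering property is checked as the requirement that the neighbours of every lifted vertex project one-to-one onto the neighbours of the original vertex. The function names are hypothetical, and nothing here models the DNA chemistry.

```python
def double_cover(edges, crossed):
    """Two sheets of the graph; lifts of edges in `crossed` swap sheets."""
    lifted = []
    for (u, v) in edges:
        flip = 1 if (u, v) in crossed else 0
        for sheet in (0, 1):
            lifted.append(((u, sheet), (v, sheet ^ flip)))
    return lifted

def neighbours(edges):
    """Adjacency lists of an undirected graph given as an edge list."""
    table = {}
    for a, b in edges:
        table.setdefault(a, []).append(b)
        table.setdefault(b, []).append(a)
    return table

def is_covering(base_edges, lifted_edges):
    """Check that each lifted vertex (v, sheet) sees exactly one lift of
    every neighbour of v, which is the local covering condition."""
    base_nb = neighbours(base_edges)
    lift_nb = neighbours(lifted_edges)
    return all(sorted(w for (w, _) in lift_nb[(v, s)]) == sorted(base_nb[v])
               for (v, s) in lift_nb)

complete4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cover = double_cover(complete4, crossed={(0, 1)})
print(len(neighbours(cover)), len(cover), is_covering(complete4, cover))
```

Every vertex of the cover looks locally like the corresponding vertex of the original graph, so building blocks that match only on their sticky ends cannot tell the two apart; the cover differs only in its total size, here eight vertices and twelve edges instead of four and six.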
Avoiding the formation of covering space graphs altogether, however, may be impossible.

We end this section with a note that, with the exception of forming the building blocks, many of the laboratory techniques suggested are fairly standard and their detailed description can be found in [3]. Their application to complex DNA structures might, however, require extensive experimentation before they can be used as proposed.

4.5. Conclusion. The procedure and the algorithm presented are only an attempt to demonstrate that using higher dimensional structures in DNA based algorithms might provide simpler and faster solutions to many problems. Electronic computers use information only in a linear (string) form. By using DNA, we can take advantage of 3D structures that can contain considerably more information than the sequence of nucleotides from which they are made. At the moment, however, obtaining 3D DNA structures is difficult. The algorithms suggested by Adleman [1] and Lipton [5] are not of constant time, but the laboratory techniques involved are fairly standard and their general characteristics are known. In contrast, our approach requires a constant number of steps (regardless of the size of the problem), but the laboratory techniques are generally uncharacterized as to their difficulty and specificity. It is our belief, however, that as polymer technology develops, constructing the vertex and edge building blocks needed for this approach should become easier. Even if building blocks were readily available, the uncontrolled formation of covering space structures remains a formidable obstacle. Nonetheless, the covering spaces themselves carry a substantial amount of information and may be useful in other computational problems.

References

[1] L. Adleman, Molecular computation of solutions of combinatorial problems, Science 266 (1994), 1021-1024.
[2] M. Amos, A. Gibbons, D. Hodgson, Error-resistant implementation of DNA computations, in [8].
[3] F.M. Ausubel, R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, K. Struhl, P. Wang-Iverson and S.G. Bonitz, Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Interscience, New York, NY (1993).
[4] DNA Based Computers, edited by E. Baum, R. Lipton, DIMACS series vol. 27, AMS 1996.
[5] R. Lipton, DNA solution of hard computational problems, Science 268 (1995), 542-545.
[6] N. Jonoska, S.A. Karl, Ligation Experiments in DNA Computations, Proceedings of 1997 IEEE International Conference on Evolutionary Computation (ICEC'97), April 13-16, 1997, 261-265.
[7] J.R. Munkres, Topology, a first course, Prentice-Hall, 1975.
[8] Proceedings of the Second Annual Meeting on DNA Based Computers, DIMACS Workshop, Princeton, NJ, June 10-12, 1996, (in press).
[9] N.C. Seeman, Physical Models for Exploring DNA Topology, Journal of Biomolecular Structure and Dynamics 5 (1988), 997-1004.
[10] N.C. Seeman et al., The Perils of Polynucleotides: The Experimental Gap Between the Design and Assembly of Unusual DNA Structures, in [8].
[11] N.C. Seeman, private communication.
[12] E. Winfree, X. Yang, N.C. Seeman, Universal Computation via Self-assembly of DNA: Some Theory and Experiments, in [8].
[13] E. Winfree, private communication.

Department of Mathematics, University of South Florida, Tampa, Florida 33620
E-mail address: [email protected]

Department of Biology, University of South Florida, Tampa, Florida 33620
E-mail address: [email protected]

Department of Mathematics, University of South Florida, Tampa, Florida 33620
E-mail address: [email protected]