DNA Computing
Charles Ormsby III
CSE 497
4/15/2004
Outline
• DNA Computing Characteristics
• Different Approaches
• Lipton’s Paper
– DNA Solution of Hard Computational Problems
• Practical Purposes
• Future Work/Funding
• References
DNA Computing Characteristics
(Advantages & Disadvantages)
DNA Computation Characteristics
Parallel Processing
Processes all possible solutions simultaneously!
Well kind of, but it is not instantaneous
AND, it is a Physical Process!
Therefore, the molecular steps required to process
the solution set can take weeks
But, we are finding ways to improve time efficiency! More To Come
DNA Computation Characteristics
Read/Write Rate of DNA
DNA replication rate = 500 base pairs per second
- 10 times faster than human cells
- Very low error rates
But that is only 1000 bits/sec (2 bits per base pair). Compared to the data
throughput of an average hard drive: SLOW!!!
Can anyone think of an advantage that DNA-based computers might
have over the way today’s PC’s interact with memory?
http://www.arstechnica.com/reviews/2q00/dna/dna-2.html
DNA Computation Characteristics
…YES, copies of the replication enzymes can work on DNA in parallel
*Bonus* - Replication enzymes can start on the second replicated strand of DNA even
before they're finished copying the first one. So already the data rate jumps to
2000 bits/sec
Electronic computers are incapable of such a feat!
http://www.arstechnica.com/reviews/2q00/dna/dna-2.html
DNA Computation Characteristics
Read/Write Rate of DNA (cont’d)
Look what happens after each replicating iteration
– number of DNA strands increases exponentially
• 2^n after n iterations
– Data rate increases by 1000 bits/sec per strand
After 10 iterations, replication rate = 1Mbit/sec
And, after 30 iterations it increases to 1000 Gbits/sec
This is well beyond the sustained data rates of the fastest hard drives!!!
http://www.arstechnica.com/reviews/2q00/dna/dna-2.html
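The arithmetic on this slide can be checked with a minimal sketch. It assumes the figures quoted above (500 base pairs per second, 2 bits per base pair, strand count doubling each iteration); the function and constant names are illustrative only.

```python
# Figures taken from the slides; names are hypothetical.
BASES_PER_SECOND = 500   # replication rate per strand
BITS_PER_BASE = 2        # four bases {A, T, C, G} -> 2 bits each


def replication_rate_bits_per_sec(iterations: int) -> int:
    """Aggregate data rate after the strand count has doubled `iterations` times."""
    strands = 2 ** iterations
    return strands * BASES_PER_SECOND * BITS_PER_BASE


for n in (0, 1, 10, 30):
    print(f"after {n:2d} iterations: {replication_rate_bits_per_sec(n):,} bits/sec")
```

Ten doublings give about 1 Mbit/sec and thirty give about 10^12 bits/sec (1000 Gbits/sec), matching the figures above. This ignores the physical time each replication round takes.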
DNA Computation Characteristics
Data density – { A, T, C, G}
Bases spaced every 0.35 nanometers
1-dimension = about 145 Mbits (roughly 18 Mbytes) per inch, at 2 bits per base
2-dimension = Over one million Gbits per square inch
(assuming one base per square nanometer)
Typical high performance hard drive
data density = 7 Gbits per square inch
A factor of over 100,000 lower!!
http://www.arstechnica.com/reviews/2q00/dna/dna-2.html
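A rough sketch of the density arithmetic, assuming 2 bits per base, the 0.35 nm base spacing quoted above, and one base per square nanometer for the two-dimensional case. The names are illustrative, and the only outside fact used is that 1 inch = 2.54e7 nm.

```python
NM_PER_INCH = 2.54e7      # 1 inch = 25.4 mm = 2.54e7 nanometers
BASE_SPACING_NM = 0.35    # spacing between bases along a strand (from the slide)
BITS_PER_BASE = 2         # four bases -> 2 bits each


def linear_bits_per_inch() -> float:
    """Bits stored along one inch of a single strand."""
    return NM_PER_INCH / BASE_SPACING_NM * BITS_PER_BASE


def areal_bits_per_sq_inch(bases_per_sq_nm: float = 1.0) -> float:
    """Bits stored per square inch at the given surface packing."""
    return NM_PER_INCH ** 2 * bases_per_sq_nm * BITS_PER_BASE


HARD_DRIVE_BITS_PER_SQ_INCH = 7e9  # 7 Gbits per square inch (from the slide)

print(f"linear: {linear_bits_per_inch() / 1e6:.0f} Mbits per inch")
print(f"areal:  {areal_bits_per_sq_inch() / 1e9:,.0f} Gbits per square inch")
print(f"ratio:  {areal_bits_per_sq_inch() / HARD_DRIVE_BITS_PER_SQ_INCH:,.0f}x the hard drive")
```

Under these assumptions the linear figure works out to roughly 145 Mbits (about 18 Mbytes) per inch, and the areal figure to a little over one million Gbits per square inch.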
DNA Computation Characteristics
Double stranded nature
- Every DNA sequence has a natural complement
If S = ATTACGTCG
S' = TAATGCAGC, its complement
DNA’s complementary nature makes it a unique data
structure for computation and can be exploited in
many ways, such as Error Correction
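As an illustration of the complement idea, here is a small sketch that computes the base-by-base complement used in the example above and flags positions where a partner strand disagrees with it. The helper names are hypothetical, and the complement is taken position by position (not reversed), as on the slide.

```python
# Watson-Crick pairing: A<->T, C<->G
_PAIRS = str.maketrans("ATCG", "TAGC")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return strand.translate(_PAIRS)


def mismatches(strand: str, partner: str) -> list[int]:
    """Positions where `partner` is not the complement of `strand`."""
    return [
        i
        for i, (base, other) in enumerate(zip(strand, partner))
        if complement(base) != other
    ]


S = "ATTACGTCG"
print(complement(S))                 # complement of the slide's example strand
print(mismatches(S, "TAATGCAGG"))    # a partner with one damaged base
```

A mismatch position is where a repair step would consult the intact strand to restore the damaged one.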
DNA Computation Characteristics
DNA Error Rates
• Biological error rate 1/10^9 copied bases
• Hard drive read error rate 1/10^13
Error Correction: Errors occur due to many factors, for example…
– Incorrect insertions/deletions
– Damage from thermal energy and UV energy from the sun
However, if the error occurs in one of the strands of double
stranded DNA, repair enzymes can restore the proper DNA
sequence by using the complement strand as a reference.
Analogy: a RAID 1 (mirrored) array
http://www.arstechnica.com/reviews/2q00/dna/dna-1.html
DNA Computation Characteristics
The Statistics of Randomness
Pertaining to Adleman’s method…
All HDPP’s paths are equally likely to be formed during the random
production of sequences
In other words, over a large well distributed solution set, all
solutions (or at least a great majority) should be present
*This is key because in order for the DNA computer to arrive at the
correct solution, the solution must first exist in the solution set
Statistics – If only 99% of the solutions exist in the solution set
then the method will have a success rate of only…?
Different Approaches
Free Floating vs. DNA Chips
Free Floating
Approach 1: Bits of DNA float freely in a test tube
– (pioneered by Leonard M. Adleman)
Free Floating
Advantages:
- Strong general problem solving application
- Increased freedom in experimentation
i.e. Immediate scalability by amplification
(could the freedom also be considered a disadvantage?)
- Can encode unique problems
- Scales very well
Can you think of any other advantages?
HAHA, neither could I
DNA-based Chips
Approach #2: A gold-plated square of glass (one inch
square) anchors as many as a trillion individual
strands of DNA to the glass.
Microarrays
http://www.dhgp.de/ethics/ethics02.html
DNA-based Chips
Advantages:
- Easier to handle, specific orientation
- Keeps out impurities
- Serves as a building block to scale upwards
- Programmable interfaces (in the future)
- Very useful for storing information about Bioagents
Business Quiz:
Why is this approach more appealing to
corporations and institutions who fund research?
DNA-based Chips
Can be manufactured!!! =
$$$$$$$$$$$$$$$$
Lipton’s Paper
DNA Solution of Hard Computational Problems
Lipton, Richard J., DNA Solution of Hard Computational Problems.
Science, New Series, Vol. 268, No. 5210 (April 28, 1995), 542-545
Richard Lipton’s: DNA Solution of Hard
Computational Problems
Two factors limit any computer’s performance
1) Parallel processing capabilities
3 grams of water ≈ 10^22 molecules
2) Computations per unit time
100 million instructions per second
Human Time vs. Computation Time
Richard Lipton’s: DNA Solution of Hard
Computational Problems
State-of-the-Art Supercomputer
– 100 million instructions per second
– Biological computers are limited to only a
fraction of an experiment per second
• Doesn’t the complexity of the experiment
determine the difference?
However, DNA computers counter the
instruction time disparity with parallelism
Richard Lipton’s: DNA Solution of Hard
Computational Problems
Traveling Salesman Revisited
– Conventional computer can solve tour with 70 cities, but
would fail with 100 or more cities
• Even with 10^23 parallel processors, brute force is too inefficient
However, are DNA computers only advantageous for
problems with very large solution sets?
No, Adleman’s work can be extended to produce
solutions to all problems that are obtainable and
unobtainable by traditional CPUs in much less time
Richard Lipton’s: DNA Solution of Hard
Computational Problems
NP-complete → The Satisfaction Problem
(SAT)
SAT is a simple search problem, and was one of the
first NP-complete problems
Consider:
F = (x ∨ y) ∧ (¬x ∨ ¬y)
Current Best Method: test all 2^n solutions for ‘n’
variables
Richard Lipton’s: DNA Solution of Hard
Computational Problems
Truth Table
x  y  |  F = (x ∨ y) ∧ (¬x ∨ ¬y)
0  0  |  0
0  1  |  1
1  0  |  1
1  1  |  0
Current Best Method: test all 2^n solutions for ‘n’
variables
Richard Lipton’s: DNA Solution of Hard
Computational Problems
Initial Assumptions/Conditions
– This model is simple and idealized
• Ignores many known complex effects, but is an excellent
first order approximation
– Strands of DNA are just sequences
• α1,…, αk of the set {A,C,G,T}
– Double stranded DNA is a pair of sequences
• Given α1,…, αk and b1,…, bk, both sequences over the set
{A,C,G,T}, αi must complement bi for each i = 1,…,k,
meaning A pairs with T and C pairs with G
– Only consider strands with a length of 20
nucleotides
Richard Lipton’s: DNA Solution of Hard
Computational Problems
Five simple operations that can be performed
on test tubes that contain DNA strands
1) Possible to synthesize a large number of copies of any
single strand
2) Annealing produces a double strand from a single strand
and its complementary strand
3) Given a test tube of DNA, one can extract a strand that
contains some simple pattern of length ‘l’
4) Using a Polymerase Chain Reaction (PCR), one can detect
whether there are DNA strands at all in the test tube
5) All of the DNA in the test tube may be amplified by
replicating the strands in the test tube
The Theory
One fixed test tube
– The set in the test tube corresponds to the
following graph Gn
All paths that travel from a_1 to a_(n+1) encode an
‘n’-bit binary string
At each stage, a path has exactly two choices
1) Unprimed node encodes a 1
2) Primed node encodes a 0
Therefore, the example path a1x’a2ya3 encodes 01
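The path-to-bit-string correspondence can be sketched as follows. This is an illustrative model only: the variable nodes are named x1, x2, … (primed for 0) rather than x, y as on the slide, and the graph itself is implied by the path generator rather than stored.

```python
from itertools import product


def paths(n: int):
    """Yield every path a1 -> ... -> a(n+1), choosing xk or xk' at each stage."""
    for bits in product("01", repeat=n):
        nodes = []
        for k, bit in enumerate(bits, start=1):
            nodes.append(f"a{k}")
            # Unprimed node encodes a 1, primed node encodes a 0
            nodes.append(f"x{k}" if bit == "1" else f"x{k}'")
        nodes.append(f"a{n + 1}")
        yield nodes


def decode(path: list[str]) -> str:
    """Read the bit string back off a path's variable nodes."""
    return "".join(
        "0" if node.endswith("'") else "1"
        for node in path
        if node.startswith("x")
    )


for p in paths(2):
    print(" -> ".join(p), "encodes", decode(p))
```

With n = 2 there are four paths, one for each 2-bit string, and the path through the primed first node and unprimed second node decodes to 01 as in the example above.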
The Solution Set Discovery
1) Encode graph’s vertices in DNA
2) Encode edges in DNA
3) Encode starting and ending points in DNA
Step 1 - vertices in DNA
• The Graph is encoded in a test tube of DNA
– Each vertex of the graph is assigned a random
pattern of length ‘l’ from {A,C,G,T}
• Each encoding is referred to as the name of
the vertex and is comprised of two parts
1st half → p_i
2nd half → q_i
Therefore, each vertex can be referenced by p_i q_i
Step 2 - edges in DNA
Then, fill a test tube with the following…
…For each vertex i, add many copies of a 5’ → 3’ DNA sequence of the
form p_i q_i
…For each edge i → j, put many copies of a 3’ → 5’ sequence that is of
the form (Γq_i Γp_j), where Γ denotes the base-by-base complement
If…
Vertex i = ATCGGCTACTCCTGACTTGA
p_i = ATCGGCTACT
q_i = CCTGACTTGA
Vertex j = AGGTTCAGTCAGGCCTATTC
p_j = AGGTTCAGTC
q_j = AGGCCTATTC
Therefore, for edge i → j a sequence like the following would be
added…
Γq_i = GGACTGAACT
+
Γp_j = TCCAAGTCAG
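The edge construction can be reproduced with a short sketch. Judging by the example sequences on this slide, the edge strand is the complement of the second half of the source vertex followed by the complement of the first half of the destination vertex. The function names here are hypothetical.

```python
_PAIRS = str.maketrans("ATCG", "TAGC")


def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement."""
    return strand.translate(_PAIRS)


def split_name(name: str) -> tuple[str, str]:
    """Split a vertex name into its first half (p) and second half (q)."""
    half = len(name) // 2
    return name[:half], name[half:]


def edge_strand(vertex_i: str, vertex_j: str) -> str:
    """Strand for edge i -> j: complement of q_i followed by complement of p_j."""
    _, q_i = split_name(vertex_i)
    p_j, _ = split_name(vertex_j)
    return complement(q_i) + complement(p_j)


vertex_i = "ATCGGCTACTCCTGACTTGA"
vertex_j = "AGGTTCAGTCAGGCCTATTC"
print(edge_strand(vertex_i, vertex_j))
```

The first ten bases of the result pair with the end of vertex i and the last ten pair with the start of vertex j, which is what lets the edge strand hold the two vertex strands together.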
Step 3 – end points in DNA
Then, add the following DNA strands…
…Add a 3’ → 5’ sequence of length ‘l/2’ that is complementary to the
first half of the initial vertex
…Similarly, add a 3’ → 5’ sequence of length ‘l/2’ that is complementary
to the last half of the final vertex
In other words, add Γp_1 and Γq_n
If the initial vertex was…
ACTTGCCATCTCCGATACTT
And the final vertex was…
TCGCCTAATCTACGATCTTA
then add…
TGAACGGTAG and ATGCTAGAAT
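A minimal sketch of this step, assuming each vertex name is split exactly in half; `end_caps` is an illustrative name, not one from the paper.

```python
_PAIRS = str.maketrans("ATCG", "TAGC")


def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement."""
    return strand.translate(_PAIRS)


def end_caps(initial_vertex: str, final_vertex: str) -> tuple[str, str]:
    """Half-length strands that cap the start and the end of every path."""
    first_half = initial_vertex[: len(initial_vertex) // 2]
    last_half = final_vertex[len(final_vertex) // 2:]
    return complement(first_half), complement(last_half)


start_cap, end_cap = end_caps("ACTTGCCATCTCCGATACTT", "TCGCCTAATCTACGATCTTA")
print(start_cap, end_cap)
```

Run on the slide's two vertices, this reproduces the two ten-base strands listed above.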
Goal of Initial Solution Set
KEY = That every legal path in Gn corresponds to a
correctly matched sequence of vertices and edges
*** Any path through the graph must contain a sequence that
alternates between vertex, edge, vertex, edge,...
Try this visual…
Consider the edge v → u, any path that passes through v
and then passes through u must fit together like “bricks”
So, the top 5’ → 3’ strand represents a series of vertices
Whereas, the bottom 3’ → 5’ strand represents an edge
Furthermore…
Vertex ‘v’ is encoded as p_v q_v
Edge ‘v → u’ is encoded as Γq_v Γp_u
Why is this ordering significant?
…the end of the vertex and the beginning of the edge
can anneal because they are complementary!
Similarly, the end of the edge and the beginning of the
next vertex can anneal too!
High probability of no inadvertent paths
1) Sequences are chosen at random
2) The sequence lengths are large
After the annealing, all of the possible paths through
the graph will be encoded as DNA sequences, each
representing an ‘n’-bit string
Similarity Between
Sequences
At any given vertex in a path, the choice is simply left or right,
therefore, all paths are similar
What does this mean?
All paths are equally likely to be formed during the random
production of sequences
In other words, over a large well distributed solution set, all
solutions (or at least a great majority) should be present
***This is key because in order for the computer to arrive at the
correct solution, the solution must first exist in the solution set
Statistics!
Extraction Operations
Notation
E(t,i,a) denotes all sequences in test tube ‘t’ whose i-th bit equals a
This is done by performing one extract operation that…
…checks for the sequence that corresponds to the name of x_i if
a = 1,
…and, if a = 0, checks for x’_i
Extraction Operations
1) Construct a series of test tubes
Test tube                                     Values Present
t0 = all 2-bit strings                        {00,01,10,11}
t1 = E(t0, 1, 1)                              {10,11}
t’1 = remainder of t0 (after extracting t1)   {00,01}
t2 = E(t’1, 2, 1)                             {01}
Pour t1 and t2 together to form t3
t3 = t1 + t2                                  {01,10,11}
Extraction Operations
2) Construct a series of test tubes
Test tube                                     Values Present
t4 = E(t3, 1, 0)                              {01}
t’4 = remainder of t3 (after extracting t4)   {10,11}
t5 = E(t’4, 2, 0)                             {10}
Pour t4 and t5 together to form t6
t6 = t4 + t5                                  {01,10}
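The two rounds of extraction can be traced with a small simulation. It models each test tube as a Python set of bit strings and `extract` as a filter on one bit position. This is an idealized sketch that assumes error-free operations, and the variable names simply mirror the tube labels.

```python
def extract(tube: set[str], i: int, a: int) -> set[str]:
    """E(t, i, a): strands in `tube` whose i-th bit (1-indexed) equals a."""
    return {s for s in tube if s[i - 1] == str(a)}


# Clause 1: (x OR y)
t0 = {"00", "01", "10", "11"}        # every 2-bit assignment
t1 = extract(t0, 1, 1)               # x = 1
t1_rest = t0 - t1                    # what is left of t0
t2 = extract(t1_rest, 2, 1)          # y = 1 among the rest
t3 = t1 | t2                         # pour t1 and t2 together

# Clause 2: (NOT x OR NOT y)
t4 = extract(t3, 1, 0)               # x = 0
t4_rest = t3 - t4                    # what is left of t3
t5 = extract(t4_rest, 2, 0)          # y = 0 among the rest
t6 = t4 | t5                         # pour t4 and t5 together

print(sorted(t3), sorted(t6))
```

Because the second remainder is taken from t3, it can only contain strings that already satisfy the first clause, so 00 never reappears.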
Extraction Operations
3) Check to see if there are DNA strands
available in t6
Those left in t6 are the satisfying assignments!
Understanding How it Works
Test tube t3 consists of all the sequences that satisfy
the first clause {01,10,11}
…and, similarly t6 consists of all those that satisfy the
second clause and are contained in t3
More General Case
Any SAT problem on…
‘n’ variables, and
‘m’ clauses,
can be solved with a number of extract steps on the order of ‘m’
(with one detect step at the end)
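The general procedure can be sketched as a loop over clauses. This is a software simulation under the same idealized assumptions (perfect extraction, every assignment present in the initial tube), not a model of the chemistry. The clause format, a list of (variable index, required bit) pairs, is an assumption made for this example.

```python
from itertools import product

Clause = list[tuple[int, int]]  # (1-indexed variable, bit): (1, 1) is x1, (1, 0) is NOT x1


def extract(tube: set[str], i: int, a: int) -> set[str]:
    """E(t, i, a): strands whose i-th bit equals a."""
    return {s for s in tube if s[i - 1] == str(a)}


def solve_sat(n: int, clauses: list[Clause]) -> tuple[set[str], int]:
    """Return the surviving assignments and the number of extract steps used."""
    tube = {"".join(bits) for bits in product("01", repeat=n)}
    extract_steps = 0
    for clause in clauses:
        satisfied: set[str] = set()
        remainder = tube
        for i, a in clause:
            hit = extract(remainder, i, a)   # one extract per literal
            extract_steps += 1
            satisfied |= hit                 # pour the hits together
            remainder = remainder - hit      # keep working on the rest
        tube = satisfied                     # only strands satisfying this clause go on
    return tube, extract_steps


# F = (x OR y) AND (NOT x OR NOT y)
solutions, steps = solve_sat(2, [[(1, 1), (2, 1)], [(1, 0), (2, 0)]])
print(sorted(solutions), steps)
print("satisfiable" if solutions else "unsatisfiable")   # the final detect step
```

In this sketch one extract is counted per literal, so the total grows linearly with the number of clauses when clause length is bounded. The initial tube still holds all 2^n assignments, which is where the parallelism of the DNA approach is being relied on.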
Lipton’s Acknowledgments
Operations are assumed perfect and without error
Practical Purposes
Purposes
Counter Bioterrorism/Monitor Genetic Progression
Institute for Countermeasures against Agricultural Bioterrorism (ICAB):
Plan:
1) Obtain DNA sequences from crops, animals, bio-agents, etc.
2) Deploy DNA-chip technology to identify and characterize
3) Build geo-referenced information system
4) Predict and track the spread of bio-agents after introduction
5) Create powerful DNA-based tools for monitoring and enhanced
diagnosis
DNA microarrays & DNA-based chips
- Can store 1,000 to 100,000 different diagnostic DNA sequences
Next generation will contain one million tags!
http://icab.tamu.edu/
Purposes
Predictive Gene Testing
http://www.dhgp.de/ethics/ethics02.html
Poker Playing
DNA Computing: 7th International Workshop on DNA Based Computers, Dna7, Tampa,
Florida, June 10-13, 2001: Revised Papers
Weighted-Recursive Algorithms
DNA Computing: 7th International Workshop on DNA Based Computers, Dna7,
Tampa, Florida, June 10-13, 2001: Revised Papers
Pessimism
1) Too fragile and prone to error
2) The field is dominated by hard-core enthusiasts who will be
forced to "slog through and do the heavy research" before
there is a major breakthrough
Optimism
However, keep in mind that the first commercially available electronic
computer was not well received, and IBM in 1951 had to reinvent what
it had spent millions of dollars and years working on to fit customers’ needs
(such as payroll)
http://www.jsonline.com/alive/news/0607dna.stm
The Future of DNA Computing
Commercial application by 2010
Alternative to traditional computing by 2020
Vision: Today we have not one but several companies making
"DNA chips," where DNA strands are attached to a silicon
substrate in large arrays (for example Affymetrix's genechip).
Production technology of MEMS is advancing rapidly, allowing
for novel integrated small scale DNA processing devices. The
Human Genome Project is producing rapid innovations in
sequencing technology. The future of DNA manipulation is
speed, automation, and miniaturization
http://www.jsonline.com/alive/news/0607dna.stm
Research Funding
Funding:
National Science Foundation
Pentagon's Defense Advanced Research
Projects Agency - Much of the military's
interest arises from the increasing
sophistication of encryption techniques that
other countries can use to encode their data.
As a result, Washington needs ever-more-powerful computers for code breaking
Internet References
http://chronicle.com/data/articles.dir/art-44.dir/issue-4.dir/14a02301.htm
http://www.jsonline.com/alive/news/0607dna.stm
http://www.arstechnica.com/reviews/2q00/dna/dna-1.html
Book/Papers References
Lipton, Richard J., DNA Solution of Hard Computational Problems. Science,
New Series, Vol. 268, No. 5210 (April 28, 1995), 542-545
DNA Computing: 8th International Workshop on DNA Based Computers,
Dna8, Sapporo, Japan, June 10-13, 2002: Revised Papers (Lecture Notes
in Computer Science, 2568)
DNA Computing: 7th International Workshop on DNA Based Computers,
Dna7, Tampa, Florida, June 10-13, 2001: Revised Papers
Future References
http://www.nas.nasa.gov/
http://www.nas.nasa.gov/Research/Reports/reportsarchive.html