ECECS 819: Lecture 1 -- Introduction
Computational aspects of biological systems

Biology: Macro and Micro Elements
[Figure labels: E. coli; DNA; protein; E. coli chromosome; an amino acid (alanine)]

Biosystem: an “information processing system”
• “sensor” / “processor” / “actuator”
• Self-repairing
• Stores information
• Can interact with other systems (e.g., use of nerve signals to activate devices)
• May be a “community” (e.g., coral, fungus)

Goal 1: Use “micro” elements as information processing / storage devices -- “biomolecular computers”

Goal 2: Use computation to understand biomolecular systems

Why Do We Need to Learn About Biomolecular Computing?
Reason 1: “the disappearing transistor”
[Figure labels: 3 µ (lambda); 1.5 µ (lambda); 0.5 µ (lambda)]
• By 2020, a “gate” will be only one atom large [Keyes, IBM]
• Candidate “new” technologies:
  + quantum computing
  + biomolecular computing

Relative sizes (in meters):
10^-18: electron
10^-15: proton, neutron
10^-14: atomic nucleus
10^-10: water molecule (angstrom)
10^-9: (nanometer, nm), one DNA “twist”
10^-8: wavelength of UV light
10^-7: thickness of cell membrane
  0.18 or 0.13 µm: Pentium 4 wire width
10^-6: diameter of typical bacterium (micron, µm)
10^-5: diameter of typical cell
  2-10 µm: typical MEMS feature size
10^-4: width of human hair
10^-3: diameter of sand grain (millimeter, mm)
10^-2: diameter of nickel (centimeter, cm)
  35 mm: one side of Pentium 4 chip
10^0: 1 meter
“nanotechnology”: molecules, atoms

Why Do We Need to Learn About Biomolecular Computing?
Reason 2: a host of potential applications
• medical: diagnosis / treatment delivery / prosthetics
• lab diagnostics: health care / forensics / drug development

Why is biomolecular computing attractive?
• Size:
  -- a typical bacterium has a diameter on the order of 10^-6 m (1 micron);
  -- one twist of the DNA double helix is on the order of 10^-9 m (nanometer scale)
• Power requirements should be low
• Massively parallel computation is theoretically possible
• I/O can be two-dimensional
• Instabilities of quantum systems are much less of a problem here

What are the disadvantages?
• Speed -- a typical reaction can take hours or days
• Error rates -- may be unacceptably high; errors may be introduced by mechanical steps in processing data
• I/O -- we do not yet have efficient mechanisms for doing input/output with these systems
• “Herd” property -- we can affect a mixture of data items; we cannot in general pick out one specific item; biomolecular computing is inherently parallel
• Exponential growth in size of computation -- the speed barrier in traditional computing may be replaced by a size barrier in biomolecular computing: we may need too much biological material to solve a reasonably sized problem for the “computation” to be feasible

Major drawback: typical engineers “don’t know much about biology...”
• Biology is traditionally descriptive, rather than computational (HUGE vocabulary)
• Biomolecular processes are incredibly complex and many are not well understood
• Field is changing rapidly
• There are multiple paradigms for computing available

Also, there are many different subfields:
• bioinformatics: the application of computer technology to the management of biological information
• biomolecular computing: the use of biological and chemical processes to perform computations
• bio-inspired computing: the use of biological paradigms (e.g., neural nets, genetic algorithms) in the design of computational algorithms. Algorithms may be implemented in any appropriate technology
• neurocomputing: direct I/O from a biological system; interfacing directly with the nervous system; currently using traditional analog computing

And many computing paradigms:
• DNA computing -- uses the physical structure of DNA
• in vivo computing -- uses biological processes, e.g., protein synthesis, to perform computations
• in silico computing -- “traditional” computing; often used to refer to programs that attempt to simulate living organisms; sometimes referred to as “bioSpice”

So how can we get started?
Some important basic terms (good reference: Brown, Genomes, Wiley-Liss, 1999):

• genome: biological information in an organism
• DNA: deoxyribonucleic acid, carries the genome of cellular lifeforms
• RNA: ribonucleic acid, carries the genome of some viruses, carries messages within the cell
• bases: the four bases found in DNA are adenine (A), cytosine (C), guanine (G), and thymine (T); in a “double helix” of DNA, bonds are always A--T or C--G; thus a single strand of DNA carries the information about the strand it would bond to

DNA: the “double helix”
[Figure]

• polynucleotide: a single DNA strand
• oligonucleotide: short, single-stranded DNA molecule, usually less than 50 nucleotides in length. In DNA computing, specific oligonucleotides are constructed to represent data items.
• nucleotide: phosphate group + sugar + one of the 4 bases (A, C, G, T); the end of a strand carrying the phosphate group is labeled 5’, and the opposite end, carrying the sugar’s hydroxyl group, is labeled 3’

Example: in Adleman’s seminal 1994 paper, oligonucleotides of length 20 were built to represent vertices and edges in a given graph:
[Figure labels: Vertex V1; Vertex V2; Edge V1-V2]

What interesting projects can build on our knowledge of traditional computer engineering?
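As a concrete illustration of the base-pairing rule and the Adleman-style encoding described above, here is a minimal Python sketch. It is not the procedure or the sequences from the 1994 paper: the vertex oligonucleotides are randomly generated placeholders, the helper names (`complement`, `reverse_complement`, `edge_oligo`) are hypothetical, and the edge rule (second half of the source vertex followed by the first half of the target vertex) is one common reading of the scheme.

```python
import random

# Watson-Crick pairing: bonds are always A--T or C--G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Base-by-base complement of a strand."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Complement written 5'->3'; paired strands run in opposite directions."""
    return complement(strand)[::-1]

def random_oligo(length: int, rng: random.Random) -> str:
    """Placeholder vertex sequence (illustrative only)."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def edge_oligo(v_from: str, v_to: str) -> str:
    """Assumed edge encoding: second half of v_from + first half of v_to."""
    half = len(v_from) // 2
    return v_from[half:] + v_to[:half]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
v1 = random_oligo(20, rng)      # hypothetical sequence for vertex V1
v2 = random_oligo(20, rng)      # hypothetical sequence for vertex V2
e12 = edge_oligo(v1, v2)        # oligonucleotide for edge V1-V2
splint_v2 = reverse_complement(v2)  # strand that would bond to V2

print("V1    :", v1)
print("V2    :", v2)
print("E(1,2):", e12)
print("splint:", splint_v2)
```

Because a single strand determines its partner, the complement of a vertex sequence can hold the end of one edge strand next to the start of the following edge strand, which is how longer “paths” can assemble from short fragments.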
• “structural” designs -- DNA computing
• “chemical” designs -- using proteins as signals

DNA computing (“structural”, “digital”)
Possible operations on DNA:
• building up custom oligonucleotide sequences to represent parts of your data
• splitting -- can be done by heating, for example
• recombining -- can be done by cooling
• cutting a strand at a particular site
• “sticking” two fragments together (at their ends)
• sorting by some string property (including length)

So, DNA computing:
• uses the structure of the DNA
• relies on mechanical operations
• answers “self-assemble”
• basic steps:
  -- encode the problem
  -- make a “solution” of problem fragments
  -- cool the solution so fragments will form longer strands
  -- filter out the answers you want

Example: solving graph problems
[Figure]
• Encode vertices and edges -- use DNA properties to encode graph “structure”
• Mix up a solution of your fragments
• Cool down, get resulting “paths”, “spanning trees”, etc.

“Standard cell architectures, FPGAs”
Basic idea (after Prof. Tom Knight, MIT):
• “gates” are functional units
• Ends of gates are standard “join” DNA sequences, reserved for this purpose
• So we can build computational chains easily

Other applications of DNA computing:
• general computing using “sticker” language
• study of the relationship between traditional architectures and DNA configurations:
  -- FSMs: linear DNA
  -- stack machines: branching DNA
  -- “Turing machines” (general purpose computers): sheet DNA

Other applications of DNA computing (continued):
• 3-D self-assembled structures
• “walking and rolling DNA”
• structures for nanotube assembly (recently reported in Science)

in vivo computing (“chemical” / “analog”):
• uses processes within the cell (e.g., E. coli) as signals
• model is closer to traditional computing, with electrical signals replaced by chemical signals
• many processes we would like to use are not well understood
• requires in silico computing to generate simulations of biomolecular processes, similar to SPICE simulations of traditional electrical circuits
• this is a new and rapidly growing field with many potential practical applications

“central dogma”: DNA ----> RNA ----> protein
We can use the presence or absence of the protein to indicate “1” or “0”.

• Protein: like DNA, a protein is a linear polymer. It is made of units which are amino acids. Proteins are very complex and not completely understood. Proteins have four levels of structure:
  -- primary: the sequence of amino acids bonded together
  -- secondary: typically either an “alpha-helix” or a “beta-sheet”
  -- tertiary: formed from folding of the secondary structure into a three-dimensional configuration
  -- quaternary: formed by the association of several folded (tertiary-structure) units into the complete protein

Some proteins: http://www.biochem.szote.u-szeged.hu/astrojan/protein2.htm

• Central Dogma: before the discovery of retroviruses and prions, this was believed to be the basic mechanism of inheritance in all living things

• Plasmid: a “loop” of DNA used to introduce new genetic material into a cell
  -- used for “genetic engineering”
  -- typically the plasmid will also have a section which confers resistance to a particular antibiotic; after insertion into the cell, this provides a marker to show that the new DNA really has been inserted

One possible simple mechanism:
[Diagram labels: input RNA; translate; input protein B; inhibits; DNA: promoter, gene; transcribe; output RNA; translate; output protein A (detect by fluorescence)]
Summary:
• 0 input ---> output protein A (1)
• 1 input (RNA) ---> 0 output

Analogy to Electrical Inverter
[Figure]

Bio-Inverter Model [Weiss 1999]
[Figure]

Deterministic vs. Stochastic Model
• Deterministic model
  -- Inverter modeled using a set of differential equations with deterministic variables.
  -- No random components.
  -- Fixed order for reactions.
• Stochastic model
  -- Accounts for the random noise components.
  -- Simulations under different environmental conditions and other random noise variables.
  -- Random order for reactions.

Deterministic Simulation
[Figure]

Deterministic Simulation: Transient Characteristics (Matlab)
[Figure]

Deterministic Simulation (6): Transient Characteristics (VHDL-AMS)
Deterministic Simulation -- Example (5): Transient Characteristics
[Figure]

Deterministic Simulation: Modified Transient Characteristics
• The transient characteristics of the inverter are computed using the modified reaction rates.
• The steady-state output value has doubled since the transcription rate is doubled (k7*2).
• The rise time of the output has decreased to about 30 seconds, and the rise and fall times are now equal.
• The rise time decreases because the repression rate was reduced and the dissociation rate was increased.

Deterministic Simulation: Modified Transient Characteristics (Matlab)
[Figure]

Stochastic Simulation
• Stochastic simulation is based on the Gillespie algorithm [Gillespie 1977].
• Two random variables (the time of the next reaction and the type of reaction) were introduced.
• In a cell, reactions occur at random intervals of time.
• The reactions do not occur in a fixed order; which reaction fires next is random.
• Temperature fluctuations, decay rates and other parameters also result in random noise.
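The two random variables named above (time to the next reaction, and which reaction fires) can be sketched with Gillespie's direct method. This is a toy illustration only, not the bio-inverter model of [Weiss 1999]: the four reactions (transcription, mRNA decay, translation, protein decay), the function names, and every rate constant below are assumptions chosen for readability.

```python
import random

def gillespie_inverter(input_present, t_end=2000.0, seed=0,
                       k_tx=0.5,        # transcription rate, gene not repressed (assumed)
                       k_tx_rep=0.005,  # leaky transcription when repressed (assumed)
                       k_deg_m=0.05,    # mRNA decay rate per molecule (assumed)
                       k_tl=0.2,        # translation rate per mRNA (assumed)
                       k_deg_p=0.02):   # protein decay rate per molecule (assumed)
    """Gillespie direct method for a toy inverter.

    Returns a list of (time, mRNA count, output protein count) tuples.
    """
    rng = random.Random(seed)
    k_transcribe = k_tx_rep if input_present else k_tx
    # Net change (d_mRNA, d_protein) for each reaction type, in the same
    # order as the propensity list below.
    changes = [(+1, 0), (-1, 0), (0, +1), (0, -1)]
    t, mrna, protein = 0.0, 0, 0
    trajectory = [(t, mrna, protein)]
    while True:
        propensities = [k_transcribe,        # transcription
                        k_deg_m * mrna,      # mRNA decay
                        k_tl * mrna,         # translation
                        k_deg_p * protein]   # protein decay
        total = sum(propensities)
        # Random variable 1: exponentially distributed waiting time.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Random variable 2: reaction chosen in proportion to its propensity.
        j = rng.choices(range(4), weights=propensities)[0]
        mrna += changes[j][0]
        protein += changes[j][1]
        trajectory.append((t, mrna, protein))
    return trajectory

def mean_protein(trajectory, t_start, t_end):
    """Time-weighted mean protein count over [t_start, t_end]."""
    area = 0.0
    closing = trajectory[1:] + [(t_end, 0, 0)]   # end time of each segment
    for (t0, _, p), (t1, _, _) in zip(trajectory, closing):
        lo = max(t0, t_start)
        if t1 > lo:
            area += p * (t1 - lo)
    return area / (t_end - t_start)

out_no_input = gillespie_inverter(input_present=False)
out_with_input = gillespie_inverter(input_present=True)
print("mean output, 0 input:", mean_protein(out_no_input, 1000.0, 2000.0))
print("mean output, 1 input:", mean_protein(out_with_input, 1000.0, 2000.0))
```

For comparison, the deterministic version of this toy model is dm/dt = k_tx - k_deg_m*m and dp/dt = k_tl*m - k_deg_p*p, whose steady-state output is k_tx*k_tl/(k_deg_m*k_deg_p), i.e. 100 molecules with the assumed unrepressed defaults. The stochastic run should fluctuate around that value rather than settle on it.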
Stochastic Simulation
[Figure]

Some areas to explore:
• Stochastic simulation -- design space exploration
  -- Similar to CAD tool development for digital and analog circuits
  -- Currently trying simulated annealing, genetic algorithms
  -- Many other strategies can be explored
  -- Will also have applications in medical research
• Agent-based modeling and visualization
  -- 3D modeling and dynamic simulations using object-oriented programming
• Engineering design process for biomolecular computing applications
  -- Will modify traditional design flows for software, digital, and analog circuits
  -- Will provide support to circuit designers and biomedical researchers
• Development of DNA “standard cells”