Sequence Optimization For Synthetic Genes Using Genetic Algorithms

David Sigfredo Angulo (1), Rob Vogelbacher (1), Benjamin R. Capraro (2), Tobin Sosnick (2), Shohei Koide (2)
(1) School of Computer Science, Telecommunications and Information Systems, DePaul University
(2) Department of Biochemistry and Molecular Biology, The University of Chicago

Introduction
• Genetic Algorithms:
  – Use ideas based on the biology of genes
  – Create software that uses such stochastic means to search through large search spaces
  – Resulting algorithm has nothing to do with genes
• Designing Genes
  – This search space is huge
  – REALLY NOVEL IDEA:
    • Use Genetic Algorithms based on genes to design genes!!

Outline
• Short biology Tutorial
• DNA Sequence Generation
  – Why is the problem difficult?
• IBG Gene Designer
  – Genetic Algorithm (GA) solution
  – Heuristics and Fitness Evaluation

First
• Before the problem can be described
  – Must give some background biochemistry principles
• Tutorial outline
  – DNA
  – Codons
  – Protein
    • Synthetic genes
      – What are they and what are they used for?
  – Restriction Enzymes
  – Expressing Proteins using Vectors

Transcription/Translation
[Figure: Central Dogma of Molecular Biology. DNA → (transcription, RNA polymerase) → RNA → (translation, ribosomes) → Protein]

DNA
• Deoxyribonucleic acid
• Strand backbone is made of sugar & phosphate molecules
• Strands connected by nitrogen-containing nucleotide bases
• Two strands join, making a double helix
• Each strand is made of nucleotides joined together

[Figure: levels of DNA packing. Short region of DNA double helix (2 nm); "beads on a string" form of chromatin (11 nm); chromatin fiber of packed nucleosomes (30 nm); section of chromosome in an extended form (300 nm); condensed section of chromosome (700 nm); entire mitotic chromosome (1100 nm)]

DNA
• Four Nucleotides: A, G, T, C

DNA: Base Pairing

DNA Sequence Generation: Codon to Amino Acid Translation
[Figure: codon table] http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg

Proteins: AA Chains

Proteins
• Amino acid chains fold into complex 3D structures
• Functional properties depend on 3D structure
• Usefulness depends on functional properties
  – E.g. designing drugs

Designed/Expressed Proteins Extremely Useful
• Designed proteins
  – Can be used to study protein structure
  – Can be used to study effects of other proteins
    • Can be designed to “knock out” other proteins
    • Can be designed to “block” the action of other proteins
• Expressed proteins
  – Expressed in cow’s milk or chicken eggs
  – Can manufacture drugs on large scales in this way
    • E.g. insulin

Synthetic Genes
• DNA sequences
  – “Backtranslated” from a novel protein or amino acid sequence
[Figure: DNA → (transcription, RNA polymerase) → RNA → (translation, ribosomes) → Protein]
• We’ll put the DNA for our designed protein into an organism (a vector)
• Then that vector will make (express) our protein
• But how do we get the DNA into an organism?

Restriction Enzyme Digests
• Watson and Crick, 1953
• Took 20 years to be able to do anything with DNA
• H. Smith (and others) made a discovery that allowed manipulation and deciphering of DNA
• The discovery was that bacteria produce enzymes that introduce breaks in double-stranded DNA molecules whenever they encounter a specific string of nucleotides
• These enzymes are called Restriction Enzymes
• Restriction Enzymes can be used as precise scissors
  – They let biologists cut (and paste) portions of DNA

EcoRI
• EcoRI was one of the first Restriction Enzymes discovered
  – "Eco" because it was isolated from E. coli (Escherichia coli)
  – "R" for the strain it was isolated from (RY13)
  – "I" because it was the first Restriction Enzyme isolated from that strain
  – Now over 300 Restriction Enzymes known
• EcoRI cleaves (restricts, digests) DNA
  – Between the G and A nucleotides
  – Only when it encounters them in the string 5'-GAATTC-3'
  – This is called the restriction site

  5'-GAATTC-3'
  3'-CTTAAG-5'
  Cut by EcoRI:
  5'-G     AATTC-3'
  3'-CTTAA     G-5'

Sticky Ends
• Many restriction enzymes cut in such a way that some single-stranded DNA is left at both ends
• These nucleotide sequences
  – Are complementary to each other
  – Are 5'-AATT-3' in the case of EcoRI
  – Can base pair with other nucleotides in a sequence
  – Thus, are called "sticky ends"
  – Can temporarily hold two DNA strands together
  – The enzyme ligase will permanently join those strands
  – This is called ligation

Gene Synthesis: On the Lab Bench
• Initial Sequence Construction
  – Oligonucleotides (short strands of DNA) are defined with complementary overlapping sites
    • The “sticky ends”
  – Assembly PCR
    • Oligonucleotides and polymerase are mixed and placed in a thermocycler
    • Creates contiguous DNA sequence from component oligos

Gene Synthesis: On the Lab Bench (cont)
• After PCR, generated DNA sequence cut with restriction enzymes
• Expression host's plasmid cut with restriction enzymes
• Synthetic gene inserted into plasmid and plasmid repaired
• Expression Vectors
  – Host organisms used to express the synthetic genes (make the protein)
  – Typically E. coli
    • Possibly chickens or cows
• Expression vector can now express protein coded for by synthetic gene
  – A bit more complicated than described above!!!

DNA Sequence Generation: Gene Insertion

Outline
• Short biology Tutorial
• DNA Sequence Generation
  – Why is the problem difficult?
• IBG Gene Designer
  – Genetic Algorithm (GA) solution
  – Heuristics and Fitness Evaluation

DNA Sequence Generation: The Computational Problem
• Why is the problem difficult?
  – Conflicting goals
    • Avoiding restriction sites
    • Maximizing codon preference
    • Thus, cannot use a deterministic algorithm
  – Degeneracy (redundancy) of the DNA code
  – 64 codons, 20 (21) amino acids (see next slide)
    • Several synonymous codons are translated into the same amino acid
    • Synonymous codons per AA vary from one to six (average is about three codons per AA)
    • Huge number of possible DNA sequences
      – On average about 2^n for a protein of amino acid length n
  – Codon Preference
    • Varying levels of tRNA assembly components in organisms
    • Codon usage for a particular AA greatly influences protein expression
  – (continued)

DNA Sequence Generation: Codon to Amino Acid Translation
[Figure: codon table] http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg

DNA Sequence Generation: The Computational Problem (cont)
• Why is the problem difficult?
  – (continued)
  – Restriction Enzymes
    • The vector will contain many restriction enzymes
      – If these cut up our DNA, we won’t express our proteins
      – We must design the DNA string using synonymous codons so that there are no restriction sites
    • Helpful to include some other restriction sites
      – We must design the DNA string using synonymous codons so that these are included
  – (continued)

DNA Sequence Generation: The Computational Problem (cont)
• Why is the problem difficult?
  – (continued)
  – mRNA Secondary Structure
    • In prokaryotes, mRNA can fold into complex shapes
    • This inhibits protein creation
  – Oligonucleotide generation
    • Want a specific melting temperature so that the complex folding doesn’t take place
    • The “sticky ends” must have the same melting temperature so that they will bind together.

Outline
• Short biology Tutorial
• DNA Sequence Generation
  – Why is the problem difficult?
• IBG Gene Designer
  – Genetic Algorithm (GA) solution
  – Heuristics and Fitness Evaluation

IBG GeneDesigner: Our Solution
• IBG GeneDesigner

IBG GeneDesigner: Genetic Algorithm
• Uses a Genetic Algorithm for sequence optimization
  – Tournament selection model
  – Uniform and single-point crossover (behind the scenes; not user selectable at present)
  – Mutation causes codon “wobbling”
  – Sequence “fitness” determined by heuristic evaluation

IBG GeneDesigner: Fitness Evaluation
• GeneDesigner heuristics
  – Manipulation of nucleotide percentages/ratios to reduce mRNA secondary structure formation
  – Inclusion and exclusion of restriction sites
    • Restriction sites requested for inclusion should only occur once
  – Matching of codon preference
  – Oligonucleotide generation
    • Fitness determined by melting points, start and end nucleotide

IBG GeneDesigner: Future Work
• Algorithm parameters
  – Systematically manipulate GA parameters to identify default values for sequence optimization
    • Population size
    • Number of generations
    • Mutation rate
    • Convergence criteria
  – Modify heuristic weighting scheme
• Selection models
  – Experiment with alternative selection models (roulette wheel, elitism, limited population replacement)

IBG GeneDesigner: Future Work
• Move algorithm to ECJ architecture
  – Use the Strength Pareto multi-objective optimization algorithm
• Create web-based version of application
• Explore island model effects on optimization

Results
• IBG GeneDesigner was used to generate a nucleotide sequence for the SH3 domain of α-spectrin1.
• The codon optimization option was set for expression in E. coli with a 40% G/C bias
• We also used the application to generate four assembly PCR template oligonucleotide sequences to produce the protein coding sequence flanked by desired restriction enzyme recognition sites.
• The calculated Tm values of the three overlapping regions were within 1.6 °C
  – Promoting similar annealing behavior between strands.
  – Success of the reaction was confirmed by DNA sequencing of a pUC19 expression vector containing the PCR product cloned between restriction sites included in the gene design.
• Summary: Protein Made!!!

Input: Protein Sequence, Vector, Restriction Enzymes

Input: Flanking Sequences

Input: Algorithm Parameters and Fitness Scores

Output: Generation of Oligonucleotides

Acknowledgements
• Graduate student who did much of the coding
  • Rob Vogelbacher
• University of Chicago undergraduate who used it to build a protein
  • Benjamin R. Capraro
• His advisor
  • Tobin Sosnick
• Our collaborator at University of Chicago
  • Shohei Koide
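
Sketch: size of the back-translation search space

The "Computational Problem" slides argue that the number of DNA sequences encoding one protein grows very quickly with its length, and that some of those sequences contain unwanted restriction sites. The sketch below is one way to make that concrete; it is not taken from IBG GeneDesigner. The function names (`count_backtranslations`, `backtranslations`, `has_site`) are made up for this illustration, and the only data assumed is the standard genetic code.

```python
from itertools import product
from math import prod

# Standard genetic code in the usual TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to its synonymous codons.
SYNONYMS = {}
for i, aa in enumerate(AMINO):
    codon = BASES[i // 16] + BASES[(i // 4) % 4] + BASES[i % 4]
    SYNONYMS.setdefault(aa, []).append(codon)


def count_backtranslations(protein):
    """Number of distinct DNA sequences that encode `protein`."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def backtranslations(protein):
    """Yield every DNA sequence encoding `protein` (only sane for short peptides)."""
    for codons in product(*(SYNONYMS[aa] for aa in protein)):
        yield "".join(codons)


def reverse_complement(dna):
    """Reverse complement of a DNA string written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_site(dna, site):
    """True if `site` occurs on either strand of the double-stranded DNA."""
    return site in dna or reverse_complement(site) in dna


if __name__ == "__main__":
    peptide = "MEFW"  # hypothetical toy peptide: Met-Glu-Phe-Trp
    print(count_backtranslations(peptide), "candidate sequences")
    for dna in backtranslations(peptide):
        print(dna, "contains EcoRI site" if has_site(dna, "GAATTC") else "clean")
```

For the toy peptide the enumeration is tiny, but the Glu-Phe pair can be encoded as GAA TTC, which spells the EcoRI site from the tutorial. Choosing a synonymous codon for either residue removes the site without changing the protein, which is the freedom the optimizer exploits.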
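
Sketch: a minimal genetic algorithm over synonymous codons

The "Genetic Algorithm" and "Fitness Evaluation" slides name tournament selection, uniform crossover, codon "wobble" mutation and a heuristic fitness. The sketch below strings those pieces together under stated assumptions; it is not the GeneDesigner implementation. The fitness here only penalizes forbidden restriction sites and deviation from a target G/C fraction. Codon preference, restriction sites requested for inclusion, and oligonucleotide melting points are left out, and the weights, parameter defaults and function names are all hypothetical.

```python
import random

# Standard genetic code in the usual TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for i, aa in enumerate(AMINO):
    CODONS.setdefault(aa, []).append(BASES[i // 16] + BASES[(i // 4) % 4] + BASES[i % 4])
CODON_TO_AA = {c: aa for aa, cs in CODONS.items() for c in cs}


def translate(dna):
    """Translate an in-frame DNA string back to one-letter amino acids."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def fitness(gene, forbidden, gc_target):
    """Higher is better. Assumes palindromic sites, so one strand is scanned."""
    dna = "".join(gene)
    site_hits = sum(dna.count(site) for site in forbidden)
    gc = sum(dna.count(b) for b in "GC") / len(dna)
    return -10.0 * site_hits - abs(gc - gc_target)  # illustrative weights


def tournament(pop, scores, k, rng):
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def uniform_crossover(a, b, rng):
    """Take each codon position from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def wobble(gene, protein, rate, rng):
    """Mutation: re-draw a codon from the synonymous set, keeping the protein."""
    return [rng.choice(CODONS[aa]) if rng.random() < rate else c
            for c, aa in zip(gene, protein)]


def optimize(protein, forbidden=("GAATTC",), gc_target=0.40,
             pop_size=40, generations=60, rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(CODONS[aa]) for aa in protein] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g, forbidden, gc_target) for g in pop]
        pop = [wobble(uniform_crossover(tournament(pop, scores, 3, rng),
                                        tournament(pop, scores, 3, rng), rng),
                      protein, rate, rng)
               for _ in range(pop_size)]
    return "".join(max(pop, key=lambda g: fitness(g, forbidden, gc_target)))


if __name__ == "__main__":
    protein = "MEFWLSRKEF"  # hypothetical toy sequence
    dna = optimize(protein)
    print(dna, translate(dna) == protein)
```

Because individuals are lists of codons and both crossover and mutation work codon by codon, every candidate still encodes the input protein. The search only moves among synonymous sequences, which matches the slides' point that the algorithm borrows the vocabulary of genes without modeling biology.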
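
Sketch: comparing melting temperatures of overlap regions

The Results slide reports that the calculated Tm values of the three overlapping regions were within 1.6 °C of each other. The slides do not say which Tm formula GeneDesigner uses, so the sketch below swaps in the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides. The overlap sequences in the example are invented placeholders, not the sequences from the experiment.

```python
def wallace_tm(oligo):
    """Rough melting temperature in degrees C by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); intended for short oligos only.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def tm_spread(overlaps):
    """Difference between the highest and lowest Tm among overlap regions."""
    temps = [wallace_tm(o) for o in overlaps]
    return max(temps) - min(temps)


if __name__ == "__main__":
    # Hypothetical overlap regions between adjacent assembly oligonucleotides.
    overlaps = ["GCTGAAGTTCTGGAAC", "CAGCTGATCGAAGTTC", "GTTGCTGAACTGAAGC"]
    for o in overlaps:
        print(o, wallace_tm(o))
    print("spread:", tm_spread(overlaps))
```

A fitness term along these lines could penalize `tm_spread`, so that candidate oligonucleotide sets with closely matched overlaps score higher.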