The Genetic Code

Math-CS Camp, 19.07.06, Singapore

Mikhail S. Gelfand
Research and Training Center of Bioinformatics, Institute for Information Transmission Problems, Moscow, Russia
and Department of Bioengineering and Bioinformatics, Moscow State University

The Biological Code by Martynas Yčas (London, 1969)
Russian edition: Биологический код (Moscow, 1971)
[Figure: number of references (refs.) by year(s) of publication]

To apply mathematics in biology, a mathematician has to understand biology.
Israel Gelfand

Plan
• Pre-history
  – Genetics
  – Evolutionary theory
  – Chemistry
• Cracking the Code
• Update

Genetics: Gregor Mendel (1822-1884)
• Attended the Philosophical Institute in Olomouc
• From 1843 – at the Augustinian Abbey of St. Thomas in Brno
• 1851-1853 – studied at the University of Vienna
• 1856-1863 – cultivated 28 thousand pea plants
• The Three Laws of Genetics (“Experiments on Plant Hybridization”)
  – Read to the Natural History Society of Brünn (Brno) in Moravia (1865)
  – Published in the Proceedings of the Natural History Society (1866)
• From 1868 – abbot; stopped working in science

[Figure: The seven traits of pea plants studied by Mendel]

The first law
Crossing two pure lines that differ in some trait (e.g. yellow / green seeds) yields only one variant (the dominant allele) in the first generation.
[Figure: F0 → F1]

The second law
Crossing two pure lines that differ in some trait (e.g. yellow / green seeds) yields only one variant (the dominant allele) in the first generation, and a 3:1 distribution of the dominant and recessive alleles in the second generation.
[Figure: F0 → F1 → F2]

(Law of large numbers)
The 3:1 ratio is seen only when the number of observations is sufficiently high.
[Figure: F0 → F1 → F2]

The third law
Two different traits are inherited independently (in the second generation the ratio is 9:3:3:1).
[Figure: F0 → F1 → F2]

What if we take a pair with a different assortment of the same traits?
[Figure: F0 → F1 → ?]
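
The question on this slide can be checked by brute-force enumeration of gametes. The sketch below is only an illustration under simple assumptions (two unlinked genes, complete dominance, equally likely gametes); the function names and the allele letters A/a and B/b are hypothetical and do not come from the slides.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """All gametes of a two-gene genotype such as ("Aa", "Bb"): one allele per gene."""
    return [a + b for a, b in product(*genotype)]


def cross(parent1, parent2):
    """Count offspring genotypes when every gamete pairing is equally likely."""
    offspring = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # Sort the two alleles of each gene so that "aA" and "Aa" are one genotype.
        genotype = tuple("".join(sorted(g1[i] + g2[i])) for i in range(2))
        offspring[genotype] += 1
    return offspring


def phenotype(genotype):
    """Complete dominance: one uppercase allele is enough to show the dominant trait."""
    return tuple(
        "dominant" if any(allele.isupper() for allele in gene) else "recessive"
        for gene in genotype
    )


def f2_ratio(line1, line2):
    """Phenotype counts in F2 (largest first) for a cross of two pure lines."""
    f1 = cross(line1, line2)
    assert len(f1) == 1, "pure lines give a uniform F1 (first law)"
    (f1_genotype,) = f1
    f2 = cross(f1_genotype, f1_genotype)
    counts = Counter()
    for genotype, n in f2.items():
        counts[phenotype(genotype)] += n
    return sorted(counts.values(), reverse=True)


# Two different initial assortments of the same traits.
print(f2_ratio(("AA", "BB"), ("aa", "bb")))
print(f2_ratio(("AA", "bb"), ("aa", "BB")))
```

Both crosses pass through the same F1 genotype, which is why the F2 counts cannot depend on how the traits were distributed between the two pure lines.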
Same F1
Same F2 … regardless of the initial assortment
[Figure: F0 → F1 → F2 for both initial assortments]

Incomplete dominance
[Figure: incomplete dominance, F0 → F1 → F2]

Charles Darwin (1809-1882)
• 1825-27 at Edinburgh University and 1827-31 at the University of Cambridge – natural history, geology, botany
• 1831-1836 – Voyage of the Beagle
• Journal of Researches into the Geology and Natural History of the various countries visited by H.M.S. Beagle (1839)

Origin of Species (1859)

The Law of Natural Selection
• Species make more offspring than can grow to adulthood.
• Populations remain roughly the same size.
• Food resources are limited, but are relatively constant most of the time.
• In such an environment there will be a struggle for survival among individuals.
• In sexually reproducing species, generally no two individuals are identical.
• Much of the variation is heritable.
• Individuals with the "best" characteristics will be more likely to survive …
• … those desirable traits will be passed to their offspring …
• … and then inherited by following generations, becoming prevalent and then fixed in the population through time.

Thomas Huxley (1825-1895)
“Darwin’s Bulldog”

Origin of Homo sapiens

Re-discovery of the Mendel laws and emergence of modern genetics
• Hugo de Vries (1900)
• William Bateson – genetics, gene, allele
• Walter Sutton – Link between genes and chromosomes (1902)
• Archibald Garrod – Genetic cause of some human disease (1902-08-23)
• Thomas Morgan, work on Drosophila.
  – Mutants: spontaneous appearance of new alleles (a fly with white eyes in a population of flies with red eyes) (1908)
  – Universal acceptance of chromosomes (1915)

Gene = a set of non-complementing mutations
Edward Lewis: Do two recessive mutations occur in the same gene?
F1: Mutant phenotype
F1: Wild-type phenotype
Mutant phenotypes persist in cis (same gene).
Mutant phenotypes reappear in trans (different genes)
F1: Mutant phenotype → F2: All mutant phenotypes
F1: Wild-type phenotype → F2:
  1 WT    2 WT    1 Mut
  2 WT    4 WT    2 Mut
  1 Mut   2 Mut   1 Mut
WT : Mut = 9:7

DNA
• Friedrich Miescher (1869)
  – Nuclein
  – Richard Altmann: nucleic acid (1889). Only in chromosomes
• Phoebus Levene (1929)
  – Components (four bases, the sugar-phosphate chain)
  – Nucleotide: phosphate + sugar + base unit
• Hammarsten and Caspersson (1930s)
  – DNA is a long polymer; crystals
• Astbury (1938)
  – X-ray photographs
• Chargaff rules (1947)
  – In many organisms, #A=#T, #C=#G

Transforming factor (Frederick Griffith, 1928) …
… = DNA (Oswald Avery, Colin MacLeod, Maclyn McCarty, 1944)

DNA is the genetic medium of phages (Alfred Hershey and Martha Chase, 1952)
– radioactive DNA: 32P
– radioactive proteins: 35S
Only DNA enters the cell …
… and only DNA is inherited by progeny phages

Erwin Schrödinger, “What is Life?”, 1944:
The gene is an aperiodic crystal

The structure of DNA …
• Maurice Wilkins and Rosalind Franklin: high-resolution crystals (1950-1953)
… is the double helix
James Watson and Francis Crick (1953)
The Nature paper: a few lines more than one page

The DNA chain
Complementary pairs of nucleotides: A-T and G-C
Figures from the second Watson-Crick paper
The main distances are the same
One base-pair in the double helix (axial view)
The double helix, stick and ball models, axial view
The double helix, stick and ball models, side view

Three models for the replication of DNA
The semi-conservative one is correct (Matthew Meselson and Franklin Stahl, 1958)
Cells are grown on the 15N (heavy) medium for several generations, then transferred to 14N (light) medium
Q: What would be the outcome if one of the two other models were correct?
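
One way to explore the question is to track how much heavy (15N) material each DNA duplex carries under each model. The sketch below is a deliberately simplified assumption-laden model, not the experimental analysis: a duplex is a pair of strands, each strand is described only by its heavy fraction, and the "dispersive" model is taken to mix old and new material evenly. The model names and function names are illustrative.

```python
from collections import Counter
from fractions import Fraction

HEAVY, LIGHT = Fraction(1), Fraction(0)


def replicate(duplex, model):
    """Return the two daughter duplexes produced in light (14N) medium."""
    s1, s2 = duplex
    if model == "semi-conservative":
        # Each old strand pairs with a newly made light strand.
        return [(s1, LIGHT), (s2, LIGHT)]
    if model == "conservative":
        # The old duplex stays intact; the copy is entirely new.
        return [(s1, s2), (LIGHT, LIGHT)]
    if model == "dispersive":
        # Old material is spread evenly over all four daughter strands.
        mixed = (s1 + s2) / 4
        return [(mixed, mixed), (mixed, mixed)]
    raise ValueError(f"unknown model: {model}")


def density_bands(model, generations):
    """Map heavy fraction of a duplex -> share of duplexes with that fraction."""
    population = [(HEAVY, HEAVY)]  # fully labelled after growth on 15N
    for _ in range(generations):
        population = [d for duplex in population for d in replicate(duplex, model)]
    bands = Counter((s1 + s2) / 2 for s1, s2 in population)
    return {frac: Fraction(n, len(population)) for frac, n in bands.items()}


for model in ("semi-conservative", "conservative", "dispersive"):
    for generation in (1, 2):
        bands = density_bands(model, generation)
        summary = ", ".join(f"{share} at {frac} heavy" for frac, share in sorted(bands.items()))
        print(f"{model}, generation {generation}: {summary}")
```

In this toy model the first generation already separates the conservative model (two bands, fully heavy and fully light) from the other two (a single intermediate band), and the second generation separates semi-conservative (intermediate plus light) from dispersive (a single band that keeps getting lighter).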
Electron micrograph of replicating DNA

The Central Dogma (F.Crick)
DNA → RNA → protein

Crossing-over and recombination
• Genes from one chromosome are not inherited independently
• Recombination allows for relative mapping of gene positions on the chromosome: if two genes are close, the frequency of recombination will be lower

Collinearity of the gene and the protein (Charles Yanofsky, 1967)

The Genetic Code
• The genetic code: correspondence between DNA and protein (George Gamow, 1954) (Георгий Гамов)
• Crick and co-authors (1961):
  – Non-overlapping (one mutation affects one amino acid)
  – Degenerate (many codons for one amino acid)
  – Comma-less (no specific markers between codons)
  – Periodic

The codon is a triplet
• Mutations caused by acridine
  – Non-leaky (instead of weakened function, simply no function)
  – Mechanism: insertions and deletions of nucleotides (the downstream part of the gene is completely scrambled, because the code is comma-less)

CUACUACUACUACUACUACUACUACUACUACUACUACUA
LeuLeuLeuLeuLeuLeuLeuLeuLeuLeuLeuLeuLeu

G insertion
CUACUACUACGUACUACUACUACUACUACUACUACUACU
LeuLeuLeuArgThrThrThrThrThrThrThrThrThr

U deletion
CUACUACUACUACUACUACUACUACUACACUACUACUAC
LeuLeuLeuLeuLeuLeuLeuLeuLeuHisTyrTyrTyr

Double mutants and revertants
• Two classes of mutations: (+) and (–)
• Double mutants (+)¤(+) and (–)¤(–) still produce loss-of-function phenotypes
• Double mutants (+)¤(–) and (–)¤(+) produce leaky phenotypes

CUACUACUACGUACUACUACUACUACUACUACUACUACU
LeuLeuLeuArgThrThrThrThrThrThrThrThrThr
¤
CUACUACUACUACUACUACUACUACUACACUACUACUAC
LeuLeuLeuLeuLeuLeuLeuLeuLeuHisTyrTyrTyr

CUACUACUACGUACUACUACUACUACUACACUACUACUA
LeuLeuLeuArgThrThrThrThrThrThrLeuLeuLeu

Triple mutants are revertants!
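
Why shifts of the reading frame cancel out (one insertion plus one deletion, or three insertions) can be seen by replaying the slide examples in code. This is a minimal sketch that assumes the standard genetic code and reads the RNA in frame from its first nucleotide; the helper names `translate`, `insert` and `delete` are hypothetical, and stop codons are printed as `***` instead of ending translation.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order for all three positions.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr ***".split(),
))


def translate(rna):
    """Translate in frame 0; a trailing incomplete codon is ignored."""
    usable = len(rna) - len(rna) % 3
    codons = (rna[i:i + 3] for i in range(0, usable, 3))
    return "".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


def insert(rna, position, base):
    """A (+) mutation: insert one nucleotide before the given 0-based position."""
    return rna[:position] + base + rna[position:]


def delete(rna, position):
    """A (-) mutation: remove the nucleotide at the given 0-based position."""
    return rna[:position] + rna[position + 1:]


wild_type = "CUA" * 13
plus = insert(wild_type, 10, "G")   # everything downstream is misread
minus = delete(wild_type, 28)       # everything downstream is misread
double = insert(minus, 10, "G")     # (+) and (-) combined: frame restored after the deletion

triple = "CUA" * 14
for position in (10, 20, 30):       # three (+) mutations: frame restored after the third
    triple = insert(triple, position, "G")

for name, rna in [("wild type", wild_type), ("(+)", plus), ("(-)", minus),
                  ("(+)(-)", double), ("(+)(+)(+)", triple)]:
    print(f"{name:10s} {translate(rna)}")
```

Only the stretch between the first and the last mutation is misread in the double (+)(–) and triple (+)(+)(+) mutants, which is consistent with their leaky rather than loss-of-function phenotypes.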
• Triple mutants of the same class, (+)¤(+)¤(+) and (–)¤(–)¤(–), produce leaky phenotypes

CUACUACUACGUACUACUACUACUACUACUACUACUACUACU
LeuLeuLeuArgThrThrThrThrThrThrThrThrThrThr
¤
CUACUACUACUACUACUACGUACUACUACUACUACUACUACU
LeuLeuLeuLeuLeuLeuArgThrThrThrThrThrThrThr

double mutant – loss-of-function phenotype
CUACUACUACGUACUACUACGUACUACUACUACUACUACUAC
LeuLeuLeuArgThrThrThrTyrTyrTyrTyrTyrTyrTyr
¤
CUACUACUACUACUACUACUACUACUACGUACUACUACUACU
LeuLeuLeuLeuLeuLeuLeuLeuLeuArgThrThrThrThr

triple mutant – leaky phenotype
CUACUACUACGUACUACUACGUACUACUACGUACUACUACUA
LeuLeuLeuArgThrThrThrTyrTyrTyrValLeuLeuLeu

Cracking the Code
(F.Crick, M.Nirenberg, J.Matthaei, S.Ochoa, G.Khorana, … and you)
• Regular oligonucleotides
  – … UUUUUUUUUU …
  – … UCUCUCUCUC …
  – … UCAUCAUCAU …
• Random oligonucleotides with known composition
• Changes in proteins caused by deamination-caused mutations: C→U, A→G
• Changes in proteins caused by random mutations
• (tRNA binding in the presence of trinucleotides)

20 amino acids and 64 codons
• Alanine
• Cysteine
• Aspartate
• Glutamate
• Phenylalanine
• Glycine
• Histidine
• Isoleucine
• Lysine
• Leucine
• Methionine
• Asparagine
• Proline
• Glutamine
• Arginine
• Serine
• Threonine
• Valine
• Tryptophan
• Tyrosine

UUU Phe   UCU       UAU       UGU
UUC       UCC       UAC       UGC
UUA       UCA       UAA       UGA
UUG       UCG       UAG       UGG
CUU       CCU       CAU       CGU
CUC       CCC Pro   CAC       CGC
CUA       CCA       CAA       CGA
CUG       CCG       CAG       CGG
AUU       ACU       AAU       AGU
AUC       ACC       AAC       AGC
AUA       ACA       AAA Lys   AGA
AUG       ACG       AAG       AGG
GUU       GCU       GAU       GGU
GUC       GCC       GAC       GGC
GUA       GCA       GAA       GGA
GUG       GCG       GAG       GGG

[Figure: Triplet binding data (from Crick’s Croonian lecture, 1966)]

Reading the code: The ribosome
Translation
Polysomes
Adaptors (F.Crick and S.Brenner)
tRNA: secondary structure
tRNA: three-dimensional structure
tRNA and aminoacyl-tRNA synthetase
Initiation of translation

Translation start sites
dnaN   ACATTATCCGTTAGGAGGATAAAAATG
gyrA   GTGATACTTCAGGGAGGTTTTTTAATG
serS   TCAATAAAAAAAGGAGTGTTTCGCATG
bofA   CAAGCGAAGGAGATGAGAAGATTCATG
csfB   GCTAACTGTACGGAGGTGGAGAAGATG
xpaC   ATAGACACAGGAGTCGATTATCTCATG
metS   ACATTCTGATTAGGAGGTTTCAAGATG
gcaD   AAAAGGGATATTGGAGGCCAATAAATG
spoVC  TATGTGACTAAGGGAGGATTCGCCATG
ftsH   GCTTACTGTGGGAGGAGGTAAGGAATG
pabB   AAAGAAAATAGAGGAATGATACAAATG
rplJ   CAAGAATCTACAGGAGGTGTAACCATG
tufA   AAAGCTCTTAAGGAGGATTTTAGAATG
rpsJ   TGTAGGCGAAAAGGAGGGAAAATAATG
rpoA   CGTTTTGAAGGAGGGTTTTAAGTAATG
rplM   AGATCATTTAGGAGGGGAAATTCAATG

Translation start sites aligned
[Figure: the same 16 sequences, aligned]

Elongation
Termination of translation

Dialects
• The genetic code is not universal
• … but the differences are relatively minor
• … occur mainly in small genomes of organelles
• … and involve specific codon families.
• In many cases symmetry is increased, or entire families reassigned.
• Many changes involve stop codons

Reassignment
CUN (=CUU, CUC, CUA, CUG): Leu → Thr
Possible initiation codons in addition to AUG (Met): NUG (=GUG, UUG, CUG), AUN (=AUU, AUC, AUA)
UAA, UAG: stop → Gln

More symmetry
AUU Ile          AGU Ser          UGU Cys
AUC Ile          AGC Ser          UGC Cys
AUA Ile → Met    AGA Arg → Ser    UGA stop → Trp
AUG Met          AGG Arg → Ser    UGG Trp

Vulnerable codon families
CGU Arg           AGU Ser    GGU Gly
CGC Arg           AGC Ser    GGC Gly
CGA Arg → none    AGA Arg    GGA Gly
CGG Arg → none    AGG Arg    GGG Gly
Variant assignments of AGA / AGG: Ser, Gly, stop, none

Stop-containing families
UGU Cys                      UAU Tyr
UGC Cys                      UAC Tyr
UGA stop → Trp, Cys, Sec     UAA stop → Tyr, Gln
UGG Trp                      UAG stop → Gln, (Pyl)

How many letters are there in the English alphabet?
• 26 (everybody knows) …
• … but we are discussing the book by Yčas …
• … so everybody is naïve

How many amino acids?
• Chemists: hundreds
  – many occur in proteins: post-translational modifications
• How many amino acids are encoded by DNA?
Crick:

Is formyl-methionine a “standard” amino acid?
• Occurs in bacteria at the N-termini of all newly synthesized proteins (may be enzymatically removed later on)
• Has three codons: AUG, GUG, UUG
  – unlike “internal” methionine, which is encoded only by AUG
  – by the way, internal GUG encodes valine and internal UUG encodes leucine

Selenocysteine
• In all three domains of life (bacteria, eukaryotes, archaea)
• Encoded by UGA followed by a special hairpin structure (SECIS)
  – without this hairpin UGA is a stop codon
  – several genes for selenoproteins per genome (or none)
  – corresponds to cysteine in homologs (more efficient in enzymes)
• Complicated mechanism of incorporation (specific tRNA, seryl-tRNA synthetase, conversion to SeCys on the tRNA, specific elongation factor)

Alignment of SECIS elements
The consensus SECIS structure
SECIS elements: examples

Pyrrolysine
• In methanogenic archaea
• A derivative of lysine
• Directly encoded (unlike selenocysteine). Standard mechanism:
  – UAG codon
  – specific tRNA
  – aminoacyl-tRNA
• UAG rarely used as a stop codon
  – never as the only stop of a gene

Thanks
• Wikipedia
• Ergito
• Authors of papers, photographs and Internet resources
• Professor Leong Hon Wai
• The organizers
• The assistants
• The students