Download Bioinformatics Molecular Genetics

Bioinformatics Molecular Genetics David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Molecular genetics • • • • • • Cells Macromolecules (DNA, RNA & proteins) Replication Genes Expression (transcription & translation) Function Resources: • http://www.ornl.gov/hgmis/education/students.html US government resources • http://www.ornl.gov/hgmis/publicat/primer/prim1.html Primer section on molecular genetics (c) David Gilbert 2008 Molecular Genetics 2 DNA Genes to systems "gene" mRNA Protein sequence Folded Protein (c) David Gilbert 2008 Molecular Genetics 3 The three kingdoms of organisms from http://whyfiles.org/022critters/archaea.html (c) David Gilbert 2008 Molecular Genetics 4 Divisions of life • Prokaryotes: organisms without a cell nucleus (= karyon), or any other membrane-bound organelles – Archaebacteria The kingdom (or "domain") of single-celled organisms that live under extreme environmental conditions and have distinctive biochemical features. – Bacteria (s: bacterium) A single-celled organism. Found throughout nature and can be beneficial or pathogenic. • Eukaryotes (also "eucaryotes"): all organisms with complex cells which have a nucleus (in which the genetic material is organized ) and organelles. – animals, plants, & fungi, (mostly multicellular), plus protists, (many are unicellular) NB: Viruses - not free living. Microscopic parasite that infects cells in biological organisms. Can reproduce only by invading and controlling other cells as they lack the cellular machinery for self-reproduction. (c) David Gilbert 2008 Molecular Genetics 5 Unicellular vs multicellular • Unicellular – can be eukaryotes, e.g. Yeast (c) David Gilbert 2008 Molecular Genetics 6 Animal and Plant cells (c) David Gilbert 2008 Molecular Genetics 7 The cell from http://www.accessexcellence.org/AB/GG/cell.html (c) David Gilbert 2008 Molecular Genetics 8 The genome • Genome: the whole hereditary information of an organism that is encoded in the DNA (or, for some viruses, RNA). Includes both the genes and the non-coding sequences. • Term was first coined in 1920 by Hans Winkler, Prof of Botany, Uni Hamburg, Germany. • More precisely, the genome of an organism is a complete DNA sequence of one set of chromosomes; for example, one of the two sets that a diploid individual carries in every somatic cell. The term genome can be applied specifically to mean the complete set of nuclear DNA (i.e., the nuclear genome) but can also be applied to organelles that contain their own DNA, as with the mitochondrial genome or the chloroplast genome. Sequence of a genome of a sexually reproducing species: one set of autosomes & one of each type of sex chromosome. “Genome sequence" may be a composite from the chromosomes of various individuals. The study of the global properties of genomes of related organisms is usually referred to as genomics, which distinguishes it from genetics which generally studies the properties of single genes or groups of genes. Every cell (except sex cells & mature red blood cells) contains the complete genome of an organism • • • • • [wikipaedia] (c) David Gilbert 2008 Molecular Genetics 9 Chromosomes • DNA packaged into individual chromosomes (plus proteins) • prokaryotes (single-cell organisms, no nucleus) - single circular chromosome • eukaryote (organisms with nuclei) - species specific number and size of chromosomes • Viruses: anything goes - circular (single or double stranded), linear, DNA, RNA (c) David Gilbert 2008 Molecular Genetics 10 Chromosomes • DNA packaged in one or more large macromolecules called chromosomes. • Chromosome (Greek chroma = color and soma = body) : a very long, continuous piece of DNA, which contains many genes, regulatory elements and other intervening nucleotide sequences. • In the chromosomes of eukaryotes, the DNA exists in a quasi-ordered structure inside the nucleus, where it wraps around histones (structural proteins), and where this composite material is called chromatin. • Prokaryotes do not possess histones or nuclei. • In its relaxed state, the DNA can be accessed for transcription, regulation, and replication. • Chromosomes were first observed by Karl Wilhelm von Nägeli in 1842 and their behavior later described in detail by Walther Flemming in 1882. In 1910, Thomas Hunt Morgan proved that chromosomes are the carriers of genes. [wikipaedia] (c) David Gilbert 2008 Molecular Genetics 11 DNA & Chromosomes www.ogm-info.com/adn.html (c) David Gilbert 2008 Molecular Genetics 12 Some model genetic organisms Name Genome BP Genes Chromosomes HSV1 (Herpes virus) 1.5x105 70 1 Escherichia Coli 4.6x106 4,300 1 Saccharomyces cerevisiae 1.2x107 5,900 16 Caenorhabditis Elegans Drosophila melanogaster Arabidopsis Thalania Mus Musculus 1.0x108 19,100 6 1.8x108 13,600 6 1.2x108 25,500 5 2.5x109 ?30,000 20+X/Y Homo sapiens 2.9x109 ?30,000 22+X/Y (c) David Gilbert 2008 Molecular Genetics 13 Image from Human Genome Project http://www.ornl.gov/hgmis DNA, the molecule of life 22,000 (c) David Gilbert 2008 Molecular Genetics 14 DNA Genes to proteins "gene" mRNA Protein sequence Folded Protein (c) David Gilbert 2008 Molecular Genetics 15 Molecular building blocks DNA = deoxyribonucleic acid, Σ={A,C,G,T} Double-stranded sequences of bases, Pairings A-T, C-G. Transcription: DNA ↓ RNA • • C-A-T-G-T-C-C-A T-G-G-A-C-A-T-G • C-A-U-G-U-C-C-A RNA = ribonucleic acid, Σ={A,C,G,U} Sequence of bases. Pairings = A-U, C-G, G-U, …. Context Free / Context Sensitive Grammar Translation RNA ↓ Amino-acids Proteins, |Σ | =? E-R-L-N-T-A-S-I-P Sequence (chain) of amino-acids (c) David Gilbert 2008 Molecular Genetics 16 DNA & RNA base DNA = deoxyribonucleic acid RNA = ribonucleic acid P Biological macromolecules built as long linear chains of chemical components - nucleotides = sugar+phosphate+base P Bases: adenine (A), cytosine (C), guanine (G), thymine (T), uracil (U) 4 letter alphabet: DNA = A C G T (adenine cytosine guanine thymine) RNA = A C G U (adenine guanine uracil) (c) David Gilbert 2008 cytosine Molecular Genetics 17 Nucleotides G Guanine base P O HO P O Adenine A Cytosine C Thymine T _ OH phosphate deoxyribose (sugar) bases (c) David Gilbert 2008 Molecular Genetics 18 DNA/RNA chains Biological macromolecule built as long linear chain of chemical components = nucleotides connected by phosphodiester bonds P 5’ A C P C P T P G P 3’ (c) David Gilbert 2008 Molecular Genetics 19 Sequence in FASTA format >1HBB:D HEMOGLOBIN A - CHAIN D 1 atggtgcacc tgactcctga ggagaagtct 51 caaggtgaac gtggatgaag ttggtggtga 101 tggtctaccc ttggacccag aggttctttg 151 actcctgatg ctgttatggg caaccctaag 201 agtgctcggt gcctttagtg atggcctggc 251 gcacctttgc cacactgagt gagctgcact 301 cctgagaact tcaggctcct gggcaacgtg 351 tcactttggc aaagaattca ccccaccagt 401 tggtggctgg tgtggctaat gccctggccc (c) David Gilbert 2008 gccgttactg ggccctgggc agtcctttgg gtgaaggctc tcacctggac gtgacaagct ctggtctgtg gcaggctgcc acaagtatca Molecular Genetics ccctgtgggg aggctgctgg ggatctgtcc atggcaagaa aacctcaagg gcacgtggat tgctggccca tatcagaaag ctaa 20 Base-ambiguity symbols (for information) (c) David Gilbert 2008 IUB symbol Represented bases A A C C G G T/U T M A, C R A, G W A, T S C, G Y C, T K G, T V A, C, G H A, C, T D A, G, T B C, G, T X/N A, C, G, T Molecular Genetics 21 DNA complementarity (base-pairing) A-T C-G (c) David Gilbert 2008 Molecular Genetics 22 5’ (c) David Gilbert 2008 Molecular Genetics 5’ T-G-G-A-C-A-T-G 3’ C-A-T-G-T-C-C-A 3’ Complentary base-pairing & double-helix 119D.pdb 23 AAAAGAAAAGGTTAGAAAGATGAGAGATGATAAAGGGTCCATTTGAGGTTAGGTAATATGGTTTGGTATCCCTGTAGTTAAAAGTTTTTGT CTTATTTTAGAATACTGTGACTATTTCTTTAGTATTAATTTTTCCTTCTGTTTTCCTCATCTAGGGAACCCCAAGAGCATCCAATAGAAGC TGTGCAATTATGTAAAATTTTCAACTGTCTTCCTCAAAATAAAGAAGTATGGTAATCTTTACCTGTATACAGTGCAGAGCCTTCTCAGAAG CACAGAATATTTTTATATTTCCTTTATGTGAATTTTTAAGCTGCAAATCTGATGGCCTTAATTTCCTTTTTGACACTGAAAGTTTTGTAAA AGAAATCATGTCCATACACTTTGTTGCAAGATGTGAATTATTGACACTGAACTTAATAACTGTGTACTGTTCGGAAGGGGTTCCTCAAATT TTTTGACTTTTTTTGTATGTGTGTTTTTTCTTTTTTTTTAAGTTCTTATGAGGAGGGAGGGTAAATAAACCACTGTGCGTCTTGGTGTAAT TTGAAGATTGCCCCATCTAGACTAGCAATCTCTTCATTATTCTCTGCTATATATAAAACGGTGCTGTGAGGGAGGGGAAAAGCATTTTTCA ATATATTGAACTTTTGTACTGAATTTTTTTGTAATAAGCAATCAAGGTTATAATTTTTTTTAAAATAGAAATTTTGTAAGAAGGCAATATT AACCTAATCACCATGTAAGCACTCTGGATGATGGATTCCACAAAACTTGGTTTTATGGTTACTTCTTCTCTTAGATTCTTAATTCATGAGG AGGGTGGGGGAGGGAGGTGGAGGGAGGGAAGGGTTTCTCTATTAAAATGCATTCGTTGTGTTTTTTAAGATAGTGTAACTTGCTAAATTTC TTATGTGACATTAACAAATAAAAAAGCTCTTTTAATATTAGATAA DNA (c) David Gilbert 2008 Molecular Genetics 24 What happens to DNA? DNA replication (2 copies of the DNA) DNA transcription RNA DNA DNA translation (Eventually the cell divides into 2 cells - mitosis) DNA (c) David Gilbert 2008 Protein synthesis Protein = double-stranded DNA Molecular Genetics 25 Image from Human Genome Project http://www.ornl.gov/hgmis DNA replication prior to cell division [anim 12.1 & 2] (c) David Gilbert 2008 Molecular Genetics 26 DNA Replication (Image from D.Leader) (c) David Gilbert 2008 Molecular Genetics 27 The gene • Basic unit of heredity • Sequence of bases carrying the information required to construct a particular protein • A gene encodes a protein or an RNA • Estimated number of genes: – – – – humans & mice: 100,000 28,000-35,000 20,000-22,000 !!! C. elegans (worm): 19,000 S. cerevisiae (yeast): 6,000 Tuberculosis microbe: 4,000 (c) David Gilbert 2008 Molecular Genetics 28 Gene • • • • • • • • Genes are the units of heredity in living organisms. Encoded in the organism's genetic material (usually DNA or RNA), & control the development and behavior of the organism. During reproduction, the genetic material is passed on from the parent(s) to the offspring. Genetic material can also be passed between un-related individuals (e.g. via transfection, or on viruses). (!) Genes encode the information necessary to construct the chemicals (proteins etc.) needed for the organism to function. The word "gene" (coined 1909 by Danish botanist Wilhelm Johannsen) comes from the Greek genos ("origin") and is shared by many disciplines, including classical genetics, molecular genetics, evolutionary biology and population genetics. Because each discipline models the biology of life differently, the usage of the word gene varies between disciplines. It may refer to either material or conceptual entities. Molecular biology: the segments of DNA which cells transcribe into RNA and translate, at least in part, into proteins. The Sequence Ontology project defines a gene as: "A locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions". (c) David Gilbert 2008 Molecular Genetics 29 Nature vs Nurture • • • • • • Common speech: "gene" often used to refer to the hereditary cause of a trait, disease or condition—as in "the gene for obesity." Biologists might refer to an allele or a mutation that has been implicated in or is associated with obesity. (many factors other than genes decide whether a person is obese or not: eating habits, exercise, prenatal environment, upbringing, culture and the availability of food, for example.) Allele: one of a number of viable DNA codings of the same gene occupying a given locus (position) on a chromosome. In an organism which has two copies of each of its chromosomes (diploid organism), 2 alleles make up the individual's genotype. Very unlikely that variations within a single gene—or single genetic locus—fully determine one's genetic predisposition for obesity. Interplay between genes and environment, the influence of many genes—appear to be the norm with regard to many and perhaps most ("complex" or "multi-factoral") traits. Phenotype: the characteristics that result from this interplay. (c) David Gilbert 2008 Molecular Genetics 30 Central Dogma • The central dogma of information flow in biology essentially states that the sequence of amino acids making up a protein and hence its structure (folded state) and thus its function, is determined by transcription from DNA via RNA. • “This states that once ‘information’ has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.” Francis Crick, On Protein Synthesis, in Symp. Soc. Exp. Biol. XII, 138-167 (1958) • (Nothing said explicitly about transfer from RNA to DNA) (c) David Gilbert 2008 Molecular Genetics 31 DNA (gene) → RNA → Protein control statement Termination (stop) TATA box start control statement gene Ribosome binding 5’ utr Transcription (RNA polymerase) 3’ utr mRNA Translation (tRNA on the Ribosome) Protein (c) David Gilbert 2008 Molecular Genetics 32 Mutations • alterations of DNA sequence • can lead to loss or gain of function • source of developmental problem and disease • motor of evolution (c) David Gilbert 2008 Molecular Genetics 33 RNA - ribonucleic acid structure similar to DNA but single stranded (except local base pairing) sugar is ribose thymine (T) is replaced by uracil (U) various roles: mRNA, rRNA, tRNA, etc! (newly discovered) (information storage, catalysis, transfer, regulation, …) (c) David Gilbert 2008 Molecular Genetics 34 Secondary structure of RNA base-pairing A-U C-G G-U G-A G-G A-A (c) David Gilbert 2008 tRNA Molecular Genetics 35 Transcription process of copying DNA to RNA • operated by RNA polymerase • starts at transcription start; ends at stop signal 5’ DNA T T C A A G 3’ 5’ 3’ (c) David Gilbert 2008 3’ 5’ DNA RNA U U C A A G DNA Molecular Genetics 3’ 5’ 36 Transcription mechanism RNA polymerase 1 start DNA promoter RNA nucleotides 2 DNA stop RNA 3 DNA stop [anim 4.3] (c) David Gilbert 2008 Molecular Genetics 37 Transcription of DNA into RNA (Image from D.Leader) (c) David Gilbert 2008 Molecular Genetics 38 Open Reading Framme • Open Reading Frame (ORF): any sequence of DNA or RNA that can be translated into a protein. • In a gene ORFs are – located between the start-code sequence (initiation codon) and the stop-code sequence (termination codon). – part of the sequence that will be translated by the ribosomes – long and continue over gaps, or introns. • However, short ORFs can also occur by chance outside of genes. Usually ORFs outside genes are not very long and terminate after a few codons. (c) David Gilbert 2008 Molecular Genetics 39 Promoters are important for gene expression (c) David Gilbert 2008 Molecular Genetics 40 Promotor, transcription factor, enhancer,… enhancer enhancer-binding protein DNA TF TF promotor (c) David Gilbert 2008 Molecular Genetics exon 41 Transcription factors, promotors, enhancers, silencers • • • • • • Transcription factor: a protein that binds DNA at a specific promoter or enhancer region or site, where it regulates transcription. Transcription factors can be selectively activated or deactivated by other proteins Promoter: a DNA sequence that enables a gene to be transcribed. The promoter is recognized by RNA polymerase, which then initiates transcription. Promoters represent critical elements that can work in concert with other regulatory regions (enhancers, silencers, boundary elements/insulators) to direct the level of transcription of a given gene. Enhancer: a short region of DNA that can be bound with proteins (namely, the trans-acting factors, much like a set of transcription factors) to enhance transcription levels of genes (hence the name) in a gene-cluster. An enhancer does not need to be particularly close to the genes it acts on, but it is on the same chromosome. An enhancer does not need to bind close to the transcription initiation site to affect its transcription, as some have been found to bind several hundred thousand base pairs upstream or downstream of the start site. Enhancers can also be found within introns. An enhancer's orientation may even be reversed without affecting its function. Furthermore, an enhancer may be excised and inserted elsewhere in the chromosome, and still affect gene transcription. Silencer: a DNA sequence capable of binding transcription regulation factors termed repressors. Upon binding, RNA polymerase is prevented from initiating transcription thus decreasing or fully suppressing RNA synthesis (c) David Gilbert 2008 Molecular Genetics 42 Promoter sites (Image from D.Leader) (c) David Gilbert 2008 Molecular Genetics 43 Transcription switches (Image from D.Leader) (c) David Gilbert 2008 Molecular Genetics 44 RNA splicing gene exon1 intron1 exon2 intron2 exon3 intron3 intron1 exon2 intron2 exon3 intron4 exon5 transcription initial transcript (pre-mRNA) exon1 exon4 intron3 exon4 intron4 exon5 splicing spliced mRNA exon1 exon2 exon3 exon4 exon5 Equivalent to cDNA (coding DNA without introns) (c) David Gilbert 2008 Molecular Genetics 45 Alternative splicing gene exon1 intron1 exon2 intron2 exon3 intron3 exon4 intron4 transcription initial transcript (pre-mRNA) exon1 intron1 exon2 intron2 exon2 protein1 (c) David Gilbert 2008 intron3 exon4 intron4 alternative splicing mRNA1 exon1 exon3 exon3 exon4 exon1 exon2 Altered structure and function Molecular Genetics exon5 exon5 mRNA2 exon3 exon4 exon5 protein2 46 Alternative splicing http://sequence-www.stanford.edu/group/research/arabidopsis/splicing_big.html (c) David Gilbert 2008 Molecular Genetics 47 Pseudogene • A pseudogene is a nucleotide sequences that is similar to a normal gene, but is not expressed as a functional protein. • Several scenarios have been proposed under which a pseudogene might arise: 1. A gene duplication event may mean that a genome has two copies of a gene when it only requires one. Deactivating mutations in one copy of the gene would then not be selected against. In addition, the duplication event may not have been complete, so they might have incomplete promoters. These pseudogenes are called duplicated. 2. Fragments of the mRNA transcript of a gene may be spontaneously reverse transcribed and inserted into chromosomal DNA (called retrotransposition). These pseudogenes are called processed. Since these pseudogenes lack the promoters of normal genes, they are never expressed. 3. A gene may "die" during evolution if the selective pressure for its function is removed. For example, the environment in which a species lives could change such that the gene product is no longer necessary, or even harmful. Deactivating mutations in the gene would then no longer be selected against. (c) David Gilbert 2008 Molecular Genetics 48 Proteins • macromolecules composed of one or more polypeptides • polypeptide = polymer (“many-units”) of amino acids • 20 different amino acids can be used to build a polypeptide • Thus ‘polypeptide’ is a ‘string composed from a 20-character alphabet’ (c) David Gilbert 2008 Molecular Genetics 49 Amino acids (c) David Gilbert 2008 Molecular Genetics 50 Amino acid codes (c) David Gilbert 2008 One-letter code Three-letter-code Name 1 A Ala Alanine 2 C Cys Cysteine 3 D Asp Aspartic Acid 4 E Glu Glutamic Acid 5 F Phe Phenylalanine 6 G Gly Glycine 7 H His Histidine 8 I Ile Isoleucine 9 K Lys Lysine 10 L Leu Leucine 11 M Met Methionine 12 N Asn Asparagine 13 P Pro Proline 14 Q Gln Glutamine 15 R Arg Arginine 16 S Ser Serine 17 T Thr Threonine 18 V Val Valine 19 W Trp Tryptophan 20 Y Tyr Tyrosine Molecular Genetics 51 Translation synthesis of protein from mRNA • operated by ribosomes • read RNA by groups of 3 nucleotides = codons • reading frame = groupings of codons • starts with start codon; ends with stop codon 5’ N (c) David Gilbert 2008 U U C Phe (F) Molecular Genetics 3’ C RNA Polypeptide (amino-acid chain) 52 How to code amino-acids using nucleotides? • 4-letter alphabet of nucleotides (DNA, RNA) • 20-letter alphabet of amino-acids used in proteins • What is the connection mathematically? • What are the implications of this? (c) David Gilbert 2008 Molecular Genetics 53 The genetic code First Position (5’ end) T C A G T TTT TTC TTA TTG CTT CTC CTA CTG ATT ATC ATA ATG GTT GTC GTA GTG (c) David Gilbert 2008 Phe Phe Leu Leu Leu Leu Leu Leu Ile Ile Ile Met* Val Val Val Val C TCT TCC TCA TCG CCT CCC CCA CCG ACT ACC ACA ACG GCT GCC GCA GCG Second Position A Ser TAT Ser TAC Ser TAA Ser TAG Pro CAT Pro CAC Pro CAA Pro CAG Thr AAT Thr AAC Thr AAA Thr AAG Ala GAT Ala GAC Ala GAA Ala GAG Molecular Genetics Tyr Tyr Stop Stop His His Gln Gln Asn Asn Lys Lys Asp Asp Glu Glu G TGT TGC TGA TGG CGT CGC CGA CGG AGT AGC AGA AGG GGT GGC GGA GGG Cys Cys Stop Trp Arg Arg Arg Arg Ser Ser Arg Arg Gly Gly Gly Gly Third Position (3’ end) T C A G T C A G T C A G T C A G 54 Transfer RNA Ser acceptor stem acceptor stem 1EVV.pdb anti-codon U C A A G U anti-codon codon 5’ (c) David Gilbert 2008 3’ Molecular Genetics 55 Translation synthesis of protein from mRNA • operated by ribosomes • read RNA by groups of 3 nucleotides = codons • reading frame = groupings of codons • starts with start codon; ends with stop codon 5’ N (c) David Gilbert 2008 U U C Phe (F) Molecular Genetics 3’ C RNA Polypeptide (amino-acid chain) 56 Translation mechanism 1 ribosome binding site RNA start codon stop codon ORF ribosome 2 start codon RNA ribosome ORF stop codon amino acids protein ribosome 3 RNA start codon ORF protein [anim 4.5] (c) David Gilbert 2008 stop codon Molecular Genetics 57 Non-random use of synonymous codons Bacterial (highly expressed) genes A.Acid Codon Number /1000 Fraction Gly GGG 13 1.89 0.02 Gly GGA 3 0.44 0.00 Gly GGU 365 52.99 0.59 Gly GGC 238 34.55 0.38 Glu GAG 108 15.68 0.22 Glu GAA 394 57.20 0.78 Asp GAU 149 21.63 0.33 Asp GAC 298 43.26 0.67 Val GUC 93 13.50 0.16 Val GUA 146 21.20 0.26 Val GUU 289 41.96 0.51 Val GUC 38 5.52 0.07 (c) David Gilbert 2008 Molecular Genetics 58 NCBI Nucleotide banner My NCBI [Sign In] [Register] PubMed Nucleotide Protein Genome Structure Search for Limits Preview/Index History Display Show Range: from to Reverse complemented strand 1: J02799. Reports E.coli icd gene e...[gi:146431] PMC Taxonomy Clipboard Details Features: Links SNP CDD OMIM MGC HPRD Books STS tRNA LOCUS DEFINITION ACCESSION VERSION KEYWORDS SOURCE ORGANISM ECOICD 1568 bp DNA linear BCT 26-APR-1993 E.coli icd gene encoding isocitrate dehydrogenase, complete cds. J02799 J02799.1 GI:146431 icd gene; isocitrate dehydrogenase. Escherichia coli Escherichia coli Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia. REFERENCE 1 (bases 1 to 1568) AUTHORS Thorsness,P.E. and Koshland,D.E.Jr. JOURNAL Unpublished (1987) REFERENCE 2 (bases 291 to 1538) AUTHORS Thorsness,P.E. and Koshland,D.E. Jr. TITLE Inactivation of isocitrate dehydrogenase by phosphorylation is mediated by the negative charge of the phosphate JOURNAL J. Biol. Chem. 262 (22), 10422-10425 (1987) PUBMED 3112144 COMMENT Original source text: E.coli DNA, clone pTK512. Draft entry and printed copy of sequence [2],[1] kindly provided by P.E.Thorsness, 01-JUL-1987. FEATURES Location/Qualifiers source 1..1568 /organism="Escherichia coli" /mol_type="genomic DNA" /db_xref="taxon:562" CDS 291..1541 /note="isocitrate dehydrogenase (icd; EC 1.1.1.42)" /codon_start=1 /transl_table=11 /protein_id="AAA24006.1" /db_xref="GI:146432" (c) David Gilbert 2008 Molecular Genetics 59 /translation="MESKVVVPAQGKKITLQNGKLNVPENPIIPYIEGDGIGVDVTPA MLKVVDAAVEKAYKGERKISWMEIYTGEKSTQVYGQDVWLPAETLDLIREYRVAIKGP LTTPVGGGIRSLNVALRQELDLYICLRPVRYYQGTPSPVKHPELTDMVIFRENSEDIY AGIEWKADSADAEKVIKFLREEMGVKKIRFPEHCGIGIKPCSEEGTKRLVRAAIEYAI ANDRDSVTLVHKGNIMKFTEGAFKDWGYQLAREEFGGELIDGGPWLKVKNPNTGKEIV IKDVIADAFLQQILLRPAEYDVIACMNLNGDYISDALAAQVGGIGIAPGANIGDECAL FEATHGTAPKYAGQDKVNPGSIILSAEMMLRHMGWTEAADLIVKGMEGAINAKTVTYD FERLMDGAKLLKCSEFGDAIIENM" ORIGIN MluI site; 25.3 min on K12 map. 1 cgcgtggcgt ggttttcagg tttacgcctg gtagaacgtt gcgagctgaa tcgcttaacc 61 tggtgatttc taaaagaagt tttttgcatg gtattttcag agattatgaa ttgccgcatt 121 atagcctaat aacgcgcatc tttcatgacg gcaaacaata gggtagtatt gacaagccaa 181 ttacaaatca ttaacaaaaa attgctctaa agcatccgta tcgcaggacg caaacgcata 241 tgcaacgtgg tggcagacga gcaaaccagt agcgctcgaa ggagaggtga atggaaagta 301 aagtagttgt tccggcacaa ggcaagaaga tcaccctgca aaacggcaaa ctcaacgttc 361 ctgaaaatcc gattatccct tacattgaag gtgatggaat cggtgtagat gtaaccccag 421 ccatgctgaa agtggtcgac gctgcagtcg agaaagccta taaaggcgag cgtaaaatct 481 cctggatgga aatttacacc ggtgaaaaat ccacacaggt ttatggtcag gacgtctggc 541 tgcctgctga aactcttgat ctgattcgtg aatatcgcgt tgccattaaa ggtccgctga 601 ccactccggt tggtggcggt attcgctctc tgaacgttgc cctgcgccag gaactggatc 661 tctacatctg cctgcgtccg gtacgttact atcagggcac tccaagcccg gttaaacacc 721 ctgaactgac cgatatggtt atcttccgtg aaaactcgga agacatttat gcgggtatcg 781 aatggaaagc agactctgcc gacgccgaga aagtgattaa attcctgcgt gaagagatgg 841 gggtgaagaa aattcgcttc ccggaacatt gtggtatcgg tattaagccg tgttcggaag 901 aaggcaccaa acgtctggtt cgtgcagcga tcgaatacgc aattgctaac gatcgtgact 961 ctgtgactct ggtgcacaaa ggcaacatca tgaagttcac cgaaggagcg tttaaagact 1021 ggggctacca gctggcgcgt gaagagtttg gcggtgaact gatcgacggt ggcccgtggc 1081 tgaaagttaa aaacccgaac actggcaaag agatcgtcat taaagacgtg attgctgatg 1141 cattcctgca acagatcctg ctgcgtccgg ctgaatatga tgttatcgcc tgtatgaacc 1201 tgaacggtga ctacatttct gacgccctgg cagcgcaggt tggcggtatc ggtatcgccc 1261 ctggtgcaaa catcggtgac gaatgcgccc tgtttgaagc cacccacggt actgcgccga 1321 aatatgccgg tcaggacaaa gtaaatcctg gctctattat tctctccgct gagatgatgc 1381 tgcgccacat gggttggacc gaagcggctg acttaattgt taaaggtatg gaaggcgcaa 1441 tcaacgcgaa aaccgtaacc tatgacttcg agcgtctgat ggatggcgct aaactgctga 1501 aatgttcaga gtttggtgac gcgatcatcg aaaacatgta atgccgtagt ttgttaaatt 1561 tattaacg // (c) David Gilbert 2008 Molecular Genetics 60 HEADER COMPND COMPND SOURCE AUTHOR AUTHOR REVDAT REVDAT JRNL JRNL JRNL JRNL JRNL JRNL REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK … REMARK REMARK REMARK … REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK REMARK (c) OXIDOREDUCTASE (NAD(A)-CHOH(D)) 30-MAY-90 5ICD ISOCITRATE DEHYDROGENASE (E.C.1.1.1.42) COMPLEX WITH 2 MG2+ AND ISOCITRATE (ESCHERICHIA $COLI) J.H.HURLEY,A.M.DEAN,J.L.SOHL,D.E.KOSHLAND *JUNIOR, 2 R.M.STROUD 2 15-JUL-93 5ICDA 1 SHEET 1 15-OCT-91 5ICD 0 AUTH J.H.HURLEY,A.M.DEAN,J.L.SOHL,D.E.KOSHLAND *JUNIOR, AUTH 2 R.M.STROUD TITL REGULATION OF AN ENZYME BY PHOSPHORYLATION AT THE TITL 2 ACTIVE SITE REF SCIENCE V. 249 1012 1990 REFN ASTM SCIEAS US ISSN 0036-8075 038 1 1 REFERENCE 1 1 AUTH J.H.HURLEY,A.M.DEAN,D.E.KOSHLAND *JUNIOR,R.M.STROUD 1 TITL CATALYTIC MECHANISM OF /NADP+$-*DEPENDENT 1 TITL 2 ISOCITRATE DEHYDROGENASE: IMPLICATIONS FROM THE 1 TITL 3 STRUCTURES OF MAGNESIUM-*ISOCITRATE AND /NADP+$ 1 TITL 4 COMPLEXES 1 REF BIOCHEMISTRY V. 30 8671 1991 1 REFN ASTM BICHAW US ISSN 0006-2960 033 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICDA 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 2 3 4 5 6 7 1 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 2 2 RESOLUTION. 2.5 ANGSTROMS. 3 5ICD 5ICD 5ICD 46 47 48 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICDA 54 55 56 57 58 59 60 61 62 63 64 65 66 2 4 THE ASYMMETRIC UNIT IS A MONOMER. THE FUNCTIONAL DIMER CAN 4 BE GENERATED BY APPLYING THE SYMMETRY OPERATOR (Y, X, -Z) 4 TO THE COORDINATES PRESENTED IN THIS ENTRY. 5 5 RESIDUE 192 WAS INCORRECTLY REFINED AS ASP. IT HAS BEEN 5 CHANGED TO GLU AND THE SIDE CHAIN ATOMS BEYOND CB HAVE BEEN 5 DELETED. 6 6 RESIDUES MET 1 AND GLU 2 ARE MISSING FROM THE STRUCTURE. 6 SOME OF THE SIDE CHAIN ATOMS OF RESIDUES GLN 10, GLN 17, 6 ASN 18, LYS 20, ASP 81, GLU 182, LYS 186, GLU 204, ASP 259, 6 LYS 273, GLU 331, AND LYS 344 COULD NOT BE LOCATED IN THE 6 DENSITY MAPS. 7 David Gilbert 2008 Molecular Genetics 61 REMARK SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES SEQRES 7 CORRECTION. CORRECT FORMAT OF SHEET RECORDS. 15-JUL-93. 1 416 MET GLU SER LYS VAL VAL VAL PRO ALA GLN GLY LYS LYS 2 416 ILE THR LEU GLN ASN GLY LYS LEU ASN VAL PRO GLU ASN 3 416 PRO ILE ILE PRO TYR ILE GLU GLY ASP GLY ILE GLY VAL 4 416 ASP VAL THR PRO ALA MET LEU LYS VAL VAL ASP ALA ALA 5 416 VAL GLU LYS ALA TYR LYS GLY GLU ARG LYS ILE SER TRP 6 416 MET GLU ILE TYR THR GLY GLU LYS SER THR GLN VAL TYR 7 416 GLY GLN ASP VAL TRP LEU PRO ALA GLU THR LEU ASP LEU 8 416 ILE ARG GLU TYR ARG VAL ALA ILE LYS GLY PRO LEU THR 9 416 THR PRO VAL GLY GLY GLY ILE ARG SER LEU ASN VAL ALA 10 416 LEU ARG GLN GLU LEU ASP LEU TYR ILE CYS LEU ARG PRO 11 416 VAL ARG TYR TYR GLN GLY THR PRO SER PRO VAL LYS HIS 12 416 PRO GLU LEU THR ASP MET VAL ILE PHE ARG GLU ASN SER 13 416 GLU ASP ILE TYR ALA GLY ILE GLU TRP LYS ALA ASP SER 14 416 ALA ASP ALA GLU LYS VAL ILE LYS PHE LEU ARG GLU GLU 15 416 MET GLY VAL LYS LYS ILE ARG PHE PRO GLU HIS CYS GLY 16 416 ILE GLY ILE LYS PRO CYS SER GLU GLU GLY THR LYS ARG 17 416 LEU VAL ARG ALA ALA ILE GLU TYR ALA ILE ALA ASN ASP 18 416 ARG ASP SER VAL THR LEU VAL HIS LYS GLY ASN ILE MET 19 416 LYS PHE THR GLU GLY ALA PHE LYS ASP TRP GLY TYR GLN 20 416 LEU ALA ARG GLU GLU PHE GLY GLY GLU LEU ILE ASP GLY 21 416 GLY PRO TRP LEU LYS VAL LYS ASN PRO ASN THR GLY LYS 22 416 GLU ILE VAL ILE LYS ASP VAL ILE ALA ASP ALA PHE LEU 23 416 GLN GLN ILE LEU LEU ARG PRO ALA GLU TYR ASP VAL ILE 24 416 ALA CYS MET ASN LEU ASN GLY ASP TYR ILE SER ASP ALA 25 416 LEU ALA ALA GLN VAL GLY GLY ILE GLY ILE ALA PRO GLY 26 416 ALA ASN ILE GLY ASP GLU CYS ALA LEU PHE GLU ALA THR 27 416 HIS GLY THR ALA PRO LYS TYR ALA GLY GLN ASP LYS VAL 28 416 ASN PRO GLY SER ILE ILE LEU SER ALA GLU MET MET LEU 29 416 ARG HIS MET GLY TRP THR GLU ALA ALA ASP LEU ILE VAL 30 416 LYS GLY MET GLU GLY ALA ILE ASN ALA LYS THR VAL THR 31 416 TYR ASP PHE GLU ARG LEU MET ASP GLY ALA LYS LEU LEU 32 416 LYS CYS SER GLU PHE GLY ASP ALA ILE ILE GLU ASN MET (c) David Gilbert 2008 Molecular Genetics 5ICDA 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 3 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 62 ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM ATOM 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 (c) David Gilbert 2008 N CA C O CB OG N CA C O CB CG CD CE NZ N CA C O CB CG1 CG2 N CA C O CB CG1 CG2 N CA C O SER SER SER SER SER SER LYS LYS LYS LYS LYS LYS LYS LYS LYS VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL VAL 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 6 6 6 6 6 6 6 7 7 7 7 94.998 95.429 96.454 97.036 94.224 94.552 96.650 97.856 98.520 99.620 97.552 98.668 98.376 97.200 95.914 97.860 98.445 99.368 99.166 97.295 96.274 96.551 100.485 101.528 101.411 101.934 102.838 104.091 102.797 100.655 100.456 101.670 102.130 42.191 40.983 41.317 40.419 40.251 39.002 42.596 42.962 44.122 44.548 43.353 42.597 42.058 41.055 41.720 44.623 45.659 44.978 43.839 46.513 46.818 45.804 45.651 45.162 46.084 47.209 45.299 45.158 44.166 45.655 46.549 46.602 45.550 11.357 10.663 9.581 8.970 10.042 9.430 9.245 8.519 9.242 8.890 7.088 6.343 4.935 4.851 5.009 10.300 11.131 12.166 12.611 11.784 10.703 12.868 12.444 13.338 14.514 14.448 12.561 13.391 11.535 15.547 16.700 17.625 18.072 Molecular Genetics 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 62.19 64.90 67.08 70.87 64.93 62.85 65.06 60.85 53.37 53.84 65.57 72.00 76.77 81.18 81.33 46.52 46.32 45.91 46.98 44.70 42.01 50.64 43.14 38.76 39.00 39.37 39.41 36.07 41.88 39.30 42.34 42.23 43.38 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 5ICD 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 63 DNA & Chromosomes www.ogm-info.com/adn.html (c) David Gilbert 2008 Molecular Genetics 64 Diploidy (Image from D.Leader) (Image from D.Leader) (c) David Gilbert 2008 Molecular Genetics 65 Centromeres and Telomeres (Image from D.Leader) (c) David Gilbert 2008 Molecular Genetics 66 Recombination & meiosis • • • • • • • Meiosis is the process that transforms one diploid cell into four haploid cells in eukaryotes in order to redistribute the diploid's cell's genome. Forms the basis of sexual reproduction and can only occur in eukaryotes. The diploid cell's genome, composed of ordered structures of coiled DNA called chromosomes, is replicated once and separated twice, producing four sets of haploid cells each containing half of the original cell's chromosomes. These resultant haploid cells will fertilize with other haploid cells of the opposite gender to form a diploid cell again. The cyclical process of separation by meiosis and genetic recombination through fertilization is called the life cycle. The result is that the offspring produced during germination after meiosis will have a slightly different blueprint which has instructions for the cells to work, contained in the DNA. This allows sexual reproduction to occur. Biochemically, meiosis uses many similar processes that mitosis (cell division) uses in order to manipulate the redistribution of chromosomes, but with a vastly different outcome. (c) David Gilbert 2008 Molecular Genetics 67 Recombination • Genetic recombination: the transmission-genetic process by which the combinations of alleles observed at different loci (plural of locus) in two parental individuals become shuffled in offspring individuals. • Such shuffling can be the result of recombination via intrachromososomal recombination (crossing over) and via interchromososomal recombination (independent assortment). • Recombination therefore only shuffles already existing genetic variation and does not create new variation at the involved loci. • In molecular biology, recombination generally refers to the molecular process by which genetic variation found associated at two different places in a continuous piece of DNA becomes disassociated (shuffled). In this process one or both of the genetic variants are replaced by different variants found at same two places in a second DNA molecule. One mechanism leading to such molecular recombination is chromosomal crossing over. (c) David Gilbert 2008 Molecular Genetics 68 Recombination (Image from D.Leader) (c) David Gilbert 2008 Molecular Genetics 69 • • • • Recombination Can have replication errors here In meiosis, the precursor cells of the sperm or ova must multiply and at the same time reduce the number of chromosomes to one full set. During the early stages of cell division in meiosis, two chromosomes of a homologous pair may exchange segments, producing genetic variations in germ cells. Genes that lie far apart are likely to end up on two different chromosomes. On the other hand, genes that lie very close together are less likely to be separated by a break and crossing-over. Genes that tend to stay together during recombination are said to be linked. Sometimes, one gene in a linked pair serves as a "marker" that can be used by geneticists to infer the presence of the other (often, a disease-causing gene). http://www.accessexcellence.org/RC/VL/GG/comeiosis.html (c) David Gilbert 2008 Molecular Genetics 70 Sexual reproduction Can have replication errors here X Diploid progeny (c) David Gilbert 2008 Molecular Genetics 71 Cell levels Literature PubMed Systems Behaviour OMIM System boundary Metabolome space … Metabolic networks … metabolite1 metabolite2 … aMAZE BIND Brenda Signalling Pathways KEGG DIP metabolite3 Proteome space Protein-protein Interaction protein1 Complex1-3 ENZYME CATH Prodom protein2 LIGAND Klotho GO MIPS … GelBANK InterPro SwissSCOP 2DPAGE PIR PDB protein3 SwissProt Transcriptome space RegulonDB … GXD PathDB TRANSPATH WIT2 RNA2 Gene RNA1 Regulatory Pathways RNA3 gene1 Molecular Genetics gene3 Rfam RNA Sequence ArrayExpress Database Genome space gene2 (c) David Gilbert 2008 SMD NDB COG MethDB … TRANSFAC TIGR Ensembl GenBank/ DDBJ/ EMBL UCSC Genome Browser 72 The bigger story sequence Sequencing (fast) structure Structure determination (slow, not all) function gene product ⊗ ⊗ Bioengineering? function Network/organism (c) David Gilbert 2008 Molecular Genetics Biological assays in-vitro Biological assays in-vivo… 73 Summary - message • • • • • • • • • DNA → RNA → Protein Central Dogma Transcription Translation Genetic code, codon “Easy” to sequence DNA Can compute RNA and Protein sequences from DNA Protein structure ? Biological function? – How does it all “work”? (c) David Gilbert 2008 Molecular Genetics 74

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Bioinformatics Molecular Genetics