An introduction to metalloenzymes and biotechnological approaches to studying them
12.755 L10

Urease
Urease is an enzyme that catalyzes the hydrolysis of urea into carbon dioxide and ammonia. The reaction occurs as follows:
(NH2)2CO + H2O → CO2 + 2NH3

Aconitase

Outline
• Introduction – global BGC to cellular physiology to metalloenzyme and molecular
• Categories of metalloprotein and metalloenzyme functions
• The code: amino acids
• The Genomic Firehose
• Bioinformatic terminology
• Integrated Microbial Genomes Portal

Roles of metal in biology
(From Bioinorganic Chemistry, Lippard and Berg)

Metalloprotein Functions
• Dioxygen Transport
  – Hemoglobin-myoglobin family
  – Hemocyanins
  – Hemerythrins
• Electron Transfer (e.g. nitrogen fixation)
• Structural Roles (zinc fingers)

Metalloenzyme Functions
(Note: Metalloenzymes are metalloproteins that perform a catalytic function)
• Hydrolytic Enzymes (Carbonic Anhydrases)
• Two Electron Redox Enzymes (Nitrate Reductase, oxidation of hydrocarbons by P-450)
• Multielectron Pair Redox Enzymes (Cytochrome c, PSII, Nitrogenase)
• Rearrangements (Vitamin B12)

Metalloenzymes in Photosynthesis (From Raven 2000)

Metalloenzymes in carbon fixation

Metalloenzymes in Nitrogen Utilization

Metalloenzymes in the Nitrogen Biogeochemical Cycle
Key enzyme in the nitrification reaction: ammonia (NH3) → hydroxylamine (NH2OH) → nitrite (NO2-)
• Found in ammonia-oxidizing bacteria (AOB) but not the more abundant ammonia-oxidizing archaea (AOA)
• 24 hemes (irons) per molecule!
• What does nature actually use in the oceans if this enzyme is not present?

How does a particular amino acid sequence create the function of a metalloprotein or the activity of a metalloenzyme?
“The sequence itself is not informative; it must be analyzed by comparative methods against existing databases to develop hypotheses concerning relatives and function.”

Terminology for comparing sequences:
• Identity: The extent to which two (nucleotide or amino acid) sequences are invariant.
• Similarity: The extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive matrix score.
• Conservation: Changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.
• Homology: Similarity attributed to descent from a common ancestor. NOTE: homology is binary; sequences are either homologous or they are not. Sequences cannot be “highly homologous.”
• Source: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html

BLAST: Basic Local Alignment Search Tool
• E-value: Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-value, the more significant the score.
• In the limit of sufficiently large sequence lengths m and n, the statistics of HSP scores are characterized by two parameters, K and lambda. Most simply, the expected number of HSPs with score at least S is given by the formula
  $E = K m n e^{-\lambda S}$
  The parameters K and lambda can be thought of simply as natural scales for the search space size and the scoring system, respectively.
• We call this the E-value for the score S. This formula makes eminently intuitive sense. Doubling the length of either sequence should double the number of HSPs attaining a given score. Also, for an HSP to attain the score 2x it must attain the score x twice in a row, so one expects E to decrease exponentially with score.
• Raw Score: The score of an alignment, S, calculated as the sum of substitution and gap scores.
  Substitution scores are given by a look-up table (see PAM, BLOSUM). Gap scores are typically calculated as the sum of G, the gap opening penalty, and L, the gap extension penalty. For a gap of length n, the gap cost would be G + Ln. The choice of gap costs G and L is empirical, but it is customary to choose a high value for G (10-15) and a low value for L (1-2).
• HSP: High-scoring segment pair. Local alignments with no gaps that achieve one of the top alignment scores in a given search.
Sources:
• http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html
• http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html

Program   Description
blastp    Compares an amino acid query sequence against a protein sequence database.
blastn    Compares a nucleotide query sequence against a nucleotide sequence database.
blastx    Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn   Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx   Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.

There are many metalloenzymes, often carrying out crucial cellular biochemical (and biogeochemical) processes
Enzymes containing metals:
• Superoxide dismutase
• Urease
• Aconitase
• Zinc finger proteins
• Carbonic anhydrase
• Alkaline phosphatase
• DNA polymerase
• Nitrate Reductase
• Multi-copper oxidase
• uvrA (ultraviolet resistance gene)
• Ferredoxin
• Nitrogenase
• Many more…
There are also many proteins and enzymes that are involved in metal processes (uptake, storage, insertion, transformations, etc.).
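The two pieces of BLAST arithmetic in the terminology above can be tried out numerically. The sketch below assumes the standard Karlin-Altschul form of the E-value, E = K·m·n·exp(−lambda·S), together with the gap cost G + Ln from the Raw Score definition. The function names and the numerical values of K and lambda are illustrative placeholders only, not the parameters of any particular scoring matrix.

```python
import math


def expected_hsps(score, m, n, K, lam):
    """Expected number of HSPs scoring at least `score` by chance.

    Assumes the Karlin-Altschul form E = K * m * n * exp(-lambda * S),
    where m and n are the two sequence (or query and database) lengths.
    """
    return K * m * n * math.exp(-lam * score)


def gap_cost(n, G=11, L=1):
    """Cost of a gap of length n: opening penalty G plus L per position."""
    return G + L * n


if __name__ == "__main__":
    # K and lam below are made-up placeholders chosen for illustration.
    K, lam = 0.1, 0.3
    e_short = expected_hsps(50, m=200, n=1_000_000, K=K, lam=lam)
    e_long = expected_hsps(50, m=400, n=1_000_000, K=K, lam=lam)
    print("doubling m scales E by", e_long / e_short)

    e_high = expected_hsps(100, m=200, n=1_000_000, K=K, lam=lam)
    print("E at S=50:", e_short, " E at S=100:", e_high)

    print("cost of a 3-residue gap:", gap_cost(3))
```

Doubling either length doubles E, and raising the score lowers E exponentially, which is the intuition stated in the E-value bullet.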
Integrated Microbial Genomes
Joint Genome Institute, Department of Energy

The U.S. Department of Energy (DOE) Office of Science supports innovative, high-impact, peer-reviewed biological science to seek solutions to difficult DOE mission challenges. These challenges include finding alternative sources of energy, understanding biological carbon cycling as it relates to global climate change, and cleaning up environmental wastes.
• Cleanup of toxic-waste sites worldwide.
• Production of novel therapeutic and preventive agents and pathways.
• Energy generation and development of renewable energy sources (e.g., methane and hydrogen).
• Production of chemical catalysts, reagents, and enzymes to improve efficiency of industrial processes.
• Management of environmental carbon dioxide, which is related to climate change.
• Detection of disease-causing organisms and monitoring of the safety of food and water supplies.
• Use of genetically altered bacteria as living sensors (biosensors) to detect harmful chemicals in soil, air, or water.
• Understanding of specialized systems used by microbial cells to live in natural environments with other cells.
http://microbialgenomics.energy.gov/index.shtml

The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis and annotation of all publicly available genomes from three domains of life, in a uniquely integrated context.
Go To: http://img.jgi.doe.gov/

Compile list of Organisms

IMG Carts
• Carts are needed since IMG resets your session’s cache when you leave the site.
• Carts are an easy way to save a list of:
  – Organisms (e.g. all cyanobacteria)
  – Genes (e.g. you have a list of genes that code for superoxide dismutase in 16 different organisms)
  – Functions (you have a list of the most popular metalloenzymes in the form of COG, Pfam, TIGRfam, or EC#)
• Saved as tab-delimited text files

Organism Cart (cyanobac)
taxon_oid  Genome Name                                 Sequencing Status  Domain    Genes  GC Perc  Bases
641228474  Acaryochloris marina MBIC11017              Finished           Bacteria  8488   0.47     8361599
637000006  Anabaena variabilis ATCC 29413              Finished           Bacteria  5764   0.41     7068601
638341074  Crocosphaera watsonii WH 8501               Draft              Bacteria  6004   0.37     6238156
640612201  Cyanothece sp. CCY 0110                     Draft              Bacteria  6520   0.37     5880532
637000121  Gloeobacter violaceus PCC 7421              Finished           Bacteria  4488   0.62     4659019
640963043  Leptolyngbya valderiana BDU 20041           Draft              Bacteria  12     0.53     89264
639857035  Lyngbya sp. PCC 8106                        Draft              Bacteria  6185   0.41     7037511
639857037  Nodularia spumigena CCY9414                 Draft              Bacteria  4904   0.41     5316258
638341137  Nostoc punctiforme PCC 73102                Draft              Bacteria  7818   0.41     9020037
637000199  Nostoc sp. PCC 7120                         Finished           Bacteria  6217   0.41     7211789
640069321  Prochlorococcus marinus AS9601              Finished           Bacteria  1982   0.31     1669886
640753041  Prochlorococcus marinus MIT 9215            Finished           Bacteria  2056   0.31     1738790
640069322  Prochlorococcus marinus MIT 9301            Finished           Bacteria  1963   0.31     1641879
640069323  Prochlorococcus marinus MIT 9303            Finished           Bacteria  3127   0.5      2682675
637000210  Prochlorococcus marinus MIT 9312            Finished           Bacteria  1856   0.31     1709204
637000211  Prochlorococcus marinus MIT 9313            Finished           Bacteria  2345   0.51     2410873
640069324  Prochlorococcus marinus MIT 9515            Finished           Bacteria  1964   0.31     1704176
640069325  Prochlorococcus marinus NATL1A              Finished           Bacteria  2247   0.35     1864731
637000212  Prochlorococcus marinus NATL2A              Finished           Bacteria  1942   0.35     1842899
637000213  Prochlorococcus marinus marinus CCMP1375    Finished           Bacteria  1932   0.36     1751080
637000214  Prochlorococcus marinus pastoris CCMP1986   Finished           Bacteria  1765   0.31     1657990
641228501  Prochlorococcus marinus str. MIT 9211       Finished           Bacteria  1901   0.38     1688963
637000307  Synechococcus elongatus PCC 6301            Finished           Bacteria  2584   0.55     2696255
637000308  Synechococcus elongatus PCC 7942            Finished           Bacteria  2715   0.55     2742269
639857006  Synechococcus sp. BL107                     Draft              Bacteria  2553   0.54     2283377
637000309  Synechococcus sp. CC9311                    Finished           Bacteria  2945   0.52     2606748
637000310  Synechococcus sp. CC9605                    Finished           Bacteria  2756   0.59     2510659
637000311  Synechococcus sp. CC9902                    Finished           Bacteria  2358   0.54     2234828
637000312  Synechococcus sp. JA-2-3Ba(2-13)            Finished           Bacteria  2938   0.58     3046680
637000313  Synechococcus sp. JA-3-3Ab                  Finished           Bacteria  2891   0.6      2932766
640427148  Synechococcus sp. RCC307                    Finished           Bacteria  2583   0.61     2224914
639857007  Synechococcus sp. RS9916                    Draft              Bacteria  3009   0.6      2664465
638341213  Synechococcus sp. RS9917                    Draft              Bacteria  2820   0.64     2579542
638341214  Synechococcus sp. WH 5701                   Draft              Bacteria  3401   0.65     3043834
640427149  Synechococcus sp. WH 7803                   Finished           Bacteria  2586   0.6      2366980
638341215  Synechococcus sp. WH 7805                   Draft              Bacteria  2938   0.58     2620367
637000314  Synechococcus sp. WH 8102                   Finished           Bacteria  2586   0.59     2434428
637000315  Synechocystis sp. PCC 6803                  Finished           Bacteria  3626   0.47     3947019
637000320  Thermosynechococcus elongatus BP-1          Finished           Bacteria  2554   0.54     2593857
637000329  Trichodesmium erythraeum IMS101             Finished           Bacteria  5124   0.34     7750108

Gene cart (Cu/Zn superoxide dismutase)
gene_oid   Locus Tag        Gene Symbol  Product Name                                   AA Seq Length  Genome
641254312  AM1_5239         sodCC        copper/zinc superoxide dismutase               196            Acaryochloris marina MBIC11017
637459373  glr1981                       similar to superoxide dismutase                233            Gloeobacter violaceus PCC 7421
637459565  glr2170                       similar to superoxide dismutase                191            Gloeobacter violaceus PCC 7421
640015250  L8106_24545                   superoxide dismutase                           201            Lyngbya sp. PCC 8106
639885006  BL107_14050                   putative superoxide dismutase                  198            Synechococcus sp. BL107
638115359  sync_1771        sodC         Copper/zinc superoxide dismutase               175            Synechococcus sp. CC9311
637776096  Syncc9605_1507                superoxide dismutase precursor (Cu-Zn)         178            Synechococcus sp. CC9605
637771156  Syncc9902_0982                putative superoxide dismutase                  175            Synechococcus sp. CC9902
640545246  SynRCC307_0325   sodC         Superoxide dismutase [Cu-Zn] (EC:1.15.1.1)     175            Synechococcus sp. RCC307
639889548  RS9916_26849                  superoxide dismutase precursor (Cu-Zn)         177            Synechococcus sp. RS9916
640543304  SynWH7803_0951   sodC         Superoxide dismutase [Cu-Zn] (EC:1.15.1.1)     174            Synechococcus sp. WH 7803
639020551  WH7805_01302                  putative superoxide dismutase                  174            Synechococcus sp. WH 7805

Function Cart (metalloenzymes)
func_id    func_name
COG0619    ABC-type cobalt transport system, permease component CbiQ and related transporters
COG1122    ABC-type cobalt transport system, ATPase component
COG1930    ABC-type cobalt transport system, periplasmic component
COG2032    Cu/Zn superoxide dismutase
COG2140    Thermophilic glucose-6-phosphate isomerase and related metalloenzymes
COG3227    Zinc metalloprotease (elastase)
COG4097    Predicted ferric reductase
COG4300    Predicted permease, cadmium resistance protein
pfam01676  Metalloenzyme
pfam01794  Ferric_reduct
pfam02022  Integrase_Zn
pfam02361  CbiQ
pfam02553  CbiN
pfam02742  Fe_dep_repr_C
pfam03596  Cad

In Class Exercise on IMG: http://img.jgi.doe.gov
• Load genomes
  – Go to “FIND GENOMES”
  – Click “VIEW PHYLOGENETICALLY”
  – Click “CLEAR ALL” to unselect all genomes
  – Click “ALL” after the Cyanobacteria listings to select all Cyanobacterial genomes
  – Click “SAVE SELECTIONS” to choose only these selected Cyanobacterial genomes. Note that the top of the page should now say 40 genomes selected.
• Gene Search for Superoxide Dismutase, using the “FIND GENES” function
  – By “GENE SEARCH”: type in superoxide dismutase and hit search. Note that this will only return genes that have been “annotated” as a superoxide dismutase by a previous computer or human annotator.
  – Go ahead and grab a sequence for Synechococcus strain WH8102’s nickel superoxide dismutase by clicking on the 474bp entry and copying the sequence to the clipboard (highlight the area and hit Control-C). Note that this is the DNA sequence.
  – Click the “FIND GENES” tab and then the “BLAST” tab, and paste the nickel superoxide dismutase sequence into the open box.
  – Choose BLASTn for a nucleotide (DNA) search.
  – Set the cutoff value to 1e-2 (less stringent). Note that the best hit is where you got the sequence from.
  – Repeat, but now with the amino acid sequence instead of the DNA sequence.

Blast results
Sequences producing significant alignments:                             (bits)  E-Value
637000314.NC_005070 Synechococcus sp. WH 8102, complete genome.           940   0.0
637000310.NC_007516 Synechococcus sp. CC9605, complete genome.            389   e-105
639857006.NZ_AATZ01000003 Synechococcus sp. BL107, unfinished se...       311   3e-82
637000311.NC_007513 Synechococcus sp. CC9902, complete genome.            287   4e-75
637000309.NC_008319 Synechococcus sp. CC9311, complete genome.            208   3e-51
640069323.NC_008820 Prochlorococcus marinus str. MIT 9303, compl...       168   3e-39
637000211.NC_005071 Prochlorococcus marinus str. MIT 9313, compl...       153   2e-34
640963030.NZ_ABCS01000039 Plesiocystis pacifica SIR-1, unfinishe...        68   7e-09
637000213.NC_005042 Prochlorococcus marinus subsp. marinus str. ...        54   1e-04
641228501.NC_009976 Prochlorococcus marinus str. MIT 9211, compl...        48   0.007

The COG database: new developments in phylogenetic classification of proteins from complete genomes
Roman L. Tatusov, Darren A. Natale, Igor V. Garkavtsev, Tatiana A. Tatusova, Uma T. Shankavaram, Bachoti S. Rao, Boris Kiryutin, Michael Y. Galperin, Natalie D. Fedorova, and Eugene V. Koonin
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.

Figure: Growth dynamics of the COG set with the increase of number of included genomes. The circles show the sequence of genome inclusion according to the actual order of sequencing, and the smooth line shows the mean of 10^6 random permutations of the genome order. The colored area indicates the range between the maximal and minimal value for each point (number of genomes) in 10^6 random permutations.

Nucleic Acids Res. 2001 January 1; 29(1): 22–28.

End for today
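For working through the exercise after class, the E-value cutoff step can be reproduced offline. The sketch below hard-codes the ten hits from the BLAST results slide and keeps those at or below a chosen cutoff. The helper names `parse_evalue` and `significant_hits` are hypothetical, and treating the abbreviated value `e-105` as `1e-105` is an assumption about how BLAST shortens its report.

```python
def parse_evalue(text):
    """Convert a BLAST report E-value string to a float.

    Assumption: a value written without a leading digit, such as
    'e-105', is shorthand for '1e-105'.
    """
    if text.startswith("e"):
        text = "1" + text
    return float(text)


# (organism, bit score, E-value text) copied from the results slide.
HITS = [
    ("Synechococcus sp. WH 8102", 940, "0.0"),
    ("Synechococcus sp. CC9605", 389, "e-105"),
    ("Synechococcus sp. BL107", 311, "3e-82"),
    ("Synechococcus sp. CC9902", 287, "4e-75"),
    ("Synechococcus sp. CC9311", 208, "3e-51"),
    ("Prochlorococcus marinus str. MIT 9303", 168, "3e-39"),
    ("Prochlorococcus marinus str. MIT 9313", 153, "2e-34"),
    ("Plesiocystis pacifica SIR-1", 68, "7e-09"),
    ("Prochlorococcus marinus subsp. marinus", 54, "1e-04"),
    ("Prochlorococcus marinus str. MIT 9211", 48, "0.007"),
]


def significant_hits(hits, cutoff=1e-2):
    """Return the hits whose E-value is at or below the cutoff."""
    return [hit for hit in hits if parse_evalue(hit[2]) <= cutoff]


if __name__ == "__main__":
    for cutoff in (1e-2, 1e-5, 1e-50):
        kept = significant_hits(HITS, cutoff)
        print(f"cutoff {cutoff:g}: {len(kept)} hits kept")
```

Tightening the cutoff drops the weakest hits first, which is why the exercise describes 1e-2 as the less stringent setting.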