Download Plant Genome Resources at the National Center for Biotechnology

Bioinformatics Plant Genome Resources at the National Center for Biotechnology Information David L. Wheeler*, Brian Smith-White, Vyacheslav Chetvernin, Sergei Resenchuk, Susan M. Dombrowski, Steven W. Pechous, Tatiana Tatusova, and James Ostell National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894 The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allow maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI’s genomic resources as well as links to organism-specific Web pages beyond NCBI. DATABASES COVERED BY THE NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION’S ENTREZ SYSTEM The National Center for Biotechnology Information (NCBI) provides a data-rich environment in support of genomic research by integrating the data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core database in Entrez, Entrez Nucleotide, includes GenBank (Benson et al., 2005), a primary database of nucleotide sequences built and maintained by NCBI and synchronized daily with the DNA Databank of Japan (Tateno et al., 2004) and the European Molecular Biological Laboratory (Kanz et al., 2004) databases. Entrez Nucleotide contains almost 12 million plant-derived sequences. Entrez databases that are coupled tightly with the nucleotide sequences in Entrez Nucleotide are the NCBI Taxonomy, Protein, PubMed, and PubMed Central databases. The NCBI Taxonomy database covers over 160,000 organisms and includes almost 60,000 plant taxa. Entrez Protein comprises sequences taken from several sources (Boeckmann et al., 2003; Wu et al., 2003; Deshpande et al., 2004), including the conceptual translations of coding regions annotated on GenBank records, and contains over 350,000 plant-derived sequences. This primary sequence data is coupled to the scientific literature via links to PubMed and PubMed Central. In addition to the core databases mentioned, the Entrez system also covers a suite of more special* Corresponding author; e-mail [email protected]; fax 301–480–9241. www.plantphysiol.org/cgi/doi/10.1104/pp.104.058842. 1280 ized databases using a variety of database schemas. A few of particular relevance to genomics are Entrez Genomes, containing genomic sequence and annotations for over 1,000 organisms, and including 39 complete sequences for plant chromosomes, plastids, and mitochondria; UniGene, containing gene-oriented sequence clusters for over 40 organisms, and including more than 200,000 clusters for 20 plants; Entrez Gene (Maglott et al., 2004), a gene-centered resource covering over a million loci, including over 81,000 loci for plants; HomoloGene, containing clusters of homologous genes for over a dozen model organisms, including almost 16,000 clusters for plants; RefSeq (Pruitt et al., 2005), released bimonthly via FTP, updated daily in Entrez, and containing reference nucleotide and protein sequences for almost 3,000 organisms, including over 65,000 transcript sequences and more then 69,000 protein sequences for 40 plants; UniSTS, a nonredundant database of sequence-tagged sites (STSs) from over 200 organisms, including over 8,000 STSs from 30 plants; dbSNP (Sherry et al., 2001), a database of nucleotide variations from over 20 organisms, including almost four million single nucleotide polymorphisms (SNPs) from six plants; the Conserved Domain Database (CDD; Marchler-Bauer et al., 2004), containing over 10,000 alignment-based protein domain profiles, including over 250 that are specific to plants; and the Gene Expression Omnibus (GEO; Barrett et al., 2004) database of gene expression, with data from over 20 organisms, including 38 datasets and over 600,000 expression profiles for plant tissues. A network of links allows fluid navigation between these databases. Links may interconnect a sequence in Entrez Nucleotide to the abstract in Plant Physiology, July 2005, Vol. 138, pp. 1280–1288, www.plantphysiol.org Ó 2005 American Society of Plant Biologists Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Plant Genome Resources PubMed of the paper in which it is cited. If a sequence in Entrez Nucleotide bears annotated coding regions, then it will be interconnected to the sequences in Entrez Protein derived from the respective conceptual translations. Many Entrez links are computationally derived, such as links between protein sequences made on the basis of sequence similarity. These precomputed protein alignments, updated daily, are available for viewing using the BLAST Link (BLink) resource. To see the variety of links to other databases or resources available for any record in Entrez, click on the ‘‘Links’’ link in the upper right-hand corner of the Entrez display. The NCBI Map Viewer (http://www.ncbi.nlm.nih. gov/mapview/), used to display genomic maps for many plant and animal genomes, takes advantage of the same database technology as Entrez and supports text queries with Boolean logic. Searches of complete genomic sequences from organisms ranging from microbes to higher plants and animals may be performed via genomic BLAST searches that lead to Map Viewer displays in which the genomic context of the hits can be seen. The NCBI genomics environment offers a unique opportunity for researchers. While specialized genomic resources provide excellent coverage of the data for a single organism or a closely related group of organisms, the environment at NCBI enables the comparison of genomic data from species across the entire taxonomic range. This wide taxonomic coverage greatly enhances the utility of the data by allowing advances in the understanding of the genomics of one organism to be applied to a broader genomic context spanning many organisms. In the descriptions that follow, addresses for the key NCBI Web pages appear directly in the text and are also compiled in the ‘‘URLs for NCBI Resources and Download Sites’’ section below. For a more detailed overview of NCBI resources, see the NCBI home page (http://www.ncbi.nlm.nih.gov) and the review in Wheeler et al. (2005). Details of the creation and curation of NCBI RefSeqs can be found at http:// www.ncbi.nlm.nih.gov/RefSeq/, and a synopsis of the HomoloGene build procedure can be found at http://www.ncbi.nlm.nih.gov/HomoloGene/HTML/ homologene_buildproc.html. SOURCES AND ORGANIZATION OF THE PLANT GENOMIC DATA AT NCBI The bulk of the primary plant genomic data available at NCBI falls into one of three categories that mirror the types of projects currently conducted by the plant research community; these are genomic assemblies, batches of Expressed Sequence Tags (ESTs), and genetic or physical genomic maps. Genomic assemblies include that of the Arabadopsis (Arabidopsis thaliana) genome produced by the Arabidopsis Genome Initiative (2000), the assembly of Oryza sativa subsp. japonica resulting from the International Rice Genome Sequencing Project (Yu et al., 2002) and yielding assemblies for chromosomes 1 and 10, and that of O. sativa L. subsp. indica produced by the Chinese Academy of Sciences in Beijing (Sasaki et al., 2002). Emerging data from several plant genome sequencing projects, including projects for Lycopersicon esculentum, Medicago sativa and Medicago truncatula, Lotus corniculatus var. japonicus, Populus balsamifera, Poncirus trifoliata, Vitis vinifera, and many others, can be accessed in the NCBI Genome Project Database (http://www.ncbi.nlm.nih.gov/entrez/query. fcgi?CMD5search&DB5genomeprj), which provides a good overview of current and planned sequencing efforts. Data resulting from large-scale EST sequencing projects for over 70 plants have been deposited into GenBank and are integrated with other sequence data at NCBI in a number of ways. Organisms for which more than 70,000 EST sequences have been deposited are automatically entered into the UniGene database, in which case their EST sequences are combined with other transcript sequences in GenBank and partitioned into gene-oriented clusters. In this manner, ESTs from 20 plant species have been used to produce more than 200,000 transcript clusters. The UniGene clusters themselves are subjected to further analysis to link them to the STSs in UniSTS, genes in Entrez Gene, gene homologs in HomoloGene, and proteins in Entrez Protein. As a consequence, it is often possible to simply query Entrez with a GenBank EST accession number and wind up with a gene location, protein sequence, and estimate of sequence conservation across organisms in a single step. ESTs arising from several plant species as well as the UniGene clusters derived from them are aligned to the well-assembled Arabidopsis and O. sativa genomes. ESTs from the Liliopsida are aligned to the O. sativa genome; these include Hordeum vulgare, O. sativa, Sorghum bicolor, Triticum aestivum, and Zea mays. Those from the Eudicotyledons are aligned to the Arabidopsis genome and include Arabidopsis itself, Glycine max, Lactuca sativa, L. corniculatus, Malus 3 domestica, M. truncatula, Populus tremula 3 Populus tremuloides, Solanum tuberosum, and V. vinifera. These alignments make an important link between EST data, which is relatively inexpensive to obtain, and assembled, wellannotated genomic sequence, a more expensive commodity. In addition to these ESTs, more than three million SNPs, submitted to NCBI’s dbSNP database, have been mapped to the O. sativa genome at NCBI. Genetic mapping projects are under way for a number of plants, and maps resulting from these projects are displayed in the Map Viewer discussed below. Such nonsequence data is imported by NCBI from many publicly available databases in formats ranging from Structured Query Language table dumps to flat files with unique formats. NCBI processes the data to link feature locations with GenBank records, with data in related databases, and with URLs that point back to the source of the data. Once processed, the data is Plant Physiol. Vol. 138, 2005 1281 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Wheeler et al. entered into the Map Viewer database for display. Sources of nonsequence data shown in the Map Viewer include maps from MaizeGDB (Lawrence et al., 2004), Gramene (Ware et al., 2002), GrainGenes (2005, http://wheat.pw.usda.gov/GG2/index.shtml), the Solanaceae Genomics Network (2005, http:// www.sgn.cornell.edu), and BarleyDB (BarleyDB at Cropnet, 2005, http://ukcrop.net/barley.html). The map names in Map Viewer displays link to descriptions of the source of each map data with references to the primary data. A summary of the scope of the plant genomic data available at NCBI with links to associated resources can be found on the Plant Genome Central Web page. ACCESSING THE DATA The Map Viewer provides a central interface to plant genomic data at NCBI, serving as an interactive tool for viewing an ensemble of aligned genetic, physical, or sequence-based maps with an adjustable focus ranging from that of a complete chromosome to that of a portion of a gene. The maps displayed in the Map Viewer may be derived from a single organism or from multiple organisms; map alignments are performed on the basis of shared markers. There are many species of plant for which Map Viewer displays are available: Arabidopsis, oat (Avena sativa), G. max, H. vulgare, L. esculentum, O. sativa, S. bicolor, T. aestivum, Triticum monococcum, Z. mays, and L. corniculatus var. japonicus. The creation of displays for M. sativa and M. truncatula is in progress. A number of genetic or physical maps are usually available for each organism displayed in the Map Viewer. For two organisms, O. sativa and Arabidopsis, sequence maps are available with annotated genes, and for these two genomes EST and UniGene alignments are available, as discussed above. Annotations on the plant genomic assemblies displayed in the Map Viewer are those produced by the appropriate model organism community annotation effort. References and links to the projects generating these annotations are shown on Map Viewer genome overview pages. Navigation to a target Map Viewer display is via one of three primary routes: a Map Viewer text query, an Entrez text query, or a PlantBLAST search. When a match is found through one of these avenues, it is possible to navigate to a Map Viewer display from which many routes to genomic comparison and analysis are available. The usage scenarios outlined below illustrate these three entry mechanisms. USAGE SCENARIO: BEGINNING WITH A MAP VIEWER TEXT QUERY The Plant Genomes Query page (http://www. ncbi.nlm.nih.gov/mapview/map_search.cgi?chr5plants. inf) enables one to simultaneously search physical, genetic, and sequence maps shown in Map Viewer for several plant genomes. Queries may include clone names or aliases, gene names or symbols, locus identifiers, gene products or descriptions, genetic marker or GenBank identifiers. Multiple search terms can be combined using Boolean logic. Since the early 1990s, various lines of research have shown that large-scale genome structure is conserved in blocks across the grasses (Ahn and Tanksley, 1993; Devos et al., 1994; Kurata et al., 1994; Van Deynze et al., 1995). Locus nomenclature is organism specific and is unreliable as a query method between species; however, the regular nomenclature of plasmids (Lederberg, 1986) is not influenced by how the plasmid or insert is used. The data for the plant maps available through Map Viewer include the marker-locus relationships, based on experimental evidence and provided by model organism annotators external to NCBI, for each locus where the allelic state is identified by the nucleic acid sequence. This information enables the rendering of visual connection between those mapped loci in adjacently displayed maps that were identified by the same probe. In principle, this locus-probe relationship allows a cross-species text search using the probe name as the query string as depicted in Figure 1. The text query used was ‘‘cdo718’’ (Fig. 1A), the name of a plasmid with an oat cDNA insert. This probe was used to map loci in nine maps available in Map Viewer: the AxH-92 map in oat; the Cons map in H. vulgare; the RC94, RW99, R, RC00, and RC01 maps in O. sativa; the S-0 map in T. aestivum; and the RW99 map in Z. mays. The red lines between each map connect the loci identified by the probe. The gray lines connect the other loci in adjacent maps that have been identified by the same probe. Note that the Map Viewer display of Figure 1 shows full marker names for each track because the compress maps check box (Fig. 1B), checked by default to accommodate a large number of maps, has been unchecked. Mouseovers reveal more information about each marker (Fig. 1C) and links to external resources, such as MaizeGDB (Lawrence et al., 2004), are found in the links column (Fig. 1D). USAGE SCENARIO: BEGINNING WITH A TEXT-BASED SEARCH IN ENTREZ The entire suite of Entrez databases may be searched in tandem using the Entrez Global Query page (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi). Using Global Query, it is not necessary to know in advance in which databases the data will be found, and, as such, it is an excellent starting point for an initial survey of a new research area. As an example, consider a search for salt tolerance-related genes in plants. One may begin with the following query: ‘‘salt tolerance’’ AND viridiplantae [organism]. The query consists of two parts, linked by a Boolean ‘‘AND,’’ which is always given in capital letters. For 1282 Plant Physiol. Vol. 138, 2005 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Plant Genome Resources Figure 1. Map Viewer displays resulting from a search for marker ‘‘cdo718’’ showing aligned maps from several plants. The text ‘‘cdo718,’’ shown near A, is highlighted on each map with lines between maps connecting the highlighted markers (C). The Compress maps check box (B) has been unchecked to show full locus names on the maps. The Symbol links under D lead to the MaizeGDB (Lawrence et al., 2004) Web resource. the query to be successful, the quoted phrase ‘‘salt tolerance’’ must be found in some part of a database record. The second portion of the query limits the search to the taxon viridiplantae within the Entrez organism field, given in the square brackets. Hence, for databases, such as the sequence databases, that classify records by organism, only records for the viridiplantae will be returned. The entries identified with the query are displayed on the Global Entrez results page (Fig. 2H). The search matches entries in the Entrez Nucleotide, Protein, Gene, UniGene, HomoloGene, and GEO databases. Navigating directly to the matches in the Gene database by clicking on the link in Global Query generates a display of summaries (data not shown) for six genes. Clicking on the link to the first of these Entrez Gene records, the salt-tolerance (STO) gene of Arabidopsis, yields the report shown in the main area of Figure 2. The report begins with the Entrez Gene GeneID and name of the gene, shown in section A. Following this is a graphical representation of the structure of the gene, section B, showing two splice variants, a two-exon form, and a three-exon form. In section C, the genomic context of STO is shown, followed, in section D, by its type, aliases, taxonomic lineage, and links to the literature in PubMed. Gene Ontology (Gene Ontology Consortium, 2004) terms are given in section E. Gene Ontology terms are taken directly from the Gene Ontology database without modification. Section F, titled ‘‘NCBI Reference Sequences,’’ gives links to transcript and protein reference sequences as well as links to precomputed analyses showing the existence of a Z-box zinc finger domain from the CDD in the predicted protein. Many links to other NCBI databases such as UniGene and HomoloGene, to other NCBI resources such as Map Viewer, and to other organismspecific sites such as The Arabidopsis Information Resource (TAIR; Garcia-Hernandez et al., 2002) and Munich Information Center for Protein Sequence (MIPS) Arabidopsis Database (Schoof et al., 2002) are available from the Links pulldown menu shown in section G. Plant Physiol. Vol. 138, 2005 1283 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Wheeler et al. Figure 2. Entrez Gene report for the STO gene showing many links to related databases and resources. The Entrez Gene report for STO shows the GeneID and name of the gene (A), the structure of the gene (B), its genomic context (C), gene-type, aliases, taxonomic lineage, and links to the literature. Gene Ontology (Gene Ontology Consortium, 2004) terms are listed in section E, while links to RefSeqs and NCBI Conserved Domains detected within gene products are given in section F. Many links to other NCBI databases such as UniGene and HomoloGene as well as to other organisms-specific sites such as TAIR (Garcia-Hernandez et al., 2002) and MIPS (Schoof et al., 2002) are available from the Links pulldown menu. Returning to the Global Query page, it may be of interest to see if additional genes related to salt tolerance can be found using a different route. For example, clicking on the Global Query link to the Entrez Protein database yields a list of the first 20 matching proteins. Among these entries is a SwissProt-derived record for Arabidopsis SAL1 phosphatase, accession number Q42546. Clicking on the accession number to see the GenBank flat file view of this entry reveals that the first literature reference (Quintero et al., 1996) within the record is titled ‘‘The SAL1 gene of Arabidopsis, encoding an enzyme with 3#(2#),5#bisphosphate nucleotidase and inositol polyphosphate 1-phosphatase activities, increases salt tolerance in yeast.’’ Navigating from the protein record to Entrez Gene using the a Links pulldown menu similar to that shown in Figure 2G returns a report for a gene not identified in Entrez Gene by a direct text query because ‘‘salt tolerance’’ is not among its annotations. The report for this gene lists Gene Ontology terms suggesting an involvement with stress responses other than salt concentration, such as cold stress, so it is possible that the gene has a role in a general stress-response pathway. USAGE SCENARIO: BEGINNING WITH A SEQUENCE-BASED SEARCH USING PLANTBLAST The suite of BLAST sequence similarity programs offers an alternative to a text query as a method of entry into the Entrez system. Using BLAST, a novel sequence can be quickly linked to known database sequences to which it is similar. Links from these 1284 Plant Physiol. Vol. 138, 2005 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Plant Genome Resources sequence records to others in Entrez allow fluid navigation between NCBI resources. As an example, consider an unknown European white birch (Betula pendula) EST, GenBank accession number AJ606366. A simple Entrez query reveals that this EST does not belong to any UniGene cluster and, therefore, no geneoriented information is available. In addition, since European white birch ESTs are not among those routinely aligned to the O. sativa or Arabidopsis genomic sequence by NCBI, it will not be possible to simply look this one up in Map Viewer. However, using Plant Genome BLAST, it may still be possible to place this EST on the Arabidopsis genome. The PlantBLAST Page (http://www.ncbi.nlm.nih. gov/BLAST/Genome/PlantBlast.shtml) is reached via the Plants link on the BLAST homepage. The query form allows the selection of the species of plant to BLAST against, the database to search, and the BLAST algorithm to use. For example, one may select Arabidopsis, one of 10 available plant species, as the organism, the set of protein sequences derived from its genomic annotation as the database, and the BLASTX program from the pulldown menus, using the form shown in Figure 3A. Alternatively, mapped sequences from all available plants may be selected as the target of the search. The BLASTX program allows one to compare the six-frame protein translations of a nucleotide query to the sequences in a protein database and is a very sensitive way to perform a cross-species search. As a query, either the EST sequence or its GenBank accession number, AJ606366, may be used. Such a search returns significant matches to several members of the chitinase protein family, among them NCBI RefSeq accession number NP_172076. Clicking Figure 3. Pathway beginning with a PlantBLAST search of an EST sequence. Armed with an EST sequence, one may navigate to a Map Viewer display for a sequenced-anchored organism by performing a translated PlantBLAST search against mapped proteins from Arabidopsis (A). The BLAST program chosen is BLASTX in which the six-frame translations of the EST sequence will be compared to sequences from the selected protein database. Matches to Arabidopsis proteins mapping to the genome are seen in the overview PlantBLAST graphic (B); clicking on the ‘‘NP_172076’’ link to the first hit in the table generates a Map Viewer display (C) of the Arabidopsis Genes_seq track showing the position of the hit. The ‘‘At1g05850’’ link immediately to the right of the Genes_seq track in the Map Viewer display leads to Entrez Gene and is followed by links to other NCBI and external organism-specific resources. Plant Physiol. Vol. 138, 2005 1285 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Wheeler et al. on the Genome View button in the BLAST results and adding the gene map as the rightmost map (data not shown) invokes a graphical Arabidopsis Map Viewer overview revealing many hits across the Arabidopsis genome (Fig. 3B). The ‘‘NP_172076’’ link in the overview leads to the detailed display the alignment of NP_172076 to the Arabidopsis genome of Figure 3C with links; ‘‘At1g05850’’ leads to an Entrez Gene report for the corresponding gene, and ‘‘TAIR’’ leads to The Arabidopsis Information Resource. The Entrez Gene report contains links to a publication (Zhong et al., 2002) in PubMed titled ‘‘Mutation of a chitinase-like gene causes ectopic deposition of lignin, aberrant cell shapes, and overproduction of ethylene.’’ The Gene report also contains a link to the BLink view (Fig. 4A) of alignments between the sequence of the At1g05850 gene product and similar protein sequences. Using the BLink report CDD-Search button, the CDD view of Figure 4B can be reached. The BLink report, a portion of which is shown in Figure 4A, presents 200 alignments to other plant proteins including proteins from O. sativa. Returning to the Map Viewer and following the link to the TAIR Web page for the gene (data not shown) also shows that this gene has two known mutant alleles leading to ectopic deposition of lignin in pith. Consider as a final example an EST with GenBank accession number AU251269 from Italian rye grass (Lolium multiflorum). A BLASTX PlantBLAST search against O. sativa proteins, performed as in the preceding case, reveals a putative protein homolog in O. sativa with RefSeq accession number NP_909676. Navigating to the BLink report for NP_909676 shows conservation across all taxonomy branches (Fig. 5A) and includes an alignment to human protein NP_003932, Wiskott-Aldrich syndrome gene-like protein (Fig. 5B). The Entrez Gene report for this curated RefSeq contains links to a large collection of articles in PubMed and a number of Gene Ontology terms pertaining to actin activity (Fig. 5, C and D) as well as an impressive array of links to other resources (Fig. 5E). Figure 4. BLink and CDD reports for the protein product of At1g05850. The BLink report shown in A presents precomputed alignments between the sequence of the At1g05850 gene product and similar sequences in the Entrez Protein database. The CDD-Search button in the BLink report leads to a conserved domain view (B) of the At1g05850 gene product. 1286 Plant Physiol. Vol. 138, 2005 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Plant Genome Resources Figure 5. BLink report for an O. sativa protein and the Entrez Gene report of a similar human protein. O. sativa protein NP_909676 aligns well to proteins from an array of organisms, as shown in the BLink report header (A). Among these proteins is a human protein, NP_003932, associated with Wiskott-Aldrich syndrome and found in position (B). The Entrez Gene report for the human protein provides information that may be helpful in understanding the O. sativa protein such as literature references in PubMed and Gene Ontology terms relating to actin (C and D). Links to many other NCBI and gene-related resources are given in the Links pulldown menu (E). URLS FOR NCBI RESOURCES AND DOWNLOAD SITES Resources NCBI Homepage: http://www.ncbi.nlm.nih.gov The Reference Sequence Database: http://www. ncbi.nlm.nih.gov/RefSeq/ Entrez Global Query: http://www.ncbi.nlm.nih. gov/gquery/gquery.fcgi Map Viewer: http://www.ncbi.nlm.nih.gov/ mapview/ Plant Genomes Central: http://www.ncbi.nlm.nih. gov/genomes/PLANTS/PlantList.html The Plant Genomes Query page: http://www. ncbi.nlm.nih.gov/mapview/map_search.cgi?chr5 plants.inf PlantBLAST: http://www.ncbi.nlm.nih.gov/ BLAST/Genome/PlantBlast.shtml Plant Genome sequencing projects database: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? CMD5search&DB5genomeprj Downloads Genome records can be downloaded from the following FTP directories: ftp://ftp.ncbi.nih.gov/ genbank/; ftp://ftp.ncbi.nih.gov/genomes/; ftp://ftp.ncbi.nih.gov/refseq/ Arabidopsis data in several file formats for each chromosome available at: ftp://ftp.ncbi.nih. gov/genomes/Arabidopsis_thaliana GenBank sequence identifiers (NCBI gis) used to create the databases available in PlantBLAST: ftp://ftp.ncbi.nih.gov/genomes/PLANTS/ BLASTDB/ Plant Physiol. Vol. 138, 2005 1287 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved. Wheeler et al. For details and the statistics for the NCBI rice genome assembly, see: http://www.ncbi.nlm.nih. gov/genomes/PLANTS/rice_0.html FASTA-format file of BAC-end GenBank records organized by organism: ftp://ftp.ncbi.nih.gov/ genomes/BACENDS/ ACKNOWLEDGMENTS The authors would like to thank Pavel Bolotov, Andrei Kochergin, Igor Tolstoy, and Boris Kiryutin for their expertise and diligence in the maintenance of many of the databases highlighted in this article. Received December 22, 2004; revised April 7, 2005; accepted April 22, 2005; published July 11, 2005. LITERATURE CITED Ahn SN, Tanksley SD (1993) Comparative linkage maps of the rice and maize genomes. Proc Natl Acad Sci USA 90: 7980–7984 Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R (2004) NCBI GEO: mining millions of expression profiles: database and tools. Nucleic Acids Res 33: D562–D566 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2005) GenBank: update. Nucleic Acids Res 33: D34–D38 Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, Phan I, et al (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31: 365–370 Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z, et al (2004) The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res 33: D233–D237 Devos KM, Chao S, Li QY, Simonetti MC, Gale MD (1994) Relationship between chromosome 9 of maize and wheat homeologous group 7 chromosomes. Genetics 138: 1287–1292 Garcia-Hernandez M, Berardini TZ, Chen G, Crist D, Doyle A, Huala E, Knee E, Lambrecht M, Miller N, Mueller LA, et al (2002) TAIR: a resource for integrated Arabidopsis data. Funct Integr Genomics 2: 239 Gene Ontology Consortium (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258–D261 Kanz C, Aldebert P, Althorpe N, Baker W, Baldwin A, Bates K, Browne P, van den Broek A, Castro M, Cochrane G, et al (2004) The EMBL nucleotide sequence database. Nucleic Acids Res 33: D29–D33 Kurata N, Moore G, Nagamura Y, Foote T, Yano M, Minobe Y, Gale MD (1994) Conservation of genome structure between rice and wheat. Biotechnology (N Y) 12: 276–278 Lawrence CJ, Dong Q, Polacco ML, Seigfried TE, Brendel V (2004) MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res 32: D393–D397 Lederberg EM (1986) Plasmid prefix designations registered by the Plasmid Reference Center 1977-1985. Plasmid 1: 57–92 Maglott D, Ostell J, Pruitt KD, Tatusova T (2004) Entrez Gene: genecentered information at NCBI. Nucleic Acids Res 33: D54–D58 Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, et al (2004) CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res 33: D192–D196 Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501–D504 Quintero FJ, Garciadeblas B, Rodriguez-Navarro A (1996) The SAL1 gene of Arabidopsis, encoding an enzyme with 3#(2#),5#-bisphosphate nucleotidase and inositol polyphosphate 1-phosphatase activities, increases salt tolerance in yeast. Plant Cell 8: 529–537 Sasaki T, Matsumoto T, Yamamoto K, Sakata K, Baba T, Katayose Y, Wu J, Niimura Y, Cheng Z, Nagamura Y, et al (2002) The genome sequence and structure of rice chromosome 1. Nature 420: 312–316 Schoof H, Zaccaria P, Gundlach H, Lemcke K, Rudd S, Kolesov G, Arnold R, Mewes HW, Mayer KF (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res 30: 91–93 Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29: 308–311 Tateno Y, Saitou N, Okubo K, Sugawara H, Gojobori T (2004) DDBJ in collaboration with mass-sequencing teams on annotation. Nucleic Acids Res 33: D25–D28 The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815 Van Deynze AE, Nelson JC, O’Donoughue LS, Ahn SN, Siripoonwiwat W, Harrington SE, Yglesias ES, Braga DP, McCouch SR, Sorrells ME (1995) Comparative mapping in grasses: oat relationships. Mol Gen Genet 249: 349–356 Ware D, Jaiswal P, Ni J, Pan X, Chang K, Clark K, Teytelman L, Schmidt S, Zhao W, Cartinhour S, et al (2002) Gramene: a resource for comparative grass genomics. Nucleic Acids Res 30: 103–105 Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, et al (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 33: D39–D45 Wu CH, Yeh LSL, Huang H, Arminski L, Castro-Alvear J, Chen Y, Hu Z, Kourtesis P, Ledley RS, Suzek BE, et al (2003) The Protein Information Resource. Nucleic Acids Res 31: 345–347 Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–92 Zhong R, Kays SJ, Schroeder BP, Ye ZH (2002) Mutation of a chitinase-like gene causes ectopic deposition of lignin, aberrant cell shapes, and overproduction of ethylene. Plant Cell 14: 165–179 1288 Plant Physiol. Vol. 138, 2005 Downloaded from on August 3, 2017 - Published by www.plantphysiol.org Copyright © 2005 American Society of Plant Biologists. All rights reserved.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Plant Genome Resources at the National Center for Biotechnology