
The Map-based Sequence of the Rice Genome

Yue-Ie Hsing
Institute of Plant and Microbial Biology, Academia Sinica

Abstract

Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished-quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 nontransposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine percent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits.

Rice (Oryza sativa L.) is the most important food crop in the world and feeds over half of the global population. In recent years, the annual growth in global rice production has no longer kept pace with the growth in the number of consumers, and rice production will face even greater challenges in the next few decades. A larger and more affluent population will demand higher production and better-quality rice. At the same time, world development leaves less arable land, water, and labor to produce the crop, while a sustainable agricultural system is required. Thus, there are great demands on biotechnology to improve rice production. A better understanding of the rice genome will facilitate rice research, which in turn will speed up rice biotechnology.

As the first step in a systematic and complete functional characterization of the rice genome, the International Rice Genome Sequencing Project (IRGSP) has generated and analyzed a highly accurate finished sequence of the rice genome that is anchored to the genetic map. Taiwan joined this consortium, cooperating with nine other countries, to decode the rice genome. We were responsible for sequencing the whole of chromosome 5. Figure 1 illustrates the IRGSP team.

Figure 1. The logo and chromosome assignment of IRGSP. Our team is responsible for the decoding of chromosome 5.

Table 1. The sizes, estimated gaps, and coverage rates of the 12 rice chromosomes.

Chr.  Sequenced    Gaps on arm regions  Telomeric  Centromeric  Total   Coverage
      bases (bp)   No.  Length (Mb)     gaps (Mb)  gap (Mb)     (Mb)    (%)
1     43,260,640   5    0.33            0.06       1.4          45.05   96
2     35,954,074   3    0.1             0.01       0.72         36.78   97.7
3     36,189,985   4    0.96            0.04       0.18         37.37   96.8
4     35,489,479   3    0.46            0.2        --           36.15   98.2
5     29,733,216   4    0.22            0.05       --           30      99.1
6     30,731,386   1    0.02            0.03       0.82         31.6    97.2
7     29,643,843   1    0.31            0.01       0.32         30.28   97.9
8     28,434,680   1    0.09            0.05       --           28.57   99.5
9     22,692,709   4    0.13            0.14       0.62         30.53   74.3
10    22,683,701   4    0.68            0.13       0.47         23.96   94.7
11    28,357,783   4    0.21            0.04       1.9          30.76   92.2
12    27,561,960   0    0               0.05       0.16         27.77   99.2
All   370,733,456  36   3.51            0.81       6.59         388.82  95.3

The team started the decoding work in 1999 and finished and published it in 2005. Our analysis has revealed several salient features of the rice genome:

* We provide evidence for a genome size of 389 Mb, which is 260 Mb larger than the genome of the fully sequenced dicot model plant Arabidopsis thaliana. We generated 370 Mb of finished sequence, representing 95% coverage of the genome and virtually all of the euchromatic regions, as listed in Table 1.
* A total of 37,544 non-transposable-element-related protein-coding sequences were detected, compared with 28,000–29,000 in Arabidopsis, with a lower gene density of one gene per 9.9 kb in rice. A total of 2,859 genes seem to be unique to rice and the other cereals, some of which might differentiate monocot and dicot lineages.
* Between 0.38 and 0.43% of the nuclear genome contains organellar DNA fragments, representing repeated and ongoing transfer of organellar DNA to the nuclear genome.
* The transposon content of rice is at least 35% and is populated by representatives from all known transposon superfamilies.
* We have identified 80,127 polymorphic sites that distinguish between two cultivated rice subspecies, japonica and indica, resulting in a high-resolution genetic map for rice. Single-nucleotide polymorphism (SNP) frequency varies from 0.53 to 0.78%.

For chromosome 5, we sequenced 318 BAC/PAC clones, of which 288 were used to prepare the minimal tiling path. The pseudomolecule is 29,826,963 bp in length, with 282 clones in the completed (PLN) phase with high-quality annotation. This chromosome contains 3,687 genes, 8.7% of which are related to transposable elements. The dwarf and severe dwarf mutants shown in Figure 3 are two examples of phenotypes controlled by genes located on this chromosome. The gene functions were studied in detail and the results published in Nature.

With the rice genome fully decoded, the post-genomic era has been launched. We have also joined the rice functional genomics team and are working on the T-DNA knockout/activated rice population.

Figure 2. The gene density and T-DNA integration sites on rice chromosome 5. The bar represents the chromosome, and the grey scale represents the gene density of each region.

Figure 3. The finding of the gid1 gene, which is located at the center of chromosome 5. Panel D: These three rice plants were germinated and transplanted at the same time. The left plant is the control, the middle one is the d1 mutant, and the right one is the gid1 mutant. The zoom-in photo shows that although the gid1 plant is very small, it has leaves, stems, and roots, as a normal plant does. Both the dwarf mutant d1 and the severe dwarf mutant gid1 are caused by sequence changes on rice chromosome 5. Panel B illustrates how we isolated the gene: we walked along the chromosome using genetic markers until we found the target gene. Panel A shows the sequence of the gid1 gene. This region contains 2,450 letters, written only with A, T, C, and G, the letters of the genetic code. It is located in the central region of chromosome 5 and makes up less than 0.01% of the whole chromosome. When the two yellow regions are spliced together, the sequence can be translated into a protein, as shown in Panel C. In the gid1 mutant, the "G" in the red circle in Panel A is changed to "A", which in turn changes a single amino acid, indicated by the red circle in Panel C, and causes the protein to lose its function. As a consequence, a normal rice plant becomes a severe dwarf plant, the gid1 mutant.

The original paper was published in Nature 436 (2005): 793-800.

References:

1. International Rice Genome Sequencing Project. (2005). The map-based sequence of the rice genome. Nature 436, 793-800.
   Author list, Academia Sinica Plant Genome Center (ASPGC): Teh-Yuan Chow, Hong-Hwa Chen, Mei-Chu Chung, Ching-San Chen, Jei-Fu Shaw, Hong-Pang Wu, Kwang-Jen Hsiao, Ya-Ting Chao, Mu-kuei Chu, Chia-Hsiung Cheng, Ai-Ling Hour, Pei-Fang Lee, Shu-Jen Lin, Yao-Cheng Lin, John-Yu Liou, Shu-Mei Liu, Yue-Ie Hsing (Principal Investigator)
2. Ueguchi-Tanaka, M., Ashikari, M., Nakajima, M., Itoh, H., Katoh, E., Kobayashi, M., Chow, T., Hsing, Y., Kitano, H., Yamaguchi, I., and Matsuoka, M. (2005). Nature 437, 693-698.
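
As a cross-check on the figures quoted in Table 1 and in the list of genome features, the short Python sketch below recomputes the coverage rates and the gene density from the tabulated values. It rests on two assumptions that the article does not state explicitly: that coverage is the sequenced bases divided by the estimated total length, and that gene density is the finished sequence divided by the number of predicted genes. The names TABLE_1, coverage_percent, and kb_per_gene are illustrative only. Because the totals in Table 1 are rounded, the recomputed per-chromosome values may differ from the printed ones by about 0.1 percentage points.

```python
# Values copied from Table 1: chromosome -> (sequenced bases in bp, total length in Mb).
TABLE_1 = {
    "1": (43_260_640, 45.05),
    "2": (35_954_074, 36.78),
    "3": (36_189_985, 37.37),
    "4": (35_489_479, 36.15),
    "5": (29_733_216, 30.00),
    "6": (30_731_386, 31.60),
    "7": (29_643_843, 30.28),
    "8": (28_434_680, 28.57),
    "9": (22_692_709, 30.53),
    "10": (22_683_701, 23.96),
    "11": (28_357_783, 30.76),
    "12": (27_561_960, 27.77),
}

GENOME_TOTAL_MB = 388.82  # "All" row of Table 1
GENE_COUNT = 37_544       # non-transposable-element protein-coding genes


def coverage_percent(sequenced_bp: int, total_mb: float) -> float:
    """Sequenced bases as a percentage of the estimated total length.

    Assumption: this is how the Coverage column was derived.
    """
    return 100.0 * sequenced_bp / (total_mb * 1_000_000)


# Sum of the per-chromosome sequenced bases (compare with the "All" row).
total_bp = sum(bp for bp, _ in TABLE_1.values())

# Whole-genome coverage (compare with the 95.3% quoted in Table 1).
genome_coverage = coverage_percent(total_bp, GENOME_TOTAL_MB)

# Assumption: gene density = finished sequence / predicted gene count.
kb_per_gene = total_bp / GENE_COUNT / 1_000

for chrom, (bp, mb) in TABLE_1.items():
    print(f"chr {chrom:>2}: {coverage_percent(bp, mb):5.1f}% covered")

print(f"total sequenced: {total_bp:,} bp")
print(f"genome coverage: {genome_coverage:.1f}%")
print(f"gene density: one gene per {kb_per_gene:.1f} kb")
```

Under these assumptions, the summed bases reproduce the "All" row of Table 1, and the recomputed genome coverage and gene density are consistent with the 95% and one gene per 9.9 kb quoted in the text.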
