The Map-based Sequence of the Rice Genome
Yue-Ie Hsing
Abstract
Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we
present a map-based, finished-quality sequence that covers 95% of the 389 Mb
genome, including virtually all of the euchromatin and two complete centromeres. A
total of 37,544 nontransposable-element-related protein-coding genes were identified,
of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90%
of the Arabidopsis proteins had a putative homologue in the predicted rice proteome.
Twenty-nine percent of the 37,544 predicted genes appear in clustered gene families.
The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes.
We find evidence for widespread and recurrent gene transfer from the organelles to
the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits.
Rice (Oryza sativa L.) is the most important food crop in the world
and feeds over half of the global population. In recent years, annual growth in global rice
production has no longer kept pace with the growth in the number of consumers, and rice
production will face even greater challenges in the next few decades. A larger and more
affluent population will demand higher production and better-quality rice. At the same time,
world development means less arable land, water, and labor to produce the crop, while a
sustainable agriculture system is still required. Thus, there are great demands on
biotechnology to improve rice production. A better understanding of the rice genome will
facilitate rice research, which in turn will speed up rice biotechnology. As the first step in a
systematic and complete functional characterization of the rice genome, the
International Rice Genome Sequencing Project (IRGSP) has generated and
analyzed a highly accurate finished sequence of the rice genome that is
anchored to the genetic map. Taiwan joined this consortium, cooperating
with nine other countries, to decode the rice genome. Our team sequenced
the whole of chromosome 5. Figure 1 shows the IRGSP logo and the
chromosome assignments.
Institute of Plant and Microbial Biology, Academia Sinica
ACADEMIA SINICA
Figure 1. The logo and chromosome assignment of IRGSP. Our team is responsible for the decoding of chromosome 5.
Table 1. The sizes, estimated gaps, and coverage rates of the 12 rice chromosomes.

Chr.  Sequenced     Gaps on arm regions    Telomeric   Centromeric   Total    Coverage
      bases (bp)    No.    Length (Mb)     gaps (Mb)   gap (Mb)      (Mb)     (%)
1     43,260,640    5      0.33            0.06        1.4           45.05    96
2     35,954,074    3      0.1             0.01        0.72          36.78    97.7
3     36,189,985    4      0.96            0.04        0.18          37.37    96.8
4     35,489,479    3      0.46            0.2         --            36.15    98.2
5     29,733,216    4      0.22            0.05        --            30       99.1
6     30,731,386    1      0.02            0.03        0.82          31.6     97.2
7     29,643,843    1      0.31            0.01        0.32          30.28    97.9
8     28,434,680    1      0.09            0.05        --            28.57    99.5
9     22,692,709    4      0.13            0.14        0.62          30.53    74.3
10    22,683,701    4      0.68            0.13        0.47          23.96    94.7
11    28,357,783    4      0.21            0.04        1.9           30.76    92.2
12    27,561,960    0      0               0.05        0.16          27.77    99.2
All   370,733,456   36     3.51            0.81        6.59          388.82   95.3
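The coverage column of Table 1 can be checked against the other columns. The following is a minimal sketch, assuming that coverage is simply the sequenced bases divided by the total chromosome length; the text does not state the formula, and the names TABLE_1 and coverage_percent are illustrative only.

```python
# Sketch: recompute the "Coverage (%)" column of Table 1.
# Assumption (not stated in the text): coverage = sequenced bases / total length.

# chromosome -> (sequenced bases in bp, total length in Mb, reported coverage in %)
TABLE_1 = {
    1: (43_260_640, 45.05, 96.0),
    2: (35_954_074, 36.78, 97.7),
    3: (36_189_985, 37.37, 96.8),
    4: (35_489_479, 36.15, 98.2),
    5: (29_733_216, 30.00, 99.1),
    6: (30_731_386, 31.60, 97.2),
    7: (29_643_843, 30.28, 97.9),
    8: (28_434_680, 28.57, 99.5),
    9: (22_692_709, 30.53, 74.3),
    10: (22_683_701, 23.96, 94.7),
    11: (28_357_783, 30.76, 92.2),
    12: (27_561_960, 27.77, 99.2),
}


def coverage_percent(sequenced_bp, total_mb):
    """Return sequenced bases as a percentage of the total length."""
    return 100.0 * sequenced_bp / (total_mb * 1_000_000)


# Sum of the per-chromosome sequenced bases, to compare with the "All" row.
total_sequenced = sum(bp for bp, _, _ in TABLE_1.values())

if __name__ == "__main__":
    for chrom, (bp, total_mb, reported) in TABLE_1.items():
        computed = coverage_percent(bp, total_mb)
        print(f"chr {chrom:>2}: computed {computed:.2f}%, reported {reported}%")
    print(f"sum of sequenced bases: {total_sequenced:,} bp")
```

Under this assumption the recomputed values agree with the reported column to within about 0.1 percentage point, and the per-chromosome sequenced bases add up to the figure in the "All" row.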
The team started the decoding work in 1999 and finished and published it in 2005. Our
analysis has revealed several salient features of the rice genome:
* We provide evidence for a genome size of 389 Mb, which is 260 Mb larger than that of
the fully sequenced dicot model plant Arabidopsis thaliana. We generated 370 Mb of finished
sequence, representing 95% coverage of the genome and virtually all of the euchromatic
regions, as listed in Table 1.
* A total of 37,544 non-transposable-element-related protein-coding sequences were detected,
compared with 28,000–29,000 in Arabidopsis, with a lower gene density of one gene per 9.9 kb in rice.
A total of 2,859 genes seem to be unique to rice and the other cereals, some of which might
differentiate monocot and dicot lineages.
* Between 0.38 and 0.43% of the nuclear genome consists of organellar DNA fragments,
representing repeated and ongoing transfer of organellar DNA to the nuclear genome.
* Transposons make up at least 35% of the rice genome, with representatives from all known
transposon superfamilies.
* We have identified 80,127 polymorphic sites that distinguish between two cultivated rice
subspecies, japonica and indica, resulting in a high-resolution genetic map for rice.
Single-nucleotide polymorphism (SNP) frequency varies from 0.53 to 0.78%.
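Some of the figures quoted in the list above can be related to one another with simple arithmetic. The sketch below assumes that the gene density of one gene per 9.9 kb was obtained by dividing the finished sequence length (the "All" row of Table 1) by the number of predicted genes; the text does not say which denominator was used, and all variable names are illustrative.

```python
# Sketch: relate the headline numbers quoted in the text.
# Assumption: gene density = finished sequence length / number of genes.

GENOME_SIZE_MB = 389             # estimated genome size
FINISHED_MB = 370                # finished sequence, as quoted in the text
SEQUENCED_BP = 370_733_456       # "All" row of Table 1
GENE_COUNT = 37_544              # non-transposable-element-related genes
SIZE_EXCESS_MB = 260             # stated excess over Arabidopsis thaliana

# Average spacing between genes, in kilobases per gene.
kb_per_gene = SEQUENCED_BP / GENE_COUNT / 1000

# Share of the estimated genome covered by finished sequence.
coverage_pct = 100 * FINISHED_MB / GENOME_SIZE_MB

# Arabidopsis genome size implied by the "260 Mb larger" statement.
implied_arabidopsis_mb = GENOME_SIZE_MB - SIZE_EXCESS_MB

if __name__ == "__main__":
    print(f"gene density: one gene per {kb_per_gene:.1f} kb")
    print(f"coverage: {coverage_pct:.0f}%")
    print(f"implied Arabidopsis genome size: {implied_arabidopsis_mb} Mb")
```

Dividing the 389 Mb estimate by the gene count instead would give roughly 10.4 kb per gene, so the quoted 9.9 kb is more consistent with the finished sequence length as the denominator.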
As for chromosome 5, we sequenced 318 BAC/PAC clones, and 288 clones were used to
prepare the minimal tiling path. The pseudomolecule is 29,826,963 bp in length, with
282 clones in the completed (PLN) phase with high-quality annotation. This chromosome
contains 3,687 genes, 8.7% of which are related to transposable elements. The dwarf
and severe dwarf mutants shown in Figure 3 are two examples of phenotypes controlled by
genes located on this chromosome. The gene functions were studied in detail and the
results published in Nature.
With the rice genome fully decoded, the post-genomic era has been launched. We have also
joined the rice functional genomics team and are working on the T-DNA knockout/activated
rice population.
Figure 2. Gene density and T-DNA integration sites on rice chromosome 5. The bar represents
the chromosome, and the grey scale represents the gene density of each region.
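The idea of a minimal tiling path, mentioned for the chromosome 5 clones, can be illustrated with a small example: from a set of overlapping clones, choose as few as possible that still cover the whole region. The sketch below uses a standard greedy interval-cover method; it is not the procedure used by the sequencing team, and the clone names and coordinates are hypothetical.

```python
# Sketch: choose a minimal set of overlapping clones that covers a region.
# This is a generic greedy interval cover, not the IRGSP procedure;
# all clone names and coordinates below are made up for illustration.


def minimal_tiling_path(clones, region_start, region_end):
    """Return the names of the fewest clones covering [region_start, region_end).

    clones: list of (name, start, end) tuples with half-open coordinates.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Among clones that start within the covered part, keep the one
        # that extends furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clones on a 500 kb region (coordinates in kb).
EXAMPLE_CLONES = [
    ("A", 0, 150), ("B", 100, 300), ("C", 120, 260),
    ("D", 250, 420), ("E", 280, 400), ("F", 400, 500),
]

if __name__ == "__main__":
    print(minimal_tiling_path(EXAMPLE_CLONES, 0, 500))
```

In this made-up example, clones C and E are redundant because B and D reach further, so four of the six clones are enough to tile the region.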
Figure 3. Identification of the gid1 gene, which is located at the center of chromosome 5. Panel D: these three rice plants were germinated and transplanted at the same time. The left plant is the control, the middle one is the d1 mutant, and the right one is
the gid1 mutant. The zoomed-in photo shows that although the gid1 plant is very small, it has leaves, stems, and roots, as a
normal plant does. Both the dwarf mutant d1 and the severe dwarf mutant gid1 are caused by sequence changes on rice chromosome 5. Panel B illustrates how we isolated the gene: we walked along the chromosome using genetic markers until we found the target gene. Panel A shows the sequence of the gid1 gene. This region contains 2,450 letters, written only with A,
T, C, and G, the letters of the genetic code. It is located in the central region of chromosome 5 and makes up less than 0.01% of the whole
chromosome. When the two yellow regions are cut out and joined together, the sequence can be translated into a protein, as shown in Panel C. In
the gid1 mutant, the “G” circled in red in Panel A is changed to “A”, which in turn changes a single amino acid,
circled in red in Panel C, and causes the protein to lose its function. As a consequence, a normal rice plant
becomes a severe dwarf plant, gid1.
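The effect described in the caption of Figure 3, where a single G-to-A change alters one amino acid, can be reproduced on a toy sequence. The sketch below uses the standard genetic code; the 12-letter sequence and the mutated position are hypothetical and are not taken from the gid1 gene. It also checks the statement that the 2,450-letter region is less than 0.01% of the chromosome, using the pseudomolecule length quoted in the text.

```python
# Sketch: how one G -> A substitution changes one amino acid.
# The DNA sequence below is hypothetical, NOT the real gid1 sequence.
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitute(dna, position, new_base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


wild_type = "ATGGCTGGTAAA"                # hypothetical: Met-Ala-Gly-Lys
mutant = substitute(wild_type, 7, "A")    # codon GGT (Gly) becomes GAT (Asp)

# Share of chromosome 5 taken up by the 2,450-letter region, in percent.
region_share_pct = 100 * 2450 / 29_826_963

if __name__ == "__main__":
    print(translate(wild_type), "->", translate(mutant))
    print(f"region share of chromosome 5: {region_share_pct:.4f}%")
```

Only the third amino acid differs between the two translations, which is the kind of single-residue change the caption describes.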
The original paper was published in Nature 436 (2005): 793-800.
References:
1. International Rice Genome Sequencing Project. (2005). The map-based sequence of the rice genome. Nature 436, 793-800. Author
list, Academia Sinica Plant Genome Center (ASPGC): Teh-Yuan Chow, Hong-Hwa Chen, Mei-Chu Chung, Ching-San Chen, Jei-Fu Shaw, Hong-Pang Wu, Kwang-Jen Hsiao, Ya-Ting Chao, Mu-kuei Chu, Chia-Hsiung Cheng, Ai-Ling Hour, Pei-Fang Lee, Shu-Jen Lin, Yao-Cheng Lin, John-Yu Liou, Shu-Mei Liu, Yue-Ie Hsing (Principal Investigator).
2. Ueguchi-Tanaka, M., Ashikari, M., Nakajima, M., Itoh, H., Katoh, E., Kobayashi, M., Chow, T., Hsing, Y., Kitano, H., Yamaguchi, I.
and Matsuoka, M. (2005). Nature 437, 693-698.