* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 450 Mbp genome of rice, Oryza sativa
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
X-inactivation wikipedia , lookup
Segmental Duplication on the Human Y Chromosome wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Mitochondrial DNA wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Gene desert wikipedia , lookup
Genetic engineering wikipedia , lookup
Short interspersed nuclear elements (SINEs) wikipedia , lookup
Copy-number variation wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Genetically modified crops wikipedia , lookup
Metagenomics wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Gene expression programming wikipedia , lookup
Non-coding DNA wikipedia , lookup
Oncogenomics wikipedia , lookup
Essential gene wikipedia , lookup
Public health genomics wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Microevolution wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Transposable element wikipedia , lookup
Designer baby wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Human genome wikipedia , lookup
Genomic library wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Genomic imprinting wikipedia , lookup
Ridge (biology) wikipedia , lookup
Helitron (biology) wikipedia , lookup
Gene expression profiling wikipedia , lookup
Pathogenomics wikipedia , lookup
History of genetic engineering wikipedia , lookup
Genome editing wikipedia , lookup
Human Genome Project wikipedia , lookup
Genome (book) wikipedia , lookup
IB404 - 14 - Other plants - March 10 1. The next plant genome was the 450 Mbp genome of rice, Oryza sativa. Several groups contributed to this effort, including two large companies, Syngenta and Monsanto, who produced WGS drafts, a WGS draft by a Chinese genome center, and detailed clone-by-clone efforts by the Japanese. Several conclusions are worth noting: A. Despite about at least 200 Myr divergence between these two major lineages of angiosperms (dicots and monocots), there is still considerable microsynteny between their genes. B. The rice genome is much larger because of many transposon insertions between genes, that is, genes are islands in a sea of transposons. C. The rice and other grass/cereal genomes are essentially colinear, being around 70 Myr old, so rice serves as a model for the others, including maize/corn at 3 Gbp and wheat at 16 Gbp. 2. An enormous effort has gone into sequencing the maize genome, but at ~2.5 Gbp it is nearly the same size as ours, except even more of it is transposons, and these transposons form long stretches of seemingly barren genomic landscape. Like rice, but even more so, genes are islands in this sea of transposons. Various strategies were employed to separate out the gene islands and sequence only them, but eventually the genome was completed, using old-fashioned clone-byclone approach involving sequencing ~17,000 BACs, and published in 2009. It is 85% transposons, and the authors predict around 32,000 genes. The transposon numbers are simply amazing. 3. Like Arabidopsis, rice, and sorghum, the genes are mostly on the chromosome arms of the 10 metacentric chromosomes, with massive pericentromeric regions completely dominated by transposons and other repeats. The first line is exons, the rest are various transposons. 4. Since these are all reasonably young flowering plants or angiosperms, we expect to see most genes shared, especially amongst the three grains, but even across to Arabidopsis the number of shared genes is high, with only a few losses in each lineage, and of course some rapidly evolving lineage-specific genes. This Venn diagram is simplified by looking only at gene families, of which there are ~12,000 per genome (recall that animals have ~10,000 gene families). 5. The soybean (Glycine max) genome was published in 2010. It also took a long time because it is relatively large at around 1 Gbp with tons of transposon repeats, and again has a lot of duplicated genes. The genome is in 20 chromosomes, and extensive genetic and physical maps allowed assigning most scaffolds to these chromosomes. In this image of them, genes are in blue, transposons in green, yellow, and light blue, while the upper grey layer is “none of the above”. Notice that the genes are mostly at the ends of the chromosome arms, with the pericentromeric transposon repeats making up most of the genome. Centromeres are pink. 6. The ~46,000 genes reveal two cycles of polyploidization, estimated by these authors at 13 and 60 Myr ago (which is completely different from the diagram at the end of the last lecture??). In this histogram, the ~31,000 genes that are paralogous are shown as pairs, with the younger polyploidization event having a Ks mode of ~0.15 (15% of possible synonymous changes have occurred), whereas the older event has a Ks mode of ~0.5. The remaining ~15,000 genes are singletons, presumably because their paralog was lost. 7. One of the most important features of soybeans is their production of lipids, with soybean oil being one of the major products. They tried to annotate all the genes possibly involved in lipid metabolism, and came up with 1,157. In every category there are more than in Arabidopsis, presumably due to retention of gene duplicates after the ~13 MYA polyploidization. Other analyses include an extensive discussion of the genes implicated in facilitating and regulating the formation of nodules, the root structures peculiar to legumes which harbor nitrogen-fixing bacteria. 8. Just to show the complexity of these genomes, here are their pie charts of the many different kinds of transcription factors found in these two plant genomes (5,673 genes in 63 families for soybean, fully 12% of their gene catalog). Remember that the MADS family genes are the major regulators of plant form. They do have homeodomain/homeobox TFs, but they have different functions in plants. Soybean is on the left, with roughly double the number of each class of TF. 9. Several other plants have been sequenced, including sorghum, grape, and Populus, and more recently cucumber and strawberry. Our own Ray Ming in Plant Biology led sequencing of the papaya genome, starting when he was working in Hawaii generating transgenic strains resistant to viral infection. They did a fairly rough draft 3X WGS genome assembly with lots of gaps and a total size around 400 Mbp. Unlike all these other plants, it does not appear to have undergone a recent polyploidization, and hence has only ~13,000 genes. Indeed, this is close to the estimate for what the ancestral angiosperm genome looked like. 10. The Populus genome is reminiscent of the soybean genome, with ~45,000 protein-coding genes, and a relatively recent wholegenome duplication or polyploidization event contributing ~8000 paralog pairs that persist in this genome in large microsyntenic chunks. Populus is being promoted as a biofuel. 11. Many plants have large genomes, especially amongst the gymnosperms like pines (15-25 Gbp), so they are unlikely to be completely sequenced any time soon. Instead, large scale sequencing of cDNAs, plus massive scale EST projects are being undertaken, allowing comparisons across all plants. Here is a phylogenomic tree based on large numbers of proteins from both genomes and transcriptomes, rooted with mosses, ferns, and liverworts at the bottom. The authors used the tree and protein dataset to conclude that several aspects of microRNA biology only evolved in the angiosperms, and might have contributed to their diversity.